National Archives News

NARA’s Digital Preservation Framework Goes Live as Linked Open Dataset

By Victoria Macchi and Angela Tudico | National Archives News

New equipment in the Sandia computer room, part of the Department of Energy's National Nuclear Security Administration, 1983. The National Archives and Records Administration (NARA) maintains a database of how to preserve 684 file formats, some dating back to the first transfers of electronic records to NARA 50 years ago.

WASHINGTON, August 25, 2022 — Beginning today, the National Archives and Records Administration (NARA) is making its Digital Preservation Framework available as a Linked Open Dataset, a first for the agency.

Aimed at sharing NARA’s research with digital preservation professionals around the world, the dataset expands access that was previously available only through GitHub. Linked Open Data is a method for publishing data in a machine-readable way that allows it to be connected and enriched through links to directly related resources published by other organizations.
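To make the idea of linked data concrete, here is a minimal sketch in Python. It is not NARA's data or tooling: every identifier, property name, and value below is invented for illustration, and the two small "datasets" stand in for resources that would be published by separate organizations. Each statement is a subject, predicate, object triple, and a link from one dataset to the other lets a reader combine what both say about the same file format.

```python
from collections import defaultdict

# Illustrative only: all identifiers, property names, and values here are
# hypothetical and do not come from NARA's published dataset.
ARCHIVE_TRIPLES = [
    ("ex-archive:format/0001", "ex:label", "Example CAD Drawing"),
    ("ex-archive:format/0001", "ex:riskLevel", "High"),
    # The link: this record points at the same format in another dataset.
    ("ex-archive:format/0001", "ex:sameAs", "ex-registry:entry/42"),
]

REGISTRY_TRIPLES = [
    ("ex-registry:entry/42", "ex:label", "Example CAD Drawing (registry entry)"),
    ("ex-registry:entry/42", "ex:firstReleased", "1991"),
]


def index_by_subject(triples):
    """Group (subject, predicate, object) triples by subject."""
    index = defaultdict(dict)
    for subject, predicate, obj in triples:
        index[subject][predicate] = obj
    return index


def describe(subject, *datasets):
    """Merge what every dataset says about a subject, following sameAs links."""
    merged = {}
    pending = [subject]
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for dataset in datasets:
            for predicate, obj in dataset.get(current, {}).items():
                if predicate == "ex:sameAs":
                    pending.append(obj)  # follow the link to the other source
                else:
                    merged.setdefault(predicate, obj)  # keep the first value seen
    return merged


archive = index_by_subject(ARCHIVE_TRIPLES)
registry = index_by_subject(REGISTRY_TRIPLES)
print(describe("ex-archive:format/0001", archive, registry))
```

In this toy example the risk level comes from one source and the release year from the other; the link between the two records is what allows them to be read together.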

The Digital Preservation Framework describes best practices for the preservation of 684 file formats, some dating back to the first transfers of electronic records to NARA 50 years ago.

The data includes an assessment of each file type's risk level, along with suggestions from NARA’s award-winning Digital Preservation team for how to handle, for example, the file of a drawing made in a specific CAD (computer-aided design) software package from the early 1990s.

“If I’m a records manager, and one of the offices at my agency has records to manage before sending them to NARA in alignment with a records schedule, and if some of them are in unusual file formats not covered by the NARA Transfer Guidance, the Framework is the best way to get our recommendations for what could be done to preserve them,” explained Leslie Johnston, Director of Digital Preservation.

  The public can make use of the linked data in three ways:

  •  Download the full set of Digital Preservation Framework File Format Plans, along with the supporting documentation needed for research use of the dataset, as Linked Open Data in the RDF Turtle (ttl) format.
  •  Browse the full list of file formats to reach the Linked Open Data RDF Turtle file for a specific format’s preservation plan. Several formats are part of multiple categories.
  •  Browse the lists of formats by Record Category.


The launch of the Linked Open Dataset is the culmination of five years of research and collaboration across the agency and with peer institutions. It will be updated quarterly to keep pace with new formats and evolving digital preservation standards. Senior Digital Preservation Specialist Elizabeth England in the NARA Digital Preservation unit did the technical work to produce the Linked Data version of the Framework; the Digital Engagement team in the Office of Innovation implemented the public access environment.

"I'm proud of the diligent work by these two teams in advancing the National Archives mission of expanding public access, not only to our records but also to our research," said Debra Steidel Wall, Acting Archivist of the United States. "Their commitment to furthering transparency and digital preservation will allow for professionals around the world to improve their records management."

With this launch, NARA joins peer institutions such as the Library of Congress in providing such a resource.

“This is NARA joining the linked data community in a big way,” said Johnston, who, along with eight of her colleagues, developed the datasets over several years of research. 

For people outside of the digital preservation field, the value of the datasets may be hard to comprehend. But Johnston wants the public to understand the work it takes to ensure that files from, for example, a Presidential administration in 2022 will be accessible for decades to come.

“We are putting something out there that fits into a larger set of international resources, like Wikidata for Digital Preservation,” Johnston said. “It means you can find relevant information from all the authoritative sources and follow the path through what we, and other researchers, know about hundreds of formats and what we suggest should be done with them to maintain accessibility.”

The advantage of the linked data resource over the version of the Digital Preservation Framework previously released on GitHub is that it can be incorporated into multiple existing community resources, so interested users can explore information from other authoritative sources that maintain similar datasets.

The GitHub repository, which launched in 2019, will remain live and will be continuously updated.

With the number of born-digital records growing and the upcoming deadline for the Transition to Electronic Records (OMB M-19-21), the digital preservation field is gaining in significance and attracting an increasing number of professionals, boosting the need for documented standards and best practices.

The National Archives Digital Preservation team won a National Digital Stewardship Alliance award in 2020 for its National Archives Digital Preservation Framework, which was the basis for the launch of this Linked Open Dataset.

Contributions to the international collection of digital preservation resources amplify NARA’s goals of promoting leadership and transparency in records management and expand the reach of researchers’ work across the world.

Read more information about National Archives’ digital preservation work on