National Archives Applied Research

Applied Research Technology Products

The Applied Research Division continues to address the universal problem of preserving and providing access to electronic records by partnering with computer scientists, engineers, and information management professionals with diverse domain expertise, and by leveraging additional research. Below are some Applied Research Technology Products that can help archivists in their work.

  • iRODS - The Integrated Rule-Oriented Data System is an open source data grid that helps organize and manage large collections of distributed digital data. For more information, visit the iRODS website.

  • PERPOS - Initially designed to process presidential records, PERPOS is a suite of tools that support the Accessioning, Preservation, Arrangement, Review/Redaction, and Description of electronic records. For more information about PERPOS applications, visit the Georgia Tech Research Institute’s web site at:

  • CI-BER - CI-BER stands for CyberInfrastructure for Billions of Electronic Records, and is a project involving NARA, the National Science Foundation, the Renaissance Computing Institute (RENCI), and UNC/Chapel Hill. The CI-BER project is building a master copy of all the NARA research holdings at UNC. This master copy will be networked with distributed holdings at other key NARA sites and will serve, among other things, as a testbed for exploring visual analytics of records. Visit the CI-BER blog at:

  • Doc2Learn - A technology that allows an archivist to compare the contents of documents containing text, images, and vector graphics even if the documents are stored in different file formats.

  • File2Learn - A technology that enables the discovery of relationships among records in large collections. It has been used specifically to establish the relationships between two-dimensional engineering drawings and three-dimensional CAD models in large collections of engineering records.

  • Polyglot - Polyglot was created to provide an extensible, scalable, and quantifiable means of converting between file formats. The system is extensible in that it can easily incorporate new conversion software, scalable in that it can distribute the workload among parallel machines, and quantifiable in that it has a built-in framework for measuring information loss across conversions.

For additional information about File2Learn, Polyglot, and Doc2Learn, visit the project website, or download and try out the project software.