National Archives Applied Research

Advanced Research Projects

The Center for Advanced Systems and Technologies (NCAST) continues to address the universal problem of preserving and accessing electronic records by partnering with computer scientists, engineers, and information management professionals who bring diverse domain expertise, and by leveraging additional research. Below are some examples of NCAST partner research projects that can help archivists in their work.

  • iRODS – The Integrated Rule-Oriented Data System is an open-source data grid that helps organize and manage large collections of distributed digital data.

  • PERPOS – Initially designed to process presidential records, PERPOS is a suite of tools that support the Accessioning, Preservation, Arrangement, Review/Redaction, and Description of electronic records. For more information about PERPOS applications, visit the Georgia Tech Research Institute’s web site.

  • Transcontinental Persistent Archives Prototype (TPAP) – NARA’s TPAP is a research test bed used by NCAST research partners located in the Washington metropolitan area and across the country. The TPAP supports collaborative efforts to examine preservation and access issues for large volumes of diverse and complex data that may be stored in various locations.

  • Doc2Learn – A technology that allows an archivist to compare the contents of documents containing text, images, and vector graphics, even if the documents are stored in different file formats.

  • File2Learn – A technology that enables the discovery of relationships among records in large collections. It has been used specifically to establish the relationships between two-dimensional engineering drawings and three-dimensional CAD models in large collections of engineering records.

  • Polyglot – Polyglot was created to provide an extensible, scalable, and quantifiable means of converting between file formats. The system is extensible in that new conversion software can be incorporated easily, scalable in that it can distribute the workload among parallel machines, and quantifiable in that it has a built-in framework for measuring information loss across conversions.
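To make the Polyglot idea more concrete, the following is a minimal sketch, not Polyglot's actual implementation or API. It assumes that converters can be modeled as edges in a graph of file formats, that each converter carries an estimated information-loss score, and that loss simply adds up along a chain of conversions. The tool names, formats, loss values, and function names are all hypothetical.

```python
import heapq

# Hypothetical converter registry (an assumption for illustration only):
# (source format, target format, tool name, estimated information loss).
# Supporting a new tool means appending one entry, which is the
# "extensible" property in miniature.
CONVERTERS = [
    ("doc", "odt", "tool_a", 0.05),
    ("odt", "pdf", "tool_b", 0.10),
    ("doc", "pdf", "tool_c", 0.30),
    ("pdf", "txt", "tool_d", 0.50),
]


def build_graph(converters):
    """Index converters by source format."""
    graph = {}
    for src, dst, tool, loss in converters:
        graph.setdefault(src, []).append((dst, tool, loss))
    return graph


def lowest_loss_path(graph, source, target):
    """Dijkstra's algorithm over additive loss scores.

    Returns (total_loss, [(tool, resulting_format), ...]),
    or None when no chain of converters reaches the target.
    """
    queue = [(0.0, source, [])]
    settled = set()
    while queue:
        loss, fmt, path = heapq.heappop(queue)
        if fmt == target:
            return loss, path
        if fmt in settled:
            continue
        settled.add(fmt)
        for dst, tool, step_loss in graph.get(fmt, []):
            heapq.heappush(queue, (loss + step_loss, dst, path + [(tool, dst)]))
    return None


if __name__ == "__main__":
    graph = build_graph(CONVERTERS)
    print(lowest_loss_path(graph, "doc", "pdf"))
```

In this toy registry, the two-step route from doc to pdf through odt is preferred over the direct converter because its combined loss score is lower. A real system would need measured loss values rather than the made-up numbers used here, and additive loss is only one possible way to combine them.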

For additional information about File2Learn, Polyglot, and Doc2Learn, visit the project website or download and try out the project software.