National Archives Applied Research

Applied Research Technology Products

The Applied Research Division continues to address the universal problem of preserving and accessing electronic records by partnering with computer scientists, engineers, and information management professionals with diverse domain expertise, and by leveraging additional research. Below are some Applied Research Technology Products that can help archivists in their work.

  • iRODS – The Integrated Rule-Oriented Data System is an open-source data grid that helps organize and manage large collections of distributed digital data. For more information, visit the iRODS website.
  • PERPOS - Initially designed to process presidential records, PERPOS is a suite of tools that support the Accessioning, Preservation, Arrangement, Review/Redaction, and Description of electronic records. For more information about PERPOS applications, visit the Georgia Tech Research Institute's website.
  • CI-BER - CI-BER was a second-generation research collaboration that brought together experts in computer science, engineering, and archival science. It extended earlier work done under TPAP (Transcontinental Persistent Archives Prototype), and its goal was to further the understanding of scalable infrastructure and to provide insights into the management of scientific data in general. We examined ultra-high-scale collections and visual analytics techniques that can enhance the value of government records and lead to generalizable infrastructure and technology. Visit the CI-BER blog.
  • Doc2Learn - A technology that allows an archivist to compare the contents of documents containing text, images, and vector graphics, even if the documents are stored in different file formats.
  • Polyglot - Polyglot was created to provide an extensible, scalable, and quantifiable means of converting between file formats. The system is extensible in that it can easily incorporate new conversion software, scalable in that it can distribute workload among parallel machines, and quantifiable in that it has a built-in framework for measuring information loss across conversions.
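To make the Polyglot design goals more concrete, here is a minimal, hypothetical sketch of one way a format-conversion service could combine extensibility with quantified loss. It is not Polyglot's API or its actual loss-measurement method. Assumptions: each piece of conversion software is registered as an edge between two formats; each edge carries an invented "retention" score in (0, 1] standing in for measured information preservation; and the best chain of conversions is the one with the highest product of retention scores, found with Dijkstra's algorithm over -log(retention). The class name, converter names, format labels, and scores are all illustrative. Distributing work across parallel machines is not shown.

```python
import heapq
import math
from collections import defaultdict


class ConversionGraph:
    """Hypothetical registry of format converters (not Polyglot's actual API)."""

    def __init__(self):
        # source format -> list of (target format, converter name, retention)
        self.edges = defaultdict(list)

    def register(self, converter, source, target, retention):
        """Add one converter capability. Extending the system means adding edges.

        retention is an assumed score in (0, 1] for how much information
        survives this single conversion step.
        """
        if not 0 < retention <= 1:
            raise ValueError("retention must be in (0, 1]")
        self.edges[source].append((target, converter, retention))

    def best_path(self, source, target):
        """Return (steps, retention) for the chain that keeps the most
        information, or None if target cannot be reached from source.

        Maximizing a product of retentions is the same as minimizing the sum
        of -log(retention); those costs are non-negative, so Dijkstra applies.
        """
        frontier = [(0.0, source, [])]  # (accumulated cost, format, steps so far)
        settled = set()
        while frontier:
            cost, fmt, steps = heapq.heappop(frontier)
            if fmt == target:
                return steps, math.exp(-cost)
            if fmt in settled:
                continue
            settled.add(fmt)
            for nxt, converter, retention in self.edges[fmt]:
                if nxt not in settled:
                    step = (converter, fmt, nxt)
                    heapq.heappush(
                        frontier, (cost - math.log(retention), nxt, steps + [step])
                    )
        return None


if __name__ == "__main__":
    graph = ConversionGraph()
    # Hypothetical converters and retention scores, for illustration only.
    graph.register("converter_a", "doc", "odt", 0.95)
    graph.register("converter_b", "odt", "pdf", 0.90)
    graph.register("converter_c", "doc", "pdf", 0.70)
    # The two-step chain wins because 0.95 * 0.90 = 0.855 > 0.70.
    steps, retention = graph.best_path("doc", "pdf")
    print(steps, round(retention, 3))
```

In this sketch, adding new conversion software only requires another `register` call, and the reported retention gives a single number for the expected information loss of the chosen chain. A real system would need measured, format-specific loss metrics rather than fixed scores.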


For additional information about the project, visit the project website, or download and try out the project software: Polyglot and Doc2Learn.