National Archives Applied Research

Doc2Learn

Applied Research Technology Products: Software Downloads

Doc2Learn is a technology that lets an archivist compare the contents of documents containing text, images, and vector graphics, even when the documents are stored in different file formats. Its two main capabilities are:

  • Comparison of document content across file formats, based on:
    • Word frequencies;
    • Color frequencies in images; and
    • Frequencies of encoded vector graphics
  • Automated grouping of documents
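
To make the idea of frequency-based comparison concrete, here is a minimal Python sketch that compares two texts by their word frequencies. It is an illustration only, not Doc2Learn's actual method: the choice of cosine similarity, the function names, and the sample strings are all assumptions, and the text is assumed to have already been extracted from each file format.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word occurrences in already-extracted text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two frequency tables (0.0 to 1.0)."""
    # Counter returns 0 for words missing from freq_b.
    dot = sum(count * freq_b[word] for word, count in freq_a.items())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, as if extracted from files in different formats.
pdf_text = "Annual budget report for the records office"
doc_text = "Records office annual budget report"
other_text = "Color photograph of a mountain lake"

similar = cosine_similarity(word_frequencies(pdf_text), word_frequencies(doc_text))
different = cosine_similarity(word_frequencies(pdf_text), word_frequencies(other_text))

print(f"pdf vs. doc:   {similar:.3f}")
print(f"pdf vs. other: {different:.3f}")
```

The same pattern could in principle be applied to color histograms or counts of vector graphics primitives, and the resulting pairwise similarity scores could feed a clustering step for automated grouping. How Doc2Learn itself combines these measures is not described here.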

Use this URL to access the Doc2Learn software:
http://isda.ncsa.illinois.edu/download/index.php?project=Doc2Learn&sort=category
