Doc2Learn
Applied Research Technology Products: Software Downloads
Doc2Learn is a technology that lets an archivist compare the contents of documents containing text, images, and vector graphics, even when the documents are stored in different file formats. Its two main capabilities are:
- Compare document content across file formats:
  - Word frequencies;
  - Color frequencies in images; and
  - Frequencies of encoded vector graphics
- Automated grouping of documents
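As a rough illustration of frequency-based comparison and grouping, the sketch below compares documents by their word frequencies and then groups similar ones. It is not Doc2Learn's actual implementation: the cosine similarity measure, the greedy grouping rule, the 0.5 threshold, the function names, and the sample file names are all assumptions made for the example, and it presumes the text has already been extracted from each file format.

```python
from collections import Counter
import math
import re


def word_frequencies(text):
    """Return a Counter of lowercase word occurrences in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two frequency tables (0.0 to 1.0)."""
    dot = sum(count * freq_b[term] for term, count in freq_a.items())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(freqs, threshold=0.5):
    """Greedy grouping (an assumed rule, not Doc2Learn's): a document joins
    the first group whose first member is at least `threshold` similar,
    otherwise it starts a new group."""
    groups = []
    for name, freq in freqs.items():
        for group in groups:
            if cosine_similarity(freq, freqs[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical documents: text assumed to be already extracted per format.
docs = {
    "report.pdf": "archive of scanned maps and survey maps",
    "report.doc": "survey maps and scanned maps in the archive",
    "memo.html": "budget meeting agenda for next quarter",
}
freqs = {name: word_frequencies(text) for name, text in docs.items()}
print(group_documents(freqs))
```

With these sample inputs, the two report files land in one group and the memo in another. The same pattern could be applied to color histograms or vector graphics counts by swapping in a different frequency table for each document.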
Use this URL to access the Doc2Learn software:
http://isda.ncsa.illinois.edu/download/index.php?project=Doc2Learn&sort=category