Technical Information Paper No. 12

Digital-Imaging and Optical Digital Data Disk Storage Systems: Long-Term Access Strategies for Federal Agencies

July 1994

A Report by:

The Technology Research Staff
The National Archives at College Park
8601 Adelphi Road
College Park, Maryland 20740-6001


July 14, 1994

In recent years, Federal agencies have become increasingly interested in using digital information technologies to store large amounts of information economically and efficiently. This is particularly true of programs designed to provide Federal information to citizens, since a corresponding reduction in the creation of paper records could potentially reduce costs and improve the delivery of services to the public. However, agencies need to ensure that whatever technologies they employ to store information are capable of retrieving that information for as long as it is needed.

In 1991, the National Archives and Records Administration (NARA), in conjunction with the National Association of Government Archives and Records Administrators, conducted a study of digital imaging and optical media storage technologies at the State and local government levels. Building on the 1991 report, NARA initiated a study of Federal use of these two technologies. A project team from NARA's Technology Research Staff reviewed fifteen Federal digital imaging and optical digital data disk storage applications and interviewed a number of experts in the field. This Technical Information Paper, which makes recommendations for ensuring long-term access to digital images stored on optical digital data disks, is the result of that study.

As the keeper of the nation's memory and as the Federal government's institutional records manager, NARA has a mandate to provide records management guidance to Federal officials. At the same time, we invite archivists, records managers, information resource managers, and other information professionals to share their experiences and observations with us. Together we can develop strategies for using new technologies to store Federal information while ensuring that retention and access requirements are met.

TRUDY HUSKAMP PETERSON
Acting Archivist of the United States
National Archives and Records Administration


Preface

In 1983 the Archivist of the United States directed that the recently created Archival Research and Evaluation Staff initiate a program to monitor developments in digital imaging and optical digital data disk storage technologies. By 1985, the Staff had initiated a research agenda that included several key programs. A pilot digital-imaging and optical digital data disk project, completed in 1989, assessed the suitability of digital-imaging and optical digital data disk technologies for National Archives holdings. Another program funded a research project by the National Institute of Standards and Technology (NIST) to develop a generic testing methodology for predicting the life expectancy of write once read many (WORM) optical media. This NIST study was completed in 1990. Subsequently, an international standards group on image permanence drew upon the results and conclusions of the NIST work in developing life expectancy standards for optical digital data disk storage systems. Thus far, this standards group has developed a draft life expectancy standard for CD-ROM media, and is currently developing a similar one for rewritable optical digital data disks.

Although NIST developed and demonstrated a successful generic testing methodology, it is useful only as a general indicator of media life expectancy. This is because individual optical digital data disks may fail at times that differ by many years. Although this indicator is helpful as a general guide in media selection, it cannot indicate when to recopy data stored on a particular optical digital data disk. Consequently, in 1990 the National Archives and Records Administration (NARA) commissioned NIST to address this problem as part of a larger study on the data integrity of optical media. NIST program staff organized a working group that developed a set of procedures for monitoring and reporting results of error detection and error correction codes on optical disk drives. These error detection and error correction activities are executed automatically without any action by users. One existing problem is that most optical disk drives do not provide functionality for monitoring and reporting the error-checking results. The results of the NIST/industry working group were used as the basis for an Association for Information and Image Management (AIIM) draft standard that stipulates media error monitoring and reporting techniques for verification of the information stored on optical digital data disks. This AIIM draft standard is now being balloted. NIST's data integrity study also includes research on the care and handling of optical digital data disks. The results of these experiments will be published in a NIST report due in the fall of 1994.

Concurrently, NARA's Technology Research Staff initiated a study of the use of digital-imaging and optical media technologies in public records programs for State and local governments. Although the impetus for this study was a request from the National Association of Government Archives and Records Administrators (NAGARA), the project's descriptive plan of work stated that it was the first stage of a two-stage project. The project would culminate in an information access strategy report for Federal agencies. NARA and NAGARA published the report of the first phase, the State and local government study, in December 1991. Shortly thereafter, NARA's Technology Research Staff began research on the second phase and produced this report detailing strategies for long-term access for Federal agencies.

Drawing upon what had been learned in the State and local governments report, the project staff developed a research methodology based upon:

  • A review of the digital-imaging and optical digital data disk marketplace and technical literature (e.g., optical industry journals, special studies, and technical reports) to identify emerging trends,
  • An assessment of private industry and Federal agency system administrator experiences with digital-imaging and optical digital data disk projects, and
  • A nationwide onsite examination of 15 Federal agency digital imaging or optical digital data disk applications.

The objective of this three-part analysis was to identify critical management issues and relate them to technical trends and user experiences. This report consists of an executive summary, a list of recommendations, and an overview of the challenges involved in long-term data access, followed by five sections that describe digital image capture, indexing systems, optical digital data disk storage systems, information retrieval, and information management policy. Each of the five main report sections contains management issues, technology trends, user experiences, and recommendations.

Most of the recommendations provided in this report conform to the State and local government study, although there are several clear departures. Generally, these departures take into account a better understanding of the issues and reflect changing trends in digital information technology. Four report appendixes provide detailed descriptions of agency site visits, a summary of relevant technical standards, a glossary of technical terms, and an annotated bibliography. As an aid to readers of this report, the authors selected boldface type to identify technical terms that are subsequently defined in appendix C, "Glossary of Terms."

The National Archives staff responsible for preparing this long-term strategies report included Charles Dollar, Barry Roginski, Peter Hirtle, and Charles Obermeyer II. Barry Roginski had the main responsibility for collecting the descriptions of agency site visits and organizing the report.


Acknowledgments

NARA's Technology Research Staff would like to take this opportunity to recognize the special assistance provided by those individuals and Federal agencies whose contributions made this report possible. The following individuals generously participated in our site visits and provided editorial assistance during the preparation of the site visit reports:

Tim Allard, Minerals Management Service

Ray Buland, National Earthquake Information Center

Ann Christy and Kristin Vajs, Library of Congress

Malcolm Ewell, Social Security Administration

James F. Gegen and Linda Brooks, Bureau of Land Management

David Grooms, Patent and Trademark Office

Sharon O. Jacobs, Agency for Toxic Substances and Disease Registry

Rick Kanner, Federal Communications Commission

Jacqui Lilly, Department of State

Charles MacFarland, National Oceanic and Atmospheric Administration

Gail Martin, Department of the Army

Hunton G. Oliver, Commodity Futures Trading Commission

Major Perkins, Department of the Army

Linda Worthington, US Army Corps of Engineers

Charles Young, Environmental Protection Agency

Special thanks are also due to the following individuals for their contributions during the data collection, technical analysis, and editorial review phases of this report:

Robert M. Blatt, Telos Systems Group

Eric Chaskes, Mary Donovan, Steven Puglia, and Sandra Tilley, NARA

Paul Conway, Yale University

Marilyn Courtot, Children's Literature

Howard N. Greenhalgh and Eric E. Tolbert, Department of the Army

Richard Harrington, Virginia State Library

Anne R. Kenney, Cornell University

Basil Manns, Library of Congress

Whitney S. Minkler, MSTC

Lance W. Morgan, Science Applications International Corporation

Julie Peternick, PRC Inc.

Fernando L. Podio, National Institute of Standards and Technology

William K. Saffady, State University of New York at Albany


Section 1: Executive Summary

The National Archives and Records Administration (NARA) recently completed a study of digital-imaging and optical digital data disk storage systems in the Federal Government. This report, prepared by NARA's Technology Research Staff, discusses the findings of that study. Major areas identified are critical information management issues, technological trends, and germane user experiences. Research study elements included analysis of optical digital data disk technological developments, review of the relevant technical literature, assessment of Federal agency program management experiences with optical digital data disk systems, and site visits to 15 Federal agency optical disk projects. Report appendixes consist of descriptive summaries of site visits, a listing of technical standards, a glossary defining technical terms, and an annotated bibliography.

Potential benefits of imaging systems can best be achieved when the optical digital data disk system supports the information needs of the agency as a whole and when the technology is used to enhance service, not simply to address a single, isolated problem. Federal agency records management officers and archivists have a vital interest in optical digital data disks, but they should also be cognizant of the perils of technological obsolescence, inconsistent equipment performance specifications, incompatible new products, and a shortage of technical and administrative standards. Note that the term "long-term value" (defined by agency need) is not synonymous with "permanent," which denotes historical value and permanent retention by the National Archives. The National Archives' current policy regarding optical digital data disks as a transfer medium for information of permanent value is described in section 8, "Disposition of Original Records."

Federal agency program managers responsible for records with long-term value should find this report's recommendations useful in designing and implementing an optical digital data disk system. This report is not a comprehensive overview of every important issue regarding the long-term access to electronic records. Rather, it is intended to complement existing technical studies and other generally available literature pertaining to digital-imaging and optical digital data disk technologies. In particular, this report is not intended to provide an exhaustive analysis of issues related to database indexing and retrieval, the development of optical digital data disk standards, or compact disc read only memory (CD-ROM) technology. Compact disc recordable (CD-R), for example, has both formal and de facto standards for factors such as directory structure and physical format and offers a relative independence of the stored data from proprietary retrieval mechanisms. The scarcity of observable CD-R systems during the 15 site visits precluded extensive discussion of user experiences with CD-R technology in this report.

Federal agency officials responsible for selecting and managing optical storage systems must adopt as an overall goal maintaining access to records of long-term value stored in digital format. To achieve this objective, these officials must:

  • Ensure the quality of digital images captured through an electronic conversion process,
  • Provide for the ongoing functionality of system components,
  • Monitor and limit the deterioration of optical digital data disk storage, and
  • Anticipate and plan for further technological developments.

Long-term usability of digitally stored information, including scanned document images, digital data, and descriptive index data, will best be achieved by implementing a sound policy for migrating data to future technology generations, adhering to well-documented image file-header formats, and monitoring media degradation. System managers should create the technical and administrative infrastructure required to implement relevant information technology standards as they are developed.

Ensuring the quality of digital images means exercising continuous control over three processes: conversion of the original image to digital data; enhancement of the digital image, if necessary; and compression and/or decompression of the digital data for transmission, storage, and retrieval. Quality-control inspections, either at the document scanner workstation or as a followup task, should compare the original documents to the captured electronic images and index data.

Ensuring that information stored on optical digital data disks will continue to serve the function for which it was originally intended for as long as it is needed requires:

  • A long-term commitment to an open-systems architecture and
  • Adoption of a methodical approach to system component upgrade and data migration that guarantees the interoperability of current technologies with those yet to be developed.

Federal agency administrators may utilize either write once read many (WORM) or rewritable optical digital data disk technologies to store records of long-term value. Administrators who use rewritable optical digital data disks should ensure that read/write privileges are meticulously controlled and that an audit trail of rewrites is maintained. Unless there are specific program justifications, administrators should select the most suitably sized optical digital data disk storage format that satisfies long-term agency program needs while conforming to industry standards.

NARA has long recognized the potential benefits of optical digital data disk technology for storing and retrieving large quantities of information. Several years ago, due to the unsettled state of optical media technology and especially the absence of standards, NARA issued an optical media policy bulletin. This bulletin notified Federal agencies that NARA could not accession optical digital data disks containing records of permanent value. Permanent records stored originally on optical digital data disks required conversion to a medium acceptable to NARA at the time of transfer to NARA's legal custody. A revised bulletin addressing these concerns was recently issued by NARA.

The long-term stability of optical digital data disks requires the specification of a reliable storage/recording technology and ongoing protection of the media from damage and abuse through handling and adverse environmental conditions. Although optical digital data disks appear to be more durable and stable than the hardware and software required to maintain access, vendor claims regarding durability must be carefully examined. This examination should involve an evaluation of the optical digital data disk's manufacture data, testing methodology and procedures, and test results based upon the findings described in NIST SP-200. This evaluation should support a predicted pre-write shelf life of five years, and a post-write life expectancy of at least twenty years. Because optical digital data disks are not immune to hostile storage conditions, it is prudent to store them in a stable environment.

By themselves, digital-imaging and optical digital data disk storage systems cannot solve access problems stemming from existing inefficient manual or computerized information systems and practices. Indeed, automating inefficient processes may merely exacerbate existing deficiencies. Agency administrators who seize the opportunity to reassess office operating procedures when adopting new technologies are more likely to benefit from improved administrative productivity, enhanced user services, and operational cost savings.

Long-term access to records of enduring value stored on optical digital data disks involves more, however, than image quality, system functionality, and media stability. Adoption of digital information storage on optical digital data disks effectively binds the agency to a technological evolution that it does not control. Administrators must continue to monitor technological trends; plan for systematic maintenance, upgrade, and eventual migration to newer technologies; use existing and emerging standards; support the development of data interchange standards; and adopt prudent information preservation measures in the interim.

Agency administrators, information management officials, systems development analysts, records managers, and agency historians must work together to develop and implement policies and procedures governing the care of Federal agency records appraised as permanent by NARA.


Section 2: Listing of Recommendations

This section lists the significant technical and administrative recommendations that are described in further detail within this report. For ease of user reference, this section's basic organization conforms to the major sections and subheadings in the remainder of this report.

DIGITAL IMAGE CAPTURE

Open-Systems Architecture

Adopt an open-systems architecture for new optical digital data disk applications.

Or

Require a "bridge" to systems with nonproprietary configurations.

Digital Image System Configuration

Clearly define user and agency requirements during the imaging project's requirements analysis phase.

Verify that the imaging system has inherent flexibility and has an open, or nonproprietary, design that accepts future hardware and software upgrades.

Conversion of Original Records

Analyze the domain of documents to be scanned, identify the levels of uniformity, and consider the use of a document conversion contractor when the backfile holdings are extensive.

Implement a comprehensive records accounting and tracking process during the project's conversion phase.

Invest sufficient staff or contractor resources into document preparation to increase scanning productivity.

As appropriate, assign experienced agency staff to conversion processes that benefit from their knowledge of existing agency operations.

Digital Image Scanners

Prior to system acquisition, validate vendor claims regarding document throughput rates, image quality, and ease of operation using a representative sampling of the agency's holdings.

Follow the standard test procedures outlined in FIPS PUB 157 "Guideline for Quality Control of Image Scanners."

Scanner Resolution

Employ a scanning resolution of at least 300 dpi for office documents when future applications (e.g., OCR) for the digital images are anticipated.

Specify a higher scanning resolution (between 300 and 600 dpi or higher) as needed, for engineering drawings, maps, and documents containing significant fine line and background detail information.

Dynamic Range

Employ gray-scale or color imaging technology as needed for suitable continuous-tone images such as photographs, maps, and related records.

As appropriate, utilize 8-bit-per-pixel gray-scale imaging technology for capturing continuous-tone black-and-white photographs and/or negatives, and 24-bit mode to obtain true color rendition.
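As an illustration only, the storage consequences of the resolution and dynamic range choices above can be estimated with simple arithmetic. The following Python sketch is not drawn from the study: the function name and the 8.5 x 11 inch page size are assumptions chosen for the example, and the calculation ignores file headers, row padding, and compression.

```python
def uncompressed_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the uncompressed size, in bytes, of one scanned page.

    Assumes square pixels and ignores headers, row padding, and compression.
    """
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    total_bits = pixels_wide * pixels_high * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical example: an 8.5 x 11 inch office document.
for dpi, bits, label in [
    (300, 1, "bitonal"),
    (300, 8, "8-bit gray scale"),
    (300, 24, "24-bit color"),
    (600, 1, "bitonal"),
]:
    size = uncompressed_image_bytes(8.5, 11, dpi, bits)
    print(f"{dpi} dpi, {label}: {size:,} bytes")
```

Under these assumptions, doubling the resolution quadruples the file size, and moving from bitonal to 24-bit color multiplies it by 24, which is why higher resolution and greater dynamic range are recommended only as needed.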

Image Enhancement

Conduct scanner testing using selected documents during the system design phase to determine the need for special scanner hardware modifications.

Retain scanned unenhanced images of documents of intrinsic value.

Digital Image File Headers

Use file formats that facilitate network data transfer, such as Aldus/Microsoft TIFF Version 5.0, which meets the Internet Engineering Task Force standard definition for exchange of black and white images on the Internet.

Require use of a nonproprietary image file-header label.

Or

Require a "bridge" to a nonproprietary image file-header label.

Or

Require a detailed definition of image file-header label structure.

Data Compression Techniques

Use a lossless compression scheme when continued fidelity to the exact appearance of the original document is achievable and desired.

Use JPEG or MPEG for images with continuous tonal qualities when some loss of detail is acceptable.

For digital images without continuous tonal qualities, require standardized compression techniques, such as CCITT Group 3, CCITT Group 4, or JBIG.

If a proprietary lossless compression system is used, require that the vendor provide a means of decompressing the data to its original format.
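The compression recommendations above amount to a short decision rule, which the following Python sketch restates. It is offered as a reading aid under stated assumptions, not as a definitive policy: the function name and its two yes/no inputs are hypothetical, and the recommendations do not name a specific lossless scheme for continuous-tone images.

```python
def recommend_compression(continuous_tone, loss_acceptable):
    """Map two yes/no questions onto the compression recommendations.

    continuous_tone: True for photographs and similar images.
    loss_acceptable: True when some loss of detail can be tolerated.
    """
    if not continuous_tone:
        # Standardized techniques for images without continuous tonal qualities.
        return "CCITT Group 3, CCITT Group 4, or JBIG"
    if loss_acceptable:
        return "JPEG or MPEG"
    # The recommendations call for lossless compression here
    # but do not name a particular scheme.
    return "a lossless compression scheme"


print(recommend_compression(continuous_tone=False, loss_acceptable=False))
print(recommend_compression(continuous_tone=True, loss_acceptable=True))
print(recommend_compression(continuous_tone=True, loss_acceptable=False))
```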

Digital Image Quality Assurance

Routinely evaluate scanner performance based on quality control procedures recommended in FIPS PUB 157 "Guideline for Quality Control of Image Scanners."

Establish a consensus on what constitutes the "best" image for the different types of agency source documents; monitor ongoing image quality using the system's display screens and laser printers.

Perform a 100-percent visual quality evaluation of each scanned image and related index data; quality control inspections must be meticulous if the original documents are not retained after conversion.

Verify the information as copied when transferring images/data to any other media from the original magnetic hard drive or optical digital data disk.

If WORM disks are the storage medium of choice, permanently write the information only after conducting a thorough quality-control inspection of the scanned images and index data.
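One possible way to act on the recommendation to verify information as copied is to compare a digest of the source file with a digest of the copy. The Python sketch below assumes this approach; the report does not prescribe a verification technique, and the function names, file names, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_verified(source_path, copy_path):
    """Return True when the digests of the source and the copy match."""
    return file_digest(source_path) == file_digest(copy_path)


# Demonstration with temporary files standing in for image files.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "page_0001.img"       # hypothetical file name
    copy = Path(workdir) / "page_0001_copy.img"
    source.write_bytes(b"scanned image data")
    shutil.copyfile(source, copy)
    print(copy_is_verified(source, copy))          # True
    copy.write_bytes(b"scanned image dat\x00")     # simulate a bad transfer
    print(copy_is_verified(source, copy))          # False
```

A check of this kind complements, and does not replace, the visual quality-control inspection, since it confirms only that the copy matches the source, not that the source image is acceptable.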

INDEXING SYSTEMS

Indexing Systems

As appropriate, ensure that information retrieval software is SQL compliant.

Regardless of the capture methodology used, conduct a 100-percent quality-control inspection of all index data.

Index Database Location

Store the index data magnetically for improved operations and optically if long-term preservation is a concern.

Indexing Database Complexity

Index system design and capability decisions should be based on a thorough analysis of agency operations and user needs.

OPTICAL DIGITAL DATA DISK STORAGE SYSTEMS

Optical Digital Data Disk Recording Technologies

Either WORM or rewritable technologies may be used, with the actual selection determined by the agency's specific application requirements. Ensure that read/write privileges are carefully controlled and that an audit trail of rewrites is maintained when rewritable technology is used.
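The controls described above for rewritable media can be sketched as a small access check combined with an append-only log. This Python example is illustrative only and assumes an in-memory design; the class names, fields, and user names are hypothetical, and a production system would need durable, tamper-resistant storage for the log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RewriteRecord:
    """One immutable audit entry describing a rewrite."""
    timestamp: str
    user: str
    disk_id: str
    reason: str


class RewriteAuditTrail:
    """Permit rewrites only by authorized users and log each one."""

    def __init__(self, authorized_users):
        self._authorized = frozenset(authorized_users)
        self._records = []

    def record_rewrite(self, user, disk_id, reason):
        if user not in self._authorized:
            raise PermissionError(f"{user} may not rewrite {disk_id}")
        entry = RewriteRecord(
            timestamp=datetime.now(timezone.utc).isoformat(),
            user=user,
            disk_id=disk_id,
            reason=reason,
        )
        self._records.append(entry)
        return entry

    @property
    def records(self):
        # Return a tuple so callers cannot alter the log in place.
        return tuple(self._records)


trail = RewriteAuditTrail(authorized_users={"records_officer"})
trail.record_rewrite("records_officer", "DISK-0042", "corrected index entry")
print(len(trail.records))
```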

Optical Digital Data Disk Storage Capacity

Based on a requirements analysis and systems design study of an agency's operations, select the most suitably sized optical storage form factor that satisfies the agency's long-term programmatic needs and conforms to industry standards.
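Part of sizing a storage form factor is a rough estimate of how many disks a given backfile would occupy. The Python sketch below shows one such estimate; every figure in it (page count, bytes per page, compression ratio, and disk capacity) is a hypothetical input chosen for illustration and is not taken from the study.

```python
import math


def disks_required(page_count, bytes_per_page, compression_ratio,
                   disk_capacity_bytes):
    """Estimate the number of disks needed to hold a set of page images.

    compression_ratio is uncompressed size divided by compressed size,
    for example 10 for a 10:1 ratio. Index data and file-system overhead
    are ignored.
    """
    compressed_total = page_count * bytes_per_page / compression_ratio
    return math.ceil(compressed_total / disk_capacity_bytes)


# Hypothetical backfile: one million bitonal pages of 1,051,875 bytes each,
# compressed 10:1, stored on disks holding 650,000,000 bytes.
print(disks_required(1_000_000, 1_051_875, 10, 650_000_000))  # 162
```

Because actual compression ratios vary widely with document content, an estimate of this kind is best checked against a representative sample of the agency's holdings.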

Jukebox Storage Systems

When selecting an optical digital data disk jukebox, consider the following factors: the overall information access needs (staff and public), budget and procurement considerations, and existing operations staff requirements.

Error Detection and Correction

Require that equipment conform to the proposed national standard ANSI/AIIM MS59-199X, "Use of Media Error Monitoring and Reporting Techniques for Verification of the Information Stored on Optical Digital Data Disks."

Small Computer Systems Interface

Specify the SCSI "Write and Verify" command when writing data to optical digital data disks.

Require system manufacturers and integrators to provide complete documentation on the specific configuration of the SCSI (or other interface) hardware and software.

Backward Compatibility of Optical Systems

Require upgrades or replacement systems to be backward compatible with existing information systems.

Or

Convert the existing digital information to the new format at the time of system upgrade or acquisition.

Optical Digital Data Disk Longevity

In addition to conducting a careful analysis of each manufacturer's media life expectancy testing methodologies and procedures, require the use of optical digital data disks with a pre-write shelf life of at least five years.

Require a minimum post-write life of twenty years based upon manufacturer optical digital data disk life expectancy tests that conform to the findings of NIST SP-200.

Optical Digital Data Disk Substrates

Optical digital data disk substrates of polycarbonate or tempered optical glass are acceptable.

Optical Digital Data Disk Storage Environments

Optical digital data disks should be stored in areas with stable room temperatures and with relative humidity ranges consistent with the storage of magnetic tape media. Avoid storage areas with excessive humidity and high temperature, and do not subject optical digital data disks to rapid temperature extremes.

If possible, do not operate systems or store optical digital data disks in environments with excessive airborne particulate matter.

Cleaning procedures for optical digital data disks must be in strict conformance with the media manufacturer's recommendations.

INFORMATION RETRIEVAL

Conduct a comprehensive requirements analysis of end users' information-access needs and a systems design study prior to procuring imaging system components.

INFORMATION MANAGEMENT POLICY

Cost-Effectiveness

In order to maximize the long-term viability of systems, develop digital-imaging and optical digital data disk applications in a cost-effective manner.

Where possible, link system design to Government improvement initiatives such as the National Performance Review.

Reexamine existing paper records systems prior to conversion to optical digital data disk systems to maximize productivity and improve delivery of information services.

Disposition of Original Records

Conform to NARA policy regarding the disposition of original records when converting to an optical digital data disk system.

Legal Admissibility

Become familiar with how the rules of evidence apply to Federal records, and ensure that procedural controls that protect their integrity are in place and adhered to.

Implement the recommendations provided in AIIM TR31, Parts I and II, applicable to agency projects using digital-imaging and optical digital data disk storage technologies, either in the conversion of paper documents to digital form or their initial creation in digital form.

Long-term Access

Develop an agency-wide data migration and disaster recovery plan for the digital-imaging and optical digital data disk storage system well in advance of any migration or disaster.

Vendor Instability

In the event that warning signs of impending obsolescence appear, managers should make immediate plans to migrate the application to a new system.

Require vendors to deposit a copy of the computer system's application software codes and associated documentation with a bank, archives, or secure records facility in case of a business failure.

Media Life Expectancy/Data Transfer and Backups

Recopy data stored on optical digital data disks based on the information obtained through periodic verification of media degradation.

Create a backup copy of the information stored on optical digital data disks for retention in an offsite facility, using the appropriate storage media (optical, magnetic, paper, or microfilm) that best satisfies agency requirements.
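The recommendation to recopy on the basis of periodic verification implies some decision rule applied to the error measurements collected at each check. The Python sketch below assumes one simple rule: recopy when the latest measured error rate reaches a threshold, or when a straight-line projection from the two most recent checks would reach it by the next check. The function name, the threshold, and the projection method are all assumptions; the report and the proposed ANSI/AIIM standard do not specify them here.

```python
def should_recopy(error_rates, recopy_threshold, checks_ahead=1):
    """Decide whether a disk should be recopied.

    error_rates: measurements from periodic verification, oldest first.
    recopy_threshold: hypothetical error rate at which recopying is required.
    checks_ahead: how many future checks the projection should cover.
    """
    if not error_rates:
        raise ValueError("at least one measurement is required")
    latest = error_rates[-1]
    if latest >= recopy_threshold:
        return True
    if len(error_rates) < 2:
        return False
    # Straight-line projection from the two most recent checks.
    growth_per_check = latest - error_rates[-2]
    projected = latest + growth_per_check * checks_ahead
    return projected >= recopy_threshold


# Hypothetical readings against a hypothetical threshold of 1e-4.
print(should_recopy([1e-6, 2e-6], 1e-4))   # False: slow growth, far below threshold
print(should_recopy([1e-5, 6e-5], 1e-4))   # True: projected to cross before next check
print(should_recopy([2e-4], 1e-4))         # True: already over the threshold
```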

System Obsolescence

Specify that the vendor provide a complete set of documentation, including source code with flow diagrams, object code, and operations and maintenance manuals as a contract deliverable.

Periodically review and revise system documentation to ensure that all subsequent system modifications and enhancements are adequately described.

Migration Strategies

Upgrade equipment as technology evolves, and periodically recopy optical digital data disks as required.

Or

Recopy optical digital data disks based upon periodic verification.

Or

Transfer data from a nearly obsolete generation of optical digital data disks to a newly emerging generation, in some cases bypassing the intermediate generation that is mature but at risk of becoming obsolete.

Information Technology Standards

Regularly monitor trends in the technological environment that conform to open-systems standards.

Specify existing and emerging nonproprietary technology standards in system design. Where possible, system components should conform to nonproprietary or commonly accepted practices.

Evaluate possible data degradation of information stored on optical digital data disks and system functionality on a regular basis using media error monitoring and reporting tools outlined in proposed and evolving standards such as ANSI/AIIM MS59-199X.

Support the ongoing development of nonproprietary standards for data exchange and interoperability.


Section 3: The Challenge of Long-term Access

Digital-imaging and optical digital data disk storage technologies have been available in the commercial marketplace for more than a decade. Digital imaging typically involves converting existing paper documents (e.g., forms, reports, maps, drawings, correspondence), photographs, or microforms into an electronic representation for computerized storage and retrieval. Due to the vast data storage capacities they offer, optical digital data disks are often integral components of digital imaging systems. The linkage of digital imaging, which generates sizeable electronic files, to the superior storage capacity of optical digital data disks has made both technologies increasingly attractive to those seeking improved staff productivity and enhanced user services.

Given vendor claims about the cost-effectiveness of digital-imaging and optical storage technologies, it is not surprising that interest in these two technologies is mushrooming. Newsletters, journals, and periodicals relating to these technologies regularly describe an increasing number of digital-imaging and optical digital data disk applications in health care services, banks, insurance companies, pharmaceutical companies, and universities, to name only a few. Two factors are driving the use of digital-imaging and optical digital data disk storage technologies: the availability of an increasing variety of devices and decreasing costs. Consequently, the number of digital-imaging and optical digital data disk applications being planned or actually implemented at all levels of government (Federal, State, and local) continues to grow. A recent review of major Federal Government digital-imaging and optical media applications highlights the enormous financial investment committed to these two technologies.

Many Federal agency program managers consider digital-imaging and optical digital data disk storage technologies to be essential tools in providing improved and more cost-effective services. However, there is another group of governmental administrators whose chief concern is the impact of these two technologies on the disposition of Federal records. Records managers, who implement the approved disposition of Federal records, and archivists, who protect and service valuable permanent records, have a vital interest in the viability of digital-imaging and optical digital data disk storage systems. Many records managers are concerned about the legal admissibility of Federal records in a court of law. Both records managers and archivists are concerned with storage media longevity and information system obsolescence.

This report addresses these as well as other issues relevant to the administration of optical digital data disk information systems. The report also identifies issues that the digital-imaging and optical digital data disk industries must address and resolve if these technologies are to prove viable for applications requiring long-term access. In this regard, it is hoped that the guidance provided herein helps initiate a much-needed vendor and user impetus to ensure that archival information access considerations are taken into account during an information system's requirements analysis, system design, engineering development, and integration phases.

For purposes of this report, optical digital data disk systems are defined as an amalgam of technological processes that includes, at a minimum, digital imaging or storage of digital data on optical digital data disks or similar optical media. Based on a review of central management issues, technical trends, and user experiences, the report covers a range of topics that are unique to these technologies. Technology trends have been identified through market research and a diligent review of technical specifications, product literature, and analytical reports published in a wide range of journals and periodicals. In order to enrich the study and to provide a suitable environment in which to "test" key concepts and ideas, 15 Federal Government optical digital data disk applications were surveyed. Although these 15 applications are not fully representative of all optical storage systems currently in operation, they do provide important background against which to view the implementation of digital-imaging and optical digital data disk information storage systems within the Federal Government.

Government officials responsible for selecting digital-imaging and optical digital data disk storage systems must address several critical factors. They must, of course, ensure that the digital-imaging and optical digital data disk storage system they select meets their agency's immediate needs in a cost-efficient manner. They must also ensure that records scheduled as permanent are retained in their original format following conversion or are converted to a medium acceptable to NARA at the time of transfer to the National Archives.

The decision to acquire or build an image record system using optical digital data disks with specific system capabilities should be based on a thorough analysis of the agency's immediate and long-term information processing requirements. At the same time, the adoption of digital-imaging and optical digital data disk technologies is not without perils, due to the many challenges resulting from rapid technological evolution. For example:

  • New optical storage products, many incompatible with each other, are constantly emerging;
  • The paucity of technical and administrative standards limits the development of objective criteria for selecting equipment;
  • The rapid adoption of optical technology by Government agencies increases the pressure to act;
  • A proliferation of vendors at the National and local levels, with competing claims, may complicate the setting of performance specifications; and
  • The longevity (market life) of digital-imaging technology products and the vendor community providing systems and technical services is volatile.

Of course, these "pitfalls" are manifestations of a more pervasive problem of technological obsolescence. Far too often the implications of these hazards to long-term access, linked to agency accountability for its programs, are not sufficiently taken into account. Long-term access to, or usability of, digital-image, or character-based, data must be viewed distinctly from the medium on which the information is stored. This distinction allows for a continuing commitment to information in digitized format, while simultaneously recognizing that the media storing that data must eventually be replaced due to inevitable obsolescence.

Establishing a commitment to maintaining the long-term usability of digitized information, as opposed to the media itself, is not enough. Maintaining this long-term commitment to use digitally stored information also requires continuous:

  • Data readability
  • Data retrievability
  • Data intelligibility

Data readability refers to the ability to process the information on a computer system or device other than the one that initially created the digital information or on which it is currently stored. Typically, nonreadability involves some aspect of an older storage device (a tape or disk) that makes it physically incompatible with existing equipment. This "hardware obsolescence" occurs when storage devices and media used today become incompatible with those developed in the future. For instance, the 556 bits per inch (bpi) magnetic storage tapes commonly used in the 1960's cannot be read by current tape drives. Of course, as long as a magnetic tape or hard disk drive continues to function properly and repair parts are available, its usable life can be extended for a time. Similarly, the lifespan of optical digital data disks can be extended through proper storage and maintenance practices.

Data retrievability, which assumes readability as just defined, means that identifiable records or parts of records can be selected and accessed. Accurate retrieval requires keys, or pointers, that link the logical structure of records (i.e., data fields, text strings, directories, and indexes) to the physical storage locations of the data on a disk. The optical digital data disk logical structure may have little relationship to the media and format involved. Usually, this linkage information is found in a file header, or label. The label may include information required to locate the beginning of a file, to indicate the number of bytes each record contains and where these bytes are physically located, and to distinguish among the various informational units or fields that form records. Typically, the interpretation of the record's logical structure is a function of the computer's operating system (e.g., MS-DOS). Ensuring the long-term retrievability of records therefore requires provision for the continued functionality of the original operating systems or device drivers, because these, too, are likely to become obsolete given enough time.
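To make the role of a file header, or label, more concrete, the following is a minimal Python sketch. The label layout (a 4-byte marker, a record count, a record length, and a data offset) and the names LABEL_FORMAT, build_file, and read_record are hypothetical and chosen only for illustration; they do not describe the label used by any system surveyed for this report.

```python
import struct

# Hypothetical label layout (an assumption for illustration only):
# 4-byte marker, record count, bytes per record, offset of first record.
LABEL_FORMAT = ">4sIII"
LABEL_SIZE = struct.calcsize(LABEL_FORMAT)  # 16 bytes with this layout


def build_file(records, record_length):
    """Return bytes for a label followed by fixed-length records."""
    body = b"".join(r.ljust(record_length)[:record_length] for r in records)
    label = struct.pack(LABEL_FORMAT, b"LBL1", len(records), record_length, LABEL_SIZE)
    return label + body


def read_record(data, index):
    """Use the label to locate and return one record by its position."""
    marker, count, record_length, offset = struct.unpack_from(LABEL_FORMAT, data, 0)
    if marker != b"LBL1":
        raise ValueError("unrecognized label; record locations cannot be interpreted")
    if not 0 <= index < count:
        raise IndexError("record index out of range")
    start = offset + index * record_length
    return data[start:start + record_length].rstrip()


data = build_file([b"DEED 0001", b"DEED 0002", b"DEED 0003"], record_length=12)
print(read_record(data, 1))
```

In this sketch, a reader that does not know the label layout can still read every byte of the file but cannot tell where one record ends and the next begins, which is the distinction between readability and retrievability drawn above.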

Data intelligibility means that the information a computer retrieves is comprehensible to another computer system or a human viewer. Intelligibility may occur at three levels. At its simplest level, intelligibility occurs when two computer systems either use or understand the same digital representation of the information, and this representation is translated into a form that humans recognize and understand. A prime example of an understandable form is an American Standard Code for Information Interchange (ASCII) text file. The second level occurs when two computer systems can use or understand the same representation of the information (e.g., ASCII), but when the representation is presented to users, it does not carry sufficient information (e.g., it is not self-referential) for a human to comprehend. Usually, this problem is associated with both coded and numeric data, and the intelligibility of such information can only be assured through documentation defining the values represented by the numbers and codes. The third level occurs when two different software applications, functioning in different computing environments, can process the same digital data and achieve identical results. One example of this is a text document embedded in one word-processing system that can be processed by a totally different word-processing system with no loss of information or page formatting details such as type fonts and line spacing. A lack of intelligibility becomes particularly evident when a proprietary encryption scheme is encountered or when digital images are compressed based on a proprietary technique.
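The second level of intelligibility can be illustrated with a short Python sketch. The record, the field names, and the code values below are hypothetical; the point is only that coded data become comprehensible to a human reader when the documentation defining the codes travels with the data.

```python
# Hypothetical codebook (documentation) for two coded fields.
CODEBOOK = {
    "status": {"1": "active", "2": "closed"},
    "region": {"04": "Southwest", "07": "Northeast"},
}


def decode(record, codebook):
    """Translate coded field values using the documentation; keep unknown values as-is."""
    return {
        field: codebook.get(field, {}).get(value, value)
        for field, value in record.items()
    }


# A hypothetical coded record: readable as ASCII, but not self-explanatory.
raw = {"status": "2", "region": "04", "case": "A-1138"}
print(decode(raw, CODEBOOK))
```

Without the codebook, the values "2" and "04" can be read and retrieved but not understood.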

Factors such as proprietary file-header labels, data compression techniques, and software obsolescence are major barriers to achieving intelligibility over time. Nonproprietary standards can begin to address and resolve some of these concerns. In spite of the absence of information technology standards in some areas that could help ensure long-term records accessibility, system managers should create the technical and administrative infrastructure required to implement relevant information technology standards as they are developed.

An additional layer of complexity may arise with the way some optical digital data disk storage and retrieval systems write index pointer information. Furthermore, the searching and retrieval software associated with a particular application system usually requires a specific operating system platform, such as MS-DOS. Typically, a retrieval software application will add other pointers to the logical structure of the records. The retrievability of these records is therefore inextricably linked to the software application. Unless built-in data migration paths are established or newer software generations are installed that offer backward compatibility, accessing the records will be impossible.

Long-term access to digitally stored information, including scanned document images and descriptive index data, can be assured through migration to future technology generations. In order to make this migration possible, applications must follow certain practices in the initial digital image capture and its storage on optical digital data disk storage systems. In addition, the indexing and retrieval systems in place must still be useful in new technological environments. A clear data migration strategy must be established, and an information management policy with comprehensive administrative procedures for both the data and its migration is also necessary. This report addresses each of these critical areas in turn.

In particular, it is hoped that the report's findings will assist program managers, archivists, and records managers to maintain the long-term usability of digitally stored information, including scanned document and microform images, electronic data files, and accompanying character index databases. The likelihood of long-term readability, retrievability, and intelligibility will be increased if programs:

  • Ensure the quality of digital images captured through an electronic conversion process,
  • Provide for the continuing functionality of system hardware and software components over time,
  • Monitor for any potential data degradation of optical digital data disks, and
  • Anticipate technological developments and plan accordingly.
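As one illustration of routine monitoring, the Python sketch below compares stored files against digests recorded at the time of capture. This is a file-level check and is offered only as an assumption about how an agency might supplement its procedures; it is not the media-level error monitoring described in standards such as ANSI/AIIM MS59, and the file names and contents are hypothetical.

```python
import hashlib


def digest(data):
    """Return a SHA-256 hex digest for a block of bytes."""
    return hashlib.sha256(data).hexdigest()


def find_changed(manifest, current):
    """Return names that are missing or whose contents no longer match the manifest."""
    return sorted(
        name for name, recorded in manifest.items()
        if name not in current or digest(current[name]) != recorded
    )


# Hypothetical image files held in memory for the example.
original = {"img0001": b"\x00\xff\x00\xff", "img0002": b"\x10\x20\x30\x40"}
manifest = {name: digest(data) for name, data in original.items()}

# Simulate a later inspection in which one byte of one file has changed.
later = dict(original)
later["img0002"] = b"\x10\x20\x31\x40"
print(find_changed(manifest, later))
```

A check of this kind detects a change only after it has occurred; it does not measure the rising error rates that media error monitoring tools are intended to report before data are lost.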

This report assumes that Federal agencies have decided to implement an optical digital data disk application following approved requirements analysis, system design, and cost-benefit study processes and in accordance with NARA regulations covering records scheduling and the transfer of permanent records to the National Archives (see section 8, "Disposition of Original Records"). Other readily available information sources, some of which are listed in the bibliography, examine and review approaches to the technology decision-making process. This report does not discuss computer skills and technical expertise required of agency staff who implement and operate an optical digital data disk application.

The National Archives recognizes the potential benefits of optical digital data disk technology for storing and retrieving large quantities of digital information. Several years ago, due to several factors that included the unsettled and rapidly evolving state of the optical media technology marketplace and the absence of formally adopted industry standards, the National Archives notified Federal agencies that optical digital data disks containing records of permanent value could not be accessioned. Under the terms of this notification, permanent records stored on optical digital data disks must be converted to a medium acceptable to NARA at the time of transfer to NARA's legal custody. Unscheduled records converted to an optical medium must be retained in their original format pending scheduling or, if they are later determined to be permanent, must be converted to a medium acceptable to NARA at the time of transfer. Federal agency records cannot be disposed of without the authorization of the Archivist of the United States.

NARA continues to monitor developments in optical digital data disk technology, and has recently issued policy bulletins describing the accessioning of optical digital data disks. Federal agency administrators are encouraged to inform the National Archives of significant applications for implementing optical digital data disk technology.

The recommendations cited in this strategies report are not intended to set standards for system development or to determine the procurement of optical digital data disk application systems, nor should they be viewed as de facto archival standards. Rather, they are offered as reasoned conclusions that prudent managers of programs involving records of long-term value may find useful in designing and implementing optical digital data disk information storage systems.


Section 4: Digital Image Capture

Transforming paper documents into digitally stored electronic images offers Federal agencies several immediate benefits including greatly reduced record handling costs, improved operational efficiency, and increased information-processing effectiveness in the workplace. Since original source documents can deteriorate (or even disappear) when used for reference, in many cases it is prudent to convert them to another form or medium that can provide equal or greater utility without harming the originals. Historically, micrographics has been by far the most popular medium on which to transfer reference copies of original documents. Digital imaging technology has demonstrated an ability to convert and store electronic images of documents on optical digital data disks and automatically retrieve the information to a display screen or printer for reference. The majority of the 15 Federal agencies visited for this report utilize digital-imaging technology for scanning documents. The popularity of digital imaging is understandable, considering the sheer volume of Federal agency information that already exists in paper form. Traditional paper-based records storage-and-retrieval systems are often labor-intensive, time-consuming processes. Alternatively, a digital conversion of Federal Government records involves complex issues such as holdings conversion, file-header schemes, data compression, and quality control.

Management Issues

A fundamental issue facing agency administrators is the tradeoff between the costs of integrating digital imaging systems and the benefits that accrue to system users. Large-scale digital-imaging projects require an ongoing agency commitment to highly specialized equipment, a technology-oriented staff, and suitably equipped conversion facilities. Significant conversion project cost items may include identifying and preparing the materials to be converted, image capture, indexing, quality inspections, and refiling of records. Sophisticated indexing at the image-item level requires a considerable investment in human resources. Imaging offers many potential user benefits, including multiple simultaneous access to images, rapid and accurate retrievals, economical communication of image data over great distances, high-resolution image display on a desktop terminal, and laser-printed output.

One possible conversion alternative is the use of an imaging service bureau for a one-time backfile conversion. This approach eliminates the need to establish an in-house equipment and operations staff capability for a one-time endeavor. Since Federal agency personnel possess a greater working knowledge of the agency's records holdings, they function more effectively as document preparation staff, quality-control inspectors, or as monitors of the contractor's performance. Under these circumstances, contractor staff are often assigned to the more repetitive tasks of scanning and data entry. Conversion contracts are facilitated when the universe of documents is relatively static, such as a set of existing historic land deeds, and where the conversion is conducted in-house or at a preestablished records holding site. Agency control over the conversion site offers several advantages including reduced records transportation costs, improved document security, and simplification of administrative records tracking.

Additional costs may be incurred when attempting to increase the technical image quality at each stage of the conversion process. Several approaches to improved image quality are available, depending on the characteristics of the original documents. For example, selecting a higher scan resolution often improves image sharpness. Alternatively, specifying a lower scan resolution combined with computed gray scale can also improve image appearance. These benefits may be offset by larger digital-image file sizes and, quite possibly, slightly longer wait times during image retrieval. Higher quality images are more likely to be of increased value in the future, extending the effective life of the digital files and postponing the need to rescan (assuming the original documents are retained). Ensuring that the best possible representation of each original document is captured and preserved can increase the agency management's confidence level in the imaging system.
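The effect of scan resolution and gray scale on file size can be estimated with simple arithmetic. The Python sketch below computes uncompressed raster sizes for a letter-size page; the page size, resolutions, and bit depths are illustrative assumptions rather than settings reported by the agencies surveyed, and actual stored sizes depend on the compression applied.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size in bytes for a page scanned at a given resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Letter-size page (8.5 x 11 inches); the settings are illustrative assumptions.
for dpi, bits in [(200, 1), (300, 1), (200, 8)]:
    size = uncompressed_bytes(8.5, 11, dpi, bits)
    print(f"{dpi} dpi, {bits} bit(s) per pixel: {size:,} bytes")
```

Under these assumptions, moving from 200 to 300 dots per inch multiplies the uncompressed size by 2.25, and moving from 1 bit to 8 bits per pixel at the same resolution multiplies it by 8.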

Currently, there are no objective empirical indicators of acceptable image quality for digitally scanned images. The Association for Information and Image Management (AIIM) is supporting work to develop source-document-independent approaches for evaluating the quality of images captured in document conversions. One possible approach is to categorize documents based upon types of perceived scanning problems. An impartial agency review committee could compare the original documents to the scanned digital images and arrive at a consensus on capturing the "best" image for each document category. Regardless of the quality standard selected, adherence to quality-assurance procedures is a management responsibility. This critical element should not be left solely to the discretion of the conversion operations staff.

Several issues should be addressed during the program's system planning and budgeting phases: initial system cost, maintenance requirements, and unforeseen demands placed on the system due to creative use. Initial system costs include factors such as system capability, performance requirements, and equipment and applications software configurations. Maintenance requirements include compliance with system revisions, updates, and component replacements needed for continuing system operations. As part of system maintenance, specific system benchmarks should also be established and continually reviewed to assure that the installed system is meeting agency requirements. Lastly, unforeseen creative use of the system will increase demands for additional user reference services. These added demands will require highly qualified agency or vendor staff to perform services such as integrating additional system components, upgrading the data base management system (DBMS), and enhancing the performance of the imaging system's data communications network.

System Integration

Technical Trends

The system development phase of a digital-imaging or optical digital data disk project usually includes the identification, collection, and organization of the Federal agency's materials to be processed. Image data stored on optical digital data disks is frequently created by a retrospective conversion of all existing information in an agency's files or, conversely, adoption of a "today-forward" conversion concept, wherein only the most recent or active records are converted. Additionally, there are system applications geared toward Federal agency records that never originally existed in paper format and do not require a document-scanning conversion.

A digitally scanned raster image is essentially an electronic "photograph" of a document, divided into a "grid" composed of thousands of minuscule picture elements, or pixels. The brightness value for each pixel is converted to a digital representation. Unlike alphanumeric data, raster images consist of binary 1's and 0's that in themselves carry no intelligence and therefore cannot be queried in terms of what information the image represents. It is for this reason that comprehensive, accurate indexing of digital images is mandatory for efficient user retrieval access. Properly indexed electronic digital images can be displayed on high-resolution display screens, transmitted to remote user sites, or distributed as hard copy.
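The Python sketch below illustrates the idea of a raster grid on a very small scale. The brightness values, the threshold of 128, and the index entry are hypothetical; a real scanner produces millions of pixels per page.

```python
# A tiny 4 x 6 grid of hypothetical brightness values (0 = black, 255 = white).
brightness = [
    [250, 248,  30,  28, 251, 249],
    [252,  25, 247, 246,  27, 250],
    [249,  26,  29,  31,  24, 248],
    [251,  22, 250, 249,  23, 252],
]


def to_bitonal(grid, threshold=128):
    """Map each pixel to 1 (dark) or 0 (light) by comparing it with a threshold."""
    return [[1 if value < threshold else 0 for value in row] for row in grid]


bits = to_bitonal(brightness)
for row in bits:
    print("".join("#" if b else "." for b in row))

# The pixels alone cannot answer "which document is this?"; a separate,
# hypothetical index entry supplies the searchable description.
index_entry = {"image_id": "img0001", "title": "Sample form", "date": "1994-07-14"}
```

The printed pattern may be recognizable to a person, but the 1's and 0's themselves hold no searchable text, which is why the index entry must be created and maintained separately.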

The digital-imaging and optical digital data disk storage marketplace will continue to introduce new products that vary in configuration, capability, and cost. This process reflects the diversity of user information-access needs. The pioneering digital-imaging projects were often research pilot programs or stand-alone operations with no direct links to existing information systems. This concept is undergoing a fundamental change, as today's agency administrators and end users alike are more technically sophisticated and expect tangible results. In response, digital-imaging and optical digital data disk storage systems are increasingly assuming higher profile roles in agencies, serving as catalysts for organizational changes.

Federal agency management must recognize the potential impact that imaging technology can impose on workflow processes, including agency forms design and administration, personnel management and supervision, paperwork and records management, and agency-patron relations. Introducing an imaging system into the workplace will not in itself necessarily provide an immediate solution to inherent operational shortcomings unless significant effort is expended for document control and indexing. Maximum benefits are obtained when existing workflow processes and operational procedures are adapted to the new technology.

User Experiences

The 15 Federal agency systems visited for this report were integrated using one of several approaches, and many of the systems utilize components obtained from diverse equipment manufacturers, rather than relying on one single product source. Approximately one-third of the systems, including the Department of the Army's Personnel Electronic Records Management System (PERMS) and the Bureau of Land Management's imaging projects, were obtained from a large corporate integrator or vendor. Another third, including the Minerals Management Service and the Commodity Futures Trading Commission projects, were obtained from somewhat smaller, more localized vendors responsible for designing and assembling the components obtained from a variety of manufacturers. The final one-third of the systems, including imaging systems at the Patent and Trademark Office and the imaging program at the Agency for Toxic Substances and Disease Registry, were designed, developed, and integrated through a collaborative effort between the integration company personnel and Federal agency staff.

Open-Systems Architecture

Technical Trends

Digital-imaging systems that feature proprietary hardware and software components may have limited ability to accept components supplied by alternative manufacturers. As the imaging technology marketplace continues to evolve, an increasing emphasis is placed on "open" systems that incorporate a multivendor environment. Open-systems architecture is defined for the purposes of this report as a systems design that:

  • Permits component upgrades with negligible degradation to system functions
  • Allows the system to be upgraded over time without a significant risk of information loss
  • Supports the importing and exporting of digital data.

The emphasis of the user community and digital-imaging industry is shifting toward system architectures with inherent operational and configuration flexibility. An open-systems environment supports the integration of standardized system components, while meeting unique user needs. One of the key factors in achieving true open systems is the development, acceptance, and widespread adoption of nonproprietary standards.

User Experiences

The majority of the 15 Federal agency sites visited have some proprietary components or processes. These elements include unique file headers, image compression and data transmission processes, hardware components, applications software, and the overall operating system configuration. Federal agency systems administrators are cognizant of the need to move toward open systems and are taking positive steps in that direction. Many of the system administrators interviewed noted that although their existing systems contain proprietary components, long-range agency objectives are to eventually move toward an open-system concept. These plans involve adopting industry-wide standards and integrating off-the-shelf hardware and operating systems software. Agency administrators noted that these steps are expected to increase the existing optical digital data disk storage system's interoperability. That is, they expect immediate benefits of improved data-sharing and communication linkages with other information systems both within and outside the immediate agency application, while recognizing that in some cases a total system replacement may be required.

Recommendations

Adopt an open-systems architecture for new optical digital data disk applications, or require a "bridge" to systems with nonproprietary configurations.

Digital Image System Configuration

Technical Trends

Image system selection should only be attempted after conducting a detailed analysis of existing and planned agency information requirements. Depending on the quantity of records and user access requirements, installing an off-the-shelf imaging system is the least complicated and least costly approach. Turnkey imaging systems provide generic capabilities, accepting minor hardware and software refinements to better meet the user's unique needs. If proprietary components preclude the ability to accept component reconfiguration to meet organizational requirements, agencies may need to acquire specially engineered systems. Optimally, an agency would adopt a total-systems approach to records management that provides practical solutions to support the agency's information-processing applications.

Digital-imaging technology can assume many different configurations, provided that the image-capture system has the following basic hardware components:

  • Document scanner/digital image capture equipment,
  • High-resolution display monitor,
  • Personal computer (PC) system platform,
  • Temporary file storage devices, and
  • Laser printer.

Depending on agency requirements, the final system configuration may include optional components such as file servers, indexing and image quality-inspection workstations, high-speed document scanners, microform scanners, and local area networks (LAN). Figure 1 illustrates the basic elements of a digital image-capture subsystem.

Figure 1 Digital Image Capture Subsystem

Computer software links the system modules, while retaining the flexibility needed to meet unique user needs. Digital-imaging systems implemented at the beginning of the technology's growth curve often required extensive one-of-a-kind software development that resulted in incompatible configurations. Efforts to modify or upgrade specialized systems were difficult and time consuming, frequently requiring assistance from the vendor's original software engineering team. This situation has improved over time as more software applications were developed for the following areas:

Data Communications: Communications software facilitates the movement of data within the system hardware, among systems linked on a local area network, and across channels linking systems separated geographically, even worldwide.

Database Management: Indexing and database software provide the foundation for retrieval by controlling the nature and structure of information recorded about each image or group of images, organizing this information in ways meaningful to the user, linking images into documents and files, and allowing the user to identify and display images on demand.

Image Enhancement: Image-enhancement software allows the manipulation of image characteristics to improve legibility, clean up images, and reduce file sizes.

Display System Management: Workstation image-display software allows image control for zooming, rotation, scrolling, and multiple image-display screen formats.

Workflow Management: System workflow software controls all phases of document image capture, tracking, routing, indexing, retrieval, and printing.

Optical Character Recognition: Optical character recognition software can interpret and convert raster-image data into machine-manipulable textual data.
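Of the software areas listed above, database management is the one on which retrieval most directly depends. The Python sketch below, using the standard sqlite3 module, shows one possible way an index might link images into documents and record where each image is stored. The table names, column names, disk label, and location values are hypothetical and are not drawn from any system surveyed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT, doc_date TEXT);
    CREATE TABLE images (
        image_id INTEGER PRIMARY KEY,
        doc_id INTEGER REFERENCES documents(doc_id),
        page_no INTEGER,
        disk_label TEXT,      -- which optical disk holds the image
        location TEXT         -- hypothetical address of the image on that disk
    );
""")
conn.execute("INSERT INTO documents VALUES (1, 'Land deed 0042', '1921-03-05')")
conn.executemany(
    "INSERT INTO images (doc_id, page_no, disk_label, location) VALUES (?, ?, ?, ?)",
    [(1, 1, "DISK-A", "0001"), (1, 2, "DISK-A", "0002")],
)


def pages_for(title_fragment):
    """Return (page_no, disk_label, location) rows for documents matching the title."""
    return conn.execute(
        "SELECT i.page_no, i.disk_label, i.location "
        "FROM images i JOIN documents d ON d.doc_id = i.doc_id "
        "WHERE d.title LIKE ? ORDER BY i.page_no",
        (f"%{title_fragment}%",),
    ).fetchall()


print(pages_for("deed"))
```

Because the index is held in ordinary tables rather than inside the retrieval application, a structure of this kind could in principle be exported to another system, which bears on the migration concerns raised in section 3.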

User Experiences

Several significant similarities were noted within the 15 Federal agency imaging systems visited, although the applications varied considerably. For example, more than half of the agencies use desktop-type scanners for converting paper records, while approximately one-third of the sites adopted high-speed document scanners to more efficiently process a larger daily volume of records. Most of the systems visited utilize PC platforms, file servers, and local area network communications. Almost half of the systems surveyed were connected to (or shared index information with) the agency's existing mainframe or minicomputer systems, and mainframe computer connections for others are in the planning and development stages.

A majority of the digital-imaging systems visited use workstations equipped with high-resolution display monitors. The remaining sites store electronic alphanumeric or graphic data, where high-resolution digital-image display is not a requirement. Several systems integrated microform scanners or film output devices, and others have tested or use optical character recognition technologies. Most of the systems visited have installed an optical digital data disk jukebox for automated storage and retrieval, and also utilize laser printers for hard-copy distribution.

To varying degrees, many of the agencies visited have changed or are considering upgrading their imaging systems in areas such as higher performance document scanners, increased memory in workstations, more powerful file servers, new workstation displays, and different optical disk drives or media. Agency administrators noted that open-type system architectures are more amenable to configuration changes, as proprietary hardware or software components are more difficult to upgrade when responding to additional unexpected user demands.

Recommendations

Clearly define user and agency requirements during the imaging project's requirements analysis phase.

Verify that the imaging system has inherent flexibility and has an open, or nonproprietary, design that accepts future hardware and software upgrades.

Conversion of Original Records

Technical Trends

The retrospective conversion of paper records to digital images requires the integration of specially configured production facilities, conversion equipment, and a technology-minded operations staff. It is not uncommon for Federal agencies to limit record conversions to a "today forward" concept, converting only the most current and frequently accessed records. This concept is especially attractive when older, less requested records are a significant segment of an organization's holdings.

Imaging systems convert information into electronic images that can be indexed and searched, routed to user workstations, and remotely distributed and printed. Major input processing steps include:

  • Converting original records (paper, microforms, analog data) to a digital format,
  • Electronically enhancing images that are difficult to read to improve legibility,
  • Appending a file header and compressing the digital images to reduce data transmission and storage requirements,
  • Indexing the images at appropriate levels,
  • Conducting quality-control inspections of the index and image data and rescanning documents as needed, and
  • Recording the digital information on a suitable storage medium.

A comprehensive records tracking and accounting process is necessary for all conversion efforts to ensure that all records designated for conversion were in fact converted, and to monitor exactly what was converted. Tracking and monitoring must begin when a record is identified for conversion and must not cease until all tasks related to acceptance of the new record form and disposition of the old record form are complete.
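
The accounting described above amounts to a reconciliation between the records designated for conversion and the records actually converted. The following Python sketch is illustrative only: the function name `reconcile` and the record identifiers are hypothetical and are not drawn from any agency system.

```python
def reconcile(designated, converted):
    """Compare record IDs designated for conversion with those converted.

    Returns (missing, unexpected): IDs that were designated but never
    converted, and IDs that were converted without being designated.
    """
    designated_set = set(designated)
    converted_set = set(converted)
    missing = sorted(designated_set - converted_set)
    unexpected = sorted(converted_set - designated_set)
    return missing, unexpected


# Hypothetical record identifiers, for illustration only.
designated = ["R-001", "R-002", "R-003", "R-004"]
converted = ["R-001", "R-002", "R-004", "R-009"]

missing, unexpected = reconcile(designated, converted)
print("Designated but not converted:", missing)
print("Converted but not designated:", unexpected)
```

In practice the two lists would come from the tracking system's own logs, and any nonempty result would be resolved before the original records are dispositioned.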

The point of entry (input device) for paper records is a document scanner, available in several configurations: some use a stand-alone mode and magnetic storage of the electronic images, some are attached to an imaging workstation, and some are operated under a computer file-server configuration. Image-enhancement processes may be applied to the digital images to improve their legibility, while concurrently reducing overall file sizes. The scanning process usually includes image data compression in accordance with a standard or proprietary format. The document images are then indexed using traditional manual key entry, bar-code scanning, or optical character recognition.

Depending on an agency's requirements, the index subsystem may be maintained in several ways, including storage of the index data as part of the imaging system or storage of index data in a separate database management system. In either case, the index data is usually retained on magnetic storage media. Magnetic storage simplifies index data revision and supports faster user access to the information. The scanned images may also be stored magnetically, but optical digital data disks are a viable option for long-term information retention. A single scanned image may require between 20,000 and 300,000 or more bytes of storage (assuming image compression at 10:1). When planning a system, it is useful to conduct testing with the original materials to be converted to determine potential scanning throughput rates, storage requirements, and image file transfer speeds. The digital information can be distributed in several ways, including image display on high-resolution monitors, laser printers, computer output microforms, or remote image transmission. These processes often are under the control of a workflow process management system that can also route images to workstations, distribute output, and conduct image tracking and status reporting.

User Experiences

Several salient patterns emerged during the site visits that illustrate generic Federal agency strategies. Federal agency production managers understand the critical role of document preparation in high-volume conversions. Document preparation is a nontechnical component of a records conversion that affects overall operational productivity. The steps involved in preparing documents for scanning are very similar to those used in preparing records for microfilming. The removal of staples, bindings, and other fasteners and proper sequential ordering of documents are important steps that are best performed offline. Performing these steps diligently reduces nonproductive or idle wait time at the scanning workstation and improves document-scanning throughput rates, especially for imaging systems with high-speed equipment.

Eight of the fifteen Federal agency systems surveyed use existing agency staff to accomplish in-house document or data conversions. Five agencies visited utilize onsite contractors for scanning and indexing services arranged through contracts with service bureaus or integration vendors. The survey indicated that several agencies use a combination, or "team," approach for the actual conversion process, in which contractors and Federal employees share conversion tasks. The State Department's imaging system, for example, operates with agency staff processing the incoming mail requests and contractor-supplied personnel operating the scanning and indexing systems.

Several of the agencies visited conduct document scanning at conversion sites that are not located near the storage and retrieval systems. In these cases, the scanned images are temporarily stored using magnetic media or rewritable optical digital data disks. For example, document scanning for the Patent and Trademark Office's conversion project was conducted by contractor personnel at an offsite document storage facility.

Three Federal agencies visited employ optical digital data disks for digitally storing graphic or alphanumeric ASCII data. The National Oceanic and Atmospheric Administration (NOAA), for example, uses optical digital data disk technology for archival retention of coastal environmental data. NOAA monitors natural climatic events and manmade environmental factors and analyzes their impact on rapidly changing global processes. The National Earthquake Information Center uses optical digital data disk technology to store seismic data detected from earth tremors caused by events such as earthquakes, volcanic activity, nuclear tests and oil prospecting. And finally, the Social Security Administration retains workers' earnings data in digital format using rewritable optical digital data disks.

Recommendations

Analyze the domain of documents to be scanned, identify the levels of uniformity, and consider the use of a document conversion contractor when the backfile holdings are extensive.

Implement a comprehensive records accounting and tracking process during the project's conversion phase.

Invest sufficient staff or contractor resources into document preparation to increase scanning productivity.

As appropriate, assign experienced agency staff to conversion processes that benefit from their knowledge of existing agency operations.

Digital Image Scanners

Technical Trends

Document Scanners: A scanner is the hardware component that converts original documents to electronic digital images. The commercial imaging marketplace offers scanning equipment with a diversity of throughput speeds, automated operator features, and acquisition costs. Actual elapsed times for scanning and displaying the images vary based on several factors, including the inherent performance of the specific scanning unit, physical dimensions of the documents, and scan resolution selected. These factors contribute to desktop-class scanner production rates of between 2 and 20 documents per minute. Therefore, when equipped with document feeders and two-sided scan capability, desktop scanners are useful for smaller imaging applications with lower daily conversion volumes and for midrange office imaging systems. When equipped with special image-enhancement capabilities, desktop scanners are also effective for scanning low-contrast documents, which are difficult to read. These scanners also function effectively for rescanning poor-quality images rejected during routine image quality-control inspections.

For larger document conversion applications, higher performance scanners that employ heavy-duty mechanized document transports and multiple scanning charge-coupled device (CCD) arrays are available. This equipment offers throughput rates ranging from approximately 40 to 120 or more pages per minute. Depending on the specific unit, these scanners may capture two sides of each document on one pass, improving productivity through reduced paper handling. Since scanner production rates can be affected by the document's physical condition, manufacturer's claims regarding scanner throughput rates should be verified prior to equipment procurement. Verification is best accomplished using a representative sampling of actual agency records. Specialized scanners are also available for capturing larger documents such as maps and engineering drawings.
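
The effect of rated versus achieved throughput on a conversion schedule can be estimated with simple arithmetic. The Python sketch below is a planning aid under stated assumptions: the backfile size, the productive hours per day, and the `efficiency` factor (the fraction of the rated speed achieved in production) are hypothetical placeholders that should be replaced with values measured on a representative sample of agency records.

```python
import math


def conversion_days(total_pages, pages_per_minute,
                    hours_per_day=6.0, efficiency=0.7):
    """Estimate the working days one scanner needs to convert a backfile.

    efficiency is an assumed fraction of the rated speed that is
    actually achieved; it is not a manufacturer or survey figure.
    """
    effective_ppm = pages_per_minute * efficiency
    pages_per_day = effective_ppm * 60 * hours_per_day
    return math.ceil(total_pages / pages_per_day)


# Hypothetical backfile of one million pages at three rated speeds.
for rated_ppm in (20, 40, 120):
    days = conversion_days(1_000_000, rated_ppm)
    print(f"{rated_ppm:>3} pages/minute rated: about {days} working days")
```

Rerunning the estimate with the throughput observed during acceptance testing shows how sensitive the schedule is to the gap between claimed and actual rates.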

Document scanners are generally installed and calibrated in accordance with instructions set forth in the manufacturer's operation and maintenance guides. To determine the quality of an image acquired with a recently calibrated scanner, it is recommended that a standardized target be scanned and evaluated in accordance with FIPS PUB 157 "Guideline for Quality Control of Image Scanners." Test targets are available to evaluate scanner performance for a variety of image characteristics, including color, type size, and resolution.

Digital Scanning and Microforms: Depending upon an organization's data storage and retrieval requirements, several alternatives are available to migrate information between digital image systems and analog microform technologies. These approaches include:

  • Concurrently scanning and microfilming documents at the image capture stage,
  • Creating microforms from existing digitally stored data, and
  • Converting existing microforms to digital data for storage on optical digital data disks.

The first approach requires image-capture equipment that digitally scans the paper documents and, at the same time, photographically records the images on microfilm. This dual capability potentially offers the best of both worlds. That is, the digital images are stored on optical digital data disks for automated storage and retrieval, while the microforms, when processed and stored in accordance with NARA micrographics regulations (36 CFR 1230), comply with long-term information retention requirements. One potential drawback, which could nullify the benefits of this approach, arises when quality-control problems are discovered in the processed microforms and the document batch must be completely refilmed.

A second approach is to create microforms from existing digital image and text data stored on optical digital data disks. Several configurations of microform recorder equipment are available for producing microforms from digitally stored text and raster-image formats. One technique uses a laser-beam recording technology to "write" the digital information, line by line, directly onto the microform materials with micron-sized pixel patterns. The second technique, which applies only to coded data including text, uses the more conventional cathode ray tube (CRT) imaging technology provided in computer output to microfilm (COM) recording devices. Commercial availability is rather limited for high resolution microform recorders that create raster COM images on 35mm films. Since these systems are also complex to operate and expensive, users should consider a service bureau for lower volume applications. Further complexity is introduced when the digital information is marked with unique or proprietary image file-header information, requiring conversion to a widely used format (e.g., Tagged Image File Format [TIFF]).

A third approach involves the digital conversion of microforms. These microforms may already exist in an agency's files. Conversely, they may have been created in lieu of digital scanning of the documents under a microfilm-first, scan-later concept. If this approach is under consideration, investigate the input requirements of commercially available microform scanning equipment. Ensuring that the microform's technical production specifications comply with scanner requirements will expedite the digitization process. Unlike paper document scanners that sense reflected light, microform scanners transmit a beam of light through the film media. The technical quality of the original input microforms directly impacts the readability of the digital images. High-quality microforms provide the most legible (cleanest) digital images, with an added benefit of smaller digital file sizes. Depending on user needs, the commercial marketplace offers microfilm scanners with various operational performance features. High-end microform-scanning equipment may include powerful image-enhancement algorithms that improve legibility from low-contrast microforms and electronic sensors for detecting skewed or misaligned images.

User Experiences

The 15 Federal agencies visited used desktop or high-speed scanners to convert existing paper or microform records. Several applications begun with low-volume desktop scanners found it necessary to upgrade to higher performance equipment. Other agencies achieved higher production by adding autofeeders to desktop sheet-fed scanners or by integrating additional high-speed scanners. Site managers noted that under actual production conditions, document scanner throughput rates may vary considerably from manufacturer estimates. System administrators noted that increased demands placed on the existing components were based in part on greater awareness and acceptance of imaging technology. At peak times, this unexpected increased demand on imaging systems may overload inherent capabilities.

The conversion of microform to digital images offers a number of potential benefits, but there can be a significant downside. The original quality of the microform, including image contrast, spacing, skew, and sharpness, affects both potential scan production rates and the quality of the scanned images. In order to deal effectively with these issues, the Department of the Army's PERMS system employs specialized microform scanner systems. Several other agencies surveyed chose optical digital data disk technology as a replacement for existing microform-based information systems, accomplished either through a digital scan conversion of the existing microforms or by recording data directly to optical digital data disks with no paper records created as intermediaries.

Recommendations

Prior to system acquisition, validate vendor claims regarding document throughput rates, image quality, and ease of operation using a representative sampling of the agency's holdings.

Follow the standard test procedures outlined in FIPS PUB 157 "Guideline for Quality Control of Image Scanners."

Scanner Resolution

Technical Trends

The document scanner resolution selected directly impacts several key factors including the display-screen readability of digital images, the legibility of hard-copy output, and the usefulness of the digital images for future agency applications. The selection of optimum scan resolution is critical for both immediate and longer term applications, as the original scan resolution can never be increased even if future information retrieval technologies require a higher quality image. Consequently, a strong case can be made for scanning at the highest resolution that is currently affordable.

Scanner resolution is a complex equation: Image resolution, color spectrum, file storage size, and compression algorithms are interdependent and dependent on the scanning, display, and printing equipment available. Digital image file sizes, for example, depend on the scanner resolution selected. A standard office document requires approximately 500,000 bytes (uncompressed) at 200 dots per inch (dpi) and almost 2 million bytes before compression at 400 dpi. This four-to-one storage factor becomes significant when capturing thousands of images.
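
The storage figures above follow directly from page dimensions and scan resolution. The Python sketch below reproduces the arithmetic, assuming that a "standard office document" is an 8.5 x 11 inch page scanned in bilevel mode (1 bit per pixel); the function name `uncompressed_bytes` is illustrative.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed raster size in bytes for a page scanned at `dpi`."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed page size: 8.5 x 11 inches, bilevel (1 bit per pixel).
for dpi in (200, 400):
    size = uncompressed_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes uncompressed")

ratio = uncompressed_bytes(8.5, 11, 400) / uncompressed_bytes(8.5, 11, 200)
print(f"400 dpi requires {ratio:.0f} times the storage of 200 dpi")
```

Because the pixel count grows with the square of the resolution, doubling the dots per inch quadruples the uncompressed file size.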

In the past, it was primarily scanner and display equipment limitations that determined digital-scanning practice, and scant attention was paid to objective criteria. Because scanner resolution also affects input productivity, vendors formerly specified lower resolution settings to achieve efficient throughput speeds and reduced data storage and processing costs. A scanning resolution of 300 dpi produces a quality comparable to that of an average office laser printer (though lower than a photocopier) and may be adequate for typical office documents that contain no type-font size smaller than 6-point. If a Federal agency plans to integrate optical character recognition (OCR) technology, a minimum scanning resolution of 300 dots per inch is recommended.
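
The relationship between type size and scan resolution can be made concrete by counting how many scan lines fall across a line of type. The Python sketch below assumes the common convention of 72 points per inch; point size measures the type body, so individual characters span somewhat fewer pixels than the figure computed here.

```python
def type_height_pixels(point_size, dpi):
    """Approximate number of scan lines across type of a given point size.

    Assumes 72 points per inch. The result describes the type body,
    not the height of any individual character.
    """
    return point_size * dpi / 72


for dpi in (200, 300, 400):
    rows = type_height_pixels(6, dpi)
    print(f"6-point type at {dpi} dpi: about {rows:.1f} pixels high")
```

Fewer pixels per character leave less margin for thresholding noise, which is one reason small type and OCR applications benefit from scanning at 300 dpi or higher.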

Engineering drawings, maps, and documents that have very detailed, fine line and handwritten information may require a scanning resolution of up to 600 dpi or greater. In all cases, but especially if the documents to be scanned include maps, drawings, or documents with fine line and background detail, tests should be conducted to verify the appropriate scanning resolution on a case-by-case basis with actual document samples prior to equipment acquisition.

User Experiences

Scanner resolution is often specified by program managers responsible for balancing two critical factors: image data storage and image display legibility. Scanning resolutions at 12 Federal agency sites capturing original documents range from 200 to 400 dpi, with a majority employing 300 dpi. Agencies placing less value on the quality of screen display while emphasizing storage economics tend to scan in the 150 to 200 dpi range, arguing that a far greater number of compressed images can be placed on any given optical digital data disk. Agencies requiring a higher quality image display accept larger file sizes, scanning in the 400 dpi and higher ranges. For example, the State Department's REDAC imaging system routinely scans at 200 dots per inch, while poor quality documents are scanned at up to 400 dpi. The Bureau of Land Management's General Land Office Records Automation Project system captures document images at 300 dpi resolution, displays images at 150 dpi, and laser prints 300 dpi images.

Recommendations

Employ a scanning resolution of at least 300 dpi for office documents when future applications (e.g., OCR) for the digital images are anticipated.

Specify a higher scanning resolution (between 300 and 600 dpi or higher) as needed, for engineering drawings, maps, and documents containing significant fine line and background detail information.

Dynamic Range

Technical Trends

Digital-imaging systems typically include binary-type scanning and display equipment. That is, each dot (pixel) of a raster-image bit map is interpreted as either black or white. Binary scanners are not ideally suited for capturing images of colored documents, photographs, illustrations, or other items containing continuous tones. To optimally capture these images, a scanner with a greater dynamic range is needed. Dynamic range is defined for this report as the variation in tone in any given scanned dot. In black-and-white images the range is represented by a scale of gray tones. The degree of blackness associated with each picture element, or pixel, in a gray-scale image is controlled by the digital information, or bits, associated with that pixel. Similarly, in color images each pixel is represented by a value for each of the three primary colors (usually red, green, and blue) that, when combined, produce the desired color.

Imaging systems recording gray-scale and color images require specialized components such as a scanner with gray-scale or color capability, video displays that can reflect the greater dynamic range in the system, and more powerful image-processing software. A gray-scale scanner is mandatory when scanning continuous-tone black-and-white photographs or negatives. Such images should be scanned at 8 bits per pixel, allowing the expression of 256 gray values, unless it can be determined in advance that there is no current or anticipated future need for this level of detail. Gray-scale scanning techniques may also be effectively applied in the scanning of black-and-white documents. Because the human eye is highly sensitive to variations in luminance, documents scanned at a relatively low resolution (e.g., 200 dpi) but with 4- or 6-bit gray-scale may actually be more readable on low-resolution monitors than documents scanned at a higher resolution in a bilevel mode. Use of a higher resolution monitor or printing often requires a higher resolution scan (e.g., 300-400 dpi) with 8-bit gray-scale imaging.
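
The difference between bilevel and gray-scale capture can be illustrated by reducing the same 8-bit samples to lower bit depths. The Python sketch below is a simplified illustration, not a description of any scanner's internal processing: the sample values, the fixed threshold of 128, and the convention that 0 is black and 255 is white are all assumptions.

```python
def gray_levels(bits_per_pixel):
    """Number of distinct tones expressible at a given bit depth."""
    return 2 ** bits_per_pixel


def quantize(value, bits_per_pixel):
    """Reduce an 8-bit gray value (0-255) to the given bit depth."""
    return value >> (8 - bits_per_pixel)


def to_bilevel(value, threshold=128):
    """Map an 8-bit gray value to 0 (dark) or 1 (light)."""
    return 1 if value >= threshold else 0


# Hypothetical 8-bit samples along one scan line (0 = black, 255 = white).
samples = [0, 60, 120, 130, 200, 255]

for bits in (4, 6, 8):
    print(f"{bits}-bit gray scale: {gray_levels(bits)} levels")

print("4-bit values: ", [quantize(v, 4) for v in samples])
print("Bilevel values:", [to_bilevel(v) for v in samples])
```

The samples 120 and 130 are nearly identical tones, yet bilevel thresholding sends them to opposite extremes, while the 4-bit representation keeps them adjacent. This is the kind of tonal information that gray-scale capture preserves.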

Color scanning presents even greater technical challenges. For example, the full visible color spectrum may not be captured accurately by all scanners. Additional problems occur on output. Whereas the red, green, and blue values of documents are usually captured during scanning, printer output is achieved by balancing cyan, magenta, yellow, and black. For accurate representation of the colors on output, the input and output devices must be calibrated. Even with sophisticated image compression techniques, gray-scale and color imaging requires substantial data storage capability. A standard uncompressed binary digital image consists of millions of pixels, each represented by 1 bit of information. Because an 8-bit gray-scale image represents each one of those pixels with 8 bits, the resultant uncompressed file is eight times as large. For example, an uncompressed 300 dpi bi-tonal image requires 1.05 megabytes of storage, while an uncompressed 300 dpi continuous-tone gray-scale image (8 bits per pixel) requires 8.4 megabytes of storage. Gray-scale compression algorithms are by nature less efficient than binary compression schemes (i.e., compressed gray-scale image files can be much larger than 8 times the size of compressed binary files). Depending on agency requirements, this factor may make gray-scale or color scanning a prohibitively expensive storage option.
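
The storage figures quoted above can be checked by scaling the pixel count by the number of bits per pixel. The Python sketch below assumes an 8.5 x 11 inch page and counts a megabyte as one million bytes; the 24-bit case is included on the assumption that color is stored as three 8-bit channels.

```python
def image_megabytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed image size in megabytes (1 megabyte = 1,000,000 bytes)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000


# Assumed page size: 8.5 x 11 inches, scanned at 300 dpi.
for label, bits in (("bi-tonal", 1), ("8-bit gray scale", 8), ("24-bit color", 24)):
    print(f"{label}: {image_megabytes(300, bits):.2f} megabytes uncompressed")
```

These are uncompressed sizes only. Actual storage requirements depend on the compression achieved, which varies with the document content and the algorithm used.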

User Experiences

Only one of the Federal agency sites visited, the National Earthquake Information Center of the US Geological Survey, utilizes computer workstations with 800 x 1,000 pixel monitors to display seismic digital data images that contain 256 shades of gray. None of the remaining agency sites visited are routinely scanning or storing gray-scale images.

Recommendations

Employ gray-scale or color imaging technology as needed for suitable continuous-tone images such as photographs, maps, and related records.

As appropriate, utilize 8-bit-per-pixel gray-scale image technology for capturing continuous-tone black-and-white photographs or negatives, and 24-bit mode to obtain true color rendition.

Image Enhancement

Technical Trends

Digital image enhancement invokes software algorithms used to "clean up" the visual appearance and quality of digital images. Image enhancement should be used carefully, because this process may actually remove minute elements of the image data. This image data deletion may occur either selectively or automatically, with the end product having increased visual contrast and improved readability. Electronically enhanced images can dramatically increase display-screen and hard-copy legibility. Image enhancement can also reduce storage requirements by improving the efficiency of image-compression software. Documents that are difficult to capture, such as carbon copies with blossomed characters, multigeneration photocopies, light blue and purple mimeographs, faded or stained originals, and faint pencil and ink annotations, are prime candidates for image enhancement. Imaging systems typically provide a fundamental image-contrast manipulation capability; additional hardware and software are available to expand enhancement power and speed while increasing compression capabilities.

One negative aspect of certain image-enhancement algorithms is a possible loss of detail contained within the original documents. For example, documents containing color printing, handwritten annotations, or marginalia may not be uniformly imaged. In these cases, image-enhancement software might inadvertently remove some faint or low-contrast markings. Similarly, all bi-tonal systems convert colors to black, increasing readability problems. Special filters can be used in the scanning process to minimize this problem, and administrators should ensure that the scanning capability of the proposed system matches the characteristics of the documents to be scanned. It is prudent to test the imaging system with a sample of agency documents prior to a full-scale conversion.
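
The trade-off described above can be seen in even the simplest enhancement operation. The Python sketch below implements a basic "despeckle" filter that removes isolated black pixels from a bilevel image. It is a minimal illustration chosen for this discussion, not the algorithm of any particular product, and the sample bitmap is hypothetical.

```python
def despeckle(image):
    """Remove black pixels (1) that have no black pixel among their
    eight neighbors. Returns a new image; the input is not modified."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            has_neighbor = any(
                image[rr][cc] == 1
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbor:
                result[r][c] = 0
    return result


# Hypothetical bilevel bitmap (1 = black, 0 = white).
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],  # part of a character stroke
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],  # isolated dot: noise, or a faint mark?
]

cleaned = despeckle(page)
for row in cleaned:
    print(row)
```

The filter cannot tell whether the isolated dot is scanner noise or a faint period or annotation. Removing it yields a cleaner image that is likely to compress better, but if the dot was genuine information it is no longer present in the enhanced copy.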

If a source document has intrinsic value, the original must be retained following digital image scanning. The digitally scanned raster-image data of intrinsically valuable documents should be stored in unenhanced form to ensure that all of the digital information as captured is available for processing in the future by more powerful image-enhancement techniques.

User Experiences

Image enhancement is attractive due to its ability to improve the legibility of stained, aged, and low-contrast documents. Even though the majority of document scanners provide basic contrast (light/dark) controls to adjust the digital image appearance, several of the sites visited are considering or have already integrated special add-on image-enhancement capabilities. For example, the US Army PERMS program uses software enhancement to clean up "noisy" images, thereby obtaining a greater compaction in data storage and higher quality images. Another example is the Commodity Futures Trading Commission, where an image enhancement computer circuit card was installed in the desktop scanner to obtain higher quality images. System operators also noted the usefulness of a display screen reversal (positive and negative) capability that increases the visual image legibility of hard-to-read documents, scanned negative microfilms, and faded photostats.

Recommendations

Conduct scanner testing using selected documents during the system design phase to determine the need for special scanner hardware modifications.

Retain scanned unenhanced images of documents of intrinsic value.

Digital Image File Headers

Technical Trends

Digital-imaging systems use a complex set of computer software for capture, storage, and image retrieval functions. The user's request for an image is linked to a specific location on the optical digital data disk or other storage medium. Linking is accomplished by means of a header preceding the digital data of each discrete image or group of images. Image file-header data may include such items as the file size, type of compression technique, and scanning resolution. File headers are often proprietary and typically are supplied as an integral imaging system component. In spite of a file header's importance to retrieving images over a long period, file headers are often overlooked by users until problems surface. Difficulties usually occur when image data must be transferred or when a system is upgraded or otherwise modified.

It is essential to use nonproprietary image file formats and header structures or have the ability to migrate image files into a common standardized format. When proprietary image file formats and headers cannot be avoided, the system developer should be required to provide a "bridge" to nonproprietary image file formats or, at a minimum, comprehensive documentation describing the image file structure. At present there are no agreed-upon industry-wide standards for image file formats and headers, although many in the industry are currently working to develop such standards. An Image Interchange Facility (IIF), for example, is currently under development under the auspices of the International Organization for Standardization (ISO) as part of its International Image Processing and Interchange Standard. The main component of the IIF will be the definition of a data format for exchanging arbitrarily structured image data across heterogeneous application boundaries.

In the absence of an accepted standard image format, many imaging systems use the Tagged Image File Format, or TIFF. It is one of the most widely supported image file formats for personal computers. Every TIFF file includes a header, one or more image file directories, and content data. TIFF headers and image file directories tell the computer system how to read the data and contain such information as the width of the image, its length, and resolution. Some image system developers are adopting the TIFF to support image transfer among systems. Unfortunately, different versions of TIFF headers can be implemented; therefore the TIFF does not automatically guarantee success with image transfers between disparate systems. Acquiring comprehensive documentation about the header structure is recommended, even when using the Tagged Image File Format.
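
The header and image file directory structure described above can be read with a few lines of code, which is one reason documented formats ease migration. The Python sketch below follows the published TIFF layout (an 8-byte header, then a directory of 12-byte entries) but is deliberately incomplete: it handles only single-value SHORT and LONG fields, and the sample built by the hypothetical helper `build_sample_tiff` is a header-and-directory fragment, not a viewable image.

```python
import struct

# Tag numbers as defined in the TIFF specification.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression"}


def parse_tiff_header(data):
    """Return (endian, first_ifd_offset) from the 8-byte TIFF header."""
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian byte order
    elif order == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("bad byte-order mark; not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad version number; not a TIFF file")
    return endian, ifd_offset


def read_ifd(data, endian, offset):
    """Read single-value SHORT (3) and LONG (4) fields from one directory."""
    (count,) = struct.unpack_from(endian + "H", data, offset)
    fields = {}
    for i in range(count):
        entry = offset + 2 + 12 * i
        tag, field_type, n = struct.unpack_from(endian + "HHI", data, entry)
        if n != 1 or field_type not in (3, 4):
            continue  # other field types are beyond this sketch
        fmt = "H" if field_type == 3 else "I"
        (value,) = struct.unpack_from(endian + fmt, data, entry + 8)
        fields[TAG_NAMES.get(tag, tag)] = value
    return fields


def build_sample_tiff():
    """Build a little-endian header plus one directory, for illustration."""
    header = struct.pack("<2sHI", b"II", 42, 8)
    entries = [
        struct.pack("<HHII", 256, 4, 1, 2550),   # ImageWidth, LONG
        struct.pack("<HHII", 257, 4, 1, 3300),   # ImageLength, LONG
        struct.pack("<HHIHH", 259, 3, 1, 4, 0),  # Compression, SHORT (4 = Group 4)
    ]
    ifd = struct.pack("<H", len(entries)) + b"".join(entries)
    return header + ifd + struct.pack("<I", 0)  # 0 = no further directory


sample = build_sample_tiff()
endian, first_ifd = parse_tiff_header(sample)
fields = read_ifd(sample, endian, first_ifd)
print(fields)
```

Resolution is stored in TIFF as a RATIONAL field, which this sketch skips. A file written with private or vendor-specific tags would parse without error here yet still be unusable without the vendor's documentation, which is why comprehensive header documentation matters even for TIFF.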

User Experiences

More than half of the Federal agency sites surveyed for this report use proprietary digital image file headers. Only three of the Federal agency sites employ some version of the more widely used Tagged Image File Format. The complex terminology and specific details of image file headers contribute to the lack of universal understanding and recognition of their importance. As a result, many system administrators rely on integration companies, vendors, and optical digital data disk manufacturers to guide them through the labyrinth of technical details and file-header format specifications. This reliance on proprietary vendor solutions may contribute to future difficulties in data migration.

Recommendations

Use file formats that promote or facilitate network data transfer, such as Aldus/Microsoft TIFF Version 5.0, which meets the Internet Engineering Task Force standard definition for exchange of black and white images on the Internet.

Require use of a nonproprietary image file-header label; or require a "bridge" to a nonproprietary image file-header label; or require a detailed definition of the image file-header label structure.

Data Compression Techniques

Technical Trends

Digital images are usually compressed as part of the scanning and storage process and subsequently decompressed at retrieval. A compression algorithm transforms the original digital image raster pattern into a mathematical code that is stored more compactly, with compression techniques that are one- or two-dimensional. One-dimensional compression uses contiguous (adjacent) pixels on the same scanned line, while two-dimensional compression compares the differences between scanned lines, as well as within the same line. Depending on the document characteristics and techniques chosen, the actual compression ratios achieved can vary widely. Typical imaging systems may compress at a 10-to-1 ratio, while a ratio of 20-to-1 or even greater is feasible with more sophisticated compression schemes.

Although there are many compression techniques in use today, they generally fall into two categories: proprietary or standard. Proprietary compression algorithms tend to operate faster and offer greater data compaction. However, the stored images may not be easily transportable between different systems because of the algorithm's specialized characteristics. Standardized compression algorithms may not be as powerful but may support image data transfer between systems that otherwise might be incompatible. Standard, or nonproprietary, compression techniques are therefore an indispensable part of a migration strategy for records of long-term value.

Proprietary and standard compression techniques can each be further subdivided into "lossy" and "lossless" compression methods. With lossy compression, a certain amount of the original information is discarded as part of the compression process. Lossless compression, as its name implies, allows for the reconstruction of a file identical to the original. When performed correctly on a suitable document, lossy compression has the advantage of dramatically decreasing the size of the original digital files in a way that is almost undetectable by the human eye. For those archival documents in which continued fidelity to the exact appearance of the original document is important, a lossless compression scheme is recommended.
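
The distinction between lossless and lossy methods can be demonstrated on a handful of sample values. In the Python sketch below, the standard library's `zlib` module stands in for a lossless scheme, and discarding the low-order bits of each sample stands in for a lossy one. Both are stand-ins chosen for brevity and are not the algorithms used by imaging systems; the sample values are hypothetical.

```python
import zlib


def lossless_round_trip(data):
    """Compress and decompress; the result is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_reduce(data, keep_bits=4):
    """Discard the low-order bits of each 8-bit sample (not reversible)."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in data)


# Hypothetical 8-bit gray values.
samples = bytes([12, 13, 200, 201, 202, 90, 91, 255])

restored = lossless_round_trip(samples)
reduced = lossy_reduce(samples)

print("Lossless copy identical to original:", restored == samples)
print("Lossy copy identical to original:   ", reduced == samples)
print("Lossy values:", list(reduced))
```

After the lossy step, neighboring values such as 200, 201, and 202 collapse to a single value. The data becomes more repetitive and therefore easier to compress, but the original distinctions cannot be recovered.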

One of the most commonly used lossless compression techniques utilizes a method called run-length encoding. It evaluates patterns of adjacent pixels on a single horizontal line and encodes binary transitions. Run-length encoding is most efficient for documents with large areas of blank space, commonly found in office text files. More complex documents that include line drawings, charts, photographs, and maps, among others, may be more efficiently compressed using techniques that use "look-up tables" for comparison with the scanned image.
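
As a minimal sketch of the run-length idea (not the CCITT coding itself, which goes on to map run lengths to variable-length codes), the following encodes one bitonal scan line as alternating run lengths. The function names and the convention that every line begins with a white run are assumptions made for illustration.

```python
from itertools import groupby

def rle_encode(line):
    """Encode a scan line of 0 (white) / 1 (black) pixels as run lengths.

    Assumed convention: runs alternate white, black, white, ...;
    a line that starts with black gets a leading white run of length 0.
    """
    runs = [len(list(group)) for _, group in groupby(line)]
    if line and line[0] == 1:
        runs.insert(0, 0)
    return runs

def rle_decode(runs):
    """Rebuild the pixel list from alternating white/black run lengths."""
    pixels = []
    color = 0  # first run is white by convention
    for length in runs:
        pixels.extend([color] * length)
        color = 1 - color
    return pixels

# A mostly blank 40-pixel line with one short black mark.
scan_line = [0] * 30 + [1] * 4 + [0] * 6
encoded = rle_encode(scan_line)
print(encoded)
```

Here 40 pixels reduce to three run lengths, and decoding reproduces the line exactly, which is why documents with large blank areas compress so well under this method.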

The former International Telegraph and Telephone Consultative Committee (CCITT), now called the Telecommunication Standardization Sector (TSS), has developed international standards for data transmission over communication lines in one- and two-dimensional modes. These facsimile standards are known as Group 3 and Group 4. Group 4 provides greater compression capability (though at a certain point in a lossy fashion) and operates in a two-dimensional mode. A new international standard, intended to replace the CCITT Group 3 and CCITT Group 4 compression standards, is currently under development by the Joint Bi-level Image Experts Group (JBIG).

In addition to the preceding standards, system software developers may occasionally need to implement other compression schemes. Two significant compression schemes are named for the groups that developed them: the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG). JPEG is designed for compressing either full-color or gray-scale digital images of continuous-tone quality. It offers both a lossy and a lossless compression alternative. The lossy alternative invokes a mathematical process called the discrete cosine transform (DCT), which operates on an 8 x 8 frame of pixels and yields substantial compression. This process produces an image with some loss of detail that may not necessarily be detectable to the human eye. The actual amount of loss depends upon the compression ratio selected. In contrast, the lossless compression alternative achieves complete fidelity to the source image because the sampling area or frame is 2 x 2 pixels, three of which are aligned along different axes with respect to the fourth. The compression ratio is user controlled and limited to either 2:1 or 3:1.

MPEG is a compression scheme for full motion video images. It uses JPEG for the compression of individual frames and also uses other lossy techniques to compress data between frames. The growing demands of multimedia computing, video conferencing, and high-definition digital television make it likely that there will be new standards developed shortly for the rapid transmission of moving images. Because MPEG is inherently lossy, and the high compression ratio of JPEG is only possible with the lossy compression technique, system administrators should carefully evaluate their system functional needs to determine if either technique will meet current and anticipated future image requirements.

System developers and administrators must choose between standard and proprietary compression techniques. Using compression techniques conforming to CCITT or the developing JBIG specifications when storing nontonal data will increase the likelihood that the images can be used with other technologies or migrated between systems. Although proprietary techniques may provide greater data compression, compatibility is not assured. In fact, if the software supporting a proprietary compression technique becomes obsolete, then for all practical purposes the image cannot be restored. There may be times when the use of a proprietary lossless compression technique is unavoidable, but in those instances the vendor should be required to provide a utility to decompress the data to its original digitized data format. At some future time, the data can be compressed again using any method desired. Table 1 provides a comparison of data compression techniques and examples of their applications.

Table 1. Data Compression Techniques

                            Group 3       Group 4       JBIG          JPEG          MPEG
Type
  Lossy                     -             X             -             X             X
  Lossless                  X             X             X             X             -
Level
  Bi-tonal                  X             X             X             X             X
  Gray Scale (64 Shades)    -             -             X             X             -
  Color (256 Colors)        -             -             -             X             X
Image Source
  Paper Records             Primary Use   Primary Use   Primary Use   X             -
  Photographs               -             -             X             Primary Use   -
  Motion Pictures           -             -             -             X             Primary Use

User Experiences

The sites surveyed used both proprietary and standardized compression schemes. More than half of the 15 sites adopted some type of proprietary data compression algorithm, whereas only 5 used the standardized CCITT Group 4 compression technique. Although the various proprietary compression schemes effectively reduce the digital file sizes, resulting in faster image transmission and reduced storage on the optical digital data disks, adoption of a proprietary approach introduces a level of risk into any subsequent data migration effort. Regardless of the technique adopted, system administrators recognized the importance of obtaining descriptive documentation about the compression algorithms employed.

Recommendations

Use a lossless compression scheme when continued fidelity to the exact appearance of the original document is achievable and desired.

Use JPEG or MPEG for images with continuous tonal qualities when some loss of detail is acceptable.

For digital images without continuous tonal qualities, require standardized compression techniques, such as CCITT Group 3, CCITT Group 4, or JBIG.

If a proprietary lossless compression system is used, require that the vendor provide a means of decompressing the data to its original format.

Digital Image Quality Assurance

Technical Trends

Successful digital-imaging systems include a quality-assurance program as part of the system management process. Effective quality-assurance programs involve (at a minimum) two critical aspects:
  • Process control
  • Product quality control

Process control ensures that production equipment (e.g., document scanners, image compression hardware, laser printers) and related system processes are performing at optimum levels according to preestablished criteria. Ideally, these process-control criteria are routinely used to monitor the performance of the imaging system and its individual components. Specialized diagnostic and technical evaluation tools, combined with detailed logbooks, are an invaluable resource in troubleshooting future system problems.

Product quality control evaluates the quality of the individual digital images and related index data produced by the imaging system. This level of quality control is expedited when the scanned digital images are temporarily stored on magnetic disk cache. Magnetic storage permits rescanning prior to recording the image data onto the optical digital data disks. Corrective rescanning is an especially important capability for systems utilizing write once optical digital data disks. Depending on system configuration, corrections may be performed at the scanner capture station or at specially designated inspection or rescan workstations.

Training and supervision of operations staff is a key factor in maintaining acceptable image quality. As noted earlier, there are no objective empirical indicators of acceptable image quality for digitally scanned images. An alternative is to categorize documents based upon scanning problems and reach a consensus on how to most effectively capture the "best" image. Ideally, this decision process would involve a team consisting of image system production staff, records managers, and system users and researchers. These evaluations should include visual analysis of workstation display screen images and laser printer output. Retaining a set of representative laser prints for future reference would be a valuable image analysis benchmark tool.

Quality-control inspection programs have a direct impact on document conversion productivity and overall usefulness of the imaging system. For example, an inspection program may include a comprehensive visual comparison of 100 percent of images scanned to the original documents, or it may be limited to defining the inspection population based on a calculated percentage or sampling plan. The overall level of quality-control inspection should be extremely high if the original documents are not retained after conversion. In some systems, pass/fail decisions about the adequacy of the scanned images are based on operator judgments. These judgments may be fine-tuned through training and hands-on experience rather than relying on more objective criteria such as written agency guidelines or operations procedures. That is, if a screen image "feels right" to the scanner operator or quality-control technician, then the system's quality criteria have been achieved. In any case, prudent Federal agency managers should require immediate evaluation of each index entry and document image.
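
The difference between a 100-percent inspection and a percentage-based sampling plan can be sketched as follows. The function name, the use of simple random sampling, and the image numbering are assumptions for illustration; an agency's actual sampling plan would be set by its own quality-control procedures.

```python
import random

def select_for_inspection(image_ids, percent, seed=None):
    """Pick which scanned images an inspector should compare to the originals.

    percent=100 selects every image (a 100-percent inspection); smaller
    values draw a simple random sample, rounded up to a whole image.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if not image_ids:
        return []
    count = (len(image_ids) * percent + 99) // 100  # ceiling division
    rng = random.Random(seed)
    return sorted(rng.sample(list(image_ids), count))

# Hypothetical batch of 200 scanned pages, numbered 1 through 200.
batch = list(range(1, 201))
sample = select_for_inspection(batch, 10, seed=1)
full = select_for_inspection(batch, 100)
print(len(sample), "of", len(batch), "images selected at 10 percent")
```

With percent set to 100 the function returns the whole batch, which corresponds to the inspection level called for when the original documents are not retained.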

It is important to verify the digital images not only as captured at the scanning station but also when they are written to an optical digital data disk and after an optical disk-to-disk backup copy is created. Problems during these processes can result in images that are counted by the system as "pages" but are not retrievable when users attempt to view or print the file. Depending on the extent of the problem, if only a part of the page was copied or transferred, these corrupted files may contain useless system-generated noise or extraneous lines. Such problems can arise from several sources, including hardware component failures, software glitches, and power surges. Therefore, when transferring image data from the original magnetic hard drive or optical digital data disk to any other media, it is advisable to verify the images as copied. This verification can be performed manually or automatically using image verification software.
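
One way automated verification of a copy could work is to compare a checksum of each source file with a checksum of its copy. The sketch below assumes that the image files are visible as ordinary files in two directories and uses SHA-256 digests; the function names and directory layout are hypothetical, and this is not a description of any verification product observed at the sites.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_copy_errors(source_dir, copy_dir):
    """Compare every file under source_dir with its counterpart in copy_dir.

    Returns a sorted list of relative paths that are missing from the copy
    or whose contents differ from the source.
    """
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    problems = []
    for source_file in source_dir.rglob("*"):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        copied = copy_dir / relative
        if not copied.is_file() or file_digest(source_file) != file_digest(copied):
            problems.append(str(relative))
    return sorted(problems)
```

A check of this kind catches truncated or partially transferred files that the system would otherwise still count as pages, although it confirms only that the copy matches the source, not that the source image itself was scanned well.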

A 100-percent quality inspection of the index data is mandatory for continued successful operation and for maintaining system user confidence in an agency's digital imaging program. If the index data is key entered incorrectly or if errors are introduced during the OCR/bar-code automated data-capture process, the related digital images are essentially unretrievable. Index verification can be accomplished in several ways, including a visual comparison of the key-entered data to the displayed scanned images or paper documents. The index data can also be verified by double-keying, whereby the index data is manually rekeyed and the computer system's software automatically compares the two entries.
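
The automatic comparison step in double-keying can be sketched in a few lines. The field names and values below are hypothetical, and the function is an illustration of the idea rather than the logic of any particular indexing product.

```python
def compare_keyings(first_pass, second_pass):
    """Return the fields on which two independent key entries disagree.

    Each argument is a dict mapping field name -> keyed value. The result
    maps each mismatched field to its (first, second) values so that an
    operator can resolve the discrepancy against the source document.
    """
    mismatches = {}
    for field in sorted(set(first_pass) | set(second_pass)):
        first_value = first_pass.get(field)
        second_value = second_pass.get(field)
        if first_value != second_value:
            mismatches[field] = (first_value, second_value)
    return mismatches

# Hypothetical index record keyed twice by different operators.
first = {"accession_no": "1994-00127", "surname": "HARRIS", "date": "1994-07-14"}
second = {"accession_no": "1994-00127", "surname": "HARRIS", "date": "1994-07-41"}
result = compare_keyings(first, second)
print(result)
```

Only the fields that differ are reported, so the operator's attention goes to the single transposed date rather than to the whole record.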

User Experiences

Virtually every Federal agency site visited has some type of quality-control inspection program. The inspection programs vary as to whether visual or automated verification of image files is performed. Image inspectors use display monitors to visually verify the quality of scanned documents. A few system managers expressed confidence that experienced scanner operators are sufficiently skilled to obtain adequate image quality without inspecting each captured image. One of the sites visited uses an independent quality-control contractor responsible for monitoring the onsite conversion operations and verifying equipment calibrations. Under this configuration, agency management oversees the quality-control process and spot-checks the production efforts to verify ongoing conformance to established guidelines. Fewer than half of the sites visited employ specialized test targets for image quality evaluation (see FIPS PUB 157) to monitor the system's ongoing performance. As noted by one system administrator with extensive real-world experience, "a test target image is always better than a piece of paper with a footprint across it."

Recommendations

Routinely evaluate scanner performance based on quality control procedures recommended in FIPS PUB 157 "Guideline for Quality Control of Image Scanners."

Establish a consensus on what constitutes the "best" image for the different types of agency source documents; monitor ongoing image quality using the system's display screens and laser printers.

Perform a 100-percent visual quality evaluation of each scanned image and related index data; quality control inspections must be meticulous if the original documents are not retained after conversion.

Verify the information as copied when transferring images/data to any other media from the original magnetic hard drive or optical digital data disk.

If WORM disks are the storage medium of choice, permanently write the information only after conducting a thorough quality-control inspection of the scanned images and index data.


Section 5: Indexing Systems

Management Issues

Efficient retrieval of scanned document images and graphic data depends on an accurate, up-to-date index database. Indexing a digital image involves linking descriptive image information with file-header information. Index data is typically manually keyed in using the original documents or the scanned images, either at the time of image capture or later in the production process. Index data verification in which database entries are compared with the original source documents for completeness and accuracy is crucial: An erroneous index term may result in nonretrieval of the related image.

File-header information is automatically supplied by the imaging system's storage subsystem and generally incorporates an image reference number that is included in the descriptive information index. The descriptive index and the file-header index frequently exist as separate entities, creating a management issue for optically stored archival information. The indexing function of an image capture subsystem can use any one of several different approaches typically controlled by three factors:

  • Operations sequence
  • Data complexity
  • Input methodology

The operations sequence refers to the point in the workflow at which the index is input into the system. Some applications benefit from an existing database that can be utilized in conjunction with pointers to the related image file. In applications where the index must be created, two approaches are available. The first is to create the index before scanning; as each document is then scanned and its image file created, a pointer to that file is immediately placed in the corresponding index entry. The second is to scan the documents without an existing index and create the index after the image files exist. In this case, the data-entry operator keys in the index data and indicates the beginning and ending of the file, all in one step.

Data complexity is a significant factor to consider when designing an indexing system. In most digital image-based systems, the number of index fields is kept to a reasonable quantity, sufficient to provide the researcher with adequate information to locate the file without overburdening the database search. It is the researcher's responsibility to read the image and extract the information and not rely on the index to supply all of the required data. If a raster-based image system is encumbered with a large, complex indexing system, however, search times could be increased, and the efficiencies of the system will be somewhat negated.

With key-entry input methodology, the data-entry operator keys in the preidentified fields either from the digital image or from the original paper documents. The lowest point of direct access (usually at the file level) must carry its own index data. A file may consist of any number of pages, and individual pages within a file can then be directly accessed after the file is retrieved from the optical digital data disk. In some applications, however, each document page is its own file, and as a result, every page must be indexed individually. Depending on document characteristics, optical character recognition (OCR) may be useful for identification and capture of index data. Predefined document zones are scanned, and the raster-image data residing there is converted into ASCII character data. Applications that use standard forms or any type of page layout that contains consistent field locations may qualify for this technology. Currently, standard OCR technology can efficiently convert most standard type fonts and some structured hand printing to ASCII. Conversion costs can rise dramatically (doubling for every percent of improved accuracy) when rekeying OCR errors to obtain a 100 percent index data accuracy rate. If 100 percent index accuracy is not mandatory for a specific application, it is possible (though not recommended) to utilize the uncorrected OCR index version to support searches of the image files.

An alternative methodology for those agency applications that have little consistency in data location utilizes preprinted labels, or header sheets, that can be inserted at the beginning of each file to be scanned. These labels can use machine-readable bar codes or character data that are easily converted by OCR technology. Generally speaking, if it is possible to utilize OCR or some other form of machine-readability for index entry and is cost-beneficial to do so, that form should be utilized to gain speed, accuracy, and ease of use. However, manual key entry can provide rapid conversion rates if file sizes are large and data-entry fields are minimized.

Many mainframe-supported indexing systems can meet existing Federal Government database standards, such as Federal Information Processing Standards Publication (FIPS PUB) Number 127, "Structured Query Language (SQL)." However, many small- to medium-scale digital imaging systems tend to be targeted for a local area network (LAN) or desktop environment. Smaller systems may use index databases optimized for search-and-retrieval performance rather than compliance with interoperability and portability standards. As a result, as smaller systems outgrow their operating platform, it is very difficult to migrate them to a larger database environment. Therefore, smaller systems that have a potential to evolve into larger SQL environments should be developed with FIPS PUB 127 in mind, and the requirements analysis process must consider future growth to determine if there is a data portability requirement.

Performance and size issues generally drive the design of indexing systems. When these indexing systems are linked to optical image-retrieval systems, the end result often is a proprietary solution. In some cases it may be better to relax performance requirements in favor of an open-systems approach (see FIPS PUB 127) that facilitates migration of the index database while maintaining the existing image database.
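
The separation between the descriptive index and the file-header index, joined by an image reference number, can be sketched with plain SQL. The example below uses Python's built-in sqlite3 module only as a convenient stand-in for an SQL database; the table names, column names, and sample record are hypothetical, and nothing here is a claim of conformance to FIPS PUB 127.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image_file (
        image_ref   INTEGER PRIMARY KEY,   -- reference number from the file header
        disk_label  VARCHAR(32) NOT NULL,  -- optical disk that holds the image
        page_count  INTEGER NOT NULL
    );
    CREATE TABLE descriptive_index (
        entry_id    INTEGER PRIMARY KEY,
        title       VARCHAR(200) NOT NULL,
        doc_date    CHAR(10),
        image_ref   INTEGER NOT NULL REFERENCES image_file(image_ref)
    );
""")
conn.execute("INSERT INTO image_file VALUES (1001, 'DISK-A', 3)")
conn.execute(
    "INSERT INTO descriptive_index VALUES "
    "(1, 'Land patent application', '1994-07-14', 1001)"
)

def locate(connection, title_fragment):
    """Find index entries by title and return where their images are stored."""
    rows = connection.execute(
        "SELECT d.title, f.disk_label, f.image_ref, f.page_count "
        "FROM descriptive_index d "
        "JOIN image_file f ON f.image_ref = d.image_ref "
        "WHERE d.title LIKE ? ORDER BY d.title",
        (f"%{title_fragment}%",),
    )
    return rows.fetchall()

print(locate(conn, "patent"))
```

Because the index is expressed in ordinary tables and queries rather than a vendor-specific structure, it could in principle be exported and reloaded into a larger SQL environment while the image files stay where they are.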

Technology Trends

Research into and development of automated methods to accurately capture the required indexing data, including OCR and bar-code technologies, are ongoing. OCR is rapidly progressing, as demonstrated by improved recognition capabilities and higher conversion throughput speeds. Small office imaging systems now support integrated optical character recognition capability, although poor-quality documents may require higher performance scanning than is currently offered by turnkey systems. Vendors have also integrated bar-code technology for scanner control, file delineation, and index data collection.

LAN operating systems are beginning to utilize data compression techniques on LAN servers optimized for data retrieval. These techniques may make it possible to store medium-scale databases on network servers instead of mainframe systems. This trend may lessen the cost of implementing information retrieval systems that are linked to or integrated with image retrieval systems. Finally, large index data sets can now be backed up by very high-capacity tape cartridge subsystems, which can be integrated into multicartridge, multidrive storage-and-retrieval subsystems. This technology brings mainframe data backup and archive capability to a LAN server environment, making server-based imaging attractive and cost-effective.

User Experiences

Federal agency records holdings contain documents that are difficult to read, such as manually typewritten documents, facsimiles, carbon copies, and overly complex documents. Historically and practically, the existence of such documents has restricted widespread adoption of automated indexing technologies. Automated character recognition systems were often limited to applications containing a significant percentage of high-quality standardized forms. During the site visits conducted for this report, a Federal agency project manager remarked that "indexing is the dark side of imaging." The comment alludes to the manual, labor-intensive effort needed to achieve highly accurate index field data, often under the pressure of a tight production schedule.

The Federal agency sites visited used both manual data-entry and automated-entry indexing processes. Data-entry operators at 10 of the sites manually key the descriptive index data, using either the original documents or the displayed scanned images. The Bureau of Land Management's (BLM) contractor staff, for example, view the digital screen images while keying data into 35 distinct fields. The keyed data is subsequently verified by BLM agency staff, and an overall accuracy rate of 99.5 percent is claimed. The computer system assists the key-entry operators by automatically completing certain predefined fields. Another example is the Environmental Protection Agency's Superfund digital imaging system, which supports manual indexing using a keyboard, mouse, or bar-code reader.

System software observed at three systems storing digital data (nonscanned images) automatically selects and creates the index information. For example, the National Earthquake Information Center's National Seismographic Network automates the functions of data capture, indexing, transmission, and user access. The automated indexing of the data files occurs at the point of capture and initial data storage, with the seismic event's recorded chronological elapsed time (duration) serving as the primary index key. The Library of Congress has automated the index database creation by capturing index data (in this case, a unique accession number) from specially created input header sheets, with OCR errors corrected by key-entry operators. Automating an agency's indexing process improves the conversion project's throughput and increases the recorded data's accuracy as well.

Recommendations

As appropriate, ensure that information retrieval software is SQL compliant.

Regardless of the capture methodology utilized, conduct a 100-percent quality-control inspection of all index data.

Index Database Location

Technology Trends

Several options are available for storing index data linked to optically stored raster images or digital data. Magnetically stored index data provides improved data searching and image access and simplifies the procedures for modifying the index information. However, this approach requires the retention and periodic recopying of two separate systems: the magnetically stored index data and the optically stored images. Depending on an agency's data storage requirements, backup copies of the index data should also be considered, providing redundancy and protection against accidental data loss. The index data can also be recorded directly onto the optical digital data disks, serving as a permanent table of contents to the adjacently stored images. This technique may be useful to agencies with specialized indexing, preservation, or data security requirements.

User Experiences

All 15 Federal agency sites visited store their index data using magnetic storage technology. The Library of Congress, for example, maintains a unique document index accession number along with document location pointers to the optically stored images. Storing this index accession information magnetically allows the Library staff to update or modify the data as needed. The Patent and Trademark Office maintains patent index data in magnetic format, but unlike the other agencies surveyed, maintains the data in optical format as well. While the Patent Office's magnetically stored index data provides ease of searching and modification, system administrators also permanently record index data on each optical digital data disk. The optically stored index data functions as a permanent directory or table of contents to the digital images. Under the Patent Office's dual storage concept, the index and image data are inseparable, helping to ensure the long-term linkage of these two information resources.

Recommendations

Store the index data magnetically for improved operations and optically if long-term preservation is a concern.

Index Database Complexity

Technology Trends

The index data processed by the database software may be sufficiently descriptive to answer user queries without needing to retrieve the digital images. Operator productivity is highest when the index data can be easily extracted. Depending on indexing system requirements, considerable staff time is needed to manually key several fields of data or the complete text for full-text searching. Based on the site visits and discussions with Federal agency system administrators, the optical imaging systems observed are based upon one of the following generic retrieval concepts:

  • Images (digital data) or clustered groups of images are linked to a unique personal identifier, such as a social security number, stored in a simple flat database. Such a database is easily created and maintained, requiring little from the systems administrator in the way of sophisticated programming.

  • Images (digital data) or clustered groups of images can be easily linked to a sophisticated, preexisting database, where the primary responsibility of the imaging system operator is to ensure that each image or set of images is linked to the appropriate database record.

Descriptive information retrieval subsystems generally must be capable of rapid search through large databases with a variety of query options. Image retrieval subsystems generally do not require much storage space for the file-header index, and their ability to deliver the image to the user's workstation depends on the capabilities of the storage or access device and the communications subsystem. The index data retrieval and the image retrieval subsystems must be sufficiently integrated for an imaging system to function within the user's expectations.

User Experiences

Some variation of the less complex, flat-file indexing system was observed in operation at 10 of the Federal agency sites visited. The Commodity Futures Trading Commission's indexing system uses an OCR server that converts bit-mapped images to ASCII text. The ASCII text files are directed to the network server for indexing by applications software. Tagged index items include newspaper clipping headlines, originating news sources, and approximately 50 categories of page topics entered for later use as searchable database keywords. The Environmental Protection Agency's (EPA) imaging system, on the other hand, uses a more complex indexing system linked to the agency's existing IBM AS/400 host minicomputers. The EPA's regional office operations personnel manually key-enter the index information using the original documents prior to scanning. Bar-coded information serves as an alternative to manual key entry whenever possible. The indexing software provides workstation pulldown user interface menus controlled by mouse or keyboard. Operational productivity is further enhanced when the index data display screens provide users with sufficient information to effectively preclude the need to retrieve the digital images.

Recommendations

Index system design and capability decisions should be based on a thorough analysis of agency operations and user needs.


Section 6: Optical Digital Data Disk Systems

Figure 2. Optical Disk Storage Subsystem

Management Issues

Federal agency administrators responsible for long-term records custody can benefit from adopting a technology-based information storage-and-retrieval system. The high-capacity data storage offered with optical digital data disks is of special interest to archivists, records managers, and others concerned with information preservation. Selecting the most appropriate information storage system should be based on a comprehensive analysis of each Federal agency's immediate and long-term needs. An analysis of an agency's functional needs should reflect the requirements of the agency's entire user community, including administrators, program managers, information resources management (IRM) officials, technical support staff, and the general public, if appropriate. Although the imaging industry emphasizes benefits of efficient access and file handling, too often cost factors assume a significant role in the decision-making process. Preservation concerns such as system or media longevity, the need for routine system upgrades, transfer of image or indexing data, and similar issues are subsequently relegated to a subordinate status. An important element to consider is the expected usability of the information storage media over the life of the digital-imaging and optical digital data disk storage system.

Optical Digital Data Disk System Configuration

Technical Trends

Optical digital data disk storage components can be integrated into diverse system configurations, ranging from single user workstations or multiple users operating under client-server configurations to mainframe computers with hundreds of remote user terminals. An optical storage subsystem may include:

  • System mainframe, minicomputer, or personal computer (PC) server platforms
  • Optical disk drive and optical digital data disk media
  • Jukebox storage equipment
  • Systems and applications software

Figure 2 illustrates one possible configuration of an optical digital data disk storage and retrieval subsystem.

A Federal agency's information storage requirements should be assessed based on several factors including required data-storage capacity, retrieval effectiveness, system and component reliability, capacity to ensure data integrity, and the longevity of both the selected storage media and the related imaging system components. Additional technical factors to consider are physical characteristics of the media, the optical digital data disk recording process specified, data-recording characteristics and usable storage capacity typically expressed in gigabytes (GB), and optical digital data disk durability and compliance with national and international standards.

User Experiences

Due to rapid technological changes, Federal agency program managers are grappling with devising realistic storage migration plans. Onsite interviews with Federal agency administrators indicated a general awareness of a need to eventually upgrade system components or to develop alternative approaches. The onsite visits revealed that a few Federal agencies have already modified, upgraded, or replaced individual components or entire subsystems. They took these steps while seeking higher performance and improved access to the stored information. The changes include improvements to system applications software, higher resolution workstation displays and increased cache memory, integration of high-performance Reduced Instruction Set Computer (RISC) workstations, and the acquisition of additional optical digital data disk drives or jukeboxes and higher capacity optical digital data disks.

A majority of the 15 agency sites surveyed use PC-based platforms as workstations or servers, and more than half are controlled by or have linkages to a mainframe or minicomputer system. These mainframe links typically enhance user access to the optically stored data while also providing a passageway to existing index database retrieval systems. Additionally, although almost all of the systems surveyed utilize some type of local area network (LAN), optical digital data disk systems vary in size, complexity, and methodology. Data indexing, file construction, and workflow procedures are in many cases tailored to the unique application requirements of individual sites.

Most of the system sites visited for this report utilize a jukebox for optical digital data disk storage. The Patent and Trademark Office has adopted a unique approach by integrating a series of dedicated single-disk rapid access optical drives (RAD) in addition to conventional jukeboxes. The RAD technology provides optimized information retrieval in support of user requests. In contrast, several of the sites surveyed use stand-alone optical drive systems in which the disks are manually handled by operations staff. These manual systems are often upgraded to more automated operations as user needs dictate improved system response. All of the systems visited that process digital-image data use higher resolution (100 dpi to 150 dpi) display monitors and laser-printing equipment.

Optical Digital Data Disk Recording Technologies

Technical Trends

Depending on agency requirements, digital-imaging and optical digital data disk storage systems may incorporate either of two incompatible approaches for recording digital information:

  • Write once read many (WORM) systems
  • Rewritable systems

Write Once Read Many (WORM) Systems. Write once read many, or WORM, optical media were first introduced in the early 1980's and remain a popular choice for digital imaging systems. WORM processes record data by means of a laser beam permanently altering the reflective characteristics of the disk's recording surface or sensitized layer(s). WORM optical recording technology processes include ablative, thermal bubble, bimetallic alloy, dye-polymer, and phase change. The ablative process, the earliest commercially available optical recording technology, alters the optical digital data disk's reflective characteristics by creating submicron-sized pits, or bubbles, to indicate 1's and 0's. Thermal bubble recording creates bubbles on the optical media's surface using the laser beam's concentrated energy. Bimetallic technology uses a laser beam to fuse several alloys together into a totally new alloy with a different reflective index. In the dye-polymer and phase change recording technologies, the laser beam alters the media's physical color and reflective characteristics, with the information signified by the color changes. Because these recording technologies involve a nonreversible physical alteration of the recording surface, they are designated "write once." Although each technique has advantages and disadvantages, from an archives and records management perspective, none is known to be inherently superior. At the present time, there is no preferred recording process for records of long-term value.

When data integrity is a primary concern, WORM optical digital data disks become attractive because the recorded data cannot be altered or erased. If at some point an existing image is determined to be incorrect or is no longer needed, the electronic pointer to that search location can be disabled. This pointer disabling process effectively eliminates future user access. New or corrected data is then written to a different, unused area of the optical digital data disk. This process is essentially transparent to the user community. Although access to the original image is blocked, the data is still on the WORM disk and potentially remains accessible unless expressly overwritten by the system manager to obliterate the recorded data pattern.
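
The pointer-disabling process described above can be illustrated with a minimal sketch. It assumes a simple in-memory model: the class name `WormDisk`, its methods, and the dictionary index are hypothetical and do not represent any vendor's file system. The sketch shows only that a correction is written to a new, unused area and that disabling a pointer leaves the recorded data physically present.

```python
class WormDisk:
    """Hypothetical append-only model of a WORM disk and its retrieval index."""

    def __init__(self):
        self._sectors = []  # recorded data; entries are never changed in place
        self._index = {}    # document id -> sector number of the active version

    def write(self, doc_id, data):
        """Record data in the next unused sector and point the index at it."""
        self._sectors.append(bytes(data))
        self._index[doc_id] = len(self._sectors) - 1
        return self._index[doc_id]

    def disable(self, doc_id):
        """Remove the pointer only; the recorded sector is left untouched."""
        self._index.pop(doc_id, None)

    def read(self, doc_id):
        """Return the active version, or None if the pointer is disabled."""
        sector = self._index.get(doc_id)
        return None if sector is None else self._sectors[sector]

    def raw_sector(self, number):
        """Low-level read that bypasses the index."""
        return self._sectors[number]
```

In this model a corrected image replaces the pointer rather than the data, so ordinary users see only the newest version while a low-level read can still recover the earlier one.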

One of the most recent developments in write once read many optical digital data disks is the emergence of compact disc recordable, or CD-R. Beginning in the mid-1980's, compact disc read only memory (CD-ROM) rapidly became a popular means for publishing and distributing digital information. This acceptance was reinforced as international standards were developed to ensure the interchange of CD-ROM discs across a variety of players. The conventional CD-ROM publishing process is complex and often requires access to commercial mastering facilities that achieve economies of scale only when adequate numbers of discs are produced. Hence, CD-ROM's utility in the production of only one or two discs, common practice in archival and records management environments, has been limited. This situation is rapidly changing with the introduction of relatively low-cost CD-R equipment that records data directly onto individual discs in-house. A major advantage is that CD-R employs the same International Organization for Standardization (ISO) standards for physical media (ISO/IEC 10149) and file formats (ISO 9660) as CD-ROM. Adopting these standards greatly enhances disc interchange and the ability to access information stored on CD-R discs.

Rewritable Systems. Rewritable optical digital data disks offer a viable alternative for digital data that requires frequent update or revision. Rewritable disks offer users the ability to update recorded data as is now done with magnetic hard disks. Rewritable optical digital data disks are commercially available in 3.5-inch and 5.25-inch diameters, with other formats under research and development. Although WORM systems currently are predominant, industry observers predict continued growth of rewritable formats. The marketplace currently offers two incompatible rewritable techniques: magneto-optical (MO) and phase change.

Magneto-optical systems combine properties of magnetic and optical technologies. The data recording, or "write," process uses a laser beam to heat (usually to the Curie point) a premagnetized site on the optical media's recording surface. The heat process causes a reversal of the magnetic polarity, resulting in subtle reflective differences that are sensed by the "read" laser beam. The process is reversible to effectively erase the digital data. There are approved national and international standards for some types of magneto-optical media.

Phase change rewritable processes alter the optical media's amorphous recording surface. Existing data can be erased and new data written during the same disk rotation, providing a direct data overwrite capability. Standardization of phase change drives and media has been slow compared to that of magneto-optical technology.

Federal agencies have several options regarding the selection of commercially available optical media systems. Conventional ablative-type WORM systems have been joined in the marketplace by rewritable systems offered as basically equivalent alternatives to WORM recording systems. Depending on the manufacturer's approach, these rewritable optical systems utilize prerecorded security codes, or software "locks" integrated into the optical drives or the rewritable optical media or both. These firmware/software security measures serve to control the system manager's data write and overwrite authority, effectively providing a WORM function with rewritable media.

In the final analysis, selection of either WORM or rewritable optical digital data disk technologies depends on several factors including the Federal agency's application requirements, regulatory constraints, available resources, and standardization issues. Ablative-type WORM technology appears to offer a substantial degree of information security and user confidence because, unlike rewritable technology, it is nonreversible. It is important to note that user confidence in the integrity of stored data is usually based on more than media reversibility or nonreversibility. In fact, data security functionalities, which can limit users to "read only" or provide an audit capability that tracks users with "write" privileges, are far more important.

User Experiences

No single WORM process was identified as the overwhelming, definitive choice among the 15 Federal agency sites surveyed. The actual recording process was only one of several factors considered, including the optical digital data disk manufacturer's name recognition, industry reputation, and history of producing quality products. Five of the fifteen systems employed rewritable technology. One example is the Department of the Army PERMS program's utilization of 5.25-inch rewritable magneto-optical recording technology as interim, or temporary (in-process), scanned image storage prior to transferring the image data to 12-inch WORM media. Another example is the Army's captured documents (DOCEX) system, which used digital audiotape (DAT) technology as interim storage, followed by image data transfer onto 5.25-inch rewritable magneto-optical disks for permanent storage. Regardless of the format selected, the site visits indicated that all optical digital data disk systems were functioning in conformity with the manufacturer's specifications and were meeting user needs.

Recommendations

Either WORM or rewritable technologies may be used, with the actual selection determined by the agency's specific application requirements. Ensure that read/write privileges are carefully controlled and that an audit trail of rewrites is maintained when rewritable technology is used.
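
A minimal sketch of the controls named in this recommendation follows. It assumes a simple in-memory store; the class `AuditedStore`, the user names, and the log fields are hypothetical and are not drawn from any system surveyed. It shows write privileges restricted to named users and every write or rewrite recorded in an audit trail.

```python
from datetime import datetime, timezone


class AuditedStore:
    """Hypothetical rewritable store with write control and an audit trail."""

    def __init__(self, writers):
        self._writers = set(writers)  # users holding "write" privileges
        self._data = {}
        self.audit_log = []           # one entry per write or rewrite

    def read(self, key):
        """Any user may read; reads are not logged in this sketch."""
        return self._data[key]

    def write(self, user, key, value):
        """Write or rewrite a record, refusing users without privileges."""
        if user not in self._writers:
            raise PermissionError(f"{user} has read-only access")
        action = "rewrite" if key in self._data else "write"
        self._data[key] = value
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "key": key,
        })
```

A production system would also protect the log itself from alteration, for example by recording it on write-once media.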

Optical Digital Data Disk Storage Capacity

Technical Trends

Optical digital data disk storage capacity is based on several factors including the physical diameter (e.g., 5.25 inches) of a specific disk, the state of the optical recording technology, and the compliance to applicable industry standards. Continuing engineering developments in technical areas such as track widths, data recording, and shorter wavelength laser diodes (e.g., blue and green) are expected to dramatically improve optical digital data disk storage capacities. For example, decreasing the distance between adjacent tracks or reducing the spot size of the recording laser beam will allow more data to be recorded onto each optical digital data disk.

Table 2 illustrates the general storage capacities of various optical media formats and recording technologies.

Table 2. Optical Digital Data Disk Storage Capacities
Disk Type         Diameter              Capacity
CD-ROM            4.72 in.              550 megabytes (MB)
CD-R              4.72 in.              600 MB
WORM              5.25 in. (130 mm)     650 MB - 2 GB
WORM              12 in. (300 mm)       2.18 GB - 5.6 GB
WORM              14 in. (356 mm)       6.8 GB - 13 GB
Rewritable-MO     3.5 in. (86 mm)       128 MB - 384 MB
Rewritable-MO     5.25 in. (130 mm)     650 MB - 2 GB
Phase Change      5.25 in. (130 mm)     940 MB - 1.5 GB
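
The capacities in Table 2 can be turned into a rough estimate of how many disks a collection would need. The sketch below is illustrative only: the figure of 50,000 bytes per compressed page image is a hypothetical assumption, not a value from this report, and the calculation uses decimal units and ignores directory and formatting overhead.

```python
MB = 10**6  # decimal megabyte, assumed for this estimate
GB = 10**9  # decimal gigabyte, assumed for this estimate


def disks_needed(page_count, bytes_per_page, disk_capacity_bytes):
    """Return the number of disks needed, rounding any partial disk up."""
    total = page_count * bytes_per_page
    return (total + disk_capacity_bytes - 1) // disk_capacity_bytes


# One million page images at an assumed 50,000 bytes each:
small_format = disks_needed(1_000_000, 50_000, 650 * MB)        # 5.25-inch, low end
large_format = disks_needed(1_000_000, 50_000, 5_600 * MB)      # 12-inch, high end
```

Under these assumptions the same collection fills 77 of the 650 MB disks but only 9 of the 5.6 GB disks, which is one reason larger formats persist in large-scale imaging environments.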

Many industry observers believe that the ability to remove and interchange media is a key factor in the long-term market success and viability of optical digital data disks. The intense marketplace competition that existed during the early development of 12-inch WORM optical digital data disk technology meant that little effort was devoted to standardization. Consequently, 12-inch WORM disks are not interchangeable. Learning from this experience, industry leaders adopted a more cooperative approach with other optical digital data disk formats and achieved greater technical standardization.

For example, 5.25-inch diameter and smaller optical digital data disk formats are increasingly popular due to several factors: greater data storage capabilities, reduced costs for the optical drives and media, high-performance jukeboxes, and widely accepted industry standards. CD-R proponents believe that the current storage capacity of approximately 600 megabytes for the 4.72-inch discs will increase dramatically. Industry observers expect the increased storage capacity, lower costs for equipment and media, plus the interchange capability offered with CD-R's, will make them the optical media of choice for many applications. Nonetheless, larger diameter WORM optical digital data disks are likely to continue in large-scale digital-imaging environments. Laboratory research and development in areas such as laser wavelengths and read-head design will further increase optical media storage capacities and performance. These research efforts will impose additional demands on optical media and drive manufacturers to continue to improve product performance. Newly introduced products should be evaluated in terms of their unique system capabilities, applicability to agency needs, and compliance with existing industry standards.

User Experiences

The 12-inch WORM optical digital data disk format was the predominant medium at the 15 Federal agency sites surveyed. For example, the Environmental Protection Agency's Superfund imaging system uses 12-inch WORM media in support of the compilation, retention, and distribution of cost-recovery reports and legal documents associated with cleaning up high-priority toxic waste sites. The Federal Communications Commission uses 12-inch WORM optical digital data disks to store and retrieve official records of agency rule-making and adjudicatory matters.

In comparison, several agencies visited use more than one optical digital data disk format. The Commodity Futures Trading Commission, for example, utilizes 12-inch WORM and 5.25-inch multifunction (WORM and MO) technology within one imaging platform for newsclips and agency legal files. Another example is the Department of the Army's PERMS system. The PERMS conversion site stores the scanned digital images on 5.25-inch rewritable magneto-optical disks. This image data is subsequently transferred to 12-inch WORM optical digital data disks for jukebox retrieval access. No agency sites surveyed utilized 14-inch WORM media.

Table 3 illustrates data collected during the 15 Federal agency site visits pertaining to disk sizes, recording methodologies, and system startup dates. More than half of the sites visited installed their 12-inch WORM optical digital data disk systems since 1990, with the rewritable magneto-optical format experiencing growth in 1991–92. Federal agency program administrators noted that the selection of critical system functional criteria such as optical digital data disk format or data-recording technology is optimally based on an analysis of organizational and end user needs.

Table 3. Optical Digital Data Disk Size/Recording Methodology
System Start Date    12-inch WORM    5.25-inch WORM    5.25-inch MO (Rewritable)
1985-1988            3               0                 0
1989                 1               0                 0
1990                 3               0                 0
1991                 3               0                 2
1992                 2               1                 2
1993                 0               1                 1
Totals               12              2                 5

Recommendations

Based on a requirements analysis and systems design study of an agency's operations, select the most suitably sized optical storage form factor that satisfies the agency's long-term programmatic needs and conforms to industry standards.

Jukebox Storage Systems

Technical Trends

A jukebox automates the storage and retrieval of optical digital data disks. Digital-imaging systems with only a few disks or minimal reference requests may rely instead on a multiple-drive tower configuration, cartridge system, or even manual loading of individual optical media into stand-alone optical drives. Factors to consider in selecting a jukebox include the planned data growth rates, the agency's information retrieval needs, and system acquisition and maintenance costs. A jukebox storage system is especially valuable in public access environments to avoid risks of damage or theft inherent in the manual selection, insertion, and refiling of optical digital data disks. Advantages of jukeboxes include elimination of manual disk loading, increased physical security for the optical digital data disks, and improved user access. Conversely, disadvantages of jukeboxes include system procurement costs, additional hardware maintenance, and slower information access as compared to magnetic storage technology.

Jukeboxes contain one or more internal optical drives; a disk "picker," or selector, mechanism; a specified number of disk storage bins or slots; and computer system interface controllers. Precise internal jukebox mechanical alignments are required to avoid optical media damage. The disk's identification data is read upon loading and logged into the controller's memory. The optical digital data disk is then transported to an available jukebox storage location or bin. When servicing a user's request, the computer system's database software automatically matches the query to an optical digital data disk location. The jukebox's disk selector mechanism delivers the requested optical digital data disk to an available optical drive. Following data read/write operations, the optical digital data disk is automatically returned to its preassigned bin. These operational steps are normally invisible to end users.
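
The operational sequence described above can be sketched as a small model. This is a simplified illustration: the class `Jukebox` and its methods are hypothetical names, and the model ignores picker timing, double-sided media, and request queuing. It shows a disk being logged and assigned a bin on loading, delivered to an available drive on request, and returned to its preassigned bin afterward.

```python
class Jukebox:
    """Hypothetical model of bins, drives, and the disk selector's moves."""

    def __init__(self, bin_count, drive_count):
        self.bins = [None] * bin_count      # bin number -> disk id, or None
        self.drives = [None] * drive_count  # drive number -> disk id, or None
        self.home_bin = {}                  # disk id -> preassigned bin number

    def load(self, disk_id):
        """Log the disk's identity and store it in the first free bin."""
        slot = self.bins.index(None)  # raises ValueError if every bin is full
        self.bins[slot] = disk_id
        self.home_bin[disk_id] = slot
        return slot

    def mount(self, disk_id):
        """Deliver a disk from its bin to the first available drive."""
        slot = self.home_bin[disk_id]
        if self.bins[slot] != disk_id:
            raise RuntimeError("disk is already mounted")
        drive = self.drives.index(None)  # raises ValueError if all drives busy
        self.bins[slot] = None
        self.drives[drive] = disk_id
        return drive

    def unmount(self, drive):
        """Return the disk in a drive to its preassigned bin."""
        disk_id = self.drives[drive]
        if disk_id is None:
            raise RuntimeError("drive is empty")
        self.drives[drive] = None
        self.bins[self.home_bin[disk_id]] = disk_id
```

In a working system the database software would map each query to a disk identifier before calling the equivalent of `mount`, so these steps stay invisible to the end user.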

Jukebox systems can be obtained for all existing optical media sizes, although 5.25-inch and 12-inch formats are currently in greatest demand. A variation on the full-sized jukebox is the magazine jukebox, supporting fewer optical platters in modular-type removable magazines or cartridges. These systems can be cost-effective for small- to medium-sized imaging applications. The cartridge-type disk selector mechanism is generally faster than a full-sized jukebox, and the hardware supports read/write to both sides of the optical media without the need to physically flip the disk. Although several smaller jukeboxes can be linked or chained together, large-scale Federal agency applications should still evaluate full-size high performance jukeboxes.

Another conceivable alternative to a jukebox is a network of multiple stand-alone drives loaded with continuously rotating optical digital data disks. This configuration responds almost immediately to user requests for optically stored information, limited only by the optical drive's data access and transfer rates. Disadvantages of these dedicated-disk multiple-drive configurations are increased hardware procurement costs, computer room floor space, and higher equipment maintenance costs.

User Experiences

Jukeboxes are used at most of the 15 Federal agency sites visited for this report. One example is the Patent and Trademark Office, which employs a series of jukeboxes to store 300 dpi images to support the agency's printing needs, augmented with rapid-access disk drives storing 150 dpi images. In comparison, the Commodity Futures Trading Commission has lower reference demands and consequently installed a multicartridge magazine optical disk autochanger. Rather than obtaining a jukebox, the Army's captured documents system successfully used a multidrive tower configuration for rewritable optical digital data disks. Sites visited that lack jukebox equipment attribute this condition to procurement problems or budget constraints.

Recommendations

When selecting an optical digital data disk jukebox, consider the following factors: the overall information access needs (staff and public), budget and procurement considerations, and existing operations staff requirements.

Error Detection and Correction

Technical Trends

Optical digital data disk technology uses two methods to minimize digital data-recording and data-retrieval errors. The first uses powerful algorithmic error correction codes to automatically detect and correct data-read errors. The second technique employs error correction code software to determine if and when the utilization of error correction codes is approaching a critical point. If so, the data sector is automatically rewritten to another sector on the optical digital data disk referred to as the spare sector area.

Error Detection and Correction (EDAC) computer chips in the optical drive electronics subsystem routinely monitor the raw uncorrected error rates and provide for automatic correction. The EDAC system also monitors the seriousness of the required correction. When the error correction reaches or exceeds certain thresholds, the sector in error is copied to another area and appropriate tables are updated. Unfortunately, the disk drive's EDAC system does not always provide advance warning of an impending copy operation to the computer's operating system. To users, and even to the computer system itself, everything may appear normal when in fact the error threshold that triggers the automatic copy (reallocation) may be reached on the next read operation. If the error rate were to rise suddenly, as in the case of a new blemish on the media, the supply of spare sectors available for copy operations could be seriously depleted. This situation can lead to a permanent inability to retrieve the optically stored data. Error correction status information also provides an audit trail to measure the progress and degree of data degradation.
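
The threshold and spare-sector behavior described above can be sketched in simplified form. In real drives this logic resides in firmware; the class `SectorMonitor`, the threshold value, and the status strings below are hypothetical and serve only to show how reallocation consumes a finite supply of spare sectors.

```python
class SectorMonitor:
    """Hypothetical model of threshold-driven sector reallocation."""

    def __init__(self, spare_sectors, threshold):
        self.spares_left = spare_sectors  # spare sectors still unused
        self.threshold = threshold        # corrections that trigger a copy
        self.remapped = {}                # original sector -> spare index
        self.history = []                 # audit trail of every reading

    def record_read(self, sector, corrected_errors):
        """Log one read and reallocate the sector if the threshold is met."""
        self.history.append((sector, corrected_errors))
        if corrected_errors < self.threshold:
            return "ok"
        if self.spares_left == 0:
            return "at-risk"  # no spare sector remains for a copy
        self.spares_left -= 1
        self.remapped[sector] = len(self.remapped)
        return "reallocated"
```

The sketch returns a status to the caller on every read, which is exactly the reporting that the paragraph above notes is often missing between the drive and the operating system.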

The Association for Information and Image Management (AIIM) C21 Committee (Optical Disk Applications) is working with optical digital data disk users and drive vendors to develop a standardized set of tools for monitoring and reporting media errors to the system. Users can apply these tools to monitor possible data degradation. These efforts have led to the proposed national standard ANSI/AIIM MS59-199X entitled Use of Media Error Monitoring and Reporting Techniques for Verification of the Information Stored on Optical Digital Data Disks. Predicting the point at which an optical digital data disk is no longer readable is critical so that the data can be recopied in time. The proposed standard provides tools that a user or system integrator can use to design a utility for obtaining media error information. AIIM Committee C21 is also developing a Technical Report that will provide guidelines on how to use the tools included in ANSI/AIIM MS59.
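
One way such error information might be used is to extrapolate a trend toward a recopy limit. The sketch below is not part of the proposed standard: the function name, the sample format, and the assumption that error rates grow linearly over time are all hypothetical, and actual media degradation may follow a different curve.

```python
def estimate_crossing(samples, limit):
    """Fit a least-squares line to (time, error_rate) samples.

    Returns the time at which the fitted line reaches `limit`, or None when
    there are too few samples or the error rate is not increasing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_e = sum(e for _, e in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in samples)
    if sxx == 0:
        return None  # all samples taken at the same time
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend to extrapolate
    intercept = mean_e - slope * mean_t
    return (limit - intercept) / slope
```

A system administrator could schedule recopying well before the estimated time, treating the result as a planning aid and not as a prediction of failure.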

User Experiences

The 15 site visits revealed that no information is routinely provided to system administrators detailing the extent to which the automated error detection and correction mechanisms are invoked during the optical digital data disk read or write operations. A vast majority of system administrators reported that they have no in-depth understanding of the use of the error detection capability provided by the equipment manufacturer. No loss of information stored on the optical media was attributed to the error correction subsystems.

Recommendations

Require that equipment conform to the proposed national standard ANSI/AIIM MS59-199X, "Use of Media Error Monitoring and Reporting Techniques for Verification of the Information Stored on Optical Digital Data Disks."

Small Computer Systems Interface

Technical Trends

The Small Computer Systems Interface (SCSI) is one of the most important system component developments in recent years. The SCSI is the primary communications interface used in optical digital data disk systems. The American National Standards Institute (ANSI) X3T9.2 committee developed the intelligent parallel interface for transfer of information to and from mass storage devices such as tape, magnetic disk, and optical disk drives. It is the primary mechanism that enables optical drives and other peripheral devices from different manufacturers to communicate freely with each other. Although this standard is comprehensive, many of the specific implementation factors of the SCSI access method are the responsibility of local systems integration firms. This situation results in incompatibilities at the software level among the different manufacturers' disk drives.

The "Write and Verify" command available within the SCSI is particularly valuable for error checking. This command can help ensure that accurate data is written to the optical digital data disk during the initial recording process. In SCSI parlance, when the "Write and Verify" command is invoked, it requires that the target device (typically the optical drive controller board) write the data transferred from the initiator device (typically the central processing unit or CPU) to the medium and then verify that the data is correctly written. Optical digital data disk technology (including the SCSI interface) has not stabilized to the point that "one size fits all." The existence of interface standards such as SCSI-1 and SCSI-2 does not necessarily mean that all SCSI interfaces will be compatible. A properly designed optical mass storage system can overcome this problem by including either a high-speed communications capacity or a magnetic tape drive with associated software.
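
The write-then-verify principle can be illustrated at the application level. The sketch below is not the SCSI command itself, which executes inside the target device; it assumes any seekable file-like object as a stand-in for the medium, and the names `write_and_verify` and `FlakyDevice` are hypothetical.

```python
import io


def write_and_verify(device, offset, data):
    """Write data at offset, read the same span back, and compare."""
    device.seek(offset)
    device.write(data)
    device.flush()
    device.seek(offset)
    return device.read(len(data)) == data


class FlakyDevice(io.BytesIO):
    """Simulated bad medium that records zero bytes instead of the data."""

    def write(self, data):
        return super().write(b"\x00" * len(data))
```

With a sound device the function returns True; with the simulated bad medium it returns False for any nonzero data, signaling that the record should be written again elsewhere. A read-back through operating system buffers may not reach the physical medium, which is why verification by the drive itself is preferable.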

User Experiences

Data collected during the Federal agency site visits indicated that technical details concerning SCSI implementation are generally the responsibility of the systems integrator. Program managers are able to utilize optical media storage systems without any in-depth SCSI operational knowledge. No loss of system functionality was attributable to any particular SCSI interface in use.

Recommendations

Specify the SCSI "Write and Verify" command when writing data to optical digital data disks.

Require system manufacturers and integrators to provide complete documentation on the specific configuration of the SCSI (or other interface) hardware and software.

Backward Compatibility of Optical Systems

Technical Trends

Unlike analog formats of paper or microfilm, optical digital data disks are digitally encoded and are not "human-readable," necessitating continued access to specialized retrieval equipment; therefore compatible retrieval equipment will be needed for as long as the data must be accessed. Technological obsolescence is a major impediment to digital data access over time. Backward compatibility minimizes the impact of obsolescence when acquiring new or replacement information storage systems, ensuring that information stored by an older digital storage system can be read, converted, and integrated into the new generation. Since the useful life of most computer components is finite, agency administrators are responsible for transferring digital data to any newly acquired systems.

User Experiences

The 15 site visits revealed that backward compatibility issues are not a major concern. The Patent and Trademark Office, the earliest adopter of optical storage technology surveyed, successfully "migrated" image data from an older optical digital data disk generation to a newer one. This migration was partially accomplished by converting a proprietary network system to an open-systems environment in which the older drive controllers (when upgraded) could communicate with the applications program in the same way that newer drive systems could. Once the next generation optical drives became operational, it was then possible to copy or load the information stored on the older media to the newer technology media. Program administrators noted that successful data migration requires high-level planning and critical technology forecasting to be effective. An in-house agency system architect, or "visionary," is important to ensure that the existing and replacement processes are compatible.

Recommendations

Require upgrades or replacement systems to be backward compatible with existing information systems.

Or

Convert the existing digital information to the new format at the time of system upgrade or acquisition.
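
When data is converted or copied to a new system, each copy can be checked against its source. The sketch below shows one possible approach, assuming records are addressed by key and that the old and new systems are reached through caller-supplied functions; the function name `migrate` and its parameters are hypothetical, and the use of SHA-256 digests is an illustrative choice, not a method described in this report.

```python
import hashlib


def migrate(keys, read_old, write_new, read_new):
    """Copy each record, then re-read the copy and compare SHA-256 digests.

    Returns (manifest, failed). The manifest maps each verified key to the
    hex digest of its source record; failed lists keys whose copy differed.
    """
    manifest, failed = {}, []
    for key in keys:
        source = read_old(key)
        write_new(key, source)
        expected = hashlib.sha256(source).hexdigest()
        actual = hashlib.sha256(read_new(key)).hexdigest()
        if actual == expected:
            manifest[key] = expected
        else:
            failed.append(key)
    return manifest, failed
```

The manifest can be retained with the migrated records so that later generations of equipment can confirm the data is unchanged.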

Optical Digital Data Disk Longevity

Technical Trends

Longevity is defined in this report as the useful shelf-life expectancy of optical digital data disks before writing (prewrite), plus the estimated post-write data lifespan. Most optical digital data disk manufacturers guarantee a minimum prewrite shelf life of 5 years, which is considered sufficient for most programs employing supply-inventory controls. A postwrite life expectancy of at least 20 years should be required.
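
The two periods in this definition translate directly into planning dates. The sketch below uses the 5-year prewrite and 20-year postwrite figures stated above; the function names and the choice to treat those minimums as deadlines are assumptions made for illustration.

```python
from datetime import date

PREWRITE_SHELF_YEARS = 5    # minimum prewrite shelf life cited above
POSTWRITE_LIFE_YEARS = 20   # minimum postwrite life expectancy cited above


def add_years(d, years):
    """Add whole years to a date, mapping February 29 to February 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def media_deadlines(manufactured, written=None):
    """Return the latest write date and, once written, the recopy date."""
    deadlines = {"write_by": add_years(manufactured, PREWRITE_SHELF_YEARS)}
    if written is not None:
        deadlines["recopy_by"] = add_years(written, POSTWRITE_LIFE_YEARS)
    return deadlines
```

For example, a disk manufactured on July 14, 1994 and written on January 1, 1995 would carry a write-by date of July 14, 1999 and a recopy-by date of January 1, 2015 under these minimums.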

Although the Joint Technical Commission (JTC) of IT9-5 Committee of Image Permanence (which is an accredited ANSI organization) and the Audio Engineering Society (AES) are together developing standards for determining the life expectancy (LE) of optical media systems, at the present time there are no approved standards for LE. Manufacturers' claims of optical digital data disk longevity, particularly postwrite, should be carefully analyzed. Because extrapolated life expectancy values may vary widely as a result of different assumptions, approaches, and test methodologies, it is especially important to determine the basis of the manufacturer's testing process. This determination should include an evaluation of manufacturing data, test methodology, test procedures, and test results based upon the findings described in NIST SP-200. This analysis requires technically qualified users who can adequately evaluate the test procedures, results, and other criteria. Ideally, any advertised accelerated aging test results are based upon tests of specific areas (outer, middle, and inner), rather than an average reading of the entire optical digital data disk surface. This important issue will remain a concern until an industry-wide standard reference test methodology for optical digital data disks is widely adopted. A joint ANSI and Audio Engineering Society working group has developed a draft standard that proposes a standard test methodology for predicting the life expectancy of CD-ROM media.

User Experiences

During the survey of the 15 agency sites, no digital data recording or information retrieval problems linked to optical media longevity were reported. Since a majority of the 15 systems visited were not installed until 1990 or later, shelf-life or postwrite longevity problems were not expected. Isolated problems during information retrievals, if they occur at such an early stage, might best be investigated as manufacturing defects (or care or handling lapses) in specific optical digital data disks.

Recommendations

In addition to conducting a careful analysis of each manufacturer's media life expectancy testing methodologies and procedures, require the use of optical digital data disks with a pre-write shelf life of at least five years.

Require a minimum post-write life of twenty years based upon manufacturer optical digital data disk life expectancy tests that conform to the findings of NIST SP-200.

Optical Digital Data Disk Substrates

Technical Trends

An optical digital data disk's substrate constitutes the optical media's basic foundation. Optical substrates require precision manufacturing using materials such as polycarbonate and tempered glass. The manufacturer's choice of materials depends on several factors: cost benefits, data storage, longevity characteristics, and media durability. Information recorded on any of these substrates is likely to outlive the system's hardware and software components, assuming the optical digital data disks are manufactured according to stated specifications and maintained under controlled environmental storage conditions.

User Experiences

Tempered glass and polycarbonate were the substrate materials observed in use at the 15 Federal agency sites surveyed. Federal program managers do not appear to consider media substrate characteristics as a significant factor in product selection. Federal agencies generally allow such technical decisions to remain with the individual media manufacturers. No operational problems or data loss directly attributable to optical digital data disk substrates were noted.

Recommendations

Optical digital data disk substrates of polycarbonate or tempered optical glass are acceptable.

Optical Digital Data Disk Storage Environments

Technical Trends

Information recorded on optical digital data disks is not immune to degradation from hostile storage environments. High relative humidity can oxidize an optical digital data disk's recording layers and seriously jeopardize information retrieval. The current standards for storage environment are based on storage of magnetic media, or more specifically, magnetic tape. Generally the same conditions are suitable for optical media, but a few notes of caution are in order. Optical media generally cannot tolerate rapid temperature fluctuations over wide ranges. Very rapid temperature fluctuations can cause some optical media to partially separate, which in turn allows moisture to enter between the substrate layers. Also, optical media should be stabilized in temperature before use. High humidity can also promote the growth of molds on optical digital data disk surfaces. Ideally, relative humidity should not exceed 50 percent (lower is acceptable), with ambient room temperature no greater than 75 degrees Fahrenheit. The ability to maintain stable environmental conditions is more important than any predetermined temperature and humidity settings.
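
These limits lend themselves to a simple automated check of logged readings. The sketch below uses the 50 percent relative humidity and 75 degrees Fahrenheit ceilings stated above; the permitted temperature swing of 5 degrees between consecutive readings is a hypothetical value chosen for illustration, since this report gives no numeric limit for fluctuation.

```python
MAX_RH_PERCENT = 50.0  # relative humidity ceiling stated above
MAX_TEMP_F = 75.0      # ambient temperature ceiling stated above


def check_environment(readings, max_temp_swing_f=5.0):
    """Check time-ordered (temp_f, rh_percent) readings against the limits.

    Returns a list of (reading index, problem) pairs. The default swing
    limit is an assumed value, not one taken from a standard.
    """
    warnings = []
    for i, (temp, rh) in enumerate(readings):
        if rh > MAX_RH_PERCENT:
            warnings.append((i, "humidity"))
        if temp > MAX_TEMP_F:
            warnings.append((i, "temperature"))
        if i > 0 and abs(temp - readings[i - 1][0]) > max_temp_swing_f:
            warnings.append((i, "fluctuation"))
    return warnings
```

Because stability matters more than any single setting, the fluctuation check may be the most useful of the three in practice.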

Optical digital data disk systems should not be operated, or the individual disks stored, under high levels of dust, dirt, or other particulate matter. This caution extends to airborne carbon toner particles released by laser printers and electrostatic copy machines. Dust particle buildup on the optical media surfaces can reduce the laser diode's ability to detect differences in reflectivity. If optical digital data disks require cleaning to remove contamination such as dust, particulate matter, or fingerprints, cleaning should be attempted only in strict adherence to the media manufacturer's guidelines. Improper disk cleaning techniques may result in permanent optical media damage.

User Experiences

It is not always feasible to operate digital imaging systems or store optical digital data disks under optimum environmental conditions. For example, the Army's captured documents system scanned records and stored the data on digital audiotape (DAT) cartridges under extremely adverse field conditions. Although the image data was eventually returned to the United States, the Army's experience provides one example of the extremes under which digital imaging technology can be utilized (although such extremes are not preferable).

The Patent and Trademark Office experienced a gradual degradation in their system's ability to access the optically stored image data. Data-read errors were diagnosed and traced to dust and dirt particles on the optical media surfaces. Further analysis of the problem indicated that the source of the particulate matter was the system's laser printers. Carbon toner particles and paper dust generated during normal laser printer operations eventually caused disk-read errors. Relocation of the laser printers eliminated the problem.

Related NARA research indicates that this dust and dirt problem also exists in other optically based technologies. CD-ROM drives installed in nonoffice environments, such as warehouses, repair shops, and construction sites, are vulnerable. In these environments, routine cleaning of the discs and the drive's lens mechanism may be required. In any case, optical equipment should be installed in the cleanest environment available, and preventive maintenance should be conducted in accordance with the equipment manufacturer's recommendations.

Recommendations

Optical digital data disks should be stored in areas with stable room temperatures and with relative humidity ranges consistent with the storage of magnetic tape media. Avoid storage areas with excessive humidity and high temperature, and do not subject optical digital data disks to rapid temperature extremes.

If possible, do not operate systems or store optical digital data disks in environments with excessive airborne particulate matter.

Cleaning procedures for optical digital data disks must be in strict conformance with the media manufacturer's recommendations.


Section 7: Information Retrieval

Management Issues

Providing users with convenient and inclusive access to the stored image and index data is a fundamental objective of information storage-and-retrieval systems. Federal agencies are recognizing the inherent advantages of digital imaging and optical digital data disk storage systems to:

  • Improve preservation of original records,
  • Lower costs of document storage, and
  • Improve information retrieval.

Since original paper records are exposed to risk of theft, physical damage, and misfiling during user reference, scanned electronic images provide an extra measure of document control. In addition, storage costs can be reduced by transferring the infrequently accessed original records out of expensive prime office spaces.

The retrieval subsystem is the user's primary point of contact with the optical digital data disk storage system, and usually is the only connection that a typical researcher has with the imaging system. It is crucial that the imaging system's user interface component be powerful yet easy to understand and support the researcher's information access requirements. Information retrieval includes searching of the index database, identification of the image file, and creation of either a hard or soft copy (i.e., screen display) for viewing.

Depending on the nature of the agency's operations and the classification of the information stored, access to the digitally stored information may need to be restricted. These restrictions may be instituted through system software controls tied to passwords or user identification codes or both. These controls can be instituted onsite and, in the case of imaging systems with telecommunications capabilities, offsite. System administrators may also impose physical constraints, such as not permitting the general public to gain personal access to the system workstations, and different levels of software restrictions, such as allowing certain authorized individuals to access specific index or image data. Software controls can also restrict the types of operations that specific users can perform. Thus, a given user or group of users may be allowed access to certain database records but not others; similarly, these authorizations may extend to retrieving database information but not the ability to edit the data.
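The layered software restrictions described above can be illustrated with a minimal sketch. The group names, record groups, and the `is_allowed` function are hypothetical and are not drawn from any system surveyed for this report; the sketch shows only how retrieval and edit authorizations might be checked separately for each group of users.

```python
# Hypothetical permission table (an assumption for illustration):
# user group -> record group -> operations that group may perform.
PERMISSIONS = {
    "public_reference": {"released_files": {"retrieve"}},
    "records_staff": {
        "released_files": {"retrieve", "edit"},
        "restricted_files": {"retrieve"},
    },
}


def is_allowed(user_group, record_group, operation):
    """Return True only if the group holds the operation for that record group."""
    return operation in PERMISSIONS.get(user_group, {}).get(record_group, set())
```

In this sketch, a member of the hypothetical `records_staff` group could retrieve restricted files but not edit them, while an unknown group is denied by default.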

In order to retrieve a digital image file, an index search must be conducted based on the existing index database. Database software is useful for managing image file indexes, relying on searchable key data fields with the assistance of various Boolean search operators (such as AND, OR and NOT) to augment the search process. The results of a successful search (when at least one file meets the search criteria) are usually provided in a "hit" listing, with the researcher free to select the desired images for viewing or printing. The researcher may also choose to continue refining the search process using the database software prior to actually retrieving any of the digitally stored images.
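The Boolean index search described above can be sketched as follows. The index contents, the image identifiers, and the `search` function are hypothetical; an actual system would rely on its own database software rather than an in-memory table.

```python
# Hypothetical index records: image file identifier -> set of key-field terms.
INDEX = {
    "IMG0001": {"patent", "1988", "granted"},
    "IMG0002": {"patent", "1990", "pending"},
    "IMG0003": {"trademark", "1990", "granted"},
}


def search(all_of=(), any_of=(), none_of=()):
    """Return a sorted "hit" list of image identifiers.

    all_of  -> terms combined with AND
    any_of  -> terms combined with OR (ignored when empty)
    none_of -> terms excluded with NOT
    """
    hits = []
    for image_id, terms in INDEX.items():
        if not all(term in terms for term in all_of):
            continue
        if any_of and not any(term in terms for term in any_of):
            continue
        if any(term in terms for term in none_of):
            continue
        hits.append(image_id)
    return sorted(hits)
```

Refining a search corresponds to calling `search` again with additional terms before any image is retrieved; an empty list means that no file met the criteria.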

Optical digital data disk systems frequently utilize temporary magnetic disk buffer storage to store images locally. Upon user request, the digital images are transferred from the optical digital data disks to temporary (primary) storage buffers. The images are then automatically spooled to the local workstation or print-server cache storage buffers, freeing the optical media subsystem to accept other user requests or to perform other functions. To accelerate apparent system response and allow users to begin reference as soon as possible, the retrieval subsystem may transmit and display the first image in a file while the other images are subsequently undergoing transfer to the requester's workstation. The local workstation's cache buffer, which can be either magnetic hard disk or random access memory (RAM) storage, provides an image storage site for image decompression and facilitates a faster "page-turning" response.
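The role of a workstation cache buffer can be illustrated with a minimal sketch. The `ImageCache` class and its `fetch_from_optical` callback are hypothetical, and the least-recently-used eviction policy is an assumption made for the example; the systems described in this report may manage their buffers differently.

```python
from collections import OrderedDict


class ImageCache:
    """Small least-recently-used cache standing in for a workstation buffer."""

    def __init__(self, capacity, fetch_from_optical):
        self.capacity = capacity
        self.fetch = fetch_from_optical  # hypothetical slow read from optical disk
        self.pages = OrderedDict()

    def get(self, page_id):
        if page_id in self.pages:
            # Cache hit: no optical disk access is needed.
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        image = self.fetch(page_id)
        self.pages[page_id] = image
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return image
```

A page already held in the buffer is returned without another request to the optical subsystem, which is what makes the faster "page-turning" response possible.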

An image-enabled workstation is defined as any terminal equipped with a display screen that renders a representation of a document image. An imaging workstation is the primary user reference tool in a digital image-based system. Since a digital image can be effectively displayed and appear as a facsimile of the original paper document, a paperless reference system is theoretically possible. In reality, however, many users will still request hard-copy prints (if available), at least until they reach a comfort level with the imaging system's display monitors. Screen display resolution is similar to scanning resolution, based on the number of lines that determine image legibility. From a system acquisition standpoint, as image display quality increases, so does workstation per unit cost. Since screen display of images may not be sufficient for all applications, virtually all digital image systems require a hard-copy output capability. The most common printing equipment by far is laser-print technology, utilizing components similar to those used in electrostatic copiers, which represents a proven, dependable digital imaging technology.

Technology Trends

Digital-imaging systems are designed to respond quickly to user requests for information using high-resolution image displays and laser prints. The information storage-and-retrieval industry is researching and developing higher performance display and output technologies to include larger sized display screens with higher resolution capabilities, integration of fax servers on the network, laser printers with higher print qualities and output speeds, image compression techniques, computer output to laser disk (COLD) systems evolving as replacements for existing microform applications, and high-performance raster computer output to microfilm (COM) recorders.

Image Display Workstations. One of the primary concerns of system designers and users alike is determining the exact level of screen resolution that is adequate for legibly displaying digital images. Although typical monochrome monitors supplied with personal computers for text display are low in cost, they do not have sufficient resolution to legibly display a digital image of an office document regenerated at full size. When these low-resolution monitors are used in digital-imaging systems, selected image areas must be viewed in a zoomed or enlarged mode to be readable.

Typically, monitor screens selected for image-based systems have sufficient screen diameter and resolution to legibly display an 8.5- by 11-inch document at full size, which usually requires a minimum of 100-dpi display resolution both horizontally and vertically. Depending on the characteristics of the original documents, this display screen resolution may still require frequent image zooming to obtain adequate legibility. A higher display resolution of 150 dpi both horizontally and vertically is quite common in imaging systems and provides legible screen display of even 4-point type without the need to zoom the image. An added advantage of larger, higher resolution display technology is the ability to display both character (index and system) data and image data simultaneously.
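The display requirement stated above reduces to simple arithmetic, sketched here. The function name `pixels_required` is hypothetical; the page size and resolutions are those given in the text.

```python
def pixels_required(width_in, height_in, dpi):
    """Pixel dimensions needed to show a page at full size at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# An 8.5- by 11-inch page at the two display resolutions discussed above.
minimum = pixels_required(8.5, 11, 100)  # (850, 1100)
common = pixels_required(8.5, 11, 150)   # (1275, 1650)
```

Moving from 100 dpi to 150 dpi increases the pixel count in each dimension by half, which is consistent with the higher workstation cost noted for higher resolution displays.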

There are a number of processes that can be applied to a digital image to increase its utility to users. In most cases, the image modifications are temporary for display only and do not result in permanent changes to the stored image files. Permanent changes to images usually take the form of image editing that involves the addition or removal of pixels from the image. One example of a temporary image modification is image sizing, a process that refers to the increase or decrease in the size of the image in relation to the physical dimensions of the display screen. Images that are too large to be displayed completely on the screen may be scaled down using pixel reduction, normally performed equally in the horizontal and vertical dimensions to avoid distortion. A technique for displaying oversized images without resorting to pixel reduction uses the image-scroll and -pan capabilities, in which the screen displays only a portion of the image at a time. The researcher pans horizontally or scrolls vertically through the image using a mouse device or keyboard cursor keys.
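Pixel reduction applied equally in both dimensions can be sketched as follows. The function `fit_to_screen` is hypothetical, and the sketch computes only the target dimensions; it does not resample the image itself.

```python
def fit_to_screen(image_w, image_h, screen_w, screen_h):
    """Scale an image down to fit a screen using one factor for both axes.

    Returns (new_width, new_height, factor). The same factor is used
    horizontally and vertically so the image is not distorted, and the
    image is never enlarged (factor is capped at 1.0).
    """
    factor = min(screen_w / image_w, screen_h / image_h, 1.0)
    return round(image_w * factor), round(image_h * factor), factor
```

For example, a 1,700 by 2,200 pixel image shown on a 1,275 by 1,650 pixel screen would be reduced by a factor of 0.75 in each dimension. Because this is a display-only calculation, the stored image file is unchanged.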

It is also possible to display selected portions of digital images at their original scan resolutions (e.g., 200 dpi) on display screens that do not have that specific resolution capability. This display function is referred to as image enlargement or zooming. For example, if an image was scanned at 200 dpi and displayed on a 100 dpi screen, the image segment would be enlarged by a factor of two in both the horizontal and vertical dimensions. This provides an effective zoom factor of 4:1 by area (2:1 in each of the x and y dimensions) and displays all scanned pixels for that specific image window.
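The zoom arithmetic in the example above can be worked through in a short sketch. The function names are hypothetical, and the sketch assumes that each scanned pixel is mapped to exactly one screen pixel.

```python
def zoom_factors(scan_dpi, screen_dpi):
    """Return (linear, area) enlargement at one scanned pixel per screen pixel."""
    linear = scan_dpi / screen_dpi
    return linear, linear ** 2


def visible_window_inches(screen_w_px, screen_h_px, scan_dpi):
    """Size of the original document region that fits on the screen."""
    return screen_w_px / scan_dpi, screen_h_px / scan_dpi
```

With a 200 dpi scan and a 100 dpi screen, `zoom_factors(200, 100)` gives a linear enlargement of 2 and an area enlargement of 4. A screen of 850 by 1,100 pixels would then show a 4.25- by 5.5-inch window of the original document.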

Another visual alteration feature is the capability to selectively switch the positive and negative appearance of digital images. That is, the researcher may convert from the normal black character and white background screen format to a reversed white-on-black background mode. This process is performed electronically by altering the digital pixel 1's and 0's, and is especially useful when scanning from negative-appearing microforms or aged Photostats. Another useful image display feature is electronic image rotation between portrait and landscape orientations, allowing the researcher to pivot images using either a predetermined or variable number of degrees.
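Both display features can be sketched on a small bitonal image held as rows of 1's and 0's. The function names are hypothetical, and the rotation shown is limited to a fixed 90 degrees; rotation by a variable number of degrees would require resampling and is not covered here.

```python
def invert(bitmap):
    """Swap 1's and 0's to reverse the positive/negative appearance."""
    return [[1 - pixel for pixel in row] for row in bitmap]


def rotate_clockwise(bitmap):
    """Rotate 90 degrees clockwise, for example from portrait to landscape."""
    return [list(row) for row in zip(*bitmap[::-1])]


# A 3-row by 2-column "portrait" image used only for illustration.
page = [
    [1, 0],
    [1, 1],
    [0, 0],
]
negative = invert(page)             # [[0, 1], [0, 0], [1, 1]]
landscape = rotate_clockwise(page)  # [[0, 1, 1], [0, 1, 0]]
```

Both functions return new lists, so the original image data is left unchanged, in keeping with the temporary, display-only nature of these modifications.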

Digital Image Printers. Virtually all digital imaging systems have an integrated capability to produce hard-copy output. Laser-print quality is superior to other printing technologies such as dot matrix, ink jet, and others. Laser printers are the most common printing technology for imaging systems, using a laser beam or other light source to create a temporary image on a photosensitive surface. Electrostatic toner particles are applied to this temporary image and subsequently transferred and heat-fused to a sheet of paper. Similar to scanners and display terminal screens, image resolution is a critical print-image quality factor. The vast majority of laser printers are currently based on 300 dpi resolution, while newer laser printers offer 400 dpi and higher resolution. If the original scan resolution matches the printer output (i.e., 300 dpi), the print will be a simple one-to-one scale process. However, if the scan resolution is different, digital image scaling is required to synchronize with the printer resolution. Recent developments in laser-printer controller components can significantly enhance print quality and resolution: they alter the size or location of the dots to reduce the stair-step, or jagged, appearance of lines; print at resolutions higher than 300 dpi; and print true halftones.
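The scaling needed to match scan resolution to printer resolution can be sketched as follows. The function names are hypothetical, and the sketch computes only the target pixel dimensions required to keep the print at the original physical size; it does not perform the resampling.

```python
def print_scale(scan_dpi, printer_dpi):
    """Factor applied to pixel counts so the print keeps its physical size."""
    return printer_dpi / scan_dpi


def scaled_pixels(width_px, height_px, scan_dpi, printer_dpi):
    """Pixel dimensions of the image after scaling to the printer resolution."""
    factor = print_scale(scan_dpi, printer_dpi)
    return round(width_px * factor), round(height_px * factor)
```

A 300 dpi scan sent to a 300 dpi printer has a factor of 1.0 and needs no scaling. A 200 dpi scan of an 8.5- by 11-inch page (1,700 by 2,200 pixels) would be scaled by 1.5 to 2,550 by 3,300 pixels for the same printer.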

Another consideration is printer speed, usually stated in pages per minute (ppm). Low-end printers are typically rated at 7 to 8 pages per minute, mid-range may reach 20 pages per minute, and high-speed printers can produce over 100 pages per minute. Printer speed is a function of many interrelated factors, with image-buffer storage performance as a significant issue. Since the printing equipment can only print the digital data it has available, the image buffer can influence the quantity and throughput speed of the print production process. Screen prints are useful to supply a researcher with a "snapshot" of the display screen contents. Users find this capability beneficial when using a large display screen that provides both index and image data simultaneously. Screen prints provide a unitized record of the digital image and related index information.

Retrieval Applications Software. This vital element of imaging systems also contributes to improved in-process workflow and data output. Digital-imaging system processes controlled by software include data communications, image capture and workflow management, index and database creation, and system management. System workflow software controls all phases of document image capture, tracking, routing, indexing, retrieval, and printing. Image-enhancement software allows the manipulation of images to improve legibility and reduce digital file sizes. Workstation image-display software allows image control for functions such as zooming, rotation, and scrolling, as well as multiple-page screen formats.

User Experiences

The Federal agency sites visited for this report were generally equipped with hardware and software information-retrieval components suitable for the intended applications. For example, systems capturing and displaying document images were equipped with high-resolution display monitors and printing hardware, while the systems storing ASCII data require slightly different retrieval components.

High-resolution displays: Twelve of the fifteen Federal agency systems visited are scanning documents and viewing the images on display monitors of 114 dpi to 150 dpi resolution. The predominant screen dimension is 19 inches (diagonal), providing simultaneous display of two digital images side by side.

Text data displays: Two of the systems visited that capture, store, and retrieve digital (nonimage) textual data utilize conventional computer character-type display terminals. The higher resolution (and higher cost) image display terminals are unnecessary for these applications.

Gray-scale display: The US Geological Survey's National Earthquake Information Center (NEIC) utilizes display technology featuring 800 x 1,000 pixels and 256 shades of gray. These workstations support the higher resolution, finer detail display of the NEIC 24-bit digital image data format.

Laser printing: Laser printers were the output devices of choice for the majority of systems visited. Laser printing provides the ability to print raster-image data as well as offering multiple-font capability. The systems not equipped with laser printers utilize conventional computer impact-type printers.

COM output: The Department of the Army's PERMS system provides several output options: A soldier's entire personnel record (or selected portions thereof) may be recreated from the optical digital data disks using laser printers, microfiche recorder, or magnetic tape. The PERMS microfiche recorder produces film at 300 dpi resolution at a production throughput rate of 60 images per minute.

Local area networks (LAN): Most sites visited utilize a local (or wide) area network as an integral system capability. Workstation connectivity ranges from a few terminals internally linked within the imaging system to a large network of hundreds of workstations, such as the one planned by the Patent and Trademark Office. Several of the systems feature network connections to minicomputer or mainframe computer systems for index data access or electronic distribution of image data to remote users.

Image redaction: The Department of State's REDAC system allows online review and selective electronic editing of document images prior to release. Selective processing of the electronic images allows the system's output to be tailored to meet information security requirements.

Recommendations

Conduct a comprehensive requirements analysis of end users' information-access needs and a systems design study prior to procuring imaging system components.


Section 8: Information Management Policy

In order to maximize the benefits of digital optical storage technologies, Federal agency administrators must take into account four significant information management policy issues: (1) linking optical digital data disk technologies with more effective and less costly delivery of services to citizens, (2) disposition of original records converted to a digital format and stored on digital optical media, (3) legal admissibility of digital images stored on optical digital media, and (4) long-term access to the stored information. Data collected during visits to the 15 Federal agencies revealed several different approaches to address these issues. Although agency administrators were not specifically asked for their assessment of the viability of the legal admissibility of digital images, there is ample evidence that legal admissibility is a matter of considerable concern. Generally, agency administrators do not view long-term information access as a significant issue at this time.

Cost-Effectiveness

Management Issues

The long-term viability of a digital image and optical digital data disk storage system is enhanced when the system provides visible value for the investment expended. In terms of the National Performance Review objectives, digital imaging and optical digital data disk storage systems can deliver improved information services to citizens at less cost. Certainly, effective systems design and program administration should be the heart of any Federal agency records information system.

A key factor in effective systems development is to establish early in the design process the purpose of the proposed digital-imaging or optical digital data disk system and how the system will deliver better service at a reduced cost. Existing paper-based information practices and overall information needs should be reexamined in order to identify any necessary changes in internal operational practices, procedures, or staff responsibilities. A poorly designed or awkward paper records system is likely to be unwieldy when converted directly to electronic form, especially when existing processes are not evaluated and changes are not adopted.

Federal agency administrators considering a digital-imaging system should recognize that integrating optical digital data disk technology is not an automatic palliative for poor records management practices. If imaging systems merely automate existing document-control procedures, then preexisting operational problems and inefficiencies will persist. Agency administrators must be willing to adopt new organizational processes to significantly improve internal agency functions that can be achieved with new technologies. Generally known as "business process reengineering," this approach views the introduction of new technologies as an opportunity to reassess preexisting office operating procedures, maximize the benefits of increased information access, and reduce administrative disruptions and operational costs. An overall goal should be to improve a Federal agency's responsiveness to its constituency by eliminating labor-intensive manual processes, enhancing internal document controls, and improving internal organizational processes. Any system that meets this goal is likely to remain of enduring value.

User Experiences

Responses from the 15 site visits indicate that agency administrators are often willing to consider changing existing organizational procedures if corresponding benefits appear feasible. The surveys showed a diversity of benefits, often directly proportional to the level of commitment to undergo a change. Benefits obtained often depended on the size and configuration of the imaging system, as well as expectations of agency administrators and end users. Modifications of functional workflow areas to accommodate the new systems ranged from minor adjustments in the existing records processing to complete redesign of routine organizational processes.

For example, the Federal Communications Commission adopted changes to agency procedures in the processing and retention of original and duplicate paper records. The Library of Congress redesigned operations within their Master File Unit to capitalize on the benefits offered with imaging technology and realized an immediate improvement in faster input processing. Following the integration of imaging, the Agency for Toxic Substances and Disease Registry experienced a major change in the way scientists (the agency's major users) conduct their research work. Improvements to information flow eliminated manual paper file updates, providing enhanced information access to the scientific community. Direct benefits included access to a more efficient scientific database, which has ultimately improved the agency's public health response to constituents.

Recommendations

In order to maximize the long-term viability of systems, develop digital imaging and optical digital data disk applications in a cost-effective manner.

Where possible, link system design to Government improvement initiatives such as the National Performance Review.

Reexamine existing paper records systems prior to conversion to optical digital data disk systems to maximize productivity and improve delivery of information services.

Disposition of Original Records

Management Issues

The disposition of Federal records is governed by regulations issued by the National Archives and Records Administration (NARA). Records cannot be destroyed or otherwise disposed of without the authorization of the Archivist of the United States. This authorization is granted by approval of a records disposition schedule or issuance of a General Records Schedule. Approval of records disposition schedules and General Records Schedules involves an appraisal process to determine the appropriate retention period for the records. In general, between 3 and 5 percent of Federal records are of sufficient value to merit permanent retention in the National Archives. Records appraised by NARA as temporary may be stored on any medium, including optical digital data disks, that ensures maintenance of the information until its authorized retention period has expired. However, when paper records appraised as permanent are converted to another format, both the original paper records and the records in the new format must be scheduled. No permanent records converted to optical digital data disks may be destroyed without NARA's approval. In some cases, NARA may authorize an agency to destroy permanent records after copying onto an optical digital data disk provided that the agency agrees to convert the records to a medium acceptable to NARA at the time of transfer to NARA's legal custody.

NARA regulations specify the conditions under which records on media other than paper, such as records on microform, digital, or electronic media, may be accepted for transfer to the National Archives. The National Archives accepts records on magnetic media that conform to ANSI standards (6250 bpi open reel tape and 3480 class cartridge tape), and CD-ROM. NARA regulations do not cover the actual conversion process. Consequently, agencies that wish to destroy permanent records after copying them onto optical digital data disks must submit a proposed records disposition schedule to NARA. The agency must certify on a Standard Form 115 that either the optical disks meet the requirements of NARA Bulletin 94-4, or that the agency will convert the optical disk images to a medium that meets the standards specified in Subchapter B of 36 CFR Chapter XII before transfer to NARA's legal custody.

User Experiences

The 15 Federal agencies surveyed for this report adopted digital-imaging systems for processing both nonpermanent and permanently valuable records. Digital-imaging systems for Federal records with retention schedules longer than 10 years were in the majority. The most common types of permanent records surveyed included military personnel records, land grant records, environmental cost-recovery files, legal court and rule-making documents, and patent registration records. For example, the Federal Communications Commission's docket files contain records pertaining to rule-making and adjudicatory matters and thus are designated for permanent retention.

The State Department's REDAC imaging system scans photocopies that are subsequently discarded; however, the original documents remain in their filing systems supporting day-to-day operations. The Environmental Protection Agency typically retains their record holdings for 20 years after cost-recovery litigation is completed. With the exception of the Patent and Trademark Office records, where the originals are retained at an underground storage facility in Boyers, PA, the added cost of also retaining the original paper records could not be verified. In the case of the Patent and Trademark Office, other substantial benefits more than offset the cost of storing the original records.

Examples of temporary or shorter term records (retained for a period of 1 to 10 years) included standard forms (purchasing and account records), technical reports, journal and periodical information, and general office correspondence. In these cases, optical digital data disk storage systems provide enhanced access to files, especially when more than one department or user may need access simultaneously. At the Library of Congress, for example, some of the journal literature and other scanned documentation are destroyed after scanning, while others are retained for an interim period (3 years or less).

Recommendations

Conform to NARA policy regarding the disposition of original records when converting to an optical digital data disk storage system.

Legal Admissibility

Management Issues

Over the past few years, there has been considerable debate and discussion regarding the legal admissibility of digital images stored on optical digital data disks, especially disks used for storing Federal records after the original paper records are disposed of. A guideline issued by the Department of Justice (DOJ) in 1992, along with an Association for Information and Image Management (AIIM) study and recommendations, calls attention to the fact that the key question is not the storage media but rather the integrity of the records themselves. Hence, the DOJ report refers to "electronic records" and argues that information stored on an optical digital data disk "should be treated no differently than information stored on magnetic disk or tape regarding its admissibility and trustworthiness." The key issue for Federal records managers is to become familiar with how the rules of evidence apply to such records and to ensure that procedural controls that protect the integrity of Federal records are in place and adhered to.

AIIM's technical report TR31-1992, "Performance Guideline for the Legal Acceptance of Records Produced by Information Technology Systems," builds upon the DOJ guideline by identifying strategies that should be followed in creating, storing, and keeping electronic documents to ensure legal admissibility. The AIIM report is designed to assist Federal and State Government organizations in the enactment of laws, promulgation of regulations, and adoption of policies concerning legal acceptance of records produced by information technology systems. Additionally, the report provides guidance in areas such as the types of legal language and organizational procedures that should be adopted to achieve records admissibility in a court of law. Although AIIM TR31-1992 is media-independent, in that the guidance is applicable for microfilm or magnetic storage, the performance guidelines contained therein are especially relevant to paper records converted to digital images.

AIIM TR31 identifies and discusses three crucial issues that establish the accuracy and authenticity of electronic records. First, it must be demonstrated that the system used to produce the information is capable of producing accurate and trustworthy records. One measure of this capability is that the records are produced "as part of a regularly conducted activity such as those produced in the regular course of business" and are relied upon to carry out day-to-day business. The second crucial issue is that "established [written] procedures demonstrate what an organization plans to do in managing and controlling the process or system--as opposed to what it actually does," especially when records are modified, duplicated, converted, or destroyed. The third crucial issue is the use of periodic audits to document "that the process or system produces accurate results." The performance guidelines delineated in the AIIM report are rooted in sound management practices, supported by written procedures that take into account the three preceding crucial issue areas. It is worth noting that under the best evidence rule, Federal courts are more likely to accept as evidence the records that agencies relied upon in conducting their day-to-day business.

User Experiences

As noted earlier in this report, NARA's study of 15 Federal agency applications storing document images on optical digital data disks did not specifically identify legal admissibility as a problem issue. There are several possible explanations that may account for this lack of concern. The primary reason is that many of the paper records converted to digital images were noncurrent records, and were unlikely to be the subject of litigation. Second, in numerous instances, the original paper records were retained as the "record copy" with the consequence that the paper records would be the "best available evidence" in any litigation. Of course, these particular circumstances are likely to be encountered less frequently in the future.

Recommendations

Become familiar with how the rules of evidence apply to Federal records, and ensure that procedural controls that protect their integrity are in place and adhered to.

Implement the recommendations provided in AIIM TR31, Parts I and II, applicable to agency projects using digital-imaging and optical digital data disk storage technologies, either in the conversion of paper documents to digital form or their initial creation in digital form.

Long-term Access

Digital-imaging and optical digital data disk storage technologies are evolving rapidly. Both areas are marked by a nearly continuous development of new approaches to converting, storing, and retrieving information. Ensuring long-term access to records of enduring value stored on optical digital data disk systems is more than a matter of image quality, system functionality, and media stability. Adoption of imaging and optical digital data disk storage requires an agency commitment to a technological evolution that it does not control. Hence, the long-term needs of system users will not be met by merely making an organizational decision to acquire optical digital data disk technology. Federal Government agencies using optical digital data disk technology for the storage and retrieval of records needed for longer than a system's life of 7 to 10 years must take steps to ensure that the information remains accessible into the future.
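The planning burden implied by a 7- to 10-year system life can be estimated with simple arithmetic. The following is a minimal sketch, not an agency method: it assumes each system is replaced exactly at the end of its life, and the function name and the 50-year retention period used in the note after it are hypothetical.

```python
import math


def migrations_needed(retention_years: int, system_life_years: int) -> int:
    """Count the moves to a successor system before the retention period ends.

    Assumption (not from the report): every system is replaced exactly at
    the end of its stated life, with no early or late replacements.
    """
    if retention_years <= 0 or system_life_years <= 0:
        raise ValueError("both arguments must be positive")
    # Number of system generations that will hold the records over time.
    generations = math.ceil(retention_years / system_life_years)
    # The first generation is the original system, so it is not a migration.
    return generations - 1
```

Under these assumptions, records needed for 50 years would pass through seven migrations at a 7-year system life and four migrations at a 10-year system life.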

Senior management must recognize the importance of planning for technological obsolescence and disaster recovery, and the need to migrate data under these adverse conditions. A cohesive plan will help avoid being overwhelmed by uncertainty and wasting valuable resources in a perpetual game of technological catch-up. Federal agency administrators adopting optical digital data disk technology must also continue to monitor technological trends; plan for systematic maintenance, upgrade, and eventual migration to newer technologies; and, overall, act responsibly to ensure that the quality, integrity, and value of important agency information are preserved.

Recommendations

Develop an agency-wide data migration and disaster recovery plan for the digital-imaging and optical digital data disk storage system well before migration or recovery becomes necessary.

Obstacles to Access

The long-term viability of digital-imaging and optical digital data disk storage systems is constrained by their current inability to retain content, context, and functionality over time. These limitations are driven primarily by three considerations:

  • Vendor instability
  • Optical digital data disk life expectancy
  • System obsolescence (hardware or software)

Vendor Instability. It is in the nature of manufacturer-vendor behavior that short-term customer benefits take precedence over longer term needs, however clearly defined. Therefore, system administrators must carefully assess the viability of vendors when acquiring optical systems that are obviously dependent on specific vendors or manufacturers. If a system supplier goes out of business or abruptly withdraws maintenance, software support could be lost and the system's continued operation could be jeopardized. To minimize these risks, managers should be alert to two warning signs of impending obsolescence: (1) a manufacturer announces that it will discontinue a particular product or line of products, or (2) the manufacturer or vendor announces the end of maintenance support. In either circumstance, managers should make immediate plans for migrating the records application to a new storage system.

Some Federal agencies are exploring the option of having proprietary vendor and manufacturer computer applications software codes placed in a secure, accessible location in case of corporate failure. Vendors could, for example, be required to deposit with a bank or similar facility a copy of the computer software application codes. Federal agency management would have access to and ownership of the computer software code in the event of company reorganization or dissolution.

Recommendations

In the event warning signs of impending obsolescence appear, managers should make immediate plans to migrate the application to a new system.

Require vendors to deposit a copy of the computer system's application software codes and associated documentation with a bank, archives, or secure records facility in case of a business failure.

Media Life Expectancy/Data Transfer and Backups. Media life expectancy is based on accelerated aging tests that are only a general indicator and not necessarily a predictor of the life expectancy of individual optical digital data disks. Therefore, agency administrators should not rely exclusively on a predicted life expectancy in determining when to recopy to new media. Periodic verification of disk degradation obtained through information about error detection and error correction activity is essential.
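The verification-driven approach described above can be sketched as a simple decision rule. This is an illustration only: the reading format, the error-rate unit, and both thresholds are hypothetical, and an agency would substitute whatever error detection and correction figures its own drives or monitoring tools actually report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorReading:
    """One periodic verification result for a single disk (hypothetical format)."""
    year: int
    corrected_errors_per_mb: float  # errors repaired by the drive's error correction


def needs_recopy(readings: List[ErrorReading],
                 ceiling: float = 50.0,
                 growth_factor: float = 2.0) -> bool:
    """Flag a disk for recopying based on measured error activity.

    A disk is flagged when its latest reading reaches an absolute ceiling,
    or when it has grown by growth_factor over the earliest (baseline) reading.
    Both thresholds are illustrative assumptions, not published values.
    """
    if not readings:
        return False
    ordered = sorted(readings, key=lambda r: r.year)
    baseline = ordered[0].corrected_errors_per_mb
    latest = ordered[-1].corrected_errors_per_mb
    if latest >= ceiling:
        return True
    return baseline > 0 and latest >= growth_factor * baseline
```

The rule depends only on measured error activity, so a disk whose readings remain stable is not recopied merely because its predicted life expectancy has elapsed, and a disk that degrades early is flagged regardless of the prediction.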

The combination of projected media life expectancy with periodic verification of optical digital data disk degradation is useful in identifying when to recopy media, but it offers no solution for a natural or man-caused catastrophe. Creating a duplicate copy of the information, preferably stored in an offsite location, provides the best protection in the event of such a catastrophe. One possible approach to obtaining a backup copy is to record the data to two optical digital data disks simultaneously at the time of creation. Undoubtedly, this duplication would entail greater cost and could possibly degrade the overall system performance. A second alternative is to copy the information to inexpensive, high-density magnetic storage media. A third alternative, retention of the original paper records or creation of a microfilm copy of the information, would minimize problems of technology obsolescence.
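Whichever backup alternative is chosen, a duplicate protects the information only if it is a faithful copy. One way to confirm this for digital copies is to compare checksums of the original and the backup. The report does not prescribe a checksum method; the sketch below uses SHA-256 from the Python standard library as an assumed choice, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, reading it in fixed-size chunks
    so that large image files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Report whether the backup file is byte-for-byte identical to the original."""
    return file_digest(original) == file_digest(backup)
```

Recording each digest at the time the backup is created would also allow the offsite copy to be rechecked later without retrieving the original.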

Recommendations

Recopy data stored on optical digital data disks based on the information obtained through periodic verification of media degradation.

Create a backup copy of the information stored on optical digital data disks for retention in an offsite facility, using the appropriate storage media (optical, magnetic, paper, or microfilm) that best satisfies agency requirements.

System Obsolescence. Optical digital data disks are far more durable and stable than the hardware and software required to maintain access. All rapidly evolving technologies, including digital-imaging and optical digital data disk storage systems, render specific applications obsolete at a daunting pace. High technology imposes a never-ending upgrading process on those who are responsible for maintaining information access. Archivists, records managers, and program administrators must assume that continuing access to data stored on optical digital data disks may require some degree of system upgrade. Upgrading requires planning and budgeting from the point at which the original system is acquired.

Federal agency information systems storing records or other digital data with long-term value must be capable of transferring data to future technology generations. The capacity to guarantee system upgrade and data transferability at a minimum cost involves tradeoffs among a number of factors including system security, system performance, and storage efficiency. As noted elsewhere in this report, adherence to international standards that guarantee backward compatibility between new and old technology generations is an effective tool to combat system obsolescence. It will be several years, however, before all of these standards are in place, and it is possible that vendors may elect not to build backward compatibility to some of today's systems. In this event, it may be necessary for the agency to assume direct responsibility for data transfer. Consequently, it is important that agencies receive full technical documentation of system components, application software, and operating systems. Minimum documentation requirements include:

  • A hardware systems administrator manual, specifying standard hardware configurations (including cabling) and other specialized configurations (e.g., data communications)
  • Software applications documentation from the appropriate developers (e.g., user manuals, design documentation, and maintenance manuals)
  • Application-specific operational procedures for scanning images, indexing, verifying the accuracy of the index terms and image quality, and safeguarding against tampering or unauthorized use

Recommendations

Specify that the vendor provide, as a contract deliverable, a complete set of documentation, including source code with flow diagrams, object code, and operations and maintenance manuals.

Periodically review and revise system documentation to ensure that all subsequent system modifications and enhancements are adequately described.

Migration Strategies

Management Issues

It is the responsibility of agency administrators, archivists, and records managers working together, rather than vendors and manufacturers, to ensure long-term access to records that are subject to technology obsolescence. Meeting this responsibility involves a continuum of actions already reviewed in this section. Ultimately, however, an explicit strategy for migrating optical digital data disk systems to successive technological generations as yet undefined must be adopted and consistently followed. There are at least three approaches to managing this continuum:

  • Selective equipment upgrading and scheduled optical digital data disk copying
  • Wholesale recopying of the data, based on the information on possible data degradation obtained using tools that periodically test media to verify data integrity
  • Transferring data to an entirely new generation of information technology.

The first approach is a continuous process of maintaining system functionality through selective equipment upgrades as the technology evolves, combined with scheduled systematic recopying of data as required. The advantage of this approach is the opportunity to work with established vendors and manufacturers as their product lines evolve to take advantage of the latest technological advances. In a stable marketplace, this approach incurs the least amount of risk. The disadvantages are the costs of continuous equipment upgrades and periodic data recopying, and dependence on the manufacturer's stability and its commitment to upgrading, rather than wholesale replacement of, a customer's existing equipment. In a rapidly evolving marketplace, none of these factors is certain.

The second approach involves a wholesale recopying of the data on a periodic schedule tied to the information on possible media errors obtained by periodic verification of disk degradation, but independent of expected system life. Such a strategy assumes that hardware and software compatibility is an "uncontrollable" issue that depends more on the dynamics of the marketplace than the needs of the archives and records management community. The advantage of this approach is that it focuses the energies of systems administrators on protecting data integrity. The disadvantage is the risk that the assumption of future systems compatibility may be incorrect.

The third approach involves transferring optical image and index data from a nearly obsolete generation to a newly emerging generation, in some cases bypassing one that is starting to become obsolete. In a sense this strategy "leapfrogs" from the optical technology on the verge of losing its usefulness to state-of-the-art technology, which may or may not utilize optical digital data disk storage media. The primary advantage of this strategy is the time it buys for systems administrators while the optical digital data disk industry settles into a more predictable development routine or is superseded by technologies as yet undeveloped. The strategy places the greatest demands on system manufacturers to guarantee "backward compatibility," which is the capacity of new equipment to function similarly to the equipment it replaces, in addition to its new and different capabilities. Adopting the leapfrog strategy requires that systems administrators closely monitor trends and not place blind faith in the future viability of any technology.

Rather than initially acquiring a large integrated system, Federal agencies may choose to install a prototype or research system to gain experience with digital-imaging and optical digital data disk technologies. This approach allows experimentation under operational conditions, without incurring the risk and expense of a major system procurement. Prototype systems also allow testing of design concepts, provide a means of gathering system users' input, and help demonstrate technological viability and validate operational capabilities. However, Federal agency administrators need to be aware of several factors, including the inherent difficulties involved in expanding a one-of-a-kind system or limited-capability prototype to a full-production configuration, system component technology obsolescence issues, and challenges in migrating a prototype system's digitally stored information to a succeeding larger scale system.

Recommendations

Upgrade equipment as technology evolves, and periodically recopy optical digital data disks as required.

Or

Recopy optical digital data disks based upon periodic verification.

Or

Transfer data from a nearly obsolete generation of optical digital data disks to a newly emerging generation, in some cases bypassing the intermediate generation that is mature but at risk of becoming obsolete.

Information Technology Standards

The foundation on which the preceding management approaches rest is adherence to nonproprietary technology standards. Nonproprietary standards minimize, and in some instances perhaps eliminate, data exchange and computer incompatibilities. Although these standards inevitably must change as technology evolves, nonetheless they provide a modicum of stability in the midst of rapid change. Hence, their use increases the likelihood that the information stored in a particular optical digital image storage system may still be readable, retrievable, and intelligible in the future. Unquestionably, proprietary system configurations can create serious impediments for long-term access to records stored on optical digital data disks. Therefore, administrators should require an open-systems architecture that utilizes products that conform to nonproprietary standards for optical digital data disk application systems. If this strategy is not possible, the vendor should be required to provide a "bridge" to systems that do conform to nonproprietary standards.

Appendix B of this report summarizes the current and emerging nonproprietary standards that systems administrators should incorporate into Federal agency technology migration strategies. A viable and aggressive technology migration strategy is essential, since it is unlikely that the readability, retrievability, and intelligibility of optical image data in systems can be maintained over time without the active participation of all concerned parties. The primary challenge to agency administrators responsible for managing optical digital data disk systems in the years ahead is ensuring that the technology migration strategy:

  • Takes into account trends in the technological environment
  • Uses existing and emerging nonproprietary technology standards and supports the ongoing development of data interchange standards
  • Adopts prudent preservation measures in the interim

The adoption of optical digital data disk technology by a Government agency is a major decision that carries with it responsibilities that extend well beyond the acquisition of the original hardware and software. Only with careful planning can the long-term viability of digital imaging and optical digital data disk storage systems be ensured.

Recommendations

Regularly monitor trends in the technological environment, giving particular attention to developments that conform to open-systems standards.

Specify existing and emerging nonproprietary technology standards in system design. Where possible, system components should conform to nonproprietary or commonly accepted practices.

Evaluate possible data degradation of information stored on optical digital data disks and system functionality on a regular basis using media error monitoring and reporting tools outlined in proposed and evolving standards such as ANSI/AIIM MS59-199X.

Support the ongoing development of nonproprietary standards for data exchange and interoperability.


The name of this office was changed to Technology Research Staff in 1992 as part of an overall NARA reorganization.

Optical disk is a generic term that includes both analog and digital technologies. The common link is the optical methodology used to read/write the information for diverse formats such as optical digital data disks, optical tape, video discs, and CD-ROM. The term optical digital data disk appears extensively in this report to describe the form of optical disk media used to store digital data.


Notes

National Archives and Records Administration and National Association of Government Archives and Records Administrators, "Digital Imaging and Optical Media Storage Systems: Guidelines for State and Local Government Agencies," December 1991.

Questions concerning this policy or requests for information on NARA bulletins currently in effect may be directed to the National Archives at College Park, Office of Record Administration, Agency Services Division (NIA), 8601 Adelphi Road, College Park, MD 20740, 301-713-6677.

National Institute of Standards and Technology, "Development of a Testing Methodology to Predict Optical Disk Life Expectancy Values," NIST Special Publication 500-200, 1991.

Yvonne Kidd, "Federal Imaging in 1993: Electronic Imaging Comes of Age – Part I," Inform (March 1993): 14–27; "Federal Imaging in 1993: Electronic Imaging Comes of Age – Part II," Inform (June 1993): 36–40.

Further information concerning NARA policies and bulletins currently in effect may be obtained by contacting the Office of Record Administration, Agency Services Division (NIA), 8601 Adelphi Road, College Park, MD 20740, 301-713-6677.

National Archives and Records Administration, "Optical Digital Image Storage System," March 1991. For additional information about this project, contact the Director, Standards and Technical Services, Association for Information and Image Management, 1100 Wayne Avenue, Suite 1100, Silver Spring, MD 20910.

See also American National Standards Institute and Association for Information and Image Management, ANSI/AIIM MS52-1991, "Recommended Practices for the Requirements and Characteristics of Original Documents Intended for Optical Scanning," 1991.

Association for Information and Image Management, AIIM TR25-1990, "The Use of Optical Disks for Public Records," 1990.

Association for Information and Image Management, AIIM TR25-1990, "The Use of Optical Disks for Public Records," 1990.

For a definition of the concept of intrinsic value, see National Archives and Records Administration, "Intrinsic Value in Archival Material," Staff Information Paper 21, Washington, DC, 1982.

The draft international standard is to be known as ISO 12087. Work on this standard is proceeding at a good pace, and approval is expected sometime in 1994.

National Archives and Records Administration, "Optical Digital Image Storage System," March 1991.

Adapted from U.S. General Services Administration, "Applying Technology to Record Systems: A Media Guideline," May 1993. The optical digital data disk storage capacities listed reflect the state of the art at the time of publication of this strategies report, and are subject to change.

The sum totals indicate more than the 15 agencies originally visited due to several sites using more than one optical digital data disk format.

A post write life expectancy should not be relied upon exclusively in determining when to recopy the data on optical digital data disks.

The Working Group has completed a draft test methodology for predicting the life expectancy of CD-ROM media and anticipates that it will turn its attention to developing a test methodology for rewritable media, particularly magneto-optical. Subsequently, the Working Group will focus on a test methodology for WORM media. These priorities reflect what are perceived to be the concerns of the marketplace, especially from the perspective of vendors.

A National Institute of Standards and Technology (NIST) project funded by the National Archives is currently researching critical issues in the care, handling, and storage of optical digital data disks.

National Archives and Records Administration, "Optical Digital Image Storage System," March 1991.

Association for Information and Image Management, AIIM TR25-1990, "The Use of Optical Disks for Public Records," 1990.

Vice President Al Gore, "From Red Tape to Results: Creating a Government that Works Better & Costs Less: Report of the National Performance Review," September 7, 1993.

Additional information on NARA policy or bulletins currently in effect is available from the National Archives at College Park, Office of Record Administration, Agency Services Division (NIA), 8601 Adelphi Road, College Park, MD 20740, 301-713-6677.

Most of these records date from the 18th and 19th centuries and are likely to have intrinsic value that precludes their disposal after conversion to digital images.

Many of these records date from the early 19th century and have intrinsic value.

U.S. Department of Justice, Justice Management Division, "Admissibility of Electronically Filed Federal Records as Evidence," Washington, DC, 1991.

Ibid., p. 11.

TR31-1992 Part 1: Performance Guideline for Admissibility of Records Produced by Information Technology Systems as Evidence, and TR31-1993 Part 2: Acceptance of Records Produced by Information Technology Systems by Federal or State Agencies (published reports); Part 3: User Guidelines, and Part 4: Model Rule and Model Law (draft reports).

Association for Information and Image Management, "Performance Guidelines for the Legal Acceptance of Records Produced by Information Technology Systems," Part I, "Performance Guideline for the Admissibility of Records Produced by Information Technology Systems as Evidence," Silver Spring, MD, 1992, p. 10.

Ibid.

Ibid., p. 11.

Quarter-inch cartridge (QIC), 8 mm, and 4 mm magnetic tape are inexpensive, as are the tape drives themselves. The storage capacity ranges from 3 to 6 gigabytes. Their chief drawbacks are slow data transfer rates, which nominally are about 1 gigabyte per hour, and a projected life expectancy of about 10 years.

Further details describing NARA's approach to the development and adoption of standards for electronic records management and archives are provided in: "A National Archives Strategy for the Development and Implementation of Standards for the Creation, Transfer, Access, and Long-Term Storage of Electronic Records of the Federal Government," Technical Information Paper No. 8, June 1990.
