Technical Information Paper No. 12
Digital-Imaging and Optical Digital Data Disk Storage Systems: Long-Term Access Strategies for Federal Agencies
July 1994
A Report by:
The Technology Research Staff
The National Archives at College Park
8601 Adelphi Road
College Park, Maryland 20740-6001
July 14, 1994
In recent years, Federal agencies have become increasingly interested in using digital information technologies to store large amounts of information economically and efficiently. This is particularly true of programs designed to provide Federal information to citizens, since a corresponding reduction in the creation of paper records could potentially reduce costs and improve the delivery of services to the public. However, agencies need to ensure that whatever technologies they employ to store information are capable of retrieving that information for as long as it is needed.
In 1991, the National Archives and Records Administration (NARA), in conjunction with the National Association of Government Archives and Records Administrators, conducted a study of digital imaging and optical media storage technologies at the State and local government levels. Building on the 1991 report, NARA initiated a study of Federal use of these two technologies. A project team from NARA's Technology Research Staff reviewed fifteen Federal digital imaging and optical digital data disk storage applications and interviewed a number of experts in the field. This Technical Information Paper, which makes recommendations for ensuring long-term access to digital images stored on optical digital data disks, is the result of that study.
As the keeper of the nation's memory and as the Federal government's institutional records manager, NARA has a mandate to provide records management guidance to Federal officials. At the same time, we invite archivists, records managers, information resource managers, and other information professionals to share their experiences and observations with us. Together we can develop strategies for using new technologies to store Federal information while ensuring that retention and access requirements are met.
TRUDY HUSKAMP PETERSON
Acting Archivist of the United States
National Archives and Records Administration
Table of Contents
- Preface
- Acknowledgments
- Section 1: Executive Summary
- Section 2: Listing of Recommendations
- Section 3: The Challenge of Long-term Access
- Section 4: Digital Image Capture
- System Integration
- Open-Systems Architecture
- Digital Image System Configuration
- Conversion of Original Records
- Digital Image Scanners
- Document Scanners
- Digital Scanning and Microforms
- Scanner Resolution
- Dynamic Range
- Image Enhancement
- Digital Image File Headers
- Data Compression Techniques
- Digital Image Quality Assurance
- Section 5: Indexing Systems
- Index Database Location
- Index Database Complexity
- Section 6: Optical Digital Data Disk Systems
- Optical Digital Data Disk System Configuration
- Optical Digital Data Disk Recording Technologies
- Write Once Read Many (WORM) Systems
- Rewritable Systems
- Optical Digital Data Disk Storage Capacity
- Jukebox Storage Systems
- Error Detection and Correction
- Small Computer Systems Interface
- Backward Compatibility of Optical Systems
- Optical Digital Data Disk Longevity
- Optical Digital Data Disk Substrates
- Optical Digital Data Disk Storage Environments
- Section 7: Information Retrieval
- Image Display Workstations
- Digital Image Printers
- Retrieval Applications Software
- Cost-Effectiveness
- Disposition of Original Records
- Legal Admissibility
- Long-term Access
- Obstacles to Access
- Vendor Instability
- Media Life Expectancy/Data Transfer and Backups
- System Obsolescence
- Migration Strategies
- Information Technology Standards
- Section 8: Information Management Policy
- Appendix A: Federal Agency Site Visit Reports
- Agency for Toxic Substances and Disease Registry
- U.S. Army Corps of Engineers (USACE)
- Bureau of Land Management (Eastern States)
- Commodity Futures Trading Commission
- Department of the Army (Office of Chief of Staff)
- Department of the Army (PERMS)
- Environmental Protection Agency
- Federal Communications Commission
- Library of Congress
- Minerals Management Service
- National Oceanic and Atmospheric Administration
- Patent and Trademark Office
- Social Security Administration
- State Department
- United States Geological Survey
- Appendix B: Summary of Technical Standards
- Overview of the Standards Process
- Optical Media Standards Criteria
- Status of Digital Image and Optical Disk Standards
- Appendix C: Glossary of Terms
- Appendix D: Bibliography
- Notes
Preface
In 1983 the Archivist of the United States directed that the recently created Archival Research and Evaluation Staff initiate a program to monitor developments in digital imaging and optical digital data disk storage technologies. By 1985, the Staff had initiated a research agenda that included several key programs. A pilot digital-imaging and optical digital data disk project, completed in 1989, assessed the suitability of digital-imaging and optical digital data disk technologies for National Archives holdings. Another program funded a research project by the National Institute of Standards and Technology (NIST) to develop a generic testing methodology for predicting the life expectancy of write once read many (WORM) optical media. This NIST study was completed in 1990. Subsequently, an international standards group on image permanence drew upon the results and conclusions of the NIST work in developing life expectancy standards for optical digital data disk storage systems. Thus far, this standards group has developed a draft life expectancy standard for CD-ROM media, and is currently developing a similar one for rewritable optical digital data disks.
Although NIST developed and demonstrated a successful generic testing methodology, it is useful only as a general indicator of media life expectancy. This is because individual optical digital data disks may fail at different times, and those times can vary by many years. Even though this indicator is helpful, perhaps as a general guide in media selection, it cannot indicate when to recopy data stored on a particular optical digital data disk. Consequently, in 1990 the National Archives and Records Administration (NARA) commissioned the National Institute of Standards and Technology to address this problem as part of a larger study on the data integrity of optical media.
NIST program staff organized a working group that developed a set of procedures for monitoring and reporting results of error detection and error correction codes on optical disk drives. These error detection and error correction activities are executed automatically without any action by users. One existing problem is that most optical disk drives do not provide a functionality for monitoring and reporting the error-checking results. The results of the NIST/industry working group were used as the basis for an Association for Information and Image Management (AIIM) draft standard that stipulates media error monitoring and reporting techniques for verification of the information stored on optical digital data disks. This AIIM draft standard is now being balloted. NIST's data integrity study includes research in care and handling of optical digital data disks. The results of these experiments will be published in a NIST report due in the fall of 1994.
Concurrently, NARA's Technology Research Staff initiated a study of the use of digital-imaging and optical media technologies in public records programs for State and local governments. Although the impetus for this study was a request from the National Association of Government Archives and Records Administrators (NAGARA), the project's descriptive plan of work stated that it was the first stage of a two-stage project. The project would culminate in an information access strategy report for Federal agencies. NARA and NAGARA published the report of the first phase, the State and local government study, in December 1991. Shortly thereafter, NARA's Technology Research Staff began research on the second phase and produced this report detailing strategies for long-term access for Federal agencies.
Drawing upon what had been learned in the State and local governments report, the project staff developed a research methodology based upon:
- A review of the digital-imaging and optical digital data disk marketplace and technical literature (e.g., optical industry journals, special studies, and technical reports) to identify emerging trends,
- An assessment of private industry and Federal agency system administrator experiences with digital-imaging and optical digital data disk projects, and
- A nationwide onsite examination of 15 Federal agency digital imaging or optical digital data disk applications.
The objective of this three-part analysis was to identify critical management issues and relate them to technical trends and user experiences. This report consists of an executive summary, a list of recommendations, and an overview of the challenges involved in long-term data access, followed by five sections that describe digital image capture, indexing systems, optical digital data disk storage systems, information retrieval, and information management policy. Each of the five main report sections contains management issues, technology trends, user experiences, and recommendations.
Most of the recommendations provided in this report conform to the State and local government study, although there are several clear departures. Generally, these departures take into account a better understanding of the issues and reflect changing trends in digital information technology. Four report appendixes provide detailed descriptions of agency site visits, a summary of relevant technical standards, a glossary of technical terms, and an annotated bibliography. As an aid to readers of this report, the authors selected boldface type to identify technical terms that are subsequently defined in appendix C, "Glossary of Terms."
The National Archives staff responsible for preparing this long-term strategies report included Charles Dollar, Barry Roginski, Peter Hirtle, and Charles Obermeyer II. Barry Roginski had the main responsibility for collecting the descriptions of agency site visits and organizing the report.
Acknowledgments
NARA's Technology Research Staff would like to take this opportunity to recognize the special assistance provided by those individuals and Federal agencies whose contributions made this report possible. The following individuals generously participated in our site visits and provided editorial assistance during the preparation of the site visit reports:
Tim Allard, Minerals Management Service
Ray Buland, National Earthquake Information Center
Ann Christy and Kristin Vajs, Library of Congress
Malcolm Ewell, Social Security Administration
James F. Gegen and Linda Brooks, Bureau of Land Management
David Grooms, Patent and Trademark Office
Sharon O. Jacobs, Agency for Toxic Substances and Disease Registry
Rick Kanner, Federal Communications Commission
Jacqui Lilly, Department of State
Charles MacFarland, National Oceanic and Atmospheric Administration
Gail Martin, Department of the Army
Hunton G. Oliver, Commodity Futures Trading Commission
Major Perkins, Department of the Army
Linda Worthington, US Army Corps of Engineers
Charles Young, Environmental Protection Agency
Special thanks are also due to the following individuals for their contributions during the data collection, technical analysis, and editorial review phases of this report:
Robert M. Blatt, Telos Systems Group
Eric Chaskes, Mary Donovan, Steven Puglia, and Sandra Tilley, NARA
Paul Conway, Yale University
Marilyn Courtot, Children's Literature
Howard N. Greenhalgh and Eric E. Tolbert, Department of the Army
Richard Harrington, Virginia State Library
Anne R. Kenney, Cornell University
Basil Manns, Library of Congress
Whitney S. Minkler, MSTC
Lance W. Morgan, Science Applications International Corporation
Julie Peternick, PRC Inc.
Fernando L. Podio, National Institute of Standards and Technology
William K. Saffady, State University of New York at Albany
Section 1: Executive Summary
The National Archives and Records Administration (NARA) recently completed a study of digital-imaging and optical digital data disk storage systems in the Federal Government. This report, prepared by NARA's Technology Research Staff, discusses the findings of that study. Major areas identified are critical information management issues, technological trends, and germane user experiences. Research study elements included analysis of optical digital data disk technological developments, review of the relevant technical literature, assessment of Federal agency program management experiences with optical digital data disk systems, and site visits to 15 Federal agency optical disk projects. Report appendixes consist of descriptive summaries of site visits, a listing of technical standards, a glossary defining technical terms, and an annotated bibliography.
Potential benefits of imaging systems can best be achieved when the optical digital data disk system supports the information needs of the agency as a whole and when the technology is used to enhance service, not simply to address a single, isolated problem. Federal agency records management officers and archivists have a vital interest in optical digital data disks, but they should also be cognizant of the perils of technological obsolescence, inconsistent equipment performance specifications, incompatible new products, and a shortage of technical and administrative standards. Note that the term "long-term value" (defined by agency need) is not synonymous with "permanent," which denotes historical value and permanent retention by the National Archives. The National Archives' current policy regarding optical digital data disks as a transfer medium for information of permanent value is described in section 8, "Disposition of Original Records."
Federal agency program managers responsible for records with long-term value should find this report's recommendations useful in designing and implementing an optical digital data disk system. This report is not a comprehensive overview of every important issue regarding the long-term access to electronic records. Rather, it is intended to complement existing technical studies and other generally available literature pertaining to digital-imaging and optical digital data disk technologies. In particular, this report is not intended to provide an exhaustive analysis of issues related to database indexing and retrieval, the development of optical digital data disk standards, or compact disc read only memory (CD-ROM) technology. Compact disc recordable (CD-R), for example, has both formal and de facto standards for factors such as directory structure and physical format and offers a relative independence of the stored data from proprietary retrieval mechanisms. The scarcity of observable CD-R systems during the 15 site visits precluded extensive discussion of user experiences with CD-R technology in this report.
Federal agency officials responsible for selecting and managing optical storage systems must adopt as an overall goal maintaining access to records of long-term value stored in digital format. To achieve this objective, these officials must:
- Ensure the quality of digital images captured through an electronic conversion process,
- Provide for the ongoing functionality of system components,
- Monitor and limit the deterioration of optical digital data disk storage, and
- Anticipate and plan for further technological developments.
Long-term usability of digitally stored information, including scanned document images, digital data, and descriptive index data, will best be achieved by implementing a sound policy for migrating data to future technology generations, adhering to well-documented image file-header formats, and monitoring media degradation. System managers should create the technical and administrative infrastructure required to implement relevant information technology standards as they are developed.
Ensuring the quality of digital images means exercising continuous control over three processes: conversion of the original image to digital data; enhancement of the digital image, if necessary; and compression and/or decompression of the digital data for transmission, storage, and retrieval. Quality-control inspections, either at the document scanner workstation or as a followup task, should compare the original documents to the captured electronic images and index data.
Ensuring that information stored on optical digital data disks will continue to serve the function for which it was originally intended for as long as it is needed requires:
- A long-term commitment to an open-systems architecture and
- Adoption of a methodical approach to system component upgrade and data migration that guarantees the interoperability of current technologies with those yet to be developed.
Federal agency administrators may utilize either write once read many (WORM) or rewritable optical digital data disk technologies to store records of long-term value. Administrators who use rewritable optical digital data disks should ensure that read/write privileges are meticulously controlled and that an audit trail of rewrites is maintained. Unless there are specific program justifications, administrators should select the most suitably sized optical digital data disk storage format that satisfies long-term agency program needs while conforming to industry standards.
NARA has long recognized the potential benefits of optical digital data disk technology for storing and retrieving large quantities of information. Several years ago, due to the unsettled state of optical media technology and especially the absence of standards, NARA issued an optical media policy bulletin. This bulletin notified Federal agencies that NARA could not accession optical digital data disks containing records of permanent value. Permanent records stored originally on optical digital data disks required conversion to a medium acceptable to NARA at the time of transfer to NARA's legal custody. A revised bulletin addressing these concerns was recently issued by NARA.
The long-term stability of optical digital data disks requires the specification of a reliable storage/recording technology and ongoing protection of the media from damage and abuse through handling and adverse environmental conditions. Although optical digital data disks appear to be more durable and stable than the hardware and software required to maintain access, vendor claims regarding durability must be carefully examined. This examination should involve an evaluation of the optical digital data disk's manufacture data, testing methodology and procedures, and test results based upon the findings described in NIST SP-200. This evaluation should support a predicted pre-write shelf life of five years, and a post-write life expectancy of at least twenty years. Because optical digital data disks are not immune to hostile storage conditions, it is prudent to store them in a stable environment.
By themselves, digital-imaging and optical digital data disk storage systems cannot solve access problems stemming from existing inefficient manual or computerized information systems and practices. Indeed, automating inefficient processes may merely exacerbate existing deficiencies. Agency administrators who seize the opportunity to reassess office operating procedures when adopting new technologies are more likely to benefit from improved administrative productivity, enhanced user services, and operational cost savings.
Long-term access to records of enduring value stored on optical digital data disks involves more, however, than image quality, system functionality, and media stability. Adoption of digital information storage on optical digital data disks effectively binds the agency to a technological evolution that it does not control. Administrators must continue to monitor technological trends; plan for systematic maintenance, upgrade, and eventual migration to newer technologies; use existing and emerging standards; support the development of data interchange standards; and adopt prudent information preservation measures in the interim.
Agency administrators, information management officials, systems development analysts, records managers, and agency historians must work together to develop and implement policies and procedures governing the care of Federal agency records appraised as permanent by NARA.
Section 2: Listing of Recommendations
This section lists the significant technical and administrative recommendations that are described in further detail within this report. For ease of user reference, this section's basic organization conforms to the major sections and subheadings in the remainder of this report.
DIGITAL IMAGE CAPTURE
Open-Systems Architecture
Adopt an open-systems architecture for new optical digital data disk applications.
Or
Require a "bridge" to systems with nonproprietary configurations.
Digital Image System Configuration
Clearly define user and agency requirements during the imaging project's requirements analysis phase.
Verify that the imaging system has inherent flexibility and has an open, or nonproprietary, design that accepts future hardware and software upgrades.
Conversion of Original Records
Analyze the domain of documents to be scanned, identify the levels of uniformity, and consider the use of a document conversion contractor when the backfile holdings are extensive.
Implement a comprehensive records accounting and tracking process during the project's conversion phase.
Invest sufficient staff or contractor resources into document preparation to increase scanning productivity.
As appropriate, assign experienced agency staff to conversion processes that benefit from their knowledge of existing agency operations.
Digital Image Scanners
Prior to system acquisition, validate vendor claims regarding document throughput rates, image quality, and ease of operation using a representative sampling of the agency's holdings.
Follow the standard test procedures outlined in FIPS PUB 157 "Guideline for Quality Control of Image Scanners."
Scanner Resolution
Employ a scanning resolution of at least 300 dpi for office documents when future applications (e.g., OCR) for the digital images are anticipated.
Specify a higher scanning resolution (between 300 and 600 dpi or higher) as needed, for engineering drawings, maps, and documents containing significant fine line and background detail information.
Dynamic Range
Employ gray-scale or color imaging technology as needed for suitable continuous-tone images such as photographs, maps, and related records.
As appropriate, utilize 8-bits per pixel gray scale image technology for capturing continuous tone black & white photographs and/or negatives, and 24-bit mode to obtain true color rendition.
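The resolution and dynamic range recommendations above translate directly into storage demand. The following is a minimal sketch of that arithmetic, assuming a letter-size page (8.5 by 11 inches, an illustrative choice not taken from this report) and ignoring file-header overhead, row padding, and compression. The function name is hypothetical.

```python
def uncompressed_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the raw, uncompressed size in bytes of one scanned page.

    Ignores file headers, per-row padding, and compression.
    """
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    total_bits = pixels_wide * pixels_high * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Illustrative assumption: a letter-size page, 8.5 x 11 inches.
bitonal_300 = uncompressed_image_bytes(8.5, 11, 300, 1)   # 1 bit per pixel
gray_300 = uncompressed_image_bytes(8.5, 11, 300, 8)      # 8-bit gray scale
color_300 = uncompressed_image_bytes(8.5, 11, 300, 24)    # 24-bit color
bitonal_600 = uncompressed_image_bytes(8.5, 11, 600, 1)   # doubled resolution

for label, size in [
    ("300 dpi bitonal", bitonal_300),
    ("300 dpi 8-bit gray", gray_300),
    ("300 dpi 24-bit color", color_300),
    ("600 dpi bitonal", bitonal_600),
]:
    print(f"{label}: {size:,} bytes")
```

Under these assumptions, doubling the resolution quadruples the raw size, and moving from bitonal to 8-bit gray scale multiplies it by eight. This is why resolution and bit depth are best chosen per document type rather than set uniformly.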
Image Enhancement
Conduct scanner testing using selected documents during the system design phase to determine the need for special scanner hardware modifications.
Retain scanned unenhanced images of documents of intrinsic value.
Digital Image File Headers
Use file formats that promote/facilitate network data transfer, such as Aldus/Microsoft TIFF Version 5.0, which meets the Internet Engineering Task Force standard definition for exchange of black and white images on the Internet.
Require use of a nonproprietary image file-header label.
Or
Require a "bridge" to a nonproprietary image file-header label.
Or
Require a detailed definition of image file-header label structure.
Data Compression Techniques
Use a lossless compression scheme when continued fidelity to the exact appearance of the original document is achievable and desired.
Use JPEG or MPEG for images with continuous tonal qualities when some loss of detail is acceptable.
For digital images without continuous tonal qualities, require standardized compression techniques, such as CCITT Group 3, CCITT Group 4, or JBIG.
If a proprietary lossless compression system is used, require that the vendor provide a means of decompressing the data to its original format.
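The defining property of a lossless scheme is that decompression returns the original data bit for bit. The sketch below illustrates a round-trip check of that property. It uses Python's standard `zlib` module purely as a stand-in for whatever lossless method an agency selects; it is not an implementation of CCITT Group 3, CCITT Group 4, or JBIG, and the sample data is synthetic.

```python
import zlib


def is_lossless_round_trip(original: bytes) -> bool:
    """Compress, decompress, and confirm the result equals the input exactly."""
    compressed = zlib.compress(original, 9)
    restored = zlib.decompress(compressed)
    return restored == original


# Synthetic stand-in for a bitonal page: long white runs with short black runs.
page = bytes([0] * 2000 + [255] * 50) * 100

print("round trip exact:", is_lossless_round_trip(page))
print("original bytes:", len(page))
print("compressed bytes:", len(zlib.compress(page, 9)))
```

A similar check, run against a vendor's own decompression utility, is one way to confirm that a proprietary lossless format can in fact be returned to its original form.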
Digital Image Quality Assurance
Routinely evaluate scanner performance based on quality control procedures recommended in FIPS PUB 157 "Guideline for Quality Control of Image Scanners."
Establish a consensus on what constitutes the "best" image for the different types of agency source documents; monitor on-going image quality using the system's display screens and laser printers.
Perform a 100-percent visual quality evaluation of each scanned image and related index data; quality control inspections must be meticulous if the original documents are not retained after conversion.
Verify the information as copied when transferring images/data to any other media from the original magnetic hard drive or optical digital data disk.
If WORM disks are the storage medium of choice, permanently write the information only after conducting a thorough quality-control inspection of the scanned images and index data.
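One way to verify information as copied is to compare a digest of the source file with a digest of the copy. The sketch below assumes the images are accessible as ordinary files and uses SHA-256 from Python's standard library. The choice of digest and the function names are illustrative assumptions, not requirements stated in this report.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_verified(source_path, copy_path):
    """True only if the copy is byte-for-byte identical to the source."""
    return file_digest(source_path) == file_digest(copy_path)
```

A matching digest confirms that the copy is faithful to the source. It does not confirm that the source image itself was scanned correctly, so it complements rather than replaces the visual quality-control inspection.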
INDEXING SYSTEMS
Indexing Systems
As appropriate, ensure that information retrieval software is SQL compliant.
Regardless of the capture methodology used, conduct a 100-percent quality-control inspection of all index data.
Index Database Location
Store the index data magnetically for improved operations and optically if long-term preservation is a concern.
Index Database Complexity
Index system design and capability decisions should be based on a thorough analysis of agency operations and user needs.
OPTICAL DIGITAL DATA DISK STORAGE SYSTEMS
Optical Digital Data Disk Recording Technologies
Either WORM or rewritable technologies may be used, with the actual selection determined by the agency's specific application requirements. Ensure that read/write privileges are carefully controlled and that an audit trail of rewrites is maintained when rewritable technology is used.
Optical Digital Data Disk Storage Capacity
Based on a requirements analysis and systems design study of an agency's operations, select the most suitably sized optical storage form factor that satisfies the agency's long-term programmatic needs and conforms to industry standards.
Jukebox Storage Systems
When selecting an optical digital data disk jukebox, consider the following factors: the overall information access needs (staff and public), budget and procurement considerations, and existing operations staff requirements.
Error Detection and Correction
Require that equipment conform to the proposed national standard ANSI/AIIM MS59-199X, "Use of Media Error Monitoring and Reporting Techniques for Verification of the Information Stored on Optical Digital Data Disks."
Small Computer Systems Interface
Specify the SCSI "Write and Verify" command when writing data to optical digital data disks.
Require system manufacturers and integrators to provide complete documentation on the specific configuration of the SCSI (or other interface) hardware and software.
Backward Compatibility of Optical Systems
Require upgrades or replacement systems to be backward compatible with existing information systems.
Or
Convert the existing digital information to the new format at the time of system upgrade or acquisition.
Optical Digital Data Disk Longevity
In addition to conducting a careful analysis of each manufacturer's media life expectancy testing methodologies and procedures, require the use of optical digital data disks with a pre-write shelf life of at least five years.
Require a minimum post-write life of twenty years based upon manufacturer optical digital data disk life expectancy tests that conform to the findings of NIST SP-200.
Optical Digital Data Disk Substrates
Optical digital data disk substrates of polycarbonate or tempered optical glass are acceptable.
Optical Digital Data Disk Storage Environments
Optical digital data disks should be stored in areas with stable room temperatures and with relative humidity ranges consistent with the storage of magnetic tape media. Avoid storage areas with excessive humidity and high temperature, and do not subject optical digital data disks to rapid temperature extremes.
If possible, do not operate systems or store optical digital data disks in environments with excessive airborne particulate matter.
Cleaning procedures for optical digital data disks must be in strict conformance with the media manufacturer's recommendations.
INFORMATION RETRIEVAL
Conduct a comprehensive requirements analysis of end users' information-access needs and a systems design study prior to procuring imaging system components.
INFORMATION MANAGEMENT POLICY
Cost-Effectiveness
To maximize the long-term viability of systems, develop digital imaging and optical digital data disk applications in a cost-effective manner.
Where possible, link system design to Government improvement initiatives such as the National Performance Review.
Reexamine existing paper records systems prior to conversion to optical digital data disk systems to maximize productivity and improve delivery of information services.
Disposition of Original Records
Conform to NARA policy regarding the disposition of original records when converting to an optical digital data disk system.
Legal Admissibility
Become familiar with how the rules of evidence apply to Federal records, and ensure that procedural controls that protect their integrity are in place and adhered to.
Implement the recommendations provided in AIIM TR31, Parts I and II, applicable to agency projects using digital-imaging and optical digital data disk storage technologies, either in the conversion of paper documents to digital form or their initial creation in digital form.
Long-term Access
Develop an agency-wide data migration and disaster recovery plan for the digital imaging and optical digital data disk storage system well in advance of any migration or disaster.
Vendor Instability
In the event that warning signs of impending obsolescence appear, managers should make immediate plans to migrate the application to a new system.
Require vendors to deposit a copy of the computer system's application software codes and associated documentation with a bank, archives, or secure records facility in case of a business failure.
Media Life Expectancy/Data Transfer and Backups
Recopy data stored on optical digital data disks based on the information obtained through periodic verification of media degradation.
Create a backup copy of the information stored on optical digital data disks for retention in an offsite facility, using the appropriate storage media (optical, magnetic, paper, or microfilm) that best satisfies agency requirements.
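The recopy recommendation above depends on turning periodic verification readings into a decision. The sketch below shows one simple form such a rule could take. The readings, the threshold value, and the function name are all hypothetical; this report does not specify numeric limits, and the sketch does not reproduce the procedures of the ANSI/AIIM draft standard.

```python
def needs_recopy(error_rates, threshold):
    """Decide whether a disk should be recopied.

    error_rates: monitoring readings for one disk, oldest first
                 (for example, corrected errors per byte read).
    threshold:   agency-chosen limit; the value is an assumption here.
    Returns True when the most recent reading meets or exceeds the limit.
    """
    if not error_rates:
        raise ValueError("at least one monitoring reading is required")
    return error_rates[-1] >= threshold


# Hypothetical readings from three successive verification passes.
stable_disk = [1.0e-6, 1.2e-6, 1.1e-6]
degrading_disk = [1.0e-6, 8.0e-6, 5.0e-5]
limit = 1.0e-5  # hypothetical limit, not taken from any standard

print("stable disk needs recopy:", needs_recopy(stable_disk, limit))
print("degrading disk needs recopy:", needs_recopy(degrading_disk, limit))
```

In practice the limit would be set well below the point at which the drive's error correction can no longer recover the data, so that recopying happens while every sector is still readable.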
System Obsolescence
Specify that the vendor provide a complete set of documentation, including source code with flow diagrams, object code, and operations and maintenance manuals as a contract deliverable.
Periodically review and revise system documentation to ensure that all subsequent system modifications and enhancements are adequately described.
Migration Strategies
Upgrade equipment as technology evolves, and periodically recopy optical digital data disks as required.
Or
Recopy optical digital data disks based upon periodic verification.
Or
Transfer data from a nearly obsolete generation of optical digital data disks to a newly emerging generation, in some cases bypassing the intermediate generation that is mature but at risk of becoming obsolete.
Information Technology Standards
Regularly monitor trends in the technological environment that conform to open-systems standards.
Specify existing and emerging nonproprietary technology standards in system design. Where possible, system components should conform to nonproprietary or commonly accepted practices.
Evaluate possible data degradation of information stored on optical digital data disks and system functionality on a regular basis using media error monitoring and reporting tools outlined in proposed and evolving standards such as ANSI/AIIM MS59-199X.
Support the ongoing development of nonproprietary standards for data exchange and interoperability.
Section 3: The Challenge of Long-term Access
Digital-imaging and optical digital data disk storage technologies have been available in the commercial marketplace for more than a decade. Digital imaging typically involves converting existing paper documents (e.g., forms, reports, maps, drawings, correspondence), photographs, or microforms into an electronic representation for computerized storage and retrieval. Due to the vast data storage capacities they offer, optical digital data disks are often integral components of digital imaging systems. The linkage of digital imaging, which generates sizeable electronic files, to the superior storage capacity of optical digital data disks has made both technologies increasingly attractive to those seeking improved staff productivity and enhanced user services.
Given vendor claims about the cost-effectiveness of digital-imaging and optical storage technologies, it is not surprising that interest in these two technologies is mushrooming. Newsletters, journals, and periodicals relating to these technologies regularly describe an increasing number of digital-imaging and optical digital data disk applications in health care services, banks, insurance companies, pharmaceutical companies, and universities, to name only a few. Two factors are driving the use of digital-imaging and optical digital data disk storage technologies: the availability of an increasing variety of devices and decreasing costs. Consequently, the number of digital-imaging and optical digital data disk applications being planned or actually implemented at all levels of government (Federal, State, and local) continues to grow. A recent review of major Federal Government digital imaging and optical media applications highlights the enormous financial investment committed to these two technologies.
Many Federal agency program managers consider digital-imaging and optical digital data disk storage technologies to be essential tools in providing improved and more cost-effective services. However, there is another group of governmental administrators whose chief concern is the impact of these two technologies on the disposition of Federal records. Records managers, who implement the approved disposition of Federal records, and archivists, who protect and service valuable permanent records, have a vital interest in the viability of digital-imaging and optical digital data disk storage systems. Many records managers are concerned about the legal admissibility of Federal records in a court of law. Both records managers and archivists are concerned with storage media longevity and information system obsolescence.
This report addresses these as well as other issues relevant to the administration of optical digital data disk information systems. The report also identifies issues that the digital-imaging and optical digital data disk industries must address and resolve if these technologies are to prove viable for applications requiring long-term access. In this regard, it is hoped that the guidance provided herein helps initiate a much-needed vendor and user impetus to ensure that archival information access considerations are taken into account during an information system's requirements analysis, system design, engineering development, and integration phases.
For purposes of this report, optical digital data disk systems are defined as an amalgam of technological processes that includes, at a minimum, digital imaging or storage of digital data on optical digital data disks or similar optical media. Based on a review of central management issues, technical trends, and user experiences, the report covers a range of topics that are unique to these technologies. Technology trends have been identified through market research and a diligent review of technical specifications, product literature, and analytical reports published in a wide range of journals and periodicals. In order to enrich the study and to provide a suitable environment in which to "test" key concepts and ideas, 15 Federal Government optical digital data disk applications were surveyed. Although these 15 applications are not fully representative of all optical storage systems currently in operation, they do provide important background information against which to view the implementation of digital-imaging and optical digital data disk information storage systems within the Federal Government.
Government officials responsible for selecting digital-imaging and optical digital data disk storage systems must address several critical factors. They must, of course, ensure that the digital-imaging and optical digital data disk storage system they select meets their agency's immediate needs in a cost-efficient manner. They must also ensure that records scheduled as permanent are retained in their original format following conversion or are converted to a medium acceptable to NARA at the time of transfer to the National Archives.
The decision to acquire or build an image record system using optical digital data disks with specific system capabilities should be based on a thorough analysis of the agency's immediate and long-term information processing requirements. At the same time, the adoption of digital-imaging and optical digital data disk technologies is not without perils, due to the many challenges resulting from rapid technological evolution. For example:
- New optical storage products, many incompatible with each other, are constantly emerging;
- The paucity of technical and administrative standards limits the development of objective criteria for selecting equipment;
- The rapid adoption of optical technology by Government agencies increases the pressure to act;
- A proliferation of vendors at the National and local levels, with competing claims, may complicate the setting of performance specifications; and
- The longevity (market life) of digital-imaging technology products and the vendor community providing systems and technical services is volatile.
Of course, these "pitfalls" are manifestations of a more pervasive problem of technological obsolescence. Far too often the implications of these hazards to long-term access, linked to agency accountability for its programs, are not sufficiently taken into account. Long-term access to, or usability of, digital-image, or character-based, data must be viewed distinctly from the medium on which the information is stored. This distinction allows for a continuing commitment to information in digitized format, while simultaneously recognizing that the media storing that data must eventually be replaced due to inevitable obsolescence.
Establishing a commitment to maintaining the long-term usability of digitized information, as opposed to the media itself, is not enough. Maintaining this long-term commitment to use digitally stored information also requires continuous:
- Data readability
- Data retrievability
- Data intelligibility
Data readability refers to the ability to process the information on a computer system or device other than the one that initially created the digital information or on which it is currently stored. Typically, nonreadability involves some aspect of an older storage device (a tape or disk) that makes it physically incompatible with existing equipment. This "hardware obsolescence" occurs when storage devices and media used today become incompatible with those developed in the future. For instance, the 556 bits per inch (bpi) magnetic storage tapes commonly used in the 1960's cannot be read by current tape drives. Of course, as long as a magnetic tape or hard disk drive continues to function properly and repair parts are available, its usable life can be extended for a time. Similarly, the lifespan of optical digital data disks can be extended through proper storage and maintenance practices.
Data retrievability, which assumes readability as just defined, means that identifiable records or parts of records can be selected and accessed. Accurate retrieval requires keys, or pointers, that link the logical structure of records (i.e., data fields, text strings, directories, and indexes) to the physical storage locations of the data on a disk. The optical digital data disk logical structure may have little relationship to the media and format involved. Usually, this linkage information is found in a file header, or label. The label may include information required to locate the beginning of a file, to indicate the number of bytes each record contains and where these bytes are physically located, and to distinguish among the various informational units or fields that form records. Typically, the interpretation of the record's logical structure is a function of the computer's operating system (e.g., MS-DOS). Ensuring the long-term retrievability of records therefore requires the continued functionality of the original operating systems or device drivers, yet these, too, are likely to become obsolete given enough time.
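The role of the file header, or label, can be illustrated with a short sketch in Python. The 10-byte header layout used here (data offset, record length, record count) is entirely hypothetical and does not correspond to any actual optical disk format; the function names are likewise invented for illustration.

```python
import struct

# Hypothetical label layout, big-endian with no padding:
#   4 bytes  offset at which the first record begins
#   2 bytes  number of bytes in each record
#   4 bytes  number of records in the file
HEADER_FORMAT = ">IHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 10 bytes


def build_file(records):
    """Prefix equal-length records with a label describing their layout."""
    record_len = len(records[0])
    if any(len(r) != record_len for r in records):
        raise ValueError("all records must have the same length")
    header = struct.pack(HEADER_FORMAT, HEADER_SIZE, record_len, len(records))
    return header + b"".join(records)


def read_record(blob, index):
    """Use the label to locate and return the record at the given index."""
    offset, record_len, count = struct.unpack_from(HEADER_FORMAT, blob, 0)
    if not 0 <= index < count:
        raise IndexError("record index out of range")
    start = offset + index * record_len
    return blob[start:start + record_len]


blob = build_file([b"DEED0001", b"DEED0002", b"DEED0003"])
print(read_record(blob, 1))  # b'DEED0002'
```

A reader that does not know the label layout sees only an undifferentiated run of bytes. This is why a proprietary or undocumented header can make otherwise readable data irretrievable.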
Data intelligibility means that the information a computer retrieves is comprehensible to another computer system or a human viewer. Intelligibility may occur at three levels. At its simplest level, intelligibility occurs when two computer systems either use or understand the same digital representation of the information, and this representation is translated into a form that humans recognize and understand. A prime example of an understandable form is an American Standard Code for Information Interchange (ASCII) text file. The second level occurs when two computer systems can use or understand the same representation of the information (e.g., ASCII), but when the representation is presented to users, it does not carry sufficient information (e.g., it is not self-referential) for a human to comprehend. Usually, this problem is associated with both coded and numeric data, and the intelligibility of such information can only be assured through documentation defining the values represented by the numbers and codes. The third level occurs when two different software applications, functioning in different computing environments, can process the same digital data and achieve identical results. One example of this is a text document embedded in one word-processing system that can be processed by a totally different word-processing system with no loss of information or page formatting details such as type fonts and line spacing. A lack of intelligibility at this level becomes particularly evident when a proprietary encryption scheme is encountered or when digital images are compressed based on a proprietary technique.
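The second level of intelligibility can be illustrated with a short sketch in Python. The record layout, the field names, and the code values below are hypothetical. The record itself is plain ASCII that any system can read, but its codes are meaningful to a human only when the accompanying documentation (here, the `CODEBOOK` table) has been retained.

```python
# Hypothetical documentation defining the values represented by each code.
CODEBOOK = {
    "record_type": {"07": "land deed", "12": "correspondence"},
    "disposition": {"P": "permanent", "T": "temporary"},
}

# Hypothetical order of the pipe-delimited fields in each record.
FIELD_ORDER = ["record_type", "disposition", "date"]


def decode_record(line, codebook=CODEBOOK):
    """Translate the coded fields of one record into documented values."""
    decoded = {}
    for name, raw in zip(FIELD_ORDER, line.split("|")):
        table = codebook.get(name)
        if table is None:
            decoded[name] = raw  # field is not coded; keep it as stored
        else:
            decoded[name] = table.get(raw, f"undocumented code {raw!r}")
    return decoded


print(decode_record("07|P|19931104"))
# {'record_type': 'land deed', 'disposition': 'permanent', 'date': '19931104'}
```

If the code table is lost, the same record remains readable and retrievable but is no longer intelligible.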
Factors such as proprietary file-header labels, data compression techniques, and software obsolescence are major barriers to achieving intelligibility over time. Nonproprietary standards can begin to address and resolve some of these concerns. In spite of the absence of information technology standards in some areas that could help ensure the long-term records accessibility, system managers should create the technical and administrative infrastructure required to implement relevant information technology standards as they are developed.
An additional layer of complexity may arise with the way some optical digital data disk storage and retrieval systems write index pointer information. Furthermore, the searching and retrieval software associated with a particular application system usually requires a specific operating system platform, such as MS-DOS. Typically, a retrieval software application will add other pointers to the logical structure of the records. The retrievability of these records is therefore inextricably linked to the software application. Unless built-in data migration paths are established or newer software generations are installed that offer backward compatibility, accessing the records will be impossible.
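One way to picture a built-in data migration path is an export of the application's index pointers to a plain ASCII, comma-delimited file that any successor system could read. The sketch below, in Python, is an illustration under stated assumptions: the field names (`image_id`, `disk_label`, `byte_offset`) and both functions are hypothetical and do not describe any particular retrieval product.

```python
import csv
import io

# Hypothetical index fields maintained by a retrieval application.
FIELDS = ["image_id", "disk_label", "byte_offset"]


def export_index(entries):
    """Write index entries as comma-delimited ASCII text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry)
    return buffer.getvalue()


def import_index(text):
    """Rebuild index entries from exported text in a successor system."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [
        {
            "image_id": row["image_id"],
            "disk_label": row["disk_label"],
            "byte_offset": int(row["byte_offset"]),
        }
        for row in reader
    ]


index = [
    {"image_id": "IMG-0001", "disk_label": "DISK-A", "byte_offset": 0},
    {"image_id": "IMG-0002", "disk_label": "DISK-A", "byte_offset": 467500},
]
exported = export_index(index)
print(import_index(exported) == index)  # True
```

Because the exported text carries its own field names, it can be interpreted without the original retrieval software.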
Long-term access to digitally stored information, including scanned document images and descriptive index data, can be assured through migration to future technology generations. In order to make this migration possible, applications must follow certain practices in the initial digital image capture and its storage on optical digital data disk storage systems. In addition, the indexing and retrieval systems in place must still be useful in new technological environments. A clear data migration strategy must be established, and an information management policy with comprehensive administrative procedures for both the data and its migration is also necessary. This report addresses each of these critical areas in turn.
In particular, it is hoped that the report's findings will assist program managers, archivists, and records managers in maintaining the long-term usability of digitally stored information, including scanned document and microform images, electronic data files, and accompanying character index databases. The likelihood of long-term readability, retrievability, and intelligibility will be increased if programs:
- Ensure the quality of digital images captured through an electronic conversion process,
- Provide for the continuing functionality of system hardware and software components over time,
- Monitor for any potential data degradation of optical digital data disks, and
- Anticipate technological developments and plan accordingly.
This report assumes that Federal agencies have decided to implement an optical digital data disk application following approved requirements analysis, system design, and cost-benefit study processes and in accordance with NARA regulations covering records scheduling and the transfer of permanent records to the National Archives (see section 8, "Disposition of Original Records"). Other readily available information sources, some of which are listed in the bibliography, examine and review approaches to the technology decision-making process. This report does not discuss computer skills and technical expertise required of agency staff who implement and operate an optical digital data disk application.
The National Archives recognizes the potential benefits of optical digital data disk technology for storing and retrieving large quantities of digital information. Several years ago, due to several factors that included the unsettled and rapidly evolving state of the optical media technology marketplace and the absence of formally adopted industry standards, the National Archives notified Federal agencies that optical digital data disks containing records of permanent value could not be accessioned. Under the terms of this notification, permanent records stored on optical digital data disks must be converted to a medium acceptable to NARA at the time of transfer to NARA's legal custody. Unscheduled records converted to an optical medium must be retained in their original format pending scheduling or, if they are later determined to be permanent, must be converted to a medium acceptable to NARA at the time of transfer. Federal agency records cannot be disposed of without the authorization of the Archivist of the United States.
NARA continues to monitor developments in optical digital data disk technology, and has recently issued policy bulletins describing the accessioning of optical digital data disks. Federal agency administrators are encouraged to inform the National Archives of significant applications for implementing optical digital data disk technology.
The recommendations cited in this strategies report are not intended to set standards for system development or to determine the procurement of optical digital data disk application systems, nor should they be viewed as de facto archival standards. Rather, they are offered as reasoned conclusions that prudent managers of programs involving records of long-term value may find useful in designing and implementing optical digital data disk information storage systems.
Section 4: Digital Image Capture
Transforming paper documents into digitally stored electronic images offers Federal agencies several immediate benefits including greatly reduced record handling costs, improved operational efficiency, and increased information-processing effectiveness in the workplace. Since original source documents can deteriorate (or even disappear) when used for reference, in many cases it is prudent to convert them to another form or medium that can provide equal or greater utility without harming the originals. Historically, micrographics has been by far the most popular medium on which to transfer reference copies of original documents. Digital imaging technology has demonstrated an ability to convert and store electronic images of documents on optical digital data disks and automatically retrieve the information to a display screen or printer for reference. The majority of the 15 Federal agencies visited for this report utilize digital-imaging technology for scanning documents. The popularity of digital imaging is understandable, considering the sheer volume of Federal agency information that already exists in paper form. Traditional paper-based records storage-and-retrieval systems are often labor-intensive, time-consuming processes. However, a digital conversion of Federal Government records involves complex issues such as holdings conversion, file-header schemes, data compression, and quality control.
Management Issues
A fundamental issue facing agency administrators is the tradeoff between the costs of integrating digital imaging systems and the benefits that accrue to system users. Large-scale digital-imaging projects require an ongoing agency commitment to highly specialized equipment, a technology-oriented staff, and suitably equipped conversion facilities. Significant conversion project cost items may include identifying and preparing the materials to be converted, image capture, indexing, quality inspections, and refiling of records. Sophisticated indexing at the image-item level requires a considerable investment in human resources. Imaging offers many potential user benefits, including multiple simultaneous access to images, rapid and accurate retrievals, economical communication of image data over great distances, high-resolution image display on a desktop terminal, and laser-printed output.
One possible conversion alternative is the use of an imaging service bureau for a one-time backfile conversion. This approach eliminates the need to establish an in-house equipment and operations staff capability for a one-time endeavor. Since Federal agency personnel possess a greater working knowledge of the agency's records holdings, they function more effectively as document preparation staff, quality-control inspectors, or as monitors of the contractor's performance. Under these circumstances, contractor staff are often assigned to the more repetitive tasks of scanning and data entry. Conversion contracts are facilitated when the universe of documents is relatively static, such as a set of existing historic land deeds, and where the conversion is conducted in-house or at a preestablished records holding site. Agency control over the conversion site offers several advantages including reduced records transportation costs, improved document security, and simplification of administrative records tracking.
Additional costs may be incurred when attempting to increase the technical image quality at each stage of the conversion process. Several approaches to improved image quality are available, depending on the characteristics of the original documents. For example, selecting a higher scan resolution often improves image sharpness. Alternatively, specifying a lower scan resolution combined with computed gray scale can also improve image appearance. These benefits may be offset by larger digital-image file sizes and, quite possibly, slightly longer wait times during image retrieval. Higher quality images are more likely to be of increased value in the future, extending the effective life of the digital files and postponing the need to rescan (assuming the original documents are retained). Ensuring that the best possible representation of each original document is captured and preserved can increase the agency management's confidence level in the imaging system.
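The relationship between scan settings and file size can be estimated with simple arithmetic: the number of pixels across, times the number of pixels down, times the number of bits recorded per pixel. The following Python sketch is illustrative only. The function name is hypothetical, the page size and scan settings are example values, and the figures are for uncompressed images; compression, which most systems apply, would reduce them by an amount that depends on the technique and the document.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed size of one scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole byte


# An 8.5 x 11 inch page under three example scan settings.
print(uncompressed_size_bytes(8.5, 11, 200, 1))  # 467500  (200 dpi, bitonal)
print(uncompressed_size_bytes(8.5, 11, 300, 1))  # 1051875 (300 dpi, bitonal)
print(uncompressed_size_bytes(8.5, 11, 200, 8))  # 3740000 (200 dpi, 8-bit gray)
```

In this example, raising the resolution from 200 to 300 dots per inch roughly doubles the uncompressed size, while adding 8-bit gray scale at 200 dots per inch increases it eightfold.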
Currently, there are no objective empirical indicators of acceptable image quality for digitally scanned images. The Association for Information and Image Management (AIIM) is supporting work to develop source-document-independent approaches for evaluating the quality of images captured in document conversions. One possible approach is to categorize documents based upon types of perceived scanning problems. An impartial agency review committee could compare the original documents to the scanned digital images and arrive at a consensus on capturing the "best" image for each document category. Regardless of the quality standard selected, adherence to quality-assurance procedures is a management responsibility. This critical element should not be left solely to the discretion of the conversion operations staff.
Several issues should be addressed during the program's system planning and budgeting phases: initial system cost; maintenance requirements; and unforeseen demands placed on the system due to creative use. Initial system costs include factors such as system capability, performance requirements, and equipment and applications software configurations. Maintenance requirements include compliance with system revisions, updates, and component replacements needed for continuing system operations. As part of system maintenance, specific system benchmarks should also be established and continually reviewed to assure that the installed system is meeting agency requirements. Lastly, unforeseen creative use of the system will increase demands for additional user reference services. These added demands will require highly qualified agency or vendor staff to perform services such as integrating additional system components, upgrading the data base management system (DBMS), and enhancing the performance of the imaging system's data communications network.
System Integration
Technical Trends
The system development phase of a digital-imaging or optical digital data disk project usually includes the identification, collection, and organization of the Federal agency's materials to be processed. Image data stored on optical digital data disks is frequently created by a retrospective conversion of all existing information in an agency's files or, conversely, adoption of a "today-forward" conversion concept, wherein only the most recent or active records are converted. Additionally, there are system applications geared toward Federal agency records that never originally existed in paper format and do not require a document-scanning conversion.
A digitally scanned raster image is essentially an electronic "photograph" of a document, divided into a "grid" composed of thousands of minuscule picture elements, or pixels. The brightness value for each pixel is converted to a digital representation. Unlike alphanumeric data, raster images consist of binary 1's and 0's that in themselves carry no intelligence and therefore cannot be queried in terms of what information the image represents. It is for this reason that comprehensive, accurate indexing of digital images is mandatory for efficient user retrieval access. Properly indexed electronic digital images can be displayed on high-resolution display screens, transmitted to remote user sites, or distributed as hard copy.
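The dependence of image retrieval on indexing can be illustrated with a minimal sketch in Python. The tiny raster, the index fields, and the `find_images` function are hypothetical and are far simpler than any production system. The point is that the pixel grid holds only brightness values, so a query such as "all land deeds" can be answered only from the descriptive index.

```python
from dataclasses import dataclass

# A tiny bitonal raster: 1 = dark pixel, 0 = light pixel. Nothing in these
# values states what the document is or what it is about.
raster = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]


@dataclass(frozen=True)
class IndexEntry:
    """Descriptive index record for one stored image (hypothetical fields)."""
    image_id: str
    document_type: str
    year: int
    disk_label: str


def find_images(index, **criteria):
    """Return the IDs of images whose index fields match every criterion."""
    return [
        entry.image_id
        for entry in index
        if all(getattr(entry, field) == value
               for field, value in criteria.items())
    ]


index = [
    IndexEntry("IMG-0001", "land deed", 1887, "DISK-A"),
    IndexEntry("IMG-0002", "correspondence", 1887, "DISK-A"),
    IndexEntry("IMG-0003", "land deed", 1902, "DISK-B"),
]
print(find_images(index, document_type="land deed"))  # ['IMG-0001', 'IMG-0003']
```

An image with no index entry, or with an inaccurate one, remains stored on the disk but is effectively lost to users.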
The digital-imaging and optical digital data disk storage marketplace will continue to introduce new products that vary in configuration, capability, and cost. This process reflects the diversity of user information-access needs. The pioneering digital-imaging projects were often research pilot programs or stand-alone operations with no direct links to existing information systems. This concept is undergoing a fundamental change, as today's agency administrators and end users alike are more technically sophisticated and expect tangible results. In response, digital-imaging and optical digital data disk storage systems are increasingly assuming higher profile roles in agencies, serving as catalysts for organizational changes.
Federal agency management must recognize the potential impact that imaging technology can impose on workflow processes, including agency forms design and administration, personnel management and supervision, paperwork and records management, and agency-patron relations. Introducing an imaging system into the workplace will not in itself necessarily provide an immediate solution to inherent operational shortcomings unless significant effort is expended for document control and indexing. Maximum benefits are obtained when existing workflow processes and operational procedures are adapted to the new technology.
User Experiences
The 15 Federal agency systems visited for this report were integrated using one of several approaches, and many of the systems utilize components obtained from diverse equipment manufacturers, rather than relying on one single product source. Approximately one-third of the systems, including the Department of the Army's Personnel Electronic Records Management System (PERMS) and the Bureau of Land Management's imaging projects, were obtained from a large corporate integrator or vendor. Another third, including the Minerals Management Service and the Commodity Futures Trading Commission projects, were obtained from somewhat smaller, more localized vendors responsible for designing and assembling the components obtained from a variety of manufacturers. The final one-third of the systems, including imaging systems at the Patent and Trademark Office and the imaging program at the Agency for Toxic Substances and Disease Registry, were designed, developed, and integrated through a collaborative effort between the integration company personnel and Federal agency staff.
Open-Systems Architecture
Technical Trends
Digital-imaging systems that feature proprietary hardware and software components may have limited ability to accept components supplied by alternative manufacturers. As the imaging technology marketplace continues to evolve, an increasing emphasis is placed on "open" systems that incorporate a multivendor environment. Open-systems architecture is defined for the purposes of this report as a systems design that:
- Permits component upgrades with negligible degradation to system functions,
- Allows the system to be upgraded over time without a significant risk of information loss, and
- Supports the importing and exporting of digital data.
The emphasis of the user community and digital-imaging industry is shifting toward system architectures with inherent operational and configuration flexibility. An open-systems environment supports the integration of standardized system components, while meeting unique user needs. One of the key factors in achieving true open systems is the development, acceptance, and widespread adoption of nonproprietary standards.
User Experiences
The majority of the 15 Federal agency sites visited have some proprietary components or processes. These elements include unique file headers, image compression and data transmission processes, hardware components, applications software, and the overall operating system configuration. Federal agency systems administrators are cognizant of the need to move toward open systems and are taking positive steps in that direction. Many of the system administrators interviewed noted that although their existing systems contain proprietary components, long-range agency objectives are to eventually move toward an open-system concept. These plans involve adopting industry-wide standards and integrating off-the-shelf hardware and operating systems software. Agency administrators noted that these steps are expected to increase the existing optical digital data disk storage system's interoperability. That is, they expect immediate benefits of improved data-sharing and communication linkages with other information systems both within and outside the immediate agency application, while recognizing that in some cases a total system replacement may be required.
Recommendations
Adopt an open-systems architecture for new optical digital data disk applications. Or
Require a "bridge" to systems with nonproprietary configurations.
Digital Image System Configuration
Technical Trends
Image system selection should only be attempted after conducting a detailed analysis of existing and planned agency information requirements. Depending on the quantity of records and user access requirements, installing an off-the-shelf imaging system is the least complicated and least costly approach. Turnkey imaging systems provide generic capabilities, accepting minor hardware and software refinements to better meet the user's unique needs. If proprietary components preclude the ability to accept component reconfiguration to meet organizational requirements, agencies may need to acquire specially engineered systems. Optimally, an agency would adopt a total-systems approach to records management that provides practical solutions to support the agency's information-processing applications.
Digital-imaging technology can assume many different configurations, but an image-capture system generally includes the following basic hardware components:
- Document scanner/digital image capture equipment,
- High-resolution display monitor,
- Personal computer (PC) system platform,
- Temporary file storage devices, and
- Laser printer.
Depending on agency requirements, the final system configuration may include optional components such as file servers, indexing and image quality-inspection workstations, high-speed document scanners, microform scanners, and local area networks (LAN). Figure 1 illustrates the basic elements of a digital image-capture subsystem.
Figure 1 Digital Image Capture Subsystem
Computer software links the system modules, while retaining the flexibility needed to meet unique user needs. Digital-imaging systems implemented at the beginning of the technology's growth curve often required extensive one-of-a-kind software development that resulted in incompatible configurations. Efforts to modify or upgrade specialized systems were difficult and time consuming, frequently requiring assistance from the vendor's original software engineering team. This situation has improved over time as more software applications were developed for the following areas:
Data Communications: Communications software facilitates the movement of data within the system hardware, among systems linked on a local area network, and across channels linking systems separated geographically, even worldwide.
Database Management: Indexing and database software provide the foundation for retrieval by controlling the nature and structure of information recorded about each image or group of images, organizing this information in ways meaningful to the user, linking images into documents and files, and allowing the user to identify and display images on demand.
Image Enhancement: Image-enhancement software allows the manipulation of image characteristics to improve legibility, clean up images, and reduce file sizes.
Display System Management: Workstation image-display software allows image control for zooming, rotation, scrolling, and multiple image-display screen formats.
Workflow Management: System workflow software controls all phases of document image capture, tracking, routing, indexing, retrieval, and printing.
Optical Character Recognition: Optical character recognition software can interpret and convert raster-image data into machine-manipulable textual data.
User Experiences
Several significant similarities were noted within the 15 Federal agency imaging systems visited, although the applications varied considerably. For example, more than half of the agencies use desktop-type scanners for converting paper records, while approximately one-third of the sites adopted high-speed document scanners to more efficiently process a larger daily volume of records. Most of the systems visited utilize PC platforms, file servers, and local area network communications. Almost half of the systems surveyed were connected to (or shared index information with) the agency's existing mainframe or minicomputer systems, and mainframe computer connections for others are in the planning and development stages.
A majority of the digital-imaging systems visited use workstations equipped with high-resolution display monitors. The remaining sites store electronic alphanumeric or graphic data, where high-resolution digital-image display is not a requirement. Several systems integrated microform scanners or film output devices, and others have tested or use optical character recognition technologies. Most of the systems visited have installed an optical digital data disk jukebox for automated storage and retrieval, and also utilize laser printers for hard-copy distribution.
To varying degrees, many of the agencies visited have changed or are considering upgrading their imaging systems in areas such as higher performance document scanners, increased memory in workstations, more powerful file servers, new workstation displays, and different optical disk drives or media. Agency administrators noted that open-type system architectures are more amenable to configuration changes, as proprietary hardware or software components are more difficult to upgrade when responding to additional unexpected user demands.
Recommendations
Clearly define user and agency requirements during the imaging project's requirements analysis phase.
Verify that the imaging system has inherent flexibility and has an open, or nonproprietary, design that accepts future hardware and software upgrades.
Conversion of Original Records
Technical Trends
The retrospective conversion of paper records to digital images requires the integration of specially configured production facilities, conversion equipment, and a technology-minded operations staff. It is not uncommon for Federal agencies to limit record conversions to a "today forward" concept, converting only the most current and frequently accessed records. This concept is especially attractive when older, less requested records are a significant segment of an organization's holdings.
Imaging systems convert information into electronic images that can be indexed and searched, routed to user workstations, and remotely distributed and printed. Major input processing steps include:
- Converting original records (paper, microforms, analog data) to a digital format,
- Electronically enhancing images that are difficult to read to improve legibility,
- Appending a file header and compressing the digital images to reduce data transmission and storage requirements,
- Indexing the images at appropriate levels,
- Conducting quality-control inspections of the index and image data and rescanning documents as needed, and
- Recording the digital information on a suitable storage medium.
A comprehensive records tracking and accounting process is necessary for all conversion efforts to ensure that all records designated for conversion were in fact converted, and to monitor exactly what was converted. Tracking and monitoring must begin when a record is identified for conversion and must not cease until all tasks related to acceptance of the new record form and disposition of the old record form are complete.
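The accounting requirement above amounts to reconciling the set of records designated for conversion against the set actually converted. The following is a minimal sketch of that check, not a description of any surveyed agency's system; the record identifiers and the `reconcile` function are hypothetical.

```python
def reconcile(designated, converted):
    """Compare record IDs designated for conversion with those converted.

    Both arguments are iterables of (hypothetical) record identifiers.
    """
    designated, converted = set(designated), set(converted)
    return {
        "missing": sorted(designated - converted),     # designated, never converted
        "unexpected": sorted(converted - designated),  # converted, never designated
        "complete": designated == converted,
    }


report = reconcile(["A-001", "A-002", "A-003"], ["A-001", "A-003", "B-100"])
print(report)
```

A production tracking process would also record dates, operators, and disposition of the original record form; those fields are omitted here for brevity.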
The point of entry (input device) for paper records is a document scanner, available in several configurations: some use a stand-alone mode and magnetic storage of the electronic images, some are attached to an imaging workstation, and some are operated under a computer file-server configuration. Image-enhancement processes may be applied to the digital images to improve their legibility, while concurrently reducing overall file sizes. The scanning process usually includes image data compression in accordance with a standard or proprietary format. The document images are then indexed using traditional manual key entry, bar-code scanning, or optical character recognition.
Depending on an agency's requirements, the index subsystem may be maintained in several ways including storage of the index data as part of the imaging system or storage of index data in a separate database management system. In either case, the index data is usually retained on magnetic storage media. Magnetic storage simplifies index data revision and supports faster user access to the information. The scanned images may also be stored magnetically, but optical digital data disks are a viable option for long-term information retention. A single scanned image may require between 20,000 and 300,000 or more bytes of storage (assuming image compression at 10:1). When planning a system, it is useful to conduct testing with the original materials to be converted to determine potential scanning throughput rates, storage requirements, and image file transfer speeds. The digital information can be distributed in several ways, including image display on high-resolution monitors, laser printers, computer output microforms, or remote image transmission. These processes often are under the control of a workflow process management system that can also route images to workstations, distribute output, and conduct image tracking and status reporting.
User Experiences
Several salient patterns emerged during the site visits that illustrate generic Federal agency strategies. Federal agency production managers understand the critical role of document preparation in high-volume conversions. Document preparation is a nontechnical component of a records conversion that affects overall operational productivity. The steps involved in preparing documents for scanning are very similar to those used in preparing records for microfilming. The removal of staples, bindings, and other fasteners and proper sequential ordering of documents are important steps that are best performed offline. Performing these steps diligently reduces nonproductive or idle wait time at the scanning workstation and improves document-scanning throughput rates, especially for imaging systems with high-speed equipment.
Eight of the fifteen Federal agency systems surveyed use existing agency staff to accomplish in-house document or data conversions. Five agencies visited utilize onsite contractors for scanning and indexing services arranged through contracts with service bureaus or integration vendors. The survey indicated that several agencies use a combination, or "team," approach for the actual conversion process, in which contractors and Federal employees share conversion tasks. The State Department's imaging system, for example, operates with agency staff processing the incoming mail requests and contractor-supplied personnel operating the scanning and indexing systems.
Several of the agencies visited conduct document scanning at conversion sites that are not located near the storage and retrieval systems. In these cases, the scanned images are temporarily stored using magnetic media or rewritable optical digital data disks. For example, document scanning for the Patent and Trademark Office's conversion project was conducted by contractor personnel at an offsite document storage facility.
Three Federal agencies visited employ optical digital data disks for digitally storing graphic or alphanumeric ASCII data. The National Oceanic and Atmospheric Administration (NOAA), for example, uses optical digital data disk technology for archival retention of coastal environmental data. NOAA monitors natural climatic events and manmade environmental factors and analyzes their impact on rapidly changing global processes. The National Earthquake Information Center uses optical digital data disk technology to store seismic data detected from earth tremors caused by events such as earthquakes, volcanic activity, nuclear tests and oil prospecting. And finally, the Social Security Administration retains workers' earnings data in digital format using rewritable optical digital data disks.
Recommendations
Analyze the domain of documents to be scanned, identify the levels of uniformity, and consider the use of a document conversion contractor when the backfile holdings are extensive.
Implement a comprehensive records accounting and tracking process during the project's conversion phase.
Invest sufficient staff or contractor resources into document preparation to increase scanning productivity.
As appropriate, assign experienced agency staff to conversion processes that benefit from their knowledge of existing agency operations.
Digital Image Scanners
Technical Trends
Document Scanners: A scanner is the hardware component that converts original documents to electronic digital images. The commercial imaging marketplace offers scanning equipment with a diversity of throughput speeds, automated operator features, and acquisition costs. Actual elapsed times for scanning and displaying the images vary based on several factors including the inherent performance of the specific scanning unit, physical dimensions of the documents, and scan resolution selected. These factors contribute to desktop-class scanner production rates of between 2 and 20 documents per minute. Therefore, when equipped with document feeders and two-sided scan capability, desktop scanners are useful for smaller imaging applications with lower daily conversion volumes and for midrange office imaging systems. When equipped with special image-enhancement capabilities, desktop scanners are also effective for scanning low-contrast documents, which are difficult to read. These scanners also function effectively for rescanning poor-quality images rejected during routine image quality-control inspections.
For larger document conversion applications, higher performance scanners that employ heavy-duty mechanized document transports and multiple scanning charge-coupled device (CCD) arrays are available. This equipment offers throughput rates ranging from approximately 40 to 120 or more pages per minute. Depending on the specific unit, these scanners may capture two sides of each document on one pass, improving productivity through reduced paper handling. Since scanner production rates can be affected by the document's physical condition, manufacturers' claims regarding scanner throughput rates should be verified prior to equipment procurement. Verification is best accomplished using a representative sampling of actual agency records. Specialized scanners are also available for capturing larger documents such as maps and engineering drawings.
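The practical effect of a gap between rated and measured throughput can be estimated with simple arithmetic. The sketch below is illustrative only: the backfile size, the measured rate of 45 pages per minute, and the six productive scanning hours per day are assumptions, not survey findings.

```python
import math


def conversion_days(pages, pages_per_minute, productive_hours_per_day=6.0):
    """Working days needed to scan `pages` at a sustained rate."""
    pages_per_day = pages_per_minute * 60 * productive_hours_per_day
    return math.ceil(pages / pages_per_day)


backfile = 1_000_000                         # assumed backfile size, in pages
rated = conversion_days(backfile, 120)       # vendor-rated throughput
measured = conversion_days(backfile, 45)     # hypothetical rate from a sample test
print(rated, measured)                       # 24 62
```

Under these assumptions the schedule grows from 24 to 62 working days, which is why testing with a representative sample before procurement is worthwhile.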
Document scanners are generally installed and calibrated in accordance with instructions set forth in the manufacturer's operation and maintenance guides. To determine the quality of an image acquired with a recently calibrated scanner, it is recommended that a standardized target be scanned and evaluated in accordance with FIPS PUB 157 "Guideline for Quality Control of Image Scanners." Test targets are available to evaluate scanner performance for a variety of image characteristics, including color, type size, and resolution.
Digital Scanning and Microforms: Depending upon an organization's data storage and retrieval requirements, several alternatives are available to migrate information between digital image systems and analog microform technologies. These approaches include:
- Concurrently scanning and microfilming documents at the image capture stage,
- Creating microforms from existing digitally stored data, and
- Converting existing microforms to digital data for storage on optical digital data disks.
The first approach requires image-capture equipment that digitally scans the paper documents and, at the same time, photographically records the images on microfilm. This dual capability potentially offers the best of both worlds. That is, the digital images are stored on optical digital data disks for automated storage and retrieval, while the microforms, when processed and stored in accordance with NARA micrographics regulations (36 CFR 1230), comply with long-term information retention requirements. One potential drawback is that the benefits of this approach could be nullified if quality-control problems are discovered with the processed microforms and the document batch must be completely refilmed.
A second approach is to create microforms from existing digital image and text data stored on optical digital data disks. Several configurations of microform recorder equipment are available for producing microforms from digitally stored text and raster-image formats. One technique uses a laser-beam recording technology to "write" the digital information, line by line, directly onto the microform materials with micron-sized pixel patterns. The second technique, which applies only to coded data including text, uses the more conventional cathode ray tube (CRT) imaging technology provided in computer output to microfilm (COM) recording devices. Commercial availability is rather limited for high resolution microform recorders that create raster COM images on 35mm films. Since these systems are also complex to operate and expensive, users should consider a service bureau for lower volume applications. Further complexity is introduced when the digital information is marked with unique or proprietary image file-header information, requiring conversion to a widely used format (e.g., Tagged Image File Format [TIFF]).
A third approach involves the digital conversion of microforms. These microforms may already exist in an agency's files. Conversely, they may have been created in lieu of digital scanning of the documents under a microfilm-first, scan-later concept. If this approach is under consideration, investigate the input requirements of commercially available microform scanning equipment. Ensuring that the microform's technical production specifications comply with scanner requirements will expedite the digitization process. Unlike paper document scanners that sense reflected light, microform scanners transmit a beam of light through the film media. The technical quality of the original input microforms directly impacts the readability of the digital images. High-quality microforms provide the most legible (cleanest) digital images, with an added benefit of smaller digital file sizes. Depending on user needs, the commercial marketplace offers microfilm scanners with various operational performance features. High-end microform-scanning equipment may include powerful image-enhancement algorithms that improve legibility from low-contrast microforms and electronic sensors for detecting skewed or misaligned images.
User Experiences
The 15 Federal agencies visited used desktop or high-speed scanners to convert existing paper or microform records. Several applications that began with low-volume desktop scanners found it necessary to upgrade to higher performance equipment. Other agencies achieved higher production by adding autofeeders to desktop sheet-fed scanners or by integrating additional high-speed scanners. Site managers noted that under actual production conditions, document scanner throughput rates may vary considerably from manufacturer estimates. System administrators noted that increased demands placed on the existing components were based in part on greater awareness and acceptance of imaging technology. At peak times, this unexpected increased demand on imaging systems may overload inherent capabilities.
The conversion of microform to digital images offers a number of potential benefits, but there can be a significant downside. The original quality of the microform, including image contrast, spacing, skew, and sharpness, affects both potential scan production rates and the quality of the scanned images. In order to deal effectively with these issues, the Department of the Army's PERMS system employs specialized microform scanner systems. Several other agencies surveyed chose optical digital data disk technology as a replacement for existing microform-based information systems, accomplished either through a digital scan conversion of the existing microforms or by recording data directly to optical digital data disks with no paper records created as intermediaries.
Recommendations
Prior to system acquisition, validate vendor claims regarding document throughput rates, image quality, and ease of operation using a representative sampling of the agency's holdings.
Follow the standard test procedures outlined in FIPS PUB 157 "Guideline for Quality Control of Image Scanners."
Scanner Resolution
Technical Trends
The document scanner resolution selected directly impacts several key factors including the display-screen readability of digital images, the legibility of hard-copy output, and the usefulness of the digital images for future agency applications. The selection of optimum scan resolution is critical for both immediate and longer term applications, as the original scan resolution can never be increased even if future information retrieval technologies require a higher quality image. Consequently, a strong case can be made for scanning at the highest resolution that is currently affordable.
Scanner resolution is a complex equation: Image resolution, color spectrum, file storage size, and compression algorithms are interdependent and dependent on the scanning, display, and printing equipment available. Digital image file sizes, for example, depend on the scanner resolution selected. A standard office document requires approximately 500,000 bytes (uncompressed) at 200 dots per inch (dpi) and almost 2 million bytes before compression at 400 dpi. This four-to-one storage factor becomes significant when capturing thousands of images.
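The file-size figures above follow directly from the pixel count. The sketch below reproduces them, assuming that a "standard office document" means an 8.5 x 11 inch page scanned in binary (1 bit per pixel) mode; the page size is an assumption, since the text does not state it.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed raster size in bytes for a page scanned at `dpi`."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


for dpi in (200, 300, 400):
    print(dpi, uncompressed_bytes(8.5, 11, dpi))
# 200 467500
# 300 1051875
# 400 1870000
```

Because the pixel count grows with the square of the resolution, doubling the resolution from 200 to 400 dpi quadruples the uncompressed file size.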
In the past, it was primarily scanner and display equipment limitations that determined digital-scanning practice, and scant attention was paid to objective criteria. Because scanner resolution also affects input productivity, vendors formerly specified lower resolution settings to achieve efficient throughput speeds and reduced data storage and processing costs. A scanning resolution of 300 dpi produces a quality comparable to that of an average office laser printer (though lower than a photocopier) and may be adequate for typical office documents that contain no type-font size smaller than 6-point. If a Federal agency plans to integrate optical character recognition (OCR) technology, a minimum scanning resolution of 300 dots per inch is recommended.
Engineering drawings, maps, and documents that have very detailed, fine line and handwritten information may require a scanning resolution of up to 600 dpi or greater. In all cases, but especially if the documents to be scanned include maps, drawings, or documents with fine line and background detail, tests should be conducted to verify the appropriate scanning resolution on a case-by-case basis with actual document samples prior to equipment acquisition.
User Experiences
Scanner resolution is often specified by program managers responsible for balancing two critical factors: image data storage and image display legibility. Scanning resolutions at 12 Federal agency sites capturing original documents range from 200 to 400 dpi, with a majority employing 300 dpi. Agencies placing less value on the quality of screen display while emphasizing storage economics tend to scan in the 150 to 200 dpi range, arguing that a far greater number of compressed images can be placed on any given optical digital data disk. Agencies requiring a higher quality image display accept larger file sizes, scanning in the 400 dpi and higher ranges. For example, the State Department's REDAC imaging system routinely scans at 200 dots per inch, while poor quality documents are scanned at up to 400 dpi. The Bureau of Land Management's General Land Office Records Automation Project system captures document images at 300 dpi resolution, displays images at 150 dpi, and laser prints 300 dpi images.
Recommendations
Employ a scanning resolution of at least 300 dpi for office documents when future applications (e.g., OCR) for the digital images are anticipated.
Specify a higher scanning resolution (between 300 and 600 dpi or higher) as needed, for engineering drawings, maps, and documents containing significant fine line and background detail information.
Dynamic Range
Technical Trends
Digital-imaging systems typically include binary-type scanning and display equipment. That is, each dot (pixel) of a raster-image bit map is interpreted as either black or white. Binary scanners are not ideally suited for capturing images of colored documents, photographs, illustrations, or other items containing continuous tones. To optimally capture these images, a scanner with a greater dynamic range is needed. Dynamic range is defined for this report as the variation in tone in any given scanned dot. In black-and-white images the range is represented by a scale of gray tones. The degree of blackness associated with each picture element, or pixel, in a gray-scale image is controlled by the digital information, or bits, associated with that pixel. Similarly, in color images each pixel is represented by a value for the three primary colors (usually red, green, and blue) that, when combined together, produce the desired color.
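The relationship between bits per pixel and tonal range, and the information lost when a gray value is forced to black or white, can be sketched as follows. The fixed threshold of 128 and the sample values are assumptions chosen for illustration; actual scanners may determine the threshold differently.

```python
def gray_levels(bits_per_pixel):
    """Number of distinct tones a pixel can express at a given bit depth."""
    return 2 ** bits_per_pixel


def to_bilevel(gray_row, threshold=128):
    """Reduce 8-bit gray values (0 = black, 255 = white) to 1 = black, 0 = white."""
    return [1 if value < threshold else 0 for value in gray_row]


row = [0, 60, 120, 130, 200, 255]            # hypothetical 8-bit gray samples
print(gray_levels(1), gray_levels(8))        # 2 256
print(to_bilevel(row))                       # [1, 1, 1, 0, 0, 0]
```

Note that the values 120 and 130, nearly identical tones in the original, end up on opposite sides of the threshold, while 0 and 120 become indistinguishable.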
Imaging systems recording gray-scale and color images require specialized components such as a scanner with gray-scale or color capability, video displays that can reflect the greater dynamic range in the system, and more powerful image-processing software. A gray-scale scanner is mandatory when scanning continuous-tone black-and-white photographs or negatives. Such images should be scanned at 8 bits per pixel, allowing the expression of 256 gray values, unless it can be determined in advance that there is no current or anticipated future need for this level of detail. Gray-scale scanning techniques may also be effectively applied in the scanning of black-and-white documents. Because the human eye is highly sensitive to variations in luminance, documents scanned at a relatively low resolution (e.g., 200 dpi) but with 4- or 6-bit gray-scale may actually be more readable on low-resolution monitors than documents scanned at a higher resolution in a bilevel mode. Use of a higher resolution monitor or printing often requires a higher resolution scan (e.g., 300 to 400 dpi) with 8-bit gray-scale imaging.
Color scanning presents even greater technical challenges. For example, the full visible color spectrum may not be captured accurately by all scanners. Additional problems occur on output. Whereas the red, green, and blue values of documents are usually captured during scanning, printer output is achieved by balancing cyan, magenta, yellow, and black. For accurate representation of the colors on output, the input and output devices must be calibrated. Even with sophisticated image compression techniques, gray-scale and color imaging requires substantial data storage capability. A standard uncompressed binary digital image consists of hundreds of thousands of pixels, each represented by 1 bit of information. Because an 8-bit gray-scale image represents each one of those pixels with 8 bits, the resultant uncompressed file is eight times as large. For example, an uncompressed 300 dpi bi-tonal image requires 1.05 megabytes of storage, while an uncompressed 300 dpi continuous tone gray scale image (8-bits per pixel) requires 8.4 megabytes of storage. Gray scale compression algorithms are by nature less efficient than binary compression schemes (i.e., compressed gray scale image files can be much larger than 8 times the size of compressed binary files). Based on agency requirements, this factor may make gray scale or color scanning a prohibitively expensive storage option.
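The storage figures quoted above can be checked by multiplying the pixel count by the bit depth. The sketch assumes an 8.5 x 11 inch page at 300 dpi (2,550 x 3,300 pixels); the 24-bit color line is an extrapolation by the same arithmetic, not a figure from the survey.

```python
PIXELS_300DPI_LETTER = 2550 * 3300   # assumed page: 8.5 x 11 inches at 300 dpi


def storage_bytes(bits_per_pixel, pixels=PIXELS_300DPI_LETTER):
    """Uncompressed storage, in bytes, at the given bit depth."""
    return pixels * bits_per_pixel // 8


for label, bits in (("bi-tonal", 1), ("8-bit gray", 8), ("24-bit color", 24)):
    print(f"{label:>12}: {storage_bytes(bits):>10,} bytes")
```

The bi-tonal and 8-bit results (1,051,875 and 8,415,000 bytes) correspond to the 1.05 and 8.4 megabyte figures in the text.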
User Experiences
Only one of the Federal agency sites visited, the National Earthquake Information Center of the US Geological Survey, utilizes computer workstations with 800 x 1,000 pixel monitors to display seismic digital data images that contain 256 shades of gray. None of the remaining agency sites visited are routinely scanning or storing gray-scale images.
Recommendations
Employ gray-scale or color imaging technology as needed for suitable continuous-tone images such as photographs, maps, and related records.
As appropriate, utilize 8-bit-per-pixel gray-scale image technology for capturing continuous-tone black-and-white photographs and/or negatives, and 24-bit mode to obtain true color rendition.
Image Enhancement
Technical Trends
Digital image enhancement invokes software algorithms used to "clean up" the visual appearance and quality of digital images. Image enhancement should be carefully used, because this process may actually remove minute elements of the image data. This image data deletion may occur either selectively or automatically, with the end product having increased visual contrast and improved readability. Electronically enhanced images can dramatically increase display-screen and hard-copy legibility. Image enhancement can also reduce storage requirements by improving the efficiency of image-compression software. Documents that are difficult to capture, such as carbon copies with blossomed characters, multigeneration photocopies, light blue and purple mimeographs, faded or stained originals, and faint pencil and ink annotations, are prime candidates for image enhancement. Imaging systems typically provide a fundamental image-contrast manipulation capability; additional hardware and software is available to expand enhancement power and speed while increasing compression capabilities.
One negative aspect of certain image-enhancement algorithms is a possible loss of detail contained within the original documents. For example, documents containing color printing, handwritten annotations, or marginalia may not be uniformly imaged. In these cases, image-enhancement software might inadvertently remove some faint or low-contrast markings. Similarly, all bi-tonal systems convert colors to black, increasing readability problems. Special filters can be used in the scanning process to minimize this problem, and administrators should ensure that the scanning capability of the proposed system matches the characteristics of the documents to be scanned. It is prudent to test the imaging system with a sample of agency documents prior to a full-scale conversion.
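The trade-off described above can be seen in even the simplest cleanup rule. The sketch below removes any black pixel that has no black neighbor; it is an illustrative stand-in, not the algorithm used by any particular vendor or surveyed system, and the sample page is invented.

```python
def despeckle(image):
    """Clear black pixels (1) that have no black pixel among their 8 neighbors."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]          # work on a copy; keep the original
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            has_neighbor = any(
                image[rr][cc] == 1
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbor:
                out[r][c] = 0
    return out


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],   # part of a character stroke: kept
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],   # isolated dot: removed, whether noise or a faint mark
]
cleaned = despeckle(page)
print(cleaned[1], cleaned[3])   # [0, 1, 1, 0, 0] [0, 0, 0, 0, 0]
```

The rule cannot tell a speck of dirt from the dot of a faint pencil annotation, which is why testing with sample documents and retaining unenhanced images of intrinsically valuable records are both prudent.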
If a source document has intrinsic value, the original must be retained following digital image scanning. The digitally scanned raster-image data of intrinsically valuable documents should be stored in unenhanced form to ensure that all of the digital information as captured is available for processing in the future by more powerful image-enhancement techniques.
User Experiences
Image enhancement is attractive due to its ability to improve the legibility of stained, aged, and low-contrast documents. Even though the majority of document scanners provide basic contrast (light/dark) controls to adjust the digital image appearance, several of the sites visited are considering or have already integrated special add-on image-enhancement capabilities. For example, the US Army PERMS program uses software enhancement to clean up "noisy" images, thereby obtaining a greater compaction in data storage and higher quality images. Another example is the Commodity Futures Trading Commission, where an image enhancement computer circuit card was installed in the desktop scanner to obtain higher quality images. System operators also noted the usefulness of a display screen reversal (positive and negative) capability that increases the visual image legibility of hard-to-read documents, scanned negative microfilms, and faded photostats.
Recommendations
Conduct scanner testing using selected documents during the system design phase to determine the need for special scanner hardware modifications.
Retain scanned unenhanced images of documents of intrinsic value.
Digital Image File Headers
Technical Trends
Digital-imaging systems use a complex set of computer software for capture, storage, and image retrieval functions. The user's request for an image is linked to a specific location on the optical digital data disk or other storage medium. Linking is accomplished by means of a header preceding the digital data of each discrete image or group of images. Image file-header data may include such items as the file size, type of compression technique, and scanning resolution. File headers are often proprietary and typically are supplied as an integral imaging system component. In spite of a file header's importance to retrieving images over a long period, file headers are often overlooked by users until problems surface. Difficulties usually occur when image data must be transferred or when a system is upgraded or otherwise modified.
It is essential to use nonproprietary image file formats and header structures or have the ability to migrate image files into a common standardized format. When proprietary image file formats and headers cannot be avoided, the system developer should be required to provide a "bridge" to nonproprietary image file formats or, at a minimum, comprehensive documentation describing the image file structure. At present there are no agreed-upon industry-wide standards for image file formats and headers, although many in the industry are currently working to develop such standards. An Image Interchange Facility (IIF), for example, is currently under development under the auspices of the International Standards Organization as part of its International Image Processing and Interchange Standard. The main component of the IIF will be the definition of a data format for exchanging arbitrarily structured image data across heterogeneous application boundaries.
In the absence of an accepted standard image format, many imaging systems use the Tagged Image File Format, or TIFF. It is one of the most widely supported image file formats for personal computers. Every TIFF file includes a header, one or more image file directories, and content data. TIFF headers and image file directories tell the computer system how to read the data and contain such information as the width of the image, its length, and resolution. Some image system developers are adopting the TIFF to support image transfer among systems. Unfortunately, different versions of TIFF headers can be implemented; therefore the TIFF does not automatically guarantee success with image transfers between disparate systems. Acquiring comprehensive documentation about the header structure is recommended, even when using the Tagged Image File Format.
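The header and image file directory structure described above can be illustrated with a short reader. This is a minimal sketch under stated assumptions: it handles only tags whose single SHORT or LONG value is stored inline in the directory entry, and the sample file is built in memory with invented values (a 2,550 x 3,300 pixel image, compression code 4). The 8-byte header layout, the identifying number 42, and the 12-byte directory entries follow the published TIFF specification.

```python
import struct

TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression"}


def parse_tiff(data):
    """Read the 8-byte TIFF header and the inline SHORT/LONG tags of the first IFD."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]            # byte-order mark
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i                   # each entry is 12 bytes
        tag, ftype, n = struct.unpack(order + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if n == 1 and ftype == 3:                         # one SHORT, stored inline
            (value,) = struct.unpack(order + "H", value_field[:2])
        elif n == 1 and ftype == 4:                       # one LONG, stored inline
            (value,) = struct.unpack(order + "I", value_field)
        else:
            continue                                      # value stored elsewhere; skipped
        tags[TAG_NAMES.get(tag, tag)] = value
    return tags


# Build a tiny little-endian sample: header, three directory entries, end marker.
sample = struct.pack("<2sHI", b"II", 42, 8)
sample += struct.pack("<H", 3)
for tag, value in ((256, 2550), (257, 3300), (259, 4)):
    sample += struct.pack("<HHIH2x", tag, 3, 1, value)
sample += struct.pack("<I", 0)                            # no further directories

print(parse_tiff(sample))
# {'ImageWidth': 2550, 'ImageLength': 3300, 'Compression': 4}
```

A reader that encounters tags or value types it does not recognize must either skip them, as here, or fail, which is one reason differing TIFF implementations do not always interchange cleanly.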
User Experiences
More than half of the Federal agency sites surveyed for this report use proprietary digital image file headers. Only three of the Federal agency sites employ some version of the more widely used Tagged Image File Format. The complex terminology and specific details of image file headers contribute to the lack of universal understanding and recognition of their importance. As a result, many system administrators rely on integration companies, vendors, and optical digital data disk manufacturers to guide them through the labyrinth of technical details and file-header format specifications. This reliance on proprietary vendor solutions may contribute to future difficulties in data migration.
Recommendations
Use file formats that facilitate network data transfer, such as Aldus/Microsoft TIFF Version 5.0, which meets the Internet Engineering Task Force standard definition for exchange of black and white images on the Internet.
Require use of a nonproprietary image file-header label, or require a "bridge" to a nonproprietary image file-header label, or require a detailed definition of the image file-header label structure.
Data Compression Techniques
Technical Trends
Digital images are usually compressed as part of the scanning and storage process and subsequently decompressed at retrieval. A compression algorithm transforms the original digital image raster pattern into a mathematical code that is stored more compactly. Compression techniques are either one- or two-dimensional. One-dimensional compression encodes contiguous (adjacent) pixels on the same scanned line, while two-dimensional compression also encodes the differences between scanned lines. Depending on the document characteristics and techniques chosen, the actual compression ratios achieved can vary widely. Typical imaging systems may compress at a 10-to-1 ratio, while a ratio of 20-to-1 or even greater is feasible with more sophisticated compression schemes.
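The effect of these ratios on storage can be estimated with simple arithmetic. The sketch below assumes a letter-size page (8.5 x 11 inches) scanned at 300 dots per inch with one bit per pixel; the page size, resolution, and function names are illustrative assumptions and are not figures from the surveyed systems.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Size of the raw raster image in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8


def compressed_bytes(raw_bytes, ratio):
    """Approximate stored size at a given compression ratio (e.g. 10 for 10-to-1)."""
    return raw_bytes / ratio


raw = uncompressed_bytes(8.5, 11, 300)          # assumed page size and resolution
for ratio in (10, 20):
    print(f"{ratio}-to-1: about {compressed_bytes(raw, ratio) / 1024:.0f} KB per page")
```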
Although there are many compression techniques in use today, they generally fall into two categories: proprietary or standard. Proprietary compression algorithms tend to operate faster and offer greater data compaction. However, the stored images may not be easily transportable between different systems because of the algorithm's specialized characteristics. Standardized compression algorithms may not be as powerful but may support image data transfer between systems that otherwise might be incompatible. Standard, or nonproprietary, compression techniques are therefore an indispensable part of a migration strategy for records of long-term value.
Proprietary and standard compression techniques can each be further subdivided into "lossy" and "lossless" compression methods. With lossy compression, a certain amount of the original information is discarded as part of the compression process. Lossless compression, as its name implies, allows for the reconstruction of a file identical to the original. When performed correctly on a suitable document, lossy compression has the advantage of dramatically decreasing the size of the original digital files in a way that is almost undetectable by the human eye. For those archival documents in which continued fidelity to the exact appearance of the original document is important, a lossless compression scheme is recommended.
One of the most commonly used lossless compression techniques utilizes a method called run-length encoding. It evaluates patterns of adjacent pixels on a single horizontal line and encodes binary transitions. Run-length encoding is most efficient for documents with large areas of blank space, commonly found in office text files. More complex documents that include line drawings, charts, photographs, and maps, among others, may be more efficiently compressed using techniques that use "look-up tables" for comparison with the scanned image.
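The idea behind run-length encoding can be shown with a minimal sketch. It encodes one horizontal line of bitonal pixels (0 for white, 1 for black) as pairs of pixel value and run length. This is the simplest form of the technique, written for illustration only; it is not the coding used by any particular standard or product, and the function names are hypothetical.

```python
def rle_encode(scan_line):
    """Encode a line of 0/1 pixels as (pixel_value, run_length) pairs."""
    runs = []
    for pixel in scan_line:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1            # same value as the previous pixel: extend the run
        else:
            runs.append([pixel, 1])     # a transition: start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Rebuild the original line of pixels from the run pairs."""
    line = []
    for value, length in runs:
        line.extend([value] * length)
    return line


# A mostly blank line with one short black stroke.
line = [0] * 20 + [1] * 3 + [0] * 9
encoded = rle_encode(line)
print(encoded)
assert rle_decode(encoded) == line      # lossless: the line is recovered exactly
```

The 32 pixels of the sample line reduce to three runs, which shows why the method favors documents with large blank areas. A line that alternates between black and white at every pixel would produce one run per pixel and gain nothing.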
The former International Telegraph and Telephone Consultative Committee (CCITT), now called the Telecommunication Standardization Sector (TSS), has developed international standards for data transmission over communication lines in one- and two-dimensional modes. These facsimile standards are known as Group 3 and Group 4. Group 4 provides greater compression capability (though at a certain point in a lossy fashion) and operates in a two-dimensional mode. Currently under development by the Joint Bi-level Image Experts Group (JBIG) is a new international standard intended to replace the CCITT Group 3 and CCITT Group 4 compression standards.
In addition to the preceding standards, system software developers may occasionally need to implement other compression schemes. Two significant compression schemes are those of the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG). JPEG is designed for compressing either full-color or gray-scale digital images of continuous-tone quality. It offers both a lossy and a lossless compression alternative. The lossy alternative applies a mathematical process called the discrete cosine transform (DCT) to 8 x 8 frames of pixels and yields substantial compression. This process produces an image with some loss of detail that may not necessarily be detectable to the human eye. The actual amount of loss depends upon the compression ratio selected. In contrast, the lossless compression alternative achieves complete fidelity to the source image because the sampling area or frame is 2 x 2 pixels, three of which are aligned along different axes with respect to the fourth. The compression ratio is user controlled and limited to either 2:1 or 3:1.
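The source of the loss in the DCT-based alternative can be illustrated with a small sketch. It transforms one 8 x 8 block, rounds every coefficient to a multiple of a single step size, and transforms back. This is an illustration of the principle only, under simplifying assumptions: actual JPEG coders use per-coefficient quantization tables followed by entropy coding, and the step size of 40 and the function names here are arbitrary choices made for the example.

```python
import math

N = 8  # block size used by the DCT-based alternative


def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(position, frequency):
    return math.cos((2 * position + 1) * frequency * math.pi / (2 * N))


def forward_dct(block):
    """Orthonormal two-dimensional DCT of an N x N block."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def inverse_dct(coeffs):
    """Inverse of forward_dct."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    """Round every coefficient to the nearest multiple of `step` (the lossy step)."""
    return [[round(c / step) * step for c in row] for row in coeffs]


def max_error(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(N) for j in range(N))


# A block that is blank except for one bright pixel.
block = [[0] * N for _ in range(N)]
block[0][0] = 255

exact = inverse_dct(forward_dct(block))            # no quantization: reversible
lossy_coeffs = quantize(forward_dct(block), step=40)
lossy = inverse_dct(lossy_coeffs)                  # quantized: detail is lost
kept = sum(1 for row in lossy_coeffs for c in row if c != 0)

print("round-trip error without quantization:", max_error(block, exact))
print("round-trip error with quantization:   ", max_error(block, lossy))
print("nonzero coefficients kept:", kept, "of", N * N)
```

A larger step discards more coefficients and increases the error, which corresponds to selecting a higher compression ratio.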
MPEG is a compression scheme for full motion video images. It uses JPEG for the compression of individual frames and also uses other lossy techniques to compress data between frames. The growing demands of multimedia computing, video conferencing, and high-definition digital television make it likely that there will be new standards developed shortly for the rapid transmission of moving images. Because MPEG is inherently lossy, and the high compression ratio of JPEG is only possible with the lossy compression technique, system administrators should carefully evaluate their system functional needs to determine if either technique will meet current and anticipated future image requirements.
System developers and administrators must choose between standard and proprietary compression techniques. Using compression techniques conforming to CCITT or the developing JBIG specifications when storing nontonal data will increase the likelihood that the images can be used with other technologies or migrated between systems. Although proprietary techniques may provide greater data compression, compatibility is not assured. In fact, if the software supporting a proprietary compression technique becomes obsolete, then for all practical purposes the image cannot be restored. There may be times when the use of a proprietary lossless compression technique is unavoidable, but in those instances the vendor should be required to provide a utility to decompress the data to its original digitized data format. At some future time, the data can be compressed again using any method desired. Table 1 provides a comparison of data compression techniques and examples of their applications.
Table 1. Data Compression Techniques

Techniques compared: Group 3, Group 4, JBIG, JPEG, MPEG

Type
- Lossy: X, X, X
- Lossless: X, X, X, X

Level
- Bi-tonal: X, X, X, X, X
- Gray Scale (64 Shades): X, X
- Color (256 Colors): X, X

Image
- Paper Records: Primary Use, Primary Use, Primary Use, X
- Photographs: X, Primary Use
- Motion Pictures: X, Primary Use
User Experiences
The sites surveyed used both proprietary and standardized compression schemes. More than half of the 15 sites have adopted some type of proprietary data compression algorithm, whereas only 5 used the standardized CCITT Group 4 compression technique. Although the various proprietary compression schemes effectively reduce the digital file sizes, resulting in faster image transmission and reduced storage on the optical digital data disks, adoption of a proprietary approach introduces a level of risk into any subsequent data migration effort. Regardless of the technique adopted, system administrators recognized the importance of obtaining descriptive documentation about the compression algorithms employed.
Recommendations
Use a lossless compression scheme when continued fidelity to the exact appearance of the original document is achievable and desired.
Use JPEG or MPEG for images with continuous tonal qualities when some loss of detail is acceptable.
For digital images without continuous tonal qualities, require standardized compression techniques, such as CCITT Group 3, CCITT Group 4, or JBIG.
If a proprietary lossless compression system is used, require that the vendor provide a means of decompressing the data to its original format.
Digital Image Quality Assurance
Technical Trends
Successful digital-imaging systems include a quality-assurance program as part of the system management process. Effective quality-assurance programs involve (at a minimum) two critical aspects:
- Process control
- Product quality control
Process control ensures that production equipment (e.g., document scanners, image compression hardware, laser printers) and related system processes are performing at optimum levels according to preestablished criteria. Ideally, these process-control criteria are routinely used to monitor the performance of the imaging system and its individual components. Specialized diagnostic and technical evaluation tools, combined with detailed logbooks, are an invaluable resource in troubleshooting future system problems.
Product quality control evaluates the quality of the individual digital images and related index data produced by the imaging system. This level of quality control is expedited when the scanned digital images are temporarily stored on magnetic disk cache. Magnetic storage permits rescanning prior to recording the image data onto the optical digital data disks. Corrective rescanning is an especially important capability for systems utilizing write-once optical digital data disks. Depending on system configuration, corrections may be performed at the scanner capture station or at specially designated inspection or rescan workstations.
Training and supervision of operations staff is a key factor in maintaining acceptable image quality. As noted earlier, there are no objective empirical indicators of acceptable image quality for digitally scanned images. An alternative is to categorize documents based upon scanning problems and reach a consensus on how to most effectively capture the "best" image. Ideally, this decision process would involve a team consisting of image system production staff, records managers, and system users and researchers. These evaluations should include visual analysis of workstation display screen images and laser printer output. Retaining a set of representative laser prints for future reference would be a valuable image analysis benchmark tool.
Quality-control inspection programs have a direct impact on document conversion productivity and overall usefulness of the imaging system. For example, an inspection program may include a comprehensive visual comparison of 100 percent of images scanned to the original documents, or it may be limited to defining the inspection population based on a calculated percentage or sampling plan. The overall level of quality control inspection should be extremely high if the original documents are not retained after conversion. In some systems, pass/fail decisions about the adequacy of the scanned images are based on operator judgments. These judgments may be fine-tuned through training and hands-on experience rather than based on more objective criteria such as written agency guidelines or operations procedures. That is, if a screen image "feels right" to the scanner operator or quality-control technician, then the system's quality criteria have been achieved. In any case, prudent Federal agency managers should require immediate evaluation of each index entry and document image.
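A percentage-based inspection population of the kind mentioned above could be drawn as in the following sketch. The function name, the parameters, and the rule of inspecting every image when originals are not retained are assumptions made for illustration; they are not procedures reported by the surveyed sites, and an agency's own sampling plan would govern in practice.

```python
import math
import random


def select_for_inspection(image_ids, sample_percent, originals_retained=True, seed=None):
    """Choose which scanned images to compare against the original documents."""
    image_ids = list(image_ids)
    if not originals_retained:
        return image_ids                     # assumed policy: inspect 100 percent
    sample_size = math.ceil(len(image_ids) * sample_percent / 100)
    rng = random.Random(seed)                # a seed makes the selection repeatable
    return sorted(rng.sample(image_ids, sample_size))


batch = range(1, 1001)                       # a hypothetical batch of 1,000 images
print(len(select_for_inspection(batch, sample_percent=10, seed=1)))
print(len(select_for_inspection(batch, sample_percent=10, originals_retained=False)))
```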
It is important to verify the digital images not only as captured at the scanning station, but also when they are written to an optical digital data disk and after an optical disk-to-disk backup copy is created. Problems during these processes can result in images that are statistically counted by the system as "pages" but are not retrievable when users attempt to view or print the file. Depending on the extent of the problem, if only a part of the page was copied or transferred, these corrupted files may contain useless system-generated noise or extraneous lines. These problems can arise from several sources, such as hardware component failures, software glitches, or even power surges. Therefore, it is advisable when transferring image data to any other media from the original magnetic hard drive or optical digital data disk to verify the images as copied. This process can be performed manually, or automatically using image verification software.
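One way an automated check of copied files might work is to compare a checksum of each copy against a checksum of its source. The sketch below uses SHA-256 from the Python standard library as an assumed technique; the paper does not describe how any particular verification product operates, and the file names are hypothetical. A matching checksum shows only that the copy is bit-for-bit identical to the source, not that the source image itself is of acceptable quality.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, copy_path):
    """True when the copy is bit-for-bit identical to the source."""
    return file_digest(source_path) == file_digest(copy_path)


# Demonstration with throwaway files standing in for image files.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "page0001.img")
    good_copy = os.path.join(workdir, "page0001_copy.img")
    bad_copy = os.path.join(workdir, "page0001_partial.img")
    image_bytes = bytes(range(256)) * 64
    for path, content in ((original, image_bytes),
                          (good_copy, image_bytes),
                          (bad_copy, image_bytes[:5000])):   # simulates a partial transfer
        with open(path, "wb") as f:
            f.write(content)
    good_copy_verified = verify_copy(original, good_copy)
    bad_copy_verified = verify_copy(original, bad_copy)

print(good_copy_verified, bad_copy_verified)
```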
A 100-percent quality inspection of the index data is essential to continued successful operation and to maintaining system user confidence in an agency's digital imaging program. If the index data is key entered incorrectly or if errors are introduced during the OCR/bar-code automated data-capture process, the related digital images are essentially unretrievable. Index verification can be accomplished in several ways, including a visual comparison of the key-entered data to the displayed scanned images or paper documents. The index data can also be verified by double-keying, whereby the index data is manually rekeyed, with the computer system's software automatically performing a comparison of the two entries.
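The automatic comparison step in double-keying can be sketched as follows. The field names and values are hypothetical, and the comparison is an exact match on each field; a production system might first normalize spacing or capitalization, which this sketch does not attempt.

```python
def compare_double_keyed(first_pass, second_pass):
    """Return the index fields whose two independently keyed values disagree."""
    mismatches = {}
    for field in sorted(set(first_pass) | set(second_pass)):
        first_value = first_pass.get(field)
        second_value = second_pass.get(field)
        if first_value != second_value:
            mismatches[field] = (first_value, second_value)
    return mismatches


# Two operators key the same (hypothetical) index record.
first_entry = {"case_number": "94-00217", "surname": "MARTINEZ", "date": "1994-07-14"}
second_entry = {"case_number": "94-00271", "surname": "MARTINEZ", "date": "1994-07-14"}

print(compare_double_keyed(first_entry, second_entry))
```

Any field reported as a mismatch would be returned to an operator for resolution against the source document before the index entry is accepted.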
User Experiences
Virtually every Federal agency site visited has some type of quality-control inspection program. The inspection programs vary as to whether visual or automated verification of image files is performed. Image inspectors use display monitors to verify visually the quality of scanned documents. A few system managers expressed confidenc