Preservation

Technical Information Paper No. 12

Digital-Imaging and Optical Digital Data Disk Storage Systems: Long-Term Access Strategies for Federal Agencies

July 1994

A Report by:

The Technology Research Staff
The National Archives at College Park
8601 Adelphi Road
College Park, Maryland 20740-6001


Appendix A: Federal Agency Site Visit Reports

Site Visit Selection Criteria

NARA's Technology Research Staff conducted a nationwide survey of Federal government agencies to identify existing optical digital data disk installations. The survey gathered up-to-date user experiences and insights into system administrators' plans for applying optical digital data disk technology within their agencies. It identified a diverse universe of small, mid-range, and large systems storing raster images and digital data. The criteria used to select the fifteen site visits included:

Size of System (Small or Large)

Small systems were defined as having under twenty optical digital data disks in use, few image capture and user workstations, and no jukebox. In practice, small systems may often be pilot projects that have not fully scaled up, or are serving as a research test platform. Larger systems, on the other hand, typically store optical digital data disks in a jukebox, employ network communications linking multiple imaging and user retrieval workstations, and feature high speed image capture equipment.

Type of Digital Information Stored (Image or Data)

The systems described in this report store information on optical digital data disks. This digital information takes the form of either scanned document images or digital data (ASCII data, databases, numerical information, or scientific data).

Information Retention (Temporary or Long-term)

Temporary information retention covers records scheduled (or likely to be scheduled) for retention of under seven years, which is also the approximate life span of a typical computer system. Long-term retention covers scheduled or unscheduled records with a retention period greater than seven years, regardless of whether the records will ever be transferred to the National Archives.

Functionality of System (Stand Alone or Integrated)

Stand-alone systems often have a single, narrowly defined purpose, even if the system is linked to the agency database. In many cases, stand-alone systems have at best a fax link to other information systems. Integrated systems, by contrast, serve the core mission of the agency or are linked to other automated systems administered by other units or even by other agencies.

The site selection process for this report also considered other criteria, such as: the availability of knowledgeable agency staff able to cooperate with this research project; access to full technical documentation describing each system; and a diversity of agency missions and types of information processed, to give the report balanced coverage.
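The four selection criteria above amount to a simple classification scheme. The Python sketch below encodes them under stated assumptions: the "few workstations" condition for small systems is left out because the paper gives no number, and a retention period of exactly seven years (which the definitions leave unassigned) is treated as long-term. The `SurveyedSystem` class, its field names, and the sample values are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SurveyedSystem:
    # Hypothetical record describing one surveyed installation.
    name: str
    disks_in_use: int            # optical digital data disks in use
    has_jukebox: bool
    stores_images: bool          # scanned document images (True) or digital data (False)
    retention_years: float       # scheduled (or likely) retention period
    serves_core_mission: bool
    linked_to_other_units: bool  # linked to systems run by other units or agencies


def classify(system: SurveyedSystem) -> dict:
    """Apply the four site selection criteria to one system."""
    # Small: under twenty disks and no jukebox (workstation count omitted: no threshold given).
    size = "Small" if system.disks_in_use < 20 and not system.has_jukebox else "Large"
    info = "Image" if system.stores_images else "Data"
    # Assumption: exactly seven years is counted as long-term.
    retention = "Temporary" if system.retention_years < 7 else "Long-term"
    # A link to the agency's own database alone does not make a system integrated.
    integrated = system.serves_core_mission or system.linked_to_other_units
    functionality = "Integrated" if integrated else "Stand alone"
    return {"size": size, "information": info,
            "retention": retention, "functionality": functionality}


pilot = SurveyedSystem("hypothetical pilot", 12, False, True, 3, False, False)
print(classify(pilot))
```

Keeping the criteria as independent fields mirrors the paper's approach: each one is judged separately, so a small pilot can still hold long-term records.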

Site Visit Record Holdings

A majority of the fifteen Federal agency systems surveyed maintain multi-page case files, composed of records containing mixed forms, where a single index point (typically a personal name, corporate body, or case number) provides access to a single image or logical file. Examples of case file storage systems are official military personnel records, hazardous waste site documentation, and records released under the Freedom of Information Act. Other systems visited maintain images of single or multi-page standard forms, where access is provided by a unique identifying number (e.g., social security number) or personal name. Examples of standard form systems are those for patent and trademark applications, applications for licenses and grants, and income tax returns. The remaining systems contain either non-image data files formerly stored magnetically or as computer output microfiche, or mixed records covering a variety of records applications. The records classifications stored on optical digital data disks at the Federal agency sites visited are summarized below:

  • Case files: Construction engineering documentation in a mixture of electronic and non-electronic formats, including construction documents, technical reports, maps, microforms, engineering drawings, videotapes, 35mm films, books, and periodicals.
  • Case files: Official personnel records from paper and microfiche that need purging; government personnel forms, evaluations, awards, and medical forms.
  • Case files: Federal land records of survey notes, plats, and tract books that form the basis of land title searches; this handwritten information is old and often fragile (brittle).
  • Case files: Technical reports and documents describing hazardous waste sites, used for evaluating health risks and emergency events involving toxic substances.
  • Case files: Environmental cost recovery reports and legal documents associated with the cleanup of high-priority toxic waste sites.
  • Case files: Documents to be released under the Freedom of Information Act (FOIA), which require redaction or clean-up prior to release.
  • Case files/standard forms: Official agency records of judicial rulemaking and adjudicatory matters, applications for licenses and grants, and reports filed by cable system operators, often requiring next-day turnaround.
  • Case files/standard forms: Claims processing; royalty collection documents for payments to the government for natural resources extracted from US lands; government forms; fiscal records.
  • Standard forms: Patent applications and approvals, scanned off site at a document storage repository.
  • Data files: Seismic data (earth tremors) captured by remote sensors, used for earthquake monitoring; replaces magnetic tape storage.
  • Data files: Environmental and coastal satellite data on water temperatures, weather patterns, ocean currents, and other US coastal and Great Lakes conditions, either measured with instruments or observed.
  • Data files: Microfilm replacement system for self-employment tax information.
  • Mixed records: Captured war documents, maps, and miscellaneous materials used for intelligence, often in foreign languages.
  • Mixed records: Daily news clips and legal docket records.
  • Mixed records: Newly released public policy documents.

Listing of Federal Agency Sites

Of the fifteen Federal agency systems examined in detail, the range of responsibilities included: Armed Forces units (3); Federal land management office (1); public health care oversight agency (1); financial trading regulator (1); environmental oversight (1); communications regulation office (1); library references and services provider (1); natural resources (1); climatic monitoring office (1); invention registry office (1); wage and retirement benefits claims processing (1); Freedom of Information Act processing unit (1); and, earth tremors or seismology events monitoring (1).

Detailed site descriptions are provided for the following fifteen Federal agencies:


SITE VISIT REPORT #1

AGENCY: Agency for Toxic Substances and Disease Registry
SYSTEM: Toxicological Profile Image System; Public Health Assessments Image System; Cost Recovery Image System

CONTACT: Sharon O. Jacobs, Director, Office of Information Resources Management, Agency for Toxic Substances and Disease Registry, Atlanta, GA

SUMMARY DESCRIPTION:

Since 1988, the Agency for Toxic Substances and Disease Registry (ATSDR) has used a state-of-the-art information system for document image management. A Wang Integrated Imaging System (WIIS) converts scientific and administrative documents to optically stored digital images. The imaged documents describe the links between human exposure to hazardous substances and an increased incidence of adverse health effects. The image data is recorded onto twelve-inch write once, read many (WORM) optical digital data disks in a multi-platter jukebox retrieval system. A single unified computer interface provides access to the document indexing system and preserves the complex structure of ATSDR reports and documents. This dual-purpose interface also gives users access to HazDat, a scientific database of environmental and health data stored on a mainframe computer. In addition, the ATSDR imaging subsystem conforms, to the extent feasible, with existing industry and government information technology standards.

The ATSDR information system is important to this study for several reasons: its potential to support interagency sharing of health-related digital image data; the agency-wide approach applied to system development; the use of a single imaging integration vendor for system design, personnel training, and follow-on technical support; the automated computer link to the mainframe database; and the agency's recognition of the legal admissibility issues raised by document imaging.

BACKGROUND:

The mission of the Agency for Toxic Substances and Disease Registry (ATSDR) is to "prevent or lessen harmful effects to people and their quality of life caused by hazardous materials in or near their communities." The Agency was created in 1980 as a separate entity of the Public Health Service (PHS), within the Department of Health and Human Services. The creation of ATSDR is one of several initiatives resulting from the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA), more commonly known as the "Superfund" legislation. Congress enacted this legislation partly in response to two highly publicized and catastrophic events of the late 1970s: the discovery of the Love Canal waste site in Niagara Falls, New York, and the industrial fire in Elizabeth, New Jersey, which released highly toxic fumes into the air over a densely populated area. Although ATSDR functions as an autonomous agency, it receives administrative support from the Centers for Disease Control, also headquartered in Atlanta, Georgia. ATSDR's public health mandate differs substantially from the regulatory function of the Environmental Protection Agency (EPA), although the two agencies exchange information on sites.

More than 400 ATSDR scientists and science administrators collect information on the release of hazardous substances from toxic waste sites or from emergency events involving hazardous materials, and on the health effects of these substances on human populations. This information, compiled in the HazDat database, serves as a scientific and technical repository for creating agency products such as public health assessments and supporting documentation, medical health consultations, toxicological profiles, and other site characterization documents. These products are revised regularly as new research findings are released. The agency is currently responsible for assessing the health risks of more than 1,350 National Priorities List (NPL) toxic waste sites identified by the EPA, only a small part of the more than 38,000 toxic waste sites listed in the EPA CERCLIS database.

The need for imaging capability within ATSDR stems from several factors: the complexity of the HazDat database; the incompatibility of at least four stand-alone database systems used by the research scientists; and the volume and complexity of the agency's published and unpublished output. Maintaining an effective audit trail throughout the technical report production process is one of the agency's biggest challenges. In addition, distributing the agency's products widely in hard-copy form is expensive and time-consuming. The products are written for scientists and public health officials but are provided on request to anyone with an interest in toxic waste sites, including Congressional staff, the general public, state and local governments, other Federal agencies, and academia.

Origins: A needs assessment study, undertaken by the Office of Information Resources Management in 1987, identified the key integrated system functions needed to support the agency's primary responsibilities. Document tracking was difficult because the existing stand-alone ATSDR database systems could not communicate with one another. The principal system requirements identified in that study included: links to the HazDat scientific database; remote access from ten regional offices; data portability; a report production audit trail; provision for records management and disaster recovery; the capability to retrieve selected portions of complex documents and reports compiled from many diverse sources; and storage and space considerations.

ATSDR's goal was to create a system for electronically capturing, processing, storing, and retrieving ATSDR toxic substance data linked to the HazDat scientific database. Important system criteria included a user-friendly interface, accurate and timely data, security, and the ability to integrate existing hardware into the new system configuration. In 1988, a "try and buy" prototype document imaging system was installed and tested under operational conditions, and a full-scale document imaging system was subsequently developed and installed. Full-scale document conversion began in March 1990. To date, all of the National Priorities List (NPL) public health assessment documents and toxicological profiles, as well as more than 35,660 toxicological profile references, have been scanned and stored on optical digital data disks. The agency's next imaging priority is the inclusion of site file documentation. ATSDR's ultimate goal is to bring all relevant documentary sources and agency products into a fully integrated information system readily accessible to users.

SYSTEM CONFIGURATION:

Date system installed: 1988

System Installed by: A combination of OIRM staff and the vendor (Wang).

System Configuration Changed Since Installation? Yes. Two optical disk drives have been added to the system, one in the jukebox and one stand-alone. These two drives support the Cost Recovery Image System.

  • Communication Environment: Novell LAN interface between an IBM 9070 Model 520 (mainframe), a Wang VS 7310 minicomputer, and multiple desktop personal computers.

  • Database development was performed under ADABAS/Natural.

  • Document Scanning: Controlled by a Wang VS computer as part of Wang's Integrated Imaging System (WIIS).

  • Index: The index for retrieving the document images is maintained separately on magnetic media.

  • Image Storage: Wang 80-disk optical jukebox with 3 optical drives.

DIGITAL IMAGE CAPTURE:

Data Access: Public health assessment and toxicological profile documents are scanned even though most recently produced documents are available electronically. The rationale is that the imaging system serves as an agency-wide repository for accurate, timely, and complete scientific and administrative information. Imaging is also one element of the agency's corporate electronic enterprise; the other elements are text, data, and voice.

Document Scanning: Four document scanning workstations, each operating at a rate of 7-8 pages per minute, convert the original hard copy reports. The physical condition and visual appearance of the documents vary considerably, requiring scanner contrast control adjustments to ensure image quality.

Scanning Personnel: Operational responsibilities for all applications are handled by ATSDR staff.

Estimated Number of Documents/Records Converted: 75,405 (the sum of the entries marked with an asterisk below), divided as follows:
  • TOXICOLOGY DIVISION
    Profiles: 108*
    References (records): 35,660*
    Reference images: 20,474
    Total pages scanned: 316,316

  • HEALTH ASSESSMENTS
    Profiles: 1,315*
    Total pages scanned: 19,171

  • COST RECOVERY
    Time sheet records: 6,631
    Linked time sheet (T/S) entries: 38,322*
    Total pages scanned: 6,631

  • GRAND TOTAL PAGES SCANNED: 342,118

  • Number of platters used:
    • Tox Division: 12 double-sided (2 Gigabytes each) and 24 single-sided (1 Gigabyte each)
    • DHAC: 4 single-sided (1 Gigabyte each)
    • CRS: 2 single-sided (1 Gigabyte each)

  • Total information on optical disks (approximately): 54 Gigabytes
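The conversion figures above can be cross-checked with a few lines of arithmetic. In this Python sketch the asterisked entries sum to the 75,405 documents/records total, the three page counts sum to the 342,118-page grand total, and the platter capacities sum to the stated 54 Gigabytes if the Tox Division's 12 double-sided and 24 single-sided platters are counted as separate groups (the reading that reproduces the stated total). The dictionary keys are labels chosen for this example.

```python
# Figures transcribed from the conversion summary; True marks an asterisked entry.
records = {
    "Tox profiles": (108, True),
    "Tox references (records)": (35_660, True),
    "Tox reference images": (20_474, False),
    "Health assessment profiles": (1_315, True),
    "Cost recovery time sheet records": (6_631, False),
    "Cost recovery linked T/S entries": (38_322, True),
}
pages = {"Toxicology Division": 316_316, "Health Assessments": 19_171, "Cost Recovery": 6_631}
# (group, number of platters, Gigabytes per platter)
platters = [("Tox Division", 12, 2), ("Tox Division", 24, 1), ("DHAC", 4, 1), ("CRS", 2, 1)]

starred_total = sum(count for count, starred in records.values() if starred)
page_total = sum(pages.values())
gigabytes = sum(n * gb for _, n, gb in platters)
platter_count = sum(n for _, n, _ in platters)

print(f"asterisked records: {starred_total:,}")
print(f"pages scanned:      {page_total:,}")
print(f"capacity (GB):      {gigabytes} on {platter_count} platters")
```

The same tally shows that the platter breakdown in this listing accounts for 42 platters.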

Disposition of Original Records: The original records are scheduled for retention of 1 to 10 years.

Quality Control: ATSDR digital image quality assurance procedures require ongoing evaluation and maintenance of scanner performance. Operator procedures specify that the scanners be calibrated in accordance with the manufacturer's specifications. Scanner operators visually inspect each captured image to ensure conformance with established quality criteria. A major image acceptance factor is eye-readability (i.e., not too dark or too light). No follow-up image quality sampling inspection or image evaluation test targets are used.

Scanning Resolution: ATSDR's scanner resolution settings are selected based on document type and physical characteristics. For example, toxicological profile files are typically scanned at 200 dots per inch (dpi), while health assessments are routinely scanned at 300 dpi. Testing with actual ATSDR documents showed that 300 dpi retains fine-line details of the graphs and other complex graphics features, while also providing a significant improvement in screen display and laser print qualities.
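The trade-off between 200 and 300 dpi can be quantified. The paper does not give file sizes, so the Python sketch below assumes a letter-size (8.5 x 11 inch) page captured as a bitonal image (1 bit per pixel) and stored uncompressed, with each scan line padded to a whole byte. Raising the resolution from 200 to 300 dpi multiplies the pixel count by (300/200)^2 = 2.25; compression, such as Wang's proprietary scheme, would shrink both figures considerably.

```python
def bitonal_page_size(width_in: float, height_in: float, dpi: int) -> tuple:
    """Return (pixel count, uncompressed bytes) for a 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # pad each scan line to a whole byte
    return width_px * height_px, bytes_per_row * height_px


for dpi in (200, 300):
    pixels, size = bitonal_page_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {pixels:,} pixels, {size:,} bytes uncompressed")

ratio = bitonal_page_size(8.5, 11, 300)[0] / bitonal_page_size(8.5, 11, 200)[0]
print(f"pixel ratio 300/200 dpi: {ratio}")
```

Because storage and transmission costs scale roughly with pixel count before compression, the 2.25 factor is one way to weigh the finer detail that ATSDR observed at 300 dpi.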

Color and Gray Scale: ATSDR has identified no immediate need for either color or gray scale scanning.

Image Enhancement: No special image enhancement techniques are used other than basic light/dark contrast adjustments.

Compression/Decompression: Wang's proprietary software compresses the digital image files for storage and decompresses them for display and printing.

DOCUMENT INDEXING:

Creation of Index Database: During the scanning process, computer programs take over the task of indexing. An important system design factor was that these programs handle the agency's varied report structures transparently. The system accepts new information as well as (sometimes substantial) updates to report content and structure. After document scanning, index data is key-entered while viewing the document images on screen.

Location of Index Database: The IBM 9070 mainframe computer, which holds the central HazDat database, is linked to the Wang WIIS document index files. The initial HazDat database modules were compiled from several stand-alone agency information systems. The 9070 computer system's magnetic disks contain the document structures and formats, while the document index database itself is maintained on the Wang VS7310 image server.

Index Structures: The indexing system preserves the structure of ATSDR documents using a complex hierarchical indexing scheme. Report components (e.g., table of contents, charts, graphs, chapter breaks) are tagged after scanning using a series of PF (function) keys. This function key capability, of particular value to ATSDR scientists, lets users switch quickly from report text to technical references. A sample of the hierarchical indexing scheme follows:

3.2.2 Health Assessments
  3.2.2.1 Summary/Executive Summary
  3.2.2.2 Background/Introduction
    3.2.2.2.1 Site Description and History
    3.2.2.2.2 Site Visit
    3.2.2.2.3 Demographics
    3.2.2.2.4 State and Local Health Data
  3.2.2.3 Community Health Concerns
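Dotted section codes like these carry the hierarchy in the code itself: dropping the last component gives the parent section. The following Python sketch, an illustration rather than a description of how WIIS stores its index, rebuilds the tree from the sample entries and produces the chain of titles leading to a given section. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Sample entries transcribed from the Health Assessments outline.
OUTLINE = {
    "3.2.2": "Health Assessments",
    "3.2.2.1": "Summary/Executive Summary",
    "3.2.2.2": "Background/Introduction",
    "3.2.2.2.1": "Site Description and History",
    "3.2.2.2.2": "Site Visit",
    "3.2.2.2.3": "Demographics",
    "3.2.2.2.4": "State and Local Health Data",
    "3.2.2.3": "Community Health Concerns",
}


def parent(code: str) -> Optional[str]:
    """'3.2.2.2.3' -> '3.2.2.2'; a single-level code has no parent."""
    head, sep, _ = code.rpartition(".")
    return head if sep else None


def sort_key(code: str) -> List[int]:
    # Compare numerically so that 3.2.2.10 sorts after 3.2.2.9.
    return [int(part) for part in code.split(".")]


def children(outline: Dict[str, str]) -> Dict[str, List[str]]:
    kids: Dict[str, List[str]] = defaultdict(list)
    for code in sorted(outline, key=sort_key):
        p = parent(code)
        if p in outline:
            kids[p].append(code)
    return kids


def breadcrumb(outline: Dict[str, str], code: str) -> List[str]:
    """Titles from the topmost indexed ancestor down to `code`."""
    path = []
    node: Optional[str] = code
    while node in outline:
        path.append(outline[node])
        node = parent(node)
    return path[::-1]


def render(outline: Dict[str, str], root: str, depth: int = 0) -> List[str]:
    lines = ["  " * depth + f"{root} {outline[root]}"]
    for child in children(outline)[root]:
        lines.extend(render(outline, child, depth + 1))
    return lines


print("\n".join(render(OUTLINE, "3.2.2")))
print(" > ".join(breadcrumb(OUTLINE, "3.2.2.2.3")))
```

Because the parent is derived from the code, new report sections can be tagged without maintaining a separate parent table.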

An intellectual challenge arose when ATSDR grappled with the philosophy of the indexing scheme for linking document images and the image database. Top management voiced strong support for an Agency-wide indexing scheme, while some of the scientists advocated at least two different approaches: by section/chapter and by document. The issue took considerable time to resolve; it was finally agreed to adopt an agency-wide standard that allowed the two indexing schemes to coexist. The reasoning behind the two approaches was simple: each reflected how the scientists were accustomed to locating information in agency products.

OPTICAL DIGITAL DATA DISK STORAGE:

Scanned image pages are held in a magnetic disk cache until each page passes quality control inspection. The Wang WIIS software allows in-process images to be overwritten (rescanned) to correct image quality problems. Approved document images are then recorded onto write-once optical digital data disk media for permanent retention. Exact duplicates of the original optical digital data disks are created using the Wang system's backup procedures and stored at Fort Knox for off-site archival storage and disaster recovery.
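The division of labor described above, a rewritable magnetic staging area in front of write-once optical media, can be sketched as two small Python classes. They are hypothetical models of the workflow, not Wang WIIS interfaces; page identifiers and image contents are placeholders.

```python
class WriteOnceVolume:
    """Models WORM media: each page address can be recorded only once."""

    def __init__(self) -> None:
        self._pages: dict = {}

    def record(self, page_id: str, image: bytes) -> None:
        if page_id in self._pages:
            raise PermissionError(f"{page_id} is already recorded on write-once media")
        self._pages[page_id] = image

    def read(self, page_id: str) -> bytes:
        return self._pages[page_id]


class ScanCache:
    """Models the magnetic disk cache: pages can be rescanned until approved."""

    def __init__(self, volume: WriteOnceVolume) -> None:
        self._volume = volume
        self._pending: dict = {}

    def scan(self, page_id: str, image: bytes) -> None:
        self._pending[page_id] = image  # a rescan simply overwrites the cached copy

    def approve(self, page_id: str) -> None:
        """Quality control passed: commit the cached image to optical media."""
        image = self._pending[page_id]
        self._volume.record(page_id, image)  # raises if the page was already recorded
        del self._pending[page_id]           # drop from cache only after a successful write


volume = WriteOnceVolume()
cache = ScanCache(volume)
cache.scan("doc1-p1", b"too dark")
cache.scan("doc1-p1", b"rescanned")  # allowed: the page is still in the cache
cache.approve("doc1-p1")
print(volume.read("doc1-p1"))
```

Committing only after approval is what keeps rejected scans off the permanent media; once recorded, a page can no longer be replaced in place.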

Image File Headers: ATSDR's images comply with the Tagged Image File Format (TIFF) Version 5.0, Class B, Type 3 or 4.
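A TIFF file's first image file directory (IFD) records how the image is encoded, so header compliance of this kind can be spot-checked programmatically. The Python sketch below reads BitsPerSample (tag 258) and Compression (tag 259) from the first IFD using only the standard library. Interpreting the paper's "Type 3 or 4" as Compression values 3 (CCITT Group 3) and 4 (CCITT Group 4) for bilevel (Class B) images is an assumption, and the parser deliberately handles only SHORT and LONG fields.

```python
import struct

# Values of TIFF tag 259 (Compression) relevant to bilevel document images.
COMPRESSION = {1: "uncompressed", 2: "CCITT 1D modified Huffman RLE",
               3: "CCITT Group 3 (T.4)", 4: "CCITT Group 4 (T.6)",
               5: "LZW", 32773: "PackBits"}


def first_ifd_tags(data: bytes) -> dict:
    """Return {tag: first value} for SHORT and LONG entries in the first IFD."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        entry = data[start:start + 12]
        tag, field_type, n = struct.unpack(endian + "HHI", entry[:8])
        if field_type == 3:
            fmt, size = "H", 2  # SHORT
        elif field_type == 4:
            fmt, size = "I", 4  # LONG
        else:
            continue  # other field types are skipped in this sketch
        if n * size <= 4:
            raw = entry[8:8 + size]  # value stored inline, left-justified
        else:
            (offset,) = struct.unpack(endian + "I", entry[8:12])
            raw = data[offset:offset + size]  # first value of an out-of-line array
        tags[tag] = struct.unpack(endian + fmt, raw)[0]
    return tags


def describe_bilevel(data: bytes) -> tuple:
    """Return (BitsPerSample, Compression code, Compression name)."""
    tags = first_ifd_tags(data)
    bits = tags.get(258, 1)  # BitsPerSample defaults to 1
    code = tags.get(259, 1)  # Compression defaults to 1 (none)
    return bits, code, COMPRESSION.get(code, f"other ({code})")
```

Called as `describe_bilevel(open(path, "rb").read())` on a bilevel Group 4 TIFF (with `path` a hypothetical file name), the function would return `(1, 4, "CCITT Group 4 (T.6)")`.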

Error Detection/Correction: An optical media error-checking capability runs in the system background and is transparent to users. The ATSDR optical disk subsystem has error reporting capability, but its specific software capabilities are unknown. No optical digital data disk failures have been reported to date.

Recording Process: 12-inch, write once, read many (WORM), dual-sided optical digital data disk media.

Optical Digital Data Disk Composition: Glass substrate.

Capacity: Data storage of two Gigabytes per platter.

Number of Optical Digital Data Disks in Use: 60

Jukebox: Wang jukebox with three optical drives, 76 disks.

Storage Environment: A computer room environment with controlled temperature/humidity conditions is maintained for the operational system.

RETRIEVAL AND OUTPUT:

ATSDR's imaging system automates an important segment of the agency's total information processing needs. The system's primary function is to support information retrieval and produce hard-copy reports on demand. Its search and retrieval software helps users rapidly identify the relevant segments of reports and automatically spools the images to a print server. Index data and the HazDat database are both accessed through a single software interface, with EPA's CERCLIS toxic waste site identifier serving as the common link. All users may access the imaging subsystem, and facsimile (FAX) transmission is used to send and receive image and text data.
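Using one identifier as the common link between two data sources works like a join on a shared key. The Python sketch below groups image index entries and HazDat records by site. The field names (`site_id`, `image_ref`, `record`) and the sample values are hypothetical stand-ins for the CERCLIS site identifier and the actual record layouts, which the paper does not describe.

```python
from collections import defaultdict


def link_by_site(index_rows, hazdat_rows):
    """Group image index entries and HazDat records under a shared site identifier."""
    linked = defaultdict(lambda: {"images": [], "hazdat": []})
    for row in index_rows:
        linked[row["site_id"]]["images"].append(row["image_ref"])
    for row in hazdat_rows:
        linked[row["site_id"]]["hazdat"].append(row["record"])
    return dict(linked)


# Placeholder identifiers and records, not real CERCLIS data.
index_rows = [
    {"site_id": "SITE-A", "image_ref": "health-assessment-p1"},
    {"site_id": "SITE-A", "image_ref": "health-assessment-p2"},
    {"site_id": "SITE-B", "image_ref": "tox-profile-p1"},
]
hazdat_rows = [
    {"site_id": "SITE-A", "record": "sampling results"},
    {"site_id": "SITE-C", "record": "exposure pathway"},
]

for site, entry in sorted(link_by_site(index_rows, hazdat_rows).items()):
    print(site, entry)
```

Sites that appear in only one source (SITE-B and SITE-C here) still get an entry, which makes gaps between the image index and the database easy to spot.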

Primary System Users: Scientists and Science Administrators

User Interface: ATSDR's image applications were developed with ease of use in mind and designed in an "electronic book" format. The first menu screen lists topics, chemicals, or sites. After the user makes a selection, the available sections or chapters are displayed. Using on-screen menu prompts under keyboard control, users can have an image displayed, printed, or transmitted by facsimile anywhere in the world.

Display Output: Users view images on 19-inch high-resolution monochrome display monitors and/or standard or super VGA color monitors.

Laser Printing: Hewlett-Packard 3SI laser printing equipment.

DATA MIGRATION POLICY ISSUES:

ATSDR is committed to building and maintaining an agency-wide imaging capability as one part of a comprehensive management information system. The overall system includes: a bulletin board service; a geographical information system; a regional information system; as well as an administrative and personnel database, and a Cost Recovery application. Although still in the early stages, the imaging system was designed with remote access and data transfer capabilities in mind.

Linkages with Other Agency ADP Applications: ATSDR in-house computer scientists developed the links between the IBM mainframe and the proprietary Wang hardware and software, resulting in considerable cost savings. Remote access to the HazDat database is currently offered, but remote users can obtain images only by requesting them by FAX.

Network Transmission: Full LAN capability exists for transfer of image and index data. The system is also equipped with an Internet gateway for transferring index data only.

Backup of Image and Index Data: Magnetic disks are used for daily incremental backups of image and index data, with bi-weekly magnetic disk backups of the image data. Image data is then written to magnetic tape. When the master optical digital data disks are completely filled with images, mirror-image optical digital data disk copies are created. These backup optical digital data disks, with a descriptive naming convention for identification, are stored in Fort Knox under environmentally controlled conditions.

Technical Support And Documentation: System users and managers use a combination of in-house and Wang-supplied technical and administrative documentation. In addition, full-time Wang computer technicians are located on site. A Wang senior systems specialist is available as needed for additional technical consultation.

Interoperability: The open image architecture of Wang's Integrated Imaging System (WIIS) supports the capture, storage, retrieval, management, and control of digital image data stored on magnetic or optical media. Wang's imaging system is compatible across the Wang VS minicomputer family. WIIS applications can be developed using Wang software and standard programming languages, and third-party software packages are also supported. Stand-alone optical disk drives and jukeboxes appear as any other (magnetic) storage device through the SCSI and RS-232 interfaces.

Migration Plans: ATSDR system administrators are committed to wholesale recopying of the optical digital data disks when the media's warranty expires. They intend to stay with the existing system in the short term, but the long-term strategic plan is to migrate imaging to a different platform. ATSDR's Office of Information Resources Management and Wang are beta testing Wang imaging on an IBM RISC System/6000 (RS/6000) platform on site at ATSDR.
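Recopying "upon expiration of the media's warranty" implies tracking each disk's warranty end date. The Python sketch below lists disks whose warranty ends within a planning window. The paper does not state the warranty length or when it starts, so the five-year term, the start dates, and the disk labels are all hypothetical.

```python
from datetime import date, timedelta


def add_years(start: date, years: int) -> date:
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # February 29 in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def due_for_recopy(disks, warranty_years, as_of, lead_days=0):
    """List (disk_id, warranty_end) for disks whose warranty ends by as_of + lead_days."""
    horizon = as_of + timedelta(days=lead_days)
    ends = [(disk_id, add_years(start, warranty_years)) for disk_id, start in disks]
    return sorted((d for d in ends if d[1] <= horizon), key=lambda d: d[1])


# Hypothetical disk labels, warranty start dates, and warranty length.
disks = [("TOX-01", date(1990, 3, 15)), ("TOX-02", date(1992, 6, 1)), ("CRS-01", date(1993, 1, 10))]
for disk_id, ends in due_for_recopy(disks, warranty_years=5, as_of=date(1995, 1, 1), lead_days=90):
    print(f"{disk_id}: warranty ends {ends}, schedule recopy")
```

The lead time gives operators room to complete a copy before the warranty lapses rather than on the expiry date itself.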

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: Was re-engineering performed to obtain greater benefits from the new imaging system? Yes. The organization's paper flow has changed. Documents are no longer simply placed in file cabinets or stored in boxes in a warehouse. They are now indexed by subject, site, employee, or whatever the application calls for, stored on optical digital data disks, and made available to the scientists and science administrators who need the information.

The health assessments, toxicological profiles, and other documentation describing the 1,350 hazardous waste sites contain data and information that is frequently accessed, retrieved, and copied by scientists in Atlanta, Washington, D.C., and the ten regional offices as ATSDR staff conduct their work. Each year, some of the health assessments and toxicological profiles are updated as required. An update may require reviewing old or new references, which can number in the hundreds or more for a single document and include everything from chemical compound listings, maps, photographs, and site sampling data to handwritten notes. All of this material has to be identified, retrieved, and copied by the reviewers.

Updating a paper-based health assessment or toxicological profile was a time-consuming task that required considerable staff effort. Time was often lost trying to find the most current version of a document, or simply trying to find a misplaced file. Imaging changed the way the scientists work. They no longer need to keep numerous paper documents on their desks; originals are scanned onto optical digital data disks, where they are easily accessible to those who need to review them.

With imaging, scientists are confident that they are working on the bona fide, latest version of a document (the electronic version in the system) and no longer have to question which paper version is most current. With OCR, they can also manipulate, edit, and update the information using local word processing packages.

The imaging system enabled ATSDR to identify and to measure a series of notable benefits. Most significantly, it has saved the scientists considerable time in accomplishing their tasks. Time once spent searching for paperwork can now be spent addressing complex and pressing public health problems.

The integrated imaging system also provides scientists greater accessibility to timely, complete, and credible information and thus has enhanced the Agency's ability to respond to both public and private sector inquiries. Information on a hazardous waste site may be needed on short notice when ATSDR is called to testify before Congress. Before document imaging, this often involved long searches and there was little control in place as to who had what material, or whether duplicates or outdated versions existed.

Without a doubt, document imaging has enabled ATSDR to provide a better public health response to its constituents.

Agency-wide Imaging: ATSDR decided to build its imaging capability in phases, beginning with the functions that promised the largest "payback" for agency staff. The imaging system currently supports agency functions toward the end of the work flow, namely the retrieval and dissemination of ATSDR products. The introduction of imaging, however, is already affecting working relationships and work flow in other agency components. Another critical issue was evaluating the implications of adopting imaging technology agency-wide and educating top-level management about them. One major challenge was to foster an understanding among all of the Agency's staff of how imaging technology could support their individual and joint efforts.

ATSDR's Office of Information Resources Management pointed out that information technology in general, and imaging in particular, served as a stimulus for changing agency policy on establishing and maintaining comprehensive, readily accessible public health findings, as mandated by Congress.

Single Vendor: ATSDR used Wang office information and automation equipment before installing the WIIS imaging system. The agency's administrators are firmly committed to an imaging system; the platform, however, may change within 3 to 5 years.

Access System: The ATSDR's indexing system enhances access to and preserves the structure of the complex imaged reports. The agency's codified indexing system and the process followed to develop it (including the resolution of internal differences) may provide useful guidance for other Federal agencies in converting records with complex filing structures.

Legal Admissibility: ATSDR recognizes the potential legal implications of maintaining the record copy of agency documents on an imaging system. The agency's Assistant Administrator sought a legal opinion from the general counsel of the Department of Health and Human Services concerning the admissibility of optical images in cost recovery litigation. The response noted that courts have been very willing to admit evidence stored in computers. "As long as the printout is readable, and there is a witness who can testify as to the originality and authenticity of the computer records and the printout, there should be no problem of admissibility."


SITE VISIT REPORT #2

AGENCY: U.S. Army Corps of Engineers (USACE)

SYSTEM: USACE ODI Pilot Project

CONTACT: Linda Worthington, USACE Records Administrator, Washington, DC

AGENCY OVERVIEW:

Effective use of information resources is critical to the daily operations of the US Army Corps of Engineers. The Corps' mission is to provide quality, responsive engineering and environmental services to the Nation. To do this, the Corps employs about 40,000 civilian and 600 military personnel worldwide. The annual budget is about $12 billion.

The Corps plans, designs, builds, and operates water resources and other civil works projects; provides military construction, including design, construction management, and real estate work, for the Army and Air Force; and provides design and construction management for other Defense and Federal agencies. The Corps remediates hazardous and toxic wastes at Army and Air Force installations and at Formerly Used Defense Sites. The Corps has four research and development laboratories. Its regulatory program, established in the 19th century to protect navigation, has been expanded so that today the Corps implements environmental protection statutes, preserves wetlands, and protects other natural values. The Corps responds directly to natural disasters and other emergencies as the nation's primary engineering agency, both through its own authorities and in support of other agencies.

These mission-critical functions require ready access to the agency's records holdings. This information management effort is made more difficult by the variety of incompatible storage media and formats in the Army Corps records holdings.

The Corps is looking for more effective ways to access and share information among offices throughout the Corps and to enhance customer service. One example is a recent Corps of Engineers information management initiative to pilot digital imaging systems. Five Corps of Engineers offices will serve as pilot imaging system test sites and will evaluate imaging technology under real-world conditions. The Corps is pursuing pilot test systems with open system architectures, avoiding proprietary or unique vendor-specific solutions.

The Army Corps of Engineers optical imaging system is important for this study because of: the agency's need to make mission critical image and index information available Corps-wide; the need to integrate multi-media formats into one cohesive information system; and, the integration of simultaneous multiple site pilot imaging systems connected in a network configuration.

BACKGROUND:

The US Army Corps of Engineers optical digital data disk imaging pilot system responds to a need to manage large volumes of information currently maintained in a variety of non-electronic formats. The US Army Corps of Engineers monitored the optical imaging marketplace for several years. The proprietary solutions offered by the imaging industry, combined with the lack of Federal Government standards or policies related to optical media, resulted in minimal Corps involvement to date.

Although the state of the imaging industry could lead to incompatible systems, many Corps offices nationwide were planning to adopt digital imaging systems in their business operations. Because of this interest and the lack of standards, the US Army Corps of Engineers chose to conduct a pilot test to determine the feasibility of using digital imaging technology.

Pilot System Approach: The pilot systems will help determine the role of digital imaging technology in the Corps' future information strategy. Corps management is seeking to identify imaging requirements and eventually adopt a Corps-wide imaging solution that uses off-the-shelf, commercially available technology as much as possible. The Corps adopted a three-phase pilot system approach:

  • Phase I - Conduct Requirements Analysis Study
  • Phase II - Design Pilot System; Develop Unique Functional and Technical Specifications
  • Phase III - Install, test and evaluate the pilot systems

Pilot System Overview: The Corps' existing non-electronic information, held in diverse formats including documents, technical reports, engineering drawings, maps, and other formats, requires manual, labor-intensive, and time-consuming searches. The Corps of Engineers information management goal is to improve the efficiency of its records and information management programs by making existing and future data available Corps-wide in electronic format. This includes converting incompatible data formats to digital images, creating a computerized index system for improved search and retrieval, and permanently storing the information on write once, read many (WORM) optical digital data disks using CCITT Group IV compression.

The US Army Corps of Engineers expects to store and retrieve diversely formatted information more effectively once it is converted to a single, user-friendly digital form. Adoption of digital information technology will provide a future capability to route and share the agency's information electronically and more efficiently. The index database will provide electronic access to the valuable records collection. The Corps of Engineers expects to derive tangible benefits from digital imaging technology, including improved staff productivity. These benefits will be based on: multiple, simultaneous access to electronic information; faster access to information; better decision making due to improved access to information; increased record integrity; lower costs and space needs for records storage; and, improved efficiency and service to Corps of Engineers customers.

A series of pilot projects will help determine the suitability of imaging technology for Corps applications and imaging's ability to support inter-office and intra-office workflow and information exchange. No records will be destroyed since this is a pilot test.

Pilot System Description: In 1992, the Directorate of Information Management initiated an Optical Disk Imaging (ODI) Pilot Test to evaluate the feasibility of integrating the latest commercially available technologies to provide Corps offices greater information access, storage, and retrieval capabilities; determine appropriate policies, standards, and procedures; and, determine the most cost-effective solution.

The final phase of the ODI Pilot Test is in progress and will be completed in 1994. During this integration phase, the following items will be tested and evaluated:

  • Integrate ODI technology into the Corps 95 open systems architecture.
  • Scan and index documents, drawings, photographs and maps.
  • Link documents to a Corporate Database.
  • Provide remote access to images among test sites and HQUSACE personnel.
  • Use and evaluate a Corps developed records management indexing system.
  • Use and evaluate a Corps scanning contract for digitizing E-size drawings and aerial photographs.
  • Determine feasibility of importing digital microfiche and 35mm slides.
  • Evaluate impact on LANs and the Corps WAN based on adding image traffic.

Functional users were recently trained to retrieve information from the image database. The database resides on a multifunction optical digital data disk jukebox connected to the CD4000 platform. Users search, retrieve, and display the images on their own locally networked 386 and 486 PCs, to which Windows, Oracle SQL, and imaging software were added (see Pilot System Configuration).

An evaluation of the pilot test will be conducted, and functional users will be asked to comment on how ODI helped them. Value-added benefits we hope to achieve include an additional tool for re-engineering some of our business processes; increased productivity; enhanced decision making; reduced storage and paper costs; and, enhanced customer service.

Plans are to use existing contracts to acquire ODI equipment and software. By late Spring, ODI policies, standards, procedures, and lessons learned will also be developed.

In 1992, a moratorium was issued on the purchase of ODI equipment/systems. This moratorium remains in effect until ODI policies, standards, and procedures are in place.

Pilot System Configuration:

The pilot design is based on open systems architecture. Pilot sites will use the Corps existing wide area network (WAN) to support the image traffic via T-1 lines. They will use locally owned Unix computer systems along with their relational database software to run a Corps developed records management indexing database system. The pilot imaging system will utilize commercially available, off-the-shelf (COTS) hardware and software components.

In the Fall of 1993, a systems integrator installed imaging systems at the following pilot locations:

  • Mobile District, Mobile, Alabama
  • Albuquerque District, Albuquerque, New Mexico
  • Huntington District, Huntington, West Virginia
  • HQ Health and Safety Office, Washington, DC
  • Army Environmental Center, Aberdeen, Maryland

Each pilot system consists of the following major elements:
  • Client/Server with Multifunction Optical Jukebox
  • Scan/Index Workstation
  • Retrieval Workstation

Client/Server with Multifunction Optical Jukebox:
  • UNIX Client Server with SCSI.
  • Multifunction Optical Jukebox.
  • UNIX Operating System.
  • Relational Database Software.
  • Optical Disk Software.

Scan/Index Workstation:
  • Tabletop Document Scanner.
  • PC Platform: 486/50 MHz, 16 MB RAM, 5.25- and 3.5-inch floppy drives, 500 MB Hard Drive, SCSI Controller, Mouse, Network Interface Card, 101-style keyboard.
  • 19-inch High Resolution Monitor, Dual Page 150 DPI.
  • Image Scan/Display/Compression and Decompression Components.
  • PC Operating System Software, Graphical User Interface Software, Imaging Software, and SQL Software.
  • Desktop Laser Printer.

Retrieval Workstation:
  • PC Platform: 486/33 MHz PC with 8 MB RAM, VGA 14-inch Color Monitor, Mouse, 101-style keyboard, Network Interface Card.
  • PC Operating System Software, Graphical User Interface Software, Imaging Software, and SQL Software.

OVERVIEW OF SIGNIFICANT ISSUES:

Interagency agreements: The Corps of Engineers is identifying Federal Government policies and standards applicable to imaging systems and records management. The Corps plans to establish a working group to assist with their technology projects, and coordinate with the National Archives. The Army Corps of Engineers and the National Archives, under a formalized Memorandum of Understanding, plan to examine legal admissibility and long term archival requirements related to digital imaging and optical digital data disk technologies.


SITE VISIT REPORT #3

AGENCY: Bureau of Land Management (Eastern States)

SYSTEM: Federal Land Patents System

CONTACT: James F. Gegen, Project Manager, Bureau of Land Management, General Land Office Records, Springfield, Virginia

SUMMARY DESCRIPTION:

In 1989, the Bureau of Land Management's (BLM) Eastern States office initiated a multi-year project to digitally scan, enhance, index, store, and retrieve approximately nine million pages of historic Federal land grant patents and related survey documents using twelve-inch optical digital data disks. The major goals of the General Land Office Automated Records System (GLOARS) include preserving land patent documents dating back over two hundred years and improving user access to the information. The records, which chronicle land title transfers for over 1.5 billion acres of public domain properties, are important in adjudicating land ownership. The actual conversion to digital images is an ongoing effort performed by an on-site contractor. The conversion process includes document preparation, scanning, indexing, image/data quality control, and image recording onto write once, read many times (WORM) optical media. The retrieval system supports full Boolean searching of index fields and offers access to both index and image data. Digital images identified through an index search may be displayed on a 19-inch high-resolution monitor linked to an optical "jukebox" and printed on letter-size paper using laser printing equipment. The land grant patent retrieval system went on-line in February 1993 and operates on a fee-based cost recovery basis.

The BLM System is important for this study because of: the historical significance of the land grant records; the comprehensive indexing and searching capabilities; lessons learned about information technology standards; positive experience with a system integrator; and, cost recovery plans based on user retrievals.

BACKGROUND:

The functions of the Bureau of Land Management and its predecessors date back to the Land Ordinance of 1785, which first established the rectangular survey system for public lands. The public domain initially consisted of western territory claimed by the original 13 states and eventually ceded to the Federal Government. Additional acquisitions over the years brought the public domain to about 1.8 billion acres. The General Land Office, then part of the Treasury Department, was tasked with surveying these lands and maintaining the land status records. The BLM's present management of public lands and resources is based on the Federal Land Policy and Management Act of 1976. The BLM now manages over 270 million acres of public lands, including the resources they contain such as soil, water, air, timber, surface and subsurface minerals, oil and gas, geothermal energy, wildlife habitat, wild and scenic rivers, and open space.

According to its mission statement, the Bureau of Land Management Eastern States Organization "is responsible for the stewardship of public lands and resources under the jurisdiction of the BLM in the 31 states east of and bordering the Mississippi River on the west. These public lands and resources will be managed to protect the environment and provide a diverse array of products and outdoor experiences. The Eastern States is also responsible for the maintenance and protection of the official land records and cadastral surveys for the Department of the Interior." Customer service and public outreach are stated components of this mission and the BLM intends to use state-of-the-art technology and related research efforts to accomplish its goals.

BLM Eastern States has custody of more than 9 million Federal land documents such as survey notes and plats, tract books, and land patent records that document the country's westward expansion. The information in these records includes areas, boundaries, ownership, limitations on titles such as rights-of-way, and other characteristics that affect the value and use of the land. The tract books, first put into use in 1810, are large, bound volumes in which public domain transactions were recorded. In many cases, the Eastern States' copy is the only extant version. The original (often brittle) paper versions must often be consulted to decipher handwriting and other important information. Although the entire collection exists on microfilm, the quality of the film is poor, so the originals are relied on for accurate information. Access to the microfilm and paper versions of the land patent records is available only through tract book indexes, which require the user to supply a specific legal description of the land.

Improved public access to land records is a critical part of the BLM mission. These land records, some dating as far back as 1788, document the initial transfer of sovereignty to private individuals. They form the cornerstone of the title search process that is required by law whenever property is sold. In addition to title companies, BLM customers include other Federal, State and local government agencies, lands and minerals consultants, scholars, and private citizens. The Eastern States is responsible for maintaining the documents relating to the public lands for the 31 states geographically located east of and bordering the Mississippi River on the west. (The public land states are Indiana, Illinois, Michigan, Wisconsin, Minnesota, Iowa, Missouri, Arkansas, Louisiana, Mississippi, Alabama, Florida, and part of Ohio. Federal lands in any of the other eastern states are generally lands acquired for parks, forests, wildlife refuges, Native American reservations, etc.)

Origins: A key question facing the BLM was how to protect the fragile, historical General Land Office documents and continue to meet the needs of its users, including: Federal, State, and local government agencies; title companies; lands and minerals consultants; and, private citizens. This growing concern led to a contract with Stone and Webster Company to evaluate preservation alternatives and costs. This 1986 study recommended continuing the existing microfilming program, augmented with an automated indexing system, at a cost of $20 million over seven years. An ensuing feasibility study was conducted in 1988 by West Coast Information Systems (WESCO). The WESCO study recommended that the Bureau's land records be digitally scanned, indexed, and stored on optical digital data disks over a four-year period at a cost of $6 million. BLM management decided in April 1989 to develop a digital imaging system. This decision was based largely on two factors: the reliability of optical digital data disk storage systems; and, the ability to provide improved public access to the land records.

In 1990 the Department of Energy (DOE) entered into an interagency agreement with BLM to contract with a private firm to develop system requirements and attribute database specifications. A Science Applications International Corporation (SAIC) team, based in Oak Ridge, Tennessee, identified these requirements and specifications and in 1990 developed a prototype system that was used to scan and retrieve 163,000 Arkansas patents and other documents. The imaging system prototype was especially helpful in validating the PC-based architecture and throughput conversion rates. After some minor modifications in the prototype design, a production system was implemented in 1991. To date, over one million land patent records and related indexes for eight states (Arkansas, Louisiana, Florida, Michigan, Minnesota, Ohio, Mississippi, and Wisconsin) have been digitally converted. The conversion process is scheduled for completion by late 2000, costing up to $15 million to develop the full image and index database.

SYSTEM CONFIGURATION:

The BLM imaging system is a PC-based architecture divided into four functional subsystems - scanning, indexing, quality control, and retrieval. The PC-based environment is client/server oriented and supports the four subsystems and the ORACLE Relational Database Management System, Ethernet for a local area network, and ORACLE Structured Query Language (SQL) for database communications.

  • LaserData LaserView LVNET Imaging System; LV-6000 Corvette Video Boards for Image Compression and Display; LV-8010 Scanner Controller Cards; QEMM Extended Memory Device Drivers.
  • 80486 33-MHz PC with two 1.4 gigabyte hard drives and two 425MB external Fujitsu magnetic disc drives.
  • Scanning Workstations--IBM PC/AT compatible 80386; Ricoh IS400 Document Scanners.
  • Indexing/QA Workstations--IBM PC/AT compatible 80386 with 40MB hard disks.
  • Optical Disk Subsystem--Sony WDD-600 Disk Drive; Sony WDC-610 Disk Controller; Sony WDA-610 50-platter optical disk jukebox.
  • Image Retrieval Workstations--IBM PC/AT compatible 80386 with 40MB hard disks; Hewlett-Packard Laser Printers.
  • Operating Systems: PC-based (NEC 386-486/20) Client/Server Environment; MS-DOS 5.0 on Novell File Servers, Optical Servers and Workstations; SCO-UNIX on the Database Servers.
  • Microsoft C v5.1 for Compiler/Assembler Functions.
  • ORACLE Relational Database Management System version 7.0; ORACLE Lanserver for UNIX; Oracle SQL*Net for database communications.
  • Ethernet EXCELAN XLN, NOVELL v3.11 LAN, TCP/IP, Group 3; Ethernet Controller Boards, Ethernet and ThinNet cables; Ethernet Standard Transceivers and Receivers.
  • SQL*Forms serves as Application Development System.

Technical System Specifications of Retrieval Components:

  1. SCO UNIX/ORACLE DATABASE SERVER (Qty=1)
    (For Database and Operating System Services)
    Gateway 2000 486/33 Tower PC
    64 MB Random Access Memory (RAM)
    Color VGA Monitor
    (2) 1.3 GB Micropolis SCSI Hard Drive
    (2) 425 MB Power Drive External SCSI Hard Drive
    Equinox MegaPort (12 serial ports)
    3COM 503-16 Network Adapter
    Mountain FileSafe 1200Plus External Tape Drive
    (12) 9600 baud External Hayes-compatible Modem
    SCO UNIX System V Operating System
    SCO TCP/IP Runtime System
    Oracle Relational Database Management System
    Oracle SQL*Net TCP/IP

  2. MS-DOS/ORACLE CLIENT PC (Qty=3)
    (For Accounting Administration, System Administration, and System Development)
    Gateway 2000 486/33 Tower PC
    8 MB Random Access Memory (RAM)
    Color VGA Monitor
    120 MB ESDI Hard Disk
    EXOS 205T-512K Network Adapter
    MS-DOS v5.0
    Oracle Tools for MS-DOS
    SQL*Net TCP/IP for DOS
    Microdyne LAN Workplace for DOS

  3. MS-DOS PRINT SERVER & LASER PRINTER (Qty=1)
    (For Printing Document Images)
    NEC PowerMate 486SX/25e PC
    2 MB Random Access Memory (RAM)
    Samsung 14" VGA Monochrome Monitor
    120 MB Hard Disk
    EXOS 205T-512K Network Adapter
    LaserData LV6004 Image Processing Board
    LaserData LV8030 Printer Controller
    LaserData LV8023 LaserJet III Adapter
    Hewlett-Packard LaserJet IIID Printer
    MS-DOS 5.0
    LaserData LV9100, LV914B Software
    SQL*Net TCP/IP for DOS

  4. MS-DOS IMAGING WORKSTATIONS (Qty=3)
    (For Users in the BLM Eastern States Public Services Section)
    NEC PowerMate 386/20 PC
    8 MB Random Access Memory (RAM)
    LaserData (Monoterm) LV719 19" High-Resolution Monochrome Monitor
    40 MB Hard Disk
    EXOS 205T-512K Network Adapter
    LaserData LV6004 Image Processing Board
    Oracle Tools for MS-DOS
    SQL*Net TCP/IP for DOS
    Microdyne LAN Workplace for DOS
    LaserData LV9100, LV9150, LV914, LV9910 Software

  5. NETWORK LASER PRINTER (Qty=1)
    (For Printing Data Reports)
    Hewlett-Packard LaserJet II
    Parallel Link HPL-100 Print Extender

  6. MS-DOS FAX SERVER (Qty=1)
    (For Unattended Fax-Out Services)
    NEC PowerMate 386/20 PC
    2 MB Random Access Memory (RAM)
    Samsung Monochrome Monitor
    40 MB Hard Disk
    LaserData LV6004 Image Processing Board
    Gammalink GammaFax CP Board
    LaserData LV9100, LV916 Software
    Alcom Easygate LanFax/10 Software

  7. MS-DOS DOCUMENT SERVER & JUKEBOX (Qty=2)
    (For Storage and Retrieval of Document Images)
    NEC 486/25 PC
    4 MB Random Access Memory (RAM)
    Samsung 14" VGA Monochrome Monitor
    120 MB Hard Disk
    Adaptec 1540B/1542B SCSI Controller
    LaserData LV6004 Image Processing Board
    LaserData LV914B, LV912S, LV913S, LV9910 Software
    Sony 50-Platter DSDD 2-Drive Optical Disk Autochanger

Date system installed: 1989-1990

System Installed by: Staff and SAIC

System Configuration Changed Since Installation? (Yes)

The system originally used single-density Sony optical media and was later converted to accommodate double-density media when it became available. Before the Novell file server was installed, images were routed through a two-cache system. The imaging components and system software have been maintained and upgraded to maximize their effectiveness.

DIGITAL IMAGE CAPTURE:

BLM's conversion processing includes nine distinct production stages: document preparation; logging volumes to processing queue; digital scanning; indexing; transfer to optical digital data disk; quality control review; BLM quality assurance; logging volumes from processing queue; and migration of attribute database to retrieval platform. The image conversion work flow process, performed on-site, uses a Novell file server configuration to reduce device and/or access contention problems.
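The nine stages above form a strict sequence, which can be modeled as an ordered enumeration. The following Python sketch is illustrative only: the stage names are paraphrased from the report, and the rule that a failed review sends work back to scanning is an assumption drawn from the quality control description, since the report does not say exactly how rejected work re-enters the flow.

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """The nine production stages, in the order the report lists them."""
    DOCUMENT_PREPARATION = 1
    LOG_VOLUME_INTO_QUEUE = 2
    SCANNING = 3
    INDEXING = 4
    TRANSFER_TO_OPTICAL_DISK = 5
    QUALITY_CONTROL_REVIEW = 6
    BLM_QUALITY_ASSURANCE = 7
    LOG_VOLUME_OUT_OF_QUEUE = 8
    MIGRATE_ATTRIBUTE_DATABASE = 9


# Stages at which a reviewer can reject work.
REVIEW_STAGES = {Stage.QUALITY_CONTROL_REVIEW, Stage.BLM_QUALITY_ASSURANCE}


def advance(current: Stage, passed: bool = True) -> Optional[Stage]:
    """Return the next stage for a volume, or None once it is finished.

    Assumption (not stated in the report): a failed review at either
    review stage sends the work back to SCANNING for rescanning.
    """
    if current in REVIEW_STAGES and not passed:
        return Stage.SCANNING
    if current is Stage.MIGRATE_ATTRIBUTE_DATABASE:
        return None
    return Stage(current + 1)


# Walk one volume through the workflow with every review passing.
stage: Optional[Stage] = Stage.DOCUMENT_PREPARATION
history: List[str] = []
while stage is not None:
    history.append(stage.name)
    stage = advance(stage)
print(" -> ".join(history))
```

Using an integer-valued enumeration keeps the ordering explicit, so the "next stage" is simply the next value rather than a hand-maintained lookup table.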

Estimated Number of Documents/Records Converted to Date: Over 1.1 million.

Conversion of Records Performed By: Contractor - Dynamic Concepts, Inc.

Disposition of Original Records: Designated as permanent records. With approval from the Director, Bureau of Land Management, and the Director of Eastern States, the patent documents will be retired to the National Archives after the project is completed and the data in the system is verified to be correct.

Document Preparation: After the scanned images are accepted, the patent documents are placed in acid-free archival boxes and stored in temperature-controlled vaults. The tract books are not being scanned or indexed.

Document Scanning: Ricoh IS400 scanners capture documents up to 11 x 17 inches, using a manual-feed transport rated at 6 pages per minute. The scanners are calibrated before each discrete volume of land patent records is converted. Contrast settings are also adjusted as needed during scanning to compensate for visible signs of document deterioration, such as yellowed paper and volume-wide water stains. Dynamic Concepts, Inc., was awarded the conversion contract and provides on-site staff including: a site manager; a systems specialist; line supervisors; and, production workers who rotate among scanning, indexing, and quality control stations. Scanner throughput currently averages 1,250 pages per day per scanner. (The prospects for any increase in project funding are very dim. We began the Project with two production teams, one working on BLM-owned equipment and the other on leased equipment. The leased equipment has not been fully used for some time, and the lease will expire in 1994. Because of funding limitations, we do not plan to renew the lease and will continue to operate the production facility at the current rate of production.)
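The rated feed speed and the observed daily throughput can be compared with simple arithmetic. This sketch uses the report's figures (6 pages per minute, 1,250 pages per day per scanner, roughly nine million pages in all, about 1.1 million converted) together with hypothetical scanner counts, since the report does not say how many scanners are in production use.

```python
import math

RATED_PAGES_PER_MINUTE = 6          # Ricoh IS400 manual-feed rate, from the report
PAGES_PER_DAY_PER_SCANNER = 1_250   # observed average, from the report


def feed_hours(pages_per_day: float, pages_per_minute: float) -> float:
    """Hours of pure feeding time implied by a daily page count."""
    return pages_per_day / pages_per_minute / 60


def working_days_needed(pages: int, scanners: int,
                        pages_per_day: float = PAGES_PER_DAY_PER_SCANNER) -> int:
    """Whole working days for `scanners` machines to capture `pages` pages."""
    if scanners < 1:
        raise ValueError("need at least one scanner")
    return math.ceil(pages / (scanners * pages_per_day))


remaining = 9_000_000 - 1_100_000   # approximate figures from the report
print(f"{feed_hours(PAGES_PER_DAY_PER_SCANNER, RATED_PAGES_PER_MINUTE):.1f} h "
      "of feeding per scanner-day")
for scanners in (2, 4):             # hypothetical scanner counts
    print(f"{scanners} scanners: {working_days_needed(remaining, scanners)} working days")
```

At the rated speed, 1,250 pages represent about three and a half hours of feeding. That gap plausibly reflects time spent on calibration, contrast adjustment, and document handling, though this is an inference rather than a figure from the report.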

Quality Control: Quality control workstations use IBM PC-compatible 80386 computers with nineteen-inch high-resolution (150 dots per inch) display monitors. The contractor conducts a 100 percent quality inspection of all images. BLM Quality Assurance personnel select images for inspection using a statistical sampling technique, judging image quality by visually comparing the digital images with the original documents. Scanned image data is stored on the Novell file server (1.4 GB) until quality control is completed. When an image fails quality inspection, the document is rescanned. Staff training and supervision provided by the contractor are credited with a claimed 99 percent scanning accuracy rate. Acceptable quality is defined as the ability to capture and display legible document images.
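The report does not describe BLM's sampling plan. With that caveat, a reproducible simple random sample combined with an accept/reject rule might look like the following sketch; the batch, sample size, seed, and zero-failure threshold are all hypothetical, and `is_legible` stands in for the reviewer's visual comparison with the original document.

```python
import random
from typing import Callable, List, Sequence, Tuple


def select_sample(image_ids: Sequence[str], sample_size: int, seed: int) -> List[str]:
    """Simple random sample without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(list(image_ids), min(sample_size, len(image_ids)))


def review_sample(sample: Sequence[str], is_legible: Callable[[str], bool],
                  max_failures: int = 0) -> Tuple[bool, List[str]]:
    """Accept the batch if no more than max_failures sampled images fail review.

    is_legible stands in for the reviewer's visual comparison of each image
    with the original document.
    """
    failures = [image_id for image_id in sample if not is_legible(image_id)]
    return len(failures) <= max_failures, failures


# Hypothetical batch of 1,000 images, 50 of them drawn for review.
batch = [f"IMG{n:05d}" for n in range(1_000)]
sample = select_sample(batch, 50, seed=1994)
accepted, failed = review_sample(sample, is_legible=lambda image_id: True)
print(accepted, failed)
```

Recording the seed alongside each batch would let an auditor redraw exactly the same sample later.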

Scanning Resolution: The Ricoh scanners capture images at 300 dots per inch (DPI). Images are subsequently displayed at 150 DPI and printed at 300 DPI.

Color and Gray Scale: The Ricoh IS400 scanners do not provide color or gray scale capability.

Image Enhancement: Enhancement capabilities exist through seven levels of pixel density.

Compression/Decompression: Proprietary LaserData image compression and decompression algorithms are built into the system's workstation video boards. The algorithms conform to CCITT Group 3 standards, and a typical image is approximately 150 KB after compression. BLM system administrators recommend adopting open systems technology as soon as industry or government standards become available, to avoid being restricted to a single vendor's product line over the life of the information system.
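The 150 KB figure can be put in context by estimating the uncompressed size of a black-and-white scan. This sketch assumes a letter-size (8.5 x 11 inch) page, 1 bit per pixel at the 300 DPI capture resolution, and 1 KB = 1,000 bytes; actual patent documents vary in size, so the resulting ratio is only indicative.

```python
def bilevel_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size, in bytes, of a 1-bit-per-pixel (black-and-white) scan.

    Ignores per-row padding and file headers that a real image format would add.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels + 7) // 8  # 8 pixels per byte, rounded up


COMPRESSED_BYTES = 150_000  # "approximately 150KB", taking 1 KB = 1,000 bytes

raw = bilevel_bytes(8.5, 11, 300)  # assumed letter-size page at 300 DPI
print(raw, f"{raw / COMPRESSED_BYTES:.1f}:1")
```

Under these assumptions the raw page is about 1.05 MB, so a 150 KB compressed image implies roughly 7:1 compression. An 11 x 17 inch document, the scanner's maximum, would be about 2.1 MB before compression, and display at 150 DPI needs only a quarter as many pixels as the 300 DPI capture.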

DOCUMENT INDEXING:

Electronic images are available for indexing immediately after scanning. Images awaiting indexing are held temporarily on the Novell file server and accessed over the local area network. Indexing workstations use IBM PC-compatible 80386 computers with nineteen-inch high-resolution (150 dots per inch) display monitors.

Creation of Index Database: Index data is key entered using the digital screen images. The index information is verified by quality assurance specialists using special quality control workflow software.

Location of Index Database: An IBM PC/AT compatible 486 server with a 1.4GB hard disk uses ORACLE software to manage and control index data. SCO-UNIX provides processing power and multi-user support.

Index Structures: Each land patent is fully indexed in 35 distinct fields. In cooperation with the National Archives (NARA), information missing from the original records due to physical deterioration is recovered using other NARA holdings. The BLM claims an indexing accuracy rate of 99.5%. Machine assisted indexing reduces keystrokes by automatically completing certain pre-defined fields (e.g., volume number, accession number, document number and state code). Key index fields include patentee name, warrantee name, and legal land descriptions.
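Machine-assisted indexing can be sketched as pre-filling the fields the system already knows from the volume in process and merging them with the operator-keyed fields. The field names, value formats, and sample values below are hypothetical illustrations, not the GLOARS record structure.

```python
from typing import Dict

# Hypothetical field names for the four auto-completed fields the report names.
# Value formats used in the example are invented for illustration.
AUTO_FIELDS = ("state_code", "volume_number", "accession_number", "document_number")


def prefill(state_code: str, volume_number: str, accession_number: str,
            document_number: int) -> Dict[str, str]:
    """Values the system can supply from the volume currently being processed."""
    values = (state_code, volume_number, accession_number, str(document_number))
    return dict(zip(AUTO_FIELDS, values))


def build_record(auto: Dict[str, str], keyed: Dict[str, str]) -> Dict[str, str]:
    """Merge operator-keyed fields with pre-filled ones, rejecting any overlap."""
    overlap = auto.keys() & keyed.keys()
    if overlap:
        raise ValueError(f"auto-filled fields were re-keyed: {sorted(overlap)}")
    return {**auto, **keyed}


def keystrokes_saved(auto: Dict[str, str]) -> int:
    """Characters the operator did not have to type for this record."""
    return sum(len(value) for value in auto.values())


auto = prefill("AR", "0123", "A-1001", 42)
record = build_record(auto, {"patentee_name": "Doe, John", "warrantee_name": ""})
print(keystrokes_saved(auto), sorted(record))
```

Rejecting overlaps keeps an operator from silently overriding a system-supplied value; a real system might instead allow corrections with an audit trail.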

OPTICAL DIGITAL DATA DISK STORAGE:

The scanned pages are committed to optical digital data disk only when accepted by the indexing specialists, under the control of proprietary system software.

Image File Headers: The LaserData proprietary structure is not compatible with the Tagged Image File Format (TIFF).

Error Detection/Correction: Operation is transparent to the user. No disk failures encountered.

Recording Process: 12-inch, WORM, double density (Sony WDD-600).

Optical Digital Data Disk Composition: Polycarbonate substrate disk material.

Capacity: Approximately 3.3 gigabytes per side (6.55 GB total storage per disk).

Number of Optical Digital Data Disks in Use: 40

Jukebox: Two Sony WDA-610 50-platter, double-sided double-density (DSDD), two-drive optical disk autochangers ("jukeboxes"), each with a 50-disk capacity.
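With the capacity and typical image size given in this report, a rough disk-count estimate is straightforward. This sketch assumes one image per page, 150 KB per image, 6.55 GB per disk, decimal units, and no file-system or directory overhead. It is a planning approximation and will not necessarily match the 40 disks the report counts in use, since actual image sizes vary.

```python
import math

DISK_BYTES = 6_550_000_000  # 6.55 GB per double-density disk (both sides), from the report
IMAGE_BYTES = 150_000       # typical compressed image, from the report (1 KB = 1,000 bytes)
JUKEBOX_SLOTS = 2 * 50      # two 50-platter autochangers


def images_per_disk(disk_bytes: int = DISK_BYTES, image_bytes: int = IMAGE_BYTES) -> int:
    """Whole images that fit on one disk, assuming no image spans two disks."""
    return disk_bytes // image_bytes


def disks_needed(images: int, copies: int = 1) -> int:
    """Disks required for `copies` complete copies of `images` images."""
    return math.ceil(images / images_per_disk()) * copies


for label, images in [("converted so far", 1_100_000), ("full collection", 9_000_000)]:
    print(f"{label}: {disks_needed(images)} disks per copy "
          f"({JUKEBOX_SLOTS} jukebox slots on line)")
```

The `copies` parameter allows for a mirror-image backup set, which doubles the disk count but does not occupy jukebox slots if the backups are stored off line.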

Storage Environment: Typical computer room environment, with a supplemental heating, ventilation, and air conditioning (HVAC) system maintaining a constant temperature (74 degrees Fahrenheit) and relative humidity (55-60%). No system operational problems due to the storage environment were noted.

RETRIEVAL AND OUTPUT:

The GLOARS images are searched and retrieved using the key-entered index data (e.g., land patent descriptive data and patentee names). System users with access to the Sony jukeboxes via the LAN view the images on high-resolution display monitors or print hard copies on laser printers.

Primary System Users: Clerks/Administrative Staff/Public

Display Output: The windows-like environment simultaneously displays index and image data. Images are displayed on nineteen-inch high resolution monitors on PC-based workstations.

Laser Printing: Hewlett-Packard LaserJet IIID for images and Hewlett-Packard LaserJet II for hard-copy text reports.

DATA MIGRATION POLICY ISSUES:

Providing access to data beyond the confines of the immediate system, and long-term data retention extending beyond the expected life of the existing system should be design goals when the value of the information warrants such concerns. Data migration strategies should be viewed as a continuum, beginning with the universal capability of systems to display images or print them on paper.

Linkages with Other BLM ADP Applications: The imaging system has no direct linkages with other BLM information systems. An overall BLM agency ADP modernization effort is underway, with a goal of achieving inter-system compatibility. (However, all efforts have been made to use pre-existing BLM information system casetype authority codes and other codes where possible.)

Network Transmission: Users can access the data from a remote PC with a 9600 baud modem, Kermit communications software, and BLM communications software; a BLM query session charge applies for searching the document attribute database (index data). This access system responds to remote requests for faxes of document images and query search results while supporting a BLM cost recovery accounting system. The costs of the initial records conversion process will not be recouped through user fees. The Records Administrator/System Administrator expects that cost recovery fees received from users in the future will fund system access and maintenance. CD-ROM distribution of the attribute database has been implemented.
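For scale, the raw time to move data over a 9600 baud link can be estimated directly. This sketch treats the report's 9600 baud as 9,600 bits per second (common usage at the time) and assumes asynchronous framing of 10 bits per byte with no modem compression or Kermit protocol overhead, so real transfers would be somewhat slower.

```python
def serial_transfer_seconds(payload_bytes: int, bits_per_second: int = 9600,
                            bits_per_byte: int = 10) -> float:
    """Seconds to send payload_bytes over an asynchronous serial link.

    bits_per_byte=10 assumes one start bit, eight data bits, and one stop bit,
    with no modem compression and no file-transfer protocol overhead.
    """
    return payload_bytes * bits_per_byte / bits_per_second


print(f"{serial_transfer_seconds(150_000):.0f} s for one ~150 KB image")
print(f"{serial_transfer_seconds(2_000):.1f} s for a hypothetical 2 KB block of index data")
```

About two and a half minutes for a single 150 KB image, compared with a couple of seconds for a small block of index data, is consistent with the system's approach of letting remote users query the index online while delivering images by fax.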

Backup of Image and Index Data: A mirror-image backup copy of the optical media data is created as image and index data is completed for each state. Backup optical digital data disks are now stored on-site in a climate-controlled vault, although BLM expects to send them to the National Archives in the future. Index data is backed up regularly onto magnetic tape and floppy disks.

Technical Support and Documentation: Technical support is currently provided by the integrator under the terms of the development contract. LaserData provided technical and administrative manuals as a contract deliverable. Detailed documentation describing the proprietary compression algorithms remains the exclusive property of LaserData.

Interoperability: LaserData has a proprietary approach to writing image data to the Sony optical digital data disks, meaning the disks can only be read by an identical Sony optical drive with compatible LaserData software.

Migration Plans: As part of its current mission statement, BLM has a long-term commitment to make image and index data available to the general public. Current planning for migration to future technologies largely consists of assuring that the system under development functions as specified and that technical and administrative documentation is adequate for ongoing maintenance and periodic equipment upgrades. Concerns over legal admissibility of optical digital data disk images have prompted the development of detailed management and operator procedural manuals.

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-engineering: Performed to obtain greater benefits from the new imaging system: Yes

The mission of the Eastern States is to provide prompt, professional, and courteous service to all customers. The Legal Clerks in the Branch of Records and Public Service research the GLO records in response to walk-in and written requests for information. A recent study determined that more than 10,000 requests are received each year. Using the automated records system for this research will allow requests to be processed faster and in greater numbers. Patent queries are no longer restricted to land description information: for the first time, searches can be conducted by patentee name, making the records accessible to a greater number of clients. A search that once could take hours is being reduced to minutes, a savings to both the Bureau and the customer.

Historical Value of Records: The BLM system contains digital images of important, permanently valuable historical records. These records support one of the BLM's central missions, and have value to a wide variety of outside users. Image enhancement of the sometimes badly deteriorated records increases their legibility, and laser hard-copy output is of sufficient quality for most users. The system's potential to become the foundation of a nationwide land patent records databank increases its value to both the BLM and to the National Archives. The BLM system is one possible model for a production system that could support the conversion, preservation, and use of archival materials currently held by NARA and other Federal agencies.

Access: The BLM system enhances access to land patent records by providing a more powerful retrieval system. The previous manual storage system permitted retrieval through a single access point, namely the legal land description. The searching capabilities of the computerized index database, combined with simultaneous display of index and image data, provide significant new opportunities for users, including rapid retrieval, easy comparison of adjacent land tracts, and statistical analysis of land transfer trends.

Information Technology Standards: BLM may face obstacles in future data migrations due to the absence of industry standards for the physical or logical formats of 12-inch optical media. These problems are likely to be exacerbated by the continued use of proprietary image compression algorithms and header file formats. The BLM plans to move to a non-proprietary imaging component as soon as standards are adopted.

Role of the Integrator: BLM administrators are pleased with the third-party integrator's performance due to: a proven track record in developing systems with similar functions; intensive interviews with BLM staff in the early design stages to ensure the system met BLM needs; and, a willingness to closely inspect physical records and processing procedures. The contractor's design team posed many questions that led to a re-thinking of the fundamental assumptions underlying traditional access procedures and customer services. BLM staff were especially pleased with the quality and readability of the system design documents.

Cost Recovery: BLM has implemented an automated, fee-based cost recovery system for public access. The basic concept includes: an on-line tutorial describing user search techniques; the ability to order FAX copies or print copies by mail; and, acceptance of credit cards as payment for services. The BLM Records Administrator/System Administrator expects that the fees will recover the costs of system access and maintenance.


SITE VISIT REPORT #4

AGENCY: Commodity Futures Trading Commission

SYSTEM: Document Management System

CONTACT: Hunton G. Oliver, Office of Information Resources Management, Commodity Futures Trading Commission, Washington, DC

SUMMARY DESCRIPTION:

The Commodity Futures Trading Commission (CFTC) staff need direct access to up-to-the-minute market information to fulfill their mission as commodity trade regulators. To meet this need, a Document Management System that integrates several agency applications was installed in 1992. The system provides on-line access to daily newspaper clippings and related financial wire service reports, replacing the labor-intensive photocopy distribution of pertinent commodities industry data. Documents from other CFTC applications, including legal docket files and correspondence, are also digitally scanned, indexed, stored, and retrieved in the Document Management System. Imaging technology may eventually assume an even greater role in processing the CFTC's document-based information.

The CFTC's imaging system is more than a pilot or test system with reference capabilities; it is a fully functional production operation. Special features include optical character recognition (OCR) technology to capture textual information automatically, and the system software provides users with full text retrieval capability. The imaging system was obtained through an 8(a) contract and integrated as a new application into the agency's existing local area network. The Document Management System uses conventional and multifunction optical digital data disk equipment to store digital image data on write once, read many (WORM) and rewritable optical media.

The CFTC's optical imaging system is important to this study because of: the successful implementation of imaging technology in an agency's daily operations; the integration of WORM and rewritable optical digital data disk technologies; and, the flexibility of the imaging system's user interface in supporting several unrelated applications.

BACKGROUND:

The Commodity Futures Trading Commission promotes healthy economic growth, protects the rights of customers, and ensures fairness and integrity in the marketplace through regulation of futures trading. To this end, it also engages in the analysis of economic issues affected by or affecting futures trading. The Commodity Futures Trading Commission, the Federal regulatory agency for futures trading, was established by the Commodity Futures Trading Commission Act of 1974. The Commission began operation in April 1975, and its authority to regulate futures trading was renewed by Congress in 1978, 1982, and 1986. The Commission consists of five Commissioners appointed by the President with the advice and consent of the Senate. The Commission has five major operating components: the divisions of enforcement, economic analysis, trading and markets, and the offices of the executive director and the general counsel.

The Commission regulates trading on the 13 US futures exchanges, which offer active futures and options contracts. It also regulates the activities of numerous commodity exchange members, public brokerage houses, Commission-registered futures industry salespeople and associated persons, commodity trading advisors, and commodity pool operators. Some off-exchange transactions involving instruments similar in nature to futures contracts fall under Commission jurisdiction. The Commission's regulatory and enforcement efforts are designed to ensure that the futures trading process is fair and that it protects both the rights of customers and the financial integrity of the marketplace. It approves the rules under which an exchange proposes to operate, and monitors exchange enforcement of those rules. It reviews the terms of proposed futures contracts, and registers companies and individuals who handle customer funds or give trading advice. The Commission also protects the public by enforcing rules that require that customer funds be kept in bank accounts separate from accounts maintained by firms for their own use, and that such customer accounts be marked to present market value at the close of trading each day.

Futures contracts for agricultural commodities were traded in the United States for more than 100 years before futures trading was diversified to include trading in contracts for precious metals, raw materials, foreign currencies, commercial interest rates, and US Government and mortgage securities. Exchange trading has since grown in both traditional and newer commodities. Large regional offices are maintained in Chicago, IL, and New York, NY, where many of the Nation's futures exchanges are located. Smaller regional offices are located in Kansas City, MO, and Los Angeles, CA. A suboffice of the Kansas City regional office is located in Minneapolis, MN.

Origins: The CFTC's management decision to obtain the Document Management System was finalized in June 1991. The imaging system was installed in February 1992 through an 8(a) procurement with Westco Automated Systems and Sales, Inc. The system was originally designed to support several agency imaging applications, including distribution of the agency's daily newsclips files and management of the legal docket case records. Its primary application is the digital scanning and dissemination of daily newsclips, previously distributed to staff in a hard copy "read file" format. These newsclips contain pertinent commodities information published in daily newspapers and on-line financial wire services. The Document Management System can also capture, index, store, and retrieve the agency's legal dockets and published Commodity Exchange rules. Researcher access to the imaging system's database is currently limited to CFTC staff (approximately 500 employees). Although public access is not possible at this time due to concerns over data security, a CFTC public access bulletin board is under consideration.

SYSTEM CONFIGURATION:

Date system installed: 1992

The CFTC system operates as a series of interconnected servers on an existing Banyan VINES network. The Scan Station captures raster images of the original documents and verifies image quality. An OCR Server converts the bit-mapped images to ASCII text files. The Network Server controls user network access to the document management system. The Image Database Server maintains the physical addresses of the scanned image records. The Optical Server is the permanent storage facility for the scanned digital images. The Retrieval Stations support user access to the full text and image data. The Print Server provides hard copies of requested images received via the network.
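The server roles above form a capture-to-retrieval pipeline. The following is a hypothetical, in-memory Python model of that data flow, written only to show the order of operations; the class and method names and the `disk1/...` address format are assumptions and do not describe the actual CFTC software.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DocumentSystem:
    """Hypothetical in-memory model of the CFTC server roles (illustrative only)."""
    spool: list = field(default_factory=list)        # spool disk: (doc_id, image) awaiting OCR
    text_index: dict = field(default_factory=dict)   # network server: doc_id -> ASCII text
    addresses: dict = field(default_factory=dict)    # image database server: doc_id -> address
    optical: dict = field(default_factory=dict)      # optical server: address -> image bytes

    def scan(self, doc_id: str, image: bytes) -> None:
        """Scan Station: place a captured image on the spool."""
        self.spool.append((doc_id, image))

    def process_spool(self, ocr: Callable[[bytes], str]) -> None:
        """OCR Server and optical write: drain the spool in batch mode."""
        while self.spool:
            doc_id, image = self.spool.pop(0)
            self.text_index[doc_id] = ocr(image)
            address = f"disk1/{len(self.optical):06d}"   # hypothetical address format
            self.optical[address] = image
            self.addresses[doc_id] = address

    def retrieve(self, doc_id: str) -> tuple:
        """Retrieval Station: fetch OCR text, then the image via its recorded address."""
        return self.text_index[doc_id], self.optical[self.addresses[doc_id]]


system = DocumentSystem()
system.scan("clip-001", b"<tiff bytes>")
system.process_spool(lambda image: "wheat futures rise")  # stand-in for the OCR engine
text, image = system.retrieve("clip-001")
```

Keeping the address lookup separate from the text index mirrors the report's split between the Image Database Server and the Network Server.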

Scan Station: Fujitsu 3096 11 x 17-inch Document Scanner with Auto Feeder; Everex 386/25 MHz Computer; 4MB RAM; 300MB Disk; 19-inch Cornerstone High Resolution Monitor; Xionics Compression/Decompression Board.

OCR Server: Calera MM600 Optical Character Recognition System; Everex 386/25 MHz Computer; 4MB RAM; 100MB Disk.

Network Server: 386/20 MHz Banyan File Server; 80MB & 300MB Disks.

Image Database Server: Everex 386/33 MHz; 4MB RAM; 600MB Disk.

Object Server (Optical Server): LMSI LF4500 Auto Changer; 28GB WORM Capacity; Everex 486/33 MHz Computer; 8MB RAM; Two 600MB Magnetic Hard Disks.

Retrieval Station: Everex 386/25 MHz Computer; 4MB RAM; 100MB Disk; 19-inch Cornerstone High Resolution Monitor.

Print Server: HP LaserJet III Laser Printer with Video Control; Everex 386/25 MHz Computer; 4MB RAM; 100MB Disk; Xionics Compression/Decompression Board.

The CFTC's Document Management System uses Advanced Information Management Systems Plus software for PC-based applications. CFTC expects to implement a system-wide Windows environment, and several existing workstations are already Windows-equipped locally. This conversion will best be accomplished with an agency-wide upgrade to 486/50 DX-2 workstations. The existing imaging system uses 10BASE2 and 10BASE5 Ethernet communications.

System Installed by: Intrafed Corporation

System Configuration Changed Since Installation? No

DIGITAL IMAGE CAPTURE:

Scan Station: This station is the system's input device for scanning documents and converting the images into TIFF format bitmap files. The digital files may be directed to a spool disk on the object server, or stored locally and processed in batch mode.

Document Preparation: Newsclippings, wire service reports, and agency legal documents require manual preparation prior to scanning. CFTC staff peruse national daily newspapers each day and create a clippings file of financial articles of interest. The newsclips are photocopied prior to scanning to improve automated feeding. Wire service financial reports and CFTC legal documents may also require preparation for scanning.

Image Capture: The desktop Fujitsu 3096 document scanner offers an auto feeder and two-sided scanning. The scanner is controlled by a 386/25 PC with high resolution display and Xionics Corporation image compression hardware.

Disposition of Original Records: The newsclippings are not considered permanent records for transfer to the National Archives. The CFTC legal staff use the imaging system for documents to be retained long term.

Scanning Resolution: 300 dpi.

Quality Control: The scanned images are inspected by the conversion operator to verify legibility.

Color and Gray Scale: No color or gray scale images are captured.

Image Enhancement: An image enhancement board was installed in the Fujitsu scanner to improve image quality.

Compression/Decompression: Group 4 compression. Software compression provides digital image files of 50KB to 100KB per newsclip image. Image decompression at the user workstations is under software control.

DOCUMENT INDEXING:

OCR Server: The OCR server converts TIFF bitmap images to ASCII text. The TIFF image files are retrieved from the spool disk on the object server, and the resulting ASCII text files are directed to the network server for indexing by Personal Library Software. A 386/25 PC serves as the workstation controller for the Calera OCR system. Documents are scanned, OCR processed, and indexed during the data capture operation. Tagged index items, including the newspaper headline, originating news source, and important page topics (approximately 50 categories), are entered as database keywords and later serve as user-searchable fields. The Calera OCR processor accurately deciphers a significant portion of each newspaper clipping's text, despite difficulties with some fonts and with photocopy quality. The CFTC input staff performs no manual cleanup or correction of the OCR files. The converted ASCII text permanently resides on magnetic hard disks. Documents from other CFTC applications (e.g., legal dockets) are indexed by other criteria and retained in WordPerfect format.

Index Database Server: A 386/33 MHz PC with a 600MB hard disk maintains the physical addresses of the image records located on the object server. The database maintains image folder and page information. The GUPTA Corporation structured query language (SQL) database permits manual indexing of scanned documents. The fully scalable database system also provides security access control over the indexed information.
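As a rough illustration of how a database of physical addresses supports retrieval by folder and page, the sketch below uses Python's built-in SQLite module as a stand-in for the GUPTA SQL database. The table and column names, folder names, volume labels, and byte offsets are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per scanned page, giving the optical volume
# and byte offset where its image is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE image_location (
           folder      TEXT    NOT NULL,
           page        INTEGER NOT NULL,
           disk_label  TEXT    NOT NULL,
           byte_offset INTEGER NOT NULL,
           PRIMARY KEY (folder, page))"""
)
conn.executemany(
    "INSERT INTO image_location VALUES (?, ?, ?, ?)",
    [
        ("NEWSCLIPS-A", 1, "LMSI-01", 0),
        ("NEWSCLIPS-A", 2, "LMSI-01", 81_920),
        ("DOCKET-B", 1, "LMSI-02", 0),
    ],
)


def locate(folder: str, page: int):
    """Return (disk_label, byte_offset) for one page, or None if it is not indexed."""
    return conn.execute(
        "SELECT disk_label, byte_offset FROM image_location WHERE folder = ? AND page = ?",
        (folder, page),
    ).fetchone()


def pages_in_folder(folder: str) -> list:
    """List the page numbers stored for a folder, in page order."""
    rows = conn.execute(
        "SELECT page FROM image_location WHERE folder = ? ORDER BY page", (folder,)
    )
    return [page for (page,) in rows]
```

Keeping only addresses on magnetic disk means a folder or page lookup never touches the optical media until the image itself is needed.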

Creation of Index Database: Index data is captured by OCR technology from the scanned newsclip images. Headlines, sources, and topics of interest are also tagged manually. The separate legal document application, the "Proceedings Court System," is indexed by document number, complainant, respondent, and document type.

Location of Index Database: Index data resides on magnetic storage disks on a dedicated database server.

Index Structures: Full text searching is available through Personal Librarian software. OCR errors in the captured index data are not "cleaned up."
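Personal Librarian's internal design is not described in this report. The sketch below is a generic inverted index, not that product's algorithm; it assumes simple lower-case word matching with Boolean AND queries. It shows why uncorrected OCR errors matter: a misread word is indexed under its misspelling, so the document drops out of searches for the correct term.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case alphanumeric words; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    """Map each word to the set of document ids whose OCR text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every query word (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# "Whcat" stands for an uncorrected OCR misreading of "Wheat".
docs = {"clip-1": "Wheat futures rise", "clip-2": "Whcat futures fall"}
index = build_index(docs)
print(search(index, "wheat futures"))  # {'clip-1'}
```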

DATA COMMUNICATIONS:

Network Server: This 386/20 MHz Banyan file server controls network access for the Document Management System. It maintains the Personal Library Software, the text representations of images, and the indices, and also stores images for DOS retrieval.

Magnetic Image Cache: Sufficient magnetic cache storage for images is not available at the local workstations; rather, images are retrieved and displayed directly from the optical digital data disks. Although this limitation is not a major problem for users of the newsclip files, access to sizable CFTC legal case file dockets could be enhanced with sufficient local cache memory.
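A local image cache of the kind described is commonly organized as a least-recently-used (LRU) store: recently viewed images stay on fast magnetic disk, and the least recently viewed are evicted when space runs out. This is a minimal Python sketch under that assumption; the class name, the fetch callback standing in for an optical disk read, and the cache size are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ImageCache:
    """Least-recently-used cache of image bytes in front of a slow optical read."""

    def __init__(self, fetch_from_optical: Callable[[str], bytes], capacity_bytes: int):
        self._fetch = fetch_from_optical
        self._capacity = capacity_bytes
        self._used = 0
        self._items = OrderedDict()   # image_id -> bytes, least recently used first
        self.optical_reads = 0

    def get(self, image_id: str) -> bytes:
        if image_id in self._items:
            self._items.move_to_end(image_id)   # mark as most recently used
            return self._items[image_id]
        data = self._fetch(image_id)
        self.optical_reads += 1
        if len(data) <= self._capacity:         # images larger than the cache are not kept
            self._items[image_id] = data
            self._used += len(data)
            while self._used > self._capacity:  # evict least recently used images
                _, evicted = self._items.popitem(last=False)
                self._used -= len(evicted)
        return data


# Room for two 100KB images (1 KB = 1,000 bytes assumed).
cache = ImageCache(lambda image_id: bytes(100_000), capacity_bytes=250_000)
for image_id in ["a", "b", "a", "c", "a", "b"]:
    cache.get(image_id)
print(cache.optical_reads)  # 4
```

In this sequence the repeated views of image "a" are served from the cache, so only four of the six requests reach the optical disk.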

OPTICAL DIGITAL DATA DISK STORAGE:

Optical Server: This optical digital data disk subsystem is the permanent storage facility for the scanned digital images. The two 600 MB magnetic hard disks on the PC workstation store in-process digital images awaiting recording onto the LMSI optical digital data disks.

Image File Headers: Xionics Corporation version of tagged image file format (TIFF) header.

Error Detection/Correction: Supplied by manufacturer (operational specifics unknown).

Recording Process: Two different optical storage systems are used: 1) LMSI 12-inch WORM drive and media; 2) Micro Design International (MDI) 5.25-inch multifunction optical drive with WORM and rewritable capability.

Optical Digital Data Disk Composition: LMSI 12-inch disks are two-sided, with a glass substrate. The LMSI drives also accept Maxell optical media as a second source.

Capacity: LMSI 12-inch optical media = 5.6 GB each; Panasonic 5.25-inch rewritable optical media = 1 GB each; Panasonic 5.25-inch WORM optical media = 940 MB each.

Number of Optical Digital Data Disks in Use: Five LMSI optical digital data disks are currently used to store CFTC data.

Jukebox: LMSI LF4500 five-cartridge Auto Changer offers a total of 28 GB WORM capacity.
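The capacity figures can be checked against the compressed image sizes reported earlier: five 5.6 GB disks give the autochanger's 28 GB. The arithmetic below estimates how many newsclip images fit per disk and per autochanger, assuming decimal units and ignoring file-system, header, and error-correction overhead, so the results are upper bounds.

```python
KB = 1_000              # decimal units assumed throughout
GB = 1_000_000_000


def images_per_volume(capacity_bytes: int, image_bytes: int) -> int:
    """Whole images that fit, ignoring file-system, header, and ECC overhead."""
    return capacity_bytes // image_bytes


LMSI_DISK = 5_600_000_000   # 5.6 GB per 12-inch LMSI disk, from the report
AUTOCHANGER = 28 * GB       # LF4500 total WORM capacity, from the report

for size_kb in (50, 100):   # the report's range of compressed newsclip sizes
    image = size_kb * KB
    print(size_kb, images_per_volume(LMSI_DISK, image), images_per_volume(AUTOCHANGER, image))
# 50 112000 560000
# 100 56000 280000
```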

Storage Environment: Computer room with security controls.

RETRIEVAL AND OUTPUT:

Primary System Users: Agency Staff.

Retrieval Workstations: The retrieval subsystem responds to keyword searching, and displays the OCR text of the scanned news article. The 386/25 workstation provides full text index searching and image retrievals. Image retrievals are conducted by folder, by page, or by SQL field values.

Image Display: 19-Inch Cornerstone High Resolution Monitor.

Screen Formats: Newsclip images are displayed on the left side of the user screen; OCR text is displayed on the right side, with keywords highlighted.
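Keyword highlighting in the OCR text pane can be sketched as a whole-word, case-insensitive regular-expression substitution. The bracket markers stand in for on-screen highlighting; this is an illustrative assumption, not the CFTC display code.

```python
import re


def highlight(text: str, keywords: list, start: str = "[", end: str = "]") -> str:
    """Wrap whole-word, case-insensitive keyword matches in start/end markers."""
    if not keywords:
        return text
    # Try longer keywords first, so a phrase wins over a word it contains.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in alternatives) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: f"{start}{m.group(0)}{end}", text)


print(highlight("Wheat futures rose on heavy volume", ["wheat", "futures"]))
# [Wheat] [futures] rose on heavy volume
```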

Print Server: A 386/25 PC with a Xionics compression/decompression processing board functions as the print server. It drives an HP LaserJet III laser printer, printing from compressed image files received via the Document Management System network.

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: The imaging system was integrated into an existing Banyan network with DOS-based workstations.

Network Transmission: The Document Management System is sized to support 400 workstations, with up to 16 concurrent on-line users. CFTC expects to double this to 32 concurrent users.

Backup of Image and Index Data: Backups of the newsclip image files are performed on a routine schedule using 5.25-inch Panasonic WORM and rewritable optical media (multifunction optical drive).

Technical Support and Documentation: Ongoing system maintenance is performed by a contractor, Westco Automated Systems and Sales. Intrafed Corporation supplied the CFTC with complete manufacturer's documentation for each system component to the circuit board level. A system description document was also provided that details the overall integrated configuration.

Interoperability: Intrafed Corporation specified the systems architecture. The CFTC's imaging system uses standard DOS and network configurations; each workstation can operate independently.

Migration Plans: Although no specific upgrade plans exist now, the CFTC agency users (legal staff and others) are pleased with the imaging system and expect to continue using imaging/optical digital data disk technology.

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: Were any existing agency procedures changed following the installation of the imaging system? No

User Access: CFTC management plans to double the number of retrieval workstations, eventually connecting Regional Office users. Telecommunications may become a bottleneck given the large image file sizes.
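To put the bottleneck concern in numbers, the sketch below computes idealized transfer times for one 100KB newsclip image, the top of the reported size range. The 9600 bps figure matches the modem speed cited for BLM remote access; the 56 kbps leased line is an assumed example of a regional link; protocol overhead is ignored, so real times would be longer. Roughly 83 seconds per image over the modem compares with under a tenth of a second on the 10 Mbps Ethernet LAN.

```python
def transfer_seconds(image_bytes: int, line_bps: float) -> float:
    """Idealized transfer time: 8 bits per byte, no protocol or modem overhead."""
    return image_bytes * 8 / line_bps


IMAGE_BYTES = 100_000  # upper end of the 50KB-100KB range (1 KB = 1,000 bytes assumed)
links = {
    "9600 bps modem": 9_600,
    "56 kbps leased line (assumed)": 56_000,
    "10 Mbps Ethernet": 10_000_000,
}
for name, bps in links.items():
    print(f"{name}: {transfer_seconds(IMAGE_BYTES, bps):.2f} s")
```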

Records Management: The LMSI optical digital data disk autochanger's five-disk capacity is adequate for storing several years of CFTC data (2 disks filled per year), and additional devices can be added as needed.
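At the reported fill rate of 2 disks per year, the life of the autochanger's five slots is simple division. The sketch below assumes a steady fill rate and counts only full disks; once all slots are full, disks would have to be moved to shelf storage or another device added, as the report notes. The function name is illustrative.

```python
def years_until_full(slots: int, disks_per_year: float, disks_already_full: int = 0) -> float:
    """Years until every autochanger slot holds a full disk, at a steady fill rate."""
    if disks_per_year <= 0:
        raise ValueError("fill rate must be positive")
    return (slots - disks_already_full) / disks_per_year


print(years_until_full(5, 2))     # 2.5 (empty five-slot changer)
print(years_until_full(5, 2, 1))  # 2.0 (one disk already full)
```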


SITE VISIT REPORT #5

AGENCY: Department of the Army (Office of Chief of Staff)

SYSTEM: Captured Gulf War Document Exploitation (DOCEX) System

CONTACT: Major Perkins, USA-DSMA, The Pentagon, Washington, DC

SUMMARY DESCRIPTION:

The Department of the Army developed a special purpose, stand-alone digital imaging system to scan, index, store on optical digital data disks, and retrieve images of captured Persian Gulf War documents. The Army's Decision Systems Management Agency (DSMA) developed the Document Exploitation (DOCEX) system specifically to record the contents of Iraqi filing cabinets and other documentation captured by Allied Forces. Conversion scanning of more than twelve million pages was performed at several locations, including Kuwait City and Dhahran, Saudi Arabia. The operations staff encountered difficult field conditions, including excessive heat and abrasive airborne dust. The scanned images and ASCII index data were first written to magnetic hard disk and then downloaded to digital audio tape (DAT). The DAT media were subsequently returned to the United States, where the data was transferred onto 5.25-inch rewritable optical digital data disks at a Defense Intelligence Agency (DIA) computer facility. Operating under an electronic filing cabinet concept, the image retrieval system uses the optically stored document images to support continued document searching, retrieval, and translation.

The DOCEX system is considered important for this study due to: the imaging system's conformance to existing information technology standards, where possible; the short time frames available to DSMA for developing and implementing complex systems; the use of DAT technology for interim data storage; and, the selection of rewritable optical media as the primary data storage technology.

BACKGROUND:

The DOCEX system was developed under the auspices of the Decision Systems Management Agency (DSMA). The DSMA reports to the Army's Office of Chief of Staff, and provides a full range of automation services upon request to the Army Staff, Secretariats, and the Army's Joint Staff. The DSMA is primarily a task oriented organization, geared for solving specific Army information system needs. The DSMA is also responsible for evaluating emerging information technologies and developing specialized prototype applications for possible use throughout the Department of the Army.

The DOCEX system contains digital images of over 12 million documents. The records were removed from offices, bunkers, storage depots, and other locations seized in January 1991 by the combined Allied troops as they advanced through Iraq and Kuwait. The conversion project was made more complex because most of the documents were in languages other than English (Arabic and Farsi). The DOCEX system was initially designed with a multimedia capability that included digitized audio and video recordings of captured objects and of the physical spaces from which they were removed. Time constraints resulted in implementing a document-only scanning capability.

DOCEX supports the document translation efforts of the Defense Intelligence Agency's (DIA) analytical staff. A major DIA goal was to identify important captured intelligence documents and related materials for possible translation from the original Farsi to English. DSMA staff envisioned the DOCEX system to serve as a possible prototype or model for future systems, although no immediate secondary uses of the scanned documents or the imaging system were anticipated when the original system design criteria were developed. The Army's Litigation Center is developing an automated case file tracking and access system that will utilize similar optical digital data disk storage technology.

Origins: DOCEX is an excellent example of an imaging system successfully implemented despite extreme development time constraints. A Military Order to develop an imaging system for captured Gulf War documents was issued on the morning of January 26, 1991. Within 24 hours, the DSMA staff developed system specifications and briefed senior officers, who authorized procurement. The decision to use digital imaging technology was driven in part by the Army's unsatisfactory experience in 1989 with a microfiche-based system for documents captured in Panama during Operation Just Cause. That project required the imaging of over six million Spanish- and English-language documents; due to several factors, however, only 10 percent of these documents had been converted after six months. The intelligence value of captured documents frequently depends on the ability to retrieve and analyze key information immediately. The useful life of similar captured documents is often only two to three years; once the initial analysis is completed, interest in the records themselves often declines dramatically. Records determined to have long-term retention value may be digitally scanned and also microfilmed to obtain an archival backup copy.

DSMA staff received authorization up front to move quickly with full procurement of the necessary DOCEX system components. This strategy reflected one of the primary lessons the Army learned from the Panama experience: "never go into a project with partial funding." The DOCEX system design criteria ensured image and index data portability through successive storage formats: data was downloaded from magnetic hard disks to DAT, then transferred to rewritable optical digital data disks. The DSMA system designers sought commercial off-the-shelf (COTS) hardware and software to minimize specialized, time-consuming system modifications and customized features. Under the high-pressure, rapid development conditions, equipment was selected to meet two requirements: conformance to existing information technology standards; and, access to comprehensive vendor support. Total DOCEX system costs were approximately $500,000 under a systems integration contract with Intrafed Inc. of Washington, DC.

SYSTEM CONFIGURATION:

Date system installed: 1991

  • Document Scanners--Two (2) Kodak Imagelink 900D scanners equipped with: 8MB scanner memory, automatic document numbering system, OCR capability, and bar code recognition.

  • Imaging Platform--IBM Corp. PS/2 486 class, Model 95, IBM database management software, OS/2 driver with 16MB memory, and an 8514 XGA monitor.

  • Storage Media--Magnetic hard disks for in-process data; 1.3 gigabyte capacity digital audio tapes (DAT) (total of 18 tapes); 5.25-inch rewritable optical digital data disks for image storage.

  • Document Retrieval--Index: Knowledge-Based Management System software from AI Corp. Images: Optical digital data disks stored in an imaging system running OS/2 from Imara Research Corp.

DIGITAL IMAGE CAPTURE:

The initial digital image conversion of approximately 12 million pages was performed in Kuwait City and Dhahran, Saudi Arabia. Conversion equipment included two Kodak document scanners linked to IBM PS/2 microcomputers and database software. DOCEX system administrators were particularly pleased with the consistently high-quality performance of the scanning equipment and the large memory capacity (16MB) of the IBM OS/2 drivers, despite persistent desert dust and other unavoidable harsh environmental conditions.

Document Scanning: Document conversion and initial manual keyword indexing took one month to complete. During that time, two daily work shifts of nine production staff each prepared the documents for scanning, and four two-person teams operated the scanners. Ten translators completed the indexing, using printed keyword lists in various languages to assign documents to specific categories.

Quality Control: Conversion staff used display screens to evaluate image quality, adjusting equipment as needed to meet image quality guidelines. DOCEX system administrators highly recommend test targets for calibrating equipment and maintaining consistent image legibility. This was qualified with the following statement reflecting the reality of high-pressure production: "A test target image is almost always better than a piece of paper with a footprint across it."

Scanning Resolution: 200 dots per inch, selected as a compromise between achieving maximum scanning throughput rates and acceptable image legibility.
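The throughput trade-off behind the 200 dpi choice follows from pixel counts, which grow with the square of resolution. Assuming US letter pages (8.5 x 11 inches; the report does not give the captured documents' sizes) and 1-bit binary scanning, the uncompressed bitmap at 300 dpi is 2.25 times the size of one at 200 dpi, before Group 4 compression. The function name is illustrative.

```python
def raw_bitmap_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of a 1-bit (black-and-white) scan, ignoring row padding and headers."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels + 7) // 8  # 8 pixels per byte, rounded up


for dpi in (200, 300):
    print(dpi, raw_bitmap_bytes(8.5, 11, dpi))
# 200 467500
# 300 1051875
```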

Color and Gray Scale: Although the document scanners offered color and gray scale capabilities, maximum throughput speeds were achieved in binary (black-and-white) mode. Because of the scanner CCD's color sensitivity and color dropout, red inks on the original documents were difficult to capture.

Image Enhancement: The Kodak scanner's standard contrast controls provided acceptable image quality, and no special add-on image enhancement technology was needed. Other DSMA imaging applications have used image enhancement technology successfully.

Compression/Decompression: The DOCEX imaging system's compression algorithms conform to CCITT Group 4 standards. DSMA prefers hardware assisted compression and RAM cache memory to achieve high speed image display.

DOCUMENT INDEXING:

Conversion staff translators perused each document and annotated worksheets with keyword information, such as the date of capture, the location where the document was found, and the type of document. The completed worksheets were scanned along with each document folder and were subsequently used to create the index database. The indexing operators consulted keyword listings pre-printed in various languages. DOCEX system administrators note that because of its labor-intensive, time-consuming nature, "indexing is the dark side of imaging." They recommend creating index batches and using automated indexing capabilities.

Creation of Index Database: Basic index information on document content and structure was compiled in Kuwait and Saudi Arabia using the scanned worksheets. Additional indexing of selected images was performed to enhance user access upon return of the system to the United States.

Location of Index Database: The DOCEX images and index data are not stored together on the optical media. Rather, the index database is stored on magnetic hard disks for improved retrieval speeds and ease of update.

OPTICAL DIGITAL DATA DISK STORAGE:

During conversion, magnetic hard disks stored all scanned image and key entered index data. The data was then copied to digital audio tape (DAT) prior to shipment to Washington, DC. The DIA computer center transferred the data from DAT to rewritable optical digital data disks. The rewritable optical media serviced the reference needs of Defense Department intelligence staff.

Image File Headers: The DOCEX system image file headers adhere to the tagged image file format (TIFF). A system administrator noted that the TIFF convention is "thin but effective." Although the MO:DCA/IOCA (Mixed Object Document Content Architecture/Image Object Content Architecture) format used by the IBM-supplied processing software is proprietary, detailed documentation is available.
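How "thin" the TIFF convention is shows in its fixed header: eight bytes giving the byte order, the magic number 42, and the offset of the first image file directory (IFD). The sketch below follows the TIFF 6.0 layout to read the Compression field, where code 4 denotes CCITT Group 4. It is a hypothetical helper, one possible way to spot-check files during a migration; it handles only the first IFD of a baseline file and does not parse vendor variants or MO:DCA/IOCA objects.

```python
import struct

COMPRESSION_TAG = 259  # TIFF 6.0 "Compression" field
COMPRESSION_NAMES = {1: "none", 2: "CCITT modified Huffman RLE", 3: "CCITT Group 3",
                     4: "CCITT Group 4", 5: "LZW", 32773: "PackBits"}


def tiff_compression(data: bytes) -> int:
    """Return the Compression code of the first image in a baseline TIFF file."""
    if data[:2] == b"II":
        order = "<"   # little-endian ("Intel")
    elif data[:2] == b"MM":
        order = ">"   # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i   # each directory entry is 12 bytes
        tag, field_type, count, value = struct.unpack(order + "HHI4s", data[start:start + 12])
        if tag == COMPRESSION_TAG and field_type == 3 and count == 1:
            # A single SHORT value is stored left-justified in the 4-byte value field.
            (code,) = struct.unpack(order + "H", value[:2])
            return code
    return 1  # field absent: the TIFF default is no compression


# Minimal little-endian file: 8-byte header, then one IFD holding a single entry.
sample = (b"II" + struct.pack("<HI", 42, 8) + struct.pack("<H", 1)
          + struct.pack("<HHIHH", COMPRESSION_TAG, 3, 1, 4, 0) + struct.pack("<I", 0))
print(COMPRESSION_NAMES[tiff_compression(sample)])  # CCITT Group 4
```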

Error Detection/Correction: DOCEX system administrators experimented with the small computer system interface (SCSI) firmware to evaluate the system's error reporting capability. No optical digital data disk failures or non-retrievable images were experienced. DOCEX managers encountered fewer problems with the optical digital data disk subsystem than with other system components, notably basic computer hardware settings (e.g., DIP switches) and incompatible computer software.

Recording Process: 5.25-inch rewritable magneto-optical media.

Optical Digital Data Disk Composition: Polycarbonate substrate, dual-sided media manufactured by Philips and Du Pont Optical (PDO).

Capacity: 940 megabytes of data storage per optical digital data disk.

Jukebox: Procurement difficulties and contracting delays ruled out acquisition of an optical disk jukebox. As an alternative, a multi-drive, direct access optical storage device (DASD) in a tower configuration was acquired.

Storage Environment: Document scanning was performed under field conditions, while the retrieval system operates in a normal office environment. A DSMA goal is to eventually install an enterprise-wide system located in a raised floor, computer room environment.

RETRIEVAL AND OUTPUT:

DOCEX users search index data and perform image retrievals assisted by AI Corporation's Knowledge-Based Management System.

Search Techniques: The DOCEX system uses a hierarchical searching scheme supported by folder application software. Users adopt a card catalog or broad subject approach for initial search entry, later refined to a narrower scope as reference needs dictate.

Index Structures: The indexing database software included special features: automatic document numbering; acceptance of bar-coded data; and a "patch coding" technique that preserved the structure of complex, multi-page documents. DOCEX system end users requested a broad subject thesaurus that could be refined at a later date. Information concerning this thesaurus, especially the document form and genre terms, is unavailable. Because inconsistencies between indexers were problematic, system designers added an intelligent interface (Knowledge-Based Management System) to increase retrieval effectiveness.
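Patch coding typically means inserting separator sheets printed with a machine-readable patch code between documents in a scan batch, so that capture software can tell where one multi-page document ends and the next begins. The DOCEX implementation is not documented here; the sketch below assumes, hypothetically, that the scanner reports each detected separator as a `PATCH_SHEET` marker in the page stream.

```python
from typing import Iterable, List

# Hypothetical marker standing in for a detected patch code separator sheet.
PATCH_SHEET = "<PATCH>"


def split_on_patch_sheets(pages: Iterable[str]) -> List[List[str]]:
    """Group a scanned page stream into documents.

    A patch sheet closes the current document and is not itself kept.
    Leading or consecutive patch sheets do not create empty documents.
    """
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page == PATCH_SHEET:
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:  # last document in the batch has no trailing separator
        documents.append(current)
    return documents


batch = ["p1", "p2", PATCH_SHEET, "p3", PATCH_SHEET, "p4", "p5", "p6"]
print(split_on_patch_sheets(batch))  # [['p1', 'p2'], ['p3'], ['p4', 'p5', 'p6']]
```

Because page boundaries come from the separators rather than from operator keystrokes, a multi-page document stays intact even when the batch is re-scanned or re-indexed.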

Image Display: Nineteen-inch high resolution monitors provide dual display capability, and laser printers provide hard copy prints on demand.

Primary System Users: US Army Intelligence Staff.

DATA MIGRATION POLICY ISSUES:

Open Systems: Open systems architecture is an on-going DSMA strategic design goal. For the present time, however, efforts to achieve an Army enterprise-wide optical systems development approach are on hold. This is due in part to the often unavoidable bureaucratic complexities involved in implementing inter-organizational information systems. Another contributing factor is that the DSMA typically does not "own" the installed computer system equipment or software, and is only responsible for providing design assistance.

Linkages with Other Agency ADP Applications: DIA analysts use Knowledge-Based Management System software to query the existing index database and retrieve images. The DOCEX system was designed for stand-alone operation, and there are no plans for linking it with any other Defense Department imaging or database applications.

Standards: The DOCEX system was based (to the extent feasible) on existing information technology standards. This reliance was not due to the specific functional requirements of the system per se, but because the DSMA views each system as one step closer to reaching a goal of a generic application model. DSMA expects that this model will eventually meet a variety of mission needs, while easily adopting emerging technology standards.

System Output: The DOCEX system used 19-inch diagonal image display screens. Priority was placed on full-page display capability rather than on display resolution. DOCEX system administrators consider the term "high resolution" display something of a misnomer, given the current state of screen display technology. Hard-copy laser printing remains a significant factor in system development because Defense Department staff prefer paper copies to electronic display screens. Persistent attempts by DSMA staff to reorient users away from paper records have had mixed success.

Network Transmission: Off-site information requests are received by FAX.

Backup of Image and Index Data: No ongoing backup procedures are in place. Currently, the optical digital data disks, and a backup copy on DAT, are the only existing copies. The status of the original paper records is unknown.

Technical Support and Documentation: Each system component has manufacturer supplied technical documentation, supplemented with quick reference sheets to aid in routine maintenance and troubleshooting. Documentation describing the DOCEX system and its capabilities was not complete due to the rapid system development response to the Gulf War situation. The DOCEX system administrator relied to a significant degree on the original equipment manufacturers to maintain system operations.

Interoperability: The system uses Small Computer System Interface (SCSI-1) boards, with synchronous transfer turned off.

Migration Plans: Due to the short term intelligence value of the DOCEX records, no need exists to migrate the image and/or index data.

OVERVIEW OF SIGNIFICANT ISSUES:

Rapid System Development: Sufficient time is needed to complete critical project milestones such as equipment design, development, installation, calibration, staff training, and technical documentation. DSMA staff consider development time a rare luxury, given the short time frames under which they must operate. A favorite staff expression is "given enough time, you can always succeed." Imaging systems built on off-the-shelf, generic database software often have inherent limitations and little flexibility, so system developers may need to write custom software, which requires additional time for procurement and integration. DSMA often uses "beta code" in its computer systems, believing that the payoff in system performance justifies the risk of software "bugs." Staff emphasize that strong vendor support is essential for this approach to succeed. The Army's rapid development model relies on constant input from end users, and DSMA strongly believes that lessons learned from developing one system can and should be applied in developing the next.

Information Technology Standards: DSMA staff are aware of the vicious circle of technology developments and the inherent limitations of off-the-shelf approaches, often forcing system developers to seek out custom integrated solutions. Under these conditions, vendors often develop obscure, proprietary approaches to meet the customer's unique performance requirements. In the defense agency arena in particular, a history of large system contracts under low-bid procurement rules has led to a de facto collection of incompatible systems. If the imaging industry is to reach its full potential in the Federal government domain, the rapid adoption of industry standards supporting data portability and inter-system compatibility is essential. The increased use of a distributed workstation architecture for enterprise-wide imaging, as opposed to reliance on mainframe-based central indexing, is another factor forcing the standards issue. DSMA is seeing increasing acceptance of imaging technology within the Department of the Army despite the lack of adequate industry standards.

New Technology: DSMA systems use "cutting edge" hardware and software as much as possible. DSMA administrators emphasize that this approach requires ongoing monitoring of industry and vendor trends, and that there are advantages to using new technology rather than proven imaging "solutions." Adopting the latest developments increases the likelihood that the technology will conform to relevant existing standards and that vendor support will be stronger.

Technology Trends: DSMA staff continually monitor the imaging industry for technology trends that may have an impact on future systems. They note a resurgence of demand for WORM optical digital data disks even as rewritable technology is increasing its market share. The large storage capacity of WORM media makes them ideal for smaller imaging systems in which all image data may fit on a single platter. DSMA analysts note that digital imaging technology appears to be following a development cycle similar to that of the database industry, gaining compatibility and sophistication as it matures.


SITE VISIT REPORT #6

AGENCY: Department of the Army (PERMS)

SYSTEM: Personnel Electronic Records Management System (PERMS)

CONTACT: Ms. Gail Martin, PERMS Program Manager, Ft. Belvoir, VA

SUMMARY DESCRIPTION:

The United States of America's Armed Forces depend on rapid deployment of troops around the globe to fulfill their missions. The US Army recognizes the importance of accurate, up-to-date personnel records in meeting this objective. The Personnel Electronic Records Management System (PERMS) enhances the Army's ability to store and access tens of millions of document images, replacing a labor-intensive records management system based on paper documents and updatable microfiche. An overall PERMS goal is to improve records management using new but proven commercially available technologies. The Army expects to receive other tangible benefits from PERMS, such as the ability to respond faster during troop deployments, promotions, school assignments, and benefits processing.

The PERMS data storage architecture dedicates a specific area on each twelve-inch write once, read many (WORM) optical digital data disk to the document images from an individual soldier's personnel record. This recording strategy improves productivity and file integrity, because only a single optical digital data disk must be retrieved to access a soldier's complete personnel record. Unlike the existing microfiche system, PERMS can supply information to multiple users simultaneously, within twenty seconds of a request. Output is available in hard copy, microfiche, or digital form. PERMS sites are linked through the Defense Services Network (DSN), and future technology will eventually enable the acquisition and distribution of personnel data through this network.

PERMS is important to the study of digital imaging and optical media systems because of: the multiple site conversion effort underway to load the system with existing personnel information; the system's ability to support high volume scanning of paper documents and microfiche; the use of PERMS as a primary source for mission critical information; and the system's ability to output to paper, microfiche, or magnetic tape.

BACKGROUND:

An Official Military Personnel File (OMPF) is maintained for each soldier and helps the Army formulate, manage, and evaluate manpower and personnel policies, plans, and programs. The Army's soldier records identification system began in 1917. The Army's records repositories must maintain the official files accurately, since they contain the official historical, performance, and legal record of service, as well as other information pertaining to an individual both during and after active duty. Headquarters, Department of the Army (HQDA) uses this information to support military decisions such as selecting personnel for promotion, retention, schooling, and command assignments. Under Congressional mandate, the US Army maintains an official personnel record on every soldier regardless of status (active, reserve, discharged, or retired). These records are retired to the National Archives when they are no longer useful to the Army. The records document physical condition, dependency status, military qualifications, civilian occupational skills, availability for service, and such other information as the service secretary concerned may prescribe. They are used to manage troop strength and the careers and employment of individual soldiers. The alternatives considered for the PERMS project have centered on the physical medium on which the records are maintained: paper, microfiche, or optical media.

Prior to the early 1970s, the Army's Official Military Personnel Files were maintained in a paper-based records management system. The paper file system suffered from extensive storage space needs, labor-intensive file integrity and records management processes, the absence of any backup (only the original records existed), and time-consuming efforts to service Selection Boards. The 1973 fire at the St. Louis records center consumed a majority of the Army's personnel records created between 1917 and 1959. Because no backup copies existed, the fire caused massive administrative problems in processing Army veterans' benefits programs. A special Records Administration in Microform Mode (RAM2) task force studied the Army's existing records processes, analyzed the Navy and Air Force military records systems, and evaluated the technology marketplace for alternative solutions.

Based on RAM2 task force recommendations, A.B. Dick System 200 updatable microfiche camera/processor systems were selected to convert the paper records to microfiche for storage in automated Access-M retrieval equipment. These micrographic systems still operate at the four personnel records management centers, although only 17 percent of the US Army Reserve Personnel Center (ARPERCEN) records were converted to microfiche. The microfiche Official Military Personnel File format is based on:
  • Performance Fiche (P-Fiche) used for evaluation and selection boards.
  • Service Fiche (S-Fiche) used by career managers for general information.
  • Restricted Fiche (R-Fiche) used to store historical information which may be improper for viewing by selection boards and career managers.

In 1983, the consulting firm Austin Associates studied the Army's overall personnel records operations and made recommendations for improving the existing micrographics processes. The Austin Study, as it is often called, also contained recommendations pertaining to digital image technology. Subsequently, a pilot imaging project was tested at the US Army Enlisted Records Evaluation Center (EREC) in 1986, a mission needs statement was developed, and imaging system funding requirements were established at several personnel records centers. Also in 1986, the Secretary of the Army directed that records management problems at ARPERCEN be corrected; funding requirements were established, and program efforts were initiated to implement the Austin Study recommendations. Approval to begin PERMS with ARPERCEN records was obtained in 1989, and the program was transferred to PEO STAMIS for management and oversight. ARPERCEN was selected as the initial digital conversion site, to be followed by the remaining Army personnel records centers.

Origins: ARPERCEN is the designated custodian of personnel records for Active Reserve members and retired Army members. The combined holdings of 2.7 million records consist of approximately 156 million paper documents and 175 million microfiche images. The holdings require an ever-increasing amount of expensive storage space, growing by more than 15 million documents annually. A filing backlog persists because record jackets are often out of file, and the average wait to obtain a requested record is ten days. Almost 400,000 microfiche duplicates are produced annually to support selection boards, personnel transfers, and other functions.

The Army's existing updatable microform system was recognized as technologically outmoded and was considered unresponsive to today's dynamic personnel management needs. Paper records created during the soldier's entry process are converted directly to microfiche. Subsequent documents are held in a temporary paper file for up to one year before the official microfiche record is updated, unless the soldier's record is scheduled for HQDA Selection Board review, in which case the documents are converted to microfiche as soon as possible. A study indicated that up to forty percent of the microfiche images are of poor legibility, and ten percent are unreadable. Contributing to this was the inability to obtain full funding for the microfiche systems, resulting in long delays for record updates and changes. Previously, a routine change to a microfiche record could take several months through normal channels, or three weeks on a priority walk-through basis.

The ARPERCEN digital image conversion contract (with four option years) was awarded on 1 November 1990. Operational test demonstrations were completed in January 1991, and production began in April 1991. The integration contract for system hardware/installation was awarded on 11 April 1991 as a one year contract with five option years. Authority for PERMS is under Title 10 of the US Code for Armed Forces and Title 44 of the US Code. These public laws define Federal agency records management, and govern the creation and maintenance of an official military personnel record for all aspects of a soldier's career. The PERMS project provides an automated, integrated digital image and optical digital data disk-based personnel system for each of the four Army Records Centers that will: (1) improve the quality of personnel records; (2) streamline the records update process; (3) improve accountability and control; (4) improve record accessibility and support to selection boards; (5) reduce record storage space through paper reduction; and, (6) reduce operating costs.

Although the systems will not be directly connected, PERMS capability will be installed at the four major Army personnel records sites: 1) the US Army Reserve Personnel Center (ARPERCEN) in St. Louis, MO; 2) the Enlisted Records Evaluation Center (EREC) at Fort Benjamin Harrison, IN, which serves the US Army Total Army Personnel Command for active duty enlisted personnel; 3) the Management Support Directorate (MSD) in Alexandria, VA, which serves the US Army Total Army Personnel Command for active duty officers; and 4) the US Army National Guard Personnel Services Division (NGB-PSD) in Arlington, VA. ARPERCEN's existing Army retiree records are not yet approved for conversion to PERMS. A long-term Army goal is to create a consolidated records management system at St. Louis.

The total PERMS project, estimated at 79 million dollars, is being performed under a prime contract and three separate conversion contracts. PRC, Inc. is the prime system integration contractor, responsible for overall operations, maintenance, and integration of the Army personnel records sites. PRC provided the PERMS input subsystem and host database as Government Furnished Equipment (GFE) to operate conversion scanning sites in Washington, DC, and Indianapolis, IN. I-NET, Inc. is converting up to 114 million paper and microfiche images at ARPERCEN (St. Louis), and will also convert up to 5.3 million images for MSD at the St. Louis facility. Using GFE, MSTC, Inc. is converting 33 million images from enlisted records at Fort Benjamin Harrison (EREC), and Kathpal Technologies, Inc. is responsible for converting up to 6.8 million images from National Guard officer records in Washington, DC.

SYSTEM CONFIGURATION:

Date system installed and accepted at ARPERCEN: December, 1992

System Installed by: PRC, Inc., Reston, VA.

System configuration changes since installation at ARPERCEN include: upgrades of the UNIX operating system software and the Informix database software; an upgrade of the PC workstation processors from 386DX25 to 486DX2-66; Cisco routers in place of FDDI bridges for network communications; Sun Sparc 10 servers in place of PC servers for the Computer Output Microfiche (COM) units; 120 dpi monitors in place of 150 dpi image display monitors; the addition of a COM duplicator at ARPERCEN and NGB-PSD; and Sunrise microfiche scanners in place of TDC fiche scanners.

PERMS architecture is based on commercially available digital imaging technology. The systems are to be installed at headquarters-level records centers and are image-based rather than electronic data processing systems. The standard PERMS configuration has five interconnected subsystems:
  • The Input subsystem serves as the data capture component, using document and microfiche scanning equipment, key-entry indexing, and quality control.
  • The Index Database Host subsystem processes index data and supports searches and retrievals.
  • The Optical Storage subsystem uses optical digital data disk components to store the image data.
  • The User subsystem provides high-resolution display screens and laser printers for accessing the index and image files.
  • The Output subsystem includes paper and microfiche printing equipment.

The Selection Board subsystem, initially envisioned as a sixth subsystem, will be provided in the future as a stand-alone system. It will provide access to the index database and optical digital data disk images in support of promotion board activities.

PERMS HARDWARE COMPONENTS

Servers:
  • Input Subsystem:
    • Input File Server--IBM/6000 (Model 55L)
    • Input Optical File Server--IBM/6000 (Model 250)
    • Image Processing Server--Everex 486DX2-66
  • Index Database Subsystem:
    • Index Database Server--IBM/6000 Data Base Engine (Model 55L)
  • Optical Storage Subsystem:
    • Optical File Server--IBM/6000 (Model 25S)
  • User Subsystem:
    • Low Speed Printer Server--Everex 486DX2-66
  • Output Subsystem:
    • COM Output Server--Sun Sparc 10
    • High Speed Printer Server--Everex 486DX2-66
Workstations--Scan Stations:
  • Input Subsystem:
    • Basic Processor--Everex 486DX2-66
    • Image Compression/Decompression software
    • Display Terminal--Sigma Designs Multimode 19-inch 120 dpi
Workstations--Index/QC/Users:
  • Input and User Subsystems:
    • Basic Processor--Everex 486DX2-66
    • Image Compression/Decompression software
    • Display Terminal--Sigma Designs Multimode 19-inch 120 dpi
Scanners:
  • Input Subsystem:
    • Paper Documents--TDC 4530
    • Paper (Desktop)--Fujitsu M3096B
    • Microfiche--Sunrise Corporation (Model SRI 50)
Storage Devices:
  • Input Subsystem:
    • 9-Track Magnetic Tape Drive--Cipher M995S
    • Interim Optical Drive--LMSI LD4100
  • Optical Storage Subsystem:
    • Stand-Alone Optical Drive--LMSI LD4100
    • Optical Disk Drive--LMSI LD4100
    • Optical Disk Jukebox--Cygnet 1800 Series
Printers:
  • User Subsystem:
    • Low Speed (Local) Laser--QMS PS1725
  • Output Subsystem:
    • High Speed Laser--QMS PS1725
  • Input and Output Subsystems:
    • Low Speed Printer--IBM Proprinter
COM Output:
  • Output Subsystem:
    • COM Output Device--Anacomp XFP2000
Communications:
  • Input, Index Database, Optical Storage, User, Output Subsystems:
    • Cisco Multiprotocol Router
    • Network Manager--Sun Microsystems SPARCstation IPC

PERMS SOFTWARE COMPONENTS

Operating System:
  • Software Products: IBM AIX Version 3.2; Interactive 386/ix
  • Standard Specification: POSIX
  • PERMS Implementation: INTERACTIVE UNIX OS V 4.0 Operating Environment Across All Platforms
User Interface:
  • Software Products: OSF/Motif
  • Standard Specification: X Window System
  • PERMS Implementation: Common User Interface
Language Processor:
  • Software Products: C-AIX; C-Interactive 386/ix; Informix ESQL/C
  • Standard Specification: C
  • PERMS Implementation: Single Application Development Language
Data Base Management System:
  • Software Products: Informix V 5.0 - Online
  • Standard Specification: Structured Query Language (SQL)
  • PERMS Implementation: Common Distributed DBMS
Network Services:
  • Software Products: TCP/IP; Network File System (NFS); TSP/3270 SNA Services; Informix-Star; Informix-Net
  • Standard Specification: GOSIP
  • PERMS Implementation: Layered Software on All Platforms

DIGITAL IMAGE CAPTURE:

Document Preparation: A labor-intensive element of the conversion effort is cleaning up the existing paper and fiche records. PERMS conversion operations staff consult an Army Personnel Records Regulation to determine which documents (or microfiche images) to include or purge. Analysis shows that approximately fifty percent of the existing documents and images are not pertinent to a soldier's personnel record and are discarded. Where appropriate, documents that are difficult to read are stamped "Best Document Available."

Disposition of Original Records: After scanning, the paper documents and copies of the microfiche will be returned to the individual soldier for the OMPF and Military Personnel Record Jacket (MPRJ) or "201" file, and the A.B. Dick updatable microfiche will be destroyed.

Image Scanning: The I-NET Corporation conversion staff scan the documents and microfiche, recording the data onto rewritable magneto-optical disks. The magneto-optical disks are retained until the Army PERMS staff accepts the scanned images, at which time the data is transferred to twelve-inch optical digital data disks. Master personnel health, dental, flight, and jump records are processed in batch mode, with image data indexed and quality inspected. The ARPERCEN conversion site employs six TDC high-speed paper document scanners and four Lenzar microfiche scanners. The Army base system will use a Sunrise Corporation microfiche scanner to provide adaptive threshold image enhancement and improved image quality. The GFE conversion systems use a combination of TDC IS3000 and Sunrise SRI 50 microfiche scanners.

Production Throughput: The TDC document scanners can capture 43 images per minute at 300 dpi (50/min. at 200 dpi), with "on-the-fly" compression. More than 23 million ARPERCEN document pages have already been converted to digital images.
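To put the rated scanner speed in perspective, the short calculation below estimates the scanner time behind the 23 million ARPERCEN pages. It is a best-case sketch that assumes every page was captured at the rated 300 dpi speed with no preparation, rescans, or downtime, and that the work was spread evenly over the six TDC scanners; the report gives no actual staffing or shift figures.

```python
# Rated TDC scanner speeds quoted in the report, in images per minute.
RATED_IMAGES_PER_MIN = {300: 43, 200: 50}


def scanner_hours(pages: int, dpi: int) -> float:
    """Best-case scanner time at the rated speed, assuming no document
    preparation, jams, rescans, or downtime."""
    return pages / RATED_IMAGES_PER_MIN[dpi] / 60


total = scanner_hours(23_000_000, 300)
print(f"{total:,.0f} scanner-hours")                     # 8,915 scanner-hours
print(f"{total / 6:,.0f} hours on each of six scanners")  # 1,486 hours on each of six scanners
```

Real elapsed time would be considerably longer, since document preparation and quality control, not raw scanner speed, usually limit conversion throughput.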

Quality Control: An independent quality assurance staff monitors the performance of the conversion contractors. A QC sampling formula with a sliding scale of 100%--30%--10% is used, augmented with random government staff audits. Both index and image data are quality verified.
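The report gives the 100%, 30%, and 10% tiers but not the rule for moving between them. The sketch below shows one common way such a sliding scale is run, under assumptions labeled in the code: inspection relaxes one tier after a hypothetical run of five clean batches and returns to 100% after any failed batch.

```python
import random
from dataclasses import dataclass, field
from typing import List

TIERS = [1.00, 0.30, 0.10]  # inspection rates quoted in the report
PROMOTE_AFTER = 5           # hypothetical: clean batches needed to relax one tier


@dataclass
class SlidingScaleQC:
    tier: int = 0           # index into TIERS; start at 100% inspection
    clean_streak: int = 0   # consecutive batches passed at the current tier
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    @property
    def rate(self) -> float:
        return TIERS[self.tier]

    def select(self, image_ids: List[str]) -> List[str]:
        """Choose which images in a batch to inspect at the current rate."""
        k = max(1, round(len(image_ids) * self.rate))
        if k >= len(image_ids):
            return list(image_ids)
        return self.rng.sample(image_ids, k)

    def record(self, batch_passed: bool) -> None:
        """Return to 100% on any failure; relax one tier after a clean streak."""
        if not batch_passed:
            self.tier, self.clean_streak = 0, 0
            return
        self.clean_streak += 1
        if self.clean_streak >= PROMOTE_AFTER and self.tier < len(TIERS) - 1:
            self.tier += 1
            self.clean_streak = 0


qc = SlidingScaleQC()
batch = [f"img{i:03d}" for i in range(200)]
print(len(qc.select(batch)))  # 200 (100% tier)
for _ in range(PROMOTE_AFTER):
    qc.record(True)
print(len(qc.select(batch)))  # 60 (30% tier)
qc.record(False)
print(qc.rate)                # 1.0
```

The random government audits mentioned above would sit outside this loop, checking batches regardless of the current tier.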

Scanning Resolution: Paper documents and microfiche are scanned at 300 dpi.

Color and Gray Scale: Soldier photographs or other continuous tone images are not part of the PERMS digital record. The Selection Board Subsystem is expected to incorporate the capability to display digital color photos for promotion board review.

Image Enhancement: Enhancement algorithms reduce image file sizes and provide cleaner-looking screen and printed output images. Benefits include the ability to store more images per optical digital data disk, and improved network transmission.

Compression/Decompression: Software compression was selected for PERMS to avoid dependence on proprietary hardware solutions. To reduce data loss and speed up transmission, PERMS program managers established a target average file size of 85KB per converted image. Electronic clean-up can reduce a file to 71KB, while dirty or "noisy" documents may produce images of 87KB to 125KB each.
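The file-size targets translate directly into platter capacity. The sketch below divides the 5.6 GB capacity of a 12-inch platter by each quoted image size, assuming decimal units (1 KB = 1,000 bytes) and ignoring headers, reserved areas, and file-system overhead, so the results are upper bounds.

```python
PLATTER_BYTES = 5_600_000_000  # 5.6 GB platter, decimal units assumed


def images_per_platter(avg_kb: float) -> int:
    """Upper bound on whole images per platter, ignoring all overhead."""
    return int(PLATTER_BYTES // (avg_kb * 1000))


for label, kb in [("after clean-up", 71), ("target average", 85),
                  ("noisy, low end", 87), ("noisy, high end", 125)]:
    print(f"{label:>16}: {kb:>3} KB -> {images_per_platter(kb):,} images")
```

At the 85KB target this gives roughly 65,900 images per platter, comfortably above the 60,000-image figure cited for each disk, which leaves room for overhead and for noisier documents.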

DOCUMENT INDEXING:

The PERMS indexing system improves document access by eliminating the need to browse an entire personnel record to retrieve a specific document. The index system can: provide overall management information concerning military personnel records (e.g., the number of Article 15 actions on file for a specific category of soldier); allow the system to suspend documents automatically; and allow the Official Military Personnel File (OMPF) custodian to correct a record without using blackouts or voids (as is done in the microfiche system). The system also lets the records custodian respond quickly to telephone inquiries about personnel records, protects documents not authorized for release to the field, and automatically identifies and eliminates duplicate documents before filing.

Limitations may be applied to documents to restrict access to specific authorized users or to restrict particular documents. Selection Board procedures will be improved by guaranteeing that only authorized performance documents are displayed, and by allowing Board members to display the documents in each record in chronological or reverse chronological order, as their requirements dictate.
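The Selection Board display rules described above amount to a filter followed by a sort. This sketch assumes each index entry carries two hypothetical flags, `performance` and `restricted`; the report does not describe how PERMS actually marks authorized performance documents.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class DocEntry:
    doc_type: str
    effective: date    # Document Effective Date from the index
    performance: bool  # hypothetical flag: authorized performance document
    restricted: bool   # hypothetical flag: access limited to specific users


def board_view(docs: Iterable[DocEntry], newest_first: bool = False) -> List[DocEntry]:
    """Keep only unrestricted performance documents, ordered by effective date."""
    allowed = [d for d in docs if d.performance and not d.restricted]
    return sorted(allowed, key=lambda d: d.effective, reverse=newest_first)


record = [
    DocEntry("evaluation", date(1990, 5, 1), performance=True, restricted=False),
    DocEntry("article15", date(1989, 2, 1), performance=False, restricted=True),
    DocEntry("award", date(1991, 7, 1), performance=True, restricted=False),
]
print([d.doc_type for d in board_view(record)])                     # ['evaluation', 'award']
print([d.doc_type for d in board_view(record, newest_first=True)])  # ['award', 'evaluation']
```

Applying the filter in the index query, before any image is fetched, means restricted documents never leave the optical store for a Board session.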

Creation of Index Database: The conversion staff is responsible for key entering the index information from the scanned images. PERMS has no Optical Character Recognition (OCR) requirement. The PERMS document index is able to support the unique information retrieval needs of diverse groups such as the National Guard, regular Army, and Army Reserves.

Location of Index Database: The PERMS document index data is transferred from the input file servers to the host database file server and stored on the system's magnetic drives. Magnetic tape is the backup storage medium for PERMS index data; five gigabytes are required for forty-eight million soldiers' records.

Index Structures: The PERMS index system contains the following elements:
  • Social Security Account Number (9N)**
  • Name (Last, First, Middle Initial) (27AN)**
  • Name Abbreviation (2AN)**
  • Document Type (11AN) plus edition date (8N) (Varies)**
  • Document Effective Date (8N)**
  • Document Number of Pages (3N)**
  • Special Form Identifier (1A)
  • Action Pending Indicator (1AN)** (MPRJ only)
  • Pending Removal Date (8N)** (MPRJ only)
  • Date added to record (8N)

** This field element requires manual keystroke data entry.

Field sizes are given in characters: N = numeric, AN = alphanumeric, A = alphabetic.

The four key information elements (SSAN, Name Abbreviation, Document Type, and Document Effective Date) are keyed first, followed by the other fields.
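The element list above describes a fixed-width record. The sketch below packs and unpacks one index entry using the listed sizes, which sum to 86 characters. The field names, the zero-fill and space-padding rules, and the decision to treat Document Type as a fixed 11 characters (the report notes that it varies) are all assumptions for illustration, not the PERMS Informix schema.

```python
from typing import Dict, List, Tuple

# (field, width in characters, type) from the PERMS index element list.
# N = numeric, AN = alphanumeric, A = alphabetic. Field names are hypothetical;
# Document Type is fixed at 11 characters here although the report says it varies.
LAYOUT: List[Tuple[str, int, str]] = [
    ("ssan", 9, "N"),
    ("name", 27, "AN"),
    ("name_abbrev", 2, "AN"),
    ("doc_type", 11, "AN"),
    ("edition_date", 8, "N"),
    ("effective_date", 8, "N"),
    ("pages", 3, "N"),
    ("special_form", 1, "A"),
    ("action_pending", 1, "AN"),       # MPRJ only
    ("pending_removal_date", 8, "N"),  # MPRJ only
    ("date_added", 8, "N"),
]
RECORD_WIDTH = sum(width for _, width, _ in LAYOUT)  # 86


def pack(values: Dict[str, str]) -> str:
    """Build one fixed-width index entry; missing fields are left blank."""
    parts = []
    for name, width, kind in LAYOUT:
        v = values.get(name, "")
        if len(v) > width:
            raise ValueError(f"{name} exceeds {width} characters")
        if not v:
            parts.append(" " * width)          # blank field
        elif kind == "N":
            if not v.isdigit():
                raise ValueError(f"{name} must be numeric")
            parts.append(v.rjust(width, "0"))  # zero-fill numerics on the left
        else:
            parts.append(v.ljust(width))       # space-pad text on the right
    return "".join(parts)


def unpack(entry: str) -> Dict[str, str]:
    """Split a fixed-width entry back into fields, trimming trailing blanks."""
    if len(entry) != RECORD_WIDTH:
        raise ValueError(f"expected {RECORD_WIDTH} characters, got {len(entry)}")
    fields, pos = {}, 0
    for name, width, _ in LAYOUT:
        fields[name] = entry[pos:pos + width].rstrip()
        pos += width
    return fields


entry = pack({"ssan": "000000001", "name": "DOE JOHN Q", "name_abbrev": "DO",
              "doc_type": "EVAL", "effective_date": "19920315", "pages": "2"})
print(len(entry))              # 86
print(unpack(entry)["pages"])  # 002
```

A fixed layout like this keeps every entry the same size, which makes index storage and backup volumes easy to estimate.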

OPTICAL DIGITAL DATA DISK STORAGE:

During conversion at ARPERCEN, the in-process images are recorded onto rewritable magneto-optical disks. Batches of converted images are subsequently transferred to twelve-inch WORM (interim) disks. The interim disks are then copied to LMSI optical digital data disks, which are stored in the jukeboxes to service user requests. Areas of the optical digital data disks are reserved in advance for a specific soldier's images and are filled as additional documents are received; keeping each record in one area improves retrieval performance by reducing disk handling. Retired optical digital data disk platters are stored at the St. Louis Archives.

Image File Headers: Block header information is linked through pointers to the personnel records index database.
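One way to picture the reserved-area scheme and the pointers from the index to image blocks is the sketch below. It assumes, hypothetically, that each platter is divided into 600 equal areas of 100 image slots (derived from the figure of roughly 600 soldiers and 60,000 images per disk), and that the index stores a (platter, slot) pointer for each image; the actual LMSI block layout is not described in the report.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical layout derived from ~600 soldiers and ~60,000 images per platter.
SOLDIERS_PER_PLATTER = 600
SLOTS_PER_SOLDIER = 100


@dataclass
class Extent:
    platter: int     # platter number in the jukebox library
    first_slot: int  # first image slot reserved for this soldier
    used: int = 0    # slots already written; WORM slots are never rewritten


@dataclass
class ReservedLayout:
    extents: Dict[str, Extent] = field(default_factory=dict)  # keyed by SSAN
    next_platter: int = 0
    next_area: int = 0

    def reserve(self, ssan: str) -> Extent:
        """Return the soldier's area, reserving a new one on first use."""
        if ssan not in self.extents:
            if self.next_area == SOLDIERS_PER_PLATTER:  # platter fully allocated
                self.next_platter += 1
                self.next_area = 0
            self.extents[ssan] = Extent(self.next_platter,
                                        self.next_area * SLOTS_PER_SOLDIER)
            self.next_area += 1
        return self.extents[ssan]

    def append(self, ssan: str) -> Tuple[int, int]:
        """Write one image into the soldier's area and return the
        (platter, slot) pointer the index database would store."""
        ext = self.reserve(ssan)
        if ext.used == SLOTS_PER_SOLDIER:
            raise RuntimeError(f"reserved area for {ssan} is full")
        pointer = (ext.platter, ext.first_slot + ext.used)
        ext.used += 1
        return pointer


layout = ReservedLayout()
print(layout.append("000000001"))  # (0, 0)
print(layout.append("000000002"))  # (0, 100)
print(layout.append("000000001"))  # (0, 1): same soldier, same area
```

The trade-off is visible in the code: retrieval needs only one platter per soldier, but space reserved for soldiers with short records sits unused, and a very long record must overflow somewhere the sketch does not model.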

Error Detection/Correction: System supplied by manufacturer.

Recording Process: The I-NET conversion site uses rewritable magneto-optical (MO) media, which are copied to "interim" write once, read many (WORM) LMSI LD4100 optical digital data disk media. Loading data from the conversion site disks onto the LMSI LD4100 optical digital data disks takes approximately ten hours. The EREC and NGB-PSD conversion sites write image and index data directly to interim 12-inch WORM disks, using the GFE input subsystem. The interim disks will be loaded onto disks in jukebox storage when the base system is installed at these sites.
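Assuming the ten-hour figure refers to filling one complete 5.6 GB platter (the report does not say so explicitly), the implied sustained transfer rate is modest, as this sketch shows.

```python
PLATTER_BYTES = 5.6e9     # 12-inch LMSI platter, decimal gigabytes assumed
LOAD_SECONDS = 10 * 3600  # "approximately ten hours", assumed to fill one platter
AVG_IMAGE_BYTES = 85_000  # PERMS target average image size

rate = PLATTER_BYTES / LOAD_SECONDS               # bytes per second
print(f"{rate / 1000:.0f} KB/s")                  # 156 KB/s
print(f"{rate / AVG_IMAGE_BYTES:.1f} images/s")   # 1.8 images/s
```

If the ten hours covered less than a full platter, the real transfer rate would be proportionally lower.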

Optical Digital Data Disk Composition: LMSI optical media.

Capacity: Each 12-inch (5.6 GB) optical digital data disk stores the complete personnel records for approximately 600 soldiers, or the equivalent of 60,000 document images.

Number of Optical Disks in Use: ARPERCEN will reach a capacity of 1,230 platters in 10 jukeboxes, providing 6.8 terabytes for more than 73 million images.

Jukebox: The ARPERCEN system currently has seven Cygnet 1803 series jukeboxes. Each jukebox contains two LMSI LD4100 optical drives, and can hold up to 131 optical digital data disks.
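The jukebox figures can be cross-checked with a few lines of arithmetic. The sketch uses only numbers from the report (131 slots per jukebox, 5.6 GB per platter, 1,230 planned platters, 73 million images) and assumes decimal units.

```python
import math

SLOTS_PER_JUKEBOX = 131  # Cygnet 1803 capacity quoted in the report
PLATTER_GB = 5.6         # per 12-inch platter, decimal units assumed
PLANNED_PLATTERS = 1_230
PLANNED_IMAGES = 73_000_000

jukeboxes_needed = math.ceil(PLANNED_PLATTERS / SLOTS_PER_JUKEBOX)
current_slots = 7 * SLOTS_PER_JUKEBOX
capacity_tb = PLANNED_PLATTERS * PLATTER_GB / 1000
images_per_platter = PLANNED_IMAGES / PLANNED_PLATTERS

print(jukeboxes_needed)              # 10
print(current_slots)                 # 917
print(f"{capacity_tb:.2f} TB")       # 6.89 TB
print(f"{images_per_platter:,.0f}")  # 59,350
```

The 1,230 platters need ten 131-slot jukeboxes, matching the planned configuration, while the seven current jukeboxes provide 917 slots. The computed capacity of about 6.9 TB is close to the 6.8 TB cited (the small gap may reflect rounding), and the implied 59,350 images per platter agrees with the 60,000-image figure for each disk.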

Storage Environment: The ARPERCEN system was installed in a temperature controlled, raised floor computer room environment.

RETRIEVAL AND OUTPUT:

The index database server processes record requests from network user workstations, and the images are automatically retrieved from the optical digital data disk jukeboxes. Images are viewed on high-resolution monitors, and output is available on local laser printers, high-speed printers, or microfiche.

Display Output: The ARPERCEN system uses 19-inch diagonal, 150 dpi Cornerstone display monitors. Systems installed at MSD, EREC, and NGB-PSD will use 120 dpi Sigma Designs Multimode monitors.

Output Formats: The user has several options: an entire Army personnel record (or selectively identified portions of it) can be output, with document ID numbers, to laser printers, COM (film output) recorders, or magnetic tape. At 300 dpi resolution, the COM microfiche production equipment (rated at 60 images/min.) requires 1.5 to 2 minutes per fiche.

System Performance: PERMS contract requirements specify access to the first image within 20 seconds; subsequent images requested from the same file must be displayed within 4 seconds each.
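One way to express this response-time requirement as an acceptance test is sketched below. It assumes that "within 4 seconds each" is a per-image bound on every image after the first; the function names and the latency values in the example are hypothetical, not measurements from ARPERCEN.

```python
FIRST_IMAGE_LIMIT_S = 20.0  # first image from a file
NEXT_IMAGE_LIMIT_S = 4.0    # each subsequent image from the same file

def meets_perms_spec(latencies_s: list[float]) -> bool:
    """True if display times (seconds) for one file meet the requirement.
    latencies_s[0] is the first image; the rest are subsequent images."""
    if not latencies_s:
        return True
    first, *rest = latencies_s
    return first <= FIRST_IMAGE_LIMIT_S and all(t <= NEXT_IMAGE_LIMIT_S for t in rest)

def worst_case_seconds(n_images: int) -> float:
    """Longest total time the requirement allows for viewing n images of one file."""
    if n_images <= 0:
        return 0.0
    return FIRST_IMAGE_LIMIT_S + NEXT_IMAGE_LIMIT_S * (n_images - 1)

print(meets_perms_spec([18.2, 3.1, 3.9]))  # True
print(meets_perms_spec([12.0, 4.5]))       # False: second image too slow
print(worst_case_seconds(10))              # 56.0
```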

Primary System Users: System access is through a network of workstations in the facility housing the records center. Remote offsite access to PERMS is a future possibility, but is not a current requirement.

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: PERMS has a communications link with the Army's site host mainframe computer. TPS/3270 System Network Architecture (SNA) software provides the SNA connection between the PERMS index database host server and the site host mainframe. This connection is used to transfer data from the site host to the PERMS index database host or to perform a task on site host applications such as PERNET. PERMS provides increased management tracking controls over personnel records located at different Army records centers.

Network Transmission: The Defense System Network (DSN) currently links the personnel record centers. Future plans include image transfer between records center sites using a Fiber Distributed Data Interface (FDDI). This LAN-based connectivity (ARPERCEN has 175 nodes) will enable management to access personnel records once PERMS is installed at the four personnel centers. Magnetic tapes are now used for image data exchange between operational sites.

Backup of Image and Index Data: The conversion or "interim" optical digital data disks serve as image data backup. Army program managers are interested in optical tape technology (with a capacity of approximately 1 terabyte) for image data backup at an off-site emergency backup storage site. The backup process might take up to several days, and could be maintained by a contractor. The Continuity of Operations Plan (COOP) for PERMS is expected to be integrated into the COOP for the site host mainframe and other ADP systems.

Technical Support and Documentation: The PERMS system software is held in escrow as a safeguard. Army operations staff were trained by the imaging contractor. After the conversion is completed, a Federal government systems administrator and Government production staff will be responsible for system operations. Contractor support is needed when serious system problems occur or when system changes are required. PRC provided two on-site personnel during the first year of operation; in the second year, ARPERCEN retains one on-site PRC technician with in-depth knowledge of system operations.

Interoperability: Data exchange between the Army's PERMS records centers is planned.

Migration Plans: The lack of standards for imaging technology worries Army management. If funding were not an issue, the current rapid pace of industry developments could support system upgrades as often as every two years. A long-term online database is needed to retain retired soldiers' data, with the optical digital data disks containing retired soldiers' images stored off site.

OVERVIEW OF SIGNIFICANT ISSUES:

Conversion Process: The PERMS conversion faced challenges from: huge existing paper filing backlogs; unclean or noisy documents; labor intensive indexing; and, scanning difficulties with physically damaged or illegible microfiche. The paper records needed a labor intensive cleanup to purge unnecessary documents. The ongoing conversion effort involves a contractor operated conversion site with 22 Government quality control staff. The contractor converts entire records in batches of 600-900 files. ARPERCEN has organized a PERMS Division which includes 52 Military Personnel Clerks responsible for scanning and indexing update documents. The images from both efforts are loaded to the PERMS jukeboxes for user retrieval within the Center. The original A.B. Dick microfiche present a special conversion problem due to the wear and tear they experienced over time during the normal course of records updating and retrievals. Special handling to prepare the microfiche, and careful calibration of the microfiche scanner improves the digital image legibility.

Business Process Re-engineering: Records management is typically the last frontier of office automation, and it is difficult to change people's ingrained habits. The automated PERMS system introduces profound changes to personnel file access and update procedures. Digital imaging supports simultaneous shared access to information, providing more efficient support for personnel managers and improved responsiveness for tasks such as troop mobilization. PERMS supports a new way to mobilize Army forces through faster personnel identification and improved access to document images containing information such as pay grades and job skills. Large scale conversions may be easier to accomplish using contractors, owing to less restrictive union rules and regulations. Storing text rather than images, which would greatly reduce storage needs, was considered. However, the information is typically generated in the field and submitted to PERMS for retention in many different physical formats, so images were retained. PERMS system tests were conducted to verify performance and the system's ability to meet Army-wide needs.

Image Display: PERMS management staff is studying image clarity and display quality issues. The existing requirement for image display terminals specifies screens with a minimum resolution of 120 dpi. PERMS project management is interested in image resolution and other human factors issues because of the cost of special-ordering high resolution monitors.

Forms Removal: The PERMS system installed at ARPERCEN included a capability that removed selected form templates so that only text data was retained. This capability tested data reduction technology on a select number of standard Army forms. The scanned images serve as input, and reductions in digital file size of up to fifty percent are possible. The text data is saved separately from the forms library data. When a user requests the document for display, the system software merges the text with the appropriate personnel form template. Because forms removal was treated as a PERMS test, the optical storage subsystem was sized to be sufficient even if no actual data reduction savings were realized. Factors that cause operational problems for the forms removal software include: non-standardized forms; insufficient print quality; updated forms that relocate the needed information; and, various print fonts. The Army has eliminated Forms Removal as a requirement for PERMS.
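The forms removal idea can be illustrated on a toy bilevel image. In the sketch below each row of pixels is a Python integer used as a bit mask (1 = black); this representation and the function names are illustrative assumptions, not the PERMS software. Removal clears every pixel that belongs to the blank form template, and display merges the template back with a bitwise OR. The round trip is exact only when every template pixel is also black in the scan, that is, when the scanned form registers perfectly with the stored template, which is one reason relocated or non-standard forms caused problems.

```python
def remove_form(scan: list[int], template: list[int]) -> list[int]:
    """Keep only the pixels that are not part of the blank form (the entered text)."""
    return [s & ~t for s, t in zip(scan, template)]

def merge_form(residual: list[int], template: list[int]) -> list[int]:
    """Rebuild a displayable page by overlaying the form template on the residual."""
    return [r | t for r, t in zip(residual, template)]

def black_pixels(image: list[int]) -> int:
    """Count black (1) pixels across all rows."""
    return sum(bin(row).count("1") for row in image)

# Toy 3 x 8 page: a boxed form (template) with three pixels of "text" inside.
template = [0b11111111,
            0b10000001,
            0b11111111]
scan     = [0b11111111,
            0b10110101,
            0b11111111]

residual = remove_form(scan, template)
print(black_pixels(scan), black_pixels(residual))  # 21 3
print(merge_form(residual, template) == scan)      # True
```

Fewer black pixels does not translate one-for-one into smaller files, since stored size depends on the compression scheme; the report's figure of up to fifty percent refers to actual file sizes.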

Mission Functions Supported: PERMS improved several key areas: 1) Records Clean-up: An effort was started to purge existing paper records of duplicate or unneeded documents (approximately fifty percent were discarded). The clean-up affected approximately 20% of the paper files before PERMS conversion started. Paper filing backlogs are also being eliminated, providing greater control over the Army's personnel resources. 2) Promotion Boards: Service to large selection boards (up to 30,000 candidates) improved by eliminating the lengthy time needed to produce duplicate microfiche; users view images on high resolution monitors rather than microfiche readers. An improved promotion process will enable the Army to better reward individual soldiers for their performance. 3) Voting System: This future enhancement might employ voice recognition technology, accepting voice commands for record searches and retrievals through the Internet.

Agency-wide Planning: A comprehensive indexing scheme was devised to give diverse groups access to personnel data. An overall goal is to integrate the U.S. Army, Army Reserve, and National Guard Bureau personnel records center systems. Currently, National Guard enlisted members' records are maintained at 54 separate state sites. Records provided to ARPERCEN during a National Guard mobilization are deactivated at a later date.

Lessons Learned: Organizations should consider: production delays are unavoidable due to the many "exceptions" to the norm; input documents and microfiche are not in perfect condition and require special handling; technology-based solutions cost more to implement than manual systems; and, indexing is the golden "key to success" and extra efforts in this area will be rewarded.

PERMS Success Story: The Army Records Managers can say that PERMS was a success because: the Austin Study recommendations were implemented; a successful prototype system was installed at EREC to prove the merits of ODI technology; the PERMS operational test and demonstration, and its criteria, were derived from lessons learned from the prototype; the PERMS statement of work was developed by records managers and imaging specialists; the functional community drove implementation of PERMS; PERMS proved early on the capability to scan A.B. Dick microfiche for conversion to ODI; and, the prototype identified the hardware requirements that became PERMS equipment. The Army Records Managers have succeeded in providing the ability to assist "people/soldiers" personally on important matters in a timely manner. The Army Records Managers had a vision that focused on the "possibilities". They brought "new blood" into the program during each year of development. Efforts to develop program documentation and marketing films have been rewarded because, through these media, high level management was kept informed of the importance of the project.


SITE VISIT REPORT #7

AGENCY: Environmental Protection Agency

SYSTEM: SCRIPS--Superfund Cost Recovery Image Processing System

CONTACT: Charles Young, Environmental Protection Agency, Washington, DC

SUMMARY DESCRIPTION:

The Environmental Protection Agency's (EPA) commitment to imaging technology is demonstrated in the Superfund Cost Recovery Image Processing System (SCRIPS). SCRIPS supports the compilation, production, retention, and distribution of cost recovery reports and legal documents associated with the cleanup of high priority toxic waste sites. SCRIPS automates the storage and retrieval of all site-specific Superfund cost documentation. An EPA goal is to seamlessly integrate this information with information from the agency's existing mainframe-based financial management system.

SCRIPS replaces a previously manual, labor intensive process of photocopying, filing, and retrieving multiple paper copies of the cost recovery reports. Cost recovery documents are currently processed according to pre-established procedures in the EPA's regional offices. The EPA regional offices digitally scan, index, and store the data on magnetic media. This information is kept in each regional office. The three nationwide finance offices (Washington, DC; Cincinnati; and Research Triangle Park, or RTP) scan their images into a single machine in RTP. The offices are connected through a WAN. The nationwide images associated with each region are downloaded to nine-track magnetic tapes and transferred to each region every two weeks. Superfund images are maintained in each of the regions, where they are recorded onto optical digital data disks and subsequently stored in the optical disk jukebox.

The SCRIPS imaging system is important for this study because of: the critical agency mission functions it supports; its significance in the EPA's agency-wide imaging efforts; the output and data transfer requirements; and, the multi-site information processing functions performed.

BACKGROUND:

The Environmental Protection Agency facilitates coordinated and effective governmental action on behalf of the environment. The mission of the Environmental Protection Agency is to control and abate pollution in the areas of air, water, solid waste, pesticides, radiation, and toxic substances. Its mandate is to mount an integrated, coordinated attack on environmental pollution in cooperation with state and local governments. The EPA Office of Solid Waste and Emergency Response is responsible for directing the Agency's hazardous waste programs, including the administration of the Superfund Act. The EPA's ten regional offices are responsible for accomplishing the agency's established national program objectives. In summary, the Environmental Protection Agency serves as the public's advocate for a livable environment.

Under the terms of the Superfund Act, the EPA is responsible for compiling a substantial amount of documentation on funds expended in cleaning up hazardous substance sites, including such financial records as employee time sheets and payroll vouchers. The agency is tasked with gathering information on "potentially responsible parties" who may be required to reimburse the government for the cleanup activity costs. An agency manual referred to as the "Blue Book" provides detailed instructions on filing, reconciliation, and cost documentation procedures. The resulting cost recovery packages, most of them quite voluminous, document the costs incurred and are used by EPA regional finance offices in court proceedings. The records and reports created in the site cleanup process are scheduled for disposal after twenty years; alternatively, a thirty year records retention schedule governs cases involving EPA's water monitoring activities.

EPA has a demonstrated commitment to the intelligent use of imaging to address its document management problems. In approaching the question of optical digital data disk technology, the agency developed a comprehensive planning strategy for imaging applications. The EPA's Office of Information Resources Management has issued guidance on the development of imaging systems, including those installed in regional branches. EPA's National Data Processing Division in Research Triangle Park, North Carolina includes staff experienced in planning and implementing imaging systems. Presently, the EPA has approximately nine imaging applications in some stage of development. For example, the Office of Toxic Substances Image Processing System (OTSIPS) is a pilot system supporting the PreManufacturing Notice Program. Another example is the Superfund Document Management System (SDMS), currently in the system design phase. The SDMS will assist the Superfund program in managing millions of pages of site file material that is not part of the cost recovery documentation.

Origins: In 1986, the EPA's Office of the Comptroller made the decision to automate the cost recovery documentation process using imaging technology. In 1987, a prototype imaging system was developed and tested at one regional office and the EPA's Research Triangle Park facility in North Carolina. This experience contributed to the issuance of a Request for Proposals in late 1989 for a more complex imaging system with expanded capabilities. In 1990, the EPA awarded a contract to IBM to develop the SCRIPS system using IBM AS/400 minicomputers and Electronic Filing Cabinet (EFC) software. SCRIPS workstations are currently installed in 12 of the 13 EPA finance offices nationwide including: Cincinnati, Ohio; New York City; Atlanta, Georgia; Philadelphia, Pa.; Chicago, Illinois; Dallas, Texas; Seattle, Wash.; San Francisco, Ca.; Boston, Mass.; Kansas City; Research Triangle Park, NC; and, the Washington, DC Information Center. The final site scheduled for system installation is Denver, Colorado. The EPA expects to gain additional benefits from SCRIPS by fully integrating the imaging system into its automated financial management and reporting system for Superfund (SCORE$ - Superfund Cost Organization and Recovery Enhancement System).

The SCRIPS imaging system has exceeded the EPA's expectations by successfully removing large volumes of paper records from storage shelves. The system has also streamlined formerly labor intensive, routine paper case processing functions. Cost recovery document packages can now be created, printed, and delivered within a few days, compared with the several weeks the same functions required under the previous manual systems.

SYSTEM CONFIGURATION:

Date system installed: 1991

System Installed by: Installation was carried out by EPA staff from the Finance offices and Information Resources, together with vendor staff from IBM, Planning Research Corporation, Unisys, Martin Marietta, and SAIC.

Some changes have been made to the system configuration since installation. The workstations have received memory upgrades and, more importantly, the optical storage devices have been improved. The optical configuration has changed from single-disk drives using 2 GB platters to five-disk RapidChanger units using 5.8 GB platters. This change has greatly improved system performance and reduced human intervention.

The SCRIPS system consists of host processor minicomputers; scanning, viewing, and printing workstations; data communications networks; optical disk drives; and, optical disk jukeboxes in accordance with the following specifications:
  • Host Processors--IBM Application System AS/400 minicomputers provide system index and image data storage, wide area and local area network data communications, and system operational control. The AS/400s have expandable CPU processors, expandable main memory, IBM 9332 magnetic disk storage devices, IBM 3476 InfoWindow monitor system console, IBM 4202 Proprinter system printer, IBM 9348 magnetic reel tape drives, IBM 9346 magnetic cartridge (streamer) tape drive, and IBM 5363 optical disk storage controller. IBM supplied software includes operating system, communications utilities, office and PC support, and applications development tools.

  • Image Processing Workstation--IBM PS/2 Model 80 with 80386 processors, 3.5 inch 1.44 MB floppy disk drive, a 70 MB internal hard disk drive, an IBM 8508-001 11 x 14-inch high resolution (114 dpi) monochrome display monitor, mouse, bar code wand readers, and internal expansion cards to accommodate imaging functions.

  • AS/400 Electronic Filing Cabinet (EFC) software resident on both the host computer and local image processing workstations supporting all system interface, control, and executable functions.

  • Document Scanners--Bell & Howell CopiScan Model 3338 with auto page feeders.

  • Optical Disk Drive--IBM 9247 Optical Disk WORM drives (Laser Magnetic Storage International--LMSI 1200E) with removable 12-inch media (2.0 GB per disk). The optical drives are provided with protective metal case housings. These drives are no longer in use and have been replaced by LMSI LD 4500 RapidChangers, which hold five platters at 5.8 GB per platter.

  • Optical Disk Jukebox--64 disk capacity IBM 9246 Automated Optical Disk Library Unit manufactured by FileNet Corporation. The FileNet jukebox contains LMSI Model 1250 optical disk drives (provided without rigid cases). Maximum optical data storage capacity per jukebox is 128 GB.

  • Laser Printers--Laser printers for printing text, graphics, and scanned images.

  • Equipment on-call maintenance is available under the contract to ensure system repairs in the event of hardware failures.
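The storage figures in the configuration list can be checked with simple multiplication. The sketch below uses the per-disk capacities given above (2.0 GB for the original IBM 9247 media and 5.8 GB per platter for the LD 4500 RapidChanger); the function name and dictionary labels are descriptive only. It confirms that the 128 GB jukebox maximum corresponds to 64 disks of 2.0 GB, and shows that each RapidChanger holds about 29 GB without an operator changing media, compared with 2 GB for a single-disk drive, which is consistent with the reduced human intervention reported after the upgrade.

```python
def online_capacity_gb(disks: int, gb_per_disk: float) -> float:
    """Capacity reachable without an operator changing media."""
    return disks * gb_per_disk

configurations = {
    "IBM 9247 single drive (original)": online_capacity_gb(1, 2.0),
    "LMSI LD 4500 RapidChanger": online_capacity_gb(5, 5.8),
    "IBM 9246 / FileNet jukebox": online_capacity_gb(64, 2.0),
}

for name, gb in configurations.items():
    print(f"{name}: {gb:.1f} GB online")
# IBM 9247 single drive (original): 2.0 GB online
# LMSI LD 4500 RapidChanger: 29.0 GB online
# IBM 9246 / FileNet jukebox: 128.0 GB online
```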

DIGITAL IMAGE CAPTURE:

Conversion Staff: Paper documents are converted to digital images by EPA regional office staff using a four stage process: document preparation; indexing; scanning; and, image quality verification.

Document Preparation: The documents are prepared according to pre-established agency guidelines. One document make-ready staff member can prepare document batches for up to five scanner operators. The prepared batches are forwarded to the scanning station for indexing and image capture at the same workstation.

Indexing: Information extracted from the documents is manually key entered or automatically captured using bar code technology.

Document Scanning: The scanners provide either 200 or 300 dots per inch resolution, selected through the AS/400 software. The automatic page feeders achieve a maximum throughput rate of 25 pages per minute, although actual scanning rates depend upon document quality, size, and contrast variations. The scanner feed systems automatically handle up to fifty 8.5 x 11 inch or 8.5 x 14 inch pages, accepting sheets as small as 3 x 5 inches.
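The throughput and staffing figures above support a rough planning estimate. The sketch below is hypothetical: the 15,000-page batch and the efficiency factor are illustrative assumptions, the latter standing in for the slowdowns the report attributes to document quality, size, and contrast. It combines the rated 25 pages per minute, the 50-page feeder capacity, and the ratio of one document preparation staff member to five scanner operators.

```python
import math

RATED_PAGES_PER_MIN = 25     # maximum auto-feed throughput
FEEDER_CAPACITY_PAGES = 50   # pages per feeder load
SCANNERS_PER_PREP_CLERK = 5  # one make-ready staff member supports up to five scanners

def scan_hours(pages: int, efficiency: float = 1.0) -> float:
    """Scanner hours for `pages`; efficiency < 1 models real-world slowdowns."""
    return pages / (RATED_PAGES_PER_MIN * efficiency) / 60

def feeder_loads(pages: int) -> int:
    """Number of times the feeder must be filled."""
    return math.ceil(pages / FEEDER_CAPACITY_PAGES)

def prep_clerks_needed(scanners: int) -> int:
    """Document preparation staff needed to keep `scanners` busy."""
    return math.ceil(scanners / SCANNERS_PER_PREP_CLERK)

batch = 15_000  # hypothetical backlog
print(scan_hours(batch))        # 10.0 hours at the rated speed
print(scan_hours(batch, 0.5))   # 20.0 hours at half the rated speed
print(feeder_loads(batch))      # 300
print(prep_clerks_needed(12))   # 3
```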

Disposition of Original Records: The original paper documents are stored after scanning according to a pre-established records retention schedule. The current records retention schedule provides for 20 year retention after completion of cost recovery litigation (Schedule Number NCI-412-85-27).

Quality Control: System managers conduct quality control of image data by visually inspecting random images displayed on the workstation screens. No specific technical image evaluation procedures exist.

Scanning Resolution: Cost recovery documents are scanned at 200 dpi.

Color and Gray Scale: The Bell & Howell CopiScan II Model 3338 scanners have gray scale capability but do not have color image recording capability.

Image Enhancement: The scanners have adjustable light/dark contrast settings.

Compression/Decompression: The SCRIPS imaging system uses proprietary I-Net Corporation image compression software algorithms.
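Because the compression algorithm is proprietary, its ratio cannot be derived from this report, but the raw image size it has to reduce can be. The sketch below assumes pages are stored as bilevel (1 bit per pixel) images, which is an assumption since the report only says the scanners are capable of gray scale, and it ignores any per-row padding a file format may add. It shows that an uncompressed 8.5 x 11 inch page is about 467 KB at 200 dpi and about 1.05 MB at 300 dpi, so scanning at 200 dpi cuts raw storage by a factor of 2.25.

```python
import math

def bilevel_page_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size in bytes of a 1-bit-per-pixel page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return math.ceil(pixels / 8)

letter_200 = bilevel_page_bytes(8.5, 11, 200)   # 1700 x 2200 pixels
letter_300 = bilevel_page_bytes(8.5, 11, 300)   # 2550 x 3300 pixels
legal_200 = bilevel_page_bytes(8.5, 14, 200)    # 1700 x 2800 pixels

print(letter_200, letter_300, legal_200)  # 467500 1051875 595000
print(letter_300 / letter_200)            # 2.25
```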

DOCUMENT INDEXING:

The SCRIPS image conversion work flow supports an integrated indexing and scanning process. Prior to scanning, indexing information is manually key entered using the original documents or through bar code technology when machine readable information is available.

Creation of Index Database: System operators enter index data for each document prior to scanning using workstation pull-down menus. Indexing can be performed using a keyboard or mouse, or by bar code reader/wand input for documents with suitable bar code symbology. Creating and maintaining an effective, error-free indexing system compatible with the EPA's Integrated Financial Management System (implemented in March 1989) is time consuming and presents a significant system design challenge.

Index Structures: Limitations of early SCRIPS indexing software restricted indexing codes to only eight fields. As the system's software has evolved and been customized for specific applications, the number of active fields indexed per document has grown significantly. Indexing support software now includes a modest amount of authority control from index tables. The indexing screen format was designed to give requestors enough information that retrieving the scanned images is often unnecessary. EPA's Integrated Financial Management System was the model for the SCRIPS indexing fields and image cataloging.

As SCRIPS evolved, the need for redundant indexing was reduced. SCRIPS currently uses two indices for non-payroll transactions: the Site/Spill Identifier (SSID) and the Bar Code. For payroll transactions the indices are social security number, SSID, pay period, and fiscal year. The integration of SCRIPS with SCORE$ (the Agency's Superfund site cost system) has allowed the Agency to reduce the number of indices. SCORE$ contains the financial transactions; using the Bar Code, SCORE$ prepares a list of the required Superfund site documents, which SCRIPS reads in order to produce the appropriate documents.
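The two index layouts and the SCORE$ document list can be pictured as a simple lookup. The sketch below is illustrative only: the record layout, field names, identifier formats, and the `pull_documents` function are hypothetical, since the report does not describe the actual SCRIPS/SCORE$ interface. It shows non-payroll entries keyed by SSID and bar code, payroll entries carrying SSN, pay period, and fiscal year, and a bar-code list being resolved into documents, with unmatched codes reported separately.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IndexEntry:
    doc_id: str                       # hypothetical image identifier
    ssid: str                         # Site/Spill Identifier
    bar_code: Optional[str] = None    # non-payroll transactions
    ssn: Optional[str] = None         # payroll transactions only
    pay_period: Optional[int] = None
    fiscal_year: Optional[int] = None

    @property
    def is_payroll(self) -> bool:
        return self.ssn is not None

def pull_documents(index: list[IndexEntry], required_bar_codes: list[str]):
    """Resolve a SCORE$-style list of required bar codes to scanned documents.
    Returns (doc_ids found, bar codes with no matching image)."""
    by_code = {e.bar_code: e.doc_id for e in index if e.bar_code}
    found = [by_code[c] for c in required_bar_codes if c in by_code]
    missing = [c for c in required_bar_codes if c not in by_code]
    return found, missing

index = [
    IndexEntry("D1", ssid="03A1", bar_code="BC001"),
    IndexEntry("D2", ssid="03A1", ssn="000-00-0000", pay_period=7, fiscal_year=1993),
]
print(pull_documents(index, ["BC001", "BC999"]))  # (['D1'], ['BC999'])
print(index[1].is_payroll)                        # True
```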

Location of Index Database: Index data is captured and stored on magnetic disk media for ease of update and to maintain adequate search and retrieval rates.

OPTICAL DIGITAL DATA DISK STORAGE:

Image data is downloaded to magnetic media and then to optical digital data disks within the region. For nationwide images, a 9-track magnetic tape is prepared in EPA's technical facilities in Research Triangle Park, North Carolina and delivered to the regions every two weeks.

Image File Headers: The image file headers are created by a proprietary adaptation of the IBM Electronic File Cabinet software. The software modification was performed by PRC, Inc. working for the EPA as a third-party system integrator.

Error Detection/Correction: Technical details of the error detection processes were not available to the system administrators. The error detection system is transparent to end users, and no optical digital data disk failures or data retrieval problems were noted to date.

Disk Drives: The LMSI RapidChanger five-disk units are manufactured by Laser Magnetic Storage International (LMSI) as model LD 4500. At regional offices not equipped with jukeboxes, the optical digital data disks are loaded automatically by the RapidChanger.

Recording Process: Twelve-inch, write once, read many (WORM), dual-sided media.

Optical Digital Data Disk Composition: LMSI glass substrate media.

Capacity: 5.8 GB per disk side (approximately 50,000 pages per disk).

Number of Optical Disks Used: Approximately 30 within the regions.

Jukebox: The FileNet Corporation jukebox holds 64 optical digital data disks. Two units are installed in Research Triangle Park, North Carolina, one in Region 5 (Chicago, Illinois), and one in Region 3 (Philadelphia, Pa.). In addition, two LMSI RapidChangers are, or will be, installed in each of the remaining eight regions.

Storage Environment: The IBM workstations and stand-alone optical disk drives operate under normal regional office temperature and humidity conditions. The SCRIPS optical digital data disk systems installed in Washington, DC and Research Triangle Park, North Carolina operate in raised floor computer rooms with heating, ventilation, and air conditioning (HVAC) and humidity controls.

RETRIEVAL AND OUTPUT:

EPA's document indexing goal is to obtain high search relevance and information recall, effectively streamlining the image searching process. This eliminates the need for users to "flip" randomly through the electronic image files. The EPA accepted this enhanced indexing capability as a trade-off for slightly slower image access and display times.

Primary System Users: The primary users of the system are clerks who either scan or retrieve the documents.

Display Output: Each SCRIPS workstation includes an IBM 19-inch diagonal display screen with 114 dots per inch resolution. In a PC-based workstation configuration, the monitors display 24 lines of 80 columns of data. The display monitors support VGA graphics and, within the Electronic Filing Cabinet application, display pull-down windows and two full-page images side by side.

System Network: The SCRIPS communication network has been decentralized. Previously, the regions sent their requests to the centralized data repository in Research Triangle Park. Now each region can produce its own cost recovery packages. The regions scan and retain their own images and receive, every two weeks, images from the nationwide finance offices.

Image Redaction: The SCRIPS system supports image redaction for on-screen viewing and printing. Using the workstation's mouse, the operator can "point" to an area designated for redaction, and then drag the mouse over the area to be covered. This procedure creates a blackened box over the designated image area, and the redacted image can be saved for later retrieval and printing.
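The point-and-drag redaction described above amounts to normalizing two mouse positions into a rectangle and blackening that rectangle in a copy of the image, consistent with the report's statement that redaction does not alter the original images. The sketch below is a minimal illustration under assumed conventions (pixel coordinates as (x, y), 1 = black, inclusive corners); the function names are hypothetical and say nothing about how the Electronic Filing Cabinet software actually implements redaction.

```python
Point = tuple[int, int]  # (x, y) in image pixel coordinates

def drag_to_box(start: Point, end: Point) -> tuple[int, int, int, int]:
    """Normalize a mouse drag in any direction to (left, top, right, bottom)."""
    (x0, y0), (x1, y1) = start, end
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)

def redact(image: list[list[int]], box: tuple[int, int, int, int]) -> list[list[int]]:
    """Return a copy of a bilevel image (1 = black) with the box filled black.
    The original image is left unchanged; box corners are inclusive."""
    left, top, right, bottom = box
    out = [row[:] for row in image]
    for y in range(max(top, 0), min(bottom + 1, len(out))):
        for x in range(max(left, 0), min(right + 1, len(out[y]))):
            out[y][x] = 1
    return out

page = [[0] * 4 for _ in range(4)]
redacted = redact(page, drag_to_box((2, 1), (1, 2)))
for row in redacted:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```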

Laser Printers: SCRIPS system laser printers provide hard copy access to text, graphics, and scanned images.

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: SCRIPS has many inherent capabilities including the ability to: move image and index data within the system; transfer image data to magnetic tape; and to output document images using laser printers. There are significant limitations, however, on the ability to share information with non-IBM software supported systems. Achieving a seamless interface of SCRIPS images and index data to the SCORE$ system is a key agency goal.

Network Transmission: SCRIPS uses an internal SNA Gateway communication network. EPA system administrators are investigating T-1 data communications (a data rate of 1.544 million bits per second) for future applications. Long distance data transfer is currently done by "sneaker-net", that is, by physically shipping media. EPA system administrators expect eventually to eliminate the site-to-site transport of nine-track magnetic tapes.
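To judge whether T-1 lines could replace shipped tapes, it helps to estimate transfer times. The report does not give the size of the biweekly image batches, so the sketch below uses one full 5.8 GB platter as an illustrative upper case, together with an assumed efficiency factor for framing and protocol overhead (the 1.544 Mbit/s line rate includes framing bits). It suggests that a full platter would occupy a dedicated T-1 line for more than eight hours even at the full line rate.

```python
T1_BITS_PER_SECOND = 1_544_000  # T-1 line rate quoted in the report

def transfer_hours(gigabytes: float, efficiency: float = 1.0) -> float:
    """Hours to send `gigabytes` (decimal GB) over one T-1 line.
    efficiency < 1 models framing and protocol overhead."""
    bits = gigabytes * 1e9 * 8
    return bits / (T1_BITS_PER_SECOND * efficiency) / 3600

print(f"{transfer_hours(5.8):.1f} h")                  # 8.3 h at full line rate
print(f"{transfer_hours(5.8, efficiency=0.8):.1f} h")  # 10.4 h with 20% overhead
```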

Backup of Image and Index Data: Routine data backups are performed to ensure against inadvertent data loss for optically stored images.

Technical Support and Documentation: The highest level policy documentation (i.e., the report "Guidance for Developing Image Processing Systems in EPA") is a real strength of EPA system development. At the technical level, the sites keep less specific on-site documentation than would be required if EPA did not have significant technical support personnel at Research Triangle Park. The SCRIPS system also has comprehensive user guides and training manuals.

Interoperability: The EPA has adopted a hybrid approach: at the index/database management level the trend is toward full agency integration and search capability, supported by stand-alone imaging systems.

Migration Plans: Given the mainly short-term use of the internal agency administrative data, EPA has little incentive at the operational level to plan for migration of SCRIPS data to future generations of imaging technology, although the Agency is considering migrating from the AS/400 platform to a PC-based, Novell-compatible processing system. Long-term upper management commitment to imaging technology is nonetheless demonstrated through: in-house technical support; a coordinated IRM strategy; and, a stated policy of coordination across the centralized and regional offices. In some ways, this management umbrella compensates for the lack of system interoperability, especially given the existing limitations of the optical digital data disk industry.

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: The use of SCRIPS has enabled the Agency to discontinue the active site file project, which required the Agency to make and keep paper copies of all site documents in each individual site file. Because many documents apply to many sites, a timesheet covering ten sites, for example, had to be copied ten times and placed in ten site files. SCRIPS has enabled EPA to free up storage space and increase resource efficiency.
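The saving comes from indexing one stored image under several sites instead of filing several paper copies. The sketch below uses hypothetical document and site identifiers and illustrative function names; it builds a site-to-document index and counts the copies the old active site file practice would have needed. A timesheet that applies to ten sites is stored once but appears in all ten site listings.

```python
from collections import defaultdict

def build_site_index(documents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each site identifier (SSID) to the documents that apply to it.
    Each document is stored once; it simply appears under several sites."""
    index: dict[str, list[str]] = defaultdict(list)
    for doc_id, ssids in documents.items():
        for ssid in ssids:
            index[ssid].append(doc_id)
    return {ssid: sorted(ids) for ssid, ids in index.items()}

def paper_copies_needed(documents: dict[str, list[str]]) -> int:
    """Copies the active site file practice required: one per document per site."""
    return sum(len(ssids) for ssids in documents.values())

sites = [f"S{n:02d}" for n in range(1, 11)]    # ten hypothetical sites
documents = {"TS-1": sites, "INV-7": ["S01"]}  # a timesheet and an invoice

site_index = build_site_index(documents)
print(len(documents), paper_copies_needed(documents))  # 2 11
print(site_index["S01"])                               # ['INV-7', 'TS-1']
```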

Mission Functions Supported: The SCRIPS imaging system supports processing of Superfund cost recovery reports by providing: improved records management through increased control and enhanced document processing; elimination of need to produce multiple photocopies for filing; improved production of reports compared to the old manual methods; and, automated electronic information redaction without altering the original images.

Agency-wide Planning: EPA senior management has adopted an enlightened approach to imaging systems by: establishing an advisory committee; publishing official internal guidance manuals; and, issuing detailed agency policy directives. The EPA's Office of Information Resources Management has issued a formal document entitled "Guidance for Developing Image Processing Systems in EPA". This document provides generic guidance to EPA managers and administrators in analyzing agency mission needs, and explains the process involved in justifying imaging systems. The report discusses alternative technologies and provides representative cost benefit data, developed under the guidance of the EPA's Image Processing Systems Committee (IPS).

The IPS, serving as a senior level management advisory group, evaluates and makes recommendations relative to acquiring imaging systems, and ensures that digital imaging systems are cost effective and meet agency needs. The committee contains a cross section of the EPA's top level managers, providing imaging technology with senior level visibility. The Office of Information Resources Management issued OIRM Policy Directive 90-01 dated 10/24/90 establishing the governing principles relative to the procurement and application of imaging systems for EPA organizations. Because imaging has become successful and more commonplace, the IPS committee has completed its task and is no longer active.

Integration with Existing and Future Systems: Based on the agency-wide approach towards integrating imaging systems, and the support of the agency's Image Processing Systems Committee, the SCRIPS index and image data storage is under evaluation for possible linkage to information from the EPA's Integrated Financial Management System through SCORE$. SCRIPS data currently supports the EPA's regional finance offices in producing cost recovery packages. The EPA's Administrative Division is considering optical digital data disk technology for agency reports and other top level documents.

Multi-site Information Processing: SCRIPS combines distributed data collection sites (the regional offices) with centralized nationwide data distribution from a system hub at Research Triangle Park, linked via a wide area network. Cost recovery packages are developed in the regions. A new T-1 data communications system is also under evaluation.

Interagency Cooperation: SCRIPS images and the EPA's hazardous substance sites database are available for use by other government agencies. For example, the data is considered a fundamental resource by research scientists at the Agency for Toxic Substances and Disease Registry (ATSDR). In return, ATSDR's health assessment reports are an important component of the EPA's Superfund cost recovery reports included in the SCRIPS system.


SITE VISIT REPORT #8

AGENCY: Federal Communications Commission

SYSTEM:
RIPS - Record Image Processing System

CONTACT: Rick Kanner, Federal Communications Commission, Washington, DC

SUMMARY DESCRIPTION:

The Record Image Processing System (RIPS) is the Federal Communications Commission's (FCC) initial entry into digital imaging technology. The RIPS system is a technology-based research platform for storing and retrieving important agency records using digital scanning, indexing, quality control, optical digital data disks, and laser printing processes. The FCC implemented this state-of-the-art optical digital data disk imaging system to improve user access, reduce document storage space, and increase security over the original public filings. These files are the official records of rulemaking and adjudicatory matters. RIPS workstations are located in the FCC's public reference room and adjacent areas for use by researchers, staff, and contractor personnel. The Public Reference Room provides access to a computerized document index system that contains descriptive docket information. The index database management system identifies records available in either RIPS electronic image format or the original paper. RIPS retains the scanned document images on twelve-inch optical digital data disks stored in an automated retrieval jukebox. High quality laser prints of all or part of the docket case files are available to FCC staff, and public users obtain fee-based laser prints from an on-site contractor. The Record Image Processing System is important to this study because of the: improved public user access to public filings; implementation of a fee-for-printing service system; the system software documentation storage arrangements; and, the Federal Communications Commission's ongoing utilization of a Federal Records Center for archived files.

BACKGROUND:

The Federal Communications Commission regulates interstate and foreign communications including: radio and television broadcasting; telephone, telegraph, and cable television operation; two-way radio and radio operators; and satellite communication. It is responsible for the orderly development and operation of broadcast services and the provision of rapid, efficient nationwide and worldwide telephone and telegraph services at reasonable rates. Its responsibilities also include promoting the safety of life and property through radio, and using radio and television facilities to strengthen the national defense. All FCC licensing proceedings that have been designated for hearing, and all rulemaking proceedings, are matters of public record, documented in sequentially numbered case dockets. The docket files provide information concerning litigation and action taken to resolve conflicts and issues regarding FCC rules, procedures, and the granting of licenses. A typical docket may contain correspondence to and from the public and law firms, petitions and hearing records, court rulings, and FCC procedural records and final rulings. Newly received filings are processed by FCC staff and are available for inspection the following workday. Documents are stored in binders by docket number or rulemaking number. Information pertaining to these filings is key entered into two automated FCC computer systems: the Dockets System, designed to track docket and rulemaking information; and, the Dockets History System, containing abstracts of significant filings entered in chronological order as an abbreviated case history. This licensing and rulemaking petition information is in the public domain and is made available for public inspection. While reviewing the paper docket files, FCC staff share the agency's public reference room, and its photocopy equipment, with the general public.

Under the existing paper records system, two copies of each docket are maintained: the original filing; and, a photocopy used to satisfy user requests (unless the original is required). The process begins when a user submits a printed form containing the specific docket number, which the information technician uses to search the paper files manually. Public researchers must view the files in the public reference room, while FCC staff may remove a docket file to their assigned work areas. The first-come, first-served nature of the process and the labor-intensive, time-consuming searches often delay attorneys, their legal assistants, and others. Obtaining photocopies from an on-site copier contractor takes additional time, as the files may contain hundreds of pages. A further problem is that the files may be incomplete because of misfiled, lost, or stolen documents. The average life span of a docket is three years, although some remain active considerably longer. Closed or terminated dockets are transferred to a Federal Records Center (FRC) for archival storage; dockets for proceedings before an Administrative Law Judge also require certification by the appropriate official. The duplicate photocopies are removed from the shelves to make room for newly received dockets. The docket files may be retrieved from the FRC at the request of FCC staff or the general public, but retrieval can take up to ten days.

Origins: The search for alternative approaches to improving access to, and security over, docket information resulted from two factors: unavoidable delays in providing users access to the records; and, concern among FCC reference personnel over increasing theft of and damage to the holdings. A requirements analysis study completed in 1988 determined the baseline management needs and user access issues. A technology feasibility study conducted six months later evaluated the suitability of optical imaging technology for automating existing FCC functions. The FCC's Request for Proposals (RFP-90-07, dated 23 May 1990) asked the technology vendor community for a comprehensive functional approach and required specific hardware and software cost data. After an initial delay caused by agency funding problems, the contract was awarded in July 1991 to Severn Inc. (a subsidiary of ICF International) of Lanham, Maryland. Severn Inc. staff conducted a live test capability demonstration in October 1991. The system equipment was installed at FCC headquarters, and staff training was conducted in November 1991. System acceptance testing was completed in December 1991, followed by full production operations beginning in January 1992. Approximately eight staff members from the FCC's Public Information and Reference Services are assigned various responsibilities within the RIPS operation.

SYSTEM CONFIGURATION:

Date System Installed: 1991

System Installed by: Vendor

System Configuration Changed Since Installation? Basically, the system configuration has not changed since installation. However, an additional 1.3 gigabyte external hard disk was installed.

The system configuration includes:
  • Sun Microsystems Sparc 330 file server--Unix operating system.
  • Novell Inc. NetWare Local Area Network (LAN) for GOSIP compliant workstation communications.
  • Everex Systems Inc. 386-based workstations; Plexus XDP software.
  • MegaScan Technology Inc. 19-inch high resolution displays.
  • Workstation software: Microsoft Corporation's Windows Version 3.0; WordPerfect Corporation's WordPerfect Version 5.1.
  • Fujitsu M3093 flat-bed document scanners with automatic document feeders.
  • Sony WDA-610 Jukebox, Sony optical drives and controllers, Sony 12-inch optical media.
  • Staff and public 300 DPI (16 page/min.) Fujitsu M3722 laser printers.

DIGITAL IMAGE CAPTURE:

The document scanning team adopted a "day forward" conversion approach, starting with newly received 1992 docket filings. Approximately ten older, high-interest dockets ("hot dockets") were retrospectively converted to test system performance. Docket materials are received and routinely date stamped by the FCC's Office of the Secretary. The agency's production goal is next-day public access to docket filings. This includes: a docket history for pre-RIPS documents; and, scanned images and key entered index data for post-RIPS documents. RIPS is capable of optical character recognition (OCR), and plans to use it to improve data capture from standardized forms are under consideration.

Conversion Staff: Contractor employees, not supervised directly by agency staff, operate the scanning and indexing equipment after normal FCC business hours (third/overnight shift).

Document Preparation: Documents are prepared toward the end of the normal work day (2 pm to 10:30 pm). An average of 1,000 documents is received each day; these are grouped into batches and sorted within each batch by docket number. Documents with unusual ink or paper colors, or that are over- or under-sized, are batched together, with the condition noted on each document, to ensure special handling at the scanning station.
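The preparation routine described above (batch the day's receipts, sort each batch by docket number, and set aside items that need special handling) can be sketched as follows. This is a minimal illustration only; the `Document` fields, the batch size of 50, and the special-handling notes are hypothetical and do not describe the actual RIPS software.

```python
from dataclasses import dataclass

@dataclass
class Document:
    docket_number: str          # hypothetical: docket number kept as a string
    pages: int
    special_handling: str = ""  # e.g. "yellow paper", "oversize"; empty if routine

def prepare_batches(documents, batch_size=50):
    """Split a day's receipts into batches, each sorted by docket number.

    Documents flagged for special handling are batched together so the
    scanning station can treat them differently (assumed workflow).
    """
    routine = [d for d in documents if not d.special_handling]
    special = [d for d in documents if d.special_handling]

    def chunk(docs):
        # Batch first, then sort within each batch. This is a plain string
        # sort; a production system would parse docket numbers properly.
        return [sorted(docs[i:i + batch_size], key=lambda d: d.docket_number)
                for i in range(0, len(docs), batch_size)]

    return chunk(routine), chunk(special)
```

Keeping the special-handling items in their own batches mirrors the practice of noting the condition on each document so the scanner operator can adjust settings once per batch rather than per page.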

Document Scanning: Two identical flat-bed document scanners equipped with automatic feeders are used. One scanner was originally designated for production, with the second serving as a backup. After gaining experience, FCC system managers reallocated the scanners so that, depending on the workload, one is dedicated to routine production and the second performs quality control and rescanning. The scanning equipment accepts two-sided documents up to 8.5 x 14 inches, of varying thicknesses and physical conditions. The scanner software assigns a unique document control number to each document scanned, and a management reporting system automatically collects scanner production statistics.
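Assigning a unique control number to each scanned document while tallying production per scanner can be sketched in a few lines. The `ScanStation` class, the "RIPS" prefix, and the eight-digit number format are hypothetical; the report does not describe the actual numbering scheme.

```python
import itertools
from collections import Counter

class ScanStation:
    """Hypothetical model of control-number assignment and production counts."""

    def __init__(self, prefix="RIPS"):
        self._serial = itertools.count(1)   # monotonically increasing serial number
        self.prefix = prefix
        self.pages_scanned = Counter()      # pages scanned, per scanner
        self.documents_scanned = Counter()  # documents scanned, per scanner

    def scan(self, scanner_id, pages):
        """Record one document and return its unique control number."""
        control_number = f"{self.prefix}-{next(self._serial):08d}"
        self.pages_scanned[scanner_id] += pages
        self.documents_scanned[scanner_id] += 1
        return control_number
```

A single shared counter guarantees uniqueness across both scanners, which matters once one unit is reassigned from production to rescanning.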

Disposition of Original Records: Original documents are retained for longer than 10 years (permanent).

Quality Control: Images are visually checked immediately after scanning. A second, more detailed inspection occurs at quality control, where a "scan/edit" function allows errors to be corrected. A log of the types and quantities of rescanned images provides useful feedback to the prototype developers. The scanned images are temporarily stored in a magnetic disk buffer before being recorded onto the optical digital data disks, so that any necessary changes or rescans can be made before permanent recording. FCC Public Information and Reference Services staff conduct quality evaluations the following morning to assess contractor performance, including: scanned image legibility; page sequencing; and, key entry index data accuracy. The scanned images are compared to the original documents, and pass/fail determinations are based on the observer's judgment. Rescan operators use the scanner's contrast controls to correct improperly exposed (too light or too dark) images. Images may also be reordered, and skewed images may be rescanned.
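The reason for the magnetic buffer is that corrections are cheap before recording and impossible afterward on write-once media. The sketch below models that split under stated assumptions: `StagingBuffer` and `WormVolume` are hypothetical names, images are plain strings, and a WORM slot is simulated by refusing any second write to the same control number.

```python
class WormVolume:
    """Simulated write-once store: a slot, once written, is never overwritten."""

    def __init__(self):
        self._slots = {}

    def write(self, control_number, pages):
        if control_number in self._slots:
            raise PermissionError(f"{control_number} already recorded (write-once)")
        self._slots[control_number] = pages

    def read(self, control_number):
        return self._slots[control_number]


class StagingBuffer:
    """Magnetic-disk staging area where pages can still be replaced or reordered."""

    def __init__(self):
        self.pages = {}  # control number -> list of page images, in order

    def add(self, control_number, pages):
        self.pages[control_number] = list(pages)

    def rescan_page(self, control_number, index, new_image):
        self.pages[control_number][index] = new_image

    def reorder(self, control_number, new_order):
        pages = self.pages[control_number]
        self.pages[control_number] = [pages[i] for i in new_order]

    def commit(self, control_number, volume):
        # Called only after quality control passes; the pages leave the
        # buffer and are recorded permanently on the write-once volume.
        volume.write(control_number, tuple(self.pages.pop(control_number)))
```

In use, an operator would rescan a too-dark page and fix the page order while the document is still in the buffer, then commit it; any later correction would require recording a new copy rather than editing the old one.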

Scanning Resolution: The docket case file documents are digitally scanned at 300 dots per inch (dpi) for improved screen display and print legibility.
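To give a sense of scale, the raw size of a bitonal (1 bit per pixel) page scanned at 300 dpi follows from simple arithmetic. This sketch computes uncompressed sizes only; actual RIPS storage requirements depend on its proprietary compression, which the report does not describe.

```python
def uncompressed_bytes(width_in, height_in, dpi=300, bits_per_pixel=1):
    """Raw size in bytes of a scanned page before compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

letter = uncompressed_bytes(8.5, 11)  # 2550 x 3300 pixels
legal = uncompressed_bytes(8.5, 14)   # 2550 x 4200 pixels
print(letter, legal)                  # 1051875 1338750
```

A letter-size page is thus roughly 1 MB before compression, which is why compression matters so much for both network transmission and optical storage.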

Color and Gray Scale: The RIPS scanners record binary (black and white) images only. Documents with pink or yellow ink or paper may require rescanning because the scanner cannot reliably detect those colors ("scanner blindness"). Complex or "busy" documents, such as cartographic illustrations, may create compression and data transmission problems.

Image Enhancement: The RIPS scanners are equipped with basic light/dark contrast threshold image processing.
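Binary capture with a light/dark threshold can be illustrated with a generic global-threshold sketch; it is not the RIPS scanner's firmware. The gray scale (0 = black, 255 = white), the default threshold of 128, and the sample readings are assumptions. The example shows why pale inks can drop out: a light yellow stroke reflects nearly as much light as white paper, so it lands on the "white" side of the threshold unless the operator chooses a darker setting.

```python
def binarize(gray_row, threshold=128):
    """Convert 8-bit gray values (0=black, 255=white) to bits (1=black ink)."""
    return [1 if value < threshold else 0 for value in gray_row]

# Hypothetical reflectance readings across one scan line:
# white paper (250), black text (20), pale yellow ink (210).
row = [250, 20, 20, 250, 210, 210, 250]

print(binarize(row))                 # [0, 1, 1, 0, 0, 0, 0]  yellow ink drops out
print(binarize(row, threshold=230))  # [0, 1, 1, 0, 1, 1, 0]  darker setting keeps it
```

Raising the threshold too far would also turn stains and paper texture black, which is why contrast is adjusted per document at rescan rather than set permanently dark.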

Compression/Decompression: RIPS uses a fully proprietary image compression scheme before network transmission and storage. Testing has shown that 33 megahertz (MHz) synchronous microprocessors compress, display, and print images faster than slower 25 MHz devices.
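Because the RIPS compression scheme is proprietary and undocumented, its details cannot be shown here. As a stand-in, the sketch below uses plain run-length encoding, the basic idea underlying CCITT Group 3 and Group 4 facsimile coding, to illustrate why "busy" documents compress poorly: a scan line with many short black/white transitions produces far more runs than a line of ordinary text.

```python
from itertools import groupby

def run_lengths(bits):
    """Encode a binary scan line as (pixel value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(bits)]

plain_line = [0] * 40 + [1] * 5 + [0] * 55  # mostly white, one text stroke
busy_line = [0, 1] * 50                     # e.g. fine hatching on a map

print(len(run_lengths(plain_line)))  # 3
print(len(run_lengths(busy_line)))   # 100
```

Both lines are 100 pixels wide, but the busy line needs over thirty times as many runs, so its encoded form is correspondingly larger and slower to transmit.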

DOCUMENT INDEXING:

On average, the Public Information and Reference Services' staff establishes two new dockets every workday. Index records for new docket cases are created in the computerized data management system, which contains an index to the paper docket files received since 1983. The index records indicate the availability of RIPS scanned documents. The docket information system provides three case types: Active Cases, which are frequently requested and remain available in the disk cache; Archived Cases, which are loaded into the disk cache upon user request; and, Retired Cases, which are considered non-scannable back files.
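The three case types imply different retrieval behavior, which can be sketched as below. The `CaseType` names follow the text; the `open_case` function, the dictionary cache, and the `load_from_optical` callback are hypothetical stand-ins for the actual RIPS components.

```python
from enum import Enum

class CaseType(Enum):
    ACTIVE = "active"      # frequently requested; kept in the disk cache
    ARCHIVED = "archived"  # on optical disk; loaded into the cache on request
    RETIRED = "retired"    # non-scannable back file; paper only

def open_case(case_type, docket, cache, load_from_optical):
    """Return the images for a docket according to its case type (hypothetical)."""
    if case_type is CaseType.RETIRED:
        return None  # no images exist; the researcher must consult the paper file
    if docket not in cache:
        # Archived cases (or an active case that was evicted) are staged
        # from optical media into the magnetic disk cache.
        cache[docket] = load_from_optical(docket)
    return cache[docket]
```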

Creation of Index Database: Document indexing may be performed either before or immediately after scanning, but the index must be created before the scanned image can be saved. The docket itself is used as a key entry "face card" for completing the index data fields. A docket history is created from the index data, and users may scroll through an on-screen listing of documents stored in the system. The index database indicates to the requester whether a document is available as an optical digital data disk image.
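The rule that an index record must exist before a scanned image can be saved amounts to a simple guard. A minimal sketch, assuming a hypothetical in-memory `ImageStore` keyed by document control number:

```python
class ImageStore:
    """Hypothetical store enforcing index-before-image ordering."""

    def __init__(self):
        self.index = {}   # control number -> index fields
        self.images = {}  # control number -> image data

    def add_index(self, control_number, fields):
        self.index[control_number] = dict(fields)

    def save_image(self, control_number, image):
        # Refuse to save an image that no index record points to;
        # otherwise it could never be found by searching the index.
        if control_number not in self.index:
            raise ValueError(f"no index record for {control_number}")
        self.images[control_number] = image
```

Enforcing the order at save time means an unindexed ("orphan") image can never reach storage, which is especially important when the storage is write-once.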

Document Index Terms: The FCC uses Standard Form 3060-0486 to collect the appropriate document indexing information. The form requests that the first seven items be typed in the spaces provided, with the maximum number of characters shown in parentheses. The completed form is then attached to the first page of the filing.
  • Docket Number (7 characters)
  • Rulemaking Number (8)
  • Date of Filed Document (mm/dd/yy) (8)
  • Name of Applicant/Petitioner (last, first, mi) (25)
  • Law Firm Name (25)
  • Attorney/Author Name (last, first, mi) (25)
  • File Number (20)
FOR FCC USE ONLY:
  • Document Type (2)
  • FCC/DA Number (10)
  • Release/Denied Date (mm/dd/yy) (8)
  • Receipt/Adopted/Issued Date (mm/dd/yy) (8)
  • Viewing Status (1)
  • Ex Parte/Late Filed (1)

This information serves two purposes: it provides the RIPS key entry operator with a quick reference source during the database building process; and, it assists public users in querying docket information.
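Because the form fixes a maximum width for every field, an index record can be checked before key entry is accepted. The limits below are taken from the form as listed above; the Python field names and the `validate_index_record` function are hypothetical.

```python
from datetime import datetime

# Maximum field lengths from the FCC indexing form; field names are hypothetical.
FIELD_LIMITS = {
    "docket_number": 7,
    "rulemaking_number": 8,
    "date_filed": 8,
    "applicant_name": 25,
    "law_firm": 25,
    "attorney_name": 25,
    "file_number": 20,
    "document_type": 2,
    "fcc_da_number": 10,
    "release_denied_date": 8,
    "receipt_adopted_issued_date": 8,
    "viewing_status": 1,
    "ex_parte_late_filed": 1,
}
DATE_FIELDS = {"date_filed", "release_denied_date", "receipt_adopted_issued_date"}

def validate_index_record(record):
    """Return a list of problems; an empty list means the record fits the form."""
    problems = []
    for field, value in record.items():
        limit = FIELD_LIMITS.get(field)
        if limit is None:
            problems.append(f"unknown field: {field}")
        elif len(value) > limit:
            problems.append(f"{field}: {len(value)} > {limit} characters")
        elif field in DATE_FIELDS:
            try:
                datetime.strptime(value, "%m/%d/%y")  # the form's mm/dd/yy layout
            except ValueError:
                problems.append(f"{field}: not mm/dd/yy")
    return problems
```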

Location of Index Database: Image index, docket history, and other character-based index data are stored magnetically. Magnetic disks allow the index data to be easily revised and support more efficient index searches.

Indexing Software: The docket history index database is in ASCII format. When a non-imaged docket index record is retrieved, the system allows the researcher to scroll forward and backward through the docket history text file and perform simple character searches. The indexing system was written using X-Turbo and Plexus XDP software. The system software can also restrict user access to documents selectively classified as not for public or staff viewing or printing.

OPTICAL DIGITAL DATA DISK STORAGE:

After scanning, indexing, and quality control acceptance, the images are written to permanent write-once, read-many (WORM) optical media. The transfer from magnetic to optical storage is a software-driven function accessed through the system manager's menu-driven operating system.

Image File Headers: No technical information is available from the vendor describing the system's proprietary image file header format.

Error Detection/Correction: No information is available from the vendor; no optical digital data disk failures or lost image data have been reported.

Recording Process: WORM technology; Sony bi-metallic alloy recording process.

Optical Digital Data Disk Composition: Polycarbonate disk substrate.

Capacity: 12-inch diameter, dual-sided optical media; data storage capacity is 3.2 gigabytes (GB) per side, or 6.4 GB per optical digital data disk.

Number of Optical Digital Data Disks in Use: There are currently 10 optical digital data disks in the jukebox.

Jukebox: Sony Corporation jukebox with a 50-disk capacity, equipped with two Sony optical drives and a Sony drive controller. Total jukebox data storage capacity is 320 gigabytes. The RIPS system initially experienced intermittent software problems when committing images to optical digital data disks or loading platters into the jukebox. When these problems occurred, the jukebox would reset, temporarily disrupting service to the system.
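The capacity figures above can be cross-checked with a few lines of arithmetic. This restates only the numbers given in the report (3.2 GB per side, two sides per disk, 50 jukebox slots, 10 disks in use) and makes no assumption about how many page images fit on a disk.

```python
GB_PER_SIDE = 3.2   # Sony 12-inch WORM platter, per the report
SIDES_PER_DISK = 2

def capacity_gb(disks):
    """Total capacity in gigabytes for a given number of dual-sided disks."""
    return disks * SIDES_PER_DISK * GB_PER_SIDE

print(f"{capacity_gb(1):.1f} GB per disk")          # 6.4 GB per disk
print(f"{capacity_gb(50):.1f} GB fully loaded")     # 320.0 GB fully loaded
print(f"{capacity_gb(10):.1f} GB on disks in use")  # 64.0 GB on disks in use
```

With 10 of 50 slots filled, the jukebox is at one fifth of its rated 320 GB.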

Storage Environment: The imaging system's server hardware and optical digital data disk jukebox are located in an air conditioned, security controlled computer room in the FCC's Washington, DC headquarters. The document scanners and user retrieval workstations are in normal office environments.

RETRIEVAL AND OUTPUT:

The RIPS indexing subsystem includes listings of documents available in image format and a historical abstract of the dockets prepared in report writer style. The digital images are automatically retrieved from the optical media and transmitted to the magnetic disk cache of the user's workstation. The user workstations feature nineteen-inch high resolution monitors for displaying the index data and scanned images. Remote agency staff can access index data using RS-232-based terminals.

Primary System Users: Staff and public users access the index and image data using RIPS workstations.

Search Fields: Public researchers use computer workstations and powerful search capabilities to locate docket-related information. Because on-line help is limited, users search most successfully once they have gained experience with the system. After signing on, the researcher is presented with a succession of search fields. For example, the "Subject" field allows keyword and free-text searching, and Boolean searching is supported, with "*" serving as the truncation symbol. The document screen display includes numerous field identifiers, including (but not limited to):
  • Bureau
  • Docket
  • Rulemaking
  • Part of the Docket
  • Status (Open or Restricted)
  • File Number
  • Subject
  • Petitioner
  • Filed by
  • Location
  • Channel
  • Call Sign
  • Date Closed
  • Appeal Number
  • Commission Decision

If a researcher chooses to search a docket history, for example, the FCC's computer system responds within approximately five seconds with: the chronological order of events with a date and description; the type of document (e.g., motion, letter, order, report); the names of significant people; reference numbers; and any additional narrative information.
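Truncation searching with "*" can be sketched by turning each search term into a whole-word regular expression. The report does not document the RIPS query syntax beyond the "*" symbol, so the functions below are a hypothetical illustration built on Python's standard `re` module.

```python
import re

def truncation_pattern(term):
    """Turn a term like 'broadcast*' into a case-insensitive whole-word regex.

    '*' matches any word ending; all other characters match literally.
    """
    parts = [re.escape(p) for p in term.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)

def search_subject(records, term):
    """Return the subject lines that contain a word matching the term."""
    pattern = truncation_pattern(term)
    return [r for r in records if pattern.search(r)]

subjects = ["Broadcasting of FM translators", "Cable television rates",
            "Broadcast ownership rules"]
print(search_subject(subjects, "broadcast*"))  # matches "Broadcasting" and "Broadcast"
print(search_subject(subjects, "broadcast"))   # matches only "Broadcast"
```

Without the truncation symbol the term must match a whole word, which is why experienced users learn to add "*" to catch plurals and other word forms.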

Display Output: The workstations feature 19-inch high resolution (150 DPI) MegaScan Corporation monitors capable of displaying two 8.5 x 11-inch document images side-by-side.

Disk Buffer: The imaging system's magnetic disk buffer has a 12,000-page capacity. The first two images are retrieved from optical storage immediately upon user request and stored in the workstation's cache memory. While these first two images are being viewed, the remaining images in the requested file are retrieved automatically.
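The "first two pages now, the rest in the background" retrieval strategy can be sketched with Python's `concurrent.futures`. The `open_document` function and the `fetch` callback (standing in for a read from optical storage) are hypothetical; only the two-page lead comes from the report.

```python
from concurrent.futures import ThreadPoolExecutor

def open_document(page_ids, fetch, executor, first=2):
    """Fetch the first pages immediately and queue the rest in the background.

    Returns (pages ready for display now, futures for the remaining pages).
    """
    immediate = [fetch(p) for p in page_ids[:first]]              # user waits for these
    pending = [executor.submit(fetch, p) for p in page_ids[first:]]  # fetched while viewing
    return immediate, pending

def fake_fetch(page_id):
    # Hypothetical stand-in for reading one page image from optical media.
    return "image:" + page_id

with ThreadPoolExecutor(max_workers=1) as pool:
    now, later = open_document(["p1", "p2", "p3", "p4"], fake_fetch, pool)
    rest = [f.result() for f in later]
```

The design hides most of the optical retrieval time: the user starts reading as soon as two pages arrive, while the remaining pages stream into the cache.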

Laser Printing: The system releases queued print requests to the staff printers and provides public users with fee-based laser prints on request. The 300 DPI printers output ASCII text or images at 16 pages per minute. Hard copy prints can be generated either singly or in batch mode. Contractor employees handle all printing and fee collection associated with public print requests. RIPS software restricts printing of images that are not to be released to staff and/or the public.
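Print restriction by release status can be sketched as a lookup from a document's one-character "Viewing Status" (a field on the indexing form) to the user classes allowed to print it. The codes "P", "S", and "R" and their meanings are hypothetical; the actual RIPS values are not given in this report.

```python
# Hypothetical viewing-status codes mapped to the user classes allowed to print.
RELEASE_RULES = {
    "P": {"staff", "public"},  # releasable to everyone
    "S": {"staff"},            # staff only
    "R": set(),                # restricted: no printing
}

def can_print(viewing_status, user_class):
    """Return True if a user of this class may print the document.

    Unknown status codes are treated as restricted (fail closed).
    """
    return user_class in RELEASE_RULES.get(viewing_status, set())
```

Failing closed on an unrecognized code keeps a keying error from accidentally releasing a restricted document.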

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: Existing Dockets System and Dockets History System data were converted for RIPS use with no loss of information permitted. The FCC plans to make docket information more widely available by: expanding the existing LAN; and, replacing existing dumb terminals with new PC workstations.

Network Transmission: RIPS uses an internal Novell NetWare local area network that currently does not support remote access links.

Backup of Image and Index Data: Index data is stored on magnetic media on UNIX servers. A Sun workstation with 150 MB magnetic tapes is used for daily data backup, and transaction log files are backed up using X-Turbo. Scanned digital image data is written onto two optical digital data disks simultaneously. One optical digital data disk serves as the user access copy; the second disk, when filled to its pre-determined capacity, is stored off-site. This provides secure off-site data storage, and the original paper records and filing systems are also maintained.
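The dual-recording scheme (write every image to two disks at once and ship the twin off-site when it fills) can be modeled as follows. `MirroredRecorder` is a hypothetical name, and capacity is counted in images rather than bytes for simplicity.

```python
class MirroredRecorder:
    """Write each image to an access disk and a backup twin simultaneously.

    When the twin reaches its pre-determined capacity it is sent off-site and
    a new pair of disks is started. Capacity is in images (an assumption).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.access_disks = [[]]  # disks kept on-site (jukebox) for user access
        self.backup_disk = []     # twin of the current access disk
        self.offsite = []         # filled backup disks already sent off-site

    def record(self, image):
        self.access_disks[-1].append(image)  # both copies written together
        self.backup_disk.append(image)
        if len(self.backup_disk) == self.capacity:
            # Capacity reached: ship the backup and start a new disk pair.
            self.offsite.append(self.backup_disk)
            self.backup_disk = []
            self.access_disks.append([])
```

One consequence of this design is that the newest images, those on the partly filled twin, exist only on-site until the disk fills; the retained paper originals cover that gap.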

Technical Support and Documentation: Initial monitoring of systems operations, problem diagnosis, and trouble-shooting is performed by on-site FCC technical system managers. RIPS has maintenance contracts for vendor-supplied technicians, software maintenance, and spare and/or replacement parts. FCC system administrators are confident that the technical documentation and training materials provided by Severn Inc. are very comprehensive and of high quality. Printouts of the source code for the RIPS system's proprietary software (written in a fourth-generation language, or 4GL) are to be maintained in a security-controlled escrow account. This will provide the FCC with critical backup documentation in case of unforeseen problems with the software maintenance vendor.

Interoperability: Although the RIPS system currently offers minimal interoperability, the agency plans to upgrade staff terminals and communications to provide increased information access. The Severn Inc. Plexus XDP software is proprietary to RIPS. The workstations are based on level 1 of the Small Computer System Interface (SCSI-1).

Migration Plans: Studies conducted during the design phase indicated that dockets are most likely to be consulted during the first three to five years of their life. Based on this, optical digital data disks containing digital images older than five years will be retired and "archived." The system's user interface provides a message if the file is "off-line," and provides a capability to reverse the archive process to allow user access to the data. RIPS tracks the status and location of all archived optical digital data disks, and notifies the system operator to load a specific optical digital data disk upon user request.
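The retirement policy and the operator notification can be sketched together. The five-year threshold comes from the design studies above; the rule that a disk is retired only when its newest image has passed the threshold, and all function names, are assumptions made for illustration.

```python
from datetime import date

ONLINE_YEARS = 5  # dockets are consulted mostly in their first three to five years

def add_years(d, years):
    """Same month and day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def disks_to_archive(newest_image_dates, today):
    """IDs of disks whose newest image is more than ONLINE_YEARS old (assumed rule)."""
    return [disk for disk, newest in newest_image_dates.items()
            if today > add_years(newest, ONLINE_YEARS)]

def request_disk(disk_id, online_disks, notify_operator):
    """Return True if the disk is on-line; otherwise ask the operator to load it."""
    if disk_id in online_disks:
        return True
    notify_operator(f"Load archived optical disk {disk_id}")
    return False
```

Keying the decision to the newest image on each disk avoids taking a platter off-line while it still holds dockets within their active period.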

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: In anticipation of the new imaging system, the FCC's Office of the Secretary, under whose auspices RIPS was developed, ensured that original documents were received and maintained as part of the official record. Previously, the responsible bureau or office received the original petition for rulemaking, and copies were maintained in the official files. In addition, beginning with the 1992 docketed and rulemaking proceedings, duplicate copies were no longer maintained.

Prototype Evaluation: The FCC's Public Information and Reference Services conducts technical reviews of the system's performance. Its staff also meets regularly with public reference room users to collect system data. The advantages of RIPS identified to date include: the system's user friendliness; increased service to the public through better file access, printing, and information searching capabilities; reduced paper document storage space needs; and, improved file integrity and document preservation. The technical reviews and user feedback will provide valuable input to the agency's decision on whether to proceed with Phase Two of the technology implementation.

System Expansion: FCC's Public Information and Reference Services has adopted an agency-wide, three-phase integration approach for imaging technology. Phase One requires the successful in-house implementation of the RIPS imaging system for staff and public users. Phase Two includes: expanding the RIPS network agency-wide to provide electronic docket information to FCC staff using existing PCs; and, replacing existing computer terminals with PC workstations. Phase Three involves making RIPS docket information available to remote users via dial-up modem links.

Future Systems: FCC management is confident that the agency has acquired sufficient experience with imaging systems that any future systems could be integrated by in-house technical staff. The agency would serve as its own systems integrator and specify commercial off-the-shelf (COTS) equipment whenever possible.

Public Access: Remote user access via modem links raises several legal issues concerning the use of electronic images rather than the original documents. These include: authentication of handwritten signatures; security, including restricting access to images and prints; how much image processing is acceptable after the original scan; how to factor in cost recovery; and, how best to introduce attorneys and other non-technical individuals to using computer technology efficiently.

System Reporting: RIPS has a transaction log capability for recording important production statistics. FCC management is evaluating the need for, and features of, additional system reporting capabilities.


SITE VISIT REPORT #9

AGENCY: Library of Congress

SYSTEM: LC-CRS Optical Imaging System

CONTACT: Ann Christy, Information Technology Services, Library of Congress, Washington, DC; Kristin Vajs, Congressional Research Service, Library of Congress, Washington, DC

SUMMARY DESCRIPTION:

Since November 1990, the Congressional Research Service (CRS) of the Library of Congress (LOC) has operated a digital imaging and optical digital data disk system in support of congressional research and reference activities, including a current awareness service. This system provides CRS and congressional staff with access to two extensive electronic bibliographic databases of recent public policy literature and digital images of selected items. The CRS system utilizes Sun workstation platforms and links to the SCORPIO system via a 3270 interface. A PC-based multitasking workstation with a multifunctional printer is currently being developed under contract. A high-speed dual-sided scanner captures the document images. Images are recorded onto 12" optical digital data disks stored in an automated jukebox containing two drives. A single interface allows users to search the two databases and identify, retrieve, and view document images on 19" high resolution display monitors. Remote print capabilities also exist on workstations linked in a Local Area Network across several Capitol Hill facilities. Future plans include expanding the variety of information stored in the system, updating the existing scanning system, an improved user interface, and linking the system to the Capitol Hill-wide network, CAPNET, which is currently under development. The CRS system is important to this study because of: the long term experience with imaging systems acquired by the LOC since the early 1980's; the data communications link to a mainframe computer; and, the visibility and high performance demands placed on the imaging system.

BACKGROUND:

The Library of Congress was established in 1800, with a primary responsibility of providing information services to Congress. As the Library has developed, its range of services has expanded to include the entire government in all its branches and the public at large, making it a national library for the United States. The Library's extensive collections are universal in scope, including books, manuscripts, maps, photographs, prints, drawings, audio and video recordings, motion pictures, microforms, newspapers, periodicals, and pamphlets on every subject and in a multitude of languages.

The mission of the Congressional Research Service, a department of the Library of Congress, is to support the research and information needs of the Legislative Branch of the Federal government. The Congressional Research Service does not serve the general public. The CRS provides objective, nonpartisan research, analysis, and informational support of the highest quality to assist Congress' legislative, oversight, and representative functions. CRS is organized to respond objectively to congressional inquiries for information and analysis at every stage of the legislative process, and in subject areas relevant to policy issues before Congress. The CRS is composed of seven research divisions spanning a range of subjects and disciplines, and two information divisions providing reference, bibliographic, and other informational services. CRS creates specialized reading lists for Members of Congress and their staffs, and disseminates other materials of interest. The CRS's Selective Dissemination of Information (SDI) Service is a current awareness service for newly released public policy literature.

Beginning in 1975, SDI documents were converted to microfiche to improve document control and distribution. The microfiche system served adequately into the 1980's. As time went on, Library managers were confronted with increased user demand for information while concurrently dealing with deteriorating and obsolete microfiche printing equipment. Delays inherent in the preparation of microfiche sparked interest in technology which could provide faster access to the documents. These factors were instrumental in the decision to migrate to an optical digital data disk imaging system as the primary document storage source. The more than eleven hundred SDI service subscribers define their unique areas of research interest, and receive customized summaries of the latest reports and articles each week. Librarians who are subject area specialists select material from over one thousand recently received serials, government documents, "think tank" publications and research studies. The selected items are indexed, abstracted, and entered in the Library's computer databases. Subscribers receive listings of new subject area information and they may request the full text of items of interest by using an SDI order sheet. The information stored in the imaging system is also used by CRS technical and professional staff when responding to congressional inquiries and performing other research, and by Congressional staff using image workstations in CRS reference centers.

Origins: A decision was made in 1984 to develop a small pilot optical digital data disk system for testing the feasibility of using imaging technology. Library administrators sought noncopyrighted Government materials for pilot scanning; permission to scan copyrighted material was obtained from the publishers beforehand. Pilot testing focused on the Congressional Record and congressional publications for the 99th Congress, the Louis E. Asher autograph collection of presidential and vice-presidential letters and portraits, and serials with article-level indexing. The original LOC pilot imaging system was developed and installed by what is now Integrated Automation. The pilot system was a popular public demonstration tool and confirmed the efficacy of existing scanning technology. However, it suffered from significant printer interface problems and poor print hardware vendor support. A subsequent contract to redo the hard copy printing subsystem allowed the Congressional Research Service to use the pilot system for an interim period while congressional approval was sought for system enhancements. An updated system was installed in November 1990, with production operations beginning in the spring of 1991. Responsibility for the imaging system and related programs is shared by the Congressional Research Service and Information Technology Services; staff from both departments serve on the project team. In CRS, Library Services Division staff manage the daily operations, including the preparation of the computerized databases. The Automation Office supplies user training and manages the printing centers. Systems management, design, development, and maintenance, both in-house and contractual, are provided by Information Technology Services, the Library's computer department.

SYSTEM CONFIGURATION:

Date System Installed: November 1990

System Installed by: Integrator.

The CRS scanning, storage, and printing system components are located in the Library's James Madison Building. The system uses a Sun Microsystems server with LAN links for remote user access to sixteen Sun Sparc 1, 1+, and IPC workstations. Document images are captured with a refurbished, customized high-speed double-sided scanner and a Fujitsu desktop scanner, and are recorded on 12-inch WORM disks stored in a Cygnet jukebox. Two high-speed printers can each print up to 10,000 double-sided pages per day, and remote workstations have local low-volume laser printers.

  • Systems management: Sun 690 system server; 100 Mbit/sec FDDI network backbone using Cisco AGS+ routers; local area network configuration to workstations. The system has ten gigabytes of magnetic storage capacity for image, index, and systems operations data.
  • Workstations: Nineteen Sun Sparc 1, 1+, and IPCs, with split-screen 19-inch monitors; ZEOS 486 PC workstations with Cornerstone dual page 150n monitors
  • Document Scanners: Terminal Data Corporation high-speed, double-sided custom scanner; and a Fujitsu desktop scanner, model M3093E.
  • Jukeboxes: Cygnet 1802 with a 50 disk capacity.
  • Optical Media: Maxell 12-inch WORM optical digital data disks.
  • Printers: Fujitsu Luna 2 high-speed, double-sided laser model M3773; remote workstations equipped with Fujitsu model 7300 printers and Telaris printstations, models 1590-T and 1794.

DIGITAL IMAGE CAPTURE:

A document scanning room in the James Madison Building contains the high-speed and tabletop scanners and image quality inspection workstations. Each scanned document is examined to ensure that it is complete and that its pages are in the correct sequential order.

Conversion Staff: Library staff operate the scanning, indexing, and retrieval equipment on-site.

Document Preparation: The journal literature and document formats require unbinding, sequential ordering, and other manual handling prior to scanning.

Document Scanning: A high-speed Terminal Data Corporation scanner, refurbished from the original LOC pilot project, captures both sides of each page simultaneously. A companion Fujitsu desktop scanner provides 300 dpi resolution. The document scanners and the application software accept a variety of paper types and thicknesses. A display monitor previews scanned images, which are held temporarily in a magnetic buffer.

Disposition of Original Records: Some documents are destroyed after scanning. Others are retained for an interim period (3 years or less) in office or divisional collections or a large vertical file in CRS.

Quality Control: Scanner operators visually inspect each image for eye-readability (i.e., not too dark or too light) and verify that the images within a document are in the correct sequential order. Images that fail the quality inspection are immediately rescanned.

Scanning Resolution: The documents are scanned at 300 dpi.

Color and Gray Scale: The scanners have difficulty capturing shades of red and detecting some shades of blue and green.

Image Enhancement: Basic light/dark contrast adjustments are available. The Fujitsu scanner has photographic halftone capability.

Compression/Decompression: The compression algorithms conform to CCITT Group 4.

Autocropping: Black edges are removed from images to save storage space and improve appearance.
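The autocropping step can be pictured with a small sketch. It assumes a bilevel page stored as rows of pixels (1 = black, 0 = white) and strips only outer rows and columns that are entirely black. The function name and the strict all-black rule are illustrative assumptions, since the report does not describe the vendor's algorithm; a production scanner would likely tolerate some noise along the edge.

```python
def autocrop_black_edges(image):
    """Strip outer rows and columns that are entirely black (1 = black, 0 = white).

    Hypothetical illustration; not the CRS vendor's algorithm.
    """
    rows = [list(row) for row in image]

    def all_black(pixels):
        return all(p == 1 for p in pixels)

    # Trim fully black rows from the top and bottom edges.
    while rows and all_black(rows[0]):
        rows.pop(0)
    while rows and all_black(rows[-1]):
        rows.pop()
    if not rows:
        return []  # the whole page was black

    # Trim fully black columns from the left and right edges.
    while rows[0] and all_black(row[0] for row in rows):
        for row in rows:
            row.pop(0)
    while rows[0] and all_black(row[-1] for row in rows):
        for row in rows:
            row.pop()
    return rows


page = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(autocrop_black_edges(page))  # [[0, 1, 0], [0, 0, 0]]
```

The black pixel inside the page survives; only runs along the border are removed, which is what saves storage under Group 4 coding of mostly white pages.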

DOCUMENT INDEXING:

Creation of Index Database: Optical character recognition (OCR) technology captures index data (a unique accession number) from specially created input header sheets. OCR errors are corrected by key-entry operators.

Location of Index Database: Magnetic disks store index data and document location pointers to the optical digital data disk images.

OPTICAL DIGITAL DATA DISK STORAGE:

The imaging system's computer controllers, file server, and optical disk equipment are installed in the Library's James Madison Building computer room. After review and acceptance, the image data is transferred from magnetic cache onto the WORM optical digital data disks and the Library's databases are updated with optical disk references.
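The cache-to-WORM step can be sketched as an append-only write followed by an index update. This is a minimal Python model under stated assumptions: the `WormVolume` class, the accession keys, and the (volume, offset, length) pointer layout are hypothetical, chosen only to show why the disk references are recorded in the magnetic database rather than on the write-once disk.

```python
from dataclasses import dataclass, field


@dataclass
class WormVolume:
    """Hypothetical model of one WORM disk side: append-only with fixed capacity."""
    volume_id: str
    capacity_bytes: int
    used_bytes: int = 0
    blocks: list = field(default_factory=list, repr=False)

    def append(self, data: bytes) -> int:
        """Write data once and return its byte offset; nothing is ever overwritten."""
        if self.used_bytes + len(data) > self.capacity_bytes:
            raise OSError(f"volume {self.volume_id} is full")
        offset = self.used_bytes
        self.blocks.append(bytes(data))
        self.used_bytes += len(data)
        return offset


def commit_accepted_images(cache, accepted, volume, index):
    """Move reviewed images from the magnetic cache to WORM and record disk references."""
    for accession in accepted:
        data = cache[accession]
        offset = volume.append(data)   # raises before the cached copy is released
        index[accession] = (volume.volume_id, offset, len(data))
        del cache[accession]           # free the cache only after the write succeeds
    return index


cache = {"A001": b"page-1", "A002": b"page-22", "A003": b"pending"}
volume = WormVolume("DISK07-A", capacity_bytes=1_000)
index = commit_accepted_images(cache, ["A001", "A002"], volume, {})
print(index)  # {'A001': ('DISK07-A', 0, 6), 'A002': ('DISK07-A', 6, 7)}
```

Images still awaiting review (here `A003`) stay in the cache, mirroring the report's "after review and acceptance" ordering.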

Image File Headers: A proprietary Sun Raster file header, not compatible with the Tagged Image File Format (TIFF), is used.

Error Detection/Correction: System not identified by vendor.

Recording Process: 12-inch, dual-sided WORM optical digital data disks.

Optical Digital Data Disk Composition: Maxell polycarbonate media.

Capacity: 1.3 gigabytes per optical disk side; 40,000 images total.

Number of Optical Disks in Use: 30 optical disks contain image data.

Jukebox: Cygnet 1802 jukebox (capacity 50 disks).
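The capacity figures can be turned into a quick upper bound. The sketch assumes every disk is recorded to its full 1.3 gigabytes on both sides, which the report does not state, so the results are ceilings rather than measured holdings.

```python
GB_PER_SIDE = 1.3      # capacity per disk side, from the site report
SIDES_PER_DISK = 2     # 12-inch dual-sided WORM media


def raw_capacity_gb(disks: int) -> float:
    """Upper-bound raw capacity for a number of fully recorded disks."""
    return disks * SIDES_PER_DISK * GB_PER_SIDE


in_use = raw_capacity_gb(30)    # disks currently holding image data
jukebox = raw_capacity_gb(50)   # Cygnet 1802 with every slot filled
print(f"{in_use:.1f} GB in use, {jukebox:.1f} GB jukebox ceiling")
# 78.0 GB in use, 130.0 GB jukebox ceiling
```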

Storage Environment: The optical disk jukebox is installed in the Library's computer room.

RETRIEVAL AND OUTPUT:

User workstations and printers are available at several remote sites. An image zoom feature has been developed to improve image display.

Primary System Users: LOC staff and congressional users access the index and image data daily.

Index/Image Access: Workstations connected through a communications network provide searchable access to the index and image data at several sites including: the LaFollette Reading Room; the Russell Senate and Rayburn House Reference Centers; the Senate Library; and the Joint Committee on Taxation.

Laser Printing: Fujitsu high-speed laser printers and convenience printers produce 300 dpi hard-copies.

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: A 3270 emulation window supports searching of the SCORPIO Public Policy Literature and CRS Products databases and notifies the user when a document is available on optical digital data disk. The imaging system informs SCORPIO when a new file entry is available, and a unique alphanumeric accession number serves as the key retrieval link.
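Because the accession number is the shared key, the availability notice amounts to a lookup from each catalog hit into the image index. The Python sketch below assumes dictionary-shaped records; the accession number format, field names, and pointer tuple are invented for illustration and do not reflect SCORPIO's actual record layout.

```python
def flag_image_availability(search_hits, image_index):
    """Mark each catalog hit with whether an optical disk image exists for it.

    The accession number is the only key shared by the two systems.
    """
    return [
        {**hit, "on_optical_disk": hit["accession"] in image_index}
        for hit in search_hits
    ]


hits = [
    {"accession": "CRS-940001", "title": "Report A"},
    {"accession": "CRS-940002", "title": "Report B"},
]
images = {"CRS-940001": ("DISK12-B", 40960)}  # hypothetical disk pointer
for row in flag_image_availability(hits, images):
    print(row["accession"], row["on_optical_disk"])
# CRS-940001 True
# CRS-940002 False
```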

Network Configuration: The imaging system can support: Ethernet Local Area Network, FDDI backbone, and Token Ring. Fiber optic cable is utilized. A Library management goal is to eventually support one hundred image-enabled user workstations.

Backup of Image and Index Data: Plans call for ITS operations staff to create exact duplicates of the original optical digital data disks. The magnetically stored index data is routinely backed up as part of ongoing operations. The system has two gigabytes of magnetic cache, which improves system response and speeds data backups. The Library's goal is to store related data on dedicated optical digital data disks (external and internal LOC documents are currently intermixed).

Technical Support and Documentation: The imaging system contractor is currently preparing technical system documentation. Comprehensive documentation helps avoid loss of system knowledge after the original development team departs.

Interoperability: The CRS system has network links to the Library's SCORPIO system.

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: The work flow of the Master File Unit, which handles scanning and document delivery for the SDI (current awareness) application, was redesigned to capitalize on the strengths of imaging technology. Preparing microfiche took several weeks; with imaging technology, staff aim to scan material within 24 to 48 hours of its listing in the database.

Image Legibility: The system uses 19-inch split-screen displays. The left half of the screen supports database searching and document display at 115 dpi, and the right half provides function keys. A display resolution of 115 dpi is considered inadequate for extended image viewing. System administrators note that this resolution was chosen to increase image decompression rates and speed up image display. The decision assumed that users request images only to confirm the need to examine a document's full text, and that hard-copy output is currently the primary goal.
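The trade-off the administrators describe is easy to quantify. This sketch assumes an 8.5 by 11 inch page (the report does not give page sizes) and compares pixel counts at the 300 dpi scanning resolution and the 115 dpi display resolution. The roughly sevenfold reduction suggests why a lower display resolution speeds decompression and display, though the actual gain depends on the vendor's software.

```python
def pixel_dims(width_in, height_in, dpi):
    """Pixel width and height of a page at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


SCAN_DPI, DISPLAY_DPI = 300, 115
scan = pixel_dims(8.5, 11, SCAN_DPI)        # assumed letter-size page
display = pixel_dims(8.5, 11, DISPLAY_DPI)
reduction = (SCAN_DPI / DISPLAY_DPI) ** 2   # ratio of pixel counts

print(scan, display, f"{reduction:.1f}x fewer pixels")
# (2550, 3300) (978, 1265) 6.8x fewer pixels
```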

Other imaging applications under development: The Library's Information Technology Services is developing a system for processing copyright applications and storing Copyright registrations on optical digital data disk. Copyright certificates will be printed from the image of the registration form. The planned copyright system will retain electronic images of over six million copyright related documents, with the original records relocated to off-site storage.

Integrator/Vendor Support: Vendor support is critical to any project's success. The CRS imaging system is tied to a series of vendor contracts, with modifications requiring planning and long lead times. Integrated Automation provided total integration services including systems analysis and design. Library management is seeking to build a staff knowledge base sufficient to develop enhancements in-house, providing the Library with greater internal control over the system development process.

Standards: The CRS has an internal working group tasked with monitoring the status of existing and emerging standards, and ensuring that planned systems are in conformance with applicable standards. Some of the areas of interest to the standards working group are: internal networking and fiber optic cabling; compatible networking systems; and related internal data communications issues.

Long Term Issues: 1) Budget constraints may limit new system initiatives. 2) The Library wants to move toward industry-wide standards, for example, by replacing the proprietary Sun Raster file header with the Tagged Image File Format (TIFF). 3) The Library is moving toward off-the-shelf hardware and software components rather than proprietary system components. 4) The user interface needs to be redesigned before the system is widely introduced on Capitol Hill. The system would also benefit from replacement or major enhancement of the SCORPIO system, which provides the search engine for document identification.


SITE VISIT REPORT #10

AGENCY: Minerals Management Service

SYSTEM: Royalty Management Program System

CONTACT: Tim Allard Minerals Management Service Denver, CO

SUMMARY DESCRIPTION:

The Department of the Interior's Minerals Management Service (MMS) uses an imaging system to store and retrieve important Royalty Management Program (RMP) revenue collection documents. During its ten-year existence, the RMP has collected nearly fifty billion dollars in rents and royalties for the United States Treasury. In 1988 the Fiscal Accounting Division acquired a pilot digital imaging system to help process this paper-intensive workload and to evaluate the suitability of digital imaging technology for revenue collection documents. The RMP staff is experienced with imaging and other data capture methods, and is committed to a long-term strategy to increase the agency's information collection, processing, and distribution capabilities. RMP management has monitored the pilot system's performance and has integrated additional components to refine the core system. The imaged documents record royalties paid by oil, gas, and mineral recovery companies for removing natural resources from Federal and American Indian lands nationwide. Timely processing ensures correct payment and subsequent distribution of revenues, and also helps resolve reporting inconsistencies. The pilot system's tabletop scanners convert the financial reports and supporting documents to electronic images stored on optical digital data disks. Rather than requesting the original paper records, agency staff now view the optically stored images on high-resolution workstation monitors. An optical disk jukebox contains the most recent information, and older disks are accessed manually on user request. The MMS optical digital data disk system is important for this study because of the agency's critical mission, the lessons learned during the evolution from a pilot to a production system, and management's long-term commitment to improving the agency's technology capabilities and user services.

BACKGROUND:

The Department of the Interior has stewardship for most of the nation's publicly owned lands and natural resources, and acts as the Nation's principal conservation agency. This responsibility includes promoting the wise use of land and water resources, protecting fish and wildlife, preserving the environment and national parks, and monitoring American Indian reservation communities and island territories under United States administration. The Minerals Management Service was established by the Interior Department in 1982, and all Outer Continental Shelf leasing responsibilities of the Department of the Interior were consolidated within the Service. Secretarial Order No. 3087 and its amendments transferred royalty and mineral revenue management functions, including collection and distribution, to the Minerals Management Service, and transferred all on-shore minerals management functions on Federal and Indian lands to the Bureau of Land Management. The Minerals Management Service assesses the nature, extent, recoverability, and value of leasable minerals on-shore and on the Outer Continental Shelf. It ensures the orderly and timely inventory and development, and the efficient recovery, of mineral resources. The Service conducts resource evaluation and classification, environmental review, leasing activities and management, and program inspection and enforcement. The Service collects royalty payments, rentals, bonus payments, fines, and other revenues due the Federal Government and Indian lessors from the extraction of mineral resources. The revenues generated by mineral leasing are one of the largest non-tax sources of income to the Federal Government and are distributed according to where they are collected. For example, some on-shore revenues are distributed to the States, offshore revenues go to the general fund of the U.S. Treasury, and revenues collected from American Indian controlled lands are distributed to the Indian lessor.
The Minerals Management Service is headquartered in Washington, DC. Its components include offices in Herndon, VA; the Royalty Management Program headquarters in Lakewood, CO; four Outer Continental Shelf regional offices; and three administrative service centers.

Origins: A systems consultant studied the internal Royalty Management Program operations before the pilot imaging system was acquired. This study categorized the existing paper work flow processes and determined that a technology-based approach could improve operations in dealing with the ever-increasing workload. The paper work flow required data entry from Form MMS-2014/4014 royalty reports and Payor Information Forms (PIF, Form MMS-4025) into the agency's Auditing and Financial System (AFS). The paper documents were routinely stored in the agency's File Room and, after a prescribed life cycle, were transferred to the Federal Records Center. The paper files were manually retrieved in response to user requests and helped to adjudicate problems in royalty processing. Recommendations in the consultant's study contributed to the 1988 decision to acquire a small pilot document imaging system. The imaging system scans the documents and stores the information on write once, read many (WORM) optical digital data disks. The system provides quick retrieval of the stored images using special imaging-enabled workstations, and eliminates the delays users normally encounter with out-of-file or misfiled paper records. Productivity and system throughput are critical to the RMP because of specific elapsed-time limits on royalty processing. Document imaging could be performed at the beginning of the RMP's processing work flow, eliminating the need to re-key index data into different systems; however, the stand-alone imaging system currently accepts documents only after the data has been processed by other RMP systems. Because the government is required to pay interest on any late disbursements, future imaging systems should be integrated with in-house ADP applications to ensure timely processing.
Other agency data processing systems include the Bonus and Rental Accounting Support System (BRASS) and the Production Accounting and Auditing System (PAAS). The existing pilot imaging system hardware is nearing the end of its normal life cycle. The MMS continues to assess and plan alternatives for improving short- and long-term RMP data processing capabilities, and plans to adopt a cohesive information processing network linking the agency's data processing systems.

SYSTEM CONFIGURATION:

Date System Installed: 1988

System Installed by: Integrator/Vendor Staff.

System Configuration Changed Since Installation? (Yes) The pilot system initially had only three workstations, but was expanded with additional components as it evolved into a larger production system. Equipment added in January 1991 included: optical disk jukebox; new document scanners; four retrieval stations; and a remote communications bridge.

Image Scanners--Two Ricoh IS400 tabletop document scanners with automatic document feeders.

Workstations--LaserData Corvette workstations with 150 DPI display screens.

Optical Storage--Laser Magnetic Storage International Corp. (LM 1200 media) stored in a 16-disk capacity Access jukebox.

DIGITAL IMAGE CAPTURE:

RMP document preparation clerks organize the files to ensure efficient processing. Document scanning is performed "after the fact," as the incoming documents are not scanned until all immediate agency processing of the financial information is completed. The scanned images are quality verified, and the original documents are stored in the agency's Central Filing System and subsequently transferred to the Federal Records Center. Agency staff perform information searches with the imaging system's workstations to retrieve the optically stored images.

Document Preparation: The 2014/4014 royalty reports and Payor Information Forms (PIF) documents are organized according to case folders containing pre-defined index data. The case file documents contain the report form, transmittal letters, and related information. The significant data entry information on the folder labels includes: Document Control Number; Payor's Name and Code; and, Report Date Month/Year.

Conversion Staff: The system is operated and the records converted on-site by agency contractor staff.

Document Scanning: Tabletop Ricoh IS400 scanners capture the images. MMS staff provide training in document scanning procedures to the contractor management, who in turn train the equipment operators.

Disposition of Original Records: The records disposition schedules for the majority of these financial documents are one year retention on-line, followed by six years of archival storage. Financial records related to American Indian Communities are required to be retained indefinitely.
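The retention rule amounts to simple date arithmetic. The sketch below assumes retention is counted from each document's own date and that records from American Indian lands also spend their first year on-line; actual schedules often count from a fiscal- or calendar-year cutoff, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping February 29 to February 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition_dates(record_date: date, indian_lands: bool) -> Tuple[date, Optional[date]]:
    """Return (end of on-line retention, earliest disposal date or None if indefinite)."""
    online_until = add_years(record_date, 1)          # one year on-line
    if indian_lands:
        return online_until, None                     # retained indefinitely
    return online_until, add_years(online_until, 6)   # plus six years archival


print(disposition_dates(date(1994, 7, 1), indian_lands=False))
# (datetime.date(1995, 7, 1), datetime.date(2001, 7, 1))
```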

Scanning Resolution: Documents are scanned at 300 DPI.

Color and Gray Scale: System administrators report no problems with scanning different ink or paper background colors. Since the primary input is high contrast business documents, there is no immediate need for gray scale scanning.

Image Enhancement: No image enhancement is used other than the scanner's basic light/dark contrast controls.

Compression/Decompression: Vendor-supplied proprietary software.

Quality Control: Although no test targets are utilized to calibrate scanning equipment (Ricoh IS400) or to serve as quality benchmarks, all document images are visually inspected to ensure legibility. When image quality problems are detected after the images have been recorded on the optical digital data disks, the documents with poor quality or missing images are retrieved and re-scanned. Corrections to the electronic index and optical digital data disk "pointers" ensure that the corrected versions are retrieved.
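Since WORM disks cannot be overwritten, a re-scan is handled by writing a new image and redirecting the index. The sketch assumes a dictionary index keyed by document control number; the pointer tuples, the audit history, and the `record_rescan` name are hypothetical, not the RMP system's actual structures.

```python
def record_rescan(index, history, dcn, new_pointer):
    """Point a document control number at its re-scanned image.

    WORM media cannot be rewritten, so the defective image stays on disk;
    only the magnetic index changes. The old pointer is kept for audit.
    """
    old_pointer = index.get(dcn)
    if old_pointer is not None:
        history.setdefault(dcn, []).append(old_pointer)
    index[dcn] = new_pointer
    return old_pointer


index = {"DCN-1001": ("VOL012", 4096)}
history = {}
record_rescan(index, history, "DCN-1001", ("VOL031", 0))
print(index["DCN-1001"], history["DCN-1001"])
# ('VOL031', 0) [('VOL012', 4096)]
```

Keeping the superseded pointer is a design choice, not something the report describes; it simply records that the old image still occupies space on the optical disk.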

DOCUMENT INDEXING:

Creation of Index Database: Index data is key entered at the scanning workstations, and an electronic case file "folder" is established via user interface screens. This index data is entered at the time of the image creation.

Location of Index Database: The document imaging system's index data is stored magnetically on a PC-based server, linked to the system's retrieval workstations via a local area network.

Index Structures: Key-entered index data is based on a scaled-down version of the MMS file room's document tracking system. The imaging system's index information includes a document control number, payor name and code, document date, and report form (e.g., transmittal letter), and duplicates some of the accounting information stored in the agency's mainframe database system.

OPTICAL DIGITAL DATA DISK STORAGE:

The RMP optical storage system consists of WORM optical media, two LMSI optical drives, and an automated jukebox containing two additional LMSI optical drives.

Image File Headers: The optical digital data disk file headers are configured in proprietary Laser Magnetic Storage International Company (LMSI) format.

Error Detection/Correction: No error correction code (ECC) information is available. Optical digital data disk data retrieval failures were rarely reported.

Recording Process: 12" WORM, dual-sided LMSI media (LM 1200-002).

Optical Disk Composition: Tempered glass, manufactured in England.

Capacity: Each disk stores two gigabytes of user data (one gigabyte per side).

Number of Optical Digital Data Disks in Use: To date, over 775,000 documents have been scanned and stored on 88 twelve-inch optical digital data disks; these 88 disks include backup images.

Jukebox: The maximum capacity of the LMSI jukebox is 16 optical digital data disks. Because of the paper-intensive RMP workload, the agency has already outgrown the jukebox's capacity.
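A rough calculation shows how far the workload has outgrown the jukebox. The report says the 88 disks include backup images but not how many; the even split between originals and backups below is an assumption made only to bracket the estimate.

```python
TOTAL_DISKS = 88       # disks in use, including backup copies (per the report)
JUKEBOX_SLOTS = 16     # maximum jukebox capacity
GB_PER_DISK = 2        # one gigabyte per side

raw_capacity_gb = TOTAL_DISKS * GB_PER_DISK
online_share_all = JUKEBOX_SLOTS / TOTAL_DISKS
# Assumption: half the disks are backup copies, leaving 44 distinct platters.
online_share_originals = JUKEBOX_SLOTS / (TOTAL_DISKS // 2)

print(raw_capacity_gb, f"{online_share_all:.0%}", f"{online_share_originals:.0%}")
# 176 18% 36%
```

Under either reading, most of the collection sits off-line at any given moment.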

Storage Environment: The scanning equipment, the active (backup) copy of each platter, and the jukebox are all installed in a normal office environment. The original optical digital data disks are stored off-site in a temperature and humidity controlled tape vault.

RETRIEVAL AND OUTPUT:

Document image retrievals are made directly from the optical digital data disks without intermediate storage on magnetic disk cache. Public access to the records is denied due to the confidential financial information.

Primary System Users: The primary system users include Auditing and Financial agency accountants and other staff members responsible for managing the system. The royalty payments information is proprietary, and the digitally stored images must be security protected and not open to the general public. The RMP generates comprehensive monthly reports describing its operations, and conducts substantial financial audits dealing with the collection, distribution, and other accounting functions. These activities generate volumes of paper, requiring monitoring throughout their active life.

Image Display: The RMP imaging system uses LaserData Corvette workstations equipped with 19-inch monitors at 150 DPI resolution. The Corvette microprocessors utilize proprietary image compression software to improve storage and data transmission efficiency.

Image Access: The optical disk jukebox has a 16-disk capacity. The most recently created optical digital data disks are kept in the jukebox, while older disks are stored on metal shelving. Requests received by 2:00 pm each day for information on disks not in the jukebox are filled by midafternoon, since system operations staff must manually load the off-line optical digital data disks. System administrators noted significant demand for images not residing in the jukebox, resulting in user access delays.
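The access policy can be summarized as a routing rule. In this sketch the disk identifiers are hypothetical, and the report does not say what happens to requests arriving after 2:00 pm, so treating them as next-day loads is an assumption.

```python
from datetime import datetime, time

CUTOFF = time(14, 0)  # 2:00 pm request deadline, per the report


def route_request(disk_id, jukebox_disks, requested_at):
    """Decide how an image request is served."""
    if disk_id in jukebox_disks:
        return "immediate"                 # platter already on-line
    if requested_at.time() <= CUTOFF:
        return "same-day manual load"
    return "next-day manual load"          # assumed handling after the cutoff


online = {"D081", "D082"}
print(route_request("D081", online, datetime(1994, 7, 12, 9, 30)))   # immediate
print(route_request("D010", online, datetime(1994, 7, 12, 13, 45)))  # same-day manual load
print(route_request("D010", online, datetime(1994, 7, 12, 15, 5)))   # next-day manual load
```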

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: No linkages of index or image data with other RMP computer systems exist. RMP management is considering integrating the image system with central MMS database management and other accounting systems. This will eliminate the need to index documents several times as they are processed into the agency's various computer systems.

Network Configuration: The Royalty Management Program operates several local area networks (LANs) and is installing additional communication links between buildings using remote bridge technology. Long-range plans specify remote access to image and index data via wide area networks and FTS-2000. The RMP is undergoing a transition from centralized minicomputers to a distributed processing environment using PC workstations, file servers, and local area networks. The emphasis for new RMP systems will be on standardized open systems architecture, avoiding proprietary hardware and software.

Backup of Image and Index Data: Index data: A daily backup of index data is made using magnetic tape, and a full index system backup is performed monthly also using magnetic tapes. The monthly backup tape is stored in a special vault in a separate building under controlled temperature and humidity conditions.

Image data: The scanned image data stored on optical media is copied onto a backup set of optical digital data disks. The RMP system has separate applications for Federal and American Indian data, effectively slowing down the optical disk copying process. A disk copying backlog may exist due to manual optical digital data disk handling.

Technical Support and Documentation: The Ethernet cabling system is supported in-house; the optical disk jukebox is under a separate maintenance contract; and all other components and software are supported by the LaserData value added reseller (VAR). RMP system managers noted that service response time is slower than desired, and that the imaging vendor staff is hesitant to modify system features such as user interface software. System administrators report that existing technical and administrative documentation is inadequate and are pursuing this information through the VAR. RMP staff are also independently compiling operational and technical system manuals.

Interoperability: At this point, the existing pilot document imaging system offers no easy interoperability with other RMP systems. Critical parameters such as file headers, image compression algorithms, and data transmission are proprietary imaging system configurations. Non-proprietary solutions are needed since some RMP data is shared with outside organizations including the Department of Interior and other Federal agencies, auditors, and various State agencies.

Migration Plans: RMP system managers expect the existing pilot system to be replaced because many original components are aging. Because of the seven-year records retention schedules, it has not been determined whether all image data will be migrated to a future system. Agency management expects to install a system with super VGA-type monitors, high-density optical digital data disks, and local and wide area networking capabilities. The LAN/WAN will allow index and image data to be easily exchanged with other MMS computer systems, increasing the agency's ability to monitor royalty compliance.

OVERVIEW OF SIGNIFICANT ISSUES:

The MMS continues to review existing RMP processes to identify areas suitable for automated technologies:

Business Process Re-Engineering: Previously, the RMP successfully used microfilm as a document storage medium. Imaging technology streamlines the agency's existing manual paper processing and supports more efficient agency functions. RMP management is considering other approaches, such as scanning all documents immediately upon receipt and performing all subsequent internal work flow processes using the electronic case file images. This would eliminate manual paper handling across the agency's various buildings and improve document tracking and security. Electronic filings are increasing and already account for fifty to sixty percent of the total, although they are not stored in the imaging system.

Information Access: RMP management expects to integrate future imaging systems with existing mainframe databases, creating a "business data warehouse" that supports users' one-stop shopping for information, whether data, image, or other form. Although information sharing across agency departments is a goal, agency-level data exchange standards are needed within the Department of the Interior for this to become a reality.

Integrator Support: Complex imaging systems must receive support from all key players, including equipment manufacturers, resellers, contractor vendors, and maintenance facilities. It is especially critical for original manufacturers to support system resellers when proprietary operating systems, microprocessors, and applications software are involved.

Prototype Development: The agency has outgrown the existing imaging system's capabilities due to: software operational limitations; inability to significantly upgrade the software; and, the jukebox's limited optical digital data disk storage capacity. Users often wait to gain access to the optically stored images, and are demanding more services than are currently provided. MMS has determined the imaging system's limitations, and is assessing alternative approaches for short and long term solutions.

Document Processing Work Flow: Imaging can profoundly affect existing agency procedures and operations. One example is the Bonus and Rental Accounting Support System (BRASS), which collects and distributes rents received from Federal and Indian land leases. Incoming checks are currently scanned with OCR, and microfiche is created for permanent storage. An upgraded imaging system could integrate this information, replacing the obsolete OCR/microfiche system.


SITE VISIT REPORT #11

AGENCY: National Oceanic and Atmospheric Administration

SYSTEM: Coastwatch Satellite Data System

CONTACT: Charles MacFarland, National Oceanic and Atmospheric Administration, National Oceanographic Data Center, Washington, DC

SUMMARY DESCRIPTION:

The National Oceanic and Atmospheric Administration (NOAA) uses optical digital data disk technology for archival retention of coastal environmental data. This data is considered a national scientific and historical resource that supports monitoring of natural climatic events and man-made environmental factors, and of their impact on rapidly changing global processes. As part of NOAA's Coastwatch environmental program, the data collection system includes a digital communications network of earth-based detectors and the Advanced Very High Resolution Radiometer (AVHRR) instruments carried aboard earth-orbiting satellites. The satellite image data, transmitted to NOAA's National Oceanographic Data Center, focuses on coastal environmental activity in the continental United States, including the Great Lakes region. Upon receipt of the digital information, data errors are removed and the files are quality sampled to verify machine readability. The data files are then processed with specialized mathematical compression techniques to use the optical storage media more efficiently. The optical storage system consists of twelve-inch write once, read many (WORM) optical digital data disks stored in an automated retrieval jukebox. This automated optical media system more closely emulates the high-performance random-access retrieval of magnetic disks, in contrast to the slower end-to-end serial searches formerly required with magnetic tapes. The NOAA system is important for this National Archives study because of the agency's concern for preserving environmental data on tape; the relationship between NOAA and the National Archives; NOAA's use of data exchange standards; and the agency's established disaster contingency and data recovery plans.
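A compress-then-verify sequence can be sketched in a few lines. The report does not name NOAA's compression method, so this Python example substitutes general-purpose lossless zlib compression and adds a seeded random round-trip check on a sample of files, a common safeguard when compressing archival data rather than the agency's documented quality step. File names, sample rate, and data are hypothetical.

```python
import random
import zlib


def archive_files(files, sample_rate=0.1, seed=0):
    """Compress files losslessly, then verify a random sample by round trip.

    zlib stands in for NOAA's unspecified compression technique.
    """
    compressed = {name: zlib.compress(data, 9) for name, data in files.items()}
    names = sorted(compressed)
    if not names:
        return compressed
    k = min(len(names), max(1, round(len(names) * sample_rate)))
    for name in random.Random(seed).sample(names, k):
        if zlib.decompress(compressed[name]) != files[name]:
            raise ValueError(f"round-trip check failed for {name}")
    return compressed


passes = {f"pass_{i:03d}.dat": bytes([i % 7]) * 1000 for i in range(20)}
archived = archive_files(passes)
print(sum(map(len, passes.values())), sum(map(len, archived.values())))
```

The synthetic files are highly repetitive, so they compress far better than real radiometer data would; the point is the order of operations, not the ratio.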

BACKGROUND:

The National Oceanic and Atmospheric Administration's mission is to explore, map, and chart the global ocean and its living resources and effectively manage, use, and conserve those resources; to describe, monitor, and predict conditions in the earth's atmosphere, oceanic conditions, solar activity, and inner/outer space environments; to issue timely warnings against impending destructive natural events; to assess the consequences of inadvertent environmental modification over several scales of time; and to manage and disseminate long-term environmental information. Among its principal activities, NOAA reports the weather of the United States and its possessions and provides weather forecasts to the general public; issues warnings against such destructive natural events as hurricanes, tornadoes, floods, and ocean wave tsunamis; and provides information services in support of aviation, marine activities, agriculture, forestry, urban air-quality control, and other weather-sensitive activities. In addition, the Administration operates a national environmental satellite system; and it acquires, stores, and disseminates worldwide environmental data through a system of meteorological, oceanographic, geodetic, and seismological data centers.

NOAA's National Oceanographic Data Center supports the agency's mission by collecting and organizing oceanographic data and making it available within the agency, to other Federal agencies, and to the academic research community. The data represents a cohesive historical record of coastal environmental activity. NOAA operates three different series of earth-orbiting satellites, each monitoring and collecting different information: the Geostationary Operational Environmental Satellites (GOES), the polar-orbiting environmental satellites, and Landsat. The satellites are equipped with special purpose computers that control and initiate data communications. The agency also acquires real-time data from weather stations, ships, radiosondes, and 10,000 remote data acquisition platforms that relay data back to the GOES satellites. Overall, data collected by the agency from all sources is growing at a rate of twenty terabytes a year and now totals three hundred terabytes. The data collection rate is expected to double within this decade as satellite data capture and transmission capabilities improve.

The GOES satellites support the forecasting duties of the National Weather Service in critical applications such as hurricane tracking. The information is also used for environmentally sensitive projects in research, architecture, engineering, and structural design applications affected by the coastal environment. NOAA currently captures all transmitted data on computer hard drives and subsequently off-loads it onto magnetic tape cartridges. Because magnetic media inevitably degrade under long-term retention, more than 240,000 existing tapes in the NOAA data library are aging and in danger of data loss. In response, NOAA is actively rescuing high-priority historical data from deteriorating tapes and transferring it to newer format tape cartridges.

In cooperation with the National Archives, NOAA is attempting to determine the most appropriate methods for the long-term storage of important scientific data sets. Currently, the National Archives provides courtesy storage of oceanographic data on magnetic tape at NARA records centers. NOAA is also copying oceanographic and radiometric data from magnetic tapes onto write once, read many (WORM) optical media. The large format optical digital data disks are stored in automated retrieval jukeboxes for servicing user requests and for archival storage. NOAA distributes some of its data sets on compact disc, read-only memory (CD-ROM), and has already produced over forty-five disc titles, which include search and retrieval software developed by NOAA. NOAA also distributes data electronically in a standardized format compatible with data sets compiled by the National Aeronautics and Space Administration, the U.S. Geological Survey, and other agencies. In partnership with the Interagency Working Group on Data Management for Global Change, NOAA has established an on-line master directory system, and a PC version, that provide access to over 950 significant data sets.

Origins: The optical digital data disk system was proposed by the agency's Coastwatch Program staff in the summer of 1990. Coastwatch data users require extensive on-line access to recently captured data. Because of the sheer volume of coastal satellite information in the data center, the former magnetic tape system was not meeting users' retrieval performance needs. The Coastwatch data consists of environmental information on water temperatures, weather patterns, ocean currents, and other observational and instrument measurements along the entire coastline of the United States and the Great Lakes. The Coastwatch optical digital data disk system was patterned after a similar NASA data recording installation.

SYSTEM CONFIGURATION:

Date System Installed: NOAA's satellite data processing and storage system became operational in September 1990.

System Installed by: Integrator/Vendor.

NOAA's data storage system components include:
  • Digital Equipment Corporation (DEC) VAX 8530 computer system with 96 MB memory and floating point accelerator, 4 VUP (speed).
  • VAX 11-785 computer system with 64 MB of memory and floating point accelerator, 1.7 VUP (speed).
  • VAX 6000-510 computer system with 128 MB of memory and floating point accelerator, 13 VUP (speed).
  • Data communications network including Star Coupler configuration and Ethernet (Ethernet Server-16 ports); router to Internet, NASA, DAMUS, and NOAA backbone.
  • User workstations consisting of 48 PCs.
  • Multiple banks of magnetic hard disks, 9-track magnetic tape (800, 1600, 6250 bpi) drives, square tape (240 MB), and tape cartridge drives (5.2 GB and 2.3 GB).
  • Sony optical disk jukebox; 50 disk capacity (50 x 6.0 GB disks).
  • Technical system documentation, UPS power supply, and on-site systems maintenance staff.

DIGITAL DATA CAPTURE:

Data Capture Network: Environmental information is captured in digital form in real time from multiple ocean and land-based detection sites. This data is relayed to orbiting GOES satellites and then transmitted to the Oceanographic Data Center.

Data Capture Staff: Oceanographic Data Center staff perform data processing and storage operations and service researchers' requests.

Input Quality Control: Data received from satellite relay stations is processed to correct random data errors and some sampling takes place to verify data quality.
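The report does not say how the quality sample is drawn or checked. As a minimal sketch, assuming each file arrives with a recorded CRC-32 value (an assumption, not something the source states), a random sample of received files could be re-checked as follows; `sample_and_verify` and the file names are hypothetical.

```python
import random
import zlib

def sample_and_verify(files: dict, expected_crc: dict, fraction: float, seed: int = 0) -> list:
    """Re-check a random sample of files against recorded CRC-32 values.

    Returns the sorted names of sampled files whose data no longer matches.
    Hypothetical helper; the CRC-32 convention is an assumption.
    """
    rng = random.Random(seed)                     # seeded so a sample can be reproduced
    names = sorted(files)
    k = max(1, round(len(names) * fraction))      # check at least one file per batch
    sample = rng.sample(names, k)
    return sorted(n for n in sample if zlib.crc32(files[n]) != expected_crc[n])

# Usage: record checksums on receipt, then simulate one damaged file.
received = {f"pass_{i:03d}.dat": bytes([i]) * 1024 for i in range(20)}
recorded = {name: zlib.crc32(data) for name, data in received.items()}
received["pass_007.dat"] = b"\x00" * 1024         # corrupted copy
print(sample_and_verify(received, recorded, fraction=1.0))  # ['pass_007.dat']
```

Sampling keeps the check cheap for high-volume data; a fraction of 1.0 turns it into a full verification pass.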

Data Compression: Data compression is needed because of the volume and file sizes of the data received. The satellite image data is compressed using a non-proprietary algorithm called IDIDAS. A GOES satellite transmits at a continuous data rate of 2 megabits per second, and image files contain approximately 0.25 MB of compressed data.
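The two figures above can be related with simple arithmetic. This sketch assumes decimal units (1 MB = 1,000,000 bytes, 1 megabit = 1,000,000 bits); the daily figure is the raw capacity of a continuously running 2 Mbps link, not the volume NOAA actually archives.

```python
BITS_PER_BYTE = 8

def seconds_per_file(file_mb: float, link_mbps: float) -> float:
    """Time to transmit one file of file_mb megabytes over a link_mbps link."""
    return file_mb * BITS_PER_BYTE / link_mbps

def daily_volume_gb(link_mbps: float) -> float:
    """Gigabytes carried per day by a link running continuously at link_mbps."""
    seconds_per_day = 24 * 60 * 60
    megabytes = link_mbps * seconds_per_day / BITS_PER_BYTE
    return megabytes / 1000  # decimal megabytes -> gigabytes

print(seconds_per_file(0.25, 2.0))  # 1.0
print(daily_volume_gb(2.0))         # 21.6
```

A 0.25 MB compressed file is 2 megabits, so one such file corresponds to about one second of the 2 Mbps stream.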

Disposition of Original Records: The optical digital data disks serve as the permanent archival storage media, replacing open reel magnetic tapes in the Coastwatch digital data archives. Optical media now provides the scientific community with improved data preservation, retention, and access.

INDEXING:

Location of Index Database: The index database and data access software are maintained on the agency's NCASS mainframe computer. Data security safeguards require a user name and password to access the data. Software upgrades to the NCASS system's Oracle database programs are planned to improve data search and retrieval.
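The report gives no index schema. As a minimal sketch of what such an index does, the example below uses Python's built-in `sqlite3` as a stand-in for the Oracle database; the `granule` table, its columns, and the platter labels are hypothetical, and the user name/password layer is omitted.

```python
import sqlite3

def build_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical index table mapping data files to optical platters."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS granule ("
        " file_name TEXT PRIMARY KEY,"
        " region TEXT NOT NULL,"
        " obs_date TEXT NOT NULL,"   # ISO dates sort correctly as text
        " platter TEXT NOT NULL,"
        " side INTEGER NOT NULL)"
    )

def locate(conn: sqlite3.Connection, region: str, start: str, end: str) -> list:
    """Return (file_name, platter, side) for a region and inclusive date range."""
    return conn.execute(
        "SELECT file_name, platter, side FROM granule"
        " WHERE region = ? AND obs_date BETWEEN ? AND ?"
        " ORDER BY obs_date",
        (region, start, end),
    ).fetchall()

conn = sqlite3.connect(":memory:")
build_index(conn)
conn.executemany("INSERT INTO granule VALUES (?, ?, ?, ?, ?)", [
    ("cw_gl_19910301.dat", "great_lakes", "1991-03-01", "CW-0012", 1),
    ("cw_gl_19910302.dat", "great_lakes", "1991-03-02", "CW-0012", 2),
    ("cw_ne_19910301.dat", "northeast", "1991-03-01", "CW-0013", 1),
])
for row in locate(conn, "great_lakes", "1991-03-01", "1991-03-31"):
    print(row)
```

The point of the design is that a search touches only the index; the jukebox mounts a platter only once the query has named it.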

OPTICAL DIGITAL DATA DISK STORAGE:

Incoming data is initially staged on magnetic disks while data error cleaning and quality sampling operations are conducted. Validated data sets are transferred to magnetic tapes, eventually serving as the input source for the optical digital data disk system. Data transfer from tape to the optical digital data disks is performed using Sony Corporation optical disk drive and controller equipment. Improved user access is obtained by storing the optical digital data disks in the optical disk jukebox.
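The staging sequence above (magnetic disk, then magnetic tape, then optical disk) can be modeled as a chain of copies in which each stage reads from the one before it. This is a minimal sketch under stated assumptions: the SHA-256 verification step and the `write` callback (a stand-in for a device write followed by a read-back) are illustrative and are not described in the source.

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def default_write(stage: str, data: bytes) -> bytes:
    """Stand-in for writing to a device and reading the copy back."""
    return bytes(data)

def migrate(batch: dict, write=default_write) -> dict:
    """Copy a validated batch disk -> tape -> optical, verifying every copy.

    Each stage reads from the previous stage's copy; a checksum mismatch
    stops the transfer before bad data reaches the next medium.
    """
    reference = {name: sha256(data) for name, data in batch.items()}
    stages, source = {}, batch
    for stage in ("magnetic_disk", "magnetic_tape", "optical"):
        copies = {}
        for name, data in source.items():
            copy = write(stage, data)
            if sha256(copy) != reference[name]:
                raise OSError(f"{name}: checksum mismatch after writing to {stage}")
            copies[name] = copy
        stages[stage] = copies
        source = copies
    return stages

result = migrate({"granule_a": b"sst grid", "granule_b": b"chlorophyll grid"})
print(sorted(result["optical"]))  # ['granule_a', 'granule_b']
```

Passing a different `write` function lets the same flow be exercised against a simulated faulty device.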

Image File Headers: The NOAA system uses the Digital Equipment Corporation (DEC) proprietary Files-11 format to structure the information on the optical media. System administrators consider the existing technical documentation on file structures and file headers sufficiently complete to maintain the system.

Error Detection/Correction: No information is available to system technicians detailing the extent to which automated error correction mechanisms are invoked during optical disk read/write operations. No optical digital data disk failures or data losses have been reported.

Recording Process: The Coastwatch system uses 12-inch, write once, read many (WORM), dual-sided disks recorded with the Sony Corporation bi-metallic alloy recording process.

Optical Digital Data Disk Composition: Polycarbonate substrate.

Disk Capacity: 3.2 gigabytes per side of the two-sided optical disk.

Jukebox: Sony Corporation automated jukebox unit with a 50-platter capacity and integrated Sony optical disk drives and drive controllers.
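The capacity figures can be checked with a short calculation. This sketch uses the 3.2 GB-per-side figure and the 50-platter jukebox stated here, plus the 0.25 MB compressed file size from the Data Compression paragraph; it assumes decimal units and ignores file-system overhead, so real file counts would be somewhat lower.

```python
def disk_capacity_gb(gb_per_side: float, sides: int = 2) -> float:
    """Capacity of one optical disk."""
    return gb_per_side * sides

def jukebox_capacity_gb(gb_per_side: float, disks: int, sides: int = 2) -> float:
    """Capacity of a fully loaded jukebox."""
    return disk_capacity_gb(gb_per_side, sides) * disks

def files_per_disk(gb_per_side: float, file_mb: float, sides: int = 2) -> int:
    """Whole files of file_mb megabytes that fit on one disk (1 GB = 1,000 MB assumed)."""
    return int(disk_capacity_gb(gb_per_side, sides) * 1000 // file_mb)

print(round(disk_capacity_gb(3.2), 1))         # 6.4
print(round(jukebox_capacity_gb(3.2, 50), 1))  # 320.0
print(files_per_disk(3.2, 0.25))               # 25600
```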

Storage Environment: All optical storage equipment is kept under controlled computer room environmental conditions. The system retrieval workstations are distributed throughout NOAA and operate in normal office environments.

RETRIEVAL AND OUTPUT:

Primary System Users: The agency's Coastwatch research staff and scientists use the satellite data to monitor global changes, to gain a better understanding of the earth's climatic systems, and to observe major environmental phenomena such as volcanoes and earthquakes. The data is also distributed to academic research centers in near real-time.

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: The Coastwatch satellite data and its associated index and retrieval software are configured in a stand-alone mode. The environmental data sets do not directly support any administrative functions within the agency, and so have no linkage requirement with agency automated data processing applications (e.g., financial or personnel management). Given the agency's use of Local Area Networks and Wide Area Networks, however, the Coastwatch data is widely available to agency staff.

Network Transmission: An Ethernet-based LAN transfers Coastwatch related information within the National Oceanographic Data Center. Data communication links to external systems use the Internet and dedicated NASA/NOAA communication lines. Remote access software was developed jointly by NASA and the University of Miami in Miami, Florida. This software supports user registration and password protection, and provides on-line assistance with database search and retrieval.

Backup of Image and Index Data: The Oceanographic Data Center's computer system provides an automated archiving capability for Coastwatch data that is two days old or older. In response to the scientific community's concerns about Coastwatch data preservation and access, magnetic tapes are routinely copied to optical media at the NODC in Asheville, North Carolina. Software developed by the University of Miami allows special data blocks encoded on older magnetic tapes to be read, but because of their unusual formats this data cannot always be easily manipulated at users' workstations. Currently, no backup copies of the Coastwatch optical digital data disks exist. System administrators recognize this as potentially problematic, and are considering methods to back up the most frequently accessed data. An optical disk backup program would have to cover both partially filled active (open) platters and disks that are completely filled to capacity. System managers are considering creating mirror-image backup copies of the optical digital data disks for security storage at a NOAA data center in Washington, DC.
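Because WORM media are append-only, a mirror of an open platter only ever needs the blocks written since the previous backup pass, while a full platter needs copying once. The sketch below models that idea under stated assumptions: the `Platter` class, `mirror_pass`, and the platter labels are hypothetical, and real mirroring would operate on device sectors rather than Python lists.

```python
from dataclasses import dataclass, field

@dataclass
class Platter:
    """Hypothetical model of a WORM platter: blocks can only be appended."""
    label: str
    capacity_blocks: int
    blocks: list = field(default_factory=list)

    @property
    def full(self) -> bool:
        return len(self.blocks) >= self.capacity_blocks

def mirror_pass(platters: list, backup: dict) -> list:
    """Append blocks written since the previous pass to each backup copy.

    Written WORM blocks never change, so the length of the backup copy tells
    us where to resume on an open platter. Returns labels of full platters
    whose backup copy is complete.
    """
    complete = []
    for p in platters:
        copy = backup.setdefault(p.label, [])
        copy.extend(p.blocks[len(copy):])   # only the newly written blocks
        if p.full and len(copy) == len(p.blocks):
            complete.append(p.label)
    return complete

open_disk = Platter("CW-0041", capacity_blocks=4, blocks=[b"d1", b"d2"])
full_disk = Platter("CW-0040", capacity_blocks=2, blocks=[b"c1", b"c2"])
backup = {}
print(mirror_pass([open_disk, full_disk], backup))  # ['CW-0040']
open_disk.blocks.append(b"d3")                      # new data lands on the open platter
mirror_pass([open_disk, full_disk], backup)
print(backup["CW-0041"])                            # [b'd1', b'd2', b'd3']
```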

Technical Support and Documentation: Optical disk hardware repairs are obtained from the original equipment vendor under a renewable maintenance contract. Most agency software is obtained directly through agency contracts with vendors, although some software programming is also provided by third-party contractors.

Interoperability: The Data Center's VAX computer systems use DEC's proprietary system configuration software. The proprietary DEC Files-11 disk format is supported with adequate technical documentation. According to NOAA system administrators, DEC is committed to two levels of standards compliance: the Government Open Systems Interconnection Profile (GOSIP) and SF1. VMS-Altrex interface mechanisms are also under development.

Migration Plans: NOAA's commitment to provide uninterrupted data access over time rests on the expectation that Digital Equipment Corporation (DEC), the agency's primary computer equipment vendor, will continue to provide technical support, both for older DEC equipment and for future generations of backward-compatible DEC equipment. The commitment is further supported by a carefully prepared contingency plan for the recovery and reconstruction of data files in the event of man-made or natural disasters. Additionally, the Coastwatch data is fully portable to other systems after being downloaded to magnetic tapes.

OVERVIEW OF SIGNIFICANT ISSUES:

Preservation of Environmental Data: Historical data on the world's environment, gathered and analyzed by NOAA's National Oceanographic Data Center, is currently recorded on magnetic tape and housed in special vaults in Asheville, North Carolina. NOAA has a firm commitment to long-term data access, and in cooperation with the scientific community is screening historical data sets to determine their long-term preservation value. NOAA is also supporting a study through the National Institute of Standards and Technology (NIST) to determine the archival stability of various chromium dioxide coatings and their suitability as archival media.

NOAA and the National Archives: NOAA and the National Archives are working cooperatively to deal with the rapidly increasing volume of earth data and to identify effective long-term data management and preservation strategies. In this regard, NOAA is preparing for the selection of permanently valuable data and its eventual acquisition by the National Archives. NOAA is currently supplying NARA with data tapes and is working with NARA on long-term records disposition schedules.

Data Exchange Standards: NOAA relies on agency-wide data exchange standards for moving structured data files within and outside the agency's systems. Although NOAA's optical disk file structures are dictated by proprietary protocols, the overall database conforms to metadata standards. These standards permit users on non-DEC computer systems to identify appropriate files, request them through a variety of telecommunications channels, and manipulate them on local systems, capabilities not widely duplicated in optical digital data disk systems storing bit-mapped images. NOAA participates with other Federal agencies in the Interagency Working Group on Data Management for Global Change to create a master directory system. Almost one thousand available data sets are stored in a standard format, with on-line descriptions available at NOAA, USGS, NASA, and other Federal agencies. This data is fully compatible and can be shared between agencies, providing scientists with ready access to data on global environmental changes.
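A master directory works because every agency describes its data sets with the same metadata fields, so one search reaches all of them regardless of the host system. The sketch below is a minimal illustration under stated assumptions: the `DirectoryEntry` fields, the entry identifiers, and the keyword-matching rule are hypothetical and do not reproduce the actual directory format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DirectoryEntry:
    """Hypothetical master-directory record describing one data set."""
    entry_id: str
    title: str
    agency: str
    keywords: frozenset

def search(directory: list, *terms: str) -> list:
    """Return ids of entries whose keywords include every term (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [e.entry_id for e in directory
            if wanted <= {k.lower() for k in e.keywords}]

directory = [
    DirectoryEntry("NOAA-CW-01", "Coastwatch sea surface temperature", "NOAA",
                   frozenset({"ocean", "temperature", "coastal"})),
    DirectoryEntry("USGS-LC-02", "Land cover characteristics", "USGS",
                   frozenset({"land", "vegetation"})),
    DirectoryEntry("NASA-OZ-03", "Total ozone", "NASA",
                   frozenset({"atmosphere", "ozone"})),
]
print(search(directory, "Ocean", "temperature"))  # ['NOAA-CW-01']
```

Only the shared description is searched; the data itself can stay in each agency's own storage format until a user requests it.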

Disaster Recovery Planning: The National Oceanographic Data Center is responsible for data maintenance and data security. System administrators at NOAA have implemented the most comprehensive disaster contingency plan of any agency in this study. The plan specifies procedures for preventing a variety of possible disaster scenarios involving the satellite collection system and data repositories, reacting quickly in the event of disaster, and recovering system functionality and lost data as necessary.


SITE VISIT REPORT #12

AGENCY: Patent and Trademark Office

SYSTEM: APS--Automated Patent System

CONTACT: David Grooms, Patent and Trademark Office, Washington, DC

SUMMARY DESCRIPTION:

The Department of Commerce's Patent and Trademark Office (PTO) Automated Patent System (APS) is one of the Federal Government's largest text and image document capture projects to date. Reflecting the sheer volume of information under the Office's domain, the conversion effort involved the text and images of more than five million U.S. patents and nine million European and Japanese patents. The system was developed under a congressional mandate to fully automate the PTO's information processing operations. It improves information access for the agency's patent examiners through specially designed high-resolution dual-display workstations. Eventually several hundred image workstations will allow patent examiners to establish their own search parameters and to manipulate the retrieved images electronically for improved viewing and printing. The most recently dated patent images are stored digitally in rapid access optical disk devices that provide high-performance system response. This image data is supplemented by fourteen optical disk jukebox retrieval subsystems storing less frequently requested information: the entire collection of issued U.S. patents. The PTO's mainframe computer stores textual data, accessible along with the image data, for more than 1,600 patent examiners already trained in full text search techniques. The project's complexity and its state-of-the-art software and hardware developments required a primary contractor and a complementary team of subcontractors. The Automated Patent System is important for this study because of: the important agency mission it supports; the large scale paper-to-digital image conversion requirements; the complex hardware and software integration efforts; the user demands for rapid system response; and, the improved patent image quality.

BACKGROUND:

The patent system was originally established by Congress in 1790 under the U.S. Constitution, to provide inventors with exclusive rights to the results of their creative efforts. The patent system is intended to accomplish several goals, including: to promote incentives to invent; to invest in research and development; to commercialize new technology; and, to make public inventions that would otherwise go unnoticed. The Patent and Trademark Office examines applications for three kinds of patents: design patents (issued for 14 years); plant patents; and, utility patents (issued for 17 years). The Patent Office also issues Statutory Invention Registrations, which have the defensive but not the enforceable attributes of a patent, and it processes international patent applications. The Patent and Trademark Office is a large scale operation, with approximately 107,400 patents issued in fiscal year 1993 alone. These patents may be reviewed and searched in the agency's office and in over 74 public libraries nationwide. The PTO also sells printed copies of issued patents and trademark registrations, maintains a scientific library and search files for over 30 million documents, provides search rooms for the public, hears and decides appeals, participates in legal proceedings, and helps represent the United States in international efforts to cooperate on patent and trademark policy.

Over time, the Patent and Trademark Office has evaluated alternative technologies for automating the agency's holdings, with varying degrees of success, because available technology often fell short of the agency's critical need for high-performance system operations. For example, in the late 1960's an automated microform aperture card system was attempted. It was subsequently abandoned, due in part to the system's inability to provide the required one-second page-to-page display rate. Computerized photocomposition was introduced in 1970 and became the method of text data capture for patents issued since 1971. At first, only mechanical patents were captured; by 1976, however, all newly issued patents were formatted in an electronic, word-searchable format.

Origins: In the early 1980's, there was growing concern outside the PTO about whether the existing labor-intensive manual system could continue to meet program goals and user needs. The Patent and Trademark Office began its current automation efforts under a congressional mandate, Public Law 96-517, Section 9, which charged the Commissioner with preparing a plan to fully automate agency operations. A comprehensive plan to improve the quality of patents and trademarks through automation was prepared and submitted to Congress in 1982. Congress approved the plan's concepts and instructed the Office to proceed with implementation. This contributed to the decision, reached in 1982, to fully automate the existing manual process using the newly emerging digital imaging and optical digital data disk technologies. The decision also included a two-year implementation goal without regard to potential system costs. Because of the expected exponential increase in patent activity and the resulting flood of paper documents, a "paperless office" system concept was attempted in spite of the expected technical challenges. In late 1987, the Secretary of Commerce commissioned a Blue Ribbon Industry Review Panel to review the PTO's imaging system. The Panel determined that the overall information system master planning process needed to be adjusted to reflect the high risk of systems development, and concurred with the need for a high technology solution to address the Patent Office's complex operational environment. The Panel's recommendations included reconfiguring overall project management and realigning the chain of command. Following an Office restructuring, program authority is now held by an Assistant Commissioner who reports directly to the Commissioner. The Panel's recommendations for gaining control over project contractors also resulted in contract renegotiations with the integration contractor.

SYSTEM CONFIGURATION:

Date System Installed: 1985/1986

System Installed by: Contractor and PTO staff.

System Configuration Changes Since Installation: The system was originally configured to run on two small NAS computers, which were replaced by a large AMDAHL mainframe. Direct access storage device (DASD) and optical storage capacity has grown as more data was loaded.

System components include: APS Full-Text Database Search/Retrieval Software - Messenger by Chemical Abstracts Service (CAS): Contains full-text searchable patent text data from 1971, and is updated weekly with all issuing U.S. patents.

APS Image Database (Classified Search and Image Retrieval (CSIR)): Patent images have been loaded for the backfile (documents from 1790), and the optical disk subsystems are updated weekly with all issuing U.S. patents. (1 ea.)

Mainframe: AMDAHL 5990-1100 (SIERRA Class) 256 MB main memory, 90 MIPS, 3 CPUs; Operating System: MVS/XA. This computer maintains index data and provides support for the search and retrieval of text and image data. (1 ea.)

PTONET TCP/IP X.25 GATEWAY: operating speed 10 Mbps; supports 256 concurrent customers. (1 ea.)

FDDI ring: the first of three rings to support the image application (only new optical image devices and new workstations use this ring). Transmission speed is 100 megabits per second. (1 ea.)

IBX (Integrated Business Exchange) Switch (Fiber Optic): Redundant, non-blocking digital communications switch providing packet switched Local Area Networking uniting all system elements over a fiber optic network. (4 production + 2 spares)

Host to Network Interface: (HTN by AUSCOM) Allows the MVS/XA Operating System to communicate with the UNIX Operating System, and translates the XNS network protocol between the mainframe and all other system devices (Model 8911A processor subsystem channel interface unit). (2 ea.)

SPARC HTNs: Allows the MVS/XA Operating System to communicate with the UNIX Operating System and translates the TCP/IP network protocol between the mainframe and the system devices. These are UNIX/SPARC machines acting as HTNs. (6 ea.)

Fileservers: (SUN 3/160) 16 MHz, Motorola 68020 CPU, 4 MB RAM main memory with Fujitsu 337 MB hard disk; Operating System = Berkeley UNIX RTU 3.1B; nightly backups of the Fujitsu disks to 9-track 6250 bpi tape are made using the Support Processors. (4 ea.)

Support Processors: (MASSCOMP 5500) 16.67 MHz, Motorola 68020 CPU, 4 MB main memory and 2 MB expansion memory board with 570 MB hard disk; Operating System = Berkeley UNIX RTU 3.1B; used to monitor and maintain all devices on the CMC/XNS/Ethernet Network. (4 ea.)

Support Processors: (SUN 3/160) 32-bit word field; Fujitsu fixed drive interfaces with XLOGIC disk. (64 ea.)

Rapid Access Devices: (RAD by FALCON) 16.67 Mhz, SUN 3/160 Motorola 68020 CPU, 4 MB RAM main memory with 130 MB hard disk; Operating System = AT&T UNIX Version 4.2 Release 3.2; 4 Optimem optical disk drives; platters spin at Constant Angular Velocity (720 rpm); RADs store frequently accessed information such as the digital compressed patent images at 150 dpi on 1.2 GB (PDO) platters. (30 ea.)

Rapid Access Devices: (RADs - SPARC) 4 LMSI drives per RAD. (14 ea.)

Jukebox/High Density Devices: (HDD by Sony) + Intelligent Controller: (IC by FALCON) 16.67 MHz, SUN 3/160 68020 CPU, 4 MB main memory with 825 MB hard disk in the Controller; Operating System = AT&T UNIX Version 4.2 Release 3.2; platters spin at Constant Linear Velocity (360-720 rpm); used as back-up for RADs, primarily for group printers and PTCS; images stored at 300 dpi on 3.2 GB per side (SONY) platters. (99 ea.)

Text Terminals: (MAD D1000) 8 MHz, 80286 CPU, 80287 numeric coprocessor, 6 MB main memory with 70 MB hard disk, 1.2 MB floppy disk; 1 CRT, keyboard, and dot matrix printer; Operating System = SCO XENIX System V; more than 1,000 terminals will provide an additional user interface to the system for patent text search, commercial database access, and office automation functions. (18 ea.)

PTCS Group Printers: (SUN 3/160) (60 ea.)

Workstations: (ORACLE) 16.67 MHz, SUN 3/160 Motorola 68020 CPU, 4 MB main memory with 337 MB hard disk; 2 CRTs, keyboard, mouse, and laser printer; Operating System = AT&T UNIX Version 4.2 Release 3.2; can search or display images and/or full-text of patents. Eventually more than 800 workstations will provide the primary user interface to the system for search and retrieval and full office automation functions. (67 ea.)

Workstations: (SPARC) (7 ea.)

Group Printers: (ORACLE) 16.67 MHz, SUN 3/160 68020 CPU, 4 MB main memory with 337 MB hard disk, WYSE 85 CRT; Operating System = AT&T UNIX Version 4.2 Release 3.2; Centralized printing of patent images at 300 lines per inch. (27 ea.)

Group Printers: (SPARC) (4 ea.)

Gateway Processors: (SUN) Support 62 MACs and 12 PSR PCs in 15 Patent and Trademark Depository Libraries (PTDLs) now in production.

DIGITAL IMAGE CAPTURE:

Document preparation: The original patent documents are stored in Boyers, PA, under suitable archival conditions. Conversion staff assembled batches of approximately 1,200 patent documents each. Patents printed on highly acidic paper, in widespread use during the 1930s and 1940s, required especially careful handling because of age-related degradation.

Conversion Staff: Contractor employees, not directly supervised by PTO staff, operated the conversion scanning and indexing equipment at Boyers, PA, to capture the backfile patent data. The contractor's work was inspected for quality by on-site PTO staff. The conversion and indexing of currently issued patents has since been brought in house and is now done by PTO employees.

Indexing: Patent document index header records were stored on magnetic media as one of the initial conversion tasks.

Document Scanning: Contractor staff converted the paper patents to digital images using modified Terminal Data Corporation (TDC) high-speed scanners with Photomatrix electronics. The TDC scanners captured double-sided documents at a rate of 1.5 seconds per page. The images were temporarily stored on magnetic tapes, and scanning costs were calculated at approximately $0.17 per page. Since the backfile conversion was completed, PTO staff have continued digital scanning of newly received patent documents and rescanning to correct backfile errors.
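The quoted rate and cost translate directly into throughput and budget estimates. This sketch assumes a single scanner running continuously at 1.5 seconds per page and ignores setup, rescans, and downtime; the one-million-page volume is an illustrative figure, not the size of the PTO backfile.

```python
def scanning_estimate(pages: int, seconds_per_page: float = 1.5,
                      cost_per_page: float = 0.17) -> tuple:
    """Return (scanner hours, dollar cost) for a page count at the quoted rates."""
    hours = pages * seconds_per_page / 3600
    cost = pages * cost_per_page
    return hours, cost

pages_per_hour = 3600 / 1.5
hours, cost = scanning_estimate(1_000_000)
print(pages_per_hour)                   # 2400.0
print(round(hours, 1), round(cost, 2))  # 416.7 170000.0
```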

Disposition of Original Records: The original paper patent documents are permanently retained as the primary archival copy and a complete "A" set of patents is maintained at the Boyers facility.

Quality Control: A quality control sample size of 5 percent (based on MIL-STD-105D) was established. The index header records and images were compared against the original documents for inspection pass/fail decisions. The Patent Office developed special imaging test targets for calibrating scanner electronics and evaluating system performance. PTO staff noted that screen display characteristics and the quality of the input source documents play an important role in imaging systems.
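The sampling step can be sketched as a lot-acceptance check. This is a minimal illustration, not the PTO procedure: the `passes` callback stands in for the manual comparison of header and image against the original, and `accept_number` is a hypothetical threshold, since MIL-STD-105D's actual sampling tables are not reproduced here.

```python
import random

def inspect_batch(doc_ids: list, passes, percent: int = 5,
                  accept_number: int = 0, seed=None) -> tuple:
    """Sample percent of a batch; accept if failures <= accept_number (hypothetical rule).

    passes(doc_id) returns True when a document's index header and image
    match the original.
    """
    n = max(1, (len(doc_ids) * percent + 99) // 100)   # ceiling, in integer arithmetic
    sample = random.Random(seed).sample(doc_ids, n)
    failures = [d for d in sample if not passes(d)]
    return len(failures) <= accept_number, sample, failures

batch = list(range(1, 1201))                            # one batch of ~1,200 documents
accepted, sample, failures = inspect_batch(batch, passes=lambda d: True, seed=1)
print(accepted, len(sample))                            # True 60
```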

Scanning Resolution: The documents are scanned at 300 dpi. However, two digital image resolution levels are maintained: 150 dots per inch in the RAD units; and 300 dots per inch in the Sony optical disk jukeboxes.
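The report does not say how the 150 dpi copies are derived from the 300 dpi scans. One common approach for bilevel images, shown here as an assumption rather than the PTO's method, is a 2 x 2 reduction in which an output pixel is black if any of its four source pixels is black, so thin lines in patent drawings are not lost.

```python
def halve_resolution(rows: list) -> list:
    """Reduce a bilevel image (1 = black, 0 = white) by 2 in each direction.

    Each 2 x 2 block becomes one pixel, black if any source pixel is black.
    Odd widths or heights are handled by using whatever pixels exist at the edge.
    """
    h = len(rows)
    w = len(rows[0]) if rows else 0
    out = []
    for y in range(0, h, 2):
        out_row = []
        for x in range(0, w, 2):
            block = [rows[yy][xx] for yy in (y, y + 1) for xx in (x, x + 1)
                     if yy < h and xx < w]
            out_row.append(1 if any(block) else 0)
        out.append(out_row)
    return out

# A one-pixel-wide vertical line survives the reduction.
line = [[0, 1, 0, 0] for _ in range(4)]
print(halve_resolution(line))  # [[1, 0], [1, 0]]
```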

Gray Scale: A special MITRE Corporation image quality study evaluated digital scanning of photographs at 150 dots per inch and 8 bits per pixel. Although the images produced were promising, the gray scale images increased storage requirements by a factor of eight. Further investigation of compression techniques is planned.
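The factor of eight follows from pixel depth: at the same resolution, an 8-bit gray pixel takes eight times the space of a 1-bit black-and-white pixel. The sketch below computes raw, uncompressed sizes, assuming an 8.5 x 11 inch page; compressed sizes would differ.

```python
def uncompressed_bits(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw (uncompressed) size in bits of a page scanned at dpi and bits_per_pixel."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel

# Assumption: an 8.5 x 11 inch page.
bilevel_150 = uncompressed_bits(8.5, 11, 150, 1)
gray_150 = uncompressed_bits(8.5, 11, 150, 8)
bilevel_300 = uncompressed_bits(8.5, 11, 300, 1)
print(gray_150 // bilevel_150)   # 8
print(gray_150 // 8, "bytes")    # 2103750 bytes
print(gray_150 / bilevel_300)    # 2.0
```

The last line shows that 150 dpi gray scale is also twice the raw size of the 300 dpi bilevel scan, because halving the resolution in both directions removes only a factor of four.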

Compression/Decompression: The PTO system compresses images using the CCITT Group 4 compression.
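CCITT Group 4 (ITU-T T.6) is a two-dimensional scheme that codes each scan line relative to the line above it, and a full encoder is beyond a short example. Its one-dimensional predecessor, Group 3 Modified Huffman coding (T.4), codes the alternating white and black run lengths of each line, which by convention begins with a white run that may have length zero. The sketch below only extracts those runs; it illustrates why mostly white patent pages compress well and is not the PTO's codec.

```python
def run_lengths(row: list) -> list:
    """Alternating white/black run lengths for one scan line (0 = white, 1 = black).

    The first run is always white and may be zero, as in CCITT one-dimensional coding.
    """
    runs = []
    color, count = 0, 0
    for px in row:
        if px == color:
            count += 1
        else:
            runs.append(count)
            color, count = px, 1
    runs.append(count)
    return runs

print(run_lengths([0, 0, 1, 1, 1, 0]))  # [2, 3, 1]
print(run_lengths([1, 1, 0]))           # [0, 2, 1]
```

A 2,550-pixel line that is entirely white reduces to the single run `[2550]`, which is the redundancy the CCITT codes exploit.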

DOCUMENT INDEXING:

Index Formats: The AMDAHL mainframe computer stores the first level of index data, and users are provided with three retrieval options: full text search; search for a specific patent; and, search by patent classification (1,420 classes, 128,000 subclasses).

Location of Index Data: The APS mainframe computer provides the pointer to each image's location on the optical digital data disks, and additional detailed index information is stored along with the image data on the optical media.

Index Design: The PTO was originally faulted for over-designing the indexing system. These efforts were subsequently vindicated when analytical studies showed that years of staff time can be wasted by an inefficient patent index search and retrieval process.
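The two-level arrangement described above can be sketched with two lookups: the mainframe index resolves a patent to an optical disk location, and an index stored on that disk resolves a page to its position. All names, patent numbers, and offsets below are hypothetical; the report does not describe the record layouts.

```python
from typing import NamedTuple

class DiskLocation(NamedTuple):
    device: str     # a RAD or jukebox unit
    platter: str
    side: int

# Level 1 (mainframe): patent number -> optical disk location. Hypothetical values.
mainframe_index = {"4000000": DiskLocation("RAD-07", "P-0153", 1)}

# Level 2 (stored on each platter side): (patent, page) -> byte offset of the page image.
on_disk_index = {("P-0153", 1): {("4000000", 1): 0, ("4000000", 2): 48_213}}

def locate_page(patent: str, page: int) -> tuple:
    """Resolve a page image in two steps: mainframe pointer, then the disk's own index."""
    loc = mainframe_index[patent]                       # KeyError if the patent is not indexed
    offset = on_disk_index[(loc.platter, loc.side)][(patent, page)]
    return loc, offset

print(locate_page("4000000", 2))
# (DiskLocation(device='RAD-07', platter='P-0153', side=1), 48213)
```

Keeping the detailed page index on the disk itself means the platter remains interpretable even if only the mainframe pointer is lost.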

OPTICAL DIGITAL DATA DISK STORAGE:

Storage Architecture: The APS image storage subsystem includes both single disk drive Rapid Access Devices (RADs) and multi-drive high density optical jukeboxes. Technical specifications for the two configurations are:

RADs:
  • 64 RADs, each with 4 single-sided optical drives
  • Constant Angular Velocity (CAV), 720 rpm
  • Patent images stored at 150 dpi resolution
  • Faster access, but less storage capacity
  • 1.2 GB (PDO) disks (approx. 212 platters)
  • 30 LMSI drives (4 drives/RAD)
  • Patents stored at 150 dpi resolution
  • 6.2 GB storage (318 total platters, SUN and SPARCs)

Jukeboxes:
  • High Density Devices
  • Serve as backup for RAD retrievals
  • Constant Linear Velocity (CLV), 360-720 rpm
  • Patent images stored at 300 dpi resolution
  • Slower access, greater storage capacity than RADs
  • 3.2 GB (Sony) disks (approx. 791 platters)

PTO management monitors the digital imaging industry and actively manages the system's high-performance storage and retrieval capabilities. PTO staff are also exploring new workstation designs and alternative storage technologies.
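The two tiers trade speed for capacity and resolution. The sketch below expresses one plausible routing rule, inferred from the specifications above and from the jukeboxes' stated role as RAD backup serving group printers; it is a hypothetical policy, not the APS software's actual logic, and the patent numbers are illustrative.

```python
def choose_source(patent: str, rad_holdings: set, purpose: str) -> tuple:
    """Pick a storage tier and resolution for one request (hypothetical policy).

    On-screen display uses the faster 150 dpi RAD copy when one is mounted;
    printing, or any patent not held on a RAD, falls back to the 300 dpi jukebox copy.
    """
    if purpose == "display" and patent in rad_holdings:
        return "RAD", 150
    return "jukebox", 300

on_rads = {"5000001", "5000002"}
print(choose_source("5000001", on_rads, "display"))  # ('RAD', 150)
print(choose_source("5000001", on_rads, "print"))    # ('jukebox', 300)
print(choose_source("3000000", on_rads, "display"))  # ('jukebox', 300)
```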

Recording Process: Write once, read many (WORM) optical media are used for both the RAD and jukebox storage units.

Optical Digital Data Disk Composition: The RAD Optimem Corporation disks are single-sided, and the Laser Magnetic Storage International (LMSI) media are dual-sided, used in dual-head drives. The Sony disks are made of polycarbonate materials with two-sided recording capability.

Capacity: The RAD units use 1.2 GB optical digital data disks; the jukeboxes use 3.2 GB Sony disks (6.4 GB total per two-sided disk). The LMSI disks hold 6.2 GB each, with 4 LMSI drives per RAD.

Number of Optical Digital Data Disks in Use: 1,089 disks.

Jukebox: The APS has ten Sony jukeboxes, each with a 50-disk storage capacity and two Sony optical disk drives. The jukeboxes store all patent images, including both recently stored and less frequently accessed images, about 5.7 million documents. A Small Computer Systems Interface (SCSI) links the jukebox optical disk drives and controllers.

Storage Environment: The PTO has long-term plans to establish an environmentally controlled optical digital data disk storage area.

RETRIEVAL AND OUTPUT:

The PTO system offers full text searching of:
  • All U.S. patents issued since January 1971 (approximately 1.5 million)
  • More than two million English language abstracts of Japanese patents
  • Over six thousand English language abstracts of published Chinese patent applications

Significant amounts of foreign data will be added in 1994. More than 1,600 patent examiners have been trained to use the text search software running on the AMDAHL mainframe computer, a commercially available package specially modified for the PTO system. Database searchers may use the text terminals, while selected PTO examining groups can gain access through the dual display workstations. The image search software was created specifically for the APS. After a user request, the first patent image reaches the workstation within 40 seconds, and each subsequent page of the same patent arrives in under one second. Text or patent drawing images may be printed selectively on the high quality laser printers.
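Because the first image can take up to 40 seconds while each later page takes under one second, the initial retrieval dominates the wait for all but very long patents. The back-of-the-envelope sketch below treats the quoted figures as upper bounds and assumes pages are delivered one after another; the function name is illustrative, not part of the APS software.

```python
# Upper bounds quoted in the report for APS image retrieval.
FIRST_PAGE_MAX_S = 40.0  # first patent image arrives within 40 seconds
NEXT_PAGE_MAX_S = 1.0    # each later page of the same patent arrives in under 1 second


def worst_case_view_time(pages: int) -> float:
    """Upper bound, in seconds, to deliver all pages of a patent one after another."""
    if pages < 1:
        raise ValueError("a patent has at least one page")
    return FIRST_PAGE_MAX_S + (pages - 1) * NEXT_PAGE_MAX_S


print(worst_case_view_time(1))   # 40.0
print(worst_case_view_time(12))  # 51.0
```

Even for a 12-page patent, the first image accounts for most of the 51-second bound, which is consistent with treating first-image time as the key performance measure.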

Primary System Users: Patent examination staff use the index and image data. Public users can gain access to the data at the Patent and Trademark Office for a nominal fee. The Patent Office is phasing in full text search capability at its 74 Patent and Trademark Depository Libraries (PTDLs) and is producing CD-ROM distribution discs.

Search Enhancements: Two concepts under consideration for enhanced search capabilities are: 1) applying other retrieval capabilities to the existing database, e.g., fuzzy searching; and 2) meeting the demand for image data beyond the existing PTO campus, thus making the electronic patent data available as a national resource.

Workstation Development: Continuing advances in industry technology have enhanced PTO workstation capabilities. Increased performance combined with lower costs is having a positive impact on the PTO's system operations. For example, the PTO's first generation dual-display imaging workstations cost $45,000 each, the second generation cost $35,000 each, and each third generation unit cost $12,000. These lower costs have not come at the expense of capability; rather, they have been accompanied by improvements in workstation image display and in human/machine interface ergonomics.
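The generation-to-generation savings can be expressed as percentages. A small arithmetic sketch using only the three prices quoted above:

```python
# Unit prices of the three PTO dual-display workstation generations (from the report).
PRICES = {"first": 45_000, "second": 35_000, "third": 12_000}


def percent_drop(old: int, new: int) -> float:
    """Percentage reduction from old to new, rounded to one decimal place."""
    return round(100 * (old - new) / old, 1)


print(percent_drop(PRICES["first"], PRICES["second"]))  # 22.2
print(percent_drop(PRICES["second"], PRICES["third"]))  # 65.7
print(percent_drop(PRICES["first"], PRICES["third"]))   # 73.3
```

Most of the saving came with the third generation, whose unit cost is just over a quarter of the original.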

Data Retrieval and Display: Elapsed time to retrieve the first patent image is a critical PTO system performance measure. The dual display workstations provide simultaneous side-by-side viewing of two patent images. System administrators noted that the image zoom (enlargement) capability is used less than expected, whereas the image rotation feature is used frequently. No user interface window manager software was available when the system was designed and implemented.

Printing Subsystem: All print requests from the group printers use the higher quality 300 dpi optical disk jukebox images. For best results, PTO staff calibrate the laser printers using actual document images rather than test targets.
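The division of labor between the two storage tiers can be summarized as a simple routing rule: screen display comes from the faster 150 dpi RADs, the jukeboxes back up RAD retrievals, and all printing uses the 300 dpi jukebox images. The sketch below is a hypothetical illustration of that policy (the `Source` enum and `image_source` function are invented names); the report does not describe how the APS actually implements the routing.

```python
from enum import Enum


class Source(Enum):
    RAD = "RAD, 150 dpi"          # faster access, smaller capacity
    JUKEBOX = "jukebox, 300 dpi"  # slower access, larger capacity


def image_source(purpose: str, rad_online: bool = True) -> Source:
    """Hypothetical routing rule for one patent-image request."""
    if purpose == "print":
        # All group-printer output uses the 300 dpi jukebox images.
        return Source.JUKEBOX
    if purpose == "display":
        # Display normally comes from the RADs; the jukeboxes back up RAD retrievals.
        return Source.RAD if rad_online else Source.JUKEBOX
    raise ValueError(f"unknown purpose: {purpose!r}")


print(image_source("display").value)                    # RAD, 150 dpi
print(image_source("display", rad_online=False).value)  # jukebox, 300 dpi
print(image_source("print").value)                      # jukebox, 300 dpi
```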

DATA MIGRATION POLICY ISSUES:

Storage Factors: The Blue Ribbon Industry Review Panel recommended adoption of the RAD configuration, based on system performance and cost considerations. RAD images stored at 150 dpi provide faster system response as well as legible screen images. The RAD subsystem currently contains approximately 2.4 million patents and 1.5 million images.

Back-up of Image and Index Data: The PTO image storage subsystem automatically writes two Sony WORM optical digital data disks with the same content. One is loaded into a jukebox to service data retrievals, while the second is the designated archival back-up.
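Part of the performance advantage of 150 dpi storage follows from simple geometry: halving the resolution in each direction cuts the pixel count by a factor of four. The sketch below computes uncompressed, 1-bit-per-pixel page sizes, assuming a US letter (8.5 by 11 inch) page; the report does not state page dimensions or the compression used. Compression will shrink both figures, but the 300 dpi image still carries four times as many pixels.

```python
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0  # assumed US letter page


def raw_bitonal_bytes(dpi: int) -> int:
    """Uncompressed size of a 1-bit-per-pixel page image at the given resolution."""
    width_px = round(PAGE_W_IN * dpi)
    height_px = round(PAGE_H_IN * dpi)
    return (width_px * height_px + 7) // 8  # 8 pixels per byte, rounded up


for dpi in (150, 300):
    print(dpi, raw_bitonal_bytes(dpi))
# 150 262969
# 300 1051875
```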

Technical Support and Documentation: Technical support for the imaging system is provided under the primary integration contract. PTO expects to have full hardware and software documentation describing the system in detail in time for the 1996 contract re-bid. This documentation will be patterned after Department of Defense requirements and will include a configuration management plan and device driver specifications. Technical support for maintenance and repair of the optical imaging system will also be included.

Interoperability: A Patent Application Management (PAM) system and Trademark Systems are under development. The existing Trademark system runs on a proprietary Unisys system and lacks documentation and source code.

Migration Plans: PTO is undergoing transitional planning for a new systems integration contract scheduled for fiscal year 1996. The new system is expected to be fully compliant with the Government Open Systems Interconnection Profile (GOSIP). PTO's migration strategy is constrained by existing government procurement rules; for example, PTO cannot specify Sony compatibility or specific data portability requirements. Another consideration is the unavoidable physical degradation inherent in mountable (spinning/mechanical) storage media. Because storage technology continues to develop, PTO expects to retain patent image data across two successive generations of optical digital data disk hardware. In a sense, PTO "leapfrogged" technology when it changed from Optimem to LMSI drives. Successful leapfrogging requires high level planning and rigorous technology forecasting to jump over marginal increases in storage technology. An in-house system architect or "visionary" is important for forecasting the future and for ensuring that existing processes take advantage of the jump.

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: From the late 1970s to the early 1980s, PTO conducted a full scale, office-wide operational assessment. This effort produced an analysis of existing work flow processes, a requirements statement specifying a representative system architecture, an operational concept, and a procedural level analysis.

Agency-wide Planning and Organization: The PTO's initial imaging system was overly complex and costly. The Blue Ribbon Industry Review Panel concluded that the difficulties arose because political expectations far exceeded the reality of technology development. Imaging systems on the scale of the PTO's need high visibility within the agency, and administrative leadership is a key to success.

Integration with Existing and Future Systems: The original system design envisioned a comprehensive information system supporting all of the PTO's operations. This highly distributed processing system was eventually scaled down and focused on supporting the patent examination process. PTO expects the larger, far-reaching vision to be achieved within ten years.

International Cooperation: An international effort began in the early 1980s to standardize the patent process and improve information exchange. Participants included the U.S., Japan, the European Patent Office, and most European Community countries. Cooperative arrangements with the Japanese and European Patent Offices are continuing toward this goal.

Lessons Learned:

1) The commonly expressed assumption that implementing optical digital data disk technology is risk-free, because it is basically similar to magnetic media except for the "data pits," is incorrect. Integrating optical digital data disk systems is more complex than integrating magnetic storage systems, and familiar ways of storing information do not necessarily transfer to new technologies. Adopting any new technology requires changes in many areas, including staff knowledge, system capabilities, and operational processes.

2) The half-life of data is much longer than the half-life of optical digital data disk hardware and of the optical equipment manufacturers. The ability to migrate data to newer systems therefore becomes important, with greater emphasis placed on avoiding proprietary technological solutions.

3) When system performance is a key driving force, areas such as high performance communication systems and protocols become critical system components.

4) A strong technical development and management staff capability is needed. The PTO learned this both from actual operations and from the Blue Ribbon Industry Review Panel's review. Data capture and system management proved to be significant issues with correspondingly large costs, and system loading and database creation were also expensive. A comprehensive migration path is critical for agencies considering a new technical approach to their normal course of business.

5) Top level comprehensive project management and agency head commitment to program goals are critical to program success.

6) Implementation plans and policy: The PTO was faulted for automating its existing processes through an incremental, evolutionary approach rather than dictating change to the agency's ingrained culture. PTO management adopted a cautious approach because of the potential impact on daily operations, such as the need to develop new information searching techniques. The PTO attempted to establish a foundation within the agency so that changes can occur over time, and its leadership believes an organized approach is mandatory when planning for change.

7) The PTO's most critical system acceptance factor was workstation ergonomics. This included human/machine interface issues of image legibility, contrast, screen text formats, user interfaces, and the physical placement of screens and keyboards. The PTO's original system configuration failed to adequately address technology capabilities and user perceptions. Human/machine interface issues are not unique to the PTO; they arise in any information system that requires users to work at workstation terminals for extended periods. The Europeans are working on a long range study of human/machine interfaces, which involves many complicated, interrelated issues.

8) Data quality was given considerable emphasis during capture and load. In retrospect this was a wise decision, but more could have been done, especially to automate the data assurance steps.
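One way to automate data assurance is to record a checksum for every image at capture time and re-verify each stored copy, such as the service and archival disks described under Back-up of Image and Index Data, after loading. The sketch below models each disk as an in-memory mapping of image names to bytes and uses SHA-256 from Python's standard library; both choices, and the function names, are assumptions for illustration rather than PTO practice.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one stored image."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(captured: dict[str, str],
                  disks: dict[str, dict[str, bytes]]) -> list[str]:
    """Compare every copy on every disk with the checksum taken at capture.

    Returns a list of problems; an empty list means all copies verified.
    """
    problems = []
    for label, contents in disks.items():
        for name, expected in captured.items():
            data = contents.get(name)
            if data is None:
                problems.append(f"{label}/{name}: missing")
            elif digest(data) != expected:
                problems.append(f"{label}/{name}: checksum mismatch")
    return problems


# Checksums are taken once at capture, then each disk copy is checked after load.
pages = {"p0001": b"page one", "p0002": b"page two"}
captured = {name: digest(data) for name, data in pages.items()}
disks = {
    "service": dict(pages),
    "archive": {"p0001": b"page one", "p0002": b"page tw0"},  # simulated corruption
}
print(verify_copies(captured, disks))  # ['archive/p0002: checksum mismatch']
```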


SITE VISIT REPORT #13

AGENCY: Social Security Administration

SYSTEM: EASEAR Pilot Project

CONTACT: Malcolm Ewell, Social Security Administration, Baltimore, MD 21235

SUMMARY DESCRIPTION:

The Social Security Administration (SSA) developed a comprehensive, agency-wide information systems implementation strategy to gain increased management control over the agency's planned automation initiatives. One of the SSA's first optical digital data disk technology projects, the Earnings Accounts Self-Employed Annual Reports (EASEAR) pilot system, was developed in conjunction with staff of the National Institute of Standards and Technology (NIST) Advanced Systems Division. The pilot began operations in June 1991, supporting efforts to evaluate the practicality of replacing an existing manual microfilm system with an electronic search and retrieval process. The existing EASEAR holdings consist of more than 100,000 rolls of computer output microfilm (COM), and access to the information can be impeded by out-of-file conditions. A subsequent prototype, EAMATE (Earnings Accounts Magnetic Annual Tape Employers), has been tested and will begin pilot operations in 1994. The pilot system required the conversion of nine-track computer tape data to rewritable optical media. Self-employed earnings information for tax years 1988/1989 (approximately five gigabytes of data) was recorded onto 5 1/4-inch optical digital data disks and stored in a jukebox for servicing user requests. For the subsequent Employer Report (EAMATE) prototype, a subset of one year's holdings was similarly converted and stored. Senior level SSA approval has since been obtained to pilot the EAMATE prototype, to expand the pilot system, and to convert additional tax year data.

The EASEAR pilot system is important for this report because of: the SSA's comprehensive automation plans regarding optical digital data disk systems; the agency's use of pilot projects for concept verification prior to full scale implementations; the level and effectiveness of cooperative agreements between the SSA and NIST; and the concerted effort to reduce labor intensive processes in SSA offices nationwide.

BACKGROUND:

The Social Security Administration, dating from 1936, is part of the Department of Health and Human Services. The SSA administers a national program of contributory social insurance whereby employees, employers, and the self-employed contribute to special trust funds. In the event that earnings stop or are reduced because of retirement, death, or disability, the worker's family is provided with monthly cash benefits to partially replace the lost earnings. Principal programs include the Old-Age, Survivors, and Disability Insurance program, which provides monthly benefits to retired and disabled workers, their spouses and children, and to survivors of insured workers. The Social Security Administration also administers the Supplemental Security Income program for the aged, blind, and disabled. The SSA uses a nationwide field organization of 10 regional offices, 6 program service centers, 3 data operation centers, and over 1,300 local offices to direct all aspects of its cash benefit program operations, and it directs the activities of the offices responsible for various program operations, including retirement, survivors, disability insurance, and supplemental security income. The Social Security Administration also provides administrative direction to a national organization of administrative law judges, who conduct independent hearings and decide appeals of determinations involving the benefit provisions of SSA programs. Social Security Administration operations are decentralized to provide appropriate services at the local level. The United States is divided into 10 regions, each headed by a Regional Commissioner who is responsible for ensuring that services are effective and consistent with national and regional requirements.

Origins: Interest in optical digital data disk systems at the Social Security Administration dates back to 1984, beginning with agency staff investigations into microcomputer networks and evaluations of an "electronic folder" concept. The agency faced two obstacles: high system costs relative to the funding available for its large scale systems, and technical problems associated with existing non-standardized technologies and data formats. The SSA created an information policy plan defining the various agency initiatives and a strategy for implementing technological solutions. The first SSA optical pilot project was the EASEAR pilot system. A three-phase approach was adopted for design and implementation of the Earnings Reports systems, and two of the three phases have been completed to date. The first phase converted two years of EASEAR (self-employed) data from magnetic tape to optical digital data disks. Phase two converted a subset of one year's holdings of regular wage earner/employer report files (EAMATE) from magnetic tape to optical digital data disks. The third phase, planned for 1994, will involve expanding the pilot system and linking it with the SSA's main computers. The pilot system allows SSA staff to evaluate the impact of the technology on overall agency operations and to determine the benefits of online data access.

SYSTEM CONFIGURATION:

Computer workstations and printing equipment were installed in the SSA's two pilot sites: the Division of Operations Support's Archive Area; and, the Division of Certification and Coverage 2 (DCC) Files Maintenance Unit.

Date System Installed: June 1991

System Installed by: NIST and SSA staff. The EASEAR pilot system was exercised for more than a year, and the configuration has not changed.

SYSTEM HARDWARE

  • File Servers: Compaq 386/25 with 16MB RAM, 300MB hard drive.
  • Optical Storage: Hewlett-Packard C1710A Rack Mountable Optical Disk Library with 2 drives; jukebox capacity of 32 cartridge slots; optical media capacity of 600MB per platter; 18.375GB total storage.
  • Hewlett-Packard 88780B 1/2-inch open reel tape drive.
  • SCSI cable with male Amphenol connectors (3 each); WD 7000 FASST-2 SCSI Host Bus Adapter.
  • Arnet SmartPort 8-Multiport Adapter Board (1 each).
  • 19200 baud Datapath Modems (4 each supplied by OCRO)
  • Eltech 386 Personal Computers with color monitors and 101 key keyboards (4 each supplied by OCRO).
  • Okidata Dot Matrix Printers (2 each supplied by OCRO)

SYSTEM SOFTWARE

  • Interactive UNIX System V Release 3.2 Version 2.2.
  • Informix ESQL/C Database Engine.
  • Columbia Data Products Drivers and SDLP Interface
  • Rapport DOS/UNIX Bridge.
  • UNIX Compression Utility.

DATA CONVERSION:

SSA used the pilot optical digital data disk system to evaluate alternatives to an existing 16mm computer output microfilm (COM) system. The microfilm retrieval process requires teams of SSA employees called "scouts" to manually search for the self-employed (SE) wage reports. The pilot project's data conversion process involved machine readable data supplied on magnetic tapes, and no paper document scanning was required.

Digital Records: The EASEAR records (self-employment wage earner files) for tax years 1988/1989 contain 22 million records, stored on 34 nine-track, 6250 bpi open reel magnetic tapes. The same EASEAR tax information occupies 360 rolls of COM microfilm.
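These counts also indicate the scale of the conversion. A quick sketch of the arithmetic, using the report's approximate figures and its 300-record database file size; the file total is a lower bound, because each file holds only one IRD and the last file for each district will usually be partly empty.

```python
import math

RECORDS = 22_000_000     # EASEAR records, tax years 1988/1989 (approximate)
TAPES = 34               # nine-track open reel tapes
COM_ROLLS = 360          # 16mm COM rolls holding the same data
RECORDS_PER_FILE = 300   # optical database file size

records_per_tape = RECORDS / TAPES
records_per_roll = RECORDS / COM_ROLLS
min_files = math.ceil(RECORDS / RECORDS_PER_FILE)  # ignores partial files at IRD boundaries

print(f"{records_per_tape:,.0f} records per tape")  # 647,059 records per tape
print(f"{records_per_roll:,.0f} records per roll")  # 61,111 records per roll
print(f"at least {min_files:,} database files")     # at least 73,334 database files
```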

Data Conversion: The original magnetic tape data, in the EBCDIC character set, was formatted with a header, data, and a trailer separated by filemarks. The data was converted to the ASCII character set, analyzed to determine each record's internal revenue district (IRD), and then formatted into distinct database files of 300 records each. This file size optimizes data compression and improves system response. Each data file contains one IRD and is identified by a unique name consisting of the IRD and a sequential file number. After the data was converted to an Informix relational database table format, the final steps were data compression using a Huffman encoding scheme, followed by recording on the optical media. During the data conversion phase, the system manager was responsible for loading, formatting, and unloading the optical digital data disk cartridges, and for signing on to the UNIX operating system (as super-user) to run the data conversion programs.
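The conversion steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NIST software: the record layout (IRD in the first two characters, last name and initials in the next twenty) and the `IRD.sequence` file naming are hypothetical, EBCDIC code page 037 is assumed, header/trailer removal and the Informix load are omitted, and zlib (DEFLATE) stands in for the Huffman-based compression the pilot used.

```python
import zlib
from itertools import groupby

RECORDS_PER_FILE = 300     # file size reported for EASEAR
IRD_FIELD = slice(0, 2)    # hypothetical layout: IRD in columns 1-2
NAME_FIELD = slice(2, 22)  # hypothetical: last name and initials in columns 3-22


def convert(ebcdic_records: list[bytes]) -> dict[str, bytes]:
    """Turn EBCDIC tape records into compressed, per-IRD files of up to 300 records."""
    # 1. EBCDIC -> text. cp037 (US/Canada EBCDIC) is assumed; the tape's code page is not stated.
    records = [r.decode("cp037") for r in ebcdic_records]
    # 2. Order by IRD, then name, so each file covers one IRD and a contiguous name range.
    records.sort(key=lambda r: (r[IRD_FIELD], r[NAME_FIELD]))
    files: dict[str, bytes] = {}
    for ird, group in groupby(records, key=lambda r: r[IRD_FIELD]):
        members = list(group)
        # 3. Cut each IRD into 300-record files named IRD + sequential number.
        for seq, start in enumerate(range(0, len(members), RECORDS_PER_FILE), start=1):
            chunk = members[start:start + RECORDS_PER_FILE]
            # Encoding as ASCII raises an error for any character with no ASCII equivalent.
            payload = "\n".join(chunk).encode("ascii")
            # 4. Compress before writing to optical media (zlib stands in for the SSA tool).
            files[f"{ird}.{seq:03d}"] = zlib.compress(payload)
    return files


# Example with synthetic records: 650 for IRD "01" and 10 for IRD "02".
sample = [f"01NAME{i:04d}".ljust(30).encode("cp037") for i in range(650)]
sample += [f"02NAME{i:04d}".ljust(30).encode("cp037") for i in range(10)]
files = convert(sample)
print(sorted(files))  # ['01.001', '01.002', '01.003', '02.001']
```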

Conversion Staff: EASEAR data was converted to optical digital data disk format by on-site SSA staff and the NIST development staff.

Data Retention: The EASEAR data is normally retained indefinitely on microfilm.

Scanning Resolution: Not applicable (no documents were scanned).

Color and Gray Scale: ASCII data only, no gray scale processing. Another SSA pilot project under the Office of Disabilities and International Operations (ODIO) will use gray scale technology to enhance medical records (e.g. X-ray films).

Image Enhancement: Not applicable (non-image data).

System Management: The EASEAR pilot system has system monitoring and management reporting capabilities that provide data on system performance and utilization. Information collected includes: the number of user accesses to the system; the number of queries for the 1988 and 1989 tax years; the time used by each query for tax year information; and the number of requests for detailed records and prints. This information is proving useful to SSA in evaluating the pilot's utilization, capabilities, and performance.
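The statistics listed above amount to a handful of counters and timers. A hypothetical sketch of such a tally (the `UsageLog` class and its field names are invented for illustration; the report does not describe how the pilot records them):

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UsageLog:
    """Hypothetical tally of the EASEAR pilot's management statistics."""
    sessions: int = 0                                          # user accesses to the system
    queries_by_year: Counter = field(default_factory=Counter)  # queries per tax year
    query_seconds: list = field(default_factory=list)          # time used by each query
    detail_requests: int = 0                                   # Detail Screen requests
    print_requests: int = 0                                    # printouts

    def log_query(self, tax_year: int, seconds: float) -> None:
        self.queries_by_year[tax_year] += 1
        self.query_seconds.append(seconds)

    def mean_query_seconds(self) -> float:
        if not self.query_seconds:
            return 0.0
        return sum(self.query_seconds) / len(self.query_seconds)


log = UsageLog()
log.sessions += 1
log.log_query(1988, 4.0)
log.log_query(1989, 6.0)
log.log_query(1989, 5.0)
log.detail_requests += 1
print(log.queries_by_year[1989], log.mean_query_seconds())  # 2 5.0
```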

DOCUMENT INDEXING:

The query screen requires users to provide, at a minimum, the tax year, the internal revenue district number, and the self-employed individual's last name. Other search parameters are optional and help refine the search. The NIST pilot project staff successfully integrated commercial off-the-shelf (COTS) software with custom-developed C programs.

Creation of Index Database: Each index file contains 300 records. The index entries are based on the IRD number, the last names and initials of the beginning and ending self-employed wage earners, and the filename of each output file.

Location of Index Database: The self-employed wage earners' last names and initials are converted to a database table format and stored on magnetic disk media. Field information on the location of the data on the optical digital data disks is also added to the tables.
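Because each index entry stores the beginning and ending names for its file, a query can be narrowed to the few files whose name range could contain the requested name before anything is read from optical disk. The sketch below assumes the letters entered on the query screen are the leading letters of the last name and that names within an IRD are stored in sorted order; `IndexEntry`, `candidate_files`, and the sample entries are hypothetical.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    ird: str        # internal revenue district
    low_name: str   # first last-name/initials key stored in the file
    high_name: str  # last last-name/initials key stored in the file
    filename: str   # compressed data file on optical disk


def candidate_files(index: list[IndexEntry], ird: str, prefix: str) -> list[str]:
    """Files in `ird` whose name range overlaps names beginning with `prefix`."""
    prefix = prefix.upper()
    return [
        e.filename
        for e in index
        if e.ird == ird
        # the file ends at or after the smallest name starting with prefix...
        and e.high_name >= prefix
        # ...and does not start after every name beginning with prefix
        and e.low_name[: len(prefix)] <= prefix
    ]


INDEX = [
    IndexEntry("01", "ADAMS J", "BAKER T", "01.001"),
    IndexEntry("01", "BAKER W", "JONES A", "01.002"),
    IndexEntry("01", "JONES B", "SMITH Z", "01.003"),
    IndexEntry("02", "ADAMS A", "ZYLKA Q", "02.001"),
]
print(candidate_files(INDEX, "01", "Bake"))  # ['01.001', '01.002']
print(candidate_files(INDEX, "01", "SMIT"))  # ['01.003']
```

A common name such as BAKER can straddle two files, so a single query may need to decompress more than one file.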

OPTICAL DIGITAL DATA DISK STORAGE:

The optical storage media and drive equipment were selected on the basis of conformance with industry standards, system compatibility, and performance criteria. NIST staff selected removable, rewritable optical media compatible with the international standard ISO/IEC 10089. The optical disk drives are controlled by UNIX software utilities with Columbia Data Products software drivers. The first step in the data conversion process was to copy the EASEAR data directly to the optical media. The data was divided into 50-block sets and converted to ASCII. Eliminating manual handling of the magnetic tapes streamlined the conversion process and provided direct access to the data.

Data File Headers: The data was separated into unique files containing 300 records each to optimize data compression and provide reasonable response times to user requests.

Error Detection/Correction: Technique not specified.

Recording Process: Rewritable magneto-optic technology.

Optical Disk Composition: 5 1/4-inch (130 mm), double-sided media.

Capacity: 600MB per optical digital data disk cartridge.

Number of Optical Disks in Use: 6 (the data was split across disks for access purposes).

Compression/Decompression: Lempel-Ziv-Welch (LZW) compression/decompression code from Dr. Dobb's Journal was tried early in development; it was later replaced with the UNIX Compression Utility. Both the index data and the EASEAR database are compressed before storage. The Huffman encoding scheme compressed the data to approximately one gigabyte of optical storage for each tax year (1988 and 1989).
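For readers unfamiliar with the technique, the sketch below shows textbook Lempel-Ziv-Welch coding, the algorithm family used by the Dr. Dobb's code and by the UNIX compress utility. It is a simplified illustration, not the code SSA ran: it emits a list of integer codes and lets the dictionary grow without limit, whereas compress packs variable-width codes into bits and bounds the table size.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as LZW codes; codes 0-255 are the single bytes."""
    table = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                    # keep extending the current match
        else:
            out.append(table[w])      # emit the longest known string
            table[wc] = len(table)    # and learn one new string
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and decode."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]         # code defined by this very step (cScSc case)
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), len(codes))  # 24 16
assert lzw_decompress(codes) == sample
```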

Jukebox: One Hewlett-Packard C1710A rack mountable automated optical digital data disk library, containing 2 optical disk drives and storage slots for 32 optical digital data disks; 18.375GB total jukebox storage.

Storage Environment: The system is operated in a normal office environment, but the file server and jukebox are housed in an environmentally controlled computer room.

RETRIEVAL AND OUTPUT:

After the 1988/1989 tax data was converted to optical digital data disks, the records custodians deliberately withheld the existing EASEAR microfilm rolls to ensure that the pilot system's retrieval capabilities were used. SSA staff from various divisions, responsible for searching the existing microfilm holdings, received training in EASEAR pilot system search and retrieval operations. NIST software specialists designed special display screen formats to make the system easy to use for the SSA scouts tasked with data retrievals. A hard copy data printout is also available.

Primary System Users: SSA scouts and other staff from various divisions who are responsible for searching the existing microfilm holdings.

Query Screen: On the EASEAR query screen, the requester enters the search parameters: the tax year (1988 or 1989), the Internal Revenue District (IRD), and a minimum of 4 letters of the last name. Other information is optional but helps refine the search. The EASEAR computer processes the query using the entered key information, conducts a search, retrieves the optically stored data, and displays the results in the Browse Screen format.
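The query screen's required fields translate into a simple validation check that can feed the Error Screen. A hypothetical sketch (the function name and message text are invented; only the rules come from the report):

```python
VALID_TAX_YEARS = {1988, 1989}  # years loaded in the pilot
MIN_NAME_LETTERS = 4            # minimum letters of the last name


def validate_query(tax_year: int, ird: str, last_name: str) -> list[str]:
    """Return error messages for the Error Screen; an empty list means the query can run."""
    errors = []
    if tax_year not in VALID_TAX_YEARS:
        errors.append("Tax year must be 1988 or 1989.")
    if not ird.strip():
        errors.append("Internal Revenue District is required.")
    if sum(ch.isalpha() for ch in last_name) < MIN_NAME_LETTERS:
        errors.append("Enter at least 4 letters of the last name.")
    return errors


print(validate_query(1988, "01", "SMITH"))  # []
print(validate_query(1990, "", "LI"))       # three error messages
```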

Browse Screen: The EASEAR browse screen provides data on all records matching the query entry. The user may peruse the records line-by-line or page-by-page under keyboard control. Individual records may be selected for a more detailed examination under the Detail Screen format.

Detail Screen: The detail screen provides the full EASEAR record information as requested, displayed in a format identical to the corresponding microfilm. After viewing the data, the requester may print the selected information or the entire screen on the pilot system's dot matrix printer.

Printout: The pilot optical digital data disk system formats printouts to match the layout of the data on the microfilm.

Error Screen: This function displays various error messages concerning improper search parameters or possible system difficulties. The user is notified to contact the computer room for assistance when appropriate.

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: The EASEAR pilot system served as a data storage model for the regular wage earners (EAMATE) system.

Network Transmission: The pilot project is a stand-alone system and has no network capability. The subsequent EAMATE prototype, to be piloted in 1994, introduces network capability and will operate with 25 workstations.

Backup of Image and Index Data: Data backup schedule not specified.

Technical Support and Documentation: The Social Security Administration has effectively used inter-agency agreements with the National Institute of Standards and Technology (NIST) to obtain assistance in building in-house application development expertise. The EASEAR system is thoroughly described by Natalie Willman in the August 1991 NIST report.

Interoperability: EAMATE, a companion system for storing digital data on regular wage earners from employers who report on magnetic media, averages 176 million records per year. SSA staff working with NIST personnel have converted a selected subset of the EAMATE records into an enhanced optical digital data disk prototype system.

Migration Plans: SSA considers the 5 1/4-inch optical digital data disk format to be the leading storage technology candidate for the future, but clearly understands that an agency's applications will ultimately dictate media selection. Media selection for imaging systems must weigh data access against storage capacity, with system designers working to solve the queuing and disk wait time problems of optical digital data disk libraries of various sizes and configurations.

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: The Social Security Administration is responsible for complex, data intensive programs. The SSA's senior management recognizes the importance of integrating automated systems into daily operations, and formulated an Information Systems Plan defining the agency's information strategy. The SSA currently has several systems in the planning or pilot stages including:
  • EASEAR Self-Employment Earnings System.
  • EAMATE Employer Annual Magnetic Tape Reports.
  • Data Entry Scanner Replacement System.
  • ADCAR Document Imaging System.
  • Paperless Processing Prototype and Pilot Operation

These systems include planned pilot projects as well as full scale imaging applications, ranging from machine readable data conversions (COM replacement systems) to massive document image scanning systems. The pilot systems will help SSA staff determine the impact of imaging technology on agency operations, identify potential problems before the systems are implemented on a larger scale, and capture valuable data concerning system operations. The larger document imaging systems will eliminate labor intensive paper processing or replace existing manual microfilm operations at SSA headquarters and in the program service centers nationwide. The SSA currently uses compact disc read-only memory (CD-ROM) technology to distribute procedural manuals in digital format to approximately 150 SSA pilot site offices nationwide. Requests for Proposals (RFPs) have been issued to extend this technology to many more offices; they are expected to provide the hardware and software for an additional 300 offices by the end of calendar year 1996.

ADCARS System: The SSA faces significant records and information retention challenges throughout the agency. SSA management is committed to technology-based solutions, including document imaging systems to reduce reliance on existing labor intensive paper records systems. The Office of Public Inquiries (OPI) is developing ADCARS, a mission critical optical digital data disk imaging system. ADCARS will be integrated into the existing SSA correspondence operations, improving the processing of high priority inquiries received from sources such as the White House, Congress, the Secretary of HHS, and other Federal agencies. The system will process approximately 120,000 inquiries per year, including telephone inquiries that are political or policy oriented. Other SSA correspondence sections will continue to handle routine, lower level public requests for data. ADCARS will support the 120-member staff in tracking, controlling, retrieving, and displaying all high priority incoming requests for agency maintained data, and will need to handle the four hundred access requests received each week.

The records generated during the inquiry resolution process do not have long-term or permanent archival value. The current retention (shelf storage) period for the up to 500,000 active files is three months after processing is completed. SSA's Freedom of Information Act (FOIA) Officer is part of the inquiry process and will be included in the new ADCARS system. Denials and appeals for information are also processed by OPI personnel. NIST personnel assisted the SSA in fine-tuning the ADCARS functional specifications. Special emphasis was placed on optical media issues including: technology standardization and marketplace availability; pre- and post-write media longevity; optical drive response times (millisecond range); jukebox capacity; and the availability of equipment from multiple commercial sources.

ADCARS workstation design is another critical factor, as the SSA currently has more than 40,000 terminals that cannot display images. SSA staff prefer a single display configuration for index and image display. ADCARS will operate in a Token Ring environment, with the ADCARS index maintained locally on MS-DOS workstations. A wide area network (WAN) link to the SSA's mainframe computer system will be provided through a host/database server configuration. ADCARS will function as an office automation system, combining imaging technology under a Windows environment with word processing, E-mail, and FAX. ADCARS will employ document scanners (with forms dropout capability) in the OPI mailroom, creating electronic folders with the system's workflow software. Various servers will support scanning, optical storage, index database retrieval, LAN files, printing/FAX, and CD-ROM. Word processing output will also be stored on the optical digital data disk media, and hard copy output will be produced as high quality laser prints on bond paper.

Data Entry Scanner Replacement System: In April 1993 the SSA awarded a contract to replace the data entry scanners and support systems at its Data Operation Centers (DOCs). Earnings reports received on paper make up the primary workload. As part of its annual wage reporting process, the agency receives approximately 70 million paper W-2 forms from employers each year, which are processed in its three DOCs. The DOCs also receive and process documents from employers, individuals, and other State and Federal agencies in connection with other workloads, such as Representative Payee and the Health Care Financing Administration Application for Medical Insurance.

The workloads are currently processed by: (1) optical character recognition devices; (2) hand-held scanning wands; or (3) keying from paper documents or microfilm at terminal keyboard workstations. This initiative calls for the development of an integrated image-based data capture system for the documents received and processed in the DOCs. When documents enter the system, an electronic digitized image will be captured. For documents to be character recognized, the system will convert data on the documents to machine-readable format. The system will use two keying techniques to correct unrecognized or rejected data: character identification and data validation. The major functions of the system will be image capture, data capture, data purification, storage, microfilm output, processing control, and communications.
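
The correction step described above can be pictured as a routing rule applied to each recognized field. The following is a minimal Python sketch under stated assumptions, not a description of the contractor's software: the confidence threshold, the use of "?" to mark a rejected character, and the Social Security number format check are all hypothetical stand-ins for the system's actual character identification and data validation rules.

```python
from dataclasses import dataclass

# Hypothetical cutoff: OCR results below this confidence go to an operator.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class FieldResult:
    name: str          # field name on the form, e.g. "ssn" (illustrative)
    text: str          # characters returned by OCR; "?" marks a rejected character
    confidence: float  # recognizer's self-reported confidence, 0.0 to 1.0


def needs_character_identification(field: FieldResult) -> bool:
    """Send the field to keying if OCR rejected a character or was unsure."""
    return "?" in field.text or field.confidence < CONFIDENCE_THRESHOLD


def validate_ssn(text: str) -> bool:
    """Illustrative data-validation rule: exactly nine digits once hyphens are removed."""
    digits = text.replace("-", "")
    return len(digits) == 9 and digits.isdigit()


def route(fields):
    """Split a document's fields into accepted, character-ID, and validation queues."""
    accepted, char_id, validation = [], [], []
    for f in fields:
        if needs_character_identification(f):
            char_id.append(f.name)
        elif f.name == "ssn" and not validate_ssn(f.text):
            validation.append(f.name)
        else:
            accepted.append(f.name)
    return accepted, char_id, validation
```

Separating the two queues mirrors the distinction drawn in the text: character identification fixes what the recognizer could not read, while data validation catches values that were read but are implausible.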

The total installation will occur in two stages. A minimum configuration of equipment was installed in the new Wilkes-Barre Data Operation Center facility for validation of the Annual Wage Reporting (AWR) software. Contractor-developed software for remaining workloads was completed in time for validation in September 1993. Installation of the full configuration is scheduled to begin in November 1993 and end in January 1994. Acceptance testing and processing of the Tax Year 1993 AWR data will start in February 1994. Installation of the remaining equipment will begin in November 1994. By February 1995, the new system will be processing all DOC workloads.

EAMATE Prototype: The second phase of the optical digital data disk pilot operations involving earnings reports began in January 1993 with the exercising of the EAMATE Prototype system. The system was tried out by about 20 OCRO scouts who normally search earnings reports on microfilm. Unlike the EASEAR Pilot, the EAMATE prototype was designed for a Microsoft Windows environment. The scouts, although initially unfamiliar even with a mouse, learned the system very rapidly. One of NIST's objectives was to develop a user interface that was easy to learn and easy to use; an estimated one hour of training was all that was required to get the scouts using the system.

The EAMATE files represent the portion of employer reports that SSA receives on magnetic media, usually from large employers with anywhere from 100 to well over 400,000 employees, and total about 175 million records per year. These files, maintained on microfilm, parallel the reports received on paper, generally from small employers, in SSA's AWR process, which total approximately 67 million records per year. These two sets of files, together with the EASEAR files, must be searched whenever a person's earnings record must be verified. This is a cumbersome process: the film is difficult to read, and employees are frequently listed in no particular order, requiring a sequential search through the entire report. The high storage density and cost effectiveness of optical digital data disk technology suggested the possibility of automating these files. With the large employer reports available in electronic form, and the small employer reports now being digitally captured and converted to text through optical character recognition by the data entry scanner replacement system, the project was initiated to explore the feasibility of automating these files and re-engineering the scouting process for earnings verification.

A secondary objective was to explore non-traditional searching techniques to locate records that normally elude computer searches (e.g., misspellings, incorrect SSNs). The agency has decided to expand the testing of the EAMATE Prototype system in an operational environment by authorizing the procurement of hardware and software for a six-month pilot. The pilot is expected to begin in June 1994, after the acquisition of sufficient equipment to establish a network of twenty-five workstations and the conversion of a full year of data.
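
One common way to locate records that elude exact-match searches is approximate string matching. The sketch below uses Levenshtein edit distance, a standard technique chosen here for illustration; the report does not say which methods NIST explored. The record layout, the tolerance values, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def tolerant_search(records, name, ssn, max_name_dist=2, max_ssn_dist=1):
    """Return records whose name OR SSN is within the given edit distance of the query."""
    hits = []
    for rec in records:
        name_close = edit_distance(rec["name"].upper(), name.upper()) <= max_name_dist
        ssn_close = edit_distance(rec["ssn"], ssn) <= max_ssn_dist
        if name_close or ssn_close:
            hits.append(rec)
    return hits
```

Edit distance catches a misspelled name or a single mistyped SSN digit. Under plain Levenshtein distance a transposed pair of digits counts as two edits, so a production system might prefer a variant that treats a transposition as a single edit.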


SITE VISIT REPORT #14

AGENCY: State Department

SYSTEM: Freedom of Information Act Case Processing Prototype

CONTACT: Jacqui Lilly, Technology Application Branch, Department of State, Washington, DC

SUMMARY DESCRIPTION:

The Department of State utilizes a digital imaging system to improve the processing of requests received under the Freedom of Information Act (FOIA) and related legislation. The Office of Freedom of Information, Privacy and Classification Review in the State Department's Information Services Directorate (IS/FPC) installed a prototype imaging system called REDAC in February 1991 and has since continued to upgrade the operation. As its name implies, REDAC allows on-line review and selective electronic excision (editing) of sensitive, withholdable document text before public release. Excisions are contained in overlay files so that the original document remains unaltered. REDAC also enables on-line storage and retrieval of previously reviewed documents. The REDAC imaging process begins with the creation of a case file and the scanning of documents that result from research on a specific request. Additional processing steps include: the retention of unadulterated copies of the original documents, along with overlays created by State Department review staff to indicate the review results; and the clear outlining of those textual segments that must be excised prior to release to the requester. The REDAC system permanently stores the scanned document images on 12-inch write once, read many (WORM) optical digital data disks housed in a jukebox. The State Department continues to update and evolve the system through additional capabilities and hardware. A stand-alone retrieval terminal will be installed in the agency's public reading room, providing public researchers with automated search, request, and redacted document viewing and printing capability.
The State Department's FOIA processing system is important for this study because of: its unique system operational and redaction features; its potential linkage with other State Department office automation systems; and, the possibility that the system may serve as a model for other Federal agencies with similar document security classification and review requirements.

BACKGROUND:

The Department of State's overall mission is to advise the President in the formulation and execution of foreign policy. As Chief Executive, the President has overall responsibility for the foreign policy of the United States. The Department of State's primary objective in the conduct of foreign relations is to promote the long-range security and well-being of the United States. The Department determines and analyzes the facts relating to American overseas interests, makes recommendations on policy and future action, and takes the necessary steps to carry out established policy. In so doing, the Department engages in continuous consultations with the American public, the Congress, other U.S. departments and agencies, and foreign governments; negotiates treaties and agreements with foreign nations; speaks for the United States in the United Nations and in more than 50 major international organizations in which the United States participates; and represents the United States at more than 800 international conferences annually.

In the process of carrying out its responsibilities, the agency creates a significant amount of documentation in electronic and paper form that has immediate and high interest to persons outside the Department, as well as long-term research value. Release of such information to the public, which may or may not be security classified, is governed by a variety of laws, including the Freedom of Information Act (FOIA) and the Privacy and Ethics in Government Acts, as well as by Executive Order 12356. The State Department handles more than 5,000 FOIA and Privacy Act requests each year, in addition to servicing special project requests on behalf of the Secretary of State, or originating from the Congress, the courts, or elsewhere. Any identifiable Department of State record can be requested under the Freedom of Information Act (5 U.S.C. 552). Requestors should provide as much identifying information as possible about the document to assist the Department in locating it, including subject matter, timeframe, originator of the information, or any other helpful data. Only persons who are U.S. citizens or aliens who are lawfully admitted to the United States for permanent residence can request information under the Privacy Act. Under this act, individuals may request access to records that are maintained under the individual's name or some other personally identifiable symbol. Descriptions of record systems from which documents can be retrieved by the individual's name are published in the Federal Register. To facilitate processing of requests, individuals should specify the system of records they wish to have searched and be prepared to provide the necessary personal identification. A public reading room, where unclassified and declassified documents may be inspected and copied, is located in the Department of State.

The Office of Freedom of Information, Privacy and Classification Review of the Information Services Directorate (IS/FPC) is responsible for administering the agency's information access and document production program, with the aid of the Information Request Management System (INFORM). This mainframe-based system maintains a record of each request case as it progresses through each step of case processing. This tracking data provides immediate access to case status, and can generate prompting notices to bureaus responsible for searching and reviewing when their responses are overdue. INFORM also tracks State Department staff performance in responding to requests; indexes official agency documents that have been declassified, downgraded, upgraded or denied for release; and maintains, by document, the results of review to minimize subsequent search and review efforts and to ensure consistency in responses to requests for the same or similar information. INFORM can also determine whether retrieved or furnished documents have been previously declassified or are duplicates of documents submitted by other searching offices, which helps to streamline overall declassification efforts. INFORM provides State Department management with monthly, annual, and ad hoc statistical reports detailing workload performance and case status. INFORM operates in both direct-access and batch on-line modes, with case data entered and accessed through remote terminals located in a secured area. INFORM system software was developed using the Model 204 Data Base Management System and currently operates on a State Department IBM 3380 mainframe computer system.
Development efforts are underway for the migration of INFORM to a RISC 6000 using a relational DBMS, thereby enabling ultimate integration with the REDAC system and interoperability with other related systems, particularly the Department's corporate database.

Origins: The State Department's management decision to install an optical digital data disk-based redacted document processing system was made in 1990. REDAC was designed to meet the needs of government FOIA offices. Before the REDAC prototype, the review, excision, and annotation of agency documents was a time-consuming manual process using marked-up photocopies. The REDAC system allows on-line review of scanned documents, maintaining unadulterated copies of the original documents along with the review results. REDAC also clearly outlines those segments that must be excised before the document can be released. In addition, to justify the reviewer's action, the reasons for each excision are printed on the document adjacent to the excised segment. A State Department reviewer may attach notes or comments to a specific document explaining the decision-making process. Reviewers are drawn from a team of retired foreign service officers with extensive experience in national security and foreign policy issues. The electronic document "package" is then forwarded to a senior reviewer for verification of the original reviewer's decisions. Approved electronic document packages are forwarded to appropriate State Department staff for final action, including printing of the releasable versions for the requester and subsequent closing of the case. WORM optical digital data disk technology was selected because the media's non-alterability ensures the integrity of the FOIA process. Future State Department plans for REDAC include: increasing the number of reviewer workstations; installing a stand-alone workstation in the FOIA reading room for public use; and upgrading the current version of REDAC to a Microsoft Windows version. The State Department hopes to expand to a total of 125 workstations within FPC, to include all case officers, research specialists, reviewers and members of the Technology Application Branch.
The State Department will continue to utilize existing paper procedures to process requests for top-secret information, unless the case can be downgraded to a lower security sensitivity for imaging system retention. As mentioned previously, the State Department plans to ultimately integrate the REDAC and INFORM systems along with the Department's corporate database.

Date System Installed: 02/91; the REDAC system was upgraded during the summer of 1992, and Windows Version 3.0 is expected in 1994.

System Installed by: The system was developed by Severn Companies, Inc. of Lanham, Maryland (Vendor). The original system configuration used Banyan VINES 4.0 as the Network Operating System. The network interface card was 3COM's 3C503 (8 bit).

The servers and workstations listed below were Wang 386 PCs with 8 MB RAM and a 120 MB hard drive.

  • File Server (two 320 MB ESDI hard drives)
  • Database Server (Gupta DBMS)
  • Optical Server (Sony single-platter optical disk drive)
  • Print Server (Wang LDP8 printer attached)
  • Merge Server
  • 5 Reviewer Workstations with Cornerstone 19-inch high-resolution monitors
  • 2 Fujitsu desktop document scanners (15-20 pages @ 200 DPI)

The current system configuration uses Banyan VINES 5.52 (5) as the Network Operating System. The network card is 3COM's 3C507 (16 bit). The servers and workstations listed below are Wang 486 PCs with 16 MB RAM and a 120 MB hard drive.

  • File Server (four 632 MB SCSI hard drives and two 1.37 GB SCSI hard drives)
  • 2 Database Servers (Gupta DBMS)
  • Optical Server (Sony WDA-610 optical jukebox, 50-disk capacity)
  • Print Server (Wang LDP8 printer attached)
  • Merge Server
  • 20 Reviewer Workstations with Cornerstone 19-inch high-resolution monitors
  • Kodak ImageLink 900 document scanner (120 pages/min @ 200 DPI)

Efforts are underway to directly link the State Department's classified IBM Corp. 3381 mainframe computers storing the agency's cable traffic record, the INFORM system tracking the FOIA processing, and the REDAC LAN server-based optical digital data disk system.

DIGITAL IMAGE CAPTURE:

Document Preparation: Documents are manually sorted, arranged chronologically, and batch-scanned in small groups to improve scanning conversion. Currently, hard copy prints of electronic State Department cable traffic are produced prior to REDAC scanning; direct transfer of digital text plus indexing to optical digital data disk is a planned future enhancement.

Conversion Staff: State Department staff process the incoming mail requests, and on-site contractor personnel operate the scanning and indexing systems.

Document Scanning: A high speed Kodak scanner captures up to 120 pages per minute. LAN servers manage the REDAC work flow processes and automatically route the image and index data to appropriate workstations. The servers also reallocate magnetic storage cache after the images are recorded onto the optical digital data disks, and provide a system management capability to monitor the status of in-process cases.
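
The cache behavior described above, in which scanned images are held on magnetic disk until they are recorded on optical media and only then released, can be sketched as follows. This is an illustrative Python model with hypothetical names and sizes; it does not reflect the REDAC servers' actual design.

```python
class StagingCache:
    """Magnetic-disk staging area in front of a write-once optical volume (illustrative)."""

    def __init__(self, capacity_mb: float):
        self.capacity_mb = capacity_mb
        self.pending = {}  # image_id -> size in MB: scanned, not yet on optical
        self.optical = {}  # image_id -> size in MB: committed to WORM media

    def used_mb(self) -> float:
        """Magnetic cache space currently held by images awaiting optical recording."""
        return sum(self.pending.values())

    def add_scan(self, image_id: str, size_mb: float) -> None:
        """Stage a newly scanned image, refusing duplicates and cache overflow."""
        if image_id in self.pending or image_id in self.optical:
            raise RuntimeError(f"{image_id} already stored; WORM images are never rewritten")
        if self.used_mb() + size_mb > self.capacity_mb:
            raise RuntimeError("staging cache full; flush to optical first")
        self.pending[image_id] = size_mb

    def flush_to_optical(self) -> int:
        """Record all pending images once on the WORM volume, then free their cache space."""
        count = len(self.pending)
        self.optical.update(self.pending)  # each image is written exactly once
        self.pending.clear()               # magnetic space is reallocated only after the write
        return count
```

The key design point is ordering: cache space is released only after the optical write, so an image is never held in neither place.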

Disposition of Original Records: The REDAC system scans copies of documents. Original records remain in their respective filing systems to meet day-to-day operational requirements. Paper copies are no longer retained in case folders, since copies are available from the optical storage medium.

Quality Control: The REDAC scanning staff do not routinely use image test targets to calibrate the document scanners or set quality benchmarks. Visual inspections of the displayed images are used to adjust the scanning equipment to achieve optimum quality.

Scanning Resolution: Documents are routinely scanned at 200 dots per inch (DPI). For very poor quality originals, the resolution can be increased to as much as 400 DPI.

Color and Gray Scale: The REDAC system scanner offers binary (black & white) capability. No scanner color blindness or dropout problems were noted.

Image Enhancement: No special enhancement algorithms are employed (other than basic scanner contrast settings).

Compression/Decompression: Performed by proprietary software delivered with REDAC by Severn Companies, Inc. of Lanham, Maryland.

DOCUMENT INDEXING:

Location of Index Database: For the REDAC system, the document indexing information is stored on peripheral SCSI hard disks controlled by the file server, and on the optical digital data disks. This indexing duplicates the document listing currently maintained in the INFORM system. Future migration plans will, however, result in document indexing being performed in only one system, with index data transferred to the other system as appropriate.

Index Structures: Each document is indexed by at least twelve specific information fields, containing characters which vary in structure from free-text to controlled alphanumerics. Subject classification is controlled by an agency-wide thesaurus system, TAGS (Traffic Analysis by Geography and Subject), in existence since 1973. Other fields include an 80-character free-text description, document date, type and number of pages, issuing bureau, and classification level. Codes are also included for geographical region, organization, and subject.
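
A minimal sketch of an index record carrying the fields named above, with simple checks drawn from this report: the 80-character description limit, and the exclusion of Top Secret material from REDAC. The field names are hypothetical, the TAGS codes are treated as opaque strings because their format is not given here, and the remaining fields of the twelve are omitted.

```python
from dataclasses import dataclass
from datetime import date

# REDAC does not hold Top Secret material, so only these levels are accepted here.
ALLOWED_LEVELS = {"UNCLASSIFIED", "CONFIDENTIAL", "SECRET"}


@dataclass
class DocumentIndex:
    description: str        # free text, at most 80 characters
    doc_date: date
    doc_type: str
    page_count: int
    issuing_bureau: str
    classification: str
    region_code: str        # TAGS geographic code (format assumed opaque)
    organization_code: str
    subject_code: str       # TAGS subject code (format assumed opaque)

    def __post_init__(self):
        # Reject records that break the constraints stated in the report.
        if len(self.description) > 80:
            raise ValueError("description exceeds 80 characters")
        if self.page_count < 1:
            raise ValueError("page count must be positive")
        if self.classification.upper() not in ALLOWED_LEVELS:
            raise ValueError(f"classification {self.classification!r} not accepted")
```

Validating at record creation keeps malformed index entries from reaching the WORM media, where they could not later be corrected in place.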

OPTICAL DIGITAL DATA DISK STORAGE:

Image File Headers: A proprietary header format was supplied with the REDAC imaging system.

Error Detection/Correction: The REDAC imaging system's vendor supplied an error detection capability, but no specific technical details are available. No optical digital data disk failures or any inability to retrieve optically stored images were noted.

Recording Process: Write once, read many (WORM), dual sided Sony Corporation optical media using bi-metallic alloy recording technology.

Optical Disk Composition: 12-inch diameter, polycarbonate substrate material. Capacity: 3.2 gigabytes per side of each optical digital data disk (6.4 GB disk capacity).

Number of Optical Digital Data Disks in Use: FPC currently has 12 platters and is now using the second. Additional platters will be purchased as needed, since platter storage capacity has already doubled and will doubtless increase dramatically by the time more platters are needed.

Jukebox: A Sony Corporation 50-platter jukebox provides a maximum total optical storage system capacity of 328 gigabytes.

Storage Environment: The REDAC imaging system is operated in a normal office environment. The Information System Security Office has authorized FPC to operate the REDAC system at a Secret high level using non-Tempested equipment in a secured area. The jukebox is located in a SCIF (Secure Compartmented Information Facility) to facilitate linkage with the central data base for transfer of cable texts/indexing.

RETRIEVAL AND OUTPUT:

Primary System Users: State Department FOIA staff access the index and image data using staff workstations; public access workstations are planned for installation.

Display Output: Image and index data are displayed on Cornerstone 19-inch high-resolution monitors in a Windows-like user interface environment.

Network Transmission: Workstations are linked on a Local Area Network, but no outside transmission of image or index data is permitted. The Small Computer Systems Interface (SCSI-1) does not currently function with the Viewstar platform.

Document Security: Passwords are required as a system security measure. From a physical security perspective, the optical media (jukebox) is located in a SCIF area. In any case, Top Secret information is not stored in the REDAC system.

Output Printing: The existing REDAC system capabilities allow the printing of any of three versions of the requested document: 1) original format with no modifications; 2) requester version with sensitive information obliterated and exemption categories indicated; and, 3) "Justice" version which prints the entire text with the excised areas highlighted and exemption categories printed.
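
The overlay approach, in which excisions are kept separately from an unaltered original and combined only at print time, can be illustrated with the three output versions listed above. For readability this hypothetical Python sketch works on lines of text rather than raster images, assumes that excisions on a line do not overlap, and uses invented names; REDAC's actual overlay format is proprietary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Excision:
    line: int        # 0-based line index on the page
    start: int       # first character excised
    end: int         # one past the last character excised
    exemption: str   # FOIA exemption category, e.g. "(b)(6)"


def render(page, overlay, version):
    """Combine an unaltered page with its overlay for one of three print versions."""
    if version == "original":
        return list(page)  # no modifications
    out = list(page)       # work on a copy; the stored page is never modified
    # Apply excisions right to left so earlier offsets on the same line stay valid.
    for ex in sorted(overlay, key=lambda e: (e.line, e.start), reverse=True):
        text = out[ex.line]
        segment = text[ex.start:ex.end]
        if version == "requester":
            replacement = "#" * len(segment) + f" {ex.exemption}"  # obliterate, cite exemption
        elif version == "justice":
            replacement = f"[{segment}] {ex.exemption}"            # highlight, cite exemption
        else:
            raise ValueError(f"unknown version {version!r}")
        out[ex.line] = text[:ex.start] + replacement + text[ex.end:]
    return out
```

Because the original page is only ever read, the same stored image can serve all three versions, which is what lets WORM media preserve the unaltered record.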

DATA MIGRATION POLICY ISSUES:

Linkages with Other Agency ADP Applications: The REDAC imaging system is currently not directly connected to the State Department's INFORM document tracking system or the 3381 mainframe computer database. This planned connection will permit direct downloading of State Department cable communication text in ASCII format, eliminating paper copy production prior to scanning. Long-term plans also include enhanced connectivity within all State Department bureaus for document input, search, and retrieval, subject to resolving information security issues.

Backup of Image and Index Data: Image and index data are backed-up incrementally every night and completely each week onto magnetic tape.
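
Under a weekly full, nightly incremental scheme, restoring the system to a given night requires the most recent full backup plus every incremental taken since. The sketch below assumes, hypothetically, that the full backup runs on Sunday; the report does not say which day is used.

```python
from datetime import date, timedelta


def backup_kind(day: date, full_weekday: int = 6) -> str:
    """Full backup on one weekday (Sunday=6 by assumption), incremental on the others."""
    return "full" if day.weekday() == full_weekday else "incremental"


def restore_chain(target: date, full_weekday: int = 6):
    """Tapes needed to restore as of `target`: the last full plus each later incremental."""
    day = target
    while backup_kind(day, full_weekday) != "full":
        day -= timedelta(days=1)  # walk back to the most recent full backup
    chain = []
    while day <= target:
        chain.append((day, backup_kind(day, full_weekday)))
        day += timedelta(days=1)
    return chain
```

The chain never exceeds seven tapes, which is the practical trade-off of a weekly full: faster nightly backups at the cost of a longer restore sequence late in the week.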

Technical Support and Documentation: The INFORM system's primary technical support is performed by on-site contractors from Computer Business Methods, Inc. (CBMI). The REDAC imaging system is supported through the development contract vendor agreement with Severn and with an on-site System Administrator from IMC. Complete system documentation is a contract deliverable upon completion of the prototype project. System administrators report that version 2.1 documentation appears to be adequate.

Interoperability: ISOO and NARA's recent initiative to direct a government-wide feasibility study exploring the potential creation of an electronic database for interagency use, possibly containing images of documents released in whole or in part, has surfaced critical issues regarding interoperability. Like any other government agency that has already implemented an imaging system, the State Department will face potential questions regarding interoperability. Cognizant of these issues, the State Department is carefully considering interoperability in its migration efforts and taking steps to avoid incompatibility problems. State has already demonstrated an awareness of the interoperability problem by seeking another agency's cooperation (specifically NARA's) on data standardization, through State's effort to implement its Presidential Library System at NARA and its component Libraries.

Migration Plans: FPC has placed an order for a 970 RISC platform to migrate the INFORM data base from the mainframe classified computer so that INFORM can be fully integrated with the REDAC system. A relational data base and appropriate additional software are on order to facilitate this process. Plans are underway for a complete integration within the next year. The ultimate system will provide spreadsheet, graphics, word processing and numerous other capabilities to users in a Windows environment. Plans include the incorporation of all FPC offices within the network. The reading room application will provide access to a data base of declassified documents on-site at the Department and via dial-up from the Federal Depository Library Network. The automated system that FPC has provided to the Presidential Libraries for case tracking and document listing provides a direct upload capability for creating cases and document lists in the FPC systems and will provide for download of review results for return to the Libraries to reduce redundant data entry.

OVERVIEW OF SIGNIFICANT ISSUES:

Business Process Re-Engineering: The State Department recognizes that imaging technology can increase productivity if an organization is willing to adopt new operational procedures, rather than merely automating existing processes. To that end, FPC contracted with Information Management Consultants, Inc. (IMC) to conduct several studies: a system evaluation and cost benefit analysis of the REDAC system, a Business Process Re-Engineering Study of FPC case processing, and a Workflow Analysis of the Initial Processing Branch. The State Department is currently implementing the studies' recommendations for changes to FOIA input processes. The overall goal is to reduce time-consuming and labor-intensive activities, improve internal controls, promote efficiency and quality, and enhance responsiveness to requesters.

System Model: The REDAC system could be viewed as a possible model for other Federal agencies with similar document security and/or redaction requirements. The prototype system developers focused on software interface development and the optical digital data disk storage components. System modifications would be required to reflect the peculiarities of local document types and index filing structures. A government-wide, standardized approach to imaging system developments would enhance the overall review process of documents that contain information originating from or affecting more than one Federal agency. The development of inter-agency, electronic document image sharing capabilities may depend, however, on the successful resolution of information security concerns.

Records Management: Although the records management issues of document appraisal, scheduling, and disposition are not immediately relevant, the INFORM system maintains a current classification status for these records. The integrated system will ultimately provide update data for the Department's central foreign policy data base, in terms of classification changes, releasability, etc.

Specific System Features: The State Department's document review system has a number of features that are particularly innovative and take advantage of the latest in computer software technology. These include: the redacting (excising) capability; the ability of the system to display a replica of the Department's Release and Declassification Logo on printed output; the maintenance of a complete audit trail of review decisions; and, the potential for timely public access to released documents.


SITE VISIT REPORT #15

AGENCY: United States Geological Survey

SYSTEM: United States National Seismographic Network

CONTACT: Ray Buland, National Earthquake Information Center, Denver Federal Center, Denver, CO

SUMMARY DESCRIPTION:

The National Earthquake Information Center (NEIC) of the United States Geological Survey (USGS) has used optical digital data disk technology since 1990 for seismology data storage. Seismic data is collected using the United States National Seismographic Network (USNSN), with a network control center located at the NEIC in Golden, Colorado. This seismic monitoring network is expected eventually to be installed and operated at nearly 100 sites nationwide. USGS researchers acquire the seismic data from detection sensors monitoring earth tremors. The system captures and stores seismic data from events such as earthquakes, volcanic activity, nuclear tests, and oil prospecting. The new optical digital data disk storage subsystem supports scientific research to a far greater degree than was possible with the preceding time-consuming, labor-intensive microform and magnetic tape based data storage systems. The twelve-inch write once, read many (WORM) optical digital data disks are stored in a jukebox, with custom hardware and software minimizing human intervention when responding to remote user requests. The NEIC system is important to this study because of: the scientific value of the captured seismic data; the agency's recognition of the importance of comprehensive data exchange standards, and of their current absence; the relationship between the magnetic disk and optical digital data disk storage subsystems; and the successful integration of optical media into a highly distributed, computerized seismic data network.

BACKGROUND:

The primary responsibilities of the United States Geological Survey are: to identify the nation's land, water, energy and mineral resources; to classify Federally owned lands for mineral and energy resources and water power potential; to investigate natural hazards such as earthquakes, volcanoes, and landslides; and to conduct the National Mapping Program. The USGS's Office of Earthquakes, Volcanoes, and Engineering is responsible for monitoring earthquake activity within the United States and worldwide. The Geologic Division gathers, analyzes and disseminates vibration arrival time data on about 1,200 seismic events each month. This data is captured from over 7,500 seismic detection stations located throughout the United States. The agency's National Earthquake Information Center is able to release news bulletins about new earthquakes occurring around the world almost immediately. The NEIC publishes weekly, monthly, and annual bulletins summarizing seismic activity. A technical NEIC research group uses the seismic event data to develop and adopt mathematical algorithms for improving detection and analysis efforts. The NEIC actively disseminates the seismic data directly to USGS staff and to academic seismology research units such as those at Harvard University and the University of New Mexico.

Traditional seismic activity recording technology was developed in the 1960s as a worldwide strategy for collecting and storing earth tremor data. At that time, systems primarily used light pen technology to record data onto photographically sensitive paper. The recorded paper media was stored in Boulder, Colorado after image analysis and data processing. A modified step-and-repeat microfiche camera produced microphotographs for long-term retention. The USGS continually strove to enhance the original record quality so that the subsequent microfilm images would also be improved.
The decision to replace the microfilm system was based on: increased costs for silver film recording materials; and a need to eliminate the time consuming manual microfilm browsing and access problems.

Origins: Because seismic data rates grew slowly, and the computer power required to efficiently process large data volumes was costly, a major conversion to digital processing was not considered cost effective prior to 1987. However, a wave of advances in computing technology then brought across-the-board cost reductions combined with improved processing capabilities. USGS staff monitoring the marketplace observed that high-performance computer workstations, which had cost one million dollars each in the mid-1980s, had dropped in price to as little as fifty thousand dollars. The NEIC project is a state-of-the-art national seismic network designed to improve data gathering capabilities, a much-needed system that had been discussed for more than 30 years. USGS collaborated with the National Research Council's (NRC) standing committee on seismology concerning the end-to-end retooling of the seismic data collection network. In 1987 the NRC funded the enhancement of the eastern United States seismic recording network. High-fidelity recording of vibration waveforms is the latest technological approach to monitoring seismic events. Research support to further develop this technique came from several sources, including the seismic laboratories at Harvard University and the Albuquerque Seismic Laboratory in New Mexico. The Incorporated Research Institutions for Seismology (IRIS), a university consortium for global seismic monitoring, in Albuquerque, New Mexico, served as a useful model for staff analyzing alternative seismic recording systems.

The decision to use optical digital data disk technology as the archival storage medium resulted from two key factors: optical media was the first commercially viable technology combining sufficient data storage capacity with a favorable cost structure, and the permanent nature of WORM media supports long-term data retention. The NEIC system is production oriented, and user needs for on-line data access typically abate after the weekly and monthly bulletins are produced, so seismic data can be stored off-line as demand for up-to-the-minute data diminishes. The NEIC is not a data management center; rather, it is tasked with collecting and distributing event data. Automated data set production supported by the optical digital data disk subsystem is therefore a key requirement.

Users: The National Earthquake Information Service (NEIS) is widely recognized as a highly cooperative effort. This service indirectly drives the real-time nature of the program, although there are no direct on-line data users. While the previous manual seismic system served mainly as a warning alarm, the automated capabilities of the replacement system offer many benefits. Internal NEIS staff analysts can now get immediate feedback concerning an earthquake's location and intensity. Analysts may begin their search using the existing database to obtain evidence for early notifications, contributing to quicker and more accurate release of earthquake warnings. The NEIS is committed to quickly providing reliable data to secondary users, including: USGS staff; other government organizations such as the National Research Council (NRC) and the Department of Defense; university groups needing real-time data access; and foreign governments through an NEIS exchange agreement.

SYSTEM CONFIGURATION:

Date System Installed: 1990; system components include:
  • NEIS VAX 8350 mainframe computer acting as a mass storage server with data access through multi-ported magnetic disks.
  • Data communication to stored data via Internet and SPAN.
  • Remotely installed high and low-gain motion detection seismometers.
  • CMOS 68000 microprocessor-based station processors.
  • Bidirectional, host-to-host computer network for satellite telemetry, in a star configuration.
  • Ethernet LAN network processors.
  • VAX workstation 3600, 800 x 1000 pixels and 256 shades of gray.
  • Aquidneck Systems International optical disk subsystem using 12" Sony WORM disks in a Sony 50-platter jukebox.
  • Future data distribution via satellite link and possibly CD-ROM.

DIGITAL DATA CAPTURE:

The NEIC system automates the functions of: data capture, indexing, data transmission, and user access.

Conversion Staff: On-site agency staff record data onto optical disks.

Input Quality Control: Signal-to-noise ratios affect the quality of the received signal, even with low- and high-gain sensors built using the latest detection technologies.

Recording Density: The system records in 24-bit data format.

Image Enhancement: NEIC staff use a true domain filter algorithm to smooth the images, giving the broadband sensor data a less grainy appearance.
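The report states only that samples are recorded in a 24-bit format. As a minimal sketch, assuming each sample is packed as a 3-byte two's-complement integer (the packing and byte order are our assumptions, not a documented NEIC layout), a buffer of samples could be decoded like this:

```python
from typing import List

# Range of a signed 24-bit sample: -8,388,608 .. 8,388,607
INT24_MIN, INT24_MAX = -(1 << 23), (1 << 23) - 1


def decode_int24(buf: bytes, byteorder: str = "big") -> List[int]:
    """Decode consecutive 3-byte two's-complement samples (assumed layout)."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3 bytes")
    # signed=True sign-extends from bit 23 when widening to a Python int.
    return [int.from_bytes(buf[i:i + 3], byteorder, signed=True)
            for i in range(0, len(buf), 3)]


print(decode_int24(bytes([0x7F, 0xFF, 0xFF, 0x80, 0x00, 0x00, 0xFF, 0xFF, 0xFF])))
# [8388607, -8388608, -1]
```

Small values such as -1 occupy all 24 bits here only because of sign extension, which is the redundancy a compression scheme can exploit.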

Compression/Decompression: The NEIC system uses a high-level compression scheme similar to other seismological data storage methods, and it accepts seismic data in mixed compressed and uncompressed modes. Remote sensor recording systems are designed to capture large ground-motion values that are rarely attained, so most recorded values are small and their high-order bits carry only zeros or sign extension. The compression algorithms take advantage of this by eliminating much of the high-order zeros, or the sign-extension portion of the raw data, while still ensuring that the original data values are recoverable. The NEIC scheme makes provisions for effective use of fixed-length computer records, embedded header information, and redundant information to promote fault recovery. Data consistency checks during decompression can determine where the data may have been corrupted. If corruption is found, forward decompression is halted until the next uncorrupted header is located, and a technique known as backwards decompression is then used to reconstruct some of the corrupted data.
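The idea of dropping high-order zeros and sign-extension bits can be sketched as follows: a block of samples is stored at the smallest two's-complement width that holds every value in the block, and the width is kept with the data so the original values can be restored exactly. This is a simplified illustration under our own assumptions (one width per block, little-endian bit packing), not the NEIC algorithm; production SEED compression, such as the Steim formats, packs first differences into fixed-length frames of 32-bit words and is considerably more elaborate.

```python
from typing import List, Tuple


def bits_needed(values: List[int]) -> int:
    """Smallest two's-complement width (in bits) that can hold every value."""
    width = 1
    for v in values:
        # Magnitude bits of v (or of ~v for negatives) plus one sign bit.
        width = max(width, (v if v >= 0 else ~v).bit_length() + 1)
    return width


def pack(values: List[int]) -> Tuple[int, int, bytes]:
    """Pack samples at a fixed width, dropping redundant sign-extension bits."""
    width = bits_needed(values)
    mask = (1 << width) - 1
    acc = 0
    for i, v in enumerate(values):
        acc |= (v & mask) << (i * width)  # keep only the low `width` bits
    nbytes = (width * len(values) + 7) // 8
    return width, len(values), acc.to_bytes(nbytes, "little")


def unpack(width: int, count: int, data: bytes) -> List[int]:
    """Restore the original values by re-applying sign extension."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    out = []
    for i in range(count):
        raw = (acc >> (i * width)) & mask
        out.append(raw - (1 << width) if raw & sign_bit else raw)
    return out


samples = [3, -2, 0, 5, -7, 1, 2, -1]
width, count, packed = pack(samples)
print(width, len(packed))  # 4 4  (4 bits per sample; 4 bytes instead of 32 as int32)
assert unpack(width, count, packed) == samples
```

Because the width travels with each block, the decompressor never has to guess how many bits were discarded, which is what keeps the original values recoverable.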

DATA INDEXING:

Creation of Index Database: Fully automated indexing of the data files occurs at the point of capture and initial data storage. The seismic event's recorded chronological elapsed time (duration) serves as the primary index key.
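Because the index is keyed on recording time, finding every recording that overlaps a time window reduces to a search over entries sorted by time. The sketch below is hypothetical: the field names, the use of seconds since an arbitrary epoch, and the station, channel, and volume labels are our assumptions, not the NEIC index schema.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexEntry:
    start: float   # recording start, seconds since an arbitrary epoch (assumed)
    end: float     # recording end, same units
    station: str
    channel: str
    volume: str    # hypothetical label of the optical disk holding the data


class TimeIndex:
    """Index of recordings keyed on start time (illustrative sketch)."""

    def __init__(self, entries: List[IndexEntry]) -> None:
        self._entries = sorted(entries, key=lambda e: e.start)
        self._starts = [e.start for e in self._entries]

    def overlapping(self, t0: float, t1: float) -> List[IndexEntry]:
        """Entries whose [start, end] interval overlaps the query window [t0, t1]."""
        # Entries that start after t1 cannot overlap, so cut the list there ...
        hi = bisect.bisect_right(self._starts, t1)
        # ... then keep those that have not already ended before t0.
        return [e for e in self._entries[:hi] if e.end >= t0]


index = TimeIndex([
    IndexEntry(0, 60, "STA1", "BHZ", "DISK001"),
    IndexEntry(60, 120, "STA1", "BHZ", "DISK001"),
    IndexEntry(120, 180, "STA1", "BHZ", "DISK002"),
])
print([e.volume for e in index.overlapping(90, 130)])  # ['DISK001', 'DISK002']
```

The second step scans earlier entries linearly; a large production index would more likely bound the maximum recording length or use an interval tree.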

Location of Index Database: Index database stored on magnetic tapes.

Index Structures: The seismic database consists of the following information: primary seismic waveform data; indexes of seismic wave arrival time estimates, with the time sequence of one channel of one field recording instrument; and the actual recorded earthquake seismic events. Many of the recorded events are non-earthquake surface vibrations, such as violent explosions that result from routine strip mining.

OPTICAL DIGITAL DATA DISK STORAGE:

The intelligent microcomputer-based controller makes the optical drive emulate a standard 9-track small computer system interface (SCSI) tape drive, while retaining the random-access searching that optical media allow.
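The controller's role, presenting a sequential tape interface while keeping random access underneath, can be illustrated with a small in-memory model. All class and method names are hypothetical; a real controller works at the SCSI command level and records blocks on the disk itself, not in a Python list.

```python
from typing import List, Optional


class TapeEmulator:
    """Hypothetical model: a write-once, random-access store behind a tape-style interface."""

    def __init__(self) -> None:
        self._records: List[bytes] = []  # appended only, never overwritten (WORM)
        self._pos = 0                    # current "tape" position, counted in records

    def write(self, record: bytes) -> None:
        # As on tape, and as required by WORM media, writes go only at end of data.
        if self._pos != len(self._records):
            raise OSError("write-once medium: records can only be appended at end of data")
        self._records.append(bytes(record))
        self._pos += 1

    def rewind(self) -> None:
        self._pos = 0

    def read(self) -> Optional[bytes]:
        # Sequential read; None signals the end of recorded data.
        if self._pos >= len(self._records):
            return None
        record = self._records[self._pos]
        self._pos += 1
        return record

    def locate(self, n: int) -> None:
        # A real tape must space over records to reach record n; here the jump
        # is direct, which is the random-access advantage the optical drive keeps.
        if not 0 <= n <= len(self._records):
            raise IndexError(n)
        self._pos = n


tape = TapeEmulator()
for rec in (b"event-1", b"event-2", b"event-3"):
    tape.write(rec)
tape.locate(2)
print(tape.read())  # b'event-3'
```

Host software written for tape (write, rewind, read until end of data) runs unchanged, while `locate` shows where the optical medium is faster than the tape it imitates.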

Data File Headers: Data is structured according to standards developed by members of the Federation of Digital Seismographic Networks (FDSN). Initial implementations were made by the Incorporated Research Institutions for Seismology (IRIS) and the USGS. The Standard for the Exchange of Earthquake Data (SEED) is an international standard format for the exchange of digital seismological data. SEED was designed for use by the earthquake research community, primarily for the exchange of unprocessed earth motion data between institutions. It is a format for digital data measured at one point in space and at equal intervals of time.
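Because SEED data are measured at one point in space at equal intervals of time, the time of any sample follows from a start time and a sampling rate, so individual timestamps need not be stored. A minimal sketch, with a hypothetical rate of 20 samples per second chosen only for illustration (this is not SEED's record layout):

```python
from datetime import datetime, timedelta, timezone


def sample_time(start: datetime, rate_hz: float, index: int) -> datetime:
    """Time of sample `index` in an equally spaced series beginning at `start`."""
    return start + timedelta(seconds=index / rate_hz)


def sample_index(start: datetime, rate_hz: float, t: datetime) -> int:
    """Nearest sample index for time `t` (inverse of sample_time)."""
    return round((t - start).total_seconds() * rate_hz)


start = datetime(1994, 7, 1, 0, 0, 0, tzinfo=timezone.utc)
rate = 20.0  # hypothetical sampling rate, samples per second
print(sample_time(start, rate, 40))  # 1994-07-01 00:00:02+00:00
```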

Error Detection/Correction: USGS proprietary algorithms.

Recording Process: 12" WORM (Sony), double-sided.

Optical Digital Data Disk Composition: Polycarbonate substrate.

Capacity: 3.2 gigabytes per side, for 6.4 GB of data storage per disk.

Jukebox: Sony jukebox with 50-disk capacity.

Storage Environment: Jukebox and magnetic disk systems are installed and operated under a computer room environment. Optical disk drives and workstations are located in normal office environments.

RETRIEVAL AND OUTPUT:

Primary System Users: USGS staff and other scientific researchers.

Display Output: VAX workstation 3600, display resolution of 800 x 1000 pixels, 256 shades of gray. Optically stored data is transferred to magnetic disks for user access. Printed hard copy is rarely used except for publications, seismic data bulletins, or for public relations press releases.

Network Transmission: External data distribution is accomplished using several techniques, including: system software that supports automated real-time data transmission (e.g., to Cal Tech); NEIC network access using off-line browsing from magnetic disk storage; broadcast of the entire data stream within the continental United States, using a variable, load-based time delay modeled after weather map data distribution systems; and compact disc, read only memory (CD-ROM). All of these systems are currently in the testing stages. The first test of the receiving-end software is at the Hoffman Laboratory at Harvard University, with one telemetry station at the Cal Tech seismic laboratory.

DATA MIGRATION POLICY ISSUES:

Migration Plans: No specific data migration plans exist at this time. Long-range agency planning is inhibited by the lack of specific information on the timing and availability of adequate funding. The USGS currently has data migration problems with data stored on 7-track magnetic tapes. The Albuquerque, New Mexico facility has a huge magnetic tape data storage center, with staff responsible for storing, accessing, and refreshing a vast magnetic tape collection.

Linkages with Other Agency ADP Applications: None planned. The optical disk jukebox will store data received from several sources to accomplish several interrelated tasks. The network buffers direct outside access to data stored in the optical disk jukebox.

Backup of Image and Index Data: System operational procedures require twice weekly data backups.

Technical Support and Documentation: The NEIC system utilizes a combination of in-house and contractor technical support. The in-house government technical support concentrates on the central processing system. Contractor staff develop system software and provide technical support for the remote satellite communications and field sensor equipment. The NEIC system's documentation is not static; rather, it reflects the evolutionary development process of the system itself. The software source code is maintained by two programmers.

Interoperability: There is a growing trend toward seismic network consolidation, with the seismic community considering archiving the last thirty years of regional data within the United States. Under this archiving concept, seismic data would be more likely to be preserved and more readily available to users.

Data Security: The NEIC's mission is to distribute data as widely as possible; therefore, greater emphasis is placed on data integrity, data quality, and continuous data availability than on limiting access or protecting the data from computer hackers. However, system managers are concerned about computer viruses on the Internet and have installed protective electronic safeguards.

OVERVIEW OF SIGNIFICANT ISSUES:

Nature of system information: The master NEIC earth station is in Golden, Colorado. Data is received in real time from two main sources: telephone lines transmitting synchronous data, and satellite links transmitting data captured in the field using high-level wide protocols with significant data buffering. Although the SEED standard contains an agreed-upon data compression format, processing a mix of compressed and uncompressed data is a big challenge: the system's data definition language works best with uncompressed data, and compressed data increases processing complexity.

Data Exchange Standards: Although standards are sometimes viewed as having a negative influence on technological development and innovation, the absence of firm agreement on international data format standards contributes to significant seismic data interchange problems. While the Standard for the Exchange of Earthquake Data (SEED) is a valuable reference manual, administrators face the sizable problem of more than twenty different data formats among the forty to fifty existing seismic networks. Three networks in California alone account for eighty percent of the seismic data collected. System software is a major data conversion problem as well, but the consortium of university regional data managers is attempting to improve the data standards situation. Another factor is that seismology data exchange standards are constantly evolving, having been through three generations within the last fifteen years. Agency administrators are also considering data distribution using CD-ROM technology.
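The cost of format diversity can be made concrete with simple arithmetic. If each of n formats needs a direct one-way converter to every other format, n(n-1) converters are required; if every format is instead converted to and from one common exchange format such as SEED, only 2n are needed. The figure of twenty formats comes from the report; the hub-and-spoke comparison is our illustration, not a USGS analysis, and it assumes the exchange format is distinct from the twenty native formats.

```python
def pairwise_converters(n_formats: int) -> int:
    """One-way converters needed when every format converts directly to every other."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats: int) -> int:
    """Converters needed when every format goes to and from one separate exchange format."""
    return 2 * n_formats


n = 20  # "more than twenty different data formats" per the report
print(pairwise_converters(n), hub_converters(n))  # 380 40
```

The gap widens quadratically as formats are added, which is one reason an agreed exchange standard matters even when native formats persist.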

Data Expansion: Data utilization encompasses data movement and on-line data access for production of earthquake event bulletins. Digital data storage requirements are driven by system growth and coverage. The NEIC implementation will start with data from the broadband network system dominated by the short period network. The broadband network generates approximately twelve times more data, and the sixty projected broadband stations are expected to produce up to 80 MB of data daily. The most optimistic projection for seismic digital data collected per day is one gigabyte. Data archiving is just starting: one year's worth of data has filled one and one half optical digital data disks. Additionally, there are four and one half gigabytes of data on magnetic storage, more than two gigabytes on auxiliary storage, and an additional two gigabytes on system boot magnetic disk storage units.
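The storage figures in this report can be combined into a rough planning estimate. The sketch assumes decimal units (1 GB = 1,000 MB) and a constant daily data rate, and it uses the 6.4 GB double-sided disk and 50-disk jukebox described under Optical Digital Data Disk Storage. The results are back-of-the-envelope estimates, not NEIC projections.

```python
DISK_MB = 2 * 3200                  # 3.2 GB per side, two sides per disk
JUKEBOX_DISKS = 50
JUKEBOX_MB = DISK_MB * JUKEBOX_DISKS


def days_to_fill(capacity_mb: float, mb_per_day: float) -> float:
    """Days until `capacity_mb` is full at a constant daily rate."""
    return capacity_mb / mb_per_day


for label, rate in [("broadband projection", 80), ("optimistic projection", 1000)]:
    print(f"{label}: one disk in {days_to_fill(DISK_MB, rate):.1f} days, "
          f"jukebox in {days_to_fill(JUKEBOX_MB, rate) / 365:.1f} years")

# Observed so far: about 1.5 disks filled per year of data.
observed_mb_per_day = 1.5 * DISK_MB / 365
print(f"observed rate: about {observed_mb_per_day:.0f} MB/day")
```

At the optimistic rate of one gigabyte per day, a full 50-disk jukebox would last less than a year, which bears directly on the off-line storage and migration questions raised elsewhere in this report.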

Information Management Policy: The NEIC's data management and distribution policies are shaped by the type and magnitude of the seismic event. For example, some selections are made in real time, distinguishing the seismic event from unavoidable background noise. Other selections are performed within an hour of initial recording, after separating critical seismic event data from other tremor data such as man-made explosions. The NEIC has to distinguish civil defense events, which require immediate notification of authorities, from events of interest mainly to researchers. Researcher interest is driven by the earthquake's seismic intensity or magnitude, with a recording level of 5.8 or greater considered adequate. The United States continental scale network requires only a 4.5 or greater event, due to the increased quality of sensing and recording technologies. New sensor technology has greatly improved detection capabilities for medium-sized events, with local recording networks performing well with smaller events and global networks better on events of 5.0 and greater. The United States continental network falls in between, performing adequately with events rated as low as 3.5. The NEIC has experienced a real change in attitudes across the government, shifting away from treating seismic measurement as academic research and toward viewing it as a national service. There is more interest in the United States in identifying and learning about the source of seismic events as useful predictive data. In this approach, the needs of science and society can be equitably balanced between pure research and the needs of public safety advocates.


APPENDIX B: SUMMARY OF TECHNICAL STANDARDS

Overview of the Standards Process

Significant international standards activity takes place in Europe, Japan, and the United States. The European Computer Manufacturers Association (ECMA), other European standards bodies, and the International Organization for Standardization (ISO) are important contributors to standards creation, and their standards are often subsequently accepted by the American National Standards Institute (ANSI).

Technical standards development is a complex process involving political relationships at the various technical committee levels. These standards bodies oversee the development of non-proprietary or open standards, which may be used by any manufacturer or distributor. A standard may address a specific product or a generic process, and standards are assuming an increasingly important role in system integration because of growing requirements for component compatibility. Developing non-proprietary technical standards is a deliberative process requiring intellectual consensus among diverse and often competing organizations. The objectives of standards bodies should include the planning, development, coordination, and distribution of standards and the exchange of technical information.

The standards process involves several draft document review cycles with formal approval at each level. Also required is a clearly defined document flow which assures that all interested parties are involved in the review process, and timely public announcements when specific phases are completed. Sufficient time is required to accomplish these well defined procedures, and consequently developing a standard requires patience.

Since high technology products usually continue to evolve, standards may not remain current with the latest product developments. Although standards may help users avoid proprietary products that may soon become obsolete and unsupported, standards do not always guarantee one hundred percent compatibility. This is due to the complexity of today's technologies and the opportunities within a standard for different technical interpretations. Generally, there are two schools of thought concerning the scarcity of technical standards for digital imaging and optical storage systems, and the impact of this situation on the industry.

One viewpoint is that market growth is being restrained because potential users are waiting until the technologies are more mature. On the other hand, some regard the very concept of standardization as an unneeded constraint, and prefer to let the marketplace determine the industry's direction. It is not uncommon for standards to lag behind the marketplace, especially in rapidly changing technology-driven industries. Many de facto standards result from products that have surpassed their competitors in commercial sales and installations.

In the case of compact disc, read only memory (CD-ROM) technology, for example, several prominent companies informally created technical guidelines that were subsequently recognized by the accredited standards bodies. One drawback for companies that delay product development in anticipation of standards is the risk of being left behind in technological expertise and marketing capabilities.

Optical Media Standards Criteria

Three levels of optical media interchange standards would ideally be included in a comprehensive optical digital data disk standards document:
  • Optical digital data disk media - description of the optical and mechanical properties of the media such as substrate materials, diameter, thickness, and center hole specifications.
  • Physical format - criteria such as error correction codes, track location, and data code formats.
  • Logical format - format of the data including volume labels, file structures, and header files.
Two popular optical media recording techniques differ in how the laser servo in the optical disk drive tracks the data. The continuous composite servo (CCS) format uses optical media with grooved tracks formed during the disk manufacturing process. The sampled servo format (SSF) instead relies on tracking indicators specially encoded into the optical digital data disks. Although these formats are incompatible, both were implemented in 5.25-inch optical digital data disk standards by the ISO and ANSI standards organizations.

Methods for testing media characteristics are needed to ensure conformance once media interchange standards are formally adopted. The International Organization for Standardization (ISO) has developed standards for optical media without major emphasis on testing methods or data permanence, although most manufacturers realize the importance of these issues. Most test methods for optical digital data disks involve dynamic processes (measurements taken while the disk is rotating) rather than static processes on the media itself, which makes the test methods adaptable to different physical media. Since no general data permanence testing methodology exists, there are no longevity standards useful in planning how long information may be stored on optical digital data disks.

Status of Digital Image and Optical Disk Standards

The American National Standards Institute (ANSI) represents the United States in the international standards arena. An ANSI technical committee (TC) is responsible for developing optical media interchange standards. TC X3B11 consists of 36 principal members, with meetings held bi-monthly. The committee's work is coordinated with the ISO through committee ISO/IEC JTC 1/SC23.
National and international standards activities for digital imaging and optical disk (and related technologies) include:

Digital Image Capture

  • ANSI/AIIM MS52-1991 Recommended Practice for the Requirements and Characteristics of Original Documents Intended for Optical Scanning.
  • ANSI/AIIM MS49-1993 Recommended Practice for Monitoring Image Quality of Roll Film and Microfiche Scanners.

Image Quality Standards

  • ANSI/AIIM MS44-1988 Recommended Practice for Quality Control of Image Scanners. Adopted as FIPS PUB 157 - Guideline for Quality Control of Image Scanners. MS44-1988 makes available an X440 Scanner Test Target Set, including an IEEE facsimile test chart, two RIT process ink gamut charts, and ten AIIM scanner test targets.
  • AIIM TR26-1993 Resolution as it Relates to Photographic and Electronic Imaging.
  • AIIM TR27-1991 Electronic Imaging Request for Proposal (RFP) Guidelines.

Indexing Database Related Standards/Reports

  • ANSI Z39.4-1984 Basic Criteria for Indexes.

90mm (Nominal) Diameter Optical Digital Data Disk Standards

  • ISO/IEC 10090 Information Technology - 90 mm (3.54-inch) case disk, Rewritable Optical Disk Cartridge for Information Interchange.

130mm (Nominal) Diameter Optical Digital Data Disk Standards

  • ISO/IEC 9171-1, 1990 Information Technology - 130 mm (5.25-inch) Optical Disk Cartridge, Write Once, for Information Interchange - Part 1: Unrecorded Optical Disk Cartridge.
  • ISO/IEC 9171-2, 1990 Information Technology - 130 mm (5.25-inch) Optical Disk Cartridge, Write Once, for Information Interchange - Part 2: Recording Format.
  • ISO/IEC 10089, 1991 Information Technology - 130 mm (5.25-inch) Rewritable Optical Disk Cartridges for Information Interchange.
  • ISO/IEC 11560-1992 Information Technology - 130 mm (5.25-inch) Rewritable Optical Disk Cartridge Write-Once, Using the Magneto-Optical Effect for Information Interchange.
  • ANSI X3.211-1992 American National Standard for Information Systems - 130 mm Write-Once Optical Disk Cartridge using Continuous Composite Servo, RLL 2,7 Encoding and LDC.
  • ANSI X3.212-1992 American National Standard for Information Systems - 130 mm Optical Disk Cartridge Using the Magneto-Optical Effect and Continuous Composite Servo Format.
  • ANSI X3.214-1992 American National Standard for Information Systems - 130 mm Write-Once Optical Disk Cartridge Using Sampled Servo and 4/15 Modulation.
  • ANSI X3.220-1992 American National Standard for Information Systems Digital Information Interchange - 130 mm (5.25-inch) Optical Disk Cartridge Using the Magneto-Optical Effect for Write-Once Functionality.
  • ANSI X3.191-1991 American National Standard for Information Systems - Recorded Optical Media Unit for Digital Information Interchange - 130 mm (5.25-inch) Optical Disk Cartridge, Write-Once, for Information Interchange, Sampled Servo (S/S), RZ Modulation with Selectable Pitch.

356mm (Nominal) Diameter Optical Digital Data Disk Standards

Note: The following two standards are the same document; one has an ISO designation and the other an ANSI designation.

  • ISO/IEC 10885 Information technology - 356 mm optical disk cartridge for information interchange - write once.
  • ANSI X3.200-1992 American National Standard for Information Systems - 356 mm Write Once Optical Disk Cartridge for Information Interchange.
Other Optical Digital Data Disk Related Standards/Reports

  • AIIM/TR21-1991 Recommendations for Identifying Information to be Placed on Write-Once-Read-Many (WORM) and Rewritable Optical Disk (OD) Cartridge Label(s) and Optical Disk Cartridge Packaging (Shipping Containers).
  • AIIM/TR25-1990 The Use of Optical Disks for Public Records.
  • AIIM/TR28-1991 The Expungement of Information on Write-Once-Read-Many (WORM) Optical Media.

Compression/Decompression Standards and Schemes

  • CCITT 84a-1984 Standardization of Group 3 Facsimile Apparatus for Document Transmission (Group 3 is for black and white images only); Recommendation T.4.
  • CCITT 84b-1988 Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus; Recommendation T.6.
  • ISO 10918-1 Joint Photographic Experts Group (JPEG); Digital Compression and Coding of Continuous-tone Still Images; ISO/IEC standard; Recommendation T.81.
  • ISO/IEC 11544-1 Joint Bi-level Image Experts Group (JBIG); a group of compression techniques designed to replace CCITT Group 3 and Group 4 compression; Recommendation T.82.

Consortium and Proprietary Standards

  • PBM Portable Bitmap Format. Allows bitonal bitmap images to be sent by mailers unable to handle pure binary files.
  • PGM Portable Gray-scale Map. Allows gray-scale bitmap images to be sent by mailers unable to handle pure binary files.
  • PPM Portable Pix Map. Allows color bitmap images to be sent by mailers unable to handle pure binary files.
  • SunRaster A standard introduced by Sun Microsystems for use on its range of workstations. Both monochrome and RGB color images are supported. The main raster formats supported are raw-bitmap, RLE, (X)RGB, and (X)BGR.
  • GIF Graphics Interchange Format. GIF was developed by CompuServe Inc. to enable its users to exchange color graphic files independent of the hardware they own. GIF uses a variation of the Lempel-Ziv-Welch (LZW) algorithm known as variable-length code LZW.
  • PCX PC Paintbrush Format. PCX is the image file format used by ZSoft Corporation's PC Paintbrush graphics application. It is almost entirely driven by the IBM PC graphics display hardware requirements. A PCX file consists of a header, two color maps (optional), and a raster image.
  • TGA Targa. The TGA image file standard was developed by Truevision, Inc. to support its widely used video graphics products. The revised specification (version 2.0) is claimed to be the first true-color format. The format consists of a fairly traditional header, color map, and image. An image may be variable in size and can be represented in monochrome, true color (no color map, with images stored directly as RGB values), and direct- or pseudo-color (using a color map). Run Length Encoding (RLE) is employed where compression is required.
  • FITS Flexible Image Transport System. The FITS image file standard was developed to meet the astronomical community's need to transfer large images between installations using nine-track, half-inch magnetic tape. FITS images differ from other images in that they are arrays that can have up to 999 dimensions and that the data can be represented in unsigned bytes. There are no limitations on the spatial dimensions of the image.
  • TIFF Tagged Image File Format. TIFF was originally developed by Aldus/Microsoft staff for image scanning and desktop publishing applications. The file format incorporates powerful functionality and flexibility but is somewhat complex. The current version of TIFF (Revision 6.0) segments its features into two subgroups: TIFF Baseline and TIFF Extension Features. The Baseline subgroup refers to a minimum set of TIFF features that all general-purpose TIFF readers should implement. The Extension Features subgroup (extensions should be registered) accommodates special applications. A TIFF file comprises a header, n image file directories, and n images arranged in a hierarchical structure, which allows multiple images to be stored within a single TIFF file.
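The TIFF structure described above, a header pointing to a chain of image file directories (IFDs) with one directory per image, can be walked in a few lines. This sketch follows the TIFF 6.0 layout as we understand it: an 8-byte header holding a byte-order mark ("II" or "MM"), the number 42, and the offset of the first IFD; each IFD holds a 2-byte entry count, 12-byte entries, and the offset of the next IFD (0 at the end). It returns the raw 4-byte value/offset field of each entry without decoding values by type, and the tiny demonstration file is hypothetical.

```python
import struct
from typing import Dict, List, Tuple

Entry = Tuple[int, int, int]  # (field type, value count, raw 4-byte value/offset field)


def read_ifds(data: bytes) -> List[Dict[int, Entry]]:
    """Return one {tag: entry} dict per image file directory, in chain order."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")

    ifds: List[Dict[int, Entry]] = []
    seen = set()
    while offset:  # an offset of 0 ends the chain
        if offset in seen:
            raise ValueError("corrupt file: IFD chain loops")
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        entries: Dict[int, Entry] = {}
        for i in range(count):
            tag, ftype, n, value = struct.unpack_from(endian + "HHII", data, offset + 2 + 12 * i)
            entries[tag] = (ftype, n, value)
        ifds.append(entries)
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return ifds


def _ifd(width: int, next_offset: int) -> bytes:
    """One little-endian IFD holding only ImageWidth (tag 256, type SHORT = 3)."""
    return (struct.pack("<H", 1)
            + struct.pack("<HHII", 256, 3, 1, width)
            + struct.pack("<I", next_offset))


# Hypothetical two-image file: 8-byte header, IFD at offset 8 (18 bytes), IFD at offset 26.
demo = b"II" + struct.pack("<HI", 42, 8) + _ifd(100, 26) + _ifd(50, 0)
print(len(read_ifds(demo)))  # 2
```

The loop check matters for archival readers: a damaged next-IFD offset that points backward would otherwise make a naive reader cycle forever.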
  • EPS Encapsulated PostScript. An EPS file holds image information in the form of a page description language program, which instructs the display where and how to draw lines and fill surfaces to reconstruct the original image. There is also an optional bitmap at the beginning of the EPS file so that applications unable to interpret PostScript commands can still include and display the image. The EPS file usually contains a header, image bitmap (optional), and the image description. EPS files should also conform to the latest version of the Adobe Document Structuring Conventions and be well behaved by returning the environment to its original state upon completion.
  • Photo-CD A photographic image standard developed by Eastman Kodak to enable traditional photographic images to be digitally transferred to a compact disc (CD). The photographic images can be displayed on a computer system or television set, using a Photo-CD player or any CD-ROM XA drive compatible with the Photo CD format. Photo-CD employs a device-independent color-encoding method called Photo-YCC. It is based on the CCIR 600-1 and 709 video standards, which enable it to minimize display processing overhead. The components that make up a single image are stored in a grouping on the CD known as an Image Pac, which consists of a header (IPA, or Image Pac Attributes header), the image components in order of increasing resolution, and an extension field (IPE, or Image Pac Extensions). An Image Pac for a 35-mm frame typically requires 3 to 6 MBytes of storage, and depending on image resolution, up to six sizes are now available.
  • HDF Hierarchical Data Format. The HDF file structure was developed by the National Center for Supercomputing Applications (NCSA), and it can be viewed as a header (including signature) followed by a number of data objects. The data descriptor contains a tag designating the data type, a unique reference number, and pointers to the data itself. There are potentially over 65,000 tags that can be defined, which should allow a wide variety of data types to be supported. Software support is provided in the form of a high-level application interface and a low-level interface for basic HDF manipulation. These interfaces may be called from either FORTRAN or C. Both 8-bit and 24-bit images are supported.
  • DSF Data Storage Format. DSF is a variation of the TIFF format in which the image strips are replaced with fields called datamaps. A datamap is defined only as a set of bytes. Unlike TIFF, the directories and datamaps in DSF are referenced by their names (character strings) and can be dynamically created or deleted.

Image File Standards

  • ANSI/AIIM MS53-1993 Standard Recommended Practice, File Format for Storage and Exchange of Images; Bi-Level Image File Format: Part I. This standard addresses bi-level electronic images that are coded using CCITT Recommendations T.4 and T.6 (Group 3 and 4) as well as bit-mapped images (having no compression). MS53 will put into one standard a self-contained file format for bi-level image file transfer environments other than facsimile. It was approved in March 1993 to replace the TIFF bi-level format for applications that require file transfer across different platforms. Parts 2 and 3 are under development for continuous-tone and color images.

I/O Interface Standards

  • ANSI X3.131-1986 American National Standard for information systems, Small Computer Systems Interface (SCSI).

Digital Image Display and Output Standards/Guidelines

  • AIIM/TR19-1993 Electronic Imaging Output/Display Devices.
  • AIIM/TR29-1993 Electronic Imaging Output/Printers.
  • CBEMA/X3W1 Image Printer Specifications.

Legal Admissibility Standards

  • AIIM/TR31-1992 Part 1: Performance Guidelines for Admissibility of Records Produced by Information Technology Systems as Evidence.
  • AIIM/TR31-1993 Part 2: Performance Guidelines for the Legal Acceptance of Records Produced by Information Technology Systems for Regulatory Purposes.
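The HDF data descriptor listed under Consortium and Proprietary Standards (a tag, a reference number, and a pointer to the data) can be sketched as a fixed-size record. We assume an HDF4-style layout of a 16-bit tag, a 16-bit reference number, and 32-bit offset and length fields, stored big-endian; a 16-bit tag field allows 65,536 values, consistent with the "over 65,000 tags" noted in the HDF entry. The tag numbers below are arbitrary examples, and the sketch omits the file signature and descriptor-block headers of a real HDF file.

```python
import struct
from typing import List, NamedTuple

# Assumed HDF4-style layout: tag, ref, offset, length; big-endian; 12 bytes.
DD_FORMAT = ">HHII"
DD_SIZE = struct.calcsize(DD_FORMAT)


class DataDescriptor(NamedTuple):
    tag: int     # 16-bit data type tag (2**16 = 65,536 possible values)
    ref: int     # 16-bit reference number, unique per tag
    offset: int  # byte offset of the data object
    length: int  # length of the data object in bytes


def pack_dds(dds: List[DataDescriptor]) -> bytes:
    return b"".join(struct.pack(DD_FORMAT, *dd) for dd in dds)


def unpack_dds(buf: bytes) -> List[DataDescriptor]:
    usable = len(buf) - len(buf) % DD_SIZE  # ignore any trailing partial record
    return [DataDescriptor(*struct.unpack_from(DD_FORMAT, buf, i))
            for i in range(0, usable, DD_SIZE)]


def find_object(data: bytes, dds: List[DataDescriptor], tag: int, ref: int) -> bytes:
    """Follow a descriptor's pointer to the bytes of one data object."""
    for dd in dds:
        if dd.tag == tag and dd.ref == ref:
            return data[dd.offset:dd.offset + dd.length]
    raise KeyError((tag, ref))


# Hypothetical example: two objects stored back to back in a data area.
data_area = b"SEISMOGRAM" + b"META"
dds = [DataDescriptor(tag=1000, ref=1, offset=0, length=10),
       DataDescriptor(tag=1001, ref=1, offset=10, length=4)]
table = pack_dds(dds)
print(find_object(data_area, unpack_dds(table), 1001, 1))  # b'META'
```

Because every object is reached through its (tag, reference number) pair rather than by position, new object types can be added without changing how existing ones are located.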
Optical Character Recognition Standards/Guidelines ISO 1073/1-1976 Alphanumeric Character Sets for Optical Recognition, Part I: Character Set of OCR-A- Shapes and Dimensions of the Printed Image. ISO 1073/2-1976 Alphanumeric Character Sets for Optical Recognition, Part II: Character Set OCR-B- Shapes and Dimensions of the Printed Image. FIPS PUB 32-1 Character Sets for Optical Character Recognition, Adopted 13 September, 1989. Incorporates OCR-A and OCR-B. Adopts ANSI X3.2-1970 (R1976), ANSI X3.49-1975 (R1982). ANSI X3.93M-1981 (R1989) Optical Character Recognition Positioning. ANSI X3.99-1983 (R1991) Guideline for Optical Character Recognition Print Quality. Compact Disc Standards and Industry Guides ISO/IEC 9660, 1988 Information Processing - 120 mm (4.75-inch) Volume and File Structure of CD-ROM for Information Interchange. Red Book A CD-ROM Color Book Drive Standard for audio compact disc (CD Audio) drives. Yellow Book A CD-ROM Color Book Drive Standard for compact disc read only memory (CD-ROM) drives. This standard also applies to the CD-ROM XA compatible drives used to read Kodak's Photo CD media. Green Book A CD-ROM Color Book Drive Standard for compact disc interactive (CD-I) drives. Orange Book A developing CD-ROM Color Book Drive Standard for magneto- optical (MO) and recordable (CD-R) drives. This book defines a CD-ROM/MO hybrid device, and includes single-session and multisession CD-R. Miscellaneous Standards AIIM/TR17-1989 Facsimile and Its Role in Electronic Imaging. ANSI/EIA 527-1986 Screen Definition for Color Picture Tubes ANSI/HFS 100-1988 American National Standard for Human Factors Engineering of Visual Display Terminal Work Stations ANSI/ISO 5807-1985 Information Processing--Documentation Symbols and Conventions for Date, Program and System Flowcharts, Program Network Charts, and System Resource Charts. 
ANSI X3/TR-7-89 Information Processing Systems Technical Support--User Documentation for Consumer Software Packages.
MIL-STD-28000, MIL-M-28001, MIL-R-28002 Department of Defense standards in wide use in the CALS engineering data systems (DSREDS/EDCARS/EDMICS).

Draft/Emerging Standards and Projects

The following proposed standards and projects are under development and are provided here for informational purposes only.

AIIM/TR31 Part 3: User Guidelines (draft document under review).
AIIM/TR31 Part 4: Model Rule and Model Law (draft document under review).
ANSI/AIIM MS50 Proposed standard: Recommended Practice for Monitoring Image Quality of Aperture Card Film Image Scanners.
AIIM Project 99 Projects (in production) for Electronic/Digital Test Targets (Part 1: The Target; Part 2: Description and Use).
AIIM/MS55 Proposed standard (in production): Identification and Indexing of Page Components (Zones) for Automated Processing in an EIM Environment.
AIIM/MS56 Technical Report (TR) on Indexing Considerations for an EIM System (draft report currently under revision by the AIIM standards committee).
ISO/IEC DIS 13481 Data interchange on 130 mm optical disk cartridges - Capacity: 1 gigabyte per cartridge.
ISO CD 13403 Proposed draft standard under development for Information Interchange on 300 mm Optical Disk Cartridges of the Write Once, Read Multiple (WORM) type using the Continuous Composite Servo (CCS) method. A separate proposed draft standard, also under development, covers the Sampled Servo (SS) method.
ISO (JTC1/SC24/WG7) DIS 12087-3 (IIF) Image Interchange Facility. The IIF is part of the first International Image Processing and Interchange Standard (IPI). It comprises both a data format definition and a gateway functional specification. The proposed data format definition is intended for exchanging arbitrarily structured image data across application boundaries.
There are also definitions of parsers, generators, and format converters to enhance open image communication. Part 2 of the IPI standard is the Programmer's Imaging Kernel System (PIKS); the IIF Gateway controls the import and export of image data to and from applications as well as the PIKS. The IIF may serve as a future image content architecture of the Open Document Architecture (ODA).
ISO/IEC DIS 13346 (ECMA 167) Information technology - Volume and File Structure of Write-Once and Rewritable Media using Non-Sequential Recording for Information Interchange.
ISO DIS 13490 (ECMA 168) (also known as the "Frankfort Specification") Draft International Standard and Draft ECMA Standard for read-only and write-once compact disc media, which promises equal support for UNIX, Macintosh, OS/2, and Windows NT. The Frankfort Specification also supports the incremental update capability that is lacking in ISO 9660. ECMA 168 will conform to the Orange Book specification.
CGATS 3.9-199X Proposed Specification for the Calibration and Use of a Color Monitor for Comparison to Hard Copy for Color Proofing of Graphic Arts Images.
ANSI/AIIM MS60 Proposed standard (in production): Electronic Folder Interchange Datastream.
dpANS X3.131-199X Draft proposed American National Standard for information systems, Small Computer Systems Interface - 2 (SCSI-2).
ISO DIS 9316-1 Draft International Standard, Small Computer Systems Interface - 2 (SCSI-2).
ANSI/NISO Z39.72-199X Proposed American National Standard for CD-ROM Mastering.
AIIM/MS57 Proposed standard (in production): CD-ROM Application Profile for EIM.
ANSI/AIIM MS59-199X Proposed ANSI Standard for Information Systems - Use of Media Error Monitoring and Reporting Techniques for Verification of the Information Stored on Optical Digital Data Disks.
Standards Organizations/Groups (Abbreviations)

TERM      ORGANIZATION
AIIM      Association for Information and Image Management
ANSI      American National Standards Institute
ARMA      Association of Records Managers and Administrators
CBEMA     Computer and Business Equipment Manufacturers Association
CCITT     International Telegraph and Telephone Consultative Committee
CEN       Comité Européen de Normalisation
CENELEC   Comité Européen de Normalisation Électrotechnique
COS       Corporation for Open Systems
ECMA      European Computer Manufacturers Association
IEC       International Electrotechnical Commission
IEEE      Institute of Electrical and Electronics Engineers
IMA       Interactive Multimedia Association
ISO       International Organization for Standardization
ISSB      Information Systems Standards Board
ITSB      Image Technology Standards Board
ITU       International Telecommunication Union
JBIG      Joint Bi-level Image Group
JPEG      Joint Photographic Experts Group
MPEG      Moving Picture Experts Group
NAPM      National Association of Photographic Manufacturers
NARA      National Archives and Records Administration
NISO      National Information Standards Organization
NIST      National Institute of Standards and Technology
NTSC      National Television System Committee
TSS       Telecommunications Standardization Sector (formerly CCITT)

APPENDIX C: GLOSSARY OF TERMS

Ablative: A process of write-once optical recording in which a laser creates microscopic holes or pits in a thermally-sensitive recording material consisting of a tellurium-based thin film coated on a glass or plastic surface. When read by a laser, areas of an optical disk's surface that contain holes/pits will reflect light differently, thereby permitting identification of recorded bits. See recording process.

Access: In data processing, the process of retrieving data from memory.

Access Time: A term that describes 1) the time it takes to get an instruction or a unit of data from computer memory to the processing unit of a computer, 2) the time it takes to get a unit of data from a direct access storage device to computer memory.

Algorithm: A formula for solving a problem; a set of steps in a specific order, such as a mathematical formula or the instructions in a computer program.

ANSI: American National Standards Institute. A highly active group affiliated with the International Organization for Standardization (ISO); ANSI prepares and establishes standards in a number of technical disciplines, including transmission codes (e.g., ASCII), protocols, storage media (tape and diskette), and high-level languages (e.g., FORTRAN, COBOL).

ASCII: American Standard Code for Information Interchange. American National Standard binary-coding scheme consisting of 128 eight-bit patterns (7 bits plus a parity check bit) for printable characters and control of equipment functions.
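The parity scheme mentioned in the ASCII entry can be sketched as follows. This is a minimal illustration, assuming even parity carried in the high-order (eighth) bit; actual equipment may use odd parity, put the bit elsewhere, or omit it. The function names are hypothetical.

```python
def ascii_with_even_parity(ch: str) -> int:
    """Return an 8-bit pattern: the 7-bit ASCII code of ch plus an even-parity bit.

    Assumption: even parity, stored in the high-order (eighth) bit.
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    ones = bin(code).count("1")      # number of 1 bits in the 7-bit code
    parity = ones % 2                # 1 only if needed to make the total even
    return (parity << 7) | code


def parity_ok(byte: int) -> bool:
    """An even-parity byte always contains an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


for ch in "AC":
    b = ascii_with_even_parity(ch)
    print(ch, format(b, "08b"), parity_ok(b))
```

A single flipped bit makes the count of 1 bits odd, so the receiving equipment can detect (but not correct) the error.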

Backfile Conversion: Scanning older existing document holdings for image processing and retrieval.

Back-up Copy: A copy of index or image data files made for use in the event that the original is lost, damaged, or destroyed.

Bar Code Scanner: A device used to read such bar codes as the Universal Product Code by means of reflected light.

Bi-metallic Alloy Media: A write-once recording process which features a disk consisting of two metal-alloy layers sealed in plastic to protect against oxidation and contaminants. A laser beam records information by fusing the two layers together, thereby creating a four-element layer that represents the "one" bits in digitally-coded data. The two layers are left unfused to represent "zero" bits. When such disks are read by a laser, the fused layers will reflect light differently than the unfused layers.

Binary: A computer code using two distinct characters, normally 0 and 1.

Binary Scanner: An optical reader that scans and converts images into digital form. A binary scanner records each pixel as only black or white. See gray scale scanner.

Bit: Contraction of BInary digiT. The smallest unit of data a computer can process. Represents one of two conditions: On or Off, 1 or 0, Mark or Space, Something or Nothing. Bits are arranged into groups of eight called bytes.

Bit Map: A method of representing images by assigning an individual memory location for each picture element (pixel).

Bitonal: Having picture elements (pixels) that are only 1 bit deep. A bitonal image has two intensity values (1 and 0), corresponding to black and white. See binary scanner.
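Because each bitonal pixel needs only one bit, eight pixels fit in a byte. The sketch below packs one scan line that way; the conventions used (1 = black, first pixel in the most significant bit, short final byte padded with white) are assumptions, since file formats differ on both points.

```python
def pack_bitonal_row(pixels: list[int]) -> bytes:
    """Pack a row of 1-bit pixels (1 = black, 0 = white) into bytes, 8 per byte.

    The first pixel goes in the most significant bit; a partial final byte
    is padded with 0 (white) bits.
    """
    out = bytearray()
    for start in range(0, len(pixels), 8):
        chunk = pixels[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        byte <<= 8 - len(chunk)          # left-align a short final chunk
        out.append(byte)
    return bytes(out)


row = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]     # 10 pixels -> 2 bytes
print(pack_bitonal_row(row).hex())       # c080
```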

bpi: Bits per inch. Measure of the density of information storage on media.

bps: Bits Per Second. In serial data transmission, the instantaneous bit speed with which a device or channel transmits a signal. Sometimes confused with baud.

Byte: A group of bits processed or operated on together. An electronic data processing term used to describe one position or one character of information. The most common byte is eight bits long; an eight-bit byte can take 256 (2^8) different combinations of binary digits.

CALS: Computer-Aided Acquisition and Logistics Support. A Department of Defense initiative supporting the electronic interchange of data and documents (including engineering drawings) among contractors, government agencies, and end users.

CCITT: International Telegraph and Telephone Consultative Committee. Abbreviation of the French name for the committee which, among other things, issues standards for facsimile, including Group 3 and Group 4 digital standards which include data compression and decompression.

COLD: Computer Output to Laser Disk. Technique for the transfer of computer-generated output to optical disk, such that it can be viewed or printed without use of the original program.

COM: Computer Output to Microfilm. Microforms containing data produced by a recorder from computer-generated electrical signals.

Compatibility: The characteristic of data processing equipment by which one machine may accept and process data prepared by another machine without conversion or code modification.

Compressed File: The final digital image file, and the storage it requires, after compression. Smaller file sizes are generally preferred to make the best use of storage media and to speed data access.

Compression: A software or hardware process that "shrinks" images so they occupy less storage space and can be transmitted faster and more easily. Generally accomplished by removing the bits that define blank spaces and other redundant data and replacing them with shorter codes that represent the removed bits.

Computer System: A configuration, or working combination, of computer hardware, software, and data communications devices.

Continuous Tone: An image that has all the values (0 to 100%) of gray (black and white) or color in it. A photograph is a continuous tone image.

Conversion: Procedure in which information is transferred from one format to another, e.g., paper to microfilm, or microfilm to electronic information.

Curie Point: A transition temperature marking a change in the magnetic properties of a substance, esp. the change from ferromagnetism to paramagnetism.

Data Backup: To create a duplicate copy for security or disaster recovery purposes.

Data Communication: The movement of encoded information using electrical transmission systems; the transmission of data from one point to another.

Database Management: A software program that acts as a computerized filing cabinet full of information which comes with a superior indexing system.

Decompression: The process of decoding a compressed image and expanding the data to its original format.

Desktop Imaging System: A single-user setup for image processing.

Digital: Use of binary code to record information. The information can be text in a binary code (e.g., ASCII), images in bit-mapped form, or sound or video in sampled digital form.

Digital Data: Data represented by binary codes.

Digital Image: Image composed of discrete pixels of digitally quantized brightness or color.

DIS: Draft International Standard.

dpANS: Draft proposed American National Standard.

dpi: Dots per inch. Measure of output device resolution and quality, e.g., the number of pixels per inch on a display device, counted both horizontally and vertically.
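Because dpi applies in both directions, the pixel count of a scanned page grows with the square of the resolution. The sketch below works this out for an 8.5 x 11 inch page; the page size and the two resolutions are illustrative choices, not requirements drawn from this report.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixels across and down for a page scanned at `dpi` in both directions."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (200, 300):
    w, h = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} = {w * h:,} pixels")
```

At 200 dpi the page is 1700 x 2200 pixels; doubling the resolution to 400 dpi quadruples the pixel count, before any compression is applied.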

Dye Polymer: A recording process of write-once optical recording in which a laser's energy is converted into heat to form pits in a polymer which contains an infrared-absorbing dye. Information is recorded by the laser which operates at the dye's absorption wavelength.

EBCDIC: Extended Binary Coded Decimal Interchange Code. An 8-bit computer code used to represent 256 numbers, letters, and characters. Developed by IBM and used primarily in IBM equipment. See also ASCII.
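The same characters receive different byte values in EBCDIC and ASCII, so records migrated from IBM systems must be translated, not merely copied. The sketch below uses Python's built-in `cp037` codec, one EBCDIC code page (US/Canada); which code page a particular source system used is an assumption that must be verified for real records.

```python
text = "A1"
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")   # cp037: an EBCDIC code page (US/Canada)

print(ascii_bytes.hex(), ebcdic_bytes.hex())   # 4131 c1f1

# Translating EBCDIC data to ASCII: decode with the EBCDIC code page,
# then re-encode as ASCII.
converted = ebcdic_bytes.decode("cp037").encode("ascii")
assert converted == ascii_bytes
```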

EDAC: Error Detection And Correction. Operation that includes all phases of identifying and dealing with data errors, including direct-read-after-write and error correction codes.

EIM: Electronic Information Management.

Enhancement: Technique for processing an image so that the result is visually clearer than the original image.

Enterprisewide Imaging System: Large system with hundreds of users in different buildings; it can consist of several integrated department imaging systems.

FDDI: Fiber Distributed Data Interface. An ANSI standard for a 100 megabit-per-second LAN using fiber-optic cabling.

FIPS: Federal Information Processing Standard.

Firmware: Software instructions stored permanently or semi-permanently in the read-only memory (ROM) of a computer chip.

Gigabyte: GB. A unit of measure equivalent to 2^30 (1,073,741,824), or about one billion, bytes.

Gray Scale Scanner: Scanners with gray scale capability detect how dark or light a pixel is and pass this information on to the computer. The more bits of data the scanner records for each pixel it scans, the more levels of gray. For example, a one-bit-per-pixel scanner records only black or white, a two-bit-per-pixel scanner records 4 levels of gray, and a four-bit-per-pixel scanner records 16 levels of gray. See binary scanner.
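The pattern in the gray scale entry is a power of two: n bits per pixel give 2^n levels. Each added bit also raises the uncompressed file size, as the sketch below shows for a 1700 x 2200 pixel image (an 8.5 x 11 inch page at 200 dpi, chosen here only for illustration).

```python
def gray_levels(bits_per_pixel: int) -> int:
    """Number of distinct gray levels an n-bit pixel can represent (2**n)."""
    return 2 ** bits_per_pixel


def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes, rounding each row up to whole bytes."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


for bits in (1, 2, 4, 8):
    print(f"{bits} bit(s)/pixel: {gray_levels(bits)} levels, "
          f"{raw_image_bytes(1700, 2200, bits):,} bytes uncompressed")
```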

Group 3: CCITT compression technique which applies run-length encoding to a single horizontal line at a time.

Group 4: CCITT compression technique that efficiently compresses digitized images both horizontally and vertically (two-dimensionally).

Halftone: Technique of reproducing continuous-tone illustrations by photographing the image through an etched screen.

ICR: Intelligent Character Recognition. Advanced form of OCR technology that may include capabilities such as learning fonts during processing, or using context to strengthen probabilities of correct recognition.

Index: At its simplest, it is a descriptive set of data associated with a document for locating the document's storage location. In a more complex and demanding role, indexing can be used to consolidate documents that may not be, at first glance, related, or that may be stored in different locations, or on different media.

Indexing stored documents is the great intellectual challenge in document retrieval. Anyone can scan a piece of paper; the hard part is devising an indexing scheme that describes every possible parameter of each document for later searches, comparisons, and processing.
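The Index entry notes that an index can tie together related documents stored in different locations or on different media. The sketch below shows one way to do that with a relational index, using Python's built-in `sqlite3` module; the table, its fields, the volume labels, and the sample records are all hypothetical.

```python
import sqlite3

# Hypothetical schema: each scanned document is described by index fields
# and points to the optical disk volume and file that hold its image.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE document_index (
        doc_id     INTEGER PRIMARY KEY,
        case_no    TEXT,
        doc_type   TEXT,
        doc_date   TEXT,          -- ISO 8601 date string
        volume_id  TEXT,          -- optical disk label
        image_path TEXT           -- file location on that disk
    )
""")
conn.execute("CREATE INDEX idx_case ON document_index (case_no)")
conn.executemany(
    "INSERT INTO document_index (case_no, doc_type, doc_date, volume_id, image_path) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("94-0012", "letter",  "1994-03-02", "VOL0007", "/img/000123.tif"),
        ("94-0012", "invoice", "1994-04-15", "VOL0009", "/img/004410.tif"),
        ("94-0031", "letter",  "1994-05-20", "VOL0009", "/img/004871.tif"),
    ],
)

# Retrieve every document in one case, in date order, even though the
# images sit on different disks.
rows = conn.execute(
    "SELECT doc_type, volume_id, image_path FROM document_index "
    "WHERE case_no = ? ORDER BY doc_date",
    ("94-0012",),
).fetchall()
for row in rows:
    print(row)
```

The index itself is small and can live on magnetic disk, while the images it points to remain on the optical volumes.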

Information System: The organized collection, processing, transmission, and dissemination of information in accordance with defined procedures, whether automated or manual. Sometimes called a record system. Electronic records are generally scheduled by information system, whereas non-electronic records are generally scheduled by series.

Integration: Combining various pieces of hardware and software, often acquired from different vendors, into a unified system.

JBIG: Joint Bi-level Image Group. A bi-tonal compression algorithm standard under development by a joint CCITT/ISO committee, with potential applications in database management systems composed of black-and-white halftoned photographs and text.

JPEG: Joint Photographic Experts Group. Algorithm standard under development by a CCITT/ISO committee for a general purpose compression technique for color and gray-scale image applications.

Jukebox: Automated device for housing multiple optical disks and one or more read/write drives.

Laser Printer: A printer device that uses a laser beam to generate an image that is developed with toner and fused to paper using heat and pressure.

Local Area Network (LAN): A system for linking together computers, terminals, printers, and other equipment, usually within the same office or building.

Longevity: The useful shelf life expectancy of optical data disks before writing (pre-write), plus the estimated post-write data life span.

Lossy: Method of image compression, such as JPEG, that reduces the size of an image by disregarding some pictorial information.

Lossless: Image and data compression applications and algorithms, such as Huffman coding, that reduce the number of bits a picture would normally take up without losing any data.
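Huffman coding, named in the Lossless entry, gives frequent symbols short codes and rare symbols long ones, and decoding restores the input exactly. The sketch below is the textbook construction, not the fixed code tables used in any particular standard (CCITT Group 3, for instance, uses predefined modified Huffman tables). For readability it represents the output as a string of "0" and "1" characters rather than packed bits.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict[int, str]:
    """Build a Huffman code table mapping each byte value to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                        # degenerate case: one symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(data: bytes, table: dict[int, str]) -> str:
    return "".join(table[b] for b in data)


def decode(bits: str, table: dict[int, str]) -> bytes:
    reverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:               # prefix-free, so the first match is correct
            out.append(reverse[current])
            current = ""
    return bytes(out)


data = b"AAAAAAABBBCCD"
table = huffman_code(data)
bits = encode(data, table)
assert decode(bits, table) == data            # lossless: exact reconstruction
print(len(data) * 8, "bits before,", len(bits), "bits after")   # 104 bits before, 22 bits after
```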

Magnetic Disk Cache: Temporary storage on magnetic disk for quick retrieval of frequently used documents.

Magneto-Optical: An optical recording process that is rewritable. The recording or "write" process uses a laser beam to heat a pre-magnetized site on the media's recording surface. This causes a reversal of the magnetic polarity, resulting in subtle reflective differences sensed as digital data by the "read" laser beam. The process is reversed to erase the data.

Megabyte: MB. A unit of measurement equivalent to 2^20 (1,048,576), or about one million, bytes.

Modem: MOdulator-DEModulator; a device that encodes and decodes digital data for transmission as analog signals over a particular medium, such as telephone lines, coaxial cables, fiber optics, or microwaves.

MPEG: Moving Picture Experts Group. An image compression scheme for full-motion video proposed by the Moving Picture Experts Group, an ISO-sanctioned group. MPEG takes advantage of the fact that full-motion video is made up of many successive frames, often containing large areas that do not change, such as a blue-sky background. MPEG performs "differencing," noting the differences between consecutive frames. If two consecutive frames are identical, the second does not need to be stored.
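The differencing idea in the MPEG entry can be illustrated with a toy encoder that stores the first frame in full and, for each later frame, only the pixels that changed. This is a deliberate simplification for illustration: actual MPEG coding works on blocks, uses motion-compensated prediction, and compresses the residual with a transform, none of which is shown. The flat list-of-pixels frame representation is a hypothetical choice.

```python
Frame = list[int]          # hypothetical: a frame as a flat list of pixel values


def difference(prev: Frame, curr: Frame) -> list[tuple[int, int]]:
    """Return (position, new_value) for every pixel that changed since prev."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev, curr)) if p != v]


def apply_difference(prev: Frame, changes: list[tuple[int, int]]) -> Frame:
    """Rebuild the current frame from the previous one plus its changes."""
    frame = list(prev)
    for i, v in changes:
        frame[i] = v
    return frame


frames = [
    [9, 9, 9, 9, 1, 1],    # "sky" (9) and foreground (1)
    [9, 9, 9, 9, 1, 1],    # identical: nothing to store
    [9, 9, 9, 9, 2, 1],    # one pixel changed
]
stored = [frames[0]] + [difference(a, b) for a, b in zip(frames, frames[1:])]
print(stored[1], stored[2])   # [] [(4, 2)]

# Decoding replays the differences starting from the first full frame.
rebuilt = [stored[0]]
for changes in stored[1:]:
    rebuilt.append(apply_difference(rebuilt[-1], changes))
assert rebuilt == frames
```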

Multifunction: An optical data disk storage system that accepts removable, double-sided 5.25-inch optical media in both write-once and rewritable formats.

OCR: Optical Character Recognition or Reader. The ability of a scanner with the proper software to capture, recognize, and translate printed alphanumeric characters into machine-readable text. Most OCR systems work by using either pattern matching or feature extraction.
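Pattern matching, one of the two OCR approaches named above, compares a scanned character image against stored templates and picks the closest. The sketch below uses tiny hypothetical 3 x 5 bitmaps and counts mismatched pixels (Hamming distance); production systems also normalize size, position, and stroke width, which this sketch omits.

```python
# Hypothetical 3x5 glyph templates (1 = ink), flattened row by row.
TEMPLATES = {
    "I": "010" "010" "010" "010" "010",
    "L": "100" "100" "100" "100" "111",
    "T": "111" "010" "010" "010" "010",
}


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(glyph: str) -> str:
    """Return the template character closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


noisy_T = "111" "010" "010" "110" "010"   # a T with one stray black pixel
print(recognize(noisy_T))                   # T
```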

Open System Architecture: Proponents of open systems are seeking to standardize computer equipment and processes so that data contained in one machine or system can be transferred or communicated easily to another. Such standards may address purely physical concerns (whether a plug fits into a particular socket or how fast electrical impulses are sent through a cable) or higher order logical concerns (so that, for instance, one word processing program can recognize footnotes or chapter headings created by another word processor as discrete elements with specific characteristics).

Optical Disk: A direct access storage device that is written and read by laser light. Certain optical disks are considered Write Once, Read Many (WORM), because data is permanently engraved in the disk's surface either by gouging pits (ablation) or by causing the non-image area to bubble, reflecting light away from the reading head. Erasable optical drives use technologies such as the magneto-optic technique, which magnetically alters the orientation of grains of material after they have been heated by a laser. Compact discs and laser (or video) discs are optical disks.

Optical Digital Data Disk: A form of optical disk used to store and retrieve digital data or digital image information.

Permanent Records: Records appraised by NARA as having sufficient historical or other value to warrant continued preservation by the Federal Government beyond the time they are needed for a particular agency's administrative, legal, or fiscal purposes. Sometimes called archival records.

Phase Change: A recording process in which a laser beam records information by heating selected areas of the recording layer until its glass-transition temperature is reached. A crystalline-to-amorphous or amorphous-to-crystalline transition occurs in heated areas, accompanied by a change in their reflection characteristics.

Pixel: A contraction of "picture element." Also called a pel. When an image is defined by many tiny dots, those dots are pixels. On the printed page, each pixel is one dot. On color monitors, though, a pixel can be made up of several dots, with the color of the pixel depending on which dots are illuminated, and how brightly.

RAID: Redundant Array of Inexpensive Disks. A storage technology in which information is split up among multiple hard disks. This hardware configuration holds gigabytes of data and is capable of storing and retrieving information faster than ordinary hard disks.
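The RAID entry describes splitting data across disks; the "redundant" part is commonly provided by parity. The sketch below assumes block striping plus a dedicated XOR parity disk (the arrangement of RAID level 4; level 5 rotates parity across disks instead) and shows how a lost disk's contents can be rebuilt from the survivors. Block size, disk count, and the in-memory "disks" are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data: bytes, n_data_disks: int, block_size: int) -> list[list[bytes]]:
    """Deal data blocks round-robin to the data disks, padding with zeros."""
    stripe_len = n_data_disks * block_size
    data += b"\x00" * (-len(data) % stripe_len)
    disks: list[list[bytes]] = [[] for _ in range(n_data_disks)]
    for offset in range(0, len(data), block_size):
        disk = (offset // block_size) % n_data_disks
        disks[disk].append(data[offset:offset + block_size])
    return disks


data_disks = stripe(b"OPTICAL DISK IMAGES", n_data_disks=3, block_size=4)
# One parity block per stripe (row), kept on an extra disk.
parity_disk = [xor_blocks(list(row)) for row in zip(*data_disks)]

# Simulate losing disk 1 and rebuild it from the survivors plus parity.
rebuilt = [xor_blocks([data_disks[0][i], data_disks[2][i], parity_disk[i]])
           for i in range(len(parity_disk))]
assert rebuilt == data_disks[1]
```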

Raster Image Data: A line or array of pixels, as depicted on a CRT monitor, that corresponds to the original scanned image. The number of lines per inch is a function of the scan resolution (e.g., 200 dots per inch equals 200 lines per inch). The resulting scanned image contains a large number of pixels that collectively form a digital image or bit-map image.

Read/Write Head: Component that records and senses data on a magnetic or optical disk.

Recording Process: The means of inscribing and storing digitally coded information generated by computer systems. There are two approaches to recording: rewritable and write-once, read-many (WORM). Within these two categories are several recording processes. Phase change and magneto-optical are rewritable. Ablative, bi-metallic, dye-polymer, and thermal bubble are WORM recording processes.

Records Management Officer: The person assigned responsibility by the agency head for overseeing an agency-wide records management program. Also called records officer or records manager.

Resolution: 1. Measure of imager output capability, usually expressed in dots per inch. 2. Measure of halftone quality, usually expressed in lines per inch. The higher the resolution, the greater the detail that can be shown.

Rewritable Optical Disk: A recording medium that, unlike WORM disks, can be erased, written over, and otherwise reused. Both magneto-optical and phase change technologies are currently used.

Run-Length Code: A method of redundancy reduction (data compression) used by digital facsimile transmitters to enhance speed. When image patterns of an original are converted into digital signals, all black and white areas on a page are reported as a series of ones (black) and zeros (white). The number of white spaces between black elements (number of zeros between ones) is assigned a number, or run-length code. The unit assigns a short code to represent each space encountered, rather than reporting all of the individual white spaces (zeros), and the most frequently used run-length codes are given the shortest binary numbers. Run-length coding may be performed horizontally across the width of the page (one-dimensional) or vertically (two-dimensional).
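The first step of run-length coding, turning a scan line of ones and zeros into run lengths, can be sketched as follows. As in CCITT Group 3 one-dimensional coding, runs alternate white and black starting with white, so a line that begins with black gets a leading white run of length zero. The second step, mapping each run length to a short variable-length code, is not shown.

```python
from itertools import groupby


def run_lengths(scan_line: list[int]) -> list[int]:
    """Encode a bi-level scan line (0 = white, 1 = black) as alternating run lengths.

    Runs alternate white, black, white, ... starting with white; a line that
    begins with black gets a leading white run of length 0.
    """
    runs = [len(list(group)) for _, group in groupby(scan_line)]
    if scan_line and scan_line[0] == 1:
        runs.insert(0, 0)
    return runs


def expand(runs: list[int]) -> list[int]:
    """Rebuild the scan line from alternating white/black run lengths."""
    line: list[int] = []
    for i, length in enumerate(runs):
        line.extend([i % 2] * length)     # even index = white (0), odd = black (1)
    return line


line = [0] * 12 + [1] * 3 + [0] * 20 + [1] * 1 + [0] * 4
runs = run_lengths(line)
print(runs)                 # [12, 3, 20, 1, 4]
assert expand(runs) == line
```

Forty pixels collapse to five numbers here; on a typical page, with long white margins, the savings are much larger.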

Scanner: Device that converts a document into binary (digital) code by detecting and measuring the intensity of light reflected from paper or transmitted through microfilm.

Scheduled Records: Records whose final disposition has been approved by NARA.

SCSI: Small Computer System Interface. Industry standard for connecting peripheral devices and their controllers to a microprocessor. SCSI defines both hardware and software standards for communication between a host computer and a peripheral.

Server: Computer dedicated to operating some portion of a total system, such as a database server, image server, or fax server.

SGML: Standard Generalized Markup Language. A language for describing documents that facilitates the exchange of text among systems. The Department of Defense mandated that its publishing systems support this standard.

SQL: Structured query language. A relational database language developed by IBM and standardized by ANSI.

Terabyte: TB. A unit of measurement equivalent to 2^40 (1,099,511,627,776), or about one trillion, bytes.
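The megabyte, gigabyte, and terabyte entries use binary powers of two, which are only approximately equal to the round decimal figures (one million, one billion, one trillion). The short calculation below shows how the gap widens at each step; the helper name is hypothetical.

```python
POWERS = {"MB": 20, "GB": 30, "TB": 40}   # binary definitions from the glossary


def to_bytes(amount: float, unit: str) -> int:
    """Convert an amount in MB, GB, or TB (binary definitions) to bytes."""
    return int(amount * 2 ** POWERS[unit])


for unit, power in POWERS.items():
    exact = to_bytes(1, unit)
    decimal = 10 ** (power // 10 * 3)      # 10**6, 10**9, 10**12
    print(f"1 {unit} = {exact:,} bytes "
          f"({(exact - decimal) / decimal:.1%} above {decimal:,})")
```

The difference is about 5 percent at the megabyte level but nearly 10 percent at the terabyte level, which matters when comparing capacity figures quoted in different conventions.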

Thermal Bubble: A recording process of write-once optical recording in which a highly focused laser beam evaporates a polymer layer to form bubbles or bumps on a thin film composed of precious metals, such as gold or platinum. The bubbles open to form pits which reveal a reflective underlayer.

TIFF: Tagged Image File Format. A standardized header or tag that defines the exact data structure of images to be processed. TIFF is supported by many desktop publishing and paint programs.
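Every TIFF file begins with a fixed 8-byte header that states the byte order and points to the first directory of tags. The sketch below reads that header as laid out in TIFF Revision 6.0; it stops there and does not parse the tag entries themselves. The function name is hypothetical.

```python
import struct


def read_tiff_header(header: bytes) -> tuple[str, int]:
    """Parse the 8-byte TIFF header; return (byte_order, first_IFD_offset).

    Bytes 0-1: "II" (little-endian) or "MM" (big-endian)
    Bytes 2-3: the number 42, identifying the file as TIFF
    Bytes 4-7: offset of the first image file directory (IFD), whose
               entries are the tags that describe the image
    """
    order = header[:2]
    if order == b"II":
        fmt = "<"
    elif order == b"MM":
        fmt = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(fmt + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    return order.decode("ascii"), ifd_offset


# A hand-built little-endian header whose first IFD starts at byte 8.
print(read_tiff_header(b"II\x2a\x00\x08\x00\x00\x00"))   # ('II', 8)
```

Checking the byte-order mark and the number 42 is a cheap first test of whether an archived file is really TIFF before attempting migration.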

Transfer: The act or process of moving records from one location to another, especially from office space to agency storage facilities or Federal records centers, from one Federal agency to another, or from office or storage space to the National Archives for permanent preservation.

TSS: Telecommunications Standardization Sector.

Turnkey System: An integrated configuration of preselected hardware and pre-written software designed to accomplish a particular information processing task. The term is most often applied to dedicated computer systems that use minicomputers or microcomputers.

Unscheduled Records: Records whose final disposition has not been approved by NARA.

Workflow: In imaging software, a program that tracks the progress of a document from its entry into the system through the various departments in the organization to its final destination.

WORM: Write-Once, Read-Many. Optical disks on which user data can be recorded (written) once and then retrieved (read) as often as needed. Information recorded on WORM disks is considered permanent because, unlike magnetic media, the disks cannot be rewritten. See recording process.

Zoom: To enlarge a selected portion of an image displayed on a screen.

The following sources were consulted in compiling this Glossary of Terms (refer to the Bibliography for complete citations):

Digital Imaging and Optical Media Storage Systems; Guidelines for State and Local Government Agencies. National Archives and Records Administration, December 1991.

A Glossary for Archivists, Manuscript Curators, and Records Managers. The Society of American Archivists, 1991.

Glossary of Imaging Technology. Association for Information and Image Management (AIIM TR2-1992).

Stability, Care and Handling of Microforms, Magnetic Media and Optical Disks. American Library Association, Library Technology Reports, January-February 1991.

Terms Widely Used in Image Processing. (Glossary). Government Computer News, April 29, 1991.

The Imaging Glossary, Electronic Document and Image Processing Terms, Acronyms and Concepts. Andy Moore, 1991.

The Use of Optical Disks for Public Records. Association for Information and Image Management, (AIIM TR25-1990).

User's Guide and Glossary. Datapro Reports on Document Imaging Systems, McGraw-Hill, February 1994.


Appendix D: Bibliography

GLOSSARIES

Avedon, Don M.; Courtot, Marilyn E. Glossary of Imaging Technology. AIIM TR2-1992. Silver Spring, MD: Association for Information and Image Management; 1992. A comprehensive glossary of technical terms.

Bellardo, Lewis J.; Bellardo, Lynn Lady, compilers. A Glossary for Archivists, Manuscript Curators, and Records Managers. Chicago: Society of American Archivists; 1992. 45 pp. (Archival Fundamentals Series). The best glossary available for archival terms.

Datapro Reports on Document Imaging Systems. User's Guide and Glossary. McGraw-Hill; Delran, New Jersey; February 1994.

Moore, Andy. The Imaging Glossary, Electronic Document and Image Processing Terms, Acronyms and Concepts. 1991.

Stoddard, Brooke. "Terms Widely Used in Image Processing." Government Computer News; April 29, 1991; vol. 10, no. 9: 16.

GENERAL WORKS

Applying Technology to Record Systems--A Media Guideline. Washington, DC: U.S. General Services Administration; May 1993. 132 pp. ISBN 0-16-041784-8. Information Resources Management Service publication number KML-93-1-R. This publication was created by the GSA to help Federal agency personnel understand their media options.

Cinnamon, Barry; Nees, Richard. The Optical Disk...Gateway to 2000. Silver Spring, MD: Association for Information and Image Management; 1991. 72 pp. ISBN 0892582189. Overall introduction to optical disk technology and systems; includes bibliography.

D'Alleyrand, Marc R. Image Storage and Retrieval Systems: A New Approach to Records Management. New York: McGraw-Hill; 1989. 246 pp. ISBN 0070152314. Part A provides a short but excellent tutorial in the theories of information retrieval and indexing; Part B, a detailed discussion of data and image capture technologies; Part C, a plan for designing a system. Chapter 10 is specific to optical technology.

Electronic Imaging Request for Proposal (RFP) Guidelines. Silver Spring, MD: Association for Information and Image Management; 1991. 34 pp. (Technical Report for Information and Image Management AIIM TR27-1991). ISBN 0892582227. Provides guidelines for developing proposals for imaging systems. Gives "step by step procedures for analyzing system requirements, developing functional specifications, and evaluating configuration alternatives."

Hendley, A.M. A Technical Introduction: Document Image Processing. Manchester, England: National Computing Centre; 1990. 301 pp. Provides a conceptual introduction to the issues involved in and theories behind DIP in addition to technical matter. Covers all aspects of DIP. Broad but in-depth. Includes some discussion of standards.

Waegemann, C. Peter. The Handbook of Optical Memory Systems: Feasibility, Design, Implementation. Newton, MA: Optical Disk Institute; 1989. Focuses on uses of the technology but gives excellent, brief discussions of various theoretical issues. Includes a list of vendors and consultants.

DIGITAL IMAGE CAPTURE

Image Acquisition. New Jersey: McGraw-Hill; February 1994. 57 pp. Datapro Reports on Document Imaging Systems. Brief overview of market and technology trends followed by product specification charts. Covers scanners and optical character recognition.

Roth, Judith Paris, ed. Converting Information for WORM Optical Storage: A Case Study Approach. Westport, CT: Meckler; 1990. 284 pp. ISBN: 0887363806. Part 1 covers conversion models and methodology; part 2, user experiences, including the National Archives and National Library of Medicine, and various commercial or trademarked systems. Includes a glossary, bibliography, and directory of firms and organizations.

Schantz, Herbert F. Optical Digital Imaging Text Systems. Silver Spring, MD: Association for Information and Image Management; 1991. 45 pp. (AIIM Resource Report). Covers various ODIT and OCR systems; discusses technology, trends, and applications.

OPTICAL MEDIA STORAGE

Abraham, Robert C.; Freeman, Raymond C., Jr. Mass Storage Solutions. Santa Barbara, CA: Freeman Associates; 1991. 140 pp. (Freeman Reports). Contains two parts: 1) market assessment through 1995; 2) product specification charts. Includes glossary and producer profiles. Covers WORM, magneto-optical, CD-ROM, and various tape-based systems.

Behera, Bailochan; Singh, Harpreet. Optical Storage Performance Modeling and Evaluation. Optical Information Systems; 1990; 10(5): 275-286. ISSN: 0886-5809. Evaluates various storage media for long-term archival storage of large amounts of data. Proposes three evaluation models.

Campbell, David K.; Proehl, Kraig. Optical Advances. BYTE; 19(3) March 1994: 107-116. Explains how MO storage is poised to dramatically increase data storage and describes MO as a solution for storage-hungry applications such as image management.

Digital Imaging and Optical Media Storage Systems: Guidelines for State and Local Government Agencies. Washington, DC: National Archives and Records Administration and National Association of Government Archives and Records Administrators. December 1991. 87 pp.

Optical Digital Image Storage System: Project Report. Washington, DC: National Archives and Records Administration Research and Evaluation Staff. March 1991. 378 pp.

Podio, Fernando. Development of a Testing Methodology to Predict Optical Disk Life Expectancy Values. Washington, DC: National Institute of Standards and Technology. NIST Special Publication 500-200; December 1991. 96 pp.

Roth, Judith Paris, ed. Case Studies of Optical Storage Applications. Westport, CT: Meckler; 1990. 139 pp. ISBN: 0887365353. User experiences with various commercial or government optical storage systems, including the National Library of Medicine and the United States Navy. This is the first collection of such reports. Includes bibliography, glossary, and a 20-page listing of existing systems (8 pages of Federal and State listings).

Roth, Judith Paris, ed. Essential Guide to Multifunction Optical Storage. Westport, CT: Meckler; 1991. 134 pp. ISBN: 0887367518. Includes a 42-page, annotated "Directory of Organizations and Individuals" providing rewritable optical technology; a bibliography; and a glossary. Note the following two chapters (remaining chapters provide vendor-specific information):
  • Stevens, John J. Rewritable Media Manufacturing for Multifunctional Optical Disk Drives. Chapter 5, pp. 45-51. Describes physical characteristics of magneto-optic disks.

  • Berg, Brian A. Software Considerations for Rewritable and Multifunction Optical Drives. Chapter 6, pp. 52-60. Focuses on the SCSI-2 standard and discusses efforts towards a universal logical format for rewritable optical digital data disks (ANSI X3B11.1).

Saffady, William. Stability, Care and Handling of Microforms, Magnetic Media and Optical Disks. Library Technology Reports; Jan.-Feb. 1991; 27(1): 5-116. ISSN: 0024-2586. Optical disks, pp. 63-67. Covers stability and wear, giving guidelines for daily use. 28-page bibliography.

Saffady, William. Optical Storage Technology 1990-91: A State of the Art Review. Westport, CT: Meckler; 1990. 230 pp. ISBN: 0887365949. Updated annually. The current edition of this comprehensive review covers CD-ROM, read/write optical digital data disks, and optical cards and tape, and includes a 71-page bibliography.

Image Storage. New Jersey: McGraw-Hill; February 1994. 54 pp. Datapro Reports on Document Imaging Systems. Overview of market and technology trends followed by product specification charts. Covers optical storage: jukeboxes, multifunction/rewritable drives, and other drives.

RETRIEVAL AND OUTPUT

Image Distribution. New Jersey: McGraw-Hill; February 1994. 37 pp. Datapro Reports on Document Imaging Systems. Overview of market and technology trends followed by product specification charts. Covers laser printers, large screen displays, and facsimile.

DATA MIGRATION

Baronas, Jean. An Introduction to the Association for Information and Image Management's Standards Program and Its Electronic Image Management (EIM) Standards. Silver Spring, MD: Association for Information and Image Management; April 1994. 22 pp. (AIIM Standards Activity Status Report).

Courtot, Marilyn. Document Imaging Standards Development: How, Why, and for Whom? Silver Spring, MD: Association for Information and Image Management; 1992. 60 pp. (AIIM Resource Report). Overview of the standards process and discussion of existing standards.

Courtot, Marilyn. Imaging Standards. Silver Spring, MD: Association for Information and Image Management; 1991. 52 pp. (AIIM Resource Report).

Courtot, Marilyn. Impact of Optical Storage Standards on the Image and Information Industry. Optical Information Systems; 1990; 10(2): 70-74. ISSN: 0886-5809. Reviews standards in the optical storage industry, focusing on image capture and image output processes.

INFORMATION MANAGEMENT POLICY

The Expungement of Information Recorded on Optical Write-Once-Read-Many (WORM) Systems. Silver Spring, MD: Association for Information and Image Management; 1991. 5 pp. (Technical Report for Information and Image Management AIIM TR28-1991). ISBN 0892582235.
Prescribes practices for the court-ordered expungement of information on WORM optical digital data disks. Intended for use by public offices, records managers, and archives. Prepared by the AIIM C18 Public Records Committee.

The Use of Optical Disks for Public Records. Silver Spring, MD: Association for Information and Image Management; 1990. 33 pp. (Technical Report for Information and Image Management AIIM TR25-1990).
Discusses the use of electronic image management technologies and methodologies for the storage of long-term and permanent public records on optical digital data disks. Prepared by the AIIM C18.2 Committee on Digital Imaging for Public Records.

Hall, George M. Image Processing: A Management Perspective. New York: McGraw-Hill; 1991. 210 pp. ISBN: 0071557467.
The most comprehensive treatment of optical systems management, covering cost/risk analysis, implementation, legal and security issues, technology, capture, and storage.

McIntosh, Toby J. Federal Information in the Electronic Age: Policy Issues for the 1990's. Washington, DC: Bureau of National Affairs; 1990. 119 pp. with appendices. ISBN: 1558711708.
Focuses on legal issues such as FOIA and agency search obligations, and covers current agency practices. Comprehensive compilation of Federal practices and issues.

Memo, Paula L. Kocher to Bill Adams, Department of Health and Human Services, Office of the General Counsel, February 20, 1992.

U.S. Army Corps of Engineers Optical Disk Imaging Pilot Test Briefing for NARA, April 8, 1993.

U.S. Army Corps of Engineers Optical Disk Imaging (ODI) Pilot Test Briefing for the National Archives, April 8, 1993.

U.S. Army Corps of Engineers Optical Disk Imaging (ODI) Pilot Project IPR.

National Archives and Records Administration, Office of Records Administration, The Management of Permanent Records in the Bureau of Land Management: An Evaluation (Washington, DC: NARA, 1988), pp. 4-13.

Bureau of Land Management, General Land Office, "Records Preservation Project."

The United States Government Manual, Office of the Federal Register, 1993/1994.

Abstracted from a CFTC Document Management System Description.

PERMS Phase II Program Management Plan (Unclassified).

Joint Services Committee on Optical Imaging Systems Report, 3 May 1991.

Personnel Electronic Record Management System (PERMS) PEO STAMIS Program Execution.

The United States Government Manual, Office of the Federal Register, 1993/1994.

Office of Information Resources Management, Guidance for Developing Image Processing Systems in EPA: EPA System Design and Development Guidance: Supplement to Volumes A & B (Washington, DC: Environmental Protection Agency, February 1991).

The United States Government Manual, Office of the Federal Register, 1993/1994.

The United States Government Manual, Office of the Federal Register, 1993/1994.

The United States Government Manual, Office of the Federal Register, 1993/1994.

The United States Government Manual, Office of the Federal Register, 1993/1994.

The United States Government Manual, Office of the Federal Register, 1993/1994.

Compiled from U.S. Patent and Trademark Office reports: Office of Information Systems - Automation Activities - 1991; and the APS Fact Sheet dated May 13, 1991.

The United States Government Manual, Office of the Federal Register, 1993/1994.

"Development of an Optical Disk System for the Automated Retrieval of EASEAR Records", National Institute of Standards and Technology, Natalie Willman, August 1991.

Social Security Administration's Optical Disk Pilot overview.

The United States Government Manual, Office of the Federal Register, 1993/1994.

The United States Government Manual, Office of the Federal Register, 1993/1994.

Journal of Electronic Imaging; April 1993; 2(2): 126-137.

This section lists a number of consortium and proprietary standards. Although in wide use, they are not necessarily industry standards developed through the industry consensus process.

This section lists various "Book" standards that are not ISO/ANSI-accepted standards.
