Spring 2006, Vol. 38, No. 1
The World War II Army Enlistment Records File and Access to Archival Databases
By Theodore J. Hull
World War II Army Enlistment Records provide a rich source of information for genealogists and other researchers interested in Army enlistees in World War II. Since their release in May 2004 through the Access to Archival Databases (AAD) resource of the National Archives and Records Administration (NARA), they have quickly become the most popular series of electronic records accessible through that resource.
AAD, the first installment of NARA's Electronic Records Archives (ERA) program, is leading the way in providing improved access to NARA's rich holdings of electronic records. In the first year, thousands of AAD users performed more than 700,000 queries against the enlistment records file alone. With 9.2 million records for enlistments in the Army, Enlisted Reserve Corps, and Women's Army Auxiliary Corps, the file's popularity should come as little surprise.
In addition to genealogists, individuals who served in the war (and their children and grandchildren) are using the records to document military service.
The enlistment records are one of 45 series of electronic records currently available on AAD. Those series contain more than 85 million historic electronic records created by more than 20 federal agencies on a wide range of topics. The enlistment records complement other World War II–era electronic records in AAD, including the Records of Duty Locations for Naval Intelligence Personnel, Records About Japanese Americans Relocated During World War II, and Records of World War II Prisoners of War.
This article provides information about how the enlistment file came to be in AAD, along with some tips and pointers for finding records in the file.
Preparing the Records for Access in AAD
The story of the electronic World War II Army Enlistment Record file begins with the disastrous July 12, 1973, fire at NARA's National Personnel Records Center for Military Personnel Records (NPRC). The fire destroyed approximately 16–18 million Official Military Personnel Files, including the records of approximately 80 percent of U.S. Army personnel discharged between November 1, 1912, and January 1, 1960. Following the fire, NPRC staff began identifying various series of records in NARA's custody that could assist them in reconstructing the lost basic service data. With these alternative sources, they could verify military service and provide a Certification of Military Service.
Among the sources identified was a series of 16mm microfilm of computer punch cards titled "Microfilm Copy of the Army Serial Number File, 1938–1946." The Personnel Services Support Division of the Adjutant General's Office had created the microfilm in 1947, and NARA accessioned it in 1959. The original punch cards, which contained basic information about enlistees at the time they entered Army service, were destroyed after microfilming, a common practice at that time. The NPRC began using a copy of the microfilm, but it presented some challenges. First, there were 1,586 rolls of microfilm, making manual review very difficult. Second, the punch cards were microfilmed in serial number order, making a search by name impossible. Third, a variety of punch card formats were used to record the enlistment data over time, and documentation of the various recording formats was hard to identify.
One of NPRC's goals was to make as many of the reconstructed records as possible available to its staff electronically, speeding its responses to its more than one million requesters each year. In 1992, NPRC asked NARA's Center for Electronic Records for help with these challenges.
The Center's director was familiar with the Bureau of the Census's Film Optical Sensing Device for Input to Computers (FOSDIC) system and its successful use in processing the 1960 through 1990 decennial censuses. Census returns, which were essentially "bubble" forms where answers were supplied by blacking out the appropriate circle, were microfilmed, and then FOSDIC extracted the answers from the images. Since the Bureau of the Census had already modified the original FOSDIC to process a series of 300 million microfilmed punch cards containing weather data, it responded affirmatively to the challenge presented by NARA.
The Bureau of the Census completed the project during federal fiscal year 1994, on time and under budget. It converted 1,374 of the 1,586 rolls, or 87 percent of the microfilm. The remaining 212 rolls, containing approximately 1.5 million punch cards, could not be converted because the card images were so dark that the scanner produced few or no usable records. In July 1994, the Bureau of the Census provided NARA with 1,374 data files (one per converted roll) on twelve 3480-class tape cartridges. NPRC received copies of the files, and its staff worked with Center for Electronic Records staff to identify the relevant War Department Technical Manuals containing technical documentation for the punch cards. Additional code tables and documentation continue to be identified among NARA's vast textual records holdings from World War II.
The distinguishing characteristic of the files created by the Bureau of the Census was that FOSDIC read each punch card image up to 10 times in an attempt to create a clean record and extract all characters from the original punch card. Usually, the first read contained most of the data extracted from the card image. If not all data could be extracted, subsequent reads of the card image produced additional records containing periods for characters successfully read on previous reads and alphanumeric characters for those interpreted on the current read. Interpretations of the same character may have varied across the multiple reads. A blank record separates the record or group of records pertaining to each punch card image. Each file also contains a header record indicating the box and microfilm roll number, and an end-of-file record. Where FOSDIC could not interpret any information from a punch card or series of punch cards within a file, it inserted a record reading "ONE OR MORE RECORDS WERE UNREADABLE AT THIS LOCATION."
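The file layout just described can be illustrated with a short Python sketch that splits one raw FOSDIC file into per-card groups of reads. It is a hypothetical reconstruction for illustration, not NARA's processing code: it assumes each file arrives as a list of text lines whose first line is the header record and whose last line is the end-of-file record, and the sample header, trailer, and card contents are invented. The unreadable-location marker is matched on its opening words and the word UNREADABLE rather than character for character, since its exact wording is an assumption here.

```python
# Sketch: split one raw FOSDIC output file into groups of reads, one group per card image.
# Assumptions (hypothetical, not from NARA documentation): the file is a list of text
# lines, the first line is the header record and the last is the end-of-file record.


def is_unreadable_marker(text):
    """Loosely match FOSDIC's 'ONE OR MORE ... UNREADABLE AT THIS LOCATION' record."""
    return text.startswith("ONE OR MORE") and "UNREADABLE" in text


def split_card_groups(lines):
    """Return a list of groups; each group is the list of reads for one card image."""
    body = lines[1:-1]  # drop the header and end-of-file records
    groups, current = [], []
    for line in body:
        text = line.rstrip("\n")  # keep trailing spaces: records are fixed-width
        if not text.strip():
            # A blank record closes the current card's group of reads.
            if current:
                groups.append(current)
                current = []
        elif is_unreadable_marker(text):
            # Keep the unreadable marker as its own group, in its original position.
            if current:
                groups.append(current)
                current = []
            groups.append([text])
        else:
            current.append(text)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sample = [
        "HEADER BOX 1 ROLL 1",  # hypothetical header layout
        "3212 45 SMITH",        # first read: one column not yet interpreted
        "....3........",        # second read: periods mark columns already read
        "",
        "ONE OR MORE RECORDS WERE UNREADABLE AT THIS LOCATION",
        "",
        "32987654 JONES",
        "END OF FILE",          # hypothetical end-of-file layout
    ]
    for group in split_card_groups(sample):
        print(group)
```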
These features presented challenges to NPRC because the alphanumeric data were spread over multiple records, making them hard to use and interpret. The large number of files also posed a logistical problem for identifying and searching for individuals, especially given the computer technology of the time. During the 1990s, NPRC collected code books and began analyzing the records, while NARA's St. Louis Data Systems Center wrote early edit programs in an attempt to merge the best guesses into one record. Given the complexity of the files, however, and the limited ability to search and locate individual records, NARA undertook no further processing of the electronic version of the "Microfilm Copy of the Army Serial Number File, 1938–1946."
That is, until 2002. In that year, staff took another look at the languishing project, primarily because of the newly developed Access to Archival Databases (AAD) resource. They determined that preparing the records for AAD should proceed in two phases. The first phase involved "merging" the 1,374 files into 12 files, corresponding to the number of computer tape cartridges provided by the Bureau of the Census. The purpose was to reduce the files to a manageable number and allow for an overall evaluation of the scope, content, and quality of the electronic files. This first phase was completed in May 2002 and resulted in the series "Electronic Army Serial Number Raw Files, 1994–2002," which contains 23,446,462 records.
The objective of phase two was to produce a single data file with one "best guess" record for each serial number so that it could be made available through AAD. First, the 12 files were merged again into a single file. A NARA programmer then wrote a computer program to "collapse" the multiple FOSDIC reads of each punch card image into a single "best guess" record. In collapsing the multiple records, we were able to fold only the data from FOSDIC's second read of the punch card into the first read. FOSDIC may have correctly interpreted a given character on the third or a later read, but we were unable to apply a more complicated algorithm that might have produced a better "guess" than what appears in the resulting file. We have therefore retained the Electronic Army Serial Number Raw Files, should researchers wish to reprocess the raw data and create a better "best guess" file.
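A minimal Python sketch of the two-read collapse follows, assuming each card's reads are fixed-width strings sharing one column layout. It mirrors only what is described above (the second read folded into the first, later reads ignored) and is not the NARA program itself. How columns still unread after the second pass were marked is not documented here, so treating a space like a period (keep the first read's character) is an assumption.

```python
# Sketch of the two-read "collapse": only the second FOSDIC read is folded into the
# first; third and later reads are ignored, as described for NARA's merged file.
# Assumption (not from NARA documentation): in the second read, "." marks a column
# already read and a space marks a column still unread, so neither overrides read 1.

KEEP_FIRST = {".", " "}


def collapse_reads(reads):
    """Return a single best-guess record built from the first two reads of one card."""
    if not reads:
        return ""
    first = reads[0]
    if len(reads) == 1:
        return first
    second = reads[1]
    width = max(len(first), len(second))
    first, second = first.ljust(width), second.ljust(width)
    # Take the second read's character wherever it supplies a new interpretation.
    return "".join(f if s in KEEP_FIRST else s for f, s in zip(first, second))


if __name__ == "__main__":
    reads = ["3212 45 SMITH", "....3........", "....9........"]
    print(collapse_reads(reads))  # the third read is ignored
```

Because the raw files were retained, a researcher could replace the body of `collapse_reads` with something more ambitious, such as a per-column vote across all reads; whether that actually yields a better guess would need checking against the microfilm.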
The program also appended the associated box and roll data to the end of each cleaned-up record. The records with the value "ONE OR MORE CARDS WERE UNREADABLE AT THIS LOCATION" are retained in the file in their original positions. The resulting file, known as the "World War II Enlistment Records: Electronic Army Serial Number Merged File, 2002," has a total of 9,200,232 "best guess" records, including 160,390 records marking punch cards that FOSDIC was unable to interpret. It is this file that NARA makes available in AAD.
Army Enlistment Records File Characteristics and AAD
It is important for users of the AAD file to understand how far removed the enlistment records are from the microfilm images of the original computer punch cards. Each successive processing stage invariably introduced the chance of errors.
As with most archival records now used for genealogical research, the records were originally created for a very different purpose than identifying specific individuals. The enlistment cards were designed to capture, at the time of entrance into service, basic characteristics of each enlistee in the Army, Enlisted Reserve Corps, and Women's Army Auxiliary Corps. The Adjutant General's Office used the punch cards to prepare tables analyzing the occurrence of the various characteristics among individuals, enlisted or inducted, and to provide information for demobilization policies. Because the original intent of the program was to prepare statistical tables, less attention may have been paid to the proper spelling of names and accurate keypunching of personal data fields.
Most important, the many migrations of these records (from original recording on punch cards, to copying onto microfilm, to FOSDIC processing, to "merging" and "collapsing") mean that errors could have been introduced at any stage. The poor quality of the original microfilm caused most of the errors. To gauge the level of error in the resulting file, NARA staff compared a random sample of the World War II Enlistment Records with the microfilmed punch cards. Of the sample records examined, 35 percent had at least one scanning error. However, only 4.7 percent of the records had any character error in the name column, and only 1.3 percent had character errors in the serial number column. While many records had other errors, those errors were minor. For example, the term of enlistment column frequently has the value "0" in the electronic file where no punch appears on the original card. Users can correct other errors intuitively, such as reading "POT" or "PVO" as PVT in the grade column. To help users with these problems, NARA staff described some of the common errors in a set of Frequently Asked Questions for AAD.
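Researchers who download results can apply the same kind of intuitive correction in a script. The Python sketch below maps a misread grade to the closest entry in a list of known abbreviations using the standard library's `difflib.get_close_matches`. The four grades listed are an illustrative subset chosen for this example, not the file's own code list, and the 0.6 similarity cutoff is an assumption; any automatic match should still be checked by eye.

```python
from difflib import get_close_matches

# Illustrative subset of Army grade abbreviations (an assumption for this sketch,
# not the enlistment file's own code list).
KNOWN_GRADES = ["PVT", "PFC", "CPL", "SGT"]


def normalize_grade(raw, known=KNOWN_GRADES, cutoff=0.6):
    """Map a possibly misread grade to the closest known grade, or return it unchanged."""
    value = raw.strip().upper()
    if value in known:
        return value
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else value


if __name__ == "__main__":
    for raw in ["POT", "PVO", "CPL", "XYZ"]:
        print(raw, "->", normalize_grade(raw))
```

Returning the value unchanged when nothing clears the cutoff keeps unrecognized entries visible for manual review instead of forcing a guess.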
The bulk of the records are for the period 1941 through 1946. About 4 percent of the records contain data originally recorded on Enlisted Reserve Corps Statistical cards, and most of those records are from 1942 and 1943.

Number of Enlistment Records by Year

| Year | Number of Enlistment Cards |
|---|---|
| Other Years or Miscoded | 41,756 |
In general, the records contain the serial number, name, state and county of residence, place of enlistment, date of enlistment, grade, Army branch, term of enlistment, longevity, nativity (place of birth), year of birth, race, education, civilian occupation, marital status, height and weight (before 1943), military occupational specialty (1945 and later), and component of the Army. As noted earlier, at the end of each "best guess" record appear the box and roll number of the microfilmed punch cards.
To facilitate search and retrieval in AAD, the file is split into two tables: a large table containing general Army enlistment records, including enlistees in the Women's Army Auxiliary Corps, and a second with records of enlistees in the Enlisted Reserve Corps. Over time the enlistment card format changed, so that height and weight on earlier cards and military occupational specialty on later cards were recorded in the same columns of the original punch cards. Because there is no easy way to tell which of the two forms a given card used, NARA chose to drop those data from the AAD version of the file.
Finding Records in AAD
Users can search and retrieve the enlistment records through the Access to Archival Databases (AAD) resource. Before using AAD, we recommend reading the "Getting Started Guide" on the AAD home page. The Frequently Asked Questions developed especially for the World War II Army Enlistment Records File also provide a number of helpful tips about the technical characteristics of various fields.
From the AAD home page, the user can execute a search across all series in AAD by entering a name or other search term in the "Search AAD" box. Results will be returned from the enlistment records file and from any other AAD series that contain matches. Alternatively, the user may go straight to the enlistment records by using the link under "Most Popular" or by choosing the categories for Military Personnel, World War II, or 1940–1955. The user next clicks on "search" to access either the Enlistment Records or the Reserve Corps Records. This brings up a page where the user may search these records.
Using an individual's Army serial number may be the most efficient way to find a record. Type the serial number in the search box without hyphens, submit the search, and a summary of the record with that serial number will appear. Clicking the icon in the column titled "View Record" will display the full record, which will contain meanings for the coded data. To print a copy of any record, click "Print" at the top of the screen, and this will display the full record again in a format suitable for printing.
A common way to search for individual records is by name. Users should note that searches are not case sensitive, even though entries in the file are uppercase. In making the records available in AAD, staff inserted "#" for blanks that would normally appear between the last and first names and in some other places. The name column includes all possible parts of a name: surname, space, first name, space, middle initial, and SR, JR, 3rd, etc. Names with "Mac," "Mc," "de," "Van," etc., have a space between the prefix and the rest of the surname when both the prefix and the following letter are capitalized. For example, McAffee was recorded as MC AFFEE, but Mcaffee was recorded as MCAFFEE. Names with apostrophes, like O'Brien, usually have no space between the prefix and the rest of the name (OBRIEN). Van Heusen is recorded as VAN HEUSEN. When the full name was longer than the name column, as much of the surname as possible was entered, and initials were used for the first name. AAD also supports wildcards in searches, so users can find records even when unsure of a name's spelling or format.
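Because searches are not case sensitive, the main thing to predict is spacing and punctuation. The Python sketch below applies the surname conventions just described. The prefix list is an assumption drawn from the examples given, and because apostrophe handling is described as usual rather than universal, the output is a likely form to search for, not a guaranteed match. It does not model the "#" substitution or the truncation of long names.

```python
# Sketch of the surname conventions described above. The prefix list is an assumption
# built from the article's examples ("Mac," "Mc," "de," "Van"); it is not exhaustive.
PREFIXES = ("Mac", "Mc", "De", "Van")


def recorded_surname(surname):
    """Return the surname as it would likely appear in the enlistment file's name column."""
    name = surname.replace("'", "")  # apostrophes are usually dropped: O'Brien -> OBRIEN
    for prefix in PREFIXES:
        rest = name[len(prefix):]
        # A capital letter right after a capitalized prefix means a space was keyed.
        if name.startswith(prefix) and rest[:1].isupper():
            name = prefix + " " + rest
            break
    return name.upper()  # entries in the file are uppercase


if __name__ == "__main__":
    for surname in ["McAffee", "Mcaffee", "O'Brien", "Van Heusen"]:
        print(surname, "->", recorded_surname(surname))
```

If a search on the predicted form fails, trying the other spacing (with and without the space after the prefix) covers the case where the clerk's capitalization differed from the family's.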
For example, to find my grandfather's record, I entered "James N Tronolone" into the name search box. Alternatively, I could have simply entered "Tronolone" and selected his record from among the 23 records for persons with that last name in the enlistment table. If the user is searching for a common name, the name can be combined with other fields, such as state or state and county, to narrow the search for an individual record. Users will often use the information retrieved in the AAD search, such as the serial number when not otherwise known, to request further information about their relative from the National Personnel Records Center.
Because this file was originally designed for computer processing, data fields such as the state and county of residence, place of enlistment, civilian occupation, and marital status were represented by numeric codes rather than being spelled out. These codes allowed for the uniform recording of repetitive data in a keypunch operation and for the efficient sorting and tabulation of the computer punch cards. AAD reinterprets the coded fields "in English" so that users can understand the information. The full record also links to notes on specific fields that more fully explain the meanings of codes.
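The decoding that AAD performs can be pictured as a lookup from code to meaning. In the Python sketch below, only Pennsylvania (32) and Centre County (027) come from this article; any other entries would have to be taken from AAD's code lists. Keying county codes by the pair (state code, county code) is an assumption, suggested by the search procedure in which the state code is selected before the county code. Codes are kept as strings so that leading zeros, as in 027, survive.

```python
# Sketch of turning coded residence fields back into words. Only the two codes shown
# come from the article (Pennsylvania = 32, Centre County = 027); a working table
# would be filled in from AAD's own code lists.
STATE_CODES = {"32": "PENNSYLVANIA"}
# County codes are keyed by (state code, county code), on the assumption that a county
# code is only meaningful once the state is known.
COUNTY_CODES = {("32", "027"): "CENTRE"}


def decode_residence(state_code, county_code):
    """Return (state, county) names, flagging any code missing from the tables."""
    state = STATE_CODES.get(state_code, f"UNKNOWN STATE CODE {state_code}")
    county = COUNTY_CODES.get(
        (state_code, county_code), f"UNKNOWN COUNTY CODE {county_code}"
    )
    return state, county


if __name__ == "__main__":
    print(decode_residence("32", "027"))
```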
Another common search strategy is to find records of individuals who enlisted at a specific place or came from a specific county. This requires searching AAD using one or more coded fields. The fields Residence: State; Residence: County; and Place of Enlistment are options on the main database search screen. To search these fields, click on the "Select from Code List" link to bring up a window with a list of all the coded values. Select a value, and then click the "Submit" button. This will paste the code into the search box, and then the user can execute the search.
For example, to get a list of enlistees from Centre County, Pennsylvania, first select the primary code for Pennsylvania (code 32). Then select the appropriate county code (Centre County's code is 027). Once these codes are pasted into the search boxes and the search is submitted, AAD will return 3,170 records. AAD displays all of the search results, but because this number exceeds the download limit of 1,000 records, none of them can be downloaded for additional processing. To get a complete list, a user could execute multiple smaller queries, such as a series of searches by year of enlistment. Records retrieved in this way can be downloaded to the user's computer as a plain-text (ASCII) file of comma-separated values, with or without the code meanings. The file can then be imported directly into spreadsheet software, such as Microsoft Excel, for further manipulation.
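Once the year-by-year downloads are saved, they can be combined before opening them in a spreadsheet. The Python sketch below uses only the standard `csv` module. It makes no assumption about AAD's column names beyond requiring every download to share the same header row, and it drops rows that are identical in every column, on the assumption that such rows came from overlapping queries. The script and file names in the usage comment are placeholders.

```python
import csv
import sys


def merge_downloads(handles):
    """Combine several AAD CSV downloads into one list of rows, skipping exact duplicates.

    Each handle may be an open file or any iterable of CSV lines. All downloads are
    expected to share one header row (export them with the same options).
    """
    fieldnames = None
    seen = set()
    merged = []
    for handle in handles:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None:
            continue  # empty download
        if fieldnames is None:
            fieldnames = reader.fieldnames
        elif reader.fieldnames != fieldnames:
            raise ValueError("downloads have different columns")
        for row in reader:
            fingerprint = tuple(row[name] for name in fieldnames)
            if fingerprint not in seen:  # overlapping queries can return the same record
                seen.add(fingerprint)
                merged.append(row)
    return fieldnames, merged


def write_merged(path, fieldnames, rows):
    """Write the merged rows to a single CSV file for a spreadsheet program."""
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python merge_aad.py merged.csv centre_1942.csv centre_1943.csv ...
    out_path, *inputs = sys.argv[1:]
    handles = [open(p, newline="", encoding="utf-8") for p in inputs]
    try:
        fields, rows = merge_downloads(handles)
    finally:
        for h in handles:
            h.close()
    write_merged(out_path, fields, rows)
    print(f"wrote {len(rows)} rows to {out_path}")
```

Splitting by year keeps each query under the 1,000-record limit only if no single year exceeds it; a year that does would need a further split, for example by one of the other search fields.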
The story of the World War II Army Enlistment Records File is unique, but it illustrates the lengths to which NARA will go to provide researchers with ready access to the documentary heritage of the United States Government.
As NARA develops its Electronic Records Archives, AAD will continue to be an integral part of that program and will grow to provide access to the expanding number and variety of electronic records being deposited in the National Archives.
Theodore J. Hull is an archivist in the Electronic and Special Media Records Services Division of the National Archives and Records Administration, College Park, Maryland. His primary responsibility is the archival processing of NARA's electronic records holdings of the Bureau of the Census.