National Archives News

National Archives Releases Catalog, 1940 Census Datasets

By Victoria Macchi | National Archives News


1940 Census population schedule for Alaska's First Judicial Division, enumeration district 1-1.

WASHINGTON, April 19, 2021 — The public is now able to download full datasets of the National Archives Catalog archival descriptions and authority records, as well as the entirety of the 1940 census, for the first time. This free service will provide researchers access through the Amazon Web Services (AWS) Registry of Open Data. 

Until now, this data was available through the Catalog and the 1940 census websites, but not in bulk. This release aligns with the National Archives’ effort to Make Access Happen for the records in its care. This is the first time the National Archives is releasing a census dataset in full.

“By publishing these datasets to the Registry of Open Data, we are unlocking them in ways the Catalog currently can't,” said Jason Clingerman, Director of the National Archives Digital Engagement Division.

“For the first time, users can access the data for the National Archives Catalog and the 1940 Census in bulk,” Clingerman said. “By publishing to the Registry, we're also supporting the ability for users to analyze the datasets using Amazon's suite of tools. This opens up the National Archives’ data to innovative research approaches.”

These datasets will be of particular interest to universities, private industry, other government agencies, and demographic researchers, Clingerman said.

The Catalog dataset contains 225 gigabytes of data, including archival descriptions at the record group/collection, series, file unit, and item levels, as well as the URLs for more than 127 million digital copies and data from citizen archivist contributions.

The National Archives intends to update the dataset on the Registry of Open Data regularly. 

The 1940 census dataset contains the entire digitized 1940 census, 15 terabytes of data in all: the metadata index and 3.7 million images of the population schedules, the enumeration district maps, and the enumeration district descriptions.

The tools available through AWS will allow researchers to review, for example, specific sections of the census, like records of one state or county. Previously, that task would have required reviewing individual images in the Catalog or using technical knowledge to query and download the data and images, with limits on how much data could be queried at once. 
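
As a rough illustration of the kind of scoped query described above, the following Python sketch builds a request that lists only the objects under one prefix of a public S3 bucket and parses the XML response. The bucket name `example-census-bucket` and the prefix `example-state/` are hypothetical placeholders, not the actual names or layout used by the National Archives; the dataset's Registry of Open Data entry is the place to find those. The sketch also assumes the bucket permits anonymous listing over the S3 REST interface.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket-listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket, prefix, max_keys=1000):
    """Build a ListObjectsV2 URL restricted to keys starting with `prefix`."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from a bucket-listing response."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.findall("s3:Contents", S3_NS):
        key = item.findtext("s3:Key", namespaces=S3_NS)
        size = int(item.findtext("s3:Size", namespaces=S3_NS))
        results.append((key, size))
    return results


def fetch_listing(bucket, prefix):
    """Fetch one page of results anonymously (requires network access)."""
    with urllib.request.urlopen(build_list_url(bucket, prefix)) as resp:
        return parse_listing(resp.read())


if __name__ == "__main__":
    # Hypothetical bucket and prefix; substitute the real values before running.
    print(build_list_url("example-census-bucket", "example-state/"))
```

A single response returns at most 1,000 keys, so a listing for a whole state would need to follow the continuation token that S3 includes in each truncated response; that loop is omitted here for brevity.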

Read more about the dataset release in the AOTUS Blog post.