About the National Archives

Remarks of Archivist of the United States David S. Ferriero at the joint annual meeting of the Society of American Archivists, the Council of State Archivists, and the National Association of Government Archives and Records Administrators. Washington, D.C.

August 12, 2010


Thank you.

It is a special honor to be addressing the largest single gathering of archivists and records administrators in the United States. I would like to say a special thank you to the presidents and executive directors of your three organizations, Pat Michaelis, Tracy Berezansky, Peter Gottlieb, Nancy Beaumont, Steve Grandin, and Vicki Walch, for their efforts to welcome me to my new job. Their honesty, passion, and guidance have been invaluable to me. I also want to congratulate all of the members of these organizations who have worked to make this meeting a success.

I have been Archivist for about eight months now, and I am still having a good time, despite being called before Congress six times and having at least two more in the wings. And I must say being in the line of fire as the members of Congress take shots at one another is a new experience for me!

Last year at this conference, we celebrated the 75th anniversary of the National Archives, and this year we celebrate the 75th anniversary of the Society of American Archivists. We have come a long way over those 75 years. Yet in some ways, we are facing many of the same challenges.

The first head of the National Archives was Robert Digges Wimberly Connor. Dr. Connor was a native of Wilson, North Carolina, a graduate of, and history faculty member at, the University of North Carolina.

One of Dr. Connor’s first acts was to commission a study on the state of the government’s records. Describing the situation from the perspective of the researcher, he wrote:

...conditions make it impossible for officials to find adequate room for both their files and their staffs. Few facilities can be furnished the student and his presence is tolerated rather than encouraged by staffs already sufficiently burdened with the routine duties of the day. He finds the records he desires to use scattered throughout the country, stored wherever space can be found for them, in cellars and sub-cellars, under terraces and over boiler-rooms, in attics and corridors, piled in dumps on floors and packed into alcoves, abandoned carbarns, storage warehouses, deserted theaters, or ancient but more humble edifices that should long ago have served their last useful purpose. Typical is the case of valuable records relating to Indian Affairs which were found in a depository in Washington piled on dust-covered shelves mingled higgledy-piggledy with empty whisky bottles and with rags and other highly inflammable trash. In another Washington depository packed with documents the most prominent object which meets the eye as one enters the room is the skull of a cat protruding from under a pile of valuable records. I think it is a fair question that if a cat with nine lives to risk in the cause of history could not survive the conditions of research in the depositories of government records, can we justly blame the poor scholar who has only one life to give for his country if he refuses to take the risk?

Connor and his colleagues created systems for records management, accessioning, processing, preserving, and encouraging use. They cleaned out the whiskey bottles and put out the cat. They brought order to the chaos, just as the storm of records grew into a tempest.

As early as December 1935, records began to arrive at the Archives building, and by June 1941, over 300,000 cubic feet were on file. Eight years later, the space in the building was exhausted, and by the mid-1940s, one million cubic feet of Federal records were being created each year. "It is almost inconceivable," U.S. Archivist Wayne Grover observed in 1953, "that the federal government, in the 22 years from 1930 to 1952, should have created more than seven times as many records as it did during its previous 155 years of history."

* * *

Before my confirmation, I read most of R. D. W. Connor’s speeches, and I find myself taking guidance from his thoughts as I face a task similar to his. The state of electronic records in the federal government today is much like the state of paper records faced by Archivist Connor. We may not have whisky bottles and dead cats, but we do have Trojan horses and viruses, and a volume of records that strains the imagination. The Bush White House alone deposited over 77 terabytes of electronic records.

Like Archivist Connor, we are faced with the challenge of not knowing the volume of electronic records in the federal government and not knowing the conditions in which they are stored.

Our first survey of how federal agencies are managing electronic records was not reassuring. Eighty percent of the agencies reported that their electronic records were at moderate or severe risk. We are at risk of having a 20-year gap in our history because we failed to properly manage and preserve the electronic records that document that period.

Almost two decades ago, Richard Cox, in his book Archivists, Archival Institutions, and Electronic Records, argued that technological innovation was changing the way information was created and used within organizations. He suggested that archivists needed to change the way they do business in order to respond adequately to those organizational changes. A decade later, he extended this idea with an examination of how management programs to reinvent or reengineer organizations posed further challenges to archivists.

We must change the way we do business if we are going to be able to manage electronic records in the federal government.

Our survey of agencies' vulnerabilities was the beginning of changing the way we work with agencies to help them manage their electronic records. We are reaching out to agencies to help them assess their weaknesses and to find ways to confront them. One step in that process is getting the agency chief information officers talking to the records managers. I was surprised to learn that the CIO Council and the Records Management Council had never met together. I will be co-hosting the first joint meeting of these two groups next month at the National Archives with the CIO Council chair, Vivek Kundra, the nation’s Chief Information Officer.

We also need to turn our attention inward. The National Archives should lead by example in managing electronic records. I plan to see that we do that.

* * *

The National Archives has been in the electronic records business for a long time. We started preserving electronic records in 1968. National Archives staff like Tom Brown and Peggy Adams are frequently cited in the professional literature for their leadership. Our attention then was focused on what we used to call machine-readable data: the statistical data sets, and later databases, most often produced by the federal statistical system. We were in the forefront on those complicated data sets, but we got left behind on other kinds of electronic records.

Two laws govern the National Archives: the Federal Records Act and the Presidential Records Act. Through action by the courts, the Presidential Records Act recognized electronic office records as records around 1996 — we have all of the emails from the Reagan White House. However, the Federal Records Act has yet to change. For federal records, we still live in a world of print-and-file. The first bill to address electronic records in the Federal Records Act is still languishing in Congress.

Perhaps the largest undertaking in the history of the National Archives is the creation of the Electronic Records Archives—or ERA. Established in partnership with the private sector and developed using the best available research from around the world, ERA is intended to preserve and provide access to any type of electronic record created by a federal agency. The ERA got off to a rocky start, but we have stayed the course and progress has been steady.

ERA as a functioning system is within sight. The system will be complete by the end of 2012. The ERA, however, is not the end of the road. It will provide the Archives with a system to manage all existing record schedules, to create and track new schedules for paper and electronic records, as well as a storage and retrieval system for federal electronic records once they are deposited with us. It will not provide agencies with a way to manage their electronic records internally, and we need to help them figure that out.

The fundamental nature of information has changed, yet the ERA was developed within the same framework as that developed by R. D. W. Connor and his colleagues. Confronted by limited space and an overwhelming volume of records in desperate need of preservation, Connor and his staff devised a system that saves only three to five percent of the information created, a system that identifies records hierarchically: papers of officials within agencies within departments.

Contrast that, however, with the recent acquisition of the Twitter Archives by the Library of Congress. They are preserving all of the tweets ever created. Why keep all of those tweets? It is not because there is value in every tweet, but rather because in the aggregate it will tell us something about our culture. The Library readily acknowledges that they do not have today the tools to analyze those tweets, but they are confident that those tools will be developed.

We are fortunate to have the Presidential Records Act that forces us to keep all presidential records. As a result, we now have over 200 million emails from the most recent Bush Administration. The volume and complexity of Presidential electronic records provide a valuable test bed for developing new methods of identifying and retrieving those records. I can guarantee you that we are going to have to find a better way to deal with those emails than print and read.

I believe that when the records of the Obama administration are compiled (the official documents, the emails, the blogs, the tweets, and the Facebook walls), we will have the corpus of information to allow future researchers to examine what it was like to be within that administration. Like the tweets at the Library of Congress, we do not have the tools to do that analysis today, but we will preserve the raw materials for future researchers.

One of the less well known by-products of the ERA project has been our involvement in research on tools for managing and reading electronic records. We do not have a research and development budget; however, from the beginning we have used funding from the ERA account and invested those monies in coordinated research with leading research agencies through the Federal Networking and Information Technology Research and Development Program.

This has led to collaboration with some of the most innovative research being conducted at places like the Army Research Laboratory in Maryland, the National Center for Supercomputing Applications at the University of Illinois, and more recently the School of Information and Library Science at the University of North Carolina and Chapel Hill’s Renaissance Computing Institute, which draws on the talents of seven universities and institutions in North Carolina including Duke, UNC, and NC State.

SAA recognized the excellence of the DICE group at UNC in 2008 with the presentation of the J. Franklin Jameson Archival Advocacy Award. Without the involvement of the National Archives, federal information technology research would not be focusing on tools that are useful to archivists.

Last year, we formalized our research activities by creating a new unit called the National Archives Center for Advanced Systems and Technologies, or "NCAST". Ken Thibodeau is directing that group, and they will focus on how to move cutting-edge research in computer technology and applied linguistics into tools that can be used by archivists.

Some of those tools are available for you to use today. The National Archives of Great Britain developed DROID, software that will tell you what software created a particular file. NCAST and the Army Research Center are developing novel methods that today identify over 70 file types not recognized by DROID. These results and developments will ultimately be incorporated into DROID.

* * *

DROID, however, just shows us how to unlock the box. It doesn’t tell us anything about what is inside the box. In a separate project with Georgia Tech, we are developing software that will identify what type of record we have opened. Is it a letter, or a memo? Is this a document associated with a nomination, or is it a commendation? This is the first step towards an intelligent system that can look into those 200 million emails and weed out the lunch dates and give us the ones that deal with policy.
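As a rough illustration of the "unlock the box" step described above, here is a minimal sketch in Python of identifying a file's format from its leading bytes. It is not how the identification tool named above is implemented: the function names `identify_format` and `identify_file` and the short `SIGNATURES` table are hypothetical, and the table holds only a handful of widely documented byte signatures, where a production tool would draw on a much larger registry.

```python
# Hypothetical, deliberately tiny signature table for illustration only.
# Each entry pairs the bytes found at the start of a file with a label.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX, XLSX, ODT)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
]


def identify_format(data: bytes) -> str:
    """Return a format label for the given leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))
```

A sketch like this only says which kind of box has been opened. Deciding whether the contents are a letter, a memo, or a lunch invitation is the separate classification problem the paragraph above describes.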

Last month, the National Archives delivered to the Senate over 170,000 pages of records from the Clinton Library for the Elena Kagan confirmation hearings. We did this in record time, but we did it with brute force: 16 archivists, 6 archival technicians, and a supervisory archivist put in over 6,000 hours on the job, working every Saturday, Sunday, and Memorial Day. I hope that some of these tools will be ready for the next nomination.

Another tool available to you today is iRODS, which stands for "Integrated Rule-Oriented Data System". iRODS is another example of how our research collaboration is serving the broader archival community. This year, at the iRODS User’s Group Meeting, a group of researchers presented initial results from the Distributed Custodial Archival Preservation Environments (DCAPE) project, which is funded by the National Historical Publications and Records Commission. DCAPE develops a framework to support institution-specific preservation policies while providing the economy of scale needed for cost-effective services. DCAPE relies on iRODS to implement those institution-specific rules.

Electronic records are just one of the challenges facing the National Archives. When I arrived last winter, I found an agency in need of a culture change to be able to exist and thrive in the digital age. The National Archives needs to be more nimble. It needs to take risks in trying new approaches to recordkeeping. And it needs to make smarter and more creative use of technology.

The President’s Open Government Initiative has given us the encouragement and the authority to make these changes in the Archives’ culture while pursuing the President’s goals of more transparency, more collaboration, and more public participation in government.

The goal of the President’s initiative is to transform the relationship between the government and the people. But it has other aims, too, as the President said last year: "Openness will strengthen our democracy and promote efficiency and effectiveness in Government."

What the President wants with his Open Government Initiative is what we at the Archives also want, as expressed in our mission statement and strategic goals. Open government is the essence of the work we do every day. It’s rooted in the belief that citizens have the right to examine and learn from the records of their government. This will equip them to make informed decisions in the future.

* * *

Ready access to the records of our nation has always been what the National Archives is all about. But we must open up more—and we will—to provide access to ever-more-complex records in a digital world. And we have made our intentions clear in our own Open Government Plan.

Some of you have criticized my use of the term citizen archivist. I think you may not understand my intent. I have worked in research libraries my whole life, and one of the constants across all of those institutions, including the National Archives, is that we let researchers walk out with valuable information about our collections without leaving any of it behind. I have seen firsthand the benefits of tapping those resources.

The National Archives needs to capture the knowledge created by our researchers so that it is readily accessible for the next generation of researchers. To do that, we have to make it easy for researchers to deposit their knowledge, and easy for the next researcher to find it.

We have started a conversation with our constituents through our public blogs and public meetings. It is clear from those venues that our researchers are eager to contribute. It is now incumbent upon us to create the mechanisms to make that contribution happen.

Internally, we are using Web 2.0 tools to foster discussions. As you may have heard, we have been instructed by OMB to prepare a 2012 budget that is ten percent below our 2011 budget, and then to prepare a second one that is fifteen percent below 2011. In July, we set up a forum in IdeaScale to allow the staff at the National Archives to participate in the discussions of how we should achieve those savings.

I have also created a Transformation Task Force charged with identifying the key issues, barriers, and opportunities for organizational effectiveness facing the National Archives today, and with recommending strategies that position us to flourish and lead in the future.

Physics has taught us that what once appeared chaotic is often quite orderly when viewed through the appropriate lens. We are fortunate to live in a time when grinding that new lens is called for. It is difficult work, and it takes the talents of a community of scholars to make it happen. I welcome each and every one of you to join us in these dialogues through the Web, or through the medium of your choice, to chart this new course.