Congressional Web Harvest
As the repository of the official records of Congress, the Center for Legislative Archives conducts a web harvest of all congressional websites at the end of each Congress.
Web harvests began with the 109th Congress in 2006, and the collection is available online. These snapshots of Congress' websites capture the evolution of the web as a medium for Congress to communicate with the public.
The most recent harvest, conducted in December 2012, visited 287,000 hosts, processed nearly 190 million URIs, and captured close to eight terabytes of data. Its scope is broader than that of previous harvests: it captures not only content hosted on Member and committee websites but also content posted to a number of social media sites used by Congress.