Editor’s Note: This article first appeared in the July 2015 Library of Virginia Newsletter.
One of the Library of Virginia’s newest online collections was recently hacked, and we could not be more excited. The Kaine Email Project has caught the attention of a group of civic hackers called Code for Hampton Roads. As the local chapter of the Code for America Brigade, Code for Hampton Roads provides opportunities for people to marry technological skills with a desire to foster open government and improve communities through open-source web solutions. The group’s recent projects include web apps for finding local restaurants’ health inspection results and for searching all of Virginia’s civil court records from a single search page.
In the case of the Kaine Email Project, on 6 June 2015, hackers got a chance to tackle this massive data set (currently composed of more than 130,000 processed records) as part of the third annual National Day of Civic Hacking. The hackers’ goal was to devise new entry points for researching the collection, such as visualizations of topic frequency in Kaine administration email discussions or maps showing which correspondents interacted with each other the most. An immediate output of the hack-a-thon was a “word cloud” of the most common terms used in the set of emails currently available for public viewing.
Governor Kaine attending launch of the Virginia Higher Education Wizard, Virginia State Police Headquarters, Richmond, 11 March 2009
Office of the Governor (Kaine: 2006-2010), State Records Collection, Library of Virginia
A word-cloud generator creates a free-form collage of words from a piece of text or from text data, giving greater prominence to the words that appear most often. In this case, “budget,” “governor,” “meeting,” and “update” were among the most prominent terms. The hackers also began developing network maps to show communication channels within the administration.
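For readers curious how a word cloud arrives at its sizes, here is a minimal sketch in Python. It is not the hackers' actual code: the function names, the short stop-word list, and the font-size range are all assumptions chosen for illustration. It counts how often each word appears and then scales each word's size between a minimum and a maximum according to that count.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "this"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count how often each word appears across a collection of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


def font_sizes(counts, top_n=50, min_size=10, max_size=72):
    """Map the most common words to font sizes, scaled linearly by frequency."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    # Avoid dividing by zero when every word appears equally often.
    spread = max(highest - lowest, 1)
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / spread
        for word, count in top
    }


if __name__ == "__main__":
    # Made-up sample text, not taken from the Kaine collection.
    sample = [
        "Budget meeting update",
        "The budget update for the governor",
        "Budget",
    ]
    for word, size in font_sizes(word_frequencies(sample)).items():
        print(f"{word}: {size:.0f}pt")
```

A drawing library would then place each word on the page at its computed size; that layout step is left out of this sketch.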
@StanZheng explaining his work on the Governor's Emails Project #NDoCH2015 #Code4HR, 6 June 2015
Photo from Code for Hampton Roads Twitter Feed
They hope to refine their preliminary maps to show communication channels over time, as well as to highlight communication networks among people within the Kaine administration and with other government agencies or media outlets.
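The raw material for a network map of this kind is a tally of how often each pair of correspondents exchanged messages. The Python sketch below shows one way such a tally could be built. It assumes each email has already been reduced to a record with hypothetical `sender` and `recipients` fields; the field names and sample addresses are invented for illustration and do not reflect the format of the Library's data set or the hackers' own approach.

```python
from collections import Counter


def interaction_counts(messages):
    """Count messages exchanged between each pair of correspondents.

    Each message is assumed to be a dict with a 'sender' string and a
    'recipients' list (hypothetical field names). Pairs are undirected,
    so a message from A to B and one from B to A count toward the same pair.
    """
    pairs = Counter()
    for msg in messages:
        sender = msg["sender"].strip().lower()
        for recipient in msg["recipients"]:
            recipient = recipient.strip().lower()
            if recipient != sender:  # ignore messages people send to themselves
                pairs[tuple(sorted((sender, recipient)))] += 1
    return pairs


if __name__ == "__main__":
    # Made-up sample records using placeholder addresses.
    sample = [
        {"sender": "alice@example.org", "recipients": ["bob@example.org"]},
        {"sender": "bob@example.org", "recipients": ["alice@example.org", "carol@example.org"]},
        {"sender": "Alice@example.org", "recipients": ["bob@example.org"]},
    ]
    for (first, second), total in interaction_counts(sample).most_common():
        print(f"{first} <-> {second}: {total}")
```

Each pair becomes a line between two points on the map, and the count can set the line's thickness. Grouping the messages by month or year before counting would give the view of communication channels over time.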
As a bonus, one of the participants in the hack-a-thon was able to compress the Kaine email data set provided by the Library of Virginia, thus reducing its size by nearly 70 gigabytes and making it much easier to distribute to future hackers. We hope that this is only the beginning of the public’s innovative engagement with the Kaine Email Project and look forward to sharing more results as the hackers continue their work.
The Library of Virginia’s Kaine Email Project makes the email records from the administration of Governor Timothy M. Kaine, Virginia’s 70th governor (2006–2010), accessible online. Users can search and view email records from the Governor’s Office and his cabinet secretaries; learn about other public records from the Kaine Administration; go behind the scenes to see how the Library of Virginia made the email records available; and read what others are saying about the collection. This project would not have been possible without funding provided by Congress for the Library Services and Technology Act (LSTA).