The Library of Congress is having some trouble transitioning to the era of fast-paced technological innovation and Big Data, and it is a perfect example of where libraries generally find themselves.
In the spring of 2010, the Library of Congress announced it was taking a big stride toward preserving the nation’s increasingly digital heritage — by acquiring Twitter’s entire archive of tweets and planning to make it all available to researchers. But more than five years later, the project is in limbo. The library is still grappling with how to manage an archive that amounts to something like half a trillion tweets. And the researchers are still waiting.
The archive’s fate is yet another example of the difficulty of safeguarding the historical records of an era when people communicate using easily deletable emails, websites that can be taken down in seconds and transient tweets, Vines and Snaps. But the library’s critics also see it as a cautionary tale from the 28-year tenure of retiring Librarian of Congress James Billington.
During Billington’s time in office, say critics, the library has espoused grand technological ambitions but didn’t back them up with the planning, budget or nuts-and-bolts needed to turn them from buzzy news releases to tangible accomplishments. It has also repeatedly faced criticism for its management of the U.S. Copyright Office, which has been drawn into numerous controversies on issues involving software, cellphones and online music streaming.
This isn't a unique story. It's a very public example, but most libraries are going through something similar on a much smaller scale.
But in response to a deeply critical Government Accountability Office report in March about the library’s tech shortcomings, Billington said he was taking steps to “fully realize the possibilities of the digital era,” including plans to hire a chief information officer by September; the library has not had a permanent CIO since 2012.
And the vacant CIO post is a great example of the kind of expertise libraries need on staff to keep pace with technological advances.
Osterberg said the library is still making progress on the tweet archive — officially known as the Twitter Research Access project. “The Library has been working to index the collection and develop use policies,” while having to balance “the size and dynamic nature of the Twitter platform” and “the resource realities of a public institution,” she said.
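And that last sentence, with its balance between the size of the platform and the resource realities of a public institution, sums up the issue to a T.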
While some have accused the library of wasting time and money on preserving social media ephemera, the institution has argued that the huge stash of tweets could provide future generations an invaluable real-time record of how humans in the 21st century communicate.
“Archiving and preserving outlets such as Twitter will enable future researchers access to a fuller picture of today’s cultural norms, dialogue, trends and events to inform scholarship, the legislative process, new works of authorship, education and other purposes,” Osterberg wrote in a short 2013 white paper that was the last major update on the state of the tweet collection.
This is, again, where we have to ask ourselves whether we have too much collective knowledge and what is truly useful to archive and preserve. A recent IBM Watson discussion mostly convinced me that the goal is not less data but better ways to organize and retrieve it. It's futile to reject Big Data. And libraries and librarians need to understand the skills necessary to provide relevant services in the Big Data age. Coding camp, anyone?