Wednesday, July 29, 2015

LOC's Twitter Archive In Limbo

In a perfect example of the position libraries generally find themselves in during this era of fast-paced technological innovation and Big Data, the Library of Congress is having some trouble transitioning.

In the spring of 2010, the Library of Congress announced it was taking a big stride toward preserving the nation’s increasingly digital heritage — by acquiring Twitter’s entire archive of tweets and planning to make it all available to researchers. But more than five years later, the project is in limbo. The library is still grappling with how to manage an archive that amounts to something like half a trillion tweets. And the researchers are still waiting.

The archive’s fate is yet another example of the difficulty of safeguarding the historical records of an era when people communicate using easily deletable emails, websites that can be taken down in seconds and transient tweets, Vines and Snaps. But the library’s critics also see it as a cautionary tale from the 28-year tenure of retiring Librarian of Congress James Billington.

During Billington’s time in office, critics say, the library has espoused grand technological ambitions but hasn't backed them up with the planning, budget or nuts-and-bolts work needed to turn them from buzzy news releases into tangible accomplishments. It has also repeatedly faced criticism for its management of the U.S. Copyright Office, which has been drawn into numerous controversies on issues involving software, cellphones and online music streaming.

This isn't a unique story. It's a very public example, but most libraries are going through something similar on a much smaller scale.

But in response to a deeply critical Government Accountability Office report in March about the library’s tech shortcomings, Billington said he was taking steps to “fully realize the possibilities of the digital era,” including plans to hire a chief information officer by September; the library has not had a permanent CIO since 2012.

And this is a great example of the types of expertise that libraries need to stay at the forefront and successfully keep pace with technological advances.

Library of Congress communications director Gayle Osterberg said the library is still making progress on the tweet archive — officially known as the Twitter Research Access project. “The Library has been working to index the collection and develop use policies,” while having to balance “the size and dynamic nature of the Twitter platform” and “the resource realities of a public institution,” she said.

And this sentence sums up the issue to a T.
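Indexing at that scale is a genuine engineering challenge. As a purely illustrative sketch (the Library has not published its architecture, so everything here is hypothetical), search over a tweet collection typically rests on an inverted index that maps each token to the IDs of the tweets containing it:

```python
from collections import defaultdict

def build_index(tweets):
    """Toy inverted index: token -> set of tweet IDs containing it.

    `tweets` maps a tweet ID to its text. At half a trillion tweets the
    index would have to be sharded across many machines, but the core
    structure is the same.
    """
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for token in text.lower().split():
            index[token].add(tweet_id)
    return index

def search(index, term):
    """Return the IDs of tweets containing the (case-insensitive) term."""
    return index.get(term.lower(), set())
```

A researcher's query then becomes a set lookup rather than a scan over the whole archive, which is what makes access at this scale feasible at all.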

While some have accused the library of wasting time and money on preserving social media ephemera, the institution has argued that the huge stash of tweets could provide future generations an invaluable real-time record of how humans in the 21st century communicate.

“Archiving and preserving outlets such as Twitter will enable future researchers access to a fuller picture of today’s cultural norms, dialogue, trends and events to inform scholarship, the legislative process, new works of authorship, education and other purposes,” Osterberg wrote in a short 2013 white paper that was the last major update on the state of the tweet collection.

This is, again, where we have to ask ourselves whether we have too much collective knowledge and what is truly useful to archive and preserve. A recent IBM Watson discussion mostly convinced me that the goal is not less data but better ways to organize and retrieve it. It's futile to reject Big Data, and libraries and librarians need to develop the skills necessary to provide relevant services in the Big Data age. Coding camp, anyone?

Tuesday, July 28, 2015

New Repository For Dark Data

Do you ever wonder what happens to the vast treasure trove of data on which researchers rely for some of their most startling discoveries? Most of it goes "dark" and is never seen again after a research project is over.

The Chronicle of Higher Education is reporting that researchers at the University of North Carolina at Chapel Hill are leading an effort to create a one-stop shop for data sets that would otherwise be lost to the public after the papers they were produced for are published. The goal of the project, called DataBridge, is to expand the life cycle of so-called dark data. It will serve as an archive for data sets and metadata, and will group them into clusters of information to make relevant data easier to find.
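The article doesn't detail how DataBridge forms its clusters, but a minimal sketch of the idea (the data-set names and the threshold here are hypothetical, not DataBridge's actual method) is to group data sets whose descriptive metadata keywords overlap, for instance by Jaccard similarity:

```python
def jaccard(a, b):
    """Overlap between two keyword sets: |A intersect B| / |A union B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_datasets(metadata, threshold=0.3):
    """Greedy one-pass clustering of data sets by metadata-keyword overlap.

    `metadata` maps a data-set name to its set of descriptive keywords.
    A data set joins the first cluster whose accumulated keywords are
    similar enough; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: [keyword union, member names]
    for name, keywords in metadata.items():
        for entry in clusters:
            if jaccard(keywords, entry[0]) >= threshold:
                entry[0] |= keywords
                entry[1].append(name)
                break
        else:
            clusters.append([set(keywords), [name]])
    return [members for _, members in clusters]
```

Related data sets land in the same cluster, so a researcher who finds one can discover the others.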

The hope is that eventually researchers from around the country will submit their data after publishing their findings.

This is a great way to share data that is often very time-consuming and expensive to extract.

The researchers are also interested in including another type of “dark data”: archives of social-media posts. For example, the group has imagined creating algorithms to sort through tweets posted during the Arab Spring, for researchers studying the role of social media in the movement.

And in some cases, the project could serve as a model for libraries at research institutions that are looking to better track data in line with federal requirements.

But as commenters to the article noted, there are issues with reusing data sets. What about authenticity and ownership rights?

To that end, librarians have been involved with the MLA's beta repository, which allows members to post such data sets, as well as blog posts and conference papers, to assign a suitable license to them, and to receive a DOI for them (thereby going some way toward solving the authenticity and ownership issues). It was developed in collaboration with the librarians at Columbia's CDRS, and while only members can deposit their work, anyone can view and download it.

This is a good start for organizing data in a searchable format. With disruptive technologies such as the Internet of Things capable of producing vast data sets, librarians should be at the forefront of sifting through them.

Monday, July 27, 2015

Disruptive Technologies & Libraries

Continuing with the theme of technology, in 2013, McKinsey released a report on the 12 disruptive technologies that have the greatest potential to drive substantial economic impact and disruption by 2025.

Important technologies can come in any field or emerge from any scientific discipline, but they share four characteristics: high rate of technology change, broad potential scope of impact, large economic value that could be affected, and substantial potential for disruptive economic impact. Many technologies have the potential to meet these criteria eventually, but leaders need to focus on technologies with potential impact that is near enough at hand to be meaningfully anticipated and prepared for. Therefore, we focused on technologies that we believe have significant potential to drive economic impact and disruption by 2025. 

The report lists the 12 technologies and charts their projected economic impact; the full exhibits are in the report itself.

Quite a few of the forecasted disruptive technologies will have a large impact on the way that people access information:
  • Automation of knowledge work: Intelligent software systems that can perform knowledge work tasks involving unstructured commands and subtle judgments
  • The Internet of Things: Networks of low-cost sensors and actuators for data collection, monitoring, decision making, and process optimization
  • Mobile Internet: Increasingly inexpensive and capable mobile computing devices and Internet connectivity
  • Cloud technology: Use of computer hardware and software resources delivered over a network or the Internet, often as a service
The report goes on to discuss other interesting observations and implications, and it is well worth the read. One thing that is specifically noted is that not all technologies live up to the hype.
    The link between hype and potential is not clear. Emerging technologies often receive a great deal of notice. News media know that the public is fascinated with gadgets and eager for information about how the future might unfold. The history of technology is littered with breathless stories of breakthroughs that never quite materialized. The hype machine can be equally misleading in what it chooses to ignore. As Exhibit E5 shows, with the exception of the mobile Internet, there is no clear relationship between the amount of talk a technology generates and its potential to create value.
    The lesson for leaders is to make sure that they and their advisers have the knowledge to make their own assessments based on a structured analysis involving multiple scenarios of technology advancement and potential impact. 

    And it's very important that policymakers not make preemptive decisions based only on where technology stands now, because none of us knows exactly where it will go or what it will mean, specifically for the future of libraries. In 2025, I suspect that we will be closer to our own personal Watsons but still very far from computers completely replacing humans in the knowledge-work sector.

    Friday, July 24, 2015

    Innovation & Jobs

    With so much chatter recently about technology killing jobs, it's hard not to notice.

    A 2014 NYTimes article reviewed books with competing outlooks. One camp is optimistic about technology and jobs, while the other is much more pessimistic.

    To see the effect technology can have on jobs, look no further than Kodak. "At its peak, Kodak employed 140,000 people; Instagram had only 13 employees when it was bought by Facebook (for $1 billion!) in 2012." This is the pessimistic view.

    In addition, Erik Brynjolfsson and Andrew McAfee, two economists from the Massachusetts Institute of Technology, note that “[r]apid and accelerating digitization is likely to bring economic rather than environmental disruption, stemming from the fact that as computers get more powerful, companies have less need for some kinds of workers.” They believe that we are at a moment when technological innovation is about to accelerate, and make the world much wealthier, just as the Industrial Revolution did 250 years ago. Yet buried in their sunny prose is a darker forecast: that while this digital revolution will be great for innovators, entrepreneurs and other creative people, not everyone will participate — especially those who do jobs that software can do better.

    On the other side of the fence is Robert J. Gordon, a macroeconomist at Northwestern University. "In his view, the next 40 years of innovation is not going to look much different from the past 40 years, which he believes haven’t been nearly as transformative or wealth-creating as the discovery of electricity and the invention of the light bulb." When asked whether future innovation would cost jobs, he said he thought it would, but no more or less than has always been the case.

    So, in essence, we are where we were 30 years ago. We have one side telling us that doom is imminent, and we have the other side telling us that the type of innovation that will kill jobs is still many decades away.

    What is important is that we should recognize now that we have the ability to take control of our destiny rather than letting technology take control of us. We need to stay ahead of the curve and make sure that there is a place for us (librarians) in the future.

    Thursday, July 23, 2015

    Intelligence Augmentation (IA) v. Artificial Intelligence (AI)

    While at AALL, I watched Kyla Moran present on IBM's Watson. One thing struck me: the big difference between intelligence augmentation (IA) and artificial intelligence (AI). Kyla likened it to Iron Man's JARVIS v. the Terminator.

    It's a long-running "joke" of sorts within the librarian profession that "they've" been predicting our demise in favor of artificial intelligence for at least 30 years. And it's gotten louder recently with books like Rise of the Robots.

    Kyla commented that Watson is augmented intelligence. He makes us smarter. And IBM is not trying to overtake humans with machines.

    According to Wikipedia:
    Intelligence amplification (IA) (also referred to as cognitive augmentation and machine augmented intelligence) refers to the effective use of information technology in augmenting human intelligence. The idea was first proposed in the 1950s and 1960s by cybernetics and early computer pioneers.

    IA is sometimes contrasted with AI (Artificial Intelligence), that is, the project of building a human-like intelligence in the form of an autonomous technological system such as a computer or robot. AI has encountered many fundamental obstacles, practical as well as theoretical, which for IA seem moot, as it needs technology merely as an extra support for an autonomous intelligence that has already proven to function.

    Augmented intelligence will be our reality in the near future. We will use computers to aid us in our capability to retrieve relevant results in the age of big data.

    The ethics are thorny in this area, and although IBM says that it doesn't want to replace humans, there's no guarantee that other entities will be so ethical. That said, we won't be in a position anytime soon to be completely replaced by computers, and it's important for the public to understand where computing power stands now and why librarians are still needed. If the public perception is that libraries are not needed, then budgets will be slashed. But if the public understands the need for libraries and librarians, we will continue to be supported and offer our invaluable services.

    Wednesday, July 22, 2015

    Google Fares Better Than Proprietary Plagiarism Software

    Expensive plagiarism detection software from vendors such as Turnitin and SafeAssign proves to be no better than Google at detecting plagiarism. In fact, in past studies, Google has done a better job.

    InsideHigherEd recently reported on a study by Susan E. Schorn, a writing coordinator at the University of Texas at Austin. Schorn first ran a test to determine Turnitin’s efficacy back in 2007, when the university was considering paying for an institution-wide license. Her results initially dissuaded the university from paying a five-figure sum to license the software, she said. A follow-up test, conducted this March, produced similar results.

    For the 2007 test, Schorn created six essays that copied and pasted text from 23 different sources, which were chosen after asking librarians and faculty members for examples of commonly cited works. Examples included textbooks and syllabi, as well as websites such as Wikipedia and free essay repositories. Of the 23 sources, used in ways that faculty members would consider inappropriate in an assignment, Turnitin fully identified only eight, along with six partial matches that found some text, nonoriginal sources or unviewable content. That means the software missed almost two-fifths, or 39.34 percent, of the plagiarized sources.

    SafeAssign (the product UT-Austin ended up choosing, as it was bundled with the university's learning management system) fared even worse. It missed more than half, or 56.6 percent, of the sources used in the test. Mark Strassman, Blackboard's senior vice president of industry and product management, said the company has since "changed the match algorithms … changed web search providers" and "massively" grown the database of submissions SafeAssign uses.

    Google -- which Schorn notes is free and worked the fastest -- trounced both proprietary products. By searching for a string of three to five nouns in the essays, the search engine missed only two sources. Neither Turnitin nor SafeAssign identified the sources Google missed.

    A more recent test shows that results are not much better since 2007. As UT-Austin recently replaced its learning management system, it also needed to replace its plagiarism detection software. Schorn therefore conducted the Turnitin test again this March. Out of a total of 37 sources, the software fully identified 15, partially identified six and missed 16. That test featured some word deletions and sentence reshuffling -- common tricks students use to cover up plagiarism.
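Schorn's Google method is simple enough to automate. A hypothetical helper (the nouns are chosen by hand here; real noun extraction would need an NLP library) just wraps a few distinctive words from the suspect passage in quotes and builds a phrase-search URL:

```python
import urllib.parse

def google_phrase_url(nouns):
    """Build a quoted-phrase Google search URL from a few distinctive
    words (Schorn searched strings of three to five nouns)."""
    phrase = '"' + " ".join(nouns) + '"'
    return "https://www.google.com/search?q=" + urllib.parse.quote(phrase)
```

The quotation marks force an exact-phrase match, which is why a handful of well-chosen nouns is often enough to surface the original source.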

    We must be cognizant of the limitations of these plagiarism detectors. While useful, they are a starting point, not a substitute for human judgment.

    Monday, July 20, 2015

    AALL Annual Conference 2015

    The AALL Annual Conference 2015 is currently underway. Follow me on Twitter @gngrlibrarian for updates, or go to #AALL15 to see updates from all attendees.

    Friday, July 17, 2015

    .Law Domain Names Available Oct. 12

    Last April, the ABA Journal reported on a new .law domain. Minds + Machines has the exclusive license to operate the new .law domain from the Internet Corporation for Assigned Names and Numbers. Minds + Machines said in a press release that it was partnering with the Legal Marketing Association to allow its members to submit an expression of early interest in .law domain names.

    But that doesn’t mean others can’t submit their own expressions of interest, says Lou Andreozzi, CEO of Minds + Machines’ North American operations. Those who submit an expression of interest aren’t obligated to buy the domain, but they will be allowed to purchase it when .law becomes generally available, if no one else expresses an interest. When more than one person is interested, an auction is held.

    Standard names will cost $200, while premium names will start at $500. The cost will be based on factors such as the number of characters and the value of certain practice areas.

    Anyone who applies for a .law domain will have to certify that he or she is a lawyer and submit to a verification process. Lawyers can apply on behalf of themselves, their law firms and their companies.

    The company has announced that, beginning July 30, trademark holders will be able to register corresponding names. The names will be available for sale to lawyers Oct. 12.

    Other new domains of interest to lawyers will also become available. They include .attorney, .esq and .lawyer.

    Thursday, July 16, 2015

    Libraries Matter More Than Ever

    Salon had it right when it stated that libraries are more important than ever.
    In our heartfelt but naïve fondness for “quiet, inviting spaces” full of books and nothing else, we fail to realize that libraries are becoming more important, not less, to our communities and our democracy.

    One of the main reasons that libraries are more important than ever is because libraries and librarians help sift through the mountains of data that humans are currently producing.
    Humans are producing such quantities of data—2.5 quintillion bytes of data daily, to be precise—and on such a steep curve, that 90 percent of all existing data is less than two years old. An overwhelming amount of information, access to which is marked by the same stark inequality that exists between economic classes, demands to be moderated for the public good, and libraries are the institutions that do that.
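Salon's two figures are roughly consistent with each other. Treating the daily rate as constant (a simplification, since the article says production is on a steep upward curve), the last two years alone account for about 1.8 zettabytes, which would put the implied total ever stored at roughly 2 zettabytes:

```python
daily_bytes = 2.5e18                    # Salon's figure: 2.5 quintillion bytes/day
last_two_years = daily_bytes * 730      # bytes produced over the last two years
implied_total = last_two_years / 0.90   # if that output is 90% of all data

print(f"{last_two_years:.2e} bytes in two years, "
      f"{implied_total:.2e} bytes implied total")
```

The point of the back-of-envelope check is not the exact number but the shape of the curve: when most of everything ever recorded is only two years old, curation becomes the scarce resource.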

    The risk of a small number of technically savvy, for-profit companies determining the bulk of what we read and how we read it is enormous. The great beauty of the rich, diverse library system that has developed over the past century and a half has been the role of librarians in selecting and making available a range of material for people to consult and enjoy. No one pressing an ideology can co-opt this system; no single commercial entity can do an end run around the library system in the interest of profit.

    Libraries and librarians help moderate this data in an age when we are really starting to question if there is too much collective knowledge. It's not that libraries are becoming less important as the need for print materials lowers; it's that the public needs to adjust its notion of what it means to be a library.

    Tuesday, July 14, 2015

    LSAT Takers Up

    The Wall Street Journal is reporting that "[t]he number of LSAT takers released by the Law School Admission Council now suggest strongly that law schools are starting to pull themselves back up."

    A total of 23,238 people took the test last month, up 6.6% from the year before, according to new LSAC data. The figures represented the first growth in the June crop of test-takers since 2010 and extend an upswing that began in December, when the numbers inched up by 0.8%, and accelerated in February with a 4.4% increase. The bump follows a 2014-2015 cycle in which the number of test-takers over the year nearly slipped below 100,000, down 40% from a peak of 171,514 six years ago.

    While LSAT test takers are up, "[t]he latest applicant figures give law schools a bit less to cheer. The number of people that applied to law school is down 2% from 2014, according to LSAC. And the total number of applications submitted is 4.2% below last year’s total."

    This may signal that the market is finally righting itself. There's been more news lately that now is a good time to go to law school. This is in direct contradiction to the news we've heard since 2010, when it was heavily reported that there were too many lawyers and not enough jobs.

    Monday, July 13, 2015

    Scribes Awards Luncheon 2015

    Scribes--The American Society of Legal Writers--will hold an award luncheon in Chicago during the ABA Annual Meeting.  Awards will be presented for the best new book in legal writing and for the best student-written briefs from moot court competitions.

    The luncheon will feature a special presentation of the Scribes Lifetime-Achievement Award to The Right Honorable, the Lord Woolf, with comments by Lord Woolf.  Lord Woolf was Master of the Rolls from 1996 until 2000 and Lord Chief Justice of England and Wales from 2000 until 2005. The Constitutional Reform Act 2005 made him the first Lord Chief Justice to be President of the Courts of England and Wales. He has also been a non-permanent judge of the Court of Final Appeal of Hong Kong since 2003.

    Also during the luncheon, keynote speaker Bryan A. Garner will share The Biggest Secret for Clear and Persuasive Writing. Garner has written several books about English usage and style, including Garner's Modern American Usage and The Elements of Legal Style. He is the editor-in-chief of Black's Law Dictionary, and he has coauthored two books with U.S. Supreme Court Justice Antonin Scalia: Making Your Case: The Art of Persuading Judges (2008) and Reading Law: The Interpretation of Legal Texts (2012). He is the founder and president of LawProse, Inc. and serves as Distinguished Research Professor of Law at Southern Methodist University School of Law.

    In addition to the award presentations and speaker, the event will mark the installation of the new officers for Scribes. The current Scribes president, completing her term on August 1st, is Darby Dickerson, Dean of Texas Tech University School of Law and the W. Frank Newton Endowed Professor at that school. She will be succeeded by incoming president Justice Michael Hyman of the Illinois Appellate Court, a former president of the Chicago Bar Association.

    Here is the information about the luncheon, including how to order tickets:

    Saturday, August 1, Noon – 2:00 p.m.
    Swissotel, Edelweiss I Room, 43rd Floor
    323 Upper Wacker Drive, Chicago, Illinois
    $75 per person
    $50 for Judges, Government Employees, Young Lawyers, Law Professors, and Law Students

    RSVP by July 17 with your choice of lunch entrée: chicken or vegetarian. For questions, call (806) 834-5792.

    Librarians Stuck Between A Book & A Hard Pixel

    Libraries and librarians are being pulled in seemingly opposite directions. As the Washington Post states, "[a]round the country, libraries are slashing their print collections in favor of e-books, prompting battles between library systems and print purists, including not only the pre-pixel generation but digital natives who represent a sizable portion of the 1.5 billion library visits a year and prefer print for serious reading."

    And librarians are feeling the heat. “'We’re caught between two worlds,' said Darrell Batson, director of the Frederick County Public Libraries system. 'But libraries have to evolve or die. We’re probably the classic example of Darwinism.'”

    In the process of evolving from print to digital collections, centuries-old library traditions have been abandoned. To library futurists, this is progress. “For a lot of people, libraries represent a certain kind of quiet, a certain kind of place, a certain kind of book in large numbers,” said Matthew Battles, a fellow at the Berkman Center for Internet and Society and co-author of “Library Beyond the Book.” “These are beautiful ideas and ideals. But they demand reinterpretation and cultivation from generation to generation.”

    To library purists, this is nonsense. “I get the sense that a lot of people have a feeling that tech has just moved along, that books are these old-fashioned things, that everything is going to be on the Internet, that a Kindle and Google is all you need,” Hays said. “But getting reliable information is a constant challenge today. Libraries help people find the credible information they need.”

    This is the constant struggle of the library world today. We must evolve or die; but are we evolving in the right way? It's very difficult to be at the helm of organizing the world's collective knowledge while undergoing a paradigm shift in the way that people access information. I just hope that we are successful in our evolution because I can't imagine a world without books and libraries.

    Friday, July 3, 2015

    Casetext Adds Writing Tool

    The ABA Journal reports that Casetext, the free legal research website that uses crowdsourcing to annotate cases, has launched a new writing tool called LegalPad that publishes lawyers’ articles and links them to cases they cite.

    LegalPad has a lot of great functionality. When an article writer types in a case name, it is supplied in correct Bluebook form with a hyperlink to the case. A writer can select text from the case, and it will be inserted in the article. Writers will also be able to choose the Casetext communities where their articles will appear.

    LegalPad users can also write articles that are shared with like-minded Casetext community groups based on practice areas and interests. There are links to cases discussed in the articles, and the cases will in turn link to articles.

    Casetext founder Jake Heller tells LawSites that his goal is for Casetext to become a place to build legal commentary as well as a tool for legal research. A Casetext press release points out that lawyers who publish articles on the website can build reputations in their specialty areas.

    Thursday, July 2, 2015

    Recent Westlaw Announcements

    A few recent announcements from Westlaw:

    Goodbye Classic For All Segments:

    Westlaw Classic will officially be sunsetted in all segments nationwide on July 31, 2015. Westlaw Classic was discontinued in the Academic segment on July 1, 2014. Is there a feature that you will miss from Westlaw Classic that you don’t have on WestlawNext? If so, let me know. We love feedback!

    Cloud Delivery Now Available With Dropbox

    WestlawNext now allows you to save your files directly into your personal or business Dropbox account while researching. You will find Dropbox as an option in the document-delivery menu at the top of the legal document you are viewing. Dropbox is a private cloud storage service offering free and paid subscriptions; a separate subscription is required.

    West Academic Online Study Aids

    Effective July 1st, the first time a student clicks on the Online Study Aids (SAS) link, they will be prompted to set up a West Academic account and sign in. The account creation process is quick and easy, and a wizard will guide them through it. This is a one-time process; once completed, a student will use their OnePass to log in and seamlessly enter the SAS homepage as they have in the past.

    It's good news that Classic is gone for all segments. When it was discontinued for Academic accounts last summer but not for professional accounts, I worried that it would be very hard to help an extern or recent graduate sent to a firm, for example, that was still using Classic.

    As to the Dropbox integration, I have been using Dropbox to deliver faculty research this spring, and it is a wonderful tool. I love that it has been integrated into Westlaw's functionality.

    West Academic recently split from Thomson Reuters, which is the cause of the additional account creation. It sounds like after students create the West Academic account, it should work seamlessly with their OnePass. 

    Wednesday, July 1, 2015

    The Benefits Of Metasearch Engines

    The use of metasearch engines hasn't caught on with the general population. Most everyone still defaults to one of the major search engines - most often Google.

    But we should be aware of metasearch engines and the benefits of using them. From Wikipedia:
    A metasearch engine (or aggregator) is a search tool that uses another search engine's data to produce its own results from the Internet. Metasearch engines take input from a user and simultaneously send out queries to third-party search engines for results. Sufficient data is gathered, formatted by their ranks and presented to the users.

    Information stored on the World Wide Web is constantly expanding, making it increasingly impossible for a single search engine to index the entire web for resources. A metasearch engine is a solution to overcome this limitation. By combining multiple results from different search engines, a metasearch engine is able to enhance the user’s experience for retrieving information, as less effort is required in order to access more materials. A metasearch engine is efficient, as it is capable of generating a large volume of data. 

    In essence, you can use a metasearch engine to run queries and compile results from all of the various search engines.
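At its core, a metasearch engine fans one query out to several engines, then merges and de-duplicates the ranked lists that come back. A toy sketch (the engine functions here are stand-ins; a real implementation would call each engine's API or parse its result pages):

```python
def metasearch(query, engines):
    """Fan a query out to several engines and merge the ranked results.

    `engines` maps an engine name to a function that returns an ordered
    list of result URLs for the query. Results are interleaved
    round-robin so no single engine dominates the top of the list, and
    duplicates are dropped the first time they are seen.
    """
    ranked_lists = [fetch(query) for fetch in engines.values()]
    merged, seen = [], set()
    for rank in range(max(map(len, ranked_lists), default=0)):
        for results in ranked_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged
```

A result that several engines agree on appears once, near the top, which is roughly the "eliminates duplicates and reveals the most relevant" behavior Dogpile describes.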

    Two metasearch engines are Dogpile and Ixquick.

    From Dogpile's About page:
    InfoSpace created the Dogpile search engine because your time is important to us. Powered by Metasearch technology, Dogpile returns all the best results from leading search engines including Google and Yahoo!, so you find what you’re looking for faster. Each search engine has its own method of searching and each will return different results. Dogpile looks at all of them, decides which are most relevant to your search, eliminates duplicates and reveals them to you. In the end, you get a list of results more complete than anywhere else on the Web.

    From Ixquick's About page:
    Ixquick search results are more comprehensive and more accurate than other search engines.  Ixquick's unique capabilities include an Advanced Search, a global search and power refinement. Professional researchers rely on advanced search methods such as Boolean logic, phrases, wildcards, and field searches. But different search engines support different advanced search methods, and require the user to access them in different ways. Keeping track of these differences can be time-consuming and burdensome. Ixquick solves this problem by understanding which advanced search methods each search engine supports, and how to access them.

    These are powerful search engines that do a lot of the heavy lifting for you. Why run a search limited to one search engine's algorithm and results when you can run a search across multiple search engines at one time? The answer seems obvious.