Creatively Harvesting Bluebook Data

As late as 2016, I was ready to join Justice Posner and give up on The Bluebook. After research into the use of algorithms in the era of big data, however, my thinking has changed.

The Chronicle of Higher Education recently ran an article articulating the concerns with following a particular citation style: "The problem with the rules-heavy approach to teaching [citation] isn’t just the rigidity with which students are taught those rules or follow them. It’s that too often students are taught rules without any context or justification. That’s just 'the way things are.' Students are left following rules just because a [law review editor] told them to, none the wiser about their function or history. It’s a recipe for seeing writing as foreign or external — something a student is supposed to do but not necessarily understand. Just follow the rules, kid, and there won’t be any trouble."

Instead of taking this approach to citation, the author leads a discussion not about citation styles, but about citation itself. Why do we cite? What purpose does it serve? Why should we have conventions for citation? Eventually, of course, we end up talking about style guides, and those students who just want to know whether to put the period before or after the closing parenthesis will get their answer. But they also learn something about why there are so many different style guides. (It has to do with different disciplinary priorities: The sentence is important to the humanities; the year is important to the sciences.) Students learn about the value of giving credit to people for their ideas, and about the collaborative nature of all scholarly work. What begins as a technicality can end up going pretty deep into the very nature of the writer’s task.

Even given this deeper thinking about citation, it was hard to completely buy into the rigidity of The Bluebook (particularly given Posner's justifications for a streamlined legal citation style).

That is, until algorithms and big data became powerful tools for quickly harvesting content.

The folks at Casetext are leading the way here, as Pablo Arredondo, Casetext's Co-founder and Chief Legal Research Officer, explains in his recent article Harvesting and Utilizing Explanatory Parentheticals, 69 S.C. L. Rev. 659 (2018).

From the intro:
Explanatory parentheticals -- the concise summaries neatly packaged alongside case citations -- are ubiquitous, easily harvested, and grossly underutilized. This Paper describes what is believed to be the first instance of harvesting explanatory parentheticals and utilizing them on a mass scale. Specifically, this Paper describes how hundreds of thousands of parentheticals were identified, mined from case law, and then integrated into Casetext, a free legal research platform. The value that parentheticals add to research is explored, including enhancing the value of citators.

In the article's words, "[i]n sum, the common law is teeming with concise case law summaries, and leveraging The Bluebook-decreed consistency of the explanatory parenthetical format enables an immense set of these summaries to be harvested algorithmically."
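To make the idea concrete, here is a minimal Python sketch of how a consistent citation format lends itself to pattern matching. It is not Casetext's method, which the article does not reduce to a single pattern. The regular expression, the function name harvest_parentheticals, and the sample citations are all assumptions invented for illustration. The pattern covers only a simplified citation shape (volume, reporter, page, optional pin cite, court-and-year parenthetical) followed by an explanatory parenthetical that opens with a word ending in "ing," such as "holding" or "noting."

```python
import re

# Hypothetical, simplified pattern (an assumption, not a full Bluebook parser):
# volume, reporter, first page, optional pin cite, a court/year parenthetical,
# then an explanatory parenthetical that opens with a word ending in "ing".
CITATION_WITH_PARENTHETICAL = re.compile(
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Z][A-Za-z0-9. ]*?)\s+"
    r"(?P<page>\d+)"
    r"(?:,\s*(?P<pin>\d+(?:-\d+)?))?"
    r"\s+\((?P<court>[^()]*?)\s*(?P<year>\d{4})\)"
    r"\s+\((?P<parenthetical>[a-z]+ing\b[^()]*)\)"
)


def harvest_parentheticals(text):
    """Return (citation, parenthetical) pairs found in text."""
    results = []
    for m in CITATION_WITH_PARENTHETICAL.finditer(text):
        cite = f"{m['volume']} {m['reporter']} {m['page']}"
        results.append((cite, m["parenthetical"].strip()))
    return results


# Invented sample text; the citations are placeholders, not real holdings.
sample = (
    "See Smith v. Jones, 123 F.3d 456, 460 (9th Cir. 1999) "
    "(holding that a lease survives foreclosure). "
    "But see Doe v. Roe, 550 U.S. 544 (2007) "
    "(noting the limits of notice pleading)."
)

for cite, summary in harvest_parentheticals(sample):
    print(cite, "->", summary)
```

A sketch like this ignores nested parentheticals (for example, a "quoting" parenthetical inside another), short-form citations, and state-specific formats, so a real harvester would need considerably more care. The point is only that the more uniformly writers follow the format, the more such a pattern can capture.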

Thank you to Casetext for creatively leveraging this data and providing a "real" justification for following rigid citation rules. When teaching citation, this is a wonderful conversation starter on many fronts.


  1. But the parenthetical information is summarizing the case for the point of law being discussed in the text. It is not attempting to summarize the case; it is attempting to highlight any number of things - facts that make this case different from another; treatment of an aspect of law that aids the author's argument; etc. Aggregating these parenthetical comments might be interesting at some level, but to refer to these as "concise case law summaries" is simply not accurate. If I read a case summary in a reputable treatise or practice aid, or the summaries on Westlaw or Lexis, I'll have enough information to determine whether the case warrants further explanation. If I read aggregated parentheticals about a case that are harvested algorithmically, I'm not convinced that I'll have a complete understanding of that case.

    1. Hi Karen - you raise a valid point, one that I try to address in my paper. I would also recommend you check out Casetext and see the summaries in action. Cases are rarely monolithic, and sometimes an aggregate of summaries (each generated from a unique context) can capture aspects of a case that are overlooked or ignored in the one-size-fits-all case summary generated by (the very good) editors at Lexis and Westlaw. If after checking out the paper and Casetext you want to discuss further, please let me know as I never tire of talking about the power (and, yes, limitations) of explanatory parentheticals.

  2. That's a very practical point. I would recommend reading Pablo's article. As he mentioned on Twitter today: "Subtitle of Bluebook should be 'You can regex this now.' Huge value to be gained by exploiting simple but ubiquitous patterns in legal docs. Viva judicial language processing!" Ultimately, we can expect to see much more of this type of exploitation of word patterns -- see, for example, Corpus Linguistics as a Tool in Legal Interpretation. And we have to understand the benefits and risks of relying on it.

  3. The purpose of citations is to enable the reader to find cited content. I can't speak to every single citation format out there, but I do know that the Bluebook editors do not provide helpful citation formats for state documents. The emphasis is too much on uniformity and not enough on how to cite so you can actually find the content. Citators should also give credit to AALL's concept of vendor-neutral citation of case law.

