Law and Corpus Linguistics

Law and corpus linguistics (LCL) is a new academic sub-discipline that uses corpora, large databases of examples of language usage equipped with tools designed by linguists, to better determine the meaning of words and phrases in legal texts (statutes, constitutions, contracts, etc.). LCL is thus the application of corpus linguistic tools, theories, and methodologies to issues of legal interpretation, in much the same way that law and economics is the application of economic tools, theories, and methodologies to various legal issues.


A 2005 law review article by Lawrence Solan noted in passing that corpus linguistics had potential for interpreting legal texts.[1] But the first systematic exploration and advocacy of applying the tools and methodologies of corpus linguistics to questions of legal interpretation came in the fall of 2010, when the BYU Law Review published a note by Stephen Mouritsen entitled The Dictionary is Not a Fortress: Definitional Fallacies and a Corpus-Based Approach to Plain Meaning.[2] The note argued that dictionaries are the primary linguistic tool used by judges to determine the plain or ordinary meaning of words and phrases, highlighted the deficiencies of such an approach, and proposed using corpus linguistics in its stead. The note was later cited by Adam Liptak in a New York Times article on statutory construction.[3]

Law and corpus linguistics (LCL) gained greater legitimacy in July 2011 with the first judicial opinion in American history utilizing corpus linguistics to determine the meaning of a legal text: In re the Adoption of Baby E.Z.[4]: 702  In a concurrence in part and in the judgment, Justice Thomas Lee wrote to put forth an alternative ground for the majority's holding—interpreting the phrase "custody determination" by using corpus linguistics. Justice Lee looked at 500 randomized sample sentences from the Corpus of Contemporary American English (COCA) and found that the most common sense of "custody" was in the context of divorce rather than adoption.[4]: 724  Further, he found that "custody" is ten times more likely to co-occur (or collocate) with "divorce" than with "adoption".[4]: 724  From that evidence Justice Lee concluded that he "would find that the custody proceedings covered by the Act are limited to proceedings resulting in the modifiable custody orders of a divorce", rather than the broader range of custody proceedings.[4]: 725 
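The collocation comparison described above can be illustrated with a minimal Python sketch. It is not Justice Lee's actual method or the COCA search interface: the sentences are invented toy examples rather than corpus data, and the function names (`tokenize`, `collocate_counts`) and the window size are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def collocate_counts(sentences, node, window=4):
    """Count words occurring within `window` tokens of each use of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the node word itself
                    counts[tokens[j]] += 1
    return counts


# Invented toy sentences for illustration only; NOT drawn from COCA.
TOY_SENTENCES = [
    "She won custody of the children after the divorce.",
    "The divorce decree awarded joint custody to both parents.",
    "Custody was contested during the adoption hearing.",
    "The suspect remained in police custody overnight.",
]

counts = collocate_counts(TOY_SENTENCES, "custody", window=6)
print("divorce:", counts["divorce"], "adoption:", counts["adoption"])
```

A real analysis would run this kind of count over a large, balanced corpus and would typically also review sampled concordance lines by hand, as the 500-sentence review in the opinion did, since raw co-occurrence counts alone do not distinguish senses.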

Other jurisprudence and scholarship followed. In a 2015 concurrence in State v. Rasabout, Justice Lee used a COCA search to determine that "discharge", when used with a firearm (or one of its synonyms), overwhelmingly referred to firing a single shot rather than emptying the weapon's entire magazine.[5] In 2016, four of the five justices of the Utah Supreme Court joined a footnote in a majority opinion by Justice Lee commending a party for using corpus linguistics in its briefing, even though the Court found it unnecessary to resolve the related question.[6] Also in 2016, the Michigan Supreme Court became the first court to use a linguist-designed corpus (COCA) in a majority opinion, with both the majority and the dissent turning to COCA to determine the meaning of the word "information".[7]

In 2020, courts interested in bolstering the legal theory of original intent sought opportunities to analyze statutes using corpus linguistics. In Jones v. Becerra (No. 20-56174), a Ninth Circuit Court of Appeals case involving the Second Amendment and the constitutionality of a California statute that bans the sale of firearms to individuals under the age of 21, the panel requested that the parties address three questions:

1) “What is the original public meaning of the Second Amendment phrases: ‘A well regulated Militia’; ‘the right of the people’; and ‘shall not be infringed’?”
2) “How does the tool of corpus linguistics help inform the determination of the original public meaning of those Second Amendment phrases?”
3) “How do the data yielded from corpus linguistics assist in the interpretation of the constitutionality of age-based restrictions under the Second Amendment?”[8]

As to scholarship, in 2012 Mouritsen followed up his original work with an article in the Columbia Science and Technology Law Review, in which he further refined and promoted the use of corpus-based methods for resolving questions of legal ambiguity.[9] In 2016, two essays and an article on law and corpus linguistics were published. The Yale Law Journal Forum published Corpus Linguistics & Original Public Meaning: A New Tool to Make Originalism More Empirical. Written by Justice Lee and two co-authors, the essay urged originalists to turn to corpus linguistics to improve the rigor and accuracy of originalist scholarship.[10] In response, the Forum published an essay by Lawrence Solan (a Brooklyn Law School professor with a PhD in linguistics), Can Corpus Linguistics Help Make Originalism Scientific?[11] The Boston University Public Interest Law Journal published The Merciful Corpus: The Rule of Lenity, Ambiguity and Corpus Linguistics by Daniel Ortner.[12] In the article, Ortner applied corpus linguistics to determine whether sufficient ambiguity exists to trigger the rule of lenity in five Supreme Court cases. Two more articles were slated for publication in 2017: Lee Strang focuses on corpus linguistics and originalism in the U.C. Davis Law Review,[13] and Lawrence Solan and Tammy Gales explore corpus linguistics in the context of finding ordinary meaning in statutory interpretation in the International Journal of Legal Discourse.[14]

Lawyers and journalists have also taken notice of corpus linguistics as it relates to the law. In 2010, Neal Goldfarb filed the first known brief in the Supreme Court using corpus linguistics (COCA) to determine whether the ordinary meaning of "personal" referred to corporations, in FCC v. AT&T. The amicus brief looked at the top collocates (words that co-occur) of "personal" in COHA as well as BYU's Time Magazine Corpus.[15] Writing for The Atlantic, Ben Zimmer took note of this new trend, referring to corpus linguistics in the courts as "Like Lexis on Steroids".[16]

On the academic front, in 2013 BYU Law School started the first class on law and corpus linguistics, co-taught by Mouritsen, Lee, and (now Dean) Gordon Smith. The class is currently in its fourth year. And in February 2016, BYU Law School hosted the inaugural conference on LCL, with over two dozen legal and linguistic scholars from around the country discussing and debating the next steps forward for the growing academic movement.[17] A second conference is scheduled for February 2017. At the conference BYU Law School announced its plans and progress on the Corpus of Founding Era American English (COFEA), a corpus that will cover 1760–1799.[18] To date 120 million words have been collected from founding era letters, diaries, newspapers, non-fiction books, fiction, sermons, speeches, debates, legal cases, and other legal materials.


  1. ^ Solan, Lawrence M. (2005). "The New Textualists' New Text". Loyola of L.A. Law Rev. 38: 2027–2062. SSRN 719786 – via SSRN.
  2. ^ Mouritsen, Stephen C. (2010-01-01). "The Dictionary Is Not a Fortress: Definitional Fallacies and a Corpus-Based Approach to Plain Meaning". BYU Law Review. 2010 (5).
  3. ^ Liptak, Adam (2011-06-13). "Justices Turning More Frequently to Dictionary, and Not Just for Big Words". The New York Times. ISSN 0362-4331. Retrieved 2016-12-05.
  4. ^ a b c d 266 P.3d (Utah 2011).
  5. ^ 356 P.3d 1258, 1281–1282 (Utah 2015) (Lee, J., concurring).
  6. ^ Craig v. Provo, 2016 WL 4506309, 2016 UT 40, para. 26 n.3.
  7. ^ People v. Harris, 2016 WL 3449466, 499 Mich. 332.
  8. ^[bare URL PDF]
  9. ^ "Hard Cases and Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning – Columbia Science and Technology Law Review". Retrieved 2016-12-05.
  10. ^ "Corpus Linguistics & Original Public Meaning: A New Tool to Make Originalism More Empirical".
  11. ^[bare URL PDF]
  12. ^ Ortner, Daniel (2014-12-01). "The Merciful Corpus: The Rule of Lenity, Ambiguity and Corpus Linguistics". Rochester, NY: Social Science Research Network. SSRN 2576475.
  13. ^ Strang, Lee J. (2016-09-07). "How Big Data Can Increase Originalism's Methodological Rigor: Using Corpus Linguistics to Reveal Original Language Conventions". Rochester, NY: Social Science Research Network. SSRN 2665131.
  14. ^ Solan, Lawrence M.; Gales, Tammy A. (2016-10-10). "Finding Ordinary Meaning in Law: The Judge, the Dictionary or the Corpus?". Rochester, NY: Social Science Research Network. SSRN 2850703.
  15. ^[bare URL PDF]
  16. ^ Zimmer, Ben. "The Corpus in the Court: 'Like Lexis on Steroids'". The Atlantic. Retrieved 2016-12-05.
  17. ^ See "Conference Attendees" and "Conference Schedule"
  18. ^ See "Current Projects"