Problems with Modern Electronic Legal Research—“Googlization” versus “Boolean Logic”
By Michael Wein
About every two to three years, Westlaw and Lexis representatives contact my office about renewing my existing (and not inexpensive) legal database subscription or switching to the other company. I have used legal databases since college, though which one I prefer is not the subject here. What is comparatively new is that both companies have, for about the past decade, boasted that their new legal database systems use a sophisticated search algorithm comparable to Google’s, covering every appellate case (and most Federal District Court cases), to quickly and correctly locate the best cases on a topic. This conceivably makes Boolean logic, or just regular “keyword” searching, unnecessary or obsolete. As explained below, though there are some advantages depending on the audience, these algorithms are not necessarily superior, particularly when it comes to locating the more obscure appellate caselaw that helps win a case.
The Movement to “Googlization” Algorithms from Traditional Boolean Logic Keyword Searches
Westlaw, with “WestlawNext,” and Lexis, with “Lexis Advance,” both began around 2010 and use new algorithms based on non-Boolean searches, with a type of intuitive AI (artificial intelligence) feature comparable to Google searches. The change over the past decade has been quick. In the past, Boolean logic searches, usually built from a combination of keywords, were the only way to do things; now, while still permitted, they are not needed and are arguably no longer considered best practice. The University of Maryland School of Law’s “Electronic Research Techniques” discussion gives a good refresher on Boolean logic searches.
As Lexis’s own description of Boolean searching suggests, these searches now appear to be disfavored, at least in public marketing:
“If you’re comfortable using search commands and connectors—! and * to truncate words and W/n, OR, AND, &, etc., to connect search words and phrases—you can use these special commands to develop Lexis searches.” [Emphasis Added]
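For readers less familiar with these connectors, the following is a minimal Python sketch of what a truncated term, a W/n proximity connector, and an AND connector each test for. It is a toy illustration only, not how Lexis or Westlaw implement their search engines; the function names and the sample sentence are hypothetical, and only the trailing ! form of truncation is modeled.

```python
import re

def tokenize(text):
    """Lowercase a passage and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_term(token, term):
    """Match one token against a term; a trailing '!' truncates (prefix match)."""
    if term.endswith("!"):
        return token.startswith(term[:-1].lower())
    return token == term.lower()

def positions(tokens, term):
    """Return every index at which the term occurs."""
    return [i for i, tok in enumerate(tokens) if matches_term(tok, term)]

def within(tokens, left, right, n):
    """W/n connector: some occurrence of left lies within n words of right."""
    return any(abs(i - j) <= n
               for i in positions(tokens, left)
               for j in positions(tokens, right))

def both(tokens, left, right):
    """AND connector: both terms appear somewhere in the document."""
    return bool(positions(tokens, left)) and bool(positions(tokens, right))

# Hypothetical sentence standing in for the text of an opinion.
opinion = ("The employer was negligent in supervising the warehouse, "
           "and the employee was injured.")
tokens = tokenize(opinion)

print(within(tokens, "neglig!", "employer", 3))  # True: two words apart
print(within(tokens, "neglig!", "injur!", 3))    # False: nine words apart
print(both(tokens, "employ!", "injur!"))         # True: both terms appear
```

The point of the sketch is that a Boolean search is a yes-or-no test applied to every document in the database: a case either satisfies the connectors or it does not, regardless of how often it has been cited.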
Both companies stress that the new AI technology, which has become a mainstay, will vastly improve legal research, or at least should be seen as a significant upgrade in its speed and accuracy. The specifics of how are considered proprietary, but a 2019 article available online from the University of Colorado Law School helps fill in the gaps. Though only seven printed pages, it discusses how the main legal research companies view their own proprietary systems. It appears to be the initial conversation on the topic; each main database company was allowed to “co-author” its own description. For the article, Inside the Black Box of Search Algorithms,[1] Professor Susan Nevelow Mart of the University of Colorado Law School co-wrote portions with the President of Bloomberg Law, the CEO of Fastcase, a Vice President of LexisNexis, and the head of Westlaw’s (Thomson Reuters’) AI division.
From that article, while the exact formulations remain proprietary (and are described somewhat as a public relations firm would, in the passive voice), it is clear that significant investment has gone into the technology as each company seeks to beat the competition for the audience of legal professionals. That investment will continue to ramp up as legal professionals weigh which system is the best value, especially with newer players on the market such as Bloomberg Law and Fastcase.
While Lexis and Westlaw are still dominant, and arguably the most comprehensive (but expensive) database services, less expensive alternatives will likely continue to court the wide variety of legal professionals. This may be driven in part by the expiration, at the end of its 20-year term, of the patent on Google’s ranking algorithm known as “PageRank” on September 23, 2019, so that this initially revolutionary patent is no longer proprietary (with more Google search patent expirations presumably on the horizon). Additionally, in April 2020, in Georgia v. Public.Resource.Org, Inc., the United States Supreme Court held 5-4 that the statutory annotations Lexis prepared under its exclusive agreement with the State of Georgia were not copyrightable at all under the government edicts doctrine, reducing some of the exclusivity that jurisdictions afforded Westlaw and Lexis in the past.
All of this will allow even more competition among legal research databases. For example, the recently announced “upgrade” from the start-up Casetext touts the benefits of a legal search tool “no longer limited to keywords.” As SCOTUSblog founder Tom Goldstein noted on the site, this is “so powerful, that users have described the technology as ‘straight up witchcraft’ and ‘almost…like cheating […]’”
That may be an exaggeration. But even accepting that the new algorithms have real value, an apples-to-apples comparison remains difficult because of the “proprietary” nature of the algorithms themselves. I find Boolean logic remains my preference because, used effectively by those attuned to its benefits, it is typically faster and more efficient. My hesitation about relying on the Google-like algorithm model also rests on the concern that the algorithms themselves can lead to missed cases. Thus, if you are already familiar with Boolean logic searches and regularly do legal research, the algorithmic model is usually not an improvement.
For legal professionals who do not conduct regular legal research, on the other hand, the algorithms could be a great benefit. The same applies to non-lawyers who need to access legal cases occasionally. A search using natural language, or a bunch of words that explain the concept, will likely turn up the main case on the topic even though it is not premised upon a precise “Boolean” keyword search. However, my concern is that unfettered and exclusive use of algorithms may harm legal research and successful argument by giving a false sense of security that you have located the “best case,” when sometimes the best case is difficult to locate, a “hidden gem.” This is particularly true in appellate law, which typically relies upon locating and putting together one or several cases, as on-point precedent or as the most analogous persuasive precedent, to help win. Thus, the overall benefit of exclusive reliance on algorithms is more tenuous.
Why Google-Like Searches Are Not The Same in Legal Research: They May Miss Valuable But Not Well-Travelled Precedent and Other Similar Cases Discoverable Through Boolean Logic
When people do a Google search on a topic, it is rare nowadays to have to go much further than the initial page of about ten (10) results. The PageRank system, originated by Google, helps analyze which web pages are most heavily linked to and relied upon for the words searched and their synonyms. It functions somewhat like modern GPS recommendations, which used to be based solely on the shortest distance: it essentially shows the most frequently and consistently “driven” paths. This will often lead to the information being sought (such as a Wikipedia entry on the topic).
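To make the “most driven path” idea concrete, here is a minimal Python sketch of the simplified, publicly described PageRank calculation, applied to a tiny made-up set of documents that link to one another. It is an assumption-laden illustration: the document names and links are hypothetical, the damping value of 0.85 is the commonly cited default, and nothing here reflects how Google or any legal database actually ranks results today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical documents: everything points at "landmark"; nothing points at "obscure".
links = {
    "landmark": [],
    "a": ["landmark"],
    "b": ["landmark"],
    "c": ["landmark", "a"],
    "obscure": ["landmark"],
}
rank = pagerank(links)
print(max(rank, key=rank.get))  # landmark
```

In this toy graph the heavily linked document rises to the top, while a document that nothing links to sits at the bottom no matter what it actually says.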
But legal research is different. The more well-trodden paths may quickly locate the main cases (especially if decided in the Supreme Court or the particular state’s high court), and from there one can hopefully narrow the search to better-related cases. But those paths can miss less obvious cases, and these less obvious cases may be at least persuasively on point, or even decisively precedential.
As a contextual example, a recent and now closed appellate case argued in Maryland presented a question of “First Impression” related to a legal issue in a pending case of my office. I had assisted the entered counsel, to a degree, by providing case law I had argued previously in other cases.
It later came to my attention, through routine legal research on my own non-appellate case and a review of the briefing done by the firm in charge, that they appeared to have “missed” not only arguing but even citing a remarkably similar case from a far-flung jurisdiction. This case appeared to have almost identical legal issues and overtly similar facts, involved a statute worded very similarly to Maryland’s, and came out with a quite positive decision. Though well-cited and good case law, because it wasn’t a Maryland case, it likely would not have popped up in an algorithmic search. However, it was relatively easy to locate with a “Boolean-logic” search.
Fortunately, despite the case not being briefed, the Maryland appellate court ruled unanimously in favor on the legal issue, on the basis of similar cases interpreting Maryland law, public policy, and the facts of the case. Still, it may have been a bit easier for the judges had they been apprised of the almost identical case from a different jurisdiction, which would rightly constitute persuasive authority to consider. It is precisely these types of cases that are, in my opinion, missed through overreliance on Google-like algorithms. Cases from other states, not linked to the jurisdiction, will turn up with a proper Boolean-logic search, but not if you only do an algorithmic search.
Another variety of cases sometimes missed are those that are precedent in the same jurisdiction but have not been “well-travelled.” From a legal standpoint, if a case with remarkably similar facts and a relevant holding hasn’t been cited regularly for decades, there’s no good reason to expect the algorithmic search to reliably pick it up, especially as one of the “first page” results, as in Google. However, the appellate researcher knows that any precedent that is still good law can transform a quite difficult appeal for your client into a quite difficult appeal for opposing counsel, who now needs to explain why the appellate court should reverse or overturn existing precedent.
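The difference can be sketched in a few lines of Python. This is a deliberately crude model, assuming a ranking driven only by citation counts with a “first page” cutoff; the real algorithms are proprietary and surely more sophisticated. The case names, texts, and citation counts are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Case:
    name: str
    text: str
    times_cited: int

def popularity_search(cases, words, top_k):
    """Keep cases sharing any query word, rank by citation count, cut at top_k."""
    hits = [c for c in cases if any(w in c.text.lower().split() for w in words)]
    hits.sort(key=lambda c: c.times_cited, reverse=True)
    return [c.name for c in hits[:top_k]]

def boolean_and_search(cases, words):
    """Keep only cases containing every query word, regardless of citations."""
    return [c.name for c in cases
            if all(w in c.text.lower().split() for w in words)]

# Invented cases: Case C is on point but has rarely been cited.
cases = [
    Case("Case A", "insurer denied coverage for flood damage", 900),
    Case("Case B", "insurer cancelled policy after flood claim", 450),
    Case("Case C", "insurer denied flood coverage where notice was late", 3),
]
query = ["insurer", "flood", "notice"]

print(popularity_search(cases, query, top_k=2))  # ['Case A', 'Case B']
print(boolean_and_search(cases, query))          # ['Case C']
```

The rarely cited case never reaches the “first page” of the popularity-ranked list, yet it is the only result of the narrower Boolean search.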
My advice is primarily for those who keep track of appellate law or regularly prepare complicated trial-level memoranda. If you know the basics of most legal issues, it is advantageous to primarily use “Boolean logic” searches built from key words or phrases. Sometimes they do not “hit” on the first round, returning an overly broad or narrow set of results. But usually by the third attempt, the search is not far off the mark.
Presently, I recommend algorithmic searches as the primary tool only in narrow circumstances. The first is the rare circumstance where it is unclear where to start, on an obscure or very new topic area where I’d be surprised if there were any caselaw, which may also justify first checking secondary sources like law review articles. The second is when a “Boolean-logic” search does not provide a satisfactory answer directly on the legal issue, after tracking conventional leads from cases, statutes, and secondary sources. In that circumstance, after Boolean logic searches have thoroughly explored the topic, I will “double check” that I got the best answer by doing a more “natural wording” search with the algorithm. If things check out, the “closest case” my office located is usually one of the top choices the algorithm came up with as well.
That’s not to say “Boolean-logic” searching is infallible. In Part Two on the topic of modern electronic legal research, I will discuss, with two specific closed appellate cases as examples, what I term “Rosetta Stone” cases, where the “search term” has evolved through the ages and the case can therefore remain hidden.
Michael Wein is an attorney in Greenbelt, Maryland, whose practice concentrates on appellate, civil, and criminal litigation. He can be reached at weinlaw@hotmail.com.
[1] The citation for the paper is Susan Nevelow Mart, Joe Breda, Ed Walters, Tito Sierra & Khalid Al-Kofahi, Inside the Black Box of Search Algorithms, AALL Spectrum, Nov.-Dec. 2019, at 10, available at https://scholar.law.colorado.edu/articles/1238/.