The Legal Hallucinatory Detectorist
Events this past week have highlighted the need for law firms employing ChatGPT technology to also employ a Legal Hallucinatory Detectorist. Alternatively, perhaps, simply not to use ChatGPT at all.
Legal Prompt Engineers
The initial buzz around ChatGPT led to many saying you had to employ Legal Prompt Engineers. Without the correct prompts the technology would be useless. Large businesses with more money to burn than common sense, such as Deloitte, have actually gone down that road.
Legal Hallucinatory Detectorists
What no one, so far as I am aware, is saying is that we actually need to employ Legal Hallucinatory Detectorists to identify the mistakes (known in the world of AI as “hallucinations”) that ChatGPT is very prone to make.
Tomorrow’s Lawyers
Neither of these was a job for ‘Tomorrow’s Lawyers’ that Richard Susskind foresaw. Maybe they will appear in the fourth edition of his book!
Hallucinating in Court
What has brought the need for a Legal Hallucinatory Detectorist to a head? It is the revelation that lawyers in the USA have been using ChatGPT for legal research and using its hallucinations in court filings.
This arose from a case involving the airline Avianca being sued after a metal serving cart allegedly struck a passenger’s knee during a flight to Kennedy International Airport. The passenger, Roberto Mata, engaged the law firm Levidow, Levidow & Oberman, who decided to use ChatGPT to do legal research on the subject matter in dispute.
Fake Citations
ChatGPT produced at least six fake citations to ‘cases’ that simply did not exist. These included Martinez v. Delta Air Lines, Zicherman v. Korean Air Lines and Varghese v. China Southern Airlines. A 10-page brief citing these cases was lodged with the court on behalf of Mr. Mata by Steven A. Schwartz of Levidow, Levidow & Oberman.
The court raised the possibility of these cases being fake. Mr. Schwartz, who had been practising law in New York for 30 years, said he had never used ChatGPT before. He “therefore was unaware of the possibility that its content could be false.”
Mr. Schwartz, apparently believing in the magic of ChatGPT, asked it to verify that the cases were real and it told him that they were!
The need for a Legal Hallucinatory Detectorist
However, he knows now not to trust ChatGPT and told the court that he “greatly regrets” relying on the AI platform. He went on to say that he “will never do so in the future without absolute verification of its authenticity”. Thus the need for a Legal Hallucinatory Detectorist!
Lawyer in Trouble
The court said that it had been presented with “an unprecedented circumstance”, a legal submission that contained “bogus judicial decisions, with bogus quotes and bogus internal citations.” A hearing has been ordered to take place on 8 June 2023 to look at potential sanctions against Mr. Schwartz.
Being aware of Hallucinations
I commented recently on the lack of awareness of ChatGPT hallucinating at the British Legal Technology Forum 2023. Other than by one speaker, Jack Shepherd, the topic was all but ignored; the other speakers, on the whole, simply hyped the technology up.
Tamara Box said ChatGPT would be “cataclysmic” to the legal industry, but she was clearly not predicting how cataclysmic it would be for poor Mr. Schwartz!
There is a lot of comment on social media suggesting that Mr. Schwartz should have known the limitations of the technology he was using. It was apparently his fault and not the fault of the technology. With headlines in recent times about ChatGPT replacing lawyers (this is a clear example of how that is not actually going to happen anytime soon) and little noise to the contrary, is it any wonder that some people think ChatGPT is a silver bullet?
It is also interesting that some of those now commenting on this were at one point extolling the virtues of ChatGPT with little, if any, mention of its limitations.
Whether or not Mr. Schwartz knew about the shortcomings of ChatGPT, it is to me incomprehensible that he did not actually look up the cases it cited and read the source material. Had he done so he would, of course, have discovered it was all a lie.
It is being said that Mr. Schwartz’s failure to do so will bring ChatGPT into disrepute in a way it does not deserve: he was incompetent for not checking its output. But surely ChatGPT is also incompetent for hallucinating in the first place? If anything, Mr. Schwartz has done the legal profession a big favour by bringing this fact into the mainstream. Lawyers will now approach ChatGPT with a health warning in mind that has not been amplified enough to date, much as lawyers became more conscious about how to use Zoom after the ‘cat lawyer’ incident.
I often see people suggest that these mistakes are entirely the fault of the user: the ChatGPT interface shows a footer stating “ChatGPT may produce inaccurate information about people, places, or facts” on every page.
Anyone who has worked designing products knows that users don’t read anything—warnings, footnotes, any form of microcopy will be studiously ignored. This story indicates that even lawyers won’t read that stuff!
People do respond well to stories though. I have a suspicion that this particular story is going to spread far and wide, and in doing so will hopefully inoculate a lot of lawyers and other professionals against making similar mistakes.
Who is Really Hallucinating?
An article by Naomi Klein in The Guardian suggested it’s not really the robots who are hallucinating:
Warped hallucinations are indeed afoot in the world of AI, however – but it’s not the bots that are having them; it’s the tech CEOs who unleashed them, along with a phalanx of their fans, who are in the grips of wild hallucinations, both individually and collectively. Here I am defining hallucination not in the mystical or psychedelic sense, mind-altered states that can indeed assist in accessing profound, previously unperceived truths. No. These folks are just tripping: seeing, or at least claiming to see, evidence that is not there at all, even conjuring entire worlds that will put their products to use for our universal elevation and education.
Tricky Hallucinations for the Legal Hallucinatory Detectorist
Apparently, ChatGPT making cases up completely from scratch is not the only problem. It has been known to cite real cases that have no relevance to the subject matter at hand. Paul Hardy pointed out on Twitter:
I used ChatGPT for an Irish employment law issue, and it more or less got the answer right, but supported it with real cases that had no bearing on the question.
Use other Legal AI tools but not ChatGPT
There is an argument going around social media that Mr. Schwartz should not have used ChatGPT because it was never designed for legal research. No one was telling us that at the British Legal Technology Forum a couple of weeks ago!
There are apparently other AI tools, specific to legal, that he could have used. Well, I believe LexisNexis and Westlaw have for years been providing technology for this purpose. To what extent they use AI I have no idea. But no one ever said lawyers were going to be replaced by LexisNexis or Westlaw, did they? ChatGPT was special for a short while. The reality is now sinking in.
If we just have to continue using LexisNexis or Westlaw what was the ChatGPT fuss all about? #bringbackboring
ChatGPTification
Thomson Reuters recently announced that it would be investing $100 million a year in AI. This includes incorporating chat functionality into its research and workflow products by the second half of this year. Thus ChatGPT is coming to, but is not currently part of, Westlaw. Will that introduce hallucinations to Westlaw? Will that actually diminish the usefulness and certainty associated with using Westlaw? Is the rush to incorporate ChatGPT into anything and everything unnecessary and possibly detrimental? This meme (hat tip Alex G Smith) is apt in this regard:
LexisNexis is doing something similar with the introduction of Lexis+ AI. They told Bob Ambrogi that:
the risk of hallucination is minimal with Lexis+ AI because it leverages trusted and authoritative content directly from LexisNexis.
Good Data does not Stop Hallucinations
However, Nicola Shaver highlighted recently on LinkedIn:
Something I’ve seen a few times now from vendors who are incorporating generative AI and large language models into their tech: proclamations that customers don’t have to worry about hallucinations because their product uses good data.
Look, it’s great that they are refining the data leveraged by the LLM, pointing it at structured databases of sound information. That solves part of the problem.
Poor data is not the only cause of hallucinations, though, and good data alone won’t prevent them.
Poor data can cause poor responses, yes. LLM responses are probabilistic, and they will draw upon multiple sources to generate the answer that seems most appropriate in the context, which can give rise to inaccurate or inconsistent answers.
But hallucinations occur because, unless engineered otherwise, an LLM will answer a question even when it cannot find an answer to the question in the data upon which it has been trained, or on which it has been asked to focus for the purpose of answering that question.
If you are seeking to procure this type of technology, don’t accept that hallucinations have been eliminated because the data sources behind the product are sound.
Instead, ask: how, specifically, has the product been engineered to respond when there is no information to support a response from those data sources?
You want to know that the tech has been trained to respond “I don’t know the answer”, or “There is no information on this point in X data set”.
I think LexisNexis and Westlaw need to be specifically asked this question. If they don’t have an acceptable answer to it, we could soon find them, rather than ChatGPT, being quoted as the root of similar pickles to the one that Mr. Schwartz has found himself in.
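To make Nicola Shaver’s point a little more concrete, here is a minimal sketch of the kind of engineering she describes: the model is only asked to answer from passages retrieved from a trusted source, and is made to say that there is no answer when nothing relevant is found. The search_case_law function and the passage objects are hypothetical, and the OpenAI chat API is assumed purely for illustration; this is a sketch of one possible approach, not how Lexis+ AI or Westlaw actually work.

```python
# Minimal sketch: answer only from retrieved sources, otherwise refuse.
# Assumptions: a hypothetical search_case_law() returning citable passages
# from a trusted database, and the OpenAI chat completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

NO_ANSWER = "There is no information on this point in the case law data set."


def answer_from_sources(question: str) -> str:
    """Answer only from retrieved passages; otherwise admit there is no answer."""
    passages = search_case_law(question)  # hypothetical retrieval step
    if not passages:
        # Nothing relevant retrieved: refuse rather than let the model guess.
        return NO_ANSWER

    context = "\n\n".join(f"[{p.citation}] {p.text}" for p in passages)
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer ONLY from the source passages provided, citing them "
                    "in square brackets. If they do not answer the question, "
                    f"reply exactly: {NO_ANSWER}"
                ),
            },
            {
                "role": "user",
                "content": f"Sources:\n{context}\n\nQuestion: {question}",
            },
        ],
    )
    answer = response.choices[0].message.content

    # Belt and braces: if the answer is not the refusal but cites nothing we
    # actually supplied, treat it as unsupported and refuse anyway.
    if answer.strip() != NO_ANSWER and not any(
        f"[{p.citation}]" in answer for p in passages
    ):
        return NO_ANSWER
    return answer
```

Even a guard like this only addresses the missing-sources case; the text that is generated still needs checking. But it is exactly the sort of behaviour a procurement question along Shaver’s lines should be probing for.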
Artificial Ignorance?
It would appear that we may be ending up with Artificial Ignorance rather than Artificial Intelligence.
But people might just say you simply need a Legal Hallucinatory Detectorist.
Billable Hours
However, surely spending hours trawling through potentially fake information that you have asked a program to create, in order to establish through other means whether it is correct or false, is not a good use of anyone’s time? Will a client be happy to pay for that time? Time that may overall take longer to get to an end result than would otherwise have been the case. Maybe that is why Big Law is so interested in Harvey? AI to increase the billable hour, not reduce it?!
I am confused: wasn’t ChatGPT supposed to put something like half the employees in law offices out of work almost immediately?
I have seen it said that we already do this sort of checking with technology. But I don’t think we use technology that we know will make things up: technology that actively produces false information which we then have to check to see whether it is correct or not.
Surely we would be better off using proven, established technology that we know does not hallucinate. We can then supplement that with old-fashioned legal research using law books if necessary. It would appear that using LLM technology (as it currently stands) means we will be doing that in any event, but with added confusion thrown into the mix!
No Legal Hallucinatory Detectorists
You don’t need a Legal Prompt Engineer or a Legal Hallucinatory Detectorist. You just need some common sense.
However, if it helps, follow this flowchart by Aleksandr Tiulkanov on whether or not to use any AI tool that may be known to hallucinate:
Image Credits: ChatGPT flowchart © Aleksandr Tiulkanov; ChatGPT meme © Alex G. Smith; Main header image: Detectorists © Channel X / BBC
Reactions on Social Media to The Legal Hallucinatory Detectorist
As well as the comments in the comments section of this blog below, on LinkedIn the following comments have been made:-
Clare Fraser (citaizen | #legaltech | #accesstojustice):
Poor Mr Schwartz, he’s not the first to do this, he’s the first to have been caught out. While I like job title of LHD as well as the show I think we can reduce hallucinations not by using ChatGPT but by using the OpenAI api within a contextualised environment of case law so that you can have the citations.
Me:
Thanks Clare Fraser. However, as expanded upon in my full blog post, Nicola Shaver thinks it may be a little more complicated than just having contained data. The LLM might still hallucinate. We need to be sure as to what the vendors are doing to prevent that if they can. As Alex Smith points out via his meme you can’t just plug in a ChatGPT API and think everything is hunky-dory.
Clare Fraser:
We’ll see 😉
Me:
We might need a Legal Hallucinatory Detectorist on hand to see 😉
Alex Smith (Global Search & AI Product Lead (Senior Director) at iManage):
Clare Fraser there are lots of different ways to build these services from training own LLM to transactional calls to RAG like Bing. As well as backend “prompt engineering” to try to contain the craziness in the raw open AI service, not that it is raw they have also black boxed a lot of layers of additional stuff. And we’re only talking about the citations here, there’s the rest of the text it’s generating to get QA-ed as well. This example is just the stuff that’s very obvious, like a bad cite that the lawyer didn’t verify against a Westlaw etc and in fact doubled down on in the follow up!!
Fascinating times ahead. I’m looking forward to the QA phase of this. Prefer to get Quality Assurance approaches on that and not silly job titles … but I’m old skool.
-+-+-+-+-+
Antonio Rocha (Data Leader):
The legal profession is still largely affected by the same drivers as from the 50’s.
There’s much absolutism thinking; ego, and “I come from x so im brighter than you” aka my “word” is worth more. 😂
Loads and loads of folks who earn 150-200k being document machines.
No real thinking. No collaborative work. No facts; and a lot of window dressing into facts, as window dressing is a socially accepted way for brains and money to get out of facts constructions where money speaks louder than “law”, or “justice”.
Justice… don’t make me laugh!
What “justice” there is, when large groups of monkeys in suits use 10% of facts and social words to get a money driven version of “truth” into a court? Lady justice becomes a social construct to keep the masses controlled.
There’s little justice whatsoever in most places. Very little use of facts and an agreement on a version that is 99% based on facts, no window dressing to skew the balance of justice. It’s abhorrent to do that.
That’s the only real hallucination worth discussing. The core one. Any others are just byproducts of the core hallucination.
Me:
Thanks Antonio Rocha. How little actually changes in legal is often conveniently forgotten by those thinking sweeping change is about to happen by a new AI (or other) tool. The fact that the new AI tool is now turning out to not be what it was first made out to be matters not. Most lawyers, as you say, will simply be getting on with their day job as they have always done driven by those 1950’s drivers.
Clare Fraser (citaizen | #legaltech | #accesstojustice):
Don’t hold back Antonio Rocha, tell us what you really mean 😉💪
Graeme Johnston (Software to map work – before that a lawyer):
Antonio Rocha: Indeed. That really goes to the heart of the matter.
-+-+-+-+-+
Michelle Thomson (Scottish National Party – Member of Scottish Parliament for Falkirk East):
Thank you for highlighting this – it caught my eye as I am speaking in a debate about trustworthy, ethical AI in the Scottish Parliament this week!
Me:
Thanks Michelle. I hope the debate went well and there is seen to be a need to control the hallucinations!
-+-+-+-+-+
David Kinnear (Entrepreneur | Award-winning Barrister & Legal Counsel | Disputes Neutral (Mediation, Adjudication, Private Hearings) | Legaltech Founder | Publisher of HPC.law | Legal Industry Influencer):
I think we will look back and laugh at the early antics in this space. The pope in a puffer jacket syndrome. But we will be a long way down the track given the pace.
-+-+-+-+-+
Stephen Gold (Principal at Stephen Gold Consulting: Helping Lawyers Build Great Firms):
This is a really excellent post Brian, congrats.
-+-+-+-+-+
Stephen Moore (I help law firms grow – CEO and founder of Moore Legal Technology. Build the law firm you want and earn more 💷 money for the same, or less work, by harnessing the value of the Internet and the opportunities it provides.):
Yes, thanks for writing and sharing Brian Inkster
-+-+-+-+-+
Helen Goldberg referenced my post in another one by her and that produced several more comments:
Helen Goldberg (Co-Founder & COO, General Counsel, In-house Lawyer, Startup & Scaleup Adviser, Mentor, Flexible Working Champion, ex M&A lawyer):
Very interesting posts on using Chatgpt and AI for legal stuff, and this great flow diagram (thanks Aleksandr Tiulkanov) and thanks Brian Inkster https://lnkd.in/eaxQjNRJ and Alex Su (best – only?! but v good legal meme maker) https://lnkd.in/e4FQ54jU
And here’s another! https://www.lawgazette.co.uk/news/lip-presents-false-citations-to-court-after-asking-chatgpt/5116143.article
Me:
That link by you to the Litigant in Person + ChatGPT article in the Law Society Gazette led to me doing another blog post: Litigants in Person and ChatGPT
Amy McDaniel Williams (Structured Finance Partner at Hunton Andrews Kurth LLP)
This ignores the other problem : if you describe a client’s legal issue then you have revealed a client confidence.
Me:
That is another blog post in itself!
Amy McDaniel Williams:
I’m sure!
Daniel (Dan) (Head of Artificial Intelligence and Machine Learning Model Risk at Ally | Executive Influence | Marketing Strategy | Technology | Innovation | Digital Transformation | Banking Regulation | Leadership | Mentoring | Dogs):
Please write that blog!
-+-+-+-+-+
Chris Keen (Head of Emerging Companies at Mishcon de Reya):
Let’s not forget that the first question should be / at least also needs to be “is the subject matter of your question confidential or subject to privilege?”
-+-+-+-+-+
Caryn Lusinchi, FHCA (AI Strategy & Governance | EU AI Act & GDPR | 1 out of 250 Voices in Responsible Tech):
Yes, inaccurate output and harmful misinformation propagation is a huge concern but the iceberg risk is LLM violation of GDPR’s 7 principles, data privacy, confidentiality, copyright and IP. For that, you need a much bigger flow chart.
-+-+-+-+-+
Dana Denis-Smith (🚀 Expand your in-house legal team & expertise with experienced lawyers and paralegals | Tap into Obelisk Support’s large pool of legal professionals):
The main issue with the tool is not just output but input – it draws on a vast yet limited resource bank of only digital content and even that we are not clear on the methodology it uses to ingest – where from and why ? Does it access paywalled research for example? Ask it how it finds its info and you will see how revealing that is…
-+-+-+-+-+
Todd Carter (#artificialintelligence, #multimediasemantics, #interactivemediaexperience, #disruptiveinnovation #digitaltransformation, #partnerships, #linkedmedia, #livestreaming, #dynamicsemanticpublishing):
Right on … ChatGPT doesn’t know when its hallucinating and it does so quite convincingly!
-+-+-+-+-+
Simon Walkden (Passionate about leveraging the power of the cloud to deliver innovative solutions for clients):
OK, so guidance on neurosurgery is probably out. Got it.
-+-+-+-+-+
Mr. Ashley Moore (Certified IEEE AI Ethics Lead Assessor/AI Architect and Hard Law Influencer “Rugged with the perfect mix of Scruffy, Handsome and Mutt with tattoos”. You’re invited to join the LinkedIn AI Governance to Compliance Group):
-+-+-+-+-+
Bill Bice referenced my post in another one by him and that produced several more comments:
Bill Bice (CEO at nQ Zebraworks | Legal Technology Advocate):
The Legal Hallucinatory Detectorist
If you’re in #legal and paying attention to #generativeAI, there’s a good chance you’re already aware of the predicament of the attorney in NY state cutting and pasting answers out of ChatGPT into court filings, including citations to cases that simply don’t exist.
Or, in ChatGPT terms, hallucinations. Which is a great marketing term for simply making things up.
A great takedown of this incident and others from Brian Inkster:
https://lnkd.in/gFtYB3Yp
Nicola Shaver has been educating the legal community on how difficult it is to solve these problems:
“Something I’ve seen a few times now from vendors who are incorporating generative AI and large language models into their tech: proclamations that customers don’t have to worry about hallucinations because their product uses good data.”
Unfortunately, that’s not enough:
“Poor data is not the only cause of hallucinations, though, and good data alone won’t prevent them… LLM responses are probabilistic, and they will draw upon multiple sources to generate the answer that seems most appropriate in the context, which can give rise to inaccurate or inconsistent answers.”
I recently tackled why this is in my article, Believe neither the hype nor the backlash on ChatGPT: https://lnkd.in/gTkYBszF
Flowchart credit: Aleksandr Tiulkanov
Robert Ambrogi adds a valuable take: “All of this leads me to one conclusion. The moral of this story is not a cautionary tale of the dangers of new technology. It is not even a cautionary tale about technological incompetence. Rather, it is a cautionary tale about lawyer competence and why it is so important for lawyers simply to exercise caution and common sense.” https://www.lawnext.com/2023/05/why-the-avianca-bogus-cases-news-is-not-about-either-generative-ai-or-lawyers-tech-competence.html
-+-+-+-+-+
Glenn Dawson (Software Developer | .NET, TypeScript/JavaScript, Python, Azure, Docker, k8s):
I mean, he did ask ChatGPT to verify that they were real cases. That counts as verifying, right?🤣
-+-+-+-+-+
David Rakowski, JD MPA (I’m here to help with content writing and strategy for the legal, B2B and education tech sectors):
Saw that yesterday…perfect.
-+-+-+-+-+
David Gilroy (I lead a marketing agency who specialise in helping law firms make more money from their website. I am looking for M&A (to acquire) opportunities in the agency space.):
Sound advice here. Thanks for signposting this to me Brian Inkster.
I don’t think ChatGPT is going to be much use for lawyers at the moment, it’s more the access to justice angle which I think might be a game changer, basically Do Not Pay on steroids. But I suspect future iterations of GPT will be far more accurate and less prone to “hallucinations” – apparently GPT 4 is much better than the standard version of ChatGPT (which runs on GPT 3.5). Do we know which version that lawyer used which made those incorrect case citations?
Me:
Not sure which version of ChatGPT was involved. I’ve heard it may actually be getting worse, not better: https://twitter.com/Lauramaywendel/status/1659921077156339713. There appears to be some thought that it has been replaced with a distilled smaller model to save costs, and/or that reinforcement learning can actually cause brain damage to the model: https://twitter.com/holdehnj/status/1660315495982329859?s=20.
And on the question of ChatGPT and access to justice see my follow up blog post: https://thetimeblawg.com/2023/06/03/litigants-in-person-and-chatgpt/