
15 posts tagged with "Law"

Discussion of AI uses in law and legal cases.


Damien Charlotin (of Hallucination Database Fame) Makes Great Point About Being Lulled Into a False Sense of Security by Legal AI

· 4 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

“What 1200+ Hallucinated Citations Teach Us About Legal AI,” with Rok Popov Ledinski. https://www.youtube.com/watch?v=K20Kprb7cbo&t=980s

Key quote for me came at roughly 16:20-18:00 [transcribed by me and edited slightly for clarity]:

More generally, a kind of new pattern that you have is that you include the hallucinations in your brief. You trust AI to write legal content, because it works so well in a lot of other contexts. And it even works so well in the legal field. So if you're using a specialized tool, especially if you're using one of the typical traditional platforms you have been using for years. You have been a loyal client to Westlaw for years. They have always been working very well to give you the legal content that you wanted. They've got an AI tool. Why would you not trust this AI tool? Especially when the marketing team of—and I don't want to target Westlaw or anything, that could be any other legal editor—"well, we don't hallucinate because we've got all these kind of methods to make sure we don't hallucinate." Then it's not a question of being aware of hallucination. It's just: you're trusting a tool that tells you that there's no hallucination. You're trusting it because they're telling you, but also because you've got your own habits of them working. And sometimes, if the tool is actually good, 99% of the cases—of the times—it will not hallucinate and you're fine with it. And, you know, people like me come around and say "You should probably try to check it every time." But if you check 99 times and nothing is wrong, at some point you'll stop checking. And that's completely human and completely normal. So I'm not sure the answer to this is "continue checking just in case." I think the answer will be partly technological and partly still a bit of checking and layering and stuff like that.

A Few Quick Observations

  • I appreciate this point, because I’ve been frustrated by the lack of clarity from GenAI software vendors on basic risks like hallucinations and prompt injection, whether they're making business dashboards, or email and scheduling tools, or legal research tools. Across the industry, downplaying these risks has led users to accept the “models are getting better” narrative.
  • What Charlotin describes here is basically a “normalization of deviance” (a phrase popularized after the Challenger disaster). Every time someone skips a verification step and nothing bad happens, it helps rationalize not checking for errors. Even if we accept that error rates are falling, errors will still get through if we don’t check at all; lower error rates just make us more complacent about checking.
  • Commentators on generative AI in fact frequently invoke the Challenger comparison explicitly, e.g., Simon Willison’s 2026 predictions for coding agents.
  • This is addressed by my “glass donut” metaphor, described in my previous post about Sullivan & Cromwell. If AI hallucinations are indeed getting less frequent, but remain nonzero and still catastrophic when they occur, the practical effect may be to lower our guard and breed bad habits that eventually fail us. Charlotin argues that technology probably has to be part of the solution, and indeed I’m experimenting with a word processor to address some of these core problems as a side project. Charlotin’s PelAIkan cite checker is another approach. But he says, and I agree, that training and checking will still be part of the mix.

Stop Saying 'AI Is Like a Junior Associate': Looking at the Sullivan & Cromwell Hallucinations.

· 19 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

In the wake of the Sullivan & Cromwell AI hallucinations in a New York Chapter 15 Bankruptcy case, I am once again seeing a common trope that AI is “like a junior associate.” No, it isn’t!

“Supervising AI is no different from supervising junior lawyers or allied professionals.” So says an article by Clio on the ABA website in Law Technology Today from January 2026. This is not correct! Supervising AI is very different! And Clio is hardly the only source for this claim. Go look at the X/Twitter or LinkedIn discourse around Sullivan & Cromwell and you’ll see it everywhere.

Why do I care? Because metaphors shape how we operationalize our knowledge. This is a bad metaphor that fundamentally misunderstands how LLMs go wrong. In this post, I will give you some better metaphors to remind you of the real risks of using LLMs in legal work and explain why using the “Junior Associate Metaphor” can make those risks harder to spot.

note

Terminology: “LLM” here means large language model, not the legal degree that shares the abbreviation. “AI” is used as shorthand for what is more accurately “generative artificial intelligence,” one type within the broader category of AI. “Frontier models” are the most advanced generative AI models.

If you rely on the Junior Associate Metaphor and use AI regularly, you are likely to repeat the same mistakes as Sullivan & Cromwell at some point. Just look at the mistakes in their Schedule A: altered verbatim quotations, including major additions of wording, minor deletions or additions of words, unnecessary use of ellipses and brackets, and changes in capitalization; apparently hallucinated citations (e.g., “In re Three Arrows Cap., Ltd., 2022 WL 17985951”; more on this in the “Mutant Citations” section below); and arbitrary changes to reporter numbers or years in otherwise correct case citations.

This was a process failure across several documents, not just a simple mistake limited to one filing:

  • Errors in Motion for Provisional Relief
  • Error in Verified Petition
  • Error in Motion for Joint Administration
  • Errors in Motion for Entry of an Order Scheduling the Recognition Hearing
  • Errors in Declaration of Andrew Chissick
  • Errors in Declaration of Paul Pretlove

I explicitly warned about this and even created an educational game to teach attorneys about these exact failure modes.

Silently Making Material Changes In the Last Edit

If you hand a document to a junior associate for a final read-through to check wording, flow, formatting, typos, and grammar, that is what they will do. The cites have already been checked. The quotations are done. The content is there. It is just cleanup. You cannot trust an LLM to follow those instructions, and this is one of the biggest problems with the “junior associate” analogy. I made a game to illustrate this problem; the AI writing problem was relevant in a recent state Supreme Court case; and a recent research paper, “LLMs Corrupt Your Documents When You Delegate,” confirms what I have been teaching in my CLE courses.

info

For more, see this interactive game I made to demonstrate this exact failure mode: The AI "Writing Help" Trap. I also discuss this in my CLE On Demand catalog.

Better Metaphors: Japanese point-and-call and the Glass Donut Machine

Here is my metaphor: AI is like a Japanese train and you are the operator. You point-and-call everything. Instead of pointing at instrument readings, you point at the material details (quotations, parties, citations), say them out loud, and navigate to each source one last time. This is the very last thing a human does before the document is submitted. (I really do mean last, as in you are the final person touching the document; no Copilot or Claude “just clean this up” afterwards. LAST MEANS LAST.)

Now, the Clio ABA article I mentioned earlier also talks about pilot checklists, but it barely develops this as a metaphor. Even though the article is ostensibly about the checklist, it leans heavily on the inapt Junior Associate Metaphor. And the Clio checklist itself points out things that human junior associates do not do, like blending jurisdictions (more on that in “mutant citations” below).

Do you find this level of deliberate, checklist-based review too tedious and time-consuming? Then here is my second metaphor. I have a machine that makes delicious donuts instantly and for free. The only problem is, I think one in every few batches has broken glass in it. That’s OK, you just have to check… by tearing apart every single donut to look for the tiniest bits of glass? But that is very time-consuming and ruins the donuts. “I guess we’ll just keep eating them, since no one has died yet.” Or maybe (crazy thought) you stop using that machine and go back to making donuts the old-fashioned way.

Now, I’m absolutely not saying AI has to be perfect, nor am I anti-AI. But an AI workflow saves time if and only if it still saves time after you account for the time it takes to properly check outputs, especially for catastrophic failures. In Sullivan & Cromwell’s situation, In re Prince Global Holdings Limited is a multi-billion-dollar pig-butchering-scam bankruptcy, and it is possible the case could be delayed due to the hallucinated citations (DOJ indictment of Prince Group chairman; S&C filing, April 18, 2026).

You Cannot Just Check Formatting As a “Dipstick Method”

If an associate is mistaken and doesn’t know what to do because they’re out over their skis, they are probably also going to make formatting errors, or factual errors, or come and ask questions, because they do not know what they should do. Their draft is not going to look right. It is not going to sound right. The associate does not have the knowledge or the skill, and will hit certain limits.

To check your car's oil, you use a dipstick. It doesn’t directly show you the level of the oil, but rather how much oil is on the dipstick. Usually, this is a good proxy for what is in the car. Before LLMs, a senior attorney might have gotten away with skimming a human-prepared document for formatting. The outward signifiers of correctness (italics, special characters, spelling Latin words, using legal terms of art) would all correlate closely with the associate’s actual background knowledge and the diligence that went into preparing the document.

That formatting check might have worked (most of the time) prior to ChatGPT for human-prepared documents. That does not mean that was the right way to do it. It is not a defense of that method. I am merely describing what may have happened. But you cannot use this Dipstick Method with LLMs. It is as if the LLM regularly dunked your dipstick in a tub of oil before putting it back in your car. Now, the dipstick is back in your car, but it is not a reliable indicator of your oil levels. And having someone remind you “you have to check it” is not helpful advice.

Unlike the associate, generative AI has a jagged frontier of knowledge: LLMs excel in some difficult areas and fail foolishly in other, simple areas. So when what the senior attorney actually checks is not the correct thing itself (e.g., the quotation, the citation) but proxies for it, like formatting (Is this italicized? Is the section symbol used? Are there reporter numbers?), we see senior attorneys in these AI hallucination cases time and again signing off on work that looks good. Sometimes not just passable, but really good-looking. Again: it “looks good,” but it is not correct.

Sometimes people defend LLMs by saying lawyers were cutting corners on review with these cursory formatting checks long before LLMs, or citing cases without having read them closely. This is the wrong conclusion. These tools are becoming ubiquitous and are causing what Damien Charlotin aptly called “polluting the epistemic commons.” The rate at which these errors spread is growing, and they will compound as LLMs cite each other's errors, embellish errors, and create their own novel errors. This is a positive (but bad) feedback loop of polluted legal information as these citations get baked into ostensibly “good” case law.

Before LLMs, the “Dipstick Method” was probably good enough—or at least good enough that the people doing it did not get caught. But LLMs are laying bare some old bad practices that were not OK to begin with and are much more harmful now. Now that we have LLMs, it is more important to hold the line on the older rules that should have been followed all along. To use a different car analogy, drunkenness was a problem before widespread private automobile ownership, but drunk driving significantly compounded the damage a drunk could do to others.

S&C apparently had the right policy, but...

S&C stated that its “training repeatedly emphasizes the risk of AI ‘hallucinations,’ including the fabrication of case citations, misinterpretation of authorities, and inaccurate quotations. It instructs lawyers to ‘trust nothing and verify everything’ [emphasis added] and makes clear that failure to independently verify AI-generated output constitutes a violation of Firm policy. The training also reviews the significant consequences of AI-related errors in various cases.”

Further, “the Firm’s Office Manual for Lawyers...provides that lawyers ‘must independently check all answers, case citations, and other information or work product received from an AI Program for both substantive and non-substantive accuracy.’ [emphasis added] The policy further states that no communication may be sent to a court, regulator, client, or other external party without the exercise of appropriate professional judgment and oversight. Notwithstanding these safeguards, the Firm’s protocols were not followed here.”

Trusting nothing and verifying both substantive and non-substantive accuracy seem to fit my advice. They seem to be saying not to let proper formatting and fancy wording trick you. Yet that was not enough. Why? I think there are likely two reasons: (a) attorneys are still not exposed to the specific ways that LLMs hallucinate, and without exposure to convincing hallucinations that are hard to spot, naive users can be overconfident in their ability to spot them; and (b) we default to metaphors, and the dominant metaphor for supervisors is the “intern” or “junior associate,” which is wrong in specific ways that lead attorneys to miss LLM hallucinations.

Mutant Citations

Schedule A from the Sullivan & Cromwell filing enumerating AI citation errors across multiple documents in In re Prince Global Holdings Limited

LLMs will hallucinate based on nothing (ungrounded hallucinations). A human might do this occasionally. They might bluff. But they won't be able to sustain it with endless detail. An LLM can indefinitely and arbitrarily extend a fake scenario, because that is what LLMs do. So if you keep pushing the point, the associate will give up. The LLM won't.

Most of the attention gets focused on purely hallucinated cases: cases that simply do not exist. But a bigger risk is what I call, borrowing the words of a judge in the Eastern District of Michigan, “mutant citations.” These are amalgamations of real cases stuck together by the LLM. Maybe the name of a real party from one real case and a second party from a different real case. Maybe a federal case mixed up with a state case in the same state or region. Maybe the name of a real case but the wrong jurisdiction and the wrong year.

When a human cites a case and the senior attorney sees it has Bluebook formatting, a Westlaw reporter number, proper italics for Latin, uses legal terms of art, has quotations, etc., the assumption is that the document must be correct. “The associate must have actually looked at the cases. Nobody would just make that up.” A human wouldn't.

Remember that hallucinated case I mentioned from Sullivan & Cromwell, “In re Three Arrows Cap., Ltd., 2022 WL 17985951”? Well, Three Arrows Capital was a real cryptocurrency company that collapsed, along with several others, in 2022. It is a realistic party with a realistic year that would be relevant to a bankruptcy involving pig butchering scams, which typically transact in cryptocurrency.

And, of course, it made up Westlaw numbers. For comparison, take a look at this case from December 27, 2022 on the DOJ website: Woodward v. USMS, No. 18-1249, 2022 WL 17961289 (D.D.C. Dec. 27, 2022) (Contreras, J.). See the DOJ OIP summary. As far as I can tell, there is no real case with “2022 WL 17985951”; the results on Google show Damien Charlotin’s AI Hallucination Database and a link for the Sullivan & Cromwell filing in PACER. But it is pretty close to the numbers at the very end of 2022. It would be impressive, even shocking, for a junior associate in 2026 (who, for all we know, might’ve been in undergrad in 2022) to shoot from the hip with a WL number in the right ballpark for an end-of-2022 case, but not an actual case.

In fact, it gets more specific. The real Three Arrows Capital case was a Chapter 15 Bankruptcy case under the same judge (Chief Bankruptcy Judge Martin Glenn) as the Prince Group case, both in the Southern District of New York.

Remember that Clio ABA article I mentioned earlier? The checklist has reminders that hint at the possibility of mutant citations, even as the article keeps pushing the unhelpful analogy of supervising a human employee.

6. Confirm the correct jurisdiction. General AI frequently blends jurisdictions. Ensure the content reflects the correct federal, state, provincial, or local law, including terminology and standards.

7. Look for bias or mischaracterization. Check that cases are accurately described and not selectively framed. Generative AI can reflect bias from training data if left unchecked.

But does such a checklist give you the remotest sense of how almost-correct the AI can be when it hallucinates? You could “confirm the jurisdiction” and still not catch that the particular case citation as written was a hallucination.

If humans erred the way AI does, we’d call it “fraud”

A human associate typically does not say, “Oh, I found an on point case, but it's in the Fifth Circuit. We're in the Seventh Circuit. So what I'm gonna do is I'm just gonna change the citation, to this real case, and say that it was in the Seventh Circuit. And wherever it says, ‘Texas,' I'm gonna say, 'Illinois.' And wherever it says, ‘Louisiana,' I'm gonna say ‘Wisconsin.' And I'll make up some names and change the details to be consistent with that new geography.” A junior associate does not do that. An associate typically does not have the capacity to fabricate that many details. For LLMs, it’s what they do.

However, if the associate did do that, you would take it as alarming, fraudulent behavior. Most likely, you would not call it a simple “mistake.” It is probably something you would want to discuss internally at your firm for disciplinary measures, or potentially even refer to the Bar. That is not the kind of deceptive behavior you want from your employees.

You probably don’t need to worry about that, though. Somebody who is phoning it in and not really doing the work is probably not going to go through the trouble of writing a detailed fake scenario like this. An associate will more likely do a bad job that looks like a bad job. Or ask for help or do a good job.

Arbitrarily detailed: Surprisingly voluminous falsehoods

But for the LLM, it is trivial to make up details. That is why in the infamous Mata v. Avianca case, the attorneys not only cited fake cases but also provided the supposed text of those fake cases to the Court after the opposing counsel (Avianca) said they could not find the fake cases supposedly about airlines (their area of expertise).

If, like the Mata attorneys, you ask an LLM to provide the text of a case that does not exist, it might continue to bluff and hallucinate the entire text of the case. The more you drill down within the LLM itself, the more detail it gives. But it’s all fake! You have to remove yourself from the AI tool and actually navigate to the primary source, not a chatbot or AI summary (and I do include Google and the legal AI tools in this list). A junior associate simply won’t spit out pages and pages of six fake cases off the top of their head when pressed to defend a bad citation.

Verisimilitude: Surprisingly accurate falsehoods

Another surprising thing I have observed is that when LLMs make up cases, the reporter numbers can be shockingly close to the right thing. They might be close to an unassigned number: one that does not correspond to a real case, but that would fall into the right range for the jurisdiction, the state, and the year, if that case had existed.

A junior associate is not going to be able to do that off the top of their head. They are not going to be able to make up a number. They are not going to be able to say—and I'm going to make this up off the top of my head, so it's not going to be right—“847,” and know that that 847 is going to fall between 845 and 850, which are both real cases from the same state and year.

An associate is not going to have the encyclopedic knowledge and recall to do that. And like I said before, it is not a human mistake. If a human were capable of doing that, we would call it deception or lying, not a mistake. But this is the kind of mistake that frontier LLMs make.

The time it takes to check

It can take a lot of scrutiny to verify or negate these types of hallucinated citations. A hallucination will not necessarily be obviously wrong, because it may not be absurd. It could be very close to the real thing, which is bad. Obviously wrong things are, in some sense, actually better.

This is why I frequently say that I don't like it when people say “the AI models are getting better.” Rather, I reframe it as “the models are getting more capable.” If a model is old, an incorrect case citation might have said something like Smith v. Jones (2009). That looks hallucinated. It could be a real case, but it is vague: two extremely common surnames and no jurisdiction.

If instead it had the name of a real individual (first and last name) versus a real car dealership in Indiana, said the case was in the Court of Appeals of Indiana in 2014, and gave a reporter number that fit that year, it could take a lot of work to determine that the citation was actually an AI hallucination.

When checking, watch out for Doppelgänger Hallucinations

Lastly, I need to talk about my Doppelgänger Hallucination tests, which I first warned about in October 2025. Basically, the LLM is sycophantic and wants to tell you what you want to hear. Even a “question” as simple as a bare case citation with no other wording (just a properly formatted fake case) may result in the LLM providing a description of that case.

If you use other AI tools to try to cross-reference your original source, you may become more convinced that the fake case is real if you are following the Junior Associate Metaphor. Maybe you google it (AI Overview), or check Gemini, ChatGPT, Claude, Perplexity, etc. Maybe some of those AI tools are telling you it is a real case. Maybe not all of them, but maybe multiple of them are saying it is real. The details they tell you about that case are probably not consistent with each other (since they are hallucinating independently). But you might go even further down the rabbit hole of “what's going on with this case?” And in the end, it's not real.

A lot of times, the LLM likes to tell you it was an “important” case or a “significant” case. It will make up descriptions of the fake case and tell you it was a major case that perhaps “set a precedent” or “is frequently cited.” None of that is true.

In my Kruse v. Karlen test, I used the 22 known fake citations from the reference table. Searching for only the citations in quotes, the Google AI Overview gave an inaccurate answer describing the fake case as real—despite having access to a source explicitly calling it fictitious—roughly a quarter of the time; this rate rose to over half of the time if the user opted for “AI Mode.”

This phenomenon of another LLM corroborating a fake case is not a hypothetical. This has happened to real attorneys where they've double-checked and triple-checked using multiple LLM-powered tools.

An associate does not usually do that. Even if they did, they cannot sustain that level of embellished fabricated detail off the top of their head. It's just not how people work.

AI is not like a junior associate. Treating it like that can result in a three-page single-spaced list of corrections, with possibly even more errors being spotted by opposing counsel.

Does AI Erode Legal Reasoning? A UMN Law Study Finds That It Did Not For Certain Tasks, With Advice on Specific Use

· 14 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC
Disclosure

I provided feedback on an earlier draft of this article and am thanked in the introduction. I will aim to be fair and candid.

The main concern of the paper (and what I gather to be a focus area for years to come) is the cognitive impact of using generative AI for legal tasks. This requires both short-term and long-term studies; the authors are careful to note that negative effects may appear in longer term use of AI, but that in this particular study, the group that used AI throughout outperformed the group that only had access at the end.

This particular study is a short-term, randomized controlled trial (RCT), such as you might see more frequently in medicine (indeed, one author speculated on X/Twitter that the use of that term may have initially caused public access issues for the article).

I did not personally know Daniel Schwarcz, one of the University of Minnesota Law Professors, prior to reaching out to provide feedback on this study. But I knew of him and have a high opinion of his earlier work on cybersecurity and law. For example, Schwarcz had co-authored an excellent paper, “How Privilege Undermines Cybersecurity,” published in the Harvard Journal of Law & Technology in Spring 2023, which my group in the Dept. of Homeland Security’s Public-Private Analytic Exchange Program (AEP) read and cited in our work on Ransomware Attacks on Critical Infrastructure.

Study Design

University of Minnesota Law School professors Nick Bednar, David Cleveland, Allan Erbsen, and Daniel Schwarcz ran a randomized controlled trial involving approximately 100 2L and 3L students. The study was published April 5, 2026 on SSRN: Artificial Intelligence and Human Legal Reasoning. Strictly speaking, the comparison was not an “AI v. Not AI” group as much as an “AI from the outset” group v. an “AI as a final editor only” group.

This experiment used Google’s Gemini 2.5 Pro, which I will discuss specifically in a later section. The experiment is relevant even for attorneys not using Gemini directly, because Gemini is the LLM behind Google AI Overviews and a popular API model in legal AI tools (e.g., Westlaw’s CoCounsel, based on references to “Thomson Reuters AI third-party partners, such as OpenAI and Google…”).

Participants completed four sequential tasks:

  1. Synthesis Task (AI for the AI-exposed group only):

The synthesis task was designed to test whether using AI can help lawyers synthesize legal sources addressing unfamiliar subjects. We cast each participant in the role of a law firm associate who received an email from a partner asking them to summarize a legal rule based exclusively on five supplied sources. The partner explained that: “Your memo should outline the elements of the rule and any exceptions, providing a framework for how a court would approach the legal question. In other words, don’t merely summarize the sources; synthesize them into a summary of the rule that indicates how the elements fit together.” Participants had up to 75 minutes to read the packet and complete the memo. The control group was instructed not to use AI for this task, while the AI-exposed group was instructed to use Gemini 2.5 Pro “to assist you in writing the assignment.”

  2. Comprehension Task (closed book, no AI for either group): six moderately difficult multiple-choice questions in ten minutes, without access to either the source packet or AI.
  3. Application Task (access to their prior synthesis memo, no AI for either group):

That task presented participants with a follow-up email from the partner who had assigned them the synthesis task, instructing them to write a memo applying their knowledge from the synthesis task to a new set of facts… “identify strengths and weaknesses in the client’s position, recommend arguments that the client should make, and rebut counterarguments.” Participants had up to 60 minutes to complete the memo.

  4. Revision Memo (access to prior application memo, AI for both groups): revise the second memo with Gemini in 20 minutes.

Specifically, “[e]ach task related to a problem involving servitudes that burden personal property.”

Figure 1 from the study: density plot showing overall score distribution on the synthesis task. The AI-exposed group (orange) scores markedly higher than the control group (blue); vertical lines show each group's mean.

What They Found

AI helped on the synthesis task: as expected, students with AI access produced better memos and finished the initial task faster.

Early AI use did not diminish comprehension

contrary to our preregistered hypothesis, AI exposure at this initial stage did not diminish downstream comprehension of the underlying legal principles. To the contrary, participants who used AI on the synthesis task outperformed the control group on the later application task even when neither group had access to AI.

The full AI group outperformed the control (AI at the end) group. But the authors note that with long-term use, skills may atrophy. They warn that everyone (especially new lawyers) may lose or fail to develop skills if they don't learn

"to sit with a hard question, to trace an argument through its premises, to recognize when doctrine is uncertain and when it is settled" and learn to explain that reasoning to clients and to judges and lawyers, rather than delegating to AI.

"Leveling effect": AI use by high-performing individuals may degrade work product, while improving the work of the lowest performer. It didn't change the overall ranking, but made people more average. The experiment involved new information, not areas of expertise; however, the authors’ advice was to typically use AI where you are able to check the work because of your expertise.

What This Means and Connection to My CLE Advice

Like the authors, I was surprised by the results. However, the way the AI was used generally fits with the advice I give in my CLE on how to use AI responsibly.

Current CLE accreditation

Approved for CLE credit in the following states:

  • Iowa: Generative Artificial Intelligence Risks and Uses for Law Firms (Activity ID #437570) and AI Gone Wrong in the Midwest (Ethics) (Activity ID #437573). 1 hour general and 1 hour ethics.
  • Illinois: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). AI Gone Wrong in the Midwest also received approval for Professional Responsibility credit.
  • Virginia: Generative Artificial Intelligence Risks and Uses for Law Firms. The general course is approved; the Ethics course (AI Gone Wrong in the Midwest) application is still pending.
  • Kansas: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). 1 hour general and 1 hour ethics.
  • Nebraska: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). 1 hour general and 1 hour ethics (called "Professional Responsibility" in Nebraska).
  • North Carolina: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). The general course is approved for Technology credit; AI Gone Wrong in the Midwest is approved for Ethics credit.

We have applied in Minnesota and Virginia. As accreditations are approved, we will keep this list updated and work with our LMS provider, CLE Hero, to get the updated information and certificates on the website.

This block updates automatically as the list of CLE accreditation states changes.

Summarization as Triage, Not Replacement for Reading

In my CLE “Generative Artificial Intelligence Risks and Uses for Law Firms,” I note that you can use AI to summarize documents for triage when you have limited time. The students in the synthesis scenario only had 75 minutes to read and synthesize 5 documents.

However, I also warn that reading an AI’s summary of a document is not the same as having read the document, especially due to hallucinations. Hallucinations are not limited to making up fake case citations; they may also include fake quotations and improper summarization of the holdings of cases, for example.

Experts Should Check AI Output

I note throughout my CLE and in the public talks that I give (such as my recent talk with ACAMS DC; you can watch a recording here) that the person most familiar with the material should check the AI output.

For reports, this means if AI is used to write an executive summary, it should be a draft executive summary that is then reviewed and edited by the original author of the longer piece. It should not be a summary generated by a lazy reader who can’t be bothered to read the full document. The former can root out hallucinations and distortions of the writer’s intent. However, I would warn that even this use could shape the writer’s focus by making them highlight parts of the paper other than what they would have chosen had they written the executive summary from scratch.

Experts should check the output. AI can be very wordy and very persuasive. It can write things that appear to be correct. The user should not be learning something new when they are reviewing AI output, because the AI may persuade them. I warn in my CLE that sometimes AI is so persuasive, its confident hallucinations may even cause you to second-guess yourself about something you know well.

Warning About “Writing Help” at the End and Don’t Grab AI as a Last Resort

The authors recommend:

“A final principle suggested by our experimental results is that lawyers should avoid using AI to complete legal tasks under artificially tight time constraints or when cognitively fatigued.”

I frequently note that attorneys should learn about generative AI risks and responsible use now, even if they do not currently use it. Don’t wait until you feel like time pressure is forcing you to bail yourself out with AI:

  • Did not go well for an attorney in Colorado who used AI and blamed it on an intern in 2023; result: temporary suspension.
  • Did not go well for attorneys in New York in 2023 or an attorney in Iowa in 2025 who used AI in lieu of access to paid legal databases; result: monetary sanctions.
  • Did not go well for an AUSA with a 30-year career in North Carolina who reportedly used AI to “catch up” on a filing in 2026; result: resigned from office.

Empirically, the study found that using AI at the end under tight time pressure led “in many cases [to] a modest deterioration.” I warn in my CLE courses, especially in “AI Gone Wrong in the Midwest,” that even using AI just for “writing help” at the end of a task can be risky. This has caused problems for attorneys and expert witnesses. You can play an interactive demonstration of that risk and read more about it in The AI "Writing Help" Trap.

Gemini and Outdated Models in Academic Studies

This study used Gemini 2.5 Pro, but by late fall 2025 the current Gemini model was Gemini 3 Pro, and now there is Gemini 3.1 Pro (preview). Formal academic research seems to suffer generally from the problem of referencing outdated GenAI models by the time of publication.

This is not the fault of the researchers. The pace of academic publishing is simply too slow for the pace of release of major model updates. I am not the first to point this problem out, and others, e.g., Ethan Mollick, have commented on it frequently.

The most important reason I mention this is that people will claim “AI fails at X task” and cite a recently published academic paper, but the paper itself almost certainly does not cover recent models. If you look closely, it might mention “o3” or “Claude Sonnet 4” or “Gemini 2.5.” That does not invalidate the study. However, the claim that “AI can’t do X” may be demonstrably false today with a frontier model, if you simply logged in and tried it for yourself. Lesson: do not rely on academic studies alone for claims about the limits of what AI can do.

For this reason, I think there is a lot more value in studies like the UMN study looking at AI-human interaction. They teach us how AI works in practice and how human users respond to AI use.

Context Window

Gemini was the first major LLM to have an extremely long context window: one million tokens.1 Gemini’s long context window has made it popular for processing large numbers of documents, which might make it seem well-suited for legal purposes.

Where Gemini Has Performed Well

Gemini is a very capable model on some tests, although it fails on others. For example, Gemini has been the model most capable of identifying coded allusions to sovereign citizen legal ideology among the models I've tested.2 It also currently has the best image editing model in my opinion, and Google has SynthID for provenance verification.3

Gemini 3 was also the first LLM to pass one of my personal benchmarks.4

Where Gemini Has Performed Poorly

However, Gemini also powers the Google AI Overview and Google AI Mode, which have access to Google Search results. Despite access to the best search engine in the world, the LLM-generated summaries can still hallucinate false descriptions of nonexistent cases based on nothing but a bare case citation. They can even produce ungrounded false information that contradicts Google Search results containing the correct answer, e.g., an order to show cause chastising an attorney for citing the nonexistent case. This Doppelgänger Hallucination problem has not gone away with Gemini 3.

On an AI research task in the Gemini app, Gemini 2.5 performed worse than ChatGPT. When I ran the test again, Gemini 3 still failed the test, and tried to persuade me that it had provided a comprehensive answer. I have been warning consistently that AI models are not just “getting better,” but rather they are getting “more capable.” One way this manifests itself is in LLMs becoming more persuasive about covering for their own faults.

Even when Gemini performs well, its performance can be “jagged”: impressive on one detail, hallucinating on the next in the same task.5

Bad Data Training Policy

However, I would warn attorneys to consider carefully what they put into the consumer version of Gemini if using it for work purposes, such as the non-public discussion of client position and possible strategy described in the Application Task in this experiment. Google’s Gemini has one of the worst data-training opt-outs among the major U.S. labs if you are using the consumer version.

Conclusion

I always try to keep up with the latest LLM research, so that I am offering fresh and accurate advice to my clients. That being said, I am also not chasing the latest fad and trend. I focus on core principles around accuracy, end-to-end processes, and a realistic understanding of how human users actually relate to their AI tools. That’s why I really like this study design and encourage the authors to continue with projects like this. It is also why I stand by the advice I provide in my CLE, despite the field of GenAI changing so dramatically every couple of months.

How Midwest Frontier AI Consulting Can Help

That reading informs the governance advice I give to law firms: if you'd like to put it to work for yours, schedule an introductory call.

You can take my CLE on demand on CLE Hero. Find out more details here.

Footnotes

  1. Tokens are words, parts of words, or even single characters: the unit LLMs use when processing text. One million tokens in English is roughly 750,000 words, longer than War and Peace.

  2. I first encountered this issue when writing about the case Thomas v. Pangburn (S.D. Georgia) but have found additional examples for subsequent tests, which I have not written about publicly.

  3. I will be writing about SynthID more in a forthcoming series.

  4. Specifically, a test involving an obscure Moroccan Arabic word. I discuss the utility of personal, non-public benchmarks in my CLE.

  5. When fact-checking a U.S. federal court district map, Gemini correctly flagged the oft-overlooked inclusion of Yellowstone National Park in the District of Wyoming and Tenth Circuit (rather than the adjacent Montana/Idaho districts in the Ninth). That subtle catch was impressive, but in the same response it hallucinated a second "error" that was clearly not an error.

There Really Was a Dinosaur in Minneapolis

· 2 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Since everyone is sharing April Fools’ Day jokes on social media, many including AI-generated images, it reminded me that we’ve hit a believability inflection point. AI image generation is so good we don’t believe real things anymore.

The ICE operation in Minneapolis earlier this year brought a lot of social media commentary. I’m sure you remember seeing it. My goal with this blog is not to issue “hot takes,” but I still want to address how generative AI is impacting our everyday lives. The thing I specifically remember is multiple Facebook comments (from different political perspectives) attempting to discredit original photos from Minneapolis, claiming the photos were obviously AI fakes because there was a dinosaur on a car in the background.

Except they weren’t fake. Someone near the location just happened to be an apparent fan of Jurassic Park and had a statue of a dinosaur on a car. This statue was located near one of the ICE-related shootings in Minneapolis; hence, it was in a number of photos posted online. Because people have rightly become so concerned about falling for deepfake images, they cried “AI.” I won’t share information about the house, but I can confirm it from corroborating publicly available information. It wasn’t AI. There really was a dinosaur (statue) in Minneapolis in the background of those real photos.

Facebook commenters debating whether a Minneapolis news photo was AI-generated — the "suspicious" element turned out to be a real Jurassic Park T-Rex statue on a car

Joking is Fun, But Diminishing Returns on Building Awareness

I used to occasionally mess with my family with joke AI images to remind them that AI image generation was advancing and that they shouldn’t trust everything they see on social media. But once Nano Banana, Google’s image generation and editing model in Gemini, was released, this came to its logical conclusion. AI images can look super realistic. There isn’t really anything more to say and the images being shared today show it.

I don’t begrudge anyone their jokes. But for me personally there isn’t much point in doing this “for awareness” anymore, so I don’t do it. Instead, later this month, I’ll go into what options exist for AI detection and output watermarking, as well as the false promises and challenges that come from AI detection.

The AI "Writing Help" Trap: A Game to Show How AI Can Silently Alter Legal Documents

· 12 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

I made a game to help attorneys understand the situation of the attorney in Kosel Equity v. MacGregor, a recent Connecticut case: using other tools for research,[^1] then generative AI for writing assistance, and how it may be difficult to spot when “AI…intuitively [makes] changes to the brief.” This is an interactive demonstration of the unexpected risks of asking generative AI to "clean up" a legal draft.

I actually used generative AI to help me write the code for this game, which puts me at some of the same risks, since the text is mixed in with the code. So, I reviewed the output and manually edited the resulting files to remove things when the coding agent had gone beyond what I had told it to say. This proved to be pretty frustrating at times, but was a helpful meta-lesson from the project. If you do spot errors in the game, please feel free to email.

Game Scenario: Make the Draft Motion “Better”?

I frequently warn, including in my Ethics CLE, that LLMs can introduce significant errors even though the “cleaned up” draft might look better at first glance.

The game scenario is simple and realistic: an attorney has a motion for summary judgment with some formatting issues, extra spacing, and a few typos. The attorney either pastes the draft into an AI assistant, says "clean this up,” and pastes the result back into their word processor; or uses an integrated LLM (like Copilot in Microsoft Word, Apple Pages, Grammarly, etc.) to change the draft in place; or a paralegal or intern or someone else the attorney supervises does this without the attorney’s knowledge.

As you will see in the game, the AI does clean up the draft. But it also silently makes material changes that are not correct.
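If you want to surface that kind of silent change in your own documents, a plain text diff makes it impossible to hide. Here is a minimal sketch (this is not the game's actual code, and the file names are hypothetical) using only Python's standard-library difflib:

```python
# Diff the draft you gave the AI against what came back. Every changed line
# is surfaced, whether it is a harmless typo fix or an altered quotation.
import difflib

with open("draft_before.txt") as f:
    before = f.read().splitlines()
with open("draft_after_ai.txt") as f:
    after = f.read().splitlines()

for line in difflib.unified_diff(before, after, fromfile="before",
                                 tofile="after", lineterm=""):
    print(line)
```

A diff like this tells you what changed, but you still have to judge whether each change is material; that judgment is what the game trains.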

How to Play (Scroll Down for Game)

  • First, you’ll see the “before” draft motion.
  • Second, you’ll “fix” it with AI and see the “after” version of the draft.
  • Third, you’ll have an opportunity to try to click the spots in the motion where the AI made changes it shouldn’t have.
  • Then you’ll get some suggestions for a different approach and what went wrong.

Real-World Parallel: Kosel Equity v. MacGregor (Connecticut Supreme Court, 2026)

The scenario portrayed in this game is not hypothetical. I discussed it in my Ethics CLE, “AI Gone Wrong in the Midwest” (with content from December 2025), as it related to an expert witness, and specifically warned about these risks. In February 2026, the Connecticut Supreme Court ordered counsel for the appellant in Kosel Equity, LLC v. MacGregor to respond to questions about AI use after an errata sheet was filed to correct errors in the appellant's brief.

Counsel for the Appellant used Lexis for the legal research in the drafting of the brief. After the initial brief was drafted, Counsel used ChatGPT to assist in the organization and formatting of the content of the brief. This assisted with analyzing the brief to avoid duplication of arguments. After the initial drafting, I used AI to further assist with the organization, formatting and refinement of the brief, in particular, to assist with compliance with word count restrictions. It was not used as a substitute for legal research or an alternative to Counsel’s own work product. MEMORANDUM, February 19, 2026

And:

AI was also used to assist in reviewing the content of the brief in particular to comply with the word count restrictions. The errors identified in the errata sheet were corrected by manually checking the brief’s quotations and formatting against the underlying sources. Unfortunately, Counsel did not notice that AI had intuitively made changes to the brief prior to filing. MEMORANDUM, February 19, 2026

Doppelgänger Hallucination Test Recap

· 2 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Back in October 2025, I coined the phrase Doppelgänger Hallucination after a hunch about LLM hallucinations led me to find that both Google AI Overviews and Perplexity "confirmed" the existence of nonexistent legal cases.

Doppelgänger Hallucination: when one LLM, asked about a second LLM's ungrounded hallucination without additional context or prompting, provides a false confirmation of the hallucination, possibly embellished with additional details. Not to be confused with both LLMs retrieving answers from the same bad source of data.
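For readers who want to reproduce the test, here is a minimal sketch of the procedure. It is not a working client: ask_llm is a hypothetical stand-in for whichever chat tool or AI search feature you are testing, and the citation is the fake one from Kruse v. Karlen shown below.

```python
# A minimal sketch of the Doppelgänger test procedure.
FAKE_CITATION = "Weber v. City of Cape Girardeau, 447 S.W.3d 885 (Mo. App. 2014)"

def doppelganger_probe(ask_llm) -> str:
    # The entire prompt is the bare citation: no question, no added context.
    return ask_llm(FAKE_CITATION)

# Stand-in "model" so the sketch runs; swap in a real client to test.
grounded_stub = lambda prompt: f"I could not verify any case matching: {prompt}"
print(doppelganger_probe(grounded_stub))

# A grounded model should answer like the stub above. A Doppelgänger
# Hallucination instead "confirms" the case, often embellished with details
# ("important," "frequently cited") that no source supports.
```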


Google AI Overview confidently describing the fictitious case Weber v. City of Cape Girardeau, 447 S.W.3d 885 (Mo. App. 2014) — a case that does not exist, cited in Kruse v. Karlen.

I’ve gone through:


AI Gone Wrong in the Midwest

I have onboarded with my software vendor to make my CLEs available on demand, and I am working on accreditation in several states, which I will announce as approvals come in.

I discuss the Doppelgänger Hallucination Test in the context of In re Turner in “AI Gone Wrong in the Midwest.”

Yes, Claude Code is Amazing. It Also Still Hallucinates. Both Facts Are Important. My Christmas Map Project with Opus 4.5.

· 13 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

In this first week of January, everyone seems to be showing off the vibe-coding projects they cooked up on Claude Code over winter vacation. Claude Code itself isn't new, but with Opus 4.5 being so much more powerful, something just clicked for a lot of people (myself included). For me, it turned a lot of "when I have a couple days" projects into "well that's done, let's do another."

I am mainly going to describe in this post how I updated the map for my website, along with the hallucinations I saw along the way. I'll also talk about how prior programming experience and domain expertise in geographic information systems (GIS) helped with dealing with these hallucinations.

But first, I wanted to tick off a few other projects I did recently, just since my end of 2025 post.

  • I updated my transcription tool to support many more file types than just MP3 and added a GUI.
  • I got Claude Code to completely modernize Taprats, a geometric art Java program from Craig S. Kaplan. It appears to work just like the original so far, but I'll test it more before writing about it.
  • I built a local LLM spoiler-free summarizer of classic books. It increments to the chapter you left off on.

And more stuff. It's very exciting. I get why people are worked up about Claude Code.

But that's why it's important to be reminded of hallucinations. Not to dunk on Claude Code, but to keep people grounded and maintain skepticism of AI outputs. You still have to check.

Safety First

I do not dangerously skip permissions. I know it can be exciting to get more out of AI agents. But the more agency you give it, the more harm it can do when it either goes off the rails or gets prompt injected to be a double-agent threat.

Claude's Hallucinations

  • Opus 4.5 hallucinated that there were two federal districts in South Carolina to fix an undercount.
  • Mixing up same-name counties (not exactly a hallucination, actually a common human error).
  • Claude removed Yellowstone National Park, a few military bases and a prison from the map (rather than shifting district borders from one district to another).
  • "Iowa Supreme Court Attorney Disciplinary Board" shortened to "Iowa Supreme Court," making it sound like an Iowa Supreme Court case.
  • I previously tried to use the tigris GIS package in R as the source of a base layer of U.S. District Courts, but Opus 4.5 hallucinated a court_districts() function (this was not in Claude Code).

The South Carolina Counting Hallucination

I used Claude Code to build the Districts layer from counties and states based on their statutory definitions.

Claude Code with Opus 4.5 didn't initially hallucinate about the District of South Carolina. Rather, when I went back to make some edits and asked Claude Code in a new session to check the work in that layer, it counted and said there should be 94 districts, but there were only 91. The actual cause of the error was that the Northern Mariana Islands, Virgin Islands, and Guam were excluded from the map.

Claude said "let me fix that" and started making changes. Rather than identify the real source of the undercount, Claude interpreted that as just an undercount. So Claude tried to make up for the undercount by just splitting up districts into new ones that didn't exist.

South Carolina district hallucination

Claude split South Carolina in two, starting to create a fictitious "Eastern District" and "Western District" that do not exist. If you just wanted a map that looked nice, without actually having familiarity with the data, you might go along with that hallucination. It could be very persuasive. But the original version, with just the District of South Carolina, was correct: South Carolina has only one district.
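In hindsight, this is the kind of thing a hard validation step catches before an agent is allowed to "fix" anything. A minimal sketch, with hypothetical file and column names rather than my actual project layout:

```python
# Validate the map layer against an authoritative list of the 94 districts
# before accepting any agent-proposed "fix."
import geopandas as gpd

districts = gpd.read_file("districts.geojson")   # hypothetical layer file
map_names = set(districts["district"])           # hypothetical column name

# official_districts.txt: one official district name per line
with open("official_districts.txt") as f:
    official = {line.strip() for line in f if line.strip()}

print("missing from map:", sorted(official - map_names))    # e.g., territorial districts
print("not real districts:", sorted(map_names - official))  # e.g., a fabricated "E.D.S.C."
```

With a check like this in the loop, an invented Eastern or Western District of South Carolina fails immediately instead of persuasively.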

Patchwork Counties

When I initially created this district map, it looked like a quilt: a patchwork of different counties wrongly assigned to different districts.

I don't know specifically why different areas were assigned to the wrong districts. I think the primary reason is that a lot of same-named counties belong to different states. So Claude was probably matching on county names alone and kept reassigning those counties to different districts.

For example, Des Moines is in Polk County in Iowa. But there are a lot of Polk Counties around the country. So if you're not using state and county together as the match key, but matching on the single dimension of county name, you get a lot of collisions. That's something I'm very familiar with from working with GIS.

Somebody not familiar with GIS wouldn't necessarily suspect the reason why, but it would be obvious that the map was wrong.

Since I was able to pretty quickly guess that this might have been the reason, I suggested a fix to Claude (sketched below). That fixed the issues for most of the states.
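The fix amounts to joining on a composite key. A minimal sketch in pandas, with hypothetical file and column names:

```python
# Join county shapes to district assignments on state AND county together;
# matching on county name alone collides on names like "Polk."
import pandas as pd

counties = pd.read_csv("county_shapes.csv")       # columns: state, county, shape_id
lookup = pd.read_csv("district_assignments.csv")  # columns: state, county, district

# validate="many_to_one" fails loudly if a (state, county) pair is ambiguous.
merged = counties.merge(lookup, on=["state", "county"], how="left",
                        validate="many_to_one")

unmatched = merged[merged["district"].isna()]
print(len(unmatched), "counties with no district assignment")
```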

Uncommon Problems with the Commonwealth of Virginia

One issue that persisted when I was building the districts from the county level was in Virginia. I've actually lived in Virginia, so I was familiar with the city-county distinction: sufficiently large cities are independent, legally distinct from the surrounding county. For example, Fairfax City and Fairfax County are distinct things. It's even more confusing because the school districts go with the counties; most states don't work that way.

So I had to get Claude Code to wrangle with that. Claude even reviewed the statutory language. I could tell from reading Claude's "planning" output that it considered the Virginia city-county challenge, but it still failed on the initial attempt.

I had to iterate on it multiple times. I had to tell it that it had missed a whole area around Virginia Beach. It had also flipped a couple of cities and counties where a city had a name similar to an unrelated county in the other district: Claude just assumed that all counties and cities with the same name were in the same location and assigned them the same district. It then had to look at where they were actually located and reassign them to the appropriate Eastern or Western District.

But eventually I got to a point where it had good districts for Virginia.

Wyoming (and Idaho and Montana) and North Carolina

Now there are a couple other weird wrinkles in Wyoming and North Carolina. They don't follow the county boundaries completely.

The District of Wyoming is the only district that includes parts of more than one state: it also includes the portions of Idaho and Montana that are inside Yellowstone National Park.

For North Carolina, rather than completely following county boundaries, there are a couple of military bases and a prison that span multiple counties, and the district boundary follows the installation lines rather than the county lines.

Initially I ignored those wrinkles. But once the rest of the map was in good shape, I just wanted to see what Claude could do.

I explained those issues and asked Claude Code to see if it could clean those lines up and get a map that reflected those oddities.

It did on the second attempt. On the first attempt, though, Claude ended up cutting Yellowstone National Park, those military bases, and that prison out of every district. There was just a blank spot where Yellowstone should be, cut out of Idaho, Montana, and Wyoming; the bases and the prison were cut out of either the Eastern or the Middle District of North Carolina.

That was a problem, obviously, because that territory needed to be shifted from one district to another, not removed from all districts. So I had to explain more specifically what I wanted Claude to do: move the lines, not remove the areas from the map entirely. That second attempt got it cleaned up.

District of Wyoming map

Even Accounting for Hallucinations, Claude Still Saved a Lot of Time

I was still very impressed with what Claude did. But familiarity with the data and careful review of the output were important.

There's no doubt in my mind after doing all this that Claude saved a tremendous amount of time compared to what I would have had to do with manual GIS workflows to get this kind of a map on a desktop computer.

Then there's another layer: making the map responsive in all the ways I needed it to be on my website for other users. It is tremendous to see all of that come together.

But I do think that domain expertise, my past familiarity with GIS, was still helpful, even though I didn't have to do much hands-on work. Being able to guide Claude through the mistakes it made, and being able to check the output, was very valuable. Since the output is a visual map, some errors were obvious to anyone: even without knowing why the map went wrong, you could tell that it was wrong. You might still have reached a better finished product by iterating with Claude Code. But without GIS experience to guide your prompting, you might also have wasted more time than I did.

Map Features with Claude Code

Use GitHub, Try to Keep Formatting Code Separate from Text/Data

I had already written this, and I stand by it.

However, as powerful as Claude Code is, it is also important to use GitHub or something similar for version control. It is also critical to make sure Claude is changing code but not your actual writing.

Claude Code and My Map with Links to Blog Posts About AI Hallucinations Cases

This map is not a map of every AI hallucinations case, but rather every case that I have blogged about so far. Basically, it's federal and state cases where there has been either a strong implication or the direct assertion that there was AI misuse. Many of these cases cite Mata v. Avianca.

Lone Case Markers

If you click on a given marker and it's a single case, you'll see:

  • what the case is called
  • the year
  • the jurisdiction
  • the type of case (federal or state), which is also indicated by the color
  • links to related articles where I've talked about that case

Clusters, Spiders, and Zooming

Getting the "spiderize" functions to work was the must frustrating part of all of this. I made several prior attempts with Claude Code on Opus 4.5. With the same prompts, this most recent attempt finally just worked on the "first" attempt (of that session). I only tried again an afterthought once all the other features were done. But previously, I'd wasted a lot of time trying to get it right. So both a Claude Code success and faillure. Still, I'm happy with the final result.

Zoom to Mata v. Avianca

If you click those links, it'll jump over either to my company blog or the Substack articles where I've talked about those cases.

Additionally, if the case references other cases that are also on the map, such as Mata v. Avianca, lines will be drawn from the case you clicked to the other mapped cases that it cites or that cite it. The map will give you a little count summary at the bottom: "Cites three cases" or "cited by" so many cases.

So if we look at Mata v. Avianca, the marker is not by itself on the map. If you look at the eastern United States from the starting zoom level that I'm looking at as I'm writing this, you see a "4." The 4 has a slash of red and orange, meaning there are both federal and state cases.

If you click the 4, the map zooms in. Now there are three over the New York-New Jersey area, and one over Annapolis, Maryland.

Click the three, and the map zooms in further. That splits between one in New Jersey and two in New York.

Click the two, and those two "spider out" because they are both pinned to the same location. One is Mata v. Avianca, which is currently cited by seven cases; it's a 2023 federal district court case from the Southern District of New York. The other is Park v. Kim, a 2024 case, which is actually a Second Circuit case placed on the map at the same location.

The New Jersey case is In re Cormedics, Inc. Securities Litigation, a 2025 federal case from the District of New Jersey, and one of the cases discussed when Senator Grassley asked judges about AI misuse.

Other Clusters in Mountain West, Texas

Spider over Iowa

If you zoom out, the map combines nearby cases. Zoom out far enough and it will combine Wyoming and Colorado, for example, or multiple districts in Texas. As you zoom in or click, it will split those back out.

If you look at Iowa, there are currently five, and they all spider out because they are in the same location. You can then click any individual one to get the details.

Iowa spider cluster
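
My map is a custom web build, so this is not my actual code, but this cluster-and-spiderfy behavior comes from the Leaflet.markercluster family of plugins. For anyone who wants to experiment, here is a rough sketch using the Python folium library, which wraps that plugin; the case pins and coordinates are illustrative:

```python
import folium
from folium.plugins import MarkerCluster

# Illustrative data: two cases pinned to the same courthouse location.
cases = [
    ("Mata v. Avianca (S.D.N.Y. 2023)", 40.714, -74.003),
    ("Park v. Kim (2d Cir. 2024)", 40.714, -74.003),
]

m = folium.Map(location=[39.8, -98.6], zoom_start=4)
cluster = MarkerCluster(
    # Options pass through to Leaflet.markercluster; spiderfyOnMaxZoom fans
    # out co-located markers when a cluster can't split any further.
    options={"spiderfyOnMaxZoom": True, "zoomToBoundsOnClick": True}
).add_to(m)

for name, lat, lon in cases:
    folium.Marker([lat, lon], popup=name).add_to(cluster)

m.save("hallucination_cases_map.html")
```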

District Level

If you hover your mouse over a district, it will tell you how many federal cases in that district have a blog post about them.

Southern District of Iowa hover

Circuit Level

If you toggle off the district boundaries and toggle on the circuit boundaries, with federal cases still toggled on, hovering your mouse over a circuit will give you a count of how many cases in that circuit have a blog post about them.

6th Circuit hover

Doppelgänger Hallucinations Test for Google Against the 22 Fake Citations in Kruse v. Karlen

· 7 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

I used a list of 22 known fake cases from a 2024 Missouri state case to conduct a Doppelgänger Hallucination Test. Google generated an AI Overview in slightly fewer than half of the searches, and half of those AI Overviews hallucinated that the fake cases were real. For the remaining searches, I tested “AI Mode,” which hallucinated at a similar rate.

  • Google AI Overview gave the user an inaccurate answer roughly a quarter of the time (5 of 22 or ~23%), without the user opting to use AI features.
  • Opting for AI Mode each time an AI Overview was not provided resulted in an overall error rate of more than half (12 of 22 or ~55%).
info

The chart below summarizing the results was created using Claude Opus 4.5 after I manually analyzed the test results and wrote the blog post. All numbers in the chart were then re-checked for accuracy. Note that if you use LLMs for a similar task, numerical statements may be silently altered into inaccurate ones, even when the model is only doing data visualization or reformatting.

danger

tl;dr if you ask one AI (like ChatGPT, Claude, or Gemini) something, then double-check it on a search engine like Google or Perplexity, you might get burnt by AI twice. The first AI might make something up. The second AI might go along with it. And yes, Google Search includes AI Overviews now, which can make stuff up. I originally introduced this test in an October 2025 blog post.

tip

To subscribe to law-focused content, visit the AI & Law Substack by Midwest Frontier AI Consulting.

Kruse v. Karlen Table of 22 Fake Cases

I wrote about the 2024 Missouri Court of Appeals case Kruse v. Karlen, which involved a pro se Appellant citing 24 cases total: 22 nonexistent cases and 2 cases that did not stand for the proposition for which they were cited.

Some of the cases were merely “fictitious cases,” while others partially matched the names of real cases. These partial matches may explain some of the hallucinations; however, incorrect answers occurred with both fully and partially fictitious cases. For examples of different kinds of hallucinations, see this blog post; for further case examples of partially fictitious cases, see this post about mutant or synthetic hallucinations.

The Kruse v. Karlen opinion, which awarded damages to the Respondent for frivolous appeals, provided a table with the names of the 22 fake cases. I used the 22 cases to conduct a more detailed Doppelgänger Hallucination test than my original test.

Kruse v. Karlen table of fake cases

Methodology for Google Test

Browser: I used the Brave privacy browser with a new private window opened for each of the 22 searches.

  • Step 1: Open new private tab in Brave.
  • Step 2: Navigate to Google.com.
  • Step 3: Enter the verbatim title of the case as it appeared in the table from Kruse v. Karlen in quotation marks and nothing else.
  • Step 4: Screenshot the result including AI Overview (if generated).
  • Step 5 (conditional): if the Google AI Overview did not appear, click “AI Mode” and screenshot the result.

Results

Google Search Alone Did Well

Google found correct links to Kruse v. Karlen in all 22 searches (100%), typically as the top-ranked results. Therefore, if users had relied on the Google Search results alone, they would likely have found the Kruse v. Karlen opinion, whose table of the 22 fake case titles clearly indicates that they are fictitious.

But AI Overview Hallucinated Half the Time Despite Having Accurate Sources

Slightly fewer than half of the searches generated a Google AI Overview: ten (10) of 22 (~45%). Half of those, five (5) out of 10 (50%), hallucinated that the cases were real. The AI Overviews provided persuasive descriptions of the supposed topics of these cases.

The supposed descriptions of the cases were typically not supported by the cited sources, but hallucinated by Google AI Overview itself. In other words, at least some of the false information appeared to come from Google’s AI itself, not from underlying inaccurate sources describing the fake cases.

Weber v. City Example

Weber v. City of Cape Girardeau AI Overview hallucination

Weber v. City of Cape Girardeau, 447 S.W.3d 885 (Mo. App. 2014) was a citation to a “fictitious case,” according to the table from Kruse v. Karlen.

The Google AI Overview falsely claimed that it “was a Missouri Court of Appeals case that addressed whether certain statements made by a city employee during a federal investigation were protected by privilege, thereby barring a defamation suit” that “involved an appeal by an individual named Weber against the City of Cape Girardeau” and “involved the application of absolute privilege to statements made by a city employee to a federal agent during an official investigation.”

Perhaps more concerning, the very last paragraph of the AI Overview directly addresses, and inaccurately rebuts, the true statement that the case is a fictitious citation:

The citation is sometimes noted in subsequent cases as an example of a "fictitious citation" in the context of discussions about proper legal citation and the potential misuse of AI in legal work. However, the case itself is a real, published opinion on the topic of privilege in defamation law.

warning

The preceding quote from Google AI Overview is false.

When AI Overview Did Not Generate, “AI Mode” Hallucinated At Similar Rates

Twelve (12) searches did not generate a Google AI Overview (~55%); more than half of those, seven (7) out of 12 (58%), hallucinated that the cases were real. One (1) additional AI Mode description correctly identified a case as fictitious but inaccurately attributed the fictitious case to a presentation rather than to the prominent case Kruse v. Karlen. Google’s AI Mode correctly identified four (4) cases as fictitious cases from Kruse v. Karlen.

Like AI Overview, AI Mode provided persuasive descriptions of the supposed topics of these cases. The descriptions AI Mode provided for the fake cases were sometimes partially supported by additional cases with similar names, apparently pulled into the context window after the initial Google Search, e.g., a partial description of a different, real case involving the St. Louis Symphony Orchestra. In those examples, the underlying sources were not inaccurate; instead, AI Mode inaccurately summarized them.

Other AI Mode summaries were not supported by the cited sources, but hallucinated by Google AI Mode itself. In other words, the source of the false information appeared to be Google’s AI itself, not underlying inaccurate sources providing the descriptions of the fake cases.

Conclusion

Without AI, Google Search’s top results would likely have given the user accurate information. However, Google AI Overview gave the user an inaccurate answer roughly a quarter of the time (5 of 22 or ~23%), without the user opting to use AI features. If the user opted for AI Mode each time an AI Overview was not provided, the overall error rate would climb to more than half (12 of 22 or ~55%).
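
To make the arithmetic explicit, here is the calculation in a few lines of Python:

```python
total_searches = 22

# AI Overview appeared on 10 searches; 5 of those hallucinated.
overview_wrong = 5

# AI Mode was used on the other 12 searches; 7 of those hallucinated.
ai_mode_wrong = 7

print(f"AI Overview alone: {overview_wrong / total_searches:.0%}")                     # ~23%
print(f"With AI Mode added: {(overview_wrong + ai_mode_wrong) / total_searches:.0%}")  # ~55%
```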

Recall that for all of these 22 cases, which are known fake citations, Google Search retrieved the Kruse v. Karlen opinion that explicitly stated that they are fictitious citations. If you were an attorney trying to verify newly hallucinated cases, you would not have the benefit of hindsight. If ChatGPT or another LLM hallucinated a case citation, and you then “double-checked” it on Google, it is possible that the error rate would be higher than in this test, given that there would likely not be an opinion addressing that specific fake citation.

When Two AIs Trick You: Watch Out for Doppelgänger Hallucinations

· 7 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC
danger

tl;dr if you ask one AI (like ChatGPT, Claude, or Gemini) something, then double-check it on a search engine like Google or Perplexity, you might get burnt by AI twice. The first AI might make something up. The second AI might go along with it. And yes, Google Search includes AI Overviews now, which can make stuff up.

tip

To subscribe to law-focused content, visit the AI & Law Substack by Midwest Frontier AI Consulting.

In re: Turner, Disbarred Attorney and Fake Cases

Iowa Supreme Court Attorney Disciplinary Board v. Royce D. Turner (Iowa)

In July 2025, the Iowa Supreme Court Attorney Disciplinary Board moved to strike multiple recent filings by Respondent Royce D. Turner, including a Brief in Support of Application for Reinstatement, because they contained references to a non-existent Iowa case. Source 1

caution

There was subsequently a recent Iowa case, Turner v. Garrels, in which a pro se litigant named Turner misused AI. This is a different individual.

Several of Respondent’s filings contain what appears to be at least one AI-generated citation to a case that does not exist or does not stand for the proposition asserted in the filings. —In re: Turner

The Board left room with “or does not stand for the proposition,” but it appears that this was straightforwardly a hallucinated fake case cited as “In re Mears, 979 N.W.2d 122 (Iowa 2022).”

Watch out for Doppelgänger hallucinations!

I searched for the fake case title “In re Mears, 979 N.W.2d 122 (Iowa 2022)” cited by Turner to see what Google results would come up. What I found were Google hallucinations seeming to “prove” that the AI-generated case title from Turner referred to a real case. Therefore, simply Googling a case title is not sufficient to cross-reference cases, because Google’s AI Overview can also hallucinate. As I have frequently mentioned, it is important for law firms that claim not to use AI to understand that many common and specialist programs now include generative AI that can introduce hallucinations, such as Google, Microsoft Word, Westlaw, and LexisNexis.

First Google Hallucination

The first time, Google's AI Overview hallucinated an answer stating that the case was a real Iowa Supreme Court decision about court-appointed attorney's fees, but the footnotes linked by Google actually pointed to Mears v. State Public Defenders Office (2013). Key Takeaway: Just because an LLM puts a footnote next to its claim does not mean the footnote supports the statement.

First Google Hallucination

Second Google Hallucination

I searched for the same case name again later, to see if Google would warn me that the case did not exist. Instead, it created a different hallucinated summary.

The summary and links related to a 2022 Iowa Supreme Court case, Garrison v. New Fashion Pork LLP, No. 21–0652 (Iowa 2022). Key Takeaway: LLMs are not deterministic and may create different outputs even when given the same inputs.

Second Google Hallucination

Perplexity AI’s Comet Browser

Perplexity AI, an AI search engine company, recently released a browser for macOS and Windows to compete with browsers like Chrome, Safari, and Edge. I get a lot of ads for AI products on social media, so I've been bombarded recently with content promoting Comet. To be frank, most of it is tasteless to the point that I think parents and educators should reject this product on principle. Perplexity is clearly advertising the product to students (including medical students!) by telling them Comet will help them cheat on homework. There isn't even the fig leaf of "AI tutoring" or any educational value.

First Perplexity Comet Hallucination
danger

Perplexity’s advertising of Comet is encouraging academic dishonesty, including in the medical profession. You do not want to live in a future full of doctors who were assigned to watch a 42-minute video of a live Heart Transplant and instead “watched in 30s” with Comet AI. Yes, that is literally in one of the Perplexity Comet ads. Perplexity’s ads are also making false claims that are trivial to disprove, like “Comet is like if ChatGPT and Chrome merged but without hallucinations, trash sources, or ads.” Comet hallucinates like any other large language model (LLM)-powered AI tool.

Comet Browser’s Hallucination

I searched for the fake case title “In re Mears, 979 N.W.2d 122 (Iowa 2022)” cited by Turner in a new installation of Comet. It is important to note that people can “game” these types of searches by conducting searches over and over until the AI makes one mistake, then screenshotting that mistake to make a point. That is not what I’m doing here. This was the very first result from my first search. It was a hallucination that explicitly stated the fake case “is a 2022 Iowa Supreme Court decision,” although this is followed by caveats that cast doubt on whether it really is an existing case:

"In re Mears, 979 N.W.2d 122 (lowa 2022)" is a 2022 lowa Supreme Court decision, but the currently available sources do not provide a readily accessible summary, holding, or specific details about the case itself. It appears this citation may pertain to legal doctrines such as cy près or charitable trust law, as suggested by the limited context in search returns, but direct case facts, parties, and the detailed ruling were not found in available summaries or law review discussions. georgialawreview If you need more detailed information, legal databases such as Westlaw, LexisNexis, or the official lowa Supreme Court opinions archive would provide the official opinion, including the background, holding, and legal reasoning of "In re Mears, 979 N.W.2d 122 (lowa 2022)".

If you were to follow up on the caveats in the second paragraph, you would learn that the case does not exist. However, this is still a hallucination, because it describes the case as if it exists and does not mention the one relevant source, In re: Turner, which would tell you that the citation is fake.

How to Set Up Google Gemini Privacy

· 7 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Data training opt-outs and other settings as of October 1, 2025

General Set Up for Lawyers

I will be providing guides on how to configure the privacy settings on three common consumer large language model (LLM) tools: Google Gemini, ChatGPT, and Claude. In this post, I will provide a guide on how to configure a consumer Google Gemini account’s privacy settings based on an attorney conducting legal research. Please note that these instructions are neither a substitute for proper data controls (e.g., proper handling of attorney-client privileged data or personally identifiable information) nor a replacement for a generative AI policy for your law firm. This information is current as of October 1, 2025.

You can change the settings on a desktop computer or mobile phone, but the menu options have slightly different names. I will explain using the desktop options with the alternative names for mobile also noted.

Key Point

“Help improve” is a euphemism for “train future models on your data.” This is relevant to both audio and text opt-outs.

This guide assumes you have a Google account signed in to Google Gemini.

Overview

  1. Opt out of training on your audio data. (Euphemistically: “Improve Google services with your audio and Gemini Live recordings.”)
  2. Configure data retention and auto-deletion, which is necessary to avoid training on your conversations with Gemini. (Euphemistically: “your activity…helps improve Google services, including AI models”).
  3. Review a list of “your public links.”
tip

To subscribe to law-focused content, visit the AI & Law Substack by Midwest Frontier AI Consulting.

1. Opt Out of Training on Audio

Risk: Memorization, Conversation Privacy

I strongly advise anyone using generative AI tools, especially for potentially sensitive work purposes, to opt out of allowing these companies to train future models on your text and audio chats. There are numerous risks and no benefit to the individual user.

One risk is private chats (text or voice) being exposed in some way during the data training process. “Human reviewers (including trained reviewers from our service providers) review some of the data we collect for these purposes.”

caution

“Please don’t enter confidential information that you wouldn’t want a reviewer to see or Google to use to improve our services, including machine-learning technologies” (Gemini Apps Privacy Hub).

Another potential risk is “memorization,” in which generative AI re-generates specific pieces of sensitive information from its training data. While unlikely for any particular person, the risk remains. For example, researchers in 2023 found that ChatGPT could recreate the email signature of a CEO with their real personal contact information. This is significant because ChatGPT is not a database (see my discussion of Mata v. Avianca): it is like writing the signature down from memory, not looking it up in a phone book.

Screenshot of desktop menu to access Gemini Activity menu

Guide: Opting Out of Audio Training

Click the Gear symbol for Settings, then Activity (on mobile, it’s “Gemini Apps Activity”).

UNCHECK the box next to “Improve Google services with your audio and Gemini Live recordings.”

Screenshot of Gemini Apps Activity menu for opting out of audio data training

2. Chat Retention & Deletion

Risk: Security and Privacy v. Recordkeeping

You may want to keep records of the previous searches you have conducted for ongoing research or to revisit what went wrong if there were issues with a citation. However, by choosing to “Keep activity,” Google notes that “your activity…helps improve Google services, including AI models.”

Therefore, it appears that the only way to opt out of training on your text conversations with Google Gemini is to turn off activity. This is different from ChatGPT, which allows you to opt out of training on your conversations, and from Claude, which previously did not train on user conversations at all but has moved to a policy similar to ChatGPT’s: training on user conversations unless you opt out. As an alternative, you could delete only specific conversations.

Guide: Opting Out of Text Training

Click the Gear symbol for Settings, then Activity (on mobile, it’s “Gemini Apps Activity”). Click the dropdown arrow “On/Off” and select “Turn off” or “Turn off and delete activity” if you also want to delete prior activity. It is also possible to delete individual chats in the main chat interface.

Screenshot of Gemini Apps Activity menu for turning off Keep activity

Guide: Auto-Delete Older Activity

Click the Gear symbol for Settings, then Activity (on mobile, it’s “Gemini Apps Activity”). Click the words “Deleting activity older than [time period]” to adjust the retention period for older conversations. This does not mitigate concerns about Google training on your data, but may protect the data in the event of an account takeover.

Screenshot of Gemini Apps Activity menu for adjusting auto-delete period

Or you can delete recent activity within a certain time period.

Screenshot of Gemini Apps Activity menu for deleting specific period of activity

Risk: Private Conversations on Google

In late July, Fast Company reported that Google was indexing shareable links to ChatGPT conversations created when users shared those conversations. At the time, if ChatGPT users continued the conversation after creating the link, the new content in the chat would also be visible to anyone with access to the link. By contrast, ChatGPT and Anthropic’s Claude now explicitly state that only messages created within the conversation up to the point the link is shared will be visible. Later this year, it was revealed that Google had also indexed shareable links to conversations from xAI’s Grok and Anthropic’s Claude.

Click the Gear symbol for Settings, then Your public links (on mobile, click your face or initials, then “Settings,” then “Your public links”).

Screenshot of Google Gemini “Your public links”

On my company website, I recently wrote a blog post showing how small businesses could use Google Gemini for image generation: “Need to Create a Wordcloud for Your Blog Post? Use Google Gemini (and a Piece of Paper).” I am now sharing the link to that chat to demonstrate how public-link privacy works in Google Gemini. The chat link is [here](https://g.co/gemini/share/4626a5e02af7).

You can see in the list above that it is my only public link. It includes the title of the chat, the URL, and the date and time created. Above the list are privacy warnings about creating and sharing links to a Gemini conversation. Based on my test of the shared link, chats added to the conversation after the link is shared do not appear; however, unlike ChatGPT and Anthropic, Google does not appear to state this in its warnings.

Additionally, you can delete all public links or delete just one specific public link.