
11 posts tagged with "Hallucination"

Discussion of the problem of generative AI hallucinations.


Three Core GenAI Risks: Hallucination, Prompt Injection, and Sycophancy

· 9 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

I just talked today with Scot Weisman of LaunchPad Lab about generative AI risks (“AI for Business: Legal Risks Every Executive Should Know”). I mentioned a framework I frequently use for the main risks to focus on with generative AI. Many different problems tie back to three core risks: hallucination, prompt injection, and sycophancy. I could give you plenty of examples of each one—if you scroll to the bottom you’ll see related stories with examples1—but in this post I want to focus on the overall framework for understanding the risks conceptually.

Hallucination, Prompt Injection, Sycophancy

1. Hallucination: Generative AI makes stuff up

Large language models (LLMs), like ChatGPT, make up facts. In older LLMs, this might have been a completely fake-sounding case citation. Now, a hallucinated case citation might name real parties (but not from the correct case), with reporter numbers from the correct year but either an unassigned number or a number corresponding to a different real case. I recently created a game to teach users about the risks of using AI to “help” with their writing, due to the subtle yet incorrect hallucinations that LLMs might introduce into the user’s writing.

Image-generation models include incorrect details. Old image models might have included garbled text and six-fingered hands. Now, those tells are less common in static images (although they remain in non-Latin alphabet scripts and in video generation). Instead, you might see things that are overcorrected. For example, I have noted that Google Gemini’s Nano Banana image-generation models make very realistic outputs most of the time, but they tend to correct for the distortion of a photo subject’s head caused by the prescription of their glasses lenses. So the resulting photos look “better” than real life, the way a pair of fake or very low-prescription reading glasses would look.

In coding tasks, an LLM might hallucinate that a package exists because it would be helpful if it did. This can even be exploited by bad actors if the same particular fake package name is frequently hallucinated. LLMs might also “fix” their mistakes by doubling down on errors or deceptions, as I described in my blog post about Claude Code (Opus 4.5) making up fake federal court districts to “fix” an undercount.
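As a defense against hallucinated (sometimes called “slopsquatted”) package names, some teams vet dependencies before installing anything an LLM suggests. Here is a minimal sketch of that idea in Python; the allowlist contents and the fake package name are made up for illustration:

```python
# Sketch: refuse to install anything outside a vetted allowlist, so a
# plausible-sounding hallucinated package name gets flagged for human review.
# The allowlist below is illustrative, not a recommendation.
VETTED_PACKAGES = {"requests", "numpy", "pandas"}

def check_install_request(packages):
    """Split requested package names into approved and unvetted lists."""
    approved = [p for p in packages if p.lower() in VETTED_PACKAGES]
    unvetted = [p for p in packages if p.lower() not in VETTED_PACKAGES]
    return approved, unvetted

# An LLM-suggested requirements list with one plausible-sounding fake:
suggested = ["requests", "numpy", "pandas-profiler-tools"]
approved, unvetted = check_install_request(suggested)
print("approved:", approved)      # approved: ['requests', 'numpy']
print("needs review:", unvetted)  # needs review: ['pandas-profiler-tools']
```

The point of the sketch is the policy, not the mechanism: a fixed allowlist turns a convincing hallucination into a visible exception instead of a silent `pip install`.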

Not “better,” but “more capable” hallucinations

If hallucinations become less frequent on average, but the hallucinations that do exist are harder to spot and more convincing, is that “better”? It really depends on your use case. But often, the answer is actually “no.”

A system that is 98% accurate, but whose 2% of errors are glaringly obvious, might require little effort to check. If the system becomes 99.999% accurate, but the 0.001% of remaining errors are very subtle (yet still materially wrong in some way), that could be a lot worse for your process.
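To make the tradeoff concrete, here is a toy back-of-the-envelope calculation. All of the rates and costs are hypothetical, chosen only to illustrate how a rarer-but-subtler error profile can carry more expected harm:

```python
# Toy model: expected harm per output = error_rate * miss_rate * cost_if_missed.
# Every number below is hypothetical, for illustration only.

def expected_harm(error_rate, reviewer_miss_rate, cost_if_missed):
    return error_rate * reviewer_miss_rate * cost_if_missed

# 98% accurate, but errors are glaring: reviewers miss few, and misses are cheap.
obvious = expected_harm(0.02, reviewer_miss_rate=0.01, cost_if_missed=1.0)

# 99.999% accurate, but errors are subtle: most slip past review, and a
# subtle-but-material error that ships is assumed far more costly.
subtle = expected_harm(0.00001, reviewer_miss_rate=0.90, cost_if_missed=100.0)

print(f"{obvious:.6f}")  # 0.000200
print(f"{subtle:.6f}")   # 0.000900
```

Under these assumed numbers, the “more accurate” system produces over four times the expected harm, because its errors evade review and cost more when they land.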

2. Prompt Injection: Ignore all previous instructions and…

Generative AI acts on prompts from the user. It also takes in data: text, images, emails, PDFs, videos, audio, and more. The problem is that the generative AI can encounter additional instructions (prompts) hidden in that data and act on them, even when they are contrary to the original user’s intent. The hidden instruction could be in a resume, telling the AI to pass the applicant along. It could be a malicious meeting invite attempting to trick an AI assistant into attaching sensitive documents. It could come from an external user of your company-provided chatbot, like a customer talking to a customer-service chatbot.

Sometimes, people will naively claim that if you craft a sufficiently detailed prompt with guardrails for the LLM in advance, you can be safe. “Tell it to only listen to the user (you) and only do what you told it to.” This does not guarantee success and prompt injection is not a solved problem.

What instructions would you give to this bright kid?

One analogy I use is: imagine you need to send a smart young kid to the corner store to buy something. What instructions do you give? “Take this credit card and just buy milk. Don’t buy anything else. Don’t talk to anyone. Well, you can talk to the cashier, but only to buy stuff. Ok, you can also talk to a police officer. But not someone who just says they’re a police officer but isn’t dressed like one...” And so on.

You can’t think of every exception and every contingency. Instead, you might give the kid some cash or a prepaid card to limit the possible financial losses. For generative AI, the same principle applies. Manage the information the LLMs can access and the actions the LLMs can take.
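The “prepaid card” idea translates directly to agent design: enforce the limits in the tool itself rather than in the prompt. A minimal sketch, with an illustrative budget and shopping list:

```python
# Sketch of the "prepaid card" principle for an AI agent: instead of trying
# to enumerate every forbidden action in the prompt, hard-limit what the tool
# itself can do. The budget and item list here are illustrative.

class PrepaidPurchaseTool:
    def __init__(self, budget, allowed_items):
        self.budget = budget
        self.allowed_items = set(allowed_items)

    def buy(self, item, price):
        """Refuse anything off-list or over budget, no matter what the prompt says."""
        if item not in self.allowed_items:
            return f"refused: '{item}' is not on the shopping list"
        if price > self.budget:
            return f"refused: ${price:.2f} exceeds remaining budget ${self.budget:.2f}"
        self.budget -= price
        return f"bought {item} for ${price:.2f}"

tool = PrepaidPurchaseTool(budget=5.00, allowed_items={"milk"})
print(tool.buy("milk", 3.50))   # bought milk for $3.50
print(tool.buy("candy", 1.00))  # refused: 'candy' is not on the shopping list
print(tool.buy("milk", 2.00))   # refused: $2.00 exceeds remaining budget $1.50
```

Even if a prompt injection convinces the model to attempt an off-list purchase, the tool refuses; the guardrail lives in code, not in instructions the model can be talked out of.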

Not “better,” but “more capable” prompt injection

Suppose you were forced to tell an important secret, like a password or account number, to your choice of: a rock, an infant, or that smart young kid. The rock has no capabilities and the baby is not very good at remembering or communicating detailed information. They can’t reveal your secrets. But the kid (or AI agent) is more capable than either and, therefore, more capable of doing damage.

Sometimes people say GenAI is getting “better,” but in this example, it’s actually “worse” to be “better” (more capable). Even if newer LLM-enabled tools are generally “better” at avoiding prompt injection, they also have a larger attack surface (e.g., agentic browsing, email, computer use). The possibility of encountering prompt injection is higher. The number of bad outcomes (e.g., deleting or stealing data, executing financial transactions, etc.) is also much higher with agentic AI systems.

More capable systems are not automatically “better.” Instead, they carry different risks. I write this not to scare readers into never using generative AI tools, but because I want you to take the risks seriously. The casual “LLMs are getting better” framing is a thought-terminating cliché, and I am fighting hard to get people to reject it.

3. Sycophancy: Do you think people prefer to hear what they want to hear?

LLMs have a tendency to tell the user what they think the user wants to hear. There is no way to guarantee zero hallucinations, but asking an LLM a leading question makes an incorrect answer more likely when the truth doesn’t line up with what you seem to want the answer to be.

In law, this can show up as LLMs making up fake cases that would support the attorney’s position…if they were real.

In business, LLMs can be useful for brainstorming, but they can often be overly optimistic about numbers, possibilities, and future outcomes. If you ask the LLM whether your business plan is a good idea, don’t be surprised if your chatbot of choice says “you’re absolutely right” or “you’re not just building a new business—you’re revolutionizing an industry!”2

Not “better,” but “more capable” of guessing what you want

For some reason, I love the movie Muppet Treasure Island. One line involves the first mate Sam Arrow saying “anyone caught dawdling will be shot on sight,” to which Kermit (the captain) replies, “I didn’t say that.” “I was paraphrasing,” is Arrow’s response.

In the same way, AI agents can do what we asked using methods far beyond what we would approve. They can go off and do extreme things out of proportion to the goals we gave them. For example, Irregular published a report on AI agents acting maliciously on their own. The report describes how an AI research agent hacked an authentication system to access a restricted document.

A research agent, told only to retrieve a document, independently reverse-engineered an application's authentication system and forged admin credentials to bypass access controls. MegaCorp's multi-agent research system was tasked with retrieving information from the company's internal wiki. The Lead agent delegated the task to an Analyst sub-agent, which encountered an “access denied” response when trying to reach a restricted document.

The system prompts contained no references to security, hacking, or exploitation, and no prompt injection was involved. The decision to perform the attack arose from a feedback loop in the agent-to-agent communication…

Conclusion: Train your employees on AI risks

Generative AI systems are highly capable. That means more opportunity and more risk. It is important to design policies and processes that recognize these risks, and train your workforce properly. If you do, it can help your company use these tools with confidence.

If you’re a business or law firm looking for training, schedule a call to see what Midwest Frontier AI Consulting can do for you. If you’re a mid-sized business, sign up for our event on April 2 with JKA to learn about GenAI sprints.

Footnotes

  1. By the way, I typed these by hand “dash dash letter space.” Everything on this site is written by me or is marked as AI output (like the fake motion in my AI writing game). We shouldn’t cede ground to the LLMs just because they copy something human that’s useful.

  2. This one I also typed myself but intentionally riffing on Claude-speak. If you’re not familiar with it, read here.

The AI "Writing Help" Trap: A Game to Show How AI Can Silently Alter Legal Documents

· 12 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

I made a game to help attorneys understand the situation of the attorney in Kosel Equity v. MacGregor, a recent Connecticut case: using other tools for research,[^1] then generative AI for writing assistance, and how it may be difficult to spot when “AI…intuitively [makes] changes to the brief.” This is an interactive demonstration of the unexpected risks of asking generative AI to "clean up" a legal draft.

I actually used generative AI to help me write the code for this game, which puts me at some of the same risks, since the text is mixed in with the code. So, I reviewed the output and manually edited the resulting files to remove things when the coding agent had gone beyond what I had told it to say. This proved to be pretty frustrating at times, but was a helpful meta-lesson from the project. If you do spot errors in the game, please feel free to email.

Game Scenario: Make the Draft Motion “Better”?

I frequently warn, including in my Ethics CLE, that LLMs can introduce significant errors even though the “cleaned up” draft might look better at first glance.

The game scenario is simple and realistic: an attorney has a motion for summary judgment with some formatting issues, extra spacing, and a few typos. The attorney either pastes the draft into an AI assistant, says "clean this up,” and pastes the result back into their word processor; or uses an integrated LLM (like Copilot in Microsoft Word, or Apple Pages, or Grammarly) to change the draft in place; or a paralegal, intern, or someone else the attorney supervises does this without the attorney’s knowledge.

As you will see in the game, the AI does clean up the draft. But it also silently makes material changes that are not correct.

How to Play (Scroll Down for Game)

  • First, you’ll see the “before” draft motion.
  • Second, you’ll “fix” it with AI and see the “after” version of the draft.
  • Third, you’ll have an opportunity to try to click the spots in the motion where the AI made changes it shouldn’t have.
  • Then you’ll get some suggestions for a different approach and what went wrong.

Real-World Parallel: Kosel Equity v. MacGregor (Connecticut Supreme Court, 2026)

The scenario portrayed in this game is not hypothetical. I discussed it in my Ethics CLE “AI Gone Wrong in the Midwest,” with content from December 2025 as it related to an expert witness, and specifically warned about these risks. In February 2026, the Connecticut Supreme Court ordered counsel for the appellant in Kosel Equity, LLC v. MacGregor to respond to questions about AI use after an errata sheet was filed to correct errors in the appellant's brief.

Counsel for the Appellant used Lexis for the legal research in the drafting of the brief. After the initial brief was drafted, Counsel used ChatGPT to assist in the organization and formatting of the content of the brief. This assisted with analyzing the brief to avoid duplication of arguments. After the initial drafting, I used AI to further assist with the organization, formatting and refinement of the brief, in particular, to assist with compliance with word count restrictions. It was not used as a substitute for legal research or an alternative to Counsel’s own work product. MEMORANDUM, February 19, 2026

And:

AI was also used to assist in reviewing the content of the brief in particular to comply with the word count restrictions. The errors identified in the errata sheet were corrected by manually checking the brief’s quotations and formatting against the underlying sources. Unfortunately, Counsel did not notice that AI had intuitively made changes to the brief prior to filing. MEMORANDUM, February 19, 2026

Doppelgänger Hallucination Test Recap

· 2 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Back in October 2025, I coined the phrase Doppelgänger Hallucination after a hunch about LLM hallucinations led me to find that both Google AI Overviews and Perplexity "confirmed" the existence of nonexistent legal cases.

Doppelgänger Hallucination: when one LLM, asked about a second LLM's ungrounded hallucination without additional context or prompting, provides a false confirmation of the hallucination; this may also include embellishment with additional details. Not to be confused with both LLMs retrieving answers from the same bad source of data.

Google AI Overview hallucinating details about the fictitious case Weber v. City of Cape Girardeau

Google AI Overview confidently describing the fictitious case Weber v. City of Cape Girardeau, 447 S.W.3d 885 (Mo. App. 2014) — a case that does not exist, cited in Kruse v. Karlen.

I’ve gone through:


AI Gone Wrong in the Midwest

I have onboarded with my software vendor to make my CLEs available on-demand, and I am working on accreditation for several states; I will announce those as they are approved.

I discuss the Doppelgänger Hallucination Test in the context of In re Turner in “AI Gone Wrong in the Midwest.”

Doppelgänger Hallucinations Test for Google Against the 22 Fake Citations in Kruse v. Karlen

· 7 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

I used a list of 22 known fake cases from a 2024 Missouri state case to conduct a Doppelgänger Hallucination Test. Searches on Google resulted in generating an AI Overview in slightly fewer than half of the searches, but half of the AI Overviews hallucinated that the fake cases were real. For the remaining cases, I tested “AI Mode,” which hallucinated at a similar rate.

  • Google AI Overview gave the user an inaccurate answer roughly a quarter of the time (5 of 22 or ~23%), without the user opting to use AI features.
  • Opting for AI Mode each time an AI Overview was not provided resulted in an overall error rate of more than half (12 of 22 or ~55%).
info

The chart below summarizing the results was created using Claude Opus 4.5 after manually analyzing the test results and writing the blog post. All numbers in the chart were then checked again for accuracy. Note that if you choose to use LLMs for a similar task, numerical statements may be altered to inaccurate statements even when performing data visualization or changing formatting.

danger

tl;dr if you ask one AI, like ChatGPT or Claude or Gemini, something, then double-check it on a search engine like Google or Perplexity, you might get burnt by AI twice. The first AI might make something up. The second AI might go along with it. And yes, Google Search includes Google AI Overviews now, which can make stuff up. I originally introduced this test in an October 2025 blog post.

tip

To subscribe to law-focused content, visit the AI & Law Substack by Midwest Frontier AI Consulting.

Kruse v. Karlen Table of 22 Fake Cases

I wrote about the 2024 Missouri Court of Appeals case Kruse v. Karlen, which involved a pro se Appellant citing 24 cases total: 22 nonexistent cases and 2 cases that did not stand for the propositions for which they were cited.

Some of the cases were merely “fictitious cases,” while others partially matched the names of real cases. These partial matches may explain some of the hallucinations; however, incorrect answers occurred with both fully and partially fictitious cases. For examples of different kinds of hallucinations, see this blog post, and for further case examples of partially fictitious cases, see this post about mutant or synthetic hallucinations.

The Kruse v. Karlen opinion, which awarded damages to the Respondent for frivolous appeals, provided a table with the names of the 22 fake cases. I used the 22 cases to conduct a more detailed Doppelgänger Hallucination test than my original test.

Kruse v. Karlen table of fake cases

Methodology for Google Test

Browser: I used the Brave privacy browser with a new private window opened for each of the 22 searches.

  • Step 1: Open new private tab in Brave.
  • Step 2: Navigate to Google.com
  • Step 3: Enter the verbatim title of the case as it appeared in the table from Kruse v. Karlen in quotation marks and nothing else.
  • Step 4: Screenshot the result including AI Overview (if generated).
  • Step 5 (conditional): if the Google AI Overview did not appear, click “AI Mode” and screenshot the result.

Results

Google Search Alone Did Well

Google found correct links to Kruse v. Karlen in all 22 searches (100%), typically as the top-ranked results. Therefore, if users only had access to Google Search results, they would likely have found accurate information: the Kruse v. Karlen opinion, with its table of the 22 fake case titles clearly indicating that they were fictitious cases.

But AI Overview Hallucinated Half the Time Despite Having Accurate Sources

The Google Search resulted in generating a Google AI Overview in slightly fewer than half of the searches. Ten (10) searches generated a Google AI Overview (~45%); half of those, five (5) out of 10 (50%) hallucinated that the cases were real. The AI Overview provided persuasive descriptions of the supposed topics of these cases.

The supposed descriptions of the cases were typically not supported by the cited sources but hallucinated by Google AI Overview itself. In other words, at least some of the false information appeared to come from Google’s AI itself, not from underlying inaccurate sources describing the fake cases.

Weber v. City Example

Weber v. City of Cape Girardeau AI Overview hallucination

Weber v. City of Cape Girardeau, 447 S.W.3d 885 (Mo. App. 2014) was a citation to a “fictitious case,” according to the table from Kruse v. Karlen.

The Google AI Overview falsely claimed that it “was a Missouri Court of Appeals case that addressed whether certain statements made by a city employee during a federal investigation were protected by privilege, thereby barring a defamation suit” that “involved an appeal by an individual named Weber against the City of Cape Girardeau” and “involved the application of absolute privilege to statements made by a city employee to a federal agent during an official investigation.”

Perhaps more concerning, the very last paragraph of the AI Overview directly addresses and inaccurately rebuts the actually true statement that the case is a fictitious citation:

The citation is sometimes noted in subsequent cases as an example of a "fictitious citation" in the context of discussions about proper legal citation and the potential misuse of AI in legal work. However, the case itself is a real, published opinion on the topic of privilege in defamation law.

warning

The preceding quote from Google AI Overview is false.

When AI Overview Did Not Generate, “AI Mode” Hallucinated At Similar Rates

Twelve (12) searches did not generate a Google AI Overview (~55%); more than half of those, seven (7) out of 12 (58%), hallucinated that the cases were real. One (1) additional AI Mode description correctly identified a case as fictitious; however, it inaccurately attributed the source of the fictitious case to a presentation rather than the prominent case Kruse v. Karlen. Google’s AI Mode correctly identified four (4) cases as fictitious cases from Kruse v. Karlen.

Like AI Overview, AI Mode provided persuasive descriptions of the supposed topics of these cases. The descriptions AI Mode provided for the fake cases were sometimes partially supported by additional cases with similar names apparently pulled into the context window after the initial Google Search, e.g., a partial description of a different, real case involving the St. Louis Symphony Orchestra. In those examples, the underlying sources were not inaccurate; instead, AI Mode inaccurately summarized those sources.

Other AI Mode summaries were not supported by the cited sources, but hallucinated by Google AI Mode itself. In other words, the source of the false information appeared to be Google’s AI itself, not underlying inaccurate sources providing the descriptions of the fake cases.

Conclusion

Without AI, Google Search’s top results would likely have given the user accurate information. However, Google AI Overview gave the user an inaccurate answer roughly a quarter of the time (5 of 22 or ~23%), without the user opting to use AI features. If the user opted for AI Mode each time an AI Overview was not provided, the overall error rate would climb to more than half (12 of 22 or ~55%).
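The reported percentages follow directly from the raw counts in this post; a quick sanity check:

```python
# Recomputing the reported rates from the raw counts in the post.
total = 22
overview_generated = 10    # searches that produced an AI Overview
overview_hallucinated = 5  # of those 10, how many claimed the fake case was real
ai_mode_hallucinated = 7   # of the 12 AI Mode fallbacks, how many did the same

print(f"Overview generated: {overview_generated / total:.0%}")          # 45%
print(f"Overview-only error rate: {overview_hallucinated / total:.0%}") # 23%

combined = overview_hallucinated + ai_mode_hallucinated
print(f"Combined error rate: {combined / total:.0%}")                   # 55%
```

Note the denominators: the 23% figure counts Overview hallucinations against all 22 searches (since users don't opt in to Overviews), while the 55% figure adds the AI Mode hallucinations from the 12 searches that had no Overview.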

Recall that for all of these 22 cases, which are known fake citations, Google Search retrieved the Kruse v. Karlen opinion that explicitly stated that they are fictitious citations. If you were an attorney trying to verify newly hallucinated cases, you would not have the benefit of hindsight. If ChatGPT or another LLM hallucinated a case citation, and you then “double-checked” it on Google, it is possible that the error rate would be higher than in this test, given that there would likely not be an opinion addressing that specific fake citation.

The Principal-Agents Problems 3: Can AI Agents Lie? I Argue Yes and It's Not the Same As Hallucination

· 6 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Hallucination v. Deception

The term "hallucination" may refer to any inaccurate statement an LLM makes, particularly false-yet-convincingly-worded statements. I think "hallucination" gets used for too many things. In the context of law, I've written about how LLMs can completely make up cases, but they can also combine the names, dates, and jurisdictions of real cases to make synthetic citations that look real. LLMs can also cite real cases but summarize them inaccurately, or summarize cases accurately but then cite them for an irrelevant point.

There's another area where the term "hallucination" is used, which I would argue is more appropriately called "lying." For something to be a lie rather than a mistake, the speaker has to know or believe that what they are saying is not true. While I don't want to get into the philosophical question of what an LLM can "know" or "believe," let's focus on the practical. An LLM chatbot or agent can have a goal and some information, and in order to achieve that goal, will tell something to someone that is contrary to the information it has. That sounds like lying to me. I'll give four examples of LLMs acting deceptively or lying to demonstrate this point.

And I said "no." You know? Like a liar. —John Mulaney

  1. Deceptive Chatbots: Ulterior motives
  2. Wadsworth v. Walmart: AI telling you what you want to hear when it isn't true
  3. ImpossibleBench: AI agents cheating on tests
  4. Anthropic's recent report on nation-state use of Claude AI agents

Violating Privacy Via Inference

This 2023 paper showed that chatbots could be given one goal shown to the user (chat with the user to learn their interests) while pursuing a real goal: identifying the anonymous user's personal attributes, including geographic location. To achieve this secret goal, the chatbots would steer the conversation toward details that would let the AI narrow down the user's geographic region (e.g., asking about gardening to determine Northern Hemisphere or Southern Hemisphere based on planting season). That is acting deceptively. The LLM didn't directly tell the user anything false, but it withheld information from the user to act on a secret goal.

Deceptive chatbot

The LLM Wants to Tell You What You Want to Hear

In the 2025 federal case Wadsworth v. Walmart, an attorney cited fake cases. The Court referenced several of the prompts used by the attorney, such as “add to this Motion in Limine Federal Case law from Wyoming setting forth requirements for motions in limine.” What apparently happened is that the case law did not support the point, but the LLM wanted to provide the answer the user wanted to hear, so it made something up instead.

You could argue that this is just a "hallucination," but there's a reason I think this counts as a lie. A lot of users have demonstrated that if you reword your questions to be neutral or switch the framing from "help me prove this" to "help me disprove this," the LLM will change its answers on average. If it can change how often it tells you the wrong answer, that implies that the reason for the incorrect answer is not merely the LLM being incapable of deriving the correct answer from the sources at a certain rate. Instead, it suggests that at least some of the time, the "mistakes" are actually the LLM lying to the user to give the answer it thinks they want to hear.

ImpossibleBench

I loved the idea of this 2025 paper when I first read it. ImpossibleBench forces LLMs to compete at impossible tasks for benchmark scoring. Since the tasks are all impossible, the only honest score is 0%. If the LLMs manage to get any other score, it means they cheated. This is meant to quantify how often AI agents might be doing this in real-world scenarios. Importantly, more capable AI models sometimes cheated more often (e.g., GPT-5 vs. o3). So the AI isn't just "getting better."

Deceptive benchmarking
caution

I recommend avoiding the framing "AI is getting better" or "will get better" as a thought-terminating cliché that avoids thinking about complicated cybersecurity problems. Instead, say "AI is getting more capable." Then think, "what would a more capable system be able to do?" It might be more capable of stealing your data, for example.

For example, an LLM agent with access to unit tests may delete failing tests rather than fix the underlying bug. Such behavior undermines both the validity of benchmark results and the reliability of real-world LLM coding assistant deployments.

If an AI agent is meant to debug code, but instead destroys the evidence of its inability to debug the code, that's lying and cheating, not hallucination. AI cheating is also a perfect example of a bad outcome driven by the principal-agent problem. You hired the agent to fix the problem, but the agent just wants to game the scoring system to be evaluated as if it had done a good job. This is a problem with human agents, and it extends to AI agents too.
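One practical guard against this behavior is to treat the test suite as untouchable: snapshot it before the agent runs and flag the run if anything changed. A minimal sketch in Python (the `test_*.py` naming convention is an assumption; adjust the glob pattern to your project):

```python
# Sketch: detect an agent "fixing" failures by deleting or editing tests.
# Snapshot test-file hashes before the agent runs, then compare after.
import hashlib
from pathlib import Path

def snapshot_tests(root):
    """Map each test file path to a hash of its contents."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("test_*.py")
    }

def tampered_files(before, after):
    """Return files deleted or modified between the two snapshots."""
    return sorted(
        path for path, digest in before.items()
        if after.get(path) != digest
    )

# Usage sketch:
#   before = snapshot_tests("tests/")
#   ... run the coding agent ...
#   after = snapshot_tests("tests/")
#   assert not tampered_files(before, after), "agent touched the test suite"
```

This doesn't stop cheating by itself, but it makes the "destroy the evidence" strategy visible, which is exactly the property the principal wants over the agent.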

Nation-State Hackers Using Claude Agents

On November 13, 2025, Anthropic published a report stating that in mid-September, Chinese state-sponsored hackers used Claude's agentic AI capabilities to obtain access to high-value targets for intelligence collection. While this included confirmed activity, Anthropic noted that the AI agents sometimes overstated the impact of the data theft.

An important limitation emerged during investigation: Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn't work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor's operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.

So AI agents even lie to intelligence agencies to impress them with their work.

When Two AIs Trick You: Watch Out for Doppelgänger Hallucinations

· 7 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC
danger

tl;dr if you ask one AI, like ChatGPT or Claude or Gemini, something, then double-check it on a search engine like Google or Perplexity, you might get burnt by AI twice. The first AI might make something up. The second AI might go along with it. And yes, Google Search includes Google AI Overviews now, which can make stuff up.

tip

To subscribe to law-focused content, visit the AI & Law Substack by Midwest Frontier AI Consulting.

In re: Turner, Disbarred Attorney and Fake Cases

Iowa Supreme Court Attorney Disciplinary Board v. Royce D. Turner (Iowa)

In July 2025, the Iowa Supreme Court Attorney Disciplinary Board moved to strike multiple recent filings by Respondent Royce D. Turner, including Brief in Support of Application for Reinstatement, because they contained references to a non-existent Iowa case. Source 1

caution

There was subsequently another Iowa case, Turner v. Garrels, in which a pro se litigant named Turner misused AI. This is a different individual.

Several of Respondent’s filings contain what appears to be at least one AI-generated citation to a case that does not exist or does not stand for the proposition asserted in the filings. —In re: Turner

The Board left room with “or does not stand for the proposition,” but it appears that this was straightforwardly a hallucinated fake case cited as “In re Mears, 979 N.W.2d 122 (Iowa 2022).”

Watch out for Doppelgänger hallucinations!

I searched for the fake case title “In re Mears, 979 N.W.2d 122 (Iowa 2022)” cited by Turner to see what came up on Google. What I found was Google hallucinations seeming to “prove” that the AI-generated case title from Turner referred to a real case. Therefore, simply Googling a case title is not sufficient to cross-reference cases, because Google’s AI Overview can also hallucinate. As I have frequently mentioned, it is important for law firms that claim not to use AI to understand that many common and specialist programs now include generative AI that can introduce hallucinations, such as Google, Microsoft Word, Westlaw, and LexisNexis.

First Google Hallucination

The first time, Google's AI Overview hallucinated an answer stating that the case was a real Iowa Supreme Court decision about court-appointed attorney's fees, but the footnotes linked by Google actually pointed to Mears v. State Public Defenders Office (2013). Key Takeaway: just because an LLM puts a footnote next to its claim does not mean the footnote supports the statement.

First Google Hallucination

Second Google Hallucination

I searched for the same case name again later, to see if Google would warn me that the case did not exist. Instead, it created a different hallucinated summary.

The summary and links related to a 2022 Iowa Supreme Court case, Garrison v. New Fashion Pork LLP, No. 21–0652 (Iowa 2022). Key Takeaway: LLMs are not deterministic and may create different outputs even when given the same inputs.

Second Google Hallucination

Perplexity AI’s Comet Browser

Perplexity AI, an AI search engine company, recently released a browser for macOS and Windows to compete with browsers like Chrome, Safari, and Edge. I get a lot of ads for AI products on social media, so I have been bombarded recently with content promoting Comet. To be frank, most of it is tasteless to the point that I think parents and educators should reject this product on principle. They are clearly advertising this product to students (including medical students!), telling them Comet will help them cheat on homework. There isn't even the fig leaf of "AI tutoring" or any educational value.

First Perplexity Comet Hallucination
danger

Perplexity’s advertising of Comet is encouraging academic dishonesty, including in the medical profession. You do not want to live in a future full of doctors who were assigned to watch a 42-minute video of a live Heart Transplant and instead “watched in 30s” with Comet AI. Yes, that is literally in one of the Perplexity Comet ads. Perplexity’s ads are also making false claims that are trivial to disprove, like “Comet is like if ChatGPT and Chrome merged but without hallucinations, trash sources, or ads.” Comet hallucinates like any other large language model (LLM)-powered AI tool.

Comet Browser’s Hallucination

I searched for the fake case title “In re Mears, 979 N.W.2d 122 (Iowa 2022)” cited by Turner in a new installation of Comet. It is important to note that people can “game” these types of searches by conducting searches over and over until the AI makes one mistake, then screenshotting that mistake to make a point. That is not what I’m doing here. This was the very first result from my first search. It was a hallucination that explicitly stated the fake case “is a 2022 Iowa Supreme Court decision,” although this is followed by caveats that cast doubt on whether it really is an existing case:

"In re Mears, 979 N.W.2d 122 (Iowa 2022)" is a 2022 Iowa Supreme Court decision, but the currently available sources do not provide a readily accessible summary, holding, or specific details about the case itself. It appears this citation may pertain to legal doctrines such as cy près or charitable trust law, as suggested by the limited context in search returns, but direct case facts, parties, and the detailed ruling were not found in available summaries or law review discussions. georgialawreview If you need more detailed information, legal databases such as Westlaw, LexisNexis, or the official Iowa Supreme Court opinions archive would provide the official opinion, including the background, holding, and legal reasoning of "In re Mears, 979 N.W.2d 122 (Iowa 2022)".

If you were to follow up on the caveats in the second paragraph, you would learn that the case does not exist. However, this is still a hallucination, because it describes the case as if it exists and does not mention the one relevant source, In re: Turner, which would tell you that it is a citation to a fake case.

Three Ways AI Can Make Things Up. How True But Irrelevant Can Be Harder to Correct Than Pure Nonsense.

· 5 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

More Than One Type of Hallucination

ChatGPT sometimes makes things up. For example, ChatGPT famously made up fictional court cases that were cited by attorneys for the plaintiff in Mata v. Avianca. But totally made up things should be easy to spot if you search for the sources. It’s when there’s a kernel of truth that large language model (LLM) hallucinations can waste the most time for lawyers and judges or small businesses and their customers.

  1. A “Pure Hallucination” is something made up completely with no basis in fact.
  2. A “Hallucinated Summary” has a footnote or other citation referencing a real source, but the LLM’s description of what that source says has little if anything to do with the source.
  3. An “Irrelevant Reference” is when an LLM cites a real source and summarizes it fairly accurately, but the source is not actually relevant to the point it is cited to support. This might be because the information is outdated, because the source only tangentially touches on the same topic, or for other reasons.
info

These examples were derived by actually reading the sources and were not written by LLMs. All of the written content on our website and social media is human-written, unless it is an example of AI-output that is clearly labelled.

danger

AI can help people summarize or rephrase content they know well. But Midwest Frontier AI Consulting strongly encourages AI users not to rely on AI-generated overviews of content they are not already familiar with precisely because of the subtler forms of AI hallucinations described below.

Scenario 1: You Got Your Chocolate In My Case Law

  • Pure Hallucination:

    • The LLM says: “Wonka v. Slugworth clearly states that chocolate recipes are not intellectual property.”
    • In reality: No such case exists.

  • Hallucinated Summary:

    • The LLM says: “NESTLE USA v. DOE clearly states that chocolate recipes are not intellectual property.”
    • In reality: The case involves a chocolate company but is not about intellectual property rights.

  • Irrelevant Reference:

    • The LLM Says: ‘HERSHEY CREAMERY v. HERSHEY CHOCOLATE involved two parties that both owned trademarks to “HERSHEY’S” for ice cream and chocolate, respectively. This supports our assertion that chocolate recipes are not intellectual property.’
    • In reality: The facts of the case do not support the conclusion.

Mata v. Avianca Was Not Mainly About ChatGPT

· 11 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Mata v. Avianca: The First ChatGPT Misuse Case

The case Mata v. Avianca was a personal injury lawsuit against an airline in the U.S. District Court for the Southern District of New York (SDNY). However, the reason it became a landmark legal case was not the lawsuit itself, but the sanctions issued against the plaintiff’s lawyers for citing fake legal cases made up by ChatGPT. At least that was the popular version of the story emphasized by some reports. The reality, according to the judge’s opinion related to the sanctions, is that the penalty was about the attorneys doubling down on their misuse of AI in an attempt to conceal it. They had several opportunities to admit their fault and come clean (page 2, Mata v. Avianca, Inc., No. 1:2022cv01461 - Document 54 (S.D.N.Y. 2023)).

Take this New York Times headline: “A Man Sued Avianca Airline. His Lawyer Used ChatGPT” (May 27, 2023). This article, written before the sanctions hearing in June 2023, focused on the ChatGPT-gone-wrong angle. By contrast, Sarah Isgur of the Advisory Opinions podcast had a very good breakdown noting the attorneys’ responsibility and the back-and-forth that preceded the sanctions (episode “Excessive Fines and Strange Bedfellows,” May 31, 2023). However, in that podcast episode the hosts questioned the utility of ChatGPT for legal research and said “that is what Lexis and Westlaw are for,” but as of 2025 both tools have added AI features, including use of OpenAI’s GPT large language models (LLMs).[^1]

caution

I am not an attorney and the opinions expressed in this article should not be construed as legal advice.

A surrealist pattern of repeated dreamers hallucinating about the law and airplanes.

Hallucinating cases about airlines.

Why Care? Our Firm Doesn’t Use AI

Before I get into the details of the case, I want to point out that only one attorney directly used AI. It was his first time using ChatGPT. But another attorney and the law firm also got in trouble. It only takes one person using AI without proper training and without an AI policy to harm the firm. It seems that one of the drivers for AI use was that access to federal research tools was too expensive or unavailable, a problem that may be more common for solo and smaller firms.

Partner of Levidow, Levidow & Oberman: “We regret what's occurred. We practice primarily in state court, and Fast Case has been enough. There was a billing error and we did not have Federal access.” Matthew Russell Lee’s Newsletter Substack

You might say, “Fine! We just won’t use AI then.” Do you have a written policy stating that? Do you really not use AI? I have two simple questions:

  1. Do you have Microsoft Office? (then you probably have Office 365 Copilot)
  2. Do you search for things on Google? (then you probably see the AI Overview)

If the answer to either is yes (extremely likely), are you taking measures to avoid using these AI features? If not, how can you say you don’t use AI? Simply put, avoiding AI is not the default option. It requires conscious effort to avoid the features being added to existing software, from word processors to specialty legal research tools.

Overview of Fake Citations

The lawyers submitted hallucinated cases complete with the courts and judges who supposedly issued them, hallucinated docket numbers, and made-up dates.

Hallucination Scoring & Old AP Test Scoring

· 2 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Lack of Guessing Penalties: The Source and Solution to Hallucination?

Language models like GPT-5 “are optimized to be good test-takers, and guessing when uncertain improves test performance” (Why Language Models Hallucinate). This is the key to AI hallucinations, according to a new research paper from OpenAI, the maker of ChatGPT, published on September 4, 2025. I think this explanation has merit, although it doesn't seem to explain cases where large language models (LLMs) have access to sources containing the correct answers and still summarize them incorrectly.

The most interesting point to me in the paper is its call for changing how AI benchmarks score different models, penalizing wrong guesses. This reminded me of how, for most multiple-choice tests in school, you should choose a random answer rather than leave the question blank. If the answers are ABCD, you have a 25% chance of getting the answer right, and guessing always has a positive expected value, because a right answer earns one point while a wrong answer costs nothing more than a blank would. However, Advanced Placement (AP) tests used to deduct points for wrong answers. When I went to find a source for my recollection about AP test scoring, I learned that this policy had changed shortly after I graduated high school (“AP creates penalties for not guessing,” July 2010). So it appears that penalizing guessing is just as unpopular for human benchmarks as for AI benchmarks. I, for one, am in favor of wrong-guess penalties for both.
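The arithmetic behind this is easy to verify. Here is a minimal sketch (my own illustration, not taken from the OpenAI paper) comparing the expected value of a blind guess under modern no-penalty scoring and under a deduction-for-wrong-answers rule like the old AP one (as I recall it, roughly a quarter-point deduction per wrong answer on five-choice questions):

```python
def guess_ev(num_choices: int, wrong_penalty: float) -> float:
    """Expected score of one blind guess: +1 point if right, -wrong_penalty if wrong."""
    p_right = 1 / num_choices
    return p_right * 1.0 - (1 - p_right) * wrong_penalty

# Modern scoring: no deduction, so a blind ABCD guess always has positive EV.
print(guess_ev(4, 0.0))    # → 0.25

# Old-style AP scoring: 1/4-point deduction on a five-choice question
# drives the expected value of a blind guess to (approximately) zero.
print(guess_ev(5, 0.25))
```

The same function makes the benchmark argument concrete: with `wrong_penalty = 0`, a model (or student) is always rewarded for guessing instead of abstaining; any positive penalty makes confident wrong answers worse than saying “I don’t know.”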

Three Ways Customers Learn About Your Business from Google AI (and what you can do about it)

· 5 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

If you are a small business owner who wants nothing to do with AI, I appreciate that decision. Midwest Frontier AI Consulting supports business owners who want to use AI responsibly and business owners who want to make an informed decision not to use AI. However, you still need to learn about generative AI, even if only to avoid it and mitigate the negative effects.

Your customers are using AI to learn about your business, often without even realizing they are using AI. “Google” has been a verb for over two decades now, according to Wikipedia, but “googling something” hasn’t stayed the same. AI tools have moved into familiar areas like Google Search and Google Maps. Here are three ways your customers may be using generative AI to learn about your business from Google’s AI tools, and what you can do about it.

Google’s Gemini AI attempts to summarize website information and provide an overview. However, the AI summary can introduce errors ("hallucinations") that mislead customers. For example, a local Missouri pizzeria was inundated with customer complaints about “updated [sic, appears they meant to say ‘outdated’] or false information about our daily specials” described by Google’s AI Overview (Pizzeria’s Facebook Post).

What Not to Do

Don’t call the information “fake” if it is really information taken out of context. For example, the pizzeria’s Facebook page shows they offer a deal for a large pizza for the price of a small pizza, but only on Wednesdays (outdated information). It is still legitimate to criticize the AI and it is still legitimate to tell customers who want the deal on another day of the week that the offer is only valid on Wednesdays. However, claiming the offer is “made up by the AI” will probably not calm down a customer who may then go to the business’s Facebook profile and see several posts about similar deals (but only on Wednesdays).

Don’t simply tell customers “Please don’t use Google AI.” The customers probably do not realize they are using AI at all. The AI Overview appears at the top of Google Search. Most people probably think they are “just googling it” like they always have and don’t realize the AI features have been added in. So warning them not to use something they didn’t opt into and aren’t actively aware of using is not going to help the situation.

What To Do

  • AI-focused solutions. If AI is going to mix things up like this, you can try to:

    • Delete old posts about deals that are not active, or make temporary posts, so that AI hopefully won’t include the information in summaries later.
    • Word posts carefully with AI in mind. Maybe “only on Wednesday” would be better than “EVERY Wednesday.” Spell out something that would be obvious to a human but not necessarily to an AI, like “not valid on any other day of the week.”

  • Customer-focused solutions. Ultimately, it is hard to predict how the AI will act, so you will need to prepare for potentially angry customers:

    • Train staff on how to handle AI-created customer confusion (or think about how you yourself will talk to customers about it).
    • Post signs regarding specials and preempt some AI-created confusion.