The AI "Writing Help" Trap: A Game to Show How AI Can Silently Alter Legal Documents

I made a game to help attorneys understand the situation of the attorney in Kosel Equity v. MacGregor, a recent Connecticut case: using other tools for research,1 then generative AI for writing assistance, and how it may be difficult to spot when “AI…intuitively [makes] changes to the brief.” This is an interactive demonstration of the unexpected risks of asking generative AI to "clean up" a legal draft.

I actually used generative AI to help me write the code for this game, which puts me at some of the same risks, since the text is mixed in with the code. So, I reviewed the output and manually edited the resulting files to remove things when the coding agent had gone beyond what I had told it to say. This proved to be pretty frustrating at times, but was a helpful meta-lesson from the project. If you do spot errors in the game, please feel free to email.

Game Scenario: Make the Draft Motion “Better”?

I frequently warn, including in my Ethics CLE, that LLMs can introduce significant errors even though the “cleaned up” draft might look better at first glance.

The game scenario is simple and realistic: an attorney has a motion for summary judgment with some formatting issues, extra spacing, and a few typos. The attorney pastes the draft into an AI assistant, says “clean this up,” and pastes the result back into their word processor. Or the attorney uses an integrated LLM, like Copilot in Microsoft Word, Apple Pages, or Grammarly, to change the draft in place. Or a paralegal, an intern, or someone else the attorney supervises does this without the attorney’s knowledge.

As you will see in the game, the AI does clean up the draft. But it also silently makes material changes that are not correct.

How to Play (Scroll Down for Game)

  • First, you’ll see the “before” draft motion.
  • Second, you’ll “fix” it with AI and see the “after” version of the draft.
  • Third, you’ll have an opportunity to try to click the spots in the motion where the AI made changes it shouldn’t have.
  • Then you’ll get an explanation of what went wrong and suggestions for a different approach.

Real-World Parallel: Kosel Equity v. MacGregor (Connecticut Supreme Court, 2026)

The scenario portrayed in this game is not hypothetical. I discussed a similar situation in my Ethics CLE “AI Gone Wrong in the Midwest,” with content from December 2025 relating to an expert witness, and specifically warned about these risks.

In February 2026, the Connecticut Supreme Court ordered counsel for the appellant in Kosel Equity, LLC v. MacGregor to respond to questions about AI use after an errata sheet was filed to correct errors in the appellant's brief.

Counsel for the Appellant used Lexis for the legal research in the drafting of the brief. After the initial brief was drafted, Counsel used ChatGPT to assist in the organization and formatting of the content of the brief. This assisted with analyzing the brief to avoid duplication of arguments. After the initial drafting, I used AI to further assist with the organization, formatting and refinement of the brief, in particular, to assist with compliance with word count restrictions. It was not used as a substitute for legal research or an alternative to Counsel’s own work product.

MEMORANDUM, February 19, 2026

And:

AI was also used to assist in reviewing the content of the brief in particular to comply with the word count restrictions. The errors identified in the errata sheet were corrected by manually checking the brief’s quotations and formatting against the underlying sources. Unfortunately, Counsel did not notice that AI had intuitively made changes to the brief prior to filing.

MEMORANDUM, February 19, 2026

Try the Interactive Demo


The Setup

The demonstration uses a fictional motion for summary judgment in Illinois state court — Crestline Holdings, LLC v. Thornton Industrial Partners. Note that the content of this draft motion was itself AI-generated and is not intended to be related to any real cases or parties. The original draft has the kinds of minor issues you'd expect in an early draft: a misspelled "trhough," inconsistent spacing, a line break in the middle of a sentence, "Judgement" instead of "Judgment." Real errors that an AI can genuinely help fix.

The problem is what the AI does beyond those mechanical fixes.

The Five Categories of AI Errors

The demonstration walks through five categories of errors. Each error is based on a real hallucination scenario I have seen in my research and tests. These types of errors would have real consequences for an attorney, such as a potential Order to Show Cause for sanctions under Federal Rule of Civil Procedure 11 or analogous state-level rules. See the “AI & Law: History of AI Misuse” page for more.

1. Incorrect Court Citation

The original draft cited a case using the abbreviation “CA.” In the context of the First District, “CA” should be read as Court of Appeals rather than “California,” and this kind of contextual reading is an area where LLMs outperform raw string search. However, I have still seen situations, replicated in this example, where an LLM interprets “CA” as the postal abbreviation for California and expands it accordingly. The game shows this as the citation being improperly changed to “N.D. Cal.,” i.e., the United States District Court for the Northern District of California.

LLMs also tend to be biased toward larger jurisdictions, like California, probably at least in part because of the volume of training data available. For example, in Verbalized Sampling (2025), the authors prompted LLMs to “Name a US state.” With naive prompting and no prompt engineering, California was the answer 95% of the time, Texas 4.5%, Ohio 0.2%, and the remainder were negligible.
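A rough sketch of how a tally like that could be replicated, assuming a hypothetical ask_llm() helper wired to whatever model API you want to test:

```python
from collections import Counter

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    raise NotImplementedError("wire this to the LLM under test")

# Ask the same naive question many times and tally the answers to
# estimate how skewed the model's default distribution is.
N = 200
counts = Counter(ask_llm("Name a US state.").strip() for _ in range(N))
for state, n in counts.most_common(5):
    print(f"{state}: {n / N:.1%}")
```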

I am planning to conduct tests on how this California bias may impact legal documents like motions and contracts.

California CLE if you aren’t in California at the time

As an aside, I have gotten interest from attorneys based in California. I have not yet applied for accreditation in California. However, Iowa is an Approved Jurisdiction according to the State Bar of California website.

Further, according to the State Bar of California:

A California attorney can claim California MCLE credit for education activities attended/taken outside California, provided that:

  • the attorney is outside California when attending/taking the activity
  • the activity is the type of activity that can be approved for California MCLE credit;
  • the activity is approved by an Approved Jurisdiction.

So based on this information, it is my understanding that a California attorney would be able to take my Iowa-accredited CLE (which includes 1 hour of Ethics), provided the attorney is outside of California when taking the on-demand course (e.g., while traveling to another state for work).

2. Hallucinated First Name

The motion referred only to "Mr. Thornton.” There is no given name in the record. The LLM added a first name, “James,” on its own. I have seen hallucinations like this, which are not based on the cited sources (called “ungrounded hallucinations”), in tests such as with Google AI Overviews and AI Mode.

Claude has been known to favor certain surnames in generated text, e.g., Chen and Martinez.

3. Altered Verbatim Testimony

A witness's deposition testimony was quoted verbatim as: "He had already signed the contract before anyone else arrived."

In this scenario, the AI changed “contract” (singular) to “contracts” (plural). Subtle edits like this are likely to be missed when you skim, but they nevertheless change the meaning of the text (i.e., is there one contract or several?).
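For readers who want a mechanical safety net, here is a minimal Python sketch (my own illustration, not code from the game) showing how a word-level diff surfaces exactly this kind of one-word substitution:

```python
import difflib

original = "He had already signed the contract before anyone else arrived."
ai_edited = "He had already signed the contracts before anyone else arrived."

# Diff the quotes word by word: lines starting with "-" were removed
# from the original, lines starting with "+" were added by the edit.
for token in difflib.ndiff(original.split(), ai_edited.split()):
    if token.startswith(("-", "+")):
        print(token)

# Prints:
# - contract
# + contracts
```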

As a real example, I tested how an LLM would perform with reformatting my CLE course material from notes to slides. This is material I know well, and the error was still subtle enough that it could have been missed.

I had two separate statements:

  1. opposing counsel identified a fake case citation; and
  2. the judge described that citation as “a mutant citation.”

The LLM combined them into a single bullet: “opposing counsel identified ‘mutant citation.’” Wrong speaker, wrong number. But if you were skimming dozens of pages, you might miss this, especially if you are not specifically testing the LLM’s output for errors.

4. Hallucinated Academic Title

The original draft cited only (Chen & Wallace, 2018). The AI fabricated a plausible-sounding title: “Fiduciary Obligations and Corporate Liability: A Comparative Analysis.” In Kohls v. Ellison (D. Minn. 2025), the expert witness noted that placeholder [CITE] markers left in the draft were likely interpreted by ChatGPT as instructions to generate citations, when they were intended only as reminders for the author to insert real ones. I cover this case in my Ethics CLE.

When the expert witness then pasted the edited text from ChatGPT back into the word processor, the plausible-yet-fake academic titles escaped further notice until opposing counsel identified them and moved to exclude the expert’s testimony. Based on the explanation provided by that expert witness, the cause was not AI research but AI editing after the fact. This is also what apparently happened in the recent Connecticut case.

5. Altered Legal Term of Art

“Grossly negligent” is a legal term of art. LLMs may be opinionated about rewording things to sound “more natural” for the perceived audience, but terms of art cannot be substituted like simple synonyms. Unintentionally using something like “very careless” instead would be a major error.
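One cheap guardrail is to list the terms of art in a document and verify that they survive the edit verbatim. A minimal sketch (my own illustration; the term list is hypothetical):

```python
# Terms of art that must survive any AI "cleanup" verbatim.
PROTECTED_TERMS = ["grossly negligent", "summary judgment"]

def lost_terms(original: str, edited: str) -> list[str]:
    """Return protected terms present in the original but missing from the edit."""
    return [
        term for term in PROTECTED_TERMS
        if term in original.lower() and term not in edited.lower()
    ]

print(lost_terms(
    "Defendant was grossly negligent in maintaining the premises.",
    "Defendant was very careless in maintaining the premises.",
))  # ['grossly negligent']
```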

The Wrong Way vs. The Better Way

My game scenario attempts to contrast two approaches:

Wrong way: Paste the draft into an AI, tell it to “clean this up” or “make this sound more professional,” then accept the output wholesale. The attorney sees tidier formatting at a glance and assumes the job is done. Or the attorney may catch an erroneous change or two, but if that does not trigger a more thorough redline comparison between versions, the damage may still be done.

Better way: Ask the AI to return an itemized list of proposed changes for the attorney to review before accepting. This approach surfaces both the mechanical fixes and the substantive changes, giving the attorney the opportunity to accept the typo corrections while flagging and rejecting the inappropriate suggested content changes.

Even with the better approach, some AI suggestions still require careful scrutiny. The demonstration's AI suggestion list correctly flagged its incomplete citation inference as tentative — but still proposed the wrong answer.
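One way to guarantee that redline comparison happens, whichever tool made the edits, is to diff the before and after versions of the draft prior to filing. A minimal sketch, assuming the two versions have been saved as plain-text files (the file names here are hypothetical):

```python
import difflib
from pathlib import Path

# Hypothetical file names for the pre-AI and post-AI versions of the draft.
before = Path("draft_original.txt").read_text().splitlines()
after = Path("draft_ai_cleaned.txt").read_text().splitlines()

# unified_diff prints every changed line with surrounding context,
# so a substantive edit cannot hide behind cleaner formatting.
for line in difflib.unified_diff(
    before, after, fromfile="original", tofile="ai_cleaned", lineterm=""
):
    print(line)
```

A word processor’s built-in compare-documents feature accomplishes the same thing; the point is that the comparison is systematic rather than a skim.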

Key Takeaway

Using LLMs for copyediting can introduce meaningful errors that may be difficult to detect on casual review. The fixes look right. The formatting is cleaner and the typographical errors are gone. But the AI may have changed other details it wasn’t supposed to.

Given this risk, I have two recommendations and one additional warning:

  • Human author is first in and last out: first in, so that the direction is based on your knowledge and judgment; last out, so that these types of errors do not go unnoticed.
  • Get suggestions, not direct rewrites: ask the LLM to identify errors in grammar, spelling, or formatting and return a list of suggested changes (see the sketch after this list). This ensures that every change is reviewed by a human before it enters the draft.
  • Don’t fall for “The LLMs will just get better.” I often reframe this for audiences when I give talks or provide training. Instead of saying LLMs are getting “better,” say they are getting “more capable.” This reframes how we think about risk. GenAI tools that are more capable may make fewer mistakes, but their errors may also be more convincing. For example, instead of frequently hallucinating obviously wrong case citations, they may hallucinate citations that are only a few digits off from the reporter citations of real cases in the same jurisdiction. In some ways, that’s an incredible capability! But it is not “better” from the perspective of the attorney doing the proofreading.
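To make the “suggestions, not direct rewrites” recommendation concrete, here is a minimal sketch of what reviewing an itemized change list could look like. The suggestion data below is hypothetical; in practice it would come from prompting the LLM to return proposed edits rather than a rewritten draft:

```python
from pathlib import Path

# Hypothetical suggestions an LLM might return when asked for an
# itemized list of proposed edits instead of a rewritten draft.
suggestions = [
    {"find": "trhough", "replace": "through", "reason": "spelling"},
    {"find": "Judgement", "replace": "Judgment", "reason": "spelling"},
    # A substantive change smuggled in as "grammar" -- the kind of
    # suggestion the reviewing attorney should reject.
    {"find": "the contract", "replace": "the contracts", "reason": "grammar"},
]

draft = Path("draft_original.txt").read_text()
for s in suggestions:
    answer = input(f'Change "{s["find"]}" to "{s["replace"]}" ({s["reason"]})? [y/n] ')
    if answer.strip().lower() == "y":
        draft = draft.replace(s["find"], s["replace"])

Path("draft_reviewed.txt").write_text(draft)
```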

For more on convincing hallucinations, see this blog post.


AI Gone Wrong in the Midwest

Check out my CLE options and check the new map to see if your state recognizes Iowa-accredited on-demand CLE. I am also working on accreditation in additional states. Learn more here.

Footnotes

  1. It is possible for LLM-powered AI features in specialist legal tools like Lexis and Westlaw to hallucinate, too. However, I am taking the explanation as given that the ChatGPT writing assistance is what introduced the errors, as this is a plausible scenario. The point holds that you could have “done everything right” on the research side and then have the AI mess up your draft afterwards.
