
8 posts tagged with "Gemini"

Discussion of Google's chatbot and LLM Gemini.


Does AI Erode Legal Reasoning? A UMN Law Study Finds That It Did Not For Certain Tasks, With Advice on Specific Use

· 14 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC
Disclosure

I provided feedback on an earlier draft of this article and am thanked in the introduction. I will aim to be fair and candid.

The main concern of the paper (and what I gather to be a focus area for years to come) is the cognitive impact of using generative AI for legal tasks. This requires both short-term and long-term studies; the authors are careful to note that negative effects may appear in longer term use of AI, but that in this particular study, the group that used AI throughout outperformed the group that only had access at the end.

This particular study is a short-term, randomized controlled trial (RCT), such as you might see more frequently in medicine (indeed, one author speculated that the use of that term may have caused public access issues for the article initially on X/Twitter).

I did not personally know Daniel Schwarcz, one of the University of Minnesota Law Professors, prior to reaching out to provide feedback on this study. But I knew of him and have a high opinion of his earlier work on cybersecurity and law. For example, Schwarcz had co-authored an excellent paper, “How Privilege Undermines Cybersecurity,” published in the Harvard Journal of Law & Technology in Spring 2023, which my group in the Dept. of Homeland Security’s Public-Private Analytic Exchange Program (AEP) read and cited in our work on Ransomware Attacks on Critical Infrastructure.

Study Design

University of Minnesota Law School professors Nick Bednar, David Cleveland, Allan Erbsen, and Daniel Schwarcz ran a randomized controlled trial involving approximately 100 2L and 3L students. The study was published April 5, 2026 on SSRN: Artificial Intelligence and Human Legal Reasoning. Strictly speaking, the comparison was not an “AI v. Not AI” group as much as an “AI from the outset” group v. an “AI as a final editor only” group.

This experiment used Google’s Gemini 2.5 Pro, which I will discuss in a later section. The experiment is relevant even for attorneys not using Gemini directly, because Gemini is the LLM behind Google AI Overviews and a popular API model in legal AI tools (e.g., Westlaw’s CoCounsel, based on references to “Thomson Reuters AI third-party partners, such as OpenAI and Google…”).

Participants completed four sequential tasks:

  1. Synthesis Task (AI allowed for the AI-exposed group only):

The synthesis task was designed to test whether using AI can help lawyers synthesize legal sources addressing unfamiliar subjects. We cast each participant in the role of a law firm associate who received an email from a partner asking them to summarize a legal rule based exclusively on five supplied sources. The partner explained that: “Your memo should outline the elements of the rule and any exceptions, providing a framework for how a court would approach the legal question. In other words, don’t merely summarize the sources; synthesize them into a summary of the rule that indicates how the elements fit together.” Participants had up to 75 minutes to read the packet and complete the memo. The control group was instructed not to use AI for this task, while the AI-exposed group was instructed to use Gemini 2.5 Pro “to assist you in writing the assignment.”

  2. Comprehension Task (closed book, no AI for either group): six moderately difficult multiple choice questions in ten minutes without access to either the source packet or AI.
  3. Application Task (access to their prior synthesis memo, no AI for either group):

That task presented participants with a follow-up email from the partner who had assigned them the synthesis task, instructing them to write a memo applying their knowledge from the synthesis task to a new set of facts: “identify strengths and weaknesses in the client’s position, recommend arguments that the client should make, and rebut counterarguments.” Participants had up to 60 minutes to complete the memo.

  4. Revision memo (access to prior application memo, AI for both groups): revise the second memo with Gemini in 20 minutes.

Specifically, “[e]ach task related to a problem involving servitudes that burden personal property.”

Figure 1 from the study: density plot showing overall score distribution on the synthesis task. The AI-exposed group (orange) scores markedly higher than the control group (blue); vertical lines show each group's mean.

What They Found

AI helped on the synthesis task: as expected, students who used AI produced better memos and finished faster on the initial task when they had access to AI.

Early AI use did not diminish comprehension

contrary to our preregistered hypothesis, AI exposure at this initial stage did not diminish downstream comprehension of the underlying legal principles. To the contrary, participants who used AI on the synthesis task outperformed the control group on the later application task even when neither group had access to AI.

The full AI group outperformed the control (AI at the end) group. But the authors note that with long-term use, skills may atrophy. They warn that everyone (especially new lawyers) may lose or fail to develop skills if they don't learn

"to sit with a hard question, to trace an argument through its premises, to recognize when doctrine is uncertain and when it is settled" and learn to explain that reasoning to clients and to judges and lawyers, rather than delegating to AI.

"Leveling effect": AI use by high-performing individuals may degrade their work product while improving the work of the lowest performers. It didn't change the overall ranking, but it made results more uniform. The experiment involved new information, not areas of expertise; the authors’ advice, however, was to use AI primarily where your expertise lets you check its work.

What This Means and Connection to My CLE Advice

Like the authors, I was surprised by the results. However, the way the AI was used generally fits with the advice I give in my CLE on how to use AI responsibly.

Current CLE accreditation

Approved for CLE credit in the following states:

  • Iowa: Generative Artificial Intelligence Risks and Uses for Law Firms (Activity ID #437570) and AI Gone Wrong in the Midwest (Ethics) (Activity ID #437573). 1 hour general and 1 hour ethics.
  • Illinois: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). AI Gone Wrong in the Midwest also received approval for Professional Responsibility credit.
  • Virginia: Generative Artificial Intelligence Risks and Uses for Law Firms. The general course is approved. The Ethics course (AI Gone Wrong in the Midwest) application is still pending.
  • Kansas: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). 1 hour general and 1 hour ethics.
  • Nebraska: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). 1 hour general and 1 hour ethics (called "Professional Responsibility" in Nebraska).
  • North Carolina: Generative Artificial Intelligence Risks and Uses for Law Firms and AI Gone Wrong in the Midwest (Ethics). Generative Artificial Intelligence Risks and Uses for Law Firms is approved for Technology credit; AI Gone Wrong in the Midwest is approved for Ethics credit.

We have applied for accreditation in Minnesota and for the remaining Virginia course. As accreditation is approved, we will work with our LMS provider, CLE Hero, to post the updated information and certificates on the website.

This block updates automatically as the list of CLE accreditation states changes.

Summarization as Triage, Not Replacement for Reading

In my CLE “Generative Artificial Intelligence Risks and Uses for Law Firms,” I note that you can use AI to summarize documents for triage when you have limited time. The students in the synthesis scenario only had 75 minutes to read and synthesize 5 documents.

However, I also warn that reading an AI’s summary of a document is not the same as having read the document, especially due to hallucinations. Hallucinations are not limited to making up fake case citations; they may also include fake quotations and improper summarization of the holdings of cases, for example.

Experts Should Check AI Output

I note throughout my CLE and the public talks that I give, such as my recent talk with ACAMS DC (you can watch a recording here), that the person most familiar with the material should check the AI output.

For reports, this means if AI is used to write an executive summary, it should be a draft executive summary that is then reviewed and edited by the original author of the longer piece. It should not be a summary generated by a lazy reader who can’t be bothered to read the full document. The former can root out hallucinations and distortions of the writer’s intent. However, I would warn that even this use could shape the writer’s focus by making them highlight parts of the paper other than what they would have chosen had they written the executive summary from scratch.

Experts should check the output. AI can be very wordy and very persuasive. It can write things that appear to be correct. The user should not be learning something new when they are reviewing AI output, because the AI may persuade them. I warn in my CLE that sometimes AI is so persuasive, its confident hallucinations may even cause you to second-guess yourself about something you know well.

Warning About “Writing Help” at the End and Don’t Grab AI as a Last Resort

The authors recommend:

“A final principle suggested by our experimental results is that lawyers should avoid using AI to complete legal tasks under artificially tight time constraints or when cognitively fatigued.”

I frequently note that attorneys should learn about generative AI risks and responsible use now, even if they do not currently use it. Don’t wait until you feel like time pressure is forcing you to bail yourself out with AI:

  • Did not go well for an attorney in Colorado who used AI and blamed it on an intern in 2023; result: temporary suspension.
  • Did not go well for attorneys in New York in 2023 or an attorney in Iowa in 2025 who used AI in lieu of access to paid legal databases; result: monetary sanctions.
  • Did not go well for an AUSA with a 30-year career in North Carolina who reportedly used AI to “catch up” on a filing in 2026; result: resigned from office.

Empirically, the study found that using AI at the end under tight time pressure led “in many cases [to] a modest deterioration.” I warn in my CLE courses, especially in “AI Gone Wrong in the Midwest,” that even using AI just for “writing help” at the end of a task can be risky. This has caused problems for attorneys and expert witnesses. You can play an interactive demonstration of that risk and read more about it in The AI "Writing Help" Trap.

Gemini and Outdated Models in Academic Studies

This study used Gemini 2.5 Pro, but by late Fall 2025 the current Gemini model was Gemini 3 Pro, and now there is Gemini 3.1 Pro (preview). Formal academic research seems inevitably to reference outdated GenAI models by the time of publication.

This is not the fault of the researchers. The pace of academic publishing is simply too slow for the pace of release of major model updates. I am not the first to point this problem out, and others, e.g., Ethan Mollick, have commented on it frequently.

The most important reason I mention this is that people will claim “AI fails at X task” and cite a recently published academic paper, but the paper itself almost certainly didn’t test recent models. If you look closely, it might mention “o3” or “Claude Sonnet 4” or “Gemini 2.5.” That does not invalidate the study. However, the claim that “AI can’t do X” may be demonstrably false today with a frontier model, if you simply logged in and tried it for yourself. Lesson: do not rely on academic studies alone for claims about the limits of what AI can do.

For this reason, I think there is a lot more value in studies like the UMN study looking at AI-human interaction. They teach us how AI works in practice and how human users respond to AI use.

Context Window

Gemini was the first major LLM to have an extremely long context window: one million tokens.1 Gemini’s long context window has made it popular for processing large numbers of documents, which might make it seem well-suited for legal purposes.
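As a rough sanity check, you can estimate whether a set of documents fits in a one-million-token window. This minimal sketch assumes the common rule of thumb of about 0.75 English words per token, which is approximate, not exact:

```python
# Rough estimate: does a document set fit in a long context window?
# Assumes ~0.75 English words per token (a rule of thumb, not exact).
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW_TOKENS = 1_000_000

def estimated_tokens(word_count: int) -> int:
    """Convert a word count to an approximate token count."""
    return round(word_count / WORDS_PER_TOKEN)

def fits_in_context(word_counts: list[int]) -> bool:
    """Check whether the combined documents fit in the window."""
    return sum(estimated_tokens(w) for w in word_counts) <= CONTEXT_WINDOW_TOKENS

# War and Peace is roughly 587,000 words: ~783,000 tokens, so it fits alone.
print(estimated_tokens(587_000))
print(fits_in_context([587_000, 200_000]))  # two large documents together
```

In practice, the provider's own token counter is the authoritative source; this back-of-the-envelope math is only for triage.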

Where Gemini Has Performed Well

Gemini is a very capable model on some tests, although it fails on others. For example, Gemini has been the model most capable of identifying coded allusions to sovereign citizen legal ideology among the models I've tested.2 It also currently has the best image editing model in my opinion, and Google has SynthID for provenance verification.3

Gemini 3 was also the first LLM to pass one of my personal benchmarks.4

Where Gemini Has Performed Poorly

However, Gemini also powers Google AI Overviews and Google AI Mode, which have access to Google Search results. Despite access to the best search engine in the world, the LLM-generated summaries can still hallucinate false descriptions of nonexistent cases based on nothing but a bare case citation. They can even produce ungrounded false information that contradicts the Google Search results containing the correct answer, e.g., an OSC chastising an attorney for citing the nonexistent case. This Doppelgänger Hallucination problem has not gone away with Gemini 3.

On an AI research task in the Gemini app, Gemini 2.5 performed worse than ChatGPT. When I ran the test again, Gemini 3 still failed the test, and tried to persuade me that it had provided a comprehensive answer. I have been warning consistently that AI models are not just “getting better,” but rather they are getting “more capable.” One way this manifests itself is in LLMs becoming more persuasive about covering for their own faults.

Even when Gemini performs well, its performance can be “jagged”: impressive on one detail, hallucinating on the next in the same task.5

Bad Data Training Policy

However, I would warn attorneys to carefully consider what they put into the consumer version of Gemini if using it for work purposes, such as the non-public discussion of client position and possible strategy described in the Application Task in this experiment. Google’s Gemini has one of the worst data training opt-outs for major U.S. labs if you are using the consumer version.

Conclusion

I always try to keep up with the latest LLM research, so that I am offering fresh and accurate advice to my clients. That being said, I am also not chasing the latest fads and trends. I focus on core principles around accuracy, end-to-end processes, and a realistic understanding of how human users actually relate to their AI tools. That’s why I really like this study design and encourage the authors to continue with projects like this. It is also why I stand by the advice I provided in my CLE, despite this field of GenAI changing so dramatically every couple of months.

How Midwest Frontier AI Consulting Can Help

That reading informs the governance advice I give to law firms: if you'd like to put it to work for yours, schedule an introductory call.

You can take my CLE on demand on CLE Hero. Find out more details here.

Footnotes

  1. Tokens are words, parts of words, or even single characters: the unit LLMs use when processing text. One million tokens in English is roughly 750,000 words, longer than War and Peace.

  2. I first encountered this issue when writing about the case Thomas v. Pangburn (S.D. Georgia) but have found additional examples for subsequent tests, which I have not written about publicly.

  3. I will be writing about SynthID more in a forthcoming series.

  4. Specifically, a test involving an obscure Moroccan Arabic word. I discuss the utility of personal, non-public benchmarks in my CLE.

  5. When fact-checking a U.S. federal court district map, Gemini correctly flagged the oft-overlooked inclusion of Yellowstone National Park in the District of Wyoming and Tenth Circuit (rather than the adjacent Montana/Idaho districts in the Ninth). That subtle catch was impressive, but in the same response it hallucinated a second "error" that was clearly not an error.

Yes, Claude Code is Amazing. It Also Still Hallucinates. Both Facts Are Important. My Christmas Map Project with Opus 4.5.

· 13 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

This first week of January, the general feeling is very much everyone showing off the winter-vacation vibe-coding projects they cooked up in Claude Code. Claude Code itself isn't new, but with Opus 4.5 being so much more powerful, something just clicked for a lot of people (myself included). For me, it turned a lot of "when I have a couple days" projects into "well, that's done, let's do another."

I am mainly going to describe in this post how I updated the map for my website, along with the hallucinations I saw along the way. I'll also talk about how prior programming experience and domain expertise in geographic information systems (GIS) helped with dealing with these hallucinations.

But first, I wanted to tick off a few other projects I did recently, just since my end of 2025 post.

  • I updated my transcription tool to support many more file types than just MP3 and added a GUI.
  • I got Claude Code to completely modernize Taprats, a geometric art Java program from Craig S. Kaplan. It appears to work just like the original so far, but I'll test it more before writing about it.
  • I built a local LLM spoiler-free summarizer of classic books. It increments to the chapter you left off on.

And more stuff. It's very exciting. I get why people are worked up about Claude Code.

But that's why it's important to be reminded of hallucinations. Not to dunk on Claude Code, but to keep people grounded and maintain skepticism of AI outputs. You still have to check.

Safety First

I do not dangerously skip permissions. I know it can be exciting to get more out of AI agents. But the more agency you give it, the more harm it can do when it either goes off the rails or gets prompt injected to be a double-agent threat.

Claude's Hallucinations

  • Opus 4.5 hallucinated that there were two federal districts in South Carolina to fix an undercount.
  • Mixing up same-name counties (not exactly a hallucination, actually a common human error).
  • Claude removed Yellowstone National Park, a few military bases and a prison from the map (rather than shifting district borders from one district to another).
  • "Iowa Supreme Court Attorney Disciplinary Board" shortened to "Iowa Supreme Court," making it sound like an Iowa Supreme Court case.
  • I previously tried to use the tigris GIS package in R as a source for a base layer of U.S. District Courts, but Opus 4.5 hallucinated a court_districts() function (this was not in Claude Code).

The South Carolina Counting Hallucination

I used Claude Code to build the Districts layer from counties and states based on their statutory definition.

Claude Code with Opus 4.5 didn't initially hallucinate about the District of South Carolina. Rather, when I went back to make some edits and asked Claude Code in a new session to check the work in that layer, it counted and said there should be 94 districts, but there were only 91. The actual cause of the error was that the Northern Mariana Islands, Virgin Islands, and Guam were excluded from the map.

Claude said "let me fix that" and started making changes. Rather than identifying the real source of the undercount, Claude treated it as just a number to fix. It tried to make up the difference by splitting existing districts into new ones that didn't exist.

South Carolina district hallucination

Claude split South Carolina in two and started to create a fictitious "Eastern District" and "Western District," which do not exist. But if you just wanted a map that looked nice, without actual familiarity with the data, you might go along with that hallucination. It could be very persuasive. In fact, the original version, with just the District of South Carolina, was correct. South Carolina has only one district.
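A more reliable pattern for this kind of check is a validation step that names which expected districts are missing, rather than letting the model "fix" a bare count. A minimal sketch with toy data (only a handful of the 94 districts, for illustration):

```python
# Diagnose an undercount by naming the missing items, not just counting them.
# Toy data for illustration; a real check would compare all 94 districts.
EXPECTED = {
    "District of South Carolina",
    "District of the Virgin Islands",
    "District of Guam",
    "Southern District of Iowa",
}

def missing_districts(layer_districts: set[str]) -> set[str]:
    """Return expected districts absent from the map layer."""
    return EXPECTED - layer_districts

layer = {"District of South Carolina", "Southern District of Iowa"}
# Names the real gap (the territories) instead of inviting a fabricated split.
print(sorted(missing_districts(layer)))
```

Had the session run a check like this, the answer would have pointed at the excluded territories rather than at splitting South Carolina.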

Patchwork Counties

When I had initially created this district map, it looked like a quilt. It was a patchwork of counties wrongly assigned to different districts.

I don't know specifically why different areas were assigned to the wrong districts. I think the primary reason is that there are a lot of same-named counties in different states. So Claude was probably matching on county name alone and kept reassigning those counties to the wrong districts.

For example, Des Moines is in Polk County in Iowa. But there are a lot of Polk counties around the country. So if you're not using the state and county together as the key to match but you're just matching along the single dimension of using the county name, then you would have a lot of collisions. That's something that I'm very familiar with working with GIS.
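The fix is to match on a composite (state, county) key rather than on the county name alone. A minimal sketch, with made-up assignments (the point is the key shape, not the real district table):

```python
# Matching counties to districts: county name alone collides across states.
# District assignments here are illustrative, not authoritative.
assignments = {
    ("Iowa", "Polk"): "Southern District of Iowa",
    ("Minnesota", "Polk"): "District of Minnesota",
    ("Florida", "Polk"): "Middle District of Florida",
}

def district_for(state: str, county: str) -> str:
    """Look up a district by composite key; raises KeyError on a miss."""
    return assignments[(state, county)]

# Unambiguous with the composite key; a county-name-only lookup would
# have three colliding "Polk" entries.
print(district_for("Iowa", "Polk"))
```

Raising `KeyError` on a miss is deliberate: a loud failure on an unknown (state, county) pair beats a silent wrong-district assignment.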

If somebody were not familiar with GIS, they wouldn't really necessarily suspect the reason why, but it would be obvious that the map was wrong.

Since I was able to pretty quickly guess that that might have been the reason, I suggested a fix to Claude. That fixed most of the issues with most of the states.

Uncommon Problems with the Commonwealth of Virginia

One of the issues that was still persistent when I was building the districts from county level was in Virginia. I've actually lived in Virginia, so I was familiar with the city-county distinction. They have independent cities that are separate from the counties if they're sufficiently large and have a legal distinction from the surrounding county. For example, Fairfax City and Fairfax County are distinct things. It's even more confusing, because the school districts go with the counties. Most states don't follow that.

So I had to get Claude Code to wrangle with that. Claude even reviewed the statutory language. I could tell from reading as Claude was "planning" that it considered the Virginia city-county challenge, but it still failed on the initial attempt.

I had to iterate on it multiple times. I had to tell it that it had missed a whole area around Virginia Beach. It had flipped a couple of cities and counties where a city had a similar name to an unrelated county in the other district. Claude just assumed that all counties and cities sharing a name were in the same location and assigned them to the same district. Then it had to go look at where they were actually located and reassign them to the appropriate Eastern or Western District.

But eventually I got to a point where it had good districts for Virginia.

Wyoming (and Idaho and Montana) and North Carolina

Now there are a couple other weird wrinkles in Wyoming and North Carolina. They don't follow the county boundaries completely.

Wyoming is the only district that includes more than one state. District of Wyoming also includes all of the parts of Idaho and Montana that are in Yellowstone National Park.

For North Carolina, rather than completely following county boundaries, there are a couple of military bases and a prison that are across multiple counties where the boundary follows the lines there rather than the county lines.
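In GIS terms, these wrinkles mean the district layer can't be a pure dissolve of county polygons: the exceptional parcels have to be reassigned to another district, not deleted. A minimal sketch of that override logic, with hypothetical parcel and county names:

```python
# District assignment with explicit overrides for parcels that don't follow
# county lines (Yellowstone, certain NC military bases and a prison).
# The parcel names and the default mapping are illustrative.
DEFAULT_DISTRICT = {
    "Park County, WY": "District of Wyoming",
    "Gallatin County, MT": "District of Montana",
}

# Parcels carved out of their county's default district and MOVED, not removed.
OVERRIDES = {
    "Yellowstone (MT portion)": "District of Wyoming",
    "Yellowstone (ID portion)": "District of Wyoming",
}

def assign(parcel: str, county: str) -> str:
    """Overrides win; otherwise fall back to the county's district."""
    return OVERRIDES.get(parcel, DEFAULT_DISTRICT[county])

print(assign("Yellowstone (MT portion)", "Gallatin County, MT"))
print(assign("some ranch", "Gallatin County, MT"))
```

Claude's first attempt effectively deleted the override parcels; the invariant to enforce is that every parcel still maps to exactly one district after the overrides are applied.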

Initially I ignored those wrinkles. But once the rest of the map was in good shape, I just wanted to see what Claude could do.

I explained those issues and asked Claude Code to see if it could clean those lines up and get a map that reflected those oddities.

It did on the second attempt. But on the first attempt, Claude ended up just cutting out Yellowstone National Park and those military bases and that prison from any district. There were just blank spots where Yellowstone would be, cut out of Idaho, Montana, and Wyoming. Those bases and that prison were just cut out of either the Eastern District or the Middle District of North Carolina.

That was a problem, obviously, because they needed to be shifted from one district to another, not removed from all districts. So I needed to explain more specifically what I wanted Claude to do to fix that. It needed to move the lines, not to remove them entirely from the map. That second attempt got it cleaned up.

District of Wyoming map

Claude Still Saved A Lot of Time Accounting for Hallucinations

And I was still very impressed with Claude doing that. But having familiarity with the data and looking at the output were important.

There's no doubt in my mind after doing all this that Claude saved a tremendous amount of time compared to what I would have had to do with manual GIS workflows to get this kind of a map on a desktop computer.

Then there's another layer of having it be responsive in all the ways that I needed it to be on my website for other users. So it is just tremendous to see how cool that is.

But I do think that domain expertise, familiarity with GIS in the past was still helpful to me, even though I didn't have to do a lot of hands-on work. Just being able to guide Claude through the mistakes that it made and being able to check the output was very helpful. Since it's a map, since the output is visual, there were some things that anyone could see, obviously, that it got wrong. Even if you didn't know why it might have gone wrong, you could tell that the map was wrong. And you might have been able to get to a better finished product by iterating with Claude Code. But you might have also wasted more time than I did with Claude if you hadn't had GIS experience to guide your prompting.

Map Features with Claude Code

Use Github, Try to Keep Formatting Code Separate from Text/Data

I had already written this, and I stand by it.

However, as powerful as Claude Code is, it is also important to use GitHub or something similar for version control. It is also critical to make sure Claude is changing code but not your actual writing.

Claude Code and My Map with Links to Blog Posts About AI Hallucinations Cases

This map is not a map of every AI hallucinations case, but rather every case that I have blogged about so far. Basically, it's federal and state cases where there has been either a strong implication or the direct assertion that there was AI misuse. Many of these cases cite Mata v. Avianca.

Lone Case Markers

If you click on a given case and it's a single case, you'll see:

  • what the case is called
  • the year
  • the jurisdiction
  • the type of case (federal or state), which is also indicated by the color
  • links to related articles where I've talked about that case
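Under the hood, each marker presumably carries those fields as feature properties. A hypothetical sketch of one marker's data and its popup title; the field names and placeholder URL are my own illustration, not the site's actual schema:

```python
# Hypothetical properties for a single case marker on the map.
marker = {
    "name": "Mata v. Avianca",
    "year": 2023,
    "jurisdiction": "S.D.N.Y.",
    "type": "federal",  # federal vs. state, also drives the marker color
    "links": ["https://example.com/blog/mata-v-avianca"],  # placeholder URL
}

def popup_title(m: dict) -> str:
    """Compose the popup title line from the marker fields."""
    return f'{m["name"]} ({m["year"]}, {m["jurisdiction"]})'

print(popup_title(marker))
```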

Clusters, Spiders, and Zooming

Getting the "spiderize" functions to work was the most frustrating part of all of this. I made several prior attempts with Claude Code on Opus 4.5. With the same prompts, this most recent attempt finally just worked on the "first" attempt (of that session). I only tried again as an afterthought once all the other features were done. But previously, I'd wasted a lot of time trying to get it right. So it was both a Claude Code success and a failure. Still, I'm happy with the final result.

Zoom to Mata v. Avianca

If you click those links, it'll jump over either to my company blog or the Substack articles where I've talked about those cases.

Additionally, if a case references other cases that are also on the map, such as Mata v. Avianca, lines will be drawn from the case you clicked to the cases it references or is referenced by. The map will give you a little count summary at the bottom: "Cites three cases" or "cited by" so many cases.

So if we look at Mata v. Avianca, the marker is not by itself on the map. If you look at the eastern United States from the starting zoom level that I'm looking at as I'm writing this, you see a "4." The 4 has a slash of red and orange, meaning there are both federal and state cases.

If you click the 4, the map zooms in. Now there are three over the New York-New Jersey area, and one over Annapolis, Maryland.

Click the three, and the map zooms in further. That splits between one in New Jersey and two in New York.

Click the two, and those two "spider out" because they are both in the same jurisdiction. One is Mata v. Avianca, a 2023 federal district court case from the Southern District of New York, currently cited by seven cases on the map. The other is Park v. Kim, a 2024 case, which is actually a Second Circuit case placed on the map at the same location.

The New Jersey case is In re Cormedics, Inc. Securities Litigation, a 2025 federal case from the District of New Jersey, and one of the cases Senator Grassley discussed when asking judges about their AI misuse.

Other Clusters in Mountain West, Texas

Spider over Iowa

So if you zoom out, it combines nearby cases. If you zoom out far enough, it will combine Wyoming and Colorado, for example, or multiple districts in Texas. But as you zoom in or as you click, it will zoom in further and split those out.

If you look at Iowa, there are five currently, and those will all spider out because they are all in the same location. But then you can click one of the individual ones and get the details.

Iowa spider cluster

District Level

If you hover your mouse over a district, it will tell you how many federal cases in that district have a blog post about them.

Southern District of Iowa hover

Circuit Level

If you toggle off the district boundaries and toggle on the circuit boundaries, with federal cases still toggled on, hovering your mouse over a circuit will give you a count of how many cases in that circuit have a blog post about them.

6th Circuit hover

A List of Odd Projects I Did This Year That Don’t (Yet) Have Dedicated Blog Posts

· 11 min read
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

I started Midwest Frontier AI Consulting in August 2025. I have been writing on my company blog since August 20, 2025 about different AI-related content and since September 13, 2025, I’ve been writing on Substack specifically about AI and Law.

I have described a variety of projects on those blogs, but there are many other projects I did not write about this year.

Misc. Projects, Tests, and Quick Things I Tried in 2025

I spend a lot of time talking about the risks of generative AI, not because I categorically dislike genAI, but because it’s an important technology and to use it responsibly we need to understand the risks.

I don’t always talk about the cool, interesting, and useful things you can do with it. So consider this a catch-up post. Some of these projects were quick and may not warrant a full write-up. Others will get a blog post at a later date, but will be included in this year-end list.

Local AI Models

AI models that run on your own computer.

The simplest setup is local vector search, which just returns the files with the best similarity score to the user’s query.

Or, you can have retrieval augmented generation (RAG), with a chatbot using your documents for context while also generating responses. RAG can help mitigate AI hallucinations, but it does not eliminate them.

I am doing a lot of experimenting with vector search, RAG setups, and local LLMs that I haven’t written about in much detail yet. One recent project involved using Chroma and Ollama to build RAG on top of my blog posts for the year (this runs on my device; it is not a live feature on my website).
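The retrieval half of that setup can be sketched without any dependencies. This toy version ranks documents by bag-of-words cosine similarity; a real stack like a Chroma + Ollama setup would use embedding vectors instead, but the shape of "return the best-scoring document for the query" is the same:

```python
import math
from collections import Counter

# Toy retrieval: rank documents by cosine similarity to a query.
# Bag-of-words stands in for embeddings so the example has no dependencies.
def vectorize(text: str) -> Counter:
    """Crude term-frequency vector over lowercase whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_match(query: str, docs: dict[str, str]) -> str:
    """Return the document id with the best similarity score."""
    q = vectorize(query)
    return max(docs, key=lambda d: cosine(q, vectorize(docs[d])))

docs = {
    "post1": "AI hallucinations in legal filings and sanctions",
    "post2": "building maps with GIS county boundaries",
}
print(top_match("court sanctions for AI hallucinations", docs))
```

Swapping `vectorize` for an embedding call and `docs` for a vector store gives you the retrieval layer of a RAG pipeline; the generation step then receives the top matches as context.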

You can quickly bulk transcribe audio files, such as podcast episodes, with Parakeet, which runs locally on-device.

I was able to transcribe at a rate of about 1-2 minutes per hour of audio on a Mac mini that is a few years old. This is substantially faster than Whisper. I tested on nearly 500 audio files, running for over 16 hours. The quality of the transcription was good, but it did not label speakers.

Parakeet test

Image and Video Generation

You can use ChatGPT for image generation.

In my opinion, ChatGPT is still much better than Google Gemini for text-to-image generation, while I prefer Gemini for editing or transforming reference images. But if you don’t provide specifics in your prompts, you might end up with that generic “ChatGPT comic strip cartoon” style.

Google Gemini’s image generation can produce coherent text, and Gemini can read its own AI-generated writing (even in other languages).

This one surprised me. I got Gemini to generate an image with a line of famous Arabic poetry. Then, I used that image as an input in a new chat to create a video in Veo 3. Gemini correctly “read” the poem from the image. Previously, I have had mixed results with generative AI generating images with Arabic text and I had little success in generative AI interpreting other AI-generated text within images.

Text within AI-generated images has improved significantly since last year.

You can write a “Title card” and have Google’s Veo 3 generate a short scene.

First, I tried this myself. Then my 1st-grader wanted to try it. Despite his handwriting and some spelling errors, Gemini generated the scene correctly.

You can draw topographic / contour lines and have Google Gemini generate images with them.

This one was based on a random hunch I had while looking at mapping Twitter. I drew contour lines on a piece of paper, snapped a picture, and threw that into Gemini. I asked for various scenes like a tropical island, a Warhammer miniatures scene, and realistic 3D mountain scenery.

Contour warhammer

You can animate a piece of art with Google’s Veo 3.

If you take a picture of a drawing, painting, or collage you’ve done, you can animate it.

Synthetic Data Generation

You can generate synthetic data using the “Prompts to create unique data” methodology.

While my original explanation of the Verbalized Sampling paper used the example of kids’ jokes, it can also be used for creating non-existent-yet-correctly-formatted data, which is also useful for long-term Doppelgänger Hallucination testing.
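As a rough illustration (my paraphrase, not the paper’s exact wording), the core of verbalized sampling is just a prompt template that asks for several candidates with self-reported probabilities, so you can sample from the less generic tail:

```python
def verbalized_sampling_prompt(task, n=5):
    # Ask for multiple candidate responses, each with a self-reported
    # probability, instead of one "most likely" answer. Sampling from
    # the lower-probability candidates yields more varied synthetic data.
    return (
        f"Generate {n} responses to the task below.\n"
        f"For each response, include the probability that you would give "
        f"that response.\n"
        f"Task: {task}"
    )

# Hypothetical task, in the spirit of Doppelgänger Hallucination testing
print(verbalized_sampling_prompt(
    "Invent a correctly formatted but non-existent case citation"))
```

The same template works for jokes, questionnaire answers, or any other data where you want variety rather than the single most probable output.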

Create synthetic data for Arabic regional dialects.

Both ChatGPT (GPT-5) and Claude (Sonnet-4.5) did a surprisingly good job simulating answers to a questionnaire to identify different words for things in Arabic dialects, based on city and simulated background (more specific than just country-level dialects).

Quickly Build Software for My Own Use

You can use Claude to build playable games with Artifacts.

My kids and I play board games a lot. Sometimes I try out ideas with Claude Artifacts. One idea that got a lot of use was a three-player version of Reversi I came up with to play with my sons.

Three-player Othello game

You can use Claude Code to write entire programs, clean up old code, or organize files and documents.

I have mainly been using Claude Code to organize and improve my website with customized themes and layouts, and to improve features like the map of AI cases.

However, as powerful as Claude Code is, it is also important to use GitHub or something similar for version control. It is also critical to make sure Claude is changing code but not your actual writing.

I’ll likely be using Claude Code more in the coming year and writing more about my thinking around it. But I will also be explaining my concerns about automation bias related to Claude Code and similar coding tools.

Create a weekly meal planning app for only my family’s exact needs.

Another beauty of Claude Artifacts. The program included my own recipes and I had Claude add and change features to my heart’s content. The program only had to work for me. Once created, the program itself didn’t use AI.

Learning and Teaching My Kids

ChatGPT Can Make Sight Word Flashcards for Kids

I use two programs for flashcards for my kids: Anki and Hashcards. Both are based on spaced repetition. I can ask ChatGPT to make sight word cards formatted for either Anki or Hashcards, which saves a ton of time.

I can tell ChatGPT to add more words to the list, to make the words appropriate for a given age, grade, or reading proficiency level, or to target the words to a certain topic like Minecraft or Star Wars or birds or dinosaurs.

If your kid has written down sight words they want to practice, you can take a picture of that, tell ChatGPT to transcribe it, and turn that into flashcards in the Anki or Hashcards format. My son had rotated the paper and written in four different directions, had some backward or misshapen letters, and had some misspellings, yet ChatGPT successfully transcribed all of the words. And that was before GPT-5.
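Anki’s plain-text import makes the last step easy to see: each line is one note, with fields separated by tabs. A quick sketch with a hypothetical word list and tag (ChatGPT would supply the actual words):

```python
# Hypothetical sight-word list, e.g. transcribed by ChatGPT from a photo
words = ["the", "said", "because", "creeper", "wingspan"]

def to_anki_tsv(words, tag="sight-words"):
    # One note per line: Front <TAB> Back <TAB> tags.
    # For simple sight-word drills, front and back can repeat the word.
    return "\n".join(f"{w}\t{w}\t{tag}" for w in words)

# Write a file that Anki's File > Import can read as tab-separated notes
with open("sight_words.txt", "w", encoding="utf-8") as f:
    f.write(to_anki_tsv(words))
```

In practice you can skip the script entirely and just ask ChatGPT to output the tab-separated lines directly; the point is that the target format is simple text.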

K2 Think is good at explaining math concepts and thinking about numbers.

K2 Think is a math-focused LLM from the United Arab Emirates (not to be confused with the similarly named Chinese Kimi K2 Thinking model). While LLMs can sometimes make baffling mathematical mistakes (e.g., “which is larger, 9.11 or 9.9?”), K2 Think is very effective at rephrasing math concepts for a target audience.

I would recommend K2 Think (k2think.ai) to parents as an option to help explain math homework to kids. It is also helpful for contextualizing anything involving very large or otherwise hard-to-grasp numbers.

However, I would caution against “teaching yourself with AI,” because if you are learning new material you may not be able to identify hallucinations. Drilling on fake information can be worse than not knowing the material at all, and harder to fix later.

Create realistic conversations in multilingual post-colonial contexts, like Franco-Lingala of D.R. Congo or the French-infused Arabic of Morocco or Tunisia.

LLMs even seem capable of recognizing, correctly, that Tunisians would use a heavier mix of French than Moroccans.

I’ve been learning some Lingala from a friend recently, and he’s been consistently surprised at how accurate Claude and ChatGPT generally are at suggesting the right mix of French words with Lingala for natural-sounding speech.

Fact-Checking or Catching Errors

You can use Google Gemini to fact-check maps…with caveats.

I used Gemini to look at the Wikipedia map of U.S. federal court districts, which misses the fact that the District of Wyoming includes the Yellowstone National Park portions of Idaho and Montana. Gemini caught the Yellowstone error, but it also hallucinated an additional error that was not actually present. Gemini said that “Connecticut is colored as part of the 1st Circuit, but it actually belongs to the 2nd Circuit.” If Connecticut had been colored as part of the 1st Circuit, that would indeed have been an error, but the map already had Connecticut correctly colored as part of the 2nd Circuit.

Basically, Gemini could be helpful as a second check to catch things a human reviewer might miss, but it is not reliable enough to be the only check, and a reviewer who accepted everything it said would be misled by its hallucinations. For now, I think the best use is a hybrid model: a knowledgeable human expert getting quick feedback from an AI for improved accuracy.

If an image was generated using Google Gemini, you could search for it on Google Gemini to find the SynthID.

If you think an image you saw on social media may have been generated with AI, you can go on Google Gemini and ask it. It can search for a watermark called “SynthID” that shows the image was generated or edited with Google AI. However, images can be edited to remove the watermark, so a negative response does not necessarily mean the image is “real.” Additionally, this does not generally work with images edited with other AI tools, e.g., if you ask Google but the image is from ChatGPT.

It is especially important to note that this is specifically about a watermarking feature that Google added to image outputs. It is not a general principle about all AI outputs for all AI models. For example, you can’t just put a student’s paper into ChatGPT, ask if the paper was generated by ChatGPT, and get a valid response.

K2 Think is good at sanity checks for numbers.

For example, there was a recent controversy about a statistic in the book Empire of AI. Andy Masley noted that the per capita water usage rates for a Peruvian city were implausibly low (this error appears to have come from the original source and not from Karen Hao, the author of Empire of AI, who corrected it when it was brought to her attention). Masley gave an example from South Africa of 50 L/person for an extremely severe water rationing situation, and noted that the Peruvian stats implied much less, likely due to a unit error mixing up liters and cubic meters.

When I entered the numbers from Hao’s tweet showing the figures from the primary source in Peru, K2 Think flagged the same issue: a likely liters/cubic-meters mix-up. Bottom line: you could potentially use a model like K2 Think as a “sanity check” on unusual numbers to spot errors like this.
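The arithmetic behind that sanity check is simple enough to script. Here is a sketch with a hypothetical reported figure; the ~50 L/person/day threshold comes from Masley’s severe-rationing example, and the real Peruvian numbers are in the primary source:

```python
LITERS_PER_CUBIC_METER = 1000
SEVERE_RATIONING_FLOOR = 50.0   # L/person/day under extreme rationing

def check_daily_water_use(value, unit="L"):
    # Normalize to liters per person per day, then compare against the
    # severe-rationing floor: anything far below it suggests a unit error.
    liters = value * LITERS_PER_CUBIC_METER if unit == "m3" else value
    if liters < SEVERE_RATIONING_FLOOR:
        return (f"{liters:g} L/person/day is implausibly low; "
                f"if the source meant m3, that would be "
                f"{liters * LITERS_PER_CUBIC_METER:g} L/person/day")
    return f"{liters:g} L/person/day is plausible"

print(check_daily_water_use(0.15))   # likely a liters/cubic-meters mix-up
```

A value like 0.15 read as liters per day is less than a single toilet flush, while the same figure read as cubic meters is an ordinary municipal rate, which is exactly the kind of discrepancy the LLM noticed.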

Moroccan Thanksgiving Pumpkin Pie Spice Test: Opus 4.5 and Gemini 3 Released Just In Time to Pass One of My Personal Benchmark Questions

· 7 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

It’s almost Thanksgiving, which is a fitting time for this story with the new LLM releases from Google and Anthropic. PROMPT: I need to make pumpkin pie in Meknes, Morocco. What word do I need to say verbally in the souq to buy allspice there? Respond only with that word in Arabic and transliteration

Apple pie bites with Moroccan flavors

Apple pie bites and Moroccan balgha (pointed shoes)

info

This is not a very elaborate “benchmark,” but in defense of this, neither is Simon Willison’s Pelican on a Bicycle. Yet that was influential enough for Google to reference it during the release of Gemini 3.

Gemini 3 Pro was the Clear Winner on This Test (Until Opus 4.5 Came Out)

Recently, I tested the newly-released Gemini 3 Pro against ChatGPT-5.1 and Claude Sonnet 4.5 to see which model or models could tell me the word. Then Claude Opus 4.5 came out. I added a few more models to the test for good measure.

  • Gemini 3 Pro got the right answer AND it followed my instructions to answer my question with only the correct word.
  • ChatGPT-5.1 almost got the word (missing some letters), AND it rambled on for several paragraphs despite my instructions to only answer with the word and nothing else.
  • Claude Sonnet 4.5 answered with a common Arabic term for allspice, but not the correct Moroccan Arabic term; when I said “nope, try again” it made a similar error to ChatGPT and almost got the word (missing some letters). Like Gemini and unlike ChatGPT, Claude followed the instructions to answer with only the word.
  • Since Claude Opus 4.5 just came out, I ran the test with Opus, which answered correctly AND followed the instructions to answer with only the word, just like Gemini 3 Pro had done.
info

I tested GPT, Claude, and Gemini LLMs because they are used in legal research tools in addition to being popular general-purpose chatbots. I also tested Grok Expert and Grok 4.1 Thinking for comparison; both followed the instructions but answered with a possible Arabic translation rather than the correct answer I was looking for. Grok searched a large number of sources and took considerably longer to think before answering than either Gemini or Claude. Meta AI with Llama 4 gave the wrong answer in a multi-paragraph response despite the instructions. The additional information it provided was also not correct for the Moroccan dialect, which is surprising given the amount of written Arabic dialect usage on Facebook.

caution

LLMs are not deterministic. I ran each of these tests only once for this comparison, so you may not get the same results with the same prompt and model if you run it again. I’ve tried this prompt before on earlier versions of ChatGPT and Claude.

Background on Allspice in Moroccan Arabic

Over a decade ago now, I studied abroad in Morocco and was responsible for making apple pie and pumpkin pie for our American Thanksgiving. Apple pie was easy: all the ingredients are readily available in Morocco and nothing has a weird name. But pumpkin pie was harder. I could get cinnamon and cloves easily enough, but vendors in the market did not understand the dictionary translation of “allspice” I needed for pumpkin pie spice.

One of my classmates finally tracked it down in French transliteration in a cooking forum for second-generation French-Algerians. We went to the souq, hoping that the Algerian dialect word for allspice would be the same as the Moroccan word (they have a lot of overlap, but also major differences). Fortunately, it was the same in both dialects, I got the allspice, and we had great pie for Thanksgiving.

…But Gemini Was the Clear Loser on Another Test

So was Gemini 3 Pro the overall best model, at least until Opus 4.5 was released? Not exactly. I already wrote last week about how Gemini 3 Pro failed at a fairly straightforward and verifiable legal research task: “Gemini 3 Pro Failed to Find All Case Citations With the Test Prompt, Doubled Down When I Asked If That Was All.” Note: I have not yet run this legal research test with Claude Opus 4.5, but based on prior Claude models, it would almost certainly do better than Gemini.

How to Set Up Google Gemini Privacy

· 7 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Data training opt-outs and other settings as of October 1, 2025

General Set Up for Lawyers

I will be providing guides on how to configure the privacy settings on three common consumer large language model (LLM) tools: Google Gemini, ChatGPT, and Claude. In this post, I provide a guide to configuring a consumer Google Gemini account’s privacy settings, based on an attorney conducting legal research. Please note that these instructions are neither a substitute for proper data controls (e.g., proper handling of attorney-client privileged data or personally identifiable information) nor a replacement for a generative AI policy for your law firm. This information is current as of October 1, 2025.

You can change the settings on a desktop computer or mobile phone, but the menu options have slightly different names. I will explain using the desktop options with the alternative names for mobile also noted.

Key Point

“Help improve” is a euphemism for “train future models on your data.” This is relevant to both audio and text opt-outs.

This guide assumes you have a Google account signed in to Google Gemini.

Overview

  1. Opt out of training on your audio data. (Euphemistically: “Improve Google services with your audio and Gemini Live recordings.”)
  2. Configure data retention and auto-deletion, which is necessary to avoid training on your conversations with Gemini. (Euphemistically: “your activity…helps improve Google services, including AI models”).
  3. Review a list of “your public links.”
tip

To subscribe to law-focused content, visit the AI & Law Substack by Midwest Frontier AI Consulting.

1. Opt Out of Training on Audio

Risk: Memorization, Conversation Privacy

I strongly advise anyone using generative AI tools, but especially those using them for potentially sensitive work purposes, to opt out of allowing these companies to train future models on your text and audio chats. There are numerous risks and no benefit to the individual user.

One risk is private chats (text or voice) being exposed in some way during the data training process.

caution

“Human reviewers (including trained reviewers from our service providers) review some of the data we collect for these purposes. Please don’t enter confidential information that you wouldn’t want a reviewer to see or Google to use to improve our services, including machine-learning technologies” (Gemini Apps Privacy Hub).

Another potential risk is “memorization,” in which generative AI can re-generate specific pieces of its training data, including sensitive information. While unlikely for any particular person, the risk remains. For example, researchers in 2023 found that ChatGPT could recreate the email signature of a CEO with their real personal contact information. This is significant because ChatGPT is not a database (see my discussion of Mata v. Avianca): it is like writing the information down from memory, not looking it up in a phone book.

Screenshot of desktop menu to access Gemini Activity menu

Guide: Opting Out of Audio Training

Click the Gear symbol for Settings, then Activity (on mobile, it’s “Gemini Apps Activity”).

UNCHECK the box next to “Improve Google services with your audio and Gemini Live recordings.”

Screenshot of Gemini Apps Activity menu for opting out of audio data training

2. Chat Retention & Deletion

Risk: Security and Privacy v. Recordkeeping

You may want to keep records of the previous searches you have conducted for ongoing research or to revisit what went wrong if there were issues with a citation. However, by choosing to “Keep activity,” Google notes that “your activity…helps improve Google services, including AI models.”

Therefore, it appears that the only way to opt out of training on your text conversations with Google Gemini is to turn off activity. This is different from ChatGPT, which allows you to opt out of training on your conversations, and from Claude, which previously did not train on user conversations at all but has moved to a policy similar to ChatGPT’s of training on user conversations with an opt-out. As an alternative, you could delete only specific conversations.

Guide: Opting Out of Text Training

Click the Gear symbol for Settings, then Activity (on mobile, it’s “Gemini Apps Activity”). Click the dropdown arrow “On/Off” and select “Turn off” or “Turn off and delete activity” if you also want to delete prior activity. It is also possible to delete individual chats in the main chat interface.

Screenshot of Gemini Apps Activity menu for turning off Keep activity

Guide: Auto-Delete Older Activity

Click the Gear symbol for Settings, then Activity (on mobile, it’s “Gemini Apps Activity”). Click the words “Deleting activity older than [time period]” to adjust the retention period for older conversations. This does not mitigate concerns about Google training on your data, but may protect the data in the event of an account takeover.

Screenshot of Gemini Apps Activity menu for adjusting auto-delete period

Or you can delete recent activity within a certain time period.

Screenshot of Gemini Apps Activity menu for deleting specific period of activity

Risk: Private Conversations on Google

In late July, Fast Company reported that Google was indexing shareable links to ChatGPT conversations created when users shared these conversations. At the time, if ChatGPT users continued the conversation after creating the link, the new content in the chat would also be visible to anyone with access to the link. By contrast, ChatGPT and Anthropic’s Claude now explicitly state that only messages created within the conversation up to the point the link is shared will be visible. Later this year, it was revealed that Google had indexed shareable links to conversations from xAI’s Grok and Anthropic’s Claude.

Click the Gear symbol for Settings, then Your public links (on mobile, click your face or initials, then “Settings,” then “Your public links”).

Screenshot of Google Gemini Your public links

Screenshot of Google Gemini “Your public links.”

On my company website, I recently wrote a blog post showing how small businesses could use Google Gemini for image generation: “Need to Create a Wordcloud for Your Blog Post? Use Google Gemini (and a Piece of Paper).” I am now sharing the link to that chat to demonstrate how the public links privacy works in Google Gemini. The chat link is [here](https://g.co/gemini/share/4626a5e02af7).

You can see in the list above that it is my only public link. It includes the title of the chat, the URL, and the date and time created. Above the list are privacy warnings about creating and sharing links to a Gemini conversation. Based on my test of the shared link, chats added to the conversation after the link is shared do not appear, but unlike ChatGPT and Anthropic, I did not see Google state this in its warnings.

Additionally, you can delete all public links or delete just one specific public link.

Need to Create a Wordcloud for Your Blog Post? Use Google Gemini (and a Piece of Paper)

· 3 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Paper to photo to Google Gemini

This simple workflow will be faster and give you more control over your output than using ChatGPT image generation. That’s because Google Gemini’s new image model, called “nano banana” (hence the banana emoji next to the image generation option), is better at editing photos without changing too much. Gemini also generates images more quickly than GPT-5. My rule of thumb for images: if you want to change something specific, use Gemini; if you want to create something creative from scratch, use ChatGPT.

Step 1: Handwriting

Start by writing the wordcloud you want. For my example, I wrote a bunch of generic terms that popped into my head like “example” and “whatever.” If you can’t think of the words you want, you can always generate a short list with Gemini. Vary the direction and size of the writing to make the final image more visually interesting.

Step 2: Photo of the Handwriting

Take a photo of the piece of paper. Crop out the background.

handwritten words in different directions related to generic topic

Step 3: Prompt Gemini

Upload the photo of the handwritten paper to Gemini with a prompt, such as: Turn these words into the style of a graffiti mural. It should only take a few seconds to generate the output image. My resulting image was:

words painted in graffiti in different directions related to generic topic on a brick wall

Three Ways Customers Learn About Your Business from Google AI (and what you can do about it)

· 5 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

If you are a small business owner who wants nothing to do with AI, I appreciate that decision. Midwest Frontier AI Consulting supports business owners who want to use AI responsibly and business owners who want to make an informed decision not to use AI. However, you still need to learn about generative AI, even if only to avoid it and mitigate the negative effects.

Your customers are using AI to learn about your business, often without even realizing they are using AI. “Google” has been a verb for over two decades now, according to Wikipedia, but “googling something” hasn’t stayed the same. AI tools have moved into familiar areas like Google Search and Google Maps. Here are three ways your customers may be using generative AI to learn about your business from Google’s AI tools, and what you can do about it.

Google’s Gemini AI attempts to summarize website information and provide an overview. However, the AI summary can introduce errors ("hallucinations") that mislead customers. For example, a local Missouri pizzeria was inundated with customer complaints about “updated [sic, appears they meant to say ‘outdated’] or false information about our daily specials” described by Google’s AI Overview (Pizzeria’s Facebook Post).

What Not to Do

Don’t call the information “fake” if it is really information taken out of context. For example, the pizzeria’s Facebook page shows they offer a deal for a large pizza for the price of a small pizza, but only on Wednesdays (outdated information). It is still legitimate to criticize the AI and it is still legitimate to tell customers who want the deal on another day of the week that the offer is only valid on Wednesdays. However, claiming the offer is “made up by the AI” will probably not calm down a customer who may then go to the business’s Facebook profile and see several posts about similar deals (but only on Wednesdays).

Don’t simply tell customers “Please don’t use Google AI.” The customers probably do not realize they are using AI at all. The AI Overview appears at the top of Google Search. Most people probably think they are “just googling it” like they always have and don’t realize the AI features have been added in. So warning them not to use something they didn’t opt into and aren’t actively aware of using is not going to help the situation.

What To Do

  • AI-focused solutions. If AI is going to mix things up like this, you can try to:
      • Delete old posts about deals that are no longer active, or make such posts temporary, so that AI hopefully won’t include the information in summaries later.
      • Word posts carefully with AI in mind. Maybe “only on Wednesday” would be better than “EVERY Wednesday.” Spell out something that would be obvious to a human but not necessarily to an AI, like “not valid on any other day of the week.”
  • Customer-focused solutions. Ultimately, it is hard to predict how the AI will act, so you will need to prepare for potentially angry customers:
      • Train staff on how to handle AI-created customer confusion (or think about how you yourself will talk to customers about it).
      • Post signs regarding specials to preempt some AI-created confusion.

Double Agent AI: Staying Ahead of AI Security Risks, Avoiding Marketing Hype

· 5 min read
Chad Ratashak
Chad Ratashak
Owner, Midwest Frontier AI Consulting LLC

Hype Around Agents

You may have heard a marketing pitch or seen an ad recently touting the advantages of “Agentic AI” or “AI Agents” working for you. These growing buzzwords in AI marketing come with significant security concerns. Agents take actions on behalf of the user, often with some pre-authorization to act without asking for further human permission. For example, an AI agent might be given a budget to plan a trip, might be authorized to schedule meetings, or might be authorized to push computer code updates to a GitHub repo.

info

Midwest Frontier AI Consulting LLC does not sell any particular AI software, device, or tool. Instead, we want to equip our clients with the knowledge to be effective users of whichever generative AI tools they choose to use, or help our clients make an informed decision not to use GenAI tools.

Predictable Risks…

…Were Predicted

To be blunt: for most small and medium businesses with limited technology support, I would generally not recommend using agents at this time. It is better to find efficient uses of generative AI tools that still require human approval. In July 2025, researchers published Design Patterns for Securing LLM Agents Against Prompt Injections. The research paper described a threat model very similar to an incident that later happened in the Node.js package manager (npm) ecosystem in August 2025.

“4.10 Software Engineering Agent…a coding assistant with tool access to…install software packages, write and push commits, etc…third-party code imported into the assistant could hijack the assistant to perform unsafe actions such as…exfiltrating sensitive data through commits or other web requests.”

tip

Midwest Frontier AI Consulting LLC offers training and consultation to help you design workflows that take these threats into consideration. We stay on top of the latest AI security research to help navigate these challenges and push back on marketing-driven narratives. Then, you can decide by weighing the risks and benefits.

I was just telling some folks in the biomedical research industry about the risks of agents and prompt injection earlier this week. The following day, I read about how an npm software package was hacked to prompt-inject large language model (LLM) coding agents into exfiltrating sensitive data via GitHub.