Damien Charlotin (of Hallucination Database Fame) Makes a Great Point About Being Lulled Into a False Sense of Security by Legal AI
“What 1200+ Hallucinated Citations Teach Us About Legal AI,” with Rok Popov Ledinski. https://www.youtube.com/watch?v=K20Kprb7cbo&t=980s
The key quote for me comes at roughly 16:20-18:00 [transcribed by me and edited slightly for clarity]:
More generally, a new pattern is that you include the hallucinations in your brief. You trust AI to write legal content because it works so well in a lot of other contexts, and it even works well in the legal field. So if you're using a specialized tool, especially one of the traditional platforms you have been using for years: you have been a loyal client of Westlaw for years; they have always worked well, giving you the legal content you wanted; they've got an AI tool. Why would you not trust this AI tool? Especially when the marketing team (and I don't want to target Westlaw in particular; this could be any other legal publisher) says, "Well, we don't hallucinate, because we've got all these methods to make sure we don't hallucinate." Then it's not a question of being aware of hallucination. It's just that you're trusting a tool that tells you there's no hallucination. You're trusting it because they're telling you so, but also because of your own history of the tool working. And if the tool is actually good, 99% of the time it will not hallucinate, and you're fine with it. People like me come around and say, "You should probably check it every time," but if you check 99 times and nothing is wrong, at some point you'll stop checking. And that's completely human and completely normal. So I'm not sure the answer is "continue checking just in case." I think the answer will be partly technological and partly still a bit of checking and layering.
A Few Quick Observations
- I appreciate this point, because I've been frustrated by the lack of clarity from GenAI software vendors on basic risks like hallucinations and prompt injection, whether they're building business dashboards, email and scheduling tools, or legal research tools. Across the industry, downplaying these risks has led users to accept the “models are getting better” narrative.
- What Charlotin describes here is basically “normalization of deviance” (a phrase coined by sociologist Diane Vaughan in her analysis of the Challenger disaster). Every time someone skips a verification step and nothing bad happens, it becomes easier to rationalize never checking. But even at lower error rates, errors will still get through if we don't check at all; the lower rate just makes complacency more likely (see the back-of-the-envelope sketch after this list).
- Commentators on generative AI frequently invoke the Challenger comparison explicitly; see, e.g., Simon Willison’s 2026 predictions for coding agents.
- This connects to my “glass donut” metaphor, described in my previous post about Sullivan & Cromwell. If AI hallucinations are getting less frequent but remain nonzero and catastrophic when they occur, the practical effect may be to lower our guard and instill bad habits that eventually fail. Charlotin argues that technology probably has to be part of the solution; indeed, as a side project I’m experimenting with a word processor that addresses some of these core problems. Charlotin’s PelAIkan cite checker is another approach. But he says, and I agree, that training and checking will still be part of the mix.
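
To put rough numbers on the complacency point, here is a back-of-the-envelope sketch. It assumes independent uses and borrows the quote's illustrative 99%-reliable figure; neither is a measured property of any real tool. Even at that reliability, a hallucination slipping through becomes more likely than not after about a hundred unchecked uses.

```python
# Back-of-the-envelope: chance that at least one hallucination appears
# across n unchecked uses, assuming each use is independent and using
# the illustrative 1% per-use hallucination rate from the quote above
# (an assumption, not a measured figure for any actual tool).
p = 0.01  # assumed per-use hallucination rate

for n in (10, 100, 500):
    at_least_one = 1 - (1 - p) ** n
    print(f"{n:>3} unchecked uses -> {at_least_one:.0%} chance of at least one hallucination")

# Output:
#  10 unchecked uses -> 10% chance of at least one hallucination
# 100 unchecked uses -> 63% chance of at least one hallucination
# 500 unchecked uses -> 99% chance of at least one hallucination
```

The point of the arithmetic is that a 99% success rate is exactly the regime where habits erode fastest: rare enough that spot checks almost always pass, common enough that failure is near-certain at scale.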