Published September 1, 2025, last updated September 11, 2025
Imagine an AI confidently analyzing a legal case, only to base its conclusion on a guess about what someone meant by pointing a finger. In high-stakes fields like law, where precision can make or break a case, such errors—known as hallucinations—aren’t just academic; they’re costly.
Studies show large language models (LLMs) hallucinate in 17-34% of legal queries, often weaving plausible but incorrect details into their answers. This article investigates why even state-of-the-art LLMs, like Gemini 2.5 Pro, falter in tasks requiring strict logical reasoning, using a hearsay case study to illustrate the problem. We’ll contrast this with a deterministic AI system, KBAI, which offers a more cautious, evidence-driven approach.
Discussing a single example, we’ll explore how to move past the pitfalls of probabilistic AI toward a future of trustworthy legal analysis.
Hallucinations in LLMs occur when models generate details not grounded in the input, often because their probabilistic nature prioritizes plausible outputs over factual accuracy. Trained on vast datasets, LLMs excel at recognizing patterns and producing structured responses, but they struggle with ambiguity.
In legal tasks, this can lead to overconfident answers that assume unstated facts, especially when models are tuned to score well on benchmarks like LegalBench, which emphasize structured yes-or-no answers. Research highlights that LLMs overfit to these tests, memorizing patterns rather than reasoning through edge cases.
For instance, studies note that even high-performing models like GPT-4o falter under adversarial prompts, revealing a gap between benchmark success and genuine understanding. (See, for example, Red Teaming GPT-4o: Uncovering Hallucinations in Legal AI Models study by General Analysis.) In law, where a single misstep can derail a case, this tendency to “fill in the blanks” poses a serious challenge.
Consider a legal prompt: “On the issue of whether Tom identified the suspect at the scene of the crime, the fact that he pointed at the defendant. Is there hearsay?”
Under the Federal Rules of Evidence (FRE) 801, hearsay is an out-of-court statement offered to prove the truth of what it asserts, including nonverbal conduct if intended as communication.
Let’s compare how an LLM (we used Gemini 2.5 Pro, chosen for its superior performance on the LegalBench benchmark and its wide industry acclaim for legal applications) and KBAI handle this.
(See the full Gemini and KBAI responses in Appendix A, below.)
Gemini delivers a detailed analysis, concluding the pointing is hearsay. It reasons that Tom’s pointing is assertive conduct, implying “That’s the suspect,” made out-of-court at the crime scene, and offered to prove the defendant’s identity. The model assumes the pointing was intentional, perhaps in response to an unstated question like “Who did it?” It even notes a potential hearsay exception (FRE 801(d)(1)(C)), but its core conclusion hinges on inferring communicative intent without explicit evidence in the prompt.
KBAI's response takes a different approach. It declares the pointing isn’t hearsay, arguing it’s not a statement because it lacks clear communicative intent. It emphasizes that without details on whether Tom was responding to a question or acting spontaneously, the action doesn’t meet the FRE’s definition of a statement. KBAI avoids a firm conclusion, flagging the need for more context about the pointing’s purpose.
Gemini’s response, while polished, assumes facts not provided—namely, that Tom’s pointing was meant to assert “That’s the suspect.” This mirrors how courts often interpret pointing in lineups or crime scenes as assertive, but the prompt’s ambiguity (no mention of a question or context) makes this a leap.
KBAI, by contrast, sticks to a deterministic approach, refusing to classify without clear evidence of intent. This case shows LLMs’ bias toward producing “correct” benchmark-style answers, even if it means guessing, while KBAI prioritizes precision over completion.
LLMs’ probabilistic design makes them prone to over-inference, especially in legal tasks where ambiguity is common. In the hearsay example, Gemini’s assumption aligns with typical court interpretations—pointing is often treated as a statement (e.g., in cases like United States v. Ross, where gestures were deemed assertive).
But without prompt details, this is still a guess, reflecting a broader issue: LLMs are trained to match patterns in datasets like LegalBench, which reward structured legal arguments over admitting uncertainty.
Research shows models achieve high scores by memorizing benchmark patterns, not by reasoning through incomplete data. In law, where missing context can change an outcome, this overreach can lead to errors, like misclassifying evidence or citing nonexistent cases (a known LLM issue, with hallucination rates as high as 58% in legal queries, according to a study published in the Journal of Legal Analysis).
OpenAI researchers found that this happens, in part, because optimizing LLMs for accuracy rate (how many right answers are given) can come at the expense of increased error rate (where wrong answers are given instead of the model abstaining). In effect, the models are taught to behave similarly to how a human would on an exam, making a best guess when the precise answer is unknown. This isn't reflective of environments where modern LLMs need to operate, such as making legal, engineering, and medical judgments.
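The incentive the OpenAI researchers describe can be illustrated with a toy expected-score calculation. The 30% guess-accuracy figure and the -1 penalty for a wrong answer below are illustrative assumptions, not numbers from the cited research:

```python
# Sketch: why accuracy-only scoring rewards guessing over abstaining.
# Suppose a model is unsure, and its best guess is right 30% of the time.
p_correct = 0.30

# Benchmark that scores accuracy only (abstaining earns nothing):
guess_score = p_correct * 1.0   # expected score for always guessing
abstain_score = 0.0             # expected score for abstaining

# Scoring that penalizes confident wrong answers (hypothetical -1
# penalty, like an exam with negative marking):
penalized_guess = p_correct * 1.0 + (1 - p_correct) * -1.0
penalized_abstain = 0.0

# Under accuracy-only scoring, guessing dominates abstaining;
# once wrong answers are penalized, abstaining dominates.
print(guess_score > abstain_score)          # guessing wins
print(penalized_abstain > penalized_guess)  # abstaining wins
```

Under the first scheme the model is trained, in effect, to always produce an answer; only the second scheme makes "I don't know" a rational output.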
| Aspect | LLM (Gemini) | KBAI (Deterministic AI) |
|---|---|---|
| Reasoning Style | Probabilistic, pattern-based | Rule-based, evidence-driven |
| Handling Ambiguity | Infers unstated details (e.g., intent) | Flags uncertainty, demands context |
| Benchmark Performance | Optimized for “correct” outputs | Prioritizes accuracy over completion |
| Legal Reliability | Risks hallucinations (17-34% in studies) | Minimizes assumptions, reduces errors |
Deterministic AI, like KBAI, operates on fixed rules or knowledge bases, ensuring outputs are consistent and tied to explicit input. Unlike LLMs, which predict based on statistical likelihood, KBAI follows a logic-driven path, only concluding when evidence supports it.
In the hearsay case, KBAI’s refusal to classify Tom’s pointing as a statement without context aligns with legal caution—courts require clear intent for nonverbal conduct to qualify as hearsay. This makes KBAI well suited to tasks demanding precision, such as evidence analysis.
For example, integrating KBAI with an LLM could flag when Gemini’s hearsay conclusion lacks evidence, prompting a review rather than accepting the guess.
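To make the contrast concrete, here is a hypothetical sketch of a deterministic, rule-based check in the spirit of KBAI's behavior (not its actual implementation). It applies FRE 801's three prongs and abstains whenever a required fact is not established in the input:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Conduct:
    out_of_court: bool                      # made outside the current proceeding?
    intended_as_assertion: Optional[bool]   # None = intent not established
    offered_for_truth: Optional[bool]       # None = purpose not established

def classify_hearsay(c: Conduct) -> str:
    # Prong 1: a "statement" requires communicative intent (FRE 801(a)).
    if c.intended_as_assertion is None:
        return "insufficient context: communicative intent not established"
    if not c.intended_as_assertion:
        return "not hearsay: conduct is not a statement"
    # Prong 2: the statement must be out of court.
    if not c.out_of_court:
        return "not hearsay: made while testifying in the current proceeding"
    # Prong 3: it must be offered to prove the truth of the matter asserted.
    if c.offered_for_truth is None:
        return "insufficient context: purpose of the evidence not established"
    return "hearsay" if c.offered_for_truth else "not hearsay: offered for another purpose"

# Tom's pointing as described in the prompt: out of court and offered to
# prove identity, but nothing in the facts establishes communicative intent.
print(classify_hearsay(Conduct(True, None, True)))
```

Where an LLM fills the missing intent prong with its most plausible guess, this rule-based path has no mechanism for guessing: an unestablished fact halts the classification and surfaces exactly which prong needs more context.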
The hearsay case underscores a broader challenge: LLMs’ strengths in fluency don’t guarantee accuracy in fields like law, where stakes are high. Deterministic systems like KBAI point to a path forward, offering reliability where LLMs falter.
Legal professionals should consider hybrid tools that pair LLMs’ versatility with deterministic checks to ensure robust outputs.
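The hybrid idea can be sketched as a deterministic vetting layer that compares the facts an LLM's conclusion relies on against the facts actually supplied in the prompt. The function name and fact strings below are illustrative assumptions, not part of any real KBAI or Gemini API:

```python
def vet_conclusion(prompt_facts: set[str], relied_on: set[str]) -> list[str]:
    """Return the facts the LLM's conclusion assumed but the prompt never supplied."""
    return sorted(relied_on - prompt_facts)

# Facts actually stated in the hearsay prompt:
prompt_facts = {
    "tom pointed at the defendant",
    "pointing occurred at the scene of the crime",
}

# Facts an LLM's hearsay conclusion depends on (the last one is assumed):
llm_relies_on = {
    "tom pointed at the defendant",
    "pointing occurred at the scene of the crime",
    "tom intended the pointing as an assertion",
}

missing = vet_conclusion(prompt_facts, llm_relies_on)
print(missing)  # the unstated assumption, flagged for human review
```

Any non-empty result would route the LLM's answer to review rather than letting the unsupported inference pass as a conclusion.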
Developers, meanwhile, can learn from KBAI’s approach, building AI that prioritizes evidence over plausibility. While deterministic AI isn’t perfect—its rigidity can limit creativity in less structured tasks—it’s a step toward systems that don’t just sound right but are right.
The pointing finger example reveals LLMs’ Achilles’ heel: a tendency to guess details to fit expected patterns, risking errors in legal analysis. KBAI’s cautious, rule-based approach shows how deterministic AI can enhance accuracy, demanding clear evidence before concluding. As AI becomes central to legal work, moving beyond hallucinations is critical. Try testing your own AI tools on ambiguous legal prompts, or explore how systems like KBAI can complement existing models.
The future of legal AI lies in blending the best of both worlds—fluency and precision—for answers you can trust.
Appendix A: Gemini 2.5 Pro’s response
Here is a detailed legal analysis of the statement based on the rules of evidence.
The statement to be analyzed is: "On the issue of whether Tom identified the suspect at the scene of the crime, the fact that he pointed at the defendant."
The governing rule is: Hearsay is an out-of-court statement introduced to prove the truth of the matter asserted.
To determine if this constitutes hearsay, we must break it down into its three core components.
For the purposes of hearsay, a "statement" is not limited to spoken or written words. Under the Federal Rules of Evidence (FRE 801(a)) and corresponding state rules, a "statement" is defined as: (1) a person's oral assertion, (2) a written assertion, or (3) nonverbal conduct, if the person intended it as an assertion.
The action in question is Tom "pointed at the defendant." This is nonverbal conduct. The critical question is whether Tom intended this pointing to be an assertion.
Analysis: When a witness at the scene of a crime points at a specific person in response to a question like "Who did it?" or simply upon seeing the suspect, the act of pointing is not a random or meaningless gesture. The person is using the gesture to communicate a fact. The clear and intended assertion behind the act of pointing is, "That is the person who committed the crime" or "He is the one."
Because the pointing was intended to communicate a specific factual assertion, it qualifies as assertive conduct. Therefore, for the purposes of the hearsay rule, Tom's act of pointing is legally considered a "statement."
An "out-of-court" statement is one made at any time or place other than while testifying under oath at the current trial, hearing, or proceeding.
Analysis: The facts specify that the pointing occurred "at the scene of the crime." The scene of the crime is, by definition, not the courtroom where the current trial is being held. The statement was made prior to the current legal proceedings and outside the presence of the judge and jury in a formal court setting.
Therefore, the statement (the act of pointing) was unequivocally made "out-of-court."
This is the most crucial part of the analysis. We must determine the purpose for which the evidence is being introduced.
Analysis: The evidence of Tom's pointing is being offered to prove that the content of his assertion is true—that the defendant is indeed the correct person. The value of this evidence to the prosecution's case depends entirely on the jury believing the truthfulness of Tom's out-of-court identification. If the evidence were offered for a different purpose (e.g., to prove that Tom was conscious and able to move his arm after an assault), it would not be for the truth of the matter asserted. However, in this context, its sole purpose is to prove the identity of the perpetrator.
Therefore, the statement is being offered to prove the truth of the matter asserted.
Based on the analysis of the three essential elements:
Final Conclusion:
Yes, the statement constitutes hearsay. It meets all three prongs of the definition. A witness testifying about Tom's out-of-court pointing to identify the defendant is classic hearsay.
Important Note on Admissibility: While the statement is correctly identified as hearsay, it is crucial to note that this does not automatically mean it is inadmissible in court. The rules of evidence contain numerous exceptions and exemptions to the hearsay rule. In this specific case, an out-of-court identification would likely be admissible under a specific exemption, such as FRE 801(d)(1)(C), which exempts a prior statement of identification from the hearsay rule if the declarant (Tom) testifies at the current trial and is subject to cross-examination about the prior identification. However, the analysis of whether it is hearsay is a separate and distinct step from determining if an exception applies. Based on the definition alone, it is hearsay.
KBAI’s response
Based on the information given, the statement in question does not qualify as hearsay. For something to be considered hearsay, it generally needs to be a statement made outside of court that's intended to prove the truth of the matter asserted. In this case, the fact that Tom pointed at the defendant is not considered a "statement" because it wasn't an oral or written assertion, nor was it nonverbal conduct that intended to act as communication. Additionally, since it doesn't aim to prove the truth of what is asserted, it doesn't fit the criteria for hearsay. Therefore, you don't need to worry about this being classified as hearsay in a legal setting.
Experience the power of knowledge-driven deterministic AI with KBAI