Too Fast To Benchmark?

November 1, 2025

I've expressed skepticism that legal-specific AI can outperform frontier models (ChatGPT, Claude, Gemini) on general legal tasks. One reason is that I think the frontier models add capabilities faster than any other company can. So even if your legal-specific AI does a little better now, it probably won't in a week.

A report published by vals.ai a couple of weeks ago compares the performance of several legal-specific AI research tools against both real lawyers and ChatGPT on a series of 200 legal research questions. It finds certain areas where legal-specific AIs outperform so-called "generalist" AI (i.e., ChatGPT in that case), as well as instances in which human lawyers outperform AI on legal research.

They haven't publicly released all the questions they used, but they have released a few of them. I took the examples that were intended to illustrate the two points above, (1) legal AI beating general AI and (2) humans beating AI, and I gave the exact same questions to the current versions of ChatGPT, Claude, and Gemini.

Here are the results.

Legal AI Beating General AI: This question was about campaign contribution limits. Claude got it right. Gemini and ChatGPT both misunderstood the question as being about what a PAC could contribute rather than what you could contribute to a PAC (which was not the error reported in the vals.ai study). Once I clarified that the question was about limits on donating TO a PAC, they both got it right as well.

Lawyers Beating AI: This was a question about unlicensed sale of securities in California. Each of the frontier models immediately got it right.

The vals.ai study came out two weeks ago. The work was done over the past summer. And yet it looks to me like the results, as they apply to frontier models, are already out of date. That is an example of why I find it hard to bet against the frontier models, at least on general-purpose legal tasks. They move REALLY fast.
