Benchmarks

GenAI benchmarking study headed by John Craske

Provider

CMS and LITIG

Dataset name

Litig AI Benchmark

Dataset size

TBD testcases

Benchmark type

Upcoming

Date published

February 19, 2025

Industry

Legal

The LinksAI English law benchmark

Provider

Linklaters

Dataset size

50 testcases

Benchmark type

Open-ended task

Date published

February 10, 2025

Industry

Legal

Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models

Provider

Stanford RegLab

Dataset size

5000 testcases

Benchmark type

Open-ended task

Date published

June 21, 2024

Industry

Legal

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

Provider

Stanford

Dataset size

200+ testcases

Benchmark type

Open-ended RAG

Date published

June 6, 2024

Industry

Legal

Long-context retrieval benchmark on legal documents

Provider

Stanford Hazy Research

Dataset size

7,730 testcases

Benchmark type

Open-ended task

Date published

February 12, 2024

Industry

Legal

A version of Harvey's internal dataset for evaluating large language models (LLMs) and model systems on complex legal tasks

Provider

Harvey

Dataset size

50+ testcases

Benchmark type

Open-ended task

Date published

August 29, 2024

Industry

Legal

An evaluation benchmark assessing the comprehensive performance of LLMs in highly specialized legal domains of Chinese law

Provider

OpenCompass

Dataset size

20 testcases

Benchmark type

Open-ended task

Date published

September 28, 2023

Industry

Legal

A collaboratively built large language model benchmark for legal reasoning

Provider

Stanford

Dataset size

162 testcases

Benchmark type

Open-ended task

Date published

August 20, 2023

Industry

Legal

The Overruling Dataset: A Benchmark for Detecting Legal Decisions that Have Been Overruled

Provider

Casetext

Dataset size

2,400 testcases

Benchmark type

Binary classification

Date published

April 22, 2021

Industry

Legal

CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service

Provider

Università di Modena

Dataset size

9,414 testcases

Benchmark type

Binary classification

Date published

May 3, 2018

Industry

Legal