Benchmarks
GenAI benchmarking study headed by John Craske

Provider
CMS and LITIG
Dataset name
Litig AI Benchmark
Dataset size
TBD testcases
Benchmark type
Upcoming
Date published
February 19, 2025
Industry
Legal
The LinksAI English law benchmark

Provider
Linklaters
Dataset name
Dataset size
50 testcases
Benchmark type
Open-ended task
Date published
February 10, 2025
Industry
Legal
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models

Provider
Stanford RegLab
Dataset name
Dataset size
5,000 testcases
Benchmark type
Open-ended task
Date published
June 21, 2024
Industry
Legal
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

Provider
Stanford
Dataset name
Dataset size
200+ testcases
Benchmark type
Open-ended RAG
Date published
June 6, 2024
Industry
Legal
Long-context retrieval benchmark on legal documents

A version of our internal dataset for evaluating large language models (LLMs) and model systems on complex legal tasks

Provider
Harvey
Dataset name
Dataset size
50+ testcases
Benchmark type
Open-ended task
Date published
August 29, 2024
Industry
Legal
An evaluation benchmark assessing the comprehensive performance of LLMs in highly specialized legal domains of Chinese law

A collaboratively built large language model benchmark for legal reasoning

Provider
Stanford
Dataset name
Dataset size
162 testcases
Benchmark type
Open-ended task
Date published
August 20, 2023
Industry
Legal
The Overruling Dataset: A Benchmark for Detecting Legal Decisions that Have Been Overruled

Provider
Casetext
Dataset name
Dataset size
2,400 testcases
Benchmark type
Binary classification
Date published
April 22, 2021
Industry
Legal