Accepting contributions

The trusted source for AI benchmarks

AgentEval.org is an open, community-driven initiative dedicated to advancing AI development by sharing domain-specific benchmarks and methodologies.

conciseness · perplexity · trustworthiness · alignment · factuality · guidelines · voice_and_tone · recall · precision

Trust and verify mission-critical AI systems

Why we built AgentEval.org

Mission

To establish a trusted, open resource for sharing AI benchmarks and best practices that drive transparency and continuous improvement in AI evaluation.

Vision

A future where open collaboration and shared data standards accelerate responsible AI innovation, making evaluation methodologies accessible and verifiable for everyone.

Why open benchmarks?

Transparency & trust

Open-sourcing our benchmarks and methodologies allows anyone to inspect, validate, and contribute to our evaluation processes.

Community-driven innovation

An open platform invites contributions from a broad community, leading to more robust and diverse evaluation practices.

Industry adoption

Open-source tools and standards are more likely to be adopted by academic institutions, industry players, and public agencies.

Non-profit and collaborative alignment

An open-source emphasis aligns with our non-profit mission to move the industry forward through shared knowledge and collective effort.

Find resources

Benchmarks & evals

Best practices & methodology