Accepting contributions

The trusted source for AI benchmarks

AgentEval.org is an open, community-driven initiative dedicated to advancing AI development by sharing domain-specific benchmarks and methodologies.

conciseness · perplexity · trustworthiness · alignment · factuality · guidelines · voice_and_tone · recall · precision

Trust and verify mission-critical AI systems

Why we built AgentEval.org

Mission

To establish a trusted, open resource for sharing AI benchmarks and best practices that drive transparency and continuous improvement in AI evaluation.

Vision

A future where open collaboration and shared data standards accelerate responsible AI innovation, making evaluation methodologies accessible and verifiable for everyone.

Why open benchmarks?

Transparency & trust

Open-sourcing our benchmarks and methodologies allows anyone to inspect, validate, and contribute to our evaluation processes.

Community-driven innovation

An open platform invites contributions from a broad community, leading to more robust and diverse evaluation practices.

Industry adoption

Open-source tools and standards are more likely to be adopted by academic institutions, industry players, and public agencies.

Non-profit and collaborative alignment

An open-source emphasis aligns with our non-profit mission to move the industry forward through shared knowledge and collective effort.

Find resources

Benchmarks & evals

Best practices & methodology