
Steering transformative AI towards better outcomes.
For everyone.
Our projects
-
A Platform to Build and Share AI Evaluations.
Current AI evaluations measure what's easy, not what's important. Benchmarks that rely on multiple-choice questions or simple pass/fail tests can't capture the nuance of real-world tasks. They can tell you if code runs, but not if it's well-written. They can test for textbook knowledge, but not for applied wisdom or regional accuracy.
That's why we built Weval: an open, collaborative platform to build evaluations that test what matters to you. We empower a global community to create qualitative benchmarks for any domain—from medical chatbots to legal assistance. Just as Wikipedia democratized knowledge, Weval aims to democratize evaluation, ensuring that AI works for, and represents, everyone.
-
We engage thousands of people across the globe through structured, AI-enabled deliberation to understand how they interact with, and are affected by, AI systems. This creates an open, longitudinal dataset that informs AI development and policy, and helps us build globally relevant benchmarks.
-
Our community models platform uses AI-enabled deliberative tools to let communities create and refine AI models based on collectively defined constitutions.
-
If AI is going to be deployed globally, it should work globally. We are conducting cultural and social evaluations to better measure and improve the context-specific capabilities of frontier AI, partnering with international NGOs, local experts, and AI labs to build regional AI benchmarks.
-
In 2024, we set the agenda for Democratic AI, publishing a roadmap for what can be immediately done, built, researched, advocated for, and funded.
-
We’ve run alignment assemblies to incorporate collective input from the ground up, developing new ways to determine what is good and how to achieve it, both within AI systems themselves and within the control structures that govern them.