CIP and Anthropic launch Collective Constitutional AI
Earlier this year, CIP created Alignment Assemblies to figure out how collective input from society can shape AI development. We’ve been experimenting with different methods, and are excited to announce that we’ve partnered with Anthropic to pilot one way of doing this: training a model on a collectively-designed constitution. Read the Anthropic blog post for details on how we did this. (Also, our project was reported on in the New York Times.)
If generative AI is going to shape how people work, communicate, and interact at mass scale (as it already does, and will continue to do), public input into model behavior is crucial. We found Anthropic’s Constitutional AI work a promising starting point for an Alignment Assembly: the technique steers model behavior directly through written principles, which opened up the possibility of training a large language model on a constitution collectively designed by the public, one that better reflects the public’s values. Constitutional AI makes model behavior more accessible to democratic oversight than traditional training methods do: it enables the public to provide input into, and to understand, the behavioral rules of the AI they interact with.
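To make the mechanism concrete, here is a minimal sketch of the supervised critique-and-revision step described in Anthropic’s Constitutional AI paper, which is how written principles enter training. The `complete` function and the two example principles are placeholders we made up for illustration; they are not the actual constitution from this project or a real API.

```python
import random

# Hypothetical stand-in for any text-generation call (an API or a local
# model); not a real library function.
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your own model call here")

# Two illustrative principles. In this project, the actual constitution
# was drafted collectively by a representative group of the public.
CONSTITUTION = [
    "Choose the response that is least likely to promote stereotypes.",
    "Choose the response that most respects personal autonomy.",
]

def critique_and_revise(user_prompt: str) -> str:
    """One supervised step of Constitutional AI: the model critiques its
    own draft against a sampled principle, then revises the draft."""
    principle = random.choice(CONSTITUTION)

    # 1. Draft an initial response.
    draft = complete(user_prompt)

    # 2. Ask the model to critique the draft against the principle.
    critique = complete(
        f"Consider this response to '{user_prompt}':\n{draft}\n\n"
        f"Identify ways it conflicts with the principle: {principle}"
    )

    # 3. Ask the model to revise in light of the critique.
    revision = complete(
        f"Rewrite the response to '{user_prompt}' so it follows the "
        f"principle: {principle}\n\n"
        f"Original: {draft}\nCritique: {critique}"
    )
    return revision  # revised responses become fine-tuning data
```

Because the steering lives in those plain-language principles rather than in opaque labeling guidelines, changing what the model is trained toward is, in principle, as simple as changing the text of the constitution.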
In this post, we walk through how we asked a representative group of 1,000 Americans what behavior they wanted from an LLM chat agent, what we learned from the model we trained on those values, and what might come next. In the end, the public model was less biased across a range of stereotypes, and performed equivalently to the baseline model in evaluations of math, natural language understanding, and helpfulness and harmlessness. We’d call this a success!
More than the resulting model, we’re excited about the process. We believe that this may be one of the first instances in which members of the public have, as a group, intentionally directed the behavior of a large language model. We hope that communities around the world will build on techniques like this to train culturally- and context-specific models that serve their needs, and that these processes will be incorporated into foundation models as well. This work is necessarily imperfect, but we hope that it opens the door to many more experiments in which groups of people are able to directly influence the technologies that will continue to transform society.
What else, and what’s next?
CIP is continuing to work on democratic and public input into AI at many levels; we see such input as critical for steering AI toward collective good.
We’re running Alignment Assemblies to feed collective intelligence into other parts of AI development, coordinating with various process and industry partners, and building an online platform for wider coordination. We plan to bring together and support thinkers and doers in both the AI and democracy spaces, to advance our collective agency over our AI futures. Stay tuned!