This page is still being polished. If you have thoughts, please share them via the feedback form.
Data on this page is preliminary and may change. Please do not share or cite these figures publicly.
Red teaming, capability evaluations, adversarial testing, and performance verification.
Also in Risk & Assurance
Give AI systems standardised tests to assess their capabilities, which can inform the risks they might pose.
There are many streams of work within evaluations: - Building new tests. Working on this: [METR](https://github.com/METR/public-tasks), [CAIS](https://www.safe.ai/work/research), [MLCommons](https://arxiv.org/pdf/2404.12241) and others. - Evaluating existing systems. Working on this: [UK AISI](https://www.aisi.gov.uk/work/advanced-ai-evaluations-may-update), [METR](https://metr.org/blog/2023-08-01-new-report/) and others. - Figuring out when and how we should evaluate systems, as well as what we should do as a result of these evaluations. - [DeepMind](https://arxiv.org/pdf/2305.15324) and [GovAI](https://arxiv.org/pdf/2307.03718) (both papers have people from a wide array of organisations) have explored when and how we should evaluate systems. There's also [a blueprint for frontier AI regulation involving these evaluations](https://www.ai-far.org/news/frontier-ai-regulation-blueprint). - Research into how we can do evaluations well. - Work here includes [understanding how structuring questions changes results](https://arxiv.org/pdf/2310.11324), or detecting when [models might be intentionally underperforming (“sandbagging”)](https://arxiv.org/pdf/2406.07358) - [Apollo Research](https://www.apolloresearch.ai/blog/we-need-a-science-of-evals) has a good summary post which lists several open questions. - Taking the results of research and turning this into practical guidance, standards and regulations. Working on this: [AI Standards Lab](https://www.aistandardslab.org/). - Building infrastructure to build and carry out tests. Working on this: [METR](https://github.com/METR/task-standard), [AISI](https://github.com/UKGovernmentBEIS/inspect_ai), [Atla](https://www.atla-ai.com/) and others.
Reasoning
Standardized testing assesses AI system capabilities to identify potential risks and inform deployment decisions.
Compute goverance
Regulate companies in the highly concentrated AI chip supply chain, given AI chips are key inputs to developing frontier AI models.
3.1.1 Legislation & PolicyData input controls
Filter data used to train AI models, e.g. don’t train your model with instructions to launch cyberattacks.
1.1.1 Training DataLicensing
Require organisations or specific training runs to be licensed by a regulatory body, similar to licensing regimes in other high-risk industries.
3.1.4 Compliance RequirementsOn-chip governance mechanisms
Make alterations to AI hardware (primarily AI chips), that enable verifying or controlling the usage of this hardware.
1.2.4 Security InfrastructureSafety cases
Develop structured arguments demonstrating that an AI system is unlikely to cause catastrophic harm, to inform decisions about training and deployment.
2.2.4 Assurance DocumentationRed-teaming
Perform exploratory and custom testing to find vulnerabilities in AI systems, often engaging external experts.
2.2.2 Testing & EvaluationThe AI regulator’s toolbox: A list of concrete AI governance practices
Jones, Adam (2024)
This article explains concrete AI governance practices people are exploring as of August 2024. Prior summaries have mapped out high-level areas of work, but rarely dive into concrete practice details. This summary explores specific practices addressing risks from advanced AI systems. Practices are grouped into categories based on where in the AI lifecycle they best fit. The primary goal of this article is to help newcomers contribute to the field of AI governance by providing a comprehensive overview of available practices.
Verify and Validate
Testing, evaluating, auditing, and red-teaming the AI system
Governance Actor
Regulator, standards body, or oversight entity shaping AI policy
Measure
Quantifying, testing, and monitoring identified AI risks
Other