Modifications to training data composition, quality, and filtering that affect what the model learns.
Filter data used to train AI models, e.g. don’t train your model on instructions for launching cyberattacks.
Filtering out this information could help prevent AI systems from doing bad things. Work here might involve:
- Determining what information should be filtered out in the first place, or more likely, developing guidelines for identifying what should and should not be filtered.
  - Example of needing empirical research: it’s unclear whether a model trained on misinformation would be less helpful in getting people to the truth. Training on misinformation might encourage bad outputs that copy it, or could help models detect misinformation and develop critiques that convince people of the truth. Researchers could train models with and without this information and see which performs better.
  - Example of context-dependent filtering: a model that helps autocomplete general emails probably doesn’t need to know how to launch a cyberattack, but a model used by well-intentioned security researchers might.
- Developing tools to do this filtering effectively and at scale, for example an open-source toolkit for classifying or cleaning input data (a minimal sketch follows this list). The focus here should probably be on implementing filters for high-risk data.
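As a rough illustration of the last bullet, here is a minimal sketch of a rule-based corpus filter. Everything in it is hypothetical: the category names, keyword patterns, and the `FilterDecision`/`filter_corpus` helpers are invented for illustration, and a real toolkit would likely rely on trained classifiers and human review rather than regexes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical high-risk categories, each flagged by crude keyword patterns.
# A production toolkit would use trained classifiers, not regexes.
HIGH_RISK_PATTERNS = {
    "cyberattack_instructions": re.compile(
        r"\b(exploit CVE-\d{4}-\d+|build a botnet|sql injection payload)\b",
        re.IGNORECASE,
    ),
    "weapon_synthesis": re.compile(
        r"\b(nerve agent synthesis|enrich uranium)\b",
        re.IGNORECASE,
    ),
}


@dataclass
class FilterDecision:
    keep: bool
    matched_categories: list[str] = field(default_factory=list)


def classify(text: str) -> FilterDecision:
    """Flag a document that matches any high-risk pattern."""
    matched = [name for name, pattern in HIGH_RISK_PATTERNS.items()
               if pattern.search(text)]
    return FilterDecision(keep=not matched, matched_categories=matched)


def filter_corpus(documents):
    """Yield only documents judged safe to train on."""
    for doc in documents:
        decision = classify(doc)
        if decision.keep:
            yield doc
        else:
            # A real pipeline would log dropped documents for audit
            # and possibly route them to human review.
            print(f"dropped ({', '.join(decision.matched_categories)})")


if __name__ == "__main__":
    corpus = [
        "How to bake sourdough bread at home.",
        "Step one: exploit CVE-2021-44228 to gain remote access...",
    ]
    kept = list(filter_corpus(corpus))
    print(f"kept {len(kept)} of {len(corpus)} documents")
```

Note that the pattern set is a parameter rather than a constant in spirit: a context-dependent deployment (such as the security-researcher model mentioned above) could swap in a different set rather than applying a single global filter.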
Reasoning
Filters harmful data from the training corpus before model learning occurs.
Also in Model
Compute governance
Regulate companies in the highly concentrated AI chip supply chain, given AI chips are key inputs to developing frontier AI models.
3.1.1 Legislation & Policy
Licensing
Require organisations or specific training runs to be licensed by a regulatory body, similar to licensing regimes in other high-risk industries.
3.1.4 Compliance Requirements
On-chip governance mechanisms
Make alterations to AI hardware (primarily AI chips) that enable verifying or controlling how the hardware is used.
1.2.4 Security Infrastructure
Safety cases
Develop structured arguments demonstrating that an AI system is unlikely to cause catastrophic harm, to inform decisions about training and deployment.
2.2.4 Assurance Documentation
Evaluations (aka “evals”)
Give AI systems standardised tests to assess their capabilities, which can help gauge the risks they might pose.
2.2.2 Testing & Evaluation
Red-teaming
Perform exploratory and custom testing to find vulnerabilities in AI systems, often engaging external experts.
2.2.2 Testing & Evaluation
The AI regulator’s toolbox: A list of concrete AI governance practices
Jones, Adam (2024)
This article explains concrete AI governance practices people are exploring as of August 2024. Prior summaries have mapped out high-level areas of work, but rarely dive into concrete practice details. This summary explores specific practices addressing risks from advanced AI systems. Practices are grouped into categories based on where in the AI lifecycle they best fit. The primary goal of this article is to help newcomers contribute to the field of AI governance by providing a comprehensive overview of available practices.
Collect and Process Data
Gathering, curating, labelling, and preprocessing training data
Developer
Entity that creates, trains, or modifies the AI system
Manage
Prioritising, responding to, and mitigating AI risks