Cross-organization coordination mechanisms, information sharing, and collaborative monitoring.
Supporting ecosystem mitigations involves developers providing information, tools, and capabilities that enable other actors (governments, organizations, and civil society) to implement effective defenses against AI-enabled threats. While developers are often not the primary actors responsible for societal defenses, they can contribute by sharing resources that strengthen the broader defensive ecosystem.
6.1 Information Sharing and Documentation
Developers can enhance AI ecosystem defenses through voluntary information sharing. This includes sharing threat models and risk assessments that help recipients understand potential threat vectors and inform defensive strategies. Such sharing requires balancing transparency with security and legal considerations: providing enough detail to enable defenses without creating a roadmap for malicious actors.
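To make this concrete, shared threat information might take the form of a structured record. The Python sketch below is purely hypothetical; the field names and severity scale are illustrative assumptions, not an established industry schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a structured threat advisory a developer might share.
# Field names and the 1-5 severity scale are illustrative assumptions.
@dataclass
class ThreatAdvisory:
    threat_vector: str                  # e.g. "automated vulnerability discovery"
    affected_domain: str                # e.g. "cyber", "CBRN", "autonomy"
    severity: int                       # assumed scale: 1 (low) .. 5 (critical)
    suggested_defenses: list[str] = field(default_factory=list)
    # Detail is deliberately coarse: enough to inform defenses without
    # providing a step-by-step roadmap for malicious actors.
    public_summary: str = ""

advisory = ThreatAdvisory(
    threat_vector="automated vulnerability discovery",
    affected_domain="cyber",
    severity=3,
    suggested_defenses=["patch prioritization", "network monitoring"],
    public_summary="Models may accelerate discovery of known bug classes.",
)
```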
6.2 Supporting Defensive Systems and Research
Developers, other actors in the AI ecosystem, and society broadly can develop or support systems specifically designed to strengthen defensive capabilities, potentially including the use of frontier models. In the biological domain, this could include supporting improved pathogen surveillance using data sources like wastewater monitoring and open-source health reporting, investment in improved personal protective equipment, or improved DNA synthesis screening to detect potentially dangerous orders. For cybersecurity, developers could support the development of vulnerability detection tools for critical infrastructure. Developers could also support research into detecting and preventing autonomous misalignment, such as tools for monitoring goal drift or unexpected model behaviors.
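As a toy illustration of the synthesis-screening idea, the sketch below flags an order whose sequence contains a subsequence from a hazard list. Real screening systems rely on curated hazard databases, alignment tools, and expert review; the hazard list, k-mer size, and function names here are placeholders.

```python
# Toy sketch of sequence-of-concern screening for DNA synthesis orders.
# HAZARD_KMERS holds placeholder entries, not real sequences of concern.
HAZARD_KMERS = {"ATGCGTACGTTAGC"}  # illustrative 14-mer
K = 14

def flag_order(sequence: str) -> bool:
    """Return True if any k-mer in the order matches a hazard-list entry."""
    seq = sequence.upper()
    return any(seq[i:i + K] in HAZARD_KMERS for i in range(len(seq) - K + 1))

if flag_order("aaATGCGTACGTTAGCcc"):
    print("Order flagged for manual biosecurity review")
```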
6.3 Reporting and Early Warning Systems
Developers can establish mechanisms for rapid threat detection and response. This includes contributing to early warning systems by reporting novel threats or concerning behaviors to relevant authorities, establishing clear reporting channels for security researchers and incident responders to communicate with AI developers, and developing internal thresholds for escalating concerns to appropriate government agencies. These systems enable faster response to emerging threats across the ecosystem.
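For instance, internal escalation thresholds could be encoded as a simple severity-to-channel mapping. The tiers and recipients in this sketch are hypothetical assumptions, not a documented policy of any developer.

```python
# Hypothetical sketch of internal escalation thresholds for incident reports.
# Tiers, channel names, and cutoffs are assumptions for illustration only.
ESCALATION_POLICY = {
    1: ["internal security team"],
    2: ["internal security team", "trust & safety leadership"],
    3: ["trust & safety leadership", "relevant government agency"],
}

def escalate(severity: int) -> list[str]:
    """Route a report to every channel at or below its severity tier."""
    recipients: list[str] = []
    for tier, channels in sorted(ESCALATION_POLICY.items()):
        if severity >= tier:
            recipients.extend(c for c in channels if c not in recipients)
    return recipients

print(escalate(3))  # all channels, including the government contact
```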
Capability Limitation Mitigations
Capability limitation mitigations aim to prevent models from possessing knowledge or abilities that could enable harm. These methods alter the model's weights or training process so that it cannot assist with harmful actions when prompted by humans or autonomously pursue harmful objectives. However, the effectiveness of these mitigations is an active area of research, and they can currently be circumvented if dual-use knowledge (knowledge that has both benign and harmful applications) is reintroduced through the context window at inference time or through fine-tuning.
2.1 Data Filtering
Data filtering involves removing content from training datasets that could lead to dual-use or potentially harmful capabilities. Developers can use several methods: automated classifiers to identify and remove content related to weapons development, detailed attack methodologies, or other high-risk domains; keyword-based filters to exclude documents containing specific terminology or instructions of concern; and machine learning models trained to recognize subtle patterns in content that might contribute to dangerous capabilities.
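A minimal sketch of how these methods might be combined in a filtering pipeline follows, assuming a cheap keyword pass ahead of a trained classifier. The blocklist terms, classifier stub, and threshold are illustrative placeholders, not a production filter.

```python
# Minimal sketch of a two-stage training-data filter: a cheap keyword pass
# followed by a classifier pass. The blocklist and classifier are placeholders;
# production filters use curated term lists and trained risk models.
BLOCKLIST = {"synthesis route", "detonation"}  # illustrative terms only

def keyword_flag(doc: str) -> bool:
    text = doc.lower()
    return any(term in text for term in BLOCKLIST)

def classifier_score(doc: str) -> float:
    """Stand-in for a trained risk classifier returning P(high-risk)."""
    return 0.0  # placeholder

def keep_document(doc: str, threshold: float = 0.5) -> bool:
    if keyword_flag(doc):          # cheap filter runs first
        return False
    return classifier_score(doc) < threshold

corpus = [
    "benign chemistry lecture notes",
    "detailed detonation instructions",  # caught by the keyword pass
]
filtered = [d for d in corpus if keep_document(d)]
print(filtered)  # only the benign document survives
```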
2.2 Exploratory Methods
Beyond data filtering, researchers are investigating additional capability limitation approaches. However, these methods face technical challenges, and their effectiveness remains uncertain.
● Model distillation could create specialized versions of frontier models with capabilities limited to specific domains. For example, a model could excel at medical diagnosis while lacking the knowledge needed for biological weapons development. While the capability limitations may be more fundamental than post-hoc safety training, it remains unclear how effectively this approach prevents harmful capabilities from being reconstructed. Additionally, multiple specialized models would be needed to cover various use cases, increasing development and maintenance costs.
● Targeted unlearning attempts to remove specific dangerous capabilities from models after initial training, offering a more precise alternative to full retraining (see the sketch after this list). Possible approaches include fine-tuning on datasets to overwrite specific knowledge while preserving general capabilities, or modifying how models internally structure and access particular information. However, these methods may be reversible with relatively modest effort, for example by restoring "unlearned" capabilities through targeted fine-tuning on small datasets. Models may also regenerate removed knowledge by inferring it from adjacent information that remains accessible.
While research continues on these approaches, developers currently rely more heavily on post-deployment mitigations that can be more reliably implemented and assessed.
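To illustrate the unlearning idea, the sketch below shows one simplified recipe: gradient ascent on a "forget" set combined with ordinary fine-tuning on a "retain" set. The model, data, and weighting are stand-ins, and this is one of several research directions rather than a settled method; as noted above, such edits may be reversible.

```python
import torch
import torch.nn as nn

# Hedged sketch of one targeted-unlearning recipe: ascend the loss on a
# "forget" set while descending it on a "retain" set. Illustrative only.
model = nn.Linear(8, 2)                       # stand-in for a frontier model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def unlearning_step(forget_x, forget_y, retain_x, retain_y, forget_weight=1.0):
    opt.zero_grad()
    retain_loss = loss_fn(model(retain_x), retain_y)   # preserve general skills
    forget_loss = loss_fn(model(forget_x), forget_y)   # capability to remove
    # Subtracting the forget loss ascends it, pushing the model away from the
    # targeted knowledge while the retain term anchors everything else.
    (retain_loss - forget_weight * forget_loss).backward()
    opt.step()

unlearning_step(torch.randn(4, 8), torch.randint(0, 2, (4,)),
                torch.randn(4, 8), torch.randint(0, 2, (4,)))
```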
Frontier Mitigations
Frontier Model Forum (2025)
Frontier mitigations are protective measures implemented on frontier models, with the goal of reducing the risk of potential high-severity harms, especially those related to national security and public safety, that could arise from their advanced capabilities. This report discusses emerging industry practices for implementing and assessing frontier mitigations. It focuses on mitigations for managing risks in three primary domains: chemical, biological, radiological and nuclear (CBRN) information threats; advanced cyber threats; and advanced autonomous behavior threats. Given the nascent state of frontier mitigations, this report describes the range of controls and mitigation strategies being employed or researched by Frontier Model Forum members and documents the known limitations of these approaches.
Other (outside lifecycle): Outside the standard AI system lifecycle
Developer: Entity that creates, trains, or modifies the AI system
Govern: Policies, processes, and accountability structures for AI risk management