Establishes Critical Capability Levels (CCLs) to assess AI risks from misuse and deceptive alignment, requires evaluations of proximity to CCLs, and applies corresponding mitigations. Addresses detection and monitoring strategies for deceptive alignment, as well as deployment procedures for misuse risks.
Analysis summaries, actor details, and coverage mappings were LLM-classified and may contain errors.
This is an internal corporate policy document from Google DeepMind establishing voluntary safety protocols and governance structures for frontier AI development. The document uses predominantly voluntary language ('intend to', 'aim to', 'may') and establishes internal governance bodies rather than external enforcement mechanisms.
The document has good coverage of 11 subdomains, with a strong focus on malicious actors (4.1, 4.2, 4.3), AI system security (2.2), competitive dynamics (6.4), governance failure (6.5), and AI safety failures (7.1, 7.2, 7.3). Coverage is concentrated in the security, misuse prevention, and AI safety domains, particularly around CBRN, cyber, and deceptive alignment risks.
This is an internal corporate policy document from Google DeepMind governing its own AI development activities. The primary sectors governed are Information (where Google DeepMind operates as a technology company) and Scientific Research and Development Services (as an AI research organization). The document does not regulate external sectors but addresses risks that could affect multiple sectors, including national security, healthcare (via CBRN risks), and critical infrastructure.
The document covers multiple AI lifecycle stages, with a primary focus on the Build and Use Model, Verify and Validate, Deploy, and Operate and Monitor stages. It addresses model development, evaluation protocols, deployment procedures, and ongoing monitoring for frontier AI systems.
The document explicitly focuses on frontier AI models and systems, with detailed discussion of critical capability levels. It does not explicitly define or distinguish between general-purpose AI, task-specific AI, foundation models, or generative versus predictive AI. The document mentions model weights and discusses open-release considerations but does not use the terms 'open-weight' or 'open-source.' No specific compute thresholds are mentioned.
Google DeepMind
The document is authored and proposed by Google DeepMind, as indicated throughout the framework. It represents Google DeepMind's internal safety framework for frontier AI development.
Google DeepMind AGI Safety Council; Google DeepMind Responsibility and Safety Council; Google Trust & Compliance Council
Internal corporate governance bodies within Google DeepMind are responsible for reviewing and approving response plans when alert thresholds are reached, and for periodically reviewing framework implementation.
Google DeepMind AGI Safety Council; external evaluators; appropriate government authorities
The Google DeepMind AGI Safety Council monitors framework implementation. External evaluators may be used to test models. The document also mentions potential information sharing with government authorities for oversight purposes.
Google DeepMind; frontier AI field
The framework primarily applies to Google DeepMind's own frontier AI model development and deployment activities. It also makes recommendations for the broader frontier AI field regarding security practices.
11 subdomains (8 Good, 3 Minimal)