Internal decision-making bodies, roles, authority structures, and accountability frameworks that establish who has power over AI-related decisions and how they are held responsible.
Reasoning
A generic governance framework that lacks the specificity to distinguish among leadership oversight, role definition, and formal policies.
The “Three Lines Model” in organizational risk management
This model clarifies risk management responsibilities within the organization and ensures that risks are effectively controlled by specifying three lines of defense:
1) First line of defense: operational business units responsible for identifying, analyzing, and mitigating risks in daily activities.
2) Second line of defense: risk management and compliance teams that oversee and support the first line, ensuring the risk management framework functions effectively.
3) Third line of defense: internal audit, which independently evaluates the effectiveness of the first two lines and provides assurance to the board of directors.
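The three-line structure above can be sketched as a simple lookup that routes a risk-management duty to the unit that owns it. This is an illustrative sketch only; the duty names and mapping are our assumptions, not part of the Framework:

```python
from enum import Enum

class DefenseLine(Enum):
    """Three Lines Model: who owns which risk-management duty."""
    FIRST = "operational business units"      # identify & mitigate risks day-to-day
    SECOND = "risk management & compliance"   # oversee and support the first line
    THIRD = "internal audit"                  # independent assurance to the board

# Illustrative routing of duties to lines of defense.
DUTIES = {
    "identify_risk": DefenseLine.FIRST,
    "mitigate_risk": DefenseLine.FIRST,
    "maintain_framework": DefenseLine.SECOND,
    "monitor_compliance": DefenseLine.SECOND,
    "independent_audit": DefenseLine.THIRD,
}

def owner_of(duty: str) -> str:
    """Return the organizational owner for a given duty."""
    return DUTIES[duty].value
```

For example, `owner_of("independent_audit")` resolves to the third line, matching the model's placement of audit outside the first two lines.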
2.1.2 Roles & Accountability

AI safety and security committee
Establish a dedicated committee to oversee AI risk identification, mitigation strategies, and system deployment approvals, ensuring compliance with security standards and regulations
2.1.1 Leadership Oversight

AI safety team and research unit
Form an internal team led by a designated safety officer to conduct AI risk management practices. This team is tasked with performing proactive safety research on high-risk AI applications and with investigating potential misuse and loss-of-control scenarios to inform risk mitigation strategies.
2.1.2 Roles & Accountability

Evaluation and approval process for major decisions
Before proceeding with model training, deployment, or entry into highly sensitive domains, internal safety evaluation and decision-making processes should be conducted to clarify risk mitigation plans and usage-authorization boundaries, determine whether to proceed, and ensure that high-risk operations are backed by adequate governance capability.
2.2.1 Risk Assessment

Allocate AI safety resources based on risk severity
Yellow Line: Minimum 10% of staff and project budget dedicated to safety. Red Line: Minimum 30% of staff and project budget allocated to safety measures.
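An illustrative reading of these thresholds as a compliance check (the function name, field names, and keying by risk line are our assumptions, not prescribed by the Framework):

```python
# Minimum share of staff and project budget that must go to safety,
# keyed by the risk line that has been reached.
SAFETY_MINIMUMS = {
    "yellow": 0.10,  # yellow line: at least 10% of staff and budget
    "red": 0.30,     # red line:    at least 30% of staff and budget
}

def meets_safety_allocation(line: str, staff_share: float, budget_share: float) -> bool:
    """Check whether staff and budget shares meet the minimum for a risk line."""
    minimum = SAFETY_MINIMUMS[line]
    return staff_share >= minimum and budget_share >= minimum
```

For example, an organization at the red line with 25% of staff but 35% of budget on safety would fail the check, since both shares must clear the 30% floor.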
2.1.2 Roles & Accountability

Organizational safety culture and training
Cultivate a safety-first culture through regular internal audits to ensure compliance with AI safety protocols, reinforcing accountability. Mandate ongoing, targeted safety training for R&D staff and leadership to uphold AI safety best practices, fostering a culture of responsibility and vigilance
2.4.4 Training & Awareness

Whistleblower protection and reporting mechanism
Establish secure, anonymous reporting channels to disclose critical AI safety risks or violations without fear of retaliation. Implement robust protections to prevent restrictive confidentiality or non-disparagement agreements from suppressing safety-related disclosures, ensuring a transparent and accountable environment
2.1.3 Policies & Procedures

Authorization levels and responsibility matching mechanisms
Prior to model or system deployment, authorizations should be based on risk levels, e.g., limited to closed beta testing, regulatory sandboxes, or critical industry users. Higher levels of authorization should be matched with stronger governance and controls, including user qualification, audit trails, and isolation of the operating environment.
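One way to make risk-tiered authorization concrete is a mapping from assessed risk level to permitted deployment scope and required controls. The tier names and control sets below are illustrative assumptions, not part of the Framework:

```python
# Illustrative mapping from assessed risk level to permitted deployment
# scope and the controls that must be in place before release.
AUTHORIZATION = {
    "low":    {"scope": "general availability",
               "controls": ["audit trails"]},
    "medium": {"scope": "regulatory sandbox",
               "controls": ["audit trails", "user qualification"]},
    "high":   {"scope": "closed beta",
               "controls": ["audit trails", "user qualification",
                            "isolated operating environment"]},
}

def deployment_terms(risk_level: str) -> dict:
    """Return the permitted scope and required controls for a risk level."""
    return AUTHORIZATION[risk_level]
```

The key design point is that scope narrows while the control set grows monotonically as risk rises, so a higher tier never drops a control required at a lower one.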
2.3.1 Deployment Management

Risk register
Developers could maintain a dynamic risk register: an internal document designed for rapid updates and action-oriented risk tracking. The risk register would catalog a comprehensive taxonomy of risks, detailing for each:
1) the highest risk level across all models,
2) the designated risk owner,
3) specific evaluations to run at various stages,
4) tailored mitigation procedures for different risk levels, and
5) evaluation thresholds.
Distinct from stable, long-term AI safety policies, risk registers enable agile responses to emerging threats. As a transparency measure, a redacted version of the risk register could be published annually, sharing insights with stakeholders while protecting sensitive data.
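The five fields above map naturally onto a record type. A minimal sketch follows; the field names and the example entry are our assumptions for illustration, not prescribed by the Framework:

```python
from dataclasses import dataclass, field

@dataclass
class RiskEntry:
    """One row of a dynamic risk register, covering the five fields above."""
    risk: str                 # risk category from the taxonomy
    highest_level: str        # highest risk level across all models
    owner: str                # designated risk owner
    evaluations: dict = field(default_factory=dict)  # stage -> list of evals to run
    mitigations: dict = field(default_factory=dict)  # risk level -> procedure
    thresholds: dict = field(default_factory=dict)   # eval name -> threshold

# Illustrative entry; values are invented for the example.
entry = RiskEntry(
    risk="hazardous knowledge uplift",
    highest_level="yellow",
    owner="head of safety",
    evaluations={"pre-training": ["data audit"], "pre-deployment": ["uplift eval"]},
    mitigations={"yellow": "restrict API access", "red": "halt deployment"},
    thresholds={"uplift eval": 0.2},
)
```

Keeping each row action-oriented (owner, evaluations, thresholds) is what distinguishes a register from a static policy document.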
2.2.1 Risk Assessment

Safety Pre-training & Post-training Measures
The safety pre-training and post-training phases are a key line of defense against AI risks. The core objective is to enhance the model's alignment with human intent and its ability to identify and refuse harmful instructions, and to limit the formation and expression of dangerous capabilities from the outset.
1.1.2 Learning Objectives

Safety Pre-training & Post-training Measures > Training data filters & unlearning
Filter out data that could be hazardous, such as bioweapon- and gain-of-function-related knowledge. Although unlearning techniques are currently less successful, they could also be applied to make hazardous knowledge more difficult for users to access.
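A hedged sketch of such a pre-training data filter. The keyword match below is a trivial placeholder: production filters would use trained classifiers, and the term list here is purely illustrative:

```python
# Illustrative pre-training data filter: drop documents flagged as
# containing hazardous knowledge before they enter the training corpus.
HAZARD_TERMS = ("gain-of-function", "pathogen enhancement")  # placeholder list

def is_hazardous(document: str) -> bool:
    """Crude stand-in for a trained hazard classifier."""
    text = document.lower()
    return any(term in text for term in HAZARD_TERMS)

def filter_corpus(documents: list) -> list:
    """Keep only documents that pass the hazard filter."""
    return [doc for doc in documents if not is_hazardous(doc)]
```

In practice the filtering decision would come from an ML classifier with human-audited thresholds; the pipeline shape (flag, then drop before training) is the point of the sketch.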
1.1 Model

Safety Pre-training & Post-training Measures > Safety alignment training against harmful instructions
Through alignment training (e.g., RLHF/RLAIF) and red-team-driven fine-tuning, enhance the model's ability to recognize and refuse high-risk content related to violence, weapon development, etc.
1.1.2 Learning Objectives

Safety Pre-training & Post-training Measures > Embedding safety values and behavioral constraints
Inject constraints aligned with values like honesty and controllability during training to ensure the model adheres to human intent in complex scenarios
1.1.2 Learning Objectives

Safety Pre-training & Post-training Measures > Real-time monitoring of reasoning processes
Introduce automated chain-of-thought monitoring to identify anomalies or potentially malicious behaviors during reasoning, to help detect deceptive, conspiratorial, or manipulative outputs.
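As a hedged sketch, a chain-of-thought monitor can be a hook that scores each reasoning step and flags anomalous ones for review. The pattern list and scoring function below are trivial placeholders for a trained monitor model:

```python
# Illustrative chain-of-thought monitor: score each reasoning step and
# flag steps whose risk score crosses a threshold for human review.
SUSPECT_PATTERNS = ("hide this from", "the user must not know")  # placeholder cues

def risk_score(step: str) -> float:
    """Trivial stand-in for a trained monitor's score in [0, 1]."""
    return 1.0 if any(p in step.lower() for p in SUSPECT_PATTERNS) else 0.0

def monitor_trace(steps: list, threshold: float = 0.5) -> list:
    """Return indices of reasoning steps flagged for review."""
    return [i for i, step in enumerate(steps) if risk_score(step) >= threshold]
```

The design choice worth noting is that the monitor runs on the intermediate reasoning trace, not just the final output, so deceptive intermediate steps can be caught even when the final answer looks benign.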
1.2.3 Monitoring & Detection

Safety Pre-training & Post-training Measures > Enhancing interpretability and formal verification
Use techniques like neural network reverse engineering to analyze internal mechanisms and identify risks; combine with formal verification methods to mathematically validate critical behaviors, increasing trustworthiness.
2.2 Risk & Assurance

Frontier AI Risk Management Framework (v1.0)
Tse, Brian; Fang, Liang; Xu, Jia; Duan, Yawen; Shao, Jing (2025)
The field of Artificial Intelligence (AI) is rapidly advancing, with systems increasingly performing at or above human levels across various domains. These breakthroughs offer unprecedented opportunities to address humanity's greatest challenges, from scientific breakthroughs and improved healthcare to enhanced economic productivity. However, this rapid progress also introduces unprecedented risks. As advanced AI development and deployment outpace crucial safety measures, the need for robust risk management has never been more critical.

Shanghai Artificial Intelligence Laboratory is an advanced research institute focusing on AI research and application. Working in concert with universities and industry, we explore the future of AI by conducting original and forward-looking scientific research that makes fundamental contributions to basic theory as well as innovations in various technological fields. We strive to become a top-tier global AI Laboratory, committed to the safe and beneficial development of AI. To proactively navigate these challenges and foster a global “race to the top” in AI safety, we have proposed the AI-45° Law,1 a roadmap to trustworthy AGI.

Introducing our Frontier AI Risk Management Framework

Today, Shanghai AI Laboratory, in collaboration with Concordia AI,2 is proud to introduce the Frontier AI Risk Management Framework v1.0 (the “Framework”). We propose a robust set of protocols designed to empower general-purpose AI developers with comprehensive guidelines for proactively identifying, assessing, mitigating, and governing a set of severe AI risks that pose threats to public safety and national security, thereby safeguarding individuals and society. This framework serves as a guideline for general-purpose AI model developers to manage the potential severe risks from their general-purpose AI models. It aligns with standards and best practices in risk management from safety-critical industries.
It encompasses six interconnected stages: risk identification, risk thresholds, risk analysis, risk evaluation, risk mitigation, and risk governance.
Other (outside lifecycle): Outside the standard AI system lifecycle
Developer: Entity that creates, trains, or modifies the AI system
Govern: Policies, processes, and accountability structures for AI risk management
Primary
6.5 Governance failure