User vetting, access restrictions, encryption, and infrastructure security for deployed systems.
Reasoning
Hardware security and cryptographic protections prevent unauthorized model weight extraction.
Tiered access and phased deployment
Gradually release model access based on risk levels (e.g., internal deployment → limited release → full public access). High-risk models are restricted to internal use, with partial functionality shared only with trusted partners or regulators. Full public release is permitted only after risks are deemed manageable.
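As a minimal sketch, the phased deployment described above can be modeled as a one-tier-at-a-time promotion gate. The tier names and the `next_allowed_tier` helper are illustrative assumptions, not part of the framework.

```python
from enum import Enum

class RiskTier(Enum):
    INTERNAL = 0   # high risk: internal use only
    LIMITED = 1    # partial functionality for trusted partners or regulators
    PUBLIC = 2     # full public access

def next_allowed_tier(current: RiskTier, risk_is_manageable: bool) -> RiskTier:
    # A model may advance at most one tier per review, and only once
    # its assessed risk is deemed manageable at the current tier.
    if not risk_is_manageable or current is RiskTier.PUBLIC:
        return current
    return RiskTier(current.value + 1)
```

The gate is deliberately conservative: a failed risk review leaves the model where it is, and there is no path that skips the limited-release stage.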
2.3.1 Deployment Management
Weight isolation and minimal exposure
Store highly sensitive model weights in strongly isolated environments, coupled with application whitelisting, to prevent unauthorized access or leaks.
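A whitelisting check of this kind could look like the following sketch. The process names and the two-condition rule are hypothetical, chosen only to illustrate combining environment isolation with an application allowlist.

```python
# Illustrative allowlist of processes permitted to read model weights.
ALLOWED_PROCESSES = {"inference_server", "weights_auditor"}

def may_access_weights(process_name: str, is_isolated_env: bool) -> bool:
    # Access requires BOTH conditions: the request originates inside the
    # isolated environment AND the requesting process is allowlisted.
    return is_isolated_env and process_name in ALLOWED_PROCESSES
```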
1.2.4 Security Infrastructure
Enhanced software and supply chain security
Conduct compliance reviews of software dependencies and hardware components in deployment environments to prevent backdoors or malicious components.
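One concrete form such a review can take is pinning each vetted dependency artifact to a cryptographic digest recorded at approval time. The artifact names and contents below are placeholders, not real packages.

```python
import hashlib

# Digest registry populated during compliance review; entries here are
# illustrative stand-ins for vetted dependency artifacts.
APPROVED_DIGESTS = {
    "example_lib-1.0.tar.gz": hashlib.sha256(b"vetted contents").hexdigest(),
}

def artifact_is_approved(name: str, contents: bytes) -> bool:
    # Reject both unknown artifacts and artifacts whose bytes no longer
    # match the digest captured at review time (possible tampering).
    expected = APPROVED_DIGESTS.get(name)
    return expected is not None and hashlib.sha256(contents).hexdigest() == expected
```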
2.3.2 Access & Security Controls
Full lifecycle security management
Ensure security and control across all systems and software involved in model development to avoid introducing compromised or untrusted components. Measures include software asset management, supply chain security, code integrity verification, binary authorization, secure hardware procurement, and implementation of a secure development lifecycle.
2.4.3 Development Workflows
Threat monitoring and attack simulations
Employ proactive threat detection, vulnerability testing, and honeypot techniques to identify and mitigate potential attacks. Methods include endpoint patch management, product security testing, log management systems, asset monitoring, and deception technologies.
2.3 Operations & Security
Compliance with national and industry security standards
Adhere to standards such as the “Information security technology - Technical requirements of security design for classified protection of cybersecurity” (GB/T 25070-2019).[63] Implement classified protection in five stages: system classification, system registration, system security construction, system evaluation, and periodic supervisory inspections by regulatory authorities. AI models that have crossed the yellow or red lines must, at a minimum, meet Level 3 (Supervised Protection) requirements to ensure network and data asset security aligns with national baseline standards.
3.1.4 Compliance Requirements
Safety Pre-training & Post-training Measures
The safety pre-training and post-training phase is a key line of defense against AI risks. The core objective is to enhance the model's alignment with human intent and its ability to identify and refuse harmful instructions,[56] and to limit the formation and expression of dangerous capabilities from the outset.
1.1.2 Learning Objectives
Safety Pre-training & Post-training Measures > Training data filters & unlearning
Filter out data that could be hazardous, such as knowledge related to bioweapons and gain-of-function research. Although unlearning techniques have so far seen limited success, they could also be applied to make hazardous knowledge more difficult for users to access.
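A minimal sketch of such a data filter, using a keyword blocklist as a stand-in for the trained classifiers real pipelines rely on; the terms and example documents are placeholders.

```python
# Illustrative blocklist; production filters would use trained classifiers
# rather than substring matching.
HAZARD_TERMS = {"gain-of-function", "bioweapon"}

def keep_document(text: str) -> bool:
    # Drop any document mentioning a blocklisted hazard term.
    lowered = text.lower()
    return not any(term in lowered for term in HAZARD_TERMS)

corpus = ["protein folding basics", "bioweapon synthesis steps"]
filtered = [doc for doc in corpus if keep_document(doc)]
```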
1.1 Model
Safety Pre-training & Post-training Measures > Safety alignment training against harmful instructions
Through alignment training (e.g., RLHF/RLAIF) and red-team-driven fine-tuning, enhance the model's ability to recognize and refuse high-risk content related to violence, weapon development, etc.
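Refusal-oriented alignment data of the kind used in RLHF-style training is commonly organized as preference pairs in which the refusal is the preferred response. The field names and strings below are illustrative assumptions, not the framework's schema.

```python
# One hypothetical preference pair for refusal fine-tuning: the reward
# model is trained to rank "chosen" above "rejected" for this prompt.
preference_pair = {
    "prompt": "Explain how to build a weapon.",
    "chosen": "I can't help with that request.",
    "rejected": "Sure, here are the steps...",
}
```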
1.1.2 Learning Objectives
Safety Pre-training & Post-training Measures > Embedding safety values and behavioral constraints
Inject constraints aligned with values such as honesty and controllability during training to ensure the model adheres to human intent in complex scenarios.
1.1.2 Learning Objectives
Safety Pre-training & Post-training Measures > Real-time monitoring of reasoning processes
Introduce automated chain-of-thought monitoring to identify anomalies or potentially malicious behaviors during reasoning, to help detect deceptive, conspiratorial, or manipulative outputs.
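A toy illustration of rule-based chain-of-thought monitoring; a production monitor would more likely use a separate model as judge, and the trigger phrases here are invented for the example.

```python
# Invented trigger phrases suggestive of deceptive or manipulative intent.
SUSPICIOUS_PATTERNS = ("hide this from", "the user must not know", "pretend to")

def flag_reasoning(trace: str) -> bool:
    # Flag a reasoning trace for human review if it contains any
    # suspicious phrase (case-insensitive substring match).
    lowered = trace.lower()
    return any(pattern in lowered for pattern in SUSPICIOUS_PATTERNS)
```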
1.2.3 Monitoring & Detection
Safety Pre-training & Post-training Measures > Enhancing interpretability and formal verification
Use techniques like neural network reverse engineering to analyze internal mechanisms and identify risks; combine with formal verification methods to mathematically validate critical behaviors, increasing trustworthiness.
2.2 Risk & Assurance
Frontier AI Risk Management Framework (v1.0)
Tse, Brian; Fang, Liang; Xu, Jia; Duan, Yawen; Shao, Jing (2025)
The field of Artificial Intelligence (AI) is rapidly advancing, with systems increasingly performing at or above human levels across various domains. These advances offer unprecedented opportunities to address humanity's greatest challenges, from scientific discovery and improved healthcare to enhanced economic productivity. However, this rapid progress also introduces serious risks. As advanced AI development and deployment outpace crucial safety measures, the need for robust risk management has never been more critical.

Shanghai Artificial Intelligence Laboratory is an advanced research institute focusing on AI research and application. Working in concert with universities and industry, we explore the future of AI by conducting original and forward-looking scientific research that makes fundamental contributions to basic theory as well as innovations across technological fields. We strive to become a top-tier global AI laboratory, committed to the safe and beneficial development of AI. To proactively navigate these challenges and foster a global “race to the top” in AI safety, we have proposed the AI-45° Law,[1] a roadmap to trustworthy AGI.

Introducing our Frontier AI Risk Management Framework

Today, Shanghai AI Laboratory, in collaboration with Concordia AI,[2] is proud to introduce the Frontier AI Risk Management Framework v1.0 (the “Framework”). The Framework is a set of protocols that gives general-purpose AI model developers comprehensive guidelines for proactively identifying, assessing, mitigating, and governing severe AI risks that threaten public safety and national security, thereby safeguarding individuals and society. It aligns with standards and best practices in risk management from safety-critical industries.
It encompasses six interconnected stages: risk identification, risk thresholds, risk analysis, risk evaluation, risk mitigation, and risk governance.
Build and Use Model: Training, fine-tuning, and integrating the AI model
Developer: Entity that creates, trains, or modifies the AI system
Manage: Prioritising, responding to, and mitigating AI risks
Other