Laws, mandates, and enforcement mechanisms requiring state authority to create or enforce.
Technical oversight in AI regulation involves a set of components that together keep AI systems within ethical and safety bounds: transparency and explainability, auditing and monitoring, accountability mechanisms, and safety standards with certification processes.

• Transparency and Explainability: Transparency is a cornerstone of responsible AI governance. For AI systems to be regulated effectively, their decision-making processes must be interpretable by human operators and auditors (Bengio et al., 2024b; Bommasani et al., 2024b). Explainability refers to the ability to trace and understand how an AI system arrives at its conclusions. This is particularly important in high-stakes fields such as healthcare and criminal justice, where opaque decision-making can lead to harmful consequences. The push for transparency aligns with regulatory frameworks such as the European Union's GDPR, which mandates that individuals have the right to an explanation of AI-driven decisions (Bommasani et al., 2024a).

• Auditing and Monitoring: Auditing AI systems is essential for identifying potential biases, operational flaws, and security vulnerabilities. Audits can be performed at various stages of system development, from pre-deployment assessments to continuous monitoring once systems are in operation (Bengio et al., 2024b; Bommasani et al., 2024b). Continuous monitoring ensures that AI systems remain compliant with ethical guidelines and legal requirements over time. Monitoring frameworks should include mechanisms for tracking data quality, decision-making processes, and model performance, especially in dynamic environments where models learn and adapt (Bengio et al., 2024b); a minimal monitoring sketch follows this list.

• Accountability Mechanisms: Accountability ensures that developers and operators of AI systems are responsible for the outcomes their technologies produce. One major proposal in this area is mandatory incident reporting for high-risk AI applications (Bengio et al., 2024b), which would require companies and organizations to disclose failures or unethical outcomes produced by their AI systems; an illustrative incident-report schema also follows this list. In addition, clear guidelines must define liability when AI systems cause harm, particularly where the harm could have been anticipated or prevented through proper oversight (Bommasani et al., 2024a).

• Safety Standards and Certification: Developing safety standards and certification processes for AI systems is a critical element of technical oversight. These standards should rest on international cooperation so that regulatory approaches are harmonized across jurisdictions. Certification would involve third-party assessments verifying that AI systems meet established safety and ethical benchmarks before deployment in critical settings (Bommasani et al., 2024a). Such standards should cover data privacy, algorithmic fairness, and robustness against adversarial attacks (Bengio et al., 2024b; Bommasani et al., 2024b).
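As a minimal, hypothetical sketch of what the continuous monitoring described above could look like in practice, the Python below compares a deployed model's recent decision distribution against a pre-deployment baseline using the Population Stability Index (PSI). The function names, the example data, and the 0.2 alert threshold are illustrative assumptions, not taken from the cited frameworks.

```python
# Hypothetical monitoring sketch: detect drift in a deployed model's decision
# distribution relative to a pre-deployment audit baseline. All names and the
# alert threshold are illustrative assumptions.
import math
from collections import Counter

def distribution(labels: list[str]) -> dict[str, float]:
    """Empirical label distribution from a list of observed decisions."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

def psi_score(expected: dict[str, float], observed: dict[str, float]) -> float:
    """Population Stability Index between baseline and recent distributions.
    Labels absent from the baseline are ignored in this simplified version."""
    psi = 0.0
    for label, e in expected.items():
        e = max(e, 1e-6)                        # avoid log(0)
        o = max(observed.get(label, 0.0), 1e-6)
        psi += (o - e) * math.log(o / e)
    return psi

# Baseline captured during the pre-deployment audit.
baseline = distribution(["approve"] * 80 + ["deny"] * 20)
# Recent production decisions collected by the monitoring pipeline.
recent = distribution(["approve"] * 55 + ["deny"] * 45)

drift = psi_score(baseline, recent)
if drift > 0.2:  # 0.2 is a commonly used PSI alert level (an assumption here)
    print(f"ALERT: decision distribution drifted (PSI = {drift:.3f}); escalate to audit")
```

In a real deployment this check would run on a schedule against logged decisions, and a breach would trigger the kind of audit described in the bullet above rather than a print statement.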
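The accountability bullet above mentions mandatory incident reporting for high-risk AI applications. As an illustration only, the sketch below shows one plausible shape for such a report; the schema, field names, and severity scale are assumptions, since none of the cited proposals specifies a format.

```python
# Hypothetical incident-report record for a high-risk AI system. The schema is
# an illustrative assumption, not a format defined by any cited proposal.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIIncidentReport:
    system_name: str
    deployer: str
    severity: int                  # 1 (minor) .. 5 (critical); assumed scale
    description: str
    harmed_parties: list[str] = field(default_factory=list)
    anticipated: bool = False      # could the harm have been foreseen?
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def to_disclosure(self) -> dict:
        """Flatten the report for submission to a (hypothetical) regulator."""
        return {
            "system": self.system_name,
            "deployer": self.deployer,
            "severity": self.severity,
            "description": self.description,
            "harmed_parties": self.harmed_parties,
            "anticipated": self.anticipated,
            "reported_at": self.reported_at.isoformat(),
        }

report = AIIncidentReport(
    system_name="loan-scoring-v3",
    deployer="ExampleBank",
    severity=4,
    description="Systematic denial of applicants from one postcode region.",
    harmed_parties=["loan applicants"],
    anticipated=True,
)
print(report.to_disclosure())
```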
Proposals
Value Misalignment
99.9 · Other · Value Misalignment > Mitigating social bias
1 · AI System · Value Misalignment > Privacy protection
1 · AI System · Value Misalignment > Methods for mitigating toxicity
1 · AI System · Value Misalignment > Methods for mitigating LLM amorality
1 · AI System · Robustness to attack

Large Language Model Safety: A Holistic Survey
Shi, Dan; Shen, Tianhao; Huang, Yufei; Li, Zhigen; Leng, Yongqi; Jin, Renren; Liu, Chuang; Wu, Xinwei; Guo, Zishan; Yu, Linhao; Shi, Ling; Jiang, Bojian; Xiong, Deyi (2024)
The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence, marked by unprecedented capabilities in natural language understanding and generation. However, the increasing integration of these models into critical applications raises substantial safety concerns, necessitating a thorough examination of their potential risks and associated mitigation strategies. This survey provides a comprehensive overview of the current landscape of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks. In addition to the comprehensive review of the mitigation methodologies and evaluation resources on these four aspects, we further explore four topics related to LLM safety: the safety implications of LLM agents, the role of interpretability in enhancing LLM safety, the technology roadmaps proposed and adhered to by a number of AI companies and institutes for LLM safety, and AI governance aimed at LLM safety with discussions on international cooperation, policy proposals, and prospective regulatory directions. Our findings underscore the necessity for a proactive, multifaceted approach to LLM safety, emphasizing the integration of technical solutions, ethical considerations, and robust governance frameworks. This survey is intended to serve as a foundational resource for academic researchers, industry practitioners, and policymakers, offering insights into the challenges and opportunities associated with the safe integration of LLMs into society. Ultimately, it seeks to contribute to the safe and beneficial development of LLMs, aligning with the overarching goal of harnessing AI for societal advancement and well-being. A curated list of related papers is publicly available at https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers.
Other (outside lifecycle): Outside the standard AI system lifecycle
Governance Actor: Regulator, standards body, or oversight entity shaping AI policy
Govern: Policies, processes, and accountability structures for AI risk management