Non-binding diplomatic coordination, soft law agreements, and intergovernmental cooperation without treaty-level enforcement.
Amid the rapid progress and worldwide deployment of AI technology, international governance of AI is high on the agenda (Summit, 2023). Key discussions concern the need to establish a global framework for AI governance and the means of ensuring its normativity and legitimacy (Erman and Furendal, 2022), among other significant issues, and these themes are being examined in increasing detail and complexity. Moreover, as United Nations Secretary-General António Guterres stated during a Security Council meeting in July, generative AI has vast potential for both positive and negative impacts at scale, and failing to act to mitigate AI risks would be a grave neglect of our duty to safeguard the well-being of current and future generations (Guterres, 2023); international governance thus also has intergenerational influence. Hence, in this section we examine the significance and viability of international AI governance from three aspects: managing global catastrophic AI risks, managing opportunities in AI, and historical and present efforts, with both generational and intergenerational perspectives. We aim to contribute fresh thinking on the prospective structure of international AI governance.
Reasoning
The mitigation name lacks a definition and supporting evidence; no focal activity or mechanism can be identified.
RL/PbRL/IRL/Imitation Learning
RLHF
RLHF extends PbRL to the domain of DRL (Christiano et al., 2017), aiming to align complex AI systems more closely with human preferences (OpenAI, 2023b). Its principal advantage is that it capitalizes on the fact that humans are better at judging appropriate behavior than at giving demonstrations or manually specifying rewards. This approach has gained significant traction, particularly for fine-tuning LLMs (Ouyang et al., 2022; OpenAI, 2023a; Touvron et al., 2023).
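The first stage of this pipeline, fitting a reward model from pairwise human preference judgments, can be sketched as follows. This is a minimal illustration, not any published implementation: the reward is linear over feature vectors, the data are synthetic, and the loss is the standard Bradley-Terry pairwise objective, -log sigma(r(chosen) - r(rejected)).

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_reward_model(chosen, rejected, lr=0.1, steps=500):
    """Gradient descent on the pairwise preference (Bradley-Terry) loss
    for a linear reward r(x) = w . x. Illustrative only."""
    w = np.zeros(chosen.shape[1])
    for _ in range(steps):
        margin = (chosen - rejected) @ w           # r(chosen) - r(rejected)
        p = 1.0 / (1.0 + np.exp(-margin))          # P(chosen preferred)
        grad = -((1.0 - p)[:, None] * (chosen - rejected)).mean(axis=0)
        w -= lr * grad
    return w

# Synthetic preferences: a hidden "true" reward generates the labels,
# standing in for a human rater.
true_w = np.array([1.0, -2.0, 0.5])
a = rng.normal(size=(200, 3))
b = rng.normal(size=(200, 3))
prefer_a = (a @ true_w) > (b @ true_w)
chosen = np.where(prefer_a[:, None], a, b)
rejected = np.where(prefer_a[:, None], b, a)

w = fit_reward_model(chosen, rejected)
# The learned reward should rank the pairs the way the hidden reward does.
acc = np.mean((chosen @ w) > (rejected @ w))
```

In a full RLHF pipeline this learned reward would then drive a policy-optimization step (e.g., PPO); the sketch stops at the reward model because that is the part specific to preference feedback.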
RLxF
Building on the RLHF paradigm, we introduce RLxF as a fundamental framework for scalable oversight, aiming to improve the efficiency and quality of feedback and to extend human feedback to more complex tasks. RLxF enhances RLHF by incorporating AI components (Fernandes et al., 2023); the x in RLxF signifies a blend of AI and human feedback.
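One way the human/AI blend can be organized is to route each preference query to an AI labeler when it is confident and to a human otherwise. The sketch below is an assumption-laden toy: the labelers, the confidence heuristic, and the 0.9 threshold are all invented for illustration, not part of any published RLxF specification.

```python
def ai_labeler(pair):
    """Toy AI feedback: prefer the longer answer, with confidence
    proportional to the length gap. Purely illustrative."""
    a, b = pair
    gap = abs(len(a) - len(b))
    pref = 0 if len(a) >= len(b) else 1
    confidence = min(1.0, gap / 5.0)
    return pref, confidence

def human_labeler(pair):
    """Stand-in for a human rater (here: a lexicographic tie-break)."""
    a, b = pair
    return 0 if a <= b else 1

def collect_feedback(pairs, threshold=0.9):
    """Return one (label, source) tuple per pair, deferring to the
    human whenever the AI labeler's confidence is below threshold."""
    labels = []
    for pair in pairs:
        pref, conf = ai_labeler(pair)
        if conf >= threshold:
            labels.append((pref, "ai"))
        else:
            labels.append((human_labeler(pair), "human"))
    return labels

pairs = [("a long answer here", "ok"), ("hi", "hey"), ("yes", "nope!")]
feedback = collect_feedback(pairs)
```

The design point is that cheap AI feedback covers the easy cases, reserving scarce human attention for ambiguous ones, which is one route to the feedback efficiency the paragraph above describes.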
Iterated Distillation and Amplification (IDA)
Iterated Distillation and Amplification (IDA) introduces a framework for constructing scalable oversight through iterative collaboration between humans and AIs (Christiano et al., 2018). The process begins with an initial agent, denoted A[0], which mirrors the decision-making of a human, H. A[0] is trained using a powerful technique that equips it with near-human-level proficiency (the distillation step); then, collaborative interaction between H and multiple instances of A[0] produces an enhanced agent, A[1] (the amplification step).
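The amplify/distill loop can be made concrete on a toy task where the "human" can only add two numbers at a time, yet the amplified system sums whole lists by decomposition. Everything here is a stand-in chosen for this sketch: distillation is modeled as memoization rather than supervised training, and A[0] is deliberately weak.

```python
def human_add(x, y):
    return x + y  # the only primitive operation H can perform

def amplify(agent, xs):
    """H solves a task by splitting it and delegating the halves to
    copies of `agent`, then combining the answers itself."""
    if len(xs) <= 1:
        return xs[0] if xs else 0
    mid = len(xs) // 2
    return human_add(agent(xs[:mid]), agent(xs[mid:]))

def distill(amplified):
    """Stand-in for training A[t+1] to imitate the amplified system:
    here we simply cache the amplified system's answers."""
    cache = {}
    def agent(xs):
        key = tuple(xs)
        if key not in cache:
            cache[key] = amplified(xs)
        return cache[key]
    return agent

# A[0]: a weak initial agent that is only correct on trivial inputs.
def a0(xs):
    return xs[0] if len(xs) == 1 else 0

agent = a0
for _ in range(3):  # each round: amplify, then distill
    agent = distill(lambda xs, a=agent: amplify(a, xs))

result = agent([1, 2, 3, 4, 5])
```

Each round doubles the input size the agent handles correctly (A[1] up to length 2, A[2] up to 4, A[3] up to 8), which is the sense in which iterating amplification scales capability beyond what H or A[0] could do alone.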
Recursive Reward Modeling (RRM)
Recursive Reward Modeling (RRM) (Leike et al., 2018) seeks to extend reward modeling to much more intricate tasks. The central insight of RRM is the recursive use of the already trained agent A[t-1], performing reward learning on an amplified version of itself, to provide feedback for training the successive agent A[t] on a more complex task; the base agent A[0] is trained via fundamental reward modeling, learned from pure human feedback. Feedback in this approach thus comes not only from humans but also from the model's own assessments of what constitutes a rewarding outcome.
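The recursion can be illustrated on a toy evaluation task: suppose the "human" can only judge whether a flat list is sorted, but feedback is needed on nested structures. The evaluator for depth t leans on the evaluator built at depth t-1 for the nested parts. All names and the task itself are inventions for this sketch, not from the RRM paper.

```python
def human_judge(flat):
    """Feedback a human can give directly: is a flat list sorted?"""
    return all(a <= b for a, b in zip(flat, flat[1:]))

def make_evaluator(prev_evaluator):
    """Build A[t]: judge a deeper structure with help from A[t-1],
    which handles the nested sub-lists the human cannot assess."""
    def evaluator(xs):
        if all(not isinstance(x, list) for x in xs):
            return human_judge(xs)          # base case: human alone
        return all(prev_evaluator(x) for x in xs if isinstance(x, list))
    return evaluator

# A[0] is learned from pure human feedback (modeled here as equal to it).
evaluators = [human_judge]
for t in range(1, 3):
    evaluators.append(make_evaluator(evaluators[t - 1]))

ok = evaluators[2]([[1, 2], [3, 5], [0, 9]])
bad = evaluators[2]([[2, 1], [3, 5]])
```

The toy deliberately ignores ordering between sub-lists; its only purpose is to show the structural idea that each generation of evaluator supplies the feedback signal for judging a strictly harder class of tasks.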
Debate
Debate involves two agents presenting answers and statements to assist human judges in their decision-making (Irving et al., 2018), as delineated in Algorithm 3. This is a zero-sum debate game where agents try to identify each other’s shortcomings while striving to gain higher trust from human judges, and it can be a potential approach to constructing scalable oversight.
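The zero-sum structure can be sketched with a toy question, "what is the largest element of a hidden list?", where each debater commits to an answer and reveals one element as verifiable evidence, and the judge trusts only claims consistent with what was revealed. This is a simplification invented for illustration, not a faithful reimplementation of the protocol in Irving et al. (2018).

```python
def honest_debater(hidden):
    claim = max(hidden)
    return claim, hidden.index(claim)   # answer plus supporting evidence

def dishonest_debater(hidden):
    claim = max(hidden) + 1             # overclaims; has no real evidence
    idx = min(range(len(hidden)), key=lambda i: abs(hidden[i] - claim))
    return claim, idx

def judge(hidden, debaters):
    """Award the win to the claim consistent with all revealed evidence:
    a claimed maximum must actually appear among the revealed values."""
    revealed = {}
    claims = []
    for debater in debaters:
        claim, idx = debater(hidden)
        revealed[idx] = hidden[idx]     # evidence is cheap to verify
        claims.append(claim)
    consistent = [c for c in claims if c in revealed.values()]
    return consistent[0] if consistent else None

hidden = [3, 7, 2, 5]
winner_claim = judge(hidden, [honest_debater, dishonest_debater])
```

The hoped-for property, which this toy exhibits, is that lying is harder than refuting a lie: the dishonest debater's inflated claim cannot be backed by any revealable element, so the honest answer wins the judge's trust.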
AI Alignment: A Comprehensive Survey
Ji, Jiaming; Qiu, Tianyi; Chen, Boyuan; Zhang, Borong; Lou, Hantao; Wang, Kaile; Duan, Yawen; He, Zhonghao; Vierling, Lukas; Hong, Donghai; Zhou, Jiayi; Zhang, Zhaowei; Zeng, Fanzhi; Dai, Juntao; Pan, Xuehai; Ng, Kwan Yee; O'Gara, Aidan; Xu, Hua; Tse, Brian; Fu, Jie; McAleer, Stephen; Yang, Yaodong; Wang, Yizhou; Zhu, Song-Chun; Guo, Yike; Gao, Wen (2023)
Natural Language Generation (NLG) has improved dramatically in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has made NLG more fluent and coherent, improving downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, it is also apparent that deep learning-based generation is prone to hallucinating unintended text, which degrades system performance and fails to meet user expectations in many real-world scenarios. This survey provides a broad overview of the research progress and challenges of the hallucination problem in NLG.
Other (outside lifecycle)
Outside the standard AI system lifecycle
Governance Actor
Regulator, standards body, or oversight entity shaping AI policy
Govern
Policies, processes, and accountability structures for AI risk management