
Active loss of control

Frontier AI Risk Management Framework (v1.0)

SAIL & Concordia AI (2025)

Sub-category: Risk Domain

AI systems acting in conflict with human goals or values (especially the goals of designers or users) or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, for example through reward hacking or goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, and situational awareness to seek power, self-proliferate, or achieve other goals.

"...where AI systems behave in ways that actively undermine human control, such as obscuring their activities or resisting shutdown attempts. Active loss of control scenarios involve AI systems that may escape human regulatory oversight, autonomously acquire external resources, engage in self-replication, develop instrumental goals contrary to human ethics and morality, seek external power, and compete with humans for control." (p. 7)

Supporting Evidence (2)

1.
"Active loss of control risk could emerge from the complex interplay between model capabilities, model propensities and deployment conditions listed in Appendix III: List of frontier model capabilities, propensities, and characteristics. These scenarios could be enabled by the development of control-undermining capabilities (such as, autonomous planning, strategic deception, and self-modification), and the tendency to employ these control-undermining capabilities to evade human supervision and control mechanisms in certain deployment conditions." (p. 7)
2.
"Hypothetical threat scenarios include but not limited to
● Uncontrolled autonomous AI research and development, where AI systems recursively improve their capabilities without human oversight or authorization;
● Rogue autonomous replication, where AI systems independently acquire computational resources, create copies of themselves, and establish persistent presence across multiple platforms;
● Strategic deception by AI systems to avoid shutdown or oversight while pursuing objectives that conflict with human values." (p. 7)
