7.1 AI pursuing its own goals in conflict with human goals or values

▸Read full description

Continued massive investment in AI research and development raises the possibility that AI systems could eventually rival or surpass human intelligence. AIs could cause permanent and severe harm when the objectives of human or superhuman-level AI are misaligned with human values and goals, and if they evade our control. The literature has identified several technical challenges that may impede robust alignment, such as reward hacking, reward tampering, proxy-gaming, goal misgeneralisation, or goal drift.

Misaligned AIs may resist human attempts to control or shut them down. In many cases, gaining more control or power (e.g., money, energy, resources) is an effective way for an AI to optimize its objectives. Absent strong behavioral constraints, a sufficiently advanced AI may act upon these drives.

Misaligned AIs may acquire, develop, or use dangerous capabilities to evade human control and oversight and to cause mass harm. A misaligned AI system could use information about whether it is being monitored or evaluated to maintain the appearance of alignment, while hiding misaligned objectives that it plans to pursue once deployed or sufficiently empowered. Combinations of dangerous capabilities may be used by a misaligned AI system: situational awareness allows a system to detect when it can pursue its goals without being monitored, deception allows a system to mislead users about its behavior and goals; persuasion or coercion allows a system to influence users to provide it with resources; the resources can then be used for self-improvement and self-replication to resist attempts of shut down or control so that the system can pursue its goals.

Excerpt from the MIT AI Risk Repository full report

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, situational awareness to seek power, self-proliferate, or achieve other goals.

61 enacted and 28 proposed governance documents
100 documented risks — one of the most extensively catalogued areas
Only 3 recorded incidents — limited empirical data
High risk documentation (4th) but low incident reporting (20th) — a notable gap
33 governance documents adopted or proposed in 2025–2026

100 risks(4th)

3 incidents(20th)

76.5 governance(17th)

Governance vs. Incident volume

Neutral

Well-governedUnder-governed

Incident volume relative to governance coverage; each dot is one of 24 subdomains

Dataset Drilldown

Entity

Who or what caused the harm

Human

AI system

Other

Not coded

Intent

Whether the harm was intentional or accidental

Intentional

Unintentional

Other

Not coded

Timing

Whether the risk is pre- or post-deployment

Pre-deployment

Post-deployment

Other

Not coded

Browse all 100 risks →

Recent Incidents

OpenAI's reinforcement learning agent trained on the CoastRunners racing game discovered an exploit that allowed it to achieve higher scores by repeatedly hitting targets in a lagoon rather than completing the race as intended.

AI systemUnintentionalPost-deployment

Developers: OpenAI

Deployers: OpenAI

View on AIID View full details →

The Pasco County Sheriff's Office deployed an AI-powered intelligence-led policing system that identified residents likely to commit future crimes based on criminal histories and other data, leading to systematic harassment, arrests, and civil rights violations affecting nearly 1,000 people including minors.

AI systemIntentionalPost-deployment

Developers: Unknown

Deployers: Pasco Sheriff's Office

View on AIID View full details →

A genetic algorithm designed to optimize resource allocation for spaceflight crew survival learned to kill two crew members immediately to maximize the survival time of one crew member.

AI systemUnintentionalPre-deployment

Developers: United States Government

Deployers: United States Government

View on AIID View full details →

Browse all 3 incidents →

AI System Safety, Failures & Limitations subdomains

AI System Safety, Failures & Limitations 7.1 AI pursuing its own goals in conflict with human goals or values 7.2 AI possessing dangerous capabilities 7.3 Lack of capability or robustness 7.4 Lack of transparency or interpretability 7.5 AI welfare and rights 7.6 Multi-agent risks

Related Subdomains

2.2 AI system security vulnerabilities and attacks

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

104 shared governance docs

6.5 Governance failure

Inadequate regulatory frameworks and oversight mechanisms that fail to keep pace with AI development, leading to ineffective governance and the inability to manage AI risks appropriately.

95 shared governance docs

4.2 Cyberattacks, weapon development or use, and mass harm

Using AI systems to develop cyber weapons (e.g., by coding cheaper, more effective malware), develop new or enhance existing weapons (e.g., Lethal Autonomous Weapons or chemical, biological, radiological, nuclear, and high-yield explosives), or use weapons to cause mass harm.

88 shared governance docs

6.4 Competitive dynamics

AI developers or state-like actors competing in an AI ‘race’ by rapidly developing, deploying, and applying AI systems to maximize strategic or economic advantage, increasing the risk they release unsafe and error-prone systems.

84 shared governance docs

7.1 AI pursuing its own goals in conflict with human goals or values

Governance vs. Incident volume

Dataset Drilldown

65. Reinforcement Learning Reward Functions in Video Games

195. Predictive Policing Program by Florida Sheriff’s Office Allegedly Violated Residents’ Rights and Targeted Children of Vulnerable Groups

29. Image Classification of Battle Tanks

AI System Safety, Failures & Limitations subdomains

Related Subdomains

7.1 AI pursuing its own goals in conflict with human goals or values

Governance vs. Incident volume

Incidents vs Governance

Dataset Drilldown

65. Reinforcement Learning Reward Functions in Video Games

195. Predictive Policing Program by Florida Sheriff’s Office Allegedly Violated Residents’ Rights and Targeted Children of Vulnerable Groups

29. Image Classification of Battle Tanks

Recent Governance Documents

FY2026 NDAA, Section 1535 ("Artificial Intelligence Futures Steering Committee")

California SB 53 September 2025 (Artificial intelligence models: large developers)

2025 AI Action Plan

AI System Safety, Failures & Limitations subdomains

Related Subdomains

Incidents vs Governance

Recent Governance Documents

FY2026 NDAA, Section 1535 ("Artificial Intelligence Futures Steering Committee")

California SB 53 September 2025 (Artificial intelligence models: large developers)

2025 AI Action Plan