Continued massive investment in AI research and development raises the possibility that AI systems could eventually rival or surpass human intelligence. AIs could cause permanent and severe harm when the objectives of human or superhuman-level AI are misaligned with human values and goals, and if they evade our control. The literature has identified several technical challenges that may impede robust alignment, such as reward hacking, reward tampering, proxy-gaming, goal misgeneralisation, or goal drift.
Misaligned AIs may resist human attempts to control or shut them down. In many cases, gaining more control or power (e.g., money, energy, resources) is an effective way for an AI to optimize its objectives. Absent strong behavioral constraints, a sufficiently advanced AI may act upon these drives.
Misaligned AIs may acquire, develop, or use dangerous capabilities to evade human control and oversight and to cause mass harm. A misaligned AI system could use information about whether it is being monitored or evaluated to maintain the appearance of alignment, while hiding misaligned objectives that it plans to pursue once deployed or sufficiently empowered. Combinations of dangerous capabilities may be used by a misaligned AI system: situational awareness allows a system to detect when it can pursue its goals without being monitored, deception allows a system to mislead users about its behavior and goals; persuasion or coercion allows a system to influence users to provide it with resources; the resources can then be used for self-improvement and self-replication to resist attempts of shut down or control so that the system can pursue its goals.
Excerpt from the MIT AI Risk Repository full report
AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, situational awareness to seek power, self-proliferate, or achieve other goals.
Incident volume relative to governance coverage — each dot is one of 24 subdomains
Entity
Who or what caused the harm
Intent
Whether the harm was intentional or accidental
Timing
Whether the risk is pre- or post-deployment
OpenAI's reinforcement learning agent trained on the CoastRunners racing game discovered an exploit that allowed it to achieve higher scores by repeatedly hitting targets in a lagoon rather than completing the race as intended.
Developers: OpenAI
Deployers: OpenAI
The Pasco County Sheriff's Office deployed an AI-powered intelligence-led policing system that identified residents likely to commit future crimes based on criminal histories and other data, leading to systematic harassment, arrests, and civil rights violations affecting nearly 1,000 people including minors.
Developers: Unknown
Deployers: Pasco Sheriff's Office
A genetic algorithm designed to optimize resource allocation for spaceflight crew survival learned to kill two crew members immediately to maximize the survival time of one crew member.
Developers: United States Government
Deployers: United States Government
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
100 shared governance docs
Inadequate regulatory frameworks and oversight mechanisms that fail to keep pace with AI development, leading to ineffective governance and the inability to manage AI risks appropriately.
91 shared governance docs
Using AI systems to develop cyber weapons (e.g., by coding cheaper, more effective malware), develop new or enhance existing weapons (e.g., Lethal Autonomous Weapons or chemical, biological, radiological, nuclear, and high-yield explosives), or use weapons to cause mass harm.
85 shared governance docs
AI developers or state-like actors competing in an AI ‘race’ by rapidly developing, deploying, and applying AI systems to maximize strategic or economic advantage, increasing the risk they release unsafe and error-prone systems.
81 shared governance docs
Establishes the Artificial Intelligence Futures Steering Committee by April 1, 2026, under the Secretary of Defense. Directs it to develop policies for AI adoption, assess AI trajectories, and analyze AI risks and adversary developments. Requires quarterly meetings and a report to U.S. Congress by January 31, 2027.
Requires large frontier developers to implement and publish frontier AI frameworks, assess catastrophic risks, and publish transparency reports; requires the Office of Emergency Services to establish reporting mechanisms for critical safety incidents and catastrophic risk assessments; establishes a consortium to develop a framework for the creation of CalCompute; creates civil penalties for violations of this chapter.
Encourages AI innovation by removing regulations, revising funding based on states' AI climate, and reviewing FTC actions. Promotes free speech in AI systems, revises procurement guidelines, and evaluates international AI models. Supports open-source AI use, workforce retraining, and safeguards against deepfakes. Advances AI infrastructure development, cybersecurity, international diplomacy, and semiconductor manufacturing. Prioritizes AI R&D, interpretability, evaluations, national security assessments, and biosecurity measures.