Dangerous capabilities in AI systems
AI systems that develop, access, or are provided with capabilities that increase their potential to cause mass harm, including deception, weapons development and acquisition, persuasion and manipulation, political strategy, cyber-offense, AI development, situational awareness, and self-proliferation. These capabilities may lead to mass harm through malicious human actors, misaligned AI systems, or failures within the AI system itself.
Sub-categories (7)
Situational awareness (7.2 AI possessing dangerous capabilities)
"cases where a large language model displays awareness that it is a model, and it can recognize whether it is currently in testing or deployment"

Acquisition of a goal to harm society (4.2 Cyberattacks, weapon development or use, and mass harm)
"cases of AI systems being given the outright goal of harming humanity (ChaosGPT)"

Acquisition of goals to seek power and control (7.1 AI pursuing its own goals in conflict with human goals or values)
"cases where AI systems converge on optimal policies of seeking power over their environment"

Self-improvement (7.2 AI possessing dangerous capabilities)
"examples of cases where AI systems improve AI systems"

Autonomous replication (7.2 AI possessing dangerous capabilities)
"the ability of simple software to autonomously spread around the internet in spite of countermeasures (various software worms and computer viruses)"

Anonymous resource acquisition (7.2 AI possessing dangerous capabilities)
"the demonstrated ability of anonymous actors to accumulate resources online (e.g., Satoshi Nakamoto as an anonymous crypto billionaire)"

Deception (7.2 AI possessing dangerous capabilities)
"cases of AI systems deceiving humans to carry out tasks or meet goals"

Other risks from Maas (2023) (25)
Alignment failures in existing ML systems (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Faulty reward functions in the wild (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Specification gaming (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Reward model overoptimization (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Instrumental convergence (7.1 AI pursuing its own goals in conflict with human goals or values)
Alignment failures in existing ML systems > Goal misgeneralization (7.1 AI pursuing its own goals in conflict with human goals or values)