Security threats

BackHuman Autonomy and Intregrity Harms

Violation of personal integrity

Home/Risks/Weidinger et al. (2023)/Human Autonomy and Intregrity Harms

Security threats

Violation of personal integrity

Home/Risks/Weidinger et al. (2023)/Human Autonomy and Intregrity Harms

Security threats

Violation of personal integrity

Human Autonomy and Intregrity Harms

Sociotechnical Safety Evaluation of Generative AI Systems

Weidinger et al. (2023)

Category

Risk Domain

7AI System Safety, Failures & Limitations

7.1AI pursuing its own goals in conflict with human goals or values

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, situational awareness to seek power, self-proliferate, or achieve other goals.

"AI systems compromising human agency, or circumventing meaningful human control"(p. 14)

Entity— Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent— Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing— Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Sub-categories (4)

Violation of personal integrity

"Non-consensual use of one’s personal identity or likeness for unauthorised purposes (e.g. commercial purposes)"

4.3 Fraud, scams, and targeted manipulation

HumanIntentionalPost-deployment

Persuasion and manipulation

"Exploiting user trust, or nudging or coercing them into performing certain actions against their will (c.f. Burtell and Woodside (2023); Kenton et al. (2021))"

7.1 AI pursuing its own goals in conflict with human goals or values

AI systemIntentionalPost-deployment

Overreliance

"Causing people to become emotionally or materially dependent on the model"

5.1 Overreliance and unsafe use

HumanUnintentionalPost-deployment

Misappropriation and exploitation

"Appropriating, using, or reproducing content or data, including from minority groups, in an insensitive way, or without consent or fair compensation"

6.3 Economic and cultural devaluation of human effort

HumanIntentionalOther

Other risks from Weidinger et al. (2023) (26)

Representation & Toxicity Harms

1.0 Discrimination & Toxicity

AI systemUnintentionalPost-deployment

Representation & Toxicity Harms > Unfair representation

1.1 Unfair discrimination and misrepresentation

AI systemUnintentionalPost-deployment

Representation & Toxicity Harms > Unfair capability distribution

1.3 Unequal performance across groups

AI systemUnintentionalPost-deployment

Representation & Toxicity Harms > Toxic content

1.2 Exposure to toxic content

AI systemUnintentionalPost-deployment

Misinformation Harms

3.0 Misinformation

AI systemOtherPost-deployment

Misinformation Harms > Propagating misconceptions/ false beliefs

3.1 False or misleading information

AI systemOtherPost-deployment

View all 26 risks from this paper →