Collectively Harmful Behaviors

Violation of Ethics

AI Alignment: A Comprehensive Survey

Ji et al. (2023)

Sub-category

Risk Domain

7 AI System Safety, Failures & Limitations

7.3 Lack of capability or robustness

AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.

"Unethical behaviors in AI systems pertain to actions that counteract the common good or breach moral standards – such as those causing harm to others. These adverse behaviors often stem from omitting essential human values during the AI system's design or introducing unsuitable or obsolete values into the system (Kenward and Sinclair, 2021)." (p. 8)

Entity: Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent: Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing: Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Part of Misaligned Behaviors

Other risks from Ji et al. (2023) (16)

Causes of Misalignment

7.1 AI pursuing its own goals in conflict with human goals or values

Entity: Other | Intent: Other | Timing: Pre-deployment

Causes of Misalignment > Reward Hacking

7.1 AI pursuing its own goals in conflict with human goals or values

Entity: AI system | Intent: Intentional | Timing: Pre-deployment

Causes of Misalignment > Goal Misgeneralization

7.1 AI pursuing its own goals in conflict with human goals or values

Entity: AI system | Intent: Intentional | Timing: Pre-deployment

Causes of Misalignment > Reward Tampering

7.1 AI pursuing its own goals in conflict with human goals or values

Entity: AI system | Intent: Intentional | Timing: Pre-deployment

Causes of Misalignment > Limitations of Human Feedback

7.0 AI System Safety, Failures & Limitations

Entity: Human | Intent: Unintentional | Timing: Pre-deployment

Causes of Misalignment > Limitations of Reward Modeling

7.1 AI pursuing its own goals in conflict with human goals or values

Entity: Other | Intent: Unintentional | Timing: Pre-deployment
