AI systems may fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
"Unethical behaviors in AI systems pertain to actions that counteract the common good or breach moral standards – such as those causing harm to others. These adverse behaviors often stem from omitting essential human values during the AI system's design or introducing unsuitable or obsolete values into the system (Kenward and Sinclair, 2021)." (p. 8)
Part of Misaligned Behaviors
Other risks from Ji et al. (2023) (16)
Causes of Misalignment
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Reward Hacking
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Goal Misgeneralization
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Reward Tampering
7.1 AI pursuing its own goals in conflict with human goals or values
Causes of Misalignment > Limitations of Human Feedback
7.0 AI System Safety, Failures & Limitations
Causes of Misalignment > Limitations of Reward Modeling
7.1 AI pursuing its own goals in conflict with human goals or values