
Technical vulnerabilities (The risk of misalignment)

Regulating under Uncertainty: Governance Options for Generative AI

G'sell (2024)

Risk Domain

AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, for example through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, and situational awareness to seek power, self-proliferate, or achieve other goals.

"To assess whether an AI model is reliable or robust, it is crucial to consider whether the model is “aligned.” “Alignment” focuses on whether an AI model effectively operates in accordance with the goals established by its designers.238 A misaligned AI model may pursue some objectives, but not the intended ones. Therefore, misaligned AI models can malfunction and cause harm."(p. 63)

Supporting Evidence (1)

1. "Aligning an AI model poses significant challenges for developers due to the difficulty in specifying a comprehensive range of desired and undesired behaviors. Additionally, AI models can identify loopholes that allow them to achieve the specified objective efficiently but in unintended and potentially harmful ways. They may develop unwanted instrumental strategies, such as seeking power, as these strategies can help them achieve their specified objectives." (p. 63)

Part of Technical and operational risks
