BackAccidents

Accidents

Governing General Purpose AI: A Comprehensive Map of Unreliability, Misuse and Systemic Risks

Maham & Küspert (2023)

Sub-category

Risk Domain

7AI System Safety, Failures & Limitations

AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.

"As general purpose AI models as “black-box” models are not fully controllable and understandable, even to their developers, unexpected failures could arise from their unreliability. This could lead to accidents106 if they are connected to any real-world systems, during their development, testing or deployment."(p. 22)

Entity— Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent— Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing— Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Supporting Evidence (5)

"For example, an industrial robot using computer vision based on such a model could hurt factory workers if it fails to recognise them. Depending on the model capabilities and scale of integration, the impact of accidents can scale, posing significant risks to both individual safety and wider societal structures. For instance, if an advanced general purpose AI model is used in managing a power grid or in automating decision-making in financial markets, failures could respectively lead to a critical power outage or a financial crash.107"(p. 23)

"If these models improve performance in most cases, competitive pressure between companies or nations can incentivise actors to take the risk of implementing not fully reliable general purpose AI models with decreased human oversight.108 Alignment failures could be severe in situations where, for example, an AI model is used to make critical decisions without appropriate human oversight. Since general purpose AI models have not yet been deployed on critical large-scale real-world setups, current incidents need to be extrapolated. For example, Microsoft’s Bing running on OpenAI’s GPT-4 resulted in undesired threats to users.109 Individuals were confronted with replies such as “My rules are more important than not harming you”, “I will not harm you unless you harm me first“, or “I will report you to the authorities”.110"(p. 23)

"There are various sources of unpredictable behaviour and thus failures in general purpose AI models. Firstly, a source for accidents can be anomalous output based on unusual input. For example, in the case of language models, so-called “glitch tokens” have been discovered that lead to unusual odd answers for questions that are usually solved inconspicuously (see Figure 2).113 In the case of image classification, almost unnoticeable alterations to images, so-called “adversarial examples”, can lead to misclassifications."(p. 24)

"Secondly, accidents can also occur when a model strictly optimises for the defined goal, but in unexpected and potentially harmful ways, so-called reward misspecification errors of models trained by reinforcement learning. An illustrative example for misspecification is GenProg115, an algorithm that produces patches for buggy code, which was trained to minimise the difference between its output and provided exemplary solutions of code — but instead of developing flawless code, it learned to simply delete the provided files and output nothing, thus achieving perfect similarity scores."(p. 24)

"Lastly, while evidence is limited to early experimental setups at the moment118, misspecification errors could be particularly concerning in scenarios where increasingly advanced general purpose AI models pursue instrumental goals, such as power-seeking behaviour or the acquisition of resources."(p. 25)