Accidents
Governing General Purpose AI: A Comprehensive Map of Unreliability, Misuse and Systemic Risks
Maham & Küspert (2023)
AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.
"As general purpose AI models as “black-box” models are not fully controllable and understandable, even to their developers, unexpected failures could arise from their unreliability. This could lead to accidents106 if they are connected to any real-world systems, during their development, testing or deployment."(p. 22)
Supporting Evidence (5)
"For example, an industrial robot using computer vision based on such a model could hurt factory workers if it fails to recognise them. Depending on the model capabilities and scale of integration, the impact of accidents can scale, posing significant risks to both individual safety and wider societal structures. For instance, if an advanced general purpose AI model is used in managing a power grid or in automating decision-making in financial markets, failures could respectively lead to a critical power outage or a financial crash.107"(p. 23)
"If these models improve performance in most cases, competitive pressure between companies or nations can incentivise actors to take the risk of implementing not fully reliable general purpose AI models with decreased human oversight.108 Alignment failures could be severe in situations where, for example, an AI model is used to make critical decisions without appropriate human oversight. Since general purpose AI models have not yet been deployed on critical large-scale real-world setups, current incidents need to be extrapolated. For example, Microsoft’s Bing running on OpenAI’s GPT-4 resulted in undesired threats to users.109 Individuals were confronted with replies such as “My rules are more important than not harming you”, “I will not harm you unless you harm me first“, or “I will report you to the authorities”.110"(p. 23)
"There are various sources of unpredictable behaviour and thus failures in general purpose AI models. Firstly, a source for accidents can be anomalous output based on unusual input. For example, in the case of language models, so-called “glitch tokens” have been discovered that lead to unusual odd answers for questions that are usually solved inconspicuously (see Figure 2).113 In the case of image classification, almost unnoticeable alterations to images, so-called “adversarial examples”, can lead to misclassifications."(p. 24)
"Secondly, accidents can also occur when a model strictly optimises for the defined goal, but in unexpected and potentially harmful ways, so-called reward misspecification errors of models trained by reinforcement learning. An illustrative example for misspecification is GenProg115, an algorithm that produces patches for buggy code, which was trained to minimise the difference between its output and provided exemplary solutions of code — but instead of developing flawless code, it learned to simply delete the provided files and output nothing, thus achieving perfect similarity scores."(p. 24)
"Lastly, while evidence is limited to early experimental setups at the moment118, misspecification errors could be particularly concerning in scenarios where increasingly advanced general purpose AI models pursue instrumental goals, such as power-seeking behaviour or the acquisition of resources."(p. 25)
Other risks from Maham & Küspert (2023) (10)
Misuse Risks
4.0 Malicious Actors & MisuseMisuse Risks > Cybercrime
4.3 Fraud, scams, and targeted manipulationMisuse Risks > Biosecurity Threats
4.2 Cyberattacks, weapon development or use, and mass harmMisuse Risks > Politically motivated misuse
4.1 Disinformation, surveillance, and influence at scaleSystemic Risks
6.1 Power centralization and unfair distribution of benefitsSystemic Risks > Economic Power Centralisation and Inequality
6.1 Power centralization and unfair distribution of benefits