Control
Sub-category
Risk Domain
AI systems acting in conflict with human goals or values, especially the goals of designers or users, or ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.
"This is the difficulty of controlling the ML system" (p. 12)
Entity: Who or what caused the harm
Intent: Whether the harm was intentional or accidental
Timing: Whether the risk is pre- or post-deployment
Supporting Evidence (2)
1. "Level of autonomy: ML systems are often designed with different levels of autonomy in mind: human-in-the-loop (human execution), human-on-the-loop (human supervision), and full autonomy. Fully autonomous systems may be more difficult to regain control of, in the event of a malfunction; however, it may be simpler to program contingency measures since system developers may assume that the system always bears full responsibility. On the other hand, although a human-supervised system is designed to make intervention easier, the dynamics of human-machine interactions may increase the difficulty of determining responsibility as a situation unfolds. While human oversight is theoretically desirable, the above paradox indicates that a human-on-the-loop design could increase control risk if the additional complexity is not accounted for." (p. 12)
2. "Manual overrides: In human-on-the-loop and fully autonomous systems, the ability to rapidly intervene and either take manual control of or shut down the system is crucial to mitigating the harms that result from misprediction. One factor that significantly impacts this ability is the latency of the connection to the ML system (remote vs. on-site intervention). This is particularly important in applications that may cause acute physical or psychological injuries, such as autonomous weapons/vehicles and social media bots with a wide reach. Other factors include the ease with which the human supervisor can identify situations requiring intervention and the ease of transitioning from an observer to actor. These are often tightly connected to the design choices made with regard to the non-ML components of the system. For example, appropriate explainability/interpretability functionality may help the human supervisor identify failures (e.g., when the system's actions and explanations do not align). For high-stakes applications, human supervisors will need to be sufficiently trained (and potentially certified) to react appropriately when they need to assume control." (p. 12)
Part of: First-Order Risks
Other risks from Tan, Taeihagh & Baxter (2022) (17)
Risk | Risk Domain | Entity | Intent | Timing
First-Order Risks | 7.0 AI System Safety, Failures & Limitations | Other | Other | Other
First-Order Risks > Application | 7.0 AI System Safety, Failures & Limitations | Human | Intentional | Post-deployment
First-Order Risks > Misapplication | 7.3 Lack of capability or robustness | Human | Intentional | Post-deployment
First-Order Risks > Algorithm | 7.3 Lack of capability or robustness | AI system | Unintentional | Pre-deployment
First-Order Risks > Training & validation data | 7.0 AI System Safety, Failures & Limitations | Human | Other | Pre-deployment
First-Order Risks > Robustness | 7.3 Lack of capability or robustness | AI system | Unintentional | Post-deployment