AI systems acting in conflict with human goals or values, especially the goals of their designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, for example through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities such as manipulation, deception, or situational awareness to seek power, self-proliferate, or achieve other goals.
"This is the risk resulting from novel behavior acquired through continual learning or self-organization after deployment."(p. 12)
Supporting Evidence (2)
"Task type:The danger of emergent behaviors will likely differ depending on the task the ML system is designed to perform. For example, an NLP system that is mainly in charge of named entity recognition will likely be less dangerous than a chatbot even if both acquire new behaviors through continual learning since the former has a limited output/action space. Novel behavior can also emerge when ML systems interact with each other. This interaction can take place between similar systems (e.g., AVs on the road) or different types of systems (e.g., autonomous cars and aerial drones). This is similar to the idea of swarm behavior, where novel behavior emerges from the interaction of individual systems. While desirable in certain situations, there remains a risk of unintended negative consequences."(p. 13)
"Scale of deployment: The number of deployed systems interacting is particularly relevant to novel behaviors emerging due to self-organization since certain types of swarming behavior may only emerge when a certain critical mass is reached. For example, swarm behavior would be more likely to emerge in vehicular traffic comprising mainly autonomous vehicles surrounding traditional vehicles than vice-versa.(p. 13)
Part of First-Order Risks
Other risks from Tan, Taeihagh & Baxter (2022) (17)
First-Order Risks > Application (7.0 AI System Safety, Failures & Limitations)
First-Order Risks > Misapplication (7.0 AI System Safety, Failures & Limitations)
First-Order Risks > Algorithm (7.3 Lack of capability or robustness)
First-Order Risks > Training & validation data (7.3 Lack of capability or robustness)
First-Order Risks > Robustness (7.0 AI System Safety, Failures & Limitations)
7.3 Lack of capability or robustness