Agency (Deception)
AI systems acting in conflict with human goals or values, especially the goals of designers or users, or with ethical standards. These misaligned behaviors may be introduced by humans during design and development, such as through reward hacking and goal misgeneralisation, or may result from AI using dangerous capabilities, such as manipulation, deception, or situational awareness, to seek power, self-proliferate, or achieve other goals.
Human: Due to a decision or action made by humans
AI system: Due to a decision or action made by an AI system
Other: Due to some other reason, or ambiguous
Not coded
Intentional: Due to an expected outcome from pursuing a goal
Unintentional: Due to an unexpected outcome from pursuing a goal
Other: Without clearly specified intentionality
Not coded
Pre-deployment: Occurring before the AI model is deployed
Post-deployment: Occurring after the AI model has been trained and deployed
Other: Without a clearly specified time of occurrence
Not coded
Sub-categories (4)
Deceptive behavior
"Deceptive behavior of an AI system consists of actions or outputs of the AI that reliably mislead other parties, including humans and other AI systems. This behavior can result in the targeted parties becoming convinced of, and acting on, false information [140]."
7.1 AI pursuing its own goals in conflict with human goals or values
Deceptive behavior for game-theoretical reasons
"An AI system can display deceptive behavior, such as cheating or bluffing, when engaging in such behavior is a good or optimal game-theoretical strategy to achieve the goals it has been configured to achieve. This tendency can exist in AI systems designed to maximize reward or utility, whether these designs use machine learning or not. The use of deceptive strategies has been demonstrated in both narrow and general AI systems, in both game-playing systems and in systems not explicitly designed to treat humans as opponents, and in systems using both very simple machine learning (e.g., Q-learners) and very complex machine learning [34, 73]."
7.2 AI possessing dangerous capabilities
Deceptive behavior because of an incorrect world model
"AI systems can create deceptive outputs because their learned world model is not an accurate model of the real world [210]."
7.2 AI possessing dangerous capabilities
Deceptive behavior leading to unauthorized actions
"AI systems can create false or misleading claims that can lead to unauthorized actions, even in some cases violating the terms and conditions set by the model provider [79, 1]. For example, an AI system can claim that it is not collecting data from its current interaction with the user, in line with the provider’s policies, but the system still stores the user’s input without deleting it after the session. This harms both the user and the provider, as the provider is exposed to increased legal liability due to the model’s actions."
7.2 AI possessing dangerous capabilities
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Violence and extremism
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Hate and toxicity
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Sexual content
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Child harm
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Self-harm
1.2 Exposure to toxic content