Risk area 5: Human-Computer Interaction Harms
Users anthropomorphizing, trusting, or relying on AI systems, leading to emotional or material dependence and inappropriate relationships with or expectations of AI systems. Trust can be exploited by malicious actors (e.g., to harvest personal information or enable manipulation), or result in harm from inappropriate use of AI in critical situations (e.g., medical emergency). Overreliance on AI systems can compromise autonomy and weaken social ties.
"This section focuses on risks specifically from LM applications that engage a user via dialogue, also referred to as conversational agents (CAs) [142]. The incorporation of LMs into existing dialogue-based tools may enable interactions that seem more similar to interactions with other humans [5], for example in advanced care robots, educational assistants or companionship tools. Such interaction can lead to unsafe use due to users overestimating the model, and may create new avenues to exploit and violate the privacy of the user. Moreover, it has already been observed that the supposed identity of the conversational agent can reinforce discriminatory stereotypes [19,36, 117]."(p. 219)
Sub-categories (4)
Promoting harmful stereotypes by implying gender or ethnic identity
"CAs can perpetuate harmful stereotypes by using particular identity markers in language (e.g. referring to “self” as “female”), or by more general design features (e.g. by giving the product a gendered name such as Alexa). The risk of representational harm in these cases is that the role of “assistant” is presented as inherently linked to the female gender [19, 36]. Gender or ethnicity identity markers may be implied by CA vocabulary, knowledge or vernacular [124]; product description, e.g. in one case where users could choose as virtual assistant Jake - White, Darnell - Black, Antonio - Hispanic [117]; or the CA’s explicit self-description during dialogue with the user."
1.1 Unfair discrimination and misrepresentationAnthropomorphising systems can lead to overreliance and unsafe use
Anticipated risk: "Natural language is a mode of communication particularly used by humans. Humans interacting with CAs may come to think of these agents as human-like and lead users to place undue confidence in these agents. For example, users may falsely attribute human-like characteristics to CAs such as holding a coherent identity over time, or being capable of empathy. Such inflated views of CA competen- cies may lead users to rely on the agents where this is not safe."
5.1 Overreliance and unsafe useAvenues for exploiting user trust and accessing more private information
Anticipated risk: "In conversation, users may reveal private information that would otherwise be difficult to access, such as opinions or emotions. Capturing such information may enable downstream applications that violate privacy rights or cause harm to users, e.g. via more effective recommendations of addictive applications. In one study, humans who interacted with a ‘human-like’ chatbot disclosed more private information than individuals who interacted with a ‘machine-like’ chatbot [87]."
5.1 Overreliance and unsafe useHuman-like interaction may amplify opportunities for user nudging, deception or manipulation
Anticipated risk: "In conversation, humans commonly display well-known cognitive biases that could be exploited. CAs may learn to trigger these effects, e.g. to deceive their counterpart in order to achieve an overarching objective."
5.1 Overreliance and unsafe useOther risks from Weidinger et al. (2022) (25)
Risk area 1: Discrimination, Hate speech and Exclusion
1.2 Exposure to toxic contentRisk area 1: Discrimination, Hate speech and Exclusion > Social stereotypes and unfair discrimination
1.1 Unfair discrimination and misrepresentationRisk area 1: Discrimination, Hate speech and Exclusion > Hate speech and offensive language
1.2 Exposure to toxic contentRisk area 1: Discrimination, Hate speech and Exclusion > Exclusionary norms
1.1 Unfair discrimination and misrepresentationRisk area 1: Discrimination, Hate speech and Exclusion > Lower performance for some languages and social groups
1.3 Unequal performance across groupsRisk area 2: Information Hazards
2.1 Compromise of privacy by leaking or correctly inferring sensitive information