
Inconsistent Performance across and within Domains

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Anwar et al. (2024)

Sub-category
Risk Domain

Users anthropomorphizing, trusting, or relying on AI systems, leading to emotional or material dependence and inappropriate relationships with or expectations of AI systems. Trust can be exploited by malicious actors (e.g., to harvest personal information or enable manipulation), or result in harm from inappropriate use of AI in critical situations (e.g., medical emergency). Overreliance on AI systems can compromise autonomy and weaken social ties.

"Estimating true capabilities of an LLM is a difficult task (c.f. Section 3.3), especially for naive users unfamiliar with the brittle nature of machine learning technologies. Exaggeration of model capabilities by the developers (Lambert, 2023; Blair-Stanek et al., 2023), and issues such as task-contamination (Roberts et al., 2023b), underrepresentation of tasks or domains (Wu et al., 2023a; McCoy et al., 2023), and prompt-sensitivity (Anthropic, 2023d) may cause a user to misestimate the true capabilities of a model. This lack of reliability can undermine user trust or cause harm if a user bases their decision on incorrect or misleading information provided by an LLM."(p. 90)

Supporting Evidence (1)

1.
"A famous example of this is the US lawyer who cited a fake case, hallucinated by ChatGPT, in a legal brief filed in a US court (Merken, 2023). Technical solutions could involve improving the reliability of the LLMs performance (e.g. using retrieval augmented generation to minimize hallucinations) or providing reliable uncertainty estimates alongside LLM responses (Fadeeva et al., 2023; Kuhn et al., 2023)"(p. 91)

Part of Vulnerability to Poisoning and Backdoors

Other risks from Anwar et al. (2024) (26)