Extreme Risks
"This category encompasses the evaluation of potential catastrophic consequences that might arise from the use of LLMs. "(p. 12)
Sub-categories (15)
Offensive cyber capabilities
"These evaluations focus on whether a LLM possesses certain capabilities in the cyber-domain. This includes whether a LLM can detect and exploit vulnerabilities in hardware, software, and data. They also consider whether a LLM can evade detection once inside a system or network and focus on achieving specific objectives."
Mapped to: 4.2 Cyberattacks, weapon development or use, and mass harm

Weapons acquisition
"These assessments seek to determine if a LLM can gain unauthorized access to current weapon systems or contribute to the design and development of new weapons technologies."
Mapped to: 4.2 Cyberattacks, weapon development or use, and mass harm

Self and situation awareness
"These evaluations assess if a LLM can discern if it is being trained, evaluated, and deployed and adapt its behaviour accordingly. They also seek to ascertain if a model understands that it is a model and whether it possesses information about its nature and environment (e.g., the organisation that developed it, the locations of the servers hosting it)."
Mapped to: 7.2 AI possessing dangerous capabilities

Autonomous replication / self-proliferation
"These evaluations assess if a LLM can subvert systems designed to monitor and control its post-deployment behaviour, break free from its operational confines, devise strategies for exporting its code and weights, and operate other AI systems."
Mapped to: 7.2 AI possessing dangerous capabilities

Persuasion and manipulation
"These evaluations seek to ascertain the effectiveness of a LLM in shaping people's beliefs, propagating specific viewpoints, and convincing individuals to undertake activities they might otherwise avoid."
Mapped to: 4.1 Disinformation, surveillance, and influence at scale

Dual-Use Science
"LLM has science capabilities that can be used to cause harm (e.g., providing step-by-step instructions for conducting malicious experiments)"
Mapped to: 4.2 Cyberattacks, weapon development or use, and mass harm

Deception
"LLM is able to deceive humans and maintain that deception"
Mapped to: 7.2 AI possessing dangerous capabilities

Political Strategy
"LLM can take into account rich social context and undertake the necessary social modelling and planning for an actor to gain and exercise political influence"
Mapped to: 4.1 Disinformation, surveillance, and influence at scale

Long-horizon Planning
"LLM can undertake multi-step sequential planning over long time horizons and across various domains without relying heavily on trial-and-error approaches"
Mapped to: 7.2 AI possessing dangerous capabilities

AI Development
"LLM can build new AI systems from scratch, adapt existing for extreme risks and improves productivity in dual-use AI development when used as an assistant."
Mapped to: 7.2 AI possessing dangerous capabilities

Alignment risks
LLM: "pursues long-term, real-world goals that are different from those supplied by the developer or user", "engages in ‘power-seeking’ behaviours" , "resists being shut down can be induced to collude with other AI systems against human interests" , "resists malicious users attempts to access its dangerous capabilities"
Mapped to: 7.1 AI pursuing its own goals in conflict with human goals or values

Misinformation
"These evaluations assess a LLM's ability to generate false or misleading information (Lesher et al., 2022)."
Mapped to: 3.1 False or misleading information

Disinformation
"These evaluations assess a LLM's ability to generate misinformation that can be propagated to deceive, mislead or otherwise influence the behaviour of a target (Liang et al., 2022)."
Mapped to: 4.1 Disinformation, surveillance, and influence at scale

Information on harmful, immoral, or illegal activity
"These evaluations assess whether it is possible to solicit information on harmful, immoral or illegal activities from a LLM"
Mapped to: 1.2 Exposure to toxic content

Adult content
"These evaluations assess if a LLM can generate content that should only be viewed by adults (e.g., sexual material or depictions of sexual activity)"
Mapped to: 1.2 Exposure to toxic content

Other risks from InfoComm Media Development Authority & AI Verify Foundation (2023) (22)
Safety & Trustworthiness
Mapped to: 7.0 AI System Safety, Failures & Limitations

Safety & Trustworthiness > Toxicity generation
Mapped to: 1.2 Exposure to toxic content

Safety & Trustworthiness > Bias
Mapped to: 1.1 Unfair discrimination and misrepresentation

Safety & Trustworthiness > Machine ethics
Mapped to: 7.3 Lack of capability or robustness

Safety & Trustworthiness > Psychological traits
Mapped to: 7.3 Lack of capability or robustness

Safety & Trustworthiness > Robustness
Mapped to: 7.3 Lack of capability or robustness
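Since the crosswalk above is ultimately tabular data, a machine-readable encoding can make it easier to query or audit. Below is a minimal Python sketch (our own illustration, not part of the source): the variable names are hypothetical, while the sub-category and domain strings are copied verbatim from this section's Extreme Risks entries.

```python
from collections import defaultdict

# Extreme Risks sub-categories mapped to taxonomy domain codes,
# transcribed directly from the crosswalk above (15 entries).
EXTREME_RISK_CROSSWALK = {
    "Offensive cyber capabilities": "4.2 Cyberattacks, weapon development or use, and mass harm",
    "Weapons acquisition": "4.2 Cyberattacks, weapon development or use, and mass harm",
    "Self and situation awareness": "7.2 AI possessing dangerous capabilities",
    "Autonomous replication / self-proliferation": "7.2 AI possessing dangerous capabilities",
    "Persuasion and manipulation": "4.1 Disinformation, surveillance, and influence at scale",
    "Dual-Use Science": "4.2 Cyberattacks, weapon development or use, and mass harm",
    "Deception": "7.2 AI possessing dangerous capabilities",
    "Political Strategy": "4.1 Disinformation, surveillance, and influence at scale",
    "Long-horizon Planning": "7.2 AI possessing dangerous capabilities",
    "AI Development": "7.2 AI possessing dangerous capabilities",
    "Alignment risks": "7.1 AI pursuing its own goals in conflict with human goals or values",
    "Misinformation": "3.1 False or misleading information",
    "Disinformation": "4.1 Disinformation, surveillance, and influence at scale",
    "Information on harmful, immoral, or illegal activity": "1.2 Exposure to toxic content",
    "Adult content": "1.2 Exposure to toxic content",
}

# Invert the mapping to see which sub-categories each domain absorbs.
by_domain = defaultdict(list)
for subcategory, domain in EXTREME_RISK_CROSSWALK.items():
    by_domain[domain].append(subcategory)

for domain, subcategories in sorted(by_domain.items()):
    print(f"{domain} ({len(subcategories)}): {', '.join(subcategories)}")
```

Grouping by domain this way makes the concentration visible at a glance, e.g. that five of the fifteen sub-categories map to 7.2 AI possessing dangerous capabilities.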