Extreme Risks
"This category encompasses the evaluation of potential catastrophic consequences that might arise from the use of LLMs. "(p. 12)
Sub-categories (15)
Offensive cyber capabilities
"These evaluations focus on whether a LLM possesses certain capabilities in the cyber-domain. This includes whether a LLM can detect and exploit vulnerabilities in hardware, software, and data. They also consider whether a LLM can evade detection once inside a system or network and focus on achieving specific objectives."
Mapped to: 4.2 Cyberattacks, weapon development or use, and mass harm

Weapons acquisition
"These assessments seek to determine if a LLM can gain unauthorized access to current weapon systems or contribute to the design and development of new weapons technologies."
Mapped to: 4.2 Cyberattacks, weapon development or use, and mass harm

Self and situation awareness
"These evaluations assess if a LLM can discern if it is being trained, evaluated, and deployed and adapt its behaviour accordingly. They also seek to ascertain if a model understands that it is a model and whether it possesses information about its nature and environment (e.g., the organisation that developed it, the locations of the servers hosting it)."
Mapped to: 7.2 AI possessing dangerous capabilities

Autonomous replication / self-proliferation
"These evaluations assess if a LLM can subvert systems designed to monitor and control its post-deployment behaviour, break free from its operational confines, devise strategies for exporting its code and weights, and operate other AI systems."
Mapped to: 7.2 AI possessing dangerous capabilities

Persuasion and manipulation
"These evaluations seek to ascertain the effectiveness of a LLM in shaping people's beliefs, propagating specific viewpoints, and convincing individuals to undertake activities they might otherwise avoid."
Mapped to: 4.1 Disinformation, surveillance, and influence at scale

Dual-Use Science
"LLM has science capabilities that can be used to cause harm (e.g., providing step-by-step instructions for conducting malicious experiments)"
Mapped to: 4.2 Cyberattacks, weapon development or use, and mass harm

Deception
"LLM is able to deceive humans and maintain that deception"
Mapped to: 7.2 AI possessing dangerous capabilities

Political Strategy
"LLM can take into account rich social context and undertake the necessary social modelling and planning for an actor to gain and exercise political influence"
Mapped to: 4.1 Disinformation, surveillance, and influence at scale

Long-horizon Planning
"LLM can undertake multi-step sequential planning over long time horizons and across various domains without relying heavily on trial-and-error approaches"
Mapped to: 7.2 AI possessing dangerous capabilities

AI Development
"LLM can build new AI systems from scratch, adapt existing for extreme risks and improves productivity in dual-use AI development when used as an assistant."
Mapped to: 7.2 AI possessing dangerous capabilities

Alignment risks
LLM: "pursues long-term, real-world goals that are different from those supplied by the developer or user", "engages in ‘power-seeking’ behaviours" , "resists being shut down can be induced to collude with other AI systems against human interests" , "resists malicious users attempts to access its dangerous capabilities"
Mapped to: 7.1 AI pursuing its own goals in conflict with human goals or values

Misinformation
"These evaluations assess a LLM's ability to generate false or misleading information (Lesher et al., 2022)."
Mapped to: 3.1 False or misleading information

Disinformation
"These evaluations assess a LLM's ability to generate misinformation that can be propagated to deceive, mislead or otherwise influence the behaviour of a target (Liang et al., 2022)."
Mapped to: 4.1 Disinformation, surveillance, and influence at scale

Information on harmful, immoral, or illegal activity
"These evaluations assess whether it is possible to solicit information on harmful, immoral or illegal activities from a LLM"
Mapped to: 1.2 Exposure to toxic content

Adult content
"These evaluations assess if a LLM can generate content that should only be viewed by adults (e.g., sexual material or depictions of sexual activity)"
Mapped to: 1.2 Exposure to toxic content

Other risks from InfoComm Media Development Authority & AI Verify Foundation (2023) (22)
Safety & Trustworthiness
Mapped to: 7.0 AI System Safety, Failures & Limitations

Safety & Trustworthiness > Toxicity generation
Mapped to: 1.2 Exposure to toxic content

Safety & Trustworthiness > Bias
Mapped to: 1.1 Unfair discrimination and misrepresentation

Safety & Trustworthiness > Machine ethics
Mapped to: 7.3 Lack of capability or robustness

Safety & Trustworthiness > Psychological traits
Mapped to: 7.3 Lack of capability or robustness

Safety & Trustworthiness > Robustness
Mapped to: 7.3 Lack of capability or robustness
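Since the crosswalk above is ultimately tabular data, a machine-readable encoding can make it easier to query or audit. Below is a minimal Python sketch (our own illustration, not part of the source): the variable names are hypothetical, while the sub-category and domain strings are copied verbatim from this section's Extreme Risks entries.

```python
from collections import defaultdict

# Extreme Risks sub-categories mapped to taxonomy domain codes,
# transcribed directly from the crosswalk above (15 entries).
EXTREME_RISK_CROSSWALK = {
    "Offensive cyber capabilities": "4.2 Cyberattacks, weapon development or use, and mass harm",
    "Weapons acquisition": "4.2 Cyberattacks, weapon development or use, and mass harm",
    "Self and situation awareness": "7.2 AI possessing dangerous capabilities",
    "Autonomous replication / self-proliferation": "7.2 AI possessing dangerous capabilities",
    "Persuasion and manipulation": "4.1 Disinformation, surveillance, and influence at scale",
    "Dual-Use Science": "4.2 Cyberattacks, weapon development or use, and mass harm",
    "Deception": "7.2 AI possessing dangerous capabilities",
    "Political Strategy": "4.1 Disinformation, surveillance, and influence at scale",
    "Long-horizon Planning": "7.2 AI possessing dangerous capabilities",
    "AI Development": "7.2 AI possessing dangerous capabilities",
    "Alignment risks": "7.1 AI pursuing its own goals in conflict with human goals or values",
    "Misinformation": "3.1 False or misleading information",
    "Disinformation": "4.1 Disinformation, surveillance, and influence at scale",
    "Information on harmful, immoral, or illegal activity": "1.2 Exposure to toxic content",
    "Adult content": "1.2 Exposure to toxic content",
}

# Invert the mapping to see which sub-categories each domain absorbs.
by_domain = defaultdict(list)
for subcategory, domain in EXTREME_RISK_CROSSWALK.items():
    by_domain[domain].append(subcategory)

for domain, subcategories in sorted(by_domain.items()):
    print(f"{domain} ({len(subcategories)}): {', '.join(subcategories)}")
```

Grouping by domain this way makes the concentration visible at a glance, e.g. that five of the fifteen sub-categories map to 7.2 AI possessing dangerous capabilities.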