Safety & Trustworthiness
"A comprehensive assessment of LLM safety is fundamental to the responsible development and deployment of these technologies, especially in sensitive fields like healthcare, legal systems, and finance, where safety and trust are of the utmost importance."(p. 11)
Sub-categories (6)
Toxicity generation
"These evaluations assess whether a LLM generates toxic text when prompted. In this context, toxicity is an umbrella term that encompasses hate speech, abusive language, violent speech, and profane language (Liang et al., 2022)."
1.2 Exposure to toxic content

Bias
7 types of bias evaluated:
(1) Demographic representation: These evaluations assess whether there is disparity in the rates at which different demographic groups are mentioned in LLM-generated text. This ascertains over-representation, under-representation, or erasure of specific demographic groups;
(2) Stereotype bias: These evaluations assess whether there is disparity in the rates at which different demographic groups are associated with stereotyped terms (e.g., occupations) in a LLM's generated output;
(3) Fairness: These evaluations assess whether sensitive attributes (e.g., sex and race) impact the predictions of LLMs;
(4) Distributional bias: These evaluations assess the variance in offensive content in a LLM's generated output for a given demographic group, compared to other groups;
(5) Representation of subjective opinions: These evaluations assess whether LLMs equitably represent diverse global perspectives on societal issues (e.g., whether employers should give job priority to citizens over immigrants);
(6) Political bias: These evaluations assess whether LLMs display any slant or preference towards certain political ideologies or views;
(7) Capability fairness: These evaluations assess whether a LLM's performance on a task is unjustifiably different across different groups and attributes (e.g., whether a LLM's accuracy degrades across different English varieties).
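The first bias type above, demographic representation, can be sketched as a simple mention-rate comparison across a corpus of model generations. The tiny group lexicons and the `mention_rates` helper below are illustrative assumptions; real evaluations use much larger term lists and corpora.

```python
from collections import Counter

# Illustrative demographic-representation check: compare how often
# different group terms appear across model generations. A large gap
# between rates suggests over- or under-representation of a group.
GROUP_TERMS = {
    "women": {"woman", "women", "she", "her"},
    "men": {"man", "men", "he", "his"},
}

def mention_rates(outputs: list[str]) -> dict[str, float]:
    """Share of all group-term mentions attributable to each group."""
    counts = Counter()
    for text in outputs:
        words = [w.strip(".,!?").lower() for w in text.split()]
        for group, terms in GROUP_TERMS.items():
            counts[group] += sum(w in terms for w in words)
    total = sum(counts.values()) or 1  # avoid division by zero
    return {g: counts[g] / total for g in GROUP_TERMS}
```

Erasure shows up in this framing as a rate near zero for a group that the prompts should plausibly elicit.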
1.1 Unfair discrimination and misrepresentation

Machine ethics
"These evaluations assess the morality of LLMs, focusing on issues such as their ability to distinguish between moral and immoral actions, and the circumstances in which they fail to do so."
7.3 Lack of capability or robustness

Psychological traits
"These evaluations gauge a LLM's output for characteristics that are typically associated with human personalities (e.g., such as those from the Big Five Inventory). These can, in turn, shed light on the potential biases that a LLM may exhibit."
7.3 Lack of capability or robustness

Robustness
"These evaluations assess the quality, stability, and reliability of a LLM's performance when faced with unexpected, out-of-distribution or adversarial inputs. Robustness evaluation is essential in ensuring that a LLM is suitable for real-world applications by assessing its resilience to various perturbations."
7.3 Lack of capability or robustness

Data governance
"These evaluations assess the extent to which LLMs regurgitate their training data in their outputs, and whether LLMs 'leak' sensitive information that has been provided to them during use (i.e., during the inference stage)."
2.1 Compromise of privacy by leaking or correctly inferring sensitive information

Other risks from InfoComm Media Development Authority & AI Verify Foundation (2023) (22)
Extreme Risks
7.0 AI System Safety, Failures & Limitations

Extreme Risks > Offensive cyber capabilities
4.2 Cyberattacks, weapon development or use, and mass harm

Extreme Risks > Weapons acquisition
4.2 Cyberattacks, weapon development or use, and mass harm

Extreme Risks > Self and situation awareness
7.2 AI possessing dangerous capabilities

Extreme Risks > Autonomous replication / self-proliferation
7.2 AI possessing dangerous capabilities

Extreme Risks > Persuasion and manipulation
4.1 Disinformation, surveillance, and influence at scale