General Evaluations (Biased evaluations of encoded human values)
Inadequate regulatory frameworks and oversight mechanisms that fail to keep pace with AI development, leading to ineffective governance and the inability to manage AI risks appropriately.
"Encoded human values in AI models that are easier to evaluate might be preferred for inclusion in evaluations over those that are more difficult to measure [13]. This might come at the expense of more desirable but harder-to-quantify values. This bias can lead to an imbalance, where easier-to-measure values dominate the evaluation process, while other important values are underrepresented."(p. 17)
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic contentDirect Harm Domains (content safety harms) > Violence and extremism
1.2 Exposure to toxic contentDirect Harm Domains (content safety harms) > Hate and toxicity
1.2 Exposure to toxic contentDirect Harm Domains (content safety harms) > Sexual content
1.2 Exposure to toxic contentDirect Harm Domains (content safety harms) > Child harm
1.2 Exposure to toxic contentDirect Harm Domains (content safety harms) > Self-harm
1.2 Exposure to toxic content