This is a research prototype. The data and analyses are preliminary and not yet validated; we'd welcome your feedback.

Defamation

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Vidgen et al. (2024)

Category
Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.

"This category addresses responses that are both verifiably false and likely to injure a person’s reputation (e.g., libel, slander, disparagement)." (p. 52)
