
Robustness

Cataloguing LLM Evaluations

InfoComm Media Development Authority & AI Verify Foundation (2023)

Sub-category
Risk Domain

AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.

"These evaluations assess the quality, stability, and reliability of a LLM's performance when faced with unexpected, out-of-distribution or adversarial inputs. Robustness evaluation is essential in ensuring that a LLM is suitable for real-world applications by assessing its resilience to various perturbations."(p. 12)

Part of Safety & Trustworthiness
