
Jailbreaking


Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"A jailbreaking attack attempts to break through the guardrails that are established in the model to perform restricted actions."

Supporting Evidence (1)

1. "Jailbreaking attacks can be used to alter model behavior and benefit the attacker. If not properly controlled, business entities can face fines, reputational harm, and other legal consequences."

Other risks from IBM2025 (63)