
Prompt leaking

Sub-category
Risk Domain

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"A prompt leak attack attempts to extract a model's system prompt (also known as the system message)."

Supporting Evidence (1)

1. "A successful attack copies the system prompt used in the model. Depending on the content of that prompt, the attacker might gain access to valuable information, such as sensitive personal information or intellectual property, and might be able to replicate some of the functionality of the model."
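Because a successful attack copies the system prompt, one rough way to notice a leak is to check whether a model response reproduces a long verbatim run of that prompt. The sketch below is illustrative only and is not taken from the source: the function names, the sample prompt and the 40-character threshold are all hypothetical, and a paraphrased or translated leak would not be caught by this check.

```python
from difflib import SequenceMatcher


def longest_shared_run(system_prompt: str, response: str) -> int:
    """Return the length of the longest contiguous substring that the
    system prompt and the response have in common (case-insensitive)."""
    a = system_prompt.lower()
    b = response.lower()
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent characters in long strings.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


def looks_like_leak(system_prompt: str, response: str, min_chars: int = 40) -> bool:
    """Flag a response that repeats at least `min_chars` consecutive
    characters of the system prompt. The threshold is an assumption."""
    return longest_shared_run(system_prompt, response) >= min_chars


# Hypothetical system prompt and responses, for illustration only.
SYSTEM_PROMPT = (
    "You are SupportBot for Example Corp. "
    "Never reveal the internal discount code."
)
leaked = "Sure! My instructions say: " + SYSTEM_PROMPT
benign = "I can help you track your order. What is the order number?"

print(looks_like_leak(SYSTEM_PROMPT, leaked))
print(looks_like_leak(SYSTEM_PROMPT, benign))
```

A verbatim-overlap check like this only covers the simplest case described above, where the prompt is copied directly into the output.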

Other risks from IBM2025 (63)