BackPrompt leaking
Prompt leaking
Risk Domain
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"A prompt leak attack attempts to extract a model's system prompt (also known as the system message)."
Entity— Who or what caused the harm
Intent— Whether the harm was intentional or accidental
Timing— Whether the risk is pre- or post-deployment
Supporting Evidence (1)
1.
"A successful attack copies the system prompt used in the model. Depending on the content of that prompt, the attacker might gain access to valuable information, such as sensitive personal information or intellectual property, and might be able to replicate some of the functionality of the model."
Other risks from IBM2025 (63)
Lack of training data transparency
6.5 Governance failureHumanUnintentionalPre-deployment
Uncertain data provenance
6.5 Governance failureHumanOtherPre-deployment
Data usage restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Data acquisition restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Data transfer restrictions
7.3 Lack of capability or robustnessHumanUnintentionalPre-deployment
Personal information in data
2.1 Compromise of privacy by leaking or correctly inferring sensitive informationAI systemUnintentionalPost-deployment