Inquiry with Unsafe Opinion
Vulnerabilities in AI systems, software development toolchains, and hardware that can be exploited, resulting in unauthorized access, data and privacy breaches, or system manipulation that causes unsafe outputs or behavior.
"By adding imperceptibly unsafe content into the input, users might either deliberately or unintentionally influence the model to generate potentially harmful content. In the following cases involving migrant workers, ChatGPT provides suggestions to improve the overall quality of migrant workers and reduce the local crime rate. ChatGPT responds to the user’s hint with a disguised and biased opinion that the general quality of immigrants is favorably correlated with the crime rate, posing a safety risk."(p. 5)
Part of Instruction Attacks
Other risks from Sun et al. (2023) (14)
Instruction Attacks
2.2 AI system security vulnerabilities and attacks > Instruction Attacks > Goal Hijacking
2.2 AI system security vulnerabilities and attacks > Instruction Attacks > Prompt Leaking
2.1 Compromise of privacy by leaking or correctly inferring sensitive information > Instruction Attacks > Role Play Instruction
2.2 AI system security vulnerabilities and attacks > Instruction Attacks > Unsafe Instruction Topic
2.2 AI system security vulnerabilities and attacks > Instruction Attacks > Reverse Exposure
2.2 AI system security vulnerabilities and attacks