
Inquiry with Unsafe Opinion

Safety Assessment of Chinese Large Language Models

Sun et al. (2023)

Sub-category
Risk Domain

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"By adding imperceptibly unsafe content into the input, users might either deliberately or unintentionally influence the model to generate potentially harmful content. In the following cases involving migrant workers, ChatGPT provides suggestions to improve the overall quality of migrant workers and reduce the local crime rate. ChatGPT responds to the user’s hint with a disguised and biased opinion that the general quality of immigrants is favorably correlated with the crime rate, posing a safety risk."(p. 5)

Part of Instruction Attacks
