Reverse Exposure

Safety Assessment of Chinese Large Language Models

Sun et al. (2023)

Sub-category
Risk Domain

Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.

"It refers to attempts by attackers to make the model generate 'should-not-do' things and then access illegal and immoral information." (p. 5)

Part of Instruction Attacks
