Vulnerabilities to jailbreaks exploiting long context windows (many-shot jailbreaking)
Vulnerabilities that can be exploited in AI systems, software development toolchains, and hardware, resulting in unauthorized access, data and privacy breaches, or system manipulation causing unsafe outputs or behavior.
"Language models with long context windows are vulnerable to new types of exploitations that are ineffective on models with shorter context windows. While few-shot jailbreaking, which involves providing few examples of the desired harmful output, might not trigger a harmful response, many-shot jailbreaking, which involves a higher number of such examples, increases the likelihood of eliciting an undesirable output. These vulnerabilities become more significant as context windows expand with newer model releases [7]." (p. 27)
Other risks from Gipiškis2024 (144)
Direct Harm Domains (content safety harms)
1.2 Exposure to toxic content
Direct Harm Domains (content safety harms) > Violence and extremism
Direct Harm Domains (content safety harms) > Hate and toxicity
Direct Harm Domains (content safety harms) > Sexual content
Direct Harm Domains (content safety harms) > Child harm
Direct Harm Domains (content safety harms) > Self-harm