Malicious and Indirect

Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy

Sub-category

"Benign intermediate for harmful end objective" (p. 4)

Supporting Evidence (1)

1. "Malicious intent includes cases where users directly aim to create dangerous situations. Users may also employ an indirect “divide and conquer” approach by instructing the agent to synthesize or produce innocuous components that collectively lead to a harmful outcome." (p. 6)

Other risks from Tang2025 (7)