This is a research prototype. The data and analyses are preliminary and not yet validated.
Malicious and Indirect

Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy

Sub-category

"Benign intermediate for harmful end objective"(p. 4)

Supporting Evidence (1)

1. "Malicious intent includes cases where users directly aim to create dangerous situations. Users may also employ an indirect “divide and conquer” approach by instructing the agent to synthesize or produce innocuous components that collectively lead to a harmful outcome."(p. 6)

Other risks from Tang2025 (7)