Classification | Topic
Unclassifiable mitigations (see reasoning below) | Memory Attacks & Defenses
1.1 Model | Indirect Prompt Injection
1.2.1 Guardrails & Filtering | Tool Attacks & Defenses
1.2 Non-Model | VLM Agent
99.9 Other | Multi-Agent Systems
1.1.4 Model Architecture | Embodied Agents
1.2.9 Other | Agentic Attacks & Defenses
1 AI System | Benchmarks
3.2.1 Benchmarks & Evaluation | Large Model and Agent Safety
1 AI System | Vision foundation models
1.1 Model | Vision foundation models > Attacks and Defenses for ViT
1.2 Non-Model | Vision foundation models > Attacks and Defenses for SAM
1 AI System | Large Language Models
99.9 Other | Large Language Models > Adversarial Attack
2.2.2 Testing & Evaluation | Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

Reasoning (unclassifiable entry): Insufficient information to identify focal activity or mechanism; the term "Agent" is too ambiguous without context.

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil Zhenqiang; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang (2025)
Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM-conversation-safety.
Verify and Validate: Testing, evaluating, auditing, and red-teaming the AI system.
Developer: Entity that creates, trains, or modifies the AI system.
Unable to classify: Could not be classified to a specific AIRM function.
Primary: 4 Malicious Actors & Misuse