Data on this page is preliminary and may change. Please do not share or cite these figures publicly.
Technical mechanisms operating on non-model components of the AI system without modifying model weights. Components include: input/output interfaces, runtime environment, guardrail/monitoring classifiers, tool chain, and hardware.
Also in AI System
AGAT, ARD-PRM, Patch-Vestiges, ViTGuard, ARMRO, Smoothed-Attention, TAP, RSPC, FViT, SATA, ADBM, CGDMP, OSCP, PatchDrop, Image Blocking, ASAM, Robust SAM
Reasoning
Comprehensive collection of attack and defense techniques for Vision Transformers spanning multiple technical interventions without sufficient detail to distinguish L3 categories.
Vision foundation models
Large Model and Agent Safety
1 AI System | Vision foundation models
1.1 Model | Vision foundation models > Attacks and Defenses for SAM
1 AI System | Large Language Models
99.9 Other | Large Language Models > Adversarial Attack
2.2.2 Testing & Evaluation | Large Language Models > adversarial defenses
1 AI System | Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil Zhenqiang; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang (2025)
Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM-conversation-safety.
Build and Use Model
Training, fine-tuning, and integrating the AI model
Developer
Entity that creates, trains, or modifies the AI system
Manage
Prioritising, responding to, and mitigating AI risks