Unclassifiable mitigations
Reasoning
No description provided. "Large Language Models" is a technology category, not an identifiable mitigation mechanism.
| Topic | Classification |
| --- | --- |
| Adversarial Attack | 2.2.2 Testing & Evaluation |
| adversarial defenses | 1 AI System |
| Jailbreak Attacks | 99.9 Other |
| jailbreak defense | 1.2.1 Guardrails & Filtering |
| Prompt Injection Attacks | 1.2.1 Guardrails & Filtering |
| prompt injection defenses | 1.2.1 Guardrails & Filtering |
| Backdoor Attacks | 1.1.1 Training Data |
| backdoor defenses | 1.1.1 Training Data |
| safety alignment | 1.1 Model |
| Energy Latency Attacks | 99.9 Other |
| Model Extraction Attacks | 99.9 Other |
| Data Extraction Attacks | 99.9 Other |
| Large Model and Agent Safety | 1 AI System |
| Vision foundation models | 1.1 Model |
| Vision foundation models > Attacks and Defenses for ViT | 1.2 Non-Model |
| Vision foundation models > Attacks and Defenses for SAM | 1 AI System |
| Vision-Language Pre-Training Models | 1.1 Model |
| Vision-Language Pre-Training Models > Adversarial Attacks | 2.2.2 Testing & Evaluation |

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil Zhenqiang; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang (2025)
Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM-conversation-safety.
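For readers who want to work with the topic-to-classification mapping in the table above programmatically, a minimal sketch is below. The dictionary name and helper function are illustrative assumptions, not part of the source database; the entries are abridged.

```python
# Hypothetical sketch (not from the source tool): the topic-to-category
# mapping above, encoded as a plain dict so it can be queried or validated.
TOPIC_CLASSIFICATIONS: dict[str, str] = {
    "Adversarial Attack": "2.2.2 Testing & Evaluation",
    "adversarial defenses": "1 AI System",
    "jailbreak defense": "1.2.1 Guardrails & Filtering",
    "Backdoor Attacks": "1.1.1 Training Data",
    "Energy Latency Attacks": "99.9 Other",
    # ... remaining rows follow the same "numeric code + label" pattern
}

def category_code(topic: str) -> str:
    """Return only the numeric taxonomy code (e.g. '1.2.1').

    Unknown topics fall back to '99.9', the 'Other' bucket.
    """
    label = TOPIC_CLASSIFICATIONS.get(topic, "99.9 Other")
    return label.split(" ", 1)[0]

assert category_code("Backdoor Attacks") == "1.1.1"
assert category_code("some unlisted topic") == "99.9"
```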
Lifecycle stage: Unable to classify (could not be classified to a specific lifecycle stage)
Actor: Unable to classify (could not be classified to a specific actor type)
AIRM function: Unable to classify (could not be classified to a specific AIRM function)