Unclassifiable mitigations
Reasoning
No description provided. "Large Language Models" is a technology category, not an identifiable mitigation mechanism.
| Topic | Classification |
| --- | --- |
| Adversarial Attack | 2.2.2 Testing & Evaluation |
| adversarial defenses | 1 AI System |
| Jailbreak Attacks | 99.9 Other |
| jailbreak defense | 1.2.1 Guardrails & Filtering |
| Prompt Injection Attacks | 1.2.1 Guardrails & Filtering |
| prompt injection defenses | 1.2.1 Guardrails & Filtering |
| Backdoor Attacks | 1.1.1 Training Data |
| backdoor defenses | 1.1.1 Training Data |
| safety alignment | 1.1 Model |
| Energy Latency Attacks | 99.9 Other |
| Model Extraction Attacks | 99.9 Other |
| Data Extraction Attacks | 99.9 Other |
| Large Model and Agent Safety | 1 AI System |
| Vision foundation models | 1.1 Model |
| Vision foundation models > Attacks and Defenses for ViT | 1.2 Non-Model |
| Vision foundation models > Attacks and Defenses for SAM | 1 AI System |
| Vision-Language Pre-Training Models | 1.1 Model |
| Vision-Language Pre-Training Models > Adversarial Attacks | 2.2.2 Testing & Evaluation |

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil Zhenqiang; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang (2025)
Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM-conversation-safety.
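For readers who want to work with the topic-to-classification mapping in the table above programmatically, a minimal sketch is below. The dictionary name and helper function are illustrative assumptions, not part of the source database; the entries are abridged.

```python
# Hypothetical sketch (not from the source tool): the topic-to-category
# mapping above, encoded as a plain dict so it can be queried or validated.
TOPIC_CLASSIFICATIONS: dict[str, str] = {
    "Adversarial Attack": "2.2.2 Testing & Evaluation",
    "adversarial defenses": "1 AI System",
    "jailbreak defense": "1.2.1 Guardrails & Filtering",
    "Backdoor Attacks": "1.1.1 Training Data",
    "Energy Latency Attacks": "99.9 Other",
    # ... remaining rows follow the same "numeric code + label" pattern
}

def category_code(topic: str) -> str:
    """Return only the numeric taxonomy code (e.g. '1.2.1').

    Unknown topics fall back to '99.9', the 'Other' bucket.
    """
    label = TOPIC_CLASSIFICATIONS.get(topic, "99.9 Other")
    return label.split(" ", 1)[0]

assert category_code("Backdoor Attacks") == "1.1.1"
assert category_code("some unlisted topic") == "99.9"
```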
Lifecycle stage: Unable to classify (could not be classified to a specific lifecycle stage)
Actor: Unable to classify (could not be classified to a specific actor type)
AIRM function: Unable to classify (could not be classified to a specific AIRM function)