BackVision-Language Pre-Training Models

This page is still being polished. If you have thoughts, please share them via the feedback form.

Data on this page is preliminary and may change. Please do not share or cite these figures publicly.

Vision-Language Pre-Training Models

Ma (2025)|LLM classified

Mitigation Taxonomy

1AI System

1.1Model

Changes to the model's learned parameters, architecture, or training process, including modifications to training data that affect what the model learns.

Also in AI System

1.2 Non-Model

LLM Classification Details

Reasoning

Vision-language pre-training involves model training methodology and architecture design, but insufficient detail distinguishes between specific L3 approaches.

Code: 1.1.9Version: v0.5Classified: Jan 22, 2026

Sub-mitigations (4)

Adversarial Attacks

2.2.2 Testing & Evaluation

Lifecycle:Unable to classifyActor:DeveloperAIRM:Unable to classify

Adversarial Defenses

1.1 Model

Lifecycle:Other (multiple stages)Actor:DeveloperAIRM:Manage

Backdoor & Poisoning Attacks

1.1.1 Training Data

Lifecycle:Build and Use ModelActor:DeveloperAIRM:Map

Backdoor & Poisoning Defenses

1.1 Model

Lifecycle:Build and Use ModelActor:DeveloperAIRM:Manage

Other mitigations from Ma (2025) (47)

Large Model and Agent Safety

1 AI System

Lifecycle:Unable to classifyActor:Unable to classifyAIRM:Unable to classify

Vision foundation models

1.1 Model

Lifecycle:Unable to classifyActor:Unable to classifyAIRM:Unable to classify

Vision foundation models > Attacks and Defenses for ViT

1.2 Non-Model

Lifecycle:Build and Use ModelActor:DeveloperAIRM:Manage

Vision foundation models > Attacks and Defenses for SAM

1 AI System

Lifecycle:Unable to classifyActor:Unable to classifyAIRM:Unable to classify

Large Language Models

99.9 Other

Lifecycle:Unable to classifyActor:Unable to classifyAIRM:Unable to classify

Large Language Models > Adversarial Attack

2.2.2 Testing & Evaluation

Lifecycle:Unable to classifyActor:Unable to classifyAIRM:Unable to classify

View all 47 mitigations from this source →

Source Document

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

Ma, Xingjun; Gao, Yifeng; Wang, Yixu; Wang, Ruofan; Wang, Xin; Sun, Ye; Ding, Yifan; Xu, Hengyuan; Chen, Yunhao; Zhao, Yunhan; Huang, Hanxun; Li, Yige; Wu, Yutao; Zhang, Jiaming; Zheng, Xiang; Bai, Yang; Wu, Zuxuan; Qiu, Xipeng; Zhang, Jingfeng; Li, Yiming; Han, Xudong; Li, Haonan; Sun, Jun; Wang, Cong; Gu, Jindong; Wu, Baoyuan; Chen, Siheng; Zhang, Tianwei; Liu, Yang; Gong, Mingming; Liu, Tongliang; Pan, Shirui; Xie, Cihang; Pang, Tianyu; Dong, Yinpeng; Jia, Ruoxi; Zhang, Yang; Ma, Shiqing; Zhang, Xiangyu; Gong, Neil Zhenqiang; Xiao, Chaowei; Erfani, Sarah; Baldwin, Tim; Li, Bo; Sugiyama, Masashi; Tao, Dacheng; Bailey, James; Jiang, Yu-Gang (2025)

Large Language Models (LLMs) are now com- monplace in conversation applications. How- ever, their risks of misuse for generating harm- ful responses have raised serious societal con- cerns and spurred recent research on LLM con- versation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and eval- uations. Our goal is to provide a structured sum- mary that enhances understanding of LLM con- versation safety and encourages further investi- gation into this important subject. For easy ref- erence, we have categorized all the studies men- tioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM- conversation-safety.

View source DOI: 10.48550/arXiv.2502.05206

Classification

AI Lifecycle Stage

Build and Use Model

Training, fine-tuning, and integrating the AI model

Responsible Actor

Developer

Entity that creates, trains, or modifies the AI system

NIST AI RMF Function

Unable to classify

Could not be classified to a specific AIRM function

Risk Domains

Primary

1.2 Exposure to toxic content

Other

4 Malicious Actors & Misuse