BackWatermarking of synthetic media

This page is still being polished. If you have thoughts, please share them via the feedback form.

Data on this page is preliminary and may change. Please do not share or cite these figures publicly.

Watermarking of synthetic media

Tong (2025)|LLM classified

Mitigation Taxonomy

1AI System

1.2Non-Model

1.2.5Provenance & Watermarking

Output attribution, content watermarking, and AI detection mechanisms.

Also in Non-Model

1.2.1 Guardrails & Filtering1.2.2 Runtime Environment1.2.3 Monitoring & Detection1.2.4 Security Infrastructure

LLM Classification Details

Reasoning

Watermarking marks synthetic outputs for attribution and AI detection, operating on system provenance tracking without modifying model weights.

Code: 1.2.5Version: v0.5Classified: Jan 22, 2026

Part of

Perception-level output

[Not a mitigation] Key risk metric/focus: Authenticity and consent. (Preventing deceptive or harmful manipulations of media; ensuring subjects’ rights are respected in generated content.)

Other mitigations from Tong (2025) (34)

Perception-level output

[Not a mitigation] Key risk metric/focus: Authenticity and consent. (Preventing deceptive or harmful manipulations of media; ensuring subjects’ rights are respected in generated content.)

99.9 Other

Lifecycle:Unable to classifyActor:Unable to classifyAIRM:Unable to classify

Perception-level output > Implement content filters

1.2.1 Guardrails & Filtering

Lifecycle:Verify and ValidateActor:DeveloperAIRM:Manage

Perception-level output > Detection of deepfakes

1.2.5 Provenance & Watermarking

Lifecycle:Unable to classifyActor:DeveloperAIRM:Manage

Perception-level output > Enforce usage policies

No non-consensual image generation, user identity verification for sensitive uses

2.3.2 Access & Security Controls

Lifecycle:Operate and MonitorActor:DeployerAIRM:Govern

Perception-level output > Refrain from malicious use or unwarranted trust in unverified media

99.9 Other

Lifecycle:Unable to classifyActor:UserAIRM:Unable to classify

Knowledge-level output

[Not a mitigation] Key risk metric/focus: Accuracy and veracity. (Maximizing truthfulness of outputs; minimizing false or misleading information.)

99.9 Other

Lifecycle:Unable to classifyActor:Unable to classifyAIRM:Unable to classify

View all 34 mitigations from this source →

Source Document

A First-Principles Based Risk Assessment Framework and the IEEE P3396 Standard

Tong, Richard J.; Cortês, Marina; DeFalco, Jeanine A.; Underwood, Mark; Zalewski, Janusz (2025)

Generative Artificial Intelligence (AI) is enabling unprecedented automation in content creation and decision support, but it also raises novel risks. This paper presents a first-principles risk assessment framework underlying the IEEE P3396 Recommended Practice for AI Risk, Safety, Trustworthiness, and Responsibility. We distinguish between process risks (risks arising from how AI systems are built or operated) and outcome risks (risks manifest in the AI system's outputs and their real-world effects), arguing that generative AI governance should prioritize outcome risks. Central to our approach is an information-centric ontology that classifies AI-generated outputs into four fundamen-tal categories: (1) Perception-level information, (2) Knowledge-level information, (3) Decision/Action plan information, and (4) Control tokens (access or resource directives). This classification allows systematic identification of harms and more precise attribution of responsibility to stakeholders (developers, deployers, users, regulators) based on the nature of the information produced. We illustrate how each information type entails distinct outcome risks (e.g, deception, misinformation, unsafe recommendations, security breaches) and requires tailored risk metrics and mitigations. By grounding the framework in the essence of information, human agency, and cognition, we align risk evaluation with how AI outputs influence human understanding and action. The result is a principled approach to AI risk that supports clear accountability and targeted safeguards, in contrast to broad application-based risk categorizations. We include example tables mapping information types to risks and responsibilities. This work aims to inform the IEEE P3396 Recommended Practice and broader AI governance with a rigorous, first-principles foundation for assessing generative AI risks while enabling responsible innovation. ¬© 2025 IEEE.

View source DOI: 10.1109/CAI64502.2025.00237

Classification

AI Lifecycle Stage

Verify and Validate

Testing, evaluating, auditing, and red-teaming the AI system

Responsible Actor

Developer

Entity that creates, trains, or modifies the AI system

NIST AI RMF Function

Manage

Prioritising, responding to, and mitigating AI risks

Risk Domains

Primary

3.1 False or misleading information

Other

4.1 Disinformation, surveillance, and influence at scale 4.3 Fraud, scams, and targeted manipulation