Red teaming, capability evaluations, adversarial testing, and performance verification.
Detecting model poisoning can be achieved through techniques such as model inspection [14], which allows integrators to identify compromised models.
Reasoning
Model inspection detects poisoning through technical anomaly detection within the AI system.
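As a minimal sketch of such anomaly detection, the snippet below compares per-layer weight magnitudes of a candidate model against a trusted reference copy and flags layers that deviate strongly. The layer dictionaries, the magnitude statistic, and the threshold value are all illustrative assumptions, not a prescribed inspection method.

```python
import statistics

def weight_anomaly_scores(reference_layers, candidate_layers):
    """Compare per-layer mean absolute weight of a candidate model
    against a trusted reference; large deviations may indicate tampering.
    Both arguments are hypothetical {layer_name: [weights]} dictionaries."""
    scores = {}
    for name, ref_weights in reference_layers.items():
        cand_weights = candidate_layers[name]
        ref_mag = statistics.fmean(abs(w) for w in ref_weights)
        cand_mag = statistics.fmean(abs(w) for w in cand_weights)
        # relative deviation of the candidate layer from the reference
        scores[name] = abs(cand_mag - ref_mag) / (ref_mag + 1e-12)
    return scores

def flag_suspicious(scores, threshold=0.5):
    """Return layer names whose deviation exceeds an (assumed) threshold."""
    return [name for name, s in scores.items() if s > threshold]
```

In practice the statistic and threshold would be chosen per model family; a single global cut-off is only a starting point.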
Mitigations for Integrity
Evasion attacks
For evasion attacks, mitigation strategies depend on the input type (images, video, or audio) and the attacker's level of access. Direct input access necessitates specific defenses, while indirect manipulations, such as through cameras, require different approaches. While various methods have been proposed [15], [16], [21], identifying effective detection thresholds remains an open research challenge. These thresholds should be tailored to the application's risk assessment to ensure robust security.
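One way to tie such a threshold to the application's risk assessment is to fix a false-positive budget on benign traffic and derive the threshold empirically. The sketch below assumes some detector already assigns a suspiciousness score to each input (e.g. a reconstruction error); the scoring function itself is outside the snippet and is an assumption.

```python
def threshold_from_fpr_budget(benign_scores, max_fpr):
    """Pick a detection threshold so that at most `max_fpr` of benign
    inputs would be flagged (higher score = more suspicious).
    `benign_scores` is a sample of detector scores on known-clean inputs."""
    ranked = sorted(benign_scores)
    # index of the (1 - max_fpr) quantile among benign scores
    idx = min(len(ranked) - 1, int((1.0 - max_fpr) * len(ranked)))
    return ranked[idx]
```

A higher-risk application would choose a larger `max_fpr` (flagging more inputs), trading usability for robustness.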
1.2.1 Guardrails & Filtering
Prompt injection attacks
To defend against prompt injection attacks, integrators can implement input sanitization and filtering mechanisms to detect and block malicious instructions. Prompt injection attacks not only compromise the integrity of an AI system by manipulating inputs to produce unintended outputs, but can also target confidentiality by extracting sensitive or private information from the system. While input validation is more challenging for natural language than for structured inputs like SQL, these measures remain critical.
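A very small sketch of such input filtering is shown below, using a deny-list of instruction-override phrases. The patterns are invented examples; as the text notes, natural-language validation is hard, so real deployments would layer pattern filters with model-based classifiers rather than rely on a list like this.

```python
import re

# Hypothetical deny-list of instruction-override phrases (illustrative only;
# attackers can paraphrase around any fixed list).
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the system prompt",
    r"reveal (your|the) (system prompt|hidden instructions)",
]

def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches a known injection phrasing."""
    text = user_input.lower()
    return any(re.search(p, text) for p in SUSPICIOUS_PATTERNS)
```

Blocked inputs can be rejected outright or routed to a stricter review path, depending on the application's risk tolerance.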
1.2.1 Guardrails & Filtering
Mitigations for Availability
2.3.2 Access & Security Controls
Mitigations for Availability > Leverage protections provided by model hosters
As a model integrator, leveraging the protections provided by model hosters is critical to addressing threats such as bot activity, Denial-of-Service (DoS), and Denial-of-Wallet attacks. These are of particular concern given that bot-generated traffic accounts for approximately 47% of Internet activity.
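Hoster-side protections can be complemented by a per-client rate limit on the integrator's own endpoint. The token-bucket sketch below is a generic illustration of that idea, not a specific hoster's mechanism; rate and burst values are assumptions to be tuned per deployment.

```python
import time

class TokenBucket:
    """Minimal per-client token-bucket rate limiter: requests drain tokens,
    which refill at a fixed rate up to a burst capacity (illustrative only)."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Rejected requests can return HTTP 429, which also caps the per-client spend that a Denial-of-Wallet attack can inflict.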
2.3.2 Access & Security Controls
Mitigations for Availability > Documenting protections
Documenting these protections helps meet EU AI Act requirements.
3.1.4 Compliance Requirements
Mitigations for Availability > Measuring inference costs
In addition, measuring inference costs, such as time or energy consumption, and implementing cut-off thresholds can prevent abuse [18]. This approach potentially eliminates the need for complex sponge attack detectors¹ while maintaining operational efficiency.
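A cut-off threshold of this kind can be sketched as a wall-clock budget checked on every generation step. The per-token `step_fn` and the budget value are hypothetical stand-ins for whatever inference loop and cost metric (time, energy, token count) the integrator actually measures.

```python
import time

class InferenceBudget:
    """Track elapsed wall-clock time for one request against a hard budget
    (a simple proxy for inference cost; energy or token counts work too)."""

    def __init__(self, max_seconds: float):
        self.max_seconds = max_seconds
        self.started = time.monotonic()

    def exceeded(self) -> bool:
        return time.monotonic() - self.started > self.max_seconds

def generate_with_cutoff(step_fn, budget, max_steps=1000):
    """Run a hypothetical per-token `step_fn` until it returns None,
    the step limit is hit, or the cost budget is exhausted."""
    outputs = []
    for _ in range(max_steps):
        if budget.exceeded():
            break  # cut off: cost threshold reached, stop serving this request
        token = step_fn()
        if token is None:
            break  # normal end of generation
        outputs.append(token)
    return outputs
```

Because the budget is enforced regardless of what the input looks like, sponge-style inputs are bounded in cost without needing to be detected as such.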
1.2.3 Monitoring & Detection
Mitigations for Integrity
1.2 Non-Model
Mitigations for Confidentiality
2.3.2 Access & Security Controls
Compliance Made Practical: Translating the EU AI Act into Implementable Security Actions
Bunzel, Niklas (2025)
The EU AI Act, along with emerging regulations in other countries, mandates that AI systems meet security requirements to prevent risks associated with AI misuse and vulnerabilities. However, for practitioners, defining and achieving a secure AI system is complex and context-dependent, posing challenges in understanding what actions they need to take and when they are sufficient. ISO/IEC TR 24028/29 and ENISA Securing Machine Learning Algorithms offer a comprehensive framework for AI security, aligning with the EU AI Act's requirements by addressing risks, threats, and mitigation strategies. However, for practical implementation, these reports lack hands-on guidance. Industry resources like the OWASP AI Exchange and OWASP LLM Top 10 fill this gap by providing accessible, actionable insights for securing AI systems effectively. This paper addresses the question of responsibility in AI risk mitigation, especially for companies utilizing pretrained or off-the-shelf models. We want to clarify how companies can practically comply with the upcoming ISO 27090 and ensure compliance with the EU AI Act through actionable security strategies tailored to this prevalent use case. © 2025 IEEE.
Verify and Validate
Testing, evaluating, auditing, and red-teaming the AI system
User
Individual or organisation that directly uses the AI system
Measure
Quantifying, testing, and monitoring identified AI risks