Technical mechanisms operating on non-model components of the AI system without modifying model weights. Components include: input/output interfaces, runtime environment, guardrail/monitoring classifiers, tool chain, and hardware.
Ensuring the security of AI systems at the hardware and infrastructure level involves protecting model weights, securing deployment environments, maintaining supply chain integrity, and implementing robust monitoring and threat detection mechanisms. Methods include the use of confidential computing, rigorous access controls, specialized hardware protections, and continuous security oversight.
Reasoning: Hardware enclave and cryptographic protections secure model weights and infrastructure access.

Confidential computing and environment isolation (1.2.4 Security Infrastructure)
Using trusted execution environments (such as secure enclaves) to ensure that model weights and computations remain confidential and tamper-proof during large-scale AI inference and training. This also involves reducing the attack surface through sandboxed, code-minimal deployments, specialized hardware/firmware stacks, and maintaining verifiable runtime integrity checks.
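To make the runtime-integrity part of this concrete, here is a minimal Python sketch of a measurement check: deployed artifacts are hashed and compared against a known-good manifest. It is an illustration only; real confidential-computing stacks rely on hardware-backed attestation (for example, enclave quotes signed by the CPU vendor), and the file names and manifest format below are assumptions.

```python
"""Minimal sketch of a runtime integrity (measurement) check.

Hypothetical illustration only: real confidential-computing deployments rely on
hardware-backed attestation, not a user-space hash comparison. File names and
the manifest format are assumptions.
"""
import hashlib
import json
from pathlib import Path

def measure(path: Path) -> str:
    """Return the SHA-256 digest of a file, streamed in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_runtime(manifest_path: Path) -> bool:
    """Compare measured digests of deployed artifacts against an expected manifest.

    The manifest is assumed to map file paths to known-good SHA-256 digests and
    to have been distributed over a trusted channel (e.g., signed at build time).
    """
    manifest = json.loads(manifest_path.read_text())
    for rel_path, expected_digest in manifest.items():
        if measure(Path(rel_path)) != expected_digest:
            print(f"integrity violation: {rel_path}")
            return False
    return True

if __name__ == "__main__":
    ok = verify_runtime(Path("expected_measurements.json"))  # hypothetical manifest
    print("runtime integrity verified" if ok else "halting: tampering suspected")
```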

Supply chain integrity and secure development (2.4.3 Development Workflows)
Ensuring end-to-end verification of hardware and software supply chains through source-verified firmware, SLSA compliance, and secure software development lifecycles tailored for ML-specific infrastructure. This also includes developing automated tooling to continuously verify the provenance and integrity of model components, dependencies, and third-party code used in training and inference pipelines.
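As a rough illustration of provenance checking, the sketch below verifies that a model artifact's digest appears among the subjects of a SLSA/in-toto-style provenance statement. The file names are hypothetical, and signature verification of the provenance envelope is assumed to happen out of band.

```python
"""Sketch: checking a model artifact against a SLSA/in-toto-style provenance statement.

Assumptions: the provenance JSON follows the in-toto Statement layout with a
"subject" list of {name, digest: {sha256: ...}} entries, and the envelope's
signature has already been verified elsewhere.
"""
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def artifact_matches_provenance(artifact: Path, statement_path: Path) -> bool:
    """Return True if the artifact's digest appears among the statement's subjects."""
    statement = json.loads(statement_path.read_text())
    digest = sha256_of(artifact)
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )

if __name__ == "__main__":
    ok = artifact_matches_provenance(
        Path("model.safetensors"),       # hypothetical artifact name
        Path("model.provenance.json"),   # hypothetical provenance file
    )
    print("provenance check passed" if ok else "provenance check FAILED")
```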

Continuous monitoring, advanced threat detection, and incident response (1.2.3 Monitoring & Detection)
Developing ML-driven anomaly detection and logging systems capable of flagging and responding to subtle infiltration attempts or insider threats in real-time. This also includes red-teaming and automated penetration testing frameworks specialized for AI systems, including simulations of zero-day attacks and insider compromises.
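A toy example of the flavor of such detection: the sketch below flags principals whose read volume from a model-weight store deviates far from a historical baseline, using a simple z-score rule. Production systems would use richer features and models; all data here is synthetic.

```python
"""Toy anomaly detector for weight-store access logs (illustrative only).

Flags principals whose daily read volume sits far above the historical mean.
All numbers are synthetic stand-ins.
"""
from statistics import mean, stdev

# (principal, gigabytes read from the model-weight store today) - synthetic data
todays_reads = {
    "inference-service": 1.2,
    "eval-pipeline": 0.9,
    "backup-job": 1.1,
    "contractor-laptop-17": 42.0,   # anomalously large read
}

# Historical daily read volumes (GB) used to establish a baseline.
baseline = [1.0, 1.3, 0.8, 1.1, 0.9, 1.2, 1.0, 1.4]
mu, sigma = mean(baseline), stdev(baseline)

def is_anomalous(gb: float, threshold: float = 4.0) -> bool:
    """Flag reads more than `threshold` standard deviations above the baseline mean."""
    return (gb - mu) / sigma > threshold

for principal, gb in todays_reads.items():
    if is_anomalous(gb):
        print(f"ALERT: {principal} read {gb} GB (baseline {mu:.1f} GB +/- {sigma:.1f})")
```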

Hardware-integrated monitoring and verification (1.2.4 Security Infrastructure)
Integrating monitoring capabilities directly into hardware, such as secure counters and tamper-evident seals, along with deploying specialized firmware that can detect and respond to attempts at parameter theft or physical attacks. This also includes verification tools, such as hardware-level logging and secured audit trails that remain verifiable under sophisticated tampering attempts for rapid, evidence-based incident response.
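The software analogue of a tamper-evident audit trail is a hash-chained log, sketched below: each entry commits to the hash of the previous one, so editing or deleting history breaks verification. A hardware-integrated design would additionally anchor the chain head in a secure counter, TPM, or enclave so an attacker cannot silently rewrite the whole chain; that part is not shown.

```python
"""Sketch of a hash-chained, append-only audit log (tamper-evident in software).

Illustrative only; real designs anchor the chain head in tamper-resistant hardware.
"""
import hashlib
import json

def _entry_hash(prev_hash: str, record: dict) -> str:
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append(log: list, record: dict) -> None:
    """Append a record, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    log.append({"record": record, "hash": _entry_hash(prev_hash, record)})

def verify(log: list) -> bool:
    """Recompute the chain; any edited or deleted entry breaks all later hashes."""
    prev_hash = "0" * 64
    for entry in log:
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

log: list = []
append(log, {"event": "weights_read", "actor": "inference-service"})
append(log, {"event": "firmware_update", "actor": "ops-admin"})
assert verify(log)

log[0]["record"]["actor"] = "attacker"   # tamper with history
print("log intact?", verify(log))        # -> False
```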

Specialized chips to compute encrypted data (1.2.4 Security Infrastructure)
Designing and deploying hardware accelerators optimized for computations on encrypted data, such as homomorphic encryption schemes, to facilitate efficient encrypted training and inference without exposing plaintext model parameters or sensitive input data outside the protected hardware boundary.
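To show why computing on encrypted data is possible at all, here is a textbook Paillier toy with deliberately tiny, insecure parameters: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. Practical encrypted ML typically uses lattice-based schemes such as CKKS, often with hardware acceleration; this sketch only illustrates the additive homomorphism.

```python
"""Toy additively homomorphic encryption (textbook Paillier, tiny insecure parameters)."""
import math
import secrets

# Demo parameters - completely insecure, for illustration only.
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)   # mu = (L(g^lam mod n^2))^-1 mod n

def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 2) + 2            # random r coprime to n
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 2) + 2
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c: int) -> int:
    return ((pow(c, lam, n_sq) - 1) // n * mu) % n

a, b = 17, 25
c = (encrypt(a) * encrypt(b)) % n_sq   # multiply ciphertexts ...
print(decrypt(c))                      # ... decrypts to a + b = 42
```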

Tamper-evidence and tamper-proofing (1.2.4 Security Infrastructure)
Implementing tamper-resistant enclosures, seals, and other tamper-evident mechanisms to ensure that any unauthorized physical access or modification attempts are detectable. Such measures help maintain the integrity of hardware components and prevent adversaries from compromising the system at a physical level.

Datacenter security (1.2.4 Security Infrastructure)
Relevant research focuses on designing and deploying resilient hardware- and software-based defenses to prevent model theft and sabotage. This includes methods like encrypted computation, secure enclaves, continuous anomaly detection, zero-trust architectures, and rigorous supply chain verification to protect against both external intrusions and insider threats.

Theoretical foundations and provable safety in AI systems (2.4.1 Research & Foundations)
Advancing the theoretical foundations of AI safety by building models and frameworks that ensure provably correct and robust behavior. These efforts span from verifiable architectures and formal verification methods to embedded agency, decision theory, incentive structures aligned with causal reasoning, and control theory.

Theoretical foundations and provable safety in AI systems > Building verifiable and robust AI architectures (1.1.4 Model Architecture)
Constructing AI systems with architectures that support formal verification and robustness guarantees, such as world models that enable safe and reliable planning, or guaranteed safe AI with Bayesian oracles. This area emphasizes simplicity and transparency to aid in provability.
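As a minimal (and entirely made-up) illustration of the world-model idea, the sketch below plans in a toy gridworld and only considers actions whose model-predicted next state satisfies an explicit safety specification; the guarantee is exactly as strong as the model and the specification, which is why this line of work emphasizes simple, transparent components.

```python
"""Toy sketch: using an explicit world model to filter unsafe plans.

The grid, dynamics, and safety spec below are made up for illustration; they are
not any published architecture. The point is that explicit models and specs make
each candidate action checkable before execution.
"""
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
HAZARDS = {(2, 2), (3, 1)}          # states the safety spec forbids
GOAL = (4, 4)

def predict(state, action):
    """World model: predicted next state (clipped to the 5x5 grid)."""
    dx, dy = ACTIONS[action]
    return (min(max(state[0] + dx, 0), 4), min(max(state[1] + dy, 0), 4))

def safe(state):
    """Safety specification: never enter a hazard cell."""
    return state not in HAZARDS

def choose_action(state):
    """Greedy planner that only considers actions the model predicts to be safe."""
    candidates = [a for a in ACTIONS if safe(predict(state, a))]
    return min(candidates,
               key=lambda a: abs(predict(state, a)[0] - GOAL[0])
                           + abs(predict(state, a)[1] - GOAL[1]))

state = (0, 0)
for _ in range(12):
    state = predict(state, choose_action(state))
    assert safe(state)
print("reached:", state)
```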

Theoretical foundations and provable safety in AI systems > Formal verification of AI systems (2.2.2 Testing & Evaluation)
Applying formal methods to verify that AI models and algorithms meet stringent safety, robustness, and performance criteria. This includes proving resilience against adversarial inputs and perturbations, and certifying conformance to specified safety properties under varying conditions.
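One concrete technique in this space is interval bound propagation: propagate interval bounds on the input through each layer, then check that the worst-case logit of the labeled class still exceeds the best-case logit of every other class. The sketch below does this for a tiny ReLU network with random stand-in weights, not a trained model.

```python
"""Sketch: interval bound propagation (IBP) to certify robustness of a tiny ReLU net."""
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)    # hidden layer
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)    # 3-class output layer

def interval_affine(lo, hi, W, b):
    """Exact interval image of an affine map, via the positive/negative parts of W."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def certified(x, true_class, eps):
    """True if every input within L-inf radius eps is classified as true_class."""
    lo, hi = x - eps, x + eps
    lo, hi = interval_affine(lo, hi, W1, b1)
    lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)        # ReLU is monotone
    lo, hi = interval_affine(lo, hi, W2, b2)
    others = [j for j in range(len(lo)) if j != true_class]
    return lo[true_class] > max(hi[j] for j in others)

x = np.array([0.5, -0.2, 0.1, 0.9])
label = int(np.argmax(W2 @ np.maximum(W1 @ x + b1, 0) + b2))
for eps in (0.0, 0.01, 0.05, 0.2):
    print(f"eps={eps}: certified={certified(x, label, eps)}")
```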

Theoretical foundations and provable safety in AI systems > Decision theory and rational agency (2.4.1 Research & Foundations)
Establishing formal decision-making frameworks that ensure rational and safe choices by AI agents, potentially drawing on concepts like causal and evidential decision theory.
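The difference between the two theories can be made concrete with the standard Newcomb's-problem numbers (a 99%-accurate predictor, $1,000 in the transparent box, $1,000,000 in the opaque one); the short calculation below shows evidential decision theory favoring one-boxing while causal decision theory's dominance argument favors two-boxing.

```python
"""Worked example: evidential vs causal decision theory on Newcomb's problem.

Standard textbook numbers; only meant to make the two expected-utility
calculations concrete.
"""
ACCURACY = 0.99          # P(predictor was right | your action)
SMALL, BIG = 1_000, 1_000_000

# Evidential decision theory: treat your own action as evidence about the prediction.
edt_one_box = ACCURACY * BIG                     # money is there with prob 0.99
edt_two_box = (1 - ACCURACY) * BIG + SMALL       # money is there with prob 0.01

# Causal decision theory: the box was filled before you chose, so evaluate both
# actions under the same belief p that the money is already there.
def cdt(p_big_already_there: float) -> dict:
    return {
        "one_box": p_big_already_there * BIG,
        "two_box": p_big_already_there * BIG + SMALL,   # dominates for every p
    }

print("EDT:", {"one_box": edt_one_box, "two_box": edt_two_box})
print("CDT (p=0.5):", cdt(0.5))
```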

Theoretical foundations and provable safety in AI systems > Embedded agency (2.4.1 Research & Foundations)
Exploring how agents can model and reason about themselves and their environment as interconnected parts of a single system, addressing challenges like self-reference, resource constraints, and the stability of reasoning processes. This includes tackling problems arising from the lack of a clear boundary between the agent and its environment.

Theoretical foundations and provable safety in AI systems > Causal incentives (2.4.1 Research & Foundations)
Developing frameworks that formalize how to align agent incentives with safe and desired outcomes by ensuring their causal understanding matches intended objectives. This research provides a formal language for guaranteeing safety, addressing challenges like goal misspecification, and complementing broader efforts in agent foundations and robust system design.

Expert Survey: AI Reliability & Security Research Priorities
O'Brien, Joe; Dolan, Jeremy; Kim, Jay; Dykhuizen, Jonah; Sania, Jeba; Becker, Sebastian; Kraprayoon, Jam; Labrador, Cara (2025)
Our survey of 53 specialists across 105 AI reliability and security research areas identifies the most promising research prospects to guide strategic AI R&D investment. As companies are seeking to develop AI systems with broadly human-level capabilities, research on reliability and security is urgently needed to ensure AI's benefits can be safely and broadly realized and prevent severe harms. This study is the first to quantify expert priorities across a comprehensive taxonomy of AI safety and security research directions and to produce a data-driven ranking of their potential impact. These rankings may support evidence-based decisions about how to effectively deploy resources toward AI reliability and security research.

Other (multiple stages): Applies across multiple lifecycle stages.
Deployer: Entity that integrates and deploys the AI system for end users.
Manage: Prioritising, responding to, and mitigating AI risks.