The AI system to be deployed is demonstrated to be valid and reliable. Limitations of the generalizability beyond the conditions under which the technology was developed are documented.

US_NIST (2024) | LLM classified

Mitigation Taxonomy

2 Organisation

2.2 Risk & Assurance

2.2.2 Testing & Evaluation

Red teaming, capability evaluations, adversarial testing, and performance verification.

Also in Risk & Assurance

2.2.1 Risk Assessment | 2.2.3 Auditing & Compliance | 2.2.4 Assurance Documentation

LLM Classification Details

Reasoning

Documents system validity, reliability, and generalization limitations as evidence supporting deployment readiness.

Code: 2.2.4 | Version: v0.6 | Classified: Feb 6, 2026

Sub-mitigations (6)

Avoid extrapolating GAI system performance or capabilities from narrow, nonsystematic, and anecdotal assessments.

2.2.2 Testing & Evaluation

Lifecycle: Verify and Validate | Actor: Developer | AIRM: Measure

Document the extent to which human domain knowledge is employed to improve GAI system performance, via, e.g., RLHF, fine-tuning, retrieval-augmented generation, content moderation, business rules.

2.2.4 Assurance Documentation

Lifecycle: Verify and Validate | Actor: Developer | AIRM: Measure

Review and verify sources and citations in GAI system outputs during pre-deployment risk measurement and ongoing monitoring activities.

2.3.3 Monitoring & Logging

Lifecycle: Verify and Validate | Actor: Other (multiple actors) | AIRM: Measure

Track and document instances of anthropomorphization (e.g., human images, mentions of human feelings, cyborg imagery or motifs) in GAI system interfaces.

2.2.1 Risk Assessment

Lifecycle: Plan and Design | Actor: Deployer | AIRM: Measure

Verify GAI system training data and TEVV data provenance, and that fine-tuning or retrieval-augmented generation data is grounded.

2.2.1 Risk Assessment

Lifecycle: Collect and Process Data | Actor: Deployer | AIRM: Measure

Regularly review security and safety guardrails, especially if the GAI system is being operated in novel circumstances. This includes reviewing reasons why the GAI system was initially assessed as being safe to deploy.

2.2.2 Testing & Evaluation

Lifecycle: Operate and Monitor | Actor: Deployer | AIRM: Measure

Other mitigations from US_NIST (2024) (260)

Legal and regulatory requirements involving AI are understood, managed, and documented.

2.1.3 Policies & Procedures

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Govern

Legal and regulatory requirements involving AI are understood, managed, and documented. > Align GAI development and use with applicable laws and regulations, including those related to data privacy, copyright and intellectual property law.

2.1.3 Policies & Procedures

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Govern

The characteristics of trustworthy AI are integrated into organizational policies, processes, procedures, and practices.

2.1.3 Policies & Procedures

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Govern

The characteristics of trustworthy AI are integrated into organizational policies, processes, procedures, and practices. > Establish transparency policies and processes for documenting the origin and history of training data and generated data for GAI applications to advance digital content transparency, while balancing the proprietary nature of training approaches.

2.1.3 Policies & Procedures

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Govern

The characteristics of trustworthy AI are integrated into organizational policies, processes, procedures, and practices. > Establish policies to evaluate risk-relevant capabilities of GAI and robustness of safety measures, both prior to deployment and on an ongoing basis, through internal and external evaluations.

2.1.3 Policies & Procedures

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Govern

Processes, procedures, and practices are in place to determine the needed level of risk management activities based on the organization’s risk tolerance.

2.1.3 Policies & Procedures

Lifecycle: Other (outside lifecycle) | Actor: Governance Actor | AIRM: Govern

Source Document

Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1)

US National Institute of Standards and Technology (NIST) (2024)

This document is a cross-sectoral profile of and companion resource for the AI Risk Management Framework (AI RMF 1.0) for Generative AI, pursuant to President Biden’s Executive Order (EO) 14110 on Safe, Secure, and Trustworthy Artificial Intelligence. The AI RMF was released in January 2023, and is intended for voluntary use and to improve the ability of organizations to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems.

DOI: 10.6028/nist.ai.600-1

Classification

AI Lifecycle Stage

Verify and Validate

Testing, evaluating, auditing, and red-teaming the AI system

Responsible Actor

Deployer

Entity that integrates and deploys the AI system for end users

NIST AI RMF Function

Measure

Quantifying, testing, and monitoring identified AI risks

Risk Domains

Primary

7.3 Lack of capability or robustness