Shared theoretical frameworks, research tools, and foundational resources for the field.
An AI Assurance Knowledge Base (AAKB) that makes AI assurance knowledge available to inform the design of assurance investigations. The AAKB gives an AI assurance investigator information on AI assurance metrics, datasets, methodologies, and tools relevant to their assurance goals, along with pointers to similar investigations that have been carried out in the past. It captures information from relevant sources, including scientific publications, government-developed capabilities, and commercial offerings, and allows an investigator to search all metadata through semantic search. Search results can be examined in depth or summarized with large language model (LLM) support. MITRE will continuously and collaboratively expand the AAKB with new knowledge, insights, and best practices from the field, including community contributions, and will also incorporate rapidly shared anonymized incidents and mitigation approaches from the MITRE Adversarial Threat Landscape for AI Systems (ATLAS) community.
Reasoning
Shared research resource providing ecosystem-wide access to AI assurance methodologies, tools, and knowledge for investigator use.
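A minimal sketch of the kind of semantic search over assurance metadata the AAKB describes: entries (metrics, datasets, methodologies, tools) are embedded once, and an investigator's goal statement is matched against them by similarity. The entry schema, the bag-of-words stand-in for a real embedding model, and the example records are illustrative assumptions, not the AAKB's actual interface.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class AssuranceEntry:
    name: str          # e.g., a metric, dataset, methodology, or tool
    kind: str          # "metric" | "dataset" | "methodology" | "tool"
    description: str   # metadata text that gets embedded for search

def embed(text: str) -> Counter:
    # Placeholder embedding: a bag-of-words vector. A real system would use
    # a sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, entries: list[AssuranceEntry], top_k: int = 3):
    qv = embed(query)
    scored = [(cosine(qv, embed(e.description)), e) for e in entries]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

entries = [
    AssuranceEntry("robustness-bench", "dataset",
                   "adversarial robustness evaluation dataset for image classifiers"),
    AssuranceEntry("calibration-error", "metric",
                   "expected calibration error metric for classifier confidence"),
]
for score, entry in search("evaluate robustness of an image classifier", entries):
    print(f"{score:.2f}  {entry.kind:12s} {entry.name}")
```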
Discover Assurance Needs
Discovering assurance needs requires a comprehensive understanding of the mission problem and the proposed AI solution. That understanding starts with a decomposition of the use case, the AI solution under consideration, and the anticipated effects, which allows for the identification of problem-specific AI assurance needs and their potential impacts across the life cycle. One of the primary goals of this step is to discover alternative or potentially new assurance concerns and to identify trade-offs across assurance requirements. This step is similar to hazard and risk identification in other fields, but with an emphasis on discovery, since the risks related to AI can be complex, emergent, and tied to human and societal response, perception, and values. AI risks are not yet as well understood as those in more mature fields such as machinery safety.

Discovery begins with understanding the AI-enabled system's scope, intended use, interactions with its environment and other systems, and foreseeable misuse. Evidence and known issues associated with similar systems can inform the discovery and identification of relevant potential harms, hazards, and risks. Similarly, expert or data-driven models that suggest relevant but not yet considered assurance needs can facilitate the analysis of consequential assurance concerns for a given use case. Combinations of common risk-identification methods from the literature can also be used, with the ultimate purpose of comprehensively identifying assurance needs.
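One way to keep that decomposition concrete is a record that forces each discovery session to spell out the use case, the AI solution, the operating environment, foreseeable misuse, and the assurance needs surfaced, so trade-offs can be compared across needs later. The field names and example values below are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class AssuranceNeed:
    concern: str            # e.g., a problem-specific failure or harm
    affected_stage: str     # life-cycle stage where the impact lands
    evidence: list[str]     # known issues from similar systems, incident reports

@dataclass
class DiscoveryRecord:
    use_case: str
    ai_solution: str
    environment: str                  # interacting systems and operating context
    foreseeable_misuse: list[str]
    needs: list[AssuranceNeed] = field(default_factory=list)

record = DiscoveryRecord(
    use_case="triage incoming maintenance reports",
    ai_solution="LLM-based report summarizer",
    environment="feeds a human dispatcher's queue",
    foreseeable_misuse=["operator accepts summaries without reading the source"],
)
record.needs.append(AssuranceNeed(
    concern="summary omits safety-critical defect details",
    affected_stage="operation",
    evidence=["omission failures reported for similar summarization systems"],
))
```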
Characterize and Prioritize Risks
The next step is to conduct qualitative estimations that characterize the risks associated with the assurance needs. Risk assessment guidelines are available (e.g., NIST SP800-30; see [4]) that describe systematic approaches to this initial identification of potential hazards and risks, as well as to estimating risk severity, likelihood, and tolerance. Authoritative resources, subject matter experts, and preliminary evaluations are common sources for risk estimation, and details of the system's design and implementation are necessary for this activity. Risks are then prioritized for further consideration and evaluation; the risk estimates can inform the level and form of evaluation needed for each risk. A critical point of assurance, compared to more conventional risk management, is the consideration of and emphasis on mission needs when prioritizing risk. This consideration is particularly important for systems that apply general-purpose AI models to specific missions: the general risks of such models may already be well understood, but their mission-specific risks are not.
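A hedged sketch of this kind of qualitative characterization: each risk gets ordinal severity and likelihood ratings in the style of NIST SP800-30, and mission relevance is used as an explicit weight so the mission-specific risks of a general-purpose model are not drowned out by its well-known generic risks. The scales and weighting rule are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass

SCALE = {"low": 1, "moderate": 2, "high": 3, "very_high": 4}

@dataclass
class Risk:
    name: str
    severity: str           # ordinal rating from SMEs or preliminary evaluation
    likelihood: str
    mission_relevance: str  # how directly the risk touches the mission use case

def priority(r: Risk) -> int:
    # Conventional severity x likelihood, weighted by mission relevance.
    base = SCALE[r.severity] * SCALE[r.likelihood]
    return base * SCALE[r.mission_relevance]

risks = [
    Risk("generic prompt injection", "high", "high", "low"),
    Risk("wrong triage of urgent reports", "high", "moderate", "very_high"),
]
for r in sorted(risks, key=priority, reverse=True):
    print(f"{priority(r):3d}  {r.name}")
```

Note how the mission weighting reorders the list: the well-known generic risk scores lower than the less severe but mission-critical one.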
Evaluate Risks
The risk assessment conducted across this and the previous steps of the assurance process should be sufficiently comprehensive to support the assurance investigation goals (see Section 7 for potential utilizations of this AI assurance process). Depending on the investigation's purpose, risk evaluation may entail measuring and quantifying risks using standard test and evaluation (T&E) protocols to determine whether the intended level of assurance and probability of harmful failure are met. A T&E plan covers key aspects such as AI algorithm testing, systems integration testing, human-systems integration testing, and operational testing. Risk evaluation can take many forms, depending on the system, the desired level of assurance, stakeholders' tolerance for risk, and the risks under consideration. Alternatively, risk evaluation may focus primarily on risk discovery. For example, early in AI development it may be helpful to conduct lightweight "investigations" that pull in stakeholders to interact with preliminary, even paper-based, prototypes, where the focus is on gaining an initial and sufficient empirical understanding of the risks rather than their conclusive, quantitative characterization. Although such a use of this AI assurance process may still follow standard T&E practices, it may not execute them with the same rigor, given that a more exhaustive and conclusive risk evaluation would be conducted after the AI prototype matures (e.g., prior to acquisition and/or deployment).
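To decide whether a measured probability of harmful failure meets a required level of assurance, one standard statistical move is an exact (Clopper-Pearson) upper confidence bound on the failure rate from n test runs with k observed failures. The test-campaign numbers and the 1% tolerance below are invented for illustration; the bound itself is computed by bisection on the binomial CDF.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    # P(X <= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def failure_rate_upper_bound(k: int, n: int, conf: float = 0.95) -> float:
    # Smallest p with P(X <= k | n, p) <= 1 - conf, found by bisection.
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > 1 - conf:
            lo = mid
        else:
            hi = mid
    return hi

# Example: 2 harmful failures observed in 1,000 operational test runs.
bound = failure_rate_upper_bound(k=2, n=1000)
required = 0.01  # assumed tolerance: at most a 1% harmful-failure rate
print(f"95% upper bound on failure rate: {bound:.4f}")
print("meets requirement" if bound <= required else "does not meet requirement")
```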
Manage Risks
Once risks have been prioritized and evaluated, potential courses of action must be developed to manage each risk and reach the assurance level designated as acceptable in the evaluation. One way to do this is to employ a risk response strategy designed to reduce or remove risk, for instance by transferring, sharing, avoiding, or mitigating each risk to an acceptable level (e.g., NIST SP800-39; see [7]). High-consequence risks will require detailed implementation plans that state the technical, algorithmic, and/or procedural controls needed to ensure that systems behave as intended. Residual risks from any unmitigated risks should be documented and kept within the system's acceptable safety, security, and trustworthiness limits.
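A sketch of one way to record a risk-response decision in the SP800-39 vocabulary: each prioritized risk gets a strategy, its controls, and an explicit residual-risk rating that is checked against the tolerance set during evaluation. The fields, ordinal scale, and example are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class Strategy(Enum):
    ACCEPT = "accept"
    AVOID = "avoid"
    MITIGATE = "mitigate"
    TRANSFER = "transfer"
    SHARE = "share"

@dataclass
class RiskResponse:
    risk: str
    strategy: Strategy
    controls: list[str] = field(default_factory=list)  # technical/procedural
    residual_level: int = 0     # 0-4 ordinal rating after controls are applied
    tolerance: int = 1          # maximum acceptable residual level

    def is_acceptable(self) -> bool:
        # Residual risk must stay within the limits set during evaluation.
        return self.residual_level <= self.tolerance

response = RiskResponse(
    risk="wrong triage of urgent reports",
    strategy=Strategy.MITIGATE,
    controls=["human review of all 'low urgency' classifications",
              "confidence threshold below which reports escalate"],
    residual_level=1,
)
assert response.is_acceptable()
```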
AI Assurance Needs Discovery Protocol
A standardized, multi-dimensional protocol designed to discover, identify, and prioritize problem-specific AI assurance needs. The protocol, with key stakeholder input, facilitates that exploration by decomposing the mission problem and the AI-enabled solution under consideration and surfacing AI assurance concerns. It is applied before the associated risks are measured and mitigated, and as such serves as the "front end" of the AI assurance process.
LLM Secure Integrated Research Environment (SIREN)
A sandbox environment that enables rapid prototyping of capabilities to explore safe, appropriate, and effective use of LLMs. A key focus is testing and evaluation of LLM-based applications, especially augmented LLMs, to discover risks and identify potential mitigations. SIREN provides:
– Best practices for evaluating augmented LLM-based systems
– Guidance on developing use case-specific evaluation protocols, datasets, and metrics, and on generating synthetic benchmark datasets (a minimal metric example follows this list)
– Rapid benchmarking of augmented LLMs for use cases, supported by recent advances in statistical inference, including methods for assessing retrieval quality, answer synthesis, and hallucinations
– Reference implementations of common paradigms for building applications with LLMs, including retrieval-augmented LLMs, knowledge graph-enabled LLMs, and LLM "agents," as starting points for developing targeted LLM-based systems
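A small illustration of the kind of use case-specific metric the guidance above points at: recall@k for the retrieval step of a retrieval-augmented LLM, computed against a hand-built benchmark. The benchmark items and choice of k are invented for the example; they stand in for a use case-specific evaluation dataset.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of SME-marked relevant documents found in the top-k results.
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

benchmark = [
    # (query, ids the retriever returned in rank order, ids a SME marked relevant)
    ("overheating failure modes", ["d7", "d2", "d9"], {"d2", "d4"}),
    ("inspection interval policy", ["d1", "d5", "d3"], {"d1"}),
]
k = 3
scores = [recall_at_k(ret, rel, k) for _, ret, rel in benchmark]
print(f"mean recall@{k}: {sum(scores) / len(scores):.2f}")
```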
AI ASSURANCE: A Repeatable Process for Assuring AI-enabled Systems
Robbins, Douglas; Eris, Ozgur; Kapusta, Ariel; Booker, Lashon; Ward, Paul (2024)
Federal agencies are being encouraged by the White House to remove barriers to innovation, accelerate the use of artificial intelligence (AI) tools, and leverage AI to better fulfill their missions, all while setting up guardrails to mitigate risks. Increasing the use of AI in government activities will likely have a consequential impact on the nation and the world, in areas ranging from transportation to more efficient government to strengthened national security. Given this promise, how do we assure that these systems function as intended and are safe, secure, and trustworthy? In the last two years, the U.S. has made progress in addressing these concerns, most notably through the creation and publication of the National Institute of Standards and Technology (NIST) AI Risk Management Framework (RMF) (Tabassi, 2023) and the recent AI executive order (EO) from the Biden administration (U.S. Office of the President, 2023). Still, there are significant gaps in our current understanding of the risks posed by AI-enabled applications when they support consequential government functions. While the NIST AI RMF and AI EO actions are useful catalysts, a repeatable engineering approach for assuring AI-enabled systems is required to extract maximum value from AI while protecting society from harm. In this paper, we articulate AI assurance as a process for discovering, assessing, and managing risk throughout an AI-enabled system's life cycle to ensure it operates effectively for the benefit of its stakeholders. The process is designed to be adaptable to different contexts and sectors, making it relevant to the national discussion on regulating artificial intelligence.
Other (outside lifecycle)
Outside the standard AI system lifecycle
Other
Actor type not captured by the standard categories
Manage
Prioritizing, responding to, and mitigating AI risks