


Understanding in-context learning, reasoning, and scaling behavior

O'Brien (2025) | LLM classified

Mitigation Taxonomy

2 Organisation

2.4 Engineering & Development

2.4.1 Research & Foundations

Foundational safety research, theoretical understanding, and scientific inquiry informing AI development.

Also in Engineering & Development

2.4.2 Design Standards | 2.4.3 Development Workflows | 2.4.4 Training & Awareness

Definition (p. 29)

Methods to gain a comprehensive understanding of how large language models learn, reason, and scale, such as by examining in-context learning (ICL) mechanisms, the influence of data and design on behavior, the theoretical foundations of scaling, the emergence of advanced capabilities, and the nature of reasoning.

LLM Classification Details

Reasoning

Investigates foundational model learning, reasoning, and scaling mechanisms through research.

Code: 2.4.1 | Version: v0.5 | Classified: Jan 22, 2026

Sub-mitigations (7)

Mechanistic understanding of In-Context Learning

Investigating the internal processes by which transformers perform ICL, including whether these processes resemble emergent optimization behavior, advanced pattern-matching, or other structural mechanisms. This research may include scenario-based analyses to identify the circuits critical for ICL under artificial constraints.

2.4.1 Research & Foundations

Lifecycle: Other (outside lifecycle) | Actor: Other | AIRM: Map

Influences on ICL behavior and performance

Examining how the tasks, instructions, pre-training data distribution, and design choices (e.g., instruction tuning, model size, training duration) shape the range and reliability of behaviors that can be specified in-context.

2.4.1 Research & Foundations

Lifecycle: Build and Use Model | Actor: Developer | AIRM: Map

Theoretical and representational aspects of scaling

Clarifying when and how scaling drives improvements, such as by building a more robust theoretical framework to describe scaling laws, or analyzing how increasing model size and training data influence learned representations.

2.4.1 Research & Foundations

Lifecycle: Other (outside lifecycle) | Actor: Developer | AIRM: Map
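
As a rough illustration of what describing a scaling law can involve, the sketch below fits a simple power law, loss = a * size^(-alpha), to (model size, loss) pairs by least squares in log-log space. This is only one common functional form and is not taken from O'Brien (2025); the function names, the synthetic data, and the constants TRUE_A and TRUE_ALPHA are all hypothetical and chosen so the fit can be checked against a known answer.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by ordinary least squares on (log size, log loss)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line equals -alpha
    intercept = mean_y - slope * mean_x    # intercept equals log(a)
    return math.exp(intercept), -slope


def predict_loss(a, alpha, size):
    """Extrapolate the fitted power law to a new model size."""
    return a * size ** (-alpha)


# Synthetic, noise-free data from a known law (assumed values, for illustration only).
TRUE_A, TRUE_ALPHA = 10.0, 0.1
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in sizes]

a_hat, alpha_hat = fit_power_law(sizes, losses)
extrapolated = predict_loss(a_hat, alpha_hat, 1e10)
```

On real measurements the points would be noisy, and a pure power law may fit poorly. Choosing and justifying the functional form is part of the open theoretical question this sub-mitigation describes.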

Emergence and task-specific scaling patterns

Formalizing and forecasting the emergence of new capabilities as models scale, investigating whether scaling alone can produce certain capabilities, and designing methods for discovering task-specific scaling laws.

2.4.1 Research & Foundations

Lifecycle: Plan and Design | Actor: Developer | AIRM: Map

Impact of scaling and training on reasoning capabilities

Determining whether and how increases in model size and training complexity enhance reasoning abilities, and identifying which aspects of training conditions and data sources facilitate the acquisition of reasoning skills.

2.4.1 Research & Foundations

Lifecycle: Build and Use Model | Actor: Developer | AIRM: Map

Mechanistic understanding and limits of LLM reasoning

Examining the underlying mechanisms of reasoning in LLMs and exploring their non-deductive reasoning capabilities (e.g., causal or social reasoning).

2.4.1 Research & Foundations

Lifecycle: Other (outside lifecycle) | Actor: Other | AIRM: Map

Limits of Transformers

Defining the computational limits of transformers in supporting sophisticated reasoning.

2.4.1 Research & Foundations

Lifecycle: Other (outside lifecycle) | Actor: Other | AIRM: Map

Other mitigations from O'Brien (2025) (122)

Theoretical foundations and provable safety in AI systems

Advancing the theoretical foundations of AI safety by building models and frameworks that ensure provably correct and robust behavior. These efforts span from verifiable architectures and formal verification methods to embedded agency, decision theory, incentive structures aligned with causal reasoning, and control theory.

2.4.1 Research & Foundations

Lifecycle: Plan and Design | Actor: Developer | AIRM: Manage

Theoretical foundations and provable safety in AI systems > Building verifiable and robust AI architectures

Constructing AI systems with architectures that support formal verification and robustness guarantees, such as world models that enable safe and reliable planning, or guaranteed safe AI with Bayesian oracles. This area emphasizes simplicity and transparency to aid in provability.

1.1.4 Model Architecture

Lifecycle: Build and Use Model | Actor: Developer | AIRM: Manage

Theoretical foundations and provable safety in AI systems > Formal verification of AI systems

Applying formal methods to verify that AI models and algorithms meet stringent safety, robustness, and performance criteria. This includes proving resilience against adversarial inputs and perturbations, and certifying conformance to specified safety properties under varying conditions.

2.2.2 Testing & Evaluation

Lifecycle: Verify and Validate | Actor: Developer | AIRM: Measure

Theoretical foundations and provable safety in AI systems > Decision theory and rational agency

Establishing formal decision-making frameworks that ensure rational and safe choices by AI agents, potentially drawing on concepts like causal and evidential decision theory.

2.4.1 Research & Foundations

Lifecycle: Plan and Design | Actor: Developer | AIRM: Manage

Theoretical foundations and provable safety in AI systems > Embedded agency

Exploring how agents can model and reason about themselves and their environment as interconnected parts of a single system, addressing challenges like self-reference, resource constraints, and the stability of reasoning processes. This includes tackling problems arising from the lack of a clear boundary between the agent and its environment.

2.4.1 Research & Foundations

Lifecycle: Other (outside lifecycle) | Actor: Developer | AIRM: Unable to classify

Theoretical foundations and provable safety in AI systems > Causal incentives

Developing frameworks that formalize how to align agent incentives with safe and desired outcomes by ensuring their causal understanding matches intended objectives. This research provides a formal language for guaranteeing safety, addressing challenges like goal misspecification, and complementing broader efforts in agent foundations and robust system design.

2.4.1 Research & Foundations

Lifecycle: Plan and Design | Actor: Developer | AIRM: Manage


Source Document

Expert Survey: AI Reliability & Security Research Priorities

O'Brien, Joe; Dolan, Jeremy; Kim, Jay; Dykhuizen, Jonah; Sania, Jeba; Becker, Sebastian; Kraprayoon, Jam; Labrador, Cara (2025)

Our survey of 53 specialists across 105 AI reliability and security research areas identifies the most promising research prospects to guide strategic AI R&D investment. As companies are seeking to develop AI systems with broadly human-level capabilities, research on reliability and security is urgently needed to ensure AI's benefits can be safely and broadly realized and prevent severe harms. This study is the first to quantify expert priorities across a comprehensive taxonomy of AI safety and security research directions and to produce a data-driven ranking of their potential impact. These rankings may support evidence-based decisions about how to effectively deploy resources toward AI reliability and security research.

DOI: 10.48550/arXiv.2505.21664

Classification

AI Lifecycle Stage

Plan and Design

Designing the AI system, defining requirements, and planning development

Verify and Validate

Responsible Actor

Developer

Entity that creates, trains, or modifies the AI system

NIST AI RMF Function

Map

Identifying and documenting AI risks, contexts, and impacts

Risk Domains

Primary

7.4 Lack of transparency or interpretability

Other

7.2 AI possessing dangerous capabilities