Concept · 16 episodes

LLM Behavior Analysis

Definition

LLM behavior analysis is the broad project of characterizing what models do across inputs — capabilities, failure modes, biases, persona shifts — treating the model as a black-box object of empirical study. It’s how most safety-relevant claims about a model actually get grounded.

Episodes covering this

210
Same Website Request, Different Code — The Bias You Can't See
Biased or Personalized? The Impact of Personal Information on AI-driven Development
14 min·Jul 09, 2026
209
How 2.6 Billion Doodles Exposed the Culture Words Quietly Delete
Billions of Sketches Reveal Hidden Cultural Variation in Human Concepts
15 min·Jul 09, 2026
205
The Same AI, Two Labels: How the Pitch Beat the Product in 162 Sessions
Rating the Pitch, Not the Product: User Evaluations of LLMs Reflect Expectations More Than Performance
13 min·Jul 07, 2026
196
AI Agents Reached Opposite Conclusions From the Same Data — and Passed Review
The Agentic Garden of Forking Paths
Miao, Pritchard, Zou · Stanford University·18 min·Jul 03, 2026
174
When the AI 'Schemes,' It's Usually Just Lazy or Confused
Model Forensics: Investigating Whether Concerning Behavior Reflects Misalignment
Singh, Kroiz, Rajamanoharan et al. · MATS·28 min·Jun 25, 2026
171
The Safety Decision a Model Makes Before It Thinks a Word
Do Thinking Tokens Help with Safety?
Ri, Panigrahi, Arora · Princeton Language and Intelligence·25 min·Jun 25, 2026
149
When Cornering a Chatbot Makes It Lie: J.P. Morgan's Case for 'Playing Dead'
Is Your Agent Playing Dead? Deployed LLM Agents Exhibit Constraint-Evasive Fabrication and Thanatosis
Rodríguez, Pozanco, Borrajo · J.P. Morgan AI Research·23 min·Jun 16, 2026
144
When an AI Agent Just Copies Its Tool — And Bigger Models Copy More
When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More
Wang, Vemuri · raptorX.ai·15 min·Jun 15, 2026
143
When a Model Notices You Forged Its Own Words, And Why That Breaks Safety Tests
Prefill Awareness in Large Language Models
Wang, Mahajan, Africa et al. · Constellation / University of Wisconsin-Madison·24 min·Jun 12, 2026
128
How a Model Can Earn Full Reward and Still Resist Training
Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization
Xiao, Phuong · California Institute of Technology·29 min·Jun 11, 2026
118
Why the Best-Aligned AI Models Are the Easiest to Trick Into Producing Harm
Safety Paradox: How Enhanced Safety Awareness Leaves LLMs Vulnerable to Posterior Attack
Hoang, Le, Xu et al. · Singapore University of Technology and Design·23 min·Jun 05, 2026
113
What If a Prompt Injection Never Left? Attacks That Wait in Agent Memory
What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems
Xie, Liu, Zhang et al. · Institute of Information Engineering·27 min·Jun 04, 2026
100
How a Prompt Wrapper Lets a Frontier Model Play Poker Like an Expert
PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers
Li, Wang, Huang · IIIS·29 min·May 29, 2026
098
Finding Millions of Readable Concepts Inside a Real, Deployed AI Model
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Templeton, Conerly, Marcus et al. · Anthropic·28 min·May 29, 2026
094
Chain-of-Thought Monitoring Fails Across Languages, and Worst Where It's Needed Most
The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages
Onyame, Zhou, Thopalli et al. · University of Virginia·24 min·May 28, 2026
015
The Audit Number Isn't What You Think: Sycophancy and the Case Against Single-Prompt Bias Tests
Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor
Törnberg, Schimmel · Institute of Logic·21 min·May 03, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.