Concept · 44 episode(s)

Tool Use

Definition

Tool use is the model’s ability to call external functions — a calculator, a search engine, a code interpreter, an API — and use the results in its response. It’s what turns a chat model into something that can actually act in the world.

Episodes covering this

208
The Blank Space in Your AI Approval Box That Isn't Empty
Unicode TAG-Block Concealment of Tool-Metadata Payloads in the Model Context Protocol: An Approval-View Fidelity Gap Across Three Independent Server Implementations
· ·15 min·Jul 08, 2026
202
How Do You Know an AI Agent Actually Refused? Check the World, Not the Words
Safety Testing LLM Agents at Scale: From Risk Discovery to Evidence-Grounded Verification
Feng, Lin, Wen et al. · AntGroup / Hunan Institute of Advanced Technology·18 min·Jul 06, 2026
195
Why 'Be Careful' Does Nothing for AI Coding Agents, and What Does
Coding Agents Are Guessing: Measuring Action-Boundary Violations in Underspecified DevOps Instructions
Ji, Zhang, Xu et al. · Hong Kong University of Science and Technology·15 min·Jul 03, 2026
194
How a Robot Builds a Debugging Notebook It Can Read, Edit, and Hand to Another Robot
ASPIRE: Agentic /Skills Discovery for Robotics
Lu, Wu, Kou et al. · NVIDIA·24 min·Jul 02, 2026
192
A 32B Open Model Matched Frontier Systems By Learning to Take Notes
AutoMem: Automated Learning of Memory as a Cognitive Skill
Wu, Zhu, Zhang et al. · Stanford University·22 min·Jul 02, 2026
190
The Skill Every AI Manager Is Missing: Handing Out Exactly the Right Keys
ClawArena-Team: Benchmarking Subagent Orchestration and Dynamic Workflows in Language-Model Agents
Xiong, Ji, Qiu et al. · UNC Chapel Hill·21 min·Jul 02, 2026
187
An 8-Billion Agent That Beats Models 80 Times Its Size By Looking Things Up
An AI agent for treatment reasoning over a biomedical tool universe
Gao, Noori, Zhu et al. · Department of Biomedical Informatics·19 min·Jun 30, 2026
184
An AI Built an Undetectable Secret Channel, And Another AI Couldn't Find It
Tool Use Enables Undetectable Steganography in Multi-Agent LLM Systems
Rippin, Marshall, Africa et al. · Oxford University·19 min·Jun 30, 2026
181
How to Backpropagate Blame Through a Team of Chatbots — And When It Backfires
GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems
Yang, Alrabah, Hakkani-Tür et al. · University of Illinois Urbana-Champaign·20 min·Jun 29, 2026
175
One Crosscoder Feature Flips a Stalling Chatbot Into a Working Agent
Localizing RL-Induced Tool Use to a Single Crosscoder Feature
Shportko, Bhokare, AlZahrani et al. · Northwestern University·26 min·Jun 26, 2026
170
When a One-Liner Beats Your Agent's Clever Verification Logic
Bayesian control for coding agents
Papamarkou, Smirnov, Mazanov et al. · PolyShape / National Technical University of Athens·26 min·Jun 24, 2026
169
Why Better Bug Reports Can Make AI Coding Agents Worse
SHERLOC: Structured Diagnostic Localization for Code Repair Agents
Tamoyan, Narenthiran, Arakelyan et al. · NVIDIA / TU Darmstadt·24 min·Jun 24, 2026
168
When Turning Experience Into Code Makes Your AI Agent Dumber
Metis: Bridging Text and Code Memory for Self-Evolving Agents
Dai, He, Li et al. · The Chinese University of Hong Kong·27 min·Jun 24, 2026
167
How Teaching an AI to Predict, Not Act, Made It a Better Actor
Qwen-AgentWorld: Language World Models for General Agents
Team, Zuo, Xiao et al. · ·27 min·Jun 24, 2026
161
A Robot That Plays Before You Give It a Job, And Why That Beats Retrying
Playful Agentic Robot Learning
Zhang, Ge, Yoo et al. · University of California·19 min·Jun 19, 2026
157
When an AI Coding Agent Drives a Phone Through the Terminal, No Screen Needed
Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?
Gu, Jiang, Guo et al. · Mila–Québec AI Institute / Concordia University·24 min·Jun 19, 2026
150
Don't Kill the Loser: A Different Way to Handle Two AI Agents Colliding
CoAgent: Concurrency Control for Multi-Agent Systems
Lyu, Zhang, Wu et al. · Shanghai Jiao Tong University·32 min·Jun 16, 2026
147
Agents Fail at the Body, Not the Brain: A Self-Rewriting Scaffold That Lifts a 9B Model 44 Points
HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
Chen, Lu, Zhao et al. · ·30 min·Jun 15, 2026
144
When an AI Agent Just Copies Its Tool — And Bigger Models Copy More
When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More
Wang, Vemuri · raptorX.ai·15 min·Jun 15, 2026
142
Training a Tiny Model to Run the Plumbing Between an Agent and the World
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Wang, Wang, Taylor et al. · University of California·24 min·Jun 12, 2026
123
Five Identical Worlds, One Swapped Model: What Happens When AI Agents Run for Fifteen Days
Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy
Akkil, Kokku, Vikram et al. · Emergence AI·30 min·Jun 09, 2026
120
How an AI Agent Rewrites Its Own Tools, Without an Answer Key
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Pan, Liu, Lin et al. · City University of Hong Kong·30 min·Jun 05, 2026
110
How an Agent Got 44 Points Better by Mining Its Own Scratch Paper
Inducing Reasoning Primitives from Agent Traces
Lei, Yan, Momo et al. · Carnegie Mellon University·27 min·Jun 03, 2026
108
The Reasoning Cliff: Why Thinking Longer Makes Models Worse at Exact Step-by-Step Tasks
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
Guo, Wu, Yiu · The University of Hong Kong·32 min·Jun 03, 2026
105
The Trojan Is Your Agent's Memory: Why Single-Step Defenses Miss Persistent Attacks
From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors
Tan, Dou, Yang et al. · Gaoling School of Artificial Intelligence·26 min·Jun 01, 2026
104
How Making a Research Agent Smarter Quietly Makes It Leak Your Secrets
MosaicLeaks:Privacy Risks in Querying-in-the-Open for Deep Research Agents
Gurung, Gella, Drouin et al. · University of Edinburgh·25 min·Jun 01, 2026
100
How a Prompt Wrapper Lets a Frontier Model Play Poker Like an Expert
PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers
Li, Wang, Huang · IIIS·29 min·May 29, 2026
097
Same Tokens, Same Cost, Wildly Different Results: What Actually Scales in AI Agents
Scaling Laws for Agent Harnesses via Effective Feedback Compute
Zhang, Wang, Xu et al. · Harbin Institute of Technology·25 min·May 29, 2026
089
When AI-Written Papers Read Well But the Evidence Underneath Is Broken
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
Meng, Mishra, Chen et al. · Google Cloud AI Research·32 min·May 27, 2026
067
An AI Just Solved a 1996 Erdős Problem—and the Simplest Agent Won
Advancing Mathematics Research with AI-Driven Formal Proof Search
Tsoukalas, Kovsharov, Shirobokov et al. · Google DeepMind·31 min·May 22, 2026
066
Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
Hu, Zhang, Xu et al. · Tongyi Lab·26 min·May 22, 2026
063
Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
Winston, Wang, Mirhoseini et al. · Stanford University·26 min·May 21, 2026
062
Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety
Hallucination as Exploit: Evidence-Carrying Multimodal Agents
Zhang, Zheng, Yang · Shenzhen University·24 min·May 20, 2026
059
Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward
Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs
Lu, Wang, Lu et al. · Northeastern University·22 min·May 20, 2026
057
How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack
ADR: An Agentic Detection System for Enterprise Agentic AI Security
Li, Hu, Xu et al. · Uber Technologies·28 min·May 19, 2026
040
Two Frozen Models Learn to Whisper: Coupling Through Hidden States
The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
Flamant, Ghai, Shimizu · AWS Agentic AI·29 min·May 13, 2026
039
When Smarter Agents Get Fooled by Three Extra Nodes in a Database
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
Kereopa-Yorke, Diaz, Wright et al. · Microsoft·31 min·May 12, 2026
035
Why Frontier Agents Ask for Clarification at Exactly the Wrong Moment
Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?
Gulati, Gupta, Lumer et al. · PricewaterhouseCoopers U.S.·29 min·May 11, 2026
029
Why Forty-Eight Percent on FrontierMath Isn't the Real Story in DeepMind's New Math Paper
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
Zheng, Glehn, Zwols et al. · Google DeepMind·20 min·May 08, 2026
024
An AI Agent That Found 28 Zero-Days in Windows — And What Made It Work
Agentic Vulnerability Reasoning on Windows COM Binaries
Lee, Kim, Zhang · University of Illinois at Urbana-Champaign·22 min·May 07, 2026
021
Ten Thousand Examples Beat the Full Industrial Pipeline for Search Agents
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Du, Ye, Tang et al. · Shanghai Jiao Tong University·14 min·May 06, 2026
020
The Compliance Gap: Why AI Says Yes and Does No
The Compliance Gap: Why AI Systems Promise to Follow Process Instructions but Don't
Shin · Polymath Minds AI Lab·28 min·May 06, 2026
016
Why Your Coding Agent Stalls While the GPU Runs Hot
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems
Wang, Ye, Xu et al. · Duke University·24 min·May 03, 2026
011
When RL Actually Teaches Agents Something New, And When It Doesn't
Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis
Zhai, Yan, Shao et al. · Fudan University·23 min·May 02, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.