Definition
A training recipe that teaches a small AI to know when to click around a screen and when to call a structured tool.
A three-stage training pipeline for computer-use agents combining synthetic hybrid-trajectory generation, single-turn RL on switching steps, and online RL with tool-appropriateness and path-efficiency rewards.
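
As an illustration of the third stage, the sketch below shows one way a trajectory-level reward might combine task success with tool-appropriateness and path-efficiency terms. The definition above does not specify how these rewards are computed, so everything here is an assumption made for illustration: the `Step` record, the per-step `preferred` label, the reference path length, the weights, and the choice to apply the shaping terms only to successful trajectories.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One agent step (hypothetical record, not taken from the source)."""
    action: str     # interface the agent actually used: "gui" or "tool"
    preferred: str  # assumed label for the better interface: "gui" or "tool"


def tool_appropriateness(steps: List[Step]) -> float:
    """Fraction of steps whose chosen interface matches the preferred one."""
    if not steps:
        return 0.0
    return sum(s.action == s.preferred for s in steps) / len(steps)


def path_efficiency(num_steps: int, reference_steps: int) -> float:
    """Ratio of an assumed reference path length to the actual length, capped at 1."""
    if num_steps <= 0:
        return 0.0
    return min(1.0, reference_steps / num_steps)


def trajectory_reward(
    steps: List[Step],
    success: bool,
    reference_steps: int,
    w_tool: float = 0.25,  # illustrative weight, not from the source
    w_path: float = 0.25,  # illustrative weight, not from the source
) -> float:
    """Task-success reward plus weighted shaping terms.

    Assumption: shaping is granted only on success, so a short but failed
    trajectory cannot collect an efficiency bonus.
    """
    if not success:
        return 0.0
    shaping = (
        w_tool * tool_appropriateness(steps)
        + w_path * path_efficiency(len(steps), reference_steps)
    )
    return 1.0 + shaping
```

With four steps of which three use the preferred interface, and an assumed reference path of two steps, a successful trajectory would receive 1 + 0.25 × 0.75 + 0.25 × 0.5 under these illustrative weights, while a failed one would receive 0.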