Definition
A multi-stage agent-training recipe combining demonstrations with reinforcement learning, used as a baseline against Llama-based search agents.
Hybrid Policy Training, a multi-stage SFT-plus-RL recipe for reasoning agents, used as a comparison baseline in multi-hop research-agent benchmarks.