Glossary · Term

HPT

← all terms

Definition

A multi-stage agent-training recipe combining demonstrations with reinforcement learning, used as a baseline against Llama-based search agents.

Hybrid Policy Training, a multi-stage SFT-plus-RL recipe for reasoning agents, used as a comparison baseline in multi-hop research-agent benchmarks.

Mentioned in 1 episode

  1. 009
    How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers