HPT · Glossary · AI Papers: A Deep Dive

Definition

Plain language

A multi-stage agent-training recipe combining demonstrations with reinforcement learning, used as a baseline against Llama-based search agents.

As stated in the literature

Hybrid Policy Training, a multi-stage SFT-plus-RL recipe for reasoning agents, used as a comparison baseline in multi-hop research-agent benchmarks.

Why it matters: Multi-stage SFT-plus-RL is a standard recipe, so beating it cleanly is one of the bars new training approaches have to clear.

For example, a paper introducing a new research agent might compare its results to HPT-trained models to show that its single-stage method matches or exceeds the multi-stage baseline.

Heard on the show

“LUFFY, ReLIFT, SRFT, Prefix-RFT, HPT — methods that said: don't separate the stages, blend them.”

Episode 009 — How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers

Mentioned in 1 episode

009
How Two Silent Library Bugs Quietly Invalidated a Wave of Reasoning Papers

Related terms

agent multi-hop reinforcement learning SFT