Definition
MiniMax's training stack for reinforcement learning on long agent trajectories.
A modular agent-RL system that separates model generation, the agent harness, and the training engine behind standardized interfaces, supporting both white-box and black-box agents as well as high-throughput trajectory training.
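
The three-way separation described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the names `Generator`, `RecordingGenerator`, `Agent`, `Trainer`, `CountingTrainer`, and `rollout` are not taken from MiniMax's system, and the sketch assumes that a black-box agent is one whose internals are opaque, so its trajectory can only be recorded at the model-call boundary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


@dataclass
class Step:
    prompt: str
    completion: str


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)
    reward: float = 0.0


class Generator(Protocol):
    """Model-generation interface (hypothetical): prompt in, completion out."""
    def generate(self, prompt: str) -> str: ...


class RecordingGenerator:
    """Wraps any generate function and logs every call as a trajectory step."""
    def __init__(self, fn: Callable[[str], str]) -> None:
        self._fn = fn
        self.trajectory = Trajectory()

    def generate(self, prompt: str) -> str:
        completion = self._fn(prompt)
        self.trajectory.steps.append(Step(prompt, completion))
        return completion


# Agent-harness interface (hypothetical): the agent receives a Generator and
# a task, and returns a scalar reward. Its internal logic is never inspected.
Agent = Callable[[Generator, str], float]


def echo_agent(gen: Generator, task: str) -> float:
    """Toy agent: makes two model calls, reward 1.0 if the last reply is non-empty."""
    first = gen.generate(task)
    second = gen.generate(first + " again")
    return 1.0 if second else 0.0


class Trainer(Protocol):
    """Training-engine interface (hypothetical): consumes finished trajectories."""
    def update(self, batch: List[Trajectory]) -> int: ...


class CountingTrainer:
    """Stand-in training engine: only counts the steps it would train on."""
    def update(self, batch: List[Trajectory]) -> int:
        return sum(len(t.steps) for t in batch)


def rollout(agent: Agent, fn: Callable[[str], str], task: str) -> Trajectory:
    """Runs one agent episode and returns the trajectory seen at the model boundary."""
    gen = RecordingGenerator(fn)
    gen.trajectory.reward = agent(gen, task)
    return gen.trajectory


# The lambda stands in for a real model; any callable str -> str would do.
traj = rollout(echo_agent, lambda p: p.upper(), "hello")
steps_seen = CountingTrainer().update([traj])
```

Because the agent touches only the `Generator` interface and the trainer touches only `Trajectory` objects, either side could be swapped without changing the other, which is the property the definition attributes to standardized interfaces.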