Definition
A pipeline for automatically generating verified training tasks and environments for AI agents that operate real software.
A framework for scaling reinforcement learning with verifiable rewards (RLVR) for computer-use agents. An information-barrier Generator/Discriminator pair synthesizes verified (task, environment, reward) tuples across desktop apps and 94 synthesized web mocks, producing roughly 32K training tuples.
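
Illustration
A minimal Python sketch of how a (task, environment, reward) tuple and a Generator/Discriminator split might be represented. Everything in it is an assumption made for illustration: the names TaskTuple, Proposal, generate, discriminate, toy_attempt, and synthesize are hypothetical, the file-rename task is a toy stand-in for a real application, and the reading of "information barrier" used here (the discriminator receives only the candidate tuple, never the generator's private notes) is one plausible interpretation rather than the framework's documented design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical toy environment state: filename -> file contents.
State = Dict[str, str]


@dataclass(frozen=True)
class TaskTuple:
    task: str                         # instruction given to the agent
    environment: State                # initial state of the app or web mock
    reward: Callable[[State], float]  # programmatic check on the final state


@dataclass(frozen=True)
class Proposal:
    candidate: TaskTuple
    private_notes: str  # generator-side information, kept behind the barrier


def generate(seed: int) -> Proposal:
    """Toy generator: proposes a file-rename task with a checkable reward."""
    src, dst = f"report_{seed}.txt", f"final_{seed}.txt"

    def reward(state: State) -> float:
        return 1.0 if dst in state and src not in state else 0.0

    candidate = TaskTuple(f"rename {src} to {dst}", {src: "draft"}, reward)
    return Proposal(candidate, private_notes=f"intended solution: move {src} to {dst}")


def toy_attempt(candidate: TaskTuple) -> State:
    """Toy solver that works only from the task text and the environment."""
    _, src, _, dst = candidate.task.split()
    state = dict(candidate.environment)
    state[dst] = state.pop(src)
    return state


def discriminate(candidate: TaskTuple,
                 attempt: Callable[[TaskTuple], State] = toy_attempt) -> bool:
    """Accept a tuple only if it is non-trivial and independently solvable."""
    if candidate.reward(candidate.environment) != 0.0:
        return False  # reward already fires on the initial state
    return candidate.reward(attempt(candidate)) == 1.0


def synthesize(seeds: Iterable[int]) -> List[TaskTuple]:
    """Keep only candidates that pass the discriminator."""
    verified = []
    for seed in seeds:
        proposal = generate(seed)
        # Assumed barrier: only the candidate crosses, never private_notes.
        if discriminate(proposal.candidate):
            verified.append(proposal.candidate)
    return verified
```

Calling synthesize(range(3)) in this sketch returns three verified tuples, while a candidate whose reward already fires on the initial state is rejected. In a real pipeline, the toy solver would presumably be replaced by an agent acting in the actual desktop app or web mock.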