Glossary · Term

Scale-SWE

← all terms

Definition

A high-scoring software-engineering agent system used as a baseline.

An agentic coding system scoring around 64% on SWE-bench Verified under its native harness but producing malformed output under different harnesses; used as evidence of harness-fragility in open-source agent training.

Mentioned in 1 episode

  1. 047
    When Agent Benchmarks Lie: The Harness Problem in Open-Source AI