MCPMark

Definition

A benchmark of tool-calling tasks built on top of the Model Context Protocol.

A benchmark suite that evaluates LLM tool-use agents on tasks served over MCP, including filesystem and Postgres workloads. It is used as an out-of-distribution transfer evaluation for Firefly-trained models.

Mentioned in 1 episode

  1. 059
    Firefly's Inversion: Building Verified Tool-Call Training Data by Working Backward