Definition
A benchmark that scores AI models on how accurately they forecast the outcomes of questions drawn from a real prediction market.
An LLM forecasting evaluation suite built on Kalshi prediction-market questions, reporting Brier-style metrics; cited as another current benchmark whose scoring rules cannot detect distributional commitment failures.
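For readers unfamiliar with Brier-style scoring, the following is a minimal sketch of the standard Brier score for binary questions: the mean squared difference between each forecast probability and the resolved 0/1 outcome. The function name and the sample forecasts are hypothetical illustrations, not taken from the benchmark, and the benchmark's exact metric variants may differ.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Hypothetical helper for illustration only; assumes binary questions
    where outcomes are 1 (resolved yes) or 0 (resolved no).
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Made-up data: two questions, one resolved yes and one resolved no.
hedged = brier_score([0.5, 0.5], [1, 0])     # always answers 50%
confident = brier_score([1.0, 0.0], [1, 0])  # fully confident and correct
print(hedged, confident)  # 0.25 0.0
```

Lower is better: 0.0 is a perfect score, and a forecaster who always answers 0.5 scores 0.25 on binary questions whatever the outcomes are. The score is an average over per-question point probabilities.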