WMDP

Definition

A benchmark that tests whether AI models hold hazardous knowledge that could enable mass-harm misuse.

WMDP stands for Weapons of Mass Destruction Proxy. The benchmark is designed to probe hazardous knowledge in LLMs across the biology, chemistry, and cybersecurity domains.

Also called: W-M-D-P

Mentioned in 1 episode

  1. 007
    Exploration Hacking: When Models Sabotage Their Own RL Training

Related concepts