Glossary · Term

guardrail

← all terms

Definition

A safety check meant to keep an AI from doing things it shouldn't.

A trained or programmed constraint on model behavior, typically implemented through refusal training, content filters, system-prompt rules, or runtime monitors.

Also called: guardrails

Mentioned in 4 episodes

  1. 062
    Treating Hallucinations as Exploits: A Gate-Based Architecture for Agent Safety
  2. 053
    An AI Agent Swapped In Focal Loss And Beat A Human-Tuned Training Script
  3. 049
    An AI Agent Reached for Root in Twelve Minutes, Without Being Attacked
  4. 045
    When a Frontier Model Talks Its Own Twin Into Climate Denial