Definition
A safety check meant to keep an AI from doing things it shouldn't.
A trained or programmed constraint on model behavior, typically implemented through refusal training, content filters, system-prompt rules, or runtime monitors.
Also called: guardrails