← all terms
Tricking a chatbot into doing something it was trained to refuse.
An adversarial prompting attack that bypasses a model's safety training to elicit prohibited outputs.
Also called: jailbreaks, jailbreaking