compliance

Guardrail

A guardrail is a safety layer between the user and the language model that blocks unwanted inputs or outputs, e.g. PII leaks, prompt injection, or off-topic requests.

Also known as: Safety Layer

In detail

We apply guardrails in three places:

  • Input guardrail: prompt-injection detection (e.g. 'Ignore all previous instructions'), off-topic filtering
  • Output guardrail: PII leak detection (tax IDs, IBANs, etc.) before the answer goes out
  • Tool guardrail: permission checks before a skill executes ('Is this user really allowed to cancel this order?')
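The three layers above can be sketched as simple check functions. This is a minimal illustration, not a production filter: the injection pattern, the simplified IBAN regex, and the `PERMISSIONS` store are all hypothetical placeholders.

```python
import re

# Input guardrail: naive prompt-injection phrase detection.
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]

# Output guardrail: simplified IBAN shape (country code, check digits, body).
IBAN_PATTERN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")

# Tool guardrail: hypothetical permission store mapping users to allowed skills.
PERMISSIONS = {"alice": {"cancel_order"}}


def input_guardrail(prompt: str) -> bool:
    """Return True if the prompt passes the injection check."""
    return not any(p.search(prompt) for p in INJECTION_PATTERNS)


def output_guardrail(answer: str) -> bool:
    """Return True if the answer contains no IBAN-like PII."""
    return IBAN_PATTERN.search(answer) is None


def tool_guardrail(user: str, action: str) -> bool:
    """Return True if the user is permitted to execute the requested skill."""
    return action in PERMISSIONS.get(user, set())
```

In practice each check would be far richer (classifier-based injection detection, broader PII patterns, a real authorization service), but the placement stays the same: before the model, after the model, and before any tool call.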
