Overview

LLM guardrails are programmable controls placed around a language model to restrict, shape, or validate its inputs and outputs at runtime. Unlike fine-tuning or alignment training, which alter model weights, guardrails are external mechanisms applied at inference time: they intercept data before it reaches the model, after it leaves, or at multiple points in between. They are a primary tool for making LLM-based applications safe for production use: resisting jailbreaks, blocking prompt injection, enforcing topic scope, and checking that outputs meet policy requirements.

NVIDIA’s NeMo Guardrails (open-source, Python) is a prominent toolkit for this. It models guardrails as “rails” that fire at five distinct pipeline stages and are defined in Colang, the domain-specific language bundled with the toolkit.

Five rail types

NeMo Guardrails defines five pipeline positions where a rail can intercept:

Rail type | Position | Example uses
Input rails | Applied to user message before the LLM sees it | Reject jailbreak attempts; rewrite ambiguous queries
Dialog rails | Control LLM prompting on canonical-form messages | Enforce predefined dialog paths; constrain topic scope
Retrieval rails | Applied to RAG chunks before they are used | Reject irrelevant or unsafe retrieved content
Execution rails | Applied to tool/action inputs and outputs | Validate API call parameters; filter tool results
Output rails | Applied to LLM response before it reaches the user | Moderate unsafe outputs; enforce language style

Input rails

Operate before any LLM call. Use cases: detecting prompt injection, blocking disallowed topics, normalising user input. Can reject (return error) or transform (rewrite the message).
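The reject-or-transform behaviour can be illustrated with a minimal sketch in plain Python. This is not the NeMo Guardrails API: the `RailResult` type, the `input_rail` function, and the two regex patterns are hypothetical names chosen for illustration, and a real injection detector would be far broader than a pattern list.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; real detectors cover far more.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]


@dataclass
class RailResult:
    allowed: bool
    message: str  # normalised message if allowed, refusal text otherwise


def input_rail(user_message: str) -> RailResult:
    """Normalise the message, then reject it if it looks like an injection."""
    # Transform step: collapse runs of whitespace and trim the ends.
    normalised = " ".join(user_message.split())
    # Reject step: any matching pattern blocks the message before an LLM call.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(normalised):
            return RailResult(False, "Sorry, I can't help with that request.")
    return RailResult(True, normalised)
```

In this sketch the caller forwards `message` to the model only when `allowed` is true, so a rejected message never reaches the LLM.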

Dialog rails

Operate on a canonical representation of the conversation state and control how the LLM is prompted to continue. Used for scripted dialog paths: ensuring the bot follows a specific flow (e.g. always collecting the user's name before proceeding).

Retrieval rails

Specific to RAG pipelines. The retrieved chunks pass through the rail before being inserted into the LLM context. Use cases: filtering chunks that are off-topic, hallucination-prone, or policy-violating.
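A minimal sketch of such a filter, assuming chunks are plain strings: it drops chunks containing blocked terms (a stand-in for a policy check) and chunks that share no words with the query (a crude stand-in for a relevance check). The function name, the blocked-term list, and the overlap threshold are hypothetical; a production rail would more likely use embeddings or a classifier.

```python
def retrieval_rail(
    query: str,
    chunks: list[str],
    blocked_terms: tuple[str, ...] = ("password", "ssn"),
    min_overlap: int = 1,
) -> list[str]:
    """Keep only chunks that pass a policy check and a crude relevance check."""
    query_terms = set(query.lower().split())
    kept = []
    for chunk in chunks:
        text = chunk.lower()
        # Policy check: drop chunks that mention any blocked term.
        if any(term in text for term in blocked_terms):
            continue
        # Relevance check: require some word overlap with the query.
        overlap = len(query_terms & set(text.split()))
        if overlap >= min_overlap:
            kept.append(chunk)
    return kept
```

Only the chunks returned by the rail would be inserted into the LLM context.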

Execution rails

Applied to inputs sent to external tools (function calls, API invocations) and to their returned results. Provides a safety boundary around agentic tool use — see AI agents.
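The boundary can be sketched as a wrapper that validates a tool call against an allowlist before running it and then filters the result. Everything here is an assumption for illustration: the `ALLOWED_TOOLS` schema, the `get_weather` tool, and the length cap are hypothetical and not part of any toolkit.

```python
from typing import Any, Callable

# Hypothetical allowlist: tool name -> expected argument names and types.
ALLOWED_TOOLS: dict[str, dict[str, type]] = {"get_weather": {"city": str}}


def check_tool_call(name: str, args: dict[str, Any]) -> None:
    """Raise if the tool is not allowlisted or its arguments do not fit the schema."""
    schema = ALLOWED_TOOLS.get(name)
    if schema is None:
        raise PermissionError(f"tool not allowed: {name}")
    unexpected = set(args) - set(schema)
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    for key, expected_type in schema.items():
        if not isinstance(args.get(key), expected_type):
            raise ValueError(f"argument {key!r} must be {expected_type.__name__}")


def guarded_tool_call(
    name: str,
    args: dict[str, Any],
    tools: dict[str, Callable[..., str]],
    max_chars: int = 500,
) -> str:
    """Validate the inputs, run the tool, then truncate the result."""
    check_tool_call(name, args)    # rail on the tool input
    result = tools[name](**args)   # the actual tool invocation
    return result[:max_chars]      # rail on the tool output
```

Checking before execution matters most for tools with side effects, since an invalid call is refused before anything runs.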

Output rails

The final checkpoint before the user sees a response. Use cases: PII detection, toxicity filtering, fact-checking against a source document, enforcing required disclaimers.
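Two of these use cases, PII detection and required disclaimers, can be sketched with the standard library alone. The regular expressions below are deliberately simple, and the `output_rail` function and `DISCLAIMER` text are hypothetical; real PII detection usually relies on a dedicated detector rather than two patterns.

```python
import re

# Simplified patterns for illustration; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

# Hypothetical disclaimer text required by policy.
DISCLAIMER = "This is not professional advice."


def output_rail(response: str) -> str:
    """Redact simple PII and append the disclaimer if it is missing."""
    redacted = EMAIL.sub("[EMAIL]", response)
    redacted = PHONE.sub("[PHONE]", redacted)
    if DISCLAIMER not in redacted:
        redacted = f"{redacted}\n\n{DISCLAIMER}"
    return redacted
```

This sketch transforms the response rather than rejecting it; a stricter rail could instead replace the whole response with a refusal when PII is found.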

What guardrails protect against

NeMo Guardrails

NVIDIA’s open-source toolkit (github.com/NVIDIA-NeMo/Guardrails). Latest: v0.21.0. Python 3.10–3.13.

Guardrails vs fine-tuning

Guardrails are a complement to, not a replacement for, alignment training:

Resources