Introduction
As AI adoption accelerates, the word “guardrail” is increasingly being used to describe everything from policy guidance to hard technical enforcement. This makes it easy to overestimate what guardrails actually provide.
While AI guardrails are certainly important, the over use of the term rapidly dilutes its meaning, making the actual purpose of guardrails less clear. The current vendor landscape has eagerly latched on to that, once again promising all-encompassing solutions for every security cencern related to AI.
This article intends to demistify what AI guardrails are and add more clarity and meaning to the term, by exploring different types of guardrails for AI and how they can be applied to strengthen your AI stack with confidence.
Guardrails
Simply put, guardrails are safety checks placed around systems to help keep their inputs, outputs, and actions within acceptable boundaries. In the context of AI, they do not provide “AI security” on their own. They are not a magic bullet, but just 1 other tool in the toolbox to keep AI applications secure.
To provide more clarity we need to consider the following different types of “guardrails”:
Governance guardrails
These guardrails guide behaviour, not technical enforcement. They come in the form of:
- Policies and processes
- Acceptable use
- Approval paths
- Risk appetite
- Accountability
Governance guardrails set the tone for how AI applications should be managed.
Platform guardrails
Platform guardrails can be considered as the technical implementation of existing security controls.
They include cloud, identity, network, data, and deployment constraints that help prevent unsafe configurations, excessive access, or risky operating patterns. In this context, guardrails are not separate from security controls; they are the practical mechanisms that turn policy and control requirements into enforceable limits within the platform.
Application guardrails
Application guardrails are the controls built directly into the software experience.
They include business logic, input validation, workflow constraints, rate limits, permissions, and approval gates that shape what users, systems, or AI components are allowed to do inside the application. In this context, guardrails are the application-level mechanisms that enforce safe behaviour, prevent misuse, and ensure actions follow the intended business process.
AI - LLM guardrails
LLM guardrails are runtime controls that shape how the model interacts with users, data, and instructions.
They include input filtering, context inspection, output filtering, policy checks, response validation, and, increasingly, tool-call mediation. Their purpose is to help keep model behaviour aligned with safety, privacy, security, and compliance expectations. However, they should be understood as one control layer around model-facing interactions, not as a complete security model.
AI - Agent guardrails
Agent guardrails are runtime controls that constrain what an AI agent is allowed to do.
They cover tools, commands, files, network access, identity, secrets, approval flows, and action logging. While they are often discussed alongside LLM guardrails, they must extend beyond the model itself because agent risk comes not only from what the model says, but from what the agent can access, execute, change, or trigger.
LLM guardrails
In theory, an LLM guardrail provides a runtime policy layer around the AI’s large language model:
- It inspects inputs, retrieved context and outputs
- It sometimes inspects tool calls to detect or block unsafe, non-compliant, sensitive, malicious or out-of-scope behaviour
In practice, the real capabilities of LLM guardrails are highly vendor and implementation dependent. Not all vendors will currently provide tool call inspection for instance, or have the ability to block certain actions, and the quality of available LLM guardrails fluctuates greatly as well. Some may employ LLM-as-judge methods where the guardrail itself is also backed by another AI, others may just be simple parsing rules under the hood.
An LLM guardrail can reduce risks such as prompt injection, data leakage, harmful responses, unsupported claims, unsafe tool use, and malformed outputs. But: LLM guardrails are not an all-encompassing security boundary in itself. They are a targeted security checkpoint around the AI model specifically. They are mainly a filter, checker, classifier or policy decision point. That means it can miss things, over-block harmless things, or be confused by cleverly crafted prompts – which is where a lot of the hype/noise currently doing the rounds comes from.
Well-thought-out LLM guardrails will benefit from other steps in the agent-to-LLM communication flow: prefiltering inputs before they reach the guardrail can already remove obvious junk, abuse, oversized prompts, known attack strings, secrets, malware-like content, unsupported file types, or other irrelevant data. This way the guardrail does not have to deal with raw, messy and hostile content on its own.
Agent Guardrails
Agent guardrails are different from LLM guardrails. They protect and control the agent itself, because agent risk is not limited to the model response; it comes from what the agent is allowed to access, execute, change, and call, and how we enforce, audit, and monitor for this.
Agent guardrails should be managed by a central control plane, which defines and enforces agent policy across tools, identities, secrets, file access, network access, approvals, sandboxing, logging, and escalation paths. This control plane also manages the agent lifecycle, and provides a kill-switch for when an agent misbehaves.
Local or model-level guardrails can help shape behaviours, but the authoritative controls need to sit outside the agent so they cannot be bypassed by prompt injection, misconfiguration, or agent failure.
Logging & Monitoring
LLM guardrails tie into logging and monitoring by producing structured evidence of what happened: the prompt, context sources, model response, guardrail decision, policy triggered, confidence/severity scoring, tool calls attempted, user/session identity, and final action taken. That makes guardrails useful for audit, investigation, tuning, abuse detection, and SOC visibility.
Similarly, agent guardrails can also collect useful telemetry for monitoring: the tools requested, commands attempted, files accessed, network destinations contacted, identities or credentials used, approval decisions, sandbox events, execution results, policy violations, escalations, and any changes made to systems or data.
This telemetry is especially valuable because agent risk is tied to action, not just language. It helps security teams understand whether an agent stayed within its permitted boundaries, whether it attempted something unusual or high risk, and whether controls such as least privilege, sandboxing, approval gates, and tool restrictions are working as intended.
However, guardrails do not automatically equal good monitoring. The telemetry still needs to be captured, normalised, retained, correlated, and reviewed through the wider logging and monitoring pipeline. Guardrails can generate the evidence, but monitoring turns that evidence into visibility, assurance, and response.
Data Loss Prevention
LLM and agent guardrails also tie support Data Loss Prevention (DLP) by helping detect and control when sensitive information may be exposed through AI interactions or agent actions.
For LLMs, this can include checking prompts, retrieved context, and model responses for secrets, credentials, personal data, customer information, internal documents, or regulated content before that information is processed or returned.
For agents, DLP needs to extend further: monitoring which files, systems, tools, databases, APIs, and network destinations the agent can access, and preventing sensitive data from being copied, summarised, exported, uploaded, or sent to unauthorised locations.
In practice, guardrails can support DLP by redacting sensitive values, blocking risky requests, limiting context injection, preventing unsafe tool calls, escalating for approval, and producing telemetry for investigation. However, they should complement—not replace—formal DLP controls, data classification, access control, encryption, tenant isolation, egress filtering, and audit logging.
Final thoughts
LLM guardrails are not “AI security” as a whole. They are one runtime control layer for model-facing interactions. They can inspect, block, redact, reshape, or escalate prompts, retrieved context, model responses, and, in some cases, tool calls. But they do not replace identity, access control, sandboxing, supply-chain security, data governance, monitoring, or human approval.
In short:
- LLM guardrails influence what the model says. Agent guardrails, governed by a central control plane, constrain what the agent can do.
- Guardrails enforce or advise at runtime. Logging and monitoring prove, observe, and improve what happens over time.
- Guardrails work best as one layer in a wider control pipeline. They are not the whole security model.
