AgentGuard — AI Security Firewalls

Access: Tenant Admins only (requires AgentGuard to be enabled in Settings)

AgentGuard is TraptureIQ's dedicated security layer. It acts as a real-time proxy between your users and your AI agents, inspecting every message to prevent data leaks, block malicious prompts, and ensure safe AI output.


What is AgentGuard?

Think of AgentGuard as a security firewall for AI conversations. Just like a network firewall inspects and filters network traffic, AgentGuard inspects and filters every prompt and response flowing through your agents.

Why it matters:

  • Users might accidentally include sensitive data (credit card numbers, SSNs, API keys) in their prompts
  • Malicious users might try to "jailbreak" your agent — tricking it into ignoring safety rules
  • Your agent might generate harmful, biased, or toxic content
  • Regulatory requirements may demand that PII never reaches the LLM

AgentGuard is designed to catch these issues in real time, before a prompt reaches the agent or a response reaches the user.


How AgentGuard Works

When a message flows through TraptureIQ, AgentGuard performs a multi-step inspection:

1. Prompt Sanitization (Inbound)

Before the user's message reaches your agent:

  • Jailbreak Detection — Blocks attempts to override system instructions (e.g., "ignore all previous instructions and...")
  • PII Detection — Identifies and can redact sensitive data like emails, phone numbers, SSNs, API keys
  • Blocked Phrases — Checks against your custom blocklist of forbidden words/phrases
  • Geographic Restrictions — Can block requests from specific countries
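
The inbound checks above can be pictured as a small pipeline. The following is a minimal Python sketch, not AgentGuard's actual implementation: the function `sanitize_prompt`, the regular expressions, the blocklist entry, and the country code are all hypothetical, and the patterns are far simpler than a production detector would use.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; real detectors are far more thorough.
JAILBREAK_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
BLOCKED_PHRASES = {"project falcon"}  # assumed custom blocklist entry
BLOCKED_COUNTRIES = {"XX"}            # assumed placeholder country code


@dataclass
class InboundResult:
    allowed: bool                      # False means the prompt never reaches the agent
    text: str                          # prompt text after any redaction
    events: list = field(default_factory=list)


def sanitize_prompt(text: str, country: str) -> InboundResult:
    """Run the blocking checks first, then redact PII from prompts that pass."""
    if country in BLOCKED_COUNTRIES:
        return InboundResult(False, "", ["geo"])
    if any(p.search(text) for p in JAILBREAK_PATTERNS):
        return InboundResult(False, "", ["jailbreak"])
    lowered = text.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return InboundResult(False, "", ["blocked_phrase"])

    events = []
    for name, pattern in PII_PATTERNS.items():
        # subn returns the rewritten text and the number of replacements made
        text, count = pattern.subn("[REDACTED]", text)
        if count:
            events.append(f"pii:{name}")
    return InboundResult(True, text, events)
```

In this sketch the blocking checks run before redaction, so a blocked prompt is rejected outright, while a prompt that only contains PII is still delivered with the sensitive parts replaced.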

2. Response Verification (Outbound)

After your agent generates a response:

  • Toxic Content Check — Inspects for harmful, hateful, or inappropriate content
  • PII Leak Prevention — Ensures the agent doesn't expose sensitive data in its response
  • Safety Category Checks — Validates against 20+ safety categories
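
As a rough illustration of the outbound step, the sketch below checks a generated response before it is shown to the user. Everything in it is an assumption made for the example: the name `verify_response`, the wording of `SAFETY_MESSAGE`, the single SSN pattern, and the pluggable `is_toxic` callable, which stands in for whatever classifier covers the safety categories. Redacting leaked PII rather than blocking the whole response is also an assumption.

```python
import re
from typing import Callable, List, Tuple

SAFETY_MESSAGE = "This response was withheld by security policies."  # assumed wording
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # one illustrative PII pattern


def verify_response(text: str, is_toxic: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Return the text to show the user plus the list of checks that fired."""
    if is_toxic(text):
        # Toxic output is replaced wholesale with a safety message.
        return SAFETY_MESSAGE, ["toxic"]
    redacted, count = SSN_PATTERN.subn("[REDACTED]", text)
    if count:
        # Assumed behavior: leaked PII is redacted instead of blocking the response.
        return redacted, ["pii_leak"]
    return text, []
```

Passing the classifier in as a function keeps the sketch independent of any particular moderation model; a keyword check is enough to exercise it in a test.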

3. Real-Time Monitoring

All safety events are logged and displayed on the Content Safety Dashboard for review and audit.


How to Enable AgentGuard

AgentGuard is not enabled by default. To activate it:

  1. Go to Settings at the bottom of the sidebar.
  2. Find the AgentGuard section.
  3. Toggle AgentGuard to Enabled.
  4. Configure notification preferences (optional).
  5. Confirm that AgentGuard now appears in the sidebar.

Expected result: Once enabled, all agent conversations in your workspace are monitored in real-time. The AgentGuard section appears in the sidebar with the dashboard and configuration options.


The AgentGuard Dashboard

Navigate to AgentGuard in the sidebar to see the main dashboard. Here you can:

Section | What You Can Do
Overview | See total safety events, top blocked agents, top blocked users, and event trends
Configure Firewalls | Set up blocked phrases, country restrictions, and per-agent firewall rules
View Safety Events | Browse a live log of every time AgentGuard blocked or sanitized a message
Audit Actions | Investigate exactly why a specific response was modified or blocked
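
To make the Overview metrics concrete, here is a small Python sketch of how logged safety events could be rolled up into totals and "top blocked" lists. The `SafetyEvent` fields and the `overview` function are hypothetical; the actual event schema is not documented here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SafetyEvent:
    agent: str
    user: str
    category: str  # e.g. "Jailbreak", "PII/SDP", "RAI"
    action: str    # assumed values: "blocked" or "sanitized"


def overview(events: List[SafetyEvent], top_n: int = 3) -> dict:
    """Summarize events into the kind of figures an overview panel shows."""
    blocked = [e for e in events if e.action == "blocked"]
    return {
        "total_events": len(events),
        "top_blocked_agents": Counter(e.agent for e in blocked).most_common(top_n),
        "top_blocked_users": Counter(e.user for e in blocked).most_common(top_n),
        "by_category": dict(Counter(e.category for e in events)),
    }
```

Only events with the action "blocked" count toward the top blocked lists in this sketch, while sanitized events still contribute to the total and the category breakdown.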

Screenshots: AgentGuard Dashboard, AgentGuard Configuration, AgentGuard Alert


What Happens When AgentGuard Blocks a Message

When AgentGuard detects an issue:

Scenario | What the User Sees | What the Admin Sees
Prompt blocked (jailbreak) | "Your message was blocked by security policies" | Event logged in Content Safety Dashboard with category "Jailbreak"
PII detected in prompt | The message may be redacted (sensitive parts replaced with [REDACTED]) | Event logged with category "PII/SDP"
Response contains toxic content | The response is replaced with a safety message | Event logged with category "RAI"
Blocked phrase detected | "Your message contains restricted content" | Event logged in Firewall events
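
For anyone handling these outcomes in their own tooling, the scenarios above can be captured as a simple lookup. This is only a sketch: the dictionary keys and the `Outcome` type are hypothetical identifiers, while the quoted user messages and log categories are taken from the table above.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    user_message: Optional[str]  # None: the user sees modified content, not a fixed string
    logged_as: str               # category or log where the admin finds the event


# Keys are hypothetical; messages and categories mirror the documented scenarios.
OUTCOMES = {
    "jailbreak": Outcome("Your message was blocked by security policies", "Jailbreak"),
    "pii_in_prompt": Outcome(None, "PII/SDP"),   # prompt may be redacted, not rejected
    "toxic_response": Outcome(None, "RAI"),      # safety message wording is not documented
    "blocked_phrase": Outcome("Your message contains restricted content", "Firewall events"),
}


def describe(scenario: str) -> str:
    """Return a one-line summary of what the user sees and where the event is logged."""
    outcome = OUTCOMES[scenario]
    shown = outcome.user_message or "modified or replaced content"
    return f"user sees: {shown}; logged as: {outcome.logged_as}"
```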

Core Security Sections

AgentGuard is organized into two main areas, one for configuration and one for monitoring:

  • Agent Firewall — Set up your first security policy with blocked phrases, country restrictions, and custom rules
  • Content Safety Dashboard — Monitor all safety events with real-time charts, event logs, and category breakdowns

Tips for Beginners

  • Enable AgentGuard early — Even if you don't configure custom rules, the built-in PII detection and jailbreak prevention provide immediate value.
  • Start with monitoring — Before configuring strict blocking rules, let AgentGuard run in monitoring mode to see what kinds of events are being detected.
  • Review the Content Safety Dashboard regularly — It shows you real-world threats your agents are facing.
  • Add blocked phrases for your domain — If your organization has specific terms that should never appear in AI conversations, add them to the firewall.
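
The "start with monitoring" tip comes down to logging findings without acting on them until you trust the rules. The Python sketch below illustrates that idea under stated assumptions: the `Mode` enum, the `apply_policy` function, and the shape of the log entries are hypothetical and do not describe how AgentGuard exposes this setting.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    MONITOR = "monitor"  # record findings, let the message through
    ENFORCE = "enforce"  # record findings and block the message


def apply_policy(message: str, findings: List[str], mode: Mode, log: list) -> Optional[str]:
    """Log every finding; return the message, or None if it is blocked."""
    for finding in findings:
        log.append({"finding": finding, "mode": mode.value})
    if findings and mode is Mode.ENFORCE:
        return None
    return message
```

Running in the monitor mode first produces the same log entries that enforcement would, which makes it possible to review false positives before any user message is actually blocked.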