Safety & Guardrails for Google ADK Agents

The official ADK safety guide is unambiguous: never trust an LLM to enforce its own guardrails. Prompt injection, jailbreaks, and unsafe output handling are the most common reasons agents get pulled from production.

This page maps the ADK safety primitives to TraptureIQ's runtime defense layer — AgentGuard.

The four safety threat models

Threat	What it looks like	Where it bites
Prompt injection	User input that contains "ignore previous instructions" or pretends to be system text	Customer-facing chat agents
Jailbreak	Multi-turn social engineering to make the agent behave outside its role	Any agent with a strong persona
PII / sensitive data leak	Agent echoes data from its context or tools into the user's view	RAG agents, internal-tool agents
Unsafe output (RAI)	Hate, harassment, explicit, dangerous, or self-harm content	Open-ended generation agents

The ADK docs recommend layering defenses — never rely on a single line of defense. TraptureIQ implements those layers as configurable policies.

Checklist

1. Use the right authorization pattern

ADK distinguishes agent-auth (the agent calls a tool as itself, using a service account) from user-auth (the agent calls a tool on behalf of the end user, using their OAuth token).

Pattern	When to use	TraptureIQ setup
Agent-auth	Read-only access to internal data	Service account on the agent's runtime
User-auth	Mutating actions on user-owned resources	OAuth + signed user identity propagated to the tool

Misconfiguring this is how data leaks happen across users. See ADK safety / authorization.

2. Enable AgentGuard for any customer-facing agent

AgentGuard is TraptureIQ's runtime firewall — it inspects prompts and responses before they reach Gemini or the user. Baseline policy for any external agent:

Policy	Action
Prompt injection detection	Block + alert
Jailbreak detection	Block + alert
PII detection (email, phone, SSN, credit card)	Redact in prompts, block in responses
RAI categories (hate, harassment, explicit, dangerous)	Block + alert
CSAM detection	Block + page on-call
Malicious URI detection	Block

Configure these at AgentGuard → Agent Firewall. Tighten per-agent if needed.

3. Add safety callbacks for defense in depth

Even with AgentGuard active, add an ADK before_model_callback that:

Strips system-prompt-like patterns from user input
Caps user input length (~ 4KB is usually plenty)
Logs any blocked attempt for review

This catches things at the SDK layer before they ever hit the firewall.

4. Restrict tool surface

ADK's tool composition lets an agent call any tool you give it. Give it only what it needs:

Read-only first — start with read tools; add write tools later with explicit human approval
No raw shell, raw SQL, or raw eval() tools in any agent that touches user input
Sandbox file-system tools to a per-session directory

5. Test safety regressions in CI

Use Security Eval to maintain a test suite of known prompt injections and jailbreaks. Run it on every prompt or model change.

6. Watch the Blocked Events dashboard

Set up a weekly review of the AgentGuard → Blocked Events report. Patterns of blocked attempts reveal:

Which agents are being probed
Which users (or referrers) are sources of malicious traffic
Whether your policies need tightening or loosening

Anti-patterns

"Tell the LLM not to do X" as your only defense — Bypassed in minutes by determined users. Always layer with AgentGuard.
AgentGuard off "for performance" — Inference cost dominates; firewall overhead is ~ 50 ms. Leave it on.
Same agent for internal and customer-facing use — Different threat models, different policy needs. Run two separate agents.
Logging full prompts of blocked requests at INFO — You may end up logging PII or jailbreak strings. Use a redacted audit log instead.

Where to configure

AgentGuard overview → AgentGuard
Per-agent firewall policy → Agent Firewall
Content safety categories → Content Safety
Security test suite → Security Eval
Auth methods for Agent Engine → Authentication Setup

The four safety threat models​

Checklist​

1. Use the right authorization pattern​

2. Enable AgentGuard for any customer-facing agent​

3. Add safety callbacks for defense in depth​

4. Restrict tool surface​

5. Test safety regressions in CI​

6. Watch the Blocked Events dashboard​

Anti-patterns​

Where to configure​

References​