Skip to main content

Safety & Guardrails for Google ADK Agents

The official ADK safety guide is unambiguous: never trust an LLM to enforce its own guardrails. Prompt injection, jailbreaks, and unsafe output handling are the most common reasons agents get pulled from production.

This page maps the ADK safety primitives to TraptureIQ's runtime defense layer — AgentGuard.


The four safety threat models

ThreatWhat it looks likeWhere it bites
Prompt injectionUser input that contains "ignore previous instructions" or pretends to be system textCustomer-facing chat agents
JailbreakMulti-turn social engineering to make the agent behave outside its roleAny agent with a strong persona
PII / sensitive data leakAgent echoes data from its context or tools into the user's viewRAG agents, internal-tool agents
Unsafe output (RAI)Hate, harassment, explicit, dangerous, or self-harm contentOpen-ended generation agents

The ADK docs recommend layering defenses — never rely on a single line of defense. TraptureIQ implements those layers as configurable policies.


Checklist

1. Use the right authorization pattern

ADK distinguishes agent-auth (the agent calls a tool as itself, using a service account) from user-auth (the agent calls a tool on behalf of the end user, using their OAuth token).

PatternWhen to useTraptureIQ setup
Agent-authRead-only access to internal dataService account on the agent's runtime
User-authMutating actions on user-owned resourcesOAuth + signed user identity propagated to the tool

Misconfiguring this is how data leaks happen across users. See ADK safety / authorization.

2. Enable AgentGuard for any customer-facing agent

AgentGuard is TraptureIQ's runtime firewall — it inspects prompts and responses before they reach Gemini or the user. Baseline policy for any external agent:

PolicyAction
Prompt injection detectionBlock + alert
Jailbreak detectionBlock + alert
PII detection (email, phone, SSN, credit card)Redact in prompts, block in responses
RAI categories (hate, harassment, explicit, dangerous)Block + alert
CSAM detectionBlock + page on-call
Malicious URI detectionBlock

Configure these at AgentGuard → Agent Firewall. Tighten per-agent if needed.

3. Add safety callbacks for defense in depth

Even with AgentGuard active, add an ADK before_model_callback that:

  • Strips system-prompt-like patterns from user input
  • Caps user input length (~ 4KB is usually plenty)
  • Logs any blocked attempt for review

This catches things at the SDK layer before they ever hit the firewall.

4. Restrict tool surface

ADK's tool composition lets an agent call any tool you give it. Give it only what it needs:

  • Read-only first — start with read tools; add write tools later with explicit human approval
  • No raw shell, raw SQL, or raw eval() tools in any agent that touches user input
  • Sandbox file-system tools to a per-session directory

5. Test safety regressions in CI

Use Security Eval to maintain a test suite of known prompt injections and jailbreaks. Run it on every prompt or model change.

6. Watch the Blocked Events dashboard

Set up a weekly review of the AgentGuard → Blocked Events report. Patterns of blocked attempts reveal:

  • Which agents are being probed
  • Which users (or referrers) are sources of malicious traffic
  • Whether your policies need tightening or loosening

Anti-patterns

  • "Tell the LLM not to do X" as your only defense — Bypassed in minutes by determined users. Always layer with AgentGuard.
  • AgentGuard off "for performance" — Inference cost dominates; firewall overhead is ~ 50 ms. Leave it on.
  • Same agent for internal and customer-facing use — Different threat models, different policy needs. Run two separate agents.
  • Logging full prompts of blocked requests at INFO — You may end up logging PII or jailbreak strings. Use a redacted audit log instead.

Where to configure


References