Safety & Guardrails for Google ADK Agents
The official ADK safety guide is unambiguous: never trust an LLM to enforce its own guardrails. Prompt injection, jailbreaks, and unsafe output handling are the most common reasons agents get pulled from production.
This page maps the ADK safety primitives to TraptureIQ's runtime defense layer — AgentGuard.
The four safety threat models
| Threat | What it looks like | Where it bites |
|---|---|---|
| Prompt injection | User input that contains "ignore previous instructions" or pretends to be system text | Customer-facing chat agents |
| Jailbreak | Multi-turn social engineering to make the agent behave outside its role | Any agent with a strong persona |
| PII / sensitive data leak | Agent echoes data from its context or tools into the user's view | RAG agents, internal-tool agents |
| Unsafe output (RAI) | Hate, harassment, explicit, dangerous, or self-harm content | Open-ended generation agents |
The ADK docs recommend layering defenses — never rely on a single line of defense. TraptureIQ implements those layers as configurable policies.
Checklist
1. Use the right authorization pattern
ADK distinguishes agent-auth (the agent calls a tool as itself, using a service account) from user-auth (the agent calls a tool on behalf of the end user, using their OAuth token).
| Pattern | When to use | TraptureIQ setup |
|---|---|---|
| Agent-auth | Read-only access to internal data | Service account on the agent's runtime |
| User-auth | Mutating actions on user-owned resources | OAuth + signed user identity propagated to the tool |
Misconfiguring this is how data leaks happen across users. See ADK safety / authorization.
2. Enable AgentGuard for any customer-facing agent
AgentGuard is TraptureIQ's runtime firewall — it inspects prompts and responses before they reach Gemini or the user. Baseline policy for any external agent:
| Policy | Action |
|---|---|
| Prompt injection detection | Block + alert |
| Jailbreak detection | Block + alert |
| PII detection (email, phone, SSN, credit card) | Redact in prompts, block in responses |
| RAI categories (hate, harassment, explicit, dangerous) | Block + alert |
| CSAM detection | Block + page on-call |
| Malicious URI detection | Block |
Configure these at AgentGuard → Agent Firewall. Tighten per-agent if needed.
3. Add safety callbacks for defense in depth
Even with AgentGuard active, add an ADK before_model_callback that:
- Strips system-prompt-like patterns from user input
- Caps user input length (~ 4KB is usually plenty)
- Logs any blocked attempt for review
This catches things at the SDK layer before they ever hit the firewall.
4. Restrict tool surface
ADK's tool composition lets an agent call any tool you give it. Give it only what it needs:
- Read-only first — start with read tools; add write tools later with explicit human approval
- No raw shell, raw SQL, or raw
eval()tools in any agent that touches user input - Sandbox file-system tools to a per-session directory
5. Test safety regressions in CI
Use Security Eval to maintain a test suite of known prompt injections and jailbreaks. Run it on every prompt or model change.
6. Watch the Blocked Events dashboard
Set up a weekly review of the AgentGuard → Blocked Events report. Patterns of blocked attempts reveal:
- Which agents are being probed
- Which users (or referrers) are sources of malicious traffic
- Whether your policies need tightening or loosening
Anti-patterns
- "Tell the LLM not to do X" as your only defense — Bypassed in minutes by determined users. Always layer with AgentGuard.
- AgentGuard off "for performance" — Inference cost dominates; firewall overhead is ~ 50 ms. Leave it on.
- Same agent for internal and customer-facing use — Different threat models, different policy needs. Run two separate agents.
- Logging full prompts of blocked requests at INFO — You may end up logging PII or jailbreak strings. Use a redacted audit log instead.
Where to configure
- AgentGuard overview → AgentGuard
- Per-agent firewall policy → Agent Firewall
- Content safety categories → Content Safety
- Security test suite → Security Eval
- Auth methods for Agent Engine → Authentication Setup
References
- ADK: Safety guide
- ADK: Authorization patterns
- ADK: Callback design patterns