Cost Control for Google ADK Agents
The two costs that surprise teams running Google ADK agents in production are:
- System prompts re-billed on every turn — every request to Gemini re-sends the full system instruction unless you turn on context caching.
- Conversation history that grows unboundedly — every multi-turn session adds tokens until you hit context-window limits or your budget.
Both have documented solutions in the official ADK docs. This page maps those solutions to what you do in TraptureIQ.
The ADK pain points
- Context Caching (adk docs) — Gemini can cache large, stable instructions and reuse them across requests. Without caching, you re-pay for every token of your system prompt on every turn.
- Context Compaction (adk docs) — A sliding-window summarizer that compresses old events. Without compaction, long sessions linearly grow token use until you hit Gemini's context limit.
- Model selection — Gemini Pro is 5-10× the per-token cost of Gemini Flash for many use cases.
Checklist
1. Enable context caching for stable instructions
If your agent's system prompt is over ~2K tokens and doesn't change between requests, configure context caching at the App level in your ADK config:
# ADK app config
cache_config = CacheConfig(
min_tokens=2048,
ttl_seconds=3600,
cache_intervals=10,
)
After deployment, register the agent in TraptureIQ and watch the Token Analyser — the input token count should drop by 60–95 % for repeat conversations.
2. Turn on context compaction for long sessions
Any agent that supports multi-turn workflows beyond ~10 turns should enable ADK's compaction. Default compaction keeps the last N events verbatim and summarizes everything older.
After enabling, use TraptureIQ's Sessions module to spot-check that conversations still flow coherently across compaction boundaries.
3. Set per-agent token budgets
In TraptureIQ → Cost Control, set a monthly token budget for every production agent. Configure an alert at 80 % so you hear about overruns before the bill arrives.
Budget = expected daily volume × avg tokens per turn × 30, plus 30 % headroom.
4. Use the smallest model that meets quality
Start every new agent on Gemini Flash. Move to Gemini Pro only if your evals show a measurable quality gap. The cost differential is large enough that "let's just use Pro to be safe" can 10× your bill for no user-visible benefit.
Run a head-to-head comparison via Custom Eval before defaulting to Pro.
5. Prune the system prompt
Most system prompts grow organically. Once a month, run a prompt cleanup:
- Delete every
# DO NOT...rule that hasn't fired in 90 days - Move long examples to a tool that returns them on demand
- Cut every "you are a helpful assistant" boilerplate sentence
Use the Prompts module to version your changes so you can roll back.
Anti-patterns
- One giant system prompt with everything — Bloats every request and resists caching. Split into a cacheable preamble and a small per-request preamble.
- No budget set — You'll find out about runaway cost from your finance team, not your dashboards.
- Defaulting to Gemini Pro — Often 5-10× more expensive without measurable quality gain. Default to Flash.
- Storing conversation history in the system prompt — That's what sessions and memory are for. See Session & Memory.
Where to configure
- Per-agent cost view → Cost Control
- Token-level breakdown → Token Analyser
- Prompt versioning → Prompts
- Model selection comparison → Custom Eval
References
- ADK: Context Caching
- ADK: Context Compaction
- ADK: Gemini model pricing