Cost Control for Google ADK Agents

The two costs that surprise teams running Google ADK agents in production are:

System prompts re-billed on every turn: every request to Gemini re-sends the full system instruction unless you turn on context caching.
Conversation history that grows without bound: every turn in a multi-turn session adds tokens until you hit the context-window limit or exhaust your budget.

Both have documented solutions in the official ADK docs. This page maps those solutions to what you do in TraptureIQ.

The ADK pain points

Context Caching (adk docs): Gemini can cache large, stable instructions and reuse them across requests. Without caching, you re-pay for every token of your system prompt on every turn.
Context Compaction (adk docs): A sliding-window summarizer that compresses old events. Without compaction, token use in a long session grows with every turn until you hit Gemini's context limit.
Model selection: Gemini Pro costs roughly 5-10× more per token than Gemini Flash, and many use cases do not need it.

Checklist

1. Enable context caching for stable instructions

If your agent's system prompt is over ~2K tokens and doesn't change between requests, configure context caching at the App level in your ADK config:

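# ADK app config
from google.adk.agents.context_cache_config import ContextCacheConfig

cache_config = ContextCacheConfig(
    min_tokens=2048,     # only cache requests with at least this many tokens
    ttl_seconds=3600,    # keep a cache entry for up to one hour
    cache_intervals=10,  # refresh the cache after this many uses
)
# Pass cache_config as context_cache_config when constructing the ADK App.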

After deployment, register the agent in TraptureIQ and watch the Token Analyser — the input token count should drop by 60–95 % for repeat conversations.
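As a rough sanity check on what to expect, the arithmetic can be sketched as below. This is a simplified model, not how Gemini or TraptureIQ meters tokens: it assumes cached system-prompt tokens drop out of the uncached input count entirely, and it ignores conversation history and any cache storage charges. The function names and the example numbers are hypothetical.

```python
def input_tokens_per_turn(system_tokens: int, turn_tokens: int, cached: bool) -> int:
    """Uncached input tokens sent with one request (simplified model)."""
    return turn_tokens if cached else system_tokens + turn_tokens


def input_token_drop(system_tokens: int, turn_tokens: int) -> float:
    """Fractional drop in uncached input tokens once the system prompt is cached."""
    before = input_tokens_per_turn(system_tokens, turn_tokens, cached=False)
    after = input_tokens_per_turn(system_tokens, turn_tokens, cached=True)
    return (before - after) / before


# Hypothetical agent: 4,000-token system prompt, 1,000 tokens of new content per turn.
drop = input_token_drop(system_tokens=4000, turn_tokens=1000)
print(f"{drop:.0%}")  # prints "80%"
```

Under this model, the larger the system prompt is relative to the per-turn content, the closer the drop gets to the top of the range.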

2. Turn on context compaction for long sessions

Any agent that supports multi-turn workflows beyond ~10 turns should enable ADK's compaction. Default compaction keeps the last N events verbatim and summarizes everything older.

After enabling, use TraptureIQ's Sessions module to spot-check that conversations still flow coherently across compaction boundaries.
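To make the sliding-window idea concrete, here is a minimal sketch in plain Python. It is not ADK's implementation and does not use the ADK API: `compact_events`, `keep_last`, and `naive_summary` are hypothetical names, and the summarizer is a stand-in for whatever model call a real system would make.

```python
from typing import Callable, List


def compact_events(
    events: List[str],
    keep_last: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace everything older than the last `keep_last` events with one summary."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(events) <= keep_last:
        return list(events)  # nothing old enough to compact
    split = len(events) - keep_last
    older, recent = events[:split], events[split:]
    return [summarize(older)] + recent


def naive_summary(older: List[str]) -> str:
    # Stand-in for an LLM summarizer: it only records how many events were folded.
    return f"[summary of {len(older)} earlier events]"


history = [f"turn {i}" for i in range(1, 8)]
print(compact_events(history, keep_last=3, summarize=naive_summary))
# ['[summary of 4 earlier events]', 'turn 5', 'turn 6', 'turn 7']
```

The compaction boundary sits between the summary and the first verbatim event, which is the spot worth checking when reviewing sessions for lost context.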

3. Set per-agent token budgets

In TraptureIQ → Cost Control, set a monthly token budget for every production agent. Configure an alert at 80 % so you hear about overruns before the bill arrives.

Budget = expected daily turns × average tokens per turn × 30 days, plus 30 % headroom.
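A worked example of the budget formula and the 80 % alert, as a small sketch. The function names and the traffic figures are hypothetical, and "daily volume" is assumed to mean turns per day.

```python
def monthly_token_budget(
    daily_turns: int,
    avg_tokens_per_turn: int,
    headroom: float = 0.30,
    days: int = 30,
) -> int:
    """Expected monthly tokens plus headroom, rounded to a whole token."""
    base = daily_turns * avg_tokens_per_turn * days
    return round(base * (1 + headroom))


def alert_threshold(budget: int, fraction: float = 0.80) -> int:
    """Token count at which the budget alert should fire."""
    return round(budget * fraction)


# Hypothetical agent: 2,000 turns per day at 1,500 tokens per turn.
budget = monthly_token_budget(daily_turns=2000, avg_tokens_per_turn=1500)
print(budget)                   # 117000000
print(alert_threshold(budget))  # 93600000
```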

4. Use the smallest model that meets quality

Start every new agent on Gemini Flash. Move to Gemini Pro only if your evals show a measurable quality gap. The cost differential is large enough that "let's just use Pro to be safe" can 10× your bill for no user-visible benefit.

Run a head-to-head comparison via Custom Eval before defaulting to Pro.
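The decision rule behind "smallest model that meets quality" can be sketched as picking the cheapest candidate whose eval score clears your bar. This is an illustration only: the `Candidate` type, the scores, and the relative costs are placeholders, not measured results, published prices, or Custom Eval output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    relative_cost: float  # cost per token relative to the cheapest model
    eval_score: float     # score from your own eval suite, 0.0 to 1.0


def cheapest_passing(candidates: List[Candidate], min_score: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate that meets the quality bar, if any."""
    passing = [c for c in candidates if c.eval_score >= min_score]
    return min(passing, key=lambda c: c.relative_cost, default=None)


# Placeholder numbers for illustration.
candidates = [
    Candidate("flash", relative_cost=1.0, eval_score=0.91),
    Candidate("pro", relative_cost=8.0, eval_score=0.93),
]
choice = cheapest_passing(candidates, min_score=0.90)
print(choice.name if choice else "no model meets the bar")  # prints "flash"
```

Fixing the quality bar before running the comparison keeps a small score difference from justifying a much larger bill after the fact.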

5. Prune the system prompt

Most system prompts grow organically. Once a month, run a prompt cleanup:

Delete every # DO NOT... rule that hasn't fired in 90 days
Move long examples to a tool that returns them on demand
Cut every "you are a helpful assistant" boilerplate sentence

Use the Prompts module to version your changes so you can roll back.

Anti-patterns

One giant system prompt with everything — Bloats every request and resists caching. Split into a cacheable preamble and a small per-request preamble.
No budget set — You'll find out about runaway cost from your finance team, not your dashboards.
Defaulting to Gemini Pro — Often 5-10× more expensive without measurable quality gain. Default to Flash.
Storing conversation history in the system prompt — That's what sessions and memory are for. See Session & Memory.

Where to configure

Per-agent cost view → Cost Control
Token-level breakdown → Token Analyser
Prompt versioning → Prompts
Model selection comparison → Custom Eval
