
Cost Control for Google ADK Agents

The two costs that surprise teams running Google ADK agents in production are:

  1. System prompts re-billed on every turn — every request to Gemini re-sends the full system instruction unless you turn on context caching.
  2. Conversation history that grows unboundedly — every multi-turn session adds tokens until you hit context-window limits or your budget.

Both have documented solutions in the official ADK docs. This page maps those solutions to what you do in TraptureIQ.


The ADK pain points

  • Context Caching (ADK docs): Gemini can cache large, stable instructions and reuse them across requests. Without caching, you re-pay for every token of your system prompt on every turn.
  • Context Compaction (ADK docs): A sliding-window summarizer that compresses old events. Without compaction, the tokens sent per request grow linearly with session length until you hit Gemini's context limit.
  • Model selection: Gemini Pro can be 5-10× the per-token cost of Gemini Flash, a premium that many use cases do not need.

Checklist

1. Enable context caching for stable instructions

If your agent's system prompt is over ~2K tokens and doesn't change between requests, configure context caching at the App level in your ADK config:

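    # ADK app config (Python)
    from google.adk.agents.context_cache_config import ContextCacheConfig

    cache_config = ContextCacheConfig(
        min_tokens=2048,     # only cache requests of at least this many tokens
        ttl_seconds=3600,    # keep a cache entry for up to one hour
        cache_intervals=10,  # refresh the cache after 10 uses
    )
    # Pass cache_config as context_cache_config= when constructing the App.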

After deployment, register the agent in TraptureIQ and watch the Token Analyser — the input token count should drop by 60–95 % for repeat conversations.

2. Turn on context compaction for long sessions

Any agent that supports multi-turn workflows beyond ~10 turns should enable ADK's compaction. Default compaction keeps the last N events verbatim and summarizes everything older.

After enabling, use TraptureIQ's Sessions module to spot-check that conversations still flow coherently across compaction boundaries.
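To make the "keep the last N events verbatim, summarize everything older" behaviour concrete, here is a minimal, self-contained sketch. It is not ADK's implementation or API: `Event`, `compact`, and `naive_summarizer` are hypothetical names, and the summarizer is a placeholder for what would be a model call in practice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    """Hypothetical stand-in for one conversation event."""
    role: str
    text: str


def compact(events: List[Event], keep_last: int,
            summarize: Callable[[List[Event]], str]) -> List[Event]:
    """Keep the last `keep_last` events verbatim; fold older ones into one summary event."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if len(events) <= keep_last:
        return list(events)  # nothing old enough to summarize
    older, recent = events[:-keep_last], events[-keep_last:]
    summary = Event(role="summary", text=summarize(older))
    return [summary] + recent


def naive_summarizer(older: List[Event]) -> str:
    # Placeholder only: a real summarizer would call a model here.
    return f"[summary of {len(older)} earlier events]"


history = [Event("user" if i % 2 == 0 else "model", f"turn {i}") for i in range(12)]
compacted = compact(history, keep_last=4, summarize=naive_summarizer)
print(len(history), "->", len(compacted))
```

The boundary between the summary event and the first verbatim event is the point worth spot-checking: anything the summarizer drops there is no longer visible to the model.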

3. Set per-agent token budgets

In TraptureIQ → Cost Control, set a monthly token budget for every production agent. Configure an alert at 80 % so you hear about overruns before the bill arrives.

Budget = expected daily turns × average tokens per turn × 30 days, then add 30 % headroom.
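As a worked example of the budget formula, the sketch below computes a monthly budget and the 80 % alert threshold. The function names and the traffic figures (2,000 turns per day, 1,500 tokens per turn) are illustrative assumptions, and "daily volume" is read here as turns per day.

```python
def monthly_token_budget(daily_turns: int, avg_tokens_per_turn: int,
                         headroom_pct: int = 30, days: int = 30) -> int:
    """Expected monthly tokens plus a percentage of headroom (integer arithmetic)."""
    base = daily_turns * avg_tokens_per_turn * days
    return base * (100 + headroom_pct) // 100


def alert_threshold(budget: int, alert_pct: int = 80) -> int:
    """Token count at which the budget alert should fire."""
    return budget * alert_pct // 100


# Illustrative numbers only; substitute your own traffic figures.
budget = monthly_token_budget(daily_turns=2_000, avg_tokens_per_turn=1_500)
print(budget)                   # 117000000
print(alert_threshold(budget))  # 93600000
```

With these inputs the base is 90 million tokens per month, the headroom brings the budget to 117 million, and the alert fires at 93.6 million.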

4. Use the smallest model that meets quality

Start every new agent on Gemini Flash. Move to Gemini Pro only if your evals show a measurable quality gap. The cost differential is large enough that "let's just use Pro to be safe" can 10× your bill for no user-visible benefit.

Run a head-to-head comparison via Custom Eval before defaulting to Pro.
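The decision rule can be written down before you look at the results, so the comparison is not argued after the fact. The sketch below assumes you have one aggregate eval score per model on a 0 to 1 scale. The function name and the 0.02 minimum gap are hypothetical choices and not part of Custom Eval.

```python
def pick_model(flash_score: float, pro_score: float, min_gap: float = 0.02) -> str:
    """Return "pro" only when Pro beats Flash by at least `min_gap`; otherwise "flash"."""
    return "pro" if pro_score - flash_score >= min_gap else "flash"


# Illustrative scores only.
print(pick_model(flash_score=0.91, pro_score=0.92))  # flash
print(pick_model(flash_score=0.80, pro_score=0.90))  # pro
```

Pick the threshold to match how much quality gain justifies the extra cost for your use case.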

5. Prune the system prompt

Most system prompts grow organically. Once a month, run a prompt cleanup:

  • Delete every # DO NOT... rule that hasn't fired in 90 days
  • Move long examples to a tool that returns them on demand
  • Cut every "you are a helpful assistant" boilerplate sentence

Use the Prompts module to version your changes so you can roll back.


Anti-patterns

  • One giant system prompt with everything: Bloats every request and resists caching. Split it into a stable, cacheable preamble and a small per-request section.
  • No budget set: You'll find out about runaway cost from your finance team, not your dashboards.
  • Defaulting to Gemini Pro: Often 5-10× more expensive without measurable quality gain. Default to Flash.
  • Storing conversation history in the system prompt: That's what sessions and memory are for. See Session & Memory.

Where to configure


References