reducing llm api costs by 40% with toon
practical guide to cutting llm token costs using toon format
token costs add up fast in production llm applications. this guide shows exactly how to reduce costs using toon format, with real numbers from production benchmarks.
the token cost problem
llm apis charge per token. typical pricing:
- claude: $3 per 1m input tokens
- gpt-4: $5 per 1m input tokens
- gemini: $0.35 per 1m input tokens
a single json object might use 10k tokens. multiply by 10,000 api calls per day:
- claude: 100m tokens/day = $300/day = $9,000/month
- gpt-4: 100m tokens/day = $500/day = $15,000/month
even at gemini's pricing, that's $1,000+/month. reducing tokens by 40% directly cuts these costs.
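the arithmetic is simple, but worth wiring into code so you can plug in your own volumes. a quick illustrative sketch (the helper name and rates come from the examples above, not from any library):

```typescript
// daily cost for a given token volume at a per-1m-token rate
function dailyCost(tokensPerDay: number, costPer1M: number): number {
  return (tokensPerDay / 1_000_000) * costPer1M;
}

const tokensPerDay = 10_000 * 10_000; // 10k tokens × 10,000 calls/day = 100m
console.log(dailyCost(tokensPerDay, 3));      // claude: 300 ($/day)
console.log(dailyCost(tokensPerDay, 3) * 30); // 9000 ($/month)
```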
measured savings: 39.6% token reduction
based on benchmarks across 209 data retrieval tasks:
json baseline:
- average tokens per query: 2,847
- monthly cost (100k queries): $855 (at $3/1m tokens)

toon optimized:
- average tokens per query: 1,719
- monthly cost (100k queries): $516
- savings: $339/month (39.6%)
that's $4,068 annually for a single application. multiply across multiple services and the impact scales.
cost breakdown by data type
uniform arrays (best case)
employee directory with 50 records:
```json
// json: 3,420 tokens
[
  {"id": 1, "name": "alice", "dept": "eng", "level": 5},
  {"id": 2, "name": "bob", "dept": "sales", "level": 3},
  // ... 48 more records
]
```

```toon
// toon: 1,890 tokens (-44.7%)
[50]{id,name,dept,level}:
  1,alice,eng,5
  2,bob,sales,3
  // ... 48 more records
```

cost savings: 44.7% on this query type.
time-series data
analytics data with 100 data points:
```json
// json: 5,230 tokens
{
  "metrics": [
    {"timestamp": "2025-01-01T00:00:00Z", "value": 42.3, "status": "ok"},
    // ... 99 more points
  ]
}
```

```toon
// toon: 2,980 tokens (-43.0%)
metrics[100]{timestamp,value,status}:
  2025-01-01T00:00:00Z,42.3,ok
  // ... 99 more points
```

cost savings: 43.0% on time-series queries.
api response data
product catalog with 30 items:
```json
// json: 4,890 tokens
{
  "products": [
    {"sku": "A101", "name": "widget", "price": 29.99, "stock": 150},
    // ... 29 more products
  ]
}
```

```toon
// toon: 2,750 tokens (-43.8%)
products[30]{sku,name,price,stock}:
  A101,widget,29.99,150
  // ... 29 more products
```

cost savings: 43.8% on catalog queries.
implementation guide
step 1: identify high-volume queries
audit your llm api calls. find queries that:
- run frequently (>100 times/day)
- include structured data
- have uniform field patterns
these are your optimization candidates.
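one way to run this audit is to tally token volume per query type at the call site. a minimal sketch, assuming the gpt-tokenizer package for counting (the helper names are illustrative):

```typescript
import { encode } from 'gpt-tokenizer';

// tally calls and tokens per query type so high-volume,
// high-payload candidates surface
const usage = new Map<string, { calls: number; tokens: number }>();

function recordUsage(queryType: string, payload: unknown) {
  const tokens = encode(JSON.stringify(payload)).length;
  const entry = usage.get(queryType) ?? { calls: 0, tokens: 0 };
  entry.calls += 1;
  entry.tokens += tokens;
  usage.set(queryType, entry);
}

// after a representative day of traffic, sort by total tokens
function topCandidates() {
  return [...usage.entries()]
    .sort((a, b) => b[1].tokens - a[1].tokens)
    .map(([type, { calls, tokens }]) => ({
      type,
      calls,
      avgTokens: Math.round(tokens / calls),
    }));
}
```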
step 2: measure baseline costs
before optimizing, measure current token usage:
```typescript
// token counting shown with the gpt-tokenizer package; any tokenizer
// matching your model works
import { encode } from 'gpt-tokenizer';

const jsonString = JSON.stringify(data);
const tokens = encode(jsonString).length;
console.log(`json tokens: ${tokens}`);
```
track for a week to establish baseline costs.
step 3: implement toon conversion
install the official library:
```bash
npm install @toon-format/toon
```

add conversion at the llm boundary:
```typescript
import { toToon } from '@toon-format/toon';

async function queryLLM(data: any) {
  const toonFormat = toToon(data);
  const prompt = `analyze this data: ${toonFormat}`;
  return await llm.complete(prompt);
}
```
step 4: measure new costs
compare token usage with toon:
```typescript
import { toToon } from '@toon-format/toon';

const toonString = toToon(data);
const tokens = encode(toonString).length; // same tokenizer as step 2
console.log(`toon tokens: ${tokens}`);
// jsonTokens is the baseline measured in step 2
console.log(`savings: ${((jsonTokens - tokens) / jsonTokens * 100).toFixed(1)}%`);
```
step 5: validate accuracy
toon should maintain or improve accuracy. test with your actual queries:
```typescript
// test suite
const testQueries = [
  "find all users with role admin",
  "calculate average order value",
  "list products under $50"
];

for (const query of testQueries) {
  const jsonResponse = await queryWithJSON(query);
  const toonResponse = await queryWithTOON(query);
  // strict equality suits structured outputs; for free-form text,
  // compare normalized or semantically equivalent responses instead
  assert(jsonResponse === toonResponse);
}
```
roi calculation
calculate your potential savings:
1. daily token usage: measure current tokens/day
2. token cost: your llm provider's rate per 1m tokens
3. expected reduction: 35-45% for uniform data
4. monthly savings: (tokens/day × 30 ÷ 1,000,000) × cost per 1m tokens × reduction
example: 50m tokens/day at $3/1m tokens with 40% reduction:
- current cost: $4,500/month
- new cost: $2,700/month
- savings: $1,800/month
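the same formula as a runnable sketch, with the example numbers plugged in (monthlySavings is a hypothetical helper):

```typescript
// monthly savings from token volume, provider rate per 1m tokens,
// and expected reduction (0.4 = 40%)
function monthlySavings(tokensPerDay: number, costPer1M: number, reduction: number): number {
  const monthlyCost = (tokensPerDay * 30 / 1_000_000) * costPer1M;
  return monthlyCost * reduction;
}

console.log(monthlySavings(50_000_000, 3, 0.4)); // 1800 ($/month)
```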
when savings are minimal
toon doesn't help with:
non-uniform data:

```json
[
  {"id": 1, "name": "alice"},
  {"id": 2, "name": "bob", "email": "bob@example.com"},
  {"id": 3, "name": "charlie", "phone": "555-0100"}
]
```
different fields per record = no tabular format = minimal savings.
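this case is easy to screen for before converting. a minimal sketch (isUniformArray is an illustrative helper, not part of the toon library):

```typescript
// returns true when every record shares the same field set, i.e. when
// toon's tabular format (and its savings) actually applies
function isUniformArray(data: unknown): boolean {
  if (!Array.isArray(data) || data.length === 0) return false;
  const keyOf = (item: unknown) =>
    item !== null && typeof item === 'object'
      ? JSON.stringify(Object.keys(item as object).sort())
      : null;
  const first = keyOf(data[0]);
  return first !== null && data.every((item) => keyOf(item) === first);
}
```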
deeply nested objects:

```json
{
  "config": {
    "database": {
      "connection": {
        "pool": {"min": 5, "max": 20}
      }
    }
  }
}
```
toon's indentation can use more tokens than json's compact nesting.
small payloads:
- <100 tokens: conversion overhead exceeds savings
- <500 tokens: savings too small to matter
- >1000 tokens: meaningful cost reduction
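these thresholds are straightforward to enforce, using the same tokenizer assumption as step 2 (worthConverting is a hypothetical helper):

```typescript
import { encode } from 'gpt-tokenizer';

// skip conversion for payloads below the ~1,000-token mark,
// where the savings don't cover the extra step
function worthConverting(data: unknown, minTokens = 1000): boolean {
  return encode(JSON.stringify(data)).length >= minTokens;
}
```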
production best practices
1. measure before optimizing
don't assume toon will help. measure actual token reduction on your data.
2. optimize high-impact endpoints first
target queries that are:
- high frequency (many calls per day)
- large payloads (many tokens)
- uniform structure (consistent fields)
3. maintain json as source of truth
keep json throughout your codebase. convert to toon only at llm boundary. this maintains compatibility while capturing savings.
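a sketch of what that boundary can look like, reusing the illustrative isUniformArray and worthConverting helpers from the earlier snippets:

```typescript
import { toToon } from '@toon-format/toon';

// json stays the internal representation; toon appears only in the
// prompt, with a json fallback when conversion won't pay off
function serializeForLLM(data: any): string {
  if (isUniformArray(data) && worthConverting(data)) {
    return toToon(data);
  }
  return JSON.stringify(data);
}
```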
4. monitor accuracy
track llm response quality. if toon degrades accuracy, the cost savings aren't worth it.
5. document the conversion
future developers need to understand why you're using a non-standard format. document the cost justification.
real-world results
production deployments report:
- e-commerce analytics: 42% token reduction, $2,100/month savings
- customer data pipeline: 38% reduction, $890/month savings
- financial reporting: 45% reduction, $3,400/month savings
the format pays for itself immediately. no complex optimization required - just a conversion function at your llm boundary.
conclusion
reducing llm costs doesn't require switching providers or downgrading models. for structured, uniform data, toon delivers 35-45% token reduction with maintained or improved accuracy.
implementation is straightforward: install the library, add one conversion call, measure savings. if you're spending $500+/month on llm tokens and passing uniform data, toon will cut that bill significantly.