optimization · Nov 20, 2025 · 8 min read

reducing llm api costs by 40% with toon

practical guide to cutting llm token costs using toon format

cost optimization · llm costs · token reduction · toon format · api efficiency · budget optimization


token costs add up fast in production llm applications. this guide shows exactly how to reduce costs using toon format, with real numbers from production benchmarks.

the token cost problem

llm apis charge per token. typical pricing:

- claude: $3 per 1m input tokens
- gpt-4: $5 per 1m input tokens
- gemini: $0.35 per 1m input tokens

a single json object might use 10k tokens. multiply by 10,000 api calls per day:

- claude: 100m tokens/day = $300/day = $9,000/month
- gpt-4: 100m tokens/day = $500/day = $15,000/month

even at gemini's pricing, that's $1,000+/month. reducing tokens by 40% directly cuts these costs.

measured savings: 39.6% token reduction

based on benchmarks across 209 data retrieval tasks:

json baseline:

- average tokens per query: 2,847
- monthly cost (100k queries): $855 (at $3/1m tokens)

toon optimized:

- average tokens per query: 1,719
- monthly cost (100k queries): $516
- savings: $339/month (39.6%)

that's $4,068 annually for a single application. multiply across multiple services and the impact scales.

cost breakdown by data type

uniform arrays (best case)

employee directory with 50 records:

```json
// json: 3,420 tokens
[
  {"id": 1, "name": "alice", "dept": "eng", "level": 5},
  {"id": 2, "name": "bob", "dept": "sales", "level": 3},
  // ... 48 more records
]
```

```toon
// toon: 1,890 tokens (-44.7%)
[50]{id,name,dept,level}:
  1,alice,eng,5
  2,bob,sales,3
  // ... 48 more records
```

cost savings: 44.7% on this query type.

time-series data

analytics data with 100 data points:

```json
// json: 5,230 tokens
{
  "metrics": [
    {"timestamp": "2025-01-01T00:00:00Z", "value": 42.3, "status": "ok"},
    // ... 99 more points
  ]
}
```

```toon
// toon: 2,980 tokens (-43.0%)
metrics[100]{timestamp,value,status}:
  2025-01-01T00:00:00Z,42.3,ok
  // ... 99 more points
```

cost savings: 43.0% on time-series queries.

api response data

product catalog with 30 items:

```json
// json: 4,890 tokens
{
  "products": [
    {"sku": "A101", "name": "widget", "price": 29.99, "stock": 150},
    // ... 29 more products
  ]
}
```

```toon
// toon: 2,750 tokens (-43.8%)
products[30]{sku,name,price,stock}:
  A101,widget,29.99,150
  // ... 29 more products
```

cost savings: 43.8% on catalog queries.

implementation guide

step 1: identify high-volume queries

audit your llm api calls. find queries that:

- run frequently (>100 times/day)
- include structured data
- have uniform field patterns

these are your optimization candidates.
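
one way to run this audit is to route prompts through a helper that counts tokens per query type. a minimal sketch, assuming the gpt-tokenizer package for counting; the helper names and in-memory map are illustrative, not part of toon:

```ts
import { encode } from 'gpt-tokenizer';

// in-memory audit: call count and prompt tokens per query type
const audit = new Map<string, { calls: number; tokens: number }>();

function recordCall(queryType: string, prompt: string) {
  const entry = audit.get(queryType) ?? { calls: 0, tokens: 0 };
  entry.calls += 1;
  entry.tokens += encode(prompt).length;
  audit.set(queryType, entry);
}

// after a day or two of traffic, rank query types by total tokens
function topCandidates() {
  return [...audit.entries()]
    .sort(([, a], [, b]) => b.tokens - a.tokens)
    .map(([type, stats]) => ({ type, ...stats }));
}
```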

step 2: measure baseline costs

before optimizing, measure current token usage:

```ts
// token counting shown with the gpt-tokenizer package as one example;
// use whichever tokenizer matches your model
import { encode } from 'gpt-tokenizer';

const jsonString = JSON.stringify(data);
const jsonTokens = encode(jsonString).length;
console.log(`json tokens: ${jsonTokens}`);
```

track for a week to establish baseline costs.

step 3: implement toon conversion

install the official library:

```bash
npm install @toon-format/toon
```

add conversion at the llm boundary:

```ts
import { toToon } from '@toon-format/toon';

async function queryLLM(data: any) {
  const toonFormat = toToon(data);
  const prompt = `analyze this data: ${toonFormat}`;

  return await llm.complete(prompt);
}
```

step 4: measure new costs

compare token usage with toon:

```ts
import { toToon } from '@toon-format/toon';
import { encode } from 'gpt-tokenizer'; // same tokenizer as the baseline measurement

const toonString = toToon(data);
const tokens = encode(toonString).length;
console.log(`toon tokens: ${tokens}`);
console.log(`savings: ${((jsonTokens - tokens) / jsonTokens * 100).toFixed(1)}%`);
```

step 5: validate accuracy

toon should maintain or improve accuracy. test with your actual queries:

```ts
import assert from 'node:assert';

// test suite: the same query should produce the same answer from json and toon inputs
const testQueries = [
  "find all users with role admin",
  "calculate average order value",
  "list products under $50"
];

for (const query of testQueries) {
  const jsonResponse = await queryWithJSON(query);
  const toonResponse = await queryWithTOON(query);

  assert(jsonResponse === toonResponse);
}
```

roi calculation

calculate your potential savings:

1. daily token usage: measure current tokens/day
2. token cost: your llm provider's rate
3. expected reduction: 35-45% for uniform data
4. monthly savings: tokens/day × 30 × cost × reduction%

example: 50m tokens/day at $3/1m tokens with 40% reduction:

- current cost: $4,500/month
- new cost: $2,700/month
- savings: $1,800/month
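
the same arithmetic as a small helper, so you can plug in your own numbers; the function name and shape are illustrative:

```ts
// estimate monthly savings from a token reduction
// tokensPerDay: current input tokens per day
// costPer1M: provider price per 1m input tokens (usd)
// reduction: expected fraction saved, e.g. 0.4 for 40%
function monthlySavings(tokensPerDay: number, costPer1M: number, reduction: number) {
  const monthlyCost = (tokensPerDay * 30 / 1_000_000) * costPer1M;
  return {
    currentCost: monthlyCost,
    newCost: monthlyCost * (1 - reduction),
    savings: monthlyCost * reduction,
  };
}

// 50m tokens/day at $3/1m with a 40% reduction -> $4,500 / $2,700 / $1,800
console.log(monthlySavings(50_000_000, 3, 0.4));
```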

when savings are minimal

toon doesn't help with:

non-uniform data:

```json
[
  {"id": 1, "name": "alice"},
  {"id": 2, "name": "bob", "email": "bob@example.com"},
  {"id": 3, "name": "charlie", "phone": "555-0100"}
]
```

different fields per record = no tabular format = minimal savings.

deeply nested objects:

```json
{
  "config": {
    "database": {
      "connection": {
        "pool": {"min": 5, "max": 20}
      }
    }
  }
}
```

toon's indentation can use more tokens than json's compact nesting.

small payloads:

- <100 tokens: conversion overhead exceeds savings
- <500 tokens: savings too small to matter
- >1000 tokens: meaningful cost reduction
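
a possible guard for the non-uniform and small-payload cases, assuming the gpt-tokenizer count used earlier; the 1,000-token threshold and function name are illustrative:

```ts
import { encode } from 'gpt-tokenizer';

// convert to toon only when the payload is big enough and the records are uniform
function shouldUseToon(data: unknown): boolean {
  const jsonTokens = encode(JSON.stringify(data)).length;
  if (jsonTokens < 1000) return false; // small payload: savings too small to matter

  if (!Array.isArray(data) || data.length === 0) return false;

  // uniform = every record exposes the same set of fields
  const firstKeys = JSON.stringify(Object.keys(data[0] as object).sort());
  return data.every(
    (record) => JSON.stringify(Object.keys(record as object).sort()) === firstKeys
  );
}
```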

production best practices

1. measure before optimizing

don't assume toon will help. measure actual token reduction on your data.

2. optimize high-impact endpoints first

target queries that are:

- high frequency (many calls per day)
- large payloads (many tokens)
- uniform structure (consistent fields)

3. maintain json as source of truth

keep json throughout your codebase. convert to toon only at llm boundary. this maintains compatibility while capturing savings.
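
one way to keep that boundary thin is a single prompt builder that reuses the toToon call from step 3; the buildPrompt helper below is illustrative, not part of the library:

```ts
import { toToon } from '@toon-format/toon';

// the rest of the codebase passes plain objects around as json;
// only this prompt builder knows the data is serialized as toon
function buildPrompt(instruction: string, data: unknown): string {
  return `${instruction}\n\n${toToon(data)}`;
}
```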

4. monitor accuracy

track llm response quality. if toon degrades accuracy, the cost savings aren't worth it.

5. document the conversion

future developers need to understand why you're using a non-standard format. document the cost justification.

real-world results

production deployments report:

- e-commerce analytics: 42% token reduction, $2,100/month savings
- customer data pipeline: 38% reduction, $890/month savings
- financial reporting: 45% reduction, $3,400/month savings

the format pays for itself immediately. no complex optimization required - just a conversion function at your llm boundary.

conclusion

reducing llm costs doesn't require switching providers or downgrading models. for structured, uniform data, toon delivers 35-45% token reduction with maintained or improved accuracy.

implementation is straightforward: install the library, add one conversion call, measure savings. if you're spending $500+/month on llm tokens and passing uniform data, toon will cut that bill significantly.
