AI

AI cost optimization: how to cut your OpenAI bill 60% without losing quality

Prompt caching, model routing, structured outputs and tiered pricing. The four levers that knocked $4,200/mo off a client bill last quarter.

LUMIENUpdated 2 min read
AI cost optimization: how to cut your OpenAI bill 60% without losing quality

An AI bill that doubled in three months is usually a routing problem, not a usage problem. Here are the four levers that move the needle, in order of effort-to-reward.

1. Prompt caching

If your system prompt is over 1k tokens and you call the model more than 50 times per hour, enable prompt caching. Anthropic and OpenAI both cache prefix tokens at 90% discount. On one client this alone cut the bill 38%.

2. Model routing by job

Not every request needs the flagship model. Route extraction and classification to Haiku 4.5 or gpt-4o-mini, route reasoning to Sonnet 4.6 or GPT-5.5. Our router averages 4-to-1 cheaper calls without quality regression on the cheap-model jobs.

3. Structured output instead of free text

When you need 12 fields, ask for 12 fields with response_format=json_schema. Free-text responses with parsing burn 3-5x more tokens to get the same data and the parser breaks weekly.

4. Batch what is not realtime

OpenAI batch API is 50% off list and runs within 24 hours. Lead enrichment, content drafts, weekly classification jobs all belong here.

The example

A B2B SaaS client was at $7,100/mo in March. We applied all four: April bill $2,900, May tracking similar. Quality scores unchanged. Engineering time to implement: 2 weeks.

If your AI bill grew faster than your usage, our AI integration audit finds the leaks. Send us your last 3 invoices for a 30-minute review.

More from AI