How it works

How the numbers are made

A plain-language tour of where the prices come from and how every cost figure on the site is calculated.

Where prices come from

Model prices are vendor list rates, synced automatically from a public catalog about once an hour. We store a new entry only when a price actually moves, which builds the price-change history behind the models pages and the “latest price changes” feed. Everything is quoted per 1,000,000 tokens.
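To make the "store only when a price moves" rule concrete, here is a minimal sketch. The `PricePoint` fields and the `record_sync` helper are hypothetical names chosen for illustration, not the site's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricePoint:
    synced_at: str          # ISO timestamp of the sync that observed this price
    input_per_mtok: float   # price per 1,000,000 input tokens
    output_per_mtok: float  # price per 1,000,000 output tokens


def record_sync(history: list[PricePoint], latest: PricePoint) -> bool:
    """Append `latest` only if its rates differ from the last stored entry.

    Returns True when a new history entry was stored.
    """
    if history:
        last = history[-1]
        unchanged = (last.input_per_mtok, last.output_per_mtok) == (
            latest.input_per_mtok,
            latest.output_per_mtok,
        )
        if unchanged:
            return False  # price did not move: skip this sync
    history.append(latest)
    return True
```

Under this sketch an hourly sync that finds the same rates leaves the history untouched, so each stored entry corresponds to one real price change.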

Counting tokens

Models bill by the token, not the word, so the first step is turning your text into a token count. Two modes:

Exact (the default) runs a real tokenizer in your browser — accurate for GPT-style models, within a few percent for other vendors.
Approximate uses the standard rule of thumb of about 4 characters per token — instant, with no download.
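
The approximate mode can be sketched in a few lines. The 4-characters-per-token figure comes from the rule of thumb above; rounding up and the function name `approx_tokens` are assumptions made for this example.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the text above


def approx_tokens(text: str) -> int:
    """Estimate a token count from character length alone.

    Rounds up so that any non-empty text counts as at least one token
    (an assumption of this sketch).
    """
    return math.ceil(len(text) / CHARS_PER_TOKEN)
```

The exact mode replaces this arithmetic with a real tokenizer, which is why it needs a download and the approximate mode does not.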

For files, you don’t upload anything: you describe a file by type and size, and we estimate how many tokens its extracted text would contain using per-format density heuristics (a PDF yields fewer tokens per KB than plain text, and so on).
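
A rough sketch of how a per-format heuristic might look follows. The density values in `TEXT_FRACTION` are placeholders invented for illustration, not the site's real figures, and the sketch assumes one byte per character and the 4-characters-per-token rule of thumb.

```python
import math

CHARS_PER_TOKEN = 4  # same rule of thumb as the approximate mode

# HYPOTHETICAL densities: the share of a file's bytes assumed to survive
# as extractable text. Real values would need to be measured per format.
TEXT_FRACTION = {"txt": 1.0, "docx": 0.5, "pdf": 0.25}


def estimate_file_tokens(file_type: str, size_kb: float) -> int:
    """Estimate tokens for a file described only by its type and size."""
    try:
        fraction = TEXT_FRACTION[file_type.lower()]
    except KeyError:
        raise ValueError(f"no density heuristic for file type {file_type!r}") from None
    text_chars = size_kb * 1024 * fraction  # assumes 1 byte per character
    return math.ceil(text_chars / CHARS_PER_TOKEN)
```

With these placeholder values, a PDF of a given size yields a quarter of the tokens of a plain-text file of the same size, which matches the direction of the claim above but not necessarily its magnitude.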

Estimating cost

The math is deliberately simple: for each model we multiply your input, output, and cached token counts by that model’s respective per-million-token rates, add the three results, and scale by your monthly volume. Because every model’s prices are already loaded in your browser, the figures recalculate instantly as you type, with no server round-trip, and nothing you enter is sent anywhere.
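
The calculation described above can be sketched as follows. This assumes that "monthly volume" means requests per month and that cached tokens are counted separately from full-price input tokens; the `Rates` type and `monthly_cost` function are illustrative names.

```python
from dataclasses import dataclass

TOKENS_PER_PRICE_UNIT = 1_000_000  # all rates are quoted per 1,000,000 tokens


@dataclass(frozen=True)
class Rates:
    input: float   # price per 1M input tokens
    output: float  # price per 1M output tokens
    cached: float  # price per 1M cached (cache-read) tokens


def monthly_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int,
    requests_per_month: int,
) -> float:
    """Cost of one request at list rates, scaled by monthly volume."""
    per_request = (
        input_tokens * rates.input
        + output_tokens * rates.output
        + cached_tokens * rates.cached
    ) / TOKENS_PER_PRICE_UNIT
    return per_request * requests_per_month
```

For example, with rates of 3.00, 15.00, and 0.30 per million tokens, a request with 1,000 input, 500 output, and 2,000 cached tokens costs about 0.0111, or roughly 111 per month at 10,000 requests.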

Prompt caching & multi-turn conversations

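LLM APIs are stateless: every turn of a conversation resends the entire history so the model “remembers” it. The input for each turn therefore grows with the length of the chat, and the total input tokens billed over the whole conversation grow roughly quadratically with the number of turns, which is a real source of surprise bills.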
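
A short sketch shows where the quadratic growth comes from. It assumes every turn adds a user message and a model reply of fixed size; the function name and parameters are illustrative.

```python
def stateless_input_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens billed over a conversation with no caching.

    Each turn resends the whole history plus the new user message.
    """
    history = 0  # tokens accumulated from earlier turns
    total = 0
    for _ in range(turns):
        total += history + user_tokens          # full resend on every turn
        history += user_tokens + reply_tokens   # this turn joins the history
    return total
```

With 100-token messages and 200-token replies, 10 turns bill 14,500 input tokens and 20 turns bill 59,000: doubling the length of the chat roughly quadruples the input bill.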

Prompt caching softens this. Providers that support it store the conversation prefix, so on the next turn the repeated history is billed at a much cheaper cache-read rate and only the new message pays the full input rate (plus a one-time cache-write charge the first time each segment is seen). The simulator models all three components, and a toggle lets you compare the cached best case against the stateless worst case. Models without cache pricing simply pay the full input rate for the resend, so a non-caching model shows its steeper bill side by side with one that caches.
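
The three components can be sketched as below. This is a simplified model under stated assumptions, not the simulator's actual code: only the new user message is charged the cache-write rate, replies join the history without a write charge, output tokens are left out, and a model without a cache-read rate falls back to the stateless path. Real providers differ in how they bill cache writes.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class CacheRates:
    input: float                        # full input rate per 1M tokens
    cache_read: Optional[float] = None  # None means no cache pricing
    cache_write: Optional[float] = None


def conversation_input_cost(
    rates: CacheRates,
    turns: int,
    user_tokens: int,
    reply_tokens: int,
    use_cache: bool = True,
) -> float:
    """Input-side cost of a whole conversation, cached or stateless."""
    cached = use_cache and rates.cache_read is not None
    write_rate = rates.cache_write or 0.0
    history = 0
    cost = 0.0
    for _ in range(turns):
        if cached:
            cost += history * rates.cache_read  # repeated history: cache read
            cost += user_tokens * rates.input   # new message: full input rate
            cost += user_tokens * write_rate    # one-time write (assumed: new message only)
        else:
            cost += (history + user_tokens) * rates.input  # full resend
        history += user_tokens + reply_tokens
    return cost / TOKENS_PER_PRICE_UNIT
```

Calling the function twice with `use_cache=True` and `use_cache=False` mirrors the best-case versus worst-case toggle described above.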

Picking a model

The chooser and the simulator’s recommendation blend two signals: an editorial 0–100 quality rating and the projected cost for your workload. Lean toward quality, cost, or a balance, and the pick (plus an alternative and its trade-off) updates accordingly — so you get a recommendation, not just a wall of numbers.
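
One plausible way to blend the two signals is a weighted score, sketched below. The formula is an assumption made for illustration (the site's actual weighting is not described here): cost is rescaled to 0–100 so that the cheapest model scores 100, then mixed with the quality rating using a `quality_weight` between 0 and 1.

```python
def rank_models(
    models: list[tuple[str, float, float]],
    quality_weight: float,
) -> list[str]:
    """Rank models by a blend of quality and cost, best first.

    models: (name, quality rating 0-100, projected monthly cost) tuples.
    quality_weight: 1.0 leans fully toward quality, 0.0 fully toward cost.
    """
    cheapest = min(cost for _, _, cost in models)

    def score(model: tuple[str, float, float]) -> float:
        _, quality, cost = model
        # Hypothetical cost score: the cheapest model gets 100, pricier ones less.
        cost_score = 100.0 * cheapest / cost if cost > 0 else 100.0
        return quality_weight * quality + (1.0 - quality_weight) * cost_score

    return [name for name, _, _ in sorted(models, key=score, reverse=True)]
```

In this sketch the first name in the result plays the role of the pick and the second the alternative; moving `quality_weight` toward 0 or 1 corresponds to leaning toward cost or quality.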

A note on accuracy

Token estimates and list-rate prices are planning tools, not invoices. Real bills vary with exact tokenizers, caching behavior, request mixes, and provider discounts. Always confirm current pricing with the provider before committing — see the terms for the full disclaimer.

Questions or a pricing correction? Contact us at contact@chooseaimodel.com.