★ Featured · Flash: fastest and cheapest, lighter reasoning · Open Weights

Llama 3.3 70B Pricing

Meta · llama-3-3-70b


Input / 1M tokens

$0.130

Output / 1M tokens

$0.400

Context window

131,072 tokens

≈ 175 pages of text

Max output

128k tokens

What it costs in practice

Typical request (1,200 input + 400 output tokens): $0.0003

1,000 requests / month: $0.316/mo
10,000 requests / month: $3.16/mo
100,000 requests / month: $31.60/mo
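The per-request and monthly figures above follow directly from the listed per-token rates. A minimal sketch of that arithmetic, assuming the list prices on this page ($0.130 input, $0.400 output per 1M tokens) and no caching, batching, or provider discounts; the function names `request_cost` and `monthly_cost` are illustrative, not part of any API:

```python
from decimal import Decimal

# List prices from this page, in USD per 1M tokens (assumed to be current).
INPUT_PRICE_PER_M = Decimal("0.130")
OUTPUT_PRICE_PER_M = Decimal("0.400")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request at the list prices above."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_M


def monthly_cost(requests: int, input_tokens: int = 1200, output_tokens: int = 400) -> Decimal:
    """Cost in USD of `requests` identical requests per month (hypothetical helper)."""
    return requests * request_cost(input_tokens, output_tokens)


if __name__ == "__main__":
    # Reproduce the page's example volumes with the typical 1,200 in / 400 out request.
    for volume in (1_000, 10_000, 100_000):
        print(f"{volume:>7,} requests / month: ${monthly_cost(volume):.3f}")
```

The typical request works out to $0.000316 before rounding, which is why the page shows $0.0003 per request but $0.316 for 1,000 requests. Swap in your own token counts to estimate a different workload.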

Estimate your own workload: the calculator opens with Llama 3.3 70B preloaded; adjust volume and token counts there.

Price history

List price per 1M tokens, tracked since Jun 2026. Each step in the chart marks a price change.

What it excels at

Mature, well-supported open-weight model for general chat and lightweight agents — wide availability across inference providers keeps pricing competitive.

The business tradeoff

Falls behind newer flagship models on complex reasoning and coding — best suited to well-scoped, repetitive tasks.

Cheaper from Meta

Llama Guard 4 12B · Meta · $0.0003 / typical request
Llama 3.2 3B Instruct · Meta · $0.0002 / typical request

Same tier from other providers

GPT-5.4 Nano · OpenAI · $0.0007 / typical request
Claude Haiku 4.5 · Anthropic · $0.0032 / typical request
Gemini 3.5 Flash · Google · $0.0054 / typical request

Head-to-head

Llama 3.3 70B vs GPT-5.4 Nano
Llama 3.3 70B vs Claude Haiku 4.5
Llama 3.3 70B vs Gemini 3.5 Flash

Vendor list rates, as of Jul 29, 2026 · source: openrouter · per-request examples assume 1,200 input + 400 output tokens.