
LLM API Cost Calculator

Estimate the cost of GPT, Claude, Gemini, Grok, DeepSeek, Llama, and Mistral API workloads. Compare input, output, cached prompt, batch, monthly, yearly, and blended token costs in one place.

Pricing disclaimer: Prices are local planning data last reviewed April 27, 2026. Provider pricing changes often, so verify official pages before production budgeting.

Workload inputs

Use-case presets

Choose models to compare

Cost comparison

Comparison table columns: Model, Input, Output, Request, Day, Month, Year, Blended / 1M, Cache saved, Context used

Why this calculator is useful

Budget before launch

Estimate support chatbot, RAG, coding agent, summarization, and classification costs before you commit to a provider or model tier.

Compare the real workload mix

Input and output tokens often have different prices. Blended cost shows the effective price for your actual ratio, not a generic model headline price.
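As a minimal sketch of that arithmetic, assuming per-1M-token prices and a token mix you supply (the $3 / $15 prices below are illustrative, not live pricing for any model):

```ts
// Blended cost: one effective price per 1M tokens for a given input/output mix.
// Prices are per 1M tokens; token counts are per request.
function blendedCostPer1M(
  inputTokens: number,
  outputTokens: number,
  inputPricePer1M: number,
  outputPricePer1M: number,
): number {
  const cost =
    (inputTokens / 1_000_000) * inputPricePer1M +
    (outputTokens / 1_000_000) * outputPricePer1M;
  const totalTokens = inputTokens + outputTokens;
  return (cost / totalTokens) * 1_000_000;
}

// Example: 800 input + 200 output tokens at illustrative $3 / $15 per 1M.
console.log(blendedCostPer1M(800, 200, 3, 15)); // ≈ 5.4 dollars per 1M blended
```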

Model caching savings

Repeated system prompts, policies, examples, and reference material can qualify for cached input rates on some providers. Use the cache slider to test the savings.
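A rough sketch of what the cache slider models, assuming the provider bills the repeated share of the prompt at a lower cached-input rate (the 50% cached share and the $3 / $0.75 rates below are placeholders, not any provider's actual pricing):

```ts
// Savings from cached input: the cached share of the prompt is billed at the
// lower cached-input rate instead of the normal input rate.
function cacheSavingsPerRequest(
  inputTokens: number,
  cachedShare: number, // 0..1, fraction of input tokens served from cache
  inputPricePer1M: number,
  cachedPricePer1M: number,
): number {
  const cachedTokens = inputTokens * cachedShare;
  return (cachedTokens / 1_000_000) * (inputPricePer1M - cachedPricePer1M);
}

// Example: 4,000-token prompt, half of it cached, $3 vs $0.75 per 1M (illustrative).
console.log(cacheSavingsPerRequest(4_000, 0.5, 3, 0.75)); // 0.0045 saved per request
```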

Catch context risks

The table flags how much of each model's context window your request uses, so you can avoid oversized prompts or choose a larger-context model.
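A sketch of that check, assuming context windows are measured in tokens; the 80% warning threshold is an illustrative choice, not necessarily the exact rule this table applies:

```ts
// Flag requests that use most of a model's context window.
function contextUsage(requestTokens: number, contextWindow: number) {
  const used = requestTokens / contextWindow; // fraction of the window consumed
  return {
    usedPercent: Math.round(used * 1000) / 10, // e.g. 62.5
    nearLimit: used >= 0.8,                    // illustrative 80% warning threshold
  };
}

// Example: a 100k-token request against a 128k-token window.
console.log(contextUsage(100_000, 128_000)); // { usedPercent: 78.1, nearLimit: false }
```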

How to use this calculator

  1. Estimate tokens per request. Use the token counter for exact text, or start from a preset if you only need a quick budget.
  2. Enter traffic volume. Requests per day and billing days determine the monthly and yearly projection (the sketch after this list shows the arithmetic).
  3. Adjust cache and batch settings. Use cached input when repeated prompt material is likely. Use batch pricing for offline jobs where latency is not important.
  4. Select realistic models. Compare up to six models you can use in your app, then sort visually by monthly spend.
  5. Export or share. Copy the share URL for teammates or download a CSV for a spreadsheet budget.
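The volume and batch steps reduce to straightforward multiplication. A sketch, assuming you have already computed a per-request cost and that your provider's batch discount can be expressed as a simple multiplier (the 0.5 below mirrors a common 50% batch discount, but confirm your provider's terms):

```ts
// Project a per-request cost across traffic volume and an optional batch discount.
function projectSpend(
  costPerRequest: number,
  requestsPerDay: number,
  billingDaysPerMonth: number,
  batchMultiplier = 1, // e.g. 0.5 for a 50% batch discount, 1 for real-time traffic
) {
  const day = costPerRequest * requestsPerDay * batchMultiplier;
  const month = day * billingDaysPerMonth;
  return { day, month, year: month * 12 };
}

// Example: $0.0054 per request, 10,000 requests/day, 30 billing days, batch pricing.
console.log(projectSpend(0.0054, 10_000, 30, 0.5));
// { day: 27, month: 810, year: 9720 }
```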

Pricing sources

These calculations use local pricing metadata so the page works offline. Always verify the exact model, region, caching rules, batch discounts, and long-context thresholds on official provider pages before relying on estimates.

Frequently asked questions

What is a token?

A token is the billing unit most LLM APIs use. It can be a word, word fragment, punctuation mark, or code symbol.

Why are output tokens more expensive?

Output tokens are generated one at a time, so the provider runs the model for every token in the response, while input tokens are processed together in a single pass. That extra serving cost is why output rates are usually several times higher than input rates.

What is cached input pricing?

Cached input pricing is a lower rate for repeated prompt content that the provider can reuse. It is useful for long system prompts, examples, policy text, and repeated reference material.

What is blended cost per 1M tokens?

Blended cost combines input, cached input, and output token prices into one effective price for your entered workload.
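For example, with illustrative prices of $3 per 1M input tokens, $0.75 per 1M cached input tokens, and $15 per 1M output tokens, a request with 1,000 fresh input tokens, 3,000 cached input tokens, and 400 output tokens costs $0.01125, which works out to roughly $2.56 per 1M tokens blended.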

How accurate are these numbers?

The formulas are exact for the local prices, but provider pricing changes. Treat results as planning estimates and verify official pricing before production decisions.
