Budget before launch
Estimate support chatbot, RAG, coding agent, summarization, and classification costs before you commit to a provider or model tier.
Estimate the cost of GPT, Claude, Gemini, Grok, DeepSeek, Llama, and Mistral API workloads. Compare input, output, cached prompt, batch, monthly, yearly, and blended token costs in one place.
| Model | Input | Output | Request | Day | Month | Year | Blended / 1M | Cache saved | Context used |
|---|---|---|---|---|---|---|---|---|---|
Input and output tokens are usually priced differently. Blended cost shows the effective price for your actual input-to-output ratio, not the model's headline per-token price.
Repeated system prompts, policies, examples, and reference material can qualify for cached input rates on some providers. Use the cache slider to test the savings.
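The cache savings behind the slider reduce to simple arithmetic. This is a minimal sketch, assuming illustrative prices in USD per 1M tokens; the function name and parameters are hypothetical, not any provider's API:

```python
def input_cost_with_cache(total_input_tokens: int, cached_fraction: float,
                          fresh_price_per_m: float, cached_price_per_m: float) -> float:
    """Cost in USD of input tokens when a fraction hits the cached rate.

    Prices are USD per 1M tokens. Assumes the provider bills cached tokens
    at a flat discounted rate, which varies by provider in practice.
    """
    cached_tokens = total_input_tokens * cached_fraction
    fresh_tokens = total_input_tokens - cached_tokens
    return (fresh_tokens * fresh_price_per_m
            + cached_tokens * cached_price_per_m) / 1_000_000

# Example: 2M input tokens/month, 60% cacheable, $3.00 fresh vs $0.30 cached per 1M
print(input_cost_with_cache(2_000_000, 0.6, 3.00, 0.30))  # 2.76
```

Without caching the same workload would cost $6.00, so a 60% cache hit rate cuts the input bill by more than half at a 10x cache discount.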
The table flags how much of each model's context window your request uses, so you can avoid oversized prompts or choose a larger-context model.
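The context-used figure is the ratio of your request size to the model's window. A minimal sketch with a made-up 200K-token window; remember to count both the prompt and the output tokens you reserve:

```python
def context_used(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> float:
    """Fraction of a model's context window one request consumes.

    Includes reserved output, since most APIs require prompt + max output
    to fit inside the window together.
    """
    return (prompt_tokens + max_output_tokens) / context_window

# 120K-token prompt plus 8K reserved output against a 200K window
print(f"{context_used(120_000, 8_000, 200_000):.0%}")  # 64%
```

A value near 100% is a signal to trim the prompt or move to a larger-context model before truncation errors appear.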
These calculations use local pricing metadata so the page works offline. Always verify the exact model, region, caching rules, batch discounts, and long-context thresholds on official provider pages before relying on estimates.
A token is the billing unit most LLM APIs use. It can be a word, word fragment, punctuation mark, or code symbol.
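For rough budgeting before you have real traffic, a common rule of thumb is about four characters per token for English text. This is only a planning heuristic; real tokenizers vary by model, so use the provider's tokenizer for accurate counts:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough planning estimate: ~4 characters per token for English.

    Actual tokenization is model-specific (BPE, SentencePiece, etc.) and can
    differ substantially for code, non-English text, and dense punctuation.
    """
    return max(1, len(text) // 4)

print(rough_token_estimate("Summarize this support ticket in two sentences."))
```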
Output tokens are generated one at a time while the model runs, so serving them typically costs the provider more than reading input, and output rates are priced accordingly.
Cached input pricing is a lower rate for repeated prompt content that the provider can reuse. It is useful for long system prompts, examples, policy text, and repeated reference material.
Blended cost combines input, cached input, and output token prices into one effective price for your entered workload.
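The blended figure is a token-weighted average of the three rates. A minimal sketch with illustrative prices; the function name is hypothetical:

```python
def blended_price_per_m(fresh_in: int, cached_in: int, out: int,
                        fresh_price: float, cached_price: float,
                        out_price: float) -> float:
    """Effective USD per 1M tokens across fresh input, cached input, and output.

    All prices are USD per 1M tokens of their respective type.
    """
    total_cost = (fresh_in * fresh_price
                  + cached_in * cached_price
                  + out * out_price) / 1_000_000
    total_tokens = fresh_in + cached_in + out
    return total_cost / total_tokens * 1_000_000

# 700K fresh input at $3, 200K cached at $0.30, 100K output at $15
print(blended_price_per_m(700_000, 200_000, 100_000, 3.00, 0.30, 15.00))  # 3.66
```

Here the blended rate of $3.66 per 1M tokens sits well above the $3.00 input price because expensive output tokens pull the average up, even at a 7:1 input-to-output ratio.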
The formulas are exact for the local prices, but provider pricing changes. Treat results as planning estimates and verify official pricing before production decisions.