LLM API Pricing Comparison 2026: Choose by Workload Cost, Not Token Price

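As of July 2, 2026, the cheapest LLM API cannot be determined from the input token price alone. What matters is the total cost of getting acceptable output for the same workload: input, cached input, output, tool/search fees, router fees, retries, and manual review. Developers need more than a price table; they need the price owner, the call route, the date checked, the unit, and the conditions that change the bill. Separating official direct APIs, hosted open-model services such as Groq, and router economics such as OpenRouter up front shows which contracts are actually comparable, even when each is described as a "cheap API".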

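Price lane	When to use it	How to read the current evidence
Official direct API	You need official support, billing, data routing, and model terms	For the OpenAI, Anthropic, Gemini, DeepSeek, Mistral, and xAI direct rows, each provider's official page is the only owner.
Hosted open-model API	You want cheap open-weight models without running your own GPUs	Groq-hosted GPT OSS, Llama, and Qwen rows are Groq route prices, not the model author's official API prices.
Router / marketplace	You want switching, fallback, and comparison behind one key	OpenRouter-style rows reflect router economics, including the platform fee, request limits, and routing behavior.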

Start with this formula.

monthly API cost = uncached input + cached input + output + route/tool/search/request fees + retry overhead - batch/cache savings

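The cheapest candidate differs across bulk extraction, FAQ bots, coding agents, long-document analysis, regulated work, and offline batch. Before production, recheck availability, preview status, cache/batch terms, Free Tier limits, data residency, router fees, and deprecations.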

Official direct API price snapshot

The rows below are an owner-labeled starting point as of 2026-07-02, not a permanent ranking. Prices are in USD per 1M tokens unless noted otherwise.

Owner / route	2026-07-02 checked row	Input	Cached input	Output	Caveat
OpenAI direct API, Standard	`gpt-5.4-nano`	$0.20	$0.02	$1.25	Standard, Batch, Flex, and Priority are separate pricing schedules. Regional processing endpoints may carry a surcharge.
OpenAI direct API, Standard	`gpt-5.4-mini`	$0.75	$0.075	$4.50	A candidate for a low-cost OpenAI route, but not the cheapest option for every workload.
OpenAI direct API, Standard	`gpt-5.5`	$5.00	$0.50	$30.00	Shortlist it only for workloads where the quality gain pays back the output cost.
Anthropic direct API	Claude Sonnet 5 intro row	$2.00	depends on cache route	$10.00	Intro pricing runs through 2026-08-31; after that it is $3.00 input and $15.00 output.
Anthropic direct API	Claude Haiku 4.5	$1.00	depends on cache route	$5.00	Cache writes, cache hits, Batch, Fast mode, and data residency all change the bill.
Google Gemini Developer API	`gemini-3.1-flash-lite`, Paid Tier Standard	$0.25 text/image/video, $0.50 audio	$0.025 text/image/video, $0.05 audio	$1.50	Free Tier is for evaluation. For production, assess the paid project and data terms separately.
Google Gemini Developer API	`gemini-3.5-flash`, Paid Tier Standard	$1.50	$0.15	$9.00	Grounding with Google Search or Maps adds query fees once usage exceeds the included allowance.
Google Gemini Developer API	`gemini-3.1-pro-preview`, Paid Tier Standard	$2.00 <= 200k, $4.00 > 200k	$0.20 <= 200k, $0.40 > 200k	$12.00 <= 200k, $18.00 > 200k	The rate changes at the prompt-length threshold, and the preview status also needs rechecking.
DeepSeek direct API	`deepseek-v4-flash`	$0.14 cache miss	$0.0028 cache hit	$0.28	`deepseek-chat` and `deepseek-reasoner` map to V4 Flash mode and are scheduled for deprecation on 2026-07-24.
DeepSeek direct API	`deepseek-v4-pro`	$0.435 cache miss	$0.003625 cache hit	$0.87	The official page also lists a 1M context, so validate latency and quality before migrating.
Mistral pricing	Mistral Large example	$2.00	not listed in the public FAQ	$6.00	Mistral bills input and output tokens; Batch carries a 50% discount.
xAI model docs	Grok 4.3	$1.25	not listed	$2.50	For coding, track Grok Build 0.1 as a separate row. Voice, image, and video use different units.

Hosted open-model and router rows can look cheaper, but they are different contracts.

Route owner	Row or contract	Price signal	How to use it
Groq pricing	`openai/gpt-oss-20b` hosted by Groq	$0.075 uncached input, $0.0375 cached input, $0.30 output	This is the GroqCloud serving price, not the model author's official API price.
Groq pricing	`openai/gpt-oss-120b` hosted by Groq	$0.15 uncached input, $0.075 cached input, $0.60 output	A first-test candidate for open-model workloads. Measure quality and latency.
OpenRouter pricing	Pay-as-you-go plan	5.5% platform fee, 400+ models, 70+ providers	This is a router contract, not the underlying provider's official price.
OpenRouter pricing	Free plan	50 requests/day, free-model access	Usable for evaluation, but not a production entitlement.

If a candidate model is not in the tables, check its owner page and add it in the same format: model ID, route, unit, checked date, and caveat.
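One way to keep added rows consistent is to store each one as a small record. The sketch below is only an illustration: the `PriceRow` class, its field names, and the 30-day staleness window in `is_stale` are assumptions, not a schema used by any provider.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PriceRow:
    """One owner-labeled price row. Field names are illustrative only."""
    owner: str                 # who publishes and bills this price
    route: str                 # e.g. "direct", "hosted-open-model", "router"
    model_id: str              # exact model string from the owner page
    unit: str                  # billing unit, e.g. "USD / 1M tokens"
    checked: date              # date the row was last verified
    input_usd: float
    output_usd: float
    cached_input_usd: Optional[float] = None  # None when the owner page lists none
    caveat: str = ""


def is_stale(row: PriceRow, today: date, max_age_days: int = 30) -> bool:
    """Flag rows due for a recheck. The 30-day default is an arbitrary assumption."""
    return (today - row.checked).days > max_age_days


# Example row copied from the Groq line in the route table.
row = PriceRow(
    owner="Groq",
    route="hosted-open-model",
    model_id="openai/gpt-oss-20b",
    unit="USD / 1M tokens",
    checked=date(2026, 7, 2),
    input_usd=0.075,
    output_usd=0.30,
    cached_input_usd=0.0375,
    caveat="GroqCloud serving price, not the model author's official API price",
)
```

Keeping `checked` as a real date rather than free text makes it easy to flag rows that need re-verification before a production decision.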

Official LLM API price snapshot board with owner, route, unit, checked date, and caveat fields

Choosing a cheap candidate by workload

A cheap model wins only when it completes the same job at acceptable quality and retry rate. Build the first shortlist from the workload, not the provider.

Workload	First candidates to test	Why it can be cheap	Stop rule
Extraction, classification, normalization	DeepSeek V4 Flash, Gemini 3.1 Flash-Lite, Groq GPT OSS 20B, OpenAI GPT-5.4-nano	Quality is easy to measure with labels and validators, so low input/output rates pay off directly.	Do not go to production until false positives, retries, and the manual-review rate are measured.
Support bot / FAQ	Gemini 3.1 Flash-Lite, OpenAI GPT-5.4-mini/nano, Claude Haiku 4.5, DeepSeek V4 Pro	The output ratio is moderate, and caching the policy context can help.	If escalation quality drops, the low token price is not actually cheap.
Coding assistant / agentic tool use	Claude Sonnet 5, OpenAI GPT-5.4/GPT-5.5, xAI Grok Build, Gemini 3.5 Flash	Failures add retries and developer time, so a more capable model can end up cheaper.	Check same-repo evals, tool-call success, and rollback cost.
Long-context analysis	Gemini Pro/Flash long context, DeepSeek V4 1M context, Grok 4.3	One large call can be cheaper than chunking plus retrieval.	Recalculate for context tiers and cache storage.
Regulated, confidential, or enterprise workflows	Direct provider API or contracted cloud route	Billing, data handling, audit logs, and support matter more than a low unit price.	Do not migrate just because a router row is cheaper.
Offline batch	OpenAI Batch, Google Batch, Mistral Batch, Groq Batch	Asynchronous processing often qualifies for a discount.	This is not a latency route, so confirm the completion window and how outputs are retrieved.

Workload route map showing first-test candidates and stop rules for bulk, chat, coding, long-context, regulated, and batch workloads

Monthly cost worksheet

The actual bill starts from the token mix. Estimate every candidate against the same workload shape.

Monthly uncached input tokens.
Monthly cached input tokens, or the cache-hit rate.
Monthly output tokens. Include reasoning/thinking tokens when they are billed as output.
Tool, search, request, route, and platform fees.
Retry and fallback overhead.
Batch/cache savings.
Manual review or failure cost when output does not pass.

Scenario	Candidate route	Token mix	Simple monthly token cost	Interpretation
Bulk data cleanup	Groq GPT OSS 20B	100M input, 10M output	$10.50	A hosted open model is very cheap if it passes validation.
Bulk data cleanup	DeepSeek V4 Flash	100M cache-miss input, 10M output	$16.80	The direct DeepSeek row is low, but measure quality and latency.
Bulk data cleanup	OpenAI GPT-5.4-nano	100M input, 10M output	$32.50	A candidate when OpenAI compatibility or output quality is required.
Bulk data cleanup	Gemini 3.1 Flash-Lite	100M text input, 10M output	$40.00	Cache and Batch improve this, but do not treat Free Tier as a production assumption.
Output-heavy chatbot	Groq GPT OSS 20B	20M input, 20M output	$7.50	Output is cheap too, but it depends on open-model quality.
Output-heavy chatbot	DeepSeek V4 Flash	20M cache-miss input, 20M output	$8.40	Output price is low. Measure hallucination and escalation cost.
Output-heavy chatbot	OpenAI GPT-5.4-nano	20M input, 20M output	$29.00	Output cost dominates. Use it only when quality wins.
Output-heavy chatbot	Gemini 3.1 Flash-Lite	20M text input, 20M output	$35.00	A candidate if Gemini ecosystem fit reduces retries.
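The simple monthly token costs in the scenario table can be reproduced with a short script. This is a minimal sketch that only multiplies token volume by the uncached input and output rates from the snapshot tables. The names `RATES`, `SCENARIOS`, `simple_token_cost`, and `cost_table` are illustrative, and the sketch deliberately ignores cache, batch, fees, and retries.

```python
# USD per 1M tokens as (uncached input, output), copied from the snapshot tables.
RATES = {
    "Groq GPT OSS 20B": (0.075, 0.30),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "OpenAI GPT-5.4-nano": (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
}

# Monthly token mix in millions of tokens as (input, output).
SCENARIOS = {
    "Bulk data cleanup": (100, 10),
    "Output-heavy chatbot": (20, 20),
}


def simple_token_cost(input_m, output_m, in_rate, out_rate):
    """Token-only monthly cost in USD: no cache, batch, fees, or retries."""
    return input_m * in_rate + output_m * out_rate


def cost_table():
    """Return {(scenario, route): monthly USD} rounded to cents."""
    table = {}
    for scenario, (input_m, output_m) in SCENARIOS.items():
        for route, (in_rate, out_rate) in RATES.items():
            cost = simple_token_cost(input_m, output_m, in_rate, out_rate)
            table[(scenario, route)] = round(cost, 2)
    return table


for (scenario, route), usd in cost_table().items():
    print(f"{scenario:22} {route:24} ${usd:.2f}")
```

Because the rates live in one dictionary, rechecking a price on the owner page means changing one entry and rerunning the table.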

Next, add the modifiers. If 40% of the input for OpenAI GPT-5.4-nano is a repeated system prompt billed as cached input, that share drops from $0.20/M to $0.02/M. With Batch, Gemini 3.1 Flash-Lite drops from $0.25/M to $0.125/M on input and from $1.50/M to $0.75/M on output. For OpenRouter, multiply the cost by 1.055 to include the 5.5% platform fee before comparing against direct billing.

Finally, divide by accepted tasks.

price per completed task = total monthly route cost / accepted task count

If a cheap route passes only 94% of tasks while a more expensive route passes 99.5%, the gap turns into retries, fallbacks, manual review, and support tickets.
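The modifiers can be sketched as small helper functions. Everything here is a sketch under stated assumptions: the function names are made up for illustration, the token mix is the bulk scenario (100M input, 10M output), and the 1,000,000 submitted tasks and $0.05 cost per failed task are hypothetical values chosen only to show how pass rate can outweigh token price.

```python
def blended_input_rate(uncached_rate, cached_rate, cached_share):
    """Average USD/1M input rate when cached_share of input tokens hit the cache."""
    return (1 - cached_share) * uncached_rate + cached_share * cached_rate


def with_platform_fee(cost, fee_rate=0.055):
    """Apply a router platform fee (5.5% in the OpenRouter row) to a cost."""
    return cost * (1 + fee_rate)


def price_per_completed_task(route_cost, submitted, pass_rate, failure_cost_each=0.0):
    """Total monthly route cost divided by accepted task count.

    failure_cost_each is a hypothetical per-failure cost (review, retry, ticket).
    """
    accepted = submitted * pass_rate
    failed = submitted - accepted
    return (route_cost + failed * failure_cost_each) / accepted


# GPT-5.4-nano, bulk scenario, with 40% of input billed at the cached rate.
nano_rate = blended_input_rate(0.20, 0.02, 0.40)
nano_cached_cost = 100 * nano_rate + 10 * 1.25

# Gemini 3.1 Flash-Lite, bulk scenario, at the Batch rates.
gemini_batch_cost = 100 * 0.125 + 10 * 0.75

# Hypothetical: 1M submitted tasks and $0.05 per failed task (both assumed).
cheap = price_per_completed_task(16.80, 1_000_000, 0.94, failure_cost_each=0.05)
strong = price_per_completed_task(32.50, 1_000_000, 0.995, failure_cost_each=0.05)

print(f"nano with 40% cached input: ${nano_cached_cost:.2f}")
print(f"Gemini Flash-Lite with Batch: ${gemini_batch_cost:.2f}")
print(f"cheap route per accepted task: ${cheap:.6f}")
print(f"stronger route per accepted task: ${strong:.6f}")
```

Under these assumed numbers the route with the higher token bill is cheaper per accepted task. With `failure_cost_each` left at zero the cheap route still wins, so the outcome depends entirely on what a failed output actually costs.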

Monthly LLM API cost worksheet covering input, cached input, output, retries, route fees, batch, cache, and production recheck rows

Direct API, router, hosted open model, and self-hosting: how they differ

Direct APIs and routers differ in ownership. When official support, billing clarity, data route, enterprise controls, and incident diagnosis matter, the direct provider API is the clearer choice.

Routers suit model switching, fallback, traffic comparison, and a single integration. Include OpenRouter's 5.5% platform fee, free limits, and routing behavior in the cost model.

Hosted open-model APIs sit in between. Groq owns the serving price, limits, latency, and roster. The openai/gpt-oss label does not indicate an OpenAI API price.

Self-hosting enters the comparison only when you have the volume, data-locality requirements, hardware access, and ops capacity. Free weights hide the costs of GPU utilization, serving, monitoring, and security patching.

What breaks a simple price table

Output ratio is the first trap. For chatbots and report generators, output can cost more than input.
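To see how much of a bill comes from output, compute the output share of the token cost. The sketch below uses the `gpt-5.4-nano` Standard rates and the two token mixes from the worksheet scenarios; `output_share` is an illustrative helper, not part of any provider SDK.

```python
def output_share(input_m, output_m, in_rate, out_rate):
    """Fraction of the token-only cost that comes from output tokens."""
    output_cost = output_m * out_rate
    return output_cost / (input_m * in_rate + output_cost)


# gpt-5.4-nano Standard: $0.20 input, $1.25 output per 1M tokens.
bulk_share = output_share(100, 10, 0.20, 1.25)   # 100M input, 10M output
chat_share = output_share(20, 20, 0.20, 1.25)    # 20M input, 20M output

print(f"bulk cleanup output share: {bulk_share:.0%}")
print(f"output-heavy chatbot output share: {chat_share:.0%}")
```

At the same rates, moving from a 10:1 to a 1:1 input-to-output mix shifts output from a minority of the bill to most of it, which is why the output column deserves as much attention as the input column.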

Caching also differs. OpenAI, Google, Anthropic, DeepSeek, and Groq each have different cache semantics and cache rows.

Batch is for offline extraction, eval generation, and enrichment. Do not compare it as the same product as realtime chat.

Tool and search fees also add up. Web search, Google Grounding, compound tools, and router features are costs outside the token rows.

Preview status, intro pricing, and tier thresholds matter too. The Sonnet intro row has an end date, Gemini Pro Preview changes with prompt length, and the DeepSeek aliases are scheduled for deprecation.
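These conditional rows are easy to get wrong in a spreadsheet, so it can help to encode them as lookups. The sketch below uses only the values from the snapshot table. It assumes that the whole request is billed at the higher Gemini tier once the prompt exceeds 200k tokens, that the Sonnet intro price includes the end date itself, and that the DeepSeek deprecation date marks when the aliases need replacing; confirm each assumption on the owner page. All function names are illustrative.

```python
from datetime import date

SONNET_INTRO_END = date(2026, 8, 31)
DEEPSEEK_ALIAS_DEPRECATION = date(2026, 7, 24)
DEEPSEEK_LEGACY_ALIASES = {"deepseek-chat", "deepseek-reasoner"}


def gemini_pro_preview_rates(prompt_tokens):
    """(input, cached input, output) USD per 1M tokens for gemini-3.1-pro-preview."""
    if prompt_tokens <= 200_000:
        return (2.00, 0.20, 12.00)
    return (4.00, 0.40, 18.00)


def sonnet_5_rates(on):
    """(input, output) USD per 1M tokens for Claude Sonnet 5 on a given date."""
    if on <= SONNET_INTRO_END:  # assumption: the end date is inclusive
        return (2.00, 10.00)
    return (3.00, 15.00)


def alias_deprecation_due(model_id, on):
    """True if model_id is a legacy DeepSeek alias and the scheduled date has arrived."""
    return model_id in DEEPSEEK_LEGACY_ALIASES and on >= DEEPSEEK_ALIAS_DEPRECATION


print(gemini_pro_preview_rates(150_000))
print(sonnet_5_rates(date(2026, 9, 15)))
print(alias_deprecation_due("deepseek-chat", date(2026, 8, 1)))
```

Passing the date and prompt length explicitly keeps a forecast for next quarter from silently using this month's intro price.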

Always include retry overhead. At 1.3 attempts per accepted answer, calculate the cost for 1.3 attempts.
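As a rough sketch, retry overhead is a multiplier on the per-attempt cost. If the attempts-per-accepted figure has not been measured yet, one common approximation is to assume each attempt passes independently with the same probability, which gives an expected 1 / pass rate attempts. That independence assumption and both function names are illustrative, and measured retry data should replace the approximation when it is available.

```python
def cost_with_retries(cost_single_attempt, attempts_per_accepted):
    """Scale a single-attempt monthly cost by the measured attempts per accepted answer."""
    return cost_single_attempt * attempts_per_accepted


def expected_attempts(pass_rate):
    """Expected attempts until the first pass, assuming independent attempts."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return 1 / pass_rate


# DeepSeek V4 Flash bulk scenario ($16.80) at 1.3 attempts per accepted answer.
retry_adjusted = cost_with_retries(16.80, 1.3)
print(f"retry-adjusted monthly cost: ${retry_adjusted:.2f}")
print(f"expected attempts at a 94% pass rate: {expected_attempts(0.94):.3f}")
```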

Provider notes

The OpenAI pricing page is the owner of the OpenAI direct API token rows and separates Standard, Batch, Flex, and Priority.

The Anthropic pricing page is the owner of the Claude direct rows and of the cache, Batch, Fast mode, and data residency modifiers. For a comparison with subscriptions, see Claude API pricing versus subscription.

The Gemini pricing page separates Developer API rows, Free Tier, Batch, and grounding fees. Free quota is covered in Gemini API free tier.

DeepSeek pricing lists deepseek-v4-flash and deepseek-v4-pro; the legacy chat/reasoner names map to V4 Flash mode.

For Mistral, the Mistral Large example and the 50% Batch discount can be cited, but other rows need official evidence.

The xAI docs list Grok 4.3 for chat and Grok Build 0.1 for coding. Voice, image, and video units do not belong in a text-token table.

Groq is the owner of GroqCloud serving.

OpenRouter is the owner of router/marketplace economics.

Recheck list before production migration

Check	What to record
Price owner	Official provider, hosted provider, router, cloud marketplace, self-hosted route.
Model ID	Exact model string, alias/preview/dated/deprecation status.
Token mix	Input, cached input, output, reasoning tokens, average output ratio.
Route fees	Platform fee, request fee, search/tool fee, cache storage, data residency, marketplace uplift.
Quality threshold	Pass rate, retry rate, fallback rate, manual-review rate, failed-output cost.
Latency and limits	RPM, TPM, context limit, batch window, timeout, provider status behavior.
Data route	Retention, training use, region, enterprise terms, audit needs.
Spend controls	Hard caps, alerts, project budgets, tenant attribution, rollback route.

Frequently asked questions

Which LLM API is cheapest right now?

For simple high-volume text tasks, Groq GPT OSS 20B and DeepSeek V4 Flash look cheap. Decide only after factoring in output ratio, cache, batch, retries, route fees, and the quality threshold.

Is OpenAI cheaper than Claude or Gemini?

It depends on the model and the workload. GPT-5.4-nano/mini, Claude Sonnet 5, and Gemini 3.1 Flash-Lite each have different strengths.

Should I use a router like OpenRouter?

Yes, if switching, fallback, a single account, and provider comparison reduce engineering time. Include the platform fee and routing behavior in the cost model.

Can Free Tier be used in production?

Usually not. It is for exploration and prototypes; production needs predictable quota, a billing owner, data terms, and a support path.

Why does output price matter?

At many providers, output tokens cost more than input tokens. Chatbots, agents, and report generators tend to be output-heavy.

Do cache and batch change the ranking?

Cache helps with repeated prompts and stable prefixes; batch helps with offline workloads that can wait. When the conditions fit, the ranking changes.

Can third-party price comparison tables be trusted?

They are useful for discovery. Confirm final pricing on the official owner page.

How often should a price comparison be updated?

Always recheck before a production decision and before a published refresh. Model names, preview status, cache rules, and router fees change.