LLM Pricing Guide

Compare LLM pricing models for token APIs, dedicated inference, GPU cloud and self-hosted models.

LLM pricing spans token APIs, dedicated endpoints, serverless inference, fine-tuning and self-hosted GPU infrastructure. Prices change often, so confirm current rates with each provider before committing.

GPU hourly pricing

Raw GPU instances billed by time, often with storage, network and idle-capacity costs outside the headline rate.
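
As a rough sketch of how the headline rate can understate real spend, the helper below folds hourly storage and network charges into the rate and divides by utilization, so that idle hours are carried by the busy ones. The function name, its parameters and every dollar figure are hypothetical and are not quotes from any provider.

```python
def effective_gpu_hourly_cost(headline_rate, storage_per_hour, network_per_hour, utilization):
    """Cost per useful GPU hour, in the same currency as the inputs.

    utilization is the fraction of billed hours doing real work (0 < u <= 1).
    All inputs are illustrative assumptions.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    all_in_rate = headline_rate + storage_per_hour + network_per_hour
    return all_in_rate / utilization


# Hypothetical numbers: 2.00/h headline, 0.10/h storage, 0.15/h network, 50% busy.
example = effective_gpu_hourly_cost(2.00, 0.10, 0.15, 0.5)
print(f"{example:.2f}")  # prints 4.50
```

Under these made-up inputs the cost per useful hour is more than double the headline rate, which is why utilization belongs in any comparison.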

Token-based inference pricing

Managed LLM APIs priced by input and output tokens. Cost depends on context length, traffic mix and model choice.
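
The per-request arithmetic can be sketched as follows, assuming prices are quoted per million tokens with separate input and output rates. The function name and the rates used in the example are placeholders, not real prices.

```python
def request_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Cost of one request when input and output tokens are billed at different rates."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical rates: 1.00 per million input tokens, 4.00 per million output tokens.
short_prompt = request_cost(2_000, 500, 1.00, 4.00)
long_prompt = request_cost(20_000, 500, 1.00, 4.00)
print(f"{short_prompt:.4f} {long_prompt:.4f}")  # prints 0.0040 0.0220
```

Holding the output fixed, a tenfold longer context raises the request cost by more than five times in this example, which illustrates why context length matters as much as model choice.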

Serverless inference pricing

Usage-based runtime pricing that can reduce idle cost but may add cold-start latency, concurrency limits or other platform constraints.
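
A minimal sketch of the trade-off, assuming the platform bills per second of runtime and that cold-start time is billed too (this varies by platform and should be checked). All names, the 730-hour month and the rates are illustrative assumptions.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def serverless_monthly_cost(requests, seconds_per_request, price_per_second,
                            cold_start_fraction=0.0, cold_start_seconds=0.0):
    """Monthly cost when only runtime seconds are billed.

    cold_start_fraction is the share of requests that hit a cold start;
    billing cold-start seconds is an assumption, not a universal rule.
    """
    avg_seconds = seconds_per_request + cold_start_fraction * cold_start_seconds
    return requests * avg_seconds * price_per_second


def always_on_monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Monthly cost of an instance that is billed whether or not it is busy."""
    return hourly_rate * hours


# Hypothetical workload: 100k requests, 2 s each, 10% cold starts of 10 s.
serverless = serverless_monthly_cost(100_000, 2.0, 0.001, 0.1, 10.0)
always_on = always_on_monthly_cost(2.00)
print("serverless" if serverless < always_on else "always-on", "is cheaper")
```

With these placeholder inputs the low-traffic workload favors serverless; raising the request count shifts the comparison toward the always-on instance.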

Dedicated GPU instances

Reserved or dedicated capacity for predictable workloads, usually requiring more capacity planning and a longer commitment.

Self-hosted infrastructure

Hardware or cloud infrastructure operated by the team, including engineering, observability, security and maintenance costs.

FAQ

Is token pricing enough for cost planning?

No. Context length, output volume, retries, caching, latency targets and observability can materially affect total cost.
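
To illustrate how two of these factors move the total, the sketch below adjusts a monthly token bill for retries and for cached input tokens. It assumes retries are billed as full extra requests and that cached input tokens receive a fractional discount; both are simplifying assumptions, and every name and number is hypothetical.

```python
def monthly_token_cost(requests, input_tokens, output_tokens,
                       input_price_per_million, output_price_per_million,
                       retry_rate=0.0, cached_fraction=0.0, cached_discount=0.0):
    """Monthly API cost adjusted for retries and input caching.

    retry_rate: extra requests per original request (0.05 means 5% more calls).
    cached_fraction: share of input tokens served from a cache.
    cached_discount: price reduction on cached input tokens (0.5 means half price).
    """
    billed_requests = requests * (1 + retry_rate)
    effective_input_price = input_price_per_million * (1 - cached_fraction * cached_discount)
    per_request = (input_tokens * effective_input_price
                   + output_tokens * output_price_per_million) / 1_000_000
    return billed_requests * per_request


# Hypothetical month: 1M requests, 2,000 input and 500 output tokens each.
naive = monthly_token_cost(1_000_000, 2_000, 500, 1.00, 4.00)
adjusted = monthly_token_cost(1_000_000, 2_000, 500, 1.00, 4.00,
                              retry_rate=0.05, cached_fraction=0.5, cached_discount=0.5)
print(f"naive={naive:.0f} adjusted={adjusted:.0f}")
```

Latency targets and observability are not modeled here; they usually appear as extra capacity or tooling spend rather than as token charges.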

When can self-hosting be cheaper?

Self-hosting can be cheaper for stable, high-volume workloads when the team keeps the hardware well utilized and can operate the infrastructure efficiently.
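
One way to test this for a given workload is a break-even estimate: the monthly token volume at which fixed self-hosting costs are recovered by a lower marginal cost per token. The sketch below assumes a single blended API price per million tokens and a fixed monthly cost covering hardware and engineering time; all names and figures are hypothetical.

```python
def break_even_tokens_per_month(fixed_monthly_cost, api_price_per_million,
                                self_host_marginal_per_million):
    """Monthly tokens above which self-hosting costs less than the API.

    Returns None when the API is no more expensive per token than the
    self-hosted marginal cost, since no break-even volume exists then.
    """
    saving_per_million = api_price_per_million - self_host_marginal_per_million
    if saving_per_million <= 0:
        return None
    return fixed_monthly_cost / saving_per_million * 1_000_000


# Hypothetical: 20,000 fixed per month, API at 5.00 and self-hosted at 1.00 per million tokens.
tokens = break_even_tokens_per_month(20_000, 5.00, 1.00)
print(f"{tokens:,.0f} tokens per month")  # prints 5,000,000,000 tokens per month
```

Below the break-even volume the fixed costs dominate and the API remains cheaper under these assumptions.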