GPU hourly pricing
Raw GPU instances billed by time, often with storage, network and idle-capacity costs outside the headline rate.
Pricing
Compare LLM pricing models for token APIs, dedicated inference, GPU cloud and self-hosted models.
LLM pricing spans token APIs, dedicated endpoints, serverless inference, fine-tuning and self-hosted GPU infrastructure. Check exact prices with each provider before committing.
Token API pricing
Managed LLM APIs priced by input and output tokens. Cost depends on context length, traffic mix and model choice.
Serverless inference
Usage-based runtime pricing that can reduce idle cost but may add cold-start, concurrency or platform constraints.
Dedicated endpoints
Reserved or dedicated capacity for predictable workloads, usually with stronger planning and commitment requirements.
Self-hosted infrastructure
Hardware or cloud infrastructure operated by the team, including engineering, observability, security and maintenance costs.
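To make the difference between token billing and time billing concrete, here is a minimal sketch that estimates a monthly bill under each. Every rate and volume is a hypothetical placeholder rather than a quote from any provider, and the function names are illustrative only.

```python
def token_api_monthly_cost(requests, input_tokens, output_tokens,
                           input_price_per_million, output_price_per_million):
    """Monthly cost of a token-billed API; prices are per one million tokens."""
    input_cost = requests * input_tokens / 1_000_000 * input_price_per_million
    output_cost = requests * output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


def gpu_monthly_cost(hourly_rate, hours_provisioned,
                     storage_cost=0.0, network_cost=0.0):
    """Monthly cost of time-billed GPUs; idle hours are billed like busy ones."""
    return hourly_rate * hours_provisioned + storage_cost + network_cost


# Hypothetical workload and prices, chosen only to show the arithmetic.
api = token_api_monthly_cost(
    requests=1_000_000, input_tokens=1_500, output_tokens=300,
    input_price_per_million=0.50, output_price_per_million=1.50,
)
gpu = gpu_monthly_cost(
    hourly_rate=2.00, hours_provisioned=720,
    storage_cost=50.0, network_cost=30.0,
)

print(f"token API:  ${api:,.2f}")
print(f"GPU hourly: ${gpu:,.2f}")
```

The token bill scales with traffic, while the GPU bill depends on provisioned hours regardless of how busy the instance is. The sketch does not check whether a single instance could actually serve the assumed traffic, so a real comparison would also need throughput figures.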
Headline price is not the whole cost. Context length, output volume, retries, caching, latency targets and observability can materially affect total cost.
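The effect of a few of these factors can be sketched as a per-request cost estimate. This is an illustration under stated assumptions: retries are modeled as an average number of extra attempts per request, and caching is modeled as a discounted price on a share of input tokens, which may not match how a given provider bills. All prices and parameter names are hypothetical.

```python
def effective_request_cost(input_tokens, output_tokens,
                           input_price, output_price,
                           retry_rate=0.0, cache_hit_share=0.0,
                           cached_price_factor=1.0):
    """Expected cost of one successful request; prices are per million tokens.

    retry_rate: average extra attempts per request (0.1 means 10% more calls).
    cache_hit_share: fraction of input tokens billed at the cached rate.
    cached_price_factor: cached input price as a fraction of the normal price.
    """
    blended_input_price = input_price * (
        (1 - cache_hit_share) + cache_hit_share * cached_price_factor
    )
    per_attempt = (
        input_tokens * blended_input_price + output_tokens * output_price
    ) / 1_000_000
    return per_attempt * (1 + retry_rate)


# Hypothetical prices: 1.00 per million input tokens, 2.00 per million output.
short_context = effective_request_cost(1_000, 200, 1.00, 2.00)
long_context = effective_request_cost(8_000, 200, 1.00, 2.00)
long_with_retries = effective_request_cost(8_000, 200, 1.00, 2.00, retry_rate=0.1)
long_with_cache = effective_request_cost(
    8_000, 200, 1.00, 2.00, cache_hit_share=0.5, cached_price_factor=0.5
)

for label, cost in [("short context", short_context),
                    ("long context", long_context),
                    ("long + retries", long_with_retries),
                    ("long + cache", long_with_cache)]:
    print(f"{label:15s} ${cost:.6f} per request")
```

With the same per-token prices, the longer context raises the per-request cost several times over, retries push it up further and cache hits pull it back down.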
Self-hosting can be cheaper for stable high-volume workloads when the team can operate infrastructure efficiently.
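One way to reason about "high-volume" is a break-even estimate: the monthly token volume at which a fixed self-hosting cost equals the API bill. The sketch below assumes a single blended API price per million tokens and a fixed monthly self-hosting cost that already includes engineering and operations. Both are simplifications, and all figures are hypothetical.

```python
def break_even_tokens_per_month(fixed_monthly_cost, api_price_per_million,
                                self_hosted_marginal_per_million=0.0):
    """Monthly token volume above which self-hosting costs less than the API.

    Returns None when the API is no more expensive per token than the
    self-hosted marginal cost, since self-hosting then never catches up.
    """
    saving_per_million = api_price_per_million - self_hosted_marginal_per_million
    if saving_per_million <= 0:
        return None
    return fixed_monthly_cost / saving_per_million * 1_000_000


# Hypothetical: 6,000 per month fixed cost (hardware plus a share of
# engineering time), 1.00 per million tokens on the API, and 0.25 per
# million in marginal self-hosted cost such as power or autoscaled capacity.
volume = break_even_tokens_per_month(6_000, 1.00, 0.25)
print(f"break-even at about {volume:,.0f} tokens per month")
```

Below that volume the API is cheaper under these assumptions. Unstable traffic also weakens the case for self-hosting, because fixed capacity is paid for whether or not it is used.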