Llama 4 Scout

Released in April 2025, Llama 4 Scout is a 109B-parameter mixture-of-experts (MoE) model with 17B active parameters and 16 experts, and an industry-leading 10M-token context window. With INT4 quantization, it fits on a single H100 GPU.
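The single-GPU claim can be checked with rough arithmetic. The sketch below counts weight storage only and assumes the 80 GB H100 variant; KV cache, activations, and quantization overhead (scales, zero points) are not included, so real headroom is smaller. It also assumes all 109B parameters must be resident in memory even though only 17B are active per token.

```python
# Back-of-the-envelope weight memory for a quantized model (weights only).
TOTAL_PARAMS = 109e9      # total parameters, from the description above
BITS_PER_WEIGHT = 4       # INT4
H100_MEMORY_GB = 80       # assumption: 80 GB H100 variant


def weight_memory_gb(params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


int4_gb = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)

print(f"INT4 weights: {int4_gb:.1f} GB")
print(f"BF16 weights: {bf16_gb:.1f} GB")
print(f"Headroom on one H100 at INT4: {H100_MEMORY_GB - int4_gb:.1f} GB")
```

At 4 bits per weight the weights take about 54.5 GB, which leaves room on an 80 GB card; at 16 bits they would not fit on one GPU.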

At a glance

Context window: 10M tokens
Max output: 10M tokens
Knowledge cutoff: Aug 2024
Modalities: Text, Image, Video → Text

Capabilities

Function calling

Connect to external tools, APIs, and systems.
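As a rough illustration of what function calling involves on the application side, the sketch below describes a tool in the JSON Schema style that many hosted APIs accept and dispatches a tool call to a local function. The tool `get_weather`, the `dispatch` helper, and the shape of `example_call` are all hypothetical; the exact request and response format depends on the provider, so check its documentation.

```python
import json


# Hypothetical local tool the model could be allowed to call.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}  # stubbed result, no network call


# Tool description in JSON Schema style (assumed format; providers may differ).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run the requested tool and return its result as a JSON string."""
    name = tool_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON string
    return json.dumps(REGISTRY[name](**args))


# Made-up example of a tool call a model might emit.
example_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = dispatch(example_call)
print(result)
```

In a real integration, `TOOLS` would be sent with the request and the string returned by `dispatch` would be passed back to the model as the tool's result.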

Pricing by provider

Provider	Input / 1M tokens	Output / 1M tokens
Replicate	$0.17	$0.65
Together	$0.18	$0.59
Novita	$0.18	$0.59
Meta	Self-hosted	Self-hosted
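To compare providers for a specific workload, the listed per-1M-token rates can be turned into a quick estimate. This is a minimal sketch using only the prices in the table; the `estimate_cost` helper and the sample workload are illustrative, and self-hosted deployments are left out because no per-token rate is listed.

```python
# Per-1M-token prices in USD (input, output), copied from the table above.
PRICES = {
    "Replicate": (0.17, 0.65),
    "Together": (0.18, 0.59),
    "Novita": (0.18, 0.59),
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at the listed rates."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Illustrative workload: 2M input tokens and 500k output tokens.
workload = (2_000_000, 500_000)
for provider in PRICES:
    print(f"{provider}: ${estimate_cost(provider, *workload):.4f}")
```

Because output tokens cost more than input tokens at every listed provider, the cheapest option depends on the input-to-output ratio of the workload.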

Heads up: We do our best to keep these specs & prices accurate. However, cloud costs may fluctuate based on region, usage, and other factors not listed here. These are estimates based on common setups and are for informational purposes only. Always verify current rates & exact specs with the provider before provisioning.

Estimated prices shown. Actual costs may vary based on context length, batch size, caching, and provider-specific pricing tiers.