Alibaba Cloud logo Qwen3-VL-30B-A3B Instruct

View website

Released October 2025, an efficient 30B MoE (3B active) vision-language model. Strong multimodal capabilities at low inference cost.

At a glance

Context window
131K tokens
Max output
66K tokens
Modalities
Text Text Image Image Video Video Text Text

Capabilities

Function calling

Function calling

Connect to external tools, APIs, and systems.

Structured output

Structured output

Return responses in structured formats like JSON.

Pricing by provider

Provider Input / 1M tokens Output / 1M tokens
Novita logo Novita $0.20 $0.70

Heads up: We do our best to keep these specs & prices accurate. However, cloud costs may fluctuate based on region, usage, and other factors not listed here. These are estimates based on common setups and are for informational purposes only. Always verify current rates & exact specs with the provider before provisioning.

Compare with other models

Estimated prices shown. Actual costs may vary based on context length, batch size, caching, and provider-specific pricing tiers.