← Back to API Documentation Home

vllm-gemma4-26b

Gemma 4 26B-A4B FP8 MoE inference via vLLM. Vision + tool calling + thinking mode. 256K context with tiny KV cache (5.2 GB at full context). Primary structured output model on Delta GPU.