Gemma 4 26B-A4B FP8 MoE inference via vLLM. Vision + tool calling + thinking mode. 256K context with tiny KV cache (5.2 GB at full context). Primary structured output model on Delta GPU.
Are you sure you want to perform this action?
Status
Message here