← Back to API Documentation Home

vllm-minimax-m25

vLLM on-demand MiniMax-M2.5 AWQ Q4 229B MoE service with 32K context, tensor-parallel across Bravo+Charlie GPUs, and reasoning token support. On-demand — does not auto-start on reboot.