On-demand vLLM service for MiniMax-M2.5 (229B MoE, AWQ Q4 quantization) with a 32K context window, tensor parallelism across the Bravo and Charlie GPUs, and reasoning token support. It does not start automatically on reboot.