Always-on vLLM service running Qwen3-Embedding-4B in BF16 with an 8K-token context window, for semantic embeddings. Embeddings only: use /v1/embeddings, not /v1/chat/completions.
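A minimal client sketch for the /v1/embeddings endpoint, assuming the service exposes vLLM's OpenAI-compatible request and response shape. The base URL and the served model name are placeholders that do not come from the description above; replace them with your deployment's values.

```python
import json
import urllib.request

# Assumptions (not stated in the service description): the base URL and the
# served model name are placeholders for illustration only.
BASE_URL = "http://localhost:8000"
MODEL = "Qwen/Qwen3-Embedding-4B"


def build_embeddings_request(texts, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible /v1/embeddings endpoint."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body):
    """Return embedding vectors from an OpenAI-style response, ordered by index."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, base_url=BASE_URL, model=MODEL, timeout=30):
    """Send the request and return one vector per input text."""
    request = build_embeddings_request(texts, base_url, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

Calling `embed(["first text", "second text"])` against a reachable deployment should return one vector per input. How the server handles inputs longer than the 8K context depends on its configuration, so it may be safer to chunk long documents on the client side.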