vllm-qwen3-embedding

Always-on vLLM service that runs Qwen3-Embedding-4B in BF16 with an 8K-token context window and produces semantic embeddings. The service is embeddings-only: send requests to /v1/embeddings, not /v1/chat/completions.
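
As an illustration of calling the embeddings route, here is a minimal Python sketch using only the standard library. It assumes the service exposes vLLM's OpenAI-compatible request and response shape (a JSON body with `model` and `input`, and a `data` list of items carrying `index` and `embedding`). The base URL and the served model name below are placeholders that this page does not specify, so substitute the values for your deployment. The helper names (`build_request`, `parse_embeddings`, `embed`, `cosine_similarity`) are hypothetical.

```python
import json
import math
import urllib.request

# Assumptions (not stated on this page): host, port, and served model name.
BASE_URL = "http://localhost:8000"   # hypothetical service address
MODEL = "Qwen/Qwen3-Embedding-4B"    # assumed served model name


def build_request(texts, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_json):
    """Return one vector per input, ordered by each item's 'index' field."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, timeout=30):
    """Send texts to the service and return their embedding vectors."""
    with urllib.request.urlopen(build_request(texts), timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Example usage (requires a reachable service, so it is left commented out):
# vecs = embed(["How do I reset my password?", "Password reset steps"])
# print(len(vecs), cosine_similarity(vecs[0], vecs[1]))
```

Inputs longer than the 8K-token context may be rejected or truncated depending on how the server is configured, so splitting long documents into chunks before calling `embed` is a reasonable precaution.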