{"openapi":"3.0.3","info":{"title":"LiteLLM MCP Server API","version":"1.1.0","description":"Internal MCP server that fronts LiteLLM for bounded local-model offload.\n\nCurrent MCP-surfaced chat models:\n- `qwen3.5-35b-a3b`\n- `gemma4-26b`\n- `huihui-qwen3.6-27b` (always-on, Charlie GPU)\n- `hermes-4.3-36b` (on-demand via llama-swap)\n- `glm-4-7-flash`\n\nRuntime note:\n- The live LiteLLM inventory advertises the Bravo backend as `qwen3.6-35b-a3b`.\n- The current MCP source still exposes that general-purpose model as `qwen3.5-35b-a3b`.\n- Charlie GPU model is now `huihui-qwen3.6-27b` (replaced `hermes-4.3-36b` as always-on, 2026-04-27).\n- `hermes-4.3-36b` remains available on-demand via llama-swap (GGUF Q6_K_L, ~10-20s cold start).\n- `qwen3-embedding-4b` is always-on underneath LiteLLM but is not surfaced through MCP tools.\n"},"servers":[{"url":"http://litellm-mcp:8769","description":"Docker internal"},{"url":"http://localhost:8769","description":"Host direct"}],"paths":{"/health":{"get":{"summary":"Health check","operationId":"getHealth","responses":{"200":{"description":"Service is healthy","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HealthResponse"},"example":{"service":"litellm-mcp","status":"healthy","tools_count":3}}}}}}},"/metrics":{"get":{"summary":"Prometheus metrics","operationId":"getMetrics","responses":{"200":{"description":"Prometheus metrics payload","content":{"text/plain":{"schema":{"type":"string"}}}}}}},"/mcp":{"post":{"summary":"Execute MCP JSON-RPC tool call","operationId":"executeMcpTool","requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/McpRequest"},"examples":{"llm_complete":{"summary":"Run llm/complete","value":{"method":"tools/call","params":{"name":"llm/complete","arguments":{"prompt":"Summarize this deploy log in two sentences.","model":"glm-4-7-flash","temperature":0.2}}}},"llm_json":{"summary":"Run llm/json","value":{"method":"tools/call","params":{"name":"llm/json","arguments":{"prompt":"Extract service, severity, and owner from this incident note.","model":"gemma4-26b","schema":{"type":"object","properties":{"service":{"type":"string"},"severity":{"type":"string"},"owner":{"type":"string"}},"required":["service","severity","owner"]}}}}},"llm_models":{"summary":"List surfaced models","value":{"method":"tools/call","params":{"name":"llm/models","arguments":{}}}}}}}},"responses":{"200":{"description":"MCP response payload","content":{"application/json":{"schema":{"$ref":"#/components/schemas/McpResponse"}}}}}}}},"components":{"schemas":{"HealthResponse":{"type":"object","required":["service","status","tools_count"],"properties":{"service":{"type":"string"},"status":{"type":"string","enum":["healthy","unhealthy"]},"tools_count":{"type":"integer"}}},"McpRequest":{"type":"object","required":["method","params"],"properties":{"method":{"type":"string","enum":["tools/call"]},"params":{"type":"object","required":["name"],"properties":{"name":{"type":"string","enum":["llm/complete","llm/json","llm/models"]},"arguments":{"type":"object","description":"Tool-specific argument object."}}}}},"McpResponse":{"type":"object","properties":{"content":{"type":"array","items":{"$ref":"#/components/schemas/ContentItem"}}}},"ContentItem":{"type":"object","required":["type","text"],"properties":{"type":{"type":"string","enum":["text"]},"text":{"type":"string"}}},"LlmCompleteArguments":{"type":"object","additionalProperties":false,"required":["prompt"],"properties":{"prompt":{"type":"string"},"system":{"type":"string"},"model":{"type":"string","enum":["qwen3.5-35b-a3b","huihui-qwen3.6-27b","hermes-4.3-36b","glm-4-7-flash","gemma4-26b"],"default":"qwen3.5-35b-a3b","description":"General-purpose source alias defaults to `qwen3.5-35b-a3b`.\nThe live LiteLLM inventory currently advertises the Bravo backend as\n`qwen3.6-35b-a3b`.\nUse `huihui-qwen3.6-27b` for creative writing (always-on, Charlie GPU).\nUse `hermes-4.3-36b` for on-demand Hermes access via llama-swap GGUF.\n"},"temperature":{"type":"number","minimum":0,"maximum":2,"default":0.3},"max_tokens":{"type":"integer","minimum":1,"default":4096},"thinking":{"type":"boolean","default":false}}},"LlmJsonArguments":{"type":"object","additionalProperties":false,"required":["prompt"],"properties":{"prompt":{"type":"string"},"system":{"type":"string"},"schema":{"type":"object","description":"Optional JSON Schema prompt guidance."},"model":{"type":"string","enum":["gemma4-26b","qwen3.5-35b-a3b","huihui-qwen3.6-27b","hermes-4.3-36b","glm-4-7-flash"],"default":"gemma4-26b","description":"`gemma4-26b` is the current structured-output default."},"temperature":{"type":"number","minimum":0,"maximum":2,"default":0.1},"max_tokens":{"type":"integer","minimum":1,"default":4096}}},"ModelInfo":{"type":"object","required":["id","context_window","description"],"properties":{"id":{"type":"string","enum":["qwen3.5-35b-a3b","gemma4-26b","huihui-qwen3.6-27b","hermes-4.3-36b","glm-4-7-flash"]},"context_window":{"type":"integer"},"description":{"type":"string"}}}}}}