Central routing service for the Haiven AI stack. It accepts natural-language requests, classifies intent through LiteLLM, and dispatches to the correct backend.
| Property | Value |
|---|---|
| Status | Live |
| Port | 8500 -> 8000 |
| Domain | orchestrator.haiven.site |
| Networks | web, backend |
| Session Store | redis://redis:6379/1 |
| Deployed Classifier | gemma4-26b via LiteLLM |

Source/runtime note: `app/config.py` still carries `qwen3.5-27b` as a source fallback, but the deployed compose/runtime configuration sets `ORCH_CLASSIFIER_MODEL=gemma4-26b`, so `gemma4-26b` is the classifier in the running service.

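The precedence described in the note can be sketched as below. This is an illustration only, not the contents of `app/config.py`: the names `SOURCE_FALLBACK_MODEL` and `resolve_classifier_model` are hypothetical, and only the variable name and the two model names come from the note.

```python
import os

# Assumption: stands in for the fallback the note attributes to app/config.py.
SOURCE_FALLBACK_MODEL = "qwen3.5-27b"


def resolve_classifier_model(env=None):
    """Return the classifier model, preferring the environment over the fallback."""
    env = os.environ if env is None else env
    # An unset or empty variable falls through to the source fallback.
    return env.get("ORCH_CLASSIFIER_MODEL") or SOURCE_FALLBACK_MODEL


if __name__ == "__main__":
    # No override: the source fallback applies.
    print(resolve_classifier_model({}))
    # Override as set by the deployed compose/runtime configuration.
    print(resolve_classifier_model({"ORCH_CLASSIFIER_MODEL": "gemma4-26b"}))
```

Passing a plain dict instead of `os.environ` keeps the lookup easy to check without touching the real environment.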
flowchart TD
Client([Client]) -->|POST /orchestrate| Orch[haiven-orchestrator]
Orch -->|classify| LiteLLM[LiteLLM]
LiteLLM -->|gemma4-26b| Delta[vLLM Delta]
Orch --> Redis[(Redis DB 1)]
Orch --> Briefing[agent-briefing]
Orch --> WorkHub[work-hub]
Orch --> Knowledge[haiven-knowledge]
Orch --> Research[research-agent]
The live taxonomy contains 17 intents:
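- `briefing.daily`
- `briefing.weekly`
- `draft`
- `email.compose`
- `scheduling.query`
- `scheduling.create`
- `scheduling.confirm`
- `research.topic`
- `task.create`
- `task.query`
- `approve`
- `voice_note`
- `review_feedback`
- `opportunity.scan`
- `content.publish`
- `publish`
- `system.status`

Request flow:

1. The client sends `POST /orchestrate`.
2. Session context is loaded if `session_id` is present.
3. The intent is classified through LiteLLM (`gemma4-26b` in current runtime).
4. If the classification confidence clears `ORCH_CONFIDENCE_THRESHOLD` (0.7 by default), the request is dispatched.
5. Otherwise, the response is returned with `clarification_needed: true`.

| Variable | Runtime Value / Default | Purpose |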
|---|---|---|
| `ORCH_LITELLM_URL` | `http://litellm:4000` | Classifier gateway |
| `ORCH_CLASSIFIER_MODEL` | `gemma4-26b` in deployed runtime | Intent classifier model |
| `ORCH_CONFIDENCE_THRESHOLD` | `0.7` | Dispatch threshold |
| `ORCH_REDIS_URL` | `redis://redis:6379/1` | Session storage |
| `ORCH_SESSION_TTL` | `1800` | Session TTL in seconds |

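As a rough sketch of how these variables might be read and how the dispatch threshold gates a request, consider the following. The class `OrchestratorSettings` and the functions `load_settings` and `should_dispatch` are hypothetical names, not the service's actual code. The defaults are the runtime values from the table, which may differ from the fallbacks in `app/config.py`, and treating the threshold as inclusive is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchestratorSettings:
    litellm_url: str
    classifier_model: str
    confidence_threshold: float
    redis_url: str
    session_ttl: int  # seconds


def load_settings(env=None):
    """Read the ORCH_* variables, falling back to the values in the table."""
    env = os.environ if env is None else env
    return OrchestratorSettings(
        litellm_url=env.get("ORCH_LITELLM_URL", "http://litellm:4000"),
        classifier_model=env.get("ORCH_CLASSIFIER_MODEL", "gemma4-26b"),
        confidence_threshold=float(env.get("ORCH_CONFIDENCE_THRESHOLD", "0.7")),
        redis_url=env.get("ORCH_REDIS_URL", "redis://redis:6379/1"),
        session_ttl=int(env.get("ORCH_SESSION_TTL", "1800")),
    )


def should_dispatch(confidence, settings):
    """Return True to dispatch, False to ask for clarification instead."""
    # Assumption: a confidence exactly equal to the threshold still dispatches.
    return confidence >= settings.confidence_threshold


if __name__ == "__main__":
    settings = load_settings({})
    print(should_dispatch(0.95, settings))
    print(should_dispatch(0.40, settings))
```

Raising `ORCH_CONFIDENCE_THRESHOLD` in this sketch sends more borderline requests down the clarification path rather than to a backend.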
| Endpoint | Purpose |
|---|---|
| `POST /orchestrate` | Classify and dispatch |
| `GET /health` | Liveness |
| `GET /metrics` | Prometheus metrics |

Example request body for `POST /orchestrate`:

{
  "message": "What does my day look like?",
  "input_modality": "text",
  "output_format": "markdown"
}

Example response:

{
  "request_id": "a3f1b2c4-5d6e-7f8a-9b0c-1d2e3f4a5b6c",
  "intent": "briefing.daily",
  "confidence": 0.95,
  "response": {
    "content": "Here's your day for Friday...",
    "sources": [],
    "actions_taken": [],
    "confidence": 1.0,
    "model_used": "glm-4-7-flash",
    "latency_ms": 450
  },
  "session_id": "e7d2a1f9-3b4c-5d6e-7f8a-9b0c1d2e3f4a",
  "clarification_needed": false,
  "clarification_message": null
}

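A client might build the request body and read the response along these lines. This is a minimal sketch using only the field names visible in the examples above. `build_request`, `parse_response`, and `OrchestrateResult` are hypothetical helpers, and sending `session_id` in the request body is an assumption; `openapi.yaml` is the authority on the actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrchestrateResult:
    intent: str
    confidence: float
    session_id: Optional[str]
    clarification_needed: bool
    clarification_message: Optional[str]
    content: Optional[str]


def build_request(message, session_id=None,
                  input_modality="text", output_format="markdown"):
    """Serialize a request body for POST /orchestrate."""
    payload = {
        "message": message,
        "input_modality": input_modality,
        "output_format": output_format,
    }
    # Assumption: an existing session is continued by sending its id back.
    if session_id is not None:
        payload["session_id"] = session_id
    return json.dumps(payload)


def parse_response(raw):
    """Extract the fields a caller usually needs from a response body."""
    data = json.loads(raw)
    inner = data.get("response") or {}
    return OrchestrateResult(
        intent=data["intent"],
        confidence=float(data["confidence"]),
        session_id=data.get("session_id"),
        clarification_needed=bool(data.get("clarification_needed", False)),
        clarification_message=data.get("clarification_message"),
        content=inner.get("content"),
    )
```

A caller would typically check `clarification_needed` first and show `clarification_message` to the user before relying on `content`.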
docker compose -f /mnt/apps/docker/ai/haiven-orchestrator/docker-compose.yml up -d
docker logs -f haiven-orchestrator
curl -sf http://localhost:8500/health
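
For scripted checks, the `curl` probe can be approximated in Python with the standard library. This is a sketch rather than part of the service: `check_health` is a hypothetical helper, the default URL is the one from the `curl` command, and unlike `curl -sf` it follows redirects.

```python
import http.client
import urllib.request


def check_health(base_url="http://localhost:8500", timeout=2.0):
    """Return True if GET /health answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx) are OSError subclasses, as are
        # connection failures and timeouts.
        return False


if __name__ == "__main__":
    print("healthy" if check_health() else "unhealthy")
```
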
- `/mnt/apps/docker/ai/haiven-orchestrator/USER_GUIDE.md`
- `/mnt/apps/docker/ai/haiven-orchestrator/openapi.yaml`
- `/mnt/apps/docker/_server-info/services.yml`