haiven-orchestrator

Central routing service for the Haiven AI stack. It accepts natural-language requests, classifies intent through LiteLLM, and dispatches to the correct backend.

Current Runtime

| Property | Value |
| --- | --- |
| Status | Live |
| Port | `8500 -> 8000` |
| Domain | `orchestrator.haiven.site` |
| Networks | `web`, `backend` |
| Session Store | `redis://redis:6379/1` |
| Deployed Classifier | `gemma4-26b` via LiteLLM |

Source/runtime note: `app/config.py` still lists `qwen3.5-27b` as the source fallback, but the deployed compose/runtime configuration sets `ORCH_CLASSIFIER_MODEL=gemma4-26b`, so `gemma4-26b` is the model actually in use.
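The precedence described in the note can be illustrated with a minimal sketch. It assumes the setting is read from the process environment; the function name `resolve_classifier_model` is hypothetical and is not taken from `app/config.py`.

```python
import os

# Fallback named in the note above; used only when no override is set.
SOURCE_FALLBACK_MODEL = "qwen3.5-27b"


def resolve_classifier_model(env=None):
    """Return the classifier model, preferring the environment override.

    `env` is any mapping; it defaults to the real process environment.
    An empty or missing ORCH_CLASSIFIER_MODEL falls back to the source default.
    """
    env = os.environ if env is None else env
    return env.get("ORCH_CLASSIFIER_MODEL") or SOURCE_FALLBACK_MODEL


if __name__ == "__main__":
    print(resolve_classifier_model({}))
    print(resolve_classifier_model({"ORCH_CLASSIFIER_MODEL": "gemma4-26b"}))
```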

Architecture

flowchart TD
    Client([Client]) -->|POST /orchestrate| Orch[haiven-orchestrator]
    Orch -->|classify| LiteLLM[LiteLLM]
    LiteLLM -->|gemma4-26b| Delta[vLLM Delta]
    Orch --> Redis[(Redis DB 1)]
    Orch --> Briefing[agent-briefing]
    Orch --> WorkHub[work-hub]
    Orch --> Knowledge[haiven-knowledge]
    Orch --> Research[research-agent]

Intents

The live taxonomy contains 17 intents:

briefing.daily
briefing.weekly
draft
email.compose
scheduling.query
scheduling.create
scheduling.confirm
research.topic
task.create
task.query
approve
voice_note
review_feedback
opportunity.scan
content.publish
publish
system.status
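
As a rough illustration, the taxonomy above can be held in a set so that a classifier label is checked before anything is dispatched. The helper names `is_known_intent` and `intent_family` are hypothetical, and the idea of grouping by the prefix before the dot is an assumption rather than documented service behavior.

```python
# The 17 intent labels listed above, copied verbatim.
INTENTS = frozenset({
    "briefing.daily", "briefing.weekly", "draft", "email.compose",
    "scheduling.query", "scheduling.create", "scheduling.confirm",
    "research.topic", "task.create", "task.query", "approve",
    "voice_note", "review_feedback", "opportunity.scan",
    "content.publish", "publish", "system.status",
})


def is_known_intent(label):
    """True when the label is part of the listed taxonomy."""
    return label in INTENTS


def intent_family(label):
    """Return the part before the first dot, or the whole label if undotted."""
    return label.split(".", 1)[0]
```

Rejecting labels outside the set guards against a classifier returning a label that no backend handles.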

Request Flow

1. Client sends `POST /orchestrate`.
2. The orchestrator loads recent session context from Redis when `session_id` is present.
3. LiteLLM runs the deployed classifier model (`gemma4-26b` in current runtime).
4. If confidence is at least `ORCH_CONFIDENCE_THRESHOLD` (`0.7` by default), the request is dispatched.
5. If confidence is below threshold, the service returns `clarification_needed: true`.
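
The threshold branch in the flow above can be sketched as follows. This is an illustration only: `Classification` and `decide` are hypothetical names, and the shape of the returned dictionary is an assumption that borrows the `clarification_needed` field from the documented response.

```python
from dataclasses import dataclass

# Default from ORCH_CONFIDENCE_THRESHOLD.
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class Classification:
    """Hypothetical container for the classifier's output."""
    intent: str
    confidence: float


def decide(result, threshold=CONFIDENCE_THRESHOLD):
    """Dispatch when confidence is at least the threshold, else ask to clarify."""
    dispatch = result.confidence >= threshold
    return {
        "intent": result.intent,
        "dispatch": dispatch,
        "clarification_needed": not dispatch,
    }
```

Note that the comparison is inclusive: a confidence of exactly `0.7` is dispatched.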

Configuration

| Variable | Runtime Value / Default | Purpose |
| --- | --- | --- |
| `ORCH_LITELLM_URL` | `http://litellm:4000` | Classifier gateway |
| `ORCH_CLASSIFIER_MODEL` | `gemma4-26b` in deployed runtime | Intent classifier model |
| `ORCH_CONFIDENCE_THRESHOLD` | `0.7` | Dispatch threshold |
| `ORCH_REDIS_URL` | `redis://redis:6379/1` | Session storage |
| `ORCH_SESSION_TTL` | `1800` | Session TTL in seconds |
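
To show how `ORCH_SESSION_TTL` might apply to session storage, here is a minimal sketch. The key layout (`session:<id>`), the JSON list of turns, and the `max_turns` cap are all assumptions, not the service's actual schema. The `client` argument is anything exposing `get` and `setex`, which a redis-py client created with `redis.Redis.from_url(...)` does; an in-memory stand-in is included so the sketch runs without Redis.

```python
import json

SESSION_TTL_SECONDS = 1800  # ORCH_SESSION_TTL default


class InMemoryStore:
    """Stand-in with the same get/setex shape as a Redis client (no expiry)."""

    def __init__(self):
        self.data = {}
        self.ttls = {}

    def get(self, key):
        return self.data.get(key)

    def setex(self, key, ttl, value):
        self.data[key] = value
        self.ttls[key] = ttl  # recorded only; this stand-in never expires keys


def session_key(session_id):
    # Hypothetical key layout.
    return f"session:{session_id}"


def load_context(client, session_id):
    """Return the stored turns for a session, or an empty list."""
    raw = client.get(session_key(session_id))
    return json.loads(raw) if raw else []


def save_turn(client, session_id, turn, ttl=SESSION_TTL_SECONDS, max_turns=10):
    """Append a turn, keep the most recent max_turns, and refresh the TTL."""
    history = load_context(client, session_id)
    history.append(turn)
    history = history[-max_turns:]
    client.setex(session_key(session_id), ttl, json.dumps(history))
    return history
```

Rewriting the key with `setex` on every turn means the 1800-second window restarts with each request, so only idle sessions expire.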

Endpoints

| Endpoint | Purpose |
| --- | --- |
| `POST /orchestrate` | Classify and dispatch |
| `GET /health` | Liveness |
| `GET /metrics` | Prometheus metrics |

Example Request

{
  "message": "What does my day look like?",
  "input_modality": "text",
  "output_format": "markdown"
}
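
A request like the one above could be sent from Python using only the standard library. This is a sketch under assumptions: the base URL uses the host port from the runtime table, the helper names are hypothetical, and passing `session_id` in the JSON body is a guess, since the request schema is defined in `openapi.yaml` rather than here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8500"  # host port from the runtime table


def build_orchestrate_request(message, session_id=None, base_url=BASE_URL):
    """Build (but do not send) a POST /orchestrate request."""
    payload = {
        "message": message,
        "input_modality": "text",
        "output_format": "markdown",
    }
    if session_id is not None:
        payload["session_id"] = session_id  # assumed field placement
    return urllib.request.Request(
        f"{base_url}/orchestrate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def orchestrate(message, session_id=None, timeout=30):
    """Send the request and return the decoded JSON body."""
    req = build_orchestrate_request(message, session_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `orchestrate("What does my day look like?")` requires the service to be reachable on the host port.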

Example Response

{
  "request_id": "a3f1b2c4-5d6e-7f8a-9b0c-1d2e3f4a5b6c",
  "intent": "briefing.daily",
  "confidence": 0.95,
  "response": {
    "content": "Here's your day for Friday...",
    "sources": [],
    "actions_taken": [],
    "confidence": 1.0,
    "model_used": "glm-4-7-flash",
    "latency_ms": 450
  },
  "session_id": "e7d2a1f9-3b4c-5d6e-7f8a-9b0c1d2e3f4a",
  "clarification_needed": false,
  "clarification_message": null
}
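
A caller would typically branch on `clarification_needed` before reading the nested content. The sketch below assumes only the fields visible in the example response; `extract_reply` is a hypothetical helper name and the fallback text is an arbitrary choice.

```python
def extract_reply(body):
    """Return the text to show the user for a decoded /orchestrate response.

    When clarification is needed, prefer the clarification message;
    otherwise return the nested response content.
    """
    if body.get("clarification_needed"):
        return body.get("clarification_message") or "Could you clarify your request?"
    inner = body.get("response") or {}
    return inner.get("content", "")


if __name__ == "__main__":
    example = {
        "intent": "briefing.daily",
        "confidence": 0.95,
        "response": {"content": "Here's your day for Friday..."},
        "clarification_needed": False,
        "clarification_message": None,
    }
    print(extract_reply(example))
```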

Operations

docker compose -f /mnt/apps/docker/ai/haiven-orchestrator/docker-compose.yml up -d
docker logs -f haiven-orchestrator
curl -sf http://localhost:8500/health

Related Files

/mnt/apps/docker/ai/haiven-orchestrator/USER_GUIDE.md
/mnt/apps/docker/ai/haiven-orchestrator/openapi.yaml
/mnt/apps/docker/_server-info/services.yml