haiven-orchestrator

Central routing service for the Haiven AI stack. It accepts natural-language requests, classifies intent through LiteLLM, and dispatches to the correct backend.

Current Runtime

| Property | Value |
| --- | --- |
| Status | Live |
| Port | 8500 -> 8000 |
| Domain | orchestrator.haiven.site |
| Networks | web, backend |
| Session Store | redis://redis:6379/1 |
| Deployed Classifier | gemma4-26b via LiteLLM |

Source/runtime note: app/config.py still falls back to qwen3.5-27b when ORCH_CLASSIFIER_MODEL is unset, but the deployed compose/runtime sets ORCH_CLASSIFIER_MODEL=gemma4-26b, so the live service classifies with gemma4-26b.

Architecture

flowchart TD
    Client([Client]) -->|POST /orchestrate| Orch[haiven-orchestrator]
    Orch -->|classify| LiteLLM[LiteLLM]
    LiteLLM -->|gemma4-26b| Delta[vLLM Delta]
    Orch --> Redis[(Redis DB 1)]
    Orch --> Briefing[agent-briefing]
    Orch --> WorkHub[work-hub]
    Orch --> Knowledge[haiven-knowledge]
    Orch --> Research[research-agent]

Intents

The live taxonomy contains 17 intents.

Request Flow

  1. Client sends POST /orchestrate.
  2. The orchestrator loads recent session context from Redis when session_id is present.
  3. LiteLLM runs the deployed classifier model (gemma4-26b in current runtime).
  4. If confidence is at least ORCH_CONFIDENCE_THRESHOLD (0.7 by default), the request is dispatched.
  5. If confidence is below threshold, the service returns clarification_needed: true.
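
The threshold check in steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the service's actual code: the `Classification` type and the `route` function are hypothetical names, and only the 0.7 default and the `clarification_needed` flag come from this README.

```python
from dataclasses import dataclass

# Default for ORCH_CONFIDENCE_THRESHOLD as listed under Configuration.
DEFAULT_THRESHOLD = 0.7


@dataclass
class Classification:
    """Hypothetical container for the classifier's output."""
    intent: str
    confidence: float


def route(result: Classification, threshold: float = DEFAULT_THRESHOLD) -> dict:
    """Decide whether a classified request is dispatched or needs clarification."""
    if result.confidence >= threshold:
        # "At least" the threshold: the boundary value itself is dispatched.
        return {"intent": result.intent, "dispatch": True, "clarification_needed": False}
    # Below the threshold: no dispatch, ask the client to clarify instead.
    return {"intent": result.intent, "dispatch": False, "clarification_needed": True}
```

Note that a confidence exactly equal to the threshold is dispatched, which matches the "at least" wording in step 4.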

Configuration

| Variable | Runtime Value / Default | Purpose |
| --- | --- | --- |
| ORCH_LITELLM_URL | http://litellm:4000 | Classifier gateway |
| ORCH_CLASSIFIER_MODEL | gemma4-26b in deployed runtime | Intent classifier model |
| ORCH_CONFIDENCE_THRESHOLD | 0.7 | Dispatch threshold |
| ORCH_REDIS_URL | redis://redis:6379/1 | Session storage |
| ORCH_SESSION_TTL | 1800 | Session TTL in seconds |
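
To make the override behaviour concrete, here is a rough sketch of how these variables could be read with fallbacks. It is not a copy of app/config.py: `OrchestratorSettings` and `load_settings` are hypothetical names. The README documents qwen3.5-27b as the source fallback and 0.7 as the threshold default. The URL and TTL values come from the table, and using them as source defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OrchestratorSettings:
    """Hypothetical settings object; field names are illustrative."""
    litellm_url: str
    classifier_model: str
    confidence_threshold: float
    redis_url: str
    session_ttl: int


def load_settings(env: Mapping[str, str] = os.environ) -> OrchestratorSettings:
    """Read ORCH_* variables, falling back to a default when one is unset."""
    return OrchestratorSettings(
        # Assumption: the table value doubles as the source default.
        litellm_url=env.get("ORCH_LITELLM_URL", "http://litellm:4000"),
        # Source fallback per this README; the deployed runtime overrides it.
        classifier_model=env.get("ORCH_CLASSIFIER_MODEL", "qwen3.5-27b"),
        confidence_threshold=float(env.get("ORCH_CONFIDENCE_THRESHOLD", "0.7")),
        # Assumption: the table value doubles as the source default.
        redis_url=env.get("ORCH_REDIS_URL", "redis://redis:6379/1"),
        # Seconds; assumption: the table value doubles as the source default.
        session_ttl=int(env.get("ORCH_SESSION_TTL", "1800")),
    )
```

Passing a plain dict instead of `os.environ` makes the precedence easy to check: `load_settings({"ORCH_CLASSIFIER_MODEL": "gemma4-26b"})` yields the deployed model, while `load_settings({})` yields the source fallback.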

Endpoints

| Endpoint | Purpose |
| --- | --- |
| POST /orchestrate | Classify and dispatch |
| GET /health | Liveness |
| GET /metrics | Prometheus metrics |

Example Request

{
  "message": "What does my day look like?",
  "input_modality": "text",
  "output_format": "markdown"
}
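
A client could send the body above using only the Python standard library, as sketched below. The helper names are hypothetical, and the base URL assumes the service is reached on the published port 8500 on localhost. Sending `session_id` as a top-level body field is also an assumption: the README only says the orchestrator uses it when present.

```python
import json
import urllib.request
from typing import Optional


def build_orchestrate_request(
    message: str,
    session_id: Optional[str] = None,
    base_url: str = "http://localhost:8500",
) -> urllib.request.Request:
    """Build (but do not send) a POST /orchestrate request."""
    payload = {
        "message": message,
        "input_modality": "text",
        "output_format": "markdown",
    }
    if session_id is not None:
        # Assumption: session_id travels as a top-level field in the body.
        payload["session_id"] = session_id
    return urllib.request.Request(
        url=f"{base_url}/orchestrate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def orchestrate(message: str, session_id: Optional[str] = None) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_orchestrate_request(message, session_id)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running service.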

Example Response

{
  "request_id": "a3f1b2c4-5d6e-7f8a-9b0c-1d2e3f4a5b6c",
  "intent": "briefing.daily",
  "confidence": 0.95,
  "response": {
    "content": "Here's your day for Friday...",
    "sources": [],
    "actions_taken": [],
    "confidence": 1.0,
    "model_used": "glm-4-7-flash",
    "latency_ms": 450
  },
  "session_id": "e7d2a1f9-3b4c-5d6e-7f8a-9b0c1d2e3f4a",
  "clarification_needed": false,
  "clarification_message": null
}
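
On the client side, the two outcomes (dispatched or clarification needed) can be handled roughly as below. `render_reply` is a hypothetical helper, the fallback prompt text is invented for illustration, and the sketch assumes `response` may be absent when clarification is requested, which this README does not specify. The top-level `confidence` appears to be the classifier's score for `intent`, while the nested one looks like it belongs to the backend's answer.

```python
def render_reply(body: dict) -> str:
    """Return the text a client should show for an /orchestrate response."""
    if body.get("clarification_needed"):
        # Prefer the service's own message; fall back to a generic prompt.
        return body.get("clarification_message") or "Could you clarify your request?"
    # Dispatched request: the backend's answer lives under response.content.
    return body["response"]["content"]
```

For the example response above, `render_reply` returns the string stored in `response.content`.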

Operations

docker compose -f /mnt/apps/docker/ai/haiven-orchestrator/docker-compose.yml up -d
docker logs -f haiven-orchestrator
curl -sf http://localhost:8500/health
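
After `up -d`, the container may need a moment before `/health` answers. The sketch below polls the same URL as the curl command until it succeeds or a retry budget runs out. The function names, attempt count, and delay are illustrative assumptions, not part of the service.

```python
import time
import urllib.request
from typing import Callable


def probe_health(url: str = "http://localhost:8500/health") -> bool:
    """Return True when GET /health answers without an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are
        # connection failures and timeouts.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = probe_health,
    attempts: int = 10,
    delay_s: float = 2.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

The `probe` parameter is injectable so the retry loop can be exercised without a running container.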