MCP Server

Model Context Protocol (MCP) server providing a unified tool interface for LLMs to interact with Haiven infrastructure.

Overview

Property	Value
Status	Live
Domain	`mcp.haiven.local` (internal), `mcp.haiven.site` (external)
Port	8765
Container	`mcp-server`
GPU	RTX 4090 (24 GB, assigned by UUID), used for embedded Whisper STT
Networks	web, backend

Architecture

                                    +------------------+
                                    |   Echo/LibreChat |
                                    |  (MCP Client)    |
                                    +--------+---------+
                                             |
                                             | MCP Protocol (HTTP)
                                             v
+---------------------------+       +--------+---------+       +------------------+
|   Traefik                 |       |                  |       |   LiteLLM        |
|   mcp.haiven.local:443    +------>+   MCP Server     +------>+   (Port 4000)    |
|                           |       |   (Port 8765)    |       |                  |
+---------------------------+       +--------+---------+       +--------+---------+
                                             |                          |
                            +----------------+----------------+         |
                            |                |                |         v
                    +-------v------+ +-------v------+ +-------v------+ +--------+
                    |  Prometheus  | |    Loki      | |   ComfyUI    | |Langfuse|
                    |  (Metrics)   | |   (Logs)     | | (Image Gen)  | |(Traces)|
                    +--------------+ +--------------+ +--------------+ +--------+

LiteLLM Integration

As of version 1.2.0, all LLM calls from MCP Server are routed through LiteLLM for observability.

Benefits

Langfuse Observability: All LLM calls are traced and logged to Langfuse (ai-ops.haiven.local)
Cost Tracking: Unified cost tracking across all clients using LiteLLM
Model Access: Access to all 33 LiteLLM models including TTS/STT capabilities
API Key Management: Centralized API key management through LiteLLM

Configuration

The following environment variables control the LiteLLM integration:

Variable	Value	Description
`LLAMA_SWAP_URL`	`http://litellm:4000`	LiteLLM endpoint (previously llama-swap)
`LITELLM_API_KEY`	`sk-***`	API key for LiteLLM authentication
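
As a rough illustration of how these two variables fit together, the sketch below builds a request for the LiteLLM model list. The helper name `litellm_models_request` is hypothetical and is not taken from `config.py`; it assumes LiteLLM's OpenAI-compatible `/v1/models` route and bearer-token authentication.

```python
import os
import urllib.request


def litellm_models_request(env=None):
    """Build (but do not send) a GET request for the LiteLLM model list."""
    env = os.environ if env is None else env
    # LLAMA_SWAP_URL keeps its legacy name but now points at LiteLLM.
    base_url = env.get("LLAMA_SWAP_URL", "http://litellm:4000").rstrip("/")
    api_key = env.get("LITELLM_API_KEY")
    if not api_key:
        # The key has no default, so fail early with a clear message.
        raise RuntimeError("LITELLM_API_KEY is required")
    return urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the result to `urllib.request.urlopen` from a container that can resolve `litellm` would send the request.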

Viewing Traces

1. Navigate to https://ai-ops.haiven.local
2. Log in to Langfuse
3. View traces under the MCP Server project
4. Analyze costs, latencies, and token usage

Tools

MCP Server provides 23 tools across 12 namespaces:

System Status

Tool	Description
`status/docker`	Get Docker container status and health
`status/gpu`	Get GPU utilization and memory info
`status/models`	List available LLM models from LiteLLM
`status/system`	Get CPU, memory, disk metrics from Prometheus

Docker Management

Tool	Description
`docker/restart`	Restart allowed containers
`docker/logs`	Get container logs

File Operations

Tool	Description
`files/read`	Read file contents
`files/list`	List directory contents
`files/search`	Search files by pattern

Code Execution

Tool	Description
`sandbox/bash`	Execute bash commands in isolated sandbox
`sandbox/python`	Execute Python code in isolated sandbox

Search

Tool	Description
`search/web`	Web search via SearXNG

Web Scraping (Crawl4AI)

Tool	Description
`scrape/url`	Scrape a single URL
`scrape/batch`	Scrape multiple URLs
`scrape/site`	Crawl entire website
`scrape/sitemap`	Extract URLs from sitemap

Monitoring

Tool	Description
`logs/query`	Query Loki logs
`metrics/query`	Query Prometheus metrics
`alerts/list`	List active Alertmanager alerts

Audio

Tool	Description
`audio/transcribe`	Transcribe audio (embedded Whisper)
`audio/speak`	Text-to-speech via Piper

Uptime

Tool	Description
`uptime/status`	Get Uptime Kuma status

Image Generation

Tool	Description
`image/generate`	Generate images via ComfyUI

API Endpoints

Method	Path	Description
GET	`/health`	Health check endpoint
GET	`/tools`	List available MCP tools
POST	`/mcp`	Main MCP protocol endpoint
GET	`/metrics`	Prometheus metrics
POST	`/v1/audio/transcriptions`	OpenAI-compatible STT
POST	`/v1/audio/translations`	Audio translation to English

Environment Variables

Service Endpoints

Variable	Default	Description
`LLAMA_SWAP_URL`	`http://litellm:4000`	LLM API endpoint (LiteLLM)
`LITELLM_API_KEY`	(required)	API key for LiteLLM
`PROMETHEUS_URL`	`http://prometheus:9090`	Prometheus endpoint
`LOKI_URL`	`http://loki:3100`	Loki endpoint
`ALERTMANAGER_URL`	`http://alertmanager:9093`	Alertmanager endpoint
`SEARXNG_URL`	`http://searxng:8080`	SearXNG endpoint
`PIPER_URL`	`http://piper-api:5000`	Piper TTS endpoint
`COMFYUI_URL`	`http://host.docker.internal:8188`	ComfyUI endpoint
`UPTIME_KUMA_URL`	`http://uptime-kuma:3001`	Uptime Kuma endpoint
`CRAWL4AI_URL`	`http://crawl4ai:11235`	Crawl4AI endpoint

Embedded Whisper

Variable	Default	Description
`WHISPER_MODEL`	`large-v3`	Whisper model to use
`WHISPER_DEVICE`	`cuda`	Device (cuda/cpu)
`WHISPER_COMPUTE_TYPE`	`float16`	Compute type

Server Configuration

Variable	Default	Description
`MCP_PORT`	`8765`	Server port
`MCP_TIMEOUT`	`300`	Request timeout (seconds)
`MCP_KEEPALIVE_INTERVAL`	`30`	Keepalive interval
`LOG_LEVEL`	`INFO`	Logging level

Security

Variable	Default	Description
`FILE_ACCESS_ROOTS`	`/mnt/apps/docker,/mnt/storage`	Allowed file paths
`SANDBOX_TIMEOUT`	`30`	Sandbox execution timeout (seconds)
`SANDBOX_MEMORY_LIMIT`	`512m`	Sandbox memory limit
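
A minimal sketch of how these settings might be parsed, assuming the comma-separated format shown in the default for `FILE_ACCESS_ROOTS`. The function name `load_security_settings` and the returned dictionary keys are illustrative, not the actual contents of `config.py`.

```python
import os


def load_security_settings(env=None):
    """Read the security-related variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    roots = env.get("FILE_ACCESS_ROOTS", "/mnt/apps/docker,/mnt/storage")
    return {
        # Split on commas and drop empty or whitespace-only entries.
        "file_access_roots": [r.strip() for r in roots.split(",") if r.strip()],
        "sandbox_timeout": int(env.get("SANDBOX_TIMEOUT", "30")),
        "sandbox_memory_limit": env.get("SANDBOX_MEMORY_LIMIT", "512m"),
    }
```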

Security Features

Path Validation: Requested paths are normalized with `resolve()` and checked against the allowed roots with `relative_to()`
File Blocklist: Blocks `.env`, `.ssh/`, `*.key`, `*.pem`, and secrets
Container Allowlist: Only specified containers can be restarted
Sandbox Isolation: Code execution in isolated containers with no network
Command Sanitization: Dangerous command patterns are blocked
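
The path validation and blocklist checks can be sketched roughly as follows. This is an illustration of the `resolve()` plus `relative_to()` technique, not the code in `tools/files.py`; the function name `is_path_allowed` and the exact pattern list (in particular `*secret*`) are assumptions.

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns loosely mirror the blocklist above; the exact list is an assumption.
BLOCKED_PATTERNS = (".env", "*.key", "*.pem", "*secret*")
BLOCKED_DIRS = (".ssh",)


def is_path_allowed(requested, allowed_roots):
    """Return True if `requested` resolves inside an allowed root and is not blocklisted."""
    # resolve() collapses ".." segments and follows symlinks before any check runs.
    resolved = Path(requested).resolve()
    if any(part in BLOCKED_DIRS for part in resolved.parts):
        return False
    if any(fnmatch(resolved.name, pattern) for pattern in BLOCKED_PATTERNS):
        return False
    for root in allowed_roots:
        try:
            # relative_to() raises ValueError when resolved is not under root.
            resolved.relative_to(Path(root).resolve())
            return True
        except ValueError:
            continue  # not under this root; try the next one
    return False
```

Resolving before comparing is what defeats traversal attempts such as `/mnt/storage/../etc/passwd`, which a plain string-prefix check would accept.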

Health Check

curl -f http://localhost:8765/health

Response:

{
  "status": "healthy",
  "service": "mcp-server",
  "version": "1.2.0",
  "uptime_seconds": 12345
}
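
For scripted checks, the response body can be interpreted with a small helper like the one below. This is a sketch: the name `is_healthy` is hypothetical, and it assumes that any `status` value other than `"healthy"` should be treated as a failure.

```python
import json


def is_healthy(body):
    """Return True when a /health response body reports a healthy service."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Missing or non-JSON bodies count as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```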

Usage Examples

List Available Tools

curl https://mcp.haiven.local/tools

Get Docker Status

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/docker",
      "arguments": {}
    }
  }'

Get GPU Status

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/gpu",
      "arguments": {"gpu_id": "all"}
    }
  }'

List Available Models (via LiteLLM)

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/models",
      "arguments": {}
    }
  }'
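
The three calls above share one request shape, so a Python client could build them with a single helper. The sketch below only constructs the request; the function name `build_tool_call` is hypothetical, and the payload simply mirrors the curl examples rather than any additional protocol fields.

```python
import json
import urllib.request


def build_tool_call(name, arguments=None, base_url="https://mcp.haiven.local"):
    """Build (but do not send) a POST request that invokes one MCP tool."""
    payload = {
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_tool_call("status/gpu", {"gpu_id": "all"})` produces the same body as the GPU status example, and `urllib.request.urlopen` would send it.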

Transcribe Audio

curl -X POST https://mcp.haiven.local/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Docker Commands

Start Service

cd /mnt/apps/docker/ai/mcp-server
docker compose up -d

View Logs

docker logs -f mcp-server

Restart Service

docker compose restart mcp-server

Rebuild After Code Changes

docker compose build --no-cache
docker compose up -d

Directory Structure

/mnt/apps/docker/ai/mcp-server/
├── docker-compose.yml      # Service configuration
├── README.md               # This file
└── USER_GUIDE.md           # End-user documentation

/mnt/apps/src/mcp-server/   # Source code (hot-reloaded)
├── Dockerfile
├── server.py               # Main server
├── config.py               # Configuration
├── whisper_manager.py      # Embedded Whisper
├── tools/                  # Tool implementations
│   ├── status.py           # System status tools
│   ├── docker_tools.py     # Docker management
│   ├── files.py            # File operations
│   ├── sandbox.py          # Code execution
│   ├── monitoring.py       # Loki/Prometheus
│   ├── audio.py            # TTS/STT
│   ├── scraping.py         # Crawl4AI integration
│   └── ...
└── utils/                  # Utilities
    ├── docker_client.py
    └── errors.py

Dependencies

Service	Purpose
litellm	LLM API gateway with Langfuse observability
prometheus	System metrics
loki	Log aggregation
alertmanager	Alert management
searxng	Web search
piper-api	Text-to-speech
uptime-kuma	Uptime monitoring
comfyui	Image generation
crawl4ai	Web scraping

Troubleshooting

LLM calls failing

Check LiteLLM is running: `docker ps | grep litellm`
Verify API key: check that `LITELLM_API_KEY` is set in the container environment
Check Langfuse for error traces: https://ai-ops.haiven.local

Models not listing

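Verify LiteLLM connection from a container on the same Docker network: `curl http://litellm:4000/v1/models` (add `-H "Authorization: Bearer $LITELLM_API_KEY"` if LiteLLM enforces key authentication)
Check MCP server logs: `docker logs mcp-server`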

Whisper transcription failing

Verify RTX 4090 is available: `nvidia-smi`
Check the model is loaded: look for the Whisper initialization message in `docker logs mcp-server`
Ensure audio format is supported (mp3, wav, m4a, flac, ogg, webm)
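
A client can rule out the format problem before uploading. The sketch below checks only the file extension against the formats listed above; the names `SUPPORTED_AUDIO_FORMATS` and `is_supported_audio` are illustrative, and the server may inspect the file contents rather than the name.

```python
from pathlib import Path

# Formats listed in the troubleshooting note above.
SUPPORTED_AUDIO_FORMATS = {"mp3", "wav", "m4a", "flac", "ogg", "webm"}


def is_supported_audio(filename):
    """Return True if the file extension is one of the listed audio formats."""
    # suffix includes the leading dot; compare case-insensitively without it.
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_AUDIO_FORMATS
```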

Connection refused errors

Check service is running: `docker ps | grep mcp-server`
Verify port binding: `docker port mcp-server`
Check Traefik routing: `docker logs traefik 2>&1 | grep mcp`

Sandbox execution timeouts

Default timeout is 30 seconds
Increase via the `SANDBOX_TIMEOUT` environment variable
Check Docker socket permissions