MCP Server

Model Context Protocol (MCP) server providing a unified tool interface for LLMs to interact with Haiven infrastructure.

Overview

| Property  | Value                                                    |
|-----------|----------------------------------------------------------|
| Status    | Live                                                     |
| Domain    | mcp.haiven.local (internal), mcp.haiven.site (external)  |
| Port      | 8765                                                     |
| Container | mcp-server                                               |
| GPU       | RTX 4090 (24GB, UUID-based) - for embedded Whisper STT   |
| Networks  | web, backend                                             |

Architecture

                                    +------------------+
                                    |   Echo/LibreChat |
                                    |  (MCP Client)    |
                                    +--------+---------+
                                             |
                                             | MCP Protocol (HTTP)
                                             v
+---------------------------+       +--------+---------+       +------------------+
|   Traefik                 |       |                  |       |   LiteLLM        |
|   mcp.haiven.local:443    +------>+   MCP Server     +------>+   (Port 4000)    |
|                           |       |   (Port 8765)    |       |                  |
+---------------------------+       +--------+---------+       +--------+---------+
                                             |                          |
                            +----------------+----------------+         |
                            |                |                |         v
                    +-------v------+ +-------v------+ +-------v------+ +--------+
                    |  Prometheus  | |    Loki      | |   ComfyUI    | |Langfuse|
                    |  (Metrics)   | |   (Logs)     | | (Image Gen)  | |(Traces)|
                    +--------------+ +--------------+ +--------------+ +--------+

LiteLLM Integration

As of version 1.2.0, all LLM calls from MCP Server are routed through LiteLLM for observability.
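
Internally, the server issues standard OpenAI-compatible requests to the LiteLLM gateway. A minimal sketch of such a call (the model name here is a placeholder; use one returned by status/models):

curl http://litellm:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "example-model",
    "messages": [{"role": "user", "content": "ping"}]
  }'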

Benefits

Routing through LiteLLM gives every LLM call a Langfuse trace, so cost, latency, and token usage can be inspected per request (see Viewing Traces below), and it replaces direct connections to individual model backends with a single gateway endpoint and API key.

Configuration

The following environment variables control the LiteLLM integration:

| Variable        | Value               | Description                              |
|-----------------|---------------------|------------------------------------------|
| LLAMA_SWAP_URL  | http://litellm:4000 | LiteLLM endpoint (previously llama-swap) |
| LITELLM_API_KEY | sk-***              | API key for LiteLLM authentication       |
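
Assuming the stack follows the usual docker-compose .env convention, setting these amounts to (the key value is a placeholder):

LLAMA_SWAP_URL=http://litellm:4000
LITELLM_API_KEY=sk-replace-me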

Viewing Traces

  1. Navigate to https://ai-ops.haiven.local
  2. Log in to Langfuse
  3. View traces under the MCP Server project
  4. Analyze costs, latencies, and token usage

Tools

MCP Server provides 23 tools across 12 namespaces:

System Status

| Tool          | Description                                   |
|---------------|-----------------------------------------------|
| status/docker | Get Docker container status and health        |
| status/gpu    | Get GPU utilization and memory info           |
| status/models | List available LLM models from LiteLLM        |
| status/system | Get CPU, memory, disk metrics from Prometheus |

Docker Management

| Tool           | Description                |
|----------------|----------------------------|
| docker/restart | Restart allowed containers |
| docker/logs    | Get container logs         |

File Operations

| Tool         | Description             |
|--------------|-------------------------|
| files/read   | Read file contents      |
| files/list   | List directory contents |
| files/search | Search files by pattern |
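
For example, reading a file under one of the allowed roots (the path argument name is illustrative; check /tools for the exact schema):

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "files/read",
      "arguments": {"path": "/mnt/storage/notes.txt"}
    }
  }'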

Code Execution

| Tool           | Description                               |
|----------------|-------------------------------------------|
| sandbox/bash   | Execute bash commands in isolated sandbox |
| sandbox/python | Execute Python code in isolated sandbox   |
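
A sketch of a sandboxed Python call (the code argument name is illustrative; check /tools for the exact schema):

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "sandbox/python",
      "arguments": {"code": "print(2 + 2)"}
    }
  }'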

Web Search (SearXNG)

| Tool       | Description            |
|------------|------------------------|
| search/web | Web search via SearXNG |

Web Scraping (Crawl4AI)

| Tool           | Description               |
|----------------|---------------------------|
| scrape/url     | Scrape a single URL       |
| scrape/batch   | Scrape multiple URLs      |
| scrape/site    | Crawl entire website      |
| scrape/sitemap | Extract URLs from sitemap |
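
For instance, scraping a single page (the url argument name is illustrative; check /tools for the exact schema):

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "scrape/url",
      "arguments": {"url": "https://example.com"}
    }
  }'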

Monitoring

| Tool          | Description                     |
|---------------|---------------------------------|
| logs/query    | Query Loki logs                 |
| metrics/query | Query Prometheus metrics        |
| alerts/list   | List active Alertmanager alerts |
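
For example, pulling recent logs for a container (this sketch assumes the tool accepts a LogQL selector under a query argument; verify via /tools):

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "logs/query",
      "arguments": {"query": "{container=\"mcp-server\"}"}
    }
  }'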

Audio

| Tool             | Description                         |
|------------------|-------------------------------------|
| audio/transcribe | Transcribe audio (embedded Whisper) |
| audio/speak      | Text-to-speech via Piper            |
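
A sketch of a TTS call (the text argument name is illustrative; how the resulting audio is returned depends on the tool implementation):

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "audio/speak",
      "arguments": {"text": "Hello from Haiven"}
    }
  }'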

Uptime

| Tool          | Description            |
|---------------|------------------------|
| uptime/status | Get Uptime Kuma status |

Image Generation

| Tool           | Description                 |
|----------------|-----------------------------|
| image/generate | Generate images via ComfyUI |
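
For example (the prompt argument name is illustrative; check /tools for the actual schema, and note generation time depends on the ComfyUI workflow):

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "image/generate",
      "arguments": {"prompt": "a lighthouse at dusk, oil painting"}
    }
  }'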

API Endpoints

| Method | Path                      | Description                  |
|--------|---------------------------|------------------------------|
| GET    | /health                   | Health check endpoint        |
| GET    | /tools                    | List available MCP tools     |
| POST   | /mcp                      | Main MCP protocol endpoint   |
| GET    | /metrics                  | Prometheus metrics           |
| POST   | /v1/audio/transcriptions  | OpenAI-compatible STT        |
| POST   | /v1/audio/translations    | Audio translation to English |

Environment Variables

Service Endpoints

| Variable         | Default                          | Description                |
|------------------|----------------------------------|----------------------------|
| LLAMA_SWAP_URL   | http://litellm:4000              | LLM API endpoint (LiteLLM) |
| LITELLM_API_KEY  | (required)                       | API key for LiteLLM        |
| PROMETHEUS_URL   | http://prometheus:9090           | Prometheus endpoint        |
| LOKI_URL         | http://loki:3100                 | Loki endpoint              |
| ALERTMANAGER_URL | http://alertmanager:9093         | Alertmanager endpoint      |
| SEARXNG_URL      | http://searxng:8080              | SearXNG endpoint           |
| PIPER_URL        | http://piper-api:5000            | Piper TTS endpoint         |
| COMFYUI_URL      | http://host.docker.internal:8188 | ComfyUI endpoint           |
| UPTIME_KUMA_URL  | http://uptime-kuma:3001          | Uptime Kuma endpoint       |
| CRAWL4AI_URL     | http://crawl4ai:11235            | Crawl4AI endpoint          |

Embedded Whisper

| Variable             | Default  | Description          |
|----------------------|----------|----------------------|
| WHISPER_MODEL        | large-v3 | Whisper model to use |
| WHISPER_DEVICE       | cuda     | Device (cuda/cpu)    |
| WHISPER_COMPUTE_TYPE | float16  | Compute type         |

Server Configuration

| Variable               | Default | Description               |
|------------------------|---------|---------------------------|
| MCP_PORT               | 8765    | Server port               |
| MCP_TIMEOUT            | 300     | Request timeout (seconds) |
| MCP_KEEPALIVE_INTERVAL | 30      | Keepalive interval        |
| LOG_LEVEL              | INFO    | Logging level             |

Security

| Variable             | Default                       | Description                         |
|----------------------|-------------------------------|-------------------------------------|
| FILE_ACCESS_ROOTS    | /mnt/apps/docker,/mnt/storage | Allowed file paths                  |
| SANDBOX_TIMEOUT      | 30                            | Sandbox execution timeout (seconds) |
| SANDBOX_MEMORY_LIMIT | 512m                          | Sandbox memory limit                |

Security Features

File access is restricted to the paths listed in FILE_ACCESS_ROOTS, code execution is confined to an isolated sandbox with an execution timeout and memory limit (SANDBOX_TIMEOUT, SANDBOX_MEMORY_LIMIT), and docker/restart only operates on an explicit allowlist of containers.

Health Check

curl -f http://localhost:8765/health

Response:

{
  "status": "healthy",
  "service": "mcp-server",
  "version": "1.2.0",
  "uptime_seconds": 12345
}

Usage Examples

List Available Tools

curl https://mcp.haiven.local/tools

Get Docker Status

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/docker",
      "arguments": {}
    }
  }'

Get GPU Status

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/gpu",
      "arguments": {"gpu_id": "all"}
    }
  }'

List Available Models (via LiteLLM)

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/models",
      "arguments": {}
    }
  }'

Transcribe Audio

curl -X POST https://mcp.haiven.local/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Docker Commands

Start Service

cd /mnt/apps/docker/ai/mcp-server
docker compose up -d

View Logs

docker logs -f mcp-server

Restart Service

docker compose restart mcp-server

Rebuild After Code Changes

docker compose build --no-cache
docker compose up -d

Directory Structure

/mnt/apps/docker/ai/mcp-server/
├── docker-compose.yml      # Service configuration
├── README.md               # This file
└── USER_GUIDE.md           # End-user documentation

/mnt/apps/src/mcp-server/   # Source code (hot-reloaded)
├── Dockerfile
├── server.py               # Main server
├── config.py               # Configuration
├── whisper_manager.py      # Embedded Whisper
├── tools/                  # Tool implementations
│   ├── status.py           # System status tools
│   ├── docker_tools.py     # Docker management
│   ├── files.py            # File operations
│   ├── sandbox.py          # Code execution
│   ├── monitoring.py       # Loki/Prometheus
│   ├── audio.py            # TTS/STT
│   ├── scraping.py         # Crawl4AI integration
│   └── ...
└── utils/                  # Utilities
    ├── docker_client.py
    └── errors.py

Dependencies

| Service      | Purpose                                     |
|--------------|---------------------------------------------|
| litellm      | LLM API gateway with Langfuse observability |
| prometheus   | System metrics                              |
| loki         | Log aggregation                             |
| alertmanager | Alert management                            |
| searxng      | Web search                                  |
| piper-api    | Text-to-speech                              |
| uptime-kuma  | Uptime monitoring                           |
| comfyui      | Image generation                            |
| crawl4ai     | Web scraping                                |

Troubleshooting

LLM calls failing

  1. Check LiteLLM is running: docker ps | grep litellm
  2. Verify API key: Check LITELLM_API_KEY environment variable
  3. Check Langfuse for error traces: https://ai-ops.haiven.local

Models not listing

  1. Verify LiteLLM connection: curl http://litellm:4000/v1/models
  2. Check MCP server logs: docker logs mcp-server

Whisper transcription failing

  1. Verify RTX 4090 is available: nvidia-smi
  2. Check model is loaded: Look for Whisper init in logs
  3. Ensure audio format is supported (mp3, wav, m4a, flac, ogg, webm)

Connection refused errors

  1. Check service is running: docker ps | grep mcp-server
  2. Verify port binding: docker port mcp-server
  3. Check Traefik routing: docker logs traefik 2>&1 | grep mcp

Sandbox execution timeouts

  1. Default timeout is 30 seconds
  2. Increase via SANDBOX_TIMEOUT environment variable
  3. Check Docker socket permissions
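
The service-level checks above can be rolled into one quick triage pass (a sketch; container name and port as documented in this README):

#!/usr/bin/env bash
# Quick mcp-server triage: container state, health endpoint, recent errors
docker ps --filter name=mcp-server --format '{{.Names}}: {{.Status}}'
curl -fsS http://localhost:8765/health || echo "health check failed"
docker logs --since 10m mcp-server 2>&1 | grep -i error | tail -n 20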