MCP Server

Model Context Protocol (MCP) server providing a unified tool interface for LLMs to interact with Haiven infrastructure.

Overview

| Property | Value |
| --- | --- |
| Status | Live |
| Domain | mcp.haiven.site (internal), mcp.haiven.site (external) |
| Port | 8765 |
| Container | mcp-server |
| GPU | RTX PRO 6000 pro6000-alpha (96GB, UUID-based) - for embedded Whisper STT |
| Networks | web, backend |

Architecture

                                    +------------------+
                                    |   Echo/LibreChat |
                                    |  (MCP Client)    |
                                    +--------+---------+
                                             |
                                             | MCP Protocol (HTTP)
                                             v
+---------------------------+       +--------+---------+       +------------------+
|   Traefik                 |       |                  |       |   LiteLLM        |
|   mcp.haiven.site:443     +------>+   MCP Server     +------>+   (Port 4000)    |
|                           |       |   (Port 8765)    |       |                  |
+---------------------------+       +--------+---------+       +--------+---------+
                                             |                          |
                            +----------------+----------------+         |
                            |                |                |         v
                    +-------v------+ +-------v------+ +-------v------+ +--------+
                    |  Prometheus  | |    Loki      | |   ComfyUI    | |Langfuse|
                    |  (Metrics)   | |   (Logs)     | | (Image Gen)  | |(Traces)|
                    +--------------+ +--------------+ +--------------+ +--------+

LiteLLM Integration

As of version 1.2.0, all LLM calls from MCP Server are routed through LiteLLM for observability.

Benefits

Configuration

The following environment variables control the LiteLLM integration:

| Variable | Value | Description |
| --- | --- | --- |
| LLAMA_SWAP_URL | http://litellm:4000 | LiteLLM endpoint (previously llama-swap) |
| LITELLM_API_KEY | sk-*** | API key for LiteLLM authentication |
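
As an illustration, the two variables above could be read like this. This is a minimal sketch, not the server's actual config.py: the helper name `litellm_settings` is hypothetical, and sending the key as an OpenAI-style Bearer token is an assumption about how the LiteLLM endpoint is called.

```python
import os


def litellm_settings(env=None):
    """Return (base_url, headers) for calls routed through LiteLLM."""
    env = os.environ if env is None else env
    # LLAMA_SWAP_URL keeps its legacy name but now points at LiteLLM.
    base_url = env.get("LLAMA_SWAP_URL", "http://litellm:4000").rstrip("/")
    api_key = env.get("LITELLM_API_KEY")
    if not api_key:
        # The key is documented as required, so fail early with a clear message.
        raise RuntimeError("LITELLM_API_KEY is not set")
    # Assumption: the key is passed as a Bearer token.
    headers = {"Authorization": f"Bearer {api_key}"}
    return base_url, headers
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.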

Viewing Traces

  1. Navigate to https://ai-ops.haiven.site
  2. Log in to Langfuse
  3. View traces under the MCP Server project
  4. Analyze costs, latencies, and token usage

Changelog

2026-05-01 — v2 Search/Scrape Remediations

Seven remediations landed across search.py, scrape.py, and research.py. Container restart required to activate (bind-mounted source).

search/web new params:
- source_type (enum: general|academic|news|code|social|primary) — expands to a curated SearXNG engine CSV; prefer over hand-typed engines. general uses SearXNG's default mix (no engines override).
- snippet_min_chars (int, default 0, range 0–1000) — when >0, flags results with short snippets as snippet_too_short: true. Does not auto-refetch; use as a signal to call scrape/url.
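
The parameter constraints above can be checked on the client side before a call is sent. The sketch below is illustrative only: the function name `search_web_arguments` and the `query` argument name are assumptions, while the enum values and the 0 to 1000 range come from the list above.

```python
SOURCE_TYPES = {"general", "academic", "news", "code", "social", "primary"}


def search_web_arguments(query, source_type="general", snippet_min_chars=0):
    """Build an arguments dict for search/web, rejecting out-of-range values."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type!r}")
    if not 0 <= snippet_min_chars <= 1000:
        raise ValueError("snippet_min_chars must be between 0 and 1000")
    return {
        "query": query,  # assumed argument name, not confirmed by this README
        "source_type": source_type,
        "snippet_min_chars": snippet_min_chars,
    }
```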

search/web new response fields:
- engine_status — per-engine dict: {name: {status, result_count, reason}}. Distinguishes real zero hits from engine timeouts.
- engines_failed — flat list of unresponsive engine names.
- snippet_length — character count of each result's snippet.
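
A caller might combine these fields as follows to separate engines that truly returned nothing from engines that failed, and to pick results worth passing to scrape/url. This is a sketch under assumptions: the `results` key and the per-result `url` key are not documented here, and `summarize_search` is a hypothetical helper.

```python
def summarize_search(response):
    """Split engines into failed vs. genuinely empty, and list URLs to scrape."""
    failed = set(response.get("engines_failed", []))
    status = response.get("engine_status", {})
    # An engine that responded but returned no results is a real zero hit.
    true_zero = sorted(
        name
        for name, info in status.items()
        if name not in failed and info.get("result_count", 0) == 0
    )
    # Results flagged snippet_too_short are candidates for a follow-up scrape/url call.
    to_scrape = [
        result["url"]
        for result in response.get("results", [])  # assumed key name
        if result.get("snippet_too_short")
    ]
    return {"failed": sorted(failed), "true_zero": true_zero, "to_scrape": to_scrape}
```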

search/and_fetch new params:
- source_type — same enum as search/web; forwarded to the underlying search call.
- body_max_chars (int, default 2000, max 50000) — per-body markdown truncation limit. 0 = no truncation.
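
The truncation rule for body_max_chars can be sketched as below. The function name is hypothetical, and raising on out-of-range values is an assumption; the server may clamp instead.

```python
def truncate_body(markdown, body_max_chars=2000):
    """Apply the documented per-body limit; 0 disables truncation."""
    if not 0 <= body_max_chars <= 50000:
        # Assumption: invalid limits are rejected rather than clamped.
        raise ValueError("body_max_chars must be between 0 and 50000")
    if body_max_chars == 0:
        return markdown
    return markdown[:body_max_chars]
```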

search/and_fetch new response field:
- suppressed_global — URLs dropped by the cross-query dedup/per-domain-cap pass; exposed for diagnostics.
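
To make the idea of a cross-query dedup and per-domain cap concrete, here is one possible shape of such a pass. It is not the server's implementation: the function name, the cap of 2, and the exact-match dedup rule are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def dedup_and_cap(urls, per_domain_cap=2):
    """Return (kept, suppressed) after exact-URL dedup and a per-domain cap."""
    seen = set()
    per_domain = {}
    kept, suppressed = [], []
    for url in urls:
        domain = urlsplit(url).netloc.lower()
        # Drop exact duplicates and anything beyond the cap for its domain.
        if url in seen or per_domain.get(domain, 0) >= per_domain_cap:
            suppressed.append(url)
            continue
        seen.add(url)
        per_domain[domain] = per_domain.get(domain, 0) + 1
        kept.append(url)
    return kept, suppressed
```

In this sketch the second list plays the role of suppressed_global: it records what was dropped so a caller can see why an expected URL is missing.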

scrape/* envelope promotion:
- scrape/url, scrape/batch, scrape/site, scrape/sitemap now use the unified _envelope response shape (build_success/build_error) matching the search tools.
- error_category on all scrape errors: transient (timeout/connection — worth retrying) | client (bad input — don't retry) | upstream (Crawl4AI or remote returned empty/error).
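
Since only transient errors are worth retrying, a client could wrap scrape calls in a small retry loop like the one below. This is a sketch: `call_with_retry` is a hypothetical helper, and reading error_category as a top-level key of the returned dict is an assumption about the envelope shape.

```python
import time


def call_with_retry(call, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Invoke call() and retry with exponential backoff on transient errors only."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = call()
        # Success, client, and upstream results are returned immediately.
        if result.get("error_category") != "transient" or attempt == max_attempts:
            return result
        sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the backoff can be skipped in tests.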

scrape/batch new param:
- body_max_chars (int, default 2000, max 50000, 0 = no truncation) — replaces hardcoded 2000-char truncation.

Langfuse child spans added:
- scrape.url.crawl4ai_post on scrape/url
- scrape.batch.crawl4ai_post on scrape/batch

Tools

MCP Server provides 24 tools across 12 namespaces:

System Status

| Tool | Description |
| --- | --- |
| status/docker | Get Docker container status and health |
| status/gpu | Get GPU utilization and memory info |
| status/models | List available LLM models from LiteLLM |
| status/system | Get CPU, memory, disk metrics from Prometheus |

Docker Management

| Tool | Description |
| --- | --- |
| docker/restart | Restart allowed containers |
| docker/logs | Get container logs |

File Operations

| Tool | Description |
| --- | --- |
| files/read | Read file contents |
| files/list | List directory contents |
| files/search | Search files by pattern |

Code Execution

| Tool | Description |
| --- | --- |
| sandbox/bash | Execute bash commands in isolated sandbox |
| sandbox/python | Execute Python code in isolated sandbox |

Web Search (SearXNG)

| Tool | Description |
| --- | --- |
| search/web | Web search via SearXNG. Params: source_type, snippet_min_chars. Response includes engine_status, engines_failed, snippet_length. |
| search/and_fetch | Search + parallel scrape in one call. Params: source_type, body_max_chars. Response includes suppressed_global. |

Web Scraping (Crawl4AI)

| Tool | Description |
| --- | --- |
| scrape/url | Scrape a single URL. Unified error envelope with error_category. |
| scrape/batch | Crawl multiple pages from a seed URL. Params: body_max_chars (0 = no truncation). |
| scrape/site | Crawl entire website |
| scrape/sitemap | Extract URLs from sitemap |

Monitoring

| Tool | Description |
| --- | --- |
| logs/query | Query Loki logs |
| metrics/query | Query Prometheus metrics |
| alerts/list | List active Alertmanager alerts |

Audio

| Tool | Description |
| --- | --- |
| audio/transcribe | Transcribe audio (embedded Whisper) |
| audio/speak | Text-to-speech via Piper |

Uptime

| Tool | Description |
| --- | --- |
| uptime/status | Get Uptime Kuma status |

Image Generation

| Tool | Description |
| --- | --- |
| image/generate | Generate images via ComfyUI |

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check endpoint |
| GET | /tools | List available MCP tools |
| POST | /mcp | Main MCP protocol endpoint |
| GET | /metrics | Prometheus metrics |
| POST | /v1/audio/transcriptions | OpenAI-compatible STT |
| POST | /v1/audio/translations | Audio translation to English |

Environment Variables

Service Endpoints

| Variable | Default | Description |
| --- | --- | --- |
| LLAMA_SWAP_URL | http://litellm:4000 | LLM API endpoint (LiteLLM) |
| LITELLM_API_KEY | (required) | API key for LiteLLM |
| PROMETHEUS_URL | http://prometheus:9090 | Prometheus endpoint |
| LOKI_URL | http://loki:3100 | Loki endpoint |
| ALERTMANAGER_URL | http://alertmanager:9093 | Alertmanager endpoint |
| SEARXNG_URL | http://searxng:8080 | SearXNG endpoint |
| PIPER_URL | http://piper-api:5000 | Piper TTS endpoint |
| COMFYUI_URL | http://host.docker.internal:8188 | ComfyUI endpoint |
| UPTIME_KUMA_URL | http://uptime-kuma:3001 | Uptime Kuma endpoint |
| CRAWL4AI_URL | http://crawl4ai:11235 | Crawl4AI endpoint |

Embedded Whisper

| Variable | Default | Description |
| --- | --- | --- |
| WHISPER_MODEL | large-v3 | Whisper model to use |
| WHISPER_DEVICE | cuda | Device (cuda/cpu) |
| WHISPER_COMPUTE_TYPE | float16 | Compute type |

Server Configuration

| Variable | Default | Description |
| --- | --- | --- |
| MCP_PORT | 8765 | Server port |
| MCP_TIMEOUT | 300 | Request timeout (seconds) |
| MCP_KEEPALIVE_INTERVAL | 30 | Keepalive interval |
| LOG_LEVEL | INFO | Logging level |

Security

| Variable | Default | Description |
| --- | --- | --- |
| FILE_ACCESS_ROOTS | /mnt/apps/docker,/mnt/storage | Allowed file paths (comma-separated) |
| SANDBOX_TIMEOUT | 30 | Sandbox execution timeout (seconds) |
| SANDBOX_MEMORY_LIMIT | 512m | Sandbox memory limit |
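
An allow-list such as FILE_ACCESS_ROOTS is typically enforced by resolving the requested path and checking that it falls under one of the roots. The sketch below shows that idea; `is_path_allowed` is a hypothetical name, and the server's real check in files.py may differ.

```python
from pathlib import Path


def is_path_allowed(path, roots="/mnt/apps/docker,/mnt/storage"):
    """Return True if path resolves to a location inside one of the allowed roots."""
    # resolve() normalizes ".." segments and follows symlinks, which blocks
    # simple traversal attempts such as /mnt/storage/../../etc/passwd.
    resolved = Path(path).resolve()
    for root in roots.split(","):
        root_path = Path(root.strip()).resolve()
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False
```

Comparing against `resolved.parents` rather than using a string prefix avoids treating a sibling such as /mnt/storage-other as part of /mnt/storage.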

Security Features

Health Check

curl -f http://localhost:8765/health

Response:

{
  "status": "healthy",
  "service": "mcp-server",
  "version": "1.2.0",
  "uptime_seconds": 12345
}
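
A monitoring script could interpret this response as follows. The sketch only relies on the `status` field shown above; the helper name `is_healthy` is an assumption.

```python
import json


def is_healthy(body):
    """Return True if a /health response body reports status "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body (for example a proxy error page) counts as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```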

Usage Examples

List Available Tools

curl https://mcp.haiven.site/tools

Get Docker Status

curl -X POST https://mcp.haiven.site/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/docker",
      "arguments": {}
    }
  }'

Get GPU Status

curl -X POST https://mcp.haiven.site/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/gpu",
      "arguments": {"gpu_id": "all"}
    }
  }'

List Available Models (via LiteLLM)

curl -X POST https://mcp.haiven.site/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/models",
      "arguments": {}
    }
  }'
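
The same tools/call requests can be issued from Python using only the standard library. This is a sketch mirroring the curl payloads above; the helper names and the 30-second client timeout are assumptions, and any authentication the endpoint may require is not covered.

```python
import json
import urllib.request

MCP_URL = "https://mcp.haiven.site/mcp"


def build_tool_call(name, arguments=None, url=MCP_URL):
    """Build a POST request with the same JSON body as the curl examples."""
    body = json.dumps(
        {"method": "tools/call", "params": {"name": name, "arguments": arguments or {}}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(name, arguments=None, timeout=30):
    """Send the request and decode the JSON response."""
    request = build_tool_call(name, arguments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `call_tool("status/gpu", {"gpu_id": "all"})` corresponds to the GPU status request shown earlier.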

Transcribe Audio

curl -X POST https://mcp.haiven.site/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Docker Commands

Start Service

cd /mnt/apps/docker/ai/mcp-server
docker compose up -d

View Logs

docker logs -f mcp-server

Restart Service

docker compose restart mcp-server

Rebuild After Code Changes

docker compose build --no-cache
docker compose up -d

Directory Structure

/mnt/apps/docker/ai/mcp-server/
├── docker-compose.yml      # Service configuration
├── README.md               # This file
└── USER_GUIDE.md           # End-user documentation

/mnt/apps/src/mcp-server/   # Source code (hot-reloaded)
├── Dockerfile
├── server.py               # Main server
├── config.py               # Configuration
├── whisper_manager.py      # Embedded Whisper
├── tools/                  # Tool implementations
│   ├── status.py           # System status tools
│   ├── docker_tools.py     # Docker management
│   ├── files.py            # File operations
│   ├── sandbox.py          # Code execution
│   ├── monitoring.py       # Loki/Prometheus
│   ├── audio.py            # TTS/STT
│   ├── scraping.py         # Crawl4AI integration
│   └── ...
└── utils/                  # Utilities
    ├── docker_client.py
    └── errors.py

Dependencies

| Service | Purpose |
| --- | --- |
| litellm | LLM API gateway with Langfuse observability |
| prometheus | System metrics |
| loki | Log aggregation |
| alertmanager | Alert management |
| searxng | Web search |
| piper-api | Text-to-speech |
| uptime-kuma | Uptime monitoring |
| comfyui | Image generation |
| crawl4ai | Web scraping |

Troubleshooting

LLM calls failing

  1. Check LiteLLM is running: docker ps | grep litellm
  2. Verify API key: Check LITELLM_API_KEY environment variable
  3. Check Langfuse for error traces: https://ai-ops.haiven.site

Models not listing

  1. Verify LiteLLM connection: curl http://litellm:4000/v1/models
  2. Check MCP server logs: docker logs mcp-server

Whisper transcription failing

  1. Verify RTX PRO 6000 pro6000-alpha is available: nvidia-smi
  2. Check model is loaded: Look for Whisper init in logs
  3. Ensure audio format is supported (mp3, wav, m4a, flac, ogg, webm)
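
The format check in step 3 can be done before uploading. This sketch looks at the file extension only, which is an assumption about what "supported" means here; the server may inspect the file contents instead.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}


def is_supported_audio(filename):
    """Return True if the file extension is one of the supported audio formats."""
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO
```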

Connection refused errors

  1. Check service is running: docker ps | grep mcp-server
  2. Verify port binding: docker port mcp-server
  3. Check Traefik routing: docker logs traefik 2>&1 | grep mcp

Sandbox execution timeouts

  1. Default timeout is 30 seconds
  2. Increase via SANDBOX_TIMEOUT environment variable
  3. Check Docker socket permissions