# MCP Server

Model Context Protocol (MCP) server providing a unified tool interface for LLMs to interact with Haiven infrastructure.
## Overview

| Property  | Value |
|-----------|-------|
| Status    | Live |
| Domain    | `mcp.haiven.local` (internal), `mcp.haiven.site` (external) |
| Port      | 8765 |
| Container | `mcp-server` |
| GPU       | RTX 4090 (24GB, UUID-based) - for embedded Whisper STT |
| Networks  | `web`, `backend` |
## Architecture

```
       +------------------+
       |  Echo/LibreChat  |
       |   (MCP Client)   |
       +--------+---------+
                |
                | MCP Protocol (HTTP)
                v
+---------------------------+      +--------+---------+      +--------+---------+
|  Traefik                  |      |                  |      |  LiteLLM         |
|  mcp.haiven.local:443     +----->+  MCP Server      +----->+  (Port 4000)     |
|                           |      |  (Port 8765)     |      |                  |
+---------------------------+      +--------+---------+      +--------+---------+
                                            |                         |
                    +----------------+------+---------+               |
                    |                |                |               v
            +-------v------+ +------v-------+ +------v-------+  +--------+
            |  Prometheus  | |     Loki     | |   ComfyUI    |  |Langfuse|
            |  (Metrics)   | |    (Logs)    | |  (Image Gen) |  |(Traces)|
            +--------------+ +--------------+ +--------------+  +--------+
```
## LiteLLM Integration

As of version 1.2.0, all LLM calls from MCP Server are routed through LiteLLM for observability.
### Benefits

- **Langfuse Observability**: All LLM calls are traced and logged to Langfuse (`ai-ops.haiven.local`)
- **Cost Tracking**: Unified cost tracking across all clients using LiteLLM
- **Model Access**: Access to all 33 LiteLLM models, including TTS/STT capabilities
- **API Key Management**: Centralized API key management through LiteLLM
### Configuration

The following environment variables control the LiteLLM integration:

| Variable | Value | Description |
|----------|-------|-------------|
| `LLAMA_SWAP_URL` | `http://litellm:4000` | LiteLLM endpoint (previously llama-swap) |
| `LITELLM_API_KEY` | `sk-***` | API key for LiteLLM authentication |
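Because LiteLLM exposes an OpenAI-compatible API, the server's outbound calls can be understood as plain HTTP requests against `/v1/chat/completions`. The helper below is an illustrative sketch (not the server's actual code): it only builds the request from the two variables above, so the shape can be inspected offline.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-compatible chat completion request for LiteLLM.

    Returns (url, headers, body) without sending anything.
    """
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # from LITELLM_API_KEY
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Example using the documented endpoint; the model name is a placeholder.
url, headers, body = build_chat_request(
    "http://litellm:4000", "sk-***", "some-model", "ping")
```

Any HTTP client can then POST `body` to `url` with `headers`; LiteLLM forwards the call and records the trace in Langfuse.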
### Viewing Traces

1. Navigate to `https://ai-ops.haiven.local`
2. Log in to Langfuse
3. View traces under the MCP Server project
4. Analyze costs, latencies, and token usage
## Tools

MCP Server provides 23 tools across 12 namespaces:
### System Status

| Tool | Description |
|------|-------------|
| `status/docker` | Get Docker container status and health |
| `status/gpu` | Get GPU utilization and memory info |
| `status/models` | List available LLM models from LiteLLM |
| `status/system` | Get CPU, memory, and disk metrics from Prometheus |
### Docker Management

| Tool | Description |
|------|-------------|
| `docker/restart` | Restart allowed containers |
| `docker/logs` | Get container logs |
### File Operations

| Tool | Description |
|------|-------------|
| `files/read` | Read file contents |
| `files/list` | List directory contents |
| `files/search` | Search files by pattern |
### Code Execution

| Tool | Description |
|------|-------------|
| `sandbox/bash` | Execute bash commands in an isolated sandbox |
| `sandbox/python` | Execute Python code in an isolated sandbox |
### Search

| Tool | Description |
|------|-------------|
| `search/web` | Web search via SearXNG |
### Web Scraping (Crawl4AI)

| Tool | Description |
|------|-------------|
| `scrape/url` | Scrape a single URL |
| `scrape/batch` | Scrape multiple URLs |
| `scrape/site` | Crawl an entire website |
| `scrape/sitemap` | Extract URLs from a sitemap |
### Monitoring

| Tool | Description |
|------|-------------|
| `logs/query` | Query Loki logs |
| `metrics/query` | Query Prometheus metrics |
| `alerts/list` | List active Alertmanager alerts |
### Audio

| Tool | Description |
|------|-------------|
| `audio/transcribe` | Transcribe audio (embedded Whisper) |
| `audio/speak` | Text-to-speech via Piper |
### Uptime

| Tool | Description |
|------|-------------|
| `uptime/status` | Get Uptime Kuma status |
### Image Generation

| Tool | Description |
|------|-------------|
| `image/generate` | Generate images via ComfyUI |
## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check endpoint |
| GET | `/tools` | List available MCP tools |
| POST | `/mcp` | Main MCP protocol endpoint |
| GET | `/metrics` | Prometheus metrics |
| POST | `/v1/audio/transcriptions` | OpenAI-compatible STT |
| POST | `/v1/audio/translations` | Audio translation to English |
## Environment Variables

### Service Endpoints

| Variable | Default | Description |
|----------|---------|-------------|
| `LLAMA_SWAP_URL` | `http://litellm:4000` | LLM API endpoint (LiteLLM) |
| `LITELLM_API_KEY` | (required) | API key for LiteLLM |
| `PROMETHEUS_URL` | `http://prometheus:9090` | Prometheus endpoint |
| `LOKI_URL` | `http://loki:3100` | Loki endpoint |
| `ALERTMANAGER_URL` | `http://alertmanager:9093` | Alertmanager endpoint |
| `SEARXNG_URL` | `http://searxng:8080` | SearXNG endpoint |
| `PIPER_URL` | `http://piper-api:5000` | Piper TTS endpoint |
| `COMFYUI_URL` | `http://host.docker.internal:8188` | ComfyUI endpoint |
| `UPTIME_KUMA_URL` | `http://uptime-kuma:3001` | Uptime Kuma endpoint |
| `CRAWL4AI_URL` | `http://crawl4ai:11235` | Crawl4AI endpoint |
### Embedded Whisper

| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_MODEL` | `large-v3` | Whisper model to use |
| `WHISPER_DEVICE` | `cuda` | Device (`cuda`/`cpu`) |
| `WHISPER_COMPUTE_TYPE` | `float16` | Compute type |
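A minimal sketch of how these variables might be read at startup, using the defaults from the table above. The `WhisperSettings` name and the loader function are illustrative, not taken from `config.py`:

```python
import os
from dataclasses import dataclass

@dataclass
class WhisperSettings:
    model: str
    device: str
    compute_type: str

def load_whisper_settings(env=os.environ) -> WhisperSettings:
    """Read the embedded-Whisper env vars, falling back to the documented defaults."""
    return WhisperSettings(
        model=env.get("WHISPER_MODEL", "large-v3"),
        device=env.get("WHISPER_DEVICE", "cuda"),
        compute_type=env.get("WHISPER_COMPUTE_TYPE", "float16"),
    )
```

Passing `env` explicitly keeps the loader testable; in the container the defaults apply whenever a variable is unset.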
### Server Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `MCP_PORT` | `8765` | Server port |
| `MCP_TIMEOUT` | `300` | Request timeout (seconds) |
| `MCP_KEEPALIVE_INTERVAL` | `30` | Keepalive interval |
| `LOG_LEVEL` | `INFO` | Logging level |
### Security

| Variable | Default | Description |
|----------|---------|-------------|
| `FILE_ACCESS_ROOTS` | `/mnt/apps/docker,/mnt/storage` | Allowed file paths |
| `SANDBOX_TIMEOUT` | `30` | Sandbox execution timeout (seconds) |
| `SANDBOX_MEMORY_LIMIT` | `512m` | Sandbox memory limit |
## Security Features

- **Path Validation**: File paths are validated with `resolve()` + `relative_to()`
- **File Blocklist**: Blocks `.env`, `.ssh/`, `*.key`, `*.pem`, secrets
- **Container Allowlist**: Only specified containers can be restarted
- **Sandbox Isolation**: Code execution runs in isolated containers with no network
- **Command Sanitization**: Dangerous command patterns are blocked
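The path validation approach can be sketched as follows. This is an assumed illustration of the `resolve()` + `relative_to()` pattern combined with the blocklist, not the server's actual `files.py` code; the `is_allowed` name and the exact pattern list are hypothetical:

```python
import fnmatch
from pathlib import Path

# Illustrative blocklist derived from the bullet above; `.ssh` matches the
# directory component, `secrets` matches a file or directory named "secrets".
BLOCKED_PATTERNS = [".env", ".ssh", "*.key", "*.pem", "secrets"]

def is_allowed(path: str, roots: list[str]) -> bool:
    """Return True only if path resolves inside an allowed root and no
    path component matches the blocklist."""
    resolved = Path(path).resolve()
    # Blocklist check: any component matching a pattern is rejected.
    for part in resolved.parts:
        if any(fnmatch.fnmatch(part, pat) for pat in BLOCKED_PATTERNS):
            return False
    # Root check: resolving first means "../" traversal cannot escape a root,
    # since relative_to() raises ValueError for paths outside it.
    for root in roots:
        try:
            resolved.relative_to(Path(root).resolve())
            return True
        except ValueError:
            continue
    return False
```

Resolving before comparing is the key step: a request for `/mnt/apps/docker/../../etc/shadow` collapses to `/mnt/etc/shadow`, which fails the `relative_to()` check against every allowed root.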
## Health Check

```bash
curl -f http://localhost:8765/health
```

Response:

```json
{
  "status": "healthy",
  "service": "mcp-server",
  "version": "1.2.0",
  "uptime_seconds": 12345
}
```
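A client can gate on this response before issuing tool calls. A minimal sketch (the `is_healthy` helper is illustrative, not part of any client library):

```python
import json

def is_healthy(payload: dict) -> bool:
    """Treat the service as ready only when /health reports 'healthy'."""
    return payload.get("status") == "healthy" and payload.get("service") == "mcp-server"

# Parse the sample response shown above.
sample = json.loads("""
{
  "status": "healthy",
  "service": "mcp-server",
  "version": "1.2.0",
  "uptime_seconds": 12345
}
""")
```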
## Usage Examples

### List Tools

```bash
curl https://mcp.haiven.local/tools
```
### Get Docker Status

```bash
curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/docker",
      "arguments": {}
    }
  }'
```
### Get GPU Status

```bash
curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/gpu",
      "arguments": {"gpu_id": "all"}
    }
  }'
```
### List Available Models (via LiteLLM)

```bash
curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "tools/call",
    "params": {
      "name": "status/models",
      "arguments": {}
    }
  }'
```
### Transcribe Audio

```bash
curl -X POST https://mcp.haiven.local/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"
```
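The `tools/call` payloads in the examples above all share one shape, so a small helper can generate them for any tool. This is a sketch for scripting against the `/mcp` endpoint, not part of the server's codebase:

```python
import json
from typing import Optional

def tool_call(name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP tools/call request body matching the curl examples."""
    return json.dumps({
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    })

# e.g. the GPU-status request from above:
gpu_request = tool_call("status/gpu", {"gpu_id": "all"})
```

The resulting string can be POSTed to `https://mcp.haiven.local/mcp` with a `Content-Type: application/json` header, exactly as the curl examples do.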
## Docker Commands

### Start Service

```bash
cd /mnt/apps/docker/ai/mcp-server
docker compose up -d
```

### View Logs

```bash
docker logs -f mcp-server
```

### Restart Service

```bash
docker compose restart mcp-server
```

### Rebuild After Code Changes

```bash
docker compose build --no-cache
docker compose up -d
```
## Directory Structure

```
/mnt/apps/docker/ai/mcp-server/
├── docker-compose.yml        # Service configuration
├── README.md                 # This file
└── USER_GUIDE.md             # End-user documentation

/mnt/apps/src/mcp-server/     # Source code (hot-reloaded)
├── Dockerfile
├── server.py                 # Main server
├── config.py                 # Configuration
├── whisper_manager.py        # Embedded Whisper
├── tools/                    # Tool implementations
│   ├── status.py             # System status tools
│   ├── docker_tools.py       # Docker management
│   ├── files.py              # File operations
│   ├── sandbox.py            # Code execution
│   ├── monitoring.py         # Loki/Prometheus
│   ├── audio.py              # TTS/STT
│   ├── scraping.py           # Crawl4AI integration
│   └── ...
└── utils/                    # Utilities
    ├── docker_client.py
    └── errors.py
```
## Dependencies

| Service | Purpose |
|---------|---------|
| `litellm` | LLM API gateway with Langfuse observability |
| `prometheus` | System metrics |
| `loki` | Log aggregation |
| `alertmanager` | Alert management |
| `searxng` | Web search |
| `piper-api` | Text-to-speech |
| `uptime-kuma` | Uptime monitoring |
| `comfyui` | Image generation |
| `crawl4ai` | Web scraping |
## Troubleshooting

### LLM calls failing

- Check that LiteLLM is running: `docker ps | grep litellm`
- Verify the API key: check the `LITELLM_API_KEY` environment variable
- Check Langfuse for error traces: `https://ai-ops.haiven.local`

### Models not listing

- Verify the LiteLLM connection: `curl http://litellm:4000/v1/models`
- Check the MCP server logs: `docker logs mcp-server`

### Whisper transcription failing

- Verify the RTX 4090 is available: `nvidia-smi`
- Check that the model is loaded: look for Whisper init in the logs
- Ensure the audio format is supported (mp3, wav, m4a, flac, ogg, webm)

### Connection refused errors

- Check the service is running: `docker ps | grep mcp-server`
- Verify the port binding: `docker port mcp-server`
- Check Traefik routing: `docker logs traefik 2>&1 | grep mcp`

### Sandbox execution timeouts

- The default timeout is 30 seconds
- Increase it via the `SANDBOX_TIMEOUT` environment variable
- Check Docker socket permissions
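The sandbox limits described in this document (no network, 512m memory, 30s timeout) could all be enforced through a single `docker run` invocation. The exact flags the server uses are not documented here, so treat the following as an assumed sketch; the image name and helper are hypothetical:

```python
def sandbox_command(code: str,
                    image: str = "python:3.12-slim",  # assumed sandbox image
                    memory: str = "512m",             # SANDBOX_MEMORY_LIMIT
                    timeout: int = 30) -> list[str]:
    """Build a docker run command enforcing the documented sandbox limits."""
    return [
        "timeout", str(timeout),     # SANDBOX_TIMEOUT, via coreutils timeout
        "docker", "run", "--rm",
        "--network", "none",         # no network, per Sandbox Isolation
        "--memory", memory,
        image,
        "python", "-c", code,
    ]

cmd = sandbox_command("print('hi')")
```

Raising `SANDBOX_TIMEOUT` would then simply change the first argument to `timeout`; the memory cap and network isolation are independent of it.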