MCP Server User Guide

This guide explains how to use the MCP Server tools through Echo (LibreChat) or direct API access.

Getting Started

Access Methods

Method	URL	Use Case
Echo Chat	`https://echo.haiven.local`	Interactive chat with tool access
Direct API	`https://mcp.haiven.local`	Programmatic tool execution
External	`https://mcp.haiven.site`	Remote access (authenticated)

Quick Start

1. Open Echo at https://echo.haiven.local
2. Select an agent with MCP tools enabled
3. Ask natural-language questions such as:
   - "What containers are running?"
   - "Show me GPU usage"
   - "What models are available?"

Common Use Cases

1. Monitoring System Status

Check Docker Containers

Ask: "Show me all running containers"
Tool: status/docker

The response includes container name, status, health, uptime, and ports.

Check GPU Usage

Ask: "How much GPU memory is being used?"
Tool: status/gpu

Returns utilization percentage, memory used/total, and temperature for all GPUs:
- RTX 4090 (24GB): Multimodal tasks (image gen, audio, ML)
- PRO 6000 Alpha (96GB): LLM inference (primary)
- PRO 6000 Bravo (96GB): LLM inference (overflow)

Check System Resources

Ask: "What's the CPU and memory usage?"
Tool: status/system

Returns CPU usage (%), memory usage (GB), disk usage, and load averages.

2. Working with Models

List Available Models

Ask: "What LLM models can I use?"
Tool: status/models

This tool queries LiteLLM and lists all 33 configured models, including:
- Local GGUF models via llama-swap
- TTS models (Piper voices)
- STT models (Whisper)
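
Because LiteLLM exposes an OpenAI-compatible model list, the same inventory can also be read straight from the LiteLLM proxy at https://llm.haiven.local. The following is a minimal standard-library sketch, not part of mcp-server. It assumes the proxy serves `GET /v1/models`, that it expects a bearer key (passed in by the caller, for example from a hypothetical `LITELLM_API_KEY` environment variable), and that the response uses the OpenAI `{"data": [{"id": ...}]}` shape.

```python
from __future__ import annotations

import json
import urllib.request

LITELLM_URL = "https://llm.haiven.local"  # from the Environment Quick Reference


def build_models_request(base_url: str, api_key: str | None = None) -> urllib.request.Request:
    """Build a GET request for LiteLLM's OpenAI-compatible /v1/models endpoint."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the proxy is protected by a LiteLLM key sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(f"{base_url.rstrip('/')}/v1/models", headers=headers)


def parse_model_ids(body: str) -> list[str]:
    """Return sorted model IDs from an OpenAI-style {"data": [{"id": ...}]} response."""
    payload = json.loads(body)
    return sorted(entry["id"] for entry in payload.get("data", []))


def fetch_model_ids(api_key: str | None = None, base_url: str = LITELLM_URL) -> list[str]:
    """Query the proxy over the network and return its model IDs."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_model_ids(response.read().decode("utf-8"))
```

If the assumptions hold, the IDs returned by `fetch_model_ids()` should match what status/models reports.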

Understanding Model Traces

All model calls are logged to Langfuse. To view traces:
1. Visit https://ai-ops.haiven.local
2. Navigate to Traces
3. Filter by service or model name

3. Reading Files and Logs

Read a Configuration File

Ask: "Show me the llama-swap config"
Tool: files/read
Path: /mnt/apps/docker/ai/llama-swap/config.yaml

Search for Files

Ask: "Find all docker-compose.yml files"
Tool: files/search
Pattern: docker-compose.yml

Query Loki Logs

Ask: "Show me errors from the last hour"
Tool: logs/query

4. Docker Management

View Container Logs

Ask: "Show me the last 50 lines of mcp-server logs"
Tool: docker/logs
Container: mcp-server
Lines: 50

Restart a Service

Ask: "Restart the echo container"
Tool: docker/restart
Container: echo

Note: Only allowed containers can be restarted (security feature).
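
The restart allowlist is a default-deny check: a container can be restarted only if its exact name appears in a fixed list, so a typo or an unlisted service is refused rather than guessed at. A minimal Python sketch of that idea follows. The names in `RESTART_ALLOWLIST` are hypothetical examples, and the server's actual list and matching rules may differ.

```python
from __future__ import annotations

# Hypothetical allowlist for illustration; the real one lives in the mcp-server config.
RESTART_ALLOWLIST = frozenset({"echo", "llama-swap"})


def can_restart(container: str, allowlist: frozenset[str] = RESTART_ALLOWLIST) -> bool:
    """Default deny: allow only exact, case-sensitive name matches."""
    return container in allowlist


def restart_or_refuse(container: str) -> str:
    """Refuse anything not allowlisted; otherwise report the action to take."""
    if not can_restart(container):
        raise PermissionError(f"container '{container}' is not in the restart allowlist")
    return f"restarting {container}"
```

An allowlist is preferred over a denylist here because any container added later stays protected until someone explicitly opts it in.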

5. Audio Transcription

Transcribe Audio

Ask: "Transcribe this audio file"
Tool: audio/transcribe

Or use the OpenAI-compatible API directly:

```bash
curl -X POST https://mcp.haiven.local/v1/audio/transcriptions \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1"
```

Supported formats: mp3, wav, m4a, flac, ogg, webm
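
For scripts, the curl call can be reproduced with the Python standard library. This sketch checks the file extension against the formats listed above before building the same multipart/form-data POST that `curl -F` sends. It is illustrative only; the boundary handling and the `build_transcription_request` helper are assumptions of this example, not part of mcp-server.

```python
from __future__ import annotations

import urllib.request
import uuid
from pathlib import PurePath

TRANSCRIBE_URL = "https://mcp.haiven.local/v1/audio/transcriptions"
# Formats this guide lists as supported.
SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}


def encode_multipart(fields: dict[str, str], filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one 'file' part as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    chunks = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in fields.items()
    ]
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode() + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(filename: str, data: bytes, model: str = "whisper-1") -> urllib.request.Request:
    """Reject unsupported extensions, then build the POST the curl example sends."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SUPPORTED_AUDIO:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")
    body, content_type = encode_multipart({"model": model}, PurePath(filename).name, data)
    return urllib.request.Request(
        TRANSCRIBE_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To send it, read the file's bytes, pass them in, and call `urllib.request.urlopen()` on the result. If the server follows the OpenAI response format, the reply is JSON with a `text` field.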

6. Web Search and Scraping

Search the Web

Ask: "Search for Python async best practices"
Tool: search/web

Scrape a Webpage

Ask: "Get the content from https://example.com"
Tool: scrape/url

Crawl a Website

Ask: "Crawl the documentation site and get all pages"
Tool: scrape/site

7. Code Execution

Run a Bash Command

Ask: "List files in /mnt/apps/docker"
Tool: sandbox/bash
Command: ls -la /mnt/apps/docker

Run Python Code

Ask: "Calculate the sum of 1 to 100"
Tool: sandbox/python
Code: print(sum(range(1, 101)))

Note: Code runs in isolated sandboxes with no network access.

Tutorials

Tutorial 1: Infrastructure Health Check

Goal: Get a complete overview of system health

1. Check containers: "Show me all container statuses"
2. Check GPUs: "What's the GPU memory usage?"
3. Check resources: "Show detailed system metrics"
4. Check alerts: "Are there any active alerts?"
5. Check uptime: "What services are down in Uptime Kuma?"

Tutorial 2: Debugging a Service

Goal: Troubleshoot a misbehaving service

1. Check status: "Is the echo container healthy?"
2. View logs: "Show me the last 100 lines of echo logs"
3. Check resources: "What's the memory usage?"
4. Search for errors: "Search echo logs for 'error' in the last hour"
5. Restart if needed: "Restart the echo container"

Tutorial 3: Model Exploration

Goal: Understand available models and their usage

1. List models: "What models are available?"
2. Check traces: Visit https://ai-ops.haiven.local
3. View costs: Check the Langfuse dashboard for token usage
4. Compare latency: Filter traces by model to compare response times

Tutorial 4: Audio Processing

Goal: Transcribe a voice recording

1. Upload the audio file to Echo, or send it to the API
2. Transcribe: "Transcribe this audio file"
3. Review output: Check the transcription text
4. Translate if needed: For non-English audio, use the `/v1/audio/translations` endpoint

Tips and Best Practices

Natural Language Works Best

Instead of trying to specify exact tool parameters, describe what you want:

Less Effective	More Effective
"Run status/docker with show_stopped=true"	"Show me all containers including stopped ones"
"Execute files/read path=/mnt/apps/docker/traefik/traefik.yml"	"Show me the Traefik configuration"

Use Specific Questions for Better Results

Vague	Specific
"Check the system"	"What's the CPU usage and available memory?"
"Look at logs"	"Show me errors in the llama-swap logs from the last hour"

Combine Tools for Complex Tasks

The AI can chain multiple tools together:
- "Check if llama-swap is running, show its logs if there are errors, and restart it if needed"
- "Find all services using the PRO 6000 and show their memory usage"
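
For Direct API users, the first chained request above can be written as explicit control flow. This sketch assumes a hypothetical `call_tool(name, arguments)` helper that returns a dict, argument names (`show_stopped`, `container`, `lines`) modeled on the parameters shown in this guide, and a hypothetical response shape (a `containers` list with `name`/`status`, and log text under `output`). Adjust these to the server's real responses.

```python
from typing import Any, Callable, Dict, List

# Hypothetical helper signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def check_and_recover(call_tool: ToolCaller, container: str = "llama-swap") -> List[str]:
    """Check status, inspect logs if running, restart if stopped or erroring.

    Returns the tool names that were called, in order.
    """
    called = ["status/docker"]
    status = call_tool("status/docker", {"show_stopped": True})
    entry = next((c for c in status.get("containers", []) if c.get("name") == container), None)
    healthy = entry is not None and entry.get("status") == "running"

    if healthy:
        # Only worth reading logs when the container is up.
        called.append("docker/logs")
        logs = call_tool("docker/logs", {"container": container, "lines": 50})
        healthy = "error" not in logs.get("output", "").lower()

    if not healthy:
        called.append("docker/restart")
        call_tool("docker/restart", {"container": container})
    return called
```

Passing `call_tool` in as a parameter keeps the decision logic testable with a fake, without touching real containers.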

Security Awareness

- Some file paths are blocked (secrets, credentials, SSH keys)
- Code execution has no network access
- Only specific containers can be restarted
- All actions are logged

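A path blocklist is usually checked only after the path has been normalized, so that `..` segments cannot be used to slip past it. The Python sketch below illustrates that pattern for the blocked categories listed above. The specific directories, names, and suffixes are hypothetical examples, not the server's actual rules; those are described in README.md.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical deny rules for illustration only; the server's real list may differ.
BLOCKED_DIRS = {".ssh", "secrets"}
BLOCKED_NAMES = {".env", "id_rsa", "id_ed25519", "credentials.json"}
BLOCKED_SUFFIXES = {".key", ".pem"}


def is_blocked(path: str) -> bool:
    """Normalize the path lexically, then deny on any matching directory, name, or suffix."""
    p = PurePosixPath(posixpath.normpath(path))
    if any(part in BLOCKED_DIRS for part in p.parts):
        return True
    return p.name in BLOCKED_NAMES or p.suffix.lower() in BLOCKED_SUFFIXES
```

Lexical normalization does not follow symlinks; a real check would also resolve them (for example with `os.path.realpath`) before matching.
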
Observability with Langfuse

Viewing Your Traces

1. Open https://ai-ops.haiven.local
2. Log in with your credentials
3. Navigate to Traces in the sidebar
4. Filter by:
   - Time range
   - Model name
   - User ID
   - Tags

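Several of these filters are also available through Langfuse's public REST API, which authenticates with HTTP Basic auth (the project public key as username, the secret key as password). This minimal standard-library sketch assumes the instance at https://ai-ops.haiven.local serves `GET /api/public/traces` and that you have a project key pair. It covers time range, user ID, and tags; model filtering is not shown.

```python
from __future__ import annotations

import base64
import urllib.request
from datetime import datetime, timezone
from typing import Sequence
from urllib.parse import urlencode

LANGFUSE_URL = "https://ai-ops.haiven.local"


def build_traces_request(
    public_key: str,
    secret_key: str,
    user_id: str | None = None,
    tags: Sequence[str] = (),
    since: datetime | None = None,
    limit: int = 50,
) -> urllib.request.Request:
    """Build a GET /api/public/traces request with optional filters."""
    params: dict[str, object] = {"limit": limit}
    if user_id:
        params["userId"] = user_id
    if tags:
        params["tags"] = list(tags)  # sent as repeated tags=... parameters
    if since:
        params["fromTimestamp"] = since.astimezone(timezone.utc).isoformat()
    query = urlencode(params, doseq=True)
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        f"{LANGFUSE_URL}/api/public/traces?{query}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
```

Sending the request with `urllib.request.urlopen()` should return a JSON page of traces.
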
Understanding Trace Data

Each trace shows:
- Input: The prompt sent to the model
- Output: The model's response
- Tokens: Input/output token counts
- Latency: Response time in milliseconds
- Cost: Estimated cost (if configured)

Debugging with Traces

When something goes wrong:
1. Find the relevant trace by timestamp
2. Check the input for issues
3. Review the output for errors
4. Look at the latency for timeout issues
5. Check parent spans for context

Troubleshooting

Service Won't Start - NVIDIA GPU Driver Not Loaded

Symptoms:
- mcp-server container exits with code 128
- Docker error: failed to initialize NVML: Driver Not Loaded
- nvidia-smi fails on the host

Root Cause:
mcp-server requires GPU access for audio transcription and other AI tasks. If the NVIDIA kernel modules aren't loaded (e.g., after a kernel upgrade without a matching module rebuild), the container cannot initialize GPU access.

How to Verify:

```bash
nvidia-smi  # Should show GPU info; failure means the driver is not loaded
uname -r    # Check the current kernel version
docker logs mcp-server 2>&1 | head -20  # Check for NVML errors
```

Fix:

```bash
# Install NVIDIA modules for the current kernel, then reboot
sudo apt update
sudo apt install linux-modules-nvidia-580-open-$(uname -r)
sudo reboot

# After the reboot, verify the driver and restart the service
nvidia-smi
cd /mnt/apps/docker/ai/mcp-server && docker compose up -d
```

Prevention: Install nvidia-dkms-580-open for automatic module rebuilds on kernel upgrades.
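
The verification steps can also be scripted as a pre-flight check before running `docker compose up -d`. This Python sketch treats a successful `nvidia-smi` GPU query as evidence that the driver works, and separately looks for an `nvidia` entry in `/proc/modules` (Linux only). It is an illustrative helper, not part of mcp-server.

```python
import shutil
import subprocess


def nvidia_driver_loaded(timeout: float = 10.0) -> bool:
    """Return True if nvidia-smi can query the GPUs, which requires a loaded driver."""
    if shutil.which("nvidia-smi") is None:
        return False  # NVIDIA utilities are not installed or not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def nvidia_module_loaded(proc_modules: str = "/proc/modules") -> bool:
    """Return True if the 'nvidia' kernel module appears in /proc/modules."""
    try:
        with open(proc_modules, encoding="utf-8") as fh:
            return any(line.split(" ", 1)[0] == "nvidia" for line in fh)
    except OSError:
        return False
```

If `nvidia_module_loaded()` is False after a kernel upgrade, the module packages for the running kernel are the likely gap, as described above.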

"Tool not found"

- The tool name may have changed
- Ask "What tools are available?" to see the current tools
- Check that the MCP server is running

"Permission denied"

- The file path may be blocked for security reasons
- The container may not be on the restart allowlist
- Review the security features described in the README (/mnt/apps/docker/ai/mcp-server/README.md)

"Connection refused"

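- The service may be down: ask "Is the mcp-server container running?", or, if the tools themselves are unreachable, run `docker ps --filter name=mcp-server` on the host
- Network issue: check Traefik routing
- Try the internal endpoint (https://mcp.haiven.local) if the external one (https://mcp.haiven.site) fails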

"Timeout"

- Long-running queries may exceed the timeout limit
- Try breaking the task into smaller requests
- Check whether the backend service is responsive

Audio transcription fails

- Check the file format (mp3, wav, m4a, flac, ogg, webm)
- Make sure the file is not corrupted
- The RTX 4090 may be overloaded; check GPU status with status/gpu

Quick Reference

Tool Namespaces

Namespace	Tools
status	docker, gpu, models, system
docker	restart, logs
files	read, list, search
sandbox	bash, python
search	web
scrape	url, batch, site, sitemap
logs	query
metrics	query
alerts	list
audio	transcribe, speak
uptime	status
image	generate

API Endpoints

Endpoint	Method	Purpose
`/health`	GET	Health check
`/tools`	GET	List tools
`/mcp`	POST	Execute tool
`/metrics`	GET	Prometheus metrics
`/v1/audio/transcriptions`	POST	STT
`/v1/audio/translations`	POST	Translation
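
Programmatic callers of `/mcp` can use the Model Context Protocol's JSON-RPC 2.0 message format, in which a tool is invoked with method `tools/call` and `params` of `{"name": ..., "arguments": ...}`. The sketch below only builds the request. It assumes this server follows the MCP Streamable HTTP transport (JSON body, an `Accept` header listing both `application/json` and `text/event-stream`, and an optional `Mcp-Session-Id` header), and that tools are registered under the names used in this guide, such as `status/docker` with the `show_stopped` argument. A full client would first send an `initialize` request and reuse the session ID the server returns.

```python
from __future__ import annotations

import itertools
import json
import urllib.request
from typing import Any

MCP_URL = "https://mcp.haiven.local/mcp"
_request_ids = itertools.count(1)  # JSON-RPC request ids should not repeat within a session


def tool_call_payload(name: str, arguments: dict[str, Any] | None = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 'tools/call' message in the MCP format."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def build_mcp_request(payload: dict[str, Any], session_id: str | None = None) -> urllib.request.Request:
    """Wrap a JSON-RPC message in an HTTP POST for the /mcp endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients accept either a JSON reply or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id  # issued by the server after initialize
    return urllib.request.Request(
        MCP_URL, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
```

Passing `build_mcp_request(tool_call_payload("status/docker", {"show_stopped": True}))` to `urllib.request.urlopen()` would then send the call; depending on the server, the reply may be plain JSON or an event stream.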

Environment Quick Reference

What	Where
MCP Server	`https://mcp.haiven.local`
Echo Chat	`https://echo.haiven.local`
Langfuse	`https://ai-ops.haiven.local`
LiteLLM	`https://llm.haiven.local`
Grafana	`https://grafana.haiven.local`

Getting Help

- Check the README: /mnt/apps/docker/ai/mcp-server/README.md
- View API docs: https://docs.haiven.local
- Check service status: use the status/docker tool
- View logs: use the docker/logs tool on the mcp-server container