MCP Server User Guide

This guide explains how to use the MCP Server tools through Echo (LibreChat) or direct API access.

Getting Started

Access Methods

Method      URL                          Use Case
Echo Chat   https://echo.haiven.local    Interactive chat with tool access
Direct API  https://mcp.haiven.local     Programmatic tool execution
External    https://mcp.haiven.site      Remote access (authenticated)

Quick Start

  1. Open Echo at https://echo.haiven.local
  2. Select an agent with MCP tools enabled
  3. Ask natural language questions like:
    - "What containers are running?"
    - "Show me GPU usage"
    - "What models are available?"

Common Use Cases

1. Monitoring System Status

Check Docker Containers

Ask: "Show me all running containers"
Tool: status/docker

The response includes container name, status, health, uptime, and ports.
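If you are calling the server directly, the same tool can be exercised through the POST /mcp endpoint. The sketch below assumes the standard MCP JSON-RPC shape (method tools/call with a name/arguments pair); the exact payload your server version expects may differ:

# Hedged sketch: invoke status/docker via the /mcp endpoint.
# The JSON-RPC payload shape is an assumption, not confirmed for this server.
curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call",
       "params": {"name": "status/docker", "arguments": {}}}'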

Check GPU Usage

Ask: "How much GPU memory is being used?"
Tool: status/gpu

Returns utilization percentage, memory used/total, and temperature for each GPU in the host:
- RTX 4090 (24GB): Multimodal tasks (image gen, audio, ML)
- PRO 6000 Alpha (96GB): LLM inference (primary)
- PRO 6000 Bravo (96GB): LLM inference (overflow)
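If you have shell access to the host, you can cross-check these numbers with nvidia-smi directly:

# Same fields the tool reports, straight from the driver:
nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total,temperature.gpu --format=csv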

Check System Resources

Ask: "What's the CPU and memory usage?"
Tool: status/system

Returns CPU%, memory GB, disk usage, and load averages.
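For comparison, the host-side commands that report the same figures:

uptime              # load averages
free -h             # memory used/total
df -h               # disk usage
top -bn1 | head -5  # CPU snapshot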

2. Working with Models

List Available Models

Ask: "What LLM models can I use?"
Tool: status/models

This tool queries LiteLLM, giving access to all 33 configured models, including:
- Local GGUF models via llama-swap
- TTS models (Piper voices)
- STT models (Whisper)
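You can also list models straight from LiteLLM, which serves the OpenAI-compatible /v1/models route. The hostname below comes from the quick reference at the end of this guide; LITELLM_API_KEY is a placeholder for whatever key your deployment uses:

# List configured models from the LiteLLM proxy (key variable is a placeholder):
curl -s https://llm.haiven.local/v1/models \
  -H "Authorization: Bearer $LITELLM_API_KEY"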

Understanding Model Traces

All model calls are logged to Langfuse. To view traces:
1. Visit https://ai-ops.haiven.local
2. Navigate to Traces
3. Filter by service or model name

3. Reading Files and Logs

Read a Configuration File

Ask: "Show me the llama-swap config"
Tool: files/read
Path: /mnt/apps/docker/ai/llama-swap/config.yaml

Search for Files

Ask: "Find all docker-compose.yml files"
Tool: files/search
Pattern: docker-compose.yml

Query Loki Logs

Ask: "Show me errors from the last hour"
Tool: logs/query
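Under the hood this queries Loki. For reference, an equivalent direct call against Loki's standard query_range API might look like the following; the Loki hostname and the container label are assumptions about your deployment:

# Hypothetical host and label; the API path and LogQL syntax are standard Loki.
curl -sG "https://loki.haiven.local/loki/api/v1/query_range" \
  --data-urlencode 'query={container="echo"} |= "error"' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000"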

4. Docker Management

View Container Logs

Ask: "Show me the last 50 lines of mcp-server logs"
Tool: docker/logs
Container: mcp-server
Lines: 50
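The host-side equivalent, if you have shell access:

docker logs --tail 50 mcp-server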

Restart a Service

Ask: "Restart the echo container"
Tool: docker/restart
Container: echo

Note: Only allowed containers can be restarted (security feature).

5. Audio Transcription

Transcribe Audio

Ask: "Transcribe this audio file"
Tool: audio/transcribe

Or use the OpenAI-compatible API directly:

curl -X POST https://mcp.haiven.local/v1/audio/transcriptions \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1"

Supported formats: mp3, wav, m4a, flac, ogg, webm
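For non-English audio, the /v1/audio/translations endpoint (see the API reference below) produces an English transcription. Mirroring the call above:

curl -X POST https://mcp.haiven.local/v1/audio/translations \
  -F "file=@interview.mp3" \
  -F "model=whisper-1"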

6. Web Search and Scraping

Search the Web

Ask: "Search for Python async best practices"
Tool: search/web

Scrape a Webpage

Ask: "Get the content from https://example.com"
Tool: scrape/url

Crawl a Website

Ask: "Crawl the documentation site and get all pages"
Tool: scrape/site
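Over the direct API, the scrape tools follow the same /mcp pattern sketched earlier; the payload shape is still an assumption, and the url argument name is hypothetical:

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call",
       "params": {"name": "scrape/url", "arguments": {"url": "https://example.com"}}}'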

7. Code Execution

Run a Bash Command

Ask: "List files in /mnt/apps/docker"
Tool: sandbox/bash
Command: ls -la /mnt/apps/docker

Run Python Code

Ask: "Calculate the sum of 1 to 100"
Tool: sandbox/python
Code: print(sum(range(1, 101)))

Note: Code runs in isolated sandboxes with no network access.
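Via the direct API, the same snippet can be submitted using the /mcp sketch from earlier; the code argument matches the parameter shown above, while the payload shape remains an assumption:

curl -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call",
       "params": {"name": "sandbox/python", "arguments": {"code": "print(sum(range(1, 101)))"}}}'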

Tutorials

Tutorial 1: Infrastructure Health Check

Goal: Get a complete overview of system health

  1. Check containers: "Show me all container statuses"
  2. Check GPUs: "What's the GPU memory usage?"
  3. Check resources: "Show detailed system metrics"
  4. Check alerts: "Are there any active alerts?"
  5. Check uptime: "What services are down in Uptime Kuma?"

Tutorial 2: Debugging a Service

Goal: Troubleshoot a misbehaving service

  1. Check status: "Is the echo container healthy?"
  2. View logs: "Show me the last 100 lines of echo logs"
  3. Check resources: "What's the memory usage?"
  4. Search for errors: "Search echo logs for 'error' in the last hour"
  5. Restart if needed: "Restart the echo container"

Tutorial 3: Model Exploration

Goal: Understand available models and their usage

  1. List models: "What models are available?"
  2. Check traces: Visit https://ai-ops.haiven.local
  3. View costs: Check the Langfuse dashboard for token usage
  4. Compare latency: Filter traces by model to compare response times

Tutorial 4: Audio Processing

Goal: Transcribe a voice recording

  1. Upload audio file to Echo or use API
  2. Transcribe: "Transcribe this audio file"
  3. Review output: Check the transcription text
  4. Translate if needed: Use audio/translations for non-English

Tips and Best Practices

Natural Language Works Best

Instead of trying to specify exact tool parameters, describe what you want:

Less Effective                                                    More Effective
"Run status/docker with show_stopped=true"                        "Show me all containers including stopped ones"
"Execute files/read path=/mnt/apps/docker/traefik/traefik.yml"    "Show me the Traefik configuration"

Use Specific Questions for Better Results

Vague               Specific
"Check the system"  "What's the CPU usage and available memory?"
"Look at logs"      "Show me errors in the llama-swap logs from the last hour"

Combine Tools for Complex Tasks

The AI can chain multiple tools together:
- "Check if llama-swap is running, show its logs if there are errors, and restart it if needed"
- "Find all services using the PRO 6000 and show their memory usage"

Security Awareness

A few safeguards worth keeping in mind, all covered elsewhere in this guide:
- Only allow-listed containers can be restarted via docker/restart.
- sandbox/bash and sandbox/python run in isolated sandboxes with no network access.
- External access (https://mcp.haiven.site) requires authentication.

Observability with Langfuse

Viewing Your Traces

  1. Open https://ai-ops.haiven.local
  2. Log in with your credentials
  3. Navigate to Traces in the sidebar
  4. Filter by:
    - Time range
    - Model name
    - User ID
    - Tags

Understanding Trace Data

Each trace shows:
- Input: The prompt sent to the model
- Output: The model's response
- Tokens: Input/output token counts
- Latency: Response time in milliseconds
- Cost: Estimated cost (if configured)

Debugging with Traces

When something goes wrong:
1. Find the relevant trace by timestamp
2. Check the input for issues
3. Review the output for errors
4. Look at the latency for timeout issues
5. Check parent spans for context

Troubleshooting

Service Won't Start - NVIDIA GPU Driver Not Loaded

Symptoms:
- mcp-server container exits with code 128
- Docker error: failed to initialize NVML: Driver Not Loaded
- nvidia-smi fails on the host

Root Cause:
mcp-server requires GPU access for audio transcription and other AI tasks. If the NVIDIA kernel modules aren't loaded (e.g., after a kernel upgrade without matching module rebuild), the container cannot initialize GPU access.

How to Verify:

nvidia-smi  # Should show GPU info; failure means driver not loaded
uname -r    # Check current kernel version
docker logs mcp-server 2>&1 | head -20  # Check for NVML errors

Fix:

# Install NVIDIA modules for current kernel and reboot
sudo apt update
sudo apt install linux-modules-nvidia-580-open-$(uname -r)
sudo reboot

# After reboot, verify and restart
nvidia-smi
cd /mnt/apps/docker/ai/mcp-server && docker compose up -d

Prevention: Install nvidia-dkms-580-open for automatic module rebuilds on kernel upgrades.
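The corresponding install command:

sudo apt install nvidia-dkms-580-open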


"Tool not found"

"Permission denied"

"Connection refused"

"Timeout"

Audio transcription fails

Quick Reference

Tool Namespaces

Namespace  Tools
status     docker, gpu, models, system
docker     restart, logs
files      read, list, search
sandbox    bash, python
search     web
scrape     url, batch, site, sitemap
logs       query
metrics    query
alerts     list
audio      transcribe, speak
uptime     status
image      generate

API Endpoints

Endpoint                  Method  Purpose
/health                   GET     Health check
/tools                    GET     List tools
/mcp                      POST    Execute tool
/metrics                  GET     Prometheus metrics
/v1/audio/transcriptions  POST    Speech-to-text (STT)
/v1/audio/translations    POST    Audio translation to English
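A quick smoke test using the two GET endpoints above:

curl -s https://mcp.haiven.local/health   # expect a healthy status
curl -s https://mcp.haiven.local/tools    # expect the registered tool list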

Environment Quick Reference

What        Where
MCP Server  https://mcp.haiven.local
Echo Chat   https://echo.haiven.local
Langfuse    https://ai-ops.haiven.local
LiteLLM     https://llm.haiven.local
Grafana     https://grafana.haiven.local

Getting Help

  1. Check the README: /mnt/apps/docker/ai/mcp-server/README.md
  2. View API docs: https://docs.haiven.local
  3. Check service status: Use status/docker tool
  4. View logs: Use docker/logs tool for mcp-server