This guide explains how to use the MCP Server tools through Echo (LibreChat) or direct API access.
| Method | URL | Use Case |
|---|---|---|
| Echo Chat | https://echo.haiven.local | Interactive chat with tool access |
| Direct API | https://mcp.haiven.local | Programmatic tool execution |
| External | https://mcp.haiven.site | Remote access (authenticated) |
### Check Docker Containers
Ask: "Show me all running containers"
Tool: status/docker
The response includes container name, status, health, uptime, and ports.
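If you are calling the API directly rather than chatting through Echo, the same tool can be invoked via POST /mcp. The payload below is a sketch that assumes an MCP-style JSON-RPC `tools/call` envelope; check the server's README if your deployment uses a different schema:

```bash
# Hypothetical direct invocation of status/docker over POST /mcp
# (assumes a JSON-RPC "tools/call" payload; adjust if the schema differs)
curl -s -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "status/docker", "arguments": {}}
  }'
```

The later API examples in this guide follow the same envelope, varying only the tool name and arguments.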
### Check GPU Usage
Ask: "How much GPU memory is being used?"
Tool: status/gpu
Returns utilization percentage, memory used/total, and temperature for all GPUs:
- RTX 4090 (24GB): Multimodal tasks (image gen, audio, ML)
- PRO 6000 Alpha (96GB): LLM inference (primary)
- PRO 6000 Bravo (96GB): LLM inference (overflow)
### Check System Resources
Ask: "What's the CPU and memory usage?"
Tool: status/system
Returns CPU%, memory GB, disk usage, and load averages.
### List Available Models
Ask: "What LLM models can I use?"
Tool: status/models
This tool queries LiteLLM, providing access to all 33 configured models, including:
- Local GGUF models via llama-swap
- TTS models (Piper voices)
- STT models (Whisper)
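You can also ask LiteLLM for the model list directly; this assumes LiteLLM's standard OpenAI-compatible /v1/models route and that your deployment requires an API key (yours may not):

```bash
# List all models registered in LiteLLM (OpenAI-compatible endpoint).
# LITELLM_API_KEY is an assumption; omit the header if auth is disabled.
curl -s https://llm.haiven.local/v1/models \
  -H "Authorization: Bearer $LITELLM_API_KEY"
```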
### Understanding Model Traces
All model calls are logged to Langfuse. To view traces:
1. Visit https://ai-ops.haiven.local
2. Navigate to Traces
3. Filter by service or model name
### Read a Configuration File
Ask: "Show me the llama-swap config"
Tool: files/read
Path: /mnt/apps/docker/ai/llama-swap/config.yaml
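Calling the same tool over the API, again assuming the JSON-RPC envelope sketched earlier (the `path` argument name is taken from the example above but is otherwise an assumption):

```bash
# Hypothetical files/read call for the llama-swap config
curl -s -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "files/read",
      "arguments": {"path": "/mnt/apps/docker/ai/llama-swap/config.yaml"}
    }
  }'
```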
### Search for Files
Ask: "Find all docker-compose.yml files"
Tool: files/search
Pattern: docker-compose.yml
### Query Loki Logs
Ask: "Show me errors from the last hour"
Tool: logs/query
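Loki queries are written in LogQL, so a direct call presumably passes a LogQL expression. A sketch, where the `query` and `since` argument names and the `container` label are all assumptions that depend on how log shipping is configured:

```bash
# Hypothetical logs/query call: errors from mcp-server in the last hour
curl -s -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "logs/query",
      "arguments": {"query": "{container=\"mcp-server\"} |= \"error\"", "since": "1h"}
    }
  }'
```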
### View Container Logs
Ask: "Show me the last 50 lines of mcp-server logs"
Tool: docker/logs
Container: mcp-server
Lines: 50
### Restart a Service
Ask: "Restart the echo container"
Tool: docker/restart
Container: echo
Note: Only allowed containers can be restarted (security feature).
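The equivalent direct call, with the `container` argument name taken from the example above (the envelope is still the assumed JSON-RPC shape). Expect an error response rather than a restart if the container is not on the allowlist:

```bash
# Hypothetical docker/restart call; non-allowlisted containers are rejected
curl -s -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {"name": "docker/restart", "arguments": {"container": "echo"}}
  }'
```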
### Transcribe Audio
Ask: "Transcribe this audio file"
Tool: audio/transcribe
Or use the OpenAI-compatible API directly:

```bash
curl -X POST https://mcp.haiven.local/v1/audio/transcriptions \
  -F "file=@meeting.mp3" \
  -F "model=whisper-1"
```
Supported formats: mp3, wav, m4a, flac, ogg, webm
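The companion /v1/audio/translations endpoint (see the endpoint reference below) translates speech to English text; this sketch assumes it mirrors OpenAI's audio API the way the transcription route does:

```bash
# Translate non-English speech to English text
curl -X POST https://mcp.haiven.local/v1/audio/translations \
  -F "file=@interview_fr.mp3" \
  -F "model=whisper-1"
```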
### Search the Web
Ask: "Search for Python async best practices"
Tool: search/web
### Scrape a Webpage
Ask: "Get the content from https://example.com"
Tool: scrape/url
### Crawl a Website
Ask: "Crawl the documentation site and get all pages"
Tool: scrape/site
### Run a Bash Command
Ask: "List files in /mnt/apps/docker"
Tool: sandbox/bash
Command: ls -la /mnt/apps/docker
### Run Python Code
Ask: "Calculate the sum of 1 to 100"
Tool: sandbox/python
Code: print(sum(range(1, 101)))
Note: Code runs in isolated sandboxes with no network access.
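The same snippet over the API, with the `code` argument name being an assumption:

```bash
# Hypothetical sandbox/python call; runs with no network access
curl -s -X POST https://mcp.haiven.local/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
      "name": "sandbox/python",
      "arguments": {"code": "print(sum(range(1, 101)))"}
    }
  }'
```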
### Example Workflows

Goals you can hand to the assistant as a single request:
- Get a complete overview of system health
- Troubleshoot a misbehaving service
- Understand available models and their usage
- Transcribe a voice recording
### Describe What You Want

Instead of specifying exact tool parameters, describe the outcome you want:
| Less Effective | More Effective |
|---|---|
| "Run status/docker with show_stopped=true" | "Show me all containers including stopped ones" |
| "Execute files/read path=/mnt/apps/docker/traefik/traefik.yml" | "Show me the Traefik configuration" |
And be specific rather than vague:

| Vague | Specific |
|---|---|
| "Check the system" | "What's the CPU usage and available memory?" |
| "Look at logs" | "Show me errors in the llama-swap logs from the last hour" |
The AI can chain multiple tools together:
- "Check if llama-swap is running, show its logs if there are errors, and restart it if needed"
- "Find all services using the PRO 6000 and show their memory usage"
Each trace in Langfuse (https://ai-ops.haiven.local) shows:
- Input: The prompt sent to the model
- Output: The model's response
- Tokens: Input/output token counts
- Latency: Response time in milliseconds
- Cost: Estimated cost (if configured)
When something goes wrong:
1. Find the relevant trace by timestamp
2. Check the input for issues
3. Review the output for errors
4. Look at the latency for timeout issues
5. Check parent spans for context
### Troubleshooting: mcp-server Fails with NVML Errors

Symptoms:
- mcp-server container exits with code 128
- Docker error: `failed to initialize NVML: Driver Not Loaded`
- nvidia-smi fails on the host
Root Cause:
mcp-server requires GPU access for audio transcription and other AI tasks. If the NVIDIA kernel modules aren't loaded (e.g., after a kernel upgrade without matching module rebuild), the container cannot initialize GPU access.
How to Verify:

```bash
nvidia-smi                                # should show GPU info; failure means the driver is not loaded
uname -r                                  # check the current kernel version
docker logs mcp-server 2>&1 | head -20    # check for NVML errors
```
Fix:

```bash
# Install NVIDIA modules for the current kernel and reboot
sudo apt update
sudo apt install linux-modules-nvidia-580-open-$(uname -r)
sudo reboot

# After reboot, verify the driver and bring the service back up
nvidia-smi
cd /mnt/apps/docker/ai/mcp-server && docker compose up -d
```
Prevention: Install `nvidia-dkms-580-open` so the modules rebuild automatically on kernel upgrades.
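A minimal sketch of that prevention step, using the package named above (the `dkms status` check is a generic DKMS command, not specific to this setup):

```bash
# Install the DKMS variant so modules rebuild on every kernel upgrade
sudo apt install nvidia-dkms-580-open

# Confirm DKMS is tracking the NVIDIA module
dkms status | grep -i nvidia
```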
| Namespace | Tools |
|---|---|
| status | docker, gpu, models, system |
| docker | restart, logs |
| files | read, list, search |
| sandbox | bash, python |
| search | web |
| scrape | url, batch, site, sitemap |
| logs | query |
| metrics | query |
| alerts | list |
| audio | transcribe, speak |
| uptime | status |
| image | generate |
| Endpoint | Method | Purpose |
|---|---|---|
| /health | GET | Health check |
| /tools | GET | List tools |
| /mcp | POST | Execute tool |
| /metrics | GET | Prometheus metrics |
| /v1/audio/transcriptions | POST | Speech-to-text (STT) |
| /v1/audio/translations | POST | Translation |
| What | Where |
|---|---|
| MCP Server | https://mcp.haiven.local |
| Echo Chat | https://echo.haiven.local |
| Langfuse | https://ai-ops.haiven.local |
| LiteLLM | https://llm.haiven.local |
| Grafana | https://grafana.haiven.local |
See also:
- /mnt/apps/docker/ai/mcp-server/README.md
- https://docs.haiven.local
- status/docker tool
- docker/logs tool for mcp-server