flowise
AI workflow automation and orchestration
upload-service
Web file manager for uploading AI models and browsing storage
mcp-server
MCP protocol server with embedded Whisper STT, 23 tools, and OpenAI-compatible audio API
litellm-mcp
MCP protocol server wrapping LiteLLM proxy for local LLM calls (GLM-4.7-Flash, Seed-36B) without Claude API quota
crawl4ai
AI-optimized web scraping with JavaScript rendering, LLM-friendly markdown, and RAG pipeline integration
llama-swap
On-demand GGUF model gateway with OpenAI-compatible chat/completions APIs.
vllm-glm-flash
vLLM always-on GLM-4.7-Flash AWQ service with 200K context for fast mechanical and code-adjacent tasks.
vllm-qwen35-35b
vLLM always-on Bravo service serving qwen3.6-35b-a3b with 262K context for general-purpose completion.
vllm-heretic
vLLM always-on Huihui-Qwen3.6-27B-abliterated-BF16 service (runtime FP8, 200K context) for creative writing and low-refusal prose.
vllm-qwen3-embedding
vLLM always-on Qwen3-Embedding-4B BF16 service with 8K context for semantic embeddings. Embeddings-only — use /v1/embeddings, not /v1/chat/completions.
vllm-qwen35-27b
vLLM always-on Qwen3.5-27B FP8 service on Delta GPU with 128K context, tool calling, and structured output. Thinking disabled — optimized for JSON generation and classification.
vllm-medgemma
vLLM always-on MedGemma 27B Text IT FP8 service with 32K context for medical text comprehension, clinical reasoning, and biomedical QA. Text-only (no vision). Shared Delta GPU.
vllm-minimax-m25
vLLM on-demand MiniMax-M2.5 AWQ Q4 229B MoE service with 32K context, tensor-parallel across Bravo+Charlie GPUs, and reasoning token support. On-demand — does not auto-start on reboot.
qdrant
Vector database for embeddings, semantic retrieval, and RAG storage.
piper-api
Fast CPU-based text-to-speech using Piper neural voices
styletts2
Advanced voice cloning and text-to-speech with StyleTTS2
f5-tts
Flow-matching text-to-speech with zero-shot voice cloning
audio-converter
FFMPEG-based audio format conversion and processing
comfyui
Stable Diffusion image generation with ComfyUI workflows (native systemd service)
chat-export
Claude Code conversation export to LibreChat format
haiven-intelligence
Semantic search API for AI conversations with vector similarity and hybrid search
haiven-knowledge
Semantic knowledge base for infrastructure docs and lessons learned (744 points, 28 topics, Qdrant + Qwen3-Embed)
haiven-ingest-docling
Document format conversion for the Haiven ingestion pipeline — converts PDF, DOCX, PPTX, XLSX, HTML, and images to Markdown/JSON using Docling (IBM Research). Primary consumer: haiven-knowledge IngestionRouter.
litellm
OpenAI-compatible API gateway with unified model access, virtual keys, and Langfuse observability
sandbox-manager
Web-based orchestration platform for Claude Code containers with terminal access, MCP configuration, and LLM routing
research-agent
Autonomous 9-state web research pipeline with LLM synthesis, SearXNG search, Crawl4AI extraction, work-hub task integration, haiven-knowledge auto-ingest, and Langfuse tracing. Supports task_id for artifact write-back.
audiobook-recommender
Personal audiobook recommendation engine with semantic search, weighted scoring, and Libation import
haiven-transcribe
Tri-engine speech-to-text with NVIDIA Canary-1b-v2, Parakeet-TDT-0.6B-v2, Whisper Large v3 Turbo, and pyannote speaker diarization
meeting-scribe
Automated meeting transcription and note-generation pipeline (7-stage: transcribe → clean → infer metadata → notes → validate → extract → deliver) with v2 edit mode, partial re-runs, and 31 configurable settings
vllm-gemma4-26b
Gemma 4 26B-A4B FP8 MoE inference via vLLM. Vision + tool calling + thinking mode. 256K context with tiny KV cache (5.2 GB at full context). Primary structured output model on Delta GPU.
vllm-gemma4-e4b
Gemma 4 E4B BF16 inference via vLLM. Only Gemma 4 model with audio input (30s clips, 16 kHz). Also supports image and video. 128K context.
vllm-medgemma-4b
MedGemma 4B BF16 abliterated medical vision inference via vLLM. SigLIP encoder for radiology, dermatology, histopathology, ophthalmology. 128K context.
meeting-assistant
AI-powered meeting assistant with SSE streaming chat, reasoning blocks, real-time transcription via haiven-transcribe, speaker diarization, knowledge base search, and chat export
work-hub
Integration-first workspace — IMAP email, meeting transcription, document import (PDF/DOCX/EML/HTML/CSV), task management, and AI-assisted drafting. 37 endpoints across tasks, meetings, taxonomy, import, audio transcription, email connector, backfill, and webhooks.
content-factory
Voice-to-content pipeline with 6 content types (Entry, TIL, Link, Quote, Build, Site Page), 2 voice registers (Authority, Conversational), and a spoken feedback loop. Records voice notes, transcribes via haiven-transcribe, drafts via Seed-36B, saves as markdown with YAML frontmatter.
haiven-ragas
RAGAS evaluation service for the haiven-knowledge RAG pipeline. Measures retrieval quality (Context Precision) against a 25-question golden dataset using GLM-4.7-Flash as judge. Quality gate: >= 0.65.
haiven-reranker
Cross-encoder reranking service for the Haiven knowledge pipeline. Scores (query, passage) pairs using Qwen3-Reranker-4B-seq-cls via sentence-transformers CrossEncoder on the Delta GPU (port 8460). Internal service — no public domain. OFFLINE / on-demand — stopped 2026-03-30.
notification-hub
Multi-channel notification dispatcher for Haiven agent services. Routes notifications to email (SMTP via Mailpit), ntfy.sh push, or Home Assistant TTS based on a YAML routing table keyed by source_agent.
agent-briefing
Scope-aware briefing agent — generates daily summaries, end-of-day reports, task artifact reviews, and knowledge context assembly. Pulls from work-hub and haiven-knowledge, generates via GLM-4.7-Flash, delivers through notification-hub. Scheduled Mon–Fri via systemd timers.
haiven-orchestrator
Central AI orchestrator — deployed intent classification via gemma4-26b, session management (Redis), and agent dispatch for 17 intents across briefing, email, scheduling, research, tasks, and knowledge domains
haven-voice-gateway
Full-duplex voice pipeline gateway — sequences STT (haiven-transcribe), intent classification (haiven-orchestrator), and TTS (haven-tts-gateway) into a single voice interaction. Includes confirm flow for voice-driven action approval.
Echo (LibreChat)
AI chat frontend with multi-provider support