Haiven API Documentation

Explore and interact with Haiven platform services

flowise

AI workflow automation and orchestration

ai workflows automation

upload-service

Web file manager for uploading AI models and browsing storage

files upload storage models ai

mcp-server

MCP protocol server with embedded Whisper STT, 23 tools, and OpenAI-compatible audio API

ai mcp tools docker monitoring stt whisper audio

litellm-mcp

MCP protocol server wrapping the LiteLLM proxy for local LLM calls (GLM-4.7-Flash, Seed-36B) that do not consume Claude API quota

ai mcp llm tools litellm inference glm seed

crawl4ai

AI-optimized web scraping with JavaScript rendering, LLM-friendly markdown, and RAG pipeline integration

ai web-scraping crawling markdown rag playwright

llama-swap

On-demand GGUF model gateway with OpenAI-compatible chat/completions APIs.

llm ai chat openai completions gguf

vllm-glm-flash

vLLM always-on GLM-4.7-Flash AWQ service with 200K context for fast mechanical and code-adjacent tasks.

llm ai chat openai vllm glm

vllm-qwen35-35b

vLLM always-on Bravo service serving qwen3.6-35b-a3b with 262K context for general-purpose completion.

llm ai chat openai vllm qwen

vllm-heretic

vLLM always-on Huihui-Qwen3.6-27B-abliterated-BF16 service (runtime FP8, 200K context) for creative writing and low-refusal prose.

llm ai chat openai vllm huihui creative-writing

vllm-qwen3-embedding

vLLM always-on Qwen3-Embedding-4B BF16 service with 8K context for semantic embeddings. Embeddings-only — use /v1/embeddings, not /v1/chat/completions.

llm ai embeddings openai vllm qwen semantic-search
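
Because this service is embeddings-only, clients need to target /v1/embeddings rather than the chat route. The sketch below builds (but does not send) such a request with the Python standard library. The host, port, and model identifier are placeholders and are not taken from this catalog; the payload shape assumes the usual OpenAI-style embeddings convention of a "model" field plus an "input" list.

```python
import json
import urllib.request


def build_embeddings_request(base_url: str, model: str, texts: list) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/embeddings endpoint."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and model name; substitute the deployment's real values.
request = build_embeddings_request(
    "http://localhost:8000", "qwen3-embedding", ["What is Qdrant used for?"]
)
```

To send it, pass `request` to `urllib.request.urlopen` and parse the JSON body; in the OpenAI-style response format each vector sits under `data[i]["embedding"]`.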

vllm-qwen35-27b

vLLM always-on Qwen3.5-27B FP8 service on Delta GPU with 128K context, tool calling, and structured output. Thinking disabled — optimized for JSON generation and classification.

llm ai chat openai vllm qwen structured-output tool-calling

vllm-medgemma

vLLM always-on MedGemma 27B Text IT FP8 service with 32K context for medical text comprehension, clinical reasoning, and biomedical QA. Text-only (no vision). Shared Delta GPU.

llm ai chat openai vllm medical clinical biomedical

vllm-minimax-m25

vLLM on-demand MiniMax-M2.5 AWQ Q4 229B MoE service with 32K context, tensor-parallel across Bravo+Charlie GPUs, and reasoning token support. On-demand — does not auto-start on reboot.

llm ai chat openai vllm minimax moe reasoning on-demand

qdrant

Vector database for embeddings, semantic retrieval, and RAG storage.

vector-database ai embeddings search qdrant rag

piper-api

Fast CPU-based text-to-speech using Piper neural voices

tts ai audio speech

styletts2

Advanced voice cloning and text-to-speech with StyleTTS2

tts ai audio voice-cloning

f5-tts

Flow-matching text-to-speech with zero-shot voice cloning

tts ai audio voice-cloning flow-matching

audio-converter

FFmpeg-based audio format conversion and processing

audio conversion ffmpeg

comfyui

Stable Diffusion image generation with ComfyUI workflows (native systemd service)

images ai diffusion generation

chat-export

Claude Code conversation export to LibreChat format

chat export claude librechat conversations

haiven-intelligence

Semantic search API for AI conversations with vector similarity and hybrid search

search ai vector semantic qdrant conversations

haiven-knowledge

Semantic knowledge base for infrastructure docs and lessons learned (744 points, 28 topics, Qdrant + Qwen3-Embed)

knowledge ai vector semantic qdrant search rag

haiven-ingest-docling

Document format conversion for the Haiven ingestion pipeline — converts PDF, DOCX, PPTX, XLSX, HTML, and images to Markdown/JSON using Docling (IBM Research). Primary consumer: haiven-knowledge IngestionRouter.

document-conversion ai ingestion pdf docx ocr rag docling

litellm

OpenAI-compatible API gateway with unified model access, virtual keys, and Langfuse observability

llm ai openai gateway proxy chat

sandbox-manager

Web-based orchestration platform for Claude Code containers with terminal access, MCP configuration, and LLM routing

claude containers terminal mcp sandbox orchestration

research-agent

Autonomous 9-state web research pipeline with LLM synthesis, SearXNG search, Crawl4AI extraction, work-hub task integration, haiven-knowledge auto-ingest, and Langfuse tracing. Supports task_id for artifact write-back.

research ai llm search crawling synthesis pipeline knowledge work-hub

audiobook-recommender

Personal audiobook recommendation engine with semantic search, weighted scoring, and Libation import

recommendations ai audiobooks embeddings vector-search lancedb

haiven-transcribe

Tri-engine speech-to-text with NVIDIA Canary-1b-v2, Parakeet-TDT-0.6B-v2, Whisper Large v3 Turbo, and pyannote speaker diarization

stt ai audio transcription translation diarization wyoming

meeting-scribe

Automated meeting transcription and note-generation pipeline (7-stage: transcribe → clean → infer metadata → notes → validate → extract → deliver) with v2 edit mode, partial re-runs, and 31 configurable settings

transcription ai meetings notes pipeline llm
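
As a rough illustration of how a partial re-run over the seven stages could be planned, the sketch below resumes a linear pipeline from a named stage. It assumes that a partial re-run means "run this stage and everything after it", which the entry above does not confirm; the stage identifiers and the function name are hypothetical.

```python
from typing import List, Optional

# Stage order as listed in the catalog entry; the identifiers are illustrative.
STAGES = [
    "transcribe",
    "clean",
    "infer_metadata",
    "notes",
    "validate",
    "extract",
    "deliver",
]


def stages_to_run(resume_from: Optional[str] = None) -> List[str]:
    """Return the stages to execute, optionally resuming from a given stage."""
    if resume_from is None:
        return list(STAGES)  # full run
    if resume_from not in STAGES:
        raise ValueError(f"unknown stage: {resume_from!r}")
    # Assumed semantics: re-run the named stage and all later stages.
    return STAGES[STAGES.index(resume_from):]
```

For example, `stages_to_run("validate")` yields the last three stages, so earlier outputs such as the transcript and the notes would be reused.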

vllm-gemma4-26b

Gemma 4 26B-A4B FP8 MoE inference via vLLM. Vision + tool calling + thinking mode. 256K context with tiny KV cache (5.2 GB at full context). Primary structured output model on Delta GPU.

llm ai vllm openai-api gpu vision tool-calling moe structured-output

vllm-gemma4-e4b

Gemma 4 E4B BF16 inference via vLLM. Only Gemma 4 model with audio input (30s clips, 16 kHz). Also supports image and video. 128K context.

llm ai vllm openai-api gpu audio vision multimodal

vllm-medgemma-4b

MedGemma 4B BF16 abliterated medical vision inference via vLLM. SigLIP encoder for radiology, dermatology, histopathology, ophthalmology. 128K context.

llm ai vllm openai-api gpu medical vision radiology

meeting-assistant

AI-powered meeting assistant with SSE streaming chat, reasoning blocks, real-time transcription via haiven-transcribe, speaker diarization, knowledge base search, and chat export

meetings ai transcription chat sse diarization knowledge stt

work-hub

Integration-first workspace — IMAP email, meeting transcription, document import (PDF/DOCX/EML/HTML/CSV), task management, and AI-assisted drafting. 37 endpoints across tasks, meetings, taxonomy, import, audio transcription, email connector, backfill, and webhooks.

tasks ai meetings rag drafting import email webhooks productivity imap

content-factory

Voice-to-content pipeline with 6 content types (Entry, TIL, Link, Quote, Build, Site Page), 2 voice registers (Authority, Conversational), and a spoken feedback loop. Records voice notes, transcribes via haiven-transcribe, drafts via Seed-36B, saves as markdown with YAML frontmatter.

content ai voice pipeline transcription markdown drafting

haiven-ragas

RAGAS evaluation service for the haiven-knowledge RAG pipeline. Measures retrieval quality (Context Precision) against a 25-question golden dataset using GLM-4.7-Flash as judge. Quality gate: >= 0.65.

evaluation ai rag ragas quality knowledge
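
The quality gate can be pictured as a simple threshold check. The sketch below assumes the per-question Context Precision scores are aggregated with an arithmetic mean before being compared with 0.65; the aggregation method and the function name are assumptions, and only the threshold comes from the entry above.

```python
from statistics import mean
from typing import Sequence

QUALITY_GATE = 0.65  # threshold stated in the catalog entry


def passes_quality_gate(scores: Sequence[float], threshold: float = QUALITY_GATE) -> bool:
    """Return True when the mean Context Precision meets the gate."""
    if not scores:
        raise ValueError("no scores to evaluate")
    # Assumed aggregation: plain arithmetic mean over the golden dataset.
    return mean(scores) >= threshold
```

A CI job could call `passes_quality_gate(scores)` on the 25 per-question results and fail the build when it returns False.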

haiven-reranker

Cross-encoder reranking service for the Haiven knowledge pipeline. Scores (query, passage) pairs using Qwen3-Reranker-4B-seq-cls via sentence-transformers CrossEncoder on the Delta GPU (port 8460). Internal service — no public domain. OFFLINE / on-demand — stopped 2026-03-30.

reranking ai rag search cross-encoder gpu embeddings

notification-hub

Multi-channel notification dispatcher for Haiven agent services. Routes notifications to email (SMTP via Mailpit), ntfy.sh push, or Home Assistant TTS based on a YAML routing table keyed by source_agent.

notifications ai email ntfy ha-tts smtp agents
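
To make the routing idea concrete, the sketch below models a table keyed by source_agent as a Python dictionary standing in for the YAML file. The schema, the channel identifiers, the agent-to-channel assignments, and the fallback behaviour are all hypothetical; the real routing table may look different.

```python
from typing import Dict, List

# Stand-in for the YAML routing table; every mapping here is illustrative.
ROUTING_TABLE: Dict[str, List[str]] = {
    "agent-briefing": ["email", "ntfy"],
    "research-agent": ["ntfy"],
    "haven-voice-gateway": ["ha_tts"],
}

# Assumed fallback for agents that have no entry in the table.
DEFAULT_CHANNELS: List[str] = ["email"]


def channels_for(source_agent: str) -> List[str]:
    """Look up the delivery channels for a notification's source agent."""
    return ROUTING_TABLE.get(source_agent, DEFAULT_CHANNELS)
```

Keying on the sender keeps routing policy in one place: a new agent only needs a new table entry, not a code change in the dispatcher.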

agent-briefing

Scope-aware briefing agent — generates daily summaries, end-of-day reports, task artifact reviews, and knowledge context assembly. Pulls from work-hub and haiven-knowledge, generates via GLM-4.7-Flash, delivers through notification-hub. Scheduled Mon–Fri via systemd timers.

briefing ai agents tasks knowledge scheduled notifications

haiven-orchestrator

Central AI orchestrator — deployed intent classification via gemma4-26b, session management (Redis), and agent dispatch for 17 intents across briefing, email, scheduling, research, tasks, and knowledge domains

orchestration ai intent-classification agents llm session dispatch

haven-voice-gateway

Full-duplex voice pipeline gateway — sequences STT (haiven-transcribe), intent classification (haiven-orchestrator), and TTS (haven-tts-gateway) into a single voice interaction. Includes confirm flow for voice-driven action approval.

voice ai audio stt tts pipeline gateway

Echo (LibreChat)

AI chat frontend with multi-provider support

chat ai frontend librechat conversations