Work Hub

Work Hub is an integration-first workspace that connects meeting transcription, email ingestion, document import, task management, AI-assisted drafting, calendar management, and automated research dispatch into a unified pipeline. It is the central task hub for all Haiven agents.

Overview

| Property | Value |
|---|---|
| Domain | work.haiven.site |
| Backend Port | 8030 |
| Frontend Port | 3025 |
| Source | /mnt/apps/src/work-hub/ |
| Docker | /mnt/apps/docker/ai/work-hub/ |
| API Docs | https://work.haiven.site/api/docs |
| Category | AI — Workflow & Productivity |
| Tier | 2 (FastAPI with auto-generated OpenAPI spec) |

What It Does

Work Hub closes the loop between meetings, emails, deliverables, calendar, and automated research:

  1. Receives approved meeting notes from Meeting Scribe via HMAC-verified webhook
  2. Automatically ingests emails from any IMAP mailbox (polling, deduplication, attachment extraction)
  3. Sends email via Microsoft Graph API (O365) with rate limiting and knowledge base logging
  4. Manages calendar events via Microsoft Graph API — list, create, delete with conflict detection
  5. Chunks and embeds all content into Qdrant for semantic search (2560-dim, Cosine)
  6. Extracts and tracks tasks with company/project/tag/source taxonomy
  7. Generates AI drafts by searching accumulated knowledge via RAG (meetings + docs + emails)
  8. Supports historical import of documents, audio/video, PST archives, and batch backfill of Meeting Scribe notes
  9. Accepts file uploads of PDF, DOCX, EML, HTML, CSV, ICS, and plain text via browser drag-and-drop
  10. Auto-dispatches research tasks to the research-agent when a task is created with status queued and "Research:" in its title
  11. Stores artifacts and voice instructions on tasks for agent write-back integration

Architecture

work.haiven.site (Traefik HTTPS)
├── work-hub-frontend (React 19 SPA, nginx, port 3025)
│   └── /api → reverse proxy to backend
└── work-hub (FastAPI, port 8030)
    ├── work-hub-db (PostgreSQL 16, port 5437)
    ├── qdrant:6333 (documents collection, 2560d, Cosine)
    ├── litellm:4000 (qwen3-embedding-4b + glm-4-7-flash)
    ├── haiven-knowledge:8022 (email send KB logging)
    ├── meeting-scribe:5010 (webhook source)
    └── research-agent:8000 (auto-dispatch via ResearchDispatcher)

Three containers: work-hub (FastAPI backend), work-hub-frontend (React SPA via nginx), work-hub-db (PostgreSQL 16)

Networks: web (frontend via Traefik), backend (all three containers)

GPU: None — embedding and LLM calls route through LiteLLM

Features

Supported Import Formats

| Format | Extractor | Chunking Strategy |
|---|---|---|
| PDF | pypdf (page markers + PDF metadata for title); Tesseract OCR fallback for scanned pages | Heading-based (##) or paragraph fallback |
| DOCX | python-docx (heading styles → markdown markers) | Heading-based (##) or paragraph fallback |
| EML | stdlib email parser (From/To/Cc/Subject/Date + 14 metadata fields) | Header chunk + body chunks |
| HTML | html2text (markdown conversion, preserves headings) | Heading-based (##) or paragraph fallback |
| CSV | stdlib csv (column headers + rows) | Row groups of 50, headers repeated per chunk |
| ICS | icalendar (event summary, description, attendees, dates) | One chunk per calendar event |
| Markdown | | Split on ## headers (max 3000 chars/chunk) |
| Plain text | | Paragraph-based (\n\n), sentence-level fallback |
| PST | readpst (extracts all contained .eml and .ics files) | Per-message, same as EML/ICS |
| Audio/Video | haiven-transcribe (Canary/Parakeet/Whisper Turbo tri-engine) | Meeting-style chunking post-transcription |
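The Markdown and CSV chunking strategies in the table can be sketched as follows. This is a minimal illustration, not Work Hub's actual chunker: the function names `chunk_markdown` and `chunk_csv_rows` are hypothetical, and only the 3000-character cap, the `##` split, and the 50-row groups come from the table.

```python
import re

MAX_CHARS = 3000  # per-chunk cap listed for Markdown in the table


def chunk_markdown(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split on '## ' headings; fall back to paragraphs for oversized sections."""
    # Zero-width split keeps each heading at the start of its own section.
    sections = [s.strip() for s in re.split(r"(?m)^(?=## )", text) if s.strip()]
    chunks: list[str] = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Paragraph fallback: pack paragraphs greedily up to max_chars.
        # (A single paragraph longer than max_chars is kept whole here;
        # a sentence-level fallback is omitted from this sketch.)
        current = ""
        for para in section.split("\n\n"):
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


def chunk_csv_rows(header: list[str], rows: list[list[str]], group_size: int = 50):
    """Group rows in blocks of group_size, repeating the header in each chunk."""
    return [[header] + rows[i:i + group_size] for i in range(0, len(rows), group_size)]
```

Keeping the heading at the top of each chunk means a retrieved chunk still carries its section title as context.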

Prerequisites

| Service | URL | Purpose |
|---|---|---|
| PostgreSQL 16 | work-hub-db:5432 | Task, meeting, draft, and email sync state storage |
| Qdrant | http://qdrant:6333 | Vector search (documents collection) |
| LiteLLM | http://litellm:4000 | Embeddings (qwen3-embedding-4b) + drafting (glm-4-7-flash) |
| Meeting Scribe | http://meeting-scribe:5010 | Optional — sends approved meetings via webhook |
| haiven-transcribe | http://haiven-transcribe:8000 | Optional — audio/video transcription |
| research-agent | http://research-agent:8000 | Optional — receives auto-dispatched research tasks |
| haiven-knowledge | http://haiven-knowledge:8022 | Optional — email send KB logging |

Configuration

All environment variables use the WH_ prefix, except Microsoft Graph and email signature vars.

Core Settings

| Variable | Default | Description |
|---|---|---|
| WH_DATABASE_URL | postgresql+asyncpg://... | async PostgreSQL URL |
| WH_QDRANT_URL | http://qdrant:6333 | Qdrant server URL |
| WH_QDRANT_COLLECTION | documents | Qdrant collection name |
| WH_LITELLM_URL | http://litellm:4000 | LiteLLM proxy URL |
| WH_LITELLM_API_KEY | | LiteLLM API key |
| WH_EMBEDDING_MODEL | qwen3-embedding-4b | 2560-dimension embedding model |
| WH_EMBEDDING_DIMENSIONS | 2560 | Vector dimensions |
| WH_DRAFT_MODEL | hermes-4.3-36b | LLM model for AI draft generation |
| WH_SCRIBE_URL | http://meeting-scribe:5010 | Meeting Scribe service URL |
| WH_WEBHOOK_SECRET | | HMAC-SHA256 secret for webhook verification |
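As an illustration of the WH_ prefix convention, the sketch below reads a few of the core settings from the environment using only the standard library. The `CoreSettings` class and `load_core_settings` function are hypothetical names; the real service may well use a different settings loader.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CoreSettings:
    # Defaults copied from the Core Settings table. Variables without a
    # documented default (API key, webhook secret) are left out of this sketch.
    qdrant_url: str = "http://qdrant:6333"
    qdrant_collection: str = "documents"
    litellm_url: str = "http://litellm:4000"
    embedding_model: str = "qwen3-embedding-4b"
    embedding_dimensions: int = 2560
    scribe_url: str = "http://meeting-scribe:5010"


def load_core_settings(env=None, prefix: str = "WH_") -> CoreSettings:
    """Map WH_<FIELD_NAME> variables onto CoreSettings fields."""
    env = os.environ if env is None else env
    overrides = {}
    for f in fields(CoreSettings):
        raw = env.get(prefix + f.name.upper())
        if raw is not None:
            # Coerce the environment string to the type of the default value.
            overrides[f.name] = type(f.default)(raw)
    return CoreSettings(**overrides)
```

Passing a plain dict as `env` makes the loader easy to exercise without touching the real environment.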

Transcription Settings

| Variable | Default | Description |
|---|---|---|
| WH_TRANSCRIBE_URL | http://haiven-transcribe:8000 | Transcription service URL |
| WH_TRANSCRIBE_TIMEOUT | 600 | Transcription timeout in seconds |

Research Auto-Dispatch Settings

| Variable | Default | Description |
|---|---|---|
| WH_RESEARCH_URL | http://research-agent:8000 | Research agent base URL |
| WH_RESEARCH_API_KEY | | Bearer token for research-agent (optional) |

The ResearchDispatcher polls every 60 seconds for status=queued tasks with "research" in the title. It transitions the task queued → context_ready → in_progress and POSTs to {WH_RESEARCH_URL}/research with {"query": "<title without prefix>", "task_id": "<uuid>", "auto_approve": true}.

IMAP Email Connector Settings

| Variable | Default | Description |
|---|---|---|
| WH_IMAP_ENABLED | false | Enable IMAP email connector (feature flag) |
| WH_IMAP_HOST | | IMAP server hostname (e.g. imap.gmail.com) |
| WH_IMAP_PORT | 993 | IMAP port (993=SSL, 143=STARTTLS) |
| WH_IMAP_USERNAME | | IMAP login username (usually email address) |
| WH_IMAP_PASSWORD | | IMAP password (SecretStr, masked in logs and API responses) |
| WH_IMAP_USE_SSL | true | Use IMAP4_SSL (true) or IMAP4 with STARTTLS (false) |
| WH_IMAP_FOLDERS | INBOX | Comma-separated folder names to sync |
| WH_IMAP_POLL_INTERVAL | 300 | Seconds between incremental syncs |
| WH_IMAP_BATCH_SIZE | 50 | Max UIDs per IMAP FETCH batch |
| WH_IMAP_MAX_BACKFILL_DAYS | 30 | Maximum days per backfill request |

Microsoft Graph (Email Send + Calendar)

Both email send and calendar management share one Azure app registration. The OAuth2 refresh token flow is used (no interactive login required at runtime).

| Variable | Default | Description |
|---|---|---|
| CONN_EMAIL_OAUTH2_CLIENT_ID | | Azure AD app client ID |
| CONN_EMAIL_OAUTH2_CLIENT_SECRET | | Azure AD app client secret |
| CONN_EMAIL_OAUTH2_TENANT_ID | | Azure AD tenant ID |
| CONN_EMAIL_OAUTH2_REFRESH_TOKEN | | Long-lived OAuth2 refresh token (Mail.Send + Calendars.ReadWrite scopes) |
| EMAIL_SIGNATURE | | Optional plain-text signature appended to every outbound email |

Required Azure AD scopes: https://graph.microsoft.com/Mail.Send, https://graph.microsoft.com/Calendars.ReadWrite, offline_access

When these variables are absent, the email send and calendar endpoints return 503 Service Unavailable.

Secrets are stored in /mnt/apps/docker/ai/work-hub/.env.

Database Schema

Nine PostgreSQL tables (all primary keys are UUIDs, all timestamps are timezone-aware):

| Table | Purpose |
|---|---|
| companies | Client taxonomy (name unique, domain, ai_discovered flag) |
| projects | Projects per company (company FK, name, status) |
| tags | Classification tags (name unique, source) |
| tasks | Work items (title, assignee, status, priority, source, source_application, company/project FKs, context, due_date, voice_instructions JSONB[], artifacts JSONB[], history JSONB[]) |
| task_tags | Many-to-many task/tag junction |
| meetings | Meeting records (scribe_job_id unique, title, attendees JSON, notes_md, qdrant_document_id) |
| documents | Imported doc metadata (title, doc_type, source, source_file, embedding_status) |
| drafts | AI-generated drafts (task FK, content, model, context_chunks JSON, version) |
| email_sync_state | Per-folder IMAP sync state (account, folder, last_uid, uidvalidity, last_sync_at) |

Task Status FSM

Valid status transitions enforced by validate_status_transition():

open → in_progress → done
open → in_progress → blocked → in_progress
open → archived
queued → context_ready → in_progress
in_progress → done
in_progress → archived
done → archived
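The transitions listed above reduce to a small lookup table. The sketch below is an assumption about shape only: `ALLOWED_TRANSITIONS` and `is_allowed_transition` are hypothetical names, and the real validate_status_transition() may differ in signature and error handling.

```python
# Each status maps to the set of statuses it may move to,
# derived from the transition chains listed above.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "open": {"in_progress", "archived"},
    "queued": {"context_ready"},
    "context_ready": {"in_progress"},
    "in_progress": {"done", "blocked", "archived"},
    "blocked": {"in_progress"},
    "done": {"archived"},
    "archived": set(),  # terminal: no outgoing transitions are listed
}


def is_allowed_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is a listed transition."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

Under this table a queued task cannot jump straight to in_progress; it must pass through context_ready first.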

Qdrant Integration

| Property | Value |
|---|---|
| Collection | documents |
| Dimensions | 2560 |
| Distance | Cosine |
| Quantization | INT8 scalar (quantile=0.99, always_ram=true) |

Payload fields: document_id, doc_type, source, company, project, topics[], tags[], title, content, attendees[], meeting_type, scribe_job_id, source_file, created_at, ingested_at, chunk_index, total_chunks, email_from, email_to, email_cc, email_subject, email_date, email_message_id, email_in_reply_to, email_references, email_folder, email_importance, email_has_attachments, email_attachment_count, email_attachment_names, email_list_unsubscribe, email_account, parent_email_message_id, calendar_uid, calendar_summary, calendar_start, calendar_event_count, calendar_attendees

Indexed for filtering: doc_type, source, company, project, topics, tags, meeting_type, scribe_job_id (keyword); created_at (datetime)
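For reference, the collection properties and payload indexes above correspond to request bodies like the following in Qdrant's REST API. This is a sketch of the equivalent configuration, not a copy of how Work Hub creates the collection; the variable names are illustrative.

```python
import json

# Body for PUT /collections/documents
create_collection_body = {
    "vectors": {"size": 2560, "distance": "Cosine"},
    "quantization_config": {
        "scalar": {"type": "int8", "quantile": 0.99, "always_ram": True}
    },
}

KEYWORD_FIELDS = [
    "doc_type", "source", "company", "project",
    "topics", "tags", "meeting_type", "scribe_job_id",
]

# One body per PUT /collections/documents/index call.
index_bodies = [{"field_name": name, "field_schema": "keyword"} for name in KEYWORD_FIELDS]
index_bodies.append({"field_name": "created_at", "field_schema": "datetime"})

print(json.dumps(create_collection_body, indent=2))
```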

API Endpoints

43 documented endpoints across 9 groups:

| Group | Count | Description |
|---|---|---|
| Tasks | 10 | CRUD + voice-instructions + artifacts + history + AI draft generation + draft history |
| Meetings | 3 | List, detail, semantic search |
| Taxonomy | 13 | Full CRUD for companies, projects, tags |
| Import | 5 | Single document + file upload + audio transcription + directory batch + PST archive |
| Backfill | 1 | Ingest historical Meeting Scribe notes |
| Webhooks | 1 | Receive approved meetings (HMAC-verified) |
| Health | 1 | Dependency health checks |
| Email | 6 | IMAP sync, backfill, status, config, folder list + Graph send |
| Calendar | 3 | List events, create event, delete event (Microsoft Graph) |

Key endpoints:

GET  /health                                    # Service health (postgres, qdrant, litellm)

# Tasks
GET  /api/v1/tasks                              # List tasks (status, priority, company_id, project_id, assignee, source, source_application)
POST /api/v1/tasks                              # Create task
GET  /api/v1/tasks/{id}                         # Task detail
PATCH /api/v1/tasks/{id}                        # Update task (FSM-validated status transitions)
DELETE /api/v1/tasks/{id}                       # Hard-delete task
PATCH /api/v1/tasks/{id}/voice-instructions     # Append voice instruction
PATCH /api/v1/tasks/{id}/artifacts              # Append artifact reference
GET  /api/v1/tasks/{id}/history                 # Full change history array
POST /api/v1/tasks/{id}/draft                   # Generate AI draft (RAG + glm-4-7-flash)
GET  /api/v1/tasks/{id}/drafts                  # Draft history

# Meetings
GET  /api/v1/meetings                           # List meetings
GET  /api/v1/meetings/{id}                      # Meeting detail
POST /api/v1/meetings/search                    # Semantic search over meeting notes

# Taxonomy
GET  /api/v1/companies                          # List companies
POST /api/v1/companies                          # Create company
GET  /api/v1/companies/{id}                     # Company detail
PATCH /api/v1/companies/{id}                    # Update company
DELETE /api/v1/companies/{id}                   # Delete company
GET  /api/v1/projects                           # List projects (filter by company_id, status)
POST /api/v1/projects                           # Create project
GET  /api/v1/projects/{id}                      # Project detail
PATCH /api/v1/projects/{id}                     # Update project
DELETE /api/v1/projects/{id}                    # Delete project
GET  /api/v1/tags                               # List tags
POST /api/v1/tags                               # Create tag
DELETE /api/v1/tags/{id}                        # Delete tag

# Import
POST /api/v1/import/document                    # Import single document (text/markdown)
POST /api/v1/import/upload                      # Multipart file upload (PDF, DOCX, EML, HTML, CSV, ICS, MD, TXT)
POST /api/v1/import/audio                       # Audio/video transcription import
POST /api/v1/import/directory                   # Import directory batch
POST /api/v1/import/pst                         # Outlook PST archive import

# Backfill
POST /api/v1/backfill/scribe-notes              # Backfill Meeting Scribe notes

# Webhooks
POST /api/webhooks/scribe                       # Scribe webhook receiver (HMAC-verified)

# Email — IMAP connector
POST /api/v1/email/sync                         # Trigger immediate incremental IMAP sync
POST /api/v1/email/backfill                     # Date-range email backfill
GET  /api/v1/email/status                       # Per-folder IMAP sync state
GET  /api/v1/email/config                       # IMAP config with password masked
GET  /api/v1/email/folders                      # Live IMAP folder list
POST /api/v1/email/send                         # Send email via Microsoft Graph (20/hr rate limit)

# Calendar — Microsoft Graph
GET  /api/v1/calendar/events                    # List events in date range
POST /api/v1/calendar/events                    # Create calendar event (409 on conflict)
DELETE /api/v1/calendar/events/{id}             # Delete calendar event

Full interactive docs at https://work.haiven.site/api/docs.

Task Filtering Parameters

GET /api/v1/tasks accepts these query parameters for filtering:

| Parameter | Type | Description |
|---|---|---|
| status | string | Filter by task status (open, in_progress, done, queued, etc.) |
| priority | string | Filter by priority (low, medium, high, critical) |
| company_id | UUID | Filter to tasks for a specific company |
| project_id | UUID | Filter to tasks for a specific project |
| assignee | string | Filter by assignee name |
| source | string | Filter by source field (manual, agent, email, etc.) |
| source_application | string | Filter by source_application (briefing, research_agent, etc.) |
| page | int | Page number (default: 1) |
| page_size | int | Items per page (default: 20, max: 100) |
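A client can assemble these query parameters with the standard library. The helper below is a hypothetical convenience (`build_task_list_url` is not part of Work Hub); it only encodes the parameter names and page_size bounds listed in the table.

```python
from urllib.parse import urlencode

BASE_URL = "https://work.haiven.site/api/v1/tasks"
FILTERS = {
    "status", "priority", "company_id", "project_id",
    "assignee", "source", "source_application",
}


def build_task_list_url(page: int = 1, page_size: int = 20, **filters) -> str:
    """Build a GET /api/v1/tasks URL from the documented filter parameters."""
    unknown = set(filters) - FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    # Drop filters that were passed as None so they are not sent at all.
    params = {k: v for k, v in filters.items() if v is not None}
    params.update(page=page, page_size=page_size)
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_task_list_url(status="queued", priority="high")` produces a URL that filters on both fields and requests the first page of 20 items.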

Agent Integration Endpoints

For agent-to-work-hub write-back:

# Append a voice instruction to a task
PATCH /api/v1/tasks/{id}/voice-instructions
{"instruction": "Make the summary shorter and focus on action items"}

# Append an artifact (research output, briefing draft, etc.)
PATCH /api/v1/tasks/{id}/artifacts
{"type": "research_output", "path": "<session_id>"}

# Update task status and context (research agent write-back)
PATCH /api/v1/tasks/{id}
{"status": "done", "context": "<JSON summary from research>"}

Deployment

First-Time Setup

# Create .env file
cat > /mnt/apps/docker/ai/work-hub/.env <<'EOF'
POSTGRES_USER=workhub
POSTGRES_PASSWORD=<strong-password>
POSTGRES_DB=workhub
WH_LITELLM_API_KEY=<litellm-api-key>
WH_WEBHOOK_SECRET=<random-32-char-hex>
# Research auto-dispatch
WH_RESEARCH_URL=http://research-agent:8000
# Optional: Microsoft Graph (email send + calendar)
# CONN_EMAIL_OAUTH2_CLIENT_ID=<azure-app-client-id>
# CONN_EMAIL_OAUTH2_CLIENT_SECRET=<azure-app-client-secret>
# CONN_EMAIL_OAUTH2_TENANT_ID=<azure-tenant-id>
# CONN_EMAIL_OAUTH2_REFRESH_TOKEN=<long-lived-refresh-token>
# EMAIL_SIGNATURE=Your Name | Title
# Optional: IMAP email connector
# WH_IMAP_ENABLED=true
# WH_IMAP_HOST=imap.gmail.com
# WH_IMAP_USERNAME=you@example.com
# WH_IMAP_PASSWORD=<app-password>
# WH_IMAP_FOLDERS=INBOX,Sent
EOF

# Start all three containers
cd /mnt/apps/docker/ai/work-hub
docker compose up -d

# Wait for DB to initialize, then backfill historical notes
sleep 10
curl -X POST https://work.haiven.site/api/v1/backfill/scribe-notes

Start / Stop

cd /mnt/apps/docker/ai/work-hub
docker compose up -d        # Start all containers
docker compose down         # Stop (preserves DB volume)
docker compose restart work-hub  # Restart backend only

Rebuild After Source Changes

cd /mnt/apps/docker/ai/work-hub
docker compose build --no-cache work-hub
docker compose up -d work-hub

View Logs

docker logs -f work-hub            # Backend
docker logs -f work-hub-frontend   # Frontend (nginx)
docker logs -f work-hub-db         # PostgreSQL

Health Check

curl https://work.haiven.site/api/health
# Returns: {"status": "healthy", "postgres": true, "qdrant": true, "litellm": true, "version": "1.0.0"}

Resource Limits

| Container | Memory Limit | Memory Reservation | CPU Limit |
|---|---|---|---|
| work-hub | 4G | 256M | 4 cores |
| work-hub-frontend | 2G | | 1 core |
| work-hub-db | 1G | 512M | 2 cores |

Monitoring

Prometheus labels are set on the work-hub container:

prometheus.io/scrape: "true"
prometheus.io/port: "8030"
prometheus.io/path: "/metrics"

The /metrics endpoint exposes standard FastAPI/uvicorn process metrics.

Webhook Integration

Meeting Scribe sends approved meeting notes to Work Hub via HTTP webhook:

POST /api/webhooks/scribe
X-Hub-Signature-256: sha256=<hmac>
Content-Type: application/json

{
  "version": 1,
  "job_id": "uuid",
  "title": "Meeting Title",
  "notes_md": "## Agenda\n...",
  "tasks": [],
  "decisions": [],
  "metadata": {}
}

The webhook handler verifies HMAC-SHA256 using WH_WEBHOOK_SECRET, then chunks and embeds the notes into Qdrant and records the meeting in PostgreSQL.
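The signature check can be sketched with Python's standard library. This assumes the header carries a hex-encoded HMAC-SHA256 digest after the `sha256=` prefix, which the `sha256=<hmac>` placeholder above suggests but does not state; `sign_body` and `verify_signature` are illustrative names, not Work Hub functions.

```python
import hashlib
import hmac


def sign_body(secret: bytes, raw_body: bytes) -> str:
    """Compute the X-Hub-Signature-256 value for a raw request body."""
    digest = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Compare expected and received signatures in constant time."""
    expected = sign_body(secret, raw_body).encode("utf-8")
    # Comparing bytes avoids a TypeError on non-ASCII header values.
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

The signature must be computed over the raw bytes exactly as received. Re-serializing the parsed JSON can change whitespace or key order and cause a mismatch.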

Research Auto-Dispatch

The ResearchDispatcher runs as a background asyncio task inside the work-hub process:

  1. Every 60 seconds, it queries PostgreSQL for tasks with status=queued and "research" in the title (case-insensitive).
  2. For each matched task, it transitions: queued → context_ready → in_progress (via direct DB update, respecting FSM rules).
  3. It strips the "Research: " prefix from the title and POSTs to {WH_RESEARCH_URL}/research:
     { "query": "RAGAS framework comparison", "task_id": "uuid", "auto_approve": true }
  4. The research-agent runs the pipeline and writes artifacts back via PATCH /api/v1/tasks/{id}/artifacts.
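Steps 1 and 3 can be sketched as two small functions: one that decides whether a title qualifies, and one that builds the POST body. The names `is_research_task` and `build_dispatch_payload` are hypothetical, and the exact prefix handling inside ResearchDispatcher is an assumption.

```python
import re

# Matches a leading "Research:" prefix in any letter case.
_PREFIX = re.compile(r"^\s*research:\s*", re.IGNORECASE)


def is_research_task(title: str) -> bool:
    """Case-insensitive check for "research" anywhere in the title."""
    return "research" in title.lower()


def build_dispatch_payload(title: str, task_id: str) -> dict:
    """Build the JSON body POSTed to {WH_RESEARCH_URL}/research."""
    query = _PREFIX.sub("", title, count=1).strip()
    return {"query": query, "task_id": task_id, "auto_approve": True}
```

A title without the prefix is passed through unchanged as the query.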

To trigger a research task manually:

curl -X POST https://work.haiven.site/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Research: RAGAS framework comparison",
    "status": "queued",
    "source": "agent",
    "source_application": "manual"
  }'
# Within 60 seconds the task transitions to in_progress and research-agent begins

Microsoft Graph Integration

Work Hub integrates with Microsoft 365 via the Graph API for two capabilities: sending email and managing calendar events. Both use the same Azure AD app registration and the OAuth2 refresh token flow — no interactive login is required at runtime.

Email Send

curl -X POST https://work.haiven.site/api/v1/email/send \
  -H "Content-Type: application/json" \
  -d '{
    "to": "colleague@example.com",
    "subject": "Follow-up from today'\''s meeting",
    "body": "Hi,\n\nJust following up on the action items..."
  }'
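The send endpoint is limited to 20 emails per hour. How Work Hub enforces this is not documented here; one common approach is a sliding-window limiter, sketched below with a hypothetical `SlidingWindowLimiter` class and an injectable clock for testing.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` events within any `window_seconds` span."""

    def __init__(self, limit: int = 20, window_seconds: float = 3600.0,
                 clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._sent: deque[float] = deque()  # timestamps of accepted sends

    def allow(self) -> bool:
        now = self._clock()
        # Forget sends that have aged out of the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) >= self._limit:
            return False
        self._sent.append(now)
        return True
```

A caller would check `allow()` before invoking the Graph API and reject the request when it returns False.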

Calendar Events

curl -X POST https://work.haiven.site/api/v1/calendar/events \
  -H "Content-Type: application/json" \
  -d '{
    "summary": "Sprint Planning",
    "start": "2026-03-10T14:00:00Z",
    "end": "2026-03-10T15:00:00Z",
    "attendees": ["alice@example.com", "bob@example.com"],
    "description": "Q1 sprint kickoff"
  }'
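Event creation returns 409 when the new event conflicts with an existing one. The exact conflict rule is not specified here; the sketch below assumes a standard half-open interval overlap test, so back-to-back events do not conflict. `find_conflicts` and the event dict shape are illustrative.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_conflicts(new_start: str, new_end: str, existing: list[dict]) -> list[dict]:
    """Return existing events whose [start, end) range overlaps the new event."""
    start, end = parse_utc(new_start), parse_utc(new_end)
    return [
        event for event in existing
        if start < parse_utc(event["end"]) and parse_utc(event["start"]) < end
    ]
```

A non-empty result would correspond to the 409 response.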

IMAP Email Connector

When WH_IMAP_ENABLED=true, Work Hub polls the configured IMAP mailbox in the background, syncing each folder in WH_IMAP_FOLDERS every WH_IMAP_POLL_INTERVAL seconds (300 by default).

Email metadata endpoints return 503 Service Unavailable when WH_IMAP_ENABLED=false.
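An incremental sync can be planned from the last_uid and uidvalidity values stored in email_sync_state. The sketch below shows one plausible approach, not the connector's actual code: `plan_fetch` is a hypothetical helper, and the reset on a UIDVALIDITY change follows standard IMAP semantics.

```python
def plan_fetch(last_uid: int, stored_uidvalidity: int, server_uidvalidity: int,
               server_uids: list[int], batch_size: int = 50) -> list[list[int]]:
    """Return batches of UIDs to FETCH, each at most batch_size long."""
    if stored_uidvalidity != server_uidvalidity:
        # The server reassigned UIDs, so the stored last_uid is meaningless.
        last_uid = 0
    new_uids = sorted(uid for uid in server_uids if uid > last_uid)
    return [new_uids[i:i + batch_size] for i in range(0, len(new_uids), batch_size)]
```

The default batch_size mirrors WH_IMAP_BATCH_SIZE. After a full resync triggered by a UIDVALIDITY change, deduplication has to rely on something other than UIDs, for example the email_message_id payload field.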

Triggering a Manual Sync

# Trigger immediate incremental sync
curl -X POST https://work.haiven.site/api/v1/email/sync

# Backfill a 7-day date range (adjust since_date and before_date as needed)
curl -X POST "https://work.haiven.site/api/v1/email/backfill?since_date=2026-02-12&before_date=2026-02-19"

# Check sync state per folder
curl https://work.haiven.site/api/v1/email/status

# List available folders on the IMAP server
curl https://work.haiven.site/api/v1/email/folders

Troubleshooting

Backend fails to start

# Check database is healthy
docker ps --filter name=work-hub-db --format "{{.Status}}"

# Check logs for startup errors
docker logs work-hub --tail 50

# Verify .env is present and lists the required variables (names only, so secret values are not printed)
cut -d= -f1 /mnt/apps/docker/ai/work-hub/.env

Draft generation returns no context

The AI draft agent searches Qdrant for relevant content. If no context is found:

# Check collection has data
curl http://localhost:6333/collections/documents

# Verify embedding model is available in LiteLLM
curl http://localhost:4000/v1/models | grep qwen3-embedding

Research dispatcher not firing

# Check dispatcher is running in logs
docker logs work-hub --tail 50 | grep -i "dispatcher"

# Verify task has status=queued and "research" in title
curl "https://work.haiven.site/api/v1/tasks?status=queued" | python3 -m json.tool

# Check WH_RESEARCH_URL is set
docker inspect work-hub | grep -A1 WH_RESEARCH

Email send fails with 503

# Verify Graph OAuth2 credentials are set
docker inspect work-hub | grep -i "CONN_EMAIL"

# Check rate limiter — max 20 sends per hour
docker logs work-hub --tail 50 | grep -i "rate limit\|email"

Calendar returns 503

# Verify the same CONN_EMAIL_OAUTH2_* vars are set (shared with email send)
docker inspect work-hub | grep -i "CONN_EMAIL_OAUTH2"

File upload fails

# Check file size (50 MB limit for documents, 500 MB for audio)
ls -lh /path/to/file

# Verify supported format
# Documents: .pdf .docx .eml .html .htm .csv .ics .md .txt
# Audio: .mp3 .m4a .wav .ogg .webm .mp4 .flac
# Check backend logs for extraction errors
docker logs work-hub --tail 50 | grep -i "upload\|extract"

PST import fails

# Verify readpst is installed inside the container
docker exec work-hub which readpst

# Check logs for extraction errors
docker logs work-hub --tail 50 | grep -i "pst\|readpst"

IMAP connector not syncing

# Verify feature flag is enabled
grep WH_IMAP_ENABLED /mnt/apps/docker/ai/work-hub/.env

# Check sync state for errors
curl https://work.haiven.site/api/v1/email/status | python3 -m json.tool

# Check backend logs for IMAP errors
docker logs work-hub --tail 100 | grep -i "imap\|email"

Webhook signature mismatch

Verify WH_WEBHOOK_SECRET matches the secret configured in Meeting Scribe. The signature is HMAC-SHA256 of the raw request body.

Container can't reach Qdrant

# Verify both services are on backend network
docker network inspect backend | grep -E "work-hub|qdrant"

# Test connectivity from inside container
docker exec work-hub curl -sf http://qdrant:6333/healthz

Frontend gets 502 from /api proxy

The nginx frontend proxies /api to http://work-hub:8030. Verify the backend container is healthy:

docker ps --filter name=work-hub --format "{{.Status}}"
curl http://localhost:8030/health

Project Status

All phases complete. Production-ready with 3 healthy containers. Integrated with research-agent (auto-dispatch), agent-briefing (task read/write), haiven-notification-hub, and Microsoft 365 (email send + calendar).