Memory Remediation Approval Service

AI-powered memory remediation with human-in-the-loop approval for Haiven container infrastructure.

Architecture

This system consists of two components:

  1. memory-remediation.py (Cronicle script) - Queries Prometheus for container memory usage, calls LiteLLM for analysis, and sends approval emails
  2. remediation-approval (FastAPI service) - Approval gateway with a web UI for reviewing and applying memory changes

Access

Quick Start

cd /mnt/apps/docker/infrastructure/remediation-approval
cp .env.example .env
# Edit .env and set REMEDIATION_SECRET to a newly generated random value
docker compose up -d
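The Quick Start does not prescribe how to generate REMEDIATION_SECRET. One option, assuming any long random string is acceptable, uses Python's standard secrets module. generate_remediation_secret is a hypothetical helper name, not part of this service.

```python
import secrets


def generate_remediation_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string to use as the shared secret.

    Assumption: the service accepts any sufficiently long random string;
    check .env.example for any stated format requirements.
    """
    # 32 random bytes -> 43 URL-safe base64 characters (padding stripped)
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(f"REMEDIATION_SECRET={generate_remediation_secret()}")
```

Paste the printed line into .env, and reuse the same value for the Cronicle job, since both sides must share REMEDIATION_SECRET.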

How It Works

  1. Cronicle runs memory-remediation.py every 5 minutes
  2. The script queries Prometheus for containers above the 85% memory threshold (REMEDIATION_THRESHOLD)
  3. For each flagged container, LiteLLM analyzes the memory profile using the model set by LITELLM_MODEL
  4. The analysis yields one recommendation: INCREASE_LIMIT, RESTART, INVESTIGATE, or NO_ACTION
  5. The script emails the admin approve/reject links (delivered via smtp-relay)
  6. The admin clicks approve and confirms; the FastAPI service updates mem_limit in the compose file and restarts the container
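Steps 2 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the PromQL uses cAdvisor metric names (container_memory_working_set_bytes, container_spec_memory_limit_bytes) that may differ from what memory-remediation.py queries, the Prometheus URL is a placeholder, and falling back to INVESTIGATE for unparseable model output is a hypothetical policy.

```python
from __future__ import annotations

import json
import urllib.parse
from enum import Enum

# Hypothetical PromQL built on cAdvisor metric names. Containers with no
# limit (0 bytes) are filtered out by the "> 0" comparison on the divisor.
QUERY_TEMPLATE = (
    '100 * container_memory_working_set_bytes{{name!=""}}'
    ' / (container_spec_memory_limit_bytes{{name!=""}} > 0)'
    " > {threshold}"
)


def build_query_url(prometheus_url: str, threshold: float = 85) -> str:
    """Return an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    query = QUERY_TEMPLATE.format(threshold=threshold)
    return f"{prometheus_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_flagged(response_body: str) -> dict[str, float]:
    """Map container name -> memory usage percent from an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    flagged: dict[str, float] = {}
    for series in payload["data"]["result"]:
        name = series["metric"].get("name", "<unnamed>")
        _timestamp, value = series["value"]  # instant vectors carry [ts, "value"]
        flagged[name] = float(value)
    return flagged


class Recommendation(str, Enum):
    INCREASE_LIMIT = "INCREASE_LIMIT"
    RESTART = "RESTART"
    INVESTIGATE = "INVESTIGATE"
    NO_ACTION = "NO_ACTION"


def normalize_recommendation(model_output: str) -> Recommendation:
    """Coerce free-form model text into one of the four actions.

    Assumption: anything unrecognised becomes INVESTIGATE so a human reviews it.
    """
    token = model_output.strip().upper().replace(" ", "_")
    try:
        return Recommendation(token)
    except ValueError:
        return Recommendation.INVESTIGATE
```

For example, build_query_url("http://prometheus:9090", 85) yields a URL whose JSON response can be passed to parse_flagged; the hostname there is a placeholder.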

Endpoints

Method  Path                      Description
GET     /health                   Healthcheck
GET     /pending                  List pending recommendations
GET     /details/{token}          View recommendation details
GET     /approve/{token}          Approval confirmation page
POST    /approve/{token}/confirm  Apply the memory change
GET     /reject/{token}           Reject recommendation
GET     /audit                    View approval/rejection history
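The README does not say how POST /approve/{token}/confirm chooses the new mem_limit. As a hypothetical sketch, the helpers below parse a Compose mem_limit value into bytes and scale it by an assumed factor of 1.5. The factor, the rounding to whole MiB, and the function names are illustrative only; units are assumed to be binary (1024-based) multiples.

```python
from __future__ import annotations

import math
import re

# Compose byte-value units (b, k/kb, m/mb, g/gb), assumed 1024-based.
_UNITS = {
    "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024**2, "mb": 1024**2,
    "g": 1024**3, "gb": 1024**3,
}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$", re.IGNORECASE)


def parse_mem_limit(value: str | int) -> int:
    """Convert a mem_limit value such as "512m" or 1073741824 to bytes."""
    if isinstance(value, int):
        return value
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised mem_limit: {value!r}")
    amount, unit = match.groups()
    multiplier = _UNITS.get(unit.lower() or "b")  # bare number means bytes
    if multiplier is None:
        raise ValueError(f"unknown unit in mem_limit: {value!r}")
    return int(float(amount) * multiplier)


def increased_limit(current: str | int, factor: float = 1.5) -> str:
    """Hypothetical policy: scale the limit by `factor`, rounded up to whole MiB."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1 for an increase")
    new_bytes = parse_mem_limit(current) * factor
    return f"{math.ceil(new_bytes / 1024**2)}m"
```

With these assumptions, increased_limit("512m") returns "768m", which would then be written back to mem_limit before the container restarts.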

Configuration

See .env.example for configuration options.

Cronicle Job Environment Variables

Set these in the Cronicle web UI (scheduler.haiven.site) for the Memory Remediation job:

REMEDIATION_SECRET=<same-as-fastapi-env>
LITELLM_MASTER_KEY=<from-litellm/.env>
REMEDIATION_THRESHOLD=85
LITELLM_MODEL=qwen3-30b-a3b
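The Cronicle job and the FastAPI service must share REMEDIATION_SECRET, which suggests the secret authenticates the tokens embedded in the approve/reject links. The token format is not documented here, so the following is only a hypothetical HMAC-SHA256 scheme illustrating why a mismatched secret would make every link fail verification. sign_token, verify_token, the issued_at field, and the 24-hour expiry are all illustrative assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(body: str, secret: str) -> str:
    return _b64(hmac.new(secret.encode(), body.encode(), hashlib.sha256).digest())


def sign_token(payload: dict, secret: str) -> str:
    """Script side (hypothetical): encode the payload and append its HMAC."""
    body = _b64(json.dumps(payload, sort_keys=True).encode())
    return f"{body}.{_mac(body, secret)}"


def verify_token(token: str, secret: str, max_age_s: int = 86400) -> dict:
    """Service side (hypothetical): reject tampered, mis-signed, or stale tokens."""
    body, _, sig = token.partition(".")
    # Constant-time comparison; bytes avoid TypeError on non-ASCII input
    if not hmac.compare_digest(sig.encode(), _mac(body, secret).encode()):
        raise ValueError("bad signature")
    payload = json.loads(_unb64(body))
    if time.time() - payload.get("issued_at", 0) > max_age_s:
        raise ValueError("token expired")
    return payload
```

Under this scheme, a token signed by the Cronicle job with one secret raises "bad signature" in a service configured with another.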

Maintenance

View Logs

docker logs -f remediation-approval

Restart Service

docker compose restart

View Audit Log

python3 -m json.tool /mnt/storage/remediation-approval/data/audit.json
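To tally decisions rather than read raw JSON, a short script can summarise the audit log. The schema of audit.json is not documented here, so this sketch assumes the file holds a list of entries (or an object with an "entries" list), each with an "action" field such as "approved" or "rejected"; adjust the keys to match the real file.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path

AUDIT_PATH = Path("/mnt/storage/remediation-approval/data/audit.json")


def summarize_audit(entries: list[dict]) -> Counter:
    """Count entries per action. Assumed key: "action"."""
    return Counter(entry.get("action", "unknown") for entry in entries)


def load_audit(path: Path = AUDIT_PATH) -> list[dict]:
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: either a bare JSON list or {"entries": [...]}
    return data if isinstance(data, list) else data.get("entries", [])


if __name__ == "__main__":
    for action, count in summarize_audit(load_audit()).most_common():
        print(f"{action}: {count}")
```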

View Reports

ls -la /mnt/storage/remediation/reports/

Safety Features

Documentation

Service     Role
Prometheus  Provides container memory metrics
LiteLLM     AI analysis of memory profiles
Cronicle    Schedules the monitoring script
smtp-relay  Delivers approval emails

Generated by haiven-service-onboarding plugin