A comprehensive guide for using the memory remediation approval gateway to review and apply AI-generated memory adjustments for Haiven container services.
Open your browser and navigate to https://remediation.haiven.site
The service works best with modern browsers (Chrome, Firefox, Safari, Edge).
The remediation approval service acts as a human-in-the-loop gateway for AI-generated memory remediation recommendations. When containers on the Haiven platform approach memory limits, an automated Cronicle job analyzes the situation using AI (LiteLLM) and generates recommendations. These recommendations are sent to administrators for approval before any changes are applied.
Key capabilities:
- Review AI-analyzed memory recommendations with detailed evidence
- Approve or reject memory limit changes with a single click
- View complete audit trail of all decisions
- Monitor pending recommendations
- Automatic token expiry and rate limiting for safety
Monitoring (Every 5 minutes)
- Cronicle runs memory-remediation.py script
- Script queries Prometheus for container memory metrics
- Containers above 85% memory threshold are flagged
Analysis
- For each flagged container, the script gathers supporting data on its memory usage
- The AI model (qwen3-30b-a3b by default) analyzes the memory profile
Recommendation Generation
- AI generates one of four actions (see table below)
- Each recommendation includes a risk score (1-10), detailed analysis, and evidence
Email Notification
- Email sent to admin via smtp-relay
- Contains summary with three action links: Approve, Reject, Details
Human Approval
- Admin reviews recommendation in the approval UI
- Clicks Approve or Reject
Change Application (if approved)
- Service modifies deploy.resources.limits.memory in docker-compose.yml
- Runs docker compose up -d to restart container with new limit
- Records decision in audit log
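The threshold check in the Monitoring step can be pictured as a simple filter. The following is a minimal sketch, not the actual memory-remediation.py logic: it assumes usage and limit values (in bytes) have already been fetched from Prometheus, and the function name `flag_containers` and the input shape are hypothetical.

```python
from typing import Dict, List, Tuple


def flag_containers(
    usage: Dict[str, Tuple[int, int]], threshold_pct: float = 85.0
) -> List[str]:
    """Return names of containers whose memory usage exceeds threshold_pct.

    usage maps container name -> (used_bytes, limit_bytes). This shape is an
    assumption for illustration, not the real script's data model.
    """
    flagged = []
    for name, (used, limit) in sorted(usage.items()):
        if limit <= 0:
            # No limit configured: a percentage cannot be computed, so skip.
            continue
        if used / limit * 100 > threshold_pct:
            flagged.append(name)
    return flagged


sample = {
    "api": (900 * 1024**2, 1024**3),     # roughly 87.9% of 1 GiB
    "worker": (400 * 1024**2, 1024**3),  # roughly 39.1% of 1 GiB
    "cache": (512 * 1024**2, 0),         # no limit set
}
print(flag_containers(sample))
```

Containers without a limit are skipped here because a usage percentage is undefined for them; how the real script treats that case is not documented in this guide.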
| Action | Meaning | What Happens on Approval |
|---|---|---|
| INCREASE_LIMIT | Container needs more memory | Compose file updated with new limit, container restarts |
| RESTART | Memory leak detected | Informational — admin restarts manually |
| INVESTIGATE | Unusual pattern needs human review | No automatic action |
| NO_ACTION | Memory usage is normal | No email sent, logged in report only |
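To make the table concrete, here is a hedged sketch of how a recommendation could be modeled. The class and field names (`Recommendation`, `risk_score`, `new_limit`) are hypothetical and chosen for illustration; only the four action names, the 1-10 risk range, and the risk bands from the FAQ come from this guide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    INCREASE_LIMIT = "INCREASE_LIMIT"
    RESTART = "RESTART"
    INVESTIGATE = "INVESTIGATE"
    NO_ACTION = "NO_ACTION"


@dataclass(frozen=True)
class Recommendation:
    container: str
    action: Action
    risk_score: int
    new_limit: Optional[str] = None  # e.g. "2g"; only used by INCREASE_LIMIT

    def __post_init__(self) -> None:
        if not 1 <= self.risk_score <= 10:
            raise ValueError("risk_score must be between 1 and 10")
        if self.action is Action.INCREASE_LIMIT and not self.new_limit:
            raise ValueError("INCREASE_LIMIT requires new_limit")

    @property
    def sends_email(self) -> bool:
        # NO_ACTION is logged in the report only.
        return self.action is not Action.NO_ACTION

    @property
    def changes_compose_on_approval(self) -> bool:
        # Only INCREASE_LIMIT results in a compose file edit and restart.
        return self.action is Action.INCREASE_LIMIT

    @property
    def risk_band(self) -> str:
        if self.risk_score <= 3:
            return "low"
        if self.risk_score <= 6:
            return "medium"
        return "high"


rec = Recommendation("api", Action.INCREASE_LIMIT, risk_score=4, new_limit="2g")
print(rec.action.value, rec.risk_band, rec.changes_compose_on_approval)
```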
What happens when you approve a memory limit change:
- Compose file's deploy.resources.limits.memory is updated
- docker compose up -d restarts the container (brief downtime)
- Audit entry is created with full details
- Token is consumed and cannot be reused
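The compose update can be sketched as a narrowly scoped dictionary edit. This is an illustration under stated assumptions, not the service's implementation: it works on an already parsed compose document (YAML loading, writing, and the `docker compose up -d` call are left out), and the function name `set_memory_limit` is hypothetical.

```python
import copy
from typing import Any, Dict, Optional, Tuple


def set_memory_limit(
    compose: Dict[str, Any], service: str, new_limit: str
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Return (updated_compose, old_limit) with only the memory limit changed.

    The input is not mutated, so the caller can still record the previous
    state in the audit log.
    """
    if service not in compose.get("services", {}):
        raise KeyError(f"service {service!r} not found in compose file")

    updated = copy.deepcopy(compose)
    limits = (
        updated["services"][service]
        .setdefault("deploy", {})
        .setdefault("resources", {})
        .setdefault("limits", {})
    )
    old_limit = limits.get("memory")
    limits["memory"] = new_limit
    return updated, old_limit


compose = {
    "services": {
        "api": {
            "image": "example/api:latest",
            "environment": {"LOG_LEVEL": "info"},
            "deploy": {"resources": {"limits": {"memory": "1g"}}},
        }
    }
}
updated, old = set_memory_limit(compose, "api", "2g")
print(old, "->", updated["services"]["api"]["deploy"]["resources"]["limits"]["memory"])
```

Returning the old limit alongside the updated document keeps the information an audit entry would need in one place.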
Navigate to "Pending" in the UI to see all recommendations awaiting action.
Each entry shows:
- Container name and recommended action (color-coded badge)
- New memory limit (if applicable) and risk score
- Created timestamp
- Status: Active (can approve) or EXPIRED (>24h, wait for next cycle)
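The Active/EXPIRED status follows from the 24-hour token lifetime. A minimal sketch of that check, assuming each pending entry carries a timezone-aware creation timestamp (the function name `pending_status` is hypothetical):

```python
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=24)


def pending_status(created_at: datetime, now: datetime) -> str:
    """Return "EXPIRED" once the entry is older than 24 hours, else "Active"."""
    return "EXPIRED" if now - created_at > TOKEN_TTL else "Active"


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(pending_status(created, created + timedelta(hours=3)))
print(pending_status(created, created + timedelta(hours=25)))
```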
When a recommendation is generated, you receive an HTML email with:
- Subject: [Haiven] Memory Remediation: [Action] — [Container]
Every approval link contains an HMAC-SHA256 signature:
- Tokens are cryptographically signed with REMEDIATION_SECRET
- Tampered tokens are rejected with 403 Forbidden
- Tokens include container name, action, new limit, and timestamp
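The signing scheme can be illustrated with Python's standard `hmac` module. This is a sketch under assumptions: the guide does not describe how the service serializes token fields, so the canonical JSON encoding and the function names `sign_payload` and `verify_payload` are hypothetical. Only the use of HMAC-SHA256, the shared secret, and the four token fields come from this guide.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, secret: str) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    # Sorting keys makes the signature independent of dict ordering.
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_payload(payload: dict, signature: str, secret: str) -> bool:
    """Check a signature in constant time; False means the token was altered."""
    expected = sign_payload(payload, secret)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


payload = {
    "container": "api",
    "action": "INCREASE_LIMIT",
    "new_limit": "2g",
    "timestamp": 1700000000,
}
secret = "example-secret"  # stands in for REMEDIATION_SECRET
signature = sign_payload(payload, secret)

tampered = dict(payload, new_limit="64g")
print(verify_payload(payload, signature, secret))
print(verify_payload(tampered, signature, secret))
```

A failed verification is the case the service answers with 403 Forbidden. Using `hmac.compare_digest` rather than `==` avoids leaking timing information about how much of the signature matched.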
The service has very limited scope:
- Only modifies: deploy.resources.limits.memory in docker-compose.yml
- Never touches: Any other compose fields, environment variables, or volumes
Every action is logged with:
- Timestamp, token, container name
- Action (APPROVED/REJECTED)
- Old and new memory limits
- Result status (success/error/restart_failed/rejected)
Audit log location: /mnt/storage/remediation-approval/data/audit.json
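An audit entry with the fields listed above might look like the following. The JSON key names and the helper `build_audit_entry` are hypothetical; the actual schema of audit.json is not documented here. The allowed action and result values are the ones listed above.

```python
import json
from datetime import datetime, timezone
from typing import Optional

VALID_ACTIONS = {"APPROVED", "REJECTED"}
VALID_RESULTS = {"success", "error", "restart_failed", "rejected"}


def build_audit_entry(
    token: str,
    container: str,
    action: str,
    old_limit: Optional[str],
    new_limit: Optional[str],
    result: str,
    now: Optional[datetime] = None,
) -> dict:
    """Assemble one audit record, rejecting unknown action or result values."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if result not in VALID_RESULTS:
        raise ValueError(f"unknown result: {result}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "token": token,
        "container": container,
        "action": action,
        "old_limit": old_limit,
        "new_limit": new_limit,
        "result": result,
    }


entry = build_audit_entry("example-token", "api", "APPROVED", "1g", "2g", "success")
print(json.dumps(entry, indent=2))
```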
Access Cronicle at: https://scheduler.haiven.site
Navigate to Schedule > Memory Remediation > Edit.
Set in the Cronicle job's Environment tab:
REMEDIATION_SECRET=<same-as-fastapi-env>
LITELLM_MASTER_KEY=<from-litellm/.env>
| Variable | Default | Description |
|---|---|---|
| REMEDIATION_THRESHOLD | 85 | Memory usage % threshold |
| LITELLM_MODEL | qwen3-30b-a3b | LLM model for analysis |
| APPROVAL_BASE_URL | https://remediation.haiven.site | Base URL for email links |
| NOTIFICATION_EMAIL | elijah@elijahryoung.com | Admin email address |
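For orientation, this is one way a script could read these variables with the defaults from the table. It is a sketch, not the real memory-remediation.py code: the function name `load_config` is hypothetical, treating REMEDIATION_SECRET and LITELLM_MASTER_KEY as required is an assumption, and NOTIFICATION_EMAIL is omitted for brevity (it would follow the same pattern).

```python
import os
from typing import Mapping

# Assumed to be mandatory; the guide only says they are set in the job environment.
REQUIRED = ("REMEDIATION_SECRET", "LITELLM_MASTER_KEY")

# Defaults taken from the table of optional variables.
DEFAULTS = {
    "REMEDIATION_THRESHOLD": "85",
    "LITELLM_MODEL": "qwen3-30b-a3b",
    "APPROVAL_BASE_URL": "https://remediation.haiven.site",
}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read optional settings with defaults and fail early on missing secrets."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing required variables: {', '.join(missing)}")

    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    threshold = float(config["REMEDIATION_THRESHOLD"])
    if not 0 < threshold <= 100:
        raise ValueError("REMEDIATION_THRESHOLD must be a percentage in (0, 100]")
    config["REMEDIATION_THRESHOLD"] = threshold
    return config


example_env = {
    "REMEDIATION_SECRET": "example-secret",
    "LITELLM_MASTER_KEY": "example-key",
    "REMEDIATION_THRESHOLD": "90",
}
print(load_config(example_env))
```

Passing the environment in as a mapping keeps the function easy to exercise without touching the real process environment.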
Test the system before enabling automatic emails:
- Add --dry-run to the job's Arguments field
- Review the generated reports in /mnt/storage/remediation/reports/
# Check smtp-relay status
docker logs smtp-relay
# Check latest report
ls -lt /mnt/storage/remediation/reports/ | head -5
# Check service logs
docker logs remediation-approval
# Common causes: compose file not found, permission denied, service name mismatch
- Containers using mem_limit: add deploy.resources.limits.memory to their compose files
- Check REMEDIATION_THRESHOLD in Cronicle env
curl 'http://prometheus:9090/api/v1/query?query=container_memory_usage_bytes'
docker exec cronicle python3 /mnt/apps/docker/infrastructure/cronicle/scripts/memory-remediation.py --dry-run
Reports: /mnt/storage/remediation/reports/memory-remediation-*.json
Tokens: /mnt/storage/remediation/reports/tokens/*.json
ls -lt /mnt/storage/remediation/reports/*.json | head -1 | xargs cat | python3 -m json.tool
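The same lookup can be done from Python when the report needs further processing. A minimal sketch, assuming reports follow the memory-remediation-*.json naming shown above; the function name `load_latest_report` is hypothetical and nothing is assumed about the report's internal structure.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_latest_report(report_dir: str) -> Optional[Any]:
    """Return the parsed contents of the most recently modified report.

    Returns None when the directory contains no matching report files.
    """
    reports = list(Path(report_dir).glob("memory-remediation-*.json"))
    if not reports:
        return None
    newest = max(reports, key=lambda path: path.stat().st_mtime)
    with newest.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    report = load_latest_report("/mnt/storage/remediation/reports")
    print(json.dumps(report, indent=2))
```

Because the glob is not recursive, files under the tokens/ subdirectory are not picked up.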
Q: How often does the system check?
A: Every 5 minutes (configurable in Cronicle).
Q: What if I don't approve a recommendation?
A: After 24h the token expires. If the issue persists, a new recommendation is generated.
Q: Can I set a different memory limit than recommended?
A: Not through the UI. Reject the recommendation and manually edit the compose file.
Q: What does risk score mean?
A: 1-3 = low risk, 4-6 = medium, 7-10 = high. Higher scores indicate more uncertainty or larger changes.
Q: Where is the REMEDIATION_SECRET?
A: In /mnt/apps/docker/infrastructure/remediation-approval/.env and Cronicle job env (must match).
| Resource | Location |
|---|---|
| Approval UI | https://remediation.haiven.site |
| External Access | https://remediation.haiven.site |
| Cronicle Scheduler | https://scheduler.haiven.site |
| Audit Log | /mnt/storage/remediation-approval/data/audit.json |
| Reports | /mnt/storage/remediation/reports/ |
| Service Logs | docker logs -f remediation-approval |
Generated by haiven-service-onboarding plugin