AI-optimized web scraping with JavaScript rendering, LLM-friendly markdown conversion, and RAG pipeline integration.
Crawl4AI is an open-source web scraping solution optimized for AI applications:
- LLM-optimized output - Clean markdown perfect for RAG pipelines
- JavaScript rendering - Playwright-based for modern SPAs
- Fast and efficient - Parallel crawling with smart caching
- Structured extraction - JSON schema-based data extraction
- 100% private - No telemetry, all processing local
- CPU-based - No GPU required
| Endpoint | URL |
|---|---|
| Web UI / API | https://crawler.haiven.local |
| Direct Port | http://localhost:11235 |
| Health Check | https://crawler.haiven.local/health |
cd /mnt/apps/docker/ai/crawl4ai
# Start the service
docker compose up -d
# Watch logs (first start downloads Chromium browser ~200MB)
docker compose logs -f crawl4ai
Note: First startup takes 5-10 minutes to download Chromium browser.
curl http://localhost:11235/health
curl -X POST http://localhost:11235/crawl \
-H "Content-Type: application/json" \
-d '{"urls": ["https://example.com"]}' | jq '.results[0].markdown.raw_markdown'
curl -X POST http://localhost:11235/crawl \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://react-app.com"],
"js_code": "await new Promise(r => setTimeout(r, 3000));",
"wait_for": "css:.content"
}' | jq '.results[0].markdown.raw_markdown'
curl -X POST http://localhost:11235/crawl \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com",
"https://example.org"
]
}' | jq '.results[] | {url: .url, title: .metadata.title}'
See .env.example for all configuration options.
| Variable | Default | Description |
|---|---|---|
CRAWL4AI_API_TOKEN |
required | API authentication token |
CRAWL4AI_MAX_CONCURRENT_REQUESTS |
10 | Max parallel crawl requests |
| Path | Purpose |
|---|---|
/mnt/storage/crawler/cache |
Cached web pages |
/mnt/storage/crawler/outputs |
Crawled content (markdown, JSON) |
/mnt/storage/crawler/sessions |
Browser sessions and cookies |
docker logs -f crawl4ai
rm -rf /mnt/storage/crawler/cache/*
cd /mnt/apps/docker/ai/crawl4ai
docker compose pull
docker compose down
docker compose up -d
Generated by haiven-service-onboarding plugin