Your gateway to AI-powered conversations, speech synthesis, transcription, and web search on Haiven
LiteLLM is your central hub for accessing all AI capabilities on Haiven. It provides a standard OpenAI-compatible API, so any tool or library that works with OpenAI will work with LiteLLM.
| What | Where |
|---|---|
| API Endpoint | https://llm.haiven.local/v1 |
| Admin Dashboard | https://litellm.haiven.local/ui |
| Health Status | https://llm.haiven.local/health |
| TTS Pass-through (Piper) | https://llm.haiven.local/tts/v1/audio/speech |
| TTS Pass-through (StyleTTS2) | https://llm.haiven.local/styletts2/v1/audio/speech |
| STT Pass-through | https://llm.haiven.local/stt/v1/audio/transcriptions |
# Simple chat request
curl https://llm.haiven.local/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "qwen3-30b-a3b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
from openai import OpenAI
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
response = client.chat.completions.create(
model="qwen3-30b-a3b",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the weather like?"}
]
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://llm.haiven.local/v1',
apiKey: 'YOUR_API_KEY',
});
const response = await openai.chat.completions.create({
model: 'qwen3-30b-a3b',
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
If you're running code inside the Haiven Docker network:
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm:4000/v1",  # internal service name and port
    api_key="YOUR_API_KEY"
)
The admin UI at https://litellm.haiven.local/ui is organized into these tabs:

| Tab | What It Shows |
|---|---|
| Dashboard | Usage statistics, request counts, token usage |
| Keys | Create and manage API keys |
| Models | Available models and their configurations |
| Usage | Detailed spend and usage logs |
| Settings | Configuration options |
curl https://llm.haiven.local/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "How do I read a file in Python?"}
],
"temperature": 0.7,
"max_tokens": 500
}'
Get responses as they're generated:
curl https://llm.haiven.local/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "qwen3-30b-a3b",
"messages": [{"role": "user", "content": "Write a short story"}],
"stream": true
}'
curl https://llm.haiven.local/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "qwen3-30b-a3b",
"messages": [
{"role": "system", "content": "You are an expert Python programmer. Write clean, efficient code."},
{"role": "user", "content": "Write a function to calculate Fibonacci numbers with memoization"}
],
"temperature": 0.2
}'
from openai import OpenAI
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
messages = [
{"role": "system", "content": "You are a helpful tutor."}
]
# First turn
messages.append({"role": "user", "content": "Explain what a variable is in programming"})
response = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant", "content": response.choices[0].message.content})
# Second turn
messages.append({"role": "user", "content": "Can you give me an example?"})
response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
Get structured JSON output:
curl https://llm.haiven.local/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "qwen3-30b-a3b",
"messages": [
{"role": "system", "content": "You are a helpful assistant that responds in JSON format."},
{"role": "user", "content": "List 3 programming languages with their key features"}
],
"response_format": {"type": "json_object"}
}'
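The same request from Python, parsing the returned JSON string (a minimal sketch; the model still has to honor the JSON instruction for `json.loads` to succeed):

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.haiven.local/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="qwen3-30b-a3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in JSON format."},
        {"role": "user", "content": "List 3 programming languages with their key features"}
    ],
    response_format={"type": "json_object"}
)

# The message content is a JSON string; parse it into a Python dict
data = json.loads(response.choices[0].message.content)
print(data)
```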
| Model Name | Best For | Size |
|---|---|---|
| `qwen3-30b-a3b` | General purpose, coding | 30B params |
| `qwen2.5-14b-instruct` | Fast responses | 14B params |
| `gemma3-27b` | General purpose | 27B params |
| `gpt-oss-120b` | Complex reasoning | 120B params |
For compatibility with OpenAI tools, you can use these aliases:
| Alias | Maps To |
|---|---|
| `gpt-4` | `qwen3-30b-a3b` |
| `gpt-4-turbo` | `qwen3-30b-a3b` |
| `gpt-3.5-turbo` | `qwen2.5-14b-instruct` |
curl https://llm.haiven.local/v1/models \
-H "Authorization: Bearer $API_KEY" | jq '.data[].id'
Choosing a model:

- For fast responses: `gpt-3.5-turbo` or `qwen2.5-14b-instruct`
- For most tasks: `gpt-4` or `qwen3-30b-a3b`
- For complex reasoning: `gpt-oss-120b` (slower but more capable)

LiteLLM provides access to three TTS engines, each with different characteristics.
| Model | Engine | Speed | Quality | GPU Required | Best For |
|---|---|---|---|---|---|
| `tts-1` | Piper (ONNX) | Very Fast | Good | No (CPU) | Quick responses, notifications |
| `tts-1-hd` | XTTS | Medium | High | No (CPU) | Voice cloning, professional audio |
| `styletts2` | StyleTTS2 | Slow | Highest | Yes (RTX 4090) | Maximum quality, style transfer |
All three engines support these OpenAI-compatible voice names:
| Voice | Description | Character |
|---|---|---|
| `alloy` | Neutral, balanced | Professional, clear |
| `echo` | Male, warm | Approachable, friendly |
| `fable` | British accent | Storyteller, narrator |
| `onyx` | Male, deep | Authoritative, commanding |
| `nova` | Female, friendly | Conversational, warm |
| `shimmer` | Female, expressive | Energetic, enthusiastic |
# Using Piper TTS (fastest)
curl https://llm.haiven.local/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "tts-1",
"input": "Hello! Welcome to Haiven.",
"voice": "alloy"
}' --output hello.mp3
# Using StyleTTS2 (highest quality)
curl https://llm.haiven.local/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "styletts2",
"input": "This is professional-quality neural speech synthesis.",
"voice": "nova"
}' --output professional.wav
from openai import OpenAI
from pathlib import Path
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
# Quick TTS with Piper
def quick_speak(text: str, output_file: str = "output.mp3"):
"""Fast TTS for notifications and quick responses."""
response = client.audio.speech.create(
model="tts-1",
voice="alloy",
input=text
)
response.stream_to_file(Path(output_file))
return output_file
# High-quality TTS with StyleTTS2
def professional_speak(text: str, voice: str = "nova", output_file: str = "professional.wav"):
"""High-quality TTS for professional content."""
response = client.audio.speech.create(
model="styletts2",
voice=voice,
input=text
)
response.stream_to_file(Path(output_file))
return output_file
# Generate all voices for comparison
def generate_voice_samples(text: str):
"""Generate samples of all available voices."""
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
for voice in voices:
response = client.audio.speech.create(
model="tts-1",
voice=voice,
input=text
)
response.stream_to_file(Path(f"sample_{voice}.mp3"))
print(f"Generated sample_{voice}.mp3")
# Usage
quick_speak("You have a new message.")
professional_speak("Welcome to our quarterly earnings call.")
generate_voice_samples("The quick brown fox jumps over the lazy dog.")
| Format | Extension | Description | File Size |
|---|---|---|---|
| `mp3` | .mp3 | Most compatible, lossy | Medium |
| `opus` | .opus | Best compression, lossy | Small |
| `aac` | .aac | Apple devices, lossy | Medium |
| `flac` | .flac | Lossless compression | Large |
| `wav` | .wav | Uncompressed, lossless | Very Large |
| `pcm` | .pcm | Raw audio, lossless | Very Large |
# Generate in different formats
curl https://llm.haiven.local/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "tts-1",
"input": "Testing different audio formats.",
"voice": "alloy",
"response_format": "opus"
}' --output speech.opus
# High-quality WAV for editing
curl https://llm.haiven.local/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "styletts2",
"input": "Uncompressed audio for post-processing.",
"voice": "nova",
"response_format": "wav"
}' --output speech.wav
Adjust speech speed with the speed parameter (0.25 to 4.0):
# Slow narration (0.75x speed)
curl https://llm.haiven.local/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "tts-1",
"input": "This is spoken slowly for clarity and emphasis.",
"voice": "fable",
"speed": 0.75
}' --output slow.mp3
# Fast reading (1.5x speed)
curl https://llm.haiven.local/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "tts-1",
"input": "This is spoken quickly for time-sensitive information.",
"voice": "alloy",
"speed": 1.5
}' --output fast.mp3
from openai import OpenAI
from pathlib import Path
client = OpenAI(base_url="https://llm.haiven.local/v1", api_key="YOUR_API_KEY")
# 1. Notification Sound
def create_notification(message: str):
response = client.audio.speech.create(
model="tts-1", # Fast
voice="alloy",
input=message,
speed=1.2 # Slightly faster
)
response.stream_to_file(Path("notification.mp3"))
# 2. Podcast Intro
def create_podcast_intro(show_name: str, episode_title: str):
script = f"Welcome to {show_name}. Today's episode: {episode_title}."
response = client.audio.speech.create(
model="styletts2", # High quality
voice="onyx", # Authoritative
input=script
)
response.stream_to_file(Path("podcast_intro.wav"))
# 3. Audiobook Chapter
def narrate_chapter(text: str, chapter_num: int):
response = client.audio.speech.create(
model="styletts2",
voice="fable", # British narrator
input=text,
response_format="flac", # Lossless for editing
speed=0.9 # Slightly slower for clarity
)
response.stream_to_file(Path(f"chapter_{chapter_num}.flac"))
# 4. Voice Assistant Response
def assistant_response(text: str):
response = client.audio.speech.create(
model="tts-1", # Fast response
voice="nova", # Friendly female
input=text
)
return response.content # Return bytes for immediate playback
# 5. Multi-language Announcement (using different voices)
def announcement(en_text: str, output_prefix: str):
voices_per_region = {
"us": "alloy",
"uk": "fable",
"casual": "nova"
}
for region, voice in voices_per_region.items():
response = client.audio.speech.create(
model="tts-1",
voice=voice,
input=en_text
)
response.stream_to_file(Path(f"{output_prefix}_{region}.mp3"))
import asyncio
from openai import AsyncOpenAI
from pathlib import Path
async_client = AsyncOpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
async def batch_tts(texts: list[str], voice: str = "alloy"):
"""Generate TTS for multiple texts concurrently."""
async def generate_one(idx: int, text: str):
response = await async_client.audio.speech.create(
model="tts-1",
voice=voice,
input=text
)
output_path = Path(f"output_{idx}.mp3")
response.stream_to_file(output_path)
return output_path
tasks = [generate_one(i, text) for i, text in enumerate(texts)]
results = await asyncio.gather(*tasks)
return results
# Usage
texts = [
"First notification message.",
"Second notification message.",
"Third notification message."
]
asyncio.run(batch_tts(texts))
LiteLLM provides GPU-accelerated speech recognition using Faster-Whisper.
| Model | Speed | Accuracy | Languages |
|---|---|---|---|
| `whisper-1` | Fast | High | 99+ languages |
| `whisper-large-v3` | Medium | Highest | 99+ languages |
Both models run on GPU for fast transcription and support automatic language detection.
# Simple transcription
curl https://llm.haiven.local/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-F "file=@recording.mp3" \
-F "model=whisper-1"
Response:
{
"text": "Hello, this is a test recording of the speech recognition system."
}
# Get word-level timestamps
curl https://llm.haiven.local/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-F "file=@interview.wav" \
-F "model=whisper-large-v3" \
-F "response_format=verbose_json" \
-F "timestamp_granularities[]=word"
Response:
{
"task": "transcribe",
"language": "english",
"duration": 15.5,
"text": "Welcome to the interview.",
"words": [
{"word": "Welcome", "start": 0.0, "end": 0.5},
{"word": "to", "start": 0.5, "end": 0.7},
{"word": "the", "start": 0.7, "end": 0.9},
{"word": "interview", "start": 0.9, "end": 1.5}
]
}
# Get segment-level timestamps (sentences/phrases)
curl https://llm.haiven.local/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-F "file=@podcast.mp3" \
-F "model=whisper-large-v3" \
-F "response_format=verbose_json" \
-F "timestamp_granularities[]=segment"
Response:
{
"task": "transcribe",
"language": "english",
"duration": 120.5,
"text": "Welcome to the show. Today we discuss AI.",
"segments": [
{
"id": 0,
"seek": 0,
"start": 0.0,
"end": 2.5,
"text": "Welcome to the show.",
"tokens": [50364, 5765, 281, 264, 1656, 13],
"temperature": 0.0,
"avg_logprob": -0.25,
"compression_ratio": 1.2,
"no_speech_prob": 0.01
},
{
"id": 1,
"seek": 250,
"start": 2.5,
"end": 5.0,
"text": "Today we discuss AI.",
"tokens": [50364, 2692, 321, 2248, 7318, 13],
"temperature": 0.0,
"avg_logprob": -0.18,
"compression_ratio": 1.1,
"no_speech_prob": 0.02
}
]
}
# Transcribe Spanish audio
curl https://llm.haiven.local/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-F "file=@spanish_audio.mp3" \
-F "model=whisper-large-v3" \
-F "language=es"
# Transcribe Japanese audio
curl https://llm.haiven.local/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-F "file=@japanese_audio.wav" \
-F "model=whisper-large-v3" \
-F "language=ja"
# Translate any language audio to English text
curl https://llm.haiven.local/v1/audio/translations \
-H "Authorization: Bearer $API_KEY" \
-F "file=@french_speech.mp3" \
-F "model=whisper-large-v3"
| Format | Description | Use Case |
|---|---|---|
| `json` | Simple JSON with text | Default, simple integration |
| `text` | Plain text only | Minimal processing |
| `srt` | SubRip subtitle format | Video subtitles |
| `verbose_json` | Full details with timestamps | Analytics, editing |
| `vtt` | WebVTT subtitle format | Web video players |
# Generate SRT subtitles
curl https://llm.haiven.local/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-F "file=@video_audio.mp3" \
-F "model=whisper-large-v3" \
-F "response_format=srt" \
--output subtitles.srt
# Generate VTT subtitles for web
curl https://llm.haiven.local/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-F "file=@webinar.mp3" \
-F "model=whisper-large-v3" \
-F "response_format=vtt" \
--output captions.vtt
from openai import OpenAI
from pathlib import Path
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
# 1. Simple Transcription
def transcribe_audio(file_path: str) -> str:
"""Transcribe audio file to text."""
with open(file_path, "rb") as audio_file:
transcript = client.audio.transcriptions.create(
model="whisper-1",
file=audio_file
)
return transcript.text
# 2. Transcription with Timestamps
def transcribe_with_timestamps(file_path: str) -> dict:
"""Transcribe with word-level timestamps."""
with open(file_path, "rb") as audio_file:
transcript = client.audio.transcriptions.create(
model="whisper-large-v3",
file=audio_file,
response_format="verbose_json",
timestamp_granularities=["word", "segment"]
)
return transcript
# 3. Generate Subtitles
def generate_subtitles(file_path: str, format: str = "srt") -> str:
"""Generate subtitle file from audio."""
with open(file_path, "rb") as audio_file:
result = client.audio.transcriptions.create(
model="whisper-large-v3",
file=audio_file,
response_format=format
)
output_path = Path(file_path).stem + f".{format}"
with open(output_path, "w") as f:
f.write(result)
return output_path
# 4. Translate Foreign Audio to English
def translate_to_english(file_path: str) -> str:
"""Translate foreign language audio to English text."""
with open(file_path, "rb") as audio_file:
translation = client.audio.translations.create(
model="whisper-large-v3",
file=audio_file
)
return translation.text
# 5. Transcribe with Specific Language
def transcribe_language(file_path: str, language: str) -> str:
"""Transcribe audio in a specific language."""
with open(file_path, "rb") as audio_file:
transcript = client.audio.transcriptions.create(
model="whisper-large-v3",
file=audio_file,
language=language # e.g., "es", "fr", "de", "ja", "zh"
)
return transcript.text
# 6. Meeting Transcription with Speaker Diarization Prep
def meeting_transcription(file_path: str) -> dict:
"""Transcribe meeting with segments for speaker identification."""
with open(file_path, "rb") as audio_file:
result = client.audio.transcriptions.create(
model="whisper-large-v3",
file=audio_file,
response_format="verbose_json",
timestamp_granularities=["segment"]
)
# Process segments for meeting minutes
segments = []
for seg in result.segments:
segments.append({
"start": seg["start"],
"end": seg["end"],
"text": seg["text"],
"duration": seg["end"] - seg["start"]
})
return {
"full_text": result.text,
"duration": result.duration,
"segments": segments
}
# Usage examples
text = transcribe_audio("recording.mp3")
print(f"Transcription: {text}")
detailed = transcribe_with_timestamps("interview.wav")
print(f"Duration: {detailed.duration}s")
for word in detailed.words[:5]:
print(f" {word['word']} ({word['start']:.2f}s - {word['end']:.2f}s)")
subtitles = generate_subtitles("video.mp3", "srt")
print(f"Subtitles saved to: {subtitles}")
english = translate_to_english("spanish_podcast.mp3")
print(f"English translation: {english}")
| Format | Extension | Max Size |
|---|---|---|
| MP3 | .mp3 | 25MB |
| MP4 Audio | .mp4, .m4a | 25MB |
| WAV | .wav | 25MB |
| WebM | .webm | 25MB |
| MPEG | .mpeg, .mpga | 25MB |
| OGG | .ogg | 25MB |
| FLAC | .flac | 25MB |
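A quick client-side check against these limits can save a failed upload. A sketch (validate_audio_file is a hypothetical helper, not part of the API):

```python
from pathlib import Path

# Formats and size limit taken from the table above
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".mpeg", ".mpga", ".ogg", ".flac"}
MAX_SIZE_BYTES = 25 * 1024 * 1024  # 25MB

def validate_audio_file(file_path: str) -> None:
    """Raise ValueError if the file can't be sent to the transcription endpoint."""
    path = Path(file_path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {path.suffix}")
    if path.stat().st_size > MAX_SIZE_BYTES:
        raise ValueError(f"File too large: {path.stat().st_size} bytes (limit is 25MB)")

validate_audio_file("recording.mp3")
```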
LiteLLM integrates with SearXNG to enable AI models to search the web and provide up-to-date information.
curl https://llm.haiven.local/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "qwen3-30b-a3b-q8-abl",
"messages": [
{"role": "user", "content": "What are the latest developments in AI this week?"}
],
"tools": [{
"type": "function",
"function": {
"name": "searxng-search",
"description": "Search the web for current information"
}
}],
"tool_choice": "auto"
}'
from openai import OpenAI
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
def search_and_answer(question: str) -> str:
"""Use AI with web search to answer questions."""
response = client.chat.completions.create(
model="qwen3-30b-a3b-q8-abl", # Model with function calling support
messages=[
{"role": "system", "content": "You are a helpful assistant with web search capabilities. Use search when you need current information."},
{"role": "user", "content": question}
],
tools=[{
"type": "function",
"function": {
"name": "searxng-search",
"description": "Search the web for information",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
}
},
"required": ["query"]
}
}
}],
tool_choice="auto"
)
return response.choices[0].message.content
# Usage
answer = search_and_answer("What are the latest news about OpenAI?")
print(answer)
answer = search_and_answer("What's the current price of Bitcoin?")
print(answer)
from openai import OpenAI
import json
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
def research_topic(topic: str, depth: str = "overview") -> dict:
"""Research a topic using AI and web search."""
prompts = {
"overview": f"Provide a brief overview of {topic} with current information.",
"detailed": f"Provide a comprehensive analysis of {topic} including recent developments, key players, and future outlook.",
"news": f"What are the latest news and developments about {topic}?"
}
response = client.chat.completions.create(
model="qwen3-30b-a3b-q8-abl",
messages=[
{"role": "system", "content": "You are a research assistant. Search the web for current information and provide well-sourced responses."},
{"role": "user", "content": prompts.get(depth, prompts["overview"])}
],
tools=[{
"type": "function",
"function": {
"name": "searxng-search",
"description": "Search the web"
}
}],
tool_choice="auto"
)
return {
"topic": topic,
"depth": depth,
"response": response.choices[0].message.content,
"model": response.model,
"usage": {
"prompt_tokens": response.usage.prompt_tokens,
"completion_tokens": response.usage.completion_tokens
}
}
# Usage
research = research_topic("quantum computing", "detailed")
print(research["response"])
Pass-through endpoints provide direct access to backend services, bypassing LiteLLM's routing logic. Useful for:
- Avoiding API key requirements for internal services
- Direct access when you know which backend to use
- Reduced latency (no routing overhead)
| Endpoint | Backend | Purpose |
|---|---|---|
| `/tts/v1/audio/speech` | openedai-speech | Direct Piper/XTTS TTS |
| `/styletts2/v1/audio/speech` | styletts2-openai | Direct StyleTTS2 TTS |
| `/stt/v1/audio/transcriptions` | faster-whisper | Direct Whisper STT |
# Direct Piper TTS (no API key needed on internal network)
curl https://llm.haiven.local/tts/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Direct access to Piper TTS.",
"voice": "alloy"
}' --output direct_piper.mp3
# Direct StyleTTS2 (no API key needed on internal network)
curl https://llm.haiven.local/styletts2/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Direct access to StyleTTS2.",
"voice": "nova"
}' --output direct_styletts2.wav
# Direct Whisper STT (no API key needed on internal network)
curl https://llm.haiven.local/stt/v1/audio/transcriptions \
-F "file=@audio.mp3" \
-F "model=whisper-1"
| Scenario | Use Pass-through | Use Standard API |
|---|---|---|
| Internal automation | Yes | - |
| Usage tracking needed | - | Yes |
| API key management | - | Yes |
| Lowest latency | Yes | - |
| Budget limits | - | Yes |
| Langfuse tracing | - | Yes |
| Type | Purpose | Who Creates It |
|---|---|---|
| Master Key | Full admin access | System admin |
| Virtual Key | Limited access | Created via API/UI |
curl https://llm.haiven.local/key/generate \
-H "Authorization: Bearer $MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"models": ["gpt-4", "gpt-3.5-turbo", "tts-1", "whisper-1"],
"user_id": "my-project",
"max_budget": 50.00,
"duration": "30d",
"metadata": {"project": "my-app", "team": "engineering"}
}'
# TTS-only key
curl https://llm.haiven.local/key/generate \
-H "Authorization: Bearer $MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"models": ["tts-1", "tts-1-hd", "styletts2"],
"user_id": "tts-service",
"max_budget": 10.00,
"duration": "7d"
}'
# STT-only key
curl https://llm.haiven.local/key/generate \
-H "Authorization: Bearer $MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"models": ["whisper-1", "whisper-large-v3"],
"user_id": "transcription-service",
"max_budget": 20.00,
"duration": "30d"
}'
# Full access key for development
curl https://llm.haiven.local/key/generate \
-H "Authorization: Bearer $MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"models": [],
"user_id": "developer",
"max_budget": 100.00,
"duration": "90d",
"metadata": {"environment": "development"}
}'
curl https://llm.haiven.local/key/info \
-H "Authorization: Bearer $YOUR_KEY"
curl -X POST https://llm.haiven.local/key/delete \
-H "Authorization: Bearer $MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{"keys": ["sk-..."]}'
from openai import OpenAI
from pathlib import Path
import tempfile
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
class VoiceChatbot:
def __init__(self, voice: str = "nova"):
self.voice = voice
self.messages = [
{"role": "system", "content": "You are a friendly voice assistant. Keep responses concise and conversational."}
]
def transcribe(self, audio_path: str) -> str:
"""Convert user speech to text."""
with open(audio_path, "rb") as audio_file:
transcript = client.audio.transcriptions.create(
model="whisper-1",
file=audio_file
)
return transcript.text
def think(self, user_text: str) -> str:
"""Generate AI response."""
self.messages.append({"role": "user", "content": user_text})
response = client.chat.completions.create(
model="gpt-4",
messages=self.messages,
max_tokens=150
)
assistant_response = response.choices[0].message.content
self.messages.append({"role": "assistant", "content": assistant_response})
return assistant_response
def speak(self, text: str, output_path: str = None) -> str:
"""Convert AI response to speech."""
response = client.audio.speech.create(
model="tts-1", # Fast for conversation
voice=self.voice,
input=text
)
if output_path is None:
output_path = tempfile.mktemp(suffix=".mp3")
response.stream_to_file(Path(output_path))
return output_path
def chat(self, audio_input_path: str) -> tuple[str, str, str]:
"""Full voice chat cycle: listen -> think -> speak."""
# 1. Transcribe user speech
user_text = self.transcribe(audio_input_path)
# 2. Generate response
response_text = self.think(user_text)
# 3. Convert to speech
audio_output_path = self.speak(response_text)
return user_text, response_text, audio_output_path
# Usage
bot = VoiceChatbot(voice="nova")
user_said, bot_said, audio_file = bot.chat("user_recording.mp3")
print(f"User: {user_said}")
print(f"Bot: {bot_said}")
print(f"Audio: {audio_file}")
from openai import OpenAI
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
def process_podcast(audio_path: str) -> dict:
"""Transcribe podcast and generate summary with highlights."""
# 1. Transcribe with timestamps
with open(audio_path, "rb") as audio_file:
transcript = client.audio.transcriptions.create(
model="whisper-large-v3",
file=audio_file,
response_format="verbose_json",
timestamp_granularities=["segment"]
)
# 2. Generate summary
summary_response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are an expert at summarizing podcast content. Create concise, informative summaries."},
{"role": "user", "content": f"Summarize this podcast transcript in 3-5 bullet points:\n\n{transcript.text}"}
]
)
# 3. Extract key moments
moments_response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "Identify the 3 most interesting or important moments in this transcript."},
{"role": "user", "content": transcript.text}
]
)
return {
"duration": transcript.duration,
"full_transcript": transcript.text,
"segments": transcript.segments,
"summary": summary_response.choices[0].message.content,
"key_moments": moments_response.choices[0].message.content
}
# Usage
result = process_podcast("episode_42.mp3")
print(f"Duration: {result['duration']:.1f} seconds")
print(f"\nSummary:\n{result['summary']}")
print(f"\nKey Moments:\n{result['key_moments']}")
from openai import OpenAI
from pathlib import Path
client = OpenAI(
base_url="https://llm.haiven.local/v1",
api_key="YOUR_API_KEY"
)
def create_audio_content(topic: str, style: str = "educational") -> dict:
"""Generate script and audio content on any topic."""
style_prompts = {
"educational": "Create an informative, clear explanation suitable for learning.",
"entertaining": "Create an engaging, fun narrative that entertains while informing.",
"professional": "Create a polished, business-appropriate presentation.",
"conversational": "Create a casual, friendly discussion as if talking to a friend."
}
voice_for_style = {
"educational": "fable",
"entertaining": "nova",
"professional": "onyx",
"conversational": "alloy"
}
# 1. Generate script
script_response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": f"You are a content creator. {style_prompts.get(style, style_prompts['educational'])}"},
{"role": "user", "content": f"Create a 1-minute script about: {topic}"}
],
max_tokens=500
)
script = script_response.choices[0].message.content
# 2. Generate audio
voice = voice_for_style.get(style, "alloy")
audio_response = client.audio.speech.create(
model="styletts2", # High quality for content
voice=voice,
input=script,
response_format="wav"
)
output_path = Path(f"{topic.replace(' ', '_')}_{style}.wav")
audio_response.stream_to_file(output_path)
return {
"topic": topic,
"style": style,
"script": script,
"audio_file": str(output_path),
"voice": voice
}
# Usage
content = create_audio_content("quantum computing basics", "educational")
print(f"Script:\n{content['script']}")
print(f"\nAudio saved to: {content['audio_file']}")
messages = [
{"role": "system", "content": """You are an expert Python programmer.
- Write clean, readable code
- Include docstrings and type hints
- Handle errors gracefully"""},
{"role": "user", "content": "Write a function to parse CSV files"}
]
| Temperature | Use Case |
|---|---|
| 0.0 - 0.3 | Factual, deterministic (code, math) |
| 0.3 - 0.7 | Balanced (general chat) |
| 0.7 - 1.0 | Creative (stories, brainstorming) |
# Short response expected
response = client.chat.completions.create(
model="gpt-4",
messages=[...],
max_tokens=100 # Limit output length
)
Streaming shows output as it's generated, improving perceived speed:
stream = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Write a long essay"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
import time
from openai import RateLimitError
def chat_with_retry(messages, max_retries=3):
for attempt in range(max_retries):
try:
return client.chat.completions.create(
model="gpt-4",
messages=messages
)
except RateLimitError:
if attempt < max_retries - 1:
time.sleep(2 ** attempt) # Exponential backoff
else:
raise
Use the right model for the job:

- `tts-1` for quick responses, notifications, and previews
- `styletts2` for final production, podcasts, and important content
- `whisper-large-v3` for important transcriptions where accuracy matters

Cause: Invalid or missing API key
Solution:
1. Check your API key is correct
2. Ensure the Authorization header is set: Bearer YOUR_KEY
3. Verify the key hasn't expired
Cause: Requested model doesn't exist or isn't available
Solution:
1. List available models: curl https://llm.haiven.local/v1/models
2. Check spelling of model name
3. Use a model alias like gpt-4
Cause: Too many requests too quickly
Solution:
1. Wait and retry (rate limit is 100 req/s average, 200 burst)
2. Implement exponential backoff
3. Request a higher rate limit from admin
Cause: Service might be down or unreachable
Solution:
1. Check service health: curl https://llm.haiven.local/health
2. Verify you're on the correct network
3. Contact system admin if issues persist
Cause: Model loading or high server load
Solution:
1. First request to a model is slower (model loading)
2. Use smaller models for faster responses
3. Set reasonable max_tokens limits
Cause: Using wrong model or voice for content type
Solution:
1. Use styletts2 for high-quality output
2. Try different voices for your content type
3. Add proper punctuation to input text
Cause: Poor audio quality or wrong language setting
Solution:
1. Improve audio quality if possible
2. Specify the correct language in the request
3. Use whisper-large-v3 for better accuracy
Yes, all requests require an API key. Contact your admin to get one, or create one yourself if you have master key access.
For most tasks, gpt-4 (which maps to qwen3-30b-a3b) is a good default. For faster responses, try gpt-3.5-turbo.
LiteLLM tracks usage but costs depend on your organization's policies. Check your usage in the Admin UI.
Yes! Any OpenAI-compatible client works. Just change the base URL to https://llm.haiven.local/v1.
Visit https://litellm.haiven.local/ui and log in with your API key to see your usage dashboard.
Yes, requests are logged to Langfuse for observability. Contact your admin about data retention policies.
Use the /v1/audio/speech endpoint with model tts-1 (fast), tts-1-hd (better), or styletts2 (best). Choose a voice like alloy, nova, or echo.
Use the /v1/audio/transcriptions endpoint with model whisper-1 or whisper-large-v3. Upload your audio file as a multipart form.
For TTS output: mp3, wav, opus, flac, aac, pcm. For STT input: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac.
Yes! Models with function calling support can use the SearXNG search tool. Include tools in your request to enable web search.
- `tts-1`: Fast Piper-based TTS, CPU only, good for quick responses
- `tts-1-hd`: XTTS with voice cloning, higher quality
- `styletts2`: Neural TTS with style transfer, highest quality, requires GPU

Yes, use the pass-through endpoints (`/tts/...`, `/styletts2/...`, `/stt/...`) for direct access to the backends.
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat with AI |
| `/v1/completions` | POST | Text completion |
| `/v1/models` | GET | List models |
| `/v1/embeddings` | POST | Generate embeddings |
| `/v1/audio/speech` | POST | Text-to-speech |
| `/v1/audio/transcriptions` | POST | Speech-to-text |
| `/v1/audio/translations` | POST | Translate audio to English |
| `/tts/v1/audio/speech` | POST | Direct Piper TTS |
| `/styletts2/v1/audio/speech` | POST | Direct StyleTTS2 |
| `/stt/v1/audio/transcriptions` | POST | Direct Whisper |
| `/health` | GET | Health check |
| Model | Speed | Quality | Backend |
|---|---|---|---|
| `tts-1` | Fast | Good | Piper (CPU) |
| `tts-1-hd` | Medium | High | XTTS (CPU) |
| `styletts2` | Slow | Highest | StyleTTS2 (GPU) |
| Voice | Description |
|---|---|
| `alloy` | Neutral, balanced |
| `echo` | Male, warm |
| `fable` | British accent |
| `onyx` | Male, deep |
| `nova` | Female, friendly |
| `shimmer` | Female, expressive |
| Model | Speed | Accuracy |
|---|---|---|
| `whisper-1` | Fast | High |
| `whisper-large-v3` | Medium | Highest |
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model to use |
| `messages` | array | Conversation history |
| `temperature` | float | Randomness (0-2) |
| `max_tokens` | int | Max response length |
| `stream` | bool | Enable streaming |
| `top_p` | float | Nucleus sampling |
| `frequency_penalty` | float | Reduce repetition |
| `presence_penalty` | float | Encourage new topics |
| Role | Purpose |
|---|---|
| `system` | Set AI behavior/personality |
| `user` | Your input |
| `assistant` | AI's previous responses |
curl https://llm.haiven.local/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}],
"temperature": 0.7,
"max_tokens": 100
}'
- https://litellm.haiven.local/ui (admin dashboard)
- https://docs.haiven.local (LiteLLM section)
- https://ai-ops.haiven.local (observability)