Welcome to the Ultravox integration guide! Ultravox is a powerful Speech-to-Speech (STS) provider that enables ultra-low latency, natural voice conversations with your AI agents.

🚀 What is Ultravox?

Ultravox is a speech-to-speech AI platform that processes audio directly, without separate transcription and speech-generation steps, resulting in:
  • Ultra-low latency responses (sub-second)
  • Natural conversational flow with intelligent turn-taking
  • Real-time interruption handling - users can interrupt the agent naturally
  • Native tool execution - agents can use your configured tools during conversations
  • Multi-language support - 25+ languages
Ultravox is ideal for phone calls and real-time voice interactions where latency and natural conversation flow are critical.

🔧 How Ultravox Works in Convocore

Ultravox uses a server-to-server WebSocket architecture that enables ultra-low latency voice conversations. Unlike traditional voice systems that use separate transcription and speech generation steps, Ultravox processes audio directly for faster, more natural interactions.

⚙️ Configuration Steps

Step 1: Add Your Ultravox API Key (Optional)

Adding your Ultravox API key is optional. With a key, usage is billed directly to your Ultravox account. Without one, usage is billed through Convocore credits.
  1. Navigate to your Workspace Settings in Convocore
  2. Go to the Credentials tab
  3. In the Speech to Speech section, enter your Secret API Key
  4. Click Save

Step 2: Enable Ultravox for Your Agent

  1. Open your agent’s configuration
  2. Go to the Voice Setup page
  3. Click on the Speech Gen tab
  4. Select Ultravox from the provider dropdown

Step 3: Configure Voice Settings

Select a Voice

  1. Browse the available Ultravox voices in the Audio Library
  2. Preview voices using the play button
  3. Click Use Voice to select your preferred voice

Set Language

Configure the language for your agent:
  1. Find the Language dropdown in Ultravox Settings
  2. Select your target language or leave as Auto-detect

Step 4: Advanced Settings (Optional)

Temperature

Control the creativity/randomness of responses:
  • Lower values (0.0-0.3): More focused, deterministic responses
  • Higher values (0.7-1.0): More creative, varied responses
  • Default: Recommended for most use cases
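To make the effect of temperature concrete, here is a minimal, generic sketch of temperature-scaled sampling probabilities. It illustrates the general technique only and is not Ultravox's internal implementation; the function name and the example scores are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat 0 as fully deterministic: all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # focused
high = softmax_with_temperature(logits, 1.0)  # more varied

print("temperature 0.2:", [round(p, 3) for p in low])
print("temperature 1.0:", [round(p, 3) for p in high])
```

At the lower temperature almost all probability lands on the top-scoring option, which is why low values give more repeatable answers and high values give more variety.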

Voice Activity Detection Settings

Fine-tune turn-taking behavior for natural conversations:
| Setting | Description | Default |
|---|---|---|
| Turn Endpoint Delay | How long to wait after the user stops talking before responding | 384ms |
| Minimum Turn Duration | Minimum speaking time before processing | 0ms |
| Interruption Duration | How long user must speak to interrupt | 90ms |
| Activation Threshold | Sensitivity for detecting speech | 0.1 |
The default VAD settings are optimized for natural phone conversations. Only adjust if you experience specific issues with turn-taking.
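The following sketch shows one way these four settings could interact. It is a simplified illustration under stated assumptions, not Ultravox's actual VAD algorithm: the `VadSettings` class, the function names, the 30ms frame length, and the per-frame speech scores are all hypothetical. Only the default values come from the table above.

```python
from dataclasses import dataclass

@dataclass
class VadSettings:
    # Defaults taken from the settings table; field names are hypothetical.
    turn_endpoint_delay_ms: int = 384
    minimum_turn_duration_ms: int = 0
    interruption_duration_ms: int = 90
    activation_threshold: float = 0.1

def longest_speech_run_ms(frame_scores, settings, frame_ms=30):
    """Length in ms of the longest run of frames scored as speech."""
    longest = 0
    current = 0
    for score in frame_scores:
        if score >= settings.activation_threshold:
            current += frame_ms
        else:
            current = 0
        longest = max(longest, current)
    return longest

def is_interruption(frame_scores, settings, frame_ms=30):
    """True if the user spoke continuously long enough to interrupt."""
    run = longest_speech_run_ms(frame_scores, settings, frame_ms)
    return run >= settings.interruption_duration_ms

def user_turn_ended(speech_ms, silence_ms, settings):
    """True once the user spoke long enough and then stayed silent."""
    return (speech_ms >= settings.minimum_turn_duration_ms
            and silence_ms >= settings.turn_endpoint_delay_ms)

defaults = VadSettings()
stricter = VadSettings(interruption_duration_ms=150, activation_threshold=0.15)
scores = [0.05, 0.4, 0.5, 0.6, 0.02]  # hypothetical per-frame speech scores

print(is_interruption(scores, defaults))      # three speech frames = 90ms
print(is_interruption(scores, stricter))      # 90ms is below the 150ms bar
print(user_turn_ended(1200, 400, defaults))   # 400ms of silence >= 384ms
```

In this model, raising the interruption duration or the activation threshold makes the agent harder to interrupt, and lowering the turn endpoint delay makes it respond sooner after the user stops talking.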

🛠 Tool Integration

Ultravox seamlessly integrates with your agent’s configured tools:
  1. Knowledge Base Search: Automatically enabled when KB is active on your agent
  2. Custom Tools: All tools configured on your agent’s current node are available
  3. Natural Execution: The agent naturally integrates tool results into the conversation

💰 Pricing

Ultravox pricing is usage-based and billed per minute of conversation. Below is a complete cost breakdown:
| Component | Cost per min | Description |
|---|---|---|
| Ultravox | $0.007/min | All-in-one voice model (includes STT, TTS, and LLM) |
| Convocore Platform | Variable | Infrastructure, routing, and platform services |
| Twilio (Optional) | $0.01 - $0.02/min | Phone number and telephony costs (only if using phone calls) |

Calculate Your Costs

Use the Pricing Calculator in your dashboard to get accurate cost estimates based on your usage and plan.
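For a rough back-of-the-envelope figure before opening the calculator, the per-minute rates from the pricing table can be combined as in the sketch below. The function name is hypothetical, and `platform_rate` is a placeholder you must supply from your own plan, since the Convocore platform cost is variable. The dashboard calculator remains the authoritative source.

```python
ULTRAVOX_RATE = 0.007             # USD per minute, from the pricing table
TWILIO_RATE_RANGE = (0.01, 0.02)  # USD per minute, only for phone calls

def estimate_cost(minutes, platform_rate, use_twilio=False):
    """Return a (low, high) estimate in USD for the given minutes.

    platform_rate is the Convocore platform cost per minute; it varies
    by plan, so it is passed in rather than hard-coded.
    """
    base = minutes * (ULTRAVOX_RATE + platform_rate)
    if not use_twilio:
        return (round(base, 4), round(base, 4))
    low = base + minutes * TWILIO_RATE_RANGE[0]
    high = base + minutes * TWILIO_RATE_RANGE[1]
    return (round(low, 4), round(high, 4))

# Example: 1,000 minutes of phone calls, platform rate left at 0.0
# purely as a placeholder.
low, high = estimate_cost(1000, platform_rate=0.0, use_twilio=True)
print(f"Estimated cost: ${low:.2f} - ${high:.2f}")
```

The result is a range rather than a single number because the Twilio rate itself is quoted as a range.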

🔍 Troubleshooting

Agent not responding or no audio
  • Verify your Ultravox API key is correct
  • Check that you have sufficient Ultravox credits
  • Ensure a voice is selected in the Speech Gen settings
  • Check your network connection

High latency
  • Reduce the Turn Endpoint Delay in VAD settings
  • Ensure you’re using a region close to your users

Agent is interrupted too easily
  • Increase the Minimum Interruption Duration (e.g., to 150ms)
  • Increase the Activation Threshold (e.g., to 0.15)

Agent is hard to interrupt
  • Decrease the Minimum Interruption Duration (e.g., to 50ms)
  • Decrease the Activation Threshold (e.g., to 0.05)

Tools not being used
  • Ensure tools are enabled on the current node
  • Check that the tool is not disabled in Tools settings
  • Verify the tool has a clear description so the AI knows when to use it

🔗 Next Steps


With Ultravox configured, your agents are ready for natural, low-latency voice conversations! 🎙️