Voice Settings
The Voice Settings tab controls how your agent sounds during voice conversations. Voice selection has a significant impact on user experience: a professional voice builds trust in customer support, while a warm, friendly voice works better for sales interactions.

Language
Select the primary language your agent will speak. This setting affects:
- Speech recognition - How the STT (Speech-to-Text) model interprets user speech
- Voice synthesis - The language and pronunciation used by the TTS (Text-to-Speech) model
- Available voices - Different languages have different voice options
Language vs. Instructions
The language setting controls the speech engines, not which languages your agent "knows". An agent with English voice settings can still handle Portuguese if you cover it in the instructions, but speech recognition and synthesis will be optimized for English, so Portuguese speech may be transcribed or pronounced less accurately.
TTS Model
Choose the Text-to-Speech model that will convert your agent's text responses into spoken audio. The available models appear in the dropdown and vary as providers release new options.
Provider Categories
TTS providers generally fall into these categories:
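| Category | Best For | Trade-off |
|---|---|---|
| Fast/Low-latency | Real-time voice calls | Usually less natural-sounding than high-quality models |
| High-quality | Pre-recorded or quality-critical applications | Higher latency |
| Customizable | Brand voices and cloned voices | Requires your own provider API key (see Custom Voices) |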
Voice Latency Matters for Phone Calls
For voice agents handling phone calls, latency is critical. Users notice delays, and longer pauses feel unnatural. For phone deployments, prefer faster TTS options over higher-quality but slower ones.
Voice Selection
After selecting a TTS model, choose from its available voices. Each provider offers a different set of voices.
Voice Characteristics to Consider
| Characteristic | Impact on User Experience |
|---|---|
| Gender | Match your brand voice or user expectations |
| Age | Younger voices feel casual; mature voices feel authoritative |
| Tone | Sets the overall register (e.g., professional, friendly, neutral) |
| Accent | Consider your target audience's expectations |
Sample Voice Selection Strategy
| Use Case | Recommended Voice Style |
|---|---|
| Customer Support | Calm, professional, neutral accent |
| Sales | Warm, enthusiastic, engaging |
| Technical Support | Clear, patient, slightly slower pace |
| Medical/Legal | Professional, trustworthy, measured |
| Casual Chat | Friendly, upbeat, conversational |
Voice ≠ Personality
A friendly-sounding voice doesn't make your agent friendly; that comes from the instructions. Similarly, a professional voice won't help if your agent's responses are casual. Match your voice selection to the tone of your instructions.
Advanced Configuration
Custom Voices (ElevenLabs)
To use custom or cloned voices from ElevenLabs, you need to integrate your own ElevenLabs API key:
- Configure your ElevenLabs API key in Provider Integrations
- Create or clone a voice in your ElevenLabs account
- The voice will automatically appear in the SipPulse AI voice dropdown
API Key Required for Custom Voices
Custom and cloned voices are only available when using your own ElevenLabs API key. The default platform voices do not include access to your ElevenLabs library.
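If a cloned voice does not show up in the dropdown, it can help to confirm that the voice actually exists under the API key you configured. The sketch below calls ElevenLabs' own "list voices" endpoint directly (it is not a SipPulse API). It assumes the `https://api.elevenlabs.io/v1/voices` path, the `xi-api-key` header, and that stock voices carry `"category": "premade"` in the response; check the current ElevenLabs API reference before relying on it. The `ELEVENLABS_API_KEY` environment variable name is an arbitrary choice for this example.

```python
import json
import os
import urllib.request

# ElevenLabs "list voices" endpoint (assumed path; verify against the ElevenLabs API reference).
VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key: str) -> dict:
    """Return the raw JSON payload listing the voices visible to this API key."""
    request = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def custom_voices(payload: dict) -> list[tuple[str, str]]:
    """Return (name, voice_id) pairs for voices that are not ElevenLabs stock ("premade") voices."""
    return [
        (voice["name"], voice["voice_id"])
        for voice in payload.get("voices", [])
        if voice.get("category") != "premade"
    ]


if __name__ == "__main__":
    # Only call the network when a key is present in the environment.
    api_key = os.environ.get("ELEVENLABS_API_KEY")
    if api_key:
        for name, voice_id in custom_voices(fetch_voices(api_key)):
            print(f"{name}: {voice_id}")
```

If your cloned voice is listed here but still missing from the SipPulse AI dropdown, the key configured in Provider Integrations is the next thing to check.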
Voice Cloning Considerations
Custom voice cloning requires appropriate rights and consent. Ensure you have permission to clone any voice and comply with regional regulations on synthetic voice usage.
Testing Your Voice
The Playground lets you test voice settings before deployment:
- Configure your voice settings
- Open the Playground
- Enable Voice Mode
- Have a test conversation
- Adjust settings based on the experience
Test Different Scenarios
When testing, try:
- Long responses (does the voice stay natural?)
- Technical terms (are they pronounced correctly?)
- Numbers and dates (clear articulation?)
- Emotional scenarios (appropriate tone?)
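To make Playground sessions repeatable, you can keep a fixed set of test prompts per scenario and say them in the same order after each settings change. The prompts below are hypothetical examples; replace them with phrases, product names, and number formats from your own domain.

```python
# Hypothetical test prompts, grouped by the scenarios listed above.
VOICE_TEST_PROMPTS = {
    "long_response": ["Can you walk me through your full return policy, step by step?"],
    "technical_terms": ["How do I configure SIP trunking with SipPulse?"],
    "numbers_and_dates": ["Please read back order number 4821-0937 and its delivery date."],
    "emotional": ["I've been waiting two weeks for a refund and I'm really frustrated."],
}


def playground_script(prompts: dict[str, list[str]]) -> list[str]:
    """Flatten the prompt groups into a numbered script to read aloud in Voice Mode."""
    lines: list[str] = []
    for scenario, examples in prompts.items():
        for example in examples:
            lines.append(f"{len(lines) + 1}. [{scenario}] {example}")
    return lines


if __name__ == "__main__":
    print("\n".join(playground_script(VOICE_TEST_PROMPTS)))
```

Using the same script every time makes it easier to tell whether a change in voice or model actually improved things.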
Best Practices
1. Match Voice to Channel
- Phone/SIP: Prioritize low latency (OpenAI TTS, Kokoro)
- Chat Widget with Audio: Balance quality and speed (OpenAI TTS HD)
- Pre-recorded Messages: Maximize quality (ElevenLabs)
2. Consider Your Audience
- B2B: Professional, authoritative voices
- B2C: Warm, approachable voices
- Technical Users: Clear, measured pace
- General Public: Friendly, patient tone
3. Maintain Consistency
Use the same voice across all agents in a product line to build brand recognition. Users should feel they're speaking with the same "assistant" regardless of the specific agent.
4. Test with Real Users
Voice preference is subjective. If possible, A/B test different voices with actual users to find what resonates with your specific audience.
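One common way to run such a test is to assign each user to a voice deterministically, by hashing a stable identifier such as a caller number, so the same person always hears the same voice. This is a minimal sketch under that assumption; the variant names are hypothetical, and how you route a user to an agent configured with a given voice (for example, two otherwise identical agents) depends on your deployment.

```python
import hashlib

# Hypothetical variant names; each would correspond to an agent using a different voice.
VOICE_VARIANTS = ("voice_a", "voice_b")


def assign_voice(user_id: str, variants: tuple[str, ...] = VOICE_VARIANTS) -> str:
    """Deterministically map a user to one voice variant.

    Hashing the user ID (instead of choosing randomly per call) keeps each user
    on the same voice across conversations, so their feedback reflects one voice.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    for caller in ("+15550000001", "+15550000002", "+15550000001"):
        print(caller, "->", assign_voice(caller))
```

Because the hash spreads IDs roughly evenly, each variant receives about the same share of users without storing any assignment table.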
Troubleshooting
Voice Sounds Robotic
- Try a different TTS provider (ElevenLabs often sounds most natural)
- Check whether your instructions produce overly formal or heavily structured text (such as bullet lists or tables), which sounds unnatural when spoken
- Ensure the language setting matches your content
Pronunciation Issues
- Use phonetic spelling in instructions for brand names (for example, "SipPulse" → "Sip Pulse", with a space)
- Some providers support SSML for precise pronunciation control
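If you control the text before it reaches a TTS engine (for example, in your own pipeline rather than through agent instructions), both techniques can be applied with a small pronunciation lexicon. The sketch shows plain phonetic substitution and the equivalent SSML using the standard `<sub alias="...">` element. The lexicon is hypothetical, and whether SSML is accepted, and which elements are honored, depends on the provider.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon: written form -> how it should be spoken.
LEXICON = {"SipPulse": "Sip Pulse"}


def _pattern(lexicon: dict[str, str]) -> re.Pattern:
    # Longest terms first so overlapping entries prefer the most specific match.
    terms = sorted(lexicon, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")


def apply_phonetic_spelling(text: str, lexicon: dict[str, str]) -> str:
    """Replace each lexicon term with its phonetic spelling (for plain-text TTS)."""
    return _pattern(lexicon).sub(lambda m: lexicon[m.group(1)], text)


def to_ssml(text: str, lexicon: dict[str, str]) -> str:
    """Wrap lexicon terms in SSML <sub alias="..."> elements, escaping all other text."""
    parts, last = [], 0
    for match in _pattern(lexicon).finditer(text):
        parts.append(escape(text[last:match.start()]))
        term = match.group(1)
        parts.append(f"<sub alias={quoteattr(lexicon[term])}>{escape(term)}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(apply_phonetic_spelling("Welcome to SipPulse support.", LEXICON))
    print(to_ssml("Welcome to SipPulse support.", LEXICON))
```

The SSML form keeps the original spelling in the text while telling the engine how to say it, which is useful when the same text is also shown on screen.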
Latency Too High
- Switch to a faster TTS provider
- Reduce response length in instructions
- Consider streaming TTS if supported by your deployment
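Streaming helps because the caller hears the first chunk of audio as soon as it is ready, instead of waiting for the entire reply to be synthesized; shorter replies reduce the total time as well. The sketch below measures both numbers against a simulated TTS stream. `fake_tts_stream` is a hypothetical stand-in that sleeps to imitate synthesis time, not any provider's API.

```python
import time
from typing import Iterator, Optional


def fake_tts_stream(sentences: list[str], seconds_per_sentence: float) -> Iterator[bytes]:
    """Hypothetical streaming TTS: yields one placeholder 'audio' chunk per sentence."""
    for sentence in sentences:
        time.sleep(seconds_per_sentence)  # simulated synthesis time
        yield sentence.encode("utf-8")  # stands in for audio bytes


def measure(stream: Iterator[bytes]) -> tuple[float, float]:
    """Return (seconds until the first chunk, seconds until the last chunk)."""
    start = time.perf_counter()
    first: Optional[float] = None
    for _chunk in stream:
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return (first if first is not None else total), total


if __name__ == "__main__":
    reply = ["Sure.", "Your order shipped today.", "It should arrive on Friday."]
    first, total = measure(fake_tts_stream(reply, seconds_per_sentence=0.2))
    # Without streaming, the caller would wait roughly `total` before hearing anything.
    print(f"first audio after {first:.2f}s, full reply after {total:.2f}s")
```

In this simulation the wait before the first audio depends only on the first sentence, while the wait for the full reply grows with reply length.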
Related Documentation
- Profile - Configure agent identity and model
- Call Configuration - Voice call behavior settings
- Text to Speech Models - Detailed TTS provider comparison
