Voice Settings

The Voice Settings tab controls how your agent sounds during voice conversations. Voice selection has a significant impact on user experience: a professional voice builds trust in customer support, while a warm, friendly voice works better for sales interactions.

Language

Select the primary language your agent will speak. This setting affects:

Speech recognition - How the STT (Speech-to-Text) model interprets user speech
Voice synthesis - The language the TTS (Text-to-Speech) model is tuned to speak
Available voices - Different languages have different voice options

Language vs. Instructions

The language setting controls the speech engines, not which languages your agent "knows". An agent with English voice settings can still handle Portuguese if you include it in the instructions, but speech recognition and synthesis will remain optimized for English, so Portuguese may be recognized and pronounced less accurately.

TTS Model

Choose the Text-to-Speech model that will convert your agent's text responses into spoken audio. The available models appear in the dropdown and vary as providers release new options.

Provider Categories

TTS providers generally fall into these categories:

Category	Best For
Fast/Low-latency	Best for real-time voice calls
High-quality	Best for pre-recorded or quality-critical applications
Customizable	Best for brand voices and cloned voices

Voice Latency Matters for Phone Calls

For voice agents handling phone calls, latency is critical. Users notice delays, and longer pauses feel unnatural. For phone deployments, prefer faster TTS options over higher-quality but slower ones.

Voice Selection

After selecting a TTS model, choose from the available voices. Each provider offers different voice options.

Voice Characteristics to Consider

Characteristic	Impact on User Experience
Gender	Match your brand voice or user expectations
Age	Younger voices feel casual; mature voices feel authoritative
Tone	Professional, friendly, neutral
Accent	Consider your target audience's expectations

Sample Voice Selection Strategy

Use Case	Recommended Voice Style
Customer Support	Calm, professional, neutral accent
Sales	Warm, enthusiastic, engaging
Technical Support	Clear, patient, slightly slower pace
Medical/Legal	Professional, trustworthy, measured
Casual Chat	Friendly, upbeat, conversational

Voice ≠ Personality

A friendly-sounding voice doesn't make your agent friendly; that comes from the instructions. Similarly, a professional voice won't help if your agent's responses are casual. Choose a voice that matches the tone your instructions set.

Advanced Configuration

Custom Voices (ElevenLabs)

To use custom or cloned voices from ElevenLabs, you need to integrate your own ElevenLabs API key:

Configure your ElevenLabs API key in Provider Integrations
Create or clone a voice in your ElevenLabs account
The voice will automatically appear in the SipPulse AI voice dropdown

API Key Required for Custom Voices

Custom and cloned voices are only available when using your own ElevenLabs API key. The default platform voices do not include access to your ElevenLabs library.

Voice Cloning Considerations

Custom voice cloning requires appropriate rights and consent. Ensure you have permission to clone any voice and comply with regional regulations on synthetic voice usage.

Testing Your Voice

The Playground lets you test voice settings before deployment:

Configure your voice settings
Open the Playground
Enable Voice Mode
Have a test conversation
Adjust settings based on the experience

Test Different Scenarios

When testing, try:

Long responses (does the voice stay natural?)
Technical terms (are they pronounced correctly?)
Numbers and dates (clear articulation?)
Emotional scenarios (appropriate tone?)

Best Practices

1. Match Voice to Channel

Phone/SIP: Prioritize low latency (OpenAI TTS, Kokoro)
Chat Widget with Audio: Balance quality and speed (OpenAI TTS HD)
Pre-recorded Messages: Maximize quality (ElevenLabs)
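The channel guidance above can be captured as a small lookup table. This is only an illustrative sketch: the channel keys, the `pick_tts` helper, and the idea of passing in a list of available models are hypothetical and are not SipPulse AI identifiers. The provider names are taken from the list above.

```python
# Hypothetical lookup table mirroring the channel guidance above.
# Keys and structure are illustrative, not platform identifiers.
CHANNEL_TTS_PREFERENCES = {
    "phone": {"priority": "low latency", "examples": ["OpenAI TTS", "Kokoro"]},
    "chat_widget_audio": {"priority": "balanced", "examples": ["OpenAI TTS HD"]},
    "pre_recorded": {"priority": "quality", "examples": ["ElevenLabs"]},
}


def pick_tts(channel, available):
    """Return the first preferred option for the channel that is available.

    Returns None when none of the preferred options is in `available`.
    """
    for name in CHANNEL_TTS_PREFERENCES[channel]["examples"]:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    print(pick_tts("phone", ["Kokoro", "ElevenLabs"]))
```

Keeping the preference order in data rather than in code makes it easy to revise as providers release new models.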

2. Consider Your Audience

B2B: Professional, authoritative voices
B2C: Warm, approachable voices
Technical Users: Clear, measured pace
General Public: Friendly, patient tone

3. Maintain Consistency

Use the same voice across all agents in a product line to build brand recognition. Users should feel they're speaking with the same "assistant" regardless of the specific agent.

4. Test with Real Users

Voice preference is subjective. If possible, A/B test different voices with actual users to find what resonates with your specific audience.
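One common way to run such an A/B test is to assign each user to a voice variant deterministically, so a returning user always hears the same voice. The sketch below assumes you control which agent or voice a user is routed to. The variant labels and the `assign_voice` helper are hypothetical and do not correspond to any platform feature.

```python
import hashlib

# Hypothetical variant labels; replace with whatever identifies your voices.
VOICE_VARIANTS = ["voice_a", "voice_b"]


def assign_voice(user_id, variants=VOICE_VARIANTS):
    """Map a user ID to a variant using a stable hash.

    The same user_id always yields the same variant, and IDs spread
    roughly evenly across variants.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    for uid in ["caller-123", "caller-456", "caller-789"]:
        print(uid, "->", assign_voice(uid))
```

A hash is used instead of random choice so that assignment needs no stored state and stays consistent across sessions.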

Troubleshooting

Voice Sounds Robotic

Try a different TTS provider (ElevenLabs often sounds most natural)
Check if your instructions produce overly formal or structured text
Ensure the language setting matches your content

Pronunciation Issues

Use phonetic spelling in instructions for brand names, for example "SipPulse" → "Sip Pulse" (with a space)
Some providers support SSML for precise pronunciation control
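If you generate or post-process text yourself before it reaches a TTS engine, the same two ideas can be sketched in code. This is an assumption about your own pipeline, not a description of how the platform works. `to_plain` applies the phonetic respelling, and `to_ssml` wraps each term in the standard SSML `<sub alias="...">` element. Whether a given provider honors SSML must be checked in that provider's documentation.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Term -> phonetic respelling. The single entry comes from the example above.
PRONUNCIATIONS = {"SipPulse": "Sip Pulse"}


def _pattern(mapping):
    # Longest keys first so overlapping terms match the longest one.
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")


def to_plain(text, mapping=PRONUNCIATIONS):
    """Replace each whole-word term with its phonetic respelling."""
    if not mapping:
        return text
    return _pattern(mapping).sub(lambda m: mapping[m.group(1)], text)


def to_ssml(text, mapping=PRONUNCIATIONS):
    """Wrap each term in an SSML <sub> element and escape the rest."""
    parts, last = [], 0
    if mapping:
        for m in _pattern(mapping).finditer(text):
            parts.append(escape(text[last:m.start()]))
            alias = quoteattr(mapping[m.group(1)])
            parts.append("<sub alias=" + alias + ">" + escape(m.group(1)) + "</sub>")
            last = m.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_plain("Welcome to SipPulse AI."))
    print(to_ssml("Welcome to SipPulse AI."))
```

The word-boundary match keeps the substitution from firing inside longer identifiers that merely contain the term.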

Latency Too High

Switch to a faster TTS provider
Instruct the agent to keep responses short, since longer text takes longer to synthesize
Consider streaming TTS if supported by your deployment
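When comparing providers, it helps to measure latency rather than judge it by ear. The sketch below times how long a streaming synthesizer takes to deliver its first audio chunk, which is the delay a caller actually perceives. `fake_tts` is a stand-in generator invented for this example. In practice you would pass a function that wraps your provider's streaming client, which is not shown here.

```python
import time


def time_to_first_chunk(synthesize, text):
    """Measure first-chunk latency and total time for a streaming synthesizer.

    `synthesize` is any callable that takes text and returns an iterable
    of audio byte chunks. Returns (first_chunk_seconds, total_seconds,
    total_bytes).
    """
    start = time.perf_counter()
    stream = iter(synthesize(text))
    first = next(stream, None)
    first_latency = time.perf_counter() - start
    total_bytes = len(first) if first else 0
    for chunk in stream:
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    return first_latency, total, total_bytes


def fake_tts(text):
    """Stand-in for a real streaming TTS call: one delayed chunk per word."""
    for word in text.split():
        time.sleep(0.01)  # simulated synthesis delay
        yield word.encode("utf-8")


if __name__ == "__main__":
    first, total, size = time_to_first_chunk(fake_tts, "one two three")
    print("first chunk: %.3fs, total: %.3fs, bytes: %d" % (first, total, size))
```

With streaming, the first-chunk figure stays roughly constant as responses get longer, while total time grows. This is why streaming and shorter responses both help perceived latency.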

Related Documentation

Profile - Configure agent identity and model
Call Configuration - Voice call behavior settings
Text to Speech Models - Detailed TTS provider comparison
