Voice Settings

The Voice Settings tab controls how your agent sounds during voice conversations. Voice choice has a direct effect on user experience: a professional voice builds trust in customer support, while a warm, friendly voice works better for sales interactions.

Language

Select the primary language your agent will speak. This setting affects:

  • Speech recognition - How the STT (Speech-to-Text) model interprets user speech
  • Voice synthesis - The language-specific model used for TTS (Text-to-Speech)
  • Available voices - Different languages have different voice options

Language vs. Instructions

The language setting controls the speech engines, not which languages your agent "knows". An agent with English voice settings can still understand Portuguese if the instructions cover it, but the speech engines will still be optimized for English pronunciation.


TTS Model

Choose the Text-to-Speech model that will convert your agent's text responses into spoken audio. The available models appear in the dropdown and vary as providers release new options.

Provider Categories

TTS providers generally fall into these categories:

| Category | Best for |
| --- | --- |
| Fast/Low-latency | Real-time voice calls |
| High-quality | Pre-recorded or quality-critical applications |
| Customizable | Brand voices and cloned voices |

Voice Latency Matters for Phone Calls

For voice agents handling phone calls, latency is critical. Users notice delays, and longer pauses feel unnatural. For phone deployments, prefer faster TTS options over higher-quality but slower ones.


Voice Selection

After selecting a TTS model, choose from the available voices. Each provider offers different voice options.

Voice Characteristics to Consider

| Characteristic | Impact on User Experience |
| --- | --- |
| Gender | Match your brand voice or user expectations |
| Age | Younger voices feel casual; mature voices feel authoritative |
| Tone | Professional, friendly, neutral |
| Accent | Consider your target audience's expectations |

Sample Voice Selection Strategy

| Use Case | Recommended Voice Style |
| --- | --- |
| Customer Support | Calm, professional, neutral accent |
| Sales | Warm, enthusiastic, engaging |
| Technical Support | Clear, patient, slightly slower pace |
| Medical/Legal | Professional, trustworthy, measured |
| Casual Chat | Friendly, upbeat, conversational |

Voice ≠ Personality

A friendly-sounding voice doesn't make your agent friendly—that comes from the instructions. Similarly, a professional voice won't help if your agent's responses are casual. Match voice selection with instruction tone.


Advanced Configuration

Custom Voices (ElevenLabs)

To use custom or cloned voices from ElevenLabs, you need to integrate your own ElevenLabs API key:

  1. Configure your ElevenLabs API key in Provider Integrations
  2. Create or clone a voice in your ElevenLabs account
  3. The voice will automatically appear in the SipPulse AI voice dropdown

API Key Required for Custom Voices

Custom and cloned voices are only available when using your own ElevenLabs API key. The default platform voices do not include access to your ElevenLabs library.
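
If a custom voice does not show up in the dropdown, it can help to confirm that it exists in the ElevenLabs account tied to your API key. The sketch below is one way to check. It assumes ElevenLabs' public REST endpoint `GET /v1/voices` with the `xi-api-key` header, and a response that lists voices with `name` and `category` fields where built-in voices are marked `premade`. Verify these details against the current ElevenLabs API reference. The helper names `fetch_voices` and `custom_voice_names` are illustrative and not part of SipPulse AI.

```python
import json
import urllib.request

# Assumed endpoint and header name from ElevenLabs' public REST API.
VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key: str) -> dict:
    """Fetch the voice list for the account that owns api_key (network call)."""
    request = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def custom_voice_names(payload: dict) -> list[str]:
    """Return the sorted names of voices whose category is not 'premade'.

    Assumes each voice entry carries 'name' and 'category' keys.
    """
    return sorted(
        voice["name"]
        for voice in payload.get("voices", [])
        if voice.get("category") != "premade"
    )
```

Calling `custom_voice_names(fetch_voices(api_key))` with your own key should list the voices you created or cloned. If the list is empty, the voice was probably created under a different account.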

Voice Cloning Considerations

Custom voice cloning requires appropriate rights and consent. Ensure you have permission to clone any voice and comply with regional regulations on synthetic voice usage.


Testing Your Voice

The Playground lets you test voice settings before deployment:

  1. Configure your voice settings
  2. Open the Playground
  3. Enable Voice Mode
  4. Have a test conversation
  5. Adjust settings based on the experience

Test Different Scenarios

When testing, try:

  • Long responses (does the voice stay natural?)
  • Technical terms (are they pronounced correctly?)
  • Numbers and dates (clear articulation?)
  • Emotional scenarios (appropriate tone?)

Best Practices

1. Match Voice to Channel

  • Phone/SIP: Prioritize low latency (OpenAI TTS, Kokoro)
  • Chat Widget with Audio: Balance quality and speed (OpenAI TTS HD)
  • Pre-recorded Messages: Maximize quality (ElevenLabs)
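
If you manage several agents, you may want to keep the channel guidance above as data so that every deployment applies the same rule. This is a minimal sketch. The channel keys, the `TtsPreference` type, and `preference_for` are hypothetical names for illustration, not SipPulse AI settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsPreference:
    priority: str               # what to optimize for on this channel
    examples: tuple[str, ...]   # provider examples named in the guidance above


# Hypothetical channel keys mapped to the guidance in this section.
CHANNEL_PREFERENCES = {
    "phone": TtsPreference("latency", ("OpenAI TTS", "Kokoro")),
    "chat_widget_audio": TtsPreference("balanced", ("OpenAI TTS HD",)),
    "pre_recorded": TtsPreference("quality", ("ElevenLabs",)),
}


def preference_for(channel: str) -> TtsPreference:
    """Look up the TTS preference for a channel, failing loudly on typos."""
    try:
        return CHANNEL_PREFERENCES[channel]
    except KeyError:
        raise ValueError(f"Unknown channel: {channel!r}") from None
```

Raising an error for unknown channels, rather than falling back to a default, makes a misconfigured deployment visible early.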

2. Consider Your Audience

  • B2B: Professional, authoritative voices
  • B2C: Warm, approachable voices
  • Technical Users: Clear, measured pace
  • General Public: Friendly, patient tone

3. Maintain Consistency

Use the same voice across all agents in a product line to build brand recognition. Users should feel they're speaking with the same "assistant" regardless of the specific agent.

4. Test with Real Users

Voice preference is subjective. If possible, A/B test different voices with actual users to find what resonates with your specific audience.
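
One common way to run such an A/B test is to assign each user to a voice deterministically, so a returning caller always hears the same voice. The sketch below hashes a user identifier into a bucket. The function name, the `experiment` label, and the voice names in the usage note are illustrative assumptions, and the platform may offer its own mechanism.

```python
import hashlib


def assign_voice(user_id: str, voices: list[str], experiment: str = "voice-ab-test") -> str:
    """Pick a voice for a user; the same user and experiment always map to the same voice."""
    if not voices:
        raise ValueError("voices must not be empty")
    # Including the experiment label lets a later test reshuffle users independently.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(voices)
    return voices[bucket]
```

For example, `assign_voice(caller_id, ["voice_a", "voice_b"])` splits callers roughly evenly between two candidate voices without storing any assignment state.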


Troubleshooting

Voice Sounds Robotic

  • Try a different TTS provider (ElevenLabs often sounds most natural)
  • Check if your instructions produce overly formal or structured text
  • Ensure the language setting matches your content

Pronunciation Issues

  • Use phonetic spelling in instructions for brand names, for example "Sip Pulse" (with a space) instead of "SipPulse"
  • Some providers support SSML for precise pronunciation control
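
The two approaches above can be sketched in code, assuming you have a place to process text before it reaches the TTS engine (this page does not confirm that such a hook exists). The first helper applies plain phonetic substitutions. The second wraps the same terms in the SSML `<sub alias="...">` element, which is part of the SSML 1.1 standard, although whether a given provider honors it is something to check in that provider's documentation. The `PRONUNCIATIONS` map and both function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation map: written form -> how it should be spoken.
PRONUNCIATIONS = {"SipPulse": "Sip Pulse"}


def apply_pronunciations(text: str, mapping: dict[str, str] = PRONUNCIATIONS) -> str:
    """Replace whole-word occurrences of each term with its phonetic spelling."""
    for written, spoken in mapping.items():
        text = re.sub(r"\b" + re.escape(written) + r"\b", spoken, text)
    return text


def to_ssml(text: str, mapping: dict[str, str] = PRONUNCIATIONS) -> str:
    """Wrap known terms in <sub alias="..."> and XML-escape everything else."""
    if not mapping:
        return f"<speak>{escape(text)}</speak>"
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in mapping) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        term = match.group(1)
        parts.append(f"<sub alias={quoteattr(mapping[term])}>{escape(term)}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"
```

The plain substitution works with any provider because the engine only ever sees ordinary text. The SSML variant keeps the written form intact, which matters if the same text is also shown on screen.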

Latency Too High

  • Switch to a faster TTS provider
  • Reduce response length in instructions
  • Consider streaming TTS if supported by your deployment
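
To illustrate why streaming and shorter responses reduce perceived latency, the sketch below splits a response into sentences and synthesizes them one at a time, so playback can begin once the first sentence is ready instead of after the whole response. The `synthesize` callable is a hypothetical stand-in for whatever TTS call your deployment uses. The sentence splitter is deliberately naive and will mis-split abbreviations such as "Dr.".

```python
import re
from typing import Callable, Iterator


def split_into_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (naive, illustration only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def synthesize_incrementally(
    text: str, synthesize: Callable[[str], bytes]
) -> Iterator[bytes]:
    """Yield audio for each sentence as soon as it has been synthesized."""
    for sentence in split_into_sentences(text):
        # 'synthesize' is a hypothetical callable: sentence text in, audio bytes out.
        yield synthesize(sentence)
```

With this shape, the time until the user hears the first audio depends on the length of the first sentence rather than the full response, which is also why instructing the agent to keep responses short helps.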