SipPulse AI - Text to Speech Conversion

Text to Speech Interface

The SipPulse AI platform converts text into natural-sounding speech. This section documents how to use the Text to Speech tool, both through the Playground interface and through the API, to turn written content into audio.

Accessing the Text to Speech Interface

  1. Navigating to the Tool:

    • Access the left sidebar menu
    • Click on "Playground" to expand the options
    • Select "Text to Speech" to open the conversion interface
  2. Main Interface Components:

    • Text Input Area: Central zone for entering the text you want to convert
    • Audio Visualization: Waveform display of the generated audio
    • Playback Controls: Play button, download option, and timeline
    • Generated Audio List: Right panel showing previously generated audio files

Configuration Options

  1. Model and Voice Settings:

    • Model: Select from available TTS models (e.g., "eleven-labs-flash")
    • Language Options: Choose "Multi-language" or select a specific language
    • Voice Selection: Select from different voice options (e.g., "Aria")
    • Format: Choose the output audio format (e.g., "mp3")
  2. Advanced Features:

    • Autoplay: Option to automatically play audio after generation
    • Voice Customization: Some models allow adjusting voice characteristics
    • Experimental Features: Access to new capabilities (tagged as "Experimental")
  3. Keyboard Shortcuts:

    • Press "Ctrl+Enter" to quickly generate speech from the entered text
    • Use playback controls to manage audio playback

Speech Generation Process

  1. Text Preparation:

    • Enter the text you want to convert into the text input area
    • Format your text with appropriate punctuation for natural speech patterns
    • Consider using SSML tags for advanced voice control (if supported by the selected model)
  2. Configuration Selection:

    • Choose the appropriate model for your needs
    • Select the desired voice that matches your content
    • Configure language settings if generating speech in specific languages
  3. Generating Speech:

    • Click the "Speak" button or use the Ctrl+Enter shortcut
    • The system processes your text and generates the audio
    • The waveform visualization displays once processing is complete
  4. Review and Export:

    • Listen to the generated audio using the playback controls
    • Download the audio file using the download button
    • Regenerate with different settings if needed
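
The text preparation step above can be sketched in code. This is a minimal illustration, not part of the SipPulse tooling: `prepare_text` and `to_ssml` are hypothetical helper names, the 300 ms default pause is an arbitrary choice, and `<speak>` and `<break>` are standard SSML elements that the selected model may or may not accept.

```python
import re
from xml.sax.saxutils import escape


def prepare_text(raw: str) -> str:
    """Collapse runs of whitespace and ensure the text ends with sentence punctuation."""
    text = re.sub(r"\s+", " ", raw).strip()
    if text and text[-1] not in ".!?":
        text += "."
    return text


def to_ssml(text: str, pause_ms: int = 300) -> str:
    """Wrap text in <speak>, placing a <break> between sentences.

    Assumption: the target model accepts SSML; check before sending this.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so the text cannot break the SSML markup
    body = pause.join(escape(s) for s in sentences if s)
    return f"<speak>{body}</speak>"
```

For plain-text models, passing the result of `prepare_text` alone is likely enough; `to_ssml` is only useful where SSML support is confirmed.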

API Integration

SipPulse AI provides a RESTful API for integrating text-to-speech capabilities directly into your applications. Below are examples of how to use the API in different programming languages.

API Parameters

  • text: The text to convert to speech
  • model: Specifies the TTS model (e.g., eleven-labs-flash)
  • voice: Determines which voice to use (e.g., aria)
  • format: Output audio format (e.g., mp3)
  • api-key: Your SipPulse API authentication key, sent as an HTTP request header rather than in the JSON body
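
As a rough sketch of how these parameters fit together, the helper below assembles the headers and JSON body used in the examples on this page. The function name `build_tts_request`, its defaults, and the `SIPPULSE_API_KEY` environment variable lookup are assumptions made for illustration; the header and field names are taken from the examples that follow.

```python
import os


def build_tts_request(text, model="eleven-labs-flash", voice="aria",
                      audio_format="mp3", api_key=None):
    """Return (headers, payload) for a synthesize call; raise on missing input."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    # Fall back to the environment so the key never has to be hard-coded
    api_key = api_key or os.environ.get("SIPPULSE_API_KEY")
    if not api_key:
        raise ValueError("no API key given and SIPPULSE_API_KEY is not set")
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "api-key": api_key,  # the key travels as a header, not in the body
    }
    payload = {"text": text, "model": model, "voice": voice, "format": audio_format}
    return headers, payload
```

The returned pair can be passed straight to an HTTP client, for example `requests.post(url, json=payload, headers=headers)`.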

Python Example

python
import os

import requests

url = 'https://api.sippulse.ai/tts/synthesize'
headers = {
    'accept': 'application/json',
    'content-type': 'application/json',
    # Read the key from the SIPPULSE_API_KEY environment variable
    'api-key': os.environ['SIPPULSE_API_KEY']
}
data = {
    'text': "Let's see if it works",
    'model': 'eleven-labs-flash',
    'voice': 'aria',
    'format': 'mp3'
}

# Send the synthesis request; the timeout keeps the call from hanging indefinitely
response = requests.post(url, json=data, headers=headers, timeout=60)

if response.status_code == 200:
    # On success the response body is the audio itself, so write the raw bytes
    with open('output.mp3', 'wb') as f:
        f.write(response.content)
    print("Audio file created successfully!")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Node.js Example

javascript
// Requires node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const fetch = require('node-fetch');
const fs = require('fs');

const url = 'https://api.sippulse.ai/tts/synthesize';
const data = {
  text: "Let's see if it works",
  model: 'eleven-labs-flash',
  voice: 'aria',
  format: 'mp3'
};

const options = {
  method: 'POST',
  headers: {
    'accept': 'application/json',
    'content-type': 'application/json',
    'api-key': process.env.SIPPULSE_API_KEY // read the key from the environment
  },
  body: JSON.stringify(data)
};

fetch(url, options)
  .then(response => {
    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`);
    }
    return response.buffer();
  })
  .then(buffer => {
    fs.writeFileSync('output.mp3', buffer);
    console.log('Audio file created successfully!');
  })
  .catch(error => console.error('Error:', error));

cURL Example

bash
curl -X 'POST' \
  'https://api.sippulse.ai/tts/synthesize' \
  -H 'accept: application/json' \
  -H 'content-type: application/json' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -d '{
    "text": "Let'\''s see if it works",
    "model": "eleven-labs-flash",
    "voice": "aria",
    "format": "mp3"
  }' \
  --output output.mp3

Usage Considerations

  1. Voice Quality Optimization:

    • Keep sentences at a natural length for more realistic speech patterns
    • Use appropriate punctuation to control pacing and intonation
    • Test different voices to find the best match for your content
  2. Cost Management:

    • Be aware that costs typically scale with the length of the text
    • Consider breaking very long texts into smaller segments
    • Use the appropriate model based on quality vs. cost requirements
  3. Performance Factors:

    • Higher quality models may have longer processing times
    • Very long texts will take more time to process
    • Some voices may be optimized for specific languages or content types
  4. Content Guidelines:

    • Avoid generating speech with sensitive personal information
    • Be aware of usage policies regarding impersonation or deception
    • Follow local regulations regarding synthetic voice generation
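
The suggestion above to break very long texts into smaller segments can be sketched as follows. This is one possible approach, not a documented SipPulse feature: `split_text` is a hypothetical helper, and the 500-character default is an arbitrary assumption rather than a platform limit.

```python
import re


def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into segments of at most max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the segment being built
            current = candidate
            continue
        if current:
            segments.append(current)
        # A single sentence longer than the limit is cut at max_chars
        while len(sentence) > max_chars:
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        segments.append(current)
    return segments
```

Each segment can then be synthesized with its own request. Splitting at sentence boundaries keeps pauses and intonation natural at the joins between the resulting audio files.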

By using the SipPulse AI Text to Speech tool, you can efficiently convert written content into natural-sounding speech for a wide range of applications, including voice assistants, educational content, accessibility features, and more.