Text-to-Speech (TTS)

TTS (Text-to-Speech) services from SipPulse AI convert written text into natural-sounding audio, enabling your applications to "speak" to users. Our platform offers a variety of models from renowned providers such as OpenAI, ElevenLabs, and Microsoft, each with a distinct set of voices and features.

For detailed information on pricing and model specifications, see our Pricing page.

1. Interactive Text-to-Speech Playground

The Interactive Text-to-Speech Playground is the ideal tool for experimenting with and validating TTS models intuitively before integrating them via the API:

  • Model and Voice Selection: Explore various speech synthesis models (OpenAI, ElevenLabs, Microsoft) and the available voices for each.
  • Text Input: Enter the text you want to convert to audio.
  • Parameter Adjustment: Configure parameters such as speed (speed) and audio output format (response_format).
  • Immediate Generation and Playback: Run the synthesis and listen to the resulting audio directly in the interface.
  • Code Preview: Get code samples in cURL, Python, and JavaScript, pre-configured with the model, voice, and parameters you tested, making implementation easier.

The Playground is an excellent way to discover the perfect voice for your project and understand how different parameters affect the final speech synthesis result.

2. Consuming via REST API

Integrate TTS functionality into your applications through calls to our REST API.

2.1. Synthesize Speech

Use the /v1/tts/generate endpoint to convert a text string into audio data.

Endpoint: POST /v1/tts/generate

Request Body (JSON):

  • input (string, required): The text to be converted to speech.
  • model (string, required): The TTS model name to use (e.g., "tts-1" for OpenAI, "eleven_multilingual_v2" for ElevenLabs, or a specific Microsoft model). See the /v1/tts/models endpoint for a list of available models.
  • voice (string, required): The key of the specific voice to use for synthesis (e.g., "alloy", "shimmer" for OpenAI TTS; a voice ID for ElevenLabs; or a Microsoft voice name like "pt-BR-FranciscaNeural"). See the /v1/tts/voices endpoint for available voice keys for each model.
  • response_format (string, optional, default: "mp3"): The output audio file format. Supported values: "mp3", "opus", "aac", "flac", "wav", "pcm".
  • speed (float, optional, default: 1.0): Controls speech speed. Typical values range from 0.25 to 4.0; the exact range may vary by model.

Response (JSON):

  • Success (200 OK):
json
{
  "filename": "string", // Generated file name
  "usage": {}, // Usage info (may vary)
  "performance": {}, // Performance info (may vary)
  "unit": "string", // Cost unit (e.g., "characters")
  "stream": "string", // URL for audio streaming
  "download": "string" // URL for audio download
}
  • Error: JSON response with appropriate HTTP status code and error details in the body.
bash
# Example: Synthesize speech
curl -X POST 'https://api.sippulse.ai/v1/tts/generate' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "input": "Hello, world! This is a speech synthesis demo.",
  "model": "tts-1",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.1
  }'
python
import os
import requests
import json

def synthesize_speech(
  text_input: str,
  model_id: str,
  voice_key: str,
  response_format: str = "mp3",
  speed: float = 1.0
) -> dict | None:
  """
  Synthesizes speech from text using the SipPulse AI API.
  """
  api_url = "https://api.sippulse.ai/v1/tts/generate"
  api_key = os.getenv("SIPPULSE_API_KEY")

  if not api_key:
    print("Error: SIPPULSE_API_KEY environment variable is not set.")
    return None

  headers = {
    "api-key": api_key,
    "Content-Type": "application/json"
  }
  payload = {
    "input": text_input,
    "model": model_id,
    "voice": voice_key,
    "response_format": response_format,
    "speed": speed
  }

  try:
    response = requests.post(api_url, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()
  except requests.exceptions.HTTPError as e:
    error_content = e.response.text
    try:
      error_json = e.response.json()
      error_content = json.dumps(error_json, indent=2)
    except ValueError:  # response body was not valid JSON
      pass
    print(f"API error: {e.response.status_code}\n{error_content}")
  except Exception as e:
    print(f"An unexpected error occurred: {e}")
  return None

if __name__ == "__main__":
  tts_result = synthesize_speech(
    text_input="Testing speech synthesis with the API.",
    model_id="tts-1", # Example with OpenAI model
    voice_key="nova",    # Example with OpenAI voice
    response_format="mp3",
    speed=1.0
  )
  if tts_result:
    print("Synthesis successful:")
    print(json.dumps(tts_result, indent=2, ensure_ascii=False))
    print(f"Download link: {tts_result.get('download')}")
    print(f"Stream link: {tts_result.get('stream')}")
javascript
// Node.js with fetch
async function synthesizeSpeech({
  textInput,
  modelId,
  voiceKey,
  responseFormat = "mp3",
  speed = 1.0,
}) {
  const apiUrl = "https://api.sippulse.ai/v1/tts/generate";
  const apiKey = process.env.SIPPULSE_API_KEY;

  if (!apiKey) {
    console.error("SIPPULSE_API_KEY environment variable is not set.");
    return null;
  }

  const payload = {
    input: textInput,
    model: modelId,
    voice: voiceKey,
    response_format: responseFormat,
    speed,
  };

  try {
    const response = await fetch(apiUrl, {
      method: "POST",
      headers: {
        "api-key": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    });

    if (!response.ok) {
      let errorBody = await response.text();
      try {
        errorBody = JSON.stringify(JSON.parse(errorBody), null, 2);
      } catch (e) { /* not JSON */ }
      throw new Error(`API error: ${response.status} ${response.statusText}\n${errorBody}`);
    }
    return response.json();
  } catch (error) {
    console.error("Failed to synthesize speech:", error);
    return null;
  }
}

// Usage example:
// (async () => {
//   const result = await synthesizeSpeech({
//     textInput: "Hello, JavaScript speaking here!",
//     modelId: "eleven_multilingual_v2", // Example with ElevenLabs
//     voiceKey: "VOICE_ID_ELEVENLABS", // Replace with desired ElevenLabs voice ID
//     responseFormat: "opus",
//     speed: 0.9
//   });
//   if (result) {
//     console.log("Synthesis result:", JSON.stringify(result, null, 2));
//   }
// })();
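
Once you have a synthesis result, fetching the finished audio from the download (or stream) URL is an ordinary HTTP GET. Below is a minimal Python sketch; whether the link requires the api-key header is an assumption here, so adjust it to what your organization's links actually need:

python
import os
import requests

def download_audio(download_url: str, output_path: str) -> None:
  """Fetch synthesized audio from the `download` URL and save it to disk."""
  # Assumption: the link may require the same api-key header as the API itself.
  headers = {"api-key": os.environ["SIPPULSE_API_KEY"]}
  with requests.get(download_url, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    with open(output_path, "wb") as f:
      for chunk in resp.iter_content(chunk_size=8192):
        f.write(chunk)

# Usage with synthesize_speech() from the Python example above:
# result = synthesize_speech("Hello!", "tts-1", "alloy")
# if result:
#   download_audio(result["download"], "hello.mp3")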

2.2. List Available TTS Models

Use this endpoint to query the TTS models currently available to your organization:

Endpoint: GET /v1/tts/models

Query Parameters:

  • status (string, optional): Filter models by status (active or inactive). Default: active.
bash
curl -X GET 'https://api.sippulse.ai/v1/tts/models?status=active' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -H 'Accept: application/json'
python
# (Implementation similar to section 2.1, adapted for GET /v1/tts/models)
# Example call: list_tts_models()
javascript
// (Implementation similar to section 2.1, adapted for GET /v1/tts/models)
// Example call: listTTSModels()
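
For reference, a runnable version of the Python placeholder above might look like this (list_tts_models is just the illustrative name used in the stub):

python
import os
import requests

def list_tts_models(status: str = "active") -> list | None:
  """List the TTS models available to your organization."""
  api_key = os.getenv("SIPPULSE_API_KEY")
  if not api_key:
    print("Error: SIPPULSE_API_KEY environment variable is not set.")
    return None
  response = requests.get(
    "https://api.sippulse.ai/v1/tts/models",
    headers={"api-key": api_key, "Accept": "application/json"},
    params={"status": status},
  )
  response.raise_for_status()
  return response.json()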

Example Response (JSON):

json
[
  {
    "name": "tts-1", // OpenAI model
    "status": "active",
    "provider": "openai"
  },
  {
    "name": "eleven_multilingual_v2", // ElevenLabs model
    "status": "active",
    "provider": "elevenlabs"
  },
  {
    "name": "MicrosoftSpeechModel", // Example Microsoft model
    "status": "active",
    "provider": "microsoft"
  }
  // ... other models
]

2.3. List Available Voices

Use this endpoint to query the voices available for synthesis. The response is a JSON object keyed by model name, where each value is the list of voices for that model.

Endpoint: GET /v1/tts/voices

bash
curl -X GET 'https://api.sippulse.ai/v1/tts/voices' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -H 'Accept: application/json'
python
# (Implementation similar to section 2.1, adapted for GET /v1/tts/voices)
# Example call: list_tts_voices()
javascript
// (Implementation similar to section 2.1, adapted for GET /v1/tts/voices)
// Example call: listTTSVoices()
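
Likewise, a runnable sketch of the list_tts_voices placeholder:

python
import os
import requests

def list_tts_voices() -> dict | None:
  """List available voices, keyed by model name."""
  api_key = os.getenv("SIPPULSE_API_KEY")
  if not api_key:
    print("Error: SIPPULSE_API_KEY environment variable is not set.")
    return None
  response = requests.get(
    "https://api.sippulse.ai/v1/tts/voices",
    headers={"api-key": api_key, "Accept": "application/json"},
  )
  response.raise_for_status()
  return response.json()

# Example: print the voice keys available for the "tts-1" model
# voices = list_tts_voices()
# for v in (voices or {}).get("tts-1", []):
#   print(v["key"], "-", v["name"])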

Example Response (JSON):

json
{
  "tts-1": [ // Voices for the "tts-1" model (OpenAI)
    { "name": "Alloy", "key": "alloy", "language": "multilingual" },
    { "name": "Echo", "key": "echo", "language": "multilingual" },
    { "name": "Fable", "key": "fable", "language": "multilingual" },
    { "name": "Onyx", "key": "onyx", "language": "multilingual" },
    { "name": "Nova", "key": "nova", "language": "multilingual" },
    { "name": "Shimmer", "key": "shimmer", "language": "multilingual" }
  ],
  "eleven_multilingual_v2": [ // Voices for the "eleven_multilingual_v2" model (ElevenLabs)
    { "name": "Rachel", "key": "21m00Tcm4TlvDq8ikWAM", "language": "multilingual" },
    { "name": "Adam", "key": "pNInz6obpgDQGcFmaJgB", "language": "multilingual" }
    // ... other ElevenLabs voices
  ],
  "MicrosoftSpeechModel": [ // Voices for a Microsoft model
    { "name": "Francisca (Portuguese, Brazil)", "key": "pt-BR-FranciscaNeural", "language": "pt-BR" },
    { "name": "Antonio (Portuguese, Brazil)", "key": "pt-BR-AntonioNeural", "language": "pt-BR" }
    // ... other Microsoft voices
  ]
}

Use the desired voice key in the voice parameter of the synthesis request (/v1/tts/generate).

3. Supported Audio Formats (response_format)

SipPulse AI TTS supports the following audio output formats:

  • mp3: audio/mpeg
  • opus: audio/ogg (Opus encapsulated in Ogg)
  • aac: audio/aac
  • flac: audio/flac
  • wav: audio/wav
  • pcm: audio/L16; rate=24000; channels=1 (Linear PCM, 16-bit, 24kHz, mono)

Choose the format that best fits your application's requirements in terms of quality, file size, and audio player compatibility.
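
Note that pcm is headerless raw audio, so most players cannot open it directly. A minimal sketch using Python's standard wave module to wrap raw PCM bytes in a WAV container, applying the parameters stated above (16-bit, 24 kHz, mono):

python
import wave

def pcm_to_wav(pcm_path: str, wav_path: str) -> None:
  """Wrap raw Linear PCM audio in a WAV container so standard players can open it."""
  with open(pcm_path, "rb") as f:
    pcm_data = f.read()
  with wave.open(wav_path, "wb") as wav_file:
    wav_file.setnchannels(1)       # mono, per audio/L16; channels=1
    wav_file.setsampwidth(2)       # 16-bit samples = 2 bytes each
    wav_file.setframerate(24000)   # 24 kHz, per audio/L16; rate=24000
    wav_file.writeframes(pcm_data)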

4. Integration with OpenAI SDK

For developers who prefer to use the official OpenAI SDK, SipPulse AI offers compatibility. Set the OpenAI client's baseURL to the SipPulse AI endpoint: https://api.sippulse.ai/v1/openai.

When using this integration for TTS, the SipPulse AI API will return the binary audio stream in the requested format, instead of a JSON object with download/stream links. This matches the default behavior of the OpenAI SDK for speech synthesis.

python
import os
from openai import OpenAI

# Configure the OpenAI client to use the SipPulse AI endpoint
client = OpenAI(
  api_key=os.environ.get("SIPPULSE_API_KEY"),
  base_url="https://api.sippulse.ai/v1/openai" # SipPulse AI compatibility endpoint
)

try:
  response = client.audio.speech.create(
    model="tts-1",  # OpenAI TTS model available on SipPulse AI
    voice="alloy",  # Desired voice
    input="Hello, this audio was generated using the OpenAI SDK via SipPulse AI!",
    response_format="mp3"  # Desired audio format
    # speed may be supported depending on the compatibility endpoint implementation
  )
  # The 'response' contains the audio stream.
  # You can save it to a file:
  response.stream_to_file("sippulse_openai_sdk_output.mp3")
  print("Audio generated and saved as sippulse_openai_sdk_output.mp3")

except Exception as e:
  print(f"An error occurred: {e}")
javascript
// Example usage with the OpenAI JavaScript SDK in Node.js
import OpenAI from "openai";
import fs from "fs";
import path from "path";

const openai = new OpenAI({
  apiKey: process.env.SIPPULSE_API_KEY,
  baseURL: "https://api.sippulse.ai/v1/openai"
});

async function main() {
  try {
    const response = await openai.audio.speech.create({
      model: "tts-1",
      voice: "nova",
      input: "Testing speech synthesis with the JavaScript SDK and SipPulse AI.",
      response_format: "opus"
    });

    // The SDK returns a fetch-style Response; buffer the audio and write it to disk.
    const buffer = Buffer.from(await response.arrayBuffer());
    const filePath = path.resolve("./sippulse_openai_sdk_output.opus");
    await fs.promises.writeFile(filePath, buffer);

    console.log(`Audio generated and saved as ${filePath}`);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

main();

Note: When using the OpenAI SDK, ensure that the specified model and voice are compatible with SipPulse AI's offering for the endpoint https://api.sippulse.ai/v1/openai. Parameters like speed may have specific behavior or limitations through this compatibility interface.

5. Best Practices for Speech Synthesis

  • Clear and Well-Structured Text: Provide grammatically correct and well-punctuated text for the best prosody and intelligibility.
  • Appropriate Voice and Language Selection: Use the /v1/tts/models and /v1/tts/voices endpoints to select the model and voice combination that best fits your target audience and application context.
  • Experiment with Output Formats: Test different response_format values to find the ideal balance between sound quality and file size.
  • Robust Error Handling: Implement detailed error handling in your application to deal with possible API failures.
  • Caching: For frequently synthesized texts, store the download or stream links (or the audio itself, if downloaded) to avoid repeated requests and optimize costs and latency; see the sketch below.
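
A simple way to apply the caching advice: key the cache on a hash of the text plus the synthesis parameters, and only call the API on a miss. A minimal local-disk sketch (the cache layout is illustrative, not part of the API; synthesize_speech and download_audio are the helpers from section 2.1):

python
import hashlib
import os

CACHE_DIR = "tts_cache"  # illustrative local cache directory

def cache_path(text: str, model: str, voice: str, fmt: str = "mp3") -> str:
  """Deterministic cache file path for a given text + synthesis parameters."""
  key = hashlib.sha256(f"{model}|{voice}|{fmt}|{text}".encode()).hexdigest()
  return os.path.join(CACHE_DIR, f"{key}.{fmt}")

def get_or_synthesize(text: str, model: str, voice: str, fmt: str = "mp3") -> str:
  """Return a cached audio file path, synthesizing only on a cache miss."""
  os.makedirs(CACHE_DIR, exist_ok=True)
  path = cache_path(text, model, voice, fmt)
  if not os.path.exists(path):
    result = synthesize_speech(text, model, voice, response_format=fmt)
    if result is None:
      raise RuntimeError("Speech synthesis failed")
    download_audio(result["download"], path)
  return path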

6. Frequently Asked Questions (FAQ)

Q: How are costs calculated for the TTS service?
A: Costs are typically based on the number of characters in the input field processed for synthesis. Different models and voices (especially premium or custom ones) may have different costs. See the Pricing page and your account dashboard for precise details.

Q: Can I use the download and stream links multiple times?
A: Yes, the links provided in the /v1/tts/generate API response can be used to access the generated audio. However, the long-term availability of these links may depend on SipPulse AI's storage policies. For persistent use, it is recommended to download the audio.