# Stereo Call Transcription with Diarization
When you record calls in stereo format—with each speaker on a separate audio channel—you can achieve 100% accurate speaker identification. This guide shows you how to transcribe stereo recordings, identify speakers automatically, and extract structured insights from your calls.
## Why Stereo Diarization?
| Aspect | Standard Diarization | Stereo Diarization |
|---|---|---|
| Speaker identification | AI-based detection | Channel-based (L/R) |
| Accuracy | Good | Perfect (100%) |
| Performance | Normal | Faster |
| Speaker labels | SPEAKER 1, SPEAKER 2... | SPEAKER_L, SPEAKER_R |
| Best for | Mono audio, meetings | Call center recordings |
### Ideal for Call Centers
Most PBX systems (FreeSWITCH, Asterisk) can record calls in stereo with each party on a separate channel. This eliminates any guesswork in speaker identification.
## Prerequisites
- Stereo audio file: MP3, WAV, or other supported format with speakers on separate channels
- SipPulse AI API key: get yours at sippulse.ai
- Pro model access: `pulse-precision-pro`
## Stereo Diarization Model
The `pulse-precision-pro` model supports both stereo and mono diarization:

| Model | Speed | Accuracy | Best For |
|---|---|---|---|
| `pulse-precision-pro` | Optimal | Highest | Quality-critical stereo call transcriptions |
### Pro Model Features
The `pulse-precision-pro` model includes advanced features:
- Stereo diarization: 100% accurate channel-based speaker identification
- VAD preset: use `vad_preset=telephony` for optimized 8kHz narrow-band audio
- Highest accuracy: best Word Error Rate (WER) for call center analytics
## Step 1: Prepare Your Audio
For stereo diarization to work correctly, your audio must have:
- Left channel (L): One speaker (e.g., the customer)
- Right channel (R): Other speaker (e.g., the agent)
### Recording Configuration
Most PBX systems support stereo recording:
- FreeSWITCH: use `RECORD_STEREO=true` in your dialplan
- Asterisk: configure `MixMonitor` with the `D` option for stereo
### Channel Consistency
Ensure consistent channel assignment across recordings. Document whether customers are always on the left or right channel for accurate analysis.
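As a quick pre-flight check, you can verify locally that a recording really has two channels before uploading it. Below is a minimal sketch using only Python's standard `wave` module, so it covers WAV files only; the helper name is ours:

```python
import wave

def assert_stereo(path: str) -> None:
    """Raise if the WAV recording does not have exactly two channels."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
    if channels != 2:
        raise ValueError(
            f"{path} has {channels} channel(s); stereo diarization needs 2"
        )
```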
## Step 2: Transcribe with Stereo Diarization
Use the `/v1/asr/transcribe` endpoint with `response_format=stereo_diarization` and one of the Pro models.
```bash
curl -X POST 'https://api.sippulse.ai/v1/asr/transcribe' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -F 'file=@call-recording.mp3' \
  -F 'model=pulse-precision-pro' \
  -F 'response_format=stereo_diarization' \
  -F 'language=en' \
  -F 'vad_preset=telephony'
```

```typescript
import fs from "fs";
import path from "path";

async function transcribeStereoCall(
  filePath: string
): Promise<StereoTranscription> {
  // Node 18+ ships a native FormData/Blob that the built-in fetch accepts.
  const form = new FormData();
  form.append(
    "file",
    new Blob([fs.readFileSync(filePath)]),
    path.basename(filePath)
  );
  form.append("model", "pulse-precision-pro");
  form.append("response_format", "stereo_diarization");
  form.append("language", "en");
  form.append("vad_preset", "telephony"); // Optimized for phone calls

  const response = await fetch("https://api.sippulse.ai/v1/asr/transcribe", {
    method: "POST",
    headers: {
      "api-key": process.env.SIPPULSE_API_KEY!,
    },
    body: form,
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  return response.json();
}

interface StereoTranscription {
  text: string;
  segments: Array<{
    speaker: "SPEAKER_L" | "SPEAKER_R";
    text: string;
    initial_time: number;
    end_time: number;
  }>;
  words: Array<{
    word: string;
    speaker: "SPEAKER_L" | "SPEAKER_R";
    start: number;
    end: number;
  }>;
}
```

```python
import os
import requests
def transcribe_stereo_call(file_path: str) -> dict:
    """
    Transcribe a stereo call recording with speaker diarization.

    Args:
        file_path: Path to the stereo audio file

    Returns:
        Transcription with speaker-labeled segments and words
    """
    with open(file_path, "rb") as audio_file:
        response = requests.post(
            "https://api.sippulse.ai/v1/asr/transcribe",
            headers={"api-key": os.getenv("SIPPULSE_API_KEY")},
            files={"file": audio_file},
            data={
                "model": "pulse-precision-pro",
                "response_format": "stereo_diarization",
                "language": "en",
                "vad_preset": "telephony",  # Optimized for phone calls
            },
        )
    response.raise_for_status()
    return response.json()
```

## Step 3: Understand the Response
The stereo diarization response includes three main components:
### Response Structure

```json
{
  "text": "00:02-00:05 | SPEAKER L:\nHello, how can I help you today?\n\n00:05-00:08 | SPEAKER R:\nHi, I'm calling about my account...",
  "segments": [
    {
      "speaker": "SPEAKER_L",
      "text": "Hello, how can I help you today?",
      "initial_time": 2.1,
      "end_time": 5.3
    },
    {
      "speaker": "SPEAKER_R",
      "text": "Hi, I'm calling about my account...",
      "initial_time": 5.5,
      "end_time": 8.2
    }
  ],
  "words": [
    {
      "word": "Hello,",
      "speaker": "SPEAKER_L",
      "start": 2.1,
      "end": 2.5
    },
    {
      "word": "how",
      "speaker": "SPEAKER_L",
      "start": 2.5,
      "end": 2.7
    }
  ]
}
```

### Key Fields
| Field | Description |
|---|---|
| `text` | Formatted transcript with timestamps and speaker labels |
| `segments` | Array of speech segments with speaker, text, and timing |
| `words` | Word-level timestamps with speaker attribution |
| `speaker` | `SPEAKER_L` (left channel) or `SPEAKER_R` (right channel) |
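Because the channel determines the speaker, mapping the raw labels to business roles is a simple lookup in your application. Below is a minimal sketch with a hypothetical helper, assuming the agent is always on the left channel (swap the mapping if your PBX assigns channels the other way):

```python
# Assumed channel layout: agent on the left, customer on the right.
ROLE_BY_SPEAKER = {"SPEAKER_L": "Agent", "SPEAKER_R": "Customer"}

def format_transcript(transcription: dict) -> str:
    """Render segments as 'start-end Role: text' lines."""
    lines = []
    for segment in transcription["segments"]:
        role = ROLE_BY_SPEAKER.get(segment["speaker"], segment["speaker"])
        lines.append(
            f"{segment['initial_time']:.1f}-{segment['end_time']:.1f}s "
            f"{role}: {segment['text']}"
        )
    return "\n".join(lines)
```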
## Step 4: Analyze with Structured Analysis
After transcription, use Structured Analysis to extract insights from the conversation.
### Setting Up Your Analysis
1. Navigate to Structured Analysis in the SipPulse AI dashboard
2. Create a new analysis or select an existing preset like "Conversation Analysis"
3. Configure your schema with the fields you want to extract
4. Copy the Analysis ID using the copy button next to the analysis name
The Conversation Analysis template extracts:
- Total questions and response rate
- Whether the call achieved its goal (sale, resolution, etc.)
- Customer interest level (0-1)
- Main objections and if they were resolved
- Service tone and empathy level
- Overall success score (0-1)
- Recommendations for improvement
### Execute Analysis via API
Use the copied Analysis ID to execute the analysis programmatically:
```typescript
interface ConversationAnalysis {
  total_questions: number;
  questions_answered: string[];
  response_rate: number;
  sale_completed: boolean;
  client_interest: number;
  main_objections: string[];
  objections_resolved: boolean;
  service_tone: string;
  empathy_level: number;
  overall_score: number;
  recommendations: string[];
  next_steps: string;
}

async function analyzeConversation(
  analysisId: string,
  transcriptionText: string
): Promise<ConversationAnalysis> {
  const response = await fetch(
    `https://api.sippulse.ai/v1/structured-analyses/${analysisId}/execute`,
    {
      method: "POST",
      headers: {
        "api-key": process.env.SIPPULSE_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        content: transcriptionText,
      }),
    }
  );

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const result = await response.json();
  return result.content;
}
```

```python
import os
import requests
from typing import TypedDict
class ConversationAnalysis(TypedDict):
    total_questions: int
    questions_answered: list[str]
    response_rate: float
    sale_completed: bool
    client_interest: float
    main_objections: list[str]
    objections_resolved: bool
    service_tone: str
    empathy_level: float
    overall_score: float
    recommendations: list[str]
    next_steps: str

def analyze_conversation(
    analysis_id: str,
    transcription_text: str,
) -> ConversationAnalysis:
    """
    Analyze a call transcription using Structured Analysis.

    Args:
        analysis_id: ID of the Conversation Analysis preset
        transcription_text: The formatted transcription text

    Returns:
        Structured analysis results
    """
    response = requests.post(
        f"https://api.sippulse.ai/v1/structured-analyses/{analysis_id}/execute",
        headers={
            "api-key": os.getenv("SIPPULSE_API_KEY"),
            "Content-Type": "application/json",
        },
        json={"content": transcription_text},
    )
    response.raise_for_status()
    return response.json()["content"]
```

## Complete Example: End-to-End Pipeline
Here's a complete example that transcribes a stereo call and analyzes it:
```typescript
import fs from "fs";
import path from "path";

async function processCallRecording(audioPath: string, analysisId: string) {
  // Step 1: Transcribe with stereo diarization
  console.log("Transcribing audio...");
  const form = new FormData(); // Native FormData/Blob (Node 18+)
  form.append(
    "file",
    new Blob([fs.readFileSync(audioPath)]),
    path.basename(audioPath)
  );
  form.append("model", "pulse-precision-pro");
  form.append("response_format", "stereo_diarization");
  form.append("language", "en");
  form.append("vad_preset", "telephony");

  const transcribeResponse = await fetch(
    "https://api.sippulse.ai/v1/asr/transcribe",
    {
      method: "POST",
      headers: { "api-key": process.env.SIPPULSE_API_KEY! },
      body: form,
    }
  );
  if (!transcribeResponse.ok) {
    throw new Error(`Transcription failed: ${transcribeResponse.status}`);
  }
  const transcription = await transcribeResponse.json();
  console.log(`Transcribed ${transcription.segments.length} segments`);

  // Step 2: Analyze the conversation
  console.log("Analyzing conversation...");
  const analyzeResponse = await fetch(
    `https://api.sippulse.ai/v1/structured-analyses/${analysisId}/execute`,
    {
      method: "POST",
      headers: {
        "api-key": process.env.SIPPULSE_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ content: transcription.text }),
    }
  );
  if (!analyzeResponse.ok) {
    throw new Error(`Analysis failed: ${analyzeResponse.status}`);
  }
  const analysis = await analyzeResponse.json();

  // Step 3: Return combined results
  return {
    transcription: {
      text: transcription.text,
      segments: transcription.segments,
      speakerCount: 2,
    },
    analysis: analysis.content,
  };
}

// Usage - get analysisId from the dashboard by clicking the copy button
const result = await processCallRecording(
  "./recordings/support-call.mp3",
  "sa_abc123def456" // Your Analysis ID from the dashboard
);
console.log("Overall Score:", result.analysis.overall_score);
console.log("Customer Interest:", result.analysis.client_interest);
console.log("Recommendations:", result.analysis.recommendations);
```

```python
import os
import requests
def process_call_recording(audio_path: str, analysis_id: str) -> dict:
    """
    Complete pipeline: transcribe stereo call and analyze it.

    Args:
        audio_path: Path to the stereo audio file
        analysis_id: ID of the Conversation Analysis preset (copy from dashboard)

    Returns:
        Combined transcription and analysis results
    """
    api_key = os.getenv("SIPPULSE_API_KEY")

    # Step 1: Transcribe with stereo diarization
    print("Transcribing audio...")
    with open(audio_path, "rb") as audio_file:
        transcribe_response = requests.post(
            "https://api.sippulse.ai/v1/asr/transcribe",
            headers={"api-key": api_key},
            files={"file": audio_file},
            data={
                "model": "pulse-precision-pro",
                "response_format": "stereo_diarization",
                "language": "en",
                "vad_preset": "telephony",
            },
        )
    transcribe_response.raise_for_status()
    transcription = transcribe_response.json()
    print(f"Transcribed {len(transcription['segments'])} segments")

    # Step 2: Analyze the conversation
    print("Analyzing conversation...")
    analyze_response = requests.post(
        f"https://api.sippulse.ai/v1/structured-analyses/{analysis_id}/execute",
        headers={
            "api-key": api_key,
            "Content-Type": "application/json",
        },
        json={"content": transcription["text"]},
    )
    analyze_response.raise_for_status()
    analysis = analyze_response.json()

    # Step 3: Return combined results
    return {
        "transcription": {
            "text": transcription["text"],
            "segments": transcription["segments"],
            "speaker_count": 2,
        },
        "analysis": analysis["content"],
    }

if __name__ == "__main__":
    # Get analysis_id from the dashboard by clicking the copy button
    result = process_call_recording(
        "./recordings/support-call.mp3",
        "sa_abc123def456",  # Your Analysis ID from the dashboard
    )
    print(f"Overall Score: {result['analysis']['overall_score']}")
    print(f"Customer Interest: {result['analysis']['client_interest']}")
    print(f"Recommendations: {result['analysis']['recommendations']}")
```

## Example Output
Here's what the complete response looks like for a support call:
```json
{
  "transcription": {
    "text": "00:00-00:03 | SPEAKER L:\nThank you for calling TechSupport, my name is Sarah. How can I help you today?\n\n00:03-00:09 | SPEAKER R:\nHi Sarah, I'm having trouble logging into my account. It keeps saying my password is incorrect, but I'm sure I'm typing it right.\n\n00:09-00:15 | SPEAKER L:\nI'm sorry to hear that. Let me help you with that. Can I have your email address associated with the account?\n\n00:15-00:18 | SPEAKER R:\nSure, it's john.smith@email.com.\n\n00:18-00:25 | SPEAKER L:\nThank you, John. I can see your account here. It looks like there were several failed login attempts, so the account was temporarily locked for security.\n\n00:25-00:28 | SPEAKER R:\nOh, that explains it. How can I unlock it?\n\n00:28-00:38 | SPEAKER L:\nI can unlock it for you right now. I'll also send a password reset link to your email. You should receive it within the next few minutes. Is there anything else I can help you with?\n\n00:38-00:42 | SPEAKER R:\nNo, that's all I needed. Thank you so much for your help, Sarah!\n\n00:42-00:45 | SPEAKER L:\nYou're welcome, John! Have a great day!",
    "segments": [
      {
        "speaker": "SPEAKER_L",
        "text": "Thank you for calling TechSupport, my name is Sarah. How can I help you today?",
        "initial_time": 0.0,
        "end_time": 3.2
      },
      {
        "speaker": "SPEAKER_R",
        "text": "Hi Sarah, I'm having trouble logging into my account. It keeps saying my password is incorrect, but I'm sure I'm typing it right.",
        "initial_time": 3.5,
        "end_time": 9.1
      },
      {
        "speaker": "SPEAKER_L",
        "text": "I'm sorry to hear that. Let me help you with that. Can I have your email address associated with the account?",
        "initial_time": 9.4,
        "end_time": 15.0
      },
      {
        "speaker": "SPEAKER_R",
        "text": "Sure, it's john.smith@email.com.",
        "initial_time": 15.2,
        "end_time": 18.0
      },
      {
        "speaker": "SPEAKER_L",
        "text": "Thank you, John. I can see your account here. It looks like there were several failed login attempts, so the account was temporarily locked for security.",
        "initial_time": 18.3,
        "end_time": 25.5
      },
      {
        "speaker": "SPEAKER_R",
        "text": "Oh, that explains it. How can I unlock it?",
        "initial_time": 25.8,
        "end_time": 28.2
      },
      {
        "speaker": "SPEAKER_L",
        "text": "I can unlock it for you right now. I'll also send a password reset link to your email. You should receive it within the next few minutes. Is there anything else I can help you with?",
        "initial_time": 28.5,
        "end_time": 38.0
      },
      {
        "speaker": "SPEAKER_R",
        "text": "No, that's all I needed. Thank you so much for your help, Sarah!",
        "initial_time": 38.3,
        "end_time": 42.0
      },
      {
        "speaker": "SPEAKER_L",
        "text": "You're welcome, John! Have a great day!",
        "initial_time": 42.2,
        "end_time": 45.0
      }
    ],
    "speaker_count": 2
  },
  "analysis": {
    "total_questions": 3,
    "questions_answered": [
      "How can I help you today?",
      "Can I have your email address?",
      "Is there anything else I can help you with?"
    ],
    "response_rate": 1.0,
    "sale_completed": false,
    "client_interest": 0.75,
    "main_objections": [],
    "objections_resolved": true,
    "service_tone": "professional, friendly, and empathetic",
    "empathy_level": 0.9,
    "overall_score": 0.92,
    "recommendations": [
      "Consider proactively offering account security tips",
      "Could mention estimated time for password reset email"
    ],
    "next_steps": "Customer will receive password reset email and regain account access"
  }
}
```

## Best Practices
### Audio Quality
- Sample rate: 16kHz or higher for best results (8kHz telephony audio is also supported)
- Bit depth: 16-bit minimum
- Format: MP3 or WAV both work well
### Channel Assignment
- Be consistent: always assign the same party to the same channel
- Document it: note whether agents are on L or R in your configuration
- Map speakers: in your application, map `SPEAKER_L`/`SPEAKER_R` to meaningful labels (Agent/Customer)
### Choosing the Right Approach
| Scenario | Recommended Model | Response Format |
|---|---|---|
| Stereo call recordings | `pulse-precision-pro` | `stereo_diarization` |
| Mono recordings with multiple speakers | `pulse-precision-pro` | `diarization` |
### Performance Tips
- Use `vad_preset=telephony`: optimized for phone call audio characteristics
- Batch processing: for large volumes, process files in parallel (see the sketch after this list)
- Combine with anonymization: add `anonymize=true` to remove PII automatically
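For the batch-processing tip, here is a minimal sketch using the standard library's thread pool; it reuses the `process_call_recording` function from the complete example above, and the worker count is an assumption to adjust for your rate limits:

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(
    paths: list[str], analysis_id: str, workers: int = 4
) -> list[dict]:
    """Transcribe and analyze several recordings concurrently.

    The work is I/O-bound (HTTP requests), so threads suffice; tune
    `workers` to stay within your account's rate limits.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_call_recording, path, analysis_id)
            for path in paths
        ]
        return [future.result() for future in futures]
```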
## Next Steps
- Structured Analysis - Create custom analysis schemas
- Speech-to-Text Models - Explore all STT options
- Advanced Call Analysis - Multi-step processing pipeline
- Request Tracking - Monitor API usage
