Speech to Text: Realtime transcription

Quickstart

Learn how to transcribe streaming audio to text in real time.

The quickest way to try real-time transcription is via the web portal — no code required.

Using the Realtime API

The Realtime API streams audio over a WebSocket connection and returns transcript results as you speak. Unlike the Batch API, results arrive continuously — within milliseconds of the spoken words.
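The client delivers each server message to whichever handler you register for that message type. As a rough illustration of this callback pattern (hypothetical names, not the SDK's actual internals), a minimal per-type dispatcher might look like:

```python
class Dispatcher:
    """Minimal sketch of a per-message-type callback dispatcher."""

    def __init__(self):
        self._handlers = {}

    def on(self, message_type):
        """Register a handler for one message type (decorator style)."""
        def register(fn):
            self._handlers.setdefault(message_type, []).append(fn)
            return fn
        return register

    def emit(self, message_type, msg):
        """Invoke every handler registered for this message type."""
        for fn in self._handlers.get(message_type, []):
            fn(msg)

dispatcher = Dispatcher()
received = []

@dispatcher.on("AddTranscript")
def handle_finals(msg):
    received.append(msg["transcript"])

dispatcher.emit("AddTranscript", {"transcript": "Hello."})
print(received)  # ['Hello.']
```

This is the shape behind the `@client.on(ServerMessageType.ADD_TRANSCRIPT)` decorators used later in this quickstart: handlers are looked up by message type as results stream in over the WebSocket.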

1. Create an API key

Create an API key in the portal, which you'll use to securely access the API. Store the key as a managed secret.

Enterprise customers may need to contact Support to obtain their API keys.
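One common way to keep the key out of your source code is to read it from an environment variable. This is a generic Python pattern, not something the SDK requires; `SPEECHMATICS_API_KEY` is an illustrative variable name, so use whatever your secret manager exposes:

```python
import os

# Read the API key from the environment instead of hard-coding it.
API_KEY = os.environ.get("SPEECHMATICS_API_KEY", "")
if not API_KEY:
    print("Set SPEECHMATICS_API_KEY before running the example")
```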

2. Install the library

Install using pip:

pip install speechmatics-rt pyaudio

pyaudio is required for microphone input in this quickstart.

3. Run the example

Replace YOUR_API_KEY with your key, then run the script.

import asyncio
from speechmatics.rt import (
    AudioEncoding, AudioFormat, AuthenticationError,
    Microphone, ServerMessageType, TranscriptResult,
    TranscriptionConfig, AsyncClient,
)

API_KEY = "YOUR_API_KEY"

# Set up config and format for transcription
audio_format = AudioFormat(
    encoding=AudioEncoding.PCM_S16LE,
    sample_rate=16000,
    chunk_size=4096,
)
config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
)

async def main():
    # Set up microphone
    mic = Microphone(
        sample_rate=audio_format.sample_rate,
        chunk_size=audio_format.chunk_size,
    )
    if not mic.start():
        print("Mic not started — please install PyAudio")
        return

    try:
        async with AsyncClient(api_key=API_KEY) as client:
            # Handle ADD_TRANSCRIPT messages (final results)
            @client.on(ServerMessageType.ADD_TRANSCRIPT)
            def handle_finals(msg):
                if final := TranscriptResult.from_message(msg).metadata.transcript:
                    print(f"[Final]: {final}")

            try:
                # Begin transcribing
                await client.start_session(
                    transcription_config=config,
                    audio_format=audio_format,
                )
                # Stream microphone audio until interrupted
                while True:
                    await client.send_audio(
                        await mic.read(
                            chunk_size=audio_format.chunk_size
                        )
                    )
            except KeyboardInterrupt:
                pass
            finally:
                mic.stop()

    except AuthenticationError as e:
        print(f"Auth error: {e}")

if __name__ == "__main__":
    asyncio.run(main())

Speak into your microphone. You should see output like:

[Final]: Hello, welcome to Speechmatics.
[Final]: This is a real-time transcription example.

Press Ctrl+C to stop.

Understanding the output

The API returns two types of transcript results. You can use either or both depending on your use case.

Type      Latency    Stability                    Best for
Final     ~0.7–2s    Definitive, never revised    Accurate transcripts, subtitles
Partial   <500ms     May be revised               Live captions, voice interfaces

Finals represent the best transcription for a span of audio and are never updated once emitted. You can tune their latency using max_delay — lower values reduce delay at a slight cost in accuracy.

Partials are emitted immediately as audio arrives and may be revised as more context is processed. A common pattern is to display partials immediately, then replace them with finals as they arrive.

To receive partials, set enable_partials=True in your TranscriptionConfig and register a handler for ADD_PARTIAL_TRANSCRIPT:

config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
    enable_partials=True,  # Enable partial transcripts
)

async with AsyncClient(api_key=API_KEY) as client:
    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def handle_partials(msg):
        if partial := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Partial]: {partial}")

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def handle_finals(msg):
        if final := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Final]: {final}")

With both handlers registered, you'll see partials arrive first, then be superseded by the final result:

[Partial]: Hello wel
[Partial]: Hello welcome to
[Final]: Hello, welcome to Speechmatics.
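The replace-partials-with-finals pattern can be kept in a small buffer that is independent of the SDK: hold the committed finals plus the latest partial, overwrite the partial on every update, and fold in the final when it arrives. A minimal sketch, with a hypothetical class name:

```python
class CaptionBuffer:
    """Track committed finals plus the latest (revisable) partial."""

    def __init__(self):
        self.finals = []   # committed text, never revised
        self.partial = ""  # latest partial, overwritten on each update

    def on_partial(self, text):
        # Replace rather than append: each partial revises the previous one
        self.partial = text

    def on_final(self, text):
        # Commit the final result; the partial it supersedes is discarded
        self.finals.append(text)
        self.partial = ""

    def display(self):
        return " ".join(self.finals + ([self.partial] if self.partial else []))

buf = CaptionBuffer()
buf.on_partial("Hello wel")
buf.on_partial("Hello welcome to")
buf.on_final("Hello, welcome to Speechmatics.")
print(buf.display())  # Hello, welcome to Speechmatics.
```

Calling `display()` after each event gives you a live caption line: the committed prefix stays stable while only the tail flickers as partials are revised.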

Next steps

Now that you have real-time transcription working, explore these features to build more powerful applications.