Speech to Text: Realtime transcription

Quickstart

Learn how to transcribe streaming audio to text in real time.

The quickest way to try real-time transcription is via the web portal — no code required.

Using the Realtime API

The Realtime API streams audio over a WebSocket connection and returns transcript results as you speak. Unlike the Batch API, results arrive continuously — within milliseconds of the spoken words.
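The client delivers each server message to whichever handler you register for that message type. As a rough illustration of this callback pattern (hypothetical names, not the SDK's actual internals), a minimal per-type dispatcher might look like:

```python
class Dispatcher:
    """Minimal sketch of a per-message-type callback dispatcher."""

    def __init__(self):
        self._handlers = {}

    def on(self, message_type):
        """Register a handler for one message type (decorator style)."""
        def register(fn):
            self._handlers.setdefault(message_type, []).append(fn)
            return fn
        return register

    def emit(self, message_type, msg):
        """Invoke every handler registered for this message type."""
        for fn in self._handlers.get(message_type, []):
            fn(msg)

dispatcher = Dispatcher()
received = []

@dispatcher.on("AddTranscript")
def handle_finals(msg):
    received.append(msg["transcript"])

dispatcher.emit("AddTranscript", {"transcript": "Hello."})
print(received)  # ['Hello.']
```

This is the shape behind the `@client.on(ServerMessageType.ADD_TRANSCRIPT)` decorators used later in this quickstart: handlers are looked up by message type as results stream in over the WebSocket.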

1. Create an API key

Create an API key in the portal, which you'll use to securely access the API. Store the key as a managed secret.

Enterprise customers may need to contact Support to obtain their API keys.
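One common way to keep the key out of your source code is to read it from an environment variable. This is a generic Python pattern, not something the SDK requires; `SPEECHMATICS_API_KEY` is an illustrative variable name, so use whatever your secret manager exposes:

```python
import os

# Read the API key from the environment instead of hard-coding it.
API_KEY = os.environ.get("SPEECHMATICS_API_KEY", "")
if not API_KEY:
    print("Set SPEECHMATICS_API_KEY before running the example")
```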

2. Install the library

Install using pip:

pip install speechmatics-rt pyaudio

pyaudio is required for microphone input in this quickstart.

3. Run the example

Replace YOUR_API_KEY with your key, then run the script.

import asyncio
from speechmatics.rt import (
    AudioEncoding, AudioFormat, AuthenticationError,
    Microphone, ServerMessageType, TranscriptResult,
    TranscriptionConfig, AsyncClient,
)

API_KEY = "YOUR_API_KEY"

# Set up config and format for transcription
audio_format = AudioFormat(
    encoding=AudioEncoding.PCM_S16LE,
    sample_rate=16000,
    chunk_size=4096,
)
config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
)

async def main():
    # Set up microphone
    mic = Microphone(
        sample_rate=audio_format.sample_rate,
        chunk_size=audio_format.chunk_size,
    )
    if not mic.start():
        print("Mic not started — please install PyAudio")
        return

    try:
        async with AsyncClient(api_key=API_KEY) as client:
            # Handle ADD_TRANSCRIPT messages (final results)
            @client.on(ServerMessageType.ADD_TRANSCRIPT)
            def handle_finals(msg):
                if final := TranscriptResult.from_message(msg).metadata.transcript:
                    print(f"[Final]: {final}")

            try:
                # Begin transcribing
                await client.start_session(
                    transcription_config=config,
                    audio_format=audio_format,
                )
                # Stream microphone audio until interrupted
                while True:
                    await client.send_audio(
                        await mic.read(
                            chunk_size=audio_format.chunk_size
                        )
                    )
            except KeyboardInterrupt:
                pass
            finally:
                mic.stop()

    except AuthenticationError as e:
        print(f"Auth error: {e}")

if __name__ == "__main__":
    asyncio.run(main())

Speak into your microphone. You should see output like:

[Final]: Hello, welcome to Speechmatics.
[Final]: This is a real-time transcription example.

Press Ctrl+C to stop.

Understanding the output

The API returns two types of transcript results. You can use either or both depending on your use case.

Type      Latency    Stability                    Best for
Final     ~0.7–2s    Definitive, never revised    Accurate transcripts, subtitles
Partial   <500ms     May be revised               Live captions, voice interfaces

Finals represent the best transcription for a span of audio and are never updated once emitted. You can tune their latency using max_delay — lower values reduce delay at a slight cost in accuracy.

Partials are emitted immediately as audio arrives and may be revised as more context is processed. A common pattern is to display partials immediately, then replace them with finals as they arrive.

To receive partials, set enable_partials=True in your TranscriptionConfig and register a handler for ADD_PARTIAL_TRANSCRIPT:

config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
    enable_partials=True,  # Enable partial transcripts
)

async with AsyncClient(api_key=API_KEY) as client:
    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def handle_partials(msg):
        if partial := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Partial]: {partial}")

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def handle_finals(msg):
        if final := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Final]: {final}")

With both handlers registered, you'll see partials arrive first, then be superseded by the final result:

[Partial]: Hello wel
[Partial]: Hello welcome to
[Final]: Hello, welcome to Speechmatics.
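The replace-partials-with-finals pattern can be kept in a small buffer that is independent of the SDK: hold the committed finals plus the latest partial, overwrite the partial on every update, and fold in the final when it arrives. A minimal sketch, with a hypothetical class name:

```python
class CaptionBuffer:
    """Track committed finals plus the latest (revisable) partial."""

    def __init__(self):
        self.finals = []   # committed text, never revised
        self.partial = ""  # latest partial, overwritten on each update

    def on_partial(self, text):
        # Replace rather than append: each partial revises the previous one
        self.partial = text

    def on_final(self, text):
        # Commit the final result; the partial it supersedes is discarded
        self.finals.append(text)
        self.partial = ""

    def display(self):
        return " ".join(self.finals + ([self.partial] if self.partial else []))

buf = CaptionBuffer()
buf.on_partial("Hello wel")
buf.on_partial("Hello welcome to")
buf.on_final("Hello, welcome to Speechmatics.")
print(buf.display())  # Hello, welcome to Speechmatics.
```

Calling `display()` after each event gives you a live caption line: the committed prefix stays stable while only the tail flickers as partials are revised.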

Next steps

Now that you have real-time transcription working, explore these features to build more powerful applications.