Measuring Voice Latency

Introduction

This document describes some measurements of the latency of various types of voice calls. The call types measured were:

PSTN to PSTN
PSTN to cell phone
Cell phone to PSTN
Cell phone to cell phone
OpenPhone on Windows to SIPP in loopback mode (G.711 codec)
Audio loopback device (no network or protocol stack used)

In each case, the latency was measured using CoolEdit 96 on a laptop to simultaneously record the audio fed into the microphone of the originating endpoint and the audio departing the speaker of the destination endpoint.

All audio was recorded at 48 kHz mono using 16 bits per sample.

The audio source was a WAV file containing five 1ms audio pulses at -3 dB, spaced 0.75 seconds apart. This was played on the same laptop used for recording. The original reference source can be found here.
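If the original file is unavailable, a comparable test signal can be regenerated. The following is a minimal Python sketch using only the standard library. The text specifies the pulse length, spacing, level and recording format, but not the pulse shape, so the single cycle of a 1 kHz sine used here is an assumption, as are the function names.

```python
import math
import struct
import wave

SAMPLE_RATE = 48_000   # Hz, as stated above
PULSE_COUNT = 5        # five pulses
PULSE_MS = 1.0         # each pulse lasts 1ms
SPACING_S = 0.75       # start-to-start spacing in seconds
LEVEL_DB = -3.0        # peak level relative to 16-bit full scale
TONE_HZ = 1_000.0      # ASSUMPTION: pulse shape is not given in the text


def build_pulse_train():
    """Return the test signal as a list of signed 16-bit sample values."""
    peak = round(32767 * 10 ** (LEVEL_DB / 20))
    pulse_len = int(SAMPLE_RATE * PULSE_MS / 1000)
    spacing = int(SAMPLE_RATE * SPACING_S)
    samples = [0] * (PULSE_COUNT * spacing)
    for p in range(PULSE_COUNT):
        start = p * spacing
        for n in range(pulse_len):
            phase = 2 * math.pi * TONE_HZ * n / SAMPLE_RATE
            samples[start + n] = round(peak * math.sin(phase))
    return samples


def write_wav(path, samples):
    """Write the samples as a mono, 16-bit, 48 kHz WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling write_wav("pulses.wav", build_pulse_train()) would produce a 3.75 second file; the file name is only an example.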

Methodology

Before starting the tests, a reference recording was made of the environment, which was a small office containing a large number of computer systems. No steps were taken to remove unwanted acoustic sources or to reduce echo.

This test showed there was no appreciable echo past 40ms after the start of the original tone. This was considered acceptable for testing.

The measurement methodology was to assume that any "echo" present more than 50ms after the start of the tone was a recording of the received signal. The time between the start of the direct tone and the start of the received tone was then measured using CoolEdit.
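The measurements here were made by hand in CoolEdit, but the same rule can be expressed programmatically. The sketch below is an illustration under stated assumptions, not the procedure actually used: it takes the first sample above an amplitude threshold as the direct tone, skips the 50ms guard interval, and treats the next sample above the threshold as the received tone. The threshold value, the 700ms search window (chosen to stay short of the next pulse at 0.75 seconds) and the function names are all hypothetical.

```python
def find_onset(samples, threshold, start, stop):
    """Index of the first sample in [start, stop) at or above threshold, else None."""
    for i in range(start, min(stop, len(samples))):
        if abs(samples[i]) >= threshold:
            return i
    return None


def measure_delay_ms(samples, sample_rate=48_000, threshold=2_000,
                     guard_ms=50.0, window_ms=700.0):
    """Delay between the direct tone and the received tone, in milliseconds.

    Returns None if no direct tone is found, or if nothing above the
    threshold appears between guard_ms and window_ms after it.
    """
    direct = find_onset(samples, threshold, 0, len(samples))
    if direct is None:
        return None
    guard = int(sample_rate * guard_ms / 1000)     # ignore room echo
    window = int(sample_rate * window_ms / 1000)   # stop before the next pulse
    received = find_onset(samples, threshold, direct + guard, direct + window)
    if received is None:
        return None
    return (received - direct) * 1000.0 / sample_rate
```

A recording in which nothing crosses the threshold inside the window returns None, which corresponds to a "no appreciable echo" result.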

Results

The recorded WAV files can be found here.

The results were as follows:

1. PSTN to PSTN: no appreciable echo could be found.

2. PSTN to cell: 135ms delay between signals

3. Cell to cell: 270ms delay between signals

4. OpenPhone to SIPP: 260ms delay between signals

5. Internal audio loopback (no SIP protocol): 180ms delay between signals

Analysis

The reference tests (1 through 3) are consistent with real-life experience.

Test #5 shows that the latency within the audio device being used is approximately 180ms. This is atrocious, but PC sound cards and Vista are designed for high-quality audio streaming, not interactive voice applications.

If the audio device latency from test #5 is subtracted from the result of test #4 (260ms minus 180ms), it appears that the latency within Opal is around 80ms. This is expected given the requirements of the jitter buffer: any less and the audio could break up if the network traffic started jittering.
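The subtraction can be written out explicitly. This small Python sketch assumes that the device latency and the remaining latency simply add, which is the same assumption the estimate above relies on; the constant and function names are illustrative only.

```python
OPENPHONE_TO_SIPP_MS = 260   # test #4 result
AUDIO_LOOPBACK_MS = 180      # test #5 result


def residual_latency_ms(end_to_end_ms, device_ms):
    """Latency left after removing the audio device's share.

    ASSUMPTION: the two contributions are independent and additive.
    """
    return end_to_end_ms - device_ms


opal_ms = residual_latency_ms(OPENPHONE_TO_SIPP_MS, AUDIO_LOOPBACK_MS)
print(opal_ms)  # prints 80
```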