Adaptive Playout Scheduling and Loss Concealment Using Time-scale Modification

 

This page presents sound samples demonstrating the techniques described in the papers presented at ICASSP 2001 and submitted to IEEE Transactions on Multimedia. Please refer to those papers for the theoretical details.

Table of Contents

I. Time-scale Modification Using Waveform Similarity OverLap Add (WSOLA)
II. Low Delay Loss Concealment
III. Adaptive Playout Scheduling


Part I.    Time-scale Modification Using Waveform Similarity OverLap Add (WSOLA)
 

The packet playout schedule can be adjusted by modifying the speech rate. Time-scale modification scales the packet length, and with it the speech rate, even within a talkspurt. Here we show that WSOLA-based time-scale modification can change the speech rate without impairing quality or changing the pitch. The technique works well on both speech and audio.
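The following is a minimal WSOLA sketch in Python, written for illustration rather than taken from the papers. The frame length (240 samples, 30 ms at 8 kHz), the Hann window with 50% overlap, the ±80-sample search tolerance, and the function name `wsola` are all assumptions. Each output frame is copied from the input near the position implied by the scale factor, shifted within the tolerance so that it best matches (by normalized cross-correlation) the natural continuation of the previously copied frame. That similarity search is what lets the duration change while the pitch stays the same.

```python
import math

def wsola(x, alpha, frame_len=240, tolerance=80):
    """Time-scale x by alpha (output length int(alpha * len(x))), keeping pitch.

    alpha > 1 slows playback down, alpha < 1 speeds it up. Frames use a Hann
    window with 50% overlap; each analysis frame may shift by up to
    +/- tolerance samples to best match the natural continuation of the
    previously copied frame (the "waveform similarity" step).
    """
    hop = frame_len // 2
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_out = int(alpha * len(x))
    y = [0.0] * (n_out + frame_len)
    prev = 0  # input start of the previously copied frame
    k = 0
    while True:
        out_pos = k * hop
        nominal = int(round(out_pos / alpha))  # input position implied by the rate change
        if out_pos >= n_out or nominal + frame_len + tolerance > len(x):
            break  # the signal edges are only partially reconstructed in this sketch
        if k == 0:
            start = 0
        else:
            # Natural continuation: what followed the previous frame in the input.
            target = x[prev + hop:prev + hop + frame_len]
            t_energy = sum(b * b for b in target)
            start, best_score = nominal, -math.inf
            for s in range(max(0, nominal - tolerance), nominal + tolerance + 1):
                cand = x[s:s + len(target)]
                num = sum(a * b for a, b in zip(cand, target))
                den = math.sqrt(sum(a * a for a in cand) * t_energy) or 1.0
                if num / den > best_score:
                    start, best_score = s, num / den
        # Overlap-add the windowed frame taken from the chosen input position.
        for n in range(frame_len):
            y[out_pos + n] += win[n] * x[start + n]
        prev = start
        k += 1
    return y[:n_out]
```

For example, `wsola(x, 1.3)` returns a version of `x` about 30% longer and `wsola(x, 0.7)` one about 30% shorter; on a pure tone, the interior of the output keeps the original period. The plain-Python loops favor readability over speed.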

 

8 kHz samples | Original | Accelerated version (30% faster) | Slowed-down version (30% slower)
Speech
Audio

 

22 kHz audio samples | Original | Accelerated version (20% faster) | Slowed-down version (20% slower)
James Brown
U2

 

In the examples above, the speech and audio samples are scaled by a large amount to make the effect of the processing obvious. In packet voice applications, adaptively adjusting the playout schedule requires scaling only infrequently, and its effect is nearly imperceptible.
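The arithmetic behind infrequent scaling can be sketched as follows. Playing a packet of length T at scaling ratio r shifts the playout time of every later packet by T(r - 1). The helper names and the 1.5 maximum stretch below are assumptions chosen for illustration, not values from the papers.

```python
import math

def playout_shift_ms(packet_ms, ratio):
    """Change in playout time of all later packets after scaling one packet.

    ratio > 1 stretches the packet (adds delay); ratio < 1 compresses it.
    """
    return packet_ms * (ratio - 1.0)

def packets_needed(delay_change_ms, packet_ms=20.0, max_stretch=1.5):
    """Fewest packets to stretch (each by at most max_stretch) to add delay_change_ms."""
    return math.ceil(delay_change_ms / playout_shift_ms(packet_ms, max_stretch))
```

With 20 ms packets, `packets_needed(30.0)` is 3: adding 30 ms of buffering touches only three packets. The helper assumes a positive delay increase; reducing the delay works the same way with ratios below 1.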

 

Below are speech examples for three delay traces collected from the Internet, with low, medium, and high delay jitter, respectively. The speech rate is adjusted to adapt the playout schedule to the varying network delay jitter.
 

The speech samples are PCM coded, although our technique is codec independent. Packet loss is not reflected in these examples, since this part focuses on speech quality after time-scale modification; packet loss and loss concealment are addressed in Parts II and III.
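A per-packet scheduler of this kind can be sketched as below. This is not the estimator used in the papers: as an assumption, the target playout delay is the mean of the last 35 network delays plus four standard deviations (a common heuristic), and each packet's scaling ratio is clamped to [0.5, 2.0]. The class `PlayoutScheduler` and the function `schedule` are hypothetical names.

```python
import statistics
from collections import deque

class PlayoutScheduler:
    """Hypothetical per-packet scheduler: picks a time-scaling ratio for each packet."""

    def __init__(self, packet_ms=20.0, window=35, k=4.0, min_ratio=0.5, max_ratio=2.0):
        self.packet_ms = packet_ms
        self.k = k
        self.min_ratio, self.max_ratio = min_ratio, max_ratio
        self.delays = deque(maxlen=window)  # most recent network delays (ms)
        self.playout_delay = None           # delay currently applied at playout (ms)

    def target_delay(self):
        # Assumed estimator: windowed mean plus k population standard deviations.
        d = list(self.delays)
        return statistics.fmean(d) + self.k * statistics.pstdev(d)

    def on_packet(self, network_delay_ms):
        """Record one packet's delay and return its scaling ratio (1.0 = unscaled)."""
        self.delays.append(network_delay_ms)
        target = self.target_delay()
        if self.playout_delay is None:
            self.playout_delay = target
            return 1.0
        # Playing a packet at ratio r shifts all later playout times by packet_ms * (r - 1).
        wanted = 1.0 + (target - self.playout_delay) / self.packet_ms
        ratio = min(self.max_ratio, max(self.min_ratio, wanted))
        self.playout_delay += self.packet_ms * (ratio - 1.0)
        return ratio

def schedule(delays_ms, **kwargs):
    """Scaling ratio chosen for each packet of a delay trace."""
    sched = PlayoutScheduler(**kwargs)
    return [sched.on_packet(d) for d in delays_ms]
```

For a trace that holds steady at 50 ms, every ratio is 1.0; when the delay suddenly jumps, the next packet is stretched at the maximum ratio and the playout delay catches up over the following packets.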
 

Trace 1: Stanford to University of Erlangen, Germany (low jitter)

Std of network delay: 4.5 ms

Maximum jitter: 45.0 ms
 
Sound file | Scaled and original versions | Plots of playout delay and scaling ratio
Wedding | Scaled, Original | Show plots
Dream | Scaled, Original | Show plots
Notes | Scaled, Original | Show plots
Market | Scaled, Original | Show plots

Trace 2: Stanford to Chicago (medium jitter)

Std of network delay: 20.9 ms

Maximum jitter: 112.0 ms
 
Sound file | Scaled and original versions | Plots of playout delay and scaling ratio
Wedding | Scaled, Original | Show plots
Dream | Scaled, Original | Show plots
Notes | Scaled, Original | Show plots
Market | Scaled, Original | Show plots

Trace 3: Montreal, Canada to Chicago (high jitter)

Std of network delay: 43.4 ms

Maximum jitter: 243.0 ms
 
Sound file | Scaled and original versions | Plots of playout delay and scaling ratio
Wedding | Scaled, Original | Show plots
Dream | Scaled, Original | Show plots
Notes | Scaled, Original | Show plots
Market | Scaled, Original | Show plots


 


Part II.    Low Delay Loss Concealment

This part demonstrates concealment of simulated random packet loss at rates of 10% and 20% (rates that are very high for packet voice communication). Concealment uses time-scale modification, waveform repetition, and a cross-correlation search for merging segments. It introduces one packet of delay; the packet size here is 20 ms.
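The one-packet delay lets the receiver see the packet after a loss before it plays the gap. The sketch below shows only the waveform-repetition and cross-correlation parts, as a simplified stand-in for the method in the papers (which also time-scales the neighboring packets). It estimates the pitch period from the history by normalized cross-correlation, repeats the last period to fill the lost packet, and cross-fades the continued waveform into the start of the next packet. The function names, the 20 to 160 sample lag range (400 Hz down to 50 Hz at 8 kHz), and the 40-sample cross-fade are assumptions.

```python
import math

def best_lag(hist, min_lag=20, max_lag=160, win=160):
    """Lag (in samples) that maximizes the normalized cross-correlation between
    the last `win` samples of `hist` and the `win` samples `lag` earlier."""
    ref = hist[-win:]
    ref_energy = sum(a * a for a in ref)
    best, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        seg = hist[-win - lag:-lag]
        num = sum(a * b for a, b in zip(ref, seg))
        den = math.sqrt(ref_energy * sum(b * b for b in seg)) or 1.0
        if num / den > best_score:
            best, best_score = lag, num / den
    return best

def conceal(hist, next_pkt, n, overlap=40):
    """Fill a lost packet of n samples and smooth the start of the next packet.

    hist: samples played before the loss (needs at least win + max_lag samples).
    next_pkt: the packet after the loss, available thanks to the one-packet delay.
    Returns (fill, smoothed_next).
    """
    period = best_lag(hist)
    cycle = hist[-period:]
    # Repeat the last pitch cycle past the gap so it can be cross-faded.
    ext = [cycle[i % period] for i in range(n + overlap)]
    fill = ext[:n]
    smoothed = list(next_pkt)
    for i in range(min(overlap, len(next_pkt))):
        w = (i + 1) / (overlap + 1)  # linear fade-in weight for the real packet
        smoothed[i] = (1.0 - w) * ext[n + i] + w * next_pkt[i]
    return fill, smoothed
```

With 20 ms packets at 8 kHz, `n` is 160 samples, so `conceal(history, next_packet, 160)` returns the 160 fill samples and the smoothed next packet.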
 
Sound file | Original | With 10% loss | Concealed from 10% loss | With 20% loss | Concealed from 20% loss
Wedding | wav | wav | wav | wav | wav
Dream | wav | wav | wav | wav | wav
Vega | wav | wav | wav | wav | wav

A comparison with the concealment of G.723 and G.729 at 3% and 10% loss rates is available on a separate page.
 


Part III.    Adaptive Playout Scheduling

Voice packets are sent over the Internet, and delay traces are collected from networks with different delay characteristics. Both adaptive playout scheduling (Part I) and loss concealment (Part II) are applied during playout. The packet size here is 20 ms.

We compare a method that adjusts the playout schedule only between talkspurts with our method, which adjusts the schedule both between and within talkspurts using time-scale modification. The comparison is made at nearly equal buffering delay.
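The late-loss and buffering-delay figures that follow can be illustrated with a small evaluator for the between-talkspurt case, where the playout offset stays fixed within a talkspurt. This is a sketch under assumed definitions: a packet is a late loss if it arrives after its scheduled playout time, and buffering delay is taken as the mean wait of on-time packets, which may differ from the papers' exact definitions. Burst loss is not computed, and `evaluate_schedule` is a hypothetical name.

```python
def evaluate_schedule(send_ms, arrive_ms, playout_offset_ms):
    """Late-loss rate and mean buffering delay for a fixed playout offset.

    A packet sent at s is scheduled to play at s + playout_offset_ms; it is a
    late loss if it arrives after that time. Buffering delay here is the mean
    wait (playout time - arrival time) over packets that arrive in time.
    """
    late = 0
    waits = []
    for s, a in zip(send_ms, arrive_ms):
        deadline = s + playout_offset_ms
        if a > deadline:
            late += 1
        else:
            waits.append(deadline - a)
    n = len(send_ms)
    late_rate = late / n if n else 0.0
    mean_wait = sum(waits) / len(waits) if waits else 0.0
    return late_rate, mean_wait
```

For packets sent at 0, 20, 40, and 60 ms and arriving at 50, 75, 90, and 130 ms with a 60 ms offset, the last packet is late (late-loss rate 0.25) and the on-time packets wait 25/3 ms on average.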

Trace 1: Stanford to Chicago

Std of jitter: 20.0 ms; Maximum jitter: 112.0 ms
Link loss rate: 0.0%

Between-talkspurt adjustment:

Buffering Delay: 25.4 ms
Late Loss Rate: 13.0%
Burst Loss Rate: 7.3%

Playout wav file


 

Within-talkspurt adjustment:

Buffering Delay: 25.3 ms
Late Loss Rate: 7.0%
Burst Loss Rate: 0.8%

Playout wav file




Trace 2: Stanford to University of Erlangen, Germany

Std of jitter: 23.2 ms; Maximum jitter: 111.0 ms
Link loss rate: 0.0%


Between-talkspurt adjustment:

Buffering Delay: 34.1 ms
Late Loss Rate: 17.0%
Burst Loss Rate: 9.8%

Playout wav file




Within-talkspurt adjustment:

Buffering Delay: 30.9 ms
Late Loss Rate: 6.0%
Burst Loss Rate: 0.37%

Playout wav file





Page maintained by Yi Liang (yiliang@stanford.edu)