This page presents sound samples that demonstrate the theory and techniques described in the papers presented at ICASSP 2001 and submitted to IEEE Trans. Multimedia. Please refer to those papers for the theoretical details.
Table of Contents
I. Time-scale Modification Using Waveform Similarity OverLap Add (WSOLA)
II. Low Delay Loss Concealment
III. Adaptive Playout Scheduling
The packet playout schedule can be adjusted by modifying the speech rate. Time-scale modification scales the duration of individual packets, and hence the speech rate, even within a talkspurt. The samples here show that WSOLA-based time-scale modification can change the speech rate without impairing the quality or changing the pitch. The technique works well on both speech and audio.
8kHz samples (speech and audio): original, accelerated version, and slowed-down version of each.
22kHz audio samples (James Brown and U2): original, accelerated version, and slowed-down version of each.
In the examples above, the speech and audio samples are scaled by a large amount to make the effect of the processing more obvious. In packet voice applications, adaptively adjusting the packet playout schedule requires speech scaling only infrequently, and its effect is nearly imperceptible.
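For readers who want to see the idea behind WSOLA-based time-scale modification in code, the following is a minimal sketch in plain Python, not the implementation used to produce the samples on this page. The function name `wsola`, the frame length of 160 samples (20 ms at 8 kHz), the search tolerance of 40 samples, and the unnormalized cross-correlation used as the similarity measure are all assumptions made for illustration.

```python
import math

def wsola(x, rate, frame=160, tol=40):
    """Time-scale the sample list `x`: rate > 1 shortens it, rate < 1 lengthens it."""
    hop_out = frame // 2                      # 50% overlap in the output
    # Periodic Hann window: the two overlapping halves sum to 1.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    last = len(x) - frame                     # last valid segment start in the input
    n_frames = int(last / (rate * hop_out)) + 1
    out = [0.0] * ((n_frames - 1) * hop_out + frame)
    prev = 0
    for k in range(n_frames):
        start = 0
        if k > 0:
            # The segment that would naturally follow the previously copied one.
            target = min(prev + hop_out, last)
            nominal = int(round(k * rate * hop_out))
            lo, hi = max(0, nominal - tol), min(last, nominal + tol)
            # Among candidates near the nominal position, pick the one most
            # similar (largest cross-correlation) to the natural continuation.
            start = max(range(lo, hi + 1),
                        key=lambda s: sum(x[s + i] * x[target + i] for i in range(frame)))
        for i in range(frame):
            out[k * hop_out + i] += win[i] * x[start + i]
        prev = start
    return out

# Illustrative input: a 200 Hz tone sampled at 8 kHz (not one of the page's samples).
tone = [math.sin(2 * math.pi * 200 * n / 8000 + 0.3) for n in range(4000)]
faster = wsola(tone, 1.5)   # shorter output
slower = wsola(tone, 0.5)   # longer output
```

Because each output segment is copied unchanged from the input and only its position is chosen, the local waveform shape, and therefore the pitch, is preserved while the duration changes.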
A few speech examples are shown below for three delay traces
collected from the Internet, each with different delay jitter characteristics
(low, medium, and high jitter, respectively). The speech rate is adjusted to adapt
the playout schedule to the varying network delay jitter.
The speech samples are PCM coded, although our technique is
codec independent. Packet loss is not reflected in these examples, since the
focus of this part is speech quality after time-scale modification. Packet
loss and loss concealment are addressed in Parts II and III.
Trace 1: Stanford to University of Erlangen, Germany
Std of network delay: 4.5 ms
Maximum jitter: 45.0 ms
Trace 2: Stanford to Chicago
Std of network delay: 20.9 ms
Maximum jitter: 112.0 ms
Trace 3: Montreal, Canada to Chicago
Std of network delay: 43.4 ms
Maximum jitter: 243.0 ms
Part II. Low Delay Loss Concealment
This part demonstrates concealment of simulated random packet loss at rates of 10% and
20% (20% is very high in packet voice communications).
Concealment combines time-scale modification, waveform repetition, and a
cross-correlation search for merging segments. The operation introduces
a delay of one packet. The packet size here is 20 ms.
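As a rough illustration of two of the ingredients named above, waveform repetition and a cross-correlation search, the sketch below fills one lost packet by repeating the last pitch period of the preceding packet and cross-fades into the following packet. Waiting for the following packet is what costs one packet of delay. This is a simplified stand-in, not the concealment algorithm behind these samples: it omits the time-scale modification step, and the names `estimate_pitch_lag` and `conceal_lost_packet`, the lag range of 20 to 120 samples, and the 16-sample linear cross-fade are assumptions.

```python
import math

def estimate_pitch_lag(prev, min_lag=20, max_lag=120, tmpl=40):
    """Lag (in samples) at which the tail of `prev` best matches its own past.

    Requires len(prev) >= tmpl + max_lag (160 samples with the defaults).
    """
    tail = prev[-tmpl:]

    def score(lag):
        past = prev[len(prev) - tmpl - lag:len(prev) - lag]
        num = sum(a * b for a, b in zip(tail, past))
        den = math.sqrt(sum(b * b for b in past)) or 1.0   # avoid division by zero
        return num / den                                   # normalized cross-correlation

    return max(range(min_lag, max_lag + 1), key=score)

def conceal_lost_packet(prev, nxt, fade=16):
    """Return (filler, merged_next) for one lost packet between `prev` and `nxt`."""
    lag = estimate_pitch_lag(prev)
    period = prev[-lag:]                 # last pitch period of the good packet
    n = len(nxt)                         # assumes equal packet sizes
    # Repeat the pitch period across the gap and slightly into the next packet.
    extended = [period[i % lag] for i in range(n + fade)]
    # Linear cross-fade from the repeated waveform into the received packet.
    merged = list(nxt)
    for i in range(fade):
        w = (i + 1) / (fade + 1)
        merged[i] = (1 - w) * extended[n + i] + w * nxt[i]
    return extended[:n], merged
```

With 20 ms packets at 8 kHz, `prev` and `nxt` would each hold 160 samples, which satisfies the length requirement of the default lag search.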
A comparison with the concealment by G.723 and G.729 (3% and 10% loss rates) accompanies these samples.
Part III. Adaptive Playout Scheduling
Voice packets are sent over the Internet,
and delay traces are collected from networks with different delay characteristics.
Both adaptive playout scheduling (Part I) and loss concealment (Part II) are
applied during playout. The packet size here is 20 ms.
We compare a method that adjusts the playout schedule only between talkspurts with one that adjusts the schedule both between and within talkspurts using time-scale modification, under the condition that both introduce nearly the same delay. The latter method is the one we developed.
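To make the two quantities in this comparison concrete, the toy sketch below computes a buffering delay and a late loss rate for a schedule that is updated only at talkspurt starts and for one that is updated at every packet. Everything here is an assumption for illustration: the delay estimator (the maximum of the last few observed delays) is not the scheduling algorithm from the papers, the definition of buffering delay as the average waiting time of packets that arrive in time is a guess at the metric, the delay values are made up, and no attempt is made to equalize the delay of the two schemes. In a real system, changing the schedule within a talkspurt also requires time-scale modification of the speech, which this sketch leaves out.

```python
def playout_schedule(delays, talkspurt_starts, within, window=5):
    """Playout delay (ms) assigned to each packet.

    The estimate is the maximum of the last `window` observed network delays.
    It is refreshed at every packet if `within` is true, otherwise only at
    the first packet of each talkspurt.
    """
    starts = set(talkspurt_starts)
    schedule, current = [], 0.0
    for i, d in enumerate(delays):
        history = delays[max(0, i - window):i]
        if not history:
            current = d                  # bootstrap: play the first packet on arrival
        elif within or i in starts:
            current = max(history)
        schedule.append(current)
    return schedule

def evaluate(delays, schedule):
    """Return (average buffering delay in ms, late loss rate)."""
    # A packet is late if it arrives after its scheduled playout time.
    slack = [p - d for d, p in zip(delays, schedule) if d <= p]
    late_loss_rate = 1 - len(slack) / len(delays)
    buffering_delay = sum(slack) / len(slack) if slack else 0.0
    return buffering_delay, late_loss_rate

# Made-up network delays in ms: a jitter spike in the middle of one talkspurt.
delays = [10, 10, 10, 10, 30, 30, 30, 30, 10, 10]
between = evaluate(delays, playout_schedule(delays, [0], within=False))
within = evaluate(delays, playout_schedule(delays, [0], within=True))
```

In this toy trace the schedule that can move within the talkspurt follows the spike after one late packet, while the schedule fixed at the talkspurt start loses every packet in the spike.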
Trace 1: Stanford to Chicago
Std of jitter: 20.0 ms; Maximum jitter: 112.0 ms
Link loss rate: 0.0%
Between talkspurt adjustment:
Buffering Delay: 25.4 ms
Late Loss Rate: 13.0%
Burst Loss Rate: 7.3%
Within talkspurt adjustment:
Buffering Delay: 25.3 ms
Late Loss Rate: 7.0%
Burst Loss Rate: 0.8%
Std of jitter: 23.2 ms; Maximum jitter: 111.0 ms
Link loss rate: 0.0%
Between talkspurt adjustment:
Buffering Delay: 34.1 ms
Late Loss Rate: 17.0%
Burst Loss Rate: 9.8%
Within talkspurt adjustment:
Buffering Delay: 30.9 ms
Late Loss Rate: 6.0%
Burst Loss Rate: 0.37%