Demonstration and Results of Subjective Listening Tests

This page presents the speech samples used to obtain the listening test results reported in papers presented at ICASSP 2001 and submitted to IEEE Trans. Multimedia. The listening tests were carried out according to ITU-T Recommendation P.800. Each score is the average over all samples in a given condition and over all eighteen listeners.

Table of Contents

I. Audio Scaling
II. Adaptive Playout and Loss Concealment Using Time-Scale Modification
III. Modulated Noise Reference Unit (MNRU)


Part I     Audio Scaling

In this part of the test, the listeners hear speech samples in groups of four sentences. Each group begins with a pair of sentences, "A-B", that serves as a reference for the original, unprocessed speech. Sentences "A" and "B" are short and unrelated in meaning. After a brief pause, the same sentences are repeated as the pair "a-b"; this time, however, the samples are scaled by time-scale modification.

The listeners rate how good the second pair of sentences, "a-b", sounds relative to the first pair, "A-B". They give a score from 1 to 5 based on the following definition:
 

| Quality of Speech | Score |
| --- | --- |
| Degradation is inaudible | 5 |
| Degradation is audible but not annoying | 4 |
| Degradation is slightly annoying | 3 |
| Degradation is annoying | 2 |
| Degradation is very annoying | 1 |

 

The score obtained with this methodology is referred to as the DMOS (Degradation Mean Opinion Score). It is more sensitive than the MOS (Mean Opinion Score) and is often used to evaluate high-quality sound samples. The samples and the test scores received are shown below:
 

| Network Condition | STD of Network Delay (ms) | Maximum Jitter (ms) | STD of End-to-end Delay (ms) | Maximum Packet Scaling Ratio | Minimum Packet Scaling Ratio | Percentage of Packets Scaled (%) | Test Samples | DMOS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 19.6 | 86.0 | 7.5 | 1.7 | 0.55 | 17.8 | 1 2 3 4 5 6 | 4.7 |
| 2 | 20.9 | 112.0 | 10.5 | 2.3 | 0.38 | 18.4 | 1 2 3 4 5 6 | 4.5 |
| 3 | 65.0 | 238.0 | 28.2 | 2.1 | 0.35 | 24.1 | 1 2 3 4 5 6 | 4.6 |
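
As stated at the top of the page, each score is an average over all samples in a condition and over all listeners. The following is a minimal sketch of that averaging, not the authors' actual tooling. The function name `mean_opinion_score` and the ratings are hypothetical: only three listeners are shown for brevity, whereas the real test used eighteen.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average 1-to-5 ratings over all listeners and all samples of one condition.

    ratings: one inner list per listener, holding one rating per sample.
    """
    all_scores = [score for listener in ratings for score in listener]
    if any(not 1 <= score <= 5 for score in all_scores):
        raise ValueError("ratings must lie on the 1 to 5 scale")
    return round(mean(all_scores), 1)


# Hypothetical ratings: 3 listeners x 6 samples for a single network condition.
example_ratings = [
    [5, 4, 5, 5, 4, 5],
    [5, 5, 4, 5, 5, 4],
    [4, 5, 5, 5, 5, 5],
]
print(mean_opinion_score(example_ratings))  # 85 / 18 rounds to 4.7
```

The same averaging applies to the MOS values in Parts II and III; only the rating scale shown to the listeners differs.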


Part II    Adaptive Playout and Loss Concealment Using Time-Scale Modification

In this part of the test, the listeners hear speech samples of varying quality, each consisting of two sentences. The differences in quality result from the different processing techniques applied during playout. The listeners rate how good each sample sounds according to their general impression.

They rate the quality of each sample with a score from 1 to 5 based on the following scale:
 

| Quality of Speech | Score |
| --- | --- |
| Excellent | 5 |
| Good | 4 |
| Fair | 3 |
| Poor | 2 |
| Bad | 1 |

 

Audio samples and test results are shown below:

Algorithm 2: Adaptive scheduling between talkspurts.
Algorithm 3 (developed by us): Adaptive scheduling with jitter adaptation. The playout time is adjusted within talkspurts using time-scale modification.

 

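| Trace | STD of Network Delay (ms) | Maximum Jitter (ms) | Link Loss Rate (%) | Playout Algorithm | STD of End-to-end Delay (ms) | Buffering Delay (ms) | Total Loss Rate (%) | Burst Loss Rate (%) | Test Samples | MOS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Stanford -> Chicago | 23.7 | 130.0 | 0 | Alg. 2 | 0 | 55.1 | 8.5 | 6.0 | 1 2 3 4 | 2.6 |
| Stanford -> Chicago | 23.7 | 130.0 | 0 | Alg. 3 | 7.6 | 54.6 | 2.8 | 0.7 | 1 2 3 4 | 3.7 |
| Stanford -> Germany | 15.9 | 86.0 | 8.3 | Alg. 2 | 0 | 26.6 | 13.3 | 4.1 | 1 2 3 4 | 2.4 |
| Stanford -> Germany | 15.9 | 86.0 | 8.3 | Alg. 3 | 8.5 | 26.1 | 8.3 | 0 | 1 2 3 4 | 2.8 |
| Stanford -> MIT | 5.9 | 39.0 | 0 | Alg. 2 | 0 | 23.1 | 2.6 | 0.6 | 1 2 3 4 | 3.3 |
| Stanford -> MIT | 5.9 | 39.0 | 0 | Alg. 3 | 2.6 | 23.0 | 0.3 | 0 | 1 2 3 4 | 4.3 |
| Stanford -> China | 13.7 | 47.0 | 0 | Alg. 2 | 0 | 28.8 | 5.1 | 3.9 | 1 2 3 4 | 3.0 |
| Stanford -> China | 13.7 | 47.0 | 0 | Alg. 3 | 7.4 | 25.7 | 1.1 | 0 | 1 2 3 4 | 4.1 |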
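
To read the Part II results at a glance, the MOS difference between the two playout algorithms can be computed per trace. The sketch below only does arithmetic on the MOS values listed for each trace. The helper name `mos_gains` is hypothetical, and the differences are a reader's summary rather than figures reported by the authors.

```python
# MOS values copied from the Part II results: trace -> (Alg. 2, Alg. 3).
mos = {
    "Stanford -> Chicago": (2.6, 3.7),
    "Stanford -> Germany": (2.4, 2.8),
    "Stanford -> MIT": (3.3, 4.3),
    "Stanford -> China": (3.0, 4.1),
}


def mos_gains(table):
    """Return the MOS difference (Alg. 3 minus Alg. 2) for every trace."""
    return {
        trace: round(alg3 - alg2, 1)  # rounding hides floating-point noise
        for trace, (alg2, alg3) in table.items()
    }


gains = mos_gains(mos)
for trace, gain in gains.items():
    print(f"{trace}: +{gain:.1f}")
```

On these numbers Algorithm 3 scores higher on every trace, with the smallest difference on the Stanford -> Germany trace, the only one with a non-zero link loss rate.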

 


Part III    Test of Modulated Noise Reference Unit (MNRU)

The third part of the test does not evaluate playout algorithms directly. Its samples are MNRU samples at different SNR conditions, included so that the results obtained in Parts I and II can be compared with those of other tests. The listeners are asked to rate the quality of each sample with a score from 1 to 5 based on the following scale:
 
 

| Quality of Speech | Score |
| --- | --- |
| Excellent | 5 |
| Good | 4 |
| Fair | 3 |
| Poor | 2 |
| Bad | 1 |

 

The MOS scores received are shown below:
 
| SNR (dB) | Test Samples | MOS |
| --- | --- | --- |
| Original Speech | 1 2 3 4 | 4.4 |
| 10 | 1 2 3 4 | 1.4 |
| 18 | 1 2 3 4 | 2.7 |
| 24 | 1 2 3 4 | 3.7 |
| 30 | 1 2 3 4 | 4.1 |
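
One way to use these MNRU anchors for comparison with other tests is to express a MOS as an "equivalent SNR" read off the MNRU curve. The sketch below assumes piecewise-linear interpolation between the four measured SNR conditions. The page does not say how the comparison is actually made, so both the interpolation and the function name `equivalent_snr` are assumptions for illustration.

```python
# MNRU anchor points from the Part III results: (SNR in dB, MOS).
MNRU_ANCHORS = [(10, 1.4), (18, 2.7), (24, 3.7), (30, 4.1)]


def equivalent_snr(mos, anchors=MNRU_ANCHORS):
    """Interpolate linearly between MNRU anchors to get the SNR (dB) for a MOS.

    Returns None when the MOS lies outside the range covered by the anchors.
    """
    for (snr_lo, mos_lo), (snr_hi, mos_hi) in zip(anchors, anchors[1:]):
        if mos_lo <= mos <= mos_hi:
            frac = (mos - mos_lo) / (mos_hi - mos_lo)
            return snr_lo + frac * (snr_hi - snr_lo)
    return None


print(equivalent_snr(3.7))  # 24.0, since 3.7 is the MOS measured at 24 dB
print(equivalent_snr(2.6))  # falls between the 10 dB and 18 dB anchors
print(equivalent_snr(4.3))  # None: above the MOS of the 30 dB anchor
```

Under this assumption, a Part II score of 3.7 corresponds to MNRU speech at 24 dB, while scores above 4.1 cannot be mapped because they exceed the highest MNRU condition tested.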