Demonstration and Results of Subjective Listening Tests
This page presents the speech samples used to obtain the listening test results reported in papers presented at ICASSP 2001 and submitted to IEEE Trans. Multimedia. The listening tests were carried out according to ITU-T Recommendation P.800. Each score is the average over all samples in a given condition and over all eighteen listeners.
Table of Contents
I. Audio Scaling
II. Adaptive Playout and Loss Concealment Using Time-Scale Modification
III. Modulated Noise Reference Unit (MNRU)
Part I Audio Scaling
In this part of the test, the listeners hear speech samples in groups of four sentences. Each group begins with a pair of sentences "A-B", which serves as the reference for original, unprocessed speech. The sentences "A" and "B" are short and unrelated in meaning. After a brief pause the same two sentences are repeated as "a-b", this time scaled by time-scale modification.
The listeners evaluate how good they think the second pair of sentences
"a-b" sounds relative to the first pair "A-B". They are asked to rate
the quality by giving a score from 1 to 5 based on the following definition:
| 5 | Degradation is inaudible |
| 4 | Degradation is audible but not annoying |
| 3 | Degradation is slightly annoying |
| 2 | Degradation is annoying |
| 1 | Degradation is very annoying |
The score obtained with this methodology is referred to as DMOS (Degradation
Mean Opinion Score). It is more sensitive than MOS (Mean Opinion Score) and
is often used to evaluate high-quality sound samples. Samples and test
scores received are shown below:
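The averaging described at the top of this page (over all samples in a condition and over all listeners) can be sketched as follows. This is only an illustration: the function name `mean_opinion_scores`, the tuple layout of a rating, and the example condition labels are assumptions made here, not part of the original test material.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average 1-5 ratings per test condition.

    `ratings` is an iterable of (condition, sample_id, listener_id, score)
    tuples. This layout is hypothetical; it is just one convenient way to
    record a listening test.
    """
    by_condition = defaultdict(list)
    for condition, _sample_id, _listener_id, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score!r} is outside the 1-5 scale")
        by_condition[condition].append(score)
    # One pooled mean per condition, over all samples and all listeners.
    return {cond: mean(scores) for cond, scores in by_condition.items()}


if __name__ == "__main__":
    example = [
        ("condition X", "sample 1", "listener 1", 5),
        ("condition X", "sample 1", "listener 2", 4),
        ("condition X", "sample 2", "listener 1", 4),
        ("condition X", "sample 2", "listener 2", 3),
    ]
    print(mean_opinion_scores(example))
```

The same pooling applies whether the ratings come from the degradation scale (giving DMOS) or from an absolute quality scale (giving MOS); only the meaning of the five categories differs.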
Part II Adaptive Playout and Loss Concealment Using Time-Scale Modification
In this part of the test, the listeners hear speech samples, each consisting of two sentences, whose quality varies according to the processing technique applied during playout. They are asked to rate how good each sample sounds according to their general impression.
They rate the quality of each sample by giving a score from 1 to 5
based on the following scale:
Audio samples and test results are shown below:
Algorithm 2: Adaptive scheduling between talkspurts.
Algorithm 3 (developed by us): Adaptive scheduling with jitter adaptation.
Playout time is adjusted within talkspurts using time-scale modification.
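To make "adaptive scheduling between talkspurts" concrete, here is a minimal sketch of one well-known scheme of that kind: an autoregressive estimate of network delay and its variation, with the playout delay updated only at the start of each talkspurt. This page does not state which estimator the tested Algorithm 2 uses, so the update rule, the function name `schedule_talkspurts`, and the parameters `alpha` and `beta` are all assumptions for illustration.

```python
def schedule_talkspurts(delays, starts_talkspurt, alpha=0.998, beta=4.0):
    """Return the playout delay applied to each packet.

    delays           -- measured network delay of each packet (e.g. in ms)
    starts_talkspurt -- True where a packet is the first of a talkspurt
    alpha, beta      -- smoothing factor and safety margin (assumed values)

    The playout delay changes only at talkspurt boundaries, so any
    adjustment is absorbed by the silence between talkspurts.
    """
    d_hat = float(delays[0])  # running estimate of the mean delay
    v_hat = 0.0               # running estimate of the delay variation
    playout = d_hat
    schedule = []
    for i, (delay, is_start) in enumerate(zip(delays, starts_talkspurt)):
        if is_start or i == 0:
            # Re-schedule using the history up to the previous packet.
            playout = d_hat + beta * v_hat
        schedule.append(playout)
        # Update the estimates with every received packet.
        d_hat = alpha * d_hat + (1 - alpha) * delay
        v_hat = alpha * v_hat + (1 - alpha) * abs(d_hat - delay)
    return schedule


def late_packets(delays, schedule):
    """Indices of packets that arrive after their scheduled playout time."""
    return [i for i, (d, p) in enumerate(zip(delays, schedule)) if d > p]
```

With a scheme like this, a delay spike inside a talkspurt cannot be followed until the next silence period, which is the limitation that adjusting playout time within talkspurts is meant to address.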
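Algorithm 3 adjusts playout time within talkspurts using time-scale modification, that is, by playing speech slightly slower or faster without resampling it. As a rough illustration of the idea only, the sketch below uses plain windowed overlap-add (OLA). It is not the method used to produce the samples on this page; plain OLA ignores pitch periods and can sound rough, and waveform-similarity methods such as WSOLA refine it by searching for well-aligned segments. The function name `ola_stretch` and the frame size are assumptions.

```python
import math


def ola_stretch(samples, factor, frame=256):
    """Change duration by `factor` (> 1 longer, < 1 shorter) with plain OLA.

    Frames are read every `hop_in` samples from the input and written
    every `hop_out` samples to the output; their ratio sets the scaling.
    Requires len(samples) >= frame.
    """
    if len(samples) < frame:
        raise ValueError("input is shorter than one frame")
    hop_out = frame // 2
    hop_in = hop_out / factor
    # Periodic Hann window, used for cross-fading between frames.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame)
              for i in range(frame)]
    n_frames = int((len(samples) - frame) / hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    weight = [0.0] * out_len
    for k in range(n_frames):
        src = int(round(k * hop_in))
        dst = k * hop_out
        for i in range(frame):
            out[dst + i] += samples[src + i] * window[i]
            weight[dst + i] += window[i]
    # Normalise by the summed window so the signal level is preserved.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, weight)]
```

For example, `ola_stretch(x, 1.25)` returns a signal roughly 25% longer than `x`, and `ola_stretch(x, 0.8)` one roughly 20% shorter, in both cases at the original sampling rate.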
Part III Modulated Noise Reference Unit (MNRU)
The third part of the test is not directly concerned with evaluating playout
algorithms. Its samples are MNRU samples at different SNR conditions, included
so that the results obtained in Parts I and II can be compared with those
of other tests. The listeners are asked to rate the quality of
each sample by giving a score from 1 to 5 based on the following scale:
The MOS scores received are shown below:
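For readers unfamiliar with MNRU conditions, the following sketch shows the core operation: noise whose amplitude is proportional to the speech signal is added, so that the signal-to-noise ratio Q (in dB) is the same at every signal level. The sketch assumes unit-variance Gaussian noise and omits the filtering specified for a full reference unit; the function names and the test signal are illustrative assumptions, and the Q values used for the samples on this page are not reproduced here.

```python
import math
import random


def mnru(samples, q_db, rng=None):
    """Apply signal-modulated noise: y[n] = x[n] * (1 + 10**(-Q/20) * N[n]).

    N[n] is unit-variance Gaussian noise. A lower `q_db` means more noise.
    """
    rng = rng or random.Random(0)
    gain = 10 ** (-q_db / 20)
    return [x * (1 + gain * rng.gauss(0, 1)) for x in samples]


def measured_q_db(clean, degraded):
    """Ratio of signal power to added-noise power, in dB."""
    signal = sum(x * x for x in clean)
    noise = sum((y - x) ** 2 for x, y in zip(clean, degraded))
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    # Hypothetical test signal: a 200 Hz tone sampled at 8 kHz.
    tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(16000)]
    for q in (10, 20, 30):
        print(q, round(measured_q_db(tone, mnru(tone, q)), 1))
```

Because the noise scales with the signal, silent passages stay silent, which is what distinguishes this kind of degradation from additive background noise.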