Abstract
This study is relevant to forensic cases where an earwitness claims to be able to remember and identify a voice, and, more generally, to voice recognition. Given that a voice may have been heard over the telephone, it is important to know the effects of the telephone on voice quality. In particular, we may ask whether the perceptual distance between two voice samples is affected by telephone transmission. In this article the effects of the telephone are tested by an experiment using 15 speakers of Standard Southern British English from the DyViS database of accent-matched young adult male speakers. For each possible pairing of speakers (including same-speaker pairs) 20 listeners heard a short speech sample from each of the two speakers and were asked to rate the distance between the two voices on a scale of 1 (very similar) to 9 (very different). The speech samples had been recorded simultaneously in both studio and telephone quality and were heard in ‘studio only’, ‘telephone only’ and ‘mixed (telephone and studio)’ pairs. Average similarity ratings across all speaker pairings for the three media conditions showed that the same pairs of voices are judged to be more similar when recorded over the telephone than at full bandwidth. When presentation qualities are mixed the voices sound more different for the listener; it is likely that the different transmission characteristics are being conflated with the voice differences. Implications for forensic cases are discussed.
Original language | English |
---|---|
Pages (from-to) | 229-246 |
Journal | International Journal of Speech, Language and the Law |
Volume | 20 |
Issue number | 2 |
DOIs | |
Publication status | Published - 2013 |
Keywords
- voice similarity
- voice line-ups
- voice parades
- earwitness identification
- telephone transmission