Audio Privacy

Reducing speech intelligibility while preserving environmental sounds

Audio monitoring has many applications but also raises privacy concerns. To help alleviate these concerns, we have developed a method that reduces the intelligibility of speech while preserving intonation and the ability to recognize most environmental sounds. Sounds indicative of possible trouble, such as breaking glass or a scream, can therefore still be identified.

Our method is based on identifying vocalic regions and replacing the vocal tract transfer function of these regions with the transfer function from prerecorded vowels, where the identity of the replacement vowel is independent of the identity of the spoken syllable. The audio signal is then re-synthesized using the original pitch and energy, but with the modified vocal tract transfer function.
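
As a rough illustration of the replacement step described above, the following sketch processes a single frame: it estimates LPC coefficients for the frame, inverse-filters the frame to obtain an excitation signal (which carries the pitch), and re-synthesizes that excitation through the LPC filter of a prerecorded vowel frame, rescaling to the original energy. This is not the published implementation. The function names, the LPC order of 10, the small regularization term, and the use of RMS matching to preserve energy are all assumptions made for illustration, and the detection of vocalic regions is not shown.

```python
import math

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns a = [1, a1, ..., ap] with A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    r = autocorrelation(frame, order)
    r[0] *= 1.0 + 1e-9           # tiny regularization (assumption of this sketch)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:           # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err           # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(frame, a):
    """FIR filter A(z): removes the spectral envelope, leaving the excitation."""
    return [sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(frame))]

def synthesize(excitation, a):
    """All-pole filter 1/A(z): imposes the envelope described by a."""
    out = []
    for n, e in enumerate(excitation):
        y = e - sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(y)
    return out

def rms(x):
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0

def replace_vocal_tract(frame, vowel_frame, order=10):
    """Re-synthesize `frame` with the LPC envelope of `vowel_frame`.

    The excitation (and hence the pitch) comes from the original frame;
    the output is rescaled so its RMS energy matches the original.
    """
    a_source = lpc(frame, order)
    a_vowel = lpc(vowel_frame, order)
    excitation = inverse_filter(frame, a_source)
    out = synthesize(excitation, a_vowel)
    out_rms = rms(out)
    gain = rms(frame) / out_rms if out_rms > 0.0 else 0.0
    return [gain * v for v in out]
```

In a complete system, a function like `replace_vocal_tract` would presumably be applied only to frames flagged as vocalic, with `vowel_frame` chosen without regard to what was actually spoken, and all other frames passed through unchanged.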

The figure shows spectrograms of the original speech (top) and of speech processed using the LPC coefficients from two other speakers saying /uw/ (bottom). The vertical axis is frequency, the horizontal axis is time, and darker colors denote higher magnitude. The words of the spoken phrase are roughly aligned along the bottom.

We performed an intelligibility study, which showed that environmental sounds remained recognizable while speech intelligibility could be reduced dramatically, to a 7% word recognition rate.

Related Publications

2008
Publication Details
  • ACM Multimedia 2008
  • Oct 27, 2008

Abstract

Audio monitoring has many applications but also raises privacy concerns. In an attempt to help alleviate these concerns, we have developed a method for reducing the intelligibility of speech while preserving intonation and the ability to recognize most environmental sounds. The method is based on identifying vocalic regions and replacing the vocal tract transfer function of these regions with the transfer function from prerecorded vowels, where the identity of the replacement vowel is independent of the identity of the spoken syllable. The audio signal is then re-synthesized using the original pitch and energy, but with the modified vocal tract transfer function. We performed an intelligibility study which showed that environmental sounds remained recognizable but speech intelligibility can be dramatically reduced to a 7% word recognition rate.