IIIT Hyderabad Publications
Single Frequency Filtering for Processing Degraded Speech

Author: Aneeja G
Date: 2018-11-27
Report no: IIIT/TH/2018/92
Advisor: B Yegnanarayana

Abstract

This thesis proposes new signal processing methods to highlight some robust speech-specific features present in degraded speech. It considers different types of degradations that occur in practice. The signal processing methods are based on single frequency filtering (SFF) of the speech signal. The SFF output gives the magnitude (envelope) and the phase of the speech signal at any desired frequency with high frequency resolution. The SFF output at each frequency contains some segments with high signal-to-noise ratio (SNR), because the noise power in the near-zero-bandwidth resonator at a single frequency is very small, whereas the signal component, if it exists, has high power. The high SNR regions therefore occur at different times for different frequencies. This property of the SFF analysis of speech is exploited for extracting a few robust features from degraded speech, irrespective of the type and extent of degradation.

In particular, the following studies are carried out in this thesis:
- Discrimination of speech/nonspeech regions in degraded speech
- Determination of speech regions in speech degraded by transient noise
- Extraction of the fundamental frequency from degraded speech
- Detection of glottal closure instants (GCIs) in degraded speech
- Enhancement of degraded speech

The major contributions of this work are the following:
(a) A new signal processing method called single frequency filtering (SFF) is proposed, which gives high SNR regions in both the time and frequency domains for speech affected by different types of degradations.
(b) A new method for speech/nonspeech detection is proposed, exploiting the high SNR features in the SFF outputs of degraded speech. The procedure works for all types of degradations, without tuning for any specific type of degradation.
(c) The high SNR characteristic of the SFF output is also exploited for estimating the fundamental frequency (f_o), using information at the frequency that gives the highest SNR for each segment.
(d) The noise compensation technique proposed for voice activity detection (VAD) is applied for extracting the location of the significant impulse-like excitation within a glottal cycle. This is possible because the noise compensated envelopes show distinct changes in the slope of the spectral variance computed as a function of time.
(e) The noise compensated SFF envelopes derived at different frequency resolutions are used to derive gross and fine weight functions, as a function of time. The combined weight functions, when applied to the degraded speech signal, produce enhanced speech for speech affected by different types of degradations, thus improving the comfort level of listening.

Full thesis: pdf
Centre: Language Technologies Research Centre
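The abstract describes SFF as giving the envelope and phase of the signal at a single chosen frequency through a near-zero-bandwidth resonator. The following is only a minimal sketch of one way such a filter can be realised, not the thesis implementation. It assumes the signal is frequency-shifted so that the chosen frequency lands at half the sampling rate, and is then passed through a single-pole filter with its pole at z = -r. The function name `sff_envelope_phase`, the pole radius `r = 0.99`, and the test tone are all illustrative assumptions that do not appear in the abstract.

```python
import cmath
import math


def sff_envelope_phase(x, fs, f_k, r=0.99):
    """Envelope and phase of x at frequency f_k (Hz).

    Assumed formulation (not taken from the abstract): shift f_k to fs/2,
    then apply the single-pole filter H(z) = 1 / (1 + r z^-1).
    """
    # Multiplying by exp(j * w_shift * n) moves the component at f_k to fs/2.
    w_shift = math.pi - 2.0 * math.pi * f_k / fs
    envelope, phase = [], []
    y = 0j
    for n, sample in enumerate(x):
        shifted = sample * cmath.exp(1j * w_shift * n)
        # Pole at z = -r: very narrow passband centred on fs/2 when r is near 1.
        y = -r * y + shifted
        envelope.append(abs(y))
        phase.append(cmath.phase(y))
    return envelope, phase


# Hypothetical example: a 1 kHz tone sampled at 8 kHz.
fs = 8000
tone = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(2000)]
env_on, _ = sff_envelope_phase(tone, fs, 1000)   # filter tuned to the tone
env_off, _ = sff_envelope_phase(tone, fs, 3000)  # filter tuned elsewhere
```

In this sketch the envelope at the tuned frequency settles to a much larger value than the envelope at a frequency where the signal has no component, which is the kind of contrast the abstract attributes to the high SNR segments of the SFF output. Moving `r` closer to 1 narrows the bandwidth at the cost of a longer settling time.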
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.