IIIT Hyderabad Publications
Implementation of epoch detection and its application on FPGA
Author: Syed Abdul Jabbar 2020702022
Date: 2024-04-17
Report no: IIIT/TH/2024/47
Advisor: Anil Kumar Vuppala
Abstract
An epoch is the instant at which significant excitation of the vocal tract filter takes place during speech production. Epoch extraction helps in speech enhancement and multi-speaker separation. For most voiced speech, the significant excitation takes place around the instant of glottal closure. Several methods have been proposed for estimating the instants of glottal closure from the speech signal without using the electroglottograph (EGG) signal. These methods are based on the short-time energy of the speech signal, the predictability of an all-pole linear predictor, and properties of group delay. Among the many available methods, Zero Frequency Filtering (ZFF) and the Zero Phase Zero Frequency Resonator (ZP-ZFR) are among the best and simplest for locating prominent epochs: they give the highest detection accuracy and outperform many other approaches. This thesis focuses on the implementation of the ZP-ZFR algorithm and its application on a Field Programmable Gate Array (FPGA). The principles of the ZP-ZFR algorithm and its advantages over the traditional ZFF algorithm are explained. The implementation details of both algorithms on FPGA are presented, including the software simulation phase using MATLAB and the subsequent implementation on FPGA boards. Moreover, a ZP-ZFR based Voiced Activity Detector (VAD) is implemented as an application. The primary objective is to evaluate the effectiveness and efficiency of the ZP-ZFR algorithm in detecting voiced and unvoiced segments in speech signals, which is useful for real-time applications. FPGAs are also well suited to many other applications, such as object detection, tracking, and recognition, radar systems, digital signal processing (DSP), and especially vision-based applications.
An FPGA implementation is chosen because FPGAs are inherently parallel devices, allowing multiple operations to be executed simultaneously. This makes them well suited to applications that require real-time processing, such as speech recognition and speech synthesis. FPGAs can also be customized to accelerate specific tasks by implementing dedicated hardware, which is highly beneficial for applications requiring real-time processing or demanding computational tasks. They are also known for low-latency operation and can be more power-efficient for specific tasks.
Full thesis: pdf
Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.