IIIT Hyderabad Publications
Indian Language Identification using Deep Neural Networks

Author: Mounika KV
Date: 2018-06-02
Report no: IIIT/TH/2018/23
Advisor: Anil Kumar Vuppala

Abstract

The objective of a language identification (LID) system is to identify the language of a given spoken utterance. Depending on whether text corresponding to the utterance is available, the approach can be either implicit or explicit. In the implicit case, only the acoustic signal is available, and statistical models are built over features extracted from it. In the explicit case, the corresponding text is available and rule-based approaches are used to identify the language. This thesis presents an implicit approach based on deep neural network (DNN) models, which have recently been proposed for language identification and are explored here in the Indian context. While the DNN-based approach is inherently frame based, we propose an attention-based DNN architecture for utterance-level classification, thereby making efficient use of context at the model level. This approach can be viewed as an alternative to the state-of-the-art Gaussian mixture model based i-vector baseline system. In contrast to previously published work using DNNs for LID, a DNN equipped with an attention mechanism has the advantage of discriminative end-to-end training without any generative component or post-averaging of the posterior probabilities obtained at frame level. Several experiments were performed on a dataset consisting of 13 official Indian languages with 120 hours of training data. The models were evaluated on 30 hours of test data, with about 2.5 hours for each language. Equal error rate (EER) per language is used as the performance metric.
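The contrast drawn above between post-averaging frame-level outputs and attention-based utterance-level pooling can be illustrated with a minimal sketch. This is not the architecture from the thesis: the function names, the dot-product scoring, and the `scoring_vector` parameter (a stand-in for learned attention parameters) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_frames(frame_vectors):
    """Baseline: every frame contributes equally to the utterance vector."""
    n = len(frame_vectors)
    dim = len(frame_vectors[0])
    return [sum(f[d] for f in frame_vectors) / n for d in range(dim)]


def attention_pool(frame_vectors, scoring_vector):
    """Weight each frame by a relevance score, then sum over frames.

    scoring_vector is hypothetical: it stands in for parameters that
    would be learned jointly with the rest of the network.
    """
    # One scalar relevance score per frame (dot product, an assumption).
    scores = [sum(w * x for w, x in zip(scoring_vector, f)) for f in frame_vectors]
    # Normalise the scores into attention weights that sum to 1.
    alphas = softmax(scores)
    dim = len(frame_vectors[0])
    pooled = [sum(a * f[d] for a, f in zip(alphas, frame_vectors)) for d in range(dim)]
    return pooled, alphas


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, alphas = attention_pool(frames, [5.0, 0.0])
```

With an all-zero scoring vector the attention weights are uniform and the pooled vector equals the plain average, so in this sketch averaging is the special case in which no frame is preferred over another.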
The results show that the DNN with an attention mechanism outperforms both the regular DNN and the i-vector baseline system, indicating the effectiveness of the attention mechanism. Further, a combination of excitation source and vocal tract system features has been explored for language identification. Combining the features through a late fusion mechanism yields a further performance gain. This result indicates that excitation source information is complementary to the widely used vocal tract system information.

Full thesis: pdf
Centre: Language Technologies Research Centre
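The late fusion mentioned in the abstract can be sketched as a weighted combination of per-language scores produced by two separately trained systems. The abstract does not state the fusion rule used in the thesis, so the linear weighting, the `weight` parameter, the function names, and the example languages and scores below are all illustrative assumptions.

```python
def late_fusion(vocal_tract_scores, excitation_scores, weight=0.5):
    """Combine per-language scores from two systems at the score level.

    weight is a hypothetical mixing coefficient in [0, 1]; in practice
    it would be tuned on held-out data.
    """
    if len(vocal_tract_scores) != len(excitation_scores):
        raise ValueError("both systems must score the same set of languages")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        weight * v + (1.0 - weight) * e
        for v, e in zip(vocal_tract_scores, excitation_scores)
    ]


def predict_language(scores, languages):
    """Return the language with the highest fused score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return languages[best]


# Illustrative values only, not results from the thesis.
languages = ["hindi", "telugu", "tamil"]
vocal_tract = [0.5, 0.3, 0.2]
excitation = [0.2, 0.6, 0.2]
fused = late_fusion(vocal_tract, excitation, weight=0.5)
decision = predict_language(fused, languages)
```

Fusing at the score level keeps the two feature streams in separate models, so either system can be retrained or replaced without touching the other.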