IIIT Hyderabad Publications |
|||||||||
|
AN INVESTIGATION OF LSTM-CTC BASED JOINT ACOUSTIC MODEL FOR INDIAN LANGUAGE IDENTIFICATIONAuthors: Tirusha Mandava,Ravi Kumar Vuddagiri,Hari Vydana,Anil Kumar Vuppala Conference: IEEE Automatic Speech Recognition and Understanding Workshop (ASRU-2019 2019) Location Sentosa, Singapore Date: 2019-12-14 Report no: IIIT/TR/2019/118 AbstractIn this paper, phonetic features derived from the joint acoustic model (JAM) of a multilingual end to end automatic speech recognition system are proposed for Indian language identification (LID). These features utilize contextual information learned by the JAM through long short-term memory-connectionist temporal classification (LSTM-CTC) framework. Hence, these features are referred to as CTC features. A multi-head self-attention network is trained using these features, which aggregates the frame-level features by selecting prominent frames through a parametrized attention layer. The proposed features have been tested on IIITH-ILSC database that consists of 22 official Indian languages and Indian English. Experimental results demonstrate that CTC features outperformed i-vector and phonetic temporal neural LID systems and produced an 8.70% equal error rate. The fusion of shifted delta cepstral and CTC feature-based LID systems at the model level and feature level further improved the performance. Full paper: pdf Centre for Language Technologies Research Centre |
||||||||
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved. |