IIIT Hyderabad Publications
Development of IIITH Hindi English Code Mixed Speech Database

Authors: Rambabu B, Suryakanth V Gangashetty
Conference: 6th International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU'18)
Location: Gurugram, India
Date: 2018-08-29
Report no: IIIT/TR/2018/72

Abstract

This paper presents the design and development of the IIITH Hindi-English code-mixed (IIITH-HE-CM) text corpus and the corresponding speech corpus. The corpus is collected from several native Hindi speakers from different geographical parts of India. The IIITH-HE-CM corpus consists of phonetically balanced code-mixed sentences that cover all the phonemes of Hindi and English. We used the frequencies of word-internal triphone sequences, which carry language-specific information that helps in code-mixed speech recognition and language modelling. The code-mixed sentences are written in Devanagari script. Since computers can recognize Roman symbols, we used the Indian Language Speech Sound Label (ILSL) transcription. An acoustic model is built for the Hindi-English mixed language instead of language-dependent models. A large-vocabulary code-mixed speech recognition system is developed based on a deep neural network (DNN) architecture. The proposed code-mixed speech recognition system attains a lower word error rate (WER) than the conventional system.

Full paper: pdf
Centre: Language Technologies Research Centre
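The abstract mentions frequencies of word-internal triphone sequences. The following is a minimal sketch of how such counts could be gathered, assuming each word is already available as a list of phone labels. The function names and the phone labels in the example are hypothetical placeholders chosen for illustration; they are not taken from the paper and are not actual ILSL labels.

```python
from collections import Counter


def word_internal_triphones(phones):
    """Return every (left, centre, right) phone triple inside a single word.

    Words with fewer than three phones yield an empty list.
    """
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def triphone_frequencies(words):
    """Count word-internal triphones over an iterable of phone sequences.

    Triphones are never formed across word boundaries.
    """
    counts = Counter()
    for phones in words:
        counts.update(word_internal_triphones(phones))
    return counts


# Hypothetical code-mixed utterance: one phone list per word.
# The labels below are illustrative only (assumption, not ILSL).
utterance = [
    ["s", "k", "uu", "l"],
    ["j", "aa", "n", "aa"],
    ["h", "ei"],
]

counts = triphone_frequencies(utterance)
for triphone, freq in sorted(counts.items()):
    print("-".join(triphone), freq)
```

Because the triples are taken within each word separately, no triphone spans the boundary between two words, so in a code-mixed sentence every counted triple belongs to a single word of a single language.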