Comparative Analysis of the Performance of CRF, HMM and MaxEnt for Part-of-Speech Tagging, Chunking and Named Entity Recognition for a Morphologically rich language

Authors: Manish Agarwal, Rahul Goutam, Ashish Jain, K Sruthilaya Reddy, Prudhvi Kosaraju, Shashikant Muktyar, Bharat Ambati, Rajeev Sangal
Conference: Pacific Association for Computational Linguistics (PACLING 2011)

Date: 2011-07-19
Report no: IIIT/TR/2011/92

Abstract

In this paper, we present a comparative analysis of three methods for statistical part-of-speech (POS) tagging, chunking and named entity recognition (NER) for a morphologically rich language, Hindi, using a large annotated corpus. The methods explored are Conditional Random Fields (CRF), Hidden Markov Models (HMM) and the Maximum Entropy model (MaxEnt). We further propose an iterative approach to improve the results. To the best of our knowledge, there is no previous comparative analysis of statistical POS tagging, chunking and NER in Hindi using these three methods with a large manually annotated corpus. The maximum POS tagging, chunking and NER accuracies achieved are (94.00%, 91.70%, 56.03%) for CRF, (92.96%, 89.23%, 48.21%) for HMM and (92.88%, 85.48%, 49.09%) for MaxEnt. Our work shows that CRF performs consistently better than HMM and MaxEnt on all three tasks.


Centre: Language Technologies Research Centre

IIIT Hyderabad Publications
