IIIT Hyderabad Publications
A Frequent Keyword-set Based Algorithm for Topic Modeling and Clustering of Research Papers

Authors: Kumar Shubhankar, Aditya Pratap Singh, Vikram Pudi
Conference: Data Mining and Optimization (DMO 2011)
Date: 2011-06-27
Report no: IIIT/TR/2011/28

Abstract

In this paper we introduce a novel and efficient approach to detect topics in a large corpus of research papers. With the rapidly growing size of the academic literature, topic detection has become a very challenging task. We present a unique approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method to cluster the research papers into hierarchical, overlapping clusters, using topics as the similarity measure. To rank the research papers within a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the research paper appears. We test our algorithms on the DBLP dataset and show experimentally that they are fast, effective and scalable.

Centre for Data Engineering
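To make the idea of a closed frequent keyword-set concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it is a brute-force illustration that enumerates every keyword subset, so it is only usable on tiny inputs. The function names `closed_frequent_keyword_sets` and `assign_to_topics`, the toy `papers` data, and the `min_support` threshold are all assumptions introduced for illustration. A keyword-set is treated as frequent if at least `min_support` papers contain it, and as closed if no proper superset has the same support.

```python
from itertools import combinations


def closed_frequent_keyword_sets(papers, min_support):
    """Return {keyword_set: support} for closed frequent keyword-sets.

    Brute-force sketch (exponential in keywords per paper), not the
    paper's method. `papers` is a list of keyword collections.
    """
    # Count how many papers contain each non-empty keyword subset.
    support = {}
    for keywords in papers:
        unique = sorted(set(keywords))
        for size in range(1, len(unique) + 1):
            for combo in combinations(unique, size):
                key = frozenset(combo)
                support[key] = support.get(key, 0) + 1

    # Keep only the subsets that meet the support threshold.
    frequent = {k: s for k, s in support.items() if s >= min_support}

    # A set is closed if no proper superset has the same support.
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


def assign_to_topics(papers, topics):
    """Map each topic to the indices of papers containing all its keywords.

    A paper may appear under several topics, giving overlapping clusters.
    """
    return {
        topic: [i for i, kws in enumerate(papers) if topic <= set(kws)]
        for topic in topics
    }


# Hypothetical toy corpus: each paper is reduced to a set of keywords.
papers = [
    {"data", "mining", "clustering"},
    {"data", "mining", "topic"},
    {"data", "mining", "clustering", "topic"},
    {"graph", "ranking"},
]
topics = closed_frequent_keyword_sets(papers, min_support=2)
clusters = assign_to_topics(papers, topics)
```

In this toy corpus the keyword-set {data, mining} is closed with support 3, while {data} alone is not closed because {data, mining} occurs in exactly the same papers. The third paper falls under all three resulting topics, which illustrates how overlapping clusters can arise, and the more specific topics nest under {data, mining}, hinting at a hierarchy.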