IIIT Hyderabad Publications
A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction

Authors: Niraj Kumar, Kannan Srinathan, Vasudeva Varma
Journal: Int. J. Data Mining, Modelling and Management
Volume: X
Volume Number: 2
Date: 2014-08-14
Report no: IIIT/TR/2014/93

Abstract

In this paper, we present a novel N-gram (N >= 1) filtration technique for keyphrase extraction. To filter the sophisticated candidate keyphrases (N-grams), we introduce the combined use of: 1) a statistical feature (obtained from the weighted betweenness centrality scores of words, a measure generally used to identify border nodes/edges in community detection techniques); 2) co-location strength (calculated using nearest-neighbour DBpedia texts). We also introduce the use of an N-gram (N >= 1) graph, which reduces the bias effect of lower-length N-grams in the ranking process and preserves the semantics of words (phraseness) based upon local context. To capture the theme of the document and to reduce the effect of noisy terms in the ranking process, we apply an information-theoretic framework for key-player detection on the proposed N-gram graph. Our experimental results show that the devised system performs better than the current state-of-the-art unsupervised systems and is comparable to or better than supervised systems.

Full article: pdf

Centre for Search and Information Extraction Lab
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.