IIIT Hyderabad Publications
Concept Hierarchy Based Diverse Frequent Patterns

Author: Kumara Swamy Mittapally
Date: 2019-08-05
Report no: IIIT/TH/2019/100
Advisor: P Krishna Reddy

Abstract

Over the past two decades, as a part of data mining, several approaches and frameworks have been developed for extracting different kinds of interesting knowledge from multiple types of data. In the Internet era, data mining algorithms are being employed to build massive information- and knowledge-driven systems such as search engines, recommendation systems and decision support systems. Due to the inherent value of new kinds of knowledge in improving business performance, there is a significant thrust towards developing better knowledge extraction approaches from different data sets for building next-generation information- and knowledge-based systems. The term diversity arises repeatedly in economics, natural sciences, social sciences, and information science. In information science, research efforts have recently been made to develop approaches to improve the diversity performance of recommender systems (RSs) and web search query recommendations (QRs). The problem of improving diversity is encountered in several real-life applications. In the domain of computer networks, the problem of finding a diverse node, that is, a node connected to multiple clusters of nodes, has been studied. In social networks, diversity has been characterized by how diversely a given node connects with its peers. The notion of diversity has also been investigated within the context of text documents using Rao's measure. Normally, data mining approaches extract knowledge structures such as significant patterns (frequent patterns and association rules), clusters, and classes. In the area of pattern mining, frequent pattern (FP) mining extracts interesting information about the associations among the items in transactional data sets.
Several algorithms have been proposed to extract FPs efficiently from transactional data sets. Several approaches have also been proposed to mine a subset of interesting FPs according to the needs of the application by defining new interestingness measures such as the periodicity, pattern length, and cost (utility) of an FP. In this thesis, we propose a model of diverse frequent patterns (DFPs) and show that it can be employed to improve the performance of RS and web search QR systems. We first propose a model to compute the diversity of a pattern (a set of items). Given a domain, a set of items can be grouped into a category, and a pattern may contain items that belong to multiple categories. In several applications, it may be useful to distinguish between a pattern whose items belong to multiple categories and a pattern whose items belong to one or a few categories. The existing FP mining approaches do not distinguish patterns based on the categories of the items in the pattern. The notion of diversity captures the extent to which the items in a pattern belong to different categories. We propose a framework to rank a pattern by analyzing the extent to which its items belong to different categories in the corresponding concept hierarchy. We define a measure, called diverse rank (drank), that captures the extent of diversity and is used to rank the patterns. A pattern that satisfies the threshold value of drank is called a diverse pattern (DP). A pattern that satisfies the threshold values of both drank and support is called a diverse frequent pattern (DFP). We propose two frameworks to compute the diversity of a pattern: one is based on a balanced concept hierarchy of the items of the pattern, and the other is based on an unbalanced concept hierarchy of the items of the pattern. Through experimental results, we show that the knowledge of DFPs is different from the knowledge of FPs.
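The two-threshold definition of a DFP can be illustrated with a minimal sketch. The abstract does not give the formula for drank, so the code below substitutes a deliberately simple stand-in score (the number of distinct categories divided by the number of items) and a one-level item-to-category map in place of a full concept hierarchy. The names `CATEGORY`, `TRANSACTIONS`, `toy_diversity`, and `diverse_frequent_patterns`, as well as the toy data, are assumptions made for illustration and are not taken from the thesis.

```python
from itertools import combinations

# Hypothetical one-level "concept hierarchy": item -> category.
CATEGORY = {
    "milk": "dairy",
    "cheese": "dairy",
    "bread": "bakery",
    "pen": "stationery",
}

# Hypothetical transactional data set.
TRANSACTIONS = [
    {"milk", "cheese", "bread"},
    {"milk", "cheese"},
    {"milk", "bread", "pen"},
    {"milk", "cheese", "pen"},
]


def support(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def toy_diversity(pattern, category):
    """Stand-in score, NOT the thesis's drank: distinct categories / items."""
    return len({category[item] for item in pattern}) / len(pattern)


def diverse_frequent_patterns(transactions, category, min_sup, min_div, size=2):
    """Patterns of the given size that meet both thresholds (brute force)."""
    items = sorted(set().union(*transactions))
    result = []
    for combo in combinations(items, size):
        pattern = frozenset(combo)
        if (support(pattern, transactions) >= min_sup
                and toy_diversity(pattern, category) >= min_div):
            result.append(pattern)
    return result


dfps = diverse_frequent_patterns(TRANSACTIONS, CATEGORY, min_sup=0.5, min_div=1.0)
for p in dfps:
    print(sorted(p))
```

In this toy data, the pattern {cheese, milk} is frequent (support 0.75) but both items fall under one category, so it is filtered out, while {bread, milk} and {milk, pen} pass both thresholds. This mirrors the claim that the set of DFPs differs from the set of FPs; an actual implementation would replace the stand-in score with the drank measure defined in the thesis.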
Based on the concept of DFP, we propose an improved association rule (AR) based RS approach, motivated by the observation that high recommendation accuracy alone cannot ensure user satisfaction in an RS. In the literature, efforts have been made to improve the variety/diversity of recommendations for higher user satisfaction. In the existing approaches, accuracy is compromised for the sake of improving the variety of recommendations. The proposed approach aims to improve both the diversity and the accuracy of AR based RSs. The proposed RS approach computes the drank of the pattern of each AR based on the concept hierarchy formed by the corresponding items. Recommendations are made based on ARs with high confidence and high drank. The experimental results on the real-world MovieLens data set show that the proposed RS approach improves the diversity of the existing AR based RS without compromising accuracy.

Further, by exploiting the concept of DFP, we investigate an improved approach for search QRs. Search engines use a QR technique to recommend a different set of queries to improve user satisfaction with search results. The existing approaches recommend queries that are similar to the user's query. Sometimes, similar queries may not be appropriate, for example when the user forms an improper query to represent his/her information needs. There are efforts in the literature to recommend a set of queries dissimilar to the original query using the notion of orthogonal queries, which are related queries with no commonality with the original query. However, orthogonal recommendations may generate completely different queries that may not match the user's intent. We propose an alternative approach for QR by extending the concepts of ARs and DFPs, using an unbalanced concept hierarchy of search terms.
The experimental results on the real-world AOL click-through data set show that QRs produced by the proposed approach improve diversity without compromising accuracy compared with the existing orthogonal QR approach. Overall, we have proposed a new model of DFPs and demonstrated the utility of DFPs in improving the performance of AR based RSs and search QRs. The proposed DFP model provides scope for developing new approaches to improve the performance of several information- and knowledge-based decision support/recommendation services.

Full thesis: pdf
Centre for Data Engineering