IIIT Hyderabad Publications |
|||||||||
|
MapReduce based and Incremental Approaches to Extract Coverage Patterns from Large Transactional DatabasesAuthor: Ralla Akhil Date: 2020-12-03 Report no: IIIT/TH/2020/113 Advisor:P Krishna Reddy AbstractIn the big data era, pattern mining is an important task of data mining and involves the extraction of interesting associations from large databases. In the literature, pattern mining approaches have been investigated to extract different kinds of patterns such as frequent patterns, sequential patterns, periodic patterns, utility patterns, and coverage patterns from transactional databases. The pattern mining approaches have been widely employed to improve the performance of applications in the areas of recommendation systems, e-commerce, customer relation management, internet advertising, decision support systems, social media, and so on. One of the key issues of developing pattern mining algorithms is as follows: a pattern mining algorithm has to process a huge volume of the transactional dataset. Research efforts are going on to develop pattern mining algorithms to operate on large datasets by exploiting different parallel processing paradigms, namely shared memory, shared disk, shared-nothing, GPUs, incremental, MapReduce, and Grid. In this thesis, we propose both MapReduce based and Incremental techniques to extract the knowledge of patterns called Coverage Patterns. The framework of MapReduce has become very popular to process big data. It enables the distributed processing of huge amounts of data on hundreds of machines in geographically distributed environments with robust fault-tolerance. With Incremental mining framework, when the increment database is added, it is possible to improve the performance by limiting the computation to incremental part and with the required additional processing. Given a transactional database, the coverage pattern mining involves the extraction of multiple sets of items, where each set (covers a certain percentage of transactions) satisfies relative frequency, coverage support and the overlap ratio constraints. The applicability of coverage patterns has been demonstrated in improving the performance of search engine advertising and banner advertising. In the literature, a level-wise iterative coverage pattern mining (CMine) and a pattern growth approaches have been proposed. As a first contribution, we have proposed a MapReduce based coverage pattern mining approach. The key issue is to compute the overlap ratio value of a pattern in an efficient manner under MapReduce. However, we exploit the observation that it is possible to compute the overlap ratio of a given pattern by computing the numerator and the denominator independently in a distributed manner. Experimental evaluation with both real-world and synthetic datasets demonstrate that the proposed approach significantly improves the performance over the existing CMine approach. Furthermore, we have proposed an incremental coverage pattern extraction algorithm, when an increment of the transactional database (simply Increment) is added to the existing transactional database. It has been observed that, when the Increment is added, if the order of items in the frequency list (Flist) of the database is changed, additional patterns have to be validated and require additional processing. By considering the issue of change in the order of items in the Flist, we have proposed an Incremental framework to extract coverage patterns. We have also proposed a MapReduce based Incremental approach by combining the advantages of Incremental and MapReduce techniques. We have conducted experiments on both synthetic and real-world datasets and demonstrated that the proposed approaches could improve the performance significantly even though increment size is equal to the existing transactional database. Overall, we have proposed MapReduce, Incremental, and MapReduce based Incremental approaches to extract the knowledge of coverage patterns from large transactional databases. The efficiency of the proposed algorithms is demonstrated by conducting extensive experiments on both real-world and synthetic datasets. We hope that the proposed approaches will enable the applications which extract coverage patterns to process large transactional databases Full thesis: pdf Centre for Others |
||||||||
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved. |