IIIT Hyderabad Publications
Improved Approaches to Mine Periodic-Frequent Patterns in Non-Uniform Temporal Databases
Author: Venkatesh J N
Date: 2018-03-05
Report no: IIIT/TH/2018/10
Advisor: P Krishna Reddy

Abstract

The field of data mining has emerged to extract knowledge hidden in large databases for better decision making. Frequent pattern mining, where a set of items forms a pattern (or itemset), finds interesting information about the associations among the items in a transactional database. The notion of support is employed to extract the frequent patterns: a pattern is called a frequent pattern if its support satisfies the user-defined minimum support threshold. An important criterion for assessing the interestingness of a frequent pattern is its temporal occurrence in a database, that is, whether the pattern occurs periodically, irregularly, or mostly at specific time intervals. Frequent patterns that occur periodically within a database are known as periodic-frequent patterns. Finding these patterns is a significant task with many real-world applications, such as improving the performance of recommender systems, detecting intrusions in computer networks, and discovering events in Twitter. Current periodic-frequent pattern models cannot handle datasets in which multiple transactions share a common timestamp or in which transactions occur at irregular time intervals. This limits the applicability of these models, because in many real-world databases, such as e-commerce and Twitter data, transactions share a common timestamp and uneven time gaps exist between consecutive transactions. Most previous models of periodic-frequent pattern mining have focused on finding all patterns in a transactional database that satisfy the user-specified minimum support (minSup) and maximum periodicity (maxPer) constraints. The minSup constraint controls the minimum number of transactions that a pattern must cover in a database.
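A minimal Python sketch of the support and periodicity notions above, under stated assumptions: a temporal database is modelled as a list of (timestamp, itemset) pairs in which timestamps may repeat and may be unevenly spaced. Support counts the transactions that contain the pattern, as the minSup description says. Periodicity is taken as the pattern's largest inter-arrival time, also counting the gap from the database's first timestamp and the gap to its last timestamp, as several periodic-frequent formulations do; this boundary choice, like the function names and toy data, is an assumption rather than the thesis's exact definition.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (timestamp, items) pairs. Several transactions may
# share a timestamp, and consecutive timestamps may be unevenly spaced.
TemporalDB = List[Tuple[int, FrozenSet[str]]]


def occurrence_timestamps(db: TemporalDB, pattern: FrozenSet[str]) -> List[int]:
    """Sorted timestamps of the transactions that contain every item of `pattern`."""
    return sorted(ts for ts, items in db if pattern <= items)


def support(db: TemporalDB, pattern: FrozenSet[str]) -> int:
    """Number of transactions that contain the pattern."""
    return len(occurrence_timestamps(db, pattern))


def periodicity(db: TemporalDB, pattern: FrozenSet[str]) -> int:
    """Largest inter-arrival time of the pattern.

    Assumption: the gap from the database's first timestamp to the first
    occurrence, and from the last occurrence to the database's last timestamp,
    are counted too. Transactions sharing a timestamp yield a gap of 0.
    """
    all_ts = [ts for ts, _ in db]
    points = [min(all_ts)] + occurrence_timestamps(db, pattern) + [max(all_ts)]
    return max(b - a for a, b in zip(points, points[1:]))


def is_periodic_frequent(db: TemporalDB, pattern: FrozenSet[str],
                         min_sup: int, max_per: int) -> bool:
    """Full-periodicity check: enough support and no gap longer than max_per."""
    return support(db, pattern) >= min_sup and periodicity(db, pattern) <= max_per


# Toy temporal database (hypothetical): timestamp 1 is shared; 3, 5 and 6 are absent.
example_db: TemporalDB = [
    (1, frozenset("ab")),
    (1, frozenset("ac")),
    (2, frozenset("abc")),
    (4, frozenset("ab")),
    (7, frozenset("bd")),
    (8, frozenset("ab")),
]

print(support(example_db, frozenset("ab")))      # 4
print(periodicity(example_db, frozenset("ab")))  # 4
print(is_periodic_frequent(example_db, frozenset("ab"), min_sup=3, max_per=3))  # False
```

Because a gap of 0 between transactions that share a timestamp never raises the maximum, shared timestamps need no special handling in this sketch; the missing timestamps simply show up as larger inter-arrival times.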
The maxPer constraint controls the maximum time within which a pattern must reoccur in a database, that is, the largest allowed gap between two consecutive occurrences of the pattern. Using a single minSup and maxPer for an entire database leads to the rare item problem, because real-world databases have non-uniform item distributions in which items have very different support and periodicity values: thresholds strict enough to suit frequent items miss patterns involving rare items, while thresholds lenient enough to capture rare items produce a very large number of uninteresting patterns. Moreover, current periodic-frequent pattern models focus on discovering full periodic-frequent patterns, i.e., patterns that exhibit complete cyclic repetitions throughout the entire database. These models evaluate the periodic interestingness of a frequent pattern by checking whether all of its inter-arrival times are within the user-specified maxPer threshold, so a single long gap disqualifies the pattern and its partial periodic behavior cannot be assessed. However, because real-world databases are imperfect, partial periodic-frequent patterns are more common. To address these issues, this thesis proposes two improved approaches that discover, respectively, periodic-correlated patterns and partial periodic-frequent patterns in non-uniform temporal databases. In the first approach, we tackle the rare item problem with an improved model that discovers periodic-correlated patterns in a non-uniform temporal database. In this thesis, a temporal database is a collection of transactions ordered by their timestamps, in which multiple transactions may share a common timestamp and time gaps may exist between consecutive transactions. A temporal database is said to be non-uniform if it contains items with dissimilar support and periodicity. To tackle the rare item problem in non-uniform temporal databases, the proposed model considers a pattern interesting if its support and periodicity are close to those of its individual items.
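One way to make "close to its individual items" concrete is sketched below in Python. For support it uses all-confidence, the established measure defined as a pattern's support divided by the largest support among its items. For periodicity it uses a hypothetical stand-in, the smallest item periodicity divided by the pattern's periodicity; this ratio is an assumption for illustration, not the thesis's periodic-all-confidence, whose formula this abstract does not give. The item names and numbers are invented toy values.

```python
from typing import Dict, FrozenSet

Pattern = FrozenSet[str]


def all_confidence(pattern: Pattern, sup: Dict[Pattern, int]) -> float:
    """Support of the pattern divided by the largest support among its items."""
    return sup[pattern] / max(sup[frozenset([item])] for item in pattern)


def periodicity_ratio(pattern: Pattern, per: Dict[Pattern, int]) -> float:
    """HYPOTHETICAL stand-in for periodic closeness (not the thesis's formula).

    Lower periodicity means more regular occurrence, so the ratio compares the
    most regular item with the pattern: 1.0 means the pattern is as regular as
    that item, and the value falls towards 0 as the pattern's largest gap grows.
    """
    pattern_per = per[pattern]
    if pattern_per == 0:
        return 1.0
    return min(per[frozenset([item])] for item in pattern) / pattern_per


# Invented toy values: support counts and periodicities (largest gaps).
example_sup = {
    frozenset({"bread"}): 100, frozenset({"caviar"}): 5, frozenset({"champagne"}): 6,
    frozenset({"bread", "caviar"}): 5, frozenset({"caviar", "champagne"}): 4,
}
example_per = {
    frozenset({"bread"}): 3, frozenset({"caviar"}): 60, frozenset({"champagne"}): 55,
    frozenset({"bread", "caviar"}): 60, frozenset({"caviar", "champagne"}): 70,
}

for p in (frozenset({"caviar", "champagne"}), frozenset({"bread", "caviar"})):
    print(sorted(p), round(all_confidence(p, example_sup), 2),
          round(periodicity_ratio(p, example_per), 2))
```

With these toy values, the pattern of two rare items scores about 0.67 and 0.79, while the pattern mixing the frequent "bread" with the rare "caviar" scores 0.05 on both ratios, so a single cut-off on the ratios can keep rare-item patterns without also admitting every combination that involves a frequent item.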
The existing all-confidence measure is used to determine how close the support of a pattern is to the support of its individual items. A new interestingness measure, called periodic-all-confidence, is proposed to determine how close the periodicity of a pattern is to the periodicity of its individual items. A pattern-growth algorithm is also presented to find periodic-correlated patterns. Experimental results show that the proposed model is efficient and tackles the rare item problem. We demonstrate the usefulness of periodic-correlated patterns with a real-world case study on the FAA-Accidents database and show that the proposed model can effectively discover interesting periodic-correlated patterns involving both frequent and rare items. In the second approach, we introduce an improved model to discover partial periodic-frequent patterns in non-uniform temporal databases. The proposed model lets the user specify a different maximum inter-arrival time (MIAT) for each item. An inter-arrival time of a pattern is considered periodic (or cyclic) if it is no more than the pattern's period, whose value depends on the MIAT values of the pattern's items. Thus, different patterns may be subject to different periods, which solves the rare item problem in partial periodic-frequent pattern mining. A new measure, Relative Periodic-Support (RPS), is proposed to determine the (partial) periodic interestingness of a pattern from the number of its cyclic repetitions in the database; it thus takes into account both the support and the periodicity of the pattern. A pattern-growth algorithm is presented to discover all partial periodic-frequent patterns. Experimental results demonstrate that the proposed model is efficient and addresses both the rare item problem and the problem of discovering partial periodic-frequent patterns.
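The per-item MIAT idea can be sketched in Python as follows, with clearly stated assumptions. The abstract says a pattern's period depends on its items' MIAT values but not how, so this sketch hypothetically takes the largest MIAT among the pattern's items, mirroring how multiple-minimum-support mining gives an itemset the most lenient threshold of its items. It then counts the pattern's cyclic repetitions (inter-arrival times no more than that period), the quantity the RPS measure is described as considering; RPS itself is not reproduced. Collapsing repeated timestamps before computing gaps is another assumption, and all names and data are hypothetical.

```python
from typing import Dict, Iterable, List


def pattern_period(items: Iterable[str], miat: Dict[str, int]) -> int:
    """HYPOTHETICAL rule: the pattern's period is the largest MIAT of its items."""
    return max(miat[item] for item in items)


def inter_arrival_times(timestamps: Iterable[int]) -> List[int]:
    """Gaps between consecutive distinct timestamps at which the pattern occurs."""
    ts = sorted(set(timestamps))  # assumption: shared timestamps count once
    return [b - a for a, b in zip(ts, ts[1:])]


def cyclic_repetitions(timestamps: Iterable[int], period: int) -> int:
    """Number of inter-arrival times that are no more than `period`."""
    return sum(1 for gap in inter_arrival_times(timestamps) if gap <= period)


def fully_periodic(timestamps: Iterable[int], period: int) -> bool:
    """Full-periodicity test: every inter-arrival time must be periodic."""
    return all(gap <= period for gap in inter_arrival_times(timestamps))


# Toy, hypothetical values: a frequent keyword with a tight MIAT, a rarer one with a loose MIAT.
example_miat = {"goal": 2, "penalty": 10}
occurrences = [1, 3, 3, 12, 40, 45]  # timestamps of {goal, penalty}; 3 appears twice

period = pattern_period({"goal", "penalty"}, example_miat)
print(period)                                   # 10
print(inter_arrival_times(occurrences))         # [2, 9, 28, 5]
print(cyclic_repetitions(occurrences, period))  # 3
print(fully_periodic(occurrences, period))      # False
```

Under the full-periodicity test the single 28-unit gap rejects this pattern, whereas the count of 3 cyclic repetitions out of 4 gaps still records its mostly periodic behavior.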
We demonstrate the usefulness of partial periodic-frequent patterns with a real-world case study, showing that the proposed model can find prior knowledge about event keywords and their associations in Twitter data. Overall, this thesis proposes improved approaches to extract periodic-frequent patterns in non-uniform temporal databases and demonstrates their advantages through experimental results.

Centre for Data Engineering
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.