IIIT Hyderabad Publications |
|||||||||
|
Mining Cuboid Outliers in Information NetworksAuthor: Ayushi Dalmia Date: 2017-03-17 Report no: IIIT/TH/2017/15 Advisor:Vasudeva Varma,Manish Gupta AbstractThe study of complex networks or graphs has been extensively pursued by researchers from multiple disciplines. In history, the well-known “Seven Bridges of Koenigsberg” problem is the first real-world problem that was solved by the study of networks. Since then, there has been significant theoretical advancement in this area. Network science has been explored in diverse fields such as sociology, biology, physics and computer science and has spanned a plethora of applications like link analysis, community detection, influence analysis. Only recently, detecting outliers in such information networks has caught the attention of researchers in the data mining community. Outlier detection in information networks is the focus of this thesis. Outliers are structures in networks which have unusual patterns, which deviate significantly from the rest of the data. Outliers in graphs can appear in different forms like nodes, edges, subgraphs, communities and so on. In this thesis, we propose a novel outlier structure in networks, graph cuboid outliers. We put forward the novel problem of graph cuboid outlier detection in both static and timeevolving networks and design efficient algorithms for the same. Recently, there has been an interest in finding outliers with respect to a given query. Query-based outlier detection is more favorable over the general outliers as it allows the user a flexibility to find outliers following a particular schema and predicates encoded in the form of a query. Given a heterogeneous network and a query we detect Graph Cuboid Outliers (GCOutliers). Graph cuboids are semantically related regions in a network which helps us view the network along multiple dimensions and at multiple levels. GCOutliers are those cuboids which exhibit anomalous behavior with respect to the given subgraph query. An example of a GCOutlier would be those research areas which have unusual collaborations in a DBLP network. Further, we can obtain such regions with respect to specific requirements, such as finding those research areas where a large numbers of papers are published with at least five authors. The requirement is modeled as a subgraph query to obtain query sensitive graph cuboid outliers. In order to detect GCOutliers, there are several challenges; (i) the number of cuboids can be high, (ii) the number of matches can be large and (iii) subgraph isomorphism is an NP-hard problem. In our solution we address the issue of the large number of matches. This is done by designing a na¨ıve random sampling method followed by a more principled sampling method using regression models for nodes and edges. We next move our attention to time-evolving or temporal networks. For a series of snapshots of a heterogeneous temporal network we propose to detect Evolutionary Graph Cuboid Outliers (EGCOutliers) in temporal networks. EGCOutliers are those cuboids which have an anomalous behaviour with respect to the cuboids in its own snapshot and whose behavior over time is different from the average behavior of all the cuboids over time. The change in behavior over time is considered as the evolutionary behavior and therefore, we call our graph cuboid outliers as Evolutionary Graph Cuboid Outliers. We extend GCOutlier detection to detect EGCOutliers. We model the problem of detecting EGCOutliers into a regression problem. The experimental results on both real and synthetic datasets show the efficiency of the proposed algorithms. We posit the problem of detecting GCOutliers and EGCOutliers in heterogeneous networks and proposed efficient solutions for the same. We believe this thesis laid the foundation for graph cuboid outliers and opens up a new direction in the area of outlier detection research. As network based systems continue to grow and become ubiquitous, we believe that the proposed algorithms will receive attention in a wide variety of applications and will foster the area of outlier detection in general. Full thesis: pdf Centre for Search and Information Extraction Lab |
||||||||
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved. |