IIIT Hyderabad Publications |
|||||||||
|
Efficient Data Structures and Algorithms for Range Aggregate ProblemsAuthor: Jatin Agarwal Date: 2013-11-27 Report no: IIIT/TH/2013/62 Advisor:Kannan Srinathan,Kishore Kothapalli AbstractComputational Geometry is regarded as an emerging subfield of computer science. In computational geometry the standard range searching problems are studied widely because they have many real-world applications in spatial informatics, VLSI design, geographical information systems, information retrieval, databases, computer vision, computer graphics and other areas. In the standard range searching problem on an object set S we either report points or count the number of points in a given query region q. But just reporting and counting points in the query region is not sufficient for real world applications like data mining where data analysis is the major concern. In order to support applications which require data analysis to capture significant features we compute an aggregation function over the given query region like Sum, Max, Average, etc. Range Queries with such aggregate functions are called range-aggregate queries. But to capture intricate geometric properties on the data set we do geometric aggregation. Some of the examples of geometric aggregation functions are skyline, convex hull, closest pair, Voronoi diagram, triangulation and so on. However, advances in the creation and archival of digital information have led to an information explosion and therefore the number of objects inside a query range can be huge. In such cases, reporting a sample of the result set is preferred. In this work we present solutions to two such geometrical problems which give such samples. Both skyline and convex hull concepts have wide applications to other problems like outlier detection, clustering and so on. Abstractly both skyline and convex hull concepts provide some type of ordering in higher dimensions. Intuitively we all know that ordering on a set of objects makes it more efficient for answering relevant queries by capturing significant features. Let S be a set of n points in the two dimensional plane. A point p = (p x ; p y ) is said to dominate another point q = (q x ; q y ) if p x > qx and p y > qy . Points not dominated by any point in the set are called maximal or skyline points. We preprocess S into a data structure such that we can efficiently compute maximal points or skyline for any given query q. We propose a linear space data structure which takes O(log n + k) query time for any given query q in the word RAM model where k is the total number of points that are reported. We solve this problem for 2-sided quadrant queries and 3-sided range queries. We not only report points inside the given query region but we also count the number of points in S \ q. Full thesis: pdf Centre for Security, Theory and Algorithms |
||||||||
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved. |