Sampling for Algorithm Engineering

Author: Hardhik Mallipeddi
Date: 2020-05-07
Report no: IIIT/TH/2020/33
Advisor:Kishore Kothapalli

Abstract

With the current fast pace growing trend towards AI, machine learning, gaming, social media scale of processing, there is great interest in the related algorithms and the constant need to improve on the efficiency of these fundamental algorithms such as matrix multiplications and graph algorithms. The method of sampling the input instance to gain insights and leverage them to build more efficient and quicker solutions is a well explored area and is gaining popularity. It involves creating an intermediate representation of the input which is processed for obtaining insights on the entire input. Applications of this technique can be seen from Fischler et al. [37], who proposed the RANSAC (Random sample consensus) in 1981 for estimating the parameters of a mathematical model from some observed data containing outliers. Sparse matrix computations like multiplying a sparse matrix with another sparse matrix (spmm) and multiplying a sparse matrix with a dense matrix (csrmm) are fundamental to the linear algebra computations and hence is a core part of major scientific problems at scale. Similarly, graph algorithms, such as finding the connectivity of graphs, have been studied for centuries and have been an area of extensive research and have widespread applications ranging from linguistics and computational biology to social sciences and network topologies. Finding biconnected components of a graph is used in planarity testing [47], isomorphism in planar graphs [50], network analytics [26, 41, 80] and the like. The trend towards heterogeneity leads us to customize algorithms that use properties of the input graph for splitting the workload among the computing units in a heterogenous setting. The majority of the algorithms propose a mechanism of splitting the workload into the right pieces and assigning the right pieces to the right computational units for processing based on various thresholds. Computing these thresholds for each of the inputs might not be straightforward always and generally requires an empirical search across the possible range of threshold values, which leads to the time taken to find the threshold overshadowing the benefits gained from the Heterogenous efforts. There has been increasing interest in finding biconnectivity properties of a given graph given its varied applicability and is reflected in the research progress made in terms of speeding up the existing approaches using various techniques with the newly emerging computing platforms. If we have a quick and straightforward method to find the certificate of the graph, we can test it for properties like connectivity and claim that it also holds for the actual graph itself. Hence, finding a certificate for a graph is an interesting problem with immense impact.In this thesis, we propose a simple and effective sampling-based technique for the work partitioning of heterogenous algorithms. We find a nearly optimal threshold value to distribute workload among the computation units for multiplying two unstructured sparse matrices(spmm). We validate our approach on a variety of real-world graphs and show that we can find the best possible threshold value ±10.6% away from the best possible threshold value. We also propose a sequential sampling-based technique for computing the biconnected components of a graph. We validate our approach on real-world graphs and report a speedup of 3.4 times when sampling technique is used

Full thesis: pdf

Centre for Security, Theory and Algorithms

IIIT Hyderabad Publications

Sampling for Algorithm Engineering

Abstract