IIIT Hyderabad Publications
Efficient Algorithmic Techniques for Heterogeneous Computing
Author: Shashank Sharma
Report no: IIIT/TH/2017/6
Today, general-purpose GPUs (GPGPUs) have become very ubiquitous and available in something as simple as a personal laptop. These GPUs can be utilized to a great extent for their huge parallelism such as NVIDIA GT520 having 48 cores and GTX580 having 512 cores that we use for our implementations. On such devices Single Instruction Multiple Data(SIMD) operations can be performed using NVIDIA CUDA programming model. GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power, and other metrics. However, accelerator based computing has significantly relegated the role of CPUs in computation. As CPUs evolve and also oer matching computational resources, it is important to also include CPUs in the computation. We call this the hybrid computing model. Indeed, most computer systems of the present age oer a degree of heterogeneity and therefore such a model is quite natural. With such a hybrid model in mind, in this thesis, we try to explore ways to harvest the benefits of hybrid computing. In this thesis we explore two such ways namely Dynamic Load Balancing where we split the work among GPU and CPU dynamically and the other technique is graph pruning to recursively remove 1-degree nodes before applying an algorithm on a much smaller resultant graph and later post-processing the results to the complete graph. The first two chapters propose hybrid solution to two of major application in GPU computing such as Lattice Boltzmann Method(LBM), a class of computational uid dynamic problem and Ray Casting, an image processing primitive.Later we propose a light-weight, low overhead, and completely dynamic framework that addresses the load balancing problem of heterogeneous algorithms.Our framework will be applicable for workloads which have a few simple characteristics such as having a collection of largely independent tasks that are easily describable. To show the efficacy of our framework, we consider two different heterogeneous computing platforms, and two of the above mentioned workloads: LBM and ray casting. For each of these workloads, we demonstrate that using our framework, we can identify the proportion of work to be allotted to each device up to 8% on average with reference to the base hybrid solution. Further, solutions using our framework require no more than 5% additional time on average compared to best possible load assignment obtained via empirical search. In the later part of the thesis we introduce graph pruning as a technique that aims to reduce the size of the graph. Certain elements of the graph can be pruned depending on the nature of the computation. Once a solution is obtained on the pruned graph, the solution is extended to the entire graph. We apply the above technique on a very fundamental graph algorithm: Single Pair Shortest Path that we later generalize to solve All Pairs Shortest Paths (APSP). To validate our technique, we implement our algorithms on a heterogeneous platform consisting of a multicore CPU and a GPU. On this platform, we achieve an average of 35% improvement compared to state-of-the-art solutions. Such an improvement has the potential to speed up other applications reliant on these algorithms.
Full thesis: pdf
Centre for Security, Theory and Algorithms
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.