IIIT Hyderabad Publications
Nearly Balanced Heterogeneous Algorithms for Multiplying Scale-Free Sparse Matrices

Author: Kiran Raj
Date: 2018-07-04
Report no: IIIT/TH/2018/31
Advisor: Kishore Kothapalli

Abstract

Sparse matrix computations such as multiplying a sparse matrix with a sparse matrix, denoted spmm, and multiplying a sparse matrix with a dense matrix, denoted csrmm, are fundamental operations in linear algebra with several applications. Hence, efficient and scalable implementations of spmm and csrmm have been a topic of intense research, with efforts aimed at implementations on FPGAs, multi-core architectures, GPUs, and other emerging computational platforms. The architectural trend towards heterogeneity has pushed heterogeneous computing to the fore of parallel computing research. Owing to the highly irregular nature of spmm and csrmm, it has been observed that CPUs and GPUs can offer comparable performance (Lee et al. [26]), and hence implementations on CPU+GPU systems are widely researched. In this thesis, we propose Sorted-Workqueue, a CPU+GPU heterogeneous algorithm for spmm and csrmm based upon the Workqueue framework proposed by Kothapalli et al. [25]. We perform experiments on general sparse matrices from standard datasets and study the performance of spmm and csrmm. Another trend, customizing algorithms according to the characteristics of the input, has also been gaining research interest in recent times. In this thesis, we consider real-world sparse matrices that exhibit a scale-free nature and exploit their characteristics to propose better algorithms for spmm and csrmm. Focusing on such matrices, we propose the novel algorithms HH-CPU for spmm and HD-CPU for csrmm on a CPU+GPU heterogeneous platform. We perform experiments on a wide variety of real-world scale-free matrices from standard datasets and show performance improvements over state-of-the-art algorithms on a CPU+GPU heterogeneous platform.
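For readers unfamiliar with csrmm, the following is a minimal sequential sketch of the operation, assuming the CSR (compressed sparse row) storage format. The function and variable names are illustrative only; the thesis concerns parallel CPU+GPU implementations, not this serial form.

```python
# Illustrative sketch: sequential csrmm, i.e. C = A @ B where A is a
# sparse matrix stored in CSR format and B is a dense matrix.
# (Not the thesis's implementation; names are ours.)

def csrmm(row_ptr, col_idx, vals, B, n_rows, n_cols_B):
    """Multiply a CSR sparse matrix A (row_ptr, col_idx, vals)
    with a dense matrix B, returning the dense result C."""
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for i in range(n_rows):
        # Non-zeros of row i occupy positions row_ptr[i]..row_ptr[i+1]-1.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            k, a = col_idx[p], vals[p]
            for j in range(n_cols_B):
                C[i][j] += a * B[k][j]
    return C
```

The irregularity the abstract refers to is visible here: the inner work per row is proportional to that row's non-zero count, which in scale-free matrices varies wildly across rows.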
A majority of heterogeneous algorithms follow a work-partitioning approach in which the input is divided into appropriately sized pieces so that individual computing units can process the "right" pieces of the input. Such a division is done by means of parameters, usually thresholds, which vary for each input instance. However, identifying the right value of the threshold is usually non-trivial and may require an extensive empirical search, which may potentially offset any gains accrued from heterogeneous algorithms. We also show that one of our proposed methods, HH-CPU, requires some thresholds to be computed in order to assign the right work to the right processor. For this, we propose a simple yet effective technique based on sampling, which can adapt to the algorithm used, the input instance, and the heterogeneous computing units in the platform. Our technique is generic in its applicability, as we demonstrate in this thesis. We validate our technique on HH-CPU and show that we can find the required threshold to within ±7.5% of the best possible threshold.

Full thesis: pdf

Centre for Security, Theory and Algorithms
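The sampling-based threshold selection described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the thesis's method: the cost models `cpu_cost` and `gpu_cost` are stand-in functions we invented, whereas the thesis adapts to the actual algorithm, input instance, and platform; the heavy-rows-to-CPU split merely echoes the HH-CPU naming.

```python
import random

# Hypothetical sketch of picking a work-partitioning threshold by
# sampling. Rows with more non-zeros than the threshold go to one
# processor, the rest to the other. cpu_cost/gpu_cost are invented
# stand-in cost models, NOT the thesis's measured runtimes.

def pick_threshold(row_nnz, candidates, sample_frac=0.1, seed=0,
                   cpu_cost=lambda nnz: 1.0 * nnz,
                   gpu_cost=lambda nnz: 0.2 * nnz + 4.0):
    """Estimate the best threshold from `candidates` using a random
    sample of the per-row non-zero counts `row_nnz`."""
    rng = random.Random(seed)
    k = max(1, int(sample_frac * len(row_nnz)))
    sample = rng.sample(row_nnz, k)
    best_t, best_cost = None, float("inf")
    for t in candidates:
        # Heavy rows to the CPU, light rows to the GPU. The two units
        # run concurrently, so the makespan is the max of the two
        # estimated per-unit costs.
        cpu = sum(cpu_cost(n) for n in sample if n > t)
        gpu = sum(gpu_cost(n) for n in sample if n <= t)
        cost = max(cpu, gpu)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

Because only a sample of rows is examined, the search cost stays small relative to the multiplication itself, which is what keeps the tuning overhead from offsetting the heterogeneous speedup.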