Approximate Computing Architectures and Algorithms for Error Resilient Applications

Author: Salman Ahmed
Date: 2021-07-30
Report no: IIIT/TH/2021/90
Advisor: Zia Abbas

Abstract

Approximate Computing is revolutionizing the way we look at the VLSI design flow, from RTL-design-based micro-architectures to standard cell characterization, by adding a quality relaxation constraint that arises from the error resilience of modern applications. With the unprecedented growth of data-driven approaches and the need to generalize them over multi-class problems, the traditional methods of delivering consistent precision have to adapt. This does not give us completely free rein over the chip design flow, however: unless an acceptable quality is maintained, overall system failure is inevitable. Approaching physical limits of fabrication at current technology nodes, together with growing reliability concerns in the manufacturing process, has made the paradigm shift towards approximate computing not only attractive but a necessity. Approximate Computing enables trade-offs between an acceptable level of accuracy and the leakage and dynamic power dissipation, area, critical path delay and energy consumption of the chip. Previous work in approximate computing has included both ad hoc and automated approaches. The main contribution of this thesis is to improve on these with novel approaches that achieve better trade-offs between quality and the other performance metrics. This thesis explores the following major avenues in approximate computing:

1. Arithmetic circuits are the fundamental blocks of any design or architecture. In particular, optimizing multipliers can bring a significant edge in performance, power and area. This is achieved by developing a novel, performance-efficient design of an approximate multiplier using the Toom-Cook multiplication algorithm. The complexity of the N-bit multiplier is reduced from $O(N^2)$ to $O(N^{\log_d(2d-1)})$ for order d. As a result, the proposed multiplier achieves, on average, improvements of 53%, 18% and 57% in area, delay and power, respectively, with less than 1% mean error.
2. Approximating the Fast Fourier Transform architecture can accelerate numerous digital signal and image processing application domains. To this end, a shared-memory FFT architecture is developed using the proposed approximate Toom-Cook multiplier. Since the entire frequency-domain output is not usually required, the design also includes a supporting function for error correction based on the sparsity patterns. The synthesized design shows, on average, improvements of 49% and 53% in area and energy consumption, respectively, with error as low as 0.1% when pruning based on the sparsity patterns.
3. Approximating the full adder standard cells by pruning transistors yields cells tuned to a more power-quality optimal point. This is essential because leakage power has increased substantially with technology scaling and has become a dominant component of power dissipation, limiting the performance of circuits. Significant leakage reductions of up to 67% are obtained through the combined use of optimal transistor sizing and approximate computing.
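
To make the complexity figure quoted for the Toom-Cook multiplier concrete, the following is a minimal Python sketch. It is not the approximate hardware design of the thesis: it only evaluates the exponent log_d(2d-1) for a given order d and shows the exact order-2 case (Karatsuba), which replaces four half-size products with three. The function names `toom_cook_exponent` and `karatsuba` and the base-case threshold of 16 are assumptions chosen for illustration.

```python
import math


def toom_cook_exponent(d: int) -> float:
    """Return log_d(2d - 1), the exponent in the O(N^log_d(2d-1)) bound."""
    if d < 2:
        raise ValueError("order d must be at least 2")
    return math.log(2 * d - 1, d)


def karatsuba(x: int, y: int) -> int:
    """Exact order-2 Toom-Cook (Karatsuba) product of non-negative integers."""
    # Illustrative base case: small operands are multiplied directly.
    if x < 16 or y < 16:
        return x * y
    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    # Three recursive products instead of four.
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high
    # Recombine the partial products with shifts.
    return (high << (2 * half)) + (mid << half) + low


if __name__ == "__main__":
    for d in (2, 3, 4):
        print(d, round(toom_cook_exponent(d), 3))
    print(karatsuba(123456789, 987654321) == 123456789 * 987654321)
```

For every order d of at least 2 the exponent is below 2, and it shrinks as d grows, which is the source of the reduction from the quadratic cost of schoolbook multiplication.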

Full thesis: pdf

Centre for VLSI and Embedded Systems Technology

IIIT Hyderabad Publications
