A Low Power Testable Floating Point ALU

Author: Sandhya Vinjam
Date: 2014-06-30
Report no: IIIT/TH/2014/26
Advisor: Satyam Mandavalli

Abstract

General-purpose microprocessor-based computers have developed notably in recent times. The Arithmetic and Logic Unit (ALU) is a common component of these systems and supports the arithmetic operations they need. System speed improves when the ALU also supports floating point operations. A Floating Point Unit (FPU) works as a co-processor when the main processor needs to work on intensive tasks. The main functions of an FPU include addition, subtraction, multiplication, division and square root.
Historically, many digital designs have treated speed as the primary parameter and power as the secondary one. Silicon devices continue to be miniaturized and MOSFET dimensions keep shrinking. Power dissipation in CMOS circuits has two components, static power and dynamic power, of which dynamic power is the major contributor. Hence, an effort to optimize power has to be aimed at dynamic power dissipation. In general, power reduction techniques lower the speed of the system. As a reduction in speed is not desirable, ways to optimize the speed of the devices also need to be considered. In short, speed-power optimization is a benchmark for a good design.
In this thesis, the speed-power optimization of a floating point ALU is carried out in two steps: 1. Enhancing the speed of the Normalization Unit using the partition technique that is used in realizing adders and multipliers. 2. Decreasing the power dissipation in the array multiplier using a carry propagation technique.
When two floating-point numbers with the same exponent are subtracted, the result may be close to zero and may not satisfy the format required by the IEEE floating-point standard. Normalization is therefore needed to bring the result into IEEE format.
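As a rough illustration of the normalization step mentioned above, the following Python sketch subtracts two 24-bit significands that share an exponent and then left-shifts the result until its MSB is set. The function names, the 4-bit group size and the example values are assumptions made for this sketch. The partitioned count is only one plausible software reading of the partition idea, not the circuit proposed in the thesis.

```python
WIDTH = 24  # significand width for single precision, hidden bit included


def leading_zeros_sequential(value, width=WIDTH):
    """Count zeros from the MSB towards the LSB, one bit at a time."""
    count = 0
    for bit in range(width - 1, -1, -1):
        if (value >> bit) & 1:
            break
        count += 1
    return count


def leading_zeros_partitioned(value, width=WIDTH, group=4):
    """Find the first nonzero group, then scan only inside that group.

    Assumption: equal-sized groups; the thesis may partition differently.
    """
    assert width % group == 0
    for g in range(width // group):
        shift = width - (g + 1) * group
        chunk = (value >> shift) & ((1 << group) - 1)
        if chunk:
            return g * group + leading_zeros_sequential(chunk, group)
    return width  # all bits are zero


def normalize(significand, exponent, width=WIDTH):
    """Left-shift until the MSB is 1 and lower the exponent by the shift."""
    if significand == 0:
        return 0, exponent  # a real FPU handles zero as a special case
    shift = leading_zeros_partitioned(significand, width)
    return (significand << shift) & ((1 << width) - 1), exponent - shift


# Two operands with the same exponent whose significands differ by one unit.
difference = 0xC00001 - 0xC00000
normalized, new_exponent = normalize(difference, 10)
```

Both counting functions return the same shift amount; the partitioned version simply limits how many bits each comparison has to look at.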
The Normalization Unit (NU) applies a left shift until the first nonzero digit reaches the leading position. The shift amount is determined by counting the zeros from the MSB until the first nonzero digit is reached. In a conventional NU with an n-bit input, the search for the first ‘1’ proceeds sequentially from the MSB towards the LSB, which adds delay. This thesis presents a new design technique that performs normalization at higher speed and with lower area overhead than conventional normalization units. In the conventional technique, a large input leads to high delay because of its higher fan-in. In the new technique, the input is partitioned to reduce the complexity created by high fan-in and thereby decrease the delay of the overall circuit. The technique uses two decoder units, one simple adder and a barrel shifter. With appropriate partitioning, its delay and area are observed to be better than those of conventional approaches. It is also observed that the technique performs better for large input sizes. For a 24-bit input, there is a 40% improvement in speed and a 14% improvement in power consumption.
In many digital applications, the adder is the basic building block of an arithmetic unit, and multiplication is the next most basic operation after addition. Adders and multipliers are widely used in areas such as Digital Signal Processing (DSP) and graphics processing. Multipliers are built from a structured arrangement of adders. Though the adder circuit itself is small, replicating many adders in a multiplier makes the operation time-consuming and quite expensive. A low power design technique called carry propagation is applied to an array multiplier, a major building block of a floating point multiplier. This architecture considerably lowers the switching activity compared to conventional multipliers.
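The statement that a multiplier is a structured arrangement of adders can be sketched in software as one addition per multiplier bit. This is a behavioural model only, with an assumed function name and a default width of 24 bits; it does not model gates, switching activity or the carry propagation technique of the thesis.

```python
def array_multiply(a, b, width=24):
    """Multiply two unsigned width-bit integers row by row.

    Each loop iteration stands for one row of the array: the multiplicand
    is gated by one multiplier bit and added at the matching bit position.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for row in range(width):
        # Partial product: the multiplicand if this multiplier bit is 1, else 0.
        partial = a if (b >> row) & 1 else 0
        product += partial << row
    return product


# The largest 24-bit operands give a product that fits in 48 bits.
largest = array_multiply(0xFFFFFF, 0xFFFFFF)
```

A width-bit multiplier needs width such rows, which is why the replicated adders dominate its area and delay.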
A single precision floating point multiplier consists of a 24 x 24 bit array multiplier that produces a 48-bit output. The output is rounded off by discarding the 24 LSBs and adding either ‘1’ or ‘0’ (a carry) to the remaining bits, depending on the value of the 24 LSBs. The proposed carry propagation technique reduces the hardware requirement substantially since the 24 LSBs need not be calculated, which results in smaller area and lower power dissipation. Because the 24 LSBs are not calculated, a tolerable rounding error occurs when this technique is used in a floating point multiplier. To ensure that the 24 MSBs are correct, only carry bits are propagated from the LSB side of the array multiplier. Although the technique has no effect on the critical path delay, it reduces power dissipation by up to 6.7%. An FIR filter is designed around this floating point multiplier unit and tested with various sets of inputs. Analysis of parameters such as power and power-delay product shows that the FIR filter based on the proposed technique dissipates 20.6% less power than conventional designs.
According to Moore’s law, the number of transistors in an integrated circuit doubles every 18 months. Most of today’s computers and other electronic appliances contain millions of transistors. The steady decrease in feature size leads to manufacturing faults that cause functional failures. In some cases, the failure of even a single transistor can make the whole chip faulty at the operational frequency. This is why testing is important. Testing ensures the quality of a chip by checking for all the manufacturing faults taken into consideration. Building a system involves several stages: ICs are assembled into PCBs (printed circuit boards), and these PCBs are assembled into a system.
The cost of detecting a faulty IC is lowest at the IC stage, higher at the PCB stage and higher still at the system stage. Testing carried out in the initial stages of system design is therefore more important than testing in later stages. Testing a floating point ALU is difficult because of its complex structure. To reduce the test effort, a few modifications are introduced in the proposed FP ALU. The testing of the FP ALU is divided into two steps: 1) testing of the addition unit and 2) testing of the multiplication unit. Both are tested assuming stuck-at faults on the input wires. The addition block is first divided into four functional blocks to make testing easier: (i) a block for finding the bigger exponent, (ii) a unit for subtracting the smaller exponent from the bigger one to get the shift amount, (iii) a shifter to shift the smaller mantissa, and (iv) an adder to add the shifted mantissa to the mantissa of the bigger exponent. Similarly, testing of the floating point multiplier is divided into two steps. In the first step, the exponent subtraction stage is tested, and in the second step, the array multiplier is tested. By effectively inserting control points and observation points, the addition and multiplier blocks can be tested with considerably less effort.
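The stuck-at fault assumption on input wires can be illustrated with a small fault simulation. The sketch below uses a 4-bit adder as a stand-in for one functional block; the adder width, the function names and the test vectors are all assumptions chosen for illustration and do not reproduce the test procedure of the thesis.

```python
def adder(a, b, width=4):
    """Unsigned adder whose output includes the carry-out bit."""
    return (a + b) & ((1 << (width + 1)) - 1)


def force_bit(value, bit, stuck_value):
    """Return value with one input wire stuck at 0 or 1."""
    if stuck_value:
        return value | (1 << bit)
    return value & ~(1 << bit)


def detects(vector, fault, width=4):
    """True if the test vector produces a different output under the fault."""
    a, b = vector
    operand, bit, stuck_value = fault
    good = adder(a, b, width)
    if operand == "a":
        faulty = adder(force_bit(a, bit, stuck_value), b, width)
    else:
        faulty = adder(a, force_bit(b, bit, stuck_value), width)
    return good != faulty


def fault_coverage(vectors, width=4):
    """Return (detected, total) over all single stuck-at faults on the inputs."""
    faults = [(op, bit, sv) for op in ("a", "b")
              for bit in range(width) for sv in (0, 1)]
    detected = sum(1 for f in faults if any(detects(v, f, width) for v in vectors))
    return detected, len(faults)


# All-zeros exposes stuck-at-1 faults; all-ones exposes stuck-at-0 faults.
coverage = fault_coverage([(0b0000, 0b0000), (0b1111, 0b1111)])
```

In this toy case two vectors cover every input fault because the inputs are directly controllable and the sum is directly observable, which is the situation that inserted control and observation points aim to create inside a larger design.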

Full thesis: pdf

Centre for VLSI and Embedded Systems Technology

IIIT Hyderabad Publications
