IIIT Hyderabad Publications
Algorithms for High Performance Hardware Software Partitioning
Author: Naman Govil
Report no: IIIT/TH/2016/56
Advisor:Shubhajit Roy Chowdhury,Rahul Shrestha
Designing embedded systems efficiently has always been of significant interest. This is primarily motivated by the fact that these embedded systems find application in a wide range of fields: industrial automation, commercial electronics, scientific and medical applications, and many more. With the inherent drive towards automation in today’s world, these applications are only ever-increasing. With the tremendous rise in the complexity of such systems and the need to satisfy multiple and mutually conflicting constraints, the design optimization problem has become extremely large and complex. Hence, there is an urgent necessity to develop new and efficient techniques for design of embedded systems. In this thesis, we apply and refine the design strategy of Hardware Software Partitioning (HW/SW Partitioning) to embedded systems. In the first part of the thesis, HW/SW Partitioning is introduced and then, a co-design methodology is utilized to enhance the widely used Fast Fourier Transform algorithm. A high performance and low-cost implementation is proposed which exploits the principles of Partitioning on a target system consisting of a Spartan 3E FPGA and an Atmega-16 micro-controller. A comparative analysis is then demonstrated, with respect to both, the hardware-only and software-only implementations, which depict the superior performance of the proposed implementation. It is then observed that as the problem size, i.e. complexity of target application/algorithm increased, making Partitioning decisions manually was not feasible. Hence, in order to utilize the true potential of HW/SW Partitioning (especially for large-scale applications), automation of the partitioning process was required. This formed the next focus of this thesis, which comprised of developing high speed and resource efficient algorithms for the problem of HW/SW Partitioning. The problem has been modeled as a multi-dimensional optimization problem with the aim of minimizing the area utilization, power dissipation, time of execution and system memory requirement of the implementation. The novelty of this work is in developing a fast paced, state-of-the-art algorithm which takes into consideration all the major design costs as mentioned above. The Greedy Metaheuristic Algorithm (GMA), with O(n) time complexity has been proposed, which significantly reduces the complexity of algorithms previously described in the literature. The proposed approach can also be efficiently implemented on resource constraint devices; such as low cost System On Chips (SoCs). The problem model being considered thus far was lacking one major factor, the communication costs incurred by the different hardware and software processing engines (PEs). With the contemporary age of Multi-Processor System on Chips (MPSoCs) and hybrid multi-core systems, this became a very important cost which could not be neglected. Thus, in this work, an extension to the previous model was introduced next. Subsequently, a two-phased algorithm, called Phased Greedy Metaheuristic Algorithm (PGMA) was proposed, which solved the problem of Partitioning, considering all the relevant communication costs. In order to evaluate the efficiency and performance of the proposed algorithms, a detailed empirical study was also performed. A comparison of the execution time and solution quality of the proposed algorithms with other major partitioning algorithms in the literature was depicted. The time of execution was found to be as low as 0.011 seconds, with deviation from optimal solution by less than 2% (solution error of under 2%), for partitioning an algorithm consisting of 2500 blocks. To the best of our knowledge, this is the fastest performance till date. Due to the heuristic nature of the algorithms, it was necessary to ascertain the solution quality (how optimal were the generated solutions) and compare them with other approaches. This analysis also confirmed the usefulness of the proposed algorithms. After an intensive theoretical design and performance evaluation, it became imminent to apply the proposed algorithms to real life embedded systems and determine their effectiveness. A comprehensive literature survey was conducted to find a suitable problem of relevance to HW/SW Partitioning. With the growing use and reach of digital photography, embedded systems have been increasingly used in the area of digital signal processing. Joint-Photographic Expert-Group (JPEG) Encoder was one such contemporary embedded system that is widely used in numerous applications. This has resulted in a demand for improved and more efficient constructions of the system. Any research carried out to improve the construction of JPEG encoder will have a significant impact in enhancing contemporary embedded system research. Thus, for this thesis, we chose to propose an efficient partitioning scheme for the JPEG encoder, using the proposed algorithm PGMA. A real-life case study was performed. The partitioning results were then compared with other reported works in the literature. For a power constraint of 600 mW, an area utilization of 58.28 % was achieved, which was the maximum amongst all the reported works. This allowed for a decreased offloading of tasks to software, resulting in a lower memory usage (14KB) and time of execution (0.020 seconds). This once again confirmed the effectiveness of the proposed algorithms.
Full thesis: pdf
Centre for VLSI and Embeded Systems Technology
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.