FPGA realized parallel architectures for Classical and CNN’s based Image Segmentation Algorithms

Author: Roopal Nahar
Date: 2018-07-30
Report no: IIIT/TH/2018/62
Advisor: Madhava Krishna

Abstract

The thesis presents deeply pipelined, parallel FPGA-based architectures for image segmentation, both by a graph-based partitioning approach and by a state-of-the-art CNN-based approach. Efficient, real-time segmentation of color images is important in many fields of computer vision, such as image compression, medical imaging, mapping and autonomous navigation. Because it is one of the most computationally expensive operations, it is usually implemented in software on high-performance processors. In robotic systems, however, constrained platform dimensions and the need for portability and low power consumption coexist with the need for real-time image segmentation, so we envision hardware parallelism as the way forward to achieve higher acceleration. Field-programmable gate arrays (FPGAs) are attractive alternatives for this task because of their reconfigurability and high performance per watt in a small physical area. They exceed the computing speed of various software-based implementations by breaking the paradigm of sequential execution and performing more operations per clock cycle through parallelism at the architectural level. The first contribution proposes three novel architectures for the well-known Efficient Graph-based Image Segmentation algorithm. These implementations reduce execution time and power consumption compared with software implementations. The proposed hybrid design further improves acceleration, delivering at least a 2X speedup over the other implementations, which enables real-time image segmentation that can be deployed on mobile robotic systems. The second contribution focuses on FPGA architectures for the state-of-the-art CNN-based approach to image segmentation and classification. Convolutional Neural Networks (CNNs) are rapidly gaining popularity in varied fields such as computer vision, information retrieval and mobile robotics.
Due to their increasingly deep structures, modern CNNs are becoming ever more computationally and memory intensive. Although depth improves accuracy, it makes CNNs difficult to deploy on energy-constrained mobile devices such as drones. As a consequence, hardware accelerators such as FPGAs, with their inherent hardware parallelism, have emerged as an attractive alternative. The major bottleneck in implementing large networks on an FPGA is meeting the high memory-throughput requirements of CNNs with limited on-chip memory. Hence, this work proposes a high-performance FPGA-based architecture, the Depth Concatenation and Inter-Layer Fusion based ConvNet accelerator (DeCoILFNet), which exploits the intra-layer parallelism of CNNs by flattening across the depth dimension and combines it with inter-layer fusion. DeCoILFNet fully pipelines the data flow across convolution layers, which significantly reduces off-chip memory accesses, and maximizes throughput by using multiple line buffers. To validate our approach, we present results for the first five convolution layers of the VGG-16 network, implemented in Verilog on a Xilinx Virtex-7 VC709 FPGA board.
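To illustrate what "line buffers" buy a streaming convolution engine, here is a minimal software model of the generic line-buffer technique, not the DeCoILFNet design itself. It assumes a single-channel image streamed one pixel per "clock" in row-major order, a K×K kernel and valid-mode (no padding) cross-correlation; depth concatenation and inter-layer fusion are not modeled. The function name `line_buffer_conv` is hypothetical. The point it shows is that only K-1 image rows plus a K×K window ever need to be held on chip, instead of the whole frame.

```python
from collections import deque
from typing import Iterable, Iterator, List


def line_buffer_conv(pixels: Iterable[float], width: int,
                     kernel: List[List[float]]) -> Iterator[float]:
    """Hypothetical software model of a line-buffered KxK convolution.

    Pixels arrive one per step in row-major order; outputs are emitted in
    row-major order for valid-mode cross-correlation. Storage is K-1 row
    FIFOs (line buffers) plus a KxK window register, never the full image.
    """
    k = len(kernel)
    # Cascaded FIFOs, each `width` deep: lines[k-2] delays by one row,
    # lines[0] by k-1 rows (in hardware these would map to block RAM).
    lines = [deque([0.0] * width, maxlen=width) for _ in range(k - 1)]
    # KxK window (shift register); window[r][c], row k-1 = current image row.
    window = [[0.0] * k for _ in range(k)]

    for idx, p in enumerate(pixels):
        row, col = divmod(idx, width)
        # FIFO outputs: taps[j] is the pixel (k-1-j) rows above, same column.
        taps = [buf[0] for buf in lines]
        column = taps + [p]  # vertical slice, oldest row first
        # Advance the cascade: the newest FIFO takes p, each older FIFO takes
        # the pixel just shifted out of the next-newer one (maxlen drops buf[0]).
        for j in range(k - 1):
            lines[j].append(p if j == k - 2 else taps[j + 1])
        # Shift the window left by one column and insert the new column.
        for r in range(k):
            window[r] = window[r][1:] + [column[r]]
        # The window holds a full KxK patch once k rows and k columns are in.
        if row >= k - 1 and col >= k - 1:
            yield sum(kernel[r][c] * window[r][c]
                      for r in range(k) for c in range(k))
```

For example, streaming a 4×5 image through a 3×3 kernel yields a 2×3 output, and the first result appears once pixel index (K-1)·W + (K-1) has arrived. In hardware the two inner loops collapse into a single clock cycle, so after this fill latency the engine produces one output per input pixel, which is why line buffers let a pipelined accelerator avoid re-reading pixels from off-chip memory.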

Centre for Robotics

IIIT Hyderabad Publications