Breaking the Limitation of DSP Blocks - A Novel Sparse-Convolution Scheme for FPGA Acceleration of CNN Inference
Dong Wang received the MS and PhD degrees in electronic engineering from Xi'an Jiaotong University, China, in 2006 and 2010, respectively. He was a visiting scholar in the Department of Electrical and Computer Engineering at the University of California, Davis, from 2018 to 2019. He is currently an associate professor at Beijing Jiaotong University. His research interests include reconfigurable computing, high-performance computing architectures for embedded applications, and computer vision.
Hardware accelerators for convolutional neural network (CNN) inference have been extensively studied in recent years. The reported designs tend to adopt a similar underlying architecture based on multiplier-accumulator (MAC) arrays, with the practical consequence that the performance of FPGA-based accelerators is bounded by the number of available on-chip DSP blocks, while other resources remain under-utilized. To address this problem, we consider a transformation of the convolution computation, which in turn transforms the accelerator design space and relaxes the pressure on the required DSP resources. We demonstrate that our approach strikes a judicious balance among the on-chip memory, logic, and DSP resources, allowing our accelerator to considerably outperform the state of the art. We evaluate our approach on a Stratix-V GXA7 FPGA, achieving a 55% throughput improvement while using 4x fewer DSP blocks than the best previously reported CNN accelerator on the same device.