Detect and Replace: Efficient Soft Error Protection of FPGA-Based CNN Accelerators
Abstract:
Convolutional neural networks (CNNs) are widely used in computer vision and natural language processing, and field-programmable gate arrays (FPGAs) are a popular platform for accelerating them. However, FPGAs are prone to soft errors, so the reliability of FPGA-based CNN accelerators becomes a serious concern in safety-critical applications. The convolution module, which is built on a processing element (PE) array, is the most complex part of the accelerator and is therefore the key to efficient protection. Coding-based schemes have been proposed to protect the convolution module efficiently: the processing of the PE array is modeled as parallel matrix–vector multiplications (MVMs), and every wrong output is detected and corrected concurrently. However, these schemes cannot deal with errors in the configuration memory, which affect many intermediate results. In this article, a protection scheme based on faulty PE detection and replacement (DR) is proposed to deal with such configuration memory errors. The DR scheme is implemented on a CNN accelerator based on the Xilinx Zynq 7020, and a number of fault injection (FI) experiments are performed to evaluate it. The results show that it can effectively mitigate the effect of soft errors in the convolution module with a computation time overhead of about 1.3 times, which is lower than that of conventional fault tolerance schemes. In addition, DR can be applied together with the advanced checksum-of-checksum (CoC) scheme. The DR scheme decreases power consumption by up to 30%.
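As a rough illustration of the idea summarized above, the following Python sketch models a PE array computing one MVM, detects a wrong result with a column checksum, and remaps the affected row to a spare PE. It is a minimal sketch under stated assumptions, not the implementation evaluated in the article: the function names, the additive fault model, the single spare PE, and the localization step (recomputing each row on the spare and comparing) are all hypothetical choices made for this example.

```python
def pe_compute(pe_id, row, x, faulty):
    """Dot product computed by one PE. A PE listed in `faulty` returns a
    corrupted value (hypothetical persistent fault: result off by one)."""
    acc = sum(w * v for w, v in zip(row, x))
    return acc + 1 if pe_id in faulty else acc


def run_array(weights, x, mapping, faulty):
    """Compute y = W x, with row i executed on physical PE mapping[i]."""
    return [pe_compute(mapping[i], row, x, faulty)
            for i, row in enumerate(weights)]


def checksum_ok(weights, x, y):
    """Checksum test: (sum of the rows of W) . x must equal sum(y)."""
    col_sums = [sum(col) for col in zip(*weights)]
    return sum(c * v for c, v in zip(col_sums, x)) == sum(y)


def detect_and_replace(weights, x, mapping, spare, faulty):
    """Run the MVM; on a checksum mismatch, locate the faulty PE by
    recomputing each row on the spare PE (assumed fault-free here),
    remap that row to the spare, and rerun."""
    y = run_array(weights, x, mapping, faulty)
    if checksum_ok(weights, x, y):
        return y, mapping
    for i, row in enumerate(weights):
        if pe_compute(spare, row, x, faulty) != y[i]:
            new_mapping = list(mapping)
            new_mapping[i] = spare  # replace the faulty PE for this row
            return run_array(weights, x, new_mapping, faulty), new_mapping
    return y, mapping  # mismatch could not be localized


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6]]
    x = [1, 1]
    # PE 1 is faulty; PE 3 is the spare.
    y, mapping = detect_and_replace(W, x, [0, 1, 2], spare=3, faulty={1})
    print(y, mapping)
```

Integer arithmetic is used so that the checksum comparison is exact, as it would be in a fixed-point datapath. Once a row has been remapped, later MVMs with the returned mapping avoid the faulty PE altogether and need no per-output correction.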