Convolutional neural networks (CNNs) have emerged as a dominant deep learning technique across fields such as image processing, computer vision, and intelligent decision-making on embedded devices. Although their underlying structure is simple, their high computation and memory requirements pose significant challenges. To address this issue, low-precision representations of neurons, inputs, model parameters, and activations have become a promising solution. These reduced-precision models trade some accuracy for scalability in performance, storage, and power efficiency. By leveraging reconfigurable hardware such as FPGA-SoCs, deep learning systems can exploit low-precision inference engines while maintaining the desired accuracy and balancing performance, power consumption, and programmability. Although CNNs contain considerable redundancy and provide excellent classification accuracy, their increasing model size makes it challenging to execute applications on embedded FPGAs. Recent studies have shown, however, that high accuracy can still be achieved even when weights and activations are reduced from floating-point (FP) to binary values, using approaches such as quantized neural networks (QNNs) and binarized neural networks (BNNs). In this paper, we review recent works that use binarization and quantization frameworks to explore the design space and automate the construction of fully customizable inference engines for image processing on FPGAs.
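The following is a minimal sketch, not taken from any of the reviewed works, of how weights and activations can be reduced from floating point to binary or low-bit values as described above; the function names, layer shapes, and the uniform symmetric quantization scheme are illustrative assumptions rather than a specific framework's API.

```python
import numpy as np

def binarize(x):
    """Deterministic binarization to {-1, +1} via the sign function,
    as typically used for BNN weights and activations."""
    return np.where(x >= 0, 1.0, -1.0)

def quantize(x, bits=8):
    """Uniform symmetric quantization to a fixed bit width (QNN-style):
    values are snapped to a discrete grid derived from the tensor's range."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

# Example: a small dense layer evaluated at full precision, in binary,
# and at 8 bits, to illustrate the precision/accuracy trade-off.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))        # full-precision weights
a = rng.standard_normal(4)             # full-precision activations
y_fp  = a @ w                          # floating-point reference
y_bin = binarize(a) @ binarize(w)      # only +1/-1 arithmetic (BNN)
y_q8  = quantize(a) @ quantize(w)      # 8-bit quantized arithmetic (QNN)
print(y_fp, y_bin, y_q8)
```

On FPGA targets, the binary case is attractive because the +1/-1 multiply-accumulate reduces to XNOR and popcount operations, which is one reason BNN/QNN inference engines map well onto reconfigurable logic.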

Published Date: 20 Rabi' Al-Awwal 1445
Last Change Date: 20 Rabi' Al-Awwal 1445