Efficient Hardware Optimization for CNN

Seda Güzel Aydın; Hasan Şakir Bilge

Research Article

Year 2022, Volume: 6 Issue: 1, 38 - 44, 20.07.2022

Abstract

Convolutional neural network (CNN) architectures have become increasingly popular for image processing applications such as object detection and remote sensing. Some of these applications require CNN methods to run in real time. Embedded devices such as field-programmable gate arrays (FPGAs) are a favorable platform for implementing CNN-based algorithms. However, FPGAs have drawbacks such as limited resources and bottlenecks, so it is difficult to map an entire CNN with a large number of layers onto an FPGA without any optimization. Hardware optimization techniques are therefore essential. In this study, an FPGA-based CNN architecture designed with high-level synthesis (HLS) is demonstrated, and a synthesis report is generated for the Xilinx Zynq-7000 xc7z020-clg484-1 target FPGA. Implementing the CNN architecture on an FPGA platform accelerates it. To improve throughput, the proposed design is optimized for the convolutional layers. The main contribution of this study is to optimize the convolution layer by unrolling the kernels and input feature maps and to examine the effects on throughput, latency, and hardware resources. The throughput is 15.6 GOP/s for the first convolution layer. Compared with the baseline design, the proposed method achieves an acceleration of approximately 2.6× in both latency and throughput.
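
The unrolling idea described in the abstract can be illustrated with a minimal HLS-style C++ sketch. This is not the authors' design: the layer sizes, function names, and the placement of the `#pragma HLS UNROLL` directives are assumptions made for illustration, and the operation count uses the common convention of two operations (one multiply, one add) per multiply-accumulate. A standard C++ compiler ignores the HLS pragmas, so both versions compute the same result in software; the difference only appears after synthesis.

```cpp
#include <cassert>

// Hypothetical layer sizes, chosen only for illustration (not taken from the paper).
constexpr int N_IN = 3;              // input feature maps
constexpr int N_OUT = 4;             // output feature maps
constexpr int K = 3;                 // kernel height and width
constexpr int H_IN = 8, W_IN = 8;    // input feature-map size
constexpr int H_OUT = H_IN - K + 1;  // valid convolution, stride 1
constexpr int W_OUT = W_IN - K + 1;

// Baseline: every loop stays rolled, so the multiply-accumulates run one after another.
void conv_baseline(const float in[N_IN][H_IN][W_IN],
                   const float w[N_OUT][N_IN][K][K],
                   float out[N_OUT][H_OUT][W_OUT]) {
  for (int o = 0; o < N_OUT; ++o)
    for (int r = 0; r < H_OUT; ++r)
      for (int c = 0; c < W_OUT; ++c) {
        float acc = 0.0f;
        for (int i = 0; i < N_IN; ++i)
          for (int kr = 0; kr < K; ++kr)
            for (int kc = 0; kc < K; ++kc)
              acc += in[i][r + kr][c + kc] * w[o][i][kr][kc];
        out[o][r][c] = acc;
      }
}

// Assumed optimization: unroll the input-feature-map loop and both kernel loops,
// asking the HLS tool to instantiate N_IN * K * K multipliers for each output pixel.
void conv_unrolled(const float in[N_IN][H_IN][W_IN],
                   const float w[N_OUT][N_IN][K][K],
                   float out[N_OUT][H_OUT][W_OUT]) {
  for (int o = 0; o < N_OUT; ++o)
    for (int r = 0; r < H_OUT; ++r)
      for (int c = 0; c < W_OUT; ++c) {
        float acc = 0.0f;
        for (int i = 0; i < N_IN; ++i) {
#pragma HLS UNROLL
          for (int kr = 0; kr < K; ++kr) {
#pragma HLS UNROLL
            for (int kc = 0; kc < K; ++kc) {
#pragma HLS UNROLL
              acc += in[i][r + kr][c + kc] * w[o][i][kr][kc];
            }
          }
        }
        out[o][r][c] = acc;
      }
}

// Operation count of the layer, assuming 2 operations per multiply-accumulate.
constexpr long long conv_op_count() {
  return 2LL * N_OUT * H_OUT * W_OUT * N_IN * K * K;
}

// Throughput in GOP/s for a given operation count and latency in seconds.
double gops(long long ops, double latency_s) {
  return static_cast<double>(ops) / latency_s / 1e9;
}
```

Unrolling trades hardware resources for latency: each unrolled loop replicates its multipliers and adders, which is why the study examines resource usage alongside throughput. In practice, unrolling usually has to be paired with array partitioning so that enough operands can be read from on-chip memory in each cycle.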

Keywords

FPGA, Deep learning, Convolutional neural networks, Hardware optimization, HLS

Supporting Institution

Tubitak

Project Number

121E393

Thanks

This research was supported by TUBITAK (Türkiye Bilimsel ve Teknolojik Araştırma Kurumu) under grant 121E393. We thank TUBITAK for its support of our research.


There are 24 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Seda Güzel Aydın 0000-0001-8875-9705; Hasan Şakir Bilge 0000-0002-4945-0884
Project Number	121E393
Publication Date	July 20, 2022
Submission Date	June 1, 2022
Published in Issue	Year 2022 Volume: 6 Issue: 1

Cite

IEEE	S. Güzel Aydın and H. Ş. Bilge, “Efficient Hardware Optimization for CNN”, IJMSIT, vol. 6, no. 1, pp. 38–44, 2022.
