Abstract
Deep learning applications are developing rapidly, especially on mobile devices. The performance, power, and area constraints of these platforms increase the need for application-specific hardware designs. Convolutional Neural Networks are among the most widely used methods in image processing today. In this study, designs for the max-pooling unit, an important processing block of Convolutional Neural Networks, are presented. The max-pooling layer lies on the critical delay path of a Convolutional Neural Network design and therefore strongly influences the overall throughput of a pipelined integrated circuit. The total frame processing times of the proposed designs are much shorter than that of the Standard Design, and the proposed designs can be integrated into different pipeline structures. All designs are modeled in VHDL and synthesized on a current FPGA platform. The synthesis results show that the fastest of the proposed designs processes a 128×128 frame about 8.1 times faster than the Standard Design.