Research Article

Text Detection and Recognition in Natural Scenes by Mobile Robot

Year 2024, Volume 14, Issue 1, 1-7, 30.06.2024
https://doi.org/10.36222/ejt.1407231

Abstract

Detecting and identifying signboards along their routes is crucial for all autonomous and semi-autonomous vehicles, such as delivery robots, UAVs, and UGVs. The more autonomous systems interact with their environments, the more they can improve their operational capabilities. Extracting and comprehending textual information embedded in urban areas has recently grown in importance and popularity, especially for autonomous vehicles. Text detection and recognition in urban areas (e.g., store names, street nameplates, and signs) is challenging due to natural environmental factors such as lighting, obstructions, weather conditions, and shooting angles, as well as large variability in scene characteristics in terms of text size, color, and background type. In this study, we propose a three-stage text detection and recognition approach for outdoor applications of autonomous and semi-autonomous mobile robots. The first stage detects text in urban areas using the Efficient and Accurate Scene Text Detector (EAST) algorithm. In the second stage, the EasyOCR, Tesseract, and Keras Optical Character Recognition (OCR) algorithms were applied to the detected text regions to perform a comparative analysis of character recognition methods. As the last stage, we applied the Sequence Matcher to the recognized text values to improve the performance of the OCR algorithms in urban areas. Experiments were conducted on the university campus with an eight-wheeled mobile robot, processing the video stream from a camera mounted on top of the robot. The results demonstrate that the EAST text detection algorithm combined with Keras OCR outperforms the other algorithms, reaching an accuracy of 91.6%.
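The paper's exact Sequence Matcher post-processing is not reproduced here; the sketch below only illustrates the underlying idea with Python's standard-library `difflib.SequenceMatcher` (reference [36]): given noisy per-frame OCR readings of the same signboard, pick the reading most similar to all the others. The function name and the sample readings are hypothetical, not taken from the paper.

```python
from difflib import SequenceMatcher

def best_consensus(readings):
    """Return the OCR reading with the highest average similarity
    to all other readings, smoothing out per-frame recognition errors."""
    def avg_similarity(i):
        others = [r for j, r in enumerate(readings) if j != i]
        # SequenceMatcher.ratio() is the Ratcliff/Obershelp
        # similarity in [0, 1]; average it over the other readings.
        return sum(SequenceMatcher(None, readings[i], o).ratio()
                   for o in others) / len(others)
    return readings[max(range(len(readings)), key=avg_similarity)]

# Hypothetical OCR outputs for one signboard across four video frames
frames = ["LIBRARY", "L1BRARY", "LIBRARY", "LIBRAPY"]
print(best_consensus(frames))  # → LIBRARY
```

A voting scheme like this only needs a handful of frames to suppress single-character OCR errors, which is well suited to a robot viewing the same sign continuously while driving past it.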

References

  • [1] D. Sankowski and J. Nowakowski, Computer Vision in Robotics and Industrial Applications, vol. 3. WORLD SCIENTIFIC, 2014. doi: 10.1142/9090.
  • [2] M. Nagy and G. Lăzăroiu, “Computer Vision Algorithms, Remote Sensing Data Fusion Techniques, and Mapping and Navigation Tools in the Industry 4.0-Based Slovak Automotive Sector,” Mathematics, vol. 10, no. 19, 2022, doi: 10.3390/math10193543.
  • [3] M. Bicakci, O. Ayyildiz, Z. Aydin, A. Basturk, S. Karacavus, and B. Yilmaz, “Metabolic Imaging Based Sub-Classification of Lung Cancer,” IEEE Access, vol. 8, 2020, doi: 10.1109/ACCESS.2020.3040155.
  • [4] Z. Lin et al., “A unified matrix-based convolutional neural network for fine-grained image classification of wheat leaf diseases,” IEEE Access, vol. 7, 2019, doi: 10.1109/ACCESS.2019.2891739.
  • [5] S. Cebollada, L. Payá, M. Flores, A. Peidró, and O. Reinoso, “A state-of-the-art review on mobile robotics tasks using artificial intelligence and visual data,” Expert Systems with Applications, vol. 167. 2021. doi: 10.1016/j.eswa.2020.114195.
  • [6] M. Yousef, K. F. Hussain, and U. S. Mohammed, “Accurate, data-efficient, unconstrained text recognition with convolutional neural networks,” Pattern Recognit, vol. 108, 2020, doi: 10.1016/j.patcog.2020.107482.
  • [7] B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010. doi: 10.1109/CVPR.2010.5540041.
  • [8] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, “Detecting texts of arbitrary orientations in natural images,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2012. doi: 10.1109/CVPR.2012.6247787.
  • [9] Y. Netzer and T. Wang, “Reading digits in natural images with unsupervised feature learning,” NIPS, 2011.
  • [10] H. Badri, H. Yahia, and K. Daoudi, “Fast and accurate texture recognition with multilayer convolution and multifractal analysis,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014. doi: 10.1007/978-3-319-10590-1_33.
  • [11] L. Neumann and J. Matas, “Scene text localization and recognition with oriented stroke detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2013. doi: 10.1109/ICCV.2013.19.
  • [12] S. Zhang, M. Lin, T. Chen, L. Jin, and L. Lin, “Character proposal network for robust text extraction,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2016. doi: 10.1109/ICASSP.2016.7472154.
  • [13] K. Wang, B. Babenko, and S. Belongie, “End-to-end scene text recognition,” in Proceedings of the IEEE International Conference on Computer Vision, 2011. doi: 10.1109/ICCV.2011.6126402.
  • [14] C. Shi, C. Wang, B. Xiao, S. Gao, and J. Hu, “End-to-end scene text recognition using tree-structured models,” Pattern Recognit, vol. 47, no. 9, 2014, doi: 10.1016/j.patcog.2014.03.023.
  • [15] A. Mishra, K. Alahari, and C. V. Jawahar, “Top-down and bottom-up cues for scene text recognition,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2012. doi: 10.1109/CVPR.2012.6247990.
  • [16] C. Yao, X. Bai, B. Shi, and W. Liu, “Strokelets: A learned multi-scale representation for scene text recognition,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2014. doi: 10.1109/CVPR.2014.515.
  • [17] Y. Xin, D. Chen, C. Zeng, W. Zhang, Y. Wang, and R. C. C. Cheung, “High Throughput Hardware/Software Heterogeneous System for RRPN-Based Scene Text Detection,” IEEE Transactions on Computers, vol. 71, no. 7, 2022, doi: 10.1109/TC.2021.3092195.
  • [18] J. Ma et al., “Arbitrary-oriented scene text detection via rotation proposals,” IEEE Trans Multimedia, vol. 20, no. 11, 2018, doi: 10.1109/TMM.2018.2818020.
  • [19] W. He, X. Y. Zhang, F. Yin, and C. L. Liu, “Deep Direct Regression for Multi-oriented Scene Text Detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2017. doi: 10.1109/ICCV.2017.87.
  • [20] J. Memon, M. Sami, R. A. Khan, and M. Uddin, “Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR),” IEEE Access, vol. 8. 2020. doi: 10.1109/ACCESS.2020.3012542.
  • [21] L. Q. Zuo, H. M. Sun, Q. C. Mao, R. Qi, and R. S. Jia, “Natural Scene Text Recognition Based on Encoder-Decoder Framework,” IEEE Access, vol. 7, 2019, doi: 10.1109/ACCESS.2019.2916616.
  • [22] C. Yao, X. Bai, and W. Liu, “A unified framework for multioriented text detection and recognition,” IEEE Transactions on Image Processing, vol. 23, no. 11, 2014, doi: 10.1109/TIP.2014.2353813.
  • [23] O. Y. Ling, L. B. Theng, A. Chai, and C. McCarthy, “A model for automatic recognition of vertical texts in natural scene images,” in Proceedings - 8th IEEE International Conference on Control System, Computing and Engineering, ICCSCE 2018, 2019. doi: 10.1109/ICCSCE.2018.8685019.
  • [24] E. Zacharias, M. Teuchler, and B. Bernier, “Image Processing Based Scene-Text Detection and Recognition with Tesseract,” Apr. 2020, Accessed: Mar. 31, 2023. [Online]. Available: https://keras-ocr.readthedocs.io/en/latest/
  • [25] “Zed2i Documentation.” Accessed: May 01, 2023. [Online]. Available: https://www.stereolabs.com/docs/
  • [26] “Pixhawk Documentation.” Accessed: May 01, 2023. [Online]. Available: https://pixhawk.org/
  • [27] O. M. T. Kaya and G. Erdemir, “Design of an Eight-Wheeled Mobile Delivery Robot and Its Climbing Simulations,” in Conference Proceedings - IEEE SOUTHEASTCON, 2023. doi: 10.1109/SoutheastCon51012.2023.10115114.
  • [28] X. Zhou et al., “EAST: An efficient and accurate scene text detector,” in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017. doi: 10.1109/CVPR.2017.283.
  • [29] Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, “Character region awareness for text detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019. doi: 10.1109/CVPR.2019.00959.
  • [30] B. Shi, X. Bai, and C. Yao, “An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition,” IEEE Trans Pattern Anal Mach Intell, vol. 39, no. 11, 2017, doi: 10.1109/TPAMI.2016.2646371.
  • [31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016. doi: 10.1109/CVPR.2016.90.
  • [32] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput, vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
  • [33] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks,” in ACM International Conference Proceeding Series, 2006. doi: 10.1145/1143844.1143891.
  • [34] “Keras OCR Documentation.” Accessed: Apr. 09, 2023. [Online]. Available: https://keras-ocr.readthedocs.io/
  • [35] R. Smith, “An overview of the tesseract OCR engine,” in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2007. doi: 10.1109/ICDAR.2007.4376991.
  • [36] “Difflib Library Documentation.” Accessed: Apr. 09, 2023. [Online]. Available: https://docs.python.org/3/library/difflib.html
  • [37] J. W. Ratcliff and D. Metzener, “Pattern matching: The gestalt approach,” Dr. Dobb’s Journal, vol. 13, 1988.

Details

Primary Language: English
Subjects: Computer Software
Section: Research Article
Authors

Erdal Alimovski 0000-0003-0909-2047

Gökhan Erdemir 0000-0003-4095-6333

Ahmet Emin Kuzucuoglu 0000-0002-7769-6451

Early View Date: August 23, 2024
Publication Date: June 30, 2024
Submission Date: December 19, 2023
Acceptance Date: April 20, 2024
Published Issue: Year 2024, Volume 14, Issue 1

Cite

APA Alimovski, E., Erdemir, G., & Kuzucuoglu, A. E. (2024). Text Detection and Recognition in Natural Scenes by Mobile Robot. European Journal of Technique (EJT), 14(1), 1-7. https://doi.org/10.36222/ejt.1407231

All articles published by EJT are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.