Speech-to-Gender Recognition Based on Machine Learning Algorithms

Serhat Hızlısoy; Emel Çolakoğlu; Recep Sinan Arslan

doi:10.18100/ijamec.1221455

Research Article

Year 2022, Volume: 10 Issue: 4, 84 - 92, 31.12.2022

Abstract

Speech recognition has several application areas, such as human-machine interaction, classification of phone calls by gender, voice tagging, and speech-to-text (STT). Predicting gender from audio signals is easy for humans but difficult for a computer. In this study, a model based on MFCC features and machine learning classification is proposed for gender estimation from Turkish voice signals. For the study, 58 different series and films were examined, and a new original dataset of 894 audio recordings was created from 5-second segments taken from them. Mel-frequency cepstral coefficients (MFCC) and spectrograms, both frequently used in the literature, were used for feature extraction from the audio data. Each of the two features was first evaluated on its own. A hybrid feature vector was then created by combining the two feature vectors. Different machine learning algorithms (LR, DT, RF, XGB, etc.) were tested for classification, and the best accuracy, 89%, was achieved with the hybrid feature vector and logistic regression. Recall, precision, and F-score were 86.8%, 92%, and 89.3%, respectively. The test results show that the proposed model, which combines the hybrid feature vector, the original dataset, and a machine learning classifier, classifies accurately and is stable and robust.
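The reported F-score can be checked against the reported precision and recall. The sketch below assumes the abstract's F-score is the standard F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both inputs are zero: define the score as 0 to avoid dividing by zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Values reported in the abstract for the hybrid features with logistic regression.
reported_precision = 0.92
reported_recall = 0.868

# Assumption: the reported F-score is F1 (beta = 1).
f1 = f_score(reported_precision, reported_recall)
print(f"F1 = {100 * f1:.1f}%")  # prints "F1 = 89.3%"
```

Under that assumption, the three reported figures are mutually consistent: 2 × 0.92 × 0.868 / (0.92 + 0.868) ≈ 0.893.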

Keywords

Gender Recognition, Machine Learning, MFCC, Spectrogram, Logistic Regression, Turkish

Details

Primary Language	English
Subjects	Engineering
Journal Section	Research Article
Authors	Serhat Hızlısoy 0000-0001-8440-5539 Emel Çolakoğlu 0000-0003-1755-3130 Recep Sinan Arslan 0000-0002-3028-0416
Early Pub Date	December 29, 2022
Publication Date	December 31, 2022
Published in Issue	Year 2022 Volume: 10 Issue: 4

Cite

APA	Hızlısoy, S., Çolakoğlu, E., & Arslan, R. S. (2022). Speech-to-Gender Recognition Based on Machine Learning Algorithms. International Journal of Applied Mathematics Electronics and Computers, 10(4), 84-92. https://doi.org/10.18100/ijamec.1221455
AMA	Hızlısoy S, Çolakoğlu E, Arslan RS. Speech-to-Gender Recognition Based on Machine Learning Algorithms. International Journal of Applied Mathematics Electronics and Computers. December 2022;10(4):84-92. doi:10.18100/ijamec.1221455
Chicago	Hızlısoy, Serhat, Emel Çolakoğlu, and Recep Sinan Arslan. “Speech-to-Gender Recognition Based on Machine Learning Algorithms”. International Journal of Applied Mathematics Electronics and Computers 10, no. 4 (December 2022): 84-92. https://doi.org/10.18100/ijamec.1221455.
EndNote	Hızlısoy S, Çolakoğlu E, Arslan RS (December 1, 2022) Speech-to-Gender Recognition Based on Machine Learning Algorithms. International Journal of Applied Mathematics Electronics and Computers 10 4 84–92.
IEEE	S. Hızlısoy, E. Çolakoğlu, and R. S. Arslan, “Speech-to-Gender Recognition Based on Machine Learning Algorithms”, International Journal of Applied Mathematics Electronics and Computers, vol. 10, no. 4, pp. 84–92, 2022, doi: 10.18100/ijamec.1221455.
ISNAD	Hızlısoy, Serhat et al. “Speech-to-Gender Recognition Based on Machine Learning Algorithms”. International Journal of Applied Mathematics Electronics and Computers 10/4 (December 2022), 84-92. https://doi.org/10.18100/ijamec.1221455.
JAMA	Hızlısoy S, Çolakoğlu E, Arslan RS. Speech-to-Gender Recognition Based on Machine Learning Algorithms. International Journal of Applied Mathematics Electronics and Computers. 2022;10:84–92.
MLA	Hızlısoy, Serhat et al. “Speech-to-Gender Recognition Based on Machine Learning Algorithms”. International Journal of Applied Mathematics Electronics and Computers, vol. 10, no. 4, 2022, pp. 84-92, doi:10.18100/ijamec.1221455.
Vancouver	Hızlısoy S, Çolakoğlu E, Arslan RS. Speech-to-Gender Recognition Based on Machine Learning Algorithms. International Journal of Applied Mathematics Electronics and Computers. 2022;10(4):84-92.

Cited By

Automatic Age and Gender Recognition Using Ensemble Learning

Applied Sciences

https://doi.org/10.3390/app14166868

Gender Recognition Based on the Stacking of Different Acoustic Features

Applied Sciences

https://doi.org/10.3390/app14156564

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.