Research Article

The Effect of the SVM Kernel on Classifying Speakers by Age and Gender

Year 2020, Volume: 7 Issue: 3, 970 - 982, 30.09.2020
https://doi.org/10.31202/ecjse.707179

Abstract

This study addresses the automatic determination of a speaker's age and gender group from short telephone utterances. Mel-Frequency Cepstral Coefficients (MFCCs) and the delta parameters derived from them are used as features, and the age and gender classes are modeled with Gaussian Mixture Models (GMMs) adapted from a Universal Background Model (UBM). The GMM built for each utterance is converted into a supervector and passed to a Support Vector Machine (SVM), which assigns the utterance to the speaker's age and gender group. Four SVM kernels are compared: linear, polynomial, radial basis function (RBF), and GMM-KL. The number of GMM components is varied between 32 and 512. In tests on the aGender database, the best classification rate, 60.95%, was obtained by classifying 256-component GMMs with the GMM-KL kernel.
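The supervector step in the abstract can be illustrated with a minimal sketch. It assumes mean-only MAP adaptation of a diagonal-covariance UBM and the GMM supervector (KL-based) linear kernel in the form given by Campbell et al. (2006) in the reference list; the abstract does not state the exact formulation used in the paper. The function names, the relevance factor of 16, and the toy one-dimensional UBM are all hypothetical and chosen only for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to one utterance.

    relevance=16.0 is an illustrative assumption, not a value from the paper.
    """
    n_comp, dim = len(weights), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        # Component posteriors for this frame, computed in the log domain.
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for c in range(n_comp):
            gamma = post[c] / total
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    adapted = []
    for c in range(n_comp):
        # Components that saw more data move further from the UBM mean.
        alpha = counts[c] / (counts[c] + relevance)
        data_mean = [s / counts[c] for s in sums[c]] if counts[c] > 0 else means[c]
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[c])])
    return adapted


def kl_supervector(weights, adapted_means, variances):
    """Stack the adapted means, scaling each by sqrt(weight) / sqrt(variance)."""
    return [math.sqrt(w) * m / math.sqrt(v)
            for w, mean, var in zip(weights, adapted_means, variances)
            for m, v in zip(mean, var)]


def supervector_kernel(sv_a, sv_b):
    """Inner product of two scaled supervectors."""
    return sum(a * b for a, b in zip(sv_a, sv_b))


# Toy 1-D, 2-component UBM and two tiny "utterances" (hypothetical numbers).
ubm_w = [0.5, 0.5]
ubm_mu = [[-1.0], [1.0]]
ubm_var = [[1.0], [1.0]]
utt_a = [[2.0], [2.2]]
utt_b = [[-1.8], [-2.1]]

sv_a = kl_supervector(ubm_w, map_adapt_means(utt_a, ubm_w, ubm_mu, ubm_var), ubm_var)
sv_b = kl_supervector(ubm_w, map_adapt_means(utt_b, ubm_w, ubm_mu, ubm_var), ubm_var)
print(supervector_kernel(sv_a, sv_b))
```

Because the scaling is folded into the supervector, the kernel reduces to a plain inner product, so a matrix of such kernel values could be handed to any SVM implementation that accepts a precomputed kernel.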

References

  • F. Metze et al., “Comparison of four approaches to age and gender recognition for telephone applications,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2007, vol. 4, pp. IV-1089-IV-1092.
  • D. C. Tanner and M. E. Tanner, Forensic aspects of speech patterns: voice prints, speaker profiling, lie and intoxication detection. Lawyers & Judges Publishing Company, 2004.
  • S. Bhukya, “Effect of Gender on Improving Speech Recognition System,” in International Journal of Computer Applications, 2018, vol. 179, no. 14, pp. 22–30.
  • M. Li, C.-S. Jung, and K. Han, “Combining five acoustic level modeling methods for automatic speaker age and gender recognition,” in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2010, pp. 2826–2829.
  • Z. Qawaqneh, A. A. Mallouh, and B. D. Barkana, “Deep neural network framework and transformed MFCCs for speaker’s age and gender classification,” in Knowledge-Based Systems, 2017, vol. 115, pp. 5–14.
  • S. Safavi, M. Russell, and P. Jančovič, “Automatic speaker, age-group and gender identification from children’s speech,” in Computer Speech and Language, 2018, vol. 50, pp. 141–156.
  • C. BAKIR, “Automatic Speaker Gender Identification for the German Language,” in Balkan Journal of Electrical and Computer Engineering, 2016, vol. 4, no. 2, pp. 79–83.
  • O. Büyük and L. M. Arslan, “An investigation of multi-language age classification from voice,” in BIOSIGNALS 2019 - 12th International Conference on Bio-Inspired Systems and Signal Processing, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019, 2019, pp. 85–92.
  • E. Fokoue, Z. Ma, and E. Fokoué, “Speaker Gender Recognition via MFCCs and SVMs,” (2013), Accessed from https://scholarworks.rit.edu/article/1749
  • J. Přibil, A. Přibilová, and J. Matoušek, “GMM-based speaker age and gender classification in Czech and Slovak,” in Journal of Electrical Engineering, 2017, vol. 68, no. 1, pp. 3–12.
  • F. Faek, “Objective Gender and Age Recognition from Speech Sentences,” in Aro, The Scientific Journal of Koya University, 2015, vol. 3, no. 2, pp. 24–29.
  • J. Równicka and S. Kacprzak, “Speaker Age Classification and Regression Using i-Vectors,” 2016, pp. 1402–1406.
  • B. Schuller et al., “The INTERSPEECH 2010 paralinguistic challenge,” in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2010, pp. 2794–2797.
  • S. B. Davis and P. Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, vol. 28, no. 4, pp. 357–366.
  • S. S. Stevens, J. Volkmann, and E. B. Newman, “A Scale for the Measurement of the Psychological Magnitude Pitch,” in Journal of the Acoustical Society of America, 1937, vol. 8, no. 3, pp. 185–190.
  • J. W. Picone, “Signal Modeling Techniques in Speech Recognition,” in Proceedings of the IEEE, 1993, vol. 81, no. 9, pp. 1215–1247.
  • L. Rabiner, “Fundamentals of speech recognition,” Fundam. speech Recognit., 1993.
  • S. Furui, “Comparison of Speaker Recognition Methods Using Statistical Features and Dynamic Features,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, vol. 29, no. 3, pp. 342–350.
  • J. S. Mason and X. Zhang, “Velocity and acceleration features in speaker recognition,” in [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, 1991, pp. 3673–3676.
  • C. Cortes and V. Vapnik, “Support-vector networks,” in Machine learning, 1995, vol. 20, no. 3, pp. 273–297.
  • R. Collobert and S. Bengio, “SVMTorch: Support Vector Machines for large-scale regression problems,” in Journal of Machine Learning Research, 2001, vol. 1, no. 2, pp. 143–160.
  • J. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models,” Tech. Rep. ICSI-TR-97-021, Univ. Berkeley, vol. 4, 2000.
  • M. Azam and N. Bouguila, “Speaker verification using adapted bounded Gaussian mixture model,” in Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018, 2018, vol. 10, no. 1–3, pp. 300–307.
  • W. M. Campbell, D. E. Sturim, D. A. Reynolds, and A. Solomonoff, “SVM based speaker verification using a GMM supervector kernel and NAP variability compensation,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2006, vol. 1, pp. 97–100.
There are 24 citations in total.

Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles
Authors

Ergün Yücesoy 0000-0003-1707-384X

Publication Date September 30, 2020
Submission Date March 21, 2020
Acceptance Date May 3, 2020
Published in Issue Year 2020 Volume: 7 Issue: 3

Cite

IEEE E. Yücesoy, “Konuşmacının Yaş ve Cinsiyetine Göre Sınıflandırılmasında DVM Çekirdeğinin Etkisi”, El-Cezeri Journal of Science and Engineering, vol. 7, no. 3, pp. 970–982, 2020, doi: 10.31202/ecjse.707179.
Creative Commons License El-Cezeri is licensed to the public under a Creative Commons Attribution 4.0 license.