Research Article

Increasing the Robustness of I-vectors via Masking: A Case Study in Synthetic Speech Detection

Year 2024, Volume: 29 Issue: 1, 191 - 204, 22.04.2024
https://doi.org/10.17482/uumfd.1311113

Abstract

Security is vitally important for speaker recognition systems. In recent years, it has been shown that spoofed speech attacks can fool these systems. Spoofed speech detection systems have been developed to prevent this. Although such systems perform quite well in some cases, their performance degrades under noise. Traditional speech enhancement methods, far from improving performance, make it even worse. In this study, the performance of a mask obtained with a convolutional neural network in reducing the effect of noise is examined. The mask is used to suppress the noisy regions of the spectrogram and to make the i-vectors extracted from that spectrogram robust. Tests carried out with the ASVspoof 2015 database and three different noise types show that the proposed system outperforms traditional systems. However, performance is lost on noise types not encountered during training.

Project Number

121E057

INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION

Abstract

Ensuring security in speaker recognition systems is crucial. In recent years, it has been demonstrated that spoofing attacks can fool these systems. To address this issue, spoofed speech detection systems have been developed. Although these systems perform well, their performance tends to degrade under noise. Traditional speech enhancement methods do not improve performance and can even make it worse. In this study, the performance of a noise mask obtained with a convolutional neural network in reducing the effects of noise is investigated. The mask is used to suppress the noisy regions of spectrograms so that robust i-vectors can be extracted. The proposed system is tested on the ASVspoof 2015 database with three different noise types and achieves superior performance compared to traditional systems. However, performance drops for noise types that are not encountered during the training phase.
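
The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper the mask is estimated by a convolutional neural network, whereas here an oracle ratio mask computed from known clean and noise spectrograms stands in for it. The function names `ideal_ratio_mask` and `apply_mask`, the toy array sizes, and the additive magnitude model are all assumptions made for illustration.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Oracle mask in [0, 1]: share of each time-frequency bin's energy that is speech.

    Assumption: stands in for the CNN-estimated mask, which is not reproduced here.
    """
    clean_pow = clean_mag ** 2
    noise_pow = noise_mag ** 2
    return clean_pow / (clean_pow + noise_pow + eps)

def apply_mask(noisy_mag, mask):
    """Attenuate noise-dominated bins by elementwise multiplication."""
    if noisy_mag.shape != mask.shape:
        raise ValueError("spectrogram and mask must have the same shape")
    return noisy_mag * np.clip(mask, 0.0, 1.0)

rng = np.random.default_rng(0)

# Toy magnitude spectrograms: 5 frequency bins x 4 frames (hypothetical data).
clean = np.zeros((5, 4))
clean[1:3, :] = 1.0                      # "speech" energy only in bins 1 and 2
noise = 0.5 * np.abs(rng.standard_normal((5, 4)))
noisy = clean + noise                    # crude additive model in the magnitude domain

mask = ideal_ratio_mask(clean, noise)
masked = apply_mask(noisy, mask)         # features would be extracted from this
```

In this toy setup the bins that contain only noise are driven to zero, while bins that contain speech are kept with some attenuation. Feature and i-vector extraction would then operate on `masked` rather than on `noisy`.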

Supporting Institution

TÜBİTAK

Thanks

This work was supported by TÜBİTAK (Project No: 121E057).

Details

Primary Language: English
Subjects: Software Engineering (Other)
Journal Section: Research Articles
Authors:

Barış Aydın (ORCID: 0000-0003-0604-8243)

Gökay Dişken (ORCID: 0000-0002-8680-0636)

Project Number: 121E057
Early Pub Date: March 28, 2024
Publication Date: April 22, 2024
Submission Date: June 7, 2023
Acceptance Date: March 15, 2024
Published in Issue: Year 2024 Volume: 29 Issue: 1

Cite

APA Aydın, B., & Dişken, G. (2024). INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, 29(1), 191-204. https://doi.org/10.17482/uumfd.1311113
AMA Aydın B, Dişken G. INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION. UUJFE. April 2024;29(1):191-204. doi:10.17482/uumfd.1311113
Chicago Aydın, Barış, and Gökay Dişken. “INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 29, no. 1 (April 2024): 191-204. https://doi.org/10.17482/uumfd.1311113.
EndNote Aydın B, Dişken G (April 1, 2024) INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 29 1 191–204.
IEEE B. Aydın and G. Dişken, “INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION”, UUJFE, vol. 29, no. 1, pp. 191–204, 2024, doi: 10.17482/uumfd.1311113.
ISNAD Aydın, Barış - Dişken, Gökay. “INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 29/1 (April 2024), 191-204. https://doi.org/10.17482/uumfd.1311113.
JAMA Aydın B, Dişken G. INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION. UUJFE. 2024;29:191–204.
MLA Aydın, Barış and Gökay Dişken. “INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, vol. 29, no. 1, 2024, pp. 191-204, doi:10.17482/uumfd.1311113.
Vancouver Aydın B, Dişken G. INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION. UUJFE. 2024;29(1):191-204.
