Research Article

Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder

Year 2023, Vol. 19, Issue 2, 167–174, 29.06.2023

Abstract

Audio spoof detection has recently gained the attention of researchers, as detecting spoofed speech is vital for automatic speaker recognition systems. Publicly available datasets have also accelerated studies in this area. Many different features and classifiers have been proposed for the spoofed speech detection problem, and some of them achieve considerably high performance. Under additive noise, however, spoof detection performance drops rapidly, while the number of studies on robust spoofed speech detection remains very limited. The problem becomes more interesting because conventional speech enhancement methods have reportedly performed worse than no enhancement at all. In this work, i-vectors are used for spoof detection, and a discriminative denoising autoencoder (DAE) network is used to obtain enhanced (clean) i-vectors from their noisy counterparts. Once the enhanced i-vectors are obtained, they can be treated as ordinary i-vectors and scored/classified without any modification to the classifier. Data from the ASVspoof 2015 challenge is used with five different additive noise types, following a configuration similar to previous studies. The DAE is trained in a multicondition manner, using both clean and corrupted i-vectors. Three noise types at various signal-to-noise ratios are used to create the corrupted training i-vectors, and two further noise types are used only in the test stage to simulate unknown noise conditions. Experimental results show that the proposed DAE approach is more effective than conventional speech enhancement methods.
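The core idea of the abstract can be illustrated in a few lines: a denoising network is trained on pairs of (input, clean target) i-vectors, where the inputs mix clean and noise-corrupted vectors (the multicondition setup), and the loss is the mean-squared error between the enhanced output and the clean i-vector. The sketch below is not the authors' implementation: it uses randomly generated 20-dimensional vectors as stand-ins for real i-vectors (which are typically several hundred dimensional), and a single linear layer trained by gradient descent stands in for the multi-layer DAE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for i-vectors; real i-vectors are typically
# 400-600 dimensional, 20 keeps this sketch fast.
DIM, N, NOISE_STD = 20, 500, 0.5
clean = rng.standard_normal((N, DIM))
noisy = clean + NOISE_STD * rng.standard_normal((N, DIM))

# Multicondition training set: both clean and corrupted i-vectors are
# used as inputs, and the target is always the clean i-vector.
X = np.vstack([clean, noisy])
Y = np.vstack([clean, clean])

# A single linear layer stands in for the multi-layer DAE; it is
# trained by gradient descent on the MSE between output and clean target.
W = 0.01 * rng.standard_normal((DIM, DIM))
b = np.zeros(DIM)
lr = 0.1
for _ in range(300):
    out = X @ W + b
    grad = 2 * (out - Y) / len(X)     # d(MSE)/d(out)
    W -= lr * (X.T @ grad)            # backprop through the linear map
    b -= lr * grad.sum(axis=0)

def enhance(x):
    """Map (noisy) i-vectors to enhanced i-vectors."""
    return x @ W + b

mse_noisy = np.mean((noisy - clean) ** 2)          # before enhancement
mse_enh = np.mean((enhance(noisy) - clean) ** 2)   # after enhancement
print(mse_enh < mse_noisy)
```

Because the enhancement happens entirely in the i-vector domain, the downstream scoring or classification stage needs no changes, which is the practical appeal of the approach described above.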

Supporting Institution

TÜBİTAK

Project Number

121E057

Acknowledgments

This work was supported by TUBITAK under project number 121E057.

References

  • [1] Z. Wu, E. S. Chng, and H. Li, “Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition,” in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, 2012, vol. 2, pp. 1698–1701.
  • [2] A. Nautsch et al., “ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech,” IEEE Trans. Biometrics, Behav. Identity Sci., vol. 3, no. 2, pp. 252–265, Apr. 2021.
  • [3] H. Delgado et al., “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements,” in Odyssey 2018 The Speaker and Language Recognition Workshop, 2018, pp. 296–303.
  • [4] Z. Wu et al., “ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 4, pp. 588–604, Jun. 2017.
  • [5] M. Todisco, H. Delgado, and N. Evans, “Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification,” Comput. Speech Lang., vol. 45, pp. 516–535, Sep. 2017.
  • [6] J. Yang and L. Liu, “Playback speech detection based on magnitude-phase spectrum,” Electron. Lett., vol. 54, no. 14, pp. 901–903, Jul. 2018.
  • [7] A. T. Patil, H. A. Patil, and K. Khoria, “Effectiveness of energy separation-based instantaneous frequency estimation for cochlear cepstral features for synthetic and voice-converted spoofed speech detection,” Comput. Speech Lang., vol. 72, no. 1, p. 101301, Mar. 2022.
  • [8] J. Yang, R. K. Das, and N. Zhou, “Extraction of Octave Spectra Information for Spoofing Attack Detection,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 12, pp. 2373–2384, Dec. 2019.
  • [9] C. Zhang, C. Yu, and J. H. L. Hansen, “An Investigation of Deep-Learning Frameworks for Speaker Verification Antispoofing,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 4, pp. 684–694, Jun. 2017.
  • [10] S. Scardapane, L. Stoffl, F. Rohrbein, and A. Uncini, “On the use of deep recurrent neural networks for detecting audio spoofing attacks,” Proc. Int. Jt. Conf. Neural Networks, vol. 2017-May, pp. 3483–3490, 2017.
  • [11] C. Hanilçi, T. Kinnunen, M. Sahidullah, and A. Sizov, “Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise,” Speech Commun., vol. 85, pp. 83–97, Dec. 2016.
  • [12] X. Tian, Z. Wu, X. Xiao, E. S. Chng, and H. Li, “An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions,” in INTERSPEECH 2016, 2016, pp. 1715–1719.
  • [13] A. Gómez Alanís, A. M. Peinado, J. A. Gonzalez, and A. Gomez, “A Deep Identity Representation for Noise Robust Spoofing Detection,” in Interspeech 2018, 2018, pp. 676–680.
  • [14] A. Gomez-Alanis, A. M. Peinado, J. A. Gonzalez, and A. M. Gomez, “A Gated Recurrent Convolutional Neural Network for Robust Spoofing Detection,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 12, pp. 1985–1999, Dec. 2019.
  • [15] S. Mahto, H. Yamamoto, and T. Koshinaka, “i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition,” in Interspeech 2017, 2017, pp. 3722–3726.
  • [16] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-End Factor Analysis for Speaker Verification,” IEEE Trans. Audio. Speech. Lang. Processing, vol. 19, no. 4, pp. 788–798, May 2011.
  • [17] W. Rao et al., “Neural networks based channel compensation for i-vector speaker verification,” in 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2016, pp. 1–5.
  • [18] H. Yamamoto and T. Koshinaka, “Denoising autoencoder-based speaker feature restoration for utterances of short duration,” in Interspeech 2015, 2015, pp. 1052–1056.
  • [19] A. Varga and H. J. M. Steeneken, “Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Commun., vol. 12, no. 3, pp. 247–251, Jul. 1993.
  • [20] D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. H. Rahman, and S. Sridharan, “The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition,” in Interspeech 2015, 2015, pp. 3456–3460.
  • [21] C. Zhang et al., “Joint information from nonlinear and linear features for spoofing detection: An i-vector/DNN based approach,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 5035–5039.
  • [22] A. Sizov, E. Khoury, T. Kinnunen, Z. Wu, and S. Marcel, “Joint Speaker Verification and Antispoofing in the i-Vector Space,” IEEE Trans. Inf. Forensics Secur., vol. 10, no. 4, pp. 821–832, Apr. 2015.
  • [23] D. Martinez, L. Burget, T. Stafylakis, Y. Lei, P. Kenny, and E. Lleida, “Unscented transform for ivector-based noisy speaker recognition,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 4042–4046.
  • [24] D. Ribas and E. Vincent, “An Improved Uncertainty Propagation Method for Robust I-Vector Based Speaker Recognition,” in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6331–6335.
  • [25] W. Ben Kheder, D. Matrouf, M. Ajili, and J.-F. Bonastre, “A Unified Joint Model to Deal With Nuisance Variabilities in the i-Vector Space,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 3, pp. 633–645, Mar. 2018.
  • [26] W. Ben Kheder, D. Matrouf, J.-F. Bonastre, M. Ajili, and P.-M. Bousquet, “Additive noise compensation in the i-vector space for speaker recognition,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 4190–4194.
  • [27] W. Ben Kheder, D. Matrouf, P.-M. Bousquet, J.-F. Bonastre, and M. Ajili, “Fast i-vector denoising using MAP estimation and a noise distributions database for robust speaker recognition,” Comput. Speech Lang., vol. 45, pp. 104–122, Sep. 2017.
  • [28] G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, “Speaker adaptation of neural network acoustic models using i-vectors,” in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013, pp. 55–59.
  • [29] W. Wang, W. Song, C. Chen, Z. Zhang, and Y. Xin, “I-vector features and deep neural network modeling for language recognition,” Procedia Comput. Sci., vol. 147, pp. 36–43, 2019.
  • [30] Y. Qian, N. Chen, H. Dinkel, and Z. Wu, “Deep Feature Engineering for Noise Robust Spoofing Detection,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 10, pp. 1942–1955, Oct. 2017.

Details

Primary Language: English
Subjects: Engineering
Section: Articles
Authors

Gökay Dişken 0000-0002-8680-0636

Zekeriya Tüfekci 0000-0001-7835-2741

Project Number: 121E057
Publication Date: June 29, 2023
Published Issue: Year 2023, Vol. 19, Issue 2

How to Cite

APA Dişken, G., & Tüfekci, Z. (2023). Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar Üniversitesi Fen Bilimleri Dergisi, 19(2), 167-174.
AMA Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. CBUJOS. June 2023;19(2):167-174.
Chicago Dişken, Gökay, and Zekeriya Tüfekci. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar Üniversitesi Fen Bilimleri Dergisi 19, no. 2 (June 2023): 167-74.
EndNote Dişken G, Tüfekci Z (01 June 2023) Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar Üniversitesi Fen Bilimleri Dergisi 19 2 167–174.
IEEE G. Dişken and Z. Tüfekci, “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”, CBUJOS, vol. 19, no. 2, pp. 167–174, 2023.
ISNAD Dişken, Gökay - Tüfekci, Zekeriya. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar Üniversitesi Fen Bilimleri Dergisi 19/2 (June 2023), 167-174.
JAMA Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. CBUJOS. 2023;19:167–174.
MLA Dişken, Gökay and Zekeriya Tüfekci. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar Üniversitesi Fen Bilimleri Dergisi, vol. 19, no. 2, 2023, pp. 167-74.
Vancouver Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. CBUJOS. 2023;19(2):167-74.