Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder

Gökay Dişken; Zekeriya Tüfekci

Research Article

Year 2023, Volume: 19 Issue: 2, 167 - 174, 29.06.2023

Abstract

Audio spoof detection has recently gained the attention of researchers, because detecting spoofed speech is vital for automatic speaker recognition systems. Publicly available datasets have also accelerated studies in this area. Many different features and classifiers have been proposed for spoofed speech detection, and some of them have achieved considerably high performance. Under additive noise, however, spoof detection performance drops rapidly, and the number of studies on noise-robust spoofed speech detection is still very limited. The problem is all the more interesting because conventional speech enhancement methods have reportedly performed worse than applying no enhancement at all. In this work, i-vectors are used for spoof detection, and a discriminative denoising autoencoder (DAE) network is used to obtain enhanced (clean) i-vectors from their noisy counterparts. Once obtained, the enhanced i-vectors can be treated as ordinary i-vectors and scored or classified without any modification to the classifier. Data from the ASVspoof 2015 challenge are used with five different additive noise types, following a configuration similar to that of previous studies. The DAE is trained in a multicondition manner, using both clean and corrupted i-vectors. Three noise types at various signal-to-noise ratios (SNRs) are used to create the corrupted i-vectors, while two other noise types are used only at the test stage to simulate unknown noise conditions. Experimental results showed that the proposed DAE approach is more effective than conventional speech enhancement methods.
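The abstract compresses the method into a few sentences: a DAE maps noisy i-vectors to clean ones, it is trained on both clean and corrupted inputs, and its output is passed to an unmodified back-end. The NumPy sketch below illustrates that pipeline under explicit assumptions; it is not the authors' implementation. Here the "discriminative" part is modelled as an extra genuine/spoof logistic head on the hidden layer, whose cross-entropy is added to the reconstruction loss with a weight `alpha`. The helper names (`train_dae`, `enhance`), the layer sizes, activations, `alpha`, learning rate and the synthetic 20-dimensional "i-vectors" are all hypothetical choices made for illustration.

```python
import numpy as np


def init_params(dim, hidden, rng):
    """Small random weights: encoder, linear decoder, genuine/spoof head."""
    return {
        "W1": rng.normal(0.0, 0.1, (dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, dim)), "b2": np.zeros(dim),
        "w3": rng.normal(0.0, 0.1, hidden), "b3": 0.0,
    }


def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])                     # hidden code
    recon = h @ p["W2"] + p["b2"]                           # enhanced i-vector
    prob = 1.0 / (1.0 + np.exp(-(h @ p["w3"] + p["b3"])))   # P(spoof | h), assumed head
    return h, recon, prob


def train_dae(inputs, targets, labels, hidden=32, alpha=0.5,
              lr=0.02, epochs=500, seed=0):
    """Full-batch gradient descent on
    mean(0.5 * ||recon - target||^2) + alpha * mean(BCE(prob, label))."""
    rng = np.random.default_rng(seed)
    n, dim = inputs.shape
    p = init_params(dim, hidden, rng)
    history = []
    for _ in range(epochs):
        h, recon, prob = forward(p, inputs)
        q = np.clip(prob, 1e-12, 1 - 1e-12)  # avoid log(0) in the loss monitor
        bce = -np.mean(labels * np.log(q) + (1 - labels) * np.log(1 - q))
        mse = 0.5 * np.mean(np.sum((recon - targets) ** 2, axis=1))
        history.append(mse + alpha * bce)
        # Backpropagation of the combined (reconstruction + discriminative) loss.
        g_recon = (recon - targets) / n        # dL/d recon
        g_logit = alpha * (prob - labels) / n  # dL/d spoof logit
        g_h = g_recon @ p["W2"].T + np.outer(g_logit, p["w3"])
        g_pre = g_h * (1.0 - h ** 2)           # tanh'(z) = 1 - tanh(z)^2
        grads = {
            "W1": inputs.T @ g_pre, "b1": g_pre.sum(axis=0),
            "W2": h.T @ g_recon, "b2": g_recon.sum(axis=0),
            "w3": h.T @ g_logit, "b3": g_logit.sum(),
        }
        for key in p:
            p[key] = p[key] - lr * grads[key]
    return p, history


def enhance(p, ivectors):
    """Map (possibly noisy) i-vectors to enhanced ones of the same size."""
    return forward(p, ivectors)[1]


# Hypothetical toy data: 20-dim "i-vectors", label 1 = spoof, 0 = genuine.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200).astype(float)
clean = rng.normal(size=(200, 20)) + 0.5 * (2 * y[:, None] - 1)
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

# Multicondition training: clean->clean and noisy->clean pairs.
X = np.vstack([clean, noisy])
T = np.vstack([clean, clean])
L = np.concatenate([y, y])
params, history = train_dae(X, T, L)
enhanced = enhance(params, noisy)  # fed to the unchanged back-end classifier
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

In this sketch, multicondition training is simply the stacking of clean-to-clean and noisy-to-clean pairs. Because `enhance` returns vectors with the same dimensionality as its input, an existing i-vector classifier can consume them without changes, which is the property the abstract emphasises.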

Keywords

Deep learning, denoising autoencoder, i-vector, spoofing detection

Supporting Institution

Tübitak

Project Number

121E057

Thanks

This work was supported by TUBITAK under project number 121E057.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Gökay Dişken 0000-0002-8680-0636 Zekeriya Tüfekci 0000-0001-7835-2741
Project Number	121E057
Publication Date	June 29, 2023
Published in Issue	Year 2023 Volume: 19 Issue: 2

Cite

APA	Dişken, G., & Tüfekci, Z. (2023). Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar University Journal of Science, 19(2), 167-174.
AMA	Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. CBUJOS. June 2023;19(2):167-174.
Chicago	Dişken, Gökay, and Zekeriya Tüfekci. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar University Journal of Science 19, no. 2 (June 2023): 167-74.
EndNote	Dişken G, Tüfekci Z (June 1, 2023) Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar University Journal of Science 19 2 167–174.
IEEE	G. Dişken and Z. Tüfekci, “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”, CBUJOS, vol. 19, no. 2, pp. 167–174, 2023.
ISNAD	Dişken, Gökay - Tüfekci, Zekeriya. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar University Journal of Science 19/2 (June 2023), 167-174.
JAMA	Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. CBUJOS. 2023;19:167–174.
MLA	Dişken, Gökay and Zekeriya Tüfekci. “Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder”. Celal Bayar University Journal of Science, vol. 19, no. 2, 2023, pp. 167-74.
Vancouver	Dişken G, Tüfekci Z. Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. CBUJOS. 2023;19(2):167-74.
