A Literature Review On Speech Emotion Recognition Using Deep Learning Techniques
Year 2022, Volume: 10, Issue: 4, 765-791, 30.12.2022
Emrah Dikbıyık, Önder Demir, Buket Doğan
Abstract
People's speech varies with their emotional state and carries information about those emotions. Building speech emotion recognition (SER) systems to uncover this information has become a notable research area. These studies have produced different data sets, considered many speech features, and applied different classification algorithms for emotion recognition. This study presents the results of a literature review of SER applications that use deep learning methods, covering studies published between 2019 and 2021. In addition, the emotional data sets used in these applications are examined, and the features used for emotion recognition are described. Unlike other reviews, emotional data sets created in Turkish, and the studies conducted on them, are discussed in a separate section.
References
- Definition of the word "duygu" (emotion). Türk Dil Kurumu (TDK), https://sozluk.gov.tr/ Accessed: 20/03/2022.
- Sibel, S. Ü. (2013). Örgütlerde duygusal zeka [Emotional intelligence in organizations]. Balıkesir Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, 16(29), 213-242.
- Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1), 1-167.
- Li, X., & Lin, R. (2021, December). Speech Emotion Recognition for Power Customer Service. In 2021 7th International Conference on Computer and Communications (ICCC) (pp. 514-518). IEEE.
- Simcock, G., McLoughlin, L. T., De Regt, T., Broadhouse, K. M., Beaudequin, D., Lagopoulos, J., & Hermens, D. F. (2020). Associations between facial emotion recognition and mental health in early adolescence. International journal of environmental research and public health, 17(1), 330.
- Saste, S. T., & Jagdale, S. M. (2017, April). Emotion recognition from speech using MFCC and DWT for security system. In 2017 international conference of electronics, communication and aerospace technology (ICECA) (Vol. 1, pp. 701-704). IEEE.
- Yang, D., Alsadoon, A., Prasad, P. C., Singh, A. K., & Elchouemi, A. (2018). An emotion recognition model based on facial recognition in virtual learning environment. Procedia Computer Science, 125, 2-10.
- Er, M. B., & Çiğ, H. (2020). Türk Müziği Uyaranları Kullanılarak İnsan Duygularının Makine Öğrenmesi Yöntemi İle Tanınması [Recognition of human emotions by machine learning using Turkish music stimuli]. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, 8(2), 458-474.
- Soleymani, M., Garcia, D., Jou, B., Schuller, B., Chang, S. F., & Pantic, M. (2017). A survey of multimodal sentiment analysis. Image and Vision Computing, 65, 3-14.
- Nasukawa, T., & Yi, J. (2003, October). Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd international conference on Knowledge capture (pp. 70-77).
- Rhanoui, M., Mikram, M., Yousfi, S., & Barzali, S. (2019). A CNN-BiLSTM Model for Document-Level Sentiment Analysis. Machine Learning and Knowledge Extraction, 1(3), 832-847.
- Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., & Taylor, J. G. (2001). Emotion recognition in human-computer interaction. IEEE Signal processing magazine, 18(1), 32-80.
- Busso, C., Lee, S., & Narayanan, S. (2009). Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE transactions on audio, speech, and language processing, 17(4), 582-596.
- Wu, S., Falk, T. H., & Chan, W. Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech communication, 53(5), 768-785.
- Jayalekshmi, J., & Mathew, T. (2017, July). Facial expression recognition and emotion classification system for sentiment analysis. In 2017 International Conference on Networks & Advances in Computational Technologies (NetACT) (pp. 1-8). IEEE.
- Wu, T., Peng, J., Zhang, W., Zhang, H., Tan, S., Yi, F., ... & Huang, Y. (2022). Video sentiment analysis with bimodal information-augmented multi-head attention. Knowledge-Based Systems, 235, 107676.
- Zadeh, A. (2015). Micro-opinion Sentiment Intensity Analysis and Summarization in Online Videos. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI ’15).
- Zadeh, A., Chen, M., Poria, S., Cambria, E., & Morency, L. P. (2017). Tensor fusion network for multimodal sentiment analysis. arXiv preprint arXiv:1707.07250.
- Koolagudi, S. G., Kumar, N., & Rao, K. S. (2011, February). Speech emotion recognition using segmental level prosodic analysis. In 2011 international conference on devices and communications (ICDeCom) (pp. 1-5). IEEE.
- Korkmaz, O. E., & Atasoy, A. (2015, November). Emotion recognition from speech signal using mel-frequency cepstral coefficients. In 2015 9th International Conference on Electrical and Electronics Engineering (ELECO) (pp. 1254-1257). IEEE.
- Ingale, A. B., & Chaudhari, D. S. (2012). Speech emotion recognition. International Journal of Soft Computing and Engineering (IJSCE), 2(1), 235-238.
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
- Zhao, Z. Q., Zheng, P., Xu, S. T., & Wu, X. (2019). Object detection with deep learning: A review. IEEE transactions on neural networks and learning systems, 30(11), 3212-3232.
- Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018). Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3), 55-75.
- Liang, H., Sun, X., Sun, Y., & Gao, Y. (2017). Text feature extraction based on deep learning: a review. EURASIP journal on wireless communications and networking, 2017(1), 1-12.
- Yao, K., Yu, D., Seide, F., Su, H., Deng, L., & Gong, Y. (2012, December). Adaptation of context-dependent deep neural networks for automatic speech recognition. In 2012 IEEE Spoken Language Technology Workshop (SLT) (pp. 366-369). IEEE.
- Aravindpai Pai, “CNN vs. RNN vs. ANN – Analyzing 3 Types of Neural Networks in Deep Learning”, https://www.analyticsvidhya.com/blog/2020/02/cnn-vs-rnn-vs-mlp-analyzing-3-types-of-neural-networks-in-deep-learning/ Accessed: 21/02/2022.
- Khalil, R. A., Jones, E., Babar, M. I., Jan, T., Zafar, M. H., & Alhussain, T. (2019). Speech emotion recognition using deep learning techniques: A review. IEEE Access, 7, 117327-117345.
- Akçay, M. B., & Oğuz, K. (2020). Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Communication, 116, 56-76.
- El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern recognition, 44(3), 572-587.
- Eyben, F., Wöllmer, M., & Schuller, B. (2010, October). Opensmile: the Munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM international conference on Multimedia (pp. 1459-1462).
- Boersma, P., & Weenink, D. (1992–2022). Praat: doing phonetics by computer [Computer program]. https://www.fon.hum.uva.nl/paul/praat.html Accessed: 20/05/2022.
- Chen, S., Jin, Q., Li, X., Yang, G., & Xu, J. (2014, September). Speech emotion classification using acoustic features. In The 9th International Symposium on Chinese Spoken Language Processing (pp. 579-583). IEEE.
- Jacob, A. (2016, April). Speech emotion recognition based on minimal voice quality features. In 2016 International conference on communication and signal processing (ICCSP) (pp. 0886-0890). IEEE.
- Zhou, Y., Sun, Y., Zhang, J., & Yan, Y. (2009, December). Speech emotion recognition using both spectral and prosodic features. In 2009 international conference on information engineering and computer science (pp. 1-4). IEEE.
- Wang, Y., Du, S., & Zhan, Y. (2008, October). Adaptive and optimal classification of speech emotion recognition. In 2008 fourth international conference on natural computation (Vol. 5, pp. 407-411). IEEE.
- Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International journal of speech technology, 16(2), 143-160.
- Li, X., Tao, J., Johnson, M. T., Soltis, J., Savage, A., Leong, K. M., & Newman, J. D. (2007, April). Stress and emotion classification using jitter and shimmer features. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 4, pp. IV-1081). IEEE.
- Aouani, H., & Ayed, Y. B. (2020). Speech emotion recognition with deep learning. Procedia Computer Science, 176, 251-260.
- Pathak, S., & Kulkarni, A. (2011, April). Recognizing emotions from speech. In 2011 3rd International Conference on Electronics Computer Technology (Vol. 4, pp. 107-109). IEEE.
- Nwe, T. L., Foo, S. W., & De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech communication, 41(4), 603-623.
- Jiang, P., Fu, H., Tao, H., Lei, P., & Zhao, L. (2019). Parallelized convolutional recurrent neural network with spectral features for speech emotion recognition. IEEE Access, 7, 90368-90377.
- Jain, M., Narayan, S., Balaji, P., Bhowmick, A., & Muthu, R. K. (2020). Speech emotion recognition using support vector machine. arXiv preprint arXiv:2002.07590.
- Zhou, G., Hansen, J. H., & Kaiser, J. F. (2001). Nonlinear feature based classification of speech under stress. IEEE Transactions on speech and audio processing, 9(3), 201-216.
- Bandela, S. R., & Kumar, T. K. (2017, July). Stressed speech emotion recognition using feature fusion of teager energy operator and MFCC. In 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-5). IEEE.
- Mairesse, F., Polifroni, J., & Di Fabbrizio, G. (2012, March). Can prosody inform sentiment analysis? experiments on short spoken reviews. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5093-5096). IEEE.
- Shen, Q., Wang, Z., & Sun, Y. (2017, October). Sentiment analysis of movie reviews based on cnn-blstm. In International Conference on Intelligence Science (pp. 164-171). Springer, Cham.
- Rosas, V. P., Mihalcea, R., & Morency, L. P. (2013). Multimodal sentiment analysis of spanish online videos. IEEE Intelligent Systems, 28(3), 38-45.
- Zhao, J., Mao, X., & Chen, L. (2019). Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomedical signal processing and control, 47, 312-323.
- Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005, September). A database of German emotional speech. In Interspeech (Vol. 5, pp. 1517-1520).
- Haq, S. U. (2011). Audio visual expressed emotion classification (Doctoral dissertation). University of Surrey (United Kingdom).
- Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PloS one, 13(5), e0196391.
- Dhall, A., Ramana Murthy, O. V., Goecke, R., Joshi, J., & Gedeon, T. (2015, November). Video and image based emotion recognition challenges in the wild: Emotiw 2015. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 423-426).
- Önder, O., Zhalehpour, S., & Erdem, Ç. E. (2013, April). A Turkish audio-visual emotional database. In 2013 21st Signal Processing and Communications Applications Conference (SIU) (pp. 1-4). IEEE.
- Busso, C., Bulut, M., Lee, C. C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J. N., Lee, S., & Narayanan, S. S. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation, 42(4), 335-359.
- Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C., Nenkova, A., & Verma, R. (2014). Crema-d: Crowd-sourced emotional multimodal actors dataset. IEEE transactions on affective computing, 5(4), 377-390.
- Martin, O., Kotsia, I., Macq, B., & Pitas, I. (2006, April). The eNTERFACE'05 audio-visual emotion database. In 22nd International Conference on Data Engineering Workshops (ICDEW'06) (pp. 8-8). IEEE.
- China Linguistic Data Consortium. http://www.chineseldc.org Accessed: 25/03/2022.
- Bänziger, T., Pirker, H., & Scherer, K. (2006, May). GEMEP-GEneva Multimodal Emotion Portrayals: A corpus for the study of multimodal emotional expressions. In Proceedings of LREC (Vol. 6, pp. 15-19).
- Wang, Y., & Guan, L. (2005, March). Recognizing human emotion from audiovisual information. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05) (Vol. 2, pp. ii-1125). IEEE.
- Latif, S., Qayyum, A., Usman, M., & Qadir, J. (2018, December). Cross lingual speech emotion recognition: Urdu vs. western languages. In 2018 International Conference on Frontiers of Information Technology (FIT) (pp. 88-93). IEEE.
- Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO corpus: an Italian emotional speech database. In International Conference on Language Resources and Evaluation (LREC 2014) (pp. 3501-3504). European Language Resources Association (ELRA).
- Wani, T. M., Gunawan, T. S., Qadri, S. A. A., Kartiwi, M., & Ambikairajah, E. (2021). A comprehensive review of speech emotion recognition systems. IEEE Access, 9, 47795-47814.
- Wang, X., Chen, X., & Cao, C. (2020). Human emotion recognition by optimally fusing facial expression and speech feature. Signal Processing: Image Communication, 84, 115831.
- Zehra, W., Javed, A. R., Jalil, Z., Khan, H. U., & Gadekallu, T. R. (2021). Cross corpus multi-lingual speech emotion recognition using ensemble learning. Complex & Intelligent Systems, 7(4), 1845-1854.
- Demircan, S., & Kahramanli, H. (2018). Application of fuzzy C-means clustering algorithm to spectral features for emotion classification from speech. Neural Computing and Applications, 29(8), 59-66.
- Ganapathy, A. (2016). Speech Emotion Recognition Using Deep Learning Techniques. ABC Journal of Advanced Research, 5(2), 113-122.
- Abbaschian, B. J., Sierra-Sosa, D., & Elmaghraby, A. (2021). Deep learning techniques for speech emotion recognition, from databases to models. Sensors, 21(4), 1249.
- Demir, A., Atila, O., & Şengür, A. (2019, September). Deep learning and audio based emotion recognition. In 2019 International Artificial Intelligence and Data Processing Symposium (IDAP) (pp. 1-6). IEEE.
- Meng, H., Yan, T., Yuan, F., & Wei, H. (2019). Speech emotion recognition from 3D log-mel spectrograms with deep learning network. IEEE Access, 7, 125868-125881.
- Xie, Y., Liang, R., Liang, Z., Huang, C., Zou, C., & Schuller, B. (2019). Speech emotion classification using attention-based LSTM. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(11), 1675-1685.
- Jalal, M. A., Milner, R., & Hain, T. (2020, October). Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition. In INTERSPEECH (pp. 4113-4117).
- Issa, D., Demirci, M. F., & Yazici, A. (2020). Speech emotion recognition with deep convolutional neural networks. Biomedical Signal Processing and Control, 59, 101894.
- Mustaqeem, & Kwon, S. (2020). CLSTM: Deep feature-based speech emotion recognition using the hierarchical ConvLSTM network. Mathematics, 8(12), 2133.
- Mustaqeem, Sajjad, M., & Kwon, S. (2020). Clustering-based speech emotion recognition by incorporating learned features and deep BiLSTM. IEEE Access, 8, 79861-79875.
- Anvarjon, T., Mustaqeem, & Kwon, S. (2020). Deep-net: A lightweight CNN-based speech emotion recognition system using deep frequency features. Sensors, 20(18), 5212.
- Li, D., Liu, J., Yang, Z., Sun, L., & Wang, Z. (2021). Speech emotion recognition using recurrent neural networks with directional self-attention. Expert Systems with Applications, 173, 114683.
- Mustaqeem, & Kwon, S. (2021). MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach. Expert Systems with Applications, 167, 114177.
- Yusuf, S. M., Adedokun, E. A., Muazu, M. B., Umoh, I. J., & Ibrahim, A. A. (2021, October). RMWSaug: Robust Multi-window Spectrogram Augmentation Approach for Deep Learning based Speech Emotion Recognition. In 2021 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-6). IEEE.
- Zhang, S., Tao, X., Chuang, Y., & Zhao, X. (2021). Learning deep multimodal affective features for spontaneous speech emotion recognition. Speech Communication, 127, 73-81.
- Oflazoglu, Ç., & Yildirim, S. (2011, April). Turkish emotional speech database. In 2011 IEEE 19th Signal Processing and Communications Applications Conference (SIU) (pp. 1153-1156). IEEE.
- Grimm, M., Kroschel, K., & Narayanan, S. (2008, June). The Vera am Mittag German audio-visual emotional speech database. In 2008 IEEE international conference on multimedia and expo (pp. 865-868). IEEE.
- Oflazoglu, C., & Yildirim, S. (2013). Recognizing emotion from Turkish speech using acoustic features. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 1-11.
- Eroglu Erdem, C., Turan, C., & Aydin, Z. (2015). BAUM-2: a multilingual audio-visual affective face database. Multimedia tools and applications, 74(18), 7429-7459.
- Meral, H. M., Ekenel, H. K., & Ozsoy, A. (2003). Analysis of emotion in Turkish. In XVII National Conference on Turkish Linguistics.
- Kaya, H., Salah, A. A., Gürgen, S. F., & Ekenel, H. (2014, April). Protocol and baseline for experiments on Bogazici University Turkish emotional speech corpus. In 2014 22nd Signal Processing and Communications Applications Conference (SIU) (pp. 1698-1701). IEEE.
- Parlak, C., Diri, B., & Gürgen, F. (2014, September). A cross-corpus experiment in speech emotion recognition. In SLAM@INTERSPEECH (pp. 58-61).
- Oflazoglu, Ç., & Yıldırım, S. (2015, May). Binary classification performances of emotion classes for Turkish Emotional Speech. In 2015 23rd Signal Processing and Communications Applications Conference (SIU) (pp. 2353-2356). IEEE.
- Zhalehpour, S., Onder, O., Akhtar, Z., & Erdem, C. E. (2016). BAUM-1: A spontaneous audio-visual face database of affective and mental states. IEEE Transactions on Affective Computing, 8(3), 300-313.
- Bakır, C., & Yuzkat, M. (2018). Speech emotion classification and recognition with different methods for Turkish language. Balkan Journal of Electrical and Computer Engineering, 6(2), 122-128.
- Canpolat, S. F., Ormanoğlu, Z., & Zeyrek, D. (2020, May). Turkish Emotion Voice Database (TurEV-DB). In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL) (pp. 368-375).
- Özsönmez, D. B., Acarman, T., & Parlak, İ. B. (2021, June). Optimal Classifier Selection in Turkish Speech Emotion Detection. In 2021 29th Signal Processing and Communications Applications Conference (SIU) (pp. 1-4). IEEE.