A Turkish Broadcast News Speech Database for Investigation the Effect of Deep Neural Network and Long Short Term Memory Hyperparameters on Speech Recognition Based Systems

Serhat Ok; Zekeriya Tüfekci

doi:10.31590/ejosat.900422

Research Article

Year 2021, Issue: 24, 87 - 92, 15.04.2021

Cited By: 1

Abstract

Speech recognition is the conversion of spoken words and sentences into text. Many studies on speech recognition have been carried out in many countries in recent years. However, very few studies address speech recognition applications in Turkey, and one reason is the lack of speech datasets. In this study, a Turkish speech database was developed for Turkish speech recognition based systems. The audio recordings were obtained from news broadcast at different times by Turkish TV news channels. The resulting dataset has been shared publicly on the web so that it can serve as a precedent for other studies. Additionally, the effects of two hyperparameters, the number of layers and the number of cells, were investigated for Long Short Term Memory (LSTM) and Deep Neural Network (DNN) models on the Turkish Broadcast News Speech Database.
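To make the two hyperparameters concrete, the following is a minimal sketch of how the number of layers and the number of cells per layer determine model size for a stacked LSTM and a fully connected DNN. It is not the authors' experimental setup: the feature dimension, the grid values, and all function names are hypothetical, and the count assumes the textbook LSTM formulation with one bias vector per gate and no output layer.

```python
from itertools import product

# Hypothetical values chosen only for illustration; the abstract does not state them.
FEATURE_DIM = 39               # assumed size of one acoustic feature vector
LAYER_COUNTS = (1, 2, 3)       # assumed "number of layers" settings
CELL_COUNTS = (128, 256, 512)  # assumed "number of cells" settings


def lstm_layer_params(input_size: int, cells: int) -> int:
    """Trainable parameters of one LSTM layer.

    Four gates, each with input weights, recurrent weights, and one bias vector.
    """
    return 4 * (cells * (input_size + cells) + cells)


def dense_layer_params(input_size: int, units: int) -> int:
    """Trainable parameters of one fully connected layer (weights plus bias)."""
    return input_size * units + units


def stack_params(layer_fn, input_size: int, width: int, num_layers: int) -> int:
    """Total parameters of num_layers identical-width layers stacked in sequence."""
    total = 0
    size = input_size
    for _ in range(num_layers):
        total += layer_fn(size, width)
        size = width  # the next layer consumes this layer's output
    return total


def build_grid():
    """Enumerate every (layers, cells) pair with the resulting model sizes."""
    grid = []
    for layers, cells in product(LAYER_COUNTS, CELL_COUNTS):
        grid.append({
            "layers": layers,
            "cells": cells,
            "lstm_params": stack_params(lstm_layer_params, FEATURE_DIM, cells, layers),
            "dnn_params": stack_params(dense_layer_params, FEATURE_DIM, cells, layers),
        })
    return grid


if __name__ == "__main__":
    for row in build_grid():
        print(row)
```

Under these assumptions an LSTM layer carries roughly four times the recurrent and input weights of a dense layer of the same width, which is one reason the two model families respond differently to the same layer and cell settings.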

Keywords

Speech Recognition, Long Short Term Memory, Deep Neural Networks, Turkish Speech Recognition Database

References

Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 1–127.
Gaikwad, S., Gawali, B. W., & Yannawar, P. (2010). A review on Speech Recognition Technique, pp. 16-24.
Graves, A., Mohamed, A. R., & Hinton, G. (2013, May). Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing (pp. 6645-6649). IEEE.
Graves, A., Jaitly, N., & Mohamed, A. R. (2013b, December). Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE workshop on automatic speech recognition and understanding (pp. 273-278). IEEE.
Hizlisoy, S. (2020). Music Emotion Recognition Using Convolutional Long Short Memory Deep Neural Networks.
Patlar, F. (2009). A Continuous Speech Recognition System For Turkish Language Based On Triphone Model.
Hochreiter, S., & Schmidhuber, J. (1997). LSTM can Solve Hard Long Time Lag Problems. Advances in Neural Information Processing Systems 9.
Tüfekci, Z., & Dokuz, Y. (2020). Investigation of the Effect of LSTM Hyperparameters on Speech Recognition Performance. European Journal of Science and Technology, p. 165.
Yu, D., & Deng, L. (2016). Automatic Speech Recognition: A Deep Learning Approach. Springer.
Zuo, Z., Shuai, B., Wang, G., Liu, X., Wang, X., & Wang, B. (2016). Learning Contextual Dependence with Convolutional Hierarchical Recurrent Neural Networks. IEEE Transactions on Image Processing, 25, 2983-2996.

Derin Sinir Ağları ve Uzun Kısa Süreli Bellek Hiperparametrelerinin Konuşma Tanıma Tabanlı Sistemler Üzerindeki Etkisinin İncelenmesi için Türkçe Yayın Haberleri Konuşma Veri Tabanı

Abstract

Speech recognition is the conversion of spoken words and sentences into text. Many studies on speech recognition have recently been carried out in many countries, but very few studies on speech recognition applications have been done in Turkey; one reason is the lack of speech datasets. In this study, a Turkish speech database was developed for Turkish speech recognition based systems. The audio recordings were obtained from news aired at different times by Turkish TV news channels. The resulting dataset has been shared publicly on the web so that it can serve as a precedent for other studies. In addition, the effect of the number-of-layers and number-of-cells hyperparameters on Long Short Term Memory (LSTM) and Deep Neural Network (DNN) models was examined and compared on the Turkish Broadcast News Speech dataset that we created.

Keywords

Speech Recognition, Turkish Speech Recognition Dataset, Long Short Term Memory, Deep Neural Networks

There are 10 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Serhat Ok (ORCID 0000-0002-9764-2952), Zekeriya Tüfekci (ORCID 0000-0001-7835-2741)
Publication Date	April 15, 2021
Published in Issue	Year 2021 Issue: 24

Cite

APA	Ok, S., & Tüfekci, Z. (2021). A Turkish Broadcast News Speech Database for Investigation the Effect of Deep Neural Network and Long Short Term Memory Hyperparameters on Speech Recognition Based Systems. Avrupa Bilim ve Teknoloji Dergisi, (24), 87-92. https://doi.org/10.31590/ejosat.900422

Avrupa Bilim ve Teknoloji Dergisi

Cited By

VPSA-Based Transfer Function Identification of Single DoF Copter System

International Journal of Aviation Science and Technology

https://doi.org/10.23890/IJAST.vm04is02.0204