Gerçek Zamanlı Ses Tanıma ile Robot Kolu Kontrolü

Ozan Fırat Çıplak; Serkan Keser

doi:10.31590/ejosat.969608

Research Article

Year 2021, Issue 31, pp. 34-39, 31.12.2021

Robot Arm Control with Real-Time Speech Recognition

Abstract

Recognition systems that enable remote control of devices are developing day by day. Speech, face, and fingerprint recognition are among the most widely used. Speech recognition systems can be used in real time for security, device control, and dictation. In this study, a robot arm is controlled by recognizing speech commands in real time. Artificial Neural Network (ANN), Fisher Linear Discriminant Analysis (FLDA), and Discriminative Common Vector Approach (DCVA) classifiers were used to recognize the speech commands. For the training set, a total of 24 spoken sentences were created, covering four different objects in six different colors each. The speech signals in the training set were produced by 8 speakers. In the test and training phases, each person voiced 50 speech signals. Once a command is recognized, the robot arm is directed to the corresponding object, whose coordinates are known beforehand. The average speech recognition rate for DCVA was 97.13% with a language model and 88.20% without one. For FLDA, it was 96.3% with a language model and 87.3% without. For ANN, it was 89.76% with a language model and 82.3% without.

Keywords

Speech recognition, Robot arm control, ANN, FLDA, DCVA
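
The command structure described in the abstract (six colors times four objects giving 24 commands, each mapped to known coordinates) can be illustrated with a minimal sketch. The abstract does not give the actual words, the coordinates, or the form of the language model, so everything named here is an assumption made for illustration: the object and color names, the coordinate values, and the simple positional grammar (first word a color, second word an object) standing in for the language model. It is not the authors' implementation.

```python
from itertools import product

# Hypothetical vocabulary: the abstract states only the counts (4 objects, 6 colors).
OBJECTS = ["cube", "ball", "cylinder", "pyramid"]
COLORS = ["red", "green", "blue", "yellow", "black", "white"]
WORDS = COLORS + OBJECTS

# Every valid command is one color-object pair: 6 * 4 = 24 commands.
COMMANDS = [f"{color} {obj}" for color, obj in product(COLORS, OBJECTS)]

# Hypothetical pre-registered coordinates (x, y, z), one per command.
COORDINATES = {
    cmd: (10.0 * (i % 6), 10.0 * (i // 6), 0.0) for i, cmd in enumerate(COMMANDS)
}


def decode(word_scores, use_grammar=True):
    """Turn per-word classifier scores into a two-word command.

    word_scores is a list of two dicts mapping candidate words to scores.
    With use_grammar=True, the first word is chosen among colors only and
    the second among objects only (an assumed stand-in for a language model).
    """
    allowed = [COLORS, OBJECTS] if use_grammar else [WORDS, WORDS]
    chosen = [
        max(vocab, key=lambda w: scores.get(w, float("-inf")))
        for scores, vocab in zip(word_scores, allowed)
    ]
    return " ".join(chosen)


def target_for(command):
    """Return the stored coordinates for a command, or None if it is invalid."""
    return COORDINATES.get(command)


# Made-up scores in which the first word is acoustically confused with "ball".
scores = [{"red": 0.4, "ball": 0.5, "blue": 0.1}, {"ball": 0.7, "cube": 0.2}]
print(decode(scores, use_grammar=False), target_for(decode(scores, use_grammar=False)))
print(decode(scores, use_grammar=True), target_for(decode(scores, use_grammar=True)))
```

In this toy case the unconstrained decoder outputs an invalid word pair with no target, while the constrained one recovers a valid command. This is one plausible reason a language model raises the recognition rate, though the abstract does not say how the authors' model works.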

References

Akyazi, Ö., Şahin, E., Özsoy, T., & Algül, M. (2019). A Solar Panel Cleaning Robot Design and Application. Avrupa Bilim ve Teknoloji Dergisi, 343-348.
Anggraeni, D., Sanjaya, W. S. M., Nurasyidiek, M. Y. S., & Munawwaroh, M. (2018). The implementation of speech recognition using mel-frequency cepstrum coefficients (MFCC) and support vector machine (SVM) method based on python to control robot arm. In IOP Conference Series: Materials Science and Engineering (Vol. 288, No. 1, p. 012042). IOP Publishing.
Beigi, H. (2011). Speaker recognition. In Fundamentals of Speaker Recognition (pp. 543-559). Springer, Boston, MA.
Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on pattern analysis and machine intelligence, 19(7), 711-720.
Çevikalp, H., Neamtu, M., Wilkes, M., & Barkana, A. (2004, April). A novel method for face recognition. In Proceedings of the IEEE 12th Signal Processing and Communications Applications Conference, 2004. (pp. 579-582). IEEE.
Dokuz, Y., & Tüfekci, Z. (2020). A Review on Deep Learning Architectures for Speech Recognition. Avrupa Bilim ve Teknoloji Dergisi, 169-176.
Filho, G. L., & Moir, T. J. (2010). From science fiction to science fact: a smart-house interface using speech technology and a photo-realistic avatar. International journal of computer applications in technology, 39(1-3), 32-39.
Furui, S., Kikuchi, T., Shinnaka, Y., & Hori, C. (2004). Speech-to-text and speech-to-speech summarization of spontaneous speech. IEEE Transactions on Speech and Audio Processing, 12(4), 401-408.
Gunal, S., & Edizkan, R. (2007, July). Use of novel feature extraction technique with subspace classifiers for speech recognition. In IEEE International Conference on Pervasive Services (pp. 80-83). IEEE.
Gunal, S., & Edizkan, R. (2008). Subspace based feature selection for pattern recognition. Information Sciences, 178(19), 3716-3726.
Gülmezoglu, M. B., Dzhafarov, V., Keskin, M., & Barkana, A. (1999). A novel approach to isolated word recognition. IEEE Transactions on Speech and Audio Processing, 7(6), 620-628.
Gülmezoğlu, M. B., Dzhafarov, V., Edizkan, R., & Barkana, A. (2007). The common vector approach and its comparison with other subspace methods in case of sufficient data. Computer Speech & Language, 21(2), 266-281.
Keser, S., & Edizkan, R. (2009, April). Phonem-based isolated Turkish word recognition with subspace classifier. In 2009 IEEE 17th Signal Processing and Communications Applications Conference (pp. 93-96). IEEE.
Lalitha, S., Mudupu, A., Nandyala, B. V., & Munagala, R. (2015, December). Speech emotion recognition using DWT. In 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (pp. 1-4). IEEE.
Çıplak, O. F., & Keser, S. (2021). Robot arm controlling with real-time speech recognition using subspace based classifiers. In ISPEC 10th International Conference on Engineering & Natural Sciences (pp. 247-256). Siirt University, Siirt, Turkey.
Soujanya, M., & Kumar, S. (2010, August). Personalized IVR system in contact center. In 2010 International Conference on Electronics and Information Engineering (Vol. 1, pp. V1-453). IEEE.
Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., & Kitamura, T. (2000, June). Speech parameter generation algorithms for HMM-based speech synthesis. In 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 00CH37100) (Vol. 3, pp. 1315-1318). IEEE.
Wang, Y., & Guan, L. (2004, October). An investigation of speech-based human emotion recognition. In IEEE 6th Workshop on Multimedia Signal Processing, 2004. (pp. 15-18). IEEE.
Yavuz, H. S., Çevikalp, H., & Barkana, A. (2006). Two-dimensional CLAFIC methods for image recognition. In 2006 IEEE 14th Signal Processing and Communications Applications.

There are 19 citations in total.

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Articles
Authors	Ozan Fırat Çıplak 0000-0002-5197-7810; Serkan Keser 0000-0001-8435-0507
Publication Date	December 31, 2021
Published in Issue	Year 2021 Issue: 31

Cite

APA	Çıplak, O. F., & Keser, S. (2021). Gerçek Zamanlı Ses Tanıma ile Robot Kolu Kontrolü. Avrupa Bilim ve Teknoloji Dergisi, (31), 34-39. https://doi.org/10.31590/ejosat.969608
