Speech is the easiest and most natural form of communication between people, and intensive research aims to carry this form of communication over to interaction with computers. Systems based on voice biometrics attract attention particularly for their low cost and ease of use. Compared with other biometric systems, they are far more practical to deploy: for example, a voice recording can be captured by a microphone placed in the environment, even without notifying the user, and the system can be applied to it. Remote access is a further advantage of voice biometrics. This study aims to determine the speaker's gender automatically from the speech signal, which carries speaker-specific information. If the speaker's gender is known, gender-specific models can be built, which can significantly increase the success of voice recognition systems. In general, speaker recognition systems consist of two parts: feature extraction and feature matching. Feature extraction is the procedure that derives from the voice signal a compact set of parameters representing the speech and the speaker. Several features are used in voice applications, such as LPC, MFCC and PLP; this study uses MFCC as the feature vector. Feature matching is the procedure in which features extracted from an unknown speaker are compared with those of a group of known speakers. Depending on the text used in the comparison, systems are divided into text-dependent and text-independent ones: text-dependent systems use the same text throughout, whereas text-independent systems use different texts. DTW and HMM are text-dependent matching methods, while VQ and GMM are text-independent. This study uses the VQ approach because of its high success rate and simple implementation.
This study proposes a system that determines the speaker's gender automatically and independently of the text. The proposed system consists of two stages: training and testing. In the training stage, the MFCC feature vector is computed from voice recordings whose speaker gender is known. MFCC models the frequency perception of the human ear and is one of the most widely used feature sets. Like all voice analysis methods, MFCC is applied to short segments over which the properties of the voice are assumed to be stationary. These segments are generally 20-30 ms long and are shifted by 10-15 ms at a time so that they cover the whole signal. A window function is applied to each analysis frame to reduce the discontinuities at its edges; in voice applications the Hamming window is generally preferred. After windowing, the frame is transformed to the frequency domain with the FFT. The resulting FFT spectrum is converted to a mel spectrum using the mel scale, which models human frequency perception. The mel scale is approximately linear up to 1 kHz and logarithmic above 1 kHz. The conversion uses triangular filters whose bandwidths are spaced linearly on the mel scale. The number of filters is generally chosen between 20 and 30. In the last stage, the logarithm of the mel spectrum is taken and the result is transformed back to the time domain. The resulting coefficients are called MFCCs. The MFCC features derived for each speaker are then mapped to a smaller vector space using the VQ method. VQ maps a large vector space onto a limited number of regions. Each region is represented by a centre point called a code word, and the set of code words constitutes the codebook. One method for compressing a set of N training vectors into a codebook of M vectors (M < N) is the LBG algorithm.
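The MFCC steps described above (framing, Hamming windowing, spectrum, mel filter bank, logarithm, inverse transform) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the 25 ms frame and 10 ms shift, the 24 filters, the 12 retained coefficients, the mel formula 2595·log10(1 + f/700) and the use of a DCT for the final transform are all choices made here within the ranges given in the text. A naive DFT stands in for the FFT so that the sketch needs only the standard library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # Widely used mel formula (assumed; the text only describes the scale's shape).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hamming(n):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    # Naive DFT for clarity; a real system would use an FFT routine.
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]  # edge positions in (fractional) FFT bins
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    # DCT-II, a common choice for the final transform of the log mel spectrum.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(signal, sample_rate, frame_ms=25, hop_ms=10, n_filters=24, n_coeffs=12):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(signal[start:start + frame_len], window)]
        spectrum = power_spectrum(frame)
        # Small floor avoids log(0) for silent frames.
        log_mel = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                   for row in bank]
        features.append(dct2(log_mel, n_coeffs))
    return features
```

Calling `mfcc(samples, 8000)` on a list of samples returns one list of 12 coefficients per analysis frame; these per-frame vectors are what the VQ stage would then quantize.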
The system was trained with 16 recordings in which 8 male and 8 female speakers utter the same sentence. In the testing stage, 10 different sentences spoken by each of 56 female and 112 male speakers were used. Out of the 1680 test utterances in total, only 34 were classified incorrectly, a success rate of about 98%.
This study aims to determine the speaker's gender independently of the text. The proposed system consists of two parts. In the first part, the training stage, a feature vector is computed from voice recordings collected from the subjects. MFCC (Mel Frequency Cepstral Coefficients) is used as the feature vector. The resulting MFCC feature vectors are classified with the VQ (Vector Quantization) method and stored in a database, which completes the training stage. In the second part, the testing stage, voice recordings of unknown speaker gender are taken as input and the feature vector is computed as in the training stage. The resulting feature vector is compared with the data in the training database, and an average distance value is computed for each of the male and female classes. The smaller of these distances indicates the class to which the test sample belongs. Various tests were carried out on the TIMIT database. On a test set of 1680 utterances, in which 168 speakers each spoke 10 sentences, only 34 incorrect decisions were made, giving a success rate of 98.80%.
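The codebook training and the minimum-average-distance decision described above can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes one codebook per gender, squared Euclidean distance, a codebook size that is a power of two, and hypothetical function names (`lbg`, `avg_distortion`, `classify`). The splitting variant of LBG shown here is one common formulation of that algorithm.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(vec, codebook):
    # Index of the code word closest to vec.
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def lbg(vectors, size, eps=0.01, n_iter=20):
    # Start from a single code word (the global centroid) and double the
    # codebook by splitting until it reaches the requested size.
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(n_iter):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # An empty cell keeps its previous code word.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def avg_distortion(vectors, codebook):
    # Mean distance from each test vector to its nearest code word.
    return sum(min(sq_dist(v, cw) for cw in codebook) for v in vectors) / len(vectors)


def classify(vectors, codebooks):
    # codebooks maps a class label (e.g. "male", "female") to its codebook;
    # the class with the smaller average distance wins.
    return min(codebooks, key=lambda label: avg_distortion(vectors, codebooks[label]))
```

In use, `lbg` would be run once per class on the training MFCC vectors, and `classify` would be called with the MFCC vectors of a test utterance and a dictionary such as `{"male": male_codebook, "female": female_codebook}`.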
Keywords: Speaker recognition, gender recognition, vector quantization (VQ)
Primary Language | Turkish |
---|---|
Section | Research Articles |
Authors | |
Publication Date | November 6, 2013 |
Published in Issue | Year 2009 Volume: 1 Issue: 1 |