Morphological Disambiguation of Turkish with Free-order Co-occurrence Statistics

Original Turkish title: Serbest Sırada Birliktelik İstatistiklerinin Kullanımıyla Türkçe'nin Biçimbirimsel Belirsizliği'nin Giderilmesi

Enis Arslan; Umut Orhan; B. Tahir Tahiroğlu

doi:10.17714/gumusfenbil.430034

Research Article

Year 2018, CMES 2018 Supplementary Issue, pp. 46-52, 30 November 2018

Abstract

This article proposes a solution to the morphological ambiguity problem, which
occurs frequently in morphologically complex languages such as Turkish. Such
tasks are generally addressed with statistical methods that maximize the
information obtained for a probable word sequence in a sentence. The choice of
the method for calculating the probabilities, and of the method for selecting
the sequence, depends on the nature of the language. Here, the best word
sequence is selected from the alternatives by using co-occurrence statistics
obtained from a semantic graph that represents the lemmas of the sentences. The
unambiguous, free-word-order character of this graph helps in determining the
statistics independently. The probability values are obtained with the Naive
Bayes (NB) method, and each word sequence is selected by maximization, inspired
by the Viterbi algorithm.
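
The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact probability model, so the function names (`build_cooccurrence`, `log_cond`, `disambiguate`), the add-alpha smoothing, the toy corpus, and the simplification of conditioning each candidate only on the previously chosen lemma are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def build_cooccurrence(sentences):
    """Count lemmas and unordered lemma pairs that share a sentence."""
    lemma_count, pair_count = Counter(), Counter()
    for lemmas in sentences:
        unique = sorted(set(lemmas))  # word order inside the sentence is ignored
        lemma_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return lemma_count, pair_count


def log_cond(a, b, lemma_count, pair_count, alpha=1.0):
    """Add-alpha smoothed log P(a | b), estimated from sentence co-occurrence."""
    key = (a, b) if a <= b else (b, a)  # pairs are stored in sorted order
    vocab = max(len(lemma_count), 1)
    return math.log((pair_count[key] + alpha) / (lemma_count[b] + alpha * vocab))


def disambiguate(candidates, lemma_count, pair_count, alpha=1.0):
    """Pick one lemma per word by Viterbi-style maximization.

    `candidates` holds, for each word, a non-empty list of possible lemmas.
    """
    if not candidates:
        return []
    vocab = max(len(lemma_count), 1)
    total = sum(lemma_count.values())
    # best[c] = (log score of the best path ending in candidate c, that path)
    best = {
        c: (math.log((lemma_count[c] + alpha) / (total + alpha * vocab)), [c])
        for c in candidates[0]
    }
    for options in candidates[1:]:
        step = {}
        for c in options:
            # Keep only the best-scoring predecessor for each candidate.
            prev = max(
                best,
                key=lambda p: best[p][0]
                + log_cond(c, p, lemma_count, pair_count, alpha),
            )
            score, path = best[prev]
            step[c] = (
                score + log_cond(c, prev, lemma_count, pair_count, alpha),
                path + [c],
            )
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy corpus: each sentence is already reduced to its lemmas.
corpus = [
    ["kalem", "kağıt", "yaz"],
    ["kalem", "defter", "yaz"],
    ["kale", "düşman", "kuşat"],
    ["kale", "asker", "koru"],
]
lemma_count, pair_count = build_cooccurrence(corpus)

# The first word is ambiguous between the lemmas "kalem" and "kale".
print(disambiguate([["kalem", "kale"], ["kağıt"]], lemma_count, pair_count))
```

Because the pair counts are unordered, the statistics do not depend on where the lemmas appear in a sentence, which mirrors the free-order property mentioned in the abstract. The dynamic-programming step keeps one best path per candidate, so the cost grows linearly with sentence length rather than with the number of possible sequences.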

Keywords

Co-occurrence, Morphological ambiguity, Naive Bayes, Viterbi algorithm



Details

Primary Language	English
Subjects	Engineering
Section	Articles
Authors	Enis Arslan 0000-0002-2609-3925 Umut Orhan 0000-0003-1882-6567 B. Tahir Tahiroğlu 0000-0002-7956-3257
Publication Date	30 November 2018
Submission Date	2 June 2018
Acceptance Date	30 November 2018
Published in Issue	Year 2018, CMES 2018 Supplementary Issue

Cite

APA	Arslan, E., Orhan, U., & Tahiroğlu, B. T. (2018). Serbest Sırada Birliktelik İstatistiklerinin Kullanımıyla Türkçe’nin Biçimbirimsel Belirsizliği’nin Giderilmesi. Gümüşhane Üniversitesi Fen Bilimleri Dergisi, 46-52. https://doi.org/10.17714/gumusfenbil.430034
