A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL

Ozgur Yilmazel

Research Article

Year 2010, Volume: 11 Issue: 2, 163 - 172, 29.11.2010

Abstract

We used the Lemur Toolkit, an open-source toolkit designed for information retrieval research, to run automated indexing and retrieval experiments on a TREC-like test collection for the Turkish language. We investigated the effectiveness of three retrieval models that Lemur supports, with particular attention to the language modeling approach to information retrieval, combined with language-specific preprocessing techniques. Our experiments show that language-specific preprocessing significantly improves retrieval performance for all retrieval models. The language modeling approach is also the best-performing retrieval model when language-specific preprocessing is applied.
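To make the language modeling approach concrete, the following is a minimal sketch of query-likelihood scoring, in which each document is ranked by the log probability that its smoothed unigram model generates the query. It is an illustration only and not the Lemur implementation or the configuration used in the experiments: the abstract does not say which smoothing method was used, so Dirichlet-prior smoothing, the function name score_query_likelihood, the parameter mu, and the toy two-document collection are all assumptions made for this example.

```python
import math
from collections import Counter


def score_query_likelihood(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood of one document under a Dirichlet-smoothed unigram model.

    Hypothetical helper for illustration; not part of the Lemur Toolkit API.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability of the term in the whole collection.
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection, so it cannot discriminate
        # Dirichlet-prior smoothing: mix document counts with the collection model.
        p_doc = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score


# Toy collection (assumed data): already tokenized, no stemming applied.
docs = {
    "d1": "kitap okumak güzel".split(),
    "d2": "kitap kitap kalem".split(),
}
collection_tf = Counter(term for terms in docs.values() for term in terms)
collection_len = sum(collection_tf.values())

query = ["kitap"]
ranking = sorted(
    docs,
    key=lambda d: score_query_likelihood(query, docs[d], collection_tf, collection_len, mu=10.0),
    reverse=True,
)
print(ranking)
```

In this toy run the document that mentions the query term more often ranks first. Language-specific preprocessing such as stemming would be applied to both the documents and the query before the term counts are collected, so that inflected forms of the same Turkish word contribute to the same count.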

Keywords

Turkish information retrieval, Lemur toolkit, Language modeling

TÜRKÇE METİN GERİ GETİRİMİNDE DİL MODELLEME YAKLAŞIMI

Abstract

In this study, automated indexing and retrieval experiments were carried out on a TREC-like collection prepared for the Turkish language, using Lemur, an open-source toolkit designed for information retrieval research. Three retrieval models supported by Lemur, chief among them the language modeling approach to information retrieval, were investigated together with language-specific preprocessing techniques. Our experiments showed that language-specific preprocessing techniques improve retrieval performance for all retrieval models. In addition, the best performance for Turkish was obtained with the language modeling approach.

Keywords

Turkish information retrieval, Lemur toolkit, Language modeling

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Ozgur Yilmazel
Publication Date	November 29, 2010
Published in Issue	Year 2010 Volume: 11 Issue: 2

Cite

APA	Yilmazel, O. (2010). A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, 11(2), 163-172.
AMA	Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUJST-A. December 2010;11(2):163-172.
Chicago	Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11, no. 2 (December 2010): 163-72.
EndNote	Yilmazel O (December 1, 2010) A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11 2 163–172.
IEEE	O. Yilmazel, “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”, AUJST-A, vol. 11, no. 2, pp. 163–172, 2010.
ISNAD	Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11/2 (December 2010), 163-172.
JAMA	Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUJST-A. 2010;11:163–172.
MLA	Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, vol. 11, no. 2, 2010, pp. 163-72.
Vancouver	Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUJST-A. 2010;11(2):163-72.
