Open Source Natural Language Processing Libraries
Year 2021, 81 - 85, 30.04.2021
Havva Yılmaz, Semih Yumuşak
Abstract
Natural language processing is the name given to methods in which language components are analyzed both structurally and semantically. Natural language processing methods are continually updated, and new methods are being developed. This study examines current and popular libraries used in natural language processing and the methods those libraries employ. Different methods and libraries are explained comparatively.
References
- Aggarwal, C. C., & Zhai, C. (2012). A survey of text classification algorithms. In Mining text data (pp. 163-222). Springer, Boston, MA.
- Akbik, Roland, (2018). Flair A very simple framework for state-of-the-art NLP, 2020. [Online]: https://github.com/flairNLP/flair
- Barrus, (2018). Pyspellchecker Pure python spell checker based on work by Peter Norvig, 2020. [Online]: https://pypi.org/project/pyspellchecker/
- Bora, (2020). Zemberek Parser, 2019. [Online]: https://github.com/kemalcanbora/zemberek_parser
- Buitinck, Louppe, (2013). Scikit-learn 0.23.2 Machine Learning in Python, 2020. [Online]: https://scikit-learn.org/stable/
- Bird, (2019). Natural Language Toolkit, 2020. [Online]: http://www.nltk.org
- Cambria, E., Poria, S., Gelbukh, A., & Thelwall, M. (2017). Sentiment analysis is a big suitcase. IEEE Intelligent Systems, 32(6), 74-80.
- Chowdhary, (2020). Natural language processing. In Fundamentals of Artificial Intelligence (pp. 603-649). Springer, New Delhi.
- Çetinkaya, (2018). Turkish NLP, 2020. [Online]: https://pypi.org/project/turkishnlp/
- David, (2020). How many languages in the world. [Online]: https://www.ethnologue.com/guides/how-many-languages
- Dehkharghani, Rahim & Saygin, Yucel & Yanikoglu, Berrin & Oflazer, Kemal. (2015). SentiTurkNet: a Turkish polarity lexicon for sentiment analysis. Language Resources and Evaluation. 50. 10.1007/s10579-015-9307-6.
- Eryiğit, G. (2014, April). ITU Turkish NLP web service. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics (pp. 1-4).
- Gardner, (2017). A natural language processing platform for building state-of-the-art models. 2020. [Online]: https://allennlp.org/
- Honnibal, Matthew, (2017). Spacy Industrial-Strength Natural Language Processing in Python, 2020. [Online]: https://spacy.io/
- Howard, (2017). Fast.ai, 2020. [Online]: https://www.fast.ai/
- Jivani, A. G. (2011). A comparative study of stemming algorithms. Int. J. Comp. Tech. Appl, 2(6), 1930-1938.
- Koksal, (2018). GitHub - akoksal/Turkish-Lemmatizer: Lemmatization for Turkish Language, 2018. [Online]: https://github.com/akoksal/Turkish-Lemmatizer
- Liang, J., Koperski, K., Dhillon, N. S., Tusk, C., & Bhatti, S. (2013). U.S. Patent No. 8,594,996. Washington, DC: U.S. Patent and Trademark Office.
- Lorai, (2013). TextBlob: Simplified Text Processing, 2020. [Online]: https://textblob.readthedocs.io/en/dev/
- Majumder, P., Mitra, M., & Chaudhuri, B. B. (2002). N-gram: a language independent approach to IR and NLP. In International conference on universal knowledge and language.
- McClosky, Bauer, (2014). Stanford CoreNLP: A Java suite of core NLP tools. 2020. [Online]: https://github.com/stanfordnlp/CoreNLP
- Onaldi, (2018). Turkish Stemmer, 2019. [Online]: https://github.com/otuncelli/turkish-stemmer-python/
- Paszke, (2017). Pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration, 2020. [Online]: https://github.com/pytorch/pytorch
- Radim, Sojka, (2008). Gensim 3.8.3 Python framework for fast Vector Space Modelling, 2020. [Online]: https://pypi.org/project/gensim/
- Sciforce, (2020). Text Preprocessing for NLP and Machine Learning Tasks. [Online]: https://medium.com/sciforce/text-preprocessing-for-nlp-and-machine-learning-tasks-3e077aa4946e
- Sun, Q., Wang, B., Gu, Z., & Fu, Y. (2018). Vectorization methods in recommender system.
- Webster, (1992). Tokenization as the initial phase in NLP. In COLING 1992 Volume 4: The 15th International Conference on Computational Linguistics.
- Zemberek, (2007). NLP tools for Turkish. [Online]: https://github.com/ahmetaa/zemberek-nlp
Open Source Natural Language Processing Libraries
Abstract
Natural language processing is a collection of methods by which language components are analyzed both syntactically and semantically. This study surveys the tools and libraries used for natural language processing. Current and popular libraries, and the methods used in them, are explained comparatively.