NATURAL LANGUAGE PROCESSING ALGORITHMS AND PERFORMANCE COMPARISON

Ayhan Arısoy

doi:10.57120/yalvac.1536202

Research Article

Year 2024, Volume: 9, Issue: 2, pp. 106-121, 30.10.2024

Abstract

Natural language processing (NLP) is the general name for the methods and algorithms that enable computers to understand, interpret and generate human language. NLP plays a critical role in many fields, from social media analysis and customer service to machine translation and healthcare. This paper provides a comprehensive overview of the basic concepts of NLP, popular algorithms and models, performance comparisons, and various application areas. Key concepts of NLP include language models, tokenisation, lemmatisation, stemming, part-of-speech (POS) tagging, named entity recognition (NER) and syntactic parsing; these concepts are essential for processing, analysing and making sense of text. Popular language modelling and text representation methods include N-gram models, Word2Vec, GloVe and BERT. NLP algorithms are classified as rule-based methods, machine learning methods and deep learning methods. Rule-based methods rely on predefined grammatical rules, whereas machine learning methods learn patterns from data. Deep learning methods achieve high accuracy by leveraging large datasets and powerful computational resources. In the performance comparison section, the algorithms are evaluated with metrics such as accuracy, precision, recall and F1 score. Advanced models such as BERT and GPT-3 show superior performance in many NLP tasks. In conclusion, the field of NLP is evolving rapidly, and significant advances are anticipated in several key areas: more effective and efficient models, reduction of biases, stronger privacy protection, the growth of multilingual and cross-cultural models, and explainable artificial intelligence techniques. Overall, the paper aims to help readers understand the current status and future directions of NLP technologies.
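The abstract lists tokenisation and stemming among the key NLP concepts. The minimal Python sketch below, which is not taken from the paper, illustrates both steps: a regular-expression tokenizer and a deliberately naive suffix-stripping stemmer. The suffix list, the `tokenize` and `naive_stem` names and the three-character minimum stem length are illustrative assumptions; production stemmers such as the Porter stemmer apply ordered, condition-based rewrite rules instead.

```python
import re

# Illustrative suffix list (an assumption, not a standard rule set), checked in order.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM = 3  # assumed minimum stem length, e.g. keeps "is" from becoming "i"


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return runs of letters, digits and apostrophes as tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token: str) -> str:
    """Strip the first suffix (in list order) that leaves a stem of at least MIN_STEM characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            return token[: -len(suffix)]
    return token


tokens = tokenize("Models are processing texts and parsed sentences.")
print(tokens)                           # ['models', 'are', 'processing', 'texts', 'and', 'parsed', 'sentences']
print([naive_stem(t) for t in tokens])  # ['model', 'are', 'process', 'text', 'and', 'pars', 'sentenc']
```

The stems `pars` and `sentenc` show why stemming can produce non-words, whereas lemmatisation maps each token to its dictionary form (for example `parse` and `sentence`), typically with the help of a lexicon and POS information.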
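N-gram models are the first language modelling method named in the abstract. As a rough illustration (not the paper's code), the sketch below estimates a bigram model by maximum likelihood, P(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1}), from a hypothetical three-sentence toy corpus. The `<s>` and `</s>` sentence-boundary markers and all function names are assumptions made for this example, and no smoothing is applied, so any unseen bigram receives probability zero.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary markers


def train_bigram(sentences: list[list[str]]) -> tuple[Counter, Counter]:
    """Count how often each token occurs as a history and how often each bigram occurs."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        for prev, word in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts


def bigram_prob(prev: str, word: str, history_counts: Counter, bigram_counts: Counter) -> float:
    """Unsmoothed MLE: P(word | prev) = C(prev, word) / C(prev); 0.0 if prev was never seen."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


def sentence_prob(tokens: list[str], history_counts: Counter, bigram_counts: Counter) -> float:
    """Probability of a whole sentence as the product of its bigram probabilities."""
    padded = [BOS] + tokens + [EOS]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, word, history_counts, bigram_counts)
    return prob


# Hypothetical toy corpus, already tokenised.
corpus = [["i", "like", "nlp"], ["i", "like", "models"], ["models", "like", "data"]]
hist, bigrams = train_bigram(corpus)
print(bigram_prob("like", "nlp", hist, bigrams))           # C(like, nlp) / C(like) = 1/3
print(sentence_prob(["i", "like", "nlp"], hist, bigrams))  # 2/3 * 1 * 1/3 * 1 = 2/9
```

In practice, a smoothing method such as add-one (Laplace) or Kneser-Ney smoothing is used so that plausible but unseen word sequences do not receive zero probability.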
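The abstract states that the algorithms are compared using accuracy, precision, recall and F1 score. For a binary task these follow from the confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2PR / (P + R). The Python sketch below computes them from scratch; the label vectors are made-up toy data, not results from the paper, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class (0.0 when undefined)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
# Prints accuracy 0.7000, precision 0.6667, recall 0.5000, f1 0.5714 (one per line).
for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```

Because accuracy also counts true negatives, it can look favourable on imbalanced data even when recall on the positive class is low, which is one reason F1 is usually reported alongside it.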

Keywords

NLP, Language Models, Deep Learning, Text Analysis

References

[1] Egger, R., & Gokce, E. (2022). Natural language processing (NLP): An introduction. In R. Egger (Ed.), Applied data science in tourism (Tourism on the Verge). Springer, Cham. https://doi.org/10.1007/978-3-030-88389-8_15
[2] Shankar, V., & Parsana, S. (2022). An overview and empirical comparison of natural language processing (NLP) models and an introduction to and empirical application of autoencoder models in marketing. Journal of the Academy of Marketing Science, 50, 1324-1350. https://doi.org/10.1007/s11747-022-00840-3
[3] Greco, C. M., Tagarelli, A., & Zumpano, E. (2022). A comparison of transformer-based language models on NLP benchmarks. In P. Rosso, V. Basile, R. Martínez, E. Métais, & F. Meziane (Eds.), Natural language processing and information systems (NLDB 2022), Lecture Notes in Computer Science (Vol. 13286). Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_45
[4] Rahman, M., Nowakowski, S., Agrawal, R., Naik, A. D., Sharafkhaneh, A., & Razjouyan, J. (2022). Validation of a natural language processing algorithm for the extraction of the sleep parameters from the polysomnography reports. Healthcare, 10(10), 1837. https://doi.org/10.3390/healthcare10101837
[5] Nowakowski, S., Razjouyan, J., Naik, A. D., Agrawal, R., Velamuri, K., Singh, S., … & Sharafkhaneh, A. (2020). 1180 The use of natural language processing to extract data from PSG sleep study reports using national VHA electronic medical record data. Sleep, 43(Supplement_1), A450-A451. https://doi.org/10.1093/sleep/zsaa056.1174
[6] Lo, Y., Varghese, S., Blackley, S. V., Seger, D. L., Blumenthal, K. G., Goss, F. R., … & Zhou, L. (2022). Reconciling allergy information in the electronic health record after a drug challenge using natural language processing. Frontiers in Allergy, 3. https://doi.org/10.3389/falgy.2022.904923
[7] Zheng, Y., Dickson, V. V., Blecker, S., Ng, J., Rice, B. C., Melkus, G. D., … & Johnson, S. B. (2022). Identifying patients with hypoglycemia using natural language processing: Systematic literature review. JMIR Diabetes, 7(2), e34681. https://doi.org/10.2196/34681
[8] Afzal, N., Sohn, S., Abram, S., Scott, C. G., Chaudhry, R., Liu, H., … & Arruda‐Olson, A. M. (2017). Mining peripheral arterial disease cases from narrative clinical notes using natural language processing. Journal of Vascular Surgery, 65(6), 1753-1761. https://doi.org/10.1016/j.jvs.2016.11.031
[9] Fu, S., Lopes, G. S., Pagali, S. R., Thorsteinsdottir, B., LeBrasseur, N. K., Wen, A., … & Sohn, S. (2020). Ascertainment of delirium status using natural language processing from electronic health records. The Journals of Gerontology: Series A, 77(3), 524-530. https://doi.org/10.1093/gerona/glaa275
[10] Wi, C. I., Sohn, S., Rolfes, M., Seabright, A., Ryu, E., Voge, G. A., … & Juhn, Y. J. (2017). Application of a natural language processing algorithm to asthma ascertainment. An automated chart review. American Journal of Respiratory and Critical Care Medicine, 196(4), 430-437. https://doi.org/10.1164/rccm.201610-2006oc
[11] Wang, Y., Mehrabi, S., Sohn, S., Atkinson, E., Amin, S., & Liu, H. (2019). Natural language processing of radiology reports for identification of skeletal site-specific fractures. BMC Medical Informatics and Decision Making, 19(S3). https://doi.org/10.1186/s12911-019-0780-5
[12] Ridgway, J. P., Uvin, A. Z., Schmitt, J., Oliwa, T., Almirol, E., Devlin, S., … & Schneider, J. A. (2021). Natural language processing of clinical notes to identify mental illness and substance use among people living with HIV: Retrospective cohort study. JMIR Medical Informatics, 9(3), e23456. https://doi.org/10.2196/23456
[13] Mishra, A. (2021). Conversational artificial intelligence/natural language processing algorithms for modeling and research summarization of friction stir welded aluminum joints. ChemRxiv. https://doi.org/10.26434/chemrxiv-2021-hbxdx
[14] Al-Furaiji, R. H., & Abdulkader, H. (2024). Comparison of the performance of six machine learning algorithms for fake news. EAI Endorsed Transactions on AI and Robotics, 3. https://doi.org/10.4108/airo.4153
[15] Khurana, D., Koli, A., Khatter, K., et al. (2023). Natural language processing: State of the art, current trends and challenges. Multimedia Tools and Applications, 82, 3713-3744. https://doi.org/10.1007/s11042-022-13428-4
[16] Mohammad, S. (2020, May). NLP Scholar: A dataset for examining the state of NLP research. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 868-877).
[17] Treviso, M., Lee, J. U., Ji, T., Aken, B. V., Cao, Q., Ciosici, M. R., … & Schwartz, R. (2023). Efficient methods for natural language processing: A survey. Transactions of the Association for Computational Linguistics, 11, 826-860.
[18] Sarkar, D. (2019). Natural language processing basics. In Text analytics with Python. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4354-1_1
[19] Lee, R. S. (2023). N-gram language model. In Natural language processing: A textbook with Python implementation (pp. 19-42). Singapore: Springer Nature Singapore.
[20] Oralbekova, D., Mamyrbayev, O., Othman, M., Kassymova, D., & Mukhsina, K. (2023). Contemporary approaches in evolving language models. Applied Sciences, 13(23), 12901.
[21] Khurana, D., Koli, A., Khatter, K., & Singh, S. (2023). Natural language processing: State of the art, current trends and challenges. Multimedia Tools and Applications, 82(3), 3713-3744.
[22] Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing: An introduction. Journal of the American Medical Informatics Association, 18(5), 544-551.
[23] Yogish, D., Manjunath, T. N., & Hegadi, R. S. (2019). Review on natural language processing trends and techniques using NLTK. In Recent Trends in Image Processing and Pattern Recognition: Second International Conference, RTIP2R 2018, Solapur, India, December 21–22, 2018, Revised Selected Papers, Part III 2 (pp. 589-606). Springer Singapore.
[24] Manjunath, T. N., & Hegadi, R. S. (2019). Review on natural language processing trends and techniques using NLTK. In Recent Trends in Image Processing and Pattern Recognition: Second International Conference, RTIP2R 2018, Solapur, India, December 21–22, 2018, Revised Selected Papers, Part III 2 (pp. 589-606). Springer Singapore.
[25] Kusal, S., Patil, S., Choudrie, J., et al. (2023). A systematic review of applications of natural language processing and future challenges with special emphasis in text-based emotion detection. Artificial Intelligence Review, 56, 15129-15215. https://doi.org/10.1007/s10462-023-10509-0
[26] Johnson, S. J., Murty, M. R., & Navakanth, I. (2024). A detailed review on word embedding techniques with emphasis on word2vec. Multimedia Tools and Applications, 83, 37979-38007. https://doi.org/10.1007/s11042-023-17007-z
[27] Kang, N., Singh, B., Afzal, Z., van Mulligen, E. M., & Kors, J. A. (2013). Using rule-based natural language processing to improve disease normalization in biomedical text. Journal of the American Medical Informatics Association, 20(5), 876-881.
[28] Ghazizadeh, E., & Zhu, P. (2020, October). A systematic literature review of natural language processing: Current state, challenges and risks. In Proceedings of the Future Technologies Conference (pp. 634-647). Cham: Springer International Publishing.
[29] Rezaeian, N., & Novikova, G. (2020). Persian text classification using naive Bayes algorithms and support vector machine algorithm. Indonesian Journal of Electrical Engineering and Informatics (IJEEI), 8(1), 178-188.
[30] Alloghani, M., Al-Jumeily, D., Mustafina, J., Hussain, A., & Aljaaf, A. J. (2020). A systematic review on supervised and unsupervised machine learning algorithms for data science. In Supervised and unsupervised learning for data science (pp. 3-21).
[31] Zhang, C. (2021). Soft sensing transformer: Hundreds of sensors are worth a single word. arXiv. https://doi.org/10.48550/arxiv.2111.05973
[32] Liu, S., Ni'mah, I., Menkovski, V., Mocanu, D., & Pechenizkiy, M. (2021). Efficient and effective training of sparse recurrent neural networks. Neural Computing and Applications, 33(15), 9625-9636. https://doi.org/10.1007/s00521-021-05727-y
[33] Gupta, P. (2023). Stock market analysis using long short-term model. ICST Transactions on Scalable Information Systems. https://doi.org/10.4108/eetsis.4446
[34] Agarap, A. (2018). A neural network architecture combining gated recurrent unit (GRU) and support vector machine (SVM) for intrusion detection in network traffic data. https://doi.org/10.1145/3195106.3195117
[35] Yuan, F., Zhang, Z., & Fang, Z. (2023). An effective CNN and Transformer complementary network for medical image segmentation. Pattern Recognition, 136, 109228.
[36] Dodiya, T. (2021). Using term frequency-inverse document frequency to find the relevance of words in Gujarati language. International Journal for Research in Applied Science and Engineering Technology, 9(4), 378-381. https://doi.org/10.22214/ijraset.2021.33625
[37] Christian, H., Agus, M., & Suhartono, D. (2016). Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). Comtech Computer Mathematics and Engineering Applications, 7(4), 285. https://doi.org/10.21512/comtech.v7i4.3746
[38] Qaiser, S., & Ali, R. (2018). Text mining: Use of TF-IDF to examine the relevance of words to documents. International Journal of Computer Applications, 181(1), 25-29. https://doi.org/10.5120/ijca2018917395
[39] Faouzi, H., Elbadaoui, M., Boutalline, M., Tannouche, A., & Ouanan, H. (2023). Towards Amazigh word embedding: Corpus creation and word2vec models evaluations. Revue d'Intelligence Artificielle, 37(3), 753-759. https://doi.org/10.18280/ria.370324
[40] Mohadikar, E. (2023). Sentence semantic similarity based complex network approach for word sense disambiguation. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10), 286-293. https://doi.org/10.17762/ijritcc.v11i10.8491
[41] Shen, Y., Zhang, Q., Zhang, J., Huang, J., Lu, Y., & Lei, K. (2018). Improving medical short text classification with semantic expansion using word-cluster embedding (pp. 401-411). https://doi.org/10.1007/978-981-13-1056-0_41
[42] Kasri, M., Birjali, M., Mohamed, N., Beni-Hssane, A., El-Ansari, A., & Fissaoui, M. (2022). Refining word embeddings with sentiment information for sentiment analysis. Journal of ICT Standardization. https://doi.org/10.13052/jicts2245-800x.1031
[43] Santos, F., Bispo, T., Macedo, H., & Zanchettin, C. (2021). Morphological skip-gram: Replacing FastText characters n-gram with morphological knowledge. Inteligencia Artificial, 24(67), 1-17. https://doi.org/10.4114/intartif.vol24iss67pp1-17
[44] Fivez, P., Suster, S., & Daelemans, W. (2017). Unsupervised context-sensitive spelling correction of clinical free-text with word and character n-gram embeddings. https://doi.org/10.18653/v1/w17-2317
[45] Athiwaratkun, B., Wilson, A., & Anandkumar, A. (2018). Probabilistic FastText for multi-sense word embeddings. https://doi.org/10.18653/v1/p18-1001
[46] Sung, C., Dhamecha, T., Saha, S., Ma, T., Reddy, V., & Arora, R. (2019). Pre-training BERT on domain resources for short answer grading. https://doi.org/10.18653/v1/d19-1628
[47] Shaghaghian, S., Luna, F., Jafarpour, B., & Pogrebnyakov, N. (2021). Customizing contextualized language models for legal document reviews. arXiv. https://doi.org/10.48550/arxiv.2102.05757
[48] Imamguluyev, R. (2023). The rise of GPT-3: Implications for natural language processing and beyond. International Journal of Research Publication and Reviews, 4(3), 4893-4903. https://doi.org/10.55248/gengpi.2023.4.33987
[49] Gaikwad, A., Rambhia, P., & Pawar, S. (2022). An extensive analysis between different language models: GPT-3, BERT and Macaw. https://doi.org/10.21203/rs.3.rs-2155616/v1
[50] Dharrao, D. (2024). Summarizing business news: Evaluating BART, T5, and PEGASUS for effective information extraction. Revue d'Intelligence Artificielle, 38(3), 847-855. https://doi.org/10.18280/ria.380311
[51] Liu, F., Huang, T., Lyu, S., Shakeri, S., Yu, H., & Li, J. (2021). EncT5: Fine-tuning T5 encoder for non-autoregressive tasks. arXiv. https://doi.org/10.48550/arxiv.2110.08426
[52] Mallinson, J., Adámek, J., Malmi, E., & Severyn, A. (2022). EdiT5: Semi-autoregressive text-editing with T5 warm-start. arXiv. https://doi.org/10.48550/arxiv.2205.12209

There are 51 references in total.

Details

Primary Language	English
Subjects	Deep Learning, Reinforcement Learning, Data Management and Data Science (Other)
Section	Articles
Authors	Ayhan Arısoy 0000-0001-6754-932X
Early View Date	24 October 2024
Publication Date	30 October 2024
Submission Date	20 August 2024
Acceptance Date	7 September 2024
Published in Issue	Year 2024, Volume: 9, Issue: 2

Cite

APA	Arısoy, A. (2024). Natural language processing algorithms and performance comparison. Yalvaç Akademi Dergisi, 9(2), 106-121. https://doi.org/10.57120/yalvac.1536202

http://www.yalvacakademi.org/