Continuously increasing data bring new problems and problems usually reveal new research areas. One of the new areas is Sentiment Analysis. This field has some difficulties. The fact that people have complex sentiments is the main cause of the difficulty, but this has not prevented the progress of the studies in this field. Sentiment analysis is generally used to obtain information about persons by collecting their texts or expressions. Sentiment analysis can sometimes bring serious benefits. In this study, with singular tag-plural class approach, a binary classification was performed. An LSTM network and several machine learning models were tested. The dataset collected in Turkish, and Stanford Large Movie Reviews datasets were used in this study. Due to the noise in the dataset, the Zemberek NLP Library for Turkic Languages and Regular Expression techniques were used to normalize and clean texts, later, the data were transformed into vector sequences. The preprocessing process made 2% increase to the model performance on the Turkish Customer Reviews dataset. The model was established using an LSTM network. Our model showed better performance than Machine Learning techniques and achieved an accuracy of 90.59% on the Turkish dataset and an accuracy of 89.02% on the IMDB dataset.
Continuously increasing data bring new problems and problems usually reveal new research areas. One of the new areas is Sentiment Analysis. This field has some difficulties. The fact that people have complex sentiments is the main cause of the difficulty, but this has not prevented the progress of the studies in this field. Sentiment analysis is generally used to obtain information about persons by collecting their texts or expressions. Sentiment analysis can sometimes bring serious benefits. In this study, with singular tag-plural class approach, a binary classification was performed. An LSTM network and several machine learning models were tested. The dataset collected in Turkish, and Stanford Large Movie Reviews datasets were used in this study. Due to the noise in the dataset, the Zemberek NLP Library for Turkic Languages and Regular Expression techniques were used to normalize and clean texts, later, the data were transformed into vector sequences. The preprocessing process made 2% increase to the model performance on the Turkish Customer Reviews dataset. The model was established using an LSTM network. Our model showed better performance than Machine Learning techniques and achieved an accuracy of 90.59% on the Turkish dataset and an accuracy of 89.02% on the IMDB dataset.
Primary Language | Turkish |
---|---|
Subjects | Engineering |
Journal Section | Research Article |
Authors | |
Publication Date | October 1, 2022 |
Submission Date | December 21, 2020 |
Published in Issue | Year 2022 Volume: 25 Issue: 3 |
This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International.