Research Article

SİBER SAVUNMADA AKILLI YÖNTEMLER: WEB SAYFALARINDA MAKİNE ÖĞRENİMİ TABANLI KİMLİK AVI TESPİTİ

INTELLIGENT METHODS IN CYBER DEFENCE: MACHINE LEARNING BASED PHISHING ATTACK DETECTION ON WEB PAGES

Year 2024, Volume 12, Issue 2, 416 - 429, 30.06.2024
https://doi.org/10.21923/jesd.1458955

Abstract

Phishing on web pages is a malicious attack that aims to steal the personal and sensitive information of internet users. Phishing attacks are usually carried out through communication channels such as email, SMS, social media messages, or websites. Users are directed to fake web pages that imitate trusted organizations, such as government agencies, banks, and online shopping sites, and are asked to enter their personal information. These fake pages can closely resemble the original sites and are designed to mislead users. In this study, we used machine learning methods to detect phishing threats on web pages. A comparative analysis of six machine learning algorithms showed that the Extra Trees algorithm yielded the best results. We then fine-tuned the Extra Trees algorithm and raised its classification accuracy to 97.9%. In future studies, we intend to extend the dataset and include other machine learning methods in order to investigate the use of this approach in areas such as malware detection and the prevention of phishing attacks. This would be an important step towards more comprehensive protection in cybersecurity.
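
As a rough illustration of the Extra Trees idea named in the abstract (Geurts et al., 2006), the sketch below grows a small ensemble of trees whose cut-points are drawn at random and combines them by majority vote. It is not the study's pipeline: the data are synthetic, the six ternary features and the labelling rule are invented for the example, and no hyperparameter tuning is performed. All function names (`gini`, `build_tree`, `fit_forest`, `predict_forest`, `make_data`) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, rng, k_features, min_samples=2):
    """Grow one extremely randomized tree on the full sample (no bootstrap)."""
    if len(set(y)) == 1 or len(y) < min_samples:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    best = None
    for f in rng.sample(range(len(X[0])), k_features):
        lo, hi = min(r[f] for r in X), max(r[f] for r in X)
        if lo == hi:
            continue  # constant feature in this node, cannot split
        t = rng.uniform(lo, hi)  # random cut-point instead of an optimized one
        left = [i for i, r in enumerate(X) if r[f] < t]
        right = [i for i, r in enumerate(X) if r[f] >= t]
        if not left or not right:
            continue
        # Weighted Gini impurity of the two children (lower is better).
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return Counter(y).most_common(1)[0][0]  # no usable split: leaf
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, k_features, min_samples),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, k_features, min_samples),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees randomized trees, each seeing sqrt(n_features) candidates per node."""
    rng = random.Random(seed)
    k = max(1, int(len(X[0]) ** 0.5))
    return [build_tree(X, y, rng, k) for _ in range(n_trees)]


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def make_data(n, seed):
    """Synthetic stand-in data (assumption): six features in {-1, 0, 1}, toy label rule."""
    rng = random.Random(seed)
    X = [[rng.choice([-1, 0, 1]) for _ in range(6)] for _ in range(n)]
    y = [1 if r[0] + r[1] + r[2] > 0 else 0 for r in X]
    return X, y


X_train, y_train = make_data(300, seed=1)
X_test, y_test = make_data(100, seed=2)
forest = fit_forest(X_train, y_train)
acc = sum(predict_forest(forest, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"hold-out accuracy: {acc:.3f}")
```

In practice a library implementation such as scikit-learn's `ExtraTreesClassifier` would normally be used, with hyperparameters tuned by cross-validation; the pure-Python version above only makes the random-split mechanism visible.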

References

  • Abdelhamid, N., Ayesh, A., & Thabtah, F. (2014). Phishing detection based Associative Classification data mining. Expert Systems with Applications, 41(13), 5948–5959. https://doi.org/10.1016/J.ESWA.2014.03.019
  • Adeyemo, V. E., Balogun, A. O., Mojeed, H. A., Akande, N. O., & Adewole, K. S. (2021). Ensemble-Based Logistic Model Trees for Website Phishing Detection. Communications in Computer and Information Science, 1347, 627–641. https://doi.org/10.1007/978-981-33-6835-4_41
  • AlOmar, M. K., Hameed, M. M., & AlSaadi, M. A. (2020). Multi hours ahead prediction of surface ozone gas concentration: Robust artificial intelligence approach. Atmospheric Pollution Research, 11(9), 1572–1587. https://doi.org/10.1016/J.APR.2020.06.024
  • Alshingiti, Z., Alaqel, R., Al-Muhtadi, J., Haq, Q. E. U., Saleem, K., & Faheem, M. H. (2023). A Deep Learning-Based Phishing Detection System Using CNN, LSTM, and LSTM-CNN. Electronics, 12(1), 232. https://doi.org/10.3390/ELECTRONICS12010232
  • Balogun, A. O., Akande, N. O., Usman-Hamza, F. E., Adeyemo, V. E., Mabayoje, M. A., & Ameen, A. O. (2021). Rotation Forest-Based Logistic Model Tree for Website Phishing Detection. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12957 LNCS, 154–169. https://doi.org/10.1007/978-3-030-87013-3_12
  • Balogun, A. O., Mojeed, H. A., Adewole, K. S., Akintola, A. G., Salihu, S. A., Bajeh, A. O., & Jimoh, R. G. (2021). Optimized Decision Forest for Website Phishing Detection. Lecture Notes in Networks and Systems, 231 LNNS, 568–582. https://doi.org/10.1007/978-3-030-90321-3_47
  • Barraclough, P. A., Fehringer, G., & Woodward, J. (2021). Intelligent cyber-phishing detection for online. Computers & Security, 104, 102123. https://doi.org/10.1016/J.COSE.2020.102123
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
  • Dhanavanthini, P., & Chakkravarthy, S. S. (2023). Phish-armour: phishing detection using deep recurrent neural networks. Soft Computing, 1–13. https://doi.org/10.1007/S00500-023-07962-Y
  • Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42. https://doi.org/10.1007/S10994-006-6226-1
  • Hammid, A. T., Sulaiman, M. H. Bin, & Abdalla, A. N. (2018). Prediction of small hydropower plant power production in Himreen Lake dam (HLD) using artificial neural network. Alexandria Engineering Journal, 57(1), 211–221. https://doi.org/10.1016/J.AEJ.2016.12.011
  • Jain, A. K., & Gupta, B. B. (2019). A machine learning based approach for phishing detection using hyperlinks information. Journal of Ambient Intelligence and Humanized Computing, 10(5), 2015–2028. https://doi.org/10.1007/S12652-018-0798-Z
  • Mishra, G., Sehgal, D., & Valadi, J. K. (2017). Quantitative Structure Activity Relationship study of the Anti-Hepatitis Peptides employing Random Forests and Extra-trees regressors. Bioinformation, 13(3), 60. https://doi.org/10.6026/97320630013060
  • Mithra Raj, M., & Arul Jothi, J. A. (2022). Website Phishing Detection Using Machine Learning Classification Algorithms. Communications in Computer and Information Science, 1643 CCIS, 219–233. https://doi.org/10.1007/978-3-031-19647-8_16
  • Moghimi, M., & Varjani, A. Y. (2016). New rule-based phishing detection method. Expert Systems with Applications, 53, 231–242. https://doi.org/10.1016/J.ESWA.2016.01.028
  • Rashid, J., Mahmood, T., Nisar, M. W., & Nazir, T. (2020). Phishing Detection Using Machine Learning Technique. Proceedings - 2020 1st International Conference of Smart Systems and Emerging Technologies, SMART-TECH 2020, 43–46. https://doi.org/10.1109/SMART-TECH49988.2020.00026
  • Sahingoz, O. K., Buber, E., Demir, O., & Diri, B. (2019). Machine learning based phishing detection from URLs. Expert Systems with Applications, 117, 345–357. https://doi.org/10.1016/J.ESWA.2018.09.029
  • Website Phishing Dataset. (n.d.). Retrieved March 19, 2024, from https://www.kaggle.com/datasets/ahmednour/website-phishing-data-set/data
  • Willmott, C. J., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30(1), 79–82. https://doi.org/10.3354/CR030079
  • Wu, C. Y., Kuo, C. C., & Yang, C. S. (2019). A Phishing Detection System based on Machine Learning. Proceedings - 2019 International Conference on Intelligent Computing and Its Emerging Applications, ICEA 2019, 28–32. https://doi.org/10.1109/ICEA.2019.8858325
  • Yerima, S. Y., & Alzaylaee, M. K. (2020). High Accuracy Phishing Detection Based on Convolutional Neural Networks. ICCAIS 2020 - 3rd International Conference on Computer Applications and Information Security. https://doi.org/10.1109/ICCAIS48893.2020.9096869
  • Yi, P., Guan, Y., Zou, F., Yao, Y., Wang, W., & Zhu, T. (2018). Web phishing detection using a deep learning framework. Wireless Communications and Mobile Computing, 2018. https://doi.org/10.1155/2018/4678746
  • Ying, P., & Xuhua, D. (2006). Anomaly based web phishing page detection. Proceedings - Annual Computer Security Applications Conference, ACSAC, 381–390. https://doi.org/10.1109/ACSAC.2006.13
There are 24 citations in total.

Details

Primary Language: English
Subjects: Information Systems Development Methodologies and Practice
Journal Section: Research Articles
Authors: Remzi Gürfidan (ORCID: 0000-0002-4899-2219)
Publication Date: June 30, 2024
Submission Date: March 26, 2024
Acceptance Date: June 11, 2024
Published in Issue: Year 2024

Cite

APA Gürfidan, R. (2024). INTELLIGENT METHODS IN CYBER DEFENCE: MACHINE LEARNING BASED PHISHING ATTACK DETECTION ON WEB PAGES. Mühendislik Bilimleri Ve Tasarım Dergisi, 12(2), 416-429. https://doi.org/10.21923/jesd.1458955