Research Article

DarkWEB Traffic Detection and Classification Using Machine Learning Method

Year 2023, Volume: 9 Issue: 4, 126 - 140, 31.12.2023

Abstract

DarkWEB makes up 6% of DeepWEB, which holds data that search engines cannot index and accounts for approximately 96% of all websites. DarkWEB traffic is encrypted network traffic tunneled through special software such as TOR (The Onion Router); it provides a high level of anonymity through a series of anonymized connections that make the IP address untraceable. This makes it easier to carry out criminal activities such as media piracy, drug dealing, terrorism and child pornography. In this study, the statistical information of the packets was analyzed without decrypting this encrypted network traffic. Within the proposed methodology for high-accuracy detection and classification of DarkWEB traffic, different data sets were derived from the CIC-Darknet2020 data set by applying categorical data coding, scaling, feature selection and data balancing pre-processing steps both separately and in combination. Using these data sets and the Logistic Regression (LR), Gaussian Naive Bayes (GNB), Decision Tree (DT), K-Nearest Neighbor (KNN), Multi Layer Perceptron (MLP), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and Category Boosting (CatBoost) machine learning algorithms, a large number of DarkWEB traffic detection and classification models were created. With these models, 2-way, 4-way and 8-way classifications were performed over the Encryption (Encrypted, Standard), Category (Tor, Non-Tor, Non-VPN, VPN) and Subcategory (Audio-Stream, Browsing, Chat, E-mail, P2P, Transfer, Video-Stream, VOIP) classes, respectively. A correct detection and classification rate of 99.9% was achieved in 2-way and 4-way classification, and 94% in 8-way classification.
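
To make the pre-processing steps named in the abstract more concrete, the following is a minimal sketch in plain Python of three of them: categorical data coding, scaling and data balancing. It is not the authors' implementation. The toy flow records, the column choice, the function names, the use of min-max scaling and the use of random oversampling as the balancing method are all assumptions made for illustration; feature selection and model training are left out.

```python
import random
from collections import Counter

# Toy flow records (hypothetical values, NOT taken from CIC-Darknet2020):
# (protocol, flow_duration, packets_per_second, label)
flows = [
    ("TCP", 120.0, 10.0, "Non-Tor"),
    ("TCP", 300.0, 25.0, "Non-Tor"),
    ("UDP", 80.0, 5.0, "Non-Tor"),
    ("TCP", 200.0, 15.0, "Non-Tor"),
    ("TCP", 900.0, 2.0, "Tor"),
    ("UDP", 40.0, 50.0, "VPN"),
]


def label_encode(values):
    """Categorical coding: map each distinct value to an integer code."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def min_max_scale(column):
    """Scaling (assumed min-max): rescale a numeric column to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def random_oversample(rows, labels, seed=0):
    """Balancing (assumed random oversampling): duplicate randomly chosen
    rows of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(lab)
    return out_rows, out_labels


proto_codes, proto_map = label_encode([f[0] for f in flows])
durations = min_max_scale([f[1] for f in flows])
rates = min_max_scale([f[2] for f in flows])
labels = [f[3] for f in flows]

features = [list(t) for t in zip(proto_codes, durations, rates)]
balanced_X, balanced_y = random_oversample(features, labels)
class_counts = Counter(balanced_y)

print(proto_map)
print(class_counts)
```

In practice, balancing of this kind is normally applied only to the training split, so that duplicated rows do not leak into the evaluation data.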

References

  • [1] G. Weimann, “Going Darker? The Challenge of Dark Net Terrorism”, wilsoncenter.org, [Online]. Available: https://www.wilsoncenter.org/sites/default/files/media/documents/publication/going_darker_challenge_of_dark_net_terrorism.pdf. [Accessed: Jun. 6, 2023].
  • [2] R. Badhwar, The CISO’s Next Frontier: Dark Web & Dark Net, Springer Nature Switzerland AG 2021.
  • [3] K. Demertzis, K. Tsiknas, D. Takezis, C. Skianis and L. Iliadis, “Darknet traffic bigdata analysis and network management for real-time automating of the malicious intent detection process by a weight agnostic neural networks framework”, Electronics, vol. 10, no. 7, pp. 781, 2021. doi: 10.3390/electronics10070781
  • [4] A. Bracci, M.Nadini, M. Aliapoulios, D. McCoy, I. Gray, A. Teytelboym, A. Gallo and A. Baronchelli, “Dark Web Marketplaces and COVID-19: before the vaccine,” EPJ Data Sci, vol.10, no. 6, 2021. doi: 10.1140/epjds/s13688-021-00259-w
  • [5] A.H. Lashkari, G. Kaur and A. Rahali, “DIDarknet: A Contemporary Approach to Detect and Characterize the Darknet Traffic using Deep Image Learning,” 10th International Conference on Communication and Network Security, 2020, Tokyo, pp. 1-13, November, 2020.
  • [6] M. B. Sarwar, M. K. Hanif, R. Talib, M. Younas and M. U. Sarwar, "DarkDetect: Darknet Traffic Detection and Categorization Using Modified Convolution-Long Short-Term Memory," in IEEE Access, vol. 9, pp. 113705-113713, 2021, doi: 10.1109/ACCESS.2021.3105000.
  • [7] L. A. Iliadis and T. Kaifas, "Darknet Traffic Classification using Machine Learning Techniques," 2021 10th International Conference on Modern Circuits and Systems Technologies (MOCAST), Thessaloniki, July 2021, Greece [Online]. Available: IEEE Xplore, https://ieeexplore.ieee.org/document/9493386. [Accessed: 10 Sept. 2023].
  • [8] S. Sridhar and S. Sanagavarapu, "DarkNet Traffic Classification Pipeline with Feature Selection and Conditional GAN-based Class Balancing," 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA), Boston, MA, USA, 2021, [Online]. Available: IEEE Xplore, https://ieeexplore.ieee.org/document/9685743. [Accessed: 20 May. 2023].
  • [9] Y. Li, Y. Lu and S. Li, "EZAC: Encrypted Zero-day Applications Classification using CNN and K-Means," 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Dalian, China, 2021, [Online]. Available: IEEE Xplore, https://ieeexplore.ieee.org/document/9437716. [Accessed: 12 Feb. 2023].
  • [10] M. Ugurlu, İ. Dogru, ve R. S. Arslan, “Karanlık ağ trafiğinin makine öğrenmesi yöntemleri kullanılarak tespiti ve sınıflandırılması,” GUMMFD, vol. 38, no. 3, pp. 1737–1746, 2023, doi: 10.17341/gazimmfd.1023147.
  • [11] N. Rust-Nguyen, S. Sharma, and M. Stamp, “Darknet traffic classification and adversarial attacks using machine learning,” Comput. Secur, vol. 127, pp.16, 2023. doi: 10.1016/j.cose.2023.103098
  • [12] A. Almomani, “Darknet traffic analysis, and classification system based on modified stacking ensemble learning algorithms,” Inf Syst E-Bus Manage, 2023. doi: 10.1007/s10257-023-00626-2
  • [13] H. Mohanty, A. H. Roudsari, and A. Habibi Lashkari, “Robust stacking ensemble model for darknet traffic classification under adversarial settings,” Comput. Secur, vol.120, Sep. 2022. doi: 10.1016/j.cose.2022.102830
  • [14] Q. A. Al-Haija, M. Krichen and W. A. Elhaija, “Machine-Learning-Based Darknet Traffic Detection System for IoT Applications,” Electronics, vol. 11, no. 4, pp. 556, 2022. doi: 10.3390/electronics11040556
  • [15] Y. Li and Y. Lu, “ETCC: Encrypted Two-Label Classification Using CNN,” Sec. and Commun. Netw., vol. 2021, pp. 11, 2021. doi: 10.1155/2021/6633250
  • [16] M. Alimoradi, M. Zabihimayvan, A. Daliri, R. Sledzik and R. Sadeghi, “Deep Neural Classification of Darknet Traffic,” In book: Artificial Intelligence Research and Development, Edition: printChapter: 356, Publisher: IOS Press, 2022, pp.105-114
  • [17] A. H. Lashkari, G. Draper Gil, M. Mamun and A. Ghorbani, “Characterization of Encrypted and VPN Traffic Using Time-Related Features,” The International Conference on Information Systems Security and Privacy (ICISSP), Feb 2016, Italy, [Online]. Available: IEEE Xplore, https://doi.org/10.5220/0005740704070414. [Accessed: 10 Apr. 2023].
  • [18] A. H. Lashkari, G. Kaur and A. Rahali, “DIDarknet: A Contemporary Approach to Detect and Characterize the Darknet Traffic using Deep Image Learning,” 10th International Conference on Communication and Network Security, November 2020, Tokyo, Japan, [Online]. Available: https://doi.org/10.1145/3442520.3442521. [Accessed: 20 May. 2023].
  • [19] E. G. İlgün ve R. Samet, “Veri setine uygulanan ön işlemler ile makine öğrenimi yöntemi kullanılarak geliştirilen saldırı tespit modellerinin performanslarının arttırılması,” GUMMFD, vol. 39, no. 2, pp. 679–692, 2023, doi: 10.17341/gazimmfd.1122021.
  • [20] E. G. İlgün, “Veri setine uygulanan ön işlemlerin anomali tabanlı saldırı tespit modellerinin performansları üzerindeki etkisinin incelenmesi,” Yüksek Lisans Tezi, Ankara Üniversitesi, Ankara, Türkiye, 2022.
  • [21] O. Kaynar, H. Arslan, Y. Görmez ve Y. E. Işık, “Makine Öğrenmesi ve Öznitelik Seçim Yöntemleriyle Saldırı Tespiti,” Bilişim Teknolojileri Dergisi, 11 (2), pp.175-185, 2018. doi: 10.17671/gazibtd.368583
  • [22] A. Fernandez, S. Garcia, M. Galar, R.C. Prati, B. Krawczyk and F. Herrera, “Learning from Imbalanced Data Sets,” Springer, pp. 83, 2018. doi: 10.1007/978-3-319-98074-4
  • [23] J. Brownlee, “Random Oversampling and Undersampling for Imbalanced Classification,” machinelearningmastery.com, Jan. 15, 2020. [Online]. Available: https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/on. [Accessed: 12 Apr. 2023].

There are 23 citations in total.

Details

Primary Language: Turkish
Subjects: Software Engineering (Other)
Journal Section: Research Articles
Authors

Esen Gül İlgün 0000-0002-1719-5727

Yusuf Sönmez

Murat Dener

Publication Date: December 31, 2023
Submission Date: November 19, 2023
Acceptance Date: December 20, 2023
Published in Issue: Year 2023, Volume: 9, Issue: 4

Cite

IEEE E. G. İlgün, Y. Sönmez, and M. Dener, “Makine Öğrenme Yöntemi Kullanılarak DarkWEB Trafiği Tespiti ve Sınıflandırılması”, GJES, vol. 9, no. 4, pp. 126–140, 2023.

Gazi Journal of Engineering Sciences (GJES) publishes open access articles under a Creative Commons Attribution 4.0 International License (CC BY).