Analyzing of Machine Learning Algorithms Performance in Big Data Environment in terms of Malware Detection

Sercan Gülburun; Murat Dener

doi:10.31202/ecjse.967919

Research Article

Year 2021, Volume: 8 Issue: 3, 1536 - 1549, 30.09.2021

Analyzing of Machine Learning Algorithms Performance in Big Data Environment in terms of Malware Detection

Abstract

The role of information technology assets in both the daily lives of individuals and the operation of institutions and organizations has grown rapidly over the last quarter century. In parallel, threats to these assets have also increased, and malware is one of the principal ones. This study examines the effectiveness of machine learning algorithms for detecting malware in big data environments. The experiments were run on Google Colaboratory, Azure HDInsight, Amazon EMR and Google Dataproc. Three methods that are included in Apache Spark 3.0 and support binary classification, namely random forest (RF), decision trees (DT) and gradient boosting trees (GBT), were tested on the Kaggle Malware Detection dataset. Following a static analysis approach, accuracy, precision, sensitivity, training time and prediction time were calculated for each algorithm. Results for the same algorithms were also obtained with the scikit-learn library, and all results were evaluated together.

Keywords

Big Data, Machine Learning, Malware Detection, Google Dataproc, Azure HDInsight
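
The abstract lists accuracy, precision and sensitivity as evaluation metrics, alongside training and prediction time, but this page does not define them. The following is a minimal sketch, assuming the standard confusion-matrix definitions for a binary malware/benign task. The helper names `binary_metrics` and `timed` and the label lists are hypothetical illustrations, not taken from the paper, Apache Spark or scikit-learn.

```python
import time


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and sensitivity (recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        # Share of all samples that were labeled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted malware that really is malware.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual malware that was detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Made-up labels for illustration only: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics, elapsed = timed(binary_metrics, y_true, y_pred)
print(metrics)
print(f"computed in {elapsed:.6f} s")
```

The same `timed` wrapper could in principle be placed around a model's fit and predict calls to collect training and prediction times, although how the study actually measured them is not stated here.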

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Articles
Authors	Sercan Gülburun (ORCID 0000-0001-5272-3911); Murat Dener (ORCID 0000-0001-5746-6141)
Publication Date	September 30, 2021
Submission Date	July 8, 2021
Acceptance Date	August 31, 2021
Published in Issue	Year 2021 Volume: 8 Issue: 3

Cite

IEEE	S. Gülburun and M. Dener, “Büyük Veri Ortamlarında Zararlı Yazılım Tespiti Kapsamında Makine Öğrenmesi Algoritmalarının Performansının İncelenmesi”, El-Cezeri Journal of Science and Engineering, vol. 8, no. 3, pp. 1536–1549, 2021, doi: 10.31202/ecjse.967919.

El-Cezeri is licensed to the public under a Creative Commons Attribution 4.0 license.