Phishing E-mail Detection with Deep Learning Models
Year 2020,
Volume: 13 Issue: 2, 17 - 31, 16.12.2020
Şeydanur Ahi, İbrahim Soğukpınar
Abstract
Social engineering is the art of obtaining information from people through deception, with or without the use of technology. The vast majority of attacks encountered today are of human origin, and they likewise target the people who use computer systems rather than the systems themselves. Humans, the weakest link in the security chain, exhibit various weaknesses in the security process because their behavior varies over time. Phishing is a type of social engineering attack designed to capture consumers' financial or personal information. It is one of the biggest challenges facing the e-commerce world, and many companies and individuals lose billions of dollars to phishing attacks. Because the global impact of these attacks will continue to grow, more effective phishing detection techniques need to be developed to reduce the threat. This work proposes a detection method against phishing e-mail attacks that is built on deep learning models. In the proposed method, various deep learning models were trained using features obtained from the header and body sections of incoming e-mails. In tests, the proposed detection method achieved a success rate of 96.84% against phishing attacks.
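As a rough illustration of what obtaining features from the header and body sections of an e-mail can look like, the following minimal Python sketch parses a raw message with the standard-library email package and derives a few numeric features. The sample message, the function names, and the specific features (subject length, Reply-To mismatch, URL counts, word count) are assumptions chosen for illustration only; the abstract does not list the features the authors actually used, and the deep learning models themselves are not shown.

```python
import re
from email import message_from_string

# Hypothetical sample message, not taken from any corpus used in the paper.
RAW_EMAIL = """\
From: "Support" <support@example.com>
Reply-To: attacker@example.net
Subject: Urgent: verify your account
Content-Type: text/plain; charset="utf-8"

Dear customer, please verify your account at http://192.0.2.10/login today.
"""

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
IP_URL_PATTERN = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}")


def body_text(msg):
    """Concatenate the decoded text/plain parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def extract_features(raw):
    """Return a dict of illustrative (assumed) header and body features."""
    msg = message_from_string(raw)

    # Header section
    sender = msg.get("From", "")
    reply_to = msg.get("Reply-To", "")
    subject = msg.get("Subject", "")

    # Body section
    body = body_text(msg)
    urls = URL_PATTERN.findall(body)

    return {
        "subject_length": len(subject),
        # 1 if a Reply-To address is present and does not appear in From
        "reply_to_differs": int(bool(reply_to) and reply_to not in sender),
        "url_count": len(urls),
        # 1 if any URL uses a raw IPv4 address instead of a host name
        "has_ip_url": int(any(IP_URL_PATTERN.match(u) for u in urls)),
        "body_word_count": len(body.split()),
    }


features = extract_features(RAW_EMAIL)
print(features)
```

A feature dictionary of this kind could be converted to a numeric vector and passed to a classifier; which representation and which model architectures work best is a question the full paper addresses, not this sketch.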