A Design of Crime Category Detection Framework using Stacking Ensemble Model

Recep Sinan Arslan; Burak Dülgeroğlu

doi:10.21605/cukurovaumfd.1410642

Research Article

Year 2023, Volume: 38 Issue: 4, 1035-1048, 28.12.2023

Cited By: 1

Abstract

Crime refers to an action legally defined as harmful to society, and understanding the type of crime is important for preventing such actions. However, crime can occur at any time and place, which makes it difficult to predict. Data on previously committed crimes helps to overcome this difficulty. This study proposes a novel model for classifying criminal activities that combines Doc2Vec, which produces a numerical representation of texts regardless of their length, with a stacking ensemble of 8 different machine-learning models. Unlike previous studies, the model treats the features as text and converts them into vectors instead of encoding them categorically, which makes it possible to use features that earlier studies could not use. The proposed model is tested on San Francisco Crime Classification, a dataset distributed through an online competition that contains crimes committed over 12 years. For the 15 crime categories with the most records, an accuracy of 99.28% was obtained, while precision, recall, and F-score were 99.18%, 99.38%, and 99.20%, respectively. With cross-validation (k=10), a performance of 99.80% was achieved with a standard deviation of 0.001. These values are higher than those of all studies in the literature that use categorical feature structures. The results show that converting criminal activity reports, which contain text-based features, into vectors with natural language processing techniques such as Doc2Vec, instead of using them categorically in model training, can directly improve classification performance and yield a more efficient model with less preprocessing.

Keywords

Crime prediction, Criminology, Doc2vec, Stacking ensemble model
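
The pipeline summarized in the abstract (text features turned into vectors, then a stacking ensemble) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. Bag-of-words counts stand in for Doc2Vec, two toy base learners and a lookup-table meta-learner stand in for the 8 machine-learning models, and the incident descriptions and labels are made up. All class and function names are hypothetical.

```python
from collections import Counter
import math

# Made-up incident descriptions and labels (not taken from the dataset).
reports = [
    ("grand theft from locked auto", "THEFT"),
    ("battery on street", "ASSAULT"),
    ("petty theft from building", "THEFT"),
    ("aggravated assault with knife", "ASSAULT"),
    ("theft of bicycle", "THEFT"),
    ("threats and battery", "ASSAULT"),
]
vocab = sorted({w for text, _ in reports for w in text.split()})

def vectorize(text):
    """Bag-of-words counts over a fixed vocabulary (assumed stand-in for Doc2Vec)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

class NearestCentroid:
    """Base learner 1: predict the class whose mean vector is closest."""
    def fit(self, X, y):
        sums, n = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [v / n[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))
                for x in X]

class OneNearestNeighbour:
    """Base learner 2: predict the label of the closest training vector."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))]
                for x in X]

class VoteTableMeta:
    """Meta-learner: map each tuple of base predictions to its most common true label."""
    def fit(self, Z, y):
        table = {}
        for z, label in zip(Z, y):
            table.setdefault(tuple(z), Counter())[label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in table.items()}
        return self

    def predict(self, Z):
        # Unseen combination: fall back to a majority vote of the base predictions.
        return [self.table.get(tuple(z), Counter(z).most_common(1)[0][0]) for z in Z]

def fit_stacking(X, y, base_factories, k=3):
    """Train the meta-learner on out-of-fold base predictions, then refit bases on all data."""
    n = len(X)
    Z = [[None] * len(base_factories) for _ in range(n)]
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i, p in zip(test_idx, model.predict([X[i] for i in test_idx])):
                Z[i][j] = p
    meta = VoteTableMeta().fit(Z, y)
    bases = [make().fit(X, y) for make in base_factories]
    return bases, meta

def predict_stacking(bases, meta, X):
    Z = list(zip(*(b.predict(X) for b in bases)))
    return meta.predict(Z)

X = [vectorize(text) for text, _ in reports]
y = [label for _, label in reports]
bases, meta = fit_stacking(X, y, [NearestCentroid, OneNearestNeighbour])
new_reports = ["theft from auto", "battery with knife"]
predictions = predict_stacking(bases, meta, [vectorize(t) for t in new_reports])
print(predictions)
```

The point of the sketch is the out-of-fold step in `fit_stacking`: the meta-learner only ever sees base predictions made on reports the base learners were not trained on, which is what separates stacking from simple voting. In a realistic setting the vectorizer would be replaced by a trained Doc2Vec model and the toy learners by library classifiers.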


Turkish title: Suç Kategorisi Tespiti için Yığınlama Topluluk Öğrenimi Modeli Kullanan Çatı Tasarımı

Details

Primary Language	English
Subjects	Data Models, Storage and Indexing, Computer Software
Section	Articles
Authors	Recep Sinan Arslan 0000-0002-3028-0416; Burak Dülgeroğlu 0009-0000-8201-9343
Publication Date	December 28, 2023
Submission Date	October 16, 2023
Acceptance Date	December 25, 2023
Published in Issue	Year 2023 Volume: 38 Issue: 4

Cite

APA	Arslan, R. S., & Dülgeroğlu, B. (2023). A Design of Crime Category Detection Framework using Stacking Ensemble Model. Çukurova Üniversitesi Mühendislik Fakültesi Dergisi, 38(4), 1035-1048. https://doi.org/10.21605/cukurovaumfd.1410642

Cited By

Crime Prediction with DistilBERT-based Feature Extraction and Machine Learning

Çukurova Üniversitesi Mühendislik Fakültesi Dergisi

https://doi.org/10.21605/cukurovaumfd.1606169
