İnşaat Şirketi Müşterilerinin Gelecekteki Konut Satın Alma Davranışlarının Metin Madenciliği ve Makine Öğrenmesi ile Tahmin Modellerinin Oluşturulması

Haydar Ekelik; Şenol Emir

doi:10.17671/gazibtd.1484123

Research Article

Year 2024, Volume: 17, Issue: 4, pp. 323-337, 31.10.2024

Abstract

In this study, various text mining and machine learning techniques were applied to records of face-to-face and telephone conversations between a company operating in the construction sector and its customers. The main aim is to develop, from this collection of text documents (corpus), a model that can accurately predict whether a newly interviewed customer will later purchase a home from the company. To this end, several preprocessing steps were applied to the textual data, after which keywords were extracted and a vector space model was built, converting the text into a format suitable for analysis. Different prediction models were built using the CART (Classification And Regression Tree), RF (Random Forest), and XGBoost (eXtreme Gradient Boosting) machine learning methods, and the models were then compared on several classification metrics. In classification problems, when the numbers of observations in the classes are imbalanced, comparing models by common classification metrics can give biased results. For this reason, additional metrics for such cases have been developed in the literature to complement the general comparison metrics. Because the classes in this application are imbalanced, one of these metrics, the PR (Precision-Recall) curve, was used. Judged by the PR curves, Random Forest was the method that best predicted whether newly interviewed customers would later purchase a home.
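As a rough illustration of the vector space model step, the sketch below turns a few short texts into a document-term matrix. The abstract does not say which term weighting was used, so the TF-IDF weighting here is an assumption, and the sample notes and the function names `tokenize` and `tfidf_matrix` are hypothetical, invented only for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def tfidf_matrix(docs):
    """Return (vocabulary, rows) with one TF-IDF weight vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Assumed weighting: raw term count times log(N / df).
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, rows


# Hypothetical interview notes, invented for illustration only.
notes = [
    "Customer asked about price and payment plan",
    "Customer asked about location",
    "Wants a payment plan, will visit the site",
]
vocab, matrix = tfidf_matrix(notes)
```

Each row of `matrix` is one document expressed over the shared vocabulary, which is the kind of tabular input that tree-based classifiers such as CART, Random Forest, and XGBoost can consume.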

Keywords

Text Mining, Marketing, Random Forest, CART, XGBoost

Predictive Modeling of Future Home Purchase Behavior in Construction Company Customers Using Text Mining and Machine Learning

Abstract

In this study, various text mining and machine learning techniques were applied to the recordings of face-to-face or telephone interviews with customers of a company operating in the construction industry. The main objective is to develop, from this set of text-based documents (corpus), a model that can accurately predict whether a newly interviewed customer will purchase a house from the company in the future. For this purpose, a number of data preprocessing steps were applied to the textual data; keywords and a vector space model were then created, and the text-based data were converted into a format suitable for further analysis. Prediction models were built with the CART (Classification And Regression Tree), RF (Random Forest), and XGBoost (eXtreme Gradient Boosting) methods and then compared according to several classification metrics. In classification problems, imbalance between classes makes models difficult to compare, because common classification metrics can give biased results. For this reason, new metrics have been developed in the literature in addition to the classical performance metrics. Since the classes in this application are imbalanced, PR (precision-recall) curves, one of these metrics, were used. When the PR curves are taken into account, Random Forest shows the best performance in predicting whether newly interviewed customers will buy a house in the future.
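To make the point about class imbalance concrete, the following minimal sketch computes the points of a PR curve and a step-wise area under it (average precision) from predicted scores. It is not the authors' evaluation code: the labels and scores are hypothetical, and the function names `precision_recall_points` and `average_precision` are chosen only for this example.

```python
def precision_recall_points(y_true, scores):
    """Return (recall, precision) pairs, one per distinct score threshold.

    Assumes y_true holds 0/1 labels with at least one positive.
    """
    total_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area under the PR curve: sum of (R_k - R_{k-1}) * P_k."""
    ap = 0.0
    prev_recall = 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical data: 2 buyers (label 1) among 10 interviewed customers.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.7, 0.9, 0.6]

points = precision_recall_points(y_true, scores)
ap = average_precision(points)

# Accuracy of a model that always predicts "no purchase".
majority_accuracy = 1 - sum(y_true) / len(y_true)
```

In this toy data a model that never predicts a purchase already reaches the accuracy stored in `majority_accuracy` while finding no buyers at all, which is why precision and recall on the minority class are more informative here than accuracy.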

Keywords

text mining, marketing, random forest, CART, XGBoost

There are 32 citations in total.

Details

Primary Language	Turkish
Subjects	Machine Learning (Other), Data Mining and Knowledge Discovery, Natural Language Processing
Journal Section	Articles
Authors	Haydar Ekelik (0000-0002-0661-4164); Şenol Emir (0000-0002-6762-9351)
Publication Date	October 31, 2024
Submission Date	May 14, 2024
Acceptance Date	October 19, 2024
Published in Issue	Year 2024 Volume: 17 Issue: 4

Cite

APA	Ekelik, H., & Emir, Ş. (2024). İnşaat Şirketi Müşterilerinin Gelecekteki Konut Satın Alma Davranışlarının Metin Madenciliği ve Makine Öğrenmesi ile Tahmin Modellerinin Oluşturulması. Bilişim Teknolojileri Dergisi, 17(4), 323-337. https://doi.org/10.17671/gazibtd.1484123
