Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach

Ali Değirmenci

doi:10.55525/tjst.1572382

Research Article

Year 2025, Volume: 20 Issue: 1, 77 - 90, 27.03.2025

Abstract

The number of people affected by obesity is rising steadily. Because of its harmful effects on human health, obesity has become one of the world’s most important health concerns, and diagnosing it is crucial. It is therefore essential to develop methods that enable early prediction of obesity risk and help curb its increasing prevalence. Some methods in the literature rely solely on Body Mass Index (BMI) to predict and classify obesity, which may produce inaccurate outcomes. More accurate predictions can be made by developing machine learning models that incorporate additional factors, such as individuals’ lifestyle and dietary habits, alongside the height and weight used in BMI calculations. In this study, the potential of three machine learning methods (naive Bayes, decision tree, and Random Forest (RF)) for predicting obesity levels was investigated. Among the compared methods, RF performed best (accuracy=0.8892, macro average F1-score=0.8618, Macro Average Precision (MAP)=0.8350, Macro Average Recall (MAR)=0.9122). Feature selection was also performed to determine which features are significant for estimating the obesity level. With feature selection, RF again achieved the highest scores (accuracy=0.9236, MAP=0.9232, MAR=0.9358, macro average F1-score=0.9269) while using fewer features. The results demonstrate that the performance of machine learning models on the same dataset can be enhanced through detailed hyperparameter tuning. Furthermore, applying feature selection can improve performance by mitigating the adverse effects of irrelevant or redundant features that may degrade the model’s effectiveness.
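The macro-averaged scores reported in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: precision, recall, and F1 are computed per class and then averaged with equal class weight. The abstract does not spell out its formulas, so this is an assumption, and the class labels and predictions below are made-up toy data, not the study's dataset.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Assumes the usual definition of macro averaging: each class
    contributes equally, regardless of how many samples it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels, for illustration only.
y_true = ["normal"] * 4 + ["overweight"] * 2 + ["obese"] * 2
y_pred = ["normal", "normal", "normal", "overweight",
          "overweight", "obese", "obese", "obese"]

accuracy, map_score, mar_score, macro_f1 = macro_metrics(y_true, y_pred)
harmonic = 2 * map_score * mar_score / (map_score + mar_score)
print(f"accuracy={accuracy:.4f} MAP={map_score:.4f} "
      f"MAR={mar_score:.4f} macro F1={macro_f1:.4f} "
      f"harmonic(MAP, MAR)={harmonic:.4f}")
```

On this toy data the macro F1 (about 0.7190) differs from the harmonic mean of MAP and MAR (about 0.7358). Under the per-class definition assumed here, a reported macro F1 need not equal the harmonic mean of the reported MAP and MAR.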

Keywords

Obesity, machine learning, feature selection, mutual information

Obezitenin Doğru Tahmini için Makine Öğrenimi Modelleri: Veri Odaklı Yaklaşım

Abstract

Diagnosing obesity is of great importance because of its harmful effects on human health and the steadily growing number of people it affects. The spread of obesity has made it one of the most important global health problems. It is therefore essential to develop methods that enable early detection of obesity risk and help curb the rising prevalence of obesity. Relying solely on Body Mass Index (BMI) to predict and classify obesity can lead to inaccurate results. More accurate predictions can be obtained by developing machine learning models that include additional factors, such as individuals’ lifestyle and dietary habits, alongside the height and weight used in BMI calculations. In this study, the potential of three machine learning methods (naive Bayes, decision tree, and Random Forest (RF)) to predict obesity levels was investigated. Among the compared methods, the best performance was obtained with RF (accuracy=0.8892, macro average F1-score=0.8618, Macro Average Precision (MAP)=0.8350, Macro Average Recall (MAR)=0.9122). Feature selection was also performed to identify the features that are effective in predicting the obesity level. According to the experimental results with feature selection, the RF method achieved the highest score (accuracy=0.9236, MAP=0.9232, MAR=0.9358, macro average F1-score=0.9269) with fewer features. The results show that the performance of machine learning models on the same dataset can be improved through detailed hyperparameter tuning. In addition, applying feature selection can improve performance by reducing the negative effects of irrelevant or redundant features that may lower the model’s effectiveness.
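The keywords list mutual information alongside feature selection. As a rough illustration of how such a ranking can work, the sketch below computes a plug-in estimate of the mutual information (in bits) between a discrete feature and a class label, then sorts features by that score. The abstract does not describe the estimator or threshold actually used, so this is only one possible reading; the feature names and values are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(feature, target):
    """Plug-in estimate of I(feature; target) in bits for discrete values."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    count_x = Counter(feature)
    count_y = Counter(target)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data: hypothetical feature names, not columns from the study.
target = [0, 0, 1, 1]
features = {
    "informative_feature": ["a", "a", "b", "b"],  # mirrors the label
    "irrelevant_feature": ["a", "b", "a", "b"],   # independent of the label
}

scores = {name: mutual_information(values, target)
          for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

A feature that fully determines a balanced binary label scores 1 bit, while an independent feature scores 0, so a selector would keep the former and drop the latter. For continuous features a different estimator is needed; scikit-learn's `mutual_info_classif` is one commonly used option.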

Keywords

Obesity, machine learning, feature selection, mutual information

There are 28 citations in total.

Details

Primary Language	English
Subjects	Computing Applications in Health
Journal Section	TJST
Authors	Ali Değirmenci 0000-0001-9727-8559
Publication Date	March 27, 2025
Submission Date	October 23, 2024
Acceptance Date	December 3, 2024
Published in Issue	Year 2025 Volume: 20 Issue: 1

Cite

APA	Değirmenci, A. (2025). Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach. Turkish Journal of Science and Technology, 20(1), 77-90. https://doi.org/10.55525/tjst.1572382
AMA	Değirmenci A. Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach. TJST. March 2025;20(1):77-90. doi:10.55525/tjst.1572382
Chicago	Değirmenci, Ali. “Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach”. Turkish Journal of Science and Technology 20, no. 1 (March 2025): 77-90. https://doi.org/10.55525/tjst.1572382.
EndNote	Değirmenci A (March 1, 2025) Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach. Turkish Journal of Science and Technology 20 1 77–90.
IEEE	A. Değirmenci, “Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach”, TJST, vol. 20, no. 1, pp. 77–90, 2025, doi: 10.55525/tjst.1572382.
ISNAD	Değirmenci, Ali. “Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach”. Turkish Journal of Science and Technology 20/1 (March 2025), 77-90. https://doi.org/10.55525/tjst.1572382.
JAMA	Değirmenci A. Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach. TJST. 2025;20:77–90.
MLA	Değirmenci, Ali. “Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach”. Turkish Journal of Science and Technology, vol. 20, no. 1, 2025, pp. 77-90, doi:10.55525/tjst.1572382.
Vancouver	Değirmenci A. Machine Learning Models for Accurate Prediction of Obesity: A Data-Driven Approach. TJST. 2025;20(1):77-90.
