Drug Solubility Prediction: A Comparative Analysis of GNN, MLP, and Traditional Machine Learning Algorithms

Veysel Gider; Cafer Budak

doi:10.29109/gujsc.1371519

Research Article

Year 2024, Volume: 12 Issue: 1, 164 - 175, 25.03.2024

Abstract

Effective drug design and development are fundamental to medicine and the pharmaceutical industry. Accurate prediction of drug molecule solubility is critical in this process because solubility influences the bioavailability, pharmacokinetics, and toxicity of drugs. Traditionally, solubility has been predicted with mathematical equations based on chemical and physical properties. In recent years, advances in artificial intelligence and machine learning have produced new approaches in this field. This study evaluated three modeling approaches: Graph Neural Networks (GNN), Multilayer Perceptron (MLP), and traditional Machine Learning (ML) algorithms. The Random Forest (RF) model performed best, achieving the lowest errors: a Root Mean Square Error (RMSE) of 1.2145, a Mean Absolute Error (MAE) of 0.9221, and an R-squared (R2) of 0.6575. In contrast, the GNN model performed comparatively poorly, with an RMSE of 1.8389, an MAE of 1.4684, and an R2 of 0.2147. These values indicate that the GNN predictions contain larger errors than those of the other models and that its explanatory power is lower. The findings highlight the performance differences among modeling approaches in drug solubility prediction and offer guidance on which model to prefer in pharmaceutical design and development.
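The three metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of RMSE, MAE, and the coefficient of determination; the abstract does not state the exact formulas or software the authors used. The function names and the toy values are hypothetical and are not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values for illustration only (NOT data from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]

print(f"RMSE = {rmse(y_true, y_pred):.4f}")  # RMSE = 0.5000
print(f"MAE  = {mae(y_true, y_pred):.4f}")   # MAE  = 0.5000
print(f"R2   = {r2(y_true, y_pred):.4f}")    # R2   = 0.8000
```

Lower RMSE and MAE and a higher R2 indicate a better fit, which is the sense in which the abstract ranks RF (RMSE 1.2145, R2 0.6575) above GNN (RMSE 1.8389, R2 0.2147).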

Keywords

Machine Learning, Drug Solubility, Graph Neural Networks, Regression Models

Ethical Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Supporting Institution

This study was not supported by any funding organisation.

Turkish Title: İlaç Çözünürlüğü Tahmini: GNN, MLP ve Geleneksel Makine Öğrenimi Algoritmalarının Karşılaştırmalı Analizi

Details

Primary Language	English
Subjects	Circuits and Systems, Electrical Engineering (Other), Chemical Engineering (Other)
Section	Design and Technology
Authors	Veysel Gider (ORCID 0000-0001-7538-262X), Cafer Budak (ORCID 0000-0002-8470-4579)
Early Pub Date	5 March 2024
Publication Date	25 March 2024
Submission Date	5 October 2023
Published in Issue	Year 2024 Volume: 12 Issue: 1

Cite

APA	Gider, V., & Budak, C. (2024). Drug Solubility Prediction: A Comparative Analysis of GNN, MLP, and Traditional Machine Learning Algorithms. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 12(1), 164-175. https://doi.org/10.29109/gujsc.1371519

Cited By

Polymorph-Specific Solubility Prediction of Urea Using Constant Chemical Potential Molecular Dynamics Simulations

The Journal of Physical Chemistry B

https://doi.org/10.1021/acs.jpcb.4c02027

e-ISSN:2147-9526