Research Article
Year 2023, Volume: 5 Issue: 2, 284 - 9, 15.05.2023
https://doi.org/10.37990/medr.1202671

Comparison of Feature Selection Methods in Breast Cancer Microarray Data

Abstract

Aim: We aimed to predict metastasis in breast cancer patients using conventional tree-based machine learning algorithms, and to determine which feature selection method is most effective at reducing the number of features in microarray breast cancer data.
Material and Methods: Statistical preprocessing steps and three feature selection methods, the least absolute shrinkage and selection operator (LASSO), Boruta, and maximum relevance-minimum redundancy (MRMR), were applied to the microarray breast cancer data before fitting conventional tree-based machine learning methods such as decision trees, extremely randomized trees (Extra-trees), and gradient boosting trees.
Results: The microarray data, with 54,675 features from 202 breast cancer patients (101 with and 101 without metastases), were first reduced to 235 features. The feature selection algorithms were then applied, and the most important features were identified with the tree-based machine learning algorithms. The highest recall and F-measure values were obtained with XGBoost, and the highest precision was obtained with Extra-trees. The 10 features (out of 54,675) with the highest variable importance were listed.
Conclusion: The most accurate results were obtained from the statistically preprocessed data with the XGBoost and Extra-trees algorithms. Statistical and microarray preprocessing steps alone may therefore be sufficient for machine learning analysis of microarray data when predicting breast cancer metastasis.
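To illustrate the idea behind the MRMR step named above, the following is a minimal sketch of greedy maximum relevance-minimum redundancy selection. It is not the study's implementation: the function names, the toy expression values, and the gene labels are hypothetical, and absolute Pearson correlation is used here as a simple stand-in for the mutual information criterion of the original MRMR formulation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_select(features, labels, k):
    """Greedy MRMR sketch (assumption: |correlation| replaces mutual information).

    features: dict mapping feature name -> list of values, one per sample.
    labels:   list of 0/1 class labels, one per sample.
    Returns up to k feature names in the order they were picked.
    """
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            # Relevance minus mean redundancy with the already selected features.
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, 4 "genes".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [-1, 1, -1, 1, 3, 5, 3, 5],   # strongly related to the label
    "gene_b": [-1, 1, -1, 2, 3, 5, 3, 5],   # near-duplicate of gene_a
    "gene_c": [-1, -1, 1, 1, 1, 1, 3, 3],   # moderately related, less redundant
    "gene_d": [-1, 1, -1, 1, -1, 1, -1, 1], # unrelated to the label
}
selected = mrmr_select(features, labels, k=2)
print(selected)
```

In this toy example the near-duplicate gene_b is passed over on the second pick in favour of gene_c, which is less relevant on its own but adds information that gene_a does not already carry. That trade-off is the point of the redundancy penalty.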
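The precision, recall, and F-measure values used to compare the classifiers can be computed from predicted and true labels as in the short sketch below. The function name and the toy label vectors are hypothetical and are not taken from the study; class 1 is assumed to denote patients with metastasis.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical predictions for 8 patients (1 = metastasis, 0 = no metastasis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision penalizes false alarms (patients wrongly flagged as metastatic), while recall penalizes missed metastases. This is why one classifier can lead on precision and another on recall and F-measure.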

Details

Primary Language English
Subjects Internal Diseases
Journal Section Original Articles
Authors

Melih Agraz 0000-0002-6597-7627

Early Pub Date May 15, 2023
Publication Date May 15, 2023
Acceptance Date January 4, 2023
Published in Issue Year 2023 Volume: 5 Issue: 2

Cite

AMA Agraz M. Comparison of Feature Selection Methods in Breast Cancer Microarray Data. Med Records. May 2023;5(2):284-9. doi:10.37990/medr.1202671
