A Comparison for Feature Extraction and Dimension Reduction Methods for Classification of RNA m6A Modification Sites
Year 2024, EARLY VIEW, 1 - 1
Batuhan Nuray, Volkan Altuntaş
Abstract
The aim of this study was to identify N6-methyladenosine (m6A) modification sites, which occur frequently in RNA, and to compare the performance of different feature extractors, feature selectors and dimension reduction algorithms using the K-nearest neighbour classification algorithm, as a reference for future studies. Thirty-five feature extraction algorithms and nine dimension reduction and feature selection algorithms were evaluated for their performance in identifying m6A modification sites. The results showed that combining the NCP feature extraction algorithm, which takes into account the chemical properties of nucleotides, with the Extra Trees dimension reduction method gave high performance in the identification of m6A modification sites.
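To make the NCP idea concrete, the following is a minimal sketch of nucleotide chemical property encoding followed by a plain majority-vote K-nearest neighbour classifier. It assumes the commonly used NCP convention (ring structure, functional group, hydrogen bonding); the abstract does not state the exact mapping used in the study. The function names `ncp_encode` and `knn_predict` and the toy sequences are hypothetical, and the Extra Trees feature selection step is left out to keep the example dependency-free.

```python
from collections import Counter

# Assumed NCP triples: (ring structure, functional group, hydrogen bonding).
# Purines (A, G) score 1 on ring structure, amino bases (A, C) score 1 on
# functional group, and weakly bonding bases (A, U) score 1 on hydrogen bonding.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_encode(seq):
    """Flatten an RNA sequence into a binary vector of length 3 * len(seq)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = []
    for base in seq:
        features.extend(NCP[base])  # raises KeyError for symbols outside A/C/G/U
    return features


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors nearest in squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up sequences (not the study's data); label 1 = m6A site, 0 = non-site.
train_seqs = ["GGACU", "AGACU", "GGACA", "CCUUG", "UCUCG", "CCUCG"]
train_labels = [1, 1, 1, 0, 0, 0]
train_X = [ncp_encode(s) for s in train_seqs]

prediction = knn_predict(train_X, train_labels, ncp_encode("GGACC"), k=3)
```

In a fuller pipeline, a tree-based feature selector would sit between the encoding and the classifier, keeping only the positions and properties ranked as most informative.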