Research Article

A Comparison for Feature Extraction and Dimension Reduction Methods for Classification of RNA m6A Modification Sites

Year 2024, EARLY VIEW, 1 - 1
https://doi.org/10.2339/politeknik.1511303

Abstract

This study aimed to identify N6-methyladenosine (m6A) modification sites, which occur frequently in RNA, and to compare the performance of different feature extractors, feature selectors, and dimension reduction algorithms using the K-nearest neighbour classification algorithm, as a reference for future studies. Thirty-five feature extraction algorithms and nine dimension reduction and feature selection algorithms were evaluated on the identification of m6A modification sites. The combination of the NCP feature extraction algorithm, which takes the chemical properties of nucleotides into account, and the Extra Trees dimension reduction method showed high performance in the identification of m6A modification sites.
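As an illustration of the NCP (nucleotide chemical property) encoding named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used three-bit scheme in which each base is described by ring structure (purine = 1), functional group (amino = 1), and hydrogen bonding (weak = 1). The function name `ncp_encode`, the treatment of T as U, and the all-zero vector for unknown bases are assumptions made for this example.

```python
# Nucleotide chemical property (NCP) encoding, a sketch under stated assumptions.
# Each base maps to three bits: (ring structure, functional group, hydrogen bonding).
NCP_TABLE = {
    "A": (1, 1, 1),  # purine, amino group, weak hydrogen bonding
    "C": (0, 1, 0),  # pyrimidine, amino group, strong hydrogen bonding
    "G": (1, 0, 0),  # purine, keto group, strong hydrogen bonding
    "U": (0, 0, 1),  # pyrimidine, keto group, weak hydrogen bonding
}


def ncp_encode(sequence):
    """Return a flat list of 3 * len(sequence) integers for an RNA sequence.

    Assumptions for this example: input is case-insensitive, T is treated
    as U, and any other symbol (for example N) becomes (0, 0, 0).
    """
    features = []
    for base in sequence.upper().replace("T", "U"):
        features.extend(NCP_TABLE.get(base, (0, 0, 0)))
    return features


if __name__ == "__main__":
    # Five-nucleotide window, so the feature vector has 15 entries.
    print(ncp_encode("GGACU"))
```

Vectors produced this way could then be passed to a feature selector and a K-nearest neighbour classifier, for example scikit-learn's `ExtraTreesClassifier` with `SelectFromModel`, followed by `KNeighborsClassifier`. The abstract does not state which software the study used, so that pairing is only one plausible setup.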

RNA m6A Modifikasyon Bölgelerinin Sınıflandırılması için Öznitelik Çıkarma ve Boyut Azaltma Yöntemlerinin Karşılaştırılması


Details

Primary Language: Turkish
Subjects: Machine Learning (Other)
Journal Section: Research Article
Authors: Batuhan Nuray (ORCID 0009-0008-1345-3363); Volkan Altuntaş (ORCID 0000-0003-3144-8724)
Early Pub Date: September 10, 2024
Publication Date: not listed
Submission Date: July 5, 2024
Acceptance Date: September 1, 2024
Published in Issue: Year 2024, EARLY VIEW

Cite

APA Nuray, B., & Altuntaş, V. (2024). RNA m6A Modifikasyon Bölgelerinin Sınıflandırılması için Öznitelik Çıkarma ve Boyut Azaltma Yöntemlerinin Karşılaştırılması. Politeknik Dergisi, 1-1. https://doi.org/10.2339/politeknik.1511303