Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama

Hatic Erdirik; Abdullah Ammar Karcıoğlu; Bahattin Tanyolaç; Hasan Bulut

doi:10.21597/jist.1404898

Research Article

Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama

Year 2024, Volume: 14 Issue: 2, 544 - 562, 01.06.2024

Hatic Erdirik , Abdullah Ammar Karcıoğlu , Bahattin Tanyolaç , Hasan Bulut

https://doi.org/10.21597/jist.1404898

Abstract

Biyoinformatik, biyolojik verilerin analizi ve kalıtsal ilişkilerin ortaya çıkarılması için matematik, biyoloji ve bilgisayar bilimlerini birleştiren bir disiplindir. Bu alandaki en kritik görevlerden biri, biyolojik dizilerin hizalanmasıyla ilgili olan dizi hizalama problemini çözmektir. Ancak, biyolojik verilerin hızla artması, bu problemi manuel olarak çözülemez hale getirmiş ve bilgisayar sistemlerinin biyoinformatikte daha yaygın bir şekilde kullanılmasına yol açmıştır. Bu çalışmada, mevcut Clustal algoritması ve benzetimli tavlama algoritması kullanılarak yeni bir dizi hizalama algoritması önerilmiştir. Clustal algoritmasının hız avantajını kullanarak ve benzetimli tavlama algoritmasını entegre ederek, Clustal'ın aç gözlü yaklaşımından uzaklaşılarak optimal hizalama skoru elde etmek amaçlanmıştır. Geliştirilen algoritmanın başarısını değerlendirmek için SP (Çiftlerin Toplamı) puanlama sistemi kullanılmış ve hizalama sonucunda sütun eşleşme sayısı dikkate alınmıştır. Elde edilen sonuçlar, geliştirilen algoritmanın aynı uzunluktaki dizi veri kümeleri üzerinde ClustalW programından daha iyi performans gösterdiğini, MUSCLE programına göre ise bazı veri setlerinde daha başarılı olduğu veya yakın sonuçlar verdiğini ortaya koymuştur. Bu gelişme, biyoinformatik alanında dizi hizalama problemini çözmek için yeni ve daha etkili bir yaklaşımın potansiyelini vurgulamaktadır. Gelecekte, bu tür geliştirmelerin biyolojik veri analizi alanında daha geniş bir uygulama alanı bulabileceği düşünülmektedir.

Keywords

Çoklu dizi hizalama, Sezgisel Yaklaşım, Clustal Algoritması, Benzetimli Tavlama, Biyoinformatik

Supporting Institution

Ege Üniversitesi Bilimsel Araştırma Projeleri Koordinatörlüğü

Project Number

FOA-2020-20981

Thanks

Bu çalışma Ege Üniversitesi BAP koordinatörlüğü tarafından FOA-2020-20981 kodlu “Cicer ve Lens türlerinin kloroplast DNA dizilerinin Next Generation Sequencing ile sekanslanması ve genom organizasyonlarının belirlenerek karşılaştırmalı genom analizlerinin gerçekleştirilmesi” isimli proje kapsamında desteklenmiştir.

References

Aarts, E. H., & van Laarhoven, P. J. (1987). Simulated annealing: a pedestrian review of the theory and some applications. In Pattern recognition theory and applications (pp. 179-192). Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-83069-3_15
Aktan, M. N., & Bulut, H. (2022). Metaheuristic task scheduling algorithms for cloud computing environments. Concurrency and Computation: Practice and Experience, 34(9), e6513. Doi: 10.1002/cpe.6513
Botta, M., & Negro, G. (2010). Multiple sequence alignment with genetic algorithms. In Computational Intelligence Methods for Bioinformatics and Biostatistics: 6th International Meeting, CIBB 2009, Genoa, Italy, October 15-17, 2009. Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-14571-1_15
Bucak, İ. Ö., & Uslan, V. (2011). Sequence alignment from the perspective of stochastic optimization: a survey. Turkish Journal of Electrical Engineering and Computer Sciences, 19(1), 157-173. Doi: 10.3906/elk-1002-410
Chen J.T, Chao J.N, Liu H., Yang F.L., Zou Q. & Tang F.R, (2023) WMSA 2: a multiple DNA/RNA Sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies, Briefings in Bioinformatics, Volume: 24, Issue:4, Doi :10.1093/bib/bbad190
Cohen, J. (2004). Bioinformatics—an introduction for computer scientists. ACM Computing Surveys (CSUR), 36(2), 122-158. Doi: 10.1145/1031120.1031122
Diamantis, S., & Anna, C. (2005). Comparison of multiple sequence alignment programs. National and Kapodistrian university of Athens.
Doğan, H., & Otu, H., (2014) Multiple Sequence Alignment Methods: Objevtive Function, Springer Protocols, Chepter 3, 44-85p
Edgar, R.C., (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, 200, vol.32, No.5, DOI: 10.1093/nar/gkh340
Edgar, R.C., Batzoglou, S., (2006) Multiple sequence alignmnet, Current Opinion in Structural Biology 2006, 16:368–373, Elsevier
Haque, W., Aravind, A., & Reddy, B. (2009, March). Pairwise sequence alignment algorithms: a survey. In Proceedings of the 2009 conference on Information Science, Technology and Applications (pp. 96-103). Doi: 10.1145/1551950.1551980
Karcıoğlu, A. A., & Bulut, H. (2022). DNA sekansları için q-gram hash karşılaştırmasına dayalı çoklu kesin dizi eşleştirme algoritması. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 38(2), 875-888. Doi: 10.17341/gazimmfd.951157
Karcioglu, A. A., & Bulut, H. (2021). Improving hash-q exact string matching algorithm with perfect hashing for DNA sequences. Computers in Biology and Medicine, 131, 104292. Doi: 10.1016/j.compbiomed.2021.104292
Karcioglu, A. A., & Bulut, H. (2021). The WM-q multiple exact string matching algorithm for DNA sequences. Computers in Biology and Medicine, 136, 104656. Doi: 10.1016/j.compbiomed.2021.104656
Karcioglu, A. A., & Bulut, H. (2022). q‐frame hash comparison based exact string matching algorithms for DNA sequences. Concurrency and Computation: Practice and Experience, 34(9), e6505. Doi: 10.1002/cpe.6505
Lee, Z. J., Su, S. F., Chuang, C. C., & Liu, K. H. (2008). Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Applied Soft Computing, 8(1), 55-78. Doi: 10.1016/j.asoc.2006.10.012
Likic, V. (2008). The Needleman-Wunsch algorithm for sequence alignment. Lecture given at the 7th Melbourne Bioinformatics Course, Bi021 Molecular Science and Biotechnology Institute, University of Melbourne, 1-46.
Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? An introduction and overview. Yearbook of medical informatics, 10(01), 83-100. Doi: 10.1055/s-0038-1638103
Major Differences, Difference between Global and Local Sequence Alignment, https://www.majordifferences.com/2016/05/differencebetween-global-and-local.html, Access Date: 28.12.2022.
Mirjalili, S. (2019). Evolutionary algorithms and neural networks. In Studies in computational intelligence (Vol. 780). Berlin/Heidelberg, Germany: Springer.
Omar, M.F., Salam, R.A., Abdullah, R. & Rashid, N.A (2004). Multiple Sequence Alignment Using Optimization Algorithms, International Journal of Computational Intelligence Volume 1 Number 2
Pais, F. S. M., Ruy, P. D. C., Oliveira, G., & Coimbra, R. S. (2014). Assessing the efficiency of multiple sequence alignment programs. Algorithms for molecular biology, 9, 1-8. Doi: 10.1186/1748-7188-9-4
Paruchuri T., Kancharla G.R. & Dara S. (2023). Solving multiple sequence alignment problems by using a swarm intelligent optimization based approach, International Journal of Electrical and Computer Engineering (IJECE), Vol. 13, No. 1, February 2023, pp. 1097-1104
Russel and Norvig. (2010). Artificial intelligence: a modern approach, Global Edition. Harlow, Essex, England: Pearson Educatio.
Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of molecular biology, 147(1), 195-197.

Multiple Sequence Alignment for DNA Sequences by Using Meta-Heuristic Based Clustal-SA Algorithm

Year 2024, Volume: 14 Issue: 2, 544 - 562, 01.06.2024

Hatic Erdirik , Abdullah Ammar Karcıoğlu , Bahattin Tanyolaç , Hasan Bulut

https://doi.org/10.21597/jist.1404898

Abstract

Bioinformatics is a discipline that combines mathematics, biology and computer science to analyze biological data and reveal genetic relationships. One of the most critical tasks in this field is to solve the sequence alignment problem, which is related to the alignment of biological sequences. However, the rapid increase in biological data has made this problem unsolvable manually and led to the more widespread use of computer systems in bioinformatics. In this study, a new sequence alignment algorithm is proposed using the existing Clustal algorithm and simulated annealing algorithm. By using the speed advantage of the Clustal algorithm and integrating the simulated annealing algorithm, it is aimed to obtain an optimal alignment score by moving away from the greedy approach of Clustal. To evaluate the success of the developed algorithm, the SP (Sum of Pairs) scoring system was used and the number of column matches as a result of the alignment was taken into account. The results obtained revealed that the developed algorithm performed better than the ClustalW program on sequence data sets of the same length, and was more successful or gave similar results on some data sets compared to the MUSCLE program. This development highlights the potential of a new and more effective approach to solving the sequence alignment problem in bioinformatics. It is thought that in the future, such developments may find wider application in the field of biological data analysis.

Keywords

Multiple Sequence Alignment, Heuristic Approach, Clustal Algorithm, Simulated Annealing, Bioinformatics

Supporting Institution

Ege University Scientific Research Projects Coordinatorship

Project Number

FOA-2020-20981

Thanks

This study was supported by Ege University BAP coordinatorship within the scope of the project titled "Sequencing of chloroplast DNA sequences of Cicer and Lens species by Next Generation Sequencing and performing comparative genome analysis by determining genome organizations" with the code FOA-2020-20981.

References

Aarts, E. H., & van Laarhoven, P. J. (1987). Simulated annealing: a pedestrian review of the theory and some applications. In Pattern recognition theory and applications (pp. 179-192). Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-83069-3_15
Aktan, M. N., & Bulut, H. (2022). Metaheuristic task scheduling algorithms for cloud computing environments. Concurrency and Computation: Practice and Experience, 34(9), e6513. Doi: 10.1002/cpe.6513
Botta, M., & Negro, G. (2010). Multiple sequence alignment with genetic algorithms. In Computational Intelligence Methods for Bioinformatics and Biostatistics: 6th International Meeting, CIBB 2009, Genoa, Italy, October 15-17, 2009. Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-14571-1_15
Bucak, İ. Ö., & Uslan, V. (2011). Sequence alignment from the perspective of stochastic optimization: a survey. Turkish Journal of Electrical Engineering and Computer Sciences, 19(1), 157-173. Doi: 10.3906/elk-1002-410
Chen J.T, Chao J.N, Liu H., Yang F.L., Zou Q. & Tang F.R, (2023) WMSA 2: a multiple DNA/RNA Sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies, Briefings in Bioinformatics, Volume: 24, Issue:4, Doi :10.1093/bib/bbad190
Cohen, J. (2004). Bioinformatics—an introduction for computer scientists. ACM Computing Surveys (CSUR), 36(2), 122-158. Doi: 10.1145/1031120.1031122
Diamantis, S., & Anna, C. (2005). Comparison of multiple sequence alignment programs. National and Kapodistrian university of Athens.
Doğan, H., & Otu, H., (2014) Multiple Sequence Alignment Methods: Objevtive Function, Springer Protocols, Chepter 3, 44-85p
Edgar, R.C., (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, 200, vol.32, No.5, DOI: 10.1093/nar/gkh340
Edgar, R.C., Batzoglou, S., (2006) Multiple sequence alignmnet, Current Opinion in Structural Biology 2006, 16:368–373, Elsevier
Haque, W., Aravind, A., & Reddy, B. (2009, March). Pairwise sequence alignment algorithms: a survey. In Proceedings of the 2009 conference on Information Science, Technology and Applications (pp. 96-103). Doi: 10.1145/1551950.1551980
Karcıoğlu, A. A., & Bulut, H. (2022). DNA sekansları için q-gram hash karşılaştırmasına dayalı çoklu kesin dizi eşleştirme algoritması. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 38(2), 875-888. Doi: 10.17341/gazimmfd.951157
Karcioglu, A. A., & Bulut, H. (2021). Improving hash-q exact string matching algorithm with perfect hashing for DNA sequences. Computers in Biology and Medicine, 131, 104292. Doi: 10.1016/j.compbiomed.2021.104292
Karcioglu, A. A., & Bulut, H. (2021). The WM-q multiple exact string matching algorithm for DNA sequences. Computers in Biology and Medicine, 136, 104656. Doi: 10.1016/j.compbiomed.2021.104656
Karcioglu, A. A., & Bulut, H. (2022). q‐frame hash comparison based exact string matching algorithms for DNA sequences. Concurrency and Computation: Practice and Experience, 34(9), e6505. Doi: 10.1002/cpe.6505
Lee, Z. J., Su, S. F., Chuang, C. C., & Liu, K. H. (2008). Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Applied Soft Computing, 8(1), 55-78. Doi: 10.1016/j.asoc.2006.10.012
Likic, V. (2008). The Needleman-Wunsch algorithm for sequence alignment. Lecture given at the 7th Melbourne Bioinformatics Course, Bi021 Molecular Science and Biotechnology Institute, University of Melbourne, 1-46.
Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? An introduction and overview. Yearbook of medical informatics, 10(01), 83-100. Doi: 10.1055/s-0038-1638103
Major Differences, Difference between Global and Local Sequence Alignment, https://www.majordifferences.com/2016/05/differencebetween-global-and-local.html, Access Date: 28.12.2022.
Mirjalili, S. (2019). Evolutionary algorithms and neural networks. In Studies in computational intelligence (Vol. 780). Berlin/Heidelberg, Germany: Springer.
Omar, M.F., Salam, R.A., Abdullah, R. & Rashid, N.A (2004). Multiple Sequence Alignment Using Optimization Algorithms, International Journal of Computational Intelligence Volume 1 Number 2
Pais, F. S. M., Ruy, P. D. C., Oliveira, G., & Coimbra, R. S. (2014). Assessing the efficiency of multiple sequence alignment programs. Algorithms for molecular biology, 9, 1-8. Doi: 10.1186/1748-7188-9-4
Paruchuri T., Kancharla G.R. & Dara S. (2023). Solving multiple sequence alignment problems by using a swarm intelligent optimization based approach, International Journal of Electrical and Computer Engineering (IJECE), Vol. 13, No. 1, February 2023, pp. 1097-1104
Russel and Norvig. (2010). Artificial intelligence: a modern approach, Global Edition. Harlow, Essex, England: Pearson Educatio.
Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of molecular biology, 147(1), 195-197.

There are 25 citations in total.

Details

Primary Language	Turkish
Subjects	Computer Software
Journal Section	Bilgisayar Mühendisliği / Computer Engineering
Authors	Hatic Erdirik 0000-0002-5816-4804 Abdullah Ammar Karcıoğlu 0000-0002-0907-751X Bahattin Tanyolaç 0000-0002-4368-0988 Hasan Bulut 0000-0002-4872-5698
Project Number	FOA-2020-20981
Early Pub Date	May 28, 2024
Publication Date	June 1, 2024
Submission Date	December 14, 2023
Acceptance Date	April 4, 2024
Published in Issue	Year 2024 Volume: 14 Issue: 2

Cite

APA	Erdirik, H., Karcıoğlu, A. A., Tanyolaç, B., Bulut, H. (2024). Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama. Journal of the Institute of Science and Technology, 14(2), 544-562. https://doi.org/10.21597/jist.1404898
AMA	Erdirik H, Karcıoğlu AA, Tanyolaç B, Bulut H. Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama. J. Inst. Sci. and Tech. June 2024;14(2):544-562. doi:10.21597/jist.1404898
Chicago	Erdirik, Hatic, Abdullah Ammar Karcıoğlu, Bahattin Tanyolaç, and Hasan Bulut. “Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama”. Journal of the Institute of Science and Technology 14, no. 2 (June 2024): 544-62. https://doi.org/10.21597/jist.1404898.
EndNote	Erdirik H, Karcıoğlu AA, Tanyolaç B, Bulut H (June 1, 2024) Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama. Journal of the Institute of Science and Technology 14 2 544–562.
IEEE	H. Erdirik, A. A. Karcıoğlu, B. Tanyolaç, and H. Bulut, “Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama”, J. Inst. Sci. and Tech., vol. 14, no. 2, pp. 544–562, 2024, doi: 10.21597/jist.1404898.
ISNAD	Erdirik, Hatic et al. “Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama”. Journal of the Institute of Science and Technology 14/2 (June 2024), 544-562. https://doi.org/10.21597/jist.1404898.
JAMA	Erdirik H, Karcıoğlu AA, Tanyolaç B, Bulut H. Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama. J. Inst. Sci. and Tech. 2024;14:544–562.
MLA	Erdirik, Hatic et al. “Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama”. Journal of the Institute of Science and Technology, vol. 14, no. 2, 2024, pp. 544-62, doi:10.21597/jist.1404898.
Vancouver	Erdirik H, Karcıoğlu AA, Tanyolaç B, Bulut H. Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama. J. Inst. Sci. and Tech. 2024;14(2):544-62.

Download Cover Image

Article Files

Full Text