Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama
Year 2024,
Volume: 14 Issue: 2, 544 - 562, 01.06.2024
Hatic Erdirik
,
Abdullah Ammar Karcıoğlu
,
Bahattin Tanyolaç
,
Hasan Bulut
Abstract
Biyoinformatik, biyolojik verilerin analizi ve kalıtsal ilişkilerin ortaya çıkarılması için matematik, biyoloji ve bilgisayar bilimlerini birleştiren bir disiplindir. Bu alandaki en kritik görevlerden biri, biyolojik dizilerin hizalanmasıyla ilgili olan dizi hizalama problemini çözmektir. Ancak, biyolojik verilerin hızla artması, bu problemi manuel olarak çözülemez hale getirmiş ve bilgisayar sistemlerinin biyoinformatikte daha yaygın bir şekilde kullanılmasına yol açmıştır. Bu çalışmada, mevcut Clustal algoritması ve benzetimli tavlama algoritması kullanılarak yeni bir dizi hizalama algoritması önerilmiştir. Clustal algoritmasının hız avantajını kullanarak ve benzetimli tavlama algoritmasını entegre ederek, Clustal'ın aç gözlü yaklaşımından uzaklaşılarak optimal hizalama skoru elde etmek amaçlanmıştır. Geliştirilen algoritmanın başarısını değerlendirmek için SP (Çiftlerin Toplamı) puanlama sistemi kullanılmış ve hizalama sonucunda sütun eşleşme sayısı dikkate alınmıştır. Elde edilen sonuçlar, geliştirilen algoritmanın aynı uzunluktaki dizi veri kümeleri üzerinde ClustalW programından daha iyi performans gösterdiğini, MUSCLE programına göre ise bazı veri setlerinde daha başarılı olduğu veya yakın sonuçlar verdiğini ortaya koymuştur. Bu gelişme, biyoinformatik alanında dizi hizalama problemini çözmek için yeni ve daha etkili bir yaklaşımın potansiyelini vurgulamaktadır. Gelecekte, bu tür geliştirmelerin biyolojik veri analizi alanında daha geniş bir uygulama alanı bulabileceği düşünülmektedir.
Supporting Institution
Ege Üniversitesi Bilimsel Araştırma Projeleri Koordinatörlüğü
Project Number
FOA-2020-20981
Thanks
Bu çalışma Ege Üniversitesi BAP koordinatörlüğü tarafından FOA-2020-20981 kodlu “Cicer ve Lens türlerinin kloroplast DNA dizilerinin Next Generation Sequencing ile sekanslanması ve genom organizasyonlarının belirlenerek karşılaştırmalı genom analizlerinin gerçekleştirilmesi” isimli proje kapsamında desteklenmiştir.
References
- Aarts, E. H., & van Laarhoven, P. J. (1987). Simulated annealing: a pedestrian review of the theory and some applications. In Pattern recognition theory and applications (pp. 179-192). Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-83069-3_15
- Aktan, M. N., & Bulut, H. (2022). Metaheuristic task scheduling algorithms for cloud computing environments. Concurrency and Computation: Practice and Experience, 34(9), e6513. Doi: 10.1002/cpe.6513
- Botta, M., & Negro, G. (2010). Multiple sequence alignment with genetic algorithms. In Computational Intelligence Methods for Bioinformatics and Biostatistics: 6th International Meeting, CIBB 2009, Genoa, Italy, October 15-17, 2009. Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-14571-1_15
- Bucak, İ. Ö., & Uslan, V. (2011). Sequence alignment from the perspective of stochastic optimization: a survey. Turkish Journal of Electrical Engineering and Computer Sciences, 19(1), 157-173. Doi: 10.3906/elk-1002-410
- Chen J.T, Chao J.N, Liu H., Yang F.L., Zou Q. & Tang F.R, (2023) WMSA 2: a multiple DNA/RNA Sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies, Briefings in Bioinformatics, Volume: 24, Issue:4, Doi :10.1093/bib/bbad190
- Cohen, J. (2004). Bioinformatics—an introduction for computer scientists. ACM Computing Surveys (CSUR), 36(2), 122-158. Doi: 10.1145/1031120.1031122
- Diamantis, S., & Anna, C. (2005). Comparison of multiple sequence alignment programs. National and Kapodistrian university of Athens.
- Doğan, H., & Otu, H., (2014) Multiple Sequence Alignment Methods: Objevtive Function, Springer Protocols, Chepter 3, 44-85p
- Edgar, R.C., (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, 200, vol.32, No.5, DOI: 10.1093/nar/gkh340
- Edgar, R.C., Batzoglou, S., (2006) Multiple sequence alignmnet, Current Opinion in Structural Biology 2006, 16:368–373, Elsevier
- Haque, W., Aravind, A., & Reddy, B. (2009, March). Pairwise sequence alignment algorithms: a survey. In Proceedings of the 2009 conference on Information Science, Technology and Applications (pp. 96-103). Doi: 10.1145/1551950.1551980
- Karcıoğlu, A. A., & Bulut, H. (2022). DNA sekansları için q-gram hash karşılaştırmasına dayalı çoklu kesin dizi eşleştirme algoritması. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 38(2), 875-888. Doi: 10.17341/gazimmfd.951157
- Karcioglu, A. A., & Bulut, H. (2021). Improving hash-q exact string matching algorithm with perfect hashing for DNA sequences. Computers in Biology and Medicine, 131, 104292. Doi: 10.1016/j.compbiomed.2021.104292
- Karcioglu, A. A., & Bulut, H. (2021). The WM-q multiple exact string matching algorithm for DNA sequences. Computers in Biology and Medicine, 136, 104656. Doi: 10.1016/j.compbiomed.2021.104656
- Karcioglu, A. A., & Bulut, H. (2022). q‐frame hash comparison based exact string matching algorithms for DNA sequences. Concurrency and Computation: Practice and Experience, 34(9), e6505. Doi: 10.1002/cpe.6505
- Lee, Z. J., Su, S. F., Chuang, C. C., & Liu, K. H. (2008). Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Applied Soft Computing, 8(1), 55-78. Doi: 10.1016/j.asoc.2006.10.012
- Likic, V. (2008). The Needleman-Wunsch algorithm for sequence alignment. Lecture given at the 7th Melbourne Bioinformatics Course, Bi021 Molecular Science and Biotechnology Institute, University of Melbourne, 1-46.
- Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? An introduction and overview. Yearbook of medical informatics, 10(01), 83-100. Doi: 10.1055/s-0038-1638103
- Major Differences, Difference between Global and Local Sequence Alignment, https://www.majordifferences.com/2016/05/differencebetween-global-and-local.html, Access Date: 28.12.2022.
- Mirjalili, S. (2019). Evolutionary algorithms and neural networks. In Studies in computational intelligence (Vol. 780). Berlin/Heidelberg, Germany: Springer.
- Omar, M.F., Salam, R.A., Abdullah, R. & Rashid, N.A (2004). Multiple Sequence Alignment Using Optimization Algorithms, International Journal of Computational Intelligence Volume 1 Number 2
- Pais, F. S. M., Ruy, P. D. C., Oliveira, G., & Coimbra, R. S. (2014). Assessing the efficiency of multiple sequence alignment programs. Algorithms for molecular biology, 9, 1-8. Doi: 10.1186/1748-7188-9-4
- Paruchuri T., Kancharla G.R. & Dara S. (2023). Solving multiple sequence alignment problems by using a swarm intelligent optimization based approach, International Journal of Electrical and Computer Engineering (IJECE), Vol. 13, No. 1, February 2023, pp. 1097-1104
- Russel and Norvig. (2010). Artificial intelligence: a modern approach, Global Edition. Harlow, Essex, England: Pearson Educatio.
- Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of molecular biology, 147(1), 195-197.
Multiple Sequence Alignment for DNA Sequences by Using Meta-Heuristic Based Clustal-SA Algorithm
Year 2024,
Volume: 14 Issue: 2, 544 - 562, 01.06.2024
Hatic Erdirik
,
Abdullah Ammar Karcıoğlu
,
Bahattin Tanyolaç
,
Hasan Bulut
Abstract
Bioinformatics is a discipline that combines mathematics, biology and computer science to analyze biological data and reveal genetic relationships. One of the most critical tasks in this field is to solve the sequence alignment problem, which is related to the alignment of biological sequences. However, the rapid increase in biological data has made this problem unsolvable manually and led to the more widespread use of computer systems in bioinformatics. In this study, a new sequence alignment algorithm is proposed using the existing Clustal algorithm and simulated annealing algorithm. By using the speed advantage of the Clustal algorithm and integrating the simulated annealing algorithm, it is aimed to obtain an optimal alignment score by moving away from the greedy approach of Clustal. To evaluate the success of the developed algorithm, the SP (Sum of Pairs) scoring system was used and the number of column matches as a result of the alignment was taken into account. The results obtained revealed that the developed algorithm performed better than the ClustalW program on sequence data sets of the same length, and was more successful or gave similar results on some data sets compared to the MUSCLE program. This development highlights the potential of a new and more effective approach to solving the sequence alignment problem in bioinformatics. It is thought that in the future, such developments may find wider application in the field of biological data analysis.
Supporting Institution
Ege University Scientific Research Projects Coordinatorship
Project Number
FOA-2020-20981
Thanks
This study was supported by Ege University BAP coordinatorship within the scope of the project titled "Sequencing of chloroplast DNA sequences of Cicer and Lens species by Next Generation Sequencing and performing comparative genome analysis by determining genome organizations" with the code FOA-2020-20981.
References
- Aarts, E. H., & van Laarhoven, P. J. (1987). Simulated annealing: a pedestrian review of the theory and some applications. In Pattern recognition theory and applications (pp. 179-192). Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-83069-3_15
- Aktan, M. N., & Bulut, H. (2022). Metaheuristic task scheduling algorithms for cloud computing environments. Concurrency and Computation: Practice and Experience, 34(9), e6513. Doi: 10.1002/cpe.6513
- Botta, M., & Negro, G. (2010). Multiple sequence alignment with genetic algorithms. In Computational Intelligence Methods for Bioinformatics and Biostatistics: 6th International Meeting, CIBB 2009, Genoa, Italy, October 15-17, 2009. Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-14571-1_15
- Bucak, İ. Ö., & Uslan, V. (2011). Sequence alignment from the perspective of stochastic optimization: a survey. Turkish Journal of Electrical Engineering and Computer Sciences, 19(1), 157-173. Doi: 10.3906/elk-1002-410
- Chen J.T, Chao J.N, Liu H., Yang F.L., Zou Q. & Tang F.R, (2023) WMSA 2: a multiple DNA/RNA Sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies, Briefings in Bioinformatics, Volume: 24, Issue:4, Doi :10.1093/bib/bbad190
- Cohen, J. (2004). Bioinformatics—an introduction for computer scientists. ACM Computing Surveys (CSUR), 36(2), 122-158. Doi: 10.1145/1031120.1031122
- Diamantis, S., & Anna, C. (2005). Comparison of multiple sequence alignment programs. National and Kapodistrian university of Athens.
- Doğan, H., & Otu, H., (2014) Multiple Sequence Alignment Methods: Objevtive Function, Springer Protocols, Chepter 3, 44-85p
- Edgar, R.C., (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, 200, vol.32, No.5, DOI: 10.1093/nar/gkh340
- Edgar, R.C., Batzoglou, S., (2006) Multiple sequence alignmnet, Current Opinion in Structural Biology 2006, 16:368–373, Elsevier
- Haque, W., Aravind, A., & Reddy, B. (2009, March). Pairwise sequence alignment algorithms: a survey. In Proceedings of the 2009 conference on Information Science, Technology and Applications (pp. 96-103). Doi: 10.1145/1551950.1551980
- Karcıoğlu, A. A., & Bulut, H. (2022). DNA sekansları için q-gram hash karşılaştırmasına dayalı çoklu kesin dizi eşleştirme algoritması. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 38(2), 875-888. Doi: 10.17341/gazimmfd.951157
- Karcioglu, A. A., & Bulut, H. (2021). Improving hash-q exact string matching algorithm with perfect hashing for DNA sequences. Computers in Biology and Medicine, 131, 104292. Doi: 10.1016/j.compbiomed.2021.104292
- Karcioglu, A. A., & Bulut, H. (2021). The WM-q multiple exact string matching algorithm for DNA sequences. Computers in Biology and Medicine, 136, 104656. Doi: 10.1016/j.compbiomed.2021.104656
- Karcioglu, A. A., & Bulut, H. (2022). q‐frame hash comparison based exact string matching algorithms for DNA sequences. Concurrency and Computation: Practice and Experience, 34(9), e6505. Doi: 10.1002/cpe.6505
- Lee, Z. J., Su, S. F., Chuang, C. C., & Liu, K. H. (2008). Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Applied Soft Computing, 8(1), 55-78. Doi: 10.1016/j.asoc.2006.10.012
- Likic, V. (2008). The Needleman-Wunsch algorithm for sequence alignment. Lecture given at the 7th Melbourne Bioinformatics Course, Bi021 Molecular Science and Biotechnology Institute, University of Melbourne, 1-46.
- Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? An introduction and overview. Yearbook of medical informatics, 10(01), 83-100. Doi: 10.1055/s-0038-1638103
- Major Differences, Difference between Global and Local Sequence Alignment, https://www.majordifferences.com/2016/05/differencebetween-global-and-local.html, Access Date: 28.12.2022.
- Mirjalili, S. (2019). Evolutionary algorithms and neural networks. In Studies in computational intelligence (Vol. 780). Berlin/Heidelberg, Germany: Springer.
- Omar, M.F., Salam, R.A., Abdullah, R. & Rashid, N.A (2004). Multiple Sequence Alignment Using Optimization Algorithms, International Journal of Computational Intelligence Volume 1 Number 2
- Pais, F. S. M., Ruy, P. D. C., Oliveira, G., & Coimbra, R. S. (2014). Assessing the efficiency of multiple sequence alignment programs. Algorithms for molecular biology, 9, 1-8. Doi: 10.1186/1748-7188-9-4
- Paruchuri T., Kancharla G.R. & Dara S. (2023). Solving multiple sequence alignment problems by using a swarm intelligent optimization based approach, International Journal of Electrical and Computer Engineering (IJECE), Vol. 13, No. 1, February 2023, pp. 1097-1104
- Russel and Norvig. (2010). Artificial intelligence: a modern approach, Global Edition. Harlow, Essex, England: Pearson Educatio.
- Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of molecular biology, 147(1), 195-197.