GPU Performance of Alignment Step in Next Generation Sequencing Analysis
Year 2024, Volume: 10 Issue: 2, 432 - 442, 31.08.2024
Hilal Akarkamçı, Gülistan Özdemir Özdoğan
Abstract
As the volume of biological data has grown, processing it efficiently has become more difficult. This has brought the discipline of bioinformatics to the forefront and increased the need for relevant tools. Making sense of the large amount of data produced by next-generation sequencing requires a precise data analysis process. The most costly step in this process is alignment. One of the most effective techniques for reducing this cost is the use of a graphics processing unit (GPU). In this study, the CPU-based Burrows-Wheeler aligner and BarraCUDA, its GPU version, are compared on different datasets in terms of alignment rates and computation times. In addition to the total runtimes of these tools, the runtimes of the alignment sub-steps with one or more GPUs are examined. The tools achieved similar alignment rates on each dataset, while GPU-supported BarraCUDA provided a significant time benefit across data of different sizes. Overall, using a GPU in the alignment step yielded a speedup of approximately 5 times on single-end data and approximately 9 times on paired-end data.
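The two metrics the abstract compares, alignment rate and speedup, can be written down explicitly. The following is a minimal sketch, assuming speedup is defined as CPU runtime divided by GPU runtime and alignment rate as the percentage of reads that were mapped; the function names and the input numbers are hypothetical placeholders, not measurements from the study.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return cpu_seconds / gpu_seconds


def alignment_rate(mapped_reads: int, total_reads: int) -> float:
    """Return the percentage of reads that were aligned to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


if __name__ == "__main__":
    # Placeholder values chosen only to illustrate the arithmetic.
    print(f"speedup: {speedup(90.0, 10.0):.1f}x")
    print(f"alignment rate: {alignment_rate(95, 100):.1f}%")
```

Under this definition, a reported speedup of about 9 means the CPU run took roughly nine times as long as the GPU run on the same data.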
Acknowledgments
All numerical calculations in this research were performed at the TÜBİTAK ULAKBİM High Performance and Grid Computing Center (TRUBA resources).