Cloud Technology in the Analysis of Next-Generation Sequencing Data
Year 2022, 1 - 10, 13.06.2022
Sema Karabudak, Meryem Sena Akkuş
Abstract
Next-generation sequencing (NGS) instruments can generate large amounts of data, but their built-in computing and storage capacity is insufficient for large-scale post-sequencing data analysis. Cloud computing infrastructures have become an alternative for addressing the problems of analyzing, storing, and transferring NGS data. Cloud computing gives users access to the computing capacity and IT infrastructure needed to analyze sequencing data, and it eliminates most of the upfront capital expenditure required for bioinformatics infrastructure. This study provides information on the next-generation sequencing method and on the cloud computing platforms used in the analysis of sequencing data.