Effect of Benchmark Datasets on Protein Structure Prediction As a Concept

Nuh Azgınoğlu

doi:10.31590/ejosat.1014716

Conference Paper

Year 2021, Issue: 29, 117 - 121, 01.12.2021

Abstract

Knowing protein structures is essential for understanding the roles of proteins involved in vital functions, for drug design, and for many other purposes. Protein structure prediction is a subfield of bioinformatics that offers an alternative to laboratory structure determination, which takes a long time. The performance of methods developed in this field is generally analyzed on benchmark datasets, and the size of these datasets directly affects algorithm runtime. This study analyzes how benchmark datasets are reflected in the results. Two benchmark datasets, CB513 and EVASet, and two protein structure prediction methods, JPred and Porter, were used. The study is intended as a source of inspiration for further work on developing benchmark datasets that are comprehensive in terms of protein properties but contain as little data as possible.

Keywords

Protein structure prediction, Benchmark dataset, Concept

Supporting Institution

Kayseri University Scientific Research Projects Unit

Project Number

FHD-2021-1045

Acknowledgements

This study was supported by the Kayseri University Scientific Research Projects Unit under Project Number FHD-2021-1045. We thank the unit for its contributions.

References

Asai, K., Hayamizu, S., & Handa, K. I. (1993). Prediction of protein secondary structure by the hidden Markov model. Bioinformatics, 9(2), 141-146.
Atasever, S., Azgınoglu, N., Erbay, H., & Aydın, Z. (2021). 3-State Protein Secondary Structure Prediction based on SCOPe Classes. Brazilian Archives of Biology and Technology, 64.
Aydin, Z., Azginoglu, N., Bilgin, H. I., & Celik, M. (2019). Developing structural profile matrices for protein secondary structure and solvent accessibility prediction. Bioinformatics, 35(20), 4004-4010.
Azginoglu, N., Aydin, Z., & Celik, M. (2020). Structural profile matrices for predicting structural properties of proteins. Journal of Bioinformatics and Computational Biology, 18(04), 2050022.
Bouziane, H., Messabih, B., & Chouarfia, A. (2015). Effect of simple ensemble methods on protein secondary structure prediction. Soft Computing, 19(6), 1663-1678.
Bujnicki, J. M., Elofsson, A., Fischer, D., & Rychlewski, L. (2001). LiveBench‐1: Continuous benchmarking of protein structure prediction servers. Protein Science, 10(2), 352-361.
Cuff, J. A., & Barton, G. J. (1999). Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Bioinformatics, 34(4), 508-519.
Drozdetskiy, A., Cole, C., Procter, J., & Barton, G. J. (2015). JPred4: a protein secondary structure prediction server. Nucleic Acids Research, 43(W1), W389-W394.
Holley, L. H., & Karplus, M. (1989). Protein secondary structure prediction with a neural network. Proceedings of the National Academy of Sciences, 86(1), 152-156.
Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 292(2), 195-202.
Koh, I. Y., Eyrich, V. A., Marti-Renom, M. A., Przybylski, D., Madhusudhan, M. S., Eswar, N., ... & Rost, B. (2003). EVA: evaluation of protein structure prediction servers. Nucleic Acids Research, 31(13), 3311-3315.
Krishnan, K. V. (1932). The Defence Mechanism of the Human Body. The Indian Medical Gazette, 67(11), 637.
KU, L. L. (1952). Lane medical lectures: proteins and enzymes.
Le, Q., Sievers, F., & Higgins, D. G. (2017). Protein multiple sequence alignment benchmarking through secondary structure prediction. Bioinformatics, 33(9), 1331-1337.
Mirabello, C., & Pollastri, G. (2013). Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics, 29(16), 2056-2058.
Pearson, W. R., & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences, 85(8), 2444-2448.
Pirovano, W., & Heringa, J. (2010). Protein secondary structure prediction. Data Mining Techniques for the Life Sciences, 327-348.
Rost, B., & Eyrich, V. A. (2001). EVA: large‐scale analysis of secondary structure prediction. Proteins: Structure, Function, and Bioinformatics, 45(S5), 192-199.
Silverman, R. B., & Holladay, M. W. (2014). The organic chemistry of drug design and drug action. Academic Press.
Spencer, M., Eickholt, J., & Cheng, J. (2014). A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(1), 103-112.
Van Goudoever, J. B., Vlaardingerbroek, H., van den Akker, C. H., de Groof, F., & van der Schoor, S. R. (2014). Amino acids and proteins. Nutritional Care of Preterm Infants, 110, 49-63.
Zemla, A., Venclovas, Č., Fidelis, K., & Rost, B. (1999). A modified definition of Sov, a segment‐based measure for protein secondary structure prediction assessment. Proteins: Structure, Function, and Bioinformatics, 34(2), 220-223.

There are 22 references in total.

Details

Primary Language	English
Subjects	Engineering
Section	Articles
Authors	Nuh Azgınoğlu 0000-0002-4074-7366
Project Number	FHD-2021-1045
Early Pub Date	15 December 2021
Publication Date	1 December 2021
Published in Issue	Year 2021 Issue: 29

Cite

APA	Azgınoğlu, N. (2021). Effect of Benchmark Datasets on Protein Structure Prediction As a Concept. Avrupa Bilim Ve Teknoloji Dergisi(29), 117-121. https://doi.org/10.31590/ejosat.1014716
