Protein complex detection from protein-protein interaction networks with machine learning methods
Year 2024, Volume: 30 Issue: 3, 333 - 342, 29.06.2024
Yasin Karakuş, Volkan Altuntaş
Abstract
Protein-protein interaction (PPI) networks show the interactions between proteins involved in tasks that are vital to organisms, such as structural support, storage, signal transduction and defence; understanding these networks leads to a better understanding of cellular processes. One important line of work toward this goal is the detection of protein complexes from PPI networks. Both supervised and unsupervised machine learning methods have been used to detect protein complexes, and such methods are known to perform better when several are combined. Based on this, the present study proposes a method that detects protein complexes from PPI networks. The method first weights the PPI network using biological and topological properties of proteins. It then estimates local and global protein complex cores. Finally, it builds a protein complex detection model using the structural modularity of proteins and a voting regression model. We expect that the supervised learning methods XGB regression, Gaussian process regression, CatBoost regression and histogram-based gradient boosting regression achieve better results when used together in the voting regression model. In comparisons with other models, the proposed model achieved the best performance in many cases.
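To make the voting step concrete, here is a minimal sketch of how a voting regression model combines its members: each base regressor is fitted on the same data, and the ensemble prediction is the (optionally weighted) average of their predictions. This is not the authors' implementation. The three base learners are deliberately simple stand-ins for the four regressors named in the abstract, and the feature (`density` of a candidate subgraph), the target (a match score against a known complex), and all function names are assumptions made purely for illustration.

```python
from statistics import fmean
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Regressor = Callable[[Vector], float]
Fitter = Callable[[List[Vector], List[float]], Regressor]


def fit_mean(X: List[Vector], y: List[float]) -> Regressor:
    """Baseline stand-in: always predicts the mean training target."""
    mu = fmean(y)
    return lambda x: mu


def fit_nearest_neighbour(X: List[Vector], y: List[float]) -> Regressor:
    """Stand-in: returns the target of the closest training sample."""
    def predict(x: Vector) -> float:
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X]
        return y[dists.index(min(dists))]
    return predict


def fit_linear_1d(X: List[Vector], y: List[float]) -> Regressor:
    """Stand-in: least-squares line fitted on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = fmean(xs), fmean(y)
    var = sum((v - mx) ** 2 for v in xs)
    cov = sum((v - mx) * (t - my) for v, t in zip(xs, y))
    slope = cov / var if var else 0.0
    return lambda x: my + slope * (x[0] - mx)


def fit_voting(
    X: List[Vector],
    y: List[float],
    fitters: Sequence[Fitter],
    weights: Optional[Sequence[float]] = None,
) -> Regressor:
    """Fit every base learner, then predict with their weighted average."""
    models = [fit(X, y) for fit in fitters]
    w = list(weights) if weights is not None else [1.0] * len(models)
    if len(w) != len(models):
        raise ValueError("need exactly one weight per base learner")
    total = sum(w)

    def predict(x: Vector) -> float:
        return sum(wi * m(x) for wi, m in zip(w, models)) / total
    return predict


# Hypothetical toy data: one feature (subgraph density) per candidate
# subgraph, and a made-up match score as the regression target.
density = [[0.0], [1.0], [2.0], [3.0]]
score = [0.0, 1.0, 2.0, 3.0]

fitters = [fit_mean, fit_nearest_neighbour, fit_linear_1d]
ensemble = fit_voting(density, score, fitters)
print(ensemble([3.0]))
```

In practice the stand-ins would be replaced by library implementations of the four regressors; scikit-learn, for example, provides a `VotingRegressor` class that performs the same averaging over fitted estimators.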