Development of a New Supervised Principal Component Analysis Based on Artificial Neural Networks in Gene Expression Data

Mevlüt Türe; İmran Kurt Ömürlü

doi:10.20515/otd.371882

Research Article

Year 2018, Volume 40, Issue 1, pp. 20-27, 22.02.2018

Cited By: 1

Abstract

This study aimed to reduce the dimensionality of high-dimensional gene expression data with supervised principal component analysis (S-PCA) and with supervised principal component analysis based on artificial neural networks (S-ANN-PCA), a new approach proposed here, and to compare the performance of the two methods using random survival forests (RSF). In the simulation study, expression data for 5000 genes were generated for 100 units from a multivariate normal distribution, together with survival times associated with these gene data. The simulation was repeated 1000 times.
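
The simulation design above can be sketched as a single NumPy replicate. The abstract fixes only the dimensions (100 units, 5000 genes), the multivariate normal distribution, the dependence of survival on the genes, and the 1000 repetitions. Everything else here is an illustrative assumption: the block-exchangeable correlation, the exponential hazard driven by a hypothetical set of 50 informative genes, the exponential censoring, and the function name `simulate_replicate`.

```python
import numpy as np


def simulate_replicate(n_units=100, n_genes=5000, block_size=50, rho=0.5,
                       n_informative=50, effect=1.0, base_hazard=0.1,
                       censor_hazard=0.05, seed=None):
    """One hypothetical simulation replicate: gene matrix plus censored survival.

    Genes: multivariate normal, mean 0, unit variance, correlation rho inside
    each block of `block_size` genes and 0 across blocks (assumed structure).
    Survival: exponential event times whose log-hazard depends on the first
    `n_informative` genes; censoring: independent exponential times.
    """
    if n_genes % block_size:
        raise ValueError("n_genes must be a multiple of block_size")
    rng = np.random.default_rng(seed)
    n_blocks = n_genes // block_size
    # One shared latent factor per block; mixing it with independent noise gives an
    # exact multivariate normal with block-exchangeable covariance, without ever
    # building the 5000 x 5000 covariance matrix.
    shared = np.repeat(rng.standard_normal((n_units, n_blocks)), block_size, axis=1)
    noise = rng.standard_normal((n_units, n_genes))
    X = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise

    # Linear predictor: scaled mean of the (hypothetical) informative genes.
    eta = effect * X[:, :n_informative].mean(axis=1)
    # Event times with rate base_hazard * exp(eta); NumPy's exponential takes scale = 1 / rate.
    event_time = rng.exponential(1.0 / (base_hazard * np.exp(eta)))
    censor_time = rng.exponential(1.0 / censor_hazard, size=n_units)
    time = np.minimum(event_time, censor_time)
    event = event_time <= censor_time          # True = death observed, False = censored
    return X, time, event
```

To mirror the 1000 repetitions, the function would be called once per repetition with `seed=r` for `r` in `range(1000)`, and each replicate passed on to the dimension-reduction and RSF steps.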
In addition, gene expression data from 240 patients with diffuse large B-cell lymphoma (DLBCL) were analyzed. Dimension reduction was carried out by selecting important genes with the Wald statistic, and the new data sets produced by each method were analyzed with RSF.
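
The screening-then-PCA step of S-PCA can be sketched in NumPy. This is a minimal sketch under stated assumptions. Each gene is fitted in its own univariate Cox model (Newton-Raphson, Breslow ties), and its Wald chi-square ranks the genes. A hypothetical top-k set is kept, and ordinary principal components of those genes form the reduced data. The abstract names the Wald statistic but not the model details, the cut-off, or the number of components. `cox_wald_statistics` and `supervised_pca` are illustrative names, not code from the article.

```python
import numpy as np


def _reverse_cumsum(a):
    """Sum of each row and every row after it (risk-set sums once sorted by time)."""
    return np.cumsum(a[::-1], axis=0)[::-1]


def cox_wald_statistics(X, time, event, n_iter=25):
    """Univariate Cox Wald chi-square, one per column of X (Breslow ties)."""
    order = np.argsort(time, kind="stable")
    Xs = np.asarray(X, dtype=float)[order]
    t = np.asarray(time, dtype=float)[order]
    d = np.asarray(event, dtype=bool)[order]
    # Risk set of subject i = everyone with time >= t_i, i.e. from the first tied index on.
    first = np.searchsorted(t, t, side="left")

    def score_and_information(beta):
        w = np.exp(Xs * beta)                  # (n, p): exp(beta_g * x_ig) for every gene g
        s0 = _reverse_cumsum(w)[first]
        s1 = _reverse_cumsum(Xs * w)[first]
        s2 = _reverse_cumsum(Xs ** 2 * w)[first]
        xbar = s1[d] / s0[d]                   # risk-set weighted mean at each event
        score = (Xs[d] - xbar).sum(axis=0)
        info = (s2[d] / s0[d] - xbar ** 2).sum(axis=0)
        return score, info

    beta = np.zeros(Xs.shape[1])
    for _ in range(n_iter):
        score, info = score_and_information(beta)
        # Clipping guards against divergence if a gene nearly separates the data.
        beta = np.clip(beta + score / info, -10.0, 10.0)
    _, info = score_and_information(beta)
    return beta ** 2 * info                    # Wald chi-square = (beta_hat / SE)^2


def supervised_pca(X, time, event, n_genes=50, n_components=2):
    """Keep the top `n_genes` genes by Wald statistic (assumed rule), then run PCA.

    Returns component scores, indices of retained genes, and the share of the
    retained genes' variance explained by each returned component.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # assumes no constant genes
    wald = cox_wald_statistics(Z, time, event)
    keep = np.argsort(wald)[::-1][:n_genes]
    U, S, _ = np.linalg.svd(Z[:, keep], full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return scores, keep, explained
```

The `scores` matrix is the kind of reduced data set that would then be passed to RSF. In S-ANN-PCA, the SVD step would presumably be replaced by a neural-network reduction of the same retained genes. The abstract does not specify that network, so it is not sketched here.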
In the simulation study, the explanatory power of S-PCA differed significantly from that of S-ANN-PCA (p<0.001). In the DLBCL application, the RSF error rate was 36.78% for S-PCA and 43% for S-ANN-PCA; S-PCA also had the higher importance value. Overall, S-PCA performed better than S-ANN-PCA in analyzing high-dimensional gene expression data.
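
RSF as implemented in the R package randomForestSRC reports survival prediction error by default as 1 - C, where C is Harrell's concordance index. If that is the measure used here (the abstract does not say), the reported errors of 36.78% and 43% correspond to C of about 0.632 and 0.57. A minimal NumPy version of Harrell's C, with hypothetical function names, is sketched below.

```python
import numpy as np


def harrell_c(time, event, risk):
    """Harrell's concordance index (C); a higher `risk` should mean shorter survival.

    A pair (i, j) is usable when subject i had an observed event and
    time[i] < time[j]. Pairs with tied times are skipped (one of several
    conventions); tied risk scores count as half-concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    usable = event[:, None] & (time[:, None] < time[None, :])
    concordant = usable & (risk[:, None] > risk[None, :])
    tied_risk = usable & (risk[:, None] == risk[None, :])
    n_usable = usable.sum()
    if n_usable == 0:
        raise ValueError("no usable pairs: need an event before some later time")
    return (concordant.sum() + 0.5 * tied_risk.sum()) / n_usable


def error_rate(time, event, risk):
    """Prediction error in the 1 - C convention."""
    return 1.0 - harrell_c(time, event, risk)


# Four subjects, the third censored; risks ordered exactly opposite to survival time.
print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [4, 3, 2, 1]))  # 1.0
```

Under this convention an uninformative model scores C = 0.5 (error 50%), so both reported errors sit between chance and perfect ranking, with S-PCA closer to the latter.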

Keywords

Dimension reduction; Neural networks; Supervised principal component analysis; Random survival forests; Gene expression

Details

Primary Language	English
Subjects	Health Care Administration
Section	Original Articles
Authors	Mevlüt Türe, İmran Kurt Ömürlü
Publication Date	22 February 2018
Issue	Year 2018, Volume 40, Issue 1

Cite

Vancouver	Türe M, Kurt Ömürlü İ. Development of a New Supervised Principal Component Analysis Based on Artificial Neural Networks in Gene Expression Data. Osmangazi Tıp Dergisi. 2018;40(1):20-7.

Cited By

Survival Prediction with Extreme Learning Machine, Supervised Principal Components and Regularized Cox Models in High-Dimensional Survival Data by Simulation

Gazi University Journal of Science

https://doi.org/10.35378/gujs.1223015
