Solution of Behrens-Fisher Problem For High-Dimensional Data and Comparison of Proposed Test Statistics
Year 2021, Volume: 16 Issue: 2, 397 - 415, 25.11.2021
Mehmet Sandal, Zeki Yıldız
Abstract
Comparing the mean vectors of more than two groups is generally known as the multivariate analysis of variance (MANOVA) problem. However, the classical test statistics used for MANOVA problems are highly sensitive to violations of their assumptions. Moreover, multivariate test statistics are inadequate when the number of dependent variables exceeds the number of observations. This study compares several proposed test statistics for testing the equality of mean vectors in high-dimensional Behrens-Fisher problems. To this end, a simulation study was carried out for four test statistics under heterogeneous variance-covariance matrices. Three variance-covariance models were used, and different experimental conditions were considered for the number of dependent variables and the number of observations. The results showed that the test statistics generally performed comparably, although their performance varied with the experimental conditions.
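As a rough illustration of the kind of simulation described in the abstract, the following minimal sketch draws two groups with more variables than observations and unequal variances, then averages a simple statistic over many replications. Everything in it is an assumption made for illustration: the abstract does not name the four statistics compared, so the statistic used here (an unbiased estimator of the squared distance between the mean vectors) is not claimed to be one of them. The sketch also uses two groups, independent normal variables with diagonal covariance matrices, and hypothetical function names and parameter values.

```python
import random


def sample_mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance of a list."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def centered_distance_statistic(x1, x2):
    """Compute ||mean1 - mean2||^2 - tr(S1)/n1 - tr(S2)/n2.

    Only the diagonals of the sample covariance matrices S1 and S2 are
    needed, so the statistic stays computable when p exceeds n1 + n2.
    Its expectation is the squared distance between the true mean vectors,
    whether or not the two covariance matrices are equal.
    """
    n1, n2 = len(x1), len(x2)
    p = len(x1[0])
    total = 0.0
    for j in range(p):
        m1, v1 = sample_mean_and_variance([row[j] for row in x1])
        m2, v2 = sample_mean_and_variance([row[j] for row in x2])
        total += (m1 - m2) ** 2 - v1 / n1 - v2 / n2
    return total


def simulate_group(rng, n, means, sds):
    """Draw n observations with independent normal components."""
    return [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(n)]


def average_statistic(shift, reps=400, p=50, n1=10, n2=15, seed=1):
    """Average the statistic over replications (all settings hypothetical).

    Group 1 has standard deviation 1 and group 2 has standard deviation 2 in
    every variable, so the covariance matrices are heterogeneous. `shift` is
    the mean difference applied to every variable of group 2.
    """
    rng = random.Random(seed)
    means1, sds1 = [0.0] * p, [1.0] * p
    means2, sds2 = [shift] * p, [2.0] * p
    values = []
    for _ in range(reps):
        x1 = simulate_group(rng, n1, means1, sds1)
        x2 = simulate_group(rng, n2, means2, sds2)
        values.append(centered_distance_statistic(x1, x2))
    return sum(values) / reps


if __name__ == "__main__":
    # Equal means: the average should be close to 0.
    print("equal means:  ", round(average_statistic(shift=0.0), 3))
    # Shift of 0.5 in each of 50 variables: squared distance is 50 * 0.25 = 12.5.
    print("shifted means:", round(average_statistic(shift=0.5), 3))
```

With 50 variables and only 25 observations in total, the pooled sample covariance matrix is singular and cannot be inverted, so classical statistics that require its inverse are not available. The statistic above avoids the inverse altogether. A full size and power study would additionally need a variance estimate and a reference distribution for the statistic, which this sketch leaves out.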