Research Article

An Alternative Approach to Variable Selection using Regression Modeling in Undersized Sample Data

Year 2020, Volume: 15 Issue: 1, 1 - 12, 03.03.2020

Abstract

The problems encountered in the analysis of undersized-sample data sets arise mainly from the singular covariance structure. As a solution, non-singular Hybrid Covariance Estimators (HCEs) have been proposed in the literature, and several multivariate statistical techniques built on HCEs continue to be developed and introduced. One of these is the Hybrid Regression Model (HRM). Because HCEs remove the rank deficiency of the covariance matrix, HRM can estimate as many regression coefficients as there are variables. However, determining the best predictors in the regression model remains one of the biggest problems for researchers, since the number of variables is large and knowledge about the model is limited. Numerical optimization techniques and strategies are therefore required to explore a solution space in which the number of candidate predictor subsets can reach millions. In this paper, we introduce a new, alternative approach to variable selection for undersized-sample data, using the Genetic Algorithm (GA) with Information Complexity (ICOMP) criteria as the fitness function in HRM analysis. To demonstrate the ability of the proposed method, we carried out a Monte Carlo simulation study with correlated, undersized data sets and compared our method with Elastic Net (EN) modeling. According to the results, the proposed method can be recommended as an alternative approach to variable selection in undersized-sample data.
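The GA-with-ICOMP idea described in the abstract can be illustrated in a few lines of code. The following is a minimal sketch, not the paper's HRM implementation: the hybrid covariance estimator is replaced by a simple ridge-regularized Gram matrix, the penalty uses a simplified C1 covariance-complexity form on the estimated inverse Fisher information, and all function names and GA settings (population size, mutation rate, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def icomp_fitness(X, y, mask, ridge=1e-2):
    """Simplified ICOMP-style score for the predictor subset flagged by `mask`.

    Lack-of-fit term (-2 log-likelihood of the Gaussian regression) plus a
    C1 covariance-complexity penalty on the ridge-stabilized inverse Fisher
    information. The ridge term only stands in for the paper's hybrid
    covariance estimators (HCEs), which are not reproduced here.
    """
    n = len(y)
    Xs = X[:, mask]
    k = Xs.shape[1]
    if k == 0:
        return np.inf
    G = Xs.T @ Xs + ridge * np.eye(k)        # regularized Gram matrix
    beta = np.linalg.solve(G, Xs.T @ y)      # ridge-stabilized LS coefficients
    resid = y - Xs @ beta
    sigma2 = max(resid @ resid / n, 1e-12)
    Finv = sigma2 * np.linalg.inv(G)         # approximate inverse Fisher info
    eig = np.clip(np.linalg.eigvalsh(Finv), 1e-12, None)
    # C1(F^-1) = (k/2) log(tr/k) - (1/2) log|F^-1|
    c1 = (k / 2) * np.log(eig.mean()) - 0.5 * np.log(eig).sum()
    return n * np.log(2 * np.pi * sigma2) + n + 2 * c1

def ga_select(X, y, pop=40, gens=60, pmut=0.05):
    """Binary-chromosome GA: each gene flags whether a predictor enters."""
    p = X.shape[1]
    popu = rng.random((pop, p)) < 0.2        # sparse initial population
    for _ in range(gens):
        scores = np.array([icomp_fitness(X, y, m) for m in popu])
        popu = popu[np.argsort(scores)]
        elite = popu[: pop // 2]             # keep the best half
        # uniform crossover among the elite, then bit-flip mutation
        pa = elite[rng.integers(len(elite), size=pop - len(elite))]
        pb = elite[rng.integers(len(elite), size=pop - len(elite))]
        cross = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        children = cross ^ (rng.random(pa.shape) < pmut)
        popu = np.vstack([elite, children])
    scores = np.array([icomp_fitness(X, y, m) for m in popu])
    return popu[np.argmin(scores)]

# Undersized demo: n = 20 observations, p = 50 candidate predictors,
# only the first three truly active.
n, p = 20, 50
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.standard_normal(n)
best = ga_select(X, y)
print("selected predictors:", np.flatnonzero(best))
```

Because n < p here, ordinary least squares on all predictors is not even defined; the GA searches the 2^50 subset space guided only by the information-complexity score, which is the design choice the abstract argues for.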


Details

Primary Language English
Journal Section TJST
Authors

Esra Pamukçu 0000-0002-5778-9626

Publication Date March 3, 2020
Submission Date December 11, 2019
Published in Issue Year 2020 Volume: 15 Issue: 1

Cite

APA Pamukçu, E. (2020). An Alternative Approach to Variable Selection using Regression Modeling in Undersized Sample Data. Turkish Journal of Science and Technology, 15(1), 1-12.
AMA Pamukçu E. An Alternative Approach to Variable Selection using Regression Modeling in Undersized Sample Data. TJST. March 2020;15(1):1-12.
Chicago Pamukçu, Esra. “An Alternative Approach to Variable Selection Using Regression Modeling in Undersized Sample Data”. Turkish Journal of Science and Technology 15, no. 1 (March 2020): 1-12.
EndNote Pamukçu E (March 1, 2020) An Alternative Approach to Variable Selection using Regression Modeling in Undersized Sample Data. Turkish Journal of Science and Technology 15 1 1–12.
IEEE E. Pamukçu, “An Alternative Approach to Variable Selection using Regression Modeling in Undersized Sample Data”, TJST, vol. 15, no. 1, pp. 1–12, 2020.
ISNAD Pamukçu, Esra. “An Alternative Approach to Variable Selection Using Regression Modeling in Undersized Sample Data”. Turkish Journal of Science and Technology 15/1 (March 2020), 1-12.
JAMA Pamukçu E. An Alternative Approach to Variable Selection using Regression Modeling in Undersized Sample Data. TJST. 2020;15:1–12.
MLA Pamukçu, Esra. “An Alternative Approach to Variable Selection Using Regression Modeling in Undersized Sample Data”. Turkish Journal of Science and Technology, vol. 15, no. 1, 2020, pp. 1-12.
Vancouver Pamukçu E. An Alternative Approach to Variable Selection using Regression Modeling in Undersized Sample Data. TJST. 2020;15(1):1-12.