Distance-based regression is an alternative method for parameter estimation in linear regression models when mixed-type explanatory variables are used. It is similar to classical linear regression, except that the explanatory variables enter the model through distance measures rather than raw values. In this study, the Euclidean, Gower and Manhattan distance measures were evaluated on simulated datasets with sample sizes of 10, 25, 50, 100, 250 and 500 generated from the Binomial, Normal, t, Chi-square and Poisson distributions, and on real data with continuous and discrete distributions. For the continuous data, body weight at six months of Saanen kids was used as the outcome variable, and body length and chest depth at six months were used as the explanatory variables. For the discrete data, the milk fat ratio of Polish Holstein Friesian cattle was the response variable, and the number of milkings per day and the season were the explanatory variables. The aim was to determine the effect of the distance measure on the data sets (sample sizes of 10, 50 and 100) by comparing the results with those obtained from the linear regression method. The R packages "dbstats", "cluster" and "tidyverse" were used to perform the analysis. The results show that using the Manhattan distance on Poisson-distributed data may produce poor results, especially at small sample sizes (n<50). Although there was no significant difference between the Gower and Euclidean distances across distributions and sample sizes, the Euclidean distance produced fluctuating results for some distributions. The Gower distance can therefore be recommended as the more suitable choice because it is more stable. Distance-based regression methods may be recommended in cases where the assumptions required for the least squares estimation method, as mentioned in this study, cannot be met.
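To make the three distance measures named above concrete, the following is a minimal sketch in Python, not the authors' code (the study itself used R packages). The toy values, the function names and the restriction of the Gower distance to numeric variables are assumptions made for illustration; for mixed-type data the Gower distance additionally handles categorical variables by simple matching, which is omitted here.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def gower_numeric(a, b, ranges):
    # Gower distance restricted to numeric variables (an assumption of this
    # sketch): each absolute difference is scaled by the variable's range,
    # then the scaled differences are averaged over the variables.
    terms = [abs(x - y) / r if r > 0 else 0.0 for x, y, r in zip(a, b, ranges)]
    return sum(terms) / len(terms)

def column_ranges(rows):
    # Range (max - min) of each variable across all observations.
    return [max(col) - min(col) for col in zip(*rows)]

def distance_matrix(rows, dist):
    # Full pairwise distance matrix between observations.
    return [[dist(a, b) for b in rows] for a in rows]

# Hypothetical toy observations with two numeric explanatory variables;
# these values are invented and are not taken from the study.
rows = [(50.0, 20.0), (53.0, 24.0), (56.0, 22.0)]
ranges = column_ranges(rows)

d_euc = distance_matrix(rows, euclidean)
d_man = distance_matrix(rows, manhattan)
d_gow = distance_matrix(rows, lambda a, b: gower_numeric(a, b, ranges))
```

In a distance-based regression, a matrix such as `d_gow` would take the place of the raw explanatory variables; the range scaling is what keeps each variable's contribution to the Gower distance between 0 and 1.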
Ethics committee approval was not required for this study because it did not involve any study on animals or humans.
This study is a short summary of the MSc thesis of the first author, prepared under the supervision of the second author.
Primary Language | English |
---|---|
Subjects | Statistical Analysis, Applied Statistics |
Section | Research Articles |
Authors | |
Publication Date | March 15, 2025 |
Submission Date | December 11, 2024 |
Acceptance Date | January 16, 2025 |
Published in Issue | Year 2025 Volume: 8 Issue: 2 |