Research Article
A STUDY ON DETERMINATION OF OUTLIER OBSERVATIONS BY USING CHI-SQUARE THRESHOLD VALUE

Year 2019, Volume 2, Issue 1, pp. 7-10, 01.01.2019

Abstract

Outlier observations are observations that deviate from the general tendency
of the other observations in a data set. They typically arise from causes such
as faulty measurement or incorrect data entry. Identifying them is important
because the results of statistical analyses, such as multiple regression
analysis, can be quite sensitive to them. Outliers are mostly identified using
distance-based, statistical-test-based, and density-based approaches. In this
study, the distance of each observation vector from the center of the data was
calculated as a Mahalanobis distance using the R program. For this purpose,
hematocrit (htc), hemoglobin (hgb), mean platelet volume (mpv), platelet
distribution width (pdw), nonbacterial prostatitis (nbp), and pulse pressure
values measured in the blood of 315 heart patients were used as the data set.
As a result, sixteen observations were identified as outliers. These results
are expected to help researchers who need to identify outlier observations.
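
The procedure described in the abstract can be sketched as follows. The authors worked in R; the sketch below uses Python with NumPy and SciPy instead, and it runs on synthetic data rather than the patient measurements. It assumes the cutoff is the 0.975 quantile of the chi-square distribution with degrees of freedom equal to the number of variables. The abstract does not state the quantile, so `alpha`, the function name `mahalanobis_outliers`, and the planted outliers are illustrative assumptions, not the study's settings.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(X, alpha=0.025):
    """Flag rows whose squared Mahalanobis distance exceeds a chi-square cutoff.

    alpha is an assumed upper-tail probability; the abstract does not give one.
    """
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    center = X.mean(axis=0)                  # vector of column means
    cov = np.cov(X, rowvar=False)            # sample covariance matrix (n_vars x n_vars)
    diff = X - center
    # Solve cov @ Z = diff.T rather than forming the explicit inverse,
    # then take the row-wise product diff_i . Z_i to get each squared distance.
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))
    cutoff = chi2.ppf(1.0 - alpha, df=n_vars)
    return d2, cutoff, d2 > cutoff


# Synthetic stand-in for a 315 x 6 table (NOT the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(315, 6))
X[:3] += 8.0                                 # plant three obvious outliers
d2, cutoff, is_outlier = mahalanobis_outliers(X)
print(f"cutoff = {cutoff:.3f}, flagged = {int(is_outlier.sum())}")
```

Because the mean and covariance are estimated from data that include the outliers, extreme points can inflate the covariance and partly mask one another. Robust estimates of center and scatter (see Rousseeuw and Hubert 2011 in the references) are a common alternative when that is a concern.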

References

  • Calenge C, Darmon G, Basille M, Loison A, Jullien JM. 2008. The factorial decomposition of the Mahalanobis distances in habitat selection studies. Ecology, 89(2):555–566, doi: 10.1890/06-1750.1.
  • Egan WJ, Morgan SL. 1998. Outlier detection in multivariate analytical chemical data. Anal Chem, 70(11):2372–2379, doi: 10.1021/ac970763d.
  • Farber O, Kadmon R. 2003. Assessment of alternative approaches for bioclimatic modeling with special emphasis on the Mahalanobis distance. ECMOD, 160(1-2):115–130, doi: 10.1016/S0304-3800(02)00327-7.
  • Gogoi P, Bhattacharyya DK, Borah B, Kalita JK. 2011. A survey of outlier detection methods in network anomaly identification. Computer Journal, 54(4):570–588, doi: 10.1093/comjnl/bxr026.
  • Gupta M, Gao J, Aggarwal C, Han J. 2013. Outlier detection for temporal data: A survey. IEEE TKDE, 26(9):2250–2267, doi: 10.1109/TKDE.2013.184.
  • Hodge VJ, Austin J. 2004. A survey of outlier detection methodologies. Artif Intell Rev, 22:85–126, doi: 10.1023/B:AIRE.0000045502.10941.a9.
  • Hubert M, Van Der Veeken S. 2008. Outlier detection for skewed data. Journal of Chemometrics, 22(3-4):235–246, doi: 10.1002/cem.1123.
  • Liu H, Shah S, Jiang W. 2004. On-line outlier detection and data cleaning. CCEND, 28(9):1635–1647, doi: 10.1016/j.compchemeng.2004.01.009.
  • Maesschalck RD, Jouan-Rimbaud D, Massart DL. 2000. The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50:1–18, doi: 10.1016/S0169-7439(99)00047-7.
  • Pei Y, Zaïane O. 2006. A synthetic data generator for clustering and outlier analysis. Department of Computing Science, University of Alberta. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73.5133&rep=rep1&type=pdf (access date: 10.09.2018).
  • Rousseeuw PJ, Hubert M. 2011. Robust statistics for outlier detection. WIREs Data Mining Knowl Discov, 1(1):73–79, doi: 10.1002/widm.2.
  • Singh K, Upadhyaya S. 2012. Outlier detection: applications and techniques. IJCSI, 9(1):307–323.
  • Teng M. 2010. Anomaly detection on time series. 2010 IEEE International Conference on Progress in Informatics and Computing, 1:603–608, doi: 10.1109/PIC.2010.5687485.
  • Ting JA, D’Souza A, Schaal S. 2007a. Automatic outlier detection: A Bayesian approach. IEEE International Conference on Robotics and Automation, 2489–2494, doi: 10.1109/ROBOT.2007.363693.
  • Ting JA, Theodorou E, Schaal S. 2007b. A Kalman filter for robust outlier detection. IEEE International Conference on Intelligent Robots and Systems, 1514–1519, doi: 10.1109/IROS.2007.4399158.
  • Url1: http://onlinestatbook.com/2/advanced_graphs/q-q_plots.html (access date: 09.10.2018).
  • Url2: http://rstat.web.tr (access date: 09.10.2018).
  • Url3: https://onlinecourses.science.psu.edu/stat414/node/154/ (access date: 09.10.2018).
  • Xiang S, Nie F, Zhang C. 2008. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition, 41(12):3600–3612, doi: 10.1016/j.patcog.2008.05.018.
There are 17 citations in total.

Details

Primary Language: English
Subjects: Engineering
Journal Section: Research Articles
Authors:

Fahrettin Kaya (ORCID: 0000-0003-1666-4859)
Esra Yavuz
Şeyma Koç
Ömer Faruk Karaokur

Publication Date: January 1, 2019
Submission Date: October 13, 2018
Acceptance Date: November 18, 2018
Published in Issue: Year 2019, Volume 2, Issue 1

Cite

APA Kaya, F., Yavuz, E., Koç, Ş., Karaokur, Ö. F. (2019). A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value. Black Sea Journal of Engineering and Science, 2(1), 7-10.
AMA Kaya F, Yavuz E, Koç Ş, Karaokur ÖF. A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value. BSJ Eng. Sci. January 2019;2(1):7-10.
Chicago Kaya, Fahrettin, Esra Yavuz, Şeyma Koç, and Ömer Faruk Karaokur. “A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value”. Black Sea Journal of Engineering and Science 2, no. 1 (January 2019): 7-10.
EndNote Kaya F, Yavuz E, Koç Ş, Karaokur ÖF (January 1, 2019) A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value. Black Sea Journal of Engineering and Science 2 1 7–10.
IEEE F. Kaya, E. Yavuz, Ş. Koç, and Ö. F. Karaokur, “A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value”, BSJ Eng. Sci., vol. 2, no. 1, pp. 7–10, 2019.
ISNAD Kaya, Fahrettin et al. “A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value”. Black Sea Journal of Engineering and Science 2/1 (January 2019), 7-10.
JAMA Kaya F, Yavuz E, Koç Ş, Karaokur ÖF. A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value. BSJ Eng. Sci. 2019;2:7–10.
MLA Kaya, Fahrettin et al. “A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value”. Black Sea Journal of Engineering and Science, vol. 2, no. 1, 2019, pp. 7-10.
Vancouver Kaya F, Yavuz E, Koç Ş, Karaokur ÖF. A Study on Determination of Outlier Observations by Using Chi-Square Threshold Value. BSJ Eng. Sci. 2019;2(1):7-10.