Research Article

Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures

Year 2024, Volume: 11 Issue: 1, 17 - 31, 08.01.2024

Abstract

Finding the optimal classification pattern for datasets obtained from national accounts, such as household budget survey (HBS) data, is a challenging task for decision makers. Fuzzy c-means (FCM) clustering, a fuzzy logic-based clustering algorithm, can be used effectively to find the proper cluster structure of a given dataset under uncertainty. In this study, the performance of crisp (k-means) and fuzzy (FCM) clustering in grouping households is compared while the fuzzifier parameter of FCM is varied. The results show that FCM clustering performs better than k-means clustering. The optimal number of household groups is found to be 5, and high cluster validity index scores are obtained when the fuzzifier value is 1.5 in FCM clustering. The cluster validity scores obtained from the fuzzy Silhouette index are compared with those of the crisp cluster validity index. The experimental results show that fuzzy clustering has superior grouping ability and better validity measures for grouping households in a national dataset. A smaller fuzzifier value is observed to be a better choice for improving the fit of fuzzy clustering. It is hoped that future experiments will compare the clustering abilities of FCM using datasets with different sizes and variables under conditions of uncertainty to determine class boundaries.
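The kind of experiment summarized in the abstract, running FCM with several fuzzifier values and scoring each partition with a fuzzy Silhouette index, can be illustrated with a small sketch. This is not the author's implementation: the function names `fcm` and `fuzzy_silhouette`, the synthetic two-cluster data, the Euclidean distance, and the weighting exponent `alpha=1` are all assumptions made for illustration, and the HBS data is not reproduced here.

```python
import math
import random


def fcm(points, c, m=1.5, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (illustrative); returns (centers, memberships)."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers: means of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / sw
                for d in range(dim)
            ))
        # Memberships: u_ij proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [math.dist(points[i], ctr) for ctr in centers]
            dmin = min(dist)
            if dmin == 0.0:
                # Point sits exactly on a center: full membership there.
                raw = [1.0 if d == 0.0 else 0.0 for d in dist]
            else:
                # Scaling by dmin keeps every term in (0, 1] (no overflow).
                raw = [(dmin / d) ** (2.0 / (m - 1.0)) for d in dist]
            total = sum(raw)
            new_row = [v / total for v in raw]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def fuzzy_silhouette(points, u, alpha=1.0):
    """Crisp silhouettes averaged with weights (u_top1 - u_top2) ** alpha."""
    n, c = len(points), len(u[0])
    # Harden the fuzzy partition: each point goes to its strongest cluster.
    labels = [max(range(c), key=lambda j: u[i][j]) for i in range(n)]
    num = den = 0.0
    for i in range(n):
        # Summed distance from point i to the members of each hard cluster.
        sums, counts = [0.0] * c, [0] * c
        for k in range(n):
            if k != i:
                sums[labels[k]] += math.dist(points[i], points[k])
                counts[labels[k]] += 1
        own = labels[i]
        others = [sums[j] / counts[j] for j in range(c)
                  if j != own and counts[j]]
        if counts[own] == 0 or not others:
            s = 0.0  # singleton cluster, or no other non-empty cluster
        else:
            a, b = sums[own] / counts[own], min(others)
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        top = sorted(u[i], reverse=True)
        weight = (top[0] - top[1]) ** alpha
        num += weight * s
        den += weight
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in data: two Gaussian blobs (not the HBS data).
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
    data += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
    for m in (1.5, 2.0, 3.0):
        _, u = fcm(data, c=2, m=m)
        print(m, round(fuzzy_silhouette(data, u), 3))
```

In a sketch like this, a fuzzifier close to 1 pushes memberships toward 0 or 1, so FCM behaves more like k-means, while larger values spread membership more evenly across clusters. Repeating the loop over several cluster counts as well as fuzzifier values would give a grid of validity scores of the kind the study compares.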

References

  • Askari, S. (2021). Fuzzy C-means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: review and development. Expert Systems with Applications, 165(113856), 1-27.
  • Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York.
  • Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: the fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2-3), 191-203.
  • Bonis, T., & Oudot, S. (2018). A fuzzy clustering algorithm for the mode-seeking framework. Pattern Recognition Letters, 102, 43-73.
  • Campello, R. J., & Hruschka, E. R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems, 157(21), 2858-2875.
  • Chan, K. P., & Cheung, Y. S. (1992). Clustering of clusters. Pattern Recognition, 25(2), 211-217.
  • De Carvalho, F. D. A., Lechevallier, Y., & De Melo, F. M. (2021). Partitioning hard clustering algorithms based on multiple dissimilarity matrices. Pattern Recognition, 45(1), 447-464.
  • Di Martino, F., & Sessa, S. (2022). A novel quantum inspired genetic algorithm to initialize cluster centers in fuzzy C-means. Expert Systems with Applications, 191(116340), 1-10.
  • Dunn, J. C. (1974). A fuzzy relative ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3(3), 32-57.
  • Ferreira, M. R., de Carvalho, F. D. A., & Simões, E. C. (2016). Kernel based hard clustering methods with kernelization of the metric and automatic weighting of the variables. Pattern Recognition, 51, 310-321.
  • Gerlhof, C., Kemper, A., Kilger, C., & Moerkotte, G. (1993). Partition-based clustering in object bases: from theory to practice. Foundations of Data Organization and Algorithms, 4th International Conference, FODO'93, Chicago, Illinois, USA, October 13-15.
  • Goyal, M. K., Shivam, G., & Sarma, A. K. (2019). Spatial homogeneity of extreme precipitation indices using fuzzy clustering over northeast India. Natural Hazards, 98(4), 559–574.
  • Guha, S., Rastogi, R., & Shim, K. (2001). CURE: an efficient clustering algorithm for large databases. Information Systems, 26(1), 35-58.
  • Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In KDD. pp. 58-65.
  • Huang, M., Xia, Z., Wang, H., Zeng, Q., & Wang, Q. (2012). The range of the value for the fuzzifier of the fuzzy c-means algorithm. Pattern Recognition Letters, 33(16), 2280-2284.
  • Idri, A., Hosni, M., & Abran, A. (2016). Improved estimation of software development effort using classical and fuzzy analogy ensembles. Applied Soft Computing, 49, 990-1019.
  • Izakian, H., Pedrycz, W., & Jamal, I. (2015). Fuzzy clustering of time series data using dynamic time warping distance. Engineering Applications of Artificial Intelligence, 39, 235-244.
  • Janalipour, M., & Mohammadzadeh, A. (2017). Evaluation of effectiveness of three fuzzy systems and three texture extraction methods for building damage detection from post-event LiDAR data. International Journal of Digital Earth, 11, 1241-1268.
  • Jothi, R., Mohanty, S. K., & Ojha, A. (2017). DK-means: a deterministic k-means clustering algorithm for gene expression analysis. Pattern Analysis and Applications, 22, 649-667.
  • Karczmarek, P., Kiersztyn, A., Pedrycz, W., & Czerwiński, D. (2021). Fuzzy c-means-based isolation forest. Applied Soft Computing, 106(107354), 1-10.
  • Kaufman, L., & Rousseeuw, P. J. (1990). Finding Groups in Data, Wiley, New York.
  • Liao, W. K., Liu, Y., & Choudhary, A. (2004). A grid based clustering algorithm using adaptive mesh refinement. 7th Workshop on Mining Scientific and Engineering Datasets, pp.1-9.
  • Memon, K. H. (2018). A histogram approach for determining fuzzifier values of interval type-2 fuzzy c-means. Expert Systems with Applications, 91, 27-35.
  • Mohammadrezapour, O., Kisi, O., & Pourahmad, F. (2020). Fuzzy c-means and k-means clustering with genetic algorithm for identification of homogenous regions of groundwater quality. Neural Computing & Applications, 32, 3763-3775.
  • Ozkan, I., & Turksen, I. B. (2007). Upper and lower values for the level of fuzziness in FCM. In: Wang P.P., Ruan D., Kerre E.E. (eds) Fuzzy Logic. Studies in Fuzziness and Soft Computing, vol 215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71258-9_6.
  • Pal, N. R., & Bezdek, J. C. (1995). On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems, 3, 370-379.
  • Pedrycz, W. (2005). Knowledge-based clustering: from data to information granules. John Wiley & Sons.
  • Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.
  • Saha, S., & Bandyopadhyay, S. (2012). Some connectivity based cluster validity indices. Applied Soft Computing, 12(5), 1555-1565.
  • Salehi, F., Keyvanpour, M. R., & Sharifi, A. (2021). SMKFC-ER: Semi-supervised multiple kernel fuzzy clustering based on entropy and relative entropy. Information Sciences, 547, 667-688.
  • Sarkar, J. P., Saha, I., & Maulik, U. (2016). Rough possibilistic type-2 fuzzy c-means clustering for MR brain image segmentation. Applied Soft Computing, 46, 527-536.
  • Schwämmle, V., & Jensen, O. N. (2010). A simple and fast method to determine the parameters for fuzzy c-means cluster analysis. Bioinformatics, 26(22), 2841-2848.
  • Sert, S.A., Bagci, H., & Yazici, A. (2015). MOFCA: multi-objective fuzzy clustering algorithm for wireless sensor networks. Applied Soft Computing, 30, 151-165.
  • Shen, Y., Shi, H., & Zhang, J. Q. (2001). Improvement and optimization of a fuzzy c-means clustering algorithm, IMTC 2001. Proceedings of the 18th IEEE Instrumentation and Measurement Technology Conference. Rediscovering Measurement in the Age of Informatics (Cat. No.01CH 37188), Budapest, 3, 1430-1433.
  • Su, S., & Zhao, S. (2018). An optimal clustering mechanism based on Fuzzy-C means for wireless sensor networks. Sustainable Computing: Informatics and Systems, 18, 127-134.
  • Turkish Statistical Institute (TurkStat). (2015). Household Budget Survey Data. https://www.tuik.gov.tr/Home/Index
  • Velmurugan, T. (2014). Performance based analysis between k-means and fuzzy c-means clustering algorithms for connection oriented telecommunication data. Applied Soft Computing, 19, 134-146.
  • Wei, Y., Zhang, X., Shi, Y., Xia, L., Pan, S., Wu, J., ... & Zhao, X. (2018). A review of data-driven approaches for prediction and classification of building energy consumption. Renewable and Sustainable Energy Reviews, 82, 1027-1047.
  • Wu, K. L. (2012). Analysis parameter selections for fuzzy c-means. Pattern Recognition, 45(1), 407-415.
  • Xu, K., Evans, D. B., Kawabata, K., Zeramdini, R., Klavus, J., & Murray, C. J. (2003). Household catastrophic health expenditure: a multicountry analysis. The Lancet, 362(9378), 111-117.
  • Yang, M. S., & Nataliani, Y. (2017). Robust-learning fuzzy c-means clustering algorithm with unknown number of clusters. Pattern Recognition, 71, 45-59.
  • Yu, J., Cheng, Q., & Huang, H. (2004). Analysis of weighting exponent in the FCM. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34, 634-639.
  • Zhou, F., Bai, B., Wu, Y., Chen, M., Zhong, Z., Zhu, R., ... & Zhao, Y. (2019). FuzzyRadar: visualization for understanding fuzzy clusters. Journal of Visualization, 22, 913-926.
  • Zhou, K., & Yang, S. (2019). Fuzzifier selection in fuzzy C-means from cluster size distribution perspective. Informatica, 30(3), 613-628.
  • Zhou, K., & Yang, S. (2020). Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering. Pattern Analysis and Applications, 23, 455-466.
  • Zhou, K., Fu, C., & Yang, S. (2014). Fuzziness parameter selection in fuzzy c-means: the perspective of cluster validation. Science China Information Sciences, 57, 1-8.
  • Zhou, K., Yang, S., & Shao, Z. (2017). Household monthly electricity consumption pattern mining: a fuzzy clustering-based model and a case study. Journal of Cleaner Production, 141, 900-908.

Sert ve Bulanık Kümeleme Tekniklerinin Karşılaştırılması ve Optimal Bulanıklaştırıcı Parametresinin Seçimi: Hanehalkı Özellikleri ve Sağlık Harcamaları Üzerine bir Uygulama

There are 47 citations in total.

Details

Primary Language: English
Subjects: Business Administration
Journal Section: Articles
Authors: Songül Çınaroğlu (ORCID: 0000-0001-5699-8402)

Publication Date: January 8, 2024
Submission Date: March 23, 2023
Published in Issue: Year 2024, Volume 11, Issue 1

Cite

APA Çınaroğlu, S. (2024). Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures. Optimum Ekonomi Ve Yönetim Bilimleri Dergisi, 11(1), 17-31.
AMA Çınaroğlu S. Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures. OJEMS. January 2024;11(1):17-31.
Chicago Çınaroğlu, Songül. “Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures”. Optimum Ekonomi Ve Yönetim Bilimleri Dergisi 11, no. 1 (January 2024): 17-31.
EndNote Çınaroğlu S (January 1, 2024) Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures. Optimum Ekonomi ve Yönetim Bilimleri Dergisi 11 1 17–31.
IEEE S. Çınaroğlu, “Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures”, OJEMS, vol. 11, no. 1, pp. 17–31, 2024.
ISNAD Çınaroğlu, Songül. “Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures”. Optimum Ekonomi ve Yönetim Bilimleri Dergisi 11/1 (January 2024), 17-31.
JAMA Çınaroğlu S. Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures. OJEMS. 2024;11:17–31.
MLA Çınaroğlu, Songül. “Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures”. Optimum Ekonomi Ve Yönetim Bilimleri Dergisi, vol. 11, no. 1, 2024, pp. 17-31.
Vancouver Çınaroğlu S. Comparison of Hard and Fuzzy Clustering Techniques and Selection of Optimal Fuzzifier Parameter: An Application on Household Characteristics and Health Expenditures. OJEMS. 2024;11(1):17-31.