Research Article

Comparing Clusterings: A Store Segmentation Application

Year 2018, Volume: 32 Issue: 44, 41 - 57, 30.06.2018

Abstract

This study focuses on one family of clustering comparison measures, the
pair-counting techniques, which include the Rand Index, the Adjusted Rand
Index, and the Fowlkes-Mallows Index. The aim is to discuss their properties
and to demonstrate a marketing application of the techniques. In the
application, the supermarket stores of a retail chain are segmented by cluster
analysis using two approaches. The first approach segments stores based on
socioeconomic factors, and the second is based on customers' purchasing
behavior. Since consumer purchases are strongly influenced by socioeconomic
factors, the study expects to find agreement between the two clusterings. The
results show that while the Rand Index value indicates agreement, the
Fowlkes-Mallows Index value indicates only weak agreement, and the Adjusted
Rand Index value indicates no agreement between the two clusterings.
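
For reference, the three indices named in the abstract are usually defined from pair counts as sketched below. The notation is an assumption introduced here and does not come from the abstract: for n objects, n_{11} is the number of object pairs placed together in both clusterings, n_{00} the number separated in both, and n_{10} and n_{01} the numbers placed together in only the first or only the second clustering. The formulas follow the standard definitions in Rand (1971), Fowlkes and Mallows (1983), and Hubert and Arabie (1985).

```latex
\begin{align*}
\mathrm{RI}  &= \frac{n_{11} + n_{00}}{\binom{n}{2}}, \\[4pt]
\mathrm{FMI} &= \frac{n_{11}}{\sqrt{(n_{11} + n_{10})\,(n_{11} + n_{01})}}, \\[4pt]
\mathrm{ARI} &= \frac{n_{11} - E}
                     {\tfrac{1}{2}\bigl[(n_{11} + n_{10}) + (n_{11} + n_{01})\bigr] - E},
\qquad
E = \frac{(n_{11} + n_{10})\,(n_{11} + n_{01})}{\binom{n}{2}}.
\end{align*}
```

Because the Rand Index counts pairs separated in both clusterings (n_{00}) as agreement, it tends to be large whenever both clusterings have many clusters. The Adjusted Rand Index subtracts the agreement E expected by chance, so it can be near zero for the same pair of clusterings.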

References

  • Aggarwal Charu C. (2015). Data mining: The textbook. Switzerland: Springer.
  • Agrawal Rakesh, & Srikant Ramakrishnan. (1994). Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB (Vol. 1215, pp. 487-499).
  • Fowlkes Edward B, & Mallows Colin L. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78, 553-569.
  • Hennig Christian, & Liao Tim F. (2013). How to find an appropriate clustering for mixed‐type variables with application to socio‐economic stratification. Journal of the Royal Statistical Society: Series C (Applied Statistics), 62, 309-369.
  • Hubert Lawrence, & Arabie Phipps. (1985). Comparing partitions. Journal of Classification, 2, 193-218.
  • Jaccard Paul. (1908). Nouvelles recherches sur la distribution florale.
  • Jain A. K., Murty M. N., & Flynn P. J. (1999). Data clustering: a review. ACM Computing Surveys, 31, 264-323.
  • Jain Anil K, & Dubes Richard C. (1988). Algorithms for clustering data: Prentice-Hall, Inc.
  • Kotler Philip, & Armstrong Gary. (2012). Principles of marketing (14th ed.). New Jersey: Prentice Hall.
  • Lamb Charles W, Hair Joe F, & McDaniel Carl. (2011). Essentials of marketing: Cengage Learning.
  • Meilă Marina. (2007). Comparing clusterings—an information based distance. Journal of multivariate analysis, 98, 873-895.
  • Rabbany Reihaneh, & Zaïane Osmar R. (2015). Generalization of clustering agreements and distances for overlapping clusters and network communities. Data Mining and Knowledge Discovery, 29, 1458-1485.
  • Rand William M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846-850.
  • Romano Simone, Bailey James, Nguyen Xuan Vinh, & Verspoor Karin. (2014). Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance. In ICML (pp. 1143-1151).
  • Romano Simone, Vinh Nguyen Xuan, Bailey James, & Verspoor Karin. (2015). Adjusting for Chance Clustering Comparison Measures. arXiv preprint arXiv:1512.01286.
  • Steinley Douglas, Brusco Michael J, & Hubert Lawrence. (2016). The variance of the adjusted Rand index. Psychological Methods, 21, 261.
  • Vinh Nguyen Xuan, Epps Julien, & Bailey James. (2009). Information theoretic measures for clusterings comparison: is a correction for chance necessary? In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1073-1080): ACM.
  • Wagner Silke, & Wagner Dorothea. (2007). Comparing clusterings: an overview: Universität Karlsruhe, Fakultät für Informatik Karlsruhe.
  • Ward Jr Joe H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.
  • Xiang Qiaoliang, Mao Qi, Chai Kian Ming, Chieu Hai Leong, Tsang Ivor, & Zhao Zhendong. (2012). A Split-Merge Framework for Comparing Clusterings. arXiv preprint arXiv:1206.6475.
  • Yeung Ka Yee, & Ruzzo Walter L. (2001). Details of the adjusted Rand index and clustering algorithms, supplement to the paper “An empirical study on principal component analysis for clustering gene expression data”. Bioinformatics, 17, 763-774.
  • Zaki Mohammed J, & Meira Jr Wagner. (2014). Data mining and analysis: fundamental concepts and algorithms: Cambridge University Press.

Kümelemelerin Karşılaştırılması: Bir Mağaza Segmentasyonu Uygulaması (Comparison of Clusterings: A Store Segmentation Application)


Abstract

This study examines pair-counting techniques (such as the Rand Index, the
Adjusted Rand Index, and the Fowlkes-Mallows Index), one of the families of
measures for comparing clusterings. The aim of the study is to discuss the
properties of these techniques and to demonstrate an application of them in
marketing. As an application, the supermarket stores of a retailer that owns a
chain of stores are segmented by cluster analysis using two different
approaches. In the first approach, stores are segmented according to the
socioeconomic characteristics of their locations and of their potential
customers; in the second, according to the purchasing behavior of their own
customers. Because customer purchasing behavior is strongly influenced by
socioeconomic factors, the study expects the two clusterings to agree. The
analyses show that although the Rand Index indicates agreement between the two
clusterings, the Fowlkes-Mallows Index points to weak agreement and the
Adjusted Rand Index to no agreement.
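
The pattern reported above, where the three indices disagree about the same pair of clusterings, can be reproduced on toy data. The following is a minimal sketch and not the authors' implementation. The function name `pair_counting_indices`, the 100 hypothetical stores, and both label vectors are assumptions made for illustration and are not the study's data. The two labelings are deliberately built to be unrelated to each other.

```python
from collections import Counter
from math import comb, sqrt


def pair_counting_indices(labels_a, labels_b):
    """Return (Rand, Adjusted Rand, Fowlkes-Mallows) for two flat clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")
    total_pairs = comb(n, 2)

    # Pairs placed together in both clusterings, read off the contingency table.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering taken separately.
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Pairs separated in both clusterings (inclusion-exclusion).
    neither = total_pairs - same_a - same_b + both

    rand = (both + neither) / total_pairs
    fowlkes_mallows = both / sqrt(same_a * same_b) if same_a and same_b else 0.0

    # Hubert-Arabie adjustment: subtract the agreement expected by chance.
    expected = same_a * same_b / total_pairs
    denominator = (same_a + same_b) / 2 - expected
    # Degenerate case (e.g. both clusterings are a single cluster): treat as 1.0.
    adjusted_rand = (both - expected) / denominator if denominator else 1.0
    return rand, adjusted_rand, fowlkes_mallows


# Hypothetical data: 100 stores, two 5-cluster labelings that cross each other,
# so every combination of clusters holds exactly 4 stores (no real association).
by_socio = [i // 20 for i in range(100)]
by_purchase = [i % 5 for i in range(100)]

ri, ari, fmi = pair_counting_indices(by_socio, by_purchase)
print(round(ri, 3), round(ari, 3), round(fmi, 3))  # 0.677 -0.042 0.158
```

In this toy case the Rand Index is about 0.68 even though the labelings are unrelated, because most pairs are separated in both clusterings and count as agreement. The Fowlkes-Mallows Index ignores those pairs and stays low, and the Adjusted Rand Index falls slightly below zero once chance agreement is subtracted.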

There are 22 citations in total.

Details

Primary Language English
Journal Section Makaleler / Articles
Authors

Emrah Bilgiç 0000-0002-9875-2299

Özgür Çakır 0000-0003-1410-8162

Publication Date June 30, 2018
Submission Date December 21, 2017
Acceptance Date June 6, 2018
Published in Issue Year 2018 Volume: 32 Issue: 44

Cite

APA Bilgiç, E., & Çakır, Ö. (2018). Comparing Clusterings: A Store Segmentation Application. Erciyes Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, 32(44), 41-57.

ERCİYES AKADEMİ | 2021 | sbedergi@erciyes.edu.tr This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.