Clustering analysis methods need to be evaluated (for community structure, see Community validation). Because clustering is unsupervised, there is usually no ground truth to compare against, so evaluating clustering methods is often harder than evaluating supervised methods.
Validation approaches fall into three major categories: external evaluation, internal evaluation, and relative evaluation^1.
The general criteria are compactness, connectedness, and separation. Compactness means that the members of a cluster should be close to each other. This criterion is good at identifying spherical clusters but may fail to detect elongated, connected clusters. Connectedness means that each member of a cluster should be very close to some other members of the same cluster, so that the cluster forms a connected set in the space. Separation means that two different clusters should be well separated from each other.
External evaluation uses pre-classified items or gold standards to validate the clustering results. The results depend on the benchmark used and can therefore be biased[^2]. A good score on a benchmark does not guarantee good performance on real-world data.
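As an illustration of external evaluation, the sketch below computes the Rand index, one common pair-counting measure of agreement between a clustering and a gold standard. The function name `rand_index` and the toy labelings are assumptions made for this example, not something prescribed by the references.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which the two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agreements += 1
    return agreements / len(pairs)


# Hypothetical gold standard and clustering result for six items.
gold = [0, 0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 2, 2]
score = rand_index(gold, found)
print(f"Rand index: {score:.3f}")
```

The measure only looks at which items are grouped together, so renaming the cluster labels does not change the score. It still inherits whatever bias the gold standard carries.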
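Internal evaluation uses only the clustered data itself, normally comparing intra-group similarity with inter-group similarity: a good clustering has high similarity within groups and low similarity between groups.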
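A minimal sketch of this idea, assuming Euclidean distance as the (dis)similarity measure: it compares the mean distance between items in the same cluster with the mean distance between items in different clusters. The helper `intra_inter_distances` and the sample points are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def intra_inter_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance)."""
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if labels[i] == labels[j]:
            intra.append(d)
        else:
            inter.append(d)
    if not intra or not inter:
        raise ValueError("need at least one intra-cluster and one inter-cluster pair")
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Two well-separated groups of points in the plane (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]   # follows the visible structure
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

good_intra, good_inter = intra_inter_distances(points, good_labels)
bad_intra, bad_inter = intra_inter_distances(points, bad_labels)

# A lower intra/inter ratio indicates more compact, better separated clusters.
good_ratio = good_intra / good_inter
bad_ratio = bad_intra / bad_inter
print(f"good: {good_ratio:.3f}  bad: {bad_ratio:.3f}")
```

Because the ratio rewards compactness, it shares the limitation noted above: it tends to favor spherical clusters over elongated, connected ones.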
Relative evaluation compares the results of different methods, or of the same method run with different parameters.
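The following sketch shows one way relative evaluation could look in practice: the same toy method is run with several parameter values and the results are ranked by a common score. Both the toy method (`gap_clustering`, which splits sorted 1-D values wherever the gap exceeds `max_gap`) and the choice of the mean silhouette as the score are assumptions for this example; any clustering method and any validity index could take their place.

```python
def gap_clustering(values, max_gap):
    """Toy 1-D method: walk the sorted values and start a new cluster
    whenever the gap to the previous value exceeds max_gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > max_gap:
            current += 1
        labels[nxt] = current
    return labels


def mean_silhouette(values, labels):
    """Mean silhouette over all items, using absolute difference as distance."""
    clusters = {}
    for v, label in zip(values, labels):
        clusters.setdefault(label, []).append(v)
    if len(clusters) < 2:
        # Undefined for a single cluster; rank such results last.
        return float("-inf")
    total = 0.0
    for v, label in zip(values, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(values)


# Made-up data with three visible groups.
values = [1.0, 1.2, 1.1, 5.0, 5.3, 5.1, 9.0, 9.2]

# Run the same method with different parameter values and compare the scores.
scores = {}
for max_gap in (0.15, 1.0, 5.0):
    labels = gap_clustering(values, max_gap)
    scores[max_gap] = mean_silhouette(values, labels)

best_gap = max(scores, key=scores.get)
print(scores, "best max_gap:", best_gap)
```

The ranking is only as meaningful as the score used to produce it, so the biases of the chosen index carry over to the comparison.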
To read
- On Using Class-Labels in Evaluation of Clusterings
- A Review Of Monte Carlo Tests Of Cluster Analysis
References
- http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_Clustering_Results
- K-means Clustering versus Validation Measures: A Data Distribution Perspective
^1: Halkidi, M.; Batistakis, Y.; Vazirgiannis, M. (2001). "On Clustering Validation Techniques". Journal of Intelligent Information Systems 17: 107-145. doi:10.1023/A:1012801612483. http://www.springerlink.com/content/k43h06u025w2x4q6/.
[^2]: Handl, J.; Knowles, J.; Kell, D.B. (Aug 2005). "Computational cluster validation in post-genomic data analysis". Bioinformatics 21 (15): 3201-3212. doi:10.1093/bioinformatics/bti517. PMID 15914541.