Cluster validation approaches fall into three major categories: external evaluation, internal evaluation, and relative evaluation[^1]. The general criteria are compactness, connectedness, and separation. Compactness means that the members of a cluster should be close to each other. This criterion is good at identifying spherical clusters but may fail to detect connected clusters. Connectedness means that each member of a cluster should be very close to some other members of the same cluster, so that the cluster forms a connected set in the space. Separation means that different clusters should be well separated from each other.
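As a rough illustration of two of these criteria, the sketch below measures compactness as the mean distance from members to their cluster centroid, and separation as the distance between two centroids. These particular definitions, the function names, and the toy points are assumptions chosen for the example; other definitions are common, and connectedness is not covered here.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def compactness(cluster):
    """Mean distance from members to the centroid (lower is more compact)."""
    c = centroid(cluster)
    return sum(dist(p, c) for p in cluster) / len(cluster)


def separation(cluster_a, cluster_b):
    """Distance between the two centroids (higher is better separated)."""
    return dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical toy data: two square-shaped clusters, 10 units apart.
a = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
b = [(10.0, 0.0), (10.0, 2.0), (12.0, 0.0), (12.0, 2.0)]

print(compactness(a), compactness(b), separation(a, b))
```

A centroid-based compactness measure like this one favors spherical clusters, which is the limitation noted above.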
External evaluation uses pre-classified items or gold standards to validate the clustering results. The results depend on the benchmark used and thus can be biased[^2]. A good score on a benchmark therefore does not guarantee real-world performance.
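One common external measure is the Rand index, which counts the fraction of item pairs on which the clustering and the gold standard agree (both put the pair together, or both keep it apart). The following is a minimal sketch; the function name and the toy labelings are assumptions made for illustration.

```python
from itertools import combinations


def rand_index(predicted, gold):
    """Fraction of item pairs on which the two labelings agree."""
    if len(predicted) != len(gold):
        raise ValueError("both labelings must cover the same items")
    if len(gold) < 2:
        raise ValueError("need at least two items to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred == same_gold:
            agree += 1
        total += 1
    return agree / total


# Hypothetical gold-standard classes and a clustering result for six items.
gold = ["a", "a", "a", "b", "b", "b"]
predicted = [0, 0, 1, 1, 2, 2]

score = rand_index(predicted, gold)
print(score)
```

Only the grouping matters, not the label names, so cluster IDs do not need to match class names.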
Internal evaluation normally compares intra-group similarity with inter-group similarity: a good clustering has high similarity within groups and low similarity between groups.
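The silhouette coefficient is one widely used internal measure of this kind. For each point it compares the mean distance to its own cluster (`a`) with the smallest mean distance to any other cluster (`b`). The sketch below is a plain-Python version using Euclidean distance; the function name, the singleton convention, and the toy data are assumptions for the example.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient over all points, in the range [-1, 1]."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 1-D data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])
print(good, bad)
```

Values near 1 indicate tight, well-separated groups; negative values suggest that points sit closer to another cluster than to their own.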
Relative evaluation compares the results of different methods, or of the same method with different parameters.
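In practice, such a comparison can be made by scoring every candidate clustering with the same index and ranking the results. The sketch below uses the Dunn index (smallest between-cluster distance divided by largest cluster diameter) as that index. The candidate labelings are hand-written stand-ins for the output of two hypothetical runs with different numbers of clusters, not the result of an actual algorithm.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest cluster diameter."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two members of the same cluster.
    diameter = max(
        (dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two members of different clusters.
    gap = min(
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return gap / diameter if diameter > 0 else float("inf")


# Hypothetical 1-D data and two candidate labelings to compare.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (10.0,), (11.0,)]
candidates = {
    "k=2": [0, 0, 0, 0, 1, 1],
    "k=3": [0, 0, 1, 1, 2, 2],
}

scores = {name: dunn_index(points, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The ranking depends on the index chosen, so it is worth checking whether several indices agree before settling on a method or parameter value.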
To read
- On Using Class-Labels in Evaluation of Clusterings
- A Review Of Monte Carlo Tests Of Cluster Analysis
- K-means Clustering versus Validation Measures: A Data Distribution Perspective
[^1]: Halkidi, M.; Batistakis, Y.; Vazirgiannis, M. (2001). "On Clustering Validation Techniques". Journal of Intelligent Information Systems 17: 107-145. doi:10.1023/A:1012801612483. http://www.springerlink.com/content/k43h06u025w2x4q6/.
[^2]: Handl, J.; Knowles, J.; Kell, D. B. (Aug 2005). "Computational cluster validation in post-genomic data analysis". Bioinformatics 21 (15): 3201-3212. doi:10.1093/bioinformatics/bti517. PMID 15914541.