Evaluation of the quality of a clustering is a non-trivial and often ill-posed task. In fact, many definitions of objective functions for clusterings exist [JD88]. Internal criteria formulate quality as a function of the given data and/or similarities. For example, the mean squared error criterion (for -means) and other measures of compactness are popular evaluation criteria. Measures can also be based on isolation such as the min-cut criterion, which uses the sum of edge weights across clusters (for graph partitioning). When using internal criteria, clustering becomes an optimization problem, and a clusterer can evaluate its own performance and tune its results accordingly.
External criteria on the other hand impose quality by additional, external information not given to the clusterer, such as category labels. This is sometimes more appropriate since groupings are ultimately evaluated externally by humans. For example, when objects have already been categorized by an external source, i.e. when class labels are available, we can use information theoretic measures to quantify the match between the categorization and the clustering. Previously, average purity and entropy-based measures to assess clustering quality from 0 (worst) to 1 (best) have been used [BGG$^+$99]. While average purity is intuitive to understand, it is biased to favor small clusters. Singletons, in fact, are scored as perfect. Also, a monolithic clustering (one single cluster for all objects) receives a score as high as the maximum category prior probability. In unstratified data-sets, this prior might be close to 1, while the desired value should be close to 0. An entropy-based measure is better than purity but is still biased towards small clusters (e.g., a set of singletons is always considered perfect).
Using a measure based on normalized mutual information (equation 5.2) fixes both biases (see chapter 4): Monolithic clusterings are evaluated at 0 and singletons are severely discounted compared to the best clustering.5.9 However, the mutual information for the best clustering is smaller than 1 unless all categories have equal prior probabilities. Thus, mutual information provides an unbiased and conservative measure as compared to purity and entropy. Given categories (or classes), we use the categorization labels to evaluate the cluster quality using mutual information , as defined in equation 5.2.