Evaluation of the quality of a clustering is a non-trivial and often
ill-posed task. In fact, many definitions of objective functions for
clusterings exist [JD88]. Internal criteria formulate
quality as a function of the given data and/or similarities. For
example, the mean squared error criterion (for $k$-means) and other
measures of compactness are popular evaluation criteria. Measures
can also be based on isolation, such as the min-cut criterion (for graph
partitioning), which uses the sum of edge weights across clusters.
When using internal criteria, clustering becomes an optimization
problem, and a clusterer can evaluate its own performance and tune its
results accordingly.
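As a concrete illustration of such an internal criterion, the following sketch computes the within-cluster mean squared error, i.e., the average squared distance of each point to its cluster centroid. The function name and data layout are illustrative and not taken from the text.

\begin{verbatim}
import numpy as np

def mean_squared_error_criterion(X, labels):
    # Compactness: average squared distance of each point to the centroid
    # of its assigned cluster (the k-means objective divided by n).
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        total += np.sum((members - centroid) ** 2)
    return total / len(X)
\end{verbatim}

Lower values indicate more compact clusters; a clusterer optimizing an internal criterion such as this one can tune its own results without any external information.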
External criteria, on the other hand, assess quality using additional, external information not given to the clusterer, such as category labels. This is sometimes more appropriate, since groupings are ultimately evaluated externally by humans. For example, when objects have already been categorized by an external source, i.e., when class labels are available, we can use information theoretic measures to quantify the match between the categorization and the clustering.

Previously, average purity and entropy-based measures have been used to assess clustering quality on a scale from 0 (worst) to 1 (best) [BGG$^+$99]. While average purity is intuitive, it is biased in favor of small clusters: singletons are scored as perfect. Moreover, a monolithic clustering (a single cluster containing all objects) receives a score as high as the maximum category prior probability. In unstratified data sets, this prior may be close to 1, while the desired value should be close to 0. An entropy-based measure improves on purity but is still biased towards small clusters (e.g., a set of singletons is always considered perfect).
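To make both biases concrete, here is a minimal sketch of average purity; the exact definition used in [BGG$^+$99] may differ in detail, and the labels in the usage example are hypothetical.

\begin{verbatim}
from collections import Counter

def average_purity(cluster_labels, class_labels):
    # Cluster-size-weighted purity: for each cluster, count the members
    # of its majority category, then divide the total by n.
    clusters = {}
    for c, y in zip(cluster_labels, class_labels):
        clusters.setdefault(c, []).append(y)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(class_labels)

# Hypothetical labels illustrating the two biases:
classes = [0, 0, 0, 1]                       # prior of category 0 is 0.75
print(average_purity(range(4), classes))     # all singletons  -> 1.0
print(average_purity([0, 0, 0, 0], classes)) # one big cluster -> 0.75
\end{verbatim}

The singleton clustering is scored as perfect, and the monolithic clustering scores the maximum category prior, exactly the two biases described above.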
Using a measure based on normalized mutual information (equation 5.2) fixes both biases (see chapter 4): monolithic clusterings are evaluated at 0, and singletons are severely discounted compared to the best clustering. However, the mutual information for the best clustering is smaller than 1 unless all categories have equal prior probabilities. Thus, mutual information provides an unbiased and conservative measure compared to purity and entropy. Given categories (or classes), we use the categorization labels to evaluate the cluster quality using mutual information, as defined in equation 5.2.
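Since equation 5.2 is not reproduced in this section, the following sketch uses one common normalization of mutual information, by the geometric mean of the cluster and category entropies; the normalization chosen in equation 5.2 may differ, but this choice exhibits the qualitative behavior described above (monolithic clusterings score 0, singletons are discounted).

\begin{verbatim}
import numpy as np

def normalized_mutual_information(cluster_labels, class_labels):
    # Illustrative sketch: mutual information between the clustering and
    # the categorization, normalized by the geometric mean of the two
    # entropies.  The normalization in equation 5.2 may differ.
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    n = len(class_labels)
    clusters = np.unique(cluster_labels)
    classes = np.unique(class_labels)

    # joint distribution estimated from the contingency counts
    p_joint = np.array([[np.sum((cluster_labels == c) & (class_labels == y))
                         for y in classes] for c in clusters]) / n
    p_c = p_joint.sum(axis=1)           # marginal over clusters
    p_y = p_joint.sum(axis=0)           # marginal over categories

    mask = p_joint > 0
    mi = np.sum(p_joint[mask] *
                np.log(p_joint[mask] / np.outer(p_c, p_y)[mask]))
    h_c = -np.sum(p_c * np.log(p_c))    # cluster entropy
    h_y = -np.sum(p_y * np.log(p_y))    # category entropy
    denom = np.sqrt(h_c * h_y)
    return mi / denom if denom > 0 else 0.0   # monolithic clustering -> 0
\end{verbatim}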