Evaluation Methodology

We conducted experiments with all five algorithms, using four variants (each with a different similarity measure) for both k-means and graph partitioning, yielding eleven techniques in total. This section gives an overview of ways to evaluate clustering results. A recent survey on clustering evaluation can be found in [ZK01], which focuses on how a variety of cost functions, built using distance or cosine similarity measures, affect the quality of two generic clustering approaches.

There are two fundamentally different ways of evaluating the quality
of results delivered by a clustering algorithm. *Internal*
criteria formulate quality as a function of the given data and/or
similarities; the mean squared error is a popular example. Because
such a criterion depends only on information available to the
clusterer, the clusterer can evaluate its own performance and tune
its results accordingly, so clustering becomes an optimization problem.
*External* criteria judge quality using additional information not given
to the clusterer, such as class labels. While this makes the problem
ill-defined, it is sometimes more appropriate since groupings are
ultimately evaluated externally by humans.
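
The distinction can be illustrated with a minimal Python sketch. The internal criterion below is the mean squared error mentioned above, computed from the data and the cluster assignment alone. The external criterion is purity, chosen here only as a simple illustrative example of a label-based measure; it is an assumption of this sketch, as are the function names and the toy data.

```python
from collections import Counter


def mean_squared_error(points, labels):
    """Internal criterion: mean squared distance of each point to its cluster centroid."""
    clusters = {}
    for point, cluster in zip(points, labels):
        clusters.setdefault(cluster, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total / len(points)


def purity(labels, classes):
    """External criterion (illustrative choice): fraction of points that belong
    to the majority class of their cluster. Uses class labels the clusterer never saw."""
    counts = {}
    for cluster, klass in zip(labels, classes):
        counts.setdefault(cluster, Counter())[klass] += 1
    return sum(max(c.values()) for c in counts.values()) / len(labels)


# Hypothetical toy data: two tight, well-separated clusters whose
# class labels cut across the cluster boundaries.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
cluster_labels = [0, 0, 1, 1]
class_labels = ["a", "b", "a", "b"]

print(mean_squared_error(points, cluster_labels))  # 1.0
print(purity(cluster_labels, class_labels))        # 0.5
```

In this toy case the clustering is compact under the internal criterion, yet each cluster mixes both classes, so the external score is no better than a trivial assignment. The two kinds of criteria can therefore disagree about the same result.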