We conducted experiments with all five algorithms, using four variants (involving different similarity measures) each for $k$-means and graph partitioning, yielding eleven techniques in total. This section gives an overview of ways to evaluate clustering results. A nice recent survey on clustering evaluation can be found in [ZK01], where the emphasis is on determining the impact of a variety of cost functions, built using distance or cosine similarity measures, on the quality of two generic clustering approaches.
There are two fundamentally different ways of evaluating the quality of results delivered by a clustering algorithm. Internal criteria formulate quality as a function of the given data and/or similarities. For example, the mean squared error is a popular internal criterion. Hence, the clusterer can evaluate its own performance and tune its results accordingly. When using internal criteria, clustering becomes an optimization problem. External criteria measure quality using additional, external information not given to the clusterer, such as class labels. While this makes the problem ill-defined, it is sometimes more appropriate since groupings are ultimately evaluated externally by humans.
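As a concrete instance of an internal criterion, the mean squared error of a clustering can be written as follows; the notation used here ($x_i$ for data points, $\mathcal{C}_\ell$ for clusters, $\mu_\ell$ for cluster centroids, $n$ for the number of points, $k$ for the number of clusters) is assumed for illustration rather than taken from the surrounding text:
\[
  \mathrm{MSE} = \frac{1}{n} \sum_{\ell=1}^{k} \sum_{x_i \in \mathcal{C}_\ell} \lVert x_i - \mu_\ell \rVert^2,
  \qquad
  \mu_\ell = \frac{1}{|\mathcal{C}_\ell|} \sum_{x_i \in \mathcal{C}_\ell} x_i .
\]
A clusterer minimizing this quantity solves an optimization problem defined entirely by the data itself, with no reference to external labels.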