We also employed the well-known Euclidean -means algorithm and
three variations of it using non-metric distance measures. The
-means algorithm is an iterative algorithm to minimize the least
squares error criterion [DH73]. A cluster
is represented by its center
, the mean of all samples in
. The centers are initialized with a random
selection of data objects. Each sample is then labeled with the
index of the *nearest* or *most similar* center. In
classical -means, `nearest' means the point with the smallest Euclidean
distance. However, -means can be generalized by substituting
nearness with any other notion of similarity. For our comparison, we
will use all four definitions of the similarity
between two objects
and
as introduced in the previous section. Subsequent
re-computing of the mean for each cluster and re-assigning the cluster
labels is iterated until convergence to a fixed labeling after
iterations. The complexity of this algorithm is
.

Alexander Strehl 2002-05-03