The -means Framework

Next: Robust -medoids Up: Clustering Algorithms Previous: Clustering Algorithms Contents

The -means Framework

The -means algorithm is a very simple and very powerful iterative technique to partition a data-set into disjoint clusters, where the value has to be pre-determined [DH73,Har75]. A generalized, similarity-based description of the algorithm can be given as follows:

Start at with randomly selected objects as the cluster centers $\mathbf{c}_{\ell}^{(0)}$ , $\ell \in \{1,\ldots,k \}$ .
Assign each object $\mathbf{x}_j$ to the cluster center with maximum similarity:

$\displaystyle \lambda_j^{(t)} = \arg \max_{\ell \in \{1,\ldots,k \}} s(\mathbf{x}_j, \mathbf{c}_{\ell}^{(t)})$ (2.1)
Update all cluster means:

$\displaystyle \mathbf{c}_{\ell}^{(t+1)} = \frac{1}{\vert \mathcal{C}_{\ell}^{(t)} \vert } \sum_{\lambda_j^{(t)} = \ell} \mathbf{x}_j$ (2.2)
If any $\mathbf{c}_{\ell}^{(t+1)}$ differs from $\mathbf{c}_{\ell}^{(t)}$ go to step 2 with $t \leftarrow t+1$ unless termination criteria (such as exceeding the maximum number of iterations) are met.

When similarity is based on a strictly monotonic decreasing mapping of Euclidean distances,

-means greedily minimizes the sum of squared distances of the samples $\mathbf{x}_j$ to the closest cluster centroid $\mathbf{c}_{\lambda_j}$ as given by equation 2.3.

$\displaystyle \sum_{j=1}^n \Vert \mathbf{x}_j - \mathbf{c}_{\lambda_j} \Vert _2... ...thbf{x} \in \mathcal{C}_{\ell}} \Vert \mathbf{x} - \mathbf{c}_{\ell} \Vert _2^2$

(2.3)

Next: Robust -medoids Up: Clustering Algorithms Previous: Clustering Algorithms Contents

Alexander Strehl 2002-05-03