Next: Robust -medoids
Up: Clustering Algorithms
Previous: Clustering Algorithms
  Contents
The
-means algorithm is a very simple and very powerful iterative
technique to partition a data-set into
disjoint clusters, where
the value
has to be pre-determined [DH73,Har75]. A
generalized, similarity-based description of the algorithm can be
given as follows:
- Start at
with
randomly selected objects as
the cluster centers
,
.
- Assign each object
to the cluster center with maximum
similarity:
data:image/s3,"s3://crabby-images/dcca6/dcca6b10031a99087f89bcd5eb3fa75f498bd3bd" alt="$\displaystyle \lambda_j^{(t)} = \arg \max_{\ell \in \{1,\ldots,k \}} s(\mathbf{x}_j, \mathbf{c}_{\ell}^{(t)})$" |
(2.1) |
- Update all
cluster means:
data:image/s3,"s3://crabby-images/7d576/7d5763ffdf52c391f3e9ed597ca783ac240e136e" alt="$\displaystyle \mathbf{c}_{\ell}^{(t+1)} = \frac{1}{\vert \mathcal{C}_{\ell}^{(t)} \vert } \sum_{\lambda_j^{(t)} = \ell} \mathbf{x}_j$" |
(2.2) |
- If any
differs from
go to step 2 with
unless termination criteria (such as exceeding the maximum number of
iterations) are met.
When similarity is based on a strictly monotonic decreasing mapping of
Euclidean distances,
-means greedily minimizes the sum of squared
distances of the samples
to the closest cluster
centroid
as given by equation
2.3.
data:image/s3,"s3://crabby-images/d8e37/d8e378bc901a78e8ab74d93b82a30bccd178886e" alt="$\displaystyle \sum_{j=1}^n \Vert \mathbf{x}_j - \mathbf{c}_{\lambda_j} \Vert _2...
...thbf{x} \in \mathcal{C}_{\ell}} \Vert \mathbf{x} - \mathbf{c}_{\ell} \Vert _2^2$" |
(2.3) |
Next: Robust -medoids
Up: Clustering Algorithms
Previous: Clustering Algorithms
  Contents
Alexander Strehl
2002-05-03