Essentially, if two objects are in the same cluster then they are
considered to be fully similar, and if not they are dissimilar. This
is the simplest heuristic and is used in the Cluster-based Similarity
Partitioning Algorithm (CSPA). With this viewpoint, one can simply
reverse engineer a single clustering into a binary similarity matrix.
Similarity between two objects is 1 if they are in the same cluster
and 0 otherwise.
For each clustering, a
binary similarity matrix is
created. The entry-wise average of
such matrices representing the
sets of groupings yields an overall similarity matrix.
Figure 5.3 illustrates the generation of the
cluster-based similarity matrix for the example given in table
5.1.
Alternatively, and more concisely, this can be interpreted as using
binary cluster membership features and defining similarity as
the fraction of clusterings in which two objects are in the same
cluster. The entire
similarity matrix
can be
computed in one sparse matrix multiplication
.5.4
Now, we can use the similarity matrix to recluster the objects using any reasonable similarity based clustering algorithm. We chose to partition the induced similarity graph (vertex = object, edge weight = similarity) using METIS [KK98a] because of its robust and scalable properties.
![]() |