Next: Discussion and Comparison
Up: Efficient Consensus Functions
Previous: HyperGraph Partitioning Algorithm (HGPA)
Contents
MetaCLustering Algorithm (MCLA)
In this subsection, we introduce the third algorithm to solve the
cluster ensemble problem. The MetaCLustering Algorithm (MCLA) is
based on clustering clusters. It also yields object-wise confidence
estimates of cluster membership.
As before, we represent each cluster by a hyperedge. The idea in MCLA is to
group and collapse related hyperedges and assign each object to the
collapsed hyperedge in which it participates most strongly. The
hyperedges that are considered related for the purpose of collapsing
are determined by a graph-based clustering of hyperedges. We refer to
each cluster of hyperedges as a metacluster. Collapsing reduces the number of
hyperedges from $\sum_{q=1}^{r} k^{(q)}$ to $k$.
The detailed steps are:
 Construct Metagraph.
 Let us view all the $\sum_{q=1}^{r} k^{(q)}$
indicator vectors $h$
(the hyperedges of $H$) as
vertices of another regular undirected graph, the metagraph. The edge
weights are proportional to the similarity between vertices. A
suitable similarity measure here is the binary Jaccard measure, since
it is the ratio of the intersection to the union of the sets of
objects corresponding to the two hyperedges. Formally, the edge
weight $w_{a,b}$ between two vertices $a$ and $b$ as defined by
the binary Jaccard measure of the corresponding indicator vectors
$h_a$ and $h_b$ is:
$w_{a,b} = \frac{h_a^{\top} h_b}{\|h_a\|_2^2 + \|h_b\|_2^2 - h_a^{\top} h_b}$.
Since the clusters are non-overlapping (i.e., hard), there are no edges
amongst vertices of the same clustering
and, thus,
the metagraph is $r$-partite, as shown in figure
5.5.
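The binary Jaccard edge weights described above can be sketched as follows. This is a minimal illustration assuming an indicator matrix `H` of shape (n_objects, n_hyperedges) with 0/1 entries; the function names are our own, not from the original implementation.

```python
import numpy as np

def jaccard_similarity(h_a, h_b):
    """Binary Jaccard measure between two 0/1 indicator vectors:
    size of the intersection over size of the union of the
    corresponding object sets."""
    h_a = np.asarray(h_a, dtype=bool)
    h_b = np.asarray(h_b, dtype=bool)
    union = np.logical_or(h_a, h_b).sum()
    if union == 0:
        return 0.0
    return np.logical_and(h_a, h_b).sum() / union

def metagraph_weights(H):
    """Pairwise edge weights of the metagraph.
    H is (n_objects x n_hyperedges); each column is one hyperedge."""
    m = H.shape[1]
    W = np.zeros((m, m))
    for a in range(m):
        for b in range(m):
            if a != b:
                W[a, b] = jaccard_similarity(H[:, a], H[:, b])
    return W
```

Because clusters within one clustering are disjoint, any two columns from the same clustering get weight 0, which is what makes the metagraph $r$-partite.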
 Cluster Hyperedges.
 Find matching labels by partitioning the metagraph into $k$ balanced
metaclusters.
Each vertex is weighted proportionally
to the size of the corresponding cluster.
Balancing ensures that the sum of vertex weights is approximately
the same in each metacluster.
We use the graph partitioning package METIS in this step.
This results in a clustering of the $h$
vectors.
Since each vertex in the metagraph represents a distinct cluster label, a
metacluster represents a group of corresponding labels.
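The grouping of hyperedges can be sketched without METIS. The toy routine below is a greedy average-linkage stand-in that merges the most similar vertex groups until $k$ remain; unlike METIS it does not enforce the vertex-weight balance constraint, so treat it as illustrative only.

```python
import numpy as np

def cluster_hyperedges(W, k):
    """Group the m metagraph vertices (hyperedges) into k metaclusters
    by greedy average-linkage merging on the similarity matrix W.
    A simple stand-in for the balanced METIS partitioning used in
    the original algorithm."""
    m = W.shape[0]
    clusters = [[i] for i in range(m)]
    while len(clusters) > k:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average similarity between the two candidate groups
                s = np.mean([W[a, b] for a in clusters[i] for b in clusters[j]])
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        clusters[i] += clusters[j]
        del clusters[j]
    labels = np.empty(m, dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```

In practice one would call METIS (e.g., through a binding such as pymetis) with vertex weights set to the cluster sizes to obtain the balanced partition the text describes.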
 Collapse Metaclusters.
 For each of the $k$ metaclusters, we collapse the hyperedges into a
single metahyperedge. Each metahyperedge has an association vector
which contains an entry for each object describing its level of
association with the corresponding metacluster. The level is
computed by averaging all indicator vectors $h$ of a
particular metacluster.^{5.5}
An entry of 0 or 1 indicates the weakest or
strongest association, respectively.
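The collapsing step reduces to a column average per metacluster. A minimal sketch, again assuming the indicator matrix `H` (n_objects x n_hyperedges) and the metacluster labels from the previous step:

```python
import numpy as np

def collapse_metaclusters(H, labels, k):
    """Average the indicator vectors (columns of H) belonging to each
    metacluster. Returns an (n_objects x k) association matrix whose
    entry (i, c) in [0, 1] is object i's level of association with
    metacluster c: 0 is weakest, 1 is strongest."""
    n = H.shape[0]
    A = np.zeros((n, k))
    for c in range(k):
        A[:, c] = H[:, labels == c].mean(axis=1)
    return A
```

An object that appears in every hyperedge of a metacluster gets association 1 with it; an object in none of them gets 0.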
 Compete for Objects.
 In this step, each object is assigned to its
most associated metacluster.
Specifically, an object is assigned to the metacluster with the highest
entry in the association vector. Ties are broken
randomly. The confidence of an assignment is reflected by the
winner's share of association (ratio of the winner's association to
the sum of all other associations). Note that not every
metacluster can be guaranteed to win at least one object. Thus,
there are at most $k$ labels in the final combined clustering $\lambda$.
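The competition step can be sketched as a per-object argmax over the association matrix. Here the confidence is computed as the winner's association divided by the object's total association, which is one reasonable reading of "winner's share"; the helper name is ours.

```python
import numpy as np

def compete_for_objects(A, seed=None):
    """Assign each object (row of the association matrix A) to its most
    associated metacluster; ties are broken randomly. Returns the
    combined labels and a per-object confidence, taken here as the
    winner's association divided by the total association."""
    rng = np.random.default_rng(seed)
    n, k = A.shape
    labels = np.empty(n, dtype=int)
    for i in range(n):
        winners = np.flatnonzero(A[i] == A[i].max())
        labels[i] = rng.choice(winners)  # random tie-breaking
    totals = A.sum(axis=1)
    conf = np.where(totals > 0,
                    A[np.arange(n), labels] / np.maximum(totals, 1e-12),
                    0.0)
    return labels, conf
```

Since the assignment is a pure argmax, a metacluster that never attains the maximum for any object wins nothing, which is why the final clustering can have fewer than $k$ labels.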
Figure 5.5 illustrates metaclustering
for the example given in table
5.1, where $n = 7$, $r = 4$, and $k = 3$. Figure
5.5 shows the original 4-partite metagraph.
The three metaclusters are shown in red, blue, and
green.
Consider the first metacluster
(the red
markers in figure
5.5). Collapsing its hyperedges yields an
object-weighted metahyperedge
whose association vector averages the participating indicator
vectors. Subsequently, this metacluster
wins the competition for
the vertices/objects with which it is most strongly associated, and thus
represents the corresponding cluster
in the resulting integrated clustering.
Our proposed metaclustering algorithm robustly outputs
one of the 6 optimal clusterings, which is
equivalent to two of the input clusterings.
The uncertainty about some objects is
reflected in the confidences 3/4, 1, 2/3, 1, 1/2, 1, and 1 for objects
1 through 7, respectively.
Figure 5.5:
Illustration of the MetaCLustering Algorithm (MCLA) for
the cluster ensemble example problem given in table
5.1.
The 4-partite metagraph is shown. Edge darkness increases with edge
weight. The vertex positions are slightly perturbed to
expose otherwise occluded edges. The three metaclusters are shown in
red, blue, and green.

Alexander Strehl
2002-05-03