

HyperGraph Partitioning Algorithm (HGPA)

The second algorithm is a direct approach to cluster ensembles that re-partitions the data using the given clusters as indications of strong bonds. The cluster ensemble problem is formulated as partitioning the hypergraph by cutting a minimal number of hyperedges. We call this approach the HyperGraph Partitioning Algorithm (HGPA). All hyperedges are considered to have the same weight, and all vertices are equally weighted. Note that this approach includes $ n_{\ell}$-way relationship information, while CSPA only considers pairwise relationships. We look for a hyperedge separator that partitions the hypergraph (figure 5.4) into $ k$ unconnected components of approximately the same size. Approximately equal sizes are enforced by maintaining a vertex imbalance of at most 5%, as formulated by the following constraint: $ k \cdot \max_{\ell \in \{1,\ldots,k\}}
\frac{n_{\ell}}{n} \leq 1.05$.
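The imbalance constraint above can be sketched as a simple check on the cluster sizes $ n_{\ell}$. The cluster sizes below are hypothetical and chosen only for illustration; they are not taken from the example in table 5.1.

```python
def is_balanced(cluster_sizes, tol=1.05):
    """Check the HGPA vertex-imbalance constraint
    k * max_l(n_l) / n <= tol, where tol = 1.05 allows
    at most 5% imbalance across the k components."""
    k = len(cluster_sizes)
    n = sum(cluster_sizes)
    return k * max(cluster_sizes) / n <= tol

# Hypothetical sizes for n = 100 objects in k = 3 clusters:
print(is_balanced([34, 33, 33]))  # 3 * 34/100 = 1.02 -> True
print(is_balanced([40, 30, 30]))  # 3 * 40/100 = 1.20 -> False
```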

Hypergraph partitioning is a well-studied area [KL70,AK95] and algorithmic details are omitted here for brevity. We use the hypergraph partitioning package HMETIS [KAKS97], which produces high-quality partitions and is very scalable. However, note that hypergraph partitioning in general has no provision for partially cut hyperedges: there is no sensitivity to how much of a hyperedge remains in the same group after the cut. This can be problematic for our applications. Consider the example from table 5.1 and assume, for simplicity, that only the three hyperedges from $ \mathbf{\lambda}^{(1)}$ are present. The two partitionings $ \{\{x_1, x_2, x_7\},\{x_3, x_4\}, \{x_5, x_6\}
\}$ and $ \{ \{x_1, x_7\}, \{x_3, x_4\}, \{x_2, x_5, x_6\}\}$ both cut all three hyperedges. The first is intuitively superior, because 2/3 of the hyperedge $ \{x_1, x_2, x_3\}$ remains intact versus only 1/3 in the second. In standard hypergraph partitioning, however, they have equivalent quality since both cut the same number of hyperedges.
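This insensitivity can be demonstrated directly. The sketch below counts cut hyperedges for the two partitionings above; the three hyperedges of $ \mathbf{\lambda}^{(1)}$ are assumed, consistently with the text and figure 5.4, to be $ \{x_1,x_2,x_3\}$, $ \{x_4,x_5\}$, and $ \{x_6,x_7\}$.

```python
def num_cut(hyperedges, partitioning):
    """Count hyperedges whose vertices do not all lie in one part.
    This is the standard (unweighted) hyperedge-cut objective."""
    cut = 0
    for e in hyperedges:
        # A hyperedge is cut unless some single part contains it entirely.
        if not any(e <= part for part in partitioning):
            cut += 1
    return cut

edges = [{1, 2, 3}, {4, 5}, {6, 7}]      # hyperedges of lambda^(1) (assumed)
p1 = [{1, 2, 7}, {3, 4}, {5, 6}]         # keeps 2/3 of {x1, x2, x3} together
p2 = [{1, 7}, {3, 4}, {2, 5, 6}]         # keeps only 1/3 of {x1, x2, x3}
print(num_cut(edges, p1), num_cut(edges, p2))  # -> 3 3 (equivalent quality)
```

Both partitionings score identically under the cut objective, even though the first preserves more of the first hyperedge.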

Figure 5.4: Illustration of HyperGraph Partitioning Algorithm (HGPA) for the cluster ensemble example problem given in table 5.1. Each hyperedge is represented by a closed curve enclosing the vertices it connects. Hyperedges from the same clustering have the same color. The combined clustering $ \{\{x_1, x_2, x_3\}, \{x_4, x_5\},
\{x_6, x_7\}\}$ has the minimal hyperedge cut of 4 and is as balanced as possible for 3 clusters of 7 objects.
\resizebox{.7\textwidth}{.4\textwidth}{\includegraphics*{eps/suprahyp3332new}}
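The minimal cut of 4 stated in the caption can be checked with the same cut-counting logic. The full hyperedge list below is an assumed reconstruction of the ensemble in table 5.1 (four clusterings of seven objects, with two objects unlabeled in the fourth clustering); it is included only as an illustration.

```python
def num_cut(hyperedges, partitioning):
    """Count hyperedges not fully contained in any single part."""
    return sum(1 for e in hyperedges
               if not any(e <= part for part in partitioning))

# Assumed reconstruction of the hyperedges induced by table 5.1:
hyperedges = [
    {1, 2, 3}, {4, 5}, {6, 7},     # lambda^(1)
    {1, 2, 3}, {4, 5}, {6, 7},     # lambda^(2)
    {1, 2}, {3, 4}, {5, 6, 7},     # lambda^(3)
    {1, 4}, {2, 5},                # lambda^(4), labeled objects only
]
consensus = [{1, 2, 3}, {4, 5}, {6, 7}]  # combined clustering of figure 5.4
print(num_cut(hyperedges, consensus))    # -> 4
```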


Alexander Strehl 2002-05-03