The second algorithm is a direct approach to cluster ensembles that re-partitions the data using the given clusters as indications of strong bonds. The cluster ensemble problem is formulated as partitioning the hypergraph by cutting a minimal number of hyperedges. We call this approach the HyperGraph Partitioning Algorithm (HGPA). All hyperedges are considered to have the same weight. Also, all vertices are equally weighted. Note, that this includes -way relationship information, while CSPA only considers pairwise relationships. Now, we look for a hyperedge separator that partitions the hypergraph (figure 5.4) into unconnected components of approximately the same size. Equal sizes are obtained by maintaining a vertex imbalance of at most 5% as formulated by the following constraint: .
Hypergraph partitioning is a well-studied area [KL70,AK95] and algorithmic details are omitted here for brevity. We use the hypergraph partitioning package HMETIS [KAKS97]. HMETIS gives high-quality partitions and is very scalable. However, please note that hypergraph partitioning in general has no provision for partially cut hyperedges. This means that there is no sensitivity to how much of a hyperedge is left in the same group after the cut. This can be problematic for our applications. Let us consider the example from table 5.1. For simplicity, let us assume that only the three hyperedges from are present. The two partitionings and both cut all three hyperedges. The first is intuitively superior, because 2/3 of the hyperedge `remain' versus only 1/3 in the second. However, in standard hypergraph partitioning they have equivalent quality since both cut the same number of hyperedges.
|