A hypergraph is a graph whose edges can connect more than two
vertices (hyperedges). The clustering problem is then formulated as a
finding the minimum-cut of a hypergraph. A minimum-cut is the removal
of the set of hyperedges (with minimum edge weight) that separates
the hypergraph into unconnected components. Again, an object
maps to a vertex
. Each word (feature) maps to a
hyperedge connecting all vertices with non-zero frequency count of
this word. The weight of this hyperedge is chosen to be the total
number of occurrences in the data-set. Hence, the importance of a
hyperedge during partitioning is proportional to the occurrence of
the corresponding word. The minimum-cut of this hypergraph into
unconnected components gives the desired clustering. We employ the
HMETIS package
for partitioning. An advantage of this approach is that the clustering
problem can be mapped to a graph problem without the explicit
computation of similarity, which makes this approach computationally
efficient with
assuming a (close to) linear
performing hypergraph partitioner. Note that, sample-wise frequency
information gets lost in this formulation since there is only a single
weight associated with a hyperedge.