next up previous contents
Next: Cluster-based Similarity Partitioning Algorithm Up: Efficient Consensus Functions Previous: Efficient Consensus Functions   Contents


Representing Sets of Clusterings as a Hypergraph

The first step for both of our proposed consensus functions is to transform the given cluster label vectors into a suitable hypergraph representation. In this subsection, we describe how any set of clusterings can be mapped to a hypergraph. A hypergraph consists of vertices and hyperedges. An edge in a regular graph connects exactly 2 vertices. A hyperedge is a generalization of an edge in that it can connect any set of vertices.

For each label vector $ \mathbf{\lambda}^{(q)} \in \mathbb{N}^n$, we construct the binary membership indicator matrix $ \mathbf{H}^{(q)} \in
\mathbb{N}^{n \times k^{(q)}}$ in which each cluster is represented as a hyperedge (column), as illustrated in table 5.1.5.3All entries of a row in the binary membership indicator matrix $ \mathbf{H}^{(q)}$ add to 1, if the row corresponds to an object with known label. Rows for objects with unknown label are all zero.


Table 5.1: Illustrative cluster ensemble problem with $ r=4$, $ k^{(1,\ldots,3)}=3$, and $ k^{(4)}=2$: Original label vectors (left) and equivalent hypergraph representation with 11 hyperedges (right). Each cluster is transformed into a hyperedge.
         
  $ \mathbf{\lambda}^{(1)}$ $ \mathbf{\lambda}^{(2)}$ $ \mathbf{\lambda}^{(3)}$ $ \mathbf{\lambda}^{(4)}$
$ x_1$ 1 2 1 1
$ x_2$ 1 2 1 2
$ x_3$ 1 2 2 ?
$ x_4$ 2 3 2 1
$ x_5$ 2 3 3 2
$ x_6$ 3 1 3 ?
$ x_7$ 3 1 3 ?
$ \Leftrightarrow$
  $ \mathbf{H}^{(1)}$     $ \mathbf{H}^{(2)}$     $ \mathbf{H}^{(3)}$     $ \mathbf{H}^{(4)}$  
  $ \mathbf{h}_{1}$ $ \mathbf{h}_{2}$ $ \mathbf{h}_{3}$ $ \mathbf{h}_{4}$ $ \mathbf{h}_{5}$ $ \mathbf{h}_{6}$ $ \mathbf{h}_{7}$ $ \mathbf{h}_{8}$ $ \mathbf{h}_{9}$ $ \mathbf{h}_{10}$ $ \mathbf{h}_{11}$
$ v_1$ 1 0 0 0 1 0 1 0 0 1 0
$ v_2$ 1 0 0 0 1 0 1 0 0 0 1
$ v_3$ 1 0 0 0 1 0 0 1 0 0 0
$ v_4$ 0 1 0 0 0 1 0 1 0 1 0
$ v_5$ 0 1 0 0 0 1 0 0 1 0 1
$ v_6$ 0 0 1 1 0 0 0 0 1 0 0
$ v_7$ 0 0 1 1 0 0 0 0 1 0 0


The concatenated block matrix $ \mathbf{H} = \mathbf{H}^{(1,\ldots,r)} =
(\mathbf{H}^{(1)} \ldots\
\mathbf{H}^{(r)})$ defines the adjacency matrix of a hypergraph with $ n$ vertices and $ \sum_{q=1}^{r} k^{(q)}$ hyperedges. Each column vector $ \mathbf{h}_a$ specifies a hyperedge $ h_a$, where 1 indicates that the vertex corresponding to the row is part of that hyperedge and 0 indicates that it is not. Thus, we have mapped each cluster to a hyperedge and the set of clusterings to a hypergraph.


next up previous contents
Next: Cluster-based Similarity Partitioning Algorithm Up: Efficient Consensus Functions Previous: Efficient Consensus Functions   Contents
Alexander Strehl 2002-05-03