Representing Sets of Clusterings as a Hypergraph

The first step for both of our proposed consensus functions is to transform the given cluster label vectors into a suitable hypergraph representation. In this subsection, we describe how any set of clusterings can be mapped to a hypergraph. A hypergraph consists of vertices and hyperedges. An edge in a regular graph connects exactly 2 vertices. A hyperedge is a generalization of an edge in that it can connect any set of vertices.

For each label vector $\mathbf{\lambda}^{(q)} \in \mathbb{N}^n$ , we construct the binary membership indicator matrix $\mathbf{H}^{(q)} \in \mathbb{N}^{n \times k^{(q)}}$ in which each cluster is represented as a hyperedge (column), as illustrated in table 5.1.^5.3All entries of a row in the binary membership indicator matrix $\mathbf{H}^{(q)}$ add to 1, if the row corresponds to an object with known label. Rows for objects with unknown label are all zero.

Table 5.1: Illustrative cluster ensemble problem with

, $k^{(1,\ldots,3)}=3$ , and $k^{(4)}=2$ : Original label vectors (left) and equivalent hypergraph representation with 11 hyperedges (right). Each cluster is transformed into a hyperedge.


$\mathbf{\lambda}^{(1)}$	$\mathbf{\lambda}^{(2)}$	$\mathbf{\lambda}^{(3)}$	$\mathbf{\lambda}^{(4)}$
1	2	1	1
1	2	1	2
1	2	2	?
2	3	2	1
2	3	3	2
3	1	3	?
3	1	3	?

$\Leftrightarrow$

$\mathbf{H}^{(1)}$			$\mathbf{H}^{(2)}$			$\mathbf{H}^{(3)}$			$\mathbf{H}^{(4)}$
$\mathbf{h}_{1}$	$\mathbf{h}_{2}$	$\mathbf{h}_{3}$	$\mathbf{h}_{4}$	$\mathbf{h}_{5}$	$\mathbf{h}_{6}$	$\mathbf{h}_{7}$	$\mathbf{h}_{8}$	$\mathbf{h}_{9}$	$\mathbf{h}_{10}$	$\mathbf{h}_{11}$
1	0	0	0	1	0	1	0	0	1	0
1	0	0	0	1	0	1	0	0	0	1
1	0	0	0	1	0	0	1	0	0	0
0	1	0	0	0	1	0	1	0	1	0
0	1	0	0	0	1	0	0	1	0	1
0	0	1	1	0	0	0	0	1	0	0
0	0	1	1	0	0	0	0	1	0	0

The concatenated block matrix $\mathbf{H} = \mathbf{H}^{(1,\ldots,r)} = (\mathbf{H}^{(1)} \ldots\ \mathbf{H}^{(r)})$ defines the adjacency matrix of a hypergraph with

vertices and $\sum_{q=1}^{r} k^{(q)}$ hyperedges. Each column vector $\mathbf{h}_a$ specifies a hyperedge

, where 1 indicates that the vertex corresponding to the row is part of that hyperedge and 0 indicates that it is not. Thus, we have mapped each cluster to a hyperedge and the set of clusterings to a hypergraph.