Next: Cluster-based Similarity Partitioning Algorithm Up: Efficient Consensus Functions Previous: Efficient Consensus Functions   Contents

Representing Sets of Clusterings as a Hypergraph

The first step for both of our proposed consensus functions is to transform the given cluster label vectors into a suitable hypergraph representation. In this subsection, we describe how any set of clusterings can be mapped to a hypergraph. A hypergraph consists of vertices and hyperedges. An edge in a regular graph connects exactly 2 vertices. A hyperedge is a generalization of an edge in that it can connect any set of vertices.

For each label vector , we construct the binary membership indicator matrix in which each cluster is represented as a hyperedge (column), as illustrated in table 5.1.5.3All entries of a row in the binary membership indicator matrix add to 1, if the row corresponds to an object with known label. Rows for objects with unknown label are all zero.

Table 5.1: Illustrative cluster ensemble problem with , , and : Original label vectors (left) and equivalent hypergraph representation with 11 hyperedges (right). Each cluster is transformed into a hyperedge.
 1 2 1 1 1 2 1 2 1 2 2 ? 2 3 2 1 2 3 3 2 3 1 3 ? 3 1 3 ?
 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0

The concatenated block matrix defines the adjacency matrix of a hypergraph with vertices and hyperedges. Each column vector specifies a hyperedge , where 1 indicates that the vertex corresponding to the row is part of that hyperedge and 0 indicates that it is not. Thus, we have mapped each cluster to a hyperedge and the set of clusterings to a hypergraph.

Next: Cluster-based Similarity Partitioning Algorithm Up: Efficient Consensus Functions Previous: Efficient Consensus Functions   Contents
Alexander Strehl 2002-05-03