

Objective Function for Cluster Ensembles

Given $r$ groupings, with the $q$-th grouping $\mathbf{\lambda}^{(q)}$ having $k^{(q)}$ clusters, a consensus function $\Gamma$ is a mapping $\mathbb{N}^{n \times r} \rightarrow \mathbb{N}^{n}$ from a set of clusterings to a single integrated clustering:

$\displaystyle \Gamma : \{\mathbf{\lambda}^{(q)} \mid q \in \{1,\ldots,r\} \} \rightarrow \mathbf{\lambda}.$ (5.1)

We denote the set of groupings by $\mathbf{\Lambda} = \{\mathbf{\lambda}^{(q)} \mid q \in \{1,\ldots,r\}\}$. The optimal combined clustering should share the most information with the original clusterings. But how do we measure shared information between clusterings? In information theory, mutual information is a symmetric measure that quantifies the statistical information shared between two distributions [CT91]. Suppose there are two labelings $\mathbf{\lambda}^{(a)}$ and $\mathbf{\lambda}^{(b)}$, with $k^{(a)}$ clusters in $\mathbf{\lambda}^{(a)}$ and $k^{(b)}$ clusters in $\mathbf{\lambda}^{(b)}$. Let $n^{(h)}$ be the number of objects in cluster $\mathcal{C}_h$ according to $\mathbf{\lambda}^{(a)}$, and $n_{\ell}$ the number of objects in cluster $\mathcal{C}_{\ell}$ according to $\mathbf{\lambda}^{(b)}$. Let $n_{\ell}^{(h)}$ denote the number of objects that are in cluster $h$ according to $\mathbf{\lambda}^{(a)}$ as well as in cluster $\ell$ according to $\mathbf{\lambda}^{(b)}$. Then, the $[0,1]$-normalized mutual information criterion $\phi^{(\mathrm{NMI})}$ is computed as follows (see appendix B.1 for details):

$\displaystyle \phi^{(\mathrm{NMI})} (\mathbf{\lambda}^{(a)},\mathbf{\lambda}^{(b)}) = \frac{2}{n} \sum_{h=1}^{k^{(a)}} \sum_{\ell=1}^{k^{(b)}} n_{\ell}^{(h)} \log_{k^{(a)} \cdot k^{(b)}} \left( \frac{ n_{\ell}^{(h)}\, n} { n^{(h)} n_{\ell} } \right)$ (5.2)
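
To make equation 5.2 concrete, here is a minimal Python sketch, assuming each labeling is an equal-length sequence of integer cluster labels; the function name nmi and the use of collections.Counter for the contingency counts are illustrative choices, not part of the original formulation.

import math
from collections import Counter

def nmi(la, lb):
    # phi^(NMI) of equation 5.2 for two label vectors la, lb.
    # Sketch only; assumes k^(a) * k^(b) > 1 so the log base is valid.
    n = len(la)
    ka, kb = len(set(la)), len(set(lb))
    n_h = Counter(la)            # cluster sizes n^(h) under lambda^(a)
    n_l = Counter(lb)            # cluster sizes n_l under lambda^(b)
    n_hl = Counter(zip(la, lb))  # joint counts n_l^(h)
    total = 0.0
    for (h, l), joint in n_hl.items():
        total += joint * math.log(joint * n / (n_h[h] * n_l[l]), ka * kb)
    return 2.0 * total / n

Two identical balanced labelings give nmi(...) = 1, and independent labelings give values near 0, consistent with the $[0,1]$ normalization.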

Based on this pairwise measure, the average normalized mutual information (ANMI) between a set of $r$ labelings, $\mathbf{\Lambda}$, and a single labeling $\hat{\mathbf{\lambda}}$ is defined as follows:

$\displaystyle \phi^{(\mathrm{ANMI})}(\mathbf{\Lambda}, \hat{\mathbf{\lambda}}) = \frac{1}{r} \sum_{q = 1}^{r} \phi^{(\mathrm{NMI})}(\hat{\mathbf{\lambda}},\mathbf{\lambda}^{(q)})$ (5.3)
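
Given the nmi sketch above, equation 5.3 is a one-line average over the ensemble:

def anmi(Lambda, lam_hat):
    # phi^(ANMI) of equation 5.3: average NMI between the candidate
    # labeling lam_hat and each of the r labelings in Lambda.
    return sum(nmi(lam_hat, lam_q) for lam_q in Lambda) / len(Lambda)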

We define the optimal combined clustering $\mathbf{\lambda}^{(k-\mathrm{opt})}$ to be the one that has maximal average normalized mutual information with all individual labelings $\mathbf{\lambda}^{(q)}$ in $\mathbf{\Lambda}$, given that the desired number of consensus clusters is $k$. Our objective function is thus the ANMI of equation 5.3, and $\mathbf{\lambda}^{(k-\mathrm{opt})}$ is defined as

$\displaystyle \mathbf{\lambda}^{(k-\mathrm{opt})} = \arg \max_{\hat{\mathbf{\lambda}}} \sum_{q = 1}^{r} \phi^{(\mathrm{NMI})}(\hat{\mathbf{\lambda}},\mathbf{\lambda}^{(q)})$ (5.4)

where $\hat{\mathbf{\lambda}}$ ranges over all possible $k$-partitions [SG02a].
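
For toy problem sizes, equation 5.4 can be evaluated literally by enumerating candidate $k$-partitions. The sketch below, again with hypothetical helper names and building on nmi and anmi above, yields one canonical labeling per partition (first occurrences of labels in increasing order), so that mere relabelings are not searched twice; equation 5.6 below quantifies why this is only feasible for very small $n$.

from itertools import product

def k_partitions(n, k):
    # Yield one canonical label vector per partition of n objects
    # into exactly k non-empty clusters.
    for labels in product(range(k), repeat=n):
        seen = []
        for x in labels:
            if x not in seen:
                seen.append(x)
        if seen == list(range(k)):
            yield labels

def consensus_exhaustive(Lambda, k):
    # lambda^(k-opt) of equation 5.4 by brute-force search.
    n = len(Lambda[0])
    return max(k_partitions(n, k), key=lambda lh: anmi(Lambda, lh))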

There may be situations in which not all labels are known for all objects, i.e., some entries of the label vectors are missing. For such cases, the consensus clustering objective from equation 5.4 can be generalized by computing a weighted average of the mutual information with the known labels, with weights proportional to the comprehensiveness of each labeling as measured by its fraction of known labels. Let $\mathcal{L}^{(q)}$ be the set of object indices with known labels for the $q$-th labeling. The generalized objective function then becomes:

$\displaystyle \mathbf{\lambda}^{(k-\mathrm{opt})} = \arg \max_{\hat{\mathbf{\lambda}}} \sum_{q = 1}^{r} \vert \mathcal{L}^{(q)} \vert \; \phi^{(\mathrm{NMI})}(\hat{\mathbf{\lambda}}_{\mathcal{L}^{(q)}}, \mathbf{\lambda}^{(q)}_{\mathcal{L}^{(q)}})$ (5.5)
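
Below is a sketch of the weighted sum inside equation 5.5, assuming known[q] holds the index set $\mathcal{L}^{(q)}$ (any collection of object indices); restricting both labelings to those indices and weighting by $\vert\mathcal{L}^{(q)}\vert$ mirrors the formula.

def generalized_objective(Lambda, known, lam_hat):
    # The sum in equation 5.5: each NMI term is computed on the
    # objects with known labels in the q-th labeling, weighted by
    # the number of such objects.
    score = 0.0
    for lam_q, L_q in zip(Lambda, known):
        idx = sorted(L_q)
        score += len(idx) * nmi([lam_hat[i] for i in idx],
                                [lam_q[i] for i in idx])
    return score

The consensus labeling of equation 5.5 is then max(k_partitions(n, k), key=lambda lh: generalized_objective(Lambda, known, lh)), exactly as in the exhaustive search above.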

For finite populations, the trivial solution is an exhaustive search through all possible clusterings with $k$ labels for the one with maximum ANMI. However, for $n$ objects and $k$ clusters there are

$\displaystyle \frac{1}{k!} \sum_{\ell=1}^{k} \binom{k}{\ell} (-1)^{k-\ell} \ell^{n}$ (5.6)

possible clusterings, or approximately $k^n / k!$ for $n \gg k$. For example, there are 171,798,901 ways to form 4 groups of 16 objects. For 8 groups of 32 objects this number grows to over $1.75\cdot10^{24}$, and for 16 groups of 64 objects to over $4.23\cdot10^{63}$. Clearly, this exponential growth in $n$ makes a full search prohibitive. Having defined the objective, we now propose three algorithms for finding good heuristic solutions. Greedy optimization of the ANMI objective is difficult because the underlying problem is a hard combinatorial one, so the three algorithms proposed in this chapter are built from intuitive ideas for maximizing the objective.
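
The count in equation 5.6 is the Stirling number of the second kind, $S(n,k)$, and the figures quoted above can be checked with exact integer arithmetic; a small sketch (math.comb requires Python 3.8+):

from math import comb, factorial

def num_k_partitions(n, k):
    # Equation 5.6: number of ways to partition n objects into
    # exactly k non-empty clusters (Stirling number S(n, k)).
    return sum((-1) ** (k - l) * comb(k, l) * l ** n
               for l in range(1, k + 1)) // factorial(k)

print(num_k_partitions(16, 4))   # 171798901
print(num_k_partitions(32, 8))   # roughly 1.75e24
print(num_k_partitions(64, 16))  # roughly 4.23e63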

