
Summary

In this chapter we introduced the cluster ensemble problem and provided three effective and efficient algorithms to solve it. We defined an objective function based on mutual information. It lets us automatically select the best solution from several algorithms, and it also lets us build a supra-consensus function. Our experiments showed how cluster ensembles can add robustness, speed up clustering algorithms whose complexity is superlinear, and dramatically improve "sets of subspace clusterings" across a wide variety of domains. In document clustering of Yahoo! web pages, we showed that combining, for example, 20 clusterings, each obtained from only 128 random words, can more than double the quality compared to the best single result. Some of the algorithms and data sets are available for download at http://strehl.com/.

The cluster ensemble is a very general framework that enables a wide range of applications. We are especially interested in new applications for knowledge reuse and distributed clustering.
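To make the mutual-information-based selection more concrete, here is a minimal sketch of how a supra-consensus choice could be made. It assumes that two labelings are compared with normalized mutual information, NMI(a, b) = I(a; b) / sqrt(H(a) H(b)), and that each candidate consensus labeling is scored by its unweighted average NMI against all ensemble members. The candidate with the highest score wins. The function names, the handling of single-cluster labelings, and the toy labelings are hypothetical illustrations, not the released code mentioned above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of the cluster-size distribution of one labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumption of this sketch)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Hypothetical convention: two single-cluster labelings agree fully,
        # and a single-cluster labeling shares nothing with a nontrivial one.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


def average_nmi(candidate, ensemble):
    """Unweighted mean NMI of one candidate labeling against every ensemble member."""
    return sum(nmi(candidate, member) for member in ensemble) / len(ensemble)


def supra_consensus(candidates, ensemble):
    """Return the name of the candidate labeling with the highest average NMI."""
    return max(candidates, key=lambda name: average_nmi(candidates[name], ensemble))


# Toy ensemble of three labelings of six objects (hypothetical data).
ensemble = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first, with permuted labels
    [0, 0, 1, 1, 1, 1],  # a coarser partition
]
candidates = {"A": [0, 0, 1, 1, 2, 2], "B": [0, 1, 0, 1, 0, 1]}
print(supra_consensus(candidates, ensemble))  # prints "A"
```

Because NMI depends only on the partitions and not on the label values, the permuted second labeling counts as a perfect match for candidate "A". Candidate "B" is statistically independent of every ensemble member, so its average NMI is zero.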



Alexander Strehl 2002-05-03