Contributions
The goal of this dissertation is to improve cluster analysis of
complex, high-dimensional, and sparse data, especially when the
application scenario imposes constraints on the desired results and on
the distribution of and access to the data.
This dissertation utilizes ideas from pattern recognition, machine
learning, statistics, graph theory, matrix reordering, multi-learner
systems, and information theory to build a novel paradigm for cluster
analysis based on relationships. The specific contributions of this
dissertation are as follows:
- Development of a complete framework for behavioral
customer segmentation. The framework extends previous work through
domain-specific similarity measures, such as the extended Jaccard
coefficient, and constraints, such as revenue or customer balancing.
- Proposal of an intuitive and interactive clustering
visualization method based on a reordering of the similarity matrix.
- Development of a comparative framework for semi-supervised
text clustering and investigation of several popular clustering
approaches on a variety of data sets. The empirical evaluation
demonstrates how relationship-based methods improve both the quality
and the balance of the results.
- Definition of the cluster ensemble problem as a counterpart to
classification ensembles in unsupervised learning. The problem of
combining previous clusterings without resorting to the original
features is posed as a mutual information maximization problem.
- Development and comparison of three relationship-based
algorithms for the cluster ensemble problem. It is demonstrated that
all of them work well on real data and are able to deal with missing
labels and soft clusterings.
- Application of cluster ensembles to foster robustness and to enable
distributed clustering.
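
To make the mutual information view of the cluster ensemble problem more concrete, the following is a minimal sketch, not the dissertation's implementation. It scores a candidate consensus labeling by its average normalized mutual information with the labelings in an ensemble, using only the cluster labels and never the original features. The normalization by the geometric mean of the two entropies, the zero-entropy convention, and all function names (`entropy`, `mutual_information`, `nmi`, `average_nmi`) are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Mutual information divided by the geometric mean of the entropies.

    The choice of normalization is an assumption of this sketch.
    """
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # assumed convention for single-cluster labelings
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


def average_nmi(candidate, ensemble):
    """Average NMI between a candidate consensus and each ensemble member."""
    return sum(nmi(candidate, labels) for labels in ensemble) / len(ensemble)


# Three hypothetical clusterings of six objects; only labels are used.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels permuted
    [0, 0, 1, 1, 2, 2],
]
good = [0, 0, 0, 1, 1, 1]
poor = [0, 1, 0, 1, 0, 1]
print(average_nmi(good, ensemble), average_nmi(poor, ensemble))
```

Under this objective, the candidate that agrees with most ensemble members scores higher than the one that cuts across them, and the score is unaffected by how the cluster labels are named. A combining algorithm would search for the labeling that maximizes this average.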
Alexander Strehl
2002-05-03