Bioinformatics

Molecular biologists are currently engaged in some of the most impressive data collection projects. Recent genome-sequencing projects are generating enormous amounts of data on the function and structure of biological molecules and sequences. Interpreting this wealth of data may deeply affect our understanding of life at the molecular level. Important problems to which cluster analysis might be applied successfully include the prediction of protein structure and function, semi-automated drug design, interpretation of nucleotide sequences, and knowledge acquisition from genetic data.

In micro-array data, for example, an object could be a gene, and a feature could be the expression level of that gene under a particular condition. A typical data set covers thousands of genes measured under hundreds of conditions. Such data share many of the properties of the high-dimensional data investigated in this dissertation, so the relationship-based clustering approach proposed in chapters 3 and 4 seems promising here, for instance for finding interesting genes.
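To make the object/feature mapping concrete, the following minimal sketch treats each gene as an object and its expression levels across conditions as the feature vector, then derives a pairwise gene-by-gene similarity matrix. The use of Pearson correlation as the similarity measure, the function names, and the toy expression values are all assumptions made for illustration; they are not taken from chapters 3 and 4.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (norm_x * norm_y)

def similarity_matrix(expression):
    """Map every ordered pair of genes to the similarity of their profiles."""
    genes = list(expression)
    return {(g, h): pearson(expression[g], expression[h])
            for g in genes for h in genes}

# Hypothetical toy data: three genes (objects) under four conditions (features).
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],   # rises with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],   # falls as gene_a rises
}

sim = similarity_matrix(expression)
for (g, h), s in sorted(sim.items()):
    print(f"{g} vs {h}: {s:+.2f}")
```

Once the data are in this pairwise form, a clustering step only needs the similarities, not the original thousands-by-hundreds expression matrix.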

Also, the multitude of experimental results available to industrial gene expression researchers warrants investigating cluster ensembles, as proposed in chapter 5, to integrate and consolidate previous partitions of genes by function. Cluster ensembles can thereby yield robust results that "smooth out" variations between individual experiments, without requiring researchers to integrate their entire primary data.
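The idea of consolidating partitions without access to the primary data can be sketched as follows. This is a deliberately simplified stand-in, not necessarily one of the methods of chapter 5: it counts how often each pair of genes is grouped together across the given partitions and then merges genes that co-occur in more than half of them. The function names, the 0.5 threshold, and the toy label vectors are assumptions for illustration only.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    r = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / r for j in range(n)]
            for i in range(n)]

def consensus(partitions, threshold=0.5):
    """Group objects linked by co-association above the threshold.

    Clusters are the connected components of the thresholded
    co-association graph; only cluster labels are needed as input.
    """
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical labelings of five genes from three separate experiments.
# Label names are arbitrary per experiment; only co-membership matters.
partitions = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]

print(consensus(partitions))  # [0, 0, 1, 1, 1]
```

In this toy case the third gene is assigned inconsistently by the first experiment, but the majority of partitions place it with the last two genes, so the consensus follows the majority.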

Alexander Strehl 2002-05-03