In many domains, the high-dimensional representation contains
redundant information with respect to the clustering task. In document
clustering, for example, many different words and phrases can be used
to convey a similar meaning. Hence, the vector-space representation
contains highly correlated evidence and the clustering process can be
extended as follows:
- Find an appropriate projection of the original data (in $\mathbb{R}^d$)
into a concept space (in $\mathbb{R}^k$ with $k \ll d$).
- Perform clustering in the dimensionality-reduced space (see the sketch below).
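For concreteness, this two-step procedure might look like the following sketch (a hypothetical illustration using numpy and scikit-learn; the number of concepts $k$ and the choice of k-means as the clustering algorithm are assumptions for illustration, not prescribed here):

```python
import numpy as np
from sklearn.cluster import KMeans

# X: d x n term-document matrix (d words, n documents); placeholder data for illustration only.
rng = np.random.default_rng(0)
X = rng.random((500, 120))

k = 10  # number of concepts (assumed, problem-dependent)

# Step 1: project the documents from R^d into the k-dimensional concept space via the SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = (np.diag(s[:k]) @ Vt[:k, :]).T   # one k-dimensional row per document

# Step 2: cluster in the reduced space.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)
print(labels[:10])
```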
Good subspaces for projection are characterized by preserving most
of the `information' in the data. For a square, symmetric matrix, the
eigenvectors with the largest eigenvalues are the orthogonal
directions of projection that explain the maximum possible amount of
variance in the data [Str88]. The extension of the eigendecomposition
to non-square matrices is the Singular Value Decomposition (SVD).
SVD is closely related to Principal Component Analysis (PCA), which is
based on the SVD of the zero-mean normalized data. In Latent Semantic
Indexing (LSI) the word-document matrix is modeled by a rank-$k$
approximation using the top $k$ singular values. While LSI was
originally used for improved query processing in information
retrieval, the basic idea may be employed for clustering as well.
A matrix $\mathbf{V}$ is unitary if and only if
$\mathbf{V}^{\dagger}\mathbf{V} = \mathbf{V}\mathbf{V}^{\dagger} = \mathbf{I}$
(for real-valued matrices the conjugate transpose $\dagger$ reduces to the
transpose). Equation 2.4 gives the SVD of the data matrix $\mathbf{X}$ into a
product of the unitary matrix $\mathbf{U}$, the diagonal matrix
$\mathbf{\Sigma}$ of singular values, and the unitary matrix $\mathbf{V}$.

$$\mathbf{X} = \mathbf{U}\,\mathbf{\Sigma}\,\mathbf{V}^{\dagger} \qquad (2.4)$$
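As a quick numerical illustration of equation 2.4 (a sketch with synthetic data; the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((6, 4))                     # small synthetic data matrix

U, s, Vt = np.linalg.svd(X)                # full SVD: X = U @ Sigma @ Vt
Sigma = np.zeros_like(X)
Sigma[:len(s), :len(s)] = np.diag(s)       # embed the singular values in a 6x4 diagonal matrix

# U and V are unitary (here orthogonal, since X is real) ...
assert np.allclose(U.T @ U, np.eye(U.shape[0]))
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))
# ... and the three factors reproduce X.
assert np.allclose(U @ Sigma @ Vt, X)
```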
$r$ is the rank of $\mathbf{X}$. $\mathbf{X}_k$ is the best rank-$k$
approximation of $\mathbf{X}$ (with $k \le r$), which is obtained by
using $\mathbf{\Sigma}_k$, i.e., $\mathbf{\Sigma}$ with the original
upper-left $k$ diagonal entries and all others set to 0
(equation 2.5).

$$\mathbf{X}_k = \mathbf{U}\,\mathbf{\Sigma}_k\,\mathbf{V}^{\dagger} \qquad (2.5)$$
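Continuing the sketch above, the rank-$k$ approximation of equation 2.5 amounts to zeroing all but the largest $k$ singular values (again an illustration; $k=2$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((6, 4))
U, s, Vt = np.linalg.svd(X)

k = 2                                  # number of retained singular values (assumed)
Sigma_k = np.zeros_like(X)
Sigma_k[:k, :k] = np.diag(s[:k])       # keep the top-k entries, set all others to 0

X_k = U @ Sigma_k @ Vt                 # best rank-k approximation of X
print(np.linalg.matrix_rank(X_k))      # -> 2
print(np.linalg.norm(X - X_k))         # approximation error
```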
The left-singular column vectors $\mathbf{u}_i$ in $\mathbf{U}$ are
called the concept vectors, with $i \in \{1,\ldots,k\}$.
In general, $\mathbf{X}_k$ loses its sparsity since the projection is
not axis-parallel. However, the concept space does not have to be
instantiated explicitly, but can rather be stored functionally, for
example as a product of two sparse matrices.
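One way to keep the representation compact is to store the SVD factors rather than the dense rank-$k$ matrix itself (a sketch; the factor shapes follow from the thin SVD):

```python
import numpy as np

def project_to_concept_space(X, k):
    """Return the factors of the rank-k approximation instead of the dense matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # X_k would be U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]; keeping the three
    # (much smaller) factors avoids instantiating the dense d x n matrix.
    return U[:, :k], s[:k], Vt[:k, :]
```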
In the vector space model [SWY75,Kol97], a keyword search
can be expressed as a matrix product, as shown in
equation 2.6. A binary-valued query vector $\mathbf{q}$ is
projected linearly to a match vector $\mathbf{m}$.

$$\mathbf{m} = \mathbf{X}^{\dagger}\,\mathbf{q} \qquad (2.6)$$
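A minimal sketch of equation 2.6 (the tiny term-document matrix, vocabulary, and query are made up for illustration; for real-valued data the conjugate transpose is simply the transpose):

```python
import numpy as np

# Rows = words, columns = documents (term-document matrix).
vocabulary = ["car", "auto", "engine", "flower"]
X = np.array([[1, 0, 2],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 3]], dtype=float)

# Binary query vector q: search for "car" and "engine".
q = np.array([1, 0, 1, 0], dtype=float)

m = X.T @ q        # match vector: one relevance score per document
print(m)           # -> [2. 1. 2.]
```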
In the context of web search, it has been claimed that projecting the
query vector and the web-page vectors to concept space outperforms
keyword search. The rationale for this observation is that
the rank-$k$ approximation is actually more representative than
the original data matrix, because noise and redundancy have been
removed. Also, it has been argued that by reducing the matrix to a
small number of concepts, the query is implicitly expanded to cover
redundancy as introduced, e.g., by synonyms.
Using the optimal rank-$k$ approximation (LSI) yields
equation 2.7, where $\mathbf{X}_k$ is given by equation
2.5.

$$\mathbf{m} = \mathbf{X}_k^{\dagger}\,\mathbf{q} \qquad (2.7)$$
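Following equation 2.7, the same query can be matched against the rank-$k$ approximation; as noted above, $\mathbf{X}_k$ never needs to be formed explicitly (a sketch reusing the synthetic data from the previous example; $k=2$ is an arbitrary choice):

```python
import numpy as np

X = np.array([[1, 0, 2],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 3]], dtype=float)
q = np.array([1, 0, 1, 0], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# m = X_k^T q = V_k Sigma_k U_k^T q, evaluated right-to-left so that the
# dense rank-k matrix X_k is never instantiated.
m_lsi = Vt[:k, :].T @ (np.diag(s[:k]) @ (U[:, :k].T @ q))
print(m_lsi)
```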
Generally, the rank of $\mathbf{X}$ is on the order of $10^3$ to $10^4$,
while $k$ is typically chosen much smaller, on the order of $10^2$.
Recently, it has been reported that random subspace
projections perform very well for high-dimensional data and come close
to the optimal projection as given by the SVD [SS97].
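A random subspace projection along the lines of [SS97] can be sketched as follows (the Gaussian projection matrix and its scaling are one common choice, assumed here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 120))                 # d x n data matrix (placeholder)

d, k = X.shape[0], 50                      # target dimensionality k (assumed)
R = rng.normal(size=(k, d)) / np.sqrt(k)   # random projection matrix
Z = R @ X                                  # k x n projected data

print(Z.shape)                             # -> (50, 120)
```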