Balancing


Typically, one segments transactional data into 7-14 groups, each of which should be of comparable importance. Balancing avoids trivial clusterings (e.g., $k-1$ singletons and one big cluster, where $k$ is the number of clusters). More importantly, the desired balancing properties have many application-driven advantages. For example, when each cluster contains the same number of customers, discovered phenomena (e.g., frequent products, co-purchases) have equal significance and support and are thus easier to evaluate. When each customer cluster accounts for the same revenue share, marketing can devote an equal amount of attention and budget to each of the groups. OPOSSUM strives to deliver `balanced' clusters using either of the following two criteria:

Sample balanced: Each cluster should contain roughly the same number of samples, $n/k$ for $n$ samples and $k$ clusters. This allows, for example, retail marketers to obtain a customer segmentation with equally sized customer groups.
Value balanced: Each cluster should contain roughly the same amount of feature values. Thus, a cluster represents roughly a $1/k$ fraction of the total feature value $v = \sum_{j=1}^n \sum_{i=1}^d x_{i,j}$. In customer clustering, we use extended price per product as features and, thus, each cluster represents a roughly equal contribution to total revenue. In web-session clustering, the feature of choice is the time spent on a particular web page. This results in user clusters balanced with respect to the total time spent on the site.
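
The two criteria can be written side by side as approximate targets for each cluster. This is only a restatement of the text above, assuming $k$ denotes the number of clusters and $\mathcal{C}_\ell$ (notation introduced here, not taken from the original) denotes the set of sample indices assigned to cluster $\ell$:

```latex
\begin{align*}
\text{sample balanced:}\quad & |\mathcal{C}_\ell| \approx \frac{n}{k}, \\
\text{value balanced:}\quad & \sum_{j \in \mathcal{C}_\ell} \sum_{i=1}^d x_{i,j} \approx \frac{v}{k},
\qquad \ell = 1, \ldots, k.
\end{align*}
```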

We formulate the desired balancing properties by assigning each object (customer, document, web-session) a weight and then softly constraining the sum of weights in each cluster. For sample balanced clustering, we assign each sample $\mathbf{x}_j$ the same weight $w_j = 1/n$. To obtain value balancing properties, a sample $\mathbf{x}_j$'s weight is set to $w_j = \frac{1}{v} \sum_{i=1}^d x_{i,j}$. Note that in both cases the weights of all samples sum to 1.
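
The weighting scheme can be illustrated with a small sketch. This is not OPOSSUM's implementation; the function names, the toy data, and the list-of-samples layout (each sample stored as a list of $d$ non-negative feature values, so `samples[j][i]` plays the role of $x_{i,j}$) are assumptions made for illustration only.

```python
def sample_balanced_weights(samples):
    """Give every sample the same weight 1/n (hypothetical helper)."""
    n = len(samples)
    return [1.0 / n for _ in samples]


def value_balanced_weights(samples):
    """Weight each sample by its share of the total feature value v."""
    totals = [sum(x) for x in samples]   # sum over features i of x_{i,j}
    v = sum(totals)                      # total feature value
    if v <= 0:
        raise ValueError("total feature value must be positive")
    return [t / v for t in totals]


def cluster_weight_sums(weights, labels, k):
    """Sum the sample weights that fall into each of the k clusters."""
    sums = [0.0] * k
    for w, label in zip(weights, labels):
        sums[label] += w
    return sums


# Toy data: 4 customers, 2 products (e.g., extended price per product).
samples = [[10.0, 0.0], [0.0, 20.0], [5.0, 5.0], [5.0, 15.0]]

w_sample = sample_balanced_weights(samples)
w_value = value_balanced_weights(samples)

# A labeling with two customers per cluster is sample balanced ...
labels_a = [0, 1, 0, 1]
by_count_a = cluster_weight_sums(w_sample, labels_a, k=2)
# ... but cluster 1 holds two thirds of the total value.
by_value_a = cluster_weight_sums(w_value, labels_a, k=2)

# A different labeling is balanced under both criteria.
labels_b = [0, 0, 1, 1]
by_value_b = cluster_weight_sums(w_value, labels_b, k=2)
```

Under either scheme the weights sum to 1, so a perfectly balanced clustering into $k$ clusters would give each cluster a weight sum of $1/k$. How far a cluster may deviate from that target is a separate choice (the soft constraint) that this sketch does not model.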


Alexander Strehl 2002-05-03