Pruning and Visualizing Frequent Itemsets and Association Rules


One topic that we are very interested in is visualization and visual exploration of data mining results. Unlike relational data, data mining results are presented as facts and rules which can not be assumed as distributions in Euclidean space. For example, one interesting topic is how to prune and visualize many frequent itemsets each of which contains many items and many many-to-many association rules derived from such frequent itemsets. Please pay attention that the fundamental challenge comes from the bold words many: the first problem is how to present many itemsets or rules; the second problem comes from the lack of effective visual technique to describe many-to-many relationships.

For the first problem, we have noticed that frequentness is a monotone Boolean function on the itemset lattice. Therefore, the problem becomes how to present the long border of support in the itemset lattice. Our idea is to introduce a boolean displayable property of itemsets. This displayable property is also monotone on the itemset lattice. Therefore, it introduces another border, called the displayable itemset border, which separates displayable itemsets from undisplayable ones on the itemset lattice. The value of this border is that only those displayable frequent itemsets will be visualized. In this way, we can selectively visualize frequent itemsets by interactively changing the displayable itemset border.

Frequent itemsets and association rules can be selectively visualized by changing the displayable itemset border. A key issue is how to meaningfully change the border through visual interaction. We have found an interesting approach to do this through the visualization of item taxonomy. For example, the following figure shows a simple item taxonomy over the items {a,b,c,d}.

A generalized itemset lattice on this item taxonomy is shown below. We can control the display of item taxonomy so that items {a,b} (descendants of e) are not shown. This gives a displayable itemset border (shown in orange) in the itemset lattice.

Only those displayable frequent itemsets (shown in orange) will be visualized. Depending on our itemset pruning strategy, we can visualize only ec and ed and the other displayable frequent itemsets {c,d,e,ef} are implied by these two.

For the second problem on visualizing many-to-many relationships, we have found a novel use of parallel coordinates as a good visual metaphor. An association rule is visualized by connecting its items, one on each parallel coordinate. The connection can be made by using polynomial curves, for example, Bezier curves, to distinguish itemsets or rules sharing parts (as shown in the figure below). In the presence of item taxonomy, an item taxonomy tree can be used as coordinate and can be expanded or shrunk by user interaction. As we said, this interaction introduces another border in the generalized itemset lattice. Only those frequent itemsets on the border are displayed. By changing the border through user interaction, we can selectively visualize frequent itemsets and association rules.

Here is a screen snapshot of visualization of frequent itemsets.

Here is a screen snapshot of visualizing association rules. The left two coordinates (with item names annotated on the left) represent the left-hand sides of the rules. The right two coordinates (with item names annotated on the right) represent the right-hand sides of the rules.

In summary, the resulting 3D visualization has the following features: (1) It is capable of visualizing large itemsets and many-to-many association rules; (2) It is capable of visualizing a large number of frequent itemsets and association rules by displaying only those ones whose items are selected by the user; (3) The closure properties of frequent itemsets and association rules are inherently supported by this visualization such that the implied ones are not visualized.

Future work will be focused on rule pruning and better exploitation of user interaction. One topic is how to prune uninteresting itemsets and rules. Some redundant itemsets or rules may be more interesting than their respective displayed ones. This raises the concern that some potentially interesting patterns are not accorded perceptual places in the visualization even though they are implied. Summarization techniques would be needed to better summarize and prune the discovered itemsets and rules. Another topic is to explore other alternatives in addition to item taxonomy to selectively visualize itemsets or rules.

Publications


Updated April 7, 2007 by Li Yang