Multi-Assignment Clustering for Boolean Data (2009)

Authors

Abstract

Conventional clustering methods typically assume that each data item belongs to a single cluster. This assumption does not hold in general. To overcome this limitation, we propose a generative approach to the clustering of Boolean vectorial data in which each object can be assigned to multiple clusters. Using a deterministic annealing scheme, our method decomposes observed data into the contributions of individual clusters and infers their parameters. Experiments on synthetic data show higher accuracy in source parameter estimation and superior cluster stability compared to state-of-the-art approaches. We apply our method to an important problem in computer security known as role mining, which concerns the automated engineering of roles for role-based access control. Experiments on real-world access control data show performance gains over other multi-assignment methods in privilege prediction for new employees. In challenging situations with high noise levels, our approach maintains its good performance, while alternative state-of-the-art techniques lack robustness.
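
The abstract does not say how the contributions of several clusters combine into one observed Boolean vector. As an illustration only, the sketch below assumes a Boolean OR, which matches the role-mining intuition that a user holds the union of the permissions granted by their roles. The function name, the toy prototypes, and the assignment sets are hypothetical, and neither a noise model nor the deterministic annealing inference is shown.

```python
from typing import List, Set


def combine_sources(prototypes: List[List[int]], assignment: Set[int]) -> List[int]:
    """OR together the Boolean prototypes of every cluster in `assignment`.

    Assumption (not stated in the abstract): cluster contributions
    combine by Boolean OR.
    """
    dims = len(prototypes[0])
    row = [0] * dims
    for k in assignment:
        for d in range(dims):
            row[d] |= prototypes[k][d]
    return row


# Hypothetical toy setting: 2 roles (clusters) over 4 permissions.
prototypes = [
    [1, 1, 0, 0],  # role 0 grants permissions 0 and 1
    [0, 1, 1, 0],  # role 1 grants permissions 1 and 2
]

# Each object (user) may belong to zero, one, or several clusters.
assignments = [{0}, {1}, {0, 1}, set()]

data = [combine_sources(prototypes, a) for a in assignments]
for a, row in zip(assignments, data):
    print(sorted(a), row)
```

The third user, assigned to both roles, receives a permission vector that neither role produces alone. This is the case a single-assignment clustering would have to model with an extra cluster.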

Discussion

Mario Frank, 2009/10/08 06:55

We finally released a first version of the code: http://www.inf.ethz.ch/personal/mafrank/

paper/2009/362.txt · Last modified: 2009/05/24 18:43 (external edit)
 