paper/2008/324.txt · Last modified: 2008/06/22 03:35 (external edit)
Discussion
Very interesting paper.
As noted in the Fallback analysis section, the algorithm would perform somewhat worse than random sampling if some clusters were hopelessly mixed, so that descending into smaller clusters doesn't help.
I wonder whether it would help to give sampling preference not to the least pure clusters, but to the clusters whose class distributions differ most from those of their subclusters. That way, both pure clusters and completely mixed clusters would be avoided.
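To make the suggestion concrete, here is a rough sketch of one possible score, not anything taken from the paper. It assumes each cluster has per-class label counts (in practice these would be estimates from the labels queried so far) and that the subclusters partition the parent. The function names `distribution` and `split_score` are hypothetical, and the weighted KL divergence is just one choice of "difference" between a cluster and its subclusters.

```python
import math


def distribution(counts):
    """Turn per-class label counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def split_score(parent_counts, children_counts):
    """Size-weighted KL divergence of each subcluster's class
    distribution from the parent's class distribution.

    Assumption: the subclusters partition the parent, so any class
    seen in a subcluster is also seen in the parent.
    """
    n_parent = sum(parent_counts)
    p = distribution(parent_counts)
    score = 0.0
    for child in children_counts:
        n_child = sum(child)
        if n_child == 0:
            continue  # an empty subcluster contributes nothing
        q = distribution(child)
        kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
        score += (n_child / n_parent) * kl
    return score


# Pure cluster: subclusters look just like the parent, score is 0.0
pure = split_score([10, 0], [[5, 0], [5, 0]])

# Hopelessly mixed cluster: subclusters are as mixed as the parent, score is 0.0
mixed = split_score([10, 10], [[5, 5], [5, 5]])

# Mixed cluster that splits cleanly: score is log(2), about 0.693
separable = split_score([10, 10], [[10, 0], [0, 10]])
```

With a score like this, sampling could favor clusters with a high `split_score`, since both the pure and the hopelessly mixed cases score zero. Whether this preserves the guarantees of the original sampling rule is a separate question that the sketch does not address.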