Revisiting Probabilistic Models for Clustering with Pair-wise Constraints
Blaine Nelson - University of California, Berkeley, USA
Ira Cohen - Hewlett-Packard Research Labs, USA
We revisit recently proposed algorithms for probabilistic clustering with pair-wise constraints between data points. We evaluate and compare existing techniques in terms of their robustness to misspecified constraints. We show that the technique that strictly enforces the given constraints, namely the chunklet model, produces poor results even when only a small number of constraints are misspecified. We further show that methods that penalize constraint violations are more robust to misspecified constraints but exhibit undesirable local behaviors. Based on this evaluation, we propose a new learning technique that extends the chunklet model to allow soft constraints, each represented by an intuitive measure of confidence in the constraint.
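In the chunklet model, points joined by positive (must-link) constraints are commonly grouped into chunklets. A chunklet is a connected component of the must-link graph, and each chunklet is then assigned to a mixture component as a single unit. The Python sketch below is an illustration, not the authors' implementation. It computes chunklets with a union-find structure and handles must-link constraints only. The function name `chunklets` and the toy data are hypothetical. The sketch shows why strictly enforcing constraints is brittle: one misspecified constraint is enough to merge two otherwise separate groups into a single chunklet.

```python
def chunklets(n_points, must_links):
    """Group point indices into chunklets: the connected components of the
    must-link constraint graph (hypothetical helper, must-links only)."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each must-link constraint forces both endpoints into the same chunklet.
    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect members by root; sort the groups for deterministic output.
    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Points 0-2 come from one cluster and points 3-5 from another.
correct = [(0, 1), (1, 2), (3, 4), (4, 5)]
print(chunklets(6, correct))             # [[0, 1, 2], [3, 4, 5]]

# A single misspecified constraint (2, 3) fuses the two groups.
print(chunklets(6, correct + [(2, 3)]))  # [[0, 1, 2, 3, 4, 5]]
```

Under hard enforcement, the fused chunklet must be assigned to one mixture component, so the error spreads to every point in it. A soft-constraint model can instead weight each constraint by its confidence.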