Two-view Feature Generation Model for Semi-supervised Learning
Rie Ando - IBM T.J. Watson Research Center, USA
Tong Zhang - Yahoo Inc., USA
We consider a setting for discriminative semi-supervised learning in which unlabeled data are used with a generative model to learn effective feature representations for discriminative training. Within this framework, we revisit the two-view feature generation model of co-training and prove that the optimum predictor can be expressed as a linear combination of a few features constructed from unlabeled data. From this analysis, we derive methods that employ two views but are very different from co-training. Experiments show that our approach is more robust than co-training and EM under various data generation conditions.