Abstract
We observe a training set $Q$ composed of $l$ labeled samples $\{(X_1,\theta_1),\dots,(X_l,\theta_l)\}$ and $u$ unlabeled samples $\{X'_1,\dots,X'_u\}$. The labels $\theta_i$ are independent random variables satisfying $\Pr\{\theta_i=1\}=\eta$ and $\Pr\{\theta_i=2\}=1-\eta$. The labeled observations $X_i$ are independently distributed with conditional density $f_{\theta_i}(\cdot)$ given $\theta_i$. Let $(X_0,\theta_0)$ be a new sample, distributed independently as the samples in the training set. We observe $X_0$ and wish to infer the classification $\theta_0$. In this paper we first assume that the densities $f_1(\cdot)$ and $f_2(\cdot)$ are known and that only the mixing parameter $\eta$ is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about $\eta$. We then assume that two densities $g_1(\cdot)$ and $g_2(\cdot)$ are given, but we do not know whether $g_1(\cdot)=f_1(\cdot)$ and $g_2(\cdot)=f_2(\cdot)$ or whether the opposite holds, nor do we know $\eta$. The learning problem thus consists of both estimating the optimal partition of the observation space and assigning classifications to the decision regions. Here we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
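The two Fisher informations being compared can be made explicit. The following is a minimal sketch under the model stated above (with the standard, implicit assumption that each unlabeled $X'_i$ is drawn from the mixture); the symbols $I_{\text{labeled}}$ and $I_{\text{unlabeled}}$ are introduced here for illustration and are not the paper's notation. For a labeled pair $(X,\theta)$, the log-likelihood in $\eta$ is $\mathbf{1}\{\theta=1\}\log\eta + \mathbf{1}\{\theta=2\}\log(1-\eta) + \log f_\theta(X)$; since $f_1$ and $f_2$ do not depend on $\eta$, only the label term contributes, giving
$$ I_{\text{labeled}}(\eta) \;=\; \frac{1}{\eta(1-\eta)}. $$
An unlabeled sample has density $\eta f_1(x) + (1-\eta) f_2(x)$, whose score in $\eta$ is $\bigl(f_1(x)-f_2(x)\bigr)/\bigl(\eta f_1(x)+(1-\eta)f_2(x)\bigr)$, so
$$ I_{\text{unlabeled}}(\eta) \;=\; \int \frac{\bigl(f_1(x)-f_2(x)\bigr)^2}{\eta f_1(x) + (1-\eta) f_2(x)}\, dx. $$
Because an unlabeled observation is obtained from a labeled pair by discarding the label, marginalization can only reduce Fisher information, so $I_{\text{unlabeled}}(\eta) \le I_{\text{labeled}}(\eta)$; the ratio $I_{\text{unlabeled}}/I_{\text{labeled}} \in [0,1]$ is the relative value referred to above.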
