To avoid problems associated with PDF estimation in high dimensions
(time series, images, etc.), practitioners of generative methods
often prefer to begin with a dimension-reducing feature extraction.
But in problems with many and varied data classes,
it is difficult to find a single low-dimensional feature vector
that contains all the necessary information.
This can be called the *feature bottleneck*.
Aside from the potential information loss,
using features greatly limits the usefulness of
generative models. All generative models can, at least in theory,
be used to generate random samples. But generating random features has little
value, since synthetic features cannot be interpreted in any obvious way
outside of the classifier itself.
If robust generative models could be created on the input data
space, then synthetic data in the form of time series, images, etc.,
could be interpreted or visualized intuitively, and processed by
alternative means in meaningful ways. Uses of sampling include:
- Creating realistic simulated data for experiments.
- Validation of the estimated distribution, by observing
the quality and suitability of the generated samples.
- Monte Carlo integration. The integral
$I = \int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$
is approximated using
$I \simeq \frac{1}{N}\sum_{n=1}^{N} f(\mathbf{x}_n)$
for large $N$, where the samples $\mathbf{x}_n$ are drawn from
$p(\mathbf{x})$.
- Creating
hybrid generative/discriminative models.
Let $D(\mathbf{x})$
be a decision rule taking the
value 1 (accept) or 0 (reject).
We create the hybrid generative model
$p_h(\mathbf{x}) = \frac{1}{C}\, p(\mathbf{x})\, D(\mathbf{x})$,
where the normalizing constant $C$ is obtained by Monte Carlo
integration,
$C = \int p(\mathbf{x})\, D(\mathbf{x})\, d\mathbf{x} \simeq \frac{N_a}{N}$,
where $N_a$ of $N$ samples of $p(\mathbf{x})$
have been accepted by
$D(\mathbf{x})$.
This model is a true generative model
with the qualities of both the discriminative and
generative components.
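The Monte Carlo integration idea above can be sketched in a few lines. This is a minimal illustration, not the method of the text: the choices of $p$ (a standard normal) and $f(x) = x^2$ (whose expectation under $p$ is 1) are assumptions made purely for the example.

```python
# Minimal sketch of Monte Carlo integration: approximate
# I = integral of f(x) p(x) dx by (1/N) * sum f(x_n), with x_n ~ p.
# Here p is a standard normal and f(x) = x^2, so the true value is 1.
import random

def mc_integrate(f, sampler, n=100_000):
    """Approximate the expectation of f under the sampler's distribution."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
estimate = mc_integrate(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

For large $N$ the estimate concentrates around the true integral; the accuracy improves at the usual $O(1/\sqrt{N})$ Monte Carlo rate.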
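The hybrid-model construction can likewise be sketched with toy choices: draw from $p$, keep the samples accepted by $D$, and estimate the normalizing constant $C$ as the acceptance rate. The specific $p$ (standard normal) and $D$ (accept positive values) here are illustrative assumptions, not taken from the text.

```python
# Sketch of a hybrid generative model p_h(x) = p(x) D(x) / C,
# where C is estimated by Monte Carlo as the fraction of samples
# of p accepted by the decision rule D. Illustrative choices:
# p = standard normal, D accepts x > 0 (so C should be near 0.5).
import random

def sample_hybrid(sampler, D, n_draws=100_000):
    """Draw from p, keep samples accepted by D; return them and C-hat."""
    draws = [sampler() for _ in range(n_draws)]
    accepted = [x for x in draws if D(x) == 1]
    C = len(accepted) / n_draws  # Monte Carlo estimate of integral p(x) D(x) dx
    return accepted, C

random.seed(1)
samples, C = sample_hybrid(lambda: random.gauss(0.0, 1.0),
                           lambda x: 1 if x > 0 else 0)
```

The accepted samples are exact draws from $p_h$, so the same rejection loop serves both to estimate $C$ and to generate from the hybrid model.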

Baggenstoss
2017-05-19