Motivation: Advantages of PBN

The projected belief network (PBN) is a layered generative network (LGN) with a tractable likelihood function (LF), based on a feed-forward neural network (FFNN). There are two versions of the PBN, stochastic and deterministic, each of which has theoretical advantages over other layered generative networks.

It has recently been shown that the information at the output of a dimension-reducing transformation can be maximized using probability density function projection (PDF projection) [21]. PDF projection estimates the distribution of the input data simultaneously with a dimension-reducing transformation that extracts the latent variables [6,3]. The method is more general than other methods of non-linear dimension reduction (NLDR) [83,84,85,86,87]. Implementing PDF projection in a neural network architecture yields the PBN [28,88,29,82]. As a result, the PBN is the most direct and general way to apply NLDR in a neural network architecture.
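To make this concrete, the following is a schematic statement of PDF projection; the notation is introduced here for illustration and is not taken from the cited works. Let $\mathbf{z}=T(\mathbf{x})$ be a dimension-reducing transformation, let $p_0(\mathbf{x})$ be a fixed reference density on the input with induced feature density $p_0(\mathbf{z})$, and let $g(\mathbf{z})$ be an assumed (or estimated) density for the features. The projected density on the input space is
$$ p_p(\mathbf{x}) \;=\; \frac{g\big(T(\mathbf{x})\big)}{p_0\big(T(\mathbf{x})\big)}\, p_0(\mathbf{x}), $$
which integrates to one and induces exactly $g(\mathbf{z})$ as the distribution of $\mathbf{z}=T(\mathbf{x})$ when $\mathbf{x}\sim p_p(\mathbf{x})$. In a layered network, the same construction can be applied layer by layer, so the likelihood of the input is a product of such ratios chained through the network.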

The LF of other widely used LGNs can be obtained only by intractable integration (marginalization) over the hidden variables. These networks must therefore rely on a surrogate cost function to approximate LF training: examples are contrastive divergence (CD) for restricted Boltzmann machines [89,90], the Kullback-Leibler divergence for the variational auto-encoder (VAE) [91], and an adversarial discriminative network for generative adversarial networks (GAN) [92]. In contrast, because the PBN generates data by manifold sampling, it possesses a tractable LF that allows direct gradient-based training.

The deterministic PBN (D-PBN) can be used as an auto-encoder [29,28] and has a theoretical advantage over conventional auto-encoders. While other auto-encoders use an empirical reconstruction network, the D-PBN reconstructs the input data by backing up (back-projecting) through the same FFNN that was used to extract the features. In each layer, it selects the conditional mean estimate of the layer input under a maximum entropy prior, a type of optimal estimator.
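As a minimal illustration of the back-projection step (the linear stage $\mathbf{z}=\mathbf{W}^{\top}\mathbf{x}$ and the choice of prior below are assumed for this sketch only), suppose the layer input has unbounded range, so that the maximum entropy prior is Gaussian. The conditional mean estimate of the layer input given the layer output is then the minimum-norm solution
$$ \hat{\mathbf{x}} \;=\; E\{\mathbf{x}\mid \mathbf{W}^{\top}\mathbf{x}=\mathbf{z}\} \;=\; \mathbf{W}\big(\mathbf{W}^{\top}\mathbf{W}\big)^{-1}\mathbf{z}. $$
For bounded or non-negative input ranges, the maximum entropy prior differs (e.g., truncated Gaussian or exponential), and the conditional mean no longer has this simple closed form.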

Another advantage of the PBN and D-PBN is that they are based on an FFNN. Therefore, a single FFNN can be simultaneously a generative and a discriminative network [82]. This offers the most direct way to combine the advantages of both network types. A number of variations of this concept have been proposed: the PBN or D-PBN cost functions can be used as regularization for discriminative neural networks [82], or, conversely, a discriminative cost function can be used to "align" a PBN to decision boundaries to create better-performing generative models [1]. We use this approach in this paper to test the PBN and D-PBN at high dimensions.