## Chain Rule

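Most useful signal processing occurs in several stages. Up to now we have considered only the transformation as a whole rather than its individual stages. Assuming that the transformation can be broken into parts, for example,

$$\mathbf{x} \rightarrow \mathbf{y} \rightarrow \mathbf{w} \rightarrow \mathbf{z},$$

equation (2.2) takes on the chain-rule form:

$$G(\mathbf{x}; H_0, T, g) = \frac{p(\mathbf{x}|H_0(\mathbf{x}))}{p(\mathbf{y}|H_0(\mathbf{x}))} \, \frac{p(\mathbf{y}|H_0(\mathbf{y}))}{p(\mathbf{w}|H_0(\mathbf{y}))} \, \frac{p(\mathbf{w}|H_0(\mathbf{w}))}{p(\mathbf{z}|H_0(\mathbf{w}))} \, g(\mathbf{z}) \tag{2.9}$$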

where $H_0(\mathbf{x})$, $H_0(\mathbf{y})$, $H_0(\mathbf{w})$ are the reference hypotheses used at each stage. The J-function of the overall transformation $\mathbf{x} \rightarrow \mathbf{z}$ is the product of the three stage-wise J-functions. To understand the importance of the chain rule, consider how we would cope without it. We would need to solve for $p(\mathbf{z}|H_0(\mathbf{x}))$, the PDF of $\mathbf{z}$ under the assumption that $\mathbf{x}$ had PDF $p(\mathbf{x}|H_0(\mathbf{x}))$. But at each stage the distribution of the output feature becomes more and more intractable. Thus, at the end of a long signal processing chain, we would be unable to derive $p(\mathbf{z}|H_0(\mathbf{x}))$. Estimating $p(\mathbf{z}|H_0(\mathbf{x}))$ is futile, since this PDF is generally evaluated in the far tails of the distribution, where it can only realistically be represented in log form and by a closed-form expression.

On the other hand, using the chain rule, we can "re-start" the process by assuming a suitable canonical form for $H_0$ at the start of each stage. As we mentioned, and will shortly see, for MaxEnt this canonical reference hypothesis is decided by the choice of feature transformation.
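The stage-wise product in (2.9) becomes a sum in the log domain, which is how it would normally be evaluated. The following is a minimal sketch under assumed choices that do not come from the text: a hypothetical chain of segment energies, log energies, and mean log energy, with an iid Gaussian, an iid exponential, and again an iid Gaussian reference hypothesis at the inputs of the three stages. All function names are illustrative, the input length is assumed to be a multiple of `seg_len`, and no segment is assumed to be all zeros.

```python
import math

def log_gauss_iid(v, var=1.0):
    """Log PDF of independent zero-mean Gaussian samples with variance var."""
    n = len(v)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sum(t * t for t in v) / (2.0 * var)

def log_chi2(y, dof):
    """Log PDF of a chi-square variable with dof degrees of freedom (y > 0)."""
    h = 0.5 * dof
    return (h - 1.0) * math.log(y) - 0.5 * y - h * math.log(2.0) - math.lgamma(h)

def stage1(x, seg_len):
    """x -> y: segment energies. Assumed reference H0(x): iid N(0, 1)."""
    y = [sum(t * t for t in x[i:i + seg_len]) for i in range(0, len(x), seg_len)]
    # Under H0(x), each segment energy is chi-square with seg_len degrees of freedom.
    log_j = log_gauss_iid(x) - sum(log_chi2(yk, seg_len) for yk in y)
    return y, log_j

def stage2(y):
    """y -> w: log energies. Assumed reference H0(y): iid exponential, mean 1."""
    w = [math.log(yk) for yk in y]
    log_num = -sum(y)                             # log p(y | H0(y))
    log_den = sum(wk - math.exp(wk) for wk in w)  # log p(w | H0(y)), change of variables
    return w, log_num - log_den

def stage3(w):
    """w -> z: mean log energy. Assumed reference H0(w): iid N(0, 1)."""
    z = sum(w) / len(w)
    # Under H0(w), the mean of K samples is N(0, 1/K).
    log_j = log_gauss_iid(w) - log_gauss_iid([z], var=1.0 / len(w))
    return z, log_j

def chain_log_j(x, seg_len):
    """Return the final feature z and the three stage-wise log J-functions."""
    y, lj1 = stage1(x, seg_len)
    w, lj2 = stage2(y)
    z, lj3 = stage3(w)
    return z, (lj1, lj2, lj3)

x = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.9, -0.4]
z, parts = chain_log_j(x, seg_len=4)
log_j_total = sum(parts)  # log of the product of the stage-wise J-functions
```

Adding $\log g(\mathbf{z})$ for whatever feature PDF is in use to `log_j_total` gives the log of the projected PDF in (2.9). Each stage only needs the PDF of its own output under the reference hypothesis assumed at its own input, so no stage has to track the distribution inherited from the stages before it.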

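Incidentally, (2.9) can be used for the exact calculation of $p(\mathbf{z}|H_0(\mathbf{x}))$. Let $T(\mathbf{x})$ be the combined transformation of the chain. Then $G(\mathbf{x}; H_0, T, g)$ above is also equal to $\frac{p(\mathbf{x}|H_0(\mathbf{x}))}{p(\mathbf{z}|H_0(\mathbf{x}))} \, g(\mathbf{z})$. Using (2.9) and (2.2), we have

$$p(\mathbf{z}|H_0(\mathbf{x})) = \frac{p(\mathbf{y}|H_0(\mathbf{x}))}{p(\mathbf{y}|H_0(\mathbf{y}))} \, \frac{p(\mathbf{w}|H_0(\mathbf{y}))}{p(\mathbf{w}|H_0(\mathbf{w}))} \, p(\mathbf{z}|H_0(\mathbf{w})) \tag{2.10}$$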

This is an exact relationship. Its implementation only requires that we can solve for the PDF of the feature at the output of each stage under the reference hypothesis proposed at the input of the stage.
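A small numerical check of (2.10) can make the point about the far tails concrete. The sketch below assumes a hypothetical two-stage chain rather than the three stages above: $y = \sum_i x_i^2$ for $N$ samples, followed by $z = \sqrt{y}$, with an iid standard Gaussian reference at the input of the first stage and an exponential reference of mean 2 at the input of the second. The second stage is one-to-one here, so the result can be compared against the known closed form (a chi distribution with $N$ degrees of freedom). The function names and the test point are illustrative.

```python
import math

def log_chi2(y, dof):
    """Log PDF of a chi-square variable with dof degrees of freedom (y > 0)."""
    h = 0.5 * dof
    return (h - 1.0) * math.log(y) - 0.5 * y - h * math.log(2.0) - math.lgamma(h)

def log_p_z_chain(z, n):
    """log p(z | H0(x)) assembled stage by stage, as in (2.10)."""
    y = z * z
    log_p_y_h0x = log_chi2(y, n)             # y under H0(x): chi-square, n dof
    log_p_y_h0y = -math.log(2.0) - 0.5 * y   # y under H0(y): exponential, mean 2
    log_p_z_h0y = math.log(z) - 0.5 * z * z  # z under H0(y): Rayleigh, sigma = 1
    return log_p_y_h0x - log_p_y_h0y + log_p_z_h0y

def log_p_z_direct(z, n):
    """Closed-form log PDF of the chi distribution with n degrees of freedom."""
    return ((n - 1.0) * math.log(z) - 0.5 * z * z
            - (0.5 * n - 1.0) * math.log(2.0) - math.lgamma(0.5 * n))

n, z_tail = 100, 60.0  # z is near 10 under H0(x), so 60 is deep in the tail
chain_value = log_p_z_chain(z_tail, n)
direct_value = log_p_z_direct(z_tail, n)
linear_value = math.exp(chain_value)  # underflows in double precision
```

The two log values agree to within rounding, while the PDF itself is too small to represent in linear form at this point, which is why the stage-wise PDFs are kept in log form and in closed form.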

Baggenstoss 2017-05-19