Pretty nuggets of math, CS, bio: The Gaussian

I did not go deep enough in my math studies as an undergrad to get a glimpse of calculus of variations. It looks fascinating, but I won't pick it up, at least not yet.

Calculus of variations can be used to show why the Gaussian is interesting. It shows up as a limiting distribution in several settings: by the central limit theorem, the suitably normalized sum of many IID variables with finite variance is approximately Gaussian. But it is also the distribution that best conveys our ignorance of the data. Among all distributions $p$ with a given mean $\mu$ and variance $\sigma^2$ , the normal distribution is the one that conveys the least extra information, i.e., it has the maximum entropy among all possible $p$ .
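To make the maximum-entropy claim a bit more tangible, here is a minimal sketch that compares the closed-form differential entropies (in nats) of three distributions forced to share the same variance. The function names are my own, and the Laplace and uniform distributions are just two convenient competitors, not an exhaustive check. The mean is left out because differential entropy does not change when a distribution is shifted.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy (nats) of a normal distribution with std sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def laplace_entropy(sigma):
    """Differential entropy (nats) of a Laplace distribution with std sigma."""
    # A Laplace distribution with scale b has variance 2 * b**2
    # and entropy 1 + ln(2 * b).
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)


def uniform_entropy(sigma):
    """Differential entropy (nats) of a uniform distribution with std sigma."""
    # A uniform distribution on an interval of width w has variance w**2 / 12
    # and entropy ln(w).
    w = sigma * math.sqrt(12)
    return math.log(w)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 3.0):
        print(
            f"sigma={sigma}: "
            f"gaussian={gaussian_entropy(sigma):.4f}, "
            f"laplace={laplace_entropy(sigma):.4f}, "
            f"uniform={uniform_entropy(sigma):.4f}"
        )
```

For $\sigma = 1$ the three values come out at roughly 1.42, 1.35 and 1.24 nats, with the Gaussian on top. Changing $\sigma$ shifts all three by the same amount, $\ln \sigma$ , so the ordering stays the same.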

The way to show it involves calculus of variations and Lagrange multipliers. We encode the three constraints on the distribution function (it integrates to 1, and its mean and variance are given), and combine them with the entropy in the Lagrangian:

$L(p) = -\int p(x) \ln p(x) \, dx + \lambda_1 \left(\int p(x) \, dx - 1\right) + \lambda_2 \left(\int x \, p(x) \, dx - \mu \right) + \lambda_3 \left( \int p(x) (x - \mu)^2 \, dx - \sigma^2 \right)$

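Now taking the functional derivative with respect to $p(x)$ and setting it to zero to find the maximum, we get

$0 = -1 - \ln p(x) + \lambda_1 + \lambda_2 x + \lambda_3 (x - \mu)^2$ , from which we instantly get the form of the Gaussian:

$p(x) = \frac{1}{Z} \exp\left(\lambda_2 x + \lambda_3 (x - \mu)^2\right)$ , where $Z = e^{1 - \lambda_1}$ is the normalizing constant.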

Completing the square and solving for the Lagrange multipliers using the constraints for mean and variance, we arrive at the Gaussian distribution. This holds similarly for the multivariate case.
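As a sketch of that last step (the symbols $a$ and $m$ are introduced here only for illustration): any normalizable density whose logarithm is quadratic in $x$ can be rewritten, after completing the square, as

$p(x) = \frac{1}{Z} \exp\left(-a (x - m)^2\right), \quad a > 0$

The three constraints then pin down the three unknowns:

$\int x \, p(x) \, dx = m = \mu, \qquad \int (x - \mu)^2 \, p(x) \, dx = \frac{1}{2a} = \sigma^2, \qquad Z = \sqrt{\pi / a} = \sqrt{2 \pi \sigma^2}$

which gives the familiar

$p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right)$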

So having a Gaussian as a prior distribution for observed data is equivalent to saying that we know nothing about the data except its mean and variance. Once again - cool :)

2 comments:

Anonymous said...: That is assuming the data variation occurs in a continuous spectrum. As an experimentalist, we know that is not always the case ;)

Gaussian- The Darwinian theory of selection springs to mind. Although it is tempting to believe the monk who loved his peas and apply natural selection to discrete genetic entities instead.

I will leave the actual maths to the experts..; April 21, 2008 at 2:52 AM
Leopold said...: for some reason I'm not notified of comments - and am surprised to see them here :)

Gaussian is nice if you really know nothing of the underlying processes generating the data - assuming the gaussian form is then the least assumptions you can make. If you actually know how your experiments are screwed up (of course yours are not - unlike mine as you well know :), other choices can be better.; April 28, 2008 at 12:50 AM

Sunday, April 20, 2008
