Friday, May 2, 2008

Model selection - Information criteria, part II

Now for the hardcore information criteria part :)

The goal is still the same - pick a model to maximize the log-likelihood of the data. This is given by the marginal likelihood

$$p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta$$

We can approximate the integral with a Laplace approximation, which is similar in idea to the previous post - the probability mass will be concentrated around the mode of the distribution. We can fit a normal distribution with the mode as the mean, and the covariance approximated from a Taylor expansion at the mode. The next 2 paragraphs can be skipped if you believe this :)

For example, to approximate a function $f(\theta)$ that has a mode (and thus a local maximum) at $\hat{\theta}$, we use the 2nd order Taylor expansion of its logarithm:

$$\ln f(\theta) \approx \ln f(\hat{\theta}) - \frac{1}{2} (\theta - \hat{\theta})^T A (\theta - \hat{\theta})$$

(the first order term is 0 because of the local maximum)

Taking $A$ as the negative of the second derivative matrix of $\ln f$ at the mode, we get

$$f(\theta) \approx f(\hat{\theta}) \exp\!\left( -\frac{1}{2} (\theta - \hat{\theta})^T A (\theta - \hat{\theta}) \right)$$

If we are looking for a probability distribution that is proportional to $f(\theta)$, we have $\hat{\theta}$ as the mean, $A^{-1}$ as the covariance matrix, and $(2\pi)^{d/2} |A|^{-1/2}$ as the normalizing coefficient - voila!
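
To make this concrete, here's a minimal numeric sketch of the trick (not part of the derivation - the density and all names are my own, chosen for illustration): find the mode of an unnormalized Gamma-like density, estimate $A$ by finite differences, and compare the Laplace estimate of the integral against quadrature.

```python
# Laplace approximation sketch in 1-D: fit a Gaussian at the mode of an
# unnormalized density f and use it to estimate the normalizing integral.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

k = 5.0  # shape of an unnormalized Gamma(k, 1) density; exact integral is (k-1)! = 24

def log_f(theta):
    return (k - 1.0) * np.log(theta) - theta

# 1. Find the mode theta_hat (maximum of log f); analytically it is k - 1 = 4.
res = minimize_scalar(lambda t: -log_f(t), bounds=(1e-6, 50.0), method="bounded")
theta_hat = res.x

# 2. A = negative second derivative of log f at the mode (finite differences).
h = 1e-4
A = -(log_f(theta_hat + h) - 2.0 * log_f(theta_hat) + log_f(theta_hat - h)) / h**2

# 3. Laplace estimate of the integral: f(theta_hat) * sqrt(2*pi / A).
laplace = np.exp(log_f(theta_hat)) * np.sqrt(2.0 * np.pi / A)
exact, _ = quad(lambda t: t ** (k - 1.0) * np.exp(-t), 0.0, np.inf)
print(f"mode: {theta_hat:.3f}, Laplace: {laplace:.3f}, quadrature: {exact:.3f}")
```

Even in this skewed, decidedly non-Gaussian case, the Laplace estimate (about 23.5) lands close to the true value of 24.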

So we can fit a Gaussian to a function - back to information criteria. We'll fit a Gaussian to $p(D \mid \theta, M)\, p(\theta \mid M)$ at the mode $\hat{\theta}$ (the most likely parameter setting):

$$p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta \approx p(D \mid \hat{\theta}, M)\, p(\hat{\theta} \mid M)\, (2\pi)^{d/2}\, |A|^{-1/2}$$

Taking logarithms, with $d$ the number of parameters and $A$ the negative second derivative matrix of $\ln p(D \mid \theta, M)\, p(\theta \mid M)$ at the mode:

$$\ln p(D \mid M) \approx \ln p(D \mid \hat{\theta}, M) + \ln p(\hat{\theta} \mid M) + \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |A|$$

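As a sanity check on the formula, here's a sketch comparing the Laplace approximation of $\ln p(D \mid M)$ against the exact marginal likelihood. The model (a Bernoulli likelihood with a Beta prior) is my choice, picked because the conjugate pair has a closed-form answer to compare against.

```python
# Compare Laplace-approximated log evidence with the exact Beta-Bernoulli value.
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)
a, b = 2.0, 2.0                      # Beta prior parameters
x = rng.random(50) < 0.7             # N = 50 coin flips with true bias 0.7
N, s = x.size, x.sum()

# Exact log evidence: ln B(a + s, b + N - s) - ln B(a, b).
exact = betaln(a + s, b + N - s) - betaln(a, b)

# Laplace: mode and curvature of the unnormalized posterior g(t) = p(D|t) p(t).
def log_g(t):
    return (s + a - 1) * np.log(t) + (N - s + b - 1) * np.log(1 - t) - betaln(a, b)

t_hat = (s + a - 1) / (N + a + b - 2)                          # mode, closed form
A = (s + a - 1) / t_hat**2 + (N - s + b - 1) / (1 - t_hat)**2  # -d2/dt2 of log g
laplace = log_g(t_hat) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(A)
print(f"exact ln p(D|M): {exact:.4f}, Laplace: {laplace:.4f}")
```
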
As before, the first term is the fit of the model to the data. The rest of the terms are the complexity penalty. With a wide prior probability for the parameters, the second term is small, the third term is constant, and the last term scales with the number of data points $N$ - the main penalty comes from $-\frac{1}{2} \ln |A|$.

To evaluate the determinant $|A|$, we assume that $A$ has full rank, and that it is due to $N$ iid data points. This means that $A$ is the sum of contributions (negative second derivatives of the log-likelihood) due to the individual data points, and since the data is iid, $A = \sum_{i=1}^{N} A_i \approx N \bar{A}$ for an average per-point contribution $\bar{A}$. So $\ln |A| \approx \ln |N \bar{A}| = d \ln N + \ln |\bar{A}|$. Again, the last term is constant in $N$, so all in all (dropping the terms that don't grow with $N$) we have

$$\ln p(D \mid M) \approx \ln p(D \mid \hat{\theta}, M) - \frac{d}{2} \ln N$$

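Here's a sketch of this final formula doing model selection - fitting polynomials of increasing degree to data generated from a cubic (the data and Gaussian noise model are invented for illustration). The score should typically peak near the true degree of 3.

```python
# Score polynomial models by ln p(D | theta_hat) - (d/2) ln N.
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = np.linspace(-1, 1, N)
y = 1.0 - 2.0 * x + 1.5 * x**3 + rng.normal(0.0, 0.2, N)  # cubic + noise

for degree in range(1, 8):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = resid @ resid / N                 # MLE of the noise variance
    d = degree + 2                             # coefficients + noise variance
    log_lik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1.0)  # Gaussian max log-lik
    score = log_lik - 0.5 * d * np.log(N)      # the approximation derived above
    print(f"degree {degree}: score {score:8.2f}")
```

Degrees above 3 keep improving the log-likelihood slightly, but the $\frac{d}{2} \ln N$ penalty eats up the improvement.
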
To recap, we estimated the probability of the data under the model, using the Laplace approximation to fit a Gaussian to the likelihood times the prior, and used some simplifying assumptions to arrive at the final form.

The end result is pretty much the Bayesian Information Criterion, and it penalizes model complexity more than AIC for any reasonable amount of data. Note that the constants in front are not arbitrary, since we never made any simplifications for them - there's a genuine 2:1 ratio between the log-likelihood term and the $\frac{d}{2} \ln N$ penalty (the textbook BIC just multiplies everything by $-2$). That should show Matti :)
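
For the record, a couple of lines comparing the penalties under the usual $-2$ scaling: AIC charges $2d$, while BIC charges $d \ln N$, so BIC is harsher as soon as $N > e^2 \approx 7.4$.

```python
# AIC vs BIC complexity penalties as the sample size grows.
import numpy as np

d = 5  # number of parameters, arbitrary
for N in (10, 100, 1000, 10000):
    print(f"N={N:6d}  AIC penalty: {2 * d:5.1f}  BIC penalty: {d * np.log(N):6.1f}")
```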
