The ideas for these will probably keep coming from Chris Bishop's book. Today, I liked the intuition behind the least-squares solution to the (slightly generalized) linear regression problem.
The general problem is this: given a set of $N$ data points $(\mathbf{x}_n, t_n)$, $n = 1, \dots, N$, we want to find a predictor function $y(\mathbf{x}, \mathbf{w})$ of the form

$$y(\mathbf{x}, \mathbf{w}) = \sum_{j=1}^{M} w_j \, \phi_j(\mathbf{x})$$

that minimizes the mean-squared error

$$E(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} \bigl( t_n - y(\mathbf{x}_n, \mathbf{w}) \bigr)^2.$$

The $\phi_j$ are basis functions that allow for richer models, and the $w_j$ are the weights of the basis. For ordinary linear regression we can take $\phi_j(\mathbf{x}) = x_j$, but in general we can try to match the output with any basis functions - Gaussians, sinusoids, sigmoids, etc.
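To make the setup concrete, here is a minimal sketch (in NumPy) of fitting the weights by least squares with a handful of Gaussian basis functions. The data, the centres, the width, and the function name are arbitrary choices of mine for illustration, not anything prescribed by the book.

```python
import numpy as np

def design_matrix(x, centres, width):
    """Phi[n, j] = phi_j(x_n) for Gaussian bumps placed at the given centres."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 25)                                   # N = 25 inputs
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)   # noisy targets

centres = np.linspace(0.0, 1.0, 6)                              # M = 6 basis functions
Phi = design_matrix(x, centres, width=0.15)

# Least-squares weights: minimize sum_n (t_n - sum_j w_j phi_j(x_n))^2
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

y = Phi @ w                                                      # predictions at the training inputs
print("residual sum of squares:", np.sum((t - y) ** 2))
```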
Now, consider an $N$-dimensional space whose axes are given by the regression targets $t_1, \dots, t_N$. Then any basis function evaluated at the $N$ data points is also a point in this space:

$$\boldsymbol{\varphi}_j = \bigl( \phi_j(\mathbf{x}_1), \dots, \phi_j(\mathbf{x}_N) \bigr)^T.$$

If the number of basis functions $M$ is less than the number of data points $N$, then the linear combinations of the basis function values define a linear subspace of this $N$-dimensional space.
Particularly, $\mathbf{y} = \bigl( y(\mathbf{x}_1, \mathbf{w}), \dots, y(\mathbf{x}_N, \mathbf{w}) \bigr)^T$ is a point in this subspace for any choice of $\mathbf{w}$.
Now for the cherry on the cake - the choice of weights $\mathbf{w}$ that minimizes the error $E(\mathbf{w})$ corresponds to the choice of $\mathbf{y}$ that is the orthogonal projection of the given data vector $\mathbf{t} = (t_1, \dots, t_N)^T$ onto the subspace spanned by the basis functions.
Perhaps obvious (and proof omitted), but I thought it was nice that the world is consistent :)
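For what it's worth, the claim is easy to check numerically: the least-squares fit $\boldsymbol{\Phi}\mathbf{w}$ and the orthogonal projection of $\mathbf{t}$ onto the column space of the design matrix come out identical. The sketch below uses random data and a QR factorization purely for illustration; none of the names come from the book.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 20, 5                          # N data points, M < N basis functions
Phi = rng.standard_normal((N, M))     # column j is phi_j evaluated at all x_n
t = rng.standard_normal(N)            # target vector, a point in the N-dim space

# Least-squares prediction vector
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w

# Orthogonal projection of t onto span{phi_1, ..., phi_M}
Q, _ = np.linalg.qr(Phi)              # orthonormal basis for the column space
t_proj = Q @ (Q.T @ t)

print(np.allclose(y, t_proj))         # True: the two vectors coincide
```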