The ideas for these will probably keep coming from Chris Bishop's book. Today, I liked the intuition behind the least-squares solution to the (slightly generalized) linear regression problem.
The general problem is this: given a set of $N$ data points $(\mathbf{x}_n, t_n)$, $n = 1, \dots, N$, we want to find a predictor function $y(\mathbf{x}, \mathbf{w})$ of the form

$$y(\mathbf{x}, \mathbf{w}) = \sum_{j=1}^{M} w_j \, \phi_j(\mathbf{x})$$

that minimizes the mean-squared error

$$E(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} \bigl( t_n - y(\mathbf{x}_n, \mathbf{w}) \bigr)^2.$$

The $\phi_j$ are basis functions that allow for richer models, and the $w_j$ are the weights of the basis. For ordinary linear regression we can take $\phi_j(\mathbf{x}) = x_j$, but in general we can try to match the output with any basis functions - Gaussians, sinusoids, sigmoids, etc.
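To make the setup concrete, here is a minimal sketch (in NumPy) of fitting the weights by least squares with a handful of Gaussian basis functions. The data, the centres, the width, and the function name are arbitrary choices of mine for illustration, not anything prescribed by the book.

```python
import numpy as np

def design_matrix(x, centres, width):
    """Phi[n, j] = phi_j(x_n) for Gaussian bumps placed at the given centres."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 25)                                   # N = 25 inputs
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)   # noisy targets

centres = np.linspace(0.0, 1.0, 6)                              # M = 6 basis functions
Phi = design_matrix(x, centres, width=0.15)

# Least-squares weights: minimize sum_n (t_n - sum_j w_j phi_j(x_n))^2
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

y = Phi @ w                                                      # predictions at the training inputs
print("residual sum of squares:", np.sum((t - y) ** 2))
```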
Now, consider an $N$-dimensional space whose axes are given by the regression targets $t_1, \dots, t_N$. Then any basis function evaluated at the $N$ data points is also a point in this space:

$$\boldsymbol{\varphi}_j = \bigl( \phi_j(\mathbf{x}_1), \dots, \phi_j(\mathbf{x}_N) \bigr)^T.$$

If the number of basis functions $M$ is less than the number of data points $N$, then the linear combinations of the basis function values define a linear subspace of this $N$-dimensional space.
Particularly, $\mathbf{y} = \bigl( y(\mathbf{x}_1, \mathbf{w}), \dots, y(\mathbf{x}_N, \mathbf{w}) \bigr)^T$ is a point in this subspace for any choice of $\mathbf{w}$.
Now for the cherry on the cake - the choice of weights $\mathbf{w}$ that minimizes the error $E(\mathbf{w})$ corresponds to the choice of $\mathbf{y}$ that is the orthogonal projection of the given data vector $\mathbf{t} = (t_1, \dots, t_N)^T$ onto the subspace spanned by the basis functions.
Perhaps obvious (and proof omitted), but I thought it was nice that the world is consistent :)
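For what it's worth, the claim is easy to check numerically: the least-squares fit $\boldsymbol{\Phi}\mathbf{w}$ and the orthogonal projection of $\mathbf{t}$ onto the column space of the design matrix come out identical. The sketch below uses random data and a QR factorization purely for illustration; none of the names come from the book.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 20, 5                          # N data points, M < N basis functions
Phi = rng.standard_normal((N, M))     # column j is phi_j evaluated at all x_n
t = rng.standard_normal(N)            # target vector, a point in the N-dim space

# Least-squares prediction vector
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w

# Orthogonal projection of t onto span{phi_1, ..., phi_M}
Q, _ = np.linalg.qr(Phi)              # orthonormal basis for the column space
t_proj = Q @ (Q.T @ t)

print(np.allclose(y, t_proj))         # True: the two vectors coincide
```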