First pretty little nugget, from Chris Bishop's book "Pattern Recognition and Machine Learning". If anyone can show me how to do inline LaTeX in blogspot, it would be most appreciated :) Edit: found a script from
http://servalx02.blogspot.com

Here's a little bit of intuition for why entropy is also a measure of disorder.
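Suppose there are $N$ things in the world, each of which is one of $M$ different types. There are $n_i$ objects of type $i$ ($i = 1, \dots, M$, so $\sum_{i=1}^{M} n_i = N$). Then the total number of possible configurations of the world $W$ is

$$W = \frac{N!}{\prod_{i=1}^{M} n_i!}$$

Fewer possible configurations (smaller $W$) means less disorder. So let's look at the scaled log of $W$ in the limit as $N \to \infty$ (using Stirling's approximation $\ln N! \approx N \ln N - N$, and shorthanding $\sum_{i=1}^{M}$ as $\sum_i$):

$$\frac{1}{N} \ln W = \frac{1}{N} \ln N! - \frac{1}{N} \sum_i \ln n_i! \approx \frac{1}{N} (N \ln N - N) - \frac{1}{N} \sum_i (n_i \ln n_i - n_i) = -\sum_i \frac{n_i}{N} \ln \frac{n_i}{N}$$

Denoting $p_i = \lim_{N \to \infty} n_i / N$, we get

$$\lim_{N \to \infty} \frac{1}{N} \ln W = -\sum_i p_i \ln p_i$$

For a discrete random variable $X$, where $P(X = i) = p_i$, this is exactly the entropy of $X$.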
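
As a quick numerical sanity check, here is a minimal Python sketch that compares the scaled log of the configuration count, (1/N) ln( N! / (n_1! ... n_M!) ), with the entropy -sum_i p_i ln p_i as N grows. The proportions `probs` and the function names are made up for illustration and are not from Bishop's book; `math.lgamma(n + 1)` stands in for ln(n!) so the factorials never overflow.

```python
import math


def scaled_log_multiplicity(counts):
    """Return (1/N) * ln W, where W = N! / prod(n_i!) and N = sum(counts)."""
    n_total = sum(counts)
    # lgamma(n + 1) equals ln(n!), so huge factorials are never formed.
    log_w = math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_w / n_total


def entropy(probs):
    """Return -sum(p * ln p) in nats, skipping zero-probability types."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Illustrative proportions (an assumption for this example only).
probs = [0.5, 0.25, 0.25]

for n_total in (8, 80, 8000, 800000):
    # These N values are chosen so every p * N is a whole number.
    counts = [round(p * n_total) for p in probs]
    print(n_total, scaled_log_multiplicity(counts), entropy(probs))
```

The first printed column should creep up towards the second as N increases; the leftover gap comes from the lower-order terms that Stirling's approximation drops.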