I second the David MacKay recommendation. It's a real pleasure to read. For me personally, his coverage of the fundamentals of information theory was amazing. The proofs are easy to follow and explained well, so you can match the intuition to the formulas.
I'd also like to recommend Boyd & Vandenberghe's book on Convex Optimization (available here: http://www.stanford.edu/~boyd/cvxbook/). It's maybe a little less fun, but it's very deep and still well written and 'illustrated'.
My background is in ML, and I came to realize over time that a lot of ML is just statistics and optimization theory rehashed by computer scientists.
Many moons ago, I went to a presentation with a classmate whose advisor taught control theory. It was a seminar about neural networks, the new CS hotness back then. Her jaw dropped when she saw the equations the CS guy was presenting - it was the simplest control stuff she knew.
On the other hand, another former classmate with the same engineering background (control theory, etc.) has been doing well at CS in machine learning ....