I'm not angry about it, but I like to minimise waste. Visa wants to, and is better off if it can, minimise its computational overhead. Bitcoin depends on computational overhead. Seems like a very different situation.
I'm no expert--maybe it is necessary that the computation have no benefit other than being costly--but I don't understand why. If so, I would welcome an explanation, though I recognize it's not your responsibility to educate me on the finer details of Bitcoin. :-)
The reason it is costly is that it needs to be costly (sorry for the circular logic). About 3600 bitcoins are paid out to miners each day, which means rational actors in the mining network can afford to spend up to that same value (~1M USD) on power for mining. That spending is what protects the blockchain against certain kinds of attacks: an attacker would need to spend at least as much to fork the blockchain for any period of time.
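A back-of-the-envelope sketch of that arithmetic. It assumes the 25 BTC block subsidy and roughly one block every ten minutes that the ~3600 BTC/day figure corresponds to, ignores transaction fees, and treats the BTC price as a free parameter. The function names are made up for illustration.

```python
# Back-of-the-envelope mining budget.
# Assumptions: 25 BTC subsidy per block, one block every ~10 minutes,
# transaction fees ignored, BTC price supplied by the caller.

BLOCKS_PER_DAY = 24 * 60 // 10  # one block per ~10 minutes -> 144


def daily_issuance_btc(block_subsidy_btc: float = 25.0) -> float:
    """New coins paid out to miners per day."""
    return block_subsidy_btc * BLOCKS_PER_DAY


def max_rational_spend_usd(btc_price_usd: float, block_subsidy_btc: float = 25.0) -> float:
    """Most that miners as a group can spend per day (power, hardware)
    without losing money: the market value of what they earn."""
    return daily_issuance_btc(block_subsidy_btc) * btc_price_usd


if __name__ == "__main__":
    print(daily_issuance_btc())  # 3600.0
    print(max_rational_spend_usd(280.0))  # 1008000.0, at a hypothetical 280 USD/BTC
```

At a hypothetical price of 280 USD/BTC this lands at about 1M USD/day, the figure in the comment. An attacker trying to outpace the honest miners would have to match roughly that daily spend for as long as the fork needs to last.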
As for why it needs to have no other benefit, the main reason is that the state of the blockchain needs to be turned into a hard problem of some kind for proof-of-work to work. You can think of each attempt at solving it as a "vote" for that particular version of the blockchain. If everyone could vote very fast for their own version of the blockchain, the Bitcoin network would quickly be flooded, and consensus would be very difficult to reach. If instead each vote is scored in some (random) way, and only one out of every thousand of your votes for a given state of the blockchain is good enough to be broadcast to the network, the network is much less polluted and consensus is much easier to reach. Because the scoring is random, a vote that clears the one-in-a-thousand bar shows that you made about a thousand attempts on average, so each broadcast vote really represents about 1000 votes. This is how Bitcoin works, except that the thousand is a much larger number (200,000,000,000,000,000,000).
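The random scoring of votes can be simulated with a hash function. This is a minimal sketch assuming SHA-256 as the scoring function and a one-in-a-thousand acceptance threshold; the names THRESHOLD, score and attempts_until_broadcast are hypothetical, and in real Bitcoin the "vote" is a block header checked against a far harder target.

```python
import hashlib

# Hypothetical acceptance rate: about one attempt in a thousand gets broadcast.
# (Bitcoin's real odds are far smaller.)
THRESHOLD = 1 / 1000


def score(vote: bytes, attempt: int) -> float:
    """Pseudo-random score in [0, 1): top 53 bits of SHA-256(vote || attempt)."""
    digest = hashlib.sha256(vote + attempt.to_bytes(8, "big")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def attempts_until_broadcast(vote: bytes) -> int:
    """Re-score the same vote with new attempt counters until one falls
    under THRESHOLD; return how many attempts that took."""
    attempt = 0
    while score(vote, attempt) >= THRESHOLD:
        attempt += 1
    return attempt + 1


if __name__ == "__main__":
    trials = [attempts_until_broadcast(f"chain-version-{i}".encode()) for i in range(200)]
    # Each broadcast vote stands for roughly a thousand attempts on average.
    print(sum(trials) / len(trials))
```

Nobody can tell from a single accepted vote how lucky its sender was, but across many votes the count of accepted ones tracks the total work done, which is what lets it stand in for the 1000 votes behind it.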
In the naive implementation, the proof of work could be done with a function f(x) that produces a number in [0, 1) from x (with f irreversible), and you would submit only the votes whose f(x) falls below (or above) some threshold. Suppose f is the protein folding problem, and f(x) is some energy score for how well you folded it (I don't really know how folding works, but bear with me). The problem is that you could sit in your basement for several weeks solving a bunch of these problems, and then use them all at once to fork the blockchain with several blocks that each carry a proof of work.

This means the work being done has to be tied somehow to the state of the blockchain you're voting on. Put another way, there needs to be a function w : b -> f, where b is the blockchain state you are voting on and w produces the f you have to work on. In Bitcoin, w is the Merkle tree of all the transactions in the block (its root goes into the block header together with the previous block's hash), and f is sha2(sha2(block header containing the Merkle root and x)). This step is what makes it very difficult to "do actual work" when mining: it's hard to design a problem that is both generated from arbitrary, effectively random data and actually useful.
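A toy version of the w and f steps. It builds a Merkle root from the transactions, puts it in a header next to the previous block's hash, and searches for a nonce x such that the double SHA-256 of the header starts with a given number of zero bits. The header layout and the leading-zero-bits target are simplifications assumed for illustration (Bitcoin's real header is 80 bytes with version, timestamp and difficulty fields, and the hash is compared against a numeric target), and the helper names are hypothetical.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, i.e. sha2(sha2(data)) as in Bitcoin."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txs: list[bytes]) -> bytes:
    """The 'w' step: hash the transactions pairwise up to a single root.
    Assumes at least one transaction; an odd level repeats its last hash, as Bitcoin does."""
    level = [sha256d(tx) for tx in txs]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def header(prev_hash: bytes, root: bytes, nonce: int) -> bytes:
    """Hypothetical simplified header: previous block hash + Merkle root + nonce x."""
    return prev_hash + root + nonce.to_bytes(8, "little")


def meets_target(block_hash: bytes, zero_bits: int) -> bool:
    """Accept the vote only if the hash starts with `zero_bits` zero bits."""
    return int.from_bytes(block_hash, "big") >> (256 - zero_bits) == 0


def mine(prev_hash: bytes, txs: list[bytes], zero_bits: int) -> int:
    """The 'f' step: try nonces x until sha2(sha2(header)) meets the target."""
    root = merkle_root(txs)
    nonce = 0
    while not meets_target(sha256d(header(prev_hash, root, nonce)), zero_bits):
        nonce += 1
    return nonce


def verify(prev_hash: bytes, txs: list[bytes], nonce: int, zero_bits: int) -> bool:
    """Checking a proof is cheap: one Merkle root plus one header hash."""
    return meets_target(sha256d(header(prev_hash, merkle_root(txs), nonce)), zero_bits)


if __name__ == "__main__":
    prev = sha256d(b"previous block")
    txs = [b"alice->bob 1", b"bob->carol 2", b"carol->dave 3"]
    nonce = mine(prev, txs, zero_bits=16)
    print(verify(prev, txs, nonce, 16))  # True
    # Same nonce, different chain state: fails except with probability ~2**-16.
    print(verify(sha256d(b"another block"), txs, nonce, 16))
```

Because a nonce only works for the exact previous hash and transaction set it was mined against, nonces ground out in advance (the basement scenario) are useless once the chain moves on.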