If both threads on a core are memory- (and cache-) intensive, each one effectively gets half the cache and half the memory bandwidth. Partitioning may make eviction less random, but each thread's share of the cache is still halved, no matter how much "tons of cache" you start with.
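To make the halving concrete, here is a back-of-the-envelope sketch. It assumes the cache is split evenly between the competing threads, which is a simplification (real replacement policies and partitioning are messier). The 12 MB working set and 20 MB cache are made-up illustration numbers, and `effective_share`/`fits` are hypothetical helper names.

```python
def effective_share(cache_bytes: int, active_threads: int) -> int:
    """Rough per-thread share of a cache when `active_threads` memory-intensive
    threads compete for it evenly (simplifying assumption)."""
    return cache_bytes // active_threads


def fits(working_set_bytes: int, cache_bytes: int, active_threads: int) -> bool:
    """True if one thread's working set fits in its even share of the cache."""
    return working_set_bytes <= effective_share(cache_bytes, active_threads)


MB = 1024 * 1024
# Hypothetical numbers: a 12 MB working set per thread against a 20 MB cache.
print(fits(12 * MB, 20 * MB, 1))  # True: one thread has the whole cache
print(fits(12 * MB, 20 * MB, 2))  # False: two such threads get ~10 MB each
```

A bigger cache moves the threshold, but the second thread still divides it: with a 40 MB cache the same two threads get about 20 MB each.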
Increasing the cache size raises the hit ratio, sometimes substantially. With 20MB per die, this may shift the point where performance drops off. I have found that I can't reliably predict how a chip will perform, so I wrote a set of benchmarks; running them takes me about half an hour and tells me whether the chip performs better or worse than I expected. Google's Broadwell VMs perform very well.
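For anyone wanting to find the drop-off points themselves, here is a minimal working-set sweep in stdlib Python. This is not my benchmark suite, just a sketch of the idea: it repeatedly scans buffers of increasing size and reports approximate read bandwidth, which typically falls each time the buffer outgrows a cache level. It assumes `bytearray.find()` with a one-byte needle scans in C (CPython usually uses `memchr` there), so interpreter overhead stays small; the function names and size list are hypothetical.

```python
import time


def scan_bandwidth(size_bytes: int, min_seconds: float = 0.05) -> float:
    """Approximate read bandwidth (GB/s) for repeatedly scanning a buffer.

    The buffer is all zeros, so find(b"\\x01") never matches and each call
    reads the whole buffer in C code.
    """
    buf = bytearray(size_bytes)  # zero-filled
    buf.find(b"\x01")  # warm-up pass: touch the pages and fill caches
    passes = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < min_seconds:
        buf.find(b"\x01")
        passes += 1
        elapsed = time.perf_counter() - start
    return passes * size_bytes / elapsed / 1e9


def sweep(sizes_kib):
    """Measure bandwidth for each working-set size given in KiB."""
    return {kib: scan_bandwidth(kib * 1024) for kib in sizes_kib}


if __name__ == "__main__":
    # Hypothetical size list spanning typical L1/L2/L3/DRAM ranges.
    sizes = [32, 256, 1024, 4096, 16384, 65536, 131072]
    for kib, gbs in sweep(sizes).items():
        print(f"{kib:>8} KiB  {gbs:7.2f} GB/s")
```

To see the hyperthread effect, one option on Linux is to run two copies at once pinned to sibling logical CPUs (for example with `taskset -c`; the sibling numbering is listed in `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list`) and compare where the drop-offs land against a single copy.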