What about CPU data cache misses caused by the fact that malloc has no concept of how best to lay out a chunk of data for your application's specific usage pattern?
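That can definitely be a problem. Using handles makes it easier to swap out your allocation mechanism if you ever have to optimize your cache efficiency: callers only hold handles, not raw pointers, so you can change where the data actually lives without touching them.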
Hell, projects that need to squeeze out that level of performance will typically have several memory allocation mechanisms, each tuned to a different usage pattern.
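To make that concrete, here is a rough sketch of what I mean, not code from any real project. Every name in it (`Allocator`, `Arena`, `HandleTable`, `handle_new`, `handle_get`) is made up for illustration, and it assumes a fixed-size handle table with no freeing, just to keep it short. The point is only that the handle table talks to a backend through one function pointer, so you can switch from malloc to an arena that packs objects next to each other without changing any calling code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator interface: any backend that can hand out memory. */
typedef struct Allocator {
    void *(*alloc)(struct Allocator *self, size_t size);
} Allocator;

/* Backend 1: plain malloc, no control over where objects end up. */
static void *heap_alloc(Allocator *self, size_t size) {
    (void)self;
    return malloc(size);
}

/* Backend 2: bump arena, consecutive allocations are adjacent in memory. */
typedef struct {
    Allocator base;        /* first member, so Allocator* can be cast back */
    unsigned char *buf;
    size_t used, cap;
} Arena;

static void *arena_alloc(Allocator *self, size_t size) {
    Arena *a = (Arena *)self;
    size_t aligned = (size + 15u) & ~(size_t)15u;   /* round up to 16 bytes */
    if (aligned < size || aligned > a->cap - a->used) return NULL;
    void *p = a->buf + a->used;
    a->used += aligned;
    return p;
}

/* Returns 0 on success, -1 if the backing buffer could not be allocated. */
static int arena_init(Arena *a, size_t cap) {
    a->base.alloc = arena_alloc;
    a->buf = malloc(cap);
    a->used = 0;
    a->cap = a->buf ? cap : 0;
    return a->buf ? 0 : -1;
}

/* Handle table: callers keep small integers, never raw pointers. */
#define MAX_HANDLES 64
typedef uint32_t Handle;   /* 0 is reserved as "invalid" */

typedef struct {
    Allocator *backend;    /* swap this to change allocation strategy */
    void *slots[MAX_HANDLES];
    uint32_t count;
} HandleTable;

static Handle handle_new(HandleTable *t, size_t size) {
    if (t->count + 1 >= MAX_HANDLES) return 0;      /* table is full */
    void *p = t->backend->alloc(t->backend, size);
    if (!p) return 0;
    t->slots[++t->count] = p;
    return t->count;
}

/* Resolve a handle to its current address; do not cache the result. */
static void *handle_get(const HandleTable *t, Handle h) {
    assert(h != 0 && h <= t->count);
    return t->slots[h];
}
```

Switching strategies is then a one-line change where the table is set up: point `backend` at an `Allocator` wrapping `heap_alloc`, or at `&arena.base`. With the arena, objects created back to back sit back to back in memory, which is the kind of layout control malloc can't promise you.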