If you just want calls and returns, can't you use one of the other PMU events for that? Or does sampling at the "1 sample per event" level have higher overhead than IPT?
It's worth noting that, overhead aside, function calls and returns are not quite enough to reconstruct the call stack: tail calls are just regular branch (jump) instructions, so a function entered via a tail call never shows up as a call.
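To make the tail-call point concrete, here's a toy Python sketch. The trace format, the function names, and the `reconstruct` helper are all made up for illustration. It replays call/ret events onto a shadow stack. With only calls and returns, `a` tail-calling `b` leaves `a` on the stack, so `c` looks like it was called from `a`. Treating a jump to a known function entry as "replace the current frame" fixes this toy case, but that assumes you also see the jumps and know where functions start.

```python
def reconstruct(events, func_entries=None):
    """Replay (kind, target) events onto a shadow stack.

    kind is "call", "ret", or "jmp" (hypothetical trace format).
    Returns a snapshot of the stack after each event.
    """
    stack = ["main"]
    snapshots = []
    for kind, target in events:
        if kind == "call":
            stack.append(target)
        elif kind == "ret":
            stack.pop()
        elif kind == "jmp" and func_entries and target in func_entries:
            # Jump to a function's entry point: treat it as a tail call
            # that reuses the caller's frame.
            stack[-1] = target
        # Any other jump is invisible to a call/return-only view.
        snapshots.append(list(stack))
    return snapshots


# main calls a, a tail-calls b (a plain jmp), b calls c, then both return.
trace = [("call", "a"), ("jmp", "b"), ("call", "c"), ("ret", None), ("ret", None)]

calls_only = reconstruct(trace)
with_tail_calls = reconstruct(trace, func_entries={"a", "b", "c"})

print(calls_only[2])       # ['main', 'a', 'c']  (wrong: b is missing, a gets blamed)
print(with_tail_calls[2])  # ['main', 'b', 'c']
```

Note that the calls-only version still ends up with the right stack after `b` returns, which is what makes this kind of bug easy to miss: the stack only looks wrong while you're inside the tail-called function.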