Yes, you would only do this for hot loops with short basic-block branches (binary search being the canonical example). That's why I said hand-annotate them, rather than having the CPU try to detect and distinguish these situations at runtime.
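For anyone following along, here's a rough C sketch of the software-side treatment of the binary-search case. This is a standard branchless lower-bound loop, not anything proposed above. The comparison result only selects the next base pointer instead of steering a branch. Whether the compiler actually emits a conditional move (e.g. `cmov` on x86) depends on the compiler and flags and isn't guaranteed. The function name `lower_bound_branchless` is just illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element in sorted a[0..n) that is >= key,
   or n if every element is smaller. Invariant: the answer lies in
   [base - a, base - a + n]. The loop has no data-dependent branch in the
   source; the compare only selects the next base (compilers may, but are
   not required to, lower this to a conditional move). */
size_t lower_bound_branchless(const int *a, size_t n, int key) {
    assert(a != NULL || n == 0);
    if (n == 0) return 0;
    const int *base = a;
    while (n > 1) {
        size_t half = n / 2;
        /* Skip the lower half if its last element is still < key. */
        base = (base[half - 1] < key) ? base + half : base;
        n -= half;
    }
    /* One candidate left: step past it if it is still < key. */
    return (size_t)(base - a) + (size_t)(*base < key);
}
```

The trip count depends only on `n`, never on the data, so there is nothing for the predictor to miss. The cost is that you always do the full ~log2(n) iterations and the loads become a dependency chain. That trade-off only pays off in loops like this one, which is why it makes sense to opt in per site rather than globally.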
Yes, we're talking about different things. Those branches can be optimized in software, as you described. The ones I'm talking about can't be. They often sit in "business logic" code.