Don't. Yes, there are instructions for this. No, don't use them unless you know exactly what you are doing and are optimizing for a single, specific µarch only. Otherwise they will invariably hurt performance, not improve it.
Similarly, explicit prefetching usually does not improve performance, but reduces it.
(Non-temporal stores are quite a good example of such instructions backfiring: a game engine used them in a few spots until recently, which caused not only worse performance on Intel chips, but also heavily degraded performance on AMD's Zen µarch. Removing them improved performance for all chips across the board. Ouch!)
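
To make the prefetching point concrete, here is a minimal sketch of what "explicit prefetching" tends to look like in practice, next to the plain loop it replaces. It assumes GCC or Clang (`__builtin_prefetch` is a compiler builtin, not standard C). The function names and the `PREFETCH_AHEAD` distance are made up for illustration and not tuned for any chip. Both functions return the same result; only the cache hints differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: leaves prefetching to the hardware, which already
   recognizes simple sequential access patterns like this one. */
uint64_t sum_plain(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Hypothetical prefetch distance in elements. This is an assumption
   for illustration only; a good value, if one exists at all,
   depends on the specific microarchitecture. */
#define PREFETCH_AHEAD 64

/* Same loop with an explicit software prefetch hint (GCC/Clang builtin).
   Arguments: address, 0 = prefetch for read, 0 = no temporal locality. */
uint64_t sum_prefetch(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only hint at addresses that are still inside the array. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);
        total += data[i];
    }
    return total;
}
```

The second version adds an extra branch and an extra instruction per iteration for a hint the hardware may not need. If you are tempted to keep something like it, benchmark both versions on every chip you care about first.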