
> D-REX proposes a really clever trick to get around not having any reward labels at all, even when the demonstrator is suboptimal: Given a suboptimal policy... add variable amounts of noise to its actions. Assume that adding noise to a suboptimal policy makes it even more suboptimal... Train a ranking model to predict which of two trajectories has a higher return. The ranking model magically extrapolates to trajectories that are better

What strikes me about this is that the assumption (adding noise to a policy makes it worse) goes completely against evolutionary approaches to AI (where we look for improvements precisely by adding noise).
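
Concretely, the trick in the quote looks something like the sketch below. This is a toy mock-up under my own assumptions (made-up environment and names like toy_env_step and suboptimal_policy, numpy + torch, epsilon-style random-action noise, a pairwise ranking loss on summed predicted rewards), not the paper's actual code:

  # Minimal sketch of the noise-injection ranking idea (toy environment,
  # illustrative names; assumes numpy and torch are available).
  import numpy as np
  import torch
  import torch.nn as nn

  STATE_DIM, N_ACTIONS, HORIZON = 4, 3, 50
  rng = np.random.default_rng(0)

  def toy_env_step(state, action):
      # Stand-in dynamics: decay plus an action-dependent push.
      push = np.eye(N_ACTIONS)[action]
      return 0.9 * state + np.concatenate([push, [0.1]]) + 0.05 * rng.normal(size=STATE_DIM)

  def suboptimal_policy(state):
      # Stand-in for the suboptimal demonstrator policy.
      return int(np.argmax(state[:N_ACTIONS]))

  def rollout(eps):
      # Noise injection: with probability eps, replace the action with a random one.
      state, traj = rng.normal(size=STATE_DIM), []
      for _ in range(HORIZON):
          traj.append(state.copy())
          a = rng.integers(N_ACTIONS) if rng.random() < eps else suboptimal_policy(state)
          state = toy_env_step(state, a)
      return np.stack(traj)

  # Rollouts at several noise levels; the key assumption is more noise => lower return.
  noise_levels = [0.0, 0.25, 0.5, 0.75, 1.0]
  trajs = {eps: [rollout(eps) for _ in range(5)] for eps in noise_levels}

  reward_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
  opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

  def predicted_return(traj):
      return reward_net(torch.as_tensor(traj, dtype=torch.float32)).sum()

  for step in range(2000):
      lo, hi = sorted(rng.choice(noise_levels, size=2, replace=False))
      better = trajs[lo][rng.integers(5)]   # less noise, assumed higher return
      worse = trajs[hi][rng.integers(5)]
      # Pairwise (Bradley-Terry style) ranking loss on the predicted returns.
      logits = torch.stack([predicted_return(better), predicted_return(worse)]).unsqueeze(0)
      loss = nn.functional.cross_entropy(logits, torch.tensor([0]))
      opt.zero_grad(); loss.backward(); opt.step()

The extrapolation claim is then just that the learned reward keeps increasing past the least-noisy (i.e. original demonstrator) trajectories, so optimizing against it can do better than the demonstrator.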



The two ideas are mostly compatible (and neither assumption always holds):

(Evolutionary) If you generate enough perturbations, then some of them will be better.

(TFA) If you generate perturbations, then most of them will be worse.

In the evolutionary case you also explicitly design your model and algorithm to try to generate good perturbations, so the two ideas aren't necessarily directly comparable anyway.
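
For what it's worth, you can see both claims hold at once in a toy numeric check (made-up quadratic "return", nothing to do with the paper):

  # Perturb a decent-but-not-optimal parameter vector and count how often
  # a random perturbation helps (purely illustrative numbers).
  import numpy as np

  rng = np.random.default_rng(0)
  target = rng.normal(size=20)                 # pretend these are the "optimal" policy weights
  theta = target + 0.3 * rng.normal(size=20)   # a suboptimal but decent policy

  def score(params):
      return -np.sum((params - target) ** 2)   # higher is better

  perturbed = theta + 0.1 * rng.normal(size=(10_000, 20))
  better = np.array([score(p) > score(theta) for p in perturbed])

  print(f"worse after noise (TFA's assumption):        {1 - better.mean():.2f}")
  print(f"better after noise (what evolution selects): {better.mean():.2f}")

Most perturbations hurt, which is all the ranking trick needs on average, while evolution only needs the minority that help.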



