It all comes down to how explicitly the preferences are captured in the first place, and then whether the reward function can incorporate implicit analysis.
There have been recent studies on AI-powered shirt design: the input is a set of existing designs, described in terms of color and shape, rather than the basic, naive description of requirements that an engineer would give. The generated designs can then be assessed by a review board, or put up on a site and not produced until they reach some quantity n of purchases.
You wouldn't try to detect cats in images without labelled data, so why would you try something MUCH harder without labelled data?!
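
To make the "put it up on a site and wait for n purchases" idea concrete, here is a rough Python sketch of how purchases could double as labels. Everything in it is an assumption for illustration, not taken from the studies mentioned: the `Design` record, the `implicit_reward` and `ready_for_production` helpers, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """Hypothetical record for one generated shirt design."""
    name: str
    views: int      # times the design was shown on the site
    purchases: int  # pre-orders received so far


def implicit_reward(design: Design) -> float:
    """Use the purchase rate as an implicit label: pre-orders per view."""
    if design.views == 0:
        return 0.0
    return design.purchases / design.views


def ready_for_production(design: Design, min_purchases: int) -> bool:
    """Produce a design only once it has at least min_purchases pre-orders."""
    return design.purchases >= min_purchases


if __name__ == "__main__":
    MIN_PURCHASES = 25  # assumed value of n, purely illustrative
    designs = [Design("stripes", 200, 30), Design("polka", 150, 3)]
    for d in designs:
        print(d.name, implicit_reward(d), ready_for_production(d, MIN_PURCHASES))
```

The point is only that the purchase counts give you the labelled data for free; a review board score could be plugged into `implicit_reward` in the same way.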