Here's an example. Suppose there are two buttons, A and B. If you press A for the nth time, then you get reward n. If you press B for the nth time, then you get reward 0 if n is not a power of 2, or reward omega (the first infinite ordinal number) if n is a power of 2.
If the above rewards are shoehorned into real numbers---for example, by replacing omega with 9999 or something---then an RL agent would misunderstand the environment and would eventually be misled into thinking that pressing A yields more average reward.
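Here's a rough simulation of that (just a sketch, with 9999 standing in for omega and "average reward" meaning total reward divided by number of presses):

    # Sketch: replace omega with the finite stand-in 9999 and compare averages.
    def reward_A(n):
        return n  # the nth press of A pays n

    def reward_B(n, stand_in=9999):
        # the nth press of B pays the stand-in iff n is a power of 2
        return stand_in if n & (n - 1) == 0 else 0

    N = 100_000
    avg_A = sum(reward_A(n) for n in range(1, N + 1)) / N
    avg_B = sum(reward_B(n) for n in range(1, N + 1)) / N
    print(avg_A, avg_B)  # ~50000.5 vs ~1.7: the finite stand-in makes A look better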
There are no infinite rewards in biology and yet mathematicians seem to do just fine answering these sorts of questions.
I don’t think you want to encode your problem domain in your reward system. It’d be like asking a logic gate to add when you really should be reaching for an FPU. Maybe I’m missing something though?
>There are no infinite rewards in biology and yet mathematicians seem to do just fine answering these sorts of questions
This is only a problem if you're already assuming we do everything based on our biological reward systems, and in the current context that would be circular reasoning.
Imagine the treasury creates a "superdollar", a product which, if you have one, you can use to create any number of dollars you want, whenever you want, as many times as you want. Obviously a superdollar is more valuable than any finite number of dollars, and humans/mathematicians/AGIs would treat it accordingly, regardless of the finiteness of our biological reward systems.
> This is only a problem if you're already assuming we do everything based on our biological reward systems
Is there some other way that we do it besides our biological reward system? It sure looks like we get an apple, not an infinite reward, when we pick the right answer of selecting button B. I understand that might not satisfy you.
>Is there some other way that we do it besides our biological reward system?
Seems to me that's what this whole paper we're discussing is about. If you're already convinced that there is no other way, then you're basically already agreeing with the paper, "Reward is Enough".
What's the behavior you're trying to get the AI to do in this example? Learn how to compute powers of 2? This is a task that can be accomplished much more simply with a different reward system: for example, have A always equal 1, and have B equal 2 if n is a power of 2 and 0 otherwise.
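In code, that alternative scheme would be something like this (a sketch; the function name is mine):

    def reward(button, n):
        # A always pays 1; B pays 2 on the nth press iff n is a power of 2
        if button == "A":
            return 1
        return 2 if n >= 1 and n & (n - 1) == 0 else 0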
I understand you can use non-real numbers; that's not what I was asking. I'm asking what's a behaviour you can't replicate using a reward system based on real numbers.
>I'm asking what's a behaviour you can't replicate using a reward system based on real numbers
So glad you asked! I can give an answer which people who take the necessary time to understand it will love. It's complicated; you might have to re-read it a few times and really ponder it. It's about automatic code generation (though it might not look like it at first).
Definition 1: Define the "Intuitive Ordinal Notations" (IONs) to be the smallest set P of computer programs such that, for every computer program p, if everything p outputs is in P, then p is in P.
Definition 2: Inductively associate an ordinal |p| with every ION p as follows: |p| is the smallest ordinal bigger than every ordinal |q| such that q is an output of p. Say that p "notates" |p|.
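To make the definitions concrete, here's one possible encoding (my own, purely illustrative): take a "program" to be a Python generator function whose outputs are other such functions.

    def zero():
        return      # outputs nothing, so it is vacuously an ION; |zero| = 0
        yield

    def one():
        yield zero  # its only output notates 0, so |one| = 1

    def finite(n):
        # returns a program that outputs programs notating 0, 1, ..., n-1,
        # so the returned program notates n
        def p():
            for k in range(n):
                yield finite(k)
        return p

    def omega():
        # outputs programs notating 0, 1, 2, ...; the smallest ordinal above
        # all of them is the first infinite ordinal, so |omega| = omega
        n = 0
        while True:
            yield finite(n)
            n += 1

The reward scheme below then asks the AGI to hand over programs like omega (and far beyond it), together with arguments that they really belong to P.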
Finally, to answer your question, I want the AGI to write programs which are IONs notating large ordinals, accompanied by arguments convincing me they really are IONs. An easy way to incentivize this with RL would be as follows. If the AGI writes an ION p and an argument that convinces me it's an ION, I will grant the AGI reward |p|. If the AGI does anything else (including if its argument does not convince me), then I'll give it reward 0.
You can't correctly incentivize this behavior using reals. The computable ordinals are too non-Archimedean to do so.
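Here's the obstruction in miniature (a sketch; R is whatever finite real you try to substitute for an ordinal reward):

    def presses_of_A_to_beat(R):
        # the nth press of A pays n, so the running total n(n+1)/2 passes any
        # fixed real R after finitely many presses -- reals are Archimedean
        n, total = 0, 0
        while total <= R:
            n += 1
            total += n
        return n

    print(presses_of_A_to_beat(9999))  # 141
    # ...whereas no finite number of presses of A ever reaches omega, so any
    # real-valued substitute flips which policy the agent should prefer.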