> An operation is idempotent if performing it multiple times yields the same res...

xsmasher · on Dec 12, 2019

I agree with the author re: returning success on retries; it lets you automate the retry process.

I work in mobile games; because someone might play in a tunnel or bad network area, I need to make sure that every request is retry-able.

To do that I generally include a GUID of some kind in the request; if the client says "create an entry for XXYY," there's a chance that the request will get to the server but the response will fail to reach the client.

If the client is able to retry the request (with the same GUID) and get a success response, then I can have the retries handled transparently in the communication layer; all the client code needs to know is "I made this request and it was a success," without any knowledge of how many tries it took.

If the second/third/etc request returned an error of some kind, I wouldn't have a good "success" response to hand back to the game code. (I'm assuming the "success" response contains some information that the game code needs.)
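
A minimal sketch of the server side of this pattern, assuming an in-memory store keyed by the request GUID. The names `EntryServer`, `create_entry`, and `entry_index` are hypothetical and only illustrate the idea; a real server would persist the stored responses and expire them eventually.

```python
import uuid


class EntryServer:
    """Toy in-memory server that remembers the response for each request GUID."""

    def __init__(self):
        self._responses = {}  # request GUID -> response already produced
        self.entries = []     # the state the request mutates

    def create_entry(self, request_id, payload):
        # A retry with a known GUID gets the original success response back,
        # and the entry is not created a second time.
        if request_id in self._responses:
            return self._responses[request_id]
        self.entries.append(payload)
        response = {"status": "ok", "entry_index": len(self.entries) - 1}
        self._responses[request_id] = response
        return response


server = EntryServer()
request_id = str(uuid.uuid4())  # generated once by the client, reused on retry
first = server.create_entry(request_id, "XXYY")
retry = server.create_entry(request_id, "XXYY")  # e.g. the first response was lost
```

Because the retry returns the stored response, the communication layer can hand the game code the same data no matter which attempt got through.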

tlb · on Dec 11, 2019

The idempotent operation is usually "Ensure that a VM exists with this name and spec".

An advantage of this style is that if the client dies or times out during the (long) operation, it can retry and get the same answer instantly.
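
One way to read "ensure that a VM exists with this name and spec" is sketched below. Everything here is an assumption for illustration: `VmApi`, `ensure_vm`, and `VmConflict` are hypothetical names, and raising on a mismatched spec is one possible design choice, not something the comment above specifies.

```python
class VmConflict(Exception):
    """Raised when the name is taken by a VM with a different spec."""


class VmApi:
    def __init__(self):
        self._vms = {}   # name -> spec
        self.boots = 0   # counts how many VMs were actually created

    def ensure_vm(self, name, spec):
        existing = self._vms.get(name)
        if existing is None:
            # First call: do the (long) creation work.
            self.boots += 1
            self._vms[name] = dict(spec)
        elif existing != spec:
            # Same name, different spec: not a retry of the same request.
            raise VmConflict(f"{name} already exists with a different spec")
        # A retry with the same name and spec returns the same answer.
        return {"name": name, "spec": dict(self._vms[name])}


api = VmApi()
spec = {"cpus": 2, "memory_gb": 4}
first = api.ensure_vm("build-box", spec)
again = api.ensure_vm("build-box", spec)  # retry after a timeout
```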

MadWombat · on Dec 13, 2019

What happens if some other process has already created a VM with this name and spec? Under most realistic scenarios I would rather VM creation failed than silently clobber someone else's VM.

AgentME · on Dec 12, 2019

With an idempotent API, starting a VM and doing something with it can look like this:

    let new_id = generate_id();
    retry_with_backoff(() => api.make_vm(new_id));
    retry_with_backoff(() => api.do_thing_with_vm(new_id));

This works even if any individual API calls fail, or if the API call makes it to the API server but the response fails to make it to the client.
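
For reference, `retry_with_backoff` is not defined in the snippet above. A rough Python sketch of what such a helper might look like follows; the parameters `max_attempts`, `base_delay`, and `sleep` are assumptions added here, and retrying blindly like this is only safe because the wrapped operation is idempotent.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff with jitter: the delay doubles each attempt.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))


# Usage: an operation that fails twice, then succeeds.
attempts = []


def flaky_make_vm():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("response lost")
    return "vm-ready"


# The sleep is stubbed out so the example runs without waiting.
result = retry_with_backoff(flaky_make_vm, sleep=lambda seconds: None)
```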

If the APIs aren't idempotent, then you would have to do this to get the same behavior:

    let new_id = generate_id();
    retry_with_backoff(() => {
      try {
        api.make_vm(new_id);
      } catch (e) {
        if (e.info && e.info.code === 'vm_already_exists') {
          return;
        }
        throw e;
      }
    });
    retry_with_backoff(() => {
      try {
        api.do_thing_with_vm(new_id);
      } catch (e) {
        if (e.info && e.info.code === 'thing_already_done') {
          return;
        }
        throw e;
      }
    });

This nonidempotent API is harder to use. Someone who doesn't know about these error codes, or that the API isn't idempotent, will write code without the try-catch blocks, and that code won't handle retries correctly. With the idempotent API, users fall into the pit of success where things just work without them having to know the details of each edge case.

The nonidempotent API is exposing some extra data to the user, but it's not super useful. You basically always want to treat the vm_already_exists error identically to a success response. Maybe you also want to log some data about how many retries were necessary so you can figure out how spotty the network connection is, but there's no reason that couldn't work with the idempotent API either. The idempotent API could include a header about whether the action was already taken previously.
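
As a sketch of that last point, an idempotent `make_vm` could report whether the VM already existed without turning the retry into an error. The header name `X-Already-Existed` and the response shape are made up for this example.

```python
def make_vm(vms, vm_id):
    """Idempotent create: always succeeds, and notes whether it was a repeat."""
    already_existed = vm_id in vms
    if not already_existed:
        vms[vm_id] = {"id": vm_id}
    return {
        "vm": vms[vm_id],
        # Hypothetical header: purely informational, safe to ignore.
        "headers": {"X-Already-Existed": "true" if already_existed else "false"},
    }


vms = {}
first = make_vm(vms, "vm-123")
second = make_vm(vms, "vm-123")  # a retry of the same request

# A client that cares about network quality can count the repeats.
repeats = sum(r["headers"]["X-Already-Existed"] == "true" for r in (first, second))
```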

Consider how TCP connections are used by applications. Your application doesn't have to opt in to handling packets that were resent. The fact that some packets had to be resent is by default just an implementation detail. You have to opt in to get information about the resent packets; by default they're handled like regular successful packets. Idempotent APIs are about making handling retries work by default in a very similar way.

MadWombat · on Dec 12, 2019

Let's start simple: your example assumes that you generate the id yourself. In my experience a common API usage pattern would look more like

    try:
        vm_id = api.make_vm()
    except SomeError as e:
        log.error(e)
    else:
        res = api.do_thing_with_vm(vm_id)

and in your example, if we are generating ids ourselves, we still have to verify that we got the right VM. If your ids are provably unique, there is no reason to generate them yourself; the API can take care of that. But if you want something like a named entity, you have a problem: what if the name is already taken? So your code would look more like

    new_id = generate_id()
    try:
        vm = api.get_vm(new_id)
    except VM_DoesNotExist:
        vm = api.make_vm(new_id)
    except SomeError as e:
        log.error(e)
    else:
        api.do_thing_with_vm(new_id)

because if the make_vm API simply returns a VM whether it was created or not, it is entirely possible that you are getting a VM that is busy doing something else for some other process.