Are you using Node.js or Java for your Lambda function?
If you are using "Node.js", you may be seeing slow times if you are not calling "context.done" in the correct place, or if you have code paths that don't call it.
Not calling context.done can cause Node.js to exit, either because the node event loop is empty or because the code times out and Lambda kills it.
When node exits, the container shuts down, which means Lambda can't re-use it for the next invoke and has to create a new one. The "cold-start" path is much slower than the "warm-start" path, so invokes are much faster when Lambda can re-use containers.
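To make that concrete, here is a minimal sketch of a Node handler where every code path, success or failure, ends in a context.done call (the handler body and doAsyncWork are just placeholders):

    // Hypothetical handler; the point is that every branch calls context.done.
    exports.handler = function (event, context) {
        if (!event.id) {
            // Error path: still signal completion so Lambda can freeze the container.
            context.done(new Error("missing id"));
            return;
        }
        doAsyncWork(event.id, function (err, result) {
            // Report both failure and success back to Lambda.
            if (err) {
                context.done(err);
            } else {
                context.done(null, result);
            }
        });
    };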
Also, how are you initializing your connection to DDB? Is it happening inside your Lambda function? If you either move it to the module initializer (for Node), or to a static constructor (for Java), you may also see speed improvements.
If you create the DDB connection inside the Lambda function, that code will run on every invoke. If you instead create it outside the function (again, in the module initializer or in a static constructor), it will only run once, when the container is spun up. Subsequent invokes that land on the same container can then re-use the HTTP connection to DDB, which improves invoke times.
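In Node that looks something like this (the region and table name are made up for the example):

    var AWS = require("aws-sdk");

    // Runs once per container, when the module is first loaded.
    var ddb = new AWS.DynamoDB({ region: "us-east-1" });

    exports.handler = function (event, context) {
        // Runs on every invoke, re-using the client (and its HTTP connection) above.
        var params = { TableName: "my-table", Key: { id: { S: event.id } } };
        ddb.getItem(params, function (err, data) {
            context.done(err, data && data.Item);
        });
    };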
Also, you may want to consider increasing the amount of RAM allocated to your Lambda function. The "memory size" option is badly named: it controls more than just the maximum RAM you are allowed to use. It also controls the proportion of CPU your container gets, so increasing the memory size results in a corresponding (linear) increase in CPU power.
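You can change this from the console, or script it; a rough sketch with the Node SDK (the function name and values here are placeholders) would be:

    var AWS = require("aws-sdk");
    var lambda = new AWS.Lambda({ region: "us-east-1" });

    // Bumping the memory size also bumps the CPU share proportionally.
    lambda.updateFunctionConfiguration(
        { FunctionName: "my-function", MemorySize: 512 },
        function (err, config) {
            if (err) console.error(err);
            else console.log("Memory size is now", config.MemorySize);
        }
    );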
One final thing to keep in mind when scaling a Lambda function is that Lambda mainly throttles on the number of concurrent requests, not on the transactions per second (TPS). By default Lambda will allow up to 100 concurrent requests.
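As a back-of-the-envelope check, the concurrency you need is roughly your TPS times your average invoke duration in seconds, so 100 concurrent requests goes further than it sounds if your invokes are fast:

    // Rough sizing: concurrency ~= TPS * average duration (seconds).
    // e.g. 1000 TPS at 100ms per invoke needs only ~100 concurrent containers.
    var tps = 1000;
    var avgDurationSeconds = 0.1;
    console.log("approx concurrency needed:", tps * avgDurationSeconds); // 100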
If you want a maximum limit greater than that, you do have to call us, but we can set the limit fairly high for you.
The scaling is still dynamic, even if we have to raise your upper limit. Lambda will spin up and spin down servers for you, as you invoke, depending on actual traffic.
The default limit of 100 is mainly meant as a safety limit. For example, unbounded recursion is a mistake we see frequently. Having default throttles in place is good protection, both for us and for you. We generally want to make sure that you really want to use a large number of servers before we go ahead and allocate them to you.
For example, a lot of folks use Lambda with S3 for generating thumbnail images, sometimes in several different sizes. A common mistake some folks make when implementing this for the first time is to write back to the same S3 bucket they are triggering off of, without filtering the generated files from the trigger. The end result is an exponential explosion of Lambda requests. Having a safety limit in place helps with that.
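A cheap guard against that, assuming you write your thumbnails under a known prefix (the "thumbnails/" prefix and generateThumbnails are made up for the example), is to bail out early when the trigger fires for a file you generated:

    exports.handler = function (event, context) {
        var key = event.Records[0].s3.object.key;

        // If this event is for one of our own generated files, stop here
        // instead of making a thumbnail of a thumbnail, forever.
        if (key.indexOf("thumbnails/") === 0) {
            context.done(null, "skipping generated file");
            return;
        }

        generateThumbnails(key, function (err) {
            context.done(err);
        });
    };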
In any case, if you are having trouble getting Lambda to scale, I'm happy to try and help.
I'm initialising the DDB connection outside the function as you suggest. However, I'm calling context.succeed() not context.done() -- would this be problematic?
I'll try increasing the "memory size" and requesting an increased concurrent request limit too, thanks.
Your code looks correct. I would expect something closer to 50ms in the warm path (300ms in the cold path seems about right).
I'll take a look tomorrow and see if I can reproduce what you are seeing. I'm not super familiar with API gateway, so there could be some config issues over there.
If you want to discuss this more offline, feel free to contact me at "scottwis AT amazon".