So usually with Lambda, you want your jobs to be as atomic/quick as possible, as Lambda is stateless and has a maximum duration of 15 minutes.
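One way to keep a long, stateful scrape inside the 15-minute limit is to work through a queue and stop while there is still time left, handing the remainder to the next invocation. The following is only a rough sketch of that idea: `ScrapeJob`, `runBatch`, and `scrapeOne` are hypothetical names, and `remainingMs` is assumed to wrap something like `context.getRemainingTimeInMillis()` from the Node.js Lambda runtime.

```typescript
// Hypothetical shape of a resumable scraping job: the URLs still to visit.
interface ScrapeJob {
  pending: string[];
}

interface BatchResult {
  done: string[];       // URLs processed during this invocation
  remaining: ScrapeJob; // state to pass to the next invocation
}

// Process URLs until the time budget runs low, then stop and return the rest.
// `remainingMs` is an assumption: on Lambda it would typically wrap
// context.getRemainingTimeInMillis().
async function runBatch(
  job: ScrapeJob,
  scrapeOne: (url: string) => Promise<void>,
  remainingMs: () => number,
  safetyMarginMs: number = 30000,
): Promise<BatchResult> {
  const done: string[] = [];
  const pending: string[] = [...job.pending];
  while (pending.length > 0 && remainingMs() > safetyMarginMs) {
    const url = pending.shift() as string;
    await scrapeOne(url);
    done.push(url);
  }
  return { done, remaining: { pending } };
}
```

If `remaining.pending` comes back non-empty, the caller could re-invoke the function with it (for example through a queue), so no single invocation has to hold the whole job.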
As for the warm-up times, decompressing Chromium with Brotli takes about 700ms on a 1.5GB Lambda (this is faster than Gzip/Zip). Launching Chromium itself and opening a new tab takes another 400ms or so. If you keep your Lambdas warm (by registering a scheduled CloudWatch event every 15 minutes, for instance), your startup time will effectively be those 400ms.
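To make the keep-warm idea concrete, here is a minimal sketch of a handler that treats scheduled events as warm-up pings. Scheduled CloudWatch events arrive with `source` set to `"aws.events"`. Everything else here is an assumption for illustration: `BrowserLike` stands in for a real browser object, `launch` for whatever actually decompresses and starts Chromium, and the `url` field is a made-up request payload.

```typescript
// Minimal event shape. Scheduled CloudWatch events carry source "aws.events".
interface LambdaEvent {
  source?: string;
  url?: string; // hypothetical payload field for real requests
}

// Stand-in for a launched browser (assumption, not a real library type).
interface BrowserLike {
  render(url: string): Promise<string>;
}

// Module scope survives between invocations of a warm container.
let cachedBrowser: Promise<BrowserLike> | undefined;

function getBrowser(launch: () => Promise<BrowserLike>): Promise<BrowserLike> {
  if (cachedBrowser === undefined) {
    // Pay the decompression and launch cost only once per container.
    cachedBrowser = launch().catch((err) => {
      cachedBrowser = undefined; // allow a retry on the next invocation
      throw err;
    });
  }
  return cachedBrowser;
}

function isWarmUpPing(event: LambdaEvent): boolean {
  return event.source === "aws.events";
}

async function handle(
  event: LambdaEvent,
  launch: () => Promise<BrowserLike>,
): Promise<string> {
  const browser = await getBrowser(launch);
  if (isWarmUpPing(event)) {
    return "warmed"; // nothing to scrape, the container just stays hot
  }
  if (event.url === undefined) {
    throw new Error("missing url");
  }
  return browser.render(event.url);
}
```

Whether the cached browser is actually there on the next call depends on Lambda reusing the same container, which is likely but not guaranteed.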
Presumably you can also keep Chromium running and keep a pool of tabs to reuse. Doing this would be pretty fast, I imagine.
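A pool of reusable tabs could look roughly like the sketch below. It is generic on purpose: `TabPool` is a hypothetical helper, and `createTab` is assumed to be whatever opens a new page in the already-running browser. I haven't measured this, so treat it as an illustration of the pooling logic only.

```typescript
// Generic pool: T would be a browser page/tab in practice (assumption).
class TabPool<T> {
  private idle: T[] = [];
  private waiters: Array<(tab: T) => void> = [];
  private created = 0;
  private readonly createTab: () => Promise<T>;
  private readonly maxTabs: number;

  constructor(createTab: () => Promise<T>, maxTabs: number) {
    this.createTab = createTab;
    this.maxTabs = maxTabs;
  }

  // Hand out an idle tab, open a new one if under the cap, otherwise wait.
  async acquire(): Promise<T> {
    const tab = this.idle.pop();
    if (tab !== undefined) {
      return tab;
    }
    if (this.created < this.maxTabs) {
      this.created++;
      try {
        return await this.createTab();
      } catch (err) {
        this.created--; // the slot was never filled
        throw err;
      }
    }
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  // Give the tab to the longest-waiting caller, or park it as idle.
  release(tab: T): void {
    const waiter = this.waiters.shift();
    if (waiter !== undefined) {
      waiter(tab);
    } else {
      this.idle.push(tab);
    }
  }

  // Run a task with a tab and always return the tab afterwards.
  async withTab<R>(task: (tab: T) => Promise<R>): Promise<R> {
    const tab = await this.acquire();
    try {
      return await task(tab);
    } finally {
      this.release(tab);
    }
  }
}
```

Note that a reused tab keeps whatever cookies and storage the previous task left behind, so this only fits jobs that don't need a clean session each time.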
Depending on your use case you can also disable security and open as many iframes as you want in a single tab. Not sure how this compares to multiple tabs though.
Of course you'll run into cold starts again when Lambda has to scale.
Yes, this is a good solution but only if you don't have any sort of session data that you want flushed out after it runs. One could argue you could use browser contexts (incognito tabs) to have ephemeral sessions, but unfortunately that feature doesn't work in --single-process mode (which AWS Lambda requires).
I'm using incognito mode to parse some pages that, for some reason, I can't parse using the normal context.
I have been considering moving my pool of Chromium workers to Lambda functions so we can avoid API slowdowns caused by a high number of parsing jobs running at the same time.
Are there any other side effects of running headless Chromium in a Lambda function?
- I'm just getting started with Lambda so pardon if this is ignorant, but what's the cold start time of Chromium? Or can you warm start it somehow?
- Since scraping often depends on state, wouldn't you hit a timeout doing longer scraping jobs?