A couple of thoughts occurred to me as I read the post:
- Lambda functions deployed using Docker images can be up to 10GB.[1] Would that change your math here? I'm curious how that would trade off, on both cost and performance, against parallelizing more function executions that each search a smaller dataset.
- Great notes on the anti-competitive nature of the current market. If there were an open standard on crawling, maybe we'd see more innovation here.
- Cool use of a bloom filter!

1. https://aws.amazon.com/blogs/compute/working-with-lambda-lay...

Maybe. Based on my experiments with Lambda, I doubt the functions get enough CPU to deal with the additional space; at the current size it's already bumping against Lambda's limits. It might be possible to switch to something like C++/Rust/C/Zig to help with this, though.
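The fan-out idea in the first bullet can be sketched locally. This is a hypothetical stand-in, assuming plain Python threads in place of real Lambda invocations and a substring match in place of the post's actual search; names and shard counts are illustrative:

```python
# Local sketch of the fan-out tradeoff from the thread: split the dataset into
# shards and search them in parallel, each worker standing in for one Lambda
# invocation over a smaller slice. (Illustrative only, not the post's setup.)
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, needle):
    # One "invocation": scan only this shard for matching documents.
    return [doc for doc in shard if needle in doc]

def fan_out_search(dataset, needle, shards=4):
    # Split into roughly equal shards, then search them concurrently.
    size = max(1, len(dataset) // shards)
    chunks = [dataset[i:i + size] for i in range(0, len(dataset), size)]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        parts = pool.map(lambda c: search_shard(c, needle), chunks)
        return [hit for part in parts for hit in part]
```

With real Lambda the same shape applies, except each shard search becomes an `Invoke` call and the per-shard cost/latency of cold starts and data loading is what decides whether many small shards beat one big 10GB image.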
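For readers unfamiliar with the bloom filter mentioned above, here is a minimal sketch of the data structure itself (a generic illustration, not the post's implementation; the k hash functions are simulated by salting a single hash):

```python
# Minimal bloom filter: an m-bit array plus k hash positions per item.
# Membership tests can return false positives but never false negatives.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 16, k=4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Derive k positions by salting one strong hash; a cheap stand-in
        # for k independent hash functions.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # All k bits set => "probably present"; any bit clear => definitely absent.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

The appeal in a search context is that a few kilobytes of bits can cheaply rule out shards that definitely don't contain a term, so only the remaining shards need a full scan.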