The interesting thing is "serverless" batch processing. Amazon has essentially reinvented the 1970s mainframe batch-processing business model, complete with charging for CPU usage, and made it readily available to any developer anywhere.
Almost: does Lambda track resource usage down to CPU cycles, heap usage, and network bandwidth? I've contemplated adding scripting-language support to my BaaS project with this level of accounting, but from what I've determined, it would require creating a new language, a new runtime for an existing language, or using Lua. Lua's runtime is simple and easier to modify from what I can tell. Unfortunately, no one's writing web apps in Lua, aside from the OpenResty guys.
The goal being the utility model of computing: pay for exactly the cycles you use, no more, no less. Off-premises, on demand. Like tap water.
They do, yes. Billing is rounded up to the nearest 100ms and scaled by the chosen RAM capacity: https://aws.amazon.com/lambda/pricing/ Note that this doesn't measure actual RAM use, although they do display it after each Lambda run.
Data traffic is already measured and priced accordingly in AWS so that's not something new for Lambda.
That's not exactly what marktt was asking. For example, calling sleep (explicitly, or implicitly while waiting on network input) would not incur much CPU cost under marktt's accounting method, but it does incur a wall-clock charge under Lambda.
(It's totally understandable and fair that it does; it's just different than what he asked.)
One could think of Lambda as charging for (A * network bandwidth + B * RAM usage * wall-clock time + C * CPU time + D * disk I/O), where C and D are both zero.
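For concreteness, a rough sketch of that billing formula in Python, using the published per-request and per-GB-second rates at the time (assumptions; check the pricing page linked above for current numbers):

    import math

    PRICE_PER_REQUEST = 0.20 / 1_000_000  # $0.20 per 1M requests (assumed rate)
    PRICE_PER_GB_SECOND = 0.00001667      # $ per GB-second of compute (assumed rate)

    def lambda_cost(invocations, avg_duration_ms, memory_mb):
        """Estimate a monthly Lambda bill: requests + RAM * wall-clock time."""
        billed_ms = math.ceil(avg_duration_ms / 100) * 100  # rounded UP to 100ms
        gb_seconds = invocations * (billed_ms / 1000) * (memory_mb / 1024)
        return invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

    # e.g. 30k invocations averaging 250ms on a 128MB function:
    print(f"${lambda_cost(30_000, 250, 128):.4f}")  # -> $0.0248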
From a capacity-planning perspective, your goal should be 100% CPU utilization per box. Allocating excessive, unutilized wall-clock time is wasted capacity. The same applies to RAM/heap per call. These are the same considerations that played out in client/server vs. mainframes for so many years.
Mainframe budgeting of cycles, memory, and I/O was highly effective at efficiently utilizing resources. It's a model of computation that has mostly disappeared but is still relevant. When Google App Engine first came out, I had hoped it would use this model, but it went the containerization route instead.
Lambda does not bill on these, nor does it provide these metrics out of the box. This is something that third parties can provide, however; it's something we collect with IOpipe [1] (disclaimer: I'm CTO & founder).
That's exactly what we offer [1]: serverless, in-process, hosted SQLite served by Apache and mod_lua. We have application-level caching, so for high-read, low/medium-write applications, scale is no problem; you're mostly served from Redis in that case. Applications that do a lot of inserts/updates aren't ideal in our case. SQLite's WAL can handle a lot, but our service hasn't been stressed that way yet.
Why SQLite? It's really an awesome database, and by using it in-process from Apache we can achieve truly massive multi-tenancy at really low cost.
We also have static file hosting; we're going for hosting of single-page apps, but the database API is available over CORS from any domain.
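The service itself runs on mod_lua, but here's a minimal Python sketch of the read pattern described above (Redis in front of an in-process SQLite opened in WAL mode); all names are illustrative, not our actual code:

    import json
    import sqlite3

    import redis  # third-party client: pip install redis

    cache = redis.Redis()
    db = sqlite3.connect("tenant.db")
    db.execute("PRAGMA journal_mode=WAL")  # readers don't block the single writer

    def get_rows(tenant, table):
        """Read-through cache: serve hot reads from Redis, fall back to SQLite."""
        key = f"{tenant}:{table}"
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)
        # Illustrative query only; a real service must whitelist table names.
        rows = db.execute(f"SELECT * FROM {table}").fetchall()
        cache.set(key, json.dumps(rows), ex=60)  # short TTL; writes should invalidate
        return rows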
Do you allow any other server-side scripting? For instance, I'm building something that may fit this pretty well, but I have some PHP that geocodes addresses submitted without lat/long coordinates before the row is written to the db. And my next step is some backend scripting to pull automated feeds into the database on some interval (likely daily) for the web interface to consume.
I'd like to eventually, but not at this time (see my Lua comment above). You can of course process the data offline and use our database API to update the table on a schedule. I realize that's not optimal for what you're asking, though.
Or, you know, code an AWS Lambda to call our service, get your data, process it, then post it back. :)
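Something like this minimal handler sketch, say (the endpoint URL and the "processing" step are placeholders):

    import urllib.request

    # Placeholder endpoint; substitute the actual database API URL for your table.
    API_URL = "https://example.invalid/api/tables/mytable"

    def handler(event, context):
        """Fetch the data, transform it, and POST the result back."""
        with urllib.request.urlopen(API_URL) as resp:
            raw = resp.read()
        processed = raw.upper()  # stand-in for real processing
        req = urllib.request.Request(API_URL, data=processed, method="POST")
        with urllib.request.urlopen(req) as resp:
            return {"status": resp.status}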
Apologies for spamming this thread; I thought of this after the edit window expired.
Thanks for checking; true, there's no info on pricing. We just launched a few weeks ago and haven't seen much interest yet, so we're just offering the rate-limited free tier for now.
Out of the gate, our users get a subdomain on our site. But to serve web apps from a custom domain (e.g. www.mysite.com), or to access the database API over CORS from a different domain, our tentative plan is to charge a small monthly fee (less than $10 a month) and remove the rate limit.
Thanks for mentioning those things; they help guide the roadmap. Right now we're focusing on performance, then a SQL engine, then auth tooling. Could you expand more on spam? Like DDoS and rate limiting?
You must be an engineer yourself ;) using the 4X rule. Haha.
We've built a tool called panic (https://github.com/gundb/panic-server) to test stuff like this; however, you're correct, we haven't finished integrating all the pieces.
Yeah, shoot me a message (mark AT gunDB DOT io) now, and I'll send you a ping when we're testing that stuff. :) Thanks for the feedback!
I've been using S3 for serving so much of my web work that I can't believe it has only been 3 years since I learned how to do static page serving on S3.
Do you use any particular framework to generate the static pages, like Jekyll or Middleman?
Slightly OT: if you are less of a Ruby guy, what are the go-to frameworks in other languages? Especially when you are not looking for a blog CMS (e.g. Pelican or Hugo).
I use Middleman almost exclusively. It gives me almost the simplicity of Jekyll but with all the flexibility of writing and including Ruby whenever I feel like it (which can be bad...). I've written things that are almost akin to a small read-only Rails app except the data is stored as YAML files.
If I had to move from Ruby, I think I'd give Lektor (made by the creator of the Python Flask web framework) a deep look. Besides being a static-site framework, its creator has endeavored to build a client-friendly GUI, making it a potential way to build static sites that can be maintained by laypersons. It has a very opinionated structure, which caused me to rethink how I arrange files in my other static projects: https://www.getlektor.com/
21 cents isn't the true cost, but cool nonetheless. He's on the free tier available to new signups for up to a year. For that matter, he could have run it on a t2.micro and it would still have cost pennies. Still, a neat setup. (Edit: Lambda's free tier isn't time-limited, as noted in other comments below.)
Lambda requires a lot of boilerplate and configuration to get things set up. Even when you use Serverless/Apex/Gordon, there's quite a bit of configuration to do. I found ClaudiaJS (https://claudiajs.com/) much quicker to get up and running, and it sets up the API gateways for you based on your API signatures.
A 35MB JAR deployment has gotta be painful (zip, upload, deploy). It helps to have a local dev and testing setup from the get-go. Once you get past all that, working on your codebase can be fun again.
>The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month. The memory size you choose for your Lambda functions determines how long they can run in the free tier. The Lambda free tier does not automatically expire at the end of your 12 month AWS Free Tier term, but is available to both existing and new AWS customers indefinitely.
The free tier is actually big enough to run a single 128MB calculation for 37 days each month.
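The arithmetic, for anyone checking: 400,000 GB-seconds divided by the 0.125 GB allocated:

    free_gb_seconds = 400_000
    memory_gb = 128 / 1024                  # 0.125 GB

    seconds = free_gb_seconds / memory_gb   # 3,200,000 seconds
    print(seconds / 86_400)                 # ~37.04 days of continuous compute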
Yes, but it doesn't scale up easily and it doesn't scale down at all. If you are doing exactly that amount of traffic on a predictable basis, the fixed server makes sense.
Regarding JAR file size: you could use the minimizeJar flag of maven-shade-plugin and add only the needed AWS SDK artifacts as dependencies. You could also use https://github.com/lambadaframework/lambadaframework, the Java AWS Lambda framework I created.
This is pretty neat. Stuff like this is slowly creeping in and is going to cause dramatic decreases in costs, not just for startups but for enterprises alike.
Similarly, we ran a stress test where we saved 100M+ records a day (~100GB) for about $10 a day (all costs: machines, disk, backup).
Not their core business? Their version wasn't profitable enough?
Hadn't really read about Parse before. It seems it was primarily aimed at mobile developers. Did it have significant use cases beyond that? In comparison, AWS Lambda, Google Cloud Functions, and Azure Functions seem more generically applicable.
This is a great writeup and really demonstrates how "the cloud is the computer".
As many comments point out, there are other ways to do this.
But doing it entirely with the AWS primitives of Lambda, S3, and automatic triggers is something very novel and almost certainly the best way to use "the cloud".
The title is actually understating things a bit. This is $0.21 not just for ~30k page views of a static file, but also for a fairly involved data-processing pipeline that feeds into the generation of that static file.
Not sure why this even needs a backend, though; as the post above you mentions, this could all be done in the frontend. It doesn't seem like very complicated math.
Then again, $0.21 is not very much, but it could be free...
It should be possible to scrape the data with AWS Lambda, push the results to GitHub, and let the browser deal with the data. All for free.
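The push step could be a sketch like this against GitHub's contents API (repo, path, and token are placeholders; the content must be base64-encoded, and updating an existing file requires its current blob SHA):

    import base64
    import json
    import urllib.request

    URL = "https://api.github.com/repos/OWNER/REPO/contents/data.json"  # placeholder
    TOKEN = "..."  # personal access token (placeholder)

    def push(content, sha=None):
        """Create or update a file via the GitHub contents API (PUT)."""
        body = {"message": "update scraped data",
                "content": base64.b64encode(content).decode()}
        if sha:  # required when updating an existing file
            body["sha"] = sha
        req = urllib.request.Request(
            URL,
            data=json.dumps(body).encode(),
            method="PUT",
            headers={"Authorization": f"token {TOKEN}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)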
Is such use of GitHub in compliance with their terms of service?
I've been thinking about using such public Git providers to store small amounts of data, possibly encrypted, for example important documents that I don't want to lose. It seems it'd be OK as long as you manually create the account.
Not really, although you'd probably need to use quite a bit of traffic before it becomes a problem. Your example of personal-use-only seems fine.
There was a post here a while ago (that I can't find anymore) about the devs of a package manager of sorts who were kindly asked by GitHub to do something about their excessive data usage. So it's probably not a good idea to build a company on it.
"Data is typically made available at different days/times throughout the week by different external sources, so each Collector is triggered by a CloudWatch cron job." Also it's in Java, but the author mentions at the bottom that python would be better if you need low latency.
Run the Python cron job every five minutes; it checks the condition of all data sources. You don't need a server to do this; you can run it from your house!
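A minimal sketch of that poller (the source list and freshness check are placeholders), with the crontab line as a comment:

    # crontab: */5 * * * * /usr/bin/python3 /home/me/check_sources.py
    import urllib.request

    SOURCES = ["https://example.invalid/feed1",
               "https://example.invalid/feed2"]  # placeholder data sources

    def last_modified(url):
        """Cheap freshness probe: HEAD the source and read Last-Modified."""
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            return resp.headers.get("Last-Modified")

    for url in SOURCES:
        # A real job would diff this against a stored timestamp and
        # regenerate/upload the HTML only when a source has changed.
        print(url, last_modified(url))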
Maybe, but I guess I don't see why you're so worried about it. It works and it costs less to run than it would cost in electricity if he were running it from his house.
He doesn't have a "fully fledged server". He's sharing some servers with tens of thousands of other people. It's literally less hardware investment than an RPi, let alone a laptop, and it's priced accordingly. And it runs even when his home computer isn't powered on or online.
Then you'd have to pay for a server to run the cron jobs, which would be more expensive. You could run it from home, but then you'd need to manage uptime yourself.
A simple cron job doesn't require managing uptime; it just needs to run frequently enough and then upload the generated HTML to GitHub. I'm sure he uses his laptop at least once a day. This is crazy!!!
I don't know about OP, but I know that I often don't use my (personal) laptop once a day. I think I went a month earlier this year where I only touched it twice, and that was to move it out of the way so I could write a letter at that desk. Setting up something like this that's so inexpensive is a reasonable selling point for hobby projects that fit the model AWS has been designed for.