I really don't understand what's happening here. I think we use the terms "API" and "server" differently. When I think of an API, I think of a library or something that I can link to: the POSIX API, for example, or the pthreads API. These are available in one way or another from any language.
This site seems to think about APIs very differently, as if an API were somehow tied to a specific language. When they use the term API, do they maybe mean library?
I think that under the hood, the idea is that if you want to do something in another language (rather than just writing it in your own language), you somehow generate a stub in both languages and then proxy through a (network?) connection?
To make this more complex, it seems that the proxy connections are not just on the local machine, but go over the internet to another machine. Presumably you have to pay for this?
Am I right? This seems like a lot of effort to go through just to avoid figuring out how to do something directly, with really sucky performance implications. I guess it depends on what you're doing. Most languages support a foreign function interface to C, so maybe that is a more sensible way to do this than setting up "servers" and tunnelling commands and data back and forth.
EDIT: I just had another look at the website. So, when they use the term API, it would seem that they mean some sort of HTTP-based web interface? The idea is that you have some kind of script that does something for you, and you can query it from your local script. I still don't understand why you'd want to do this over running it locally?
In context, you're right of course. But you might be underestimating the number of non-web programmers who have this reaction. I know many such people, and many have asked me: "The term API predates the web; in what sense is this an API?"
For such people, I propose the metaphor that HTTP APIs are a form of run-time linking where the root URL (api.example.com) is the "library" and the valid URIs (api.example.com/list_of_cats) are the exported "symbols" you can reference from your program. Performing a GET on such a "symbol" is like dereferencing the pointer you get from calling dlsym. The 400 and 500 range status codes are like dlsym's error codes: 404 is a failure to find the symbol, 500+ covers various run-time errors, and so on.
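To make the metaphor concrete, here's a rough Python sketch; the service at api.example.com and its list_of_cats endpoint are made up for illustration:

    import requests

    # The root URL is the "library", paths are "symbols", and a GET is
    # the dereference. api.example.com is a placeholder, not a real service.
    class RemoteLibrary:
        def __init__(self, root):
            self.root = root.rstrip("/")   # "dlopen": bind to the library

        def sym(self, name):
            return self.root + "/" + name  # "dlsym": resolve a symbol to a URL

        def deref(self, name):
            resp = requests.get(self.sym(name))  # dereference the "pointer"
            if resp.status_code == 404:
                raise LookupError(name)    # 404: failure to find the symbol
            resp.raise_for_status()        # 5xx: run-time error in the "library"
            return resp.json()

    lib = RemoteLibrary("https://api.example.com")
    cats = lib.deref("list_of_cats")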
Thank you! Perfect response. I'm a systems programmer, mostly working in VHDL, C and assembly. I use and write APIs all the time. I've never used an "API" over HTTP before.
I think the confusion comes from a slow divergence of terminology. Web providers offered "APIs" to allow people to access their services; these were APIs of a kind, in the sense that an application could use them. However, it seems that the web development community has begun to think that an API and an HTTP-based API are synonymous (they are not!). Hence my confusion here.
I think the idea is that if you have a useful script, you can set it up here just once and it's automatically wrapped with an HTTP-based interface so you can use it like an API. Now you can use that script from any Internet-connected device, with basically zero setup on each device. I'm not sure if I can think of something I'd want to use it for, but this is an interesting idea and a great proof-of-concept implementation.
I think perhaps the most interesting part of this is that they're apparently spinning up/down Python (and other languages) sandbox environments on the fly, nearly instantaneously.
But why not just use the script directly on that device? Since the interfaces to the "API" are things like Python, Ruby, jQuery, etc., you still need your internet-connected device to be running an actual scripting language. At which point, you might as well just script it locally rather than proxying through a (potentially insecure) connection to a remote server on the internet to run it for you. I just don't understand the benefit?
The APIs only have a single interface, and that is HTTP. All the examples are just different ways of making an HTTP request, whether through the curl command or a full-blown language. Hitting this API has only one requirement: something capable of making HTTP requests.
Running the script directly requires a correctly-configured environment for each scripting language you want to support on each device. If a script has additional dependencies or libraries, those must also be installed locally if you want to run the script locally. By putting the script behind an HTTP API, it just needs to be correctly configured once on a single device (the API server), and then the requirement for using the script on all other devices is just plain old HTTP.
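Concretely, the entire per-device requirement shrinks to something like the snippet below; the endpoint URL and payload are hypothetical placeholders:

    import requests

    # The only client-side dependency is an HTTP library; the URL and
    # payload here are made up for illustration.
    result = requests.post(
        "https://api.example.com/run/my_script",
        json={"input": "some data"},
    )
    print(result.json())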
Suppose you have five different scripts, each written in a different language (because languages have their own strengths and weaknesses). And suppose you need to run all of these scripts on three completely different devices (say, a phone, a laptop, and an Arduino). If you want to run the scripts directly on each device, then you need to set up 5*3=15 runtime environments and make sure they all function correctly (and don't break with updates). To save effort, it might make sense to put these scripts behind an HTTP API, and then just use simple HTTP requests from each device to access all scripts in a uniform manner. Note that in reality, you'll probably be targeting far more than three devices, so the savings can stack up quickly.
You're totally right. I'm not sure API is the best word for it, but haven't found a better one. Would love your thoughts!
I've heard a bunch of crazy use cases from HN over the past day, but my original one was to simply take some of the python and R scripts I had and, without much work (a few clicks actually), be able to execute them from my frontend js or from rails workers. It worked really well, so we decided to see if others might find it useful too.
It's badass to see what others are coming up with in the API Library; I'm hoping that can be a valuable page for everyone. And I imagine when we figure out fair pricing, we can actually make this significantly cheaper than a dedicated server for these processes.
Please, continue to send feedback like this. You're awesome.
I'm not sure I understand what this is. Free hosted PaaS for single files, essentially? I know nothing about where the code is running. Is it performant? Is it reliable? How are they covering server costs, at what point would I be asked to pay, and how much?
Great questions. Let me answer them one at a time!
Where is the code running? Every script run happens in its own sandbox on our machines. We use docker, it's awesome. People are asking for dedicated machines specifically for their company, and others are asking to hook into their own machines. Once we figure out how people/companies want to use the site, we'll put up some options for where code will be run. I'm totally open to feedback and would love to hear what you think.
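For the curious, the general shape of one-container-per-run looks something like this simplified sketch (not our actual code; the image name and limits are just examples):

    import subprocess

    # Simplified sketch of one-container-per-run; not the real implementation.
    def run_sandboxed(script_source, timeout=30):
        # --rm discards the container after the run; --net=none and a
        # memory cap limit what an untrusted script can do.
        return subprocess.check_output(
            ["docker", "run", "--rm", "--net=none", "--memory=128m",
             "python:3", "python", "-c", script_source],
            timeout=timeout,
        )

    print(run_sandboxed("print('hello from a fresh sandbox')"))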
I'm measuring costs right now. But I'm seeing that when we get a fair pricing model up, if you could hit these APIs only when you need them rather than have a dedicated server for them, you'd save money. A really different way of looking at PaaS + APIs... and I'm super excited about it. Right now, I'm just trying to collect as much awesome feedback as possible for use cases, and then we'll have a better idea of how users would want to be charged. At the moment, people are asking for a simple request-based model. Would love your thoughts on this as well.
You may be interested in my Python-on-ZeroVM-on-Docker Dockerfile[1]. This adds the security and isolation of ZeroVM on top of the convenience of Docker.
Note that ZeroVM isn't an x86 VM, so you need a custom Python (which that Dockerfile downloads). There are also no network sockets, so some things are difficult to make work, but you can work around that by running network code in the Docker container and riskier code in ZeroVM.
I'd be pretty confident in that security model.
However, it's six months old now, and likely to need some updating. ZeroVM was changing pretty quickly when I was working on it.
Do you run the scripts as a specific user inside the container then? I was under the impression that running untrusted code in Docker as UID 0 was not yet safe.
Super curious, what would you use the virtualenv for? Each script run happens in its own container. Is it to import your own libraries that we don't support yet?
I'd be more interested in seeing an infrastructure for scraping where the API functions are fixed, but the actual scraping functions are dynamically loaded so that the API user doesn't have to maintain it or re-pull/re-fork/re-compile when the website design changes.
Bonus points if one can make an ORM out of it, e.g.
    for article in get_api('reddit.com').todayilearned.filter('new').limit(100):
        ... do something ...
Where a call to get_api() dynamically fetches the latest scraping functions, in case reddit's page design has changed.
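A rough sketch of what get_api() could look like; the registry URL and module convention here are entirely made up:

    import requests

    # Hypothetical registry that serves the *current* scraper code for a
    # site, so clients never ship stale selectors.
    SCRAPER_REGISTRY = "https://scrapers.example.com"

    def get_api(site):
        # Fetch and load the latest scraper module for this site.
        # (Blindly exec'ing remote code means you must trust the registry.)
        source = requests.get(SCRAPER_REGISTRY + "/" + site).text
        namespace = {}
        exec(compile(source, site, "exec"), namespace)
        return namespace["api"]  # each module exposes the same fixed interface

    # The fixed interface stays stable even when the site's markup changes:
    for article in get_api("reddit.com").todayilearned.filter("new").limit(100):
        print(article.title)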
Triple bonus points if the system can be designed in a decentralized fashion to defend against ToSes that try to discriminate between human eyes and machine eyes.
I created http://scrape.ly with some of those points addressed. It's been a bit of a journey getting there, and I'd love to know in more detail what you have in mind.
The API generation with Blockspring and Kimono Labs is very nicely done, but I'd like to focus solely on the web scraping, as it represents a very difficult set of challenges.
What are your options if a site/service doesn't provide an API? I am looking into creating a third-party Android app for https://askbot.com/, which is an open-source (GPLv3) version of a StackExchange-like website. Askbot has a limited read-only API. It is written in Python and I am not sure how much work it would be to write APIs for it.
We should do an Ask HN about this :). Would love your feedback on what makes most sense.
People have messaged me asking for pricing tiers based on rate limits / usage, or purchasing company machines to run all APIs from, or even paying for extra privacy (like on github).
To be completely honest, I'm not sure. I would say charge by the total amount of time the VMs have run, but I have no idea how the costing will work out.
That's a great point. And since people aren't running dedicated machines, only paying for the requests that happen, it could even be cheaper than spinning up your own server. Interesting!
It would be sort of cool to implement cryptanalysis tools as APIs using this. Like "submit your data and we'll tell you what it might be". I built some tools in python the first time I was working on the Matasano crypto challenges but Go is a better choice. Would be nice if it was supported.
Truth is, most people's APIs really cost us nothing to run, so we'd rather not charge for them :P
I'm a really big believer in sites that provide a ton of utility for free, and only charge you when you're actually costing the company something.
That said, if you do want to use a ton of CPU, that gets expensive for us, so we may make paid accounts for people that use CPU-heavy APIs. We realized Blockspring would still be a lot cheaper than keeping your own EC2 server up all the time, so that's sorta cool!
Also, if you wanted it set up locally at your company with 24/7 phone support (from the two of us), we'd have a paid account to cover our time.
Check this out: https://github.com/hmarr/codecube
1. It's open source
2. Put this on Heroku and use it privately.
3. Highly stable (it's written in Go, so you can run millions of requests and it still works fine)
???
NONPROFIT
But the project is no longer in development, so I guess maybe someone wants to reanimate codecube? On weekends, for example. Contact me: iamjacke AT gmail.com
I personally like the Google reverse image search hack :).
I used to just run that python script manually on my computer. Now I can just call it from my front-end js when users upload images and get back some text about the image.
But you can do just about anything. Users have run R statistics from js, or python sentiment analyses (lots of good libraries there) whenever they get user comments in their rails app.
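A sentiment script of the kind people wrap can be tiny. Here's a sketch using TextBlob as one example library; the function name and shape are just placeholders, not a required convention:

    from textblob import TextBlob  # pip install textblob

    # Minimal sentiment script; polarity ranges from -1.0 (negative)
    # to 1.0 (positive).
    def analyze(comment):
        return {"polarity": TextBlob(comment).sentiment.polarity}

    print(analyze("I love this product!"))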
I imagine they have an undocumented api for reverse image search. The team at Google has released this extension[1] for adding reverse image search to the context menu; it is currently the only extension that enables reverse image search for images only accessible while logged in (think of reverse image searching a preview of an image someone emailed to you).
I was thinking about intercepting the requests it makes and making an open source extension that does the same thing, but I haven't had the time yet.