This is one of those things that I didn't realize had a name until now.
I've been wanting to get banks to implement this kind of thing forever. For instance, it'd be nice if I could log into my bank and set a url that they post to every time a transaction happens. Then I could set up a service at that url to take their message and do something with it (email myself, send an sms, automatically enter the transaction in buxfer, etc).
Same for statements. Instead of sending me an email, just let me specify a url. When my statement is ready, it's posted (in json or xml format) to a url I specify. Then I can slice and dice my own financial data and update my long term cashflow graph, or something.
Since the url is specified by me, while logged into my account (which should be secure), I have total control over who has my data (assuming DNS is secure and I provide an https url). If I wanted to use a service like Mint or Buxfer, I could add their callback urls to my account. When I decided to stop using their service, I simply remove the urls from the account. There is no need for Mint/Buxfer to know my login credentials or even what bank I use.
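For what it's worth, the receiving end of this could be tiny. Here's a rough sketch using only the Python standard library. The JSON shape (an object per transaction) is purely my assumption, since no bank publishes such a format, and `received` is a hypothetical stand-in for the "email myself, send an sms" step.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # hypothetical stand-in for "email myself, send an sms, ..."


def handle_transaction(body):
    """Return (status, parsed) for a raw POST body.

    Assumes the bank posts one JSON object per transaction (an assumption,
    not a real bank format).
    """
    try:
        txn = json.loads(body)
    except ValueError:
        return 400, None  # not valid JSON
    if not isinstance(txn, dict):
        return 400, None  # valid JSON, but not the object we expect
    received.append(txn)
    return 204, txn  # accepted, nothing to send back


class TransactionHook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, _ = handle_transaction(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=8000):
    # Plain HTTP on localhost only; a real callback would sit behind https.
    return HTTPServer(("127.0.0.1", port), TransactionHook)
```

Calling `make_server().serve_forever()` would start listening; everything interesting happens in `handle_transaction`, which is where you'd plug in whatever you want done with the data.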
At least for Chase, they will happily send you an email every time there is a transaction on your account. If you want a URL to post to, pull up the history of the account and just refresh that.
I'm wondering how long it will take until we see the first DOS attacks, triggered by 2 or more webhooks in a feedback loop (e.g. A triggers B triggers C triggers A triggers B ...).
Good thinking! I guess we'll have to devise some way to avoid these kinds of cycles, or limit them by setting some type of maximum number of bounces on a webhook message.
How would you do that? The point here is, the webhook call can't really know what it's POST'ing to; all it takes to set up a loop is another web app that will itself formulate an arbitrary POST.
Just keep track of a 'url stack' in the POST message (assuming the post body is structured with something like JSON or XML). If a service detects a cycle it can drop the message or give back a 50x error or something.
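Roughly what I have in mind, as a sketch. It assumes a JSON body, and the `via` field name and the cap of 5 bounces are things I made up for illustration. It also only works if every service in the chain plays along and passes the field through.

```python
import json

MAX_HOPS = 5  # assumed cap on bounces; pick whatever suits your service


def forward_body(raw_body, my_url):
    """Return the JSON body to relay, or None if the message should be dropped.

    "via" is a hypothetical field listing the URLs the message has already
    passed through.
    """
    message = json.loads(raw_body)
    via = message.get("via", [])
    if my_url in via:
        return None  # we have handled this message before: a cycle
    if len(via) >= MAX_HOPS:
        return None  # too many bounces
    message["via"] = via + [my_url]  # leave our own breadcrumb
    return json.dumps(message)
```

A service that gets `None` back would drop the message or answer with an error instead of relaying it.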
The problem here is that you can aim a "web hook" at any other HTTP endpoint, which may or may not honor any "web hook loop avoidance" protocol you come up with, and if that endpoint re-triggers the activity that generated the hook update, you have a loop.
HTTP is stateless, so the "url stack" idea is going to be tricky to implement.
Yeah I suppose keeping a trail of breadcrumbs only helps you prevent accidental loops. A malicious service in the chain could just clear the stack and create a loop, and no one would be any the wiser.
A malicious machine that wants to DoS a server doesn't need web hooks in order to make a bunch of requests to the server. It can just run ab, the Apache Benchmark program.
Presumably DoSing wouldn't be the objective here. If some of your services in the loop send SMS messages or modify your checkbook register, that would be much more annoying than a simple DoS.
You are talking up the stack: users, sessions, etc. I'm talking about using controls at lower levels of the stack: basic ip-based bandwidth throttling, forcing all requests to pass through a proxy, and the like. If the business requires that users be able to point random postbots at their account, you have to plan for misbehaving counterparties; and application level controls, while important, will not be sufficient to keep the system up in the face of a broken client that is spamming the server with oversized requests at a pace higher than it can accommodate.
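To make the throttling part concrete, here's a sketch of a per-IP token bucket. In practice you'd do this in a proxy or firewall rather than in application code; the Python below is only to show the logic, and the rate and burst numbers are arbitrary assumptions.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, now=None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill according to elapsed time, never above the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per client IP


def allow_request(ip, now=None, rate=1.0, burst=5):
    """Return True if this IP may make a request right now."""
    if ip not in buckets:
        buckets[ip] = TokenBucket(rate, burst, now)
    return buckets[ip].allow(now)
```

The `now` argument exists only so the clock can be faked; normal callers would leave it out.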
Even if you do low level throttling that maintains the stability of the system, it doesn't necessarily ensure usability. Who would want to use webhooks if there was much of a chance of frequently getting flooded by spam? It doesn't matter if the trickle isn't enough to kill your server, if you keep getting buzzed on your mobile every thirty seconds, you won't want to use the service any more.
This problem has been relatively solved for email already. Every once in a while the spammers figure out something new or something goes awry and you have a few offers for male enhancement products in your inbox; do you stop using email because of it?
The problem is solved for email because MTAs and mailing list programs go through contortions, documented in a long series of RFCs and Internet-Drafts, to avoid loops.
Isn't it just a bit naive to just assume that every HTTP-based API call is going to work together?
What fields does it return? Do they post XML? JSON? These things are both important and hard to get right. Even if you settle on a technical format, you'd still need to agree on fields for every single different kind of content (otherwise it's "headline" and "body" on my blog system and "title" and "contents" on yours). That's why this hasn't happened yet.
I've dealt with some truly awful HTTP-based APIs before. The kind that use some homegrown XML that doesn't validate. It's not nearly enough to say "I've got an API, here's the URL, POST some data to it."
I am guessing the assumption is that each service that allows hook specification also identifies the info it would be posting.
ie:
Enter a webhook you would like to be POSTed to when you receive a tweet. The webhook will be appended with the following url-encoded fields: sender, time, message.
Whatever URL you specify as your webhook needs to be able to handle it. The burden is on you to make sure that you integrate successfully with what is being POSTed. It is actually a pretty nice method of ensuring that a service can inter operate with a large number of other systems without the service provider having to know about them all.
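As a sketch of what "handling it" might look like for the tweet example, here's the receiving side in Python. The three field names come from the example above; the function name and the choice to reject incomplete messages are my own assumptions.

```python
from urllib.parse import parse_qs

EXPECTED = ("sender", "time", "message")


def parse_tweet_hook(encoded):
    """Decode url-encoded fields (query string or POST body) into a dict."""
    fields = parse_qs(encoded, keep_blank_values=True)
    missing = [name for name in EXPECTED if name not in fields]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # parse_qs returns a list per key; take the first value of each.
    return {name: fields[name][0] for name in EXPECTED}
```

Anything beyond this (turning it into XML, relaying it somewhere else) is up to whoever owns the URL.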
I don't think the assumption is that you can just enter any old URL and it will magically work. I think the power behind it is that it is simple enough that:
a) any service can trivially implement their end of it
b) anyone interested in extending the service can trivially implement their end of it
I'm sure someone doing part b will decide they need to take the post and form an XML message and relay it to a web-service, but it isn't really necessary.
I'm curious, what's your point, exactly? That we shouldn't do this at all because it might not be perfect out of the gate?
If it doesn't work, then it doesn't work, and you'll just whine at the person who broke it until it's fixed, or use a different implementation that does work. Yay, loose coupling.
No, certainly the world would be better if everything had an API.
But without standards, things are never going to "just work" together. We may as well just encourage people to use semantic HTML and microformats and scrape data off the pages when we want it.
There are plenty of standards right now, why doesn't everything "just work together" yet?
Ad-hoc apis can be frustrating, but at least they don't make you suffer under the illusion that there's some magical perfection you're missing out on if only that wrinkle in how they implemented the standard got ironed out.
Loose coupling, tolerance of fucked-uppedness, etc., are why the web works at all. The web standards movement is dead, long live the web.
Things like what? My web server works with a phenomenal number of browsers and clients because they all more-or-less follow the HTTP spec. It just works.
And it's more than just frustrating, it makes the idea presented by the OP basically impossible. If you have to add your own processing layer on top of the API, then it loses the magic of having one site ping another.
The web works because browsers and such are tolerant of deviations from the standard. The web works because it has standards, not in spite of them.
You couldn't say "just connect to my server on port 80 and, uh, we'll figure out how to exchange data"
Of course there has to be some kind of minimal agreement, I didn't mean to imply otherwise. This idea, "POST to some url on some action", that's pretty minimal. You were arguing it was too little and wouldn't work, I was arguing it was just enough to build on without collapsing under the weight of expectations of miracles.
HTTP is a remarkably simple and straightforward spec compared to some others, and HTTP servers are amazingly tolerant. As a client, you can get a lot of work done knowing only GET and POST and ignoring all the different kinds of errors and redirects. I would hold up HTTP as evidence supporting what I'm saying.
Same with html. Yes, we all want our beautiful and perfect html, and we curse browsers that don't implement the specs, but honestly, renderers are so forgiving. As long as you're in the ballpark, you're going to get something useful.
SOAP, otoh. The WS-* specs. What utter misery to work with, with their outright lies about out-of-the-box complex interoperability.
I must say I really like this idea. I don't know why it took so long to kick in, though, it's quite simple. Finally sites don't work just as servers, but also as clients, information reflectors, mirrors or whatever.
I'd suggest that if someone loves this concept -- a meta-service that works between service and client might be very useful. Similar to FeedBurner.
Users would have a central location they could add their hook urls to. (Drop-down: service/method to hook/account credentials.) They could establish preferred formats, like XML or JSON and this service could translate from the primary service.
Services could talk to the meta-service via a queueing system, so scaling would be simple.
The meta-service could go free and make money by offering to do SMS messaging or email messaging for clients with a subscription.
Here's an idea: a firefox extension that lets you turn on a 'broadcast' feature that causes your browser actions to be posted to a remote server, where subscribers can 'follow' your clickstream in near real-time.
I wrote something like this back in the days before browsers reliably implemented cross-domain security models. I could see where each visitor on my site went in real time subsequent to their visit, chat with them, and redirect them to new pages. Enabled some interesting conversations.
It's interesting that this has come up because I've been thinking along these lines recently. Specifically, I'd like to build a service (probably using RoR) that acts as an aggregator for these webhooks.
Then I'd be able to assign endpoints that I could use for specific services, so suppose I decide to use github I might define http://somethingmoved.org/jgrahamc/github and give that URL to github for the post-receive URL.
On my page I can define how that gets added to my feed (perhaps part of the XML document is extracted and added as a simple line of text, and of course for common services there could be ready made templates of what to do).
If standard service endpoints were done right then you can imagine going to a FedEx tracking page and finding a 'SomethingMoved' button where you are asked to enter just the main part of your URL (e.g. http://somethingmoved.org/jgrahamc) and FedEx could automatically tack on /fedex which would then have a standard handling.
Then I also say that this github message should be passed on to other services (perhaps many, perhaps just RunCodeRun). Clearly I could do some transformation before passing things on. For example, I might pass on to RunCodeRun unchanged, but extract part and pass to Twitter.
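I'd want to build the real thing in RoR, but to pin down the routing and pass-along idea, here's a sketch in Python. The path layout follows the `/jgrahamc/github` example above; the `rules` structure and the target URLs in the usage are hypothetical, and nothing here actually sends a request.

```python
def parse_endpoint(path):
    """Split '/user/service' into (user, service); None if it doesn't fit."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def fan_out(path, payload, rules):
    """Return the (target_url, transformed_payload) pairs to send onward.

    `rules` maps (user, service) to a list of (target_url, transform) pairs,
    where transform is a function applied to the incoming payload.
    """
    key = parse_endpoint(path)
    if key is None:
        return []
    return [(url, transform(payload)) for url, transform in rules.get(key, [])]


# Usage: pass the message on unchanged to one place, extract a part for another.
rules = {
    ("jgrahamc", "github"): [
        ("http://ci.example/hook", lambda p: p),
        ("http://status.example/post", lambda p: {"status": p["message"]}),
    ],
}
outgoing = fan_out("/jgrahamc/github", {"message": "fixed bug", "ref": "master"}, rules)
```

The actual POSTing of each pair in `outgoing` is left out on purpose; the interesting decision is the per-endpoint list of transforms.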
Anyone got time to work on hacking this together with me in RoR? (I really need someone who's good with web UI hacking since I'm an application guts kind of guy).
There is no simpler way to allow open ended integration with arbitrary web services.
Providing an email address is simpler, and it's already provided to these specs by a zillion times more web services. Email also provides for better reliability (receive a message sent even when your service is unreachable) and queueing.. all outside of the code and often infrastructure you'd need to do yourself.
The main reason it may be worse is that we've got a ton of programmers, frameworks, and virtual hosts that seem to live in a limited little world called "the web." (Sorry - personal rant - this mindset is annoying for me when trying to hire programmers these days. It's even annoying when I talk to somebody about a new business involving a lot of programming, and they assume it's a website.)
Sure, spam is also an issue, but that only reinforces my point. Spam is an issue because email is in common use.. it's not for arbitrary HTTP POST because it isn't. If it ever is in common use, spammers would have no trouble using that too. They're doing plenty of arbitrary HTTP POSTs to web comment systems including cracking some CAPTCHAs in the process.
If he changed his stance to saying there'll be an arbitrary body provided at setup time, or given no body, it'd HTTP GET it, perhaps allowing some keyword substitution for a few standard things like the address that changed.. That'd be something different. You really wouldn't have to write or deploy any "code" for many integrations - for the sender or receiver. You could instead just provide the appropriate info to call an existing web service with its existing parameters.
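A sketch of what I mean, in Python. The `$address` placeholder syntax is just what `string.Template` happens to use, and the function name and return shape are made up for illustration. It only builds the request; sending it is left out.

```python
from string import Template
from urllib.parse import quote


def build_callback(url_template, body_template, values):
    """Return (method, url, body) after keyword substitution.

    With no body template the callback is a GET; otherwise it is a POST
    with the substituted body. Unknown placeholders are left untouched.
    """
    # Percent-encode values before they go into the URL.
    encoded = {key: quote(str(value), safe="") for key, value in values.items()}
    url = Template(url_template).safe_substitute(encoded)
    if body_template is None:
        return ("GET", url, None)
    return ("POST", url, Template(body_template).safe_substitute(values))
```

So a user could set up `http://example.com/notify?who=$address` with no body and have it fetched with GET, or supply a body such as `changed: $address` and have it POSTed, without anyone writing receiver code.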
On the other side, the email address callback also allows you to contact humans more easily, and if you did the same arbitrary response with keyword substitution thing.. You could make websites do form letters for you. I'm not sure whether that's a good thing or not.
Isn't this the same ideal as SOA and similar? Decentralized services hanging out on the internet? Except we have gone from SOAP -> REST -> Super Simple REST as transport mechanisms.