ReverseHttp

extension · on March 9, 2009

There are two protocols described here. One is an extension to HTTP which allows the client and server to swap roles while still using the same connection. This would allow a server to "push" events to a client asynchronously. This will never work in a browser.

The other protocol tunnels an HTTP connection over another HTTP connection in the opposite direction. Tunneling asynchronous messages over HTTP is an old technique which can be implemented in a browser.

Neither protocol enables any kind of novel functionality. They merely add another layer of HTTP cruft.

tonyg · on March 9, 2009

Well, the intent is to design a systematic way of setting up a relay for HTTP requests from a public internet through a gateway to an application that otherwise wouldn't be publicly addressable. Without a protocol like the one I've drafted, setting up HTTP servers or CGI scripts stays ad-hoc, requiring local access to the gateway server and DNS and firewall configuration.

extension · on March 10, 2009

The problem could be solved much more generally with a protocol to request socket level forwarding of arbitrary network services. This could be used transparently to create a gateway for HTTP or any other protocol. Some existing protocols come close to doing this (e.g. SSH) but I don't know of any that handle public namespace allocation.

For the case of "works in a browser today" aka Comet, it is again better to solve the more general problem of bidirectional tunneling over HTTP (e.g. http://xmpp.org/extensions/xep-0124.html), through which you could make any sort of connection, including a reversed HTTP connection or the gateway request protocol above.

Imposing an extra HTTP layer and/or building on top of HTTP (in the first case) needlessly complicates and significantly restricts both of these protocols without deriving significant value from existing standards or infrastructure.

tonyg · on March 10, 2009

Check out http://www.orbited.org/ for TCP-sockets in the browser (though I don't think it lets you act as a server yet). As you point out, nothing yet handles public namespace allocation, and this is key; for the specific case of TCP servers in the browser, port contention could become an issue fairly quickly, in which case lifting the level of abstraction to something like XMPP or HTTP (as I've done), where the addressing model is more flexible than TCP's, seems the sensible thing to do to avoid this.

XEP0124 ("BOSH") is very similar indeed to what I've defined; the differences are (1) BOSH is XML-specific and (2) it only provides a tunnel between the client (browser or not) and the server. What I've been experimenting with is content neutral, and, crucially, not only specifies the tunnel, but also specifies how the gateway server should expose the application at the end of the tunnel to the rest of the world. That's something that I have not seen before anywhere. (Except, as you mention, by SSH in a limited way.)

With regard to leveraging existing infrastructure: this is exactly why restricting ourselves to carrying HTTP over the transport is a good idea. We get to reuse all existing infrastructure such as URLs, proxies, and of course the ubiquitous HTTP client libraries. Raw TCP sockets, even if the public namespace allocation issue were addressed, do not have an URL-like notion, and caching proxies do not exist; further, raw TCP access is in many environments not permitted or not available (e.g. within corporate firewalls, or running within a browser). Using HTTP rather than TCP is a deliberate choice to structure the network by providing not just a transport (packet-based, at that!) but a notion of addressing and a content model. HTTP out-of-the-box is a much richer protocol than TCP.

In conclusion, what I've proposed is in its transport aspect no more complicated than XEP0124, and in its name-registration aspect AFAIK not comparable to anything currently existing. The restriction to HTTP gives us an addressing model already widely supported and understood, and lets us reuse existing infrastructure and avoid needless reimplementation or reinvention.

extension · on March 10, 2009

URLs could be used with a generalized protocol. The client would specify the URL scheme, port and an arbitrary name and the server would generate and return a URL, or an error if it doesn't support the requested scheme (servers could support a very limited set of schemes and ports, perhaps just one). Raw socket endpoints would use "tcp://host:port" and "udp://host:port". Servers that provide raw sockets would probably want to create a subdomain for each endpoint to avoid port contention. Since the server knows the URL scheme, it can transparently do caching/filtering/mangling for particular protocols. Making a request with the "http" scheme would be functionally equivalent to your reverse HTTP.

This is just off the top of my head and there are surely better approaches but the point is that it's quite doable and probably as simple or simpler than something at the HTTP layer.

HTTP's "richness" is also what makes it a pain in the ass. It's a megalomaniacal protocol designed for a very specific purpose and when you are forced to use it for any other purpose, you have to carry a lot of baggage, and the baggage is full of rocks.

This gateway service is nearly always going to be used to create some sort of ad-hoc messaging endpoint, rather than a proper web server with web pages, so why force tunneling over TWO layers of HTTP while precluding the use of any existing wire-level level protocols?

It's time we buried the "use HTTP for everything" meme. We already have an everything protocol called TCP and if there's going to be a layer after that, it will be a carefully designed, flexible messaging protocol like AMQP. HTTP adds negative value as a general purpose transport and it's not even that great for serving web sites.

tonyg · on March 11, 2009

The generalized protocol/URL idea is a nice one. I'm not sure it'd be simpler, but it'd certainly be useful.

It's interesting you mention AMQP; this work I'm doing actually came out of some of the work I did as part of my work on RabbitMQ.

extension · on March 12, 2009

Ah, you work at LShift. A funny coincidence indeed.

I think the URL request protocol itself would be fairly simple but any use case would be application specific so coming up with a general purpose implementation might be tricky.

I'm just finishing off the Erlang book and for my first project, I was going to either build a general purpose Comet server (improving on Orbited, Meteor, cometd, etc) or flesh out the above protocol and implement it... or a combination of the two. If you want to offer input or be involved: jedediah at silencegreys dawt kom.

samueladam · on March 9, 2009

The demo speaks for itself.

http://www.reversehttp.net/demos/demo.html

DLWormwood · on March 9, 2009

Case breaks things; entering a sub-domain name with uppercase letters causes the local server to fail. DNS names are supposed to be case-insensitive.

http://www.rfc-archive.org/getrfc.php?rfc=4343

apgwoz · on March 9, 2009

is that the fault of DNS or the actual server sitting behind it, expecting the host header?

tonyg · on March 9, 2009

It's the fault of the server reading the host header. I'll fix it now.

DLWormwood · on March 11, 2009

Cool. It normalizes to lowercase, but it's otherwise functional, even if the URL bar's case differs from the posted version.

TimothyFitz · on March 9, 2009

This is a different propopsal than the IETF draft by Donovon Preston, which is knows as "Reverse HTTP".

This is apparently "ReverseHTTP" and the specification looks about twenty times longer, needlessly. I'd love to hear any good reasons why I should use this "ReverseHTTP" instead of long-polling or Reverse HTTP... the IETF draft

tonyg · on March 9, 2009

It's very similar to Lentzcner & Preston's I-D, yes. We both seem to have independently invented the same general idea and chosen the same obvious name. The differences are the use of a ReSTful protocol, and more elaboration of the registration/name-management aspects. And of course that I haven't submitted it as an I-D :-)

My current draft is far too long, I agree; it describes not only the use of HTTP to retrieve requests (which is equivalent to Donovan Preston's idea), but also the interactions and headers needed to manage the tunnelled service. That latter is something that Lentzcner & Preston haven't addressed yet, I think.

patrickdlogan · on March 12, 2009

Not sure I buy the premise "Polling for updates is bad." Certainly reverse http and/or web hooks do not cover, for example, all the same cases as http/atom/atompub. I'd like to see people's guidance on when to consider one or the other.

tonyg · on March 17, 2009

Perhaps more accurately, polling for updates in an event-based network is suboptimal -- especially since we have all this lovely packet-switching machinery available for use! -- but it's not completely wrong. Polling an RSS feed is equivalent to (a shitty form of) queue replication, and (slightly less closely) to TCP retransmissions, where event notification is equivalent to message delivery and to TCP segment transfer. The two approaches are in a sense dual. You can construct a message-streaming system from a state-replication system, and you can construct a state-replication system from a message-streaming system. Of course this still doesn't address when one or the other should be used: for that you have to get into the different scenarios for message replication. RSS/Atom etc are great when latency doesn't matter and you are multicasting, or when recipients desire (relative) anonymity. The cacheability of the polling approach can also be valuable.

Neither pull nor push solves the problem that SUP tries to address. For that, a layer on top is required -- essentially an embedded message broker with configurable private/shared queues and bindings. One very promising approach, once the transport is sorted out (which is what ReverseHttp is trying for), is to transplant the AMQP model (objects and operations) into the new setting.

tjogin · on March 9, 2009

Neat, almost a bit magical. But what kind of common web development problems does this solve?

I'm sure there are a lot of contrived examples, but are there any good ones? Facebook and Twitter.

Ok, but are there any good and common ones?

apgwoz · on March 9, 2009

Maybe there are no "common" uses just yet because we do not have an easily implemented way to do it. When Ajax first " came out", many said the same things. "great but what do we use it for." now, can you imagine the present day web without it?

tptacek · on March 9, 2009

I remember almost exactly the opposite about AJAX (when the paper came out, not when the XmlHttpRequest object was introduced) --- people went ape about what they could use it for. AJAX style genuinely made new things possible. This (supposedly) just makes them cleaner.

DomesticMouse · on March 9, 2009

Are you questioning the relative merit of reverse http, or of the ability of the server to update the page?

I'm failing to see the advantage of reverse http over long polls, even though I am completely sold on the difference that having server push would make...

tptacek · on March 9, 2009

The relative merit.

tjogin · on March 9, 2009

You have a point. But I have to say that I didn't need any imagination to think of ways to use Ajax to enhance parts of my web app interfaces.

The only reason I hadn't use xmlhttprequests extensively before it became popular (and renamed Ajax) was because it was too much work, too error prone. The popularization of Ajax lead to solid frameworks and libraries, which fixed that.

andrewbadera · on March 9, 2009

Messaging/queuing/publishing-subscribing is required in an incredible number of complex applications, especially at the enterprise level. Consider, for instance, trading floor and backend systems.

tjogin · on March 9, 2009

True, but they rarely use web interfaces, do they?

DomesticMouse · on March 9, 2009

Think how different HN would be if the comment pages updated themselves in real time as people commented?

Or a bug tracking system. Or a newspaper. Or pretty much any site, really.

tjogin · on March 9, 2009

Good points!

axod · on March 9, 2009

In safari, the loading spinner never stops. Not a good thing.

I don't really see what this is trying to solve either.

tonyg · on March 9, 2009

Thanks for the pointer re: Safari. I'll fix that.

What this is doing is twofold: letting HTTP push be used consistently, with long-polling pushed out to the edges of the network, where it belongs; and making it less of a burden to spin up and shut down HTTP-based services.

The registration and management aspect -- enrollment, in short -- is to HTTP as DHCP is to IP, if you like. It lets you avoid the equivalent of manually assigning IP numbers.

apgwoz · on March 9, 2009

I believe Comet works around this by using iframes, though I could be wrong. Either way, ReverseHTTP is another way to not have to poll for resources, which might be useful for instance in real time chat.

axod · on March 9, 2009

It's all in fuzzy definitions, but this is using 'comet'.

Comet is generally used to refer to any method that can emulate a raw socket - iframes, xhr, inserting script tags, etc etc

All this is doing is proxying http from the server, to the browser and back. It's an interesting thing to try, but I can't see any real life use for it.

mbreese · on March 9, 2009

This is much lower level and doesn't require javascript to work. (Ignoring the fact that the demo is an in-browser javascript http server). So, it's not comet.

The problem is that this isn't supported by any clients that are in wide-spread use, so you can't just load up a webpage and see a demo.

The advantage of this type of approach only becomes clear when you are using a lower-level http client library to access resources. It gives the server a chance to poll the client for information, without using Javascript. For browser approaches, this may not matter. However, for lower-infrastructure things, this approach is great.

I've used a very similar technique to link compute nodes to a job server where the compute nodes were behind a NAT. This eliminated any long polling required and still allowed the server to query the nodes for their status.

Again, not the type of thing where you're running anything in a browser, but I wanted to use HTTP as the protocol for simplicity, and needed a way for the server to talk to a client behind a NAT.

Now the down side is that you basically have to rewrite a web server in order for this to work. I'm not sure if this could be bolted on. You also need some sort of session management built in, so you can pair incoming (client->server) requests and outgoing (server->client) requests. And then you need a client library that can spin up it's own http server and handle it's own requests.

In my case, I was able to write everything from scratch. But I doubt my code would scale very well. I'm also not sure that in this case it isn't better to just make a new protocol. There is a lot of hackery required to get this to work, and I doubt you'll see web browsers support anything like this.

the_me · on March 9, 2009

Don't get it. If you are interested in the low level communications, why wouldn't you simply use a socket and send your own application defined commands over port 80?

HTTP exists so that any browser can access any web server, it doesn't re-implement or otherwise allow the usage of TCP/IP.

As a corollary, I don't see why I need to know about your application's communication protocol, let alone adhere to it because it's now a standard.

mbreese · on March 9, 2009

We are talking about bi-directional communications between the client and server. Specifically, server initiated requests to the client. So the major issue that you can overcome with this would be overcoming some NAT/firewall issues.

This proposal would be to convert HTTP from being a client making requests to a server to (effectively) a server making / receiving requests from another server. So your browser would also be a (mini) server, handling requests from the main server.

This is largely for people that want to use HTTP as a message-passing protocol, but use it in a bi-directional manner between possibly NAT'd hosts.

tonyg · on March 10, 2009

"This is largely for people that want to use HTTP as a message-passing protocol, but use it in a bi-directional manner between possibly NAT'd hosts."

That is exactly it. You've got it.

HTTP makes an almost ideal message passing protocol: it has a rich and battle-tested addressing model; it is asymmetric in a helpful way (really! the response codes are similar to ICMP messages, where the requests are similar to IP datagrams); it is widely supported and deployed; it is content neutral.

It doesn't even have to be inefficient ;-) (http://www.lshift.net/blog/2009/02/27/streamlining-http)

int2e · on March 9, 2009

How about just using hanging gets?

DomesticMouse · on March 9, 2009

Maybe it makes some sense in server apis?

I don't know. I'm sold on using long polls in conjunction with actors...