HTTP is a huge, hefty, inefficient and complex protocol whose only advantage is that HTML/JS supports it by default. Arguments that 'websockets' solve this are ridiculous in the face of the fact that we can just use 'sockets', like we always have. WebSockets are a work-around for the constraints of the browser.
As a mobile/desktop/server engineer, I would love the opportunity to work with other server-side teams that aren't wedded to the web/HTTP via historical accident and thus don't force us to use HTTP.
HTTP implements an architectural style which ensures reliability, scalability, decoupling of systems and support for hypermedia for a complex network of disparate, unreliable systems and networks.
Do you have any suggestion that provides the same features, or should we forgo them because HTTP is "hefty"?
But do you have any concrete suggestions of protocols, or are you criticizing the choice based on a hypothetical protocol that would be very similar but incompatible with HTTP and all its existing tools (millions of tested and deployed caching servers, load balancers, etc.), and for which whole new libraries would have to be written, just so you can make it somewhat more efficient?
I think you're grossly over-estimating the difficulty of defining a protocol. It's no more difficult than defining the protocol for which you'll use HTTP as transport.
Load balancers know how to load balance straight TCP. HTTP caching servers are an HTTP-centric idea.
The 'libraries' you'll need can be much, much smaller when all you need is a bit of framing and serialization, instead of a complete, complex, RFC-compliant HTTP client stack.
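To be concrete, here's roughly what I mean by "a bit of framing and serialization": length-prefixed JSON over a plain socket. (A sketch in Python; the message shape and helper names are made up for illustration, not from any particular protocol.)

```python
import json
import struct

def encode_frame(message: dict) -> bytes:
    """Serialize a message as JSON behind a 4-byte big-endian length prefix."""
    payload = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def decode_frames(buffer: bytes):
    """Pull every complete frame out of `buffer`; return (messages, leftover)."""
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack(">I", buffer[:4])
        if len(buffer) < 4 + length:
            break  # partial frame: wait for more bytes from the socket
        messages.append(json.loads(buffer[4:4 + length]))
        buffer = buffer[4 + length:]
    return messages, buffer
```

That's the whole wire format: a receive loop appends socket reads to a buffer and calls `decode_frames` on it. No status lines, no header parsing, no chunked encoding.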
> I think you're grossly over-estimating the difficulty of defining a protocol.
It's not writing the protocol that I find the most difficult. It's reimplementing everything that uses the protocol.
> Load balancers know how to load balance straight TCP.
Which is only useful if all the nodes are exactly the same, but that prevents you from distributing the data across them based on user profiles and then load balancing by user id, as (if I'm not mistaken) Netflix does. Since they use subdomains as user identifiers, you'd get that for free with an existing, well-tested HTTP load balancer.
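Roughly, the routing an off-the-shelf HTTP load balancer gives you here looks like this (a Python sketch; the backend pool and hashing scheme are illustrative, not Netflix's actual setup):

```python
from hashlib import sha256

# Hypothetical backend pool; in practice this would come from configuration
# or service discovery.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def backend_for(host_header: str) -> str:
    """Pick a backend from the user-identifying subdomain of the Host header."""
    subdomain = host_header.split(".", 1)[0]
    # A stable hash keeps a given user pinned to the same node.
    digest = int(sha256(subdomain.encode("utf-8")).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]
```

The point is that the Host header is already there in every HTTP request, so the balancer can route statefully without the application doing anything.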
> HTTP caching servers are an HTTP-centric idea.
That's a tautology. The question is: are they a useful idea? Is being able to take advantage of existing and deployed solutions like CDNs useful? Seems to me like it would be.
> The 'libraries' you'll need can be much, much smaller when all you need is a bit of framing and serialization, instead of a complete, complex, RFC-compliant HTTP client stack.
I think you underestimate the advantages that some of the core HTTP concepts provide.
> Which is only useful if all the nodes are exactly the same, but that prevents you from distributing the data across them based on the user profiles, and then load balance according to the user id, as (if I'm not mistaken) Netflix does. Since they're using subdomains as user identifiers, you'd get that for free using an existing, well-tested HTTP load balancer.
I'm not sure what you think makes that complicated to implement without HTTP, or why you consider it 'free'. Netflix had to write custom code to support that, and could just as easily have done so on top of a message-passing architecture à la ZeroMQ or even AMQP.
> That's a tautology. The question is: are they a useful idea? Is being able to take advantage of existing and deployed solutions like CDNs useful? Seems to me like it would be.
Not really, no -- neither a tautology nor are they particularly useful for API implementation. Their primary value is in caching resources for HTTP requests in a way that meshes well with the complexity of HTTP.
If you need geographically distributed resource distribution, then HTTP may be a good idea simply because:
- There's widespread standardized support for HTTP resource distribution.
- Its inefficiencies are easily outweighed by the simple transit costs of a large file transfer.
We're largely talking about server "API", however.
> I think you underestimate the advantages that some of the core HTTP concepts provide.
No, the core concepts are more-or-less fine. It's the stack that's inefficient and grossly complex, largely due to browser constraints and historical limitations.
> I'm not sure what you think makes that complicated to implement without HTTP, or why you consider it 'free'. Netflix had to write custom code to support that, and could just as easily have done so on top of a message-passing architecture à la ZeroMQ or even AMQP.
It's free because it already exists. Load balancers for hypothetical protocols don't.
> Not really, no -- neither a tautology nor are they particularly useful for API implementation. Their primary value is in caching resources for HTTP requests in a way that meshes well with the complexity of HTTP.
> If you need geographically distributed resource distribution, then HTTP may be a good idea simply because:
> - There's widespread standardized support for HTTP resource distribution.
> - Its inefficiencies are easily outweighed by the simple transit costs of a large file transfer.
> We're largely talking about server "API", however.
Isn't the whole point of this system to transfer people's content - posts, pictures, videos, etc - between servers? I would think pure API "calls" would be a small part of the whole traffic.
> No, the core concepts are more-or-less fine. It's the stack that's inefficient and grossly complex, largely due to browser constraints and historical limitations.
But to implement them, you need more than "a bit of framing and serialization".
> But to implement them, you need more than "a bit of framing and serialization".
I posit you're still grossly overestimating complexity based on your own experience with HTTP, coupled with grossly underestimating the complexity, time costs, and efficiency costs of the stack HTTP weds you to.
A TCP stream is simple. It's as simple as it gets. Load balancing it requires a few hundred lines of code, at a maximum. It only gets complicated when you start layering on a protocol stack that is targeted at web browsers, has grown over the past 20 years, requires all sorts of hoop-jumping for efficiency (keep-alive, WebSockets, long-polling), and requires a slew of text parsing and escaping (percent-escapes, URL encoding, base64 HTTP basic auth, OAuth, ...), plus cookies, MIME parsing/encoding, and so on.
All this complexity is targeted at web browsers, introduces significant inefficiencies, and requires huge libraries to make it accessible to application/server engineers.
What's the gain? Nothing other than familiarity, as evidenced by your belief that the core of what HTTP provides is so incredibly complicated, and you couldn't possibly replace it.
No -- it's the complexity of HTTP that's complicated, not the concepts that underlie it. Drop the HTTP legacy and things get a heck of a lot simpler.
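To back up the "few hundred lines" claim: the core of a round-robin TCP balancer fits in a couple dozen lines. (A Python sketch under obvious simplifications -- no health checks, no graceful shutdown, thread-per-pipe -- not production code.)

```python
import socket
import threading
from itertools import cycle

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until the sender closes its end."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def balance(listener: socket.socket, backends, max_conns: int) -> None:
    """Accept connections and splice each onto the next backend, round-robin."""
    pool = cycle(backends)
    for _ in range(max_conns):
        client, _ = listener.accept()
        upstream = socket.create_connection(next(pool))
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()
```

Everything a real balancer adds (health checks, connection limits, metrics) is incremental on top of this loop, and none of it needs to know what bytes it's shuffling.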
It's not so much that you can't replace HTTP, as that you can't replace all the thousands of tools and packages that already work with HTTP, and that can be very useful for a project like this. And you can't easily replace the knowledge that people have of HTTP either. (Claiming that OAuth is part of HTTP doesn't help with your credibility either, I'm afraid.)
Furthermore, I think that even if the developers of this project could replace the required tools and forgo the rest, I doubt it'd make sense.
Frankly, you'd need a working prototype to convince me of the contrary, so I guess we'll have to leave it at that. I'm a stubborn man ;)
While iMatix was the original designer of AMQP and has invested hugely in that protocol, we believe it is fundamentally flawed, and unfixable. It is too complex and the barriers to participation are massive. We do not believe that it's in the best interest of our customers and users to invest further in AMQP. Specifically, iMatix will be stepping out of the AMQP workgroup and will not be supporting AMQP/1.0 when that emerges, if it ever emerges.
By the way, the AMQP spec is roughly the same size as the HTTP spec, and the latter spends a lot of pages listing just status codes.
And of course, AMQP uses a model based on Sessions, which is great if the components of the system are static, but not that great if you're talking to a lot of nodes that come and go, since you'll end up with uneven load distribution on your servers.
Regardless of HTTP as a particular implementation, I think statelessness makes perfect sense in an unreliable network of nodes.
WebSockets also have the advantage that they pass through corporate firewalls, open wifi networks, and many proxies, since they masquerade as HTTP traffic. And they have a nicer framing mechanism than a raw socket, something I love. (Nowhere near as low-level, but for me it's essentially stateful UDP that's reliable, i.e. TCP except with datagrams.)
Corporate firewalls are the usual bogeyman, but in reality I haven't seen evidence that they're much more than that.
To test this, we implemented fallback-to-HTTPS behavior in a very widely used previously non-HTTP client. We then observed the number of clients that failed to connect via our custom protocol, but succeeded in falling back to HTTPS.
The numbers were negligible.
It's ridiculous that we'd seriously believe we can't trust that TCP works on the internet. We joke about it being the "interweb", but I see no reason to sow fear, uncertainty, and doubt, and thus actually turn the interweb into reality.
I believe his point is that you can generally carry whatever protocol you want over port 443 (and often port 80).
Given how many other things are broken by networks that foolishly open only ports 80 and 443, and their (in my experience) relative rarity, I'd suggest that it's not worth bothering with, except possibly as a fallback to measure the actual number of people trying to use your service behind such a network.
I think 443 is the better example because it's harder for a middle party to profile HTTPS traffic and see that it is indeed HTTPS and not something else.