The HTTP QUERY Method specification (httpwg.org)
165 points by andyk on May 27, 2023 | 95 comments



I see the need, and good write up, but just use this for the definition of GET body.

Nothing in the existing spec prevents GET from having a body, though there isn't currently a semantic meaning to it.

This would fit perfectly, be more compatible and result in a simpler spec and protocol.


Despite the spec, a lot of clients, load balancers, and server libraries can't handle GET with a body.


They won't support a new method either.


But it being an unsupported method will hopefully at least cause the middlebox to generate a 405 error, rather than undefined behavior.


Rather than a 400 error?


No, rather than the body being silently ignored in the caching (or even proxying!) logic of some middleboxes, because "since GET will never have a body, for efficiency, we won't bother to check for one."


Yet, given enough demand and an explicit nod of recommendation from the spec, these will comply. It's not really as big a problem as everyone seems to think. And it's literally the same amount of work as adding a new verb.


Things might've changed since then, but back in 2009 it was Chromium that disabled bzip2 compression after some ISPs were borking bzip2-compressed pages[1] (although it's not entirely clear whether that was indeed the reason for dropping bzip2), not the other way around. Later, in 2016, this was mentioned as the reason for limiting brotli compression to HTTPS only[2].

[1]: https://bugs.chromium.org/p/chromium/issues/detail?id=14801 [2]: https://bugs.chromium.org/p/chromium/issues/detail?id=452335...

So while I agree that it would be nice if everyone respected a "living standard", my hopes for middleboxes to comply are not high.


Which is somewhat surprising given that it is common enough that I've come across it several times in the wild.

So much so that I added support for it to my own server and client libraries. Which means that adding support for QUERY will be trivial (yay!)

As an aside, I also support DELETE with a body.


Caching the body scares the hell out of me.

If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

Right now you can:

* Pass the arguments as parameters

* Pass them in the request body. I've personally done this in APIs for games on Unity/iOS/Android for almost a decade. Other products like Elasticsearch count on that as part of the core product.

* Semantically create searches in the server with POST /search

In the previous two examples, you can return a redirect to the search results (like /searches/33) with perfect caching/indexing, and delegate to the server the cache algorithms.
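A minimal sketch of that "create a search, then redirect" pattern, assuming an in-memory store. The names `create_search` and `get_search` and the `/searches/<id>` layout are hypothetical, not taken from any real API:

```python
import hashlib
import itertools
from typing import Optional

# Hypothetical in-memory store; a real server would persist searches somewhere.
_searches: dict[int, bytes] = {}
_ids_by_digest: dict[str, int] = {}
_next_id = itertools.count(1)


def create_search(body: bytes) -> tuple[int, dict[str, str]]:
    """Handle a hypothetical POST /search: store the query, redirect to it."""
    digest = hashlib.sha256(body).hexdigest()
    search_id = _ids_by_digest.get(digest)
    if search_id is None:
        # First time this exact query body is seen: assign it a new id.
        search_id = next(_next_id)
        _ids_by_digest[digest] = search_id
        _searches[search_id] = body
    # 303 See Other tells the client to fetch the result with a plain GET.
    return 303, {"Location": f"/searches/{search_id}"}


def get_search(search_id: int) -> Optional[bytes]:
    """Look up the stored query for a hypothetical GET /searches/<id>."""
    return _searches.get(search_id)
```

The GET on `/searches/<id>` then has a short, stable URL that ordinary caches already know how to key on, at the cost of an extra round trip.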

With things like Vary, ETags, conditional fetches, Content-Encoding, Content-Type, Cache-Control, and Expires that the spec barely grasps, adding a huge body is something that a cache server/CDN will not implement.

So again. What is this spec solving?


> If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

The way many caching systems work, by hashing the body and using the hash as the cache key.
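As a rough sketch of that idea (the function name `cache_key` and the choice of fields are assumptions, not something the draft prescribes), a fixed-length key can be derived from the method, target, content type and raw body:

```python
import hashlib


def cache_key(method: str, target: str, content_type: str, body: bytes) -> str:
    """Derive a fixed-length cache key from a request that carries a body."""
    h = hashlib.sha256()
    for part in (method.encode(), target.encode(), content_type.encode(), body):
        # Length-prefix each part so ("/ab", "c") and ("/a", "bc") cannot collide.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()
```

However large the body is, the key stays 64 hex characters, and the body can be fed to the hash incrementally if it arrives in chunks.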


Absolutely. I would not recommend using raw search terms as a cache key. Good way to a) leak cache data unintentionally if an attacker were to guess at other cache keys (given the cache keys were not namespaced well), and b) leak user search terms (and users often search for some weird stuff including passwords).


a) Unless you're caching only a single endpoint, which you almost never are, you'd need to have the URL or at least path be a part of the key too, so that solves the "stealing cache from another app/component" (also not having any namespacing is a bad idea regardless, even if using hashes)

b) Unless your cache keys are publicly listable, this is not a security issue. And from a privacy perspective, GET requests are usually cached by path+params, and since search queries are usually in params these days, again, nothing changes.

That's not to say you shouldn't use cryptographic hash functions for keys, just that nothing really changes with this new verb.


I've personally discovered a vulnerability due to a lack of namespacing, where token objects were cached using the token's raw value as the key. There was an API with a /whoami endpoint that returned the current token being used. What the API didn't expect was non-token objects to be read from cache, so if you used authn "Bearer users:1", the /whoami endpoint would respond with the user object of the user with ID 1. Redis is also commonly used for non-caching purposes, e.g. config, so this could've also leaked secrets.

Even if the token cache keys were properly namespaced, any cache key with a "token:" prefix would be readable, even if it was used for purposes other than storing a token object. All that would be needed is the key suffix. The remediation of the vulnerability I found included proper cache key namespacing, as well as hashing with an HMAC (since tokens were being stored in plaintext).

So just sharing a real-world scenario where a lack of namespacing (and other caching mistakes) produced a vulnerability.
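A sketch of what namespacing plus HMAC could look like. This is not the actual fix from that incident; the `token:` prefix follows the comment, while the function name and secret handling are assumptions:

```python
import hashlib
import hmac


def token_cache_key(secret: bytes, token: str) -> str:
    """Build a namespaced cache key that never embeds the raw token."""
    # The HMAC output is not guessable without the secret, so a caller cannot
    # pick a token value that maps onto some other object's cache key.
    digest = hmac.new(secret, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"token:{digest}"
```

With this, presenting `users:1` as a bearer token looks up `token:<hmac>` rather than the `users:1` key itself.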


That's why they're caching the body of requests with a new method.

New and clearly distinct type of requests, new practices.

It isn't just the size of the request that makes people not want to put them in the query string, it's use of the query string over decades.


> If the params for the search are so many or so big that they don't fit in a single url, how could you use that as a cache key?

I think the horse has bolted here. With HTTP/2 (and often without), URLs can be _very_ long.


> how could you use that as a cache key

Hash


We'd have to change the behavior of browsers & server implementations to do this. It's a much less risky, much more manageable change to do this with a new, more deliberate difference. It'll make it clear that the new behavior is intended.


> We'd have to change the behavior

But you wouldn't for QUERY?

This is backwards compatible and in many cases will just work since GET with body is already syntactically valid.


> But you wouldn't for QUERY?

No, any good HTTP client or server allows unknown/new HTTP methods. If they are not recognized, they are treated as POST. This is a requirement of the HTTP spec.

I've already used QUERY in a few places, and it basically universally just works. Adding a request body to GET would practically be much harder to deploy and depend on.

A few years ago PATCH was added, and back then there was a bit more friction with some HTTP implementations only allowing a fixed set of methods, but this is mostly not true anymore.


> If they are not recognized, they are treated as POST. This is a requirement of the HTTP spec.

That's not true.

"An origin server SHOULD return the status code ... 501 (Not Implemented) if the method is unrecognized or not implemented by the origin server." https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5....


When you use a framework, and add a route that uses QUERY you 'implement' it. My point is that when you do, it will all just work.
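Both points can be seen in a small sketch using Python's standard `http.server`, which dispatches on `do_<METHOD>` and answers 501 for methods without a handler. The handler class, the echo response and the `send_once` helper are made up for illustration:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class QueryHandler(BaseHTTPRequestHandler):
    """Toy handler: QUERY is implemented, every other method is not."""

    def do_QUERY(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        query = self.rfile.read(length)
        payload = b"echo: " + query
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def send_once(method: str, body: bytes = b"") -> int:
    """Start the toy server on an ephemeral local port, send one request, return the status."""
    server = HTTPServer(("127.0.0.1", 0), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request(method, "/contacts", body=body or None)
        response = conn.getresponse()
        response.read()
        conn.close()
        return response.status
    finally:
        server.shutdown()
        server.server_close()
```

Here `send_once("QUERY", b"select 1")` gets a 200 because `do_QUERY` exists, while `send_once("PROPFIND")` gets a 501 because nothing implements it. Neither is treated as POST.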


I think they mean we’d have to modify existing features. There are probably assumptions made in code and tests around how GET will be used. Instead of breaking those assumptions and potentially implementing new bugs in an old feature, you can use a completely new HTTP method with completely new code paths. Older feature remains unchanged.


QUERY is an explicit, opt-in step forward. It would need support, but nothing about existing systems would change. No web page that accidentally tried to send a GET with a body would start behaving differently.


middleboxes are free to drop the body of a GET request, and many do


Saw this mentioned by @dragonwriter in the discussion at https://news.ycombinator.com/item?id=36095032 but it seemed buried/easy to miss there. Also the article that thread is discussing is from 2021, whereas this was just published yesterday.


I seriously doubt the usefulness of this method. If you cannot fit the query into URL, because it has a complex structure, you are likely requesting a significant amount of work from server. In data-rich environments which constantly update that work will likely produce unique results every time the request is submitted, so caching etc may not make sense. To me, this is pretty much the same as making the request via POST. The very special use case where you execute complex queries against static data does not deserve such a change in protocol.


> If you cannot fit the query into URL, because it has a complex structure, you are likely requesting a significant amount of work from server.

But - are you asking the server to create/update a resource (POST and PATCH semantics), or are you asking the server to provide you a _view_ of a resource? I'm thinking of query languages like GraphQL and jq here, where a client is making a potentially complex request for a view of the data returned in a structure which is most efficient for them.

The client isn't intending to _modify_ that data in any way, but it is asking for a potentially computationally expensive view of that data.

The way we've done it so far is to either:

1. Cram that complex filtering and view-building into the GET request URI (brittle: no guarantee the URI will function correctly past ~2,000 characters, and no clearly defined maximum request URI length in the specs).

2. Send the query in a POST request, which works and is almost what we want, but semantically feels a bit icky, because a query is more cacheable than your average POST. See below:

> caching etc may not make sense.

Perhaps you have a large and complex but static dataset _which will never change_ which nevertheless has many ways of filtering it -- all of which would benefit greatly from e.g. CDN caching.

I'm in two minds here. I half agree with you that POST works fine, as the wording of the RFC spec is general enough that what we want to do with QUERY works with it fine as-is. But then I also see how _de-facto_ REST API design conventions have worked against this and treat POST as "create a new resource every time", and trying to stuff endpoints that go "filter and restructure the responses to my specification" alongside feels.... wrong.

(There is of course the argument that you shouldn't be mixing and matching 'strictly' 'RESTful' patterns with other patterns like the GraphQL one which does away with some of the REST HTTP method semantics in favour of defining the operation type in the request data itself, and of course both of these things are patterns which happen to use HTTP and don't need to influence HTTP spec itself...... But still.... I really want that QUERY lol)


POST request does not have to go to a queried resource. Instead it can go to /search/resource and create new instance of search. Whether a server decides to persist it or not, will be an implementation detail.

There’s also an option to use the following semantics (with a number of nice bonuses):

  POST /resource/views
  {some definition of response structure and allowed parameters}

  GET /resource?view=…&params=…

The whole idea of GraphQL revolves around the possibility of client to query the whole graph of the persistent model. This idea itself is questionable for many reasons - coupling, security etc. In many cases where I have seen it, classic REST with HAL (client-server) or plain old SQL (server-server) would be more than enough.


Requiring every proxy and web server to implement their own cache hashing algorithm, especially one that should ignore encoding-specific "non-consequential" parts, sounds like a monumentally bad idea.


The part where a cache SHOULD have “knowledge of the semantics of the content itself” in combination with “normalization is performed solely for the purpose of generating a cache key; it does not change the request itself” is the scary part.

It may sound cool and efficient on paper: just trim the whitespace and sort all JSON dictionaries, right? But in practice it adds too much complexity, and eventually implementations of these semantics will start to drift between cache and real backend. Case in point: SAML XML signatures.

This is how one creates a cache poisoning vulnerability. If a request is normalized as a cache key, use the normalized request when sending to the backend also. If you don’t trust that process you shouldn’t trust it as the cache key either.

Proxies should be dumb, just hash the raw string for the cache key.
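To make the difference concrete, here is a small sketch comparing a raw-bytes key with one possible JSON normalization. The normalization rules (sorted keys, no insignificant whitespace) are an assumption picked for illustration, not what the draft mandates:

```python
import hashlib
import json


def raw_key(body: bytes) -> str:
    """Cache key a 'dumb' proxy would use: a hash of the exact bytes."""
    return hashlib.sha256(body).hexdigest()


def normalized_body(body: bytes) -> bytes:
    """One possible JSON normalization: sorted keys, no insignificant whitespace."""
    parsed = json.loads(body)
    return json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def normalized_key(body: bytes) -> str:
    """Cache key derived from the normalized form instead of the raw bytes."""
    return hashlib.sha256(normalized_body(body)).hexdigest()
```

The bodies `{"name": "alice", "limit": 10}` and `{ "limit":10, "name":"alice" }` get different raw keys but the same normalized key. The drift risk shows up with input like `{"a":1,"a":2}`: Python's `json` keeps the last value, so it normalizes to `{"a":2}`, and a backend whose parser keeps the first value would be answering a different question than the one the cache keyed on.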


This. Plus it is a good idea to specify the minimal recommended hash algorithm to have some manageable expectations on collisions. "The cache key collision rate is guaranteed to be not worse than SHA-256".


The cache is local to the proxy or web server, so it doesn't matter what the hashing algorithm is as long as the cache accurately returns cached results given the same inputs. The semantic meaning of "input" differs depending on whether it's a proxy or the origin. The origin web server could very well cache based on the result of post-processing and validation of the input, while a proxy should cache based on a much stricter (exact series of bytes) interpretation of the input.

This is no different than how any other caching proxy is expected to operate given a set of inputs. It's never been up to the proxy to interpret if the queries "name=joe%20user" and "first=joe&last=user" are the same, it just passes the input along to its upstream and then locally stores the result, assuming that the same input will occur again and save a trip to upstream.


You are assuming that proxies will correctly determine which content does not matter. From what I've seen, what will most likely happen is that we will be spending countless hours just because some box is sometimes returning wrong content, because it decides that the request is "the same".

I don't mind caching, but please make it deterministic.


I am assuming that dealing with how and what proxies cache is a long-standing potential issue that nothing here changes at all. A caching proxy could currently use a truncated path on a GET request to build a cache key and it would not be caching the correct data. Section 2.1 in its entirety, along with the defined meanings of MAY, MUST and SHOULD, tells caches how to operate. A caching proxy that does not cache and return the right content because it does not take into account the necessary data when determining the cache key is broken. Semantic understanding of the request is a local concern, so any non-local cache needs to acknowledge that it doesn't have semantic understanding and use the byte array that is the body as an input to determine a cache key.


you're papering over the important details

namely, urls are finite, bodies are infinite


So? Just deal with it however one would deal with unruly POST requests, slow-walked multi-part, and other protocol abuse. No matter what, you need protection against bad actors trying to get the servers to do bad things.


i'm not sure how malicious requests are relevant to this conversation

specifically, a URL can basically always be fully read and cached in memory in a server, a request body cannot


I was wondering which versions of HTTP will this be added to

Thinking, will it be possible to send HTTP/1.1 QUERY requests? HTTP/2 QUERY requests?

Or will it be for HTTP/3 or something even higher?

Well, the examples given in the document seem to indicate that it will be possible to use it even with HTTP/1.1

    QUERY /contacts HTTP/1.1
    Host: example.org
    Content-Type: example/query
    Accept: text/csv

    select surname, givenname, email limit 10
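For anyone wanting to poke at this, a sketch that serializes the same request as HTTP/1.1 bytes. The helper name is made up, and the `Content-Length` header is an addition that the example above leaves out:

```python
def build_query_request(target: str, host: str, content_type: str,
                        accept: str, body: bytes) -> bytes:
    """Serialize a QUERY request as raw HTTP/1.1 bytes."""
    lines = [
        f"QUERY {target} HTTP/1.1",
        f"Host: {host}",
        f"Content-Type: {content_type}",
        # Not shown in the example above; needed so the receiver knows
        # where the body ends.
        f"Content-Length: {len(body)}",
        f"Accept: {accept}",
    ]
    # Header lines end with CRLF, and an empty line separates headers from body.
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body


request_bytes = build_query_request(
    "/contacts", "example.org", "example/query", "text/csv",
    b"select surname, givenname, email limit 10",
)
```

On the wire it is an ordinary HTTP/1.1 message; only the method token is new.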


Any good HTTP implementation will support any HTTP method. If a HTTP method is not recognized, it will basically be treated as if it's a POST method.


Where do you get this assertion from that methods are post by default? Most http server implementations will reply with a 405 error for an unexpected method.


Servers that just serve static files: yes, you are correct. I'm talking about application servers, proxies, etc. You need to decide to do something with a method, but all libraries, servers and clients are ready to handle them and treat unknown methods as unsafe and non-idempotent (like POST).

If you read my comment as 'QUERY will automatically become POST', that's not what I meant.


> Any good HTTP implementation will support any HTTP method.

By "support", do you mean "will not crash on encountering an unexpected verb"?

WEBDAV and CALDAV rely on a slew of new HTTP verbs; but IME only specialized servers actually support these verbs, in the sense of knowing what to do with them. There's an addon module for Apache that supports WEBDAV, but the logic to support CALDAV is insanely complicated, and nearly all implementations rely on a real database server.


I'm one of the few people who has actually implemented a full CalDAV and WebDAV stack (called SabreDAV). I started working on this in 2006, and even back then it worked with stock Apache and Nginx, and PHP's REQUEST_METHOD would have the appropriate methods.

You are right, code is needed to 'do something' with the method, but webservers, clients, application frameworks and proxies typically don't need to change for someone to start handling (for example) PROPFIND requests.

The same is true for QUERY. In most (but not all) cases you can already start using it today.


Ah, Hi, Evert! I know you from back then!

I tried to build a CALDAV stack in PHP. But I didn't want to use a database; I wanted all resources to be files. I made something that worked, and passed most tests; but the RFCs were breeding faster than I could keep up. I wasn't trying to make a product, I just wanted a "simple" calendar server for my own use, and I didn't have the time or inclination. I gave up, and switched to Radicale.

Last week I deployed Baikal; it seems very nice. Well done!


SQL-like queries in the request body seem like a bad idea, what are the security implications and how do we protect against it?

Or will the QUERY method end up with the same fate as GraphQL - wherein it's more effective and "secure" in a server-to-server setup and the client only deals with REST.


QUERY has nothing to do with the actual contents of the request body beyond "it is hashable text". That could be JSON, GraphQL, weird SQL cousins, or anything else.


Yeah, figured that out thanks to another thread on here. I got confused as all the examples were primarily SQL ones.


How is that less secure than a REST API Frontend to an SQL database, like PHPMyAdmin?

I don’t think anyone suggests we all open our databases to the web; but if you choose to do so, or if you happen to work on a modern database, like Elasticsearch or CouchDB, which accept queries via HTTP, now there’s a better way to implement queries in regard to caching.

That being said: I’ve been wondering for a long time what a backend API could look like that used SQL instead of JSON as the query format - not to pass it to the database verbatim, but with an application layer that speaks SQL, applies business logic, queries the database, and responds in SQL. That would save a lot of reinvented wheels in terms of filtering, ordering, joining, and so on, and give developers a query language they already know. And suddenly, having a QUERY method available sounds useful, too :)


It's just an example, not a production use case.


Yeah, I mean, I get that but eventually? no?


Why eventually? It’s a sad reality for quite a few of us here right now and has been for quite some time


> Unlike POST, however, the method is explicitly safe and idempotent, allowing functions like caching and automatic retries to operate.

And just like with POST, whether or not this is actually the case in a given API, depends entirely on the server-side implementation.

Look, I get it. We want to make rules. Rules are good. Rules define things.

But in a world where so many "REST APIs" are actually "RESTful" or REST-ish, or "actually about as REST as a Pelican, but we really liked the sound of the acronym", I wonder if adding one more rule to the pile is really going to substantially change anything.

A majority of APIs don't even use all of the existing HTTP verbs, or HTTP response codes for that matter. And every API is free to make up their own rules. I had the dubious pleasure of consuming APIs that required GET with both a body and urlparams, and which returned 200 on error, but with an `{error: {...}}` object in the body. The crown jewel so far, was an authentication system that had me send credentials as multipart/form-data, with a PUT request (because you inPUT the credentials, see? Not a joke, that was the rationale given to me by the dev who made it.)


If you control both ends of the pipe you are free to use the methods and conventions and maybe benefit when something gets cached.

That you cannot rely on consistency is not a reason to have none at all.

Destructive actions can and are also hidden behind a GET. When such an endpoint gets called indiscriminately by the rest of the internet it usually becomes the faulty implementation's problem.

Presumably the same would be true for QUERY.


If I control both ends. And if the server side is not a legacy application that no resources get expended on to fix something that isn't broken.

Those are 2 very big if's.

> That you cannot rely on consistency is not a reason to have none at all.

I want consistency. I stated as much in my post: "Rules are good." I'm just not convinced adding more rules to a collection of rules that already isn't consistently applied in the wild, will give me more consistency.


> A majority of APIs don't even use all of the existing HTTP verbs

That is because there are essentially only 3 verbs in HTTP: GET, POST, and OPTIONS (needed for CORS). All the others are (from a protocol POV) rebranded POST or very specialized (like TRACE or HEAD).

This new method in particular is meant to be useful for proxies, so that they can cache more without risking breaking old stuff


People try to apply rules. REST may sometimes make sense, but much of the time it’s probably simplest and most consistent to JUP - Just Use POST.


I get the 200: Error.

Semantically, it's an error that the frontend can handle and shouldn't show up as "unexpected error" in network logs. Many people treat HTTP error codes as exceptions / real infrastructure problems.

I mean, we build rich clients. You can't simply auth with a user/password header when you get a 401/403 and your browser shows an alert. It's not 2003 anymore.


> Semantically, it's an error that the frontend can handle

No it isn't. Semantically `200 OK` means a request was successful. That isn't me saying that, that's the RFC:

https://datatracker.ietf.org/doc/html/rfc1945#section-9.2

The semantically correct message to signal to the frontend that the error originated on its side, and thus has to be fixed by it, is a 4xx response, period.

https://datatracker.ietf.org/doc/html/rfc1945#section-9.4

> Many people treat HTTP error codes as exceptions / real infrastructure problems.

Doesn't change the fact that they aren't exceptional. A != 2xx response code is not an exceptional occurrence. As soon as I talk to a networked process, especially one that I don't control, I am talking to unreliability, and being able to deal with that is normal program operation, not an exception.

> I mean, we build rich clients. You can't simply auth with a user/password header when you get a 401/403 and your browser shows an alert. It's not 2003 anymore.

So?

What exactly does the auth method I'm using have to do with the server using semantically correct return codes?

Luckily, I also consume a lot of good APIs in my code. Guess what they send back when my client messes up something during OAuth. Hint: It's not `HTTP/200 OK`


Is this a joke?


I can't wait until this becomes a spec. I wrote a small middleware to cache sql and graphql queries and I implemented QUERY and Cache-Control. Worked great and saved a ton of bandwidth for developing and running reports, as I didn't have to worry about caching progress. I just reran the whole job. It was like <50 lines of python to pull it off.


What I fail to understand is how the server interprets the query content/body; how does the server "apply" the "sql" query in the request to the resource? is there something similar to a gql resolver that you need to write?


It's agnostic, the QUERY verb has nothing to do with the actual implementation, or the content encoding. You can use any content encoding for your query body, and you can resolve it any way you see fit.

In mine, I was just caching raw sql queries, so it was literally just text/sql encoding, the query in the body, some metadata in headers, and a sqlalchemy engine to execute the query.

It's basically a way to get around the non-idempotency of POST and URI limitations of GET.


Aha, got it. Thanks for the explanation.

So, the request content/body can have "non-sql-like" queries? can it be GraphQL? or even plain English? - of course, assuming that the server knows how to resolve the query.


Yep. This is valid

    QUERY /graphql
    Content-Type: text/graphql

    {too lazy to write valid gql}

so is this

    QUERY /elasticsearch
    Content-Type: application/json

    {"name": "alice"}

Even this

    QUERY /my-ai-image-gen
    Content-Type: text/plain

    Draw me a picture of a cat

It's entirely up to the software to decide how to handle the request.


Great, thank you!


I'd love to see JSON-RPC overhauled with this method. It's quite a pain to cache idempotent methods exposed over JSON-RPC as-is currently.


previously it was titled SEARCH. I like query better personally, as it kind of aligns with SQL like requests.

Here's a great post on it- https://httptoolkit.com/blog/http-search-method/ and earlier HN thread- https://news.ycombinator.com/item?id=36095032


It is a little ironic that a new HTTP method called "QUERY" is being created largely to be able to remove the query from the URL.


This seems like it would introduce a tremendous amount of work to solve a problem that basically does not exist. You can just handle your POST request idempotently. We should just live with the semantics we have.


Idempotency and safety is about how to send POST requests, rather than how to handle them.

You can't pre-fetch a POST request, or re-try after a timeout. Because you have to consider that the POST request could have unintended consequences if sent too many times.
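A sketch of how a client might encode that rule. The method set and the `send_with_retry` helper are illustrative, and QUERY is included only on the assumption that the draft's "safe and idempotent" definition holds for the server in question:

```python
from typing import Callable

# Methods a client may resend blindly after a timeout.
IDEMPOTENT_METHODS = frozenset(
    {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE", "QUERY"}
)


def send_with_retry(send: Callable[[], int], method: str, attempts: int = 3) -> int:
    """Call send() and retry on TimeoutError only if the method is idempotent."""
    tries = attempts if method in IDEMPOTENT_METHODS else 1
    for attempt in range(tries):
        try:
            return send()
        except TimeoutError:
            if attempt == tries - 1:
                raise  # out of attempts: surface the timeout to the caller
    raise ValueError("attempts must be at least 1")
```

The client decides this from the method alone, before it knows anything about how the server handles the request, which is why documenting a POST endpoint as harmless does not help generic tooling.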


That looks more of a documentation problem, rather than an HTTP problem.

You just document that that POST endpoint doesn't actually modify data.

A great example of this are OpenAI completions.


did you even read the article? it describes why this is a bad idea — it’s not just your server that’s handling the request, it’s all the other middleboxes in between.


It's definitely not a "tremendous" amount of work. At the bare minimum, you copy your POST code and change the verb, that's it. There's no need to cache, so you actually can handle it basically as idempotent POST.

> You can just handle your POST request idempotently

You actually can't though, and you don't usually have to look far to run into a case where this gives the wrong behavior. Better to just have a separate verb.


With POST, you can't know at a system level if it's safe to cache the response or not.


Cache-Control response headers?


Cache-Control and Content-Location.


insufficient



> A cached POST response can be reused to satisfy a later GET or HEAD request. In contrast, a POST request cannot be satisfied by a cached POST response because POST is potentially unsafe; see Section 4 of [CACHING].

So if you are using POST to query, you can't cache the response. You have to resort to POST/GET. With QUERY, you have idempotent requests with cacheable responses you can directly return.
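Roughly, a cache sitting in front of the origin could behave like the toy below. It ignores freshness, `Cache-Control` and everything else a real cache must honor; the class name and the method set are assumptions:

```python
from typing import Callable

# Methods whose responses this toy cache will reuse for an identical request.
# QUERY is listed on the assumption that the draft's semantics apply.
REUSABLE_METHODS = frozenset({"GET", "HEAD", "QUERY"})


class TinyCache:
    """Toy cache in front of an origin callable(method, target, body) -> bytes."""

    def __init__(self, origin: Callable[[str, str, bytes], bytes]) -> None:
        self._origin = origin
        self._store: dict[tuple[str, str, bytes], bytes] = {}

    def fetch(self, method: str, target: str, body: bytes = b"") -> bytes:
        if method not in REUSABLE_METHODS:
            # POST and friends always go to the origin: they may have side effects.
            return self._origin(method, target, body)
        key = (method, target, body)
        if key not in self._store:
            self._store[key] = self._origin(method, target, body)
        return self._store[key]
```

Two identical QUERY requests reach the origin once; two identical POST requests reach it twice.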


HTTP defines the POST method explicitly as non-idempotent


Love it :)


This would be fun to play around with


While you *can* do the same with a GET (include data in the body), it's not spec-compliant for servers to parse and interpret this data.

https://stackoverflow.com/questions/978061/http-get-with-req...

When designing an API, and if spec compliance is not key, I wonder if client-compliance would become the issue (clients refusing to emit a GET body).


> it's not spec-compliant for servers to parse and interpret this data.

That's wrong. A lot of people just don't understand the difference between SHOULD and MUST when reading standards. The standard just says that you shouldn't rely on servers accepting it unless they tell you.


Parsing the body of GET can be added as an extension, same as adding this new QUERY method.


Many “WAF” (Web-Application Firewalls) and reverse-proxies are configured to block “unusual” traffic though, including GET-with-body - but I feel that this approach is like how (around 2000-2006) everyone switched from high-performance or legacy binary protocols to XML/SOAP-over-HTTPS to avoid corporate firewall headaches.


There will most likely be problems with such web-application firewalls anyway, since those same firewalls will probably reject HTTP methods that they don't know about.

But adding a new method is probably better overall and matches people's understanding of the implementation and interpretation of GET. Even if (with extensions) GET can have a body, people don't think of it like that. So a new method with defined semantics and interpretation avoids a whole bunch of sideshow debate about whether GET with a body is possible or appropriate.


It's not up to debate, GET does allow a body.


You can make all the noise you want about how it's not up for debate because of what the standard says, but that noise does not avoid sideshow bs debates about how it is used and restricted/limited in practice. A new method avoids that because a new method comes with exactly zero historical baggage.


A new method does not give any advantage over extending an existing method, it's going to have to use the same code anyway.


GET allows, but does not require, server implementations to read/parse a body

sending a body with a GET request does not suggest the recipient server will receive that body, in general


middleboxes are free to drop the body of any GET request, and generally do so


> middleboxes are free to drop the body of any GET request, and generally do so

Most middleboxes will equally drop any HTTP verb which is not whitelisted.

Even the WebDAV extensions, which are 15 years old, are still commonly blocked.


blocking a request is fine, the request is stopped, the client gets an error

my point is that a middlebox that receives a GET request with a body may drop the body from the request and still forward the request onwards, without the body

these are categorically different outcomes



