> would it be feasible to include the compression dictionary along w/ the data
In theoretical terms (in particular from a Kolmogorov-complexity perspective), I believe including the dictionary wouldn't help; it would strictly increase the size. It can help (especially from a Shannon-information perspective) when building the data model is too computationally demanding (which is roughly what all compressors do), but it almost certainly doesn't help for something as small as a URL. It would be possible, though, to just pack a common compressor with the website (and refer to it with a version code).
I think the web in general would benefit greatly from some kind of standard compression libraries. Compression tends to have a significant cold-start cost before it works well; if that were mitigated, something like a URL could certainly be far shorter.
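A minimal sketch of the tradeoff, using Python's zlib with a preset dictionary (the dictionary contents and URL here are made up for illustration): a pre-shared dictionary shrinks a short URL noticeably, but shipping the dictionary alongside the data would cost far more bytes than it saves.

```python
import zlib

# Hypothetical dictionary of byte sequences that commonly appear in URLs.
url_dict = b"https://www.&utm_source=.com/.org/index.html?id="

url = b"https://www.example.com/index.html?id=12345"

# Plain DEFLATE, no dictionary.
plain = zlib.compress(url, 9)

# DEFLATE with a preset dictionary; both sides must already hold url_dict
# (e.g. identified out of band by a version code).
co = zlib.compressobj(level=9, zdict=url_dict)
with_dict = co.compress(url) + co.flush()

# The receiver can only decode with the same dictionary.
do = zlib.decompressobj(zdict=url_dict)
assert do.decompress(with_dict) + do.flush() == url

print(len(url), len(plain), len(with_dict))
# Shipping url_dict along with the payload would add len(url_dict) bytes,
# which for a single short URL outweighs any savings.
```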
I believe HTTP/2 has header compression (RFC 7541) [1] with 'static tables', which seems to be a form of dictionary, but surprisingly to me there is no HTML compression with such shared dictionaries. There are caching mechanisms (unfortunately cross-domain caching seems to be deprecated [2] due to security concerns) which I believe help in other ways, but I think a true dictionary-style compressor would bring huge bandwidth savings.
Major web entities should get together to develop those standard dictionaries (and decide which algorithms to use them with), so that they are reasonably fair for everyone (maybe even language-specific dictionaries could be provided).
Security does seem to be a concern, but I think the key would be carefully preventing content from being decoded with the wrong dictionary (otherwise the server could effectively serve a different web page than what the standard decompression would produce) -- but overall it doesn't seem like a big issue (similar to caching, and less problematic than cross-domain caching).
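One way to enforce that, sketched below with a made-up framing (not any existing standard): tag the compressed payload with a hash of the dictionary it was built against, and have the client refuse to decode if its own copy doesn't match.

```python
import hashlib
import zlib

def compress_tagged(data: bytes, dictionary: bytes) -> bytes:
    # Prefix the payload with a short hash identifying the dictionary
    # (hypothetical framing, for illustration only).
    dict_id = hashlib.sha256(dictionary).digest()[:8]
    co = zlib.compressobj(level=9, zdict=dictionary)
    return dict_id + co.compress(data) + co.flush()

def decompress_tagged(payload: bytes, dictionary: bytes) -> bytes:
    dict_id, body = payload[:8], payload[8:]
    # Refuse to decode if the client's dictionary isn't the one the
    # payload claims to have been compressed against.
    if hashlib.sha256(dictionary).digest()[:8] != dict_id:
        raise ValueError("dictionary mismatch")
    do = zlib.decompressobj(zdict=dictionary)
    return do.decompress(body) + do.flush()
```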
In the future maybe even one of those fancy neural-network (or otherwise machine-learning-inspired) methods could be an option, especially useful for highly bandwidth-constrained links like rural or satellite internet (although of course performance is always a priority, with mobile devices being major users).
> I believe HTTP/2 has header compression (RFC 7541) [1] with 'static tables', which seems to be a form of dictionary, but surprisingly to me there is no HTML compression with such shared dictionaries.
To my understanding, that static table in HTTP/2 comes up directly in the context of recommending and discussing Brotli compression, which does use a table like that as a standard static dictionary in HTTP/2+ scenarios, including static dictionary entries derived from a corpus of HTML documents.
Thanks, I didn't know about Brotli! Indeed it includes a dictionary in its compression, which seems to help significantly with small pages. I hope compression continues to improve in this way.
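For a rough sense of the effect, assuming the `brotli` PyPI package is installed (exact numbers will vary with the input):

```python
import zlib
import brotli  # pip install brotli (assumed available)

small_page = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b"<title>Hello</title></head><body><p>Hello, world!</p></body></html>"
)

print("original:", len(small_page))
print("deflate :", len(zlib.compress(small_page, 9)))
# Brotli ships a built-in static dictionary drawn from a corpus of web
# content, which tends to help most on small HTML/CSS/JS payloads.
print("brotli  :", len(brotli.compress(small_page, quality=11)))
```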
> To my understanding, that static table in HTTP/2 comes up directly in the context of recommending and discussing Brotli compression
They seem separate: that is compression for the HTTP/2 headers themselves, while Brotli encodes the content (usually HTML, CSS, or JS).
The header compression in HTTP/2 has atypical requirements placed upon it: the contents of different headers in a single request must not impact each other's compressed size[^], lest an attacker who can manipulate one of them use that to guess the other. Thus, LZ77-style compression is off the table, as is using Huffman codes selected based on _all_ the header values. In essence, each header value (in a single request) has to be compressed independently. IIUC the dictionary you reference is used for header keys (as opposed to values).
[^] this statement is a bit of a lie for reasons of simplification
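For concreteness, here is a minimal sketch of the indexed-header case from RFC 7541 (Section 6.1), using a few entries of the static table (which holds both header names and a handful of full name/value pairs); the dynamic table, literal representations, and Huffman coding of individual values are all omitted.

```python
# Minimal sketch of HPACK's indexed-header encoding (RFC 7541, Section 6.1),
# using only the first few static-table entries.
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":path", "/index.html"): 5,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
    (":status", "200"): 8,
}

def encode_header(name: str, value: str) -> bytes:
    idx = STATIC_TABLE.get((name, value))
    if idx is not None and idx < 128:
        # Indexed header field: 1-bit pattern '1' followed by the table index.
        return bytes([0x80 | idx])
    # Everything else would use a literal representation, where the value is
    # encoded on its own (optionally Huffman-coded), independently of the
    # other header values in the request.
    raise NotImplementedError("literal representations omitted in this sketch")

assert encode_header(":method", "GET") == b"\x82"
assert encode_header(":scheme", "https") == b"\x87"
```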
[1] https://httpwg.org/specs/rfc7541.html#static.table.definitio...
[2] https://www.stefanjudis.com/notes/say-goodbye-to-resource-ca... Is this still accurate?