
[the author here] Indeed I mention Cloudflare quite a lot, and that is for two reasons:

(A) If you do want a performant website (be it static or dynamic), you do need a CDN with edge caching; else physics, with its insistence on a cap on the speed of light, puts a theoretical minimum latency of, say, 25ms between Bucharest and Toronto, which in practical terms translates to at least 100ms one way, around 300ms just to establish a TCP connection, and around 500ms for a complete TLS handshake... And no amount of bandwidth can solve that.

(B) Cloudflare is the only free choice that allows you to use your own domain and that doesn't impose any (published) limits on your usage. If you know of other CDN providers that give you this for free please let me know and I'll add their link to the article.
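
To make the latency arithmetic in (A) concrete, here is a back-of-the-envelope sketch in Go; the distance is a rough approximation, and the "practical" one-way figure is simply the one quoted above, not a measurement:

    package main

    import "fmt"

    func main() {
        const distanceKm = 7500.0  // rough Bucharest - Toronto great-circle distance
        const lightKmPerMs = 300.0 // speed of light in vacuum, km per millisecond

        theoreticalOneWay := distanceKm / lightKmPerMs // ~25 ms, the hard lower bound
        practicalOneWay := 100.0                       // routing, fiber paths, queuing, etc.
        tcpHandshake := 3 * practicalOneWay            // SYN, SYN-ACK, ACK => ~300 ms
        tlsTotal := tcpHandshake + 2*practicalOneWay   // roughly one more round-trip => ~500 ms

        fmt.Printf("theoretical one-way minimum: %.0f ms\n", theoreticalOneWay)
        fmt.Printf("TCP connect: ~%.0f ms, TLS established: ~%.0f ms\n", tcpHandshake, tlsTotal)
    }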


You could just use GoatCounter <https://www.goatcounter.com/>; it's simple to deploy yourself (it's open-source), but it's also free to use hosted, and it seems to be quite privacy-friendly (if one can call a web analytics solution that).

If you are already using Cloudflare, they have a (free and) simple enough web analytics solution. (They say they are also privacy-friendly, but given that it's not open-source, you can't check.)

As for static hosts that provide HTTP logs, perhaps some exist, but they certainly won't be free, as shipping the logs (from the edge, where most hosted solutions cache things, to the aggregator) and then storing them isn't exactly cheap...


I forgot to specify my goal when asking about HTTP logs: I want to get some basic analytics without adding anything to my HTML.

I love GoatCounter, but I still have to add a tracker (a privacy-friendly one, but a tracker nonetheless).

Cloudflare looks like it might actually provide what I'm looking for, but the price is a bit too high for my low traffic hobby projects (looks like it starts at $20/month). I could just host my site on a VPS for cheaper than that (although I'd prefer some sort of managed/hosted offering under $10/month, if I can find one).


No, with Cloudflare you can use the "web analytics" product for free; you can even use it for any site, not only those hosted at Cloudflare. (Perhaps what you've seen is the "cache analytics" or "traffic analytics", which indeed are paid.)

(I use it "web analytics" on my own site thus I can confirm that it's free.)


Their site indicates that the free tier requires adding a JavaScript beacon (which is what I’m trying to avoid).

Is that what you’re using? Or did I misread their offering (which seemed to clearly indicate a paid plan was required for the non-JavaScript analytics)?

See at the bottom how it says “sign up for a paid plan” in the section that says “there is no code to add”:

https://www.cloudflare.com/web-analytics/

Edit: Oh, you're the OP! Just checked out your site and you are indeed using their JavaScript beacon. I know I'm being unreasonable, but in my case, I'm going for no JavaScript, and no trackers (so some sort of request logging on the server or proxy side is likely my only option). It's my attempt to "be the change you want to see in the world".


Yes, the Cloudflare web-analytics does require JavaScript.

With regard to no-JavaScript sites, I understand; for example, the only JS on my site is the one for GoatCounter and Cloudflare web analytics. GoatCounter does fall back nicely when JavaScript is unavailable, meanwhile Cloudflare's just doesn't work.

However, with regard to Cloudflare web analytics, given that I already serve my site through them, there is no additional privacy lost by also using their analytics.

GoatCounter is good for getting a long-term picture of readership, meanwhile Cloudflare is good for assessing performance issues.

Also note that if one registers a domain with Google's webmaster tools (nowadays called Search Console), then even without using any Google services on the site, you still get performance metrics just because your visitors are using Chrome... So that's that about privacy... :)


> Worth noting that the incoming younger generations can't and don't into file systems. [...] We are probably the last generation who can be ubiquitously assumed to have an understanding of files and folders/directories in a computer.

[the author here] The article is not about misunderstanding file-systems.

As I've commented earlier on a different thread, <https://news.ycombinator.com/item?id=32733825>, the problem is that when it comes to performance, and to jumping through the hoops the internet gatekeepers (i.e. the search engines) have set out, just dropping some files somewhere the web server can pick them up isn't enough; there are additional tasks that can't easily be performed with a files-on-disk approach.


Not that I'm interested in arguing against your intent, but the second and third paragraphs clearly lay out that the article is about the younger generations not knowing what a file system is.

I linked it here because, with regards to "already standardized the hosting side of static sites by using the file system", it means squat if the webmaster-to-be doesn't know what a file system is. There's no preconception of using files to hold and serve data from if you don't know (and possibly don't care) about file systems.

If we're going to be discussing attempts at changing the paradigm, then it stands to reason to understand the environment at large that might be encouraging it. If we're going to be talking about something replacing file systems for data storage, it's pertinent to know first of all that the younger generations give zero bits about files.

Also, SEO is at best a tangent to data storage and serving. Some webmasters might not even care to bother with SEO if they don't care about search engine rankings.


> Strong disagree. You do not want both URLs to return the same content; you can have alternatives redirect to the canonical URL; but think through why you’re doing it: who is benefited by the redirect?

At least with regard to the with/without-slash redirects, there is value in these: sometimes the user copy-pastes the URL but forgets the final slash; sometimes the application he is using for bookmarking, sharing, etc. drops the slash; sometimes even the site owner forgets to be consistent. Thus having those redirects in place means not having dead links pointing to your site.

With regard to the `.html` suffix, with and without, it's a matter of taste... However, the same point applies: perhaps you've been inconsistent over the years, thus having redirects avoids dead links...

However, you are right about the "canonical" URL: for search engine purposes, having a single URL serving the content, with the rest being redirects, is kind of essential. (Or at least having a `<link rel="canonical" ...>` where a redirect isn't possible.)
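
As an illustration of such normalization redirects, here is a minimal sketch in plain Go `net/http` (not the actual kawipiko code; the root path is a placeholder): a small wrapper that permanently redirects the slash-less form of a directory URL to its canonical slashed form, so only one URL ever serves the content.

    package main

    import (
        "net/http"
        "os"
        "path/filepath"
        "strings"
    )

    // redirectToCanonical sends "/some/dir" to "/some/dir/" with a permanent
    // redirect whenever the path names a directory on disk.
    func redirectToCanonical(root string, next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if p := r.URL.Path; !strings.HasSuffix(p, "/") {
                if info, err := os.Stat(filepath.Join(root, p)); err == nil && info.IsDir() {
                    http.Redirect(w, r, p+"/", http.StatusMovedPermanently)
                    return
                }
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        root := "./public"
        http.ListenAndServe(":8080", redirectToCanonical(root, http.FileServer(http.Dir(root))))
    }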

> Supporting more things for the sake of it is not a virtue. There is value in failing early on incorrect input. Postel was wrong.

It is when you don't want dead links. :)


You seem to have missed a good chunk of what I said.

But to address some particular points:

> sometimes the application he is using for bookmarking, sharing, etc. drops the slash

Do you happen to have any evidence of this? I’ve heard it mentioned very occasionally, but never seen it, including any probability of it in logs (though I have seen more bizarre things), and the only ways I can imagine it being likely to happen would break many other things too, so that it doesn’t seem likely.

> perhaps you've been inconsistent along the years, thus having redirects saves the dead links...

And so I strongly advocate for retaining such redirects. Just not gratuitous support for other things.

> It is when you don't want dead links.

I said for the sake of it. If by “dead links” you mean “existing URLs that worked in the past”, that’s not “for the sake of it”, but good cause. But if you’re speaking about proactively allowing things that never worked in the past, that’s exactly what I’m arguing against. I want robust justification for every extra URL that is supported, of the machine or human that is likely to encounter it and why. (As an example of this, I’d honestly quite enjoy returning 400 for requests with unknown query string parameters, which in the context of static websites mostly means any query string, in order to truly have only one URL for the content; but I acknowledge that this is not pragmatic because it’s not uncommon to inject additional query string parameters, typically for the purpose of spying on users via unwanted utm_* parameters and the like.)
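
(A minimal sketch of that "400 for unknown query parameters" idea, assuming a plain Go `net/http` front with a whitelist of known parameters; the handler and parameter names are hypothetical:)

    package main

    import "net/http"

    // rejectUnknownParams answers 400 for any request whose query string
    // contains a parameter outside the allowed set.
    func rejectUnknownParams(allowed map[string]bool, next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            for key := range r.URL.Query() {
                if !allowed[key] {
                    http.Error(w, "unexpected query parameter: "+key, http.StatusBadRequest)
                    return
                }
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        allowed := map[string]bool{} // a static site rarely needs any query parameters
        handler := rejectUnknownParams(allowed, http.FileServer(http.Dir("./public")))
        http.ListenAndServe(":8080", handler)
    }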


[the author here] Indeed, I didn't mention anything about shared webhosting solutions, just as I didn't mention anything about S3 + CloudFront, Backblaze B2 + a CDN in front, Cloudflare + Workers, AWS Lambda, or any of a thousand other ways to do it... (For example, there is <https://redbean.dev/>, which I find quite intriguing, and not far from my own <https://github.com/volution/kawipiko> proposal.)

Although shared webhosting is part of our web history -- and still a viable choice, especially if you have something in PHP or something that requires a little bit of dynamic content -- I don't think it's a common choice today.

It sits somewhere in between. On one side it resembles cloud hosting: although you have an actual HTTP server (usually Apache or Nginx), you can't configure it much because it's managed by the provider, so you get roughly the same features (and limitations) as a proper cloud-hosted static site solution (such as Netlify). On the other side it resembles self-hosting, for the same reason: you have a full-blown HTTP server, but one you can't fully control, so you get fewer features than a self-managed VM at a cloud provider or a self-hosted machine. Thus, unless you need PHP or `.htaccess`, I think the other two alternatives are a better choice.

The issue with "static sites", due to the de-facto requirements in 2022 imposed by the the internet "gatekeepers" (mainly search engines), is that they aren't "just a bunch of files on disk that we can just serve with proper `Content-Type`, `Last-Modified` or `ETag`, and perhaps compressed"; we now need (in order to meet the latest hoops the gatekeepers want us to jump through) to also do a bunch of things that aren't quite possible (or certainly not easily) with current web servers. For example:

* minification (which I've cited in my article) -- besides compression, one should also employ HTML / CSS / JS and other asset minification; none of the classical web servers support this; there is something like <https://www.modpagespeed.com/>, but it's far from straightforward to deploy (let alone on a shared webhost);

* when it comes to headers (be it CSP and other security-related ones, or even `Link` headers for preloading), these aren't easy to configure, especially if you need those `Link` headers only for some HTML pages and not for all resources; in this regard I don't know how many shared webhosts actually allow you to tinker with these; (see the sketch below;)
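
As a sketch of that second point (plain Go `net/http` purely as an illustration, with a placeholder policy and asset path), this is the kind of per-page header logic that a provider-managed Apache usually doesn't let you express:

    package main

    import (
        "net/http"
        "strings"
    )

    // withHTMLHeaders attaches CSP and `Link` preload headers only to HTML
    // pages (directory indexes and *.html), not to every static resource.
    func withHTMLHeaders(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if strings.HasSuffix(r.URL.Path, "/") || strings.HasSuffix(r.URL.Path, ".html") {
                h := w.Header()
                h.Set("Content-Security-Policy", "default-src 'self'")
                h.Add("Link", "</assets/site.css>; rel=preload; as=style")
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        http.ListenAndServe(":8080", withHTMLHeaders(http.FileServer(http.Dir("./public"))))
    }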

The point I was trying to make is that if you want to deploy a professional (as in performant) static web site, just throwing some files in a folder and pointing Apache or Nginx at them isn't enough. If the performance you are getting by default from such a setup is enough for you, then perfect! If not, there is a lot of pain in getting everything to work properly.


This is wrong on many levels.

Most people don't need all that complexity and configuration options, especially if you just want it to work.

Asset minification can be done locally; besides, the ominous gatekeepers don't give a fuck. I've got a static website with lots of files, just HTML and CSS, not even minified I think, and a dozen lines of JS, and it's a solid 100/100 on PageSpeed Insights.

Link headers? There's an HTML tag for that and it's fast enough: https://developer.mozilla.org/en-US/docs/Web/HTML/Link_types...

You're just bloating everything up.


> The point I was trying to make is that if you want to deploy a professional (as in performant) static web site, just throwing some files in a folder and pointing Apache or Nginx at them isn't enough.

Shared hosting is basically where somebody configures and runs Apache or Nginx for you ... they know how to do this, and it definitely works!


> I don't think it's still a common choice for today.

Not true at all. About 37% of the web hosting market is shared hosting. Just do a simple web search for the size of the web hosting services market to find this information. It's one of the 3 most common choices today – and it's always been that way.


[the author here] Well, the article does point in the last section "Putting it in practice" to my own implementation <https://github.com/volution/kawipiko>. :)


> fasthttp doesn't implement HTTP/1 correctly

Could you point to an issue describing such improper behavior?

> doesn't implement HTTP/2 at all

Or HTTP/3; and most likely it won't implement HTTP/4 (after the HTTP/3 fashion dies out). There is an issue about this on `fasthttp`'s repository: <https://github.com/valyala/fasthttp/issues/144>

And I'll quote here what I've said there:

> Having experimented in my kawipiko static server based on fasthttp with both HTTP/2 (based on the Go's implementation) and HTTP/3 (based on an experimental available library), I continue to believe that perhaps HTTP/2 and HTTP/3 is a job for some other component of the infrastructure, be it a CDN or even a local HTTP router / load-balancer such as HAProxy.

Thus if one needs HTTP/2 or HTTP/3 in order to reap their performance benefits, then using a CDN that actually supports these is the best approach.
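
As a minimal sketch of the "front it with something that speaks HTTP/2" approach (not a recommendation of any particular software; the certificate paths and backend address are placeholders), even Go's own `net/http` can serve as such a front, since it negotiates HTTP/2 automatically over TLS:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // The HTTP/1-only static server (e.g. a fasthttp-based one) listens locally.
        backend, err := url.Parse("http://127.0.0.1:8080")
        if err != nil {
            log.Fatal(err)
        }
        proxy := httputil.NewSingleHostReverseProxy(backend)

        // ListenAndServeTLS negotiates "h2" via ALPN out of the box, so clients
        // get HTTP/2 even though the backend only speaks HTTP/1.
        log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
    }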


[the author here] Where do I sell a "square space-esque managed site builder with hosting"? I need to get on some of that revenue! :) :) :)

Under the name of "volution" I own the following:

* volution.ro -- where this article was posted; it contains no products or hosting services, or even advertising for anything; (it does contain links to my GitHub projects, all of which are purely open-source, and links to my business site below and to another project I'm working on, which has nothing to do with hosting;)

* volutico.eu -- which is an "under construction" page for my consulting firm;

* github.com/volution -- where there are a few more polished open-source projects, including <https://github.com/volution/kawipiko> which is an open-source implementation of the ideas described in this article;

So, either this is a case of mistaken identity, or please point me in the right direction.

(Searching the internet for `volution` does yield some companies with a similar name, but those have nothing in common with me.) :)


Ah, my apologies! This is the one I thought you were associated with; one letter makes a big difference: https://www.volusion.com/

I figured it was one of those "founder-has-a-blog-with-very-similar-name-to-capture-new-users" situations. Nothing wrong with it, just something I feel is important to be aware of. Sorry for the mixup!


If one has the pre-generated response for a "simple" `GET` request (one that doesn't use conditional headers, ranges, or other advanced HTTP features), then a server could easily generate, based on that simple response, proper responses to more complex requests. For example:

* `HEAD` is just taking the `GET` response and replying with just the headers (no body);

* if the pre-generated `GET` response contains an `ETag` header, then the server could easily handle `If-Match` and `If-None-Match`; (else, such a static server implementation could fall back, for each resource, to a temporary `ETag` obtained by hashing the resource's path together with a random token generated when the server started;)

* if the pre-generated `GET` response contains a `Last-Modified` header, then the server could easily handle `If-Modified-Since` and other related conditional requests; (else, the server could just consider that all resources have a `Last-Modified` equal to the moment the server started;)

* if the client requests a range of that resource, the server could easily respond with a slice of the stored `GET` response;

In fact, what I describe here is nothing out of the ordinary; all caching proxies do exactly this: they store the "simple" `GET` response, and then they derive all other responses from it.
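
A rough sketch of the idea (plain Go, not kawipiko's actual code; ranges are left out, but they would just slice the stored body):

    package main

    import "net/http"

    // storedResponse is the pre-generated "simple" GET response for one resource.
    type storedResponse struct {
        headers http.Header // may already include ETag and Last-Modified
        body    []byte
    }

    // serve derives HEAD and conditional responses from the stored GET response.
    func serve(w http.ResponseWriter, r *http.Request, resp storedResponse) {
        for key, values := range resp.headers {
            for _, value := range values {
                w.Header().Add(key, value)
            }
        }

        // Conditional request: if the client already has this exact entity,
        // answer 304 with no body at all.
        if etag := resp.headers.Get("ETag"); etag != "" && r.Header.Get("If-None-Match") == etag {
            w.WriteHeader(http.StatusNotModified)
            return
        }

        w.WriteHeader(http.StatusOK)

        // HEAD: same status and headers as GET, just no body.
        if r.Method != http.MethodHead {
            w.Write(resp.body)
        }
    }

    func main() {
        headers := http.Header{}
        headers.Set("Content-Type", "text/html")
        headers.Set("ETag", `"abc123"`)
        resp := storedResponse{headers: headers, body: []byte("<html>...</html>")}

        http.ListenAndServe(":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            serve(w, r, resp)
        }))
    }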


BTW, I've just tried running Kawipiko on an Android tablet (ARM64) to export the same demo as available at the link in the repository, and it works great!

In fact, benchmarking it over wireless + WireGuard with 128 concurrent connections yielded around 2K requests/second, and the overall CPU stayed under 1% (for the server process). (Granted, the main issue was latency, but the average was well under 25 ms.)

Thus for a portfolio site that you take with you at conferences / meetings it works very well.

