Gatsby is a fairly complex static site generator. At the highest level, it provides an ingest layer that can take any data sources (CMS, markdown, json, images, or anything that a plugin supports) and bring them into a single centralized GraphQL data source. Pages (which are built using React) can query this graph for the data they need to render. Gatsby then renders the React pages to static HTML and converts the queries to JSON (so there's no actual GraphQL in production).
This process is fairly fast on small/simple sites. Gatsby is overall very efficient and can render out thousands of pages drawing from large data sources rather quickly. The issue is that Gatsby isn't just used for personal blogs. As you can imagine, a site with thousands of pages of content that is processing thousands of images for optimization starts taking a long time to build (and a lot of resources). For example, I'm building a Gatsby site for a photographer than includes 16000+ photos totaling a few hundred GB. Without incremental builds, any change (e.g. fixing a typo) means every single page needs to be rebuilt.
Incremental builds means you don't have to rebuild everything. Because the data is all coming from the GraphQL (which Gatsby pre-processes and converts to static JSON), it is possible to diff the graphs (i.e. determine what data a commit has changed) and determine what pages it affects (i.e. which pages include queries that access that field). From there, Gatsby can only rebuild that changed pages.
This not only means faster build times, it also means that only the changed pages and assets have to be re-pushed to your CDN. This way, content that hasn't changed will remain cached and only modified pages will have to be sent down to your site's users.
But if you have 16,000 of anything, why are you using a static site? Surely the access patterns are long tail and you need to build more often than most pages are even accessed.
Surely there's a CPU/disk trade off at some point. Static pages are much larger (less likely in memory) and would cause disk reads much sooner than the same files being generated dynamically. Of course wordpress isn't known for it's efficiency so the static page preference is probably quite high.
there is a big difference in the cost of static hosting and CDN (think Cloudfront / S3) for static stuff and running an active piece of hardware for static stuff that doesn't change. Like orders of magnitude. Sure for small sites it's not that much but it's still orders of magnitude.
also the answer to a large number of my interview questions ends up being figuring out how you can just effeciently serve the stuff from a CDN/Blob Storage. You can scale the crap out of this for quite cheap.
If the final result is 500k of HTML, a dynamic website is doing a LOT more work to return that 500k than a static website. Assuming you get more traffic than you have pages/generated.
Admittedly in this case I'm mostly just trying to push Gatsby to it's limits. For a photography site, there ends up being very little overhead with a static site (if you can do incremental builds). I also explored NextJS (SSR) and just making a good old SPA, but decided to go with Gatsby because at the end of the day, a distinct majority of the storage cost is just the raw images. I think Gatsby ends up making the most sense because you get to take advantage of a CDn for caching (most don't like being used just as an asset cache) and I can just leave it there without worrying about a server.
Hi, have you thought about hosting for this yet? I've got a similar site which I originally tried on AWS Amplify but it got too big for the artifact size limit so I opted for S3/Cloudflare instead however build times are slow and more of a manual process currently.
I’m yet to figure that out! I’m procrastinating on that but until I have everything else figured out (it’s for a family member so no strict timeline). I’m thinking in the end the setup will be something with a CMS for editing the photo metadata, a file storage system for the images, and everything else in git. The build would pull from all 3, run and then the processed images will be pulled out and hosted on their own. I’m planning that it’ll just run and take a long time on a droplet unless I figure something better out.
This process is fairly fast on small/simple sites. Gatsby is overall very efficient and can render out thousands of pages drawing from large data sources rather quickly. The issue is that Gatsby isn't just used for personal blogs. As you can imagine, a site with thousands of pages of content that is processing thousands of images for optimization starts taking a long time to build (and a lot of resources). For example, I'm building a Gatsby site for a photographer than includes 16000+ photos totaling a few hundred GB. Without incremental builds, any change (e.g. fixing a typo) means every single page needs to be rebuilt.
Incremental builds means you don't have to rebuild everything. Because the data is all coming from the GraphQL (which Gatsby pre-processes and converts to static JSON), it is possible to diff the graphs (i.e. determine what data a commit has changed) and determine what pages it affects (i.e. which pages include queries that access that field). From there, Gatsby can only rebuild that changed pages.
This not only means faster build times, it also means that only the changed pages and assets have to be re-pushed to your CDN. This way, content that hasn't changed will remain cached and only modified pages will have to be sent down to your site's users.