My thoughts about Fly.io (so far) and other newish technology I'm getting into (hartleybrody.com)
130 points by hartleybrody on May 19, 2022 | 108 comments


> It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.

There's a very solid solution to this that isn't as widely known as it should be.

Read after write consistency is extremely important. If a user makes an edit to their content and then can't see that edit in the next page they load they will assume things are broken, and that the site has lost their content. This is really bad!

The best fix for this is to make sure that all reads from that user are directed to the lead database for a short period of time after they make an edit.

The Fly replay header is perfect for this. Here's what to do:

Any time a user performs a write (which should involve a POST request), set a cookie with a very short time expiry - 5s perhaps, though monitor your worst case replica lag to pick the right value.

I have trust issues with clocks in users' browsers, so I like to do this by setting the cookie's value to the server time at which it should expire.

In your application's top-level middleware, look for that cookie. If a user has it and that server time has not been reached yet, send a Fly replay header that internally redirects the request to the lead region.

This guarantees that users who have just performed a write won't see stale data from a lagging replica. And the implementation is a dozen or so lines of code.
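A minimal, framework-agnostic sketch of that middleware logic in Python (the cookie name, region code, and function names here are hypothetical, though `fly-replay` is Fly's actual header name):

```python
import time

REPLAY_COOKIE = "fly-replay-until"  # hypothetical cookie name
PRIMARY_REGION = "iad"              # assumed primary region code

def cookie_value_for_write(now=None, window=5.0):
    """After a write, record the *server* time at which replays may stop.
    Tune `window` to your worst-case observed replica lag."""
    now = time.time() if now is None else now
    return str(now + window)

def replay_header_for_request(cookie_value, now=None):
    """Middleware check: if the cookie's server-side expiry hasn't passed,
    return a fly-replay header routing this request to the primary region."""
    now = time.time() if now is None else now
    if cookie_value is None:
        return None
    try:
        expires_at = float(cookie_value)
    except ValueError:
        return None  # malformed cookie: fall back to the local replica
    if now < expires_at:
        return {"fly-replay": f"region={PRIMARY_REGION}"}
    return None
```

In Flask or Django this would run as a before-request hook: set the cookie on the response to any POST, then check it on every subsequent request.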

Obviously this won't work for every product - if you're building a chat app where every active user writes to the database every few seconds implementing this will send almost every piece of traffic to your leaders leaving your replicas with not much to do.

But if your application fits the common pattern where 95% of traffic is reads and only a small portion of your users are performing writes at any one time, I would expect this to be extremely effective.

Fly replay headers are explained in detail here: https://fly.io/blog/globally-distributed-postgres/


There's another, more sophisticated trick that works for some databases: tracking a global transaction counter of some sort, persisting that in a cookie when a user makes a write and redirecting the user to the lead database if the replica they are talking to hasn't made it to that point yet.
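A sketch of that comparison step, assuming PostgreSQL-style LSNs (the function names are made up for illustration; the `hi/lo` hex pair is how Postgres renders a `pg_lsn` value):

```python
def parse_pg_lsn(lsn):
    """Parse a PostgreSQL LSN like '0/16B3748' into a comparable integer
    (high and low 32-bit halves, hex, separated by a slash)."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def choose_backend(cookie_lsn, replica_lsn):
    """Route to the primary only when the replica hasn't yet replayed
    past the position recorded in the user's cookie at write time."""
    if cookie_lsn is None:
        return "replica"  # user hasn't written recently
    return "replica" if replica_lsn >= cookie_lsn else "primary"
```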

Chris McCord describes how Elixir does that with PostgreSQL here: https://news.ycombinator.com/item?id=31434094

Wikipedia implements this trick on top of PHP and MySQL global transaction IDs (GTIDs) so it definitely scales!


Actually the way Wikipedia works is slightly different: they don't redirect to a lead database, they instead call this MySQL function to wait on the replica for it to catch up:

    SELECT WAIT_FOR_EXECUTED_GTID_SET($gtidArg, $timeout)
https://github.com/wikimedia/mediawiki/blob/434c333d9b2be817...
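From application code, a MediaWiki-style wait might look roughly like this (the cursor is a generic DB-API cursor; the 0/1 return-code handling follows the MySQL documentation for this function):

```python
def wait_for_gtid(cursor, gtid_set, timeout=1):
    """Block on the replica until it has applied the writer's GTID set,
    giving up after `timeout` seconds. Returns True if it caught up."""
    cursor.execute(
        "SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)", (gtid_set, timeout)
    )
    (status,) = cursor.fetchone()
    # Per the MySQL docs: 0 means the GTID set was applied, 1 means timeout.
    return status == 0
```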

I wonder if there's a PostgreSQL equivalent of this?


Looks like someone proposed a WAIT FOR feature for PostgreSQL a couple of years ago: https://www.postgresql.org/message-id/flat/69a363498b76cd079...


(Disclaimer: Not an expert.. just sharing something I read somewhere)

I think FoundationDB does something really interesting with this problem. When you make changes, you do it via a transaction. But all the client reads are using the previous version, until the transaction changes have propagated across the nodes, then the new value is returned.


This is a ton of effort to save the RTT of sending all the requests to a central server. And it all goes out the window the second you need to call an external API in the processing of your requests. And to get what benefit there may be you need to, more or less, pay for a server in every big city. IMHO, outside of gaming there's no real need for what fly.io does.

For something like this to be useful I think the code would need to be running on the user's network. That would drop server ping to sub 1 ms and open up a whole lot of interesting possibilities. But I don't see what changing server ping from 80 ms to 15ms gets me.


This trick isn't just about geographic distribution - it's most commonly used for classic horizontal scaling, where you use multiple read-replicas to handle more traffic.


If you're getting 80ms response times for user requests, consistently, then it doesn't change much.


80ms is the network latency not response times. That's the number fly.io can change and realistic best case is going from 10-15ish ms staying within a city vs 80ms going to a server on the other side of the US.


I'm just saying, if your application is already fast for your users --- anything in the ballpark of 80ms is fast enough --- geographically distributing it might not make a big difference. I'm agreeing with the comment (or at least, its subtext).


If all of your users are in the US you won't gain much from geographical distribution. Where this gets really interesting is when you have users all around the world.


I think his stack is a little confused. He's got HTMX and Phoenix in there.

If you are using Phoenix then LiveView is the obvious approach to dynamically updating a page based on server stuff. It's a similar-ish architecture to HTMX, but integrated into the framework. The page is rendered on the server as normal, then when it loads on the client a websocket is opened to a task on the server (the page includes the LiveView JS). Then when something changes on the server, some new HTML is generated and the parts that have changed are sent down the websocket to the client to insert into the page. LiveView is part of Phoenix, leverages Elixir's concurrency, is very performant and a joy to use.

HTMX is a way of getting similar functionality but for a conventional server-rendered framework like Django which doesn't have any of this stuff built in. It would be challenging to build it in anyway because the concurrency isn't as powerful. Simplistically, Phoenix exists because Chris McCord was trying to do a LiveView equivalent in Ruby, had issues, went on a search, and discovered Elixir.

So either use:

Elixir + Phoenix + Phoenix LiveView

Or:

Python + Django + HTMX (Python and Django can be substituted for other frameworks like Rails)

In both cases, Alpine can then be useful to sprinkle in some clientside only UI features.


OP here, thanks for making that distinction more clear. I had listed them all as new tech that I am starting to use, but you're correct that I wouldn't intend to use them all _together_.

The HTMX and alpine libs were intended to be sprinkled onto existing web apps (my usual python/flask stack), whereas Phoenix would be for building all new projects.


From the article, I’m not sure if the author is using all of Phoenix + HTMX + alpine.js or just exploring the combos to see what works.

I recently started playing with Phoenix and the intro to channels and LiveView has been a bit confusing. E.g. a few days ago I wondered if it was worth using something like Svelte for the frontend and then realised I could just use LiveView. As a newbie to the ecosystem, it’s taking a while to get the lay of the land and start understanding the options.


Not even sure Alpine is needed anymore as they have Phoenix.LiveView.JS now https://fly.io/phoenix-files/sdeb-toggling-element/


Thanks for this explainer. This is the missing “here is when it’s redundant” guideline when investigating whether to add to your stack.


One of the points about read replicas and read-your-own-writes is correct to call out, but on the Elixir side we have an answer to that:

> It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.

Elixir is distributed out of the box, so nodes can message each other. This allowed us to easily ship a `fly_postgres_elixir` library that guarantees read-your-own-writes: https://github.com/superfly/fly_postgres_elixir

It does this by sending writes to the primary region over RPC (via distributed Elixir). The write is performed on a primary instance adjacent to the DB, then the result, along with the Postgres log sequence number (LSN), is sent back to the remote node. When the library gets the result of the RPC write, it blocks locally until its local read replica has replayed an LSN >= the write LSN, then the result is returned to the caller.
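A toy simulation of that flow (not the library's actual API; `primary` and `replica` here are stand-ins for the RPC client and the local replica connection):

```python
import time

def rpc_write_then_read_your_writes(primary, replica, row, poll=0.001, timeout=1.0):
    """Perform the write on the primary, note the LSN it returns, then
    block until the local replica has replayed at least that far before
    returning the result to the caller."""
    write_lsn = primary.write(row)  # RPC to the primary region
    deadline = time.monotonic() + timeout
    while replica.current_lsn() < write_lsn:
        if time.monotonic() > deadline:
            raise TimeoutError("replica did not catch up in time")
        time.sleep(poll)  # poll the replica's replay position
    return row
```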

This gives us read-your-own-writes for the end-user, and the calling code remains unchanged for standard code paths. This doesn't solve all classes of race conditions – for example you may broadcast a message over Phoenix.PubSub that causes a read on the remote node for data that isn't yet replicated, but typically you'd avoid an N query problem from pubsub in general by populating the data in the message on the publisher beforehand.

There's no completely avoiding the fact you have a distributed system where the speed of light matters, but it's Fly's (and Phoenix's) goal to push those concerns back as far as possible. For read heavy apps, or apps that use caching layers for reads, developers already face these kinds of problems. If you think of your read-replicas as cache with a convenient SQL interface, you can avoid most foot guns.

I'm happy to answer other questions as it relates to Phoenix, Fly or what Phoenix + Fly enables from my perspective.


That's very cool. Presumably as well this could be opt out if you had specific operations (perhaps from a write-only API) that don't need to wait to do further writes?


Yeah we expose interfaces to ignore blocking on the LSN, but the way this works is by proxying the Ecto Repo interface with our own Repo. So you could call your underlying Repo directly if you wanted to perform a write without blocking on the LSN as well.


An unrelated, yet honest question.

There have been many posts hitting the HN frontpage regarding fly.io recently. Is it healthy to have so much content about a single PaaS platform showing up here so often now?


As per dang's comment a few days back (1):

> I wish more startups would achieve this, YC or not. Whenever I run across one that's trying to succeed on HN, I try to help them do so (YC or not)—why? because it makes HN better if the community finds things it loves here. Among the startups of today, I can think of only two offhand who are showing signs of maybe reaching darling status—fly.io (YC), and Tailscale (not YC).

Personally too both these companies are doing a lot of incredible things. I also love Litestream, phoenixframework and other things they are doing.

(1) https://news.ycombinator.com/item?id=30066969


Interesting to consider the power the mods have here to nudge certain companies into the lime-light of influential technologists.


There's certainly power; that's inherent in modding a popular platform. But as a _very_ casual observer mainly lurking around, I'm satisfied with interpreting dang's stance as: anything with any traction gets boosted, YC or not.

To the examples, fly.io caught my attention primarily by offering a useful free-tier DB, and tailscale has my attention as a "beat this" offering for some homelab access stuff (meaning something could beat it, but at least I have a benchmark). Until this post I didn't actually know one was YC and the other wasn't. I'm interested in both purely because of HN posts.


We don't do that - it's important that the interest in these things be community-organic. We're interested in tracking what the community is interested in, and we never try to gin up interest in anything (ok, except APL). It wouldn't work anyhow; that's not how a technology becomes influential, at least not in this community.

I put the OP in the second-chance pool (https://news.ycombinator.com/pool, explained at https://news.ycombinator.com/item?id=26998308), but that was according to the usual 5-second standard of "I think the community might like this one". I didn't look closely enough to tell whether the article was positive or negative towards Fly.io, nor is it our job to care about that. What we care about is intellectual curiosity (see https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...).

There's one big exception to the above, which is the official Launch HN posts we do for YC startups - those are described at https://news.ycombinator.com/newsfaq.html - they get official placement on HN's front page, as explained there. But they're always clearly indicated by "Launch HN".

I personally find it super interesting which startups end up achieving HN darling status - the classic examples of this are Stripe and Cloudflare - I'd add Hashicorp - and it would be fun to make a list of others. But from a moderation point of view it doesn't matter whether such a startup is YC-funded or not and we try to be as neutral as we can that way. I'm not saying we don't have unconscious biases or conflicts of interest (such a claim is impossible!) but we're strict about how we approach this consciously, and we have quite a lot of practice with that.

It's a natural concern of course, and I'm always happy to answer questions.

p.s. Incidentally, there's at least one HN user who has made a long series of accounts angrily accusing us of secretly favoring a non-YC startup (the one mentioned in the GP comment). We haven't, but because that startup is well-loved by the community, it's easy to see how it could come across that way, with threads appearing frequently and filling up quickly with comments.


HN: one of the last remaining Great Good Places of the Internet, a lone tavern in an iconic gateway town to the now not-so-wild west.

Beyond the western borders of this little town, the tech gold rush has both expanded to epic proportions, affecting all the economies in the world, and also gone through enough booms and busts that the phrase "gold rush" seems somehow off.

As more and more young'uns join and jaded veterans return to throng the tavern alike, it often seems to be on the brink of either exploding with the largest gun fight in history, or jumping the shark.

And yet, against all odds, it retains its original magnetism - drawing throngs that grow in number and diversity while seers like patio11 (https://news.ycombinator.com/user?id=patio11) and tptacek (https://news.ycombinator.com/threads?id=tptacek) continue to return - dispensing worldly wisdom worth its weight in gold from corner tables.

The secret is the man at the corner of the bar @dang, always around with a friendly smile and a towel on his shoulder. The only sheriff in the west who still doubles as the friendly bartender: always polite, always willing to break up a fight with kind words and clean up messes himself.

Yes, a cold, hard look from him is all it takes to get most outlaws to back down; yes, his Colt-45 "moderator" edition is feared by all men. But the real secret to his success is his earnest passion (some call it an obsession) for the seemingly sisyphean task of sustaining good conflict - letting it simmer but keeping it at all times below the boiling point, based on "the code":

"Conflict is essential to human life, whether between different aspects of oneself, between oneself and the environment, between different individuals or between different groups. It follows that the aim of healthy living is not the direct elimination of conflict, which is possible only by forcible suppression of one or other of its antagonistic components, but the toleration of it—the capacity to bear the tensions of doubt and of unsatisfied need and the willingness to hold judgement in suspense until finer and finer solutions can be discovered which integrate more and more the claims of both sides. It is the psychologist's job to make possible the acceptance of such an idea so that the richness of the varieties of experience, whether within the unit of the single personality or in the wider unit of the group, can come to expression."

May the last great tavern in the West and its friendly bartender-sheriff live long and prosper.


fly.io didn't do phoenix. They hired its creator Chris McCord, but Phoenix was already an established product.


litestream looks super solid. would definitely consider it for a new project, if appropriate.


My personal feeling (based on what I upvote) is that ycombinator isn't getting enough quality writing about tech issues to fill the front page, so if it's "full of fly.io", that just means there isn't enough stuff about other systems at the moment.

Same reason that for a while the world seemed full of Rust articles -- at that point in time there wasn't (speaking as a C++ programmer) a pile of quality C++ articles around that the Rust articles were pushing out.


It seems like a combination of HN "top of mind" (ie that HN users submit articles on things they're currently reading/researching), social "top of mind" (ie that people write blog posts on things their peer group is talking about), and a side effect of HN positional age-decay.

If there aren't new and different articles to fill the HN front page, then there has to be something.

And that something ends up being a base layer of blogs about "current stuff".


I regularly notice that when I read an HN-linked article that has some solid links, maybe even sending me down a rabbit hole of related pages on a new topic, when I resurface and return to HN I find a few of those related pages on HN themselves.


If there's significant, good, on-topic technical content that isn't getting posted to HN, for god's sake someone please let me know.

If it's getting posted to HN but not getting traction, for god's sake someone please let me know that too.

Literally the only thing we're trying to do is have HN be as interesting as possible. Missing out on the best content is disastrous for that goal—sort of like missing out on the best startups is disastrous for an investor.


We agree! It's flattering and it's super interesting to read what people think about us and we're all blushing about the "heir to the vast Heroku fortune" stuff (even if it's probably not true), but we're also cringing a bit.

We have a big announcement/technical post queued up --- we'd planned to run it on Monday --- and we're holding off on it because of the "organic" attention we're getting this week. We'd much rather talk about things like Litestream, app pentests, hiring processes, and how we replaced Nomad in our architecture.

But we're as aware as everyone else is that the front page has limited bandwidth, and we can't be on it all the time, so we're waiting for this (hopefully short) wave of attention to crest before we post our own stuff.


It's related to all the negative press recently around Heroku (security concerns plus Salesforce neglect). People had a lot of goodwill towards that PaaS, and much of it reflects what a PaaS does for you amid the current AWS or k8s complexity around devops. So they are looking for a replacement, and fly.io looks like one of the more innovative, or at least best-marketed, options in this space.


For better or worse, fly.io has as a principal tptacek, who's at the top of the HN leaderboard and so has built up a lot of goodwill here.


And to be fair, Fly's blog content is very good for this audience (which is largely the result of tptacek).


This actually makes me wonder if people here generally pay attention, who posted what? Does that influence actual upvote status?

I never look at the person name when replying or voting, only the content. For example, I remembered this tptacek not because I remembered his posts, but because they get frequently mentioned in other people posts.


> Does that influence actual upvote status?

It didn't for me. I also don't look at who posted links. But fly.io is leaning into a concept I want to use and want to become more of a thing, and that's using SQLite for more web scale applications. Because there's web scale of Google and there's web scale of the rest of us, and I'm sick of paying the operational overhead of having a full database cluster when the app doesn't utilize its features any more than it would a SQLite db.

But in the past everyone turned their nose up at SQLite because they were cool and you were dirty and gross if you wanted to simplify things.

Don't you know we just need horizontal scaling because any second now we're going to get more than 10 write requests per second?

But fly.io leaning into litestream for replicating those databases is a thumbs up to using simple boring technology (SQLite), while still getting 80% of the benefits of doing containers hosted in a cloud platform.


A lot of people on HN do pay attention to who is posting.

This is why Cloudflare is generally beloved here as well: the principals show up in the comments whenever it's mentioned.


Fly isn't even that great; it's just that everything else is much worse. I use it and I'm not surprised other people do as well.

There's also a lot of momentum for the Elixir/Phoenix right now, and they're pretty tightly integrated with that community.


I agree but in a wider sense as well.

There are an awful lot of programming languages and methods that receive no hype. I have wondered about this; the most likely reason is that the majority of the crowd at the site come from a shared sphere, and to some degree share the same type of priorities.

Personally, I like to stay away from the bleeding edge technology.

That does not constitute much of a problem for a lot of companies.

The annoying part is that recruiters often cram all sorts of technology into requirements for a CV and a job, even if the client has no need for those things, at least not yet.

There are millions, or at least hundreds of thousands, of "enterprise" software projects out there.

I do admit it is not as sexy as the latest and greatest at startups, but it is a field where a whole lot of devs work.

They perhaps do not spend as much time on HN, or they are quiet.


Fair question. I’d rather see more of this than culture-war and politics.


fly.io is a solution but I don't know what the problem is. I think of it like a more dynamic CDN that can have more compute capacity (deploy whatever, backed by SQLite/Postgres) to serve customers far more quickly.

Most applications, or at least the bigger chunk of them, are transactional, and enterprise software is all about consistency and accuracy.

Nevertheless, I think it's a great engineering feat in and of itself, which probably explains why it gets discussed so often.


<disclaimer: haven't actually used Fly.io>

The usecase I had for my startup (a few years ago, before fly.io) was "I have an Elixir/Phoenix application - stuff working in the background plus web frontend". I would like to host it with as little thinking about individual servers, load balancers etc. I went with GAE at the time and it was fine.

Fly.io seems like a much more streamlined version of the same thing, with the addition of "global load balancing" stuff on top, if I got to the point of caring about international customers.


I mean, that's a cool aspect useful to some, but I spent a day getting a Laravel app running on there, all in one region, and that's fine; I'm looking for a place to move my hobby apps off Heroku right now.


The last time I remember such a run on HN was when we created Docker. It was on the front page or mentioned in front-page comments almost daily for at least 6 months (late 2013 to early 2014).

Fly.io has a great combination of user virality/momentum and fundamentally technically interesting content on a wide range of topics.


Given that Fly.io has been part of YC W20, and that they make some interesting technology choices (eg all-in on sqlite), it creates more traction on this site.

Typically this type of thing goes in phases, and I wouldn’t worry about it, assuming you’re already OK with HN being biased towards YC-funded startups.


A big competitor of theirs, tailscale, also does well here.

I think the lesson is partly that the typical somewhat-deranged writing style/topics are popular. More companies should try to write engaging blog posts and be more open if they want to be successful.

It seems to have paid off for them as I would guess at least some of the people trying it out are learning about fly.io from HN.


> A big competitor of theirs, tailscale, also does well here.

How does a container/database-as-a-service platform compete with a WireGuard-as-a-service platform?


We do not, at all, unless that commenter knows something about Tailscale that we don't.


I’m clearly wrong here. I’m not sure why I thought what I wrote above.


I think the perspective of this article is a very healthy and productive one and I found it particularly useful.

Assessing whether to stand on the shoulders of the new giants who rise up every few years is a difficult problem that I'm interested in, and I appreciate the amount of context given here covering different use cases.


Generally, it is all YC hype. HN is biased towards YC startups.

There are other alternatives like Render.com, railway.app, etc but it is clear that fly.io is unsurprisingly overhyped by the HN crowd, especially if you are looking for a Heroku alternative.

It’s like asking a barber if you need a haircut.


Give credit where it's due:

fly.io spends a tremendous amount of time on creating interesting technical content that attracts this type of attention. The company is intentional about this as a customer acquisition strategy. They have an illustrator on staff for their unique art style, for example. Their founder and senior technical staff engage with these posts and answer questions, etc. It's not YC favoritism, it's a deep understanding of the developer-first mindset / ecosystem and targeting it as a company strategy.


That's incredibly generous of you, and it's true that our illustrator fucking rules, but if there are other startup people wondering why we do well on HN, I think it's actually really simple: we write for HN, not for our own marketing goals. One of the first rules in our style guide is that our model reader is never going to use Fly.io, and that our posts still have to be worth their time. I think that's all there is to it? If you can clear that bar, you're all set. Tailscale does that, too, and so does Cloudflare.


That's an excellent guideline. I want to throw another example into the ring if folks are researching this space:

Planetscale does a good job at https://planetscale.com/blog

In particular "Generics can make your Go code slower"[1], which deservedly received a lot of attention here.

1 - https://planetscale.com/blog/generics-can-make-your-go-code-...


Just a quick note that the list of applications for Fly.io at the end of this post was taken from our Launch HN --- https://news.ycombinator.com/item?id=22616857 --- and we've changed (expanded) since then.

When we launched, we didn't do persistent storage for instances, so it didn't make as much sense to run ordinary apps here; rather, the idea was that you'd run your full-stack app somewhere like us-east-1, and carve off performance-sensitive bits and run them on Fly.io. That's "edge computing".

But a bit over a year ago, we added persistent volumes, and then we built Fly Postgres on top of it. You can store files on Fly.io or use a bunch of different databases, some of which we support directly. So it makes a lot more sense to run arbitrary applications, like a Rails or Elixir app, which is not something we would have said back in March 2020.


> But despite how much I want to learn the fly.io platform – it has been a bit tricky for me wrap my head around a good use-case for this type of distributed hosting service.

Worth noting that you don't have to use the distributed aspect. I have my site hosted on a single one of a fly.io's smallest instances (which one can get 3 of for free), and even like this the performance is excellent (50ms response times), and it doesn't have the problem of spinning down when not in use like Heroku's free tier.

It's nice to at least get a choice of regions. For example, the company I work for (not hosted on fly.io currently) only has customers in the UK and Ireland, so it would be nice to be able to put our servers there with a simple config setting.


Same. I'm really impressed with the experience on there now that I finally spent a day trying it out. The geodistribution stuff had no interest to me so I'd avoided them till now, but it's really the underlying tooling and experience that has won me over.


This is an excellent point. While their main value prop seems to be "servers closer to your users" you could also just use them as a drop-in replacement for something like heroku and just use one region to simplify the mental model, pricing and orchestration.


Do many companies actually need databases geolocated near users?

I've worked on big and small projects/companies and that has never been a concern of ours.

I always imagined it to be something only the very very big players care about. And as a big player I would usually bet on a big partner like AWS, GCP, Azure. Or am I missing something?


Almost none.

I've built 3 adtech companies including all the tech and it's one of the few cases where data needs to be spread across global regions for latency and regulations. It's a lot of effort regardless of the underlying provider and not worth it unless you really have the scale and latency requirements.

You can receive an HTTP response from the other side of the planet in less than a second so server-side rendering and sending a single HTML page works just fine. The problem is actually all these client-side SPAs that make a dozen requests and are actually much slower because of it.


Companies start to get a lot more interested in this when their business truly goes global. Users in Australia have money to spend and get pretty poor performance from apps hosted in the USA due to speed of light issues.

I've looked at implementing this in the past and always found it to be SO difficult that the benefit would not be worth the cost.

Fly has changed that equation for me. It has moved this problem from "I'd love to do it if I could but it's just too hard" to "This is a thing I could do with small enough engineering effort that it would be worthwhile".

This is my favourite type of technology: I love things that move something from the "too expensive" to the "now feasible to implement" bucket!


Yeah, and I guess all my Australian hosted things are slow for you Americans who have even more money to spend


> who have even more money to spend

?

I'm guessing it's a joke about the government printing money, and the surrounding fiascos?


Web Scale was first a meme, then an ideal everyone pushes towards even when deploying their small-scale blog. Who knows what'll happen if suddenly you get a million concurrent users tomorrow? Better scale it geographically and put it behind a CDN today. Look at those generous free tiers.

Like you said, it's mostly snake oil except for very big players.


I'd say using stuff like Netlify or GH Pages for static sites is worth it even if you have zero traffic. They legit are much easier to use than setting up your own VPS.


This isn't quite the same thing as trying to scale like Google though. Low latency is very important for usability regardless of how many users you serve. How easy it is to achieve depends more on the geographical distribution of your users than on their number.

If my app has a handful of users that are split between the US and Europe or Asia, and the app is 90% reads, then the distributed DB approach of fly.io or Cloudflare makes a lot of sense. It also adds considerable complexity though, so it's obviously a tradeoff.


Our business has an API that can be used for displaying dynamic information at point of sale (i.e. dynamic in that it cannot be cached and will need a DB call).

While we encourage our customers to try and use us asynchronously, we have a number of enterprises that don't and therefore demand incredibly fast response times with low latency. They pay us accordingly, so as a result we have geolocated databases (in our case though, we are using AWS Aurora replication).


Have your projects had customers all over the globe and have you measured the user experience for those that are furthest from your database?

If so, locality jumps up to the top of the performance bottlenecks pretty quick and there is no amount of performance optimization you can do to fix it.


I’ve used fly.io for a couple new projects. The main thing I like about it is that it supports affordable and easy to use persistent volumes, something the container services (GCP, AWS) don’t. This lets me test locally with volumes in a way that is identical to how things will work when deployed. With the other container hosts, I’ve had to refactor to use cloud storage services like S3.

Fly.io also has a clean, highly usable CLI and minimal set of services unlike the hundreds of options on other providers. But that’s just icing on top—the volume support is the big advantage for me.


The end of this article raises the issue of whether Fly.io's USP, deploying app servers close to your users, is useful for run-of-the-mill web apps. And as much as I like Fly.io and the people associated with it, I've wondered this myself. It just seems like serving US customers from any major US data center is generally fast enough. And I think this might even be true for the world of HTML-over-the-wire web stuff, which Fly.io seems to be investing heavily in.

No doubt there are plenty of more niche uses (if I were serving users internationally, I'd probably use Fly.io), but the use case just doesn't seem as broad as the Heroku/PaaS comparisons make it out to be.


Not only that, having one location for a world-wide user base is usually enough. You can optimize much more through rendering speed, blocking requests etc than by being closer to your user.

And even if your page becomes really popular, 3 locations (Europe, US, East Asia) are enough to be <200ms to any user in the world. And it keeps your setup and cost much lower.


This pretty much fits our definition of "deploy app servers close to your users".

One region works just fine for some apps. Some are worth going to three.


Tired of SPA complexity? Try server side rendering, now with websockets, globally distributed nodes, read replicas, and eventual consistency!

All of this tech sounds cool, but like the author, I'm unsure when it's called for.


To tame the snark from the quoted comment, I think it's worth breaking down.

Current SPA trends are about deploying your app separate from the backend, often CDN-style close to the user (because speed of light matters). Most apps at scale use caching for reads on hot-code paths, so now we have "eventual consistency" in the mix.

Elixir is distributed out of the box, so while "global distribution" sounds fanciful, it's literally baked into the Virtual Machine, Fly simply gives us a private ipv6 network across the globe. All Elixir sees is a cluster of hosts that it can connect to, and it's off to the races.

What I'm getting at is that all this snark actually describes most applications folks build today at any scale, and we build distributed apps with Elixir because it's a distributed platform.

> All of this tech sounds cool, but like the author, I'm unsure when it's called for.

Imagine if you could write your dynamic UI with realtime updates, and you didn't have to bootstrap JSON apis, or GraphQL schemas for it. Imagine doing `PubSub.broadcast(room, "new_message", ...)` and it gets sent globally to all your instances – with no external dependency. Want to show some activity on the page when something happens on the cluster? Broadcast the event, then write 3 lines of code to update your UI. Imagine writing "naive" template code that renders some markup, but what falls out is smaller payloads on the wire than your carefully typed and specified GraphQL schemas that require serialization rules for all your objects. Imagine doing away with all that and gaining all the benefits of payload size.

If that sounds interesting, Phoenix + LiveView would be called for any time you wanted a dynamic UI or realtime updates, and bonus points if you care about writing less code and killing layers of abstraction. Fly would be worthwhile for the same reason folks use CDNs today, to serve resources of the app close to the user, except we just serve the app there instead.


It really is pretty cool stuff, and it may not be additional complexity for those who are already at the scale where they're dealing with CDNs and caching hot paths, as you say.

For the long tail of Rails/Django/Laravel apps sitting on Heroku or a pair of EC2s in Virginia, who are looking at SPAs with trepidation, I think the case is less obvious.

Sorry if the snark was excessive and thanks for your reply!


There's been a lot of talk about fly.io lately - it's clearly an awesome and exciting platform. But I'd have to agree with the author here that it doesn't solve the core problems faced by most web devs and web dev teams.

There are 3 relevant (for this comment) "performance layers" in building software:

- Cycle time of a team or of the project - this is affected the most by language/framework choice, DevOps infrastructure, and team working style - this should be measured in days/weeks

- Feedback loop for an individual dev working on a new ticket - this is based on the team's cycle time but in addition is really about the dev environment, team collaboration, how the team maintains quality, and how well-defined work is before being started - this should be measured in minutes/hours

- Performance of the software deployed in terms of response time to end users - milliseconds

Fly.io helps the most with category #3. But how often is that really the most important issue in choosing where to deploy your app? If an alternative made small sacrifices there (for example, went from 99.99% performance to 99%) but gained velocity for individual devs and the team to be able to ship better product more quickly, would the company/project be better off?

At Coherence (www.withcoherence.com) - disclosure that I'm a cofounder - we're laser-focused on a post-Heroku development platform that goes further than Heroku on categories 1 & 2 above (where I'd argue Heroku is still the gold standard) rather than focusing on category 3.

We're super early but in closed beta - if it sounds exciting please check us out and request a demo on the site!


> But despite how much I want to learn the fly.io platform – it has been a bit tricky for me wrap my head around a good use-case for this type of distributed hosting service.

The distributed features are there for when you need them – I don't think you have to use them. Or am I missing something?


A little note about read replicas and a problem I've discovered. It's often the case in code that you write a value to the DB and then immediately read it back, often via a different query somewhere else.

If you are set up to do some kind of round-robin read from the read replicas, you can often read back something different from what you wrote, since the value hasn't replicated to your read replicas yet. The solution is to use the write endpoint when reading after a write.

He says that here, but I just wanted to point out that it can happen inside an API and cause real issues with data.


This depends on the database and consistency level it’s enforcing. You can often configure databases to require an ack from the replicas before it returns, so that you’ll be able to read your writes. This obviously has a trade off with speed.

Some databases are cleverer about this. Things like Spanner and FoundationDB work differently so as to be fast to both read and write, but they’re much more complex to operate and use.

There is another quick trick though… if a client performs a write, set a bit in their session that causes all of their reads to come from the primary database for a short period, maybe a few seconds, just enough to cover the replication latency. This is a hack. It’s got a lot of downsides, but it’s a quick way to patch the problem if truly necessary.
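That sticky-session hack is simple enough to sketch. Here's a minimal, framework-agnostic Python version - the `SessionRouter` class, its method names, and the 5-second grace window are my own illustrative assumptions, not any particular library's API:

```python
import time

REPLICATION_GRACE_SECONDS = 5  # tune to your worst-case observed replica lag


class SessionRouter:
    """Route a client's reads to the primary for a short window after any write."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._sticky_until = {}  # session_id -> server-side expiry timestamp

    def record_write(self, session_id):
        # Called whenever this session performs a write.
        self._sticky_until[session_id] = time.time() + REPLICATION_GRACE_SECONDS

    def connection_for_read(self, session_id):
        # Reads go to the primary until the grace window has passed;
        # everyone else keeps hitting the replicas.
        if time.time() < self._sticky_until.get(session_id, 0):
            return self.primary
        return self.replicas[0]  # in practice: pick nearest/least-loaded replica
```

Note the expiry timestamp is computed server-side, which sidesteps clock skew in clients; in a real app you'd persist the flag in the session or a cookie rather than in-process memory.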


I think that's an interesting idea.

Here's a post that benchmarked multi-region Postgres (Elixir/Phoenix on fly.io): https://nathanwillson.com/blog/posts/2021-09-25-fly-multi-db...

According to the post, for some users (residing in Japan, with the primary instance located in Amsterdam), a query could take ~200ms (median). If multiple queries are performed for each request, that could mean 1 second or more per API call - not so great if that's the case for multiple seconds after each write. I think this would eventually lead to putting more code in stored procedures, raising the question: why not use a distributed DB like Fauna in the first place?

Alternatively, the replication problem could be accounted for in the app itself. E.g. the SPA or the edge instance could retry reads following a write until the change from the primary instance has propagated, and up until then pretend that everything went fine. In case a write isn't replicated within 10 seconds or so, show an error to the user and let them retry the write action. This could lead to duplicate entries, but I'd estimate the chance for that to be quite low.
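The retry-until-replicated idea above can be sketched as a bounded polling loop. This is a hypothetical helper for illustration - `read_after_write`, the `version` field, and the timeout values are all assumptions, not part of any framework:

```python
import time


def read_after_write(read_fn, expected_version, timeout=10.0, interval=0.25):
    """Poll a replica until it reflects a write, or give up after `timeout` seconds.

    `read_fn` returns the current row (or None); `expected_version` is whatever
    monotonically-increasing marker the primary acknowledged (a version column,
    an updated_at timestamp, a GTID...). Returns the row on success, or None if
    replication lagged past the deadline - at which point the caller should show
    an error and let the user retry the write action.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        row = read_fn()
        if row is not None and row.get("version") == expected_version:
            return row
        time.sleep(interval)
    return None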


Yeah, 200ms to the database feels essentially useless for a webapp. I'd normally expect <5ms for a typical app, and <1ms is ideal – that usually means the same rack or very close.

All this time adds up. For a moderately complex webapp, something like 10-20 queries isn't unheard of (even when you're careful about N+1 queries and caching). If each of those takes ~5-10ms, add the network latency, and you're at ~50-150ms, which is pretty much your whole time budget for most end-user-facing webapps.

This is one of the problems with compute at the edge, you have to be much smarter about these things because there are a lot more network roundtrips between your app and database than there are between your customer and your app. Edge replicas help but still complicate things.
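A quick back-of-envelope model of why this matters, assuming queries run sequentially and each pays one network round trip (the function and numbers are illustrative, not measurements):

```python
def request_db_time(n_queries, query_ms, network_rtt_ms):
    """Rough model: each sequential query pays execution time plus one round trip."""
    return n_queries * (query_ms + network_rtt_ms)


# Same-rack database: 15 queries at ~5 ms each with ~1 ms round trips.
near = request_db_time(15, 5, 1)    # 90 ms - within a typical budget
# Edge app talking to a distant primary: same queries, ~200 ms round trips.
far = request_db_time(15, 5, 200)   # 3075 ms - hopeless without batching/caching
```

The model makes the edge-compute trap obvious: the round-trip term dominates, so moving the app closer to the user while leaving the database far away can make things dramatically worse, not better.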


That's roughly what was done on the project in question. Transactions were set to always use the primary; that way you can read and write all you want within your queries.

As for this ACK you talk about: it's AWS Aurora MySQL specifically here – do you know if that's a setting you can configure?


Sorry no idea about AWS Aurora, but I think MySQL will do this itself.


Unfortunately, it won't. Traditional MySQL replication does not provide a built-in option for fully synchronous behavior.

You can optionally use MySQL's "semi-sync" replication feature, in which replicas ACK receipt of transactions. This is purely for durability though, not consistency: it ensures commits are durable even if the primary DB has an unrecoverable hardware failure, but without having the massive latency penalty of fully-synchronous replication.

With semi-sync, the replicas are essentially just confirming they've queued up the transaction in their local relay log, rather than acknowledging execution of the transaction. The replicas may still be lagging behind the primary, in terms of what transactions they've applied locally; this means they'll return stale reads.

That said, the story is completely different in AWS Aurora, which (by default) uses proprietary physical (storage-level) replication. Within one region, AWS docs say their lag is "usually much less than 100 milliseconds", but they also note lag depends on transaction volume, so it's unclear what sync/async tradeoffs they're making behind the scenes.

In any case, for a nice third-party implementation of read-after-write consistency in MySQL, ProxySQL has a really powerful feature: https://proxysql.com/blog/proxysql-gtid-causal-reads/


Fly.io, as sorry as I am to say, does not come close to the functionality Heroku offers yet.

Redis instances are single-region single-replica, for example.

On another note, as soon as they offer serverless functions and solid redundant Redis + SQL I'll be thinking about moving some of our production services over there for a test run.


Just so we're clear: our take right now is that if you want Redis, you should run Redis as an app. The Redis we provide "built in" to the platform is an artifact of an earlier iteration of Fly.io.


Yes - which unfortunately results in a single-instance, single-region Redis, as far as I understood.


Not sure I follow, sorry. If you want multi-region Redis, you can deploy multi-region Redis.


Now I am not quite sure if I follow haha. Sorry.

The documentation states,

> "to get a Redis instance with persistent storage running in a single region."

Would I be correct to assume that I would have to take care of configuring the clustering?

I'm obviously aware I could deploy multiple Redis instances in different regions, but then I would end up with different Redis instances, no?


Cloud Run is an alternative to fly.io that scales to zero so it can be much cheaper but with added cold starts.


But, to get a Cloud Run based setup:

- you need an ELB - and this is not cheap (if you have an efficient backend, the ELB will dwarf compute costs)

- no persistent volumes, and you are encouraged to use GCS or Firestore

- each region requires a new deployment. No big deal, but certainly not super easy to automate, especially given the need to run behind an ELB (which you need on GCP to have a WAF)

- Google SDKs for some languages suck big time. Most of the Python SDK is not async-friendly, unbelievable as it seems.

I do use cloud run for projects big and small, and rather like it, but its hardly a competitor to fly.io imho.


Yeah the load balancer bit sucks, I forgot about that. I use terraform for multi region deploys but yeah, the load balancer is a major cost.


Isn’t the whole appeal of fly that they’re geolocated near your users, reducing latency so you can use fancy serverside rendering stacks like Phoenix?

Cold start latency kinda ruins that no?


Latency on Cloud Run is not an issue, in my experience (I've been using it for around 4 years now, at a fairly large scale).

It's generally fast as is, and if you want to pre-allocate a minimum # of instances, cold starts are less of a problem (basically only if you are suddenly stampeded by a spike in traffic). But the whole smart-routing and localized-storage concepts are left for you to implement. You can have a bunch of Cloud Run services behind an ELB that does geo-proximity-based routing, but Firestore is region-bound, and Cloud Spanner can be very expensive. Not saying there are no workarounds, but it seems to me fly.io offers a much lower cost of entry here.


No – a Cloud Run instance supports 1000 concurrent requests and stays active for 15 min after no traffic, so only a small subset of users should ever hit a cold start.


Fly.io does promote a pattern for avoiding the distributed-write database complexity: request replay, a single main write database, and replicated read DBs. [1]

When a request that attempts a DB write comes in to a read server, the request is aborted and replayed on the main write server.

With some clever assumptions such as "GET requests rarely write to the db" and "POST requests usually do", much of the write traffic can skip the read VMs.
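That replay pattern can be sketched as a WSGI-style middleware in a few lines. This is an illustrative sketch, not Fly's actual implementation: the region codes are made up, and the 409 status is just a convention, since (as I understand it) Fly's proxy intercepts any app response carrying a `fly-replay` header:

```python
PRIMARY_REGION = "iad"  # assumed primary region code, for illustration


def fly_replay_middleware(app, current_region):
    """WSGI-style sketch: replay likely-write requests to the primary region.

    If this instance is running next to a read replica and the request is
    likely a write (anything other than GET/HEAD), short-circuit with a
    fly-replay header instead of touching the local read-only database.
    """
    def wrapped(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if current_region != PRIMARY_REGION and method not in ("GET", "HEAD"):
            # Fly's proxy sees this header and re-runs the whole request
            # against an instance in the primary region.
            start_response("409 Conflict",
                           [("fly-replay", f"region={PRIMARY_REGION}")])
            return [b""]
        return app(environ, start_response)
    return wrapped
```

The nice property is that the app code itself stays oblivious: write handlers only ever execute in the region with the writable database.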

They created a Ruby Rack middleware [2] to standardize this pattern for Ruby on Rails.

[1] https://fly.io/blog/run-ordinary-rails-apps-globally/

[2] https://github.com/superfly/fly-ruby


> Fly.io does promote a pattern for avoiding the distributed write database complexity

I think the author is talking about the complexity of dealing with read after write situations.


I'm enjoying fly.io so far.

I just dropped DigitalOcean because of their price hike. No hard feelings. I was barely using it, and the product is growing more towards full-featured apps and teams, which is not as good a fit for me, an individual just screwing around. I don't fault them. I'm not their target customer.

Fly.io is very much designed for use primarily via their CLI tool. Their web interface needs some polish. But it does everything it says on the tin, for a price that's more than reasonable.

I only used Heroku briefly so I can't comment on similarities or differences with any authority.

As someone who is already very comfortable with container-based development, I'm happy with fly.io.


I found render.com more convincing than fly.io (which looked more like a beta product with a prime-time landing page), with both of them not being anywhere close to making me jump from GCP.


I’m unsure why you would need HTMX and alpine. Phoenix I believe is capable of handling what both of those would provide, or perhaps I’m missing something?


I’ve got a deploy running on Fly.io, but I didn’t go with the buildpack option; instead I’m pushing a locally built docker image (buildpacks don’t support pnpm).

One big miss, though, is that you'll still need a database and S3, so I'm not sure I totally understand the value.


I think some things need to be built out more, like their Postgres offering and the ability to add more storage after creation, but in general it's really enjoyable to use. The pricing seems fair, and their blog is interesting and fun to read.


Completely agree. Their offering for running containers is great, but nobody wants to maintain a database. They should add DBaaS on their architecture with automatic backups etc.


I totally agree with this! This would really add so much value to fly.io. And if they don't want to allocate resources to this right now, I wonder if they could work with DBaaS providers like PlanetScale or FaunaDB, who'd WireGuard their nearest instances into the fly.io network, add CLI integrations/automations that link to their respective dashboards, etc.

I plan to use fly.io + PlanetScale and I'm hoping to still get low latency between those two services, but it's nowhere near the low latency Cloudflare can achieve with their new edge Redis/DB offerings (or the fly.io DB-at-the-edge strategies). After looking into fly.io's DB strategies, I really feel hesitant to take on that level of devops/additional engineering when something like PlanetScale provides so much value out of the box.

Hope the fly.io team has something in the works either way! (And I'd love if they chime in with any input here in terms of performance between fly.io and existing DBaaS providers that are regionally replicated by default.)


Noob question but don't Netlify and Vercel do this already at least for Next.js apps (cache to the edge, run serverless functions on edge nodes etc)?


Unrelated to this topic, but I think you should apply this one line of CSS to your stylesheet - it improves your text aesthetics and readability :)

html { -webkit-font-smoothing: antialiased; }


First off, avoid changing that property – it doesn't improve things.

10 years later, still relevant - https://usabilitypost.com/2012/11/05/stop-fixing-font-smooth...

Secondly, the issue may be the font being used. I don't recognize it, but it's probably not optimized for modern web screens, or it's a "converted" typeface originally designed for print.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: