What One-person SaaS Healthchecks.io uses for hosting, hardware and software (healthchecks.io)
373 points by slyall on May 24, 2022 | 175 comments



I never realized that Healthchecks.io was a one-man show. I'm almost ashamed to say that I've been using the free tier for years. I moved from self-hosted to the free tier when I killed off all of my self-hosted services, and I don't have much legitimate use for it, having it attached to a backup of a Raspberry Pi running HomeBridge in my summerhouse :)

I do use it for a purpose it probably was not designed for :) In my summerhouse, one of the circuit breakers trips every now and then (1-2 times per year) for no apparent reason, and since the fridge is connected to that breaker, I very much fear arriving at a summer house where the fridge has been unpowered for a couple of weeks.

The solution to that, of course, was to set up a systemd timer on a Raspberry Pi (which also runs HomeBridge) that pings Healthchecks.io every 15 minutes. If it misses 2 pings, it raises an alert.
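
For anyone wanting to replicate this, the whole setup can be as small as a timer/service pair plus a curl call. A minimal sketch (the UUID in the ping URL is a placeholder for your own check):

    # /etc/systemd/system/hc-ping.timer
    [Unit]
    Description=Ping Healthchecks.io every 15 minutes

    [Timer]
    OnCalendar=*:0/15
    Persistent=true

    [Install]
    WantedBy=timers.target

    # /etc/systemd/system/hc-ping.service
    [Unit]
    Description=Send a liveness ping to Healthchecks.io

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/curl -fsS -m 10 --retry 5 https://hc-ping.com/<your-check-uuid>

Enable it with "systemctl enable --now hc-ping.timer", and configure the corresponding check with a 15-minute period and enough grace time to cover the second missed ping.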

So far I've had 2 notifications in the 2+ years it's been running, and both have been internet connection related, so monitoring the situation has apparently changed something :)


I use it for the same purpose as well. A friend left on a trip and his fridge lost power, spoiling thousands of dollars' worth of his small baking business's materials.

I set up a phone with Tasker: if the phone stops charging or fails to check in, it sends an alert.

Luckily, and as you say, monitoring changed the situation, as it has never failed since.


No fair! You changed the outcome by measuring it!


Pipe down, Farnsworth!


"The doctor is in" or "It doesn't do that when a mechanic looks at it"


> I very much fear arriving at a summer house where the fridge has been unpowered for a couple of weeks.

Another solution to this problem is to just empty the fridge and turn it off while not in the summer house for a few weeks.


We don't keep easily perishable goods in there like meat or vegetables, but some of what we keep does require cooling to stay fresh.

> just empty the fridge and turn it off while not in the summer house for a few weeks

We tried that, but the fridge takes 6-8 hours to reach target temperature despite being brand new, and keeping food fresh without cooling for 10 hours is not easy.

It actually uses less power being powered on continuously for 2 weeks (12 days from departure to arrival) than it does being powered off and working overtime for 6 hours trying to cool down. It's not much less power, but still less.


> It actually uses less power being powered on continuously for 2 weeks (12 days from departure to arrival) than it does being powered off and working overtime for 6 hours trying to cool down. It's not much less power, but still less.

I suppose you mean energy rather than power? (Cause the argument makes less sense with power.)

That's an interesting phenomenon then, since it goes counter to my (rather naive) understanding of physics. There could be some anomaly with the fridge's efficiency curve that causes this. Maybe it's really inefficient cooling down the contents from room temperature while being really efficient maintaining the cool temp? After all, the latter is its main purpose..


> Maybe it's really inefficient cooling down the contents from room temperature while being really efficient maintaining the cool temp? After all, the latter is its main purpose..

Normal power consumption of it (assuming 18C room temperature) is less than 0.4 kWh/day. That amounts to 4.8 kWh in a 12 day period. Cooling it down from 18C takes 5.6 kWh (measured).

Temperature varies, of course, and the lower the temperature, the lower the power consumption; the heat pump keeps a minimum temperature of 12C. During long summer days, the indoor temperature can easily reach 26C, which will require more power, and I have not measured every possible scenario.

I'm not ruling out that actually cooling down the thing during winter will require less energy than keeping it running, but I'm fairly certain that doing the same from 26C will require more power than keeping it running.


> Cooling it down from 18C takes 5.6 kWh

This seems like a lot. It's 20 megajoules, so even at a COP of 1, it's enough energy to cool down 20 megajoules / (4182 J/kg/K) / 18 K = 265 kilos of water by 18 kelvin.

And a refrigerator is supposed to attain a COP better than 1, and I suspect you have less thermal mass around than 265 kilos of water.

Something is wrong-- either in measurement, or heat exchangers very badly occluded by dust, etc, or some other fault.


It does sound like a lot, and I’ve only measured once.

The fridge is in a cabinet, enclosed on 3 sides, so maybe it’s not letting enough hot air out ?

I could be wrong, but I think it was stated as using 184 kWh / year (at room temp 22C, optimal conditions), which is just over 0.5 kWh/day, and considering that the summerhouse is usually cooler than that (12-16C) when unoccupied, I would assume it used even less power.

As it is, for this entire week, the heat pump has (self) reported using 1.8 kWh/day, and according to my power company, my daily consumption has been 1.8 kWh/day. Considering that there are 5-7 IoT things plugged in, the fridge probably isn’t using much power.


> The fridge is in a cabinet, enclosed on 3 sides, so maybe it’s not letting enough hot air out ?

Seems likely. Once it succeeds in warming its cabinet, a big share of that heat leaks back into the fridge. (And efficiency falls because of the bigger temperature difference).


Insulation can be really good nowadays.


You would always expect to be better off letting the refrigerator warm up, because:

* Every bit of warming that finds its way into a powered-off refrigerator is heat the refrigerator would need to take out anyways. And the heat flowing in declines as the inside warms up.

* There's standby power, etc.

* A refrigerator is expected to be more efficient removing heat when the temperature difference between inside and outside is less.


As a counterargument, imagine an infinitely good insulation and 0 standby power (and a spherical cow inside, for illustration purposes). Then keeping it "running" is clearly better than letting the fridge warm up, since power draw is 0 in the former case. Obviously the conditions are not as perfect, but with a very modern fridge, maybe you get sufficiently close that keeping it running indeed is superior.

I'm not arguing that that's the case here. I'm arguing that you don't "always" expect things to be worse off letting it run. There is a threshold, when approaching the mentioned fantasy conditions, beyond which letting it run may be better.


> imagine an infinitely good insulation

Then it would never warm up, and at this limit the two cases are equivalent (powered and unpowered).

> when approaching the mentioned fantasy conditions, beyond which letting it run may be better.

Anywhere short of the limit, what I said holds. Anything better than infinite insulation, I don't know how to reason about.


I think you're missing the point.

Perfect insulation doesn't help if you open the door?

If you're saying 'well with perfect insulation it doesn't matter if it has power' - well yeah. It doesn't matter if it has power. So you aren't wasting any either leaving it on.


Heat losses are proportional to the temperature difference integrated over time, and the refrigerator is more efficient in taking heat from the inside when it is close to the temperature of the room. This means that you are better off letting it warm up, because this results in both types of losses decreasing.

This continues all the way up to perfect insulation, though ultimately it's breakeven with perfect insulation.
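
To make the two effects concrete (idealized Carnot COP, temperatures in kelvin, UA the cabinet's overall heat transfer coefficient), the electrical power needed just to hold the inside at T_in is roughly

    P_{hold} \approx \frac{Q_{leak}}{COP}
            = UA\,(T_{room} - T_{in}) \cdot \frac{T_{room} - T_{in}}{T_{in}}
            = \frac{UA\,(T_{room} - T_{in})^2}{T_{in}}

Both factors shrink as the inside drifts toward room temperature, which is why letting it warm up can only help, right up to the break-even case of perfect insulation.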

> Perfect insulation doesn't help if you open the door?

This doesn't change the picture, but worse, it's just completely unrelated to the scenario. Who's opening the door when everyone has left for weeks?


Uh, no. The heat pump may be more efficient during the period it has the largest temperature differential, but that approach ensures it has the largest possible heat differential to deal with up front. If you don't do that (aka keep it on), there is far less total heat to move, unless you'd have it sit so long that you'd get the same total heat transfer. Which if the insulation is good, may take many weeks.

Which seems unlikely in this case.

My chest freezer for instance turns on for less than 30 minutes a day even with people getting in and out. When I defrost it, it takes a day+ to cool down again.

The reason someone would leave the door open is to ensure no one accidentally left a jug of milk in there when they turned it off. Which would pretty much neutralize any possible gains in sheer grossness if nothing else.


> you don’t do that (aka keep it on), there is far less total heat to move, unless you’d have it sit so long that you’d get the same total heat transfer.

I think you're a little confused. I'm a little lost at even how to minimally explain it to you. The rate at which heat leaks into the refrigerator declines as the refrigerator heats up, so this metric too improves with time.

> Which if the insulation is good, may take many weeks.

Implausible. "Many weeks"?

The best 20 cu ft refrigerators use about 375 kilowatt-hours per year according to energystar.gov. At a coefficient of performance of 10, that's an average of 400W flowing into the fridge and needing to be removed by the refrigerator.

At 20K of temperature difference, that's about 20W/degK. The contents will warm very quickly, absent phase change.

Having ice in there will slow things down until the ice melts; then it will warm very quickly. The typical numbers cited say you have 2-3 days with a full freezer without power before things have mostly thawed and become unsafe-- this is well over half of the way to room temperature in terms of energy budget (and in the best case of a full freezer where there's the most thermal mass inside).

(Specific heat capacity of water ~4200 J/(kg*K); heat of fusion of water ~334000 J/kg.)

None of this changes the energy balance of the problem, though.


Sounds like a win-win to me. Either installing the thing alerts you to the problem, or installing it makes the whole problem disappear. Either way it'll be fixed.


I think it was designed exactly for this purpose: monitoring.


> I very much fear arriving at a summer house where the fridge has been unpowered for a couple of weeks.

Usually there's little any single person can do, but this waste of electricity is very selfish considering climate change.


Compared to the office buildings leaving their entire office illuminated all night even though the parking lot is empty, this is less than minuscule. In the grand scheme of things, yes, it matters. But start with the big fish and you'll be less hungry.


While I agree with the overall sentiment (fix the big problems first to maximize payoff), the issue is one of attitude. Keeping a "save where possible" mindset, no matter how large the immediate payoff (within reasonable limits) has a much greater chance of success.

You have that effect in many areas of life. A quite obvious example is working out. Of course it won't make much of a difference if you skip today's workout or abort your 40 min run halfway because you don't feel like it. But it's the mindset/attitude. It makes you a quitter. It enables you to do the same thing again. And again. And in the end, the cumulative damage of the change in mindset is orders of magnitude worse than the single event.


> Keeping a "save where possible" mindset, no matter how large the immediate payoff (within reasonable limits) has a much greater chance of success.

Ironically that is what kept me from setting up solar power. Being in Scandinavia, solar has a somewhat limited potential given that days are 7 hours long during winter, and December often has less than 20 hours of sunshine in total. My calculations for the "payback" time of the solar panels said that _MAYBE_ they would have saved something before their "expiry date" some 20 years into the future.

Considering the "energy and material waste" involved in creating solar panels, this task can be handled by energy companies much more efficiently than i ever can.

I am still considering a share in a windmill though :)


This isn't the same thing - you've actually calculated that you're unlikely to save in this situation. You made a sensible decision that the payback period is so long for your own uses that it runs the risk of not actually paying back/being beneficial. The "save where possible" mindset would be more like "I calculated it would be better, but my impact is less than my workplace doing X, so I won't".

Everyone should be calculating payback on these investments every couple of years, because the calculus changes depending on incentives/tech/other factors (e.g. EV purchases, or massive cost of living changes)


So did the other poster - the energy to cool it from room temp to what they need exceeded (or nearly so) the energy to keep it running.


As a Norwegian looking into solar for my garage, all the products available now produce power when it's cloudy, snowy, rainy etc.

First Google result: https://www.energy.gov/eere/solar/articles/busted-common-sol...


I’m not saying it wouldn’t produce energy, and it would probably also contribute most of the year, but the time where I need the most energy (heating and light) is also the time where it is least likely to produce any.

We have an annual power consumption of 6-8k kWh, with about half during winter (heat pump, yay). More energy is required during night time due to dropping temperatures and it being dark, so realistically, I could probably save about half.

At normal energy prices of €0.3/kWh, that means I would save €1200/year, and that’s without taking into account reduced electricity taxes for electric heating.

Taking those into account, all electricity used beyond 4000 kWh/year would cost around €0.12/kWh, in which case I could save €480/year, still assuming I could save 4000 kWh/year out of 8000 total.

A solar panel installation _without_ batteries is about €13500, and saving €480/year means it would take 28 years to get back the investment.

I'm aware batteries can change that equation somewhat in my favor, but it's even harder to calculate anything usable with those, as it depends on whether the battery can get charged during the day. In any case, a solar panel installation with batteries is around €20000.


Yes, the ROI takes years to achieve but the alternative is to keep throwing money at the power company (and some of that energy will likely be from coal nowadays).

Since I bought an electric car I've been drooling over the idea of having the car powered by the sun. I go into the office only on occasion so I don't need to charge fast either. But you need an available roof and an okay topography.

But in Scandinavia, I bet most energy goes to heating so improving insulation and adding ground thermal will probably be more efficient. The idea of a sun powered car is just so enticing, however. It's kinda sci-fi.


> but the alternative is to keep throwing money at the power company (and some of that energy will likely be from coal nowadays).

This is Denmark, where a large part of our power grid is based on renewables, and backup power is based on natural gas. There is no coal involved. When we import power it's from Norway/Sweden or Germany, so either renewables or nuclear. We're not at Norwegian CO2 levels (yet), but it's getting better :)

You can check for yourself here : https://app.electricitymap.org/map

> Since I bought an electric car I've been drooling over the idea of having the car powered by the sun.

Denmark has outlawed sales of new ICE cars from 2030, so at that point solar will make MUCH more sense, as I can charge my car for "free", and the time to ROI will be much shorter.

> But in Scandinavia, I bet most energy goes to heating so improving insulation and adding ground thermal will probably be more efficient.

I thought so too, but currently, in no small part due to the Ukraine situation, lots of people are replacing oil and natural gas heaters with heat pumps or central heating, and the central heating providers are installing large heat pumps as well.

As part of getting a heat pump, my house (built in the 1970s and renovated in the 2010s) was assessed at a total heat demand of roughly 13000 kWh. Assuming a heat pump with a SCOP of 3.5, that means I will need about 3700 kWh of electricity to heat it, which means our total electricity consumption ends up around 7500 kWh, so about half the energy (in my case) goes to heating.


Or, with how people work, harping on the little things causes them to run out of shits to give for even the big things and most will tune you out.

In my experience, anyway.


Define waste.

We tested it, and with our usual pattern of using the summerhouse every 2 weeks, the brand-new fridge uses less power staying powered for the 12 days between departure and arrival than it does cooling down when powered on at arrival.

Despite being brand new, it takes 6-8 hours to get from 18C to 5C. I guess because it's brand new it is well insulated, so less cooling power is needed once it's cooled down, which translates to a weaker compressor with lower power consumption.

As for the Raspberry Pi, it runs idle for 99% of the time, consuming _at most_ 1.15W. That amounts to 0.8 kWh in a month, which is about as much as a TV uses in 2.5 hours, or about a quarter the power consumption of a Sonos One Speaker sitting idle [1].

The total power consumption of the summerhouse uninhabited is ~1.8kWh/day, which includes the heat pump, fridge, internet modem and router, Raspberry Pi, Security Camera and a couple of hubs for IoT.

[1]: https://support.sonos.com/s/article/256?language=en_US


Definitely the case that the newer more energy efficient coolers and freezers use smaller heat pumps. It can be a real problem if you keep it empty and add a bunch of ‘warm’ thermal mass too, as it can take a day or so to get it down to safe temperatures again in some pathological cases.

I started leaving a gallon jug of water (frozen) in mine as it helps give it enough thermal mass to even out the spikes.

My chest freezer maxes out at 180 watts TDP for instance, which is pretty anemic if you’re trying to freeze a pot of room temperature stew.


> Define waste.

Owning a "summer" house?


But the winter lodge isn't really that nice in summer.


> this waste of electricity is very selfish considering climate change

Moving yourself to the summer house every two weeks probably has far more climate impact than the fridge.

Taking to a logical conclusion: you shouldn’t have a summer house at all. (That’s not a position that I hold, but one that seems more logical than “have a summer house; just empty out and unplug the fridge while you’re not there…”)


Minor usability suggestion that literally no one will likely care about but I notice it all the time:

If you're writing a blog post about your service then chances are you're interested in driving people to the main website about that service?

So...make it easy for those potential users to get there. At the top of your blog have a prominent "what is [service]" link to the marketing home / about page or a call out box.

This guy does it as a CTA after the post which is a start but many, many people don't. That then leaves me and other potential users to have to figure out how to get to the homepage. I'll do it but you'll lose many others.


Seconded. I'd never heard of healthchecks.io and was curious enough to strip off the blog-related parts of the url to get to their main site (not totally trivial on a phone browser).

I almost didn't bother, but I'm glad I did because it looks like a really useful service that I might have uses for. I'd assumed that it was a healthcare-related business tbh.

(Also: super-informative post.)


Indeed it's very annoying when company blogs have their logo or homepage nav link go to the blog's home page instead of their main landing page.


I'm someone who likes configuring & running my own servers, and it is always a pleasure reading how other small / solo businesses are running their show.

It is so cool to see a one-person SaaS business running on old-school bare metal servers without needing any fancy DevOps / containerization tooling.


From what I observe, developers prefer to stay within API abstractions rather than spending time on servers and other infrastructure. I can partially understand their choice.


What do you mean? The economics and decision-making are different for one-person companies versus small and large organizations. When you have to know how to do everything, and fix anything, complexity and lack of control are a problem. As organizations get larger, there is more value from abstractions and more leeway for their use.

In many ways it is a luxury to be able to use said abstractions and be able to open a ticket with someone else. Of course, a single-person startup could use the same tech, but it seriously impacts their costs and they don't have time to deal with the sprawl of abstractions.


I believe it's the opposite? In a large org, you can have a dedicated team managing your servers. When you're a one-man business, you want to abstract away as much as possible to focus on the value proposition.


True. Although more often than not it's the opposite in my experience. Large organizations can afford to spend the money on GCP/aws/Azure and deploy thousands of services. They can also afford the money to hire infrastructure engineers to maintain such services. Also, in large companies everything needs to be implemented "the same way", so no matter if you want to deploy a tiny service or a huge one, you must do it the same way other services in the company are deployed.

Whereas when you are a one-man business, all you have is (free) time and prefer to spend as little money as possible while your business is not profitable. This means: walk away from dedicated databases, managed servers, huge aws bills, etc. You do everything "the old" way: rent a server (bare metal or cloud) and maintain it yourself.


In theory, a one-man business should "abstract as much as possible". In practice, it doesn't work out like this because you are responsible for everything about the business and it's easier to actually operate and maintain the relatively small number of bare-metal servers that run the business. Your advantage as a solo developer is that you can move fast at any time, and it's harder to do when you are encumbered with abstractions. That's not to say build versus buy isn't an evaluation that happens. There are strategic bets, maybe on analytics, or things that help, but aren't core to what you build.

This pattern plays out with many indie app developers as well, both now with iOS apps, and back in the day with Delphi shareware. The cost of taking a dependency on a 3rd party library turns out to be much higher for the solo developer than larger orgs. That's not to say they don't make bets, but they don't build the way that someone at a larger org would do because they have to maintain a higher degree of direct control.

Marco Arment (https://marco.org/) has elaborated a lot on this topic. You can also get a feel for the limits on the upper bounds of his developer experience. Many on HN are not willing to work at a low-level with such boring and dull technologies.


That fancy DevOps/container tooling is easier to maintain and work with than running on bare metal servers.


I think you and the people disagreeing with you are both right, in a sense. These tools can absolutely simplify and improve even a small deployment…for someone with significant experience using them. But there’s a lot of learning that has to happen to get to that level of experience. And for a solo operator or small team without that experience, investing that time into learning may not be the best investment.

You see this contextual disagreement a ton here on HN. Those who have taken the time to learn these tools and have practical experience using them for real deployments see the alternative as primitive, error-prone and fundamentally limiting. Those without the experience see the tools as overly complex distractions. Both are true, depending on your situation. Like almost all tech decisions, there are no universally correct answers. The best decisions are those tailored to the specific circumstances.


The more tools you have, the more difficult it is to maintain yourself. In a larger team with dedicated DevOps you might be right. But in my case, where I maintain frontend, backend, databases, security, hosting and the servers, I cannot have too much complexity. And I find container tools too complex to fully understand as a full-stack software developer (not an expert in anything). Same for heavyweight frontend or backend frameworks. I like to keep it simple without too many abstractions on all layers. So far I've managed to build apps used by millions of daily users for about 10 years.


When there are just 9 servers and a single developer like in the article your opinion doesn't matter.


Is that some kind of universal truth, or just what you think?


It's not easy to maintain, I can tell you.


I said easier, not easy.


I don't see how it is easier to maintain.

I follow a simple rule of thumb - the more abstractions you introduce, the more complex it gets to maintain.

Life is easier if you can get away with fewer abstractions.


Very cool setup.

I know it's a bit silly, but anyone building out cloud services should look at this post and think about how easy it is to set this up. This is just slightly more complex than a static site, has a bit of heterogeneity, but otherwise is a lot of known tools.

Can your stack provide all of this without breaking the bank for the people signing up? Can it be done without having to learn a bunch of bespoke systems? I think that Heroku's success in particular is totally down to managing this bespoke-ness balance (that is still kinda missing from container-based setups).


To be fair, the author is using several SaaS apps for core parts of his business: monitoring, email, DNS, etc. He's just "self-hosting" the more expensive parts (e.g, the EC2 parts) to get more for his money. This is great if you feel comfortable sysadmin'ing and dba'ing servers. A lot of people don't feel comfortable doing this and prefer to use ELB instead of HAProxy, RDS instead of DB servers, etc. To each his own.


> A lot of people don't feel comfortable doing this and prefer to use ELB instead of HAProxy, RDS instead of DB servers, etc. To each his own.

But you gotta pay more than $500, way more.


> This is just slightly more complex than a static site

That's because their product is only slightly more complex than a static site.

I wish every project was as basic as a web server and database but it's pretty rare these days.


> I wish every project was as basic as a web server and database but it's pretty rare these days.

I feel like the overwhelming majority of web based software projects can be catered for with exactly this setup.

If it's rare it's because everybody's drunk the Kool-aid


Majority by what metric? Solo projects? Enterprise? What? That’s an incredibly broad brush that doesn’t match my experience at all. Could it be you just haven’t been involved in many complicated projects?


Maybe in the past, sure. But these days people rely far more on their web apps.

Having a 1-2 day outage because your architecture is not highly-available across data centres like this one just doesn't cut it. And multi-day outages are very much real because of the thundering herd effect as companies rapidly try to move between data centres exhausting capacity.

And much of the complexity in modern day architectures come from high availability and security.


It's already multi-datacenter, since the servers are distributed across multiple datacenters on the Falkenstein campus (and they also have other locations in their ecosystem).


The post mentions he picked multiple Hetzner DCs to put his servers in


Multiple DCs in the same "park" equivalent to an availability zone.

i.e. they would likely be sharing the same power and network infrastructure.

So not redundant by any normal definition.


> I wish every project was as basic as a web server and database but it's pretty rare these days.

Is it? What functionality are you thinking about that absolutely require more than that?

I find that most ideas even today can be solved with a web server + DB, while developers today like to over-engineer things and think about scaling a service for a million users while they haven't even figured out the value proposition of their product/service yet.


The last couple places I worked also had a queueing system for async execution/background tasks.

A lot of apps use an "AI" or "machine learning" component which usually just means a second heterogeneous internal service.

For a little bit larger companies it's pretty common to have a data analytics or data warehouse component.

In addition, a web server + database usually require some monitoring system. After you have more than a handful of servers, probably also log aggregation.

If you're just a single or handful of people working on something, you probably don't need any of that.

Once you get a little bigger (or downtime becomes expensive) those things get tacked on quickly.


If you’re programming autonomous weapons for fighter jets it tends to require a little more than a web server, or so I’ve heard!


Well, I'd say it's more rare to work on autonomous weapons for fighter jets than not, wouldn't you agree?


Depends where you live :)


Now I'm curious, where do people live where it's more common to write software for any type of weapons compared to software for anything not weapons?


I was being a little facetious, but there are some DoD/military industry towns where the majority of code will be C/C++ based stuff.


I fail to see how AWS helps with that though


Also pragmatically choosing data centres in one physical location (even though this is a health checking service) rather than immediately going for highly available Aurora Serverless multi-AZ or something that would inflate the bill 10x.


Most SaaS companies are simple CRUD apps.


No they aren't. Even seemingly basic ones like Notion hide a lot of complexity.

What you're seeing is survivorship bias where the SaaS companies that are doing well are the ones that can provide a simple, clean UI whilst adding the functionality that you need for the product to be useful.


What happens when the datacenter catches fire? (OVH in 2021).

If you're running a site like this, it's fine to be down for a bit, people will forgive you as you restore from backup (looks like a very solid foundation the site has).


The nice bit about this is that you can deploy the same thing everywhere.

on other bare metals, or on AWS, azure, ...

As long as you have database backup, git code, and build-scripts.

I run something similar (although smaller). My disaster recovery is: I have everything prepared to go on AWS, if necessary. It would take less than 1 hr to be back up (database size being the biggest time sink). If I wanted to minimize that time, I could have a small DB replica running, so I would just have to replay the last day of WAL files. But for my purposes, everything less than a day is good enough.

And then you can take a few days to find something cheaper to migrate to.


Pretty much this. I can quickly set up an entire new environment for my SaaS in under an hour; on a good day probably in about 10 minutes. It will run on any Linux environment: AWS, DigitalOcean, Linode, Azure, what-have-you.

At my last job we had one of those hyperfancy devops setups with all the fancy devops tools. Literally no one in the company knew how to spin up a new environment. I'm not exaggerating: no one knew how to run a dev environment and when they had to set up a new region for legal purposes it took weeks for the team. All of that was initially set up in an age of legends in the mythical past of a year and a half ago.

Theoretically it would all fail over and scale to the moon and back. Emphasis on theoretically, because no one seemed to understand how it all worked, so who knows how well it would behave when it failed. There was actually some major downtime in my time there which was rationalized away as "growing pains", but in my opinion a major factor was just that no one really understood how it all worked.

If something goes wrong with a simple system then often the diagnosis and fix is simple (in this case: just deploy a new environment). If something goes wrong with a complex system then all of this is much harder.

Not saying you can't use these tools correctly or that there isn't an appropriate use for it, but it's a good case study on how the complexity can spin out of control if you're not careful in how you apply it.


The bus factor is 1. It's much more likely that the dev can't do something for some reason than that the datacenter catches fire.


> What happens when the datacenter catches fire? (OVH in 2021).

He has daily backups for that reason - those can be changed to hourly or even every 15-mins if needed.


His servers are distributed over multiple datacenters at the Falkenstein location of Hetzner.


I'm really proud of this project. Personally, it has been a case study for me for a long time. It's the absolute epitome of pragmatic engineering solutions that actually work, turned into a SaaS and an open-source project. Well done!


I've been using healthchecks.io for years now. Awesome tool. Small scope, does the things it intends to do really well. Can highly recommend.

I'm surprised that the machines are that large, there must be a bunch of people using the service nowadays :) I remember the setup being even smaller in previous blog posts. But I really like it. Small, easy to reason about. Makes debugging when stuff goes wrong quite easy.

I'm a bit surprised that there's no config management listed though - seems like it really is just Fabric + a bunch of scripts. But hey, if it works, great!


>A dedicated laptop inside a dedicated backpack, for dealing with emergencies while away from the main PC.

This line here shows how much thought has gone into "what if..." situations. Bravo!

I too have a similar setup - I have a contractual responsibility to my clients, and if anything goes wrong with my main equipment, I can resume ASAP. This includes a dedicated smartphone for hotspotting if my internet connection dies.


A business bug-out bag, or a BCP in a backpack.


Can you share more details about what the 4 HAProxy servers are doing? I run a SaaS too, but on a single beefy machine handling web and DB stuff. I'm curious what I'm missing out on not using HAProxy in front.


To answer the question of why four of them:

* the traffic from the monitored systems comes with spikes. Looking at netdata graphs, currently the baseline is 600 requests/s, but there is a 2000 requests/s spike every minute, and 4000 requests/s spike every 10 minutes.

* want to maintain redundancy and capacity even when a load balancer is removed from DNS rotation (due to network problems, or for upgrade)

There are spare resources on the servers, especially RAM, and I could pack more things on fewer hosts. But, with Hetzner prices, why bother? :-)


I understand having multiple proxies, but it seems a bit odd to have more proxy servers than web servers, no? Do they also do caching so that of the up-to 4,000 requests/second, a lot of them never reach the web servers? Presumably the Python code is more resource intensive than HAProxy.


I didn't realize you were getting that many requests per second. That's quite remarkable when you only have 18 app cores available. In my experience, you're lucky if you can get Django to service a request in 10ms. So I'd expect closer to 40 cores to handle that level of traffic. I suppose since it is spiky, and not a particularly latency sensitive application, you can get by with fewer cores.


I've got a small program written in Golang for handling the HTTP "ping" requests. There's also rate limiting; not every request makes it past NGINX.
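
For anyone unfamiliar with NGINX rate limiting, the pattern typically looks something like this (a generic sketch, not the actual Healthchecks configuration; zone name, rate and upstream address are made up):

    # in the http {} block
    limit_req_zone $binary_remote_addr zone=pings:20m rate=10r/s;

    server {
        listen 80;

        location / {
            limit_req zone=pings burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://127.0.0.1:8080;   # the small ping-handling service
        }
    }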


Damn, I never thought about it, but I guess the next cron/scheduling system should really be configurable down to the second, and if it's not defined, set a random value between 0-59 seconds as a default. That might make overall internet traffic and global compute use less spiky.
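
For what it's worth, systemd timers already have a knob for this, and classic cron can approximate it with a sleep. An illustrative sketch (the job path is hypothetical; note that a bare % must be escaped in crontab entries):

    # systemd timer: add up to a minute of jitter
    #   [Timer]
    #   RandomizedDelaySec=60
    #
    # classic cron equivalent:
    SHELL=/bin/bash
    */15 * * * * sleep $((RANDOM \% 60)); /usr/local/bin/backup-and-ping.sh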


Why uWSGI? Every time I've talked to someone who uses it and got them to try gunicorn, they've seen quite a big performance improvement for no code change and switched.


In the post he says they're for SSL termination and for rolling upgrades of the servers behind them.
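
For illustration, that pattern in HAProxy looks roughly like this (a sketch, not the author's actual configuration; names, addresses and the certificate path are placeholders):

    frontend fe_https
        bind :443 ssl crt /etc/haproxy/certs/site.pem
        default_backend be_app

    backend be_app
        balance roundrobin
        option httpchk GET /
        # for a rolling upgrade, drain one server at a time via the runtime API:
        #   set server be_app/app1 state drain
        server app1 10.0.1.11:80 check
        server app2 10.0.1.12:80 check
        server app3 10.0.1.13:80 check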


This is such a great blog post. Succinct and to the point. Read it from start to finish without pause even though the stack used (which is great!) is only remotely applicable to my workflow.


Another happy healthchecks.io user here, for many many years. The service has always worked great for me and I root for Peteris. Can happily recommend.


I notice the author is using Braintree for payments. Is it a good option for accepting payments from global audiences?


I'm also curious why this over Stripe?


When I started, Stripe was not yet available in my country, Latvia (it now is).

Personally I've had good experience with Braintree. Particularly their support has been impressively good – they take time to respond, but you can tell the support agents have deep knowledge of their system, they have access to tools to troubleshoot problems, and they don't hesitate to escalate to engineering.


You should consider Backblaze for backups; it is much less expensive and has an S3-compatible API.


One thing I just realised is that OP could replace

> HAProxy

> NGINX

> SSLMate

with Caddy [0], which would reduce the number of dependencies and the complexity involved in the infra.

I've been using Caddy for a very long time now and it has been working out so well.

[0] https://caddyserver.com


(Shameless plug)

Here's a tiny PAAS that can be used if Caddy is too complex.

https://github.com/mardix/sailor

Sailor is a tiny PaaS to install on your own servers/VPS that uses git push to deploy micro-apps, micro-services, and sites with SSL.


Sailor looks really cool! Just took a peek at the codebase and it already seems very interesting compared to other self-hosting PaaS solutions.


He can replace them with NGINX too I believe.


NGINX doesn't have first-class support for automatically obtaining SSL certificates; you would still have to rely on a third-party service/dependency for that.

Whereas with Caddy you would solve all 3 problems with a single tool.


I'd like to hear more about your usage of SSLMate and sops.

For SSLMate I hardly understand what problem it solves (even if we forget about having Let's Encrypt free certs).

For SOPs - how it's integrated in your flow.

Thanks in advance.


SSLMate is a certificate reseller with a convenient (for me) interface – a CLI program. It's no fun copy-pasting certificates from email attachments.

I'm using both RSA and ECDSA certificates (RSA for compatibility with old clients, ECDSA for efficiency). I'm not sure, but it looks like ECDSA is not yet generally available from Let's Encrypt.

On sops: the secrets (passwords, API keys, access tokens) are sitting in an encrypted file ("vault"). When a Fabric task needs secrets to fill in a configuration file template, it calls sops to decrypt the vault. My Yubikey starts flashing, I tap the key, Fabric task receives the secrets and can continue.
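
In code, that pattern can be as small as something like the following (a sketch under assumptions: Fabric 2.x, a sops-encrypted JSON vault, and a hypothetical task name):

    import json
    import subprocess

    from fabric import task


    def load_secrets(vault="secrets.enc.json"):
        # sops decrypts the vault; with a PGP key on a YubiKey this is the point
        # where the key starts flashing and waits for a touch
        out = subprocess.run(["sops", "--decrypt", vault],
                             check=True, capture_output=True)
        return json.loads(out.stdout)


    @task
    def render_config(c):
        secrets = load_secrets()
        # ...fill in the configuration template with the decrypted values
        print("secrets available:", sorted(secrets))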


Thanks for clarification!


Great write-up. I would love to hear more detail on how WireGuard is set up.


I use vanilla Wireguard (the "wg" command and the "wg-quick" service). I set up new hosts and update peer configuration using Fabric tasks. It may sound messy, but works fine in practice. For example, to set up Wireguard on a new host:

* On the new host, I run a Fabric task which generates a key pair and spits out the public key. The private key never leaves the server.

* I paste the public key in a peer configuration template.

* On every host that must be able to contact the new host, I run another Fabric task which updates the peer configuration from the template ("wg syncconf").

One thing to watch out for is any services that bind to the Wireguard network interface. I had to make sure that on reboot they start after wg-quick.
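
A rough sketch of what such Fabric tasks might look like (illustrative only; the interface name wg0 and the file paths are placeholders, and a root connection is assumed):

    from fabric import task


    @task
    def wg_keygen(c):
        # generate the key pair on the new host; only the public key is printed
        # and leaves the server
        c.run("umask 077 && wg genkey | tee /etc/wireguard/privatekey | wg pubkey")


    @task
    def wg_sync(c):
        # upload the rendered peer configuration, then apply it without
        # disturbing established tunnels
        c.put("wg0.conf", "/etc/wireguard/wg0.conf")
        c.run("bash -c 'wg syncconf wg0 <(wg-quick strip wg0)'")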


Thanks for taking the time to respond!


> HAProxy 2.2 for terminating TLS connections, and load balancing between app servers. Enables easy rolling updates of application servers.

Wonder why he is not using Hetzner's load balancer [1]. At least cost-wise, the savings are huge (4x AX41-NVMe are around €160; LB31 - the most expensive one - is €36).

[1] https://www.hetzner.com/cloud/load-balancer


That product has a serious lack of docs, and I know from experience some off-cloud vendor balancers have crappy architectures with low overall throughput limits. Their product could be a shared CPU VPS under the hood or something deeply baked into their network architecture, guilty until proven innocent


The load balancer is at the core of the product reliability. That seems a good reason to completely control it instead of relying on a black box.


It's kind of interesting that there's more horsepower/cores in the load balancers than in the downstream web/app servers...4xAX41 vs 3xAX41. So I imagine something is different about the workload there versus a typical load balancer.


I believe the load balancer is pretty new and it is probably the only service that allows connecting to dedicated servers -- which is separate from their cloud offering.


Hetzner's LB doesn't really scale. LB31 should be based on CX31 which is just 2 vCPU compared to the 12 threads of a single AX41 dedicated server.


I tried to use it out of curiosity, but the documentation was really lacking and the workflow for creating the networks is bad.


> Wireguard for private networking between the servers.

I'm curious: doesn't Hetzner provide some sort of private virtual cloud? (Digitalocean calls that VPC)


They do for their cloud instances [0], but not their dedicated server offerings.

I've seen on their custom solutions page [1] that you can pay extra to have your own private interconnect or managed switches, but I've never used them nor heard much about them.

[0] https://www.hetzner.com/cloud - see "FEATURES" [1] https://www.hetzner.com/custom-solutions


They do provide private networks now, but AFAIK they didn't in 2017. Today, one could use “vSwitch” or “Cloud Network”, or both, to set up private networking. But, if you want to easily encrypt all traffic, Wireguard might still be a good idea.


Good to know.

> But, if you want to easily encrypt all traffic, Wireguard might still be a good idea.

But, if one is already using https I guess encrypting the traffic would be unnecessary, no?


My guess: https hits HAProxy from the external world; afterwards everything is just http. There is no need to encrypt internal traffic.


Sure, but then why use WireGuard to encrypt http internal traffic if one can just let the internal traffic be https :D


Hardware redundancy aside (and the good cost efficiency at Hetzner), reading the title, I'm getting the impression the bus factor at healthchecks.io is 1.

If I'd run a one man show type of business, I'd love to have some kind of plan B in case I'd be incapacitated for more than half a day. THAT would be an interesting read to me.


Yes, the bus factor is 1, and it's bugging me too. I think any realistic plan B involves expanding the team.


Sveiki! @cuu508, how do you think open-sourcing the self-hosted version of your product has impacted your sales? Positively, negatively?


I can't say definitively, but my gut feeling is positively.

A. What if another operator takes the source code, and starts a competing commercial service?

I've seen very few (I think 1 or 2) instances of somebody attempting a commercial product based on Healthchecks open-source code. I think that's because it's just a lot of work to run the service professionally, and then even more work to find users and get people to pay for it.

B. What if a potential customer decides to self-host instead?

I do see a good amount of enthusiasts and companies self-hosting their private Healthchecks instance. I'm fine with that. For one thing, the self-hosting users are all potential future clients of the hosted service. They are already familiar and happy with the product, I just need to sell the "as a service" part.


> Elastic Email for sending transactional email.
> Fastmail for sending and receiving support email.

As a SaaS newbie, is it preferred to have two different email providers for transactional and support email? Why wouldn't one use Elastic Email for support email?


A SaaS platform usually generates transactional email by calling an HTTP API, and the service is usually outgoing-only (SMTP). It's not designed for sending ad hoc messages.

Also, transactional email services police their usage to ensure that they are trusted as sources of legitimate bulk email. It would be hard to get away with sending much transactional email via a normal email provider without having your account shut down.

In contrast, support email is handled by people and is bidirectional and ad hoc. Fastmail provides SMTP/IMAP/webmail, spam filtering, mailboxes, etc. and is a good match for handling support.


I respect that they use a combination of git tools just for version control. I understand that you can do everything with the git CLI, but my experience is that the more I tried to limit myself to the CLI, the messier my version control became.


This is fascinating. I am curious how sites like this (and others such as Pingdom, 'up time' checkers, etc) handle scheduled tasks that have to run at high frequencies? Cron on one machine? Celery beat?


In my case (updown.io), I use a "scheduler" process (one per machine) which runs the following loop: every 5 seconds it fetches the checks that need to execute in the next 5 seconds, ordered by time of run ascending. Then it iterates through them one by one, waits with a small sleep until the precise running time, and then pushes the job into a background job queue (Sidekiq / Redis in my case). This allows for very precise timing of execution even if there are hundreds of jobs per second, and a good distribution over time (instead of firing 100 jobs every second in spikes, it can schedule one every 10 ms for example).
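
A sketch of that loop (in Python here just for illustration; fetch_due and enqueue stand in for whatever storage and job queue you use):

    import time


    def scheduler_loop(fetch_due, enqueue, window=5.0):
        while True:
            cycle_start = time.time()
            # checks due within the next `window` seconds, ordered by run time
            for job in fetch_due(until=cycle_start + window):
                delay = job.run_at - time.time()
                if delay > 0:
                    time.sleep(delay)  # wait until the precise moment
                enqueue(job)           # hand off to the background job queue
            # sleep out whatever remains of the window
            time.sleep(max(0.0, window - (time.time() - cycle_start)))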


Healthchecks runs a loop of

10 send any due notifications

20 SLEEP 2

30 GOTO 10

The actual loop [1] is of course a little more complicated, and is being run concurrently on several machines.

[1] https://github.com/healthchecks/healthchecks/blob/09a99d3e9c...


Question for OP.

For Healthchecks' integrations (email, Signal, Discord, ...), how did you go about building those? I would love to hear about all that. I did find a section about the Signal integration.


Started with just the email integration, and added other integration types over time, one by one. A few were contributed as GitHub PRs.

The Signal one took by far the most effort to get going. But, for ideological reasons, I really wanted to have it :-) Unlike most other services, Signal doesn't have a public HTTP API for sending messages. Instead you have to run your own local Signal client and send messages through it. Healthchecks is using signal-cli: https://github.com/AsamK/signal-cli


How much does it all cost?


Hi, the author of the blog post here. I don't have a precise number, but somewhere in the €800/mo region.


Is the decision not to use Patroni for HA Postgres in this case, so that you don’t add more complexity?


Yes. Plus, from reading database outage postmortems, I was not comfortable making the "do we failover now?" decision automatic. Think about the brownouts, where the primary is still up, but slow. Or it experiences intermittent packet loss.

I've automated the mechanics of the failover, but it still must be initiated manually.
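
As an illustration of "automated mechanics, manual trigger", the entry point can be a single task a human runs deliberately (a sketch, not the author's actual tooling; the host name is a placeholder and pg_promote() assumes PostgreSQL 12+):

    from fabric import Connection, task


    @task
    def failover(c):
        """Promote the hot standby. Run by a human, on purpose."""
        standby = Connection("db-standby.internal")
        standby.run('sudo -u postgres psql -c "SELECT pg_promote();"')
        # ...then repoint the app servers at the new primary and reload them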


That's a good decision! Developers nowadays fear a failing machine so much because it happens more often in the cloud than on dedicated servers. I wouldn't use automatic failover either. Cleaning up the mess can take hours or days; a small downtime of a few minutes/hours is better.

I've seen one developer using Telegram for HA failover, which is great! Just a message and the scripts are executed for a failover. You can do it in seconds without even being at a computer.


A project I can relate to: we're doing the switch to a Patroni-managed DB (the first of many) today, in about 5 hours. I may share observations from almost first-hand experience later, if you care.


The monthly Hetzner bill is €484.


What volume of data are you storing in postgres? Any reason not to use a hosted postgres provider? At first glance it seems like a lot of compute for the service but curious about scale


Around 200 write tx/s as a baseline. Spikes to 2000 write tx/s at the start of every minute, and 4000 write tx/s every 10 minutes.

> Any reason not to use a hosted postgres provider?

* Cost

* Schrems II

* From what I remember, both Google Cloud SQL and AWS RDS used to have mandatory maintenance windows. The fail-over was not instant, so there was some unavoidable downtime every month. This was a while ago – maybe it is different now.


I don't think this works any differently from you manually updating your databases. GCP claims that maintenance downtime is less than 60 seconds.


Great post, thanks for sharing your experience.

Could you also share your strategy for marketing and acquiring new users?


> On app servers: [...]

Are "app" servers the haproxy ones or the "web servers", or other ones not mentioned?


App servers are the ones running NGINX + uwsgi + the Django app.


Great stuff.

Off topic: any pgDash alternatives for MySQL?


My concern with using Hetzner is the latency. Servers are cheap, but they are located in Scandinavia or Germany.

So by the time your browser makes an API call to a Hetzner server, there is a 100~170 ms overhead. Compare that to a hosting provider that has datacenters in US West and Asia.


Not only that. It's single-DC, and Hetzner is somewhat known for having serious DC-wide network failures from time to time.

The hardware used is consumer hardware too.

This is literally the opposite of what I would expect for a SaaS with uptime requirement... I feel like OP/the developers behind this might have made a really terrible choice and could be paying more in manpower than the savings they have by using a budget server discounter

Don't get me wrong, I have nothing against Hetzner, but this seems like an unfit use case. This is literally what you use cloud platforms for.


I run a one-person load-balanced SaaS on Hetzner. In the 2 years since I started I've had no customer-facing outages.

Over the same period, I think there's been two outages of AWS us-east-1 and a multi-region GCP outage. Plus weekly outages of a grownup unicorn SaaS like github or slack.

It's a tradeoff: the simplicity of such a setup reduces outage risk, and the costs saved can be spent on monitoring tools or extra redundancy.


Consumer hardware is totally fine if you use multiple machines - the probability of them failing at the exact same time is very low.

His stack already uses multiple appservers (so the loss of one wouldn't even be noticeable) and has a manual way of failing over to a DB replica.

It's often cheaper to work around node failures (which you often need to do for other reasons anyway - DC-wide outage, etc) than have more reliable nodes.


It's not single-DC (since they have multiple locations and multiple datacenters at those locations), but they are a single provider. However AWS is also a single provider. And as Facebook and large ISPs have demonstrated recently-ish, having a global footprint does not mean you cannot break everything at the same time in interesting ways anyway. Only way around this is to go with multiple vendors, but now whatever you build to use them at the same time/fail over between them will be the weakest link.

Hetzner also offers server-grade hardware btw. But if your setup is fully redundant anyway it doesn't really matter anymore if it's consumer grade or not (which the setup described in the article is).


We are discussing the service described in the article linked to with this post, which

1) runs single-DC
2) uses consumer hardware

I agree on the notion that with redundancy it is not as important what kind of hardware is used, but it still is something to consider (seeing how this is a one-person venture and server outages are still a huge time-sink), and I wouldn't even necessarily speak about true redundancy in this case


> uses consumer hardware

I run dedicated servers at Hetzner and the consumer parts are CPU and motherboard. Memory is ECC, NVMe SSDs are data center grade. How often do you see CPU failure?


The article explicitly said the servers are scattered across the various datacenters at the Falkenstein location - ergo not single-dc. Now you can argue about the amount of shared fate, but they are separate buildings and hetzner lists them as separate datacenters [0].

[0] https://docs.hetzner.com/general/others/data-centers-and-con...


Lol... okay, point taken, single-location.

Not going to argue about the shared fate, as I've seen more outages than not affecting whole locations..


I'm curious as to why they don't have US West, Central, or Asia data centers. Even with a US West / Central offering they could really do a number with their pricing discounts.

Even better if they offered Singapore & Tokyo.


Hetzner has built a DC in the US East where they currently offer their cloud products:

https://www.hetzner.com/cloud


I didn't know that, thanks for sharing. Have there been major cases recently?

Really puzzling that a healthcheck provider itself is not accounting for this potential issue.

Is there some sort of edge reverse proxy here? Doesn't seem like it. Even then there would be significant latency.


About a month ago Hetzner was hit by a UDP DDoS, and as a mitigation they throttled UDP traffic on specific port ranges.

This caused major problems for Healthchecks, as I'm using Wireguard for private networking, and Wireguard uses UDP. Here's a postmortem about this incident: https://status.healthchecks.io/en/incidents/dcbcDNd89LptHoWi...


https://status.hetzner.com/

We haven't been using them for anything in production, so I am not 100% sure - they have a backbone outage right now, however.

There are no edge / reverse proxies being described in the OP article, just a HAProxy configuration within the same DC.


Interesting... I'm also concerned that, with the recent geopolitical developments, Hetzner could already be in the crosshairs.


In case you are talking about potential DDoS attacks, yes, they have been hit hard by massive attacks the last few weeks.

They also do not have adequate protection systems in place


Hetzner has a datacenter in Ashburn, VA nowadays as well: https://www.hetzner.com/unternehmen/rechenzentrum/


I wonder how much time you need to spend to maintain such infrastructure?


With Ubuntu and automatic security updates you don't have to spend any time maintaining it. I haven't done anything in months; the system is applying security updates automatically.


Is this on by default? I just installed Ubuntu 22 in a new VPS and was just thinking about this.


See https://askubuntu.com/a/204 - also applies to Debian (and most likely other derivatives) too.
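
For reference, the mechanism on Ubuntu/Debian is the unattended-upgrades package; "dpkg-reconfigure -plow unattended-upgrades" enables it by writing roughly the following to /etc/apt/apt.conf.d/20auto-upgrades:

    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";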


Thanks, I did that already in fact, so good to know it is set up.


What software do you use to manage/develop against the postgres db?


Came, saw, read, bookmarked


Do you use Hetzner for DNS too?



