LetsEncrypt Certificate Issuance Halted (status.io)
151 points by Phreaker00 on Nov 25, 2021 | 56 comments



I'm fine with the outage. LetsEncrypt overall has been great, and they should take their time fixing whatever is wrong.


Yup. Great free service. Also, in practice this would only disrupt new registrations, as there is more than enough of a time window for renewals anyway.


This is why most Let's Encrypt clients start renewal some x days before certificate expiration. 'Sall good. ;-)


I think by default it is 30 days (at least for acme.sh), so you have renewal every two months and a month to fix any issues.
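To make the arithmetic concrete, here is a small sketch. It assumes 90-day certificates (Let's Encrypt's lifetime at the time) and a 30-day renew-before window; the function name is made up for illustration and is not part of acme.sh.

```python
from datetime import date, timedelta

# Assumptions for illustration: 90-day certificates and a client
# that starts renewing 30 days before expiry.
CERT_LIFETIME_DAYS = 90
RENEW_BEFORE_DAYS = 30


def renewal_schedule(issued: date) -> tuple[date, date]:
    """Return (first renewal attempt, expiry) for a cert issued on `issued`."""
    expiry = issued + timedelta(days=CERT_LIFETIME_DAYS)
    renew_at = expiry - timedelta(days=RENEW_BEFORE_DAYS)
    return renew_at, expiry


issued = date(2021, 9, 1)
renew_at, expiry = renewal_schedule(issued)
print("renew from", renew_at, "expires", expiry)
```

Under these assumptions the first renewal attempt lands 60 days after issuance, which leaves the remaining 30 days to notice and fix a failure.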


And I generally do it every month in case there is some unexpected issue.

1 more month for me to fix stuff (or wait for fix).


acmed uses 3 weeks (28 days), seems reasonable as well.


Sorry to nitpick, but 3 weeks are 21 days.


I guess that depends on how you count. If starting from 0, it is 3, or if you don’t count the last week as a full week, it’s also 3.


I’m not sure where you’re from; I’ve lived in various continents - nobody starts counting things like “weeks” from 0.


Birthdays are counted from zero. When you “turn” one, you’ve completed that year.


That's because "birthdays" we celebrate are technically birthday anniversaries. Hence starting from 1 year after the actual birth day


5/7 explanation


If you start weeks at 0, you should do the same for days, 3 weeks = 27 days


A week is 7 days. Three weeks is 3x7 = 21 days.


Also, calendars aren’t exactly the same everywhere. If you say “let’s work out sat and sun every other week”, in the US this results in a calendar where you have every other weekend off. On an EU calendar, this results in working out every week. Or something like that, it was pretty entertaining when my friend pointed out how subtly shifting the calendar to start on a different day results in a very different work out plan.

But back to counting. In the EU, if you go to the first floor, you’d call that the second floor in the US. It has one floor in the EU, but two in the US. I have no idea if this is the same thing or just a mistake. But there are some interesting assumptions in this thread!


3 weeks is pretty well known as 21 days.

Just because elevators and other things start at 0 doesn't mean that makes sense anywhere else. It's fairly common for ground level to be 0, but there is no instance where it makes sense to say 0 weeks = 7 days, 1 week = 14 days, etc. It does, however, make sense to say I live on the ground level (aka floor 0) or I live on the 1st floor (aka second floor).


I'll be honest, this is still better than some more 'professional' CA issuers, which sometimes just stop for a whole day. I hope that day is spent on audits and not because their update regime doesn't support on-the-fly (or virtually on-the-fly, by having two or more signing machines) updates.


Lest anyone think that such issues only happen to free providers, check out Sectigo's status page:

https://sectigo.status.io/pages/history/5938a0dbef3e6af26b00...

For context, Sectigo also provides freebies for cPanel customers.


They have, like, an incident every two days? That's utterly disturbing. Does anyone actually pay them?


For political reasons we buy a tonne of their certificates. It's not uncommon that I'll paste a CSR into the interface and just get a quality error like "Unhandled Exception", which basically tells me that they fixed it. Because back when it was Comodo I used to see full page stack traces.


The blue ones are planned maintenance and reading the contents indicates that it's mostly changes and fixes for the customer-facing UI. It may be a bit excessive to post all of these, but they certainly don't indicate any sort of problematic issue with their software.


Already restarted, was unavailable for 29 minutes. At the time of writing, performance is degraded.


... for less than half an hour.


Spoke too soon? Looks like they halted again.


Looks like they continued again


It still shows as degraded performance overall, so will probably be intermittent for a while. Pretty cool that they’re providing that level of transparency actually.


AFAICT users of Caddy would not have been affected, since Caddy can fall back from one CA to another. Pretty clever!

https://caddyserver.com/docs/automatic-https#overview
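The fallback idea can be sketched generically. This is not Caddy's implementation; `issue_with_fallback` and the two issuer functions are hypothetical stand-ins for real ACME clients.

```python
from typing import Callable, Sequence

# Hypothetical issuer type: takes a domain, returns a certificate string,
# and raises an exception when issuance fails.
Issuer = Callable[[str], str]


def issue_with_fallback(
    domain: str, issuers: Sequence[tuple[str, Issuer]]
) -> tuple[str, str]:
    """Try each CA in order; return (ca_name, certificate) from the first success."""
    errors = []
    for name, issue in issuers:
        try:
            return name, issue(domain)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all CAs failed: " + "; ".join(errors))


def halted_ca(domain: str) -> str:
    # Simulates a CA whose issuance is currently halted.
    raise ConnectionError("issuance halted")


def working_ca(domain: str) -> str:
    # Simulates a healthy CA; the return value is a dummy, not a real certificate.
    return f"dummy certificate for {domain}"


ca, cert = issue_with_fallback(
    "example.com", [("primary", halted_ca), ("backup", working_ca)]
)
print(ca)
```

The order of the list expresses the preference, so the backup CA is only contacted when the primary one fails.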


Wow the title got me worried. Luckily it's an outage not a shutdown.


I guess this really only affects those wanting to get new certificates for new (sub)domains.

For renewals, this is not a problem unless it's down for an extended period of time, and even then there would be time to switch providers. You should be using scheduled updates, and even if not, the email notifications come in at 20 and 10 days before expiry, so there is plenty of time to go and get it renewed.


I like Let's Encrypt's free certificates! But I don't like centralization where failure in a centralized service may render millions of websites inaccessible... It is somehow against the spirit of the "inter-net" where many independent networks and computers are connected and work even if some fail...


ACME is a standardized protocol (RFC 8555), there are more providers than Let's Encrypt, and you can switch transparently. That, combined with the standard procedure of renewing a few weeks before the expiration date, lets one handle even a total failure rather nicely.

Some other ACME providers I know of:

- ZeroSSL.com

- BuyPass.com

- SSL.com

(most of those provide free certs in some form, but some with limitations and may then ask for money if you want more features).

https://datatracker.ietf.org/doc/html/rfc8555


> and you can switch transparently

> ZeroSSL.com

I kinda wish people would stop recommending them. This might have changed, but last time I tried ZeroSSL (~a year ago) it was not RFC 8555 compliant (specifically section 7.3.1), and you were basically supposed to use their own proprietary API to deal with the issue. So you can't always switch transparently.

If you need an alternative use Buypass. Also free, and they're actually RFC 8555 compliant.


FWIW: I'm not recommending anything, just listing a few providers that claim to be ACME conformant. To be specific, I took the list from acme.sh:

https://github.com/acmesh-official/acme.sh#supported-ca

But it seems that acme.sh got bought by ZeroSSL, which would explain why it's their default now.

Out of honest interest, where did they fail to honor "7.3.1 Finding an Account URL Given a Key"?


> Out of honest interest, where did they fail to honor "7.3.1 Finding an Account URL Given a Key"?

Well... it doesn't work. Let me quote the RFC:

> If the server receives a newAccount request signed with a key for which it already has an account registered with the provided account key, then it MUST return a response with status code 200 (OK) and provide the URL of that account in the Location header field.

With ZeroSSL you could only call `newAccount` once; any subsequent call will fail, while according to the RFC it should return the URL of the account. So you have to either a) use their proprietary API to recover the URL (I sent them a bug report for this and that's what they basically told me), or b) save the URL along with the account key (which you don't have to do for any other ACME provider).
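For illustration, the behavior the RFC asks for can be modeled in a few lines. This is a toy in-memory sketch, not a real ACME server or any provider's code; the class name, the placeholder URL, and the use of a thumbprint string as the account key are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAcmeServer:
    """In-memory model of newAccount handling; not a real ACME server."""

    base_url: str = "https://acme.example/acct/"  # placeholder URL
    # Maps an account key thumbprint to its account URL.
    accounts: dict[str, str] = field(default_factory=dict)

    def new_account(
        self, key_thumbprint: str, only_return_existing: bool = False
    ) -> tuple[int, dict[str, str]]:
        """Return (status code, headers) for a newAccount request."""
        if key_thumbprint in self.accounts:
            # RFC 8555 section 7.3.1: known key, so 200 (OK) plus the
            # account URL in the Location header.
            return 200, {"Location": self.accounts[key_thumbprint]}
        if only_return_existing:
            # Lookup-only request for an unknown key: 400 (accountDoesNotExist).
            return 400, {}
        url = f"{self.base_url}{len(self.accounts) + 1}"
        self.accounts[key_thumbprint] = url
        # RFC 8555 section 7.3: new account, so 201 (Created) plus Location.
        return 201, {"Location": url}


server = ToyAcmeServer()
first = server.new_account("thumbprint-abc")
second = server.new_account("thumbprint-abc")
print(first[0], second[0], first[1] == second[1])
```

The point is that the second call is a cheap way to recover the account URL from the key alone, which is why clients for compliant CAs don't need to store the URL.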


Buypass saved the day last month. We were suddenly not working in older browsers due to the expired root cert, and it took just one server flag on the certbot invocation and now we are good until 2040. Yay ACME protocol and yay Buypass.


What's the issue with that? I use ZeroSSL and never had to use their non-ACME API.


Switching to ZeroSSL helped me when some clients failed to handle the root cert switchover Let's Encrypt did at, what, the end of September? The ZeroSSL root goes waaay back: https://help.zerossl.com/hc/en-us/articles/360058294074-Zero...


> you can switch transparently

Unless you use some sort of certificate (authority) pinning.


However, your pinning strategy, at whatever level, should have planned fallbacks. That can mean if you pin keys you have a second pinned key that exists only on a HSM in somebody's safe ready for emergencies, or in this case it means picking an extra CA and pinning them too, ready for such scenarios. If your pinning doesn't account for such things you sacrificed availability for security which is likely a bad choice.
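A minimal sketch of the "pin an extra CA" idea, assuming pins are base64-encoded SHA-256 digests of public key bytes. The helper names are hypothetical, and the byte strings are placeholders rather than real key material.

```python
import base64
import hashlib


def spki_pin(spki_der: bytes) -> str:
    """Base64 SHA-256 digest of a DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def chain_matches_pins(chain_spkis: list[bytes], pins: set[str]) -> bool:
    """Accept the chain if any key in it matches any configured pin."""
    return any(spki_pin(spki) in pins for spki in chain_spkis)


# Placeholder byte strings stand in for real SPKI structures.
primary_ca_key = b"primary CA public key (placeholder)"
backup_ca_key = b"backup CA public key (placeholder)"
leaf_key = b"leaf public key (placeholder)"

# Pin both the CA in use and a fallback CA chosen ahead of time.
pins = {spki_pin(primary_ca_key), spki_pin(backup_ca_key)}

# After switching CAs, the chain contains the backup CA's key and still matches.
print(chain_matches_pins([leaf_key, backup_ca_key], pins))
```

With only the primary pin configured, the same switch would lock clients out, which is the availability cost described above.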


Certs renew like a month before expiry if using the bot, so it would need to be down a long time before sites became inaccessible. (It was down for 20 mins)

There are also other services that offer this sort of thing, they’re just lesser known.


That's why we renew our certs more than one day in advance.


Some people renew them after someone calls to tell them that the page is inaccessible...

...not naming names, but I can see one above the bathroom sink.

(yes yes, I know, I know...)


I assume most people have a cron job to do that. The thing is, if it fails a number of consecutive times, then you won't be able to renew for a period of time, IIRC.


The cron job should always run.

They have a rate limit for the amount of renewed certs. This doesn't apply if you don't reach their servers or don't get a cert.

But if you have a problem with a CA, just switch to another one that supports your bot (e.g. ACME-compliant CAs).


Right, if your scripts screw up after the CA has issued a certificate, the CA's costs are already locked in (and thus Let's Encrypt's rate limits apply here), so you should make an effort to be able to recover the key and certificate if you fail rather than start over.

The private key only you have, so that's the thing you most need to avoid throwing away over and over due to a bug. If you lose the certificate, that's a public document, you can just get another copy manually if necessary.


If you renew 30 days ahead of schedule, your monitoring presumably goes red at least 2 weeks before expiry so you have plenty of notice to fix it.
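As a rough sketch of such a check, assuming renewal starts with 30 days left and monitoring alerts with 14 days left. Both thresholds and the status names are illustrative choices, not any particular tool's defaults.

```python
from datetime import datetime, timedelta, timezone

# Assumed thresholds: renew at 30 days left, alert at 14 days left.
RENEW_AT = timedelta(days=30)
ALERT_AT = timedelta(days=14)


def expiry_status(not_after: datetime, now: datetime) -> str:
    """Classify a certificate by how much validity it has left."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= ALERT_AT:
        return "red"  # renewal has been failing for over two weeks
    if remaining <= RENEW_AT:
        return "renewal due"
    return "ok"


now = datetime(2021, 11, 25, tzinfo=timezone.utc)
print(expiry_status(now + timedelta(days=20), now))
```

A certificate only reaches "red" after roughly 16 days of failed renewals, so a short outage never shows up there.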


I mean it still doesn't fully solve the problem, but I've set mine up so that it connects to both Let's Encrypt and ZeroSSL's certificate chain (with LE getting priority), and I'm considering adding BuyPass into the mix. I know that this isn't the true solution (some proposed a DNS-based system of sending public certificates, which unfortunately can be intercepted if your zone cannot use DNSSEC because your TLD manager didn't bother to support it).


> But I don't like centralization where failure in a centralized service may render millions of websites inaccessible

Clients are encouraged to renew their certificates a couple of days prior to expiration, precisely to make sure that in the case of a disruption there is still some buffer in time to prevent expired certs being served.


Standard practice is to renew 30 days before expiry. This gives you plenty of time to deal with issues.


No problem, since ACME is an open protocol, you can use multiple providers at the same time.

I didn't use them but apparently ZeroSSL and SSL.com issue free certs as well.


If only there was a way to use a different CA on renewal!


Things are under a lot of strain today. I noticed AWS lambda went down earlier today for 4 of my clients using completely unrelated stacks in different regions, but AWS status page was all green.


Here's a series of comments discussing why status pages are unreliable:

https://news.ycombinator.com/item?id=25213817

Edit:

My main takeaway is that it goes against the human instinct of self-preservation (losing one's job, opening the company to lawsuits, status pages being used against you by competitors, etc.).


The halt happened twice, but only lasted ~25 mins each time. It was back running before the arrival of most of the people that will end up getting to this post.


I'll be honest, this title worried me a lot more than the Facebook is down one.


Anybody who's affected by this clearly is too late in renewing =)



