Terraform Cloud is down for nearly five hours, and counting

EdwardDiego · on Aug 25, 2023

So glad my company pays for managed services so we don't have to worry about managing resilience and availability of critical infra ourselves...

alex_lav · on Aug 25, 2023

HashiCorp is really not doing hot lately in terms of headlines

erulabs · on Aug 25, 2023

I’ve done a lot of good work with Hashicorp tools for many many years. But I’m extremely excited to move off Terraform, Nomad, Consul, Vault and the rest.

Actually quite happy with the future: CDK, CDK8s… it’s a bit of a mess currently but at least there’s a strong path forward and tons of innovation.

mitjam · on Aug 25, 2023

Yes, I think the Terraform license change is an opportunity. For me, something like Crossplane is much more attractive than going to a community fork of Terraform: cloud native, a continuous reconciliation loop, all the k8s tooling available, an API (k8s) out of the box, and open governance, as it is a CNCF project.

oarmstrong · on Aug 25, 2023

What are you looking at for a Vault replacement?

erulabs · on Aug 25, 2023

Vault will be last on my list to replace. It's boring and it works, and its integration with K8s is actually much better than its integration with Nomad.

Eventually though, I'd like to move to AWS Secrets Manager.

oarmstrong · on Aug 25, 2023

I agree that it’s boring and works well. It’s also the one I’m most worried about finding a reasonable alternative for; we can’t just not update Vault.

Sadly Secrets Manager doesn’t hit all the features I need. Really hoping for an OpenTF-style fork of Vault.

rvz · on Aug 25, 2023

If this was Threads, Twitter / X, Instagram, etc the entire news world would be reporting in less than 10 mins if it was down for just 5 mins.

It looks like you can get away with it and let your service(s) have downtime for more than 5 hours if little to no-one is paying for your service or there are almost no users using it.

Their stock price being down 66% since IPO also suggests not only that Hashicorp's valuation was extremely inflated, but also that they are in (actual) decline.

atonse · on Aug 25, 2023

I regret buying their stock.

Their tech seemed good initially, so I thought everyone was going to need the multi cloud stuff.

But after using it and encountering how brittle it was, and how easily our Nomad cluster would go down without any way to recover, even though the major point of a cluster is high availability, I can see that the tech is just very brittle and the value of the stock doesn’t surprise me.

I’m relieved that I followed my instinct and chose not to deploy it in production even after we invested so much money in setting it up.

I really hope I’m wrong about the stock and the tech.

I like the founders and they’re clearly very smart but the company has grown too fast and likely let quality go down.

mardifoufs · on Aug 25, 2023

I think the founder left the corporation. Not certain about it though, can't find definitive proof (apart from not being CEO anymore, but maybe he still has some role inside?)

emptysongglass · on Aug 25, 2023

He said he wanted to be an IC again but now seems more concerned with writing his own terminal than with saving his own company. https://twitter.com/mitchellh/status/1694785322318712960?s=2...

pxc · on Aug 25, 2023

In June of this year (there are no snapshots for July), his Twitter bio included the phrase 'lover of open source'. Nowadays it says instead 'passionate about indie software' (as well as stressing that he's no longer in leadership at Hashicorp).

https://web.archive.org/web/20230614095444/https://twitter.c...

mdaniel · on Aug 25, 2023

> our nomad cluster would go down without any way to recover,

For my curiosity, was it Nomad or Consul that fell over? My experience with etcd leads me to suspect it was actually a Consul fire, but since I (thankfully) have never run Nomad I don't know first hand about its dragons.

atonse · on Aug 26, 2023

I’m trying to remember. I think it was Nomad.

Essentially, about once a week the raft pings between the nodes would result in no response from the primary, so then the cluster would assume it lost the node and try to hold an election to pick the new leader, and get stuck in a loop cuz it kept indefinitely trying to ask the leader for its vote.

I thought, surely this software isn’t that stupid. But it was.

The recovery documentation was to hand-generate a peers.json file. Seriously? In 2023, when you have a million ways to do auto discovery? Including in your own software? It couldn’t just auto heal?

I managed a MongoDB cluster ten years ago and never had a single issue like this. I could routinely take nodes down and bring them up and the cluster healed perfectly.

hijinks · on Aug 25, 2023

The amount of money they charge you for basically a nice looking Jenkins is crazy.