What happened to Vivaldi Social?

mollems · on July 29, 2023

Great writeup (including the human cost, e.g. loss / lack of sleep, which in my experience has a huge impact on complicated incident resolution).

Here’s what jumped out at me: “The new account was created in our database with a null value in the URI field.”

Almost every time I see a database-related postmortem — and I have seen a lot of them — NULL is lurking somewhere in the vicinity of the crime scene. Even if NULL sometimes turns out not to be the killer, it should always be brought in for questioning.

My advice is: never rely on NULL as a sentinel value, and if possible, don’t allow it into the database at all. Whatever benefits you think you might gain, they will inevitably be offset by a hard-to-find bug, quite possibly years later, where some innocuous-seeming statement expects either NULL or NOT NULL and the results are unexpected (often due to drift in the semantics of the data model).

Although this was a race condition, if the local accounts and the remote accounts were affirmatively distinguished by type, the order of operations may not have mattered (and the account merge code could have been narrowly scoped).

chipbarm · on July 29, 2023

I finally made an account just to respond to this, I hope you don't find that too aggressive a move.

Null is a perfectly valid value for data, and should be treated as such. A default value (e.g. -1 for a Boolean or an empty for a string) can make your system appear to work where NULL would introduce a runtime error, but that doesn't mean your system is performing as expected, it just makes it quieter.

I know it's tempting to brush NULL under the rug, but nothing is just as valid a state for data as something, and systems should be written generally to accommodate this.

colejohnson66 · on July 29, 2023

I agree with you re: NULL being a useful thing. I personally use nullable floats in an internal company program to denote unknown values. However, the "billion-dollar mistake" everyone brings up with it has to do with NULL allowance being implicit. In languages like C/C++, Java, C#[a] (and more), any pointer could be NULL and the only way to know is to do a NULL check. In SQL (which we're talking about here), one must explicitly call out `NOT NULL` in the column's definition.[b] Rust (and other FP languages) gets a point here by having "optional" types one must use to have a NULL-like system.

[a]: C# is fixing this with "nullable reference types", but as long as it's still opt-in, it's not perfect (backwards compatibility and everything). I can still forcibly pass a NULL to a function (defined to not take a null value) with the null-forgiving operator: `null!`. This means library code still needs `ArgumentNullException.ThrowIfNull(arg)` guards everywhere, just in case the caller is stupid. One could argue this is the caller shooting themselves in the foot like `Option.unwrap_unchecked` in Rust, but "good practice" in C# (depending on who you ask) tends to dictate guard checks.

[b]: Which is kind of stupid, IMO. Why should `my_column BOOL` be able to be null in the first place? Nullable pointers I can understand, but implicitly nullable everything is a horrible idea.

chipbarm · on July 29, 2023

In SQL as you've said, nullability is explicit. It's arguably the wrong way around (i.e. NOT NULL rather than NULLABLE), but it is explicit. I feel the issue comes from the intersection of languages without explicit nullability and their data storage techs; removing that explicit typing from SQL doesn't fix the issue.

(I feel you agree with this btw, just being explicit)

kaoD · on July 29, 2023

What's the alternative, an empty string?

IMO the problem (at least in this case) is not NULL in the DB, but NULL at the application level.

If NULL is some sort of Maybe monad and you're forced to deal with it, well, you're forced to deal with it, think about it, etc.

Empty string, whatever NULL string is in your language of choice, or some sort of sigil value you invent... not much of a difference.

int_19h · on July 29, 2023

The problem is precisely that NULL is not some sort of Maybe monad, but people keep trying to use it as such. It's a lot like using NaN as a sentinel value for floats - sure, you can do that, but when something goes wrong, instead of an error at the point where the problem is, you end up dealing with a mysterious NULL somewhere way down the line. And that's the best case - the worst is that you get wrong query results because of the way NULL comparisons work.

An empty string is better as a sentinel value because at least this doesn't have the weird "unknown value" semantics that NULL does. But if you really want the same level of explicitness and safety as an option type, the theoretically proper way to do this in relational model is to put the strings themselves in a separate table in a 1:N (where N is 0 or 1) relationship with the primary table.

resonious · on July 29, 2023

It looks to me like using empty string would not have prevented the bug in the article. If their language had maybes, they might be able to prevent this bug by having a function type signature where uri is concrete. And most langs with maybes will automatically turn external nulls into maybes.

vbezhenar · on July 29, 2023

With empty string there's gotcha: Oracle treats empty string and null as the same values. If you really need sentinel value, generate UUID.

mst · on July 29, 2023

I believe that's a nasty implementation detail of Oracle and not a problem for anybody using anything else.

A generated UUID, if possible, is probably much more clear because it will only be there if it was inserted very deliberately.

But at that point honestly I'd usually prefer breaking the field out into its own table.

semiquaver · on July 29, 2023

Joins are cheap. Wide tables are often a sign that a data-model is a bit too CRUDdy. Foreign key relationships often do a much better job modeling optionality/cardinality in relational systems.

In this case, a `user_uris` table with non-nullable columns and a unique constraint on `user_id` is the first option that comes to mind.

ngc248 · on July 30, 2023

At scale ... it is better to use CQRS ... sou you have a transaction model which is fully normalized and a read nly model which is wide, if you really want to use a RDBMS.

kaoD · on July 29, 2023

For a table with a single column (plus FK), isn't LEFT JOIN isomorphic to just having a NULLABLE column (and much better at data locality)?

It might prevent error if you only rely on INNER JOIN but that's rarely the case at least for me (you often want to access the record anyways).

Much safer to deal with the NULL at the application level.

traxys · on July 29, 2023

I had a situation where I'm not really sure I could have used something else than null: I need a value in one of two columns exactly (meaning one is NULL and the other not).

You can build a constraint to check that if it's in the same table, but across tables it seems to be a bit more complex right ?

snmx999 · on July 29, 2023

You could have the first column indicate the type of value and the second column the value. If you now have your columns named "a" and "b" you could have the first column named "type" only allowing values "a" or "b" and the second column would be "value" where the actual value is stored.

tedunangst · on July 29, 2023

In this specific case, all the local users could have had URLs in the database instead of NULL (or empty string), which would have prevented them from merging.

mananaysiempre · on July 29, 2023

> What's the alternative, an empty string?

Yes! NULL is relational for “don’t know”, and SQL is (mostly, with varying degrees of success) designed to treat is as such. That’s why NULL=anything is NULL and not e.g. false (and IMO it’s a bit of a misfeature that queries that branch on a NULL don’t crash, although it’s still better than the IEEE 754 NaN=anything outright evaluating to false). If the value is funky but you do know it, then store a funky value, not NULL.

TylerE · on July 30, 2023

But a known value of empty string is a fairly commonplace thing, and I have never wanted to treat that as null.

mananaysiempre · on July 30, 2023

I’m not sure what to say here. ... Yes? If you’re referring to the Oracle behaviour where '' IS NULL, well, the rude way of putting it is that Oracle is doing a stupid. The more polite way of putting it is that Oracle is absolutely ancient and these parts probably existed long before people had the theory developed well enough to recognize which things make sense and which don’t, and now Oracle’s backward-compat-driven livelihood depends on not recognizing that it’s making no sense there. Either way, if this matters to you, you’re stuck and will have to work around this particular wart.

Tempest1981 · on July 29, 2023

A separate Boolean column?

kaoD · on July 29, 2023

Can't wait to have a NULL url and has_url TRUE.

Might or might not be based in production data I deal with on a daily basis.

btown · on July 29, 2023

> the account merge code could have been narrowly scoped

IMO automated merging/deduplication of "similar" records is one of those incredibly hard problems, with edge cases and race conditions galore, that should have a human in the loop whenever possible, and should pass data (especially data consumed asynchronously) as explicitly as possible, with numerous checks to ensure that facts haven't shifted on the ground.

In many cases, it requires the implementors to start by thinking about all the concerns and interactivity requirements that e.g. a Git-style merge conflict would have, and try to make simplifying assumptions based on the problem domain from that starting position.

Looking at the Mastodon source [0], and seeing that there's not even an explicit list of to-merge-from IDs passed from the initiator of the merge request to the asynchronous executor of the merge logic, it seems like it was only a matter of time before something like this happened.

This is not a criticism of Mastodon, by the way! I've personally written, and been bitten by, merge logic with far worse race conditions, and it's frankly incredible that a feature like this even exists for what is effectively [1] a volunteer project! But it is a cautionary tale nonetheless.

[0] https://github.com/mastodon/mastodon/blob/main/app/workers/a... (note: AGPL)

[1] https://opencollective.com/mastodon

wouldbecouldbe · on July 29, 2023

It’s not hard not to merge on null.

I agree rest of it can be hard, and would be nervous. But this should have been obvious

msla · on July 29, 2023

NULL is inevitable if you use JOINs, simply as a matter of what a JOIN is.

More deeply, NULL is inevitable because reality is messy and your database can't decline to deal with it just because it's messy. You want to model titles, with prenomials and postnomials, and then generate full salutations using that data? Well, some people don't have postnomials, at the very least, so even if you never store NULLs you're going to get them as a result of the JOIN you use to make the salutation.

You can remove the specific NULL value, but you can't remove the fact "Not Applicable"/"Unknown" is very often a valid "value" for things in reality, and a database has to deal with that.

wouldbecouldbe · on July 29, 2023

Even if there is null, the merge function should have done some sort of null / istruthy check. That’s unbelievable.

empathy_m · on July 29, 2023

The part that resonates here is saying

"ah yes well we have a full database backup so we can do a full restore", then

"the full restore will be tough and involve downtime and has some side effects," then

"I bet we could be clever and restore only part of the data that are missing", then

doing that by hand, which hits weird errors, then

finally shipping the jury-rigged selective restore and cleaning up the last five missing pieces of data (hoping you didn't miss a sixth)

Happens every time someone practices backup/restore no matter how hard they've worked in advance. It always ends up being an application level thing to decide what data to put back from the backup image.

SV_BubbleTime · on July 29, 2023

I agree with you. The phrase is you don’t have backups unless you test your backups.

But in this case I don’t really get what the issue is. Restore everything from the last good backup and people miss some posts made in the meantime, sucks, but it’s an instant solution instead of hand work and uncertainty.

NikkiA · on July 29, 2023

When I worked as a VMS sysadmin full restore checks were one of the things I insisted on doing, sure, it used up a morning every couple of weeks, and tied up one of our microvaxes, but it was worth it.

Especially three months after I finished being sysadmin and moved to development, and they had a disk failure.

me: 'so you have backups?'

the replacement: 'sure, but they didn't restore'

me: 'what's the last good backup you have?'

tr: 'august, the last one you did'

me: 'welp'

tr's boss: 'guess £390,000 for third party disk recovery is our only option...'

NikkiA · on July 29, 2023

To add some context...

Yes, it was documented in our ISO 9000 docs. But only 'strongly recommended' to perform a regular/routine test restore. I attempted to get it converted to a mandatory step, but since I was only a temporary sysadmin and an intern, it wasn't going to happen.

I was told by my predecessor (who was a direct contractor to my employer) to perform it as routinely as I could. I would guess that he had attempted to get it put as a mandatory step, but his time was billed, mine wasn't, so shrug.

My/the replacement was an external contractor as part of a 'company Y now provides system administration services' deal, who presumably ended up eating the liability of not having working backups that they were contracted to produce.

As horrified as I was, 'it's not really my problem, I wasn't responsible' was the only attitude I could bear to take. Besides, I was busy with fortran.

pbjtime · on July 29, 2023

"If you've never tested your backup solution, you don't have a backup solution."

martey · on July 29, 2023

> To Renaud, Claire, and Eugen of the Mastodon developer team, who went above and beyond all expectations to help us out. You folks were amazing, you took our situation very seriously, and immediately jumped in to help us. I really could not have asked for anything more. Thank you!

I don't know if Vivaldi provides financial support to Mastodon (I couldn't find their name on the sponsors page). If not, I hope this situation causes them (and other companies using Mastodon) to consider sponsorship or a support contract.

renchap · on July 29, 2023

We (the Mastodon non-profit) do not offer support contracts at the moment, but this is a good idea, thanks :)

But we indeed have sponsorships open, and they really have impact. Having full-time people working on the project is very impactful, but at the moment we only have 1 full-time developer in addition to Eugen (the founder) and a DevOps person on the technical side.

progval · on July 29, 2023

They aren't on https://joinmastodon.org/sponsors so probably not.

kaoD · on July 29, 2023

Well they provide the Mastodon federation with what seems to be a large instance and people working on it.

yellowapple · on July 29, 2023

One of the better post-mortems I've read in a long while.

pluto_modadic · on July 29, 2023

I remember the hachyderm postmortems were also pretty good. I'm glad that folks are transparent.

rsynnott · on July 29, 2023

Items two and three not happening atomically feels like an issue, though I assume there's a reason that it's not trivial to do so (I haven't looked at the code; really should at some point.)

TheDong · on July 29, 2023

One of the linked fixes is: https://github.com/mastodon/mastodon/commit/13ec425b721c9594...

It seems like it was trivial to make it happen atomically.

There just wasn't a need to before since them not being atomic isn't an issue, unless you have a poor configuration like someone pointing sidekiq at a stale database server (sorry, a replica), which I see as the primary issue here.

williamdclt · on July 29, 2023

Maybe I’m missing something but I f it’s not atomic, it doesn’t matter whether there’s a replica or not: sidekiq (whatever that is) might do a read in-between step 2 and 3.

I see several problems in their setup really

- lack of strong consistency

- using eventually consistent data, the replica, to take business decision

- no concurrency control (pessimistic or optimistic)

I don’t know much about mastodon but, while not trivial, that’s pretty basic systems design concepts

afavour · on July 29, 2023

> There just wasn't a need to before since them not being atomic isn't an issue

I disagree: there clearly is an issue with a non-local account having a null URI. It’s unlikely but totally possible for the server to crash inbetween query 1 and query 2, irrespective of database replication stuff. This is a textbook example of why you use database transactions.

rsynnott · on July 29, 2023

I think even without that there was likely still at least theoretically a race condition.

psd1 · on July 30, 2023

OTOH, reading from a read-only copy reduces load on the master

INTPenis · on July 29, 2023

I'll never forget the first time I had to restore a massive sql dump and realized that vim actually segfaults trying to read it.

That's when I discovered the magic of spit(1) "split a file into pieces". I just split the huge dump into one file per table.

Of course a table can also be massive, but at least the file is now more uniform which means you can easier run other tools on it like sed or awk to transform queries.

williamdclt · on July 29, 2023

I'm surprised that vim segfaults! I had it slow to open huge files, but I always assumed it could handle anything, through some magic buffering mechanisms. I could be wrong!

That being said, from the point that one has to edit the dump to restore data... something is very wrong in the restore process (the knowledge of which isn't helpful when you're actually faced with the situation, of course)

INTPenis · on July 30, 2023

Yes you shouldn't be manually restoring sql dumps but I've been working in this field long before versioned source control or pgbackrest existed.

Terr_ · on July 29, 2023

I once had to administer a system where a particular folder had so many files that things stopped working, even the ls command would not complete. (It was probably on ext3 or ext2.)

The workaround involved writing a python script that handled everything in a gradual manner, moving files into subdirectories based on shared prefixes.

fragmede · on July 29, 2023

oh yes. ls uses 4k buffers for dirents, and in a directory with lots of entries, the time for userspace to hit the kernel to list the entities until that 4k buffer is full, back in the day, became noticable. In my dealings with a system like that, I had a hacked copy of ls that used bigger buffers so at least it wouldn't hang. Tab completion would also hang if there were too many entries.

TylerE · on July 29, 2023

This make anyone elses eyebrows raise sky high at this?

> Claire replied, asking for the full stacktraces for the log entries, which I was able to also extract from the logs.

This is either deep voodoo magic, or the code or configuration is turning a Xeon into the equivalent of a 286. House is that not, like, megabytes on every single hit?

TheDong · on July 29, 2023

> HTTP 500 errors when viewing an account

> Stacktrace for that 500

This is the default ruby on rails behavior. It prints a stacktrace on any 500 or unknown error, and it's just line numbers and filepaths.

> megabytes on every single hit

I run a rails app that's very poorly designed.

I just checked, and the stack trace for a single 500 is 5KiB. It doesn't even add up to 1MiB a day since there's only a 500 error about every hour.

> This is either deep voodoo magic, or the code or configuration is turning a Xeon into the equivalent of a 286

Having a call stack handy is is actually pretty performant. Java's default exception behavior is to bubble up a stack trace with every exception, whether you print it or not, and java applications run just fine. You have the call stack anyway since you have to know how to return, so the only extra information you need handy is the filename and line number debug symbols, and ruby needs that info anyway just by the nature of the language.

SV_BubbleTime · on July 29, 2023

>Java's default exception behavior is to bubble up a stack trace with every exception, whether you print it or not,

Anyone who has spent 5 minutes in Java knows exactly what this looks like. And also how unwelcoming it is to new programmers.

TheDong · on July 29, 2023

I have spent more than 5 minutes in java, and I don't know why having a stack available on an exception is unwelcoming to new programmers.

It definitely seems better than the approaches some other languages have, like C's "return code 1, check errno", when most new programmers don't even know what an errno is.

Can you explain it to me?

vore · on July 29, 2023

Sure, but imagine how unwelcoming to everyone it would be if you didn't get a stack trace.

k1t · on July 29, 2023

Recording stacktraces of errors is a pretty reasonable thing to do. And ideally not every hit causes an error.

williamdclt · on July 29, 2023

Do you mean you do _not_ capture stacktraces of errors in a live system ? How do you go about understanding where the error comes from ?

jrockway · on July 29, 2023

I log stacks for every error-level log and have never found it that useful. It's better than just logging "EOF" with no context of course, but manually annotating each frame with information not known to the caller is the way to go. Shifting to Go specifics; stack traces miss things like channel recvs and loops. Consider:

   for _, datum := range data {
       if err := DoSomethingWithDatum(datum); err != nil {
           log.Error(...)
       }
   }

In that case, the stack trace misses the most important thing: which datum failed.

Another common case:

   type Thing struct {
       Value any
       Err error
   }
   func Produce() {
       ch <- MakeThing()
   }
   func Consume() {
       for _, thing := range ch {
           if thing.Err != nil {
              log.Error(...)
           }
       }
   }

This one is easier to get right; capture the stack when MakeThing's implementation produces a Thing with err != nil. But, a lot of people just log the stack at log.Error which is basically useless. (Adding to the fun, sometimes Consume() is going to be an RPC to another service written in a different language. But you're still going to want a stack to help debug it.)

TL;DR stack traces are better than nothing, but a comprehensive way of handling errors and writing the information you need to fix it to the log is going to be more valuable. It is a lot of work, but I've always found it worthwhile.

williamdclt · on July 29, 2023

> In that case, the stack trace misses the most important thing: which datum failed.

OK we agree that the stacktrace isn't _enough_, but it's still a really useful thing to have to understand what exactly happened (and quite often the single most useful thing). Of course we still expect devs to capture the information that led to the `log.Error`, so that we don't have to play guess games.

Rather than manually-annotated logs, I'd prefer getting rid of all logging altogether and use tracing (opentelemetry), which is precisely designed for observability.

jrockway · on July 29, 2023

> I'd prefer getting rid of all logging altogether

I prefer to build metrics and distributed traces from logs! But, I think we can agree that they're the same thing. It's a stream of events that a system captures and lets you retrieve. (I wrote our logging/tracing/metrics library at work, and indeed I call any sort of start/finish operation a Span. https://github.com/pachyderm/pachyderm/blob/master/src/inter...)

Post-processing into an efficient retrieval system is the key. You can tail the logs and send span starts/ends to Jaeger, and you can tail the logs and send summarized metrics to Prometheus or InfluxDB. I really like having the too-high-cardinality raw data around, though, so I can analyze specific failing requests. For example, you wouldn't want to have per-request "tx_bytes" metrics in Prometheus; the cardinality is too high and it blows up the whole system. But if you log your metrics, then you find the request (grep for lines that contain the x-request-id), and you can see exact timestamps for when every megabyte of data was sent along with everything else that happened in between. ("It took 10 seconds to send 1MB of this file?" "We did 300 point SQL queries between the 10th and 11th megabyte?" Things like that.) Meanwhile, you still have summarized data for alerts ("across all requests handled by this machine, 90% of requests are taking longer than 10 seconds").

Logs end up being a lot of data, but storing a terabyte of logs to save me a week of debugging is the biggest no-brainer in software engineering. $20 for 1 week of time saved. I'll also add that as a backend developer, logs are your software's UI. If something goes wrong, that's where the operator interfaces with your software to fix the problem. So they are not to be neglected or done without care; the same way you'd probably spell check your HTML frontend.

I'll also add, I have a weird background. When I worked on Google Fiber, we wanted the ability to add new fleet-wide CPE metrics to the system without a software push. So, we logged aggressively and turned relevant log lines into metrics on the log receiver side. (I designed and wrote that system.) That meant we could monitor for issues without taking people's Internet down for 15 minutes while their routers rebooted to apply a software update that captured a metric we wanted to monitor or alert on. At my current job, we don't operate the software we write; our users buy a license and then they do God Knows What on their own infrastructure. What that means is if I want something like Jaeger, it's on me to install it, maintain it, upgrade it, and support it in an environment that I can only access by saying commands to type on a Zoom call. The juice was worth the squeeze exactly once; for logs. Users can run "pachctl debug dump", generate a tar file, send it to us, and then we have logs, metrics, and traces without any operational effort on their end.

While I'm here, here's how I do metrics-to-logs on the producing side: https://github.com/pachyderm/pachyderm/blob/master/src/inter... Apache2 if anyone wants to steal it ;)

lmm · on July 29, 2023

> I prefer to build metrics and distributed traces from logs! But, I think we can agree that they're the same thing. It's a stream of events that a system captures and lets you retrieve.

...

> Post-processing into an efficient retrieval system is the key. You can tail the logs and send span starts/ends to Jaeger, and you can tail the logs and send summarized metrics to Prometheus or InfluxDB.

I'm not sure it's the same thing; the idea that you spit everything out into this flat stream of bytes and then try to parse it back into structured data seems ass-backwards to me. I agree with keeping a trace of events that happen in your system, but if it's data you care about, don't you want to keep it structured the whole way through? At which point it's not really "logs" as usually understood.

oefrha · on July 30, 2023

You’ve confused a stack trace with a core dump (or something similar).

_ugfj · on July 29, 2023

> And it just so happens that all local accounts in a Mastodon instance have a null value in their URI field, so they all matched.

How? NULL = NULL evaluates to FALSE, SQL is a three value logic, specifically Kleene's weak three-valued logic, NULL anyoperator NULL is NULL.

porridgeraisin · on July 29, 2023

Yeah, was wondering. Maybe they filter at the application level? And check equality with their language's null value?

nateb2022 · on July 29, 2023

This is the case, more or less. The fix for this issue boiled down to a one-liner: https://github.com/mastodon/mastodon/commit/13ec425b721c9594...

But basically, some object attributes (which should have been set by default) weren't set by default. This is a common oversight when dealing with data structures that are incomplete at one point or another, and it's easy to assume during programming that code will execute in a fixed order that allows for the necessary fields to be present when needed although sometimes it doesn't always work out that way.

In my opinion, they were lucky to have caught this but a fix should include more than adding missing initialization. They should implement a sanity check to ensure that fields used are present and !NULL, and if things are undefined or missing for whatever reason, abort whatever process they are attempting to perform and log the issue.

ziml77 · on July 30, 2023

How did accounts with a NULL value in the URI column match the query? NULL does not compare equal to NULL. Is this some awful Rails magic at work?

notresidenter · on July 29, 2023

> 6 Users with symbols in their usernames couldn’t log in. This turned out to be due to a mistake I’d made in the recovery script, and was very easily fixed.

UTF-8 strikes again.

photoGrant · on July 29, 2023

Bad luck on timing? Feels like luck had little to do with it and migration testing wasn't fuzz'd enough?

TheDong · on July 29, 2023

Who are you suggesting fuzz what?

The bug wouldn't have occurred in a normal mastodon installation since mastodon's recommended configuration is a single postgres database, or at the very least to use synchronous replication.

Also, very typically, fuzzers intentionally use simplified configuration, so it seems even less likely fuzzing would have caught this interaction.

account42 · on July 31, 2023

What kind of monster disables text selection on their blog. Absolutely unreadable.

danillonunes · on Aug 4, 2023

It's not disabled, the selection color is the same as the background color.

ChrisArchitect · on July 29, 2023

This is a tough and crappy situation but that was an engaging read surprisingly! Nice one

AtlasBarfed · on July 29, 2023

Hm, so a distributed twitter runs into the challenge that each independently managed node is ... and independently managed node. Backup problems etc.

Centralized twitter improves its operations for all users over time. But can be purchased by a nutso billionaire on a whim, or subjected to the """"""national security"""""" directives of the US Government.

pornel · on July 29, 2023

Database replicas are "distributed", but not in the sense ActivityPub is.

The same error could have happened on any centralized service that had more than one db instance and background cleanup jobs. I don't think Xitter runs entirely off Elon's laptop yet, so they could have had the same kind of error.

Brendinooo · on July 29, 2023

Yeah. Speaking in generalities, decentralization increases overall resilience of a network because it isolates the extent to which bad things can spread. Centralization increases efficiency (and efficacy, if the ruler of the centralized power is competent), and the likelihood of a system-wide failure.

rsynnott · on July 29, 2023

Eh. Unusual configuration surfaced a bug, bug was fixed. That's just _normal_.

olah_1 · on July 29, 2023

Perhaps better is decentralized twitter (Nostr). Your account doesn't live on a server and you send events to multiple servers if you want to. If one server goes down, it hardly impacts you.

input_sh · on July 29, 2023

I'm sure all 30 of Nostr users will benefit.

olah_1 · on July 29, 2023

Yeah, because Mastodon is the most happening place online lol. I can't believe people on hacker news talk like this. Embarrassing how far this community has declined. Can't even discuss protocols without these stupid comments.

input_sh · on July 29, 2023

> Yeah, because Mastodon is the most happening place online lol.

Compared to other Twitter alternatives? It absolutely is. It's not even a contest, it's in a league of its own. As embarassing as it sounds, Bluesky, Nostr, Post.news, Spoutible etc don't come even close. (Threads does of course, but the two should be compatible in the near future.)

It's also the only one that 1) didn't come to life as a reaction to Twitter changing ownership, 2) federates between a decent amount of interoperable servers and software right now, not in the future, 3) already has years of experience of dealing with bad actors that are gonna come to any decentralised service, 4) grew organically, not with VC money, and 5) is stable enough that you can choose between dozens of third-party clients (the thing we all complained about Twitter and Reddit killing this year).

A couple of millions of MAU is tiny in comparison to centralised social media (like Instagram), but it's huge in comparison to any other decentralised protocol made this century. There's nothing out there that's gonna dethrone it in usage for the next 3-5 years.

LexiMax · on July 29, 2023

> didn't come to life as a reaction to Twitter changing ownership

Small correction, it came to life years before the actual purchase due to the _possibility_ that Twitter could be purchased. This gave it years to mature, and I think the time was well spent for the most part.

Agree with the rest of your points though. Mastodon is really a great platform, even if it's not the best platform for every type of user or use case, and I truly do not understand why it provokes such vitriolic detractors. I was astonished that it was actually easier for me to get updates about what was happening to Twitter during its API and connectivity issues last month from my Mastodon feed than anywhere else, including Twitter itself.

prmoustache · on July 29, 2023

In the grand scheme of things, even twitter is anecdotic as only a small and marginal fraction of internet users are using twitter.

It makes the remark above even more idiotic.

anigbrowl · on July 29, 2023

A protocol is only as useful as the number of people that adopt it. Network effects and preferential attachment are real phenomena. I've been using Nostr almost since the beginning, so it's not that I am biased against it.

tedunangst · on July 29, 2023

What the hell does any of this have to do with protocols or the problem at hand? You're the one who showed mentioning nostr out of nowhere.