This post was also made (by the same person, it seems) on Mastodon: https://hachyderm.io/@samhenrigold/115159295473019599 — which has the added benefit of not being X, not requiring cookies, and having more information than the tweet, including a follow-up "theremin" hinge.
Fediverse will never be useful because balkanization isn't a desirable feature. The question of "which server should I sign up for" is an irredeemable anchor around anyone's neck before they can even start using it. I'm all for decentralized social media but the whole federated model is so bad.
Have you actually tried using it? I love Mastodon now! You can just follow people as normal, and a number of pretty interesting folks hang out on there (Brian Krebs etc).
No ads, and a timeline that isn't endless and that you can actually just read. It's really nice! I also think the decentralized, non-proprietary model brings us closer to something that is becoming ever more important in this world we find ourselves in.
Using it isn't the problem, joining it is. Finding a server that has the right combination of
- isn't The Big One (defeats the point)
- has a nice domain (that's your name forever)
- is stable (major downtime or data loss is unacceptable these days)
- is guaranteed to stick around forever (no, migration isn't solved and it will never not suck)
- has rules you agree with and can guarantee you'll follow
- is running the right software (no, "fedi" isn't compatible, you either run Mastodon or things will always be ever so slightly broken)
Some of the points you make are still true, but I think you're a little out of date.
Migration is not solved, but it also doesn't suck - unless you're doing it every week, nothing will break, and several people I follow have already done it and it's been just fine.
Stability is also fine - if your server is down for a couple of hours, your timeline will catch up when it comes back online, and likewise your outgoing posts will stay in a local outbox until they can be sent. That's absolutely no different from email or Jabber or anything else.
"Fedi" is compatible enough that I run my own GoToSocial server, which is technically still beta software, and I haven't experienced any issues following and interacting with anyone on Mastodon, Pixelfed, Pleroma and quite a few other platforms.
Would I recommend it to a non-technical user, someone who wasn't really interested in 'servers' and 'clients' and 'protocols'? Yes, although I'd suggest they just go for The Big One, as you put it. What I would say, though, is that this is no longer just a technology for Web nerds; it's a very viable alternative to centralized platforms.
I made a serious effort to look into it, but without already knowing where I wanted to be, it was impossible to decide which server to sign onto. It's also an expensive choice to make upfront, since servers don't all federate with each other, and even the ones that do aren't guaranteed not to start beef with each other later. That's before even getting to the fact that I can name at least 4 different server implementations off the top of my head (Mastodon, Pleroma, Akkoma, Misskey), all at various levels of not-entirely-compatible with each other. I remember there being work on between-server account-moving mechanisms in some state of almost-partially-working, too. Maybe things have changed now, but I doubt it; everything I saw in the ecosystem seemed to promote balkanization as a feature.
I'd love a truly decentralized model for this but fediverse isn't it, fediverse is a Hellenic League of city states where your ability to interact outside your bubble is beholden to your and their local leadership and shifting realities of protocol war jank.
If you do think my opinion is uninformed or mistaken, at least know that I know many times more people who bounced off the idea for these reasons than people who actually managed to make heads or tails of it. Fwiw I don't use xitter/bsky either.
Why click on a link that works versus one that doesn't? Is that the question? It's a weird form of evangelism to say that one shouldn't use the working link because it may not work in the future. That's the nature of web, most links decay.
I'm only going to be alive for a million more hours, and the BDFL in charge of this Xitter is doing a way better job of things. Year of Linux desktop when?
This is exactly why I avoid things like Mastodon as well, because the problem isn't who controls the format, it's the format itself. Who controls the format sure doesn't help, but if you imagine Mastodon becoming as universally adopted as Twitter and seriously don't think it would be a massive mess, then I envy your optimism.
Fedi is different because it isn't proprietary or centralized. A new proprietary and/or centralized alternative is never the answer. That's just buying time.
Personally I am not a fan of the Mastodon software or side of fedi, but I have had good times on the Pleroma/Akkoma side, and it all works together.
It will never be 'it', because I - despite being technically capable of running a server on bare metal or something - have no idea what you're talking about. Fedi, Mastodon, Pleroma, Akkoma: there's too much to know or read about before you can just use it. People go to Facebook, to twitter.com, and just sign up and use it and know what it is.
If mastodon.com or whatever is all you know, I can still follow you, we can interact, and you don't need to know how it all works. However, pseudo-centralization with everyone piling onto a flagship instance is not ideal, so onboarding should still involve picking an instance that doesn't already have 50k+ people on it. Some instances are specialized - they advertise themselves as being about fishing or anime or lgbt stuff - but it's not like they're running a more limited version of the software. You can still post whatever you want there and follow people on the other ones.
You also don't need to know everything right away. You could make every "mistake": sign up on the flagship Mastodon instance, hear about how you should be on other instances, make an alt somewhere else (maybe fosstodon because you like free software), then hear talk of Pleroma and look into that a bit. It's fairly common to have multiple accounts, which is good because it provides redundancy. If your instance goes down, flagship or not, you ideally still have a way to view and post. They make it easy to import/export your following list as well, so migration isn't too bad.
It's pretty similar to Matrix if you're familiar with that at all. Initially my friends and I all ended up on matrix.org, then there was some downtime and I realized how fragile it was to all be on the one main big instance, so I made several alts. Now when matrix.org goes down (just happened a week or two ago), I can still post in the group chats I'm in, and anyone else on an instance that isn't down can post, and when matrix.org comes back it'll all flood in for those people as well.
I think it can work and be successful because email was quite successful. Not everyone was on the same domain but we still manage to email each other. You could argue that gmail has a monstrously large presence and that it's harder to host your own mail server these days, but it's all still possible.
I don’t think that matters that much; it’s still just a popularity contest, and if something manages to break through that threshold, it’ll be trivial enough to make that one the default.
No one knew all the Reddit boards or 4chan boards either; you just knew to go to /b/ or /r/funny. The other boards, like the other fediverse servers, are just details that enable other subcommunities to survive. The major community will just route to a single server, and most people will probably never use a second.
Not who you were speaking to, but you just tried to trivialise the power of friction in a signup process, which goes _strongly_ against all known research on the topic.
A social network does not have to be universally adopted to be interesting because the vast majority of the folks do not do or think anything interesting.
A social network with just the top 1% of the geeks would be absolutely amazing.
They called it a “Trojan horse” they shouldn’t be distracted by. They were stating that it was more likely to fail, which isn’t true. You can challenge that without challenging the idea that Mastodon can still be a cool place; no one said it couldn’t be.
I can't see it, and if I click on @samhenrigold's profile I get a random selection of things from this July and last October instead of recent posts.
It's really not a useful platform for publicly sharing information anymore. Drives me nuts that government agencies use it for announcements like "Here's an amber alert with a twitter link, but you can't have any of the followup information because that's only for people who are logged in."
But you can only see replies to tweets if you're logged in; so thank you for providing that link, but currently, that's the only way that those of us who aren't logged into Twitter can find it.
Not only can you just replace twitter.com with nitter.net, I bet there's a browser extension you can get (or generate in 1 minute with any LLM) that would load any Twitter link into Nitter.
Plenty of people put their content behind paywalls, but apparently, someone who puts theirs behind a free loginwall is a bridge too far? I'm not sure I understand the outrage.
I can't stand Bluesky, but I have an account on it. What the fuck is the big deal?
> I have Chrome on mobile configured as such that JS and cookies are disabled by default
My God, there's two of us!
(Though … you're being privacy conscious on Chrome? Come to Firefox. Ignore the pesky "it's funded by Google" problems, nothing to see, nothing to see, the water is fiiiine.)
> You might be surprised to learn that normally, this actually works fine
I guess I have a different experience there. A huge number of sites just outright crash. (E.g., the HN search.) JavaScript devs, I've learned, do not handle error cases, and the exceptions tend to just propagate out and ruin the rendering. There seems to be some popular framework out there that even destroys the whole DOM to emit just the error. (I forget the text, but it's the same text, always. Always centered. Flash of page, then crash.)
I have a custom extension that fakes the cookie storage for those JS pages: it just lies and says "yeah, cookies are enabled" and then blackholes the writes. But it fails for anything that needs a real cookie … like Anubis.
I'm sympathetic to where Anubis is coming from, though. But the "I passed the challenge" cookie is indistinguishable from a tracker … although most people running Anubis are probably inherently trustworthy by a sort of cultural association, so long as Anubis remains non-mainstream. I think I might modify my extension to store cookies for a short time frame (like 1h) in some cases, such as Anubis: that's long enough to pass the challenge, but short enough to limit tracking. I'm usually only blocked by Anubis for something like a blog post, so that should suffice.
Pharmacists are a fantastic example. My pharmacy receives my prescription by computer. They text me, by computer, when it's ready to pick up. I drive over there … and it isn't ready, and I have to loiter for 15 minutes.
Also, after the prescription ends, they keep filling it. I just never pick it up. The autonomous flow has no ability to handle this situation, so now I get a monthly text that my prescription is ready. The actual support line is literally unmanned, and messages left there are piped to /dev/null.
The existing automation is hot garbage. But C-suite would have me believe our Lord & Savior, AI, will fix it all.
The only way AI could fix this is if it said "replace the pharmacist with a vending machine and hire a $150k junior engineer to make sure the DB is updated afterwards", which, you never know, Claude Opus 4 might suggest. At that point, we'll know AGI has been achieved.
I've not tried Pyright, but mypy on any realistic, real-world codebase I've thrown at it emits ~80k errors. It's hard to get started with that.
mypy's output is, AFAICT, also non-deterministic, and it doesn't support a programmatic output format that I know of. This makes it next to impossible to write a wrapper script that diffs the errors to, for example, show only those introduced by the change one is making.
Relying on my devs to manually trawl through 80k lines of errors for ones they might be adding in is a lost cause.
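To illustrate the kind of wrapper I mean, here's a sketch, not a real tool: the `src/` target and the baseline filename are hypothetical, and because mypy's messages embed line numbers that shift with every edit, it stays fragile.

    import subprocess
    import sys
    from pathlib import Path

    BASELINE = Path("mypy-baseline.txt")  # hypothetical: a committed snapshot of known errors

    def current_errors() -> set[str]:
        # Run mypy and keep only the error lines; using a set also papers
        # over unstable output ordering.
        out = subprocess.run(["mypy", "src/"], capture_output=True, text=True).stdout
        return {line for line in out.splitlines() if ": error:" in line}

    def main() -> int:
        known = set(BASELINE.read_text().splitlines())
        new = current_errors() - known
        for line in sorted(new):
            print(line)
        # Caveat: messages embed file/line positions, so unrelated edits
        # shift line numbers and make old errors look "new".
        return 1 if new else 0

    if __name__ == "__main__":
        sys.exit(main())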
Our codebase also uses SQLAlchemy extensively, which does not play well with typecheckers. (There is an extension to aid in this, but it regrettably SIGSEGVs.)
Everyone stubs their toe on container invariance once, then figures it out and moves on. It's not unique to Python, and developers should understand the nuances of variance.
I used mypy just fine at a previous job. If you are getting 80k errors, that means either you are adopting the type checker very late and have done many dubious things before, or you didn't exclude your venv from being type checked by mypy.
Pagination: do not force me to drink from a paginated coffee stirrer. I do not want 640 B of data in a response, and then to have to send another request for the next 640 B. And often, pagination means the calls are serialized, so I'm doing nothing but waiting through round trip after round trip for the next meager 640 B of data.
Azure, I'm looking at you. Many of their services do this, but Blob storage is something else: I've literally gotten information-free responses there. (I.e., 0 B of actual data. I wish I could say 0 B were used to transfer it.)
When you're designing, think about how big a record/object/item is, and return a reasonable number of them in a page. For programmatic consumers who want to walk the dataset, a 640 KiB response is really not that big, and I've seen responses orders of magnitude smaller so many times, because someone thought "100 items is a good page size, right?" and 100 items was like 4 KiB of data.
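To make the round-trip cost concrete, here's what a consumer walking a cursor-paginated endpoint looks like (a sketch; the URL, the `limit`/`cursor` parameters, and the response shape are all hypothetical):

    import requests

    def fetch_all(url: str, page_size: int = 1000) -> list:
        # Every iteration is one serialized network round trip; a tiny
        # page size turns one logical read into hundreds of requests.
        items, cursor = [], None
        while True:
            params = {"limit": page_size}
            if cursor:
                params["cursor"] = cursor
            resp = requests.get(url, params=params, timeout=30)
            resp.raise_for_status()
            body = resp.json()
            items.extend(body["items"])
            cursor = body.get("next_cursor")
            if not cursor:
                return items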
> If you have thirty API endpoints, every new version you add introduces thirty new endpoints to maintain. You will rapidly end up with hundreds of APIs that all need testing, debugging, and customer support.
You version the one thing that's changing.
As much as I hate the /v2/... form of versioning, nobody reversions all the /v1/... APIs just because one API needed a /v2. /v2 is a ghost town, save for the endpoints that actually needed a /v2.
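As a sketch of what that looks like in practice (a hypothetical Flask app; the /v1/widgets and /v2/widgets routes and their payloads are made up for illustration):

    from flask import Flask

    app = Flask(__name__)

    # The twenty-nine unchanged endpoints stay at /v1, untouched.
    @app.get("/v1/widgets")
    def list_widgets_v1():
        return {"widgets": ["a", "b"]}

    # Only the endpoint whose contract changed gets re-versioned.
    @app.get("/v2/widgets")
    def list_widgets_v2():
        return {"items": [{"id": "a"}, {"id": "b"}], "next_cursor": None}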
It’s certainly been my experience that page sizes should be bigger than you initially expect. Paginated endpoints are typically iterated all the way through, meaning you’re going to return all that data anyway. May as well save the overhead of the extra requests.
Not implementing pagination at the outset can be problematic, however. If you later want to paginate (e.g. because the data grows), it’s a breaking change to add it. Big page sizes, but with pagination, can be a reasonable balance.
Yeah, pagination is a great option — maybe even a good default. But don't make it the only choice; give developers the ability to make the tradeoff between number of requests and payload size.
I'm curious, is there a backend reason to only offer pagination? Is it less work on the backend vs a user making X calls to get all the resources anyways?
From embedded experience, I would say paging is only beneficial if you operate under heavy memory or latency constraints. But most APIs certainly aren't under such constraints.
Of course there should be some sort of maximum size, but I have seen APIs that return 1200 lines of text and require me to page them at 100 per request, with no option to turn it off.
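A sketch of what the opt-out can look like server-side (a hypothetical Flask endpoint; `MAX_PAGE` is a made-up ceiling that protects the backend while letting clients take everything in one go):

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    LINES = [f"line {i}" for i in range(1200)]  # stand-in dataset
    MAX_PAGE = 10_000                           # generous ceiling, not a tiny forced page

    @app.get("/lines")
    def lines():
        # limit/offset are optional: omit them and you get the whole
        # thing (up to the ceiling) in a single response.
        limit = min(int(request.args.get("limit", MAX_PAGE)), MAX_PAGE)
        offset = int(request.args.get("offset", 0))
        return jsonify(LINES[offset:offset + limit])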
Or just don't use Bash. Python is a great scripting language, and won't blow your foot off if you try to iterate through an array.
Other than that, yeah, if you must use bash, set -eu -o pipefail; the IFS trick is a new and mildly interesting idea to me.
> The idea is that if a reference is made at runtime to an undefined variable, bash has a syntax for declaring a default value, using the ":-" operator:
Just note that the form that defaults only an undefined variable (let's use "fallback" as the default value) is

    ${foo-fallback}

The syntax

    ${foo:-fallback}

means "use 'fallback' if foo is unset or empty". (The ":" is specifically what adds the or-empty behavior; there's a bunch of other operators, like "+", which is "use alternate value": it expands to the alternate value if the parameter is defined, and to nothing otherwise. So

    if [[ "${foo+set}" == "set" ]]; then
        # foo is defined (possibly empty).
    fi

tests for definedness, and similarly,

    ${foo:+triggered}

will emit "triggered" if foo is set and not empty.)

See "Parameter Expansion" in the manual. I hate this syntax, but it is the syntax one must use to check for undefined-ness.
> Python is a great scripting language, and won't blow your foot off if you try to iterate through an array.
I kind of hate that every time the topic of shell scripting comes up, we get a troop of comments touting this mindless nonsense. Python has footguns, too. Heck, it's absolutely terrible and hacky if you try to do concatenative programming with it. Does that mean it should never be used?
Instead of bashing the language, why not learn bash the language? IME, most of the industry has just absorbed shell programming haphazardly through osmosis, and almost always tries to shove the square pegs of OOP and FP into the round hole that is bash. No wonder people are having a bad time.
In contrast, a data-first design that heavily normalizes data into line-oriented tables and passes information around in pipes results in simple, direct code IME. Stop trying to use arrays and embrace data normalization and text. Also, a lot of pain comes from simply not learning the facilities, e.g. the set builtin obviates most uses of string munging and exec:
set -- "$@" --file 'filename with spaces.pdf'
set -- "$@" 'data|blob with "dangerous" characters'
set -- "$@" "$etc"
some_command "$@"
Anyway, the senseless bash hate is somewhat of a pet peeve of mine. Exeunt.
All languages have foot guns, but bash is on the more explodey end of the scale. It is not senseless to note that if you can use a safer tool, you should consider it.
C/C++ got us really far, but greenfield projects are moving to safer languages where they can. Expert low-level programmers, armed with all of the available linting tools, are still making unfortunate mistakes. At some point we should switch to something better.
In my years of reading and writing bash as well as Python for sysops tasks, I'd say that bash is the more reliable workhorse of the two. Python tends to encourage a kind of overengineering, resulting in more bugs overall. Many times I've seen hundreds of lines of Python or Typescript result from the attempt to replace just a few lines of bash!
The senselessness I object to is not the conscientious choice of tooling or discussion of the failings thereof; it's the fact that every single bash article on here sees the same religious refrain, "Python is better than bash. Period." It's like if every article about vim saw a flood of comments claiming that vim is okay for light editing, but for any real programming we should use a real editor like emacs.
If you open vim expecting emacs but with a few different bindings, then it might just explode in your face. If you use bash expecting to program just like in Python but with slightly different syntax, then it's not surprising to feel friction.
IME, bash works exceptionally well using a data-oriented, text-first design to program architecture. It's just unfortunate that very little of the industry is even aware of this style of programming.
The type is the same: if you look at a type as an infinite set of values, they are the same infinite set. Yes, their in-memory representations might differ, but every value in one exists in the other, and only those, so conversions between them are infallible.
So in your last example, UTF-8 & UTF-32 are the same type, containing the same infinite set of values, and — of course — one can convert between them infallibly.
But you can't encode arbitrary Go strings in WTF-8 (some are not representable), and you can't encode arbitrary Python strings in UTF-8 or WTF-8, so attempts to do so might error. (E.g., `.encode('utf-8')` in Python on a `str` can raise. N.b. upthread is wrong that Python strings are equivalent to Unicode scalar values / well-formed UTF-*.)
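A quick demo of that fallibility, and of Python's escape hatch, which behaves loosely like WTF-8 for lone surrogates (all standard library; nothing hypothetical here):

    s = "a\ud800b"      # a lone surrogate: allowed in a Python str,
                        # but not a valid Unicode string

    try:
        s.encode("utf-8")
    except UnicodeEncodeError as e:
        print(e)        # strict UTF-8 encoding refuses lone surrogates

    # 'surrogatepass' encodes them anyway, WTF-8-style:
    print(s.encode("utf-8", "surrogatepass"))   # b'a\xed\xa0\x80b'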
No, WTF-8[1] is a precisely defined format (that isn't that).
If you imagine a format that can encode JavaScript strings containing unpaired surrogates, that's WTF-8. (Well-formed WTF-8 is the same type as a JS string, though with a different representation.)
(Though that would have been a cute name for the UTF-8/latin1/UTF-8 fail.)

[1]: https://simonsapin.github.io/wtf-8/
When I posted that, I was honestly projecting from my own use. I think I may have independently thought of the term on Stack Overflow prior to koalie's tweet, but it's not the easiest thing (by design) to search for comments there (and that's assuming they don't get deleted, which they usually should).
(On review, it appears that the thread mentions much earlier uses...)
I did the search because I have a similar memory. I'd place it in the early 2000s, before StackOverflow existed, around when people were first switching to UTF-8 on the web from latin1 and Windows-1251 and the rest, and browsers would often pick the wrong encoding; IE had a submenu where you could tell it which one to use on the page. "WTF-8" was a thing because occasionally none of those options would work, because the layers server-side were misconfigured and caused double (or more, if it involved user input) encoding. It was also used just in general to complain about UTF-8 breaking everything as it was slowly being introduced.
> Subsidized solar farms have made it more difficult for farmers to access farmland by making it more expensive and less available. Within the last 30 years, Tennessee alone has lost over 1.2 million acres of farmland and is expected to lose 2 million acres by 2027.
A quick Google says that solar generates ~20 W/sq ft, so the farmland implied here to be lost to solar generation would be enough to power the entire United States with solar alone, twice over.
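Back-of-envelope (nameplate watts only, ignoring capacity factor; the ~0.5 TW average-demand figure is my own rough number):

    acres = 1.2e6                    # the claimed farmland loss
    sqft = acres * 43_560            # ≈ 5.2e10 sq ft
    nameplate_w = sqft * 20          # ≈ 1.0e12 W, i.e. ~1 TW of panels
    us_avg_demand_w = 0.5e12         # rough US average electric demand
    print(nameplate_w / us_avg_demand_w)   # ≈ 2.1, hence "twice over"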
Obviously, not all 1.2 million acres of land here is lost to solar generation as the government is implying. They don't cite their source, but AFAICT, this is all land that is no longer farmland for any reason at all.
> JavaScript is compelled to count UTF-16 code units because it actually does use UTF-16. Python's flexible string representation is a space optimization; it still fundamentally represents strings as a sequence of characters, without using the surrogate-pair system.
Python's flexible string representation has nothing to do with this. Python could easily have had len() return the byte count, or even the USV count, or other vastly more meaningful metrics than "5", whose unit is so disastrous I can't put a name to it. It's not bytes, it's not UTF-16 code units, it's not anything meaningful, and that's the problem. In particular, the USV count would have been made easy (O(1) easy!) by Python's flexible string representation.
You're handwaving it away in your writing by calling it a "character in the implementation", but what is a character? It's not a character in any sense a normal human would recognize — like a grapheme cluster — as I think if I asked a human "how many characters is <imagine this is a man with skin tone face-palming>?", they'd probably say "well, … IDK if it's really a character, but 1, I suppose?" …but "5" or "7"? Where do those even come from? An astute person might guess "Oh, perhaps that takes more than one byte; is that its size in memory?" Nope. Again: "character in the implementation" is a meaningless concept. We've assigned words to a thing to make it sound meaningful, but that is like definitionally begging the question here.
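(For the record, here is where the 5 and the 7 come from; to my point, both are implementation-flavored answers. A quick demo, all standard Python:)

    s = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"   # the face-palming man, with skin tone

    print(len(s))                           # 5  : code points, Python's answer
    print(len(s.encode("utf-16-le")) // 2)  # 7  : UTF-16 code units, JavaScript's answer
    print(len(s.encode("utf-8")))           # 17 : bytes in UTF-8
    # ...while a human reading it sees 1 grapheme cluster.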
> or other vastly more meaningful metrics than "5", whose unit is so disastrous I can't put a name to it. It's not bytes, it's not UTF-16 code units, it's not anything meaningful, and that's the problem.
The unit is perfectly meaningful.
It's "characters". (Pedantically, "code points" — https://www.unicode.org/glossary/#code_point — because values that haven't been assigned to characters may be stored. This is good for interop, because it allows you to receive data from a platform that implements a newer version of the Unicode standard, and decide what to do with the parts that your local terminal, font rendering engine, etc. don't recognize.)
Since UTF-32 allows storing every code point in a single code unit, you can also describe it that way, despite the fact that Python doesn't use a full 4 bytes per code point when it doesn't have to.
The only real problem is that "character" doesn't mean what you think it does, and hasn't since 1991.
From the way that the Unicode standard dictates that this text shall be represented. This is not Python's fault.
> Again: "character in the implementation" is a meaningless concept.
"Character" is completely meaningful, as demonstrated by the fact the Unicode Consortium defines it, and by the fact that huge amounts of software has been written based on that definition, and referring to it in documentation.
> Since UTF-32 allows storing every code point in a single code unit, you can also describe it that way, despite the fact that Python doesn't use a full 4 bytes per code point when it doesn't have to.
Python does not use UTF-32, even notionally. Yes, I know it uses a compact representation in memory when the value is ASCII, etc.; that's not what I'm talking about here. |str| != |all UTF-32 strings|; `str` and "UTF-32" are different things, as there are values in the former that are absent in the latter, and again, this is why encoding to UTF-8 or any UTF encoding is fallible in Python.
"Code points" is not a meaningful metric, though I suppose strictly speaking, yes, len() counts code points.
> I don't understand what you mean by "USV count".
The number of Unicode scalar values in the string. (If the string were encoded in UTF-32, the length of that array.) It's the basic building block of Unicode. It's only marginally useful, and there's a host of other more meaningful metrics, like memory size, terminal width, graphemes, etc. But it's more meaningful than code points, and if you want to do anything at any higher level of representation, USVs are going to be what you want to build off. Anything else is going to be more fraught with error, needlessly.
> It's what the Unicode standard says a character is.
The Unicode definition of "character" is not a technical definition, it's just there to help humans. Again, if I fed that definition to a human, and asked the same question above, <facepalm…> is 1 "character", according to that definition in Unicode as evaluated by a reasonable person. That's not the definition Python uses, since it returns 5. No reasonable person is looking at the linked definition, and then at the example string, and answering "5".
"How many smallest components of written language that has semantic value does <facepalm emoji …> have?" Nobody is answering "5".
(And if you're going to quibble with my use of definition (1.), the same applies to (2.). (3.) doesn't apply here as Python strings are not Unicode strings (again, |str| != |all Unicode strings|), (4.) is specific to Chinese.)
> "Character" is completely meaningful, as demonstrated by the fact the Unicode Consortium defines it, and by the fact that huge amounts of software has been written based on that definition, and referring to it in documentation.
That a lot of people write bad code does not make bad code good. Ambiguous technical documentation is likewise not made good by being ambiguous. Any use of "character" in technical writing would be made more clear by replacing it with one of the actual technical terms defined by Unicode, whether that's "UTF-16 code unit", "USV", "byte", etc. "Character" leaves far too much up to the imagination of the reader.
> there are values in the former that are absent in the latter, and again, this is why encoding to utf8 or any utf encoding is fallible in Python.
Yes, yes, the `str` type may contain data that doesn't represent a valid string. I've already explained elsewhere ITT that this is a feature.
And sure, pedantically it should be "UCS-4" rather than UTF-32 in my post, since a str object can be created which contains surrogates. But Python does not use surrogate pairs in representing text. It only stores surrogates, which it considers invalid at encoding time.
Whenever a `str` represents a valid string without surrogates, it will reliably encode. And when bytes are decoded, surrogates are not produced except where explicitly requested for error handling.
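(The "explicitly requested" path is the surrogateescape error handler; a minimal demo of the feature being defended here, all standard Python:)

    raw = b"caf\xe9"                       # Latin-1 bytes; invalid as UTF-8

    s = raw.decode("utf-8", "surrogateescape")
    assert s == "caf\udce9"                # the bad byte smuggled in as a lone surrogate

    try:
        s.encode("utf-8")                  # strict encoding refuses the smuggled byte
    except UnicodeEncodeError:
        pass

    assert s.encode("utf-8", "surrogateescape") == raw   # lossless round trip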
> The number of Unicode scalar values in the string. (If the string were encoded in UTF-32, the length of that array.)
Ah.
Good news: since Python doesn't use surrogate pairs to represent valid text, these are the same whenever the `str` contents represent a valid text string in Python. And the cases where they don't are rare and more or less must be deliberately crafted. You don't even get them from malicious user input, if you process input in obvious ways.
> The Unicode definition of "character" is not a technical definition, it's just there to help humans.
You're missing the point. The facepalm emoji has 5 characters in it. The Unicode Consortium says so. And they are, indisputably, the ones who get to decide what a "character" is in the context of Unicode.
I linked to the glossary on unicode.org. I don't understand how it could get any more official than that.
Or do you know another word for "the thing that an assigned Unicode code point has been assigned to"? cf. also the definition of https://www.unicode.org/glossary/#encoded_character, and note that definition 2 for "character" is "synonym of abstract character".
As the other comment says, Python considers strings to be a sequence of codepoints, hence the length of a string will be the number of codepoints in that string.
I just relied on this fact yesterday, so the timing is kind of funny. I wrote a little script that looks out for shenanigans in source files. One thing I wanted to explore was which Unicode blocks a given file references characters from. This is meaningless on the byte level, and meaningless on the grapheme cluster level; it is only meaningful on the codepoint level. So all I needed to do was iterate through all the codepoints in the file, tally them up by Unicode block, and print the results. Something this design was perfectly suited for.
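The core of it is just a loop like this (a sketch: the block table here is abbreviated for illustration; the full list lives in the UCD's Blocks.txt):

    import sys
    from collections import Counter

    # Abbreviated block table for illustration; the real one is much longer.
    BLOCKS = [
        (0x0000, 0x007F, "Basic Latin"),
        (0x0080, 0x00FF, "Latin-1 Supplement"),
        (0x0370, 0x03FF, "Greek and Coptic"),
        (0x0400, 0x04FF, "Cyrillic"),
        (0x2000, 0x206F, "General Punctuation"),
    ]

    def block_of(cp):
        for lo, hi, name in BLOCKS:
            if lo <= cp <= hi:
                return name
        return "Other"

    def tally(path):
        text = open(path, encoding="utf-8", errors="replace").read()
        # Iterating a str iterates codepoints, which is exactly what we want.
        return Counter(block_of(ord(ch)) for ch in text)

    if __name__ == "__main__":
        for name, count in tally(sys.argv[1]).most_common():
            print(f"{count:8d}  {name}")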
Now of course:
- it coming in handy once for my specific random workload doesn't mean it's good design
- my specific workload may not be rational (am a dingus sometimes)
- at some point I did consider iterating by grapheme clusters, which the language didn't seem to love a whole lot, so more flexibility would likely indeed be welcome
- I am well and fully aware that iterating through data a few bytes at a time is abjectly terrible and possibly a sin. Too bad I don't really code in any proper native language, and I have basically no experience with SIMD, so tough shit.
But yeah, I really don't see why people find this so crazy. The whole article is in good part about how relying on grapheme cluster semantics makes you Unicode-version dependent, and that being a bit hairy, so it's probably not a good idea to default to it. At which point, codepoints it is. Counting only scalars is what would be weird in my view; you'd potentially be "randomly" skipping over parts of the data.
I'm currently working with some local legacy code, so I primarily wanted to scan for incorrectly transcoded accented characters (Central European to UTF-8 mishaps) - and did find them.
Also good against data fingerprinting, homoglyph attacks in links (e.g. in comments), pranks (Greek question mark vs. semicolon), or, if it's a strictly international codebase, checking for anything outside ASCII. So when you don't really trust a codebase and want to establish a baseline, basically.
But I also included other features, like checking line ending consistency, line indentation consistency, line lengths, POSIX compliance, and encoding validity. Line lengths were of particular interest to me, having recently seen some malicious PRs to FOSS projects where the attacker would just move the payload out of sight to the side, expecting most people to have word wrap off and just not even notice (pretty funny tbf).