"Version control systems", in case of AI, mean that their knowledge will stay frozen in time, and so their usefulness will diminish. You need fresh data to train AI systems on, and since contemporary data is contaminated with generative AI, it will inevitably lead to inbreeding and eventual model collapse.
The existence of common slang that isn't used in the sort of formal writing grammar linting tools are typically designed to promote is more of a weakness of learning grammar from a weighted model of the internet, rather than from formal grammatical rules, than a strength.
Not an insurmountable problem: ChatGPT will use "aight fam" only in context-sensitive ways and will remove it if you ask it to rephrase to sound more like a professor. But RLHFing slang into predictable use is likely a bigger challenge than simply ensuring the word list of an open source program is up to date enough to include slang whose etymology dates back to the noughties or nineties, if phrasing things in that particular vernacular is even a target for your grammar linting tool...
Huh, this is the first time I've seen "noughties" used to describe the first decade of the 2000s. Slightly amusing that it's surely pronounced like "naughties". I wonder if it'll catch on and spread.
aight, trippin, fr (at least the spoken version), and fam were all very common in the 1990s (which was the last decade I was able to speak like that without getting jeered at by peers).
I don't think anyone has the need to check such a message for grammar or spelling mistakes.
Even then, I would not rely on an LLM to accurately track this "evolution of language".
Yes, precisely. This "less than a decade" is magnitudes above the hours or days that it would take to manually add those words and idioms to proper dictionaries and/or write new grammar rules to accomodate aspects like skipping "g" in continuous verbs to get "bussin" or "bussin'" instead of "bussing". Thank you for illustrating my point.
Also, it takes at most a few developers to write those rules into a grammar checking system, compared to the millions and more who need to learn a given piece of "evolved" language once it becomes impossible to avoid. Not only is it fast enough to do this manually, it's also much less work-intensive and more scalable.
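To make the scale of that manual work concrete, here is a rough Python sketch of what extending a dictionary-based checker with a handful of slang terms and a g-dropping rule could look like. The word list, function names, and rule are made up for illustration; this is not any real tool's internals.

# Illustrative sketch only: a hand-curated slang list and a g-dropping rule.
SLANG_TERMS = {"aight", "fam", "trippin", "bussin", "fr"}

def add_slang(dictionary: set[str]) -> set[str]:
    # Extend a word list with a manually curated set of slang terms.
    return dictionary | SLANG_TERMS

def g_dropping_variants(dictionary: set[str]) -> set[str]:
    # For every "-ing" form, also accept the "-in" and "-in'" spellings.
    variants = set()
    for word in dictionary:
        if word.endswith("ing") and len(word) > 4:
            stem = word[:-1]            # "bussing" -> "bussin"
            variants.add(stem)
            variants.add(stem + "'")    # "bussin'"
    return dictionary | variants

base = {"running", "bussing", "professor", "rephrase"}
extended = g_dropping_variants(add_slang(base))
assert "aight" in extended and "bussin'" in extended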
Not exactly. It takes time for those words to become mainstream for a generation. While you'd have to manually add those words to dictionaries, LLMs can learn them on the fly, based on frequency of usage.
At this point we're already using different definitions of grammar and vocabulary - are they discrete (as in a rule system, vide Harper) or continuous (as in a probability, vide LLMs). LLMs, like humans, can learn them on the fly, and, like humans, they'll have problems and disagreements judging whether something should be highlighted as an error or not.
Or, in other words: if you "just" want a utility that can learn speech on the fly, you don't need a rigid grammar checker, just a good enough approximator. If you want to check if a document contains errors, you need to define what an error is, and then if you want to define it in a strict manner, at that point you need a rule engine of some sort instead of something probabilistic.
I’m glad we have people at HN who could have eliminated decades of effort by tens of thousands of people, had they only been consulted first on the problem.
Which effort? Learning a language is something that can't be eliminated. Everyone needs to do it on their own. Writing grammar checking software, though, can be done a few times and then copied.
Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".
You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
> Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".
Languages are used to successfully communicate. To achieve this, all parties involved in the communication must know the language well enough to send and receive messages. This obviously includes messages that transmit changes in the language, for instance, if you tried to explain to your parents the meaning of the current short-lived meme and fad nouns/adjectives like "skibidi ohio gyatt rizz".
It takes time for a language feature to become widespread and de facto standardized among a population. This is because people need to asynchronously learn it, start using it themselves, and gain critical mass, so that even people who do not like using that feature need to start respecting its presence. This inertia is the main source of the slowness I mention, and also a requirement for any kind of grammar-checking software. From the point of view of such software, a language feature that (almost) nobody understands is not a language feature, but an error.
> You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
Yes, that set of patterns is called a language grammar. Even dialects and slangs have grammars of their own, even if they're different, less popular, have less formal materials describing them, and/or aren't taught in schools.
Fair enough, thanks for replying. I don't see the task of specifying a grammar as straightforward as you do, perhaps. I guess I just didn't understand the chain of comments.
I find that clear-cut, rigid rules tend to be the least helpful ones in writing. Obviously this class of rule is also easy/easier to represent in software, so it also tends to be the source of false positives and frustration that lead me to disable such features altogether.
When you do writing as a form of art, rules are meant to be bent or broken; it's useful to have the ability to explicitly write new ones and make new forms of the language legal, rather than wrestle with hallucinating LLMs.
When writing for utility and communication, though, English grammar is simple and standard enough. Browsing Harper sources, https://github.com/Automattic/harper/blob/0c04291bfec25d0e93... seems to have a lot of the basics already nailed down. Natural language grammar can often be represented as "what is allowed to, should, or should not, appear where, when, and in which context" - IIUC, Harper seems to tackle the problem the same way.
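To make that framing concrete, here is a hypothetical Python sketch of a context-pattern rule, i.e. "this token sequence should not appear here". It's only an illustration of the idea; Harper itself is written in Rust and its actual rule API looks different.

from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    pattern: list[str]   # lowercase token sequence that should not appear
    message: str

RULES = [
    Rule("repeated-article", ["the", "the"], "Repeated word: 'the the'"),
    Rule("could-of", ["could", "of"], "Did you mean 'could have'?"),
]

def lint(text: str, rules: list[Rule]) -> list[tuple[int, str]]:
    # Return (token index, message) wherever a forbidden pattern appears.
    tokens = text.lower().split()
    hits = []
    for rule in rules:
        n = len(rule.pattern)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == rule.pattern:
                hits.append((i, rule.message))
    return hits

print(lint("I could of sworn the the cat left", RULES))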
I'm certainly not disputing the existence of grammar nor do I think an LLM is a good way to implement/check/enforce one. And now I realise how my first comment landed. Thanks again!
Your first point would be more fitting if a language checker needed a complete, computable grammar that can be parsed and understood. That would be problematic for natural languages.
Just because the rules aren’t set fully in stone, or can be bent or broken, doesn’t mean they don’t “exist” - perhaps not the way mathematical truths exist, but there’s something there.
Even these few posts follow innumerable “rules” which make it easier to (try) to communicate.
Perhaps what you’re angling against is where the rules of a language get set in stone and fossilized until the “Official” language is so diverged from the “vulgar tongue” that it’s incomprehensibly different.
Like church/legal Latin compared to Italian, perhaps. (Fun fact - the Vulgate translation of the Bible was INTO the vulgar tongue at the time: Latin).
Traditional S-expressions, by definition, ignore most whitespace; additionally, reading sexprs is always a linear operation that never needs to backtrack by more than one character.
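For intuition, here is a minimal sketch of such a reader in Python, handling only atoms and lists and skipping whitespace (so nothing like a production Lisp reader). It scans strictly left to right and only ever looks at the next single character to decide what to do:

def read_sexpr(text, i=0):
    # Read one sexpr starting at index i; return (value, next_index).
    while i < len(text) and text[i].isspace():      # skip whitespace
        i += 1
    if i >= len(text):
        raise ValueError("unexpected end of input")
    if text[i] == "(":                              # a list: read items until ")"
        i += 1
        items = []
        while True:
            while i < len(text) and text[i].isspace():
                i += 1
            if i < len(text) and text[i] == ")":
                return items, i + 1
            item, i = read_sexpr(text, i)
            items.append(item)
    start = i                                       # an atom: read until a delimiter
    while i < len(text) and not text[i].isspace() and text[i] not in "()":
        i += 1
    return text[start:i], i

expr, _ = read_sexpr("(impl (impl p (impl q r)) (impl (impl p q) (impl p r)))")
print(expr)   # nested Python lists mirroring the sexpr structure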
The suggestion from this post violates both assumptions by introducing a 2D structure to code. To quote this post's examples, it requires the multiline string in
(fst-atom """ trd-atom)
00001
00002
00003
"""
to be fully read before TRD-ATOM. It also forces the reading function to jump up and down vertically in order to read the structure in
* ( )
* e ( ) ( )
* q m ( ) p ( ) *
u a a o a 2 *
l w *
It gets worse/better. Since Racket allows you to hook your own reader in front of (or in place of) the default reader, you can have things like 2D syntax:
Truth be told, you can intercept the reader in Common Lisp, too, and here it actually makes some sense since the 2D value is immediately visually grokkable as an ASCII-art table. The proposed 2D sexpr notation does not have this.
I’ve seen dozens of attempts to make S-Exp “better”, even the original M-Exp. I also did some experiments myself. But in the end, I come back to good ol’ S-Exp. It seems to be a maximum (or minimum) found just by chance.
Not sure if that example helps. You can make any programming language hard to read without some basic formatting. The way I would write the sexpr would be:
(impl
  (impl
    p
    (impl q r))
  (impl
    (impl p q)
    (impl p r)))
It's clear when each section begins and ends and doesn't require complex parsing rules.
Yes, it is indeed a risk. It is much better to only allow people "not of sound mind" to kill themselves the way they currently do - via brutal, gruesome, and unexpected methods, which often endanger others as well.
It's unbelievable that over the whole course of the PS3's lifespan we've gone from "we will never be able to emulate this at full speed, the hardware is too slow and the Cell architecture too alien" to "why is PS3 emulation so fast, optimizations explained". I've been loosely tracking various emulators' progress, and it's hats off to the ingenuity behind all of the mechanisms that make it possible to emulate things fast enough.
I don't think anyone with knowledge of emulation (from a research and development side) would say it's impossible. The Cell is a standard PPC core with some well-documented[1] coprocessors.
A more realistic response would be: "computing hardware would not be powerful enough to emulate the PS3 within its lifetime". We're now two decades out from its release, and a decade out from its final phase-out, so it seems that was a fair assessment.
Back in those days we didn't have that many cores, etc., so the raw computational power of the PS3 was an issue in itself, and the SPU was a kind of middle ground between shaders and CPUs, so you probably had to use a regular CPU to emulate it.
We have multicore machines with far more cores today so we can match the computation unit count and performance.
The other part was that older consoles (8- and 16-bit era) really needed a lot of cycle-exact emulation to not fail, and that requires CPUs an order of magnitude faster to emulate them properly. With CPUs hitting the GHz limits around the same time, we thought it'd be impossible to reach the level of cycle-exactness needed.
Luckily though, because the PS3 needed optimal multi-core programming, and the way to achieve maximum throughput was to use DMA channels to shuffle data between the CPU/SPU parts, emulator authors can probably use those channels as choke points to handle emulation at a slightly higher level and avoid trying to manage cycle-exact timings.
The nice thing about more recent architectures is that no one (including the original authors) can rely on cycle-exactness because of the basically unpredictable effects of caches and speculative execution and bus contention and (...).
Most of these, as a big exception, do not apply to the Cell running code on data in its local memory, but fortunately, as you say, it looks different from the perspective of other system components.
Sony learned their lesson from what happened with the PS1, where commercial emulators like Bleem became available during the product's lifetime. It was probably not a huge deal in terms of lost sales, but Sony really didn't like it, as evidenced by their lawsuit (which also failed).
The PS2 with its Emotion Engine was a huge leap which was pretty hard to emulate for a while. And the PS3 was even harder. Yes, the developers hated the Cell architecture, but overall, Sony managed to create a pretty good system which spawned incredible games, while also being so hard to emulate that it took over a decade to reach a point where it's done properly, and almost 20 years to reach a point where it's considered really fast.
Compare this to the Switch, which was being emulated pretty well from the get-go. This allowed some people to just make do with an emulator instead of buying the console (and the games). Actually, this goes for pretty much all Nintendo devices.
Sony didn't create the Cell architecture to prevent efficient emulation. At the time, manufacturers tried to get as much performance as possible from the manufacturing dollar, under the assumption that developers would optimize their games for the machine. It was actually a partial failure, as few third-party titles made full use of the architecture.
Kinda. In many respects, the PS3 SPUs that many hated were just the PS2 VUs taken to the next level, as the programming model was very similar (shuffle blocks of data via DMA to fast vector units).
As a former PS2 developer I mostly thought "cool, VUs with more memory".
Few games really utilized the PS2 to its fullest either (there's a port of GTA3 and GTA:VC to the older Dreamcast that's coming along very well).
The thing that really bit Sony here with the PS3 was that many PS2 titles (the PS2 GTA games being the prime example!) used the RenderWare engine (a few others were available, but it was the most popular afaik), so the complexity of the PS2 never really hit developers who were making games just below the absolute top tier.
When EA bought up RenderWare shortly before the PS3 release, they closed off new sales while honoring only existing licenses, so the most used cross-platform engine was suddenly off limits to most third parties for the PS3 (IIRC that's why Rockstar released that ping-pong game as an engine test before GTA 4 and 5).
And perceptions of third-party engines also took a hit, so not only was the most popular engine closed off, bigger developers also became wary of relying on third-party engines at all (during the PS3 period), until Unity later took off from indie usage.
What evidence is there that Sony designed their hardware to be hard to emulate?
As an aside:
The n64 is hard to emulate, and yet UltraHLE appeared right in the middle of its commercial life.
Angrylion brought a new paradigm to n64 emulation, which is "fuck it, Low Level Emulation is fully doable now", and then that incredibly successful work was ported to run as a GPU shader, where it works a million times better! Now even medium powered devices, like the Steam Deck, can run low level emulation of n64 games at upscaled resolution and never run into graphics bugs, have fewer other bugs, have great performance, etc.
Classic bugs like the Perfect Dark remote camera, which always had trouble on High Level Emulation plugins, are just gone, no tweaks required. Games that wrote their own microcode run with no trouble. The crazy shit Rare and Factor 5 did at the end of the console's lifecycle just works in emulators.
Notably, Modern Vintage Gamer released a video titled "Why is Nintendo 64 emulation still a broken mess in 2025", and to make that video he had to contrive dumb scenarios: running n64 emulation on the PlayStation Vita and a Raspberry Pi.
Efficient and accurate high level emulation of the n64 is just not possible. You can't properly emulate the complex interactions going on in the n64 without huge amounts of overhead; it's too interconnected. Angrylion and Parallel-n64 proved that, with that amount of overhead, you might as well do pixel-accurate low level emulation and just eliminate an entire class of emulation bugs. When Angrylion came out, around 2017, even a shitty laptop with a couple of cores could run n64 games pixel-accurate at native resolution and full speed.
In fact, in the above-mentioned video, MVG is complaining that "n64 emulation is a broken mess" because he is getting 50fps in Conker's Bad Fur Day on a Raspberry Pi, while running it upscaled. He's complaining that "n64 emulation is a broken mess" because the Raspberry Pi has a garbage GPU. Laptop integrated GPUs, even budget laptop integrated GPUs, have no problems with Parallel-n64.
High level emulation was always a crutch, and never a good approach for n64 emulation. Even in its heyday, it relied on per-game patches.
Notably, the Dolphin team ended up finding the same reality. The GameCube has a mostly fixed-function pipeline graphics system whose state can be updated at any time, a situation that does not translate at all to PC graphics systems that expect you to have individual shader programs to call with certain materials. What finally solved some serious emulator problems there was to write a giant shader that literally emulates the entire GameCube graphics hardware, and to use it while waiting for the specialized shader to compile. Ubershaders, they call it.
If we have a Google Graveyard [0], perhaps we need a Trump Policy one - showing how many days his various policies survived before being reverted or struck down by courts.
Maybe funding/motivation dried up after the 2020 election, but it really does seem like we need another site to catalog all the admin's policies/initiatives/EOs that have been backtracked, walked back, or blocked by judicial oversight over the years.