
"Version control systems", in case of AI, mean that their knowledge will stay frozen in time, and so their usefulness will diminish. You need fresh data to train AI systems on, and since contemporary data is contaminated with generative AI, it will inevitably lead to inbreeding and eventual model collapse.



Natural languages evolve so slowly that writing and editing rules for them is easily achievable even this way. Think years versus minutes.


Aight you win fam, I was trippin fr. You're absolutely bussin, no cap. Harvard should be taking notes.

(^^ alien language that was developed in less than a decade)


The existence of common slang that isn't used in the sort of formal writing grammar linting tools are typically designed to promote is more of a weakness of learning grammar from a weighted model of the internet (vs. formal grammatical rules) than a strength.

Not an insurmountable problem: ChatGPT will use "aight fam" only in context-sensitive ways, and will remove it if you ask it to rephrase to sound more like a professor. But RLHFing slang into predictable use is likely a bigger potential challenge than simply ensuring the word list of an open source program is sufficiently up to date to include slang whose etymology dates back to the noughties or nineties - if phrasing things in that particular vernacular is even a target for your grammar linting tool...


Huh, this is the first time I've seen "noughties" used to describe the first decade of the 2000s. Slightly amusing that it's surely pronounced like "naughties". I wonder if it'll catch on and spread.


‘Noughties’ was popular in Australia from 2010 onwards. Radio stations would “play the best from the eighties nineties noughties and today”.


Common in Britain too, also appears in the opening lines of the Wikipedia description for the decade and the OED.


The fact that you never saw it before suggests it did not catch on and spread during the last 25 years.


Pedantically,

aight, trippin, fr (at least the spoken version), and fam were all very common in the 1990s (which was the last decade I was able to speak like that without getting jeered at by peers).


I don't think anyone has the need to check such a message for grammar or spelling mistakes. Even then, I would not rely on an LLM to accurately track this "evolution of language".


What if you're writing emails to GenZers?


As a zoomer, I'd rather not receive emails that sound like they're written by a moron.


Attempting to write like a GenZ when you’re not gets you “hello fellow kids” and “Boomer” right away.


Yes, precisely. This "less than a decade" is orders of magnitude above the hours or days that it would take to manually add those words and idioms to proper dictionaries and/or write new grammar rules to accommodate things like dropping the "g" in continuous verbs to get "bussin" or "bussin'" instead of "bussing". Thank you for illustrating my point.

Also, it takes at most a few developers to write those rules into a grammar checking system, compared to the millions and more who need to learn a given piece of "evolved" language as it becomes impossible to avoid. Doing this manually is not only fast enough, it's also much less work-intensive and more scalable.


Not exactly. It takes time for those words to become mainstream for a generation. While you'd have to manually add those words to dictionaries, LLMs can learn them on the fly, based on frequency of usage.


At this point we're already using different definitions of grammar and vocabulary - are they discrete (as in a rule system, vide Harper) or continuous (as in a probability, vide LLMs). LLMs, like humans, can learn them on the fly, and, like humans, they'll have problems and disagreements judging whether something should be highlighted as an error or not.

Or, in other words: if you "just" want a utility that can learn speech on the fly, you don't need a rigid grammar checker, just a good enough approximator. If you want to check if a document contains errors, you need to define what an error is, and then if you want to define it in a strict manner, at that point you need a rule engine of some sort instead of something probabilistic.
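To make the distinction concrete, here's what I mean by a discrete rule - a minimal sketch in Common Lisp (purely illustrative; Harper itself is written in Rust). The text either violates the rule or it doesn't, no probabilities involved:

    ;; Hypothetical discrete rule: flag "a" before a vowel-initial word.
    ;; Deliberately naive - "a university" would be a false positive,
    ;; which is exactly the judgement call that rule authors face.
    (defun check-a-an (words)
      (loop for (current next) on words
            while next
            when (and (string-equal current "a")
                      (find (char next 0) "aeiou" :test #'char-equal))
              collect (list :error current next :suggestion "an")))

    (check-a-an '("I" "ate" "a" "apple"))
    ;; => ((:ERROR "a" "apple" :SUGGESTION "an"))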


I’m glad we have people at HN who could have eliminated decades of effort by tens of thousands of people, had they only been consulted first on the problem.


Which effort? Learning a language is something that can't be eliminated; everyone needs to do it on their own. Writing grammar checking software, though, can be done a few times and then copied.


Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly". You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?


> Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".

Languages are used to successfully communicate. To achieve this, all parties involved in the communication must know the language well enough to send and receive messages. This obviously includes messages that transmit changes in the language, for instance, if you tried to explain to your parents the meaning of the current short-lived meme and fad nouns/adjectives like "skibidi ohio gyatt rizz".

It takes time for a language feature to become widespread and de-facto standardized among a population. This is because people need to asynchronously learn it, start using it themselves, and gain critical mass, so that even people who do not like the feature need to start respecting its presence. This inertia is the main source of the slowness I mention, and also a requirement for any kind of grammar-checking software. From the point of view of such software, a language feature that (almost) nobody understands is not a language feature, but an error.

> You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?

Yes, that set of patterns is called a language grammar. Even dialects and slang have grammars of their own, even if they're different, less popular, have less formal materials describing them, and/or aren't taught in schools.


Fair enough, thanks for replying. I don't see the task of specifying a grammar as straightforward as you do, perhaps. I guess I just didn't understand the chain of comments.

I find that clear-cut, rigid rules tend to be the least helpful ones in writing. Obviously this class of rule is also easy/easier to represent in software, so it also tends to be the source of false positives and frustration that lead me to disable such features altogether.


When you do writing as a form of art, rules are meant to be bent or broken; it's useful to have the ability to explicitly write new ones and make new forms of the language legal, rather than wrestle with hallucinating LLMs.

When writing for utility and communication, though, English grammar is simple and standard enough. Browsing Harper sources, https://github.com/Automattic/harper/blob/0c04291bfec25d0e93... seems to have a lot of the basics already nailed down. Natural language grammar can often be represented as "what is allowed to, should, or should not, appear where, when, and in which context" - IIUC, Harper seems to tackle the problem the same way.
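For instance, a "should not appear" rule can be as small as this (again an illustrative Common Lisp sketch, not Harper's actual Rust):

    ;; Hypothetical positional rule: a word should not appear twice in
    ;; a row, as in "the the". Position and context are all that matter
    ;; here; there is no probability estimation involved.
    (defun check-repeated-word (words)
      (loop for (current next) on words
            while next
            when (string-equal current next)
              collect (list :error :repeated-word current)))

    (check-repeated-word '("tackle" "the" "the" "problem"))
    ;; => ((:ERROR :REPEATED-WORD "the"))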


I'm certainly not disputing the existence of grammar nor do I think an LLM is a good way to implement/check/enforce one. And now I realise how my first comment landed. Thanks again!


Your first point would be more fitting if a language checker needed a complete, computable grammar that can be parsed and understood. That would be problematic for natural languages.


You get it!!


Just because the rules aren’t set fully in stone, or can be bent or broken, doesn’t mean they don’t “exist” - perhaps not the way mathematical truths exist, but there’s something there.

Even these few posts follow innumerable “rules” which make it easier to (try) to communicate.

Perhaps what you’re angling against is where rules of language get set in stone and fossilized until the “Official” language is so diverged from the “vulgar tongue” that it’s incomprehensibly different.

Like church/legal Latin compared to Italian, perhaps. (Fun fact - the Vulgate translation of the Bible was INTO the vulgar tongue at the time: Latin).


    (fst-atom """   trd-atom frt-atom
      """     00001
      asdf    00002 """    fth-atom)
      qwer    00003 hahaha
      zxcv      """ hehehe
      """           hohoho
                    """
I'm not sure I'd like the above to be parseable.


Lisp programmer here.

Traditional S-expressions, by their definition, ignore most whitespace; additionally, reading sexprs is always a linear operation that never needs to backtrack by more than one character.
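As a sketch of that second property, here's a toy reader (assuming atoms are just delimiter-free strings; not the standard CL reader):

    ;; PEEK-CHAR provides the single character of lookahead that is
    ;; ever needed; nothing is ever read twice.
    (defun read-atom (stream)
      (with-output-to-string (out)
        (loop for char = (peek-char nil stream nil #\))
              until (member char '(#\( #\) #\Space #\Tab #\Newline))
              do (write-char (read-char stream) out))))

    (defun read-sexpr (stream)
      (case (peek-char t stream)                 ; skip whitespace, peek once
        (#\( (read-char stream)                  ; consume the open paren
             (loop until (eql (peek-char t stream) #\))
                   collect (read-sexpr stream)
                   finally (read-char stream)))  ; consume the close paren
        (t (read-atom stream))))

    ;; (with-input-from-string (s "(eq (mul a a) (pow a 2))")
    ;;   (read-sexpr s))
    ;; => ("eq" ("mul" "a" "a") ("pow" "a" "2"))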

The suggestion from this post violates both assumptions by introducing a 2D structure to code. To quote this post's examples, it requires the multiline string in

    (fst-atom """   trd-atom)
              00001
              00002
              00003
                """
to be fully read before TRD-ATOM. It also forces the reading function to jump up and down vertically in order to read the structure in

    * (                               )  
    *   e (           ) (           )    
    *   q   m (     )     p (     )     *
            u   a a       o   a 2       *
            l             w             *
The author also states that

    (eq (mul (a a)) (pow (a 2)))
is less readable than

    * (                                                  )  
    *   *eq* (                   ) (                   )    
    *          *mul* (         )     *pow* (         )     *
                       *a* *a*               *a* *2*       *
                                                           *
Then there's the ending passage:

> we hope that the introduced complexity is justified by the data readability expressed this way.

I cannot force myself to read this post as anything but a very poor Befungesque joke.


It gets worse/better. Since Racket allows you to hook your own reader in front of (or in place of) the default reader, you can have things like 2D syntax:

    #lang 2d racket
    (require 2d/match)
     
    (define (subtype? a b)
      #2dmatch
      ╔══════════╦══════════╦═══════╦══════════╗
      ║   a  b   ║ 'Integer ║ 'Real ║ 'Complex ║
      ╠══════════╬══════════╩═══════╩══════════╣
      ║ 'Integer ║             #t              ║
      ╠══════════╬══════════╗                  ║
      ║ 'Real    ║          ║                  ║
      ╠══════════╣          ╚═══════╗          ║
      ║ 'Complex ║        #f        ║          ║
      ╚══════════╩══════════════════╩══════════╝)
https://docs.racket-lang.org/2d/index.html


Truth be told, you can intercept the reader in Common Lisp, too, and here it actually makes some sense since the 2D value is immediately visually grokkable as an ASCII-art table. The proposed 2D sexpr notation does not have this.
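For reference, the standard hook is SET-MACRO-CHARACTER. A toy example that reads [1 2 3] as a vector - a real ASCII-art table reader would do much more work inside the function, but it plugs into the reader the same way:

    ;; Toy reader macro: read [a b c] as a vector.
    (set-macro-character #\[
      (lambda (stream char)
        (declare (ignore char))
        (coerce (read-delimited-list #\] stream t) 'vector)))

    ;; Make #\] a terminating delimiter, like #\).
    (set-macro-character #\] (get-macro-character #\)))

    ;; Now (read) turns "[1 2 3]" into #(1 2 3).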


That's amazing and terrible at the same time. I love it.


A normal tree would be easier to read

            eq
       mul      pow
     a    a    a   2


Turned 90, maybe?

  eq:
    mul:
      a
      a 
    pow:
      a
      2


That's the classical LISP way of doing it:

    (eq (mul a
             a)
        (pow a
             2))
or

    (eq
        (mul
             a
             a
        )
        (pow
             a
             2
        )
    )


x*x == pow(x,2)


We have a winner!

Actually, I'd suggest a slight improvement: x*x = x^2


x · x = x²


(== (* x x) (pow x 2))


(= * x x ^ x 2)


no yaml programming please :(


From the YAML inventor himself: https://yamlscript.org/

The lengths people will go to avoid Lisp, only to reinvent it, badly.


Yes that part must be a joke!

I’ve seen dozens of attempts to make S-Exps “better”, even the original M-Exps. I also did some experiments myself. But in the end, I come back to good ol’ s-exps. They seem to be a maximum (or minimum) found just by chance.


Here is another example, an axiom from propositional logic:

    (impl (impl p (impl q r)) (impl (impl p q) (impl p r)))
which, vertically indented in a transposed block, looks like this:

    * (                                               )
    *   i (               ) (                       )
    *   m   i p (       )     i (       ) (       )
        p   m     i q r       m   i p q     i p r
        l   p     m           p   m         m           *
            l     p           l   p         p           *
                  l               l         l           *
which, using transposed lines within the transposed block, finally looks like this:

    * (                                                                                           )
    *   *impl* (                               ) (                                              )   *
    *            *impl* *p* (                )     *impl* (                ) (                )     *
                              *impl* *q* *r*                *impl* *p* *q*     *impl* *p* *r*       *
This time I won't make any judgements. Could be good, could be bad, you decide.


Not sure if that example helps. You can make any programming language hard to read without some basic formatting. The way I would write the sexpr would be:

  (impl
    (impl 
       p 
       (impl q r))
    (impl
       (impl p q)
       (impl p r)))
It's clear when each section begins and ends and doesn't require complex parsing rules.


That looks clean, can't argue that.


Thanks for restoring my sanity. Was quite confused about the value added by the author.


Sorry for the confusion. I must be a very disturbed person because I kind of like what is explained there.


Here, I brought down the enthusiasm a bit in the closing word. I hope it creates less confusion now.


Yes, it is indeed a risk. It is much better to only allow people "not of sound mind" to kill themselves the way they currently do - via brutal, gruesome, and unexpected methods which often endanger others as well.

/s


It's unbelievable that over the whole course of PS3's lifespan we've gone from "we will never be able to emulate this at full speed, the hardware is too slow and the Cell architecture too alien" to "why is PS3 emulation so fast, optimizations explained". I've been loosely tracking various emulators' progress and it's hats off with regards to the ingenuity behind all of the mechanisms that make it possible to emulate things fast enough.


I don't think anyone with knowledge of emulation (from a research and development side) would say it's impossible. The Cell is a standard PPC core with some well-documented[1] coprocessors.

A more realistic response would be: "computing hardware would not be powerful enough to emulate the PS3 in its lifetime". We're now two decades out from its release, and a decade out from its final phase-out, so it seems that was a fair assessment.

1 - https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/a...


Back in those days we didn't have that many cores, etc., so the raw computational power of the PS3 was an issue in itself, and the SPU was a kind of middle ground between shaders and CPUs, so you probably had to use a regular CPU to emulate it.

We have machines with far more cores today, so we can match the PS3's computational unit count and performance.

The other part was that older consoles (8 and 16 bit era) really needed a lot of cycle-exact emulation to not fail, and that requires CPUs an order of magnitude faster to emulate properly; with CPUs hitting the GHz limits around the same time, we thought it'd be impossible to reach the level of cycle-exactness needed.

Luckily though, because the PS3 needed optimal multi-core programming, and the way to achieve maximum throughput was to use DMA channels to shuffle data between the CPU/SPU parts, emulator authors can probably use those channels as choke points, handle emulation at a slightly higher level, and avoid trying to manage cycle-exact timings.


The nice thing about more recent architectures is that no one (including the original authors) can rely on cycle-exactness because of the basically unpredictable effects of caches and speculative execution and bus contention and (...).

Most of these, as a big exception, do not apply to the Cell running code on data in its local memory, but fortunately, it's different as seen from other system components, as you say.


Sony learned their lesson from what happened with the PS1, where commercial emulators like Bleem became available during the product's lifetime. It was probably not a huge deal in terms of lost sales, but Sony really didn't like it, as evidenced by their lawsuit (which also failed).

The PS2 with its Emotion Engine was a huge leap that was pretty hard to emulate for a while. And the PS3 was even harder. Yes, the developers hated the Cell architecture, but overall, Sony managed to create a pretty good system which spawned incredible games, while also being so hard to emulate that it took over a decade to reach a point where it's done properly, and almost 20 years to reach a point where it's considered really fast.

Compare this to the Switch, which was being emulated pretty well from the get-go. This allowed some people to just make do with an emulator instead of buying the console (and the games). Actually, this goes for pretty much all Nintendo devices.


Sony didn't create the Cell architecture to prevent efficient emulation. At the time, manufacturers tried to get as much performance as possible out of the manufacturing dollar, under the assumption that developers would optimize their games for the machine. It was actually a partial failure, as few third-party titles made full use of the architecture.


Kinda; in many respects, the PS3 SPUs that many hated were just the PS2 VUs taken to the next level, as the programming model was very similar (shuffle blocks of data via DMA to fast vector units).

As a former PS2 developer, I mostly thought "cool, VUs with more memory".

Few games really utilized the PS2 to its fullest either (there's a port of GTA3 and GTA:VC to the older Dreamcast that's coming along very well).

The thing that really bit Sony here for the PS3 was that many PS2 titles (The PS2 GTA games being the prime example!) used the Renderware engine (a few others were available but it was the most popular afaik), so the complexity of the PS2 never really hit developers who were making games just below the absolute top tier.

When EA bought up Renderware slightly before the PS3 release, they closed off new sales while honoring existing ones only, so the most used cross-platform engine was suddenly off limits to most third parties for the PS3 (IIRC that's why Rockstar released that ping-pong game as an engine test before GTA 4 and 5).

And the perceptions about third-party engines also took a hit, so not only was the most popular engine closed off, bigger developers also became wary of relying on third-party engines at all (during the PS3 period), until Unity later took off from indie usage.


That is really interesting, thanks. I always wondered what happened to Renderware and why I stopped seeing it after the PS2.


What evidence is there that Sony designed their hardware to be hard to emulate? As an aside: the N64 is hard to emulate, and yet UltraHLE appeared right in the middle of its commercial life.


> Actually this goes for pretty much all Nintendo devices.

Roughly 30 years later and N64 emulation is not fully solved.


Fully solved how? It's in a great state.

Angrylion brought a new paradigm to n64 emulation, which is "fuck it, Low Level Emulation is fully doable now", and then that incredibly successful work was ported to run as a GPU shader, where it works a million times better! Now even medium powered devices, like the Steam Deck, can run low level emulation of n64 games at upscaled resolution and never run into graphics bugs, have fewer other bugs, have great performance, etc.

Classic bugs like the Perfect Dark remote camera, which always had trouble with high level emulation plugins, are just gone, no tweaks required. Games that wrote their own microcode run with no trouble. The crazy shit Rare and Factor 5 did at the end of the console's lifecycle just works in emulators.

https://www.libretro.com/index.php/category/parallel-n64/

Notably, Modern Vintage Gamer released a video titled "Why is Nintendo 64 emulation still a broken mess in 2025" and to make that video he had to contrive dumb scenarios: Running n64 emulation on the Playstation Vita and a Raspberry Pi.

Efficient and accurate high level emulation of the n64 is just not possible. You can't properly emulate the complex interactions going on in the n64 without huge amounts of overhead; it's too interconnected. Angrylion and Parallel-n64 proved that, with that amount of overhead, you might as well do pixel-accurate low level emulation and just eliminate an entire class of emulation bugs. When angrylion came out, around 2017, even a shitty laptop with a couple of cores could run n64 games pixel-accurate at native resolution and full speed.

In fact, on the Raspberry Pi that MVG is complaining about in the above-mentioned video, he claims "n64 emulation is a broken mess" because he is getting 50fps in Conker's Bad Fur Day. Because he is running it upscaled. He's complaining that "n64 emulation is a broken mess" because the Raspberry Pi has a garbage GPU. Laptop integrated GPUs, even budget ones, have no problems with parallel-n64.

High level emulation was always a crutch, and never a good approach for n64 emulation. Even in its heyday, it relied on per-game patches.

Notably, the Dolphin team ended up finding the same reality. The GameCube has a mostly fixed-function pipeline graphics system that can be reconfigured at any time, a situation that does not translate at all to computer graphics systems that expect you to have individual shader programs to call with certain materials. What finally solved some serious emulator problems was to write a giant shader that literally emulates the entire GameCube graphics hardware, and to use that while waiting for the specialized shader to compile. Ubershaders, they call it.


To be fair, PC CPUs and GPUs have evolved leaps and bounds from the beginning of PS3 emulation until today.


If we have a Google Graveyard [0], perhaps we need a Trump Policy one - showing how many days his various policies survived before being reverted or struck down by courts.

[0] https://killedbygoogle.com/


Kinda reminded me of another website, corrupt.af. It looks like the site went into 301 redirect mode sometime after Nov 2020 however (https://web.archive.org/web/20201105181227/https://corrupt.a...)

Maybe funding/motivation dried up after the 2020 election, but it really does seem like we need another site to catalog every administration's policies/initiatives/EOs that have been backtracked, walked back, or blocked by judicial oversight over the years.


It's certainly a crisis in the Lisp world; everyone seems to be writing fewer comments.


Any such leftover behavior is going to be a reportable and fixable bug then.


I'm not sure whether it's explicitly in the policy or if any team can decide what to do…


It isn't in policy yet, no.

https://wiki.debian.org/PrivacyIssues


It's not guaranteed that policies enforce every possible case though.

