What's Going on in Machine Learning? Some Minimal Models (stephenwolfram.com)
239 points by taywrobel 4 months ago | 70 comments



Say what you will about Wolfram: he's a brilliant writer and teacher. The way he's able to simplify complex topics without dumbing them down is remarkable. His visualizations are not only extremely helpful but usually also beautiful, and if you happen to have Mathematica on hand, you can easily reproduce what he's doing. Anytime someone asks me for a quick introduction to LLMs, I always point them to this article of his, which I still think is one of the best and most understandable introductions to the topic:

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...


“Entropy is the log of the number of states that a system can be in that are consistent with all the information known about that system”. He is amazing at explaining things.
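A concrete toy version of that definition (my own example, not Wolfram's): if all you know about 100 coin flips is that exactly 60 came up heads, the entropy is the log of the number of flip sequences consistent with that knowledge. In Python:

    import math

    # Toy illustration (not from the article): entropy as the log of the number of
    # microstates consistent with what we know about the system.
    n, k = 100, 60                               # all we know: 60 of 100 flips were heads
    consistent_states = math.comb(n, k)          # sequences with exactly 60 heads
    entropy_bits = math.log2(consistent_states)  # ~93.5 bits
    print(entropy_bits)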


Classic Wolfram — brilliant, reimplements / comes at a current topic using only cellular automata, and draws some fairly deep philosophical conclusions that are pretty intriguing.

The part I find most interesting is his proposal that neural networks largely work by “hitching a ride” on fundamental computational complexity, in practice sort of searching around the space of functions representable by an architecture for something that works. And, to the extent this is true, that puts explainability at fundamental odds with the highest value / most dense / best deep learning outputs — if they are easily “explainable” by inspection, then they are likely not using all of the complexity available to them.
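That "searching around" can be made quite literal. Here's a rough sketch of the flavor of thing the article describes (single-weight random mutation on a tiny net, keeping whatever lowers the loss); it's my own reconstruction in Python, not Wolfram's code, and the architecture and target function are arbitrary:

    import numpy as np

    # Hedged sketch: hill-climbing through the space of functions a small ReLU net
    # can represent, by mutating one randomly chosen weight at a time.
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 64)
    target = np.sin(3 * x)                                  # arbitrary target function

    W1, b1 = rng.normal(size=(16, 1)), np.zeros(16)
    W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)

    def f(inp):
        h = np.maximum(0, W1 @ inp[None, :] + b1[:, None])  # ReLU hidden layer
        return (W2 @ h + b2[:, None]).ravel()

    def loss():
        return np.mean((f(x) - target) ** 2)

    best = loss()
    params = [W1, b1, W2, b2]
    for step in range(20000):
        p = params[rng.integers(len(params))]
        idx = tuple(int(rng.integers(s)) for s in p.shape)
        old = p[idx]
        p[idx] += rng.normal(scale=0.1)                     # mutate one weight
        new = loss()
        if new <= best:
            best = new                                      # keep helpful mutations
        else:
            p[idx] = old                                    # revert harmful ones
    print("final mse:", best)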

I think this is a pretty profound idea, and it sounds right to me — it seems like a rich theoretical area for next-gen information theory: essentially, are there (soft/hard) bounds on certain kinds of explainability/inspectability?

FWIW, there’s a reasonably long history of mathematicians constructing their own ontologies and concepts and then people taking like 50 or 100 years to unpack and understand them and figure out what they add. I think of Wolfram’s cellular automata like this, possibly really profound, time will tell, and unusual in that he has the wealth and platform and interest in boosting the idea while he’s alive.


Agree. (D)NNs have a powerful but somewhat loose inductive bias. They're great at capturing surface-level complexity but often miss the deeper compositional structure. This looseness, in my opinion, stems from a combination of factors: architectures that are not optimally designed for the specific task at hand, limitations in computational resources that prevent us from exploring more complex and expressive models, and training processes that don't fully exploit the available information or fail to impose the right constraints on the fitting process.

The ML research community generally agrees that the key to generalization is finding the shortest "program" that explains the data (Occam's Razor / MDL principle). But directly searching for these minimal programs (architecture space, feature space, training space, etc.) is exceptionally difficult, so we end up approximating the search with something like GPR or circuit search guided by backprop.

This shortest-program idea is related to Kolmogorov complexity (which arises out of classical information theory) - i.e. the length of the most concise program that generates a given string (because if you're not operating on the shortest program, then there is looseness or overfit!). In ML, the training data is the string, and the learned model is the program. We want the most compact model that still captures the underlying patterns.
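As a toy illustration of the MDL idea (my own sketch, nothing from the article, and only a crude BIC-style approximation of the real thing): score each candidate model by (bits to describe the model) + (bits to describe the data given the model), and pick the minimum. Here the score bottoms out at the true polynomial degree of some noisy data:

    import numpy as np

    # Crude two-part MDL sketch: total description length = model bits + data bits.
    rng = np.random.default_rng(1)
    n = 50
    x = np.linspace(-1, 1, n)
    y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=n)  # quadratic + noise

    def description_length(degree):
        coeffs = np.polyfit(x, y, degree)
        rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
        model_bits = 0.5 * (degree + 1) * np.log2(n)   # BIC-style cost per parameter
        data_bits = 0.5 * n * np.log2(rss / n)         # Gaussian code length, up to constants
        return model_bits + data_bits

    for d in range(8):
        print(d, round(description_length(d), 1))
    # Degrees above the true one barely shrink data_bits but keep paying model_bits,
    # so the minimum lands at (or near) degree 2.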

(D)NNs have been super successful, but their reliance on approximations suggests there's plenty of room for improvement in terms of inductive bias and more program-like representations. I think approaches that combine the flexibility of neural nets with the structured nature of symbolic representations will lead to more efficient and performant learning systems. It seems like a rich area to just "try stuff" in.

Leslie Valiant touches on some of the same ideas in his book "Probably Approximately Correct", which tries to nail down some of the computational phenomena associated with the emergent properties of reality (it's heady stuff).


What is GPR?


Gaussian Process Regression (a form of Bayesian Optimisation to try and get to the right "answer"/parameter space sooner) - explained in some context here...

https://brendanhasz.github.io/2019/03/28/hyperparameter-opti...

That said, random param search still works well enough in many cases.
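For the curious, a minimal sketch of what GPR-guided hyperparameter search looks like, assuming scikit-learn; the `objective` here is a made-up stand-in for "train with this hyperparameter and return the validation loss", and the lower-confidence-bound acquisition is just one simple choice:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    # Hedged sketch: fit a GP to (hyperparameter, loss) pairs, then pick the next
    # trial where the model predicts low loss with an uncertainty bonus.
    def objective(log_lr):                       # stand-in for an expensive training run
        return (log_lr + 3.0) ** 2 + 0.1 * np.sin(5 * log_lr)

    rng = np.random.default_rng(0)
    X = rng.uniform(-6, 0, size=(3, 1))          # a few random initial trials
    y = np.array([objective(v[0]) for v in X])

    candidates = np.linspace(-6, 0, 200).reshape(-1, 1)
    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
        gp.fit(X, y)
        mean, std = gp.predict(candidates, return_std=True)
        nxt = candidates[np.argmin(mean - std)]  # lower-confidence-bound acquisition
        X = np.vstack([X, [nxt]])
        y = np.append(y, objective(nxt[0]))
    print("best log_lr found:", X[np.argmin(y)][0])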


> neural networks largely work by “hitching a ride” on fundamental computational complexity

If you look at what a biological neural network is actually trying to optimize for, you might be able to answer The Bitter Lesson more adeptly.

Latency is a caveat, not a feature. Simulating a biologically-plausible amount of real-time delay is almost certainly wasteful.

Leaky charge carriers are another caveat. In a computer simulation, you need never leak any charge (i.e. information) if you so desire. This would presumably make the simulation more efficient.

Inhibitory neurology exists to preserve stability of the network within the constraints of biology. In a simulation, resources are still constrained but you could use heuristics outside biology to eliminate the fundamental need for this extra complexity. For example, halting the network after a limit of spiking activity is met.

Learning rules like STDP may exist because population members' learned experiences cannot survive across generations. If you have the ability to copy the exact learned experiences from prior generations into new generations (i.e. cloning the candidates in memory), this learning rule may represent a confusing distraction more than a benefit.


"Classic Wolfram — brilliant, reimplements / comes at a current topic using only cellular automata, and draws some fairly deep philosophical conclusions that are pretty intriguing."

Wolfram has a hammer and sees everything as a nail. But it's a really interesting hammer.


> And, to the extent this is true, that puts explainability at fundamental odds with the highest value / most dense / best deep learning outputs — if they are easily “explainable” by inspection, then they are likely not using all of the complexity available to them.

Could you define explainability in this context?


The ability of humans, by inspection, to determine why a program is constructed in the way that it is vis-a-vis the goal/output.


hmm -- could you speak to the meaning of explanation here?

It seems you're talking about the ability of a researcher to diagnose the structure of the model?


> searching around the space of functions representable by an architecture for something that works

That’s… why we’re here?


> searching around the space of functions representable by an architecture for something that works

That’s… why we’re here? How else could we characterise what any learning algorithm does?


I can never read comments on any Wolfram blog on HN because they're always so mean-spirited. I'm seeing a nerdy guy explaining things from a cool new perspective that I'm excited to read through. The comments almost always have some lens against him for being 'self-centered' or obsessing about cellular automata (who cares, we all have our obsessions).


The complaint about his ego is warranted, but he also earned it. Wolfram earned his PhD in particle physics from Caltech at 21 years old. Feynman was on his thesis committee. He spent time at the IAS. When he speaks about something, no matter in which configuration he chooses to do so, I am highly inclined to listen.


Same here on anything Elon. HN is like an uncle who knows a lot and teaches you new things every time you hang out with him… but who also has a few really weird sore spots that you better never mention in his presence.


Agree - the whole "he's great" vs "he's evil and a con" just gets old.

Everyone is a complex mixture of both.

My dad loved reading and sharing technical subjects with me and is probably part of the reason why I enjoy a good career today.

He also cheated on my mom for 30 years, which we didn't discover until the last 3 years of his life. We didn't have much money growing up. He probably took her out to dinner with money we didn't have.

It's perfectly normal to both love and hate parts of someone without rejecting them as a whole.


Elon's not more evil than anyone else, he's just way dumber than everyone expected him to be when we were in the honeymoon phase.


Elon is unusual not in how much his fans adore him (and they do), but in the degree to which his haters hate him. I've seen plenty of places where people simply stating facts or talking about their own experiences get downvoted to invisibility.

Even points that you wouldn't think would be controversial, like "Tesla disrupted the EV industry that was previously only interested in building compliance cars for rich crunchy-granola city weirdos", are almost instantly shot down. Anything that isn't just outright hatred for Elon gets slammed to hell. It's almost impossible to have a balanced discussion about him or his companies.

I also don't think he's as dumb as people think. He has an eye for industries that are ripe for disruption and has actually managed to deliver at least twice so far. There is no question at all that SpaceX is the premier launch provider in the world today (I know I'm getting downvoted for saying this). Tesla sells about half of the EVs sold in the US and basically didn't exist 10 years ago. The jury is still out on the Boring company. Neuralink and Optimus are still too new to tell. Even Paypal is a good example of seeing a market gap and exploiting it. Twitter/X was the real stinker. Elon is exactly the wrong guy to be running a social media company, and worse he thinks he is winning by out-foxing Fox News. He's got that engagement algorithm brain that results when you chase bigger numbers. It is the biggest right wing black hole you could imagine and Elon himself has fallen right down the center. You couldn't ask for a more perfect radicalization system than he has built with the "pay for voice" scheme with basically no bot protection.


Agree about SpaceX, but I'm not sure how much of that is about Elon. Tesla used to be awesome, its nascency is what I was referring to as the "honeymoon phase," but it definitely feels like it's gone downhill with the Cybertruck goofiness. Like it feels like it has some awesome engineers that do the cool stuff, and then you have Elon interfering from the top and injecting his goofy ass ideas while everyone else is trying to make the company work. Feels like Tesla is like, super cool engineering and then oh yeah there's Elon over there in the corner playing with his toys and we try to keep him from messing things up too much. Every Tesla engineer I've heard from echoes this, he springs random requirements on them and they often learn of new product requirements or features from his Twitter posts and then they're in panic mode trying to implement whatever half-baked stupid idea he had on the toilet at 3 AM.

Most of your last paragraph about Twitter is what has caused me to think he's dumb, in spite of having a few early successes which may be attributable to survivor bias (you have millions of people taking random risks, some of them are going to pan out randomly, and after the first one you have money so it's easy to make more money). Calling the rescuer guy a "pedo guy" during the crisis in Thailand and just myriad other inane utterances have tanked his valuation in my eyes.

But I think I might agree with you that "dumb" is the wrong word. Perhaps "unwise" is what I'm feeling. He may have skills and intelligence to apply those skills to start businesses or make money or whatever. But he doesn't apply that intelligence to the end that we might call wisdom, and I think that's what I as well as a lot of other people are trying to articulate when they say he's dumb. Agree with you though, people aren't black and white.


Maybe "poor emotional intelligence". Like you said, a person can plausibly get lucky on their first business and make it big once. But to do it repeatedly and to such a large degree takes skill. Say what you will about his politics, but his companies deliver. Even Hyperloop, which is basically just a bad subway, still has more buildout than pretty much any subway system in the US in the past decade. That might just be Elon willing to lose a ton of money on it to get it built though.

Might be interesting to compare and contrast the Las Vegas Hyperloop vs. the Las Vegas Monorail. Which is more of a boondoggle?


I was with ya until the hyperloop example. Hyperloop was pushed for years as a vacuum-tube train. Serious prototypes of this were developed. If it had worked out absolutely perfectly, it would still have lacked the throughput to be a good idea, but the prevalence of earthquakes in its target area, the inherent instability of a vacuum tube, the massive intended scale of the system, the dubious economics of digging a huge tunnel, etc. all made it an absolute shitfest of an idea. They then came up with a new idea, which was a one-lane tunnel without a safety walkway, which is utterly useless for passenger transport because it has, at best, the throughput of a two-lane highway. The hyperloop is probably Elon's most embarrassing failure after Twitter.


I have observed that the topic naturally gravitates towards mentioning Elon in every thread where people are confessing their infatuation with grifters.


There should be a Godwin’s Law for Stephen Wolfram. Wolfram’s Law: as the length of what he’s saying increases, the probability it will be about cellular automata approaches 1.

That being said, I’m enjoying this. I often experiment with neural networks in a similar fashion and like to see people’s work like this.


There's a Wolfram Derangement Syndrome to go along with it, too: https://news.ycombinator.com/item?id=38975876


...and the probability that he names something after himself approaches 1/e.


>Instead what seems to be happening is that machine learning is in a sense just “hitching a ride” on the general richness of the computational universe. It’s not “specifically building up behavior one needs”; rather what it’s doing is to harness behavior that’s “already out there” in the computational universe.

Is this similar to the lottery ticket hypothesis?

Also the visualizations are beautiful and a nice way to demonstrate the "universal approximation theorem"
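On the universal-approximation point, a quick sketch (mine, not from the article) of how little machinery it takes: one hidden layer of fixed random ReLU features plus a least-squares readout already approximates a smooth 1-D function reasonably well:

    import numpy as np

    # Sketch: random ReLU features + linear readout approximating sin(x).
    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200)
    target = np.sin(x)

    hidden = 50
    W = rng.normal(size=hidden)                          # random input weights (kept fixed)
    b = rng.uniform(-np.pi, np.pi, size=hidden)          # random biases (kept fixed)
    features = np.maximum(0, np.outer(x, W) + b)         # (200, 50) ReLU feature matrix
    readout, *_ = np.linalg.lstsq(features, target, rcond=None)

    approx = features @ readout
    print("max abs error:", np.max(np.abs(approx - target)))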


I find it depressing that every time Stephen Wolfram wants to explain something, he slowly gravitates towards these simplistic cellular automata and tries to explain everything through them.

It feels like a religious talk.

The presentation consists of chunks of hard-to-digest, profound-sounding text followed by a supposedly informative picture with lots of blobs, then the whole pattern is repeated over and over.

But it never gets to the point. There is never an outcome, never a summary. It is always some sort of patterns and blobs that are supposedly explaining everything ... except nothing useful is ever communicated. You are supposed to "see" how the blobs are "everything..." a new kind of Science.

He cannot predict anything; he cannot forecast anything; all he does is use Mathematica to generate multiplots of symmetric little blobs and then suggest that those blobs somehow explain something that currently exists.

I find these Wolfram blogs a massive waste of time.

They are boring to the extreme.


I think that unless Wolfram is directly contradicting the Church-Turing thesis it is ok to skip over the finite automata sections.

It is a given from Church-Turing that some automata will be equivalent to some Turing machines, and while it is a profound result, the specific details of the equivalence aren't super important unless, perhaps, it becomes super fast and efficient to run the automata instead of a von Neumann architecture.


Got me feeling self conscious here.

I often explain boring things with diagrams consisting of boxes and arrows, sometimes with different colours.


Because of the computational simplicity, I think there's a possibility that we will discover very cheap machine learning techniques that are discrete like this.

I think this is novel. (I've seen BNNs: https://arxiv.org/pdf/1601.06071. That approach actually makes things continuous for training, but if inference is sufficiently fast and you have an effective mechanism for permutation, training could be faster using that.)

I am curious what other folks (especially researchers) think. The takes on Wolfram are not always uniformly positive but this is interesting (I think!)
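One reason the discrete direction is tempting, sketched in Python (my own toy, not from the article or the BNN paper): if weights and activations are constrained to {-1, +1}, a dot product packed into machine words reduces to XOR plus popcount, which CPUs do extremely fast:

    import numpy as np

    # Toy sketch of 1-bit inference arithmetic: dot product of two +/-1 vectors
    # via bit packing, XOR, and popcount.
    def pack(signs):                     # pack a +/-1 vector into one Python int
        word = 0
        for i, s in enumerate(signs):
            if s > 0:
                word |= 1 << i
        return word

    n = 64
    rng = np.random.default_rng(0)
    a = rng.choice([-1, 1], size=n)
    b = rng.choice([-1, 1], size=n)

    mismatches = bin(pack(a) ^ pack(b)).count("1")   # XOR + popcount
    dot = n - 2 * mismatches                         # agreements minus disagreements
    print(dot, int(a @ b))                           # the two values agree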


So, the thing is that linear algebra operations are very cheap already... you just need a lot of them. Any other 'cheap' method is going to have a similar problem: if the unit is small and not terribly expressive, you need a whole lot of them. But it will be compounded by the fact that we don't have decades of investment in making these new atomic operations as fast and cheap as possible.

A good take-away from the Wolfram writeup is that you can do machine learning on any pile of atoms you've got lying around, so you might as well do it on whatever you've got the best tooling for - right now this is silicon doing fixed-point linear algebra operations, by a long shot.


My take is that the neural network is a bit of a red herring -- people poked around in brains to see what was going on and noticed a network structure with many apparently simple computing nodes. So they tried making similar structures in software and quickly discovered they could do some interesting things. But it may turn out that the neural network was just nature's best implementation for "field programmable matrix manipulation". You can implement the functionality in other ways, not resembling neural networks.


I think the point of Wolfram's essay is that you don't need the base unit of computation to be a dot product.


Sort of, yes. But if the existing thing were "the cheapest", quantization wouldn't exist.

It depends on what your constraint is! So if you're memory constrained (or don't have a GPU), a bunch of 1 bit atoms with operations that are very fast on CPU might be better

I haven't thought very deeply about whether it's provably faster to do gradient descent on 32 bits vs 8, but it probably always is. What's the next step to speed up training?


But to your point, that is how I feel about graph NNs vs transformers or fully connected nets (GPUs are so good at transformers and fully connected NNs that even if there is a structure that makes sense, we don't have the hardware to make it make sense... unless grok makes it cheap??)


Perhaps; in a lot of cases the architecture barely matters. Transformers took a lot of extra tricks to get working well; the ConvNext paper showed that applying those same tricks to convolutional networks can fully close the gap.

https://arxiv.org/abs/2201.03545


Tsetlin machines have been around for some time:

https://en.wikipedia.org/wiki/Tsetlin_machine

They are discrete, individually interpretable, and can be configured into complicated architectures.


This looks like it might be interesting or might not, and I wish it said more in the article itself about why it's cool rather than listing technicalities and types of machines. Do you have a favorite pitch in those dozens of references at the end?


https://www.literal-labs.ai/

These guys are trying to make chips for ML using Tsetlin machines...


For a wiki article, this seems almost deliberately obtuse. What actually is one, in plain language?


>All one will be able to say is that somewhere out there in the computational universe there’s some (typically computationally irreducible) process that “happens” to be aligned with what we want.

>There’s no overarching theory to it in itself; it’s just a reflection of the resources that were out there. Or, in the case of machine learning, one can expect that what one sees will be to a large extent a reflection of the raw characteristics of computational irreducibility

Strikes me as a very reductive and defeatist take that flies in the face of the grand agenda Wolfram sets forth.

It would have been much more productive to chisel away at it to figure out something rather than expecting the Theory to be unveiled in full at once.

For instance, what I learn from the kinds of playing around that Wolfram does in the article is: neural nets are but one way to achieve learning & intellectual performance, and even within that there are a myriad different ways to do it, but most importantly: there is a breadth vs depth trade-off, in that neural nets being very broad/versatile are not quite the best at going deep/specialised; you need a different solution for that (e.g. even good old instruction set architecture might be the right thing in many cases). This is essentially why ChatGPT ended up needing Python tooling to reliably calculate 2+2.


> ChatGPT ended up needing Python tooling to reliably calculate 2+2.

This is untrue. ChatGPT very reliably calculates 2+2 without invoking any tooling.


Nit: it predicts that it is the token '4'.

Token frequency in the pre-training corpus and the way tokenization is implemented impact arithmetic proficiency for LLMs.

OpenAI calls this out in the GPT4 technical report.
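You can look at the tokenization part directly. This assumes the `tiktoken` package and the `cl100k_base` encoding; splits vary by model, so treat it as illustrative only:

    # Illustrative only: how a tokenizer may split numeric strings into sub-word pieces,
    # which is one reason digit-level arithmetic is awkward for a next-token predictor.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["2+2=", "123456789", "0.1 + 0.2"]:
        ids = enc.encode(text)
        print(text, "->", [enc.decode([i]) for i in ids])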


You can see this by giving it broken code and seeing what it can predict.

I gave Copilot a number of implementations of factorial with the input of 5. When it recognized the correct implementations, it was able to combine the ideas of "factorial", "5", and "correct implementation" to output 120. But when I gave it buggy implementations, it could recognize they were wrong, yet the concepts of "factorial", "5", and "incorrect implementation" weren't enough for it to output the (wrong) result the buggy code would actually produce. Even when I explained that its attempt to calculate the wrong output was itself wrong, it couldn't 'calculate' the right answer.


This makes very little sense (as opposed to: ChatGPT predicted that the likely continuation of "factorial" and "5" is 120).

Perhaps if you are able to share the chat session, it will be possible to see whether you confused the issue with the various factorial implementations, or got ChatGPT to run your code with 5 as input?

I mean the code is redundant:

https://chatgpt.com/share/be249097-5067-4e3d-93c7-3eebedb510...


Do a Google search with 'before:2020' on that code; that is recall from pre-training, not 'calculating'.


I misread the GP's comment; we're in agreement.


Sure, but I think you get my point.


I believe that this is one of the key takeaways for reasoning about LLMs and other seemingly-magical recent developments in AI:

"tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought."

It hurts one's pride to realize that the specialized thing they do isn't quite as special as was previously thought.


Computers still aren’t writing essays. They are stringing words together using copied data.

If they were writing essays, I would suggest that it wouldn’t be so ridiculously easy to pick out the obviously AI articles everywhere.


> They are stringing words together using copied data.

Which is what we will eventually realize is what humans are doing too.


It absolutely is NOT what humans are doing.

When humans write, they are serializing thoughts. Humans (well, most of us; certainly not AI enthusiasts) are reasoning and thinking.

When AI writes, it is following a mathematical pathway to string words together that it has seen together before in the given context.


When an LLM solves a novel problem, it's also reasoning, unless you use some contrived definition of the word "reasoning" that doesn't match how the word is actually used in normal conversation. Also I fully expect the human brain to be encoded in a mathematical model.

And if it wasn't obvious, an LLM can string together two words that it has never seen together in the training dataset. It really shows how people tend to oversimplify the extremely complex dynamics by which these models operate.


No. It is the AI enthusiasts that use contrived, often entirely random definitions of “reasoning”.

AIs do not “think” in any capacity and are therefore incapable of reasoning. However, if you wish to take “thinking” out of the definition, where we allow an AI to try its hand at “novel (for it)” problems, then AIs fail the test horrifically. I agree, they will probably spit something out and sound confident, but sounding confident is not being correct, and AIs tend to not be correct when something truly new to them is thrown at them. AIs spit out straight incorrect answers (colloquially called “hallucinations” so that AI enthusiasts can downplay the fact that it is factually wrong) for things that an AI is heavily trained on.

If we train an AI on what a number is, but then slap it with 2+2=5 long enough, it will eventually start to incorrectly state that 2+2=5. Humans, however, due to their capacity to actually think and reason, can confidently tell you, no matter how much you beat them over the head, that 2+2 is 4, because that's how numbers work.

Even if we somehow get a human to state that 2+2=5 as an actual thought pattern, they would be capable of reasoning out the problem the moment we start asking “what about 2+3?”, whereas an AI might make the connection, but no amount of forward thinking will resolve the issue.


These arguments really make no sense: "it can't reason because it can't think". Very comprehensive, haha.

Also, if you train an AI on how numbers work, like humans do in school, you can tell it as much as you want that 2+2=5 and it won't believe it, just like humans. It's obvious.


> When an LLM solves a novel problem, it's also reasoning, unless you use some contrived definition of the word "reasoning" that doesn't match how the word is actually used in normal conversation. Also I fully expect the human brain to be encoded in a mathematical model.

Depends on what your definition of a novel problem is. If it's some variation of a problem that has already been seen in some form in the training data, then yes. But if you mean a truly novel problem—one that humans haven't solved in any form before (like the Millennium Problems, a cancer cure, new physics theories, etc.)—then no, LLMs haven't solved a single problem.

> And if it wasn't obvious, an LLM can string together two words that it had never seen together in the training dataset, it really shows how people tend to simplify the extremely complex dynamics by which these models operate.

Well, for anyone who knows how latent space and attention work in transformer models, it's pretty obvious that they can be used together. But I guess for someone who doesn't know the internals, this could seem like magic or reasoning.


>then no, LLMs haven't solved a single problem.

Using your definition of a novel problem, do most people solve novel problems? If so, give me an example of a novel problem you have solved.


Sure, I did—a lot of them. These are the ones that were not in my training dataset in any form before I solved them, as it's impossible for a human to hold all scientific papers, historical facts, and in general, the entirety of human knowledge and experiences from the entire internet in their brain.


You haven't given me a concrete example.


>They are stringing words together using copied data

Ah yes and image generators are just rearranging stolen pixels.


> But now we get to use a key feature of infinitesimal changes: that they can always be thought of as just “adding linearly” (essentially because ε² can always be ignored relative to ε). Or, in other words, we can summarize any infinitesimal change just by giving its “direction” in weight space

> a standard result from calculus gives us a vastly more efficient procedure that in effect “maximally reuses” parts of the computation that have already been done.

This partially explains why gradient descent became mainstream.
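A tiny worked version of that "reuse" (my own sketch, not from the article): for a chain y = w3*tanh(w2*tanh(w1*x)), one backward sweep over the cached forward values gives the derivative with respect to every parameter, whereas finite differences would need a fresh forward pass per parameter:

    import numpy as np

    # Sketch of reverse-mode differentiation reusing cached intermediates.
    w = np.array([0.5, -1.3, 2.0])
    x = 0.7

    # forward pass, caching intermediates
    h1 = np.tanh(w[0] * x)
    h2 = np.tanh(w[1] * h1)
    y = w[2] * h2

    # backward pass: one sweep, reusing h1 and h2
    dy_dw3 = h2
    dy_dh2 = w[2]
    dy_dw2 = dy_dh2 * (1 - h2**2) * h1
    dy_dh1 = dy_dh2 * (1 - h2**2) * w[1]
    dy_dw1 = dy_dh1 * (1 - h1**2) * x

    # check against finite differences (one extra forward pass *per* parameter)
    def forward(w):
        return w[2] * np.tanh(w[1] * np.tanh(w[0] * x))

    eps = 1e-6
    for i, analytic in enumerate([dy_dw1, dy_dw2, dy_dw3]):
        wp = w.copy(); wp[i] += eps
        print(analytic, (forward(wp) - forward(w)) / eps)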


This article does a good job laying the foundation of why I think homoiconic languages are so important, and why doing AI in languages that aren't is doomed to stagnation in the long term.

The acrobatics that Wolfram can do with the code and his analysis are awesome, and doing the same without homoiconicity and metaprogramming makes my poor brain shudder.

Do note, Wolfram Language is homoiconic, and I think I remember reading that it supports Fexprs. It has some really neat properties, and it's a real shame that it's not Open Source and more widely used.


I'd be curious to see an example of what you are talking about wrt his analysis here.


I don't know how to express my thoughts coherently in such a small space and time, but I will try. There isn't "one" example.

----------

Almost all the code and its display is some form of meta-programming. Stephen Wolfram is literally brute-forcing/fuzzing all combinations of "code".

    - Permuting all the different rules/functions in a given scope
    - evolutionary adapting/modifying them
    - graphing and analyzing those structures
    - producing the HTML for display
I get that "normal machine learning" is also permuting different programs. But it's more special when you are using the same language for the whole stack. There is a canyon that you have to cross without homoiconicity (granted, I don't know exactly how Wolfram generated and analyzed everything here, but I have used his language before, and I see the hallmarks of it).

I can't really copy and paste an example for you, because plaintext struggles. Here is an excerpt with some fanciness in it:

   And as an example, here are the results of the forward and backward methods for the problem of learning the function f[x] = <graph of the function>  , for the “breakthrough” configurations that we showed above:
You might see "just" a small .png interspersed in plain text. The language and runtime itself has deep support for interacting with graphics like this.

The only other systems that I see that can juggle the same computation/patterns around like this are pure object oriented systems like Smalltalk/Pharo. You necessarily need first class functions to come even close to the capability, but as soon as you want to start messing with the rules themselves, you need some sort of term re-writing, lisp macro, or fexpr (or something similar?).

Don't get me wrong, you can do it all "by hand" (with compiler or interpreter help), you can generate the strings or opcodes for a processor or use reflection libraries, generate the graphs and use some HTML generator library to stitch it all together. But in the case of this article, you can clearly see that he has direct command over the contents of these computations in his Wolfram Language compared to other systems, because it's injected right into his prose. The outcome here can look like Jupyter labs or other notebooks. But in homoiconic languages there is a lot more "first-class citizenry" than you get with notebooks. The notebook format is just something that can "pop out" of certain workflows.

If you try to do this with C++ templates, Python Attribute hacking, Java byte-code magic... like... you can, but it's too hard and confusing, so most people don't do it. People just end up creating specific DSLs or libraries for different forms of media/computations, with templating smeared on top. Export to a renderer and call it a day -> remember to have fun designing a tight feedback loop here. /s

Nothing is composable, and it makes for very brittle systems as soon as you want to inject some part of a computation into another area of the system. It's way, way overspecified.

Taking the importance of homoiconicity further, when I read this article I just start extrapolating, moving past xor or "rule 12", and applying these techniques to symbolic logic, like the Tsetlin machine referenced in another part of this thread: https://en.wikipedia.org/wiki/Tsetlin_machine

Or using something like miniKanran: https://en.wikipedia.org/wiki/MiniKanren

It seems to me that training AI on these kinds of systems will give them far more capability in producing useful code that is compatible with our systems because, for starters, you have to dedicate fewer neuronal connections to syntax parsing with a grammar that is actually fundamentally broken and ad hoc. But I think there are far deeper reasons than just this.

----------

I think it's so hard to express this idea because it's like trying to explain why having arms and legs is better than not. It's applied to every part of the process of getting from point A to point B.

Also, addendum, I'm not 100% sure homoiconicity is "required" per se. I suppose any structured and reversible form of "upleveling" or "downleveling" logic that remains accessible from all layers of the system would work. Even good ol' Lisp macros have hygiene problems, which can be solved, e.g. by Racket's syntax-parse.


It is interesting to see the type of analysis he does and the visualizations are impressive, but the conclusions don't really seem too surprising. To me, it seems the most efficient learning algorithm will not be simpler but rather much more complex, likely some kind of hybrid involving a multitude of approaches. An analogy here would be looking at modern microprocessors -- although they have evolved from some relatively simple machines, they involve many layers of optimizations for executing various types of programs.


Is a TL;DR available, or at least some of the ideas covered? Because after 3 paragraphs it seems to be the good old "it is actually something resembling cellular automata" post by Wolfram.


Wolfram explains the basic concepts of neural networks rather well, I think. He trains and runs a perceptron at the beginning and then a simpler network. Then he delves into replacing the continuous functions they are built from with discrete binary ones, and ends up with cellular automata that he thinks emulate neural networks and their training process.

While this surely looks interesting, all the „insight“ he obtains into the original question of how exactly networks learn is that trained networks do not seem to come up with a simple model they use to produce the output we observe, but rather find one combination of parameters, somewhere in a vast state space, that happens to reproduce a target function. There are multiple possible solutions that work equally well, so perhaps the notion of networks generalizing training data is not quite accurate (?).

Wolfram links this to „his concept“ of „computational irreducibility“ (which I believe is just a consequence of Turing-completeness) but does not give any novel strategies for understanding trained machine models or for doing machine learning in any better way using discrete systems. Wolfram presents a fun but at times confusing exercise in discrete automata and unfortunately does not apply the mathematical rigor needed to draw deep conclusions about his subject.


We know there are infinitely many solutions; it is just hard to find a specific configuration of parameters. The question is whether all those configurations generalize or not.


Wow- Wolfram "invented" cellular automata, neural nets, symbolic algebra, physics and so much more.



