Hacker News | tyronehed's comments

So, you oppose their desire to diversify their funding? NPR gets 10% of its funding from tax dollars. The people you're harming are small rural public radio stations in red areas.


archive.org. Look for the audiobook "Mawson's Will: The Greatest Survival Story Ever Told".


Especially if they are all me-too copies of a Transformer.

When we arrive at AGI, you can be certain it will not contain a Transformer.


I don't think architecture matters. It seems to be more a function of the data somehow.

I once saw a LessWrong post claiming that the Platonic Representation Hypothesis doesn't hold when you only embed random noise, as opposed to natural images: http://lesswrong.com/posts/Su2pg7iwBM55yjQdt/exploring-the-p...


It's a bit like saying algorithms don't matter for solving computational problems. Two different algorithms might produce equivalent results, but if you have to wait years for an answer when seconds matter, the slow algorithm isn't helpful.

I believe the current approach of using a mostly feed-forward pass at the inference stage, with well-filtered training data and backpropagation for discrete "training cycles", has limitations. I know this has been tried in the past, but something modelling how animal brains actually function, with continuous feedback and no explicit "training" (we're always being trained), might be the key.
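A toy sketch of the contrast I mean (my own illustration; the learning rule and numbers are stand-ins, not a proposal): an online learner that updates on every single observation, interleaving acting and learning instead of separating them into train and inference phases:

    import random

    # Toy online learner: a single linear unit that updates on every
    # observation, with no separate train/infer phases. Hypothetical
    # example; the comment above names no concrete algorithm.
    weights = [0.0, 0.0]
    lr = 0.1

    def predict(x):
        return sum(w * xi for w, xi in zip(weights, x))

    def observe(x, target):
        # Continuous feedback: every prediction is immediately
        # followed by a weight update (the LMS rule).
        error = target - predict(x)
        for i, xi in enumerate(x):
            weights[i] += lr * error * xi

    for _ in range(5000):  # in principle an endless stream
        x = [random.random(), random.random()]
        observe(x, 3 * x[0] - 2 * x[1])  # stand-in for environmental feedback

    print(weights)  # approaches [3.0, -2.0]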

Unfortunately our knowledge of what's really going on in the brain is still limited; investigative methods are crude, as the brain is difficult to image at the resolution we need, and in real time. Last I checked, no one has quite figured out how memory works, for example: whether it's "stored in the network" somehow through feedback (like an SR latch or flip-flop in electronics), or whether there's some underlying chemical process within the neuron itself (we know that chemicals definitely regulate brain function; we don't know how much it goes the other way, or whether chemistry can be used to encode state).


> I don't think architecture matters. It seems to be more a function of the data somehow.

of course it matters

if I supply the ants in my garden with instructions on how to build tanks and stealth bombers, they're still not going to be able to conquer my front room


As soon as you need to start leaning heavily on error correction, that is an indication that your architecture and solution are not correct. The final solution will need to be elegant and very close to a perfect solution immediately.

You must always stay close to the only known example we have of an intelligence: the human brain. As soon as you start to wander away from the way the human brain does it, you are on your own, no longer relying on a known example of intelligence. Certainly something else might be possible, but since there's only one known example of intelligence in this universe, it seems ridiculous to do anything but stick close to that example.


This is actually a lazy approach as you describe it. Instead, what is needed is an elegant and simple approach that is 99% of the way there out of the gate. As soon as you start doing statistical tweaking and overfitting models, you are not approaching a solution.


In a way, yes. For models in physics that should make you suspicious, since most of the famous and useful models we have found are simple and accurate. For general intelligence, or even multimodal pattern matching, however, there's no guarantee there's an elegant architecture at the core. Elegant models in social sciences like economics and sociology, and even in fields like biology, tend to be hilariously off.


When an architecture is based around world-model building, it is a causal outcome that similar concepts and things end up being stored in similar places. They overlap. As soon as your solution starts to get mathematically complex, you are departing from what the human brain does. I'm not saying a statistical intelligence might not be possible in some universe, but when you go in that direction you are straying from the only existing intelligence that we know about: the human brain. So the best solutions will closely echo neuroscience.


Since this exposes the answer, the new architecture has to be based on world-model building.


The thing is, this has been known since even before the current crop of LLMs. Anyone who considered language (and English only, at that) to be sufficient to model the world understands so little about cognition as to be irrelevant to this conversation.


The alternative architectures must learn from streaming data, must be error tolerant, and must have the characteristic that similar objects or concepts naturally come near to each other. They must naturally overlap.
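One concrete reading of "similar things naturally come near each other" (my illustration, not a claim about any particular system; the 3-d vectors are made up) is proximity in an embedding space:

    import math

    # Hypothetical 3-d "embeddings"; real systems learn these from data.
    vectors = {
        "cat": [0.9, 0.8, 0.1],
        "dog": [0.85, 0.75, 0.2],
        "car": [0.1, 0.2, 0.9],
    }

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    # "cat" and "dog" overlap far more than "cat" and "car".
    print(cosine(vectors["cat"], vectors["dog"]))  # ~0.996
    print(cosine(vectors["cat"], vectors["car"]))  # ~0.30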


Any transformer-based LLM will never achieve AGI, because it's only trying to pick the next word. You need a larger amount of planning to achieve AGI. Also, the characteristics of LLMs do not resemble any existing intelligence that we know of. Does a baby require 2 years of statistical analysis to become useful? No. Transformer architectures are parlor tricks. They are a glorified Google, but they're not doing or planning anything. If you want that, then you have to base your architecture on the known examples of intelligence that we are aware of in the universe. And that's not a transformer. In fact, whatever AGI emerges will absolutely not contain a transformer.
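By "picking the next word" I mean a loop like this (a deliberately tiny sketch; the hard-coded bigram table is invented and stands in for a trained model):

    # Toy next-word picker: a hard-coded bigram table stands in for a
    # trained model (invented data, purely for illustration).
    bigram = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 0.9, "ran": 0.1},
        "sat": {"down": 1.0},
        "ran": {"away": 1.0},
    }

    def generate(prompt, steps=4):
        words = prompt.split()
        for _ in range(steps):
            dist = bigram.get(words[-1])
            if dist is None:
                break
            # Greedy: take the single most probable next word; nothing
            # in this loop looks further ahead than one step.
            words.append(max(dist, key=dist.get))
        return " ".join(words)

    print(generate("the"))  # "the cat sat down"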


It's not just about picking the next word here; that doesn't at all refute whether Transformers can achieve AGI. Words are just one representation of information. And whether it resembles any intelligence we know is also not an argument, because there is no reason to believe that all intelligence must be based on anything we've seen (e.g. us, or other animals). The underlying architecture of attention and MLPs can surely still yield something we could call an AGI, and on certain tasks it can arguably be considered one already. I also don't know for certain whether we will hit any roadblocks or architectural asymptotes, but I haven't come across any well-founded argument that Transformers definitely could not reach AGI.


The transformer is a simple and general architecture. Being such a flexible model, it needs to learn "priors" from data; it makes few assumptions about the data distribution from the start. The same architecture can predict protein folding and fluid dynamics. It's not specific to language.

We, on the other hand, are shaped by billions of years of genetic evolution and 200k years of cultural evolution. If you count the total number of words spoken by the 110 billion people who ever lived, assuming an estimated 1B words per human lifetime, it comes out to 10 million times the size of GPT-4's training set.

So we spent 10 million times more words discovering than it takes the transformer to catch up. GPT-4 used ten thousand people's worth of language to catch up on all that evolutionary fine-tuning.
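Spelled out (a back-of-the-envelope sketch; the ~1e13-word GPT-4 training set is my assumption, since the real figure is unpublished):

    # Back-of-the-envelope check of the numbers above. The 1e13-word
    # GPT-4 training set is an assumption; the real figure is unpublished.
    humans_ever = 110e9          # people who ever lived
    words_per_lifetime = 1e9     # estimated words spoken per person
    gpt4_training_words = 1e13   # assumed

    total_human_words = humans_ever * words_per_lifetime   # 1.1e20
    ratio = total_human_words / gpt4_training_words        # ~1.1e7: "10 million times"
    lifetimes = gpt4_training_words / words_per_lifetime   # 1e4: "10 thousand people"

    print(f"{ratio:.1e}x the training set; {lifetimes:.0f} lifetimes of words")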


> words spoken by 110 billion people who ever lived, assuming 1B estimated words per human during their lifetime..comes out to 10 million times the size of GPT-4's training set

This assumption points in slightly the wrong direction, because no human could consume much more than about 1B words in a lifetime. So humanity could not gain an enhancement just by multiplying one human's words by 110 billion. I think a more correct estimate would be 1B words multiplied by 100.

I think current AI has already reached the size needed to become AGI, but to finish it probably needs a change of structure (though I'm not sure about this), and it also needs some additional multidimensional dataset, not just text.

I might bet on 3D cinema, and/or on an automobile autopilot dataset, or something from real-life humanoid robots solving typical human tasks, like folding a shirt.


> Does a baby require 2 years of statistical analysis to become useful?

Well yes, actually.


of the entire human race's knowledge, and that's from all of written history, not just the past 2 years.


Only Transformer-based architectures are over.

It amazes me that everyone so fetishizes Transformer architectures that they cannot imagine an alternative, when the alternative is obvious.


For someone only tangentially following along, what is the obvious alternative?


Lol pay the originating artists and copyright holders for training on their data. Stealing is game over, legally, so they are trying unsuccessfully to change the law.

