nil-sec's comments

Feedforward: y=Wx

Attention: y=W(x)x

W is a matrix; x and y are vectors. In the second case, W is a function of the input.


You must be from a planet with very long years!

There is no way I can even begin to digest what you have said in your comment.


Sorry, maybe I should have added more explanation. One way to think about attention, which is the main distinguishing element in a transformer, is as an adaptable matrix. A feedforward layer is a matrix with static entries that do not change at inference time (only during training). The attention mechanism offers a way to have weight matrices that adapt at inference time (this is implemented using three different matrices, K, Q & V, called keys, queries, and values, in case you want to dig deeper).
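A minimal numpy sketch of this view (illustrative only: a single head, and random weights standing in for the trained K, Q, V projections):

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    n, d = 4, 8                        # sequence length, model dimension
    X = rng.normal(size=(n, d))        # one input vector per row

    # Feedforward layer: y = Wx with the same static W for every input.
    W = rng.normal(size=(d, d))
    Y_ff = X @ W.T

    # Self-attention: the mixing matrix is computed from the input itself.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))  # input-dependent weights W(x)
    Y_attn = A @ V                     # y = W(x)x, up to the value projection

The point of the contrast: once training ends, W above stays fixed, while A is recomputed from scratch for every new X.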


I think in your notation it should have been:

y=Wx_0

y=W(x)x_0


I guess I was thinking more about self-attention, so yes. The more general case is covered by your notation!
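One way to read that notation in code (a sketch with hypothetical random weights; the effective matrix depends on both the context x and the queries from x_0, and self-attention is the special case where the two sequences coincide):

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(1)
    d = 8
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    X0 = rng.normal(size=(2, d))  # target sequence x_0 (what gets transformed)
    C = rng.normal(size=(5, d))   # context sequence x (where the weights come from)

    Q = X0 @ Wq                   # queries from x_0
    K, V = C @ Wk, C @ Wv         # keys and values from the context x
    A = softmax(Q @ K.T / np.sqrt(d))  # (2, 5) attention weights
    Y = A @ V                     # roughly y = W(x) x_0 in the notation above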


I had the same experience a couple of years back when people started discussing AI. I try to keep this in mind, but somehow I keep forgetting. It’s genuinely difficult to filter good takes from bad without expert-level understanding of a topic, unfortunately.


It’s quite nice to have something irreversible though. It gives you time. Also, nobody really thinks QM is the end of it; assuming semi-classical physics under the hood is just odd to me. There is something below QM that we don’t understand yet (AdS/CFT looks like a good start to me), and personally I think the whole interpretation-of-QM debate will look stupid in retrospect. Yeah, collapse is odd, but it just shows us this isn’t it. Reality is much weirder than we thought, and giving up on realism is just the beginning.


You get time from all sorts of technically reversible things though! Even in a totally classical universe, entropy gives us an arrow of time. Under MWI, decoherence is reversible, but it is functionally irreversible in the same way entropy is.

You're right though, I very much doubt QFT as it stands is the bottom. However, that doesn't mean the current debate is stupid. Whatever underlies QM, you'd still expect the measurement effect to also be an emergent property. The debate about whether atoms existed is still meaningful even though we now know that "atom" isn't a perfectly natural category. Indeed there are protons and such underlying the physics, but the protons do pretend to be atoms much of the time, and thus pretend to do all the things we use atoms to predict.

Similarly, MWI and collapse (as well as, if less so, weirder theories) can be good explanations as to why a quantum phenomenon occurs even if there's also a deeper reason it happens.


It’s funny because, for me, this was one of the major confusions when I moved from Europe to the US. In Europe there are a lot of prominent subcultures, particularly in college. This was totally absent in the US in my experience, even at places where you would expect it, e.g. underground hard-techno warehouse raves in Baltimore. Instead, the same mainstream people and opinions were there too.


In the US only GenX and early millennials had subcultures, which is also why they’re the only people in bands.

Everyone after them thinks it’s weird that they’ve confused listening to a single kind of music with an entire lifestyle.

(Boomers, who had a single counterculture instead of subcultures, similarly thought that doing drugs at music festivals was somehow actively saving the world.)


This isn’t true. The quality of images generated by DALL-E is really good, but it is an incremental improvement built on a long chain of prior work. See e.g. https://github.com/CompVis/latent-diffusion


Also Make-A-Scene, which in some ways is noticeably better than DALL-E 2 (faces, editing & control of layout through semantic segmentation conditioning): https://arxiv.org/abs/2203.13131#facebook


Insane take; the West vs. Russia in open confrontation ends in a nuclear winter.


I have no background in control theory, but this sounds very similar to the identifiability problem in nonlinear ICA. Are those equivalent?


This is a good point, which I think stems from wrongly equating human-level intelligence with AGI in the popular literature. It’s not at all clear what a general intelligence should be, and it’s much less clear that humans have general intelligence. In my view we have a set of very good innate priors (e.g. space/time continuity, intuitive physics) that have been optimized by evolution over thousands of years. These priors in turn allow us to learn fast from unlabeled data, but would anyone call such a system general intelligence? I’m not sure.


I call General Intelligence any system that wonders, for no particular reason, if other systems have General Intelligence.


While I agree with the general point of this paper, I don’t think it’s quite right to compare the current situation with the last AI spring. It’s not AGI, but it is very good narrow AI with real commercial value right now. The systems back then did not have that, at least not to the same extent, and for this reason I don’t see funding drying up for current ML approaches.


For one, it lets you avoid controlling for the wrong variables and thereby inducing, e.g., spurious correlations. In fact, this is one of the best examples of why a causal model is necessary: without one you can easily end up with a correlation that doesn’t exist, as is illustrated quite nicely in his book (and in the sketch below).
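A toy simulation of that failure mode (collider bias, one of the standard examples in Pearl's book; the data here is synthetic and purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # X and Y are independent by construction: no causal link, no confounder.
    x = rng.normal(size=n)
    y = rng.normal(size=n)

    # Z is a collider: it is caused by both X and Y (X -> Z <- Y).
    z = x + y + 0.5 * rng.normal(size=n)

    print(np.corrcoef(x, y)[0, 1])  # ~0.00: no correlation overall

    # "Controlling for" Z by stratifying on it induces a strong correlation.
    s = np.abs(z) < 0.1
    print(np.corrcoef(x[s], y[s])[0, 1])  # strongly negative

Within the stratum z ≈ 0 we have x ≈ -y, so the conditional correlation is near -1 even though X and Y are causally unrelated; the DAG tells you in advance not to condition on Z.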


Do you mean that the new techniques will (1) help you prove which variables should not be controlled for, or that they will (2) help you more clearly describe your causal assumptions, so that you can more easily recognize which variables should not be controlled for according to your assumptions?

If you mean (2), I can't really disagree: explicitly specifying your causal assumptions through a DAG seems like a clarifying step in specifying a model.

If you mean (1), then I must be missing something because I'm not seeing that this set of tools can do that.

My worry is that (2) is mistaken for (1), and that writing down a causal model is conflated with proving that it is true.


For a given causal model it is (1) in my understanding.


But "for a given causal model" precisely means "given a set of statements about what causes what." Those statements must be either proved or assumed.

If they are already proved, they don't need to be proved further per (1).

If they are not already proved, then they are just assumptions and we are talking about (2).

