It's a MoE (mixture of experts) architecture, which means only 3.6 billion parameters are activated per token (out of a total of 20b parameters for the model). So it should run at roughly the speed of a 3.6b model, assuming all of the parameters fit in VRAM.
Generally, a 20b MoE will run faster but be less smart than a 20b dense model. In terms of "intelligence", the rule of thumb is the geometric mean of the number of active parameters and the total number of parameters.
So a 20b model with 3.6b active (like the small gpt-oss) should be roughly comparable in output quality to a sqrt(3.6 * 20) ≈ 8.5b parameter model, but run at the speed of a 3.6b model.
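A quick way to sanity-check that arithmetic (the geometric mean is a community heuristic, not an established law, and the function name here is just for illustration):

    import math

    def dense_equivalent(active_b: float, total_b: float) -> float:
        # Rule-of-thumb "dense-equivalent" size of a MoE model:
        # geometric mean of active and total parameter counts (in billions).
        return math.sqrt(active_b * total_b)

    print(dense_equivalent(3.6, 20))  # ~8.49 -> roughly an 8.5b dense model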
Yeah, the interview with Geoffrey Hinton had a much better summary of the risks. If we're talking about the bad-actor threat model, biological weaponry is both easier to make and more likely as a threat vector than nuclear weapons.
It's because of the effects of technological progress on wealth accumulation.
Tech enables winner-takes-all effects in many different markets (and runaway situations where labor cannot catch up at all). To maintain a reasonable distribution of wealth in society (not equal, but a reasonable degree of inequality, closer to the model that existed from the 1940s-1970s), you either need very strong antitrust (which is still not possible in some places) or strong redistribution (i.e. taxation of wealth in order to redistribute it).
Yup, this problem is why I think all therapists should ideally know behavioral genetics and evolutionary psychology. There is at least a plausibly objective measure there: the dissonance between the ancestral environment in which the brain developed and the modern-day environment. At least some psychological problems can be explained by it.
I am a fan of the "Beat Your Genes" podcast, and while some of the prescriptions can be a bit heavy-handed, most feel intuitively right. It approaches human problems as intelligent-mammal problems, as opposed to something in a category of its own.
It probably requires some decrease in security (if the password hash is truly slow & secure, it would be hard to enforce dissimilarity), but there might be methods that leak less than cleartext, like salting and storing hashes of overlapping or separate n-grams from the previous password and counting how many n-grams the new password shares with it. Or, as another commenter suggested, checking all passwords within edit distance 1 (though if you can do that, your password hashing algorithm is likely too fast).
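A minimal sketch of that n-gram idea (everything here is hypothetical and illustrative, not a vetted scheme; storing per-n-gram digests deliberately leaks some information about the old password, which is exactly the trade-off being discussed):

    import hashlib
    import os

    def ngram_digests(password: str, salt: bytes, n: int = 3) -> set[bytes]:
        # Hash each overlapping n-gram of the password with a per-user salt.
        # Hypothetically stored alongside the main (slow) password hash.
        grams = {password[i:i + n] for i in range(len(password) - n + 1)}
        return {hashlib.pbkdf2_hmac("sha256", g.encode(), salt, 10_000)
                for g in grams}

    def too_similar(new_password: str, old_digests: set[bytes], salt: bytes,
                    threshold: float = 0.5, n: int = 3) -> bool:
        # Reject the change if too many n-grams survive from the old password.
        new_digests = ngram_digests(new_password, salt, n)
        overlap = len(new_digests & old_digests)
        return overlap / max(len(new_digests), 1) > threshold

    salt = os.urandom(16)
    old = ngram_digests("hunter2!", salt)
    print(too_similar("hunter3!", old, salt))              # True: mostly the same n-grams
    print(too_similar("correct horse battery", old, salt))  # False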
What about examples where [technical] management actively makes it harder for engineers to do their job by enforcing nonsensical design decisions (after being explicitly warned against them)? :)
Not sure I completely agree here. On its own, perhaps not; but if AI is able to deliver nonsensical design decisions at a faster rate than a human, you can iterate toward a better final design faster.
Well, in that case U.S. engineers should cheer for a weaker dollar, since it would bring U.S. salaries more in line with salaries in the rest of the world without affecting local dynamics as much (while still allowing people to afford their mortgages) :)