"training data is totally a violation of copyright"
This really isn't clear because cognition is treated as a special exception to copyright. Every thought we have is derivative of everything we've seen before to some degree; reading a book makes our brains a derivative work. But we recognize that cognition is special.
With machines we tend to apply a strict test: Did copyright go in? If so, the output is almost certainly derivative.
With human brains, with cognition, it isn't enough to prove that a person has consumed a copyrighted work prior to having a thought -- instead we judge every thought individually as to its originality.
If we are in a position to apply similar cognitive rules to an LLM, then the weights won't be derivative works and we will judge each output as to its originality rather than simply assuming it is derivative.
"This really isn't clear because cognition is treated as a special exception to copyright."
Actually, no. It's considered a transformative use. If you memorize a copyrighted play or piece of music and then perform it in public, that's a copyright violation. It's the literalness of the copy that matters.
No, that's totally incorrect, we do not consider every observation a "transformative use" as applied to the human mind. If you memorize a copyrighted play and write another play it is NOT inherently a copyright violation of everything which has come before. We just don't do that.
The new play is judged as to its originality.
People who have seen a play (everybody) are allowed to write new plays which aren't beholden to the copyright of the first play they've ever watched.
>> "training data is totally a violation of copyright"
> This really isn't clear because cognition is treated as a special exception to copyright.
Human cognition; not the latest algorithms and their output, which some enthusiastic software engineers eagerly confuse for cognition. It's actually pretty clear.
> The open question is how to handle machines that mimic the process.
It's not really an open question, except for software engineers who've talked themselves into thinking of humans as computers. A machine is not a human mind, so does not benefit from the legal exceptions and rights granted to the latter.
I remarked on how human cognition is treated as a magical process with respect to copyright law.
This is just a legal fact. It has nothing to do with how an LLM operates internally, or whether an LLM is at all similar to a human mind in terms of internal mechanics.
> The legal question of "does copyright go away if your violation is big enough?"
1) no similarities have ever been demonstrated between large language models and human cognition, and until that happens (spoiler: never) there is no basis in comparing them like this.
2) even if they were somehow proven to be the same there is still no reason why the same standards need to be applied to computer programs and humans because computer programs do not have any rights or legal protections.
3) cognition is not a "special exception to copyright" because it is entirely unrelated. "Copy" "right" is who has rights to make copies. Your thoughts are not considered copies because they are intangible.
4) we do not "judge every thought individually as to its originality" because other people's thoughts are entirely opaque. Nobody is judging your thoughts, and if you think they are you need to take your medications.
"1) no similarities have ever been demonstrated between large language models and human cognition"
This is false. The LLM's entire purpose is to mimic cognition.
You could argue that the operation differs in important ways - of course. But the similarity of output is literally the entire point.
"2) even if they were somehow proven to be the same"
I didn't suggest they need to be the same, proven or otherwise. I think you're not understanding. The point is that the function is similar.
How it works doesn't necessarily matter.
"3) cognition is not a "special exception to copyright" because it is entirely unrelated. "
False as a matter of law.
"4) we do not "judge every thought individually as to its originality" because other people's thoughts are entirely opaque."
Also false as a matter of law. When you publish your thoughts (your works, writing, whatever), they are judged as to their originality if the question of who owns the copyright is raised.
"Nobody is judging your thoughts, and if you think they are you need to take your medications."
There's no need to be snarky and disingenuous.
From the comment guidelines: Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
>This is false. The LLM's entire purpose is to mimic cognition.
Purpose and mechanism are not the same thing. "Similarity of output" does not make it equivalent.
>I didn't suggest they need to be the same, proven or otherwise. I think you're not understanding. The point is that the function is similar.
Sure, go ahead and ignore all but half a sentence and then accuse me of missing the point.
>False as a matter of law.
Show me the court case where somebody was found to have violated copyright law by thinking about something.
>When you publish your thoughts
You don't publish your thoughts. You publish essays, internet comments, articles, videos, etc based on what you are thinking and those are subject to copyright law.
>There's no need to be snarky and disingenuous.
How dare you, I would never disingenuously tell somebody who thinks his thoughts belong to other people to take their psychiatric medications. Of course I did mean that they should be prescribed by a licensed physician, and looking back I regret not stating that explicitly.
"The LLM's entire purpose is to mimic cognition." is your counterpoint to me saying that no peer-reviewed source has ever demonstrated a similarity between LLMs and human cognition. I'm talking about mechanism and you're talking about purpose.
Thank you for saying what I was going to say to this person. I'm so fucking tired of seeing people who probably have never opened a neuroscience textbook talk about cognition.
Or the payer of support finds ways to "hide" income or they put in less effort and earn less knowing that some of it goes towards child/spousal support.
That's probably more accurate. I know I work a lot less now than I did a decade ago because I have built up a nest egg and there is no point incrementally destroying my health, work-life balance, etc., when the greasy government is taking half. The incremental value is not there vs. untaxed uses of my time like raising my children.
Indeed. If you follow that line of thinking, there are probably lots of both men and women who struggled to support their partner before divorce and are relieved of that burden after divorce. You could easily see effects from this on both sides at the same time. All it would take is for the primary earner's income to go down and for the low- or no-income partner's earnings to remain the same. This results in a net reduction in income for both genders.
> Going to an ER after taking cannabis seems like something only someone with an existing anxiety disorder might do.
Twenty or thirty years ago, I'd have agreed with you on this, but potency that's normal today wouldn't even have been a dream back then. If you think that makes no difference, I'm here to tell you you are wrong.
You both are correct, I've seen it countless times. Folks sometimes don't admit even to themselves how insecure and anxious they are. And THC will highlight that, sometimes a bit, sometimes massively. Same with alcohol, other drugs, intense stressful situations and so on.
If you are not anxious, this is a non-issue; it won't surface even after decades of copious consumption (some very close folks fall here and only here, no exception ever heard of).
Now if you are completely clueless or just a (again clueless) kid and eat, say, 5 space cakes to show off, yeah, this will not be a nice story. 5-10 hours of catatonic despair will leave some mark, but this is self-inflicted harm due to stupidity. Many wonderful harms and deaths have been caused by the very same thing, and if we want bans due to that, alcohol should be first in line by a huge margin.
Educate from childhood, regulate (age, potency, how much daily, optimize for harmless consumption of quality products without impurities), but otherwise let folks do their thing. Anything else leads long term to worse results for society as a whole; any gut feeling is easily beaten by long-term statistics.
Oh, it's nothing anyone can't get through with a little help from someone who can take a broader perspective. That's a lot harder to come by these days than it should be, or at least than is healthy for primates as social as we are. Stronger communities would yield fewer such ER visits, too.
I don't advocate a ban; Prohibition is a salutary example and it would be worse today for all involved. But I also don't like to see anyone talk about any drug, and certainly not this one, as entirely benign.
My father-in-law gets super paranoid and anxious on weed and LSD, but he will never admit it because he wants people to believe that he's completely chill.
I constantly see this, but haven't people been making concentrates for 1000s of years? I imagine ancient Nepali hash would probably rock a lot of people's world still to this day.
Sure, but Nepali hash was being taken in a much different cultural context (religious, often) by people who were generally very prepared for its consequences.
The Amiga was dead a few years before Win95. The VGA chipset and the Sound Blaster killed the Amiga.
In the late 1980s I wanted an Amiga so badly. But by the early 90s I had a 486 with VGA and an SB16 and it was all over. The Amiga had a mere fraction of the PC's power by then.
Indeed. Commodore killed itself and the Amiga, and the breakneck speed of PC advancement didn't help. Doom was just a side effect of it all. It left a trauma on the Amiga community, though; you can still see the community mentioning Doom to this day (see, it can run Doom?). Doom-envy is omnipresent to this day. There was another, rarely mentioned one, which is Nintendo-envy. The NES and then the SNES in particular had killer games where the Amiga never came through in such capacity (platformers; it shined in other genres). The Amiga was the poor man's SGI at the time. It was great, fun, and relatively cheap for what it offered. It could've been so much more if Commodore had a sense of direction and focus. Alas, here we are decades later, lamenting its fate.
The cachet it left is still strong. I recently (over the last few years) tried to get ahold of an Amiga again. I just wanted one endgame A1200... now I have three A1200s: one with a Blizzard 1230/030+FPU, which is the best general purpose IMO, one with a Blizzard 1260/060rev6 for demos (not that great compatibility for general purpose), and one with a TF1260/060rev6... and then two A600s (one stock, one with a Furia030) and an A500+, Indivision add-ins, etc., and a whole bunch of Commodore 1084S monitors. It was supposed to be only one A1200, dammit. Take it as a warning from a friend if you want to get one: they multiply fast.
Nah. The Amiga 1200 debuted in 1992 for $600. 2MB RAM and a 14 MHz 68020. No monitor.
In 1992 you could get a 486DX 33 MHz with 4MB for like $800 (a two-year-old chip) with similar peripherals. Way more than double the power for a marginal increase in cost.
The Pentium arrived a year later in 1993, and by 1994 we had the 486DX4/100 MHz and Pentium/586 at similar clock speeds. This is around when Doom arrived, and the Amiga was long since toast.
I've used both a top loader (currently a Speed Queen) and a front loader washer, each for more than a decade. I have many t-shirts that are 20+ years old. Any wear from the washer seems negligible in comparison to the wear from actually wearing the clothes. There's no discernible difference.
But there's a HUGE difference in terms of ability to clean. If I'm out doing yardwork and I have a pair of jeans with deep mud stains on the knees the old style top loader agitators can clean them just fine. The front loaders cannot no matter how many times I run them through. I end up having to scrub the jeans between my knuckles in the laundry sink - moderate agitation breaks up the mud and it rinses out easily.
I suspect a lot of this "agitators are rough" nonsense comes from modern washers that don't use a sufficient amount of water. But the SQ has a setting to use the normal amount of water so it's a non-issue. Most analysis I've seen (eg: from Consumer Reports) refuses to consider top loaders with normal water usage settings -- the data is basically invalid. A lot of Consumer Reports analysis has this type of problem where the entire study is built on a false premise.
Front loaders might be fine if your clothes never get dirty and only need occasional light rinsing. They're really terrible for actually cleaning dirt.
The LG front loaders are very good at cleaning clothing. Wirecutter has done actual tests, and they seem to perform nearly as well as top loaders with agitators, but they don’t damage clothing anywhere near as much. My experience matches: mine cleans even better than the top-loader-with-agitator that I remember as a kid, and it’s much, much better than the crappy commercial coin-operated top-loader I used to use.
(They do try to minimize water usage, which means that if you have the water soluble sort of mess on the laundry, you may need to select a “heavy soil” mode or add something wet and heavy to the load. The former takes two straightforward button presses, although LG sadly seems to have switched to capacitive “buttons” on newer models.)
Also, the speed wash cycle is genuinely fast and seems to work fine.
edit: What do you mean, top loaders with normal water usage? Most top loaders want as much water as is needed to cover the clothing. They gain nothing except longer cycle time if you use more water, and they don’t get clothing clean if you use too little.
I've used the LG frontloaders. They are not effective at removing muddy stains. They simply can't do the job.
Wirecutter doesn't publish their methodology, but every "tester" who has focuses on questionable metrics -- such as testing stain types that don't require agitation to remove.
In my experience most people who are happy with them don't have very dirty clothing to begin with.
Regarding normal water usage: It is not true that washers "gain nothing" by using more water. More water protects clothing under agitation and aids in removing dirt. From the SQ manual:
"Wash delicate items usually washed by hand on this cycle. A full tub of water is recommended (even for small loads) to allow the delicate items to move freely through the water. More water helps reduce fabric wear, wrinkling, and provides for a clean wash."
2) Low water modes clean less effectively and are rougher on clothes
3) Front loaders are designed to work with low water loads, but still don't clean well
4) All modern washers are now terrible, except models that intentionally skirt regulations, such as the SQ
High-water agitation is the best by far. The only drawback is increased water use - which is insignificant. The entire issue is a result of bad washer regulations.
There isn't anything that fundamentally makes a top loader clean better than a front loader per se.
And while I don't doubt your SQ top loader with a full agitator does an absolutely fantastic job when compared to a modern front loader you have a big apples/oranges comparison problem.
And if you live in Austin, I would be willing to have a clothes-cleaning video made against my 25-year-old Kenmore/Frigidaire front loader. :) And that is simply because (as of a couple years ago when I checked) my front loader uses closer to as much water as a modern top loader. It puts a good 6" of water in the tub and has a much smaller capacity than most of the modern front loaders because it has a very small opening and a fairly deep tub, but is also narrower front to back. Given the agitators are about the same height as the water depth, the clothes can be fully submerged/agitated.
I too do a lot of outdoor work/repair (including that stupid washer's bearing), and things get dirty/greasy/etc., and that washer has consistently amazed me with just how good a job it does. As a bonus, when I got it I was also amazed at how much longer many of my clothes seemed to last. I have 25-year-old shirts that still look fairly new. So IMHO that washer has been a bit of a miracle worker, which is why, despite it being used at a much greater duty cycle than your average home washer for the past 15 years (think ~15-20 loads a week, don't ask), every time it starts to need another bearing set, I upset my wife by fixing it. The parts cost like $30, and now that I've done it a few times it's a ~2 hour job. Although, last time I sorta re-engineered the bearing set after discovering some new bearing technology better suited to a wet/soapy environment like a washer, so it's possible I may have significantly extended how long it goes between bearing replacements (aka hopefully I never have to do it again). It also seems to spin significantly faster than many of the modern front loaders, which is seemingly at least part of the problem with the bearing loads.
The $5-$10k mark for a high end home workstation has held pretty firm since the 1980s. That's the buy-in price point for a very early PC, an SGI Indy, a low end Sun workstation, etc.
We went through an amazing period of very cheap computers from 2000-2020. We just didn't need specialty or high end equipment for a while.
If you read what Yann writes, you'll pretty quickly see that he's rather ignorant about AI. His opinion is probably worse on average than the typical technical generalist's.
That’s hilarious. I have read a few things he has written which suggests he’s definitely better than the average technical generalist. I haven’t read everything obviously but he has written quite a lot https://scholar.google.com/citations?user=WLN3QrAAAAAJ
The common place to use something like this would be to mmap an existing external data structure. There are a number of existing mmap-able, zero-copy k/v library/db formats that fit the bill here.
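To make the mmap idea concrete, here is a minimal sketch, assuming a hypothetical on-disk format of sorted `(key, value)` pairs stored as a flat array of native unsigned ints. It is not any particular k/v library's format; `write_table` and `lookup` are illustrative names. The point is that the lookup binary-searches the mapped bytes in place through a `memoryview`, rather than parsing the file into Python objects first.

```python
from __future__ import annotations

import array
import mmap
import os
import tempfile


def write_table(path: str, pairs: dict[int, int]) -> None:
    """Write sorted (key, value) pairs as a flat array of native unsigned ints.

    Hypothetical format for illustration only: [k0, v0, k1, v1, ...].
    """
    flat = array.array("I")
    for key in sorted(pairs):
        flat.append(key)
        flat.append(pairs[key])
    with open(path, "wb") as f:
        flat.tofile(f)


def lookup(path: str, key: int) -> int | None:
    """Binary-search the mapped file in place; records are never copied into a list.

    Note: mmap cannot map an empty file, so the table must hold at least one pair.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        raw = memoryview(mm)
        view = raw.cast("I")  # reinterpret the mapped bytes as native uints
        try:
            lo, hi = 0, len(view) // 2  # number of (key, value) records
            while lo < hi:
                mid = (lo + hi) // 2
                k = view[2 * mid]
                if k == key:
                    return view[2 * mid + 1]
                if k < key:
                    lo = mid + 1
                else:
                    hi = mid
            return None
        finally:
            # Views must be released before the mmap is closed.
            view.release()
            raw.release()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "table.bin")
    write_table(path, {5: 50, 1: 10, 9: 90})
    print(lookup(path, 9), lookup(path, 4))
```

A real mmap-able store would add a header, explicit byte order, and variable-length keys, but the core trick is the same: the OS pages in only the parts of the file the search touches.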