And it's a question of whether we accept breaking the law for the possibility of having the greatest technological advancement of the 21st century. In my opinion, the legal system has become a blocker for a lot of innovation, not only in AI but elsewhere as well.
This is a point that I don't see discussed enough. I think Anthropic decided to purchase books in bulk, tear them apart to scan them, and then destroy those copies. And that's the only source of copyrighted material I've ever heard of that is actually legal to use for training LLMs.
Most LLMs were trained on vast troves of pirated copyrighted material. Folks point this out, but they don't ever talk about what the alternative was. The content industries, like music, movies, and books, have done nothing to research or make their works available for analysis and innovation, and have in fact fought industries that seek to do so tooth and nail.
Further, they use the narrative that people that pirate works are stealing from the artists, where the vast majority of money that a customer pays for a piece of copyrighted content goes to the publishing industry. This is essentially the definition of rent seeking.
Those industries essentially tried to stop innovation entirely, and they tried to use the law to do that (and still do). So, other companies innovated over the copyright holder's objections, and now we have to sort it out in the courts.
> So, other companies innovated over the copyright holder's objections, and now we have to sort it out in the courts.
I think they try to expand copyright from "protected expression" to "protected patterns and abstractions", or in other words "infringement without substantial similarity". Otherwise why would they sue AI companies? It makes no sense:
1. If I wanted a specific author, I would get the original works, it is easy. Even if I am cheap it is still much easier to pirate than use generative models. In fact AI is the worst infringement tool ever invented - it almost never reproduces faithfully, it is slow and expensive to use. Much more expensive than copying which is free, instant and makes perfect replicas.
2. If I wanted AI, it means I did not want the original, I wanted something else. So why sue people who don't want the originals? The only reason to use AI is when you want to steer the process to generate something personalized. It is not to replace the original authors; if that is what I needed, no amount of AI would be able to compare to the originals. If you look carefully, almost all AI outputs stay in private chat sessions, with a small fraction being shared online, and even then not in the same venues as the original authors. So the market substitution logic is flimsy.
You're using the phrase "actually legal" when the ruling in fact meant it wasn't piracy after the change. Training on the shredded books was not piracy. Training on the books they downloaded was piracy. That is where the damages come from.
Nothing in the ruling says it is legal to start outputting and selling content based off the results of that training process.
I think your first paragraph is entirely congruent with my first two paragraphs.
Your second paragraph is not what I'm discussing right now, and was not ruled on in the case you're referring to. I fully expect that, generally speaking, infringement will be on the users of the AI, rather than the models themselves, when it all gets sorted out.
I'm in agreement that it will be targeted at the users of AI as well. Once that prevails legally someone will try litigating against the users and the AI corporations as a common group.
>Nothing in the ruling says it is legal to start outputting and selling content based off the results of that training process.
Nothing says it's illegal, either. If anything the courts are leaning towards it being legal, assuming it's not trained on pirated materials.
>A federal judge dealt the case a mixed ruling in June, finding that training AI chatbots on copyrighted books wasn't illegal but that Anthropic wrongfully acquired millions of books through pirate websites.
I'm saying that LLMs are worthwhile, useful tools, and that I'm glad that we built them, and that the publishing industry, which holds the copyright on the material that we would use to train the LLMs, has had no hand in developing them, has done no research, and has actively tried to fight the process at every turn. I have no sympathy for them.
The authors have been abused by the publishing industry for many decades. I think they're just caught in the middle, because they were never going to get a payday, whether from AI or selling books. I think the percentage of authors that are commercially successful is sub 1%.
So the argument is that because LLMs are useful and the publishing industry was not involved in their creation, we should disregard the property rights of the publishing industry and allow using their work without a license? By that same argument (if something useful is being built, we ignore existing rights), shouldn't we also just take the code/models from OpenAI etc. and publish them somewhere? Why not also their datacenters?
It's not really an argument. It's an observation that they sat on their hands while other industries out-innovated them. They were complacent and now they're paying the price.
We have laws and rules, but those are intended to work for society. When they fail to do so, society routes around them. Copyright in particular has been getting steadily weaker in practice since the advent of the Internet, because the mechanisms it uses to extract value are increasingly impractical since they are rooted in the idea of printed media.
Copyright is fundamentally broken for the modern world, and this is just a symptom of that.
> Folks point this out, but they don't ever talk about what the alternative was.
That LLMs would be priced as expensively as they really are, once societal and energy costs are counted? A lot of things are possible; whether they are economically feasible is determined by giving them a price. When that price doesn't reflect the real costs, society starts to waste work on weird things, like building large AI centers, because of a financial bubble. And yes, putting people out of business does come with a cost.
Innovation is absolutely an end goal, at least in terms of our legal framework. The primary impetus for copyright and patent law is innovation: to give those who innovate their due, and I do think this stems from our society seeing innovation as an end goal. But the intent of the system is always different than its actual effect, and I'm fairly passionate about examining the shear.
I run my AI models locally, paying for the hardware and electricity myself, precisely to ensure the unit economics of the majority of my usage are something I can personally support. I do use hosted models regularly, though not often these days, which is why I say "the majority of my usage".
In terms of the concerns you express, I'm simply not worried. Time will sort it out naturally.
You’re willing to eliminate the entire concept of intellectual property for a possibility something might be a technological advancement? If creators are the reason you believe this advancement can be achieved, are you willing to provide them the majority of the profits?
Bullshit. Read up and understand the history of these things and their benefits to society. There is a reason they were created in the first place. Over a very long time. With lots of thoughts into the tradeoff/benefits to society. That Disney fucked with it does not make the original tradeoff not a benefit to society.
The fact that you don't actually call out the specific benefit is telling. We're in a world of plenty and don't need copyright to have those benefits for our fellow humans.
Without agreeing or disagreeing with your view, I feel like the issue with that paradigm is inconsistency. If an individual "pirates", they get fines and possible jail time, but if a large enough company does it, they get rewarded by stockholders and at most a slap on the wrist by regulators. If as a society we've decided that the restrictions aren't beneficial, they should be lifted for everyone, not just ignored when convenient for large corporations. As it stands right now, the punishments are scaled inversely to the amount of damage that the one breaking the law actually is capable of doing.
Yandex is one of Kagi's index sources. They used to publish a list of their sources but have changed it to a generic "we use multiple sources" because they got so much shit for using Brave (because of its founder’s bigotry) and Yandex (because of the ethical dilemma of paying a company headquartered in Russia which in the best case pays taxes and in the worst case has Kremlin/military involvement). This has been a contentious debate for a while; you can search their forum and Discord for more details.
Has a very strong AI code smell. Still it seems to be a decent turf.js wrapper for GIS professionals who don't have too much JS experience. Most should be able to use CLI tools at least for all of this.
I would expect GIS professionals to use ESRI or QGIS. This is an interesting showcase. It seems a little too simple given the variety of options for geographic projection. I’m not quite sure what the value proposition is, but it’s interesting.
The entire site is a result of me building different tools for a geography guessing game, GuessWhereYouAre.com. At first, I needed them for myself (parsing JSONs, working a lot with Leaflet maps and Turf). Then I added some of them to the guide page in the game, and later I thought: why not create a dedicated site, polish the experience, and make them available as simple geography tools for people like me?
I’m not sure how useful they might be for professionals who use ESRI or QGIS, but I know I needed tools like that for the game’s development and couldn’t find anything easy to use and simple — at least not all in one place.
So technically, this site is a result of another project.
Not even an issue anymore. I have no issue playing FPS games like Call of Duty multiplayer via GeForce Now and can be decently competitive. I do live close to the servers though.
QGIS is janky? It's quite possibly the smoothest and best running GIS software available today. Most built-in tools run way faster than AG Pro, and once the move to Qt 6 with 4.0 is complete this October, we'll finally get native builds on M-series Macs as well.
I wouldn't even know where to start listing the upsides compared to ESRI's offering, from PostGIS integration all the way to the simplicity of plugins.
>It's quite possibly the smoothest and best running GIS software available today
lol, the bar is not high. It can be both the smoothest and extremely janky at times. Let's be honest with ourselves here. (and I do agree, it's among the best running... but also janky).
Been to FOSS4G (NL/Europe/Global) a few times. If you work in a field that even touches a bit on geospatial, you'll find interesting talks. Even if you are a web dev and you think "no", getting to know a few things about Leaflet or other geo tools for web never hurts.
There are also tools that wrap a part of Topotijdreis and add other georeferenced historical maps! I recently saw one of those at https://geodienst.xyz/pastforward. Wish more people georeferenced historical maps, but it is tough.
This paper is more than anything a showcase for the method they created. I think the point of their visualisations is to make them as appealing as possible, and people want to see how Manhattan looks, but they might not want to see how a certain neighbourhood in Singapore looks. And that choice definitely did help create virality for their research a few years back.
Yesterday I was checking HN from 10 years ± 2 weeks ago and guess what the top posts were... "Why you need jQuery" and "You might not need jQuery". I'm too young to know about those days but I guess not much has changed in relation to people's attitude towards jQuery.
The problem jQuery was solving was to provide a usable abstraction over an inconsistent platform for core functionality like event handling and DOM manipulation.
The point of the last decade and a half of standards work was to eliminate that problem, and it has at least moved it from "core web functionality" to more complicated areas like bluetooth, 3d rendering and audio, which jQuery's goals do not include handling.
Is it urgent to remove jQuery from projects? Not really. And it's good the jQuery team maintain the project for that reason. Certainly some projects have gotten performance gains out of removing it, but part of the work they've done in 3.x and now 4.x is arguably jQuery removing stuff internally and replacing it with the now reliable browser APIs.
But on the other hand, is there a great argument for including jQuery in new projects? This is also a "not really".
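To make the "now reliable browser APIs" point concrete, here is a rough sketch of one DOM manipulation done without jQuery. The helper name `addClassToAll` and the `HasClassList` type are made up for illustration (they are not jQuery or browser APIs), and the types are deliberately minimal so the snippet also runs outside a browser.

```typescript
// Minimal structural type so the sketch also runs outside a browser.
// In a browser, each Element in a NodeList satisfies this shape.
interface HasClassList {
  classList: { add(name: string): void };
}

// Hypothetical helper (not a jQuery or browser API): a native counterpart
// of jQuery's $(selector).addClass(name), applied to an already-selected list.
// Returns how many elements were touched.
function addClassToAll(elements: ArrayLike<HasClassList>, name: string): number {
  for (let i = 0; i < elements.length; i++) {
    elements[i].classList.add(name);
  }
  return elements.length;
}

// Browser usage (assumes a page that has ".item" elements):
//   jQuery:  $(".item").addClass("active");
//   native:  addClassToAll(document.querySelectorAll(".item"), "active");
```

The loop is the only part jQuery used to hide. `querySelectorAll` and `classList` are standard DOM APIs, so whether this is worth dropping a dependency for is mostly a matter of taste.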
It was initially sold as a way to fix browser incompatibilities, but I still use it because there are really nice plugins like Select2. I really don't see the point in SPAs for the most part. Fair enough if you have a heavily interactive frontend, but I can build an app in Django with a bit of jQuery and end up with around half the code compared to adding in React.