All good points, thanks for the constructive reply.
Your point that Swartz would have had a different result had he formed an LLC and hired a bunch of lawyers is definitely the key point here. A legal system that only works for the rich and powerful is not something we should defend, support, or put up with.
His purpose in copying research papers and making them available for free is massively more in the public interest than anything the AI companies are doing. They are, after all, seeking to make a profit at the end of this. And they knowingly and deliberately broke copyright law because it was "too hard" to make any kind of licensing deal with the publishers. You can argue about fair use and transformative purposes (as their lawyers have done), but you can also argue from Swartz's point of view that this information was (to a large extent) publicly funded and therefore belonged to the public, and that getting the journals to acknowledge this was "too hard". Had he been able to afford lawyers, that is one argument they could have made. But he didn't get the chance. As you say, we never got to the trial, so we will never know.
It's definitely not a stretch to say that his crime and the AI companies' crimes (which they admit to; they admit to downloading source texts from pirate sites) are comparable, even equivalent. Yet their treatment is not.
My understanding of his treatment is that it was a lot more than "light compulsion" and that he underwent a sustained campaign of enforcement activity and litigation at the hands of a specific prosecutor. But given that the AI companies have faced nothing (no criminal charges, just a civil case brought by the authors they admit to ripping off), I don't think I need to push this point. They are clearly being treated differently to him, despite the similar actions.
We haven't gotten to the part of the trial for Anthropic yet where we determine whether they actually broke the law when they downloaded from pirate sites. Copyright has multiple exceptions. And on the topic at hand here (training on YouTube videos to understand space and relationships in it), I don't think even Google would want to make the case that it's a violation of copyright.
That's the thing about copyright; it's a whole category of law more based in utility than morality. One of the reasons AI is such a fight right now is that nobody was opposing it as an academic project when it was generating, for example, tools that could go from an image to describing the image, or from an image to recognizing the likely artistic style and helping somebody find the original artist. But with just a few tweaks those tools became devices for generating novel images, and now people are upset. Intent matters.
And again, you are drawing an equivalence between harvesting data from openly accessible sources online and hiding a server in a closet with unauthorized physical access to a network. Swartz's prosecution wasn't accusing him of copyright violation; it was accusing him of compromising a network. That is a far more serious charge: if the researchers in the story here had collected those YouTube videos by wiretapping the fiber optics between two of Google's data centers, I suspect Google would have concerns.