OFFSystem

xg15 · on May 1, 2020

I'll agree that copyright has gone haywire as much as the next guy, but this seems to be the exact kind of defense strategy that will achieve nothing but annoy the judge.

The project seems to be based on the assumption that copyright really protects particular sequences of bytes - and that therefore, if you can sufficiently change the representation of a protected work, you're off the hook.

IANAL, but that seems far from the intentions of copyright.

The 4'33" case [0] comes to mind, where a work of nothing but silence was protected. From what I understood, in that case, the actual contents of the work didn't matter a lot (it was all just silence after all) - but what did matter was whether or not someone was consciously using this piece as a base for their own work.

My understanding is that copyright bans the act of using a protected work in an unauthorized manner. By that logic, it doesn't matter how an illegal copy is represented if the data is used with the intention of sharing an illegal copy.

[0] http://news.bbc.co.uk/2/hi/entertainment/2276621.stm

shawnz · on May 1, 2020

A classic analogy for explaining the difference: https://ansuz.sooke.bc.ca/entry/23

chias · on May 1, 2020

Came here to post that link.

In a nutshell, we programmers tend to like to think that we can use technical means to show that copyright is fundamentally nonsensical / inconsistent / imperfect, as though copyright were a mathematical proof and providing a counter-example would somehow make it go away. This is fundamentally missing the point.

chias · on May 1, 2020

Poignant excerpt:

I think Colour is what the designers of Monolith are trying to challenge, although I'm afraid I think their understanding of the issues is superficial on both the legal and computer-science sides. The idea of Monolith is that it will mathematically combine two files with the exclusive-or operation. You take a file to which someone claims copyright, mix it up with a public file, and then the result, which is mixed-up garbage supposedly containing no information, is supposedly free of copyright claims even though someone else can later undo the mixing operation and produce a copy of the copyright-encumbered file you started with. Oh, happy day! The lawyers will just have to all go away now, because we've demonstrated the absurdity of intellectual property!

The fallacy of Monolith is that it's playing fast and loose with Colour, attempting to use legal rules one moment and math rules another moment as convenient. When you have a copyrighted file at the start, that file clearly has the "covered by copyright" Colour, and you're not cleared for it, Citizen. When it's scrambled by Monolith, the claim is that the resulting file has no Colour - how could it have the copyright Colour? It's just random bits! Then when it's descrambled, it still can't have the copyright Colour because it came from public inputs. The problem is that there are two conflicting sets of rules there. Under the lawyer's rules, Colour is not a mathematical function of the bits that you can determine by examining the bits. It matters where the bits came from. The scrambled file still has the copyright Colour because it came from the copyrighted input file. It doesn't matter that it looks like, or maybe even is bit-for-bit identical with, some other file that you could get from a random number generator. It happens that you didn't get it from a random number generator. You got it from copyrighted material; it is copyrighted. The randomly-generated file, even if bit-for-bit identical, would have a different Colour. The Colour inherits through all scrambling and descrambling operations and you're distributing a copyrighted work, you Commie Mutant Traitor.
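For anyone who hasn't seen the trick spelled out, here is a minimal Python sketch of the mixing step the excerpt describes. It is not Monolith's actual code; `xor_bytes` and the sample byte strings are made up for illustration, and the "public file" is just random bytes.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

protected = b"bytes of some copyright-encumbered file"
public = os.urandom(len(protected))    # stand-in for the "public file"

mixed = xor_bytes(protected, public)   # looks like noise on its own
restored = xor_bytes(mixed, public)    # undoing the mix

print(restored == protected)  # True: XORing with the same file twice cancels out
```

The math is trivially reversible, which is exactly why the excerpt argues the Colour carries through.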

monkpit · on May 1, 2020

If all that mattered were the bits, then wouldn’t it be sufficient to just re-encode a video? All the bits are different now...

It seems to me that changing the bits only helps to avoid detection. Not to avoid the legal issues.

chias · on May 1, 2020

It wouldn't be just an equality check on the bits anyway, rather some kind of theoretical function over the bits. For example, is this JPG image a copyrighted image? Perhaps my function would be:

    - take the source image, reduce it to a 10px by 10px grid
    - take the image in question, reduce it to a 10px by 10px grid
    - for each pixel, calculate the deviation between the source pixel and the candidate pixel
    - if the sum of the squares of the deviation is smaller than some threshold, output true.
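Those steps could be sketched roughly like this in Python. Everything here is an assumption made for illustration: images are plain lists of grayscale rows, the reduction is simple block averaging, and the threshold of 500 is arbitrary.

```python
def downsample(image, size=10):
    """Reduce a grayscale image (list of rows of ints) to size x size by block averaging."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    grid = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        row = []
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        grid.append(row)
    return grid

def is_match(source, candidate, threshold=500.0):
    """True if the summed squared deviation between the reduced grids is below threshold."""
    a, b = downsample(source), downsample(candidate)
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total < threshold

source = [[(x + y) * 5 for x in range(20)] for y in range(20)]
brighter = [[p + 1 for p in row] for row in source]     # slightly altered copy
inverted = [[255 - p for p in row] for row in source]   # visibly different image

print(is_match(source, brighter))   # True
print(is_match(source, inverted))   # False
```

Note that `is_match` only ever sees pixel values. Nothing in its inputs records how the candidate file came to exist.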

The point is that no matter how clever a function you make, it cannot capture the notion of where the bits came from and can only capture information about what the bits are. Computers care about the latter. All these clever schemes care about the latter. Copyright law cares about the former.

From a copyright law perspective, the notion that you might have two files on your computer whose bits are completely identical in every way, but only one of which you are legally permitted to copy, is perfectly reasonable.

jatone · on May 1, 2020

it changes the nature of the question.

now they have to prove you extracted the specific copyrighted material from those random bits that were exchanged.

sure copyright would still exist, but proving one infringed on it becomes borderline impossible.

lifeisstillgood · on May 1, 2020

Off topic but their blog starts in Feb 1997.

Now I am fairly sure I was doing something blogging-like then, but ... actually still having it up... impresses me for the sheer ability to keep something running that long.

daniel-thompson · on May 1, 2020

From the article:

> Hey, Reddit, Ycombinator, and Metafilter readers! You know what, I'm proud that this article has become a benchmark, but I've written a lot of others I like too, some of them more recently than 2004. It would sure be nice if my other articles got some love instead of just this one being linked from a discussion every week.

dnr · on May 1, 2020

Just a clarification on 4'33": recordings of it are not pure silence (and are not intended to be). They're recordings of actual musicians on an actual stage, not playing any music. So you hear lots of little environment sounds, movements, coughs, etc. It seems plausible for a particular recording of that piece to be copyrighted. Or for the instructions to be.

The issue in that link is weird because it seems like part of the problem is that he credited Cage on his own album, admitting that he was copying it.

xg15 · on May 3, 2020

Good point. I wasn't aware of that. Thanks for the clarification.

amelius · on May 1, 2020

> I'll agree that copyright has gone haywire as much as the next guy, but this seems to be the exact kind of defense strategy that will achieve nothing but annoy the judge.

I think a good analogy is that of two twin brothers A and B. One of them (A) commits a crime and is caught on camera. The judge, however, can't convict A because the video evidence isn't conclusive because of the existence of B. You can say that B is annoying the judge because he doesn't play along, but that's not how it works.

greenshackle2 · on May 1, 2020

That would be a good analogy if they just claimed the network gives you anonymity and therefore lets you break the law with impunity.

But they're making the claim that the system lets you share copyrighted work without breaking the law.

The argument is that because they XOR'ed copyrighted bits with other copyrighted bits, nobody can own the resulting garbage. This is the part that will be laughed out of the court room.

Whether it can be proved that a particular node participated in the infringement is a different story, but if that is your threat model you want strong encryption not XOR.

Chris2048 · on May 1, 2020

If the law were clear cut, yes. When the law is uncertain, it's not possible to either fully comply, or fully violate the law - avoiding "being caught" prevents legal action on an issue that may or may not be legal. That this is possible/necessary is due to the structure of the legal system.

What incentive do people have to risk being a legal test case? It's clear to me those with influence over the law purposefully keep things ambiguous in order to have it both ways (threaten, without risk of being threatened) - maybe the opposing sides should be able to take advantage of the ambiguity just as much - after all, privacy is the right even of those with nothing to hide.

amelius · on May 1, 2020

The allegation will be: I downloaded A from X and B from Y, XORed them together and got copyrighted work.

However, X will say that Y could have manufactured B that way. And Y will say the same.

> This is the part that will be laughed out of the court room.

You can laugh all you want, but you can't convict anyone.

greenshackle2 · on May 1, 2020

Great, X and Y have infringed the copyright for both A and B. I'm not convinced they don't have more legal exposure rather than less (if B is copyrighted too).

amelius · on May 1, 2020

The main point is that there will be nobody in court in the first place.

Assume you have published random block A.

Now I claim that you have infringed copyright of a work because XORing your A with my C gives that work.

I hope you can agree that I can't sue you based on that. And this is exactly how this system works.

Kalium · on May 1, 2020

If I need your A in order to produce the copyrighted work, it seems likely to me that the argument will be made that it's a derivative work or another way of copying the copyrighted work. It might even work. It would certainly be expensive to find out.

As a rule, trying to get clever with technology to get around the legal system rarely works well. Especially when you can't afford to defend yourself.

LukeShu · on May 1, 2020

The point being made is that the system creates such deniability that it's impossible to say who the perpetrator is.

Let A and B be two data blocks produced by different people, and C be a copyrighted work.

It becomes known that A⊕B⇒C, and so the copyright holder of C would like to sue someone.

It is obvious that either the person who produced A or the person who produced B is infringing on the copyright of C. However, it is impossible to say which one it is; it is impossible to say whether A is a derivative work of B and C (B⊕C⇒A) or if B is a derivative work of A and C (A⊕C⇒B).
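A small Python sketch of that symmetry, using made-up sample data and a hypothetical `xor_bytes` helper: whichever block you assume was the innocent one, the other falls out of the same XOR, so the blocks themselves are consistent with both stories.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

c = b"some copyrighted work C"

# Story 1: A was published first as noise, and B was derived from A and C.
a = os.urandom(len(c))
b = xor_bytes(a, c)          # A xor C => B

# Story 2: B was published first as noise, and A was derived from B and C.
a_again = xor_bytes(b, c)    # B xor C => A

print(a_again == a)          # True: the same pair of blocks fits both stories
print(xor_bytes(a, b) == c)  # True
```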

(Of course, good luck explaining that to a judge.)

Let's iterate on that more. So A and B are blocks on the network. Now a 3rd person publishes some freely-licensed works; call them F1 and F2. In order to be stored in OFF, they get xor'ed against existing blocks, and just so happen to xor against A and B. So A⊕F1⇒A' and B⊕F2⇒B'. So someone wanting to download F1 will download A and A', then get F1 by A⊕A'⇒F1. Neither the person publishing F1 and F2 nor the person downloading them has any idea that A⊕B⇒C. So now someone lawfully downloading F1 and F2 appears to be downloading C, because the blocks they're downloading are [A, B, A', B']; and [A, B] are what it takes to infringe on C. But they had no idea that A⊕B⇒C, they were just interested in F1 and F2. So now you can't say that downloading A and B together indicates unlawfully downloading C.

(Again, good luck explaining that to a judge.)

So the point is to make it really hard to say who did what.

Kalium · on May 1, 2020

Maybe it's just me, but doesn't this make it exceptionally clear that multiple people have colluded to both do a thing and hide who was the responsible party? I think the US legal system might have tools for handling such groups of people. I'm reasonably sure this isn't unprecedented, except perhaps in the technological details.

I think it might strain credulity that these "random" blocks "just happen" to have these particular properties. So much so that I doubt the claim would stand in the face of the argument that F1 and F2 were deliberately crafted derivative works of C, made in an effort to allow piracy of C.

LukeShu · on May 1, 2020

> multiple people have colluded

Yes, everyone using OFFSystem is colluding to make it hard to tell who did what. There are legitimate reasons someone might want to engage in such a system--including privacy. It's the same thing with Tor; everyone with a Tor node is colluding to make it hard to tell who's talking with what other nodes.

> it might strain credulity that these "random" blocks "just happen" to have these particular properties

The OFFSystem is designed such that blocks do "just happen" to have those particular properties. Any new thing put on the OFFSystem will make use of many random blocks that are already on the OFFSystem. For simplicity, in the examples I gave, storing 1 new data block used just 1 existing data block, but that number is configurable; the default is 2. So when a user stores F into the OFFSystem, the software will randomly select 2 existing data blocks on the network (S, R), and combine F with them in order to store it: S⊕R⊕F⇒data-block-to-store; the software having no idea what S and R have previously been used for. For simplicity, I've also been assuming that a file you want to store fits in 1 block; a source block is 128KiB; storing a file larger than that will take storing many blocks. So in all, by default, storing a file will make use of 2×(filesize/128KiB) randomly-selected data blocks already on the network. The idea is that after a short time, virtually every data block on the network will have many unrelated uses.
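To make the store step concrete, here is a rough Python sketch. This is not OFFSystem's real code or on-disk format: the function names, the in-memory `network` list, the zero padding, and rounding the block count up are all assumptions for illustration.

```python
import math
import os
import random

BLOCK_SIZE = 128 * 1024   # 128 KiB source block, as stated above
RANDOMIZERS = 2           # default number of existing blocks mixed in

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks together."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(len(blocks[0]), "big")

def store(network: list, source: bytes) -> list:
    """Store one source block; return the indices needed to rebuild it."""
    if len(source) > BLOCK_SIZE:
        raise ValueError("source must fit in one block")
    padded = source.ljust(BLOCK_SIZE, b"\0")   # zero padding is an assumption
    picked = random.sample(range(len(network)), RANDOMIZERS)
    network.append(xor_blocks(padded, *(network[i] for i in picked)))
    return picked + [len(network) - 1]

def retrieve(network: list, indices: list) -> bytes:
    """XOR the listed blocks together to recover the padded source block."""
    return xor_blocks(*(network[i] for i in indices))

def randomizers_used(filesize: int) -> int:
    """2 x (filesize / 128 KiB), rounded up to whole blocks (rounding assumed)."""
    return RANDOMIZERS * math.ceil(filesize / BLOCK_SIZE)

network = [os.urandom(BLOCK_SIZE) for _ in range(4)]   # blocks already on the network
f = b"a freely-licensed file"
indices = store(network, f)

print(retrieve(network, indices)[:len(f)] == f)   # True
print(randomizers_used(300 * 1024))               # 6
```

The randomizers are picked without looking at what they were used for before, which is the property the reuse argument relies on.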

It would be entirely plausible that someone has both a legitimate use for A (e.g. F1) and a legitimate use for B (e.g. F2) with no one involved in that legitimate use having any idea that together they have the illegitimate use A⊕B⇒C, because the whole system was designed to make that likely.

LukeShu · on May 1, 2020

Switching sides, kinda: OK, so you might not be able to assign much culpability to any given data block, so just assign culpability to sharing the metadata that A⊕B⇒C, right?

It's my understanding that that metadata is stored in a "descriptor block" stored in to the network like any other data block (but isn't combined with a set of random other blocks already on the network). I am unsure what the format of that data is, it might very well be easy to identify a block as being a descriptor block (or maybe not, I have no idea).

Going after those who download the descriptor block is problematic for the same reasons--that block will end up being reused for other files, the same as any other data block; they could be using it for any number of purposes.

OK, so you go after those that say "hey, use this X descriptor block to get such-and-such copyrighted file". Well, there's a reasonable amount of precedent that sharing a link to a copyrighted work doesn't constitute infringement in many jurisdictions (e.g. hosting a torrent file or a magnet link isn't infringing; it's actually seeding or leeching the contents of the torrent that's infringing).

OK, so you go after those that downloaded the descriptor block, then shortly afterward downloaded the data blocks referred to by the descriptor block, and that's reasonable suspicion. Well (1) tracking that is a lot harder than tracking who downloaded an individual block, and (2) there's still a sliver of plausible deniability, so then you get a subpoena to search their computer for C.

Again, the whole point is to make it really _hard_ to tell who's doing what.

heavenlyblue · on May 1, 2020

No, I think this whole thing makes it impeccably clear which blocks are the data and which blocks are the encryption pad since all of that is in the URL.

There’s no philosophical argument here, in my opinion.

LukeShu · on May 1, 2020

xor is a commutative operation; there is no technical distinction between the "data" and "encryption pad" as you call them. The URL identifies an unordered set of blocks, and does not distinguish which of them are the pre-existing "randomizer" blocks ("encryption pad" blocks, as you call them), and which one is the new block ("data" block, as you call it). The URL has a set of blocks, you xor them all together in any order, and you get the result.
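A quick Python check of the order-independence, with three random 16-byte blocks standing in for the real ones (the block size and the `xor_bytes` helper are just for illustration):

```python
import os
from functools import reduce
from itertools import permutations

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

blocks = [os.urandom(16) for _ in range(3)]

# XOR the blocks together in every possible order and collect the distinct results.
results = {reduce(xor_bytes, order) for order in permutations(blocks)}

print(len(results))  # 1: every ordering gives the same output
```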

heavenlyblue · on May 1, 2020

All encryption functions are total, it’s just that the block size of XOR is 1 bit.

If you decided to take AES-256 and then split your data into blocks of 256 bits and then encrypted them with other groups of blocks of 256 bits, you would get exactly the same conceptual result.

That being said (this definition is absolutely irrelevant) - it doesn’t matter if you have an unordered set of blocks; if by applying them together you get the data then everyone who held a block is liable.

Would you say that if you stored child porn on a RAID array and then distributed this RAID array across several people and removed one disk then philosophically the only moment when that porn really exists is only when it is combined to get the data in memory?

LukeShu · on May 2, 2020

I'm not sure why you've brought up philosophy twice now; what I'm showing is that the system makes it hard, practically, to tell who did what; no philosophy involved.

All but one of the blocks were already used by other people for entirely unrelated purposes. Only one of the blocks was created in the process of uploading illicit material, and it is impossible to know which one that is (from inspecting the blocks themselves; if you knew when each block was created--which the system does not track--then you could just say "the last one to be created"). You had asserted that the URL tells you which block that is; all I said to you is that it does not.

And even if you did identify which block that was, after a short time, another unsuspecting user of the network will end up using that block-involved-in-illicit-activity as a randomizer block for an entirely unrelated purpose; say: uploading cat gifs. And then others download those cat gifs, and download that block; having no idea that that block is also used to download child porn.

The idea of the system is that after a short time, virtually every data block on the network will have many unrelated uses. And because of those many unrelated uses, one cannot say that transmission of any given block indicates one of those uses over any of the other uses.

I haven't made any sort of philosophical argument that the illicit material does not exist until the data blocks are combined in memory, or anything like that. I have only made the argument that because each data block has many unrelated uses, it is difficult to prove who is using it for which uses. As I said in another comment, you can start to figure out things if you keep track of patterns of which blocks are downloaded when; but tracking that is a lot harder than tracking a single block. And that's the point of the system; to make it hard to prove specific assertions about who did what.

richardwhiuk · on May 1, 2020

How were A and C constructed?

Were they constructed from the derivative work? Or did they choose or generate A and C so that A xor C produces the derived work?

In all of those cases, they are still derivative work.

kelnos · on May 2, 2020

It feels like you don't really understand how the laws work, perhaps?

Intent is very very important.

If I'm hosting stuff for the express purpose of allowing people to download what I have, download what someone else has, and then do math with the two parts to get a copyrighted work, a judge (& jury) will have more than enough to rule against all of the people involved.

> You can laugh all you want, but you can't convict anyone.

That is an incredibly naive and incorrect view of how the (US, at least) legal system works.

Also consider that in a civil suit, the burden of proof is much, much lower than in a criminal case. So the word "convict" does not apply here.

The only thing here unique to OFFSystem is that if you have hundreds or thousands of nodes, it's (theoretically) not possible to prove who "owns" A and who owns B, so who do you take to court?

But then we get back to intent. The current legal consensus is that if a tool has legitimate non-infringing uses that are actually realistically happening in practice, then it gets very muddy and a plaintiff has to work hard to prove that any particular user is responsible for infringement. But in the case of OFFSystem, its express purpose is to enable copyright infringement. That is literally the stated purpose of the entire thing. Legally, anyone who participates in that system is assumed to be there to enable copyright infringement, and can be held liable, if not for direct infringement, then contributory infringement.

That's how the law actually works, for better or worse. And even if you're in the clear, do you want to be dragged through court and be responsible for legal fees, and the general uncertainty of what's going to happen to you?

xg15 · on May 1, 2020

I agree it's a good analogy, but I think you can also use it to illustrate my point.

In your example, the goal of the judge is not to convict whoever is seen on the video. Their goal is to convict whoever committed the crime - and the camera footage may or may not be essential in reconstructing that information.

The fact that A has an identical twin makes the video into less conclusive evidence, but this just means other pieces of evidence are required - it does not change anything about the original crime.

In fact, if A was bragging on Twitter how he is about to commit the crime and no one will be able to prove it was him, the tweets themselves would probably become evidence.

In the same way, I believe (IANAL) two files being byte-for-byte identical is usually strong evidence that one file is a copy of the other in the copyright sense. However, it's neither a necessary criterion, nor strictly speaking even a sufficient one (e.g. in the 4'33" case).

jatone · on May 1, 2020

but the proof that the byte-for-byte duplicate exists isn't available. yes you maybe know that A,B,C can be combined to create the byte-for-byte duplicate but they can also be combined to create a completely different file.

so just monitoring the network proves nothing. you have to get access to the end result on the users device.

the point is copyright is now much harder to prove.

shawnz · on May 1, 2020

At best it is a way of hiding evidence, but any encryption scheme works just as well for that purpose. It doesn't change the legality of using the bits that were derived from the copyrighted material

Chris2048 · on May 1, 2020

The question is liability of hosting those bits.

I have copyrighted data block A.

I generate random, uncopyrighted data A1 and store it on host X. I XOR A1 with A and get A2, storing this on Y.

Y's data (A2) is derived from A (and A1), but A1 is totally original random data whose generation is totally unrelated to A.

So, who is hosting the derivative data? Y, possibly, but both A1 and A2 look like random data, there's no way to tell which is the random one.

shawnz · on May 1, 2020

The hosts are protected anyway by DMCA safe harbour provisions, just like if you uploaded an encrypted copy of an infringing file to Google Drive.

> So, who is hosting the derivative data? Y, possibly

Y, definitely. Like you said yourself, Y's data is derived from A.

> both A1 and A2 look like random data, there's no way to tell which is the random one.

Any encryption scheme will cause the file to look like random data. All that means is that seeing the file contents alone is not enough to prove the crime. It doesn't mean that the file contents become free of infringement.

Chris2048 · on May 1, 2020

Does any host get safe harbour, or just the big ones with lawyers?

In any case, encryption schemes have keys, distinct from data and generated during encryption. In this case, there are two sets of data, either of which could be key or data, and one of which could have been generated before, and independently of, the other. In fact, for any data block A, and random data X - there is a block Y that combines with X to produce A - any/all A, any/all X.
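To illustrate that last claim, a small Python sketch: one random block X, generated before any target is chosen, can be paired with a suitable Y for any target of the same length. The 16-byte targets and the `xor_bytes` helper are made up for the example.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

x = os.urandom(16)   # generated first, independently of any target

targets = [b"sixteen byte msg", b"another 16 bytes", b"public domain!!!"]

# For each target A there is exactly one Y with X xor Y == A, namely Y = X xor A.
ys = [xor_bytes(x, a) for a in targets]

print(all(xor_bytes(x, y) == a for y, a in zip(ys, targets)))  # True
```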

shawnz · on May 1, 2020

Y's data is obviously the data and X's data is obviously the key. It doesn't matter if you can tell that just by looking at the data or just by looking at the key. What you can tell just by looking at the data is irrelevant, that is what I am saying.

> In fact, for any data block A, and random data X - there is a block Y that combines with X to produce A - any/all A, any/all X.

That is true with basically any encryption scheme, the only peculiarity here is that key and data are the same length (which is irrelevant to the legality of the situation anyway).

Chris2048 · on May 1, 2020

> what you can tell just by looking at the data is irrelevant

Why? hosting derivative work is illegal, hosting random data (keys) isn't, the distinction between X's data and Y's data is the distinction between liable and not-liable. If you can't tell which is which, then both are innocent under reasonable doubt.

> That is true with basically any encryption scheme

What isn't true is the ease with which A can be generated from X given Y. In other encryption schemes, key and data are distinct components - and not just by size.

xg15 · on May 2, 2020

> If you can't tell which is which, then both are innocent under reasonable doubt.

Nope. X is still hosting infringing content (I doubt this would even be considered derivative and not simply the encrypted original), this just made the proof harder. But "it could theoretically have happened differently" is not "reasonable doubt" if this possibility is highly unlikely in the face of all other evidence. A thief doesn't get out free because the thing "could have accidentally dropped into his bag".

You're demanding that the court should only look at the tiny piece of the puzzle that you provide. They have no reason to do so.

In this case, I'd assume, instead of diving into the technicalities of the algorithm, they'd be more interested in the fact that the openly stated purpose of the whole system is to circumvent copyright.

shawnz · on May 1, 2020

Because looking at the bits themselves is just one way of many to prove beyond a reasonable doubt what someone was trying to accomplish in the first place. In fact having the bits themselves doesn't even necessarily prove anything. If someone puts a murder weapon in my possession without my knowledge, does that make me guilty?

> In other encryption schemes, key and data are distinct components

Key and data are distinct components here too. You said yourself you generated the key data with no knowledge of the infringing content, so that makes them distinct. Size of the key is the only difference with this scheme compared to more typical ones.

Chris2048 · on May 2, 2020

I'd say showing a host is hosting a copyrighted file is the main way this is proven, and it's hard to host a file without broadcasting evidence of guilt. This is in no way comparable to possessing a murder weapon.

> You said yourself you generated the key data with no knowledge of the infringing content, so that makes them distinct.

The 'key' differs by history, which isn't known from looking at the bits. There is no way to distinguish which is the key without knowing how they were generated.

But by "distinct" I mean that you cannot switch the roles of key and encrypted data in most encryption schemes, that is not true in this case.

> Size of the key is the only difference with this scheme compared to more typical ones.

That's not true, in traditional schemes the key of any size is still the key in how it's used to decrypt the encrypted data; you cannot decrypt the key using the encrypted data as a key; this is not true for OFFsystem - there is no key/data distinction, if E is the decryption function then E(A, B) = E(B, A)

gnopgnip · on May 1, 2020

Any host gets safe harbor as long as they accept DMCA takedown notices and respond to them within a reasonable amount of time. You don't need a lawyer to comply with this law in general, but if you are in the business of hosting legally dubious content you will need lawyers one way or another.

richardwhiuk · on May 1, 2020

Y is definitely derived work.

The knowledge that "A = X(A1) xor Y(A2)" is also derived work. That fact will be known by someone, and will definitely be infringing.

adinisom · on May 1, 2020

"possibly" here means that while Y is hosting derived work, even the uploader doesn't know that for a fact since the software doing the uploading doesn't expose this detail.

Knowledge of how to combine them being a derived work is an interesting theory. One challenge is that facts aren't typically copyrightable. On the other hand you could argue contributory infringement instead.

Chris2048 · on May 1, 2020

> will achieve nothing but annoy the judge

Funny how existing BS arguments that benefit copyright holders haven't annoyed judges so much

There is no ultimate defence, but there is forcing attackers to lose viable defence ground - Once they attack you on basis X, they can't defend themselves on basis not-X; or vice versa.

shawnz · on May 1, 2020

Because copyright law was created to benefit copyright holders.

_0ffh · on May 1, 2020

We might even say that copyright holders were created by copyright law, no? =)

beagle3 · on May 1, 2020

This is interesting. I am not versed enough in law to opine on whether the actual uploading/download itself would be considered legal or illegal - there are many very strong opinions here on many aspects, and none with a legal argument more convincing than "I believe if a judge looked at it they would ..."

However, I would describe it differently: This is a "denial of information" (sort of a "denial of service") attack on the legal process. In order to actually sue, you have to make a more-than-good-faith effort to discover who it is that wronged you and needs to make you whole - and, unlike torrents or emule or whatever kids are using these days, OFFSystem makes that part hard to discover or prove beyond reasonable doubt - especially if the descriptor block is transferred out of band.

This is somewhat similar to shell games played by many legal "villains" such as patent trolls: You encapsulate company ownership through various jurisdictions (e.g. company A in the UK is wholly owned by company B in NL, which is owned by company C in the US, which is owned by company D in Russia, etc.); it is incredibly cheap to set up (a few thousand US$), and incredibly hard to unravel (tens to hundreds of thousands of US$). If you are trying to target such a company - e.g., to countersue, you realize you have no idea who to sue - you have an opaque company A, and not much to go on.

Courts, e.g. in Germany, have ruled that an IP address is not a good enough identifier to sue a person for copyright infringement unless you have other supporting data. It's not directly comparable, but indicates that OFFSystem might provide a defense, practically (can't figure out who to sue) and maybe even legally (not sufficient proof), depending on jurisdiction.

It's the intent that matters in many cases, but burden of proof also matters. I think it is philosophically a very interesting question and far from clear cut.

dooglius · on May 1, 2020

This seems pretty easily defeatable: an adversary just needs to track the timestamps of various blocks, and the last one that gets created is the "real" one. The stuff about copyright seems pretty dumb, obviously the most recent block is the interesting one, that you've just "encrypted" with the previous blocks. Also, even without doing any tracking, you can narrow down the uploader to a set of at most N people, where N is the number of blocks (which presumably can't be too big for performance reasons).
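A sketch of what that tracking might look like in Python, assuming the adversary can log the first time it sees every block (which may not hold in practice). The block IDs, timestamps, and function names are invented for the example.

```python
first_seen = {}  # block id -> time the observer first saw that block

def observe(block_id: str, timestamp: float) -> None:
    """Record a sighting; only the earliest sighting of each block is kept."""
    first_seen.setdefault(block_id, timestamp)

def likely_new_block(descriptor) -> str:
    """Guess which block in a descriptor was created for it: the newest one."""
    return max(descriptor, key=first_seen.__getitem__)

observe("S", 100.0)   # long-lived randomizer block
observe("R", 250.0)   # another older block
observe("N", 900.0)   # appears around the time the file was uploaded
observe("S", 950.0)   # a later re-sighting does not change the first-seen time

print(likely_new_block({"S", "R", "N"}))  # N
```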

ComputerGuru · on May 1, 2020

The copyright stuff is bunk, but their defense isn’t (just) what you’ve said but rather that a single block is used to decrypt multiple files so their defense is that no one copyright owner owns the block in question. Which of course is BS since any part of a component derived from illegally sourced copyrighted materials renders the entire derived work in violation (eg a mixtape with one improperly sourced song or twenty is just as “fair game” for a lawsuit).

musingsole · on May 1, 2020

As mentioned elsewhere, it is difficult to prove where any block in OFF was derived from. So while a block is used to rebuild a copyrighted work, the block itself can't be subject to copyright, or more so, could have been generated by a freely available work and so has no ties to the copyrighted work.

It seems like building a dictionary and then a system to rebuild books by combining fragments (words) from the dictionary. The book can be copyrighted, but the words couldn't be. Even if the word was coined in the book and submitted to the dictionary.

dane-pgp · on May 1, 2020

> an adversary just needs to track the timestamps of various blocks

Does an adversary necessarily have knowledge of every time a block is added to the system?

> you can narrow down the uploader to a set of at most N people, where N is the number of blocks

I would expect N to be greater than 1000, and I don't think a 1 in 1000 chance of someone being the uploader is reasonable suspicion.

contravariant · on May 1, 2020

The problem with that tactic is that all blocks are essentially random noise (in isolation).

You could strengthen this a bit by always writing 2 blocks, one of which is random noise, and the other a xor of that block and t-2 others. This way each block is provably completely random (the blocks just aren't independent from one another).
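
One way to read this suggestion as code, as a rough sketch only. It assumes the source data is XORed in as well, which the comment implies but does not spell out. The helper names (`xor_blocks`, `store_block`, `recover_block`) and the 16-byte block size are made up for illustration.

```python
import os
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte strings together."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def store_block(source: bytes, existing: list[bytes]) -> tuple[bytes, bytes]:
    """Write two new blocks: fresh random noise, and the XOR of the source,
    that noise, and the t-2 blocks already on the network."""
    fresh = os.urandom(len(source))
    mixed = xor_blocks(source, fresh, *existing)
    return fresh, mixed


def recover_block(fresh: bytes, mixed: bytes, existing: list[bytes]) -> bytes:
    """XOR all t blocks of the tuple to get the source back."""
    return xor_blocks(fresh, mixed, *existing)


source = b"sixteen byte blk"
existing = [os.urandom(16), os.urandom(16)]  # t-2 blocks, so t = 4 here
fresh, mixed = store_block(source, existing)
print(recover_block(fresh, mixed, existing) == source)
```

Each of the two new blocks is uniformly random on its own, because `fresh` is; only the combination of all t blocks carries the source.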

dooglius · on May 1, 2020

If you write 2 (or N) blocks, the new blocks still come from you as there's a data dependency on the external blocks you use, so the most-recent tactic still de-anonymizes the uploader.

ThePhysicist · on May 1, 2020

I don't know, following that logic it would be impossible to claim copyright over an encrypted file as it's basically also just a very large random number, even if I can reassemble the original file if I have access to the cryptographic key. I think you could probably share the encrypted file on the web if you don't share the key, as no one can know what the file contains (assuming the cryptographic method is secure). Together with the key this data will become copyrightable though as there is an easy way for any user to re-assemble it on his/her computer.

So I'd say downloading individual randomized blocks is probably not problematic and akin to downloading an encrypted file without having access to the key. Downloading the URL that points to the descriptor list might be problematic though, as this will allow you to "decrypt" the other blocks.

exrook · on May 1, 2020

I think the distinction this protocol makes is that by using XOR as the "encryption" method, given any input block you can choose a "key" to decrypt with to produce any other output block. A block in isolation provides zero information to the downloader. I think it could be argued that the knowledge of which blocks to combine is where the actual data is being stored, and maybe that's where the copyright owners could stake a claim.
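
A small sketch of that property, assuming a single block and plain byte-wise XOR. The function names and the example messages are hypothetical.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def key_for(stored: bytes, target: bytes) -> bytes:
    """Build the 'key' that turns the stored block into the chosen target."""
    return xor_bytes(stored, target)


target_a = b"public domain text"
target_b = b"some other message"
stored = os.urandom(len(target_a))  # the block a node actually holds

key_a = key_for(stored, target_a)
key_b = key_for(stored, target_b)

# The same stored block "decrypts" to either message, depending on the key.
print(xor_bytes(stored, key_a))
print(xor_bytes(stored, key_b))
```

The stored block is the same in both cases; what selects the output is entirely the choice of the other block.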

adhoc32 · on May 1, 2020

So, the copyright owners will have a more difficult job going after the users of the network. They have to prove that there is copyright infringement by monitoring the blocks downloaded as opposed to BitTorrent file hash identification.

heavenlyblue · on May 1, 2020

So does that mean if I store child porn on AWS I don’t technically own it since it’s somewhere in the virtual cloud?

The only way I could access it is through that weird SSH key that doesn’t contain any videos in it.

exrook · on May 1, 2020

I don't think we are in any disagreement that whoever is uploading the data "owns" the data. The interesting idea is that the entity storing the bytes has 0 information about the data they represent, in the information theoretic sense, since they can decrypt the data to any value by choosing a sufficient key. This is not true for most other encryption schemes where the encrypted data has enough structure to it that theoretically it could be retrieved without the key, although the whole point of the encryption is that this isn't a practical undertaking.

heavenlyblue · on May 1, 2020

Oh no - wait. Let’s take this idea even further.

I don’t actually store child porn - the videos I have are all decryption keys to files filled with zeros that I have encrypted with a one-time-pad cypher.

ComputerGuru · on May 1, 2020

It poses the following question as its defense: if a file is broken down into blocks and a single block is used to recompose two or more different files but is itself unique from any of the actual components of either of the two files, then no one can claim copyright to it.
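
To make the block reuse concrete, here is a toy sketch. It assumes single-block files, a two-block descriptor, and SHA-256 content addressing; the `BlockStore` class and its methods are hypothetical and are not OFFSystem's actual API.

```python
import hashlib
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


class BlockStore:
    """Toy content-addressed store: block ID = SHA-256 hex digest of the block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        block_id = hashlib.sha256(block).hexdigest()
        self.blocks[block_id] = block
        return block_id

    def insert(self, data: bytes, randomizer_id: str) -> list[str]:
        """Store data XOR randomizer; return the descriptor (list of block IDs)."""
        mixed = xor_bytes(data, self.blocks[randomizer_id])
        return [self.put(mixed), randomizer_id]

    def read(self, descriptor: list[str]) -> bytes:
        """XOR the two blocks named by the descriptor to rebuild the data."""
        first, second = (self.blocks[block_id] for block_id in descriptor)
        return xor_bytes(first, second)


store = BlockStore()
shared_id = store.put(os.urandom(8))  # one block reused by both files

file_a = b"AAAAAAAA"
file_b = b"BBBBBBBB"
desc_a = store.insert(file_a, shared_id)
desc_b = store.insert(file_b, shared_id)

print(store.read(desc_a), store.read(desc_b))
print(shared_id in desc_a and shared_id in desc_b)
```

The shared block appears in both descriptors and matches neither file's bytes, which is the situation the question above describes.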

I don’t think that works the way they think it does, though. Just imagine analog data instead of digital, and take two copyrighted works, break them down into blocks, mix them together with some randomizers, and what do you get? A work derived from multiple copyrighted sources. No one needs to “own” the entirety of the source, if any part of it is derived from copyrighted materials in a way that isn’t exempted via eg fair use, then the entirety of the resulting work is in violation and is fair game for lawsuits, etc.

Simply moving this to the digital domain from analog doesn’t change the logic, does it?

tantalor · on May 1, 2020

A simple example may be dubbing two songs on top of each other. Then you can subtract one of the songs to recover the other. Obviously the double-song is a derivative work of both.
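
The same idea in a few lines, treating each song as a list of integer samples. The sample values are made up, and the sketch ignores clipping and real audio formats.

```python
def mix(track_a: list[int], track_b: list[int]) -> list[int]:
    """Dub two equal-length tracks on top of each other by adding samples."""
    if len(track_a) != len(track_b):
        raise ValueError("tracks must have the same length")
    return [a + b for a, b in zip(track_a, track_b)]


def subtract(mixed: list[int], known: list[int]) -> list[int]:
    """Remove a known track from the mix, leaving the other one."""
    if len(mixed) != len(known):
        raise ValueError("tracks must have the same length")
    return [m - k for m, k in zip(mixed, known)]


song_one = [3, -1, 4, 1, -5]
song_two = [2, 7, -1, 8, 2]

double_song = mix(song_one, song_two)
print(subtract(double_song, song_one) == song_two)
```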

amelius · on May 1, 2020

The point is that using their algorithm, you can't prove that a block of data is derived from an existing work.

Jasper_ · on May 1, 2020

Doesn't matter. At the end of the day, you used a key with intent to decrypt such data and extract the original work.

Courts don't operate on technicalities. The intent is pretty much all that matters, as long as you prove that someone intended to pirate stuff, doesn't matter what rube goldberg machine they use to actually pull it off.

Schemes like this have been tried since the 1800s, perhaps far longer, where someone smugly announces "well technically I didn't defraud them, your honor" and gets sent straight into prison.

egh5oon · on May 1, 2020

> Courts don't operate on technicalities. The intent is pretty much all that matters, as long as you prove that someone intended to pirate stuff, doesn't matter what rube goldberg machine they use to actually pull it off.

You are absolutely right.

This is what enabled prosecuting the founders of the pirate bay, among other things.

The fact that files are hosted elsewhere, mangled, encrypted, cut in pieces is entirely irrelevant to the court.

Hoping that a technicality gets you off the hook is an exercise in shortsightedness.

amelius · on May 1, 2020

The point is that courts can't prove anything based on the information they get from the network.

This is in contrast to e.g. bittorrent where every seeder is in direct violation and provably so.

Jasper_ · on May 1, 2020

The same thing that takes down bittorrent users will take down this too: the network log of you typing "Pink Floyd" into OFFsystem's search bar.

This is technobabble theater by people who understand neither the law nor the courts.

adhoc32 · on May 1, 2020

This. Any block of data can be used to reconstruct anything on the network.

amelius · on May 1, 2020

Last project activity: 5 years ago ...

Sad, because it looked promising.

However, this part seems weak:

> If the OFF-internal search function is used, search terms are untraceable to its originator, because the search request is forwarded to the next node and its results back to that node instead of directly to the originator. It is thus not possible to decide whether a node is the originating node or a node doing a search request on behalf of another node.

Being an accomplice in a crime is also a crime.

qznc · on May 1, 2020

There are quite a few of these p2p content-addressed storage systems out there: OFFSystem, Dat, Freenet, IPFS, I2P, Maidsafe, ...

Does anybody keep an overview? Has any of them achieved serious use yet?

rblatz · on May 1, 2020

The whole thing is technically interesting but legally not at all impressive or interesting. It’s almost like this was built by engineers that tried to solve a legal problem based on a view of the law informed by prime time legal television shows.

amelius · on May 1, 2020

Except that they make it impossible to tell who to sue. And that's the clever part.

setr · on May 1, 2020

The natural defense would be to have non-copyrighted things on the network as well. By assisting with any search, both legal searches as well as illegal ones, with no way to disambiguate (unless you had a copyright checker yourself when receiving the search request) the search is now laundered.

orisho · on May 1, 2020

This is nonsense IMHO. This is just an encoding scheme to encode data. Just because blocks used by the encoding scheme are cached and reused for other data does not mean that they're not used to represent this data. If the data downloaded is copyrighted work, then copyright applies. Are copyright laws specific as to the technical method in which reproducing copyrighted material without authorization counts as a violation, or is the method irrelevant?

If the method is irrelevant, then yes, using this scheme to copy copyrighted material is copyright infringement.

If it is relevant, then there's nothing special about this scheme at all except that it is different than the specific methods outlined in the law.

aidenn0 · on May 1, 2020

This reminds me a bit of Dagster: https://core.ac.uk/display/22569796

heavenlyblue · on May 1, 2020

For multiple rounds of XOR for encryption and a reused one-time-pad it would not be so hard to reproduce the files if needed.
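
Presumably this refers to the standard weakness of pad reuse: XORing two ciphertexts made with the same pad cancels the pad. A short sketch under that reading, with made-up plaintexts and a hypothetical `xor_bytes` helper:

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


plain_1 = b"first file block"
plain_2 = b"other file block"
pad = os.urandom(len(plain_1))  # the same pad is (wrongly) used twice

cipher_1 = xor_bytes(plain_1, pad)
cipher_2 = xor_bytes(plain_2, pad)

# The pad cancels out: cipher_1 XOR cipher_2 == plain_1 XOR plain_2.
leaked = xor_bytes(cipher_1, cipher_2)

# Anyone who knows (or guesses) one plaintext recovers the other without the pad.
recovered = xor_bytes(leaked, plain_1)
print(recovered)
```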

pgt · on May 1, 2020

Fix the brand then I'll consider using it.

mdszy · on May 1, 2020

If you really think the copyright lawyers who get paid tons of money to professionally bust stupid schemes like this wouldn't instantly tear it to shreds then oh boy do I have a bridge to sell you.

This stuff honestly reminds me of "sovereign citizens".

"I'm doing something illegal but AH HA if I make use of this weird technicality or magical legal incantation it's not actually illegal! Haha!!"

greenshackle2 · on May 1, 2020

> No block can be copyrighted without logical contradictions, because blocks used for re-assembling a source file block are re-used for re-assembly of other source file blocks. It is undecidable who would have copyright on a block, which has several meanings. Everyone would have copyright on everything.

That's a hopelessly naive, almost sovcit-level take on law. The lawyers and judges will not give an iota of a fuck for this argument. The facts are, someone uploaded a copyrighted work to the system. Someone else downloaded it out of the system.

The classic "What Colour are your bits" is relevant here:

https://ansuz.sooke.bc.ca/entry/23

pdonis · on May 1, 2020

> The classic "What Colour are your bits" is relevant here:

I agree this article is relevant, but perhaps not in the way you meant.

To me, the point of the article is that you can't tell what Colour bits are by looking at the bits. The OFFSystem can be viewed as a way of illustrating this point (though that probably wasn't what its designers intended). But the law wants to insist that you can tell what Colour bits are by looking at the bits--as the article says, that's basically what DRM and other such systems try to do. But it can't be done.

To me, that means that the law needs to change--as it has changed in the past when technology changes in a way that invalidates assumptions that the law previously made. It looks to me like the main thing that is preventing the law from simply changing in a common sense way to accommodate the technological change of computers is that there are large corporations who make a lot of money by owning copyrights. I personally don't have a lot of sympathy for those corporations since their business model has nothing whatever to do with compensating the actual creators of the works whose copyrights they own.

AgentME · on May 1, 2020

>But the law wants to insist that you can tell what Colour bits are by looking at the bits

What makes you say that? I would expect that the law (or the judge enforcing it) cares more about the reasons people use the software and what they get out of it in the end than about the specific bit strings. The OFFSystem is clearly software for storing and retrieving files or media; it's built for that use and everyone using it is using it for that. How it accomplishes that internally isn't significant.

Do you think any cases would or should be overturned if it were revealed that, unknown to anyone at the time, a popular torrent program actually worked like OFFSystem, or hell, worked by magic without actually physically sending data between users as long as someone was concurrently doing the seeding-the-file ritual? -- I think this line of argument is strongly related to the argument that it's still murder if instead of shooting someone, you intentionally kill them by using a rube goldberg machine or by placing a land-mine that the victim triggers or by using a magic gun that erases whoever you aim it at. Intentions and results matter to the law. I think this makes technically-minded people squeamish because intentions are a lot harder to prove and are easier to mess up than questions of plainer physical facts, but to ignore them is too gameable.

The OFFSystem only sounds interesting because it's transforming the data for no other reason than to try to be a legal loophole, but strangely we'd never even consider the idea that software that transforms data for a reason (like zip archivers, https encryption, or re-encoding) would work as a similar legal loophole. I think the concept of a data transformation for no other purpose other than to try to be a legal loophole just manages to surprise people enough to think that it might work.

pdonis · on May 1, 2020

> What makes you say that?

The article says so, at least implicitly, in passages like this one:

    What we are doing with rights management information
    is simulating Colour in a computer-sciencey way. But
    lawyers will seize on the possibility of doing this
    kind of simulation and say, "See!  You admit it! You
    can recognize the Colour of bits after all!" and then
    conclude from there that all the other rules they want
    to make (such as "Red Troubleshooters may not walk
    down Orange hallways") are meaningful in the computer
    science realm.

My view of the typical legal attitude towards DRM schemes and the like is similar.

> I would expect that the law (or the judge enforcing it) cares more about the reasons people use the software and what they get out of it in the end than about the specific bit strings.

The law is also supposed to care about the people who created the bits in the first place--after all, the purpose of copyright law is to encourage creation of works. But the law as it is actually practiced does not do this at all; the parties it most consistently favors are, as I said, large corporations whose business model does not include fair compensation to the original creators.

I agree that, viewed as an actual legal strategy within the context of our current legal system, OFFSystem is daft. No court is going to agree with the legal interpretation that the creators of OFFSystem claim to be using. But I view that aspect as simply an illustration of the fact that you can't fight legal mumbo jumbo with more legal mumbo jumbo. The proper response to draconian legal doctrines with regard to DRM is to remind the legal system what it is actually supposed to be doing, and to refuse to accept the arguments that DRM proponents offer in the framework of copyright law, which basically amount to saying that they have exactly the same rights as the actual creators of the content, even though their business model treats those creators as expendable cogs in a machine. That's what I mean when I say the law needs to change: it needs to stop taking the claims made by large corporations promoting DRM and the like as presumptively valid, simply because they have maneuvered themselves into a position where they can show a piece of paper saying they "own" the copyrighted content.

_fq4v · on May 1, 2020

> But the law wants to insist that you can tell what Colour bits are by looking at the bits--as the article says, that's basically what DRM and other such systems try to do. But it can't be done.

Of course you can tell what colour the bits are. If someone uploads a recording of Taylor Swift's Love Story, literally everyone is going to know who owns it. Same is true of lesser known recordings -- someone will be able to show that it is based on something they own, and thus the colour of the bits is revealed.

> business model has nothing whatever to do with compensating the actual creators of the works whose copyrights they own.

Property isn't determined by who made something, nor should it. Otherwise, you would have no rights to your own home.

pdonis · on May 2, 2020

> If someone uploads a recording of Taylor Swift's Love Story, literally everyone is going to know who owns it.

First, as a matter of technical fact, you are wrong. You seem to think that whether a particular sequence of bits is or is not "a recording of Taylor Swift's Love Story" is a simple, easily computable function of that sequence of bits. It's not. That's one of the main points of the article.

Second, when you speak of ownership, you are simply stating the position large media corporations take without supporting argument. Sorry, not buying it.

> Property isn't determined by who made something

I didn't say it was. I'm not talking about the concept of "property" in general. I'm talking about a particular case: copyright law.

In the US at least, the purpose of such law is explicitly stated in the US Constitution: to give the authors of works exclusive rights to them. You can't transfer authorship the way you can, for example, transfer title to a home; but the legal theory relied on by large media corporations assumes without argument that you can, and equates uploading a copy of a song that has already made millions, and over which rights of authorship are not even being asserted by its actual author, with stealing the bread out of the mouth of the author. That argument violates the US Constitution as well as obvious common sense.

The only reason this legal theory got any traction at all was that before the age of computers and the Internet, authors did not have the resources to get their works widely read (or the equivalent for other media) on their own. They needed publishers who were willing to take the business risk of printing a lot of copies of a book (or, once recorded music came along, a record) in the hope of selling them. So authors had a strong incentive not to make an issue of the obvious absurdities in the legal theory that the publishers were relying on to assert copyright.

Now, however, there is no reason except inertia for authors, or songwriters, or any other authors of creative works, to use large media corporations to distribute their works. And in fact most such works no longer are distributed via large media corporations; they are posted on the web by the people who create them, and those people have a variety of ways of getting paid for their work without involving large media corporations at all. The corporations don't want people to recognize this because it will just make their business models die that much faster. But there is no reason why the law should be propping up outdated business models.

_fq4v · on May 3, 2020

> is a simple, easily computable function of that sequence of bits. It's not. That's one of the main points of the article.

It is... you get a human to listen to it, and they'll know.

> equates uploading a copy of a song that has already made millions, and over which rights of authorship are not even being asserted by its actual author, with stealing the bread out of the mouth of the author. That argument violates the US Constitution as well as obvious common sense.

The 'right of authorship' was asserted by the author when they gave someone else the right to enforce their copyright in exchange for a fixed sum -- something you call purchasing.

> So authors had a strong incentive not to make an issue of the obvious absurdities in the legal theory that the publishers were relying on to assert copyright.

There is no absurdity, except that which you construct in order to limit the rights of authors, the very people you are purporting to support.

pdonis · on May 15, 2020

> you get a human to listen to it

If you think this is what "simple, easily computable function" means, I have no idea what language you think you are speaking, but it isn't English.

> they gave someone else the right to enforce their copyright

But how is that even possible given the plain language of the US Constitution that copyright protects the "authors" of works? You can't transfer the fact of being the author.

Yes, I know our current legal regime in practice allows this. I'm simply saying that means our current legal regime is broken and needs to be fixed.

> that which you construct in order to limit the rights of authors, the very people you are purporting to support

By your own account, the author is out of it as soon as they receive the "fixed sum" from the media company. So limiting the nonsense the media company can get away with after that point does not affect the author at all.

adhoc32 · on May 1, 2020

Sure, but I have a question: Is this network illegal? Or is the real copyright infringement the URL used to reconstruct the copyrighted work?

shawnz · on May 1, 2020

If the bits were derived from a copyrighted material then they fall under that copyright. Whether you can prove it just by looking at them is an unrelated issue.

adhoc32 · on May 1, 2020

You don't know that, only the owner of the URL knows what's what. The network can also store copyright free files.

shawnz · on May 1, 2020

Like I said, it's an unrelated issue whether you can prove the bits were derived from copyrighted material or not just by looking at them. That doesn't change their legal status.

adhoc32 · on May 1, 2020

I'm not disputing the legal status. Obviously downloading copyrighted material is illegal. I'm asking how they're going to enforce the law in a scheme like that. In order to have strong evidence you have to get physical access to the computer committing the crime. But in most free countries you can't raid houses without having evidence in the first place.

heavenlyblue · on May 1, 2020

But once you have the URL, you will be able to raid everyone who had those blocks.

You could make an argument that “they didn’t know”, but it doesn’t mean they will not be able to prove that you were an associate in crime.

adhoc32 · on May 2, 2020

But each block is used by multiple different files. Owning a block of a forbidden URL does not prove that you're committing a crime. Owning all blocks of that URL is a strong indication, but still not enough proof to justify a raid.

mdszy · on May 1, 2020

Haha I thought of the sovcit angle too and edited it in before reading this. But yeah, I absolutely agree. Throwing weird technicalities around thinking it somehow absolves you of all legal responsibility is completely asinine.

brianhorakh · on May 1, 2020

If this caught on, the mpaa and riaa lawyers are going to be out of jobs.