Let's also not forget the Library of Babel (https://libraryofbabel.info/), which includes every text ever written, and every text that ever will be written, up to 3200 characters. I truly wonder how that website is allowed to exist and why they don't claim copyright over every other website that has been created since.
Does it truly contain all these texts? It seems more like it can generate any of these texts, but they are (obviously) not pre-generated and stored anywhere.
If we consider only lowercase, non-accented English characters plus a few additional basic characters (space, period, comma, question mark, etc.), say 30 base characters, we are looking at 30^3200 combinations, which is a number with 4727 digits. The estimated number of atoms in the observable universe is 10^80, so basically nothing compared to 30^3200.
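For what it's worth, those figures are easy to verify; here's a quick Python check (using the hypothetical 30-character alphabet from the comment above, not the site's actual one):

    # Back-of-the-envelope check of the numbers above (hypothetical 30-char alphabet).
    from math import log10

    alphabet_size = 30
    page_length = 3200

    # 30^3200 written in decimal has floor(3200 * log10(30)) + 1 digits.
    digits = int(page_length * log10(alphabet_size)) + 1
    print(digits)                                    # 4727

    # Compared with the ~10^80 atoms in the observable universe:
    print(page_length * log10(alphabet_size) - 80)   # ≈ 4647 orders of magnitude bigger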
Them claiming copyright over this would be a bit silly as the texts don't truly exist but only have the potential to be generated.
Would the situation change once someone is able to store all these texts? What if they can compress them efficiently? You could say that would just be a sophisticated decompression algorithm for the original body of texts.
The site can generate any sequence of text from its alphabet of 29 characters. This means it contains 29^3200 different works.
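The site's real page-addressing scheme is more elaborate, but the core idea that every possible 3200-character page corresponds to exactly one index below 29^3200 can be sketched as a simple base-29 conversion (a minimal illustration, not the site's actual algorithm):

    # Minimal sketch: a bijection between page indices and 3200-character pages.
    # The real Library of Babel uses a different (invertible) mapping; this just
    # shows that "containing" 29^3200 works only requires an index <-> text
    # conversion, not any stored text.
    ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."   # 29 characters
    PAGE_LEN = 3200

    def index_to_page(index: int) -> str:
        """Interpret the index as a base-29 number and render it as a page."""
        chars = []
        for _ in range(PAGE_LEN):
            index, digit = divmod(index, len(ALPHABET))
            chars.append(ALPHABET[digit])
        return "".join(chars)

    def page_to_index(page: str) -> int:
        """Inverse mapping: recover the index from a page."""
        index = 0
        for ch in reversed(page):
            index = index * len(ALPHABET) + ALPHABET.index(ch)
        return index

    text = "hello, library of babel"
    page = text.ljust(PAGE_LEN)          # pad to a full page with spaces
    assert index_to_page(page_to_index(page)) == page
    print(page_to_index(page))           # the (huge) "address" of that page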
Let N = log(29)/log(2) ≈ 4.86, the number of bits of information carried by each character.
It is not possible to create a compression system where each work, on average, uses fewer than N*3200 bits (roughly 15,500 bits) once compressed, assuming each compressed value corresponds to only one work. This stems directly from the pigeonhole principle: there simply aren't enough shorter bit strings to give every one of the 29^3200 works its own codeword.
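Here is that counting done exactly with Python's big integers. It checks the simplest form of the pigeonhole statement (you cannot give every page a codeword strictly shorter than about 15,545 bits); the bound on the average length comes from the same kind of counting.

    # Exact pigeonhole arithmetic with Python's arbitrary-precision integers.
    from math import log2

    works = 29 ** 3200                 # number of distinct 3200-character pages
    print(3200 * log2(29))             # ≈ 15545.5 bits of information per page

    # Count every bit string of length 1..15544, i.e. strictly shorter than
    # 15545 bits: 2^1 + 2^2 + ... + 2^15544 = 2^15545 - 2.
    short_codewords = 2 ** 15545 - 2
    print(short_codewords < works)     # True: too few codewords for 29^3200 pages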
What could be possible is to create a compression scheme where, say, pages consisting of English text compress to something smaller, at the expense of pages of random garbage compressing to something longer.
This stems from the fact that the entropy (information density) of English text is much lower than what the 29-character alphabet can carry. But you cannot, on average, have even the valid-English pages compress to smaller than their entropy. You can design the system to favor some works over others, but the average compressed length must be equal to or greater than the average entropy of the works.
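A quick toy illustration of that gap (with a deliberately repetitive English-like sample, so it's not a measurement of the true entropy of English): a general-purpose compressor gets English-looking text well below the log2(29) ≈ 4.86 bits per character that the alphabet allows, while uniformly random pages over the same alphabet don't go below it.

    # Toy demo: compress English-ish text vs. random text over the same 29-char alphabet.
    import random
    import zlib

    ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."
    english = ("it is a truth universally acknowledged, that a single man in "
               "possession of a good fortune, must be in want of a wife. ") * 50
    garbage = "".join(random.choice(ALPHABET) for _ in range(len(english)))

    for label, text in (("english", english), ("random", garbage)):
        compressed = zlib.compress(text.encode("ascii"), 9)
        print(label, round(8 * len(compressed) / len(text), 2), "bits/char")
    # Typical result: the English sample lands far below 4.86 bits/char (helped here
    # by the literal repetition), while the random sample stays around 4.9-5.0.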
And all you have really done here is invent a weird lookup-table-based compression scheme for English text. All other compression systems are similarly constrained by the entropy of the data to be compressed, but most other schemes don't require an infeasibly large lookup table of basically every possible output.
"What Colour are your bits?" is an interesting look into these concepts:
"The scrambled file still has the copyright Colour because it came from the copyrighted input file. It doesn't matter that it looks like, or maybe even is bit-for-bit identical with, some other file that you could get from a random number generator. It happens that you didn't get it from a random number generator. You got it from copyrighted material; it is copyrighted. The randomly-generated file, even if bit-for-bit identical, would have a different Colour."
Do you truly wonder that? Unless someone takes legal action, fair use doesn't even come into play. Why would anyone bother to take legal action against them? This is essentially an art project.
BTW, it also includes all images: http://babelia.libraryofbabel.info/