This reminds me a bit of the science of nanoinformatics as described in one of the Expanse novellas (The Vital Abyss):
"A thought experiment from my first course in the program: Take a bar of metal and put a single notch in it. The two lengths thus defined have a relationship that can be expressed as the ratio between them. In theory, therefore, any rational number can be expressed with a single mark on a bar of metal. Using a simple alphabetic code, a mark that calculated to a ratio of.12152205 could be read as 12-15-22-05, or “l-o-v-e.” The complete plays of Shakespeare could be written in a single mark, if it were possible to measure accurately enough. Or the machine language expression of the most advanced expert systems, though by then the notch might be small enough that Planck’s constant got in the way. How massive amounts of information could be expressed in and retrieved from infinitesimal objects was the driving concern of my college years."
Pure fiction at this point, but it would be an interesting experiment to encode data into objects that could be expressed using the mathematical ratio of their shapes or sizes.
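For anyone who wants to play with the idea, here's a minimal sketch (ignoring every physical measurement limit) of the alphabetic code from the quote, treating the notch position on a unit-length bar as the ratio 0.12152205. The function names are just made up for illustration:

    from fractions import Fraction

    # Two-digit alphabetic code from the quote: a=01, b=02, ..., z=26.
    def encode(word):
        digits = "".join(f"{ord(c) - ord('a') + 1:02d}" for c in word.lower())
        # The notch position on a unit-length bar is the ratio 0.<digits>.
        return Fraction(int(digits), 10 ** len(digits))

    def decode(ratio, letters):
        digits = f"{int(ratio * 10 ** (2 * letters)):0{2 * letters}d}"
        pairs = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
        return "".join(chr(p + ord('a') - 1) for p in pairs)

    r = encode("love")
    print(float(r))       # 0.12152205
    print(decode(r, 4))   # love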
> Or the machine language expression of the most advanced expert systems, though by then the notch might be small enough that Planck’s constant got in the way.
With the Planck length being roughly 10^-35 m, I'd say you'd hit the limit trying to store more than 15 bytes.
This is insightful. 15 bytes is not a lot.
I wonder what other natural limits on information density there are? For example, a magnetic field: is there a smallest measurable difference?
The limit on information density is called the Bekenstein Bound, the point after which adding more information to the volume would create a black hole.
How are you calculating a value of just 15 bytes? The proposal is that the mathematical value of the ratio of the lengths of the rod and the notch will deliver a value on the number line, which in effect can be of any desired length. You could actually find a number that represents the complete contemporary knowledge of humanity.
The limitations mentioned in the above comment relate to the fact that, to know you'd arrived at the number designed into the notch and the rod, you'd have to have agreed beforehand on the tolerance limits for measuring this 'device', so that the uncertainty in measurement can be ignored and the ratio derived.
To maybe get the right number you would have to expand your possible number of ratios (at a certain measurement sensitivity you would have a certain number of lengths you could use, and thus a certain number of ratios would be available to you), and to get just the right number to describe your whole "machine language expression of the most advanced expert systems" you would have to delve into tolerances of the Planck length.
Or adjust the size of your rod length and notch, make them bigger, to get that sweet number with a 'poorer' level of sensitivity.
... Or find a number with enough usable digit sequences to serve the purpose, and treat the surrounding gibberish digits as markers/jump points to the next usable sequence. I suppose you could find enough usable sequences in the full expansion of pi (as its decimal expansion is without end) to write a program that can decode the full Linux kernel out of it.
Assuming the length of the rod is 1 m and you have a resolution of one Planck length, there are about 10^35 possible ratios you can express (because that's the number of possible locations of the notch). That's roughly 2^116, a number that fits in 15 bytes of information. As discussed below, if you also allow the bar to be the size of the observable universe, this doesn't increase by much. A notch or ratio is linear; information and combinatorics grow a lot faster than that.
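A quick back-of-the-envelope check of that figure, taking the Planck length as about 1.6e-35 m (the script is just my own sketch):

    import math

    planck = 1.616e-35           # Planck length in metres (approx.)
    positions = 1.0 / planck     # distinguishable notch positions on a 1 m rod
    bits = math.log2(positions)  # bits needed to name one position
    print(f"{positions:.2e} positions ~ {bits:.0f} bits ~ {bits / 8:.1f} bytes")
    # ~6.19e+34 positions ~ 116 bits ~ 14.4 bytes

    # A rod the diameter of the observable universe (~8.8e26 m) barely helps:
    print(f"{math.log2(8.8e26 / planck) / 8:.1f} bytes")   # ~25.6 bytes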
That's the crux. What we're saying is that, of these set of ratios, there exists one whose infinite decimal representation contains our intended data. Yes we'd be limited to Planck length resolution, but the idea is that we would determine such a ratio and construct the notch and bar in such a way that it yields the decimal sequence desired; the # of particular ratios is not relevant.
I wouldn't be surprised if it can be proven this can't be done, but then that would be the proper question to ask.
But what does that have to do with the emergent ratio between the notch and the rod? There may be 10^35 possible steps, but that does not mean that the answer, the ratio of the notch and the rod, will be limited by that. The answer will come from the number line, where any irrational number can have however many trailing digits. If the ratio of the rod and the notch is 22/7, how much information would you say that is?
If 'x' marks the notch of a rod of length "1", [0---------x------------1], then the implied ratio is not x/(1-x), but x/1 (so the ratio is always < 1.) Even so, your question could be "what is the information content of 1/7" (the presumed implication being that 1/7, while periodic, has an infinite decimal representation.)
But that is not the direction we are interested in. We would like, given a message, such as "l-o-v-e", or 12-15-22-05, or 0.12152205, to figure out what is the ratio that uniquely specifies it. As we can only mark one notch, we can create "only" h ~= 10^35 ratios, or represent h unique messages. We know how to distinguish between h unique elements with log(h) bits (we just enumerate them from 1 to h and write that number down in binary.)
Let's assume that the rod is 1 m in length, and say I wrote a book that is perfectly represented by the number 0.abcdefg...yz. If that number is perfectly represented by one of the ratios in set 'h', then have I not stored more information than 15 bytes?
As for x/(1-x), why not? And why limit ourselves to a 1 m rod? Why not a 22 m rod with a 7 m notch? I could then define the method of decoding the information via (rod length)/(notch length). Then I'd have 'infinite' information in the form of the expansion of pi.
My main issue with the parent comment is that they imply only 15 bytes of data could be stored via this method. I think that's preposterous: the number of ratios may fit in only 15 bytes, but the ratios themselves can have any possible size.
It becomes more a game of probability than of exact numbers. Will you find the right number, from set 'h', that matches exactly what you wanted to say?
A 22m rod with a 7m notch is perfectly ok. You'd then be encoding 7/22. As others have pointed out, the rod could span the known universe and it would still not matter.
Say you found a great message in the representation of 1/7. Weird, since it is rational, so if its representation is infinite, it's periodic (you can't write down 1/π or 1/e for example, as these are irrational).
Excited you found that message, you want to put your notch exactly at 1/7 on the rod to celebrate it.
But you can't. Your desired notch position will fall between two possible notches, spaced one planck distance apart, and you'll have to pick one of the two.
And when you do, you truncate your message to 15 bytes worth of information.
Not really. You have stored something which has a representation of more than 15 bytes, but you will struggle to find a way to store most things longer than 15 bytes in this way. (Everything can be transformed into a representation which is infinite in size anyway, for example by using an irrational base.) Information really comes down to the number of possible values you can distinguish.
If you do the ratio thing you have described you will find that alpha and beta (the two lengths in your ratio) are many, many times the size of the observable universe for something like a book. If you allow the length of the rod itself to also contribute to the information, you have added another symbol so you can store more than 15 bytes, but this doesn't even double the amount of bytes you can store.
Assuming all of it can be described as ASCII characters (a ratio of .12152205 could be read as 12-15-22-05, or “l-o-v-e”), let's assume 4 decimal digits will be required to encode one letter. That gives us, in our limited system, 4 decimal digits per byte (we really can't limit ourselves to just ratios that are sufficiently long and made only of 1s and 0s).
So, all human knowledge is, in our system,
250 * (1024^6) = 288230376151711744000 bytes.
As we assumed that all of it can be described as ASCII characters, and in the standard system 1 byte holds one character, there are now 288230376151711744000 characters. Expressing that many characters in our 4-digit decimal numbers will require a ratio with
288230376151711744000 * 4 = 1152921504606846976000
digits in it. All the ratios with 1.15 * 10^21 digits will be the candidates which can be used to store all of humanity's contemporary knowledge.
Now, as I said earlier, the ratios may have an impressive number of digits in them when expressed in decimal, and there are infinitely many of them on the number line itself, but that does not mean all of those are available for use. We are limited to ratios derived from lengths which are multiples of the Planck length. Assume, for a particular rod and its notch, there is a set 'h' that contains all the possible ratios. We are limited to those ratios when looking for our matching ratio, which we may find through sheer cosmic chance, or not. If not, then we have to increase/decrease the design length of the rod and the notch to change the set 'h', and hope it contains the number we are looking for.
How many such ratios of the required length of 1.15 * 10^21 digits would exist, derived solely from the geometry, is unknown and wholly dependent on the information that has to be encoded. The longer the data, the higher the probability of not finding the right number. As you put it, there will be 15 bytes of choice; in a one-meter rod there will be about 10^35 ratios to play with. If you doubled the length of the rod, you would have twice the amount of choice, and so on.
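For what it's worth, a quick check of the arithmetic above (the 250 exabyte figure and the 4-digits-per-character encoding are the assumptions from the comment, not established facts):

    KNOWLEDGE_BYTES = 250 * 1024 ** 6      # assumed size of "all human knowledge"
    print(KNOWLEDGE_BYTES)                 # 288230376151711744000

    DIGITS_PER_CHAR = 4                    # assumed decimal digits per character
    print(f"{KNOWLEDGE_BYTES * DIGITS_PER_CHAR:e}")   # ~1.152922e+21 digits in the ratio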
Are you just saying the length of the rod can be variable along with the position of the notch? If so, it seems like that means you approximately square the number of possibilities, so if there was a maximum of 25 bytes, then there is a maximum of 50 bytes available in something the width of the universe. Still not large.
Your comment is a non sequitur; reread what he said.
He's saying that there exists a certain ratio in the set of ratios whose decimal representation represents a corpus of knowledge, in this case the entirety of human knowledge.
How did you not get that the term '25 bytes' is merely the size necessary to store the number of possible ratios instead of any actual information?
Did no one read the discussion below about encoding information into pi? This is the same concept. Yes, there are only 25 bytes worth of ratios to choose from, but those ratios themselves can, possibly, POSSIBLY, store all the information.
The same has been said about Pi (3.14). If you can compute, store, and search enough digits of Pi, you can reference anything by just providing the 'start' and 'finish' locations. Unfortunately, with enough digits of Pi, the 'start' and 'finish' numbers can get quite long themselves.
Years ago I tried this and basically ended up proving that if Pi is random and you are "compressing" random data, on average the start and finish numbers together are at least as long as the numbers you are trying to "compress."
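If anyone wants to reproduce that little experiment, here's roughly how I'd sketch it (using mpmath to generate the digits is my choice; the poster didn't say what they used). The problem shows up quickly: a k-digit string typically first appears somewhere around position 10^k, so the 'pointer' is about as long as the data it points to.

    from mpmath import mp

    mp.dps = 100_000                           # how many digits of pi to generate
    PI_DIGITS = str(mp.pi).replace(".", "")

    def pi_pointer(data_digits):
        """'Compress' a digit string into a (start, length) pointer into pi."""
        start = PI_DIGITS.find(data_digits)
        return None if start < 0 else (start, len(data_digits))

    for target in ["26", "1415", "999999"]:    # the six 9s are the Feynman point
        print(target, "->", pi_pointer(target))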
In fact, any lossless compression algorithm has the property that the output is (on average) at least as long as the input. The best you can hope for is an algorithm that compresses the kind of data that humans want to store, at the expense of making other data a bit longer. If you're trying to compress random data then you just can't do it.
Here's a proof: consider the strings of length n or less, suppose there are M of them in total. Their average length is just the sum of all their lengths divided by M, and the average length of their compressed versions is just the total length of the compressed versions divided by M. Since the compression is lossless the compressed strings must all be different.
Since there are M strings, if any of them mapped to a string of length more than n then there must be some string of length at most n not being mapped to, so the average length can be improved by instead mapping that string to the shorter string. So any optimal compression method must map only to the strings of length at most n.
So the M outputs are just the M inputs, possibly permuted. So their total length is the same, and hence their average length is the same.
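You can brute-force that argument for tiny n to see it concretely; a sketch of the counting step (my own, not from the linked proof):

    from itertools import product

    def all_strings(n):
        """All non-empty binary strings of length <= n."""
        return ["".join(b) for k in range(1, n + 1) for b in product("01", repeat=k)]

    n = 3
    inputs = all_strings(n)
    M = len(inputs)                                 # 2 + 4 + 8 = 14 strings

    # The best any injective (lossless) code can do is map the M inputs onto the
    # M shortest binary strings -- which turn out to be exactly the same M strings.
    best_outputs = sorted(all_strings(n + 1), key=len)[:M]

    avg_in = sum(map(len, inputs)) / M
    avg_out = sum(map(len, best_outputs)) / M
    print(round(avg_in, 2), round(avg_out, 2))      # 2.43 2.43 -- no average saving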
> any lossless compression algorithm has the property that the output is (on average) at least as long as the input.
The article you’ve linked says nothing about average. It says that for every algorithm there’s at least some input files that increase the size. It even explains more about that:
> Any lossless compression algorithm that makes some files shorter must necessarily make some files longer, but it is not necessary that those files become very much longer. Most practical compression algorithms provide an "escape" facility that can turn off the normal coding for files that would become longer by being encoded. In theory, only a single additional bit is required to tell the decoder that the normal coding has been turned off for the entire input
>In fact, any lossless compression algorithm has the property that the output is (on average) at least as long as the input
I don't think this is true. If it was, lossless compression would be useless in a lot of applications. It's pretty easy to come up with a counter example.
E.g.
(a simple Huffman code off the top of my head, not optimal)
symbol -> code
"00" -> "0"
"01" -> "10"
"10" -> "110"
"11" -> "111"
If "00" will appear 99.999% of the time, and the other 3 symbols only appear 0.001% of the time, the output will "on average" be slightly more than half the length of the input.
Sure, I'm assuming that (a) you are trying to encode all strings of length at most n and (b) you have the uniform distribution over those strings. This makes sense in the original context of encoding random data.
>you have the uniform distribution over those strings. This makes sense in the original context of encoding random data.
Lossless compression is nothing more than taking advantage of prior knowledge of the distribution of the data you are compressing.
Random data isn't always (or even often) uniformly distributed. Everything we compress is "random" (in the context of information theory), so I disagree that it makes sense to assume uniformly distributed data.
Then the original statement about not being able to use pi as a data compression method is false. It could be the case that 99% of the time you want to encode the string "141592653".
I did somewhat the same thing; it introduced me to programming... I thought: how about multiple start indices and a fixed width? You can then compress the list of start indices in the same manner until you reach sufficient compression :D
Wow, isn't there a really low probability of finding your phone number in the first 200m digits of pi? (0.09995% in the first 100m.) I'm tempted to start throwing a dictionary of phone numbers through this pi lookup to find your number, call you, and verify; I think you could quickly narrow in on your phone number given the information above.
I think you'll have trouble. Assuming the first digit of the phone number is somewhere between the 114.5 millionth and 115.5 millionth digit of pi, you have 1 million potential 11-digit numbers to check.
There are 10^11 sequences with 11 digits. The number of people in the USA is 3×10^8, and we can assume there is roughly 1 number per person (some people don't have a phone number and some people have more than one, but it turns out that the exact approximation won't matter unless we're a few orders of magnitude off). So about 0.3% of 11-digit sequences are valid phone numbers.
So there are approximately 0.3% × 1000000 = 3000 people with phone numbers around the 115 millionth digit of pi. You have no way of knowing which one of those people is sjcsjc.
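The arithmetic, spelled out (all figures are the same ballpark numbers as above):

    eleven_digit_sequences = 10 ** 11
    us_phone_numbers = 3 * 10 ** 8                  # roughly one number per person

    print(f"{us_phone_numbers / eleven_digit_sequences:.1%}")   # 0.3% of sequences are valid
    window = 1_000_000                              # candidate start positions to check
    print(us_phone_numbers * window // eleven_digit_sequences)  # ~3000 plausible matches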
Derren Brown did a great series of tricks based on probability. Your comment reminded me of the one where he showed a video of himself flipping 10 heads in a row with an unbiased coin. He revealed that he had done it by flipping a coin for hours until he got that sequence.
World's most painful compression algorithm: Finds a mathematical series of infinite digits, and the offset into it, that most efficiently compresses data passed in. Probably have to chunk the data up to make this efficient.
Of course one can argue that all current real life compression algorithms are aiming to simulate this, and that a brute force algorithm is one of those "after turning the sun into a CPU, still won't have enough compute power to finish the problem" types of solutions.
The late mathematics popularizer Martin Gardner wrote about this concept in the 1970s (I think his example involved reducing the Encyclopedia Britannica to a notch), although I don't know if the idea was unique to him or if he was popularizing an earlier idea.
You could split the package of data into chunks and place multiple notches on the bar. You'd need to include enough information to allow the chunks to be sorted into their original order for that to work.
As if this were a practical means of storing data.
I was confused as to how you could represent a bootable CD image in printable characters. It turns out that you can't. This is a tweet of a Perl script which creates a cd.iso file that you can then boot from. The Perl script expands the compressed data in the tweet considerably.
That said, this is a playable game in around 60 bytes of actual data which is impressive.
It sounds like you've been trained to hate something that you actually love. That's kind of sad. Couldn't you just wear the awesome shirt while avoiding the negative aspects of neckbeards?
Not CD images, but there is a DOS C compiler that generates executables containing only printable characters (and no self-modifying code that could introduce non-printable characters)
The compression is basic run-length encoding, leveraging the Perl repetition operator (x), and the property of the Perl print/say functions that they concat items passed in a list before writing to STDOUT.
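For anyone who hasn't seen the trick, here's the gist of that kind of run-length scheme in Python (only a sketch of the idea; the actual tweet does it far more tersely in Perl):

    def rle_encode(data: bytes):
        """Collapse byte runs into (byte, run_length) pairs."""
        runs, i = [], 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1
            runs.append((data[i], j - i))
            i = j
        return runs

    def rle_decode(runs):
        # The repetition operator does the heavy lifting, like Perl's `x`.
        return b"".join(bytes([b]) * n for b, n in runs)

    sample = b"\x00" * 2048 + b"\xeb\x3c" + b"\x00" * 30   # ISO images are mostly zeros
    runs = rle_encode(sample)
    assert rle_decode(runs) == sample
    print(len(sample), "bytes ->", len(runs), "runs")       # 2080 bytes -> 4 runs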
I tried xz -9 on it and found, to my surprise, that it was actually longer than RLE!
Then I tried gzip -9, because perhaps that has a smaller header? Yup, saved a few bytes, now it's about the same size. Finally, I remembered that bzip2 does a lot better on text than gzip, and who knows, it might also have a shorter header than xz. Again, a few more bytes saved! Down to 223, where the original is 249 bytes (including the 'say' part but excluding the unnecessary delimiting apostrophes or the rest of the command).
Most of the "compression" is zero bytes due to fixed offsets of various things (e.g. the first 16 sectors of an ISO 9660 image are a "system area" not used by the actual file system).
As someone that develops almost exclusively in high-level languages on top of many levels of abstraction, it's nice to see what can be accomplished close to the metal.
This reminds me of Steve Gibson's SpinRite, which (from what I recall) is a fully functional disk recovery utility written entirely in assembly. https://www.grc.com/spinrite.htm. Say what you want about the man, but this is something that's saved me on at least one occasion, and is smaller than things I produce that do a lot lot less.
What I've read is that SpinRite simply reads and writes to the disk, triggering the disk firmware to reallocate bad sectors. I think it just tries to read many times, which can sometimes help.
The other argument is that the various things SR tries to "manage" (sector interleave, getting various timing parameters "perfect", etc) were only relevant with ST-506 (!!) and similar disks from the 80s/very early 90s, and that anything remotely modern (even IDE, virtually 100% of SATA) generally doesn't provide enough low-level control surface that trying to micro-manage the disk's behavior will do anything particularly special.
Of course, I'm sure each manufacturer has their own tools and widgets that can use undocumented proprietary SATA/SCSI commands to control the drive's behavior at a very low level, but those kinds of tools are a) rare as hens' teeth and b) probably very easy to break disks with due to poor UI design and lack of documentation. Chances are the most expensive data recovery centers probably have some of these tools, and more importantly the training to know how not to kill HDDs with them :P
TL;DR: Yes, SR works, but probably just as well as ddrescue; as always, if you think a disk is this side of dead and you think you have a chance without specialized tools, just imaging it is probably the best first step, because SR and all other tools will of course stress it.
With all of this said, I really, really like SR's startup animation :3 and I agree that it's refreshingly small.
Your snake game doesn't even have food for the snake to eat? The snake just grows without eating, until the head collides with the tail somewhere? Pssh, amateur hour...
My first thought was to make random sectors of the first hard drive the "food"... For better or for worse, this stuff really brings me back to my teens. Nice job!
Tron is a great game for fitting into a very small space. Back in high school I spent a lot of time optimizing Tron on the TI-82 to fit in as few bytes as possible. It looks like I got it down to 152 bytes: (only about 80 bytes of actual Z80 assembly code)
No it isn't.
In TRON you leave a trail behind. In Snake, you grow as you eat and drag your body along the path you crawled.
TRON's trail is static, Snakes body is dynamic.
And that difference only exists to support the single player vs multiplayer dynamic. If Snake left a Tron tail then it would be a very short, unfun experience. If Tron didn't leave a permanent trail then matches would last too long.
This is great, and ironically it’s sitting next to another HN article where, in the comments, someone is actually defending a NYT news article that weighs in at 6MB.
In a time when most software is filled with superfluous waste and endless layers of abstraction and libraries, it’s nice to see that the art of writing minimal software is not completely lost.
So page bloat is a real issue, not just "old man rants about good old days". A shame really, because the rest of the page is pretty lightweight: 5.2 KB for the content and 2.2 KB for the CSS.
Even worse is that the static preview of the monster gif is the second largest element.
I used to have 250MB free with my internet contract until the ISP silently upgraded it to 500MB recently (must have been the past year or so, not sure when).
If you use it only occasionally to read a few articles, you can do fine with only a few megabytes. Heck, I'd almost say kilobytes if bloat wasn't so common. Anyway, that's until shit like this comes along. If you were truly trying to watch a video, sure, that uses a lot of that tiny data bundle in one go, but a gif that should have been a video truly leaves you wondering why it was necessary.
You just made me think of a GIF-to-mp4 conversion service that runs as a proxy.
Sadly, because of the "HTTPS everywhere!!!11" thing, such a service would not be viable (it would need to rewrite the <img> to a <video> in order to work, of course).
Opera were offering something like this for a while. With the support of a browser and HTTP proxying it's not a problem: the SSL terminates at the proxy and is re-encrypted under the proxy's SSL.
Many web services will take an uploaded gif and turn it to webm before showing it, e.g. Twitter.
For sure, although as a counterpoint I've also seen programmers do the reverse: spending a lot of effort building bad abstractions to minimize code duplication and size, when really we would have been better off with slightly longer code that was clearer and that meshed better with our problem domain.
You can do “magic” with relatively easy things. For example, spend a weekend with https://forthsalon.appspot.com or processing.org and see where you can get to...
This is the second time this week I've seen someone note that ISO standards are expensive - are they copyrighted or something? Why doesn't someone just publish them online for free?
I bought the ISO 14000 Standard Document (Environmental Management) for $170 earlier this year, so it is not expensive for a company. Not sure how the "technical documents" ala ISO 9660 differs in price.
The certification on the other hand is a bit more costly though. My estimation of the certification cost for a ~20 people company if you do it as frugally as possible ended up at $18000 for the first year investment, and $6200 recurring the following years.
Man I love these things. Back before Twitter upped their character limits, I remember a trick to cram more data into a tweet was to abuse how Twitter counts characters (it attempts to count visually rather than by byte), so by using a ton of multipart emojis or larger Unicode characters you could more than double the information that would fit in a tweet.
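You can see the character-versus-byte gap easily (what Twitter actually counted changed over the years, so this only shows the Unicode/UTF-8 side of it):

    for s in ["hello", "👨‍👩‍👧‍👦", "🎮🎮🎮"]:
        print(repr(s), len(s), "code points,", len(s.encode("utf-8")), "UTF-8 bytes")
    # 'hello' is 5 and 5; the family emoji is 7 code points but 25 UTF-8 bytes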
Using multibyte chars is optimizing for Twitter's limit, not for a fixed number of bytes. If you're trying to fit in 140 bytes and you end up using more than 140 bytes because of multibyte chars, then yeah, you're cheating. But if you're just trying to "fit in a tweet", I'd say that's perfectly fair game.
Anyhow, two games... cool, but to be perfectly honest, I thought the game was going to be more like the real Snake than just drawing a non-overlapping line on screen. One more impressive game might have been better than two simple ones.
The explanation is quite good but is there even more detail on what the assembly instructions are doing on a line by line basis? I’ve only written MIPS assembly and that was a long time ago.
yeah paste it into your plain text editor first and then do a code review and then save it to a jump drive and spin up the old backup computer that’s been wiped clean and then open up a sandboxed vm and then remove the wifi chip and then run it
I grew up (career-wise) on CTOS systems with 286/386 processors that could address the full 16MB in protected mode without the memory extenders or expanders that were available for DOS back in the day. Also preemptive multitasking. It was a great OS to learn on. More info - https://web.archive.org/web/20080828190425/http://www.byte.c...
It's my understanding that UEFI actually comes up directly into long mode?
"UEFI firmware performs those same steps, but also prepares a protected mode environment with flat segmentation and for x86-64 CPUs, a long mode environment with identity-mapped paging. The A20 gate is enabled as well."
Looks like the turbo button needs to be in the on position for this one: 0.05 FPS in a qemu KVM on my old P8600... I presume the game loop uses the hardware clock :P
For some reason, emulators (at least the ones I tried) wait 4x what real machines wait when you use BIOS int 15h, AH=86h. You can tweak the code if you want to play at a faster speed.
There’s probably a sound explanation for this discrepancy...
I couldn't find "bf86 fec1". I did however find "b486 fec1".
"bf86 9042" made it literally so fast I physically couldn't keep up. The following worked for me (w/ QEMU on old (no KVM) Pentium M), this may be too fast on newer machines:
"A thought experiment from my first course in the program: Take a bar of metal and put a single notch in it. The two lengths thus defined have a relationship that can be expressed as the ratio between them. In theory, therefore, any rational number can be expressed with a single mark on a bar of metal. Using a simple alphabetic code, a mark that calculated to a ratio of.12152205 could be read as 12-15-22-05, or “l-o-v-e.” The complete plays of Shakespeare could be written in a single mark, if it were possible to measure accurately enough. Or the machine language expression of the most advanced expert systems, though by then the notch might be small enough that Planck’s constant got in the way. How massive amounts of information could be expressed in and retrieved from infinitesimal objects was the driving concern of my college years."
Pure fiction at this point, but it would be an interesting experiment to encode data into objects that could be expressed using the mathematical ratio of their shapes or sizes.