Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.
I'm really happy that the trick was magical to you - I was so surprised and delighted when I realized that this was possible, and I wasn't really sure if anyone else would feel the same way!
And of course, I'm proud to be providing so much utility here - finally we can find and use UUIDs tailor-fit to our needs
> If we didn’t care about generating valid UUID v4s, we could just generate 2^128 numbers, scramble them, convert the resulting bits into a hex string, and intersperse some dashes.
You can do that anyway. You'd only need the twiddling if you wanted to limit the amount of numbers you generate to 2^122. Since you're willing to generate 128 bits:
// get encrypted value
uint128 bits = encipher(seed);
// clear zero bits
bits &= 0xFFFFFFFF FFFF 4FFF BFFFFFFFFFFF; // instead of 4 and B, you can use 0 and 3
// set one bits
bits |= 0x00000000 0000 4000 800000000000; // these have to be 4 and 8
But since you're generating more numbers than you need, you're going to end up with 64 copies of each UUID in your final stream. This won't matter because no one will ever be able to notice, as long as your faux search functions avoid pointing it out.
Exercise for further development: modify substring search so that it follows the expected behavior of finding matches in order within the page. [I don't recommend attempting this.]
With a linear algebra library, you can guarantee that you've found the next, or the previous, match in sequence. I don't know what the state of the art is for fast linear algebra in javascript, though.
(The matrix approach also has the advantage that, when your full-text search problem has 2^115 solutions, you can compute the one you want, the next one after some index, without having to compute them all.)
FWIW, "search" doesn't work on mobile (Chrome on Android): I go to "Find in Page" and none of the magic happens. It's also bypassed on desktop when I manually open the search box via Edit->Find->Find... instead of using Ctrl+F.
I wonder if there's (yet) a browser API you could hook into: the same way browsers allow JavaScript to manipulate the history [1], maybe there's a way to manipulate the Ctrl+F/find-in-page search results.
That is, right now you're capturing the Ctrl+F keypress and opening your own custom thing to read the user's search string and act on it. But what we'd really like is a way to be notified "The user just asked to search for 'xyz'. Would you like to capture that event, or let it go through to the browser's default behavior?"
A quick Google search found nothing like that exists yet. I then asked ChatGPT about it, hoping that ChatGPT would at least hallucinate a plausible design for the API — and had mixed feelings when it didn't. It just printed that 'Browsers do not provide a way to listen for the "Find in Page" search event due to privacy and security concerns' and suggested capturing the Ctrl+F keypresses exactly as you have done.
As someone else said, it would also be more like full-text search if you also considered the primary-key column, e.g. searching for "0390814603917539994005679487460590835" should jump to the 390814603917539994005679487460590835th row. (Highlighting-to-select pieces of the text also doesn't work: I'm not sure why not, since I would have thought the browser gives you at least that part for free.)
Besides "search" not working on mobile, the styling on mobile is such that the "scrolling" does not convince: to my eyes it looks too obviously like "changing the values in the cells of a fixed table" as opposed to "scrolling through the table itself." You could maybe mitigate that by animating quickly among three different page layouts with the table vertically offset by different amounts.
It occurs to me that if JavaScript has something like Python's `random.sample`-without-replacement, then you could set your `RANDOM_SEARCH_ITERATIONS` to 256 and achieve perfectly consistent (and exhaustive) "search" when the user has entered all but 1 or 2 hex digits of their desired result. And/or, you could just have the page secretly keep a history of the search results the user has already seen: this would prevent the user from finding out so quickly that "search + next + next + prev + prev" doesn't always get them back to where they started.
Speaking of exhaustive search results: With a bit more (probably equally algorithmically interesting) work, you could emulate the browser search's "7/256" by tallying up the number of UUIDs satisfying the constraint, e.g. if the user has typed "1234567" then you could display "1234567 (158456324585806817418058661888 results)" and maybe even fake up a convincing position indicator like "1234567 (17415833585801881134805987465/158456324585806817418058661888)". I guess if you display it as "1234567 (1.742e28/1.585e29)" then you don't even have to cheat that much. :)
A great example of Teller's observation that "sometimes magic is just someone spending more time on something than anyone else might reasonably expect."
My favorite of these was a trick where someone picked a card out of a deck and then Teller revealed a large version of that same card in an unexpected area in the vicinity, It turns out that what he had done was hide a complete set of large cards in the area before the trick and memorized the location of every one of them so, e.g., the king of hearts would be at the top of a palm tree, the three of spades under a drink tray, etc.
The best part is that that kind of trick usually becomes more, rather than less, impressive when its inner working is revealed. I recently got to see them perform live, and my favorite trick by far was one of that kind.
I love these and my kids who are a little obsessed with stage magic right now are at the perfect age for Penn & Teller videos so I need to stash these for the next time I have them. There’s definitely skill levels involved, even at the professional level (and Penn & Teller are definitely among the best out there). We’ve been to two professional magic shows in the last year, one in Wisconsin Dells, the other in Lake Geneva (both venues starring their owners) and it was apparent that the Dells guy was much better than the Lake Geneva guy (although the second magician the Lake Geneva guy had some impressive work to show).
Ricky Jay (RIP) is also extraordinary and has a handful of videos on YouTube. His stuff is mostly with cards and his humor is more aimed at adults. One of my favorites is his appearance on Arsenio Hall (I wish this was better quality):
Yeah, Ricky Jay was the last of a breed (he was also a friend of a friend, something I only learned after both he and the friend had died). He was very schooled in a lot of the street con games and was also a scholar of historical frauds and cams. He published a small-circulation magazine with his researches in the latter.
And of course his appearances in various David Mamet–related properties were not to be missed.
Watching Teller do the cups and balls trick with transparent cups is mesmerizing.
Yes, you can see everything. No, you still can't follow it.
Sure, you can see that "something changed" after the fact when it is stable. However, Teller is so damn smooth and fast that any active change looks like teleportation.
There was a video of a magic trick I saw where the trick was almost certainly accomplished by false shuffles. I know how to do a few false shuffles, in theory, but, the magician (Ricky Jay, I think?) was just so good at the false shuffles that I was more impressed than if I didn't know how it was done.
The explanation for full-text search was actually slightly more intelligent than what I initially assumed. I figured it just generated UUIDs until it found one that was in the correct direction of search (for the next/previous button), since I had observed that walking forwards and backwards in search results was giving different results each time, but in fact the author did the slightly better thing which is to just generate a bunch of possible results and then pick the best (I wonder how many results it generates for this?).
The full text search is a little confusing because it doesn't actually search them in order, though it appears to at first. And if you click "next" a few times and then "prev" the same number of times, you don't necessarily end up back at the same UUID you were at before. It's a neat-seeming trick though.
It’s an interesting question whether that could be fixed. I think the answer is Yes. If the author didn’t do any scrambling, and just displayed UUIDs in numeric order, then it’s trivial to enumerate search results in order. Likewise, if you do something like adding a constant mod 16 to each hex digit, you could do the same thing when you generate UUIDs matching a substring. So the question becomes whether you could find something like that that gives a sufficiently convincing illusion of entropy but is still reversibile when you hold a subset of the digits constant. And it seems like it should be.
FWIW I am super interested in this question but feel like I don't know how to derive a satisfying answer, maybe because the one of my goals here (add "enough" entropy) is a real fuzzy "I know it when I see it" sort of thing.
But I'm gonna try to get a few more crypto-knowledgeable friends to chat with me about this and write up what I learn!
You've sniped me and I'm going to try and tackle this over the weekend. What's the best way to exchange things for you? Also, have you explored modular multiplicative inverses at all?
I think the biggest rabbit hole I went down was trying to use FEAL (which is very breakable) and exploiting the things that make it breakable. But this is very much not my area of expertise - I was learning as I was working - so it's very possible either that that was a dumb idea or that it was a great idea that I didn't figure out.
I also considered things like "focus on add lots of entropy to groups of 200 or so UUIDs, but have obvious patterns beyond that" which I think would be a reasonable strategy; here I just kind of ran out of time (I told myself I'd make this in a week)
Did some light reading about a few other things but nothing substantial
My first thought was to use linear transformations over Z_2 as a field, as that would create a natural interpretation of fixing certain bits as taking a linear subspace. Interestingly this leads to the property that XOR is preserved.
I implemented this in a very quick and hacky way for 32 bits. I generated a random boolean matrix M invertible in Z_2. To turn an input number x into a corresponding number y in an N-bit space, I convert x to binary and turn that into a vector of 1s and 0s, then multiply it by that randomized matrix to get y. Here are the first few y's corresponding to x=0, 1, ...:
00000000000000000000000000000000
0xx0x000xx00x0xx000xxx00xxx0x000
x0xx000xxxx0xxx00x000xxx0xx0xx0x
xx0xx00x00x00x0x0x0xx0xxx0000x0x
x0xx0x00xxx0xx0x00xx0x0xx0xx00x0
xx0xxx0000x00xx000x0x00x0x0xx0x0
00000x0x000000xx0xxx00x0xx0xxxxx
0xx0xx0xxx00x0000xx0xxx000xx0xxx
...
(Hoping the non-monospace font doesn't ruin the alignment too much.)
Which looks... random-ish? I expect that turning these into UUIDs may result in more random-looking sequences.
Not sure how to solve the problem of search with this, but the hope would be that the linear structure gives you what you need, since fixing bytes on UUIDs should correspond to considering linear subspaces of the y vectors. Perhaps this can also be used to apply lexicographic order on the corresponding x vectors (i.e.: ordering the indexes), so that you could jump to each UUID matching the search in order.
I did some fooling around with this and it seems promising.
The first problem is that, when your enciphering function is a matrix, f(0) = 0. This looks bad, but we can easily solve that problem by starting the webpage sequence at an index higher than 0.
I tried to work through a much smaller version of the problem* by hand, and it looks like this:
We want to find the next index whose encipherment ends in -110. This sets up a system of equations Dx = y, where x_3 = 1, x_4 = 1, x_5 = 0, and y tells us the index of x. By multiplying that out, we get:
So we can freely choose any values for y_1 and y_5, and the rest will be filled in by constraints.
Assuming we want the least possible value for y, this means we will pick y_1 = 0 and y_5 = 0, which then tells us that we want index [0 1 0 0 0], and we can jump to there. If we wanted the least possible value for y above a threshold (such as the current viewport), we'd pick y_n values accordingly.
Instinct tells me that libraries should exist for quickly solving systems of linear equations like this.
(For full full-text search, we'd need to do this several times, also finding solutions for the enciphered values 110xx, and x110x. This multiplies the work we need to do and the storage we need to use by an amount that is linear in the difference in length between the search string and the full UUID, which is still a lot better than trial-and-error.)
* I ended up doing it in 5 bits because every random 4x4 matrix I generated was noninvertible.
This might pass an eyeball test for random when viewed as decimal numbers. (Although there sure are a lot of cases where adjacent values are 2 apart!) It looks much worse as binary. Here's every ones digit, all concatenated into a hex string, starting from 17 again:
e1e01e1f
Let's note that ~(e1e0) = 1e1f. Start from 16 instead and you'd see f0f00f0f. Here are the eights:
3332bbbc
Individual bits show pretty striking patterns. Since the UUID is reported in hex, that can be mitigated a little by the fact that each hex digit combines four binary columns. But so far it seems pretty likely that this would result in the list of UUIDs looking decidedly nonrandom. There might be quite a bit of shared material between adjacent UUIDs.
Can you solve this in general without doing integer linear programming? How else would you know how to find the lowest index greater than the current? In the field GF(2), using 0 might not minimize.
Hi! I do not know enough of the relevant math to appreciate what you're doing here (yet); I am going to try to do some reading to understand this better, but if you have any advice on where I should start I will gladly take it.
The important thing about matrices is their linear properties. Most proofs in linear algebra over real vector spaces don't rely on the particularities of the real number system, just the fact that it is a field [0].
To make an analogy with programming, you can think of a matrix as a generic container type. Normally, we think of the contained type as being the real numbers, but you could replace it with anything that supports the multiplication and addition operations -- that implements the "field interface/trait/typeclass". [1]
So you can define linear algebra over any field: the real numbers of course, but also the rational numbers, or the complex numbers. Fields can also be finite. In particular, the integers modulo n is a field if n is prime. My suggestion is to consider the case n=2. [2]
It's a bit hard to get geometric intuition about what matrices on finite fields even mean, but a lot of theorems we can prove in the real case generalize, because they depend only on the field properties of the reals. That's why I suggested it -- it would make things like inverting straightforward. And that natural relationship between linear subspaces and holding certain bits constant might make search possible.
The issue (which others pointed out) is that linearity imposes so much structure that it might not seem pseudo-random anymore. But you could consider something like XORing all numbers with a randomly generated bit string -- it won't hold up if someone looks closely but might fool the cursory human eye.
[1] The analogy breaks down slightly because not only must the operations be defined, but they must also obey laws, which can't be expressed in (most) programming language type systems. But if you've used Haskell before, you'll have seen this before in the fact that a Monad has to obey the Monad laws.
Would it be easier if only the lowest bits of the index contributed to the "entropy"? Like if the 2^16th UUID was the one right after the first and n + 2^16 was the one after the nth. You wouldn't notice the pattern but it'd be easier for the computer to handle. I guess if you were searching for a substring you might notice the ones surrounding the matches look almost the same from match to match...
Let me cook something up for you. It's an interesting puzzle.
I think you can get pretty far, if you compromise on your entropy: it only has to look random, not actually be random. (I mean it doesn't have to be cryptographically secure randomness.)
Yes, it's possible with certain restrictions on the function. Here's an example:
u128 uuid_iter(u128 x) {
b = (x * x | 0x5) + x;
c = prf(x);
return (b ^ (c << 2)) & u122_mask;
}
This is a T-function, a function where the Kth bit only depends on the K-1 lower bits. prf() is a pseudo-random function that can be anything you like as long as it's another T-function. A standard LCRNG like (48271 * x) % 2147483647 works just fine for instance.
You can invert this by just running the function N times to test each bit from LSB to MSB against the result. If you know certain bits, you don't have to run those tests and you can order the matching UUIDs by the values of the K unknown bits from 0 to 2^(N - K) - 1.
You also don't need to keep track of the position with this method either, only a stopping point. Unlike the feistel network, this function also produces a full cycle when iterated as x_i+1 = f(x_i), only repeating after all numbers are produced. That means you could run this without a counter at all, just generating until the first value is produced again, for any bit length.
Searching is very similar to a common approach for building a naïve spellchecker: given an input, generate all the possible matches it can be part of. You're not searching in a corpus, you're using the input to generate indices into the corpus (list of UUIDs here, list of words in the dictionary in a spellchecker).
For the even more curious, UUID has 5316 decillion 911 nonillion 983 octillion 139 septillion 663 sextillion 491 quintillion 615 quadrillion 228 trillion 214 billion 121 million 397 thousand 304 values. Imagine the fact that there aren't as many kms to reach GN-Z11 (farthest known galaxy in the Universe I think) as there are digits above
The UIA (Universal Internet Authority) is worried that by using UUIDs we are left with only around 34 trillion UUIDs per star and planet in the observable Universe. So the cosmic router might become DHCP-leasing dark matter.
I searched for 1337, and then 13371337, and then 133713371337, and I was flabbergasted they've got a search setup for this (which ctrl-f opens up). Thanks for posting the blog post!
> The fact that the search works impressed me more than anything. Of course, like every great magic trick, it seems so simple once it is explained.
> Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.
But the trick to the full-text search is that it doesn't work.
This is neat, although I pushed it later with incremental search and it seems to be skipping results as it only found ~50 when searching for `-000000000000`.
Yeah, I at first, I though I knew exactly how it worked. Then I saw the search field, and I suddenly had no idea what the hell was going on. Now, the big question, do I want to spoil the magic trick and read how this was done, or should I keep being astonished and flabbergasted?
I disagree with my sibling comment. The trick is beautiful. If you generate UUIDs such that each bit in the result can be reliably traced back to a single bit in the input, then you can take a substring of the UUID and use that to infer which bits of the input integer must be set to produce that substring. So you can produce a whole list of input bytes that meet the criteria and those become your search results.
The real magic trick here is that the uuids on the page only look random to us because of some bit twiddling and XOR trickery. If we had a better intuition for such things we would notice that successive UUIDs are just as correlated as successive integers.
> I disagree with my sibling comment. The trick is beautiful. If you generate UUIDs such that each bit in the result can be reliably traced back to a single bit in the input, then you can take a substring of the UUID and use that to infer which bits of the input integer must be set to produce that substring.
...but that has nothing to do with what the website is doing. The accompanying article specifically calls out the fact that it can't be done while maintaining the appearance of an unordered list, and therefore it isn't attempted.
Agreed. Everyone love puzzles that are worthy of an entire section in a blogpost for their interview questions, rather than stuff actually relevant to the job.
So I just brute forced every UUID in existence on my RTX GPU and loaded the dataset into a HA opensearch cluster on AWS. It took about 5 years of calling ‘uuid.Random()’ to effectively cover about 64% of the keyspace which is good enough.
To facilitate full-text search I created a langchain application in python, hosted on kubernetes, that takes your search query and generates synonymous UUIDs via GPT o1-preview before handing over to opensearch.
Opensearch returns another set of UUIDs, which I look up in my postgres database: “SELECT uuid FROM uuids WHERE id IN (…uuid_ids)”
I could imagine some candidates starting with their default tools like this, and start complaining about the cluster performance after a few weeks.
You need a certain way of thinking to have a gut feeling "this could be expensive" and then go back, question your assumptions and confirm your requirements. Not everyone does that - better to rule them out.
For the curious, here's the linked blog post describing how the project works: https://eieio.games/blog/writing-down-every-uuid/
Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.