Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.
I'm really happy that the trick was magical to you - I was so surprised and delighted when I realized that this was possible, and I wasn't really sure if anyone else would feel the same way!
And of course, I'm proud to be providing so much utility here - finally we can find and use UUIDs tailor-fit to our needs
> If we didn’t care about generating valid UUID v4s, we could just generate 2^128 numbers, scramble them, convert the resulting bits into a hex string, and intersperse some dashes.
You can do that anyway. You'd only need the twiddling if you wanted to limit the amount of numbers you generate to 2^122. Since you're willing to generate 128 bits:
// get encrypted value
uint128 bits = encipher(seed);
// clear zero bits
bits &= 0xFFFFFFFF FFFF 4FFF BFFFFFFFFFFF; // instead of 4 and B, you can use 0 and 3
// set one bits
bits |= 0x00000000 0000 4000 800000000000; // these have to be 4 and 8
But since you're generating more numbers than you need, you're going to end up with 64 copies of each UUID in your final stream. This won't matter because no one will ever be able to notice, as long as your faux search functions avoid pointing it out.
Exercise for further development: modify substring search so that it follows the expected behavior of finding matches in order within the page. [I don't recommend attempting this.]
With a linear algebra library, you can guarantee that you've found the next, or the previous, match in sequence. I don't know what the state of the art is for fast linear algebra in javascript, though.
(The matrix approach also has the advantage that, when your full-text search problem has 2^115 solutions, you can compute the one you want, the next one after some index, without having to compute them all.)
FWIW, "search" doesn't work on mobile (Chrome on Android): I go to "Find in Page" and none of the magic happens. It's also bypassed on desktop when I manually open the search box via Edit->Find->Find... instead of using Ctrl+F.
I wonder if there's (yet) a browser API you could hook into: the same way browsers allow JavaScript to manipulate the history [1], maybe there's a way to manipulate the Ctrl+F/find-in-page search results.
That is, right now you're capturing the Ctrl+F keypress and opening your own custom thing to read the user's search string and act on it. But what we'd really like is a way to be notified "The user just asked to search for 'xyz'. Would you like to capture that event, or let it go through to the browser's default behavior?"
A quick Google search found nothing like that exists yet. I then asked ChatGPT about it, hoping that ChatGPT would at least hallucinate a plausible design for the API — and had mixed feelings when it didn't. It just printed that 'Browsers do not provide a way to listen for the "Find in Page" search event due to privacy and security concerns' and suggested capturing the Ctrl+F keypresses exactly as you have done.
As someone else said, it would also be more like full-text search if you also considered the primary-key column, e.g. searching for "0390814603917539994005679487460590835" should jump to the 390814603917539994005679487460590835th row. (Highlighting-to-select pieces of the text also doesn't work: I'm not sure why not, since I would have thought the browser gives you at least that part for free.)
Besides "search" not working on mobile, the styling on mobile is such that the "scrolling" does not convince: to my eyes it looks too obviously like "changing the values in the cells of a fixed table" as opposed to "scrolling through the table itself." You could maybe mitigate that by animating quickly among three different page layouts with the table vertically offset by different amounts.
It occurs to me that if JavaScript has something like Python's `random.sample`-without-replacement, then you could set your `RANDOM_SEARCH_ITERATIONS` to 256 and achieve perfectly consistent (and exhaustive) "search" when the user has entered all but 1 or 2 hex digits of their desired result. And/or, you could just have the page secretly keep a history of the search results the user has already seen: this would prevent the user from finding out so quickly that "search + next + next + prev + prev" doesn't always get them back to where they started.
Speaking of exhaustive search results: With a bit more (probably equally algorithmically interesting) work, you could emulate the browser search's "7/256" by tallying up the number of UUIDs satisfying the constraint, e.g. if the user has typed "1234567" then you could display "1234567 (158456324585806817418058661888 results)" and maybe even fake up a convincing position indicator like "1234567 (17415833585801881134805987465/158456324585806817418058661888)". I guess if you display it as "1234567 (1.742e28/1.585e29)" then you don't even have to cheat that much. :)
A great example of Teller's observation that "sometimes magic is just someone spending more time on something than anyone else might reasonably expect."
My favorite of these was a trick where someone picked a card out of a deck and then Teller revealed a large version of that same card in an unexpected area in the vicinity, It turns out that what he had done was hide a complete set of large cards in the area before the trick and memorized the location of every one of them so, e.g., the king of hearts would be at the top of a palm tree, the three of spades under a drink tray, etc.
The best part is that that kind of trick usually becomes more, rather than less, impressive when its inner working is revealed. I recently got to see them perform live, and my favorite trick by far was one of that kind.
I love these and my kids who are a little obsessed with stage magic right now are at the perfect age for Penn & Teller videos so I need to stash these for the next time I have them. There’s definitely skill levels involved, even at the professional level (and Penn & Teller are definitely among the best out there). We’ve been to two professional magic shows in the last year, one in Wisconsin Dells, the other in Lake Geneva (both venues starring their owners) and it was apparent that the Dells guy was much better than the Lake Geneva guy (although the second magician the Lake Geneva guy had some impressive work to show).
Ricky Jay (RIP) is also extraordinary and has a handful of videos on YouTube. His stuff is mostly with cards and his humor is more aimed at adults. One of my favorites is his appearance on Arsenio Hall (I wish this was better quality):
Yeah, Ricky Jay was the last of a breed (he was also a friend of a friend, something I only learned after both he and the friend had died). He was very schooled in a lot of the street con games and was also a scholar of historical frauds and cams. He published a small-circulation magazine with his researches in the latter.
And of course his appearances in various David Mamet–related properties were not to be missed.
Watching Teller do the cups and balls trick with transparent cups is mesmerizing.
Yes, you can see everything. No, you still can't follow it.
Sure, you can see that "something changed" after the fact when it is stable. However, Teller is so damn smooth and fast that any active change looks like teleportation.
There was a video of a magic trick I saw where the trick was almost certainly accomplished by false shuffles. I know how to do a few false shuffles, in theory, but, the magician (Ricky Jay, I think?) was just so good at the false shuffles that I was more impressed than if I didn't know how it was done.
The explanation for full-text search was actually slightly more intelligent than what I initially assumed. I figured it just generated UUIDs until it found one that was in the correct direction of search (for the next/previous button), since I had observed that walking forwards and backwards in search results was giving different results each time, but in fact the author did the slightly better thing which is to just generate a bunch of possible results and then pick the best (I wonder how many results it generates for this?).
The full text search is a little confusing because it doesn't actually search them in order, though it appears to at first. And if you click "next" a few times and then "prev" the same number of times, you don't necessarily end up back at the same UUID you were at before. It's a neat-seeming trick though.
It’s an interesting question whether that could be fixed. I think the answer is Yes. If the author didn’t do any scrambling, and just displayed UUIDs in numeric order, then it’s trivial to enumerate search results in order. Likewise, if you do something like adding a constant mod 16 to each hex digit, you could do the same thing when you generate UUIDs matching a substring. So the question becomes whether you could find something like that that gives a sufficiently convincing illusion of entropy but is still reversibile when you hold a subset of the digits constant. And it seems like it should be.
FWIW I am super interested in this question but feel like I don't know how to derive a satisfying answer, maybe because the one of my goals here (add "enough" entropy) is a real fuzzy "I know it when I see it" sort of thing.
But I'm gonna try to get a few more crypto-knowledgeable friends to chat with me about this and write up what I learn!
You've sniped me and I'm going to try and tackle this over the weekend. What's the best way to exchange things for you? Also, have you explored modular multiplicative inverses at all?
I think the biggest rabbit hole I went down was trying to use FEAL (which is very breakable) and exploiting the things that make it breakable. But this is very much not my area of expertise - I was learning as I was working - so it's very possible either that that was a dumb idea or that it was a great idea that I didn't figure out.
I also considered things like "focus on add lots of entropy to groups of 200 or so UUIDs, but have obvious patterns beyond that" which I think would be a reasonable strategy; here I just kind of ran out of time (I told myself I'd make this in a week)
Did some light reading about a few other things but nothing substantial
My first thought was to use linear transformations over Z_2 as a field, as that would create a natural interpretation of fixing certain bits as taking a linear subspace. Interestingly this leads to the property that XOR is preserved.
I implemented this in a very quick and hacky way for 32 bits. I generated a random boolean matrix M invertible in Z_2. To turn an input number x into a corresponding number y in an N-bit space, I convert x to binary and turn that into a vector of 1s and 0s, then multiply it by that randomized matrix to get y. Here are the first few y's corresponding to x=0, 1, ...:
00000000000000000000000000000000
0xx0x000xx00x0xx000xxx00xxx0x000
x0xx000xxxx0xxx00x000xxx0xx0xx0x
xx0xx00x00x00x0x0x0xx0xxx0000x0x
x0xx0x00xxx0xx0x00xx0x0xx0xx00x0
xx0xxx0000x00xx000x0x00x0x0xx0x0
00000x0x000000xx0xxx00x0xx0xxxxx
0xx0xx0xxx00x0000xx0xxx000xx0xxx
...
(Hoping the non-monospace font doesn't ruin the alignment too much.)
Which looks... random-ish? I expect that turning these into UUIDs may result in more random-looking sequences.
Not sure how to solve the problem of search with this, but the hope would be that the linear structure gives you what you need, since fixing bytes on UUIDs should correspond to considering linear subspaces of the y vectors. Perhaps this can also be used to apply lexicographic order on the corresponding x vectors (i.e.: ordering the indexes), so that you could jump to each UUID matching the search in order.
I did some fooling around with this and it seems promising.
The first problem is that, when your enciphering function is a matrix, f(0) = 0. This looks bad, but we can easily solve that problem by starting the webpage sequence at an index higher than 0.
I tried to work through a much smaller version of the problem* by hand, and it looks like this:
We want to find the next index whose encipherment ends in -110. This sets up a system of equations Dx = y, where x_3 = 1, x_4 = 1, x_5 = 0, and y tells us the index of x. By multiplying that out, we get:
So we can freely choose any values for y_1 and y_5, and the rest will be filled in by constraints.
Assuming we want the least possible value for y, this means we will pick y_1 = 0 and y_5 = 0, which then tells us that we want index [0 1 0 0 0], and we can jump to there. If we wanted the least possible value for y above a threshold (such as the current viewport), we'd pick y_n values accordingly.
Instinct tells me that libraries should exist for quickly solving systems of linear equations like this.
(For full full-text search, we'd need to do this several times, also finding solutions for the enciphered values 110xx, and x110x. This multiplies the work we need to do and the storage we need to use by an amount that is linear in the difference in length between the search string and the full UUID, which is still a lot better than trial-and-error.)
* I ended up doing it in 5 bits because every random 4x4 matrix I generated was noninvertible.
This might pass an eyeball test for random when viewed as decimal numbers. (Although there sure are a lot of cases where adjacent values are 2 apart!) It looks much worse as binary. Here's every ones digit, all concatenated into a hex string, starting from 17 again:
e1e01e1f
Let's note that ~(e1e0) = 1e1f. Start from 16 instead and you'd see f0f00f0f. Here are the eights:
3332bbbc
Individual bits show pretty striking patterns. Since the UUID is reported in hex, that can be mitigated a little by the fact that each hex digit combines four binary columns. But so far it seems pretty likely that this would result in the list of UUIDs looking decidedly nonrandom. There might be quite a bit of shared material between adjacent UUIDs.
Can you solve this in general without doing integer linear programming? How else would you know how to find the lowest index greater than the current? In the field GF(2), using 0 might not minimize.
Hi! I do not know enough of the relevant math to appreciate what you're doing here (yet); I am going to try to do some reading to understand this better, but if you have any advice on where I should start I will gladly take it.
The important thing about matrices is their linear properties. Most proofs in linear algebra over real vector spaces don't rely on the particularities of the real number system, just the fact that it is a field [0].
To make an analogy with programming, you can think of a matrix as a generic container type. Normally, we think of the contained type as being the real numbers, but you could replace it with anything that supports the multiplication and addition operations -- that implements the "field interface/trait/typeclass". [1]
So you can define linear algebra over any field: the real numbers of course, but also the rational numbers, or the complex numbers. Fields can also be finite. In particular, the integers modulo n is a field if n is prime. My suggestion is to consider the case n=2. [2]
It's a bit hard to get geometric intuition about what matrices on finite fields even mean, but a lot of theorems we can prove in the real case generalize, because they depend only on the field properties of the reals. That's why I suggested it -- it would make things like inverting straightforward. And that natural relationship between linear subspaces and holding certain bits constant might make search possible.
The issue (which others pointed out) is that linearity imposes so much structure that it might not seem pseudo-random anymore. But you could consider something like XORing all numbers with a randomly generated bit string -- it won't hold up if someone looks closely but might fool the cursory human eye.
[1] The analogy breaks down slightly because not only must the operations be defined, but they must also obey laws, which can't be expressed in (most) programming language type systems. But if you've used Haskell before, you'll have seen this before in the fact that a Monad has to obey the Monad laws.
Would it be easier if only the lowest bits of the index contributed to the "entropy"? Like if the 2^16th UUID was the one right after the first and n + 2^16 was the one after the nth. You wouldn't notice the pattern but it'd be easier for the computer to handle. I guess if you were searching for a substring you might notice the ones surrounding the matches look almost the same from match to match...
Let me cook something up for you. It's an interesting puzzle.
I think you can get pretty far, if you compromise on your entropy: it only has to look random, not actually be random. (I mean it doesn't have to be cryptographically secure randomness.)
Yes, it's possible with certain restrictions on the function. Here's an example:
u128 uuid_iter(u128 x) {
b = (x * x | 0x5) + x;
c = prf(x);
return (b ^ (c << 2)) & u122_mask;
}
This is a T-function, a function where the Kth bit only depends on the K-1 lower bits. prf() is a pseudo-random function that can be anything you like as long as it's another T-function. A standard LCRNG like (48271 * x) % 2147483647 works just fine for instance.
You can invert this by just running the function N times to test each bit from LSB to MSB against the result. If you know certain bits, you don't have to run those tests and you can order the matching UUIDs by the values of the K unknown bits from 0 to 2^(N - K) - 1.
You also don't need to keep track of the position with this method either, only a stopping point. Unlike the feistel network, this function also produces a full cycle when iterated as x_i+1 = f(x_i), only repeating after all numbers are produced. That means you could run this without a counter at all, just generating until the first value is produced again, for any bit length.
Searching is very similar to a common approach for building a naïve spellchecker: given an input, generate all the possible matches it can be part of. You're not searching in a corpus, you're using the input to generate indices into the corpus (list of UUIDs here, list of words in the dictionary in a spellchecker).
For the even more curious, UUID has 5316 decillion 911 nonillion 983 octillion 139 septillion 663 sextillion 491 quintillion 615 quadrillion 228 trillion 214 billion 121 million 397 thousand 304 values. Imagine the fact that there aren't as many kms to reach GN-Z11 (farthest known galaxy in the Universe I think) as there are digits above
The UIA (Universal Internet Authority) is worried that by using UUIDs we are left with only around 34 trillion UUIDs per star and planet in the observable Universe. So the cosmic router might become DHCP-leasing dark matter.
I searched for 1337, and then 13371337, and then 133713371337, and I was flabbergasted they've got a search setup for this (which ctrl-f opens up). Thanks for posting the blog post!
> The fact that the search works impressed me more than anything. Of course, like every great magic trick, it seems so simple once it is explained.
> Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.
But the trick to the full-text search is that it doesn't work.
This is neat, although I pushed it later with incremental search and it seems to be skipping results as it only found ~50 when searching for `-000000000000`.
Yeah, I at first, I though I knew exactly how it worked. Then I saw the search field, and I suddenly had no idea what the hell was going on. Now, the big question, do I want to spoil the magic trick and read how this was done, or should I keep being astonished and flabbergasted?
I disagree with my sibling comment. The trick is beautiful. If you generate UUIDs such that each bit in the result can be reliably traced back to a single bit in the input, then you can take a substring of the UUID and use that to infer which bits of the input integer must be set to produce that substring. So you can produce a whole list of input bytes that meet the criteria and those become your search results.
The real magic trick here is that the uuids on the page only look random to us because of some bit twiddling and XOR trickery. If we had a better intuition for such things we would notice that successive UUIDs are just as correlated as successive integers.
> I disagree with my sibling comment. The trick is beautiful. If you generate UUIDs such that each bit in the result can be reliably traced back to a single bit in the input, then you can take a substring of the UUID and use that to infer which bits of the input integer must be set to produce that substring.
...but that has nothing to do with what the website is doing. The accompanying article specifically calls out the fact that it can't be done while maintaining the appearance of an unordered list, and therefore it isn't attempted.
Agreed. Everyone love puzzles that are worthy of an entire section in a blogpost for their interview questions, rather than stuff actually relevant to the job.
So I just brute forced every UUID in existence on my RTX GPU and loaded the dataset into a HA opensearch cluster on AWS. It took about 5 years of calling ‘uuid.Random()’ to effectively cover about 64% of the keyspace which is good enough.
To facilitate full-text search I created a langchain application in python, hosted on kubernetes, that takes your search query and generates synonymous UUIDs via GPT o1-preview before handing over to opensearch.
Opensearch returns another set of UUIDs, which I look up in my postgres database: “SELECT uuid FROM uuids WHERE id IN (…uuid_ids)”
I could imagine some candidates starting with their default tools like this, and start complaining about the cluster performance after a few weeks.
You need a certain way of thinking to have a gut feeling "this could be expensive" and then go back, question your assumptions and confirm your requirements. Not everyone does that - better to rule them out.
PINs can be up to 6 digits (at least here in the UK, but I doubt it's country specific), even though the ones they give you by default are only ever 4. So that's only a leak of 1% of them.
I'm from a country that has 6-digit PINs on most cards, and I've traveled to e.g. the United States where people are surprised that credit card PINs can be more than 4 digits, but in my experience, terminals accept them just fine. It seems like they are designed to suggest a PIN is only 4 digits but they will happily accept more. So while you're entering your PIN, the display looks something like:
For some reason, a lot of credit card processing APIs are still oriented around physical card machines, so they have a lot of fields devoted to self-declaration of the available features.
Some of the APIs allow the machine to say "I can accept a PIN up to 12 digits".
However, I don't know if anyone 1) actually delivered on it and 2) how you'd know in advance just poking random devices in stores.
That provides very valuable information: DO NOT TRUST this machine to be secure!
Similarly, any web site or app that can’t correctly handle a space character at the end of the password should never be trusted with anything of consequence.
Why? PINs are limited to 4 digits in many markets, so it's not exactly extreme for a developer to not consider the (to them) edge case of 6 digit PINs on foreign cards.
Conversely, it seems very possible to support 6 digit PINs and yet still make a ton of horrible implementation mistakes, security and otherwise.
Why is the space thing inherently insecure? I’m thinking bad form validation could trip it up and be considered “not handled” vs concerns suggesting plaintext storage
Are you really worried that a card machine is going to leak your PIN? That doesn't seem to be a common attack vector compared to a third-party skimmer being attached or someone just mugging you and demanding your PIN under threat of physical violence.
To answer the actual question: I don't know because I left my PIN at 4 digits, despite knowing I could use more, precisely because I didn't think it would really make my life any more secure.
I'm not worried specifically about the PIN leaking.
The concern is that a 4-digit max PIN length is certainly implemented by someone who couldn't be bothered to read the spec for secure credit card transaction handling.
It's the equivalent of the "No brown M&Ms" clause or "Canary in the coal mine" test.
Nobody actually cares about the M&M color or some dumb bird.
"Must support 6-digit PINs" is not part of "the spec for secure credit card transaction handling" – which is also not a (or at least one) thing: There are dozens of card networks, and many of them have tons of regional variations.
In some markets, issuers only allow 4 digit PINs, and customers don't expect to have to press an "enter" key when they're done entering their 4 digit PIN – so the reasonable implementation is to allow only 4 digit PINs, or you'll be left with people staring at the ATM/POS terminal, waiting for something to happen.
Making an ATM that can accept cards from multiple issuers (which is the norm these days) and allowing only 4 digits is the same category of error as requiring that the first character of someone's last name start with a capital letter, or to block symbol characters in names.
I agree: An annoying/avoidable implementation shortcoming, but arguably relatively orthogonal to security.
> there are over a dozen different PIN block standards
You almost certainly don't need to support all of these inside the PIN pad or even ATM/POS. If necessary, translation can happen in other parts of the system.
People tend to very very quickly their mind on that one once they get a few right-to-left control characters that flip over the text layout of the entire program.
Are you arguing to just allow any byte in names (which are ultimately user-defined input data), including 0x00, Unicode byte order markers etc., in a thread about a perceived correlation between developers' lack of i18n awareness and security bugs no less?
The reality is that there is often a tradeoff between keeping your test and edge cases simple via constraining allowable inputs and internationalization.
So I think you've got it exactly the wrong way around: These limitations might have happened precisely because somebody wanted to do the right thing from a safety/security perspective by doing overly strict input validation, at the expense of internationalization/compatibility.
Not saying that that's a good tradeoff in every case, but I really don't think you can draw any conclusions about a system's security by looking at whether it arbitrarily disallows some inputs (or if anything, maybe the opposite).
Ok, but that doesn't answer my question: what specific problem are you worried about that would allow someone to steal your money, that isn't incomparably unlikely compared to other methods? I'm just not aware of any problem that has happened in practice from poorly written card reader software.
My card doesn't even let me include repeating digits in its PIN. I suppose it can make a one-off guess more likely than one in a thousand to correctly guess my PIN.
Is it repeating in the whole PIN, or in digits next to each other? I'm trying to resist the nerd snipe of what the total number of possibilities would be in the latter case...
I believe it would be 7290, or more generalized, S(N) = 10 * 9 ^(N-1) with N being the length of the code and S being the number of combinations (assuming that a decimal system is used)
And from there, with variable lengths ranging from L to H, S(L, H) = 5/4 * 9 ^(L-1) * (9^(U-L+1) - 1)
So if the bank allows combinations from 4 - 6 digits, there would be a total of 663390 combinations to choose from.
Now, of course, the bank may decide to go from decimal to hexadecimal in the future - or maybe, there systems allow only duodecimal. In any case, the formula can be generalized further to account for all number systems - with B being the base of the system:
Which is honestly not a bad idea, given that somebody shoulder surfing or trying to read smudges on the PIN pad becomes much easier in the case of repeated numbers.
"1111" would just leave a fingerprint on a single key, for example, and only one possible PIN (or maybe 3, if the bank/card allows 6 digit PINs).
Haha, I have a file in my home directory that has every possible SSN because I wanted to be able to tell a friend “I have your SSN in a file on my computer.”
I think this is a joke, but I think it is a problem if someone finds any sensitive uuid here, because the list on this website is a tiny subset of all possible uuids, so it provides a useful rainbow table for anyone attempting brute force attacks. I.e. generating and using random uuids would have an astronomically small success rate, whereas trying the ones on this site may not (depending on where they came from, which I'm not sure of).
Oh.. ha, gotcha. Thanks for explaining. Incidentally, glad uuid's computed on the fly (as opposed to pre-computed) as I think the site would require a very (impossibly?) large database.
> depending on where they came from, which I'm not sure of
They're coming straight out of your processor :)
Careful where you scroll: Your password and your crypto wallet recovery phrase are in there somewhere too! (Unless you have one of those fancy 24 word long ones.)
All possible UUIDs are in this page, it’s not a tiny subset.
They are generated by your device on the fly as you move through the list so you can’t really use it as a rainbow table any more than manually creating the table yourself.
At work our clickstream data carries an 'event_uuid' column which combines uuid, bigint, account numbers and about 10 other identifiers. It makes joining really convenient when you don't know what column to use.
> Browsers do not want to render a window that is over a trillion trillion pixels high, so I needed to handle scrolling and rendering on my own
What’s fun is what actually happens when you try to do these things. It’s disappointing, you can’t even get anywhere near one trillion pixels.
The limits I found five years ago when working at Fastmail, after a customer using IE found their scrollbar broke when they had around 200,000 emails in a mailbox (plus I just checked a couple of them again now):
• Firefox: a few years ago, ignored declarations that resolved to a value higher than 17,895,697 pixels (a smidgeon under 2³⁰ sixtieth pixels). Now, it clamps at that point instead, but maybe three pixels more or less, not immediately clear quite what’s going on and I can’t be bothered investigating. (All browsers seem to have inconsistencies in how clientHeight/getBoundingClientRect/dev tools/whatever report things, that close to the boundaries of possibility. It’s mildly fascinating.)
• IE: ignores declarations that resolve to a value equal to or higher than 10,737,418.23 pixels (2³⁰ − 1 hundredth pixels).
• WebKit: clamps values somewhere around 2²⁵ (~33,554,432) pixels.
• Chromium: matched WebKit when I tested, now it’s clamping around 22,360,882 pixels, but I’m on a 1.5× display, so the 2²⁵ could be connected with device pixels or such. But I think I was on a 2× display when I did the initial testing!
Ah this is great - I tried finding a modern-ish list of max browser heights while writing the blog. Do you mind if I link your comment in an update?
(I'm also super curious how those numbers were chosen and about what would start breaking if the limits weren't there - maybe your example of weirdness around clientHeight etc gets at that)
I'd like to announce my new `npm` package called `get-uuid`. Behind the scenes, it loads `everyuuid.com`, picks a random row number, and returns that UUID.
That's a terrific idea, I'll also create a npm package, that consumes yours and returns the Guid but without the '-' between the digits. Go code reuse!
It's a fun implementation inspired by the short story https://en.wikipedia.org/wiki/The_Library_of_Babel and "At present it contains all possible pages of 3200 characters," though the character set is limited (no dash) so you won't find these UUIDs there.
Weissman score doesn't really have meaning anymore if the input is constrained. For example this website is 200KB as per the network traffic and I can beat it handily with ten-line script to generate every UUID. Or even verbally by just saying "every UUID".
oh hm, i thought I had fixed every case where that popped up. I can't reproduce the issue on any browser on my machine - how did you jump to the bottom (hotkey, scrollbar, scrolling for a trillion years) and what platform are you on (if you don't mind sharing)?
Chome on Windows. Grab the right scroll bar and pull to the very bottom. Once at the bottom try to scroll down with mouse scroll wheel. Screen goes blank and the error appears in the console.
OH! Thanks for the precise reproduction steps, I had done my testing with a trackpad and had missed that! Writing a little fix now (I'd hate for you to not be able to see the last UUID)
alright, a fix that I think covers this crash and a few others I found as a result is rolling out now.
for a while I thought I had an off by one error, which was pretty funny to think about in the context of trillions of items. but in fact I was just doing some bad react state management.
This reminds me of some code I stumbled on recently, where someone had implemented a custom exception they could throw if their 32-bit loop counter was greater than the maximum value of a 32-bit integer.
If the browser has 1080 vertical pixels, the scrollbars has max say 1000 possible positions. According to my napkin math* if you scroll over 100 uuids per second it would take up to ~1.7 septillion or ~1 700 000 000 000 000 000 000 000 years to scroll to an uuid of which you know the position if you hit the spot exactly on the scrollbar.
HN recently (last few months) had an article explaining how large a number was. The number was something like busy beaver or 128 bit integers or something else.
It illustrated how large the number was by creating activities you would do, incrementing the counter as you did them. The sequence went something like this:
Walk, and every time you take a step. Add 1
After you have circled the earth, place a sheet of paper on a pile, and start walking again.
Continue, until the paper pile reaches the sun, then place a grain of sand in the Grand Canyon and start over.
Continue until you have filled the Grand Canyon, etc etc
It continued for a lot of such steps until you finally counted up to the number in question.
And then do that in parallel for 10 billion people.
And for each of their devices, and servers or other supporting infrastructure.
And do it multiple times per second (e.g. for every log message, every datapoint, ...)
I've seen "thought experiments" (not sure they're quite that, but close enough) like that for a variety of things. The big numbers I've seen that done for most often are: unique shufflings of a deck of playing cards (about 10^68), atoms in the universe (about 10^80), and a googol (10^100). I've definitely seen the playing card one involve the walk around the world, pile of paper, grand canyon, etc (also drain the ocean a drop at a time, IIRC).
A "proper" search while still retaining sufficiently-looking randomness might be achievable via an SMT solver, by asking it to find an index above the current position, below a binary-searched top boundary, that contains the search term. I think SMT solvers should be clever enough to be able to work the search around some ciphers.
Maybe an SMT solver is a rather heavy-weight approach, but I think it fits the task of searching through 2^122 values :)
Edit: even with no cipher, it takes said approach ~1.6s to find that after 0x1067D3DC2F4951AEA8DB8D0108D7D65 the first occurrence of 0xABCD is at 0x1067D3DC2F4951AEA8DB8D0108DABCD with Z3 (cvc5 and Bitwuzla are slightly slower), which is perhaps a bit too slow.. perhaps it could be improved to something more reasonable by restricting the search to near the end until that fails, and reducing the matchable positions based on the dashes, but that's back to effort.
Edit 2: slightly more effort later:
cipher: 0x15FD586DF0CE258730098B94325ACE7 * (val ^ 0x93C324915DB2B3C4D4CD8135C0DDF1)
inverse: (0x1D52334877384DE3A6DAA4A3312A6D7 * val) ^ 0x93C324915DB2B3C4D4CD8135C0DDF1
example first 10 entries:
0: 2839676B79617D5B9FDF9D43CFB3077
1: 123C0EFD889357D46FD611AF9D58390
2: 143418475AFDC869FFF2B46C3468A45
3: 3E36BFD96A2FA2E2CFE928D8020DD5E
4: 002EC9233C9A13786005CB94991E413
5: 2A3170B54BCBEDF12FFC400066C372C
6: 2C2979FF1E365E86C018E2BCFDD3DE1
7: 162C21912D6838FF900F5728CB790FA
8: 18242ADAFFD2A995202BF9E562897AF
9: 0226D26D0F04840DF0226E51302EAC8
binary search max set to `10 * 2**search_bit_width` after after the chosen start
trying to find a match only in the last 3 possible places (i.e. searching WXYZ only matches *****WXYZ, ****WXYZ*, ***WXYZ**)
searching for 0xFF00FFCC after index 0xAAAAAAAAAAAAAA:
nearest match found at: 0xAAAAAAAD04E3DA
ciphered: 0x0E9399CC39B716BC96348AFF00FFCCD
time taken: 0.7s (bitwuzla or cvc5), or 1s (z3)
But alas I'm stupid, and searching doesn't care about lexicographic ordering, and you necessarily have to search all possible places, not just the end, and that's a ton slower. Perhaps dash placement could reduce them enough, but that's even more effort.
This is funny I like it. Basically just captures the scroll input and iterates through int128s and their UUIDs on a static page to fake a scroll. My feature request would be to add smooth scrolling.
I use this all the time but someone decided to disable it by default. For those who haven't turned it on after a reinstall yet, or haven't discovered the setting: about:preferences -> search autoscrolling
It depend on the UUID version you're using. Version 4 (Random) will always have that value be 4 as per RFC 9562. So 99999999-9999-9999-9999-999999999999 is a valid UUID but not a valid UUID v4. If you wanted to be pedantic the website should have been named https://everyuuidv4.com/
I think object identifiers would be better, althoug they should add another arc that does not require registration, based on: (fixed prefix).(type of identifier).(number of days past epoch).(parts according to type of identifier).(optional extra parts). (I had partially written my proposal, and I would want ITU and/or ISO (preferably ITU) to approve it and then manage it.) For example, type 0 could mean international telephone numbers, type 1 could mean version 4 IP address, type 2 could mean domain names (encoding each part as bijective base 37, from right to left), 3 could mean a combination of geographic coordinates with radio frequencies, 4 could mean telephone numbers with auto-delegated telephone extensions, etc. (I had also considered such things as automatic delegation, clock drift, etc; it is more carefully considered than UUID and some other types of identifiers.)
Sounds like you're in the realm of URNs? I don't know about that description, I think there's a benefit to a short and fixed-size ID. Though maybe for the domain name example you could have an alternate form that hashes any domain that goes over 20-30 characters.
Oh okay. That's a pretty different suggestion from the comic.
Would you suggest random 128 bit numbers, then? Otherwise it's hard to see what else would serve the same role without being UUID in a trenchcoat. And having identifiers is important.
Yeah I would really like if UUID was just 128 bits of randomness and nothing else. The whole version thing sucks, and my point (which you are right in that the ordering is a little off) is that UUIDv4 is the only good one and the rest basically should not exist. UUIDv4 itself is ruined by the fact that it needs to have a version embedded in it because the other ones exist.
The values are hexidecimal, so all "9s" isn't the biggest UUID, but all "f's". Specifically, I think: `ffffffff-ffff-4fff-bfff-ffffffffffff`.
The "4" in the 3rd block is the only permitted value as these UUIDs are using the GUIDv4 format. I'm not sure what's going on in the 4th block, but the references and linked RFC in the Wikipedia article might reveal more details: https://en.wikipedia.org/wiki/Universally_unique_identifier#...
One important feature is missing: From a proper search function I would expect to know how often my string is found. It could be that my password is rare, or that it is rather common. I need to know! Could the search also display the number of hits?
Jokes aside - you know the number of digits of the search string and if it is still a valid uuid. So computing the number of "matches found" should be possible...
For the implementation of the core logic, I probably would have gone for the lazy solution of iterating AES until the output is <2^122 (64 times on average).
Alternatively just use standard Format-Preserving-Encryption, which is usually a Feistel Network, similar to what they ended up with, but built on a standard algorithm, instead of a homebrew round function.
Thank you for finally giving me a name for this concept. I've run into lots of code implementing them badly, but there's a bit of a semantic hill for others when you don't have a better name than "1-cycle permutation".
Love it! I can't help but think there's still a way to use native scrolling though. You could start with a tall document, say, 10k pixels and grab the scroll position and multiply it out when it moves to get sort of a macro scroll position. That would handle the arrows and clicking or grabbing the scroll box.
But the wheel would be tricky because you want to scroll by only one or a few items, not items/10k. With a static viewport-sized DIV set to overflow-y:scroll and large vertical content size, positioned so you don't see the scroll bar, when the mouse is over it it should capture wheel movement which you could translate to an offset from your macro scroll position.
Just a thought. Either way, love the thinking here, fun work :)
We may be lucky that it didn't work. It appears from GNU units that printing this all out on paper would be about a hundred solar masses. Maybe about fifty solar masses if you print double-sided.
Your comment is needed as a parent / top-level for the discussion. A lot of people were confused about the 'V' portion in particular. Thanks for the insight.
I don't think f is a valid UUID version. One of these positions should be 1–6 or so (not sure who's the authority on which version works how, IETF maybe?)
Barely related tangent - but the speed of how quickly this site populated the full list of all these UUIDs (it's likely static) made me remember this site https://unicode.org/emoji/charts/full-emoji-list.html which seems to programmatically generate the page every single time (are there that many new emojis that this is needed) and as a result the page takes minutes to load (I've never let it finish).
It seems to skip a lot for substring search. For example if you search for 'aaaa' there's maybe 100 jumps to get to the bottom. So i'd assume that it's just a uniform random over the entire range to some limit, then filter idx > cur. But I feel like you could constrain the search more. The 'close next' should have exactly the same characters at the front. Like in your example '4AAB-' you would search valid position 14 first.
edit: actually this doesn't work because having the same start does not mean they are close together.
>Maybe this is silly to you. But framing the problem as “come up with a function that looks like it adds entropy but is reversible” was a lot easier for me to think about than “preserve bijectivity between these two sets.
IMO that's half of "real work" - figuring out how to think about a problem. Insights are funny that way - stuff that is obvious to one person is game-changing to others.
> And of course I’m still very curious whether there’s a cryptanalysis approach that lets me achieve more effective search over a random-ish ordering of UUIDs. I’m gonna do some more reading there.
I think that's definitely possible. Especially if you realise that you only need a random-ish looking order, not a cryptographically secure random order.
This could use a “scrubbing” control, where when scrolling by dragging vertically, the scrolling speed depends exponentially on the horizontal position. (Meaning, for example, dragging 10 screen units vertically when the pointer is at horizontal position x would scroll by 10^(x/10) entries.)
> It’s a little awkward to take our two chunks of 61 bits and map it onto the available bits here.
Does it say anything about code-AI that a problem which should be (is!) very easy for machines, but very difficult for front-endy people, seems to also be very difficult for Claude?
It's interesting insight that if you encrypt something in a way that the cipher-text is exactly as long as plain-text (counted in bits) it's as if you just listed all possible inputs and mixed them randomly.
"bijectivity", there is a term from high school I thought I would never hear in real live, still remember having injection, surjection & bijection explained in math class 20 years ago.
I know I shouldn’t be surprised that uuids like 00000000-0000-4000-8000-000000000000 or 12345678-0123-4567-8901-234556789012 exist, but being able to search for them and find them still feels amazing.
Feels to me like Pandora's box to UUID collisions. Even though UUID space is so huge human mind is always so tiny, as it is always subject to some bea5 (pronounce it "bias").
For some reason, I was expecting the search to filter the results instead of just work like Ctrl+F. It'd be nifty if it could collapse non-matching UUIDs.
He's using a bijection between the number on the left (which are ordered) and the UUID that's designed to produce something random-looking. The simple fact that bijection cannot produce the same output out of two inputs (or it's not a bijection) means that all the UUIDs are unique.
UUID is a 128-bit value, of which 122 bits are available for actual data. UUID was conceived as a way to mark an intersection point between two dimensions, originally for Apollo Computer's RPC system where the two dimensions were time and hardware identifier (originally originally Apollo hardware serial number, later commonly 802.3 address). The additional bits are metadata telling you that a particular 128-bit value is a UUID, not just a coincidental jumble of bits, and which type of UUID.
This site is specifically about “V4” UUIDs where the two dimensions are random and also-random. If you scroll all the way to the bottom you will see that the last table row is numbered the same value as the maximum 122-bit number, so the site is flipping every possible bit-combination within that space combined with the metadata bits that say “Hello I am a V4 UUID”:
They are not, 100% strictly speaking, “ensured”. But they are 128bit numbers, so you have realistically no chance of generating a uuid that someone else has already. Age-of-the-universe type chances of duplicating one.
UUIDs aren't technically unique, they're just designed in such a way that the chance of collisions is very small.
A big part of this is that the possibility space is very large, so the chance of collisions is low. Many UUID versions also determine parts of the UUID via MAC addresses and timestamps, to ensure that different servers are highly unlikely to generate the same UUID.
Thanks!
I guess strictly you'd have to include this page, which would put the total internet data in the (roughly) ronnabyte or quettabyte range, with nearly all of it being this page.
I didn't think v4 uuid's were completely random over that 128bit space for some reason, and this was wrong, but interestingly.
for a uuid like 414c1bde-b676-4242-be35-887f01a24f10, if I take its suffix 887f01a24f10 (12 chars, 48 bits) there are still 19 chars (76 bits) to the left of it. (barring the constant 13th char identifier 4)
There still are 2^76 uuid's with that suffix to search through. It made sense with ipv6 addresses and cryptography, but for some reason I had this idea that uuid's didn't use an CRNG that covered the whole 128 bit space. A lot of wrong stuff rattling around in my memory for some reason, thanks for clarifying.
There are 6 fixed bits. The other two are in the 17th hex character.
But overall you have that right. Every bit is either completely random or fixed. There are no reduced-randomness patterns unless you generated it wrong.
You were so preoccupied with whether you could, you never stopped to ask if you should.
This will be the goto reference for hackers everywhere. Think of how much faster they'll compromise my McDonald's order with all 2^122 possibilities already computed!
Oh my. Brings me back to my internship at a web development firm in Venlo. Someone "hacked" the company (somehow knew the master password used for everything) and deleted half the websites. Could restore from backup iirc but you know what the solution was?
Add an exclamation mark to the password
—
Which in turn reminds me of another internship where a computer upstairs got "hacked". Everything on the desktop was messed up, icons moved, random programs open, some files moved into folders at random... why'd anyone do that? The boss left it as-is for me to take a look at how this could happen—yes, including the thematic porn sites hinting 17-year-old me at what happened to his marriage. Not finding any hints, I started sorting the stuff back out and business continued as usual
Come afternoon, I was working on the computer when the mouse moved. Like, not me, I didn't move it. Was it the hacker? How are they connected? After a bit it moved again, but it didn't make any sense. I don't remember how I put 2 and 2 together but, going downstairs and seeing him at the media station using a wireless mouse downstairs, I got a dark suspicion
Yes of course it was an insider threat =) We bought a different wireless receiver for the mouse that day
For the curious, here's the linked blog post describing how the project works: https://eieio.games/blog/writing-down-every-uuid/
Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.