A word of caution. A few years ago we had a production impact event where customers were getting identical cookies (and so started seeing each other's sessions). When I took a look at the code, what I found was that they were doing something very like your code - using a time() based seed and a PRNG.
Whenever we deployed new nginx configs, those servers would roll out and restart, getting _similar_ time() results in the seed. But the individual nginx workers? Their seeds were nearly identical. Not every call to the PRNG was meant for UUIDs, but enough were that disaster was inevitable.
The solution is to use a library that leverages libuuid (via ffi or otherwise). A "native lua" implementation is always going to miss the entropy sources available in your server and generate clashes if it's seeded with time(). (eg https://github.com/Kong/lua-uuid, https://github.com/bungle/lua-resty-uuid)
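To make the failure mode concrete, here is a minimal sketch in Python rather than Lua. The function name weak_session_id and the fixed restart timestamp are made up for illustration; the only claim is that a PRNG seeded with the same clock value produces the same "random" id in every worker, while an id drawn from the OS entropy pool does not.

```python
import random
import uuid


def weak_session_id(seed: int) -> str:
    """Hypothetical cookie generator: a PRNG seeded from a clock reading."""
    rng = random.Random(seed)
    return f"{rng.getrandbits(128):032x}"


# Assumption: four workers restart within the same clock tick, so time()
# hands every one of them the same integer seed.
restart_time = 1_700_000_000
weak_ids = [weak_session_id(restart_time) for _ in range(4)]

# uuid.uuid4() reads from os.urandom(), i.e. the kernel CSPRNG, so there is
# no application-level seed to collide on.
strong_ids = [uuid.uuid4().hex for _ in range(4)]

print(len(set(weak_ids)))    # distinct ids among the time-seeded workers
print(len(set(strong_ids)))  # distinct ids among the OS-backed workers
```

Seeding with finer-grained time only narrows the window; it doesn't remove the collision.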
In the code I saw, at least twice in its history people had introduced a "pure lua" solution for speed, and were clearly unaware of the shotgun they'd just pointed at their feet. (as in, somebody saw the issue and fixed it, and then someone else _fixed it back_ before I came along).
But in case _I'm_ messing up here, I'll bow to your expertise: libuuid uses /dev/random, which uses a CSPRNG (ChaCha20) with entropy ingested via Blake2 from whatever sources the system can get, right?
We did actually do a bunch of before/after testing showing the collision rates (zero after), and I believe the cookie in question has been replaced with a third party identity system in the intervening years - but if we did it wrong, I'd like to know.
Had this issue on a ray tracer I worked on. Since sampling was supposed to be random, you could fire it up on multiple machines and just average the result to get a lower noise image.
Except the distributed code fired up all worker instances almost simultaneously and the code used time() to seed the RNG, so many workers ended up using the same seed and hence averaging those results did nothing.
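A toy version of that, as a sketch: render_estimate is a made-up stand-in for one noisy render (just a Monte Carlo mean of uniform samples), and the seeds are arbitrary. Averaging eight workers that share a seed gives back exactly the single-worker result; only distinct seeds give you independent samples to average.

```python
import random
import statistics


def render_estimate(seed, samples=200):
    """Stand-in for one noisy render: Monte Carlo mean of uniform samples."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(samples)) / samples


def average_of_workers(seeds):
    """Average the estimates from one worker per seed."""
    return statistics.fmean([render_estimate(s) for s in seeds])


single = render_estimate(42)
same_seed = average_of_workers([42] * 8)        # every worker got the same seed
distinct = average_of_workers(range(100, 108))  # every worker got its own seed

print(abs(same_seed - single))  # difference between 8 identical workers and 1
print(abs(distinct - single))   # distinct seeds actually give a different average
```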
"There are 52-factorial ways to shuffle a deck of cards, but the site's PRNG only has 32 bits of state. 4 billion is alarmingly less than 52-factorial! But even worse, the PRNG is seeded using the number of milliseconds since midnight. 86 million is alarmingly less than 4 billion!"
So the actual entropy on the card table was equivalent to about 5 cards' worth. After seeing the 2 cards in his hand, and the 3 cards in the flop, he could use a program to solve for every other card in everyone's hand and in the entire deck!
(I may have mixed up many details - If anyone has an archive of the article please post it!)
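The arithmetic in that quote is easy to check. A quick sketch, taking the figures as quoted (32 bits of PRNG state, a seed of milliseconds since midnight); the variable names are mine:

```python
import math

deck_orderings = math.factorial(52)
prng_states = 2 ** 32
seeds_per_day = 24 * 60 * 60 * 1000  # milliseconds since midnight

bits_deck = math.log2(deck_orderings)
bits_seed = math.log2(seeds_per_day)

# Count how many dealt cards it takes before the number of ordered
# arrangements reaches the number of possible seeds.
cards = 0
arrangements = 1
while arrangements < seeds_per_day:
    arrangements *= 52 - cards
    cards += 1

print(f"deck: {bits_deck:.1f} bits, seed: {bits_seed:.1f} bits")
print(f"seeds possible: {seeds_per_day}, cards needed: {cards}")
```

Four cards can be dealt about 6.5 million ways and five about 312 million, so the roughly 86 million possible seeds sit between the two, which is where "about 5 cards' worth" comes from.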
UUIDv4 is banned in some environments because of how common it is to find someone using weak PRNGs to generate them. It happens way more often than it should.
Bush Derangement Syndrome is covered (the writeup is linked to from the TDS article) but there is something special when republicans in multiple state legislatures have proposed _legislation_ on the subject of TDS, under that name, which would spend taxpayer money. https://en.m.wikipedia.org/wiki/Trump_derangement_syndrome#P...
Then should we remove the 501c3 status of every church, mosque, temple, etc in the U.S. because they are biased towards not just the existence of a god, but the existence of their particular version of god?
More relevantly, it’s an open secret that a lot of churches are heavily into political advocacy directly for candidates, which they’re not supposed to do under their tax status, but they’ve been playing with the boundaries unchecked and are now really obviously past where they’re supposed to be—but nobody’s got the guts to go after them, so they just keep getting bolder.
Absolutely you can. The places in France and Spain I've flown to just suggest you bring your own pedals, so they match your shoe cleats; they'll fit them before your hire. You can usually bring your own saddle too. It's far more convenient than bringing the bike.
I've also done it the other way, my main bike has S&S coupling so I could bring it aboard the Eurostar. For touring I prefer my own bike, because I have the racks set up for my panniers, but when I do that, I prefer travelling by ferry/train.
I love it! Even for the lift/shuttling-bike-park crowd, I bet many places would happily install cleat-matching pedals (or your own), your saddle... maybe your grips if you are picky?
The limit given in the article is 360KB (on floppy). At that size, you can't use Tries, you need lossy compression. A Bloom filter can get you 1 in 359 false positives with the size of word list given https://hur.st/bloomfilter/?n=234936&p=&m=360KB&k=
The error rate goes up to 1 in 66 for 256KB (in memory only);
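For anyone who wants to check those numbers without the calculator, this sketch uses the standard Bloom filter approximation p = (1 - e^(-kn/m))^k. Two assumptions of mine: 1KB is taken as 1000 bytes, and the number of hash functions k is the optimum (m/n) ln 2 rounded to the nearest integer. With those, the figures above come out the same.

```python
import math


def bloom_false_positive(n_items, m_bits):
    """Return (k, p): rounded optimal hash count and false positive rate."""
    k = max(1, round(m_bits / n_items * math.log(2)))
    p = (1.0 - math.exp(-k * n_items / m_bits)) ** k
    return k, p


WORDS = 234_936  # word list size from the linked calculator

for label, size_bytes in (("360KB", 360_000), ("256KB", 256_000)):
    k, p = bloom_false_positive(WORDS, size_bytes * 8)
    print(f"{label}: k={k}, about 1 in {1 / p:.0f}")
```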
To me it doesn't look great that Perplexity use BrowserBase at all. I asked BB's doc bot if you can customise the user agent; it says you can't because it sets the user agent automatically _in order to bypass bot checks_.
This seems to be the only secret sauce they offer; other than that it's just a headless browser farm. So Perplexity saying "companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots" is disingenuous at best; they chose to use a tool designed to mask their traffic and it blew up in their face?
While this is nice enough, it bothers me that these don't look much like "art". If you look at real roman mosaics, they do not place points in a grid - they use a technique called "opus vermiculatum" https://en.wikipedia.org/wiki/Opus_vermiculatum ... snaking the tiles around so that there is a flow to it; the overall effect is much better.
I think that'd be possible to automate too. I was doing something related over here: https://hachyderm.io/@bazzargh/112767548339559102 - in that I was trying to generate sketch-like renderings from photographs. What I did was to pick random points, look at the brightness gradient (taken from the Sobel operator, there are other ways to do this), move up the gradient a bit and sketch some parallel lines (and then various experiments with hatching for shading the flatter areas)
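Roughly what that step looks like, as a pure-Python sketch and not the code behind the linked post. The image is a plain 2D list of brightness values with y increasing downwards, the strokes are drawn perpendicular to the gradient (my assumption), and step, half_len, count and gap are made-up parameters.

```python
import math


def sobel_gradient(img, x, y):
    """Return (gx, gy) at an interior pixel of a 2D list of brightness values."""
    gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
    gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx, gy


def stroke_segments(img, x, y, step=1.5, half_len=2.0, count=3, gap=0.8):
    """Move up the gradient from (x, y) and return parallel line segments."""
    gx, gy = sobel_gradient(img, x, y)
    mag = math.hypot(gx, gy)
    if mag == 0:
        return []                    # flat area: nothing to align to
    ux, uy = gx / mag, gy / mag      # unit vector pointing up the gradient
    tx, ty = -uy, ux                 # perpendicular: runs along the edge
    cx, cy = x + step * ux, y + step * uy
    segments = []
    for i in range(count):
        off = (i - (count - 1) / 2) * gap
        ox, oy = cx + off * ux, cy + off * uy
        segments.append(((ox - half_len * tx, oy - half_len * ty),
                         (ox + half_len * tx, oy + half_len * ty)))
    return segments


# Tiny test image: dark on the left, bright on the right.
edge = [[0, 0, 0, 255, 255] for _ in range(5)]
print(sobel_gradient(edge, 2, 2))
print(stroke_segments(edge, 2, 2))
```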
In a similar way you could start with a grid of tiles _with some separation_, and allow them to move and align better with the gradient of the underlying picture, and not lie _on_ edges, if possible. If they overlap, allow the tessera to be cut, and only then choose images to colour-match the average on the tile, leaving some "grout" in the image (I'd probably speckle that a bit so it didn't look too uniform). Then the result might look more like real mosaics.
I managed to get some decent results using this algorithm (not at my own computer so can't post code, yet):
- create a smaller greyscale copy of the image, and use sobel to calculate gradients (smaller to speed this up)
- set a gradient magnitude threshold, and add the points that exceed it to a queue, largest first.
- for these points, add 2 squares, one either side of the queued point, in the direction of the gradient (ie you expect one to be light and one dark)
- add the squares to a location hash as they are placed. (I'm using a grid size slightly larger than my square tiles). The location hash is to speed up comparisons.
- skip placing a square if it would fall within a small radius of a previous square's centre (I used 0.5 of a square size). When looking up a point in the location hash, remember to look not for the point itself, but for the hash values for the corners of an axis-aligned square with your point at the centre; this is to catch overlaps when the point falls near grid lines.
- once all points in this queue have been processed or skipped, we start on a new queue, containing all squares placed so far.
- for each square in the queue, try to place a new square to its north, south, east and west along its alignment; as before skip these if they overlap too much
- any squares we place - jitter their position and angle slightly (otherwise it looks horribly unnatural)
- any squares we do place, add them to the phase 2 queue.
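Since the real code isn't to hand, here is a stand-in sketch of the location hash and the phase 2 growth in Python. It is not the original implementation: phase 1 is skipped (the seed tiles are passed in directly), and the tile size, cell size and jitter amounts are made-up values. The corner lookup works because the lookup square (one tile wide) is smaller than a grid cell, so it can only touch the cells its corners land in.

```python
import math
import random

TILE = 10.0            # tile side length (assumed, in pixels)
CELL = TILE * 1.2      # hash grid slightly larger than a tile
MIN_DIST = 0.5 * TILE  # reject centres closer than half a tile


class LocationHash:
    """Grid of buckets holding tile centres, for fast overlap checks."""

    def __init__(self):
        self.cells = {}

    @staticmethod
    def _key(x, y):
        return (math.floor(x / CELL), math.floor(y / CELL))

    def too_close(self, x, y):
        # Hash the corners of an axis-aligned square centred on the point,
        # so neighbours just across a grid line are still found.
        r = MIN_DIST
        keys = {self._key(x + dx, y + dy) for dx in (-r, r) for dy in (-r, r)}
        for key in keys:
            for px, py in self.cells.get(key, ()):
                if math.hypot(px - x, py - y) < MIN_DIST:
                    return True
        return False

    def add(self, x, y):
        self.cells.setdefault(self._key(x, y), []).append((x, y))


def grow(seeds, width, height, rng):
    """Phase 2: spread outwards from tiles that are already placed."""
    index = LocationHash()
    queue = []
    for x, y, angle in seeds:
        if not index.too_close(x, y):
            index.add(x, y)
            queue.append((x, y, angle))
    head = 0
    while head < len(queue):
        x, y, angle = queue[head]
        head += 1
        for quarter in range(4):  # the four neighbours along the tile's alignment
            a = angle + quarter * math.pi / 2
            # Jitter position and angle slightly so it looks less mechanical.
            nx = x + TILE * math.cos(a) + rng.uniform(-0.05, 0.05) * TILE
            ny = y + TILE * math.sin(a) + rng.uniform(-0.05, 0.05) * TILE
            na = angle + rng.uniform(-0.1, 0.1)
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            if index.too_close(nx, ny):
                continue
            index.add(nx, ny)
            queue.append((nx, ny, na))  # newly placed tiles get expanded too
    return queue


tiles = grow([(30.0, 30.0, 0.0)], 60, 60, random.Random(1))
print(len(tiles))
```

The queue doubles as the list of placed tiles, and the loop ends on its own because the minimum spacing caps how many centres fit in the canvas.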
The first phase is quite slow, so set the threshold high. Second phase placement is very fast. My squares all use a grey stroke the same grey as the background for the grout effect, and the squares are drawn using the colour of the point picked as the square's centre (I don't bother averaging). I have it rendering this interactively, using requestAnimationFrame, so it doesn't clog up the browser - I add about 50 tiles per frame.
I'm looking at one it did of the mona lisa; it places the phase 1 tiles along her hairline and hand in a nice "vermiculatum" way, the phase 2 placement is less satisfying but with jitter it seems ok. Originally I'd thought about calculating where squares overlap and cutting tiles nicely but it was quicker just to _allow_ the overlap and so most of what you see are the whole tiles placed on top of partials. The overall effect isn't _quite_ like hand placed tiles but I like it better than a grid.
A photo mosaic demands to have photos that are high resolution. You don't want to zoom in to find blurry jpegs, it just isn't right!
I have had great fun with OpenSeadragon in the past and now there is the VIPS image processing library for writing out a massive set of image tiles.
Hence it is possible to work with thumbnails and then render out the thing with OpenSeadragon and VIPS.
OpenSeadragon was amazing when it was a Microsoft demo a few decades ago, but time has moved on. I wonder what can be done with tilesets in HTML5 with picture tags or in SVG to present an infinitely zoomable montage.
I like your suggestion and the options for rotating and clipping images with SVG methods. I always confuse mask and clip, but, in SVG, much could be done.
For me, the starting point would be to do an SVG with thousands of images in it, to just watch my computer crash as I step up the resolution. Really I want to recreate OpenSeadragon in SVG...
There is a naive approach to making this kind of thing that reduces the component images to such a small size (2-3 pixels in the large image) that it becomes more of a dithering exercise than looking for artifacts in each component image to match up lines. It's still a nice effect, but it's quite different when the component images are > 10% the size of the final image, instead of < 1% the size.
It could be better sometimes for tiles to lie over edges. For example, there might be an edge dividing red and green areas, and one of your tiles is mostly half red and half green.
This is frustratingly difficult to understand from the site because they never include a human in the photos for scale (though the text says it's 3m in diameter). This video is a bit better https://www.youtube.com/watch?v=BvL0T5xyG5E ... it shows a cutaway in a scale model that makes it clearer where people stand, pictures of people painting the exterior and video of the globe in motion. I can't find any footage of what it looks like inside while in motion, which would have been nice. I guess I'll just have to go visit!
Back in... 2006ish? I got annoyed with being unable to copy text from multicolumn scientific papers on my iRex (an early ereader that was somewhat hackable) so dug a bit into why that was. Under the hood, the pdf reader used poppler, so I modified poppler to infer reading order in multicolumn documents using algorithms that Tesseract's author (Thomas Breuel) had published for OCR.
It was a bit of a heuristic hack; it was 20 years ago but as I recall poppler's ancient API didn't really represent text runs in a way you'd want for an accessibility API. A version of the multicolumn select made it in but it was a pain to try to persuade poppler's maintainer that subsequent suggestions to improve performance were ok - because they used slightly different heuristics so had different text selections in some circumstances. There was no 'right' answer, so wanting the results to match didn't make sense.
And that's how kpdf got multicolumn select, of a sort.
Using Tesseract directly for this has probably made more sense for some years now.
I too went down that rabbit hole. Haha. Anything around that time to get an edge in a fantasy football league. I found a bunch of historical NFL stats pdfs and it took forever to make usable data out of them.