Customer fits in memory, whereas catalog_sales does not.
We chose to remove SQLite from the results because it was so much slower. The plots are much less readable when they are stretched out by something that is slower by an order of magnitude.
> Customer fits in memory, whereas catalog_sales does not.
That didn't prevent you from using pandas, which had to rely on dynamic swapping. Or is in-memory SQLite unable to use that much memory?
> We chose to remove SQLite from the results because it was so much slower. The plots are much less readable when they are stretched out by something that is slower by an order of magnitude
So you're using on-disk SQLite because it fits in memory (unlike pandas, which also fits in memory), but you're dropping it anyway because it's too slow when it works on disk?
You are right, we could probably re-run SQLite purely in memory, but only because macOS dynamically allocates additional swap.
However, I would not expect much better performance, because I do not believe that SQLite uses a different sorting strategy when running in memory. It would only save some I/O operations, which are very cheap on the MacBook anyway.
Furthermore, the article explicitly says:
> We will use customer at SF100 and SF300, which fits in memory at every scale factor.
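For anyone who wants to check the in-memory versus on-disk point themselves, here is a rough sketch using Python's standard sqlite3 module. It is not the benchmark from the article: the table `t`, the column `x`, the row count, and the helper names `time_sort` and `compare` are all made up for illustration, and a table this small will sit in the OS page cache, so it says nothing about larger-than-memory data like catalog_sales.

```python
import os
import random
import sqlite3
import tempfile
import time


def time_sort(db_path, n_rows=100_000, seed=0):
    """Load n_rows random integers into SQLite and time an ORDER BY query.

    db_path is either a file path (on-disk) or ":memory:" (in-memory).
    The table and column names are hypothetical, not from the article.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany(
            "INSERT INTO t VALUES (?)",
            ((rng.randrange(1_000_000),) for _ in range(n_rows)),
        )
        conn.commit()
        # Time only the sort query, not the data load.
        start = time.perf_counter()
        rows = conn.execute("SELECT x FROM t ORDER BY x").fetchall()
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return elapsed, [r[0] for r in rows]


def compare(n_rows=100_000):
    """Run the same sort on an on-disk and an in-memory database."""
    with tempfile.TemporaryDirectory() as tmp:
        disk_time, disk_rows = time_sort(os.path.join(tmp, "bench.db"), n_rows)
    mem_time, mem_rows = time_sort(":memory:", n_rows)
    # Same seed, same data: both modes must return the same sorted rows.
    assert disk_rows == mem_rows
    return disk_time, mem_time


if __name__ == "__main__":
    disk_time, mem_time = compare()
    print(f"on-disk: {disk_time:.3f}s, in-memory: {mem_time:.3f}s")
```

Raising `n_rows` and running it a few times would give a first impression of how much of the sort time is actually I/O on a given machine.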