Towards the end of writing the paper, another paper came out, "RISC: rapid inverted-index based search of chemical fingerprints", https://doi.org/10.1021/acs.jcim.9b00069, which does something along those lines.
It was close enough that I published the pre-print "RISC and dense fingerprints" at https://doi.org/10.26434/chemrxiv.8218517.v1 to examine its claims. I found that their RISC implementation was faster than chemfp for low bit densities (<~5%), which includes the popular 2048-bit ECFP/Morgan fingerprints for smaller radii, and for uncommonly high similarity thresholds.
Otherwise, chemfp was faster.
So while there's certainly something to investigate there, I think it's better to focus that effort on truly sparse fingerprints and count fingerprints, rather than nominally dense bit fingerprints.
Just needs money and time. ;)
Plus, part of the focus was on making chemfp a really good baseline for these sorts of timing tests.