I see this kind of pushback from engineers quite often when advocating for priva...

chrismorgan · on Aug 23, 2022

1. I use the 10GB example to show, by an extreme that is nonetheless completely plausible in many domains (mostly communication apps, the frontrunners for E2EE: email with various attachments, chat with not even a few thousand photos attached, that kind of thing), where the scheme falls apart.

I consider even 10MB more than is ideal to require downloading before you can do anything, in most domains.

Text-only notes? Sure, they’re likely to be adequately small, but I’m talking about downsides of E2EE in general, not just this application.

2. I wasn’t sufficiently clear about the context of the figures I used for client search duration. I was talking most of all about the sorts of devices that can’t manage 50MB/s of I/O, and certainly don’t have enough spare memory to fit the index in RAM, so your big index is simply too big for fast search to be possible, and it’ll tend to cause memory pressure that slows everything else down too. Generally speaking, on a capable machine, yes, properly-done text search should be considerably faster than your latency. But in practice apps normally use inferior search techniques compared to their servers, which I think is because most of the effort has gone into server-style search engines, which are not packaged for embedded/library use. As a super simple example, any mainstream email provider will be doing full text search of emails and of at least some types of attachments (you can be confident of at least PDF, DOC and DOCX), all with features like stemming and spelling correction, but I think it’s probably still true that most local email clients don’t search attachment contents, and I suspect many won’t do fully proper stemming or offer spelling correction. More generally, if you compare the search results of server and client, it’s distressing how often the client is kneecapped. This is by no means fundamental.

3. End-to-end encryption is presented as a panacea. “Because it’s end-to-end-encrypted, we can’t see your messages” and the like. Such statements are lies. They need a big asterisk along the lines of “… until we want to, or a government orders us to”. Yes, E2EE helps in the general case, and if they stopped at that I would hold my peace; but they go further and claim, or deliberately give the impression of, inviolability, when all around the world legislatures, police forces and other governmental bodies are testing the edges of undermining it all, and it would be naive to suppose they will not go further and succeed. And so I say: first-party end-to-end encryption is largely false advertising, them saying “trust us, you don’t have to trust us”.

sleepybrett · on Aug 23, 2022

I generally agree with your points but would point out that when I exported all my Evernote data a few years ago it came to about 15GB, mostly because I use a lot of photographs and diagrams in my notes. I don't know anything about this note platform, but if it allows multimedia embedding then the data can balloon quite rapidly.

remram · on Aug 23, 2022

The photographs can be retrieved as needed while still being encrypted and not impeding text search.

tetromino_ · on Aug 23, 2022

How do you propose searching the content of your photographs and diagrams? ("Show me notes with gardening photos")

remram · on Aug 23, 2022

What scheme would you use for searching the content of your photographs that requires the full photograph blob to be available for search?

tetromino_ · on Aug 23, 2022

I might imagine a pipeline where a full photograph blob is downloaded and decrypted on your device, normalized, run through something like image2vec + OCR + metadata extraction, and the result stored in an index. At that point, of course, you could garbage collect the original blob - at least until your app releases a major update requiring a reindexing of blobs.
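
For illustration only, here is a minimal Python sketch of that kind of pipeline. Everything in it is a hypothetical stand-in: `fetch`, `decrypt` and `extract` are caller-supplied callables standing in for the download, the client-side decryption and the image2vec/OCR/metadata step, and `LocalIndex` and `INDEX_VERSION` are made-up names, not part of any real app or library. It only shows the shape of the idea: index each blob once, keep the derived terms, drop the blob, and redo the work only when the index version changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical: bump this when the feature extractor changes in a way
# that requires old blobs to be downloaded and indexed again.
INDEX_VERSION = 1


@dataclass
class LocalIndex:
    """Small on-device index mapping search terms to blob ids."""
    terms: Dict[str, Set[str]] = field(default_factory=dict)
    indexed_at_version: Dict[str, int] = field(default_factory=dict)

    def add(self, blob_id: str, features: Iterable[str]) -> None:
        # Note: this sketch does not remove stale terms on reindexing.
        for term in features:
            self.terms.setdefault(term.lower(), set()).add(blob_id)
        self.indexed_at_version[blob_id] = INDEX_VERSION

    def search(self, term: str) -> List[str]:
        return sorted(self.terms.get(term.lower(), set()))

    def needs_indexing(self, blob_id: str) -> bool:
        return self.indexed_at_version.get(blob_id) != INDEX_VERSION


def index_blobs(
    blob_ids: Iterable[str],
    fetch: Callable[[str], bytes],             # assumed: downloads the encrypted blob
    decrypt: Callable[[bytes], bytes],         # assumed: client-side decryption
    extract: Callable[[bytes], Iterable[str]], # assumed: OCR / tagging / metadata
    index: LocalIndex,
) -> int:
    """Index every blob that is new or stale; return how many were processed."""
    processed = 0
    for blob_id in blob_ids:
        if not index.needs_indexing(blob_id):
            continue  # already indexed at the current version, no download needed
        plaintext = decrypt(fetch(blob_id))
        index.add(blob_id, extract(plaintext))
        del plaintext  # only the derived terms are kept; the blob can be discarded
        processed += 1
    return processed
```

With something like this, a query such as `index.search("gardening")` only touches the local index, and the full photograph is fetched again only when the user opens it or when `INDEX_VERSION` is bumped.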

saurik · on Aug 23, 2022

(I am leaving this comment to explain why I am downvoting your comment, as while this is absolutely the correct answer for how to build this--and so in some sense deserves an upvote--it is itself the proof for why you were wrong and yet is presented as the response to a Socratic question that should have led you to realize why you were wrong and yet you didn't seem to acknowledge such, even though you clearly do appreciate that this answer is the opposite of the narrow question that was asked. I therefore feel this deserved both downvotes--on this answer and the original question--as well as--and I try to avoid doing this: I prefer just hitting downvote and moving on with my life--an explanation to ensure that if anyone is merely skimming they see that this is in fact the reason why the device can do that search locally without all 15GB synchronized at all times, and work only ever has to be done to improve old indexes in the off chance you make a major improvement to your indexing, and that this can be done incrementally and is often avoided by centralized players anyway as it is so costly for them.)

remram · on Aug 24, 2022

So you're saying you DON'T need to keep the photograph on the device, and it can be retrieved as needed, like I was saying? Your scheme only requires you to keep the index. I don't understand why you asked that question in the first place.