I see this kind of pushback from engineers quite often when advocating for privacy-first technologies. I think this comment is a good example of the reflexive dismissal many developers show when asked to change how they work and move a bit away from the dominant paradigm of the last 20-30 years (cleartext private data on servers).
1. The idea that it’s impractical to download the data. We are talking about searching text notes. 10 GB? That’s nearly 7 million pages of notes. It doesn’t seem a reasonable figure. Even hundreds of megabytes of text would take a long time to accumulate, and even that would rarely need to be downloaded in one go (only on a new device).
2. The idea that tech that tends to live on the server is somehow magical. Search would take “a few seconds” on a “slow” phone processor, you say. Phone processors are incredibly fast these days, but in any case, since when is text search slow? I feel like many developers treat text search as a scary magical black box, but it’s rather straightforward. Read about inverted indexes and consider that the servers of the mid 90s were serving at least tens of thousands of text searchers apiece with processors probably slower than what’s in your phone. There are libraries you can use locally to make text search pretty trivial. Apple supports text search (Spotlight) across the data and apps on your phone. It’s not rocket science.
3. The idea that anything other than conceptually perfect encryption is useless. “first-party end-to-end encryption is snake oil” - assuming you mean crypto where the user does not handle the keys directly and a first party causes them to be generated, first party e2ee describes some of the biggest privacy wins of the last 15 years - iMessage, FaceTime, Signal, WhatsApp.
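To make point 2 concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration under simple assumptions (lowercased alphanumeric tokens, AND semantics, everything in memory); the names `tokenize`, `build_index` and `search` are made up for this example and are not taken from any particular library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(notes):
    """Map each term to the set of note ids that contain it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for term in tokenize(text):
            index[term].add(note_id)
    return index


def search(index, query):
    """Return the ids of notes containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Intersect the smallest posting sets first to keep the work small.
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


notes = {
    1: "Grocery list: milk, eggs, bread",
    2: "Meeting notes: ship the search index on Friday",
    3: "Recipe: bread with milk and honey",
}
index = build_index(notes)
print(search(index, "milk bread"))
```

A lookup touches only the posting sets for the query terms rather than every note, which is why this kind of search stays fast even on modest hardware.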
1. I use the 10GB example to show, by an extreme that is nonetheless completely plausible in many domains (mostly communication apps, the frontrunners for E2EE: email with various attachments, chat with not even a few thousand photos attached, that kind of thing), where the scheme falls apart.
I consider even 10MB more than is ideal to require downloading before you can do anything, in most domains.
Text-only notes? Sure, they’re likely to be adequately small, but I’m talking about downsides of E2EE in general, not just this application.
2. I wasn’t sufficiently clear about the context of the figures I used for client search duration. I was talking mainly about the sorts of devices that can’t manage 50MB/s of I/O, and certainly don’t have enough spare memory to fit the index in RAM, so your big index is simply too big for fast search to be possible, and it’ll tend to cause memory pressure that slows everything else down too. Generally speaking, on a capable machine, yes, properly-done text search should be considerably faster than your latency. But in practice apps normally use search techniques inferior to those of their servers, which I think is because most of the effort has gone into server-style search engines, which are not packaged for embedded/library use. As a super simple example, any mainstream email provider will be doing full text search of emails and of at least some types of attachments (you can be confident of at least PDF, DOC and DOCX), all with features like stemming and spelling correction, but I think it’s probably still true that most local email clients don’t search attachment contents, and I suspect many won’t do fully proper stemming or offer spelling correction. More generally, if you compare the search results of server and client, it’s distressing how often the client is kneecapped. This is by no means fundamental.
3. End-to-end encryption is presented as a panacea. “Because it’s end-to-end-encrypted, we can’t see your messages” and the like. Such statements are lies. They need a big asterisk along the lines of “… until we want to, or a government orders us to”. Yes, E2EE helps in the general case, and if they stopped at that I would hold my peace; but they go further and claim, or deliberately give the impression of, inviolability, when all around the world legislatures, police forces and other governmental bodies are testing the edges of undermining it all, and it would be naive to suppose they will not go further and succeed. And so I say: first-party end-to-end encryption is largely false advertising, them saying “trust us, you don’t have to trust us”.
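On point 2, the claim that client-side stemming and spelling correction are not fundamentally out of reach can be illustrated with a rough sketch using only Python's standard library. The suffix-stripping `stem` below is a deliberately crude stand-in (not a real stemmer such as Porter's), and `normalize_query` and the sample `vocabulary` are hypothetical names invented for this example; `difflib.get_close_matches` is the only library call involved.

```python
import difflib


def stem(word):
    """Crude suffix stripping; a stand-in for a proper stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_query(query, vocabulary):
    """Stem each query word, then snap unknown stems to the closest indexed term."""
    normalized = []
    for word in query.lower().split():
        candidate = stem(word)
        if candidate not in vocabulary:
            # Fuzzy-match against the terms that actually occur in the index.
            matches = difflib.get_close_matches(candidate, vocabulary, n=1, cutoff=0.8)
            if matches:
                candidate = matches[0]
        normalized.append(candidate)
    return normalized


# Hypothetical set of stems already present in a local index.
vocabulary = ["attachment", "invoice", "search"]
print(normalize_query("searching attachmnts", vocabulary))
```

Scanning the whole vocabulary for each unknown word is fine for a sketch; a real client would want something smarter for a large index.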
I generally agree with your points but would point out that when I exported all my Evernote data a few years ago it came to about 15 GB, mostly because I use a lot of photographs and diagrams in my notes. I don't know anything about this note platform, but if it allows multimedia embedding then the data can balloon quite rapidly.
I might imagine a pipeline where a full photograph blob is downloaded and decrypted on your device, normalized, run through something like image2vec + OCR + metadata extraction, and the result stored in an index. At that point, of course, you could garbage collect the original blob, at least until your app releases a major update requiring a reindexing of blobs.
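The pipeline described above might look roughly like the following sketch. Everything here is an assumption for illustration: `LocalIndex`, `decrypt` and `extract_terms` are hypothetical names, the decryption step is an identity placeholder rather than real cryptography, and the term extractor is a plain whitespace split standing in for OCR or embedding extraction.

```python
from collections import defaultdict


class LocalIndex:
    """Keeps only extracted terms on the device, never the decrypted blob."""

    def __init__(self, decrypt, extract_terms, version):
        self.decrypt = decrypt              # hypothetical: ciphertext -> plaintext bytes
        self.extract_terms = extract_terms  # hypothetical: plaintext -> iterable of terms
        self.version = version              # version of the extraction pipeline
        self.postings = defaultdict(set)    # term -> blob ids
        self.terms_by_blob = {}             # blob id -> terms from the last pass
        self.version_by_blob = {}           # blob id -> pipeline version used

    def ingest(self, blob_id, ciphertext):
        """Decrypt, extract terms, index them, and let the blob itself go."""
        plaintext = self.decrypt(ciphertext)
        terms = set(self.extract_terms(plaintext))
        # Remove postings left over from an earlier pass over the same blob.
        for term in self.terms_by_blob.get(blob_id, set()) - terms:
            self.postings[term].discard(blob_id)
        for term in terms:
            self.postings[term].add(blob_id)
        self.terms_by_blob[blob_id] = terms
        self.version_by_blob[blob_id] = self.version
        # Nothing retains the plaintext after this point, so it can be collected.

    def search(self, term):
        """Return ids of blobs whose extracted terms include the given term."""
        return set(self.postings.get(term, set()))

    def stale_blobs(self, current_version):
        """List blobs indexed by an older pipeline version (candidates to refetch)."""
        return sorted(b for b, v in self.version_by_blob.items() if v < current_version)


# Placeholder decrypt (identity) and extractor (lowercased whitespace split).
idx = LocalIndex(
    decrypt=lambda ciphertext: ciphertext,
    extract_terms=lambda plaintext: plaintext.decode().lower().split(),
    version=1,
)
idx.ingest("photo-1", b"whiteboard diagram from Tuesday")
idx.ingest("photo-2", b"receipt for the whiteboard markers")
print(idx.search("whiteboard"))
```

Because each blob records the pipeline version that indexed it, a later upgrade can refetch and reindex stale blobs one at a time instead of requiring the whole archive on the device at once.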
(I am leaving this comment to explain why I am downvoting your comment, as while this is absolutely the correct answer for how to build this--and so in some sense deserves an upvote--it is itself the proof for why you were wrong and yet is presented as the response to a socratic question that should have led you to realize why you were wrong and yet you didn't seem to acknowledge such, even though you clearly do appreciate that this answer is the opposite of the narrow question that was asked. I thereby feel this deserved both the two downvotes--on this answer and the original question--as well as--and I try to avoid doing this: I prefer just hitting downvote and moving on with my life--an explanation to ensure that if anyone is merely skimming they see that this is in fact the reason why the device can do that search locally without all 15GB synchronized at all times, and work only ever has to be done to improve old indexes in the off chance you make a major improvement to your indexing, and that both can be done incrementally and is often avoided by centralized players anyway as it is so costly for them.)
So you're saying you DON'T need to keep the photograph on the device, and it can be retrieved as needed, like I was? Your scheme only requires you to keep the index. I don't understand why you asked that question in the first place?