Does Microsoft use deterministic encryption for searchable encryption? I'm sure OPE, Paillier, and similar schemes are in use for columns that require those properties.
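For anyone wondering why determinism matters for search, here is a minimal sketch. It uses HMAC-SHA256 keyed tokens as a stand-in for a deterministic cipher. That is my assumption for illustration, not what Microsoft or ZeroDB actually use, and the key and sample values are hypothetical.

```python
import hashlib
import hmac

def equality_token(key: bytes, value: str) -> str:
    """Deterministic keyed token: the same key and value always give the same output."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical client-side secret; the server never sees it.
key = b"client-side secret key (hypothetical)"

# The server stores only tokens, never plaintext.
server_column = [equality_token(key, v) for v in ["alice", "bob", "alice"]]

# To search, the client sends the token of the search term and the
# server compares tokens for equality without learning the plaintext.
query = equality_token(key, "alice")
matches = [i for i, token in enumerate(server_column) if token == query]
print(matches)  # [0, 2]
```

The trade-off is visible in the same sketch: rows 0 and 2 carry identical tokens, so the server learns which rows are equal even without the key. A non-deterministic scheme hides that, but then the server cannot do the equality match by itself.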
I know the goal is to have the cloud provider untrusted in this model for storage/processing. However, have you considered them actively malicious in how they handle the protocol steps with the on-site client? As in, at each step (e.g., a query), they might try to attack the protocol or especially the implementation (a la OpenSSL infamy). Just make sure you have defenses for that sort of thing.
A malicious server, apart from possibly removing data, would look pretty unusual to the client (returning incorrect data, etc.). So it's a good idea to look for these patterns. An attack would probably look like trying to use the client as an oracle, and that's pretty detectable.
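One way a client could spot incorrect data is to keep an authentication tag on every record it writes and check it on every read. A rough sketch, assuming HMAC-SHA256 tags and made-up record ids; this is illustrative only and not ZeroDB's actual protocol:

```python
import hashlib
import hmac

def seal(key: bytes, record_id: str, ciphertext: bytes) -> bytes:
    """Tag binds the ciphertext to its record id so records cannot be swapped."""
    message = record_id.encode("utf-8") + b"\x00" + ciphertext
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, record_id: str, ciphertext: bytes, tag: bytes) -> bool:
    """Constant-time check that the server returned what the client stored."""
    return hmac.compare_digest(seal(key, record_id, ciphertext), tag)

mac_key = b"hypothetical client MAC key"
blob = b"opaque encrypted bytes"
tag = seal(mac_key, "row-1", blob)

honest = verify(mac_key, "row-1", blob, tag)            # untouched record
tampered = verify(mac_key, "row-1", blob + b"!", tag)   # modified ciphertext
swapped = verify(mac_key, "row-2", blob, tag)           # record served under another id
print(honest, tampered, swapped)  # True False False
```

This only catches modified or swapped records. It does not catch a server that omits records or replays an older, validly tagged version, so it is a partial defense at best.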
But as MacLane pointed out in another comment, this is only the case for the standalone database described in the arXiv paper. The Hadoop scheme works quite differently.
> Currently, ZeroDB works with Hadoop and will soon will expand to other parts of the big data ecosystem, like Spark and Impala, as well as legacy databases, like Oracle, DB2, and MySQL.
I guess MySQL is a legacy database now. Someone should let all the committers know that they should transition to support roles and stop new feature development. Hadoop is the future of SQL.
The biggest technical hurdle for this type of database right now is index lookup. Since the index nodes are encrypted, the client and server need a round trip for every level of the binary tree index that is traversed. This turns what is usually one of the fastest database operations into a slow one.
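To make the round-trip cost concrete, here is a toy simulation. The `Server` class, the node layout, and the XOR "cipher" are all my own assumptions for illustration (the cipher is NOT secure and is not what ZeroDB uses); the point is only that the client must fetch and decrypt one node per tree level.

```python
import hashlib
import json

def toy_cipher(key: bytes, node_id: int, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Toy only, NOT secure.
    Applying it twice with the same key and node id decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = key + node_id.to_bytes(4, "big") + counter.to_bytes(4, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class Server:
    """Stores encrypted nodes by id and counts the fetches it serves."""
    def __init__(self):
        self.nodes = {}
        self.round_trips = 0

    def fetch(self, node_id: int) -> bytes:
        self.round_trips += 1
        return self.nodes[node_id]

def build(server, key, sorted_keys):
    """Build a balanced binary search tree; return the root id (or None)."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    left = build(server, key, sorted_keys[:mid])
    right = build(server, key, sorted_keys[mid + 1:])
    node_id = len(server.nodes)
    plain = json.dumps({"key": sorted_keys[mid], "left": left, "right": right})
    server.nodes[node_id] = toy_cipher(key, node_id, plain.encode("utf-8"))
    return node_id

def lookup(server, key, root, target) -> bool:
    """Client-side traversal: one fetch and one decrypt per tree level."""
    node_id = root
    while node_id is not None:
        node = json.loads(toy_cipher(key, node_id, server.fetch(node_id)))
        if target == node["key"]:
            return True
        node_id = node["left"] if target < node["key"] else node["right"]
    return False

key = b"hypothetical client key"
server = Server()
root = build(server, key, list(range(15)))
found = lookup(server, key, root, 14)
print(found, server.round_trips)  # True 4
```

With 15 keys the tree has 4 levels, so reaching a leaf costs 4 network round trips, where an unencrypted index would be walked entirely on the server in a single request.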
However, that is not the case for our Hadoop scheme (nor for our future support for structured databases). In these cases, there is no round-tripping required. In fact, it's significantly more performant than existing Transparent Data Encryption in Hadoop, from both a latency and key rotation perspective.
We'll likely release a paper describing this new scheme later in the year as well as publish at some conferences.
Having worked on a similar product and heard a very similar description of the 'proprietary' method, I'm guessing either security, speed or both are actually compromised.
We have ideas for how to make relational databases secure while running everything server-side, thanks to recent research publications [notably CipherBase from Microsoft Research http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper33.pdf] and advances in CPU hardware. It's early days, but we'll probably test it first in the open source ZeroDB database [https://opensource.zerodb.com] and then apply the same method to existing relational databases.
I think you could host the application with a different cloud provider, or even in a different geographic location, so that you can keep the keys away from the data. In that case you sort of distribute trust.
CryptDB offers DB querying without having to load parts of the DB onto your local machine, whereas loading them is how ZeroDB operates in its current incarnation.
Which is more secure? Does ZeroDB use non-deterministic encryption?
Off-topic: whenever I see (YC ) in a title I always assume it's a job posting and automatically ignore it. It might be detrimental to make a blog post appear that way.
Looked different to me. Those usually don't have comments and other links under the title, just the amount of time since the post appeared. Sounds like the two of you are making spot judgments on threads that ignore important info. Your filters are title-only instead of title + peripheral info. Using both would prevent the problem you're having.