ZeroDB (YC S16) Provides Security for Enterprise Big Data in the Cloud (themacro.com)
79 points by stvnchn on Aug 9, 2016 | 35 comments


Microsoft offers "Always Encrypted" for query processing over encrypted data in SQL Server and SQL Azure: https://msdn.microsoft.com/en-us/library/mt163865.aspx (Disclaimer: Microsoft employee)


Does Microsoft use deterministic encryption for searchable encryption? I'm sure schemes like OPE, Paillier, etc. are in use for columns that require those properties.


Yes. It's configurable.


Whitepaper from arXiv describing their scheme, for others who like seeing the underlying details of security tech:

https://arxiv.org/pdf/1602.07168v3.pdf

@ ZeroDB developers

I know the goal is to have the cloud provider untrusted in this model for storage/processing. However, have you considered them being actively malicious in how they handle the protocol steps with the on-site client? As in, at each step (e.g., a query), they might try to attack the protocol or especially the implementation (a la OpenSSL infamy). Just make sure you have defenses for that sort of thing.


A malicious server, apart from the possibility of removing data, would look pretty unusual to the client (e.g., returning incorrect data). So it's a good idea to look for these patterns. An attack would probably look like trying to use the client as an oracle, and that's pretty detectable.

But as MacLane pointed out in another comment, this is only the case for the standalone database described in the arXiv paper. The Hadoop scheme works quite differently.


> Currently, ZeroDB works with Hadoop and will soon will expand to other parts of the big data ecosystem, like Spark and Impala, as well as legacy databases, like Oracle, DB2, and MySQL.

I guess MySQL is a legacy database now. Someone should let all the committers know that they should transition to support roles and stop new feature development. Hadoop is the future of SQL.


Yeah, I guess it's a little early to retire MySQL and Oracle :-) Thanks for pointing that out; legacy is more about DB2.


Fixed in the post.

/s/legacy/structured


The biggest technical hurdle for this type of database right now is index lookup. Since the nodes of the index are encrypted, the client and server need a round trip for every level of the index tree that has to be traversed. This turns what is usually one of the fastest database operations into a slow one.
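
To make the cost concrete, here is a rough sketch of a client walking an index whose nodes the server cannot read. Everything in it is an assumption for illustration, not ZeroDB's actual protocol: the `Server` class, `build_tree`, `lookup`, the fanout of 4, and especially the XOR "cipher", which only stands in for real authenticated encryption.

```python
import json
from bisect import bisect_right


def toy_encrypt(key: int, obj) -> bytes:
    # Placeholder only: XOR with one byte is NOT secure encryption.
    return bytes(b ^ key for b in json.dumps(obj).encode())


def toy_decrypt(key: int, blob: bytes):
    return json.loads(bytes(b ^ key for b in blob).decode())


class Server:
    """Hypothetical untrusted store: holds opaque blobs, counts fetches."""

    def __init__(self):
        self.blobs = {}
        self.round_trips = 0

    def put(self, node_id, blob):
        self.blobs[node_id] = blob

    def get(self, node_id):
        self.round_trips += 1  # each fetch is one client/server round trip
        return self.blobs[node_id]


def build_tree(server, key, sorted_keys, fanout=4):
    """Build a B-tree-like index bottom-up from a non-empty sorted list."""
    next_id = 0
    level = []  # (node_id, smallest key under that node)
    for i in range(0, len(sorted_keys), fanout):
        chunk = sorted_keys[i:i + fanout]
        server.put(next_id, toy_encrypt(key, {"leaf": True, "keys": chunk}))
        level.append((next_id, chunk[0]))
        next_id += 1
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            node = {
                "leaf": False,
                "seps": [smallest for _, smallest in group[1:]],
                "children": [node_id for node_id, _ in group],
            }
            server.put(next_id, toy_encrypt(key, node))
            parents.append((next_id, group[0][1]))
            next_id += 1
        level = parents
    return level[0][0]


def lookup(server, key, root_id, target):
    """The client must fetch and decrypt a node before it knows the next one."""
    node_id = root_id
    while True:
        node = toy_decrypt(key, server.get(node_id))
        if node["leaf"]:
            return target in node["keys"]
        node_id = node["children"][bisect_right(node["seps"], target)]


server = Server()
secret = 0x5A
root = build_tree(server, secret, list(range(0, 128, 2)))
found = lookup(server, secret, root, 10)
trips = server.round_trips
```

With 64 keys and a fanout of 4 the tree is three levels deep, so this single lookup costs three sequential round trips; a plaintext index would answer the same query in one request because the server could do the traversal itself.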


This is indeed true for our standalone, open source database (https://github.com/zerodb/zerodb).

However, it is not the case for our Hadoop scheme (nor our future support for structured databases). In these cases, there is no round-tripping required. In fact, it's significantly more performant than existing Transparent Data Encryption in Hadoop, from both a latency and key rotation perspective.

We'll likely release a paper describing this new scheme later in the year as well as publish at some conferences.


That will be a very interesting read.


Having worked on a similar product and heard a very similar description of the 'proprietary' method, I'm guessing either security, speed or both are actually compromised.


There are many proprietary methods based on deterministic encryption + obfuscating the word distribution; that's what most companies do.

We avoid doing that because of the questionable security of such methods. Also, we tend to publish what we do (stay tuned for the Hadoop paper :-)
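
For anyone wondering why deterministic schemes are considered questionable, a small sketch of the usual concern: equal plaintexts produce equal ciphertexts, so the server can search for a value but can also read off how often each value occurs. This is only an illustration under stated assumptions. The HMAC tags below stand in for ciphertexts (they are not decryptable), and `KEY`, `column`, and the helper names are made up for the example; it does not describe any vendor's scheme.

```python
import hashlib
import hmac
import os
from collections import Counter

KEY = b"client-side secret (example only)"


def deterministic_tag(value: str) -> bytes:
    # Same input always yields the same tag, which is what makes it searchable.
    return hmac.new(KEY, value.encode(), hashlib.sha256).digest()


def randomized_tag(value: str) -> bytes:
    # A fresh random nonce makes every tag different, even for equal inputs.
    nonce = os.urandom(16)
    return nonce + hmac.new(KEY, nonce + value.encode(), hashlib.sha256).digest()


def server_equality_search(stored, query_tag):
    # The server compares opaque bytes only; it never sees the plaintext.
    return [i for i, tag in enumerate(stored) if tag == query_tag]


column = ["flu", "flu", "flu", "cancer", "flu", "cancer", "gout"]
det_column = [deterministic_tag(v) for v in column]
rnd_column = [randomized_tag(v) for v in column]

# Searching works on the deterministic column...
matches = server_equality_search(det_column, deterministic_tag("cancer"))

# ...but the server can also count repeated tags without holding any key.
det_histogram = sorted(Counter(det_column).values(), reverse=True)
rnd_histogram = sorted(Counter(rnd_column).values(), reverse=True)
```

Here `det_histogram` mirrors the plaintext frequencies (4, 2, 1), which an attacker with background knowledge of the data can match against likely values, while the randomized column shows no repeats but can no longer be searched by simple equality.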


Hey HN, cofounder of ZeroDB here. Michael (/u/michwill) and I are excited to be a part of YC and happy to answer any questions about the company!


Love the idea behind ZeroDB - kudos to you guys, and cheers from a fellow Tar Heel!

What's your expansion strategy for Oracle/DB2/MySQL?


We have ideas for how to make relational databases secure while running everything server-side, thanks to recent research publications [notably CipherBase from Microsoft Research http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper33.pdf] and advances in CPU hardware. It's early days, but we'll probably test it first in the open source ZeroDB database [https://opensource.zerodb.com] and then apply the same method to existing relational databases.


If the consuming application (and thus the keys) exists in the cloud as well, does ZeroDB offer any additional benefit over other encrypted-at-rest DBs?

random ex: https://docs.mongodb.com/manual/core/security-encryption-at-...


I think you may have the application with a different cloud provider or even in a different geographic location, so that you can keep the keys away from the data. In this case you sort of distribute trust.


Lol k

- michael scott


CryptDB offers DB querying without having to load parts of the DB onto your local machine, whereas loading certain parts locally is ZeroDB's model of operation in its current incarnation.

Which is more secure? Does ZeroDB use non-deterministic encryption?


ZeroDB doesn't use deterministic encryption (neither in the Hadoop product nor in the open source database).


For those who may know, how does ZeroDB stack up against an incumbent like MarkLogic in 'Security for Enterprise Big Data in the Cloud'?


That's a very strong security claim. What's behind it?


As the founding engineer at VoltDB, I can tell you that no matter how cool your name is, it can be annoying to be always alphabetized last.


That's one of the reasons behind the name change from Cadabra to Amazon.com.

p.s. if you change it to 0db perhaps you might be listed before any "a" company? Not sure if numbers precede letters in those lists.


0db - that's what I was thinking of. Way to fight 0days.


0db definitely sounds like an audio brand. I don't recommend it.


Do what GM did and call the next version BoltDB.


The last is what's remembered best :-D.

But yeah, good point


Off-topic: whenever I see (YC ) in a title I always assume it's a job posting and automatically ignore it. It might be detrimental to make a blog post appear that way.


Better to read it as advertising YC's investments on YC's forum.


You can advertise it without using the exact same format as a job listing.


Looked different to me. Those usually don't have comments and other links under the title, just the amount of time since the post appeared. Sounds like the two of you are making snap judgments on threads that ignore important info. Your filters are title only instead of title + peripheral info. Using both prevents the problem you're having.


Plus one


Please comment civilly and substantively, or not at all.



