Hacker News
MemSQL ships 2.0, scales across hundreds of nodes, thousands of cores (memsql.com)
127 points by nikita on April 23, 2013 | 29 comments



Is this one of those "if you have to ask how much it costs, you can't afford it" situations? Because "Try it now Free!" is the only thing I can see on their site related to cost, and "First one's free!" rarely means it will always be free. :|


Their resource estimator starts off at 1640 cores [1], and even the bottommost tick on the scale represents 500 cores. For people who can dedicate that amount of hardware to their database, the licensing cost is probably not a major component.

The numbers look impressive, though.

http://www.memsql.com/why-memsql#scale-out


Try again - I found the bottom ticks were 8 cores and 256GB.


Fair enough - I had considered that to be on the axis since it's the minimum value, and thus not a tick, but visually it certainly is represented as a tick.

At any rate, the point I was trying to make is that they clearly expect you to throw a lot of hardware at these systems.


Sure, but a lot of projects LIKE this have an open source project for those people NOT in the enterprise, and who probably won't be running more than 8-24 cores or so.

On a related note, changing the amount of RAM seems to only change the amount of RAM, not any of the other numbers. I guess that's their point, but I would rather they just SAY that than make me try a bunch of options to show that RAM doesn't matter. :|


I'm surprised these guys haven't been sued by Microsoft yet. I think the founder himself worked on a similar project at MSFT called Hekaton and moved out. At least that is what I heard from a couple of devs he tried to recruit out of SQL Server. I'd advise startups to be wary of jumping ship.


I blogged about this in some detail this morning.

http://www.dbms2.com/2013/04/23/memsql-scales-out/


Nice product. I'll try this out.

It seems like the isolation level is Read Committed. Are there plans to support higher isolation levels (Serialisability)?


This is in the works. It's tricky to do on a cluster and has perf implications, but it is doable and we have a design for it.


Looks great!

Quick question: since data is uniformly spread across n leaf nodes, do queries that require checking a number of rows >> n hit nearly every leaf? If so, does this create latency problems when n is large? (since it'd only take one slow request out of n to cause high latency)


Yes, we will fan out queries when necessary. In a hardcore OLTP workload this will affect latency across the system by flooding the network (if you send too many of these queries at once), but we expose knobs that let you limit these queries' parallelism to keep the rest of the system really fast.
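The tail-latency worry in the question above can be sketched numerically: a fan-out query waits for its slowest leaf, so its latency is the max over n per-leaf latencies, and for exponential-ish tails that max grows roughly with log n. A toy simulation (the names and the exponential model are illustrative assumptions, not MemSQL internals):

```python
import random

def fanout_latency_ms(n_leaves, mean_ms=1.0, seed=42):
    """One fan-out query waits for every leaf, so its latency is the
    max over n_leaves per-leaf latencies (modeled here as exponential)."""
    rng = random.Random(seed)
    return max(rng.expovariate(1 / mean_ms) for _ in range(n_leaves))

def expected_max_ms(n_leaves, mean_ms=1.0):
    """For i.i.d. exponential latencies, E[max] = mean * H_n (the n-th
    harmonic number) -- i.e., tail latency grows like log n."""
    return mean_ms * sum(1 / k for k in range(1, n_leaves + 1))
```

So going from 1 leaf to 100 leaves costs roughly a 5x hit on expected query latency in this model, which is why capping the parallelism of fan-out queries helps protect the rest of the workload.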


The problem with scaling out across many cores with a focus on RAM is that with larger datasets you end up trading disk latency for network and protocol latency. I am not sure that is a great trade, even if we are talking about Fibre Channel as a medium.


I have to disagree; disk is ancient (it's mechanical, egads!) while 10GigE is pretty commonplace now, and InfiniBand and Fibre Channel are even faster.

From my CS 101 takeaways: there are only three bottlenecks in a computer system: CPU, network, and I/O.

It looks like MemSQL is fixing the CPU and I/O bottlenecks, but physics is physics, so the network is a pure hardware problem, haha.


The problem is that you can end up with larger latency over the network because it still takes a fixed amount of time for nodes to communicate. Even with a 1TB/s link between nodes you can still have a good 30ms between them all, adding even more latency. That can be mitigated somewhat by a good protocol that manages latency properly (e.g., not blocking while waiting on ACKs and such), but it can still end up with far more latency than a few large disks would have (even better now with SSDs). That said, I do imagine that some datasets will benefit from this kind of topology (I can imagine that geospatial stuff will do well with it, since you can locate physically close things on a single machine and reduce the amount of talking needed).
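For concreteness, here is a back-of-envelope comparison using commonly cited order-of-magnitude latency figures (assumptions, not measurements from any of the systems discussed):

```python
# Rough latency figures (order-of-magnitude assumptions, not measurements).
DISK_SEEK_MS = 10.0   # mechanical disk seek
SSD_READ_MS = 0.1     # SSD random read
DC_RTT_MS = 0.5       # same-datacenter network round trip

def sequential_round_trips_ms(count, rtt_ms=DC_RTT_MS):
    """Blocking on each ACK makes round trips add up linearly; a
    pipelined protocol keeps the total closer to a single RTT."""
    return count * rtt_ms
```

On those numbers, roughly twenty sequential round trips cost about as much as one disk seek, which is why the protocol design (pipelining vs. blocking on each ACK) matters at least as much as the raw link speed.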


30ms? In anything resembling a modern datacenter? 0.3-0.5ms is more typical these days.


He was joking.


How does this compare to kdb+? This seems like a much less arcane competitor.


kdb+ is compressed columnar in memory on a single box, with a very exotic language called Q. MemSQL is row-based in memory across n machines, using SQL.
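The layout difference can be sketched in a few lines of Python (toy data; neither system's actual on-memory format):

```python
# Row store: each record's fields sit together -- good for OLTP-style
# point lookups and updates, the shape MemSQL's row format targets.
rows = [
    {"ts": 1, "sym": "AAPL", "px": 150.0},
    {"ts": 2, "sym": "MSFT", "px": 300.0},
]

# Column store: each field is one contiguous array -- good for the
# scan-heavy time-series analytics kdb+ is typically used for.
cols = {
    "ts":  [1, 2],
    "sym": ["AAPL", "MSFT"],
    "px":  [150.0, 300.0],
}

def avg_px_row(rows):
    # Must touch every record even though only one field is needed.
    return sum(r["px"] for r in rows) / len(rows)

def avg_px_col(cols):
    # Reads a single contiguous array; also compresses well.
    px = cols["px"]
    return sum(px) / len(px)
```

Same answer either way, but the column version only reads the one array the aggregate needs, which is the core of kdb+'s advantage on analytical scans.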


kdb+ is also very fast for real-time time series analysis and signal generation. Seeing Morgan Stanley and Credit Suisse in the customer list made me wonder if memsql could become a competitor in the niche that kdb+ currently dominates?


Yes, if we note that there are core niches where nobody will replace kdb+ any time soon.


Great design on the site! Congrats to the memsql team


Why is it that every company that has a blog either a) doesn't link from the blog to their main product site, or b) buries the link?


Often because the blog is run on a separate software platform from the main site.


Still no excuse.

I've educated multiple vendors on the subject. E.g., see the last point on http://www.strategicmessaging.com/marketing-communications-t... :)


Where is the source code?


It's commercial.


What are the major feature updates in this release compared to the last one?


There are a lot of new features! Beyond distributing data across multiple machines in a cluster, there's more SQL surface area, multiple levels of redundancy for HA, and a distributed query optimizer. There's some cool stuff with bi-directional lock-free skiplists for indexes, too.




