
The problem with scaling out to multiple cores with a focus on RAM is that with larger datasets you end up trading disk latency for network and protocol latency. I am not sure that is a great trade even if we are talking about Fibre Channel as a medium.
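For a rough sense of that trade, here is a back-of-envelope comparison; the latency figures are illustrative assumptions (not measurements), so plug in numbers for your own hardware:

    # Illustrative, assumed per-request latencies in seconds (not measurements).
    hdd_seek       = 8e-3     # one random read on a spinning disk
    ssd_read       = 100e-6   # one random read on a SATA SSD
    dc_rtt         = 400e-6   # one round trip between nodes inside a datacenter
    proto_overhead = 100e-6   # serialization + request handling on the remote node

    remote_ram_fetch = dc_rtt + proto_overhead

    for name, t in [("local HDD seek", hdd_seek),
                    ("local SSD read", ssd_read),
                    ("remote RAM fetch", remote_ram_fetch)]:
        print(f"{name:18s} {t * 1e6:8.0f} us")

Under these assumptions a remote in-memory fetch beats a spinning-disk seek by an order of magnitude, but loses to a local SSD read, which is roughly the shape of the argument below.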



I have to disagree; disk is ancient - it's mechanical, egads! - while 10GigE is pretty commonplace now, and InfiniBand and Fibre Channel are even faster.

From my CS 101 takeaways: there are only three bottlenecks in a computer system: CPU, network, and IO.

Looks like MemSQL is fixing the CPU and IO bottlenecks, but physics is physics, so the network is purely a hardware problem haha


The problem is that you can end up with higher latency over the network, because it still takes a fixed amount of time for nodes to communicate. Even with a 1TB/s link between nodes you can still have a good 30ms between them all, adding even more latency. That can be mitigated somewhat by a good protocol that manages the latency properly (e.g. not blocking while waiting on ACKs and such), but it can still end up with far more latency than a few large disks would have (even better now with SSDs). That said, I do imagine that some datasets will benefit from this kind of topology (I can imagine that geospatial stuff will do well with it, since you can locate physically close things on a single machine and reduce the amount of talking needed).
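A minimal sketch of the "don't block on each ACK" idea, using Python's asyncio with a hypothetical fetch() coroutine standing in for a request to a remote node (the RTT value is an assumption): issuing the requests concurrently overlaps their round trips, so the wall-clock cost is roughly one round trip instead of one per request.

    import asyncio, time

    RTT = 0.0005  # assumed datacenter round trip per request, in seconds

    async def fetch(key):
        # Stand-in for a request to a remote node; the sleep models the round trip.
        await asyncio.sleep(RTT)
        return key

    async def sequential(keys):
        # Block on each reply: wall-clock time is roughly len(keys) * RTT.
        return [await fetch(k) for k in keys]

    async def pipelined(keys):
        # Issue everything up front, wait once: wall-clock time is roughly one RTT.
        return await asyncio.gather(*(fetch(k) for k in keys))

    for fn in (sequential, pipelined):
        start = time.perf_counter()
        asyncio.run(fn(range(100)))
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")

The same logic applies to a real wire protocol: as long as requests can be in flight simultaneously, per-hop latency amortizes instead of stacking up.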


30ms? In anything resembling a modern datacenter? 0.3-0.5ms is more typical these days.
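For anyone who wants to check that number on their own network, a quick sketch that estimates round-trip time by timing TCP connects (connect() completes after roughly one SYN/SYN-ACK round trip); the host and port here are placeholders for any peer in the same datacenter with a listening socket:

    import socket, statistics, time

    # Hypothetical peer in the same datacenter; any host with a listening port works.
    HOST, PORT = "10.0.0.2", 9000

    samples = []
    for _ in range(20):
        start = time.perf_counter()
        # connect() returns after roughly one SYN/SYN-ACK round trip.
        with socket.create_connection((HOST, PORT), timeout=1.0):
            pass
        samples.append(time.perf_counter() - start)

    print(f"median RTT estimate: {statistics.median(samples) * 1e3:.2f} ms")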


He was joking.



