Hacker News

I worked on building a database myself, and I must say that querying 100TB fast, let alone storing 100TB, is a genuinely hard problem. A few companies have no real choice but to use a DB that handles 100TB. With small data you have plenty of options; with large data you have very few. So it's reasonable to compete on how fast a DB can query 100TB even while being slow on a mere 10GB: some databases are designed only for large data and shouldn't be used when your data is small.



The larger your data, the more building and maintaining indexes hurts you. That's why these systems do much better on large datasets than on small ones. It's all about trade-offs.
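A minimal sketch of that trade-off, using SQLite purely as an illustration (table names, row count, and schema are all arbitrary choices for this example, not anything from the thread): every write to an indexed table also has to update the index, which is exactly the maintenance cost that grows with data size.

```python
# Illustrative sketch, not a benchmark: inserts into an indexed table
# must also update the index, so writes do strictly more work.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE plain (k INTEGER, v TEXT)")
cur.execute("CREATE TABLE indexed (k INTEGER, v TEXT)")
cur.execute("CREATE INDEX idx_k ON indexed (k)")

rows = [(i, f"row-{i}") for i in range(100_000)]

t0 = time.perf_counter()
cur.executemany("INSERT INTO plain VALUES (?, ?)", rows)
t_plain = time.perf_counter() - t0

t0 = time.perf_counter()
cur.executemany("INSERT INTO indexed VALUES (?, ?)", rows)
t_indexed = time.perf_counter() - t0

# The indexed table pays a per-insert maintenance cost for idx_k;
# in exchange, point lookups on k become much cheaper later.
print(f"plain: {t_plain:.3f}s  indexed: {t_indexed:.3f}s")
```

Exact timings vary by machine; the point is only that index maintenance is overhead that writes pay continuously, which is why big-data warehouses often skip fine-grained indexes.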

To compensate, they make heavy use of caching; if the small data is frequently accessed, performance is generally good enough for most use cases.
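The caching idea above can be sketched as a small bounded LRU cache in front of a slow store. This is a generic sketch, not any particular product's implementation; `slow_lookup` is a hypothetical stand-in for an expensive warehouse query.

```python
# Sketch: keep hot, small results in a bounded LRU cache so repeated
# lookups skip the slow backing store entirely.
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, loader):
        if key in self.data:
            self.data.move_to_end(key)  # cache hit: mark as recently used
            return self.data[key]
        value = loader(key)             # cache miss: hit the slow store
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
        return value


calls = []                  # records each time the slow store is touched
def slow_lookup(key):
    calls.append(key)
    return key * 2

cache = LRUCache(2)
cache.get(1, slow_lookup)   # miss -> slow store
cache.get(1, slow_lookup)   # hit  -> no slow call
cache.get(2, slow_lookup)   # miss
cache.get(3, slow_lookup)   # miss; capacity 2, so key 1 is evicted
```

For frequently accessed small data, nearly every request becomes a cache hit, which is why the overall performance stays acceptable even on engines tuned for huge scans.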


Did anyone else notice the surge of brand-new accounts appearing in these Databricks discussions with pro-Databricks opinions?

If we had access to the posters' IP addresses, I'd sure be interested in looking at the correlation among them.


What about my comment above is pro-Databricks? Snowflake works the same way, as do most large-scale data warehouses: Exadata, Netezza, etc.

Does anyone else notice people questioning common sense?


With most people working from home, I'm not sure this heuristic works.

Disclaimer: I work for Databricks, but not on Spark, and this is my first time posting in this thread.



