Hacker News | drosenthal's comments

- Northern Virginia based database startup in stealth.

- Founded by engineers with prior >$50MM exit.

- A very challenging and exciting project.

- We are looking for self-motivated, smart software engineers.

- Specific skills of interest: Systems programming, C++, engineering for high performance, asynchronous/distributed programming, data structures.

Contact: info at foundation-d-b dot com [with no dashes]

Update: Not hiring remote employees at this time.


I designed and built high-end web analytics software for years.

Sampling your data by visitor/cookie (not by page view) is the number one thing to do that no one does. Just sample your data down by a factor of 10 or 100. The costs drop by a factor of 10 or 100, query speeds improve, and the business value of the data drops by very, very little.

The only unfortunate prerequisite for this approach is that you need a company/boss that understands that sampled numbers are OK even if they are not exact. (So, explain, if you must, that your industry insider source tells you that the notion of 'exact' in the web analytics world is very loose indeed.)
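
To make the visitor-versus-page-view distinction concrete, here is a minimal Python sketch of one way to sample by visitor. It is an illustration under assumptions, not the method from any particular analytics product: the names `in_sample`, `sample_hits`, the `visitor_id` field, and the salt string are all hypothetical. The idea is to hash the cookie/visitor ID and keep a visitor only if the hash lands in one bucket out of `rate`, so every hit from a kept visitor survives and sessions stay intact.

```python
import hashlib


def in_sample(visitor_id: str, rate: int = 10, salt: str = "sample-v1") -> bool:
    """Return True for roughly 1 in `rate` visitors, always the same ones.

    Hashing the visitor ID (rather than rolling a die per hit) means the
    decision is stable across page views, servers, and days.
    """
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % rate
    return bucket == 0


def sample_hits(hits, rate: int = 10):
    """Keep every hit from sampled visitors; drop all hits from the rest."""
    return [hit for hit in hits if in_sample(hit["visitor_id"], rate)]


if __name__ == "__main__":
    # Hypothetical hit log: 1,000 visitors with 3 page views each.
    hits = [
        {"visitor_id": f"cookie-{v}", "page": f"/page/{p}"}
        for v in range(1000)
        for p in range(3)
    ]
    kept = sample_hits(hits, rate=10)
    # Scale sampled counts back up by the sampling factor to estimate totals.
    print("stored hits:", len(kept), "estimated total hits:", len(kept) * 10)
```

Changing the salt picks a different set of visitors, which can be handy if you want to check that a conclusion does not depend on one particular sample.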


Yep. For queries with large volumes of data, Analytics does do sampling and gives you error margins (+/- x%) next to each number. I found that having these margins helps convince higher-ups that it's OK, and also tells me when it's not OK and I have to dig a bit more.
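
For a rough sense of where a margin like that comes from, here is a small Python sketch. It is not how Google Analytics computes its figures (that is not described here); it is the textbook normal approximation for scaling up a visitor count, assuming each visitor was kept independently with a known probability. The function name `estimate_with_margin` and its parameters are hypothetical.

```python
import math


def estimate_with_margin(sampled_count: int, sample_fraction: float, z: float = 1.96):
    """Scale a sampled visitor count up and attach an approximate margin.

    Assumes each visitor was kept independently with probability
    `sample_fraction`. Returns (estimate, margin), where margin is the
    half-width of an approximate 95% interval when z = 1.96.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    estimate = sampled_count / sample_fraction
    # Plug-in standard error of the scaled-up count under Bernoulli sampling.
    std_err = math.sqrt(sampled_count * (1 - sample_fraction)) / sample_fraction
    return estimate, z * std_err


if __name__ == "__main__":
    estimate, margin = estimate_with_margin(10_000, 0.1)
    print(f"{estimate:.0f} visitors, +/- {100 * margin / estimate:.1f}%")
```

This applies to counts of visitors. Page-view totals from a visitor-level sample vary more than this formula suggests, because page views come in clumps per visitor, so treat the result as a lower bound on the uncertainty for those metrics.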


I think drosenthal is talking about sampling the number of hits you send to analytics. GA's tracking api supports this:

http://code.google.com/apis/analytics/docs/gaJS/gaJSApiBasic...


Sergei, you make some great points here. Thanks for posting.

The OP asked what Clustrix does that RAC does not. Leaving aside the interesting architecture for a second, do you believe that your database can scale to a higher aggregate writes-per-second load than Oracle RAC? Than Exadata? At a lower cost?


Absolutely. Yes to all three.


Exadata does use SSDs now, in the form of Sun's flash memory cards (though they are still backed by rotational disk). You're right, however, that this and Exadata look to attack the same problem in a similar way. If we believe Clustrix about the "100s of nodes" rather than the 20 they show data for, it would likely outperform Exadata by a bit too.

