Vitesse X – A PostgreSQL extension module for faster queries (vitessedata.com)
56 points by onderkalaci on May 25, 2015 | 20 comments



Is the confusion with vitess intentional? https://github.com/youtube/vitess


Vitesse means speed in French :)


Yup, I should know since I'm French; I just thought it wasn't nice to name two very close database "accelerators" the same way…


It is not intentional. :-)


Having a look here, looks like the more esoteric (for most workloads) data types aren't supported: http://vitessedata.com/vitesse-x-doc


It's quite refreshing to see that it explicitly calls out data types not supported.


(ck from vitesse data) we will support many of those types eventually, and on demand. It will take time.


I am a happy postgres user for years, nothing special, just for personal projects. This site tells me so few things and sounds so flawless, that I am not sure if it is a parody or actually a product. Why would some random software be able to make my pg 8 times faster or more?


Check out the Tech and Company pages as well as the slides from the recent South Bay PostgreSQL Meetup: http://goo.gl/Mtg2W6

The founders both have strong academic and commercial experience and the advisor is Prof. Joseph Hellerstein (he has both an excellent academic reputation and good experience in industry/tech transfer).

Note that TPCH is a "decision support benchmark". There are other technologies for helping postgres with these workloads as well: a column-store approach like https://github.com/citusdata/cstore_fdw, or the recent work on parallel sequential scans in Postgres core (http://rhaas.blogspot.com/2015/03/parallel-sequential-scan-f...), etc.


Did you bother to click on any of the links, or did you just glance at the "news" page? Almost half of the links on the page ("technology", "check it out", "100% Postgres, 100X Faster for Analytics", or "(slides)") explain how it works (and it makes sense, although I'd personally be very wary of installing this, since I don't particularly trust that it gets the implementation "correct").


I did read the Technology page but it sounded so much like buzzword bingo that I really was not sure. Thanks!


You might need to update your about section on HN:

> "In an attempt to avoid losing my entire life to this website, I no longer comment as often as I used to, and, when I do, unless it is related to a topic (iOS jailbreaking) where it is part of my "job" to respond, I make it something of a policy to not look at things people say in response until at least a month later."


The 8x improvement is somewhat misleading as it is based on a single TPC-H query. The other queries from the benchmark seem to show a ~3x improvement: http://vitessedata.com/technology

Given that their query engine uses an LLVM JIT compared to the standard PostgreSQL interpreter, these results are very reasonable. This is similar to comparing an interpreter vs a JIT for a programming language.
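To make the interpreter-vs-JIT comparison concrete, here is a toy sketch in Python. It is not Vitesse's or PostgreSQL's code: nested closures stand in for the machine code an LLVM JIT would emit, and the expression format and the `price * qty < 100` predicate are made up for illustration. The point is only that the interpreter re-inspects every node of the expression tree for every row, while the compiled form pays that dispatch cost once per query.

```python
import operator

# Expression tree as nested tuples: ("col", name), ("const", value),
# or (op, left, right) with op taken from OPS. This format is hypothetical.
OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}

def interpret(expr, row):
    """Tree-walking evaluation: dispatches on node type again for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def compile_expr(expr):
    """Translate the tree into nested closures once, before scanning rows."""
    kind = expr[0]
    if kind == "col":
        name = expr[1]
        return lambda row: row[name]
    if kind == "const":
        value = expr[1]
        return lambda row: value
    fn = OPS[kind]
    left, right = compile_expr(expr[1]), compile_expr(expr[2])
    return lambda row: fn(left(row), right(row))

# Hypothetical predicate: price * qty < 100
predicate = ("<", ("*", ("col", "price"), ("col", "qty")), ("const", 100))
rows = [{"price": 5, "qty": 10}, {"price": 30, "qty": 4}]

compiled = compile_expr(predicate)  # done once per query
results_interp = [interpret(predicate, r) for r in rows]
results_compiled = [compiled(r) for r in rows]
```

Both paths return the same answers; a real JIT goes much further by emitting native code specialized to the column types, which is where the larger speedups would come from.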

Even with the best intentions, benchmarking a database is notoriously difficult. I work on Presto, a distributed analytic SQL database, and we are often asked how it compares to other systems, or how it will perform for a given workload. The answer is always "try it and see", both because how it performs for your configuration and workload is the only thing that matters, and because there are so many variables that factor into the result.


Yes, the technology page calls out that on average it is 3.7X faster with just LLVM and algorithmic improvements. We are trying to push more value into Vitesse X by adding vector processing, and maybe include some of our threading work in the next version.


> Why would some random software be able to make my pg 8 times faster or more?

Peter Boncz (creator of VectorWise, HyPer) says it best here: http://www.tpc.org/tpctc/tpctc2013/slides_and_papers/005.pdf

Remember also that the postgres planner and execution machinery is ANCIENT. Plan scoring still assumes Disk I/O = 10 MemoryAccess = 100 CPU instructions. Modern CPUs do SIMD and vector processing, but the PG planning/exec machinery isn't vector aware. There's a 200X difference between L1 cache and main memory access that the PG planner can't see. And what about SSDs and NVMe vs disk? Spark gains its speed through caching intermediate result sets along with the ability to reconstruct them. But postgres doesn't cache intermediate results or query plans, last I checked. Plus a million other things.
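For reference, the plan scoring mentioned above is driven by a handful of fixed constants. The sketch below uses PostgreSQL's documented default values for them and a simplified form of the sequential scan estimate; the function name and the table sizes are assumptions for illustration, and the real cost code has more terms than this.

```python
# PostgreSQL's default planner cost constants, in abstract units where
# fetching one page sequentially costs 1.0.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def seq_scan_cost(pages, tuples, quals=1):
    """Simplified sequential scan estimate: I/O cost plus per-tuple CPU cost.

    Hypothetical helper: each tuple is charged CPU_TUPLE_COST plus one
    CPU_OPERATOR_COST per WHERE-clause operator evaluated against it.
    """
    io_cost = SEQ_PAGE_COST * pages
    cpu_cost = (CPU_TUPLE_COST + CPU_OPERATOR_COST * quals) * tuples
    return io_cost + cpu_cost

# Made-up table: 1,000 pages holding 100,000 tuples, one filter operator.
estimate = seq_scan_cost(pages=1000, tuples=100_000)

# Under the defaults, one sequential page read is priced like processing
# 100 tuples, regardless of cache levels, SIMD width, or storage medium.
page_to_tuple_ratio = SEQ_PAGE_COST / CPU_TUPLE_COST
```

These constants are settable per installation (for example, lowering `random_page_cost` on SSDs), but nothing in the formula models cache hierarchy or vectorized execution, which is the gap the comment is pointing at.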

Postgres is wonderful, but not because it has natively fast analytics. 8X is nothing. 180X is better, but not as fast as Vertica, Aster, Greenplum, et al. I say kudos to Citus and Vitesse for leading the way.


> Why would some random software be able to make my pg 8 times faster or more?

I think we've already seen plenty of evidence that for OLAP type queries, there is a TON of room available for improvement vs. off the shelf PostgreSQL.


This feels a little like 'snake oil' to me - a little short on real details and the idea of using licensed software especially with my database server does not sound like something that I want to go back to.


probably not snake oil; and 180x faster on tpch-1 is nowhere near physics speed. but these guys are just starting out. i'm sure they will go even faster once they pull out the stops.

the nice thing about their model is that it's a pure accelerator play. plug it in and go faster, pull it out, or replace it with something else. in theory, there should be no change to the application/query layer. It should be as simple as entering "create/drop extension vitessedata;"


Sorting will be a lot faster on PostgreSQL 9.5:

http://pgeoghegan.blogspot.com/2015/01/abbreviated-keys-expl...

http://pgeoghegan.blogspot.com/2015/04/abbreviated-keys-for-...

I wonder whether and to what extent Vitesse benefits from using this technique, which is fairly well known (e.g. it appears in a 1994 paper by Gray -- Alphasort).

As I go into in the second link there, someone reported an order of magnitude increase in sort performance in one case.

For what it's worth, I doubt that this fully accounts for why Vitesse is faster. I don't know that I'd trust my data to it, but I don't think it's snake oil, even if some of the figures shown are arguably a bit misleading.
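For anyone unfamiliar with the abbreviated keys idea, here is a minimal sketch in Python. It assumes plain byte-order comparison and an 8-byte abbreviation; the function names are made up, and PostgreSQL's actual text implementation differs (it has to respect the collation, for one). Each key is reduced to a fixed-width integer built from its leading bytes, so most comparisons are cheap integer comparisons, and the full key is only consulted when two abbreviations tie.

```python
def abbreviate(key: bytes, width: int = 8) -> int:
    """Pack the first `width` bytes into an integer that preserves byte order.

    Short keys are right-padded with zero bytes, so two keys can share an
    abbreviation even when they differ; the caller must then tie-break.
    """
    return int.from_bytes(key[:width].ljust(width, b"\x00"), "big")

def abbreviated_sort(keys):
    """Sort byte strings by (abbreviation, full key).

    Tuple comparison looks at the integer first and falls back to the full
    byte string only when the integers are equal.
    """
    decorated = [(abbreviate(k), k) for k in keys]
    decorated.sort()
    return [k for _, k in decorated]

# Made-up sample: the two long keys share their first 8 bytes, so they are
# resolved by the full comparison; the others by the integer alone.
words = [b"postgresql-9.5", b"vitesse", b"postgresql-9.4", b"pg", b"alphasort"]
result = abbreviated_sort(words)
```

The win in a real sort comes from the abbreviation fitting in a machine word next to the tuple pointer, so the common case avoids both a pointer chase and an expensive string comparison.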


(ck from vitesse data) We haven't exploited this technique mentioned by Peter. We will when 9.5 comes out, and we are looking at it to see if we can put it into our 9.4 extension so people can benefit from it before 9.5. The technique is sound, which should make it go even faster as plans generated by Postgres seem to be partial towards sorts.

As for the trust element, yeah, we know it would take time to earn that in the market place.



