From the announcement:
“As of now, we have mined 1,580 PySpark tests from the Spark codebase, among which 838 (53.0%) are successful on Sail. We have also mined 2,230 Spark SQL statements or expressions, among which 1,396 (62.6%) can be parsed by Sail”
Kinda early to call this a drop-in replacement with those numbers, no?
But with enough parity, this project could be a dream for anybody dealing with Spark’s dreadful performance. Kudos to the team.
The next paragraph explains that: "When looking at the test coverage numbers alone, Sail’s capability may seem limited. But we have found that there is a long tail of failed tests due to formatting discrepancies, edge cases, and less-used SQL functions, which we will continue tackling in future releases."
I am with you that it is still very very early. I'll personally keep an eye on the project.
I'll keep an eye on it too, but for a query engine, formatting compliance and edge cases tend to be almost all of the work. It's easy to implement SELECT x FROM y WHERE z.
Yeah, but the website literally says “zero code changes”. It’s the long tail that’s dangerous, since most people don’t understand it as well as the core functions.