Ursa Labs: an innovation lab for open source data science (wesmckinney.com)
170 points by stablemap on April 19, 2018 | 12 comments



Just to be clear, the author is the creator and main maintainer of Python's pandas. He knows what he is talking about. This is great news.

Hope they get a lot of funding.


I'm not the main maintainer anymore (Jeff Reback has that honor), as I've moved on to work on the "greater pandas ecosystem" (my work on Apache Arrow is part of this long development arc).


Super excited. I just emailed the address on the site to ask how I can help technically in the future!


It sounds like with RStudio and Two Sigma they've got startup capital at least. I hope others join them in funding this important work!


I'm a little confused about what the product is. They are using Apache Arrow to build...something. How would this stuff benefit me as an R programmer?


It would change things primarily under the hood for table-like data structures. Imagine having a data.frame or tibble object that is stored in memory in a very specific layout, which allows it to (a) use existing, highly optimized library code for complicated queries and (b) be exchanged with other programs with almost zero overhead. For example, imagine running a query on a massive database in Spark and then loading the result into R or pandas instantly, without having to wait.
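To make the layout idea a bit more concrete, here is a rough sketch using only the Python standard library, not Arrow itself. The names (`rows`, `ids`, `prices`, `column_sum`) are made up for illustration, and Arrow's real format is considerably more involved than this. It only shows the two properties mentioned above: a column held as one contiguous typed buffer, and a consumer reading that buffer without copying it.

```python
from array import array

# Row-oriented layout: every record is its own Python object,
# so the values of one column are scattered across memory.
rows = [(1, 2.5), (2, 4.0), (3, 1.5)]

# Column-oriented layout (illustrative): each column is a single
# contiguous, typed buffer.
ids = array("q", [row[0] for row in rows])     # 64-bit signed integers
prices = array("d", [row[1] for row in rows])  # 64-bit floats

# A consumer can look at the column's memory without copying it.
view = memoryview(prices)

def column_sum(buf):
    """Hypothetical consumer: scan one contiguous column."""
    return sum(buf)

total = column_sum(view)

# Writing through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
prices[0] = 10.0
shared = view[0]
```

In this toy version the "other program" is just another function in the same process. As I understand it, the point of Arrow is to standardize the layout so that different runtimes can agree on how to read the same bytes.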


Thanks. Do you think there is a use case for traditional "small data" analysis in R, i.e. standard CSV->dplyr->output sort of transformations?


It should make things a bit faster, and a bit easier to collaborate with people using other languages.


If the author comes in here, let me say: thank you for Feather. I am currently using that for a project on very slow servers and it is helping me quite a bit.

My question is this: as the creator of Vue.js has done, could Ursa Labs go the Patreon route and let individuals, academics, or corporate clients simply donate to fund ongoing support and the production of great tools for data science?


Very happy to see Apache Arrow gaining support! It'd be very cool to see modern columnar storage architecture gain widespread adoption: there are decades of research behind it, and it offers massive improvements for analytical workloads.


I'm really happy to hear this. The vision for Arrow is solid, and I'm looking forward to a future where C++, Python, and other languages like Rust can interoperate more smoothly in an integrated analysis workflow. I hope Wes and Ursa Labs will be a major factor in developing the way we approach data science over the next 10+ years.


This is very exciting, thanks Wes! A robust, high-performance, polyglot, in-memory runtime for data(frames) would be extremely useful.

The remarks about challenges related to OSS maintenance/innovation were spot on too.



