Ursa Labs: an innovation lab for open source data science

neves · on April 19, 2018

Just to be clear, the author is the creator and main maintainer of Python's pandas. He knows what he is talking about. This is great news.

Hope they get a lot of funding.

wesm · on April 19, 2018

I'm not the main maintainer anymore (Jeff Reback has that honor), as I've moved on to work on the "greater pandas ecosystem" (my work on Apache Arrow is part of this long development arc).

skadamat · on April 19, 2018

Super excited. Just reached out to the email on the site about how I can help technically in the future!

saltandvinegar · on April 20, 2018

It sounds like with RStudio and Two Sigma they've got startup capital at least. I hope others join them in funding this important work!

psychometry · on April 19, 2018

I'm a little confused about what the product is. They are using Apache Arrow to build...something. How would this stuff benefit me as an R programmer?

makmanalp · on April 19, 2018

It would change things primarily under the hood for table-like data structures. Imagine having a data.frame or tibble object that is stored in memory in a very specific layout, which allows it to a) make use of existing, highly optimized library code for complicated queries, and b) be interchanged with other programs with almost zero overhead. For example, imagine running a query on a massive database in Spark and then loading the result into R or pandas instantly, without having to wait.
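
To make the layout idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Arrow and does not use Arrow's API; the sample data and variable names are made up for illustration. It only shows the two properties described above: each column lives in one contiguous typed buffer, and another consumer can view that buffer without copying it.

```python
from array import array

# Row-oriented: one Python tuple per record (hypothetical sample data).
rows = [(1, 2.5), (2, 4.0), (3, 1.5), (4, 8.0)]

# Column-oriented: each column is one contiguous, typed buffer.
ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
prices = array("d", (r[1] for r in rows))  # 64-bit floats

# A consumer can view the same bytes without copying them.
shared = memoryview(prices)

# Scanning one column touches only that column's buffer.
total = sum(shared)

# A write to the original is visible through the view,
# which shows that the view did not make a copy.
prices[0] = 3.5
same_memory = shared[0] == 3.5
```

Arrow goes much further than this (a standardized format shared across languages and processes), but the sketch shows why a fixed columnar layout can make both scans and hand-offs cheap.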

psychometry · on April 19, 2018

Thanks. Do you think there is a use case for traditional "small data" analysis in R, i.e. standard CSV->dplyr->output sort of transformations?

hadley · on April 19, 2018

It should make things a bit faster, and a bit easier to collaborate with people using other languages.

WhompingWindows · on April 19, 2018

If the author comes in here, let me say: thank you for Feather. I am currently using that for a project on very slow servers and it is helping me quite a bit.

My question here is this: as the creator of Vue.js has done, could Ursa Labs go the way of Patreon, and have potential personal, academic, or corporate clients simply donate to ensure contributed support and production of great tools for data science?

makmanalp · on April 19, 2018

Very happy to see Apache Arrow gaining support! It'd be very cool to see modern columnar storage architecture gain widespread adoption: there are decades of research behind it, and it brings massive improvements for analytical workloads.

tmandry · on April 19, 2018

I'm really happy to hear this. The vision for Arrow is solid, and I'm looking forward to a future where C++, Python, and other languages like Rust can interoperate more smoothly in an integrated analysis workflow. I hope Wes and Ursa Labs will be a major factor in developing the way we approach data science over the next 10+ years.

vtuulos · on April 19, 2018

This is very exciting, thanks Wes! A robust, high-performance, polyglot, in-memory runtime for data(frames) would be extremely useful.

The remarks about challenges related to OSS maintenance/innovation were spot on too.