Pandas (Python data analysis toolkit) version 0.8.0 released

choffstein · on June 29, 2012

wesm, I can't thank you enough for this library.

I own and operate a quantitative finance business. Pandas (+ numpy) has been a godsend. Not only do I not have to pay for matlab licenses, but even the less experienced programmers on my team have been insanely productive.

Thank you.

skierscott · on June 29, 2012

What do you see as the advantages of Python/pandas/numpy/etc. over matlab? Do you use any toolboxes?

choffstein · on July 1, 2012

Basically, pandas, numpy, and matplotlib give me everything I could have wanted out of matlab from a numerical capabilities and graphing perspective.

In my opinion, matlab's excellent object inspection and debugging capabilities can be replaced with strict testing standards in your code-base.

On top of that, I get to use a whole slew of libraries that are not math-related -- frameworks for web services, accessing ftp servers, sending e-mails -- a lot of automated "utility" stuff.

And it is all "free". Fantastic.

hogu · on June 30, 2012

Not the person you're responding to, but I also made the change in the financial industry for a variety of reasons.

But the main one for me was that python does non-math things much better than matlab. Since python is a general purpose language, you can go from analysis to production application much faster, whereas with matlab it usually involved getting a software developer to rewrite it in java.

We used to take our python analysis code, wrap it up in a web app, and then use that to serve risk information to traders, and it was quite easy to do so.

b_emery · on June 29, 2012

Sounds like an interesting business. The link in your profile seems to be invalid, fyi.

choffstein · on July 1, 2012

It was my personal site that I recently "scrubbed" off the web. I updated my profile for my business site. Thanks for the reminder.

monk_the_dog · on June 29, 2012

Pandas is outstanding, I love it.

This may not be the place for this, but...

I just built pandas 0.8.0 and it would not build with MinGW 0.5. The problem is -mno-cygwin is no longer recognized by MinGW's gcc. My solution was to edit distutils/cygwinccompiler.py and remove references to -mno-cygwin. THIS IS NOT PANDAS'S FAULT! I just thought I'd point it out in case other people run into the same problem.

dbecker · on June 29, 2012

Wes: Thanks!

Anyone reading this who wants to get started with Pandas: The early release of "Python for Data Analysis" (http://shop.oreilly.com/product/0636920023784.do) is already very helpful.

etrain · on June 29, 2012

This release looks like a great upgrade to a great library. The idea of using python as my "one language" is really appealing, but I still find myself falling back on R pretty consistently when it comes to data manipulation/analysis. As pandas matures I see myself doing this less and less.

Thanks Wes and everyone else who pitched in!

wesm · on June 29, 2012

I'd be interested to see some of your R use cases where you perceive that things could be improved in pandas; a year ago there were lots of things you couldn't do, but a lot has changed :) Nowadays, the tables have turned and there are lots of things you can do in pandas that are nearly impossible to do in a non-kludgy way in R (particularly many things with hierarchical indexing).
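For readers who haven't seen hierarchical indexing, here is a minimal sketch of the kind of thing meant above. The months, tickers, and return values are made up for illustration, and the calls are written against a current pandas release rather than 0.8.0 specifically.

```python
import pandas as pd

# Hypothetical two-level index: (month, ticker). All values are invented.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-06", "AAA"),
        ("2012-06", "BBB"),
        ("2012-07", "AAA"),
        ("2012-07", "BBB"),
    ],
    names=["month", "ticker"],
)
returns = pd.Series([0.01, 0.02, -0.01, 0.03], index=index)

# Select everything under one outer-level key; the result is indexed by ticker.
june = returns.loc["2012-06"]

# Pivot the inner level into columns: one row per month, one column per ticker.
wide = returns.unstack("ticker")

# Aggregate across months for each ticker by grouping on an index level.
mean_by_ticker = returns.groupby(level="ticker").mean()
```

Selecting, reshaping, and aggregating all key off the same index levels, so there is no need to carry separate "month" and "ticker" columns around.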

pietoastfox · on July 1, 2012

Wes - I think that the reason I would keep using R over pandas is all the packages in the R universe. Which I suppose is the reason why you would use pandas over R if you had more experience with python.

E.g. ggplot2 still seems to be quite a bit better than matplotlib. Also for the random data examination/sketching I absolutely love rstudio due to its integrated help/plotting/file browsing.

etrain · on July 3, 2012

I guess it's been about 9 months since I really put pandas through its paces, so I'll take another look. IIRC, last time I really tried to do a project with pandas, I found some typing/data transformation issues to be the things that held me back most. If I find some time in the next couple of weeks I'll try to put together some concrete examples.

ig1 · on June 30, 2012

Can someone give a rundown of how Pandas compares with numpy?

redstripe · on June 30, 2012

It's intended to be a library for working with data stored in multidimensional in-memory tables. Think loading a .csv file or relational table into memory, performing some transformations, adding columns, merging with other tables, and grouping data, filtering, sorting, all while handling missing data gracefully.

Maybe I would describe it as combining a spreadsheet with SQL data transformation capabilities - but better.

It requires numpy because it uses ndarray as its underlying data structure and you can also use many/most? of the numpy data analysis functions.
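A rough sketch of the workflow described above (load a CSV, merge, add a column, group, filter, sort, with a missing value along the way). The tables, column names, and numbers are all hypothetical, and the calls assume a current pandas release, not necessarily 0.8.0.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the blank price for BBB becomes a missing value (NaN).
csv_text = """ticker,sector,price
AAA,tech,10.0
BBB,tech,
CCC,energy,30.0
"""
prices = pd.read_csv(io.StringIO(csv_text))

# A second hypothetical table, merged on a shared key much like a SQL join.
shares = pd.DataFrame(
    {"ticker": ["AAA", "BBB", "CCC"], "shares": [100, 200, 300]}
)
merged = prices.merge(shares, on="ticker")

# Add a derived column; the missing price propagates as NaN instead of raising.
merged["value"] = merged["price"] * merged["shares"]

# Group and aggregate; sum() skips NaN by default.
by_sector = merged.groupby("sector")["value"].sum()

# Filter rows (comparisons against NaN are False), then sort descending.
top = merged[merged["value"] > 0].sort_values("value", ascending=False)
```

The underlying columns are numpy arrays, so numpy functions can generally be applied to them as well, e.g. `np.log(merged["price"])`.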

phren0logy · on June 30, 2012

It requires numpy. Check out the linked article...

petergx · on June 30, 2012

Awesome to see this release. Great stuff for timeseries data analysis among other things. Thanks Pandas!