I own and operate a quantitative finance business. Pandas (+ numpy) has been a godsend. Not only do I not have to pay for matlab licenses, but even the less experienced programmers on my team have been insanely productive.
Basically, pandas, numpy, and matplotlib give me everything I could have wanted out of matlab from a numerical capabilities and graphing perspective.
In my opinion, matlab's excellent object inspection and debugging capabilities can be replaced with strict testing standards in your code-base.
On top of that, I get to use a whole slew of libraries that have nothing to do with math -- frameworks for web services, accessing ftp servers, sending e-mails -- a lot of automated "utility" stuff.
Not the person you're responding to, but I also made the change in the financial industry, for a variety of reasons.
The main one for me was that python does non-math things much better than matlab. Since python is a general purpose language you can go from analysis to production application much faster, whereas with matlab it usually involved getting a software developer to rewrite it in java.
We used to take our python analysis code, wrap it up in a web app, and then use that to serve risk information to traders, and it was quite easy to do so.
I just built pandas 0.8.0 and it would not build with MinGW 0.5. The problem is that -mno-cygwin is no longer recognized by MinGW's gcc. My solution was to edit distutils/cygwinccompiler.py and remove the references to -mno-cygwin. THIS IS NOT PANDAS'S FAULT! I just thought I'd point it out in case other people run into the same problem.
This release looks like a great upgrade to a great library. The idea of using python as my "one language" is really appealing, but I still find myself falling back on R pretty consistently when it comes to data manipulation/analysis. As pandas matures I see myself doing this less and less.
I'd be interested to see some of your R use cases where you perceive that things could be improved in pandas; a year ago there were lots of things you couldn't do, but a lot has changed :) Nowadays, the tables have turned and there are lots of things you can do in pandas that are nearly impossible to do in a non-kludgy way in R (particularly many things with hierarchical indexing).
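For anyone who hasn't used hierarchical indexing, here is a rough sketch of the kind of thing I mean. The dates, tickers and prices are made up for illustration, and it is written against a recent pandas, so details may differ on older releases.

```python
import pandas as pd

# Made-up prices indexed by two levels: (date, ticker).
index = pd.MultiIndex.from_product(
    [["2012-06-28", "2012-06-29"], ["AAA", "BBB"]],
    names=["date", "ticker"],
)
prices = pd.Series([10.0, 20.0, 11.0, 19.0], index=index)

# Select everything for one date through the outer level.
one_day = prices.loc["2012-06-29"]

# Pivot the inner level into columns: one row per date, one column per ticker.
wide = prices.unstack("ticker")

# Aggregate over a single level of the index.
mean_by_ticker = prices.groupby(level="ticker").mean()
```

The same index supports selection, reshaping and aggregation without any manual key juggling.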
Wes - I think that the reason I would keep using R over pandas is all the packages in the R universe. Which I suppose is the reason why you would use pandas over R if you had more experience with python.
E.g. ggplot2 still seems to be quite a bit better than matplotlib. Also, for random data examination/sketching I absolutely love rstudio due to its integrated help/plotting/file browsing.
I guess it's been about 9 months since I really put pandas through its paces, so I'll take another look. IIRC, last time I really tried to do a project with pandas, I found some typing/data transformation issues to be the things that held me back most. If I find some time in the next couple of weeks I'll try to put together some concrete examples.
It's intended to be a library for working with data stored in multidimensional in-memory tables. Think loading a .csv file or relational table into memory, performing some transformations, adding columns, merging with other tables, grouping, filtering and sorting, all while handling missing data gracefully.
Maybe I would describe it as combining a spreadsheet with SQL data transformation capabilities - but better.
It requires numpy because it uses ndarray as its underlying data structure, and you can also use many/most? of the numpy data analysis functions.
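A minimal sketch of that workflow, in case it helps. The data, column names and threshold are all made up for illustration, and this assumes a recent pandas (method names have changed since the early releases).

```python
import io

import numpy as np
import pandas as pd

# Made-up data standing in for a .csv file; note the missing price for BBB.
csv_text = """ticker,sector,price,shares
AAA,tech,10.0,100
BBB,tech,,200
CCC,energy,30.0,50
"""
trades = pd.read_csv(io.StringIO(csv_text))

# A second made-up table to merge with.
sectors = pd.DataFrame({"sector": ["tech", "energy"], "region": ["US", "EU"]})

# Add a column; the missing price propagates as NaN instead of raising.
trades["value"] = trades["price"] * trades["shares"]

# SQL-style join, then filter and sort.
merged = trades.merge(sectors, on="sector", how="left")
big = merged[merged["shares"] >= 100].sort_values("shares", ascending=False)

# Group and aggregate; sum() skips NaN by default.
value_by_sector = merged.groupby("sector")["value"].sum()

# numpy functions can be applied to a column directly.
log_shares = np.log(trades["shares"])
```

The missing price never needs special-casing: it shows up as NaN in the new column and is skipped when summing by sector.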
Thank you.