Hacker News
Python Pandas: Tricks and Features (realpython.com)
180 points by endlesstrax on Aug 29, 2018 | 8 comments



Woah, I had no idea the testing module existed. Two things I've found useful in pandas are the DataFrame .query and .eval methods [1]. They're nice for cutting out tons of lambda functions in chained pipelines.

E.g.

  df.somemethod() \
    .loc[lambda df: df.x < 2]

becomes

  df.somemethod() \
    .query("x < 2")
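A small runnable sketch of the same idea, assuming a toy frame with made-up columns `x` and `y` (the placeholder `somemethod()` step is swapped for an `assign` call). It builds the filter both ways, then uses `.eval` to create the derived column from a string expression; `@threshold` inside a query string refers to a local Python variable.

```python
import pandas as pd

# Toy data; the column names x and y are made up for this sketch.
df = pd.DataFrame({"x": [1, 2, 3, 0], "y": [10, 20, 30, 40]})
threshold = 2

# Filtering inside a method chain with a lambda passed to .loc ...
via_loc = df.assign(z=lambda d: d.x * d.y).loc[lambda d: d.x < threshold]

# ... and the same filter as a .query string; @threshold refers to the local variable.
via_query = df.assign(z=lambda d: d.x * d.y).query("x < @threshold")

# .eval can replace the assign lambda too: "z = x * y" returns a new frame with column z.
via_eval = df.eval("z = x * y").query("x < @threshold")

print(via_loc.equals(via_query), via_query.equals(via_eval))
```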

One issue I've noticed is a frustrating bug [2] that causes many queries to raise an error before they are evaluated, but it can be worked around by changing the engine argument:

  df.query("x.str.contains('a')", engine="python")
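A self-contained version of that workaround, assuming a toy string column named `x` (the data is invented). With `engine="python"` the method call inside the query string is evaluated in plain Python, and the result matches an ordinary boolean mask.

```python
import pandas as pd

# Toy frame; the column name x mirrors the snippet above, the values are made up.
df = pd.DataFrame({"x": ["apple", "kiwi", "banana"]})

# The pure-Python engine evaluates the .str.contains call inside the query string.
via_query = df.query("x.str.contains('a')", engine="python")

# The same filter written as a plain boolean mask, for comparison.
via_mask = df[df["x"].str.contains("a")]

print(via_query)
```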

[1] https://pandas.pydata.org/pandas-docs/stable/generated/panda...

[2] https://github.com/pandas-dev/pandas/issues/22435


I wish more teams considered it important to expose the tests as a module or subpackage that is included in the distribution, as numpy does with numpy.test('full') [0].

When you are knee-deep in some long-running Docker container with data analysis going on in an interactive console and get hit by a weird bug, it can be so, so helpful to be able to run the unit tests post-installation to verify everything is set up correctly.

It can also be a useful CI step: build a minimal Docker container that houses an installation of the package at a given commit, have e.g. Jenkins build that container with the package installed from the commit, and then launch it with a simple command like

  python -c "import mymodule; mymodule.test()"

[0] https://stackoverflow.com/questions/9200727/is-there-a-test-...


That's a good point. I used to keep tests outside the package, but some projects seem to make good use of having people who open issues run the unit tests first.


I recently spent a bunch of time trying to restructure an SPSS dataset with a sub-optimal layout. After failed attempts with Excel macros and SPSS syntax, I ended up with about 100 lines of Python using a pandas column MultiIndex and stack(). stack()/unstack() is so fantastic for preparing data for Tableau that I recommend everyone learn to use it.
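A minimal sketch of this kind of reshaping, assuming a hypothetical wide layout where each column is keyed by a (wave, question) pair; the data and level names are invented for illustration. stack() moves a column level into the row index, and unstack() moves it back out.

```python
import pandas as pd

# Hypothetical wide layout: one row per respondent, columns keyed by (wave, question).
columns = pd.MultiIndex.from_tuples(
    [("w1", "q1"), ("w1", "q2"), ("w2", "q1"), ("w2", "q2")],
    names=["wave", "question"],
)
wide = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=pd.Index(["r1", "r2"], name="respondent"),
    columns=columns,
)

# stack() moves the innermost column level ("question") into the row index,
# giving one row per (respondent, question) and one column per wave.
by_question = wide.stack()

# Stacking both levels yields a fully long Series; reset_index() turns it into
# a tidy table with one row per observation, a shape BI tools tend to prefer.
long = wide.stack(["wave", "question"]).reset_index(name="value")

# unstack() reverses stack(), restoring the original wide layout.
restored = by_question.unstack("question")

print(long)
```

Recent pandas 2.x releases may emit a FutureWarning from stack() about a change in its default implementation; the reshaping itself is the same for this data.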


Thanks for posting - totally worth the read to learn there's a pd.read_clipboard() function.


Came here to say that. How come no other tutorial or MOOC on pandas mentions that? It's so useful.


Whilst it has its uses, I think we should encourage people to do things in a reproducible way.
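On the reproducibility point: read_clipboard() reads the clipboard text and passes it to read_csv, using a whitespace separator (sep=r"\s+") by default. One reproducible alternative is to paste the copied text into the script and parse it the same way. A sketch with invented sample data:

```python
import io

import pandas as pd

# Text as it might look after copying a small table; the values are made up.
pasted = """\
name  score
alice    90
bob      85
"""

# Same whitespace separator that read_clipboard() uses by default, but the
# input now lives in the script, so anyone can rerun it.
df = pd.read_csv(io.StringIO(pasted), sep=r"\s+")

print(df)
```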


Sadly, it doesn't look like there's a way to choose which X Window clipboard selection it reads from.



