Effectively Using Matplotlib (2017) (pbpython.com)
144 points by jjwiseman on Nov 16, 2019 | 40 comments



Matplotlib belongs to the worst category of software: very powerful and very awful. Nothing makes any sense and it’s so profoundly unintuitive it almost feels like I’m being pranked. But, of course, use it I must.

Pandas also comes off as an unintuitive joke, but my displeasure with it has mostly worn off. Matplotlib, however, makes me feel angry pretty much every day.


I’ve always categorized MPL as one of the worst designed (popular) python libraries... if not THE worst. Yet, I still find it useful even in comparison with other more modern libraries like Plotly’s Python API.


What data-analysis package do you prefer over pandas? I've used pandas since circa 2015, and lately have been trying to become fluent in R and the Tidyverse, but it's been hard so far to unlearn the deeply ingrained python/pandas patterns.


There’s nothing for Python except Pandas. I came from an FP and static typing background before I moved into ML/quant finance and initially found Pandas incredibly difficult to reason about. The tool is designed for scientists who know nothing about how a library should be designed or how a program should be structured. There’s a lot of dynamic stuff in Pandas that, while making things easier for scientists, makes things a lot more difficult for CS people. Same thing with numpy and scipy, and other data science libraries.

My absolute favorite DataFrame library is saddle (for Scala), which I helped write at my old quant job. Very FP oriented and an absolute pleasure to use. Though maybe it’s no surprise that I like something I worked on.

An incomplete list of things that I dislike about Pandas:

Too many parameters and knobs for each function

Inconsistency between inplace and copying operations

Unintuitive function names compared to FP

Too much magic in how things work

Functions and parameters accept a wide range of types in order to make things “just work.”

Lots of non-orthogonal convenience functions that do mostly the same thing
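
To make the inplace-versus-copy complaint in the list above concrete, here is a minimal sketch with a made-up DataFrame. It only shows sort_values; the same inplace switch exists on a number of other pandas methods.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# Default behaviour: returns a new, sorted DataFrame and leaves df untouched.
sorted_copy = df.sort_values("a")
untouched = df["a"].tolist()

# inplace=True: mutates df and returns None, so the call cannot be chained.
result = df.sort_values("a", inplace=True)

print(sorted_copy["a"].tolist())  # values from the sorted copy
print(untouched)                  # df before the in-place sort
print(result)                     # what the in-place call returned
print(df["a"].tolist())           # df after the in-place sort
```

The same method name gives you either a new object or None depending on a keyword argument, which is the kind of thing a statically typed API would not let you do.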

I’m not familiar with how “normal” Python is written, but I suspect a lot of the problems come from the abuse of dynamic typing. Dynamic typing allows you to just add more and more levels of crap without actually changing your data/type model. I think there’s a lot of value in “correct” APIs, vs convenient ones.

That being said, Pandas is extremely powerful, and usually very succinct. Maybe not as nice as kdb+/q (nothing really compares for time-series data), but still pretty good.


Start with R for data science (book by Hadley, available online for free)

The Tidyverse assumes tidy data. If you are not working with tidy data, it is unlikely to be a big help. Most data can probably be thought of as tidy.

Remember that any and every operation on a data frame returns a data frame, so unlike chaining in Pandas, you never have to worry if a method you want to use belongs to a series or a data frame, or if your method is returning a series or a data frame.

select() selects columns, filter() selects rows. This never changes, unlike [], which means different things depending on whether it is used on a data frame (which you are not guaranteed to get back after calling a method on a data frame in pandas!), on a series, or through the .loc or .iloc accessors.
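
For anyone who hasn't been bitten by this yet, a small sketch of how the meaning of [] in pandas shifts with the key type and the object it is applied to. The DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up example data with the default 0, 1, 2 row index.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col = df["a"]            # string key: one column, returned as a Series
sub = df[["a"]]          # list key: columns, returned as a DataFrame
rows = df[0:2]           # slice: selects ROWS, not columns
mask = df[df["a"] > 1]   # boolean Series: also selects rows
item = df["a"][0]        # [] on a Series: looks up by index label
cell = df.loc[0, "a"]    # .loc: label-based (row, column) lookup

print(type(col).__name__, type(sub).__name__)
print(rows.shape, mask["a"].tolist(), item, cell)
```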

There is no index, instead you just filter on rows.

Pandas comes with a ton of built-in utilities that the tidyverse doesn’t have, mostly because R is already full of functions you can easily apply across columns.

But pandas' date handling functions in particular are really cool.
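
A rough sketch of the date handling meant here, using made-up daily readings: build a date index, ask a timestamp for its weekday, resample to weekly totals, and slice by date strings.

```python
import numpy as np
import pandas as pd

# Fourteen made-up daily readings (0..13) starting on 2019-01-07.
idx = pd.date_range("2019-01-07", periods=14, freq="D")
s = pd.Series(np.arange(14), index=idx)

first_day = idx[0].day_name()                  # weekday name of a timestamp
weekly = s.resample("W").sum()                 # weekly buckets, each ending on a Sunday
window = s.loc["2019-01-08":"2019-01-10"]      # date-string slicing, both ends inclusive

print(first_day)
print(weekly)
print(window.tolist())
```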


That's reassuring: I'm (slowly) working through Hadley Wickham's r4ds online book, which is just fantastic. Thanks for the tidyverse tips!


I suppose the reason it is very bad software, yet very useful and popular, is that the people who create it concentrate on "business value" instead of "beauty". This means that it solves specific problems very well, but the overall architecture has become inconsistent and there are many weird quirks.

Usually this is resolved by a complete redesign of the software that gets delayed forever, never becomes production ready and eventually disappears into obscurity.


Matplotlib is verbose, and has had an inconsistent API in the past (v3 has improved this a lot), but if you need to produce publication-quality figures using Python, taking the time to get comfortable with it pays off. I've been using it to produce maps and data visualisations for years – when I finally figure out how to make something look good, I put the notebook on Github, both for my own reference, and for others: https://github.com/urschrei/Geopython


This post is great---even just explaining the difference between figure and axis, and the multiple systems (and the wise recommendation to use the OO system), and all the rest is gold---that stuff took me days of beating my head against the wall and searching through the matplotlib documentation to sort out.

Honestly, for 99% of uses Seaborn is great, so long as you remember to use the latest version---for some reason, a lot of people seem to have 0.8.0 installed, and the api changed with 0.9.0.

For uses beyond what Seaborn can do, I think that the best strategy is just to figure out a personal plotting language and then wrap that up into a personal library so you never have to think about that again. That's kind of what I've done: I threw together a library to produce some basic figures that are suitable for printing,[1] and now I never have to think about those figures again.

[1] https://github.com/paultopia/plottyprint
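
A minimal sketch of the "wrap it up once" idea. This is not the plottyprint API; the function name print_ready_line and its styling choices are hypothetical, picked only to show the shape of such a helper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def print_ready_line(x, y, title="", xlabel="", ylabel=""):
    """Hypothetical house-style helper: one line plot, styled once, reused everywhere."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, y, color="black", linewidth=1.5)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    # Hide the top and right borders for a cleaner printed figure.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
    return fig, ax


# Made-up data; the caller never touches styling details.
fig, ax = print_ready_line([0, 1, 2, 3], [0, 1, 4, 9],
                           title="Squares", xlabel="n", ylabel="n squared")
```

Returning fig and ax keeps the escape hatch open: anything the helper doesn't cover can still be adjusted on the returned objects.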


Matplotlib is one of the most user-unfriendly libraries I've had the displeasure to use. The most effective thing to do is to not use it at all.

If you can get away with it use pandas' plot, seaborn, altair, etc.
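
For what it's worth, a small sketch of the pandas route with made-up data. DataFrame.plot draws through matplotlib and returns the Axes it drew on, so you can still tweak the result afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"year": [2015, 2016, 2017], "sales": [10, 14, 9]})

# One call for the plot itself; the return value is a matplotlib Axes.
ax = df.plot(x="year", y="sales", kind="bar", legend=False, figsize=(5, 3))

# From here on it is ordinary matplotlib.
ax.set_ylabel("Sales")
ax.set_ylim(0, 20)
```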


I have the opposite opinion. I'm a nuclear engineer in R&D and have been using it with much pleasure for many years. I find the graph I want in the incredible gallery and take a peek at the example code and am off. Sure I have to search around a little to polish the details but it always goes well.

https://matplotlib.org/3.1.1/gallery/index.html


I admit that almost everything I do with matplotlib is the result of searching for an answer rather than trying to navigate the API documentation. But that's true for most anything I do with programming these days. Stuff has just gotten so huge.

What I can't decide is if: a) matplotlib is difficult; b) plotting is inherently complex like writing sheet music; c) object oriented programming leads to gratuitous complexity. So I chalk it up to some combination of the three, but have never felt compelled to try anything different.

I don't do anything for publication, but I use plotting inside of software that I use for running lab experiments, prototypes of measurement hardware, and even in the factory. So I'm using Python for what people would have used LabVIEW for in the past. My programs need to produce readable plots without tweaking, because I don't know in advance what the data are going to look like. The combination of tkinter and matplotlib is really huge for me.


> Sure I have to search around

Every single time! Unlike, say, numpy, where everything is consistent, makes sense, and works as expected almost always.


When doing more complex visualizations I usually have to search around anyway, for inspiration, because I don’t know exactly what I want to draw.


Everyone I know at $big_research_lab uses it and hates it. But what are the alternatives really? Altair plots are larger than your dataset, plotnine has totally unusable docs and insane behaviour in some places (look up what plotnine's gg.ylims does; I'd bet it's caused more than one peer-reviewed error), and basic operations like drawing a vline seem to get slower as your data grows. Plotly is commercial, raw d3 is inconvenient from python. The situation is deeply unfortunate.


> Plotly is commercial

Since their 4.0 release (https://medium.com/plotly/plotly-py-4-0-is-here-offline-only...), there's no longer any connectivity to their cloud service, it's "offline only". It used to have an offline interface _and_ a connected interface, now it's offline only.


Plotly.py is indeed developed primarily by employees of Plotly the company, but it's a 100% free/open-source, MIT licensed library that works totally offline and doesn't depend on any external service or require any kind of registration :)

See https://plot.ly/python/is-plotly-free/ for full details.


IMO plotting things is pretty easy; if it’s so bad, it seems like it would be easier to just draw with pillow/tkinter.

You could always even just write out gnuplot commands and then call it. I used that for the test harness I wrote in the robotics club (and everything was in C!) to plot the trajectory of the robot in auto mode. It’s super easy! I don’t remember if the GUI has all the panning and scaling though.
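
A rough sketch of that approach, in Python rather than the C used for the original harness. The trajectory points are made up, and it assumes a gnuplot binary on the PATH; if there isn't one, the script is only built and nothing is plotted.

```python
import shutil
import subprocess

# Made-up (x, y) points standing in for a logged robot trajectory.
trajectory = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 3.0)]

# Build a gnuplot script with the data inlined: '-' reads points until the "e" line.
lines = [
    "set terminal dumb",  # ASCII-art output, so no image file or GUI is needed
    "set xlabel 'x'",
    "set ylabel 'y'",
    "plot '-' with linespoints title 'trajectory'",
]
lines += [f"{x} {y}" for x, y in trajectory]
lines.append("e")
script = "\n".join(lines) + "\n"

# Only call gnuplot if it is installed; the script is passed on stdin.
if shutil.which("gnuplot"):
    subprocess.run(["gnuplot"], input=script, text=True, check=False)
else:
    print(script)
```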


Oh, seaborn is tightly coupled to matplotlib, so I don't really distinguish the two in my complaining, but you should definitely be using seaborn.


what is wrong with plotnine's ylim?


Clips your data, changing the behaviour of smoothers etc, iirc gg.cartesian(ylim=...) is the thing that people usually want.


I definitely felt that way when learning matplotlib at the start. But after spending a good amount of time learning the object-oriented API, I find it insanely expressive. It can get verbose, but I have full control over every part of the viz. The worst feeling for me is spending an hour getting a plot to a good spot, only to learn that the finishing touches I'd like to make to the axes, ticks, annotations, etc. are not possible, or only possible via a hack. Matplotlib may have some large initial hurdles, but once you get over those you get full control of your viz. To be fair, there’s no reason one should have to suffer as much as one does when learning matplotlib. It’s worth it to stick it out imo.


I think the reason it's awful is because they prioritized making the transition from Matlab easy.


I agree with the basic premise: matplotlib is sort of lousy to use, and annoying to learn, but it works and does everything you might need. There’s something to be said for software that solves a problem.


I'd state the premise a bit differently: matplotlib is tricky to learn, but once you figure out how it works it's not bad.

I just went through this process myself (http://kachess.k2company.com) and this outline would have been SO helpful. I learned these points the slow, hard way.

(And while it doesn't suck, it's certainly not fun and intuitive.)


I’m a long term Matlab user and I’ve been using Matplotlib more and more recently. This is partially out of frustration with recent changes to Matlab graphics and also a desire to use more open source tools.

Matlab plotting is extremely powerful and versatile. Sometimes the output could be nicer but the interactive figure hierarchy is great. Matplotlib on the other hand is, at least to me, a lot more clunky to work with. But it gets the job done and the output often looks nicer and solves my gripes with Matlab.


I'm in a similar boat. I actually like Matlab's plotting a lot more than Matplotlib, but I use python for everything these days. So for me I use Matplotlib for really quick visualizations (usually just plt.plot()), and if I want to do anything more fancy I'll dump my data into Matlab.


What is the state of the art in Python data visualisation compared to ggplot2 in R? Over the last few years I have gone exclusively with ggplot2 because it seems so intuitive and customisable.


Can I plug my Veusz GUI plotting application? It's written in Python, so it's also a Python plotting library. It's more of an object-oriented library than matplotlib, so plots are built out of widgets.

https://veusz.github.io/


Altair is getting pretty good. It's a bit like ggplot in that it's declarative, though I wouldn't suggest dropping R for this any time soon.

https://altair-viz.github.io/


Another option is to use plotnine, which is intended to be a ggplot2 clone. It uses matplotlib under the hood, so if something isn't right you can tweak it. That was the main drawback I found with Altair: your declarative code is almost literally dumped to a json file and then rendered by a process external to Python, so good luck tweaking your plot.

https://github.com/has2k1/plotnine


I think there are some pretty good arguments for why Altair (which is just Python bindings for Vega Lite) should be people's first choice. I've written about this here https://medium.com/@robin.linacre/why-im-backing-vega-lite-a....

In my mind, ggplot2 is actually more comparable to matplotlib in that it's more expressive, but less intuitive. It's interesting that the 'successor' to ggplot2, ggvis (which is on ice), used Vega as a backend.


Plotnine is a verbatim port of ggplot2. There are other ports but plotnine is the good one.

https://plotnine.readthedocs.io/



Since version 4, plotly isn't horrible. It's okay. It does basic things well enough that I use it almost exclusively. However, I mostly plot for personal analysis, not for presenting to others.


Another crucial tip if you do a lot of custom drawing is to use collections instead of calling draw functions per object. This radically speeds up drawing. For example, use PolyCollection to draw a big bunch of polygons, and likewise LineCollection, EllipseCollection, etc.
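
A minimal sketch of the collections tip with LineCollection. The segments are random made-up data; the point is that all 1000 of them become a single artist instead of 1000 separate plot calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

rng = np.random.default_rng(0)
# 1000 made-up segments, shape (n_segments, 2 endpoints, 2 coordinates).
segments = rng.random((1000, 2, 2))

fig, ax = plt.subplots()
# One artist for every segment, instead of one ax.plot() call per segment.
lines = LineCollection(segments, linewidths=0.5, colors="tab:blue")
ax.add_collection(lines)
# Set the view limits explicitly rather than relying on autoscaling.
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

fig.savefig("segments.png")
```

PolyCollection and EllipseCollection follow the same pattern: build the collection once, then hand it to ax.add_collection.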


I've learned to love matplotlib and its OO interface.

I just wish that its documentation examples would consistently provide the OO interface version of how to achieve each example, at least alongside the state-machine version.

It's always frustrating to see an example image that shows exactly what I want to achieve, and then click on the code for it and it's using the other interface, and I have to try to guess the equivalent OO commands. Which are always slightly different, like set_ylabel instead of ylabel...
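
To illustrate the naming mismatch, here is the same made-up plot written both ways. It is only a sketch of the common cases: most pyplot setters map to an Axes method with a set_ prefix.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Made-up data.
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# State-machine (pyplot) style: every call acts on the "current" axes.
plt.figure()
plt.plot(x, y)
plt.title("Squares")
plt.xlabel("n")
plt.ylabel("n squared")
plt.xlim(0, 3)
pyplot_ax = plt.gca()  # grab the implicit axes so the two can be compared

# Object-oriented style: the same plot on an explicit Axes object.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Squares")
ax.set_xlabel("n")
ax.set_ylabel("n squared")
ax.set_xlim(0, 3)
```

If the prefixes get tedious, ax.set(title="Squares", xlabel="n", ylabel="n squared", xlim=(0, 3)) sets several properties in one call.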


Isn't Seaborn famous for this very reason? matplotlib is a bit difficult to write code in but seaborn makes it easy.


Seaborn is nice for making “standard” plots, but if you need to customize your graphics somewhat, you’ll find yourself needing to use MPL in addition to Seaborn anyway.


I don’t mind matplotlib but I highly recommend trying to use seaborn over it for anything.

Specifically seaborn’s catplot (for categorical), lmplot, swarmplot, pairgrid, and facetgrid.

The seaborn gallery really has an extra level of expressiveness that you might not have considered as an amateur visualizer, and you can make some very nice things.

Matplotlib runs underneath it, so you’ll need to learn all of the adjustment functions: lim, figsize, ticks, etc., but I think it’s fine overall.

Charts are hard because there’s more depth than people realize and if the library wasn’t deep you’d be unable to express that depth.



