
What would you recommend for visualisation?



I actually have no idea about that. I don't think there's an equivalent to R's base graphics, so that would seem to make matplotlib the closest thing to a standard -- seaborn [0], which I've seen used a lot lately for more advanced dataviz, lives atop it, but it's also relatively new.

People seem to have conflicted feelings about matplotlib, maybe because of its origin in MATLAB? Not that MATLAB itself is bad, but I think the decision to make matplotlib's API comfortable for MATLAB users confuses contemporary users, even before the usual 2.x vs 3.x issues (matplotlib was ported to 3.x a few years ago, but many users still write Python in the 2.x style).
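To make that confusion concrete, here's a minimal sketch of the two ways the same plot tends to get written: the MATLAB-style pyplot calls that act on an implicit "current" figure, and the explicit object-oriented style. The data and titles are made up for illustration, and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt

# Made-up data, purely for illustration
xs = [0, 1, 2, 3]
ys = [0, 1, 4, 9]

# MATLAB-style: pyplot tracks an implicit "current" figure and axes,
# and every call below silently acts on whatever is current.
plt.figure()
plt.plot(xs, ys)
plt.title("state-machine style")
implicit_ax = plt.gca()  # grab the axes pyplot was drawing on

# Object-oriented style: the figure and axes are explicit objects,
# so it is always clear which plot a call modifies.
fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_title("object-oriented style")
```

Both halves draw the same line. Mixing the two styles in one script is where I'd guess a lot of the confusion comes from, since `plt.title(...)` and `ax.set_title(...)` do the same job under different names.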

Anecdotally, I feel like I see advice like "Just use plotly" more than I see recommendations to actually learn matplotlib. I actually gave up on matplotlib until I stumbled upon this comprehensive tutorial, which covers the basics and many elaborate use cases. If there's a book that does it better, I haven't heard about it:

http://www.labri.fr/perso/nrougier/teaching/matplotlib/

The matplotlib site itself is chock-full of well-documented examples, but some of them seem to be significantly more verbose than they need to be. My impression is that the library is stable/ubiquitous enough that there isn't a big movement to overhaul things. Last time I looked at the API changes for v2.0 [1] (1.5.3 is the current stable release), most of the changes had to do with default styles and stylesheets, which is non-trivial given the number of people who use ggplot2 because it "just works".
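For what it's worth, stylesheets can already be swapped in without waiting for new defaults. A rough sketch, assuming the bundled "ggplot" stylesheet is available in your install (the data is made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt

# Remember the grid setting in effect before any stylesheet is applied
grid_before = plt.rcParams["axes.grid"]

# style.context applies a stylesheet temporarily; settings revert on exit
with plt.style.context("ggplot"):
    grid_inside = plt.rcParams["axes.grid"]  # the ggplot sheet turns the grid on
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])  # made-up data
    ax.set_title("ggplot-flavoured defaults")

# Back outside the block, the earlier setting is restored
grid_after = plt.rcParams["axes.grid"]
```

`plt.style.use("ggplot")` would apply the same sheet for the rest of the session instead of just one block, and `plt.style.available` lists whatever sheets are installed.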

[0] https://stanford.edu/~mwaskom/software/seaborn/

[1] http://matplotlib.org/devdocs/users/dflt_style_changes.html


What are your thoughts on Bokeh? I seem to always revert to R for visualizations.


I've used R and Python. I stick with Python whenever possible because IMO it supports the non-modeling parts of data science more effectively: ETL scripts, API creation, Flask for hosting simple websites, etc. Yhat makes python-ggplot, as well as Rodeo, an IDE similar to RStudio. I explore and develop algorithms in Jupyter notebooks, documenting along the way, while running "hardened" code from the command line, often nohup'ing it on a Linux box, for services that run perpetually, keep me updated via Slack/SMS/email, etc.

For visualization, almost everything I do is in D3, p5.js, or in Processing (Java), which has a Python interpreter, for those interested. There are some great Processing books and Daniel Shiffman is the Hadley of that world. Tons of engaging resources from him. There are tons and tons of good D3 books and online resources. bl.ocks and Mike Bostock's other online articles are wonderful.

Every organization with data scientists defines "data science" differently. People with a modeling and stats focus probably should stick with R. If you find yourself in a position with a wider scope, you simply must have more tools in your tool belt, and in my opinion, R, Python, and JavaScript all are part of that package. For me, personally, Processing is, too. Have a look at Ben Fry's work to understand why. I also use openFrameworks when the volume of data to visualize and performance concerns require it.



