Vega-Altair: Declarative Visualization in Python

ryan-duve · 2024-02-25T21:53:52 1708898032

Working as a data scientist, I have exclusively used Altair since I joined my company in 2021. Every one of my coworkers uses Matplotlib. Two people said something like "Oh, you're using that library that's supposed to be better and probably is, but I just don't want to relearn everything for plotting" but nobody else has even shown interest, let alone taken the plunge.

If you need to learn a plotting library and you already work in Pandas, I recommend choosing Altair to learn. It's a natural extension to pd.DataFrame and the only magical incantation to learn is

    alt.Chart(df).mark_<plot_type>().encode(x="col", y="other_col")

I find this significantly easier to work with than Matplotlib, where the same things can be done in several ways with subplots, plt.figure(), df.plot(), and maybe others?

My only complaint with the library is that outputting to an image file feels weirdly complicated. I often resort to making HTML files and taking a screenshot if I don't want to take the time to look up all the steps equivalent to `.savefig("file.png")`.

jmmease · 2024-02-26T15:33:34 1708961614

Image export before Altair 5 was a bit complicated because it required either selenium plus a system web browser, or a node.js installation. In Altair 5, we switched to using vl-convert for image export, which is just a regular Python wheel with no external dependencies. So now, `chart.save("file.png")` should be just as easy to use as matplotlib's savefig!

(Disclaimer: I'm a Vega-Altair maintainer and the author of vl-convert)

RobinL · 2024-02-26T07:02:34 1708930954

To save to an image in Altair, it's just chart.save("myimage.png"). This works much better since version 5 (the syntax is the same but the dependencies are less fiddly)

jerpint · 2024-02-26T12:12:53 1708949573

I use seaborn to plot directly with pandas, does Altair have any extra advantages? Or is it a similar style?

RobinL · 2024-02-26T16:16:34 1708964194

I wrote an article a while back about why I think it's the best default choice for vis in Python: https://www.robinlinacre.com/backing_vega_lite/

Case_of_Mondays · 2024-02-25T22:31:05 1708900265

Altair is so important for data science as a product.

Every data scientist endeavors to make an impact with their analysis, and ultimately that is typically tied to some kind of visualization. There needs to be a way to a) build the visualization you want and b) get it out there to people who would find it useful.

Just plotting in matplotlib means that you must either export as a PNG (ew) or provide the analysis itself to users/decision makers. PNGs are terrible because you completely lose interactivity. Providing the analysis means figuring out deployment of your python environment, which is possible but just causes another step between analysis and decision made on the analysis.

Altair and the Vega-Lite grammar of visualizations provide an interoperable and data-centric way to build visualizations. The grammar is extremely flexible, and I find it very intuitive when it comes to complex plots. Charts can also be easily embedded into any webpage after being exported as a Vega-Lite spec: just include the Vega-Lite scripts in the HTML page. They can even be used within dashboarding tools like Spotfire (I assume also with things like PowerBI, although I haven't done it).
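To make the embedding step concrete, here is a rough sketch that builds a standalone HTML page around a Vega-Lite spec using only the standard library. The spec and its data are hand-written for illustration, and the page loads Vega, Vega-Lite and Vega-Embed from the jsDelivr CDN, which is one common setup rather than the only one:

```python
import json

# A tiny hand-written Vega-Lite spec (illustrative data).
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

PAGE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>vegaEmbed("#vis", __SPEC__);</script>
</body>
</html>
"""

# Plain string substitution avoids clashing with the braces in the JSON.
html = PAGE.replace("__SPEC__", json.dumps(spec))
print(html[:80])
```

When the chart comes from Altair, `chart.to_json()` yields the spec string and `chart.save("chart.html")` writes a comparable page in one call.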

Imo no real reason to use matplotlib as a data scientist lest you seriously limit the future impact of your work

k1next · 2024-02-26T08:00:31 1708934431

I work as a data scientist (like many others in this thread) and Altair is a great tool for exploring the data. I really like the grammar of graphics approach and the ability to do things like cross-filter and the general ease of enabling interactivity are great! Amazing work by everyone on the Altair/Vega team.

There are some drawbacks though that have been holding some of my colleagues back from switching from e.g. Seaborn to Altair:

- No zoom by selecting the x- or y-range. Almost all other libraries implement box-zoom, Altair does not have this. This is probably the biggest complaint I have heard so far.

- Annotating data inside the plot is difficult. I don't mean `mark_text`, I mean placing an annotation that you would do in Matplotlib via `ax.text(...)`.

- Alongside that: No TeX support.

- It does become slow once there is a large number of points in the chart (and you don't want to aggregate).

- Missing mapping features. I know you can have maps in the background of your plot, but I'm talking OSM like mapping.

- Working with layered + faceted charts takes some time to get used to.

- It used to be that if you had a dataframe with 100 columns and you were only plotting two of them, still all 100 columns would be saved to html. I think this is addressed in VegaFusion?

Still, I want to just emphasize how much I love Altair. To understand relations in your data, it's amazing to just assign color to a column, shape to another column and so on. Really neat! With a background in mathematics, it was also helping me to think about the whole tidy approach to dataframes, so that was an added benefit!

jmmease · 2024-02-26T15:39:32 1708961972

(Disclaimer: I'm a Vega-Altair maintainer)

Thanks for the feedback and for the kind words! All of these drawbacks are fair, just a couple of comments.

There is an experimental package called altair_tiles that makes it possible to add OSM-style map tile backgrounds to Altair charts. See https://github.com/altair-viz/altair_tiles. This is mostly for static charts at the moment, as it doesn't integrate well with pan/zoom yet.

As you mentioned, VegaFusion is able to remove unused columns in most cases. (And if it doesn't for a particular case, please open an issue!).
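Without VegaFusion, a manual workaround is to select only the plotted columns before handing the frame to `alt.Chart`. The sketch below uses pandas alone and a made-up 100-column frame. It compares the size of the JSON records that would end up embedded in the HTML:

```python
import pandas as pd

# Hypothetical wide frame: 100 columns, only two of which get plotted.
wide = pd.DataFrame({f"c{i}": range(50) for i in range(100)})

used = ["c0", "c1"]   # columns referenced by the chart's encodings
slim = wide[used]     # pass this to alt.Chart instead of `wide`

# Approximate the embedded payload by serializing to JSON records.
full_size = len(wide.to_json(orient="records"))
slim_size = len(slim.to_json(orient="records"))
print(f"all columns: {full_size} characters, used columns only: {slim_size} characters")
```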

k1next · 2024-02-26T16:17:18 1708964238

Thanks for the feedback! Looking forward to `altair_tiles`. And again, great work, very much appreciate it!

Maybe while you're here: Is there any desire to implement box-zoom (or x-range zoom) at any point in the future?

jmmease · 2024-02-26T16:28:16 1708964896

Box zoom would need to be added to Vega-Lite first, and there has been some discussion around it in https://github.com/vega/vega-lite/issues/4742. Bottom line is that there's nothing blocking its implementation, someone just needs to do the work in Vega-Lite. And once released in Vega-Lite, Altair would pick it up automatically with how we generate the Altair API from the Vega-Lite schema.

tomrod · 2024-02-25T19:27:31 1708889251

> empowers you to spend less time writing code and more time exploring your data

Sidenote: I like Altair and think it's a good development, despite rendering being performed client side.

This said, the claim here is tiring when it's used everywhere. Having spent significant time with Altair, I'd argue it might have tighter code but the documentation can be obscure. I haven't found it to make things easier from a developer perspective, but rather it does solve the use case that you are working in Python and need to have the client render a figure without callbacks (things like raw html dumps and similar).

joelostblom · 2024-02-27T23:36:28 1709076988

Feel free to open an issue to let us know which parts of the documentation you find obscure and if you have suggestions for how to improve them. We did a larger overhaul a few months back and are always open to feedback on how to improve it further! https://altair-viz.github.io/

(disclaimer: I'm a co-maintainer of Altair)

TheAlchemist · 2024-02-25T23:25:39 1708903539

Altair is great.

I'm using Plotly more lately, for 2 reasons: charts have very good interactivity by default, and I'm also a Dash user, so it comes naturally.

But I believe Altair is excellent for exploration - simple charts are very simple to create, and it's not complicated to create a complex exploration dashboard - with all charts linked between them - so you can select a subset of data on one chart, and it automatically updates the others. Highly recommend at least exploring it !

jmmease · 2024-02-26T15:45:28 1708962328

(a current Altair maintainer and a former plotly.py maintainer here)

Plotly is definitely a great option as well, and it can do a bunch of things Vega-Altair is not designed for. One comment, just in case you weren't aware, is that there is a relatively new library that provides good integration between Altair and Dash: https://github.com/altair-viz/dash-vega-components. It even makes it possible to access Altair selection states in Dash callbacks so that you can have other dashboard components respond to selections.

TheAlchemist · 2024-02-26T22:24:45 1708986285

Thank you !

Wasn't aware of that. Are there any plans to improve the interactivity, by which I mean scaling X and Y independently, Plotly-style? I know you can bind specifically to one axis when generating the chart. However, what I usually want is to be able to zoom in on a specific axis while looking at the chart. Plotly is really great here because the behaviour depends on what you select: a rectangle is just a rectangle (zoom on both axes), a vertical selection zooms on the Y axis, and a horizontal one on the X axis.

pid-1 · 2024-02-26T07:19:23 1708931963

Also Plotly generates Javascript so it has wrappers for many python frameworks such as Streamlit.

I'm also using it in a regular React app and it has quite decent APIs.

lmeyerov · 2024-02-26T02:31:03 1708914663

I've found altair/vega to not 'feel' declarative as whenever I sit down and try to pass in a json config for basic scenarios... something always gets in the way. GG is composable, so this should work in theory, so I've been curious - it feels close.

We have been enjoying Perspective as a more declarative & high-performance flow. Altair/Vega is beautiful though so it was a bummer when we dropped it for a recent project.

prasoonds · 2024-02-26T02:43:14 1708915394

If I may ask, what's Perspective - couldn't find anything on a first google search.

22c · 2024-02-26T03:22:42 1708917762

Most likely https://github.com/finos/perspective

lmeyerov · 2024-02-26T07:03:23 1708931003

yeknoda · 2024-02-26T03:44:15 1708919055

The main contributor/creator Vanderplas (http://vanderplas.com) is an insanely productive python/astronomy wizard. BIG Direct and indirect contribution to humanity. Thank you.

rogue7 · 2024-02-27T11:35:17 1709033717

I love Vega(-lite) / Altair, the grammar of graphics plotting system is really great to build any kind of chart even when it wasn't thought through by the authors of the library. There are other wrappers for languages that lack viz libraries, such as Elixir / Livebook [0]

However, when I used it a couple of years back it struggled with large visualizations, I think due to Vega(-Lite)'s way of embedding the data in the viz artifact.

Also, interactive is nice but often I just need a quick static plot, and matplotlib is more convenient for this, you can easily see the png in any environment etc.

These days I'm eager to see an Observable Plot [1] wrapper for Python ! See [2] for a comparison to vega-lite

[0] https://github.com/livebook-dev/vega_lite

[1] https://github.com/observablehq/plot

[2] https://observablehq.com/@observablehq/plot-vega-lite

ddanieltan · 2024-02-26T05:01:39 1708923699

Like many, I was a big fan of ggplot2 from R, so coming into Python, I've always been searching for an equivalent graphical library. I used to be put off by Altair's famous

    MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).

but ever since the VegaFusion companion library [1] came onto the scene, I'm back to using Altair.

Overall, it's my preferred Python viz library, although I do wish the Python library made better use of introspection. After all, rather than

    alt.Chart(df).mark_line().encode(x="abc:Q")

to state that abc is a quantitative variable, I'd much rather the library introspect column abc of the df object and infer from its data type that the column is quantitative.

That said, I'm already used to the extra syntax so still feel confident in using this library daily.

[1] https://vegafusion.io/index.html

joelostblom · 2024-02-27T23:39:32 1709077172

Altair actually does introspection when the data is passed as a dataframe (e.g. via pandas or polars). In these cases you can leave out the `:Q` and just write `x='abc'` and Altair will figure out that the encoding type should be quantitative based on the column datatype in the dataframe.

If you are reading the data directly from a URL instead of via a dataframe, the URL is passed on to Vega-Lite and Python never sees the data, so no introspection can be made on the Altair side of things.

(disclaimer: I'm a co-maintainer of Altair)

acomjean · 2024-02-26T09:24:40 1708939480

We use Vega lite in our web sites. It gets us 90% where we want to be very quickly. The last 10% is a little fiddly.

It’s amazing though. High quality and downloadable graphs. We use it for single cell rna seq plotting and it’s pretty performant with large data sets (5000+ points on an xy scatter graph)

BerislavLopac · 2024-02-26T10:18:00 1708942680

My favourite tool for building visualisation on top of Vega is a Python library named -- in a stroke of naming genius -- Vincent: https://vincent.readthedocs.io

wslh · 2024-02-25T22:36:00 1708900560

Sidenote: Has Vega* a specific reference to the "Grammar of Graphics" 2005 book [1]? I used that book in research and remember praying for a real implementation. Looking into SO and an answer appeared in 2014 [2].

[1] https://link.springer.com/book/10.1007/0-387-28695-0

[2] https://stackoverflow.com/questions/4892368/implementations-...

domoritz · 2024-02-26T00:10:16 1708906216

In https://github.com/vega/vega-lite/issues/408#issuecomment-50... Leland Wilkinson himself said “having gone through a lot of these GG-inspired systems, I believe yours is the most authentic implementation. I'm using it every day. Thanks for all the great work you've done.”

hk__2 · 2024-02-25T20:57:45 1708894665

Are there non-declarative chart visualization libraries? I’ve always used matplotlib.pyplot [1] and while it doesn’t market itself as "declarative", I don’t see much difference:

     # Vega-Altair
     alt.Chart(source).mark_line().encode(
       x='x',
       y='f(x)'
     )

     # Pyplot
     plt.plot(source)
     plt.xlabel('x')
     plt.ylabel('f(x)')

[1]: https://matplotlib.org/stable/tutorials/pyplot.html

epgui · 2024-02-25T22:48:53 1708901333

I don’t know what you think declarative means, but the example you’re showing is as non-declarative as it could be for such a simple thing.

hk__2 · 2024-02-26T12:40:09 1708951209

> I don’t know what you think declarative means, but the example you’re showing is as non-declarative as it could be for such a simple thing.

According to Wikipedia this is a style of programming where you describe _what_ you want rather than _how_ it should be done. What I see in my example is declarative per this definition: "I want a chart with <source>, with this X label and that Y label".

epgui · 2024-02-26T13:03:52 1708952632

Yes but this is a contrived example so it’s not going to be the best illustration of the difference. You’re literally executing methods on an object as a way to instruct the computer to perform specific steps.

The fact that you can read it as if it were a description of the outcome is a happy accident (a purely aesthetic one at that) that would not necessarily happen with another example.
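One way to make the distinction tangible: a Vega-Lite chart is a description that exists as plain data, independent of any drawing calls. The sketch below uses only the standard library and a hand-written spec modeled on the example above. It is an illustration of the idea, not a claim about how either library works internally:

```python
import copy
import json

# The chart as a description: what to show, not how to draw it.
line_spec = {
    "mark": "line",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "f(x)", "type": "quantitative"},
    },
}

# Deriving a variant is a data edit, not a new sequence of drawing calls.
point_spec = copy.deepcopy(line_spec)
point_spec["mark"] = "point"

# The description survives a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(line_spec))
print(round_tripped == line_spec)
print(json.dumps(point_spec, indent=2))
```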

lcvriend · 2024-02-25T22:10:45 1708899045

If we only look at the simplest example then I would agree that there is not much difference. But more complicated plots will require you to write code in a more verbose and imperative fashion when using matplotlib.

Take a faceted plot like this scatter matrix [1] and try to plot it in matplotlib. You would need to set up the grid using subplots, then define the combinations you want and finally write logic to fill each subplot. The vega/altair code is much more declarative. You just tell it what needs to be in the rows/columns and vega/altair takes care of the rest.

[1]: https://altair-viz.github.io/gallery/scatter_matrix.html
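For comparison, a sketch of the manual matplotlib steps described above: set up the grid, enumerate the combinations, fill each panel. The data is random and merely stands in for three numeric columns. The Agg backend is chosen so the code runs without a display:

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Made-up data standing in for three numeric columns.
random.seed(0)
columns = ["Horsepower", "Acceleration", "Miles_per_Gallon"]
data = {name: [random.random() for _ in range(50)] for name in columns}

# Step 1: set up the grid by hand.
n = len(columns)
fig, axes = plt.subplots(n, n, figsize=(7, 7), sharex="col", sharey="row")

# Step 2: enumerate the row/column combinations and fill each panel.
for i, row_name in enumerate(columns):
    for j, col_name in enumerate(columns):
        ax = axes[i, j]
        ax.scatter(data[col_name], data[row_name], s=8)
        # Step 3: label only the outer edge of the grid.
        if i == n - 1:
            ax.set_xlabel(col_name)
        if j == 0:
            ax.set_ylabel(row_name)

fig.tight_layout()
```

Color by group, legends and interactivity would each need further explicit code, whereas the Altair version adds them through the encoding.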

theodpHN · 2024-02-26T07:21:33 1708932093

Mirror, mirror on the wall, what’s the most declarative dataviz programming language of all? :-)

  *** SAS SCATTER MATRIX EXAMPLE ***
  proc sgscatter data=sashelp.cars;
  matrix Horsepower Acceleration Miles_per_Gallon / group=Origin;

  *** ALTAIR SCATTER MATRIX EXAMPLE ***
  import altair as alt
  from vega_datasets import data

  source = data.cars()

  alt.Chart(source).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='Origin:N'
  ).properties(
    width=150,
    height=150
  ).repeat(
    row=['Horsepower', 'Acceleration', 'Miles_per_Gallon'],
    column=['Miles_per_Gallon', 'Acceleration', 'Horsepower']
  ).interactive()

lcvriend · 2024-02-26T13:08:37 1708952917

Well, there also is for example `pandas.plotting.scatter_matrix()` [1] which is built on top of matplotlib. I suppose the question is how does SAS or any other alternative compare to vega/altair when the desired output is less standard.

[1]: http://pandas.pydata.org/pandas-docs/stable/reference/api/pa...

joelostblom · 2024-02-27T23:41:52 1709077312

If you are interested in similar shortcuts for repeated charts in Altair, I'm experimenting a bit with such a syntax in the package altair-ally https://joelostblom.github.io/altair_ally/examples.html. Feel free to try it out and leave feedback!

(disclaimer: I'm a co-maintainer of Altair)

datadeft · 2024-02-25T21:49:10 1708897750

Is there a plotting library that uses webgl? I am only aware of Plotly that uses webgl for some of the graphs.

domoritz · 2024-02-26T00:11:45 1708906305

We are working on GPU accelerated rendering for Vega and Altair.

kylebarron · 2024-02-26T06:43:22 1708929802

Is this open? Is there a way to read more about it? Issues/pull requests?

mattijn · 2024-02-26T19:24:26 1708975466

Have you seen https://github.com/jonmmease/avenger? It is an experimental Vega visualization renderer written in Rust using wgpu.

matmatmatmat · 2024-02-26T09:04:13 1708938253

Bokeh has support for WebGL. We had to switch away from Vega/Altair when a project hit around 50,000 data points in a plot, but under ~5,000 data points Vega/Altair was still good.

theodpHN · 2024-02-26T07:55:03 1708934103

Check out:

  DeckGL
  https://panel.holoviz.org/reference/panes/DeckGL.html

  ipysigma
  https://github.com/medialab/ipysigma