Show HN: Plotting 3 years of hourly data in 150ms (leeoniya.github.io)
785 points by leeoniya on May 1, 2020 | 102 comments



Unfortunately, I have to use Google Analytics and Google Ads at work every day, and these UIs have absolutely terrible performance :(

A small part of the problem is drawing the trend charts. So I decided to make uPlot [1] to see what was really possible.

[1] https://github.com/leeoniya/uPlot


Honestly the title doesn't do it justice. For a plotting tool it's not an impossible task, but making a webpage this responsive is incredibly hard, so well done there.


it's an understatement for sure :)

a lot of time is spent on just DOM layout and JITing the JS, which gets amortized with huge datasets. the 166k bench doesn't take any longer.


Kudos for directing folks to other projects if uPlot isn't fast enough for them. Every developer should take the time to write out appropriate and inappropriate use-cases on the readme.


Hi, I'm very interested in this. I've been using dask/plotly for a side project but it's just too slow. If you know dask, do you think you could discuss which features of dask your project has and doesn't have?


Here's a Dash app that uses Dask & Datashader for fast aggregations on 40M+ rows: https://dash-gallery.plotly.host/dash-world-cell-towers

The source code, which can be used as a Dash + Dask boilerplate, is here: https://github.com/plotly/dash-world-cell-towers

Feel free to ask questions at community.plot.ly


I don't think that actually plots the 40M points, just aggregates them in Python and plots a small section of that data on the website?


I believe so. An optimization of Datashader is that it _doesn't_ plot all the points, but rather optimizes which points actually influence pixel values.


It plots cell towers in the middle of my residential neighborhood where I'm absolutely sure there aren't any.


https://community.opencellid.org/t/range-of-a-tower/370/2

>The range field tells you the approximate area within which the cell could be, in metres. This is effectively an estimate of how accurate the location is.

There's one marked in the middle of a corn field near me with a range of 48,000m.



Maybe. However, cellmapper.net shows a few cell towers where they actually exist, and this map doesn't show them.

Credibility?


This is awesome. I wrote some software to pull readings off my weather station and stream them to the browser. The plotting library that I use is pretty awful. Besides being slow, it's constantly hosing up the numbers on the y-axis. I want to give uPlot a shot.

Station: https://carlisleweather.com

Software: https://github.com/chrissnell/gopherwx


Hopefully the upvotes and being on the front page are enough, but if not I just want to say props, this is amazing. Already had it saved in my bookmarks; glad to see it pop up again :)


thanks!


The 166k benchmark is ridiculously snappy. Could you shave off a few more ms by inlining the JSON as a data block?
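
In case it's useful to anyone reading along, a minimal sketch of that idea (hypothetical element id, not what the demo actually does): the dataset is embedded in the page, so there's no separate request or fetch-then-parse round trip.

    <!-- dataset inlined into the HTML; no extra network request needed -->
    <script id="chart-data" type="application/json">
      [[1546300800, 1546304400, 1546308000], [12, 19, 7]]
    </script>
    <script>
      // parse the inlined JSON instead of fetch()ing it
      const data = JSON.parse(document.getElementById('chart-data').textContent);
    </script>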


probably, but something about diminishing returns...:p


This is fantastic and may solve a real problem for me. Thank you so much, and thank you for the MIT license!


I like this a lot! Reminds me of dygraphs, and then I saw that it's in the acknowledgements :)




That's rather beautiful.


This is just incredible.


The 150ms benchmark in the title is really selling this short, the performance is very impressive. The 150ms seems to refer to the time it takes to initialize the graph, and with a hot cache that is more like 50ms for me here. Redrawing seems much, much faster.

I have done some visualization with WebGL because I couldn't get it fast enough with just drawing lines. That was a while ago so I'm not sure about the details, but even a simple prototype just drawing a few tens of thousands to a hundred thousand lines using canvas was slower than this for me (subjectively).

I'll have to look at the code later, but I'm curious about where this library is getting the performance from. Before I saw this I would have said you can't do this at this speed without using a previously downsampled version of the data points; I'm not so sure now.

I'm still looking at the performance tab in the browser dev tools and how impressively empty the main Javascript usage plot is.


It looks like the mouse lines and the selection highlight are just partially-transparent divs stacked on top of the canvas that get moved around, so nothing actually gets redrawn unless the date range changes (which is a pretty clever approach!).
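
For anyone curious, a minimal sketch of that overlay approach (made-up class names, not uPlot's actual internals): the cursor is an absolutely-positioned div over the canvas, and only its transform changes on mousemove, so the canvas never repaints.

    // assumes a .cursor-x div absolutely positioned inside .chart-wrap,
    // styled as a 1px-wide, full-height, semi-transparent vertical line
    const wrap = document.querySelector('.chart-wrap');
    const cursor = wrap.querySelector('.cursor-x');

    wrap.addEventListener('mousemove', e => {
      const x = e.clientX - wrap.getBoundingClientRect().left;
      // moving the div is compositor-only work; no canvas redraw happens
      cursor.style.transform = `translateX(${x}px)`;
    });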


but it's really tricky to align correctly across different screen pixel densities and rounding errors. it's still not pixel-perfect, but i decided it was good enough.


That's something I expected, you really don't want to trigger redraws on mouse over. What surprised me was that I couldn't tell the times where I zoomed in or out from the JS flame chart. Usually that is really obvious, but in this case zooming was so fast that you could hardly see it. And this graph has probably around ~194k data points (388k in the source data, I assume that's x and y). I'm not entirely sure about the number of points here, I'm taking that from the JSON delivered to the site.

The selection highlight is unusual, I admit. That's something I'd just skip if I were going for high performance.


> And this graph has probably around ~194k data points

no, the top graph is pretty much what the title says: 3 * 365 * 24.

but 194k is also no problem :)


> I'll have to look at the code later, but I'm curious about where this library is getting the performance from.

https://news.ycombinator.com/item?id=23047156


Very nice project, but I just want to randomly point out that 3 years of hourly data is really not as much as it may sound. In fact, it's about a third of 24 hours of per-second data, which is a pretty common scale for all sorts of real-time monitoring tools we often use. These fine-sliced stats really pile up quickly...


> but I just want to randomly point out that 3 years of hourly data is really not as much as it may sound.

you're right, it isn't. and yet many js charting libs struggle even with this.

on my i5 thinkpad with integrated gpu uPlot can render 600 series x 8,000 datapoints in ~2000ms [1]. and it finishes this job in ~7s on my 2015 sony z5 compact phone. so there's that :)

but really, pumping 4.8M datapoints into a browser is probably not a great idea. you're gonna want to aggregate on the server at some point. just sayin'.

[1] https://leeoniya.github.io/uPlot/bench/uPlot-600-series.html


This looks really really neat!

Just want to ask, because I love talking about render performance: have you tried doing this using OffscreenCanvas? That should allow you to move a lot of things to a worker, so you avoid blocking the rendering with JS? It probably won't speed up the total time to finished render, but I assume it will lock the page for a shorter period of time?


If the rendering is this controlled (and sparse), blocking the thread with it isn't really a concern. And with this data size, serialization/deserialization between the main and worker thread would probably become nontrivial.
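
For anyone curious what that route looks like, a rough sketch (hypothetical file and variable names, not something uPlot does). Transferring the canvas and the typed array's buffer avoids the copy, at the cost of the main thread giving up ownership of the data:

    // main thread: hand the canvas and the raw data buffer to a worker
    const canvas = document.querySelector('canvas');
    const offscreen = canvas.transferControlToOffscreen();
    const ys = new Float64Array([30, 10, 40, 20]);        // sample values
    const worker = new Worker('draw-worker.js');
    // both objects are transferred (zero-copy), not structured-cloned
    worker.postMessage({ canvas: offscreen, ys }, [offscreen, ys.buffer]);

    // draw-worker.js: rendering happens off the main thread
    onmessage = (e) => {
      const ctx = e.data.canvas.getContext('2d');
      const ys = e.data.ys;
      ctx.beginPath();
      ys.forEach((y, i) => (i ? ctx.lineTo(i * 20, y) : ctx.moveTo(0, y)));
      ctx.stroke();
    };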


Super cool! It would be great if you could provide some insight into how you built this and the kind of tricks you had to use to make this possible. Looking forward to a blog post in the future :)


a few things:

- the data is flat arrays of numbers

- there is no per-datapoint memory allocation beyond whatever is necessary for the browser to construct the canvas Path2D object. this keeps the memory pressure low and the GC nearly silent.

- the number of draw commands is reduced by accumulating the min/max data values per pixel (sketched below)

- uPlot does not generate axis ticks by walking the data. it only uses min/max of the x and y ranges and finds the divisions from that.

- there is no mass-creation of Date objects, data is kept in timestamp format except when hovered.

- the date/time formatting is done by pre-compiling templates and not re-parsing them all the time.

- the cursor interaction uses a binary search over the x array
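
A rough sketch of the min/max-per-pixel accumulation mentioned in the list (my own simplification, not uPlot's actual code, and it draws disconnected column segments rather than a continuous line): for each x pixel column, only that column's min and max y get stroked, so an arbitrarily dense series collapses to at most one vertical segment per pixel, all inside a single Path2D.

    // xs, ys: flat arrays of raw values; xToPx, yToPx: value -> pixel scale functions
    function buildPath(xs, ys, xToPx, yToPx) {
      const path = new Path2D();
      let col = Math.round(xToPx(xs[0]));
      let min = ys[0], max = ys[0];

      const flush = () => {
        // one vertical segment per pixel column instead of one lineTo per datapoint
        path.moveTo(col, yToPx(max));
        path.lineTo(col, yToPx(min));
      };

      for (let i = 1; i < xs.length; i++) {
        const c = Math.round(xToPx(xs[i]));
        if (c !== col) {
          flush();
          col = c;
          min = max = ys[i];
        } else {
          min = Math.min(min, ys[i]);
          max = Math.max(max, ys[i]);
        }
      }
      flush();
      return path; // later: ctx.stroke(path)
    }

No per-point objects are allocated here, which is also the point of the first two items in the list.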


another one i forgot

- drawing at exact aligned pixel boundaries to avoid or minimize antialiasing

this makes uPlot charts a bit rougher looking, but the perf impact is quite large.
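
for anyone unfamiliar with the trick, a generic canvas sketch (not uPlot's code): odd-width strokes have to sit on half-pixel centers, otherwise a 1px line gets smeared across two antialiased rows.

    const ctx = document.querySelector('canvas').getContext('2d');
    ctx.lineWidth = 1;

    // snap to the half-pixel so a 1px stroke fills exactly one pixel row/column
    const snap = (v) => Math.round(v) + 0.5;

    ctx.beginPath();
    ctx.moveTo(snap(10), snap(50));
    ctx.lineTo(snap(300), snap(50));
    ctx.stroke();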


How much of an impact does this have?

I prefer the look of things snapped to pixels; I didn't realize it also speeds things up!


hey adam :)

it's highly dependent on what's actually drawn. i regressed it by accident and didn't notice a huge difference until i randomly opened the stress test (which is also densely packed, so probably a worst case for AA)[1]. it went up by a factor of 2-4. can't remember exactly.

[1] https://github.com/leeoniya/uPlot/blob/master/bench/uPlot-60...


Let this be a lesson that JavaScript is not slow. The DOM can be slow, and downloading large bundles can be slow, and any code can be made slow if you try hard enough to write it poorly. But JavaScript the language is not slow in 2020.

When we talk (or rant) about the performance of JS-based apps - running on the web or in Electron or otherwise - it's really important that the conversation focus on the factors that actually matter.


No, JS is slow(ish). The GC pressure alone will kill you. This is fast despite being in JS by simply doing as little as possible (while still meeting the requirements on the end result, which is where the genius lies).

The fastest code is no code. No code is fast even with pathologically slow languages/runtimes.


It's no slower than Python. Probably faster unless you're staying within a native library like NumPy.

But that wasn't my main point: my main point is that people love to rant about "JavaScript apps" while maintaining (willful?) ignorance about what the actual factors are that manifest as the slowness they experience. Sometimes it's poor usage of the DOM. Usually it's ads. It's almost never the unavoidable overhead of JS cycles.


> people love to rant about "JavaScript apps" while maintaining (willful?) ignorance about what the actual factors are that manifest as the slowness they experience.

Are you sure you got this right?

Just because JS is fast[0] doesn't mean we cannot complain about Javascript apps that should have been plain web pages?

It is clearly possible to create advanced Javascript apps that are enjoyable even to people like me.

For most pages I use as a consumer I don't see much value added by frontend code: I see the value of autocomplete, drag-and-drop, etc., but I would much prefer if my CPU stopped chewing after web pages were loaded and rendered.

[0]: yes, I agree - in most cases the Javascript language or engine is not the cause of the performance problems on the web


Whether or not a given site should be client-side rendered is a valid discussion

Ad bloat is a valid discussion

Lazy development practices in modern sites/apps are a valid discussion

But in my experience, these as well as other less-legitimate issues all tend to get lumped under the banner of "JavaScript stuff == slow", without any nuance.


Can't say for sure, but the way I read them, most complaints about Javascript performance are really about the abuse of Javascript applications everywhere.

I guess Javascript just happens to be the common thing between a number of them, and people would be just as annoyed if the multi-megabyte, cpu-hogging, data-stealing monstrosities were hand-crafted in assembly ;-)

And: If someone complained about Javascript being slow for scientific calculations I'd probably call them out on it ;-)


> but I would much prefer if my CPU stopped chewing after web pages were loaded and rendered.

this happens when it's programmed with performance as an afterthought, which is sadly how things work when the prerogative is to just ship more features and continually ingest dependencies with the most features, with little vetting.


Javascript is probably at least an order of magnitude faster than Python.


For me as a user it seems to be: loading google analytics, loading fonts that seem to be improperly cached, loading chat windows, assistance widgets and spies, and dynamically adding things after page load that cause reflows.


It's among the fastest scripting languages. Only LuaJIT turned out to be faster (than Node.js) in my benchmarks. Node is on par with Dart most of the time and the latter is statically typed.


This is really cool! Another great library for working with large time-series datasets is Bokeh (https://docs.bokeh.org/en/latest/index.html).


Bokeh looks cool, too.


This is awesome, thanks for sharing!

Question: in the demo you load papaparse but the data is delivered as JSON?

Would you mind sharing what your backend looks like for the project where you're using this, and how do you extract all of this data at once from storage?


> Question: in the demo you load papaparse but the data is delivered as JSON?

good catch, leftovers from original PoC i made locally. the exported data was still in CSV back then. i should clean that up!

> Would you mind sharing what your backend looks like for the project where you're using this, and how do you extract all of this data at once from storage?

i just did a manual export from Analytics. how you get the data out is up to you, but a time-series or columnar database would probably be a good start :)


I clicked out of mild interest and am completely blown away. I wish this kind of optimization was more common. Well done!


That looks amazing, nice job! I was trying to create a web dashboard with several graphs, each with a few hundred data points, recently. None of the libraries felt "snappy" on a mobile device in terms of load time or responsiveness. I ended up abandoning the project because of that, but if only I knew about this library at the time...

The only thing I wish µPlot had is support for tooltips rather than numbers in the legend, but that's a small price to pay for this level of performance.



oh, and another one is in here [1] which turns the legend into the tooltip :)

[1] https://leeoniya.github.io/uPlot/demos/candlestick-ohlc.html


That's 26k points. I don't think it's very difficult to render that many points quickly unless you try and use SVG. How long does Dygraphs take?


i don't have a dygraphs demo for this, but you can see where it lands on the 166k bench: https://github.com/leeoniya/uPlot#performance


I didn't even read the title before clicking and was blown away at the speed. Honestly thought something acted up since you're not used to seeing things render this fast.


interestingly, focusing on performance in web dev (and actually delivering on it) really presents new UI challenges, because users are not used to near-instantaneous feedback in web apps. you have to start adding complexity like css transitions and UI delays just to reduce user confusion.

lol :(


I'm somewhat confused. Is 150ms considered low? That's less than 7fps.

3 * 365 * 24 = 26,280. Emitting 26k points is child's play for any program; are we really so far down in bloat that this is considered impressive? Because this is not impressive. A 16MHz AVR could execute 91 instructions per sample in those 150ms, more than enough for plotting that data.

A 3GHz machine could execute 17,123 instructions per sample, and it would probably need fewer instructions than the AVR would.


FWIW, the majority of this 150ms is bootup time and includes initializing the DOM & canvas, JITing the js, downloading & parsing the dataset, and actually running the code, which includes data downsampling & gap detection/clipping.

try to get this perf on the web, and then you can re-assess your statement. if it was easy, then every other js charting lib would not be struggling to do it, right?

for native code (or webgl), obviously this is child's play. but webgl has significant trade-offs. e.g: https://bugs.chromium.org/p/chromium/issues/detail?id=771792

finally, you cannot extrapolate from the 26k/150ms number. uPlot can draw 4.8M points in ~2000ms on an i5 with integrated gpu (after bootup amortization).


Looks really nice.

I was slightly disappointed when I zoomed in, as I expected 3 years of data in 150ms increments (600 million measurements), but that's probably not possible.


Assuming there is some culling, I've found myself somewhat obsessed with methods of dataset culling. If there's a single outlier, for example, some methods would skip it, when it should really be highlighted.


https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z000... is a good algorithm to use to keep peaks. At most you need 4x the chart width in data points. In practice, you can easily reduce that. I've implemented the algorithm before and it can be done in linear time.
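
A linear-time sketch of that kind of peak-preserving decimation (my own take on the idea, not the NI article's exact algorithm): keeping the first, min, max, and last sample of each bucket is where the "at most 4x the chart width" figure comes from.

    // ys: raw samples; width: chart width in pixels; returns at most ~4 * width values
    function decimate(ys, width) {
      const out = [];
      const bucket = Math.ceil(ys.length / width);

      for (let start = 0; start < ys.length; start += bucket) {
        const end = Math.min(start + bucket, ys.length);
        let min = start, max = start;

        for (let i = start + 1; i < end; i++) {
          if (ys[i] < ys[min]) min = i;
          if (ys[i] > ys[max]) max = i;
        }
        // keep first, min, max, last (deduped, in original order) so spikes survive
        const keep = [...new Set([start, min, max, end - 1])].sort((a, b) => a - b);
        for (const i of keep) out.push(ys[i]);
      }
      return out;
    }

In practice you'd keep the indices (or x values) alongside, so the decimated points stay aligned on the time axis.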


Slick! I too felt that many libraries were not slick/fast enough, but I eventually ended up using react-stockcharts (in canvas mode) for my use case and was quite satisfied with its performance.

Could you compare your performance against this one if possible?

https://github.com/rrag/react-stockcharts


Minor suggestion: not really a bug, but inconvenient UX. When I start the selection and move my mouse outside the chart, it stops selecting. If you added the mouse movement listeners to the document/body, it would keep working when the mouse leaves the chart.

Great job btw, really instant.
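
A generic pattern for that suggestion (hypothetical selector, not uPlot's internals): register the move/up listeners on the document only for the duration of the drag, so the selection keeps tracking after the pointer leaves the chart.

    const chart = document.querySelector('.chart');

    chart.addEventListener('mousedown', (start) => {
      const onMove = (e) => {
        // update the selection from start.clientX to e.clientX here,
        // even when e.target is outside the chart
      };
      const onUp = () => {
        document.removeEventListener('mousemove', onMove);
        document.removeEventListener('mouseup', onUp);
      };
      document.addEventListener('mousemove', onMove);
      document.addEventListener('mouseup', onUp);
    });

Pointer events with setPointerCapture() get you the same behavior with less bookkeeping.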


For a moment I thought you were plotting 3 years of data collected every 150ms - wow! What you have here is still very nice. I had to work with displaying/charting interval data in the past, and having it load quickly is very nice, especially when toggling between dataset intervals.


Would it be possible to allow for the cursor to still update when doing a horizontal scroll? When I move the mouse left and right it does that, but it doesn't move when I traverse the page by horizontally scrolling

Either way, just a bit of feedback on a beautiful page


can you open an issue with a repro?


Beautiful demo, thank you for sharing.

I work with analytic tools to watch production systems and they boil my data down to a pathetic number of datapoints prior to rendering, when you know computers — even in a browser — are vastly more capable.


I don't think systems like this have ever really been limited by the amount of data they can _present_. A bigger limiting factor is the volume of data they can ingest, at what cardinality (ideally arbitrary), and how quickly they can get it back to you, transformed in some way (e.g. "i want to see p95").

That said, things like SignalFx or Datadog are certainly slow at the presentation layer, too ;)


Wow this is amazing! It makes me really want to build a Python wrapper.


I find it curious that the volume of users is similar for a given day of the month regardless of year. For instance, the first of the month is not always the same day of the week.


Eyeballing the data, there doesn't seem to be a strong weekday effect. Not sure if this is real data (human activity data usually show weekday effects).

If there was a very pronounced weekday effect, a quick hack is to use a 364-day instead of full-year offset for comparison.
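
In case the arithmetic isn't obvious: 364 days is exactly 52 weeks, so the comparison lands on the same weekday. A trivial sketch, assuming unix timestamps in seconds:

    const DAY = 24 * 60 * 60;                       // seconds
    // "a year ago, same weekday": go back 52 full weeks instead of 365/366 days
    const weekdayAligned = (t) => t - 364 * DAY;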


Agree. That's why I find it curious that it isn't representative of human activity, but there definitely is seasonality.


This looks nice. I'm currently using HighCharts and Anychart. Wish they'd included Anychart in their performance comparisons.


AnyChart uses SVG, so it will land in about the same place as Highcharts, but likely slower.


Congrats on the performance and thanks for posting. I especially appreciate how clean the source code is. Great work!


Very fast indeed! How do I zoom out?


drag to zoom in. double-click to reset the zoom.


Man, you forget how fast computers can be. This really opened my eyes.


Cool!! I wonder how the performance compares to Highcharts?



For a dataset like this, Highcharts would be slower for sure, since it creates SVG DOM elements for each of the points. <canvas> (which this is built with) is much quicker at presenting tons and tons of data points like this.


Highcharts has a Boost module which uses WebGL.

Here's a sample chart with 1,000,000 points: https://jsfiddle.net/5bvLgs5w/1/


the bench code uses the boost module. so it's apples-to-apples:

https://github.com/leeoniya/uPlot/blob/master/bench/Highchar...


Excellent work - and a great Readme to boot!


Holy crap, this is bleeding fast. This is great!


Dang, this renders FAST. Well done.


that's really good..... but the bigger problem is getting all the data to the client.


I wish Grafana was this quick


there's this, but it looks stalled:

https://github.com/grafana/grafana/pull/21835


Super impressive, great job!


So how many data points is it?


I feel like in general people underestimate just how fast computers are. Crazy to hear about libraries that choke on a couple thousand points when compared to https://hackernoon.com/drawing-2-7-billion-points-in-10s-ecc... (for instance)


completely agree.

additionally, it's hard to overstate how impressive modern javascript JITs are, as well as GPUs.

every time i visit a web page that downloads 2MB of js and spins up my CPU fan, i feel ashamed that this is the industry (webdev) where i make my living.

at least with uPlot i show that all this waste is not necessary.


The problem with large websites is that there are a dozen layers of abstraction leading to large and inefficient code. But when it comes to just raw speed (iterating over a huge array of numbers in a tight loop, like drawing this chart), JS is blazing fast, especially with the JIT. That's specifically where a lot of people underestimate JS. I've seen a lot of people do things on the backend when they could just send all the data and do all the filtering/sorting locally, leading to much faster UX.


Interesting read indeed. Too bad this was posted to Hackernoon; I had a hard time reading it due to the header bar constantly dropping down into the text I was reading. So frustrating!


Yeah. This is still tens of thousands of instructions per point.


If you wanted to render these on a regular basis, WebGL is fairly straightforward, and works really well for this simple sort of rendering. You could do this with one polygon and a small fragment shader (treat the data as a texture, and use SDF to draw the line and fills), or use the actual geometry (render as a triangle strip; and separate the line if you want to do more interesting stuff in your fragment shader).


I probably wouldn't even generate an SDF, instead read the data directly from a 1D texture and fill the anti-aliased line directly in the fragment shader.


Yeah, I meant more in the abstract sense, rather than "create a two-dimensional texture of the distance field", which as you point out would be unnecessary.

You could probably sample all the 1D textures in one pass, and draw all the lines and fills there. One additional nice side effect of this is that you can easily have sequences at different resolutions.



