Full disclosure: I'm a software performance nerd who is building a deeper understanding of hardware using whatever tools are available.
I work with a lot of assembly traces emitted from CPU sims in Verilator. These traces can be several gigabytes in size, with tens of millions of entries. I've found that the typical interactive matplotlib backends don't cope well with tens of millions of points on my M1 MacBook Pro. That's no slight against Matplotlib, or cairo, or agg, or the default macOS backend: they prioritize cross-platform "It Just Works" support over extreme performance.
But it does irk me that there seems to be such a gap between the amount of data we can visualize in the gaming domain compared to the scientific domain[1]. The difference between 50 million and 5 billion is 100X. I'd be ecstatic if I could get within a 10X difference. I understand that this gap exists for lots of reasons, which I'd be happy to hear about in detail in the replies to this comment.

1: https://youtu.be/eviSykqSUUw
Game engines put a whole lot of work into not drawing things. Even if you have an 8K monitor, that's still only around 33 million pixels. If the data set you want to visualize has 5 billion elements, and you decided to use every single pixel on the screen, you'd still have to collapse those 5 billion elements into a visual space less than 1% of that size.
In a video game it's easy to figure out what not to show - just hide everything that would normally be too small or far away to see. But in a scientific domain, what to show and what to hide might be a much more interesting question.
On the other hand, if you're just looking for a plot of a few billion elements with the ability to zoom in and out, then a straightforward decimation of the data and drawing it to the screen can work. In the past I've written custom tools to do just that, but at this point in life I would probably throw it at something like SciChart and let it take care of that.
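For concreteness, here's a minimal sketch of the kind of decimation I mean (made-up code, not SciChart's API): keep the min and max of each bucket so narrow spikes still survive the downsample.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Sketch: reduce N samples to ~2 points per horizontal pixel (the bucket's
    // min and max), so narrow spikes still show up after decimation.
    std::vector<float> decimate_minmax(const std::vector<float>& samples, std::size_t pixels)
    {
        std::vector<float> out;
        if (samples.empty() || pixels == 0) return out;
        out.reserve(pixels * 2);
        const std::size_t bucket = std::max<std::size_t>(1, samples.size() / pixels);
        for (std::size_t i = 0; i < samples.size(); i += bucket) {
            const std::size_t end = std::min(i + bucket, samples.size());
            auto [lo, hi] = std::minmax_element(samples.begin() + i, samples.begin() + end);
            out.push_back(*lo);   // bucket minimum
            out.push_back(*hi);   // bucket maximum
        }
        return out;
    }

Re-run it on the visible slice whenever the user zooms, and the renderer never sees more than a couple of points per pixel column.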
Graphics is not as trivial of a problem as you're making it seem. Even if you're just rendering a single "thing", that thing can have an unbounded number of vertices. And it's definitely not a trivial problem deciding which vertices are important and which are not.
Oh, I just meant that it was conceptually easy - things should look 'right' - whereas with scientific data figuring out how to map that data to a 2D screen often involves a lot of carefully thought-out discretion about what to show and what to hide.
The actual implementations of 3D rendering are gargantuan feats of engineering, science, and art. Having spent time writing code to turn terrain height maps into quantized meshes for 3D rendering, I agree entirely that deciding which vertices are important is not a trivial problem. The OP linked a video about Nanite, which IMHO is a marvel of engineering.
My point is that both you and the person you were replying to have a kind of fundamental misconception about why data is slow and graphics are fast. The misconception is that data isn’t slow and graphics aren’t fast, at least relative to one another.
For graphics, the main performance metric is (generally) triangle count. You can draw lots of low-poly objects on screen more efficiently than you can draw one very high-poly object. The same holds true for data: you can render small amounts of data more efficiently than large amounts of data. Nanite doesn't magically render high-poly meshes in real time. Nanite pre-processes the mesh and produces lower-poly meshes that maintain the geometric properties of the original. This is the main innovation of Nanite, because traditionally it's been very hard to reduce triangle count while keeping the overall geometry roughly similar to the original.

And in this way, data processing has traditionally been much more efficient than 3D graphics. There have long been various statistical aggregations you can do on data to keep the same rough statistical properties using less data. But you have to be willing to preprocess the data, and that is slow. I haven't used Nanite, but I imagine the import process is also slow relative to the rendering speed after processing.
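To make that concrete, here's one classic "preprocess once, render the small result" aggregation, sketched in C++ with made-up names: reservoir sampling (Algorithm R), which boils an arbitrarily large data set down to a fixed-size uniform sample in a single pass, so the rendering side only ever touches the small sample.

    #include <cstddef>
    #include <cstdint>
    #include <random>
    #include <vector>

    // One pass over the full data set yields a fixed-size uniform sample that
    // preserves its rough distribution. Sketch only; error handling omitted.
    std::vector<double> reservoir_sample(const std::vector<double>& data, std::size_t k,
                                         std::uint64_t seed = 42)
    {
        std::mt19937_64 rng(seed);
        std::vector<double> sample;
        sample.reserve(k);
        for (std::size_t i = 0; i < data.size(); ++i) {
            if (i < k) {
                sample.push_back(data[i]);                  // fill the reservoir first
            } else {
                std::uniform_int_distribution<std::size_t> d(0, i);
                if (d(rng) < k) sample[d(rng) % k] = data[i];
            }
        }
        return sample;
    }

(The replacement step should really reuse a single draw: pick j uniformly in [0, i] and replace sample[j] only if j < k; the point is just that the expensive pass happens once, up front.)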
>The misconception is that data isn’t slow and graphics aren’t fast, at least relative to one another.
depends on the data/graphics. Modern commercial GPUs are beefy (even the integrated ones), and I suspect tools like matplotlib aren't even tapping into a fraction of a percent of that power.
But at the same time, 5 billion raw draw calls will bring even a decent gaming GPU to its knees, at least for responsive, real-time applications. The trick is to first understand your data (e.g. that 5 billion triangles are useless on a monitor that has 1-4 million pixels). As you said, even Nanite isn't truly trying to process a trillion triangles raw.
The steps from that understanding to a good enough approximation are indeed some dark magic.
there are indeed several reasons, but in this case I suspect the problem isn't even with matplotlib (well, not most of it). You mention traces that are GBs in size, and that's the first big problem.
For reference, the newest Zelda game takes up a total of 18.2GB of storage, and of course Zelda isn't trying to audit the entire game on every load. Games spend a lot of time making sure their assets are lean, and in the end everything is shipped in some binary format to further reduce its impact on the game. Trace files that focus on human readability give up that compactness.
So regardless of how fast the graphical plotting capabilities are, I imagine such an app is CPU-bound just from parsing that trace data. That'd be the first place I'd look to optimize (you know, without having properly profiled your app; the kind of answer I'd give in an interview setting).
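As a rough illustration of why the format matters (a hypothetical record layout, not Verilator's actual trace format): a packed binary entry is ~16 bytes and loads with plain freads, versus scanning and converting a 60-80 character text line per entry.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Hypothetical packed trace record: fixed 16 bytes per entry.
    #pragma pack(push, 1)
    struct TraceEntry {
        std::uint64_t cycle;    // simulation cycle
        std::uint32_t pc;       // program counter
        std::uint32_t opcode;   // raw instruction bits
    };
    #pragma pack(pop)

    std::vector<TraceEntry> load_binary_trace(const char* path) {
        std::vector<TraceEntry> entries;
        if (std::FILE* f = std::fopen(path, "rb")) {
            TraceEntry e;
            // No text scanning, no per-field string-to-int conversion.
            while (std::fread(&e, sizeof(e), 1, f) == 1)
                entries.push_back(e);
            std::fclose(f);
        }
        return entries;
    }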
You can't draw that data on the screen in any meaningful manner without zooming.
At various zoom levels, what you actually want is a low-pass filter over a slice of the data, such that the filter's output matches the display. E.g. I can display 1080 pixels across, but there are 1 billion points. Ok... the meaningful data per pixel is limited.
Optionally you could do what some hardware displays do and show a sort of fuzzy intensity gradient at each x-axis pixel, spread over the y points, to represent the number of actual samples at that position.
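Something like this, as a sketch with made-up names: bin the samples into a pixel-sized grid of counts, then map count to brightness when drawing.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Counts per (x, y) pixel bin; one sample per x step is assumed here.
    std::vector<std::uint32_t> density_image(const std::vector<float>& y,
                                             float y_min, float y_max,
                                             std::size_t pixels_x, std::size_t pixels_y)
    {
        std::vector<std::uint32_t> counts(pixels_x * pixels_y, 0);
        if (y.empty() || pixels_x == 0 || pixels_y == 0 || y_max <= y_min) return counts;
        for (std::size_t i = 0; i < y.size(); ++i) {
            const std::size_t px = i * pixels_x / y.size();          // which pixel column
            float t = (y[i] - y_min) / (y_max - y_min);              // normalize into [0, 1]
            t = t < 0.f ? 0.f : (t > 1.f ? 1.f : t);                 // clamp outliers
            const std::size_t py = static_cast<std::size_t>(t * (pixels_y - 1));
            counts[py * pixels_x + px] += 1;                         // one more sample hit this pixel
        }
        return counts;
    }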
It's kind of crazy, honestly. There are many options for displaying billions of points across a 1080-pixel-wide display, but none of them really show you the reality, only some subset of the information that's actually there.
There exist scientific visualization tools that go far beyond what you can visualize in the gaming domain; e.g. some applications are deployed on supercomputers so that you can change parameters interactively during the visualization.
Games are highly specialized applications, so you should also be comparing them to highly specialized applications in the scientific domain.
It's worth noting that, as far as I know, Nanite involves preprocessing the mesh to convert it into Nanite's special format. You could do the same thing with data easily. The issue is taking a large batch of data you've never seen before and displaying it efficiently in real time, which Nanite doesn't do either.
I'm not sure I fully understand why some projects so proudly avoid using standard library features and e.g. favour a more manual approach to memory/data management. What are some good key reasons for this?
> avoids ... C++ headers
Same question for this point - but is it the same reason as for avoiding STL containers or something else?
Sprinkling small mallocs and frees everywhere, where every object has its own individual lifetime to be managed, is very often not the best approach. Using STL containers often means using that prolific allocation strategy; it just makes the numerous small allocations implicit. A region-based approach is often far simpler conceptually, and more performant. This is a decent article on the topic: https://www.rfleury.com/p/untangling-lifetimes-the-arena-all...
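A minimal sketch of the idea (my own made-up names, not the article's code): grab one big block up front, hand out bump-pointer allocations from it, and release the whole region at once when the group of objects dies together.

    #include <cstddef>
    #include <cstdlib>

    struct Arena {
        unsigned char* base;
        std::size_t    capacity;
        std::size_t    used = 0;

        explicit Arena(std::size_t bytes)
            : base(static_cast<unsigned char*>(std::malloc(bytes))), capacity(bytes) {}
        ~Arena() { std::free(base); }    // one free covers every object in the region

        void* alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
            std::size_t p = (used + align - 1) & ~(align - 1);   // round up to alignment
            if (!base || p + bytes > capacity) return nullptr;   // real code would grow or chain
            used = p + bytes;
            return base + p;
        }
        void reset() { used = 0; }       // reuse the region, e.g. once per frame
    };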
C++ is a big language and there is a lot of diversity of opinion in what a sensible subset of the language is, with numerous concerns influencing those decisions; one common one is compile times. Using C-only headers generally ensures a much lower ceiling for compile times compared to using typical C++ headers. And C++ headers are slightly more likely to use the prolific allocation strategy from above. As such, C++ headers require far more scrutiny before acceptance, in my view.
Historically, in high-performance resource-constrained applications (eg: console video games), developers would use a "stripped down" version of C++ (eg: no RAII, no exceptions), which often included avoiding STL. It's less true these days, but still somewhat. In truth, STL will never be fully catering to these uses; that's not its goal.
One reason STL was avoided is that control over memory allocation was quite poor. std::allocator was somewhat conceptually broken for a long time, only recently remedied in C++14 and 17. So, there's still about 20 years of code out there from the days of old bad C++ (:
Another reason is that early STL implementations were poor, and not well-suited to high-performance use cases. Again, this has gotten better (some), but the legacy lives on.
So, if you are designing a library to be used in video games (such as in TFA), and want wide adoption, then avoiding STL, RAII, exceptions is still generally a good idea, IMO.
One important reason is compile time. C++ stdlib headers are so entangled with each other that each one brings in tens of thousands of lines of complex template code, which may increase the compilation time for a single source file to seconds, and it's getting worse with each new C++ version.
Also, it's not like writing your own growable array is rocket science, and a simple, specialized version can be done in a few dozen to a few hundred lines of code.
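For example, something like this covers most of what a game codebase actually needs from a dynamic array (a hypothetical sketch, not the library's actual container):

    #include <cstddef>
    #include <cstdlib>

    // Growable array for trivially copyable T: no exceptions, no iterators,
    // just the handful of operations used 99% of the time. Error handling omitted.
    template <typename T>
    struct Array {
        T*          data = nullptr;
        std::size_t len  = 0;
        std::size_t cap  = 0;

        void push(const T& v) {
            if (len == cap) {
                cap  = cap ? cap * 2 : 16;    // double the capacity on growth
                data = static_cast<T*>(std::realloc(data, cap * sizeof(T)));
            }
            data[len++] = v;
        }
        T&   operator[](std::size_t i)       { return data[i]; }
        void free_all() { std::free(data); data = nullptr; len = cap = 0; }
    };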
STL also has some kinda broken features, e.g. it's hard to write generic code over std::vector<T> because everything is different when T = bool, and you get confusing errors if you happen to hit that accidentally. std::vector<char> is also a bit broken, though that's the language's fault rather than the STL's.
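A concrete example of the vector<bool> trap, in case it's not obvious:

    #include <vector>

    // std::vector<bool> is a packed specialization: operator[] returns a proxy
    // object rather than a bool&, so code that compiles for every other T breaks.
    int main() {
        std::vector<int>  vi{1, 2, 3};
        int& ri = vi[0];          // fine: a real reference into the vector
        ri = 7;

        std::vector<bool> vb{true, false};
        // bool& rb = vb[0];      // does not compile: vb[0] is std::vector<bool>::reference
        auto proxy = vb[0];       // compiles, but it's a proxy tied to the vector's storage
        proxy = false;            // writes through to vb[0]
        return ri;
    }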
C++ compiler inlining capabilities have had to become magically powerful to keep up with the layers of small functions under every STL container method. Meanwhile, in some applications (games) high performance in debug builds is necessary.
Same for headers, compile times matter. I'm very much looking forward to widespread, fully modularized standard libraries.
When working in Unity, I was honestly shocked how much performance gain I got out of a small app by moving away from LINQ to using custom allocated containers. That convenience has a cost, and games can't afford such a cost.
I’m using this from Julia and both the user and developer experience is great.
It’s much more limited than publication style plotting libraries but the instant 60fps reactivity is amazing.
This is awesome. ImGui is the only native UI library I've seen that makes intuitive sense to me. Other libraries are all bogged down in their own frameworks rather than just providing a simple interface for creating UI elements. I think most ImGui users are in game development, so I wonder what kinds of uses this would be good for. Maybe some debugging information, but personally I've never needed aggregate statistics for debugging info.
IMGUI is great but I always saw it as a more dev-backend debugging tool than a serious production UI framework. Hooks in easily and draws some very convenient visuals.
That's great for bootstrapping, not great for scalability.
Yeah, I just wish there was a native UI library that was as easy as ImGui. Everything else is overly complicated. The only other UI framework I know of that is comparably simple is HTML and CSS.
HTML and CSS are very forgiving for non-compliant code. I can morph the webpage at will by using <div>s everywhere and then manipulating the appearance in CSS. If I want to add data to the dom later on, I have a billion different ways to do so using javascript.
Compared to, say, XAML, where some elements can only have certain types of children, auto-populating data has to go in specific containers, and all the XAML tags map to actual C# classes, it's a much stricter, far less forgiving, and far less flexible syntax.
Years ago I wrote an oscilloscope-style monitor for an ATI load cell using ImGUI in the early days. This library would have been extremely useful. Kudos!
There's no caching. Most people advocating or theorizing about caching haven't measured the performance of Dear ImGui or ImPlot. Computers are incredibly powerful when you don't waste their resources.
Yeah I've experimented with caching vertex arrays in a scenario very similar to ImGui. It's gotta be faster when there's thousands of frames with zero changes, right?
Turns out, not really. It's very easy to make performance worse like this. I'm sure you can find scenarios where it makes sense, but in general it's just not a big deal.
If you're writing the low-level GPU handling code, the easy cases are the 100% constant data case and the 100% dynamic data case. Anything else pretty much counts as 100% dynamic, and it's a pain to deal with because you also have to keep a copy of the original data: you don't know which bits won't get overwritten, and the contents of those areas need saving regardless.
This can work out as a net positive, because copying memory can be cheap compared to recalculating the data it contains. But if your API requires the caller to resubmit all the data on every frame, it makes no difference.
(Which is not to suggest there's anything wrong with requiring the caller to resubmit all the data, if it makes life easier for them. The worst case needs to perform well and one way of making sure it does is to make sure it's exercised all the time.)
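For illustration, the two easy cases look roughly like this with plain OpenGL usage hints (assumes a GL 1.5+ context and function loader are already set up; this is a sketch, not the library's actual backend code):

    #include <GL/gl.h>   // header and loader vary by platform

    // Uploaded once, never touched again: the driver can park it in fast VRAM.
    void upload_static_mesh(GLuint buf, const float* verts, GLsizeiptr bytes) {
        glBindBuffer(GL_ARRAY_BUFFER, buf);
        glBufferData(GL_ARRAY_BUFFER, bytes, verts, GL_STATIC_DRAW);
    }

    // Resubmitted every frame (the ImGui/ImPlot style): orphan the old storage
    // and hand the driver a fresh copy, so it never has to guess what changed.
    void upload_per_frame_vertices(GLuint buf, const float* verts, GLsizeiptr bytes) {
        glBindBuffer(GL_ARRAY_BUFFER, buf);
        glBufferData(GL_ARRAY_BUFFER, bytes, nullptr, GL_DYNAMIC_DRAW);   // orphan
        glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);                // upload new data
    }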
My scientific programmers, with only beginner-to-intermediate Python knowledge, have struggled with visualization libraries. I might recommend the Python bindings for this to them.