FBG – Lightweight C Linux framebuffer graphics with parallelism

codedokode · on June 18, 2018

It looks very inefficient. For example, in fbg_draw() they have this code:

https://github.com/grz0zrg/fbg/blob/master/src/fbgraphics.c#...

> memcpy(fbg->buffer, fbg->disp_buffer, fbg->size);

So they don't flip buffers and don't even use video memory for them; they just copy data from a buffer in main memory into a memory-mapped framebuffer. This must be slow. Also, it doesn't wait for vblank and therefore doesn't protect against tearing, even though the description says it has "double buffering".

As a side note, I remember the framebuffer being very slow on a computer I had. Without the proprietary video card driver, both Windows and Linux had trouble even scrolling a window; it was very laggy. Why was unaccelerated VGA (or VESA?) mode so slow, I wonder?

smallstepforman · on June 18, 2018

VESA is surprisingly good with Haiku. Since most GPU manufacturers have ignored Haiku, the community has spent some time getting the VESA driver as good as it can get, and it is very respectable. Most users will not be aware that they are running VESA, since Haiku is still less laggy than its contemporaries.

zlynx · on June 18, 2018

Haiku is also one of the best performers in a virtual Spice display. It feels faster and more responsive than a virtualized Fedora or Ubuntu.

waddlesplash · on June 18, 2018

Honestly we haven't spent a massive amount of time on VESA, or really much optimization in general. It's not like there are a ton of magic tricks in there to make it fast or something. Somehow everyone else still manages to be slower than we are, though...

ajross · on June 18, 2018

The memory-mapped framebuffer is video memory; that's why you have to use a driver API to map it. Though on many systems it ends up just being system DRAM with different caching settings anyway.

And this isn't something you'd code a game with, but you'd be surprised at how high modern memory bandwidth is.
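For reference, a minimal sketch of how a program reaches that memory through the Linux fbdev API: open the device, query it with the `FBIOGET_FSCREENINFO` and `FBIOGET_VSCREENINFO` ioctls from `<linux/fb.h>`, then `mmap` the `smem_len` bytes the driver reports. The `fb_map` struct and the function names are hypothetical illustrations, not FBG's API, and the device path (typically `/dev/fb0`) is left to the caller.

```c
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper struct for this sketch, not part of FBG. */
struct fb_map {
    int fd;
    uint8_t *mem;                    /* mmap'd framebuffer memory */
    size_t len;                      /* bytes mapped (finfo.smem_len) */
    struct fb_fix_screeninfo finfo;
    struct fb_var_screeninfo vinfo;
};

/* Byte offset of pixel (x, y). line_length may exceed
   xres * bytes_per_pixel because rows can be padded. */
size_t fb_pixel_offset(uint32_t x, uint32_t y,
                       uint32_t line_length, uint32_t bits_per_pixel)
{
    return (size_t)y * line_length + (size_t)x * (bits_per_pixel / 8);
}

/* Open a framebuffer device and map its memory. Returns 0 on success. */
int fb_map_open(struct fb_map *fb, const char *path)
{
    fb->fd = open(path, O_RDWR);
    if (fb->fd < 0)
        return -1;
    if (ioctl(fb->fd, FBIOGET_FSCREENINFO, &fb->finfo) < 0 ||
        ioctl(fb->fd, FBIOGET_VSCREENINFO, &fb->vinfo) < 0) {
        close(fb->fd);
        return -1;
    }
    fb->len = fb->finfo.smem_len;
    void *p = mmap(NULL, fb->len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fb->fd, 0);
    if (p == MAP_FAILED) {
        close(fb->fd);
        return -1;
    }
    fb->mem = p;
    return 0;
}

void fb_map_close(struct fb_map *fb)
{
    munmap(fb->mem, fb->len);
    close(fb->fd);
}
```

`line_length` comes from `finfo.line_length`; the offset helper takes it instead of the visible width because the two can differ.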

Vendan · on June 18, 2018

Yeah, but the Linux framebuffer can do a very efficient swap... Just make the "virtual buffer" twice as tall, and then switch which "half" is "active" with the pan-display ioctl (FBIOPAN_DISPLAY)... That's why memcpy is such a bad idea: there is a very cheap method that can prevent tearing and such.
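A minimal sketch of that trick, assuming a driver that honours `yres_virtual`, implements panning, and has enough video memory for two pages (not all do): enlarge the virtual screen with `FBIOPUT_VSCREENINFO`, then flip by setting `yoffset` and calling `FBIOPAN_DISPLAY`. The optional `FBIO_WAITFORVSYNC` call is where a vblank check goes. The function names are hypothetical.

```c
#include <linux/fb.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* First row of page 0 or page 1 inside a virtual screen 2 * yres tall. */
uint32_t page_yoffset(int page, uint32_t yres)
{
    return page ? yres : 0;
}

/* Request a virtual screen twice as tall as the visible one.
   Returns 0 on success; some drivers reject or silently shrink it. */
int fb_enable_double_height(int fd, struct fb_var_screeninfo *vinfo)
{
    if (ioctl(fd, FBIOGET_VSCREENINFO, vinfo) < 0)
        return -1;
    vinfo->yres_virtual = vinfo->yres * 2;
    vinfo->yoffset = 0;
    if (ioctl(fd, FBIOPUT_VSCREENINFO, vinfo) < 0)
        return -1;
    /* Re-read: the driver may adjust the values it accepted. */
    if (ioctl(fd, FBIOGET_VSCREENINFO, vinfo) < 0)
        return -1;
    return vinfo->yres_virtual >= vinfo->yres * 2 ? 0 : -1;
}

/* Show `page` by panning, then wait for vertical sync so that, on drivers
   that latch the pan at vblank, the page just hidden is no longer being
   scanned out when the caller starts writing to it. FBIO_WAITFORVSYNC is
   not implemented by every driver, so its failure is ignored. */
int fb_flip(int fd, struct fb_var_screeninfo *vinfo, int page)
{
    uint32_t crtc = 0;
    vinfo->yoffset = page_yoffset(page, vinfo->yres);
    if (ioctl(fd, FBIOPAN_DISPLAY, vinfo) < 0)
        return -1;
    (void)ioctl(fd, FBIO_WAITFORVSYNC, &crtc);
    return 0;
}
```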

onirom · on June 18, 2018

I am aware of those; this might land in a future update.

The library's initial focus was just to get multiple cores working on graphics tasks...

pjmlp · on June 18, 2018

Ah, memories of swapping card registers for double buffering on MS-DOS.

Here is a tutorial about it that I just googled.

http://www.brackeen.com/vga/unchain.html

v_lisivka · on June 18, 2018

With CRT monitors, it was possible to swap buffers in the middle of a frame to display different buffers in different areas of the screen. This was used on 16-bit systems for fast scrolling and to display a status line.

pmarin · on June 18, 2018

> As a side note, I remember the framebuffer being very slow on a computer I had. Without the proprietary video card driver, both Windows and Linux had trouble even scrolling a window; it was very laggy. Why was unaccelerated VGA (or VESA?) mode so slow, I wonder?

FB on modern computers works surprisingly well, except for playing video or 3D graphics. On BSD systems it is sometimes the only option.

userbinator · on June 18, 2018

> Without the proprietary video card driver, both Windows and Linux had trouble even scrolling a window; it was very laggy. Why was unaccelerated VGA (or VESA?) mode so slow, I wonder?

I haven't looked into the details on Linux, but on Windows the "default" VESA/VGA mode uses the VGA BIOS, whose code is run in 16-bit VM86 mode:

https://wiki.osdev.org/Virtual_8086_Mode#Usage

http://nuclear.mutantstargoat.com/articles/pcmetal/pcmetal04...

The 16-bit code is not optimised for speed (is your framebuffer more than 64K? It's probably doing bank switching and only copying 64K at a time), and the video BIOS itself may reside behind a slow serial interface on the GPU[1], so executing from it is very slow.

[1] https://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bu...
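To make the bank-switching point concrete: with a 64 KiB window, a frame larger than 64 KiB has to be copied one window at a time, with a bank switch in between. The sketch below assumes the bank granularity equals the window size (real VESA modes can use a smaller granularity) and abstracts the switch as a hypothetical callback; on real hardware it would be something like the VBE display-window-control call (function 4F05h), and every bank would appear at the same window address. The test maps each bank to a separate slice of an array only so the result can be checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 65536u  /* assumed 64 KiB window, granularity == size */

/* Hypothetical bank-switch hook: selects bank `n` and returns a pointer
   to the window through which that bank is visible. */
typedef uint8_t *(*set_bank_fn)(void *ctx, unsigned n);

/* Copy `len` bytes of a linear frame into banked video memory, at most
   BANK_SIZE bytes per bank. Returns the number of bank switches made. */
unsigned banked_copy(const uint8_t *src, size_t len,
                     set_bank_fn set_bank, void *ctx)
{
    unsigned switches = 0;
    for (size_t off = 0; off < len; off += BANK_SIZE) {
        size_t chunk = len - off < BANK_SIZE ? len - off : BANK_SIZE;
        uint8_t *window = set_bank(ctx, (unsigned)(off / BANK_SIZE));
        memcpy(window, src + off, chunk);
        switches++;
    }
    return switches;
}
```

A 640x480 frame at 8 bpp is 307,200 bytes, so under these assumptions it costs five bank switches per frame: four full windows plus a partial one.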

vsampath · on June 18, 2018

On a modern system (Windows 7+, Linux with a GRUB 2.0 bootloader) all of the drawing is done without the VBIOS; the VBIOS is just used to set up the mode.

The VBIOS is a 16-bit x86 application, but on these systems it does not run in vm86 mode, because vm86 is not supported in 64-bit long mode.

When the VBIOS does execute, it never runs from flash: the SBIOS copies it into system memory first.

Source: I’ve worked on VBIOS

raverbashing · on June 18, 2018

> and don't even use video memory for them; they just copy data from a buffer in main memory into a memory-mapped framebuffer. This must be slow

No, it is as fast as it gets and it's the correct way of doing it (in fb at least).

Vendan · on June 18, 2018

The Linux framebuffer is perfectly capable of using 2 pages and "flipping" without memcpy. Just have it initialize a double-height virtual screen, and then use the pan-display ioctl (FBIOPAN_DISPLAY) to swap between them.

onirom · on June 19, 2018

Tried to implement page flipping today, and performance is terribly low on the Raspberry Pi because of direct draw calls to the mapped video memory... so you need a single memcpy to avoid many separate writes to the mapped video memory, which results in the initial behavior.

Vendan · on June 20, 2018

Generate a single frame, copy it to the "off page", flip. This prevents tearing.
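That loop might look like the sketch below. It assumes the two pages are stacked in the mapped memory as in a double-height virtual screen, so page `p` starts at byte `p * line_length * yres`. The struct and function names are hypothetical, and the actual pan (`FBIOPAN_DISPLAY` with `yoffset = page * yres`) is left to the caller so the copy logic can be exercised without hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state for a two-page framebuffer. */
struct flip_state {
    uint8_t *fb_mem;      /* mapped memory holding page 0, then page 1 */
    size_t page_bytes;    /* line_length * yres */
    int front;            /* page currently on screen: 0 or 1 */
};

/* Copy a finished frame (rendered in ordinary RAM) into the hidden page
   with one memcpy, mark it as the front page, and return its index so the
   caller can pan to it. The visible page is never written. */
int present_frame(struct flip_state *s, const uint8_t *frame)
{
    int back = 1 - s->front;
    memcpy(s->fb_mem + (size_t)back * s->page_bytes, frame, s->page_bytes);
    s->front = back;
    return back;
}
```

The single memcpy is still there; what the flip adds is that the copy never lands on the page being scanned out.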

onirom · on June 21, 2018

Yes, this is actually how it works. However, the page flipping mechanism still requires a memcpy, so it is about as effective as not using page flipping at all: just memcpy a front buffer to the display and do software flipping by exchanging pointers. The only advantages of page flipping are that it requires fewer operations and prevents tearing; performance-wise, it is roughly the same.
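One reading of the software-flipping alternative, sketched with hypothetical names: two buffers in ordinary RAM, a "flip" that only exchanges pointers, and one memcpy of the front buffer into the mapped framebuffer per frame. Without a vblank wait or panning, that final copy can still tear.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two system-memory buffers: rendering writes `back`, display reads `front`. */
struct soft_buffers {
    uint8_t *front;
    uint8_t *back;
};

/* "Software flip": exchange the pointers; no pixel data moves. */
void soft_swap(struct soft_buffers *b)
{
    uint8_t *t = b->front;
    b->front = b->back;
    b->back = t;
}

/* The one copy into the mapped framebuffer per frame. */
void soft_present(const struct soft_buffers *b, uint8_t *fb_mem, size_t bytes)
{
    memcpy(fb_mem, b->front, bytes);
}
```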

raverbashing · on June 18, 2018

I see your point. You still need to copy the data to the buffer.

But yeah, don't expect smooth animations without a dual buffer.

biscuitNotchips · on June 18, 2018

Sadly, the Linux framebuffer is still inferior to the Windows framebuffer.

I'm running Windows 7 with a 14-year-old gfx card (ATI Radeon 9250, DirectX 8.1) using Windows 2000 drivers in compatibility mode, and apart from some minor tearing everything is perfect.

Same configuration, but this time in Linux on various distros, from the lightest to the heaviest, and the results are the opposite: heavy screen tearing and choppy scrolling everywhere. The weird thing is that I see more graphics features enabled in chrome://gpu on Linux than on Windows with this ancient card, yet overall performance is worse on Linux.

dingdingdang · on June 18, 2018

My experience is exactly the same. I so want a snappy, low-latency desktop experience in Linux, and it is just not happening. Today I run a "lightweight" Ubuntu MATE workstation with an i7 and a dedicated GPU, and it's still worse latency-wise than my Windows XP PC was more than 15 years ago. Not good. Really not good.

ddalex · on June 18, 2018

This is not the experience I'm getting. I'm running Xubuntu Bionic with i3, and the experience is butter smooth; it makes Windows feel laggy and macOS slow as molasses.

On the other hand, there are no nice animations, transitions or 3D effects like shadowing, just pixels on the screen; however, everything updates instantly.

pjmlp · on June 18, 2018

It even appears to have gotten worse with time.

My only Linux system is a netbook with a DX11-class AMD APU. It used to work quite well, but after the AMD driver split, performance was never as good, and getting hardware video decoding back required me to force-enable it.

ChuckMcM · on June 18, 2018

"...produce fullscreen pixels effects easily with non-accelerated framebuffer ... the initial target platform is a Raspberry PI 3B"

Of course, the Raspberry Pi 3 B actually has a GPU, but it is so entangled with crap that someone invests their time in building a sub-par visual experience instead. That is so very sad.

The computer business sucks so much these days.

stefan_ · on June 18, 2018

The RPi GPU and display subsystem have fully open source drivers in Linux and userland (Mesa). If you are using the old blobs, it's out of ignorance.

kokada · on June 18, 2018

Is "parralel" a real word in English? Did the author mean "parallel"?

I am asking because the README.md and even the source code itself use "parralel" everywhere, so this typo seems to be intentional. The author is either using an uncommon spelling of the word parallel, is not a native speaker, or there is some other reason.

onirom · on June 18, 2018

A late-night mistake that is mostly fixed now (and there are some other typos as well; I will check them later on).

Thank you!

kokada · on June 18, 2018

BTW, the project itself seems interesting. However, it may be off-putting for someone to use this library because you need, for example, to write:

    #define FBG_PARRALEL

for it to work, and for some people this typo will be infuriating or distracting. I recommend that the author fix this typo.

webscalist · on June 18, 2018

Someone forgot to enable ML spell check in the linter. With a model trained on big data, the linter should be able to catch this.

yorwba · on June 18, 2018

A simple dictionary-based spell checker would be enough. Even vim ships with one of those.

Immortalin · on June 18, 2018

I think the parent was being sarcastic.

codetrotter · on June 18, 2018

Someone send him a PR.

quadcore · on June 18, 2018

A suggestion for the benchmarks: add one where the screen is filled with a solid color. I was getting 175 fps at 1280x768 with a 1.5 GHz AMD back in 2003.

yason · on June 18, 2018

So, such a microbenchmark marks the baseline for how fast you could get via framebuffer.

That's about 5.7 ms per frame. If you spend a bit less than two thirds of the 16.7 ms frame budget on rendering, you could still get 60 fps. But that's still just crazy slow.

The 175 fps at that resolution comes down to about 170-690 MB/s depending on display depth (8 to 32 bits per pixel). Peak memory transfer rates were worse than that during the 1990s, reached several hundred MB/s at regular speeds in the 2000s, and are measured in GB/s these days.

As long as we can reduce framebuffer access to shared memory (to eliminate legacy methods of transferring pixels, if any), there's no reason we couldn't do significantly better. Basically, the display controller reads the memory while the CPU writes it, so the two share some of the bandwidth, but the speeds should still be so high that there's absolutely no hardware reason to ever see visual jerkiness in framebuffer graphics.
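The arithmetic behind the frame-time and bandwidth figures in this comment, as two small helpers: bytes per second = width × height × bytes per pixel × frames per second, and frame time = 1/fps. The function names are illustrative only.

```c
#include <stdint.h>

/* Bytes per second needed to rewrite a full w x h frame `fps` times a
   second; bits_per_pixel is rounded up to whole bytes (15 bpp -> 2). */
uint64_t fill_bandwidth(uint32_t w, uint32_t h, uint32_t bits_per_pixel,
                        uint32_t fps)
{
    uint64_t bytes_per_pixel = (bits_per_pixel + 7u) / 8u;
    return (uint64_t)w * h * bytes_per_pixel * fps;
}

/* Time available per frame, in milliseconds. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}
```

For 1280x768 at 175 fps this gives 172,032,000 B/s at 8 bpp and 688,128,000 B/s at 32 bpp, and a frame lasts about 5.7 ms, against the 16.7 ms available per frame at 60 fps.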

quadcore · on June 18, 2018

I intentionally kept the 'for y for x' loops.

onirom · on June 18, 2018

Did a simple clearing test on a 4-core Raspberry Pi 3B at 1280x768 resolution: I get 30 FPS.

However, a single-core memset (still on the RPi 3B) is fast for this case: 235 FPS.
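For comparison, the two approaches under discussion, sketched for a 32-bit framebuffer whose row pitch (`stride`, in pixels) may exceed the width: a plain 'for y for x' fill, and fills that lean on the C library's `memcpy`/`memset`. `memset` alone can only produce colours whose four bytes are identical (black, white, some greys), hence the row-copy variant. The function names are hypothetical, not taken from FBG.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 'for y for x' solid fill, one pixel at a time. */
void fill_loops(uint32_t *px, size_t w, size_t h, size_t stride,
                uint32_t color)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            px[y * stride + x] = color;
}

/* Fill the first row by hand, then memcpy it into every other row.
   Works for any colour; padding beyond `w` in each row is untouched. */
void fill_rows(uint32_t *px, size_t w, size_t h, size_t stride,
               uint32_t color)
{
    if (h == 0)
        return;
    for (size_t x = 0; x < w; x++)
        px[x] = color;
    for (size_t y = 1; y < h; y++)
        memcpy(px + y * stride, px, w * sizeof *px);
}

/* Clear to black with a single memset over all rows, padding included. */
void clear_black(uint32_t *px, size_t h, size_t stride)
{
    memset(px, 0, h * stride * sizeof *px);
}
```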

canadaduane · on June 18, 2018

Raspberry Pi is a good example of where this would be quite useful (as per the readme). Nice work.

s_ngularity · on June 18, 2018

The Raspberry Pi has a GPU. Why wouldn't you use accelerated graphics?

ponchotek · on June 18, 2018

Because you need to use proprietary graphics drivers for acceleration. Another reason could be that you want to write more portable code that can run on your RPi during the early stages of development and on a board with no GPU later on.

seba_dos1 · on June 18, 2018

Actually, you don't. The Raspberry Pi is still a proprietary platform (you need a closed firmware blob on your SD card for it to boot), but an open GPU driver is now in Mesa and in some cases works better than the old proprietary one.

fsloth · on June 18, 2018

"Because you need to use proprietary graphics drivers for acceleration."

Is there some specific scenario where this would be a problem?

jwilk · on June 18, 2018

Wait, is 320×240 at 42 fps supposed to be impressive?

AHTERIX5000 · on June 18, 2018

Yeah, that is actually extremely slow, unless the time goes into calculating buffer contents, in which case it isn't really a benchmark of an FB implementation but of something else.

onirom · on June 18, 2018

The time goes into calculating a fullscreen pixel-based effect (so yeah, buffer contents). The point of this library is not the FB implementation, which is a simple, bare-metal one.

The point is to render graphics content consistently across multiple CPU cores while remaining lightweight enough to be used with your own implementation. This might change, but that was the initial goal.
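A minimal sketch of that idea with POSIX threads, assuming the simplest split: each thread renders a contiguous band of rows, so no two threads write the same pixels and no locking is needed. `effect()` is a stand-in for whatever per-pixel effect is being computed, and all names are hypothetical; FBG's actual scheduling may differ. Threads are created per call only for brevity (a persistent worker pool avoids that cost), and older toolchains need `-pthread` to link.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread job: one horizontal band of the frame. */
struct band_job {
    uint32_t *px;
    size_t w, stride;
    size_t y0, y1;      /* rows [y0, y1) */
    uint32_t frame;     /* frame counter fed to the effect */
};

/* Stand-in pixel effect: any pure function of (x, y, frame) works. */
static uint32_t effect(size_t x, size_t y, uint32_t frame)
{
    return (uint32_t)((x ^ y) + frame) & 0xFFu;
}

static void *render_band(void *arg)
{
    struct band_job *j = arg;
    for (size_t y = j->y0; y < j->y1; y++)
        for (size_t x = 0; x < j->w; x++)
            j->px[y * j->stride + x] = effect(x, y, j->frame);
    return NULL;
}

/* Split the rows into `nthreads` (1..16) non-overlapping bands and render
   them in parallel. Returns 0 on success, -1 on bad input or thread error. */
int render_parallel(uint32_t *px, size_t w, size_t h, size_t stride,
                    uint32_t frame, unsigned nthreads)
{
    pthread_t tid[16];
    struct band_job jobs[16];
    if (nthreads == 0 || nthreads > 16)
        return -1;
    for (unsigned i = 0; i < nthreads; i++) {
        jobs[i] = (struct band_job){ px, w, stride,
                                     h * i / nthreads,
                                     h * (i + 1) / nthreads, frame };
        if (pthread_create(&tid[i], NULL, render_band, &jobs[i]) != 0) {
            for (unsigned k = 0; k < i; k++)
                pthread_join(tid[k], NULL);
            return -1;
        }
    }
    for (unsigned i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```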

aparashk · on June 18, 2018

I would have thought the vector capabilities of ARM and Intel architectures would have been useful here, in addition to any multithreading. They can be used to add fast GPU-like functions, e.g. alpha blending.
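As an illustration of the kind of per-pixel work that benefits, here is alpha blending for 0x00RRGGBB pixels written with a portable "SIMD within a register" trick: red and blue share one 32-bit multiply, green gets another. This is a scalar sketch, not NEON or SSE code; real vector intrinsics would blend several pixels per instruction. Alpha is assumed to run from 0 to 256, with 256 meaning a fully opaque source, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend src over dst with alpha in 0..256. Each 8-bit channel product
   fits in its own 16-bit lane, so R and B never carry into each other.
   The top (X/alpha) byte of the result is cleared. */
uint32_t blend_pixel(uint32_t dst, uint32_t src, uint32_t alpha)
{
    uint32_t inv = 256u - alpha;
    uint32_t rb = ((src & 0x00FF00FFu) * alpha +
                   (dst & 0x00FF00FFu) * inv) >> 8;
    uint32_t g  = ((src & 0x0000FF00u) * alpha +
                   (dst & 0x0000FF00u) * inv) >> 8;
    return (rb & 0x00FF00FFu) | (g & 0x0000FF00u);
}

/* Blend a row of n pixels in place; compilers may auto-vectorise this. */
void blend_row(uint32_t *dst, const uint32_t *src, size_t n, uint32_t alpha)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = blend_pixel(dst[i], src[i], alpha);
}
```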

gameswithgo · on June 18, 2018

Handmade Hero did a software renderer using SIMD instructions with very good results:

https://www.twitch.tv/videos/8349645

MayeulC · on June 18, 2018

How does it compare to llvmpipe, for instance? I expect llvmpipe to be able to use NEON (the vector extension on ARM) instructions, as well as similar SIMD mechanisms on other platforms.

chme · on June 18, 2018

Interesting. But in a modern Linux graphics library I would expect libdrm to be used for drawing instead of the /dev/fb interface.

CharlesMerriam2 · on June 18, 2018

Hmm.. 1. This is being announced on GitHub, a Microsoft entity; 2. there are typos everywhere in the documentation; 3. the code is generally undocumented.

This is a single author's demonstration piece. Odd that it made the top five items on Hacker News.

saagarjha · on June 18, 2018

You mean, like essentially every other personal project that ends up on Hacker News because it's cool?