So generally Wuffs is great and you should use it to decode your PNGs. There are some downsides: not all of the obscure bit depths and formats that PNG supports are loaded as-is, some are converted to more standard formats.
The "mango" lib [1] claims to be even faster for PNGs. Actively maintained but doesn't have as much buzz, I think the devs haven't advertised it as much on places like this.
libpng is reasonably fast, and has SIMD optimizations. Make sure to compile it with a modern CPU target.
The biggest bottleneck in PNG decoding is zlib, which is not part of libpng. There are faster inflate implementations, but nowhere near 5x.
The second slowest thing is unfiltering, but it takes only 10-20% of the decoding time, so even a lightspeed implementation would make little difference.
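To make the two stages concrete, here is a minimal Python sketch of what a PNG decoder does with the IDAT payload: inflate it with zlib, then undo the per-scanline filters. It assumes a non-interlaced image with 8 bits per sample, and the function names (`paeth`, `unfilter`, `decode_scanlines`) are made up for illustration rather than taken from libpng or Wuffs.

```python
import zlib


def paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor: return whichever of left (a), up (b), upper-left (c)
    is closest to a + b - c, with ties broken in that order."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter(filtered: bytes, width: int, height: int, bpp: int) -> bytes:
    """Undo PNG scanline filters (assumes non-interlaced, 8 bits per sample).

    Each row is one filter-type byte followed by width * bpp data bytes.
    """
    stride = width * bpp
    if len(filtered) != height * (stride + 1):
        raise ValueError("unexpected amount of scanline data")
    out = bytearray()
    prev = bytearray(stride)  # the row above the first row counts as zeros
    pos = 0
    for _ in range(height):
        ftype = filtered[pos]
        row = bytearray(filtered[pos + 1 : pos + 1 + stride])
        pos += 1 + stride
        for i in range(stride):
            a = row[i - bpp] if i >= bpp else 0   # left, already reconstructed
            b = prev[i]                           # up
            c = prev[i - bpp] if i >= bpp else 0  # upper-left
            if ftype == 0:      # None
                pred = 0
            elif ftype == 1:    # Sub
                pred = a
            elif ftype == 2:    # Up
                pred = b
            elif ftype == 3:    # Average
                pred = (a + b) // 2
            elif ftype == 4:    # Paeth
                pred = paeth(a, b, c)
            else:
                raise ValueError(f"unknown filter type {ftype}")
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)


def decode_scanlines(idat: bytes, width: int, height: int, bpp: int) -> bytes:
    """Stage 1: zlib inflate. Stage 2: unfilter."""
    return unfilter(zlib.decompress(idat), width, height, bpp)


# 2x2 grayscale image: row 0 uses the Sub filter (1), row 1 uses Up (2).
filtered = bytes([1, 10, 10, 2, 5, 10])
pixels = decode_scanlines(zlib.compress(filtered), width=2, height=2, bpp=1)
```

The unfilter loop is a handful of byte additions per sample, which fits the claim that most of the time goes to inflate.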
There is a possibility of a 10x difference when encoding, but that's not because libpng is slow; it's because it's possible to apply worse compression, and there are dedicated crappy-but-very-fast encoders.
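The encoding tradeoff can be seen with nothing more than zlib's compression level. The sketch below uses Python's standard `zlib` module and a synthetic, highly repetitive buffer standing in for filtered scanlines. The helper name `compress_with_level` is hypothetical, and the timings depend entirely on the machine and the data, so none are quoted here.

```python
import time
import zlib


def compress_with_level(data: bytes, level: int):
    """Return (compressed_bytes, elapsed_seconds) for one zlib level (0-9)."""
    start = time.perf_counter()
    out = zlib.compress(data, level)
    return out, time.perf_counter() - start


# Stand-in for filtered scanline data: very repetitive, so easy to compress.
sample = bytes(range(256)) * 400

# Level 0 stores the data uncompressed, 1 is fastest, 9 tries hardest.
results = {level: compress_with_level(sample, level) for level in (0, 1, 9)}
for level, (out, seconds) in results.items():
    print(f"level {level}: {len(out)} bytes in {seconds * 1e3:.2f} ms")
```

Every level produces a stream that inflates back to the same bytes, so a fast encoder that compresses worse still yields a valid PNG; it is just bigger.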
This is one of my favorite attempts at better programming language safety, because it compiles down to C that can then be shipped like normal C, so you don't get the ecosystem friction you get with, e.g., Rust.
C has a lot of problems as a compilation target as well, from surprising UB (e.g. signed integer overflow) to debugging problems (e.g. #line is woefully inadequate compared to the ability to emit DWARF DIEs) to the inconvenience of setting up a toolchain for end users. To its credit, Wuffs is one of the better projects that compiles to C, because it targets a very restricted domain. But, in general, don't write programming languages that compile to C.
For many of us, making a compile-to-C language is many times more feasible than using something like LLVM. I'm not saying it's great, mind you, but it's probably the best thing available without a runtime.
For debugging, I believe you can generate your own source maps and use gdb as a backend to talk with your custom debugger.
Of course C sucks, but since everything under the sun uses it, there's unique value in being able to make it safer without putting a whole new compiler in the process for users. Remember that time the cryptography library in Python decided to add Rust? We could have avoided all that pain with Wuffs.
It’s an interesting idea for sure, but it isn’t a general-purpose language, so the problem domains it can address are very different from what Rust is targeting.
Nigel has said that emitting "unsafe" Rust is a reasonable thing for a hypothetical WUFFS 1.0 to be able to do as an alternative to C. As with good "unsafe" Rust written by humans WUFFS would know exactly why what it's doing is fine, it's just that the Rust compiler can't necessarily see that, hence the need to label it "unsafe".
Today C makes most sense given the WUFFS language is still in flux.
What would be the primary benefit of emitting Rust rather than C? Both would be considered safe (assuming Wuffs generates correct code), and Rust could access the C code via FFI. Is there something I’m missing?
I expect that the Rust emitted by a hypothetical future WUFFS transpiler would be much easier to just drop into an existing Rust project than some C via a C FFI.
It's common for C libraries that do get wrapped today (e.g. openssl) to have a two phase wrapping, a -sys crate which turns the C into Rust C FFI and then another crate to turn the Rust C FFI into something actually palatable to ordinary people.
Nominally, it can use unsafe to elide bounds checks that it has proved are unnecessary within the constraints of Wuffs, which is what it does for C (plus the language is built to translate more easily to vectorized code than something like LLVM can manage for general-purpose languages).
So basically higher performance.
FFI nominally has a runtime and compile time cost - whether that matters for you in particular will depend on your needs, but being able to publish a very simple crate without a build.rs to manage can have an attraction.
The C abstract machine is slightly funkier than unsafe Rust (things like C lacking a way to do signed integer overflow without UB or needing to adhere to strict aliasing in C), so I would expect that lowering to unsafe Rust would be slightly more likely to be correct.
One benefit would be that Rust users could use Wuffs code without having to install a C compiler. Pure-Rust solutions are much more convenient in the Cargo ecosystem than wrangling -sys crates.
Probably not too much from a "final product" point of view, but using a pure Rust library is a whole lot easier than C from a faff point of view. Especially for cross-compilation.
Can Wuffs provide stronger safety guarantees than techniques like WasmBoxC?
My understanding is that compiling unsafe C to WASM and back would also guarantee safety with respect to buffer overflows, integer arithmetic overflows and null pointer dereferences.
It’s nice not annotating code to explicitly prove invariants to the compiler like you would in say Wuffs or Rust, but I suppose that’s what limits performance.
Doesn’t wasm have a memory model as well? So unless you sandbox certain parts of it you can still in theory have access across different C functions, within the same wasm module?
What seems nice about wuffs is that it has no side effects and a clear project scope. Deserialization is so riddled with severe issues that it does kind of warrant its own DSL. OTOH, some legacy formats will probably never be ported.
Yes, Wuffs can do better than WasmBoxC because it does more than sandboxing of the code. It also checks things like integer overflows which can lead to exploits that are technically not memory safety issues, but still potentially dangerous.
But the tradeoff is that you need to rewrite your code for Wuffs, while WasmBoxC can sandbox anything that compiles to wasm and prevent it from corrupting the outside, including existing code in C, C++, Zig, unsafe Rust, etc. etc.
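As an illustration of the integer overflow point, here is a small Python sketch that simulates 32-bit unsigned arithmetic when computing an image buffer size. The helper names are hypothetical. Wuffs rejects unproven arithmetic at compile time rather than checking at runtime, so the runtime `checked_mul_u32` below is only a stand-in for the idea.

```python
U32_MAX = 0xFFFF_FFFF


def wrapping_mul_u32(a: int, b: int) -> int:
    """Multiply the way a 32-bit unsigned C multiplication would: modulo 2**32."""
    return (a * b) & U32_MAX


def checked_mul_u32(a: int, b: int) -> int:
    """Multiply, but refuse to return a result that does not fit in 32 bits."""
    result = a * b
    if result > U32_MAX:
        raise OverflowError(f"{a} * {b} does not fit in a u32")
    return result


def buffer_size_unchecked(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size computation that silently wraps, as naive C code might."""
    return wrapping_mul_u32(wrapping_mul_u32(width, height), bytes_per_pixel)


def buffer_size_checked(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Same computation, but every step must provably fit."""
    return checked_mul_u32(checked_mul_u32(width, height), bytes_per_pixel)


# A hostile header claiming a 65536 x 65536 RGBA image:
# 65536 * 65536 is exactly 2**32, which wraps to 0.
tiny = buffer_size_unchecked(0x10000, 0x10000)
print(tiny)
```

With the wrapped size, the decoder allocates almost nothing and then writes a huge image into it. A sandbox would contain the damage, but the decoder is still wrong; rejecting the arithmetic up front avoids the bug entirely.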
Technically, while WASM promises you put data in and get data out, you can still have memory corruption (as it has a flat memory), so I could make a (for example) gif with some color palette, then later overflow and rewrite the palette.
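The flat-memory point can be modeled in a few lines of Python. This is a toy simulation, not real WebAssembly. The `LinearMemory` class, the `Trap` exception and the buffer layout are all invented for the example. Accesses outside the linear memory trap, but an overflow from one buffer into its neighbor inside the same memory goes unnoticed.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap on an out-of-bounds access."""


class LinearMemory:
    """Toy model of a sandbox's single flat memory."""

    def __init__(self, size: int) -> None:
        self._mem = bytearray(size)

    def store(self, addr: int, value: int) -> None:
        if not 0 <= addr < len(self._mem):
            raise Trap(f"out-of-bounds store at {addr}")
        self._mem[addr] = value & 0xFF

    def load(self, addr: int) -> int:
        if not 0 <= addr < len(self._mem):
            raise Trap(f"out-of-bounds load at {addr}")
        return self._mem[addr]


# Hypothetical layout inside the sandbox: a 16-byte pixel buffer
# immediately followed by a 3-byte palette entry.
PIXELS_BASE, PIXELS_LEN = 0, 16
PALETTE_BASE = 16

mem = LinearMemory(64)
for i, channel in enumerate((255, 0, 0)):  # palette entry starts out red
    mem.store(PALETTE_BASE + i, channel)

# Buggy decoder: writes 19 bytes into the 16-byte pixel buffer.
# Every address is inside linear memory, so nothing traps.
for i in range(PIXELS_LEN + 3):
    mem.store(PIXELS_BASE + i, 0x42)

palette_after = [mem.load(PALETTE_BASE + i) for i in range(3)]
print(palette_after)
```

The host stays safe, but the output image is now attacker-influenced, which is the kind of bug a sandbox alone does not prevent.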
Could you use this to make sure users uploading files to your website are correct (i.e only jpegs and valid image data)? But in a fast and safe way, or is this overkill?
Not sure that’s possible. I’m pretty sure it is not safe to assume "parses in Wuffs" -> "is safe in any other decoder". I’m using Wuffs to check user uploads (see my recent response in another thread), but I still generate linear RGBA output and work with that. I still consider the original JPEG data hostile.
Yes, you could. But be careful to make sure that there's no more data left after the decoder finishes, because it's possible to append a ZIP file (or aCropalypse) at the end of any other valid image file data, and decoders usually stop at the end of the image and don't parse past its end, so won't complain about extra data.
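For PNG specifically, the trailing-data check can be sketched by walking the chunk list until IEND and comparing the end offset with the file length. This is a rough Python illustration using only the standard library. It validates chunk framing and CRCs only, so it would sit next to a real decoder, not replace one. The function names are hypothetical, and JPEG or other formats would need their own end-of-stream logic.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_end_offset(data: bytes) -> int:
    """Return the offset just past the IEND chunk, or raise ValueError."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while True:
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        length, ctype = struct.unpack(">I4s", data[pos : pos + 8])
        end = pos + 8 + length + 4  # header + payload + CRC
        if end > len(data):
            raise ValueError("truncated chunk")
        (crc,) = struct.unpack(">I", data[end - 4 : end])
        # The CRC covers the chunk type and the payload, not the length.
        if zlib.crc32(data[pos + 4 : end - 4]) != crc:
            raise ValueError("bad chunk CRC")
        pos = end
        if ctype == b"IEND":
            return pos


def has_trailing_data(data: bytes) -> bool:
    """True if any bytes follow the IEND chunk."""
    return png_end_offset(data) != len(data)


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk (used here only to construct a test file)."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Minimal 1x1 8-bit grayscale PNG, then the same file with a ZIP header appended.
clean_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)
polyglot = clean_png + b"PK\x03\x04 pretend zip contents"
```

Here `has_trailing_data(clean_png)` is false and `has_trailing_data(polyglot)` is true, so an upload handler could reject the second file even though an image decoder would accept both.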
I have a question about this. Why is Wuffs considered to be safe? In this thread I saw a C++ code example using Wuffs, which seems to be a C library accessed from C++ in that example. Why should I trust that C library to be safe or safer than other libs?
There are PDF readers that do not support the scripting format extensions.
Note this does not prevent unscrupulous companies abusing dominant market positions to voluntarily embed machine and serial hash watermarks.
To be clear: formats like pdf, ps, webp, svg, and tiff are so badly implemented in some ecosystems... they can't _ever_ be assumed safe input formats. Thus, at some point people need to spin up an actual VM to transcode a "web" version, and scrub each stage of the rendering pipeline like a virus or header injection is already present.
"I never play where nice things are, and don't break things" (Eliza Mowry Blven, The Humanitarian Review, Volume 3, March, 1905)
I worked with TIFF pretty extensively, it's a mess but I don't see why a WUFFS TIFF codec can't be fine. What makes you say you need "an actual VM to transcode" a TIFF ?
The complexity of the TIFF and TGA specifications makes it nearly impossible to cover all the edge cases with unit tests. A VM can start from a known-state snapshot, have its process signature logged before and after and compared with a scripted debugger, and have binary input/output stripped of non-compliant metadata/blobs at each stage of the pipeline if the process behaves as expected.
I've yet to find a better method than Honeypots to sustainably mitigate the complex leaky dependency mess on traditional architectures. It has been my experience that "all software is terrible, but some of it is useful".
It may just be my bias, but I see code smell getting worse in recent decades...
So, there's actually no particular reason and if somebody cares to write one then yup, TIFF codec in WUFFS would in fact be safer and faster than your uh, approach.
Wait, you believe that somehow one of these approaches doesn't rely on competence from programmers? How do you figure?
Have you been imagining that sandboxes are some sort of fairy dust we just stumbled onto one day, supernatural in nature and not, in fact, just software written by people you're hoping are competent and haven't left any holes?
The point was... one is testing parser/OS integrity via a debugging interface over an expectation of an unchanging emulated environment state... there is nothing particularly special about the approach. Even Qubes OS and RancherVM are not perfect in this regard, friend.
Or put another way, the available attack surface of a bare-minimum fixed environment is much easier to auto-audit, than a pile of daily permuted binaries and self-delusion approach. i.e. if it fails to behave in an expected way, or is modified in any way... the host audit process doesn't have to care why or how it is broken to maintain a service queue as the guest is culled.
Perhaps I am wrong about exchanging 15% of raw performance for reliability, but things can get complicated with licenses and multiple OS specific platforms.
You seem to be getting emotional about this subject, presenting secondary and tertiary straw-man arguments. So I'm going to go eat some Cheese Goldfish crackers... and just agree that your beliefs are interesting.
There's nothing special about it, but it doesn't work especially well. This is the strategy that's blown up on Apple twice in recent years and will keep burning them.
If you're Matt Godbolt the benefits of sandboxing outweigh the cost because Matt is interested in general purpose software. But WUFFS isn't for that, as its name says it's interested in doing one particular task well.
In this deliberately limited domain, WUFFS gets to sidestep Rice's theorem altogether and just prove the software meets the semantic requirements [technically you do the proving, WUFFS just checks your work].
I hope you enjoyed your goldfish crackers but I urge you to use the right tool for the job.
"the right tool for the job" is sometimes admitting that the breadth of underlying dependencies and ambiguous format specifications is infeasible to fix within your team's time budget.
The design in question currently only processes around 1.8M large image files a day, and does not require additional work/re-implementations to support the dozens of questionable user file-formats. i.e. the plain old ImageMagick lib does most of the heavy lifting at the end.
Would I trust such a solution for something like a native client side web-browser etc... absolutely not... but for the core-bound instance overhead, the resource cost was acceptable for almost a decade of uptime on those system instances.
Use-cases are funny like that, as there is no perfect solution... but rather a tradeoff of what features get the system functional and reliable. Part of that is admitting integration of 3rd party dependencies is a long-term liability, and domain specific languages almost always fade into obscurity.
WUFFS is provably safe - that's the whole schtick. If a WUFFS kernel exists, you can assume it is safe. If it's not proven safe, it doesn't compile. The reason everyone doesn't program in WUFFS is that you have to write a proof that your kernel is safe, which takes a very very very long time.
For WUFFS the language, or for WUFFS the library, or for the WUFFS tooling today?
The clever idea is to have you the programmer in effect write a proof that your code has the desired semantic properties as part of the programming activity and so then the WUFFS transpiler is merely checking that the proof is correct.
This leverages your understanding of what you were trying to do.
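A toy version of "you prove it, the tool checks it" can be written with plain interval arithmetic. The Python sketch below is not Wuffs syntax and not its actual checker. `Range`, `narrow` and `check_index` are invented names that only show the flavor: an index expression is accepted when its possible range provably fits the array, and the programmer has to supply the narrowing fact when it does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """All integers from lo to hi, inclusive."""
    lo: int
    hi: int

    def __add__(self, other: "Range") -> "Range":
        # Interval addition: the result covers every possible sum.
        return Range(self.lo + other.lo, self.hi + other.hi)

    def fits(self, outer: "Range") -> bool:
        return outer.lo <= self.lo and self.hi <= outer.hi


def narrow(value: Range, max_value: int) -> Range:
    """Model a guard such as `if x <= max_value`: inside it, x is narrower."""
    return Range(value.lo, min(value.hi, max_value))


def check_index(index: Range, array_length: int) -> bool:
    """Accept the access only if every possible index is in bounds."""
    return index.fits(Range(0, array_length - 1))


byte = Range(0, 255)  # any value read from the input
one = Range(1, 1)     # the constant 1

ok_direct = check_index(byte, 256)                     # table[x]
ok_plus_one = check_index(byte + one, 256)             # table[x + 1]
ok_guarded = check_index(narrow(byte, 254) + one, 256) # guarded by x <= 254
```

The second access is rejected because `x + 1` could be 256; adding the guard gives the checker the fact it needs. Once every access is proven like this, the generated code needs no runtime bounds checks.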
If you point out some of the above has run-state in some situations... it is provably nondeterministic... and thus the assertion of correctness is utter nonsense.
Hardly a panacea for fundamentally bad designs that go back decades.
Ever seen a web-server written in PostScript? It's worth a look just for the laughs.
Google was never trying to write PDF reader from scratch so they never "gave up".
They just bought foxit code to save years of development when they wanted to ship PDF reader in Chrome.
Your comment about "the only viewer that is semi-correct" is also wildly off the mark.
Parsing correctly written PDF files is hard but multiple engines can do it correctly.
Parsing real-life PDFs is much harder than correctly implementing the PDF spec because lots of PDFs are just broken. Their generators create invalid PDF files and then PDF readers have to spend heroic efforts to somehow make sense of this brokenness. Adobe does it better than most because... well, it would be embarrassing if they didn't. They invented the format, they make money from their tools, they have been doing it the longest, they have the largest archive of broken PDFs for testing, etc. It's hard to expect that e.g. an open-source project with one or two developers can match that.
OK. In one of my previous jobs, I needed to auto-fill PDF forms on BE (among other things with PDFs), the only thing that worked reliably across PDFs was Acrobat. I did not try SumatraPDF.
edit: it seems Sumatra doesn't support PDF forms? Either AcroForms or XFA forms?
Do you ever need a JS interpreter to parse a PDF? That's horrifying.
I understand PDF has a bunch of limbs, but I always assumed the JS stuff was at least separate from the parsing. (I am familiar with the PDF format at a lower level but I never touched any of the weird features.)
I wrote an SVG that's all javascript, no elements. All the graphics are generated dynamically at runtime by the javascript. It's SVG standards compliant, but only opens correctly in browsers, not in inkscape or other desktop publishing apps.
I work a lot in OpenSCAD, and had a need to design some custom graph paper. So I found the subset of SVG which was similar to OpenSCAD. :)
Wuffs is cool, but you can get similar results writing normal C library code, compiling it into a .wasm binary via Clang, and then running the .wasm binary through the `wasm2c` tool of the WebAssembly Binary Toolkit [0]. I personally prefer this method, although Wuffs will usually produce faster code.
`wasm2c` fully implements the WebAssembly sandbox execution environment [0][1] and has the passing tests to prove it. To be a bit more specific, the .wasm binary you generate initially already has the WebAssembly semantics baked in (obviously) and `wasm2c` creates a portable C translation of the WebAssembly while also ensuring that the execution environment is sandboxed (e.g., the code traps when attempting out-of-bounds memory accesses).
This might not be what you want to hear (and I might get downvoted for it), but it’s what I consider the best answer: Implement something minimal but useful (and realistic) using both methods and benchmark them yourself.
Even if I told you some of the numbers I’ve seen in my experiments and usage, it wouldn’t be wise to trust them or let them taint your opinion.
The Wuffs documentation is also a bit hard to understand, and it's a little bit of a mission getting PNG decoding working. You can see my code for that here though: https://github.com/glaretechnologies/glare-core/blob/2c7174c...