Deserializing Binary Data Files in Rust (michaelfbryan.com)
31 points by lukastyrychtr on June 28, 2021 | 22 comments



For such a problem, I really recommend https://kaitai.io/

From their website:

> Kaitai Struct is a declarative language used to describe various binary data structures, laid out in files or in memory: i.e. binary file formats, network stream packet formats, etc.

They even have a Rust interface: https://github.com/kaitai-io/kaitai_struct_rust_runtime


A native Rust solution, with meta-programming: https://crates.io/crates/packed_struct (written by me)


By the way, there is an easier way to convert C-style null-terminated strings into a &str than the bespoke c_string function used in this article. You can use the CStr type that is already in the Rust standard library: https://doc.rust-lang.org/std/ffi/struct.CStr.html
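A minimal sketch of the CStr approach, assuming the field is a fixed-size byte buffer padded with NUL bytes. The helper name `c_str_field` is made up for illustration, and `CStr::from_bytes_until_nul` needs Rust 1.69 or newer:

```rust
use std::ffi::CStr;

/// Interpret a NUL-padded byte buffer as a `&str`.
/// Returns `None` if the buffer has no NUL terminator or if the bytes
/// before the first NUL are not valid UTF-8.
/// (`c_str_field` is a hypothetical helper, not the article's function.)
fn c_str_field(buf: &[u8]) -> Option<&str> {
    CStr::from_bytes_until_nul(buf).ok()?.to_str().ok()
}
```

Calling `c_str_field(b"hello\0\0\0")` yields `Some("hello")`, while a buffer with no NUL byte at all yields `None` instead of reading past the end.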


I really wish there was a safe way to initialize structs with arbitrary binary data in Rust. Perhaps I'm missing something, but it seems intuitively safe, especially if the struct only contains integer primitives.


How would it be possible to make it safe? The compiler can't verify that N random bytes input from an arbitrary source are going to be valid for the types in the struct - you have to tell the compiler "trust this even if you can't verify it from the source code" - hence the unsafe keyword.



Oh cool! Thanks for the pointer.


For integer primitives, all possible values are valid. Just allowing binary initialization for structs that consist of integer types would allow compatibility with C structures. As it is, I have to either resort to unsafe code or manually deserialize every struct.
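For reference, the "manually deserialize every struct" route looks roughly like this. It is only a sketch: the `Record` type, its fields, and the little-endian byte order are assumptions chosen for illustration, not anything taken from the article.

```rust
/// Hypothetical record made up only of integer fields.
#[derive(Debug, PartialEq)]
struct Record {
    id: u32,
    flags: u16,
    kind: u8,
    level: u8,
}

impl Record {
    /// Number of bytes the encoded record occupies.
    const SIZE: usize = 8;

    /// Decode a record from little-endian bytes without any `unsafe`.
    /// Returns `None` if fewer than `SIZE` bytes are available.
    fn from_le_bytes(bytes: &[u8]) -> Option<Record> {
        if bytes.len() < Self::SIZE {
            return None;
        }
        Some(Record {
            id: u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]),
            flags: u16::from_le_bytes([bytes[4], bytes[5]]),
            kind: bytes[6],
            level: bytes[7],
        })
    }
}
```

It is tedious, but the byte order is explicit and the result does not depend on how the compiler lays out the struct in memory.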


Oh I see, that makes sense. Also there's a sibling reply to yours for a cool looking proposal to do just that if you haven't seen it yet.


I think the C struct is missing an alignment attribute? This idea has been explored in flatbuffers, capnproto and probably others.


C structs are generally assumed to be laid out in the order defined with proper padding automatically inserted (as most platforms at best dislike unaligned accesses). Alignment information should only be necessary if you want to pack the structure or need to apply wider than standard alignment (e.g. need to align a 32 or 64b type to 128 bytes).

The one possible issue is that reading padding is UB (I think). But here there should be no padding: the struct is an even number of bytes (alignment 1) followed by a word (alignment 2), which is already aligned because of the preceding bytes, and the struct as a whole has alignment 2.


Wider than standard alignment can be pretty important though. Especially for usage with SIMD instructions.
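A small sketch of what wider-than-standard alignment looks like in Rust. The `Lanes` type and the choice of 16 bytes are assumptions for illustration; 16 is a common requirement for 128-bit SIMD loads:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical four-lane float vector, over-aligned to 16 bytes.
#[allow(dead_code)]
#[repr(C, align(16))]
struct Lanes([f32; 4]);

/// Size and alignment of `Lanes` as reported by the compiler.
fn lanes_layout() -> (usize, usize) {
    (size_of::<Lanes>(), align_of::<Lanes>())
}

/// Check that a particular value really sits on a 16-byte boundary.
fn is_aligned(v: &Lanes) -> bool {
    (v as *const Lanes as usize) % 16 == 0
}
```

Without the `align(16)` attribute the same struct would only be guaranteed the alignment of `f32`.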


Sure, but that has no relevance whatsoever to the article, so the objection is irrelevant.


That's his point, you want to 1-align the struct to disable the padding to ensure the size of the structure is portable.
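To illustrate what 1-aligning does to the size, here is a sketch with two made-up structs (`Padded` and `Packed` are hypothetical, not the article's types). The sizes in the comments assume a target where `u32` has alignment 4:

```rust
use std::mem::{align_of, size_of};

/// Normal C layout: 3 padding bytes are inserted after `tag`.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    tag: u8,
    value: u32,
}

/// Packed layout: alignment 1, no padding anywhere.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

fn padded_size() -> usize {
    size_of::<Padded>() // 8 where u32 is 4-byte aligned
}

fn packed_layout() -> (usize, usize) {
    (size_of::<Packed>(), align_of::<Packed>()) // (5, 1)
}

/// Fields of a packed struct must be copied out by value;
/// taking a reference to a possibly unaligned field is rejected.
fn read_value(p: &Packed) -> u32 {
    p.value
}
```

The packed version trades possible unaligned loads for a size that is just the sum of the field sizes.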


That makes no sense? There isn’t any padding, and if the bytes are not 8 bits it’ll never work anyway.


Yeah, C lays fields out in the order they are defined in the code; Rust will reorder them to save space unless you add an attribute telling it not to. Interesting trivia: a blind guy wrote the reordering code. If anyone knows his name, please post.


Implemented in https://github.com/rust-lang/rust/pull/37429 ; his website seems to be https://ahicks.io/


The rust structure is defined as repr(C) so there is no reordering.
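A quick sketch of the difference, using two made-up structs with identical fields (the names are hypothetical). The repr(C) numbers assume a target where `u32` has alignment 4; the default layout is unspecified, so the code only reports it:

```rust
use std::mem::{align_of, size_of};

/// Fields stay in declaration order: 1 byte, 3 bytes of padding,
/// 4 bytes, 1 byte, then 3 bytes of tail padding.
#[allow(dead_code)]
#[repr(C)]
struct COrdered {
    a: u8,
    b: u32,
    c: u8,
}

/// Same fields with the default representation; the compiler is free
/// to reorder them, for example placing `b` first to shrink the struct.
#[allow(dead_code)]
struct RustDefault {
    a: u8,
    b: u32,
    c: u8,
}

/// (size, alignment) of the repr(C) struct.
fn c_ordered_layout() -> (usize, usize) {
    (size_of::<COrdered>(), align_of::<COrdered>())
}

/// Size of the default-layout struct; unspecified, often smaller.
fn rust_default_size() -> usize {
    size_of::<RustDefault>()
}
```

Because the default layout carries no guarantees, only the repr(C) struct is suitable for matching an on-disk or C-side layout.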


Does the C spec actually specify it needs to be in that order? It's been a while since I messed with that sort of thing. I know in practice compilers typically just put the fields in the same order and align them using some default packing that depends on the CPU, usually to minimize extra instructions. I have eked out some extra perf on some CPUs and structs just by moving stuff around and making sure it fits into a cacheline. Sometimes the extra instructions are worth it to unpack some vars if you can get the data into a cacheline. But that depends on your data and usage.


Yes, it does


> We get lucky here because the flags field is at offset 202

How does one calculate this offset based on the fields in the struct?


Just add all the sizes, and account for padding (there isn’t any here because everything other than the flags field is byte arrays)
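As a sketch of that calculation: the struct below is hypothetical (the field names and array lengths are invented so that the byte arrays total 202 bytes; they are not the article's actual fields). It computes the offset by hand and also measures it from a real value:

```rust
use std::mem::{align_of, size_of};

/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Hypothetical header: byte arrays totalling 202 bytes, then a u16.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    name: [u8; 100],
    comment: [u8; 100],
    version: [u8; 2],
    flags: u16,
}

/// Sum the sizes of the preceding fields, then round up to the
/// alignment of the field we are interested in.
fn computed_flags_offset() -> usize {
    let end_of_arrays = size_of::<[u8; 100]>() * 2 + size_of::<[u8; 2]>();
    align_up(end_of_arrays, align_of::<u16>())
}

/// Measure the same offset from the addresses of an actual value.
fn measured_flags_offset() -> usize {
    let h = Header { name: [0; 100], comment: [0; 100], version: [0; 2], flags: 0 };
    let base = &h as *const Header as usize;
    let field = &h.flags as *const u16 as usize;
    field - base
}
```

Byte arrays have alignment 1, so they never introduce padding; the only rounding happens at the u16, and 202 is already a multiple of 2.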



