Fq: Jq for Binary Formats (github.com/wader)
450 points by philosopher1234 on Dec 23, 2021 | 81 comments



It's interesting to see how they introduce a new binary format into their catalogue. I expected to find a domain-specific language to define the grammar of binary bitstreams, maybe as a context-free grammar. Instead, they built a nice library of routines that helps them write custom parsers by hand for each new format.


I wonder if it could also support non-binary formats. The tool could evolve to handle JSON, YAML, XML, INI, etc.


Hi, fq actually does support JSON, so other similar text formats could work the same way. But it's currently implemented in a hacky way: it's just a big blob that happens to work as normal JSON. I've made some attempts at implementing it as a normal fq decoder, but it's hard to figure out a way to represent the whitespace between values etc.; it ends up very clunky or not very user friendly. Any suggestions are very welcome.


Relatedly, check out GNU Poke: http://www.jemarch.net/poke


Also Kaitai Struct:

https://kaitai.io/

And the other things mentioned in the fq README:

https://github.com/HexFiend/HexFiend https://github.com/binspector/binspector


It’s interesting how people work. Seeing that section of the readme with a laundry list of alternatives made me want to try fq even more. It tells you that the author actually cares about the problem space.


Hi! Yes, I'm very interested in binary analysis and decoders in general, and fq was not built with the intention to compete with or replace anything. I usually use fq together with lots of other tools; they all serve different purposes. The more the merrier!


The listing of alternatives in the README really should be standard practice for open source projects. OTOH, some maintainers don't like to do that when they haven't evaluated the projects. Perhaps they could still add them with a disclaimer though.


Nice! Some other tools and parsers: https://github.com/dloss/binary-parsing


Lots of tools I didn't know about, thanks


You may want to rename that to awesome-binary-parsing; having "awesome" in the URL helps in some circumstances.


For something that is supposed to be an analog of jq, there is a notable omission from the list of formats: ASN.1.


Can't blame him. ASN.1 is one of the most complicated binary formats, with so many encoding rules that there's no free decoder that can process them all.


Shameless plug, but you may be interested in my library (MIT/Apache-2.0), which offers decoding of BER/DER/CER from a single model in code. There's no UPER/APER support at the moment, but it's coming in the next few months. :)

https://github.com/XAMPPRocky/rasn


You mean DER, not ASN.1. ASN.1 is just notation for describing data. There are many encoding rules for ASN.1, including ones based on XML and JSON.


Except for interoperability with existing systems, is there any reason why anyone would use this ASN.1 protocol/format?

What's it good for?


In my opinion, avoid ASN.1 if you can.

There's a reason why all the cool companies invented their own serialization formats (Google's Protobuf, Facebook's Thrift, etc.) even when ASN.1 had been an international standard for years: it's too complicated.


A big part of the reason is a combination of NIH and bad reputation, mostly related to X.509 and such, rather than anything else - it's hard to advocate for it when the main library you can point to is OpenSSL, and the most commonly known encoding is DER (which has a certain implementation complexity, effectively being sorted BER, which has important value in cryptography).

Both Protobuf and Thrift evolved from RPC systems that possibly started out too simple for ASN.1, combined with the above issue where good tools were probably commercial and expensive. (FWIW, my experience also suggests that Thrift is a shitty RPC system, compared even to Sun/ONC RPC, but maybe things have changed.)


No. It mostly exists so that people who haven't tried to use it can tell other people that they should have used ASN.1.


ASN.1 is incredibly good for one use case: As a cautionary tale against design by committee.


BER/DER/PEM encodings are mostly quite simple and have very few subtle details.


Yeah, those are the easy ones. I worked in telecom, and dealing with unaligned PER is a PITA.


Hi, here is an issue related to this where I explain a bit what would be required, and how protobuf support currently works: https://github.com/wader/fq/issues/20


Note that the tool linked is for binary files. ASN.1 is text, isn't it?


I assume they meant the binary encodings of ASN.1, like BER/DER.


And tangentially PEM, which is Base64-encoded DER.


That’s what I thought until recently, but it turns out that PEM refers to just base64 wrapped with -----BEGIN----- and -----END----- lines, and the encapsulated data does not have to be DER.

https://datatracker.ietf.org/doc/html/rfc7468


I’d like to see support for FIT files as emitted by Garmin fitness devices. It’s a clever binary format that defines, in-stream, the layout of the records that then contain the actual measurements, which may be scaled for more compact representation. These multiple layers make the format non-obvious to parse, but the tool already supports an impressive list of formats that probably use similar techniques.


I am alternating between WOW and wtf. Pretty cool stuff.

I just wonder how on earth you could support all the binary formats out there. I mean, jq supports JSON, not all structured text data like XML, CSV, INI, ...


Honestly, it all makes sense: the plugin system and open-source nature make it really easy to write a definition for the file format you want to work on, which will not just leverage the whole ecosystem but benefit everyone.

This is one of those seriously great ideas where I'm thinking: how did no one come up with this before?


https://github.com/tyleradams/json-toolkit

Convert json <-> xml, csv, yaml, logfmt

So to support all formats, you write a binary <-> json converter.


Apparently they went ahead and implemented a bunch of them in Go: https://github.com/wader/fq/tree/master/format


Hi, I can give some background on how I ended up with Go instead of something more declarative. Maybe 1.5 years ago I started prototyping different approaches, both for what query language to use (SQL, JSONPath, my own basic jq version and a few more) and what language to implement decoders in (Lisp, Kaitai, Tcl, "scripted" Go, normal Go and some more). What I found was that for my use cases, detailed parsing of big media files, anything scripted was just too slow. I did look into translating Kaitai etc. into something compiled, which would probably be fast, but next on my list was being able to select and decode subformats in quite complicated ways (like mp4 samples): flexible ways to demux and join blobs to decode, calculate checksums, count samples in various ways. All of that felt clunky or hard to fit into a purely declarative description. But I was also biased towards Go, as I had good experience using it and knew that it would probably be fast enough (it turns out smart memory usage is probably the main speed factor for fq when you keep track of lots of things). Also it provides good tooling like IDE support and refactoring (gopls, gofmt -r, rf), and it's a reasonably strongly typed language, I think. Last but not least, the quick build times really fit my way of working; I usually use watchexec etc. a lot. For the query language I didn't prototype much: I knew I really wanted jq, as I had already used it extensively and knew it was very powerful and had a terse syntax for working with structured data. I had some ideas of maybe using the C version of jq via bindings, or somehow letting fq be a tool that you used like 'fq file | jq ... | fq', but it just felt strange and not very user friendly.

Then I found gojq and I just felt that I had to make it work somehow, even if it would require lots of hard work and changes to it (see https://github.com/wader/gojq/commits/fq; the JQValue change is probably the most interesting, plus the support for custom iterators/functions that has been merged upstream). And it turned out much better than I would have expected, in large part because gojq's code is very nice and the author has been very helpful. There are more things I would like to talk about, but I think this is long enough for now :)

But all that said, I think you could use Kaitai or something similar together with fq's decode API if you want. I also have some ideas and plans for supporting writing decoders in jq; hopefully I will get some time for that next year.


Thanks for the extensive reply. I also had some good experience with Go so far, so I can understand how you came to that point ;-)


> This project would not have been possible without itchyny's jq implementation gojq.

Another approach is to take the binary-to-object conversion part of your code, output that as JSON on stdout, and feed it into jq.

Basically, a binary front end + jq = fq


It would be hard to get the full fq functionality that way. How would you encode the data in a way so that you can do both:

    .frames[100].header.sample_rate
for the individual field and

    .frames[100].header|tobytes[:0x10]
for the first few bytes of the entire header structure?

Or decode a binary slice as a particular format:

    tobytes[0x234:0x325]|avc_sps.max_num_ref_frames


I wish this was only the binary front end so I could pick my parser (e.g. PowerShell). I see fq seems to support sending the whole JSON to stdout; I wonder if there's a way to make this the default behavior:

    # JSON for whole file
    fq tovalue file


Hi, I wrote a bit about this in my reply above: https://news.ycombinator.com/item?id=29661575


I wrote a small script to convert CSVs to JSON strictly to use jq on the output. Querying things like your GCP bill with jq is quite enjoyable.

gojq is also nice. I work with a lot of structured logs and wrapped jq with a little bit of format-understanding and output sugar to make looking at and analyzing such logs an enjoyable experience: https://github.com/jrockway/json-logs


> I wrote a small script to convert CSVs to JSON strictly to use jq on the output

Note that you can use jq to consume simple CSVs (and produce them) without anything else. There’s an entry in the cookbook wiki https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-f... - I posted some usage examples a few months back https://news.ycombinator.com/item?id=27379423


Miller can process CSV and JSON in record format and has a much saner DSL, in my experience.

https://github.com/johnkerl/miller


Does this support out-of-tree format decoders? From an initial glance it looks like all decoders are in-tree and written in golang. We have a lot of internal binary formats at $WORK that I would like to use this on...


Hi! Yes, it's kind of supported, but in a very Go-ish way at the moment. You can use fq as a submodule, import/register your own format decoders and then run cli.Main; more or less what https://github.com/wader/fq/blob/master/fq.go does. I have a private version of fq for work with some proprietary formats that does this, and it works great. One issue is that the decoder and format API might change; I'm not sure I can give any stability guarantees atm, and I want to evolve it a bit more. Also it would be great to be able to hook into existing formats more in some way.

In the future I hope to support writing decoders in jq and/or support some declarative format like Kaitai.


Seems unlikely since the decoders are defined not just in-tree but in "host" code, the definitions are neither data-driven nor a DSL.

So it would require some sort of native (Go) plugin system, which I understand is about as bad as in Rust, owing to there being no standard ABI (or plugin system, for that matter).

Therefore the way to have bespoke/internal formats would be to maintain an internal fork of the tool.


"(...) some sort of native (Go) plugins system (...)"

See: https://pkg.go.dev/plugin


I am sure it is awesome, but the name of this utility is somewhat unfortunate.


This related project, on the other hand, embraced it (for better or for worse):

https://github.com/jzelinskie/faq


Don’t see the issue. I naturally pronounce this “eff-queue”. You’ve really got to work hard to make it vulgar.


I’d have gone with bq myself


I know that jokes don't go over well here, but...

I'd make an improved one, one which is better. And I'd name it bbq.


And I will write a fuzzer for bbq called omgwtf.


D'oh! Big ol BigQuery CLI would like to have a word


then bbq.


Only in puritan cultures


Or passive aggressive…


:) I didn't choose it to be provocative or anything; apologies if it comes across that way. I've always pronounced jq "yay-queue", so fq is "eff-queue" for me. Also, f and q can be typed with one hand on QWERTY, which is nice and quick.


Yeah, two syllables is too long. Let's just sound it out...


... 'feek' ?

If it were 'fk' sure, but the Q on the end makes me think of all the English words that come from French and end in 'ique', like technique. 'fq' looks like 'feek' to me.


This is quite an interesting project! Combining Kaitai structs or similar with the command line.

However, I am a little disappointed that the jq syntax was chosen. jq has a very non-intuitive syntax; there are more intuitive query syntaxes out there (LINQ or even basic SQL come to mind).


Yes, I can empathize with finding jq hard to understand; it's quite different and took a while to grasp. The reason I chose it anyway was that after prototyping some common types of queries I would like to do (basic value access in deep structures, multiple recursive traversals with filtering, transforming objects and arrays) in various languages, jq was more or less the only one that felt terse enough. Also, I think it's quite nice that you can output to JSON and then load that into whatever language or environment you want. Maybe there are some alternatives I should look at?


LINQ is a bit more wordy, and SQL is sadly not very composable.


This looks incredible. I'm on my phone so I haven't tried this, but it looks like this supports slicing into MP3 bitstreams? That would have saved me a month of research and tons of development back in 2013.


Hi, it depends a bit: if the mp3 stream uses the bit reservoir, it might be tricky to do "pure" remuxing with any tool. fq's mp3_frame decoder does try to track which bits are part of the current frame and which are part of a future frame, but I'm not sure how much that helps. If the stream does not use the reservoir, you should be able to slice using fq '.frames[100:200][]' file.mp3 > sliced.mp3 or something similar.


Not to be confused with https://github.com/circonus-labs/fq, the message queue.


So is this meant for any binary formats?


Supported formats:

https://github.com/wader/fq/blob/master/doc/formats.md

They should probably make this a bit more prominent. It's an impressive list for a new project.


Permanent link (press 'y' on any Github link): https://github.com/wader/fq/blob/eb4a6fdbd6ef3a09fc59802e96e...


Thanks! Any suggestions on how to make it more prominent?


Of course not. It can only work on formats that the team has written parsers for.


I ask because I’d be interested in helping write an EMF+ filter


Interesting project. Unfortunate that its name conflicts with one of nq’s executables (https://github.com/leahneukirchen/nq), but I’m not sure anything can be done about it.


IMO a project whose only non-prefixed executable has that name takes precedence over one where it's just one of several, even when the one with multiple non-prefixed executables is older.


This looks like wireshark's panel for inspecting packets


It says it supports protobuf. Is there a protobuf file format, i.e. one for multiple records, or do they mean a file containing a single protobuf record?


Typically people separate protobuf messages by prefixing each one with the length of the next message. Perhaps that’s what they did.

Also, protobuf can contain embedded messages, or even just the binary representation of a list of embedded messages.


Hi, currently the protobuf support can either decode the wire format, or in some cases a format decoder uses protobuf as a subformat and passes it a "schema" so it can do some fancier decoding. But yes, it would be interesting to add support for reading protobuf schemas somehow.


I wrote a protobuf parser in Ragel for work. It's still used to replace reflection, as the C++ protobuf implementation explodes our binaries to huge sizes.


Hopefully json will be superseded someday. Cool tool.


Interestingly, fq works by being kind of a superset of JSON/jq. It has types that can behave as jq values when needed, but with special functions or key accessors can be something else.


why not bq?


Well fq too!



