Fq: Jq for Binary Formats (github.com/wader)
450 points by philosopher1234 on Dec 23, 2021 | 81 comments



It's interesting to see how they introduce a new binary format into their catalogue. I expected to find a domain-specific language to define the grammar of binary bitstreams, maybe as a context-free grammar. Instead, they built a nice library of routines that helps them write custom parsers by hand for each new format.


I wonder if it could also support non-binary formats. The tool could evolve to handle JSON, YAML, XML, INI, etc.


Hi, fq actually does support JSON, so other similar text formats could work the same way. But it's currently implemented in a hacky way: it's just a big blob that happens to work as normal JSON. I've made some attempts at implementing it as a normal fq decoder, but it's hard to figure out a way to represent the whitespace between values etc.; it ends up very clunky or not very user friendly. Any suggestions are very welcome.


Relatedly, check out GNU Poke: http://www.jemarch.net/poke


Also Kaitai Struct:

https://kaitai.io/

And the other things mentioned in the fq README:

https://github.com/HexFiend/HexFiend https://github.com/binspector/binspector


It’s interesting how people work. Seeing that section of the readme with a laundry list of alternatives made me want to try fq even more. It tells you that the author actually cares about the problem space.


Hi! Yes, I'm very interested in binary analysis and decoders in general, and fq was not built with the intention to compete with or replace anything. I usually use fq together with lots of other tools; they all serve different purposes. The more the merrier!


The listing of alternatives in the README really should be standard practice for open source projects. OTOH, some maintainers don't like to do that when they haven't evaluated the projects. Perhaps they could still add them with a disclaimer though.


Nice! Some other tools and parsers: https://github.com/dloss/binary-parsing


Lots of tools I didn't know about, thanks


You may want to rename that to awesome-binary-parsing; having "awesome" in the URL helps in some circumstances.


For something that is supposed to be an analog of jq, there is a notable omission from the list of formats: ASN.1.


Can't blame him. ASN.1 is one of the most complicated binary formats, with so many encoding rules that there's no free decoder that can process them all.


Shameless plug, but you may be interested in my library (MIT/Apache-2.0), which offers decoding of BER/DER/CER from a single model in code. There's no UPER/APER support at the moment, but it's coming in the next few months. :)

https://github.com/XAMPPRocky/rasn


You mean DER, not ASN.1. ASN.1 is just notation for describing data. There are many encoding rules for ASN.1, including ones based on XML and JSON.


Except for interoperability with existing systems, is there any reason why anyone would use this ASN.1 protocol/format?

What's it good for?


In my opinion, avoid ASN.1 if you can.

There's a reason why all the cool companies invented their own serialization formats (Google's Protobuf, Facebook's Thrift, etc.) even when ASN.1 had been an international standard for years: it's too complicated.


A big part of the reason is a combination of NIH and bad reputation, mostly related to X.509 and such, rather than anything else - it's hard to advocate for it when the main library you can point to is OpenSSL, and the most commonly known encoding is DER (which has a certain implementation complexity, effectively being sorted BER, which has important value in cryptography).

Both Protobuf and Thrift evolved from RPC systems that possibly started out too simple for ASN.1, combined with the above issue where good tools were probably commercial and expensive. (FWIW, my experience also suggests that Thrift is a shitty RPC system, compared even to Sun/ONC RPC, but maybe things have changed.)


No. It mostly exists so that people who haven't tried to use it can tell other people that they should have used ASN.1.


ASN.1 is incredibly good for one use case: As a cautionary tale against design by committee.


BER/DER/PEM encodings are mostly quite simple and have very few subtle details.


Yeah, those are the easy ones. I worked in telecom, and dealing with unaligned PER is a PITA.


Hi, here is an issue related to this where I explain a bit what would be required, and how protobuf support currently works: https://github.com/wader/fq/issues/20


Note that the tool linked is for binary files. ASN.1 is text, isn't it?


I assume they meant the binary encodings of ASN.1, like BER/DER.


And tangentially PEM, which is Base64-encoded DER.


That’s what I thought until recently, but it turns out that PEM refers to just base64 wrapped with -----BEGIN----- and -----END----- lines, and the encapsulated data does not have to be DER.

https://datatracker.ietf.org/doc/html/rfc7468


I’d like to see support for FIT files as emitted by Garmin fitness devices. It’s a clever binary format that defines, in-stream, the layout of the records that then contain the actual measurements, which may be scaled for more compact representation. These multiple layers make the format non-obvious to parse, but the tool already supports an impressive list of formats that probably use similar techniques.


I am alternating between WOW and wtf. Pretty cool stuff.

I just wonder how on earth you could support all the binary formats out there. I mean, jq supports JSON, not all structured text data like XML, CSV, INI, ...


Honestly, it all makes sense: the plugin system and open-source nature make it really easy to write a definition for the file format you want to work on, which will not just leverage the whole ecosystem but benefit everyone.

This is one of those seriously great ideas where I'm thinking: how did no one come up with this before?


https://github.com/tyleradams/json-toolkit

Convert json <-> xml, csv, yaml, logfmt

So to support all formats, you write a binary <-> json converter.


Apparently they went ahead and implemented a bunch of them in Go: https://github.com/wader/fq/tree/master/format


Hi, I can give some background on how I ended up with Go instead of something more declarative. Maybe 1.5 years ago I started prototyping different approaches, both for what query language to use (SQL, JSONPath, my own basic jq version and a few more) and what language to implement decoders in (Lisp, Kaitai, Tcl, "scripted" Go, normal Go and some more). What I found was that for my use cases, detailed parsing of big media files, anything scripted was just too slow. I did look into translating Kaitai etc. into something compiled, which would probably be fast, but next on my list was being able to select and decode subformats in quite complicated ways (like mp4 samples): flexible ways to demux and join blobs to decode, calculate checksums, count samples in various ways. All of that felt clunky or hard to fit into a purely declarative description. But I was also biased towards Go, as I had good experience using it and knew that it would probably be fast enough (it turns out smart memory usage is probably the main speed factor for fq when you keep track of lots of things). Also it provides good tooling like IDE support and refactoring (gopls, gofmt -r, rf), and it's a reasonably strongly typed language, I think. Last but not least, the quick build times really fit my way of working; I usually use watchexec etc. a lot. For the query language I didn't prototype much: I knew I really wanted jq, as I had already used it extensively and knew it was very powerful and had a terse syntax for working with structured data. I had some ideas of maybe using the C version of jq via bindings, or somehow letting fq be a tool that you used like 'fq file | jq ... | fq', but it just felt strange and not very user friendly.

Then I found gojq and I just felt that I had to make it work somehow, even if it would require lots of hard work and changes to it (see https://github.com/wader/gojq/commits/fq; the JQValue change is probably the most interesting, plus the support for custom iterators/functions that has been merged upstream). And it turned out much better than I would have expected, in large part because gojq's code is very nice and the author has been very helpful. There are more things I would like to talk about, but I think this is long enough for now :)

But all that said, I think you could use Kaitai or something similar together with fq's decode API if you want. I also have some ideas and plans for supporting writing decoders in jq; hopefully I will get some time for that next year.


Thanks for the extensive reply. I also had some good experience with Go so far, so I can understand how you came to that point ;-)


> This project would not have been possible without itchyny's jq implementation gojq.

Another approach is to take the binary-to-object conversion part of your code, output that as JSON on stdout, and feed it into jq.

Basically, a binary front end + jq = fq


It would be hard to get the full fq functionality that way. How would you encode the data in a way so that you can do both:

    .frames[100].header.sample_rate
for the individual field and

    .frames[100].header|tobytes[:0x10]
for the first few bytes of the entire header structure?

Or decode a binary slice as a particular format:

    tobytes[0x234:0x325]|avc_sps.max_num_ref_frames


I wish this was only the binary front end so I could pick my parser (e.g. PowerShell). I see fq seems to support sending the whole JSON to stdout; I wonder if there's a way to make this the default behavior:

    # JSON for whole file
    fq tovalue file


Hi, I wrote a bit about this in my reply above: https://news.ycombinator.com/item?id=29661575


I wrote a small script to convert CSVs to JSON strictly to use jq on the output. Querying things like your GCP bill with jq is quite enjoyable.

gojq is also nice. I work with a lot of structured logs and wrapped jq with a little bit of format-understanding and output sugar to make looking at and analyzing such logs an enjoyable experience: https://github.com/jrockway/json-logs


> I wrote a small script to convert CSVs to JSON strictly to use jq on the output

Note that you can use jq to consume simple CSVs (and produce them) without anything else. There’s an entry in the cookbook wiki https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-f... - I posted some usage examples a few months back https://news.ycombinator.com/item?id=27379423


Miller can process CSV and JSON in record format and has a much saner DSL, in my experience.

https://github.com/johnkerl/miller


Does this support out-of-tree format decoders? From an initial glance it looks like all decoders are in-tree and written in golang. We have a lot of internal binary formats at $WORK that I would like to use this on...


Hi! Yes, it's kind of supported, but in a very Go-ish way at the moment. You can use fq as a submodule, import/register your own format decoders and then run cli.Main; more or less what https://github.com/wader/fq/blob/master/fq.go does. I have a private version of fq for work with some proprietary formats that does this, and it works great. One issue is that the decoder and format API might change; I'm not sure I can give any stability guarantees atm, and I want to evolve it a bit more. Also it would be great to be able to hook into existing formats more in some way.

In the future I hope to support writing decoders in jq and/or support some declarative format like Kaitai.


Seems unlikely since the decoders are defined not just in-tree but in "host" code, the definitions are neither data-driven nor a DSL.

So it would require some sort of native (Go) plugin system, which I understand is about as bad as in Rust, owing to there being no standard ABI (or plugin system, for that matter).

Therefore the way to have bespoke/internal formats would be to maintain an internal fork of the tool.


"(...) some sort of native (Go) plugins system (...)"

See: https://pkg.go.dev/plugin


I am sure it is awesome, but the name of this utility is somewhat unfortunate.


This related project, on the other hand, embraced it (for better or for worse):

https://github.com/jzelinskie/faq


Don’t see the issue. I naturally pronounce this “eff-queue”. You’ve really got to work hard to make it vulgar.


I’d have gone with bq myself


I know that jokes don't go over well here, but...

I'd make an improved one, one which is better. And I'd name it bbq.


And I will write a fuzzer for bbq called omgwtf.


D'oh! Big ol BigQuery CLI would like to have a word


then bbq.


Only in puritan cultures


Or passive aggressive…


:) I didn't choose it to be provocative or anything; apologies if it comes across that way. I've always pronounced jq "yay-queue", so fq is "eff-queue" for me. Also, f and q can be typed with one hand on QWERTY, which is nice and quick.


Yeah, two syllables is too long. Let's just sound it out...


... 'feek' ?

If it were 'fk' sure, but the Q on the end makes me think of all the English words that come from French and end in 'ique', like technique. 'fq' looks like 'feek' to me.


This is quite an interesting project! Combining Kaitai structs or similar with the command line.

However, I am a little disappointed that the jq syntax was chosen. jq has a very non-intuitive syntax; there are more intuitive query syntaxes out there (LINQ or even basic SQL come to mind).


Yes, I can empathize with finding jq hard to understand; it's quite different and took a while to grasp. The reason I chose it anyway was that after prototyping some common types of queries I would like to do (basic value access in deep structures, multiple recursive traversals with filtering, transforming objects and arrays) in various languages, jq was more or less the only one that felt terse enough. Also, I think it's quite nice that you can output to JSON and then load that into whatever language or environment you want. Maybe there are some alternatives I should look at?


LINQ is a bit more wordy, and SQL is sadly not very composable.


This looks incredible. I'm on my phone so I haven't tried this, but it looks like this supports slicing into MP3 bitstreams? That would have saved me a month of research and tons of development back in 2013.


Hi, it depends a bit: if the mp3 stream uses the bit reservoir, it might be tricky to do "pure" remuxing with any tool. fq's mp3_frame decoder does try to track which bits are part of the current frame and which are part of a future frame, but I'm not sure how much that helps. If the stream does not use the reservoir, you should be able to slice using fq '.frames[100:200][]' file.mp3 > sliced.mp3 or something similar.


Not to be confused with https://github.com/circonus-labs/fq, the message queue.


So is this meant for any binary formats?


Supported formats:

https://github.com/wader/fq/blob/master/doc/formats.md

They should probably make this a bit more prominent. It's an impressive list for a new project.


Permanent link (press 'y' on any Github link): https://github.com/wader/fq/blob/eb4a6fdbd6ef3a09fc59802e96e...


Thanks! Any suggestions on how to make it more prominent?


Of course not. It can only work on formats that the team has written parsers for.


I ask because I’d be interested in helping write an EMF+ filter


Interesting project. Unfortunate that its name conflicts with one of nq’s executables (https://github.com/leahneukirchen/nq), but I’m not sure anything can be done about it.


IMO a project whose only non-prefixed executable has that name takes precedence over one where it's just one of several, even when the one with multiple non-prefixed executables is older.


This looks like wireshark's panel for inspecting packets


It says it supports protobuf. Is there a protobuf file format, i.e. one for multiple records, or do they mean a file containing a single protobuf record?


Typically people separate protobuf messages by prefixing each one with the length of the next message. Perhaps that’s what they did.

Also, protobuf can contain embedded messages, or even just the binary representation of a list of embedded messages.


Hi, currently the protobuf support can either decode the wire format, or in some cases a format decoder uses protobuf as a subformat and passes it a "schema" so it can do some fancier decoding. But yes, it would be interesting to add support for reading protobuf schemas somehow.


I wrote a protobuf parser in Ragel for work. It's still used to replace reflection, as the C++ protobuf implementation explodes our binaries to huge sizes.


Hopefully json will be superseded someday. Cool tool.


Interestingly, fq works by being kind of a superset of JSON/jq. It has types that can behave as jq values when needed, but with special functions or key accessors can be something else.


why not bq?


Well fq too!



