Interactive examples for learning jq (ishan.page)
271 points by ishandotpage on Nov 8, 2023 | 96 comments


For me, bash and jq are, literally, the opposite of riding a bicycle. No matter how much time I spend working with them in a given week, a month later I am gonna have to skim through my bookmarks and Kagi results (and now also ChatGPT) to figure out how to do stuff I was easily doing a month ago.


I also observed this when using most cli tools… I think it’s a common problem for tools you have to reach for a couple times a month/quarter (versus a programming language when you’re coding almost every day)

My solution was literally to create Anki cards every time I discover a neat feature that I might not remember but that would be useful to know. I just go through my Anki cards once a day for 10 minutes and it works like a charm. My memory for various cli tools has drastically improved. Rarely do I need to reach for Google, man docs or ChatGPT for most cli tool usage. I’d recommend spaced repetition for cli tools



Thanks for adding the link!


I do similar things with Obsidian (markdown).


I get Bash, but for jq I found that my small fusillade of Anki flash cards was more than enough to get a fingertip feel for its syntax. Amazing what 50 flashcards of jq (or awk, or sed, or regexes, or any DSL really) gets you in the long run.


I wrote my response about using anki cards for this right before seeing your comment ! :D

Happy to see others have been doing that too!


It's more important to understand the possibilities than remember the details. Details can always be quickly looked up, as long as you know what to look for and can conceptualize which tools to combine to achieve a goal.


Agreed. Some things I accept I will never mentally memorize nor commit to muscle memory. JQ is one of them. So this is a great reference.

I will never truly memorize how to use this because it's not my primary goal, nor is it the end product to process data.

Rather, it is a means to a means to a means to an end.


If I find myself struggling with a task I’ve done a handful of times, I just make a page for it in obsidian with the snippet I need and an explanation of how it works.


I save snippets in a markdown file for that reason.


ChatGPT is a great UI to both though


I always struggled with understanding JQ. Each time I was just googling things. But actually it does make a lot of sense if you understand the building blocks. I wrote it all down [1] but here is my summary:

jq lets you select elements like it's a JavaScript object using dot notation and array indexing.

    jq '.key.subkey.subsubkey'
    jq '.key[].subkey[2]'
You can wrap things in array constructors or object constructors to create new lists and objects:

    jq '[ .[].key ]'
    jq '{key1: .key1, key2: .key2}'
You can combine filters with pipes (|) to build complex transformations. Built-ins like map() and select() are useful for transforming arrays.

You put it all together into something like this:

    curl https://api.github.com/repos/stedolan/jq/issues |
      jq 'map({title: .title, labels: .labels}) |
       map(select(.labels | length > 0)) |
       map({issue: .title}) |
       sort_by(.issue) |
       [{issues: .[]}]'

This query fetches GitHub issues, transforms them into a simplified structure, filters out unlabeled issues, sorts them, and wraps the results in an array - demonstrating how you can chain together jq's query language to wrangle JSON data.

[1]: https://earthly.dev/blog/jq-select/
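
For readers more at home in Python, here is a rough sketch of what that pipeline does, following the prose description above. The sample `issues` list is made up for illustration (the real GitHub response has many more fields), and "unlabeled" is assumed to mean an empty `labels` list.

```python
# Hypothetical sample data standing in for the GitHub API response.
issues = [
    {"title": "Crash on empty input", "labels": [{"name": "bug"}], "id": 3},
    {"title": "Add a reset button", "labels": [], "id": 1},
    {"title": "Better docs for path()", "labels": [{"name": "docs"}], "id": 2},
]


def simplify(issues):
    # map({title: .title, labels: .labels}): keep only two fields
    slim = [{"title": i["title"], "labels": i["labels"]} for i in issues]
    # drop issues whose labels list is empty
    labeled = [i for i in slim if i["labels"]]
    # map({issue: .title}): rename the field
    renamed = [{"issue": i["title"]} for i in labeled]
    # sort_by(.issue)
    ordered = sorted(renamed, key=lambda i: i["issue"])
    # [{issues: .[]}]: wrap each element in an object, collect into a list
    return [{"issues": i} for i in ordered]


result = simplify(issues)
```

Each intermediate list corresponds to one stage between pipes in the jq version, which is most of what makes the jq query readable once the building blocks are familiar.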


Had to add `-L` to the curl command to follow redirects, since their API endpoints seem to have changed.


I was curious so looked up how it works before reading the summary at the end, and that led me to find another user of aioli.js jq implementation: https://jiehong.gitlab.io/jq_offline/ (featured https://news.ycombinator.com/item?id=28627172 two years ago); jqplay.org still sends all the data on every modification so they should learn from it...

Anyway, this article is neat! Good work!

If I were to nitpick one of the last examples with path has no explanation and flew over my head (would have to open the documentation), and a reset button for each example might be nice after messing with it a bit, but it was a nice play.


Hey, thanks for your kind words!

Regarding the reset button: I think that's a great suggestion and now it bugs me so much that I can't reset it. I'll add a reset button later tonight when I'm off work.

Regarding the confusing example: Yes, some of the examples are missing explanations (mainly because I spent more than a month on this post and I just did not want to put off putting it out any longer). Sorry haha. I'll try to improve the explanations and add more.


JQ is an insanely powerful language, just to put to rest any of your doubts about what it is capable of here is an implementation of JQ... in JQ itself:

https://github.com/wader/jqjq

It really is a super cool little, super expressive nearly (if not entirely) turing complete pure functional programming language.

You can:

* Define your own functions and libraries of functions

* Do light statistics

* Drastically reshape JSON data

* Create data indexes as part of your JQ scripts and summarize things

* Take JSON data, mangle it into TSV and pipe into SQLite

  cat data.json | jq '<expr>[]|@tsv' | sqlite3 -cmd ".mode tabs" -cmd ".import /dev/stdin YourTable"
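
The same JSON to SQLite hand-off can also be sketched with nothing but the Python standard library. This is only an illustration: the input records, the `items` table and its columns are invented for the example, and an in-memory database is used so nothing touches disk.

```python
import json
import sqlite3

# Hypothetical input; in the shell version this would come from data.json.
raw = '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'


def load_rows(conn, records):
    # Table and column names are made up for this sketch.
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(r["id"], r["name"]) for r in records],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_rows(conn, json.loads(raw))
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
names = [row[0] for row in conn.execute("SELECT name FROM items ORDER BY id")]
```

The jq one-liner stays shorter for ad-hoc work; a script like this mostly pays off once column types or error handling start to matter.
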
For prototyping you can also use it to tailor the output of APIs to what you need in a pinch, using JQ as a library, especially with something like Python:

https://pypi.org/project/jq/

As a part of the library you can compile your expressions down to "byte-code" once and reuse them.

Saying JQ is a best kept secret is an understatement. JQ gets more amazing the deeper you dig into it. Also it is kind of crazy fast for what it is.

edit: Formatting fixes


JQ + journald is great too, but 20 years of muscle memory writing bash / python / perl / awk / sql / ruby / JS / CSS selectors / xpath / xmlstarlet one-liners keep getting in my way. I keep long notes on both with examples of common tasks. I still dislike yaml (significant whitespace is my “ick” as the kids say) too much to learn whatever the equivalent is for that and still find CSV/TSV easier to slice and dice at will due to my own personal history.

I’m sure at this point that many ETL jobs in notebooks we run at $BigCo today could be reduced to jq expressions that run 100x faster and use 1/10th the memory.


The ‘nearly’ Turing complete is something I wonder about. It feels like jq might have some limitations - transformations it can’t do, due to some inherent limitation of how it handles scope or data flow. The esoteric syntax makes it hard to determine sometimes whether what you are attempting is actually possible.

As soon as jq scripts reach a certain level of complexity I break out to writing a node script instead.

And given how rapidly jq scripts acquire complexity, that level is pretty low. One nested lookup, and I’m out.


jq often feels like a code golf language. I would say it does have some of those Perl one liner vibes, that is to say that it is often a write-only language.

Also the ‘nearly’ part is because I don’t remember if it has infinite loops or if it is more like Starlark and thus decidable. I do have vague recollections of causing infinite cycles in JQ, so it could quite well be entirely Turing complete.

So far I have not found a single task that JQ was incapable of. And I have abused it pretty bad on my spare time =], for intellectual challenge.


jq lacks coroutines, which means some tasks can be hard to accomplish in jq. It's still a very powerful language, and it is Turing complete, not just nearly.


Thanks for the jqjq shoutout! :) i'm quite sure jq is turing complete, jq (and jqjq!) can implement brainfuck https://github.com/01mf02/jaq/blob/main/examples/bf.jq


Thank you so much for piecing together a great example (jqjq) to help open everyone’s eyes that JQ is not just a JSONpath implementation with weird syntax! I often reference it to drive home the fact that JQ is a full blown language.

The brainfuck one is also gonna be going into my notes. That implementation is quite a terse implementation.


Great to hear and that was one of my hopes! but honestly it initially came to be because i was fiddling with some jq AST-tree stuff for fq :) weirdly it was much easier to implement than i expected. Hardest part was how to handle infix operators, +/- etc, parsing without infinite recursion. But once i found and managed to implement precedence climbing things got a lot easier, it's still a bit of magic to me how well it works :) the eval part had some difficulties but was mostly straightforward when you can piggy-back on the "host jq", but i tried to stay away from piggy-backing too much; to not piggy-back at all probably requires implementing a VM somehow.

BTW you're very welcome to help improve the jq documentation. Me and some other maintainers have been talking about how it probably needs an overhaul to be more approachable and also better document some nice hidden features. Join the discord if you want!


For whatever reason jq is one tool that I simply can never remember the syntax for. It's ChatGPT every time for me. I just can't remember the specifics of how it differs from jsonpath vs jmespath (used by AWS) .... I wish there was a way for every tool to just use jsonpath instead.


Gron (+ grep) can also be a handy combination in this case.


+1 for gron. Breaking a JSON field like { "foo": { "bar": ["baz", "bop"]}} into a greppable stream like

json.foo.bar[0] = "baz";

json.foo.bar[1] = "bop";

and then copying and pasting the results into jq makes the whole iteration loop much much tighter.
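
To make the idea concrete, here is a minimal Python sketch of that kind of flattening. It is loosely modelled on gron's line format and is not gron's implementation: keys that are not plain identifiers (spaces, dashes and so on) are not quoted properly here.

```python
import json


def flatten(value, path="json"):
    """Yield one greppable assignment line per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            # Assumes keys are simple identifiers; no quoting is attempted.
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are printed as JSON literals so strings keep their quotes.
        yield f"{path} = {json.dumps(value)};"


doc = json.loads('{"foo": {"bar": ["baz", "bop"]}}')
lines = list(flatten(doc))
matches = [line for line in lines if "bop" in line]  # the "grep" step
```

Because every line carries its full path, a plain text search is enough to find where a value lives, and the path it prints is close to what you would then type into jq.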


Not a 1:1 replacement, but I created https://github.com/pacha/cels because I wanted to have a more intuitive way of working with JSON and YAML files


JSONpath and jmespath are objectively worse than the jq language though, as you can only query, but jq also allows for very powerful transformations.

As for learning it, it's the same with any tool, key is repetition and regular use.


i get that it's more powerful but I almost consider it an anti-feature. I have very good tools for doing transformations (sed,awk,perl, etc etc) the problem is they all want line oriented format and JSON breaks that. So all I want is a tool to go from json => line oriented and I will do the rest with the vast library of experience I already have at transformations on the command line.


> So all I want is a tool to go from json => line oriented and I will do the rest with the vast library of experience I already have at transformations on the command line.

The tool for that is likely https://github.com/tomnomnom/gron It's probably the best tool to go back and forth easily between json and line oriented.


`jq -c paths` will show you all the paths in a JSON text. You can totally use jq in a very gron-like manner. Try it.


We need to write a new tutorial.

The first and foremost thing to know about jq is that it's built on path expressions, so the first thing to learn is how to write path expressions. Fortunately path expressions are easy in jq!

  .a    # Get the value of the "a" key
        # in the current input object

  .[0]  # Get the value of the first
        # element in the current input
        # array

  .a[0] # Get the value of the first
        # element in the array at the
        # key named "a" in the current
        # input object.
        #
        # I.e., path expressions chain:

  .a[0].b # Get the value of the "b"
          # key in ...
Things get more interesting when you see that `.[]` is the iterator operator, and that you can use it in path expressions.

Things get really interesting when you see that `select(conditional expression)` can be used in path expressions joined with `|`.

Just this can be very useful. It's also useful to know about the magic `path()` function, and `paths`, which I often use to just list all the paths in an input JSON text. Try applying `jq -c paths` to a `kubectl get -o json pods` command's output!
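
If it helps to see those three ideas (chained paths, the `.[]` iterator, and `select`) outside jq, here is a small Python sketch. The helper names `get_path` and `iterate` and the sample document are invented for the example; note that jq yields `null` for a missing key where this code would raise an exception.

```python
from functools import reduce


def get_path(doc, path):
    """Follow a list of keys/indexes, like the jq path expression .a[0].b."""
    return reduce(lambda current, step: current[step], path, doc)


def iterate(value):
    """Rough analogue of jq's .[]: every element of a list, or every value of a dict."""
    if isinstance(value, dict):
        yield from value.values()
    else:
        yield from value


# Hypothetical input document.
doc = {"a": [{"b": 1, "ok": True}, {"b": 2, "ok": False}]}

first_b = get_path(doc, ["a", 0, "b"])                            # .a[0].b
all_b = [item["b"] for item in iterate(doc["a"])]                 # .a[].b
ok_b = [item["b"] for item in iterate(doc["a"]) if item["ok"]]    # .a[] | select(.ok) | .b
```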


This is my main usage of ChatGPT :-)


One feature I found very useful when using jq in Bash is the "alternative operator" '//'.

  result=$(echo "$data" | jq -r '.optional // ""')
  if [[ -n "$result" ]] ...
feels more natural in Bash than

  result=$(echo "$data" | jq -r '.optional')
  if [[ "$result" != null ]] ...
Especially if an empty field should be handled the same way.

And when using the raw-output option it helps with the ambiguity between "null" and null.
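
One detail worth spelling out: `//` only falls back when the left side is `null` or `false` (or produces no output). That is narrower than Python's `or`, which also replaces `0`, `""` and empty lists. A small sketch of the single-value case, with a hypothetical helper named `alt`:

```python
def alt(value, fallback):
    """Mimic jq's `value // fallback` for a single value."""
    # Only null (None) and false trigger the fallback; 0 and "" are kept.
    if value is None or value is False:
        return fallback
    return value


# Made-up sample object for the comparison.
data = {"count": 0, "name": "", "missing": None}

a = alt(data.get("optional"), "")  # key absent, so the fallback "" is used
b = alt(data["missing"], "n/a")    # explicit null, so the fallback is used
c = alt(data["count"], 10)         # 0 is kept, as in jq
d = data["count"] or 10            # Python's `or` replaces 0 with 10
```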


Since it's a JSON tool, I wish it leaned more on JavaScript syntax. In this case I wish it was || or even better, ??


I suspect they didn't use || because it makes parsing easier, given jq's reliance on the pipe operator.

It doesn't use ?? because it predates that operator's introduction into the language by about 8 years


Let me drop a link to my jq zsh plug-in: https://github.com/reegnz/jq-zsh-plugin

I find the biggest problem with jq is that the feedback loop is not tight enough. With this jq-repl the expression is evaluated at every keystroke.


Nice!

Let me piggyback to mention the (neo)vim plugin I use for tightening the loop... https://github.com/phelipetls/vim-jqplay

It's great for building large complex queries that will eventually live in scripts, but your zsh plugin seems to hit a real sweet spot of fast feedback for ad-hoc queries too! Huge props!


Yeah, I tend to use that one as well, but for me it just feels 'right' as a line editor plugin. I'm running a lot of kubectl commands and for me this plugin proved to be invaluable.


https://github.com/reegnz/jq-zsh-plugin/blob/e61804e35a593ad...

zshbuiltins(1): Unlike parameter assignment statements, typeset's exit status on an assignment that involves a command substitution does not reflect the exit status of the command substitution. Therefore, to test for an error in a command substitution, separate the declaration of the parameter from its initialization.



Nice plugin; I got it and will be using it. Browsing the code, I saw a couple of small errors; not too serious, but some error handling is incorrect. In your `jq_complete()` function, for instance, you have

    local query="$(__get_query)"
    local ret=$?
Unless the `local` assignment to `query` fails, `ret` will always be 0 regardless of the return value of `__get_query`. To fix this, you would need your first line to be

    local query; query="$(__get_query)"
and so on.


I pointed out exactly this in https://news.ycombinator.com/item?id=38188500



Nice, esp reading Calzifier’s comment above and remembering how many times I’ve cursed the JQ syntax because of quoting issues…another “trick” I’ve been using is for any non-trivial JQ filter, stick it in a file or at least a heredoc and feed it to JQ using -f for much less quote-escaping malarkey.


nice, but... I'd written something like this (as a program you pipe to, not autocomplete) before, but when there's an error, I try to show the error then the last-good-output. The reason for this is that when you're typing a complex command you want to have the json visible to guide your thinking, just displaying the error hides it.

The way I did this was to store both the last working query and the last working output, I'd only reuse it if the last working query was a prefix of the current query - that avoids the awkward case where you are deleting letters from the output, so you need an output further back in history (which I didn't store, wasn't worth the hassle)

Feature request?
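
The caching rule described above can be sketched in a few lines of Python. This is an illustration of the idea rather than anyone's actual implementation: `Preview`, `evaluate` and `fake_jq` are invented names, and the stand-in evaluator simply treats a trailing `|` as an incomplete query.

```python
class Preview:
    """Show the latest result, falling back to the last good output on error."""

    def __init__(self, evaluate):
        # `evaluate` is a hypothetical callable: query -> output,
        # raising ValueError when the query does not parse or run.
        self.evaluate = evaluate
        self.good_query = None
        self.good_output = None

    def render(self, query):
        try:
            output = self.evaluate(query)
        except ValueError as error:
            # Reuse the cached output only if the last good query is a
            # prefix of what is being typed now.
            if self.good_query is not None and query.startswith(self.good_query):
                return f"error: {error}\n{self.good_output}"
            return f"error: {error}"
        self.good_query, self.good_output = query, output
        return output


def fake_jq(query):
    # Stand-in evaluator for the demo, not a real jq binding.
    if query.endswith("|"):
        raise ValueError("unexpected end of input")
    return f"<output of {query}>"


preview = Preview(fake_jq)
first = preview.render(".items")     # works, gets cached
second = preview.render(".items |")  # error, cached output is shown below it
third = preview.render(".other |")   # error, cache is not a prefix, so no reuse
```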


Thanks for the idea, I've implemented it: https://github.com/reegnz/jq-zsh-plugin/commit/60d3b6fb3ca1b...


Nice idea! I'll look into it.


Learning jq is great however you still need to know something about every new scrap of json you feed it.

To that end I wrote a line of jq to emit every structural path from any json as a list of jq arguments.

You can use it to make queries or keep track of a document's structure.

https://github.com/TomConlin/json_to_paths


For me jq is my epitome of "When faced with a problem a programmer says 'I know I can use X' and now they have two problems"

I continually bounce off the "language/philosophy" of jq in quite embarrassing ways. Every time I go "Ah, I can use this as a reason to learn jq", and half an hour later I've written a python script to extract the data instead.


x1000 this. I find I have similar reasoning that I apply to awk. I _know_ some people get massive benefits out of using it, I just don't need it often enough to actually pick it up... GPT to the rescue I suppose


I started writing a book on jq, but realised it wasn't really enough for a full book, so put it out as a series of blog posts:

https://zwischenzugs.com/2023/06/27/learn-jq-the-hard-way-pa...

JQ really is the best kept secret in data.


Oh my - your series of blog posts come up regularly for me when googling jq things.


I'm curious about this web page: Did you compile jq targeting WASI/WASM to run an in-browser version?


I am using https://github.com/biowasm/aioli which provides an already compiled wasm jq along with all the related support code for calling it


Aha, I was wondering why the biowasm CDN suddenly spiked in usage today! :D


oopsie :D


Haha, well, we should probably talk!


I'll shoot you an email after work


Great article. Nice to have it interactive. How does it work? Do you have a terminal running somewhere or does it run in the browser?

One thing I noticed, and where I stopped continuing, is that the jump from Filtering Nested Arrays to Flattening Nested JSON Objects, is WAAAY too big. From a simple filter to triple nested filters with keywords that had no introduction in a simpler example, isn’t working for me


Seems he is using something called biowasm aioli: https://github.com/biowasm/aioli

> Aioli is a library for running genomics command-line tools in the browser using WebAssembly. See Who uses biowasm for example use cases.

https://biowasm.com/cdn/v3/jq/1.6


Exercism also has a jq track with interactive lessons : https://exercism.org/tracks/jq


It seems like jq is getting a nice boost due to how useful it is getting JSON into and out of OpenAI and LLM environments that understand jq. The big new release/relaunch shows the project is up and running again so maybe we see even more integration with Agent/Function type use cases or some pydantic-ish guardrails. Thanks for the Bookmark !

https://github.com/jqlang/jq/releases/tag/jq-1.7


Ruby’s tap method on objects is still one of my favorite features in any language due to the things it enables: https://www.rubypigeon.com/posts/object-tap-and-how-to-use-i...

Out of any language I’ve worked with, Ruby was #1 overall in being able to solve problems closest to how I thought about them conceptually if that makes sense. In other words, getting some way of doing something out of my brain and translating it to Ruby was the most seamless (I haven’t tried any real lisp in earnest), with ES6 a pretty close second now.

Even though I use it a lot professionally, I really don’t like Python much, and I like Java / Gradle less. The whole “There should be one-- and preferably only one --obvious way to do it” thing never held water in my opinion, and things like making an HTTP request then doing something useful with the output are really not fun in Python without using things like the Requests lib… even closing files or having to care where a cursor is, like what year is it again? Python doesn’t feel like a high-level language many times since it keeps forcing me to deal with minutia and doing the wrong thing by default. Then there’s the whole disappointment with PEP 582 getting rejected (check out the PDM project to see what Python packaging could have been) and I just can’t help but really despise it despite how useful it continues to be.

Sorry, just hearing the desire for anything else to be more pythonic in any way takes my brain to a dark place of PyTSD.

Hey, at least we don’t have to write TS types for JQ (yet).


For getting JSON out of LLMs, I have found the Python library [dirtyjson](https://pypi.org/project/dirtyjson/) to be quite useful.



If you have trouble remembering jq syntax (or any other weird CLIs) I'd recommend increasing the number of lines of history stored in your shell and finding a way (I use FZF) to search through that history.

I do a quick ctrl+r, type jq, and I can find all of my JQ snippets I've used in the past couple of years. If I then type "select" I can find all of the times I've used that function, etc.

I also use it to find while loops, kubectl snippets, environment variables I exported to run a script, etc.


This made me think: If you wanted to make an 'inverted bottom-up' introduction to the suite of Unix command line tools, you could go in the direction of more-to-less-structured text formats and the common tools we use with them quite easily.

1. JSON: `curl` to get interesting JSON APIs, `gron` and `grep` to explore what's inside them, `jq` to process them into interesting formats.

2. CSV: Lots of good choices here. `xsv` is very popular but I think development ended a while back; I like the `csvkit` just because I like tabbing through the options you have here. `miller` I've heard good things about. Or go to the total opposite end of the direction, use Simon Willison's excellent `csvs-to-sqlite` in conjunction with `datasette`, and then do a foray into the many interesting things you can do in SQL.

3. Bespoke text formats - `sed`, `awk`, and possibly even Vim macros reign supreme here, along with the rest of the "standard" Unix text kit. The big benefits of introducing these last is that these tools work as a superset of many of the previous ones for added flexibility.


Jq has one of the worst, most non-intuitive, non-self-evident syntaxes ever devised on planet Earth. Bash's if constructs are a walk in the park compared to general jq syntax. And people try to sort out that mess... somehow people always want to climb a mountain when it's in their way or when someone says that would be an achievement of some sort...


I found Jq to be difficult to use which is why Oj, https://github.com/ohler55/ojg is based on JSONPath. There still are a lot of options but it only takes a couple of help screens to figure out what the options are.


You want to provide any examples of what you mean or just leave a statement saying "jq is the worst ever"?


If you can't recall the syntax 3 days later it's terrible. Same applies to awk, sed and the like. No matter how many times I used them for something, I always have to resort to the documentation, chatgpt or the like because I just can't remember the contrived syntax. Maybe I'm retarded somehow... ¯\_(ツ)_/¯

Python (dead simple, easy to recall):

if monkey == 'fat':

    print('happy')
vs. bash (horrible syntax pitfalls):

if [ "$monkey" = 'fat' ]; then

  echo 'happy'
fi

you gonna forget about the semicolon, the spaces around the condition and it will error out. not to mention integer comparison operators...

python also has some terrible syntax, but those are advanced things, like list comprehensions.

jq...it's the same as awk, sed, bash... hard to remember for the reasons mentioned above


I think it's quite nice and intuitive to describe a pipe of filters as: filter | filter | filter. Also note that a single filter in jq is 1:N, 1 input N outputs where N can be zero, which is a very useful feature. To express this in python etc you would probably need nested loops (for i in (for j in ...)) and/or nested yield from somehow, which results in a not very CLI/ad-hoc-query-friendly syntax
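
As a rough illustration of that 1:N point, here is one way to model it in Python: every filter is a function that takes one value and yields zero or more values, and `pipe` feeds each output of one stage into the next. All names here (`pipe`, `each`, `select`, `field`) and the sample data are invented for the sketch; this is not how jq is implemented.

```python
def pipe(*filters):
    """Compose 1:N filters: each takes one value and yields zero or more values."""
    def run(value):
        values = [value]
        for f in filters:
            # Feed every output of the previous stage into the next filter.
            values = [out for v in values for out in f(v)]
        yield from values
    return run


def each(value):
    # Like .[]: one input list, N outputs.
    yield from value


def select(predicate):
    # Like select(...): one input, zero or one output.
    def f(value):
        if predicate(value):
            yield value
    return f


def field(name):
    # Like .name: one input, exactly one output.
    def f(value):
        yield value[name]
    return f


# Roughly: .[] | select(.active) | .name
query = pipe(each, select(lambda user: user["active"]), field("name"))
users = [
    {"name": "ada", "active": True},
    {"name": "bob", "active": False},
    {"name": "cy", "active": True},
]
names = list(query(users))
```

It works, but the amount of scaffolding needed for what jq writes as three short filters joined by pipes is the point being made above.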


Aside, your post looks funny because 2-space indentation triggers code formatting (so only the body of those if statements are formatted as such, instead of the whole thing).


i think the problem is it doesn't have easy to find documentation, if any at all, that introduces it as a formal language so everything is just a very hard guessing game of easily forgettable syntax.

You know, a list of reserved words, what their functions are, how the structure works, etc ... the kind you'll see if you pick up some "intro to <language X>" book from no-starch or o'reilly or the kind that GNU Awk/sed/dc have.


I suspect Stephen made a trade off between terseness and power and it being intuitive. Part of the value of jq is that it's effectively a "small program" that can easily be piped in a one liner rather than a program that has clear English keywords as an instruction.


I don’t understand, isn’t the manpage for jq exactly that, a list of syntax constructs and reserved words (mostly builtin filters) ?

If anything, it’s _too_ formal and may be missing some example for more complex usages that combine several features.


I hadn't seen it in a long time. There's this wrong assumption that the world remains static. It wasn't like this before.


Reading through these comments here - some praise jq, others claim it is not possible to actually remember the syntax. There seems to be a consensus that it is reminiscent of the complexity of awk, bash, sed... While I appreciate the magic behind jq, from an intellectual point of view and also as a tool, it is indeed impossible for me to remember a reasonable part of it.

Interestingly I still remember most Perl5 syntax, even the crazy stuff, quite vividly, after some 6-7 years of not-writing Perl code. I wonder why - perhaps because Perl is not so complex (even the PCRE), and perhaps because one needs jq now and then, while Perl can be a primary tool for many things. Sadly, Perl is past its prime now, and there are no indications it'll ever make a comeback.


One thing that has helped me write simple/intermediate jq code is this: Imagine what the context is for your filter. Most importantly, update that context at each pipe character '|'.

On an empty command, the context is the top-level of your JSON. As you add filter stages, that context evolves.

(this really requires more explanation and diagrams than I have room for in this margin)


At some point in every declarative language’s life, so many features get bolted on to make it useful that it loses its declarative nature, at which point you might as well just use a more standard imperative language. In this case, just plain JS.

The same thing happened to HCL.


I thought it must be gojq at first, it's easy to forget about Emscripten when golang- and Rust-based WASM are gaining popularity

I looked at aioli briefly. I didn't see how to reproduce the build.


The one thing I would like to have the ChatGPT thing do for me.

(and regexes)


Looks nice! Is there something similar for yq?


    yaml2json | jq ...


This. The number of times I’ve had to mess with or write yaml and gotten some ridiculously unhelpful error because I forgot a space somewhere or needed dashes in my sub-whatever is equal to the number of times I’ve had to write yaml. I’d seriously rather work with XML than YAML. It’s worse to read and 10x worse to write than JSON for anything nontrivial. It’s the TAI64 of serialization. It’s like JSON that you have to type the spaces out to format visually yourself for each thing. I don’t understand the mindset of folks who prefer it. Quotes in yaml are as much fun as they were with mysql_real_escape_string. The only way I work with any significant amount of YAML is by converting it to json. Terrible format, 0/10, let’s just write everything in Acme::Bleach since significant whitespace is so darn cool right?


I agree. I made a tool for this which can do x2y where x or y can be any of json, yaml, toml, ini, xml, html, csv, tsv

it's great for converting nearly any format to json for querying or transforming with jq

https://github.com/sentriz/rsl/


I'm aware that yq has a lot more editing features, so it's not apples-to-apples, but I'll also draw one's attention to gojq's --yaml-input flag <https://github.com/itchyny/gojq/blob/v0.12.13/README.md?plai...> (they have yaml out, also, but I use that a lot less often), which allows leveraging the same language for both input formats


pro tip: write in your favourite language and ask chatgpt/github copilot to translate it to bash/jq


Pro tips should make you faster, not slower ;)

Carting your request over to chatgpt rather than learning the basics of jq doesn't make much sense.

I basically know enough jq to traverse a document, which took a few days of muscle memory. Totally worth it. One of the best tools I've used in the last decade and I'm barely scratching the surface.


not saying you shouldn't know `curl some.json | jq -r '.someField'`, but anything beyond that is overkill unless you do hardcore bash scripting all the time. My annual bash script quota is like a couple hundred lines at most so I don't really want to learn this stuff deeply.


But if you already have it in your favorite language, why would you need to translate it to bash or jq? :)


I feel like learning something like jq is rendered almost obsolete with the onset of GPT. Spending time learning the syntax of a tool I will use once every so often seems like a waste of mental energy when I can just rely on GPT to spit out whatever I need on demand


that seems like a chicken and egg problem though. If you knew jq's syntax well enough to not need to use chatgpt, you might use it much more often.


That's fantastic! Jq is amazingly powerful and just a little complex once you master it.



