Show HN: Tuc – When cut doesn’t cut it (github.com/riquito)
241 points by riquito on June 13, 2022 | 115 comments
Announcing `tuc`, a utility similar to coreutils' `cut`, but more powerful. It lets you split text or bytes into parts and reassemble them in any order.

I always found `cut` very practical for some tasks where `sed` or `awk` were overkill or awkward to use, but I also felt the need for more features.

Some key differences from `cut`:

- parts can be referenced by negative indexes
- delimiters can be any number of characters long, or match a regex
- can split text into lines, and reassemble them
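A couple of quick examples (illustrative; see the README for the full syntax):

    $ echo 'a b c' | tuc -d ' ' -f -1
    c
    $ printf 'one\ntwo\nthree\n' | tuc -l -f -1
    three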



I've found this seriously cool!

While it's not my primary role at my job, I often find myself dealing with lots of disparate data sets, usually needing to do some sort of manipulation, cleaning, searching, etc. Every now and then I encounter something like this, and it seems to me that there's potentially a nice set of command line tools/utilities I should be adding to my belt. Anywhere I should particularly start taking a look? Like, if my goal is to become much better at wrangling CSV/text-delimited files, searching across folders of docs for numbers, etc., where is my first entry point into becoming much more proficient at it?


Here's a recent gateway post: https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...

There are even more. :)


I do a lot of that sort of thing, and my go-to tools are grep (with full regular expressions), sort, uniq, head, tail, sed, and the 200-pound gorilla (not as heavy as it could be), awk. Then if all else fails, Python.


I highly recommend you learn the basics of AWK, which is a little language that lets you match on, and run commands against, every line of input (so AWK "scripts" are usually one-liners). The very basics can be learned in 5-10 minutes, the whole language in a couple of hours (read the first three chapters of "The AWK Programming Language" by A, W, K).
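For example, a typical one-liner (logfile is a placeholder):

    # print the second field of every line matching /error/
    $ awk '/error/ { print $2 }' logfile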

For searching recursively through folders, I recommend ripgrep (rg), though it won't do number conversions and the like.

Some self-promotion (but I think it's useful here): my GoAWK project is a POSIX-compatible version of AWK, with the recent addition of proper CSV support: https://benhoyt.com/writings/goawk-csv/


Why would you do a POSIX-compat awk? I loved awk, but it needs to be updated and refined into a better language instead of just recreating the old one.


Compatibility: so it can run all the existing AWK scripts without modification, and people can refer to existing AWK documentation when writing scripts, and they'll just work under GoAWK.

For a 45-year-old language, it's actually surprisingly good. There are a few annoyances and clumsy things (getting substrings is a function call without nice str[start:end] syntax, regex matches are harder than they need to be, string concatenation syntax is weird, arrays aren't nestable (except in Gawk), and so on).
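For instance, the substring annoyance:

    # no str[start:end] slicing; it's substr(string, start, length), 1-indexed
    $ echo 'abcdef' | awk '{ print substr($0, 2, 3) }'
    bcd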

I've actually thought it'd be great to "fix" these things and make a nicer AWK language, but the problem is then no one would use it. :-)


fd, bat, fzf are good starting points

also mdfind if you're on macOS

I'm hesitant to go fully into "new" stuff so I can maintain skills on random *nix boxes I need to ssh into.


Try jq for parsing/selecting JSON data.
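e.g.

    $ echo '{"name": "tuc", "lang": "rust"}' | jq -r '.name'
    tuc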


Nice, especially the format output.

See also:

* hck (https://github.com/sstadick/hck) - close to a drop-in replacement for cut that can use a regex delimiter instead of a fixed string

* rcut (https://github.com/learnbyexample/regexp-cut) - my own bash+awk script, supports regexp delimiters, field reordering, negative indexing, etc


I am in love with this trend of replacing old school unix utilities with new rust projects that stay just as fast or faster, but increase the usability or feature set tenfold. rg, fd, exa, bat, and now tuc.


The problem is that I don't see these consolidating anytime soon into a new set of ubiquitous "core" utilities that can be expected to be available everywhere.


I guess it's just up to someone to set up a Debian package that contains all of these.


That doesn't solve the "available everywhere" problem. It potentially would make the tools easier to install on Debian-based systems, if you have root access. I'm not sure any new set of tools will ever be available with the ubiquity of coreutils in the next handful of years, if ever.


I agree. If these could be made installable without root, and independent of the current version of glibc, then there is huge potential to replace the older tools. I’d love to use things like fd, but they don’t work on older servers without root and a newer version of glibc


They're already installable without root. Just put them in ~/.local/bin. Most Rust binaries are portable and only dynamically link libc.
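For example, if you can build from source (a sketch, assuming a Rust toolchain is already available):

    # binaries land in ~/.local/bin, no root required
    $ cargo install --root ~/.local ripgrep fd-find bat
    $ export PATH="$HOME/.local/bin:$PATH"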


Local installation is still not really a practical solution when you work e.g. on customer machines, on machines of another team, or generally if you work on many different machines. You still need to be aware of the standard tools and know how to use them when needed.

I mean, I get it — I used to locally install vim on machines that only had vi, to make my muscle memory be functional when editing files. But it’s not the same as a core tool just being available by default so you don’t have to ever concern yourself with any alternatives.


That’s what I initially tried for fd, but unfortunately glibc is required for it, and probably other utilities as well.



What would prevent us from adding those to a Dockerfile?


Docker not being installed on remote


This is why I like that you can "go run" Go programs without installing them. Anywhere Go is installed, it will automatically build, cache, and run the binary for whatever platform you're on.

For example

   go run sigs.k8s.io/kind@latest create cluster
I wonder if something like this exists for rust


Absolutely no possible exploit path there at all


Unless you are reviewing the source of everything and compiling it all yourself, then you're executing someone else's binary.


Is it really an exploit when you're explicitly running code that you expressly trust (even if you probably shouldn't necessarily)?


Isn't an exploit explicitly a subversion of trust?

If you shouldn't trust it but you do anyway and someone uses that against you - they exploited your trust.


That is nice, but it still requires an installation of Go, which is obviously something that can't be guaranteed.


“cargo install” is the Rust equivalent.


There’s moreutils for an older rethink of the same set of utilities. I don’t see why an evenmoreutils wouldn’t eventually become popular enough to take hold? Probably not as quickly as you’d like in today’s world of instant gratification, but we’ll get there, eventually.


moreutils is still nowhere near as ubiquitous as coreutils. For me it's not about gratification. It would be more satisfying to make use of the most modern tools available. It's about making sure stuff works and I find sticking to coreutils is the easiest way to do that most of the time.


Install rustup from https://rustup.rs/

    cargo install broot exa miniserve ripgrep tuc xh xsv zellij
Cargo is the package manager you are looking for.


I use Cargo regularly, and rustup does make it easy to install. That's not the point. I'm not looking for a package manager. I want to be able to write scripts that make use of tools I can pretty much guarantee are already installed.


So we aren't talking past each other: how do you distinguish between cargo and a package manager? Do anaconda, pip (PyPI), or homebrew also qualify?

Your scripts could have a prologue that installs rustup and calls cargo.

There is also https://github.com/ryankurte/cargo-binstall

Even after 20 years of bashing, my bash skills still suck. So many corner cases!

But if you include this and call install_utils at the head of your scripts, it should install the tools on demand.

    #!/bin/bash

    # crate names (what cargo installs) and binary names (what ends up in PATH)
    RUST_PACKAGE_LIST='ripgrep xsv tuc broot du-dust dutree'
    RUST_UTIL_LIST='rg xsv tuc broot dust dutree'


    function is_cargo {
        command -v cargo > /dev/null
    }


    function is_utils {
        # succeed only if every utility is already in PATH
        for util in $RUST_UTIL_LIST; do
            command -v "$util" > /dev/null || return 1
        done
    }


    function install_utils {
        if ! is_utils; then
            if ! is_cargo; then
                curl https://sh.rustup.rs -sSf | sh -s -- -y
                # make cargo visible in the current shell
                source "$HOME/.cargo/env"
            fi
            cargo install --locked $RUST_PACKAGE_LIST > /dev/null
        fi
    }
I don't recommend using this; it is just illustrative.


My primary point is that if I stick to coreutils, I don't have to worry about any of this. I can reasonably expect anything in coreutils to already be installed and available. Your functions seem like they'll work just fine, but they either require build tools to be installed, or the packages to support binstall. For me, thinking about all these things and maintaining something like that isn't worth it when I can just use coreutils.


I totally understand. My .vimrc has shrunk from over a hundred lines to 4 lines in recent years. But I really think you are doing yourself a disservice. It is easier than ever (homebrew, nix, cargo, pip) to get nice software everywhere.


Have you tried running them with Nix? I believe you can use most things from it without root, but I'm not sure.


I suppose I shouldn't have said only Debian. But having tools easily installable is still quite different from being able to reasonably assume they are already installed.


I wonder how well Rust projects get along with Debian packaging tools.

"Modern" dev environments are often tied to their own package manager. JS has Npm, Rust has Cargo, etc... These have their own dependency managers, version systems, etc... and they don't always get along with the way Linux package managers work.

IIRC, you don't even need Cargo to build Rust, so it should be possible to compile Rust projects like you compile C projects, essentially mirroring crates.io with .deb packages, but it looks like a lot of work.


Debian packages individual cargo packages/crates as their own Debian packages, and also applications that then depend on them.

https://packages.debian.org/stable/rust/

https://crates.io/crates/cargo-deb exists but I don’t know if Debian uses it or something else.


coreutils-ng? It would be nice to have a bunch of these in one package.


It's easy to fall in love with someone who is young, hip and all that. When it comes down to working on my beloved UNIX systems though, I still prefer to stay with the old-school tools from coreutils et al. They are a quasi-standard, I can rely on them, and I always know what to expect. Better yet, I will find them on every system and can avoid the mental load of learning and internalizing something new. Sure, they're not perfect, but these advantages trump the disadvantages, and it's all worked out pretty well for decades. Here, I don't have to chase the next bride.


And yet, bash eventually replaced the Bourne shell (by having an sh-compatibility mode), and vim has replaced vi.

If you get anywhere in the neighborhood of a proper superset of the old application, we do occasionally retire the old ones.


I can say exactly the opposite and I have the collection of shell scripts to prove it -- the newish tools work better for me when doing a ton of scripting tasks.

So maybe don't project about "hip" or "young" because it does your otherwise decent argument a disservice.


>It's easy to fall in love with someone who is young, hip and all that

>Here, I don't have to chase the next bride.

Who hurt you? :)


I use rg, exa, bat, and zellij as my "rust replaced old stuff". Zellij isn't yet as polished as I'd like it, but it's way more intuitive than tmux.


Oh god, is tmux considered “old stuff” now? I barely finished replacing my screens.


I actually started with screen, but I only used it for daemonising some foreground processes; I work in Zellij daily.


It happens to all of us. Did you know Interstellar came out eight years ago?


The names of these utilities are bad. I have literally no idea what any of them do. The same could be said for standard unix utilities, true, but they have 50+ years (in some cases) of brain bake-in, and have the advantage of names that bear _some_ relation to their function (ls : list files :: exa : "extract the list of files from a dirent?")


At some point you have to accept that if you want to know something, you have to learn it. At one point, you didn't know grep, awk, sed, etc... And then you learned them.

Or you can just stick with the old tools if you prefer not learning a new thing - that's a perfectly valid option.


Does it really matter though? You can just alias them over the originals or something close-by.


Yes, you can do that. But that exacerbates the system portability problem. The real solution is for some distro to, gasp, decide that POSIX compatibility can be done with utilities in /opt/posix (or something) and do widespread replacement by default. But that'll never happen…


In your scripts you can use "/usr/bin/env cat" to get whichever version of cat is first in your PATH. NixOS abuses this to an almost silly level :)


Usually you have to install tmux on a remote machine anyway, so Zellij seems like a good one of these to try.


OpenBSD ships tmux in the base system. I would be very pleased if more systems did this.


You can say that again. After using rg for a few years now, I can't imagine not having this tool, which I use weekly, if not daily.


Same here with `exa` and `bat`


Seems similar in intent to choose (https://github.com/theryangeary/choose) as a cut which doesn't suck. The features outlined are very close; I just don't understand what "can split text into lines" means. Do you mean that the selected fields can be split into lines?

The main advantage of tuc seems to be "templated" outputs.


> I just don't understand what "can split text into lines" means. Do you mean that the selected fields can be split into lines?

Good question, I struggle to word it properly, any help is appreciated.

Assume a file (we will call it "input"), such as

   first line here
   followed by second line
You can use a delimiter and cut inside each line

e.g.

    $ tuc -d ' ' -f 2 < input
    line
    by
or you can cut it "by lines", practically considering the whole file as your single "line" and using newline as the delimiter

    $ tuc -l -f 2 < input
    followed by second line
If you want to remove a line, or keep something in between, it can be more practical/intuitive than head/tail or sed.


This feature seems more like a replacement for head and tail (or a combination of both) rather than cut.

Maybe a good way to explain it would be to show how to achieve the same thing with those well known commands (comparison which should certainly be in favor of tuc ^^)

EDIT: sorry you just said more or less the same thing, I need to read better :)


You could have it do "every 5th line". Easy with awk, but this is more compact.
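For reference, the awk version:

    $ awk 'NR % 5 == 0' filename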


Sed is terse for that, but not very memorable:

    sed -n '0~5p' filename


Good tip. Thanks.


Ref. your `tuc -d ' ' -f 2 < input`, how is it different from `cut -d ' ' -f 2 input`?


> Ref. your `tuc -d ' ' -f 2 < input`, how is it different from `cut -d ' ' -f 2 input`?

It's not. `tuc` is a superset of cut, and in that particular example there's no difference. If you wanted instead to cut on multibyte delimiters, or on an arbitrary run of spaces, `tuc` would work, while `cut` would fall short.
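For example (illustrative):

    $ echo 'a<>b<>c' | tuc -d '<>' -f 2
    b
    # cut -d '<>' would error out: it requires a single-character delimiter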


I was responding in good faith to your reply "Good question, I struggle to word it properly, any help is appreciated." You answered masklinn's question with an invocation of your tool that `cut` could also handle. That is what I was highlighting. I was not trying to belittle your utility; I have even installed it. But I also don't communicate perfectly, so we'll call it even?


Don't worry, I was in good faith too; it's hard to convey some things sometimes. I tried to answer masklinn's question with a simple example, and didn't really think about whether it would showcase tuc or not.

Thanks for trying it out!


It's also very similar to my tool hck (https://github.com/sstadick/hck), which is in turn similar to choose, just faster, with compression support and column selection via matching headers.


This is awesome, especially the ability to compress delimiters. I can't tell you how many times I've wanted to grab a couple of fields from the output of another command but couldn't get delimiters to work correctly, or the command uses custom spacing to align columns and it blows everything up, and then I'm crying in awk land.


Just want to say thank you for the unlimited-length delimiters - this was something that always limited my usage of cut, so this feature alone makes tuc worth it.


> cargo install tuc

Slightly off topic question about this: in Linux, are rust programs always installed like this, or should these also be made available in the regular package manager of your distro?


> or should these also be made available in the regular package manager of your distro?

Getting a package in the official repositories is quite a high bar to clear.

Plus it's... not a great experience for early development, as many distros will lock in the program version entirely, leaving you with a very long tail of extremely outdated installs.

So generally the expectation is that once a program is popular or desirable enough, and is somewhat stable, it gets integrated into the base repos.


Is there an equivalent for C/C++ programs?

There's pip for python, npm for JS, cargo for rust

For C/C++, all I know of are multiple different build and make systems, but none of them works like a package manager, as far as I can tell.

Note that I don't always love it when something installed with pip or npm puts files all over your OS or homedir without being managed by the package manager, though.


> Is there an equivalent for C/C++ programs? [...] For C/C++ all I know are multiple different possible build and make systems, but none works like a package manager, as far as I know

Downloading the source by hand, trying to wrangle what dependencies it has not vendored (which may or may not be available through your system package managers, in versions which may or may not be recent enough), and trying to find out how to build it.

Though do note that this issue can also hit when installing python, js, or rust package, if they ultimately have native dependencies. Their respective build systems will generally try to make it work out of the box, but if your configuration was not specifically tested / supported it can break with fun C-level compilation errors.


The main solution to all of this complexity is another complex (but awesomely powerful) package manager called Portage. It's mainly used in Gentoo Linux

It's awesome. And complicated.


Stop! I'm starting to miss Funtoo... And I now have all these Ryzen cores idling, longing for a world update... Must... Resist...


That, or Nix, maybe. Also awesome. Also complicated.


    ./configure --prefix=/usr/local && make && sudo make install
(I consider it a feature that this doesn't automatically download and install hundreds to thousands of things I haven't even heard of)


I mean, ignoring the fact that the configure script is often a larger program than the one you're trying to install, it (and make) can do anything to your system and unless you read the contents of each you're just taking it on faith that it isn't downloading and installing hundreds of things.


... but if I'm installing curl or jq or similar, I'm quite familiar with the provenance of the project, and of the tarball I'm running a configure script from.

And maybe I need to install one or two dependencies, similarly they should be familiar, or small and comprehensible, and only downloaded and installed with my explicit actions.

(And yeah autoconf generated configure scripts are crazy huge and baroque, and could easily be 1/10 the size for the needed functionality, but compared to "npm install" I'll take it.)


Perhaps not everything you want, but in Python land there is pipx [0]. Pipx will create a virtual environment per binary program so that they are all isolated from each other, and put things in a consistent location (~/.local/pipx). Then it is easy enough to do `pipx install black`, `pipx install cookiecutter`, whatever. It also has a nice upgrade option in `pipx upgrade-all`.

[0] https://pypa.github.io/pipx/


Perhaps the closest I've discovered is AUR in Arch.


Yes, it's called portage and comes with Gentoo. :)


The intention of "cargo install" is to provide a quick and easy way to distribute programs useful to other Rust programmers.

In general, end users should use some other method that doesn't require having a Rust toolchain pre-installed, but doing that can take work, and so not every program pursues it.


In general you want to install it with a package manager.

(But also it's your system. What's the point of Linux if you can't do it your way)


This is great! I always felt like cut was really handicapped by lack of negative indexes.


The biggest handicap of cut for me has always been that it cannot split on blanks (TABs or SPACEs); you have to choose between TAB and SPACE. So I wrote an awk script that can print field ranges like cut, but recognizes blanks. Now I will see if I can get tuc worked into my muscle memory.


I usually just pipe through sed to normalize the separators before applying cut.
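e.g. something like (GNU/BSD sed with -E):

    $ printf 'a \t b\n' | sed -E 's/[[:blank:]]+/ /g' | cut -d ' ' -f 2
    b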


That's also a little awkward, when there could easily be an option to split by all blanks.


I’m not defending cut here, but using sed is also pretty straightforward and fits its purpose. I’d argue that using the existing general-purpose tools is better than creating custom narrow-purpose tools in simple cases like this one. Besides maintainability and familiarity, it also exercises your proficiency in applying the standard tools.


You have a point, of course.


awk can do something like negative indexes if needed:

  $ echo "a b c" | awk '{print $(NF-1)}'
  b


When I want to use negative indices, I pipe the string through rev first, then do my cut, then rev again
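Something like:

    $ echo 'a b c d' | rev | cut -d ' ' -f 2 | rev
    c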


That's the classic solution but blows up when using multibyte characters since rev just reads the bytes in each line in reverse.


What do you mean by negative indexes?


I believe "negative index" means array[-1] is the last element in array, array[-2] is the second-to-last element, etc.

In the context of "cut", it would mean being able to do something like:

cut -d" " -f1--2

the "-f1--2" (read: fields from 1 to minus 2; it's a range) means to select from the first field to the second-to-last field. (that double "--" is pretty awkward, to be sure!)

Some programming languages (ruby is the one that I know) have this feature for accessing array elements.


This is an incredibly useful improvement over cut, thank you. The mental distance from cut to awk/sed is often just too high and having a more useful utility will drastically reduce how much I reach for those tools.


I would still reach for awk/sed, because they tend to be preinstalled. I might fall back on perl/ruby/python if awk/sed were insufficient; those also tend to be preinstalled.


I get that ‘tuc’ is just ‘cut’ backwards but I was kind of hoping it also stood for ‘the ultimate cutter’ or something like that. :)

In either case, very cool. Thanks for sharing. I’ll be using this for sure!


It'd be nice if it could split while supporting escapes and quoted strings. I often run into issues with things like CSV, where fields might be quoted, or quoted strings where the quotes are escaped.


Doesn't seem to fix the #1 thing missing from cut: an easy syntax for splitting on whitespace sequences (rather than a single space), like awk does by default.

(I see that it supports splitting on regex, but I was hoping for it to be the default, or a single-character switch.)


Isn't that what the --greedy-delimiter option is for?


Nice, missed that, and it has a single character shorthand too: -g

Edit: or maybe not, I think I'd still have to use --regex for real whitespace sequences that can be a mix of spaces and tabs.


As you figured out, -g (--greedy) matches the same delimiter multiple times (e.g. one or more spaces). If you want to match different delimiters (e.g. a mix of spaces and tabs), one or more times, you must use -e (--regex).
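For example (sketching the regex form; exact pattern syntax per the Rust regex crate):

    $ printf 'a \t\t b\n' | tuc -e '[ \t]+' -f 2
    b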


Triple thumbs up for -e

Never understood why that was never added to cut in the first place.


Seems like there's a bit of functionality overlap with lam/jot/rs; have you looked at those?

They're from BSD originally. Included in macOS, and in Linux distros as bsd-utils.


In Ubuntu I can find rs and athena-jot, but not lam; bsdutils and bsdmainutils contain different tools.


I didn't know about them, I'll check them out


Did you find any limitations of `pico-args` that turned into a caveat for tuc?


> Did you find any limitations of `pico-args` that turned into a caveat for tuc?

Quite the opposite: early on I started with `clap`, then moved briefly to `argh` before settling on `pico-args`. Compilation time and size were the main driving factors, alongside support for non-spaced values (e.g. -d' ').

Maybe if tuc had subcommands it would have been a different story, but I didn't find enough value in the more well-known arg libraries.


Hacker News comes through again. I've been looking for a tool exactly like this.


This is the "cut" I've always wanted!


Thanks! I am already using it!


awesome!


this is excellent.


Awk?


[deleted]



