Thanks for the second link. I go there every few weeks but never thought to check out the top voted.
...In hindsight, of course...
The gem I just plucked out is something I've been curious about for a while but never looked up:
CTRL-X e
The shell will take what you've written on the command
line thus far and paste it into the editor specified by
$EDITOR [then run it when saved]
Similar to `fc` except you don't need to run the command before invoking the editor
Also works in vi mode (I think the default for bash is emacs mode - for editing commands, that is):
Run:
set -o vi
once after you log in (for ksh / bash and compatible shells only, maybe, not sure about csh), or (better) put that line in your .bashrc or similar startup file, so it runs each time you log in. (I used to use "ksh -o vi" earlier, before I knew about "set -o vi" or before it existed, but in that case, it has to be the last line in your startup file, otherwise the other lines below will not run until you exit that (sub)shell.)
Then, when typing a command at the command line, just press ESC then v ; it does the same as what you said.
You can also do ESC :q! (in the editor, if it is vi) to quit without running the command you just edited, or save the command to another file for editing later at leisure, then quit without running it right now.
In fact, "set -o vi" also enables limited editing in vi mode right on the command-line, after you press ESC - you can use the command-mode commands of vi (h, l, b, w, f, F, and more) to move around, change characters or words, can also overwrite or append or insert text, etc.
You can even use / and ? and n and N to search backwards and forwards in the command history to find (by substring) a previous command, to edit it. Once you find the right command, just press v.
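For reference, a minimal sketch of that setup (bash shown; bash's default editing mode is indeed emacs):

# in ~/.bashrc (or ~/.kshrc):
set -o vi
# at the prompt: press ESC to enter command mode, then
#   v          open the current line in $EDITOR (it runs on save-and-quit)
#   /string    search backwards through history by substring
#   n / N      repeat the search in either direction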
Bringing this on a small tangent, but I've never been comfortable with 'sudo !!' and I can't really articulate why aside from wanting to be as explicit as possible.
<up>, <ctrl-a>, type sudo and enter is nearly as quick and much more explicit for me.
If you are a long time "vi" user, put `set -o vi` in "~/.bashrc": the sequence to issue the previous command with sudo prepended will be <esc>0isudo<enter>, probably something that is in your muscle memory already.
While I'm a very long time vi user, and I'm completely at home with the keybindings, I could never get into using "set -o vi". For me it falls into the uncanny valley of being quite like vi while not actually being vi.
My zsh is set up (don't know if this is a default or not; I'm using Oh My Zsh) to not run the command if you use any history expansions in a command, instead it'll give you a new prompt line with the substitutions already filled in. That way you can check that the command is correct before running it.
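If anyone wants that behaviour without Oh My Zsh, I believe (not completely sure this is the option it sets) the relevant knob is HIST_VERIFY:

# ~/.zshrc: history expansions (!!, !$, ...) are expanded into the
# editing buffer for review instead of being executed immediately
setopt HIST_VERIFY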
Just look at anything in the "moreutils" package. Just great stuff. Excerpt:
- ifdata: do not parse the output of ifconfig/ip anymore. Just use this tool.
- sponge: when you need to overwrite an input file at the end of a pipe. sponge will wait for the pipe to end before overwriting, preventing any data loss (usage sketch after this list)
- vidir: edit a given dir with your EDITOR. Awesome for mass renames/deletes.
- ts: add timestamps to a command
- parallel: C implementation of GNU parallel (in perl). Very small, very fast, just does the core (running commands in parallel), and does not try to take over xargs.
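A couple of quick usage sketches for sponge and ts (file names and the timestamp format are just illustrative):

# sponge: rewrite a file in place; a plain "> config.ini" would truncate
# the file before grep ever got to read it
grep -v '^#' config.ini | sponge config.ini

# ts: prefix every output line with a timestamp (strftime format)
make 2>&1 | ts '%Y-%m-%d %H:%M:%S'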
Although these tools are great, I'm afraid they're of limited use, since they aren't preinstalled and must be explicitly installed.
For personal use these are great. But if you write a shell script to be executed on different machines where you can't install anything, these tools won't be available to you.
And if you don't have that requirement, i.e. you can install everything, just install appropriate Perl/Python/Ruby libraries and use a proper scripting language.
(BTW, I tend to assume that Python is installed on almost all modern Unix systems by default, hence I prefer writing portable tools in Python instead of portable Shell code.)
Why assume python? It's huge compared to bash or perl. Lots of minimally configured operating systems won't have Python. More likely than go/ruby/node though.
If you're on RH derived systems, you're going to have a python 2.6 installed for rpm and yum to work.
This bit us in the ass with CentOS 6.8, when another package required Python 2.7... If you install 2.7 directly, your system fails with a thousand cuts. A chroot is needed for that, unfortunately.
Installing the python27 software collection puts it in /opt/rh, and you can run commands that need it with "scl enable python27 mything" or by sourcing /opt/rh/python27/enable in a script to set up PATH, etc.
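From memory, the Software Collections route looks roughly like this on CentOS 6 (package names may be slightly off):

# assumption: exact package names from memory, check your repos
yum install centos-release-scl
yum install python27
# run a single command against the collection:
scl enable python27 'python --version'
# or, inside a script, put the collection on PATH:
source /opt/rh/python27/enable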
chroot is unnecessary. You just install python 2.7 in its own directory and run it from there. The system-provided python 2.6 and your desired version of python should co-exist.
As a general rule, never try to replace any program or library supplied by your distribution.
Is that really a problem anymore? Not being able to install things was common in multi-user thin-client type systems, but these days with containerisation and VMs, if you're allowed to execute a script, you're probably allowed to install new packages
It is often possible to install things locally, i.e. not using apt. The idea that everyone has sudo privs is unrealistic. A version of apt that added a per user subset would be very useful.
Nix and Guix allow unprivileged package operations.
I regularly run "one-off" stuff in a `guix environment` that includes all the required packages for that command, without polluting my user (or system) profile.
For example, I recently wanted to transfer a file over HTTP from one machine to another, and did not have the "python" executable in PATH. So I started it in a `guix environment` and "containerized" it (using user namespaces) just for show:
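I don't remember the exact invocation, but it was something along these lines (the module and port are just illustrative):

# one-off containerized HTTP server, nothing installed in my profile:
guix environment --container --network --ad-hoc python -- \
    python3 -m http.server 8000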
This is actually one of the things that made me work on rootless containers (https://rootlesscontaine.rs). It is a pity that everyone assumes you have sudo access to all of the machines you want to run code on.
There are other ways. fakeroot/fakechroot for a poor man's "container" (I cringe inside when I say that). Cross compilation + copying binaries. Or compiling on target with an appropriate ./configure --prefix.
One of my favorite moreutils tools is combine[1]. It allows you to compare the contents of text files with boolean operations. For example, want to know what lines are in file1 but not file2? Use:
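The example seems to have been eaten by the formatting; if memory serves, the combine syntax for that is simply:

# lines present in file1 but not in file2
combine file1 not file2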
Yep, but you'll have to read the man every time to figure out if you need -1, -2 or -3, or a combination of those. combine gives you a better interface with boolean operators you know.
The 1st column is the lines unique to the 1st file, the 2nd column is the lines unique to the 2nd file, and the 3rd one is the lines that are not unique. -1, -2, -3 allow to disable those columns. Pretty easy to remember.
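For comparison, the comm spelling of the "in file1 but not file2" case (comm wants sorted input):

# -2 and -3 suppress columns 2 and 3, leaving only lines unique to file1
comm -23 <(sort file1) <(sort file2)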
Ah, HN downvoting neutral comments again and I don't like it, but I have a policy of not upvoting anything out of pity.
* 'comm -3' is a quick way to do a set difference from the command line.
* Newer versions of sort have an option for running sorts in parallel. If you're using an older sort, you can split the files, sort them individually with GNU parallel, and use --merge to combine them (rough sketch at the end of this list).
* If you have scripts that read data files, process them, and output more files, consider using a Makefile.
* tmux is a good way to leave a development session on a server that you come back to at the start of the day, or that you run long-running processes in.
* Setting a soft ulimit system-wide is a good way to avoid accidentally running the machine out of memory, causing your C libraries to be paged out to disk, and usually requiring a reboot. Since it's soft, you can override it at any time.
* strace and gdb can be run on just about anything to see what it's doing. If it's an interpreted language, you can often attach to a live process, call a function to invoke some interpreted code, and not have to kill & restart the process to fix a bug or oversight. Or you can attach to a Postgres worker process to get a stack trace, which often tells you why your query is slow if it runs so long you can't get an explain analyze.
* Look in /proc to find tons of useful information about running processes and settings. For example, you can tell how many filehandles a process has open, and sometimes what file it corresponds to. Just yesterday, a grub update caused a server to boot the kernel with the command line replaced with 2 random characters and a newline, which would be harder to debug without /proc/cmdline.
* pushd and popd are useful in shell scripts to temporarily change directories without forgetting where you were.
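Regarding the sort bullet above, a rough sketch (file names, sizes and chunk counts are arbitrary):

# newer GNU sort can parallelise internally:
sort --parallel=8 -S 2G big.txt > big.sorted
# older sort: split, sort the chunks with GNU parallel, then merge:
split -l 1000000 big.txt chunk.
parallel 'sort {} > {}.sorted' ::: chunk.*
sort -m chunk.*.sorted > big.sorted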
> If you have scripts that read data files, process them, and output more files, consider using a Makefile.
Dear sweet hypnotoad, don't use Makefiles for this. Scripts in a pipeline are perfectly well suited for ETL. They have the advantage of using the same language as the command language (Makefile is not shell, and when it differs it's a significant surprise). Plus, you can drop a script into a dir in your PATH and it will work in any project.
Use make if you have a project which expensively computes assets and reusing partial outputs is possible.
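To make the disagreement concrete, the kind of Makefile in question looks roughly like this (script and file names invented):

# make only re-runs a step when its inputs are newer than its output;
# note that recipe lines must be indented with a literal tab
report.html: clean.csv
	./report.sh clean.csv > report.html

clean.csv: raw.csv
	./clean.sh raw.csv > clean.csv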
> pushd and popd are useful in shell scripts to temporarily change directories without forgetting where you were.
I would start with
cd - # go back to where I just came from
Which handles most of the pushd/popd use cases (outside of writing a script).
Really, there's a whole separate set of skills for effective interactive shell use & writing great bash scripts (e.g. parameter expansion and history manipulation).
is that an advantage? Do you have the time to explain this a bit more?
I feel that new users need less expressiveness, to avoid decision overload, and keeping one automatic directory save point is easier to mentally manage than a stack of them.
I do recommend using pushd/popd in shell scripts (always) and interactively (if you must), but I think 'cd -' should be the first thing you introduce to newcomers w.r.t tracking working directory changes.
'cd -' only saves one previous directory, which is held in $OLDPWD. pushd can store an arbitrary history in its stack. And of course, $OLDPWD will change if you hop around after a cd, but the pushd/popd stack will persist.
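A quick illustration of the difference:

cd /tmp; cd /etc
cd -            # back to /tmp ($OLDPWD), but that's all the history you get

pushd /etc      # save the current dir on the stack, cd to /etc
pushd /var/log  # stack is now: /var/log /etc <where you started>
dirs -v         # print the stack
popd            # back to /etc
popd            # back to where you started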
That's true, but I rather would have liked to hear how the expressiveness of pushd/popd was superior to cd - (especially with respect to a newcomer internalizing all the weird coreutils & bash things).
Speaking for myself, I'm often using pushd & popd precisely because I expect to potentially be moving around in the new directory too, and cd - will just take me to the previous dir. pushd & popd don't result in me having even a bit of stress about typing another cd command. Your mileage will vary.
That said, I don't disagree that cd - is a good introduction to the idea that there's more to the cd command than meets the eye. (I'm also a big fan of the CDPATH variable, despite its issues.)
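For anyone who hasn't met it, CDPATH is just a colon-separated list of parent directories that cd searches (the directories here are made up):

# in ~/.bashrc
export CDPATH=.:~/projects:~/src
cd myrepo    # works from anywhere if ~/projects/myrepo exists;
             # bash prints the directory it actually resolved to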
Since the article mentions numfmt(1), I'm surprised that no one mentioned units(1) yet. I use it all the time to convert units.
$ units '4123412312312 bytes' 'tebibytes'
4123412312312 bytes = 3.7502217 tebibytes
4123412312312 bytes = (1 / 0.26665091) tebibytes
$ units "50 miles per gallon" "liters per 100 kilometers"
reciprocal conversion
1 / 50 miles per gallon = 4.7042917 liters per 100 kilometers
1 / 50 miles per gallon = (1 / 0.21257185) liters per 100 kilometers
Tools like rdfind and rmlint breathed new life into my SSD by hardlinking duplicate files after video editing software helpfully duplicated huge videos on import.
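If memory serves, the rdfind invocation for that is along these lines (do a dry run first; flags from memory):

rdfind -dryrun true -makehardlinks true ~/videos   # report only
rdfind -makehardlinks true ~/videos                # replace dupes with hardlinks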
Even as a decades long 'nix user, I had never used join until recently, which joins lines of two files on a common field. I used to turn to fgrep or a Python script to solve a whole class of problems join can help solve on the command-line.
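A tiny sketch, since join trips people up by requiring sorted input (file names and field numbers are invented):

# join field 1 of ids.tsv against field 2 of users.tsv
join -1 1 -2 2 <(sort -k1,1 ids.tsv) <(sort -k2,2 users.tsv)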
Mgen and Drec. No idea if they have been included in any modern distro, or whether there are more modern equivalents; anyway, over 15 years ago I used them on Alpha hardware under Linux, BSD and Tru64 to test a local network for missing UDP packets now and then in a strange regular pattern. By missing packets now and then I mean figures like 5 packets every 3 million at full speed; it wasn't something one could hope to reproduce just by pinging around, therefore we needed something to leave running during night hours, writing huge logs.
Turned out that it was an interrupt triggered by the video subsystem on the Alpha machines, only under Tru64, that grabbed 100% of the CPU for a very short time, but enough to make the receiving task miss a few packets. Once I identified the problem, I was able to reproduce and predict it with 100% accuracy.
`cal` does try to use the expected first day of the week based on the locale. It seems to at least use LC_TIME from some quick local testing:
bratch@serenity ~ $ LC_TIME=en_GB cal | head -n2
May 2017
Mo Tu We Th Fr Sa Su
bratch@serenity ~ $ LC_TIME=en_US cal | head -n2
May 2017
Su Mo Tu We Th Fr Sa
A large proportion (if not the majority) of the world starts their calendars on Monday. While the ordering is arbitrary, I always thought of the weekend as a single block of days, so it's odd to say that the week ends in the middle of the weekend (rather than at the end of the weekend).
I always saw "weekend" in the same way as "book end", that is, that one is at the beginning of the week (one starting end), and one is at the conclusion of the week (the finishing end). Because our weeks come successively, the end of one is immediately followed by the beginning of the next, hence the "block" called the "weekend".
No. Ext4 doesn't do data journaling by default. Even when enabled, what's written to the journal is the new data that's about to be written (in this case the zeros), not the file's current contents.
1) write to journal "I'm going to overwrite this file with this data (zeros)"
2) commit journal
3) write data to file
This is typical for journaling filesystems-- step 3 can be interrupted by a crash and replayed later (by re-reading the journal).
For filesystems with CoW data (ZFS, btrfs), the in-place data will probably not be overwritten.
Before evaluating the claims myself: The shred manual specifically claims that it is "not guaranteed to be effective" on "log-structured or journaled file systems", and specifically calls out ext3 in data=journal mode.
I would assume that the concern with ext3/4 in data=journal mode is that shred does not guarantee that the records of previous writes are evicted from the journal.
In data=journal mode, data to be written is first written into the journal. Only after the journal is flushed will it be written out to the correct location. Therefore, a crash at any time is fixed by replaying the journal forwards.
Note that the ext3/4 journal is a redo log, not an undo log. Old file contents are not copied into the journal on a write.
Thus, I don't see why shred should be less effective in data=journal mode compared to the other journaling modes.
CoW file systems are a different story. They don't allow you to overwrite physical file contents. You have to set the +C (NOCOW) flag, which is, by design, only effective for a file that does not have any contents yet. Thus, you can't set +C on an existing file and overwrite its contents.
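On btrfs that looks roughly like this (the file name is just an example):

touch vm.img
chattr +C vm.img                 # NOCOW only sticks while the file is still empty
lsattr vm.img                    # the 'C' attribute should now show up
dd if=/dev/zero of=vm.img bs=1M count=1024   # subsequent writes go in place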
> Thus, I don't see why shred should be less effective in data=journal mode compared to the other journaling modes.
Because with data=journal, content that was previously written to a file (and made its way through the journal) might still be in there if the journal has not been replayed or garbage-collected in a while.
That's a good point I overlooked. The journal is not that big, though (< 1 GB), so doing some extra I/O after the shred should get rid of that. And when the original contents are older, it's rather unlikely they're still in the journal.
There is no reason for blocks that have been fully rewritten to be written back to the same location. In fact, it is faster to write them somewhere convenient near the write head and update the indirect block. So even though only the meta data goes through the log, block locations can change.
I don't know that I'd say there's no reason at all.
For one thing, if you have a contiguous file and you update some (but not all) bytes, putting them back in the original location allows the file to stay contiguous.
Also, if you write the data back into the original location, you don't have to update metadata such as inodes. Now, you may say that's less data, but on a spinning disc, there is some threshold below which the amount of data written doesn't matter much at all and it's the number of seeks that matters more. That is, if it's a choice between a single 50k continuous write or two separate 1k writes in different locations, the single write is probably quicker. (But this falls apart eventually of course.)
Whether these reasons are enough to prefer updating in place is another question, of course. But it's not like there isn't any benefit at all.
And IIRC, on SSDs you can't really overwrite a file with random bytes. Apple removed the "safe erasure" option from Finder because they couldn't guarantee deletion.
You can't overwrite the reserve directly though. You have to write non-zero data to 99% to lock it, then destructively update the last 1% many times to cycle through the reserve.
Many disks have a full erase command, which asks the internal controller to do the erase.
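On SATA drives that's the ATA Secure Erase feature, reachable through hdparm; nvme-cli has an equivalent for NVMe. It wipes the whole drive, and the device names below are placeholders:

hdparm -I /dev/sdX | grep -i frozen              # must not report "frozen"
hdparm --user-master u --security-set-pass p /dev/sdX
hdparm --user-master u --security-erase p /dev/sdX
# NVMe equivalent (secure-erase setting):
nvme format /dev/nvme0n1 --ses=1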
TIL about pup(1) [1], what a fantastic tool to grab some data from an HTML page. Written in Go too, so easy to deploy. That one's going in my toolchest.
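For example, something as simple as this (the selector is just for illustration):

# grab the <title> text of a page
curl -s https://example.com/ | pup 'title text{}'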
With shell one can just use "${path_var##*/}" to get the base name from the string stored in $path_var. This also works if $path_var is just a base name already without any slashes.
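For completeness, the dirname counterpart uses % instead of ## (the path is just an example):

path_var=/usr/local/bin/python3
echo "${path_var##*/}"   # python3          (like basename)
echo "${path_var%/*}"    # /usr/local/bin   (like dirname)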
My problem with apropos (and man, for that matter) is related to how at least one major distribution handles the man pages. Rather than bundling the man pages with the applications, the man pages for many applications are bundled together in one base package. When this is done regardless of whether or not the application is actually on the system, apropos and man continually return information on applications which are not on the system.
In my opinion it sucks. It is possible that I suck, but I almost never can find what I'm searching for with it.
Now I found that full text search is man -K, but it searches through sources so it can also be quite useless. Is there a desktop full text search for man pages? It should use rendered man pages. It could be a fun project. Google doesn't count and I use it already.
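For what it's worth, the two stock options and their limitations:

apropos rename            # searches only the name/short-description index
apropos -s 1 rename       # restrict to section 1 (man-db)
man -K journal_data       # slow full-text grep through the unrendered sources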
Although I frequently use cal, the article mentioned cal -3 which will be useful near the end of each month. Previously I always displayed the calendar for the whole year or next month instead.
Going through the coreutils manual [1], I found a couple other useful-looking commands I didn't know about. Previously I used awk in bash scripts primarily for printf, but apparently printf is a standalone coreutils command too. The timeout command also looks useful.
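A couple of hedged timeout examples (the URL and script are invented):

# give the command 5 seconds, then send SIGTERM (exit status 124 on timeout)
timeout 5s curl -s https://example.com/ > page.html
# escalate to SIGKILL for processes that ignore SIGTERM
timeout --signal=KILL 1m ./flaky-batch-job.sh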
I like how this article's title doesn't claim that it's about utils "that you don't know". Yet I was expecting utils that have something to do with the Linux kernel, not generic *nix command-line tools.
afaik, `shred` doesn't do anything useful on SSDs. Beneath the OS, there's intentional indeterminacy on sector access, so instructions to write over the same sectors multiple times with garbage aren't guaranteed to actually do that.
The reason shred writes "multiple times with garbage" is due to magnetic disks where even when overwritten, old bit values could still be extracted with advanced tools. This is not the case for SSDs, so a single pass of shred (either random or zero) is enough here. Multiple passes will just kill your SSD faster.
The problem with sector access still stands, so you're not sure it's actually overwritten. You should use sfill(1) in addition to shred. sdmem(1) also, if you don't want to physically power down your machine.
>The reason shred writes "multiple times with garbage" is due to magnetic disks where even when overwritten, old bit values could still be extracted with advanced tools.
People thought that you needed multiple overwrites for traditional drives, but in reality that data was gone after a single overwrite of 0. There were no "advanced techniques" that could recover the data. Some specs suggested multiple passes, but this was because they were applying the precautionary principle.
> This is not the case for SSDs, so a single pass of shred (either random or zero) is enough here. Multiple passes will just kill your SSD faster.
SSDs do weird things with the data, so if your data is important enough you should destroy the drives. There's some data tucked away in odd places.
Hum, no. Some time ago you needed multiple uncorrelated overwrites to hide your data. Then as HDDs shrank you started needing fewer and fewer passes, up to today, when you just need to clear it once.
It's not about people thinking unrealistic things. The command was created for defending against a demonstrated attack.
> Some time ago you needed multiple uncorrelated overwrites to hide your data.
You really didn't. It's a really persistent myth that there's some secret technique to recover overwritten data. For PC hard drives "blobby bits" has never been a thing. It might have been a thing on 1970s style 24" disc platters, but we're not talking about those.
No one has ever recovered data from a drive that's had a single overwrite of zeros - no software claims to be able to do this, no recovery service claims to be able to do this, there are no published papers that claim to be able to do this (there's one that gets an accuracy of 50%-55% per bit, ie useless).
Since the early 1980s a single overwrite of 0 has been sufficient to destroy data.
And if you're worried about a well funded government agency that has the money to spend on exotic techniques you can do a single pass with random data, or NIST 7 passes, or you destroy the drive.
> afaik, `shred` doesn't do anything useful on SSDs
It does effectively shred your SSD. In that it will fail sooner and sooner the more you run that command.
SSDs have a sector erase command. I don't know how it's exposed on the CLI, but it exists and will effectively erase anything that was removed. Most of them will run it automatically in the background after you delete stuff, so it's normally just a matter of keeping it powered for a while.
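The delete-and-let-the-drive-clean-up part is TRIM/discard; from userspace the usual knobs are (device names are placeholders):

fstrim -v /                      # tell the SSD which filesystem blocks are free
systemctl enable fstrim.timer    # many distros ship a weekly fstrim timer
blkdiscard /dev/sdX              # discard an entire device -- destroys all data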
It shouldn't. It's 2017 now, and the overhead of a "misplaced" cat command is normally negligible. The speed of the terminal user is more important.
If you are just doing one line and you write it perfectly, then maybe. For many people cat is simply the start of excessive piping. I like having the filename as close to the beginning as possible so I can Ctrl-W with ease.
Using `cat` is wasteful in scripts but not in command line. It separates filenames, which are often actually glob patterns taking some time to confirm, from the parameters, which also often take some time to look up in the manual. `cat` and a pipe make it easier to edit in the command line.
Just tried it again and you're right, it does produce an error; I was sanitizing a real error I just had, though. With the real one I had an unescaped space: cat mentioned that it couldn't open "both" files, while the redirect just gave me a message about an "ambiguous redirect".
That's not a great example, as `pgrep` is a little more nuanced than running `ps [options] | grep [string]`, e.g. you cannot use many of the same flags with `pgrep` that you can with `ps`.
At least with "in appropriate use of cat" (as some call it) you're literally just swapping the stdin file stream with a disk io file stream so there's no functional difference what-so-ever.
I'm not saying I agree with the GP either, though, as most of the time complaints about "inappropriate use of cat" are just showboating. Using `cat` "inappropriately" is arguably more readable for less seasoned shell script developers and it's certainly a more logical program flow for a human to parse, ie "open file, grep for contents, do something else, etc". But it's still sometimes worth a reminder that many string processing tools can accept file input directly without the need for piping it via stdin (or the files can be redirected directly from the shell via the less-than token, `<`).
Indeed, but as I said, the cat usage being complained about tends to be a practice mostly of those less experienced in the command line, so they are probably unaware of `<`, let alone its placement.
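For the record, the variants being debated (the log file name is just an example):

cat access.log | grep ' 404 '    # the much-maligned extra cat
grep ' 404 ' access.log          # grep opens the file itself
grep ' 404 ' < access.log        # the shell opens the file on stdin
< access.log grep ' 404 '        # redirections don't have to come last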
As much as people make fun of him for caring so much about a seemingly minor point, I've come around to understanding RMS's frustration about people calling all of the GNU project "Linux". This sort of title drives home why: neither Linus nor anyone else on the Linux project had anything to do with any of this.
If someone wrote an article titled "Windows utils that you might not know," I would not expect its content to be solely applications written by Microsoft.
I think a better title would probably be "Common Unix utils that you might not know" or "GNU Coreutils that you might not know" (as some of the things that he is using seem to be coreutils-only?)
It's to keep the page weight balanced. Prose is considered to have neutral buoyancy, but code is naturally heavier and will tend to drag the page down if that isn't compensated by a lighter font. FWIW poetry can be so light, that it can require a very heavy font weight to keep it tethered to the page.
WolframAlpha lists the factors as 2^38 * 5^38, which is just a compact notation for 38 2s and 38 5s, totalling 76 factors. If you count the 2s and 5s in the output of factor, you get the same result of 38 each for a total of 76 factors.
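Same idea at a smaller scale, since factor prints each prime factor once per occurrence:

$ factor 1000
1000: 2 2 2 5 5 5

Three 2s and three 5s, i.e. 2^3 * 5^3 = 1000.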
The output of the error is not using IEC units -- YB not YiB (IEC units have a lowercase "i" when written). So it's referring to yottabytes not yobibytes.
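numfmt (mentioned in the article) makes the distinction explicit:

$ numfmt --to=si 1000000000
1.0G
$ numfmt --to=iec 1073741824
1.0G
$ numfmt --to=iec-i 1073741824
1.0Gi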
Especially looking down the list of all time greats: http://www.commandlinefu.com/commands/browse/sort-by-votes