Thanks for the second link. I go there every few weeks but never thought to check out the top voted.
...In hindsight, of course...
The gem I just plucked out is something I've been curious about for a while but never looked up:
CTRL-X e
The shell will take what you've written on the command
line thus far and paste it into the editor specified by
$EDITOR [then run it when saved]
Similar to `fc` except you don't need to run the command before invoking the editor
Also works in vi mode (I think the default for bash is emacs mode - for editing commands, that is):
Run:
set -o vi
once after you log in (for ksh / bash and compatible shells only, maybe, not sure about csh), or (better) put that line in your .bashrc or similar startup file, so it runs each time you log in. (I used to use "ksh -o vi" earlier, before I knew about "set -o vi" or before it existed, but in that case, it has to be the last line in your startup file, otherwise the other lines below will not run until you exit that (sub)shell.)
Then, when typing a command at the command line, just press ESC then v ; it does the same as what you said.
You can also do ESC :q! (in the editor, if it is vi) to quit without running the command you just edited, or save the command to another file for editing later at leisure, then quit without running it right now.
In fact, "set -o vi" also enables limited editing in vi mode right on the command-line, after you press ESC - you can use the command-mode commands of vi (h, l, b, w, f, F, and more) to move around, change characters or words, can also overwrite or append or insert text, etc.
You can even use / and ? and n and N to search backwards and forwards in the command history to find (by substring) a previous command, to edit it. Once you find the right command, just press v.
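For reference, a minimal sketch of that setup (bash shown; bash's default editing mode is indeed emacs):

# in ~/.bashrc (or ~/.kshrc):
set -o vi
# at the prompt: press ESC to enter command mode, then
#   v          open the current line in $EDITOR (it runs on save-and-quit)
#   /string    search backwards through history by substring
#   n / N      repeat the search in either direction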
Bringing this on a small tangent, but I've never been comfortable with 'sudo !!' and I can't really articulate why aside from wanting to be as explicit as possible.
<up>, <ctrl-a>, type sudo and enter is nearly as quick and much more explicit for me.
If you are a long time "vi" user, put `set -o vi` in "~/.bashrc": the sequence to issue the previous command with sudo prepended will be <esc>0isudo<enter>, probably something that is in your muscle memory already.
While I'm a very long time vi user, and I'm completely at home with the keybindings, I could never get into using "set -o vi". For me it falls into the uncanny valley of being quite like vi while not actually being vi.
My zsh is set up (don't know if this is a default or not; I'm using Oh My Zsh) to not run the command if you use any history expansions in a command, instead it'll give you a new prompt line with the substitutions already filled in. That way you can check that the command is correct before running it.
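If anyone wants that behaviour without Oh My Zsh, I believe (not completely sure this is the option it sets) the relevant knob is HIST_VERIFY:

# ~/.zshrc: history expansions (!!, !$, ...) are expanded into the
# editing buffer for review instead of being executed immediately
setopt HIST_VERIFY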
Just look at anything in the "moreutils" package. Just great stuff. Excerpt:
- ifdata: do not parse the output of ifconfig/ip anymore. Just use this tool.
- sponge: when you need to overwrite an input file at the end of a pipe. sponge will wait for the pipe to end before overwriting, preventing any data loss (usage sketch after this list)
- vidir: edit a given dir with your EDITOR. Awesome for mass renames/deletes.
- ts: add timestamps to a command
- parallel: C implementation of GNU parallel (in perl). Very small, very fast, just does the core (running commands in parallel), and does not try to take over xargs.
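A couple of quick usage sketches for sponge and ts (file names and the timestamp format are just illustrative):

# sponge: rewrite a file in place; a plain "> config.ini" would truncate
# the file before grep ever got to read it
grep -v '^#' config.ini | sponge config.ini

# ts: prefix every output line with a timestamp (strftime format)
make 2>&1 | ts '%Y-%m-%d %H:%M:%S'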
Although these tools are great, I'm afraid they're of limited use, since they aren't preinstalled and must be explicitly installed.
For personal use these are great. But if you write a shell script to be executed on different machines where you can't install anything, these tools won't be available to you.
And if you don't have that requirement, i.e. you can install everything, just install appropriate Perl/Python/Ruby libraries and use a proper scripting language.
(BTW, I tend to assume that Python is installed on almost all modern Unix systems by default, hence I prefer writing portable tools in Python instead of portable Shell code.)
Why assume python? It's huge compared to bash or perl. Lots of minimally configured operating systems won't have Python. More likely than go/ruby/node though.
If you're on RH derived systems, you're going to have a python 2.6 installed for rpm and yum to work.
This bit us in the ass with CentOS 6.8, when another package required Python 2.7... If you install 2.7 directly, your system fails with a thousand cuts. A chroot is needed for that, unfortunately.
Installing the python27 software collection puts it in /opt/rh, and you can run commands that need it with "scl enable python27 mything" or by sourcing /opt/rh/python27/enable in a script to set up PATH, etc.
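From memory, the Software Collections route looks roughly like this on CentOS 6 (package names may be slightly off):

# assumption: exact package names from memory, check your repos
yum install centos-release-scl
yum install python27
# run a single command against the collection:
scl enable python27 'python --version'
# or, inside a script, put the collection on PATH:
source /opt/rh/python27/enable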
chroot is unnecessary. You just install python 2.7 in its own directory and run it from there. The system-provided python 2.6 and your desired version of python should co-exist.
As a general rule, never try to replace any program or library supplied by your distribution.
Is that really a problem anymore? Not being able to install things was common in multi-user thin-client type systems, but these days with containerisation and VMs, if you're allowed to execute a script, you're probably allowed to install new packages
It is often possible to install things locally, i.e. not using apt. The idea that everyone has sudo privs is unrealistic. A version of apt that added a per user subset would be very useful.
Nix and Guix allow unprivileged package operations.
I regularly run "one-off" stuff in a `guix environment` that includes all the required packages for that command, without polluting my user (or system) profile.
For example, I recently wanted to transfer a file over HTTP from one machine to another, and did not have the "python" executable in PATH. So I started it in a `guix environment` and "containerized" it (using user namespaces) just for show:
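I don't remember the exact invocation, but it was something along these lines (the module and port are just illustrative):

# one-off containerized HTTP server, nothing installed in my profile:
guix environment --container --network --ad-hoc python -- \
    python3 -m http.server 8000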
This is actually one of the things that made me work on rootless containers (https://rootlesscontaine.rs). It is a pity that everyone assumes you have sudo access to all of the machines you want to run code on.
There are other ways. fakeroot/fakechroot for a poor man's "container" (I cringe inside when I say that). Cross compilation + copying binaries. Or compiling on target with an appropriate ./configure --prefix.
One of my favorite moreutils tools is combine[1]. It allows you to compare the contents of text files with boolean operations. For example, want to know what lines are in file1 but not file2? Use:
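The example seems to have been eaten by the formatting; if memory serves, the combine syntax for that is simply:

# lines present in file1 but not in file2
combine file1 not file2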
Yep, but you'll have to read the man every time to figure out if you need -1, -2 or -3, or a combination of those. combine gives you a better interface with boolean operators you know.
The 1st column is the lines unique to the 1st file, the 2nd column is the lines unique to the 2nd file, and the 3rd one is the lines that are not unique. -1, -2, -3 allow to disable those columns. Pretty easy to remember.
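For comparison, the comm spelling of the "in file1 but not file2" case (comm wants sorted input):

# -2 and -3 suppress columns 2 and 3, leaving only lines unique to file1
comm -23 <(sort file1) <(sort file2)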
Ah, HN downvoting neutral comments again and I don't like it, but I have a policy of not upvoting anything out of pity.
* 'comm -3' is a quick way to do a set difference from the command line.
* Newer versions of sort have an option for running sorts in parallel. If you're using an older sort, you can split the files, sort them individually with GNU parallel, and use --merge to combine them (rough sketch at the end of this list).
* If you have scripts that read data files, process them, and output more files, consider using a Makefile.
* tmux is a good way to leave a development session on a server that you come back to at the start of the day, or that you run long-running processes in.
* Setting a soft ulimit system-wide is a good way to avoid accidentally running the machine out of memory, causing your C libraries to be paged out to disk, and usually requiring a reboot. Since it's soft, you can override it at any time.
* strace and gdb can be run on just about anything to see what it's doing. If it's an interpreted language, you can often attach to a live process, call a function to invoke some interpreted code, and not have to kill & restart the process to fix a bug or oversight. Or you can attach to a Postgres worker process to get a stack trace, which often tells you why your query is slow if it runs so long you can't get an explain analyze.
* Look in /proc to find tons of useful information about running processes and settings. For example, you can tell how many filehandles a process has open, and sometimes what file it corresponds to. Just yesterday, a grub update caused a server to boot the kernel with the command line replaced with 2 random characters and a newline, which would be harder to debug without /proc/cmdline.
* pushd and popd are useful in shell scripts to temporarily change directories without forgetting where you were.
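Regarding the sort bullet above, a rough sketch (file names, sizes and chunk counts are arbitrary):

# newer GNU sort can parallelise internally:
sort --parallel=8 -S 2G big.txt > big.sorted
# older sort: split, sort the chunks with GNU parallel, then merge:
split -l 1000000 big.txt chunk.
parallel 'sort {} > {}.sorted' ::: chunk.*
sort -m chunk.*.sorted > big.sorted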
> If you have scripts that read data files, process them, and output more files, consider using a Makefile.
Dear sweet hypnotoad, don't use Makefiles for this. Scripts in a pipeline are perfectly well suited for ETL. They have the advantage of using the same language as the command language (Makefile is not shell, and when it differs it's a significant surprise). Plus, you can drop a script into a dir in your PATH and it will work in any project.
Use make if you have a project which expensively computes assets and reusing partial outputs is possible.
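To make the disagreement concrete, the kind of Makefile in question looks roughly like this (script and file names invented):

# make only re-runs a step when its inputs are newer than its output;
# note that recipe lines must be indented with a literal tab
report.html: clean.csv
	./report.sh clean.csv > report.html

clean.csv: raw.csv
	./clean.sh raw.csv > clean.csv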
> pushd and popd are useful in shell scripts to temporarily change directories without forgetting where you were.
I would start with
cd - # go back to where I just came from
Which handles most of the pushd/popd use cases (outside of writing a script).
Really, there's a whole separate set of skills for effective interactive shell use & writing great bash scripts (e.g. parameter expansion and history manipulation).
is that an advantage? Do you have the time to explain this a bit more?
I feel that new users need less expressiveness, to avoid decision overload, and keeping one automatic directory save point is easier to mentally manage than a stack of them.
I do recommend using pushd/popd in shell scripts (always) and interactively (if you must), but I think 'cd -' should be the first thing you introduce to newcomers w.r.t tracking working directory changes.
'cd -' only saves one previous directory, which is held in $OLDPWD. pushd can store an arbitrary history in its stack. And of course, $OLDPWD will change if you hop around after a cd, but the pushd/popd stack will persist.
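A quick illustration of the difference:

cd /tmp; cd /etc
cd -            # back to /tmp ($OLDPWD), but that's all the history you get

pushd /etc      # save the current dir on the stack, cd to /etc
pushd /var/log  # stack is now: /var/log /etc <where you started>
dirs -v         # print the stack
popd            # back to /etc
popd            # back to where you started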
That's true, but I rather would have liked to hear how the expressiveness of pushd/popd was superior to cd - (especially with respect to a newcomer internalizing all the weird coreutils & bash things).
Speaking for myself, I'm often using pushd & popd precisely because I expect to potentially be moving around in the new directory too, and cd - will just take me to the previous dir. pushd & popd don't result in me having even a bit of stress about typing another cd command. Your mileage will vary.
That said, I don't disagree that cd - is a good introduction to the idea that there's more to the cd command than meets the eye. (I'm also a big fan of the CDPATH variable, despite its issues.)
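For anyone who hasn't met it, CDPATH is just a colon-separated list of parent directories that cd searches (the directories here are made up):

# in ~/.bashrc
export CDPATH=.:~/projects:~/src
cd myrepo    # works from anywhere if ~/projects/myrepo exists;
             # bash prints the directory it actually resolved to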
Since the article mentions numfmt(1), I'm surprised that no one mentioned units(1) yet. I use it all the time to convert units.
$ units '4123412312312 bytes' 'tebibytes'
4123412312312 bytes = 3.7502217 tebibytes
4123412312312 bytes = (1 / 0.26665091) tebibytes
$ units "50 miles per gallon" "liters per 100 kilometers"
reciprocal conversion
1 / 50 miles per gallon = 4.7042917 liters per 100 kilometers
1 / 50 miles per gallon = (1 / 0.21257185) liters per 100 kilometers
Tools like rdfind and rmlint breathed new life into my SSD by hardlinking duplicate files after video editing software helpfully duplicated huge videos on import.
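If memory serves, the rdfind invocation for that is along these lines (do a dry run first; flags from memory):

rdfind -dryrun true -makehardlinks true ~/videos   # report only
rdfind -makehardlinks true ~/videos                # replace dupes with hardlinks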
Even as a decades long 'nix user, I had never used join until recently, which joins lines of two files on a common field. I used to turn to fgrep or a Python script to solve a whole class of problems join can help solve on the command-line.
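A tiny sketch, since join trips people up by requiring sorted input (file names and field numbers are invented):

# join field 1 of ids.tsv against field 2 of users.tsv
join -1 1 -2 2 <(sort -k1,1 ids.tsv) <(sort -k2,2 users.tsv)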
Mgen and Drec. No idea if they have been included in any modern distro, or whether there are more modern equivalents; anyway, over 15 years ago I used them on Alpha hardware under Linux, BSD and Tru64 to test a local network for missing UDP packets now and then in a strange regular pattern. By missing packets now and then I mean figures like 5 packets every 3 million at full speed; it wasn't something one could hope to reproduce just by pinging around, therefore we needed something to leave running during night hours, writing huge logs.
Turned out that it was an interrupt triggered by the video subsystem on the Alpha machines, only under Tru64, that grabbed 100% of the CPU for a very short time, but enough to make the receiving task miss a few packets. Once I identified the problem, I was able to reproduce and predict it with 100% accuracy.
`cal` does try to use the expected first day of the week based on the locale. It seems to at least use LC_TIME from some quick local testing:
bratch@serenity ~ $ LC_TIME=en_GB cal | head -n2
May 2017
Mo Tu We Th Fr Sa Su
bratch@serenity ~ $ LC_TIME=en_US cal | head -n2
May 2017
Su Mo Tu We Th Fr Sa
A large proportion (if not the majority) of the world starts their calendars on Monday. While the ordering is arbitrary, I always thought of the weekend as a single block of days, so it's odd to say that the week ends in the middle of the weekend (rather than at the end of the weekend).
I always saw "weekend" in the same way as "book end", that is, that one is at the beginning of the week (one starting end), and one is at the conclusion of the week (the finishing end). Because our weeks come successively, the end of one is immediately followed by the beginning of the next, hence the "block" called the "weekend".
No. Ext4 doesn't do data journaling by default. Even when enabled, what's written to the journal is the new data that's about to be written (in this case the zeros), not the file's current contents.
1) write to journal "I'm going to overwrite this file with this data (zeros)"
2) commit journal
3) write data to file
This is typical for journaling filesystems-- step 3 can be interrupted by a crash and replayed later (by re-reading the journal).
For filesystems with CoW data (ZFS, btrfs), the in-place data will probably not be overwritten.
Before evaluating the claims myself: The shred manual specifically claims that it is "not guaranteed to be effective" on "log-structured or journaled file systems", and specifically calls out ext3 in data=journal mode.
I would assume that the concern with ext3/4 in data=journal mode is that shred does not guarantee that the records of previous writes are evicted from the journal.
In data=journal mode, data to be written is first written into the journal. Only after the journal is flushed will it be written out to the correct location. Therefore, a crash at any time is fixed by replaying the journal forwards.
Note that the ext3/4 journal is a redo log, not an undo log. Old file contents are not copied into the journal on a write.
Thus, I don't see why shred should be less effective in data=journal mode compared to the other journaling modes.
CoW file systems are a different story. They don't allow you to overwrite physical file contents. You have to set the +C (NOCOW) flag, which is, by design, only effective for a file that does not have any contents yet. Thus, you can't set +C on an existing file and overwrite its contents.
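On btrfs that looks roughly like this (the file name is just an example):

touch vm.img
chattr +C vm.img                 # NOCOW only sticks while the file is still empty
lsattr vm.img                    # the 'C' attribute should now show up
dd if=/dev/zero of=vm.img bs=1M count=1024   # subsequent writes go in place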
> Thus, I don't see why shred should be less effective in data=journal mode compared to the other journaling modes.
Because with data=journal, content that was previously written to a file (and made its way through the journal) might still be in there if the journal has not been replayed or garbage-collected in a while.
That's a good point I overlooked. The journal is not that big, though (< 1 GB), so doing some extra I/O after the shred should get rid of that. And when the original contents are older, it's rather unlikely they're still in the journal.
There is no reason for blocks that have been fully rewritten to be written back to the same location. In fact, it is faster to write them somewhere convenient near the write head and update the indirect block. So even though only the meta data goes through the log, block locations can change.
I don't know that I'd say there's no reason at all.
For one thing, if you have a contiguous file and you update some (but not all) bytes, putting them back in the original location allows the file to stay contiguous.
Also, if you write the data back into the original location, you don't have to update metadata such as inodes. Now, you may say that's less data, but on a spinning disc, there is some threshold below which the amount of data written doesn't matter much at all and it's the number of seeks that matters more. That is, if it's a choice between a single 50k continuous write or two separate 1k writes in different locations, the single write is probably quicker. (But this falls apart eventually of course.)
Whether these reasons are enough to prefer updating in place is another question, of course. But it's not like there isn't any benefit at all.
And IIRC, on SSDs you can't really overwrite a file with random bytes. Apple removed the "safe erasure" option from Finder because they couldn't guarantee deletion.
You can't overwrite the reserve directly though. You have to write non-zero data to 99% to lock it, then destructively update the last 1% many times to cycle through the reserve.
Many disks have a full erase command, which asks the internal controller to do the erase.
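On SATA drives that's the ATA Secure Erase feature, reachable through hdparm; nvme-cli has an equivalent for NVMe. It wipes the whole drive, and the device names below are placeholders:

hdparm -I /dev/sdX | grep -i frozen              # must not report "frozen"
hdparm --user-master u --security-set-pass p /dev/sdX
hdparm --user-master u --security-erase p /dev/sdX
# NVMe equivalent (secure-erase setting):
nvme format /dev/nvme0n1 --ses=1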
TIL about pup(1) [1], what a fantastic tool to grab some data from an HTML page. Written in Go too, so easy to deploy. That one's going in my toolchest.
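For example, something as simple as this (the selector is just for illustration):

# grab the <title> text of a page
curl -s https://example.com/ | pup 'title text{}'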
With shell one can just use "${path_var##*/}" to get the base name from the string stored in $path_var. This also works if $path_var is just a base name already without any slashes.
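For completeness, the dirname counterpart uses % instead of ## (the path is just an example):

path_var=/usr/local/bin/python3
echo "${path_var##*/}"   # python3          (like basename)
echo "${path_var%/*}"    # /usr/local/bin   (like dirname)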
My problem with apropos (and man, for that matter) is related to how at least one major distribution handles the man pages. Rather than bundling the man pages with the applications, the man pages for many applications are bundled together in one base package. When this is done regardless of whether or not the application is actually on the system, apropos and man continually return information on applications which are not on the system.
In my opinion it sucks. It is possible that I suck, but I almost never can find what I'm searching for with it.
Now I found that full text search is man -K, but it searches through sources so it can also be quite useless. Is there a desktop full text search for man pages? It should use rendered man pages. It could be a fun project. Google doesn't count and I use it already.
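For what it's worth, the two stock options and their limitations:

apropos rename            # searches only the name/short-description index
apropos -s 1 rename       # restrict to section 1 (man-db)
man -K journal_data       # slow full-text grep through the unrendered sources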
Although I frequently use cal, the article mentioned cal -3 which will be useful near the end of each month. Previously I always displayed the calendar for the whole year or next month instead.
Going through the coreutils manual [1], I found a couple other useful-looking commands I didn't know about. Previously I used awk in bash scripts primarily for printf, but apparently printf is a standalone coreutils command too. The timeout command also looks useful.
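A couple of hedged timeout examples (the URL and script are invented):

# give the command 5 seconds, then send SIGTERM (exit status 124 on timeout)
timeout 5s curl -s https://example.com/ > page.html
# escalate to SIGKILL for processes that ignore SIGTERM
timeout --signal=KILL 1m ./flaky-batch-job.sh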
I like how this article's title doesn't claim that it's about utils "that you don't know". Yet I was expecting utils that have something to do with the Linux kernel, not generic *nix command-line tools.
afaik, `shred` doesn't do anything useful on SSDs. Beneath the OS, there's intentional indeterminacy on sector access, so instructions to write over the same sectors multiple times with garbage aren't guaranteed to actually do that.
The reason shred writes "multiple times with garbage" is due to magnetic disks where even when overwritten, old bit values could still be extracted with advanced tools. This is not the case for SSDs, so a single pass of shred (either random or zero) is enough here. Multiple passes will just kill your SSD faster.
The problem with sector access still stands, so you're not sure it's actually overwritten. You should use sfill(1) in addition to shred. sdmem(1) also, if you don't want to physically power down your machine.
>The reason shred writes "multiple times with garbage" is due to magnetic disks where even when overwritten, old bit values could still be extracted with advanced tools.
People thought that you needed multiple overwrites for traditional drives, but in reality that data was gone after a single overwrite of 0. There were no "advanced techniques" that could recover the data. Some specs suggested multiple passes, but this was because they were applying the precautionary principle.
> This is not the case for SSDs, so a single pass of shred (either random or zero) is enough here. Multiple passes will just kill your SSD faster.
SSDs do weird things with the data, so if your data is important enough you should destroy the drives. There's some data tucked away in odd places.
Hum, no. Some time ago you needed multiple uncorrelated overwrites to hide your data. Then as HDDs shrank you started needing fewer and fewer passes, up to today, when you just need to clear it once.
It's not about people thinking unrealistic things. The command was created for defending against a demonstrated attack.
> Some time ago you needed multiple uncorrelated overwrites to hide your data.
You really didn't. It's a really persistent myth that there's some secret technique to recover overwritten data. For PC hard drives "blobby bits" has never been a thing. It might have been a thing on 1970s style 24" disc platters, but we're not talking about those.
No one has ever recovered data from a drive that's had a single overwrite of zeros - no software claims to be able to do this, no recovery service claims to be able to do this, there are no published papers that claim to be able to do this (there's one that gets an accuracy of 50%-55% per bit, ie useless).
Since the early 1980s a single overwrite of 0 has been sufficient to destroy data.
And if you're worried about a well funded government agency that has the money to spend on exotic techniques you can do a single pass with random data, or NIST 7 passes, or you destroy the drive.
> afaik, `shred` doesn't do anything useful on SSDs
It does effectively shred your SSD. In that it will fail sooner and sooner the more you run that command.
SSDs have a sector erase command. I don't know how it's exposed on the CLI, but it exists and will effectively erase anything that was removed. Most of them will run it automatically in the background after you delete stuff, so it's normally just a matter of keeping it powered for a while.
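The delete-and-let-the-drive-clean-up part is TRIM/discard; from userspace the usual knobs are (device names are placeholders):

fstrim -v /                      # tell the SSD which filesystem blocks are free
systemctl enable fstrim.timer    # many distros ship a weekly fstrim timer
blkdiscard /dev/sdX              # discard an entire device -- destroys all data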
It shouldn't. It's 2017 now, and the overhead of a "misplaced" cat command is normally negligible. The speed of the terminal user is more important.
If you are just doing one line and you write it perfectly, then maybe. For many people cat is simply the start of excessive piping. I like having the filename as close to the beginning as possible so I can Ctrl-W with ease.
Using `cat` is wasteful in scripts but not in command line. It separates filenames, which are often actually glob patterns taking some time to confirm, from the parameters, which also often take some time to look up in the manual. `cat` and a pipe make it easier to edit in the command line.
Just tried it again and you're right, it does produce an error; I was sanitizing a real error I just had, though. With the real one I had an unescaped space: cat mentioned that it couldn't open "both" files, while the redirect just gave me a message about an "ambiguous redirect".
That's not a great example, as `pgrep` is a little more nuanced than running `ps [options] | grep [string]`, e.g. you cannot use many of the same flags with `pgrep` that you can with `ps`.
At least with "in appropriate use of cat" (as some call it) you're literally just swapping the stdin file stream with a disk io file stream so there's no functional difference what-so-ever.
I'm not saying I agree with the GP either, though, as most of the time complaints about "inappropriate use of cat" are just showboating. Using `cat` "inappropriately" is arguably more readable for less seasoned shell script developers and it's certainly a more logical program flow for a human to parse, ie "open file, grep for contents, do something else, etc". But it's still sometimes worth a reminder that many string processing tools can accept file input directly without the need for piping it via stdin (or the files can be redirected directly from the shell via the less-than token, `<`).
Indeed, but as I said, the cat usage being complained about tends to be a practice mostly of those less experienced in the command line, so they are probably unaware of `<`, let alone its placement.
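For the record, the variants being debated (the log file name is just an example):

cat access.log | grep ' 404 '    # the much-maligned extra cat
grep ' 404 ' access.log          # grep opens the file itself
grep ' 404 ' < access.log        # the shell opens the file on stdin
< access.log grep ' 404 '        # redirections don't have to come last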
As much as people make fun of him for caring so much about a seemingly minor point, I've come around to understanding RMS's frustration about people calling all of the GNU project "Linux". This sort of title drives home why: neither Linus nor anyone else on the Linux project had anything to do with any of this.
If someone wrote an article titled "Windows utils that you might not know," I would not expect its content to be solely applications written by Microsoft.
I think a better title would probably be "Common Unix utils that you might not know" or "GNU Coreutils that you might not know" (as some of the things that he is using seem to be coreutils-only?)
It's to keep the page weight balanced. Prose is considered to have neutral buoyancy, but code is naturally heavier and will tend to drag the page down if that isn't compensated by a lighter font. FWIW poetry can be so light, that it can require a very heavy font weight to keep it tethered to the page.
WolframAlpha lists the factors as 2^38 * 5^38, which is just a compact notation for 38 2s and 38 5s, totalling 76 factors. If you count the 2s and 5s in the output of factor, you get the same result of 38 each for a total of 76 factors.
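Same idea at a smaller scale, since factor prints each prime factor once per occurrence:

$ factor 1000
1000: 2 2 2 5 5 5

Three 2s and three 5s, i.e. 2^3 * 5^3 = 1000.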
The output of the error is not using IEC units -- YB not YiB (IEC units have a lowercase "i" when written). So it's referring to yottabytes not yobibytes.
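numfmt (mentioned in the article) makes the distinction explicit:

$ numfmt --to=si 1000000000
1.0G
$ numfmt --to=iec 1073741824
1.0G
$ numfmt --to=iec-i 1073741824
1.0Gi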
Especially looking down the list of all time greats: http://www.commandlinefu.com/commands/browse/sort-by-votes