which is nice for more complex "one-liners".
Awk is also standard in all POSIX environments, and the mawk flavor is extremely fast (relevant if you are processing huge files).
There is a superb book, The AWK Programming Language, which teaches a lot about programming in addition to the awk language. Good discussion and a link to the PDF here: [0]
The AWK Programming Language is the best programming book ever because it lets you learn the language through interesting problems (like writing a very small assembler).
I believe the book's been out of print for some time. I've no idea if that's why you were downvoted. I wouldn't knowingly post a link to anything pirated. That PDF is linked from a ton of sites, and the site I found it on seemed reputable, though I don't remember which it was at the moment.
Fast, but partly because it's not Unicode-aware: it treats strings as 8-bit character sequences rather than UTF-8. Often that's fine, if non-ASCII characters are only passed through unmodified, but it requires some care to avoid problems.
I ran into this in practice because I was using awk to convert paper titles from "Title Case" to APA-style "Only first word of title capitalized" case. The garbled output led me down a rabbit hole where I discovered that only some awks support Unicode locales, and the default awk on Debian (mawk) isn't one of them.
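A quick way to see the difference (a sketch, assuming a UTF-8 locale and that both awks are installed):

$ echo 'Étude' | gawk '{ print tolower($0) }'   # gawk honors the locale
étude
$ echo 'Étude' | mawk '{ print tolower($0) }'   # mawk sees raw bytes
Étude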
See my post way down at the bottom; I do mention this. GNU awk also has a lot of useful extensions and builtins, so it's sometimes painful to use a plain POSIX awk. But when you need the speed, it's nice to know mawk is out there.
I'd love to see the mawk compilation technology merged to GNU awk. Or mawk updated with Unicode support and a few of the GNU extensions.
The other item on my awk-like wishlist is CSV support, i.e., splitting $1 .. $N by CSV rules instead of just a field separator. I usually end up copying CSVs into PostgreSQL, because it is fast and then I can process them very flexibly, but it's a bit heavy for things that could be one-liners. Also, PostgreSQL won't load malformed CSV, but I suspect awk could be less picky.
There is Miller, which hits some of these points, but I haven't really grokked it yet and the syntax seems ... awkward.
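On the CSV wishlist: GNU awk can get partway there with its FPAT variable, which defines what a field is rather than what separates fields. A sketch (data.csv is a stand-in; this pattern, taken from the gawk manual, does not handle escaped quotes inside a quoted field):

gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $2 }' data.csv

So a line like alice,"hello, world",42 yields $2 = "hello, world" instead of "hello.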
I have been reading through it and running code snippets from there as I find time. It's awesome!
If anyone's interested, I also update my notes from the book on my blog: https://scripter.co/notes/awk. I plan to have that notes post contain the entire book when I am done.
Pretty much this whole thread is people coming up with better ways of solving the problem. I love Hacker News! :)
But I think the point of the blog (for me) is showing what 'ed' is and what you can use it for (the problem at hand is secondary). And it did exactly that for me: I knew about 'ed' but never really understood what made it special or where/how you could use it. Thanks Julia for enlightening me!
> I knew about 'ed' but never really understood what made it special or where/how you could use it.
ed script is one of the formats supported by diff and patch (with their -e or --ed command-line switch). (diff generates the script by itself, but patch just pipes it to ed.)
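For example, with two toy files:

$ printf 'foo\nbar\nbaz\n' > old
$ printf 'foo\nquux\nbaz\n' > new
$ diff -e old new
2c
quux
.

The output is literally ed commands: change line 2, with the replacement text terminated by a lone dot.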
This problem is easily solved with a regexp that processes the input as a whole, rather than working on it line by line. I had this need often enough that I exposed the Go regexp engine as a command-line tool, regrep, which can insert "elephant" after "baz" with:
go get github.com/orivej/unix/regrep
regrep s '(\n( *-) baz\n)' $'$1$2 elephant\n' < input.yaml
It only processes standard input and cannot by itself replace the contents of an input file with its output; but another tool, inplace, helps:
go get github.com/orivej/unix/inplace
find . -name '*.yaml' -exec inplace {} regrep s '(\n( *-) baz\n)' $'$1$2 elephant\n' \;
The commands set the argument list to a list of three files (by default, it is set to the filenames you passed to vim on the command-line). Then, autowrite is enabled which automatically saves each buffer after editing it. Finally, argdo runs a command on each argument file.
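Spelled out as the ex commands you would type inside vim (a sketch; the filenames are hypothetical and the g command is borrowed from the ex example elsewhere in this thread):

:args a.html b.html c.html
:set autowrite
:argdo g/search string/.,+20d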
Why write the ‘ex’ commands to a file? Why not just do the echo inline, like this
for f in *.html; do
echo -e "g/search string/ .,+20 d\nx" | ex - "$f"
done
?
(I also added quotes to the $f dereference in case of file names containing white space, and the -e flag to echo to expand \n to newline. In case of a Bourne shell without support for -e in echo, I would probably use “{ echo "g/..."; echo "x"; } | ex - ...” instead of using \n.)
Also, in a production script I would probably have used “find . -maxdepth 1 -name "*.html" -print0 | while IFS= read -r -d '' f; do ...; done” instead of a ‘for’ loop over a pathname expansion, in order to guard against there being no html files, in which case a ‘for’ loop over a pathname expansion would pass the literal string “*.html” as the file name argument to ‘ex’.
I've recently used EDLIN from an 80s version of MS-DOS.
After spending 5 minutes with the manual, I realized it was the best line editor I've ever used.
ed is very similar, but it doesn't come with a nice manual.
The info page is chaotic and doesn't start from the beginning. You can cover typical usage in 5 lines, but no, you have to read through 25 pages of stuff just to figure out a sensible command. I just use vim or nano.
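For what it's worth, those 5 lines of typical usage might look like this (the annotations on the right are mine, not part of ed's input):

$ ed file.txt
,p              print the whole buffer
/foo/           jump to the next line matching "foo"
s/foo/bar/      substitute on the current line
w               write the buffer back to the file
q               quit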
> ed is very similar, but it doesn't come with a nice manual
In many cases it indeed doesn't. Historically, that documentation void was meant to be filled by learn tutorials[1] and a written introduction in volume 2A of the manual[2,3]. Unfortunately, the learn tutorials can't really be made use of in a modern environment; you’d actually have to set up a PDP-11 emulator with V7 (though prebuilt images exist) and work with that, an environment where backspace doesn’t really work out of the box.
OpenBSD ed(1)[4] tries, but it's just not quite there as an introduction.
I'm much more familiar with sed than ed, so here's how I would do this:
sed '/baz/{s/.*/&\n&/;s/baz/elephant/2}' input.txt
or, slightly more readable
sed '/baz/ {
s/.*/&\n&/
s/baz/elephant/2
}' input.txt
The first substitution appends a copy of the line to the pattern space, the second substitution replaces the second occurrence of "baz" with "elephant".
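A quick check with GNU sed on a toy input:

$ printf '  - foo\n  - baz\n' > input.txt
$ sed '/baz/{s/.*/&\n&/;s/baz/elephant/2}' input.txt
  - foo
  - baz
  - elephant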
This being said, I went ahead and bought the book mentioned in the article [0] - a neat little read.
To use this solution with a version of sed that does not accept newlines in patterns (i.e. to make it portable), one has to put the commands in a sed commands file and run it with sed -f.
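That is, something like this (a sketch; the backslash followed by a literal newline is the portable way to embed a newline in the replacement):

$ cat insert.sed
/baz/{
s/.*/&\
&/
s/baz/elephant/2
}
$ sed -f insert.sed input.txt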
How to make the one-liner portable without using a sed commands file?
Maybe something like:
sed 's/baz/elephant/;/^ \{2\}- elephant/{h;G;};/^ \{4\}- elephant/{h;G;};s/elephant/baz/' foo|sed -a wfoo
1. s/baz/elephant/
2. duplicate that line if two or four space indent
3. s/elephant/baz/
4. save
N.B. no temp file used to save changes
cf. jvns.ca blog:
1. search for baz
2. copy that line and paste it
3. s/baz/elephant/
4. save and quit
When I first encountered the command line in college, my professor introduced me to vi and the basic bash commands, but I wasn't familiar with any other scripting languages (or even, if memory serves, the concept of a 'scripting language'). As a result, I ended up creating a pretty dizzying array of ed scripts until someone introduced me to sed and the fact that you can use bash as a scripting language.
An older version of the article contained the following:
> I had one extra weird requirement which was that some of the lines were indented with 2 spaces, and some with 4 spaces. The - elephant line needed to have the same indentation as the previous line.
Chapter 20 of O'Reilly's "Unix Power Tools, 3rd Edition" is all about batch editing and covers ed/ex as well.
Maybe there's an old copy of "Unix Power Tools" over in your server room or an abandoned cubicle in the office... the content has not changed much in the ensuing decades!
Ed is pretty complete, to the point that it was a little too powerful when it was part of a security problem with FreeBSD's patch. https://securitytracker.com/id/1033188
I do this fairly regularly in various shell scripts, but less than I used to ever since “sed” introduced the --in-place option, which makes sed more useful for my purposes most of the time.
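For example (GNU sed; BSD sed wants an explicit, possibly empty, backup suffix after -i):

sed --in-place 's/foo/bar/g' config.txt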
It's worth noting that the "enter a single . on a line to signal the end of an input" convention found its way into mail/mailx and SMTP too. The good thing is that it means you don't need to insert special characters like Esc (Ctrl+[, 27, 0x1B, whatever you want to call it) into your script; the bad thing is when you do want to add a line containing a single "."... whereby ed and SMTP have diverged with different "escaping" conventions.
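Concretely: SMTP "dot-stuffs" by doubling a leading dot, which the receiver strips, whereas with ed one common workaround is to enter a placeholder line and then substitute the dot in afterwards. A sketch (file.txt is hypothetical):

$ printf 'a\nplaceholder\n.\ns/^placeholder$/./\nw\nq\n' | ed -s file.txt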
Good to know about ed! Since no one else has mentioned this: Emacs keyboard macros seem much easier to me, especially since, beyond the basic editing stuff of awk/ed/sed, I can leverage all the editor extensions and modifications I have accumulated over the years. That is, unless it's tens of thousands of files and the edit is exceptionally simple; I would write a script in that case too.
Since discovering them, I use Emacs keyboard macros all the time.
Let's say I have:
key value
salt pepper
fish chips
vodka orange
rum cola
and I want the second column in uppercase.
<F3> to start recording a macro. Alt →, → to position the cursor at "v" (or just →→→→→→ if this is a fixed width column), then Alt U to uppercase the next word. → to move the cursor one forward, to the start of the next line. <F4> to finish recording the macro.
Then press <F4> five times to run the macro five times.
(Explanation intended for users who've never used Emacs before. Of course, there are optimizations.)
But in this example I would have probably used an ex command:
:%normal wgUaw
(for every line, do as if I had typed wgUaw) or visually selected the second column as a block and just pressed U.
I'm genuinely interested in someone showing how to do this kind of transformation in popular modern editors such as Atom and VSCode. Is there as flexible a way as in the classic editors?
You can do it using multiple cursors. In Sublime, you place the cursor at the beginning of “value”, then press Ctrl+Shift+Down until the end - there will be a cursor on every line. Then you press Ctrl+Shift+Right to select all values in the second column. Then press Ctrl+Shift+P and choose “Convert to Upper Case”, or just Ctrl+K, Ctrl+U.
The main reasons I am not moving to something more trendy are keyboard macros and the fact that Emacs is designed to be equal parts runtime and editor. I think that for most purposes Emacs and Vim are equivalent, but virtually all modern editors are lacking compared to a good Emacs/Vim configuration.
Microoptimizing a couple of things, I'd do this from the top of the buffer:
<F3> M-f M-u C-n C-a C-0 <F4>
I particularly like combining the "finish recording" and "running the macro" steps into a single <F4>. Plus, using a numeric argument of 0 seems better than counting lines.
There is another program I use for editing that is older than ed.
It is written in asm.
I think it may actually be faster than sed (and sed is faster than AWK, Lua, Perl, Python, etc.)
1.spt:
; x = " - baz"
; y = " - elephant"
;a a = input :f(end)
; output = a
; a ? x :s(d)f(a)
;d output = y
; :(a)
;end
spitbol 1.spt < foo
Depends on the awk implementation and the task. However, even GNU awk (gawk) is very fast, and mawk is astonishing.
Here is a simple example: count the lines, words, and characters in a 65MB text file (10 copies of a novel stuck together).
Testing on Ubuntu GNU/Linux 16.10, reporting the middle of three tries:
export LANG=ASCII # avoid differences due to unicode
$ time -p wc big10.txt
1284570 10956950 64886660 big10.txt
real 0.29
user 0.28
sys 0.01
$ time -p gawk '{l+=1; w+=NF; c+=length($0)+1} END {print l, w, c}' big10.txt
1284570 10956950 64886660
real 0.55
user 0.53
sys 0.01
Not bad: gawk takes less than twice as long as wc, which is the standard tool for this.
$ time -p mawk '{l+=1; w+=NF; c+=length($0)+1} END {print l, w, c}' big10.txt
1284570 10956950 64886660
real 0.35
user 0.33
sys 0.01
But mawk is only 20% slower than wc. For a script!
Just for a check, even Python is not terrible at this:
#!/usr/bin/python
# Python 2: count lines, words, and characters, like wc
import sys
l, w, c = 0, 0, 0
for line in file(sys.argv[1], "rb"):
    l += 1
    w += len(line.split())
    c += len(line)
print l, w, c
$ time -p ./wc.py big10.txt
1284570 10956950 64886660
real 0.87
user 0.86
sys 0.01
As with k, I am lacking in spitbol experience and so the counts are not identical to wc. Also I am using 10MB of big10.txt instead of the entire file.
1.spt:
;* m line count, c word count, o char count
;* p word pattern
; n = "0123456789"
; w = n &ucase &lcase "-"
; p = break(w) span(w)
;a a = input :f(c)
; o = o + size(a)
; m = m + 1
;b a ? p = :f(a)
; c = c + 1 :(b)
;c output = m ' ' c ' ' o
;end
dd if=big10.txt bs=5m count=2 of=10m.txt
time -p wc 10m.txt
201346 1763181 10485760 10m.txt
real 0.51
user 0.44
sys 0.00
time -p spitbol 1.spt < 10m.txt
201347 1770831 10235302
real 0.33
user 0.31
sys 0.01
time -p wc big10.txt
1284570 10956950 64886660 big10.txt
real 2.76
user 2.68
sys 0.08
Trying this as a novice with k3.
Being a novice, 2 out of 3 counts are incorrect, and this is probably not the fastest solution.
Total "words" in the example was simply AWK's NF. But looking at big10.txt, there are anomalies such as words separated by "--" instead of a space.
Here I used a non-space character followed by a space. Far from accurate, but not too far.
1.k:
w:0:"big10.txt";v:,/$w
m:v _ss "[^ ] " / "word": char followed by space
#w / lines
1+#m / words
#v / characters
time -p k 1
1284570
10019630
63602090
real 2.70
user 2.40
sys 0.28
Counting lines with sed
time -p wc -l big10.txt
1284570 big10.txt
real 0.13
user 0.06
sys 0.07
time -p sed -n '$!d;=' big10.txt
1284570
real 0.29
user 0.19
sys 0.09
That is a slow computer; mine is a pre-Haswell i3.
$ time -p sed -n '$!d;=' big10.txt
1284570
real 0.07
user 0.06
sys 0.00
time -p mawk 'END {print NR}' big10.txt
1284570
real 0.04
user 0.03
sys 0.00
$ time -p gawk 'END {print NR}' big10.txt
1284570
real 0.14
user 0.13
sys 0.00
$ time -p wc -l big10.txt
1284570 big10.txt
real 0.02
user 0.02
sys 0.00
Counts for words and chars are closer but still short due to inexperience using k.
But it appears the script is now faster than wc.
time -p wc big10.txt
1284570 10956950 64886660 big10.txt
real 2.78
user 2.66
sys 0.12
time -p k 1
1284570
10956830
63602090
real 2.57
user 2.42
sys 0.14
It's closer because vi was built on top of ex -- vi's ":" commands are just ex commands. In fact, on my Linux box, /bin/ex is just a symbolic link to /bin/vi:
$ ls -l /bin/ex
lrwxrwxrwx 1 root root 2 Oct 2 2017 /bin/ex -> vi
The original code for vi was written by Bill Joy in 1976, as the visual mode for a line editor called ex that Joy had written with Chuck Haley. Bill Joy's ex 1.1 was released as part of the first BSD Unix release in March 1978.[1]
(Bill Joy went on to become a co-founder of Sun Microsystems.)
[0] https://news.ycombinator.com/item?id=13451454