OP attempted this using Python. What would be the fastest way using \*nix comman...

justinsaccount · on Dec 16, 2022

Use look:

  look $(echo -n password | sha1sum | cut -d ' ' -f 1 | tr a-z A-Z) pwned.txt

from man page:

NAME

look - display lines beginning with a given string

DESCRIPTION

The look utility displays any lines in file which contain string as a prefix. As look performs a binary search, the lines in file must be sorted (where sort(1) was given the same options -d and/or -f that look is invoked with).
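To make the man page description concrete, here is a minimal in-memory sketch of the same idea: binary-search a sorted list for the first entry not less than the prefix, then collect entries while they still start with it. The function name `look` and the sample lines are made up for illustration, and the `-d`/`-f` options are not modeled.

```python
from bisect import bisect_left


def look(sorted_lines, prefix):
    """Return every entry of sorted_lines that begins with prefix.

    Assumes sorted_lines is sorted with plain string comparison,
    which is how the binary search decides where to jump.
    """
    # First index whose entry is >= prefix; all matches are contiguous from here.
    i = bisect_left(sorted_lines, prefix)
    hits = []
    while i < len(sorted_lines) and sorted_lines[i].startswith(prefix):
        hits.append(sorted_lines[i])
        i += 1
    return hits


if __name__ == "__main__":
    # Hypothetical, shortened "HASH:count" lines.
    sample = ["0A:1", "5BAA:3", "5BAB:2", "F2B1:9"]
    print(look(sample, "5BA"))
```

The real utility does the same search directly on the file rather than on a list held in memory.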

example:

  justin@box:~/data$ time look $(echo -n secret123 | sha1sum | cut -d ' ' -f 1 | tr a-z A-Z) pwned-passwords-sha1-ordered-by-hash-v6.txt 
  F2B14F68EB995FACB3A1C35287B778D5BD785511:17384

  real 0m0.212s
  user 0m0.005s
  sys 0m0.001s

  justin@box:~/data$ time look $(echo -n secret123 | sha1sum | cut -d ' ' -f 1 | tr a-z A-Z) pwned-passwords-sha1-ordered-by-hash-v6.txt 
  F2B14F68EB995FACB3A1C35287B778D5BD785511:17384

  real 0m0.002s
  user 0m0.003s
  sys 0m0.001s
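For anyone who would rather build the lookup key in Python than with the `echo -n | sha1sum | cut | tr` pipeline, a rough equivalent is below. It assumes the password is encoded as UTF-8 and, like `echo -n`, hashes it without a trailing newline. The helper name `pwned_key` is made up for this sketch.

```python
import hashlib


def pwned_key(password: str) -> str:
    """Uppercase hex SHA-1 of the password, the format used as the line prefix."""
    # Assumption: UTF-8 bytes, no trailing newline (mirrors `echo -n`).
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    print(pwned_key("password"))
```

The resulting string can be passed to look, or to any other prefix search over the sorted file.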

genericlemon24 · on Dec 16, 2022

Hey, I didn't know about this command, neat!

On my laptop, look `time`s at ~10 ms (for comparison, the Python "binary search" script `time`s at ~50 ms).

justinsaccount · on Dec 16, 2022

You can make the Python binary search super fast if you use mmap. Here's a version of that I had lying around; it's probably correct.

  import os
  import mmap
  
  def do_mmap(f):
      fd = os.open(f, os.O_RDONLY)
      size = os.lseek(fd, 0, 2)
      os.lseek(fd, 0, 0)
      m = mmap.mmap(fd, size, prot=mmap.PROT_READ)
      return m, size, fd
  
  SEEK_SET = 0
  SEEK_CUR = 1
  
  class Searcher:
      def __init__(self, file):
          self.file = file
          self.map, self.size, self.fd = do_mmap(file)
  
      def close(self):
          self.map.close()
          os.close(self.fd)
  
      def find_newline(self):
          self.map.readline()
          return self.map.tell()
  
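      def binary_search(self, q):
          start = 0
          end = self.size
          # Stop once the window is a couple of bytes wide: with
          # start = xxx and end = xxx+1 the midpoint never moves.
          while start < end - 2:
              mid = start + (end-start)//2
              self.map.seek(mid)
              pos = self.find_newline()
              if pos > end:
                  break
              line = self.map.readline()
              # An empty read means EOF, so nothing after mid can match.
              if not line or q <= line:
                  end = mid
              else:
                  start = mid
  
          # Rewind to the last offset known to sit before any match and scan
          # forward. Offset 0 is already a line start, so nothing is skipped there.
          self.map.seek(start)
          if start:
              self.find_newline()
          while True:
              line = self.map.readline()
              if line and line < q: continue
              if not line or not line.startswith(q): break
              yield line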
  
  if __name__ == "__main__":
      import sys
      q = sys.argv[1]
      s = Searcher("pwned-passwords-sha1-ordered-by-hash-v6.txt")
      import time
      ss = time.perf_counter()
      res = s.binary_search(q.upper().encode())
      for x in res:
          print(x)
      ee = time.perf_counter()
      print(ee-ss)

genericlemon24 · on Dec 16, 2022

I did try mmap, both with the plaintext binary search, and with the binary file (you can find a note about it in the HTML source :)

I ended up not mentioning it because, for some reason, it was ~twice as slow on my Mac... I'm now curious to try it on a decent Linux machine.

saalweachter · on Dec 16, 2022

Make sure you put a space at the beginning of your command, so you don't leave your password sitting in plaintext in your bash history.

cbm-vic-20 · on Dec 16, 2022

If you're using bash, you'll need to use the HISTIGNORE or HISTCONTROL environment variables to do this.

bombolo · on Dec 17, 2022

If you're using bash, you can just leave a space before the command, like the other commenter said.

seedie · on Dec 17, 2022

That's true if HISTCONTROL is set to `ignorespace` or `ignoreboth`.

https://www.gnu.org/software/bash/manual/html_node/Bash-Vari...

denysvitali · on Dec 17, 2022

  read -s -r MY_PASSWORD

Then, after typing your password, you can safely use the $MY_PASSWORD variable.
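The same "never put the password on the command line" idea can be sketched in Python with the standard `getpass` module, which prompts without echoing. The function name `prompt_for_hash` and the injectable `prompt_fn` parameter are illustrative choices for this sketch (the parameter is there so the function can be exercised without a terminal). UTF-8 encoding is an assumption.

```python
import getpass
import hashlib


def prompt_for_hash(prompt_fn=getpass.getpass):
    """Ask for a password without echoing it and return its uppercase SHA-1."""
    password = prompt_fn("Password: ")
    # Assumption: hash the UTF-8 bytes, with no trailing newline.
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    print(prompt_for_hash())
```

Because the password is typed at a prompt, it never appears in the shell history or in the process arguments.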

geniium · on Dec 16, 2022

Oh I just learned something, thank you.

AceJohnny2 · on Dec 16, 2022

Perhaps there's a way to insert GNU Parallel in there to do parallel search of different chunks?

Or just use ripgrep, which integrates multi-core.

chkhd · on Dec 16, 2022

That is already doable with xargs itself

xargs -P maxprocs

Parallel mode: run at most maxprocs invocations of utility at once. If maxprocs is set to 0, xargs will run as many processes as possible.
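As a rough illustration of the "parallel search of different chunks" idea, here is a sketch that splits a file into byte ranges and lets each worker scan only the lines that begin inside its range. The names `scan_chunk` and `parallel_scan` are made up, and threads are used only to keep the sketch short; whether this gives any real speedup in CPython is untested here. Since a binary search on a sorted file only needs a logarithmic number of reads, a brute-force scan like this is mainly relevant when the file is not sorted.

```python
import mmap
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(m, lo, hi, prefix):
    """Return lines that begin inside [lo, hi) and start with prefix."""
    pos = lo
    # If lo falls mid-line, that line belongs to the previous chunk: skip it.
    if lo > 0 and m[lo - 1:lo] != b"\n":
        nl = m.find(b"\n", lo)
        if nl == -1:
            return []
        pos = nl + 1
    hits = []
    size = len(m)
    while pos < hi and pos < size:
        nl = m.find(b"\n", pos)
        end = size if nl == -1 else nl + 1
        line = m[pos:end]
        if line.startswith(prefix):
            hits.append(line)
        pos = end
    return hits


def parallel_scan(path, prefix, workers=4):
    """Scan path for lines starting with prefix, one byte range per task.

    Note: mmap raises ValueError on an empty file; that case is not handled.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        size = len(m)
        step = max(1, -(-size // workers))  # ceiling division
        bounds = [(lo, min(lo + step, size)) for lo in range(0, size, step)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(lambda b: scan_chunk(m, b[0], b[1], prefix), bounds)
            # pool.map preserves chunk order, so results stay in file order.
            return [line for part in parts for line in part]
```

Each line is handled by exactly one worker: the one whose range contains the line's first byte.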

giantrobot · on Dec 16, 2022

GNU parallel gives you some extra features like a runtime log and resumable operation.

throwaway14356 · on Dec 17, 2022

If you have 10 computers, put every 10th line in its own file. If each file is 1000 lines, put line 500 at the start, then line 250, then line 750, then lines 125, 375, etc.
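The reordering described here (middle line first, then the quartiles, and so on) resembles what is usually called a breadth-first or Eytzinger layout of a sorted array. A minimal sketch follows. The function names are made up, and because this version fills a complete binary tree, the exact order can differ slightly from the 500, 250, 750 sequence in the comment.

```python
def eytzinger(sorted_items):
    """Rearrange sorted items into breadth-first (Eytzinger) order.

    Index 0 is unused; the children of slot k live at 2k and 2k+1.
    """
    out = [None] * (len(sorted_items) + 1)
    it = iter(sorted_items)

    def fill(k):
        # In-order walk of the implicit tree hands out items in sorted order.
        if k < len(out):
            fill(2 * k)
            out[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return out


def search(layout, key):
    """Return the slot holding key, or 0 if it is absent."""
    k = 1
    while k < len(layout):
        if key == layout[k]:
            return k
        # Go left (2k) if key is smaller, right (2k+1) if larger.
        k = 2 * k + (key > layout[k])
    return 0


if __name__ == "__main__":
    print(eytzinger([1, 2, 3, 4, 5, 6, 7]))
```

The first few probes of every search land in the same small region at the front of the array, which is the point of putting the "middle" entries first.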

Bluecobra · on Dec 17, 2022

Assuming you have enough RAM, I wonder how much putting this file into a ramdisk will help speed things up.

jamespwilliams · on Dec 16, 2022

Perhaps pushing the definition of a *nix command slightly, but I’d be interested in the performance of https://sgrep.sourceforge.net/