
I dug into this a little and one of the files is 164GB. How do you even work with files that size? That is, how would I search for my SSN on my Windows box?



That's not even that big? `cat big_file | grep -v my_term` would go line-by-line and show any lines matching your query. If you're doing a lot of queries, you'd probably want to index it, so you throw it into a sqlite database with the usual SQL utils.
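For the indexing idea, here is a rough sketch using Python's built-in `sqlite3` module. The table name `records`, the column `value`, and the helpers `build_index` and `lookup` are all made up for illustration, and it assumes the dump holds one record per line and that you want exact-match lookups.

```python
import sqlite3
from typing import Iterable, List


def build_index(db_path: str, lines: Iterable[str]) -> sqlite3.Connection:
    """Load one record per line into SQLite and index it for exact lookups.

    Assumption: each line of the dump is a single record (hypothetical schema).
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (value TEXT NOT NULL)")
    # The generator is consumed one row at a time, so the whole file
    # never has to fit in memory.
    conn.executemany(
        "INSERT INTO records (value) VALUES (?)",
        ((line.rstrip("\r\n"),) for line in lines),
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_records_value ON records (value)")
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return records equal to term; equality is a query the index can serve."""
    rows = conn.execute("SELECT value FROM records WHERE value = ?", (term,))
    return [row[0] for row in rows]
```

You would call it as `build_index("dump.db", open("big_file", encoding="utf-8", errors="replace"))` and then `lookup(conn, "my_term")`. Note that a plain index like this helps exact (or prefix) lookups; a substring search with `LIKE '%my_term%'` still scans every row, so for a one-off search the line-by-line scan is simpler.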

Edit: I missed that you said Windows. PowerShell probably has similar utilities, so you can do `ReadFileLineByLine \r \d big_file | ReturnHitBySearchTerm \v \t \s my_term` or something similar.
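If you'd rather not hunt for the right cmdlets (I believe `Select-String` is the usual grep analogue in PowerShell), the same line-by-line scan can be sketched in Python, which runs on Windows. This is only an illustration: `search_file` is a hypothetical helper name, the file is assumed to be UTF-8 text with one record per line, and it does plain substring matching rather than regex matching.

```python
from typing import Iterator, Tuple


def search_file(path: str, term: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that contains term.

    The file is read one line at a time, so memory use stays small
    even for a file far larger than RAM.
    """
    # errors="replace" keeps the scan going if the dump contains invalid bytes
    # (assumption: the file is mostly UTF-8 text).
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            if term in line:
                yield line_number, line.rstrip("\r\n")
```

Usage would be something like `for number, line in search_file("big_file", "my_term"): print(number, line)`. Because it is a generator, hits are printed as they are found instead of after the whole file has been read.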


>ReadFileLineByLine \r \d ssn.txt | ReturnHitBySearchTerm \v \t \s trampas
ReadFileLineByLine : The term 'ReadFileLineByLine' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ ReadFileLineByLine \r \d ssn.txt | ReturnHitBySearchTerm \v \t \s tra ...
+ ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (ReadFileLineByLine:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

:(

All I know about PowerShell I just learned by accident: ls works


You absolutely do not want to use "-v" with that grep: -v inverts the match, so it prints every line that does not contain my_term.

Nor do you want to use cat (UUoCA) but that's very much a minor point in comparison.


UUoCA: https://porkmail.org/era/unix/award

I hadn't heard of it before.


Using sift on a 100GB txt file still takes multiple minutes. I haven't tried ag, but grep is supposedly slower.


If the desire is just to grep for your name, email address, whatever, and then throw the rest of the data away, I don't think waiting multiple minutes is a big deal.



