Regex2fat: Turn your favorite regex into FAT32 (github.com/8051enthusiast)
397 points by beefhash on April 15, 2020 | 82 comments



If you think this is crazy, check out the VVFAT driver in qemu[1]. At first sight it seems simple enough - turn a host directory into a virtual FAT filesystem which is presented to the guest.

The clever/insane thing is it supports writes. It is able to "reverse" those block level operations from the guest to modify the source filesystem on the host.

It was written by the ever exceptional Fabrice Bellard. EDIT: No it wasn't, it was written by Johannes Schindelin, thanks for the clarification in replies.

[1] https://github.com/qemu/qemu/blob/master/block/vvfat.c


> It was written by the ever exceptional Fabrice Bellard.

I think it is rather by Johannes Schindelin [1].

[1] https://github.com/qemu/qemu/commit/de167e416fa3d6e4bbdcac90...


On the other hand, I don't think we've ever seen Fabrice Bellard and Johannes Schindelin in the same room together.


That is... compelling, I admit, as Bellard was known for his use of various pseudonyms (e.g. Gérard Lantau for FFmpeg). But I guess Schindelin now works for Microsoft while Bellard still works for Amarisoft [1], right?

[1] https://www.amarisoft.com/about-us/


But if you rearrange, remove, and then add some letters to Amarisoft it spells MicroSoft. The implication is clear.


And now you can even write the regex for it into FAT32


Huh? Where do you get the 'c'? Typo?


>and then add some letters


That seems like the perfect solution to the problem of newer Android devices losing USB mass storage support because the internal data filesystem is no longer FAT32 and they combined the partitions together into a Linux-format one. Unfortunately a quick Google suggests no one has tried to take that file/concept and implement it so the Android device could present a virtual USB drive using that with a directory on the "host", but I guess that just makes it an idea for anyone else who has some time and would like an interesting challenge...


I can imagine other interesting use cases. fat32 over the network, for example: so you can e.g. plug in a USB device into some legacy hardware, and when it does a directory read, the device lies to the host about what's inside, only retrieving the bytes over the air during a file read operation.

I think the hard part is that if the "disk" is mounted, the FAT would be in memory on Windows, so it's very hard to put in a file from "outside", because Windows will probably never re-read the FAT from the disk and will never see that file.


Why can't they keep a fat32 driver around and mount it like they always have?


The data partition used to be FAT32 and plugging the Android into USB with the mass storage option would unmount the partition and then the Android emulates a USB mass storage device to expose the block-level contents of the partition to the host computer. Note that this couldn't be done with the system partition, since the Android system itself is running on it; and even if it was unmountable, the filesystem would be a Linux one like ext3 or ext4, making it unreadable to a Windows host. In the newer versions, they combined the two partitions into one with a Linux filesystem, so there would no longer be a separate data partition to unmount from the Android and mount on the host.

The idea I'm proposing is to turn VVFAT or an equivalent idea into a kernel module that treats the data directory, or indeed any directory on the Android, including the system one, as the "host", since it has a filesystem-application interface on one side that just uses the normal filesystem calls, and exposes the virtual FAT filesystem over USB mass storage on the other side.


Without knowing how QEMU does it, I'm guessing writes onto the emulated fat32 partition inside QEMU get caught by their software, and QEMU talks to Windows and says "Hello Windows, I would like to store this data as a file in this directory.". The directory can probably be on an NTFS disk or even Samba mount?

If a phone is "mounted" as FAT32, the writes from Windows should be intercepted by the driver and it would create Linux filesystem calls to create/write files. If an Android app decides to write onto that partition, there needs to be a program running on the Windows side to tell Windows' filesystem driver "Hey I'm going to create this file". Otherwise Windows will never see that file.

(I'm just guessing here, maybe there is a function to re-read the FAT?)


Oh, I was thinking about this the wrong way -- I had read it as Android's ability to mount USB OTG mass storage devices plugged into the phone.

I agree with you, if the internal filesystem can't be mounted as a filesystem on most PCs then there should be some sort of virtual/emulation layer.


The insane thing is supporting writes on the host side (are they?) which change the contents of the VFAT right under the guest OS. Writes in the other direction are a lot simpler because the guest is in control of the VFAT state.

Come on, any sort of caching scheme will wreak havoc with underhanded updates to the image.


I've used this feature quite a lot and it's important to understand that it is crazy unstable. The fact that it works at all is impressive, but it does explode quite a lot.


We discussed VVFAT at the KVM Forum (2018 IIRC), and as a response I wrote: http://libguestfs.org/nbdkit-floppy-plugin.1.html (It does not support all the crazy write/host filesystem interaction because I'm nowhere near as smart as Johannes Schindelin.)


I see another name as author and no trace of Fabrice in history. Can you elaborate please?


QEMU itself came from Bellard originally, but I believe you are correct that the RO and RW vvfat support came from Johannes Schindelin:

https://github.com/qemu/qemu/commit/de167e416fa3d6e4bbdcac90...

https://github.com/qemu/qemu/commit/a046433a161a1f554be55df8...


This reminds me of that Lewis Black bit, "If it weren't for my horse, I never would have spent that year in college".

I saw the words, but my brain couldn't process them, no matter how many times I tried.

If I die of an aneurysm, regex2fat will probably be the reason why


I don't understand, what's the deal with that Lewis sentence?

It seems pretty straightforward to parse. I read it as:

  if(!somethingAboutMyHorse) {
    // ? Road not taken.
  } else {
    //Spend a year in college.
  }


Yeah I didn't get it either.

There's an explanation on Reddit[1] which basically says that there's no logical explanation for why having a horse should have led to a year in college.

I was looking for some kind of grammatical trick, and judging by some of the replies here I'm not alone.

[1] https://www.reddit.com/r/OutOfTheLoop/comments/1twwao/why_is...


The horse raced past the barn fell.


If he can't process the words in some way and put them into context, it can cause an aneurysm to rupture in his brain.


Never heard this before but despite being grammatically sound, I guess the events that would lead to someone truthfully saying it are meant to defy reason.


There's a hundred things a pet can do that might inspire you to change your decision about college. I don't understand how it defies reason or is even confusing.


The "were not" and "never" cancel out, so remove the negation. The subjunctive asserts the horse thing was necessary, not merely sufficient, so be sure there is no college in the other branch. Also throw in some asserts, as the non-horse, non-college possible world is not the real one.

In other words, imperative pseudo-code is a poor substitute for some temporal and possible-world modal logic.


I have to say that made me think of this really old game called Leisure Suit Larry.

In one scene, he is sitting in a bar and an NPC sits next to him on a barstool and the dialog just spits out punchlines with no context.

"...and there stood the pig and the cow!"


I think he made a mistake, it should have been "if it weren't for my horse I would have spent more than that year in college."

English can be tricky that way.


> Q: Should I use this in production^w^w anywhere?

> A: No, but I can't stop you.

The motto of so many of the best projects:)


Worry not, there's already a pull request fixing this obvious deficiency. Ah, the beauty of free software.

https://github.com/8051Enthusiast/regex2fat/pull/2


All In: "Haha OS-driven regex engine go brrrrr"



If this was made as a joke (yes?), that's a pretty good punchline.


There's the first issue: "`regex2fat` is nine characters long" (https://github.com/8051Enthusiast/regex2fat/issues/1) :)


MICROS~1.COM came up with a naming workaround.


Could someone explain the joke?


https://en.wikipedia.org/wiki/8.3_filename

> "A SFN filename can have at most 8 characters before the dot. If it has more than that, the first 6 must be written, then a tilde '~' as the seventh character and a number (usually 1) as the eighth. The number distinguishes it from other files with both the same first six letters and the same extension."
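The quoted rule can be sketched in a few lines of Python. This is a simplified illustration, not the full algorithm DOS or Windows uses: the function name `short_name` and its `index` parameter are made up here, and the sketch ignores illegal characters, spaces, and collision numbers above 9.

```python
def short_name(filename: str, index: int = 1) -> str:
    """Simplified 8.3 short-name sketch (hypothetical helper, not a real API)."""
    # Split on the last dot; names without a dot have no extension.
    base, dot, ext = filename.rpartition(".")
    if not dot:
        base, ext = filename, ""
    base = base.upper()
    ext = ext.upper()[:3]  # the extension is limited to 3 characters
    if len(base) > 8:
        # Keep the first 6 characters, then '~' and a single-digit number.
        base = base[:6] + "~" + str(index)
    return base + ("." + ext if ext else "")
```

Under these assumptions, `short_name("microsoft.com")` gives `MICROS~1.COM`, and the nine-character `regex2fat` becomes `REGEX2~1`.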


It's a fat joke.


This is a horrifying abuse of a filesystem, and I love it.


Hey thanks for the project! I (thought I) know what regex is, and I (thought I) know what FAT32 is. But Bamm! putting the two together, the whole sentence makes no sense to me.

This is genius.


Now you have... quite a few problems.


>Q: NOOOOOOOOOOO!!! YOU CAN'T TURN A DFA INTO A FAT32 FILE SYSTEM!!!! YOU CAN'T JUST HAVE A DIRECTORY WITH MULTIPLE PARENTS!!! YOU ARE BREAKING THE ASSUMPTION OF LACK OF LOOPERINOS NOOOOOOOOO

>A: Haha OS-driven regex engine go brrrrr

i absolutely love little toy things like this that probably shouldn't exist but do regardless, and even more so I love it when they close on a silly and playful note like this. this is a rather interesting concept and it reminds me a lot of the idea of glitterbombing from more occult/esoteric circles of the internet (performing acts of obscurity and aloof strangeness to degrade the meaning of consensual reality and expose people to a perspective of life they otherwise would not spend much time engaging in, sorta conceptually similar to Zen koans)


It’s a popular meme format.


This strikes me as a project that was created while drunk or high, probably high, possibly both.


I don't know about you, but some of us can generate a rich stream of dumb and useless ideas without the aid of substances.


I can generate those ideas sober just fine, it's the doing them part that requires the substances.


He must have reached Ballmer’s Peak (https://xkcd.com/323/)


Can I parse HTML with this?


Only if you are adequately insured against the accidental summoning of a Great Old One.



A while back I had a car stereo that would read USB drives in fat (fat32) format, but it had a terrible user interface, and searching/traversing was a chore.

I thought it would be a cool idea to hack the filesystem to allow you to have directories of albums or genres or artists all cross-linking to the same music files.

I now see my "big dreams" were actually limited in scope.


I think this could also be built out of symlinks in a Linux filesystem. This would be slightly more practical (though, of course, still not practical at all).


“Slightly more expressive” as you can create longer filenames with most Linux filesystems and more entries too.

I like that phrase as it sounds like “better” when we’re talking about, as you say, something wonderfully useless.


As long as you don't mind the depth limit of 40 symlinks on Linux and I think 31 on Windows.


    /include/linux/namei.h:13    #define MAXSYMLINKS 40

Only takes one recompile to raise the limit - someone's next project can be regex2symlinks-in-a-bootable-initramfs.
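For anyone curious what the symlink variant might look like, here is a rough sketch. It assumes a POSIX system with symlink support, and the regex `a*b`, the state names `start` and `accept`, and the `MATCH` marker file are all choices made for this illustration rather than anything regex2fat does.

```python
import os
import tempfile  # convenient for trying this out in a scratch directory


def build_symlink_dfa(root: str) -> None:
    """Lay out a hypothetical DFA for the regex a*b as directories and symlinks."""
    start = os.path.join(root, "start")
    accept = os.path.join(root, "accept")
    os.mkdir(start)
    os.mkdir(accept)
    # The transition on 'a' loops back to the start state.
    os.symlink(start, os.path.join(start, "a"))
    # The transition on 'b' leads to the accepting state.
    os.symlink(accept, os.path.join(start, "b"))
    # A marker file exists only in accepting states.
    open(os.path.join(accept, "MATCH"), "w").close()


def matches(root: str, text: str) -> bool:
    """Turn each character into a path component and ask the filesystem."""
    # Only plain letters and digits are allowed, so '.' or '/' cannot
    # change the meaning of the path.
    if any(not ch.isalnum() for ch in text):
        return False
    path = os.path.join(root, "start", *text, "MATCH")
    return os.path.exists(path)
```

Calling `build_symlink_dfa(tempfile.mkdtemp())` and then `matches` on the same directory lets the kernel's path resolution do the matching. Every character follows one symlink, so on Linux one would expect inputs longer than roughly the `MAXSYMLINKS` value quoted above to fail resolution and come back as non-matching.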


What happens if my regex contains aux.h?



I think the filesystem supported naming files PRN, CON, AUX, LPT, COM1, etc. just fine, it's just DOS wouldn't let you "get to" them because it (and by "it" I'm not sure if COMMAND.COM or lower level DOS functions) parsed those characters as special devices.

I think you could make those files manually in a disk or sector editor just fine, and it might even show up in DIR, but of course accessing them via normal DOS usage would be hard if not impossible.


IDK man, I've switched to Linux but it wouldn't let me delete aux.h using Windows Explorer IIRC. But it was a long time ago so not sure.


It will show up as something like /A/U/X/DOT/H


I read this first as "Turning your Registry into FAT32" and thought... oh, no. You do not want to do that.

Regex though... this is humorous.


This is a very stupid question and I apologize in advance, but could somebody explain to me (as if I were a five year old) what this does?

Like literally what does it do? I'm assuming there is a theoretical use case, even if as a toy project just for shits and giggles but I'm completely at a loss.


Not sure I can bring it down to the level of a five year old, but here’s something that might help: regular expressions need to be evaluated to see if they match, and how you do that is you build up a state machine called a DFA that essentially looks like “if I see an ‘a’, then I should go to a state that will look for the letters ‘b’ or ‘c’”. What this project does is encode those transitions in the filesystem, so that the states are directories. If you see a particular letter, you basically “cd” into that letter and you’ll be in the next state.
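A minimal in-memory sketch of that idea, using the example from the comment (an 'a' followed by 'b' or 'c', i.e. the regex `a(b|c)`). The state names and the `matches` helper are hypothetical. Each dict plays the role of a directory and each key the role of a subdirectory you could "cd" into.

```python
# Hypothetical DFA for the regex a(b|c); state names are made up for illustration.
# Each state stands in for a directory, each key for a subdirectory name.
DFA = {
    "start": {"a": "saw_a"},
    "saw_a": {"b": "accept", "c": "accept"},
    "accept": {},
}
ACCEPTING = {"accept"}


def matches(text: str) -> bool:
    """Walk the DFA one character at a time, like a chain of 'cd' commands."""
    state = "start"
    for ch in text:
        next_state = DFA[state].get(ch)
        if next_state is None:
            # No such "subdirectory": the input cannot match.
            return False
        state = next_state
    # The input matches only if we end up in an accepting state.
    return state in ACCEPTING
```

Here `matches("ab")` and `matches("ac")` succeed, while `"a"` stops in a non-accepting state and `"abc"` runs out of transitions.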


This is hilarious! I was excited hoping that it was a fuse file system that let you mount a view of another file system with regex though, something that would be a pretty useful tool.


I also wondered the same .. if it weren't some generalised way to apply regex to implement FAT32 (read and write) .. instead I think it's a mapping of DFA to dentry semantics. Still pretty neat.


I love these kinds of projects! Any description of it that has me cackling by the 2nd sentence is gonna be a gem. Good thing I brought my FAT32 driver.


/A/A/A/A/A/A/A/A/A/A/A/A/A/H/H/H/H/MATCH


James Cain did something similar on Windows using his user-space SMB2 server.

https://www.youtube.com/watch?v=tDUL3wEs2ew

You could also write a Samba VFS module that does the same thing with incoming filenames.


What is this? I'm so curious but nothing of this rings a bell with me. I mean I know what a regular expression is, and I've formatted several USBs to FAT32, but DFAs and everything in between have me Googling like crazy, still in the dark though.


It converts your regexp into a virtual filesystem and then you can test if a string matches the regexp by converting the string into a path and testing if that path is contained in the filesystem.
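The string-to-path step might look roughly like this. The naming scheme is an assumption pieced together from paths quoted elsewhere in this thread (one uppercase directory per character, `DOT` for a period, and a final `MATCH` entry). The real tool may name things differently, and `string_to_path` is a made-up helper.

```python
from pathlib import PurePosixPath

# Assumed mapping for characters that cannot appear literally in a name.
SPECIAL = {".": "DOT"}


def string_to_path(text: str, root: str = "/") -> PurePosixPath:
    """Turn each character into one path component, ending in MATCH."""
    parts = [SPECIAL.get(ch, ch.upper()) for ch in text]
    return PurePosixPath(root, *parts, "MATCH")
```

With this scheme, `string_to_path("aux.h")` yields `/A/U/X/DOT/H/MATCH`. Testing a string against the regex then amounts to checking whether that path exists on the mounted image.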


Presumably this is useful if you can place files into (or modify a file at) the location for a particular regexp.


I wouldn't necessarily call this useful for anything in particular. It's more of a silly trick to show off than anything. (And a reminder of how regexes work behind the scenes.)


That wouldn’t work. This works by hard linking a lot of directories to each other, so if you were to write a file there, it ends up in multiple locations (possibly all ‘directories’ in the file system)

The file system could work around that by creating new unique directories whenever you write any file to an empty directory, but that would require it to keep the regular expression it represents around, and would fill up the file system quite rapidly.
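The aliasing problem can be shown without touching a real disk. In this toy model, Python dicts stand in for directories, and one dict reachable under several names stands in for a directory with multiple parents. The layout (a loop on `A`, an accepting directory under `B`) and the `resolve` helper are invented for the example.

```python
# Toy model: one dict object per directory; sharing the object models
# a directory that is linked from more than one parent.
accept: dict = {}
start: dict = {"B": accept}
start["A"] = start  # the 'A' entry loops back to the same directory


def resolve(root: dict, path: str) -> dict:
    """Follow a slash-separated path through the nested dicts."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node[part]
    return node


# "Write a file" through one path...
resolve(start, "/A/A/B")["NOTE.TXT"] = b"hello"
```

After that write, `NOTE.TXT` is also visible at `/B` and `/A/B`, because every one of those paths ends at the same directory object.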


RegExp and DFAs are the same computation class, so it's useful sometimes to model them as each other.


No idea why you're being downvoted. Also, classical comp. sci technique of reducing your problem into another that's already been solved (with good enough time/mem bounds).


It’s frequently useful, as many regex engines construct DFAs under the hood :)


Not as frequent as you might expect! Most regex engines in common use don't use an automata based implementation, and instead use backtracking. This lets them implement additional non-regular features such as backreferences and recursion.

Even automata-based regex engines don't usually build up a full DFA, since the size of the DFA may be exponential in the size of the regex. Instead, an NFA simulation might be used, or a hybrid NFA/DFA that builds the DFA during match time, but typically doesn't build out the full DFA.
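A small sketch of the NFA simulation with lazily cached DFA states. The NFA below is hand-written for the textbook regex `(a|b)*abb` and has no epsilon transitions. The table layout, names, and cache are illustrative assumptions, not how any particular engine does it.

```python
# Hand-written NFA for (a|b)*abb: (state, char) -> set of next states.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPT = {3}

# Each set of NFA states acts as one DFA state; transitions are computed
# on first use and cached, so only the DFA states actually visited exist.
_dfa_cache: dict = {}


def step(states: frozenset, ch: str) -> frozenset:
    key = (states, ch)
    if key not in _dfa_cache:
        _dfa_cache[key] = frozenset(
            nxt for s in states for nxt in NFA.get((s, ch), ())
        )
    return _dfa_cache[key]


def nfa_matches(text: str) -> bool:
    current = frozenset({START})
    for ch in text:
        current = step(current, ch)
        if not current:
            # No NFA state survives, so no match is possible.
            return False
    return bool(current & ACCEPT)
```

The set of live states is advanced once per input character, so there is no backtracking, and the cache only ever holds the state sets that some input has reached.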


This was what made it all make sense for me:

https://perl.plover.com/Regex/article.html


Did you ever think you wanted your favorite regex as FAT32? Because if you did, you have two problems.


Great.

Now someone build one that can compile new regexes in fat32.


Love the readme.

haha regex engine go brrrrr



