I suspect this is where Windows backwards compatibility bites them a bit. I've got a very old tool [1] that uses WriteProcessMemory and CreateRemoteThread to create a thread in the command process that launched it to remotely change the directory in that process.
It works to this day, despite looking exactly like what malware would do. My tool is nothing in the grand scheme, but I suspect I'm not the only one doing this sort of shenanigans, and no doubt some big important app is doing it and can't be bothered to fix itself, so MS is stuck supporting it.
I ended up using the same basic layout for the database behind a little IP lookup tool I wrote to make lookups somewhat responsive from JavaScript [1]. It ends up working out pretty well.
My internship project involved a tool to look up individual IP addresses as well as check a range of address blocks. I ended up implementing a trie data structure, which performed well in terms of storage and run-time efficiency.
My initial approach was to store IP addresses in a hashmap, but if the average query was to check address blocks like /24, then it'd turn out to be a very poor data structure to search through.
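For anyone curious what the trie approach looks like, here is a minimal sketch of a binary prefix trie doing longest-prefix match. It is not the internship code: the `PrefixTrie` class, its method names, and the IPv4-only restriction are all assumptions made for illustration.

```python
import ipaddress


class PrefixTrie:
    """Binary trie keyed on the leading bits of an IPv4 prefix (hypothetical sketch)."""

    def __init__(self):
        # Each node is a list: [child for bit 0, child for bit 1, stored value or None].
        self.root = [None, None, None]

    def insert(self, cidr, value):
        """Store value under an IPv4 block such as '10.1.2.0/24'."""
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self.root
        # Walk only as many bits as the prefix length, creating nodes as needed.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = value

    def lookup(self, address):
        """Return the value of the longest stored prefix containing address, or None."""
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node[2]
        for i in range(32):
            node = node[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # a longer matching prefix overrides a shorter one
        return best


trie = PrefixTrie()
trie.insert("10.0.0.0/8", "big block")
trie.insert("10.1.2.0/24", "small block")
print(trie.lookup("10.1.2.77"))
```

A lookup walks at most 32 nodes no matter how many blocks are stored, whereas a hashmap keyed on single addresses has no natural way to answer "which block contains this address".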
WhisperX, along with whisper-diarization, runs at something around 20x real time on audio with a modern GPU, so for that part, you're looking at around $1 per twenty hours of content to run it on a g5.xlarge, not counting time to build up a node (or around 1/2 that for Spot prices, assuming you're much luckier than I am at getting stable spot instances these days).
You can short circuit that time to build up a node a bit with a prebaked AMI on AWS, but there's still some amount of time before a new node can start running at speed, around 10 minutes in my experience.
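The back-of-the-envelope math above can be sketched like this. The function and parameter names are made up for illustration, the roughly $1/hour rate is just what the figures above imply rather than a quoted AWS price, and treating node startup as fully billed time is an assumption.

```python
def transcription_cost(content_hours, speedup=20.0, hourly_rate=1.0, startup_minutes=10.0):
    """Estimate the dollar cost of transcribing content_hours of audio on one node.

    speedup:         how many times faster than real time the GPU runs (assumed ~20x)
    hourly_rate:     instance price in dollars per hour (assumed ~$1, about half on Spot)
    startup_minutes: time before a new node runs at speed (assumed billed in full)
    """
    compute_hours = content_hours / speedup
    billed_hours = compute_hours + startup_minutes / 60.0
    return billed_hours * hourly_rate


# Twenty hours of audio on an already-warm node, then on a fresh node.
print(transcription_cost(20, startup_minutes=0))
print(round(transcription_cost(20), 2))
```

The startup overhead only matters for short jobs; over a long batch it gets amortized away.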
I haven't looked at this particular solution yet, but I really find the LLMs to be hit or miss at summarizing transcripts. Sometimes it's impressive, sometimes it's literally "informal conversation between multiple people about various topics".
The current directory is managed with SetCurrentDirectory/GetCurrentDirectory. However, the cmd.exe command-line shell also stores the current directory for each drive in an environment variable like "=C:", and the CRT and shell hide all environment variables that start with a "=".
It gets mightily confused if these two concepts of current directory ever diverge.
Who is still using cmd.exe? I understand that there are system processes that still need it, but if you see a human using cmd in the year of our lord 2023, that’s a cry for help!
On my Windows 10 with no profile it takes 1-2 seconds (Ryzen 3600/M2/32GB RAM). Like, what is it doing? I get annoyed if bash on Linux takes like 250ms.
It's installed everywhere on any version of Windows and works fine for interactive tasks (personally I wouldn't write anything but the simplest scripts for it; anything with for loops is a big no-no).
I do. I don't like PowerShell (and it took me years to realize it wasn't a diagnostic tool for power management), and I find bash for Windows to be ill-fitting. I don't do a lot of stuff in the command line on Windows, so working like it has for decades is a plus.
I wanted to see if I could do the lookup work client-side, and also include some more metadata about cloud providers' IPs (region, service, etc). Not that it's really better, just a toy idea I had.
Extra scandalous too since stuff like load balancers will use a bunch of IPv4 addresses that they can now charge extra for, with no way to avoid the fees since you can't turn off IPv4.
I asked it to summarize the transcript for a podcast about an episode of Star Trek. One minor issue in grammar, but otherwise, it does a remarkable job of making a summary:
> In the Star Trek episode "The Cage", Captain Pike is held captive by a race of advanced televisions. He is offered a life of luxury in a cage, but he realizes that this is not the life he wants and manages to escape. The televisions watch everything on TV and have become too voyeuristic and passive. At the end of the episode, Pike makes the right choice and is able to see the difference between the falsehood of the fantasy and the difficult but more acceptable reality. This episode highlights the importance of exploration and being better than oneself, and is a reminder of the dangers of becoming too passive and voyeuristic. It also shows how technology can be used to create illusions and how these illusions can be used to manipulate people.
I wrote MicroKeys[1] out of a similar frustration. Granted, I never got past the POC stage, so it's not as feature rich as AHK, but it solved a very specific itch I had. I debate if I should flesh it out further, or try using AHK again.
Long ago I had my podcast downloader keep all files it downloads and recently I've been using OpenAI's Whisper to go through and create transcripts of the 8000 or so hours of data I have downloaded over the years.
It's very cool to be able to search through and remind myself of something I heard once. Not exactly life changing, but still, nice to be able to quickly drill down and find audio for something when a curiosity strikes me.
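As a rough idea of what that kind of search can look like, here is a minimal sketch. It assumes each transcript was saved as a JSON file in the shape Whisper produces (a "segments" list whose entries carry "start" and "text"); the `search_transcripts` name and the folder layout are hypothetical, not how my setup actually works.

```python
import json
from pathlib import Path


def search_transcripts(folder, phrase):
    """Yield (file name, start seconds, text) for each segment containing phrase.

    Assumes one Whisper-style JSON file per episode in folder.
    """
    needle = phrase.lower()
    for path in sorted(Path(folder).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        for seg in data.get("segments", []):
            # Plain case-insensitive substring match; no stemming or ranking.
            if needle in seg["text"].lower():
                yield path.name, seg["start"], seg["text"].strip()
```

The start offset is what makes it possible to jump straight to the matching spot in the audio. For thousands of hours, a real full-text index would likely be a better fit than scanning every file.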
What kind of hardware do you have that makes it feasible to process thousands of hours of podcasts? I want to do the same but I’ve heard that Whisper requires some serious GPU might for decent accuracy (Linux Unplugged podcast specifically).
Yep, it takes a bit of GPU RAM. I'm using 3 machines with NVIDIA 3080 or better. I let them go for a few weeks over the winter break when I was mostly disconnected from the tech world. The workers prioritized podcasts I'm personally likely to want to search, and got through almost a third of my archive.
Now it's down to 1 or 2 machines depending on what's going on, so it'll take much longer to finish up, but I'm in no rush.
This includes data from 1995 on. The early data is backfill of radio shows that transitioned to podcasts and dumped old episodes in their feed at some point. My reader itself started in 2012, and since then I've downloaded around 7000 hours of new podcasts, which works out to 1.7 hours per day. So, around 2 hours per day, since I don't listen every day, and to be fair, I haven't listened to every podcast I've downloaded; some don't interest me. But 1-2 hours of listening a day is the sweet spot for me.
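The per-day figure is just total hours divided by elapsed days. A quick sketch, where both dates are assumptions (the comment only says the reader started in 2012) and `hours_per_day` is a made-up helper:

```python
from datetime import date


def hours_per_day(total_hours, start, end):
    """Average hours of audio per calendar day between start and end."""
    return total_hours / (end - start).days


# Assumed window: start of 2012 through start of 2023.
rate = hours_per_day(7000, date(2012, 1, 1), date(2023, 1, 1))
print(f"{rate:.1f} hours per day")
```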
[1] https://github.com/seligman/ccd