Author here. There isn't a library around this yet, but the blog's source code is open source (MIT licensed): https://github.com/alexharri/website
The code for this post is all in PR #15 if you want to take a look.
I was investigating a fun webcam-to-ASCII project, so now I'm tempted to take a stab at porting the logic from the blog post into something reusable.
Update: I tested a port of the OP's method, written with Claude Code (Claude Opus 4.5) and some specific performance optimizations. Per the benchmarks, converting a 1024x1024 image to ASCII takes about 16 microseconds. I suspect that will drop further with more polish and iteration, but it's already fast enough for potentially real-time generation, even on mobile hardware.
The benchmark reports 15.654 µs. Rendering the ASCII text as a 1024x1024 image takes 2.8737 ms.
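A rough sanity check of the real-time claim, as a sketch: it assumes a 60 fps target and that conversion and text rendering run back to back on every frame (both assumptions, not something benchmarked here). The two timings above then use only a small fraction of the ~16.7 ms frame budget. These are the reported benchmark figures, not mobile measurements, so mobile hardware may well be slower.

```python
# Benchmark figures reported above, converted to seconds.
ASCII_CONVERSION_S = 15.654e-6  # 1024x1024 image -> ASCII
TEXT_RENDER_S = 2.8737e-3       # ASCII text -> 1024x1024 image


def frame_budget_share(stage_times_s, fps=60):
    """Return (total seconds, fraction of one frame's time budget).

    Assumption: the stages run sequentially on each frame, so their
    times simply add up. The 60 fps default is an assumed target.
    """
    total_s = sum(stage_times_s)
    frame_budget_s = 1.0 / fps
    return total_s, total_s / frame_budget_s


total_s, share = frame_budget_share([ASCII_CONVERSION_S, TEXT_RENDER_S])
print(f"total per frame: {total_s * 1e3:.3f} ms")  # total per frame: 2.889 ms
print(f"share of 60 fps budget: {share:.1%}")      # share of 60 fps budget: 17.3%
```

Almost all of that time is the text-to-image rendering step; the ASCII conversion itself is well under 1% of the frame budget.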
However, the ASCII output lacks diversity despite using the same technique, so I'll need to do significantly more testing, and this likely won't be released soon.