This is a great example of how security has changed: from a model where applications installed by the system administrator were "trusted" and access control focused on users, to one where the main threat is malicious applications, all nominally under the user's control but with unwanted behaviour.

Directly going into another process's memory is a bit antisocial and unreliable, so it's normally only seen in debuggers and cheat engines. Some systems use shared memory deliberately to communicate, but that's set up specially (MAP_SHARED etc).
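
For contrast, deliberate sharing looks roughly like this. A minimal POSIX sketch, with the object name and size made up and most error handling skipped:

    // Two cooperating processes can map the same named object; one creates it.
    // (Names and sizes here are illustrative, not from any real system.)
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstdio>

    int main() {
        int fd = shm_open("/demo_region", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        ftruncate(fd, 4096);

        // MAP_SHARED is what makes writes visible to the other process.
        void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        std::strcpy(static_cast<char*>(p), "hello from one process");

        munmap(p, 4096);
        close(fd);
        return 0;
    }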

What's far more common is shared code "injection": Windows shell extensions, for example. Keyboards and input method helpers. Or OS hook mechanisms: AutoHotkey. COM/OLE. And if you want to be baffled and terrified, "OLE Automation", which lets you RPC into Microsoft Word.

I tried searching for detailed explanations and came across this absolute horror: https://supportline.microfocus.com/documentation/books/nx30b... (how to use ActiveX from Object COBOL)




Just wanted to point out that screen readers and other disability tools are a very important reason to read and modify another application's memory. My understanding is this was a primary motivator for the Detours library. Though I can't seem to find the MS Research blog post talking about this from nearly 20 years ago; I'm wondering if there's some revisionist history that's happened :(


Screen readers should not require reading (or modifying!!) a process's memory; platform UI toolkits make this information available programmatically.


Except when they don't, or don't give enough context. A top-level comment here[0] gives an example where text was internally rendered to a bitmap prior to display, so a MS library was needed to hook the relevant API calls to get at the source text. Unfortunately (in this exceptional case), native UI toolkits aren't always as nice as webpage DOM. We'll probably start experiencing the same issue on the Web as soon as WebAssembly gets popular enough for people to develop UI toolkits on top of it.

--

[0] - https://news.ycombinator.com/item?id=23251509


I should have probably said "good platform UI toolkits" :P


> screen readers and other disability tools are a very important reason to read and modify another application's memory.

That has historically been true on Windows, and to the extent that people still need to use legacy applications that use GDI (the Win32 graphics API), it sometimes helps to use a screen reader that uses such hacks. But for applications using any modern graphics stack, we must rely on programmatic accessibility APIs like UI Automation.

I have some expertise in this area. I started developing a Windows screen reader in late 2004 (for a tiny company, long before I joined the Windows accessibility team at Microsoft). Naturally I started with the proper programmatic accessibility API at the time, Microsoft Active Accessibility (MSAA). I quickly encountered MSAA's severe limitations and had to turn to other techniques, such as sending window messages to Win32 edit controls to get the text under the cursor or the current selection. Internet Explorer and the Office apps had COM-based object models, so I used those as well.
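
For concreteness, the window-message fallback looks roughly like this (a sketch of the technique, not the original code; the helper name is made up):

    // Ask a standard Win32 edit control for its current selection using
    // EM_GETSEL and WM_GETTEXT. Error handling is trimmed for brevity.
    #include <windows.h>
    #include <string>

    std::wstring GetEditSelection(HWND edit) {
        // EM_GETSEL returns the selection start/end as character offsets.
        DWORD start = 0, end = 0;
        SendMessageW(edit, EM_GETSEL, (WPARAM)&start, (LPARAM)&end);

        // WM_GETTEXT copies the control's full text into our buffer.
        int len = (int)SendMessageW(edit, WM_GETTEXTLENGTH, 0, 0);
        std::wstring text(len + 1, L'\0');
        SendMessageW(edit, WM_GETTEXT, len + 1, (LPARAM)&text[0]);
        text.resize(len);

        if (end > (DWORD)len) end = (DWORD)len;
        if (start > end) start = end;
        return text.substr(start, end - start);
    }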

About a month into developing my screen reader, I realized I was going to need something more. I kept running into pieces of UI that I couldn't access through MSAA, window messages, or an object model. I came across a technique called API hooking, that is, patching user-space Windows API functions in memory at runtime, and I realized that I could do this with GDI functions. I knew that other Windows screen readers -- all of the serious ones -- hooked into GDI, but they did it in kernel space, by installing a fake graphics driver that would receive all the function calls from the GDI subsystem in the kernel, pass the information to the screen reader, then forward the calls to the real graphics driver. I figured that with API hooking, I could do something like that in user space. Note that I didn't do the low-level patching (i.e. rewriting x86 instructions) myself; I found a cheap commercial library that did that part. But I wrote the code to apply the technique to the specialized domain of screen reading.
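
To give a feel for what that kind of user-space API hooking looks like, here is a sketch using Microsoft Detours (mentioned upthread) rather than the commercial library described here; a real screen reader's hook did far more bookkeeping than this:

    // Patch the first instructions of TextOutW so calls land in our function,
    // record what was drawn, then forward to the real TextOutW.
    #include <windows.h>
    #include <detours.h>

    static BOOL (WINAPI* TrueTextOutW)(HDC, int, int, LPCWSTR, int) = TextOutW;

    BOOL WINAPI HookedTextOutW(HDC hdc, int x, int y, LPCWSTR str, int len) {
        // A screen reader would record (str, len) plus position, font, and
        // color information from the device context into its off-screen model.
        return TrueTextOutW(hdc, x, y, str, len);
    }

    void InstallHook() {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach(&(PVOID&)TrueTextOutW, HookedTextOutW);
        DetourTransactionCommit();
    }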

I should mention that I was late to the screen reader game. The real pioneers started releasing Windows screen readers in the early 90s. And one of those pioneers came up with a term for what we were doing, which became common across the industry. They called it an off-screen model. It basically worked like this: a screen reader would hook GDI functions like TextOut, FillRect, BitBlt, etc., and use the information gathered from those function calls, as well as whatever state it could query from the GDI device context, to build a model of what was on the screen. For each piece of text, the model would have information like the text string itself, the bounding rectangle, the color, the width of each character, the font name, the font weight, and probably some other things I've forgotten (I wrote most of that code in 2005 and haven't worked with it in a long time). Whenever a rectangle was filled (e.g. through FillRect or ExtTextOut), we'd note the color, so we could use it for things like detecting whether a given piece of text was highlighted and should be spoken automatically. And we had to keep track of which GDI bitmaps represented graphics, and which ones were text that had been drawn into off-screen memory and then blitted onto the screen. In short, it got really complicated. But it was the only way we could provide access to some important applications.
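
To make that description concrete, a record in such an off-screen model might look something like this (field names are invented, not from any shipping screen reader):

    #include <windows.h>
    #include <string>
    #include <vector>

    struct OsmTextRun {
        std::wstring text;           // the string passed to TextOut/ExtTextOut
        RECT bounds;                 // where it landed, in screen coordinates
        COLORREF textColor;          // from the device context at draw time
        COLORREF backColor;          // last fill color under this rectangle
        std::vector<int> charWidths; // per-character advance widths
        std::wstring fontName;
        int fontWeight;              // e.g. FW_NORMAL, FW_BOLD
    };

    struct OsmBitmap {
        HBITMAP handle;
        std::vector<OsmTextRun> runs; // text drawn into this off-screen bitmap,
                                      // carried along when it's blitted on screen
    };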

There were limits to what we could do with this kind of hooking and hacking, though. In early 2006, I heard that TurboTax wasn't accessible with any screen reader. Full of the hubris that comes from having some success at something difficult (edit: and being young), I bought a copy of TurboTax (though I had no intention of using it), installed it, and jumped in to see how I could make it work with the product I had been developing. I quickly discovered why none of the other screen reader developers had been able to make it work. I've long since forgotten the details, but the problem was something like this: Instead of drawing text into a GDI bitmap and then using BitBlt (or one of the similar functions) to blit the bitmap onto the screen, TurboTax would draw into a GDI bitmap, transfer the contents of that bitmap into normal memory, and then somehow blit from that memory onto the screen. Because of this roundabout approach, a screen reader's off-screen model lost track of the bitmap, and by the end, saw it as nothing but an opaque image.

So, off-screen models were fragile even with the fairly simple GDI. I don't think any screen reader developer ever shipped an off-screen model that worked with more complex graphics APIs like DirectX or OpenGL. By the time major non-gaming applications on Windows started replacing GDI, about 10 years ago, it was clear that programmatic accessibility APIs like UI Automation would be the only reliable solution going forward. And now, as a developer on the Windows accessibility team at Microsoft, I work on the Narrator screen reader, which relies exclusively on UI Automation. Sometimes I miss the ability that I had to hack accessibility into an application that wasn't accessible by design; it felt like a superpower. But I know that the days of being able to do that are over, for many good reasons.
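
By contrast, a minimal UI Automation client that simply asks for the focused element's name looks something like this (error handling trimmed):

    // Query the accessibility tree programmatically instead of scraping
    // drawing calls: get the focused element and read its Name property.
    #include <windows.h>
    #include <uiautomation.h>
    #include <cstdio>

    int main() {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);

        IUIAutomation* uia = nullptr;
        CoCreateInstance(__uuidof(CUIAutomation), nullptr, CLSCTX_INPROC_SERVER,
                         __uuidof(IUIAutomation), reinterpret_cast<void**>(&uia));

        IUIAutomationElement* focused = nullptr;
        if (uia && SUCCEEDED(uia->GetFocusedElement(&focused)) && focused) {
            BSTR name = nullptr;
            focused->get_CurrentName(&name);
            wprintf(L"Focused element: %s\n", name ? name : L"(no name)");
            SysFreeString(name);
            focused->Release();
        }

        if (uia) uia->Release();
        CoUninitialize();
        return 0;
    }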


Indeed, we are slowly heading toward that authoritarian dystopia envisioned by Stallman over 20 years ago:

https://www.gnu.org/philosophy/right-to-read.en.html

In 2047, Frank was in prison, not for pirate reading, but for possessing a debugger.


Oh yeah, I remember <pretty big accounting app>, which used OLE hacks to use Word, Excel, and Access as a backend.



