This is a great example of how security has changed: from a model where applications installed by the system administrator were "trusted" and access control focused on users, to one where the main threat is malicious applications, all nominally under the user's control but with unwanted behaviour.

Directly going into another process's memory is a bit antisocial and unreliable, so it's normally only seen in debuggers and cheat engines. Some systems use shared memory deliberately to communicate, but that's set up specially (MAP_SHARED etc).
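
For contrast, deliberate sharing looks roughly like this. A minimal POSIX sketch, with the object name and size made up and most error handling skipped:

    // Two cooperating processes can map the same named object; one creates it.
    // (Names and sizes here are illustrative, not from any real system.)
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstdio>

    int main() {
        int fd = shm_open("/demo_region", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        ftruncate(fd, 4096);

        // MAP_SHARED is what makes writes visible to the other process.
        void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        std::strcpy(static_cast<char*>(p), "hello from one process");

        munmap(p, 4096);
        close(fd);
        return 0;
    }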

What's far more common is shared code "injection": Windows shell extensions, for example. Keyboards and input method helpers. Or OS hook mechanisms: AutoHotkey. COM/OLE. And if you want to be baffled and terrified, "OLE Automation", which lets you RPC into Microsoft Word.

I tried searching for detailed explanations and came across this absolute horror: https://supportline.microfocus.com/documentation/books/nx30b... (how to use ActiveX from Object COBOL)




Just wanted to point out that screen readers and other disability tools are a very important reason to read and modify another application's memory. My understanding is this was a primary motivator for the Detours library. Though I can't seem to find the MS Research blog post talking about this from nearly 20 years ago; I'm wondering if there's some revisionist history that's happened :(


Screen readers should not require reading (or modifying!!) a process's memory; platform UI toolkits make this information available programmatically.


Except when they don't, or don't give enough context. A top-level comment here[0] gives an example where text was internally rendered to a bitmap prior to display, so a MS library was needed to hook the relevant API calls to get at the source text. Unfortunately (in this exceptional case), native UI toolkits aren't always as nice as webpage DOM. We'll probably start experiencing the same issue on the Web as soon as WebAssembly gets popular enough for people to develop UI toolkits on top of it.

--

[0] - https://news.ycombinator.com/item?id=23251509


I should have probably said "good platform UI toolkits" :P


> screen readers and other disability tools are a very important reason to read and modify another application's memory.

That has historically been true on Windows, and to the extent that people still need to use legacy applications that use GDI (the Win32 graphics API), it sometimes helps to use a screen reader that uses such hacks. But for applications using any modern graphics stack, we must rely on programmatic accessibility APIs like UI Automation.

I have some expertise in this area. I started developing a Windows screen reader in late 2004 (for a tiny company, long before I joined the Windows accessibility team at Microsoft). Naturally I started with the proper programmatic accessibility API at the time, Microsoft Active Accessibility (MSAA). I quickly encountered MSAA's severe limitations and had to turn to other techniques, such as sending window messages to Win32 edit controls to get the text under the cursor or the current selection. Internet Explorer and the Office apps had COM-based object models, so I used those as well.
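
For concreteness, the window-message fallback looks roughly like this (a sketch of the technique, not the original code; the helper name is made up):

    // Ask a standard Win32 edit control for its current selection using
    // EM_GETSEL and WM_GETTEXT. Error handling is trimmed for brevity.
    #include <windows.h>
    #include <string>

    std::wstring GetEditSelection(HWND edit) {
        // EM_GETSEL returns the selection start/end as character offsets.
        DWORD start = 0, end = 0;
        SendMessageW(edit, EM_GETSEL, (WPARAM)&start, (LPARAM)&end);

        // WM_GETTEXT copies the control's full text into our buffer.
        int len = (int)SendMessageW(edit, WM_GETTEXTLENGTH, 0, 0);
        std::wstring text(len + 1, L'\0');
        SendMessageW(edit, WM_GETTEXT, len + 1, (LPARAM)&text[0]);
        text.resize(len);

        if (end > (DWORD)len) end = (DWORD)len;
        if (start > end) start = end;
        return text.substr(start, end - start);
    }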

About a month into developing my screen reader, I realized I was going to need something more. I kept running into pieces of UI that I couldn't access through MSAA, window messages, or an object model. I came across a technique called API hooking, that is, patching user-space Windows API functions in memory at runtime, and I realized that I could do this with GDI functions. I knew that other Windows screen readers -- all of the serious ones -- hooked into GDI, but they did it in kernel space, by installing a fake graphics driver that would receive all the function calls from the GDI subsystem in the kernel, pass the information to the screen reader, then forward the calls to the real graphics driver. I figured that with API hooking, I could do something like that in user space. Note that I didn't do the low-level patching (i.e. rewriting x86 instructions) myself; I found a cheap commercial library that did that part. But I wrote the code to apply the technique to the specialized domain of screen reading.
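
To give a feel for what that kind of user-space API hooking looks like, here is a sketch using Microsoft Detours (mentioned upthread) rather than the commercial library described here; a real screen reader's hook did far more bookkeeping than this:

    // Patch the first instructions of TextOutW so calls land in our function,
    // record what was drawn, then forward to the real TextOutW.
    #include <windows.h>
    #include <detours.h>

    static BOOL (WINAPI* TrueTextOutW)(HDC, int, int, LPCWSTR, int) = TextOutW;

    BOOL WINAPI HookedTextOutW(HDC hdc, int x, int y, LPCWSTR str, int len) {
        // A screen reader would record (str, len) plus position, font, and
        // color information from the device context into its off-screen model.
        return TrueTextOutW(hdc, x, y, str, len);
    }

    void InstallHook() {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach(&(PVOID&)TrueTextOutW, HookedTextOutW);
        DetourTransactionCommit();
    }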

I should mention that I was late to the screen reader game. The real pioneers started releasing Windows screen readers in the early 90s. And one of those pioneers came up with a term for what we were doing, which became common across the industry. They called it an off-screen model. It basically worked like this: a screen reader would hook GDI functions like TextOut, FillRect, BitBlt, etc., and use the information gathered from those function calls, as well as whatever state it could query from the GDI device context, to build a model of what was on the screen. For each piece of text, the model would have information like the text string itself, the bounding rectangle, the color, the width of each character, the font name, the font weight, and probably some other things I've forgotten (I wrote most of that code in 2005 and haven't worked with it in a long time). Whenever a rectangle was filled (e.g. through FillRect or ExtTextOut), we'd note the color, so we could use it for things like detecting whether a given piece of text was highlighted and should be spoken automatically. And we had to keep track of which GDI bitmaps represented graphics, and which ones were text that had been drawn into off-screen memory and then blitted onto the screen. In short, it got really complicated. But it was the only way we could provide access to some important applications.
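
To make that description concrete, a record in such an off-screen model might look something like this (field names are invented, not from any shipping screen reader):

    #include <windows.h>
    #include <string>
    #include <vector>

    struct OsmTextRun {
        std::wstring text;           // the string passed to TextOut/ExtTextOut
        RECT bounds;                 // where it landed, in screen coordinates
        COLORREF textColor;          // from the device context at draw time
        COLORREF backColor;          // last fill color under this rectangle
        std::vector<int> charWidths; // per-character advance widths
        std::wstring fontName;
        int fontWeight;              // e.g. FW_NORMAL, FW_BOLD
    };

    struct OsmBitmap {
        HBITMAP handle;
        std::vector<OsmTextRun> runs; // text drawn into this off-screen bitmap,
                                      // carried along when it's blitted on screen
    };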

There were limits to what we could do with this kind of hooking and hacking, though. In early 2006, I heard that TurboTax wasn't accessible with any screen reader. Full of the hubris that comes from having some success at something difficult (edit: and being young), I bought a copy of TurboTax (though I had no intention of using it), installed it, and jumped in to see how I could make it work with the product I had been developing. I quickly discovered why none of the other screen reader developers had been able to make it work. I've long since forgotten the details, but the problem was something like this: Instead of drawing text into a GDI bitmap and then using BitBlt (or one of the similar functions) to blit the bitmap onto the screen, TurboTax would draw into a GDI bitmap, transfer the contents of that bitmap into normal memory, and then somehow blit from that memory onto the screen. Because of this roundabout approach, a screen reader's off-screen model lost track of the bitmap, and by the end, saw it as nothing but an opaque image.

So, off-screen models were fragile even with the fairly simple GDI. I don't think any screen reader developer ever shipped an off-screen model that worked with more complex graphics APIs like DirectX or OpenGL. By the time major non-gaming applications on Windows started replacing GDI, about 10 years ago, it was clear that programmatic accessibility APIs like UI Automation would be the only reliable solution going forward. And now, as a developer on the Windows accessibility team at Microsoft, I work on the Narrator screen reader, which relies exclusively on UI Automation. Sometimes I miss the ability that I had to hack accessibility into an application that wasn't accessible by design; it felt like a superpower. But I know that the days of being able to do that are over, for many good reasons.
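
By contrast, a minimal UI Automation client that simply asks for the focused element's name looks something like this (error handling trimmed):

    // Query the accessibility tree programmatically instead of scraping
    // drawing calls: get the focused element and read its Name property.
    #include <windows.h>
    #include <uiautomation.h>
    #include <cstdio>

    int main() {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);

        IUIAutomation* uia = nullptr;
        CoCreateInstance(__uuidof(CUIAutomation), nullptr, CLSCTX_INPROC_SERVER,
                         __uuidof(IUIAutomation), reinterpret_cast<void**>(&uia));

        IUIAutomationElement* focused = nullptr;
        if (uia && SUCCEEDED(uia->GetFocusedElement(&focused)) && focused) {
            BSTR name = nullptr;
            focused->get_CurrentName(&name);
            wprintf(L"Focused element: %s\n", name ? name : L"(no name)");
            SysFreeString(name);
            focused->Release();
        }

        if (uia) uia->Release();
        CoUninitialize();
        return 0;
    }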


Indeed, we are slowly heading toward that authoritarian dystopia envisioned by Stallman over 20 years ago:

https://www.gnu.org/philosophy/right-to-read.en.html

In 2047, Frank was in prison, not for pirate reading, but for possessing a debugger.


Oh yeah, I remember <pretty big accounting app>, which used OLE hacks to use Word, Excel, and Access as a backend.



