Rewriting pixels to add new features to closed-source software (washington.edu)
135 points by chaosmachine on April 2, 2010 | 40 comments



Summary:

UI researchers can't easily add new usability features to existing software, and this is a particular problem with closed-source software. "Prefab" is a tool that looks at the pixels on the display and infers what the underlying UI widgets are. Once this is done, additional features can be added retroactively. This functionality is platform independent and is demonstrated running on YouTube videos (Flash), Mac, and PC.
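The summary doesn't say how Prefab's recognition actually works. As a deliberately naive illustration of "inferring widgets from pixels", here is a plain-Python sketch that searches a screenshot for exact copies of a known widget image. `find_exact` and the toy screen are hypothetical; this is not Prefab's method, and real screens (themes, anti-aliasing, widgets drawn at many sizes) would need something far more robust.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels, top to bottom

def find_exact(screen: Image, template: Image) -> List[Tuple[int, int]]:
    """Return the (row, col) of every position where `template` matches `screen` pixel for pixel."""
    if not screen or not template or not template[0]:
        return []
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            # Compare row slices; all() stops at the first mismatching row.
            if all(screen[r + i][c:c + tw] == template[i] for i in range(th)):
                hits.append((r, c))
    return hits

# Toy example: a 4x4 white "screen" with a 2x2 checker pattern at row 1, col 1.
W, B = (255, 255, 255), (0, 0, 0)
screen = [[W] * 4 for _ in range(4)]
screen[1][1] = B
screen[2][2] = B
print(find_exact(screen, [[B, W], [W, B]]))  # [(1, 1)]
```

Once a widget's position is known this way, an overlay could draw new behaviour around it without the application's cooperation.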

Examples of functionality that can be added:

"Bubble cursor": the UI element nearest the cursor is highlighted, as if the cursor's hit area dynamically grew to reach that element.

Dynamic change of mouse acceleration when the cursor is over a UI element

Animation to indicate state changes in UI elements such as tabs, sliders and checkboxes

Generation of a preview of a multi-parameter space in graphics tools such as GIMP or Photoshop. This works by automatically changing several sliders to affect parameters in a graphics transform and recording the preview image. The results are then dynamically displayed allowing users to view the effects of parameter changes in parallel using a grid of output images.
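The parameter-space preview described above boils down to enumerating slider combinations and capturing one preview per combination. A minimal sketch, assuming a hypothetical `render` callback that stands in for "set the sliders through the recognized widgets and grab the preview pixels"; the slider names and ranges are made up for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

def linspace(lo: float, hi: float, n: int) -> List[float]:
    """n evenly spaced values from lo to hi, inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def sweep_grid(sliders: Dict[str, Tuple[float, float]], steps: int,
               render: Callable[[Dict[str, float]], object]
               ) -> List[Tuple[Dict[str, float], object]]:
    """Try every combination of `steps` values per slider and collect one preview each."""
    names = list(sliders)
    axes = [linspace(sliders[name][0], sliders[name][1], steps) for name in names]
    grid = []
    for values in product(*axes):
        settings = dict(zip(names, values))
        grid.append((settings, render(settings)))  # one cell of the preview grid
    return grid

# Hypothetical stand-in for driving the real sliders and capturing the preview image.
def fake_render(s: Dict[str, float]) -> str:
    return f"preview(blur={s['blur']}, contrast={s['contrast']})"

cells = sweep_grid({"blur": (0.0, 10.0), "contrast": (0.0, 1.0)}, steps=3, render=fake_render)
print(len(cells))  # 9
```

The grid grows as steps raised to the number of sliders, so this kind of side-by-side preview is only practical for a handful of parameters.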


The technology they demonstrate is impressive and clever. That's just a neat way to improve GUIs. But what they use their method to do (implement bubble cursor and sticky icons) seems like it would harm usability. With bubble cursor, the downside is that any accidental click of the mouse (those happen with touch pads on laptops) is guaranteed to trigger some GUI element. Also, if you are at the midpoint of two GUI elements, a tiny, tiny shift in the mouse could lead to clicking the wrong thing. That could be very bad when the "save draft" and "discard" buttons are right next to each other, as in Gmail. The upside is... that I don't have to move my mouse quite as far? But if not having to move your mouse as far is a good thing, why implement the sticky elements feature, which guarantees you will have to move your mouse much farther after moving over an element? Both the bubbles and the sticky things seem designed to make UI elements much easier to click on, but unless you have muscle control problems (which I suppose some people do, especially the elderly) I don't think that's generally a good thing. There are too many buttons that do things you just can't undo.


I just put together a javascript + canvas demo of bubble cursors based on the demo in the video. It does make it a lot easier to target small elements.

http://mwomwo.nfshost.com/bubblecursor/bubbles.html

I agree with you that stickiness doesn't sound like a great feature. Clicking the wrong thing also seems like a risk, but that might be better solved by making buttons not do things that you can't undo.

Bubble cursors seem to me like they could definitely be a win.
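The core selection rule of a bubble cursor, as described in the summary and in this demo, is "whatever target is nearest the cursor gets captured". A simplified sketch, assuming widgets are axis-aligned rectangles and distance is measured to the nearest edge; `Target`, `distance_to_target`, and `bubble_target` are hypothetical names, and this is not the demo's actual code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Target:
    """A hypothetical clickable widget, approximated by an axis-aligned rectangle."""
    name: str
    x: float
    y: float
    w: float
    h: float

def distance_to_target(px: float, py: float, t: Target) -> float:
    """Euclidean distance from the cursor to the nearest point of the rectangle (0 if inside)."""
    dx = max(t.x - px, 0.0, px - (t.x + t.w))
    dy = max(t.y - py, 0.0, py - (t.y + t.h))
    return math.hypot(dx, dy)

def bubble_target(px: float, py: float, targets: list[Target]) -> Target | None:
    """Return the target closest to the cursor: the one a bubble cursor would capture."""
    if not targets:
        return None
    return min(targets, key=lambda t: distance_to_target(px, py, t))

buttons = [Target("save", 0, 0, 80, 30), Target("discard", 100, 0, 80, 30)]
print(bubble_target(85, 15, buttons).name)  # save (5 px away vs. 15 px)
```

Note that the cursor never has to be over a button for a click to land on one, which is exactly the accidental-click concern raised in the parent comment.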


Upvote for you, for awesome effort.

But your demo confirms all my suspicions about bubble cursor. Hang out at the midpoint between three bubbles of all different sizes, and it just seems incredibly unintuitive to me which one is highlighted. Yes, it makes small things easier to click on (sometimes much easier than gigantic things), but as I said above, I don't think that's a good thing because of all the buttons with functions that can't be undone. You mentioned making buttons that don't do things you can't undo, but how would that work in a save dialogue box, or when you've just typed up an angry email you never intended to send?


Thanks.

This story:

http://news.ycombinator.com/item?id=1235081

talks a bit about the points you've made. Basically you put a passive delay in that's long enough for people to cancel. It's not going to work for everything though - sometimes you really do want to start a process immediately.
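The "passive delay" idea, sketched as a pending action the user can still cancel. `DelayedAction` is a hypothetical name, and time is passed in explicitly (rather than using a real timer) so the logic is easy to follow and test; in a real UI, the `tick` calls would come from the event loop.

```python
from typing import Callable

class DelayedAction:
    """Hold an irreversible action for `delay` seconds so the user can cancel it."""

    def __init__(self, action: Callable[[], None], delay: float, now: float) -> None:
        self.action = action
        self.fire_at = now + delay
        self.cancelled = False
        self.fired = False

    def cancel(self) -> bool:
        """Cancel if still pending; returns True if the cancellation took effect."""
        if self.fired or self.cancelled:
            return False
        self.cancelled = True
        return True

    def tick(self, now: float) -> bool:
        """Run the action once the delay has elapsed; returns True only on the call that fires it."""
        if self.fired or self.cancelled or now < self.fire_at:
            return False
        self.fired = True
        self.action()
        return True

sent = []
pending = DelayedAction(lambda: sent.append("angry email"), delay=5.0, now=0.0)
pending.tick(2.0)   # too early: nothing happens
pending.cancel()    # the user changes their mind
pending.tick(6.0)   # cancelled, so nothing is sent
print(sent)         # []
```

The trade-off is the one pointed out a few comments down: every protected action now takes at least `delay` seconds to happen.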

The biggest benefit comes when you've got a large bit of empty space on one side of an object. That space then becomes clickable. In a world with big monitors it's harder to get easy UI wins by putting important widgets on the side of the screen. You could get a similar effect with these bubble cursors.


I just tried limiting the max bubble cursor radius in your demo and it seemed to prevent the unwanted gap-clicking problem well enough, while still helping buttons be easier to click. Not sure if it reduces the win in the proposed huge-screen example.


On my local copy I put in something so that it looks at the second closest object to the cursor. Then only selects if the nearest object is 40% closer than the second nearest. It works pretty well to prevent ambiguity, but still gets a bit confusing when there's a few objects of different sizes nearby. Quite want to try this functionality in a real user-interface now!
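The "only select if the nearest object is 40% closer than the second nearest" rule can be read as d1 <= (1 - 0.4) * d2, where d1 and d2 are the distances to the nearest and second-nearest targets. A sketch under that reading, assuming circular targets like the bubbles in the canvas demo; the function names and the `margin` parameter are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Circle = Tuple[str, float, float, float]  # (name, center_x, center_y, radius)

def edge_distance(px: float, py: float, c: Circle) -> float:
    """Distance from the cursor to the circle's edge (0 if the cursor is inside it)."""
    _, cx, cy, r = c
    return max(0.0, math.hypot(px - cx, py - cy) - r)

def select_unambiguous(px: float, py: float, targets: Sequence[Circle],
                       margin: float = 0.4) -> Optional[str]:
    """Name of the nearest target, or None when it is not clearly closer than the runner-up."""
    ranked = sorted(targets, key=lambda c: edge_distance(px, py, c))
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    d1 = edge_distance(px, py, ranked[0])
    d2 = edge_distance(px, py, ranked[1])
    # Accept only if the nearest is at least `margin` (as a fraction) closer than the second.
    return ranked[0][0] if d1 <= (1 - margin) * d2 else None

bubbles = [("a", 0.0, 0.0, 10.0), ("b", 100.0, 0.0, 10.0)]
print(select_unambiguous(30, 0, bubbles))  # a    (20 px vs. 60 px)
print(select_unambiguous(45, 0, bubbles))  # None (35 px vs. 45 px: too close to call)
```

A later comment's proposal, capping the bubble at 1/2 of the distance to the second closest object, makes the same selection decision as this test with `margin=0.5`: the capped bubble reaches the nearest target only when d1 <= 0.5 * d2.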


"Adding a passive delay" is very nearly a synonym for "adding latency". Not quite. But close.


Another fix for the midpoint issue would be to create a dead space threshold -- for example, if the threshold is set at 5 pixels then any area where a 5 pixel movement would change the bubble target is turned into non-clickable space. Possibly all potential targets could be highlighted in a different color, to indicate that the system isn't sure where you want to click.
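One way to compute that dead space without simulating every possible 5-pixel move: moving the cursor by T pixels changes its distance to any target by at most T, so the nearest target can only flip to a rival whose distance is within 2T of the nearest one. A sketch under that (conservative) approximation, assuming point-like targets; `dead_space_check` is a hypothetical name. It errs on the side of declaring dead space, and it returns every candidate so they can all be highlighted, as suggested above.

```python
import math
from typing import Dict, List, Optional, Tuple

def dead_space_check(px: float, py: float, targets: Dict[str, Tuple[float, float]],
                     threshold: float = 5.0) -> Tuple[Optional[str], List[str]]:
    """Return (clickable_target, ambiguous_candidates).

    If any rival lies within 2 * threshold of the nearest distance, the spot is treated
    as dead space: nothing is clickable and all candidates are returned for highlighting.
    """
    dists = {name: math.hypot(px - x, py - y) for name, (x, y) in targets.items()}
    if not dists:
        return None, []
    nearest = min(dists, key=dists.get)
    limit = dists[nearest] + 2 * threshold
    candidates = sorted(name for name, d in dists.items() if d <= limit)
    if len(candidates) == 1:
        return nearest, []
    return None, candidates

buttons = {"save": (0.0, 0.0), "discard": (100.0, 0.0)}
print(dead_space_check(10, 0, buttons))  # ('save', [])
print(dead_space_check(49, 0, buttons))  # (None, ['discard', 'save'])
```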


An idea to solve the problem with super-large selection bubbles that can capture multiple options with small changes:

Make the bubble max out at 1/2 (or some other fraction) of the distance to the second closest object. That way you have the same enhanced-selection, but you also avoid any contention points, and require that the cursor is still conceivably "close" to the target.


There is a reason this stuff is still experimental. However, it's easy to think of new options, like bubble cursor + maximum bubble size, that you'd want to explore. How about a bubble cursor preview with a touch interface? That seems like a pure win. But how about using the iPhone's keyboard-style preview when tapping non-keyboard icons, etc.?

What's hard is finding out how this impacts current applications, which is the whole point of this demo.


Very good points. What they do is interesting, but useless, at least in the way it is presented. The main problem is that you cannot bolt on good UI; it must be designed from the very beginning. Anything else is just lipstick on a pig.


But if you're stuck dealing with a pig, it may as well be a pretty pig.


Bubble cursor worries me far less than sticky icons. The mapping between mouse and pointer displacement is currently a well defined linear relationship that humans can internalize trivially.

I've always felt that mouse "acceleration" undermined that relationship, but at least it was in a way that mapped to the motion of the mouse.

Changing that to include all possible targets en-route to your destination seems almost malicious: I know where I want to move the pointer but other GUI elements in between will try to thwart me!


An easy fix would be a speed threshold. Studies show that most movements to targets involve a high-speed phase followed by a deceleration phase near the target (for example, the original Fitts's law studies, if you want to read up on it). So simply set a threshold for movement speed: if the mouse is moving faster than X, don't use sticky controls; once the speed drops below X, turn sticky controls on.
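A minimal sketch of that speed gate, assuming the pointer's recent positions are available as timestamped samples. The 300 px/s threshold and the 0.4 "sticky" gain are made-up placeholder values, and `speed` / `control_display_gain` are hypothetical names. The gain would scale each raw mouse delta (new_x = x + gain * dx), so 1.0 means normal movement and anything below 1.0 makes a target "sticky".

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (time_s, x_px, y_px)

def speed(samples: Sequence[Sample]) -> float:
    """Average pointer speed in px/s between the first and last sample in the window."""
    if len(samples) < 2:
        return 0.0
    (t0, x0, y0), (t1, x1, y1) = samples[0], samples[-1]
    dt = t1 - t0
    return math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0

def control_display_gain(samples: Sequence[Sample], over_target: bool,
                         speed_threshold: float = 300.0, sticky_gain: float = 0.4) -> float:
    """Slow the pointer over a target only in the slow, final-approach phase of a movement."""
    if over_target and speed(samples) < speed_threshold:
        return sticky_gain
    return 1.0

fast = [(0.0, 0.0, 0.0), (0.1, 100.0, 0.0)]  # about 1000 px/s: still travelling
slow = [(0.0, 0.0, 0.0), (0.1, 10.0, 0.0)]   # about 100 px/s: homing in on a target
print(control_display_gain(fast, over_target=True))  # 1.0
print(control_display_gain(slow, over_target=True))  # 0.4
```

Because fast passes over intermediate widgets keep a gain of 1.0, this addresses the "gravity wells en route" complaint while keeping the slowdown for the final approach.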


But the point of the technique is not that the specific experiments shown are known beforehand to be superior, but that it allows you to perform arbitrary UI experiments on every piece of software to find out which ideas are good and which are not.


Perhaps it's because it's late and I'm tired and a bit grumpy, but I can't see the benefit at all here. Only downsides.

1. I find the whole idea of teaching users to accept another programme pretending to be the one they're running, and intercepting all of its inputs, dangerous.

2. The bubble cursor would be annoying and frustrating far more often than it would be useful. Have another look at that video, and see just how far the mouse often is from the target it's selecting, especially when it's sometimes over the text of another question. What about cases where you don't want to click anything, and just want to bring that window to the front, or remove focus from the Flash video so you can press space to scroll (not pause)?

3. The slowing of the cursor over controls looks the most useful, but it would make navigating your application's interface like playing one of those games where your character keeps getting stuck in puddles of honey carelessly left lying about and slows dramatically. It'd turn moving your cursor into a game of dodge the gravity wells. Now picture your Mum accidentally moving her mouse into the centre of those six rows of toolbars she has in IE. She'll never be able to get it out again.


2. The example is a conceptual one; the points you raise can be fixed relatively easily. You can enable dead space by requiring the mouse to be at least X pixels from a potential target.

3. You could simply turn off sticky controls when the mouse is moving quickly. Target acquisition generally has two phases: a high-speed movement followed by a deceleration phase near the target. Just enable the sticky controls once the mouse speed drops below the appropriate threshold value.


I think some Logitech mice used to have haptic feedback, which would vibrate the mouse slightly as you moved over buttons etc. I haven't tried it, but I always thought that was a much better mechanism for "feeling" buttons and other UI elements than slowing the cursor.


Ok, this thing is cool, and they wrote a paper. Where do I get the code? It's a bit ironic to use the tagline "What if every GUI were open source" and not link to the code.


You could always email the author. Often times academic code is slow to be released because the types of things that tend to be thought of as required for software release (documentation, install help, bug fixes, coherent code organization) simply aren't there. The reward system set up by academia doesn't place value on those types of things, so it takes a long time to get them done in your free time. The result is people are hesitant to release their code because they know it will likely be difficult to put to "real" use without heavy refactoring.

Just my 2c.


I think this is pretty cool. Somewhat similar tools have been used to cheat in MMOs where a library would recognize and interpret the pixels and you'd write scripts to manipulate the underlying platform.

This is actually extremely flexible and could be used not just to decorate a user interface, but to provide a programmatic /scripted/ interface into an application that doesn't traditionally provide an API.


Impressive work which I could definitely foresee being used by malicious code to seamlessly hijack existing GUIs. The security implications are spine-chilling.


I'm quite sure that malicious code has been written to do similar things already. I think what they've done with the technique is what's of interest here, not the technique itself (reading pixels from the screen and intercepting the mouse / keyboard).


It is the standard method for scraping on-screen keyboards, which are meant to prevent phishing attacks.


Nothing that can't be done already, assuming the attacker can run code on your computer, in which case, controlling where you're trying to point the mouse is the least of your problems.


Yes. I like that the first example was an OSX password dialog. I'm sure there's no potential for misuse of this feature...


I am drooling over the idea of my webcam tracking my eye movements and combining that with the power of the bubble cursor. Just imagine: you would not need a mouse anymore, and all the problems of touch screens would be solved. Your cursor would simply be wherever you are looking.


blink twice to click?


blink 3 times for right click ;-)


Very cool research. Not exactly the same, but somewhat similar to what the Sikuli group at MIT (http://groups.csail.mit.edu/uid/sikuli/) is doing with visual programming. They can take their code, but they cannot take their output.

The future is bright for post-processing programs that modify another program's output in some way. Consider Greasemonkey, which modifies the HTML/DOM within a browser; this research, which modifies drawn pixels; Sikuli, which does programmatic image recognition; and the new JavaScript audio research within Mozilla, which allows one to create and record audio within the browser (http://vocamus.net/dave/?p=974). How will music labels react when it becomes easy to record audio output from HTML5 video to MP3 within the browser via a JavaScript library?


Interesting.

But it's only multi-platform in the sense that it can recognize the underlying widgets. I am also concerned about the added complexity of bolting a layer of behaviour onto software that's unaware of it and was not designed to take it into account.

I also wonder whether it is a coincidence that research focused on adding a layer of complexity to closed-source software comes from Washington.


It seems like there is an element of this being a means to an end; I don't think they really want everyone to use this. Rather, it widens what researchers in their field can work on, and it may persuade the companies that make this software to include solid UI research down the track.


A bit different from this, but OS X has allowed the editing of nib interface files in any application for years (though that ability seems to have unfortunately gone away for the most part in Snow Leopard, where most nibs are compressed to save space).


Some quite cool ideas... The image-processing-based approach is a clever way to avoid the platform-specific nature of UI toolkits, and to get around the fact that lots of apps do their own thing, which renders APIs like EnumWindows useless for stuff like this.


This reminds me of MIT's research initiative to automate any GUI using screenshots:

http://news.ycombinator.com/item?id=1072710


It's really interesting to see UI developments getting more focus recently. This looks like some really hefty work, and it's great to see it in action on applications I use every day. I might have to take some time to see what can be done in JavaScript; the expanding cursor is vaguely familiar, but it looks like a great idea.


Yes, yes, yes, to all of these GUI ideas in the next version of whatever GUI toolkit I'm using.


Genuine question: is it breaking HN guidelines to post a duplicate by adding a question mark to the end of the URL, as in this case? I did think this deserved more attention than the original post's low score.

http://news.ycombinator.com/item?id=1233669


That's cool. It's like Greasemonkey for your UI toolkit.



