Drop the "Hey Github" nonsense (hopefully it's only for illustration purposes anyways) and … this will be a generational paradigm change in how to write code… if it works. The hard part will be editing code with your voice too. Like "no, I meant …" etc.
VERY PROMISING; in any case, you can just fill the gaps manually with the keyboard!
Generational? Idk. I work for a company that regularly sends out surveys, and there are several tools for integrating voice into them. Willingness to speak instead of type is quite low across respondents (a representative population sample). It looks as if speaking to a machine does not hold the same appeal as speaking to a human (something that can also be seen in telephone queue screener questions).
I hate talking to machines. Sometimes it's the best option (I love using a voice assistant in the kitchen), but I'd almost always rather have a full keyboard as an interface instead.
If machines were amazing at Speech-to-Text, okay, sure. But while the capabilities are impressive, they still kinda suck at it.
The only voice control anything I use is to create reminders on my iPhone, and the only reason I use that is because the default Reminders app's UX is so bad that it's quicker to use the voice commands.
I don't see how speech-to-code would be faster than typing. And even if it is, typing speed is not really a limiting factor in the speed at which I can produce code.
It got "Theresa's and Aidsdorm" for "Turisas and Alestorm". Surprisingly, it got pretty close with the German band Schandmaul (something Alexa recognizes 100% of the time as Sean Paul), transcribing to "Schandmauel or Schandmöhrl".
But yeah, that is pretty close to amazing.
I kinda forgot about it after seeing that the Rhasspy community had experimented with it and found issues with short utterances and a slow startup time.
The first amazing result would not be for you to program with this, but for somebody with a phone to be able to automate a few small tasks with just their voice.
Exactly. We've seen constant improvement in the tech for decades. I remember phones having "voice control" even before color LCDs, and today we have assistants that are orders of magnitude more sophisticated.
Yet it hasn't stuck. I'm exclusively using Siri to set timers. Most people are like me, or don't use it at all. Some use assistants for googling factoids or something. Fidelity-wise, it's really underwhelming.
It's not a social-acceptance issue, because people would still use it at home, and they don't. There's a small chance some key UI insight is missing (discoverability, for one), but I doubt it. Even with perfect UI, natural language is quite flawed when you're dealing with technical details (see the exhibit on variable naming).
Anyway, the chances of GitHub solving this in an exceptionally difficult subdomain, as a side project, seem like a... let's say, long shot.
That said, the silver lining in all these billions spent on voice interfaces is accessibility. For some people, these things are a lifesaver.
This is not marketed as an alternative input mechanism for people who otherwise have no difficulty typing code. It's an input mechanism for people whose ability to type is limited.
If it works about as well as Apple's Siri or Google's "Hey Google" (or whatever it's called), then it will be... totally useless? I can't imagine they'll deliver a better product than something two of the richest companies in the world still haven't figured out (perfectly). And if I need to read and adjust all my code for typos, I can just write it myself.
Because in my experience it is very often like "Call Peter" -> "Today it's sunny in NY".
To be fair, Siri was really good on the phone before iOS 15; it very rarely got a word wrong. I don't know what they changed, but it went belly up for me, and many other people have said the same.
On macOS it still seems pretty good. I have carpal tunnel syndrome, and by Thursday or Friday most weeks I end up using Siri to dictate, not code, but a lot of conversations in Slack, pull requests, iMessage, etc. In fact, I wrote this reply with Siri just now.
I definitely notice there's a difference in quality depending on your network latency. I thought quite a bit of the processing was done locally now, but latency still seems to play a big part in its quality.
The iPhone's ability to convert speech to text has always been good. It's always been Siri's capacity to take a meaningful action from the recognized speech that has been problematic.
I've been trying to use Siri while driving more and more; it's amazing how distracting it is compared to peeking at the screen (which is naughty, I know, and I try not to do it).
But yeah, something about talking to a device which gets things wrong all the time is ridiculously distracting, at least for me.
Sometimes I look back at the road after trying to work out what it interpreted, and I'm scared by how focused on the phone I became.
> I can't imagine they'll deliver a better product than something two of the richest companies in the world
Code is much more constrained by language syntax though.
Even for the "call peter" example, while the input is easy, the expected range of inputs that Siri should handle and be able to differentiate it from is huge.
Of course this is still a problem for e.g. defining variable names, where you could say anything.
> I can't imagine they'll deliver a better product than something two of the richest companies in the world still haven't figured out
Are either of those companies investing particularly heavily in voice agents? Certainly neither of them has anything anywhere near as powerful as Copilot.
Also, a general agent is way different from one that's specific to writing code.
It seems wonderful for people who can't as easily use a keyboard, but for most people, this doesn't seem any easier than using a keyboard. Am I missing something?
I use a Czech keyboard layout on my Mac, because Czech has some letters that don't exist on a US keyboard, and I don't like switching between layouts. So basically all "programming" characters (braces, brackets, parentheses, apostrophes, quotation marks, pipes, colons) are behind modifiers.
I would totally enjoy being able to tell my IDE to "call foo with bar and string hello there end string with a block of gee times two" or something, instead of:
foo(:bar, "Hello there") { |gee| gee * 2 }
Just that, not having to think about how to type the different symbols, would be a serious quality-of-life feature for me.
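To make that concrete, here's a toy sketch (my own illustration, not any real tool's grammar) of the kind of spoken-token-to-symbol table such a dictation mode would need; all the names are made up:

    # Python: toy mapping from spoken tokens to the symbols that hide
    # behind modifiers on a Czech layout. Illustration only.
    SPOKEN_SYMBOLS = {
        "open paren": "(",
        "close paren": ")",
        "open brace": "{",
        "close brace": "}",
        "pipe": "|",
        "colon": ":",
        "string": '"',  # "string ... end string" would open and close quotes
    }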
> I don't see why something similar can't be done for the Czech alphabet.
It probably could — we already can't fit all the letters with diacritics on the number row, so "ď, ť, ň, ó" are key combos. But as far as I know, Czech uses diacritics a bit more than Polish (e.g. for sounds that are digraphs in Polish), consider:
"Že se nestydÍŠ, nutit lidi psÁt ČeskÉ speciÁlnÍ znaky pomocÍ dvojhmatŮ!" — that's 10 modifiers just for the diacritics.
Having ALL diacritics as modifier combos would make typing actual texts even more annoying than programming is now.
Have you tried the UCW layout? English-like keyboard, but with a bonus modifier key that produces Czech (and other) letters. I use it and it's so much better than the traditional Czech layout.
If voice dictation were a killer feature, everybody would use it all the time for ordinary texts. But for some reason only a few (lawyers? doctors?) use it.
I believe that's mostly because it doesn't work reliably. Doctors, lawyers, architects, etc. have a somewhat limited professional vocabulary and often say the same things, so voice recognition works pretty well for them. But when you write a random message, you have a much broader range of topics, and dictation that fails quickly turns the whole thing from an improvement into an ordeal. "No, not 'or deal'. Delete word. Delete word. Delete word. O-R-D-E-A-L. Yes, that's it. No, don't write that, sigh."
"… this will be a generational paradigm change in how to write code… if it works."
Why?
Can't really see myself working like this in an office, on a plane, in a cafe, with music on (my favorite way to code), or in the house where my partner is also working. And, as others have said, editing might suck.
It's the future I always imagined as a child. A vast divider-less cubicle scape of people in Patagonia vests who define all caps constants by yelling at their standing desks.
"USER!! UNDERSCORE LIMIT!! EQUALS TWO THOUSAND AND FORTY EIGHT!"
I dunno, we already have stuff like Krisp AI background voice cancellation. I don't think completely cancelling out background talking is far off. This is already huge for things like pair programming while one person's in the office and one is at home. If the person in the office has noise-cancelling headphones too (with a bit of white noise), you can have a pretty much perfect call in a noisy room. (Not sponsored.)
I'm not bothered by my call quality, which, as you noticed, is fine; I'm bothered by all the other people speaking (sometimes quite loudly) on their calls while I'm not on a call :-)
I could see it maybe becoming important once GitHub Copilot is embedded in it? You tell it roughly what you want and then adapt by hand. But it is kinda funny seeing the parent make such claims so early.
> The problem with speech to code has always been that precise syntax is hard
The biggest problem is that talking sucks. You presumably can handle voice input as well as is possible, yet here we are typing to you anyway, and for good reason. Even if the natural language part is nailed, you may as well type in that natural language.
I imagine it will bring some quality of life improvements to those with certain disabilities, but I don't see why the typical developer would want to go in that direction.
I don't want to disparage their work, because it's really impressive, but "fill null values of column Fare with average column values" is closer to AppleScript than it is to natural language.
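For reference, that command presumably compiles down to a pandas one-liner along these lines (my reconstruction; the file name is hypothetical, and the demo's actual generated code may differ):

    import pandas as pd

    df = pd.read_csv("titanic.csv")  # hypothetical input with a Fare column
    # "fill null values of column Fare with average column values":
    df["Fare"] = df["Fare"].fillna(df["Fare"].mean())

The mapping is short and regular enough that the spoken phrase reads as a verbal macro for it, which is the AppleScript comparison in a nutshell.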
It solves the issue of trying to speak obscure code syntax like “close parenthesis semicolon newline”.
That's enough to lower the barrier to entry for many people; I don't know how good it is in practice, but it's disingenuous to suggest it's not offering a novel solution to an old problem.
The example puts it quite well. You kind of know what you want to achieve, step by step, but are not so comfortable with your tools.
Usually this kind of exploratory work involves a lot of googling and copy-pasting snippets from Stack Overflow without putting too much time into trying to deeply understand things. If you get what you want, great; if not, back to Google.