Drop the "Hey GitHub" nonsense (hopefully it's only for illustration purposes anyway) and … this will be a generational paradigm change in how to write code… if it works. The hard part will be editing code with your voice too. Like "no, I meant …" etc.

VERY PROMISING. In any case, you can just fill the gaps manually with the keyboard!




Generational? Idk. I work for a company that regularly sends out surveys, and there are several tools for integrating voice into them. Willingness to speak instead of type is quite low across respondents (a representative population sample). It looks as if speaking to a machine does not hold the same appeal as speaking to a human (something that can also be seen in telephone queue screener questions).


I hate talking to machines. Sometimes it's the best option (I love using a voice assistant in the kitchen), but almost always I'd rather have a full keyboard as an interface instead.

If machines were amazing at speech-to-text, okay, sure. But while the capabilities are impressive, they still kinda suck at it.


The only voice control I use at all is to create reminders on my iPhone, and the only reason I use that is because the default Reminders app's UX is so bad that it's quicker to use voice commands.

I don't see how speech to code would be faster than typing. And even if it is, typing speed is not really the limiting factor in the speed at which I can produce code.


Speech to text is now amazing: https://huggingface.co/spaces/openai/whisper
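
If you want to try it locally, here's a minimal sketch using the open-source whisper package (the model size and file name here are just examples):

  # pip install openai-whisper (also needs ffmpeg installed)
  import whisper

  model = whisper.load_model("base")          # "tiny" is faster, "large" is more accurate
  result = model.transcribe("recording.mp3")  # any ffmpeg-readable audio file
  print(result["text"])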


It got "Theresa's and Aidsdorm" for "Turisas and Alestorm". Surprisingly, it got pretty close with the German band Schandmaul (something Alexa recognizes 100% of the time as Sean Paul), transcribing to "Schandmauel or Schandmöhrl".

But yeah, that is pretty close to amazing.

I kinda forgot about it after seeing that the Rhasspy community experimented with it, and it had issues with short utterances and a slow startup time.


The first amazing result would not be you programming with this, but somebody with a phone being able to automate a few small tasks with just their voice.


Exactly. We've seen constant improvement in the tech for decades. I remember phones having "voice control" even before color LCDs, and today we have assistants that are orders of magnitude more sophisticated.

Yet it hasn't stuck. I use Siri exclusively to set timers. Most people are like me, or don't use it at all. Some use assistants for googling factoids or something. Fidelity-wise, it's really underwhelming.

It's not a social acceptance issue, because people would still use it at home, and they don't. There's a small chance some key UI insight is missing (discoverability, for one), but I doubt it. Even with a perfect UI, natural language is quite flawed when you're dealing with technical details (see the exhibit on variable naming).

Anyway, the chances of GitHub solving this in an exceptionally difficult subdomain, as a side project, seem like a... let's say, long shot.

That said, the silver lining in all these billions spent on voice interfaces is accessibility. For some people, these things are a lifesaver.


This is not marketed as an alternative input mechanism for people who have otherwise no difficulty typing code. It's an input mechanism for people whose ability to type is limited.


Yes, but this responds to the grandparent, not the parent.

This means it's an assistive technology, but hardly "a generational paradigm change in how to write code".


If it works about as well as Apple Siri or Google Assistant (or whatever it's called), then it will be... totally useless? I can't imagine they'll deliver a better product than something two of the richest companies in the world can't even figure out (perfectly). And if I need to read through and fix all my code for typos, I can just write it myself.

Because in my experience it is very often like "Call Peter" -> "Today it's sunny in NY".


To be fair, Siri was really good on the phone before iOS 15; it very rarely got a word wrong. I don't know what they changed, but it went belly up for me, and many other people have said the same.

On macOS it still seems pretty good. I have carpal tunnel syndrome, and by Thursday or Friday most weeks I end up using Siri to dictate, not code, but a lot of conversations in Slack, pull requests, iMessage, etc. In fact, I wrote this reply with Siri just now.


I don't know what version number, but when it was new I could depend on it to do things like sending SMS while driving, changing the navigation, etc.

Now it's barely worth attempting, because it gets it wrong more than it gets it right.


I definitely notice a difference in quality depending on network latency. I thought quite a bit of the processing was done locally now, but latency still seems to play a big part in its quality.


The iPhone's ability to convert speech to text has always been good. It's always been Siri's capacity to take a meaningful action from the recognized speech that has been problematic.


I've been trying to use Siri while driving more and more, and it's amazing how distracting it is compared to peeking at the screen (which is naughty, I know, I try not to do it).

But yeah, something about talking to a device which gets things wrong all the time is ridiculously distracting, at least for me.

Sometimes I look back at the road after trying to work out what it interpreted, and I feel scared by how focused on the phone I became.


>I can't imagine they'll deliver a better product than something two of the richest companies in the world

Code is much more constrained by language syntax though.

Even for the "call Peter" example, while the input is easy, the range of inputs that Siri has to handle and differentiate it from is huge.

Of course this is still a problem for e.g. defining variable names, where you could say anything.


In my experience, OpenAI's Whisper speech recognition is beyond anything else currently out there. GitHub will likely use it on the backend.


> I can't imagine they'll deliver a better product than something two of the richest companies in the world can't even figure out

Is either of those companies investing particularly heavily in voice agents? Certainly neither of them has anything with anywhere near the power of something like Copilot.

Also, a general agent is way different from one that's specific to writing code.


Somehow Google has gotten worse in the last couple of years.


It seems wonderful for people who can't as easily use a keyboard, but for most people, this doesn't seem any easier than using a keyboard. Am I missing something?


I use a Czech keyboard layout on my Mac, because Czech has some letters that don't exist on a US keyboard, and I don't like switching between layouts. So basically all "programming" characters (braces, brackets, parentheses, apostrophes, quotation marks, pipes, colons) are behind modifiers.

I would totally enjoy being able to tell my IDE to "call foo with bar and string hello there end string with a block of gee times two" or something, instead of:

  foo(:bar, "Hello there") { |gee| gee * 2 }
Just not having to think about typing all those symbols would be a serious quality-of-life feature for me.


>So basically all "programming" characters (braces, brackets, parentheses, apostrophes, quotation marks, pipes, colons) are behind modifiers.

Poland ditched a similar QWERTZ-based layout in favour of this: https://pl.wikipedia.org/wiki/Plik:Polish_programmer%27s_lay...

It's basically the standard US layout but the right alt (AltGr) is a modifier. So, for example, AltGr+A gives "ą".

I don't see why something similar can't be done for the Czech alphabet.


> I don't see why something similar can't be done for the Czech alphabet.

It probably could — we already can't fit all the letters with diacritics on the number row, so "ď, ť, ň, ó" are key combos. But as far as I know, Czech uses diacritics a bit more than Polish (e.g. for sounds that are digraphs in Polish), consider:

"Že se nestydÍŠ, nutit lidi psÁt ČeskÉ speciÁlnÍ znaky pomocÍ dvojhmatŮ!" — that's 10 modifiers just for the diacritics.

Having ALL diacritics as modifier combos would make typing actual texts even more annoying than programming is now.


My solution is just not to use Czech characters; seems to work well so far :D


Have you tried the UCW layout? English-like keyboard, but with a bonus modifier key that produces Czech (and other) letters. I use it and it's so much better than the traditional Czech layout.



If voice dictation were a killer feature, everybody would use it all the time for ordinary texts. But for some reason only a few (lawyers? doctors?) use it.


I believe that's mostly because it doesn't work reliably. Doctors, lawyers, architects, etc. have a somewhat limited professional vocabulary and often say the same things, so voice recognition works pretty well for them. But when you write a random message, you have a much broader range of topics, and dictation that fails quickly turns the whole thing from an improvement into an ordeal. "No, not 'or deal'. Delete word. Delete word. Delete word. O-R-D-E-A-L. Yes, that's it. No, don't write that, sigh."


"… this will be a generational paradigm change in how to write code… if it works."

Why?

I can't really see myself working like this in an office, on a plane, in a cafe, with music on (my favorite way to code), or in the house where my partner is also working. Then, as others have said, editing might suck.

If it were a neural link, then I'd be in agreement.


> The hard part will be editing code with your voice too

The hard part will be open plan offices.

It's bad enough that so many meetings are now Zoom/Teams, and proximity to coworkers means you end up hearing their side of their meetings.

Just wait until all the devs are coding this way too.


It's the future I always imagined as a child. A vast divider-less cubicle scape of people in Patagonia vests who define all caps constants by yelling at their standing desks.

"USER!! UNDERSCORE LIMIT!! EQUALS TWO THOUSAND AND FORTY EIGHT!"


I dunno, we already have stuff like Krisp's AI background voice cancellation. I don't think completely cancelling out background talking is far away. This is already huge for things like pair programming while one person is in the office and one is at home. If the person in the office has noise-cancelling headphones too (with a bit of white noise), you can have a pretty perfect call in a noisy room. (not sponsored)

https://www.youtube.com/watch?v=ILfTrUreS00


I'm not bothered by my call quality, which as you noticed is fine; I'm bothered by all the other people speaking (sometimes quite loudly) on their calls while I'm not on a call :-)


True, that's where I find noise-cancelling headphones with some white noise help a lot. But I feel you.


Crazy idea – whisper to use your computer. Might produce some quality ASMR in an open plan office.


> this will be a generational paradigm change in how to write code… if it works

Why?


I could see it maybe being important once GitHub Copilot is embedded in it? You tell it roughly what you want and then adapt by hand. But it is kinda funny seeing the parent make such claims so early.


  > once GitHub Copilot is embedded
That's exactly the point of the demo, no?


Yup, my bad for jumping to conclusions. This certainly seems worth exploring.


How can it not be a paradigm change when it changes the way people write code from "write by hand" to "generated by AI with natural language"?

The problem with speech to code has always been that precise syntax is hard, but AI codegen solves that.

So, no, it might not take off, but I feel like if it does, then it means AI codegen will become the dominant way code is crafted.

That would be paradigm shifting.

It’s inconceivable that it wouldn’t be.


> The problem with speech to code has always been that precise syntax is hard

The biggest problem is that talking sucks. You can presumably handle voice input as well as is humanly possible, yet here we are typing to you anyway, and for good reason. Even if the natural language part is nailed, you may as well type in that natural language.

I imagine it will bring some quality of life improvements to those with certain disabilities, but I don't see why the typical developer would want to go in that direction.


> generated by AI with natural language

I don't want to disparage their work, because it's really impressive, but "fill null values of column Fare with average column values" is closer to AppleScript than it is to natural language.
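
For reference, here's roughly what that command expands to in pandas; a sketch assuming the demo's Titanic-style dataframe (the file name is made up):

  import pandas as pd

  df = pd.read_csv("titanic.csv")  # hypothetical input file

  # "fill null values of column Fare with average column values"
  df["Fare"] = df["Fare"].fillna(df["Fare"].mean())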


It doesn’t matter.

It solves the problem of having to speak obscure code syntax aloud, like "close parenthesis semicolon newline".

That's enough to lower the barrier to entry for many people. I don't know how good it is in practice, but it's disingenuous to suggest it's not offering a novel solution to an old problem.


If you prefer that, why not just use a language like AppleScript or Inform 7?


> …because speaking is easier than writing, obviously?

Easier isn’t always better.


The example puts it quite well. You kind of know what you want to achieve, step by step, but are not so comfortable with your tools.

Usually this kind of exploratory work involves a lot of googling and copy-pasting snippets from Stack Overflow without putting too much time into trying to deeply understand things. If you get what you want out of it, great; if not, back to Google.


This only works remote. Used in the office, that'd be madness.


I can't wait to edit my Unreal blueprints using voice commands. Truly the future of programming.



