
When I talk to my Google Home, 50% of my brain power is engaged in predicting and working out how best to phrase something so that the "AI" understands what I mean, and the other 50% is used to actually think about what I want to accomplish in the first place. This is just about okay for things like switching lights on/off or requesting a nice song I want to listen to, but I could never be productive programming like this. When I'm in the zone I don't want to waste any mental capacity compensating for an imperfect AI; I want to be thinking 100% about what I want to code and just let my fingers do the work.

For that reason I think this will be less appealing to developers than GitHub may expect; otherwise I think it's a cool idea.




I think the biggest use case for this is accessibility. There are plenty of people who permanently or temporarily cannot use a keyboard (and/or mouse). This will be great for those users.

For the average dev, I agree this is more of a novelty.


I am highly suspicious of new tech coming in the guise of 'accessibility'. As someone going blind, a lot of things touted as good for me are cumbersome and bad.

Maybe this will be different, and that'd be neat. Though I just think having more ways to express code is neat. I also know the accessibility you're talking about isn't for blindness.

That being said I can talk about code decently well, but if you've never heard code come out of text-to-speech, well, it's painful.

I bring up the text-to-speech because if speech is input, it would make sense for speech to also be the output. Selfishly, getting a lot of developers to spend time coding through voice might end up with some novel and well thought out solutions.


For sight problems you are correct. But voice input is valuable by itself. I had chronic tendonitis in my wrists a few years ago. I looked into voice coding and it was difficult to set up. Fortunately for me I've been able to adapt with a vertical mouse and split keyboard.


You're looking at the product from your own point of view, and you are not the target group; it's that simple.


I do think there will be big advancements in the text-to-speech realm. I've noticed some ML projects imitating voices surprisingly well, and while it's not quite there yet, it's already a bit less grating than it was even a few years ago.


“I think there is a world market for maybe five computers.” - Thomas Watson

I bet if we use our imaginations, we’ll think of a lot of places where using voice to code could come in handy.

Personally, I’ve been waiting for it for a few decades.

The creator of Tcl has RSI and has been using voice input since the late 1990s:

https://web.stanford.edu/~ouster/cgi-bin/wrist.php

I thought we were really close 10 years ago when Tavis Rudd developed a system:

https://youtu.be/8SkdfdXWYaI

GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

This would help if you barely knew the language.

Time to learn Rust or Scala with a little help from machine learning.


> GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

To me, it looks like it's feeding your voice input to Copilot, which then generates the code output just as before. So the same strengths and weaknesses of Copilot apply (and you can probably mimic it locally with a voice input method you control: just dictate comments for Copilot).
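
If you wanted to try that locally, a minimal sketch might look like this (assuming OpenAI's open-source Whisper model for the speech-to-text step; the file names are placeholders):

    # Dictate a comment, append it to the file you're editing, and let Copilot
    # propose the completion in your editor as usual.
    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")
    result = model.transcribe("dictation.wav")  # a recorded voice note

    with open("scratch.py", "a") as f:
        # Copilot treats a trailing comment as the prompt for its next suggestion
        f.write("\n# " + result["text"].strip() + "\n")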


> “I think there is a world market for maybe five computers.” - Thomas Watson

That statement was probably never made. The closest thing to it came about 10 years after the quote is usually supposed to have happened, and it was about a single model of a single machine: https://geekhistory.com/content/urban-legend-i-think-there-w...


As a new dad, I would love to have the voice-to-text accuracy and speed I get on my Pixel phone on my desktop OS. Done right, I could easily see myself using it, and not just when I have my youngling in one arm, as I've been WFH for the better part of the last 6 years.


This looks to be much more heavily using GPT-3/Codex/Copilot, which I've found to be eerily effective. It basically feels like a voice interface to Copilot. The main difference between these and something like Google Home is how effectively they pick up on context. "Hey GitHub" would be able to use all the code in the file as context, so when you say "wrap this in a function", it'll have an idea of what you mean, without that function having to be explicitly programmed. Voice assistants have to _always_ be in a voice space, so context is very limited. And generally the way Google Home-style voice assistants are created is by programming specific actions linked to specific phrases. ML helps make the phrase matching flexible, but the action is usually entirely explicitly coded. Using Codex would let the action be ML-influenced as well.
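
Roughly, the difference looks something like this (a toy sketch with hypothetical names, not either product's actual code):

    # Google-Home-style: ML makes the phrase matching flexible, but every action
    # is hand-coded in advance; anything outside the table just fails.
    INTENTS = {
        "turn on the lights": lambda: print("lights on"),
        "play some jazz": lambda: print("playing jazz"),
    }

    def assistant(utterance: str) -> None:
        action = INTENTS.get(utterance.lower())
        if action:
            action()
        else:
            print("Sorry, I can't do that.")

    # Codex/Copilot-style: the current file is the context, and the "action" is
    # whatever code the model generates for the spoken instruction.
    def build_prompt(file_text: str, instruction: str) -> str:
        return file_text + "\n# " + instruction + "\n"  # this goes to the model

    assistant("turn on the lights")
    print(build_prompt("def add(a, b):\n    return a + b", "wrap this in a function"))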

If Copilot is any indicator of effectiveness, then I have high hopes for this! I've always wanted to program while stationary biking :)


I think yes, this could be a real multiplier for seniors: you're doing something you've done lots of times before, just a bit different, and you know pretty much everything you need to do, so you describe it until it's in a state where you can go through and finish it off. Exactly like a stationary bike, or out in the garden with your kid type thing.

IF the voice recognition is any good, of course. But maybe it can also be better than typical voice recognition because the syntax is limited: when programming I use a much more limited vocabulary than when writing literary criticism. So while speech-to-text is total crap at handling complex literary phrasing, it might be adequate for programming structures.


I’m a senior/systems architect coming down with bad carpal tunnel, and this sounds like a godsend.


Around 1998 I broke my collarbone and had to use Dragon Dictate.

I found that for general subjects it was quite difficult to use because of the fairly poor recognition rate.

But when I talked about computers, it got almost everything right. I assumed it must have been trained by the developers, who talked about computers mostly.

This is another special purpose vocabulary, so it seems as if it would have a good chance of a high recognition rate.


It’s most likely just Cortana bolted on to Copilot.


GitHub Next here! Just clarifying that this is not the case.


Then is it Whisper feeding into a command tree sort of thing?


I don't use voice assistants any more due to privacy concerns, but I wrote some similar software in the 2010s. I'm fluent in English, but with the current tech, the success rate when I give commands to a machine is still 50/50.

> I could never be productive programming like this.

It's likely to work much better than a generic speech-to-text model due to fine-tuning.

Plus, consciously or not, we will adapt our human language to the English-ML "pidgin" (e.g. by introducing more efficient grammatical structures and using a specific subset of vocabulary).

The way I see it is that it's not much different from giving commands to your dog, writing a Google query, writing a Stable Diffusion prompt. It'll get better. Manual input is not as fast as speech though and that's where I see the issue.


I am happy to take a severe deficit over not being able to work at all. When my back was acting up, I could not physically use my left side. Dictation was the ONLY way I could code. By the end of this period, my output was back up to 95% of my typed output - especially as I don’t type code nearly as fast as I do general language writing.


GitHub Next here! We would love to hear more about your experience. Please help us out by signing up for this experiment :)


The voice interface experience (in general) so far is like trying to make a really stupid person do something for you. Out-of-context misunderstandings are the worst, because they break your flow while you try to understand why they happen and how to fix them.

I imagine that voice-to-code would be like standing over the shoulder of a junior coder who knows the syntax and just enough techniques to follow orders, but has no idea what they're doing, and when they get it wrong, they'll be very wrong.


"Writing is thinking. To write well is to think clearly. That’s why it’s so hard." ~David McCullough

This holds not only for literature but also for programming. As for the hard part, I would argue that's exactly why it is not called "talking is thinking".


"If you're thinking without writing, you only think you are thinking." -Leslie Lamport

Even though speech recognition accuracy is really high now, I wonder how many authors use speech to write articles. The comparison may make sense, and I think there are few.


I think there's a difference between communicating your intent to a machine, which is hopeless since it has no model of intention; and commanding a machine to reproduce something.

I.e., when you're managing your house you want something that can be communicated in an infinite number of ways, but the "AI" accepts only a tiny, finite set of ways.

However, when programming, it seems like we aren't asking the machine to "write a function to do X", but rather saying, "def open-paren star args...."

This seems like a pretty trivial problem to solve.


> However, when programming, it seems like we aren't asking the machine to "write a function to do X", but rather saying, "def open-paren star args...."

Click the link first and take a look at what is being showcased, because your comment is the exact opposite of what they demo when you visit the HN link.


You're right... So, yes, it will be largely useless (as shown) for actual programming.

But I suspect there'll be a subset of its features consistent with my comment that will be actually useful.

Programming, via Naur/Ryle, is always a kind of theory building. And unless you're basically copy/pasting, it's a novel theory of some area (a business process, etc.).

That's something where intentions aren't even really communicable as such, since the art of programming is sketching possible theories as a way of finding out what we ought to intend.

So this is another gimmick with maybe marginal improvements at the edges.


It's really useful for those who have challenges typing (arthritis, disabilities, etc.), but perhaps not best for a general audience, as typing with autocomplete is faster.


For repetitive tasks like preparing the report in the demo, speaking is definitely faster than typing. It's quite impressive if your boss asks you to prepare one and the report is done in less than two minutes.

However, I too really doubt there are any better use cases beyond simple tasks, not to mention that everyone in the office would hear what you ask the AI to do. Oh my, how embarrassing that would be!


The assistants (Google, Alexa, Siri) are not great at NLP. Compare how you speak to them vs. speaking to an LLM like GPT-3; there is a world of difference. The latter feels like speaking to a human, the former more like you're trying to get your voice commands into a state machine.


Blind people are already very productive using voice-to-code.


There may well be examples of this, but while the blind developers I have known (a small sample, I admit) typically use screen-reader technologies to navigate and read code, they use a keyboard to write and edit it.


I don't disagree with that. I just meant I don't think it's going to have mainstream appeal. A wheelchair also makes a disabled person super productive if the alternative is not being able to go anywhere at all, but it doesn't make wheelchairs super appealing to people with two healthy legs, if you see what I mean.


I think this is great for: a) people who are visually impaired or have issues with their hands/fingers, and b) people who aren't programmers; if you could make it more Scratch-like, this would be an amazing tool for showing off the power of programming.


The mental load would decrease very quickly with practice.


They're creating a new job, "prompt engineer", to replace the engineers. This is 2022.


The rise of the... t...talking monkey? cognizes intensely



