
Author here, surprised this somehow got onto HN since I only posted on Mastodon.

Happy to answer any questions.



Great post. When you're reading a Mermaid diagram, do you just happen to have memorised that "dash dash greater than" means "arrow"? I assume the screen reader doesn't understand ASCII art.

And how painful is reading emails? HTML email is notoriously limited compared to HTML (and CSS) in the browser, but it's pretty hard to add structure to a plain text email too. How annoying is it when I do so using e.g. a "line" made out of repeated dashes?


Oh boy, don't get me started on emails. HTML emails are such a pain because of the hacks needed to get them to render properly across multiple devices. So I hear a lot of table markup that's there purely for layout, which is a pain because those tables aren't semantically meaningful at all. And then there are emails that are just one or more images.

For a line of dashes like "-------", most screen readers can recognize repeating characters, so that string gets read for me as "7 dash". If using an <hr> element, then there is no ambiguity about what it means.


Users of the email client mutt have a similar problem: it doesn't render HTML and CSS and instead displays the raw markup as text, so they've developed a variety of workarounds, like piping the email body through a terminal web browser before showing it in mutt.
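
For reference, the usual setup for this looks something like the following (assuming w3m is installed; lynx or elinks work the same way with their own dump flags):

```
# ~/.mailcap - convert HTML parts to plain text with w3m
text/html; w3m -I %{charset} -T text/html; copiousoutput

# ~/.muttrc - render text/html inline via the mailcap entry,
# preferring the plain text part when both are present
auto_view text/html
alternative_order text/plain text/html
```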

Might work for you too.

Edit: Also, do you MUD?


Oh yes. It was one of my formative childhood experiences. My first MUD was Alter Aeon, but I haven't played in almost 10 years. I enjoyed myself during the 5 years or so that I played and got to know a lot of people. The first thing I ever programmed was a bot to automatically heal group members.
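
The core of a bot like that is just a trigger on the group-status lines the MUD prints. A minimal sketch (the status line format and the heal command here are invented for illustration; every MUD prints these differently, and dedicated clients like TinTin++ have built-in trigger systems for exactly this):

```python
import re

# Hypothetical group-status line, e.g. "Bob: 35/120 hp".
STATUS = re.compile(r"^(?P<name>\w+): (?P<cur>\d+)/(?P<max>\d+) hp$")

def heal_command(line, threshold=0.5):
    """Return a heal command if a group member's hp ratio drops below
    the threshold, else None."""
    m = STATUS.match(line)
    if not m:
        return None
    cur, mx = int(m.group("cur")), int(m.group("max"))
    if cur / mx < threshold:
        return f"cast heal {m.group('name')}"
    return None
```

In practice the bot would read lines like these from the telnet connection and write the returned command back to the same socket.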

Then Empire Mud, but I left due to disagreements with the admin. I loved the concept but it didn't really have the playerbase to support it.

More recently, I was on Procedural Realms. But I was affected by 3 separate instances of data corruption / loss, the last of which resulted in an unplanned pwipe since there were no offsite backups and the drive on the server failed. Years of progress gone due to lack of backups, so I'm never going back.

Ever since, I've been trying to find something else. Perhaps I'm just getting older but I don't have the patience to grind that I once had, which rules out most hack and slash muds. These days, I prefer something with interesting quests, places to explore and mechanics.

What muds do you play?


Neat. I mostly play Discworld MUD, though not very often these days due to small kids. It's a good all-rounder, with both a decent grind and massive amounts of quests, exploration and crafting. Over the years I've become friends with many screen reader users there, and some of them were the fastest hunting group leaders I've seen.

http://discworld.starturtle.net/lpc/


I tried it briefly but bounced off after the tutorial finished; I couldn't figure out what to do.

Is reading the books required for enjoyment? I haven't read anything from the Discworld series.


After the tutorial, if you choose morporkian as language and Ankh-Morpork as starting location you'll be put in one of the busiest places in the world, outside a bar. Either outside or inside you'll find people who can help you get started. The 'say' command says something to the entire room, and 'tell username message' sends them a private message.

There's also a newbie group chat where you can ask for help, the syntax is 'newbie' followed by your message. It'll go away once you get too many levels in your skills.

A drawback with Ankh-Morpork is that it has cops; they might interfere if you decide to attack something that isn't a rat or cockroach or some such, but if you get caught and put in jail you'll eventually be released. Getting killed is a bit worse: you either waste your experience points getting a raise from an NPC, or send a message to a particular type of priest that can resurrect you.


Thanks, I'll try this out again sometime.


Really liked the article.

The interesting part for me was that you can recognize synthetic voice much faster than human speech. Is there a specific voice you use for 800wpm, or can it be any TTS? Also, I think older voices sound more robotic than the newer ones (I mean pre-AI; e.g. the Android default counts as newer for me). Is there a difference in how fast you can listen to the newer, nicer-sounding ones versus the older, more robotic ones?


> Is there a difference in how fast you can listen to the newer, nicer-sounding ones versus the older, more robotic ones?

Yes. The main requirements for the TTS I use are that it must be intelligible at very high rates of speed and must have no perceivable latency (i.e., the time it takes to convert a string of text to audio). This rules out almost all voices, since many of them focus on sounding as human as possible, which comes at the expense of intelligibility at high rates. The newer voices also usually don't have low latency.
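
To put those rates in perspective, here's a rough back-of-the-envelope calculation (the word count and the "typical" rates are illustrative assumptions, not figures from the post):

```python
def listening_minutes(words, wpm):
    """Minutes needed to listen to `words` words at `wpm` words per minute."""
    return words / wpm

doc = 5000  # e.g. a medium-sized design doc
# ~150 wpm is typical conversational speech; 800 wpm is the rate
# discussed in the post.
for wpm in (150, 400, 800):
    print(f"{wpm} wpm: {listening_minutes(doc, wpm):.1f} min")
```

At 800 wpm the same document takes roughly a fifth of the time it would at conversational speed.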

> Is there a specific voice you use for 800wpm, or can it be any TTS?

I'm using ETI Eloquence. If I switched to another voice that stays intelligible at high speeds, such as eSpeak, I would have to slow down, since I'm not used to it, and train myself back up to the speeds I'm used to.


Thank you for the answers. Even though I'm not new to TTS usage, overall this feels a bit like cyberpunk to me: like a neural interface that provides information as fast as you can consume it, not just as fast as your "ears" can recognize it. Like a human modem.


I've added a section about TTS voices to the post, see https://neurrone.com/posts/software-development-at-800-wpm/#...


Great article. I was of course surprised to learn that it's possible to learn to understand the super-fast TTS, since videos and podcasts start to get very tough to follow around 2.5x and higher. I've been wondering: surely better algorithms for generating high-speed speech are possible, especially as we have more and more compute around to throw at it. It's not easy to search for, since "speed" for most tools is about speed of generation rather than wpm. As normal-speed neural net TTS models get incredibly good, I am hoping to see more attention paid to the high-speed use case.


I've added a section about TTS voices to the post, see https://neurrone.com/posts/software-development-at-800-wpm/#...


Yeah, the options for this are quite limited. The only ones I know of are eSpeak (open source, but it doesn't sound as good) and Eloquence, which is an abandoned product.

The use case for super high speed TTS is pretty niche though.


Thanks for the blog post!

I was wondering which TTS voices you use. I've heard from other blind people that they tend to prefer the classic, robotic voices over modern ML-enhanced voices. Is that true in your experience, too?


That was my initial thought, too - "I bet they can use a nicer voice now!"

Sounds like the robotic voice is more important than we give it credit for, though - from the article's "Do You Really Understand What It’s Saying?" section:

> Unlike human speech, a screen reader’s synthetic voice reads a word in the same way every time. This makes it possible to get used to how it speaks. With years of practice, comprehension becomes automatic. This is just like learning a new language.

When I listened to the voice sample in that section of the article, it sounded very choppy, almost as if not every phoneme was captured. Now, maybe they (the phonemes) are all captured, or maybe they actually aren't - but the fact that each word sounds _exactly_ the same, every time, possibly means that each sound is a precise substitute for the 'full' or 'slow' word, meaning that any variation introduced by a "natural" voice could actually make the 8x speech unintelligible.

Hope the author can shed a bit of light, it's so neat! I remember ~20 years ago the Sidekick (or a similar phone) seemed to be popular in blind communities because it also had settings to significantly speed up TTS, which someone let me listen to once, and it sounded just as foreign as the recording in TFA.


Yeah, that bit about each phoneme sounding exactly the same every time really made a lot of sense. Even if the TTS phoneme sounds nothing like a human would say it, once you've heard it enough times, you just memorize it.

I guess sounding "natural" really just amounts to adding variation across the sentence, which destroys phoneme-level accuracy.


> When I listened to the voice sample in that section of the article, it sounds very choppy and almost like every phoneme isn't captured.

Every syllable is being captured, just sped up so that the pauses between them are much smaller than usual.


I've added a section about TTS voices to the post, see https://neurrone.com/posts/software-development-at-800-wpm/#...




