
It really is a shame Spanish didn’t become the lingua franca, since it has an almost 1:1 correspondence between its spoken and written forms.

Of course, it could have been worse. We could have ended up with French as the lingua franca (yes, I know what franca means) where there is almost no correlation between written and spoken language.




This might be somewhat subjective, as I don't know how you'd measure correlation between spoken and written, but French seems to have a much higher match between written and spoken language than English.

Going from spelling to pronunciation in French follows (admittedly complex) rules that are rarely broken except for common words (or endings such as -ent). Vowel pronunciations for a given spelling are far more variable in English, and often depend on the etymology of the word. Plus, English has word-level stress that is not marked in writing (French has none, and it's marked in Spanish), and moving the stress will usually make a word unintelligible! That alone makes writing => pronunciation very difficult.


French spelling is unidirectional. I can comfortably say any French word (albeit with a couple key exceptions, such as "et" or "clef", that break rules), but I can't reliably go from someone talking to how to spell it. "Eaux", "eau", "au", and "aux", or alternatively e.g. "ou", "oux", etc., all have identical pronunciations, but different spellings.

Unsurprisingly, we can vaguely quantify this by looking at dyslexia amongst languages. English and various Southeast Asian languages that rely on Chinese ideographs are by far the worst, followed by things like Arabic, French, Hebrew, and German that have fewer exceptions but less guidance, and then followed last by things like Spanish, Cherokee, and so on that are truly one-to-one.


> how you'd measure correlation between spoken and written

There are a number of ways currently used, but I have a new one to propose: compare the sizes of two G2P (grapheme-to-phoneme) models, one for each language, that have similar RMS errors. Assuming they are generated using similar techniques, the language that requires the bigger model probably has a less clean grapheme-to-phoneme correspondence.
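A rough sketch of that idea, with heavy simplifications: instead of a real G2P model, it uses a lookup table keyed on a letter plus its next k letters, and instead of RMS error it uses a plain per-letter error rate. The table grows its context until the error target is met, and the number of entries stands in for "model size". The names `aligned`, `build_table`, `error_rate`, and `smallest_table` are made up for illustration, the three words are toy data, and the k/z respelling is hypothetical.

```python
from collections import Counter, defaultdict

def aligned(spelling, phonemes):
    """Pair each letter with one phoneme (toy assumption: strict 1:1 alignment)."""
    return list(zip(spelling, phonemes))

def build_table(words, k):
    """Map (letter, next k letters) to the phoneme most often seen for that key."""
    counts = defaultdict(Counter)
    for word in words:
        letters = [g for g, _ in word]
        for i, (g, p) in enumerate(word):
            counts[(g, tuple(letters[i + 1 : i + 1 + k]))][p] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def error_rate(words, table, k):
    """Fraction of letters whose table prediction differs from the data."""
    total = wrong = 0
    for word in words:
        letters = [g for g, _ in word]
        for i, (g, p) in enumerate(word):
            total += 1
            wrong += table[(g, tuple(letters[i + 1 : i + 1 + k]))] != p
    return wrong / total

def smallest_table(words, max_error, max_context=3):
    """Return (context length, table size) for the first table meeting max_error."""
    for k in range(max_context + 1):
        table = build_table(words, k)
        if error_rate(words, table, k) <= max_error:
            return k, len(table)
    return None

# Toy data: real spellings vs. a hypothetical respelling with k/z instead of c.
standard = [aligned("casa", "kasa"), aligned("cena", "θena"), aligned("cosa", "kosa")]
respelled = [aligned("kasa", "kasa"), aligned("zena", "θena"), aligned("kosa", "kosa")]

print(smallest_table(standard, max_error=0.0))
print(smallest_table(respelled, max_error=0.0))
```

On this toy data the standard spelling needs one letter of lookahead and a larger table to reach zero error, while the respelling gets there with no context at all. A real comparison would need trained models on real lexicons, which this does not attempt.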


It's not subjective; French is better than English in this regard, and Spanish is better than French. English has more complex pronunciation rules and many, many more exceptions than those languages.


I confess being super happy that English is not so hung up on the gender of words. So, it has that advantage over Spanish.


The phoneme-grapheme correspondence in Spanish is better than English, but let's not pretend it is 1:1. Does it account for assimilation in rapid speech? Does it account for coarticulation of adjacent consonants? Does it account for regional/dialectal variation? Does it account for secondary articulation?

Even ignoring all of these, it's clearly not bijective. For example:

C --> /k/, /θ/

Z --> /θ/ [0]

K --> /k/

Q --> /k/

G --> /ɡ/, /x/

J --> /x/

N --> /n/ (with several distinct secondary articulations), /m/ (rarely)

M --> /m/

R --> Can be tapped or trilled.

Etc. You can see many more bijection failures here: [1]
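To make the failure concrete, here is a minimal sketch that encodes only the letters from the list above (minus R, whose tap/trill split I leave out) and checks both directions. The names `LETTER_TO_PHONEMES`, `invert`, `ambiguous_letters`, and `ambiguous_phonemes` are mine, and the table is nowhere near a full description of Spanish.

```python
from collections import defaultdict

# Letter -> possible phonemes, copied from the list above (not exhaustive).
LETTER_TO_PHONEMES = {
    "c": ["k", "θ"],
    "z": ["θ"],
    "k": ["k"],
    "q": ["k"],
    "g": ["ɡ", "x"],
    "j": ["x"],
    "n": ["n", "m"],
    "m": ["m"],
}

def invert(mapping):
    """Build phoneme -> sorted list of letters that can spell it."""
    inverse = defaultdict(set)
    for letter, phonemes in mapping.items():
        for phoneme in phonemes:
            inverse[phoneme].add(letter)
    return {phoneme: sorted(letters) for phoneme, letters in inverse.items()}

def ambiguous_letters(mapping):
    """Letters with more than one pronunciation (reading is not a function of the letter alone)."""
    return sorted(letter for letter, ps in mapping.items() if len(ps) > 1)

def ambiguous_phonemes(mapping):
    """Phonemes with more than one spelling (spelling is not a function of the sound alone)."""
    return {p: ls for p, ls in invert(mapping).items() if len(ls) > 1}

print(ambiguous_letters(LETTER_TO_PHONEMES))
print(ambiguous_phonemes(LETTER_TO_PHONEMES))
```

A true 1:1 mapping would make both results empty. Here neither is: some letters have two readings, and some sounds have two or three spellings.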

I am being intentionally unfair to Spanish (which truly does have a much, much better phoneme-grapheme correspondence than English[2]), mostly to illustrate the point that there aren't really any languages which have a 1:1 mapping between spellings and pronunciations. Even if you decide to use the IPA to write your language, non-standard dialects end up needing to read words that don't match their pronunciations. What happens when inevitably the language undergoes change - do we update all of the books to use the 'new' spellings of words?

The ideal orthography shouldn't be completely 1:1, but it should be relatively shallow. From that perspective, Spanish orthography is a fairly attractive option.

[0] The non-1:1 situation with /θ/ gets much worse in most dialects of Spanish, where it is not distinguished from /s/. See: https://en.wikipedia.org/wiki/Phonological_history_of_Spanis...

[1] https://en.wikipedia.org/wiki/Spanish_orthography#Alphabet_i...

[2] Look at how effective Spanish-speakers are at reading without "decoding" compared with Portuguese, which also has a good p-g correspondence. In particular, look how much faster the Spanish students are at pseudowords, on page 141: https://www.academia.edu/17872463/Differences_in_reading_acq...


Most of these examples are unambiguous in context. For example, C is always pronounced /k/ when preceding A, O, U, and always pronounced /s/ or /θ/ depending on the dialect when preceding E and I.
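As a small sketch of what "unambiguous in context" means for C, assuming one letter of lookahead is enough: the function name `pronounce_c` and the `seseo` flag are mine, and the "ch" digraph case is an addition beyond the rule stated above.

```python
def pronounce_c(next_letter, seseo=False):
    """Phoneme for Spanish 'c' given the letter that follows it.

    seseo=True models dialects that merge /θ/ into /s/.
    """
    nxt = next_letter.lower()
    if nxt == "h":
        # "ch" is a digraph with its own sound (my addition, not in the rule above).
        return "tʃ"
    if nxt in ("e", "i", "é", "í"):
        return "s" if seseo else "θ"
    # Before a, o, u (and before other consonants, as in "acto").
    return "k"

for word in ("casa", "cena", "cosa", "cine"):
    print(word, pronounce_c(word[1]), pronounce_c(word[1], seseo=True))
```

So the letter alone is ambiguous, but the letter plus its right-hand neighbor is not, at least within a single dialect.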

The function from written Spanish to spoken Spanish (provided we are talking about a single dialect) is surjective, but darn close to bijective, especially if we exclude words of recent foreign origin.


From Mark Rosenfelder (https://www.zompist.com/spell.html):

> Many people expect … to predict the spelling from the pronunciations-- not realizing that few orthographies meet this goal. It's far from true of Spanish, for instance, which is often held up as an example of a good orthography. I stopped fervently admiring Spanish orthography when I saw a sign in a Mexican bakery with about one spelling mistake every third word.

So, no, hardly bijective!


Well, rules can be simple and there can be few exceptions and people will still screw up, so I don't think that anecdote proves anything. In any case, my claim is that there is exactly one possible pronunciation per correctly spelled Spanish word. The opposite direction is not quite 1:1, but again, it's very close, and anyway it's far closer than English.


Sure, that I can agree with.


Laughs in Chilean or Andalusian rap god dialect



