Even though Russian does drop the "to be" in the present tense for all pronouns, other Slavic languages do feature it, either as a separate word or merged into a single word. For example, this is how you say "I am Russian" and "You are Russian" in:
Slovenian:
Jaz sem Rus
Ti si Rus
Croatian:
Ja sam Rus
Ti si Rus
Slovak:
Som Rus
Ty si Rus
Whereas in Polish it is a single word (but the "to be" is still merged in there):
Jestem Rosjaninem
Jesteś Rosjaninem
I also thought that using the Latin alphabet feels like a bit of a hack and Cyrillic would be a more "natural" way to put things. However, using Cyrillic would impose a steeper learning curve on those who do not know it.
This is mostly an issue for phonemes that don't have a 1-to-1 representation in the Latin alphabet, like ш, щ, ж and whatnot. For these cases, languages like Croatian, Slovenian and others have come up with what are, in my personal opinion, quite acceptable workarounds (I just found out that this is called Gaj's Latin alphabet [1]):
č, ć, dž, đ, š, ž (I skipped a few).
Even without knowing initially how they are pronounced, reading them in context does give you an idea of what the target sound is.
I also think that on-boarding Cyrillic-first cultures would be easier because countries that use Cyrillic as their main alphabet normally also teach the Latin alphabet at school (Russia, for example, does, and they also teach you how to write "Russian" using Latin characters because they understand the importance of this). Whereas it's not the same the other way around: a lot of Slavic countries do not teach the Cyrillic alphabet at school.
There’s a number of romanization standards for Cyrillic, and which one is the most intuitive might be language-dependent (e.g. what Russian writes as a stressed ‹и› Ukrainian writes as ‹і›, which post-1918 Russian doesn’t use; what Ukrainian writes as ‹и› is closer but not identical to what Russian denotes ‹ы›, which Ukrainian doesn’t use).
Wikipedia has a good summary table[1] of some of the formal standards, but these don’t cover some of the vernacular usage, so to speak. The most frequent quirk is probably writing ‹щ› as ‹sch›, as you do, when standards insist on ‹shch› (my guess is because the Belarusian counterpart of ‹щ› is ‹шч›, which would also be written ‹shch›); most confusing is perhaps writing the masculine adjectival ending ‹-ий›, ‹-ый› as ‹-y› or rarely ‹-yy› (writing ‹Navalny› for the surname ‹Навальный› or ‹Zelenskyy› for the surname ‹Зеленський›, maybe because old 19th-century German-inspired romanizations usually wrote it as ‹-i›, merging the ending with its Polish counterpart). I have to say I’ve never seen ‹zsh› instead of ‹zh› for ‹ж›, though.
Empirically, I’ve also found that French people struggle with pronouncing an English-inspired transliteration (‹sh› for ‹ш›, ‹ch› for ‹ч›), but have no problem with finding at least a pronounceable fallback using a Czech/Serbo-Croatian-inspired one (‹š› for ‹ш›, ‹č› for ‹ч›), so diacritics might indeed be underrated here.
I'd for a long time puzzled over the mysterious \t accent in TeX. Why did DEK implement this obscure diacritical and leave out the more useful Ogonek?¹ It was only when I stumbled across it (somewhat garbled) in an online card catalog entry that I discovered that it was used in some Russian transliteration schemes to represent digraphs like t͡s for ц.
⸻
1. I suspect the other reason was that implementing the tie was technically simple since, by allowing it to extend past its spacing width, it can be treated like any other above-character accent. Ogonek, on the other hand, unlike a cedilla or the dot-under diacritic, requires positioning based on the letter that it's attached to, so it can't be programmed as easily as a floating diacritic mark.
I see you didn’t spend a week sick in bed with no fresh reading material except for a copy of the TeXbook :) Exercises 9.4 and 9.5 from the chunk of exercises on diacritics there mention ‹Akademii͡a› for ‹Академия› (usually ‹-ija› or ‹-iya›) and ‹I͡urʼev› for ‹Юрьев› (usually ‹Ju-› or ‹Yu-›), which I remember (sans the numbers of course) exactly because I had never before seen that romanization[1] and thought it weird. But apparently the Library of Congress does use it, and if you can get hold of official English translations/selections of Soviet physics or mathematics journals from the 70s and 80s you’ll see the authors’ names spelt according to it as well.
Note that in modern times, you’re supposed to use the tie to spell affricates and such in IPA as well, like in ‹t͡ʃ› and ‹d͡ʒ› and, yes, ‹t͡s›, even though nobody does as far as I’ve seen.
Alas, I read the TeXbook originally in 1986 and while I've dipped into it a lot since then, I've not re-read it in its entirety since that first read. I'm the one responsible for adding a section about the ALA-LC romanization in Tie (typography) on Wikipedia.
Which makes sense. The verb has already been conjugated and gives away which person it refers to. Same in Spanish:
Yo soy (you can drop the yo because soy only applies to yo). Same for other pronouns.
The maxim is to "speak as you write and write as you speak". However, there are variations in how the rule is adopted. Croatians tend to write foreign words as they are written in the original language, while in Serbia you always follow this rule. So, for example, in Serbia it is Majkl Džordan, while in Croatia it's just Michael Jordan.
Thanks for all the info. I definitely underestimated the number of languages that don't use Cyrillic. I also had it in my head that dropping "to be" in the present tense was more common. Do you know anything about the history of that as far as where, linguistically, it started being dropped? Like which Slavic language groups drop the verb and which don't? I don't know if that makes sense.
> I've always had trouble trying to parse out Russian transliterated into Latin script.
I also think this is because when you're learning Russian as a non-native speaker, the courses focus mainly on Cyrillic (naturally so), so you never really learn that:
ж -> zsh
ш -> sh
щ -> sch
х -> kh
ь -> '
etc.
[1]: https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet
Edit: answer question about Russian transliteration, add missing link and try to fix formatting.
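For anyone who wants to play with the informal mapping listed above, here is a minimal Python sketch. It only covers the five letters from that list (the "etc." is left unfilled), it is not any official romanization standard, and the names `INFORMAL_MAP` and `romanize` are made up for illustration. Characters that are not in the table are passed through unchanged.

```python
# Informal mapping copied from the list above; NOT an official standard.
# Only the five letters given in the comment are covered (assumption:
# everything else is left as-is rather than guessed).
INFORMAL_MAP = {
    "ж": "zsh",
    "ш": "sh",
    "щ": "sch",
    "х": "kh",
    "ь": "'",
}


def romanize(text, table=INFORMAL_MAP):
    """Replace each character found in the table; leave the rest unchanged."""
    out = []
    for ch in text:
        repl = table.get(ch.lower())
        if repl is None:
            # Not in the table: keep the original character.
            out.append(ch)
        elif ch.isupper():
            # Uppercase input: capitalize the first Latin letter only.
            out.append(repl.capitalize())
        else:
            out.append(repl)
    return "".join(out)


print(romanize("жшщхь"))  # -> zshshschkh'
```

Swapping in a different table (say, one using š and ž in the style of Gaj's Latin alphabet) would only require passing another dictionary as `table`.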