I’m building something similar for Chinese[0]. Every time a new language model w...

davidzweig · on Dec 10, 2022

If you wanted pinyin output, you could use a library like pypinyin (we use it, no complaints). I assume GPT models perform best in English, so you could machine translate the input/output. MT is pretty fantastic these days, although it introduces fidelity issues, and an extra API call adds cost and latency.

nmfisher · on Dec 10, 2022

I didn't actually need the pinyin conversion (I implement that myself); it was just a test to see how well LLMs* were progressing at "out-of-domain" tasks. I always used recent newspaper headlines, so the possibility of finding matching character/pinyin pairs somewhere in the huge training set was negligible.

For this specific task, ChatGPT is the first model I've come across that actually shows some ability to chain together concrete steps to complete a task (i.e. look up character, fetch pinyin), rather than the simplistic "predict the next word in the sequence" of pure language models.

* Large Language Model - I know ChatGPT isn't a pure language model (which probably explains why it performs so much better), but I don't know what we're calling these models, so I'm sticking with LLM for now.
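
To make the "look up character, fetch pinyin" steps concrete, here's a minimal sketch of what that chain looks like when done by hand. The `PINYIN` table and `to_pinyin` function are made up for illustration (this is not my actual implementation), and the table only covers four characters; a real converter needs a full dictionary and has to deal with characters that have more than one reading.

```python
# Toy lookup table (hypothetical, for illustration only). A real table would
# cover thousands of characters and handle multiple readings per character.
PINYIN = {"你": "nǐ", "好": "hǎo", "世": "shì", "界": "jiè"}


def to_pinyin(text: str) -> str:
    """Convert text to pinyin one character at a time."""
    out = []
    for ch in text:
        # Step 1: look up the character. Step 2: fetch its pinyin.
        # Characters missing from the table are passed through unchanged.
        out.append(PINYIN.get(ch, ch))
    return " ".join(out)


print(to_pinyin("你好世界"))  # prints "nǐ hǎo shì jiè"
```

The point of the test was whether a model could carry out those two steps itself on unseen headlines, not whether the conversion is hard to code.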

nmfisher · on Dec 10, 2022

Figured I'd add a screenshot of my pinyin test with ChatGPT:

https://twitter.com/NickFisherAU/status/1598621215056625664?...

carom · on Dec 10, 2022

Even if you just had a set of lessons to run through, that'd be a lot of fun: prompts that make it the counterpart in textbook dialogues, with character, pinyin, and translation shown.

The email signup didn't seem to work on your page.

nmfisher · on Dec 10, 2022

That's basically the idea! Thanks for the note on the email signup, I'll have a look. I threw it together very quickly just before I got on a plane a few weeks ago, so it's very fragile.

siwatanejo · on Dec 10, 2022

I've been looking for ages for a way to learn Chinese with pinyin (or just audio, no Chinese characters). Please don't give up.

nmfisher · on Dec 10, 2022

Thanks! I agree, I'll stay away from characters, at least at an early/beginner level (although I am including them mostly for aesthetic purposes). It's a huge hurdle that lots of people don't necessarily need/want to overcome.