Hacker News
Show HN: My AI writing assistant for Chinese (dpw.me)
74 points by dwilding 10 months ago | 28 comments
Hi HN! In trying to improve my Chinese skills, I built this tool that lets me write a mix of Chinese and English, then recommends a proper Chinese expression. This is super helpful when I want to write something in Chinese but I don't know all the vocab/grammar - I can enter my best effort and use English for the parts I don't know.

Really, the fundamental benefit of the tool is that it encourages me to exercise the writing muscle, rather than defaulting to translating from English.

My goal was to build something that is fast, relatively inexpensive, and not prone to misleading people. After some experimentation, I found that pairing GPT-3.5 with Microsoft machine translation works great! I'm using GPT-3.5 to generate an English version of the input, then Microsoft machine translation to translate the English version into Chinese. I've written a blog post about how I arrived at this approach: https://maybecoding.bearblog.dev/my-attempt-at-an-ai-writing...
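Here is a rough sketch of that two-stage pipeline in Python. It is not the tool's actual PHP code. The two stages are passed in as plain functions, so no real OpenAI or Microsoft client calls appear. The name `suggest_chinese` and the toy lookup tables are hypothetical, used only for illustration.

```python
from typing import Callable, Dict

# A stage takes a string and returns a string. In the tool described above,
# stage 1 is a GPT-3.5 call and stage 2 is a Microsoft machine translation
# call; here both are injected functions (hypothetical stand-ins).
Stage = Callable[[str], str]


def suggest_chinese(mixed_input: str, to_english: Stage, to_chinese: Stage) -> Dict[str, str]:
    """Hypothetical two-stage pipeline: mixed input -> English -> Chinese."""
    # Stage 1: let the LLM interpret the Chinese/English mix as one English sentence.
    english = to_english(mixed_input)
    # Stage 2: translate that English sentence into Chinese with an MT service.
    chinese = to_chinese(english)
    # Keep the intermediate English so the user can check the interpretation.
    return {"input": mixed_input, "english": english, "chinese": chinese}


if __name__ == "__main__":
    # Toy lookup tables, not real service output (illustration only).
    fake_llm = {"我想 buy 一个新手机": "I want to buy a new phone."}
    fake_mt = {"I want to buy a new phone.": "我想买一部新手机。"}

    result = suggest_chinese(
        "我想 buy 一个新手机",
        to_english=fake_llm.__getitem__,
        to_chinese=fake_mt.__getitem__,
    )
    print(result["english"])  # I want to buy a new phone.
    print(result["chinese"])  # 我想买一部新手机。
```

The tool shows both the English interpretation and the Chinese suggestion. Returning the intermediate English like this is one way to support the "not prone to misleading people" goal, because the user can see how the mixed input was understood.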

The tool is pretty basic from a technical standpoint. There's a PHP backend that handles the API calls to OpenAI and Microsoft. The frontend is plain HTML with a bit of JS for interactivity - but the whole thing works fine without JS.

Hopefully other folks find this useful too. I'm interested to know whether I could have approached this problem in a different/better way. Thanks!




Interesting read. I would not have thought of MS machine translation, as I know that Google Translate is complete garbage for Japanese compared to ChatGPT4.


After working on this project, I’ve started using MS Translator as my go-to app when I want to translate an existing sentence. Previously I tended to use Google Translate by default. But I’ve genuinely been surprised by the differences between the apps.


When I was testing the language capacities of LLM, Claude 2 was the only one to speak Georgian almost at a native level, better than Google translate and way better than GPT-4 (GPT-3.5 was just gibberish).

I'm curious whether Claude models (now that version 3 has been released) work really well with Chinese too (beyond Microsoft Translator). There are also several good open-source bilingual models, like Qwen.


I’ll check out those models - thank you!

According to my wife (a native Chinese speaker), GPT-3.5 is bad at outputting Chinese, but GPT-4 does a good job.


Oh, thanks for sharing. This seems super useful. It's been a long time since I studied Chinese, but this would have been very useful back when I was learning it.

Quite often I find myself in a situation where I know most of a sentence but not how to put it together.


Thanks for the kind words!

I’ve found it very helpful for staying in the zone and getting better at “thinking” in Chinese.

If you start learning Chinese again in the future, I hope you find it useful too!


I think this is where language learning happens: when you know how to say part of the sentence but not all of it.


The website is quite interesting, but when entering certain Chinese phrases like "我服了你" "我真服了" or "你吃屎吧" it provides incorrect translations in English and Chinese. It's essential to review and address these inaccuracies to prevent misleading interpretations for users. Indeed, many expressions cannot be directly translated into English; instead, they require interpretation based on context.


Thanks for making this tool, it's highly useful! In the example you used, a more native way is to use the "把" construction, i.e. 你得把鞋带系好 instead of 你得系好鞋带. I wonder whether this could be improved by adding some examples to the prompt? I've heard that LLMs are good at learning from examples. I'm not sure it would work, though, since you're first getting the English output and then translating it.
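One way to try this suggestion is a few-shot prompt. It pairs learner attempts with more native rewrites. The sketch below assumes the examples go straight to the LLM step. As the comment notes, it is unclear how this would interact with the English pivot. The prompt wording, the `EXAMPLES` list, and `build_few_shot_prompt` are all hypothetical. The only pair included is the 把 example from this comment.

```python
from typing import List, Tuple

# Hypothetical (learner attempt, more native rewrite) pairs. The only pair here
# is the 把 example from the comment above; a real prompt would need more.
EXAMPLES: List[Tuple[str, str]] = [
    ("你得系好鞋带", "你得把鞋带系好"),
]


def build_few_shot_prompt(user_input: str, examples: List[Tuple[str, str]]) -> str:
    """Build a plain-text few-shot prompt (hypothetical wording)."""
    lines = ["Rewrite the learner's sentence the way a native speaker would say it.", ""]
    # Each example shows the model the kind of rewrite we want.
    for attempt, rewrite in examples:
        lines += [f"Learner: {attempt}", f"Native: {rewrite}", ""]
    # The real input goes last, leaving the answer slot for the model to fill.
    lines += [f"Learner: {user_input}", "Native:"]
    return "\n".join(lines)


print(build_few_shot_prompt("我想 buy 一个新手机", EXAMPLES))
```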


Thanks! Earlier in the project I tried to find a way to generate English but with Chinese grammar - something like “you need to grab your shoelaces and tie them well” - as a way to help memorize grammatical structures. But I didn’t get very far with that. For one reason, I couldn’t coax an LLM to output such unnatural English (!)

So in the end I decided that I can ask GPT-4 to explain alternative structures based on the translation I get.

Ideally, I would like to find a reliable way to directly transform my first attempt at an expression into a native-level expression, without necessarily routing via English. It didn’t seem doable with the LLMs I’ve tried so far, but there are others I can try, and maybe some training would help too, as you say.


Interesting. I used your prompt to create a GPT [1] and gave it an example: `I服了you`:

服了u: Correctly translates to "I give up on you." Unscramble: Incorrectly interpreted as "You impress me."

- [1] https://chat.openai.com/g/g-tI0XLZxuR-fu-liao-u


I have seen some variation in the interpretation, although it does seem to be pretty consistent. When I enter “服了u” into the unscrambler, I get the interpretation “I give up” every time I try it.


Thanks for sharing this! I occasionally use google translate and/or GPT4 for similar purposes, but your tool makes the workflow a bit simpler.

I've found creative writing in a target language is great for learning.


Happy to share, thanks!

I also use GPT-4 for explaining the meaning of sentences in more detail (as in JimDabell’s comment). Often my questions are like “how would a native speaker say this colloquially?” I’ve found it really valuable to be able to have a back-and-forth on why something works the way it does.


This is great. Foreign-language practice definitely seems like a great use case for LLMs. I'm surprised it understood your “你 dei tie ..." example, since "tie" is also valid pinyin.


Just for fun, I’m trying to force it to misinterpret what I’m saying by changing “dei tie” to “di tie” (as in subway). It really doesn’t want to interpret it differently! I think “shoelaces” must be a very strong context signal for the LLM.


I’m just starting to learn Mandarin at the moment and I’ve found LLMs useful for this. I can ask the question “I’m learning Mandarin, can you break this sentence into individual hanzi and explain it?” and it will do a good job of explaining everything. Or I can say something like “I was transcribing an audio recording for an exercise and I wrote X but the answer was Y, what did I do wrong?” and it will tell me that I misheard 妈 (ma, the question particle) as 马 (mǎ, which means “horse”) etc. Or I can paste in a sentence and ask it if it’s grammatically correct, and it will explain what I did wrong and how to correct it.

For instance:

> > I’m learning Mandarin, can you break this sentence into individual hanzi and explain it? 果汁很好喝

> Certainly! Let's break down the sentence "果汁很好喝" into individual hanzi (Chinese characters) and explain what each character and the whole sentence mean.

> - 果 (guǒ) - Fruit

> - 汁 (zhī) - Juice

> - 很 (hěn) - Very; quite

> - 好 (hǎo) - Good

> - 喝 (hē) - to drink

> So, when you put these characters together:

> - 果汁 (guǒ zhī) means "fruit juice."

> - 很 (hěn) is an adverb that intensifies the adjective that follows. When used before an adjective, it indicates a high degree of something, commonly translated as "very" or "quite" in English.

> - 好喝 (hǎo hē) combines "good" (好 hǎo) and "to drink" (喝 hē), which together mean "tasty" or "good to drink."

> So, the sentence 果汁很好喝 translates to "The fruit juice is very tasty" or more directly, "Fruit juice is very good to drink." It's a simple sentence structure expressing that the speaker finds the fruit juice to be quite delicious.

It does idioms too:

> > I’m learning Mandarin, can you break this sentence into individual hanzi and explain it? 吃醋

> Sure, I can help with that. The sentence "吃醋" consists of two characters: "吃" and "醋".

> - 吃 (chī): This character means "to eat" or "to consume".

> - 醋 (cù): This character means "vinegar".

> However, when put together as "吃醋", the phrase doesn't literally mean "eat vinegar". In Chinese, "吃醋" is an idiomatic expression that means to be jealous or envious, usually in the context of a romantic relationship. It can be translated to feeling "sour" over someone's interaction with others, hence the metaphor of "eating vinegar". This phrase showcases how Mandarin often uses vivid imagery to encapsulate emotions and situations.

Even going down to a locally running Mistral 7b gives pretty good results, although it sometimes makes minor mistakes or doesn’t give as thorough an explanation as the larger models. Still very useful for my stage though, where I’m mostly just trying to remember something that isn’t quite fully memorised yet, or exploring what’s adjacent to my current knowledge.


If you're looking for a local translation tool, I wrote a small CC-CEDICT lookup script in shell+JS then eventually rewrote it in Go so I could run it on whatever computer I was using at the time: https://github.com/alisonatwork/cedict

It just does a dumb greedy search to try to find the longest chain of characters for the translation, but that turns out to work more often than not:

  $ cedict -m 果汁很好喝
  果汁    [guǒ zhī]       /fruit juice/
  很      [hěn]   /(adverb of degree)/quite/very/awfully/
  好喝    [hǎo hē]        /tasty (drinks)/
I wanted to add different strategies on -s like dict(1), but I never got round to it because it was easy enough to just force breaking the word by adding spaces:

  $ cedict -m 入鄉隨俗
  入乡随俗        [rù xiāng suí sú]       /When you enter a village, follow the local customs (idiom); do as the natives do/When in Rome, do as the Romans do/
  $ cedict -m 入鄉 隨俗
  入      [rù]    /to enter/to go into/to join/to become a member of/to confirm or agree with/abbr. for 入聲|入声[rù shēng]/
  乡      [xiāng] /country or countryside/native place/home village or town/township (PRC administrative unit)/
  随俗    [suí sú]        /according to custom/to do as local custom requires/do as the Romans do/ 
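The greedy longest-chain search described above is usually called forward maximum matching. Below is a short Python sketch of the idea. It is not the Go code from the linked repo. A tiny hard-coded dictionary stands in for CC-CEDICT, and its glosses are abbreviated. A space in the input forces a word break, mirroring the `入鄉 隨俗` trick. CC-CEDICT records both traditional and simplified headwords, which is why the real tool matched 入鄉隨俗. This toy dictionary only uses simplified keys.

```python
from typing import Dict, List

# Toy dictionary standing in for CC-CEDICT (abbreviated glosses, illustration only).
TOY_DICT: Dict[str, str] = {
    "果汁": "fruit juice", "果": "fruit", "汁": "juice",
    "很": "very", "好": "good", "喝": "to drink", "好喝": "tasty (drinks)",
    "入乡随俗": "when in Rome, do as the Romans do",
    "入": "to enter", "乡": "countryside", "随俗": "according to custom",
}


def segment(text: str, dictionary: Dict[str, str]) -> List[str]:
    """Greedy forward maximum matching.

    At each position, take the longest dictionary entry starting there, or a
    single character if nothing matches. Each space-separated chunk is
    segmented independently, so a space always forces a break.
    """
    max_len = max(map(len, dictionary), default=1)
    words: List[str] = []
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, shrinking until something matches.
            for length in range(min(max_len, len(chunk) - i), 0, -1):
                candidate = chunk[i : i + length]
                if candidate in dictionary or length == 1:
                    words.append(candidate)
                    i += length
                    break
    return words


for phrase in ["果汁很好喝", "入乡随俗", "入乡 随俗"]:
    for word in segment(phrase, TOY_DICT):
        print(word, TOY_DICT.get(word, "?"), sep="\t")
    print()
```

With this toy dictionary, the three phrases split the same way as the `cedict -m` output above: 果汁 / 很 / 好喝, then 入乡随俗 as a single entry, then 入 / 乡 / 随俗. Greedy matching can choose wrong boundaries when a longer entry swallows the start of the next word. That is the case the alternative `-s` strategies would address.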
Now that I live in Taiwan, it might be neat to update it to output traditional characters, but to be honest I think simplified is easier for foreigners to read and learn.


I don’t mean to derail this thread, but I am considering a stint in Taiwan to learn Chinese. I am wondering how different the accent/dialect is compared to mainland China. Are you a native speaker or did you learn while there?


I am an ethnic European who lived in China for several years and now live in Taiwan. I speak and read Chinese at a level good enough for casual conversations, doing banking/rent/etc and getting by in the workplace, but I am far from an expert.

The accent and dialect are mutually intelligible on both sides of the strait, but there is a fair bit of vocabulary and some pronunciations that clearly mark you as having learned in one place or the other. Think of it as the difference between British and American English.

The bigger difference in my opinion is simplified versus traditional. It is possible to learn one and read the other, but having lived almost 2 years in Taiwan now I still find a wall of text in traditional a lot harder to digest. Personally I think simplified is more legible, especially on low res screens or at a distance, but it's unlikely to change over here for political reasons.

Taiwan is probably a fine place to learn Chinese as a beginner. It's easier to "cheat" due to the English language resources available in the community, but if you make an effort I'm sure you can immerse yourself.


I use GPT-4 for this kind of exposition too. If you haven’t come across Pleco [1] yet, I can highly recommend it. It’s a high-quality Chinese-English dictionary with a lot of helpful features for reading Chinese.

One feature I use a lot is the clipboard reader, which lets you step through Chinese text character by character and shows you the meaning of each character. It can be faster than using an LLM, but you don’t get the whole-sentence meaning or the ability to ask follow-up questions like you can with an LLM. But it’s a great app all round (some features are paid).

[1] https://www.pleco.com/


I wrote https://pingtype.github.io to split sentences into their words, and translate each word to pinyin and English. That might be easier if you're trying to read a longer section of text.


The question particle is 吗, not 妈. 妈妈 is mama, as in baby's first word meaning mother, shared with many other languages.


Haha, believe it or not, that was actually my error, not the LLM’s. I changed the example and missed a word.


You just inspired me to build my own version of it for learning German. Thank you!


That’s awesome! I hope you have fun too :)


Went into my favorites. Thanks for putting this together


My pleasure, glad you like it!



