I used this in my Anki deck for James Heisig's Remembering the Kanji. The stroke ordering was helpful and it was animated with CSS. It's been a while, so I can't remember who to give credit to for the RTK deck or the CSS animations.
I also built a little tool that would expand out the Kanji character into its constituent radicals and Heisig primitives so I could add them to my study code.
At that time, I was reading NHK Web News Easy on a regular basis and wanted to study those characters with Heisig's visual approach.
There's also a stroke order font which is used in the Anki deck I have for learning kanji. The SVGs in the post look great, but as far as I can tell there's no easy way to copy the actual kanji character (you can get it from the SVG source, however). The font is nice because as far as the clipboard is concerned it's just the character.
Oh interesting. I skimmed that page to make sure it wasn't already referenced and ironically didn't recognize that the kanji were the same as the link I posted. More practice needed I guess.
The publication of the kanji stroke order font at the nihilist.org.uk site actually predated the publication of the KanjiVG data. The person who made the Kanji stroke order font had access to the KanjiVG data before it was generally published. However, the link to that web site was only added to the list of projects a few months ago.
Kanji were also simplified but not necessarily the same way as simplified Chinese. Simplified Chinese also sometimes uses 'old' characters.
So sometimes modern Japanese is actually similar to simplified Chinese, sometimes it is similar to traditional Chinese, and sometimes it is unique. There is no simple 'fork'.
For instance, 円 (yen) is simplified Japanese and uniquely Japanese. It used to be the same as traditional Chinese 圓. In Chinese it was separately simplified twice to its current form 元. So when you see prices in 円 in Japan and 元 in China, it's actually the same original character simplified differently.
Interestingly, traditional Chinese 國 was simplified by reusing an old character and is now 国 both in Japanese and simplified Chinese but not in traditional Chinese.
To further complicate things, there are the "kokuji" - characters that were invented in Japan, used only in Japan, and only have Japanese pronunciations - yet are still considered kanji ("Chinese characters").
Examples:
I think 'kanji' should be interpreted broadly. This is the Chinese writing system, and inventing new characters (which happens everywhere this writing system is used) adds to the whole corpus of kanji/hanzi even if some are invented or used only in specific countries.
This is how a lot of kanji are formed. For example 町 (town) is 田 (rice paddy) + 丁 (street). I guess at some point in language formation a lot of towns were primarily collections of rice paddies.
Definitely not exactly the same. Kanji and Hanzi are two different character sets - they overlap a lot, but each has common everyday characters that aren't in the other, and sometimes the "same" character is in both sets but written differently in various languages (e.g. 骨).
In case anyone is wondering why different glyphs have the same unicode code point, and how an app is supposed to decide which one to render... Well I don't know the reason for the first question actually, though many people appear to have some choice comments.
But as for the second question: for HTML documents, many tags have a lang attribute that decides which version of the glyph to render within that tag. Hacker News has lang="en", so it'll use a user setting to decide. For example, in Firefox's about:config, there's a setting called font.cjk_pref_fallback_order. If e.g. ja comes first, the little square inside the top square in 骨 is rendered on the right side; if any zh entry comes first, it's rendered on the left side.
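To make the lang point concrete, here is a small Python sketch that wraps the same character in two differently tagged spans. The helper name `span` is made up for illustration. The text itself is identical in both cases; only the tag differs, and whether the rendered glyph actually changes depends on the browser and the fonts installed.

```python
from html import escape

def span(text: str, lang: str) -> str:
    """Wrap text in a <span> with a lang attribute (hypothetical helper)."""
    return f'<span lang="{escape(lang)}">{escape(text)}</span>'

bone = "骨"
# One code point, regardless of which language the surrounding markup declares.
print(f"U+{ord(bone):04X}")

ja = span(bone, "ja")        # hint: pick a Japanese glyph
zh = span(bone, "zh-Hans")   # hint: pick a simplified Chinese glyph
print(ja)
print(zh)
```

Pasting the two printed lines into an HTML file and opening it is a quick way to see whether your own setup distinguishes them.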
> In case anyone is wondering why different glyphs have the same unicode code point, and how an app is supposed to decide which one to render... Well I don't know the reason for the first question actually
My understanding is that this is basically "white guy says all Asian writing looks the same" in standards form and is largely regarded as a terrible idea.
Unicode had a builtin language tagging system to resolve glyph variants. Han unification was implemented with this in mind. Then the tagging got deprecated in a later version.
The more I learn about unicode, the more it looks like Bad Ideas: The Standard to me. The only good part of it is the UTF-8 encoding and that was just Thompson and Pike sitting down and thinking about the problem for an hour.
For instance, traditional Chinese in China will be the left-side version, 過. Most computer systems will type this one.
But in Taiwan they do the right side. That said, I don't entirely understand how it works. You can't even copy/paste the right-hand version into this comment box, for instance, but you can see it on the wiki. Maybe they're separate fonts? Really not sure. Maybe somebody knows better.
That's a neat project. While they are extremely similar, there are still many variations. For example one small variation is 今 is written with a horizontal stroke in Japan but a slanted stroke in mainland China.
If anyone is looking for animated stroke orders for Chinese characters, Hanzi5 is by far the best resource [1]. I built a small app to make them searchable too [2].
KanjiVG is awesome. I used it for a free kanji app I made for iOS and Android. Figuring out how to write an SVG parser + renderer wasn't as tricky as I thought when I set out to do it. https://www.bjmalicoat.com/projects/kanjibook
One thing that IMO is missing is the audio playback option for readings. This would've greatly facilitated remembering readings. AWS Polly is very easy to integrate with and it costs next to nothing. /nudge /nudge.
I looked around and couldn't find any examples or demo of the SVGs on the website. This is the epitome of a project that would benefit from a visual representation.
where the image is also displayed. There is also a new animation feature which was added a couple of weeks ago, and a "Random" button where you can get a random kanji.
There's also a link to Wiktionary on each viewer page
To ameliorate this problem, I've moved the "Viewer" link to immediately underneath the "Home" on the left side menu on the latest version of the website.
> It’s too bad the pitch data out there for Japanese is either proprietary or pirated without attribution
I have serious doubts whether this kind of data is even copyrightable, as long as you're not redistributing it verbatim.
A specific selection of words with pitch data (e.g. the NHK pitch dictionary) might be copyrightable, as there was creative expression involved in picking which exact words to put into the dictionary and in what order. But the data itself? A 猫 is always going to have a HLL pitch accent (in the standard accent). That's a fact. And facts themselves are not copyrightable.
You can't copyright a phone book[1]. Quoting from the case:
> "Notwithstanding a valid copyright, a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement"
Sounds exactly like the pitch accent data dump that's been floating around which a lot of people use. (Not the verbatim NHK one; the other one.) They probably used the data from NHK's pitch dictionary to compile it along with a few other dictionaries. But does it feature the same selection and arrangement? Nope.
I see apps pay to license data from https://www.cjk.org for pitch accent data. I don't know what their data looks like though. Maybe you're right and their business case for this data is essentially a scam, or there is non-factual copyrightable data, or the law is different in Japan, I don't know.
Given a large enough corpus of spoken lines, could you do some ML magic to get the pitch accents (maybe even just FFT and a simple classifier would do)? I'm aware that the "base" pitch accent does change in context so it's not quite trivial, but it seems like you could get pretty close?
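For what it's worth, the "FFT plus simple classifier" idea can be sketched in a few lines of plain Python. This is a toy under heavy assumptions: each mora is a synthetic pure sine tone (real speech has harmonics, noise, and unvoiced segments), the frames are already segmented per mora, and the classifier is just a midpoint threshold. All names and constants here are made up for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz (assumed)
FRAME_LEN = 400      # 50 ms per frame, so DFT bins are 20 Hz apart

def tone(freq_hz, n=FRAME_LEN, sr=SAMPLE_RATE):
    """Synthesize a pure sine frame as a stand-in for one voiced mora."""
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]

def peak_frequency(frame, sr=SAMPLE_RATE, fmin=60, fmax=500):
    """Return the frequency of the strongest DFT bin between fmin and fmax."""
    n = len(frame)
    k_lo = max(1, int(fmin * n / sr))
    k_hi = int(fmax * n / sr)

    def magnitude(k):
        # Naive DFT of a single bin; fine for a handful of bins.
        return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                       for i, x in enumerate(frame)))

    best = max(range(k_lo, k_hi + 1), key=magnitude)
    return best * sr / n

def high_low(freqs):
    """Label each frame H or L relative to the midpoint of the observed range."""
    mid = (min(freqs) + max(freqs)) / 2
    return "".join("H" if f > mid else "L" for f in freqs)

# Three frames: one high tone followed by two low ones.
frames = [tone(200), tone(140), tone(140)]
pattern = high_low([peak_frequency(f) for f in frames])
print(pattern)  # HLL
```

The hard parts on real recordings are exactly what this skips: segmenting into morae, tracking the fundamental rather than a harmonic, and normalizing for speaker and sentence-level intonation.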
The pitch variants are also highly regional but that’s an interesting idea. I wouldn’t want to give it to learners in use cases I can think of, when correct data is available though (just with licensing headaches/costs)
The correct data already exists, so I'm not sure what the point is besides having a less accurate but freer option
I don't think Suzuki-kun was trained on voice data, they trained a classifier on an annotated text corpus. And getting access to such a corpus is probably much harder than finding voice samples.
It seems like most people use the NHK pitch accent dictionary (there are TSVs of it online). I don't feel like it's particularly "pirated" though, can you really pirate the way people actually pronounce words?
Piracy is rampant, whether it's lifting and reusing/redistributing copyrighted dictionaries or other study materials, or pirating ebook/CD-ROM type content. That's nice for users, but as a legit service provider the data is less accessible unless you just give users the ability to sideload their own materials.
I'd never really understood just how complex these characters can be. Sure, there's a lot going on at first glance, but seeing the stroke order and direction, and imagining the process, really hammers it home to this monolingual American.
KanjiVG is pretty cool. The color coding for radicals and stroke orders is nice. Also, parsing and using the SVGs is fairly straightforward.
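As a rough idea of what that parsing can look like, here is a sketch using only the Python standard library. The sample SVG is hand-written in the style of a KanjiVG file and is not real KanjiVG data (the path coordinates are invented). It assumes stroke ids end in `-sN` and that component groups carry a `kvg:element` attribute. Real files may declare the `kvg` prefix differently, so the namespace handling may need adapting.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
KVG_NS = "http://kanjivg.tagaini.net"

# Hand-written sample, NOT real KanjiVG data. Strokes are deliberately listed
# out of order to show that the order is recovered from the ids.
SAMPLE = f"""<svg xmlns="{SVG_NS}" xmlns:kvg="{KVG_NS}" viewBox="0 0 109 109">
  <g id="kvg:05341" kvg:element="十">
    <path id="kvg:05341-s2" kvg:type="㇑" d="M54,15 L54,95"/>
    <path id="kvg:05341-s1" kvg:type="㇐" d="M15,50 L95,50"/>
  </g>
</svg>"""

def strokes_in_order(svg_text):
    """Return the path data of each stroke, sorted by its -sN stroke number."""
    root = ET.fromstring(svg_text)
    strokes = []
    for path in root.iter(f"{{{SVG_NS}}}path"):
        match = re.search(r"-s(\d+)$", path.get("id", ""))
        if match:
            strokes.append((int(match.group(1)), path.get("d")))
    return [d for _, d in sorted(strokes)]

def components(svg_text):
    """Return the kvg:element value of every group that has one."""
    root = ET.fromstring(svg_text)
    key = f"{{{KVG_NS}}}element"
    return [g.get(key) for g in root.iter(f"{{{SVG_NS}}}g") if g.get(key)]

print(strokes_in_order(SAMPLE))
print(components(SAMPLE))
```

From there, rendering or animating is mostly a matter of drawing each `d` string in sequence.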
Having owned a couple of books that had the stroke orders wrong while I was learning Japanese, I always check for mistakes in the stroke orders of kanji like 右 (right) and 左 (left) to make sure they're correct. KanjiVG gets it right.
On a tangentially related note, I recently purchased an iOS Japanese dictionary app called "Nihongo" (https://apps.apple.com/us/app/nihongo-japanese-dictionary/id...) for my daughter because she wanted to study Japanese. I was just expecting a basic course, but it is probably the best vocabulary/kanji studying app I've ever used. It's a little pricey, but well worth it if you're trying to build a strong Japanese vocabulary or learning to read kanji. I have no affiliation with the people who make the app. I'm just an impressed buyer.
There are various disputed stroke orders and the radicals are also sometimes disputed. KanjiVG also contains a large number of variant stroke orders which are identified using suffixes.
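As I understand the repository layout (treat this as an assumption and check it against the actual files), each SVG is named after the character's code point as five lowercase hex digits, and a variant adds a suffix such as `-Kaisho` before the extension. A hypothetical helper for building those names:

```python
def kanjivg_filename(char, variant=""):
    """Build a KanjiVG-style file name; the naming scheme is assumed, not verified."""
    stem = f"{ord(char):05x}"   # code point as zero-padded lowercase hex
    return f"{stem}-{variant}.svg" if variant else f"{stem}.svg"

print(kanjivg_filename("一"))
print(kanjivg_filename("一", "Kaisho"))
```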
That seems to have been uploaded on February 11 2023. I remember kanjicafe.com, which used to have a giant picture of Jim Rose's face for some reason. The site seems to have gone now.
There's a bunch of 2000s-era Japanese content up at <http://ftp.edrdg.org/pub/Nihongo/00INDEX.html>; I've been uploading some of it to the Internet Archive. The GIF you mentioned is on there, as well as SODER.
they weren't there originally, then there is a commit where they were added which says "Recover stroke numbers from SVG directory". But in the same commit the stroke orders for kana were also added, so it might have been just a side effect of something useful.
Another thing I don't really understand is why all the ASCII characters were copied into the "wide ascii" positions:
The commit summary actually says "The ascii characters copied to the full width character positions." which I think was completely pointless. KanjiVG doesn't have the entire JIS character set, since that includes Greek and Russian letters, and various graphical symbols, as well as half-width katakana (narrow katakana), so there wasn't any clear reason to stuff these duplicates into there.
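For anyone unfamiliar with the full-width positions: in Unicode, the printable ASCII range 0x21 to 0x7E is mirrored at U+FF01 to U+FF5E, a fixed offset of 0xFEE0, and the space maps to the ideographic space U+3000. A short sketch of the mapping (the function name is my own), which also shows that NFKC normalization folds the wide forms back to ASCII:

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror ASCII 0x21..0x7E

def to_fullwidth(text):
    """Map printable ASCII to its full-width counterpart; leave the rest alone."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u3000")                          # ideographic space
        elif "!" <= ch <= "~":
            out.append(chr(ord(ch) + FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)

wide = to_fullwidth("KanjiVG 1")
print(wide)
print(unicodedata.normalize("NFKC", wide))  # back to plain ASCII
```

Since the wide forms are compatibility duplicates, a consumer that wants them could normalize the code point and reuse the narrow file, which is one way to read the "pointless" complaint above.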
I might bring these two issues up on the mailing list at some point.
One thing is balance. If you draw characters out of the typical stroke order, things will often look lopsided or have weird proportions. When written with the proper order, proportions look nicer.
For example, think about writing a capital A. It’ll look different if you draw the middle bar first or if you draw the outer two lines first. F will also look different depending on whether you draw the vertical line or one of the horizontal lines first. Try a Q with the little bottom dash before drawing the circle. It’s not only weird but more difficult.
The difference in these characters is subtle, but you can notice it with your own writing. Now instead of 3 strokes to write a character, imagine those with 15 or even 28 strokes. The odd balance and proportions have cascading effects.
It's partly just that it's tradition, but there are also practical reasons:
1) Chinese characters are traditionally/historically written with a brush, not a modern pen or pencil. Because brushes don't create uniform lines, there is a connection between the specific series of movements and the final appearance of the character (sort of like calligraphy pens). Basically, inconsistent movements tend to produce inconsistent-looking characters, so an agreed-upon standard aids in legibility.
2) Various components (most notably the "radicals") reoccur across many characters. Having a (mostly) consistent set of rules for how each component is written and in which order aids memorization because you're not learning every new character "from scratch".
3) Stroke order affects how mechanically efficient it is to write the character, which can be a pretty big deal when some of the more complex characters are upwards of a dozen strokes.
One thing not mentioned in other comments is that stroke order also helps with software that wants to do character recognition.
If, for example, you are taking handwritten notes on an ipad, and want software to convert the notes into text... well, knowing the order of strokes and having an agreed upon order helps considerably over just trying to match shapes.
Digital dictionaries also usually have a "handwritten input" mode to look up a character, and that mode will also recognize characters much more accurately when input with correct stroke order.
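A toy illustration of why a known order helps: if each stroke is reduced to a rough direction, a character becomes a short sequence that can be looked up directly, with no shape matching at all. The template table below is invented for this example and covers three characters only. Real recognizers are far more tolerant and use much richer features.

```python
import math

ARROWS = ["→", "↘", "↓", "↙", "←", "↖", "↑", "↗"]

def direction(start, end):
    """Quantize a stroke to one of 8 directions (screen coords, y grows downward)."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    angle = math.atan2(dy, dx)
    return ARROWS[round(angle / (math.pi / 4)) % 8]

# Hypothetical templates: stroke directions in the standard writing order.
TEMPLATES = {
    ("→", "↓"): "十",
    ("→", "→"): "二",
    ("↙", "↘"): "人",
}

def recognize(strokes):
    """Look up a character by its ordered stroke directions; None if unknown."""
    key = tuple(direction(start, end) for start, end in strokes)
    return TEMPLATES.get(key)

horizontal = ((10, 50), (90, 50))
vertical = ((50, 10), (50, 90))

print(recognize([horizontal, vertical]))  # 十
print(recognize([vertical, horizontal]))  # None: same shape, unexpected order
```

The second call is the interesting one: the ink on the page is identical, but the lookup fails, which is roughly what happens when a handwriting input mode gets confused by a non-standard order.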
As a beginner Mandarin learner, my understanding is that historically, people wrote using the traditional stroke order, and this informed what people think of as the aesthetically pleasing or "correct" way for the characters to look. Now, if you want to write the characters in a legible and aesthetically pleasing way, the easiest method is to write them in the traditional stroke order. I think it's analogous to the way cursive writing was taught in the West, which informed the way it was written and what people thought of as the "correct" way to write cursive. If you wanted to learn to write in cursive, you could just look at existing cursive writing and try to copy it, but if, for example, you guessed that you should write it from right to left, you'd probably find it harder, because cursive evolved to be written from left to right.
You can normally tell when someone uses the incorrect stroke order because things will be the wrong size. For example, when writing 因 you're supposed to write the outer ㄇ first, then the inner 大 and then the bottom horizontal stroke of the 口. If you start with the 大 then it's harder to write the outer 口 the right size.
Again, this is all from a beginner, so take it with a good amount of salt.
Writing by hand leads to optimizations, such as not lifting the pen a lot. This means that the movement between strokes also gets drawn. A 10 stroke character might end up being one single continuous path (effectively 10 strokes + 9 connections). Depending on stroke order, the results can be wildly different. So consistency makes sense for the result to be intelligible.
IMHO it’s high time China and Japan went the enlightened Hangul way Korea took half a millennium ago. There’s no reason to keep the absurdity going any longer; even they don’t know how to type their own words and use pinyin as input. The Vietnamese way would also be easy with their explicit tones written on each vowel, however they lost the advantages that blocky characters offer.