> how you'd measure correlation between spoken and written
Several measures are already in use, but I have a new one to propose: train one G2P (grapheme-to-phoneme) model per language, tune the two until they reach a similar RMS error, and compare their sizes. Assuming both are built with similar techniques, the language that needs the bigger model probably has the less regular correspondence between graphemes and phonemes.
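To make the idea concrete, here is a minimal sketch, not a real G2P system. It assumes the words are already aligned one grapheme to one phoneme, uses a plain context lookup table as the "model", counts table entries as the model size, and substitutes a per-symbol error rate measured on the training data for RMS error. The function names and both toy lexicons are made up for illustration.

```python
from collections import Counter, defaultdict


def context_keys(graphemes, context):
    """Yield each grapheme together with `context` neighbours on both sides."""
    padded = ["#"] * context + list(graphemes) + ["#"] * context
    for i in range(len(graphemes)):
        yield tuple(padded[i:i + 2 * context + 1])


def train_table(aligned_words, context):
    """Map every grapheme-in-context to the phoneme it most often produces."""
    counts = defaultdict(Counter)
    for graphemes, phonemes in aligned_words:
        for key, phoneme in zip(context_keys(graphemes, context), phonemes):
            counts[key][phoneme] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def error_rate(table, aligned_words, context):
    """Fraction of phonemes the table gets wrong (measured on the same data)."""
    wrong = total = 0
    for graphemes, phonemes in aligned_words:
        for key, phoneme in zip(context_keys(graphemes, context), phonemes):
            total += 1
            wrong += table.get(key) != phoneme
    return wrong / total


def size_at_error(aligned_words, target_error, max_context=3):
    """Size of the smallest table that reaches `target_error`, or None."""
    for context in range(max_context + 1):
        table = train_table(aligned_words, context)
        if error_rate(table, aligned_words, context) <= target_error:
            return len(table)
    return None


# Hypothetical toy lexicons: (graphemes, phonemes), aligned one-to-one.
regular = [("ka", ["k", "a"]), ("ta", ["t", "a"]), ("kat", ["k", "a", "t"])]
irregular = [("ca", ["k", "a"]), ("ci", ["s", "i"]),
             ("ka", ["k", "a"]), ("si", ["s", "i"])]

print(size_at_error(regular, 0.0), size_at_error(irregular, 0.0))
```

In the second lexicon the letter "c" is ambiguous without its right-hand neighbour, so the table has to grow before it reaches the same error, which is the effect the proposal relies on. A real comparison would need held-out data and lexicons of comparable size.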