Shouldn't that comparison be weighted by how frequent the words are? For example, words in the top 100 by usage would count for more than those in the top 1,000 or the top 10,000.
It would be a much different story if British English and American English had different words for "a car". Which, by the way, happens in Spanish dialects ("el coche" vs "el carro").
here (argentina, non-native) we usually say 'el auto' but have significant use of 'coche'. 'carro' means something different; using it for an automobile sounds mexican
but if you showed an argentine a picture of a car, they might very well say 'auto' while perhaps someone from elsewhere would say 'coche', leading to a basically incorrect point of difference being measured in this study between the two dialects
Why would it be incorrect? Sometimes two or three different words describe the same thing and that's ok. If you poll enough people you can get a rough idea if one version is more dominant than the other, if there's an even split, or if different regions in the same country prefer different versions. Similar to soda/pop/coke in the US.
You can design a study with a high level of data granularity. You could even track differences in pronunciation and grammar if you wish.
because 'we usually say coche but sometimes say auto' is almost the same as 'we usually say auto but sometimes say coche', but they differ from 'we always say carro'. if a study is saying spanish is radically different in montevideo and in buenos aires, it's just wrong. this may not be the particular design error that resulted in these incorrect results, but it seems like a promising candidate
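To make that concrete, here is a minimal sketch of the difference between comparing only each region's most common answer and comparing the whole spread of answers. The usage shares below are invented for illustration, not survey data, and the helper names `overlap` and `top_word_differs` are hypothetical.

```python
def overlap(p, q):
    """Shared probability mass between two word-choice distributions.

    Returns 0.0 when the regions never use the same word and 1.0 when
    their usage shares are identical.
    """
    words = set(p) | set(q)
    return sum(min(p.get(w, 0.0), q.get(w, 0.0)) for w in words)


def top_word_differs(p, q):
    """Naive comparison: only checks whether the most common word differs."""
    return max(p, key=p.get) != max(q, key=q.get)


# Hypothetical shares, made up for this example.
mostly_coche = {"coche": 0.7, "auto": 0.3}
mostly_auto = {"auto": 0.7, "coche": 0.3}
always_carro = {"carro": 1.0}

# The naive check flags both pairs as "different"...
naive_close = top_word_differs(mostly_coche, mostly_auto)
naive_far = top_word_differs(mostly_coche, always_carro)

# ...while the overlap separates the mixed-usage pair from the disjoint one.
close_pair = overlap(mostly_coche, mostly_auto)
far_pair = overlap(mostly_coche, always_carro)

print(naive_close, naive_far)
print(close_pair, far_pair)
```

With these made-up shares, the naive check treats both pairs as a point of difference, while the overlap is clearly nonzero for the first pair and zero for the second. A study that records only the top answer per region would lose that distinction.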
I think we're both in agreement. Perhaps my example of coche/carro was unfortunate and I didn't make my point clear enough.
A well-designed study, in my mind, would compare the usage of a varied bag of words, starting from articles, pronouns, numbers, and common verbs, then common objects, verb forms, and less common adjectives, and ending with uncommon objects and phrases. The compared words would be weighted based on their frequency. If two dialects have the same articles, pronouns, numbers, etc. and some differences in less frequent nouns, they would be similar rather than radically different, at least lexically. Things might look different if we look at pronunciation.
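A rough sketch of the weighting I have in mind, assuming each concept has a single preferred word per dialect and a known usage frequency. The frequencies and the tiny word list are invented for illustration, and `weighted_similarity` is a hypothetical helper, not anything taken from the study.

```python
def weighted_similarity(dialect_a, dialect_b, freq):
    """Frequency-weighted share of concepts for which both dialects use the same word."""
    total = sum(freq.values())
    shared = sum(
        weight
        for concept, weight in freq.items()
        if dialect_a[concept] == dialect_b[concept]
    )
    return shared / total


# Hypothetical usage counts per concept (made up, not corpus data).
freq = {"the": 1000, "I": 800, "to be": 600, "two": 300, "car": 20, "peanut": 1}

# Preferred word per concept in two hypothetical dialect samples.
dialect_a = {"the": "el", "I": "yo", "to be": "ser", "two": "dos",
             "car": "coche", "peanut": "cacahuete"}
dialect_b = {"the": "el", "I": "yo", "to be": "ser", "two": "dos",
             "car": "carro", "peanut": "maní"}

# Unweighted: every concept counts the same, so 2 mismatches out of 6 look large.
unweighted = weighted_similarity(dialect_a, dialect_b, {c: 1 for c in freq})

# Weighted: the mismatches sit on rare concepts, so they barely move the score.
weighted = weighted_similarity(dialect_a, dialect_b, freq)

print(round(unweighted, 3), round(weighted, 3))
```

With these invented numbers, the unweighted score drops to about two thirds while the weighted score stays above 0.99, which is the "similar rather than radically different" outcome described above. A real study would need per-dialect frequency data and a way to handle concepts with several words in common use.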
I don't know what list of words was compared in the study linked in this subthread, so it's hard for me to say anything about it.