
Yeah I think it's clustering and dimensionality reduction. E.g. if French people form a rough cluster in DNA space and Japanese people form another cluster, someone with a French mom and Japanese dad would show up as a data point roughly halfway between those two clusters.
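To make the "halfway between two clusters" intuition concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the allele frequencies are made up, the function name `simulate_and_project` is hypothetical, and it is not the pipeline any testing company actually uses. It simulates two populations, adds one child with a parent from each, and projects everyone onto the first principal component.

```python
import numpy as np

def simulate_and_project(n_per_pop=50, n_snps=2000, seed=0):
    """Toy model: two populations plus one mixed child, projected onto PC1."""
    rng = np.random.default_rng(seed)

    # Made-up allele frequencies: population B drifts away from population A.
    freq_a = rng.uniform(0.05, 0.95, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)

    # Genotype = number of copies (0, 1 or 2) of an allele at each site.
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))

    # The child inherits one allele copy from each parental population.
    child = rng.binomial(1, freq_a) + rng.binomial(1, freq_b)

    genotypes = np.vstack([pop_a, pop_b, child]).astype(float)

    # PCA via SVD of the column-centered genotype matrix.
    centered = genotypes - genotypes.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]

    return pc1[:n_per_pop], pc1[n_per_pop:2 * n_per_pop], pc1[-1]

if __name__ == "__main__":
    scores_a, scores_b, child_score = simulate_and_project()
    print("population A mean PC1:", scores_a.mean())
    print("population B mean PC1:", scores_b.mean())
    print("mixed child PC1:", child_score)
```

In this toy setup the child's PC1 score lands between the two population means, close to the midpoint. Real data is far messier: populations overlap, reference panels are uneven, and the sign of a principal component is arbitrary.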



I went on a little Wikipedia expedition hoping to find the methodology (and failed), but I did find an interesting (though not surprising) quote from one scientist (Adam Rutherford): “[These tests] don’t necessarily show your geographical origins in the past. They show with whom you have common ancestry today.”

So when our thread’s ancestor (pun unfortunate) says “45% Levantine”, they mean that 45% of their DNA matches people alive today who are from that region (which includes many European immigrants). I bet this gets very messy given immigration patterns. Which immigrants count toward the ancestry sample, and which don’t? For this problem I would personally reach for cluster analysis, but I would probably just give up, knowing that cluster analysis would give me junk (and ultimately arbitrary) results with such noisy (and potentially skewed) data.

EDIT: The answer was right there next to this quote, in one of the aside pictures in the same article[1]. They use Principal Component Analysis, which IMO is even more fraught than cluster analysis, as you have even more control over what you get out of the model. If I remember correctly, PCA (along with factor analysis) is used heavily in personality psychology and intelligence testing, the latter of which is infamous for radicalized pseudoscience.

1: https://en.wikipedia.org/wiki/Genealogical_DNA_test#/media/F...



