Hacker News | flo_hu's comments

I removed my blog post from Medium's distribution, so it should now be freely accessible! https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...


There are several use cases listed for which Markdown could be used, such as paper writing, presentations, etc. (I'm not sure I am going to use it for the latter in the near future...).

I find that one of the most interesting potential use-cases is in "literate programming" on which the same author wrote another blog post: https://blog.esciencecenter.nl/entangled-1744448f4b9f


No. That's of course still "King" :)

(but sure, one could also pick queen, prince, or royal from the list...)

Just tested it here: http://vectors.nlpl.eu/explore/embeddings/en/calculator/#

And it gave me 0.63 King, 0.6 Prince etc...


So =Prince, because you should exclude King, similar to how you exclude it to get Queen in the original example.
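A minimal sketch of that exclusion step, with made-up 3-d toy vectors (real embeddings are typically 100-300 dimensional and will give different numbers; with these toy values the input word "king" happens to rank first, and "prince" is the best non-input match):

```python
import numpy as np

# Made-up toy vectors purely for illustration.
emb = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "man":    np.array([0.5, 0.5, 0.5]),
    "woman":  np.array([0.5, 0.5, 0.6]),
    "prince": np.array([0.6, 0.5, 0.3]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = emb["king"] - emb["man"] + emb["woman"]
ranked = sorted(emb, key=lambda w: cosine(query, emb[w]), reverse=True)

# Drop the input words, as e.g. gensim's most_similar does by default.
answer = next(w for w in ranked if w not in {"king", "man", "woman"})
print(ranked[0], answer)
```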


I would also agree with you that it is fine to add additional rules to improve the outcome, but then it should be made clear in the way the result is presented (as you say, that rarely happens in intro-level tutorials/courses).

Your last point sounds like a cool idea! Using those more in-depth metrics to find weaknesses and see if other, complementary algorithms can fill the gap.


*should?


Good point! I would see this rather as yet another argument for why you should simply give the actual output of the NLP algorithm.

So if people actually do the calculation King-Man+Woman and it comes closest to King, then they should report "King-Man+Woman~=King" and not "King-Man+Woman=Queen" (only because that's what they expected).


To be honest, I think the idea that we should expect ML algorithms to give a single, certain answer is misguided. I would expect the output from this algorithm to be "King - Man + Woman = King (90%), Queen (83%), Prince (70%)" or something like that, i.e. a list of answers with some measure of how "good" those answers are. Then again, I work in a field that doesn't really have categorical answers so maybe I'm missing something obvious.


That's pretty much correct. You would typically calculate a vector for "King-Man+Woman" and then do a query on this based on a cosine distance (or similar measure) over the entire vocabulary.

The query would give you a ranked list of the closest word vectors with scores that indicate how good the match is.
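For concreteness, a toy version of that ranked query (made-up 3-d vectors; with real embeddings the scores and ordering will differ, though as discussed the input word itself often tops the list):

```python
import numpy as np

# Made-up toy vocabulary; real embeddings are typically 100-300 dimensional.
vocab = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "man":    np.array([0.5, 0.5, 0.5]),
    "woman":  np.array([0.5, 0.5, 0.6]),
    "prince": np.array([0.6, 0.5, 0.3]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compute the analogy vector, then rank the whole vocabulary by similarity.
query = vocab["king"] - vocab["man"] + vocab["woman"]
ranked = sorted(((cosine(query, v), w) for w, v in vocab.items()), reverse=True)
for score, word in ranked:
    print(f"{word}: {score:.2f}")
```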


But the example is only performing vector operations. You could perhaps normalize the distances of a number of vectors with a softmax or something to produce a probability across a set, but what's being presented in the paper is the "closest" vector following the operations in terms of cosine distance.
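A sketch of that softmax normalization, using the two scores mentioned upthread (0.63 for King, 0.6 for Prince) plus a made-up third value:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

# 0.63 (King) and 0.60 (Prince) are from the calculator result above;
# 0.41 (Queen) is made up for illustration.
sims = np.array([0.63, 0.60, 0.41])
probs = softmax(sims)
print(probs)
```

Note that the result is only a distribution over the chosen candidate set, not a real probability over the whole vocabulary.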


In the end, it doesn't matter what transformation you do, though, as long as you do it consistently and not in an ad hoc manner. If excluding the original term always leads to useful results, it is a useful transformation.

The problems materialize when you're just cherry picking for results.


Exactly! I think that was part of the problem for many of the examples that turn out to not-really-work. People pretend they let an algorithm do the work, but then hand-pick from a list of somewhat close candidates. Which of course happens with a hypothesis (and thereby a desired outcome) in mind.


I'm fine with the free lunch thing. But here the cheating is done on the level of how people present the capabilities of the tool. If you ask the algorithm "SHE is to LOVELY as HE is to X", the reported answer (Bolukbasi 2016) was "BRILLIANT", which in this case suggests a heavy gender bias. But what the algorithm actually gives for X is: "LOVELY". The authors just picked the 10th example in the list without clearly stating it.


> The authors just picked the 10th example in the list without clearly stating it.

That's not an accurate description of what Bolukbasi et al. (2016) [0] did. In particular, they do not list x close to lovely + he - she and then pick arbitrarily from that list. Instead, they explicitly reject that approach (see appendix A), because they're looking for pairs of words that are maximally gendered. They do that by finding x and y such that the angle between x - y and she - he is minimized. Since the task they're solving is different, you can't fault them for getting different results.

[0] https://arxiv.org/abs/1607.06520
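If I read appendix A right, the pair search is roughly the following (made-up 2-d toy vectors purely for illustration, not the paper's actual data; minimizing the angle is the same as maximizing the cosine):

```python
import numpy as np
from itertools import combinations

# Made-up toy vectors; real embeddings differ.
emb = {
    "she": np.array([0.0, 1.0]),
    "he":  np.array([0.0, -1.0]),
    "lovely":    np.array([0.2, 0.9]),
    "brilliant": np.array([0.2, -0.9]),
    "table": np.array([1.0, 0.0]),
    "chair": np.array([0.9, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

gender_dir = emb["she"] - emb["he"]
candidates = ["lovely", "brilliant", "table", "chair"]

# Pick the pair (x, y) whose difference vector best aligns with she - he,
# i.e. whose angle to the gender direction is smallest.
best = max(combinations(candidates, 2),
           key=lambda p: cosine(emb[p[0]] - emb[p[1]], gender_dir))
print(best)
```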


Ok, thanks a lot for bringing this up! I will have a closer look at that.


"Cheating" (in practice) usually means "embedding problem/domain specific quirks"

In the "King" example, you're adding and subtracting two words that are probably very close already, so if you want to find "something else" besides itself, you need to exclude it. For some problems it might make sense, for some others it might not.


Despite all great advances in deep learning and big data, scientific research often is more about getting the most from very little data.

