Gender and verbs across 100K stories: a tidy analysis (varianceexplained.org)
159 points by var_explained on April 28, 2017 | 158 comments



> I think this paints a somewhat dark picture of gender roles within typical story plots. Women are more likely to be in the role of victims- “she screams”, “she cries”, or “she pleads.” Men tend to be the aggressor: “he kidnaps” or “he beats”. Not all male-oriented terms are negative- many, like “he saves”/”he rescues” are distinctly positive- but almost all are active rather than receptive.

Good mathematical analysis, but terrible semantic interpretation.

There are only 3 really negative verbs for men ("murdered", "kills", "kidnaps"), and 3 distinctly positive ("saves", "defeats", "rescues"). "Beat" is ambiguous, personally I would more likely interpret it as "defeats" rather than "hit"/"punch". This is in no way a "dark" picture, and the only relevant conclusion is the one he briefly mentions in passing: women are described as more passive, men as more active.


The term aggressor is misplaced. The data shows how men as a general rule always play the antagonist and the disposable henchmen. The mustache-twirling, eye-rolling, leering, cackling, and hand-rubbing villain is by definition male, and on the rare occasion a plot is different and switches the gender of the antagonist, it also adds additional changes like poison and stabbing instead of guns and punching.


> The data shows how men as a general rule always play the antagonist and the disposable henchmen

I think part of that sentence is amazingly perfect for the leanings exposed in this study.

Women aren't disposable henchmen, or are rather rarely.

Unless we stumble across female societies in the course of a story, the idea of a disposable female character strikes a chord against our ideologies.

It isn't really surprising, killing women and children is considered reprehensible, the thing only the villain would do.

But it does make them unsuitable for henchman material.


> mustache-twirling, eye-rolling, leering, cackling, and hand-rubbing villain is by definition male

Apart from the mustache (or maybe not) this is an apt description of the wicked witch trope of female villains. So I'm going to say, no?


The wicked witch trope is named after the wicked witch of the west character, who doesn't do much leering or hand-rubbing. There is a lot of cackling, possibly some eye-rolling, but that's about it. Another famous witch villain is the white witch (Narnia), who has none of those attributes.

But let's look at a famous villain that has been portrayed in both female and male roles and see how they differ. Moriarty is male in the TV show Sherlock (https://www.youtube.com/watch?v=YN7DYPJLXkc), and female in Elementary (https://www.youtube.com/watch?v=lY7i8cQontg). Which character is shown more as the typical villain, and which is portrayed as more complex, with redeeming values? If they kept the script identical, with the exact same lines, what would happen if the genders were reversed for the two shows?


> The wicked witch trope is named after the wicked witch of the west character

Yes, but the actual details of the trope go back much farther in time. The Fates of Norse and Greek mythology, the Three Witches in Macbeth, Baba Yaga from Slavic folklore. There is a long, long tradition of female characters with supernatural power, unsettling appearance and behaviour, and a tendency toward transgressing taboos such as cannibalism.

As for your Moriarty example, you're butting up against the TV industry's insistence on casting beautiful actresses in almost every role. This is a problem! If you replaced Natalie Dormer with a woman in her late 70s who's never had plastic surgeries you'd have a different comparison to think about.


The villains as plotted in current shows share roots in the older mystic stories, but they also mirror current cultural values rather than past cultural values.

Take Hansel and Gretel, for example, which is full of old symbolism about past cultural views of older women: it would be completely impossible to gender-swap the antagonist. With a male antagonist, current culture would read that story as invoking pedophilia rather than cannibalism, changing it in a very radical way.

With current cultural values, stories generally cast female antagonists with a strong theme of betrayal. Even in the few cases of female henchmen, like that one specific James Bond movie with female henchmen, we still see that arc. Male henchmen in other movies don't get such arcs, and are generally simply thrown in as obstacles to be disposed of. Female henchmen are also generally not killed, which further limits the possibilities.

Last, the Sherlock version would not work with a woman in her late 70s if they kept all other aspects of the character intact. Listen to the clip and try to imagine an old woman saying it. The character explicitly says they want to be seen as the classic villain, and as such it does not work unless it's male. Even the wicked witch does not go around saying to the protagonist, "I am the villain and you should hate me".


That paragraph didn't match the rest of the article. I sense that he reached his own conclusions well before collecting the data. Confirmation bias is the cardinal sin of data mining.


He doesn't mean "dark" in the sense that the words are negative, he means they don't conform to modern progressive social views.


People who have a poor understanding of human nature will consider this evidence of sexism.

Think for a minute about why "Fifty shades of grey" is popular among female readers.

If many stories are filled with males competing over resources and women (in the form of violent criminals kidnapping women and brave heroes rescuing them), then might it be that these kinds of stories are what people are looking for?

Try to imagine a story about a weak man who cannot defend himself against a gang of three women who kidnap him, only to get saved by his brave girlfriend, who upon rescuing him promises him to stay by his side for ever.

Just try it. Does it sound like an interesting story?


That plot would fit into, say, Buffy the Vampire Slayer (which had a viewership in the millions) no problem.

Also, if you're going to say things like "natural difference between men and women that are implicitly understood by people" (down thread), you're going to have to bring some double-blind, peer-reviewed evidence.

This is a controversial subject that's overflowing with armchair sociologists. Right now you're indistinguishable from them.


You have it backwards. If you think there are no natural differences between men and women, you'd better bring a mountain of evidence.

Strong claims require strong evidence. I did not make any outrageous claim.

It would be incredibly surprising if men and women were identical in every aspect except for their reproductive body parts.

Just consider the difference in selection pressure that men and women face.


I have no problem with the possibility that men and women are different. What's unjustified is, given the powerful effect of culture on behaviour, someone making offhand claims that they know what differences are natural.

I read "natural difference between men and women that are implicitly understood by people" as saying that what's "implicitly" (i.e. commonly) understood to be the natural differences is correct. Because what has been "implicitly" understood has differed substantially across time and cultures, that's a highly suspect claim that requires a matching degree of evidence.

Furthermore, just the long and loud history of people spouting off absolute bullshit about what women are and aren't is, by itself, enough to demand significant evidence for any positive statements of what's natural or not.


>Also, if you're going to say things like "natural difference between men and women that are implicitly understood by people" (down thread), you're going to have to bring some double-blind, peer-reviewed evidence.

Millennia of history is a kind of evidence. And the main answer back is "this history is the history of an oppression". Which doesn't even address why (all things being equal) one gender would get to be the oppressor and the other always the oppressed within the same society, and with the same population numbers on each side (e.g. we're not talking about slaves captured via force here).


Natural differences are things like 'mean height', and the ability to get pregnant.

It is quite reasonable that past societies divided labor based on suitability of physical capacities for different tasks. It wouldn't be reasonable to call this 'oppression' per se.

However, the structure of roles in human societies has always been mediated by the level of technology. Gender divisions of labor, class and caste systems etc. have always shifted when the technological structure of production and warfare has changed as technology has developed.

Reinforcing roles that were developed in a less advanced technological society but are not necessary in the present landscape is most certainly oppression, as is resisting thought that could lead to more freedom for groups you don't belong to.


Think about how that history affected our biology (due to evolution). Consider the case of food. We now have an abundance of food rich in carbs and fats, but if we eat too much of it we get sick, because our bodies were designed with certain assumptions about the environment. In this case, our natural environment.

Wouldn't the same be applicable to our social environment?

Also, technology is not permanent. Consider what would happen if society collapsed for whatever reason.


>Also, technology is not permanent. Consider what would happen if society collapsed for whatever reason.

Plus, "just because technology enables it" is never enough reason for anything.


"Just because it was this way in the past" is an even worse reason for anything,


Actually that's a very good reason, if not the best one.

Having been that way in the past means it's already tried. And if it has been carried on, it means it has passed the test, and it has proven useful. It also means it's not fickle.

https://en.wikipedia.org/wiki/Lindy_effect

https://medium.com/incerto/an-expert-called-lindy-fdb30f146e...


Nope. All it tells you is that it has been tried and it works for whoever is in power. Any kind of oppression that works can therefore be justified this way, including slavery and genocide.

Just because something works does not mean it is good to do. And in fact, having seen the undesirable effects of a practice may provide all the reason needed for discontinuing it in favor of an untried practice.

You are simply arguing against the development of society.


So women should be kept in outdated gender roles in case society collapses?

At least it's clear what kind of logic you are using.


Note that the qualification "outdated" has been added by you, not the parent.

One could have easily said "evergreen" or "tried and true" or "resilient" roles.

Also not being reliant on what's enabled by the available technology for major choices is more than about "being prepared in case society collapses".

For one, technology is neutral: it can enable all kinds of things, including things that are bad for society. What should be going on is a cultural/ethical/etc. discussion, not a knee-jerk adoption of any new available option.

Not saying that this is the case here, but there's more to being logical than your critique of the parent.


If you want to make the case that the gender roles are not outdated, be my guest.

The parent brought up the idea of society collapsing. Not me. If you have something to add please do, otherwise you are just kicking up dust.

There has been a cultural/ethical discussion going on and a great many people decided that gender roles needed to change. If you 'aren't saying that's what happened here' how is your comment relevant?

All you have attempted to do here is add doubt without adding substance or logic of your own. If you can actually argue your position, go ahead, but engaging in this kind of undermining suggests that not only can you not, you are aware of it.

Your other comments on this topic show that your suggestion that you are merely trying to point out that my logic isn't as strong as it could be is a misrepresentation of your view.


"Female hero? How boring!"

How can you consider your views as non-sexist? You're clearly defining roles based on gender.

Edit: First story off the top of my head is the first Resident Evil movie. She wakes up in chaos without memory. From what she can discern, she and her would-be lover are accosted by zombies. She fights her way through as the hero. Ultimately I think she remembers her lover betrayed her and started the outbreak or some such thing. Point is, it wasn't boring.


> Female hero? How boring!

That's not what I said.

Your example is not of a female rescuing a weak helpless male and then "putting a ring on it". It's of a female hero who rescues herself after discovering her lover betrayed her. Still the bad guy role played by the male. Still the person worth caring about is the female.

> How can you consider your views as non-sexist?

I'm simply not concerned with whether you consider my views sexist or not. I just want to have a realistic understanding of human nature.

Now, if I said that women should have _less_ rights than men, then I would be sexist (in a bad way). But I never even implied anything close to that. So why should I worry about my views being sexist?


> if I said that women should have _less_ rights than men, then I would be sexist (in a bad way).

That isn't the definition of sexism.


Not _your_ definition of sexism, apparently.

If you define sexism in such broad terms that it includes both bad and neutral things, the word will lose its meaning.


No, that isn't the OED's definition of sexism. The word already has a specific meaning.

> Prejudice, stereotyping, or discrimination, typically against women, on the basis of sex.

You seem to be making it more specific to exclude things you don't deem important.


>The word already has a specific meaning.

The definition is so broad as to be meaningless.

Even having different boxing (the sport) categories for men and women could be construed as "sexism".

Much less any implication that men and women can have, as genders, different sensibilities and priorities, and not just because they were "raised that way" but also because of biological imperatives.

E.g. something that we know is true for almost all animals in nature (e.g. which gender hunts, etc).


> Even having different boxing (the sport) categories for men and women could be construed as "sexism".

It absolutely is. And that's a good thing.

And boxing's weight classes are weight-ist. Also a good thing.


> Even having different boxing (the sport) categories for men and women could be construed as "sexism".

I disagree that fits under prejudice, stereotyping, or discrimination. But of course it can be overly broad if you apply it overly broadly.


How is it not stereotyping to assume that women are weaker than men?


Because sexual dimorphism is real, but it isn't as simple to apply that to things that can be affected by society and culture.


I'm glad you agree that sexual dimorphism is real.


Of course, it doesn't explain everything about sexism though. You're basically making an appeal to nature fallacy about the roles of women in media.


One of the better horror movies that I've seen in a while, Hush, features a deaf woman as the protagonist. She is writing a novel in a cabin in the woods, but then is taunted by a psychopath and she has to fight like hell to survive.

https://en.wikipedia.org/wiki/Hush_(2016_film)

There is one pretty chilling and awesome scene where the psychopath uses the hand of her recently-murdered friend to knock on the window:

https://www.youtube.com/watch?v=9p2KPPRp7ZY


I think your story example is a bit forced. After all, there are few stories about what the princess is doing in the castle while waiting for the brave knight to rescue her.

We like stories where the characters we follow have agency. A story about a brave woman who saves her boyfriend after undergoing hardship - that certainly has the potential to be interesting.


Not if her boyfriend is weak and helpless before, during, and after the story.


Not interesting to YOU. And why that is, I'm not sure.


It might be interesting to you, but I would challenge you to find a large audience for it.

And if this audience exists, why aren't there more mainstream stories catering to them? Surely, this could be an untapped market worth millions of dollars. That is, if it were true that most people would find this kind of plot incredibly interesting.


> Try to imagine a story about a weak man who cannot defend himself against a gang of three women who kidnap him, only to get saved by his brave girlfriend, who upon rescuing him promises him to stay by his side for ever.

What's wrong with that story? It'd be more interesting than the inverse actually, because it's plausible but less played out.

Here's a story not so dissimilar if you don't believe it's plausible. Sounds like a script that would write itself:

http://www.independent.co.uk/news/world/africa/three-women-k...


The story is only interesting because it's strange. Like if you heard of a man who was stuck on top of an electricity column.

It does not strike anyone's fantasy.

Sure, these kinds of stories may exist and have some audience, but not the wide audience that you would see for mainstream typical romance stories.


Even if it were true that "these kinds of stories are what people are looking for", that doesn't mean it's not "evidence of sexism".


Depends on what you mean by sexism.

If you mean unfair discrimination against women, then that's not evidence for it.

If you mean natural difference between men and women that are implicitly understood by people, then yeah, you can call it sexism, but it's not necessarily a bad thing.


What if he means socially-constructed expectations about the roles of men and women, which have detrimental effects on women?


Of course, I suspect that's what he means, but I also suspect that what he sees as "detrimental" effects on women are actually not.

But this is all a bit too abstract. Could you be specific about what these detrimental effects are?

Please note that any attempt to "make up" some egalitarian idealistic expectations without any regard to the biological reality is definitely going to have detrimental effects on both men and women.

For example, if you try to push the idea that women, all women, should abandon the role of motherhood, because it's an "oppressive" societal expectation, and instead should chase after high paying careers, like the powerful men in society, then I think your idea (if it was like this, or similar to this) will have a detrimental effect on most women (if it were to take hold in society).


I can give you specifics.

A friend of mine studied electrical engineering in Spain, and was so good at it that she got accepted into MIT's Technology and Policy Master's Degree. Back when she was 18 and about to enter college, her father got into a big fight with her, because "engineering wasn't a profession for women".

My mother and I were raised by my mother only. Your idea that motherhood is at odds with "chasing after a high paying career like the powerful men in society" is ridiculous. They aren't at odds, and the propagation of that myth by people like you is detrimental to the women that consider the choice.

My neighbor made more money than her husband. So instead of "abandoning the role of fatherhood, to chase after a higher-paying career", he's the one staying at home raising their daughter, while she keeps working full time. She's the sweetest little girl, and they're very happy. In their case, the man might be better fit for that role than the woman is. I wouldn't have trouble believing that, were their roles reversed, the girl would be a bit less happy, and they certainly would have less income to save for her future.


Unfortunately, every single difference of any significance is often taken to be sexist and (of course) wrongfully sexist.

Why are the top 100 chess players men? Sexism. Why are there more male than female programmers? Sexism. In non-romance movies, why are there more male leads? Sexism.

There is legitimate wrongful sexism, but they aren't usually found in first world countries.

It's as if, should we ever agree there is a significant innate difference between the sexes (or races for that matter), the world would come crashing down and put us back in the Stone Age.


Right - there's no wrongful sexism in first world countries and there is no difference in the messages males and females are given during their upbringing about how they should behave in society.

Also, family courts make perfectly fair decisions.

/s


> Right - there's no wrongful sexism in first world countries

I don't think you read my comment, because it disagrees with that statement.


You said:

"There is legitimate wrongful sexism, but they aren't usually found in first world countries."


Actually there is a very good biological explanation for why this is the case: men are more varied than women. Because until the modern era men were expendable and the victors impregnated the women for the next generation.

http://www.denisdutton.com/baumeister.htm


I don't understand how that theory would make women less varied.

Successfully carrying, birthing, and raising the next generation depends almost entirely on the mother.


> I don't understand how that theory would make women less varied.

Look up sexual selection and dimorphism lectures from a respectable university. There's a huge body of evidence about the variation between sexes and a lot of it is beyond me since I'm not a biologist.

It explains rather well how the male population distributions get "cut off" as a result of sexual selection with constraint to resources.


Because men are able to evolve some things separately from women (e.g. on the Y chromosome) and vice versa - mitochondria are from the mother. And even if this was not the case - some genes express themselves only in men (D'OH) like testosterone and baldness.


Ok they're different. But that in no way proves any specific difference is physical or societal.


Actually there are lots of strong female leads now. It all started, I think, with Joss Whedon (Buffy) and Robert Tapert (Xena). Women used to watch that much more than Hercules etc.

Now here is the thing... this is all wishful thinking. And the strong female lead is very much exceptional and stereotyped. She is just really strong and skilled somehow. Nearly all the other women are still weak. Once in a while her skill "rubs off" on her companions but the average woman is still weak.

And this is just about strength and/or fighting skill. How about intelligence? There is a movie now called "Gifted". Same thing - a fictional story about a lineage of women with exceptional gifts in math. None of the other women are portrayed this way.

In other words, whenever a woman is portrayed in fiction, even if she is stronger, more skilled, more resolute, more intelligent etc. than men, she is an exception.

There are a few exceptions, such as Fast and Furious, who just cram their movies chock full of two really exceptional women. But then they also cram them chock full of feats, explosions etc.

Where are the popular movies or literature where nearly all the women are routinely stronger or more intelligent than men at things?


The 100


That sounds like sexism to me. If our culture demands that stories about men and women fit into gender roles then that is a sexist culture. It might not be the direct fault of authors, but we cannot just throw up our hands and say "well it is giving the people what they want".

It is also possible to misunderstand the market. Remember when people said that the reason why action movies didn't have female leads was because nobody would watch them? And then The Hunger Games and TFA were released?


I feel like you could have made all the bad guys in You're Next women without harming the film's effectiveness, so probably yes?


People who have a good understanding of sexism will consider your comment evidence you do not understand it.


Of all places to make arguments about Darwinian resource struggles, HN is probably the least appropriate. I assure you the male heroes in those stories are definitely not the type to read HN.

Just as our culture has in-built biases against women it also has in-built biases against men who use their brains more than their brawn.

Need I even cite any examples of this? It's such a well-worn trope.

And you know what? That anti-nerd bias is pretty stupid and it needs to change. Just like the anti-female bias is stupid and needs to change.

The anti-nerd bias is finally beginning to change a little as our culture begins to recognize the leadership value of smart men (thanks Elon!). And we're also going to change that anti-female bias.

Why? Because over time anti-sexism will just plain work better than sexism. Just like over time leadership by brainy guys has worked better than leadership by dumb alpha jocks.


Sorry, what are you talking about?


The reference at the end talking about comparing changes over time reminded me of a problem I've been kicking around. For this particular system that's not too hard, since you're putting each word on a one-dimensional femininity/masculinity scale, so you could plot a word on a line graph or something. But what do you do if you want to evaluate the changes of more complex relationships over time? Not just mapping words to a constant vector space, but modeling the relationships between words, such as clusters or word2vec representations. With something like word2vec you can take a bunch of words and project the vector space onto a plane so you can see the relative distances, but how do you express changes over time? You could show a bunch of planar projections for different instants in time, but it's hard to look at that and capture the changes.

So how do you visualize changes to these more complex interactions between data points, and also how do you mathematically quantify some of these changes? I'd really appreciate any advice on this. And sorry that this is kind of off topic for the article :)


The Gapminder charts very nicely illustrate multidimensional data over time:

http://www.gapminder.org/tools/#_locale_id=en;&chart-type=bu...

I imagine a dataset that charted the word use over time would fit into a chart like this.


I don't think I understand what your end goal is.

> what do you do if you want to evaluate the changes of more complex relationships over time?

Which complex relationships are you talking about here?

Edit: I think I understand what you're asking. Maybe one good way of doing this would be to cluster word groups for different periods of time. You could then perhaps look at where a word is now, where it was at the beginning of "time" and take the delta to examine words whose meaning has changed the most.

Another approach might be to do apriori over time. Take a text from 1600 and lhs is "some manly action" and rhs is "he". Take a text from 2000 and lhs is "some formerly manly action" and rhs is still "he" but with much lower levels of support.
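To make the apriori idea a bit more concrete, here is a rough sketch. It assumes each period's text has already been reduced to (pronoun, verb) pairs, and it only computes plain support and confidence for a single rule (verb -> pronoun) instead of running a full apriori search. The toy pairs and the name `rule_stats` are made up for illustration.

```python
from collections import Counter

def rule_stats(pairs, verb, pronoun):
    """Support and confidence for the rule verb -> pronoun.

    pairs: list of (pronoun, verb) tuples from one time period.
    support    = share of all pairs that are exactly (pronoun, verb)
    confidence = share of the verb's occurrences that follow the pronoun
    """
    counts = Counter(pairs)
    verb_total = sum(n for (_, v), n in counts.items() if v == verb)
    both = counts[(pronoun, verb)]
    support = both / len(pairs) if pairs else 0.0
    confidence = both / verb_total if verb_total else 0.0
    return support, confidence

# Hypothetical toy data, not taken from any real corpus.
period_1600 = [("he", "fights")] * 9 + [("she", "fights")] * 1 + [("she", "weeps")] * 10
period_2000 = [("he", "fights")] * 6 + [("she", "fights")] * 4 + [("she", "weeps")] * 10

for label, pairs in [("1600", period_1600), ("2000", period_2000)]:
    support, confidence = rule_stats(pairs, "fights", "he")
    print(label, round(support, 2), round(confidence, 2))
```

In this toy data the rule "fights -> he" keeps existing in the later period, just with lower support and confidence, which is the kind of drift described above.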


I did something like this in a project for my former work at a research group, and it was very difficult to visualize due to the number of variables we needed to communicate in the visualization.

What we were trying to do was use word2vec to model changes in relationships between 700 proteins and one in particular, which is related to cancer/tumor growth. We created multiple word2vec models based on year windows of medical journals (so the 2003 model had input journals from 2000 - 2003).

To visualize the models we used a D3 force graph where the nodes were the 700 proteins and the edges were known discoveries of relationships between proteins that had years associated with them - as in X protein was discovered to be related to Y protein in 2007. The relationship data was curated by people, independently of the word2vec models. The size of each of the nodes was determined by the word2vec model for the particular year's similarity score between that protein and the cancer-related protein we were interested in.

To see the changes between years, we used a year slider which the force graph would respond to by animating the sizes of the nodes in line with the particular year model's similarity score for each protein. In addition, the color of the nodes represented changes in the models' similarities between the proteins - more green meant that the word2vec model's similarity score had increased compared to the previous year's model, and red meant it had decreased.

The visualization is useful, but it is a bit of a mess considering there are 700 proteins. I'll message you the link, and if anyone else wants the link I can send it, but I'd rather not post it here since it's hosted on my college's CS department server and it's not equipped to deal with a lot of traffic.

Also if anyone has an idea of how else we might have done it I'd be interested to hear it.

Edit: Didn't realize HN doesn't have a PM feature - if there's another way to send it to you and you're interested let me know.


Thanks, that's helpful! And you can email me at <my HN username> at gmail.com.

I'm also curious if there are any ways to quantify, mathematically, the changes over time. There's the simple sum of the squares of the changes in distance to get a sense of the "kinetic energy" of the system, but I'm wondering if there are some more clever analyses, especially something that can quantify localized changes versus global changes.

Edit: so are you running a separate word2vec thing for each year's dataset? If so, how do you map between them, because the orientation the word2vec mapping generates will be random, and I worry that trying to rotate the mappings to some common axis could obscure some of the data.
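One possible way to pin down the "kinetic energy" idea without aligning anything: work with pairwise distances, which stay the same when a whole embedding is rotated. This is only a sketch with made-up 2D vectors, and `pairwise` and `change_energy` are hypothetical names. It assumes both years contain the same words. The per-word totals give a local view and their overall sum a global one.

```python
import math
from itertools import combinations

def pairwise(emb):
    """Euclidean distance for every unordered pair of words in one embedding."""
    return {(a, b): math.dist(emb[a], emb[b])
            for a, b in combinations(sorted(emb), 2)}

def change_energy(emb_old, emb_new):
    """Squared change in pairwise distance, per word (local) and in total (global)."""
    d_old, d_new = pairwise(emb_old), pairwise(emb_new)
    per_word = {w: 0.0 for w in emb_old}
    total = 0.0
    for (a, b), dist in d_old.items():
        sq = (d_new[(a, b)] - dist) ** 2
        per_word[a] += sq
        per_word[b] += sq
        total += sq
    return per_word, total

# Hypothetical toy embeddings for two years; only "c" moves.
year_1 = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 4.0)}
year_2 = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 1.0)}
# The same year_2 layout rotated by 90 degrees, as a separate training run might produce.
year_2_rotated = {w: (-y, x) for w, (x, y) in year_2.items()}

per_word, total = change_energy(year_1, year_2)
print(per_word, total)
```

Running `change_energy(year_1, year_2_rotated)` should give the same numbers as the unrotated case, and the word with the largest per-word value is the one that actually moved.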


Sent! Yeah, we made a model for each year's dataset. In our case, we were only interested in the similarity between our target protein and the others, so we used the model's similarity measure between those in order to avoid problems with varying orientations between models.
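For anyone following along, the reason this sidesteps the orientation problem is that cosine similarity between two vectors from the same model doesn't change if that model is rotated. A minimal sketch, assuming one dict of vectors per year; the protein names, the 2D vectors, and `similarity_trend` are all invented for illustration and are not the actual project code.

```python
import math

def cosine(u, v):
    """Cosine similarity; unchanged if both vectors are rotated together."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def similarity_trend(models, target, other):
    """Similarity between target and other in each year's model, plus year-over-year change."""
    years = sorted(models)
    sims = [cosine(models[y][target], models[y][other]) for y in years]
    deltas = [None] + [b - a for a, b in zip(sims, sims[1:])]
    return list(zip(years, sims, deltas))

# Hypothetical per-year models with 2D vectors; real models have far more dimensions.
models = {
    2003: {"TARGET": (1.0, 0.0), "P1": (0.0, 1.0)},
    2004: {"TARGET": (0.0, 1.0), "P1": (-1.0, 1.0)},  # different orientation
}

for year, sim, delta in similarity_trend(models, "TARGET", "P1"):
    # Mirror the colouring described above: green for an increase, red for a decrease.
    colour = None if delta is None else ("green" if delta > 0 else "red")
    print(year, round(sim, 3), colour)
```

The first year has no previous model to compare against, so its delta is left as `None`.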


One way is to learn clusters in the data over the whole time period and then calculate the cluster distribution of the same clusters for each time interval separately. You can then track the proportion of each cluster over time as a time series.

So for example if you have a word cluster that describes computers, you could see it start growing in the seventies, while having near-zero proportion in 19th century etc.
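Roughly, the bookkeeping for that could look like the sketch below. It assumes the clustering itself has already been done over the whole time span and just handed over as a word -> cluster dict; the words, cluster names and `cluster_proportions` are hypothetical.

```python
from collections import Counter

def cluster_proportions(tokens_by_period, word_to_cluster):
    """For each period, the share of clustered tokens that falls into each cluster."""
    result = {}
    for period, tokens in tokens_by_period.items():
        # Words without a cluster assignment are ignored.
        counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
        total = sum(counts.values())
        result[period] = {c: n / total for c, n in counts.items()} if total else {}
    return result

# Hypothetical cluster assignment learned once over the whole time span.
word_to_cluster = {"computer": "computing", "software": "computing",
                   "horse": "transport", "carriage": "transport"}
tokens_by_period = {
    "1880s": ["horse", "carriage", "horse", "the"],
    "1970s": ["computer", "horse", "software", "computer"],
}

props = cluster_proportions(tokens_by_period, word_to_cluster)
for period in sorted(props):
    print(period, props[period].get("computing", 0.0))
```

Each cluster's proportion per period is then an ordinary time series you can plot as a line.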


What about 1D projection and simple line plots over time? You'll need 2 or 3 of them over some interesting projections, but it's better than an unfamiliar and complex representation.

Alternatively, 1D projection + style change for another axis. Like changing the colour, thickness of the line, etc.
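A small sketch of what one such projection might be, assuming a separate model per year. Here the axis is rebuilt inside each year's model from two anchor words, so the arbitrary orientation of each model shouldn't matter. The choice of "he"/"she" as anchors, the toy 2D vectors, and the function names are assumptions for illustration only.

```python
import math

def project_1d(vec, axis):
    """Scalar projection of vec onto the unit vector along axis."""
    norm = math.hypot(*axis)
    return sum(v * a for v, a in zip(vec, axis)) / norm

def series_for(word, models, pos="he", neg="she"):
    """One number per year: where word sits along that year's own neg -> pos axis."""
    series = []
    for year in sorted(models):
        m = models[year]
        axis = [p - n for p, n in zip(m[pos], m[neg])]
        series.append((year, project_1d(m[word], axis)))
    return series

# Hypothetical 2D models; the 2000 model is oriented differently from the 1900 one.
models = {
    1900: {"he": (1.0, 0.0), "she": (-1.0, 0.0), "rescues": (0.8, 0.2)},
    2000: {"he": (0.0, 1.0), "she": (0.0, -1.0), "rescues": (0.5, 0.1)},
}
print(series_for("rescues", models))
```

The resulting (year, value) pairs go straight into a line plot, one line per word.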


> With something like word2vec you can take a bunch of words and project the vector space onto a plane so you can see the relative distances, but how do you express changes over time?

Animate it?


The gender is strongly correlated to the biological sex. The biological sex is, in turn, strongly correlated to physical strength, aggressivity and other traits.

I enjoyed the technical description, but the results didn't exactly shock me!


> what verbs are used after “he” and “she”, and therefore what roles male and female characters tend to have within stories.

It might've been better had the author (and the Jane Austen article's author) used some NLP processing to see whether the pronoun was actually the subject of those verbs. But I'll grant that it's usually the case.

Also interesting: gender and the object of those verbs.

EDIT: after some research (that I should've done before posting), it's a remarkably effective technique and it seems only the most contorted sentences might get tripped up. English is nearly always Subject-Verb-Object.


Do you need NLP for that?

The forms he, she and they are used when a pronoun is the subject of a sentence. The forms him, her and them are used when a pronoun is the object of a sentence.

https://www.englishgrammar.org/words-heshe-himher-hishers/

I would think looking at the verb that comes after he/she also doubles as a subject/object filter.
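
A rough sketch of that filter, in Python rather than the R/tidytext code the article uses, so this is not the author's implementation. The function name and the sample text are made up. It just keeps the word that directly follows "he" or "she", with no part-of-speech check.

```python
import re
from collections import Counter

def pronoun_bigrams(text):
    """Count (pronoun, next word) pairs where the pronoun is 'he' or 'she'."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter((w1, w2) for w1, w2 in zip(words, words[1:])
                   if w1 in ("he", "she"))

sample = "She screams when he kidnaps her. The man winced, then he fell to the floor."
counts = pronoun_bigrams(sample)
print(counts)
```

Object forms like "her" never match, so they are filtered out for free. The flip side is that "The man winced" is not counted at all, and an adverb between pronoun and verb would be picked up in place of the verb.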


I agree that the nominative case endings and usual SVO order of English should prevent (nearly all) false positives.

(The only one I can think of is that comparatives are meant to take the nominative case, e.g. pedantically it's supposed to be "he" rather than "him" in "A man smarter than he would decline". However, this rather convoluted sentence structure is rare, even more unlikely in a plot summary rather than the full novel text, and almost always followed by the subjunctive rather than an indicative verb.)

My bigger concern was what doesn't get caught because it appears after a noun rather than a pronoun, or has an adverb in the way. I thought major dramatic plot points might be more likely to use the character's name rather than a pronoun, and so we might see fewer words like "murders", "defeats", etc. From the results it seems like those words are still present in large quantities, so perhaps I'm wrong.

I'd like to see it done with "they" too, partly as a control case and partly to see if any verbs are more common for individuals rather than groups (although the rise of impersonal "they" may hinder that aspect of the analysis).


> My bigger concern was what doesn't get caught because it appears after a noun rather than a pronoun,

Yeah, some clever gender/name lookup would be a good idea, many subjects in plots are the characters' names. Maybe even more subject names than pronouns.

That said, it seems unlikely that there's a pronoun-substitution gender bias, so it would probably just yield more samples of the same trend.
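
For what it's worth, a name lookup could be bolted onto the same bigram idea along these lines. The lookup table here is entirely hypothetical and hand-written; a real one would have to come from somewhere (cast lists, a name database) and would be far from perfect.

```python
import re
from collections import Counter

# Hypothetical lookup: pronouns plus a few character names, mapped to gender.
SUBJECT_GENDER = {"he": "male", "she": "female", "john": "male", "mary": "female"}

def gendered_bigrams(text, lookup):
    """Count (gender, next word) pairs for any subject word found in the lookup."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter((lookup[w1], w2) for w1, w2 in zip(words, words[1:])
                   if w1 in lookup)

sample = "Mary poisons the duke. He dies, and John flees."
counts = gendered_bigrams(sample, SUBJECT_GENDER)
print(counts)
```

With only pronouns in the table this reduces to the he/she count, so comparing the two runs would show whether named subjects change the trend or just add samples.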


What doesn't get caught is a super interesting question. (As is object genders too.) How often he occurs vs she is important. How long he is talked about vs how long she is talked about; I'm not certain about screenplay conventions, but usually the first sentence would name the character and some number of subsequent sentences might use she rather than repeat her name.

So yeah, there's plenty more room for making this analysis scientific, and no reason to assume it's unbiased.


Can you give an example sentence where this isn't the case?


One thing that might cause error is the difference in pronouns vs proper names or other labels.

In the example "The man winced, then he fell to the floor." You'd only register "he fell." This would also be biased if male or female characters were named more often.


It's interesting how thoroughly ingrained sexist concepts are in the language. Even a verb that's fairly active like 'resist' assumes a power relation in which they are in a worse position.

I'd like to see this done by country or year or language or genre.


Heh. You have no idea. English doesn't even have proper genders. You're on easy mode.

If I say: "Your manager is good.", it's neutral. You can't figure out if I'm talking about a man or a woman without something additional.

In Romanian, and I guess in most Latin languages, no such luck. Except for a handful of neologisms, everything clearly mentions the gender, for better or for worse.

The direct translation in Romanian: "Managerul tău e bun" is masculine (both the noun and the adjective make that obvious).

The whole neutral gender movement in English sounds kind of funny for us... we could try to do it in Romanian but it would be like lobotomizing the language.


A Bulgarian colleague of mine said that even surnames are gendered - her surname is Nakova, her father's is Nakov. It plays merry hell with admin software from anglo nations which assume surnames aren't gendered, apparently.


Same in Czech, which I believe still enforces the -ova ending. (Please correct me if that changed)


Not just Bulgarian. I know Latvian surnames have the same concept. Probably others too.


Russian too, I'm pretty sure from watching years of tennis. Marat Safin, for example, and his sister Dinara Safina.


Once again, US culture cherry-picks data and declares that any statistical discrepancy is caused by oppression or sexism. Other hypotheses are taboo and dismissed. You, as well as the author, jump to the morally accepted conclusion without looking at the real world.

I believe in gender equality. But it is worth thinking about other hypotheses even if you don't accept them, that is, the implications of biology in human behaviour.


Is it sexist if it's reflective of reality? Men commit more aggressive crimes than women by far.


An interesting question is: does it just reflect reality, or does it shape reality? There are cultures with different ideas about genders - ideas taught to people. So a random book, movie, story, etc you discover in the modern West has a higher chance of describing a man murdering a resisting woman. If you see that trope repeated enough times - are you affected in any way? Is there a statistically relevant impact on society as a whole?

I'd say that if there is, then yes, content which keeps the idea alive is sexist to an extent. (Unless you're writing non-fiction) And the existence of adtech, and ideas like brand awareness, and catchy jingles, and recent fake news is a decent example that society is relatively easy to influence just by constant repeating and reinforcing of information.


It's probably still sexist. I think you would probably agree that literature spoke the same way about black people a couple hundred years ago. Cultures can be sexist or racist even if everyone subscribes to the same discriminatory standards.

However, I think the interesting point is that this is the culmination of 100,000 plot descriptions. At that point, it's not a reflection of bias in an author, it's a culturally accepted position on the roles of genders. (whether right or wrong)

...or I guess it could be a bunch of biased wikipedia editors :D


If you took 100,000 historical books describing the periods of US slavery, you'd have a strong correlation between "black" and "slave", but concluding that the books are therefore racist would be extremely illogical.


You have a valid point of logic. The only problem is that in reality, many many of those books actually were racist. Slavery and racism were in fact culturally acceptable in the United States for a period of time, it is not at all a stretch to hypothesize that the writing of the time reflected that fact. One might even argue that to assume otherwise would be extremely illogical.


Ok, but the conflation is still wrong. A tract by John Brown about the evils of slavery would conflate the two without racism, while a racist screed by a Southern manor owner might use "servitude" or some other euphemism. The question isn't whether racism was common (yes, obviously), it's whether a specific language correlation reflects a racist author or a work about a racist society.

More broadly, this is a case of "right for the wrong reason". Many books mentioning slavery will be racist because many books of all sorts were racist, but we're asking whether this is informative or just a base rate fallacy.


Conflation of what? The conflation was @tomp's straw man example, @zebrafish and @artursapek didn't conflate anything nor presume that word frequencies equal racism.

It's true that correlation is not causation. It's true that a book using the words 'black' and 'slave' says nothing about whether it's racist. Nobody in this thread was saying otherwise.

It's also true that the history we actually have is one filled with real racism and writing about real racism. It is not one that had a majority of John Brown writing about the evils of slavery. This isn't speculation, we have a historical record and this has already been demonstrated.

Bigger picture, @artursapek asked an interesting question, is a body of literature sexist if it accurately reflects society? Men do commit more crimes, is it sexist for there to be more writing about men committing crimes than women?

@zebrafish answered with an interesting answer -- we might consider it somewhat sexist because our society is still somewhat sexist. If the degree of sexism in literature and in society are the same, then is the literature sexist on its own? Maybe it is.

Both are hypotheses, and in my book it's perfectly fine to ponder hypotheses. Neither one said something I would presume to call "wrong".


  racist ==> "black" correlates with "slave"

  "black" correlates with "slave" =/=> racist


Correct, you're absolutely right. Nobody said otherwise.


Brave New World describes an inferior negroid class whose female zygotes furiously accept sperm zygotes with great vigor.

It's not just about slavery, and I'm not sure how this analysis would even begin to comprehensively find these currently unfavorable descriptions of people.


Brave New World is meant to portray a dystopia.


The alpha and beta classes were not living in a dystopia. Seemed quite nice. Freedom to pursue intellectual endeavors, freedom to have sex with whomever you wanted with no consequence, a life of privilege.


Except if they wanted to use their intellect to think critically, or wanted to move anywhere beyond sex...


Brave New World is widely considered a dystopia, but it is an introspection and satire on existing societies, which for many people actually are dystopias. This is a separation from stories that focus on the post-apocalyptic aspects primarily to create the genre of "scifi dystopia". Therefore, viewing Brave New World as an introspection only leaves the parts of the society that I like as the part to evaluate to see an ideal segment of society, colloquially called a utopia:

- I want to be at the top of the pyramid society

- I want there to be no consequences for promiscuity

- I want a calming mind altering drug that has no side effects

- I want peace and stability to focus on my intellectual tasks

Brave New World accomplishes that.


Let's say your second hypothesis is true.

What would that mean? That men commit more aggressive crimes or that the users of the language perceive it that way? Also, what would then be seen as a crime or as violence?

The point I am trying to make is that there is probably a cultural factor in this as well. (How the culture perceives things to be.)


Shhhhhh! There's no room for reality in here, we're pretending that gender is a made up sociological concept.


To my knowledge, gender is thought of as a complex of at least two aspects: those differences between what is thought of as masculine and feminine, and the roles assigned to genders. In these cases, gender pertains to the ideas of gender identity and further gender expression. In other cases, gender is also thought of as including or being inextricably related to biological sex.

Gender isn't "made up" in the same way that words aren't "made up"; it is an aspect of society, but unlike words, gender is with some relation to observable physical biological expression, usually termed as 'sex'. But the concept of gender as I have seen it used is certainly depending on society, rather than physical sex.

Variations within sex (such as intersex) are another question and are unrelated to gender, as far as I know.


The problem with "gender" as a concept is that they hijacked existing words ("man" and "woman") that were previously used for "sex". This can be clearly seen by trying to define these words without reference to sex - you can't except by defining them recursively ("men are people who identify as men"), in which case, I think a better choice would be to invent new words for this new concept.


Sex is to male and female as gender is to masculine and feminine.


We're waiting for your constructive rebuttal instead of snide remarks.


There are biologically driven behavioral differences between male and female due to genetics. These differences are on a spectrum that varies between individuals but they are clearly visible in aggregate. A lot of recent political rhetoric bulldozes over the obvious differences between male and female to the detriment of our species. For example, the push to have the same physical standards for male and female in the army and elsewhere will inevitably drive the standards down. Males are mentally and physically tougher by design. And before anyone calls me sexist, they also die younger. Everything is a trade-off.

I believe that denying these differences is sexist, as it results in laws that affect males and females unequally. Should sentences for violent crimes be shorter for men because higher testosterone levels make them more biologically prone to violence? This is dangerous political territory bordering on eugenics but an interesting question nonetheless.


But there is plenty of evidence to suggest that isn't an absolute. Men's efficacy at aggression is greater, when it happens, and that is what gets reported.

I would say most of the research in this area is in abusive relationships. Where only heterosexual abuse from man to woman is noticed and acted on, which greatly distorts reality and cultural expectations.

Many men are physically abused by women, and this is underreported.

Gay and lesbian relationships also produce a power dynamic, which has also gone underreported, as they were already either marginalized, or collectively too busy trying to make all LGBT relationships seem completely benign so that their neighbors wouldn't marginalize them.

Do I have enough quantitative evidence to disprove your "by far" statement? No, but that's not the point. There is something about HUMANS that we should enlighten ourselves about.


Yes, and women commit the majority of infanticides.

Still, doesn't change the fact that most violent aggressors (and victims of violent crime) are men.


Again, down to the reporting.

Will a human being pumped with testosterone and upper body strength be violent? A human being with a Y chromosome? That's currently what culture suspects given the reported information available in crime statistics.

I'm telling you that such reductionism isn't an absolute, as there is enough evidence from marginalized people and power dynamics to warrant altering the cultural assumptions. The culture alone (and the stories written in these cultures) is what perpetuates a lack of further insight into this.


Well, we're talking about gender here, not sex, and given that there are two genders [commonly represented in literature], it stands to reason that they would have different tendencies of behavior, or else there would be no difference between the genders, and the concept would not exist.


Right, of course. Since we live in a universe devoid of power relationships, it stands to reason that two groups described differently are in fact different, since no one has an interest in describing people other than accurately. Like how the Jews used to be all scheming and evil, but really cleaned up their act in the past few decades.


Right, of course, since we live in a universe devoid of genetic differences, it stands to reason that two groups described differently are in fact different.

Genetic differences are FACTS. If you do not account for facts, you're operating in the twilight zone.


It would indeed be hard to argue that there are not genetic differences between people of different sexes.

It does not, however, follow that all perceived differences in behavior between gender groups are genetic at root. It would be rather dishonest of me to say "men in the US like trucks, therefore liking trucks is embedded in the male genome" without considering that there could be a societal aspect to that behavior.

Genetic differences may be facts, but blindly assigning genetic causation to perceived behavior without completely ruling out other possible influencing factors is just bad science.


Hey, the Twilight Zone was always my favorite show. It's a place between mystery and imagination, a place where the laws of reality are bent, a place where people arbitrarily start thinking you said genes don't exist.


Genetic differences are facts, but if you think that our understanding of what they imply about society is factual, you simply don't understand science.


> Like how the Jews used to be all scheming and evil, but really cleaned up their act in the past few decades.

Seems like the Muslims have taken up the Evil Baton.

That said, what if the Jews had been the outgroup that has no obligations to the ingroup and vice versa? In the Middle Ages there were prohibitions on taking interest and charity was encouraged, which meant credit was unavailable, but relatives had potential to be bottomless barrels. We are still seeing this with athletes from an impoverished background. One can see why a group with no such restrictions would be useful, even successful.

I think in East Africa the Indians occupied a similar position in society, and when Idi Amin emulated the Catholic Majesties and expelled the Indians the outcome was equally disastrous.


That's a weird conclusion to draw.

Much of this reflects reality. Murder in general is committed more by men. Most reported battery and assault (beat) is committed by men. Murder by poisoning is almost exclusively committed by women. First responders or soldiers who are "saving" people trend male, etc.

When you're looking at 100k fiction works, many of them are going to be romance, mystery and other dramatic type stories that reinforce these concepts.


People in literature don't have a biological sex, they only have a socially-constructed gender, and so naturally they will have the characteristics of that gender ... in fact, the sum of behaviour characteristics of the genders in our stories is the social construction of gender

(I started this post off as tongue-in-cheek, but now I realise it's probably true)


The author failed to mention the context of the data chosen; specifically the dates. For example, rob vs. steal, would change depending on the century the work was created.


The author mentions in the second paragraph that the data comes from scraping Wikipedia articles' plot descriptions. So the plots might be old, but the descriptions (and language) were all written recently.


Before drawing any strong conclusions, I'd probably want to do at least some validation against original sources, e.g. Project Gutenberg. You're talking about plot descriptions written by a mostly fairly narrow demographic. I'd be hesitant to use that to draw conclusions about the source material.


Hm. Wouldn't that have little bearing on the result? Can't really say "he poisons" when that wasn't the plot of the story.


It might have a large impact on the language, though. 'Empoison' used to be a verb, 'burgle' has largely been replaced with 'rob', and so on. I think this would tend to improve the data, though - 'empoison' and 'poison' ought to be grouped.


But that's the point. "She poisons" is more likely to be the plot of the story than "he poisons."


There could be a bias in the summary writers too, in that they prefer "he murders" and "she poisons" for the same method of killing.


> For example, rob vs. steal, would change depending on the century the work was created.

Eh?

Both come so naturally to me that I honestly have no idea which centuries you're talking about, or which you think is obsolete.


Steal has become relatively more popular over time although the difference is probably not large enough to attach a lot of significance to it.

https://books.google.com/ngrams/graph?content=rob%2C+steal&y...

If I had to guess why (other than reasons...) I'd probably speculate that steal tends to have a broader meaning while rob is more likely to be applied to physically robbing someone.


Interesting, playing around with that shows it's a lot tighter in Britain - where it's also the other way around in the past tense by quite a margin. (Likewise in the USA until recently.)

I'm British, so maybe that explains my initial confusion.


Comparing the past tenses is probably more appropriate as that's the tense that those verbs are most commonly used in. It also occurs to me that a lot of the use of just "rob" may well be the name "Rob." Given that, the usage of stole vs. robbed tracks quite closely with just a bit of an upward track of stole over the past couple of decades.

I originally thought that rob vs. steal might be one of those Germanic vs. French roots thing but both words seem to be traceable through Old English.


> a lot of the use of just "rob" may well be the name "Rob."

Google ngram search is case-sensitive though, so that's unlikely to account for much - and probably offset by sentences beginning with the verb, e.g. 'Rob him, quick!' said the man.


Direct link to a scatter plot (x axis: quantity, y axis: gender):

http://varianceexplained.org/figs/2017-04-27-tidytext-gender...


We did a similar analysis of gender stereotypes in the Wattpad online writing community last year: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/...

The conformity with existing stereotypes was pretty depressing, and they're perpetuated more or less equally by male and female authors.


>I think this paints a somewhat dark picture of gender roles within typical story plots. Women are more likely to be in the role of victims- “she screams”, “she cries”, or “she pleads.” Men tend to be the aggressor: “he kidnaps” or “he beats”.

How's that different from actual real life beatings, murders and kidnappings?

Aren't men usually the aggressors?


> It’s interesting to compare this to an analysis from the Washington Post of real murders in America. Based on this text analysis, a fictional murderer is about 2.5X as likely to be male than female, but in America (and likely elsewhere) murderers are about 9X more likely to be male than female. This means female murderers may be overrepresented in fiction relative to reality.


I'd wager that much of this could be explained by the "masculine" verbs being used in relation to a protagonist more often, whereas the "feminine" verbs describe reactions to the actions of a protagonist.


Related post from a week or two ago:

https://news.ycombinator.com/item?id=14156079


If you're finding some dissonance between 100K stories and your own interpretation of reality, you've really got to ask yourself where the problem might lie.


Because I should adjust my perception of reality to match fiction?


The kind of fiction that gets popular reflects _something_ about reality.


Probably a lot of what's been most popular is stuff that very few people on HN would appreciate.


HN : Social Issues :: general conversation : a vegan


That cuts so many ways... But more than ever, I see a great diversity of social outlooks on HN. Might be recent politics, but the result is the same. I love the sub arguments on lab meat, tech ethics, gender equality and representation, and vi vs emacs. Even when I do not agree, I find coherent and consistent alternate points of view useful in seeing the world.


[flagged]


We're here for a civil and substantive discussion, and this kind of trolling is not that. Please don't.

https://news.ycombinator.com/newsguidelines.html


That's not trolling, that's a humorous jab at the fact they were too eager to draw sweeping conclusions based on a, while promising if fine tuned, shotgun methodology to classify gender, but since you were so quick to shout out troll, there, I spelled it out for you.

I'm glad at least some folks find the post funny.


> ... were too eager to draw sweeping conclusions based on a, while promising if fine tuned, shotgun methodology to classify gender

I'm not sure exactly what this means, but you didn't say this—you said something deliberately inflammatory on an already controversial topic. We need commenters not to do that here because it sparks tedious flamewars by which our intellectual curiosity (the reason this site exists) is decidedly unsatisfied.


I merely referenced a meme to convey my opinion, which is repeated in the comment itself. We have to be careful not to turn hn into an ivory tower.

I do get you, but take it with a grain of salt. The real danger to our discourse is opinion manipulation and echo chambers.


As in "She Apache helicoptered him in the face"?


Some people identify as "attack helicopter".


TIL, and it's even a first page search result.

> I’m beautiful. I’m having a plastic surgeon install rotary blades, 30 mm cannons and AMG-114 Hellfire missiles on my body. From now on I want you guys to call me “Apache” and respect my right to kill from above


It's a stupid meme, mocking the trend among Tumblr's sexually confused teenagers to put down "autism", or "zodiac", or "Sherlock Holmes" as their gender and insisting that everyone refer to them using their super special unique pronouns they invented.


...No they don't.

If you manage to find two people who do, congratulations on being "technically" correct.


It's a meme, relax.

I think it's not as funny as the Navy Seal meme, though!

http://knowyourmeme.com/memes/navy-seal-copypasta


This is a very tired meme. If anyone doesn't understand it, perhaps one can understand its usage by context, usually accompanied by ideas such as "gender isn't real, to prove it, I identify as a helicopter" (ironically/jokingly of course) and the classic "triggered".


Identifying as an Apache attack helicopter is ridiculous but identifying as a ferret is constructive. That's what furries are.


"along with his heart, she stole his last remaining shilling. But he should have known better than to surrender his heart to an apache attack helicopter" more like


[flagged]


Please stop breaking the guidelines.

https://news.ycombinator.com/newsguidelines.html



