Anonymous popularity votes are very common: political elections and high school prom queen selections, to name two. They also play an important role in career advancement. The tenure decision for a university professor depends on voluntary course evaluations from students. Managers rely on senior team members’ opinions for the performance evaluations of junior employees. In light of such significant ramifications, it is important to ask whether it is fair to rely heavily on popularity voting outcomes in these decisions.

Let me elaborate on the question I am interested in. There is some evidence that male professors receive higher ratings from students than female professors for the exact same work. Researchers from North Carolina State University conducted an experiment on students taking an online course. Two professors, one male and one female, each taught two sessions of the same online course. Each professor let the students of one session think the professor was male and the students of the other session think the professor was female. Students never saw or heard the professor who actually taught them. At the end of the course, the instructors’ performance was evaluated on 12 metrics such as fairness, professionalism, and promptness of response. The results are unnerving, to say the least. Even for the same professor, the perceived female version received lower scores on all 12 metrics than the perceived male version.

This study, while not conclusive evidence because of methodological limitations, certainly adds weight to a body of anecdotal evidence and raises an interesting question. Are we holding women, perhaps even subconsciously, to higher standards? I grew up in a society that holds very distinct notions of the “ideal” for men and women. Many activities that should have been frowned upon universally, if at all, came with the exclusive qualifier “for women”. A woman who is smoking looks disgraceful. A woman who is untidy must have bad personal management skills. The list goes on.

Recently, I came across an interesting “natural experiment”. A popular online website for Mongolians held an online contest to select the top 100 people of the year. About 220 people were nominated. Viewers voted positively by “liking” the profile of a contestant. The total number of “likes” was used as the ultimate metric of popularity in the contest. A typical scenario. However, curiously, viewers could also vote negatively by “disliking” a profile. The number of dislikes seemingly did not harm the contestants. (A snippet of the contestants’ profiles can be found here)

The online profiles of contestants showed the contestants’ pictures with minimal to no background information. This suited my needs for this project since varying degrees of accomplishments will most certainly affect popularity. In the absence of descriptions of achievements, judgements were more likely to have been made based on visible characteristics from the picture (such as gender, age, and attractiveness).

Of course one should be concerned with the fact that contestants were essentially in charge of their own PR. Perhaps males and females, and certainly different age groups, have different social media presence and support. We may take the number of “likes” to be a proxy of social media presence, support, and PR effort. Including the number of “likes” in the regression analysis will control for just such a possible bias.

Exploratory analysis

Let us first see how the data are distributed across gender and age. The variable age takes two values: young (blue) and middle-aged (red). There were no elderly contestants. I alone assigned the age categories, on a purely visual basis from the pictures provided.


Immediately of interest is the small number of middle-aged women compared to middle-aged men who entered this contest. A perception of higher scrutiny may have already discouraged women from entering such a public contest. There were 45 female contestants compared with 183 male contestants. It is possible that Mongolian women preemptively under-participate in popularity contests for fear of public scrutiny. In that case, the women who did enter this contest may have been more confident of their popularity or social media support. If so, this self-selection bias would make the results we find an underestimate of the negative impact of being a woman in a popularity contest.

We want to compare the number of “dislikes” for males and females given the same number of “likes”. 


Above is the plot of the number of “likes” against the number of “dislikes” for each contestant. For a given number of “likes”, the number of “dislikes” is higher for females than for males. This can be seen from the much steeper regression line for females (orange) than for males (green).

Regression analysis

The model is a simple one: a linear regression of the number of “dislikes” on two predictors, the number of “likes” (a proxy for the general level of social exposure) and gender. The sample is restricted to young participants. Middle-aged contestants were excluded from the analysis because there were too few females in that category.
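The specification described above can be written as dislikes = b0 + b1 * likes + b2 * female, where female is 1 for women and 0 for men. The following is a minimal sketch of fitting such a model by ordinary least squares, written in plain Python so that it runs without extra packages. The function names and the six data points are made up for illustration; they are not the contest data, and this is not necessarily the software used for the analysis.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit the model")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_dislikes_model(
    likes: Sequence[float], female: Sequence[int], dislikes: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of dislikes = b0 + b1*likes + b2*female."""
    # Each design row is [1 (intercept), likes, female indicator].
    rows = [[1.0, float(l), float(f)] for l, f in zip(likes, female)]
    k = 3
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, dislikes)) for i in range(k)]
    b0, b1, b2 = solve_linear_system(xtx, xty)
    return b0, b1, b2


# Hypothetical toy data (NOT the contest data): four men and two women.
toy_likes = [100, 200, 300, 400, 150, 250]
toy_female = [0, 0, 0, 0, 1, 1]
toy_dislikes = [60, 110, 160, 210, 105, 155]

b0, b1, b2 = fit_dislikes_model(toy_likes, toy_female, toy_dislikes)
print(f"intercept={b0:.2f}, like={b1:.3f}, female={b2:.2f}")
```

The coefficient on female (b2) is the quantity of interest: the difference in expected dislikes between a woman and a man with the same number of likes. A statistics package would additionally report standard errors and p-values, which this sketch does not compute.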

             Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   37.85      21.7         1.744     0.0831
like           0.116      0.009      12.045     <2e-16
female        54.48      30.98        1.759     0.0806

F-statistic: 79.52 on 2 and 154 DF, p-value: < 2.2e-16
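To make the table concrete, the fitted equation can be evaluated for any number of likes. The sketch below plugs in the rounded coefficients from the table above. The function name predicted_dislikes and the exposure level of 1,000 likes are illustrative choices, not figures from the contest.

```python
# Rounded coefficients copied from the regression table.
INTERCEPT = 37.85
LIKE_COEF = 0.116
FEMALE_COEF = 54.48


def predicted_dislikes(likes: float, female: bool) -> float:
    """Fitted number of dislikes for a young contestant (hypothetical helper)."""
    return INTERCEPT + LIKE_COEF * likes + FEMALE_COEF * (1 if female else 0)


# Illustrative exposure level; 1,000 likes is an assumption, not contest data.
man = predicted_dislikes(1000, female=False)
woman = predicted_dislikes(1000, female=True)
print(f"man: {man:.2f}, woman: {woman:.2f}, gap: {woman - man:.2f}")
```

Because the model has no interaction between likes and gender, the gap between the two predictions equals the female coefficient (54.48) at every level of likes.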

The number of “likes” a person receives is associated with an increased number of “dislikes”. This association makes sense, since more “likes” mean more public exposure, which should bring more “dislikes”. What we are interested in is the effect of being female after controlling for public exposure. The results suggest that, for men and women of equal public exposure, being a woman is associated with about 54 additional “dislikes”. This estimate is significant at the 10% level (p = 0.08) but not at the conventional 5% level.
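A p-value of about 0.08 means the gender estimate cannot be distinguished from zero at the conventional 5% level. One way to see this is an approximate 95% confidence interval built from the estimate and standard error reported in the table. The sketch below uses a normal approximation, which is an assumption on my part; an exact interval would use a t distribution with 154 degrees of freedom and be slightly wider. The variable names are illustrative.

```python
from statistics import NormalDist

# Estimate and standard error of the "female" coefficient, from the table.
estimate = 54.48
std_error = 30.98

# The reported t value is the estimate divided by its standard error.
t_value = estimate / std_error

# Normal approximation to the 97.5th percentile (about 1.96).
z = NormalDist().inv_cdf(0.975)
lower = estimate - z * std_error
upper = estimate + z * std_error

print(f"t = {t_value:.3f}, approx. 95% CI: ({lower:.1f}, {upper:.1f})")
```

The interval runs from roughly -6 to 115 additional dislikes. It includes zero, which is consistent with a result that is suggestive rather than conclusive.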

Why care?

Implicit bias. Everyone is equally susceptible to applying higher standards to women. I do it sometimes knowingly, and probably more often unknowingly. Being aware of my own biases helps me recognize them when they do creep in and, hopefully, correct some.