What I said at SFSU & UO: Stereotype Threat and Intellectual Virtue

Over the last week, I’ve been lucky enough not only to avoid the devastation of Hurricane Sandy, but also to speak to the good folks at San Francisco State University and the University of Oregon about stereotype threat and intellectual virtue.  Here’s a draft of the paper I delivered, which, after some revisions, will appear in Fairweather & Flanagan (eds.) Naturalizing Virtue.

Stereotype Threat and Intellectual Virtue[1]

0 Prelude


I grew up with two cats.  Both of them would regularly do things that cats just aren’t supposed to do.  They would trip while walking down the stairs, attempt to jump onto the sofa but underestimate the height, roll off the chair while enjoying belly scratches, and so on.  The thing is, though, that only one of these cats was a genuinely stupid animal.  He really didn’t know how to put one paw in front of the other.  He really couldn’t estimate heights.  He really didn’t notice the edge of the chair until it was too late.  For all I could tell, the other wasn’t a particularly dumb cat; she was just highly neurotic.  She was afraid of everything, and when she got scared, she lost focus and did silly things.  When she tripped while walking down the stairs, it wasn’t because she didn’t know where her legs were, but because she would startle at a loud sound and lose her concentration.  When she failed to jump onto the couch, it wasn’t because she couldn’t estimate heights, but because she was in a fright, fleeing an imaginary assailant.  When she rolled off the chair, it wasn’t because she couldn’t detect the edge, but because she would panic in the middle of the belly scratches.

The point is that the same manifest behaviors can be expressions of very different dispositions.  When one cat takes a tumble, it’s because he’s a stupid cat.  When the other takes a tumble, it’s because her fear distracts her and masks her feline intelligence.  It would be a mistake to ignore such masking when inferring a cat’s psychological dispositions from her behavior.  What’s needed is a nuanced accounting of how her many dispositions interact with one another and the environment to produce the behavior she manifests.


1 The Language of Character


In some ways, people are more complicated than cats.  We say things.  We interpret what others say.  We care what others think about us and expect from us.  We care about how they evaluate us.  We recognize ourselves in mirrors.  We form long-term plans.  We have desires about our own psychological dispositions.  Some people want to know more in the future than they do now.  Some people who don’t currently enjoy opera wish they did.  Some people want to be champion chess-boxers.  Just as it would be a mistake to ignore the masking of dispositions when inferring a cat’s psychological dispositions from her behavior, so it would be fallacious to ignore the masking of dispositions when inferring a person’s psychological dispositions from her behavior.

This point is especially pertinent when the inferences we make concern what are called, for lack of a better word, virtue or character.  It’s intrinsically rewarding to be judged to have virtue or good character, at least by certain people.  It’s intrinsically punishing to be judged to have vice or bad character, at least by certain people.  Moreover, when these judgments are expressed publicly, they typically have knock-on effects that radiate out through the audience of the utterance.  Being judged worthy typically leads to extrinsic rewards, whereas being judged unworthy typically leads to extrinsic punishments.  Get judged worthy often enough by the right people, and you might just enjoy the support you need to become worthy.  Get judged unworthy often enough by the right people, and you’ll probably end up flipping burgers, or worse.

Even more so, when a judgment of character is made in an official capacity, it can function more like a declaration than an assertion.  Declarations are paradoxical utterances with dual direction-of-fit.  Assertions have word-to-world direction-of-fit.  Ordinarily, if I say, “The lights are on,” I’d be making an assertion.  I’d be representing the world as being a certain way, and if there is a mismatch between what I say and how the world is, it’s my words that should change, not the world.  Other utterances, such as imperatives, have world-to-word direction of fit.  Ordinarily, if I say, “Close the window,” I’d be issuing an imperative.  I’d be representing the world as it ought to be, from my point of view, and if there is a mismatch between that representation and the world, it’s not my words that should change, but the world.  Declarations seem to have dual direction-of-fit.  Ordinarily, if I say, “This meeting is over,” I’d be making a declaration.  On the one hand, I would be representing the world as being a certain way – namely, such that I’ve ended the meeting.  On the other hand, by uttering those words sincerely, I would make it the case that I had ended the meeting.

In recent work (Alfano 2013, Alfano forthcoming), I’ve argued that virtue and vice attributions can be fruitfully compared with declarations.  This is especially so when the attributor holds a position of power.  Easy examples are ‘convicted felon’ and ‘sexual offender’.  If the appropriate legal authority designates someone a convicted felon, then that’s what he is.  The labeling is a declaration: it represents him as being a certain kind of person, but the label applies partly because the declaration has been made.  Such declarations enable or disable their targets in various ways.  In South Carolina, a convicted felon may not serve in the state legislature.  In Arkansas, convicted felons may not possess firearms, even for the purpose of hunting.  In Pennsylvania, a sexual offender must register with the state police, and his name, year of birth, address, and photograph are available online for all to see.

You can see some of the ambiguity engendered by declarations like these.  It’s hard to imagine a convicted felon who has not been declared a felon.  It’s all-too-easy to imagine a sexual offender who has not been so declared.  Many of our virtue and vice terms, I want to claim, exhibit this kind of ambiguity between their assertoric use and their declarative use.  On a more positive note, consider the term ‘genius’.  When used assertorically, it refers to a person who has extraordinary cognitive, aesthetic, or creative abilities.  But it can also be used declaratively, as when the MacArthur Foundation picks out its yearly list of fellows or that blowhard Harold Bloom picks out one hundred people (only ten of them women) to laud.  A test for whether a term t admits of both assertoric and declarative uses is to see whether it could make sense to say, “Some ts are not ts,” where the first use is declarative and the second assertoric.  For instance, some sexual offenders are not sexual offenders; that is, some people who have been declared sexual offenders never actually committed sexual offences.  Likewise, some geniuses are not geniuses.  For instance, Bloom declares James Boswell a genius, when in fact he was merely a toady.  In contrast, it simply makes no sense to say that some even numbers are not even numbers.  There’s no declarative use of ‘even’.

The declarative use of the language of intellectual virtue and vice, especially in official contexts, can have self-confirming effects, and the channels through which these effects flow include both the people who make the declarations (teachers, administrators, parents) and the people about whom they are made (students).  Many states have special programs for “gifted and talented” youths.  I couldn’t tell you in any robust sense what it takes to be a genius, but I have a pretty good idea of what it means to be gifted and talented.  And I can assure you that some gifted and talented youths are neither gifted nor talented.  That is to say, some of the students designated in this way are unremarkable.  Nevertheless, when they are so designated, they receive special treatment in school, their intellectual self-esteem goes up, and they often outperform otherwise comparable but unlabeled peers (Hoge & Renzulli 1993).

The contrary phenomenon also occurs.  In a shocking longitudinal study, Ray Rist (1973) followed the academic careers of a cohort of students in a de facto segregated school in St. Louis from kindergarten through the end of second grade.  By the end of the second week of kindergarten, these students had been assigned permanent seats at one of three tables based on what their teacher called their “qualities,” which she assessed intuitively and which just so happened to be highly correlated with the students’ socioeconomic status.  Those at Table 1 received more and more positive attention, and unsurprisingly learned more and faster.  Those at Tables 2 and 3 received less and less positive attention; unsurprisingly, their school year proceeded apace.  By the end of kindergarten, the Table 1 students had an objective record of higher achievement, better behavior, and better motivation.  The caste system was further cemented in first grade, and by the time second grade rolled around, the top table was populated by “Tigers,” the middle table by “Cardinals,” and the last table by “Clowns.”  The labels fit, but that was so largely because they had been applied in the first place.  When someone in a position of power, such as the kindergarten teacher, applies a label, over the course of time it can function as a self-fulfilling prophecy.  The process is of course not inevitable, and students’ success or failure doesn’t depend solely on how they are labeled.  This introduces some daylight between trait attributions and standard declarations.  When a duly empowered judge declares a couple married, they are wed right there on the spot.  In contrast, when a teacher labels someone an A-student, he comes to fit that description – if at all – only over time, and only through the continued signaling of expectations by the teacher and others, as well as his own self-concept.


2 Components of Character


If this view is sound, it suggests certain revisions in how we conceptualize intellectual virtues.  Stacking the deck a bit in my own favor, I’ll call a conception naïve if it countenances only first-order dispositions as components of virtue.  Such first-order dispositions might include, among other things, dispositions to notice, think, construe, feel, want, deliberate, act, and react in characteristic ways.  Naïve views are understandably attractive.  Without going too deep into the scholarship, I think it’s fair to say that Aristotle held, or at least that most contemporary Aristotelian virtue theorists hold, a naïve conception of virtue, according to which a virtue is a disposition to do the right thing for the right reason.

In some of my recent writings on this topic, I’ve suggested that, in light of both empirical evidence and theoretical arguments, the naïve conception should be replaced with either a sophisticated internalist or a sophisticated externalist conception.  Both conceptions countenance second-order dispositions as components of virtue.  According to the sophisticated internalist conception, for a person to possess a particular virtue is for her to have a suite of first-order dispositions as well as certain second-order dispositions to notice, construe, think, feel, and want in characteristic ways.  The distinction between first- and second-order dispositions can be illustrated with the example of open-mindedness.  Somewhat roughly, on the naïve conception of open-mindedness, someone counts as open-minded just in case she is disposed to notice when others are inclined to disagree, to construe disagreement not as a threat but as an opportunity, to think carefully about the evidence others bring to bear when a disagreement is at hand, to want to learn from others rather than merely prevail in convincing them, and so on.  On the sophisticated internalist conception, she must also be disposed to think of herself as open-minded, to want to maintain her integrity in the face of disagreement, and so on.  I call these dispositions second-order because they are dispositions to token second-order mental states, such as thoughts and desires about oneself.  While it may be tempting to think that overlaying first-order dispositions with second-order dispositions leaves them unchanged, this layer-cake approach is conceptually and empirically dubious.  Our beliefs and desires about our own beliefs and desires don’t skate across the surface of the first-order mental states.  They interact with them.

The sophisticated externalist conception of virtue goes a step further.  In addition to countenancing the first-order and second-order dispositions of the possessor of virtue, it countenances the (signaling of) second-order dispositions of other people in the possessor’s social milieu.  The idea here is that your being virtuous might somehow involve other people being disposed to think that you have certain first-order dispositions, to expect you to act from those dispositions, to prefer that you have some dispositions rather than others, and so on.  In fact, what’s most important here is not that other people have these second-order dispositions, but that they signal them, that they convey them to you in some way.  I call this conception externalist because it says that some of the dispositions that constitute your virtue are actually outside your skin.  Your virtue, on this model, isn’t a monadic property of you as a person, but rather a relation between you and other people.

There are weak and strong versions of the sophisticated externalist conception.  According to the strong version, someone’s possession of virtue necessarily involves (the signaling of) others’ second-order dispositions.  According to the weak version, someone’s possession of virtue only possibly involves (the signaling of) others’ second-order dispositions.  Both versions of the view can be seen as the next step in the externalization of psychology.  In the 1970s, Kripke (1972) and Putnam (1975) popularized the idea that mental content is external, that the meaning and reference of some words is not determined solely by what’s in the heads of people who use those words.  In the 1980s, Nozick (1981) and Dretske (1981) introduced the notion that one’s justification for a given belief might not be determined solely by what’s in one’s head.  In the 1990s, Clark and Chalmers (1998) suggested that the mind itself might extend beyond the limits of the skin.  The current proposal is that some psychological dispositions – namely character traits – extend beyond the limits of the skin of their possessor.  On the weak version of the view, a given character trait needn’t so extend, but may.  On the strong version, (the signaling of) others’ second-order dispositions is always a component of trait possession.  The strong version of the thesis is obviously more revisionary than the weak one, but it is also more parsimonious.  The weak version of the thesis may seem unsatisfyingly disjunctive, since it entails that any given character trait is multiply realizable: one instance of curiosity might involve first-order and second-order dispositions contained solely within the agent, whereas another might involve those as well as second-order dispositions located outside the agent.  
For my current purposes, it doesn’t matter whether the strong or the weak version of the sophisticated externalist conception wins out, because I want to talk about a specific family of cases in which we have good reason to think that some of the relevant dispositions are external.


3 Stereotype Threat and Stereotype Lift


The foregoing discussion has been a somewhat roundabout way to introduce my ultimate target: stereotype threat.  For decades, some ethnic minorities (African Americans, Latinos) and other marginalized groups (women, the poor) have performed worse than average on a variety of tests of ability, performance, and skill.  How is this phenomenon to be understood?  In a one hundred and twenty-three-page article, Arthur Jensen (1969) argued that the best interpretation attributed the differences between ethnic groups not to environmental or situational influences but to heritable traits, especially intelligence.  His controversial thesis provoked a deluge of responses, from protests and rebuttals to encomia and attempts to corroborate.  In fact, Jensen’s provocation may be the single most cited article in the entire field of intelligence research.[2]

More recently, Richard Herrnstein and Charles Murray (1996; see also Benbow & Stanley 1980, 1983; Kanazawa 2008; Templer 2008; and Templer & Arikawa 2006) in The Bell Curve: Intelligence and Class Structure in American Life revived a version of Jensen’s argument.  Among the many claims made in The Bell Curve, several had to do with differences between groups, including:

  • East Asians are on average more intelligent than whites (p. 272).
  • Latinos are on average less intelligent than whites (p. 275).
  • African Americans are on average less intelligent than Latinos (p. 276).

Herrnstein & Murray (1996) go on to identify what they take to be the consequences of these differences in intelligence:

  • Intelligence as measured by a standardized IQ test is a stronger predictor than parental income of adult income (p. 135).
  • Intelligence as measured by a standardized IQ test is a stronger predictor than parental income of level of educational achievement (p. 154).
  • Less intelligent people are so much more accident-prone than their intelligent counterparts that they are more frequently unemployed through disability (p. 162).
  • Less intelligent people are more likely to be idle (p. 155), i.e., to “drop out” of the labor force without a “legitimate reason” (p. 157).
  • Less intelligent people are more likely to be arrested for (p. 246), be convicted of (p. 247), and to serve prison time for (p. 248) criminal behavior.

It’s not hard to connect the dots.  Minorities are poor because they’re stupid.  Their low IQs prevent them from reaching a high level of education.  In fact, they’re so unintelligent that they injure themselves more frequently and more severely than their high-IQ counterparts.  Worse still, their low intelligence is somehow connected with laziness, and even leads them to lives of crime.  Women earn less than men because they’re not as bright, especially when it comes to math.  Worst of all, the dumbest among us are out-breeding the rest, in part because of well-intentioned but misguided welfare policies that aim to alleviate the poverty of mothers, especially single mothers (pp. 548-9).

Philosophers have paid scant attention to these jeremiads.  To my knowledge, the only published response to The Bell Curve by a philosopher is Ned Block’s semi-popularizing essay, “How Heritability Misleads about Race” (1996).  Most philosophy of race focuses instead on the meaning of ‘race’ and racial predicates (e.g., ‘black’, ‘white’, ‘Latino/a’, ‘Asian’).  Various positions on these topics have been staked out in the logical space, but they do not seem particularly helpful in responding to Herrnstein and Murray and their fellow travelers.  In fact, Ron Mallon (2004, 2006) persuasively argues that all prominent theories of race – from race skepticism to various forms of race constructivism to suitably chastened biological-naturalistic accounts of race – agree on the basic metaphysical facts, differing only in their semantic and normative perspectives.

Whatever the meaning of ‘race’ and racial predicates, the race gap in intelligence testing undeniably exists, often with depressing consequences.  However races and ethnicities are individuated, then, it might seem that intellectual virtues are unevenly distributed across different groups.  It would be rash, however, to draw such an inference without exploring the many ways in which differences in performance can be explained.  Just as the behavior of cats results from a confluence of their various dispositions and the interaction of those dispositions with their environment, so the behavior of people in the context of intelligence testing results from a confluence of their dispositions and the interaction of those dispositions with their environment.  For creatures as complex as cats, monocausal explanation is to be eschewed.  Given that people are at least as complex as cats, such explanation is dubious in our case as well.

Herrnstein and Murray rush to attribute all differences in performance to differences in innate first-order dispositions.  Here, I want to explore how much of the difference in performance might be attributable to situational second-order dispositions – both internal and external.  I should clarify from the outset that in what follows I do not attempt to explain all of the observed differences situationally.  There is a clear case to be made for the importance of developmental differences between groups in the United States, given the legacy of slavery, overt discrimination, structural discrimination, implicit bias, lack of opportunity, lack of role models, internalized inferiority, lack of material resources, grinding poverty, and many other factors.  These all matter, and I do not want to downplay their importance, but it seems to me that, in addition to these developmental issues, there is a case to be made for the influence of situational pressures on both the internal and the external second-order dispositions that partially constitute intellectual virtue.


3.1 Stereotype Threat to Racial and Ethnic Minorities


In particular, I want to suggest that the combination of stereotype threat for minority test-takers and stereotype lift for majority test-takers may account for as much as half of the race and gender gaps in various measures of academic achievement and ability.  Stereotype threat was discovered in 1995 by Claude Steele and Joshua Aronson.  They began with the idea that if you’re worried that others will treat your performance on a task as emblematic of your group, and your group is stigmatized as low-performing or low-ability on that task, then you will experience a level of threat that people from another group might not.  In particular, since there is a stereotype in the United States that African-Americans are poor students, they will experience a level of threat that white students do not experience on the same task.  This experience in turn mediates performance: the more nervous you are about the inferences others might draw about your group based on your individual performance, the worse you do on the test.

To demonstrate this, Steele & Aronson (1995) conducted an experiment with African-American undergraduates at Stanford University.  The participants were randomly assigned to one of two groups.  Only the first group was told that the test they were about to take was diagnostic of ability.  Thus, their threat level was increased: if they performed poorly, it could reflect poorly on their whole group.  As predicted, the students in the first group underperformed their matched peers in the second group.  That is to say, merely being told that the test they were about to take was indicative of ability led to performance decrements.

Now, one might respond to this by saying that it demonstrates nothing about stereotypes in particular.  After all, it could be that raising the stakes of a test would make anyone – regardless of their racial or ethnic identity – more anxious, leading to poorer performance.  To rule out this possibility, Steele & Aronson (1995) conducted another experiment.  As before, the participants were African-American students and were randomly assigned to one of two treatments, but this time, the only thing that separated the treatments was that one group filled in a demographic survey before the test, whereas the other filled in the same survey after the test.  The idea behind this experiment was that merely asking students to indicate their race before taking the test would raise their threat level and lead to worse performance, whereas asking about demographics after the test might raise their threat level but would of course leave their prior performance unaffected.  As predicted, the students who were prompted to think about their group membership just prior to taking the test underperformed their matched peers who answered the same questions just after taking the same test.

Since Steele & Aronson’s groundbreaking work, psychologists have begun to explore just how wide-ranging this phenomenon is.  Subjecting someone to stereotype threat induces not only performance decrements, but also measurable bodily changes.  In one study, African-Americans exhibited larger increases in mean arterial blood pressure than European-Americans while taking the same high-stakes test (Blascovich et al. 2001).


3.2 Stereotype Threat to Women


Latinos and Latinas in the United States also suffer from stereotype threat in academic contexts (Schmader & Johns 2003), but racial and ethnic minorities are not the only groups harmed by this phenomenon.  Whereas African-American students appear to face stereotype threat across academic subjects, women experience it only in the STEM fields, especially mathematics.  Although women who endorse the negative stereotype about their sex’s mathematical prowess are especially susceptible to threat (Schmader, Johns, & Barquissau 2004), women who reject the stereotype also experience decrements under threatening conditions (Kiefer & Sekaquaptewa 2007).  One of the truly insidious things about stereotype threat is that it doesn’t require that anyone – the target of threat or those in contact with the target – actually believe in the accuracy of the stereotype.  All that’s required is that the target comes to think that her performance might be treated as emblematic or representative of her group.  This thought can be triggered explicitly, by announcing at the outset that the task is a test of a stereotyped ability, or of differences between groups.  But it can also be triggered subtly, for instance by administering a demographic survey that happens to ask about the relevant group identity.  Danaher & Crandall (2008) estimate that roughly five thousand more women per year would receive advanced placement credit for calculus if demographic data were collected after the test rather than before it.

Worse still, the effects of stereotype threat are quite powerful.  According to a careful meta-analysis conducted by Nguyen & Ryan (2008), the average effect size for women and racial minorities is a Cohen’s d between 1.2 and 1.6, depending on the group and the method of threat activation.  For those unfamiliar with this statistic, I should emphasize how large it is.  Cohen’s d is the ratio of the difference between group means to the standard deviation on the measure in question, so this basically means that stereotype threat tends to lead to a performance decrement of more than an entire standard deviation.  For context, this is like finding a drug that could decrease the average American male’s height by about 4 inches, or a geoengineering intervention that decreased the number of rainy days per year in Eugene, Oregon, by about 50.
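
For readers who would like to see the statistic computed, here is a minimal sketch in Python.  It assumes the common pooled-standard-deviation form of Cohen’s d, which may not be the exact variant used in any particular study, and the function name cohens_d and the two lists of scores are hypothetical, invented purely for illustration; they are not data from Nguyen & Ryan or anyone else.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(control, treated):
    """Difference between group means divided by the pooled sample standard deviation."""
    n1, n2 = len(control), len(treated)
    s1, s2 = stdev(control), stdev(treated)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(control) - mean(treated)) / pooled_sd


# Hypothetical test scores (made up for this example, not taken from any study).
no_threat = [70, 75, 80, 85, 90]
threat = [60, 65, 70, 75, 80]

d = cohens_d(no_threat, threat)
print(round(d, 2))  # 1.26
```

With these made-up numbers the group means differ by 10 points and the pooled standard deviation is just under 8 points, so d comes out a little above 1.2: the threatened group scores more than one full standard deviation below the unthreatened group.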


3.3 Stereotype Threat to Low-SES Individuals


Other minorities also experience stereotype threat.  Even controlling for race and gender, students from low-SES families tend to perform worse on a variety of tests of reasoning and intelligence.  One could jump to conclusions like Herrnstein and Murray, attributing their poor performance to genetic inferiority, but to do so would be to ignore the possibility that they too experience a kind of stereotype threat.  And in fact this possibility has been borne out.  When low-SES students are told that a test they’re about to take is diagnostic of intellectual ability, they perform much worse than when they are told that the same test is not diagnostic (Croizet & Claire 1998).  Similarly, when told that the purpose of the task is to determine why low-SES students generally perform worse on academic tests, they do in fact perform worse, but when told that the purpose of the task is to investigate problem-solving processes, they do not (Harrison et al. 2006).


3.4 Stereotype Threat Beyond Academic Contexts


It’s natural to imagine that stereotype threat must have something in particular to do with academic tests, but the phenomenon is much broader than that.  Women often find negotiation threatening.  Kray, Thompson, & Galinsky (2001) showed that women perform worse in pricing negotiations when they are told that the task is diagnostic of ability than when they are not so told.  Kray, Galinsky, & Thompson (2002) upped the ante by showing that the effect of stereotype threat for sex can be flipped in the context of mixed-gender negotiations.  If, prior to the negotiation, all participants are told that people who are assertive, good at problem-solving, and highly self-interested tend to succeed in the negotiation task, then the women do worse than the men, but if they are told that people who are emotional, insightful, good at listening, verbally expressive, and well-prepared tend to outperform, then the men do worse than the women.[3]

Some groups are targeted by ambivalent stereotypes, or pairs of oppositely-valenced stereotypes.  For instance, the elderly are sometimes stigmatized as senile, but sometimes stereotyped as wise.  It turns out that priming elderly participants with one of these stereotypes leads to congruent performance on a test of recall: those who are reminded of the old-is-senile stereotype recall fewer words, whereas those who are reminded of the old-is-wise stereotype recall more (Levy 1996).

Stereotype threat even arises in the context of athletics.  White athletes putt most accurately when told that the experiment tests sports intelligence, less accurately when the task is framed as an investigation of sports psychology, and least accurately when told that it’s about natural athletic ability.  Exactly the opposite pattern holds for black athletes (Stone et al. 1999; see also Stone 2002).

Perhaps the most amusing investigation of stereotype threat is due to Yeung & von Hippel (2008), who found that Australian women are more than twice as likely to run over a jaywalker in a driving simulation when they are told that the purpose of the study is to investigate why women are bad drivers than when the negative stereotype is not primed.  The increased probability of killing the jaywalker was equivalent to what happens when the driver performs a secondary task, such as talking on a cell phone.


3.5 Stereotype Lift


Even more interesting is the phenomenon that’s come to be known as stereotype lift.  In a threatening situation, participants perform worse than controls.  In some circumstances, though, being primed with one’s group identity can lead participants to outperform controls.  This occurs chiefly when the target identity is stereotyped as especially adept or skilled at the task in question.  The first investigation of stereotype lift was conducted by Shih, Pittinsky, & Ambady (1999).  Their subjects were East Asian-American women.  In the United States, East Asians are stereotyped as good at math, whereas women are stereotyped as bad at math.  What Shih and her colleagues did was to prime some participants with their ethnic identity, some with their gender identity, and some not at all.  Those who had been reminded of their ethnic identity correctly answered 54% of the questions on a difficult math test; those who had been reminded of their gender identity answered only 43% correctly; and those who were primed with neither group identity answered 49% correctly.  These differences might seem small, but the average quantitative SAT score of the women in this study was 750 out of 800, which cut down considerably on the variance in their mathematical abilities.  To show that this effect was driven by the stereotype rather than by some inborn characteristic, Shih and her colleagues replicated the study in Vancouver, where pre-testing suggested that the gender stereotype persisted but the ethnic stereotype did not.  In this second experiment, the participants in the control condition and those in the ethnic-priming condition were indistinguishable.

This suggests that stereotype threat and lift may sometimes be induced through comparative rather than absolute judgments about stereotyped groups.  In fact, according to Walton & Cohen’s (2003) meta-analysis, white men in the United States tend to experience a small but significant lift (d = .24) whenever they are primed to think of a group on whom they can look down.  However, it turns out that even they can be threatened by being primed to think about a group that is stereotyped as better than them at a particular task.  For instance, the Asians-are-good-at-math stereotype is so strong in the United States that white men experience stereotype threat during a math test when primed to compare themselves with Asians (Aronson et al. 1999).


3.6 Overcoming Stereotype Threat


Before proceeding, I should sound a more uplifting note.  Although stereotype threat seems to be ubiquitous, and stereotype lift seems to benefit those who need it least, some ways of overcoming threat have been successfully explored.  One of the most promising is the self-affirmation paradigm, in which stigmatized individuals write for about fifteen minutes about why something they personally value is important to them.  This short exercise has been shown in a laboratory context to eliminate the effects of stereotype threat on women taking a math test (Martens et al. 2006).

Even more impressive are a pair of field studies aimed at reducing the race and gender gaps in high school and college respectively.  In the first (Cohen et al. 2006), students in a mixed-race high school in Massachusetts were split into self-affirmation and other-affirmation groups.  A couple of weeks into the semester, the self-affirmation group identified something they valued, then wrote for fifteen minutes about why they valued it, while the other-affirmation group identified something they didn’t value, then wrote for fifteen minutes about why someone else might value it.  The primary outcome variable for this field study was GPA in the class at the end of the semester.  Whereas white students’ GPAs did not differ across the two conditions, black students in the self-affirmation condition ended up with GPAs roughly .3 greater on a 4.0 scale – basically the difference between a C+ and a B-, or a B+ and an A-.  This corresponded to a decrease in the racial achievement gap at that school of 40%.  Moreover, a sentence-completion task administered several weeks after the writing exercise revealed that the concept of race had been made less accessible to the black students in the self-affirmation condition: they were least likely to fill in the gaps in the sentences with stereotypical words.

Another self-affirmation study (Miyake et al. 2010; see also Good, Aronson, & Harder 2008) found a similar alleviation of threat for women in a college physics class.  While the men in this class were pretty much unaffected by the self-affirmation vs. other-affirmation manipulation, women in the self-affirmation condition performed considerably better than their peers in the other-affirmation condition.  In fact, their modal improvement was an entire letter-grade – from C to B.  The effect of the self-affirmation intervention was strongest for women who had previously endorsed the women-are-bad-at-physics stereotype, but had some effect across the board.  Overall, this seemingly trivial intervention closed the gender gap for women in the self-affirmation condition by 61% for in-class examinations and entirely for a standardized test of conceptual mastery administered at the end of the semester.


4 Stereotypes and Intellectual Virtues


There are further studies I could cite in this context, but I hope that the rough outline of the phenomenon is now clear.  Across a variety of task domains and group identities, people are susceptible to both stereotype threat and stereotype lift, often with large, measurable, real-world effects.  These effects can be induced in a variety of ways, from overtly telling participants that the test is diagnostic of ability in the domain or that the purpose of the task is to investigate group differences, to much subtler primes.  Merely asking for demographic information seems to trigger stereotype threat, as does being a “solo” visible minority (Stangor, Carr, & Kiang 1998; Inzlicht et al. 2006).

Although we now have a decent grasp on what kinds of conditions will lead to stereotype threat and stereotype lift, less is understood about their mechanisms.  Among the many potential mediating variables that have been proposed are anxiety, self-efficacy, evaluation apprehension paired with a shift towards caution, divided attention, demotivation paired with effort withdrawal, and expectancy confirmation.  Each of these constructs seems to partially mediate the effect, but none does the job all on its own (Spencer, Steele, & Quinn 1999; Cadinu et al. 2003).  One exception is the work of Schmader & Johns (2003) who argue that, at least in the case of women’s performance on math tests, the effect of stereotype threat is fully mediated by decreased working memory capacity, which in turn is thought to be closely related to the fluid component of g, or general intelligence.

Even if no single variable fully mediates the effects of both stereotype threat and stereotype lift across all groups and domains, however, it should now be plausible to suppose that performance is not solely and directly a function of first-order cognitive dispositions.  Second-order internal cognitive dispositions, such as thinking that you belong to a certain group, worrying that your performance on a task might be treated as representative of your group, knowing that others stereotype your group in a given way, and so on, all play a role.  Moreover, these second-order internal dispositions exert top-down influence on first-order cognitive dispositions.  As I mentioned in the previous paragraph, your working memory capacity – a first-order disposition if anything is – may go down in conditions of threat.  It’s also been shown that high-self-monitoring individuals are less susceptible to stereotype threat than low-self-monitoring individuals (Inzlicht et al. 2006).  Self-monitoring is a second-order internal cognitive disposition, and it moderates the effects of stereotype threat.  Furthermore, even just learning about the phenomenon of stereotype threat seems to block it, at least for women taking math tests (Johns, Schmader, & Martens 2005).  Since its content includes first-order dispositions, knowledge that you might be susceptible to stereotype threat is a second-order disposition.

These observations also help to make a case that (the signaling of) second-order external dispositions may sometimes partially constitute intellectual virtue or the lack thereof.  What worries someone who experiences stereotype threat is that other people might treat her performance as representative of her group.  In other words, other people’s dispositions to judge the target’s first-order dispositions can influence how those first-order dispositions get expressed – or at least, how those second-order dispositions are perceived can do this.  Marx & Goff (2005) showed that when the test-giver belongs to the same stereotyped group as the test-taker, threat is diminished.  Presumably this flows from the test-takers’ assumption that a fellow target of the stereotype will be less willing to find confirmation of the stereotype in their performance.

It seems to me that there are three models that might accommodate these insights: the first-order model, the second-order internalist model, and the second-order externalist model.  According to the first-order model, all that it takes to have good intellectual character is that you possess a cluster of first-order dispositions, as spelled out above.  On this view, someone who performs poorly in the face of stereotype threat might still be virtuous, but her virtue would be masked or finked by the threatening context.  Conversely, someone who performs well in the face of stereotype lift might not count as virtuous, or at least not as virtuous as he seems, since the stereotype lift could be mimicking real intellectual virtue.  I find this model unappealing, for reasons I’ve spelled out elsewhere.  The main argument is simply this: if intellectual virtues are dispositions that someone who wants to believe the truth and avoid error needs or would want to have, then it’s just obvious that some second-order dispositions contribute to virtue.  Part of what it takes to be a successful investigator and a worthy knower is the desire to know one’s own mind, to monitor one’s own first-order dispositions, to evaluate one’s own intellectual responses.

According to the second-order internalist model, part of what it takes to be intelligent, skilled at mathematics, adept at physics, good at golf, and so on is a cluster of second-order dispositions to think about yourself in characteristic ways, to expect characteristic behavior from yourself, to want to have characteristic first-order dispositions, and so on.  The right second-order dispositions immunize or at least partially immunize you to things like stereotype threat, which would otherwise mask your first-order intellectual virtues.

On the second-order externalist model, all of this is so, but in addition part of what it takes is that other people be disposed to signal certain second-order dispositions to you.  This means that whether you are or become virtuous is not entirely up to you: others could strip you of virtue by failing to signal the right second-order dispositions or by signaling the wrong ones.  Likewise, others could bestow virtue upon you by signaling the right second-order dispositions and not signaling the wrong ones.  There is a longstanding prejudice in favor of the idea that character and virtue are entirely up to their bearer, but it seems to me that this prejudice may need to be abandoned.  To do so would make each of us at once more vulnerable and more responsible – more vulnerable because the virtues we have or lack would be in part due to others, and more responsible because everything we do could potentially contribute to or undermine others’ character.  I leave it to future work to argue that this pair of upshots is to be embraced rather than rejected.



[1] With grateful thanks for comments, discussions, and suggestions to Joshua Alexander, Andrew Conway, Abrol Fairweather, Ron Mallon, Philip Mayo, Carlos Montemayor, Jesse Prinz, Brian Robinson, and John Turri.

[2] As of the writing of this paper, the Web of Knowledge registered 1358 citations.

[3] The experimenters chose these lists of traits based on pre-tests of stereotype content.

  1. Very interesting article (to call it a blog post would be somehow misleading). Thanks for answering in a thorough way to most (or all) possible objections to the view that stereotypes need to be eliminated. After all, it is an enormous disadvantage for all of us, if, e.g., Latinos or women think there is no hope for them to succeed in their professional life and just give up trying (and perhaps eventually start living off someone else, or off the state, or start living illegally).

