The semantic neighborhood of intellectual humility

Here’s a draft of a paper (co-authored with Markus Christen and Brian Robinson) on the semantic neighborhood of intellectual humility.  We are replicating in German and Mandarin, so those who are familiar with Wilfrid Sellars should think of this as the first step in a seriously scientific dot-quotation research programme.

1. Introduction

The study of personality and conceptions of personality has been pursued by psychologists and other researchers in various ways, including among others observations in laboratory settings and field experiments, correlational studies of survey responses, and psycholexical analyses. The present research embodies the last of these methodologies, and is informed by both philosophical theory and mathematical modeling tools developed in physical science.

Psycholexical analysis dates back to Francis Galton’s Measurement of Character (1884). The basic idea is that, all else being equal, a natural language is more likely to include a predicate for a property to the extent that the property is important to those who speak the language. This is not to say that every phrase or term refers. There are no unicorns despite the existence of the term ‘unicorn’. Nor is it to say that everything worth talking about is already represented by a phrase or singular term. Words are sometimes coined because new phenomena come into existence or become important; words are also sometimes coined because extant phenomena could not otherwise be parsimoniously described and explained. Sometimes a speaker coins words to describe or explain phenomena for which a word already exists, but of which the coiner is ignorant. So words that are synonyms (or nearly so) emerge, further emphasizing the importance of the phenomena referred to. Regardless, the rough generalization that there is a strong positive correlation between the importance of phenomena in the lives of the speakers of a language and the probability of the existence of a term in the language that refers to those phenomena is hard to deny. If this is on the right track, studying psychological language is an indirect way of studying the psychological properties people care about.

Psychologists in the psycholexical tradition don’t stop there, though. They also typically argue that the semantic structure of a language reflects to some extent the perceived structure of the phenomena described by the language. In personality psychology, this insight was famously used by Allport & Odbert (1936) to create a semantic taxonomy of thousands of personality-relevant terms, which they argued represents how people conceive of personality. Of course, the step from language to people’s conception of personality is not identical to the step from their conception of personality to actual personality, but it’s natural to think that there will be at least a positive correlation – if only a weak one – between how we think about personality and how personality actually is. This two-step connection (from language about personality to conceptions of personality, from conceptions of personality to actual personality) has been empirically validated by personality models such as the Big Five (Peabody & Goldberg 1989) and Big Six (Ashton et al. 2004; Saucier 1997).

The Big Six includes an H factor that represents facets of personality related to honesty and humility. Intellectual humility seems to involve a consciousness of the limits of one’s knowledge, including sensitivity to circumstances in which one’s native egocentrism is likely to function self-deceptively (Roberts & Wood 2007), though others regard it as more of a “second-order” open-mindedness (Spiegel 2012). In our age of information, intellectual humility has grown all the more relevant. However, little conceptual or empirical work has explored this trait. We think that the psycholexical approach is especially promising in the investigation of intellectual humility because questionnaires are likely to be especially unreliable as measures of this construct. Someone who is genuinely humble is unlikely to report being humble, and someone who reports being humble is unlikely to be humble. Humility – whether intellectual, moral, or otherwise – seems to involve a paradox of self-reference.

Additionally, our investigation is motivated by Aristotle’s insight, reiterated in contemporary philosophy by Roberts & Wood (2007), that a virtue (i.e., a positive value-laden personality disposition or dimension of individual difference) is often best understood in the context of related virtues and the vices they oppose. Put a different way, by contextualizing a term for a virtue in the constellation of its near-synonyms and its near-antonyms, we can create a perspicuous representation of the meaning of the term.

For these reasons, we propose to investigate the trait of intellectual humility psycholexically by comparing ‘intellectual humility’ with both its antonyms and synonyms.

2. Method

Our analysis is based on the assumption that the practice of language is precipitated in dictionaries, lexicons, and other wordbooks. Of particular interest is the thesaurus – a language reference book or database organized to help its users find words related to a concept but having slightly different shades of meaning or connotation. Thesauruses reflect what people in their daily use of language – in particular when writing text – consider semantically similar to a given term. In other words, a thesaurus lists synonyms in a broad sense. Modern thesauruses also list antonyms, which are then again related to a set of their own synonyms.

The present research explores the semantic space of intellectual humility by first identifying the most common synonyms and antonyms of ‘intellectual humility’. Next, by referring to the database (the largest online thesaurus for American English), we associate each identified term with a word-bag, which is the set of synonyms listed for that term. The semantic constellation of a term t is thus an ordered pair (t, {t_syn_1, t_syn_2, t_syn_3, …, t_syn_n}), whose first element is t itself and whose second element is t’s word-bag, i.e., the set of synonyms of t (including t itself). By comparing semantic constellations, we then create a similarity metric by calculating the relative overlap of each pair of word-bags. The similarities calculated in this way are then used in a novel clustering and visualization tool that generates a semantic map of the terms involved.

More specifically:

1)    We identified potential synonyms and antonyms for ‘intellectual humility’ in three ways:

  1. We searched philosophy and psychology journals for articles that discuss intellectual humility; we found 24 papers or related texts (such as calls for proposals, abstracts, and papers).
  2. We performed an Internet search for entries on ‘intellectual humility’ and found 20 entries that dealt in a significant way with the concept.
  3. We identified scales that are used in psychology for constructs that have some similarity to intellectual humility (e.g., the H factor of the Big Six personality inventory).

In all these texts, we identified terms that are used to represent the meaning of ‘intellectual humility’ or its relevant vices.

2)    Four raters who have experience with the philosophical topic of intellectual humility assessed all terms collected in step 1 to determine whether they could be used to express the concept of intellectual humility or a related vice. A term was kept on the list if three out of four raters agreed to do so. In this way, we identified 52 synonyms and 69 antonyms for ‘intellectual humility’. Each term was represented at least in noun form and usually in adjective form also: for example, {tolerance, tolerant}.

3)    We identified all entries for each term generated in step 2 in the database to generate word-bags for each synonym and antonym. For example, the word-bag for ‘tolerance’ included all entries in the database for the term set {tolerance, tolerant}.

4)    Next, we calculated the similarity in overlap between every pairwise combination of word-bags. For example, the word-bag of ‘tolerance’ contains 55 terms and the word-bag of ‘broadmindedness’ contains 40 terms. Twelve terms are contained in both word-bags. Dividing the overlap by the size of the smaller word-bag, the similarity between ‘tolerance’ and ‘broadmindedness’ is 12/40 = 0.3. In this way, the similarity measures are always between 0 (no similarity) and 1 (one word-bag is completely contained in the other word-bag).

5)    We checked for highly similar terms (overlaps > 0.5).[1] We collapsed the word-bags of these terms into a single word-bag to reduce the number of synonyms/antonyms. Conceptually, it’s unclear whether terms that share more than half of their semantic constellations represent genuinely distinct constructs. In this way, we reduced the number of synonyms from 52 to 39 and the number of antonyms from 69 to 46. When two terms were collapsed, our raters kept the term that in their estimation was better known. A new word-bag was created combining those of the two collapsed terms. In cases where the word-bag of term X overlapped by > 0.5 with two or more terms whose mutual overlap was, however, below the cutoff-value, the raters determined collapsing based on the highest mutual overlaps. This occurred 2 times for the synonyms and 8 times for the antonyms. For all condensed word-bags, the similarities were re-calculated. Step 5 was not iterated.

6)    The similarity measures obtained in this way were then used as inputs in a visualization algorithm called superparamagnetic agent mapping, which employs self-organizing agents governed by the dynamics of a clustering algorithm inspired by spin physics to generate denoised low-dimensional representations. To conceptualize this mapping, imagine each term as a particle that naturally repels all other particles. However, as overlap between two terms increases, they become more attracted to each other. Thus, superparamagnetic agent mapping typically produces clumping, where several particles clump together (connoting similarity) while collectively repelling a different cluster (connoting collective difference between the two clusters). It has been shown (Ott et al. 2014) that this method is superior to standard methods such as factor analysis, principal components analysis, and multidimensional scaling in preserving the topology of the data space with clustered data. Since such a map will never precisely display the real topology of the original, high-dimensional space, we calculated for each point on the map the sum of the differences between the point and all its neighbors both in the map and in the original space (normalized to the longest distance in either case). The lower this sum, the better the map displays the real distance distribution of a point from its neighbors in the original space, so this number is a proxy for the quality of the map. To increase the heuristic value of the maps, we rescaled the sizes of the points themselves so that larger points indicate greater topological certainty.

7)    Finally, using an adapted version of the same clustering paradigm (Ott et al. 2005), we identified clusters on the map generated in step 6.

Step 7 generates the maps below that are then used to inform our reasoning about intellectual humility.
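
As a rough illustration of steps 4 and 5, the following Python sketch computes the overlap similarity for a pair of word-bags and lists the pairs that exceed the 0.5 cutoff. It is a minimal sketch under stated assumptions, not the code used for the analysis: the function names are hypothetical, the word-bags are placeholder sets built only to match the sizes in the worked example (55, 40, and 12 shared entries), and dividing by the smaller word-bag is inferred from that example. This measure is often called the overlap coefficient.

```python
def overlap_similarity(bag_a, bag_b):
    """Share of the smaller word-bag that also appears in the other one."""
    a, b = set(bag_a), set(bag_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def pairs_above_cutoff(word_bags, cutoff=0.5):
    """Return (term, term, similarity) for every pair above the cutoff."""
    terms = sorted(word_bags)
    hits = []
    for i, first in enumerate(terms):
        for second in terms[i + 1:]:
            sim = overlap_similarity(word_bags[first], word_bags[second])
            if sim > cutoff:
                hits.append((first, second, sim))
    return hits


# Placeholder entries (not real thesaurus data): 55 and 40 terms, 12 shared.
shared = {f"shared_{i}" for i in range(12)}
tolerance = shared | {f"tol_{i}" for i in range(43)}
broadmindedness = shared | {f"broad_{i}" for i in range(28)}

print(overlap_similarity(tolerance, broadmindedness))  # 12 / 40
```

Pairs returned by `pairs_above_cutoff` would only be candidates for collapsing; in the procedure described above, the raters decided which term to keep and how to resolve cases with several competing overlaps.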

3. Results

We produced three maps to convey our results. Figure 1 is the synonym map, showing the degree of overlap among intellectual humility’s 39 synonyms. The terms predominantly cluster into three groups. The first group (displayed in green) we have labeled the Sensible Self and is exemplified by terms such as ‘comprehension’, ‘responsiveness’, and ‘mindfulness’. We take this cluster to be representative of the notion that an intellectually humble person will be open and responsive to new ideas and information. The second (pink) cluster we call the Inquisitive Self; it is illustrated by terms such as ‘curiosity’, ‘exploration’, and ‘learning’. The difference between the Sensible Self and the Inquisitive Self indicates that there is some difference between seeking new information or ideas and being open to them when they are presented. Third, we have named the blue cluster the Discreet Self, which is typified by ‘humility’, ‘decency’, and ‘unpretentiousness’. Finally, some terms (shown in black) have intermediate positions among these groups (e.g., ‘flexibility’ and ‘tolerance’) and do not fit neatly within any cluster.


Figure 1: IH Synonym map.

Figure 2 shows the results of the antonym map, displaying the degree of overlap between intellectual humility’s 46 antonyms. The first result to notice is that almost all the terms are aligned along one dimension and cluster at each endpoint. We take this to represent the distinction between underrating and overrating. The larger, red cluster can be thought of as the Overrated Self, and includes terms such as ‘vanity’, ‘pride’, and ‘arrogance’. This cluster suggests that one way not to be intellectually humble is to be overly focused on one’s own high status. Overrating oneself is not, however, the only way to fail to be intellectually humble. The opposite endpoint has two closely related clusters that indicate two other ways. There is the Underrated Other in purple (typified by terms such as ‘bias’, ‘prejudice’, and ‘unfairness’) and the Underrated Self cluster in orange, which is similar in that it involves underrating, but the object of underrating is oneself. This cluster is characterized by terms such as ‘diffidence’, ‘timidity’, and ‘acquiescence’. This cluster suggests that there is such a thing as being too humble, such that one’s lack of pride ceases to have any positive value. It is worth noting how close the two (orange and purple) underrated clusters are relative to the (red) overrated cluster. This indicates that there is a higher degree of similarity based on the nature of the rating (over or under) than on who is being evaluated (self or other). Finally, we again see several terms (such as ‘hubris’, ‘chutzpah’, and ‘aloofness’) in white circles in the middle of the line, indicating that these terms do not fit within any cluster. This result should not be surprising since one can be aloof by either overrating oneself or underrating others (or both).


Figure 2: IH Antonym map.

Finally, we mapped all synonyms and antonyms together. We have preserved the colors from the two previous maps. The resulting map preserves many of the structural features of the previous maps, but with a few significant changes. First, it reveals that for the antonyms the linear structure along the poles of the Overrated Self and the Underrated Other is mainly preserved, whereas the terms on the Underrated Self (orange) are in the same region as the terms for the Discreet Self (blue) from the synonym set. Additionally, the distinction between the terms for the Sensible Self (green) and Inquisitive Self (pink) is no longer discernible. This second merger merely indicates that the difference between the Inquisitive Self and the Sensible Self is large enough to be significant when compared to the Discreet Self, but small enough not to be significant when compared to intellectual humility’s antonyms.


Figure 3: Unified synonyms and antonyms map.

4. Discussion & Conclusion

From these results, there are three points we wish to draw out for discussion. First, there is the matter of what the clusters represent. In the antonyms map, we take each cluster to represent a distinct vice, i.e., a different way one can fail to be intellectually humble. For the synonyms, however, two possibilities exist. It might be that each cluster represents a distinct trait, all three of which go by the same name of ‘intellectual humility’. Opposing this semantic diversity thesis is the alternate interpretation that sees each cluster representing a different facet of the single trait of intellectual humility.

Second, consider the merging of the synonym-based Discreet Self and antonym-based Underrated Self in the combined map. We see two possible interpretations. It might be that the discreet aspect of intellectual humility is essentially akin to underrating oneself. Snow (1995) and Taylor (1985) both argue that humility essentially involves recognizing one’s low status or personal faults. If this is right, then either the discreet aspect of humility is more of a vice than a virtue, or the underrated aspect of humility’s antonyms is more of a virtue than a vice. Either way, the valence of one or both of these semantic clusters may need to change. Alternatively, there might be two different traits picked out by these clusters – one a virtue and the other a vice – that are behaviorally similar enough that they are easily conflated. Someone who underrates herself will behave very similarly to a discreet person. They will both not regularly speak up about controversial topics, in praise of themselves, or for their own rights and entitlements, making it difficult to differentiate them behaviorally. There could, however, be an underlying psychological difference that typically goes unobserved. The discreet person may not often attend to evaluating herself, but when she does so, she does it accurately. One who underrates herself, however, may pay significant attention to her own merits, but regularly devalue them. Further research on the behavioral and psychological aspects of intellectual humility and its contraries may help to answer this question.

The final point relates back to the Big Six personality inventory (Ashton et al. 2004; Saucier 1997). As mentioned earlier, the H factor is meant to represent facets of personality related to honesty and humility. The 100-item revised version measures the participant’s humility (specifically her modesty) by having her indicate (dis)agreement with statements such as “I am an ordinary person who is no better than others.” We worry that the Big Six therefore includes in its H dimension items that are better understood as contrary to humility, not allied with or constitutive of it.



[1] This cut-off value was chosen based on a logarithmic count of the long-tailed distance distribution such that the tail was cut off before the beginning of the main mode of the distribution (i.e., the largest mode in a multi-modal distribution).

The recognition heuristic and epistemic injustice

Now for the poet, he nothing affirmeth, and therefore never lieth.

The Defense of Poesy, by Sir Philip Sidney

It’s easy, especially for a white man like me, to take for granted my capability to assert.  If I want to say something — in person, on a blog, to a reporter, to an administrator at my university — all I have to do is open my mouth or start typing.  What could be simpler?

But any particular act of asserting, like any speech act at all, is possible only because it originates in a complex linguistic, social, and cultural matrix.  Some elements of this matrix are obvious and uncontroversial when pointed out.  I can’t say something to you if we don’t speak the same language and have no way of translating from my language to yours.  Likewise, I can’t make an assertion if I’ve established a reputation, like the boy who cried ‘wolf!’, as unreliable: in that case, any intelligent interlocutor would treat the probability of p given that I said ‘p’ as equivalent to the prior probability of p:

P(p | Mark says ‘p’) = P(p)

P(wolf | boy cries ‘wolf!’) = P(wolf)

My word would carry no weight one way or the other.  It’s unclear whether I’ve even made an assertion when my word has no weight — especially if I know in advance that I’m so distrusted.
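
To make the equations above concrete, here is a minimal Python sketch of the Bayesian reading. The function name and every number in it are illustrative assumptions, not estimates of anything. The idea is that a speaker's word carries no weight when they are just as likely to say ‘p’ whether or not p is true, in which case the posterior collapses to the prior.

```python
def posterior(prior, p_say_if_true, p_say_if_false):
    """Bayes' rule for P(p | speaker says 'p')."""
    evidence = p_say_if_true * prior + p_say_if_false * (1 - prior)
    return p_say_if_true * prior / evidence


prior_wolf = 0.05  # hypothetical prior probability that a wolf is present

# The boy who cried 'wolf!': equally likely to cry out either way.
unreliable = posterior(prior_wolf, p_say_if_true=0.5, p_say_if_false=0.5)

# A speaker whose word tracks the truth reasonably well (made-up rates).
trusted = posterior(prior_wolf, p_say_if_true=0.9, p_say_if_false=0.1)

print(unreliable, trusted)
```

With these made-up rates, the unreliable speaker leaves the hearer's estimate where it started, while the trusted speaker raises it well above the prior.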

What if I’ve established no reputation one way or another?  You might think that, in such a scenario, the default should be to trust me, to give my word some, though of course not dispositive, weight.  Call this default assertoric empowerment: an epistemic agent S is default-empowered to assert that p for a range R of propositions just in case S’s saying that p (when p is in R) typically carries some evidentiary weight even with strangers. (I’m drawing here on Searle’s idea of empowerment in The Construction of Social Reality.)

For other kinds of speech acts, it’s obvious that constraints are placed on empowerment.  Not just anyone can issue me directives.  “Eat your vegetables” carries some force when my wife says it to me, but not when the bus boy at a restaurant says it to me.  “Class dismissed” will end my class when I say it, but it won’t end my class when you say it or your class when I say it.  I can’t promise to give you the Grand Canyon for your birthday because I don’t own, and have no way of acquiring, the Grand Canyon.  One needs to be suitably empowered to give people orders, to declare X to be Y, or to promise to Z.

For “pushy” speech acts such as directives and declaratives, default empowerment is highly circumscribed.  There are very few things that any given person is assumed by default to be able to command others to do.  “Stop harming me” is probably one, though that presupposes that the speaker is in fact being harmed.  “Don’t harm me” might work a little better.  Likewise, there are very few things that any given person is assumed to be able to declare.  I can’t declare myself President, declare myself tenured, or name your baby.  Most default declarative empowerments seem to have to do with voluntary affiliations.  I can declare myself a Christian, or an atheist, or a socialist, or gay.  Historically, though, even these kinds of affiliations couldn’t be declared by default.  After the Peace of Westphalia, a German peasant couldn’t declare his own religious affiliation: it was declared for him by his prince.  Until very recently, it was impossible to self-identify as homosexual because there was no concept or word for the category.  Even after the words and concepts were forged, self-declaring as gay was not default-empowered: someone who tried might, instead of being acknowledged, face electroshock therapy.  In 2013, Bangladesh recognized a third gender category of hijras, who are neither men nor women.

Not so, one might think, with assertions.  Unless one is explicitly disempowered because one is severely mentally ill, a very young child, or a notorious liar, one is default-empowered to assert that p for a very wide range R.  I want to challenge this assumption.  Just for starters, consider the fact that in ancient Greece the testimony of a slave was admissible as evidence in a trial only if it was acquired under torture.  This shows that belonging to a certain social category has been enough, historically, to disempower someone from making an assertion unless very special steps were taken.  Surely, though, things have improved in the ensuing centuries.  But how much?  Even in progressive Sweden, a woman’s “no” still means “yes.”  In the USA, a black man’s saying “I’m not resisting arrest” can still lead to charges of… resisting arrest.  Sad to say, default assertoric empowerment does not characterize the epistemic lives of many, many people: whether you’re empowered to say that p depends on which social category you belong to.  In this post, I’ll just assume that it’s clear that the examples of assertoric disempowerment I’ve mentioned are repugnant.  Those who share my sensibilities will agree that women should be default-empowered to say (and mean) no, that black people should be default-empowered to say (and mean) that they’re not resisting arrest, and that it should never be a condition on someone’s assertoric empowerment that s/he first be tortured.

It’s useful, then, to distinguish normative assertoric empowerment from descriptive assertoric empowerment.  On the one hand, default assertoric empowerment shouldn’t depend on the social category the speaker belongs to.  On the other hand, it often does.  What seems to happen all too often can be captured by a relativized version of the empowerment schema:

An epistemic agent S of socio-cultural category C is default-empowered to assert that p for a range R of propositions just in case S/C’s saying that p (when p is in R) typically carries some evidentiary weight.

When descriptive default assertoric empowerment diverges from normative default assertoric empowerment because of the role of the C-variable, we have an instance of social-category-based epistemic injustice.  In other words, if your belonging to a social category that should be irrelevant to whether you are empowered to say that p disempowers you from saying that p, you have been wronged.  (On the other side of the coin, if you are unfairly privileged to say that p only because you belong to a particular social category, a different sort of epistemic injustice has been committed.)  I won’t even attempt to lay out a general account of when people of a given category should or should not be default-empowered to assert that p.  For one thing, I don’t have the space here.  For another, I have no idea how to do so.  What I do want to try in the balance of this post is to convince you that a particularly pernicious form of social-category-based epistemic injustice, in which people’s capacity to make assertions is undermined, is rife in the news, particularly in the coverage of violent ongoing conflicts.

People don’t have time to travel the world in search of everything worth knowing.  We rely on reporters and newspapers to tell us what’s worth knowing.  We expect that, if we’ve chosen an epistemically responsible paper to read, then if it systematically ignores something, that thing isn’t worth knowing about.  One way in which epistemic injustice can crop up, then, is that people who have important assertions to make are systematically ignored because of where they’re from.  If you won’t be heard — and you know that you won’t be heard — then you cannot speak.  If you cannot speak even though you have something important to say, and your silence is determined by the social group you belong to, then epistemic injustice has occurred.

In decades of research, Gerd Gigerenzer and his collaborators have shown that the degree to which something is covered in the news is highly predictive of whether people in other countries recognize that thing.  Moreover, people seem to use the fact that they recognize something to decide whether it is large on some important dimension.  This “recognition heuristic” can be a powerful epistemic tool when the importance of something correlates with how much it gets covered in the news, and hence how many people recognize it and think it’s important.  For instance, Americans are surprisingly good (and better than Germans) at saying which of two German cities is bigger because they tend to recognize only some of them, and almost always say that the one they recognize is bigger.  Likewise, Germans are surprisingly good (and better than Americans) at saying which of two American cities is bigger because they tend to recognize only some of them, and almost always say that the one they recognize is bigger.
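
The decision rule described here is simple enough to sketch in a few lines of Python. This is only my hedged illustration: the function name is hypothetical, and the set of recognized cities is invented for the example rather than taken from any study. When exactly one of the two options is recognized, the rule picks it; otherwise recognition gives no guidance and the rule returns nothing.

```python
def recognition_heuristic(option_a, option_b, recognized):
    """Pick the recognized option when exactly one is recognized, else None."""
    knows_a = option_a in recognized
    knows_b = option_b in recognized
    if knows_a and not knows_b:
        return option_a
    if knows_b and not knows_a:
        return option_b
    return None  # both or neither recognized: the heuristic is silent


# Invented example: someone who has heard of Munich but not Bielefeld.
recognized_cities = {"Munich"}
print(recognition_heuristic("Bielefeld", "Munich", recognized_cities))
```

The rule only works as well as recognition tracks the dimension being judged, which is why the pattern of news coverage matters so much in what follows.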

Population is an important dimension of a city, so it reflects well on major newspapers that their coverage (and hence our recognition and decision-making) tracks city population pretty well.  Indeed, correlations between population, news coverage, and proportion of people recognizing a city tend to be at least .60 and as high as .86.  On the plausible assumption that people from different cities have roughly as much of note to say as one another, high correlations like this indicate that epistemic justice is being served.  In other research, however, I’ve started to document problems with this model when cities outside of the US and Europe are thrown into the mix (see this post and follow-ups on my blog).  Although the correlation between population and coverage is .83 for the New York Times’s coverage of German cities and .77 for Argentine cities, it’s a measly .41 for Turkish cities and drops to .19 when cities from Germany, Argentina, Turkey, Thailand, and Nigeria are considered together.  Ignoring for the sake of brevity a lot of important caveats, the reason for the international discrepancy is that cities outside of Europe are covered much, much less than those in Europe.  Here’s a graph that represents the correlations between ordinal population ranking and ordinal NYT coverage ranking for Germany and the rest of the world:

[Figure: ordinal population ranking against ordinal NYT coverage ranking, for Germany and the rest of the world]


Note the many cities, some of which are quite large, tied for last place with 0 mentions in the NYT.  If you lived in one of those cities between 2000 and 2010 (the dates covered by my analysis), you could not speak to the world — at least, not through the NYT.  Geography determines communicative destiny.

One might think that I’m overstating the case.  After all, maybe nothing important is going on in cities outside of Europe.  Maybe entire cities have lost their default assertoric empowerment because they have nothing worth saying.  Surely, though, you’d admit that whether people are meeting violent deaths in a given area would make that area remarkable.  If a newspaper fails to cover large-scale violence, then it is committing epistemic injustice against the survivors and victims, who presumably want to say something worth hearing about their plight.  The number of people killed in armed conflict is an important dimension of such a horrific event.  One would hope, then, that the amount of news coverage would correlate well with the severity of the horror.  Sadly, this is not so.  To show this, I correlated the number of violent deaths in 2013 in a given area with the number of articles in the NYT that mentioned killing in the area in question.  There were 17 conflicts in which at least 100 people were killed (an arbitrary cutoff I imposed before looking at any correlations).  The correlation between the number of deaths in 2013 and the number of articles mentioning those deaths in 2013 was a paltry .28.  Here’s a scatterplot:

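[Figure: scatterplot of violent deaths in 2013 against the number of NYT articles mentioning those deaths, with regression line and 95% confidence band]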


The blue line is a regression line for the data.  It’s got a shadow around it indicating the 95% confidence interval.  Basically, what this means is that we can be 95% certain that the true regression line lies somewhere in the shaded area.  Notably, this means that, although the point-estimate of the correlation is .28, the real correlation could be positive, negative, or zero.  In other words, for all we know from this data, there is no correlation between the number of people killed in a violent conflict and the number of times that conflict is mentioned in the NYT.
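
For readers who want to check the claim that a correlation of .28 across 17 conflicts is compatible with zero, here is a minimal Python sketch. It is not how the shaded band in the plot was produced: it swaps in the standard Fisher z-transformation for a confidence interval on the correlation itself, and it assumes a Pearson correlation with roughly bivariate-normal data, which the post does not state. The function name is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation r from n pairs."""
    z = math.atanh(r)            # Fisher z-transformation of r
    se = 1 / math.sqrt(n - 3)    # approximate standard error of z
    low = math.tanh(z - z_crit * se)
    high = math.tanh(z + z_crit * se)
    return low, high


low, high = fisher_ci(0.28, 17)
print(round(low, 2), round(high, 2))
```

Under those assumptions the interval runs from roughly -0.23 to 0.67, so it straddles zero, which fits the reading that these 17 data points cannot distinguish a modest positive correlation from none at all.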


