Google Ngram and the Genealogy of Values

In his funeral oration, Pericles described Athenians as valorous, democratic, just, cultured, open, refined, knowledgeable, deliberative, daring, generous, liberal, versatile, adventurous, noble, dutiful, honorable, free, and patriotic.  In his parody of Pericles, the Menexenus, Plato has Socrates (quoting Aspasia) describe them as just, pious, aristocratic / democratic / meritocratic, equal, free, compassionate (described as a vice!), and pure Hellene.  Speaking more in his own voice, in the Republic, Plato calls temperance, wisdom, courage, and justice the primary human virtues.  Aristotle’s Nicomachean Ethics contains a large catalogue of virtues: courage, temperance, generosity, magnificence, magnanimity, pride, good-temper, honesty, wit, justice, and friendship.  Hume has an even more capacious list, which includes at least 70 distinct virtues.

In some recent work, I’ve been examining the geographical diversity of values by mapping out the value-laden terms used in the obituaries of various local newspapers.  Another way to explore values, though, is temporally.  In particular, I’m interested in how values rise and fall relative to one another.  One obvious example is the pride/humility pair.  For the ancients, pride was a virtue and humility a vice.  Christianity reversed that.  What other reversals — in emphasis if not in valence — have occurred?

To help explore this question, I’ve begun using Google’s ngram lab, which tracks the usage of terms in Google’s massive database over the decades and even centuries.  Here are some (very) preliminary results.

First, it looks like humility and pride have done another do-si-do:

pride vs humility


The x-axis represents the year of publication.  The y-axis represents the percentage of total words published that year.  Thus, we can see that ‘humble’ was used more often than ‘proud’ until the late 19th century, during which it took a nosedive.  Of course, this ngram doesn’t tell us whether people were saying “you should be humble/proud” or “you shouldn’t be humble/proud,” but the collapse of ‘humble’ is striking.
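To make the y-axis calculation concrete, here is a minimal Python sketch of how a yearly percentage and a crossover point could be computed. The counts are invented for illustration and are not real ngram data, and the function names `relative_frequency` and `first_crossover` are hypothetical helpers of my own, not part of any Google tool.

```python
def relative_frequency(term_counts, total_counts):
    """Return, per year, the term's share of all words as a percentage."""
    return {
        year: 100.0 * term_counts.get(year, 0) / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0
    }


def first_crossover(freq_a, freq_b):
    """Return the first year in which series a rises above series b,
    having been at or below it in the previous data point (else None)."""
    years = sorted(set(freq_a) & set(freq_b))
    for prev, curr in zip(years, years[1:]):
        if freq_a[prev] <= freq_b[prev] and freq_a[curr] > freq_b[curr]:
            return curr
    return None


# Invented counts, for illustration only (NOT real ngram data).
TOTAL = {1850: 1_000_000, 1875: 2_000_000, 1900: 4_000_000}
HUMBLE = {1850: 60, 1875: 100, 1900: 120}
PROUD = {1850: 50, 1875: 90, 1900: 200}

humble_pct = relative_frequency(HUMBLE, TOTAL)
proud_pct = relative_frequency(PROUD, TOTAL)
crossover_year = first_crossover(proud_pct, humble_pct)

for year in sorted(TOTAL):
    print(year, f"humble={humble_pct[year]:.4f}%", f"proud={proud_pct[year]:.4f}%")
print("proud overtakes humble in:", crossover_year)
```

Note that in this toy data the raw count of ‘humble’ rises every period while its share of all words falls, because the total number of words published grows faster. That is why the chart plots percentages rather than raw counts.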

Second, consider the three most common terms in the deontic square of opposition: ‘obligatory’, ‘permissible’ (not obligatory not to do), and ‘forbidden’ (obligatory not to do).  (I leave out omissible, since it’s a philosophers’ term.)

obligatory vs permissible vs forbidden


Two things are worth noting about this.  First, the forbidden gets more play than the obligatory throughout the writings of the last 400 years.  This should be unsurprising to anyone who’s aware of the Knobe effect and various other demonstrations that norm violations get a lot more attention than norm-conformity.  Second, starting in the late 19th century, permissibility crossed over obligation.  What does that mean?  I suggest that a plausible interpretation is that — as social strictures loosened — the zone of the merely permissible was opened up.  James Fishkin calls this “the zone of indifference or permissibly free personal choice” (1982, p. 23), and argues that any adequate moral theory should recognize it.  But notice that, from a historical point of view, his claim looks like an innovation.  This is not to say that he’s wrong, of course, but it does suggest that he might be drawing on a rather narrow, culturally-bound set of intuitions.

Third, take a look at the ngram comparing ‘autonomy’ with ‘obedience’:

autonomy vs obedience


The period between 1850 and 1950 seems to have been a time of great change!  Obedience, another Christian virtue, plummets while autonomy experiences a steady rise.

Fourth, consider the basic emotions (fear, sadness, surprise, contempt, anger, and disgust).  Every language has words for them.  Every culture uses the same basic facial expression to signal them.  They are keyed to different important features of our environments and social worlds.  How have the words that refer to them been used historically?

basic emotions

Fear, the emotion that tracks threats, is the clear winner.  But there are some interesting changes as well.  Contempt, the emotion that tracks and enforces social hierarchy, surged in the late 18th century and then experienced a slow but steady decline.  Meanwhile, anger and surprise have seen a slight rise in recent decades.  Does this suggest anything?  Well, social hierarchy is still here, but its pervasiveness and hegemony have declined somewhat.  I’m not sure how to interpret the results for ‘anger’ and ‘surprise’.

One last ngram, and one that would have made Nietzsche happy: ‘bad’ versus ‘evil’:

bad evil

Nietzsche famously argues in the first essay of the Genealogy of Morals that Christianity instigated a slave revolt in morals, during which the good/bad distinction was inverted into the evil/good distinction. (What was good in the aristocratic culture became evil in Christian culture, while what was bad in aristocratic culture became good in Christian culture.) This ngram suggests that, to the extent that ‘evil’ is on the decline and ‘bad’ is on the rise, this inversion has been partially undone.

Thoughts on methodology?  Suggestions for other comparisons?  Questions?


3 thoughts on “Google Ngram and the Genealogy of Values”

  1. Mark, this is fascinating. Look also at “love” and “good.” Interesting correlation. “Liberty” drops under “equality” for a time; the latter remains steady. Nietzschean virtues like “creativity”, “artistic”, and “masterful” receive a huge jump in the 20th c. The seven Catholic virtues drop from 1840 steadily to WWII and then stay consistently low.

  2. This direction is quite promising, as far as I can tell.

    I am not entirely certain of its actual significance, but I think for less protracted parcels of time it is helpful to set the “smoothing” to 0. I understand this as allowing for more nuanced interpretations, in which you can consider more specific or narrow event horizons (but a more rigorous statistical analysis would be required to prove this, especially for the most specific tests). Also, I’ve been troubled by the potential distance between what Saussureans would call the signifier and signified. For instance, the inversion you note above between “bad” and “evil” might be superficial, and connote the same moral depravity, or leveling, of modernity (wherein even Christian morality loses its spiritedness and vigor, and we approach some kind of modern epoche, which would cause Nietzsche to roll over in his anti-grave with Apollonian despair and Dionysian irony. The final victory of certain nothings…)

    That being said, this open database is ripe for the picking, but you’re right to be cautious with regard to methodology, especially when it concerns the sometimes intentionally impervious sphere of the most important questions. It might be large enough that chaotic dynamics might be positively identified, which would be a boon to social science, and perhaps the humanities at large. The skills that would confirm or deny this are beyond me at the moment. In any case, I wonder if the same enormity of the dataset would even control for the disjuncture between the signifier and the signified that would be the preferred nitpick of post/structuralists, but that’s just speculation which would require a pretty extensive analysis in itself.

    These are just some thoughts though, and I’d be interested in your comments.


