Integrated draft of “Experimental Moral Philosophy” for SEP

On this blog, I’ve been posting draft sections of the entry “Experimental Moral Philosophy” for the Stanford Encyclopedia of Philosophy.  Thanks to everyone who offered suggestions, comments, and criticisms here and by email.  Here’s something closer to the final draft.  I’ve incorporated feedback to the extent that I can and integrated the parts into a unified article.  It’s not quite done yet, as it needs to be merged with Don Loeb’s work-in-progress on experimental approaches to meta-ethics, but it’s pretty close.

Without further ado…

Experimental Moral Philosophy

Draft 29 April 2013[1]

1 Introduction

Experimental moral philosophy began to emerge as a movement in the last decade of the twentieth century, a branch of the larger experimental philosophy (X-Phi, XΦ) movement.  From the beginning, it has been embroiled in controversy on a number of fronts.  Some doubt that it is philosophy at all.  Others acknowledge that it is philosophy but think that it has produced modest results at best, and confusion at worst.

Before the research program can be evaluated, we should have some conception of its scope.  But controversy surrounds questions about its boundaries as well.  We can call this the demarcation problem.  The distinction between experimental and non-experimental philosophy is clearly not identical to the distinction between a posteriori and a priori philosophy.  Experimental evidence is a proper subset of empirical evidence, which is itself a subset of a posteriori evidence.

One reason the demarcation problem is so fraught is that the movement known as experimental philosophy is relatively new.  Another is that practitioners of experimental philosophy are motivated by concerns that pull in opposing directions.  On the one hand, there is a desire to see experimental philosophy as continuous with the history of philosophy, and thus to view the class broadly.  On the other hand, there is an inclination to think of the movement as novel—even revolutionary, in the smoldering light of its burning armchair logo—and thus to characterize it more narrowly.  When we factor in issues about the scope of moral philosophy in general, things become even more complex.  There are a number of dimensions along which we might wish to understand the field more or less broadly; among the most important are:

1.1 How Historically Rooted is the Research Program?

Admirers are likely to think that, while there is something exciting and distinctive about the experimental philosophy movement that began so recently, there is little room for doubt about its claim to philosophical respectability.  On their view, it is really just an extension of traditional philosophical approaches going back to Aristotle and many others in the philosophical canon, who either would not have seen an important distinction between science and philosophy or were as interested in “natural philosophy” (that is, science) as in anything else.  Distinctions among fields of study, defenders note, are relatively recent phenomena.[3]  What matters is basing one’s claims and arguments on the best evidence one can get one’s hands on.  If that evidence turns out to be experimental, so be it.

1.2 What Counts as an Experiment?

Paradigmatic experiments involve randomized assignment to condition of objects from a representative sample, followed by statistical comparison of the outcomes for each condition.  For example, an experimenter might try to select a representative sample of people, randomly assign them either to find or not to find a dime in the coin return of a payphone, and then measure the degree to which they subsequently helped someone they took to be in need.[4]  Finding or not finding the dime is the condition; degree of helpfulness is the outcome variable.
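
To make the procedure concrete, here is a minimal sketch in Python of random assignment followed by a statistical comparison.  It is only an illustration under stated assumptions: the function names, the sample size, and the helpfulness scores are all invented for the example, and the permutation test is one simple choice of comparison, not the analysis used in the study cited above.

```python
import random
from statistics import mean


def randomly_assign(participants, rng):
    """Shuffle the sample and split it into two equally sized conditions."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(outcomes_a, outcomes_b, rng, n_permutations=5000):
    """Two-sided permutation test on the difference between condition means.

    Returns the share of random relabelings whose mean difference is at
    least as large as the one actually observed.
    """
    observed = abs(mean(outcomes_a) - mean(outcomes_b))
    pooled = list(outcomes_a) + list(outcomes_b)
    n_a = len(outcomes_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if difference >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so the sketch is repeatable
    sample = [f"participant_{i}" for i in range(40)]  # hypothetical sample
    dime, no_dime = randomly_assign(sample, rng)
    # Invented helpfulness scores (0-10 scale); NOT data from any real study.
    helped_dime = [rng.gauss(6.0, 1.5) for _ in dime]
    helped_no_dime = [rng.gauss(4.0, 1.5) for _ in no_dime]
    print("p =", permutation_p_value(helped_dime, helped_no_dime, rng))
```

Because the experimenter, not the participant, decides who finds the dime, any systematic difference in the outcome variable can be attributed to the condition rather than to pre-existing differences between the groups.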

While true experiments follow this procedure, “studies” allow non-random assignment to condition.  Astronomers, for example, sometimes speak of “natural experiments”.  Although we are in no position to affect the motions of heavenly bodies, we can observe them under a variety of conditions, over long periods of time, and with a variety of instruments.  Likewise, anthropological investigation of different cultures’ moral beliefs and practices is unlikely to involve altering variables in the lives of the cultures’ members, but such studies are valuable and empirically respectable.  Still, for various reasons, the evidential value of studies is less than that of experiments, and so it is reassuring that most published research in experimental philosophy involves true experiments.

The distinction between studies and experiments primarily concerns input variables.  Another distinction has to do with the outcome variables.  Currently, the paradigmatic outcome measure for most of experimental philosophy is the survey.  There are decades of top-flight research in personality and social psychology using the survey paradigm, but it is also subject to incisive criticism.  Instead of revealing what people think, surveys may establish what people think they think, what they think the experimenter wants them to think, what they think other participants think, or just something they made up because they had to provide a response.  This is a worry that arises for all surveys, not just for experimental philosophy, but it is especially pertinent in the context of experimental philosophy because participant responses are often treated as equivalent to “what one would say” or even “how one would judge.”

The survey paradigm suffers from other methodological drawbacks as well.  Presumably, morality is not just a matter of what people would say, but also a matter of what they would notice, do, and feel, as well as how they would reason and respond.  Surveys provide only minimal and indirect evidence about these latter activities, and so it would behoove experimental philosophers to employ or devise further outcome measures that would more reliably and directly capture these features of morality.

Arguably, other modes of inquiry such as historical research should be treated as natural and legitimate extensions of paradigmatically experimental approaches.  Indeed, some would treat history itself as a series of quasi-experiments in which different ways of life produce a variety of outcomes.  Once again, what matters most is whether the conclusions of the inquiry are grounded in evidence rising to the standards set by social and natural science.

According to the most capacious view of experimental philosophy, its guiding principle is that when philosophical arguments invoke or make assumptions about empirical matters, those assumptions should be assessed according to the best natural and social scientific evidence available, and that if such evidence is not currently available it should, where possible, be acquired, either by seeking the help of specialists with the relevant scientific training or by having well-trained philosophers conduct the research themselves.  One’s philosophy can be empirically informed, empirically uninformed, or empirically misinformed.  There is no fourth option.  Thus, the term, “experimental moral philosophy” might be replaced by a phrase like empirically well-informed moral philosophy.  While this is unobjectionable in principle, there may still be local objections to the scientific methodology and to the relevance or probative force of empirical research in particular cases – some of which I canvass in the concluding section of this entry.

1.3 Who Conducts the Experimental Investigation?   

A good deal of the work in experimental moral philosophy involves experiments performed by philosophers in pursuit of answers to philosophical questions.  In other cases, philosophers draw on experimental results produced by social scientists.   It is hard to see why it would be important that the philosopher herself have done the experimental work.  Surely what matters is whether the experiment is well-designed, as free of confounds as possible, sufficiently highly powered, carefully interpreted, informed by the existing literature, and so on.  One of the capacities cultivated by philosophical training is the ability to imagine and construct potential counterexamples and counterarguments, so philosophers can be especially insightful in helping to design experiments in such a way that their interpretation is not plagued by such problems.  And when the conclusions aimed at are recognizably philosophical, there seems to be little room for doubt that philosophers should be included in experimental design and interpretation of results.  They are likely to improve the operationalization of the relevant concepts and will then be better able to interpret the results.

By way of illustration, consider the case of Richard Brandt, the well-known utilitarian theorist, who spent time living among the Hopi Native Americans as an anthropological observer in an effort to understand their ethical views.  During that time, he conducted the empirical investigation himself, but in later work he called upon a great deal of research by others, almost all of whom were social scientists by training, not philosophers.[5]  It seems arbitrary to include the former work while excluding the latter, provided that the latter lives up to the same standards that any good science must satisfy.

1.4 How Directly Must the Experimentation be Related to the Philosophy?

Moral philosophers often rely on beliefs about the way the world is and works and how it got that way – assumptions whose empirical credentials are impeccable but remote, or the justification for which is unimportant so long as the justificatory status is well-established.  A philosopher considering the moral permissibility of stem cell research, for example, ought to understand the nature of these cells, gestation and embryonic development, the aims of the research projects employing (or seeking to employ) stem cells, and their prospects for success.  Knowledge about these matters depends in many ways on the results of scientific experimentation, though perhaps conducted long ago and without any expectation of philosophical payoff.

Still, in some cases it is important that an experiment aim at answering a particular question, since this can improve experimental design.  Experiments that merely aim to find out “what will happen when…” tend to be under-theorized, to rely on the wrong distinctions, and even to be uninterpretable.  Well-designed experiments make specific predictions in advance of data-collection, rather than spinning post hoc just-so stories after the data are in.

Furthermore, the proportion of published false-positives is much higher for unexpected and unpredicted results than for expected and predicted results.  Statistical analysis is not deductive inference.  By the standards typically brought to bear in statistical analysis of experimental results (viz., a p-value below .05), there is a one-in-twenty chance that an experiment will yield a “result” even when there is no real effect to detect, in which case the “result” is just a fluke.  Since experimentalists are reluctant to report (and even discouraged by journal editors and referees from reporting) null results (i.e., results where the p-value is at least .05), for every published, unexpected result there may be nineteen unpublished, unexpected non-results.[6]  In other words, unexpected, published results are much more likely than expected ones to be false positives.  For these reasons, especially when the results are surprising, it is important that the experiments have been conducted with the relevant philosophical issues in mind.
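
The arithmetic behind this worry can be sketched in a few lines of Python.  The function names and all of the input values (how often a tested hypothesis is true, and how often a real effect is detected) are assumptions chosen purely for illustration; they are not estimates drawn from the literature.

```python
def false_positive_share(prior_true, power, alpha=0.05):
    """Expected share of significant results that are false positives.

    prior_true: share of tested hypotheses that are actually true
    power:      chance a real effect yields a significant result
    alpha:      chance a nonexistent effect yields a significant result
    """
    false_positives = alpha * (1.0 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


def nulls_per_fluke(alpha=0.05):
    """Expected non-significant results per fluke when no effect is real."""
    return (1.0 - alpha) / alpha


if __name__ == "__main__":
    # All inputs below are illustrative assumptions, not estimates.
    predicted = false_positive_share(prior_true=0.50, power=0.80)
    unexpected = false_positive_share(prior_true=0.05, power=0.80)
    print(f"predicted results:  {predicted:.0%} false positives")
    print(f"unexpected results: {unexpected:.0%} false positives")
    print(f"null results per fluke: {nulls_per_fluke():.0f}")
```

On these assumed numbers, about 6% of significant results are false positives when half of the tested hypotheses are true, but about 54% are when only one in twenty is.  The second function recovers the nineteen-to-one ratio of non-results to flukes for the case in which no tested effect is real.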

1.5 What Counts as Experimental Philosophy, as Opposed to Psychology?

As is often the case, there are examples at either extreme, for which it’s easy to give an answer.  Virtually any question in ethics will have an empirical dimension.  Most people would agree that it is impermissible to torture an animal just for fun and that the reason has something to do with the pain the animal would experience.  No doubt part of our evidence that the animal would indeed suffer is grounded in experimental or at least empirical knowledge of animal psychology.  But it seems overbroad to think of this case as arising in experimental moral philosophy.  Thinking in terms of empirically well-informed moral philosophy appears to vitiate the question of how to characterize a case like this, and arguably captures the most important motivations underlying the more common phraseology.

Likewise, it seems more accurate to think of research showing that people are much more inclined to be organ donors if donation is the default position and those wishing not to donate are permitted to opt out (versus not-donating being the default position and those wishing to donate being permitted to opt in) as psychology or sociology rather than philosophy.  But good thinking about the moral question of what sort of policies, practices, and institutions to adopt ought to take note of this evidence.  Once again, moving the focus from experimental to scientifically well-informed moral philosophy might be thought to vitiate the concern about how to characterize such cases, without sacrificing much that is useful to understanding the various research projects.  Indeed, the problem of what counts as experimental moral philosophy is just a particular application of the vexed (but perhaps unimportant) question of what counts as philosophy in the first place.

Here is a roadmap for the rest of this entry.  In the next main section, I canvass experimental research on moral judgments and intuitions.  In so doing, I describe the various programmatic uses to which experimental results have been put, then turn to specific examples of experimental research on moral judgment and intuition, including intuitions about intentionality and responsibility, and moral judgments modeled by the so-called “linguistic analogy.”  In the main section after that, I discuss experimental results on “thick” (i.e., simultaneously descriptive and normative) topics, including character and virtue, wellbeing, and emotion and affect.  In the final main section, I consider and respond to some potential objections to experimental moral philosophy.

2 Moral Judgments and Intuitions

As I mentioned above, morality and ethics encompass not only moral judgment but moral perception, behavior, feeling, deliberation, reasoning, and so on.  Experimentalists can investigate, and have investigated, moral intuition, moral judgments, moral emotions, and moral behaviors, among other things.  The most thoroughly investigated is intuitive moral judgment, which I discuss in this section.

2.1 Two “Negative” Programs

One project for the experimental ethics of moral judgment, associated with Stephen Stich and Jonathan Weinberg,[7] is to determine the extent to which various moral intuitions are shared by philosophers and ordinary people.[8]  As some experimental philosophers are fond of pointing out, many philosophers appeal to intuitions as evidence or use their content as premises in arguments.  They often say such things as, “as everyone would agree, p,” “the common man thinks that p,” or simply, “intuitively, p.”  But would everyone agree that p?  Does the common man think that p?  Is p genuinely intuitive?  These are empirical questions, and if the early results documented by experimental philosophers survive attempts to replicate, the answer would sometimes seem to be negative.  This raises the question of how much work the adverb ‘intuitively’ is meant to do when it comes out of a philosopher’s mouth.  If it’s just something she says to clear her throat before she makes an assertion, then the fact that intuitions exhibit a great degree of variance may not matter.  If, on the other hand, the claim “intuitively p” is meant to be evidence for p, the philosophers who make such claims should tread carefully.  Despite some recent protests to the contrary, it is clear that prominent philosophers of the twentieth century have placed a great deal of evidential weight on a judgment’s being intuitive.  For instance, in Naming and Necessity, Saul Kripke says, “some philosophers think that something’s having intuitive content is very inconclusive evidence in favor of it.  I think it is very heavy evidence in favor of anything, myself.  I really don’t know, in a way, what more conclusive evidence one can have about anything, ultimately speaking” (1980, p. 42).

Furthermore, the factors that, according to experimental philosophers, predict disagreement about supposedly intuitive philosophical claims are often non-evidential.  If these experimentalists are right, women sometimes find p intuitive, whereas men find ~p intuitive (Buckwalter & Stich forthcoming); Westerners mostly agree that q, but East Asians tend to think ~q (Machery, Mallon, Nichols, & Stich 2004);[9] and people find r plausible if they’re asked about s first, but not otherwise (Nadelhoffer & Feltz 2008; Sinnott-Armstrong 2008; Sinnott-Armstrong, Mallon, McCoy, & Hull 2008).  This leads to the second use of experimental evidence: arguing for the (un)reliability of moral intuitions, and, to the extent that moral judgments are a function of moral intuitions, those as well.  Walter Sinnott-Armstrong (2008) and Eric Schwitzgebel and Fiery Cushman (2012) have recently followed this train of thought, arguing that moral intuitions are subject to normatively irrelevant situational influences (e.g., order effects), while Feltz & Cokely (2009) and Knobe (2011) have documented correlations between moral intuitions and (presumably) normatively irrelevant individual differences (e.g., extroversion).  Such results, if they can be replicated and adequately explained, may warrant skepticism about moral intuition, or at least about some classes of intuitions or intuiters.

2.2 Three “Positive” Programs

Other philosophers are more sanguine about the upshot of experimental investigations of moral judgment and intuition.  Joshua Knobe, among others, attempts to use experimental investigations of the determinants of moral judgments to identify the contours of philosophically interesting concepts and the mechanisms or processes that underlie moral judgment.  He has famously argued for the pervasive influence of moral considerations throughout folk psychological concepts (2009, 2010; see also Pettit & Knobe 2009), claiming, among other things, that the concept of an intentional action is sensitive to the foreseeable evaluative valence of the consequences of that action (2003, 2004b, 2006).[10]

Others, such as Joshua Greene and his colleagues (2001, 2004, 2008), argue for dual-systems approaches to moral judgment.  On their view, a slower, more deliberative system tends to issue in utilitarian judgments, whereas a quicker, more automatic system tends to produce Kantian judgments.  Which system is engaged by a given moral reasoning task is determined in part by personal style and in part by situational factors.[11]

A related approach favored by Chandra Sripada (2011) aims to identify the features to which intuitions about philosophically important concepts are sensitive.  Sripada thinks that the proper role of experimental investigations of moral intuitions is not to identify the mechanisms underlying moral intuitions – such knowledge, it is claimed, contributes little of relevance to philosophical theorizing.  It is rather to investigate, on a case by case basis, the features to which people are responding when they have such intuitions.  On this view, people (philosophers included) can readily identify whether they have a given intuition, but not why they have it.  An example: “manipulation cases” have been thought to undermine compatibilist notions of free will.  In such a case, an unwitting person is surreptitiously manipulated into having and reflectively endorsing a motivation to φ.  Critics of compatibilism say that such a case satisfies compatibilist criteria for free will, and yet, intuitively, the actor is not free.  Sripada showed, however, through both mediation analysis and structural equation modeling, that to the extent that people feel the manipulee not to be free, they do so because they judge him in fact not to satisfy the compatibilist criteria.  Thus, by determining which aspects of the case philosophical intuitions are responding to, it may be possible to resolve otherwise intractable questions.

2.3 An Example: Intentionality and Responsibility

Since Knobe’s seminal (2003) paper, experimental philosophers have investigated the complex patterns in people’s dispositions to make judgments about moral notions (praiseworthiness, blameworthiness, responsibility), cognitive attitudes (belief, knowledge, remembering), motivational attitudes (desire, favor, advocacy), and character traits (compassion, callousness) in the context of violations of and conformity to norms (moral, prudential, aesthetic, legal, conventional, descriptive).[12]  In Knobe’s original experiment, participants first read a description of a choice scenario: the protagonist is presented with a potential policy (aimed at increasing profits) that would result in a side effect (either harming or helping the environment).  Next, the protagonist explicitly disavows caring about the side effect, and chooses to go ahead with the policy.  The policy results as advertised: both the primary and the side effect occur.  Finally, participants are asked to attribute an attitude (or, in the case of Robinson et al. 2013, a character trait) to the protagonist.  What Knobe found was that participants were significantly more inclined to indicate that the protagonist had intentionally brought about the side effect when it was bad (harming the environment) than when it was good (helping the environment).  This effect has been replicated dozens of times, and its scope has been greatly expanded from intentionality attributions after violations of a moral norm to attributions of diverse properties after violations of almost every imaginable kind of norm.

The first-order aim of interpreters of this body of evidence is to create a model that predicts when the attribution asymmetry will crop up.  The second-order aims are to explain as systematically as possible why the effect occurs, and to determine the extent to which the attribution asymmetry can be considered rational.  To organize these aims, I have modeled how participants’ responses to this sort of vignette are produced:


Figure 1: Model of Participant Response to X-Phi Vignettes

In this model, the boxes represent entities, the arrows represent causal or functional processes, and the area in grey represents the mind of the participant, which is not directly observable but is the target of investigation.  In broad strokes, the idea is that a participant first reads the text of the vignette and forms a mental model of what happens in the story.  On the basis of this model (and almost certainly while the vignette is still being read), the participant begins to interpret, i.e., to make both descriptive and normative judgments about the scenario, especially about the mental states and character traits of the people in it.  The participant then reads the experimenter’s question, forms a mental model of what is being asked, and – based on her judgments about the scenario – forms an answer to that question.  That answer may then be pragmatically revised (to avoid unwanted implicatures, to bring it more into accord with what the participant thinks the experimenter wants to hear, etc.) and is finally recorded as an explicit response to a statement about the protagonist’s attitudes (e.g., “he brought about the side effect intentionally”) on a Likert scale.[13]

What we know is that vignette texts in which a norm violation is described tend to produce higher indications of agreement on the Likert scale responses.  What experimental philosophers try to do is to explain this asymmetry by postulating models of the unobservable entities.

Perhaps the best known is Knobe’s conceptual competence model, according to which the asymmetry arises at the judgment stage.  He claims that normative judgments about the evaluative valence of the action influence otherwise descriptive judgments about whether it was intentional (or desired, or expected, etc.), and that, moreover, this input is part of the very conception of intentionality (desire, belief, etc.).  Thus, on the conceptual competence model, the asymmetry in attributions is a rational expression of the ordinary conception of intentionality (desire, belief, etc.), which turns out to have a normative component.[14]

The motivational bias model (Alicke 2008; Nadelhoffer 2004, 2006) agrees that the asymmetry originates in the judgment stage, and that normative judgments influence descriptive judgments.  However, unlike the conceptual competence model, it takes this to be a bias rather than an expression of conceptual competence.  Thus, on this model, the asymmetry in attributions is a distortion of the correct conception of intentionality (desire, belief, etc.).

The deep self concordance model (Sripada 2010, 2012; Sripada & Konrath 2011) also locates the source of the asymmetry in the judgment stage, but does not recognize an influence (licit or illicit) of normative judgments on descriptive judgments.  Instead, the model claims that when assessing intentional action, people not only pay attention to a person’s “surface” self – her expectations, means-end beliefs, moment-to-moment intentions, and conditional desires – people also pay attention to the person’s “deep” self, which harbors her sentiments, values, and core principles.  According to the model, when assessing whether someone intentionally brings about some state of affairs, people determine whether there exists sufficient concordance between the outcome the agent brings about and her deep self.  For instance, when the chairman says he does not care at all about either harming or helping the environment, people attribute to him a deeply anti-environment stance. When he harms the environment, this is concordant with his anti-environment deep self; in contrast, when the chairman helps the environment, this is discordant with his anti-environment deep self.  According to the deep self concordance model, then, the asymmetry in attributions is a reasonable expression of the folk psychological distinction between the deep and shallow self (whether that distinction in turn is defensible is of course another question).

Unlike the models discussed so far, the conversational pragmatics model (Adams & Steadman 2004, 2007) locates the source of the asymmetry in the pragmatic revision stage.  According to this model, participants judge the protagonist not to have acted intentionally in both norm-conforming and norm-violating cases.  However, when it comes time to tell the experimenter what they think, participants do not want to be taken to suggest that the harm-causing protagonist is blameless, so they report that he acted intentionally.  This is a reasonable goal, so according to the conversational pragmatics model, the attribution asymmetry is rational, though misleading.

According to the deliberation model (Alfano, Beebe, & Robinson 2012; Robinson, Stey, & Alfano forthcoming; Scaife & Webber forthcoming), the best explanation of the complex patterns of evidence is that the very first mental stage, the formation of a mental model of the scenario, differs between norm-violation and norm-conformity vignettes.  When the protagonist is told that a policy he would ordinarily want to pursue violates a norm, he acquires a reason to deliberate further about what to do; in contrast, when the protagonist is told that the policy conforms to some norm, he acquires no such reason.  Participants tend to model the protagonist as considering what to do when and only when a norm would be violated.  Since deliberation leads to the formation of other mental states – such as beliefs, desires, and intentions – this basal difference between participants’ models of what happens in the story flows through the rest of their interpretation and leads to the attribution asymmetry.  On the deliberation model, then, the attribution asymmetry originates much earlier than other experimental philosophers suppose, and is due to rational processes.

Of course, single-factor models are not the only way of explaining the attribution asymmetry.  Mark Phelan and Hagop Sarkissian (2009, p. 179) find the idea of localizing the source of the asymmetry in a single stage or variable implausible, claiming that “attempts to account for the Knobe effect by recourse to only one or two variables, though instructive, are incomplete and overreaching in their ambition.”  While they do not propose a more complicated model, it’s clear that many could be generated by permuting the existing single-factor models.

2.4 Another Example: The Linguistic Analogy

John Rawls (1971) famously suggested that Noam Chomsky’s (1965) generative linguistics might provide a helpful analogy for moral theorists – an analogy that Gilbert Harman (2000, 2008; Roedder & Harman 2010), Susan Dwyer (1999), and John Mikhail (2007, 2011) have speculatively explored and which experimentalists have recently investigated (Hauser 2006; Hauser, Young, & Cushman 2008; Mikhail 2007, 2008, 2011; Dwyer 2009).  There are several points of purported contact:

L1: A child raised in a particular linguistic community almost inevitably ends up speaking an idiolect of the local language despite lack of sufficient explicit instruction, lack of extensive negative feedback for mistakes, and grammatical mistakes by caretakers.

M1: A child raised in a particular moral community almost inevitably ends up judging in accordance with an idiolect of the local moral code despite lack of sufficient explicit instruction, lack of sufficient negative feedback for moral mistakes, and moral mistakes by caretakers.


L2: While there is great diversity among natural languages, there are systematic constraints on possible natural languages.

M2: While there is great diversity among natural moralities, there are systematic constraints on possible natural moralities.


L3: Language-speakers obey many esoteric rules that they themselves typically cannot produce or explain, and which some would not even recognize.

M3: Moral agents judge according to esoteric rules (such as the doctrine of double effect) that they themselves typically cannot produce or explain, and which some would not even recognize.


L4: Drawing on a limited vocabulary, a speaker can both produce and comprehend a potential infinity of linguistic expressions.

M4: Drawing on a limited moral vocabulary, an agent can produce and evaluate a very large (though perhaps not infinite) class of action-plans, which are ripe for moral judgment.

Pair 1 suggests the “poverty of the stimulus” argument, according to which there must be an innate language (morality) faculty because it would otherwise be next to impossible for children to learn what and as they do.  However, as Prinz (2008) points out, the moral stimulus may be less penurious than the linguistic stimulus: children are typically punished for moral violations, whereas their grammatical violations are often ignored.  Nichols, Kumar, & Lopez (unpublished manuscript) lend support to Prinz’s contention with a series of Bayesian moral-norm learning experiments.

Pair 2 suggests the “principles and parameters” approach, according to which, though the exact content of linguistic (moral) rules is not innate, there are innate rule-schemas, the parameters of which may take only a few values.  The role of environmental factors is to set these parameters.  For instance, the linguistic environment determines whether the child learns a language in which noun phrases precede verb phrases or vice versa.  Similarly, say proponents of the analogy, there may be a moral rule-schema according to which members of group G may not be intentionally harmed unless p, and the moral environment sets the values of G and p.  As with the first point of analogy, philosophers such as Prinz (2008) find this comparison dubious.  Whereas linguistic parameters typically take just one of two or three values, the moral parameters mentioned above can take indefinitely many values and seem to admit of diverse exceptions.

Pair 3 suggests that people have knowledge of language (morality) that is not consciously accessed (and may even be inaccessible to consciousness) but implicitly represented, such that they produce judgments of grammatical (moral) permissibility and impermissibility that far outstrip their own capacities to reflectively identify, explain, or justify.  One potential explanation of this gap is that there is a sub-personal “module” for language (morality) that has proprietary information and processing capacities.  Only the outputs of these capacities are consciously accessible.

Pair 4 suggests the linguistic (moral) essentiality of recursion, which allows the embedding of type-identical structures within one another to generate further structures of the same type.  For instance, noun phrases can be embedded in other noun phrases to form more complex noun phrases:

the calico cat –> the calico cat (that the dog chased) –> the calico cat (that the dog [that the breeding conglomerate wanted] chased) –> the calico cat (that the dog [that the company {that was bankrupt} owned] chased)

Moral judgments, likewise, can be embedded in other moral judgments to produce novel moral judgments:

“Thou shalt not kill” (Deuteronomy 5:17) –> “Ye have heard that it was said of them of old time, Thou shalt not kill; and whosoever shall kill shall be in danger of the judgment: But I say unto you, that whosoever is angry with his brother shall be in danger of the judgment.” (Matthew 5:21-2)

Another example: plausibly, if it’s wrong to x, then it’s wrong to persuade someone to x and wrong to coerce someone to x (Harman 2008, p. 346), and therefore also wrong to persuade someone to coerce someone to x.  Such moral embedding has been experimentally investigated by John Mikhail (2011, pp. 43-8), who argues on the basis of experiments using variants on the “trolley problem” (Foot 1978) that moral judgments are generated by imposing a deontic structure on one’s representation of the causal and evaluative features of the action under consideration.

As with any analogy, there are points of disanalogy between language and morality.  At least sometimes, moral judgments are corrigible in the face of argument, whereas grammaticality judgments seem to be incorrigible or at least less corrigible.  People are often tempted to act contrary to their moral judgments, but not to their grammaticality judgments.  Recursive embedding seems to be able to generate all of language, whereas recursive embedding may only be applicable to deontic judgments about actions, and not, for instance, judgments about norms, institutions, situations, and character traits.  Indeed, it’s hard to imagine what recursion would mean for character traits: does it make sense to think of honesty being embedded in courage to generate a new trait?  If it does, what would that trait be?  Finally, grammaticality judgments are bivalent: every utterance is either grammatical or not.  In contrast, moral judgments tend to be more fine-grained, allowing of at least four levels: forbidden, permissible, obligatory, and supererogatory.

3 Character, Wellbeing, and Emotion

Until the 1950s, modern moral philosophy had largely focused on either utilitarian-friendly properties (goodness, betterness, optimality) or deontology-friendly properties (rightness, wrongness, universalizability).  The revitalization of virtue ethics led to a renewed interest in the virtue- and vice-properties (e.g., honesty, generosity, fairness, dishonesty, stinginess, unfairness), in eudaimonia (often translated as ‘happiness’ or ‘flourishing’), and in the emotions.  In recent decades, experimental work in psychology, sociology, and neuroscience has been brought to bear on the empirical grounding of philosophical views on these issues.

3.1 Character and Virtue

A virtue is a complex disposition comprising sub-dispositions to notice, construe, think, desire, and act in characteristic ways.  To be generous, for instance, is (among other things) to be disposed to notice occasions for giving, to construe ambiguous social cues charitably, to desire to give people things they want, need, or would appreciate, to deliberate well about what they want, need, or would appreciate, and to act on the basis of such deliberation.  Manifestations of such a disposition are observable and hence ripe for empirical investigation.  Virtue ethicists of the last several decades have tended, furthermore, to be optimistic about the distribution of virtue in the population.  Alasdair MacIntyre claims, for example, that “without allusion to the place that justice and injustice, courage and cowardice play in human life very little will be genuinely explicable” (1984, p. 199).  Julia Annas (2011, pp. 8-10) claims that “by the time we reflect about virtues, we already have some.”  Linda Zagzebski (2010) provides an “exemplarist” semantics for virtue terms that only gets off the ground if there are in fact many virtuous people.

The philosophical situationists John Doris (1998, 2002) and Gilbert Harman (1999, 2000, 2003) were the first to mount an empirical challenge to the virtue ethical conception of character, arguing on the basis of evidence from personality and social psychology that the structure of most people’s dispositions does not match the structure of virtues (or vices).   Philosophical situationists contend that the social psychology of the last century has shown that most people are surprisingly susceptible to seemingly trivial and normatively irrelevant situational influences, such as mood elevators (Isen, Clark, & Schwartz 1976; Isen, Shalker, Clark, & Karp 1978; Isen 1987), mood depressors (Apsler 1975; Carlsmith & Gross 1968; Regan 1971; Weyant 1978), presence of bystanders (Latané & Darley 1968, 1970; Latané & Rodin 1969; Latané & Nida 1981; Schwartz & Gottlieb 1991), ambient sounds (Matthews & Cannon 1975; Boles & Haywood 1978; Donnerstein & Wilson 1976), ambient smells (Baron 1997; Baron & Thomley 1994), and ambient light levels (Zhong, Bohns, & Gino 2010).[15]  Individual difference variables typically explain less than 10% of the variance in people’s behavior (Mischel 1968) – though, as Funder & Ozer (1983) point out, situational factors typically explain less than 16%.[16]

According to Doris (2002), the best explanation of this lack of cross-situational consistency is that the great majority of people have local, rather than global, traits: they are not honest, courageous, or greedy, but they may be honest-while-in-a-good-mood, courageous-while-sailing-in-rough-weather-with-friends, and greedy-unless-watched-by-fellow-parishioners.  In contrast, Christian Miller (2013a, 2013b) thinks the evidence is best explained by a theory of mixed global traits, such as the disposition to (among other things) help because it improves one’s mood.  Such traits are global, in the sense that they explain and predict behavior across situations (someone with such a disposition will, other things being equal, typically help so long as it will maintain her mood), but normatively mixed, in the sense that they are neither virtues nor vices.  Mark Alfano (2013) goes in a third direction, arguing that virtue and vice attributions tend to function as self-fulfilling prophecies.  People tend to act in accordance with the traits that are attributed to them, whether the traits are minor virtues such as tidiness (Miller, Brickman, & Bolen 1975) and ecology-mindedness (Cornelissen et al. 2006, 2007), major virtues such as charity (Jensen & Moore 1977), cooperativeness (Grusec, Kuczynski, Simutis & Rushton 1978), and generosity (Grusec & Redler 1980), or vices such as cutthroat competitiveness (Grusec, Kuczynski, Simutis & Rushton 1978).  On Alfano’s view, when people act in accordance with a virtue, they often do so not because they have the trait in question, but because they think they do or because they know that other people think they do.  He calls such simulations of moral character factitious virtues, and even goes so far as to suggest that the notion of a virtue should be revised to include reflexive and social expectations.[17]

It might seem that this criticism misses its mark.  After all, virtue ethicists needn’t (and typically don’t) commit themselves to the claim that almost everyone is virtuous.  Instead, they usually argue that virtue is the normative goal of moral development, and that people may fail to reach that goal in various ways.  When Doris, Harman, Miller, or Alfano argues from the fact that most people’s dispositions are not virtues to a rejection of orthodox virtue ethics, then, he might be thought to be committing a non sequitur.  But empirically-minded critics of virtue ethics do not stop there.  They all have positive views about what sorts of dispositions people have instead of virtues.[18]  These dispositions are alleged to be so structurally dissimilar from virtues (as traditionally understood) that it may be psychologically unrealistic to treat (traditional) virtue as a regulative ideal.  What matters, then, is the width of the gap between the descriptive and the normative, between the (structure of the) dispositions most people have and the (structure of the) dispositions that count as virtues.

Three leading defenses against this criticism have been offered.  Some virtue ethicists (Badhwar 2009, Kupperman 2009) have conceded that virtue is extremely rare, but argued that it may still be a useful regulative ideal.  Others (Hurka 2006, Merritt 2000) have attempted to weaken the concept of virtue in such a way as to enable more people, or at least more behaviors, to count as virtuous.  Still others (Kamtekar 2004, Russell 2009, Snow 2010, Sreenivasan 2002) have challenged the situationist evidence or its interpretation.  While it remains unclear whether these defenses succeed, grappling with the situationist challenge has led both defenders and challengers of virtue ethics to develop more nuanced and empirically informed views.[19]

3.2 Wellbeing

The study of wellbeing and happiness has recently come into vogue in both psychology (Kahneman, Diener, & Schwarz 2003; Seligman 2011) and philosophy (Haybron 2008), including experimental philosophy (Braddock 2010; Phillips, Misenheimer, & Knobe 2011; Phillips, Nyholm, & Liao forthcoming).  It is helpful in this context to draw a pair of distinctions, even if those distinctions end up getting blurred by further investigation.  First, we need to distinguish between the notion of a life that goes well for the one who lives it and a morally good life.  It could turn out that these are extensionally identical, or that one is a necessary condition for the other, but at first blush they appear to involve different concepts.  The empirical study of wellbeing focuses primarily on the former conception of a good life.  Second, we need to distinguish between a hedonically good life and an overall good life.  As with the first distinction, it might turn out that a hedonically good life just is an overall good life, but that would be a discovery, not something we can simply take for granted.

With these distinctions in hand, there are a number of interesting experimental results to consider.  First, in the realm of hedonic evaluation, there are marked divergences between the aggregate sums of in-the-moment pleasures and pains and ex post memories of pleasures and pains.  For example, the remembered level of pain of a colonoscopy is well-predicted by the average of the worst momentary level of pain and the final level of pain; furthermore, the duration of the procedure has no measurable effect on ex post pain ratings (Redelmeier & Kahneman 1996).  What this means is that people’s after-the-fact summaries of their hedonic experiences are not simple integrals with respect to time of momentary hedonic tone.  If the colonoscopy were functionally completed after minute 5, but arbitrarily prolonged for another 5 minutes so that the final level of pain was less at the end of minute 10 than at the end of minute 5, the patient would retrospectively evaluate the experience as less painful – even though the first five minutes of the ten-minute procedure were phenomenologically indistinguishable from the whole of the five-minute procedure.  This matters for any philosophical theory that treats pleasure as at least partially constitutive of wellbeing.  And it matters a lot if you’re inclined, like Bentham (1789/1961) and Singer (Singer & de Lazari-Radek forthcoming), to think that pleasure is the only intrinsic good and pain the only intrinsic ill.  How exactly?  The results do nothing to establish whether pleasure is good, or is the only intrinsic good, nor do they establish whether pain is bad, or is the only intrinsic bad.  What they do establish – if the data are to be trusted – is that substantive theories of wellbeing based on hedonic intuitions may miss their mark because such theories are virtually all formulated in light of ex post evaluation of pleasures and pains.  When a philosopher says that a good life is, to the extent possible, filled with pleasure and devoid of pain, and therefore that this way of life constitutes wellbeing, she may be thinking of pleasures and pains in a biased way.  What she should do instead is to use empirical, in-the-moment reports of hedonic tone to establish what really is pleasurable (painful) and how pleasurable (painful) it is.
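To make the contrast between the two summaries concrete, here is a minimal sketch in Python.  The pain ratings are invented for illustration (they are not data from Redelmeier & Kahneman), the function names are my own, and the sketch assumes one rating per minute on a scale that can meaningfully be summed.  It compares a running total of momentary pain with the average of the worst and final moments.

```python
def total_pain(ratings):
    """Sum of per-minute pain ratings: a discrete stand-in for the time integral."""
    return sum(ratings)


def peak_end(ratings):
    """Average of the worst momentary rating and the final rating."""
    return (max(ratings) + ratings[-1]) / 2


# Hypothetical ratings, one per minute (assumed, not taken from any study).
short = [4, 6, 8, 7, 8]              # five-minute procedure
extended = short + [5, 4, 3, 2, 1]   # same five minutes, then five milder ones

for label, ratings in (("short", short), ("extended", extended)):
    print(label, "total:", total_pain(ratings), "peak-end:", peak_end(ratings))
```

On these made-up numbers the extended procedure contains strictly more total pain, yet its peak-end summary is lower, which is the pattern of divergence described above.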

A second interesting set of results has to do not with (reports of) hedonic tone but (reports of) subjective wellbeing.  The most prominent researcher in this field is Ed Diener,[20] whose Satisfaction with Life Scale asks participants to agree or disagree with statements such as, “I am satisfied with my life” and “If I could live my life over, I would change almost nothing.”[21]  It might be thought that these questions are more revelatory of wellbeing than questions about hedonic tone.  However, two problems have arisen with respect to these data.  The first is that participants’ responses to life satisfaction questionnaires may be not accurate reports of standing attitudes but ad hoc constructions that rely on dubious heuristics.  Fox & Kahneman (1992), for instance, showed that, especially in personal domains that people seem to value (friends and love life), what predicts participants’ responses is not recent intrapersonal factors but social comparison.  Someone who has just lost a friend but still thinks of herself as having more friends than her peers will tend to report higher life satisfaction than someone who has just gained a friend but who still thinks of himself as having fewer friends than his peers.  How could people be so confused about themselves?  The questions in Diener’s survey are hard to answer.  What may happen is that respondents use heuristics to generate their responses, thereby answering a different but related question.  The second problem is that life satisfaction surveys seem to be subject to order effects; for instance, if a participant is asked a global life satisfaction question and then asked about his romantic life, the correlation between these questions tends to be low or nonexistent, but if the participant is asked the dating question first, the correlation tends to be high and positive (Strack, Martin, & Schwarz 1988).[22]

Another striking result in the literature on subjective wellbeing has to do with what has come to be known as the set-point.  Some early studies suggested that, though major life events such as winning the lottery or suffering a severe spinal cord injury have an effect on subjective wellbeing, the effect wears off over time, and that people return to a set-point of (reported) subjective wellbeing (Brickman & Campbell 1971).  The explanation for this striking phenomenon is that people adapt to more or less whatever life throws at them.  If the set-point hypothesis is correct (and if reported subjective wellbeing is a reliable indicator of or identical with actual wellbeing), then ethical theories that focus on the promotion of wellbeing would seem to be in trouble.  After all, what’s the point of trying to promote something that is bound to return to a set-point in the end?  More recent research, however, has challenged the set-point finding by establishing the longitudinal impact of at least some important life events, such as divorce, death of a spouse, unemployment, and disability (Lucas 2007; Lucas, Clark, Georgellis, & Diener 2003).

One final interesting set of results centers on the idea of virtue as a prerequisite for wellbeing.  If the results of Braddock (2010), Phillips, Misenheimer, & Knobe (2011), and Phillips, Nyholm, & Liao (forthcoming) withstand scrutiny and replication, it would seem that ordinary people are willing to judge a life as good for the one living it only if it is both full of positive moods and affects and virtuous (or at least not vicious).  This result resonates with the empirically-supported view of Seligman (2011) that happiness contingently turns out to be best achievable in a life that includes both a good deal of positive emotion[23] and the exercise of both moral and intellectual virtues.  If these results are solid, not only do ordinary people tend to think that both virtue and pleasure are enablers of wellbeing, but in fact it’s true that both virtue and pleasure are enablers of wellbeing.

3.3 Emotion and Affect

Experimental inquiries into morality and emotion overlap in myriad, distantly-related ways, a few of which I mentioned above.  I can only hope to gesture at many of the interesting questions that have been investigated in this context.  For instance, are moral judgments always motivating?  In other words, does it follow that, insofar as you judge that x is morally right (wrong), you are – perhaps only defeasibly, but to some extent – motivated to x (avoid xing)?  An affirmative answer is often labeled “internalist,” whereas a negative answer is labeled “externalist.”  Emotions are intrinsically motivational, so if experimental investigation could show that emotion was implicated in all moral judgments, that would be a point in favor of internalism.[24]  Another question I will not discuss in depth: is emotionally-driven reasoning in general better or worse than “cold,” affectless reasoning?  Greene et al. (2001, 2004) seem to presuppose that cold reasoning is typically or even always better, but I see little reason to make such a sweeping judgment.

Instead of trying to address all of the relevant questions, I focus on a particular application based on what have come to be known as dual-system models of cognition, reasoning, decision-making, and behavior.  While the exact details of the two systems vary from author to author, the basic distinction is between what Daniel Kahneman calls System 1, which is fast, automatic, effortless, potentially unconscious, often affect-laden, and sometimes incorrigible, and System 2, which is slow, deliberative, effortful, typically conscious, and associated with the subjective experience of agency, choice, and concentration (2011, pp. 20-21).  Whereas System 2 exhibits a degree of functional unity, System 1 is better conceived as a loose conglomeration of semi-autonomous dispositions, states, and processes, which can conflict not only with System 2 but also with each other.

The dual-system approach has been employed by various experimentalists, including Joshua Greene (2008, 2012), Jonathan Haidt (2012; Haidt & Björklund 2008), Joshua Knobe (Inbar et al. 2009), Fiery Cushman (Cushman & Greene forthcoming), and Daniel Kelly (2011).  These researchers also tend to emphasize that, though people usually associate their “true selves” with System 2, System 1 seems to be responsible for most intuitions, judgments, and behaviors.  Haidt in particular has argued for a “social intuitionist” model according to which deliberation and reasoning are entirely post hoc rationalizations of System 1 cognition, affect, and behavior.  I will focus in particular on one process that relies heavily on System 1, disgust, to show what experimental moral philosophy of the emotions can do.

Disgust is an emotion that seems to be unique to human animals.  It involves characteristic bodily, affective, motivational, evaluative, and cognitive patterns.  For instance, someone who feels disgusted almost always makes a gaping facial expression, withdraws slightly from the object of disgust, experiences a slight reduction in body temperature and heart rate, and feels a sense of nausea and the need to cleanse herself.  In addition, she is motivated to avoid and even expunge the offending object, experiences it as contaminating and repugnant, becomes more attuned to other disgusting objects in the immediate environment, is inclined to treat anything that the object comes in contact with (whether physically or symbolically) as also disgusting, and is more inclined to make harsh moral judgments – both about the object and in general.  There are certain objects that basically all normal adults are disgusted by (feces, decaying corpses, rotting food, spiders, maggots, gross physical deformities), but there is also considerable intercultural and interpersonal variation beyond these core objects of disgust, including in some better-studied cases cuisines, sexual behaviors, out-group members, and violations of social norms.  Furthermore, the disgust reaction is nearly impossible to repress, is easily recognized, and – when recognized – empathically induces disgust in the other person.[25]

Yuck! by Dan Kelly

In a recent monograph, Kelly (2011) persuasively argues that this seemingly bizarre combination of features is best explained by what he calls the “entanglement thesis” (chapter 2) and the “co-opt thesis” (chapter 4).  First, the universal bodily manifestations of disgust evolved to help humans avoid ingesting toxins and other harmful substances, while the more cognitive or symbolic sense of offensiveness and contamination associated with disgust evolved to help humans avoid diseases and parasites.  According to the entanglement thesis, these initially distinct System 1 responses became entangled in the course of human evolution and now systematically co-occur.  If you make the gape face, whatever you’re attending to will start to look contaminated; if something disgusts you at a cognitive level, you will flash a quick gape face.  Second, according to the co-opt thesis, the entangled emotional system for disgust was later recruited for an entirely distinct purpose: to help mark the boundaries between in-group and out-group, and thus to motivate cooperation with in-group members, punishment of in-group defectors, and exclusion of out-group members.  Because the disgust reaction is both on a “hair trigger” (it acquires new cues extremely easily and empathically, p. 51) and “ballistic” (once set in motion, it is nearly impossible to halt or reverse, p. 72), it was ripe to be co-opted in this way.

If Kelly’s account of disgust is on the right track, it seems to have a number of important moral upshots.  One of the more direct consequences of this theory is what he calls “disgust skepticism” (p. 139), according to which the combination of disgust’s hair trigger and its ballistic trajectory means that it is extremely prone to incorrigible false positives that involve unwarranted feelings of contamination and even dehumanization.  Hence, “the fact that something is disgusting is not even remotely a reliable indicator of moral foul play” but is instead “irrelevant to moral justification” (p. 148).

Many theories of value incorporate a link between emotions and value.  According to fitting-attitude theories (Rønnow-Rasmussen 2011), something is bad if and only if there is reason to take a con-attitude (e.g., dislike, aversion, anger, hatred, disgust, contempt) towards it, and good if and only if there is reason to take a pro-attitude (e.g., liking, love, respect, pride, awe, gratitude) towards it.  According to response-dependence theories (Prinz 2007), something is bad (good) just in case one would, after reflection and deliberation, hold a con-attitude (pro-attitude) towards it.  According to desire-satisfaction theories of wellbeing (Heathwood 2006), your life is going well to the extent that the events towards which you harbor pro-attitudes occur, and those towards which you harbor con-attitudes do not occur.  If Kelly’s disgust skepticism is on the right track, it looks like it would be a mistake to lump together all con-attitudes.  Perhaps it still makes sense to connect other con-attitudes, such as indignation, with moral badness, but it seems unwarranted to connect disgust with moral badness.  Thus, experimental moral philosophy of the emotions leads to a potential insight into the evaluative diversity of con-attitudes.

Another potential upshot of the experimental research derives from the fact that disgust belongs firmly in System 1: it is fast, automatic, effortless, potentially unconscious, affect-laden, and nearly incorrigible.  Moreover, while it is exceedingly easy to acquire new disgust triggers whether you want to or not, there is no documented way to de-acquire them, even if you want to.[26]  Together, these points raise worries about moral responsibility.  It’s a widely accepted platitude that the less control you have over your behavior, the less responsible you are for that behavior.  At one extreme, if you totally lack control, many would say that you are not responsible for what you do.  Imagine an individual who acts badly because he is disgusted: he gapes when he sees two men kissing, even though he reflectively rejects homophobia; the men see this gape and, understandably, feel ostracized.  Would it be appropriate for them to take up a Strawsonian (1962) reactive attitude towards him, such as indignation?  Would it be appropriate for him to feel a correlative attitude towards himself, such as guilt or shame?  Of course, if his flash of disgust is something that he recognizes and endorses, the answers to these questions may be simpler, but what are we to say about the case where someone is, as it were, stuck with a disgust trigger that he would rather be rid of?  I will not try to answer this question here; instead, I intend it to show that, while experimental moral philosophy of the emotions may provide new insights, it also raises thorny questions.

4 Critical Conclusion

Experimental moral philosophy far outstrips what I’ve been able to cover here.  Experimental approaches to meta-ethics are almost entirely absent from the present discussion.  Experimental evidence is also clearly relevant to moral questions in bioethics, such as euthanasia, abortion, and genetic screening.  Experiments in behavioral economics are presumably relevant to normative views about public policy.  These issues have been neglected only because of lack of space.  In the few words remaining, I want to explore some potential criticisms of experimental philosophy.  As with my coverage of the first-order content, this catalogue will necessarily be circumscribed, but I hope to cover many of the more prominent and important criticisms.

4.1 Problems with Experimental Design and Interpretation

As a young field, experimental philosophy suffers from various problems with experimental design and interpretation.  These are not insurmountable problems, and they are problems faced by related fields, such as social psychology, cognitive psychology, and behavioral economics.  One issue that has recently come to the fore is the problem of replication.  As I mentioned above, the mere fact that statistical analysis yields a positive result by crossing below the magical p < .05 threshold does not mean that anything has actually been discovered.  Such “results” may be statistical anomalies.  The only way to tell is for the experiment to be replicated – preferably by another research group.  If a result cannot be replicated, it is probably a mirage.  Such mirages have turned up to a disturbing extent recently, as Daniel Kahneman has notoriously pointed out (Yong 2012).  Kahneman proposed a “daisy chain” of replication, where no result would be published until it had been successfully replicated by another prominent lab.  This proposal has not yet been (and may never be) instituted, but it has raised the problem of replication to salience.  A related project, however, has taken off: the reproducibility project, which aims to establish the extent to which prominent, published results can be replicated. And experimental philosophers have followed suit with their own replication project.

A related issue with experimental design and interpretation has to do with “fishing expeditions.”  An experiment is sometimes pejoratively referred to as a fishing expedition if no specific predictions are advanced prior to data collection, especially when multiple hypotheses are tested.  An example may help to illustrate these complaints.  Consider the work on the diversity of philosophical intuitions pioneered by Jonathan Weinberg, Shaun Nichols, and Stephen Stich (2008).  They found evidence that allegedly supports the claim that epistemic and normative intuitions vary both from culture to culture and from one socioeconomic group to the next.  What an American or European might call a case of mere belief, an East Asian may say is knowledge.  What someone with high SES might say is morally permissible, someone with low SES would call morally impermissible.  Why do these differences crop up?  In most cases, Weinberg, Nichols, and Stich have no reason to expect the pattern of responses they report.  While they attempt to burnish the scientific patina of their experiment by predicting novel facts (“Epistemic intuitions vary from culture to culture,” and, “Epistemic intuitions vary from one socioeconomic group to another” (p. 24)), their hypotheses are so vague as to be nearly unfalsifiable.  They make few predictions about which differences to expect and which not to expect, and they admit that they have no guiding positive heuristic that would explain the patterns of diversity they observe (pp. 33-34).

Consider the first problem: that only the vaguest prediction of “differences” is made.  This verges on saying, “Gee, something will happen.”  I’m not attacking the authors of this study; it’s easy enough to fall into this sort of pseudo-science.  Even in the annals of the British Royal Society, you find descriptions of “experiments” like this: “A circle was made with powder of unicorne’s horn, and a spider set in the middle of it, but it immediately ran out severall times repeated.  The spider once made some stay upon the powder” (Weld 1848, p. 113).  Still, even if it is human, all too human, to err in this way, the error needs to be pointed out so that it can be avoided in future.

Stemming from the problem of making no specific predictions is the fact that simultaneously testing many intuition-probes leads to overconfidence.  If you were to test two random samples of people with several intuition-probes, the odds are that you would find some differences in their intuitions – even some statistically significant differences.  As I pointed out above, the sort of statistical test used in these experiments has a false-positive rate of 5%: that is, in one out of twenty cases, the test indicates that the pattern of responses did not come from chance when in fact it did.  The probability of at least one false positive rises quickly as more comparisons are made: assuming the tests are independent, P(at least one false positive) = 1 – .95^n, where n is the number of hypotheses tested simultaneously.  This is known as the experimentwise error rate.
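The arithmetic is easy to check.  The following is a minimal sketch in Python; the function name is my own, and the calculation assumes that the tests are independent and that each uses the same 5% threshold.

```python
def experimentwise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 10, 20):
    print(n, round(experimentwise_error_rate(n), 3))
```

With a single test the rate is the familiar 5%; with ten tests it is already about 40%, and with twenty it is close to two in three.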

In a recent critique of this kind of fallacious statistical thinking, Peter Austin, Muhammad Mamdani, David Juurlink, and Janet Hux (2006) showed that Canadian patients’ astrological signs were often correlated with their pathologies.  For instance, using the same statistical techniques as experimental philosophers investigating group differences in intuitions, one would be led to conclude that Gemini are 30% more likely to be alcoholics (p < 0.02); Scorpios have an 80% higher risk of developing leukemia (p < 0.05); and Virgo women suffer 40% more from excessive vomiting during pregnancy (p < 0.04).  These are presumably statistical anomalies, not indicators of genuine health risks.  An epidemiologist who constructed a theory of alcoholism based on astrological sign would be making a terrible mistake.  The same sort of problem crops up when testing for cross-cultural and cross-class differences.  For each type of intuition, Weinberg, Nichols, and Stich used nearly a dozen intuition probes, so the odds of their finding at least one false positive were about 40%.  This fact, along with the fact that they documented differences in only some cases, casts a pall of doubt over their whole research program.

Fishing expeditions are not necessarily a bad idea, but they should be seen as merely exploratory: if a fishing expedition turns up a potential result, specific hypotheses should be constructed in response and detailed replications carried out to determine whether what was caught was a fish or merely some flotsam and jetsam.  Another way to establish that a result is real is to look not just at whether it crosses the magical p < .05 threshold but whether it has a respectable effect size.  The most common measure of effect size is Cohen’s d, which is the ratio of the difference in means between conditions to the standard deviation of the relevant variable.  So, for example, a d of 1.0 would indicate that a manipulation moved the mean of the experimental condition an entire standard deviation away from the mean of the control condition.  This would be a huge effect.  Unfortunately, effect sizes have, until recently, rarely been reported in experimental philosophy, though that appears to be changing.
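For readers who want to compute an effect size themselves, here is a minimal sketch of Cohen’s d.  It uses the pooled standard deviation of the two conditions as the denominator, which is one common convention but an assumption on my part; the data are made up purely for illustration.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_variance = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_variance ** 0.5


# Hypothetical ratings from a control and an experimental condition.
control = [1, 2, 3, 4, 5]
treatment = [2, 3, 4, 5, 6]
print(round(cohens_d(treatment, control), 2))
```

In this toy case the manipulation shifts the mean by one scale point, which works out to roughly six-tenths of a standard deviation.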

One final problem to note with experimental design is that ordinal scales are often treated as if they were cardinal scales.  In particular, as I mentioned above, Likert scales are ordinal: the difference between a response of 0 and a response of 2 cannot be assumed to be the same “size” as the difference between a response of 2 and a response of 4.  Only if a scale is cardinal can such an assumption be made (indeed, that’s what it means for a scale to be cardinal).  But even Kahneman, in a result I cited above (Redelmeier & Kahneman 1996), commits this fallacy by taking the integral with respect to time of ordinal ratings of hedonic tone.
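A toy example shows why this matters.  In the sketch below, the two sets of responses and the relabeling of the scale are hypothetical; the relabeling preserves the order of the response options, which is all an ordinal scale guarantees, yet it reverses the comparison of means while leaving the comparison of medians untouched.

```python
from statistics import mean, median

# Hypothetical responses on a 1-to-5 Likert scale.
group_a = [1, 5, 5]
group_b = [4, 4, 4]

# Hypothetical order-preserving relabeling: only the top option moves.
relabel = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def relabeled(responses):
    """Apply the order-preserving relabeling to a list of responses."""
    return [relabel[r] for r in responses]


# Means: group_b is higher on the original labels, group_a after relabeling.
print(mean(group_a) < mean(group_b))
print(mean(relabeled(group_a)) < mean(relabeled(group_b)))

# Medians: group_a is higher under both labelings.
print(median(group_a) > median(group_b))
print(median(relabeled(group_a)) > median(relabeled(group_b)))
```

Statistics that depend only on order, such as the median, survive any such relabeling; means, sums, and integrals do not.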

Science is hard.  That doesn’t mean it shouldn’t be pursued, but it does mean that even the most prominent and well-intentioned researchers sometimes make mistakes – mistakes that should be pointed out and avoided in future.[29]

4.2 Philosophical Problems

One might object, however, that the real problem with experimental philosophy is not that science is hard but that science is irrelevant.  “Real” moral philosophy, whatever that is, is in principle unaffected by empirical or experimental results.  There are various routes to this conclusion, none of which I find compelling.

One might base the argument for the irrelevance of experimental moral philosophy on the is-ought dichotomy or the related naturalistic fallacy.  Whatever science turns up about what is, the argument would go, it cannot turn up anything about what ought to be.  But if one subscribes to the ought-implies-can principle, this is plainly false.  Presumably, experimental evidence can help – at least in principle – to establish what is possible, necessary, and impossible.  But if it can do that, it can constrain normative views through the ought-implies-can principle.  Furthermore, if the evidence canvassed above on various thick properties (character, virtue, wellbeing, emotion) is any good, then it is possible to investigate normative issues empirically by investigating thick properties.

A potential response to these points is to say that fundamental moral philosophical theories are, if true at all, necessary truths, and that empirical research can establish at best only contingent truths.[30]  This objection is based on a modal fallacy: necessary implies actual, which means that not-actual implies not-necessary.  That means that if we find out that something is not actually the case, then we’ve also discovered that it is not necessarily the case.  The critic of experimental philosophy would presumably respond by saying that actuality is overkill.  All that’s needed to discredit a necessity claim is a possibility claim.  So, if it’s even merely possible that something is not the case, then it’s not necessary that it’s the case.  I agree.

But I disagree with the further implicit premise that all we need to establish a possibility claim is creativity and imagination.  We can imagine things that turn out, on closer inspection, to be impossible.  And we sometimes find unimaginable things that turn out to be not only possible but actual.  One might respond to this by saying that when we imagine the impossible and when we fail to imagine the possible, that’s just a performance error: if we were really, truly creative and imaginative in the requisite ways, these failures wouldn’t crop up.  In a sense this is so, but only if imagination is assumed a priori to be modally infallible – that is, only if it’s assumed at the outset that the “real” deliverances of a competently-functioning imagination are all and only the relevant possibilities.  Frankly, this strikes me as preposterous.  This argument applies equally to recent defenses of empirically-uninformed philosophy by Timothy Williamson (2007, 2011) and Herman Cappelen (2012).

But even granting this, there’s a further problem with the claim, which is that it also assumes the infallibility of imaginative introspection.  If we assume that a well-functioning imagination delivers all and only the relevant possibilities, there’s still the question whether, when you take yourself to have imagined something, you have indeed imagined it.  But how could you know that you’ve imagined it rather than, for instance, thinking that you imagined it, or imagining that you imagined it, or deceiving yourself into believing that you imagined it?  Not by introspecting, surely.  From the inside, imagining something and thinking that you’ve imagined it (or imagining that you’ve imagined it, or deceiving yourself into believing that you’ve imagined it) are indistinguishable.  In many cases, however, what you could do to check whether you’ve imagined something is to empirically investigate whether it can be made actual.  If this is right, then even if imagination is modally infallible there would still be a role for empirical investigation.

But maybe none of this is right.  Still, I think there would be an important place for experimental moral philosophy research.  If fundamental moral theories are necessary, then they are necessary for creatures like us.  And one thing that empirical investigation can do is to help establish what sorts of creatures we are.  In other words, even supposing both the modal infallibility of imagination and the infallibility of imaginative introspection, the partisan of possibility must admit that imagination needs material to work with.  When you sit in your armchair, imagining a hypothetical scenario, you make a whole host of assumptions about what people are like, how psychological processes work, and so on.  Consider two people who are arguing about the upshot of a thought experiment.  One says, “Imagine a case in which things are thus and so,” and the other says, “I can’t – that’s impossible.”  How is such a disagreement to be adjudicated?  As I pointed out above, there are only three ways to go: you can be empirically informed, empirically uninformed, or empirically misinformed.


Abelson, R. (1997). On the surprising longevity of flogged horses: Why there is a case for the significance test. Psychological Science, 8:1, 12-15.

Adams, F., & Steadman, A. (2004). Intentional action in ordinary language: Core concept or pragmatic understanding? Analysis, 64, 173-181.

Adams, F., & Steadman, A. (2007). Folk concepts, surveys, and intentional action. In C. Lumer (Ed.), Intentionality, deliberation, and autonomy: The action-theoretic basis of practical philosophy (pp. 17-33). Aldershot: Ashgate.

Alfano, M. (2013). Character as Moral Fiction. Cambridge UP.

Alfano, M., Beebe, J., & Robinson, B. (2012). The centrality of belief and reflection in Knobe-effect cases. The Monist, 95:2, 264-289.

Alicke, M. (2008). Blaming badly. Journal of Cognition and Culture, 8, 179-186.

Annas, J. (2011). Intelligent Virtue. Oxford UP.

Apsler, R. (1975).  Effects of embarrassment on behavior toward others.  Journal of Personality and Social Psychology, 32, 145-153.

Austin, P., Mamdani, M., Juurlink, D., & Hux, J. (2006). Testing multiple statistical hypotheses resulted in spurious associations: A study of astrological signs and health. Journal of Clinical Epidemiology, 59:9, 964-969.

Badhwar, N. (2009). The Milgram experiments, learned helplessness, and character traits.  Journal of Ethics, 13:2-3, 257-289.

Banerjee, K., Huebner, B., & Hauser, M. (2010). Intuitive moral judgments are robust across demographic variation in gender, education, politics, and religion: A large-scale web-based study. Journal of Cognition and Culture, 10:1/2, 1-26.

Baron, R. (1997).  The sweet smell of … helping:  Effects of pleasant ambient fragrance on prosocial behavior in shopping malls.  Personality and Social Psychology Bulletin, 23, 498-503.

Baron, R. A., & Thomley, J. (1994). A whiff of reality: Positive affect as a potential mediator of the effects of pleasant fragrances on task performance and helping.  Environment and Behavior, 26, 766-784.

Beebe, J. R. (forthcoming). A Knobe effect for belief ascriptions.

Beebe, J. R., & Buckwalter, W. (2010). The epistemic side-effect effect. Mind & Language, 25, 474-498.

Beebe, J. R., & Jensen, M. (forthcoming). Surprising connections between knowledge and action: The robustness of the epistemic side-effect effect. Philosophical Psychology.

Bentham, J. (1789/1961). An Introduction to the Principles of Morals and Legislation. Garden City: Doubleday. Originally published in 1789.

Boles, W. & Haywood, S. (1978).  The effects of urban noise and sidewalk density upon pedestrian cooperation and tempo.  Journal of Social Psychology, 104, 29-35.

Braddock, M. (2010). Constructivist experimental philosophy on wellbeing and virtue. Southern Journal of Philosophy, 48:3, 295-323.

Brandt, R. (1974). Hopi Ethics: A Theoretical Analysis. University of Chicago Press.

Brickman, P., & Campbell, D. (1971). Hedonic relativism and planning the good society. In M. Appley (ed.), Adaptation-Level Theory, pp. 287-305. New York: Academic Press.

Buckwalter, W. & Stich, S. (forthcoming). Gender and philosophical intuition. In Knobe & Nichols (eds.), Experimental Philosophy, Vol. 2. OUP.

Cappelen, H. (2012). Philosophy Without Intuitions.  Oxford UP.

Carlsmith, J. & Gross, A. (1968).  Some effects of guilt on compliance.  Journal of Personality and Social Psychology, 53, 1178-1191.

Case, T., Repacholi, B., & Stevenson, R. (2006). My baby doesn’t smell as bad as yours: The plasticity of disgust. Evolution and Human Behavior, 27:5, 357-365.

Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Cohen, J. (1994). The Earth is round (p < .05). American Psychologist, 49:12, 997-1003.

Cornelissen, G., Dewitte, S. & Warlop, L. (2007). Whatever people say I am that’s what I am: Social labeling as a social marketing tool.  International Journal of Research in Marketing, 24:4, 278-288.

Cornelissen, G., Dewitte, S., Warlop, L., Liegeois, A., Yzerbyt, V., & Corneille, O. (2006). Free bumper stickers for a better future: The long term effect of the labeling technique.  Advances in Consumer Research, 33, 284-285.

Cushman, F. & Greene, J. (forthcoming). Finding faults: How moral dilemmas illuminate cognitive structure. Social Neuroscience.

Cushman, F. & Young, L. (2009). The psychology of dilemmas and the philosophy of morality. Ethical Theory and Moral Practice, 12:1, 9-24.

Cushman, F. & Young, L. (2011). Patterns of judgment derive from nonmoral psychological representations. Cognitive Science, 35:6, 1052-1075.

Diener, E., Scollon, C., & Lucas, R. (2003). The evolving concept of subjective wellbeing: The multifaceted nature of happiness.  Advances in Cell Aging and Gerontology, 15, 187-219.

Diener, E., Emmons, R., Larsen, R., & Griffin, S. (2010). The satisfaction with life scale. Journal of Personality Assessment, 49:1, 71-5.

Donnerstein, E. & Wilson, D.  (1976).  Effects of noise and perceived control on ongoing and subsequent aggressive behavior.  Journal of Personality and Social Psychology, 34, 774-781.

Doris, J. (1998).  Persons, situations, and virtue ethics.  Nous, 32:4, 504-540.

Doris, J. (2002).  Lack of Character: Personality and Moral Behavior.  Cambridge:  Cambridge UP.

Doris, J. & Stich, S. (2006). Moral psychology: Empirical Approaches. Stanford Encyclopedia of Philosophy.  Accessed 9 July 2012.

Dwyer, S. (1999). Moral competence.  In K. Murasugi & R. Stainton (eds.), Philosophy and Linguistics, pp. 169-190. Boulder, CO: Westview Press.

Dwyer, S. (2009). Moral dumbfounding and the linguistic analogy: Implications for the study of moral judgment. Mind and Language, 24, 274-96.

Flanagan, O. (1991). Varieties of Moral Personality: Ethics and Psychological Realism. Harvard UP.

Flanagan, O. (2009). Moral science? Still metaphysical after all these years. In Narvaez & Lapsley (eds.), Moral Personality, Identity and Character: An Interdisciplinary Future. Cambridge UP.

Foot, P. (1978). Virtues and Vices and Other Essays in Moral Philosophy. Berkeley, CA: University of California Press; Oxford: Blackwell.

Fox, C. & Kahneman, D. (1992). Correlations, causes and heuristics in surveys of life satisfaction. Social Indicators Research, 27, 221-34.

Funder, D. & Ozer, D. (1983).  Behavior as a function of the situation.  Journal of Personality and Social Psychology, 44, 107-112.

Greene, J. (2008). The secret joke of Kant’s soul. In W. Sinnott-Armstrong (ed.), Moral Psychology, volume 3, pp. 35-80. Cambridge: MIT Press.

Greene, J. (2012). Reflection and reasoning in moral judgment. Cognitive Science 36:1, 163-177.

Greene, J., Morelli, S., Lowenberg, K., Nystrom, L., & Cohen, J. (2008). Cognitive load selectively interferes with utilitarian moral judgment. Cognition, 107:3, 1144-1154.

Greene, J., Nystrom, L., Engell, A., Darley, J., & Cohen, J. (2004). The neural bases of cognitive conflict and control in moral judgment. Neuron, 44, 389-400.

Greene, J., Sommerville, R., Nystrom, L., Darley, J., & Cohen, J. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293, 2105-2108.

Grusec, J. & Redler, E. (1980). Attribution, reinforcement, and altruism: A developmental analysis. Developmental Psychology, 16:5, 525-534.

Grusec, J., Kuczynski, L., Rushton, J., & Simutis, Z. (1978). Modeling, direct instruction, and attributions: Effects on altruism. Developmental Psychology, 14, 51-57.

Haidt, J. (2012). The Righteous Mind: Why Good People are Divided by Politics and Religion. New York: Pantheon.

Haidt, J. & Björklund, F. (2008). Social intuitionists answer six questions about moral psychology. In W. Sinnott-Armstrong (ed.), Moral Psychology, volume 2, pp. 181-218. Cambridge: MIT Press.

Harman, G. (1999).  Moral philosophy meets social psychology: Virtue ethics and the fundamental attribution error.  Proceedings of the Aristotelian Society, New Series 119, 316-331.

Harman, G. (2000).  The nonexistence of character traits.  Proceedings of the Aristotelian Society, 100, 223-226.

Harman, G. (2000). Explaining Value and Other Essays in Moral Philosophy. New York: Oxford UP.

Harman, G. (2003). No character or personality.  Business Ethics Quarterly, 13:1, 87-94.

Harman, G. (2008). Using a linguistic analogy to study morality. In W. Sinnott-Armstrong (ed.), Moral Psychology, volume 1, pp. 345-352. MIT Press.

Hauser, M. (2006). Moral Minds: How Nature Designed a Universal Sense of Right and Wrong. New York: Ecco Press/Harper Collins.

Hauser, M., Young, L., & Cushman, F. (2008). Reviving Rawls’s linguistic analogy: Operative principles and the causal structure of moral actions. In W. Sinnott-Armstrong (ed.), Moral Psychology, volume 2, pp. 107-144. MIT Press.

Haybron, D. (2008). The Pursuit of Unhappiness. Oxford UP.

Heathwood, C. (2005). The problem of defective desires. Australasian Journal of Philosophy, 83:4, 487-504.

Hurka, T. (2006). Virtuous act, virtuous dispositions. Analysis, 66:289, 69-76.

Inbar, Y., Pizarro, D., Knobe, J. & Bloom, P. (2009). Disgust sensitivity predicts intuitive disapproval of gays. Emotion, 9:3, 435-443.

Isen, A. (1987).  Positive affect, cognitive processes, and social behavior.  In L. Berkowitz (ed.) Advances in Experimental Social Psychology, volume 20, 203-254.  San Diego:  Academic Press.

Isen, A., Clark, M., & Schwartz, M. (1976). Duration of the effect of good mood on helping: “Footprints on the sands of time.” Journal of Personality and Social Psychology, 34, 385-393.

Isen, A. & Levin, P. (1972). The effect of feeling good on helping: Cookies and kindness.  Journal of Personality and Social Psychology, 21, 384-88.

Isen, A., Shalker, T., Clark, M., & Karp, L. (1978).  Affect, accessibility of material in memory, and behavior:  A cognitive loop.  Journal of Personality and Social Psychology, 36, 1-12.

Jensen, A. & Moore, S. (1977).  The effect of attribute statements on cooperativeness and competitiveness in school-age boys.  Child Development, 48, 305-307.

Kahneman, D., Diener, E., & Schwarz, N. (eds.). (2003). Wellbeing: The Foundations of Hedonic Psychology. New York: Russell Sage.

Kahneman, D. (2011). Thinking, Fast and Slow. New York: Farrar, Straus, & Giroux.

Kamtekar, R. (2004). Situationism and virtue ethics on the content of our character. Ethics, 114:3, 458-491.

Kelly, D. (2011). Yuck! The Nature and Moral Significance of Disgust. Cambridge: MIT Press.

Knobe, J. (2003). Intentional action and side effects in ordinary language. Analysis, 63:3, 190-194.

Knobe, J. (2004a). Folk psychology and folk morality: Response to critics. Journal of Theoretical and Philosophical Psychology, 24, 270-279.

Knobe, J. (2004b). Intention, intentional action and moral considerations. Analysis, 64:2, 181-187.

Knobe, J. (2006). The concept of intentional action: A case study in the uses of folk psychology. Philosophical Studies, 130:2, 203-231.

Knobe, J. (2007). Reason explanation in folk psychology. Midwest Studies in Philosophy, 31, 90-107.

Knobe, J. (2009). Cause and norm. Journal of Philosophy, 106:11, 587-612.

Knobe, J. (2010). Person as scientist, person as moralist. Behavioral and Brain Sciences, 33, 315-329.

Knobe, J. (2011). Is morality relative? Depends on your personality. The Philosopher’s Magazine, 52, 66-71.

Knobe, J., & Mendlow, G. (2004). The good, the bad and the blameworthy: Understanding the role of evaluative reasoning in folk psychology. Journal of Theoretical and Philosophical Psychology, 24, 252-258.

Kripke, S. (1980). Naming and Necessity. Cambridge, MA: Harvard UP.

Kupperman, J. (2009). Virtue in virtue ethics. Journal of Ethics, 13:2-3, 243-255.

Lam, B. (2010). Are Cantonese speakers really descriptivists?  Revisiting cross-cultural semantics. Cognition, 115, 320-332.

Latané, B., & Darley, J. (1968).  Group inhibition of bystander intervention in emergencies.  Journal of Personality and Social Psychology, 10, 215-221.

Latané, B., & Darley, J. (1970).  The Unresponsive Bystander:  Why Doesn’t He Help?  New York:  Appleton-Century-Crofts.

Latané, B., & Nida, S. (1981).  Ten years of research on group size and helping.  Psychological Bulletin, 89, 308-324.

Latané, B., & Rodin, J. (1969). A lady in distress:  Inhibiting effects of friends and strangers on bystander intervention.  Journal of Experimental Social Psychology, 5, 189-202.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1-55.

Lucas, R. (2007). Adaptation and the set-point model of subjective wellbeing: Does happiness change after major life events? Current Directions in Psychological Science, 16:2, 75-9.

Lucas, R., Clark, C., Georgellis, Y., & Diener, E. (2003). Reexamining adaptation and the set point model of happiness: Reactions to changes in marital status. Journal of Personality and Social Psychology, 84:3, 527-539.

Machery, E., Mallon, R., Nichols, S., & Stich, S. (2004). Semantics, cross-cultural style.  Cognition, 92:3, B1-B12.

MacIntyre, A. (1984).  After Virtue: A Study in Moral Theory.  Notre Dame: University of Notre Dame Press.

Matthews, K. E., & Cannon, L. K. (1975).  Environmental noise level as a determinant of helping behavior.  Journal of Personality and Social Psychology, 32, 571-577.

May, J. (forthcoming). Does disgust influence moral judgment? Australasian Journal of Philosophy.

Merritt, M. (2000). Virtue ethics and situationist personality psychology. Ethical Theory and Moral Practice, 3:4, 365-383.

Mikhail, J. (2007). Universal moral grammar: Theory, evidence, and the future. Trends in Cognitive Sciences, 11, 143-152.

Mikhail, J. (2008). The poverty of the moral stimulus. In W. Sinnott-Armstrong (ed.), Moral Psychology, volume 1, pp. 353-360. MIT Press.

Mikhail, J. (2011). Elements of Moral Cognition: Rawls’s Linguistic Analogy and the Cognitive Science of Moral and Legal Judgment. Cambridge UP.

Miller, C. (2013a). Moral Character: An Empirical Theory. Oxford UP.

Miller, C. (2013b). Character and Moral

Miller, R., Brickman, P., & Bolen, D. (1975). Attribution versus persuasion as a means for modifying behavior.  Journal of Personality and Social Psychology, 31:3, 430-441.

Mischel, W. (1968).  Personality and Assessment.  New York: Wiley.

Nadelhoffer, T. (2004). On praise, side effects, and folk ascriptions of intentionality. Journal of Theoretical and Philosophical Psychology, 24, 196-213.

Nadelhoffer, T. (2006). Bad acts, blameworthy agents, and intentional actions: Some problems for jury impartiality. Philosophical Explorations, 9, 203-220.

Nichols, S. (2002). On the genealogy of norms: A case for the role of emotion in cultural evolution. Philosophy of Science, 69, 234-255.

Nichols, S. (2004). Sentimental Rules. Oxford UP.

Nichols, S., Kumar, S., & Lopez, T. (unpublished manuscript). Rational learners and non-utilitarian rules.

Pettit, D. & Knobe, J. (2009). The pervasive impact of moral judgment.  Mind and Language, 24:5, 586-604.

Phelan, M., & Sarkissian, H. (2009). Is the ‘trade-off hypothesis’ worth trading for? Mind and Language, 24, 164-180.

Phillips, J., Misenheimer, L., & Knobe, J. (2011). The ordinary concept of happiness (and others like it). Emotion Review, 71, 929-937.

Phillips, J., Nyholm, S., & Liao, S. (forthcoming). The good in happiness. Oxford Studies in Experimental Philosophy, vol. 1. Oxford UP.

Prinz, J. (2007). The Emotional Construction of Morals. Oxford UP.

Prinz, J. (2008). Resisting the linguistic analogy: A commentary on Hauser, Young, and Cushman. In W. Sinnott-Armstrong (ed.), Moral Psychology, volume 2, pp. 157-170. MIT Press.

Rawls, J. (1971). A Theory of Justice. Cambridge, MA: Harvard UP.

Redelmeier, D. & Kahneman, D. (1996). Patients’ memories of painful medical treatments: Real-time and retrospective evaluations of two minimally invasive procedures. Pain, 1, 3-8.

Regan, J. (1971).  Guilt, perceived injustice, and altruistic behavior.  Journal of Personality and Social Psychology, 18, 124-132.

Robinson, B., Stey, P., & Alfano, M. (2013). Virtue and vice attributions in the business context: An experimental investigation. Journal of Business Ethics, 113:4, 649-661.

Roedder, E. & Harman, G. (2010). Linguistics and moral theory. In J. Doris (ed.), The Moral Psychology Handbook, pp. 273-296. Oxford UP.

Rønnow-Rasmussen, T. (2011). Personal Value. Oxford UP.

Rozin, P. (2008). Hedonic “adaptation”: Specific habituation to disgust/death elicitors as a result of dissecting a cadaver. Judgment and Decision Making, 3:2, 191-194.

Russell, D. (2009). Practical Intelligence and the Virtues. Oxford UP.

Scaife, R. & Webber, J. (forthcoming). Intentional side-effects of action. Journal of Moral Philosophy.

Schimmack, U. & Oishi, S. (2005). The influence of chronically and temporarily accessible information on life satisfaction judgments. Journal of Personality and Social Psychology, 89:3, 395-406.

Schwartz, S. & Gottlieb, A. (1991).  Bystander anonymity and reactions to emergencies.  Journal of Personality and Social Psychology, 39, 418-430.

Schwitzgebel, E. (2009). Do ethicists steal more books?  Philosophical Psychology, 22:6, 711-725.

Schwitzgebel, E. & Cushman, F. (2012). Expertise in moral reasoning?  Order effects on moral judgment in professional philosophers and non-philosophers. Mind and Language, 27:2, 135-153.

Schwitzgebel, E. & Rust, J. (2010). Do ethicists and political philosophers vote more often than other professors? Review of Philosophy and Psychology, 1:2, 189-199.

Schwitzgebel, E., Rust, J., Huang, L., Moore, A., & Coates, J. (2011). Ethicists’ courtesy at philosophy conferences.  Philosophical Psychology, 25:3, 331-340.

Seligman, M. (2011). Flourish: A Visionary New Understanding of Happiness and Wellbeing. New York: Free Press.

Singer, P. & de Lazari-Radek, K. (forthcoming). Sidgwick and Contemporary Ethics.

Sinnott-Armstrong, W. (2008). Framing moral intuitions. In Sinnott-Armstrong (ed.), Moral Psychology, Vol. 2, 47-76.  MIT Press.

Sinnott-Armstrong, W., Mallon, R., McCoy, T., & Hull, J. (2008). Intention, temporal order, and moral judgments.  Mind and Language, 23:1, 90-106.

Snow, N. (2010). Virtue as Social Intelligence: An Empirically Grounded Theory. New York: Routledge.

Sreenivasan, G. (2002). Errors about errors: Virtue theory and trait attribution. Mind, 111:441, 47-66.

Sripada, C. (2010). The deep self model and asymmetries in folk judgments about intentional action. Philosophical Studies, 151:2, 159-176.

Sripada, C. (2011). What makes a manipulated agent unfree? Philosophy and Phenomenological Research. doi: 10.1111/j.1933-1592.2011.00527.x

Sripada, C. (2012). Mental state attributions and the side-effect effect. Journal of Experimental Social Psychology, 48:1, 232-238.

Sripada, C. & Konrath, S. (2011). Telling more than we can know about intentional action. Mind and Language, 26:3, 353-380.

Stich, S. & Weinberg, J. (2001). Jackson’s empirical assumptions. Philosophy and Phenomenological Research, 62:3, 637-643.

Strack, F., Martin, L., & Schwarz, N. (1988). Priming and communication: Social determinants of information use in judgments of life satisfaction. European Journal of Social Psychology, 18, 429-442.

Strandberg, C. & Björklund, F. (forthcoming). Is moral internalism supported by folk intuitions? Philosophical Psychology, 1-17.

Strawson, P. (1962). Freedom and resentment. Proceedings of the British Academy, 48, 1-25.

Sytsma, J. & Livengood, J. (2011). A new perspective concerning experiments on semantic intuitions. Australasian Journal of Philosophy, 89:2, 315-332.

Tannenbaum, D., Ditto, P.H. & Pizarro, D.A. (2007). Different moral values produce different judgments of intentional action. Unpublished manuscript, University of California-Irvine.

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H., & Kievit, R. (forthcoming). An agenda for purely confirmatory research. Perspectives on Psychological Science.

Weinberg, J., Nichols, S., & Stich, S. (2008). Normativity and epistemic intuitions. In Knobe & Nichols (eds.) Experimental Philosophy, pp. 17-46. Cambridge: Cambridge UP.

Weld, C. (1848). A History of the Royal Society: With Memoirs of the Presidents, volume 1. Cambridge: Cambridge UP.

Weyant, J. (1978).  Effects of mood states, costs, and benefits on helping.  Journal of Personality and Social Psychology, 36, 1169-1176.

Williams, B. (1985).  Ethics and the Limits of Philosophy.  Cambridge:  Harvard UP.

Williamson, T. (2007). The Philosophy of Philosophy. Oxford: Blackwell.

Williamson, T. (2011). Philosophical expertise and the burden of proof. Metaphilosophy, 42:3, 215-229.

Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act: Social-priming research needs “daisy chain” of replication.  Nature News.

Zagzebski, L. (2010). Exemplarist virtue theory. Metaphilosophy, 41:1, 41-57.

Zhong, C.-B., Bohns, V., & Gino, F. (2010). Good lamps are the best police: Darkness increases dishonesty and self-interested behavior. Psychological Science, 21:3, 311-314.


[1] With thanks to John Doris, Gilbert Harman, Chris Heathwood, Daniel Kelly, Clayton Littlejohn, Christian Miller, Joshua Knobe, James Beebe, Brian Robinson, Chandra Sripada, Carissa Veliz, Antti Kauppinen, John Mikhail, Dan Haybron, and Sven Nyholm for helpful comments and suggestions.

[3] Or, in the inimitable if apocryphal words of Quine, “The universe is no university.”

[4] This is of course a reference to the famous Isen & Levin (1972) phone booth experiment.

[5] His views are reported in Hopi Ethics (1974).

[6] I am also aware of the ongoing controversy surrounding null-hypothesis significance testing (NHST).  In a nutshell, the problem is that a p-value is a conditional probability, but not the conditional probability that it is often interpreted as.  A p-value is the probability that a result at least as extreme as the one in hand would have been observed given the null hypothesis, i.e., given that nothing interesting is happening (no positive correlations, no negative correlations, no interaction effects, and so on).  This is often inverted by sloppy researchers and interpreters, who gloss the p-value as the probability of the null hypothesis given the observation.  Symbolically, a p-value is P(observation | null), not P(null | observation).  The latter, more desirable, conditional probability can be estimated using Bayesian statistical analysis, but seldom is (and there are other controversies surrounding Bayesian analysis, especially the arbitrariness of prior probabilities).  For an introduction to these problems, see Abelson (1997), Cohen (1994), and Wagenmakers et al. (forthcoming).
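To see how far apart the two conditional probabilities can be, consider a rough sketch using Bayes’ theorem.  It treats “the observation” as the event of reaching significance at the .05 level, and the prior probabilities and the 80% power figure are assumptions picked purely for illustration.

```python
def prob_null_given_significant(prior_null, alpha=0.05, power=0.8):
    """P(null | significant result) by Bayes' theorem.

    alpha is P(significant | null); power is P(significant | not null).
    Both values, like prior_null, are illustrative assumptions.
    """
    false_alarms = alpha * prior_null
    true_hits = power * (1 - prior_null)
    return false_alarms / (false_alarms + true_hits)


# The same 5% false-positive rate under two hypothetical priors.
for prior in (0.5, 0.9):
    print(prior, round(prob_null_given_significant(prior), 3))
```

The answer moves a great deal with the prior even though P(significant | null) stays fixed at .05, which is exactly the dependence on priors mentioned above.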

[7] See Doris & Stich (2006) and Stich & Weinberg (2001).

[8] The distinction between moral intuitions and moral judgments is fraught, but for the sake of this discussion, I’ll treat moral intuitions as moral seemings and moral judgments as considered moral beliefs.

[9] Though for incisive criticisms of this claim, see Banerjee, Huebner, & Hauser (2010), Sytsma & Livengood (2011), and Lam (2010).

[10] This and related research is discussed in more detail in subsection 2.3.

[11] Fiery Cushman and Liane Young (2009, 2011) have developed an alternative dual-process model for moral (and non-moral) reasoning, as has Daniel Kahneman (2011).  I explore the dual-systems approach in more detail below, in the subsection on emotion and affect.

[12] See Nadelhoffer (2004, 2006); Knobe & Mendlow (2004); Knobe (2004a, 2004b, 2007); Pettit & Knobe (2009); Tannenbaum, Ditto, & Pizarro (2007); Beebe & Buckwalter (2010), Beebe & Jensen (forthcoming); Beebe (forthcoming); Alfano, Beebe, & Robinson (2012); Robinson, Stey, & Alfano (2013).

[13] Such scales are eponymously named for their inventor, Rensis Likert (1932, pronounced “LICK-urt”).  The basic idea with a Likert scale is that the participant is presented a statement and then asked to agree or disagree with it on a numeric scale.  Commonly, scales run from 1 to 7, 1 to 5, -3 to 3, or -2 to 2.  Almost always, the endpoints are labeled ‘strongly disagree’ and ‘strongly agree’.  Quite often, the midpoint is labeled ‘neither agree nor disagree’.  Sometimes other points on the scale are labeled as well.  Likert scales are ordinal rather than cardinal, meaning that the difference between a response of 2 and a response of 4 cannot be assumed to be equivalent to the difference between a response of 0 and a response of 2.  This is an important point, to which I return in the final section.

[14] The idea that seemingly predictive and explanatory concepts might also have a normative component is not entirely original with Knobe; Bernard Williams (1985, p. 129) pointed out that virtues and vices have such a dual nature.

[15] Owen Flanagan (1991) considered some of the same evidence before Doris and Harman, but he was reluctant to draw the pessimistic conclusions they did about virtue ethics.

[16] This is not the place to get into a lengthy discussion of what it means to explain variance in behavior.  The basic idea, however, is that the statistical analysis of experimental results yields a correlation between a personality variable (such as extroversion) and a behavioral variable (such as an act of helping).  Correlations range from -1 to +1.  A correlation of 0 means that the individual variable is of literally no use in predicting the behavioral outcome; a correlation of 1 means that the individual variable is a perfect positive predictor; a correlation of -1 means that the individual variable is a perfect negative predictor.  Actual correlations tend to be between -.3 and +.3.  The amount of variance explained by a given predictor variable is the square of the correlation between that variable and the behavior in question.  So, for instance, if extroversion is correlated with helping behavior at .25, then extroversion explains 6.25% of the variance in helping behavior.  I should point out that this is only one, rather simplistic, measure of explanatory power, but that personality variables do not look better on other measures, such as Cohen’s d, η², or partial η².
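
For readers who want to check the arithmetic in note 16, here is a minimal sketch in Python.  The function name variance_explained is my own illustration and is not drawn from any of the studies cited; it simply squares a correlation, which is the simple measure described in the note.

```python
def variance_explained(r: float) -> float:
    """Return the proportion of variance explained by a predictor
    whose correlation with the outcome is r (that is, r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    return r * r


# Hypothetical correlations chosen to span the range mentioned in the note.
for r in (0.25, 0.3, -0.3, 1.0):
    share = variance_explained(r)
    print(f"r = {r:+.2f} explains {share:.2%} of the variance")
```

Because the correlation is squared, a negative predictor of a given strength explains exactly as much variance as a positive one, and even the upper end of the typical range (.3) leaves more than nine-tenths of the variance unexplained.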

[17] Merritt (2000) was the first to suggest that the situationist critique could be handled by offloading some of the responsibility for virtue onto the social environment.

[18] Despite what some of their less-than-charitable readers (e.g., Flanagan 2009, p. 55) say.

[19] One might hope that philosophical reflection on ethics would promote moral behavior.  Eric Schwitzgebel has recently begun to investigate whether professional ethicists behave more morally than their non-ethicist philosophical peers, and has found that, on most measures, the two groups are indistinguishable (Schwitzgebel 2009; Schwitzgebel & Rust 2010; Schwitzgebel et al. 2011).

[20] See, for instance, Diener, Scollon, & Lucas (2003).

[21] A new scale called the ‘Satisfaction with Life Scale’ (Diener et al. 2010) is also now in use; it is designed to facilitate cross-national and cross-cultural comparisons.

[22] Though see Schimmack & Oishi (2005) for a critical reply, which argues that chronically accessible information is a much better predictor of life satisfaction responses than temporarily accessible information, such as how many dates you went on last week.

[23] See Haybron (2008).

[24] See Nichols (2002; 2004, ch. 5) and Strandberg & Björklund (forthcoming) for an experimentally-motivated argument against internalism.

[25] See Kelly (2011, especially chapter 1) for a comprehensive literature review.  See also May (forthcoming) for a criticism of these findings.

[26] There does seem to be some potential for fine-tuning, however (Rozin 2008; Case et al. 2006).

[29] I am here indebted to Daniel Wodak.

[30] I am here indebted to Chris Heathwood.
