Fallacies of probability (Dr. Van Cleave)
Here we will consider some formal fallacies of probability. These fallacies are easy to understand once they have been pointed out, but they can be difficult to detect in the moment because of the way our minds mislead us—analogous to the way our minds can be misled when watching a magic trick. In addition to introducing the fallacies, I will suggest some psychological explanations for why they are so common, despite how obvious they are once we’ve spotted them.
The conjunction fallacy
The conjunction fallacy is best introduced with an example.1
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Given this information about Linda, which of the following is more probable?
a. Linda is a bank teller.
b. Linda is a bank teller and is active in the feminist movement.
If you are like most people who answer this question, you will answer “b.” But that cannot be correct because it violates the basic rules of probability. In particular, notice that option b contains option a (i.e., Linda is a bank teller). But option b also contains more information—that Linda is also active in the feminist movement. The problem is that a conjunction can never be more probable than either one of its conjuncts. Suppose we say it is not very probable that Linda is a bank teller (how boring, given the description of Linda, which makes her sound interesting!). Let’s set that probability fairly low, say .4. Then what is the probability of her being active in the feminist movement? Let’s set that high, say .9. However, the probability that she is both a bank teller and active in the feminist movement must be computed as the probability of a conjunction, multiplying the two probabilities together (treating the two claims as independent), like this:
.4 × .9 = .36
So given these probability assignments (which I’ve just made up but seem fairly plausible), the probability of Linda being both a bank teller and active in the feminist movement is .36. But .36 is a lower probability than .4, which was the probability that she is a bank teller. So option b cannot be more probable than option a. Notice that even if we say it is absolutely certain that Linda is active in the feminist movement (i.e., we set the probability of her being active in the feminist movement at 1), option b is still only equal in probability to option a, since (.4)(1) = .4.
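To see the arithmetic at a glance, here is a minimal Python sketch using the made-up probability assignments from above (the specific numbers are illustrations only, not data about actual bank tellers or feminists):

```python
# Made-up probability assignments from the discussion above.
p_bank_teller = 0.4   # probability that Linda is a bank teller
p_feminist = 0.9      # probability that Linda is active in the feminist movement

# Treating the two claims as independent, the conjunction is the product.
p_both = p_bank_teller * p_feminist
print(p_both)  # ~0.36, lower than either conjunct

# A conjunction can never be more probable than either of its conjuncts.
assert p_both <= p_bank_teller
assert p_both <= p_feminist

# Even with certainty about the feminist claim, the conjunction only
# equals (never exceeds) the bank-teller probability: 0.4 * 1 = 0.4.
assert p_bank_teller * 1.0 <= p_bank_teller
```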
Sometimes it is easy to avoid committing conjunction fallacies. Here is an example that illustrates that we can in fact easily see that a conjunction is not more probable than either of its conjuncts.
Which is more probable?
a. Mark has hair
b. Mark has blonde hair
In this case, it is clear which of the options is more probable. Clearly a is more probable since it requires less to be true. Option a would have to be true if option b were true. But option a could also be true even if option b were false (i.e., Mark could have brown hair or red hair, etc.).
Thus there are cases where we can easily avoid committing the conjunction fallacy. So what is the difference between this case and the Linda case?
The Nobel Prize-winning psychologist Daniel Kahneman (along with his long-time collaborator, Amos Tversky) has for many years suggested a psychological explanation for this difference. The explanation is complex, but I can give you the gist of it quite simply. Kahneman suggests that our minds are wired to find patterns, and many of the patterns we find are based on what he calls “representativeness.” In the Linda case, the idea of Linda being active in the feminist movement fits better with the description of Linda as a philosophy major, as being active in social justice movements, and, perhaps, as being single. We build up a picture of Linda and then we try to match the descriptions to her. “Bank teller” doesn’t really match anything in the description of Linda. That is, the description of Linda is not representative of a bank teller. However, for many people, it is representative of a feminist. Thus, our minds more or less automatically see the match between the description of Linda and option b, which mentions she is a feminist. Kahneman thinks that in cases like these, our minds substitute a question of representativeness for the question of probability, thus answering the probability question incorrectly.2 We are distracted from the probability question by seeking representativeness, which our minds look for and think about more automatically than they do probability.
For Kahneman, the psychological explanation is needed to explain why even trained mathematicians and those who deal regularly with probability still commit the conjunction fallacy. The psychological explanation that our brains are wired to look for representativeness, and that we unwittingly substitute the question of representativeness for the question of probability, explains why even experts make these kinds of mistakes.
The base rate fallacy
Consider the following scenario. You go in for testing because of some health problems you’ve been having, and after a number of tests, you test positive for colon cancer. What are the chances that you really do have colon cancer? Let’s suppose that the test is not perfect, but it is 95% accurate. That is, in the case of those who really do have colon cancer, the test will detect the cancer 95% of the time (and thus miss it 5% of the time). (The test will also misdiagnose those who don’t actually have colon cancer 5% of the time.) Many people would be inclined to say that, given the test and its accuracy, there is a 95% chance that you have colon cancer. However, if you are like most people and are inclined to answer this way, you are wrong. In fact, you have committed the fallacy of ignoring the base rate (i.e., the base rate fallacy).
The base rate in this example is the rate of those who have colon cancer in a population. There is a very small percentage of the population that actually has colon cancer (let’s suppose it is .005 or .5%), so the probability that you have it must take into account the very low probability that you are one of the few who have it. That is, prior to the test (and not taking into account any other details about you), there was a very low probability that you have it—that is, a half of one percent chance (.5%). Yes, the test is 95% accurate, but given the very low prior probability that you have colon cancer, we cannot simply now say that there is a 95% chance that you have it. Rather, we must temper that figure with the very low base rate. The general point is this: when a condition (x) is very rare, then even if a highly accurate test identifies condition (x) as being present, we should still suspect that condition (x) is not present. In the above scenario, the prior probability (i.e., before the test) that you have colon cancer is really, really low. And that means that even after the test we should suspect that the probability is still fairly low. That’s the logic of the matter, and we can understand it conceptually without even getting into the math.
But since we are given numbers to work with, we can actually use math to figure out the actual probability that you have colon cancer. Here is how we do it. Let’s suppose that our population is 100,000 people. The base rate tells us that .5% of the population has colon cancer. That means that of the 100,000 people, only 500 of them have colon cancer. If we were to apply the 95% accurate test to those 500 people, the test would correctly diagnose 475 of them. That is, the test would deliver 475 correct identifications. These are called true positives. However, the test will also mistakenly tell us that some of the people who don’t have colon cancer actually do have it. When this happens, it is called a false positive. Our base rate tells us that most of our population (99,500) do not have colon cancer, and the 95% accurate test will misdiagnose 5% of those as having colon cancer. This comes out to 4,975 false positives!
So what are the chances that you are a true positive rather than a false positive? It is simply the number of true positives (475) divided by the total number of positive identifications that the test would make. This latter number includes those the test would misidentify (4,975) as well as the number it would accurately identify (475)—thus the total number the test would identify as having colon cancer would be 5,450. So the probability that you have it, given the positive test, is 475/5450 = .087, or 8.7%. So the probability that you have cancer, given the evidence of the positive test, is 8.7%. Thus, contrary to our initial reasoning that there was a 95% chance that you have colon cancer, the chance is only about a tenth of that—it is less than 10%! In thinking that the probability that you have cancer is closer to 95%, you would be ignoring the extremely low probability of having the disease in the first place (i.e., the low base rate). Neglecting to account for low base rates in determining the probability of some event is the signature of any base rate fallacy.
The general lesson here is that the number of false positives will be quite high (even when the identification method is fairly accurate) as long as the base rate of the phenomenon we’re looking for is very low. And if the number of false positives is high, then this will significantly lower the probability that the identification method has correctly identified the phenomenon in question.
From the above example we can see that the general method for determining probabilities when base rates are in play is the following.
- Determine the number of false positives (i.e., the number of instances that are incorrectly identified by the method)
- Determine the number of true positives (i.e., the number of instances that are correctly determined by the method)
- Use the following equation to figure the probability (worked through in the sketch below): probability = true positives / (true positives + false positives)
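Here is a minimal Python sketch of this method, using the illustrative numbers from the colon cancer example (a population of 100,000, a base rate of .5%, and a 95% accurate test); the function name prob_given_positive is just a label chosen for this sketch:

```python
def prob_given_positive(population, base_rate, accuracy):
    """Chance of actually having a condition given a positive test result,
    computed as true positives / (true positives + false positives)."""
    have = population * base_rate                # people who have the condition
    have_not = population - have                 # people who do not
    true_positives = have * accuracy             # correctly flagged as having it
    false_positives = have_not * (1 - accuracy)  # incorrectly flagged as having it
    return true_positives / (true_positives + false_positives)

# Colon cancer example: 100,000 people, .5% base rate, 95% accurate test.
# That works out to 500 people with cancer, 475 true positives, 4,975 false positives.
print(prob_given_positive(100_000, 0.005, 0.95))  # ~0.087, i.e., about 8.7%
```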
Before closing this section, let’s look at a couple more examples of the base rate fallacy. Suppose that the government has developed a machine that is able to detect terrorist intent with an accuracy of 90%. During a joint meeting of Congress, a highly trustworthy source says that there is a terrorist in the building. (Let’s suppose, for the sake of simplifying this example, that there is in fact a terrorist in the building.) In order to determine who the terrorist is, the building security seals all the exits, rounds up all 3,000 people in the building, and uses the machine to test each person. The first 30 people pass without triggering a positive identification from the machine, but on the very next person, the machine triggers a positive identification of terrorist intent. The question is: what are the chances that the person who set off the machine really is a terrorist?[3] Consider the following three possibilities: a) 90%, b) 10%, or c) .3%. If you answered 90%, then you committed the base rate fallacy again. The actual answer is “c”—less than 1%! Here’s why. The base rate is the likelihood that any given individual is a terrorist, and this is exceedingly low since there is only one terrorist in the building and there are 3,000 people in the building. That means the probability of any one person being a terrorist, before any results of the test, is exceedingly low: 1/3000. Since the test is 90% accurate, that means that out of the 2,999 non-terrorists, it will misidentify 10% of them as terrorists, yielding roughly 300 false positives. Assuming the machine doesn’t misidentify the one actual terrorist, the machine will identify a total of 301 individuals as “possessing terrorist intent.” The probability that any one of them actually possesses terrorist intent is 1/301 ≈ .3%. So the probability is drastically lower than 90%. It’s not even close. This is another good illustration of how far off probabilities can be when the base rate is ignored.
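Checking this arithmetic in a few lines of Python (following the example’s assumption that the one actual terrorist is correctly flagged):

```python
non_terrorists = 2999
false_positives = non_terrorists * 0.10  # the 90% accurate machine flags ~300 innocent people
true_positives = 1                       # assume the one real terrorist is flagged

# ~0.0033, i.e., about .3% (the text rounds to 300 false positives, giving 1/301)
print(true_positives / (true_positives + false_positives))
```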
Last one. Suppose that Bob is a super-eyewitness. When he observes a crime, he is 99% accurate in identifying the suspect. Suppose that Bob identifies Nancy as the person who robbed the American Apparel store. In a population where .5% (half of one percent) of the population are robbers, what is the probability that Nancy really is a robber, given that Bob’s eyewitness testimony identified her as a robber?
At this point, having been sensitized to the base rate fallacy, you should suspect that the probability is nowhere near as high as the accuracy of Bob’s eyewitness skills. Here’s the math. Suppose our population is 1,000 people: 995 non-robbers and 5 robbers (based on the above base rate of robbers in the population). Of the 995 non-robbers, Bob will misidentify 9.95 as robbers (false positives) and accurately identify 4.95 of the robbers, for a total of 14.9 robber-identifications. So the chance that Nancy really is a robber, given Bob’s eyewitness evidence, is:
(# of robbers Bob correctly identifies) / (total # of Bob's robber identifications) = 4.95/14.9 ≈ 33%. Thus, it is more likely that Nancy is not the robber.
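The same count-based check for Bob’s identifications, as a quick Python sketch:

```python
population = 1000
robbers = population * 0.005        # 5 robbers, given the .5% base rate
non_robbers = population - robbers  # 995 non-robbers

true_positives = robbers * 0.99       # 4.95 robbers Bob correctly identifies
false_positives = non_robbers * 0.01  # 9.95 non-robbers Bob misidentifies

print(true_positives / (true_positives + false_positives))  # ~0.33, i.e., about 33%
```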
The small numbers fallacy
Suppose a study showed that of the 3,141 counties of the United States, the incidence of kidney cancer was lowest in those counties which are mostly rural, sparsely populated, and located in traditionally Republican states. In fact, this is true.4 What accounts for this interesting finding? Most people would be tempted to look for a causal explanation—to look for features of the rural environment that account for the lower incidence of cancer. However, they would be wrong (in this case) to do so. It is easy to see why once we consider the counties that have the highest incidence of kidney cancer: they are counties that are mostly rural, sparsely populated, and located in traditionally Republican states! So whatever it was you thought might account for the lower cancer rates in rural counties can’t be the right explanation, since these counties also have the highest rates of cancer. It is important to understand that it isn’t the same counties that have the highest and lowest rates—for example, county X doesn’t have both a high and a low cancer rate (relative to other U.S. counties). That would be a contradiction (and so can’t possibly be true). Rather, what is the case is that counties that have the highest kidney cancer rates are “mostly rural, sparsely populated, and located in traditionally Republican states” but also counties that have the lowest kidney cancer rates are “mostly rural, sparsely populated, and located in traditionally Republican states.” How could this be? Before giving you the explanation, I’ll give you a simpler example and see if you can figure it out from that example.
Suppose that a jar contains equal amounts of red and white marbles. Jack and Jill are taking turns drawing marbles from the jar. However, they draw marbles at different rates. Jill draws 10 marbles at a time while Jack draws 3 marbles at a time. Who is more likely to draw either all red or all white marbles: Jack or Jill?5
The answer here should be obvious: Jack will draw marbles of all the same color more often, since Jack is only drawing 3 marbles at a time. Since Jill is drawing 10 marbles at a time, it will be less likely that her draws will yield marbles of all the same color. This is simply a fact of sampling and is related to the sampling errors discussed earlier. A sample that is too small will tend not to be representative of the population. In the marbles case, if we view Jack’s draws as samples, then his samples will be far less representative of the ratio of marbles in the jar. Jill, on the other hand, will tend to get far more representative samples. Since Jill is drawing a larger number of marbles, it is less likely that her samples would be drastically off in the way Jack’s would be. The general point to be taken from this example is that smaller samples tend toward the extremes—both in terms of overrepresenting some feature (e.g., red marbles) and in underrepresenting that same feature.
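Here is a small simulation sketch of the marble example. The jar size (100 marbles) and the number of trials are arbitrary choices made for illustration; the point is only that draws of 3 come up all one color far more often than draws of 10:

```python
import random

def all_same_color_rate(draw_size, trials=100_000):
    """Estimate how often a draw of `draw_size` marbles from a jar containing
    equal numbers of red and white marbles comes up all one color."""
    jar = ["red"] * 50 + ["white"] * 50  # 100 marbles, half red and half white
    hits = 0
    for _ in range(trials):
        draw = random.sample(jar, draw_size)  # draw without replacement
        if len(set(draw)) == 1:               # all marbles the same color
            hits += 1
    return hits / trials

print(all_same_color_rate(3))   # Jack: roughly 0.24 (about one draw in four)
print(all_same_color_rate(10))  # Jill: roughly 0.001 (very rare)
```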
Can you see how this might apply to the case of kidney cancer rates in rural, sparsely populated counties? There is a national kidney cancer rate which is an average of all the kidney cancer rates of the 3,141 counties in the U.S. Imagine ranking each county in terms of the cancer rates from highest to lowest. The finding is that there is a relatively larger proportion of the sparsely populated counties at the top of this list, but also a relatively larger proportion of the sparsely populated counties at the bottom of the list. But why would it be that the more sparsely populated counties would be overrepresented at both ends of the list? The reason is that these counties have smaller populations, so they will tend to have more extreme results (of either the higher or lower rates). Just as Jack is more likely to get either all white marbles or all red marbles (an extreme result), the less populated counties will tend to have cancer rates that are at the extreme, relative to the national average. And this is a purely statistical fact; it has nothing to do with features of those environments causing the cancer rate to be higher or lower. Just as Jack’s extreme draws have nothing to do with the way he is drawing (but are simply the result of statistical, mathematical facts), the extremes of the smaller counties have nothing to do with features of those counties, but only with the fact that they are smaller and so will tend to have more extreme results (i.e., cancer rates that are either higher or lower than the national average).
The first take-home lesson here is that smaller groups will tend to be less representative than larger groups (thus, smaller groups will tend to inhabit the extremes of a spectrum). We can call this the law of small numbers. The second take-home message is that our brains are wired to look for causal explanations rather than mathematical explanations, and because of this we are prone to ignore the law of small numbers and look for a causal explanation of phenomena instead. The small numbers fallacy is our tendency to seek a causal explanation for some phenomenon when only the law of small numbers is needed to explain that phenomenon.
We will end this section with a somewhat humorous and incredible example of a small numbers bias that, presumably, wasted over a billion dollars. This example, too, comes from Kahneman, who in turn heard the anecdote from some of his colleagues who are statisticians.6 Some time ago, the Gates Foundation (the charitable foundation of Microsoft founder Bill Gates) donated $1.7 billion to research a curious finding: smaller schools tend to be more successful than larger schools. That is, if you consider a rank ordering of the most successful schools, the smaller schools will tend to be overrepresented near the top (i.e., there is a higher proportion of them near the top of the list compared to the proportion of larger schools at the top of the list). This is the finding that the Gates Foundation invested $1.7 billion to help understand. In order to do so, they created smaller schools, sometimes splitting larger schools in half. However, none of this was necessary. Had the Gates Foundation (or those advising them) looked at the characteristics of the worst schools, they would have found that those schools also tended to be smaller! The “finding” is merely a result of the law of small numbers: smaller groups tend toward the extremes (on both ends of a spectrum) more so than larger groups. In this case, the fact that smaller schools tend to be both more successful and less successful is explained in the same way as we explain why Jack tends to get either all red or all white marbles more often than Jill.
Regression to the mean fallacy
Humans are prone to see causes even when no such cause is present. For example, if I have just committed some wrong and immediately afterward thunder cracks, I may think that my wrong action caused the thunder (e.g., because the gods were angry with me). The term “snake oil” refers to a product that promises certain (e.g., health) benefits but is actually fraudulent and has no benefits whatsoever. For example, consider a product that is supposed to help you recover from a common cold. You take the medicine and then within a few days, you are all better! No cold! It must have been the medicine. Or maybe you just regressed to the mean. Regression to the mean describes the tendency of things to go back to normal or to return to something close to the relevant statistical average. In the case of a cold, when you have a cold, you are outside of the average in terms of health. But you will naturally return to a state of health, with or without the “medicine.” If anyone were to try to convince you to buy such a medicine, you shouldn’t buy it, because the fact that you got better from your cold more likely has to do with your naturally regressing to the mean (returning to normal) than with the special medicine.
Another example. Suppose you live in Lansing and it has been over 100 degrees for two weeks straight. Someone says that if you pay tribute and do a special dance to Baal, the temperature will drop. Suppose you do this and the temperature does drop. Was it Baal or just regression to the mean? Probably regression to the mean, unless we have some special reason for thinking it is Baal. The point is, extreme situations tend to regress towards less extreme, more average situations. Since it is very rare for it to ever be over 100 degrees in Lansing, the fact that the temperature drops is to be expected, regardless of one’s prayers to Baal.
Suppose that a professional golfer has been on a hot streak. She has been winning every tournament she enters by ten strokes—she’s beating the competition like they were middle school golfers. She is just playing so much better than them. Then something happens. The golfer all of a sudden starts playing about average. What explains her fall from greatness? The sports commentators speculate: could it be that she switched her caddy, or that it is warmer now than it was when she was on her streak, or perhaps it was fame that went to her head once she had started winning all those tournaments? Chances are, none of these is the right explanation, because no such explanation is needed. Most likely she just regressed to the mean and is now playing like everyone else—still like a pro, just not like a golfer who is out of this world good. Even those who are skilled can get lucky (or unlucky), and when they do, we should expect that eventually that luck will end and they will regress to the mean.
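Here is a minimal simulation sketch of the statistical point. All of the numbers are invented for illustration: every golfer has the same underlying skill (an average score of 72), and tournament scores are skill plus a few strokes of random luck. Whoever happens to post the most extreme score in one tournament will typically score much closer to the average in the next, with no causal explanation needed:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def tournament_scores(n_golfers=50):
    # Invented model: identical skill (mean score 72), plus random luck of ~3 strokes.
    return [72 + random.gauss(0, 3) for _ in range(n_golfers)]

round_one = tournament_scores()
round_two = tournament_scores()

hottest = min(range(50), key=lambda i: round_one[i])  # lowest (best) score in round one
print(round_one[hottest])  # an extreme, unusually good score
print(round_two[hottest])  # typically much closer to the average of 72
```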
As these examples illustrate, one commits the regression to the mean fallacy when one tries to give a causal explanation of a phenomenon that is merely statistical or probabilistic in nature. The best way to rule out regression to the mean as the explanation of something is by doing a study in which one compares two groups. For example, suppose we could get our snake oil salesman to agree to a study in which a group of people who had colds took the medicine (experimental group) and another group of people didn’t take the medicine or took a placebo (control group). In this situation, if we found that the experimental group got better and the control group didn’t, or if the experimental group got better more quickly than the control group, then perhaps we’d have to say that there is something to this snake oil medicine. But without the evidence of a control group for comparison, even if lots of people took the snake oil medicine and got better from their colds, it wouldn’t prove anything about the efficacy of the medicine.
Gambler’s fallacy
The gambler’s fallacy occurs when one thinks that independent, random events can be influenced by each other. For example, suppose I have a fair coin and I have just flipped 4 heads in a row. Erik, on the other hand, has a fair coin that he has flipped 4 times and gotten tails each time. We are each taking bets that the next coin flipped is heads. Whose next flip should you bet will come up heads? If you are inclined to say that you should place the bet with Erik (reasoning that since he has been flipping all tails and the coin is fair, the flips must even out soon), then you have committed the gambler’s fallacy. The fact is, each flip is independent of the next, so the fact that I have just flipped 4 heads in a row does not increase or decrease my chances of flipping a head. Likewise for Erik. It is true that, as long as the coin is fair, over a large number of flips we should expect the proportion of heads to tails to be about 50/50. But there is no reason to expect that a particular flip will be more likely to be one or the other. Since the coin is fair, each flip has the same probability of being heads and the same probability of being tails—50%.
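A quick simulation sketch of this point (the number of flips is an arbitrary choice): among a long run of fair coin flips, the flip that immediately follows four heads in a row still comes up heads only about half the time.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
flips = [random.choice("HT") for _ in range(1_000_000)]

# Collect every flip that immediately follows a run of four heads.
after_four_heads = [flips[i] for i in range(4, len(flips))
                    if flips[i - 4:i] == ["H", "H", "H", "H"]]

print(after_four_heads.count("H") / len(after_four_heads))  # roughly 0.5
```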
1 The following famous example comes from Tversky, A. and Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293–315.
2 Kahneman gives this explanation in numerous places, including, most exhaustively (and for a general audience), in his 2011 book Thinking, Fast and Slow. New York, NY: Farrar, Straus and Giroux.
3 This example is taken (with certain alterations) from: http://news.bbc.co.uk/2/hi/uk_news/magazine/8153539.stm
4 This example is taken from Kahneman (2011), op. cit., p. 109.
5 This example is also taken (with minor modifications) from Kahneman (2011), p. 110.
6 Kahneman (2011), pp. 117–118.