Image: MiG 19. Public Domain.
The base-rate fallacy happens when general statistical data (the base rate) is ignored in favor of case-specific data when making a probability judgment.
The C.I.A. gives this example to illustrate the problem:
During the Vietnam War, a fighter plane made a non-fatal strafing attack on a US aerial reconnaissance mission at twilight. Both Cambodian and Vietnamese jets operate in the area. You know the following facts:
(a) Specific case information: The US pilot identified the fighter as Cambodian. The pilot’s aircraft recognition capabilities were tested under appropriate visibility and flight conditions. When presented with a sample of fighters (half with Vietnamese markings and half with Cambodian) the pilot made correct identifications 80 percent of the time and erred 20 percent of the time.
(b) Base rate data: 85 percent of the jet fighters in that area are Vietnamese; 15 percent are Cambodian.
Question: What is the probability that the fighter was Cambodian rather than Vietnamese?
A common procedure in answering this question is to reason as follows: We know the pilot identified the aircraft as Cambodian. We also know the pilot’s identifications are correct 80 percent of the time; therefore, there is an 80 percent probability the fighter was Cambodian. This reasoning appears plausible but is incorrect. It ignores the base rate–that 85 percent of the fighters in that area are Vietnamese. The base rate, or prior probability, is what you can say about any hostile fighter in that area before you learn anything about the specific sighting.
The correct way to do this is to use Bayesian reasoning:
If we suppose that there are 100 enemy fighter planes total, that means that 85 are Vietnamese and 15 are Cambodian.
From paragraph (a), we know that the pilot identifies enemy planes correctly 80% of the time, so out of 85 Vietnamese planes, he would identify 68 correctly (85 * 0.80 = 68) and misidentify 17 as Cambodian (85 * 0.20 = 17).
Out of the 15 Cambodian aircraft, he would correctly identify 12 (15 * 0.80 = 12) and mistake 3 for Vietnamese (15 * 0.20 = 3).
This makes a total of 71 Vietnamese and 29 Cambodian sightings, of which only 12 of the 29 Cambodian sightings are correct; the other 17 are incorrect sightings of Vietnamese aircraft. Therefore, when the pilot claims the attack was by a Cambodian fighter, the probability that the craft was actually Cambodian is only 12/29ths or 41 percent, despite the fact that the pilot’s identifications are correct 80 percent of the time.
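The counting argument above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic; the function name and its parameters are made up for illustration, and exact fractions are used so that the 12/29 result is not blurred by rounding.

```python
from fractions import Fraction


def cambodian_given_report(total=100,
                           share_vietnamese=Fraction(85, 100),
                           accuracy=Fraction(80, 100)):
    """Probability the fighter is Cambodian, given it was reported as Cambodian.

    Hypothetical helper: it simply counts planes, as in the text.
    """
    vietnamese = total * share_vietnamese        # 85 planes
    cambodian = total - vietnamese               # 15 planes
    true_reports = cambodian * accuracy          # Cambodian, reported Cambodian: 12
    false_reports = vietnamese * (1 - accuracy)  # Vietnamese, reported Cambodian: 17
    return true_reports / (true_reports + false_reports)


p = cambodian_given_report()
print(p, float(p))  # 12/29, roughly 0.41
```

Changing `share_vietnamese` to one half makes the result equal the pilot's 80 percent accuracy, which shows that the intuitive answer is only right when the two kinds of fighters are equally common.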
Ignore the base rate in favor of specific data at your own risk! In some cases, it can make a huge difference.
Another example to make this kind of reasoning clearer:
1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
So, based on the numbers above, what is the probability that a woman who gets a positive mammography really has breast cancer?
Let's go through it. If there are 10,000 women screened, 1% will have breast cancer. So that's 100. 80% of those will get a positive result, so that's 80. That leaves us with 9,900 women who don't have breast cancer. Out of those, 9.6% will get a false-positive result, so that's about 950 women (9,900 * 0.096 = 950.4).
You see where this is going?
So out of 10,000 women who get tested, 80 will have a true positive result and about 950 will have a false positive, for a total of about 1,030 positive results. Out of those, only 80 really have cancer, that's about 7.8% (80/1,030 * 100 ≈ 7.77).
So if, with these numbers, you were to get a positive result on your mammography, you would still have only about a 7.8% chance of really having breast cancer.
Counter-intuitive, but true.
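The same calculation can be written directly with Bayes' theorem instead of counting women. The sketch below is illustrative only: `posterior` and its parameter names are assumptions introduced here, not taken from any of the sources.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive result) by Bayes' theorem.

    Hypothetical helper; all three arguments are probabilities in [0, 1].
    """
    true_pos = prior * true_positive_rate          # has the condition and tests positive
    false_pos = (1 - prior) * false_positive_rate  # does not have it but tests positive
    return true_pos / (true_pos + false_pos)


# Mammography numbers from the example: 1% prevalence,
# 80% of cancers detected, 9.6% false positives.
p_cancer = posterior(prior=0.01, true_positive_rate=0.80, false_positive_rate=0.096)
print(f"{p_cancer:.1%}")  # 7.8%
```

The fighter example fits the same function: `posterior(0.15, 0.80, 0.20)` gives roughly 0.41, matching 12/29.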
- Base rate fallacy at Wikipedia
- Base-Rate Fallacy by the CIA
- An Intuitive Explanation of Bayesian Reasoning By Eliezer Yudkowsky
See also: Rationality Resources