Q: If drug B has a higher success rate (percentage of cures) than drug A when given to women, and also when given to men, does it have a higher success rate when given to people in general?
[2 Apr 1997]

A: Not necessarily. For example:

| Drug | Women: success | Women: failure | Men: success | Men: failure |
|---|---|---|---|---|
| A | 85 | 31 | 4 | 5 |
| B | 3 | 1 | 1 | 1 |

Then for women B (75%) is better than A (73%) and for men B (50%) is again better than A (44%), but for people in general A (71%) is better than B (67%).
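These percentages are easy to check by machine. The following is a minimal Python sketch, with the counts read from the table above (drug A: 85 cured and 31 not cured among women, 4 and 5 among men; drug B: 3 and 1 among women, 1 and 1 among men). The dictionary layout and the helper name `rate` are illustrative choices, not part of the original puzzle.

```python
from fractions import Fraction

# (successes, failures) for each drug and sex, read from the table above.
counts = {
    "A": {"women": (85, 31), "men": (4, 5)},
    "B": {"women": (3, 1), "men": (1, 1)},
}

def rate(groups):
    """Exact success rate over one or more (successes, failures) pairs."""
    successes = sum(s for s, _ in groups)
    total = sum(s + f for s, f in groups)
    return Fraction(successes, total)

# Compare the two drugs within each sex.
for sex in ("women", "men"):
    a = rate([counts["A"][sex]])
    b = rate([counts["B"][sex]])
    print(f"{sex}: A {float(a):.0%}, B {float(b):.0%}")

# Compare the two drugs with the sexes pooled.
a_all = rate(list(counts["A"].values()))
b_all = rate(list(counts["B"].values()))
print(f"overall: A {float(a_all):.0%}, B {float(b_all):.0%}")
```

Using `Fraction` keeps the comparisons exact, so a near-tie cannot be misjudged because of floating-point rounding.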

Q: What's the smallest possible "paradoxical" situation (i.e. smallest total number of people)? There are two versions of the problem depending on whether we allow entries to be 0.
[2 Apr 1997]

A: When I first considered this puzzle I found the following two examples by hand, and wondered whether they were minimal:

Zeros not allowed (20 people):

| Drug | Women: success | Women: failure | Men: success | Men: failure |
|---|---|---|---|---|
| A | 3 | 4 | 1 | 5 |
| B | 1 | 1 | 1 | 4 |

Zeros allowed (9 people):

| Drug | Women: success | Women: failure | Men: success | Men: failure |
|---|---|---|---|---|
| A | 2 | 1 | 0 | 1 |
| B | 1 | 0 | 1 | 3 |

In 2011 I found that the first of these was quoted in "Impossible? Surprising Solutions to Counterintuitive Conundrums" by Julian Havil (published 2008, ISBN 978-0-691-13131-3, available from Amazon, for example). This inspired me to settle the question with a quick exhaustive computer search.

It turns out that for the zeros-not-allowed version the 20-person example above is not quite the best possible. The minimum possible total is 19, and there are two essentially different solutions:

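| Drug | Women: success | Women: failure | Men: success | Men: failure |
|---|---|---|---|---|
| A | 2 | 1 | 2 | 5 |
| B | 3 | 2 | 1 | 3 |

and

| Drug | Women: success | Women: failure | Men: success | Men: failure |
|---|---|---|---|---|
| A | 2 | 1 | 3 | 5 |
| B | 3 | 2 | 1 | 2 |

(In these two tables the roles of the drugs are swapped relative to the earlier examples: the first row has the higher success rate for women and for men, and the second row has the higher success rate overall.)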

For the zeros-allowed version the 9-person example above is minimal, and is essentially the only solution.
[20 Mar 2011]
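A search of this kind is small enough to sketch in a few lines of Python. This is not the program used for the results above (which is not shown here), just one plausible way to do it, under the following assumptions: a table is an 8-tuple in the order (A women success, A women failure, A men success, A men failure, then the same four for B), "paradoxical" means B is strictly better for women and for men while A is strictly better overall, and every drug/sex group must contain at least one person so that its success rate is defined. The function names are made up for this sketch.

```python
def is_paradox(t):
    """True if B beats A for women and for men, yet A beats B overall."""
    aws, awf, ams, amf, bws, bwf, bms, bmf = t
    aw, am, bw, bm = aws + awf, ams + amf, bws + bwf, bms + bmf
    if 0 in (aw, am, bw, bm):
        return False  # an empty group has no success rate
    # Compare fractions by cross-multiplying, which avoids rounding.
    b_wins_women = bws * aw > aws * bw
    b_wins_men = bms * am > ams * bm
    a_wins_overall = (aws + ams) * (bw + bm) > (bws + bms) * (aw + am)
    return b_wins_women and b_wins_men and a_wins_overall

def tuples_with_sum(k, total, lo):
    """Yield every k-tuple of integers >= lo that adds up to total."""
    if k == 1:
        if total >= lo:
            yield (total,)
        return
    for first in range(lo, total - lo * (k - 1) + 1):
        for rest in tuples_with_sum(k - 1, total - first, lo):
            yield (first,) + rest

def smallest_paradox(allow_zeros, limit=30):
    """Return (total, table) for the first paradox found, smallest total first."""
    lo = 0 if allow_zeros else 1
    for total in range(limit + 1):
        for t in tuples_with_sum(8, total, lo):
            if is_paradox(t):
                return total, t
    return None  # nothing found up to the limit

print(smallest_paradox(allow_zeros=True))
print(smallest_paradox(allow_zeros=False))
```

Only one orientation is tested (B better in the subgroups, A better overall); swapping the drug labels gives the other orientation with the same total, so the minimum is unaffected. If the sketch matches the definitions used above, the two totals printed should be 9 and 19.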

Q: Here's another striking version. A greengrocer sells apples at a fixed price per fruit, and oranges similarly. Each day an apple costs more than an orange. I buy fruit on several days. On average, did my apples cost me more per fruit than my oranges?
[20 Mar 2011]

A: Not necessarily. For example:

• On Monday apples cost 9p each and oranges cost 8p each, and I buy an apple and two oranges.
• On Tuesday apples cost 3p each and oranges cost 2p each, and I buy two apples and an orange.
• Overall I bought three apples which cost a total of 15p, and three oranges which cost a total of 18p. So on average my apples cost 5p each but my oranges cost 6p each.
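The arithmetic in this example can be checked with a short Python sketch. The prices and quantities are those given in the list above; the list-of-pairs layout and the function name `average_price` are illustrative choices.

```python
# Each purchase is (price in pence, number bought): Monday first, then Tuesday.
apples = [(9, 1), (3, 2)]
oranges = [(8, 2), (2, 1)]

def average_price(purchases):
    """Total spent divided by number of fruit bought."""
    spent = sum(price * qty for price, qty in purchases)
    bought = sum(qty for _, qty in purchases)
    return spent / bought

# On each individual day an apple costs more than an orange.
print(all(a[0] > o[0] for a, o in zip(apples, oranges)))  # True

# Averaged over everything bought, the apples come out cheaper.
print(average_price(apples))   # 5.0
print(average_price(oranges))  # 6.0
```

The reversal comes entirely from the quantities: most of the oranges were bought on the expensive day and most of the apples on the cheap day.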

Simpson's paradox can be interpreted as a sign that we're asking the wrong question. In the drugs trial we shouldn't be asking whether drug A is better than drug B, but rather why both drugs are more effective on women than on men. At the greengrocer's we shouldn't be asking whether apples cost more than oranges, but rather why the prices of both fruit changed so much on Tuesday.
[20 Mar 2011]

Q: Call the above situation a 2-level paradox, because we're measuring the drugs' effectiveness at two levels: the gender level and the overall population level. Is it possible to have a 3-level paradox? For example, is it possible that drug A has a higher success rate on people, but drug B has a higher success rate on women and on men, but drug A has a higher success rate on each of young women, old women, young men and old men?
[Haidar Al-Dhalimy, 2 Apr 2021]

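A: Yes. For example:

| Drug | Young women: success | Young women: failure | Old women: success | Old women: failure | Young men: success | Young men: failure | Old men: success | Old men: failure |
|---|---|---|---|---|---|---|---|---|
| A | 2 | 3 | 3 | 1 | 1 | 3 | 1 | 1 |
| B | 1 | 2 | 5 | 2 | 1 | 4 | 4 | 5 |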

The better drug for each subpopulation is named at the end of each line:

• For young women drug A cures 2 out of 5 = 40% and drug B cures 1 out of 3 = 33%, so A is better.
• For old women drug A cures 3 out of 4 = 75% and drug B cures 5 out of 7 = 71%, so A is better.
• For women overall drug A cures 5 out of 9 = 56% and drug B cures 6 out of 10 = 60%, so B is better.
• For young men drug A cures 1 out of 4 = 25% and drug B cures 1 out of 5 = 20%, so A is better.
• For old men drug A cures 1 out of 2 = 50% and drug B cures 4 out of 9 = 44%, so A is better.
• For men overall drug A cures 2 out of 6 = 33% and drug B cures 5 out of 14 = 36%, so B is better.
• Overall drug A cures 7 out of 15 = 47% and drug B cures 11 out of 24 = 46%, so A is better.
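The three levels of comparison listed above can be recomputed by aggregating the four smallest subgroups. Here is a minimal Python sketch using the counts from the list (for instance, "2 out of 5" becomes 2 successes and 3 failures). The data layout and the helper names `rate` and `better` are assumptions made for this illustration.

```python
from fractions import Fraction

# (successes, failures) per drug and (sex, age) subgroup, from the list above.
data = {
    "A": {("women", "young"): (2, 3), ("women", "old"): (3, 1),
          ("men", "young"): (1, 3), ("men", "old"): (1, 1)},
    "B": {("women", "young"): (1, 2), ("women", "old"): (5, 2),
          ("men", "young"): (1, 4), ("men", "old"): (4, 5)},
}

def rate(drug, sex=None, age=None):
    """Exact success rate of a drug, pooled over the matching subgroups."""
    cells = [sf for (s, a), sf in data[drug].items()
             if sex in (None, s) and age in (None, a)]
    successes = sum(s for s, _ in cells)
    return Fraction(successes, sum(s + f for s, f in cells))

def better(**filters):
    """Name of the drug with the higher rate (there are no ties in this data)."""
    return "A" if rate("A", **filters) > rate("B", **filters) else "B"

by_sex_and_age = [better(sex=s, age=a)
                  for s in ("women", "men") for a in ("young", "old")]
by_sex = [better(sex=s) for s in ("women", "men")]
overall = better()

print(by_sex_and_age)  # ['A', 'A', 'A', 'A']
print(by_sex)          # ['B', 'B']
print(overall)         # A
```

The winner flips each time the data is pooled one level further, which is what makes this a 3-level paradox.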

By exhaustive search this example is minimal for the zeros-not-allowed version, although there are other minimal examples.

Inspired by this example I got a bit carried away and found a 6-level paradox. This diagram is transposed relative to the tables above, so each column aggregates pairs of subpopulations from the previous column. This diagram is also available as a PDF document.

By exhaustive search this example is minimal for the zeros-not-allowed version, although there are probably other minimal examples.

I produced a 6-level example because the table still fits on one page, but I stopped there because I'm not sure we'd learn very much by going further. I can't really see any structure in these examples, and I'm just using a brute-force search so I don't have a method for constructing them. I don't even have a bound on the size of minimal examples at a given level.

In this 6-level example the effect simply reverses at each level. However, I think we could in fact specify any pattern of red text in the diagram, and then find a population where colouring the better drug in each subpopulation yields that pattern, although I haven't proved it.
