I’ve been a booster of the Tversky-Kahneman cognitive-bias revolution since I read their article in Scientific American as a high school student. (To be honest, I’d always lazily thought of it as Tversky’s work, but Daniel Kahneman has had the good sense not to die prematurely, and to collect a Nobel memorial prize.) And I’ve greatly enjoyed Kahneman’s new popular book on his collected lessons from many decades of research on cognitive biases.
Putting that together with my longstanding contempt for the finance profession (expressed at greatest length here, but more generally listed here), I was particularly delighted to start in on the chapter titled “The Illusion of Validity”, where Kahneman lays into the self-serving illusions of finance professionals. It turns out, though, that this chapter is an intellectual trainwreck, with oversimplifications piling up on crude distortions, while the whistle of self-satisfied self-promotion shrills incessantly in the background. It’s both insufferable and so poorly reasoned that it begins to call the reliability of the rest of the book into question. Kahneman doesn’t claim to be free of the cognitive biases he analyses in others, but you might expect more self-awareness.
The first part of the chapter tells a story that I’ve read before, and have found quite illuminating, about the yawning gap between the failure of his Israeli Defence Force psychology team, lo many years ago, to predict the future success of officer candidates on the basis of observing them in a test, and their illusory confidence in their predictions. But then he applies this “lesson” to “The illusion of stock-picking skill”. He and two colleagues visited a senior investment manager 30 years ago to discuss the role of judgement biases in investing.
‘When you sell a stock,’ I asked, ‘who buys it?’ He answered with a wave in the vague direction of the window, indicating that he expected the buyer to be someone very much like him. That was odd: What made one person buy and the other sell? What did the sellers think they knew that the buyers did not?
Since then, my questions about the stock market have hardened into a larger puzzle: a major industry appears to be built largely on an illusion of skill. Billions of shares are traded every day, with many people buying each stock and others selling it to them… Most of the buyers and sellers know that they have the same information: they exchange the stocks primarily because they have different opinions. The buyers think the price is too low and likely to rise, while the sellers think the price is high and likely to drop. The puzzle is why buyers and sellers alike think that the current price is wrong. What makes them believe they know more about what the price should be than the market does? For most of them, that belief is an illusion.
Now, I don’t mean to cast doubt on the basic assertion that most investment professionals are delusional, and purely as a matter of belief I would affirm the creed that international finance is in the hands of dangerous lunatics.* But as a matter of logic, the mere fact that proposition D (financiers are delusional) is true, does not imply that every syllogism that has D as a conclusion is valid. Most truths are (in Kantian language) synthetic truths: One could imagine a financial sector that was not a lunatic asylum, it just happens empirically not to be the one we have. Most simple demonstrations of why other people are obviously stupid are wrong. I suppose it’s understandable that someone like Kahneman, who has indeed had the experience of establishing simple truths that a large number of smart people had failed to notice, might be inclined to discount the warning signs that other people may understand something in a more complex way than he does.
Kahneman’s argument here was anticipated, in a different context, by the great thinker Douglas Adams, in The Hitchhiker’s Guide to the Galaxy:
Bypasses are devices that allow some people to dash from point A to point B very fast while other people dash from point B to point A very fast. People living at point C, being a point directly in between, are often given to wonder what’s so great about point A that so many people from point B are so keen to get there, and what’s so great about point B that so many people from point A are so keen to get there. They often wish that people would just once and for all work out where the hell they wanted to be.
It’s about as cogent as Kahneman’s argument, but Adams’s version is demonstrably funnier. If Kahneman had written The Hitchhiker’s Guide, he would have written,
Commuters travel billions of miles every day, with some people going east and others going west… Most of the commuters know that they have the same information: they travel primarily because they have different opinions. The commuters from the east think it’s better to be in the west in the morning, and the commuters from the west think it’s better to be in the east; and in the evening they reverse their opinions. The puzzle is why both groups think that their current location is wrong. For most of them, that belief is an illusion.
Now, at any given time there are some people travelling to a new location that they consider to be superior in some absolute sense, and they may very well wonder why some other people don’t recognise that superiority and desert the inferior location. (In fact, Kahneman discusses elsewhere the mistaken but widespread belief that people in California are generally happier, what with their daily doses of sunshine, than the accursed midwesterners.) But most commuters and travellers, and probably even many migrants, recognise that they are not climbing some gradient of geographic quality, but going someplace that is better for them at this time.
Now, again, it may be an empirical fact that some large percentage of equity traders think they can recognise a stock that’s about to rise, on the basis of no special expertise, and buy it cheap from someone less insightful. But it’s not the kind of truth that you can establish on the basis of a five-line sneer. Most investment professionals are not picking individual stocks, but are assembling portfolios. Suppose I own many shares of AAA US subprime mortgage-backed securities and lots of oil futures. You own wheat futures and shares of Dystopian Ammunition and Survival Supplies. In the state of the world where (hypothetically) people like us manage to engineer a collapse of the world economy, the US housing market and oil prices are going to plummet simultaneously, while wheat and ammunition are both going to seem like pretty good things to be holding. So swapping the oil futures for the wheat futures could make us both better off, by reducing our risk.
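The risk-reduction point can be made concrete with a toy two-state example (all payoffs below are invented for illustration): two equally likely states of the world, "boom" and "collapse", and each asset paying off per unit as (boom payoff, collapse payoff).

```python
import statistics

# Invented per-unit payoffs in the two states (boom, collapse).
mortgage = (2.0, 0.0)   # subprime MBS: worthless in a collapse
oil      = (2.0, 0.0)   # oil futures: likewise
wheat    = (0.0, 2.0)   # wheat futures: valuable in a collapse
ammo     = (0.0, 2.0)   # ammunition: likewise

def payoffs(assets):
    """Total portfolio payoff in each of the two states."""
    return [sum(a[s] for a in assets) for s in (0, 1)]

before_me  = payoffs([mortgage, oil])    # all my eggs in the boom state
before_you = payoffs([wheat, ammo])      # all yours in the collapse state
after_me   = payoffs([mortgage, wheat])  # after swapping my oil for your wheat
after_you  = payoffs([oil, ammo])

for name, p in [("me, before", before_me), ("me, after", after_me),
                ("you, before", before_you), ("you, after", after_you)]:
    print(f"{name}: mean {statistics.mean(p):.1f}, spread {statistics.pstdev(p):.1f}")
```

Both parties keep the same expected payoff but eliminate their exposure to which state occurs; neither side needs to believe the other is mispricing anything for the trade to make sense.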
Again, I’m not casting doubt on the empirical fact that most investment professionals are providing negative value to their marks, er, clients; only pointing out that this is a synthetic truth, not analytic as Kahneman seems to believe. (It is, on the other hand, something like a synthetic truth that an average investor would be better off investing in a market average than with an active investor, simply because of the mathematical fact that the market average gives the average of all investors’ performance, QED. See here for further discussion. But this is a different point than the one that Kahneman is making.) As John von Neumann wrote to Stanislaw Ulam in 1939,
I refuse to accept however, the stupidity of the Stock Exchange boys, as an explanation of the trend of stocks. Those boys are stupid alright, but there must be an explanation of what happens, which makes no use of this fact.
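Incidentally, the parenthetical point about market averages really is pure arithmetic (it is essentially William Sharpe’s “arithmetic of active management”), and a few lines of simulation make it concrete; every number below is arbitrary:

```python
import random

random.seed(4)

# Arbitrary market: 10 assets with random values and random one-year returns.
n_assets, n_investors = 10, 5
asset_values = [random.uniform(50, 150) for _ in range(n_assets)]
asset_returns = [random.gauss(0.07, 0.15) for _ in range(n_assets)]

# Split each asset's value arbitrarily among the investors, so that the
# investors' holdings sum exactly to the market portfolio.
holdings = [[random.random() for _ in range(n_assets)] for _ in range(n_investors)]
for j in range(n_assets):
    total = sum(holdings[i][j] for i in range(n_investors))
    for i in range(n_investors):
        holdings[i][j] *= asset_values[j] / total  # dollar position in asset j

def port_return(h):
    """Value-weighted return of one investor's portfolio."""
    return sum(h[j] * asset_returns[j] for j in range(n_assets)) / sum(h)

market = (sum(asset_values[j] * asset_returns[j] for j in range(n_assets))
          / sum(asset_values))
weights = [sum(h) for h in holdings]
avg_active = (sum(w * port_return(h) for w, h in zip(holdings and weights, holdings))
              / sum(weights))
print(f"market return {market:.4f}, value-weighted average active return {avg_active:.4f}")
```

However the holdings are carved up, the value-weighted average of the investors’ gross returns equals the market return identically, so after fees the average active investor must trail a cheap index fund.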
Actually, his reasoning goes off the rails elsewhere in the same chapter, also in the context of business. Here he laments the inability of otherwise intelligent people to appreciate the role of luck in success, which I would guess to be basically a true observation. He was invited to speak to a group of investment advisers that served “very wealthy clients”. He ranked the different advisers in each year, and computed the correlation coefficients between the rankings in different years. His argument was that skill would be demonstrated through a correlation between performance in different years, and when he discovered that the average correlation was essentially zero, he concluded that investing is simply a game of chance.
The logic is simple: if individual differences in any one year are due entirely to luck, the ranking of investors and funds will vary erratically and the year-to-year correlation will be zero. Where there is skill, however, the rankings will be more stable. The persistence of individual differences is the measure by which we confirm the existence of skill among golfers, car salespeople, orthodontists, or speedy toll collectors on the turnpike.
Let’s consider an analogous situation: A hospital has five general surgeons on staff. DK ranks them each year according to the survival rate of their patients for coronary bypass surgery, and discovers that there is no correlation between a surgeon’s rankings in different years. Over 10 years the rankings look to be completely random. DK concludes that there is no skill involved in coronary bypass surgery. Someone less brilliant than DK might conclude, instead, that the surgeons are all equally skillful.
There are two fallacies at work here: A false reversal of implication, and neglect of the comparison group. Certainly, lack of correlation is consistent with investing being a game of chance, but it does not imply it. We would quickly recognise the skill required for coronary bypass surgery if we compared the professionals not with their colleagues, but with a control group randomly selected from the same town. Now, it may be that the variability of skill among surgeons is sufficiently great that you would inevitably recognise differences in performance among five of them. But it could be that this procedure is sufficiently routine and undemanding (consider, perhaps, appendectomies instead), that there is a broad plateau of skill: You need a trained professional, but almost any professional will do about the same job.

Indeed, Kahneman had just finished presenting convincing evidence that the vast majority of amateur investors blunder in predictable ways, losing vast sums of money to the broad class of professional investors. So clearly some skill must be involved in professionals distinguishing themselves consistently from the amateurs, whether or not this shows up as consistent differences among the professionals. (There could be some assortative mating going on here: Within a firm there will be investors of approximately the same skill level. Those who are better than the others seek out a higher quality firm, so they won’t be associated with a bunch of losers. And those who are below average get pushed out.)
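A quick simulation shows how the plateau-of-skill scenario plays out: professionals who are all equally skilled, and all clearly better than amateurs, nonetheless show zero year-to-year correlation. The skill and noise numbers here are invented.

```python
import random
import statistics

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented model: every professional has the same true edge (+5% expected
# return) over amateurs (0%); each year's result is skill plus fresh noise.
n_pros = 200
pro_skill, amateur_skill, noise = 5.0, 0.0, 10.0

year1 = [pro_skill + random.gauss(0, noise) for _ in range(n_pros)]
year2 = [pro_skill + random.gauss(0, noise) for _ in range(n_pros)]
amateurs = [amateur_skill + random.gauss(0, noise) for _ in range(n_pros)]

r = pearson(year1, year2)
print(f"year-to-year correlation among professionals: {r:.2f}")   # near 0
print(f"mean professional return: {statistics.mean(year1):.1f}")  # near 5
print(f"mean amateur return: {statistics.mean(amateurs):.1f}")    # near 0
```

Zero persistence among the professionals coexists happily with a large, real professional-versus-amateur gap, which is exactly the comparison-group point.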
That’s not how Kahneman sees it. He scoffs at the failure of others to accept his absolute rightness:
The next morning, we reported the findings to the advisers, and their response was equally bland. Their own experience of exercising careful judgment on complex problems was far more compelling to them than an obscure statistical fact. When we were done, one of the executives I had dined with the previous evening drove me to the airport. He told me, with a trace of defensiveness, “I have done very well for the firm and no one can take that away from me.” I smiled and said nothing. But I thought, “Well, I took it away from you this morning. If your success was due mostly to chance, how much credit are you entitled to take for it?”
I suppose it can be seen as instructive, that someone who has spent most of his life thinking about chance, and how people misunderstand it, should himself be so self-righteously confused on simple matters of inference concerning chance.
Otherwise, the book is hugely insightful, and a delight to read…
* His partner in self-love, Nassim Taleb (and mutual admirer: Taleb is cited repeatedly in this book for his remarkable insights, and has reciprocated with a fawning paragraph on the book jacket), has maintained, on the basis of extensive experience (and what seems to be genuine understanding of the issues involved), that the vast majority of traders are deluded by the illusion of their own skill, ignoring the role of luck in their success.
I am reading Daniel Kahneman’s “Thinking, Fast and Slow”, where he talks about regression to the mean. He writes:
“If you treated a group of depressed children for some time with an energy drink, they would show a clinically significant improvement. It is also the case that depressed children who spend some time standing on their head or hug a cat for twenty minutes a day will also show improvement. Most readers of such headlines will automatically infer that the energy drink or the cat hugging caused an improvement, but this conclusion is completely unjustified. Depressed children are an extreme group, they are more depressed than most other children – and extreme groups regress to the mean over time. The correlation between depression scores on successive occasions of testing is less than perfect, so there will be regression to the mean: depressed children will get somewhat better over time even if they hug no cats and drink no Red Bull.”
I have trouble with his example. I think he means that over time, the same group of children will show “some” improvement; he does not mean that over time the group will regress to the mean of all children. He states that the correlation between depression scores on successive occasions of testing is less than perfect. But if the test for depression is very good, meaning that the correlation between scores on successive occasions is high, the regression to the mean over time will be small and so the perceived improvement will also be small.
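A small simulation under the standard true-score-plus-noise model makes the size of the effect explicit (the 50-point mean and the variance numbers below are invented):

```python
import random
import statistics

random.seed(1)

# Invented model: each child's observed depression score is a stable true
# score (mean 50) plus fresh measurement noise on every testing occasion.
n = 100_000
true_sd, noise_sd = 8.0, 6.0
r = true_sd**2 / (true_sd**2 + noise_sd**2)  # test-retest correlation = 0.64

true_scores = [random.gauss(50, true_sd) for _ in range(n)]
test1 = [t + random.gauss(0, noise_sd) for t in true_scores]
test2 = [t + random.gauss(0, noise_sd) for t in true_scores]

# "Depressed" group: the top 10% of scorers on the first test.
cutoff = sorted(test1)[int(0.9 * n)]
grp = [i for i in range(n) if test1[i] >= cutoff]

m1 = statistics.mean(test1[i] for i in grp)
m2 = statistics.mean(test2[i] for i in grp)
predicted = 50 + r * (m1 - 50)  # classic regression-to-the-mean prediction
print(f"r = {r:.2f}; first test {m1:.1f}, retest {m2:.1f}, predicted {predicted:.1f}")
```

With test-retest correlation r, the extreme group falls back by a fraction (1 - r) of its distance from the overall mean, which is the quantitative version of the point above: the closer the test is to perfect, the smaller the spontaneous “improvement”.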
He goes on to say that “In order to conclude that an energy drink – or any other treatment – is effective, you must compare a group of patients who receive this treatment to a “control group” that receives no treatment (or, better, receives a placebo). The control group is expected to improve by regression alone, and the aim of the experiment is to determine whether the treated patients improve more than regression can explain.”
But if the regression is high, as for example in treating infections with antibiotics, then it can be said that antibiotics are effective even with poor infection tests and even without control groups. If you had a treatment for cancer that was 95% effective in reducing the size of tumors to undetectable levels, you wouldn’t need a control group or a very sensitive test to conclude that the treatment is effective. In such a case there is a causal explanation for the regression beyond the fact that cancer patients or infected people are an extreme group. Conversely, hugging cats might show some improvement in depressed children, but no one would consider it an effective treatment, even if the improvement is considered statistically significant, whatever that means.
Steve Plager
plager@cox.net
It’s not true that you can do without a control group when testing a very effective treatment. It’s just that you can afford to be sloppier about how you choose the control group and evaluate the outcome. For example, lots of people think it’s a good idea to take antibiotics for colds or flu. Nearly all of these people improve after taking the medicine. And yet, we know on biological grounds that the treatment is ineffective. In the absence of a good theoretical understanding, one could demonstrate this empirically by comparing the group receiving antibiotics to the group receiving a placebo. Since nearly 100% of both groups recover, we infer that the antibiotics aren’t doing anything. On the other hand, if you treat patients with syphilis and find that nearly all of them recover, you know it’s not just regression of their symptoms to the mean, because spontaneous recovery from syphilis is rare. (You may be using past experience with the disease as a control, rather than having an explicit control group as part of the experiment.) Then again, there’s a question of what we mean by “recovery”. Recovery from acute symptoms is exactly the sort of thing that happens with or without treatment, so that comparison with a well-chosen control group is absolutely necessary to confirm the antibiotic’s effectiveness.
You wrote, “On the other hand, if you treat patients with syphilis and find that nearly all of them recover, you know it’s not just regression of their symptoms to the mean, because spontaneous recovery from syphilis is rare.” That’s precisely what I meant with my cancer example. Recovery from cancer is rare so a very effective drug does not need a control group if the regression is very high. But mostly I wanted your opinion about cases where regression to the mean is low. Do you really need a control group to establish that hugging cats is not an effective treatment for depressed children, as DK suggests? If that is the case, we would need control groups to establish the efficacy of anything, including the accuracy of astrological predictions.
On another note, you wrote:
“Let’s consider an analogous situation: A hospital has five general surgeons on staff. DK ranks them each year according to the survival rate of their patients for coronary bypass surgery, and discovers that there is no correlation between a surgeon’s rankings in different years. Over 10 years the rankings look to be completely random. DK concludes that there is no skill involved in coronary bypass surgery. Someone less brilliant than DK might conclude, instead, that the surgeons are all equally skillful.”
I don’t follow your reasoning. DK writes: “The persistence of individual differences is the measure by which we confirm the existence of skill.” I think that if a surgeon’s mortality rate varied significantly from one year to the next relative to some credible benchmark, it could be argued that the surgeon’s procedures are questionable (assuming that his patients do not represent special risks).
In fact, that is exactly how, in 1994, the novel procedure of Dr. Randas Batista, a Brazilian heart surgeon, was discredited. His procedure could not pass the persistence-of-performance test. This did not mean that he was unskillful as a surgeon, only that his procedure was ineffective. The skill of a financial advisor that DK talks about does not refer to his ability to read a corporation’s annual report or his understanding of financial data; rather, his point is that, compared to a benchmark, there is no evidence that whatever method he uses to invest his clients’ money produces results that are significantly better than picking stocks at random.
Sincerely,
Steve Plager
Phoenix, Arizona
As I said before, it’s not that there is no control group. You are implicitly using earlier (untreated) cancer patients as a control group. If an effect is very strong, it is plausible that you may be able to dispense with the subtleties of designing a perfectly matched control group.
Do we need a control group to establish that something as ridiculous as rubbing mould in a wound will prevent infection? What about injecting people with killed viruses? Sounds pretty far-fetched. As it happens, controlled studies have found that therapy with animals can be effective as a treatment for depression. See, for example, Megan A. Souter and Michelle D. Miller, “Do Animal-Assisted Activities Effectively Treat Depression? A Meta-Analysis”, Anthrozoos: A Multidisciplinary Journal of The Interactions of People & Animals, Volume 20, Number 2, June 2007, pp. 167-180. (I haven’t actually read this paper, so I’m not vouching for the quality of the work. Just that, far from being patently ridiculous, this is an active area of research, and controlled studies have a role to play.)
If you test 100 random stupid pseudo-treatments for significant effects at the 0.05 level, then by definition you’ll find about 5 of them pass the test. But if you start with hypotheses that initially seem plausible for some coherent scientific reason, most of them will fail at this hurdle.
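The multiple-testing arithmetic can be checked directly; here is a sketch with a two-sample z-test on simulated data where every treatment is, by construction, useless:

```python
import random
import statistics

random.seed(2)

# 100 ineffective "treatments", each evaluated with a two-sample z-test at
# the 0.05 level. Under the null, about 5 should look significant by chance.
n_treatments, n_per_arm = 100, 200
false_positives = 0
for _ in range(n_treatments):
    treated = [random.gauss(0, 1) for _ in range(n_per_arm)]
    control = [random.gauss(0, 1) for _ in range(n_per_arm)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (1 / n_per_arm + 1 / n_per_arm) ** 0.5  # known sigma = 1
    if abs(diff / se) > 1.96:
        false_positives += 1
print(f"{false_positives} of {n_treatments} useless treatments look 'significant'")
```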
Again, it’s a matter of finding the appropriate control group. DK did not compare these financial advisors to others who were picking stocks purely by chance. He was comparing them to other highly qualified stock-pickers. What he found was that they were all more or less equally good, but not that they were all no good. It’s true that the choice of who gets the big bonus for the best performance was random, but it’s not implausible — at least, not disproved by this information — to suppose that in the absence of the competition for the big bonus they all would have performed worse.
As it happens, controlled studies have found that therapy with animals can be effective as a treatment for depression. See, for example, “Do Animal-Assisted Activities Effectively Treat Depression? A Meta-Analysis”
I bet this was a case of low regression, so it comes down to the meaning of the term “effective.” Assuming the study was of high quality, the improvement may have been “statistically significant.” But most lay people don’t understand the meaning of that term. To them effective means very effective. You may argue that some improvement is better than none, but that is one of the major issues with health care reform. We have too many expensive low efficacy treatments, especially during the last year of life. The “cancer treatments centers” we see advertised on TV these days remind me of the sanitariums to treat tuberculosis before the discovery of penicillin. They kept the windows opened to bring fresh air because it was believed it helped patients heal. The treatments they offered were generally not very effective but they sure were expensive.
“Again, it’s a matter of finding the appropriate control group. DK did not compare these financial advisors to others who were picking stocks purely by chance.”
If I remember correctly, he did not compare financial advisors to each other, but to themselves. He was looking for persistence of outcome. He computed 28 correlation coefficients, one for each pair of years over an 8-year interval. This would be similar to computing the correlation coefficients of the mortality rates of a heart surgeon’s patients for pairs of years over some interval, say 10 years, where you would expect to find a high correlation. But what he found resembled a game of chance, not a game of skill. DK also writes that “…the evidence from more than fifty years of research is conclusive: for a large majority of fund managers, the selection of stocks is more like rolling dice than like playing poker. Typically at least two out of every three mutual funds underperform the overall market in any given year.” The control group in this case would be an index fund or a computer randomly selecting stocks.
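For what it’s worth, DK’s calculation is easy to reproduce in miniature: with 8 years there are indeed C(8,2) = 28 pairs, and if yearly outcomes are pure chance, the pairwise rank correlations average out to roughly zero. The number of advisers below is arbitrary.

```python
import random
import statistics
from itertools import combinations

random.seed(3)

# Rank the same advisers in each of 8 years of pure-chance outcomes, then
# correlate the rankings for every pair of years: C(8,2) = 28 pairs.
n_advisers, n_years = 25, 8
outcomes = [[random.random() for _ in range(n_advisers)] for _ in range(n_years)]

def ranks(xs):
    """Rank of each element within its year (0 = worst)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

yearly_ranks = [ranks(year) for year in outcomes]
corrs = [pearson(yearly_ranks[a], yearly_ranks[b])
         for a, b in combinations(range(n_years), 2)]
print(f"{len(corrs)} pairwise correlations, average {statistics.mean(corrs):.3f}")
```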