I believe those tests were worth what the war cost, even in human life, if they served to show clearly to our people the lack of intelligence in our country, and the degrees of intelligence in different races who are coming to us, in a way which no one can say is the result of prejudice.… We have learned once and for all that the negro is not like us. So in regard to many races and subraces in Europe we learned that some which we had believed possessed of an order of intelligence perhaps superior to ours [read Jews] were far inferior.
Congressional debates leading to passage of the Immigration Restriction Act of 1924 frequently invoked the army data. Eugenicists lobbied not only for limits to immigration, but for changing its character by imposing harsh quotas against nations of inferior stock—a feature of the 1924 act that might never have been implemented, or even considered, without the army data and eugenicist propaganda. In short, southern and eastern Europeans, the Alpine and Mediterranean nations with minimal scores on the army tests, should be kept out. The eugenicists battled and won one of the greatest victories of scientific racism in American history. The first restriction act of 1921 had set yearly quotas at 3 percent of immigrants from any nation then resident in America. The 1924 act, following a barrage of eugenicist propaganda, reset the quotas at 2 percent of people from each nation recorded in the 1890 census. The 1890 figures were used until 1930. Why 1890 and not 1920 since the act was passed in 1924? 1890 marked a watershed in the history of immigration. Southern and eastern Europeans arrived in relatively small numbers before then, but began to predominate thereafter. Cynical, but effective. “America must be kept American,” proclaimed Calvin Coolidge as he signed the bill.
BRIGHAM RECANTS
Six years after his data had so materially affected the establishment of national quotas, Brigham had a profound change of heart. He recognized that a test score could not be reified as an entity inside a person’s head:
Most psychologists working in the test field have been guilty of a naming fallacy which easily enables them to slide mysteriously from the score in the test to the hypothetical faculty suggested by the name given to the test. Thus, they speak of sensory discrimination, perception, memory, intelligence, and the like while the reference is to a certain objective test situation (Brigham, 1930, p. 159).
In addition, Brigham now realized that the army data were worthless as measures of innate intelligence for two reasons. For each error, he apologized with an abjectness rarely encountered in scientific literature. First, he admitted that Alpha and Beta could not be combined into a single scale as he and Yerkes had done in producing averages for races and nations. The tests measured different things, and each was internally inconsistent in any case. Each nation was represented by a sample of recruits who had taken Alpha and Beta in differing proportions. Nations could not be compared at all (Brigham, 1930, p. 164):
As this method of amalgamating Alphas and Betas to produce a combined scale was used by the writer in his earlier analysis of the Army tests as applied to samples of foreign born in the draft, that study with its entire hypothetical superstructure of racial differences collapses completely.
Secondly, Brigham acknowledged that the tests had measured familiarity with American language and culture, not innate intelligence:
For purposes of comparing individuals or groups, it is apparent that tests in the vernacular must be used only with individuals having equal opportunity to acquire the vernacular of the test. This requirement precludes the use of such tests in making comparative studies of individuals brought up in homes in which the vernacular of the test is not used, or in which two vernaculars are used. The last condition is frequently violated here in studies of children born in this country whose parents speak another tongue. It is important, as the effects of bilingualism are not entirely known.… Comparative studies of various national and racial groups may not be made with existing tests.… One of the most pretentious of these comparative racial studies—the writer’s own—was without foundation (Brigham, 1930, p. 165).
Brigham paid his personal debt, but he could not undo what the tests had accomplished. The quotas stood, and slowed immigration from southern and eastern Europe to a trickle. Throughout the 1930s, Jewish refugees, anticipating the holocaust, sought to emigrate, but were not admitted. The legal quotas, and continuing eugenical propaganda, barred them even in years when inflated quotas for western and northern European nations were not filled. Chase (1977) has estimated that the quotas barred up to 6 million southern, central, and eastern Europeans between 1924 and the outbreak of World War II (assuming that immigration had continued at its pre-1924 rate). We know what happened to many who wished to leave but had nowhere to go. The paths to destruction are often indirect, but ideas can be agents as sure as guns and bombs.
SIX
The Real Error of Cyril Burt
Factor Analysis and the Reification of Intelligence
It has been the signal merit of the English school of psychology, from Sir Francis Galton onwards, that it has, by this very device of mathematical analysis, transformed the mental test from a discredited dodge of the charlatan into a recognized instrument of scientific precision.
—CYRIL BURT, 1921, p. 130
The case of Sir Cyril Burt
If I had any desire to lead a life of indolent ease, I would wish to be an identical twin, separated at birth from my brother and raised in a different social class. We could hire ourselves out to a host of social scientists and practically name our fee. For we would be exceedingly rare representatives of the only really adequate natural experiment for separating genetic from environmental effects in humans—genetically identical individuals raised in disparate environments.
Studies of identical twins raised apart should therefore hold pride of place in literature on the inheritance of IQ. And so it would be but for one problem—the extreme rarity of the animal itself. Few investigators have been able to rustle up more than twenty pairs of twins. Yet, amidst this paltriness, one study seemed to stand out: that of Sir Cyril Burt (1883–1971). Sir Cyril, doyen of mental testers, had pursued two sequential careers that gained him a preeminent role in directing both theory and practice in his field of educational psychology. For twenty years he was the official psychologist of the London County Council, responsible for the administration and interpretation of mental tests in London’s schools. He then succeeded Charles Spearman as professor in the most influential chair of psychology in Britain: University College, London (1932–1950). During his long retirement, Sir Cyril published several papers that buttressed the hereditarian claim by citing very high correlation between IQ scores of identical twins raised apart. Burt’s study stood out among all others because he had found fifty-three pairs, more than twice the total of any previous attempt. It is scarcely surprising that Arthur Jensen used Sir Cyril’s figures as the most important datum in his notorious article (1969) on supposedly inherited and ineradicable differences in intelligence between whites and blacks in America.
The story of Burt’s undoing is now more than a twice-told tale. Princeton psychologist Leon Kamin first noted that, while Burt had increased his sample of twins from fewer than twenty to more than fifty in a series of publications, the average correlation between pairs for IQ remained unchanged to the third decimal place—a statistical situation so unlikely that it matches our vernacular definition of impossible. Then, in 1976, Oliver Gillie, medical correspondent of the London Sunday Times, elevated the charge from inexcusable carelessness to conscious fakery. Gillie discovered, among many other things, that Burt’s two “collaborators,” a Margaret Howard and a J. Conway, the women who supposedly collected and processed his data, either never existed at all, or at least could not have been in contact with Burt while he wrote the papers bearing their names. These charges led to further reassessments of Burt’s “evidence” for his rigid hereditarian position. Indeed, other crucial studies were equally fraudulent, particularly his IQ correlations between close relatives (suspiciously too good to be true and apparently constructed from ideal statistical distributions, rather than measured in nature—Dorfman, 1978), and his data for declining levels of intelligence in Britain.
Burt’s supporters tended at first to view the charges as a thinly veiled leftist plot to undo the hereditarian position by rhetoric. H.J. Eysenck wrote to Burt’s sister: “I think the whole affair is just a determined effort on the part of some very left-wing environmentalists determined to play a political game with scientific facts. I am sure the future will uphold the honor and integrity of Sir Cyril without any question.” Arthur Jensen, who had called Burt a “born nobleman” and “one of the world’s great psychologists,” had to conclude that the data on identical twins could not be trusted, though he attributed their inaccuracy to carelessness alone.
I think that the splendid “official” biography of Burt recently published by L. S. Hearnshaw (1979) has resolved the issue so far as the data permit (Hearnshaw was commissioned to write his book by Burt’s sister before any charges had been leveled). Hearnshaw, who began as an unqualified admirer of Burt and who tends to share his intellectual attitudes, eventually concluded that all allegations are true, and worse. And yet, Hearnshaw has convinced me that the very enormity and bizarreness of Burt’s fakery forces us to view it not as the “rational” program of a devious person trying to salvage his hereditarian dogma when he knew the game was up (my original suspicion, I confess), but as the actions of a sick and tortured man. (All this, of course, does not touch the deeper issue of why such patently manufactured data went unchallenged for so long, and what this will to believe implies about the basis of our hereditarian presuppositions.)
Hearnshaw believes that Burt began his fabrications in the early 1940s, and that his earlier work was honest, though marred by rigid a priori conviction and often inexcusably sloppy and superficial, even by the standards of his own time. Burt’s world began to collapse during the war, partly by his own doing to be sure. His research data perished in the blitz of London; his marriage failed; he was excluded from his own department when he refused to retire gracefully at the mandatory age and attempted to retain control; he was removed as editor of the journal he had founded, again after declining to cede control at the specified time he himself had set; his hereditarian dogma no longer matched the spirit of an age that had just witnessed the holocaust. In addition, Burt apparently suffered from Ménière’s disease, a disorder of the organs of balance, with frequent and negative consequences for personality as well.
Hearnshaw cites four instances of fraud in Burt’s later career. Three I have already mentioned (fabrication of data on identical twins, kinship correlations in IQ, and declining levels of intelligence in Britain). The fourth is, in many ways, the most bizarre tale of all because Burt’s claim was so absurd and his actions so patent and easy to uncover. It could not have been the act of a rational man. Burt attempted to commit an act of intellectual parricide by declaring himself, rather than his predecessor and mentor Charles Spearman, as the father of a technique called “factor analysis” in psychology. Spearman had essentially invented the technique in a celebrated paper of 1904. Burt never challenged this priority—in fact he constantly affirmed it—while Spearman held the chair that Burt would later occupy at University College. Indeed, in his famous book on factor analysis (1940), Burt states that “Spearman’s preeminence is acknowledged by every factorist” (1940, p. x).
Burt’s first attempt to rewrite history occurred while Spearman was still alive, and it elicited a sharp rejoinder from the occupant emeritus of Burt’s chair. Burt withdrew immediately and wrote a letter to Spearman that may be unmatched for deference and obsequiousness: “Surely you have a prior claim here.… I have been wondering where precisely I have gone astray. Would it be simplest for me to number my statements, then like my schoolmaster of old you can put a cross against the points where your pupil has blundered, and a tick where your view is correctly interpreted.”
But when Spearman died, Burt launched a campaign that “became increasingly unrestrained, obsessive and extravagant” (Hearnshaw, 1979) throughout the rest of his life. Hearnshaw notes (1979, pp. 286–287): “The whisperings against Spearman that were just audible in the late 1930’s swelled into a strident campaign of belittlement, which grew until Burt arrogated to himself the whole of Spearman’s fame. Indeed, Burt seemed to be becoming increasingly obsessed with questions of priority, and increasingly touchy and egotistical.” Burt’s false story was simple enough: Karl Pearson had invented the technique of factor analysis (or something close enough to it) in 1901, three years before Spearman’s paper. But Pearson had not applied it to psychological problems. Burt recognized its implications and brought the technique into studies of mental testing, making several crucial modifications and improvements along the way. The line, therefore, runs from Pearson to Burt. Spearman’s 1904 paper was merely a diversion.
Burt told his story again and again. He even told it through one of his many aliases in a letter he wrote to his own journal and signed Jacques Lafitte, an unknown French psychologist. With the exception of Voltaire and Binet, M. Lafitte cited only English sources and stated: “Surely the first formal and adequate statement was Karl Pearson’s demonstration of the method of principal axes in 1901.” Yet anyone could have exposed Burt’s story as fiction after an hour’s effort—for Burt never cited Pearson’s paper in any of his work before 1947, while all his earlier studies of factor analysis grant credit to Spearman and clearly display the derivative character of Burt’s methods.
Factor analysis must have been very important if Burt chose to center his quest for fame upon a rewrite of history that would make him its inventor. Yet, despite all the popular literature on IQ in the history of mental testing, virtually nothing has been written (outside professional circles) on the role, impact, and meaning of factor analysis. I suspect that the main reason for this neglect lies in the abstrusely mathematical nature of the technique. IQ, a linear scale first established as a rough, empirical measure, is easy to understand. Factor analysis, rooted in abstract statistical theory and based on the attempt to discover “underlying” structure in large matrices of data, is, to put it bluntly, a bitch. Yet this inattention to factor analysis is a serious omission for anyone who wishes to understand the history of mental testing in our century, and its continuing rationale today. For as Burt correctly noted (1914, p. 36), the history of mental testing contains two major and related strands: age-scale methods (Binet IQ testing), and correlational methods (factor analysis). Moreover, as Spearman continually stressed throughout his career, the theoretical justification for using a unilinear scale of IQ resides in factor analysis itself. Burt may have been perverse in his campaign, but he was right in his chosen tactic—a permanent and exalted niche in the pantheon of psychology lies reserved for the man who developed factor analysis.
I began my career in biology by using factor analysis to study the evolution of a group of fossil reptiles. I was taught the technique as though it had developed from first principles using pure logic. In fact, virtually all its procedures arose as justifications for particular theories of intelligence. Factor analysis, despite its status as pure deductive mathematics, was invented in a social context, and for definite reasons. And, though its mathematical basis is unassailable, its persistent use as a device for learning about the physical structure of intellect has been mired in deep conceptual errors from the start. The principal error, in fact, has involved a major theme of this book: reification—in this case, the notion that such a nebulous, socially defined concept as intelligence might be identified as a “thing” with a locus in the brain and a definite degree of heritability—and that it might be measured as a single number, thus permitting a unilinear ranking of people according to the amount of it they possess. By identifying a mathematical factor axis with a concept of “general intelligence,” Spearman and Burt provided a theoretical justification for the unilinear scale that Binet had proposed as a rough empirical guide.
The intense debate about Cyril Burt’s work has focused exclusively on the fakery of his late career. This perspective has clouded Sir Cyril’s greater influence as the most powerful mental tester committed to a factor-analytic model of intelligence as a real and unitary “thing.” Burt’s commitment was rooted in the error of reification. Later fakery was the afterthought of a defeated man; his earlier, “honest” error has reverberated throughout our century and has affected millions of lives.
Correlation, cause, and factor analysis
Correlation and cause
The spirit of Plato dies hard. We have been unable to escape the philosophical tradition that what we can see and measure in the world is merely the superficial and imperfect representation of an underlying reality. Much of the fascination of statistics lies embedded in our gut feeling—and never trust a gut feeling—that abstract measures summarizing large tables of data must express something more real and fundamental than the data themselves. (Much professional training in statistics involves a conscious effort to counteract this gut feeling.) The technique of correlation has been particularly subject to such misuse because it seems to provide a path for inferences about causality (and indeed it does, sometimes—but only sometimes).
Correlation assesses the tendency of one measure to vary in concert with another. As a child grows, for example, both its arms and legs get longer; this joint tendency to change in the same direction is called a positive correlation. Not all parts of the body display such positive correlations during growth. Teeth, for example, do not grow after they erupt. The relationship between first incisor length and leg length from, say, age ten to adulthood would represent zero correlation—legs would get longer while teeth changed not at all. Other correlations can be negative—one measure increases while the other decreases. We begin to lose neurons at a distressingly early age, and they are not replaced. Thus, the relationship between leg length and number of neurons after mid-childhood represents negative correlation—leg length increases while number of neurons decreases. Notice that I have said nothing about causality. We do not know why these correlations exist or do not exist, only that they are present or not present.
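The three cases just described can be made concrete with a small computation. What follows is only a sketch: every number in it is invented for illustration (these are not measurements of any real child), the variable names are hypothetical, and the measure used is Pearson's product-moment coefficient r, the usual way of putting a number on correlation. One caveat is worth stating plainly. When a measure does not vary at all, as with the erupted tooth, r is strictly undefined because the formula divides by zero; the function below returns 0.0 in that case by convention, to match the informal sense of "zero correlation" used in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns 0.0 when either sequence has no variation. Strictly, r is
    undefined in that case; 0.0 is a convention adopted for this sketch.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat "no variation" as zero correlation
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements of one child at ages 10, 12, 14, 16, and 18.
# All values are made up for illustration.
leg_cm = [60, 66, 72, 77, 80]                  # legs lengthen with age
arm_cm = [48, 53, 58, 62, 64]                  # arms lengthen too
incisor_mm = [10.5, 10.5, 10.5, 10.5, 10.5]    # erupted tooth does not grow
neurons_rel = [100.0, 99.8, 99.6, 99.4, 99.2]  # relative count, slowly falling

print("legs vs. arms:   ", round(pearson_r(leg_cm, arm_cm), 3))       # strongly positive
print("legs vs. incisor:", round(pearson_r(leg_cm, incisor_mm), 3))   # zero by convention
print("legs vs. neurons:", round(pearson_r(leg_cm, neurons_rel), 3))  # strongly negative
```

Nothing in the computation knows or cares why the numbers move together. The coefficient records only the joint pattern of variation, which is exactly the point made above about causality.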