The Mismeasure of Man
Again, the common-sense interpretation of numerous zeros suggests that many men didn’t understand the instructions and that the tests were invalid on that account. Buried throughout Yerkes’s monograph are numerous statements proving that testers worried greatly about the high frequency of zeros and, in the midst of giving the tests, tended to interpret zeros in this common-sense fashion. They eliminated some tests from the Beta repertoire (p. 372) because they produced up to 30.7 percent zero scores (although some Alpha tests with a higher frequency of zeros were retained). They reduced the difficulty of initial items in several tests “in order to reduce the number of zero scores” (p. 341). They included among the criteria for acceptance of a test within the Beta repertoire (p. 373): “ease of demonstration, as shown by low percentage of zero scores.” They acknowledged several times that a high frequency of zeros reflected poor explanation, not stupidity of the recruits: “The large number of zero scores, even with officers, indicates that the instructions were unsatisfactory” (p. 340). “The main burden of the early reports was to the effect that the most difficult task was ‘getting the idea across.’ A high percentage of zero scores in any given test was considered an indication of failure to ‘get that test across’” (p. 379).
Figures 5.7a, 5.7b. Zero was by far the most common value in several of the Alpha tests.
With all these acknowledgments, one might have anticipated Boring’s decision either to exclude zeros from the summary statistics or to correct for them by assuming that most recruits would have scored some points if they had understood what they were supposed to do. Instead, Boring “corrected” zero scores in the opposite way, and actually demoted many of them into a negative range.
Boring began with the same hereditarian assumption that invalidated all the results: that the tests, by definition, measure innate intelligence. The clump of zeros must therefore be made up of men who were too stupid to do any items. Is it fair to give them all zero? After all, some must have been just barely too stupid, and their zero is a fair score. But other dullards must have been rescued from an even worse fate by the minimum of zero. They would have done even more poorly if the test had included enough easy items to make distinctions among the zero scores. Boring distinguished between a true “mathematical zero,” an intrinsic minimum that cannot logically go lower, and a “psychological zero,” an arbitrary beginning defined by a particular test. (As a general statement, Boring makes a sound point. In the particular context of the army tests, it is absurd):
A score of zero, therefore, does not mean no ability at all; it does not mean the point of discontinuance of the thing measured; it means the point of discontinuance of the instrument of measurement, the test.… The individual who fails to earn a positive score and is marked zero is actually thereby given a bonus varying in value directly with his stupidity (p. 622).
Boring therefore “corrected” each zero score by calibrating it against other tests in the series on which the same man had scored some points. If he had scored well on other tests, he was not doubly penalized for his zeros; if he had done poorly, then his zeros were converted to negative scores.
By this method, a debilitating flaw in Yerkes’s basic procedure was accentuated by tacking an additional bias onto it. The zeros only indicated that, for a suite of reasons unrelated to intelligence, vast numbers of men did not understand what they were supposed to do. And Yerkes should have recognized this, for his own reports proved that, with reduced confusion and harassment, men who had scored zero on the group tests almost all managed to make points on the same or similar tests given in an individual examination. He writes (p. 406): “At Greenleaf it was found that the proportion of zero scores in the maze test was reduced from 28 percent in Beta to 2 percent in the performance scale, and that similarly zero scores in the digit-symbol test were reduced from 49 to 6 percent.”
Yet, when given an opportunity to correct this bias by ignoring or properly redistributing the zero scores, Yerkes’s statisticians did just the opposite. They exacted a double penalty by demoting most zero scores to a negative range.
FINAGLING THE SUMMARY STATISTICS: GETTING AROUND OBVIOUS CORRELATIONS WITH ENVIRONMENT
Yerkes’s monograph is a treasure-trove of information for anyone seeking environmental correlates of performance on “tests of intelligence.” Since Yerkes explicitly denied any substantial causal role to environment, and continued to insist that the tests measured innate intelligence, this claim may seem paradoxical. One might suspect that Yerkes, in his blindness, didn’t read his own information. The situation, in fact, is even more curious. Yerkes read very carefully; he puzzled over every one of his environmental correlations, and managed to explain each of them away with arguments that sometimes border on the ridiculous.
Minor items are reported and dispersed in a page or two. Yerkes found strong correlations between average score and infestation with hookworm in all 4 categories:
GROUP          INFECTED    NOT INFECTED
White Alpha    94.38       118.50
White Beta     45.38       53.26
Negro Alpha    34.86       40.82
Negro Beta     22.14       26.09
These results might have led to the obvious admission that state of health, particularly in diseases related to poverty, has some effect upon the scores. Although Yerkes did not deny this possibility, he stressed another explanation (p. 811): “Low native ability may induce such conditions of living as to result in hookworm infection.”
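The size of the hookworm effect is easy to check from the four pairs of means tabulated above. The following is only a minimal sketch of that arithmetic; the names `hookworm_means` and `score_gaps` are illustrative choices, not anything found in Yerkes's monograph.

```python
# Mean scores from the hookworm table: (infected, not infected).
hookworm_means = {
    "White Alpha": (94.38, 118.50),
    "White Beta": (45.38, 53.26),
    "Negro Alpha": (34.86, 40.82),
    "Negro Beta": (22.14, 26.09),
}


def score_gaps(means):
    """Return the not-infected minus infected gap for each group, rounded to 2 places."""
    return {
        group: round(not_infected - infected, 2)
        for group, (infected, not_infected) in means.items()
    }


gaps = score_gaps(hookworm_means)
for group, gap in gaps.items():
    print(f"{group}: uninfected men averaged {gap:.2f} points higher")
```

In all four categories the gap is positive, which is the pattern the text describes. The calculation says nothing about the direction of cause, and that is precisely the point on which Yerkes chose his own reading.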
In studying the distribution of scores by occupation, Yerkes conjectured that since intelligence brings its own reward, test scores should rise with expertise. He divided each job into apprentices, journeymen, and experts and searched for increasing scores between the groups. But he found no pattern. Instead of abandoning his hypothesis, he decided that his procedure for allocating men to the three categories must have been flawed (pp. 831–832):
It seems reasonable to suppose that a selection process goes on in industry which results in a selection of the mentally more alert for promotion from the apprentice stage to the journeyman stage and likewise from the journeyman stage to the expert. Those inferior mentally would stick at the lower levels of skill or be weeded out of the particular trade. On this hypothesis one begins to question the accuracy of the personnel interviewing procedure.
Among major patterns, Yerkes continually found relationships between intelligence and amount of schooling. He calculated a correlation coefficient of 0.75 between test score and years of education. Of 348 men who scored below the mean in Alpha, only 1 had ever attended college (as a dental student), 4 had graduated from high school, and only 10 had ever attended high school at all. Yet Yerkes did not conclude that more schooling leads to increasing scores per se; instead, he argued that men with more innate intelligence spend more time in school. “The theory that native intelligence is one of the most important conditioning factors in continuance in school is certainly borne out by this accumulation of data” (p. 780).
Yerkes noted the strongest correlation of scores with schooling in considering the differences between blacks and whites. He made a significant social observation, but gave it his usual innatist twist (p. 760):
The white draft of foreign birth is less schooled; more than half of this group have not gone beyond the fifth grade, while one-eighth, or 12.5 percent, report no schooling. Negro recruits though brought up in this country where elementary education is supposedly not only free but compulsory on all, report no schooling in astonishingly large proportion.
Failure of blacks to attend school, he argued, must reflect a disinclination based on low innate intelligence. Not a word about segregation (then officially sanctioned, if not mandated), poor conditions in black schools, or economic necessities for working among the impoverished. Yerkes acknowledged that schools might vary in quality, but he assumed that such an effect must be small and cited, as primary evidence for innate black stupidity, the lower scores of blacks when paired with whites who had spent an equal number of years in school (p. 773):
The grade standards, of course, are not identical all over the country, especially as between schools for white and for negro children, so that “fourth-grade schooling” doubtless varies in meaning from group to group, but this variability certainly cannot account for the clear intelligence differences between groups.
The data that might have led Yerkes to change his mind (had he approached the study with any flexibility) lay tabulated, but unused, within his monograph. Yerkes had noted regional differences in black education. Half the black recruits from Southern states had not attended school beyond the third grade, but half had reached the fifth grade in Northern states (p. 760). In the North, 25 percent completed primary school; in the South, a mere 7 percent. Yerkes also noted (p. 734) that “the percentage of Alphas is very much smaller and, the percentage of Betas very much larger in the southern than in the northern group.” Many years later, Ashley Montagu (1945) studied the tabulations by state that Yerkes had provided. He confirmed Yerkes’s pattern: the average score on Alpha was 21.31 for blacks in thirteen Southern states, and 39.90 in nine Northern states. Montagu then noted that average black scores for the four highest Northern states (45.31) exceeded the white mean for nine Southern states (43.94). He found the same pattern for Beta, where blacks of six Northern states averaged 34.63, and whites of fourteen Southern states, 31.11. Hereditarians had their pat answer, as usual: only the best Negroes had been smart enough to move North. To people of good will and common sense an explanation in terms of educational quality has always seemed more reasonable, especially since Montagu also found such high correlations between a state’s expenditure for education and the average score of its recruits.
One other persistent correlation threatened Yerkes’s hereditarian convictions, and his rescuing argument became a major social weapon in later political campaigns for restricting immigration. Test scores had been tabulated by country of origin, and Yerkes noted the pattern so dear to the hearts of Nordic supremacists. He divided recruits by country of origin into English, Scandinavian, and Teutonic on one side, and Latin and Slavic on the other, and stated (p. 699): “the differences are considerable (an extreme range of practically two years mental age)”—favoring the Nordics, of course.
But Yerkes acknowledged a potential problem. Most Latins and Slavs had arrived recently and spoke English either poorly or not at all; the main wave of Teutonic immigration had passed long before. According to Yerkes’s protocol, it shouldn’t have mattered. Men who could not speak English suffered no penalty. They took Beta, a pictorial test that supposedly measured innate ability independent of literacy and language. Yet the data still showed an apparent penalty for unfamiliarity with English. Of white recruits who scored E in Alpha and therefore took Beta as well (pp. 382–383), speakers of English averaged 101.6 in Beta, while nonspeakers averaged only 77.8. On the individual performance scale, which eliminated the harassment and confusion of Beta, native and foreign-born recruits did not differ (p. 403). (But very few men were ever given these individual tests, and they did not affect national averages.) Yerkes had to admit (p. 395): “There are indications to the effect that individuals handicapped by language difficulty and illiteracy are penalized to an appreciable degree in Beta as compared with men not so handicapped.”
Another correlation was even more potentially disturbing. Yerkes found that average test scores for foreign-born recruits rose consistently with years of residence in America.
YEARS OF RESIDENCE    AVERAGE MENTAL AGE
0-5                   11.29
6-10                  11.70
11-15                 12.53
16-20                 13.50
20-                   13.74
Didn’t this indicate that familiarity with American ways, and not innate intelligence, regulated the differences in scores? Yerkes admitted the possibility, but held out strong hope for a hereditarian salvation (p. 704):
Apparently then the group that has been longer resident in this country does somewhat better in intelligence examination. It is not possible to state whether the difference is caused by the better adaptation of the more thoroughly Americanized group to the situation of the examination or whether some other factor is operative. It might be, for instance, that the more intelligent immigrants succeed and therefore remain in this country, but this suggestion is weakened by the fact that so many successful immigrants do return to Europe. At best we can but leave for future decision the question as to whether the differences represent a real difference of intelligence or an artifact of the method of examination.
The Teutonic supremacists would soon supply that decision: recent immigration had drawn the dregs of Europe, lower-class Latins and Slavs. Immigrants of longer residence belonged predominantly to superior northern stocks. The correlation with years in America was an artifact of genetic status.
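The claim that scores rose consistently with residence can be verified directly from the five averages tabulated above. What follows is a rough sketch of that check, not a reconstruction of any calculation Yerkes performed; the names `residence_means`, `rises_consistently`, and `total_gain` are hypothetical, and the open-ended label "20-" is copied as printed.

```python
# Average mental age by years of residence in America, as tabulated above.
residence_means = [
    ("0-5", 11.29),
    ("6-10", 11.70),
    ("11-15", 12.53),
    ("16-20", 13.50),
    ("20-", 13.74),
]


def rises_consistently(rows):
    """True if every residence bracket averages higher than the one before it."""
    ages = [age for _, age in rows]
    return all(later > earlier for earlier, later in zip(ages, ages[1:]))


def total_gain(rows):
    """Difference in average mental age between the last and first brackets."""
    return round(rows[-1][1] - rows[0][1], 2)


print("Rises at every step:", rises_consistently(residence_means))
print("Gain from first to last bracket:", total_gain(residence_means))
```

Each bracket does exceed the one before it, and the span from the newest arrivals to the longest residents comes to about two and a half years of mental age. The arithmetic is the same under either interpretation; only the explanation attached to it differs.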
The army mental tests could have provided an impetus for social reform, since they documented that environmental disadvantages were robbing from millions of people an opportunity to develop their intellectual skills. Again and again, the data pointed to strong correlations between test scores and environment. Again and again, those who wrote and administered the tests invented tortuous, ad hoc explanations to preserve their hereditarian prejudices.
How powerful the hereditarian biases of Terman, Goddard, and Yerkes must have been to make them so blind to immediate circumstances! Terman seriously argued that good orphanages precluded any environmental cause of low IQ for children in them. Goddard tested confused and frightened immigrants who had just completed a grueling journey in steerage and thought he had captured innate intelligence. Yerkes badgered his recruits, obtained proof of confusion and harassment in their large mode of zero scores, and produced data on the inherent abilities of racial and national groups. One cannot attribute all these conclusions to some mysterious “temper of the times,” for contemporary critics saw through the nonsense as well. Even by standards of their own era, the American hereditarians were dogmatists. But their dogma wafted up on favorable currents into realms of general acceptance, with tragic consequences.
Political impact of the army data
CAN DEMOCRACY SURVIVE AN AVERAGE MENTAL AGE OF THIRTEEN?
Yerkes was troubled by his own figure of 13.08 as an average mental age for the white draft. It fitted his prejudices and the eugenical fears of prosperous old Americans, but it was too good to be true, or too low to be believed. Yerkes recognized that smarter folks had been excluded from the sample—officers who enlisted and “professional and business experts that were exempted from draft because essential to industrial activity in the war” (p. 785). But the obviously retarded and feeble-minded had also been culled before reaching Yerkes’s examiners, thereby balancing exclusions at the other end. The resulting average of 13 might be a bit low, but it could not be far wrong (p. 785).
Yerkes faced two possibilities. He could recognize the figure as absurd, and search his methods for the flaws that engendered such nonsense. He would not have had far to look, had he been so inclined, since three major biases all conspired to bring the average down to his implausible figure. First, the tests measured education and familiarity with American culture, not innate intelligence—and many recruits, whatever their intelligence, were both woefully deficient in education and either too new to America or too impoverished to have much appreciation for the exemplary accomplishments of Mr. Mathewson (including an e.r.a. of 1.14 in 1909). Second, Yerkes’s own stated protocol had not been followed. About two-thirds of the white sample took Alpha, and their high frequency of zero scores indicated that many should have been retested in Beta. But time and the indifference of the regular brass conspired against it, and many recruits were not reexamined. Finally, Boring’s treatment of zero values imposed an additional penalty on scores already (and artificially) too low.
Or Yerkes could accept the figure and remain a bit puzzled. He opted, of course, for the second strategy:
We know now approximately from clinical experience the capacity and mental ability of a man of 13 years mental age. We have never heretofore supposed that the mental ability of this man was the average of the country or anywhere near it. A moron has been defined as anyone with a mental age from 7 to 12 years. If this definition is interpreted as meaning anyone with a mental age less than 13 years, as has recently been done, then almost half of the white draft (47.3 percent) would have been morons. Thus it appears that feeble-mindedness, as at present defined, is of much greater frequency of occurrence than had been originally supposed.
Yerkes’s colleagues were disturbed as well. Goddard, who had invented the moron, began to doubt his own creation: “We seem to be impaled on the horns of a dilemma: either half the population is feeble-minded; or 12 year mentality does not properly come within the limits of feeble-mindedness” (1919, p. 352). He also opted for Yerkes’s solution and sounded the warning cry for American democracy:
If it is ultimately found that the intelligence of the average man is 13—instead of 16—it will only confirm what some are beginning to suspect; viz., that the average man can manage his affairs with only a moderate degree of prudence, can earn only a very modest living, and is vastly better off when following directions than when trying to plan for himself. In other words, it will show that there is a fundamental reason for many of the conditions that we find in human society and further that much of our effort to change conditions is unintelligent because we have not understood the nature of the average man (1919, p. 236).
Unfortunate 13 became a formula figure among those who sought to contain movements for social welfare. After all, if the average man is scarcely better than a moron, then poverty is fundamentally biological in origin, and neither education nor better opportunities for employment can alleviate it. In a famous address, entitled “Is America safe for democracy?”, the chairman of Harvard’s psychology department stated (W. McDougall, quoted in Chase, 1977, p. 226):