The Better Angels of Our Nature: Why Violence Has Declined
By the same token, it’s mathematically possible for war both to be a Poisson process and to display cycles. In theory, Mars could oscillate, causing a war on 3 percent of his throws, then shifting to causing a war on 6 percent, and then going back again. In practice, it isn’t easy to distinguish cycles in a nonstationary Poisson process from illusory clusters in a stationary one. A few clusters could fool the eye into thinking that the whole system waxes and wanes (as in the so-called business cycle, which is really a sequence of unpredictable lurches in economic activity rather than a genuine cycle with a constant period). There are good statistical methods that can test for periodicities in time series data, but they work best when the span of time is much longer than the period of the cycles one is looking for, since that provides room for many of the putative cycles to fit. To be confident in the results, it also helps to have a second dataset in which to replicate the analysis, so that one isn’t fooled by the possibility of “overfitting” cycles to what are really random clusters in a particular dataset. Richardson examined a number of possible cycles for wars of magnitudes 3, 4, and 5 (the bigger wars were too sparse to allow a test), and found none. Other analysts have looked at longer datasets, and the literature contains sightings of cycles at 5, 15, 20, 24, 30, 50, 60, 120, and 200 years. With so many tenuous candidates, it is safer to conclude that war follows no meaningful cycle at all, and that is the conclusion endorsed by most quantitative historians of war.39 The sociologist Pitirim Sorokin, another pioneer of the statistical study of war, concluded, “History seems to be neither as monotonous and uninventive as the partisans of the strict periodicities and ‘iron laws’ and ‘universal uniformities’ think; nor so dull and mechanical as an engine, making the same number of revolutions in a unit of time.”40
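To see how easily random clustering can masquerade as a cycle, here is a minimal simulation sketch (not one of Richardson's own tests): it generates yearly war onsets from a stationary Poisson process, which by construction contains no cycles, then reads off the strongest peak in a periodogram of the counts. The rate of 0.5 onsets per year, the 1820–1952 span, and the FFT-based periodogram are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 133                                # 1820-1952, Richardson's span
counts = rng.poisson(lam=0.5, size=n_years)  # yearly onsets, no cycle built in

# Periodogram of the detrended yearly counts. Any peak here is noise,
# because the generating process is stationary by construction.
power = np.abs(np.fft.rfft(counts - counts.mean())) ** 2
freqs = np.fft.rfftfreq(n_years, d=1.0)      # in cycles per year
periods = 1.0 / freqs[1:]                    # skip the zero frequency

best = periods[np.argmax(power[1:])]
print(f"strongest apparent 'cycle': {best:.1f} years (pure chance)")
```

Rerunning this with different seeds turns up "cycles" at wildly different periods, which is exactly the overfitting worry: without a long time span and a second dataset for replication, such peaks are indistinguishable from chance.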
Could the 20th-century Hemoclysm, then, have been some kind of fluke? Even to think that way seems like monstrous disrespect to the victims. But the statistics of deadly quarrels don’t force such an extreme conclusion. Randomness over long stretches of time can coexist with changing probabilities, and certainly some of the probabilities in the 1930s must have been different from those of other decades. The Nazi ideology that justified an invasion of Poland in order to acquire living space for the “racially superior” Aryans was a part of the same ideology that justified the annihilation of the “racially inferior” Jews. Militant nationalism was a common thread that ran through Germany, Italy, and Japan. There was also a common denominator of counter-Enlightenment utopianism behind the ideologies of Nazism and communism. And even if wars are randomly distributed over the long run, there can be an occasional exception. The occurrence of World War I, for example, presumably incremented the probability that a war like World War II in Europe would break out.
But statistical thinking, particularly an awareness of the cluster illusion, suggests that we are apt to exaggerate the narrative coherence of this history—to think that what did happen must have happened because of historical forces like cycles, crescendos, and collision courses. Even with all the probabilities in place, highly contingent events, which need not reoccur if we somehow could rewind the tape of history and play it again, may have been necessary to set off the wars with death tolls in the 6s and 7s on the magnitude scale.
Writing in 1999, White repeated a Frequently Asked Question of that year: “Who’s the most important person of the Twentieth Century?” His choice: Gavrilo Princip. Who the heck was Gavrilo Princip? He was the nineteen-year-old Serb nationalist who assassinated Archduke Franz Ferdinand of Austria-Hungary during a state visit to Bosnia, after a string of errors and accidents delivered the archduke to within shooting distance. White explains his choice:
Here’s a man who single-handedly sets off a chain reaction which ultimately leads to the deaths of 80 million people.
Top that, Albert Einstein!
With just a couple of bullets, this terrorist starts the First World War, which destroys four monarchies, leading to a power vacuum filled by the Communists in Russia and the Nazis in Germany who then fight it out in a Second World War. . . .
Some people would minimize Princip’s importance by saying that a Great Power War was inevitable sooner or later given the tensions of the times, but I say that it was no more inevitable than, say, a war between NATO and the Warsaw Pact. Left unsparked, the Great War could have been avoided, and without it, there would have been no Lenin, no Hitler, no Eisenhower.41
Other historians who indulge in counterfactual scenarios, such as Richard Ned Lebow, have made similar arguments.42 As for World War II, the historian F. H. Hinsley wrote, “Historians are, rightly, nearly unanimous that . . . the causes of the Second World War were the personality and the aims of Adolf Hitler.” Keegan agrees: “Only one European really wanted war—Adolf Hitler.”43 The political scientist John Mueller concludes:
These statements suggest that there was no momentum toward another world war in Europe, that historical conditions in no important way required that contest, and that the major nations of Europe were not on a collision course that was likely to lead to war. That is, had Adolf Hitler gone into art rather than politics, had he been gassed a bit more thoroughly by the British in the trenches in 1918, had he, rather than the man marching next to him, been gunned down in the Beer Hall Putsch of 1923, had he failed to survive the automobile crash he experienced in 1930, had he been denied the leadership position in Germany, or had he been removed from office at almost any time before September 1939 (and possibly even before May 1940), Europe’s greatest war would most probably never have taken place.44
So, too, the Nazi genocide. As we shall see in the next chapter, most historians of genocide agree with the title of a 1984 essay by the sociologist Milton Himmelfarb: “No Hitler, no Holocaust.”45
Probability is a matter of perspective. Viewed at sufficiently close range, individual events have determinate causes. Even a coin flip can be predicted from the starting conditions and the laws of physics, and a skilled magician can exploit those laws to throw heads every time.46 Yet when we zoom out to take a wide-angle view of a large number of these events, we are seeing the sum of a vast number of causes that sometimes cancel each other out and sometimes align in the same direction. The physicist and philosopher Henri Poincaré explained that we see the operation of chance in a deterministic world either when a large number of puny causes add up to a formidable effect, or when a small cause that escapes our notice determines a large effect that we cannot miss.47 In the case of organized violence, someone may want to start a war; he waits for the opportune moment, which may or may not come; his enemy decides to engage or retreat; bullets fly; bombs burst; people die. Every event may be determined by the laws of neuroscience and physics and physiology. But in the aggregate, the many causes that go into this matrix can sometimes be shuffled into extreme combinations. Together with whatever ideological, political, and social currents put the world at risk in the first half of the 20th century, those decades were also hit with a run of extremely bad luck.
Now to the money question: has the probability that a war will break out increased, decreased, or stayed the same over time? Richardson’s dataset is biased to show an increase. It begins just after the Napoleonic Wars, slicing off one of the most destructive wars in history at one end, and finishes just after World War II, snagging history’s most destructive war at the other. Richardson did not live to see the Long Peace that dominated the subsequent decades, but he was an astute enough mathematician to know that it was statistically possible, and he devised ingenious ways of testing for trends in a time series without being misled by extreme events at either end. The simplest was to separate the wars of different magnitudes and test for trends separately in each range. In none of the five ranges (3 to 7) did he find a significant trend. If anything, he found a slight decline. “There is a suggestion,” he wrote, “but not a conclusive proof, that mankind has become less warlike since A.D. 1820. The best available observations show a slight decrease in the number of wars with time. . . . But the distinction is not great enough to show plainly among chance variations.”48 Written at a time when the ashes of Europe and Asia were still warm, this is a testament to a great scientist’s willingness to let facts and reason override casual impressions and conventional wisdom.
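A rough sketch of the logic of that magnitude-by-magnitude test, run on invented onset dates rather than Richardson's data, and using a simple binomial early-versus-late comparison in place of his own machinery:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
years = np.arange(1820, 1953)
# Hypothetical onset years for each magnitude band, drawn uniformly,
# i.e., with no real trend, just to exercise the procedure.
bands = {m: rng.choice(years, size=n)
         for m, n in [(3, 180), (4, 60), (5, 20)]}

midpoint = years[len(years) // 2]
for m, onsets in bands.items():
    early = int(np.sum(onsets < midpoint))
    late = len(onsets) - early
    # Under "no trend," each war is equally likely to fall in either
    # half of the period, so the early count is Binomial(n, 0.5).
    p = stats.binomtest(early, n=len(onsets), p=0.5).pvalue
    print(f"magnitude {m}: {early} early vs. {late} late, p = {p:.2f}")
```

Testing each band separately is what shields the conclusion from the extreme events at either end of the series: a single world war can dominate a pooled death toll, but it contributes only one observation to its own magnitude range.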
As we shall see, analyses of the frequency of war over time from other datasets point to the same conclusion.49 But the frequency of war is not the whole story; magnitude matters as well. One could be forgiven for pointing out that Richardson’s conjecture that mankind was getting less warlike depended on segregating the world wars into a micro-class of two, in which statistics are futile. His other analyses counted all wars alike, with World War II no different from, say, a 1952 revolution in Bolivia with a thousand deaths. Richardson’s son pointed out to him that if he divided his data into large and small wars, they seemed to show opposing trends: small wars were becoming considerably less frequent, but larger wars, while fewer in number, were becoming somewhat more frequent. A different way of putting it is that between 1820 and 1953 wars became less frequent but more lethal. Richardson tested the pattern of contrast and found that it was statistically significant.50 The next section will show that this too was an astute conclusion: other datasets confirm that until 1945, the story of war in Europe and among major nations in general was one of fewer but more damaging wars.
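One way to check a "fewer but bigger" contrast is with a 2×2 table of small versus large wars in an early versus a late period. The sketch below uses Fisher's exact test on made-up counts; these are not Richardson's figures, and his actual significance test differed.

```python
from scipy.stats import fisher_exact

#                  early  late
table = [[120, 60],   # small wars (magnitudes 3-4)
         [8, 14]]     # large wars (magnitudes 5-7)

odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
# A significant result in this direction means small wars grew rarer
# while large wars grew relatively more common over time.
```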
So does that mean that mankind got more warlike or less? There is no single answer, because “warlike” can refer to two different things. It can refer to how likely nations are to go to war, or it can refer to how many people are killed when they do. Imagine two rural counties with the same size population. One of them has a hundred teenage arsonists who delight in setting forest fires. But the forests are in isolated patches, so each fire dies out before doing much damage. The other county has just two arsonists, but its forests are connected, so that a small blaze is likely to spread, as they say, like wildfire. Which county has the worse forest fire problem? One could argue it either way. As far as the amount of reckless depravity is concerned, the first county is worse; as far as the risk of serious damage is concerned, the second is. Nor is it obvious which county will have the greater amount of overall damage, the one with a lot of little fires, or the one with a few big ones. To make sense of these questions, we have to turn from the statistics of time to the statistics of magnitude.
THE STATISTICS OF DEADLY QUARRELS, PART 2: THE MAGNITUDE OF WARS
Richardson made a second major discovery about the statistics of deadly quarrels. It emerged when he counted the number of quarrels of each magnitude—how many with death tolls in the thousands, how many in the tens of thousands, how many in the hundreds of thousands, and so on. It isn’t a complete surprise that there were lots of little wars and only a few big ones. What was a surprise was how neat the relationship turned out to be. When Richardson plotted the log of the number of quarrels of each magnitude against the log of the number of deaths per quarrel (that is, the magnitude itself), he ended up with a graph like figure 5–7.
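A sketch of how such a plot can be generated from any list of per-war death tolls; the tolls below are synthetic draws from a power law (so the plot has a line to show), standing in for real data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Synthetic death tolls with a heavy tail; real per-war tolls would be
# loaded here instead. Minimum toll of 1,000 deaths = magnitude 3.
tolls = (rng.pareto(a=0.5, size=300) + 1) * 1000

magnitudes = np.floor(np.log10(tolls)).astype(int)  # 3 = thousands, etc.
mags, counts = np.unique(magnitudes, return_counts=True)

plt.scatter(mags, np.log10(counts))
plt.xlabel("magnitude (log10 of deaths per quarrel)")
plt.ylabel("log10 of number of quarrels")
plt.show()
```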
Scientists are accustomed to seeing data fall into perfect straight lines when they come from hard sciences like physics, such as the volume of a gas plotted against its temperature. But not in their wildest dreams do they expect the messy data from history to be so well behaved. The data we are looking at come from a ragbag of deadly quarrels ranging from the greatest cataclysm in the history of humanity to a coup d’état in a banana republic, and from the dawn of the Industrial Revolution to the dawn of the computer age. One’s jaw drops at seeing this mélange of data fall onto a perfect diagonal.
Piles of data in which the log of the frequency of a certain kind of entity is proportional to the log of the size of that entity, so that a plot on log-log paper looks like a straight line, are called power-law distributions.51 The name comes from the fact that when you put away the logarithms and go back to the original numbers, the probability of an entity showing up in the data is proportional to the size of that entity raised to some power (which translates visually to the slope of the line in the log-log plot), times a constant. In this case the power is –1.5, which means that with every tenfold jump in the death toll of a war, you can expect to find about a third as many of them. Richardson plotted murders (quarrels of magnitude 0) on the same graph as wars, noting that qualitatively they follow the overall pattern: they are much, much less damaging than the smallest wars and much, much more frequent. But as you can see from their lonely perch atop the vertical axis, high above the point where an extrapolation of the line for the wars would hit it, he was pushing his luck when he said that all deadly quarrels fell along a single continuum. Richardson gamely connected the murder point to the war line with a swoopy curve so that he could interpolate the numbers of quarrels with death tolls in the single digits, the tens, and the hundreds, which are missing from the historical record. (These are the skirmishes beneath the military horizon that fall in the crack between criminology and history.) But for now let’s ignore the murders and skirmishes and concentrate on the wars.
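A worked check of that arithmetic: if the probability density of death tolls falls off as the toll raised to the power –1.5, then the number of wars in each tenfold bin of tolls scales as the integral of that density, which goes as the toll raised to –0.5; each tenfold jump therefore multiplies the count by about 0.32, "about a third." A few lines verify this numerically:

```python
from scipy.integrate import quad

density = lambda x: x ** -1.5         # power-law density with power -1.5
bin_3 = quad(density, 1e3, 1e4)[0]    # wars with thousands of deaths
bin_4 = quad(density, 1e4, 1e5)[0]    # wars with tens of thousands
print(bin_4 / bin_3)                  # 0.316... -- about a third
```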
FIGURE 5–7. Number of deadly quarrels of different magnitudes, 1820–1952
Source: Graph adapted from Weiss, 1963, p. 103, based on data from Richardson, 1960, p. 149. The range 1820–1952 refers to the year a war ended.
Could Richardson just have been lucky with his sample? Fifty years later the political scientist Lars-Erik Cederman plotted a newer set of numbers in a major dataset of battle deaths from the Correlates of War Project, comprising ninety-seven interstate wars between 1820 and 1997 (figure 5–8).52 They too fall along a straight line in log-log coordinates. (Cederman plotted the data in a slightly different way, but that doesn’t matter for our purposes.)53
Scientists are intrigued by power-law distributions for two reasons.54 One is that the distribution keeps turning up in measurements of things that you would think have nothing in common. One of the first power-law distributions was discovered in the 1930s by the linguist G. K. Zipf when he plotted the frequencies of words in the English language.55 If you count up the instances of each of the words in a large corpus of text, you’ll find around a dozen that occur extremely frequently, that is, in more than 1 percent of all word tokens, including the (7 percent), be (4 percent), of (4 percent), and (3 percent), and a (2 percent).56 Around three thousand occur in the medium-frequency range centered on 1 in 10,000, such as confidence, junior, and afraid. Tens of thousands occur once every million words, including embitter, memorialize, and titular. And hundreds of thousands have frequencies far less than one in a million, like kankedort, apotropaic, and deliquesce.
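Zipf's exercise is easy to repeat on any large text. A minimal sketch, where the file name corpus.txt is a placeholder for whatever corpus is at hand:

```python
import re
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(words)
total = len(words)
# The handful of words at the top should account for percent-scale
# shares of all tokens, as the figures in the text suggest.
for word, n in counts.most_common(10):
    print(f"{word}: {100 * n / total:.2f}% of all word tokens")
```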
FIGURE 5–8. Probabilities of wars of different magnitudes, 1820–1997
Source: Graph from Cederman, 2003, p. 136.
Another example of a power-law distribution was discovered in 1906 by the economist Vilfredo Pareto when he looked at the distribution of incomes in Italy: a handful of people were filthy rich, while a much larger number were dirt-poor. Since these discoveries, power-law distributions have also turned up, among other places, in the populations of cities, the commonness of names, the popularity of Web sites, the number of citations of scientific papers, the sales figures of books and musical recordings, the number of species in biological taxa, and the sizes of moon craters.57
The second remarkable thing about power-law distributions is that they look the same over a vast range of values. To understand why this is so striking, let’s compare power-law distributions to a more familiar distribution called the normal, Gaussian, or bell curve. With measurements like the heights of men or the speeds of cars on a freeway, most of the numbers pile up around an average, and they tail off in both directions, falling into a curve that looks like a bell.58 Figure 5–9 shows one for the heights of American males. There are lots of men around 5’10” tall, fewer who are 5’6” or 6’2”, not that many who are 5’0” or 6’8”, and no one who is shorter than 1’11” or taller than 8’11” (the two extremes in The Guinness Book of World Records). The ratio of the tallest man in the world to the shortest man in the world is 4.8, and you can bet that you will never meet a man who is 20 feet tall.
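The contrast is easy to see in simulation (the parameters here are illustrative): draws from a normal distribution stay within a narrow band of ratios, while draws from a power law span orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(5)
heights = rng.normal(loc=70, scale=3, size=100_000)  # inches, bell curve
towns = (rng.pareto(a=1.0, size=100_000) + 1) * 52   # power law, minimum 52

print(f"normal:    max/min = {heights.max() / heights.min():.1f}")
print(f"power law: max/min = {towns.max() / towns.min():,.0f}")
```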
FIGURE 5–9. Heights of males (a normal or bell-curve distribution)
Source: Graph from Newman, 2005, p. 324.
But with other kinds of entities, the measurements don’t heap up around a typical value, don’t fall off symmetrically in both directions, and don’t fit within a cozy range. The sizes of towns and cities are a good example. It’s hard to answer the question “How big is a typical American municipality?” New York has 8 million people; the smallest municipality that counts as a “town,” according to Guinness, is Duffield, Virginia, with only 52. The ratio of the largest municipality to the smallest is 150,000, which is very different from the less-than-fivefold variation in the heights of men.
Also, the distribution of sizes of municipalities isn’t curved like a bell. As the black line in figure 5–10 shows, it is L-shaped, with a tall spine on the left and a long tail on the right. In this graph, city populations are laid out along a conventional linear scale on the black horizontal axis: cities of 100,000, cities of 200,000, and so on. So are the proportions of cities of each population size on the black vertical axis: three-thousandths (3/1,000, or 0.003) of a percent of American municipalities have a population of exactly 20,000, two-thousandths of a percent have a population of 30,000, one-thousandth of a percent have a population of 40,000, and so on, with smaller and smaller proportions having larger and larger populations.59 Now the gray axes at the top and the right of the graph stretch out these same numbers on a logarithmic scale, in which orders of magnitude (the number of zeroes) are evenly spaced, rather than the values themselves. The tick marks for population sizes are at ten thousand, a hundred thousand, a million, and so on. Likewise the proportions of cities at each population size are arranged along equal order-of-magnitude tick marks: one one-hundredth (1/100, or 0.01) of a percent, one one-thousandth (1/1,000, or 0.001) of a percent, one ten-thousandth, and so on. When the axes are stretched out like this, something interesting happens to the distribution: the L straightens out into a nice line. And that is the signature of a power-law distribution.
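A sketch of that straightening trick on synthetic data (not the actual municipality counts): the same heavy-tailed sample is binned once, then drawn on linear axes and on log-log axes.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
pops = (rng.pareto(a=1.0, size=20_000) + 1) * 50   # synthetic "towns"

bins = np.logspace(np.log10(pops.min()), np.log10(pops.max()), 40)
density, edges = np.histogram(pops, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])          # geometric bin midpoints
keep = density > 0                                 # log axes can't show zeros

fig, (linear, loglog) = plt.subplots(1, 2)
linear.plot(centers, density)                      # L-shaped spine and tail
loglog.loglog(centers[keep], density[keep], "o")   # straightens into a line
plt.show()
```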