Page 9 of The Code Book

Similarly, whole words will be enciphered in different ways: the word the, for example, could be enciphered as DPR, BUK, GNO or ZRM, depending on its position relative to the keyword. Although this makes cryptanalysis difficult, it is not impossible. The important point to note is that if there are only four ways to encipher the word the, and the original message contains several instances of the word the, then it is highly likely that some of the four possible encipherments will be repeated in the ciphertext. This is demonstrated in the following example, in which the line The Sun and the Man in the Moon has been enciphered using the Vigenère cipher and the keyword KING.

The word the is enciphered as DPR in the first instance, and then as BUK on the second and third occasions. The reason for the repetition of BUK is that the second the is displaced by eight letters with respect to the third the, and eight is a multiple of the length of the keyword, which is four letters long. In other words, the second the was enciphered according to its relationship to the keyword (the is directly below ING), and by the time we reach the third the, the keyword has cycled around exactly twice, to repeat the relationship, and hence repeat the encipherment.
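The example can be reproduced in a few lines. What follows is a minimal Python sketch, assuming the usual convention that a keyword letter A shifts by 0 places and Z by 25; the function name `vigenere_encipher` is an illustrative choice, not something taken from the text.

```python
def vigenere_encipher(plaintext, keyword):
    """Encipher the letters of plaintext with a repeating keyword.

    Spaces and punctuation are dropped, as in the worked example.
    """
    letters = [c for c in plaintext.lower() if c.isalpha()]
    out = []
    for i, c in enumerate(letters):
        # The keyword letter lined up with position i selects the shift.
        shift = ord(keyword[i % len(keyword)].upper()) - ord("A")
        out.append(chr((ord(c) - ord("a") + shift) % 26 + ord("A")))
    return "".join(out)


ciphertext = vigenere_encipher("The Sun and the Man in the Moon", "KING")
print(ciphertext)         # DPRYEVNTNBUKWIAOXBUKWWBT
print(ciphertext[0:3])    # DPR  (first "the")
print(ciphertext[9:12])   # BUK  (second "the")
print(ciphertext[17:20])  # BUK  (third "the", eight letters later)
```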

Babbage realized that this sort of repetition provided him with exactly the foothold he needed in order to conquer the Vigenère cipher. He was able to define a series of relatively simple steps which could be followed by any cryptanalyst to crack the hitherto chiffre indéchiffrable. To demonstrate his brilliant technique, let us imagine that we have intercepted the ciphertext shown in Figure 13. We know that it was enciphered using the Vigenère cipher, but we know nothing about the original message, and the keyword is a mystery.

The first stage in Babbage’s cryptanalysis is to look for sequences of letters that appear more than once in the ciphertext. There are two ways that such repetitions could arise. The most likely is that the same sequence of letters in the plaintext has been enciphered using the same part of the key. Alternatively, there is a slight possibility that two different sequences of letters in the plaintext have been enciphered using different parts of the key, coincidentally leading to the identical sequence in the ciphertext. If we restrict ourselves to long sequences, then we largely discount the second possibility, and, in this case, we shall consider repeated sequences only if they are of four letters or more. Table 8 is a log of such repetitions, along with the spacing between the repetitions. For example, the sequence E-F-I-Q appears in the first line of the ciphertext and then in the fifth line, shifted forward by 95 letters.
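The search for repeated sequences is easy to mechanize. The sketch below is one possible approach, not Babbage's own procedure: it records every position of each four-letter sequence and reports the spacing between successive occurrences. Because the Figure 13 ciphertext is not reproduced here, it is run on the short KING example from earlier, where the repeated plaintext "them" ("the Man", "the Moon") produces a repeated four-letter sequence. The name `find_repeats` is hypothetical.

```python
from collections import defaultdict


def find_repeats(ciphertext, length=4):
    """Map each repeated sequence of the given length to its spacings."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {
        seq: [later - earlier for earlier, later in zip(pos, pos[1:])]
        for seq, pos in positions.items()
        if len(pos) > 1
    }


repeats = find_repeats("DPRYEVNTNBUKWIAOXBUKWWBT")
print(repeats)  # {'BUKW': [8]}
```

The spacing of 8 is a multiple of the keyword length, 4, which is exactly the clue the method relies on.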

As well as being used to encipher the plaintext into ciphertext, the keyword is also used by the receiver to decipher the ciphertext back into plaintext. Hence, if we could identify the keyword, deciphering the text would be easy. At this stage we do not have enough information to work out the keyword, but Table 8 does provide some very good clues as to its length. Having listed which sequences repeat themselves and the spacing between these repetitions, the rest of the table is given over to identifying the factors of the spacing—the numbers that will divide into the spacing.

Figure 13 The ciphertext, enciphered using the Vigenère cipher.

For example, the sequence W-C-X-Y-M repeats itself after 20 letters, and the numbers 1, 2, 4, 5, 10 and 20 are factors, because they divide perfectly into 20 without leaving a remainder. These factors suggest six possibilities:

(1) The key is 1 letter long and is recycled 20 times between encryptions.

(2) The key is 2 letters long and is recycled 10 times between encryptions.

(3) The key is 4 letters long and is recycled 5 times between encryptions.

(4) The key is 5 letters long and is recycled 4 times between encryptions.

(5) The key is 10 letters long and is recycled 2 times between encryptions.

(6) The key is 20 letters long and is recycled 1 time between encryptions.

The first possibility can be excluded, because a key that is only 1 letter long gives rise to a monoalphabetic cipher—only one row of the Vigenère square would be used for the entire encryption, and the cipher alphabet would remain unchanged; it is unlikely that a cryptographer would do this. To indicate each of the other possibilities, a ✓ is placed in the appropriate column of Table 8. Each ✓ indicates a potential key length.
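The six possibilities can be generated directly from the spacing. This is a small illustrative Python sketch (the helper name `factors` is ours): it lists every factor of 20 together with the number of times a key of that length would be recycled, then drops the one-letter key.

```python
def factors(n):
    """Return every positive integer that divides n exactly."""
    return [d for d in range(1, n + 1) if n % d == 0]


spacing = 20  # spacing between the two occurrences of W-C-X-Y-M
for key_length in factors(spacing):
    print(f"key length {key_length}: recycled {spacing // key_length} times")

# A one-letter key would be a monoalphabetic cipher, so exclude it.
candidates = [d for d in factors(spacing) if d > 1]
print(candidates)  # [2, 4, 5, 10, 20]
```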

To identify whether the key is 2, 4, 5, 10 or 20 letters long, we need to look at the factors of all the other spacings. Because the keyword seems to be 20 letters or smaller, Table 8 lists those factors that are 20 or smaller for each of the other spacings. There is a clear propensity for a spacing divisible by 5. In fact, every spacing is divisible by 5. The first repeated sequence, E-F-I-Q, can be explained by a keyword of length 5 recycled nineteen times between the first and second encryptions. The second repeated sequence, P-S-D-L-P, can be explained by a keyword of length 5 recycled just once between the first and second encryptions. The third repeated sequence, W-C-X-Y-M, can be explained by a keyword of length 5 recycled four times between the first and second encryptions. The fourth repeated sequence, E-T-R-L, can be explained by a keyword of length 5 recycled twenty-four times between the first and second encryptions. In short, everything is consistent with a five-letter keyword.
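The same reasoning can be checked mechanically. In the sketch below, the spacings 95 and 20 are stated in the text, while 5 and 120 are inferred from the recycle counts given above (once and twenty-four times for a five-letter key); the variable and function names are illustrative assumptions.

```python
from functools import reduce
from math import gcd

# Repeated sequence -> spacing between its two occurrences.
spacings = {"EFIQ": 95, "PSDLP": 5, "WCXYM": 20, "ETRL": 120}


def small_factors(n, limit=20):
    """Factors of n between 2 and limit, the candidate key lengths."""
    return [d for d in range(2, limit + 1) if n % d == 0]


# One row per repeated sequence, as in Table 8.
table = {seq: small_factors(gap) for seq, gap in spacings.items()}
for seq, row in table.items():
    print(seq, spacings[seq], row)

# Key lengths consistent with every spacing.
common = [d for d in range(2, 21) if all(gap % d == 0 for gap in spacings.values())]
print(common)                          # [5]
print(reduce(gcd, spacings.values()))  # 5
```

Only a key length of 5 divides every spacing, which agrees with the conclusion drawn from the table.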

Table 8 Repetitions and spacings in the ciphertext.

Assuming that the keyword is indeed 5 letters long, the next step is to work out the actual letters of the keyword. For the time being, let us call the keyword L1-L2-L3-L4-L5, such that L1 represents the first letter of the keyword, and so on. The process of encipherment would have begun with enciphering the first letter of the plaintext according to the first letter of the keyword, L1. The letter L1 defines one row of the Vigenère square, and effectively provides a monoalphabetic substitution cipher alphabet for the first letter of the plaintext. However, when it comes to encrypting the second letter of the plaintext, the cryptographer would have used L2 to define a different row of the Vigenère square, effectively providing a different monoalphabetic substitution cipher alphabet. The third letter of plaintext would be encrypted according to L3, the fourth according to L4, and the fifth according to L5. Each letter of the keyword is providing a different cipher alphabet for encryption. However, the sixth letter of the plaintext would once again be encrypted according to L1, the seventh letter of the plaintext would once again be encrypted according to L2, and the cycle repeats itself thereafter. In other words, the polyalphabetic cipher consists of five monoalphabetic ciphers, each monoalphabetic cipher is responsible for encrypting one-fifth of the entire message, and, most importantly, we already know how to cryptanalyze monoalphabetic ciphers.
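Splitting the message into its five monoalphabetic parts amounts to taking every fifth letter. The following Python sketch illustrates this with a short ciphertext prefix; the prefix is an assumption, reconstructed by enciphering the opening of the plaintext with the keyword that the analysis eventually recovers, rather than copied from Figure 13. The name `split_streams` is hypothetical.

```python
def split_streams(ciphertext, key_length):
    """Stream i holds the letters enciphered by keyword letter L(i+1)."""
    return [ciphertext[i::key_length] for i in range(key_length)]


# Assumed opening of the ciphertext (reconstructed, see the note above).
prefix = "WUBEFIQLZURMVOFEHMYMWT"
streams = split_streams(prefix, 5)
for number, stream in enumerate(streams, start=1):
    print(f"L{number}: {stream}")
# L1: WIREW
# L2: UQMHT
# L3: BLVM
# L4: EZOY
# L5: FUFM
```

Each stream can then be attacked on its own as a simple shifted alphabet.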

We proceed as follows. We know that one of the rows of the Vigenère square, defined by L1, provided the cipher alphabet to encrypt the 1st, 6th, 11th, 16th, … letters of the message. Hence, if we look at the 1st, 6th, 11th, 16th, … letters of the ciphertext, we should be able to use old-fashioned frequency analysis to work out the cipher alphabet in question. Figure 14 shows the frequency distribution of the letters that appear in the 1st, 6th, 11th, 16th, … positions of the ciphertext, which are W, I, R, E, … At this point, remember that each cipher alphabet in the Vigenère square is simply a standard alphabet shifted by a value between 1 and 26. Hence, the frequency distribution in Figure 14 should have similar features to the frequency distribution of a standard alphabet, except that it will have been shifted by some distance. By comparing the L1 distribution with the standard distribution, it should be possible to work out the shift. Figure 15 shows the standard frequency distribution for a piece of English plaintext.

The standard distribution has peaks, plateaus and valleys, and to match it with the L1 cipher distribution we look for the most outstanding combination of features. For example, the three spikes at R-S-T in the standard distribution (Figure 15) and the long depression to its right that stretches across six letters from U to Z together form a very distinctive pair of features. The only similar features in the L1 distribution (Figure 14) are the three spikes at V-W-X, followed by the depression stretching six letters from Y to D. This would suggest that all the letters encrypted according to L1 have been shifted four places, or that L1 defines a cipher alphabet which begins E, F, G, H, … In turn, this means that the first letter of the keyword, L1, is probably E. This hypothesis can be tested by shifting the L1 distribution back four letters and comparing it with the standard distribution. Figure 16 shows both distributions for comparison. The match between the major peaks is very strong, implying that it is safe to assume that the keyword does indeed begin with E.
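Matching peaks and troughs by eye can be replaced by a numerical comparison. The sketch below swaps in a chi-squared score, which is a different technique from the visual matching described above: for each of the 26 possible shifts it shifts the stream back and measures how far the letter counts are from typical English. The frequency table holds commonly quoted approximate percentages and is an assumption, as are all function names. The demonstration stream is a passage of English shifted four places, standing in for the real L1 stream.

```python
from collections import Counter

# Approximate English letter frequencies in percent (assumed values).
ENGLISH_FREQ = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def shift_back(stream, shift):
    """Undo a shift of the given size on an uppercase stream."""
    return "".join(chr((ord(c) - ord("A") - shift) % 26 + ord("a")) for c in stream)


def chi_squared(stream, shift):
    """Distance between the shifted-back stream and typical English."""
    counts = Counter(shift_back(stream, shift))
    n = len(stream)
    return sum(
        (counts[letter] - n * pct / 100) ** 2 / (n * pct / 100)
        for letter, pct in ENGLISH_FREQ.items()
    )


def guess_shift(stream):
    """Return the shift whose result looks most like English."""
    return min(range(26), key=lambda shift: chi_squared(stream, shift))


# Stand-in stream: ordinary English shifted forward four places.
sample = ("sittheedownandhavenoshamecheekbyjowlandkneebyknee"
          "whatcareiforanynamewhatfororderordegree")
stream = "".join(chr((ord(c) - ord("a") + 4) % 26 + ord("A")) for c in sample)

shift = guess_shift(stream)
keyword_letter = chr(shift + ord("A"))
print(shift, keyword_letter)
```

A shift of four places corresponds to the keyword letter E, since the cipher alphabet for that row begins E, F, G, H. On short streams a statistical score of this kind can be misled, so the result is best treated as a hypothesis to be tested, as in the text.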

Figure 14 Frequency distribution for letters in the ciphertext encrypted using the L1 cipher alphabet (number of occurrences).

Figure 15 Standard frequency distribution (number of occurrences based on a piece of plaintext containing the same number of letters as in the ciphertext).

Figure 16 The L1 distribution shifted back four letters (top), compared with the standard frequency distribution (bottom). All major peaks and troughs match.

To summarize, searching for repetitions in the ciphertext has allowed us to identify the length of the keyword, which turned out to be five letters long. This allowed us to split the ciphertext into five parts, each one enciphered according to a monoalphabetic substitution as defined by one letter of the keyword. By analyzing the fraction of the ciphertext that was enciphered according to the first letter of the keyword, we have been able to show that this letter, L1, is probably E. This process is repeated in order to identify the second letter of the keyword. A frequency distribution is established for the 2nd, 7th, 12th, 17th, … letters in the ciphertext. Again, the resulting distribution, shown in Figure 17, is compared with the standard distribution in order to deduce the shift.

This distribution is harder to analyze. There are no obvious candidates for the three neighboring peaks that correspond to R-S-T. However, the depression that stretches from G to L is very distinct, and probably corresponds to the depression we expect to see stretching from U to Z in the standard distribution. If this were the case, we would expect the three R-S-T peaks to appear at D, E and F, but the peak at E is missing. For the time being, we shall dismiss the missing peak as a statistical glitch, and go with our initial reaction, which is that the depression from G to L is a recognizably shifted feature. This would suggest that all the letters encrypted according to L2 have been shifted twelve places, or that L2 defines a cipher alphabet which begins M, N, O, P, … and that the second letter of the keyword, L2, is M. Once again, this hypothesis can be tested by shifting the L2 distribution back twelve letters and comparing it with the standard distribution. Figure 18 shows both distributions, and the match between the major peaks is very strong, implying that it is safe to assume that the second letter of the keyword is indeed M.

Figure 17 Frequency distribution for letters in the ciphertext encrypted using the L2 cipher alphabet (number of occurrences).

Figure 18 The L2 distribution shifted back twelve letters (top), compared with the standard frequency distribution (bottom). Most major peaks and troughs match.

I shall not continue the analysis; suffice to say that analyzing the 3rd, 8th, 13th, … letters implies that the third letter of the keyword is I, analyzing the 4th, 9th, 14th, … letters implies that the fourth letter is L, and analyzing the 5th, 10th, 15th, … letters implies that the fifth letter is Y. The keyword is EMILY. It is now possible to reverse the Vigenère cipher and complete the cryptanalysis. The first letter of the ciphertext is W, and it was encrypted according to the first letter of the keyword, E. Working backward, we look at the Vigenère square, and find W in the row beginning with E, and then we find which letter is at the top of that column. The letter is s, which must make it the first letter of the plaintext. By repeating this process, we see that the plaintext begins sittheedownandhavenoshamecheekbyjowl… By inserting suitable word-breaks and punctuation, we eventually get:

Sit thee down, and have no shame,

Cheek by jowl, and knee by knee:

What care I for any name?

What for order or degree?

Let me screw thee up a peg:

Let me loose thy tongue with wine:

Callest thou that thing a leg?

Which is thinnest? thine or mine?

Thou shalt not be saved by works:

Thou hast been a sinner too:

Ruined trunks on withered forks,

Empty scarecrows, I and you!

Fill the cup, and fill the can:

Have a rouse before the morn:

Every moment dies a man,

Every moment one is born.
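The final deciphering step, working backward through the Vigenère square with the keyword EMILY, can be sketched in Python as follows. The ten-letter ciphertext prefix is an assumption: it was reconstructed by enciphering the first words of the plaintext with EMILY, not copied from Figure 13, although its first letter, W, agrees with the text. The function name `vigenere_decipher` is illustrative.

```python
def vigenere_decipher(ciphertext, keyword):
    """Reverse the Vigenère cipher for a ciphertext made only of letters."""
    out = []
    for i, c in enumerate(ciphertext.upper()):
        # Finding c in the row that begins with the keyword letter and
        # reading off the column heading is the same as subtracting the shift.
        shift = ord(keyword[i % len(keyword)].upper()) - ord("A")
        out.append(chr((ord(c) - ord("A") - shift) % 26 + ord("a")))
    return "".join(out)


print(vigenere_decipher("W", "E"))               # s
print(vigenere_decipher("WUBEFIQLZU", "EMILY"))  # sittheedow
```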

These are verses from a poem by Alfred Tennyson entitled “The Vision of Sin.” The keyword happens to be the first name of Tennyson’s wife, Emily Sellwood. I chose to use a section from this particular poem as an example for cryptanalysis because it inspired some curious correspondence between Babbage and the great poet. Being a keen statistician and compiler of mortality tables, Babbage was irritated by the lines “Every moment dies a man, Every moment one is born,” which are the last lines of the plaintext above. Consequently, he offered a correction to Tennyson’s “otherwise beautiful” poem:

It must be manifest that if this were true, the population of the world would be at a standstill … I would suggest that in the next edition of your poem you have it read: “Every moment dies a man, Every moment 1 1/16 is born.” … The actual figure is so long I cannot get it onto a line, but I believe the figure 1 1/16 will be sufficiently accurate for poetry.

I am, Sir, yours, etc.,

Charles Babbage.

Babbage’s successful cryptanalysis of the Vigenère cipher was probably achieved in 1854, soon after his spat with Thwaites, but his discovery went completely unrecognized because he never published it. The discovery came to light only in the twentieth century, when scholars examined Babbage’s extensive notes. In the meantime, his technique was independently discovered by Friedrich Wilhelm Kasiski, a retired officer in the Prussian army. Ever since 1863, when he published his cryptanalytic breakthrough in Die Geheimschriften und die Dechiffrir-kunst (“Secret Writing and the Art of Deciphering”), the technique has been known as the Kasiski Test, and Babbage’s contribution has been largely ignored.

And why did Babbage fail to publicize his cracking of such a vital cipher? He certainly had a habit of not finishing projects and not publishing his discoveries, which might suggest that this is just one more example of his lackadaisical attitude. However, there is an alternative explanation. His discovery occurred soon after the outbreak of the Crimean War, and one theory is that it gave the British a clear advantage over their Russian enemy. It is quite possible that British Intelligence demanded that Babbage keep his work secret, thus providing them with a nine-year head start over the rest of the world. If this was the case, then it would fit in with the long-standing tradition of hushing up codebreaking achievements in the interests of national security, a practice that has continued into the twentieth century.

From Agony Columns to Buried Treasure

Thanks to the breakthroughs by Charles Babbage and Friedrich Kasiski, the Vigenère cipher was no longer secure. Cryptographers could no longer guarantee secrecy, now that cryptanalysts had fought back to regain control in the communications war. Although cryptographers attempted to design new ciphers, nothing of great significance emerged during the latter half of the nineteenth century, and professional cryptography was in disarray. However, this same period witnessed an enormous growth of interest in ciphers among the general public.

The development of the telegraph, which had driven a commercial interest in cryptography, was also responsible for generating public interest in cryptography. The public became aware of the need to protect personal messages of a highly sensitive nature, and if necessary they would use encryption, even though this took more time to send, thus adding to the cost of the telegram. Morse operators could send plain English at speeds of up to 35 words per minute because they could memorize entire phrases and transmit them in a single burst, whereas the jumble of letters that make up a ciphertext was considerably slower to transmit, because the operator had to continually refer back to the sender’s written message to check the sequence of letters. The ciphers used by the general public would not have withstood attack by a professional cryptanalyst, but they were sufficient to guard against the casual snooper.

As people became comfortable with encipherment, they began to express their cryptographic skills in a variety of ways. For example, young lovers in Victorian England were often forbidden from publicly expressing their affection, and could not even communicate by letter in case their parents intercepted and read the contents. This resulted in lovers sending encrypted messages to each other via the personal columns of newspapers. These “agony columns,” as they became known, provoked the curiosity of cryptanalysts, who would scan the notes and try to decipher their titillating contents. Charles Babbage is known to have indulged in this activity, along with his friends Sir Charles Wheatstone and Baron Lyon Playfair, who together were responsible for developing the deft Playfair cipher (described in Appendix E). On one occasion, Wheatstone deciphered a note in The Times from an Oxford student, suggesting to his true love that they elope. A few days later, Wheatstone inserted his own message, encrypted in the same cipher, advising the couple against this rebellious and rash action. Shortly afterward there appeared a third message, this time unencrypted and from the lady in question: “Dear Charlie, Write no more. Our cipher is discovered.”

In due course a wider variety of encrypted notes appeared in the newspapers. Cryptographers began to insert blocks of ciphertext merely to challenge their colleagues. On other occasions, encrypted notes were used to criticize public figures or organizations. The Times once unwittingly carried the following encrypted notice: “The Times is the Jeffreys of the press.” The newspaper was being likened to the notorious seventeenth-century Judge Jeffreys, implying that it was a ruthless, bullying publication which acted as a mouthpiece for the government.

