Data Security 1 - Classical Ciphers
References for this Part
Jean-Philippe Aumasson. Serious Cryptography. No Starch Press, 2018
Peter Landrock, and Knud Nissen. Kryptologi - fra viden til videnskab. Abacus, 1997
Simon Singh. The Code Book. 4th Estate, 1999
Long Quote from (Aumasson, 2018)
Chapter 1 Encryption
Encryption is the principal application of cryptography; it makes data incomprehensible in order to ensure its confidentiality. Encryption uses an algorithm called a cipher and a secret value called the key; if you don’t know the secret key, you can’t decrypt, nor can you learn any bit of information on the encrypted message—and neither can any attacker.
This chapter will focus on symmetric encryption, which is the simplest kind of encryption. In symmetric encryption, the key used to decrypt is the same as the key used to encrypt (unlike asymmetric encryption, or public-key encryption, in which the key used to decrypt is different from the key used to encrypt). You’ll start by learning about the weakest forms of symmetric encryption, classical ciphers that are secure against only the most illiterate attacker, and then move on to the strongest forms that are secure forever.
The Basics
When we’re encrypting a message, plaintext refers to the unencrypted message and ciphertext to the encrypted message. A cipher is therefore composed of two functions: encryption turns a plaintext into a ciphertext, and decryption turns a ciphertext back into a plaintext. But we’ll often say “cipher” when we actually mean “encryption.” For example, Figure 1-1 shows a cipher, E, represented as a box taking as input a plaintext, P, and a key, K, and producing a ciphertext, C, as output. I’ll write this relation as C = E(K, P). Similarly, when the cipher is in decryption mode, I’ll write D(K, C).
For some ciphers, the ciphertext is the same size as the plaintext; for some others, the ciphertext is slightly longer. However, ciphertexts can never be shorter than plaintexts.
Classical Ciphers
Classical ciphers are ciphers that predate computers and therefore work on letters rather than on bits. They are much simpler than a modern cipher like DES—for example, in ancient Rome or during WWI, you couldn’t use a computer chip’s power to scramble a message, so you had to do everything with only pen and paper. There are many classical ciphers, but the most famous are the Caesar cipher and Vigenère cipher.
The Caesar Cipher
The Caesar cipher is so named because the Roman historian Suetonius reported that Julius Caesar used it. It encrypts a message by shifting each of the letters down three positions in the alphabet, wrapping back around to A if the shift reaches Z. For example, ZOO encrypts to CRR, FDHVDU decrypts to CAESAR, and so on, as [you may work out from figure 1.2]. There’s nothing special about the value 3; it’s just easier to compute in one’s head than 11 or 23. The Caesar cipher is super easy to break: to decrypt a given ciphertext, simply shift the letters three positions back to retrieve the plaintext. That said, the Caesar cipher may have been strong enough during the time of Crassus and Cicero. Because no secret key is involved (it’s always 3), users of Caesar’s cipher only had to assume that attackers were illiterate or too uneducated to figure it out—an assumption that’s much less realistic today. (In fact, in 2006, the Italian police arrested a mafia boss after decrypting messages written on small scraps of paper that were encrypted using a variant of the Caesar cipher: ABC was encrypted to 456 instead of DEF, for example.)
Could the Caesar cipher be made more secure? You can, for example, imagine a version that uses a secret shift value instead of always using 3, but that wouldn’t help much because an attacker could easily try all 25 possible shift values until the decrypted message makes sense.
Keys:
0 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
3 D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
Figure 1-2. The Caesar Cipher [replaced by yours truly. The Caesar cipher is also known as a mono alphabetic cipher.]
Doing encryption with Caesars cipher (key = 3) you find the plaintext letters one by one in the top line of Figure 1.2, the encrypted, ciphertext, character is then found by going vertically down to the line corresponding to the key.
The plaintext word ASTERIX thus encrypts to ciphertext DVWHULA in this manner. Notice the wrap.
Doing decryption with Caesars cipher you find the ciphertext letters, again one by one, now in the line corresponding to the key. The letter decrypted to plaintext is then found in the top line of Figure 1.2. The ciphertext YL URPHUH HU WRVVHGH VLJHU REHOLA thus translates to VI ROMERE ER TOSSEDE SIGER OBELIX.
Some sources state that is common practice to represent cryptotext in five character blocks, so that YL URPHUH HU WRVVHGH VLJHU REHOLAZZ would be YLURP HUHHU WRVVH GHVLJ HUREH OLACC. It is not uncommon to add an agreed upon character as a filler to have the message have a length as a multiple of five.
Other sources state that is common practice to represent cryptotext in an unbroken string of characters, so that YL URPHUH HU WRVVHGH VLJHU REHOLA would be YLURPHUHHUWRVVHGHVLJHUREHOLA. There is no need for a filler in this case.
Now, continue the quote:
The Vigenère Cipher
It took about 1500 years to see a meaningful improvement of the Caesar cipher in the form of the Vigenère cipher, created in the 16th century by an Italian named Giovan Battista Bellaso. The name “Vigenère” comes from the Frenchman Blaise de Vigenère, who invented a different cipher in the 16th century, but due to historical misattribution, Vigenère’s name stuck. Nevertheless, the Vigenère cipher became popular and was later used during the American Civil War by Confederate forces and during WWI by the Swiss Army, among others.
The Vigenère cipher is similar to the Caesar cipher, except that letters aren’t shifted by three places but rather by values defined by a key, a collection of letters that represent numbers based on their position in the alphabet. For example, if the key is DUH, letters in the plaintext are shifted using the values 3, 20, 7 because D is three letters after A, U is 20 letters after A, and H is seven letters after A. The 3, 20, 7 pattern repeats until you’ve encrypted the entire plaintext. For example, the word CRYPTO would encrypt to FLFSNV using DUH as the key: C is shifted three positions to F, R is shifted 20 positions to L, and so on. Figure 1-3 illustrates this principle when encrypting the sentence THEY DRINK THE TEA.
The Vigenère cipher is clearly more secure than the Caesar cipher, yet it’s still fairly easy to break. The first step to breaking it is to figure out the key’s length. For example, take the example in Example 40.2, wherein THEY DRINK THE TEA encrypts to WBLBXYLHRWBLWYH with the key DUH. (Spaces are usually removed to hide word boundaries.) Notice that in the ciphertext WBLBXYLHRWBLWYH, the group of three letters WBL appears twice in the ciphertext at nine-letter intervals. This suggests that the same three-letter word was encrypted using the same shift values, producing WBL each time. A cryptanalyst can then deduce that the key’s length is either nine or a value divisible by nine (that is, three). Furthermore, they may guess that this repeated three- letter word is THE and therefore determine DUH as a possible encryption key.
The second step to breaking the Vigenère cipher is to determine the actual key using a method called frequency analysis, which exploits the uneven distribution of letters in languages. For example, in English, E is the most common letter, so if you find that X is the most common letter in a ciphertext, then the most likely plaintext value at this position is E.
Despite its relative weakness, the Vigenère cipher may have been good enough to securely encrypt messages when it was used. First, because the attack just outlined needs messages of at least a few sentences, it wouldn’t work if the cipher was used to encrypt only short messages. Second, most messages needed to be secret only for short periods of time, so it didn’t matter if ciphertexts were eventually decrypted by the enemy. (The 19th- century cryptographer Auguste Kerckhoffs estimated that most encrypted wartime messages required confidentiality for only three to four hours.)
Keys:
A A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
D D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
H H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
U U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
Figure 1-3: The Vigenére Cipher [replaced by yours truly. The Vigenére cipher is also known as poly alphabetic cipher.]
…
How Ciphers Work
Based on simplistic ciphers like the Caesar and Vigenère ciphers, we can try to abstract out the workings of a cipher, first by identifying its two main components: a permutation and a mode of operation. A permutation is a function that transforms an item (in cryptography, a letter or a group of bits) such that each item has a unique inverse
(for example, the Caesar cipher’s three-letter shift). A mode of operation is an algorithm that uses a permutation to process messages of arbitrary size. The mode of the Caesar cipher is trivial: it just repeats the same permutation for each letter, but as you’ve seen, the Vigenère cipher has a more complex mode, where letters at different positions undergo different permutations.
Frequency Analysis
A simple method from cryptanalysis is frequency analysis. The method is calculating the relative frequency of the letters in a ciphertext. Then you must compare that to the theoretical relative frequency of the letters in the language in which you assume the ciphertext is written.
Then you test decrypting. Iterate until the result is meaningful.
Inline Exercise
Given the message
|
|
And given the hint that the text is in Danish, try to decrypt it by means of frequency analysis.
Hint 1: Look up Danish letter and word frequencies including digrams and trigrams.
Hint 2: Try the character counting too. Supposedly the best character counter.
Hint 3: Wiktionary:Frequency lists/Danish wordlist.
Hint 4: Dansk Center for Mumletekst.
Hint 5: https://www.dsn.dk/nyt/nyt-fra-sprognaevnet/numre/argang-1968-1984/marts-1970-pdf.
Hint 6: Relative frequencies of letters in other languages in the Wikipedia article Letter Frequency.
Hint 7: Danish Letter Frequencies.
Have a go right now!
One Time Pads - Unbreakable Encryption
Just because it proves that encryption may be unbreakable:
Wikipedia has an article at One-time pad
In cryptography, the one-time pad (OTP) is an encryption technique that cannot be cracked, but requires the use of a single-use pre-shared key that is larger than or equal to the size of the message being sent. In this technique, a plaintext is paired with a random secret key (also referred to as a one-time pad). Then, each bit or character of the plaintext is encrypted by combining it with the corresponding bit or character from the pad using modular addition.[1]
The resulting ciphertext will be impossible to decrypt or break if the following four conditions are met:[2][3]
- The key must be at least as long as the plaintext.
- The key must be truly random.
- The key must never be reused in whole or in part.
- The key must be kept completely secret by the communicating parties.
It has also been mathematically proven that any cipher with the property of perfect secrecy must use keys with effectively the same requirements as OTP keys.[4] Digital versions of one-time pad ciphers have been used by nations for critical diplomatic and military communication, but the problems of secure key distribution make them impractical for most applications.
The OTP is a variant of the Vigenére cipher in the sense that each character is encoded with a character from a different alphabet. These alphabets are, however, not alphabets in any strict sense. They are random text, And they have the same length, or longer, than the plaintext they work on.
(Singh, 1999) has an example of a ciphertext that produces (at least) two different meaningful plaintexts when decrypted with two different keys. His example is a text of 21 characters that needs 2621 equalling 5.2*1030 attempts to crack by brute force.
Computerized
In the era of the computer, the one time pad cipher has been adapted slightly. The plaintext and the keys are converted into a sequence of bits. The key must of course be at least as long as the plaintext. Then the two bit sequences are added mod 2, bit by bit. This process is often called exclusive or, XOR. An example would be
|
|
The very interesting thing now, is that if we take the ciphertext, and add it mod 2, XOR, to the same key used to encrypt it, we get, lo and behold, the plaintext back. Who says math isn’t fun?
|
|
This is easily programmable. In practical life the OTP was not used much. There were problems with creating the required number of pads, and there were the mentioned difficulties of share keys in a safe way. Here is a programmed miniature example in JavaScript:
|
|
Exercises
The rules for handing in assignments may be found in the README
Exercise DS.1.0
Create a function caesarE(key, cleartext)
that returns
a ciphertext of cleartext
given the key key
.
The function must use Caesar’s Cipher.
The key
must be a positive integer. The cleartext
must be a non-empty string.
Create a function caesarD(key, ciphertext)
that returns
a plaintext of ciphertext
given the key key
.
The function must use Caesar’s Cipher.
The key
must be a positive integer. The ‘ciphertext’
must be a non-empty string.
Write a program that tests both functions.
You must solve the exercise in either javascript, or python.
Exercise DS.1.1
Create a function vigE(key, cleartext)
that returns
a ciphertext of cleartext
given the key key
.
The function must use the Vigenére Cipher.
The key
, and the cleartext
must both be non-empty strings.
Create a function vigD(key, ciphertext)
that returns
a plaintext of ciphertext
given the key key
.
The function must use the Vigenére Cipher.
The key
, and the cleartext
must both be non-empty strings.
Write a program that tests both functions.
You must solve the exercise in either javascript, or python.
Exercise Hints
A JavaScript hint:
|
|
And an equivalent Python hint:
|
|