Lecture 28: RSA Encryption

Summary

Today we discussed the results of the second midterm, including how your raw scores correlate to letter grades. Students who received below a "C" were told how they could earn some "redemption" points to help pull up their grade; the necessary work is due by Friday (unless otherwise scheduled with the instructor). Afterwards we discussed number theoretic encryption methods, including the modern-day workhorse known as RSA.

Midterm 2 Results

I handed back the second midterms at the beginning of class today. The breakdown on grades looks like this:

Raw Grade Letter Equivalent
[75,100] Aish
[63,75) Bish
[45,63) Cish

Students who received a C or lower on the test have the following chance for redemption: near the top of the first page on your returned midterm, you should see a message that looks something like "+4 for #2,5,6." The idea is that you can receive a certain number of points back on your midterm if you re-do the problems listed (in the example above, the person would get back up to 4 points for redoing problems 2, 5 and 6). If you've been asked to re-do question 2, you should include a justification for your answer (i.e., give a counterexample if the result is false or a brief justification if the result is true). For other problems, you should include all your work and relevant explanations. Your "re-do" problems are due to me by Friday at class, though I'm willing to give an extension if you have exceptional circumstances (just ask).

RSA Encryption

Before introducing RSA encryption, we'll introduce some basic cryptographic terminology. To start, a person usually has a message that they would like to securely deliver to someone else. This message is called the cleartext. Typically the first step in the transmission is to convert the cleartext into some sort of numeric equivalent. It's pretty typical in examples to use the following numeric correspondence:

Letter Number Letter Number
A 01 N 14
B 02 O 15
C 03 P 16
D 04 Q 17
E 05 R 18
F 06 S 19
G 07 T 20
H 08 U 21
I 09 V 22
J 10 W 23
K 11 X 24
L 12 Y 25
M 13 Z 26

Typically there are other characters that are encoded as well. In class, for instance, we let 00 stand for the underscore character _. The numeric equivalent of cleartext is called plaintext.

Example: Converting Cleartext to Plaintext

Suppose you wanted to send the message "I LIKE ICE CREAM." We'll instead send the message "I_LIKE_ICE_CREAM" since we don't have a numeric equivalent to the space character. When we replace each character by the corresponding two digit number, we get the following plaintext:

09001209110500090305000318050113.

This process gives us the following:

Definition: The message that you'd like to send is called cleartext. After you've converted the cleartext into its numeric equivalent, the result is called plaintext.
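The conversion from cleartext to plaintext is mechanical enough to sketch in a few lines of Python. This is only an illustration: the names ALPHABET and to_plaintext are made up for this example, and the code assumes our class convention that 00 stands for the underscore.

```python
# Symbol table assumed from the lecture: "_" is 00, "A" is 01, ..., "Z" is 26.
ALPHABET = "_ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_plaintext(cleartext):
    """Return the two-digit-per-character numeric form of cleartext."""
    # Spaces are sent as underscores, since the table has no space character.
    cleaned = cleartext.upper().replace(" ", "_")
    return "".join(f"{ALPHABET.index(ch):02d}" for ch in cleaned)

print(to_plaintext("I LIKE ICE CREAM"))
# prints 09001209110500090305000318050113
```

Any character outside the table (punctuation, digits) makes ALPHABET.index raise an error, which matches the fact that we haven't assigned those characters a number.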

The goal of encrypting a message is to prevent people who might steal your message from being able to understand it. Hence encryption is the process of transforming the plaintext into some scrambled ciphertext that malicious parties can't understand. If the message you send is, in fact, received by someone who is intended to read it, then this person needs to be able to convert the scrambled up ciphertext they've received back into regular ol' plaintext (and then to cleartext). This process of converting ciphertext back into cleartext is called decryption.

An Old Method for Encrypting

Julius Caesar is credited with using one of the early encryption methods. His method was to take the plaintext numbers and "shift" them in order to scramble characters. For example, if one shifted the value of each character by +1, then we'd have

(1)
\begin{align} \_ \rightarrow 01 \quad A \rightarrow 02 \quad B \rightarrow 03 \quad \cdots \quad Y \rightarrow 26 \quad Z \rightarrow 00. \end{align}

Notice that all we've done is replace a given numeric value N with $N+1 \mod{27}$.

Using Caesar's Shift

Our cleartext "I_LIKE_ICE_CREAM" was converted into plaintext above as

09001209110500090305000318050113.

Now we're going to shift each of the characters by +1 in order to scramble up our message. Notice that we apply this shift to each pair of numbers, since each character is represented by a pair of numbers.

The plaintext 09001209110500090305000318050113 becomes
the ciphertext 10011310120601100406010419060214

If we converted this ciphertext back into regular characters, we'd then have the (seemingly incomprehensible) message

JAMJLFAJDFADSFBN

$\square$

Notice that decrypting the message amounts to "undoing" this process of adding 1; in other words, one can decrypt a message just by subtracting the shift key.

Example

In the movie 2001, one of the characters is a computer called HAL. Suppose we knew that this name was really encrypted using a Caesar shift of -1. To decrypt, we'd therefore need to add 1. Here's what we'd get:

HAL is 080112, and shifting by +1 gives 090213, which is the cleartext IBM.

$\square$
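Both shift examples above can be checked with a short Python sketch. The helper name caesar_shift and the ALPHABET string are assumptions made for this illustration; the arithmetic is just $N + \text{shift} \mod{27}$ on the two-digit values from our table.

```python
ALPHABET = "_ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # "_" is 00, "A" is 01, ..., "Z" is 26

def caesar_shift(text, shift):
    """Replace each symbol's value N with (N + shift) mod 27."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 27] for ch in text)

ciphertext = caesar_shift("I_LIKE_ICE_CREAM", 1)
print(ciphertext)                     # JAMJLFAJDFADSFBN
print(caesar_shift(ciphertext, -1))   # I_LIKE_ICE_CREAM
print(caesar_shift("HAL", 1))         # IBM
```

Decrypting is the same function called with the negated shift, which is exactly the weakness discussed next: whoever knows how to encrypt also knows how to decrypt.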

There are some problems with this method of encryption. One obvious problem is that there aren't many ways of encoding a message in this way; seemingly there are only 27 choices for a shift. In practice, then, a person could conceivably just try all 27 possibilities to find your encoded text. This problem can actually be avoided. The more pressing problem, though, is that knowing how one goes about encrypting messages gives all the necessary information for decrypting messages as well. This means that if person A and person B want to exchange messages, they need to meet up prior to their exchange and agree on an encoding scheme. This is problematic not only because it's impractical (you're not going to schedule an in-person meeting with amazon.com before you send them your credit card information), but also because if person A decides to give away the encoding scheme to someone else, then the security of the encryption process will have been compromised.

To get around this second problem, people are interested in finding encryption techniques where knowledge of the encoding methodology does not immediately reveal the decoding methodology.
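As an aside on the first weakness (the small number of possible shifts), here is a rough Python sketch of the "try all 27 possibilities" attack. The function names are invented for this illustration, and a human (or a dictionary check, not shown) still has to pick out the candidate that reads as sensible text.

```python
ALPHABET = "_ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # "_" is 00, "A" is 01, ..., "Z" is 26

def caesar_shift(text, shift):
    """Replace each symbol's value N with (N + shift) mod 27."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 27] for ch in text)

def all_shifts(ciphertext):
    """Return the 27 candidate decryptions, one per possible shift key."""
    return {key: caesar_shift(ciphertext, -key) for key in range(27)}

candidates = all_shifts("JAMJLFAJDFADSFBN")
for key, text in candidates.items():
    print(key, text)
# The line for key 1 reads I_LIKE_ICE_CREAM
```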

RSA Encryption

Suppose that person A wants to receive secure information from person B. To do so, person A secretly chooses two large prime numbers p and q. Their product will be called m: $m = pq$. Person A also chooses a number e at random so that $(e,\phi(m)) = 1$. Notice that since person A knows $m = pq$, he can easily compute $\phi(m) = \phi(p)\phi(q) = (p-1)(q-1)$. Since person A knows the value of $\phi(m)$, it's easy for person A to check this gcd condition, since this basically amounts to doing the Euclidean Algorithm. In verifying using the Euclidean Algorithm, person A will also have found a number d so that $ed \equiv 1 \mod{\phi(m)}$. Person A then publishes, for everyone to know, the public key

(2)
\begin{equation} (m,e). \end{equation}

He keeps as a secret the numbers $\phi(m)$ and $d$.

The processes of encrypting and decrypting then work as follows:

  • To encrypt, person B takes the plaintext P and computes $C = P^e \mod{m}$. Person B sends this ciphertext C to person A.
  • To decrypt, person A takes the ciphertext C and computes $C^d \mod{m}$. The value of this computation will be the original plaintext, P.
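Why does decryption undo encryption? Since $ed \equiv 1 \mod{\phi(m)}$ we can write $ed = 1 + k\phi(m)$, and then Euler's theorem gives $C^d \equiv P^{ed} = P \cdot (P^{\phi(m)})^k \equiv P \mod{m}$, at least when $(P,m) = 1$. Here's a minimal Python sketch of the whole scheme. The primes 61 and 53 and the exponent 17 are toy values picked only for this illustration (real keys use enormous primes), and the function names are made up.

```python
from math import gcd

# Toy values chosen only for illustration; real keys use very large primes.
p, q = 61, 53
m = p * q                   # public modulus
phi = (p - 1) * (q - 1)     # phi(m), kept secret
e = 17                      # public exponent
assert gcd(e, phi) == 1     # the gcd condition person A must check
d = pow(e, -1, phi)         # secret exponent with e*d = 1 mod phi(m); needs Python 3.8+

def encrypt(P):
    """Person B's step: C = P^e mod m."""
    return pow(P, e, m)

def decrypt(C):
    """Person A's step: C^d mod m, which should return the plaintext."""
    return pow(C, d, m)

P = 65
C = encrypt(P)
print(d, C, decrypt(C))
```

The three-argument form of pow does modular exponentiation without ever building the huge number $P^e$, which is what makes these computations practical.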

Example: Encrypting using RSA

Let's take $p=5851$ and $q = 937$. Then we have $m=5482387$ and $\phi(m) = (5850)(936)= 5475600$. Suppose that person A chooses $e = 101$. It's not hard to check that $(101,5475600)=1$, and indeed we have

(3)
\begin{split} 5475600 &= 54213*101+87\\ 101 &= 87 + 14\\ 87 &= 6*14 + 3\\ 14 &= 4*3 + 2\\ 3 &= 2+1. \end{split}

Working backwards this tells us that

(4)
\begin{align} 1 = 101*3523901 - 65*5475600 \quad \mbox{ so that }d \equiv 3523901 \mod{\phi(m)}. \end{align}
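The "working backwards" step is the extended Euclidean Algorithm, and it's easy to get a sign wrong by hand, so here is a small Python sketch that does the bookkeeping. The function name extended_gcd is just a label for this illustration.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        # Each pass is one line of the Euclidean Algorithm, with the
        # coefficients of a and b carried along.
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y

e, phi = 101, 5475600
g, x, y = extended_gcd(e, phi)
d = x % phi   # reduce the coefficient of e into the range 0..phi-1
print(g, d)   # 1 3523901
```

The coefficient x that comes out may be negative; reducing it mod $\phi(m)$ gives the positive representative we use as d.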

So person A publishes

(5)
\begin{align} m=5482387 \quad \mbox{ and } \quad e = 101. \end{align}

Now suppose person B wants to send the message

HELLO_WORLD

Person B starts by translating this cleartext into plaintext:

HELLO_WORLD becomes 0805121215002315181204.

Now person B wants to send this message to person A. Notice, however, that the message is larger than the number m used for the encryption. Since all calculations are done mod m, a plaintext number bigger than m could not be recovered uniquely, so person A will often include (along with the integers m and e) a specified blocksize, which tells people how many digits any given piece of encoded information will have. For this example, we'll pretend that person A wants his messages split into 6 unit blocks (6 is the largest even number of digits for which every block is guaranteed to be smaller than the modulus, since m has 7 digits). This means person B's message becomes a series of 4 messages:

0805 121215 002315 181204.

We've chosen to split the message into 6 unit blocks starting on the right, but you could have just as well started on the left. Notice that the first block only has 4 units; to get around this, we'll "pad" this first message with a dummy character (00, our underscore):

000805 121215 002315 181204.
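The splitting and padding can be sketched in Python as follows. The function name split_into_blocks is invented for this illustration, and it follows the convention of this example: any short block ends up at the front and is padded on the left with zeros.

```python
def split_into_blocks(digits, block_size=6):
    """Split a digit string into blocks of block_size digits.

    Zeros are added on the left so the total length is a multiple of
    block_size, which means only the leftmost block is ever padded.
    """
    padding = (-len(digits)) % block_size
    padded = "0" * padding + digits
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

print(split_into_blocks("0805121215002315181204"))
# ['000805', '121215', '002315', '181204']
```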

Now let's get down to encoding. Each 6 unit block is encrypted using the encrypting exponent e, with calculations done mod m:

(6)
\begin{split} 000805 &\rightarrow (000805)^{101} \equiv 5337272 \mod{5482387} \\ 121215 &\rightarrow (121215)^{101} \equiv 4854616 \mod{5482387} \\ 002315 &\rightarrow (002315)^{101} \equiv 949296 \mod{5482387} \\ 181204 &\rightarrow (181204)^{101} \equiv 2279290 \mod{5482387}. \end{split}

Hence our ciphertext is the series of messages

5337272 4854616 949296 2279290

$\square$
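If you'd like to redo this encryption yourself, a Python sketch along the following lines should do it. It uses the public key and blocks from the example; the secret exponent d appears only so we can sanity-check the round trip. I haven't listed the expected ciphertext in the comments, so compare the printed numbers with the ones above.

```python
m, e = 5482387, 101     # public key from the example
d = 3523901             # secret exponent from the example, used here only as a check

blocks = ["000805", "121215", "002315", "181204"]

# Encrypt each block: C = P^e mod m.
ciphertext = [pow(int(block), e, m) for block in blocks]
print(ciphertext)

# Sanity check: decrypting and re-padding should give back the original blocks.
recovered = [f"{pow(c, d, m):06d}" for c in ciphertext]
print(recovered == blocks)
```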

Decryption

Let's use the same numbers m, e and d from above. Suppose someone sends us the following ciphertexts (each ciphertext block is separated by spaces):

2614998 1425136 4523551 1701846

and we want to decode it. Decoding, we know, amounts to raising these numbers to the dth power modulo m, so let's do it:

(7)
\begin{split} 2614998^{3523901} &\equiv 13 \mod{m}\\ 1425136^{3523901} &\equiv 12008 \mod{m}\\ 4523551^{3523901} &\equiv 1815 \mod{m}\\ 1701846^{3523901} &\equiv 31119 \mod{m}. \end{split}

Since we know that we have 6 unit blocks, we just pad these numbers so they are each six digits:

000013 012008 001815 031119

Translating this back to cleartext gives

__M ATH _RO CKS

which is a message we can all believe in. $\square$
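The last two steps (padding each decrypted number to six digits and reading off characters) can be sketched in Python like this. The name block_to_text is made up for this illustration, and the list of numbers is copied from the right-hand sides of the decryption computation above rather than recomputed.

```python
ALPHABET = "_ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # "_" is 00, "A" is 01, ..., "Z" is 26

def block_to_text(value, block_size=6):
    """Pad a decrypted number to block_size digits and read two digits per symbol."""
    digits = f"{value:0{block_size}d}"
    return "".join(ALPHABET[int(digits[i:i + 2])] for i in range(0, block_size, 2))

decrypted = [13, 12008, 1815, 31119]   # values taken from the example above
print(" ".join(block_to_text(v) for v in decrypted))
# __M ATH _RO CKS
```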

Computations for Encryption

You may have noticed that the necessary computations for carrying out encryption can become pretty difficult to actually implement by hand, or even with a pocket calculator. Since you'll be asked to carry out some calculations like this for your homework, it would be beneficial for you to learn how to use some computer software to make these calculations easier. I'd suggest using GP/Pari since it has loads of built in features and is really easy to use. Alternatively, you could use one of the Macintosh labs and run similar commands on Mathematica. If you do decide to use GP/Pari, note that it is run off a command line (MSDOS prompt) instead of the "usual" graphical interfaces that other programs have. If this seems like too much work, just fire up Mathematica at a computer lab.

Using GP/Pari

Let me walk you through the process of encryption using Pari. Typically you input your code next to a prompt (like a ? character) and hit enter; Pari then evaluates and outputs the corresponding number. For instance, here's what I type in and see when I add 2 and 2 with Pari:

? 2+2
%1 = 4

Notice that the input is preceded by a question mark, and the output is preceded by a percent sign, a number, and an equals sign.

Let's get started. The process begins by selecting two (hopefully large) prime numbers. Pari makes this easy, since it has a built in command for selecting prime numbers. The function prime(n), where n is any number you want, spits out the nth prime. I'll just pick the 100th and 200th primes for p and q. I'll also store the product pq as m:

? p=prime(100)
%2 = 541
? q=prime(200)
%3 = 1223
? m=p*q
%4 = 661643

Now that I've stored these variables, whenever I use the letters p, q or m, Pari knows that I'm actually talking about 541, 1223 and 661643. Now I'm supposed to select a random number e which is relatively prime to $\phi(m)$. First I'll pick e using the random(n) function (again, n is any integer you want) which picks a random number less than n.

? e=random(2000)
%5 = 651

Now I need to check that e and $\phi(m)$ are relatively prime, so I run

? gcd(e,eulerphi(m))
%6 = 3

D'oh! Looks like they aren't relatively prime. No problem: I'll just keep selecting random e until I find something which is relatively prime.

? e=random(2000)
%7 = 1967
? gcd(e,eulerphi(m))
%8 = 1

Great, seems that $e = 1967$ will work. With this number e selected, I can now release my public key:

(8)
\begin{align} \mbox{My public key is }(661643,1967). \end{align}

Now suppose that I want to encode a message using 6-unit blocks. We'll keep this simple, just transmitting the message CAT. Now CAT corresponds to 030120, so to encode I know that I should compute

(9)
\begin{align} 30120^{1967} \mod{661643}. \end{align}

I can do this easily on Pari using the command:

? Mod(30120,661643)^(1967)
%9 = Mod(296395, 661643)

Voila! This means the encoded message is

296395.

Now that we know how to encode, let's figure out how to decode. For this, we know that we need to calculate d: the multiplicative inverse of e modulo $\phi(m)$. Fortunately, Pari makes finding this inverse super easy:

? Mod(e,eulerphi(m))^(-1)
%10 = Mod(414983, 659880)

This tells us that d should be taken as 414983. So let's enter that value into Pari:

? d=414983
%11 = 414983

Now we can start decoding. Suppose, for instance, that you wanted to decode the message

437381.

For this, we know we're supposed to compute

(10)
\begin{align} 437381^{414983} \mod{661643}. \end{align}

Pari makes easy work of this computation:

? Mod(437381,661643)^d
%12 = Mod(180120, 661643)

Translating into cleartext, the message 180120 becomes RAT.
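If you don't have Pari handy, the same session can be mirrored in Python. This is only a sketch of the steps above: the primes and exponents are copied from the session rather than generated, and $\phi(m)$ is computed as $(p-1)(q-1)$ instead of calling a built-in like eulerphi.

```python
from math import gcd

p, q = 541, 1223                 # the 100th and 200th primes, as in the Pari session
m = p * q
phi = (p - 1) * (q - 1)          # plays the role of eulerphi(m)
print(m, phi)                    # 661643 659880

print(gcd(651, phi))             # 3, so e = 651 is rejected
e = 1967
print(gcd(e, phi))               # 1, so e = 1967 is acceptable

d = pow(e, -1, phi)              # inverse of e modulo phi(m); needs Python 3.8+
print(d)                         # 414983

C = pow(30120, e, m)             # encrypt the block for CAT
print(pow(C, d, m))              # 30120, i.e. 030120 once padded
```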

Using Mathematica

Similar computations can be carried out using Mathematica. Here are the necessary functions.

  • $5^3 \mod{12}$ is computed by running the command PowerMod[5,3,12].
  • The inverse of 5 mod 12 is computed by running PowerMod[5,-1,12].
  • $\phi(12)$ is computed by running EulerPhi[12]
  • A random integer between a and b is created by running RandomInteger[{a,b}]
  • The GCD between two integers a and b is computed by running GCD[a,b]
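
For comparison, here is a rough Python counterpart to each command in the list above. Python has no built-in Euler phi function in its standard library, so euler_phi below is a small brute-force helper written for this illustration (fine for small numbers, far too slow for real key sizes).

```python
import random
from math import gcd

def euler_phi(n):
    """Count the integers 1..n that are relatively prime to n (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

print(pow(5, 3, 12))         # 5, since 125 = 10*12 + 5
print(pow(5, -1, 12))        # 5, since 5*5 = 25 = 2*12 + 1 (needs Python 3.8+)
print(euler_phi(12))         # 4 (namely 1, 5, 7, 11)
print(random.randint(3, 9))  # a random integer from 3 to 9, endpoints included
print(gcd(12, 18))           # 6
```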