RSA Algorithm

We talked about the ideas behind public key cryptography last lecture. How do we implement such a scheme? This is where number theory comes to the rescue in the form of the RSA cryptosystem. RSA stands for Rivest, Shamir and Adleman, who discovered the scheme in 1977. Clifford Cocks had independently discovered it earlier, in 1973, but his work was classified and remained unknown for many years.

A link to the original RSA paper is here.

Messages As Numbers

First, let us get some preliminary concepts out of the way. We will regard messages as numbers. The idea is that your message is encoded as a number through a scheme such as ASCII. The rest of this presentation will deal with encrypting and decrypting numbers.
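As a minimal sketch of this encoding step, assume the message is plain ASCII text and that its bytes are read as one big-endian integer. The helper names text_to_number and number_to_text are illustrative, not part of any standard scheme.

```python
def text_to_number(text):
    """Interpret the ASCII bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("ascii"), "big")


def number_to_text(number):
    """Invert text_to_number by turning the integer back into bytes."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big").decode("ascii")


m = text_to_number("Hi")
print(m)                  # 72 * 256 + 105 = 18537
print(number_to_text(m))  # Hi
```

A longer message simply yields a larger number, which is one reason n has to be large.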

Basic Idea

Suppose Alice wishes to send Bob a message M that she wants Bob and no one else to read. In a public key system, she obtains Bob's public key and uses it to encrypt M, obtaining an encrypted message c. This is sent to Bob.

Upon receiving the message from Alice, Bob decrypts it using his private key. No one else can decrypt the message unless they have Bob's private key.

How do we implement this in practice?

The basic idea behind RSA is to create a one-way function F that, given a message M and a public key \mbox{publicKey}, encrypts the message by computing c = F(M,\mbox{publicKey}) to yield the cipher c.

The reason F is called one-way is that

  • Given M and \mbox{publicKey}, it is easy to compute c=F(M,\mbox{publicKey}) to encrypt M.

  • However, someone who is snooping the channel sees c and \mbox{publicKey}. It is computationally hard to invert F and compute the original message M = F^{-1}(c,\mbox{publicKey}).

However, if it is hard for someone to invert F, how is it that Bob can do so?

  • We will assume that using the private key, Bob has a function G that will allow him to compute M = G(c,\mbox{privateKey}).

So our scheme should give us functions F,G where F is easy to compute but very hard to invert. On the other hand, G is easy to compute and inverts F if the private key is available.

Number Theory to the Rescue

The basic scheme for RSA uses a really large number n.

  • The public key is a pair of numbers (e,n)

  • The private key is a pair of numbers (d,n).

Each message M is assumed to be a number between 1 and n-1. A really large n gives us a large space of numbers with which to encode our messages.

The basic idea is to encrypt a message M by computing  c = M^e mod n. This operation is called modular exponentiation. It is computationally inexpensive to compute even though M, e and n are typically large numbers in an RSA implementation.

Example

Let us take a message M=1098 and encrypt it using the public key e=13, n = 41989.

We will simply compute

 c = (1098)^{13} mod 41989 = 28519

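At first sight, this modular exponentiation looks like an atrociously hard computation. But remember that mod is infectious :-). That is, (a \times b) mod n = ((a mod n) \times (b mod n)) mod n, so we can reduce modulo n after every multiplication and never handle a number larger than n^2.

Therefore, we can compute this by repeated squaring and modulo operations. Since 13 = 8 + 4 + 1, we have M^{13} = M^8 \times M^4 \times M. Here is how we compute this for our example using the Python interpreter: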

Example Computation

bash$ python

Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> c1 = (1098 ** 2) % 41989   # 1098^2 mod n
>>> c1
29912
>>> c2 = (c1 ** 2) % 41989     # 1098^4 mod n
>>> c2
26132
>>> c3 = (c2 ** 2) % 41989     # 1098^8 mod n
>>> c3
14317
>>> c4 = (c3 * c2) % 41989     # 1098^(8+4) = 1098^12 mod n
>>> c4
9854
>>> c5 = (c4 * 1098) % 41989   # 1098^(12+1) = 1098^13 mod n
>>> c5
28519
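The hand computation above generalizes to any exponent. The following is a minimal sketch of repeated squaring (often called square-and-multiply); mod_exp is a hypothetical helper name, and Python's built-in three-argument pow does the same job.

```python
def mod_exp(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by repeated squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                 # lowest remaining bit is 1
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent >>= 1                   # move on to the next bit
    return result


print(mod_exp(1098, 13, 41989))   # 28519
print(pow(1098, 13, 41989))       # the built-in pow agrees
```

The loop runs once per bit of the exponent, so the number of multiplications grows with the length of e rather than with e itself.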

However, a snooper who simply sees the encrypted message c and the public key (e,n) will find it very hard to figure out the message M. This problem is considered to be a computationally hard problem, and is called the RSA problem.

So far, we have identified our one-way function F(M, (e,n)) = M^e mod n, which is given by modular exponentiation. We have claimed that inverting F when given c and the public key (e,n) is a computationally hard problem. To date, there is no known efficient algorithm for it.

Now the idea is to find a private key (d,n) that satisfies c^d mod n = M. In other words, if we take the encrypted message c and perform modular exponentiation with the private key (d,n), we obtain the original message M back.

This means that we need e,d to satisfy the property that for any message M,

 (M^{e})^{d} mod n = M mod n

In other words, we require that

 M^{e \times d} mod n  = M mod n .

The question is how we can find a pair (e,d) that satisfies this relation and for which, given (e,n), it is hard to find the corresponding d.

Some Preliminaries

We say that two numbers m,n are relatively prime if they have no prime factors in common. In other words, m,n are relatively prime if their greatest common divisor GCD(m,n) = 1.

Example

10 and 21 are relatively prime numbers. Their prime factorizations are 10 = 5 \times 2 and 21 = 7 \times 3. Thus, they have no prime factors in common.

On the other hand, 15 and 39 are not relatively prime. They have a prime factor 3 in common.

By convention, GCD(n,1) = 1 for all n. Therefore, 1 is taken to be relatively prime to every other non-zero number.
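As a small sketch, the greatest common divisor can be computed with Euclid's algorithm, which avoids factoring either number. The function below is written out for illustration; Python's standard library also provides math.gcd.

```python
def gcd(m, n):
    """Euclid's algorithm: repeatedly replace (m, n) by (n, m mod n)."""
    while n != 0:
        m, n = n, m % n
    return m


print(gcd(10, 21))  # 1, so 10 and 21 are relatively prime
print(gcd(15, 39))  # 3, so 15 and 39 are not
```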

Euler's Theorem

You can skip this section on a first read

The answer comes from a result proved by Euler in 1763, called Euler's totient theorem. We will first examine the totient function.

Totient of a Number

Given a number n, its totient \varphi(n) is the total number of integers between 1 and (n-1) that are relatively prime with n.

As an example, consider \varphi(15). The numbers between 1 and 14 that are relatively prime to 15 are \{ 1, 2, 4, 7, 8, 11, 13, 14 \}.

Therefore \varphi(15) = 8.
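The definition can be checked directly by brute force. This is only a sketch for small n: it tests every candidate in turn, so it is far too slow for numbers of cryptographic size.

```python
from math import gcd


def totient(n):
    """Count the integers in 1..n-1 that are relatively prime to n."""
    return sum(1 for i in range(1, n) if gcd(i, n) == 1)


print([i for i in range(1, 15) if gcd(i, 15) == 1])  # [1, 2, 4, 7, 8, 11, 13, 14]
print(totient(15))                                   # 8
```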

Interestingly, for a general number n, \varphi(n) is rather hard to compute unless the prime factorization of n is known. But there are numbers for which it is easy.

Totient Function for Prime Numbers

Theorem: The totient function \varphi(p) for a prime number p is simply p - 1.

The proof is simply to note that every number in \{1,2,\ldots,p-1\} is relatively prime to p.

Another important result about Totient functions is as follows:

Totient functions for products of primes

If p and q are prime numbers such that p \neq q, then \varphi(p \times q) = \varphi(p) \times \varphi(q) = (p-1) \times (q-1).

Let us look at all the numbers in 1,\ldots,pq that have a common factor with pq. These can only be multiples of p or multiples of q. There are q multiples of p (namely p, 2p, \ldots, qp) and p multiples of q, and pq itself appears in both lists, so there are p + q - 1 such numbers. Therefore, \varphi(pq) = pq - (p + q - 1) = (p-1) \times (q-1).
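The counting argument can be replayed on a small case. The sketch below uses p = 5 and q = 17, chosen only for illustration.

```python
p, q = 5, 17
n = p * q

multiples_of_p = set(range(p, n + 1, p))   # p, 2p, ..., qp: q numbers
multiples_of_q = set(range(q, n + 1, q))   # q, 2q, ..., pq: p numbers
sharing_a_factor = multiples_of_p | multiples_of_q   # pq lies in both sets

print(len(sharing_a_factor))       # 21 = p + q - 1
print(n - len(sharing_a_factor))   # 64 = (p - 1) * (q - 1)
```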

Here is what Euler's theorem says:

Euler's Theorem

Let a,n be relatively prime numbers. We have a^{\varphi(n)} mod n = 1.

The proof is rather elegant. Let S_1: \{ n_1, \ldots, n_k \} be the set of all the numbers between 1 and n-1 that are relatively prime with n. Here k = \varphi(n) and a mod n belongs to this set. Now consider S_2: \{ a n_1 mod n, a n_2 mod n, \ldots, a n_k mod n \}. It is possible to show that S_2 and S_1 are, in fact, the same set: each a n_i mod n is relatively prime with n, and no two of them coincide because a can be cancelled modulo n.

Therefore,

\begin{array}{rcl} n_1 \times n_2 \times \cdots \times n_k mod n &=& (a n_1) \times (a n_2) \times \cdots \times (a n_k) mod n \\ &=& a^k (n_1 \times \cdots \times n_k) mod n \end{array}

Since the product n_1 \times \cdots \times n_k is relatively prime with n, we can cancel it from both sides. We therefore conclude that a^k mod n = 1, or in other words a^{\varphi(n)} mod n = 1.
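Both steps of the proof can be observed on a small case. The sketch below picks n = 15 and a = 7 purely for illustration.

```python
from math import gcd

n, a = 15, 7
s1 = [x for x in range(1, n) if gcd(x, n) == 1]   # the set S_1
s2 = sorted((a * x) % n for x in s1)              # the set S_2, reduced mod n

print(s1 == s2)             # True: multiplying by a only permutes S_1
print(pow(a, len(s1), n))   # 1, that is, a^phi(n) mod n = 1
```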

RSA Scheme

We are now ready to talk about the basic RSA scheme.

The idea is to choose two different large prime numbers p,q and compute n = p \times q. Let us assume p < q, in general.

We now wish to find a pair e and d for the public and private keys such that for any message M, we have M^{ed} mod n =M mod n.

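Choosing any message M with 2 \leq M \leq (p-1), M is smaller than both primes and hence relatively prime to n, so we can use Euler's theorem to guarantee that

 M^{\varphi(n)} mod n = 1 .

Therefore, if we choose any positive v, we still obtain

M^{v \varphi(n)} mod n = 1^v mod n = 1.

Therefore, M^{v \varphi(n) + 1} mod n = 1 \times M mod n = M for any such M. The identity in fact holds for every M from 0 to n-1 because n is a product of two distinct primes, but showing this takes a separate argument using the Chinese remainder theorem.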
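For a small modulus the identity can simply be tested exhaustively. The sketch below uses p = 5 and q = 17 as illustrative values and checks every M from 0 to n-1, including those that share a factor with n.

```python
p, q = 5, 17
n = p * q
phi = (p - 1) * (q - 1)

for v in (1, 2, 3):
    exponent = v * phi + 1
    # Raising to the power v*phi(n) + 1 should return M unchanged.
    assert all(pow(M, exponent, n) == M for M in range(n))

print("M^(v*phi(n)+1) mod n == M for every M in 0..n-1")
```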

Our goal is now to find e,d such that

 e \times d = v \times \varphi(n) + 1 .

In other words, e \times d = 1 mod \varphi(n).

We take the following steps to do so:

  • Choose large primes p,q with p < q.

  • Let n = p \times q.

  • Let k = \varphi(n) = (p-1) \times (q-1).

  • Choose a number e with 1 < e < k that is relatively prime with k.

  • Compute d such that e \times d = v \times k + 1 for some v. This is done using the extended Euclidean algorithm.

Finally, we discard p,q,k,v and simply retain e,d,n. We publish (e,n) as the public key and retain (d,n) as the private key.
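The steps above can be sketched as follows. The names extended_gcd and make_keys are hypothetical, the primes are tiny, and the code assumes without checking that p and q really are distinct primes, so this is an illustration rather than a usable key generator.

```python
from math import gcd


def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def make_keys(p, q, e):
    """Build (public, private) keys from primes p, q and exponent e."""
    n = p * q
    k = (p - 1) * (q - 1)
    if not (1 < e < k) or gcd(e, k) != 1:
        raise ValueError("e must satisfy 1 < e < k and gcd(e, k) == 1")
    _, d, _ = extended_gcd(e, k)   # gives e*d + k*y == 1
    d %= k                         # take the representative in 1..k-1
    return (e, n), (d, n)


public, private = make_keys(5, 17, 7)
print(public)    # (7, 85)
print(private)   # (55, 85)
```

Only e, d and n leave make_keys; p, q and k exist only inside the function, mirroring the instruction to discard them.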

Encryption

Encrypting a message M is performed by modular exponentiation with public key:

 c  = M^{e} mod n.

Likewise, decryption uses the private key (d,n).

Decryption

Given cipher-text c, we recover the original message as

 M = c^{d} mod n

Let us illustrate this:

  • Choose p=5 and q=17. We have n = 85 and k = 4*16 = 64.

  • Let us choose e = 7.

  • We have to find d,v so that 7 * d = 64 v  + 1. We have d = 55 and v = 6.

  • Verify that d * e  mod k = 55 * 7 mod 64 = 1.

  • The public key is (e,n) = (7,85). The private key is (d,n) = (55,85).

Encryption: Take a message M represented as a number from 1, \ldots, n-1. The encrypted value of M is M^e mod n.

Example: Using the public key (e,n) = (7,85) and message 12, we have C = \mbox{encrypt}(12) = {12}^{7} mod 85 = 58.

To decrypt, we compute C^d mod n = 58^{55} mod 85 = 12.
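The round trip in this example can be reproduced with a few lines. The functions encrypt and decrypt are illustrative names, and keys are written as (exponent, n) pairs.

```python
def encrypt(message, public_key):
    """c = M^e mod n"""
    e, n = public_key
    return pow(message, e, n)


def decrypt(cipher, private_key):
    """M = c^d mod n"""
    d, n = private_key
    return pow(cipher, d, n)


public_key, private_key = (7, 85), (55, 85)
c = encrypt(12, public_key)
print(c)                         # 58
print(decrypt(c, private_key))   # 12
```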

Breaking RSA

Let us assume that someone has access to the public key (e,n). What stops them from finding out d, the secret key?

After all, n = p q. Therefore, by factorizing n, we can find p,q and repeat the process for ourselves to compute k,v and d. Once d is known then the whole scheme goes kaput.
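For a small n this attack is easy to carry out. The sketch below factors n by trial division and then rebuilds d. The helper names are hypothetical, and the approach is only feasible because the modulus from the earlier example, n = 41989, is tiny.

```python
def smallest_factor(n):
    """Trial division up to sqrt(n); returns n itself if n is prime."""
    i = 2
    while i * i <= n:
        if n % i == 0:
            return i
        i += 1
    return n


def recover_private_exponent(e, n):
    """Recover d from the public key (e, n) by factoring n."""
    p = smallest_factor(n)
    q = n // p
    k = (p - 1) * (q - 1)
    # Look for v such that v*k + 1 is divisible by e; then d = (v*k + 1) / e.
    for v in range(1, e):
        if (v * k + 1) % e == 0:
            return (v * k + 1) // e
    raise ValueError("e is not relatively prime to (p-1)*(q-1)")


d = recover_private_exponent(13, 41989)
print(d)                      # 6397
print(pow(28519, d, 41989))   # 1098, the original message
```

The search over v stands in for the extended Euclidean algorithm and is reasonable only when e is small.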

Problem (Factoring) Given a number n that we are told is the product of two as yet unknown prime numbers p,q, find p,q. This is believed to be a hard problem.

In order to convince you that factoring a large number, say one with 10 digits, is hard, your first programming assignment, which will be out this Monday, asks you to write a factoring routine that, given a number n, finds a prime factor p of n. You can use any method to do so.

Combinatorially Hard Problems

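There are problems in CS for which no efficient (polynomial time) algorithm is known. Many of them belong to the class NP, which stands for Non-Deterministic Polynomial Time: problems whose solutions are easy to check once found, even though finding them may be hard.

Claim Factoring a number n is an example of such a problem: no polynomial time algorithm for it is known, yet a proposed factor is easy to check by division.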

Naive Algorithm

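#define NO_FACTOR (-1)   /* returned when n has no factor in 2..n-1 */

/* Trial division: return the smallest factor of n in 2..n-1, if any. */
int factor(int n){
   int i;
   /* Start at 2: 0 cannot be a divisor and 1 divides every number. */
   for (i = 2; i < n; ++i)
      if (n % i == 0) return i;

   return NO_FACTOR;
}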

The time taken by this naive algorithm is roughly 2^{\mbox{size of number in bits}}. However, does that preclude a clever and faster algorithm?

The best known factoring algorithm is the general number field sieve. Its running time grows more slowly than any exponential in the size of n, but still faster than any polynomial. It has been used to factor RSA moduli of up to 250 decimal digits (829 bits).