CAAM 210 Project 12: Cryptography and Simple Ciphers
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
CAAM 210
Project 12: Cryptography and Simple Ciphers
This is a pledged project! Your cryptography code will be based on randomized algorithm, so you may need few trials before getting ”good” results!
Code is the invisible foundation of our modern lifestyle. Everyday, millions of financial transactions are made and billions of emails are sent over open networks. The reason that these transactions and emails are delivered without being stolen or intercepted (in most cases) is encryption.
In this project you will decode simple ciphers. A simple cipher is a code where each character in the encoded message corresponds to one character in the decoded message. For example “XPEEMKSMDEC” might correspond to “HELLO WORLD” . Note that here a space is considered a character. Before we begin decoding, we must learn how handle strings in Python. A string is a sequence of characters; for example, ‘ksbgpisqvbapikfv’ (yes, I did just mash the keyboard). It is important to realize that characters are not limited to letters or numbers. Indeed, ‘$’, ‘#’, and ‘g’ are all examples of characters. Furthermore, there are special characters like ‘space’, ‘tab’, ‘newline’, and ‘endoffile. ’ For your computer to read these characters, each character corresponds to a number. While there are numerous character code schemes, the most common is ASCII which encodes each character with a number 0-127. This scheme incorporates most of the characters on a standard English keyboard. For example, the ASCII code for ‘A’ is 65, while the code for ‘a’ is 97. To convert a character to an ASCII number, use the Python “function” ord:
>>s=ord(’h’)
s =
104
You can convert it back using the “function” chr:
>> chr(s)
’h’
(The word function is in quotation marks above because ord and chr are actually data types, and you are simply telling Python to change the data type.)
For this project, the encrypted messages contain 27 different characters: a through z and the accent mark (`). Each of those 27 characters will correspond to a character in the deciphered message, which should only contain the letters a through z and the space character. After you decode the message you will need to change all of the occurrences of ` to an empty space. Create a function downlow which converts the accent (`) and the letters a through z to the numbers 0 through 26 respectively (i.e., downlow(’z’) = 26). Additionally, you should create the inverse function downlowinv which takes a number 1 to 26 and returns the corresponding character (i.e., downlowinv(26) = ‘z’).
The reason for this translation is that during the process of decoding the message, the algorithm will “guess” which symbol in the encoded message corresponds to which symbol in the deciphered message. It will be helpful to use a 1 × 27 vector, which we will denote y, to keep track of the guess. This vector serves as a cipher key that is used to decode the message. If the cipher key is correct, the message will be decoded correctly. For example, if your 1 × 27 cipher key is [0, 1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
then you are guessing that ‘d’ in the encoded text stands for ’e’ in the decoded message. This is because the 5th component of our vector is 5, so the key maps the fourth letter in the alphabet (’d’) to the character associated to 5, the letter (‘e’). Don’t forget that the first component of the vector is reserved for the accent symbol (`) which we are using to represent a space in the decoded message.
The method we are going to use to decipher our messages is called the Metropolis Algorithm. The Metropolis Algorithm is a special case of something called Markov Chain Monte Carlo. In a Monte Carlo method, the problem is solved by trying random cases to determine the answer. This might seem silly, as with 26 letters and the space character there are 27! (factorial) possibilities for decoding a simple cipher, so it may take many random tries until you stumble on the right answer. However, what makes the Metropolis Algorithm a good option is that it is able to stumble on the correct answer by using “strategically random” guesses.
In order to make strategic guesses, we need to have some measure of how good a guess is. This requires us to make some assumptions:
· our code is a simple cipher,
· the message is in English, and
· the message contains typical patterns of letters.
In general, these assumptions may not be valid, however, we will see that they can be enough to decode seemingly hopeless messages. We will use letter transition probabilities in order to calculate the nОg- nklenkiООd fuoctkОo that we will use as our measure. The letter transition probabilities measure how often one letter is followed by another. For example, Q is almost always followed by U. Hence, if your guess results in a message with numerous Q’s followed by other letters that are not U, then your guess for the cipher is probably wrong. The transition probabilities are given to you in the file letterprob .npy, which can be imported into Python using the numpy .load function. The file is set up as a matrix where the first row and first column correspond to the space character and all subsequent rows and columns correspond to the alphabet in order. For example the 15th row and column correspond to to the 14th letter of the alphabet, ‘n’ . Hence, the reason that the downlow function uses the same numbering. The i, j entry of this matrix is the probability that the j-th letter follows directly after the i-th letter. If we denote this matrix M, then the log-likelihood function is given by
log(M(y(tk), y(tkk1))),
k
where the sum is over all consecutive pairs of letters in your message, tk represents the k-th character in the encoded message, y(tk) represents the guess for the k-th letter in the message and M(., .) is the corresponding entry in the transition probability matrix. In other words, this function calculates the relative log-likelihood of how good your guess is. Create a function loglike that takes in the coded text (as an array of numbers 0 through 26) and your guess (a 1 × 27 vector) and computes the log-likelihood of your guess given the inputted text.
We implement the Metropolis Algorithm as follows:
1. Start with a random guess, y (you can implement this by choosing a random permutation of 0 through
26 with numpy .random .permutation(27)).
2. Consider another guess, ymaybe that is generated from y by switching two elements randomly.
3. Compute the log-likelihood of y and ymaybe .
4. If the log-likelihood of ymaybe is higher (i.e., less negative), then use ymaybe as your next guess.
5. If this is not the case, use ymaybe as your new guess with probability exp[_loglike(y)+loglike(ymaybe)]. otherwise, use y as your guess in the next iteration.
6. Repeat steps two through five until the message is decoded.
This algorithm randomly makes guesses at how to decode the message, but strategically determines whether or not those random guesses are any good. To ensure that this algorithm decodes the message, you should use 104 iterations. Make sure you do few trials. This may take some time to run on your computer (depending upon the processing power of your computer and the efficiency of your code- this could be anywhere from a few minutes to over an hour). Test, debug, and troubleshoot your code on the first encoded message, as this is the shortest message and will be the message on which your code will run the fastest.
For this project you are required to implement this algorithm for the encoded message provided. The message can be imported into Python as a string using the follwoing piece of code:
with open(’encodedtext .txt’, encoding=’utf8’) as f:
for line in f:
TEXT = line .strip(’encodedtext .txt’)
TEXT = TEXT[0: -1]
Create a function decoder which takes in the encoded message and a maximum number of iterations and returns the decoded message. Unlike other projects, I cannot tell you what the output is because that defeats the whole fun of cryptography. You will most likely recognize the decoded messages (or at the very least, Google will). Make sure that your code returns the decoded message and that your driver has some way of displaying the decoded message.
Your Python file should include the following functions:
def downlow(text):
# Converts the inputted characters to numbers between 0 to 26
return numbers
def downlowinv(numbers):
# Converts the inputted number between 0 to 26 to its respective character
return text
def decoder(text, maxiter):
# Decodes the given text using the Metropolis Algorithm
return y
def loglike(text, guess, letterprob):
# Computes the value of loglike funtion for a given transition probability matrix letterprob and k return value
Your work will be graded as follows:
10 pts for correct headers
10 pts for further comments in code and correct indentation
10 pts for correct formulation downlow
10 pts for correct formulation downlowinv
20 pts for correct formulation of loglike
20 pts for correct formulation of decoder
20 pts for correct output
2022-12-02