Sudoku code: another encryption scheme

Sudoku code: another encryption scheme

« Back to archives :: 06 Oct 2013 :: Categories: useless, encryption

Another useless encryption scheme devised by yours truly. While my last one (pi code) was primarily a substitution cipher, this one transcends all standard classifications; it's almost like a transposition cipher, but not really. The main idea here is the use of certain numbers in a particular Sudoku grid. The strength (if any) in this method lies in its unexpected nature; it certainly takes quite a leap of the imagination to correctly deduce the method from the ciphertext. Of course, once the method has been discovered, deciphering merely involves solving a Sudoku grid and then figuring out the substitution cipher used, meaning that the key is easy to determine and so this method kind of just looks at Kerckhoffs's Principle and then keeps walking. But that's okay, that's why this is filed under useless.

The setup

The first step is to find a Sudoku puzzle. I happened to have a Sudoku game installed, so I just ran that and started a random puzzle, which looked like this:

5 1 7 8
3 1 9 5 4
3 1 5
2 8
4 7 5 9
9 4
6 5 2
2 6 5 4 8
4 3 7 2

The (unique) solution to the above puzzle looks like this:

5 2 4 1 7 9 3 8 6
3 1 9 6 5 8 4 2 7
8 6 7 3 2 4 9 1 5
2 9 5 4 8 3 6 7 1
4 8 6 7 1 5 2 3 9
1 7 3 2 9 6 8 5 4
6 5 1 8 4 2 7 9 3
7 3 2 9 6 1 5 4 8
9 4 8 5 3 7 1 6 2

Clearly, each solved grid can be represented by a sequence of 81 numbers. Furthermore, due to the uniqueness of the solution, the initial grid and the solved grid are equivalent in the sense that the initial grid uniquely determines the solved grid. So we represent the initial grid as follows (with 0s standing for spaces):

500170080319050400000300015200080000400705009000090004650002000002060548040037002

This results in the following solution grid:

524179386319658427867324915295483671486715239173296854651842793732961548948537162

Now, depending on the length of the plaintext, we can choose somewhere between one and five numbers to use as the "holes" in the grid. For example, if we had a plaintext of 45 characters or fewer, we could choose the numbers 1, 2, 5, 6 and 9. Removing those numbers from the grid would leave us with this:

4 7 3 8
3 8 4 7
8 7 3 4
4 8 3 7 1
4 8 7 3
7 3 8 4
8 4 7 3
7 3 6 4 8
4 8 3 7

This has an entirely different arrangement of empty spaces than the initial grid. We can represent it using the following string:

50017008031905040000030001520008000040070500900009000465000200000206054804003700212569

We can then use the empty spaces to write our message, and add characters in the rest of the spaces to try and flatten frequencies. A simple substitution cipher should be used as well to prevent the ciphertext from looking too much like a transposition cipher. The alphabetic characters can then be interspersed with the numerals above, with the resulting string separated into chunks of 32 characters (with extra alphabetic characters appended as necessary, since only 81 are needed) so as to superficially resemble MD5 (because that's just fun).

Optimally, the solutions to the possible Sudoku grids being used should be stored in a dictionary somewhere, so that each grid only has to be solved once. The point of this encryption scheme is to be diabolically tricky but solvable, although perhaps only if you figure out the Sudoku aspect and know a crib or two.

Encoding and decoding a sequence

Let's say you want to encode:

admiral yamamoto to visit solomon islands eight am

First, remove spaces:

admiralyamamototovisitsolomonislandseightam

This is 43 characters, so let's aim for one grid with 5 numbers. This means there will be 2 extra invalid characters and 36 nonsense characters. Since some spaces will be fairly large, it's a good idea to use a simple substitution cipher as well (or even a less simple one like Vigenère for additional fiendishness). In this case, we'll use the reverse alphabet cipher:

abcdefghijklm
zyxwvutsrqpon

So the encrypted plaintext becomes

admiralyamamototovisitsolomonislandseightam
zwnrizobznznlglglerhrghlolnlmrhozmwhvrtsgzn

Now put it in the grid:

z w 4 n 7 r 3 8 i
3 z o b z 8 4 n 7
8 z 7 3 n 4 l g l
g l e 4 8 3 r 7 h
4 8 r 7 g h l 3 o
l 7 3 n l m 8 r 4
h o z 8 4 m 7 w 3
7 3 h v r t s 4 8
g 4 8 z 3 7 n c a

(The last two characters are the extra invalid characters used to fill up the empty spaces.)

Now replace the remaining numbers with random letters, chosen so as to flatten the letter frequencies as much as possible:

z w e n a r o t i
q z o b z a e n s
f z t r n m l g l
g l e e a s r h h
t r r a g h l m o
l c c n l m z r v
h o z l p m p w p
a e h v r t s b d
g g f z e q n c a

Written as one line, the above grid would look like:

zwenarotiqzobzaensfztrnmlglgleeasrhhtrraghlmolccnlmzrvhozlpmpwpaehvrtsbdggfzeqnca

Recall that the original grid looks as follows:

500170080319050400000300015200080000400705009000090004650002000002060548040037002

With the numbers used added to the end of the grid, it might look like this:

50017008031905040000030001520008000040070500900009000465000200000206054804003700215926

Now, we randomly combine the two: (split across two lines here due to design considerations)

z50w0ena1r7o008tiq0zo3bz1ae9nsf0z5t0r4n0mlgl0g0le0e0as3rhh00tr0rag1h5lm2ol00ccn0l8mz
r0v0ho0z0400lp7mpwp0a500eh9v0r0t0sbd090gg0046f500ze02000q0020n6054c80400a3700215926

We could even put in extra characters at the end, since only the first 81 characters will be used:

z50w0ena1r7o008tiq0zo3bz1ae9nsf0z5t0r4n0mlgl0g0le0e0as3rhh00tr0rag1h5lm2ol00ccn0l8mzr0v
0ho0z0400lp7mpwp0a500eh9v0r0t0sbd090gg0046f500ze02000q0020n6054c80400a3f70gg02a1e59f26x

Here's how it would look if we split it up into 32-character chunks:

z50w0ena1r7o008tiq0zo3bz1ae9nsf0 z5t0r4n0mlgl0g0le0e0as3rhh00tr0r ag1h5lm2ol00ccn0l8mzr0v0ho0z0400 lp7mpwp0a500eh9v0r0t0sbd090gg004 6f500ze02000q0020n6054c80400a3f7 0gsssg02ea1e5dllg9famezelr2ezz6x

And there you go. Not indecipherable but certainly very misleading. To decipher, first separate the numerals from the alphabetic characters, then write out the first 81 characters within a 9x9 grid. Then fill another 9x9 grid with the first 81 numerals (one digit per box), omitting the zeros, and solve the resulting Sudoku puzzle. Take the remaining (non-zero) digits and highlight the positions in the grid corresponding to those digits. The characters corresponding to the non-highlighted positions can be discarded. Now it only remains to reverse the substitution cipher and insert spaces where necessary.

Challenge:

dszb5c0ygo01ize7c0u0n80s31rwq9m0 5f0400ll0r00mw30atdx00r15gzs2000 800jq0z0n400bz7j05obo00rtbt90u0v f0u0oo9r0v00pmfvzoxxmg4650f00g20 0mnz000o20rgo605ub48ubzxxo040a03 70kteo02osz3dfle4eraosme2tn5sddq

Implementation in Python

All the relevant code is available in the repository for this website, under code/sudokucode. The 3 main files are shown below.

utils.py

import json
import re
import random
import string


A_ORD = ord('a')
NOT_LETTERS_RE = re.compile('[^a-z]')
LETTERS_RE = re.compile('[a-z]')
NUMBERS_RE = re.compile('[0-9]')
POSSIBLE_DIGITS = map(str, range(1, 10))


# Does the reverse alphabet cipher thing on a string`
def reverse_cipher(plaintext):
    real_letters = []
    for letter in plaintext:
        real_letters.append(chr(A_ORD + 25 - (ord(letter) - A_ORD)))
        # whatever it works don't h8

    return ''.join(real_letters)


def contains_only_letters(plaintext):
    return NOT_LETTERS_RE.search(plaintext) is None


def sometimes():
    return random.choice((True, False))


class SudokuCoder:
    def __init__(self):
        try:
            self.grids = json.load(open('grids.json'))
        except IOError:
            # No grids in memory - can only encode, not decode.
            self.grids = {}

    def encode(self, plaintext):
        plaintext = plaintext.lower()
        num_letters = len(plaintext)
        letters = reverse_cipher(plaintext)

        if not contains_only_letters(plaintext):
            raise CannotEncodeError('Can only contain alphabetic characters.')

        if num_letters > 45:
            raise CannotEncodeError('Maximum plaintext length: 45 characters.')

        # Randomly choose a grid.
        initial_grid = random.choice(self.grids.keys())
        grid = self.grids[initial_grid]

        # Randomly choose some digits to use as the holes.
        num_digits = num_letters / 9 + 1
        digits = random.sample(POSSIBLE_DIGITS, num_digits)

        letter_index = 0
        new_grid = []

        # Now replace all the hole digits with the plaintext.
        for digit in grid:
            if digit in digits and letter_index < num_letters:
                new_grid.append(letters[letter_index])
                letter_index += 1
            else:
                # Choose a random character
                # For both the extra ones and the nonsense ones
                new_grid.append(random.choice(string.lowercase))

        # Add extra characters depending on the number of digits
        for digit in xrange(num_digits):
            new_grid.append(random.choice(string.lowercase))

        total_digits = initial_grid + ''.join(digits)

        # Now randomly combine them.
        grid_length = len(new_grid)
        total_length = grid_length * 2
        ciphertext = []
        letter_index = 0
        digit_index = 0

        while letter_index < grid_length or digit_index < len(total_digits):
            if ((sometimes() and letter_index < grid_length) or
                digit_index == len(total_digits)):
                ciphertext.append(new_grid[letter_index])
                letter_index += 1
            else:
                ciphertext.append(total_digits[digit_index])
                digit_index += 1

        return ''.join(ciphertext)
                        
    def decode(self, ciphertext):
        ciphertext = ciphertext.lower()

        # Get the grid numbers (the first 81 digits).
        all_numbers = NUMBERS_RE.findall(ciphertext)
        initial_grid = ''.join(all_numbers[:81])
        hole_numbers = all_numbers[81:]
        all_letters = LETTERS_RE.findall(ciphertext)
        grid_letters = all_letters[:81]

        # Check if the solution to this initial grid exists.
        if not initial_grid in self.grids:
            raise GridNotFoundError

        # Get the list indices of the hole numbers.
        solution_grid = self.grids[initial_grid]
        hole_indices = []

        for i in xrange(len(solution_grid)):
            if solution_grid[i] in hole_numbers:
                hole_indices.append(i)

        hole_letters = [grid_letters[index] for index in hole_indices]
        plaintext = reverse_cipher(hole_letters)

        return plaintext

    def add_grid(self, initial, solution):
        if not initial in self.grids:
            self.grids[initial] = solution
            json.dump(self.grids, open('grids.json', 'w'))
        else:
            raise GridAlreadyExists


class GridNotFoundError(Exception):
    pass


class CannotEncodeError(Exception):
    pass


class GridAlreadyExists(Exception):
    pass

encode.py

import sys
import utils


if len(sys.argv) == 2:
    sudoku_coder = utils.SudokuCoder()
    print sudoku_coder.encode(sys.argv[1])
else:
    print "Usage: python encode.py [plaintext]"

decode.py

import sys
import utils


if len(sys.argv) == 2:
    sudoku_coder = utils.SudokuCoder()
    print sudoku_coder.decode(sys.argv[1])
else:
    print "Usage: python decode.py [ciphertext]"

Alternatives

Instead of encrypting the Sudoku puzzle itself, use a standard one. for example, if you and your imaginary recipient both have access to the New York Times, which we will pretend has a daily Sudoku puzzle if it doesn't actually, then just use that. Then, the only numbers you'd have to send along with the plaintext (and the extra characters) are the hole digits. This means that the key itself (or at least the prelude to the key) does not have to be distributed along with the message, resulting in a somewhat more secure scheme (for a given value of "secure"). This will work as long as you ensure that the Sudoku puzzle that you choose always has a unique solution.

Related posts