Codegate CTF 2023 Preliminary - anti Kerckhoffs

Share on:

Overview

Summary

This is a writeup for the “anti Kerckhoffs” challenge at the Codegate 2023 Preliminaries. This challenge was built around basic abstract algebra, exploiting e.g. that evaluation of quadratic polynomials is not bijective and information about them can be recovered by comparing the cardinality of preimages. We were the first team to solve this challenge, which in total only four teams solved honestly (a fifth team’s solution was stolen from us and uncovered a spy).

Solved by: shalaamum

Writeup by: shalaamum

Writeup first published here.

Understanding the challenge code

We are given a Python file prob.py with 86 lines, as well as connection details for a remote instance. Let us look through the code first.

The code begins with some constants that we will keep in mind:

QUERY_LIMIT = 77777
MOD = 17
m = 20

In particular, we will see that all calculations are (ultimately) done modulo the prime MOD = 17.

The first two functions are the following.

def quadratic_eval(coeffs, x):
    return (coeffs[2] * x ** 2 + coeffs[1] * x + coeffs[0]) % MOD

def get_random_quadratic():
    c2 = secrets.randbelow(MOD - 1) + 1
    c1 = secrets.randbelow(MOD)
    c0 = secrets.randbelow(MOD)
    return [c0, c1, c2]

The function get_random_quadratic returns a list with three components, the first two of which are random integers from 0 to 16 (inclusive) and the last one a random integer from 1 to 16. Interpreting these as the coefficients of a polynomial, this returns a random quadratic polynomial over GF(17); the reason 0 is excluded for the last component is to ensure the polynomial is indeed of degree 2. The function quadratic_eval then evaluates such a polynomial at the second argument.

The next function we see is calc, which depends on some globals we have not seen yet:

def calc(Inputs):
    Outputs = [quadratic_eval(SboxLayer1[i], Inputs[i]) for i in range(m)]
    Outputs = [sum(LinearLayer[i][j] * Outputs[j] for j in range(m)) for i in range(m)]
    Outputs = [quadratic_eval(SboxLayer2[i], Outputs[i]) for i in range(m)]
    return Outputs

We can assume here that Inputs is supposed to be some iterable with m components. The globals SboxLayer1 and SboxLayer2 are likely lists of m many quadratic polynomials, while LinearLayer is an m x m matrix. calc applies the polynomials from SboxLayer1 componentwise to Inputs, then the resulting vector is multiplied with from the left with the matrix LinearLayer, and finally the polynomials from SboxLayer2 are applied componentwise again, giving as an end result a list with m components from 0 to 16.

The remaining functions are less interesting helper functions: print_list does what it says, and list_to_int and int_to_list convert between lists and ints, where the lists are interpreted as the coefficients of the corresponding int when written to base 17. The function compare takes two lists as input and returns an integer whose bits indicate which components of the lists were equal and which weren’t.

Now let us look at the globals generated by the challenge.

SboxLayer1 = [get_random_quadratic() for _ in range(m)]
LinearLayer = [[secrets.randbelow(MOD-1) + 1 for _ in range(m)] for _ in range(m)]
SboxLayer2 = [get_random_quadratic() for _ in range(m)]

HIDDEN = [secrets.randbelow(MOD) for _ in range(m)]
TARGET = calc(HIDDEN)

print(f"TARGET = {list_to_int(TARGET)}")

So the already mentioned SboxLayer1 and SboxLayer2 are lists of m quadratic polynomials as already expected, and now we see that they are generated randomly, and stay secret. LinearLayer is a random m x m matrix with entries from 1 to 16 (inclusive), and similarly stays secret. Furthermore an element HIDDEN in GF(17)^m is generated, and we get to know the value of TARGET=calc(HIDDEN). The specific value of HIDDEN and this relationship won’t be relevant in the following, so we can just take TARGET as some element of GF(17)^m that is in the image of calc.

The interaction part of the challenge is then the following.

query_count = 0
win = False
while not win:
    Inputs = list(map(int, input("Inputs > ").split()))
    query_count += len(Inputs)
    if query_count > QUERY_LIMIT:
        print("bye..")
        exit(-1)
    Outputs = []
    
    for I in Inputs:
        O = calc(int_to_list(I))
        if TARGET == O:
            win = True
        Outputs.append(compare(TARGET, O))
    print_list(Outputs, "Outputs = ")

print("Good job!", open("flag", 'r').read())

So we can essentially input in total up to QUERY_LIMIT many elements of GF(17)^m. If we manage to send an input for which calc applied to it is TARGET, then we obtain the flag. Otherwise we get as feedback which components of calc(our input vector) are equal to the corresponding component of TARGET.

Analyzing the problem

So there is a map calc: GF(17)^m -> GF(17)^m that depends on unknown parameters SboxLayer1, LinearLayer, and SboxLayer2, and our goal is to find a preimage of TARGET under calc. This would be much easier if we knew what the unknown parameters are. We won’t be able to exactly recover the precise parameters used in the construction of calc however, as there are different parameter-triples that result int he same map calc. For example one could multiply LinearLayer by 2 and SBoxLayer2 by the inverse of 2 modulo 17 (so 9), and like that obtain a new parameter-triple defining the same map calc. As we only need to find a preimage under calc of TARGET, this indeterminancy is not a problem though, and conceptually this means we may rephrase the construction of calc in simplified ways to make it easier to recover it.

First, any polynomial of degree 2 over GF(17) can be written as b*(x-r) ^2+ c for b, r, c elements of GF(17) and b nonzero. So there must be b_i, b'_i, r_i, r'_i, c_i, c'_i in GF(17) with b_i and b'_i nonzero, as well as a m by m matrix A with nonzero entries in GF(17) such that calc maps (x_1, ..., x_m) to (notation of the form (expr_i) with i unbound is shorthand for the vector (expr_1, ..., expr_m) here):

componentwise-eval_(b'_i*(x - r'_i)^2 + c'_i)( A * componentwise-eval_(b_i*(x - r_i)^2 + c_i)(x_i) )
= componentwise-eval_(b'_i*(x - r'_i)^2 + c'_i)( A * Diag(b_i) * componentwise-eval_((x - r_i)^2)(x_i)  +  A * (c_i) )

Now note that b'_j*((x + (A * (x_i))_j) - r'_i)^2 + c'_j is again a polynomial of degree 2 in x, so can be written as b''_j*(x - r''_j)^2 + c''_j with b''_j nonzero. We furthermore define A' = A * Diag(b_i). We then get:

= componentwise-eval_(b''_i*(x - r''_i)^2 + c''_i)( A' * ((x_i - r_i)^2))

Note that the b_i are all nonzero, so as A had only nonzero entries, the same holds for A'. Let (d_i) be the first column of A'. Then we can write A' = Diag(d_i) * A'' where A'' still has only nonzero entries, but now all entries of the first column of A'' are 1. Then we get:

= componentwise-eval_(b''_i*(x - r''_i)^2 + c''_i)( Diag(d_i) * A'' * ((x_i - r_i)^2))
= componentwise-eval_(b''_i*d_i*(x - (r''_i/d_i))^2 + c''_i*d_i)( A'' * ((x_i - r_i)^2))

This is the form we are going to use to recover enough information to obtain an input such that calc applied to it gives us TARGET.

Solution

Recovering r_i

As a first step we recover the r_i so that we know the first of the three parts calc is composed of in the form we had above.

For this we note that (x - r_i)^2 takes the same value on x = r_i + y and x = r_i - y. There are thus 8 pairs of elements of GF(17) that have the same image under (x - r_i)^2: {r_i + 1, r_i - 1}, ..., {r_i + 8, r_i - 8}. Then there is a single additional element left over: r_i, whose image has a preimage containing only r_i itself. Hence we can recover r_i with the following algorithm:

  1. Find some input (x_1, ..., x_m) such that calc(x_1, ..., x_m) agrees with TARGET in many components (this is to make the next steps more likely to succeed, in my solution script I sampled 1000 random inputs and then sorted them by how many components of the output agree with TARGET, and then started with those with the most agreement).
  2. Send inputs (x_1, ..., x_{i-1}, y, x_{i+1}, ..., x_m) for all 0 <= y < 17.
  3. If of the 17 results regarding in which components the value of calc agrees with TARGET we have one result that occurs only a single time, then the corresponding input must necessarily be the one where y = r_i, so we are done.
  4. If not, then go back to step 1. Note that if the input we started with in step 1 happened to have x_i = r_i (and that happens in around 1/17 of the cases), then calc(x_1, ..., x_m) agreeing in many components with TARGET means it is quite unlikely that changing i-th component of the input will still by coincidence result in the same components agreeing with TARGET. Thus we should expect this algorithm to succeed with something around 17 rounds, with 17 queries each, for a total of m*number_of_rounds*number_of_queries_per_round = 20*17*17 = 5780 queries, out of a total budget of 77777 queries. The actual amount of queries used varies, for example in one run it was 3757, in another 9503.

Recovering A''

This step is done after we have already recovered the r_i. So with calc being a composition of three maps as above, we already understand the first map, the second one is multiplication by A'', and the third one is a componentwise map. Recall that we had arranged things so that the first column of A'' is (1,...,1).

So now suppose that we guess some particular value a_{i,j} for the entry of A'' in the i-th row and j-th column, with j>1, and let us assume that guess is correct for the moment, and let us look at what happens if we change components 1 and j in the input to calc. So say we have (x_1, ..., x_m) and (y_1,..., y_m) but x_i = y_i as long as i is not 1 or j. Then applying the first polynomial part we obtain ((x_1 - r_1)^2, ..., (x_m - r_m)^2) and ((y_1 - r_1)^2, ..., (y_m - r_m)^2), still with all components but the first and j-th one agreeing. Multiplying on the left with A'' will leave us in the i-th component with z + (x_1 - r_1)^2 + a_{i,j}*(x_j - r_j)^2 and z + (y_1 - r_1)^2 + a_{i,j}*(y_j - r_j)^2, where z is some constant depending on the other entries of the i-th row of A'' as well as the shared components of (x_1, ..., x_m) and (y_1,..., y_m). Now for some choices of (x_1, x_j) different from (y_1, y_j) these values will be the same, and as we know r_1 and r_j and are considering a fixed guess of a_{i,j}, we can find such pairs of pairs (note that we do not need to know z for this!). So let us choose one such pair of pairs for our input. So far we only discussed the first two parts of calc, but the third part is componentwise, so will still result in the same value in the i-th component, so the upshot is that if this guess for a_{i,j} is correct, then we can construct a pair of inputs that will result in the same value in the i-th component after applying calc. Hence, still assuming that our guess a_{i,j} is correct, the oracle will have to return to us that calc(x_1, ..., x_m)_i agrees with TARGET_i if and only if that is the case for calc(y_1,..., y_m)_i. Thus if we fail to see this, this will definitively rule out a_{i,j} as a guess for the i,j-th entry of A''. Repeating this until we have ruled out all guesses will leave us with a single remaining possibility, thereby recovering A''.

So the algorithm is as follows to recover the j-th column of A'':

  1. Initialize the set of not-yet-ruled-out possibilities for each entry in the column to be the set {0,...,16}.
  2. Get some random example input X=(x_1, ..., x_m).
  3. Construct a list of 9*9=81 input vectors that agree with X in all components but the first and j-th, and give pairwise different vectors after applying the first polynomial step of calc.
  4. For each such input vector, note down the first and j-th component (let us call them z_1 and z_j) after applying the first polynomial step, and the result vector q (on agreements with TARGET) returned from the oracle.
  5. For each i and each guess a_{i,j} for the i,j-th entry of A'', and element z of GF(17), consider all triples (z_1, z_2, q) from the previous step such that z = z_1 + a_{i,j}*z_j. If all the corresponding values q_i are the same, then the guess is consistent with the data. If they are not, rule out a_{i,j} as a possible value for the i,j-th entry of A''.
  6. If there is only one possibility not ruled out for each entry of the column, we are done. Otherwise go back to step 2.

How many queries are required to recover A'' with this method varies, sometimes less than 2000 queries are needed, other times over 50000.

Recovering a preimage of TARGET under the third step of calc

calc is a composite of three maps, and we have to find a preimage of TARGET. Note that we can split this problem up into two parts: find preimages of TARGET under the last of the three maps, and then, for one of those, find a primage under the composition of the first two maps. To carry this out, let us first find preimages of TARGET under that last map. For this say we find an input (x_1, ..., x_m) so that the oracle tells us that calc(x_1, ..., x_m) agrees with TARGET in component i. Then, as the third map is componentwise, we can conclude that whenever we have the same value in the i-th component after applying the first two maps as for this input, then the image under calc will agree with TARGET in component i. As we know A'' and the r_j we can evaluate the first two steps of calc under (x_1, ..., x_m) and note down the i-th component as a possible value in the i-th component for preimages of TARGET under the last map. Note that by going through many examples like this we expect to find either 2 or (in 1/17 of the components) 1 value that would work for each of the components. Finally, preimages of TARGET under the last map are all vectors that in each component has one of the values we found as just described. The number of candidates will thus be 2^k with k<=2m, and most likely 2^40, 2^39 or 2^38 (precise expectation would be 2^{20*1*(1/17) + 20*2*(16/17)} ≈ 2^38.82 ≈ 4.85*10^11.

Finding a working solution

We know A'' and the r_i, so we can calculate preimages under the first two maps for all of the candidates we found in the previous subsection. Not every candidate will have a preimage, as the quadratic polynomials are not surjective. However, it is likely that we found all candidates in the previous step. What is the likelihood that a candidate has a preimage? Let us assume that the matrix is invertible, which will be the case in 16 of 17 tries (so if it isn’t, we can just run the solve script again). Then the question is whether m=20 components simultaneously have a preimage under a quadratic polynomial each. The image of a quadratic polynomial has cardinality 9, so that means the probability is (9/17)^20. We thus expect to have to search (17/9)^20 ≈ 3*10^5, many candidates until we find a preimage. Note that while we can be certain to find some preimage under one of the roughly 5*10^11 candidates (assuming, as is likely, we have all of them) given that one of them must be the first two maps applied to HIDDEN, this means we usually need to search through much less to find some working preimage.

Attachments

See the challenge file as well the full solution script.