# Codegate CTF 2023 Preliminary - anti Kerckhoffs

## Overview

## Summary

This is a writeup for the “anti Kerckhoffs” challenge at the Codegate 2023 Preliminaries. This challenge was built around basic abstract algebra, exploiting e.g. that evaluation of quadratic polynomials is not bijective and information about them can be recovered by comparing the cardinality of preimages. We were the first team to solve this challenge, which in total only four teams solved honestly (a fifth team’s solution was stolen from us and uncovered a spy).

Solved by: shalaamum

Writeup by: shalaamum

Writeup first published here.

## Understanding the challenge code

We are given a Python file `prob.py`

with 86 lines, as well as connection details for a remote instance.
Let us look through the code first.

The code begins with some constants that we will keep in mind:

```
QUERY_LIMIT = 77777
MOD = 17
m = 20
```

In particular, we will see that all calculations are (ultimately) done modulo the prime `MOD = 17`

.

The first two functions are the following.

```
def quadratic_eval(coeffs, x):
return (coeffs[2] * x ** 2 + coeffs[1] * x + coeffs[0]) % MOD
def get_random_quadratic():
c2 = secrets.randbelow(MOD - 1) + 1
c1 = secrets.randbelow(MOD)
c0 = secrets.randbelow(MOD)
return [c0, c1, c2]
```

The function `get_random_quadratic`

returns a list with three components, the first two of which are random integers from `0`

to `16`

(inclusive) and the last one a random integer from `1`

to `16`

. Interpreting these as the coefficients of a polynomial, this returns a random quadratic polynomial over `GF(17)`

; the reason `0`

is excluded for the last component is to ensure the polynomial is indeed of degree 2. The function `quadratic_eval`

then evaluates such a polynomial at the second argument.

The next function we see is `calc`

, which depends on some globals we have not seen yet:

```
def calc(Inputs):
Outputs = [quadratic_eval(SboxLayer1[i], Inputs[i]) for i in range(m)]
Outputs = [sum(LinearLayer[i][j] * Outputs[j] for j in range(m)) for i in range(m)]
Outputs = [quadratic_eval(SboxLayer2[i], Outputs[i]) for i in range(m)]
return Outputs
```

We can assume here that `Inputs`

is supposed to be some iterable with `m`

components. The globals `SboxLayer1`

and `SboxLayer2`

are likely lists of `m`

many quadratic polynomials, while `LinearLayer`

is an `m x m`

matrix. `calc`

applies the polynomials from `SboxLayer1`

componentwise to `Inputs`

, then the resulting vector is multiplied with from the left with the matrix `LinearLayer`

, and finally the polynomials from `SboxLayer2`

are applied componentwise again, giving as an end result a list with `m`

components from `0`

to `16`

.

The remaining functions are less interesting helper functions: `print_list`

does what it says, and `list_to_int`

and `int_to_list`

convert between lists and ints, where the lists are interpreted as the coefficients of the corresponding int when written to base `17`

. The function `compare`

takes two lists as input and returns an integer whose bits indicate which components of the lists were equal and which weren’t.

Now let us look at the globals generated by the challenge.

```
SboxLayer1 = [get_random_quadratic() for _ in range(m)]
LinearLayer = [[secrets.randbelow(MOD-1) + 1 for _ in range(m)] for _ in range(m)]
SboxLayer2 = [get_random_quadratic() for _ in range(m)]
HIDDEN = [secrets.randbelow(MOD) for _ in range(m)]
TARGET = calc(HIDDEN)
print(f"TARGET = {list_to_int(TARGET)}")
```

So the already mentioned `SboxLayer1`

and `SboxLayer2`

are lists of `m`

quadratic polynomials as already expected, and now we see that they are generated randomly, and stay secret. `LinearLayer`

is a random `m x m`

matrix with entries from `1`

to `16`

(inclusive), and similarly stays secret. Furthermore an element `HIDDEN`

in `GF(17)^m`

is generated, and we get to know the value of `TARGET=calc(HIDDEN)`

. The specific value of `HIDDEN`

and this relationship won’t be relevant in the following, so we can just take `TARGET`

as some element of `GF(17)^m`

that is in the image of `calc`

.

The interaction part of the challenge is then the following.

```
query_count = 0
win = False
while not win:
Inputs = list(map(int, input("Inputs > ").split()))
query_count += len(Inputs)
if query_count > QUERY_LIMIT:
print("bye..")
exit(-1)
Outputs = []
for I in Inputs:
O = calc(int_to_list(I))
if TARGET == O:
win = True
Outputs.append(compare(TARGET, O))
print_list(Outputs, "Outputs = ")
print("Good job!", open("flag", 'r').read())
```

So we can essentially input in total up to `QUERY_LIMIT`

many elements of `GF(17)^m`

. If we manage to send an input for which `calc`

applied to it is `TARGET`

, then we obtain the flag. Otherwise we get as feedback which components of `calc(our input vector)`

are equal to the corresponding component of `TARGET`

.

## Analyzing the problem

So there is a map `calc: GF(17)^m -> GF(17)^m`

that depends on unknown parameters `SboxLayer1`

, `LinearLayer`

, and `SboxLayer2`

, and our goal is to find a preimage of `TARGET`

under `calc`

. This would be much easier if we knew what the unknown parameters are. We won’t be able to exactly recover the precise parameters used in the construction of `calc`

however, as there are different parameter-triples that result int he same map `calc`

. For example one could multiply `LinearLayer`

by `2`

and `SBoxLayer2`

by the inverse of `2`

modulo `17`

(so `9`

), and like that obtain a new parameter-triple defining the same map `calc`

. As we only need to find a preimage under `calc`

of `TARGET`

, this indeterminancy is not a problem though, and conceptually this means we may rephrase the construction of `calc`

in simplified ways to make it easier to recover it.

First, any polynomial of degree 2 over `GF(17)`

can be written as `b*(x-r) ^2+ c`

for `b, r, c`

elements of `GF(17)`

and `b`

nonzero. So there must be `b_i, b'_i, r_i, r'_i, c_i, c'_i`

in `GF(17)`

with `b_i`

and `b'_i`

nonzero, as well as a `m`

by `m`

matrix `A`

with nonzero entries in `GF(17)`

such that `calc`

maps `(x_1, ..., x_m)`

to (notation of the form `(expr_i)`

with `i`

unbound is shorthand for the vector `(expr_1, ..., expr_m)`

here):

```
componentwise-eval_(b'_i*(x - r'_i)^2 + c'_i)( A * componentwise-eval_(b_i*(x - r_i)^2 + c_i)(x_i) )
= componentwise-eval_(b'_i*(x - r'_i)^2 + c'_i)( A * Diag(b_i) * componentwise-eval_((x - r_i)^2)(x_i) + A * (c_i) )
```

Now note that `b'_j*((x + (A * (x_i))_j) - r'_i)^2 + c'_j`

is again a polynomial of degree 2 in `x`

, so can be written as `b''_j*(x - r''_j)^2 + c''_j`

with `b''_j`

nonzero. We furthermore define `A' = A * Diag(b_i)`

. We then get:

```
= componentwise-eval_(b''_i*(x - r''_i)^2 + c''_i)( A' * ((x_i - r_i)^2))
```

Note that the `b_i`

are all nonzero, so as `A`

had only nonzero entries, the same holds for `A'`

. Let `(d_i)`

be the first column of `A'`

. Then we can write `A' = Diag(d_i) * A''`

where `A''`

still has only nonzero entries, but now all entries of the first column of `A''`

are `1`

. Then we get:

```
= componentwise-eval_(b''_i*(x - r''_i)^2 + c''_i)( Diag(d_i) * A'' * ((x_i - r_i)^2))
= componentwise-eval_(b''_i*d_i*(x - (r''_i/d_i))^2 + c''_i*d_i)( A'' * ((x_i - r_i)^2))
```

This is the form we are going to use to recover enough information to obtain an input such that `calc`

applied to it gives us `TARGET`

.

## Solution

### Recovering `r_i`

As a first step we recover the `r_i`

so that we know the first of the three parts `calc`

is composed of in the form we had above.

For this we note that `(x - r_i)^2`

takes the same value on `x = r_i + y`

and `x = r_i - y`

. There are thus `8`

pairs of elements of `GF(17)`

that have the same image under `(x - r_i)^2`

: `{r_i + 1, r_i - 1}, ..., {r_i + 8, r_i - 8}`

. Then there is a single additional element left over: `r_i`

, whose image has a preimage containing only `r_i`

itself. Hence we can recover `r_i`

with the following algorithm:

- Find some input
`(x_1, ..., x_m)`

such that`calc(x_1, ..., x_m)`

agrees with`TARGET`

in many components (this is to make the next steps more likely to succeed, in my solution script I sampled`1000`

random inputs and then sorted them by how many components of the output agree with`TARGET`

, and then started with those with the most agreement). - Send inputs
`(x_1, ..., x_{i-1}, y, x_{i+1}, ..., x_m)`

for all`0 <= y < 17`

. - If of the
`17`

results regarding in which components the value of`calc`

agrees with`TARGET`

we have one result that occurs only a single time, then the corresponding input must necessarily be the one where`y = r_i`

, so we are done. - If not, then go back to step 1. Note that if the input we started with in step 1 happened to have
`x_i = r_i`

(and that happens in around`1/17`

of the cases), then`calc(x_1, ..., x_m)`

agreeing in many components with`TARGET`

means it is quite unlikely that changing`i`

-th component of the input will still by coincidence result in the same components agreeing with`TARGET`

. Thus we should expect this algorithm to succeed with something around`17`

rounds, with`17`

queries each, for a total of`m*number_of_rounds*number_of_queries_per_round = 20*17*17 = 5780`

queries, out of a total budget of`77777`

queries. The actual amount of queries used varies, for example in one run it was 3757, in another 9503.

### Recovering `A''`

This step is done after we have already recovered the `r_i`

. So with `calc`

being a composition of three maps as above, we already understand the first map, the second one is multiplication by `A''`

, and the third one is a componentwise map. Recall that we had arranged things so that the first column of `A''`

is `(1,...,1)`

.

So now suppose that we guess some particular value `a_{i,j}`

for the entry of `A''`

in the `i`

-th row and `j`

-th column, with `j>1`

, and let us assume that guess is correct for the moment, and let us look at what happens if we change components `1`

and `j`

in the input to `calc`

. So say we have `(x_1, ..., x_m)`

and `(y_1,..., y_m)`

but `x_i = y_i`

as long as `i`

is not `1`

or `j`

. Then applying the first polynomial part we obtain `((x_1 - r_1)^2, ..., (x_m - r_m)^2)`

and `((y_1 - r_1)^2, ..., (y_m - r_m)^2)`

, still with all components but the first and `j`

-th one agreeing. Multiplying on the left with `A''`

will leave us in the `i`

-th component with `z + (x_1 - r_1)^2 + a_{i,j}*(x_j - r_j)^2`

and `z + (y_1 - r_1)^2 + a_{i,j}*(y_j - r_j)^2`

, where `z`

is some constant depending on the other entries of the `i`

-th row of `A''`

as well as the shared components of `(x_1, ..., x_m)`

and `(y_1,..., y_m)`

. Now for some choices of `(x_1, x_j)`

different from `(y_1, y_j)`

these values will be the same, and as we know `r_1`

and `r_j`

and are considering a fixed guess of `a_{i,j}`

, we can find such pairs of pairs (note that we do not need to know `z`

for this!). So let us choose one such pair of pairs for our input. So far we only discussed the first two parts of `calc`

, but the third part is componentwise, so will still result in the same value in the `i`

-th component, so the upshot is that if this guess for `a_{i,j}`

is correct, then we can construct a pair of inputs that will result in the same value in the `i`

-th component after applying `calc`

. Hence, still assuming that our guess `a_{i,j}`

is correct, the oracle will have to return to us that `calc(x_1, ..., x_m)_i`

agrees with `TARGET_i`

if and only if that is the case for `calc(y_1,..., y_m)_i`

. Thus if we fail to see this, this will definitively rule out `a_{i,j}`

as a guess for the `i,j`

-th entry of `A''`

. Repeating this until we have ruled out all guesses will leave us with a single remaining possibility, thereby recovering `A''`

.

So the algorithm is as follows to recover the `j`

-th column of `A''`

:

- Initialize the set of not-yet-ruled-out possibilities for each entry in the column to be the set
`{0,...,16}`

. - Get some random example input
`X=(x_1, ..., x_m)`

. - Construct a list of
`9*9=81`

input vectors that agree with`X`

in all components but the first and`j`

-th, and give pairwise different vectors after applying the first polynomial step of`calc`

. - For each such input vector, note down the first and
`j`

-th component (let us call them`z_1`

and`z_j`

) after applying the first polynomial step, and the result vector`q`

(on agreements with`TARGET`

) returned from the oracle. - For each
`i`

and each guess`a_{i,j}`

for the`i,j`

-th entry of`A''`

, and element`z`

of`GF(17)`

, consider all triples`(z_1, z_2, q)`

from the previous step such that`z = z_1 + a_{i,j}*z_j`

. If all the corresponding values`q_i`

are the same, then the guess is consistent with the data. If they are not, rule out`a_{i,j}`

as a possible value for the`i,j`

-th entry of`A''`

. - If there is only one possibility not ruled out for each entry of the column, we are done. Otherwise go back to step 2.

How many queries are required to recover `A''`

with this method varies, sometimes less than `2000`

queries are needed, other times over 50000.

### Recovering a preimage of `TARGET`

under the third step of `calc`

`calc`

is a composite of three maps, and we have to find a preimage of `TARGET`

. Note that we can split this problem up into two parts: find preimages of `TARGET`

under the last of the three maps, and then, for one of those, find a primage under the composition of the first two maps. To carry this out, let us first find preimages of `TARGET`

under that last map. For this say we find an input `(x_1, ..., x_m)`

so that the oracle tells us that `calc(x_1, ..., x_m)`

agrees with `TARGET`

in component `i`

. Then, as the third map is componentwise, we can conclude that whenever we have the same value in the `i`

-th component after applying the first two maps as for this input, then the image under `calc`

will agree with `TARGET`

in component `i`

. As we know `A''`

and the `r_j`

we can evaluate the first two steps of `calc`

under `(x_1, ..., x_m)`

and note down the `i`

-th component as a possible value in the `i`

-th component for preimages of `TARGET`

under the last map. Note that by going through many examples like this we expect to find either `2`

or (in `1/17`

of the components) `1`

value that would work for each of the components. Finally, preimages of `TARGET`

under the last map are all vectors that in each component has one of the values we found as just described. The number of candidates will thus be `2^k`

with `k<=2m`

, and most likely `2^40`

, `2^39`

or `2^38`

(precise expectation would be `2^{20*1*(1/17) + 20*2*(16/17)} ≈ 2^38.82 ≈ 4.85*10^11`

.

### Finding a working solution

We know `A''`

and the `r_i`

, so we can calculate preimages under the first two maps for all of the candidates we found in the previous subsection. Not every candidate will have a preimage, as the quadratic polynomials are not surjective. However, it is likely that we found all candidates in the previous step. What is the likelihood that a candidate has a preimage? Let us assume that the matrix is invertible, which will be the case in `16`

of `17`

tries (so if it isn’t, we can just run the solve script again). Then the question is whether `m=20`

components simultaneously have a preimage under a quadratic polynomial each. The image of a quadratic polynomial has cardinality `9`

, so that means the probability is `(9/17)^20`

. We thus expect to have to search `(17/9)^20 ≈ 3*10^5`

, many candidates until we find a preimage. Note that while we can be certain to find some preimage under one of the roughly `5*10^11`

candidates (assuming, as is likely, we have all of them) given that one of them must be the first two maps applied to `HIDDEN`

, this means we usually need to search through much less to find some working preimage.

## Attachments

See the challenge file as well the full solution script.