Codegate CTF 2023 Preliminary - anti Kerckhoffs
Overview
Summary
This is a writeup for the “anti Kerckhoffs” challenge at the Codegate 2023 Preliminaries. This challenge was built around basic abstract algebra, exploiting e.g. that evaluation of quadratic polynomials is not bijective and information about them can be recovered by comparing the cardinality of preimages. We were the first team to solve this challenge, which in total only four teams solved honestly (a fifth team’s solution was stolen from us and uncovered a spy).
Solved by: shalaamum
Writeup by: shalaamum
Writeup first published here.
Understanding the challenge code
We are given a Python file prob.py
with 86 lines, as well as connection details for a remote instance.
Let us look through the code first.
The code begins with some constants that we will keep in mind:
QUERY_LIMIT = 77777
MOD = 17
m = 20
In particular, we will see that all calculations are (ultimately) done modulo the prime MOD = 17
.
The first two functions are the following.
def quadratic_eval(coeffs, x):
return (coeffs[2] * x ** 2 + coeffs[1] * x + coeffs[0]) % MOD
def get_random_quadratic():
c2 = secrets.randbelow(MOD - 1) + 1
c1 = secrets.randbelow(MOD)
c0 = secrets.randbelow(MOD)
return [c0, c1, c2]
The function get_random_quadratic
returns a list with three components, the first two of which are random integers from 0
to 16
(inclusive) and the last one a random integer from 1
to 16
. Interpreting these as the coefficients of a polynomial, this returns a random quadratic polynomial over GF(17)
; the reason 0
is excluded for the last component is to ensure the polynomial is indeed of degree 2. The function quadratic_eval
then evaluates such a polynomial at the second argument.
The next function we see is calc
, which depends on some globals we have not seen yet:
def calc(Inputs):
Outputs = [quadratic_eval(SboxLayer1[i], Inputs[i]) for i in range(m)]
Outputs = [sum(LinearLayer[i][j] * Outputs[j] for j in range(m)) for i in range(m)]
Outputs = [quadratic_eval(SboxLayer2[i], Outputs[i]) for i in range(m)]
return Outputs
We can assume here that Inputs
is supposed to be some iterable with m
components. The globals SboxLayer1
and SboxLayer2
are likely lists of m
many quadratic polynomials, while LinearLayer
is an m x m
matrix. calc
applies the polynomials from SboxLayer1
componentwise to Inputs
, then the resulting vector is multiplied with from the left with the matrix LinearLayer
, and finally the polynomials from SboxLayer2
are applied componentwise again, giving as an end result a list with m
components from 0
to 16
.
The remaining functions are less interesting helper functions: print_list
does what it says, and list_to_int
and int_to_list
convert between lists and ints, where the lists are interpreted as the coefficients of the corresponding int when written to base 17
. The function compare
takes two lists as input and returns an integer whose bits indicate which components of the lists were equal and which weren’t.
Now let us look at the globals generated by the challenge.
SboxLayer1 = [get_random_quadratic() for _ in range(m)]
LinearLayer = [[secrets.randbelow(MOD-1) + 1 for _ in range(m)] for _ in range(m)]
SboxLayer2 = [get_random_quadratic() for _ in range(m)]
HIDDEN = [secrets.randbelow(MOD) for _ in range(m)]
TARGET = calc(HIDDEN)
print(f"TARGET = {list_to_int(TARGET)}")
So the already mentioned SboxLayer1
and SboxLayer2
are lists of m
quadratic polynomials as already expected, and now we see that they are generated randomly, and stay secret. LinearLayer
is a random m x m
matrix with entries from 1
to 16
(inclusive), and similarly stays secret. Furthermore an element HIDDEN
in GF(17)^m
is generated, and we get to know the value of TARGET=calc(HIDDEN)
. The specific value of HIDDEN
and this relationship won’t be relevant in the following, so we can just take TARGET
as some element of GF(17)^m
that is in the image of calc
.
The interaction part of the challenge is then the following.
query_count = 0
win = False
while not win:
Inputs = list(map(int, input("Inputs > ").split()))
query_count += len(Inputs)
if query_count > QUERY_LIMIT:
print("bye..")
exit(-1)
Outputs = []
for I in Inputs:
O = calc(int_to_list(I))
if TARGET == O:
win = True
Outputs.append(compare(TARGET, O))
print_list(Outputs, "Outputs = ")
print("Good job!", open("flag", 'r').read())
So we can essentially input in total up to QUERY_LIMIT
many elements of GF(17)^m
. If we manage to send an input for which calc
applied to it is TARGET
, then we obtain the flag. Otherwise we get as feedback which components of calc(our input vector)
are equal to the corresponding component of TARGET
.
Analyzing the problem
So there is a map calc: GF(17)^m -> GF(17)^m
that depends on unknown parameters SboxLayer1
, LinearLayer
, and SboxLayer2
, and our goal is to find a preimage of TARGET
under calc
. This would be much easier if we knew what the unknown parameters are. We won’t be able to exactly recover the precise parameters used in the construction of calc
however, as there are different parameter-triples that result int he same map calc
. For example one could multiply LinearLayer
by 2
and SBoxLayer2
by the inverse of 2
modulo 17
(so 9
), and like that obtain a new parameter-triple defining the same map calc
. As we only need to find a preimage under calc
of TARGET
, this indeterminancy is not a problem though, and conceptually this means we may rephrase the construction of calc
in simplified ways to make it easier to recover it.
First, any polynomial of degree 2 over GF(17)
can be written as b*(x-r) ^2+ c
for b, r, c
elements of GF(17)
and b
nonzero. So there must be b_i, b'_i, r_i, r'_i, c_i, c'_i
in GF(17)
with b_i
and b'_i
nonzero, as well as a m
by m
matrix A
with nonzero entries in GF(17)
such that calc
maps (x_1, ..., x_m)
to (notation of the form (expr_i)
with i
unbound is shorthand for the vector (expr_1, ..., expr_m)
here):
componentwise-eval_(b'_i*(x - r'_i)^2 + c'_i)( A * componentwise-eval_(b_i*(x - r_i)^2 + c_i)(x_i) )
= componentwise-eval_(b'_i*(x - r'_i)^2 + c'_i)( A * Diag(b_i) * componentwise-eval_((x - r_i)^2)(x_i) + A * (c_i) )
Now note that b'_j*((x + (A * (x_i))_j) - r'_i)^2 + c'_j
is again a polynomial of degree 2 in x
, so can be written as b''_j*(x - r''_j)^2 + c''_j
with b''_j
nonzero. We furthermore define A' = A * Diag(b_i)
. We then get:
= componentwise-eval_(b''_i*(x - r''_i)^2 + c''_i)( A' * ((x_i - r_i)^2))
Note that the b_i
are all nonzero, so as A
had only nonzero entries, the same holds for A'
. Let (d_i)
be the first column of A'
. Then we can write A' = Diag(d_i) * A''
where A''
still has only nonzero entries, but now all entries of the first column of A''
are 1
. Then we get:
= componentwise-eval_(b''_i*(x - r''_i)^2 + c''_i)( Diag(d_i) * A'' * ((x_i - r_i)^2))
= componentwise-eval_(b''_i*d_i*(x - (r''_i/d_i))^2 + c''_i*d_i)( A'' * ((x_i - r_i)^2))
This is the form we are going to use to recover enough information to obtain an input such that calc
applied to it gives us TARGET
.
Solution
Recovering r_i
As a first step we recover the r_i
so that we know the first of the three parts calc
is composed of in the form we had above.
For this we note that (x - r_i)^2
takes the same value on x = r_i + y
and x = r_i - y
. There are thus 8
pairs of elements of GF(17)
that have the same image under (x - r_i)^2
: {r_i + 1, r_i - 1}, ..., {r_i + 8, r_i - 8}
. Then there is a single additional element left over: r_i
, whose image has a preimage containing only r_i
itself. Hence we can recover r_i
with the following algorithm:
- Find some input
(x_1, ..., x_m)
such thatcalc(x_1, ..., x_m)
agrees withTARGET
in many components (this is to make the next steps more likely to succeed, in my solution script I sampled1000
random inputs and then sorted them by how many components of the output agree withTARGET
, and then started with those with the most agreement). - Send inputs
(x_1, ..., x_{i-1}, y, x_{i+1}, ..., x_m)
for all0 <= y < 17
. - If of the
17
results regarding in which components the value ofcalc
agrees withTARGET
we have one result that occurs only a single time, then the corresponding input must necessarily be the one wherey = r_i
, so we are done. - If not, then go back to step 1. Note that if the input we started with in step 1 happened to have
x_i = r_i
(and that happens in around1/17
of the cases), thencalc(x_1, ..., x_m)
agreeing in many components withTARGET
means it is quite unlikely that changingi
-th component of the input will still by coincidence result in the same components agreeing withTARGET
. Thus we should expect this algorithm to succeed with something around17
rounds, with17
queries each, for a total ofm*number_of_rounds*number_of_queries_per_round = 20*17*17 = 5780
queries, out of a total budget of77777
queries. The actual amount of queries used varies, for example in one run it was 3757, in another 9503.
Recovering A''
This step is done after we have already recovered the r_i
. So with calc
being a composition of three maps as above, we already understand the first map, the second one is multiplication by A''
, and the third one is a componentwise map. Recall that we had arranged things so that the first column of A''
is (1,...,1)
.
So now suppose that we guess some particular value a_{i,j}
for the entry of A''
in the i
-th row and j
-th column, with j>1
, and let us assume that guess is correct for the moment, and let us look at what happens if we change components 1
and j
in the input to calc
. So say we have (x_1, ..., x_m)
and (y_1,..., y_m)
but x_i = y_i
as long as i
is not 1
or j
. Then applying the first polynomial part we obtain ((x_1 - r_1)^2, ..., (x_m - r_m)^2)
and ((y_1 - r_1)^2, ..., (y_m - r_m)^2)
, still with all components but the first and j
-th one agreeing. Multiplying on the left with A''
will leave us in the i
-th component with z + (x_1 - r_1)^2 + a_{i,j}*(x_j - r_j)^2
and z + (y_1 - r_1)^2 + a_{i,j}*(y_j - r_j)^2
, where z
is some constant depending on the other entries of the i
-th row of A''
as well as the shared components of (x_1, ..., x_m)
and (y_1,..., y_m)
. Now for some choices of (x_1, x_j)
different from (y_1, y_j)
these values will be the same, and as we know r_1
and r_j
and are considering a fixed guess of a_{i,j}
, we can find such pairs of pairs (note that we do not need to know z
for this!). So let us choose one such pair of pairs for our input. So far we only discussed the first two parts of calc
, but the third part is componentwise, so will still result in the same value in the i
-th component, so the upshot is that if this guess for a_{i,j}
is correct, then we can construct a pair of inputs that will result in the same value in the i
-th component after applying calc
. Hence, still assuming that our guess a_{i,j}
is correct, the oracle will have to return to us that calc(x_1, ..., x_m)_i
agrees with TARGET_i
if and only if that is the case for calc(y_1,..., y_m)_i
. Thus if we fail to see this, this will definitively rule out a_{i,j}
as a guess for the i,j
-th entry of A''
. Repeating this until we have ruled out all guesses will leave us with a single remaining possibility, thereby recovering A''
.
So the algorithm is as follows to recover the j
-th column of A''
:
- Initialize the set of not-yet-ruled-out possibilities for each entry in the column to be the set
{0,...,16}
. - Get some random example input
X=(x_1, ..., x_m)
. - Construct a list of
9*9=81
input vectors that agree withX
in all components but the first andj
-th, and give pairwise different vectors after applying the first polynomial step ofcalc
. - For each such input vector, note down the first and
j
-th component (let us call themz_1
andz_j
) after applying the first polynomial step, and the result vectorq
(on agreements withTARGET
) returned from the oracle. - For each
i
and each guessa_{i,j}
for thei,j
-th entry ofA''
, and elementz
ofGF(17)
, consider all triples(z_1, z_2, q)
from the previous step such thatz = z_1 + a_{i,j}*z_j
. If all the corresponding valuesq_i
are the same, then the guess is consistent with the data. If they are not, rule outa_{i,j}
as a possible value for thei,j
-th entry ofA''
. - If there is only one possibility not ruled out for each entry of the column, we are done. Otherwise go back to step 2.
How many queries are required to recover A''
with this method varies, sometimes less than 2000
queries are needed, other times over 50000.
Recovering a preimage of TARGET
under the third step of calc
calc
is a composite of three maps, and we have to find a preimage of TARGET
. Note that we can split this problem up into two parts: find preimages of TARGET
under the last of the three maps, and then, for one of those, find a primage under the composition of the first two maps. To carry this out, let us first find preimages of TARGET
under that last map. For this say we find an input (x_1, ..., x_m)
so that the oracle tells us that calc(x_1, ..., x_m)
agrees with TARGET
in component i
. Then, as the third map is componentwise, we can conclude that whenever we have the same value in the i
-th component after applying the first two maps as for this input, then the image under calc
will agree with TARGET
in component i
. As we know A''
and the r_j
we can evaluate the first two steps of calc
under (x_1, ..., x_m)
and note down the i
-th component as a possible value in the i
-th component for preimages of TARGET
under the last map. Note that by going through many examples like this we expect to find either 2
or (in 1/17
of the components) 1
value that would work for each of the components. Finally, preimages of TARGET
under the last map are all vectors that in each component has one of the values we found as just described. The number of candidates will thus be 2^k
with k<=2m
, and most likely 2^40
, 2^39
or 2^38
(precise expectation would be 2^{20*1*(1/17) + 20*2*(16/17)} ≈ 2^38.82 ≈ 4.85*10^11
.
Finding a working solution
We know A''
and the r_i
, so we can calculate preimages under the first two maps for all of the candidates we found in the previous subsection. Not every candidate will have a preimage, as the quadratic polynomials are not surjective. However, it is likely that we found all candidates in the previous step. What is the likelihood that a candidate has a preimage? Let us assume that the matrix is invertible, which will be the case in 16
of 17
tries (so if it isn’t, we can just run the solve script again). Then the question is whether m=20
components simultaneously have a preimage under a quadratic polynomial each. The image of a quadratic polynomial has cardinality 9
, so that means the probability is (9/17)^20
. We thus expect to have to search (17/9)^20 ≈ 3*10^5
, many candidates until we find a preimage. Note that while we can be certain to find some preimage under one of the roughly 5*10^11
candidates (assuming, as is likely, we have all of them) given that one of them must be the first two maps applied to HIDDEN
, this means we usually need to search through much less to find some working preimage.
Attachments
See the challenge file as well the full solution script.