Even the King bows before angr

This is a follow up of the KingMaker challenge from Codegate CTF Preliminary 2019. We solved this challenge during the CTF but when someone asked if it was possible to solve it using angr, I wanted to try it out myself.

The binary asks us for 5 keys which are then used to xor parts of the code segment. So if you get the keys right, the rest of the code will be correctly decrypted. We got the keys by xor’ing the first few bytes at each address with the assembled values of the instructions push rbp and mov rbp, rsp and some other such instructions.

Once you’ve got all the keys, there’s one more function which asks you for a string and then does some kind of verification.

The remaining inputs are all integers which represent choices which you (the prince) must decide between. And if all your choices are right, then you get to become the King and get the flag. This is done by a function which checks the value of 5 global variables and if they’re all equal to 5, then you get the flag. Also, this function has to be called with the second argument set to 1.

We solved this by first figuring out the required strings and then trying all possible combinations of integers.

But for the rest of this post, I will be detailing two days of effort that I’ve spent on something that is little more than a fruit of pure arrogance.

The first thing I did was to patch the original binary with all the keys that we found. I’ve added the code for that at the end. Note that this code is Python2.7 only because I was too lazy to figure out why my Python3 code was creating an invalid ELF.

The next thing to do is to patch the functions which verify the string inputs. This is where angr’s hooking functionality becomes useful. I could’ve just as easily overwritten the first byte of these functions with a ret instruction, but what fun would that be ?

Now, the function which gives us the flag is pretty much called in every function with the second argument set to 0, There’s only one block from which this function is called with the second argument is set to 1. So that’s where we need to reach.

The integers that represent our choices are read in via scanf. So, in order to avoid reading the symbolic variables off the stack, I decided to hook scanf with a function that returned a symbolic variable if the format was ‘%d’.

But angr couldn’t find a path from the main function to the target block even after 110 minutes and after using up 98% of 64 gigs of RAM.

Then I found that the first couple of functions did not use the input to manipulate the global variables. These functions only had two choices and if you entered the wrong one, the program exits. So, I decided to skip those functions and try exploration once again. This time, I tried setting the auto_drop feature to automatically drop the avoid and unsat states which I hoped would free up some memory.

No luck. angr still used up too much memory.

Then I tried going through the different exploration techniques listed in the angr docs. ManualMergepoint seemed promising at first glance. But in retrospect, I don’t think it would’ve really made much of a difference.

The technique that eventually worked was DFS. It found the first path to the target block in less than 40 seconds.

But this path didn’t really satisfy the constraints that were required. And I couldn’t ask angr to find as many paths as possible because that would probably end up in another path explosion.

That’s when I looked up the angr documentation and it said that the find argument in the explore method can also be a function that returns either True or False.

So, I wrote up a function that would first check if the state had reached the target address and if so, it would then apply the constraints on the global variables.

I created a list to store the symbolic variables which I then stored in the state.globals. But at the end of exploration, the list contained nearly 3000 symbolic variables which is nearly 10 times the number of times the expected value.

It seemed like every state wrote to the same global object. So in order to get rid of that, I modified the code so that it would reset this list every time a state was not satisfiable. This lead to a different problem as now, the list contained too few symbolic variables. This probably happened because of a branching instruction at which the state split into two. When the first one was unsat , it probably reset the list which contained all the symbolic variables that were created up until that point. So the second state only contains the symbolic variables that were created after the branching instruction.

The solution was to just use a dictionary to map the basic block which calls scanf to the symbolic variable. So, if at some point, a state is unsat, the state which succeeds it should overwrite this symbolic variable.

And that eventually worked. It took about 40 minutes and only 1% of RAM for angr to find a path from the address 0x403197 to the target block. But trying to find a path from main to the target block took significantly more amount of time.

Anyway, here are the relevant scripts.

The solution

from pwn import *

keys = ['lOv3', 'D0l1', 'HuNgRYT1m3', 'F0uRS3aS0n', 'T1kT4kT0Kk']
nums = [1, 1, 2, 2, 3, 1, 2, 1, 1, 2, 1, 2, 2, 3, 1, 1, 1, 2, 3, 2, 2, 1]

vals = []
for x in nums:
    vals.append(str(x))

p = process('./KingMaker')
p.sendline(vals[0])
p.sendline(keys[0])
for x in range(1, 5):
    p.sendline(vals[x])
p.sendline(keys[1])
for x in range(5, 11):
    p.sendline(vals[x])
p.sendline(keys[2])
for x in range(11, 14):
    p.sendline(vals[x])
p.sendline(keys[3])
p.sendline(vals[14])
p.sendline(vals[15])
p.sendline('A'*6)
p.sendline(vals[16])
p.sendline(vals[17])
p.sendline(keys[4])
for x in range(18, len(vals)):
    p.sendline(vals[x])
print p.recvall()

Patching 


fixes = {0x40341D: {'key': 'lOv3', 'sz': 0xf0},
         0x40330F: {'key': 'lOv3', 'sz': 0xf0},
         0x4033FF: {'key': 'lOv3', 'sz': 0x1e},
         0x4032C0: {'key': 'lOv3', 'sz': 0x1e},
         0x4032DE: {'key': 'lOv3', 'sz': 0x31},
         0x403197: {'key': 'lOv3', 'sz': 0x129},
         0x4030D4: {'key': 'lOv3', 'sz': 0xc3},
         0x402D55: {'key': 'D0l1', 'sz': 0xfa},
         0x402C25: {'key': 'D0l1', 'sz': 0x112},
         0x402D37: {'key': 'D0l1', 'sz': 0x1e},
         0x4027E9: {'key': 'D0l1', 'sz': 0x44},
         0x4029B9: {'key': 'D0l1', 'sz': 0xe6},
         0x402B2B: {'key': 'D0l1', 'sz': 0xfa},
         0x4028B5: {'key': 'D0l1', 'sz': 0xe6},
         0x40299B: {'key': 'D0l1', 'sz': 0x1e},
         0x40271C: {'key': 'D0l1', 'sz': 0xcd},
         0x402A9F: {'key': 'D0l1', 'sz': 0x4e},
         0x402AED: {'key': 'D0l1', 'sz': 0x3e},
         0x40282D: {'key': 'D0l1', 'sz': 0x44},
         0x402871: {'key': 'D0l1', 'sz': 0x44},
         0x4020E2: {'key': 'HuNgRYT1m3', 'sz': 0x18d},
         0x40201F: {'key': 'HuNgRYT1m3', 'sz': 0xc3},
         0x401B0A: {'key': 'F0uRS3aS0n', 'sz': 0xf0},
         0x4019F2: {'key': 'F0uRS3aS0n', 'sz': 0xfa},
         0x401AEC: {'key': 'F0uRS3aS0n', 'sz': 0x1e},
         0x40192C: {'key': 'F0uRS3aS0n', 'sz': 0xa8},
         0x4019D4: {'key': 'F0uRS3aS0n', 'sz': 0x1e},
         0x4016D0: {'key': 'F0uRS3aS0n', 'sz': 0xc3},
         0x4011BB: {'key': 'T1kT4kT0Kk', 'sz': 0x131},
         0x400F25: {'key': 'T1kT4kT0Kk', 'sz': 0xdc},
         0x40108B: {'key': 'T1kT4kT0Kk', 'sz': 0x130},
         0x401001: {'key': 'T1kT4kT0Kk', 'sz': 0x1e},
         0x40101F: {'key': 'T1kT4kT0Kk', 'sz': 0x4e},
         0x40106D: {'key': 'T1kT4kT0Kk', 'sz': 0x1e},
         0x400DE7: {'key': 'T1kT4kT0Kk', 'sz': 0x120},
         0x400F07: {'key': 'T1kT4kT0Kk', 'sz': 0x1e},
         0x400C8C: {'key': 'T1kT4kT0Kk', 'sz': 0x15b}
         }

def xor(code, key):
    new_code = ''
    for x in range(len(code)):
        new_code += chr(ord(code[x]) ^ ord(key[x % len(key)]))
    return new_code


def fix_file(addr, length, content, key):
    offset = 0x40341D - addr
    start = 13341
    source = content[start-offset:start-offset+length]
    original = content[:start-offset]
    original += xor(source, key)
    original += content[start-offset+length:]
    return original


def patch():
    f1 = open("KingMaker", "rb")
    content = f1.read()
    f1.close()
    for addr in fixes:
        key = fixes[addr]['key']
        sz = fixes[addr]['sz']
        content = fix_file(addr, sz, content, key)

    f2 = open("Patched.bin", "wb")
    f2.write(content)
    f2.close()


if __name__ == '__main__':
    patch()

Exploration

import angr
from angr.exploration_techniques.dfs import DFS

GOAL = 0x400B58
XOR_FUNC = 0x400AB9
RUN_TEST = 0x400A16
DECRYPT_FUNC = 0x401793

inputs = {}
LE = angr.archinfo.Endness.LE


class scanf_hook(angr.SimProcedure):
    def run(self):
        global inputs
        if self.state.solver.eval(self.state.regs.rdi) == 0x4041F9:
            # %s
            return
        elif self.state.solver.eval(self.state.regs.rdi) == 0x40394F:
            # %d
            expr = self.state.solver.BVS('inp', 32)
            self.state.solver.add(expr > 0)
            self.state.solver.add(expr < 5)
            block_addr = list(self.state.history.bbl_addrs)[-3]
            inputs[block_addr] = expr
            self.state.memory.store(self.state.regs.rsi, expr, endness=LE)
        else:
            print("!!!!!!!!!!!!!New condition!!!!!!!!!!!!!!")
            print(self.state.solver.eval(self.state.regs.rdi))

def get_target(cfg):
    targets = []
    for addr in cfg.kb.callgraph.predecessors(GOAL):
        sites = []
        func = cfg.functions.function(addr)
        for x in func.get_call_sites():
            if func.get_call_target(x) == GOAL:
                b = cfg.get_any_node(x).block
                ins = b.capstone.insns[-3]
                op = ins.insn.operands[1]
                if op.imm == 1:
                    targets.append(x)
    return targets


def apply_constraints(state):
    global inputs
    # This is the address which get_target returns
    if state.addr != 0x400C3C:
        return False
    stats = [0x60716C, 0x607170, 0x607174, 0x607178, 0x60717C]
    for addr in stats:
        expr = state.memory.load(addr, 4, endness=LE)
        state.solver.add(expr == 5)

    return state.satisfiable()


def explore():
    p = angr.Project('./Patched.bin', load_options={'auto_load_libs': False})
    # cfg = p.analyses.CFG()
    # target = get_target(cfg)

    p.hook_symbol('__isoc99_scanf', scanf_hook())
    p.hook(XOR_FUNC, angr.SIM_PROCEDURES['stubs']['Nop']())
    p.hook(RUN_TEST, angr.SIM_PROCEDURES['stubs']['ReturnUnconstrained']())
    p.hook(DECRYPT_FUNC, angr.SIM_PROCEDURES['stubs']['ReturnUnconstrained']())

    # This one takes nearly 2 hours to find a path
    # s = p.factory.blank_state(addr=0x403780)

    # And this one takes 40 minutes
    s = p.factory.blank_state(addr=0x403197)
    pg = p.factory.simulation_manager(s, auto_drop=['avoid', 'unsat'])
    pg.use_technique(DFS())
    pg.explore(find=apply_constraints, avoid=GOAL)

    for state in pg.found:
        print("Inputs")
        for x in list(state.history.bbl_addrs):
            if x in inputs.keys():
                print(state.solver.eval(inputs[x]))


if __name__ == '__main__':
    explore()

Nuit Du Hack CTF 2017 Quals Matriochka Write up

This was a reverse engineering challenge that consisted of 4 parts. I’ll just be describing part 2 over here. If I manage to solve part 3, I’ll update that too.

Part 1 was really simple. All you need to do is to run an ltrace and you’ll see that argv[1] is being reversed and then compared with a string “Tr4laLa!!!”.

So once you pass step1, the binary then goes on to decrypt and print out the step2 binary.

Step2.bin is relatively trickier. Taking a quick look at the CFG might suggest that this is another job for angr. But that’s where the challenge slaps you across your face.

The first few BBL’s are verifying whether the argv[0] is “./step2.bin”. After those checks, it goes on to check if  length of argv[1] is 33 argv[1][0] is “W”. That gives us the length and first character of our flag. Next it goes on to do some messed up operations.

This is the part where it gets tricky.Screenshot from 2017-04-05 14-32-22

At the time of execution of the first instruction, rax contains a pointer to a copy of our argument stored in the heap via an strdup call. Also, a copy of rax is saved in r12. So rbx contains the pointer to the byte at offset 1 of our input.

It then calls these functions, sigfillset and sigaction. After a bit of googling, I found out the reason behind the usage of sigfillset and sigaction.

Usually, when you wish to add signal handlers for multiple signals, there might arise a scenario where two signals are caught. The first signal causes the appropriate signal handler to be invoked. Now if the second signal is caught while the signal handler of the first signal was being executed, that causes trouble. Hence, the method that is usually followed to avoid this scenario is to block all signals if a signal handler is being executed. Sigaction decides which all signals to block based on a data structure called set. Sigfillset basically fills this set with all signals. So a combination of these two simply blocks all new signals if a signal handler is under execution.

Now that that’s clear, lets get back to the code. The signal handler is assigned to the signal identified by 5 which is the SIGTRAP signal. There are two signal handlers in this code ie inc_byte and dec_byte. The signal handler assigned is dependent on the value of the LSB of the byte at rbx-1.

So what happens here is that, after the signal handler is set for SIGTRAP, the program executes an int 3 instruction. The int 3 instruction basically generates the SIGTRAP signal. This causes the signal handler to be invoked. The signal handler in turn does either an inc [rax] or dec [rax] and returns.

This process is repeated until a counter , initialised to 0, becomes equal to [rbx-1].

So basically, inp[1] becomes inp[1]-inp[0] or inp[1]+inp[0] depending upon the LSB of inp[0].

This whole process is then repeated until all the characters of the input have been changed this way.

What happens after this is a basic xoring and comparing with some fixed values in the bss segment. If our input passes all those checks, we get the “Well done” string printed out along the the step3.bin.

So, the first thing I did was to use angr to find out the values that could pass the final xoring and comparing blocks. And that angr did very well.

Now I had a string which was the result of the increment/decrement loops. Keep in mind that we already know the first byte of the flag. So what’s left is to whip up a simple script that can reverse the functionality of this loop.

And after that’s done. We’re left with the flag and the next challenge.

Anyway, here’s the script.

import angr


def get_constant():
    p = angr.Project("./step2.bin", load_options={'auto_load_libs': False})
    s = p.factory.blank_state(addr=0x4008DB)
    val = s.se.BVS('val', 33*8)
    s.memory.store(0x6335b0, val)
    s.regs.r12 = s.se.BVV(0x6335b0, 64)
    pg = p.factory.path_group(s)
    pg.explore(find=0x400906, avoid=0x4009BC)
    for pp in pg.found:
        return pp.state.se.any_str(val)


def get_arg(out):
    result = "W"
    for x in xrange(1, len(out)):
        c = out[x-1]
        d = out[x]
        flag = ord(c) & 1
        if flag == 1:
            result += chr((ord(d)+ord(c)) % 256)
        else:
            result += chr((ord(d)-ord(c)) % 256)
    return result


if __name__ == "__main__":
    out = get_constant()
    arg = get_arg("W"+out[1:])
    print arg

Flag : When_i_grow_up,_I_will_be_a_flag

Angr’ing through catalyst

I had played Alex CTF last week. It was a really well conducted CTF with some pretty interesting challenges. Amusingly the whole CTF was managed by one single guy. Hats off to you @oddcoder !

So there was this one reversing challenge Catalyst, which seemed to be a pretty straightforward task. Get the username and password, connect to the server, get the flag.

So I started digging into the assembly code. There was a sleep functionality which I had not given much importance. It would’ve made debugging hell if one of my team-mates hadn’t been kind enough to patch the binary and send it to me before I started with GDB.

I solved this challenge during the CTF using angr and some reverse engineering. But then I decided to see if I could do the whole process with angr. And well, this post is the result of that.

There are 4 functions of importance in the binary apart from main. The first function called performs a calculation of the length of the username that we have provided. This function then calls another function with the length calculated. So I used angr to find out a an integer value that might allow us to pass the checks in the said function.

And a value of 8 would be able to pass that function. But apparently, 8 was a lower bound. The actual length of the username was 12.

This I found by first using angr to get a username and manually testing it out in GDB. This had to be done by using angr to get a string that could pass the checks in the second function that main calls. I had gotten the username as ‘catalyst’ but GDB was telling me that it wasn’t the correct username. So I tweaked my script just a little bit and then the whole username got printed out which turned out to be ‘catalyst_ceo’.

Now I got to the third function which was checking whether or not the password we entered was correct. Now this function threw my script off a few times when I tried to use angr. So I resorted to manually reversing the function. I ended up getting the correct password eventually.

Now the problem with the last function (I’m gonna call it check_flag ) is that it was calling rand and then using the return value of that and comparing it with our input password. Upon closer inspection, we see that the value passed to srand is a sum of our username string taken 4 bytes at a time. So the return values of rand are predictable.

So, I decided to use angr’s hooking functionality. Thanks @rhelmot for the global_variables fix.

The idea here is to make angr redirect calls to rand to a function that I decide. And that function would return the values that rand should if provided the correct seed. So I used angr’s hook_symbol functionality.

I ran into a problem at this point. My hooking function had to return different values every time it was called. So I asked @rhelmot about this and he made this really quick tweak in the SimProcedure class.

Thanks to him, now we can assign global variables to states. And these global variables get passed down every time a state is copied. So at every point of time, the state has these global variables. Including, inside my function.

Angr was taking a lot of time to print out the whole 40 byte password in my machine (my machine is not very powerful ). But then I noticed that the binary was actually checking 4 bytes of the password at a time. So I changed my script to use angr and find 4 bytes at a time and then combine them all together at the end.

Now, I could have just hard coded the addresses in the script. But what fun would that be ?

I decided to play around with the CFG that angr creates to get those addresses. Granted it took some extra time and a little bit of dirty heuristics, but it was worth it.

Now all I gotta do is execute my script and get a cup of coffee and come back and stare at the username and password on my screen which I could’ve easily done in much smaller time frames. Totally worth it.

Anyway, below’s the commented script. Feel free to leave comments.

import angr
import IPython
from ctypes import CDLL


def get_uname(p):
    s = p.factory.blank_state(addr=0x400cdd)
    uname = s.se.BVS('uname', 24*8)
    s.memory.store(0x6021e0, uname)
    s.regs.rdi = s.se.BVV(0x6021e0, 64)
    pg = p.factory.path_group(s)
    pg.explore(find=0x400d90, avoid=0x400d7c)
    for p in pg.found:
        state = p.state
        return state.se.any_str(uname)


def get_password(p, name):
    cfg = p.analyses.CFGAccurate()
    starts = []
    avoids = []
    rands = []
    f = cfg.functions.get(0x400977)
    call_sites = f.get_call_sites()
    for address, function in cfg.functions.iteritems():
        if function.name == 'rand':
            rand_func = function
            break
    for address, function in cfg.functions.iteritems():
        if function.name == 'puts':
            puts = function
            break
    for x in call_sites:
        target = f.get_call_target(x)
        if target == rand_func.addr:
            starts.append(x+10)
        elif target == puts.addr:
            avoids.append(x)
    starts.sort()
    avoids.sort()
    avoids.pop(0)
    starts.append(0x400c39)
    starts[0] = starts[0]-4

    libc = CDLL("/lib/x86_64-linux-gnu/libc.so.6")
    seed = 0
    for x in range(0, len(name), 4):
        seed += int(name[x:x+4][::-1].encode('hex'), 16)
    libc.srand(seed & 0xffffffff)

    for x in range(10):
        rands.append(libc.rand())

    class rand(angr.simuvex.SimProcedure):
        def run(self):
            retval = self.state.procedure_data.global_variables['rand_val']
            return retval

    p.hook_symbol('rand', rand)
    final_str = ""

    for x in range(10):
        s = p.factory.blank_state(addr=starts[x])
        password = s.se.BVS('password', 32)
        s.regs.ebx = password
        s.procedure_data.global_variables['rand_val'] = rands[x]
        pg = p.factory.path_group(s)
        pg.explore(find=starts[x+1], avoid=avoids[x])
        for path in pg.found:
            state = path.state
            final_str += state.se.any_str(password)[::-1]
        return final_str


if __name__ == '__main__':
    filename = 'catalyst'
    p = angr.Project(filename, load_options={'auto_load_libs': False})
    uname = get_uname(p)
    passwd = get_password(p, uname)
    print "Username = {}".format(uname)
    print "Password = {}".format(passwd)