I’ve still been playing around with Radare2 when I can, and I wanted to see how easy it was to emulate a block of disassembled code using its Evaluable Strings Intermediate Language, or ESIL. For an example, I went back to Eldad Eilam’s Reversing: Secrets of Reverse Engineering.
In Chapter 11, he shows how to reverse a program called Defender.exe that requires a correct username and serial number. There are several protections and anti-reversing tricks this program employs. Specifically, I was looking at some of the sections of code that are encrypted. There are several of these in this program and they all start with a normal looking function followed by some data. That data is decrypted, executed, then encrypted once again.
The executable is available for download in the Downloads section of the book’s page on Wiley. The code I’m going to be referencing is discussed around page 386 (function at 0x4033d1) and page 397 (function at 0x402eef).
Initializing r2 and Observing the Encrypted Code
Assuming you have radare2 and the executable, the initial startup process is straightforward, using the aaa command for initial analysis:
$ r2 Defender.exe
 -- Here be dragons.
[0x00404232]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze len bytes of instructions for references (aar)
[x] Analyze function calls (aac)
[ ] [*] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan))
[0x00404232]>
Seek to the first function and print it out.
[0x00404232]> s 0x4033d1
[0x004033d1]> pd 100
┌ (fcn) fcn.004033d1 268
│ fcn.004033d1 (int arg_3h);
│ ; CALL XREF from 0x0040423f (entry0)
│ 0x004033d1 55 push ebp
│ 0x004033d2 8bec mov ebp, esp
│ 0x004033d4 81ec2c020000 sub esp, 0x22c
│ 0x004033da 53 push ebx
│ 0x004033db 56 push esi
│ 0x004033dc 57 push edi
│ 0x004033dd 68dd344000 push 0x4034dd
│ 0x004033e2 58 pop eax
│ 0x004033e3 8945e0 mov dword [local_20h], eax
│ 0x004033e6 68fd414000 push 0x4041fd ; "_^[....`@"
│ 0x004033eb 58 pop eax
│ 0x004033ec 8945e8 mov dword [local_18h], eax
│ 0x004033ef b8e5344000 mov eax, 0x4034e5
│ 0x004033f4 8905d6344000 mov dword [0x4034d6], eax ; [0x4034d6:4]=0x4034e5
│ 0x004033fa c745f8010000. mov dword [local_8h], 1
│ 0x00403401 837df800 cmp dword [local_8h], 0
│ ┌─< 0x00403405 7466 je 0x40346d
...snip...
│ │ ; JMP XREF from 0x0040346b (fcn.004033d1)
│ └──> 0x004034d5 68e5344000 push 0x4034e5
│ 0x004034da 5b pop ebx
└ 0x004034db ffe3 jmp ebx
...snip...
[0x004033d1]> pd 10 @0x4034e5
; DATA XREF from 0x004033ef (fcn.004033d1)
; DATA XREF from 0x004034d5 (fcn.004033d1)
┌─< 0x004034e5 7e10 jle 0x4034f7
│ 0x004034e7 b98842c1f8 mov ecx, 0xf8c14288
│ 0x004034ec e68b out 0x8b, al
│ 0x004034ee 2b7f7d sub edi, dword [edi + 0x7d]
│ 0x004034f1 f1 int1
│ 0x004034f2 0289dfc5cbc8 add cl, byte [ecx - 0x37343a21]
0x004034f8 b114 mov cl, 0x14
0x004034fa 214f2a and dword [edi + 0x2a], ecx
0x004034fd 6e outsb dx, byte [esi]
0x004034fe 08fd or ch, bh
I’m not going to go into detail on everything going on here since it’s covered in the book. The function starts out like all these encrypted functions do, and towards the end there is a PUSH-POP-JMP sequence. The JMP at 0x4034db will move execution to 0x4034e5. When starting disassembly from that line, it’s clear this doesn’t look like legitimate assembly code because it’s encrypted.
Setting up the ESIL Environment
Using ESIL, it’s fairly simple to get radare2 to decrypt this for us. First set up the ESIL environment:
[0x004033d1]> e asm.emu=true
[0x004033d1]> e asm.emustr=true
[0x004033d1]> e asm.esil=true
The first two commands modify the disassembly so that ESIL information is displayed. The output with asm.emu is verbose, while asm.emustr only shows the most useful information. Setting asm.esil=true shows what the ESIL looks like. The following two snippets show the difference between the asm.emu and asm.esil settings:
[0x004033d1]> s 0x4033d1
[0x004033d1]> e asm.emu=true
[0x004033d1]> pd 10
┌ (fcn) fcn.004033d1 268
│ fcn.004033d1 (int arg_3h);
│ ; CALL XREF from 0x0040423f (entry0)
│ 0x004033d1 55 push ebp ; esp=0xfffffffffffffffc -> 0xffffff00
│ 0x004033d2 8bec mov ebp, esp ; ebp=0xfffffffc -> 0xffffff00
│ 0x004033d4 81ec2c020000 sub esp, 0x22c ; esp=0xfffffdd0 -> 0xffffff00 ; of=0x0 ; sf=0x1 -> 0x3009000 ; zf=0x0 ; pf=0x0 ; cf=0x0
│ 0x004033da 53 push ebx ; esp=0xfffffdcc -> 0xffffff00
│ 0x004033db 56 push esi ; esp=0xfffffdc8 -> 0xffffff00
│ 0x004033dc 57 push edi ; esp=0xfffffdc4 -> 0xffffff00
│ 0x004033dd 68dd344000 push 0x4034dd ; esp=0xfffffdc0 -> 0xffffff00
│ 0x004033e2 58 pop eax ; eax=0xffffffff -> 0xffffff00 ; esp=0xfffffdc4 -> 0xffffff00
│ 0x004033e3 8945e0 mov dword [local_20h], eax
│ 0x004033e6 68fd414000 push 0x4041fd ; esp=0xfffffdc0 -> 0xffffff00
[0x004033d1]> s 0x4033d1
[0x004033d1]> e asm.esil=true
[0x004033d1]> pd 10
┌ (fcn) fcn.004033d1 268
│ fcn.004033d1 (int arg_3h);
│ ; CALL XREF from 0x0040423f (entry0)
│ 0x004033d1 55 ebp,4,esp,-=,esp,=[4] ; esp=0xfffffffffffffffc -> 0xffffff00
│ 0x004033d2 8bec esp,ebp,= ; ebp=0xfffffffc -> 0xffffff00
│ 0x004033d4 81ec2c020000 556,esp,-=,$o,of,=,$s,sf,=,$z,zf,=,$p,pf,=,$b4,cf,= ; esp=0xfffffdd0 -> 0xffffff00 ; of=0x0 ; sf=0x1 -> 0x3009000 ; zf=0x0 ; pf=0x0 ; cf=0x0
│ 0x004033da 53 ebx,4,esp,-=,esp,=[4] ; esp=0xfffffdcc -> 0xffffff00
│ 0x004033db 56 esi,4,esp,-=,esp,=[4] ; esp=0xfffffdc8 -> 0xffffff00
│ 0x004033dc 57 edi,4,esp,-=,esp,=[4] ; esp=0xfffffdc4 -> 0xffffff00
│ 0x004033dd 68dd344000 4207837,4,esp,-=,esp,=[4] ; esp=0xfffffdc0 -> 0xffffff00
│ 0x004033e2 58 esp,[4],eax,=,4,esp,+= ; eax=0xffffffff -> 0xffffff00 ; esp=0xfffffdc4 -> 0xffffff00
│ 0x004033e3 8945e0 eax,0x20,ebp,-,=[4]
│ 0x004033e6 68fd414000 4211197,4,esp,-=,esp,=[4] ; esp=0xfffffdc0 -> 0xffffff00
These options aren’t really needed for the code emulation part, at least as far as I’ve found, but provide some more insight into what r2 will be doing during the emulation. The documentation seems to say that at least asm.emu is needed, but it worked fine for me when I didn’t include that. Maybe more complex code might require it.
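To make the ESIL strings in the asm.esil listing a little easier to read, here is a toy evaluator for a few of the operators that appear there. It is only a sketch to illustrate the postfix, stack-based idea: run_esil, MASK32, and the dict-based registers and memory are my own simplifications, not radare2’s implementation, and flag updates such as $o,of,= are left out. Note that ESIL prints immediates in decimal, so 4207837 in the listing is 0x4034dd.

```python
MASK32 = 0xFFFFFFFF  # keep register values within 32 bits


def run_esil(expr, regs, mem):
    """Evaluate a tiny subset of ESIL (toy model, not radare2's engine).

    regs maps register names to ints; mem maps addresses to dwords.
    """
    stack = []

    def value(tok):
        # A token is a dword already read from memory, a register name,
        # or a numeric literal (decimal or 0x-prefixed hex).
        if isinstance(tok, int):
            return tok
        if tok in regs:
            return regs[tok]
        return int(tok, 0)

    for tok in expr.split(","):
        if tok == "=":        # src,dst,=      -> dst = src
            dst, src = stack.pop(), value(stack.pop())
            regs[dst] = src & MASK32
        elif tok == "-=":     # src,dst,-=     -> dst -= src
            dst, src = stack.pop(), value(stack.pop())
            regs[dst] = (regs[dst] - src) & MASK32
        elif tok == "+=":     # src,dst,+=     -> dst += src
            dst, src = stack.pop(), value(stack.pop())
            regs[dst] = (regs[dst] + src) & MASK32
        elif tok == "=[4]":   # val,addr,=[4]  -> store a dword at addr
            addr, val = value(stack.pop()), value(stack.pop())
            mem[addr] = val & MASK32
        elif tok == "[4]":    # addr,[4]       -> push the dword at addr
            addr = value(stack.pop())
            stack.append(mem.get(addr, 0))
        else:                 # anything else is an operand
            stack.append(tok)
    return regs, mem


if __name__ == "__main__":
    regs = {"eax": 0, "ebp": 0x178000, "esp": 0x178000}
    mem = {}
    run_esil("4207837,4,esp,-=,esp,=[4]", regs, mem)  # push 0x4034dd
    run_esil("esp,[4],eax,=,4,esp,+=", regs, mem)     # pop eax
    print(hex(regs["eax"]), hex(regs["esp"]))
```

Reading the push this way, the operand goes on the stack first, esp is decremented by 4, and the operand is then stored at the new esp, which is the same thing the x86 instruction does.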
[0x004033d1]> e asm.bits=32
[0x004033d1]> e asm.arch=x86
[0x004033d1]> e asm.emuwrite=true
[0x004033d1]> e io.cache=true
The first two commands above also don’t seem to be necessary (maybe because they are the defaults and match the file we’re looking at), but probably don’t hurt to ensure r2 will handle the code correctly. The last two are important: asm.emuwrite allows the emulation to modify memory, and io.cache sends those writes to an in-memory cache rather than to the file itself.
[0x004033d1]> s 0x4033d1
[0x004033d1]> aei
[0x004033d1]> aeim
[0x004033d1]> aeip
[0x004033d1]> aer
oeax = 0x00000000
eax = 0x00000000
ebx = 0x00000000
ecx = 0x004041f9
edx = 0x00000000
esi = 0x00000000
edi = 0x00000000
esp = 0x00177dc4
ebp = 0x00177ffc
eip = 0x004033d1
eflags = 0x00000081
These commands set up the ESIL environment and display the current register values. First, aei initializes the ESIL VM state, aeim initializes the ESIL VM stack, and aeip sets the ESIL program counter to the current address. Finally, aer displays the current register values. The environment is now ready to begin emulating the code. You could step through using debugger-like commands such as aes, aeso, and aec, which are all documented on the ESIL page above. The easiest way for this code, though, is to let it execute until just before jumping to the decrypted code.
[0x004033d1]> aecu 0x4034db
[0x004034d5]> pd 3
│ ; JMP XREF from 0x0040346b (fcn.004033d1)
│ 0x004034d5 68e5344000 push 0x4034e5
│ 0x004034da 5b pop ebx
└ 0x004034db ffe3 jmp ebx
[0x004034d5]> pd 10 @0x4034e5
; DATA XREF from 0x004033ef (fcn.004033d1)
; DATA XREF from 0x004034d5 (fcn.004033d1)
0x004034e5 8b4508 mov eax, dword [ebp + 8] ; [0x8:4]=4
0x004034e8 8945b0 mov dword [ebp - 0x50], eax
0x004034eb 8b45b0 mov eax, dword [ebp - 0x50]
0x004034ee 8b4db0 mov ecx, dword [ebp - 0x50]
0x004034f1 03483c add ecx, dword [eax + 0x3c] ; "PE"
0x004034f4 894da8 mov dword [ebp - 0x58], ecx
0x004034f7 8b45a8 mov eax, dword [ebp - 0x58] ; "PE"
0x004034fa 8b4db0 mov ecx, dword [ebp - 0x50]
0x004034fd 034878 add ecx, dword [eax + 0x78]
0x00403500 894db8 mov dword [ebp - 0x48], ecx
0x4034db contains the “jmp ebx” instruction we looked at earlier, and this aecu command tells r2 to emulate code until it reaches that address. For some reason, r2 stopped at 0x4034d5, but we can see we are at the PUSH-POP-JMP sequence, and if we print the disassembly at 0x4034e5, it’s now decrypted.
Resetting ESIL and Decrypting Another Function
The second code segment starts at 0x402eef.
[0x004034d5]> pd 100 @0x402eef
0x00402eef 55 push ebp
0x00402ef0 8bec mov ebp, esp
0x00402ef2 83ec60 sub esp, 0x60 ; '`'
0x00402ef5 53 push ebx
0x00402ef6 68f62f4000 push 0x402ff6
0x00402efb 58 pop eax
0x00402efc 8945e4 mov dword [ebp - 0x1c], eax
0x00402eff 68e2304000 push 0x4030e2
0x00402f04 58 pop eax
0x00402f05 8945f0 mov dword [ebp - 0x10], eax
0x00402f08 b8fe2f4000 mov eax, 0x402ffe
0x00402f0d 8905ef2f4000 mov dword [0x402fef], eax ; [0x402fef:4]=0x402ffe
0x00402f13 c745f4010000. mov dword [ebp - 0xc], 1
0x00402f1a 837df400 cmp dword [ebp - 0xc], 0
┌─< 0x00402f1e 7466 je 0x402f86 ; unlikely
...snip...
└──> 0x00402fee 68fe2f4000 push 0x402ffe
0x00402ff3 5b pop ebx
0x00402ff4 ffe3 jmp ebx
...snip...
[0x004034d5]> pd 10 @0x402ffe
0x00402ffe b3f8 mov bl, 0xf8
0x00403000 b265 mov dl, 0x65 ; 'e' ; "run in DOS mode....$"
┌─< 0x00403002 7c21 jl 0x403025 ; likely
│ 0x00403004 ce into
│ 0x00403005 c8373783 enter 0x3737, -0x7d
│ 0x00403009 ec in al, dx
│ 0x0040300a 39d6 cmp esi, edx
┌──< 0x0040300c 7614 jbe 0x403022 ; likely
││ 0x0040300e 8e6e0a mov gs, word [esi + 0xa] ; [0xa:2]=0
││ 0x00403011 d98577ff317e fld dword [ebp + 0x7e31ff77]
We can see the same structure in this code: same sequence setting up the function, the PUSH-POP-JMP, and that JMP going to nonsense assembly code.
First, we’ll clear out the ESIL environment to start fresh.
[0x004034d5]> ar0
[0x004034d5]> aeim-
[0x00000000]> aei-
These commands clear the registers, de-initialize the VM stack, and de-initialize the VM state. Now we’ll run all the commands at once, this time stopping at the JMP at 0x402ff4.
[0x00000000]> s 0x402eef
[0x00402eef]> aei
[0x00402eef]> aeim
[0x00402eef]> aeip
[0x00402eef]> aer
oeax = 0x00000000
eax = 0x00000000
ebx = 0x00000000
ecx = 0x00000000
edx = 0x00000000
esi = 0x00000000
edi = 0x00000000
esp = 0x00178000
ebp = 0x00178000
eip = 0x00402eef
eflags = 0x00000000
[0x00402eef]> aecu 0x402ff4
[0x00402fee]> pd 10 @0x402ffe
0x00402ffe 33c0 xor eax, eax
0x00403000 40 inc eax
┌─< 0x00403001 0f84c0000000 je 0x4030c7 ; unlikely
│ 0x00403007 0f31 rdtsc
│ 0x00403009 8945f8 mov dword [ebp - 8], eax
│ 0x0040300c 8955fc mov dword [ebp - 4], edx
│ 0x0040300f a100604000 mov eax, dword [section_end..data] ; [0x406000:4]=-1 ; LEA section_end..data ; section_end..data
│ 0x00403014 8945b0 mov dword [ebp - 0x50], eax
│ 0x00403017 8b45b0 mov eax, dword [ebp - 0x50]
│ 0x0040301a 833800 cmp dword [eax], 0
As you can see, the emulation worked and the code is now decrypted.
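Rather than eyeballing the disassembly, one way to confirm that a region was rewritten is to snapshot its bytes before and after the emulation. The sketch below assumes an r2pipe-style object with a cmd() method (r2pipe is covered in the next section) and uses r2’s p8 command to print raw bytes as hex. The function names bytes_at and changed_after are my own, and it assumes asm.emuwrite and io.cache are already enabled and the ESIL VM is initialized.

```python
def bytes_at(r2, addr, length=16):
    """Return `length` bytes at `addr` as a hex string via r2's p8 command."""
    return r2.cmd("p8 {} @ {:#x}".format(length, addr)).strip()


def changed_after(r2, command, target, length=16):
    """Run one r2 command and report whether the bytes at `target` changed.

    Returns a (changed, before, after) tuple so the caller can log both
    snapshots if desired.
    """
    before = bytes_at(r2, target, length)
    r2.cmd(command)
    after = bytes_at(r2, target, length)
    return before != after, before, after
```

For the second function above, a call such as changed_after(r2, 'aecu 0x402ff4', 0x402ffe) should report a change if the emulation wrote the decrypted bytes into the cache.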
Scripting the Commands with r2pipe
All of these commands can be put in a simple Python r2pipe script as follows:
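$ cat defender.py
import r2pipe

# With no arguments, open() attaches to the r2 session that runs this script
r = r2pipe.open()
r.cmd('aaa')

# ESIL display options and the settings that let emulation write to memory
r.cmd('e asm.emu=true')
r.cmd('e asm.emustr=true')
r.cmd('e asm.bits=32')
r.cmd('e asm.arch=x86')
r.cmd('e asm.emuwrite=true')
r.cmd('e io.cache=true')

# First function: initialize ESIL and emulate through the decryption
r.cmd('s 0x4033d1')
r.cmd('aei')
r.cmd('aeim')
r.cmd('aeip')
r.cmd('aer')
# 0x40346b is the jump into the PUSH-POP-JMP sequence (the walkthrough above used 0x4034db)
r.cmd('aecu 0x0040346b')

# Reset the ESIL environment
r.cmd('ar0')
r.cmd('aeim-')
r.cmd('aei-')

# Second function: same steps, stopping at the JMP at 0x402ff4
r.cmd('s 0x402eef')
r.cmd('aei')
r.cmd('aeim')
r.cmd('aeip')
r.cmd('aer')
r.cmd('aecu 0x402ff4')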
Finally, the script can be executed, then the functions reviewed as follows:
$ r2 -i defender.py Defender.exe
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze len bytes of instructions for references (aar)
[x] Analyze function calls (aac)
[ ] [*] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan))
 -- r2 -- leading options since 2006
[0x00402fee]> pd 10 @0x4034e5
; DATA XREF from 0x004033ef (fcn.004033d1)
; DATA XREF from 0x004034d5 (fcn.004033d1)
0x004034e5 8b4508 mov eax, dword [ebp + 8] ; [0x8:4]=4
0x004034e8 8945b0 mov dword [ebp - 0x50], eax
0x004034eb 8b45b0 mov eax, dword [ebp - 0x50]
0x004034ee 8b4db0 mov ecx, dword [ebp - 0x50]
0x004034f1 03483c add ecx, dword [eax + 0x3c] ; "PE"
0x004034f4 894da8 mov dword [ebp - 0x58], ecx
0x004034f7 8b45a8 mov eax, dword [ebp - 0x58] ; "PE"
0x004034fa 8b4db0 mov ecx, dword [ebp - 0x50]
0x004034fd 034878 add ecx, dword [eax + 0x78]
0x00403500 894db8 mov dword [ebp - 0x48], ecx
[0x00402fee]> pd 10 @0x402ffe
0x00402ffe 33c0 xor eax, eax
0x00403000 40 inc eax
┌─< 0x00403001 0f84c0000000 je 0x4030c7 ; unlikely
│ 0x00403007 0f31 rdtsc
│ 0x00403009 8945f8 mov dword [ebp - 8], eax
│ 0x0040300c 8955fc mov dword [ebp - 4], edx
│ 0x0040300f a100604000 mov eax, dword [section_end..data] ; [0x406000:4]=-1 ; LEA section_end..data ; section_end..data
│ 0x00403014 8945b0 mov dword [ebp - 0x50], eax
│ 0x00403017 8b45b0 mov eax, dword [ebp - 0x50]
│ 0x0040301a 833800 cmp dword [eax], 0
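Since Defender.exe has several of these encrypted sections, the repeated reset, seek, init, and aecu steps could be folded into a small helper. This is only a sketch under the same assumptions as the walkthrough: emulate_regions and the SETUP, RESET, and INIT lists are names I made up, the r2 argument is any object with an r2pipe-style cmd() method, and each region is a (start, stop) address pair that you have already identified by hand.

```python
# The same r2 commands used in the walkthrough, grouped by purpose.
SETUP = ["e asm.emuwrite=true", "e io.cache=true"]  # let emulation write to the io cache
RESET = ["ar0", "aeim-", "aei-"]                    # clear registers, VM stack, VM state
INIT = ["aei", "aeim", "aeip"]                      # init VM state, stack, program counter


def emulate_regions(r2, regions):
    """Emulate each (start, stop) pair in turn, resetting ESIL in between."""
    for command in SETUP:
        r2.cmd(command)
    for index, (start, stop) in enumerate(regions):
        if index > 0:
            # Start every region after the first from a clean ESIL environment.
            for command in RESET:
                r2.cmd(command)
        r2.cmd("s {:#x}".format(start))
        for command in INIT:
            r2.cmd(command)
        r2.cmd("aecu {:#x}".format(stop))
```

Inside a script run with r2 -i, this could be called as emulate_regions(r2pipe.open(), [(0x4033d1, 0x4034db), (0x402eef, 0x402ff4)]) after importing r2pipe. Passing the pipe object in as an argument keeps the helper easy to exercise with a stand-in object when r2 isn’t available.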