mirror of https://github.com/fdiskyou/Zines.git
2607 lines
136 KiB
Plaintext
2607 lines
136 KiB
Plaintext
Subverting PatchGuard Version 2
|
|
Skywing
|
|
12/2006
|
|
skywing@valhallalegends.com
|
|
http://www.nynaeve.net
|
|
|
|
1) Foreword
|
|
|
|
Abstract: Windows Vista x64 and recently hotfixed versions of the Windows
|
|
Server 2003 x64 kernel contain an updated version of Microsoft's kernel-mode
|
|
patch prevention technology known as PatchGuard. This new version of
|
|
PatchGuard improves on the previous version in several ways, primarily dealing
|
|
with attempts to increase the difficulty of bypassing PatchGuard from the
|
|
perspective of an independent software vendor (ISV) deploying a driver that
|
|
patches the kernel. The feature-set of PatchGuard version 2 is otherwise
|
|
quite similar to PatchGuard version 1; the SSDT, IDT/GDT, various MSRs, and
|
|
several kernel global function pointer variables (as well as kernel code) are
|
|
guarded against unauthorized modification. This paper proposes several
|
|
methods that can be used to bypass PatchGuard version 2 completely. Potential
|
|
solutions to these bypass techniques are also suggested. Additionally, this
|
|
paper describes a mechanism by which PatchGuard version 2 can be subverted to
|
|
run custom code in place of PatchGuard's system integrity checking code, all
|
|
while leaving no traces of any kernel patching or custom kernel drivers loaded
|
|
in the system after PatchGuard has been subverted. This is particularly
|
|
interesting from the perspective of using PatchGuard's defenses to hide kernel
|
|
mode code, a goal that is (in many respects) completely contrary to what
|
|
PatchGuard is designed to do.
|
|
|
|
Thanks: The author would like to thank skape, bugcheck, and Alex Ionescu.
|
|
|
|
Disclaimer: This paper is presented in the interest of education and the
|
|
furthering of general public knowledge. The author cannot be held responsible
|
|
for any potential use (or misuse) of the information disclosed in this paper.
|
|
While the author has attempted to be as vigilant as possible with respect to
|
|
ensuring that this paper is accurate, it is possible that one or more mistakes
|
|
might remain. If such an inaccuracy or mistake is located, the author would
|
|
appreciate being notified so that the appropriate corrections can be made.
|
|
|
|
2) Introduction
|
|
|
|
With x64 versions of the Windows kernel, Microsoft has attempted to take an
|
|
aggressive stance[1] against the use of a certain class of techniques that have
|
|
been frequently used to ``extend'' the kernel in potentially unsafe fashions
|
|
on previous versions of Windows. This includes patching the kernel itself,
|
|
hooking the kernel's system service tables, redirecting interrupt handlers,
|
|
and several other less common techniques for intercepting control of execution
|
|
before the kernel is reached, such as the alternation of the system call
|
|
target MSR.
|
|
|
|
The technology that Microsoft has deployed to prevent the unauthorized
|
|
patching of the kernel that has been historically rampant on x86 is known as
|
|
PatchGuard. This technology was initially released with Windows Server 2003
|
|
x64 Edition and Windows XP x64 Edition (known as PatchGuard version 1). The
|
|
x64 editions of Windows Vista, and recently hotfixed versions of the Windows
|
|
Server 2003 x64 kernel contain a newer version of the PatchGuard technology,
|
|
known as PatchGuard version 2. The new version is designed to make it
|
|
significantly more difficult for independent software vendors (ISVs) to
|
|
deploy, in the field, solutions that involve patching the kernel after
|
|
disabling the kernel patch protection mechanisms afforded by PatchGuard. The
|
|
inner details of PatchGuard itself are much the same as they were in
|
|
PatchGuard version 1 and thus will not be discussed in detail in this paper
|
|
(excluding version 2's improved anti-debugging and anti-patch technologies).
|
|
A sufficiently interested reader wishing some more background information on
|
|
the subject may find out more about how PatchGuard version 1 functions in
|
|
Uninformed's previous article [2] on the subject, ``Bypassing PatchGuard on
|
|
Windows x64''.
|
|
|
|
PatchGuard version 2 takes the original PatchGuard release and attempts to
|
|
plug various holes in its implementation of an obfuscation-based anti-patching
|
|
system. In this respect, it has met some mixed success and failure. Although
|
|
the new PatchGuard version does, on the surface, appear to disable the
|
|
majority of the bypass techniques that had been proposed [2] as means to disable
|
|
the original PatchGuard release, at least several of these techniques may be
|
|
fairly trivially re-enabled through some minor alterations or additional new
|
|
code. Furthermore, it is still possible to bypass PatchGuard version 2
|
|
without relying on dangerous (version-specific) constructs such as hard-coded
|
|
offsets or code fingerprinting on frequently changing code. Additionally,
|
|
aside from techniques that are based on disabling PatchGuard itself, there
|
|
still exist several potential bypass mechanisms that have a strong potential
|
|
to be ``future-compatible'' with new PatchGuard versions by virtue of
|
|
preventing PatchGuard from even detecting that unauthorized alternations to
|
|
the kernel have been made (and thus isolating themselves from any
|
|
obfuscation-based changes to how PatchGuard's system integrity check is
|
|
invoked). To Microsoft's credit, however, the resilience of PatchGuard to
|
|
being debugged and analyzed has been significantly improved (at least with
|
|
regard to certain key steps, such as initialization at boot time).
|
|
|
|
3) Notable Protection Mechanisms
|
|
|
|
PatchGuard version 2 implements a variety of anti-debug, anti-analysis, and
|
|
obfuscation mechanisms that are worth covering. Not all of PatchGuard's
|
|
defenses are covered in detail in this paper, and those mechanisms (such as
|
|
the obfuscation of PatchGuard's internal data structures) that are at least
|
|
the same in principle as the previous PatchGuard release (and were already
|
|
disclosed by Uninformed's previous article [2] on PatchGuard) are additionally
|
|
not covered by this paper.
|
|
|
|
3.1) Anti-Debug Code During Initialization
|
|
|
|
That being said, there are still a number of interesting things to examine as
|
|
far as PatchGuard's protection mechanisms go. Many of these techniques are on
|
|
their own worthy of discussion, simply from the perspective of their worth as
|
|
general debug/analysis protection mechanisms. PatchGuard version 2 begins as
|
|
an appended addition to the nt!SepAdtInitializePrivilegeAuditing routine in
|
|
the kernel (PatchGuard version 2 continues the tactic of misleading and/or
|
|
bogus function names that PatchGuard version 1 introduced). This routine is
|
|
responsible for performing the bulk of PatchGuard's initialization, including
|
|
setting up the encrypted PatchGuard context data structures. Unlike
|
|
PatchGuard version 1, the initialization routine is littered with statements
|
|
that are intended to frustrate debugging, such as the following construct that
|
|
enters an infinite loop if a debugger is connected (this particular construct
|
|
is used in many places during PatchGuard initialization):
|
|
|
|
cli
|
|
cmp cs:KdDebuggerNotPresent, r12b
|
|
jnz short continue_initialization_1
|
|
infinite_loop_1:
|
|
jmp short infinite_loop_1
|
|
sti
|
|
|
|
This particular approach is not all that robust as currently implemented in
|
|
PatchGuard version 2 today. It remains relatively easy to detect these
|
|
references to nt!KdDebuggerNotPresent ahead of time, and disable them. If
|
|
Microsoft had elected to corrupt the execution context in a creative way on
|
|
each occurrence (such as zeroing some registers, or otherwise arranging for a
|
|
failure to occur much later on if a debugger was attached) before entering the
|
|
forever loop, then these constructs might have been slightly effective as far
|
|
as anti-debugging goes.
|
|
|
|
Other constructs include the highly obfuscated selection of a randomized set
|
|
of bogus pool tags used to allocate PatchGuard data structures. Like
|
|
PatchGuard version 1, PatchGuard version 2 uses a randomly chosen bogus pool
|
|
tag and randomly adjusted allocation sizes in an attempt to frustrate easy
|
|
detection of the PatchGuard context in-memory by scanning pool allocations.
|
|
The following is an example of one of the sections of code used by PatchGuard
|
|
to randomly pick a pool tag and random allocation delta from a list of
|
|
possible pool tags. The actual allocation size is the random allocation delta
|
|
plus the minimum size of the PatchGuard context structure, truncated at 2048
|
|
bytes. Here, the rdtsc instruction is used for random number generation
|
|
purposes (readers that have examined the previous [2] PatchGuard paper may
|
|
recognize this random number generation construct; it is used throughout
|
|
PatchGuard anywhere a random quantity is required).
|
|
|
|
;
|
|
; Generate a random value, using rdtsc.
|
|
;
|
|
lea ebx, [r14+r13+200h]
|
|
mov dword ptr [rsp+0A28h+Timer], ebx
|
|
rdtsc
|
|
mov r10, qword ptr [rsp+0A28h+arg_5F8]
|
|
shl rdx, 20h
|
|
mov r11, 7010008004002001h
|
|
or rax, rdx
|
|
mov rcx, r10
|
|
xor rcx, rax
|
|
lea rax, [rsp+0A28h+var_2C8]
|
|
xor rcx, rax
|
|
mov rax, rcx
|
|
ror rax, 3
|
|
xor rcx, rax
|
|
mov rax, r11
|
|
mul rcx
|
|
mov [rsp+0A28h+var_2C8], rax
|
|
xor eax, edx
|
|
mov [rsp+0A28h+arg_1F0], rdx
|
|
;
|
|
; This is essentially a switch(eax & 7), where eax
|
|
; is a random value. Each case statement selects
|
|
; a unique obfuscated pooltag value. The magical
|
|
; 0x432E10h constant below is the offset used to
|
|
; jump to the switch case handler selected.
|
|
;
|
|
lea rdx, cs:400000h
|
|
and eax, 7
|
|
mov ecx, [rdx+rax*4+432E10h]
|
|
add rcx, rdx
|
|
jmp rcx
|
|
--------------------------------------------------
|
|
mov dword ptr [rsp+0A28h+var_9D8], 0D098D0D8h
|
|
mov r9d, dword ptr [rsp+0A28h+var_9D8]
|
|
ror r9d, 6
|
|
jmp DoAllocation
|
|
--------------------------------------------------
|
|
mov dword ptr [rsp+0A28h+var_9D8], 0B2AD31A1h
|
|
mov r9d, dword ptr [rsp+0A28h+var_9D8]
|
|
rol r9d, 1
|
|
jmp DoAllocation
|
|
--------------------------------------------------
|
|
mov dword ptr [rsp+0A28h+var_9D8], 85B5910Dh
|
|
mov r9d, dword ptr [rsp+0A28h+var_9D8]
|
|
ror r9d, 2
|
|
jmp DoAllocation
|
|
--------------------------------------------------
|
|
mov dword ptr [rsp+0A28h+var_9D8], 0A8223938h
|
|
mov r9d, dword ptr [rsp+0A28h+var_9D8]
|
|
xor r9d, 3
|
|
ror r9d, 0Fh
|
|
jmp DoAllocation
|
|
--------------------------------------------------
|
|
mov dword ptr [rsp+0A28h+var_9D8], 67076494h
|
|
mov r9d, dword ptr [rsp+0A28h+var_9D8]
|
|
rol r9d, 4
|
|
jmp DoAllocation
|
|
--------------------------------------------------
|
|
mov dword ptr [rsp+0A28h+var_9D8], 288C49EDh
|
|
mov r9d, dword ptr [rsp+0A28h+var_9D8]
|
|
ror r9d, 5
|
|
jmp DoAllocation
|
|
--------------------------------------------------
|
|
mov dword ptr [rsp+0A28h+var_9D8], 4E574672h
|
|
mov r9d, dword ptr [rsp+0A28h+var_9D8]
|
|
xor r9d, 6
|
|
ror r9d, 18h
|
|
jmp DoAllocation
|
|
--------------------------------------------------
|
|
DoAllocation:
|
|
;
|
|
; Get another random value (for the allocation size),
|
|
; and deobfuscate the pooltag value that was selected.
|
|
;
|
|
; Eventually, the value ending up in "r9d" is used as
|
|
; the pooltag value.
|
|
;
|
|
rdtsc
|
|
shl rdx, 20h
|
|
mov rcx, r10
|
|
or rax, rdx
|
|
xor rcx, rax
|
|
lea rax, [rsp+0A28h+var_858]
|
|
xor rcx, rax
|
|
mov rax, rcx
|
|
ror rax, 3
|
|
xor rcx, rax
|
|
mov rax, r11
|
|
mul rcx
|
|
mov [rsp+0A28h+ValueName], rdx
|
|
mov r9, rax
|
|
mov [rsp+0A28h+var_858], rax
|
|
xor r9d, edx
|
|
mov eax, 4EC4EC4Fh
|
|
mov ecx, r9d
|
|
mul r9d
|
|
shr edx, 3
|
|
shr r9d, 5
|
|
mov r8d, r9d
|
|
mov eax, 4EC4EC4Fh
|
|
imul edx, 1Ah
|
|
sub ecx, edx
|
|
add ecx, 61h
|
|
shl ecx, 8
|
|
mul r9d
|
|
shr edx, 3
|
|
shr r9d, 5
|
|
mov eax, 4EC4EC4Fh
|
|
imul edx, 1Ah
|
|
sub r8d, edx
|
|
mul r9d
|
|
add r8d, 41h
|
|
mov eax, 4EC4EC4Fh
|
|
or r8d, ecx
|
|
shr edx, 3
|
|
mov ecx, r9d
|
|
shr r9d, 5
|
|
shl r8d, 8
|
|
imul edx, 1Ah
|
|
sub ecx, edx
|
|
add ecx, 61h
|
|
or ecx, r8d
|
|
shl ecx, 8
|
|
mul r9d
|
|
shr edx, 3
|
|
imul edx, 1Ah
|
|
sub r9d, edx
|
|
add r9d, 41h
|
|
or r9d, ecx
|
|
rdtsc
|
|
shl rdx, 20h
|
|
mov rcx, r10
|
|
mov r8d, r9d ; Tag
|
|
or rax, rdx
|
|
xor rcx, rax
|
|
lea rax, [rsp+0A28h+var_2E8]
|
|
xor rcx, rax
|
|
mov rax, rcx
|
|
ror rax, 3
|
|
xor rcx, rax
|
|
mov rax, r11
|
|
mul rcx
|
|
;
|
|
; Perform the actual allocation. We're requesting NonPagedPool,
|
|
; with the random pooltag selected by the deobfuscation and
|
|
; randomization code above. The actual size of the block being
|
|
; allocated here is given in ebx, with a random "fuzz factor" that
|
|
; is added to this minimum allocation size, then truncated to a
|
|
; maximum of 2047 bytes.
|
|
;
|
|
xor ecx, ecx ; PoolType
|
|
mov [rsp+0A28h+var_310], rdx
|
|
xor rdx, rax
|
|
mov [rsp+0A28h+var_2E8], rax
|
|
and edx, 7FFh
|
|
add edx, ebx ; NumberOfBytes
|
|
call ExAllocatePoolWithTag
|
|
|
|
3.2) Expanded Set of DPC Routines
|
|
|
|
Other protection mechanisms used in PatchGuard version 2 include an expanded
|
|
set of DPC routines used to arrange for the execution of the PatchGuard
|
|
integrity check routine. Recall that in PatchGuard version 1, there existed a
|
|
set of three possible DPC routines. In PatchGuard version 2, this set of
|
|
potential DPC routines that can be repurposed for PatchGuard's use has been
|
|
expanded to ten possibilities. One DPC routine is selected at boot time from
|
|
this set of ten possiblities, and from that point is used for all further
|
|
PatchGuard operations for the lifetime of the session. The fact that only one
|
|
DPC routine is used in a particular Windows session is a weakness that is
|
|
inherited from the previous PatchGuard version (as the reader will discover,
|
|
eventually comes in handy if one is set on bypassing PatchGuard). The DPC
|
|
routine to be used for the current boot session is selected in the
|
|
nt!SepAdtInitializePrivilegeAuditing routine, much the same as how the bogus
|
|
pooltag to be used for all PatchGuard allocations is selected:
|
|
|
|
INIT:0000000000832741:
|
|
PatchGuard_Pick_Random_DPC:
|
|
;
|
|
; Use the time stamp counter as a random seed.
|
|
;
|
|
rdtsc
|
|
shl rdx, 20h
|
|
mov rcx, r15
|
|
or rax, rdx
|
|
xor rcx, rax
|
|
lea rax, [rsp+0A28h+var_360]
|
|
xor rcx, rax
|
|
mov rax, rcx
|
|
ror rax, 3
|
|
xor rcx, rax
|
|
mov rax, 7010008004002001h
|
|
mul rcx
|
|
mov [rsp+0A28h+var_360], rax
|
|
mov rcx, rdx
|
|
mov qword ptr [rsp+0A28h+arg_260], rdx
|
|
xor rcx, rax
|
|
mov rax, 0CCCCCCCCCCCCCCCDh
|
|
mul rcx
|
|
shr rdx, 3
|
|
;
|
|
; The resulting value in `rax' is the index into a switch jump table
|
|
; that is used to locate the DPC to be repurposed for initiating
|
|
; PatchGuard checks for this session.
|
|
;
|
|
lea rax, [rdx+rdx*4]
|
|
add rax, rax
|
|
sub rcx, rax
|
|
jmp PatchGuard_DPC_Switch
|
|
|
|
INIT:0000000000832317:
|
|
PatchGuard_DPC_Switch:
|
|
;
|
|
; The address of the case statement is formed by adding the image base (here,
|
|
; being loaded into `rdx') and an RVA in the table indexed by rax.
|
|
;
|
|
lea rdx, cs:400000h
|
|
mov eax, ecx
|
|
;
|
|
; Locate the case statement RVA by indexing the jump offset table.
|
|
;
|
|
mov ecx, [rdx+rax*4+432E60h]
|
|
;
|
|
; Add it to the image base to form a complete 64-bit address.
|
|
;
|
|
add rcx, rdx
|
|
;
|
|
; Execute the case handler.
|
|
;
|
|
jmp rcx
|
|
|
|
|
|
;
|
|
; The set of case statements are as follows:
|
|
;
|
|
; Each case statement block simply loads the full 64-bit address
|
|
; of the DPC routine to be repurposed for PatchGuard checks into
|
|
; the r8 register. This register is later stored into one of
|
|
; PatchGuard's internal data structures for future use.
|
|
;
|
|
|
|
lea r8, CmpEnableLazyFlushDpcRoutine
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, _CmpLazyFlushDpcRoutine
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, ExpTimeRefreshDpcRoutine
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, ExpTimeZoneDpcRoutine
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, ExpCenturyDpcRoutine
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, ExpTimerDpcRoutine
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, IopTimerDispatch
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, IopIrpStackProfilerTimer
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, KiScanReadyQueues
|
|
jmp short PatchGuardSelectDpcRoutine
|
|
lea r8, PopThermalZoneDpc
|
|
;
|
|
; (fallthrough from last case statement)
|
|
;
|
|
INIT:0000000000832800:
|
|
PatchGuardSelectDpcRoutine:
|
|
xor ecx, ecx
|
|
;
|
|
; Store the DPC routine into r14+178. r14 points to one of
|
|
; the PatchGuard context structures in this particular instance.
|
|
;
|
|
mov [r14+178h], r8
|
|
|
|
Much like PatchGuard version 1, each of the DPCs selected for use in launching
|
|
the PatchGuard integrity checks has a legitimate function. Furthermore, the
|
|
DPC routines are ones that are important for normal system operation, thus it
|
|
is not possible for one to simply detect all DPCs that refer to these DPC
|
|
routines and cancel them. Instead, much as with PatchGuard version 1, if one
|
|
wanted to go the route of blocking PatchGuard's DPC, a mechanism to detect the
|
|
particular PatchGuard DPC (as opposed to the legitimate system invocations
|
|
thereof) must be developed. This aspect of PatchGuard's obfuscation
|
|
mechanisms is relatively similar to version 1, other than the logical
|
|
extension to ten DPCs instead of three DPCs.
|
|
|
|
3.3) Self-Decrypting and Mutating System Integrity Check Routine
|
|
|
|
PatchGuard version 2 also inherits the capability to encrypt its
|
|
datastructures and executable code in-memory from version 1. This is a
|
|
defensive mechanism that intends to make it difficult for an attacker to
|
|
perform a classic egghunt style search, wherein the attacker has devised an
|
|
identifiable signature for PatchGuard data structures that can be used to
|
|
locate it in an exhaustive non-paged-pool memory scan. From this perspective,
|
|
the obfuscation and encryption of PatchGuard code and data structures that are
|
|
dynamically allocated is still a reasonably strong defensive mechanism.
|
|
Unfortunately for Microsoft, though, some of the data structures linking to
|
|
PatchGuard are internal system structures (such as a KDPC and associated
|
|
KTIMER used to kick off PatchGuard execution). This presents a weakness that
|
|
could be potentially used to identify PatchGuard structures in memory (which
|
|
will be explored in more detail later).
|
|
|
|
The encryption of PatchGuard's internal context structures was covered by
|
|
Uninformed's original paper [2] on the subject. However, the mechanism by which
|
|
PatchGuard obfuscates its system integrity checking and validation routines
|
|
was not discussed. This mechanism is novel enough to warrant some
|
|
explanation. The technique used to obfuscate PatchGuard's executable code
|
|
in-memory involves two layers of decryption/deobfuscation functions, each of
|
|
which decrypts the next layer. After both layers have run their course,
|
|
PatchGuard's validation routines are plaintext in memory and are then directly
|
|
executed.
|
|
|
|
The first decryption layer is the code block that is called from the
|
|
repurposed DPC routine selected by PatchGuard at boot time. Its job is to
|
|
decrypt itself (in 8 byte chunks, starting with the second instruction in the
|
|
function). After the decryption of the this code block is complete, the
|
|
decryption stub continues on to decrypt a second code block (the actual
|
|
PatchGuard validation routine). When this second decryption/deobfuscation
|
|
cycle is completed, the decryption stub then executes the actual PatchGuard
|
|
system integrity check routine.
|
|
|
|
As noted above, the first task for the decryption stub is to decrypt itself.
|
|
Except for the first instruction of the stub, the entire routine is encrypted
|
|
when entered. The first instruction encrypts itself and decrypts the next
|
|
instruction. The following instruction decrypts the next two instructions,
|
|
and soforth. This is accomplished by a series of four byte long instructions
|
|
that xor an eight byte quantity with a decryption key (initially starting at
|
|
the current instruction pointer - here, rcx and rip always have the same
|
|
value. An example of how this process works is illustrated below:
|
|
|
|
;
|
|
; rcx: Address of the decryption stub (same as rip)
|
|
; rdx: Decryption key
|
|
;
|
|
Breakpoint 5 hit
|
|
nt!ExpTimeRefreshDpcRoutine+0x20a:
|
|
fffff800`0112c98b ff5538 call qword ptr [rbp+38h]
|
|
0: kd> u poi(rbp+38)
|
|
;
|
|
; Note that beyond the first instruction, the decryption stub is initially seemingly
|
|
; garbage data (though it has an apparent pattern to it, since it is merely obfuscated
|
|
; by xor).
|
|
;
|
|
fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx
|
|
fffffadf`f6e6d561 88644d68 mov byte ptr [rbp+rcx*2+68h],ah
|
|
fffffadf`f6e6d565 62 ???
|
|
fffffadf`f6e6d566 d257df rcl byte ptr [rdi-21h],cl
|
|
fffffadf`f6e6d569 88644d78 mov byte ptr [rbp+rcx*2+78h],ah
|
|
fffffadf`f6e6d56d 62 ???
|
|
fffffadf`f6e6d56e d257ef rcl byte ptr [rdi-11h],cl
|
|
fffffadf`f6e6d571 88644d48 mov byte ptr [rbp+rcx*2+48h],ah
|
|
0: kd> t
|
|
fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx
|
|
0: kd> r
|
|
;
|
|
; Note the initial input arguments. rcx points to the decryption stub's first
|
|
; instruction (same as rip), and rdx is the decryption key.
|
|
;
|
|
rax=fffffadff6e6d55d rbx=fffff8000116d894 rcx=fffffadff6e6d55d
|
|
rdx=601c55c0cf06e32a rsi=fffff800003c7ad0 rdi=0000000000000003
|
|
rip=fffffadff6e6d55d rsp=fffff800003c51f8 rbp=fffff800003c7ad0
|
|
r8=0000000000000000 r9=0000000000000000 r10=0000000001c7111e
|
|
r11=fffff800003c54c0 r12=fffff8000116d858 r13=fffff800003c5370
|
|
r14=fffff80001000000 r15=fffff800003c60a0
|
|
iopl=0 nv up ei pl zr na po nc
|
|
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246
|
|
fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx ds:002b:fffffadf`f6e6d55d=684d6488113148f0
|
|
|
|
;
|
|
; After allowing the decryption of the stub to progress, we see the stub in its executable
|
|
; form. The first instruction is initially re-encrypted after executed, but a later
|
|
; instruction in the decryption stub returns the initial instruction to its executable,
|
|
; plaintext form.
|
|
;
|
|
|
|
0: kd> u FFFFFADFF6E6D55D
|
|
;
|
|
; The `lock' prefix is used to create a four byte instruction when there
|
|
; is no immediate offset specified (a MASM limitation, as the assembler
|
|
; will convert a zero offset into the shorter form with no immediate
|
|
; offset operand).
|
|
;
|
|
fffffadf`f6e6d55d f0483111 lock xor qword ptr [rcx],rdx
|
|
fffffadf`f6e6d561 48315108 xor qword ptr [rcx+8],rdx
|
|
fffffadf`f6e6d565 48315110 xor qword ptr [rcx+10h],rdx
|
|
fffffadf`f6e6d569 48315118 xor qword ptr [rcx+18h],rdx
|
|
fffffadf`f6e6d56d 48315120 xor qword ptr [rcx+20h],rdx
|
|
fffffadf`f6e6d571 48315128 xor qword ptr [rcx+28h],rdx
|
|
fffffadf`f6e6d575 48315130 xor qword ptr [rcx+30h],rdx
|
|
fffffadf`f6e6d579 48315138 xor qword ptr [rcx+38h],rdx
|
|
0: kd> u
|
|
fffffadf`f6e6d57d 48315140 xor qword ptr [rcx+40h],rdx
|
|
fffffadf`f6e6d581 48315148 xor qword ptr [rcx+48h],rdx
|
|
;
|
|
; Because the initial instruction was re-encrypted after it was executed,
|
|
; we need to decrypt it again.
|
|
;
|
|
fffffadf`f6e6d585 3111 xor dword ptr [rcx],edx
|
|
fffffadf`f6e6d587 488bc2 mov rax,rdx
|
|
fffffadf`f6e6d58a 488bd1 mov rdx,rcx
|
|
fffffadf`f6e6d58d 8b4a4c mov ecx,dword ptr [rdx+4Ch]
|
|
;
|
|
; The following is the second stage decryption loop. It's purpose is to
|
|
; decrypt a code block following the current decryption stub in memory.
|
|
;
|
|
; This code block is then executed (it is responsible for performing the
|
|
; actual PatchGuard system verification checks).
|
|
;
|
|
|
|
fffffadf`f6e6d590 483144ca48 xor qword ptr [rdx+rcx*8+48h],rax
|
|
fffffadf`f6e6d595 48d3c8 ror rax,cl
|
|
0: kd> u
|
|
fffffadf`f6e6d598 e2f6 loop fffffadf`f6e6d590
|
|
;
|
|
; After decryption of the second block is completed, we'll execute it
|
|
; by jumping to it. Doing so kicks off the system verification routine
|
|
; that verifies system integrity, arranging for a bug check if not,
|
|
; otherwise arranging for itself to be executed again several minutes
|
|
; later.
|
|
;
|
|
fffffadf`f6e6d59a 8b8288010000 mov eax,dword ptr [rdx+188h]
|
|
fffffadf`f6e6d5a0 4803c2 add rax,rdx
|
|
fffffadf`f6e6d5a3 ffe0 jmp rax
|
|
|
|
Prior to returning control, the verification routine re-encrypts itself so
|
|
that it does not remain in plaintext after the first invocation. In addition,
|
|
PatchGuard also re-randomizes the key used to encrypt and decrypt the
|
|
PatchGuard validation routine on each execution, such that a would-be attacker
|
|
has a frequently mutating target. Due to this behavior, the PatchGuard
|
|
validation routine changes appearance (in encrypted form) in-memory every few
|
|
minutes, which is the period of PatchGuard's validation checks. While this is
|
|
perhaps an admirable effort on Microsoft's part as far as interesting
|
|
obfuscation techniques go, it turns out that there are much easier avenues of
|
|
attack that can be used to disable PatchGuard without having to involve
|
|
oneself in the search of a target that alters its appearance in-memory every
|
|
few minutes.
|
|
|
|
3.4) Obfuscation of System Integrity Check Calls via Structured Exception Handling
|
|
|
|
Much like PatchGuard version 1, this version of PatchGuard utilizes structured
|
|
exception handling (SEH) support as an integral part of the process used to
|
|
kick off execution of the system integrity check routine. The means by which
|
|
this is accomplished have changed somewhat since the last PatchGuard version.
|
|
In particular, there are several layers of obfuscation in each PatchGuard DPC
|
|
that are used to shroud the actual call to the integrity check routine. In an
|
|
effort to make matters more difficult for would-be attackers, the exact
|
|
details of the obfuscation used vary between each of the ten DPCs that may be
|
|
repurposed for use with PatchGuard. They all exhibit a common pattern,
|
|
however, which can be described at a high level.
|
|
|
|
The first step in invoking the PatchGuard system integrity checking routine is
|
|
a KTIMER with an associated KDPC (indicating a DPC callback routine to be
|
|
called when the timer lapses) associated with it. This timer is primed for
|
|
single-shot execution in an interval on the order of several minutes (with a
|
|
random fuzz factor delta applied to increase the difficulty of performing a
|
|
classic egghunt style attack to locate the KTIMER in non-paged pool). The DPC
|
|
routine indicated with the KDPC that is associated with PatchGuard's KTIMER is
|
|
one of the set of ten legitimate DPC routines that may be repurposed for use
|
|
with PatchGuard. The means by which this particular invocation of the DPC
|
|
routine is distinguished from a legitimate system invocation of the DPC
|
|
routine in question is by the use of a deliberately invalid kernel pointer as
|
|
one of the arguments to the DPC routine.
|
|
|
|
The prototype for a DPC routine is described by PKDEFERRED_ROUTINE:
|
|
|
|
typedef
|
|
VOID
|
|
(*PKDEFERRED_ROUTINE) (
|
|
IN struct _KDPC *Dpc, // pointer to parent DPC
|
|
IN PVOID DeferredContext, // arbitrary context - assigned at DPC initialization
|
|
IN PVOID SystemArgument1, // arbitrary context - assigned when DPC is queued
|
|
IN PVOID SystemArgument2 // arbitrary context - assigned when DPC is queued
|
|
);
|
|
|
|
Essentially, a DPC is a callback routine with a set of user-defined context
|
|
parameters whose interpretation is entirely up to the DPC routine itself. The
|
|
standard use for context arguments in callback functions is to use them to
|
|
point to a larger structure which contains information necessary for the
|
|
callback routine to function, and this is exactly how the ten DPC routines
|
|
that can used by PatchGuard regard the DeferredContext argument during
|
|
legitimate execution. It is this usage of the DeferredContext argument which
|
|
allows PatchGuard to trigger its execution for each of the ten DPC routines
|
|
via an exception; PatchGuard arranges for a bogus DeferredContext value to be
|
|
passed to the DPC routine when it is called. The first time that the DPC
|
|
routine tries to dereference the DPC-specific structure referred to by
|
|
DeferredContext, an exception occurs (which transfers control to the exception
|
|
dispatching system, and eventually to PatchGuard's integrity check routine).
|
|
While this may seem simple at first, if the reader is familiar with kernel
|
|
mode programming, then there should be a couple of red flags set off by this
|
|
description; normally, it is not possible to catch bogus memory references at
|
|
DISPATCH_LEVEL or above with SEH (usually, one of the
|
|
PAGE_FAULT_IN_NON_PAGED_AREA or IRQL_NOT_LESS_OR_EQUAL bugchecks will be
|
|
raised, depending on whether the bogus reference was to a reserved non-paged
|
|
region or a paged-out pagable memory region). As a result, one would expect
|
|
that PatchGuard would be putting the system at risk of randomly bugchecking by
|
|
passing bogus pointers that are referenced at DISPATCH_LEVEL, the IRQL at
|
|
which DPC routines run. However, PatchGuard has a couple of tricks up its
|
|
metaphorical sleeve. It takes advantage of an implementation-specific detail
|
|
of the current generation of x64 processors shipped by AMD in order to form
|
|
kernel mode addresses that, while bogus, will not result in a page fault when
|
|
referenced. Instead, these bogus addresses will result in a general
|
|
protection fault, which eventually manifests itself as a
|
|
STATUS_ACCESS_VIOLATION SEH exception. This path to raising a
|
|
STATUS_ACCESS_VIOLATION exception does in fact work even at DISPATCHL_EVEL,
|
|
thus allowing PatchGuard to provide safe bogus pointer values for the
|
|
DeferredContext argument in order to trigger SEH dispatching without risking
|
|
bringing the system down with a bugcheck.
|
|
|
|
Specifically, the implementation detail that PatchGuard relies upon relates to
|
|
the 48-bit address space limitation in AMD's Hammer family of processors[4].
|
|
Current AMD processors only implement 48 bits of the 64-bit address space
|
|
presented by the x64 architecture. This is accomplished by requiring that
|
|
bits 63 through the most significant bit implemented by the processor (current
|
|
AMD processors implement 48 bits) of any given address be set to either all
|
|
ones or all zeros. An address of this form is defined to be a canonical
|
|
address, or a well-formed address. Attempts to reference addresses that are
|
|
not canonical as defined by this definition result in the processor
|
|
immediately raising a general protection fault. This restriction on the
|
|
address space essentially splits the usable address space into two halves; one
|
|
region at the high end of the address space, and one region at the low end of
|
|
the address space, with a no-mans-land in between the two. Windows utilizes
|
|
this split to divide user mode from kernel mode, with the high end of the
|
|
address space being reserved for kernel mode usage and the low end of the
|
|
address space being reserved for user mode usage. PatchGuard takes advantage
|
|
of this processor-mandated no-mans-land to create bogus pointer values that
|
|
can be safely dereferenced and caught by SEH, even at high IRQLs.
|
|
|
|
All of the DPC routines that are in the set which may be repurposed for use by
|
|
PatchGuard dereference the DeferredContext argument as the first part of work
|
|
that does not involve shuffling stack variables around. In other words, the
|
|
first real work involved in any of the PatchGuard-enabled DPC routines is to
|
|
touch a structure or variable pointed to by the DeferredContext argument. In
|
|
the execution path of PatchGuard attempting to trigger a system integrity
|
|
check, the DeferredContext argument is invalid, which eventually results in
|
|
an access violation exception that is routed to the SEH registrations for the
|
|
DPC routine. If one examines any of the PatchGuard DPC routines, it is clear
|
|
that all of them have several overlapping SEH registrations (a construct that
|
|
normally indicates several levels of nested try/except and try/finally
|
|
constructs):
|
|
|
|
1: kd> !fnseh nt!ExpTimeRefreshDpcRoutine
|
|
nt!ExpTimeRefreshDpcRoutine Lc8 0A,02 [EU ] nt!_C_specific_handler (C)
|
|
> fffff8000100358a La (fffff8000112c830 -> fffff80001000000)
|
|
> fffff8000100358a Lc (fffff8000112c870 -> fffff80001003596)
|
|
> fffff8000100358a L16 (fffff8000112c8a0 -> fffff80001000000)
|
|
> fffff8000100358a L18 (fffff8000112c8f0 -> fffff800010035a2)
|
|
|
|
These SEH registrations are integral to the operation of PatchGuard's system
|
|
integrity checks. The specifics of how each handler registration work differ
|
|
for each DPC routine (in an attempt to frustrate attempts to reverse engineer
|
|
them), but the general idea is that each registered handler performs a portion
|
|
of the work necessary to set up a call to the PatchGuard integrity check
|
|
routine. This work is divided up among four different exception/unwind
|
|
handlers in an effort to make it difficult to understand what is going on, but
|
|
ultimately the end result is the same for each of the DPC routines; one of the
|
|
exception/unwind handlers ends up making a direct call to the system integrity
|
|
check decryption stub in-memory. The decryption stub decrypts itself, and
|
|
then decrypts the PatchGuard check routine, following with a transfer of
|
|
control to the integrity check routine so that PatchGuard can inspect various
|
|
protected registers, MSRs, and kernel images (such as the kernel itself) for
|
|
unauthorized modification.
|
|
|
|
Additionally, all of the PatchGuard DPCs have been enhanced to obfuscate the
|
|
DPC routine arguments in stack variables (whose exact stack displacement
|
|
varies from DPC routine to DPC routine, and furthermore between kernel flavor
|
|
to kernel flavor; for example, the multiprocessor and uniprocessor kernel
|
|
builds have different stack frame layouts for many of the PatchGuard DPC
|
|
routines). Recall that in the x64 calling convention, the first four
|
|
arguments are passed via registers (rcx, rdx, r8, and r9 respectively). Each
|
|
PatchGuard DPC routine takes special care to save away significant register
|
|
arguments onto the stack (in an obfuscated form). Several of the arguments
|
|
remain obfuscated until just before the decryption stub for the system
|
|
integrity check routine is called, in an effort to make it difficult for third
|
|
parties to patch into the middle of a particular DPC routine and easily access
|
|
the original arguments to the DPC. This is presumably designed in an attempt
|
|
to make it more difficult to differentiate DPC invocations that perform the
|
|
DPC routine's legitimate function from DPC invocations that will call
|
|
PatchGuard. It also makes it difficult, though not impossible, for a third
|
|
party to recover the original arguments to the DPC routine from the context of
|
|
any of the exception handlers registered to the DPC routine in a generalized
|
|
fashion.
|
|
|
|
This obfuscation of arguments can be clearly seen by disassembling any of the
|
|
PatchGuard DPC routines. For example, when looking at
|
|
ExpTimeRefreshDpcRoutine, one can see that the routine saves away the Dpc
|
|
(rcx) and DeferredContext (rdx) arguments on the stack, rotates them by a
|
|
magical constant (this constant differs for each DPC routine flavor and is
|
|
used to further complicate the task of recovering the original DPC arguments
|
|
in a generalized fashion), and then overwrites the original argument
|
|
registers:
|
|
|
|
0: kd> uf nt!ExpTimeRefreshDpcRoutine
|
|
;
|
|
; On entry, we have the following:
|
|
;
|
|
; rcx -> Dpc
|
|
; rdx -> DeferredContext (if this is being called for
|
|
; PatchGuard, then DeferredContext
|
|
; is a bogus kernel pointer).
|
|
; r8 -> SystemArgument1
|
|
; r9 -> SystemArgument2
|
|
;
|
|
nt!ExpTimeRefreshDpcRoutine:
|
|
;
|
|
; r11 is used as an ephemeral frame pointer here.
|
|
;
|
|
; Ephemeral frame pointers are an x64-specific compiler
|
|
; construct, wherein a volatile register is used as a
|
|
; frame pointer until the first function call is made.
|
|
;
|
|
fffff800`01003540 4c8bdc mov r11,rsp
|
|
fffff800`01003543 4881ecc8000000 sub rsp,0C8h
|
|
fffff800`0100354a 4889642460 mov qword ptr [rsp+60h],rsp
|
|
;
|
|
; This DPC routine does not use SystemArgument1 or
|
|
; SystemArgument2. As a result, it is free to overwrite
|
|
; these argument registers immediately without preserving
|
|
; their value.
|
|
;
|
|
; r8 = Dpc
|
|
; rcx = Dpc
|
|
; rdx = DeferredContext
|
|
;
|
|
fffff800`0100354f 4c8bc1 mov r8,rcx
|
|
fffff800`01003552 4889542448 mov qword ptr [rsp+48h],rdx
|
|
;
|
|
; Set [rsp+20h] to zero. This is a state variable that is
|
|
; used by the exception/unwind scope handlers in order to
|
|
; coordinate the PatchGuard execution process across the
|
|
; set of four exception/unwind scope handlers associated
|
|
; with this section of code.
|
|
;
|
|
fffff800`01003557 4533c9 xor r9d,r9d
|
|
fffff800`0100355a 44894c2420 mov dword ptr [rsp+20h],r9d
|
|
;
|
|
; PatchGuard zeros out various key fields in the DPC.
|
|
; This is an attempt to make it difficult to locate the DPC
|
|
; in-memory from the context of an exception handler called
|
|
; when a PatchGuard DPC accesses the bogus DeferredContext
|
|
; argument. Specifically, PatchGuard zeros the Type and
|
|
; DeferredContext fields of the KDPC structure, shown below:
|
|
;
|
|
; 0: kd> dt nt!_KDPC
|
|
; +0x000 Type : UChar
|
|
; +0x001 Importance : UChar
|
|
; +0x002 Number : UChar
|
|
; +0x003 Expedite : UChar
|
|
; +0x008 DpcListEntry : _LIST_ENTRY
|
|
; +0x018 DeferredRoutine : Ptr64
|
|
; +0x020 DeferredContext : Ptr64 Void
|
|
; +0x028 SystemArgument1 : Ptr64 Void
|
|
; +0x030 SystemArgument2 : Ptr64 Void
|
|
; +0x038 DpcData : Ptr64 Void
|
|
;
|
|
; Dpc->Type = 0
|
|
;
|
|
fffff800`0100355f 448809 mov byte ptr [rcx],r9b
|
|
;
|
|
; Dpc->DeferredContext = 0
|
|
;
|
|
fffff800`01003562 4c894920 mov qword ptr [rcx+20h],r9
|
|
;
|
|
; Here, the DPC loads [r11-20h] with an obfuscated
|
|
; copy of the DeferredContext argument (rotated
|
|
; left by 0x34 bits).
|
|
;
|
|
; Recall that rsp == r11+0xc8, so this location
|
|
; can also be aliased by [rsp+0A8h].
|
|
;
|
|
; [rsp+0A8h] -> ROL(DeferredContext, 0x34)
|
|
;
|
|
fffff800`01003566 488bc2 mov rax,rdx
|
|
fffff800`01003569 48c1c034 rol rax,34h
|
|
fffff800`0100356d 498943e0 mov qword ptr [r11-20h],rax
|
|
;
|
|
; Similarly, the DPC loads [r11-48h] with an
|
|
; obfuscated copy of the Dpc argument (rotated
|
|
; right by 0x48 bits).
|
|
;
|
|
; This location may be aliased as [rsp+80h].
|
|
;
|
|
; [rsp+80h] -> ROR(Dpc, 0x48)
|
|
;
|
|
fffff800`01003571 488bc1 mov rax,rcx
|
|
fffff800`01003574 48c1c848 ror rax,48h
|
|
fffff800`01003578 498943b8 mov qword ptr [r11-48h],rax
|
|
;
|
|
; The following register context is now in place:
|
|
;
|
|
; r8 = Dpc
|
|
; rcx = Dpc
|
|
; rdx = DeferredContext
|
|
; rax = ROR(Dpc, 0x48)
|
|
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
|
|
; [rsp+80h] = ROR(Dpc, 0x48)
|
|
;
|
|
; The DPC routine destroys the contents of rcx by
|
|
; zero extending it with a copy of the low byte of
|
|
; the DeferredContext value.
|
|
;
|
|
fffff800`0100357c 0fb6ca movzx ecx,dl
|
|
;
|
|
; The DPC routine destroys the contents of r8 with
|
|
; a right shift (unlike a rotate, the incoming left
|
|
; bits are simply zero filled instead of set to the
|
|
; rightmost bits being shifted off. The rightmost
|
|
; bits are thus lost forever, destroying the r8
|
|
; register as a useful source of the Dpc argument.
|
|
;
|
|
fffff800`0100357f 49d3e8 shr r8,cl
|
|
;
|
|
; r8 is saved away on the stack, but it is no longer
|
|
; directly useful as a way to locate the Dpc argument
|
|
; due to the destructive right shift above.
|
|
;
|
|
fffff800`01003582 4c898424d8000000 mov qword ptr [rsp+0D8h],r8
|
|
;
|
|
; r8 = Dpc >> (UCHAR)DeferredContext
|
|
; rcx = (UCHAR)DeferredContext
|
|
; rdx = DeferredContext
|
|
; rax = ROR(Dpc, 0x48)
|
|
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
|
|
; [rsp+80h] = ROR(Dpc, 0x48)
|
|
;
|
|
; Here, we temporarily deobfuscate the DeferredContext
|
|
; argument stored at [r11-20h] above. In this particular
|
|
; instance, rdx also happens to contain the deobfuscated
|
|
; DeferredContext value, but not all instances of
|
|
; PatchGuard's DPC routines share this property of
|
|
; retaining a plaintext copy of DeferredContext in rdx.
|
|
;
|
|
fffff800`0100358a 498b43e0 mov rax,qword ptr [r11-20h]
|
|
fffff800`0100358e 48c1c834 ror rax,34h
|
|
;
|
|
; Now, we have the following context in place:
|
|
;
|
|
; r8 = Dpc >> (UCHAR)DeferredContext
|
|
; rcx = (UCHAR)DeferredContext
|
|
; rdx = DeferredContext (* But not valid for
|
|
; all DPC routines.)
|
|
; rax = DeferredContext
|
|
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
|
|
; [rsp+80h] = ROR(Dpc, 0x48)
|
|
;
|
|
; The next step is to dereference the DeferredContext value.
|
|
; For a legitimate DPC invocation, this operation is harmless;
|
|
; the DeferredContext value would point to valid kernel memory.
|
|
;
|
|
; For PatchGuard, however, this triggers an access violation
|
|
; that winds up with control being transferred to the exception
|
|
; handlers registered to the DPC routine.
|
|
;
|
|
fffff800`01003592 8b00 mov eax,dword ptr [rax]
|
|
|
|
At this point, it is necessary to investigate the various exception/unwind
|
|
handlers registered to the DPC routine in order to determine what happens
|
|
next. Most of these handlers can be skipped as they are nothing more than
|
|
minor layers of obfuscation that, while differing significantly between each
|
|
DPC routine, have the same end result. One of the exception/unwind handlers,
|
|
however, makes the call to PatchGuard's integrity check, and this handler is
|
|
worthy of further discussion. Because the exception registrations for all of
|
|
the PatchGuard DPC routines make use of nt!_C_specific_handler, the scope
|
|
handlers conform to a standard prototype, defined below:
|
|
|
|
//
|
|
// Define the standard type used to describe a C-language exception handler,
|
|
// which is used with _C_specific_handler.
|
|
//
|
|
// The actual parameter values differ depending on whether the low byte of the
|
|
// first argument contains the value 0x1. If this is the case, then the call
|
|
// is to the unwind handler to the routine; otherwise, the call is to the
|
|
// exception handler for the routine. Each routine has fairly different
|
|
// interpretations for the two arguments, though the prototypes are as far as
|
|
// calling conventions go compatible.
|
|
//
|
|
|
|
typedef
|
|
LONG
|
|
(NTAPI * PC_LANGUAGE_EXCEPTION_HANDLER)(
|
|
__in PEXCEPTION_POINTERS ExceptionPointers, // if low byte is 0x1, then we're an unwind
|
|
__in ULONG64 EstablisherFrame // faulting routine stack pointer
|
|
);
|
|
|
|
In the case of nt!ExpTimeRefreshDpcRoutine, the fourth scope handler
|
|
registration is the one that performs the call to PatchGuard's integrity check
|
|
routine. Here, the routine only executes the integrity check if a state
|
|
variable stored at [rsp+20h] in the DPC routine is set to a particular value.
|
|
This state variable is modified as the access violation exception traverses
|
|
each of the exception/unwind scope handlers until it reaches this handler,
|
|
which eventually leads up to the execution of PatchGuard's system integrity
|
|
check. For now, it is best to assume that this routine is being called with
|
|
[rsp+20h] in the DPC routine having been set to a value other than 0x15. This
|
|
signifies that PatchGuard should be executed.
|
|
|
|
0: kd> uf fffff8000112c8f0
|
|
nt!ExpTimeRefreshDpcRoutine+0x17f:
|
|
;
|
|
; mov eax, eax is a hotpatch stub and can be ignored.
|
|
;
|
|
fffff800`0112c8f0 8bc0 mov eax,eax
|
|
fffff800`0112c8f2 55 push rbp
|
|
fffff800`0112c8f3 4883ec20 sub rsp,20h
|
|
;
|
|
; rdx corresponds to the EstablisherFrame argument.
|
|
; This argument is the stack pointer (rsp) value for
|
|
; the routine that this exception/unwind handler is
|
|
; associated with. The typical use of this argument
|
|
; is to allow seamless access to local variables in
|
|
; the routine for which the try/except filter is
|
|
; associated with. This is what eventually ends up
|
|
; occuring here, with the rbp register being loaded
|
|
; with the stack pointer of the DPC routine at the
|
|
; point in time where the exception occured.
|
|
;
|
|
;
|
|
fffff800`0112c8f7 488bea mov rbp,rdx
|
|
;
|
|
; We make the check against the state variable.
|
|
; Recall that when the DPC routine was first entered,
|
|
; [rsp+20h] in the DPC routine's context was set to
|
|
; zero. That location corresponds to [rbp+20h] in
|
|
; this context, as rbp has been loaded with the stack
|
|
; pointer that was in use in the DPC routine. This
|
|
; location is checked and altered by each of the
|
|
; registered exception/unwind handlers, and will
|
|
; eventually be set to 0x15 when this routine is called.
|
|
;
|
|
fffff800`0112c8fa 83452007 add dword ptr [rbp+20h],7
|
|
fffff800`0112c8fe 8b4520 mov eax,dword ptr [rbp+20h]
|
|
fffff800`0112c901 83f81c cmp eax,1Ch
|
|
;
|
|
; For the moment, consider the case where this jump is
|
|
; not taken. The jump is taken when PatchGuard is not
|
|
; being executed (which is not the interesting case).
|
|
;
|
|
fffff800`0112c904 0f858c000000 jne nt!ExpTimeRefreshDpcRoutine+0x215 (fffff800`0112c996)
|
|
|
|
nt!ExpTimeRefreshDpcRoutine+0x189:
|
|
;
|
|
; To understand the following instructions, it is
|
|
; necessary to look back at the stack variable context
|
|
; that was set up by the DPC routine prior to the
|
|
; faulting instruction that caused the access
|
|
; violation exception. The following values were
|
|
; set on the stack at that time:
|
|
;
|
|
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
|
|
; [rsp+80h] = ROR(Dpc, 0x48)
|
|
;
|
|
; The following set of instructions utilize these
|
|
; obfuscated copies of the original arguments to the
|
|
; DPC routine in order to make the call to PatchGuard's
|
|
; integrity check routine.
|
|
;
|
|
; The first step taken is to deobfuscate the Dpc value
|
|
; that was stored at [rsp+80h], or [rbp+80h] as seen from
|
|
; this context.
|
|
;
|
|
fffff800`0112c90a 488b8580000000 mov rax,qword ptr [rbp+80h]
|
|
;
|
|
; rax = Dpc
|
|
;
|
|
fffff800`0112c911 48c1c048 rol rax,48h
|
|
;
|
|
; [rbp+50h] -> Dpc
|
|
;
|
|
fffff800`0112c915 48894550 mov qword ptr [rbp+50h],rax
|
|
;
|
|
; Next, the DeferredContext argument is deobfuscated and
|
|
; stored plaintext.
|
|
;
|
|
fffff800`0112c919 488b85a8000000 mov rax,qword ptr [rbp+0A8h]
|
|
;
|
|
; rax = DeferredContext
|
|
;
|
|
fffff800`0112c920 48c1c834 ror rax,34h
|
|
;
|
|
; [rbp+58h] -> DeferredContext
|
|
;
|
|
fffff800`0112c924 48894558 mov qword ptr [rbp+58h],rax
|
|
;
|
|
; rax = Dpc
|
|
;
|
|
fffff800`0112c928 488b4550 mov rax,qword ptr [rbp+50h]
|
|
;
|
|
; The next instruction accesses memory after the KDPC
|
|
; object in memory. Recall that a KDPC object is 0x40
|
|
; bytes in length on x64, so [Dpc+40h] is the first
|
|
; value beyond the DPC in memory. In reality, the KDPC
|
|
; is a member of a larger structure, which is defined
|
|
; as follows:
|
|
;
|
|
; struct PATCHGUARD_DPC_CONTEXT {
|
|
; KDPC Dpc; // +0x00
|
|
; ULONGLONG DecryptionKey; // +0x40
|
|
; };
|
|
;
|
|
; As a result, this instruction is equivalent to casting
|
|
; the Dpc argument to a PATCHGUARD_DPC_CONTEXT*, and then
|
|
; accessing the DecryptionKey member
|
|
;
|
|
;
|
|
; rcx = Dpc->DecryptionKey
|
|
;
|
|
fffff800`0112c92c 488b4840 mov rcx,qword ptr [rax+40h]
|
|
;
|
|
; [rbp+40h] -> DecryptionKey
|
|
;
|
|
fffff800`0112c930 48894d40 mov qword ptr [rbp+40h],rcx
|
|
;
|
|
; rax = DecryptionKey
|
|
;
|
|
fffff800`0112c934 488b4540 mov rax,qword ptr [rbp+40h]
|
|
;
|
|
; The DeferredContext value is then xor'd with the
|
|
; decryption key stored in the PATCHGUARD_DPC_CONTEXT
|
|
; structure. This yields the significant bits of the
|
|
; pointer to the PatchGuard decryption stub. Recall
|
|
; that due to the "no-mans-land" region in between the
|
|
; kernel mode and user mode address space boundaries
|
|
; on current AMD64 processors, the rest of the bits
|
|
; are required to be either all ones or all zeros in
|
|
; order to form a valid address. Because we are
|
|
; dealing with a kernel mode address, it can be safely
|
|
; assumed that all of the bits must be ones.
|
|
;
|
|
fffff800`0112c938 48334558 xor rax,qword ptr [rbp+58h]
|
|
;
|
|
; [rbp+30h] -> DeferredContext ^ DecryptionKey
|
|
;
|
|
fffff800`0112c93c 48894530 mov qword ptr [rbp+30h],rax
|
|
;
|
|
; Set the required bits to ones in the decrypted
|
|
; pointer, as required to form a canonical address on
|
|
; current AMD64 systems.
|
|
;
|
|
fffff800`0112c940 48b80000000000f8ffff mov rax,0FFFFF80000000000h
|
|
;
|
|
; [rbp+30h] -> [rbp+30h] | 0xFFFFF80000000000
|
|
;
|
|
; Now, [rbp+30h] is the pointer to the decryption stub.
|
|
;
|
|
fffff800`0112c94a 48094530 or qword ptr [rbp+30h],rax
|
|
;
|
|
; The following instructions make extra copies of the decryption
|
|
; stub on the stack of the DPC routine. There is no real purpose
|
|
; to this, other than a half-hearted attempt to confuse anyone
|
|
; attempting to reverse engineer this section of PatchGuard.
|
|
;
|
|
; [rbp+38h] -> [rbp+30h] (Decryption stub)
|
|
;
|
|
fffff800`0112c94e 488b4530 mov rax,qword ptr [rbp+30h]
|
|
fffff800`0112c952 48894538 mov qword ptr [rbp+38h],rax
|
|
;
|
|
; [rbp+28h] -> [rbp+38h] (Decryption stub)
|
|
;
|
|
fffff800`0112c956 488b4538 mov rax,qword ptr [rbp+38h]
|
|
fffff800`0112c95a 48894528 mov qword ptr [rbp+28h],rax
|
|
;
|
|
; The next set of instructions rewrite the first
|
|
; four bytes of the initial opcode in the decryption
|
|
; stub. This opcode must be set to the following
|
|
; instruction:
|
|
;
|
|
; f0483111 lock xor qword ptr [rcx],rdx
|
|
;
|
|
; The individual opcode bytes for the instruction are
|
|
; written to the decryption stub one byte at a time.
|
|
;
|
|
; *(PULONG)DecryptionStub = 0x113148f0
|
|
;
|
|
fffff800`0112c95e 488b4528 mov rax,qword ptr [rbp+28h]
|
|
fffff800`0112c962 c600f0 mov byte ptr [rax],0F0h
|
|
fffff800`0112c965 488b4528 mov rax,qword ptr [rbp+28h]
|
|
fffff800`0112c969 c6400148 mov byte ptr [rax+1],48h
|
|
fffff800`0112c96d 488b4528 mov rax,qword ptr [rbp+28h]
|
|
fffff800`0112c971 c6400231 mov byte ptr [rax+2],31h
|
|
fffff800`0112c975 488b4528 mov rax,qword ptr [rbp+28h]
|
|
fffff800`0112c979 c6400311 mov byte ptr [rax+3],11h
|
|
;
|
|
; Finally, a call to the decryption stub is made. The
|
|
; decryption stub has a prototype that conforms to the
|
|
; following definition:
|
|
;
|
|
; VOID
|
|
; NTAPI
|
|
; PgDecryptionStub(
|
|
; __in PVOID PatchGuardRoutine,
|
|
; __in ULONG64 DecryptionKey,
|
|
; __in ULONG Reserved0,
|
|
; __in ULONG Reserved1
|
|
; );
|
|
;
|
|
; The two 'reserved' ULONG values are always set to zero.
|
|
;
|
|
; rcx is loaded with the address of the decryption stub,
|
|
; and rdx is loaded with the DecryptionKey value.
|
|
;
|
|
fffff800`0112c97d 4533c9 xor r9d,r9d
|
|
fffff800`0112c980 4533c0 xor r8d,r8d
|
|
fffff800`0112c983 488b5540 mov rdx,qword ptr [rbp+40h]
|
|
fffff800`0112c987 488b4d38 mov rcx,qword ptr [rbp+38h]
|
|
;
|
|
; At this point, control is transferred to the decryption
|
|
; stub, as described previously. The decryption stub will
|
|
; decrypt itself, decrypt the PatchGuard integrity check
|
|
; routine, and then transfer control to the PatchGuard
|
|
; integrity check routine. The integrity check routine is
|
|
; responsible for ensuring that the DPC is returned to a
|
|
; usable state (recall that parts of it were zeroed out
|
|
; by the DPC routine earlier), and that it is re-queued
|
|
; for execution. It is also responsible for re-encrypting
|
|
; the decryption stub as desired.
|
|
;
|
|
fffff800`0112c98b ff5538 call qword ptr [rbp+38h]
|
|
;
|
|
; After the call is made, the exception filter returns
|
|
; the EXCEPTION_EXECUTE_HANDLER manifest constant. This
|
|
; causes one of the registered handlers to be invoked
|
|
; in order to handle the exception. The handler will
|
|
; transfer control to the return point of the DPC routine,
|
|
; thus skipping the body of the DPC (since the call to
|
|
; the DPC was not a request for the legitimate function of
|
|
; the DPC to be performed).
|
|
;
|
|
fffff800`0112c98e 41b901000000 mov r9d,1
|
|
fffff800`0112c994 eb03 jmp nt!ExpTimeRefreshDpcRoutine+0x218 (fffff800`0112c999)
|
|
|
|
nt!ExpTimeRefreshDpcRoutine+0x215:
|
|
fffff800`0112c996 4533c9 xor r9d,r9d
|
|
|
|
nt!ExpTimeRefreshDpcRoutine+0x218:
|
|
fffff800`0112c999 418bc1 mov eax,r9d
|
|
fffff800`0112c99c 4883c420 add rsp,20h
|
|
fffff800`0112c9a0 5d pop rbp
|
|
fffff800`0112c9a1 c3 ret
|
|
|
|
This does represent a significant level of obfuscation, but it is not
|
|
impenetrable, and there are various simple ways through which an attacker
|
|
could bypass all of these layers of obfuscation entirely.
|
|
|
|
3.5) Disruption of Debug Register-Based Breakpoints
|
|
|
|
PatchGuard version 2 attempts to protect itself from breakpoints that are set
|
|
using the hardware debug registers. These breakpoints operate by setting up
|
|
to four designated memory locations that are of interest. Each memory
|
|
location can be configured to cause a debug exception when it is read,
|
|
written, or executed. Because breakpoints of this flavor are not visible to
|
|
PatchGuard's code integrity checks (unlike conventional breakpoints, these
|
|
breakpoints do not involve int 3 (0xcc) opcodes being substituted for target
|
|
instructions), debug register-based breakpoints (sometimes known as ``memory
|
|
breakpoints'' or ``hardware breakpoints'') pose a threat to PatchGuard.
|
|
PatchGuard attempts to counter this threat by disabling all such debug
|
|
register-based breakpoints as a first step after the system integrity checking
|
|
routine has been decrypted in-memory:
|
|
|
|
;
|
|
; Here, the second stage decryption sequence is
|
|
; set to run to decrypt the system integrity
|
|
; check routine. We step over the second stage
|
|
; decryption and examine the integrity check
|
|
; routine in its plaintext state...
|
|
;
|
|
|
|
fffffadf`f6edc043 8b4a4c mov ecx,dword ptr [rdx+4Ch]
|
|
fffffadf`f6edc046 483144ca48 xor qword ptr [rdx+rcx*8+48h],rax
|
|
fffffadf`f6edc04b 48d3c8 ror rax,cl
|
|
fffffadf`f6edc04e e2f6 loop fffffadf`f6edc046
|
|
fffffadf`f6edc050 8b8288010000 mov eax,dword ptr [rdx+188h]
|
|
fffffadf`f6edc056 4803c2 add rax,rdx
|
|
fffffadf`f6edc059 ffe0 jmp rax
|
|
fffffadf`f6edc05b 90 nop
|
|
;
|
|
; We set a breakpoint on the 'jmp rax' instruction
|
|
; above. This instruction is what transfers control
|
|
; to the system integrity check routine.
|
|
;
|
|
0: kd> ba e1 fffffadf`f6edc059
|
|
0: kd> g
|
|
Breakpoint 2 hit
|
|
fffffadf`f6edc059 ffe0 jmp rax
|
|
;
|
|
; rax now points to the decrypted system
|
|
; integrity check routine in-memory. The
|
|
; first call it makes is to a routine whose
|
|
; purpose is to disable all debug register-based
|
|
; breakpoints by clearing the debug control
|
|
; register (dr7). Doing so effectively turns
|
|
; off all of the debug register breakpoints.
|
|
;
|
|
0: kd> u @rax
|
|
fffffadf`f6edd8de 4883ec78 sub rsp,78h
|
|
fffffadf`f6edd8e2 48895c2470 mov qword ptr [rsp+70h],rbx
|
|
fffffadf`f6edd8e7 48896c2468 mov qword ptr [rsp+68h],rbp
|
|
fffffadf`f6edd8ec 4889742460 mov qword ptr [rsp+60h],rsi
|
|
fffffadf`f6edd8f1 48897c2458 mov qword ptr [rsp+58h],rdi
|
|
fffffadf`f6edd8f6 4c89642450 mov qword ptr [rsp+50h],r12
|
|
fffffadf`f6edd8fb 488bda mov rbx,rdx
|
|
fffffadf`f6edd8fe 4c896c2448 mov qword ptr [rsp+48h],r13
|
|
0: kd> u
|
|
fffffadf`f6edd903 e8863a0000 call fffffadf`f6ee138e
|
|
;
|
|
; The routine simply writes all zeros to dr7.
|
|
;
|
|
0: kd> u fffffadf`f6ee138e
|
|
fffffadf`f6ee138e 33c0 xor eax,eax
|
|
fffffadf`f6ee1390 0f23f8 mov dr7,rax
|
|
fffffadf`f6ee1393 c3 ret
|
|
|
|
3.6) Misleading Symbol Names
|
|
|
|
One of the things that Microsoft needed to consider when implementing
|
|
PatchGuard is that would-be attackers would have access to the operating
|
|
system symbols. As a debugging aid, Microsoft makes symbols for the entire
|
|
operating system publicly available. It is not feasible to remove the
|
|
operating system symbols from public access (doing so would severely hinder
|
|
ISVs in the process of debugging their own drivers). As a result, Microsoft
|
|
took the route of using misleading function names to shroud PatchGuard
|
|
routines from casual inspection. Many of the internal PatchGuard routines
|
|
have names that are seemingly legitimate-sounding at first glance, such that
|
|
without a detailed knowledge of the kernel or actually inspecting these
|
|
routines, it would be difficult to simply look at a list of all symbols in the
|
|
kernel and locate the routines responsible for setting up PatchGuard.
|
|
|
|
The following is a listing of some of the misleading symbols that are used
|
|
during PatchGuard initialization:
|
|
|
|
1. RtlpDeleteFunctionTable
|
|
2. FsRtlMdlReadCompleteDevEx
|
|
3. RtlLookupFunctionEntryEx
|
|
4. SdbpCheckDll
|
|
5. FsRtlUninitializeSmallMcb
|
|
6. KiNoDebugRoutine
|
|
7. SepAdtInitializePrivilegeAuditing
|
|
8. KiFilterFiberContext
|
|
|
|
3.7) Integrity Checks Performed During System Initialization
|
|
|
|
During system initialization, PatchGuard performs integrity checks on several
|
|
of the anti-debug mechanisms it has in place. If these mechanisms are altered
|
|
on-disk, PatchGuard will detect the changes. For example, PatchGuard
|
|
validates that the routine responsible for clearing debug register-based
|
|
breakpoints contains the correct opcode bytes corresponding to the
|
|
instructions used to actually zero out Dr7:
|
|
|
|
;
|
|
; Here, we are in SepAdtInitializePrivilegeAuditing, or the
|
|
; initialization routine for PatchGuard during system startup.
|
|
;
|
|
; This code fragment is designed to validate that the
|
|
; KiNoDebugRoutine routine contains the expected opcodes that
|
|
; are used to zero out debug register breakpoints. If the
|
|
; routine does not contain the correct opcodes, PatchGuard
|
|
; makes an early exit from SepAdtInitializePrivilegeAuditing.
|
|
;
|
|
INIT:0000000000832A6D lea rax, KiNoDebugRoutine
|
|
INIT:0000000000832A74 cmp dword ptr [rax], 230FC033h
|
|
INIT:0000000000832A7A jnz abort_initialization
|
|
INIT:0000000000832A80 add rax, 4
|
|
INIT:0000000000832A84 cmp word ptr [rax], 0C3F8h
|
|
INIT:0000000000832A89 jnz abort_initialization
|
|
|
|
3.8) Overwriting PatchGuard Initialization Code Post-Boot
|
|
|
|
After PatchGuard has initialized itself, it intentionally zeros out much of
|
|
the code responsible for setting up PatchGuard. It is assumed that this is
|
|
done in an attempt to prevent third party drivers from analyzing kernel code
|
|
in-memory in order to detect or defeat PatchGuard. This approach is obviously
|
|
trivially bypassed by opening the kernel image on disk, however.
|
|
|
|
After boot, many PatchGuard-related routines contain all zeros:
|
|
|
|
0: kd> u nt!KiNoDebugRoutine
|
|
nt!KiNoDebugRoutine:
|
|
fffff800`011a4b20 0000 add byte ptr [rax],al
|
|
|
|
nt!FsRtlUninitializeSmallMcb:
|
|
fffff800`011a4aa2 0000 add byte ptr [rax],al
|
|
|
|
0: kd> u nt!KiGetGdtIdt
|
|
nt!KiGetGdtIdt:
|
|
fffff800`011a4a20 0000 add byte ptr [rax],al
|
|
|
|
0: kd> u nt!RtlpDeleteFunctionTable
|
|
nt!RtlpDeleteFunctionTable:
|
|
fffff800`011a1010 0000 add byte ptr [rax],al
|
|
|
|
Most of the PatchGuard initialization code resides in the INITKDBG section of
|
|
ntoskrnl. Portions of this section are zeroed out during initialization.
|
|
|
|
4) Bypass Techniques
|
|
|
|
Despite the myriad anti-reverse-engineering and anti-debug techniques employed
|
|
by PatchGuard version 2, it is hardly invincible to being bypassed by third
|
|
party code. Contrary to one might expect, given the descriptions in the
|
|
initial section of this article, there are a number of holes in PatchGuard's
|
|
armor that can be exploited by third party software. Several potential
|
|
techniques for bypassing PatchGuard version 2 are outlined below, including
|
|
one technique that includes functional proof of concept code. These
|
|
techniques are applicable to the version of PatchGuard currently shipping with
|
|
Windows XP x64 Edition with all hotfixes, Windows Server 2003 x64 Edition with
|
|
all hotfixes, and Windows Vista x64 with all hotfixes at the time that this
|
|
article was written. The author has only written a complete implementation of
|
|
the first proposed bypass technique, although the remaining proposed bypass
|
|
approaches are expected to be viable in principle.
|
|
|
|
4.1) Interception of _C_specific_handler
|
|
|
|
The simplest course of action for disabling PatchGuard version 2 is, in the
|
|
author's opinion, to intercept execution at _C_specific_handler. The
|
|
_C_specific_handler routine is responsible for dispatching exceptions for
|
|
routines compiled with the Microsoft C/C++ compiler (and using try/except,
|
|
try/finally, or try/catch clauses). This set of functions includes all ten of
|
|
the PatchGuard DPC routines and most other C/C++ functions in the kernel. It
|
|
also includes many third party driver routines as well; _C_specific_handler is
|
|
exported, and the compiler references this function for all C/C++ images that
|
|
utilize SEH in some form (imported from ntoskrnl). Due to this, Microsoft is
|
|
forced to export _C_specific_handler from the kernel perpetually, making it
|
|
difficult for Microsoft to deny access to the routine's address from the
|
|
perspective of third party drivers. Furthermore, because _C_specific_handler
|
|
is exported from the kernel, it is trivial to retrieve its address across all
|
|
kernel versions from the context of a third party driver. This approach
|
|
capitalizes on the fact that PatchGuard utilizes SEH in order to obfuscate the
|
|
call to the system integrity checking routine, in effect turning this
|
|
obfuscation mechanism into a convenient way to hijack execution control before
|
|
the system integrity check is actually performed.
|
|
|
|
This approach can be implemented in several different ways, but the basic idea
|
|
is to intercept execution somewhere between the faulting instruction in the
|
|
PatchGuard DPC (whichever is selected at boot time), and the exception
|
|
handlers associated with the DPC routine which invoke the PatchGuard system
|
|
integrity check routine. With this in mind, _C_specific_handler is exactly
|
|
what one could hope for; _C_specific_handler is invoked when the benign access
|
|
violation triggered by the bogus DeferredContext value to the PatchGuard DPC
|
|
routine is called. Furthermore, being exported, there are no concerns with
|
|
compatibility with future kernel versions, or different flavors of the kernel
|
|
(PAE vs non-PAE, MP vs UP, and soforth).
|
|
|
|
Although hooking _C_specific_handler provides a convenient way to gain control
|
|
of execution in the execution path for the PatchGuard check routine, there
|
|
remains the problem of how to safely defuse the check routine and resume
|
|
execution at a safe point such that DPCs continue to be processed by the
|
|
system in a timely fashion. On x86, this would pose a serious problem, as in
|
|
this context, we (as an attacker attempting to bypass PatchGuard) would gain
|
|
control at an exception handler with a context record describing the context
|
|
at middle of the PatchGuard DPC routine, with no good way to unwind the
|
|
context back up to the DPC routine's caller (the kernel timer DPC dispatcher).
|
|
|
|
Ironically, by virtue of being only on x64 and not x86, this problem is made
|
|
trivial where it might have been difficult to solve in a generalized fashion
|
|
on x86. Specifically, there is extensive unwind support baked into the core
|
|
of the x64 calling convention on Windows, such that there exists metadata
|
|
describing how to unwind any function that manipulates the stack at any point
|
|
in its execution lifetime. This metadata is used to implement unwind
|
|
semantics that allow functions to be cleanly unwound without having to call
|
|
exception/unwind handlers implemented in code that depend on the execution
|
|
context of the routine they are associated with. This extensive unwind
|
|
metadata can be used to our advantage here, as it provides a clean mechanism
|
|
to unwind past the DPC routine (to the DPC dispatcher) in a completely
|
|
compatible and kernel-version-independent manner. Furthermore, there is no
|
|
good way for Microsoft to disable this unwind metadata, given how deeply
|
|
involved it is with the x64 calling convention.
|
|
|
|
The process of using the unwind metadata of a function to unwind an execution
|
|
context is known as a virtual unwind, and there is a documented, exported
|
|
routine [5] to implement this mechanism: RtlVirtualUnwind. Using
|
|
RtlVirtualUnwind, it is possible to alter the execution context that is
|
|
provided as an argument to _C_specific_handler (and thus the hook on
|
|
_C_specific_handler). This execution context describes the machine state at
|
|
the time of the access violation in the PatchGuard DPC routine. After
|
|
performing a virtual unwind on this execution context, all that remains is to
|
|
return the manifest ExceptionContinueExecution constant to the kernel mode
|
|
exception dispatcher in order to realize the altered context. This completely
|
|
bypasses the PatchGuard system integrity check. As an added bonus, the hook
|
|
on _C_specific_handler is only needed until the first time PatchGuard is
|
|
called. This is due to the fact that the PatchGuard timer is a one-shot
|
|
timer, and as the code to re-queue the timer is skipped by the virtual unwind,
|
|
PatchGuard is effectively permanently disabled for the remainder of the
|
|
Windows boot session.
|
|
|
|
The last remaining obstacle with this bypass technique is filtering out the
|
|
specific PatchGuard access violation exceptions from legitimate access
|
|
violations that kernel mode code may produce. This is important, as access
|
|
violations in kernel mode are a normal part of parameter validation (the probe
|
|
and lock model used to validate user mode pointers) for drivers and system
|
|
services. Fortunately, it is easy to make this determination, as it is
|
|
generally only legal to use a try/except to catch an access violation relating
|
|
to a user mode address from kernel mode (as previously described). PatchGuard
|
|
is a rare exception to this rule, in that it has a well-defined no-mans-land
|
|
region where accesses can be attempted without fear of a bugcheck occurring.
|
|
As a result, it is a safe assumption that any access violation relating to a
|
|
kernel mode address is either PatchGuard trigger its own execution, or a very
|
|
badly behaved third party driver that is grossly breaking the rules relating
|
|
to Windows kernel mode drivers. It is the author's opinion that the latter
|
|
case is not worth considering as a blocker, especially since if such a
|
|
completely broken driver were to exist, it would already be randomly bringing
|
|
the system down with bugchecks. It is worth noting, as an addendum, that the
|
|
referenced address in the exception information block passed to the exception
|
|
handler will always be 0xFFFFFFFF`FFFFFFFF due to how violations on
|
|
non-canonical addresses are reported by the processor. This does not impact
|
|
the viability of this technique as a valid way to bypass PatchGuard in a
|
|
version-independant manner, however.
|
|
|
|
It is worth noting that the fact that this technique involves modifying the
|
|
kernel is not a problem (aside from the inherent race conditions involved in
|
|
safely patching a running binary). The hook will disable PatchGuard before
|
|
PatchGuard has a chance to notice the hook from the context of the system
|
|
integrity check routine.
|
|
|
|
This proposed approach has several advantages over the previously suggested
|
|
approach by Uninformed's original paper on PatchGuard[2]. Specifically, it does
|
|
not involve locating each individual DPC routine (and does not even rely on
|
|
any sort of code fingerprinting; only exported symbols are used). This
|
|
improves both the reliability of the proposed approach (as code fingerprinting
|
|
always introduces an additional margin of error as far as false positives go)
|
|
and its resiliency to attack by Microsoft. Because this technique relies
|
|
solely on exported functions, and does not carry any sort of dependency on how
|
|
many possible DPCs are available to PatchGuard for use (or any sort of
|
|
dependency on locating them at runtime), blocking this approach would be
|
|
significantly more involved than simply adding another possible DPC routine or
|
|
changing the attributes of an existing DPC routine in an effort to third-party
|
|
drivers that were taking a signature-based approach to locating DPC routines
|
|
for patching.
|
|
|
|
Although this technique is quite resilient to kernel changes that do not
|
|
directly involve the underlying mechanisms by which PatchGuard itself
|
|
functions (the fact that it can operate unmodified on both Windows Server 2003
|
|
x64 and Windows Vista x64 is testament to this fact), there are a number of
|
|
different ways by which Microsoft could block this attack in a future update
|
|
to PatchGuard. The most obvious solution is to entirely abandon SEH as a core
|
|
mechanism involved in arranging for the PatchGuard system integrity check.
|
|
Abandoning SEH removes the convenient mechanism (hooking _C_specific_handler)
|
|
that is presented here as a version-independent way to hook in to the
|
|
execution path involved in PatchGuard's system integrity check. If Microsoft
|
|
were to go this route, a would-be attacker would need to devise another
|
|
mechanism to achieve control of execution before the system integrity check
|
|
runs. Assuming that Microsoft played their hand correctly, a future
|
|
PatchGuard revision would not have such an easily-accessible mechanism to hook
|
|
into the execution process in a generic manner, largely counteracting this
|
|
proposed approach. Microsoft could also employ some sort of pre-validation of
|
|
the exception handler path before the DPC triggers an exception, although
|
|
given that this is not the easiest and most elegant way to counter such a
|
|
technique, the author feels that it is an unlikely solution.
|
|
|
|
4.2) Interception of DPC Exception Registration
|
|
|
|
Presently, all execution paths leading to the execution of PatchGuard DPC
|
|
routines involve an exception/unwind handler. This is another single point of
|
|
failure weakness that can be exploited by third parties attempting to disable
|
|
PatchGuard. An approach involving the detection of all of the PatchGuard DPC
|
|
routines, followed by interception of the exception handler registrations for
|
|
each DPC is proposed as another means of defeating PatchGuard.
|
|
|
|
Though this technique is not as clean or clear-cut as the technique proposed
|
|
in 4.1, this approach is considered by the author as a viable bypass mechanism
|
|
for PatchGuard version 2. This technique essentially involves patching the
|
|
exception registrations for each possible DPC routine that could be used by
|
|
PatchGuard, such that each exception registration points to a routine that
|
|
employs a virtual unwind to safely exit out of the PatchGuard DPC without
|
|
invoking the system integrity check. Any such approach faces several
|
|
obstacles, however.
|
|
|
|
The first major difficulty for this technique is locating each PatchGuard DPC.
|
|
Since none of the PatchGuard DPC routines are exported, a little bit more
|
|
creative thinking is involved in finding the locations to patch. The author
|
|
feels that a combination of pattern matching and code fingerprinting would
|
|
best serve this goal; there are a number of commonalities between the
|
|
different PatchGuard DPC routines that could be used to locate them with a
|
|
relatively high degree of confidence in PatchGuard version 2. Specifically,
|
|
the author feels that the following criteria are acceptable for use in
|
|
detecting the PatchGuard DPC routines:
|
|
|
|
1. Each DPC routine has one exception/unwind-marked registration with
|
|
_C_specific_handler.
|
|
2. Each DPC routine has exactly four _C_specific_handler scopes.
|
|
3. Each DPC routine is referenced in raw address form (64-bit pointer) in
|
|
the executable code sections comprising ntoskrnl at least twice.
|
|
4. Each DPC routine has at least two _C_specific_handler scopes with an
|
|
associated unwind/exception handler.
|
|
5. Each DPC routine has exactly one Cspecifichanlder scope with a call to a
|
|
common subfunction that references RtlUnwindEx (an exported routine).
|
|
6. Each DPC routine has several sets of distinctive, normally rare
|
|
instructions (ror/rol instructions).
|
|
|
|
Given several (or even all) of these criteria, it should be possible to
|
|
accurately locate all ten DPC routines via scanning non-pagable code in the
|
|
kernel. It is possible to locate the exception registration information for
|
|
the DPC routines through processing of the exception directory for the kernel
|
|
(and indeed, most of the criteria require doing this as a prerequisite).
|
|
Locating the kernel image base is fairly trivial as well; the address of an
|
|
exported routine can be taken, and truncated to a 64K region. From there, one
|
|
need only perform downward searches in 64K increments for the DOS header
|
|
signature (followed by a check for a PE32+ header).
|
|
|
|
Another hurdle that must be solved for this approach is the placement of the
|
|
replacement exception handler routines. These routines are required to be
|
|
within 4GB of the kernel image base (there is only a 32-bit RVA in the unwind
|
|
metadata), meaning that in general, it is not practical to simply store them
|
|
in a driver binary or pool allocation (by default, these addresses are usually
|
|
far more than 4GB away from the kernel image base). There are no documented
|
|
and exported routines to allocate kernel mode virtual memory at a specific
|
|
virtual address to the author's knowledge. However, other, less savory
|
|
approaches could theoretically be taken (such as allocating physical memory
|
|
and altering paging structures directly to create a valid memory region within
|
|
4GB of the kernel image base).
|
|
|
|
After one has solved these difficulties, the rest of this approach is fairly
|
|
trivial (and similar to portions of the technique described in 4.1).
|
|
Specifically, the replaced exception handlers need to invoke RtlVirtualUnwind
|
|
to unwind back to the kernel DPC dispatcher, and then request that execution
|
|
be resumed at the unwound context.
|
|
|
|
This mechanism is not nearly as robust as the first in the author's point of
|
|
view, though both approaches could be disabled by abandoning SEH entirely as a
|
|
critical path in the execution of the PatchGuard system integrity check
|
|
routine. Specifically, Microsoft could change the characteristics of the DPC
|
|
routines in an attempt to frustrate fingerprinting and detection of them at
|
|
runtime. Pre-validation of unwind metadata (or additional checks in the
|
|
exception dispatcher itself to ensure that all SEH routines registered as part
|
|
of an image are within the confines of the image in-memory) could also be used
|
|
to defeat this technique. There are other security benefits to validating
|
|
that SEH routines on x64 that are registered as part of an image really exist
|
|
within an image, as will be discussed below. As such, the author would expect
|
|
this to appear in a future Windows version.
|
|
|
|
4.3) Interception of PsInvertedFunctionTable
|
|
|
|
Another variation on the theme of intercepting PatchGuard within the SEH code
|
|
path critical to the system integrity check routine involves taking advantage
|
|
of an optimization that exists in the x64 exception dispatcher. Specifically,
|
|
it is possible to utilize the fact that the exception dispatcher on x64 uses a
|
|
cache to improve the performance of exception handling. By taking advantage
|
|
of this cache, it may be possible to intercept control of execution when the
|
|
PatchGuard DPC routine deliberately creates an access violation exception in
|
|
order to trigger the system integrity check. This proposed technique uses the
|
|
nt!PsInvertedFunctionTable global variable in the kernel, which represents a
|
|
cache used to perform a fast translation of RIP values to an associated image
|
|
base and exception directory pointer, without having to do a (slow) search
|
|
through the linked list of loaded kernel modules.
|
|
|
|
This technique is fairly similar to the one described in technique 4.2. Instead
|
|
of altering the actual exception directory entries corresponding to each
|
|
PatchGuard DPC routine in the kernel's image in-memory, this technique alters
|
|
the cached exception directory pointer stored within PsInvertedFunctionTable.
|
|
PsInvertedFunctionTable is consulted by RtlLookupFunctionTableEntry, in order
|
|
to translate a RIP value to an associated image (and unwind metadata block).
|
|
The logic within RtlLookupFunctionTable is essentially to search through the
|
|
cached entries resident in PsInvertedFunctionTable for an image that
|
|
corresponds to a given RIP value. If a hit is found, then the exception
|
|
directory pointer is loaded directly from the PsInvertedFunctionTable cache,
|
|
instead of through the (slower) process of parsing the PE header of the given
|
|
image. If no hit is found, then the loaded module linked list is searched.
|
|
Assuming a hit is made in the loaded module list, then the PE header for the
|
|
associated module is processed in order to locate the exception directory for
|
|
the module. From there, the exception directory is searched to locate the
|
|
unwind metadata block corresponding to the function containing the specified
|
|
RIP value.
|
|
|
|
The structure backing PsInvertedFunctionTable (RTL_INVERTED_FUNCTION_TABLE)
|
|
can be described as so in C:
|
|
|
|
typedef struct _RTL_INVERTED_FUNCTION_TABLE_ENTRY
|
|
{
|
|
PIMAGE_RUNTIME_FUNCTION_ENTRY ExceptionDirectory;
|
|
PVOID ImageBase;
|
|
ULONG ImageSize;
|
|
ULONG ExceptionDirectorySize;
|
|
} RTL_INVERTED_FUNCTION_TABLE_ENTRY, * PRTL_INVERTED_FUNCTION_TABLE_ENTRY;
|
|
|
|
typedef struct _RTL_INVERTED_FUNCTION_TABLE
|
|
{
|
|
ULONG Count;
|
|
ULONG MaxCount; // always 160 in Windows Server 2003
|
|
ULONG Pad[ 0x2 ];
|
|
RTL_INVERTED_FUNCTION_TABLE_ENTRY Entries[ ANYSIZE_ARRAY ];
|
|
} RTL_INVERTED_FUNCTION_TABLE, * PRTL_INVERTED_FUNCTION_TABLE;
|
|
|
|
In Windows Server 2003, there is space reserved for up to 160 loaded modules
|
|
in the array contained within PsInvertedFunctionTable. In Windows Vista, this
|
|
number has been expanded to 512 module entries. The array of loaded modules
|
|
is maintained by the system module loader such that when a module is loaded or
|
|
unloaded, a corresponding entry within PsInvertedFunctionTable is created or
|
|
deleted, respectively. It is not a fatal error for the module array within
|
|
PsInvertedFunctionTable to be exhausted; in this case, performance for
|
|
exception dispatching relating to additional modules will be slower, but the
|
|
system will still function.
|
|
|
|
Because the RIP-to-exception-directory cache described by
|
|
PsInvertedFunctionTable maintains a full 64-bit pointer to the exception
|
|
directory of the associated module, it is possible to disassociate the cached
|
|
exception directory pointer from its corresponding image. In other words, it
|
|
is possible to modify the ExceptionDirectory member of a particular cached
|
|
RTL_INVERTED_FUNCTION_TABLE_ENTRY to point to an arbitrary location instead of
|
|
the exception directory of that module. There are no security or integrity
|
|
checks that validate that the ExceptionDirectory member points to within the
|
|
given image. This could be exploited by a third-party driver in order to take
|
|
control of exception dispatching for any of the first 160 (or 512, in the case
|
|
of Windows Vista) kernel modules. This loaded module list includes critical
|
|
images such as the HAL (typically the first entry in the cache) and the kernel
|
|
itself (typically the second entry in the cache). With respect to bypassing
|
|
PatchGuard, this makes it possible for a third party driver to copy the
|
|
exception directory data of the kernel to dynamically allocated memory and
|
|
adjust it such that exception handlers for the PatchGuard DPC routines point
|
|
to a stub function that invokes a virtual unwind as described in technique 4.2.
|
|
After setting up its altered shadow copy of the exception directory for the
|
|
kernel, all that a third party driver would need to do is swap the
|
|
ExceptionDirectory pointer within the PsInvertedFunctionTable cache entry for
|
|
the kernel with the pointer to the shadow copy. Following that, this approach
|
|
is essentially the same as the proposed approach described in 4.2. It has the
|
|
added advantage of being more difficult to detect from the perspective of
|
|
validating the integrity of the exception dispatching path, as the exception
|
|
directory associated with the kernel image in-memory is not actually altered;
|
|
only a pointer to the exception directory in a cache is changed.
|
|
|
|
This approach does require a reliable mechanism to detect
|
|
PsInvertedFunctionTable (which is not exported) at run-time, however. The
|
|
author feels that this is not a particularly difficult task, as the first few
|
|
members of PsInvertedFunctionTable (specifically, the maximum entry count and
|
|
the entries for the HAL and kernel) will have predictable values that can be
|
|
used in a classic egghunt style search of kernel global variable space.
|
|
Additional heuristics, such as requiring several data references to the
|
|
suspected PsInvertedFunctionTable location within kernel code could be applied
|
|
as well, in the interest of improving accuracy.
|
|
|
|
This proposed approach may be countered by many of the proposed counters to
|
|
techniques 4.1 and 4.2. Additionally, this technique could also be countered by
|
|
validating exception directory pointers within PsInvertedFunctionTable, such
|
|
as by ensuring that such exception directory pointers are within the confines
|
|
of the purported associated image. Although this validation is not perfect
|
|
since it might still be possible for one to reposition the exception directory
|
|
pointer to a different location within the image that could be safely modified
|
|
at runtime, such as overlapping a large global variable array or the like, it
|
|
would certainly increase the difficulty of subverting the exception
|
|
dispatcher's RIP translation cache. Additional validation techniques, such as
|
|
requiring that the exception directory point to read-only memory, could be
|
|
similarly adopted to reduce the chance that a third party driver could
|
|
meaningfully subvert the cache (with results leading to something other than a
|
|
system crash).
|
|
|
|
It should be noted that in the current implementation, PsInvertedFunctionTable
|
|
presents a relatively inviting target for potentially malicious software to
|
|
hijack parts of the kernel without being detected. Indeed, through careful
|
|
planned subversion of PsInvertedFunctionTable, third party software could take
|
|
control of exception dispatchers throughout the kernel in order to gain
|
|
control of execution. Though this technique would be much more limited than
|
|
outright kernel patching, it has the advantage of being completely undetected
|
|
by current PatchGuard versions (which cannot validate global variables that
|
|
may change without notice at runtime, for obvious reasons). It also has the
|
|
advantage of being undetected by current rootkit detection systems, which are
|
|
presently (to the author's knowledge) blissfully unaware of
|
|
PsInvertedFunctionTable. Although it would require administrative permissions
|
|
(or an exploit granting such permissions) for an attacker to modify
|
|
PsInvertedFunctionTable in the first place, Microsoft has at late focused a
|
|
great deal of effort on protecting the kernel even from users with
|
|
administrator permissions. For example, one could conceive of a rootkit-style
|
|
program that intercepts exception dispatchers for system services, and passes
|
|
invalid user mode pointers to system services in order to surreptitiously
|
|
execute kernel mode code without detection when the standard pointer probe
|
|
throws an exception indicating that the given usermode pointer parameter is
|
|
invalid. Given this sort of threat (from the rootkit perspective), the author
|
|
feels that it would be in Microsoft's best interests to put into place
|
|
additional validation of PsInvertedFunctionTable's cached exception directory
|
|
pointers (assuming that Microsoft wishes to continue down the path of
|
|
strengthening the kernel against malicious administratively-privileged code).
|
|
|
|
4.4) Interception of KiDebugTrapOrFault
|
|
|
|
Although many of the proposed techniques for blocking PatchGuard have so far
|
|
relied on the fact that PatchGuard utilizes SEH to kick off execution of the
|
|
system integrity check, there are different approaches that can be taken which
|
|
do not rely on this specific PatchGuard implementation detail. One such
|
|
alternative technique for bypassing PatchGuard involves subverting the kernel
|
|
debug fault handler: KiDebugTrapOrFault. This handler represents the entry
|
|
point for all debug exceptions (such as so-called hardware breakpoints), and
|
|
as such presents an attractive target for bypassing PatchGuard.
|
|
|
|
The basis of this proposed technique is to utilize a set of hardware
|
|
breakpoints to intercept execution at a convenient critical location within
|
|
PatchGuard's execution path leading up to the system integrity check. This
|
|
technique has a greater degree of flexibility than many of the previously
|
|
described techniques, though this flexibility comes at cost of a significantly
|
|
more involved (and difficult) implementation. Specifically, one could use
|
|
this proposed technique to intercept control at any point critical to the
|
|
execution of PatchGuard's system integrity check (for example, the kernel DPC
|
|
dispatcher, one of the PatchGuard DPC routines, or a convenient location in
|
|
the exception dispatching code path, such as _C_specific_handler.
|
|
|
|
The means by which this interception of execution could be accomplished is by
|
|
assuming control of debug exception handling. This could be done in several
|
|
different ways; for example, one could hook KiDebugTrapOrFault or alter the
|
|
IDT directory to simply repoint the debug exception to driver-supplied code,
|
|
bypassing KiDebugTrapOrFault entirely. There are even ways that this
|
|
interception could be done in a way that is transparent to the current
|
|
PatchGuard implementation, such as by intercepting PsInvertedFunctionTable as
|
|
described in technique 4.3. A driver could then alter the unwind metadata for
|
|
KiDebugTrapOrFault and create an exception handler for this routine. This
|
|
step would allow transparent, first-chance access to all debug faults (because
|
|
KiDebugTrapOrFault internally constructs and dispatches a STATUS_SINGLE_STEP
|
|
exception describing the debug fault; normally, this would present the
|
|
STATUS_SINGLE_STEP exception to a debugger, but there is no technical reason why
|
|
a standard SEH-style exception handler could not catch the exception).
|
|
Regardless of how control of execution at the debug trap handler is gained,
|
|
the next step in this proposed approach is to alter execution at the requested
|
|
point of interest (whether it be the kernel timer DPC dispatcher, which could
|
|
be easily found by queuing a DPC and executing a virtual unwind, or a
|
|
PatchGuard DPC routine, or _C_specific_handler or any other place of interest
|
|
in the critical PatchGuard execution path) to prevent PatchGuard's system
|
|
integrity check from executing.
|
|
|
|
After the implementor has established control over the debug trap handler
|
|
(through whichever means desired), all that remains is to set
|
|
debug-register-based breakpoints on target locations. When these breakpoints
|
|
are hit, control is transferred to the debug trap handler, and from there to
|
|
the implementor's driver code which can act as necessary, such as by altering
|
|
the execution context of the processor at the time of the exception before
|
|
resuming execution.
|
|
|
|
The advantages of this approach over directly patching into kernel code (i.e.
|
|
opcode replacement) are threefold. First, it is more flexible in that there
|
|
are no difficulties with placing an absolute 64-bit jump in an arbitrary
|
|
location (in x64, this typically takes around 12 opcode bytes to do from any
|
|
arbitrary location in memory). For example, one does not have to worry about
|
|
whether a the opcode space overwritten by the jump might overlap a whole
|
|
instruction boundary that is a jump target, which might lead to invalid code
|
|
being executed. Secondly, this approach can be used to get out of having to
|
|
implement a disassembler (or other similar forms of code analysis) in kernel
|
|
mode, as hardware breakpoints allow one to gain control of execution at a
|
|
precise location without having to worry about creating enough space for a
|
|
jump patch, and then placing the original instructions back into a jump stub
|
|
to allow execution to resume at the original effective instruction stream (if
|
|
desired). Finally, if done correctly, this technique could be implemented in
|
|
a truly race-condition free manner (as the only patching that would need to be
|
|
done is an interlocked 8-byte swap of a pointer-aligned value in
|
|
PsInvertedFunctionTable, if one took that approach).
|
|
|
|
This approach does require that the implementor pick a location (or multiple
|
|
locations) in the kernel that are to have breakpoints set over in order to
|
|
gain execution control. There are many possibilities, such as the DPC
|
|
dispatcher (where one could filter out the PatchGuard DPC by detecting, say,
|
|
invalid kernel pointers in DeferredContext), the execution dispatcher path
|
|
(where one could unwind past a PatchGuard DPC's access violation exception), a
|
|
PatchGuard DPC itself (where one could again unwind past with
|
|
RtlVirtualUnwind, bypassing PatchGuard if the DPC is being invoked by
|
|
PatchGuard), or any other choice area. One of the advantages of this approach
|
|
is that it is comparatively easy to intercept execution anywhere in the kernel
|
|
that can be reliably located across kernel versions, making it potentially a
|
|
great deal more flexible to being easily adapted to defeat future PatchGuard
|
|
implementations than some of the previously discussed bypass techniques.
|
|
|
|
Normally, the kernel has logic in place that prevents stray kernel addresses
|
|
from being placed in debug registers by user mode code via NtSetContextThread.
|
|
It may be necessary to make additional alterations to ensure that the custom
|
|
values in the debug registers are persisted across context switches, via the
|
|
same mechanisms used by the kernel debugger to persist debug registers.
|
|
|
|
In the author's opinion, this technique would be difficult for Microsoft to
|
|
defeat in principle, barring hardware support (like virtualization). Although
|
|
Microsoft could move around critical code paths for PatchGuard, this technique
|
|
presents a general mechanism by which any location in the kernel could be
|
|
surreptitiously intercepted, thus lending itself to relatively easy adaptation
|
|
to future PatchGuard revisions. One approach that could be taken is to
|
|
perform increased validation of the debug trap handler in an attempt to make
|
|
it more difficult to intercept without being detected by PatchGuard or some
|
|
other validation mechanism. Other counters to this sort of tactic (in
|
|
general) would be to make it difficult to reliably locate all of the critical
|
|
code paths in a consistent and reliable manner across all kernel versions,
|
|
from the perspective of a third party driver. This is likely to prove
|
|
difficult, as a great deal of the internal workings of the kernel are exposed
|
|
in some way to drivers (i.e. exported functions), or are otherwise indirectly
|
|
exposed to drivers (i.e. trap labels via the IDT, exception handlers via
|
|
unwind metadata and exports used in the process of dispatching exceptions to
|
|
SEH registrations). Completely insulating PatchGuard from all such externally
|
|
visible locations (that could be comparatively easily compromised by a third
|
|
party driver) would, as a result, likely be an arduous task.
|
|
|
|
The debug trap handler can be used to do more than simply evade PatchGuard for
|
|
purposes of allowing conventional kernel code patches via opcode replacement.
|
|
It can also be utilized in order to completely eliminate the need to perform
|
|
opcode-replacement-based kernel patches in order to gain execution control.
|
|
In this vein, via assuming control of the debug trap handler in a way that is
|
|
transparent to PatchGuard (such as via the proposed
|
|
PsInvertedFunctionTable-based approach), it would then be possible to set
|
|
debug-register-based breakpoints at every address of interest (assuming that
|
|
there enough debug registers to patch all of the locations of interest). From
|
|
the debug trap handler, it is possible to completely alter the execution
|
|
context at the point of the debug exception, which is exactly the same as what
|
|
one could do via traditional opcode-replacement-based patching for a given
|
|
location. This sort of transparent patching would be extremely difficult for
|
|
Microsoft to detect, because the debug registers must remain available for use
|
|
by the kernel debugger. Without completely crippling the ability of the
|
|
kernel debugger to set breakpoints without being attached before PatchGuard is
|
|
initialized, the author does not see a particularly viable (i.e. without a
|
|
trivial workaround) way for Microsoft to prevent the use of debug registers to
|
|
alter execution context at select points in the kernel (from a third party
|
|
driver). Because such an approach would capitalize on the fact that Microsoft
|
|
must, from a business case perspective, make it possible for IHVs and ISVs to
|
|
debug their code on Windows, the author believes that it would be unlikely to
|
|
be successfully disabled by Microsoft. Furthermore, because such techniques
|
|
can be implemented without even having the basic requirement of disabling
|
|
PatchGuard, they would be inherently much more likely to work with future
|
|
PatchGuard revisions. After all, if PatchGuard can't even detect changes to
|
|
the kernel (because kernel code isn't being patched), then there is no reason
|
|
to even bother with trying to disable it, which gets one out of the
|
|
comparatively messy business of playing catch-up with Microsoft with each new
|
|
PatchGuard revision.
|
|
|
|
4.5) General Detect Bit Interception
|
|
|
|
One of PatchGuard's anti-debug mechanisms relates to debug registers.
|
|
Specifically, PatchGuard attempts to clear Dr7 (the debug control register) in
|
|
an attempt to disable all debug-register-based breakpoints, as one of the
|
|
first tasks upon entering the system integrity check routine. This presents
|
|
an inherent weakness within PatchGuard, as there is support built-in to the
|
|
processor that allows one to detect (and intercept) direct accesses to any of
|
|
the debug registers. This support is primarily legacy, intended for so-called
|
|
in-circuit emulators (ICEs), which were special hardware components that acted
|
|
as a true hardware-based debugger by allowing one to control a processor from
|
|
outside the context of the system entirely, in essence truly isolating the
|
|
debugger from the operating system and any programs running under it. This
|
|
support is embodied in the General Detect bit in Dr7, which when set, causes a
|
|
debug trap to be generated on any successful access to a debug register. This
|
|
is significant in that it provides a way for an attacker to trap PatchGuard's
|
|
access to Dr7 (zeroing it), which in effect provides a means to pinpoint the
|
|
exact location of PatchGuard's system integrity routine in-memory,
|
|
in-plaintext. Furthermore, it gives an attacker the possibility of making any
|
|
alterations desired to the execution context at the very start of the system
|
|
integrity check, which could be trivially used in order to simply implement an
|
|
immediate return out of the system integrity check logic without actually
|
|
verifying the system's integrity (as dr7 is zeroed before any integrity checks
|
|
are performed). This approach effectively turns another one of PatchGuard's
|
|
protection mechanisms against it, utilizing the anti-debug-register behavior
|
|
to detect (and block) PatchGuard.
|
|
|
|
The general idea behind this approach is similar to that described in
|
|
technique 4.4. In the same fashion as in technique 4.4, an implementor of this
|
|
approach is required to gain control of the debug trap handler. For this
|
|
task, any of the proposed approaches in technique 4.4 may be used. After control
|
|
of the debug trap handler is established, an attacker must then set the
|
|
general detect bit in Dr7 and wait for PatchGuard to access the debug
|
|
registers. It should be noted that during the legitimate course of execution,
|
|
the kernel itself will often directly access debug registers, such as during
|
|
context switches or if NtSetContextThread/NtGetContextThread are invoked. Any
|
|
such implementation of this technique must be able to differentiate between
|
|
PatchGuard's accesses of the debug registers and legitimate accesses. This
|
|
could be trivially implemented by checking if the RIP value at the time of the
|
|
trap was within a valid kernel image or not, as the PatchGuard system
|
|
integrity check routine resides in dynamically allocated non-paged pool and
|
|
not within the confines of the kernel images in-memory.
|
|
|
|
When the debug trap handler is invoked as a result of PatchGuard zeroing Dr7,
|
|
then the appropriate action (which could be as trivial as simply executing a
|
|
hard return out of the system integrity check routine) can be taken by the
|
|
third-party driver wishing to disable PatchGuard.
|
|
|
|
Like the techniques that capitalize on PatchGuard's use of SEH to obfuscate
|
|
the call to the system integrity check routine, this approach relies on using
|
|
one of PatchGuard's defensive mechanisms against it. The most obvious counter
|
|
would be to thus remove the behavior of zeroing debug registers. However,
|
|
disabling this behavior may not be very desirable, as it would then be very
|
|
easy to detect PatchGuard by, say, setting a read breakpoint on kernel code
|
|
and waiting for PatchGuard to perform a read. Since reads of kernel code (as
|
|
opposed to execute fetches) are fairly atypical, this would open up another
|
|
easy mechanism by which PatchGuard could be bypassed.
|
|
|
|
The best course of action by Microsoft here would be to make it as difficult
|
|
as possible to differentiate between legitimate accesses to debug registers
|
|
and PatchGuard's own accesses, although this is likely to not be very doable.
|
|
Strengthening the debug trap path against interception by placing additional
|
|
validation checks over that code path might also be useful in countering this
|
|
technique, although likely to only a limited, easily-bypassable extent.
|
|
|
|
4.6) Patching the Kernel Timer DPC Dispatcher
|
|
|
|
Currently, PatchGuard utilizes a timer with an associated DPC to transfer
|
|
control to a preselected one of ten possible legitimate DPC routines that have
|
|
been slightly modified for use with PatchGuard. Because third party kernel
|
|
drivers are given a documented and exported interface to create timers with
|
|
associated DPC routines, this represents a weakness in PatchGuard, in that it
|
|
presents an easily-detectable location in the critical execution path for
|
|
PatchGuard's system integrity check routine that could be relatively easily
|
|
compromised by a third-party driver. This technique focuses on gaining
|
|
control of the timer DPC dispatcher, with the goal of detecting when the
|
|
PatchGuard DPC is about to be dispatched. When the PatchGuard DPC is
|
|
detected, then the third-party driver could skip over the PatchGuard DPC
|
|
routine entirely, thus disabling PatchGuard.
|
|
|
|
In order to accomplish this, a third party driver would need to locate the
|
|
exact instruction within the kernel timer DPC dispatcher that is responsible
|
|
for making calls to timer DPC routines. Fortunately, this is a fairly easy
|
|
task for a driver, as the interfaces for creating timers with associated DPCs
|
|
and DPC routines are documented and exported. Specifically, a third party
|
|
driver could queue a timer DPC, and then record address of the DPC dispatcher
|
|
routine via inspection of the return address of the timer DPC routine when it
|
|
is called. From there, the driver can derive the address of the call
|
|
instruction responsible for making the call to the DPC routine associated with
|
|
a DPC object that is associated with a timer.
|
|
|
|
At this point, all a third party driver needs to do is patch the call
|
|
instruction in the DPC dispatcher to transfer execution control to the
|
|
driver's code. From there, the driver can filter all timer DPCs for the
|
|
PatchGuard DPC routine (perhaps by looking for a bogus kernel address in
|
|
DeferredContext, paired with a DPC routine that is within the confines of the
|
|
kernel image in-memory). When the PatchGuard DPC is detected, then the driver
|
|
can decline to call the DPC routine and instead simply return control to the
|
|
kernel DPC dispatcher after the call instruction in the logical original
|
|
instruction stream. This effectively prevents PatchGuard from ever running
|
|
the system integrity check, which again gives the driver free reign to patch
|
|
the kernel without fear of intervention by PatchGuard.
|
|
|
|
In the author's opinion, the best way to prevent this approach is to use a
|
|
multitude of different mechanisms to kick off execution of the PatchGuard
|
|
check routine. For example, a dedicated thread waiting on a timer could also
|
|
be used, or a frequently-called system routine could be modified to
|
|
periodically make calls to PatchGuard. As long as calls to PatchGuard are
|
|
funneled through one location, such as the timer DPC dispatcher, the entire
|
|
PatchGuard integrity check system is at risk of being trivially bypassed in
|
|
one fell swoop by third party drivers.
|
|
|
|
4.7) Searching for the PatchGuard DPC
|
|
|
|
PatchGuard currently uses a KTIMER object with an associated KDPC object, both
|
|
allocated within non-paged pool memory, as a periodic trigger used to start
|
|
PatchGuard's system integrity check routine. It should be possible to locate
|
|
this timer object in memory and cancel it, preventing PatchGuard from
|
|
executing.
|
|
|
|
The implementation of this technique is essentially a classical egghunt style
|
|
search through non-paged pool, with some specially defined restrictions as to
|
|
how to find the target. Specifically, one is looking for a region of memory
|
|
matching the following criteria:
|
|
|
|
1. The memory is a valid KTIMER object. This means that the linked list
|
|
entries should be valid, and point to other seemingly valid KTIMER objects (or
|
|
the list head), and that the type field of the KTIMER is consistent with a
|
|
timer object.
|
|
2. The timer should have a timer interval in the range of several minutes.
|
|
PatchGuard applies a randomized fuzz factor to the timer interval (within a
|
|
small range), but verifying that the range of the timer is no more than
|
|
several minutes (say 7 or 8) should be an ample sanity check.
|
|
3. The timer should have a KDPC associated with it (and the pointer should
|
|
be valid non-paged pool).
|
|
4. The associated KDPC should have the appropriate type field.
|
|
5. The associated KDPC should have a DPC routine that is within the confines
|
|
of the kernel image in-memory.
|
|
6. The associated KDPC should have a DeferredContext value that is a
|
|
non-canonical kernel address.
|
|
|
|
After the DPC is found, it can simply be canceled in order to disable
|
|
PatchGuard. Similar approaches as could be used to prevent technique 4.6 would
|
|
be applicable here. Specifically, a diversity in the set of paths leading up
|
|
to PatchGuard's execution would make a technique like this, which is targetted
|
|
at one path (such as the timer DPC dispatcher) less effective at blocking
|
|
PatchGuard.
|
|
|
|
4.8) TLB Desynchronization (Split TLB)
|
|
|
|
All x86 processors supporting protected mode and paging employ a caching
|
|
scheme to speed up the translation of virtual addresses to physical addresses.
|
|
This scheme is implemented via a set of Translation Lookaside Buffers, or
|
|
TLBs, which cache the contents of the page attributes (and associated physical
|
|
address) for a given virtual address. Recent x86 processors (Pentium II-class
|
|
or later) utilize several sets of TLBs, such as one set of TLBs for data
|
|
accesses and one set of TLBs for instruction accesses. In normal system
|
|
operation, both TLBs (if a processor supports multiple TLBs) maintain
|
|
consistent views for the attributes of a particular page; however, it is
|
|
possible to deliberately desynchronize the contents of these TLBs, thereby
|
|
maintaining the illusion that a single page has different attributes depending
|
|
on whether it is referenced as data or as executable code. This deliberate
|
|
desynchronization of TLBs has many uses, from the implementation of no-execute
|
|
support (utilized by PaX/GRsec on GNU/Linux [6]) to ``memory cloaking'', a
|
|
technique often used by rootkits to provide one view of memory when memory is
|
|
referenced as data by a read operation, and a different view of memory if
|
|
memory is referenced by an instruction fetch. This same memory cloaking
|
|
technique that has appealed to rootkit developers for the purpose of hiding
|
|
rootkits from detection can also be used to hide kernel patching from
|
|
PatchGuard's integrity check. Strictly speaking, this proposed technique is
|
|
not a bypass mechanism for PatchGuard; rather, it is a mechanism to hide
|
|
kernel patching from PatchGuard, thus making PatchGuard harmless to third
|
|
parties that are patching the kernel.
|
|
|
|
The details of this approach are essentially similar in many respects to that
|
|
of any program implementing a split-TLB approach to altering page attributes
|
|
or contents based on execute or read fetches. The exact details behind how
|
|
this can be accomplished are beyond the scope of this paper, and are discussed
|
|
elsewhere, by the PaX team (in the context of implementing no-execute on
|
|
legacy platforms) [6], and by Sherri Sparks and Jamie Butler (in the context of
|
|
implementing a Windows rootkit that utilizes split-TLBs to implement so-called
|
|
``memory cloaking'') [7]. Interested readers are encouraged to review these
|
|
references for the raw details on how the general split-TLB concept is
|
|
implemented. Although the referenced articles directly apply to x86, the
|
|
concepts apply in principle to x64 as well, and can likely be made to work on
|
|
x64 with minimal modification.
|
|
|
|
After one has established a mechanism for desynchronizing TLBs (such as by
|
|
hooking the page fault handler), the recommended approach for this technique
|
|
is to desynchronize the TLBs for any regions in the kernel where one is
|
|
performing traditional opcode-replacement-based patching or hooking.
|
|
Specifically, when kernel code is read for execute on a page where an
|
|
opcode-replacement-based patch is in place, then the patched page should be
|
|
returned. If kernel code is read for a data reference (such as PatchGuard
|
|
making a read of kernel code to validate its integrity), then the original
|
|
data should be returned. This technique effectively hides all modifications
|
|
to kernel code to any access other than direct execution, which prevents
|
|
PatchGuard from detecting that kernel code has been altered by a third party.
|
|
|
|
Note that in order for this approach to succeed, the hook on the page fault
|
|
handler itself must be hidden from PatchGuard. This cannot be directly
|
|
accomplished by the same TLB desynchronization tactic, as the page fault
|
|
handler must remain resident. A combined approach, such as utilizing a debug
|
|
breakpoint on the page fault handler (when coupled with a subverted debug trap
|
|
handler, perhaps via PsInvertedFunctionTable as described previously in
|
|
technique 3) along with a scheme to prevent PatchGuard from disabling
|
|
debug-register-based breakpoints (such as described in technique 5) might be
|
|
needed in order to hook the page fault handler in a manner truly transparent
|
|
to PatchGuard.
|
|
|
|
The most logical defense for this approach is to attempt to detect a
|
|
compromise in the page fault dispatching path. Because TLB desynchronization
|
|
cannot in general be used to hide the page fault handler itself (the page
|
|
fault handler must remain marked present in memory), it would be difficult for
|
|
a third party to conceal the alteration to the page fault handler from the
|
|
kernel. This difficulty would be expressed in a limited number of ways in
|
|
which alterations to the page fault handler could be hidden, such as by clever
|
|
utilization of debug registers. As a result, the key to preventing this
|
|
technique from remaining viable is to develop a way for PatchGuard to detect
|
|
the page fault hook. If, for example, the debug trap handler and a debug
|
|
breakpoint on the page fault handler were used to gain control on a page
|
|
fault, then Microsoft might be able to prevent this technique by blocking or
|
|
detecting the interception of the debug trap handler. One such approach might
|
|
be to better secure PsInvertedFunctionTable, which represents an easy way for
|
|
a third party to subvert the debug trap handler without PatchGuard's
|
|
knowledge. Such counters will vary based on the mechanism used to hide the
|
|
page fault handler hook, however.
|
|
|
|
4.9) DPC Routine Patching
|
|
|
|
A variation on technique 4.2, a very simple-minded approach to disabling
|
|
PatchGuard would be to simply hook every possible DPC routine, check if the
|
|
DPC is probably being called in order to execute PatchGuard's system integrity
|
|
check, and if so, simply returning from the DPC to the kernel timer DPC
|
|
dispatcher. In order to implement this approach, one first needs to locate
|
|
each possible DPC routine. Technique 4.2 lists a number of viable algorithms for
|
|
fingerprinting (and locating) each DPC routine; any (preferably multiple) of
|
|
the suggested algorithms in that technique would be directly applicable to
|
|
this proposed approach.
|
|
|
|
After one has identified all the possible DPC routines, all that is left is to
|
|
patch each one to branch to driver controlled code. From there, the driver
|
|
could make the decision as to whether the DPC is being invoked legitimately,
|
|
or whether it is being invoked as part of PatchGuard's system integrity check
|
|
process (easily identified by a non-canonical kernel address being passed as
|
|
DeferredContext). If the DPC is PatchGuard-related, then all the driver need
|
|
do to block PatchGuard is to immediately return to the DPC dispatcher.
|
|
|
|
This approach is fairly trivial to prevent (from Microsoft's point of view).
|
|
Because it is signature-based, one possible counter-approach Microsoft could
|
|
implement would be determining which signatures third party drivers use to
|
|
detect PatchGuard DPCs, and altering the PatchGuard DPC routines to not match
|
|
those signatures in the next PatchGuard version. Microsoft could also change
|
|
the number of DPC routines to throw off drivers that assume PatchGuard will
|
|
use exactly ten DPCs, or Microsoft could switch to an alternative delivery
|
|
mechanism other than DPCs in order to prevent existing code that detects and
|
|
hooks specific DPC routines from blocking PatchGuard.
|
|
|
|
5) Subverting PatchGuard
|
|
|
|
PatchGuard currently possesses a formidable array of defensive mechanisms that
|
|
are aimed at making it difficult to reverse engineer and debug. Given that
|
|
Microsoft does not currently have in place the infrastructure to make
|
|
PatchGuard enforced by hardware, this is arguably the best that Microsoft will
|
|
ever really be able to do in the short term. They're only able to build a
|
|
system that is based on obfuscation and anti-debugging techniques in an
|
|
attempt to make it difficult for third parties to detect, disable, or bypass
|
|
it.
|
|
|
|
There are other classes of software that seek to create defenses similar to
|
|
those of PatchGuard's. However, these other classes usually have far more
|
|
nefarious purposes than preventing third parties from patching the kernel.
|
|
Specifically, anti-debugging, anti-reverse-engineering, and self-decrypting
|
|
code have often used been to hide viruses, rootkits, and other malicious
|
|
software on compromised systems.
|
|
|
|
Although Microsoft may have intended the defensive mechanisms employed by
|
|
PatchGuard for an (arguably) good cause, these same anti-debugging,
|
|
anti-detection, and anti-reverse-engineering techniques that protect
|
|
PatchGuard from attack by third party drivers can also be subverted to protect
|
|
custom code from detection or analysis by anti-virus or anti-rootkit software.
|
|
With this respect, Microsoft has created a double-bladed-sword, as the same
|
|
elaborate obfuscation and anti-debugging schemes that guard PatchGuard against
|
|
third party software can also be used to guard malicious software from system
|
|
security software. It is in fact quite possible to subvert PatchGuard version
|
|
2's myriad defenses to execute custom code instead of PatchGuard's system
|
|
integrity check routine. While doing so might not be exactly called trivial,
|
|
it is far from impossible.
|
|
|
|
In order to subvert PatchGuard to do one's bidding, one must first catch
|
|
PatchGuard in the act, so to speak. To accomplish this, the author recommends
|
|
turning to one of the proposed bypass techniques as a starting place. For
|
|
example, consider the first proposed bypass technique, wherein the author
|
|
recommends hooking _C_specific_handler to intercept control of execution at
|
|
the exception generated by the PatchGuard DPC routine in order to trigger
|
|
execution of the system integrity check. An implementation of this bypass
|
|
technique provides direct access to the machine context inside the PatchGuard
|
|
DPC routine, and this machine context contains the information necessary to
|
|
locate the PatchGuard system integrity check routine.
|
|
|
|
Since the objective is to repurpose the system integrity check routine to
|
|
execute custom code, this is a good starting point. However, determining the
|
|
location of the system integrity check routine is much more involved than
|
|
simply skipping over PatchGuard's checks entirely; the pointer to the routine
|
|
in question is encrypted based off of the original arguments to the DPC (the
|
|
Dpc and DeferredContext arguments). Additionally, the original arguments to
|
|
the PatchGuard DPC have at this point already been moved from registers to the
|
|
stack and obfuscated (rotated left or right by a magical constant). As the
|
|
original contents of the argument registers are deliberately overwritten by
|
|
the DPC routine before the access violation is triggered, there is no choice
|
|
other than to somehow fish the DPC arguments out of the caller's stack. This
|
|
is actually somewhat of a challenge, given that such an approach must work for
|
|
all kernel versions, and must also work for all of the different DPC
|
|
permutations. Since this set of possibilities represents an unmaintainably
|
|
large number of routines to reverse engineer in order to determine rotate
|
|
obfuscation values and stack offsets, a more generalized approach to locating
|
|
the original arguments on the stack must be taken. In order to create such a
|
|
generic approach, one must take a closer look at the first few instructions of
|
|
each DPC routine (leading up to the intentional access violation). Although
|
|
PatchGuard has put into place several barriers to prevent easy retrieval of
|
|
the original arguments from this context, there might be a pattern or weakness
|
|
that could be exploited in order to recover the arguments in question.
|
|
|
|
The basic things common to each DPC routine, when it comes to the machine
|
|
context at the time of the access violation, are:
|
|
|
|
1. The original arguments have been stored on the stack in an obfuscated
|
|
form (rotated left or right by an arbitrary magical constant).
|
|
|
|
2. The access violation always occurs by dereferencing rax. Here, rax is
|
|
always the deobfuscated form of the DeferredContext argument. This gives us
|
|
one of the arguments for free, as rax in the register context at the time of
|
|
the access violation is always the plaintext DeferredContext value.
|
|
|
|
3. The stack location where the Dpc argument is stored at varies greatly
|
|
between DPC version to DPC version. Furthermore, it also varies between
|
|
different kernel flavors within an operating system family, and between
|
|
operating system families. As a result, it is not practical to hardcode stack
|
|
displacements for this argument.
|
|
|
|
4. The instruction immediately prior to the faulting instruction is always
|
|
an instruction in the form of ror rax, <magical constant>. Here, the magical
|
|
constant is an immediate value, which means that it is encoded as a part of
|
|
the opcode for this instruction itself. Each DPC has its own unique magical
|
|
constant, and the magical constants used do not change for a particular DPC
|
|
flavor across all kernel flavors and operating system families. This gives us
|
|
a nice way to quickly identify which of the ten PatchGuard DPCs is in use from
|
|
the context of the _C_specific_handler hook (without having to do ugly code
|
|
fingerprinting or analysis). Unfortunately, we still don't have a way to
|
|
determine the stack displacement of the Dpc argument.
|
|
|
|
5. The r8 register is always equal to the original Dpc argument, shifted
|
|
right by the low byte of the DeferredContext argument. Although this may seem
|
|
tantalizingly close to what we're looking for, it can't actually be used as a
|
|
substitute for the original Dpc argument, even though the DeferredContext
|
|
argument is known here (due to the value of rax). This is because the right
|
|
shift operation is destructive, in that information is permanently lost as
|
|
bits are shifted right off of the register into oblivion. As a result,
|
|
depending on the low byte of the DeferredContext argument, important bits in
|
|
the Dpc argument have already been permanently lost in the pseudo-copy
|
|
residing in r8.
|
|
|
|
Although the situation may initially appear grim, it is in fact still possible
|
|
to locate the Dpc argument given the above information; all that is needed is
|
|
a bit of work (and getting one's hands dirty with some ugly tricks).
|
|
Specifically, it is possible to search the stack frame of the DPC routine for
|
|
the Dpc argument with a brute-force attack. This isn't exactly elegant, but
|
|
it gets the job done. There are a number of hints that can be used to
|
|
increase the chance of successfully finding the real Dpc argument on the
|
|
stack:
|
|
|
|
1. The stack is 8-byte aligned (at least) due to x64 calling convention
|
|
requirements, and the Microsoft C/C++ compiler will always place pointer-sized
|
|
values on the stack in 8-byte-aligned locations. As a result, the search can
|
|
be narrowed down to 8-byte-aligned locations on the stack, instead of a
|
|
bytewise search.
|
|
|
|
2. Because the identity of the current DPC routine is known (due to
|
|
analyzing the ror instruction immediately preceding the faulting mov eax,
|
|
[rax] instruction), the rotate constant used to obfuscate the Dpc argument is
|
|
known. Each DPC routine has its own unique magical rotate constant, and as
|
|
the current DPC routine has been positively identified, the rotate constant
|
|
used to obfuscate the Dpc argument on the stack is thus also known.
|
|
|
|
3. A quick check as to whether a value on the stack could possibly be the
|
|
Dpc argument can be made by rotating the value on the stack by the known
|
|
obfuscation constant, then shifting the value right by the low byte in the
|
|
DeferredContext argument and comparing the result to the r8 value at the time
|
|
of the exception. If there is a mismatch, then the current stack location can
|
|
be eliminated from the search. This does not provide a positive match, but it
|
|
does provide a way to positively eliminate possibilities. This step is also
|
|
optional, in that it is still possible to locate the Dpc argument without
|
|
relying on r8; the check against r8 is simply an optimization.
|
|
|
|
4. The Dpc argument should point to a valid non-paged pool address, given
|
|
that it must represent a valid kernel pointer. In order to check that this is
|
|
the case, MmIsAddressValid can be used to test whether the deobfuscated value
|
|
in question is a valid pointer or not. (Yes, MmIsAddressValid is a bit of a
|
|
race condition and certainly a hack. The author would like to note that this
|
|
approach was described as requiring that the implementor get his or her
|
|
``hands dirty with some ugly tricks'', in an attempt to forstall the
|
|
inevitable complaints about how this approach might be decried as an
|
|
unstomachable ugly hack by some.)
|
|
|
|
5. The Dpc argument should point to a valid non-paged pool address whose
|
|
length is great enough to contain a KDPC object, plus at least one
|
|
pointer-sized additional field. A secondary MmIsAddressValid test can be used
|
|
to verify that the pointer describes a valid region large enough to contain
|
|
the KDPC object, plus the additional pointer-sized field following it (the
|
|
PatchGuard decryption key).
|
|
|
|
6. The Dpc argument should point to a DPC whose Type and DeferredContext
|
|
arguments have been zeroed. (The DPC routine intentionally zeros these values
|
|
in the DPC before intentionally triggering an access violation.) If the
|
|
suspected Dpc argument, when treated as a PKDPC, does not have these
|
|
properties then it can be eliminated as a possibility.
|
|
|
|
By repeatedly applying these rules to every applicable location within a
|
|
reasonable distance upward from the rsp value at the time of the exception
|
|
(say, 256 bytes, although the exact size can be greater; the only requirement
|
|
is that the entire local variable space of the DPC routine with the largest
|
|
local variable space is completely contained within the search region), it is
|
|
possible to recover the Dpc argument with virtual certainty. In the author's
|
|
experience, this technique works quite reliably, despite that one might intuit
|
|
that a search of an unknown stack frame might be prone to failing or turning
|
|
up false positives.
|
|
|
|
After both the Dpc and DeferredContext arguments to the PatchGuard DPC routine
|
|
have been recovered, it is a simple matter of analyzing how PatchGuard invokes
|
|
the system integrity check in order to determine how to locate it in-memory.
|
|
This has been discussed previously, and it amounts to the following set of
|
|
statements:
|
|
|
|
ULONG64 DecryptionKey, PatchGuardCheckFunction;
|
|
|
|
DecryptionKey = *(PULONG64)(Dpc + 0x40);
|
|
PatchGuardCheckFunction = DecryptionKey ^ DeferredContext;
|
|
PatchGuardCheckFunction |= 0xFFFFF80000000000;
|
|
|
|
At this point, it's almost possible to replace the system integrity check
|
|
routine with custom code. However, there is still the matter of the pesky
|
|
self-decrypting stub that runs before the check function. Because the DPC
|
|
routine's exception handler rewrites the first instruction of the stub before
|
|
it is executed, one doesn't have a whole lot of choice but to implement at
|
|
least a very basic version of the decryption stub for the system integrity
|
|
check routine.
|
|
|
|
Recall that the first instruction in the stub is set to the following:
|
|
|
|
lock xor qword ptr [rcx],rdx
|
|
|
|
Looking at the prototype for the decryption stub, rcx corresponds to the
|
|
address of the decryption stub itself, and rdx corresponds to the decryption
|
|
key. Since this instruction modifies both itself and the next instruction
|
|
(the instruction is four bytes long and the xor alters eight bytes), the
|
|
replacement code for the system integrity check routine must allow the first
|
|
instruction to be the above xor instruction, and the must allow for the second
|
|
instruction (at a minimum) to be initially xor-obfuscated. For simplicity's
|
|
sake, the author has chosen to implement the simplest possible solution to
|
|
this conundrum, which is to make the second instruction in the replacement
|
|
code a duplicate of the first instruction. In other words, the replacement
|
|
code would read as follows:
|
|
|
|
;
|
|
; This instruction is forced on us by PatchGuard,
|
|
; and cannot be altered; it is rewritten at runtime.
|
|
;
|
|
|
|
lock xor qword ptr [rcx],rdx
|
|
|
|
;
|
|
; The next instruction, conveniently four bytes
|
|
; long, re-encrypts itself by xoring the first
|
|
; eight bytes of the decryption stub (which includes
|
|
; the second instruction) by the decryption key a
|
|
; second time;
|
|
;
|
|
|
|
lock xor qword ptr [rcx],rdx
|
|
|
|
;
|
|
; (... any custom code may follow here ...)
|
|
;
|
|
|
|
As noted previously, after specially constructing the replacement code, it is
|
|
necessary to initially encrypt the second instruction (as it will be
|
|
immediately decrypted by the first instruction). This must be done before
|
|
control is returned to PatchGuard.
|
|
|
|
After the custom code is configured and the second instruction is encrypted,
|
|
all that remains is to copy the custom code over the PatchGuard decryption
|
|
stub. When this is accomplished, the PatchGuard DPC's exception handler will
|
|
invoke the supplied custom code instead of the system integrity check routine.
|
|
|
|
However, this is not really all that interesting due to the fact that
|
|
PatchGuard utilizes a one-shot timer. The custom code that was substituted
|
|
for the decryption stub will never be run again. To account for this fact, it
|
|
would be prudent to place a call to queue a timer with an associated DPC
|
|
routine (pointing to the DPC routine that PatchGuard selected at boot) within
|
|
the custom code block.
|
|
|
|
At this point, it is possible to simply allow the normal exception dispatching
|
|
process to continue (i.e. to resume _C_specific_handler), after which the
|
|
custom code will be invoked instead of PatchGuard. In essence, PatchGuard has
|
|
been not only disabled, but completely subverted to call customized code under
|
|
the control of a third party driver instead of the system integrity check.
|
|
|
|
Still, the situation is less than optimal. Presently, there is still a hook
|
|
in _C_specific_handler that is there for anyone to see (and recognize that
|
|
someone has tampered with the kernel). Additionally, the driver that was used
|
|
to subvert PatchGuard in the first place is still loaded, which may also be a
|
|
tell-tale giveaway sign that someone may have done something unsavory to the
|
|
kernel.
|
|
|
|
These problems are also solvable, however. It turns out that after PatchGuard
|
|
has been subverted, it is safe to unhook from _C_specific_handler, and then
|
|
simply call back into _C_specific_handler after the hook is removed.
|
|
Furthermore, everything necessary to run the subverted system integrity check
|
|
routine could even reside within PatchGuard's own internal data structures;
|
|
for example, one could simply utilize extra space after the custom code, where
|
|
the decryption stub and PatchGuard check routine would normally reside as a
|
|
parameter block. This is especially convenient, as the custom code block is
|
|
given a pointer to itself in rcx (the first argument), and it is easy to add a
|
|
known constant value to that pointer in order to retrieve the parameter block
|
|
for the custom code. At this point, all of the code and data necessary for
|
|
the custom code that the driver has subverted PatchGuard with is located in
|
|
dynamically allocated memory. Given this, the original driver is no longer
|
|
needed and can even be unloaded (so as to further disguise the fact that any
|
|
alterations to the kernel have taken place). After the driver has been
|
|
unloaded, the only traces of the alterations that have taken place would be
|
|
the unloaded module list (easily modified), and the re-written PatchGuard
|
|
system integrity routine itself (which could easily be bolstered to be
|
|
self-decrypting (with a differing encryption key in order to make for an
|
|
extremely difficult to locate target in-memory).
|
|
|
|
The end result is that PatchGuard has been disabled, and in its place,
|
|
arbitrary custom code is periodically executed. Furthermore, no modifications
|
|
or patches to kernel code or global data are present and no suspicious drivers
|
|
(or even suspicious extraneous memory allocations) remain present in memory.
|
|
In essence, the only traces of the fact that PatchGuard has been subverted
|
|
would be visible only to someone (or something) that knows how to locate and
|
|
disable PatchGuard.
|
|
|
|
The supplied example program for subverting PatchGuard is fairly simple, and
|
|
it does not utilize all of the defensive technologies employed by PatchGuard.
|
|
For instance, it does not change the decryption key on every execution, nor
|
|
does it follow through with keeping the entire code block encrypted except
|
|
just before execution. These features could be easily added, however, and
|
|
would greatly increase the difficulty of locating the subverted PatchGuard
|
|
code in memory.
|
|
|
|
6) Future Direction of PatchGuard and ``Anti-Hack'' Systems
|
|
|
|
In the future, there are a couple of generalized approaches that Microsoft
|
|
could take to significantly strengthen PatchGuard against attack.
|
|
Specifically, these involve adding redundancy and removing single points of
|
|
failure from PatchGuard. It is often helpful to look at an anti-hack system
|
|
like PatchGuard as a critical system that one would like to keep running at
|
|
all times with minimal downtime (i.e. a network or service with
|
|
high-availability). The logical way to accomplish that goal is to locate and
|
|
eliminate single points of failure, such as by adding redundancy. In a high
|
|
availability network, one would accomplish this by adding redundant cables,
|
|
switches, and the like, such that if one component were to fail, the system as
|
|
a whole would continue to operate instead of failing entirely. With an
|
|
anti-hack system such as PatchGuard, it is helpful to add redundancy to all
|
|
critical code paths such that there is no single point where an attacker can
|
|
simply change an opcode or hook in with the end result of disabling the entire
|
|
anti-hack system.
|
|
|
|
Removing these single points of failure is critical to the longevity of an
|
|
anti-hack system. The main concept to grasp in such cases is that the
|
|
attacker will always try to seek out the easiest way to break the defenses of
|
|
the target system. All the obfuscation and encryption in the world does
|
|
little good if an attacker can simply change a jmp to a nop and prevent
|
|
elaborate encryption and anti-debugging facilities from ever getting the
|
|
chance to run. In this respect, PatchGuard is flawed in its current
|
|
implementation. There are many different single points of failure where an
|
|
attacker could inject themself at a single place and completely disrupt
|
|
PatchGuard.
|
|
|
|
One possible solution to this problem might be to ensure that there are
|
|
multiple different code paths that can lead to every point in the PatchGuard
|
|
system integrity check. The nature of the battle between anti-hack systems
|
|
and attackers relates to how easy it is to bypass the weakest link in the
|
|
anti-hack system. Until all of the weak links in the system are shored up
|
|
simultaneously, the system remains much more vulnerable to easy attack or
|
|
bypass. With this respect, PatchGuard version 2 does little to improve on the
|
|
weakest links of the system and as such there are still a vast number of ways
|
|
to bypass it. Even worse, each bypass technique is often only required to
|
|
attack one specific aspect of PatchGuard in order to disable it as a whole.
|
|
|
|
As far as PatchGuard itself is concerned, one approach that Microsoft could
|
|
take to significantly increase the resiliency and robustness of the system to
|
|
outside interference would be to merge some sort of critical system
|
|
functionality with the PatchGuard system integrity check. Such an approach
|
|
would make it difficult for a would-be attacker to simply bypass a call to
|
|
PatchGuard, as doing so would also bypass some sort of critical system
|
|
functionality that would (ideally) be required for the system to operate in
|
|
any usable capacity. At this point, the challenge for attackers then turns
|
|
into either replicating the critical system functionality that is contained
|
|
within PatchGuard, finding a way to split the critical system functionality
|
|
away from the system integrity check portions of PatchGuard, or finding a way
|
|
to evade PatchGuard's detection of kernel patching entirely. Microsoft can
|
|
make the first two points arbitrarily difficult, especially since the
|
|
knowledge of Windows internals is presumably greater inside Microsoft than
|
|
outside Microsoft. The incorporation of critical system functionality would be
|
|
theoretically easier for Microsoft to do than it would be for would-be
|
|
attackers to reliably reverse engineer and re-implement such functionality on
|
|
their own, forcing would-be attackers to take the hard route of trying to
|
|
separate PatchGuard from critical system functionality. This is where clever
|
|
use of obfuscation and anti-debug techniques would really see maximum
|
|
effectiveness, as an attacker would (optimally) have no choice other than to
|
|
step through and understand PatchGuard entirely before being able to replicate
|
|
the critical functionality contained within PatchGuard (or selectively
|
|
activate the critical functionality without activating the system integrity
|
|
check).
|
|
|
|
The latter problem (evading PatchGuard detection entirely) is likely to be a
|
|
much more difficult one to tackle, however. Techniques such as the clever use
|
|
of debug registers, TLB desynchronization, and other related attacks are
|
|
extremely difficult to detect (and typically very easy to alter to avoid
|
|
detection after a known detection scheme for such attacks is developed). In
|
|
this particular respect, Microsoft is presently at a great disadvantage.
|
|
Improving PatchGuard to avoid such evasion tactics is likely to prove both
|
|
difficult and a poor investment of time relative to how quickly attackers can
|
|
adapt and compensate for Microsoft's efforts at bolstering PatchGuard's
|
|
capabilities.
|
|
|
|
Looking towards the future, it can be expected that PatchGuard will ultimately
|
|
see the obfuscation-based defensive mechanisms currently in place substituted
|
|
with hardware-based defensive mechanisms. In particular, the author expects
|
|
that Microsoft will eventually deploy a PatchGuard version that is augmented
|
|
by the hardware-based virtualization (also known as hypervisor) support
|
|
present in recent processors (and being developed for Windows Server
|
|
``Longhorn'', code-named ``Viridian''). An implementation of PatchGuard that
|
|
is guarded by a hypervisor would be immune to being simply patched out of
|
|
existence (which eliminates some of the most significant flaws in current
|
|
versions of PatchGuard), at least as long as the hypervisor itself remains
|
|
secure and free from exploitable bugs. In a hypervisor-based system with
|
|
PatchGuard, third party drivers would not be permitted to execute with
|
|
hypervisor privileges, thus completely preventing runtime patching of
|
|
PatchGuard itself (which would be a part of the privileged hypervisor layer).
|
|
A hypervisor-based system might also be able to implement concepts such as
|
|
write-once memory that could be adapted to prevent the kernel from being
|
|
patched in the first place once it is initially loaded into memory (as opposed
|
|
to detecting patching after the fact, and bringing down the system in response
|
|
to third party drivers performing underhanded deeds).
|
|
|
|
Even with hypervisor support in-place, however, it is anticipated that there
|
|
will still be ways for third parties to alter the behavior of the kernel in
|
|
ways not completely authorized by Microsoft. For instance, as long as support
|
|
for debug registers must be retained in order for the kernel debugger to
|
|
function, it may be difficult to prevent an approach that utilizes debug
|
|
registers to modify execution context at arbitrary locations within the kernel
|
|
(at least, not without making the hypervisor completely responsible for
|
|
managing all activities relating to the processor's complement of debug
|
|
registers).
|
|
|
|
7) Conclusion
|
|
|
|
Although PatchGuard version 2 introduces significant improvements in some
|
|
areas, it still remains vulnerable to a wide variety of potential attacks.
|
|
Additionally, it is possible (though involved) to subvert PatchGuard entirely,
|
|
with the purpose of running arbitrary custom code in a difficult-to-detect
|
|
manner in the place of PatchGuard.
|
|
|
|
With these points in mind, it is perhaps time to re-evaluate whether
|
|
PatchGuard, in its current incarnation, is really worth all the trouble that
|
|
Microsoft has put into it. Although forcing the IHV and ISV world to clean
|
|
house with their kernel mode code is certainly a reasonable goal (and one
|
|
which ultimately benefits all Windows customers, no matter how certain
|
|
companies with poorly written kernel mode code [8] may care to spin the facts),
|
|
as badly written kernel mode code results in the chronic instability that
|
|
Windows is often associated with (at best), and privilege escalation and
|
|
arbitrary code execution exploits in the worst case. However, there are still
|
|
significant counterpoints to what PatchGuard represents; the fact that it may
|
|
provide a convenient way for malicious kernel mode code to hide in a very
|
|
difficult to detect manner, and that there is real innovation that is stifled
|
|
by the restrictions that PatchGuard places on the system. As an example of
|
|
the latter, consider that Microsoft's very own Virtual Server 2005 R2 SP1
|
|
(Beta) runs afoul of PatchGuard and requires a special kernel hotfix to alter
|
|
what, exactly, PatchGuard protects in order to run without bugchecking the
|
|
system with the infamous CRITICAL_STRUCTURE_CORRUPTION bugcheck made famous by
|
|
PatchGuard [3]. This alone should be taken as an indicator that there *are* in
|
|
fact legitimate uses for some of the techniques that PatchGuard prevents,
|
|
despite Microsoft's insistence to the contrary. It should also be noted that
|
|
despite Microsoft's statements that no exceptions would be made for PatchGuard
|
|
[1], they have had to make adjustments at least once for their own code to run on
|
|
PatchGuard. The conspiracy theorists among you might wonder whether Microsoft
|
|
would be so gracious as to make such exemptions for legitimate uses of
|
|
techniques blocked by PatchGuard for third party software with similar needs
|
|
as Virtual Server 2005 R2 SP1, given their pointed statements to the contrary.
|
|
|
|
As a final note relating to the objectives of PatchGuard, even with hypervisor
|
|
technology deployed (and furthermore, even with so-called immutable memory as
|
|
implemented by a hypervisor), there is little that can be done to protect
|
|
drivers from each other, as even in a hypervisor based system (where the
|
|
kernel itself is protected from drvers), interdependent drivers will still be
|
|
able to interfere with eachother so long as they co-exist in the same domain.
|
|
This is particularly problematic in Windows, given the concepts of device
|
|
stacks and device interfaces that allow drivers to directly interact with
|
|
eachother in a variety of ways. It will be difficult to ensure that drivers do
|
|
not resort to patching eachother (or modifying pool allocations instead of
|
|
patching code, in the case where immutable memory on code regions is being
|
|
enforced by a hypervisor). Depending on what the objectives of a third party
|
|
ISV attempting to bypass PatchGuard are, it may be possible to simply patch
|
|
drivers (such as Ntfs.sys or Tcpip.sys) in lieu of patching the kernel. From
|
|
this perspective, it is unlikely that Windows will ever become an environment
|
|
where kernel mode drivers are completely isolated and unable to interfere with
|
|
eachother, despite the efforts of technologies such as PatchGuard.
|
|
|
|
Microsoft has already started down a path that may eventually lead to a system
|
|
where buggy drivers will be unable to crash the system (or patch eachother),
|
|
with the advent of the User Mode Driver Framework (UMDF). It remains to be
|
|
seen whether isolated user-mode based drivers will become a viable alternative
|
|
for high performance devices (such as PCI/PCI Express as opposed to USB
|
|
devices), however, instead of simply being confined to a small subset of of
|
|
the devices that ship with a typical computer. The author expects that
|
|
whereever possible, Microsoft will attempt to move third party code outside of
|
|
sensitive areas (like the kernel) and into more contained locations (such as a
|
|
user-mode process). This is in-line with the purported goals of PatchGuard;
|
|
increasing system stability by preventing third party drivers from performing
|
|
questionable actions (or at least, questionable actions in such a way that
|
|
might bring down the system).
|
|
|
|
Bibliography
|
|
|
|
[1] Microsoft Corporation. Patching Policy for x64-Based Systems.
|
|
http://www.microsoft.com/whdc/driver/kernel/64bitpatching.mspx;
|
|
accessed December 10, 2006.
|
|
|
|
[2] skape, Skywing. Bypassing PatchGuard on Windows x64.
|
|
http://uninformed.org/index.cgi?v=3&a=3&t=sumry;
|
|
accessed December 10, 2006.
|
|
|
|
[3] Microsoft Corporation. Connect: Virtual Server 2005 R2 SP1 Beta.
|
|
https://connect.microsoft.com/site/sitehome.aspx?SiteID=151;
|
|
accessed December 28, 2006.
|
|
|
|
[4] Advanced Micro Devices, Inc. AMD 64-Bit Technology
|
|
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_overview.pdf;
|
|
accessed December 28, 2006.
|
|
|
|
[5] Microsoft Corporation. RtlVirtualUnwind.
|
|
http://msdn2.microsoft.com/en-us/library/ms680617.aspx;
|
|
accessed December 28, 2006.
|
|
|
|
[6] The PaX Team. Paging Based Non-Executable Pages.
|
|
http://pax.grsecurity.net/docs/pageexec.txt;
|
|
accessed December 30, 2006.
|
|
|
|
[7] Sherri Sparks and Jamie Butler. "SHADOW WALKER" Raising the Bar for Rootkit Detection.
|
|
http://www.blackhat.com/presentations/bh-jp-05/bh-jp-05-sparks-butler.pdf;
|
|
accessed December 30, 2006.
|
|
|
|
[8] Skywing. Anti-Virus Software Gone Wrong.
|
|
http://www.uninformed.org/?v=4&a=4&t=sumry;
|
|
accessed December 31, 2006.
|