Zines/uninformed/8.2.txt

PatchGuard Reloaded: A Brief Analysis of PatchGuard Version 3
September, 2007
Skywing
skywing@valhallalegends.com
http://www.nynaeve.net/

Abstract: Since the publication of previous bypass or circumvention techniques
for Kernel Patch Protection (otherwise known as ``PatchGuard''), Microsoft has
continued to refine their patch protection system in an attempt to foil
known bypass mechanisms.  With the release of Windows Server 2008 Beta 3,
and later a full-blown distribution of PatchGuard to Windows Vista /
Windows Server 2003 via Windows Update, Microsoft has introduced the next
generation of PatchGuard to the general public (``PatchGuard 3'').  As with
previous updates to PatchGuard, version three represents a set of
incremental changes that are designed to address perceived weaknesses and
known bypass vectors in earlier versions.  Additionally, PatchGuard 3
expands the set of kernel variables that are protected from unauthorized
modification, eliminating several mechanisms that might be used to
circumvent PatchGuard while co-existing (as opposed to disabling) it.  This
article describes some of the changes that have been made in PatchGuard 3.
This article also proposes several new techniques that can be used to
circumvent PatchGuard's defenses.  Countermeasures for these techniques are
also discussed.

1) Introduction

PatchGuard is a controversial feature of Windows x64 editions, starting with
Windows Server 2003 x64 / Windows XP x64, and continuing on with Windows Vista
x64 and Windows Server 2008 x64.  The design goals behind PatchGuard are to
prevent the kind of rampant hooking and modification of various kernel code
and data structures that has been so common on x86 versions of Windows.
Microsoft has stated that the vast majority of kernel crashes are caused by
third party drivers, and the author's experiences with Windows firmly support
this supposition.  Because accessing internal kernel data structures and
hooking kernel functions typically requires intricate synchronization with the
rest of the system in order to be performed in a completely safe fashion,
especially on multiprocessor machines, many third party drivers that perform
these sorts of dangerous tasks have historically made egregious mistakes that
have often lead to system stability or a compromise of system security.  The
latter is especially common in cases where third party programs hook
functions, such as system calls, and subsequently fail to perform sufficient
parameter validation.

Microsoft's solution to this problem is to attempt to forcibly prevent third
party code from making unauthorized modifications to internal kernel data
structures and code through technical means in addition to discouraging
developers from performing such tasks.  However, due to the nature of how the
Windows kernel (and its supporting drivers) are designed, it is not feasible
for kernel mode drivers to run at a lower effective privilege level than the
kernel itself.  This poses a problem with respect to Microsoft's goal of
blocking unauthorized kernel patches due to the fact that there is no
hardware-enforced separation between the kernel itself and third-party
drivers.  As such, said third party drivers have free reign to manipulate
kernel code and data as desired.

Although emerging technologies such as TPM and hardware-assisted
virtualization (hypervisors) may eventually provide a mechanism to deploy a
hardware-enforced boundary between certain key parts of the kernel and the
third party drivers that interact with it, such an approach is not generally
applicable to most computers sold today, given the current state of the
technology involved (with respect to both hardware and software capabilities).
Lacking a complete, hardware-enforced solution, Microsoft has turned to other
approaches to dissuade third party software from making unauthorized kernel
modifications.  Specifically, the resulting kernel patch protection mechanism
("PatchGuard") is instead based on highly obfuscated code that, while running
at the same effective privilege level as both the kernel itself and third
party drivers, is designed to be resilient against detection and/or
modification by third party drivers.  This code is responsible for
periodically checking the integrity of key kernel code and data structures and
will bring down the system if such modifications are detected.  By virtue of
the fact that attempting to blithely patch the kernel as was once possible on
Windows x86 editions, attempting to perform the same operations will result in
a system crash on x64 versions of Windows.  As such, third party drivers are
effectively preventing from making such modifications on a large-scale basis
with respect to code deployed on customer systems.

However, like all systems that are founded upon the principal of security
through obscurity, PatchGuard has inherent weaknesses.  These weaknesses can
be exploited by third party drivers to either disable PatchGuard entirely or
circumvent its checks altogether while peacefully co-existing with PatchGuard.
Microsoft is fully aware of these deficiencies with respect to the fundamental
approach taken by PatchGuard and has resorted to periodically updating
PatchGuard in such a way as to block known public bypass techniques.  The net
result is that Microsoft gives the impression of a ``moving target'' to any
ISV that would defy Microsoft's wishes with respect to circumventing
PatchGuard.  This helps to show that any code designed to stop or disable
PatchGuard may become invalidated at some point in the future such as when
Microsoft releases a new update for PatchGuard.  This has resulted in a small
arms race with code to circumvent PatchGuard being written by third parties,
and Microsoft responding by developing and deploying countermeasures in the
form of an updated version of PatchGuard that is not susceptible to these
bypass techniques.  This cycle has continued through several iterations
already; in fact, PatchGuard is now being deployed to the general public in
its third iteration.

2) Protection Improvements

PatchGuard 3 implements several incremental improvements designed to protect
PatchGuard from third party code attempting to disable it as compared to
PatchGuard 2.  The majority of the alterations to PatchGuard's self-defense
logic appear to be direct responses to previously published, publicly-known
bypass techniques, rather than general improvements meant to make PatchGuard 3
more resilient to analysis and attack.  In this vein, while the alterations to
PatchGuard 3 (over PatchGuard 2) are effective at disabling most
previously-published bypass mechanisms that the author is aware of, it is not
exceedingly difficult to alter many previous attack mechanisms to be effective
against PatchGuard 3.  Many of the protection systems that were implemented in
PatchGuard 2 are still present in PatchGuard 3 in some form or another, though
some of them have been altered to resist previously-published attacks.

This chapter will describe a number of specific improvements that have been
made.

2.1) Multiple Concurrent PatchGuard Check Contexts

In previous PatchGuard releases, there existed a single PatchGuard check
context that would periodically be used to verify the integrity of protected
regions.  Some bypass techniques relied on the fact that there existed only
one PatchGuard context by virtue of disabling any invasive kernel patching
that would be required to ``catch PatchGuard in the act'' after locating
PatchGuard.  PatchGuard 3 improves upon this by creating at least one
PatchGuard context if PatchGuard is enabled, with a probability of a second
context being initialized at system boot time This is randomized based on the
processor time stamp counter, as all other PatchGuard randomization is done.
Both PatchGuard check contexts, which include all of the data used by
PatchGuard to check system integrity (including the self-decrypting check
routine in non-paged pool memory), operate completely independently if two
contexts happen to be used for a particular system boot.

There are several advantages to randomly creating more than one check context.
First of all, because the second context is not always created, an element of
uncertainty is (theoretically) introduced into the testing and development
process for PatchGuard bypass techniques, as it is possible that at first
glance, an individual that is researching PatchGuard 3 might not notice that
there is a chance to create more than one context.  This may result in lost
time during the debugging process, as some bypass techniques are affected by
the number of active contexts.  For example, the original bypass technique
described by the author for PatchGuard 2 [1] effectively turned itself off
after the first positive indication that PatchGuard was caught (although in
this particular instance, the PatchGuard-catching hooks could have allowed to
remain in place afterwards).

A better example of bypass techniques that might be affected by this sort of
scheme are those that rely on searching system pool memory for a sign of
PatchGuard.  For example, a theoretical bypass scheme that operates by
pro-actively locating the PatchGuard context in non-paged pool and disabling
it somehow (perhaps by rewriting the self-decrypting code stub to expand into
a no-operation function) might run afoul of this approach randomly during
testing if it were not designed to re-try a pool memory scan after a positive
hit on PatchGuard.  It also eliminates the degree of confidence that such
memory scan approaches provide, as previously, if one had a way to locate the
PatchGuard context in non-paged pool memory, one would either know for certain
that PatchGuard had in fact been disabled by getting a single hit (which could
be taken as an indication that it would now be safe to perform actions
blockedg by PatchGuard).  With multiple check contexts having a probability to
run, it is no longer possible for a bypass technique to have logic along the
lines of ``if a PatchGuard context has been located and disabled, then it is
safe to continue'', because there may exist a non-constant number of contexts
in the wild.

2.2) Filtering of Exception Codes Used to Trigger PatchGuard Execution

Like PatchGuard 2, and PatchGuard 1 before it, the third iteration of
PatchGuard is primarily executed through an unhandled exception in a DPC
routine which, through the use of a series of structured exception handlers,
eventually results in the self-decrypting PatchGuard stub being called in
non-paged pool memory (based off of the DPC arguments).  This presented itself
as a liability, as evidenced by the previous article [2] published on
Uninformed on the subject of disabling PatchGuard (release 2).  The problem
with using SEH to trigger execution is that there are a number of points in
the SEH dispatching mechanism that can easily be modified by an external
caller in order to gain execution after an exception is raised, but before a
registered exception handler itself might be called.

Previous techniques exploited this weakness by positioning themselves in after
the access violation exception raised when a PatchGuard-repurposed DPC routine
dereferenced a specially-crafted invalid pointer argument but before the SEH
logic that invokes the PatchGuard check context in response to the access
violation exception.  Specifically, the operating system exported routine used
by the Microsoft C/C++ compiler for all compiler-generated SEH frames,
_C_specific_handler, was targeted by bypass attempts described in the
aforementioned articles.  As the SEH frame responsible for running PatchGuard
appears to have been written in C for PatchGuard releases 1 and 2,
_C_specific_handler would be called before the user-supplied SEH logic which
would be responsible for executing the PatchGuard integrity check logic
contained within the current PatchGuard context.  At this point, a bypass
technique need only abort the execution of the PatchGuard check routine and
cleanly extricate itself from the call stack to a known-good location in order
to disable PatchGuard.

However, in order for such a bypass mechanism to properly function, one would
need to ensure that the particular exception being examined by
_C_specific_handler is in fact PatchGuard and not a legitimate kernel mode
exception.  Applying a PatchGuard-style bypass to the latter case would be
disastrous and almost certainly result in the system crashing or being
corrupted immediately after the fact.  Given this, positively identifying an
exception from the exception dispatcher interception point is key to any
bypass technique built upon exception dispatcher redirection.  While the
previous two PatchGuard releases made identifying PatchGuard a trivial task.
In both cases, a special form of invalid address, a ``non-canonical address'',
is dereferenced to trigger the access violation that ultimately results in
PatchGuard's check context being executed A non-canonical address is an
address that does not fall within the subset of a 64-bit address space
presented by modern x64 processors.

The advantage of using a non-canonical address is clear when one examines the
PatchGuard execution environment for a moment.  In Windows kernel mode
programming, it is not generally possible to blindly dereference a bogus
kernel mode pointer.  This often results in a sequence of events that bring
down the system, depending on where the dereferenced location is.  A
non-canonical address is a special (undocumented) exception to this rule, as
the processor reports the exception via a general protection fault and not the
typical page fault mechanism.  In this case, the operating system reports the
exception as an access violation related to an access of the highest kernel
address (0xFFFFFFFFFFFFFFFF).  This distinct signature can be used to locate
and disable PatchGuard in a relatively safe fashion, as bogus kernel mode
addresses should never make it to SEH dispatching (in kernel mode) unless the
system is about to crash due to a fatal driver or kernel bug (PatchGuard being
a special case).  Thus, it was previously possible to positively identify
PatchGuard by looking for an access violation that referenced
0xFFFFFFFFFFFFFFFF.

PatchGuard 3 improves the situation somewhat by performing some pre-filtering
of the exception data through an exception handler written in assembly (which
thus does not invoke _C_specific_handler) before the _C_specific_handler based logic
that actually invokes the PatchGuard check routine is executed.  Specifically,
the pre-filtering exception handler, whose code is given below, alters the
exception code to take on a random value which overlaps with many valid kernel
mode exceptions.  For example, some status codes that are applicable to the
file system space are used, such as STATUS_INSUFFICIENT_RESOURCES,
STATUS_DISK_FULL, STATUS_CANT_WAIT.  Additionally, the exception address is
altered as well (in some cases even set to be pointing into the middle of an
instruction), and the dereferenced address (the second exception parameter for
access violations) is also set to a randomized value.  After these alterations
are made, the assembly-language exception handler passes control on to the
_C_specific_handler based exception handler, which invokes PatchGuard.  Annotated
disassembly for one of the assembly-language pre-filter exception handlers is
provided below:

;
; EXCEPTION_DISPOSITION
; KiCustomAccessHandler8 (
; /* rcx */ IN PEXCEPTION_RECORD               ExceptionRecord,
; /* rdx */ IN ULONG64                         EstablisherFrame,
; /* r8  */ IN OUT PCONTEXT                    ContextRecord,
; /* r9  */ IN OUT struct _DISPATCHER_CONTEXT* DispatcherContext
; );
KiCustomAccessHandler8 proc near
  test    [rcx+_EXCEPTION_RECORD.ExceptionFlags], 66h
loc_14009B4C7:
  jnz     short retpoint
  rdtsc
; Randomize ExceptionInformation[ 1 ]
; ( This is the "referenced address" for
;   an access violation exception.)
;
; ( Note that rax is not set to any
;   specific defined value in this
;   context.  It depends upon the value
;   that RtlpExecututeHandlerForException
;   and by extension RtlDispatchException
;   last set rax to. )
  mov     [rcx+(_EXCEPTION_RECORD.ExceptionInformation+8)], rax
  xor     [rcx+(_EXCEPTION_RECORD.ExceptionInformation+8)], rdx
  shr     eax, 5
  and     eax, 70h
  sub     [r8+98h], rax
  and     edx, 7Fh
  or      edx, 0C0000000h
; Set ExceptionCode to a random value.  The code
; always has 0xC0000000 set, and the lowest byte
; is always masked with 7F.  This often results
; in an exception code that appears like a
; legitimate exception code.)
  mov     [rcx+_EXCEPTION_RECORD.ExceptionCode], edx
  lea     rax, loc_14009B4C7+1
; Set ExceptionAddress to a bogus value.  In this case,
; it is set to in the middle of an instruction.  This
; may interfere with attempts to unwind successfully from
 the exception.
  mov     [rcx+_EXCEPTION_RECORD.ExceptionAddress], rax
; Set Context->Rip to the same
; bogus exception address value.
  mov     [r8+0F8h], rax
  and     qword ptr [r8+88h], 0
retpoint:
  mov     eax, 1
  retn

KiCustomAccessHandler8 endp

As a direct result of scrubbing the exception and context records by the
assembly-language exception routine, it is no longer possible to use the old
mechanism of looking for an access violation referencing 0xFFFFFFFFFFFFFFFF in
order to differentiate a PatchGuard exception from the many legitimate kernel
mode exceptions.  In other words, PatchGuard attempts to hide in plain sight
amongst the normal background noise of kernel mode exceptions, the vast
majority of which exist inside filesystem-related code.

2.3) Executing PatchGuard Without SEH

One recurring theme that has continued to remain a staple for PatchGuard since
its inception is the use of structured exception handling to obfuscate the
calls to PatchGuard.  The intention here is to use the many differences of SEH
between x64 and x86, and the lack of disassembler support for x64 SEH to make
it difficult to understand what is happening when calls to PatchGuard are
being made.  Ironically, this use of x64 SEH as an obfuscation mechanism has
been a catalyst for much of the author's research [2] into Windows x64 SEH.
Today, it is the author's opinion that x64 exception handling is now publicly
documented to an extent that is comparable (or even exceeds) that available
for x86 SEH.

Although x64 SEH may have been useful as an obfuscation technology initially,
it had clearly worked its way up to a major liability after PatchGuard 2 had
been released.  This is due to the fact that SEH-related aspects of PatchGuard
had been successfully used to defeat PatchGuard on multiple occasions.  With
the advent of PatchGuard 3, the authors of PatchGuard siezed the opportunity
to extricate themselves in some respect from the liability that x64 SEH had
become.

PatchGuard 3 introduces a special mode of operation that allows it to function
without using SEH.  This is a significant change (and improvement) with
respect to how PatchGuard has traditionally operated.  It eliminates a major
class of single points of failure in that the exception dispatching path is
particularly vulnerable to external interference in terms of third party
drivers intercepting SEH dispatching before control is transferred to actual
exception handlers.  The SEH-less mode of PatchGuard 3 operates by copying a
small section of code into non-paged pool memory (as part of a PatchGuard
context block).  This code is then referenced by a timer object's
DeferredRoutine at the non-paged pool location in question.  The code referred
to by the timer object is essentially a stripped down version of what happens
when any of the re-purposed DPC routines are invoked by PatchGuard: it sets up
a call to the first stage self-decrypting stub that ultimately calls the
system check routine.

By completely eliminating SEH as a launch vector for PatchGuard, many bypass
techniques that hinged on being able to catch PatchGuard in the SEH
dispatching code path are completely invalidated.  In an example of defense in
depth in terms of software protection systems, the old, SEH-based system is
still retained (with the previously mentioned modifications), such that a
would-be attacker now has multiple isolated launch vectors that he or she must
deal with in order to block PatchGuard from executing.  Annotated disassembly
of the direct call routine that is copied to non-paged pool and invoked
without SEH is presented below:

KiTimerDispatch proc near
  pushf
  sub     rsp, 20h
  mov     eax, [rsp+28h+var_8]
  xor     r9d, r9d
  xor     r8d, r8d
  mov     [rsp+28h+arg_0], rax
; [rcx+40] -> PatchGuard Decryption Key
  mov     rax, [rcx+40h]
  mov     rcx, 0FFFFF80000000000h
  xor     rax, rdx
; Form a valid address for the PatchGuard context block by
; xoring the decryption key with the DeferredContext
; argument.
  or      rax, rcx
; Set the initial code for the stage 1 self-decrypting stub.
  mov     rcx, 8513148113148F0h
  mov     rdx, [rax]
  mov     dword ptr [rax], 113148F0h
  xor     rdx, rcx
  mov     rcx, rax
; Call the stage 1 self-decrypting stub.
  call    rax
  add     rsp, 20h
  pop     rcx
  retn
KiTimerDispatch endp

2.4) Randomized Call Frames in Repurposed DPC Routine Exception Paths

One of the bypass vectors proposed for PatchGuard 2 was to intercept execution
at _C_specific_handler, detect PatchGuard, and resume execution at the return
point of the PatchGuard DPC (i.e. inside the timer or DPC dispatcher).  This
is trivially possible due to the extensive unwind metadata present on Windows
x64 combined with the fact that a DPC that has been re-purposed by PatchGuard
does no useful work (other than invoking PatchGuard) and has no meaningful
effect on any out parameters or return value.

In order to counteract this weakness, PatchGuard 3 introduces a random number
of function calls when a re-purposed DPC is called, but before any exception
is triggered.  The intent with this randomization of the call frame stack is
to invalidate the approach of always unwinding one level deep in order to
effect a return from the DPC routine in question.  Because there are a random
number of call frames between the point at which an exception is raised and
the start of the PatchGuard DPC routine, and the fact that the PatchGuard DPC
routines are not exported, it is more difficult to safely return out of a
PatchGuard DPC routine from the anywhere in the SEH dispatching code path.

An example of the call frame randomization code is provided below (in this
case, ecx is initialized to small, random number that denotes the number of
calls to make).  There are a number of routines in the form of
KiCustomRecurseRoutineN where N is [0..9], each identical.

KiCustomRecurseRoutine4 proc near
  sub     rsp, 28h
  dec     ecx
  jz      short retpoint
  call    KiCustomRecurseRoutine5
retpoint:
  mov     eax, [rdx]
  add     rsp, 28h
  retn
KiCustomRecurseRoutine4 endp

Although unwinds can still be performed, an attacker would need to be able to
locate the actual return address of the PatchGuard DPC routine which might
involve differentiating between the bogus KiCustomRecurseRoutine calls and the
actual call into the DPC routine itself.

3) Additional Protection Mechanisms

PatchGuard 3 and PatchGuard 2 both share some additional protection mechanisms
that have not been previously described. This chapter includes a description
of these protection mechanisms.

3.1) Timer List Obfuscation

PatchGuard 2 and PatchGuard 3 employ an obfuscation scheme that is used to
obfuscate timer and DPC object pointers in the timer list.  This obfuscation
scheme hinges around two special kernel variables, KiWaitAlways and
KiWaitNever that represent two random obfuscation keys that are calculated at
boot time.  These obfuscation keys are used to encode various pointers (such
as links to DPC objects in a KTIMER object residing in the kernel timer list)
that are intended to be protected from outside interference.  For example, the
following algorithm is used to decode the KDPC link in a KTIMER object when a
timer DPC is going to be executed at expiration:

ULONGLONG Deobfuscated;
PKDPC     RealDpc;

Deobfuscated = Timer->Dpc ^ KiWaitNever;
Deobfuscated = _rotl64(Deobfuscated, (UCHAR)KiWaitNever);
Deobfuscated = Deobfuscated ^ Timer;
Deobfuscated = _byteswap_uint64(Deobfuscated);
Deobfuscated = Deobfuscated ^ KiWaitAlways;

RealDpc      = (PKDPC)Deobfuscated;

By virtue of being non-exported kernel variables, the original intention of
such a scheme was to make it difficult for third party drivers to easily
interfere with the timer list or certain other protected pointers.  However,
the algorithm itself is fairly easy to understand once one locates code that
references it (such as most any timer-related code in the kernel), which
simply leaves detecting the values of KiWaitAlways and KiWaitNever at runtime
as the only remaining protection for the timer list to DPC object obfuscation.

Ironically, the kernel debugger extension !kdexts.timer implements the
decoding algorithm (in kdexts!KiDecodePointer) so that a valid timer list can
be presented to the user if the timer display command is invoked.  Because the
kernel debugger has access to PDB symbols for the kernel, it can trivially
locate KiWaitAlways and KiWaitNever.

3.2) Anti-Debugging Code at PatchGuard Initialization Time

As with PatchGuard 2, PatchGuard 3 includes a sizable amount of anti-debugging
code at runtime that is intended to frustrate attempts to step through the
PatchGuard initialization routines with a debugger.  Most of this code is
based upon checking if a debugger is present while the PatchGuard
initialization routines are executing (which should not typically occur as the
PatchGuard initializtion routines are only called if a debugger is not
attached), and if a debugger is so detected, disable interrupts and entering a
spin loop so as to unrecoverably freeze the system.

Although this anti-debugging code may appear intimidating at first, disabling
them is only a matter of locating all references to KdDebuggerNotPresent
within the PatchGuard initialization routine and patching out the checks into
the debugger.  For example, the author used the following set of commands in
the debugger at initialization time to disable the anti-debugging checks for
Windows Vista x64 SP0, kernel version 6.0.6000.16514:

bp nt!KeInitAmd64SpecificState + 12 "r @edx = 1 ; r @eax = 1 ; g"
bp nt!KiFilterFiberContext
eb nt!KiFilterFiberContext+0x20 eb
eb nt!KiFilterFiberContext+0x19a eb

eb fffff800`01c63d22 eb
eb fffff800`01c64686 eb
eb fffff800`01c652be eb
eb fffff800`01c65334 eb
eb fffff800`01c65880 eb
eb fffff800`01c65a65 eb
eb fffff800`01c67479 eb
eb fffff800`01c68798 eb
eb fffff800`01c6a940 eb
eb fffff800`01c6b7a9 90 90
eb fffff800`01c6b7dd eb
eb fffff800`01c6bad9 eb
eb fffff800`01c6d0e7 eb
eb fffff800`01c6d2f6 eb
eb fffff800`01c6d650 eb
eb fffff800`01c65c3a 90 90 90 90 90 90
eb fffff800`01c690b1 90 90 90 90 90 90

3.3) KeBugCheckEx Protection

One of the first bypass mechanisms proposed for PatchGuard 1 was to hook the
code responsible for bugchecking the system[4].  From there, an
attacker would simply resume normal system execution.

There are several defensive mechanisms in place to prevent this.  In the the
current version of PatchGuard, the entire contents of the thread stack are
filled with zeros, making it difficult to resume execution of whichever thread
was responsible for calling into PatchGuard.  Furthermore, PatchGuard appears
to make a copy of KeBugCheckEx at system initialization time, and copy this
version over the actual code residing within the kernel at runtime just before
bringing down the system in a bug check.  This is clearly visible by making a
modification to KeBugCheckEx in the debugger just as one enters the PatchGuard
check context, and then setting a breakpoint on the internal function in the
PatchGuard context to call KeBugCheckEx after clearing the stack and all
registers.  If one then examines KeBugCheckEx, any modifications that have
been made will have vanished.

Additionally, PatchGuard appears to disable DbgPrint (patching it out with a
"ret" opcode) before calling KeBugCheckEx.  This may have been a (failed)
attempt to prevent easy access to execution within KeBugCheckEx without
actually patching KeBugCheckEx itself, which would circumvent the
aformentioned protection on modifications to the bugcheck code itself.
(KeBugCheckEx ordinarily utilizes DbgPrintEx to display a banner to the
debugger when a bug check occurs.  However, because PatchGuard only patches
DbgPrint, there is no little to no effect in terms of what ends up happening
when the bug check finally does happen.)

This code can be seen in the PatchGuard check routine, just before a call to
the KeBugCheckEx wrapper is made.  The pointer to DbgPrint is established
during PatchGuard initialization at boot time.

mov     rax, [rbx+PATCHGUARD_CONTEXT.DbgPrint]
mov     byte ptr [rax], 0C3h ; '+' ; ret

3.4) Two-Stage Code Deobfuscation

One of the more interesting defensive features of PatchGuard 2 and PatchGuard
3 is the mechanism by which it obfuscates the PatchGuard check context, or the
code and data necessary to verify system integrity.  PatchGuard contexts are
obfuscated such that they are completely randomized in-memory while inactive,
and change their location and obfuscation keys (and thus contents) each time
the context is invoked to check system integrity.

The decryption phase of PatchGuard is split into two stages.  The first stage
is essentially a small stub that remains completely obfuscated in-memory until
just before it is called.  The caller overwrites the first instruction in the
stub that is called with a "lock xor qword ptr [rcx], rdx" instruction.  The
arguments to the stub are the address of the stub itself (in rcx), and the
decryption key (in rdx).  Thus, the first instruction now modifies itself (and
more importantly the subsequent instruction, as each instruction is 4 bytes
long but modifies 8 bytes of opcode bytes), which results in being another xor
instruction.  A small series of these xor instructions continues until the
second stage of the decoding stub is completely decoded.

At this point, the second stage of the decoding stub is plaintext and may now
execute.  The second stage consists of a loop of xor operations starting at
the end of the PatchGuard context and moving backward until the entire check
routine is decoded.  Additionally, the decryption key is shifted each xor
round during the second stage decoding process.

After the second stage decoding loop is complete, control is transferred to
the now-plaintext integrity check routine (all of the supporting data, such as
critical function pointers into the kernel, will also have been translated
into plaintext at this point by the second stage decoding loop).

Source code to a basic program to decrypt a PatchGuard memoy context is
included with the article.  The program expects to be supplied with a file
containing "dq" logs from the kernel debugger that cover the entire memory
context, along with the decryption key (at KDPC + 0x40) and
KDPC->DeferredContext values.

4.5) Code Patching Support

Given PatchGuard's penchant for blocking attempts to patch the kernel, one
would think that all kernel code is essentially expected to be fixed in stone
at boot time.  However, this is not really the case.  There are a number of
approved kernel patches that PatchGuard supports.  For example, several
functions (such as SwapContext) can be patched in approved ways if hypervisor
support is enabled.  In the case of SwapContext, for instance, a runtime patch
is made to redirect execution to EnlightenedSwapContext through a jump
instruction being written to the start of the routine.  PatchGuard appears to
detect and permits patches to these functions through special exemptions (one
can observe the address of functions such as SwapContext being stored in the
PatchGuard context at initialization time, presumed to be for such a purpose).

The code responsible for checking the integrity of the SwapContext patch is
provided below.  Because the check ensures that a branch can only occur to
EnlightenedSwapContext, it would be difficult to utilize this code to perform
an arbitrary patch at SwapContext.

cmp     rdi, [rbx+PATCHGUARD_CONTEXT.SwapContext]
jnz     short NotSwapContextExemption
cmp     byte ptr [rdi], 0EBh ; 'd' ; backward jmps (short)
jnz     short NotSwapContextExemption
cmp     byte ptr [rdi+1], 0F9h ; '·'
jnz     short NotSwapContextExemption
cmp     byte ptr [rdi-5], 0E9h ; 'T' ; jmp (long)
jnz     short NotSwapContextExemption
mov     rcx, [rbx+PATCHGUARD_CONTEXT.EnlightenedSwapContext]
movsxd  rax, dword ptr [rdi-4]
sub     rcx, rdi
cmp     rax, rcx
jz      short BadSwapContextHook

There also exists a second set of patches that PatchGuard must allow for
compatibility with older processors.  Very early releases of x64 processors by
Intel did not implement the prefetch instruction, and so the kernel has
support for detecting an illegal opcode fault on a prefetch instruction, and
reacting by patching out the prefetch opcode on-the-fly.  However, this sort
of on-the-fly patching is not normally permitted by PatchGuard (for obvious
reasons), at least not without special support.  During initialization,
PatchGuard generates some code that executes a prefetch operation, and then
checks whether the the count of patched prefix instructions was incremented
after executing the patch code.  Assuming that the processor is an older model
without prefetch support, then a special exemption (the "prefetch whitelist")
is activated the exempts a list of RVAs from the image base from PatchGuard's
checks.  This list of RVAs is stored in a binary resource appended to
ntoskrnl.exe (named "PREFETCHWLIST").

The code for detecting if the prefetch exemption should be enabled at boot
time is as follows (the result of the check is, for Windows Server 2008 Beta
3, stored at offset 2B1 into the PatchGuard context):

call    KeGetPrcb
mov     ecx, 2
cmp     [rax+63Dh], cl  ; Prcb->CpuVendor
mov     [rsp+0EC8h+var_D48], rax
jnz     short SkipEnablePrefetchPatchExemption
lea     rdx, [rsi+214h] ; PrefetchRoutineCode
mov     dword ptr [rdx], 0C3090D0Fh ; prefetch [rcx] ; ret
mov     ebx, cs:KiOpPrefetchPatchCount
lea     rcx, [rsp+0EC8h+arg_18]
call    rdx
mov     ecx, cs:KiOpPrefetchPatchCount
cmp     ebx, ecx
jz      short SkipEnablePrefetchPatchExemption

mov     [rsi+2B1h], dil ; EnablePrefetchPatchExemption

SkipEnablePrefetchPatchExemption:
;
; Initialization continues ...
;
mov     eax, 100000h

4) Bypass Mechanisms and Countermeasures

Like PatchGuard 2, it would be folly to state that PatchGuard 3 is
invulnerable to assault by third party driver code intent on performing
operations blocked by PatchGuard.  There are many possible attacks for the new
defenses in PatchGuard 3 (as well as several possible countermeasures that
Microsoft could take in order to break the proposed bypass mechanisms in a
future PatchGuard iteration).  This article will describe specific attacks
that is capable of defeating PatchGuard 3.

4.1) Hybrid Exception Interception and Memory Searching

As PatchGuard 3 utilizes completely randomized (self-decrypting) blocks of
code and data for its constituent PatchGuard contexts in the SEH execution
case, it is not generally possible to trivially locate and disable PatchGuard
contexts through a non-paged pool scan.  Additionally, due to PatchGuard 3's
break on relying upon SEH to invoke PatchGuard in all cases, it is also not
generally possible to disable PatchGuard 3 reliably via interception of the
SEH dispatching code path.

While these defenses do complement one another, there still exists weaknesses
that can be exploited by a third party.  Specifically, when PatchGuard is
running through a re-purposed DPC routine that is invoked via SEH, it is
vulnerable in that the SEH dispatching code path can be intercepted to locate
(and disable) PatchGuard just before it is executed.  Furthermore, in the case
where PatchGuard runs without any SEH obfuscation, it is vulnerable to a
memory search, as there is (necessarily) some static code placed in non-paged
pool memory which makes the translation between the DPC function calling
convention and the PatchGuard stage 1 decryption routine's calling convention.

By combining a memory search approach with the previously described SEH
interception approach, it is possible to attack both launch vectors of
PatchGuard simultaneously, with the effect of disabling it no matter which
vector(s) are used in a particular boot.

However, there are still some sticking points that need to be resolved in the
SEH interception case.  As previously mentioned, the SEH-obfuscation-based
launch vector was significantly improved over PatchGuard 2, with obfuscation
of the exception information and randomization of the call stack from the
point of view of the exception dispatcher logic itself.  These obstacles must
be overcome in order to successfully mount an attack using this approach
against PatchGuard 3.

The first problem relating to the obfuscation and randomization of the
exception information turns out to not be the roadblock that one might think
at first glance.  There are some weaknesses of the obfuscation logic that
allow the true colors of the exception to show through if one is clever about
examining the information available at the point of _C_specific_handler.
Furthermore, it is also possible to hook at a lower level than
_C_specific_handler, such as KiGeneralProtectionFault (easily located by
examining the IDT), which would get one in before the assembly-language
exception handler logic has a chance to fudge the exception information.

Although the KiGeneralProtectionFault vector is easier to implement in that it
completely bypasses one of the new defensive mechanisms with respect to the
SEH-related PatchGuard execution code path, it is again still possible to
attack PatchGuard using _C_specific_handler by relying upon information
leakage when _C_specific_handler is called.  Specifically, all exceptions
altered by PatchGuard originate within the confines of the kernel itself, all
of the exceptions have two parameters (most of the "legitimate" versions of
exceptions like STATUS_INSUFFICIENT_RESOURCES always have zero parameters,
because they originate from within RtlRaiseStatus which never stores any
exception parameters in the exception record), and somewhere in the call stack
the kernel routine responsible for dispatching DPCs or timer DPCs is going to
be present.

By combining these facts, it is possible to make a highly accurate
determination as to whether an exception is caused by PatchGuard.  The latter
piece of information (checking whether the routine responsible for calling the
DPC or timer DPC is in the call stack) also proves valuable when one must
later counteract the second defense added to the SEH code path, that is, the
randomization of the call stack.

In order to determine whether the DPC or timer DPC dispatcher is in a given
call stack, it is first necessary to locate it in the kernel image.  There are
some complications here.  First of all, the timer DPC dispatcher routine has
three call instructions that can call a timer DPC, not all of which are
readily triggerable.  Additionally, neither the timer DPC dispatcher or the
DPC dispatcher are exported.

However, while it is not possible to simply ask for the addresses of those two
routines, it is possible to find them programmatically by requesting that a
DPC and a timer DPC be executed through the documented APIs for DPCs and timer
DPCs.  From within the DPC or timer DPC routine, it is then possible to locate
the return address via the use of the ReturnAddress() compiler intrinsic.
This works because the return address will be guaranteed to reside within the
DPC or timer DPC dispatcher.  Alternatively, an assembly language routine
could be written that simply examines the current pointer at [rsp] at the time
of the call.

This still leaves a problem in the timer DPC dispatcher case, as there are
three call instructions, and it is not easy to observe calls from all three
call sites within the timer DPC dispatcher on-demand, since it is necessary to
programmatically find the return points at runtime.  However, once again, the
very same metadata that is critical to x64 SEH support dooms PatchGuard with
respect to this approach, as it is possible to go from an arbitrary
instruction in the middle of any function to the start of that function, by
following chained unwind metadata until an unwind metadata block is reached
that has no parent [4].  This top-level unwind metadata block has a reference
to the first instruction in the function.  Now that it is possible to locate
the start of a function from any arbitrary valid instruction location within
that function, it becomes trivial to determine if two addresses reside in the
same function; to do this, one must only follow the unwind metadata chain for
both addresses, and then check to see whether both top-level unwind metadata
blocks refer to the same function.  With this technique, combined with the
ability to locate at least one call site within the timer DPC dispatcher, it
again becomes possible to identify the timer DPC dispatcher, as no matter
which call site is used, it will be guaranteed that the call site resides
within the timer DPC dispatcher routine KiTimerExpiration.  By comparing
top-level unwind metadata blocks, it becomes possible to authoritatively
discern whether any arbitrary instruction resides within the timer DPC
dispatcher or not.

It is also possible to bypass the alterations to the exception (and
instruction pointer) addresses that KiCustomAccessHandler (the
assembly-language "first chance" exception handler routines for the repurposed
DPC routines) makes by performing a stack trace from the _C_specific_handler
itself instead of relying on the context record or exception handler
information.  This is because the call stack is conveyed as if the faulting
instruction in the repurposed DPC call stack was the site of a call to
KiGeneralProtectionFault.  As a result, it is possible to substitute the
current context for the context presented to _C_specific_handler for unwind
purposes.  This also provides a layer of defense against Microsoft altering
other registers in the exception handler context in future PatchGuard
revisions, which could cause manual unwinds to return incorrect register
values, resulting in system crashes after an unwind intended to effect a hard
return out of the re-purposed DPC routine.

Furthermore, by clever usage of this mechanism for determining whether an
address resides within a particular function, it is also now possible to
determine the real return address for any given re-purposed DPC routine.
Specifically, by checking whether each address in the call stack as of
_C_specific_handler is within either the DPC dispatcher or the
timer DPC dispatcher, one can determine whether a given call frame corresponds
to the call site that called the re-purposed DPC routine or not, irrespective
of any random amount of bogus function calls that may be layered on top of the
re-purposed DPC.  This in turn defeats the remaining improvement to the SEH
PatchGuard code path, as it once again becomes possible to cleanly unwind from
any arbitrary point in the PatchGuard exception callstack.

Through the combination of the ability to either circumvent entirely or "see
through" the deception that KiCustomAccessHandler creates over the exception
information passed to _C_specific_handler, and the ability to
recover the correct return address of a repurposed DPC routine, it now becomes
possible to disable the SEH control flow path of PatchGuard 3.  This leaves
the remaining problem of locating the non-SEH control flow path of PatchGuard
in non-paged pool memory as the last piece of the puzzle with respect to this
method of disabling PatchGuard.  However, locating the trampoline routine that
adapts a DPC routine call to a PatchGuard stage 1 decryption stub call is
trivial, as the adapter trampoline is static and contains a very recognizable
signature in terms of the constants written to the beginning of the decryption
stub.  In order to disable the trampoline routine, it is enough to simply
patch it with a "ret" instruction (effectively the same thing as the SEH
bypass technique, but as implemented in code instead of a virtual unwind).

The source code to a working implementation of the hybrid exception
interception and memory searching bypass technique for PatchGuard 3 is
included with the article.

Although this approach is successful in disabling the current iteration of
PatchGuard 3, it is not without its weaknesses.  Microsoft could, for
instance, disable this technique via altering the SEH-less PatchGuard
DPC-to-decryption-stub adapter to not be static (i.e. randomization of the
code placed into non-paged pool at runtime).  There are also a number of
assumptions of the SEH-based approach that could be invalidated by Microsoft
in a future PatchGuard release.  However, in keeping with the fact that it is
possible to gain control flow at a lower level than the exception dispatcher
path itself (i.e. patching KiGeneralProtectionFault), the author feels that it
would be better to focus on removing relevant information before any exception
handlers (assembler or C-language) are called instead of after the defining
moment (in other words, the exception) occurs, as it is the exception that
presents the first easily-accessible interception point to an outside
attacker.

4.2) Timer DPC Dispatcher and DPC Dispatching

Although PatchGuard 3 eliminates SEH as a single point of failure with respect
to executing the system integrity checks, the timer and DPC dispatchers
continue to remain attractive targets.  One simple bypass mechanism is to
locate the call sites in both routines (such as by recording the addresses of
both dispatcher routines as described in bypass technique 1, and then
performing disassembly to locate and patch all call sites.  At each call site,
it is possible to detect that PatchGuard is being executed by looking for
either a non-canonical DeferredContext parameter value or a DeferredRoutine
that resides within the non-paged pool.  (In PatchGuard 3, implementing the
former check alone proves sufficient, as for the ease of the implementation of
PatchGuard 3, both the repurposed DPC routines and the non-SEH-based control
path use compatible calling conventions, which stipulate a non-canonical
obfuscated pointer value as the DeferredContext parameter.)

The main disadvantage of this approach involves inherent difficulties in
performing arbitrary code patching in x64 (specifically, the large size of any
code patch and the large number of now relatively common
instruction-pointer-relative instructions).  However, given that this is a
difficulty that impacts any code patching on x64, the author feels that it
should not be considered a significant problem for  a determined attacker.  In
fact, Microsoft Research's very own Detours implements a code patching system
for x64[5], illustrating that code patching on x64 in general
is not a task that should be considered insurmountable by any means.

Because the timer and DPC dispatchers remain relatively unprotected targets
that have not been involved in public bypass source code that has been
released to date, the author would recommend bolstering the defenses of the
timer and DPC dispatcher for the next PatchGuard release, as the two routines
continue to represent an attractive single point of failure.  Adding a third
PatchGuard execution mechanism that does not involve traditional DPCs at all
would be an example of one approach to eliminate the DPC dispatcher related
logic as a single point of failure.  It may also be possible to increase the
difficulty of locating all the call sites within the DPC dispatching related
code through a combination of differing static call stack differences for each
of the three call sites of the timer DPC dispatcher (i.e. adding dummy
function calls) combined with call stack randomization on top of static call
stack differences between each of the three timer DPC dispatcher calll sites.
Randomized call stacks alone would not suffice as by examining the call stacks
of many iterations of timer DPC requests, it would become easy to eliminate
the randomized entries (which would not be common to all recorded call stacks)
with a relatively high degree of accuracy given a large sample size.  A
disadvantage to taking such an approach is that it would essentially result in
adding deliberately-difficult-to-maintain "spaghetti code" into yet another
critical area of the operating system (timer DPC dispatcher logic).  The
author suspects that the maintainer of the timer DPC dispatcher code would
likely not appreciate having to deal with such things.

4.3) Canceling the PatchGuard Timer(s)

As PatchGuard continues to rely upon timer DPCs for the execution of its check
routines, the kernel timer DPC list itself continues to remain a relatively
attractive target for attack.  The timer DPC list is common to all control
paths leading to PatchGuard, as timers are always used for the delayed
execution component that periodically calls the check routine.

There are presently two obstacles in the way of the timer DPC list.  The first
of which is that altering it relies upon locating non-exported kernel
variables.  Although it may be possible to do so via fingerprinting, this does
make the approach slightly less desirable than it might initially appear.
However, fingerprinting can work if done carefully, and there are many short
functions that reference the timer list in a fairly predictable fashion (e.g.
KeCancelTimer).  One other possible way to find the DPC list would be to
create and set a timer (thus inserting it into the timer list), and then scan
every 8-byte-aligned value in a non-paged uninitialized data section in
ntoskrnl, treating each valid address as a linked list and searching the first
several entries for the timer that was just linked into the list.  While a
rather ugly and bruteforce-based approach (and not entirely safe either as one
would need to be relying heavily on MmIsAddressValid), scanning the ntoskrnl
data sections is one alternative to fingerprinting in terms of finding the
timer list.

The secondary problem with this approach is that starting with PatchGuard 2,
the timer list itself is obfuscated such that the link between a KTIMER object
and its corresponding KDPC is obfuscated.  This obfuscation mechanism, as
previously described{backref to 1}, hinges upon two additional non-exported
kernel variables (KiWaitAlways, KiWaitNever) that act as obfuscation keys.
Locating these variables would be likely entail code analysis or
fingerprinting of (possibly exported) routines that need to insert a timer
into the timer list, such as KeSetTimerEx.

Another alternative approach that dispenses with fingerprinting and/or
bruteforce-based approaches altogether, at the expense of requriring added
complexity (a user mode component), would be to postpone the activation of any
driver code that would run afoul of PatchGuard until after Win32 in user mode
has been started.  A user mode service could then be created that would
download the symbols for the kernel binary in use, retrieve the addresses of
KiTimerTableListHead (the timer list), KiWaitNever and KiWaitAlways, and pass
these addresses on to the driver via any standard user mode to kernel mode
communication mechanism (such as DeviceIoControl).  Because the kernel
debugger relies on the ability to retrieve these variables by name via the PDB
symbols for the !kdexts.timer extension, Microsoft would not be able to block
this approach by removing or renaming the obfuscation key variables without
imparing the functionality of existing debugger binaries.

Once one has located the KiTimerTableListHead, KiWaitAlways, and KiWaitNever,
it is a fairly simple (if perhaps unsafe without synchronization, though one
could always take the "sledgehammer" approach and stop all but one CPU and
raise IRQL to HIGH_LEVEL) to traverse the timer list, deobfuscate
the DPC link on each corresponding timer object, and from there check each
timer to see whether it bears the characteristics of being a PatchGuard timer
(which may include attributes like a timer interval several minutes into the
future, a non-canonical DeferredContext value, and possibly a DPC routine
pointer into non-paged pool).  After one has located the timer in question, it
can be easily disabled (either removing it from the list entirely, such as via
KeCancelTimer, or by rewriting the DPC routine to point to an empty function
that simply returns without performing any operation.

Because Microsoft has functionality in the debugger that depends on the
ability to use these variables to access the timer list, they have
unfortunately backed themselves into something of a corner with respect to
current operating system versions, as it is generally Microsoft's policy that
existing debugger binaries continue to function properly after hotfixes or
service pack to a particular already-released operating system version.  The
best ways to counteract this approach would be to make it more difficult to
pick out the PatchGuard DPC in-memory with respect to all of the other timer
DPC objects that are in the list at any given time for a typical system, and
to create additional launch vectors for PatchGuard that do not depend so
heavily on the timer list.  There exist a number of other ways to execute code
without drawing the attention of someone that does not know what they are
looking, many of which are less obvious than a timer.

4.4) Page-Table Swap

Like all memory accesses in the Windows kernel, PatchGuard's system integrity
check routine operates in protected mode with paging enabled.  It may
theoretically be possible to take advantage of this fact to hide kernel
patches from PatchGuard.

The proposed bypass technique would involve patching the first instruction in
the timer and DPC dispatchers to branch to third party code.  When a DPCs and
timer DPCs are about to be considered for execution, as signaled by a call to
one of the two dispatcher routines, a shadow copy of the page tables is
created.  This shadow copy is configured to be identical to the normal page
table for the current process, except that the page table entries for any
kernel code  pages that have been patched are altered to refer to physical
pages that are representative of the original state.  The return address of
the DPC or timer DPC dispatcher on the stack is swapped with a pointer into
driver-supplied code, and cr3 is reconfigured to point to the shadow page
table.  Then, execution is transferred back to the timer or DPC dispatcher
entrypoint (which no longer shows any signs of patching due to the page table
swap), and DPCs are dispatched.  When the dispatcher is finished with its
work, which would include invoking PatchGuard if PatchGuard is to be executed
in any batched timer DPCs, then control is returned to driver-supplied code,
which then mirrors any page table modifications since the shadow copy was made
back to the actual page table for the process, and cr3 is returned to its
original value.  Control is then transferred to the normal return point of the
dispatcher.

This approach does not involve disabling PatchGuard at all.  Instead, it
describes a potential way to "peacefully coexist" with it, so long as only
kernel code patches are being done.  (Data pages, which could be expected to
be modified by a DPC, are considered by the author to be much less practical
to protect from PatchGuard in this fashion.)  Because the DPC and timer DPC
dispatcher logic executes at IRQL DISPATCH_LEVEL, thread context
switching is disabled for the current thread, making the cr3 swap approach
relatively feasible.

Because this approach does not involve attacking PatchGuard directly, it
automatically circumvents all of the myriad defensive mechanisms built into
PatchGuard in current releases, making it a fairly attractive potential avenue
of attack.  However, there are some downsides.  Among other things, the
synchronization required to pull a page tabpe swap off in a multiprocessor
environment are likely to be complex and difficult to safely duplicate if one
allows DPC routines to perform operations that alter PTEs.  Additionally,
there would be a performance impact incurred by this approach as it would need
to run continuously in a relatively high-impact path (DPC dispatching)
throughout system lifetime.  The performance implications of invalidating TLBs
on every DPC batch may be problematic in some circumstances (swapping cr3
automatically clears out TLBs).

Another disadvantage of this approach is that by virtue of the fact that all
DPCs (and potentially all device hardware interrupts) may run with the shadow
copy of the page table, most hardware-related events will not be subject to
kernel code patches hidden by this mechanism.  This may or may not be a
problem depending on what the goal of the desired kernel patching is.

Microsoft could counteract this approach by making a copy of all PTEs that
describe the kernel at PatchGuard initialization time, and then validate all
kernel code PTEs from within the PatchGuard check routine.  Additionally, if
Microsoft could make the assumption that PatchGuard always executes in the
system process, another approach could be to require that cr3 take on a known
value.

4.5) DPC Exception Handler Patching

One of the changes introduced in PatchGuard 3 over PatchGuard 2 was a slight
change to the protocol used to invoke the first stage of the decryption
process.  Specifically, all callers of an encrypted PatchGuard context now
include a static 8-byte string (of instruction opcodes) that is xor'd with a
value at the start of the PatchGuard context to form the initial decryption
key.

The reasons for making this change over the original behavior are unclear to
the author, but it unfortunately represents an easy target for disabling
PatchGuard, as the string itself (0x8513148113148F0) is fairly unique and
unlikely to appear outside of PatchGuard in terms of kernel code.
Furthermore, all PatchGuard callers, including all ten of the repurposed DPC
routine exception handlers and the non-paged pool memory DPC adapter (if used)
reference the string with no obfuscation to speak of.  This presents an
extremely easy, fingerprint-based approach to disabling PatchGuard.  By
scanning non-paged pool space for this string, as well as kernel code regions,
it is trivially easy to locate an instruction in the middle of the every
single code path responsible for invoking PatchGuard's check context.

After the instructions referencing the 8-byte string have been located, it is
trivial to patch them to execute an unwind out of the exception handler logic
(or in the case of the non-paged pool memory code, simply return directly).
Such an attack prevents PatchGuard from ever starting, and furthermore has the
advantage of a minimum of additional supporting logic required (when compared
to many of the other bypass techniques outlined in this article).

It would be trivial for Microsoft to disable this technique.  The
recommendation of the author would be to get rid of the static 8-byte string
referenced in every PatchGuard caller.  Ironically, PatchGuard 2 necessarily
has a similar 4-byte string (which is also still used in PatchGuard 3),
representing the initial instruction of the first stage decryption stub.
Unlike with PatchGuard 3, however, PatchGuard 2 takes care to obfuscate the
process of writing the opcode string out to the PatchGuard context, so that
one cannot simply use a single blanket fingerprint to cover all cases.  The
change made in PatchGuard 3 completely blows this work out of the water, so to
speak, and it has the added advantage of being twice as large as a value to
fingerprint as well.

4.6) System Call MSR Swap

A variation on the technique described in {backref:4}, it should theoretically
be possible to swap the system call MSRs (or in fact several other processor
control registers that are protected by PatchGuard) for the duration of DPC or
timer DPC dispatching online, with the "tainted" values being restored after
the dispatcher returns.  The system call MSRs are responsible for designating
the address of the system call dispatcher, and are thus an attractive target
for third parties that would like to perform system call hooking.

The same basic concepts would be applied to this technique as previously
described in the cr3 swap technique.  If system calls are the only desired
targets to hook, then the cr3 swap can be eliminated as unnecessary for single
processor systems (as it would be safe to make and restore changes to the
actual underlying physical pages before and after a DPC dispatcher call, using
the return address on the stack as a way to return to the altered location
without leaving opcodes patched in the kernel across dispatcher invocations).
For multi-processor systems, some mechanism would need to be developed to
allow the MSR swap to be made across DPC dispatchers while preventing code
patches from becoming visible to a second processor.  This is necessary
because there could be more than one PatchGuard context executing
simultaneously with the PatchGuard 3 addition of a probability to initialize a
second check context at system boot time.

In order to block such a technique, Microsoft would likely be best served by
making it difficult to locate all the regions necessary to patch in order to
maintain the deception of an unpatched system across PatchGuard checks.  The
principal way to do this would be to create other, alternative launch vectors
for PatchGuard that are unrelated to DPCs and, preferably, do not involve
exported APIs that are easy to intercept from a third party perspective.

5) Conclusion

Although PatchGuard 3 does bring some pointed counter-attacks to many
previously disclosed bypass techniques, version 3, like its predecessors, is
hardly immune to being either disabled completely or simply co-existed with.
It is likely that future revisions to PatchGuard will continue to be
vulnerable to a variety of bypass techniques, though it is certain within
Microsoft's reach to counter many of the publicly disclosed bypass vectors.
It is anticipated by the author that until PatchGuard can be implemented with
hardware support, such as via a combination of trusted boot (TPM) and a
permanent hypervisor, future revisions will continue to be vulnerable to
attack from determined individuals.

On the other hand, Microsoft's efforts with PatchGuard appear to have paid off
so far in terms of preventing a mass-uptake of PatchGuard-violating drivers on
Windows x64.  In other words, a case could be made that Microsoft doesn't need
to be perfect with PatchGuard, only "good enough" to give vendors cold feet
about trying to ship products that bypass it.  Only time will tell if this
continues to remain the case into the future, however.

References

[1] Skywing. Subverting PatchGuard version 2.
    http://www.uninformed.org/?v=6&a=1&t=sumry; accessed September 16, 2007

[2] Skywing. Programming against the x64 exception handling support, part 7: Putting it all together, or building a stack walking routine.
    http://www.nynaeve.net/?p=113; accessed September 16, 2007

[3] skape. Improved Automated Analysis of Windows x64 Binaries.
    http://uninformed.org/index.cgi?v=4&a=1&t=sumry; accessed September 16, 2007

[4] skape, Skywing.  Bypassing PatchGuard on Windows x64.
    http://uninformed.org/index.cgi?v=3&a=3&t=sumry; accessed September 16, 2007

[5] Microsoft. Detours.
    http://research.microsoft.com/sn/detours/; accessed September 16, 2007