Zines/uninformed/7.1.txt

Reducing the Effective Entropy of GS Cookies
skape
mmiller@hick.org
3/2007

1) Foreword

Abstract: This paper describes a technique that can be used to reduce the
effective entropy in a given GS cookie by roughly 15 bits.  This reduction is
made possible because GS uses a number of weak entropy sources that can,  with
varying degrees of accuracy, be calculated by an attacker.  It is important to
note, however, that the ability to calculate the values of these sources for
an arbitrary cookie currently relies on an attacker having local access to the
machine, such as through the local console or through terminal services.  This
effectively limits the use of this technique to stack-based local privilege
escalation vulnerabilities.  In addition to the general entropy reduction
technique, this paper discusses the amount of effective entropy that exists in
services that automatically start during system boot.  It is hypothesized that
these services may have more predictable states of entropy due to the relative
consistency of the boot process.  While the techniques described in this paper
do not illustrate a complete break of GS, any inherent weakness can have
disastrous consequences given that GS is a static, compile-time security
solution.  It is not possible to simply distribute a patch.  Instead,
applications must be recompiled to take advantage of any security
improvements.  In that vein, the paper proposes some solutions that could
be applied to address the problems that are outlined.

Thanks: Aaron Portnoy for lending some hardware for sample collection.
Johnny Cache and Richard Johnson for discussions and suggestions.

2) Introduction

Stack-based buffer overflows are generally regarded as one of the most common
and easiest to exploit classes of software vulnerabilities.  This prevalence
has lead to the implementation of many security solutions that attempt to
prevent the exploitation of these vulnerabilities.  Some of these solutions
include StackGuard[1], ProPolice[2], and Microsoft's /GS compiler switch[5].  The
shared premise of these solutions involves the placement of a cookie, or
canary, between the buffers stored in a stack frame and the stack frame's
return address.  The cookie that is placed on the stack is used as a marker to
detect if a buffer overflow has occurred prior to allowing a function to
return.  This simple concept can be very effective at making the exploitation
of stack-based buffer overflows unreliable.

The cookie-based approach to detecting stack-based buffer overflows involves
three general steps.  First, a cookie that will be inserted into a function's
stack frame must be generated.  The approaches taken to generate cookies vary
quite substantially, some having more implications than others.  Once a cookie
has been generated, it must be pushed onto the stack in the context of a
function's prologue at execution time.  This ensures that the cookie is placed
before the return address (and perhaps other values) on the stack.  Finally, a
check must be added to a function's epilogue to make sure that the cookie that
was stored in the stack frame is the value that it was initialized to in the
function prologue.  If an overflow of a stack-based buffer occurs, then it's
likely that it will have overwritten the cookie stored after the buffer.  When
a mismatch is detected, steps can be taken to securely terminate the process
in a way that will prevent exploitation.

The security of a cookie-based solution hinges on the fact that an attacker
doesn't know, or is unable to generate, the cookie that is stored in a stack
frame.  Since it's impossible to guarantee in all situations that an attacker
won't be able to generate the bytes that compose the value of a cookie, it
really all boils down to the cookie being kept secret.  If the cookie is not
kept secret, then the presence of a cookie will provide no protection when it
comes to exploiting a stack-based buffer overflow vulnerability.
Additionally, if an attacker can trigger an exploitable condition before the
cookie is checked, then it stands that the cookie will provide no protection.
One example of this might include overwriting a function pointer on the stack
that is called prior to returning from the function.

While the StackGuard and ProPolice implementations are interesting and useful,
the author feels that no implementation is more critical than the one provided
by Microsoft.  The reason for this is the simple fact that the vast majority
of all desktops, and a non-trivial number of servers, run applications
compiled with Microsoft's Visual C compiler.  Any one weakness found in the
Microsoft's implementation could mean that a large number of applications are
no longer protected against stack-based buffer overflows.  In fact, there has
been previous research that has pointed out flaws or limitations in
Microsoft's implementation.  For example, David Litchfield pointed out that
even though stack cookies are present, it may still be possible to overwrite
exception registration records on the stack which may be called before the
function actually returns.  This discovery was one of the reasons that
Microsoft later introduced SafeSEH (which had its own set of issues)[6].
Similarly, Chris Ren et al from Cigital pointed out the potential implications
of a function pointer being used in the path of the error handler for the case
of a GS cookie mismatch occurring[9].  While not directly related to a particular
flaw or limitation in GS, eEye has described some of the problems that come
when secrets get leaked[3].

Even though these issues and limitations have existed, Microsoft's GS
implementation at the time of this writing is considered by most to be secure.
While this paper will not present a complete break of Microsoft's GS
implementation, it will describe certain quirks and scenarios that may make it
possible to reduce the amount of effective entropy that exists in the cookies
that are generated.  As with cryptography, any reduction of the entropy that
exists in the GS cookie effectively makes it so there are fewer unknown
portions of the cookie.  This makes the cookie easier to guess by reducing the
total number of possibilities.  Beyond this, it is expected that additional
research may find ways to further reduce the amount of entropy beyond that
described in this document.  One critical point that must be made is that
since the current GS implementation is statically linked when binaries are
compiled, any flaw that is found in the implementation will require a
recompilation of all binaries affected by it.  To help limit the scope, only
the 32-bit version of GS will be analyzed, though it is thought that similar
attacks may exist on the 64-bit version as well.

The structure of this paper is as follows.  In chapter 3, a brief description
of the Microsoft's current GS implementation will be given.  Chapter 4 will
describe some techniques that may be used to attack this implementation.
Chapter 5 will provide experimental results from using the attacks that are
described in chapter .  Chapter 6 will discuss steps that could be taken to
improve the current GS implementation.  Finally, chapter 7 will discuss some
areas where future work could be applied to further improve on the techniques
described in this document.

3) Implementation

As was mentioned in the introduction, security solutions that are designed to
protect against stack-based buffer overflows through the use of cookies tend
to involve three distinct steps: cookie generation, prologue modifications,
and epilogue modifications.  Microsoft's GS implementation is no different.
This chapter will describe each of these three steps independent of one
another to paint a picture for how GS operates.

3.1) Cookie Generation

Microsoft chose to have the GS implementation generate an image file-specific
cookie.  This means that each image file (executable or DLL) will have their
own unique cookie.  When used in conjunction with a stack frame, a function
will insert its image file-specific cookie into the stack frame.  This will be
covered in more detail in the next section.  The actual approach taken to
generate an image file's cookie lives in a compiler inserted routine called
__security_init_cookie.  This routine is placed prior to the call to the image
file's actual entry point routine and therefore is one of the first things
executed.  By placing it at this point, all of the image file's code will be
protected by the GS cookie.

The guts of the __security_init_cookie routine are actually the most critical part
to understand.  At a high-level, this routine will take an XOR'd combination
of the current system time, process identifier, thread identifier, tick count,
and performance counter.  The end result of XOR'ing these values together is
what ends up being the image file's security cookie.  To understand how this
actually works in more detail, consider the following disassembly from an
application compiled with version 14.00.50727.42 of Microsoft's compiler.
Going straight to the disassembly is the best way to concretely understand the
implementation, especially if one is in search of weaknesses.

Like all functions, the __security_init_cookie function starts with a prologue.
It allocates storage for some local variables and initializes some of them to
zero.  It also initializes some registers, specifically edi and ebx which will
be used later on.

.text:00403D58     push ebp
.text:00403D59     mov ebp, esp
.text:00403D5B     sub esp, 10h
.text:00403D5E     mov eax, __security_cookie
.text:00403D63     and [ebp+SystemTimeAsFileTime.dwLowDateTime], 0
.text:00403D67     and [ebp+SystemTimeAsFileTime.dwHighDateTime], 0
.text:00403D6B     push ebx
.text:00403D6C     push edi
.text:00403D6D     mov edi, 0BB40E64Eh
.text:00403D72     cmp eax, edi
.text:00403D74     mov ebx, 0FFFF0000h

As part of the end of the code above, a comparison between the current
security cookie and a constant 0xbb40e64e is made.  Before __security_init_cookie
is called, the global securitycookie is initialized to 0xbb40e64e.  The
constant comparison is used to see if the GS cookie has already been
initialized.  If the current cookie is equal to the constant, or the high
order two bytes of the current cookie are zero, then a new cookie is
generated.  Otherwise, the complement of the current cookie is calculated and
cookie generation is skipped.

.text:00403D79     jz  short loc_403D88
.text:00403D7B     test eax, ebx
.text:00403D7D     jz  short loc_403D88
.text:00403D7F     not eax
.text:00403D81     mov __security_cookie_complement, eax
.text:00403D86     jmp short loc_403DE8

To generate a new cookie, the function starts by querying the current system
time using GetSystemTimeAsFileTime.  The system time as represented by Windows
is a 64-bit integer that measures the system time down to a granularity of 100
nanoseconds.  The high order 32-bit integer and the low order 32-bit integer
are XOR'd together to produce the first component of the cookie.  Following
that, the current process identifier is queried using GetCurrentProcessId and
then XOR'd as the second component of the cookie.  The current thread
identifier is then queried using GetCurrentThreadId and then XOR'd as the
third component of the cookie.  The current tick count is queried using
GetTickCount and then XOR'd as the fourth component of the cookie.  Finally,
the current performance counter value is queried using
QueryPerformanceCounter.  Like system time, this value is also a 64-bit
integer, and its high order 32-bit integer and low order 32-bit integer are
XOR'd as the fifth component of the cookie.  Once these XOR operations have
completed, a comparison is made between the newly generated cookie value and
the constant 0xbb40e64e.  If the new cookie is not equal to the constant
value, then a second check is made to make sure that the high order two bytes
of the cookie are non-zero.  If they are zero, then a 10 bit left shift of the
cookie is performed in order to seed the high order bytes.

.text:00403D89     lea eax, [ebp+SystemTimeAsFileTime]
.text:00403D8C     push eax
.text:00403D8D     call ds:__imp__GetSystemTimeAsFileTime@4
.text:00403D93     mov esi, [ebp+SystemTimeAsFileTime.dwHighDateTime]
.text:00403D96     xor esi, [ebp+SystemTimeAsFileTime.dwLowDateTime]
.text:00403D99     call ds:__imp__GetCurrentProcessId@0
.text:00403D9F     xor esi, eax
.text:00403DA1     call ds:__imp__GetCurrentThreadId@0
.text:00403DA7     xor esi, eax
.text:00403DA9     call ds:__imp__GetTickCount@0
.text:00403DAF     xor esi, eax
.text:00403DB1     lea eax, [ebp+PerformanceCount]
.text:00403DB4     push eax
.text:00403DB5     call ds:__imp__QueryPerformanceCounter@4
.text:00403DBB     mov eax, dword ptr [ebp+PerformanceCount+4]
.text:00403DBE     xor eax, dword ptr [ebp+PerformanceCount]
.text:00403DC1     xor esi, eax
.text:00403DC3     cmp esi, edi
.text:00403DC5     jnz short loc_403DCE
...
.text:00403DCE loc_403DCE:
.text:00403DCE     test esi, ebx
.text:00403DD0     jnz short loc_403DD9
.text:00403DD2     mov eax, esi
.text:00403DD4     shl eax, 10h
.text:00403DD7     or  esi, eax

Finally, when a valid cookie is generated, it's stored in the image file's
securitycookie.  The bit-wise complement of the cookie is also stored in
securitycookiecomplement.  The reason for the existence of the complement will
be described later.

.text:00403DD9     mov __security_cookie, esi
.text:00403DDF     not esi
.text:00403DE1     mov __security_cookie_complement, esi
.text:00403DE7     pop esi
.text:00403DE8     pop edi
.text:00403DE9     pop ebx
.text:00403DEA     leave
.text:00403DEB     retn

In simpler terms, the meat of the cookie generation can basically be
summarized through the following pseudo code:

Cookie  = SystemTimeHigh
Cookie ^= SystemTimeLow
Cookie ^= ProcessId
Cookie ^= ThreadId
Cookie ^= TickCount
Cookie ^= PerformanceCounterHigh
Cookie ^= PerformanceCounterLow

3.2) Prologue Modifications

In order to make use of the generated cookie, functions must be modified to
insert it into the stack frame at the time that they are called.  This does
add some overhead to the call time associated with a function, but its overall
effect is linear with respect to a single invocation.  The actual
modifications that are made to a function's prologue typically involve just
three instructions.  The cookie that was generated for the image file is XOR'd
with the current value of the frame pointer.  This value is then placed in the
current stack frame at a precisely chosen location by the compiler.

.text:0040214B     mov eax, __security_cookie
.text:00402150     xor eax, ebp
.text:00402152     mov [ebp+2A8h+var_4], eax

It should be noted that Microsoft has taken great care to refine the way a
stack frame is laid out in the presence of GS.  Locally defined pointers,
including function pointers, are placed before statically sized buffers in the
stack frame.  Additionally, dangerous input parameters passed to the function,
such as pointers or structures that contain pointers, will have local copies
made that are positioned before statically sized local buffers.  The local
copies of these parameters are used instead of those originally passed to the
function.  These two changes go a long way toward helping to prevent other
scenarios in which stack-based buffer overflows might be exploited.

3.3) Epilogue Modifications

When a function returns, it must check to make sure that the cookie that was
stored on the stack has not been tampered with.  To accomplish this, the
compiler inserts the following instructions into a function's prologue:

.text:00402223     mov ecx, [ebp+2A8h+var_4]
.text:00402229     xor ecx, ebp
.text:0040222B     pop esi
.text:0040222C     call __security_check_cookie

The value of the cookie that was stored on the stack is moved into ecx and
then XOR'd with the current frame pointer to get it back to the expected
value.  Following that, a call is made to securitycheckcookie where the stack
frame's cookie value is passed in the ecx register.  The securitycheckcookie
routine is very short and sweet.  The passed in cookie value is compared with
the image file's global cookie.  If they don't match, reportgsfailure is
called and the process eventually terminates.  This is what one would expect
in the case of a buffer overflow scenario.  However, if they do match, the
routine simply returns, allowing the calling function to proceed with
execution and cleanup.

.text:0040634B     cmp ecx, __security_cookie
.text:00406351     jnz short loc_406355
.text:00406353     rep retn
.text:00406355 loc_406355:
.text:00406355     jmp __report_gsfailure

4) Attacking GS

At the time of this writing, all publicly disclosed attacks against GS that
the author is aware of have relied on getting control of execution before the
cookie is checked or by finding some way to leak the value of the cookie back
to the attacker.  Both of these styles of attack are of great interest and
value, but the focus of this paper will be on a different method of attacking
GS.  Specifically, this chapter will outline techniques that may be used to
make it easier to guess the value an image file's GS cookie.  Two techniques
will be described.  The first technique will describe methods for calculating
the values that were used as entropy sources when the cookie was generated.
These calculations are possible in situations where an attacker has local
access to the machine, such as through the console or through terminal
services.  The second technique describes the general concept of predictable
ranges of some values that are used in the context of boot start services,
such as lsass.exe.  This predictability may make the guessing of a GS cookie
more feasible in both local and remote scenarios.

4.1) Calculating Entropy Sources

The sources used to generate the GS cookie for a given image file are constant
and well-known.  They include the current system time, process identifier,
thread identifier, tick count, and performance counter.  In light of that
fact, it only makes sense to investigate the amount of effective entropy each
source adds to the cookie.  Since it's a requirement that the cookie produced
be secret, the ability to guess a value used in the generation of the cookie
will allow it to be canceled out of the equation.  This is true due to the
simple fact that each of the values used to generate the cookie is XOR'd with
each other value (XOR is a commutative operation).  The ability to guess
multiple values can make it possible to seriously impact the overall integrity
of the cookie.

While the sources used in the generation of the cookie have long been regarded
as satisfactory, the author has found that the majority of the sources
actually contribute little to no value toward the overall entropy of the
cookie.  However, this is currently only true if an attacker has local access
to the machine.  Being able to know a GS cookie that was used in a privileged
process would make it possible to exploit a local privilege escalation
vulnerability, for example.  There may be some circumstances where the
techniques described in this section could be applied remotely, but for the
purpose of this document, only the local scenario will be considered.  The
following subsections will outline methods that can be used to calculate or
deterministically find the specific values that were used when a cookie was
being generated in a particular process context.  As a result of this
analysis, it's become clear that the only particular variable source of true
entropy for the GS cookie is the low 17 bits of the performance counter.  All
other sources can be reliably calculated, with some margin of error.

For the following subsections, a modified executable named vulnapp.exe was
used to extract the information that was used at the time that a process
executable's GS cookie was generated.  In particular, __security_init_cookie was
modified to jump into a function that saves the information used to generate
the cookie.  The implementation of this function is shown below for those who
are curious:

//
// The FramePointer is the value of EBP in the context of the
// __security_init_cookie routine.  The cookie is the actual,
// resultant cookie value.  GSContext is a global array.
//
VOID DumpInformation(
   PULONG FramePointer,
   ULONG  Cookie)
{
   GSContext[0] = FramePointer[-3];
   GSContext[1] = FramePointer[-4];
   GSContext[2] = FramePointer[-1];
   GSContext[3] = FramePointer[-2];
   GSContext[4] = GetCurrentProcessId();
   GSContext[5] = GetCurrentThreadId();
   GSContext[6] = GetTickCount();
   GSContext[7] = Cookie;
}

4.1.1) System Time

System time is a value that one might regard as challenging to recover.  After
all, it seems impossible to get the 100 nanosecond granularity of the system
time that was retrieved when a cookie was being generated.  Quite the
contrary, actually.  There are a few key points that go into being able to
recover the system time.  First, it's a fact that even though the system time
measures granularity in terms of 100 nanosecond intervals, it's really only
updated every 15.625 milliseconds (or 10.1 milliseconds for more modern CPUs).
To many, 15.625 may seem like an odd number, but for those familiar with the
Windows thread scheduler, it can be recognized as the period of the timer
interrupt.  For that reason, the current system time is only updated as a
result of the timer interrupt firing.  This fact means that the alignment of
the system time that is used when a cookie is generated is known.

Of more interest, though, is the relationship between the system time value
and the creation time value associated with a process or its initial thread.
Since the minimum granularity of the system time is 15.6 or 10.1 milliseconds,
it follows that the granularity of the thread creation time will be the same.
In terms of modern CPUs, 15.6 milliseconds is an eternity and is plenty long
for the processor to execute all instructions from the creation of the thread
to the generation of the security cookie.  This fact means that it's
possible to assume that the creation time of a process or thread is the
same as the system time that was used when the cookie was generated.  This
assumption doesn't always work, though, and there are indeed cases where
the creation time will not equal the system time that was used.  These
situations are usually a result of the thread that creates the cookie not
being immediately scheduled.

Even if this is the case, it would be necessary to be able to obtain the
creation time of an arbitrary process or thread.  On the surface, this would
seem impossible because task manager prevents a non-privileged user from
getting the start time of a privileged process.

This is all a deception, though, because there does exist functionality that
is exposed to non-privileged users that can be used to get this information.
One way of getting it is through the use of the native API routine
NtQuerySystemInformation.  In this case, the
SystemProcessesAndThreadsInformation system information class is used to query
information about all of the running processes on the system.  This
information includes the process name, process creation time, and the creation
time for each thread in each process.  While this information class has been
removed in Windows Vista, there are still potential ways of obtaining the
creation time information.  For example, an attacker could simply crash the
vulnerable service once (assuming it's not a critical service) and then wait
for it to respawn.  Once it respawns, the creation time can be inferred based
on the restart delay of the service.  Granted, service restarts are limited
to three times per day in Vista, but crashing it once should cause no major
issues.

Using NtQuerySystemInformation, it's possible to collect some data that can be
used to determine the likelihood that the creation time of a thread will be
equal to the system time that was used when a GS cookie was generated.  To
test this, the author used the modified vulnapp.exe executable to extract the
system time at the time that the cookie was generated.  Following that, a
separate program was used to collect the creation time information of the
process in question using the native API.  The initial thread's creation time
was then compared with the system time to see if they were equal.  The
creation time and system time were often equal in a sample of 742 cookies.

Obviously, the data set describing differences is only relevant to a
particular system load.  If there are many threads waiting to run during the
time that a process is executed, then it is unlikely that the system time will
equal the process creation time.  In a desktop environment, it's probably safe
to assume that the thread will run immediately, but more conclusive evidence
may be necessary.

Given these facts, it is apparent that the complete 64-bit system time value
can be recovered more often than not with a great degree of accuracy just by
simply assuming that thread creation time is the same as the system time
value.

4.1.2) Process and Thread Identifier

The process and thread identifier are arguably the worst sources of entropy
for the GS cookie, at least in the context of a local attack.  The two high
order bytes of the process and thread identifiers are almost always zero.
This means they have absolutely no effect on the high order entropy.
Additionally, the process and thread identifier can be determined with 100
percent accuracy in a local context using the same API described in the
previous section on getting the system time.  This involves making use of
the NtQuerySystemInformation native API with the
SystemProcessesAndThreadsInformation system information class to get the
process identifier and thread identifier associated with a given process
executable.

The end result, obviously, is that the process and thread identifier can be
determined with great accuracy.  The one exception to this rule would be
Windows Vista, but, as was mentioned before, alternative methods of obtaining
the process and thread identifier may exist.

4.1.3) Tick Count

The tick count is, for all intents and purposes, simply another measure of
time.  When the GetTickCount API routine is called, the number of ticks is
multiplied by the tick count multiplier.  This multiplication effectively
translates the number of ticks to the number of milliseconds that the system
has been up.  If one can safely assume that the that the system time used to
generate the cookie was the same as the thread creation time, then the tick
count at the time that the cookie was generated can simply be calculated using
the thread creation time.  The creation time isn't enough, though.  Since the
GetTickCount value measures the number of milliseconds that have occurred
since boot, the actual uptime of the system has to be determined.

To determine the system uptime, a non-privileged user can again make use of
the NtQuerySystemInformation native API, this time with the
SystemTimeOfDayInformation system information class.  This query returns the
time that the system was booted as a 64-bit integer measured in 100 nanosecond
intervals, just like the thread creation time.  To calculate the system uptime
in milliseconds, it's as simple as subtracting the boot time from the creation
time and then dividing by 10000 to convert from 100 nanosecond intervals to 1
millisecond intervals:

EstTickCount = (CreationTime - BootTime) / 10000

Some experimentation shows that this calculation is pretty accurate, but some
quantity is lost in translation.  From what the author has observed, a
constant scaling factor of 0x4e, or 78 milliseconds, needs to be added to the
result of this calculation.  The source of this constant is as of yet unknown,
but it appears to be a required constant.  This results in the actual equation
being:

EstTickCount = [(CreationTime - BootTime) / 10000] + 78

The end result is that the tick count can be calculated with a great degree of
accuracy.  If the system time calculation is off, then that will directly
affect the calculation of the tick count.

4.1.4) Performance Counter

Of the four entropy sources discussed so far, the performance counter is the
only one that really presents a challenge.  The purpose of the performance
counter is to describe the total number of cycles that have executed.  On the
outside, the performance counter would seem impossible to reliably determine.
After all, how could one possibly determine the precise number of cycles that
had occurred as a cookie was being generated?  The answer, of course, comes
down to the fact that the performance counter itself is, for all intents and
purposes, just another measure of time.  Windows provides two interesting
user-mode APIs that deal with the performance counter.  The first,
QueryPerformanceCounter, is used to ask the kernel to read the current value
of the performance counter[8].  The result of this query is stored in the 64-bit
output parameter that the caller provides.  The second API is
QueryPerformanceFrequency.  This routine is interesting because it returns a
value that describes the amount that the performance counter will change in
one second[7].  Documentation indicates that the frequency cannot change while
the system is booted.

Using the existing knowledge about the uptime of the system and the
calculation that can be performed to convert between the performance counter
value and seconds, it is possible to fairly accurately guess what the
performance counter was at the time that the cookie was generated.  Granted,
this method is more fuzzy than the previously described methods, as
experimental results have shown a large degree of fluctuation in the lower 17
bits.  Those results will be discussed in more detail in chapter .  The actual
equation that can be used to generate the estimated performance counter is to
take the uptime, as measured in 100 nanosecond intervals, and multiply it by
the performance frequency divided by 10000000, which converts the frequency
from a measure of 1 second to 100 nanosecond:

EstPerfCounter = UpTime x (PerfFreq / 10000000)

In a fashion similar to tick count, a constant scaling factor of -165000 was
determined through experimentation.  This seems to produce more accurate
results in some of the 24 low bits.  Based on this calculation, it's possible
to accurately determine the entire 32-bit high order integer and the first 15
bits of the 32-bit low order integer.  Of course, if the system time estimate
is wrong, then that directly effects this calculation.

4.1.5) Frame Pointer

While the frame pointer does not influence an image file's global cookie, it
does influence a stack frame's version of the cookie.  For that reason, the
frame pointer must be considered as an overall contributor to the effective
entropy of the cookie.  With the exception of Windows Vista, the frame pointer
should be a deterministic value that could be deduced at the time that a
vulnerability is triggered.  As such, the frame pointer should be considered a
known value for the majority of stack-based buffer overflows.  Granted, in
multi-threaded applications, it may be more challenging to accurately guess
the value of the frame pointer.

In the Windows Vista environment, the compile-time GS implementation gets a
boost in security due to the introduction of ASLR.  This helps to ensure that
the frame pointer is actually an unknown quantity.  However, it doesn't
introduce equal entropy in all bits.  In particular, octet 4, and potentially
octet 3, may have predictable values due to the way that the randomization is
applied to dynamic memory allocations.  In order to prevent fragmentation of
the address space, Vista's ASLR implementation attempts to ensure that stack
regions are still allocated low in the address space.  This has the side
effect of ensuring that a non-trivial number of bits in the frame pointer will
be predictable.  Additionally, while Vista's ASLR implementation makes an
effort to shift the lower bits of the stack pointer, there may still be some
bits that are always predictable in octet 2.

4.2) Predictability of Entropy Sources in Boot Start Services

A second attack that could be used against GS involves attacking services that
start early on when the system is booted.  These services may experience more
predictable states of entropy due to the fact that the amount of time it takes
to boot up and the order in which tasks are performed is fairly, though not
entirely, consistent.  This insight may make it possible to estimate the value
of entropy sources remotely.

To better understand this type of attack, the author collected 742 samples
that were taken from a custom service that was set to automatically start
during boot on a Windows XP SP2 installation.  This service was simply
designed to log the state used at the time that the GS cookie was being
generated.  While a sampling of the GS cookie state applied to lsass.exe would
have been more ideal, it wasn't worth the headache of having to patch a
critical system service.  Perhaps the reader may find it interesting to
collect this data on their own.  From the samples that were taken, the
following diagrams show the likelihood of each individual bit being set for
each of the different entropy sources.

Overall, there are a number of predictable bits in things like the high
32-bits of both the system time and the performance counter, the process
identifier, the thread identifier, and the tick count.  The sources that are
largely unpredictable are the low 32-bits of the system time and the
performance counter.  However, if it were possible to come up with a way to
discover the boot time (or uptime) of the system remotely, it might be
possible to infer a good portion of the low 32-bits of the system time.  This
would then directly impact the ability to estimate things like the tick count
and performance counters.

5) Experimental Results

This chapter describes some of the initial results that were collected using a
utility developed by the author named gencookie.exe.  This utility attempts to
calculate the value of the cookie that was generated for the executable image
associated with an arbitrary process, such as lsass.exe.  While the results of
this utility were limited to attempting to calculate the cookie of a process'
executable, the techniques described in previous chapters are nonetheless
applicable to the cookies generated in the context of dependent DLLs.  The
results described in this chapter illustrate the tool's ability to accurately
obtain specific bits within the different components that compose the cookie,
including specific bits of the cookie itself.  This helps to paint a picture
of the amount of true entropy that is reduced through the techniques described
in this document.

The data set that was used to calculate the overall results included 5001
samples which were collected from a single machine.  The samples were
collected through a few simple steps.  First, a program called vulnapp.exe
that was compiled with /GS was modified to have its __security_init_cookie routine
save information about the cookie that was generated and the values that
contributed to its generation.  Following that, the gencookie.exe utility was
launched against the running process in an attempt to calculate vulnapp.exe's
GS cookie.  A comparison between the expected and actual value of each
component was then saved.  These steps were repeated 5001 times.  The author
would be interested in hearing about independent validation of the findings
presented in this chapter.

The following sections describe the bit-level predictability of each of the
components that are used to generate the GS cookie, including the overall
predictability of the bits of the GS cookie itself.

5.1) System Time

The system time component was highly predictable.  The high 32-bit bits of the
system time were predicted with 100 percent accuracy.  The low 32-bit bits on
the other hand were predicted with only 77 percent accuracy (3878 times).  The
reason for this discrepancy has to do with the thread scheduling scenario
described in subsection .  Even still, these results indicate that it is
likely that the entire system time value can be accurately calculated.

5.2) Process and Thread Identifier

The process and thread identifier were successfully calculated 100 percent of
the time using the approach outlined in section .

5.3) Tick Count

The tick count was accurately calculated 67 percent of the time (3396 times).
The reason for this lower rate of success is due in large part to the fact
that the tick count is calculated in relation to the estimated system time
value.  As such, if an incorrect system time value is determined, the tick
count itself will be directly influenced.  This should account for at least 23
percent of the inaccuracies judging from how often the system time was
inaccurately estimated.  The remaining 10 percent of the inaccuracies is as of
yet undetermined, but it is most likely related to the an improper
interpretation of the constant scaling factor that is applied to the tick
count.  In any case, it is expected that only a few bits are actually affected
in the remaining 10 percent of cases.

5.4) Performance Counter

The high 32-bits of the performance counter were successfully estimated 100
percent of the time.  The low 32-bits, on the other hand, show the greatest
degree of volatility when compared to the other components.  The high order 15
bits of the low 32-bits show a bias in terms of accuracy that is not a 50/50
split.  The remaining 17 bits were all guessed correctly roughly 50 percent of
the time.  This makes the low 17 bits the only truly effective source of
entropy in the performance counter since there is no bias shown in relation to
the estimated versus actual values.  Indeed, this is not enough to prove that
there aren't observable patterns in the low 17 bits, but it is enough to show
that the gencookie.exe utility was not effective in estimating them.  Figures
and  show the percent accuracy for the high and low order 32-bits.

This discrepancy actually requires a more detailed explanation.  In reality,
the estimates made by the gencookie.exe utility are actually not as far off as
one might think based on the percent accuracy of each bit as described in the
diagrams.  Instead, the estimates are, on average, off by only 105,000.  This
average difference is what leads to the lower 17 bits being so volatile.  One
thing that's interesting about the difference between the estimated and actual
performance counter is that there appears to be a time oriented trend related
to how far off the estimates are.  Due to the way that the samples were taken,
it's safe to assume that each sample is roughly equivalent to one second worth
of time passing (due to a sleep between sample collection).  Further study of
this apparent relationship may yield better results in terms of estimating the
lower 17 bits of the low 32 bits of the performance counter.  This is left for
future research.

5.5) Cookie

The cookie itself was never actually guessed during the course of sample
collection.  The reason for this is tightly linked with the current inability
to accurately determine the lower 17 bits of the low 32 bits of the
performance counter.  Comparing the percent accuracy of the cookie bits with
the percent accuracy of the low 32 bits of the performance counter yields a
very close match.

6) Improvements

Based on the results described in chapter , the author feels that there is
plenty of room for improvement in the way that GS cookies are currently
generated.  It's clear that there is a need to ensure that there are 32 bits
of true entropy in the cookie.  The following sections outline some potential
solutions to the entropy issue described in this document.

6.1) Better Entropy Sources

Perhaps the most obvious solution would be to simply improve the set of
entropy sources used to generate the cookie.  In particular, the use of
sources with greater degrees of entropy, especially in the high order bits,
would be of great benefit.  The challenge, however, is locating sources that
are easy to interact with and require very little overhead.  For example, it's
not really feasible to have the GS cookie generator rely on the crypto API due
to the simple fact that this would introduce a dependency on the crypto API in
any application that was compiled with /GS.  As this document has hopefully
shown, it's also a requirement that any additional entropy sources be
challenging to estimate externally at a future point in time.

Even though this is a viable solution, the author is not presently aware of
any additional entropy sources that would meet all three requirements.  For
this reason, the author feels that this approach alone is insufficient to
solve the problem.  If entropy sources are found which meet these
requirements, the author would love to hear about them.

6.2) Seeding High Order Bits

A more immediate solution to the problem at hand would involve simply ensuring
that the predictable high order bits are seeded with less predictable values.
However, additional entropy sources would be required in order to implement
this properly.  At present, the only major source of entropy found in the GS
cookie is the low order bits of the performance counter.  It would not be
sufficient to simply shift the low order bits of the performance counter into
the high order.  Doing so would add absolutely no value by itself because it
would have no effect on the amount of true entropy in the cookie.

6.3) External Cookie Generation

An alternative solution that could combine the effects of the first two
solutions would be to change the GS implementation to generate the cookie
external to the binary itself.  One of the most dangerous aspects of the GS
implementation is that it is statically linked and therefore would require a
recompilation of all affected binaries in the event that a weakness is found.
This fact alone should be scary.  To help address both this problem and the
problem of weak entropy sources, it makes sense to consider a more dynamic
approach.

One example of a dynamic approach would be to have the GS implementation issue
a call into a kernel-mode routine that is responsible for generating GS
cookies.  One place that this support could be added is in
NtQuerySystemInformation, though it's likely that a better place may exist.
Regardless of the specific routine, this approach would have the benefit of
moving the code used to generate the cookie out of the statically linked stub
that is inserted by the compiler.  If any weakness were to be found in the
kernel-mode routine that generates the cookie, Microsoft could issue a patch
that would immediately affect all applications compiled to use GS.  This would
solve some of the concerns relating to the static nature of GS.

Perhaps even better, this approach would grant greater flexibility to the
entropy sources that could be used in the generation of the cookie.  Since the
routine would exist in kernel-mode, it would have the benefit of being able to
access additional sources of entropy that may be challenging or clumsy to
interact with from user-mode (though the counterpoint could certainly be made
as well).  The kernel-mode routine could also accumulate entropy over time and
feed that back into the cookie, whereas the statically linked implementation
has no context with which to accumulate entropy.  The accumulation of state
can also do more harm than good.  It would be disingenuous to not admit that
this approach could also have its own set of problems.  A poorly implemented
version of this solution might make it possible for a user to eliminate all
entropy by issuing a non-trivial number of calls to the kernel-mode routine.
There may be additional consequences that have not yet been perceived.

The impact on performance is also a big point of concern for any potential
change to the cookie generation path.  At a high-level, a transition into
kernel-mode would seem concerning in terms of the amount of overhead that
might be added.  However, it's important to note that the current
implementation of GS already transitions into kernel-mode to obtain some of
it's information.  Specifically, performance counter information is obtained
through the system call NtQueryPerformanceCounter.  Even more, this system
call results in an in operation on an I/O port that is used to query the
current performance counter.

Another important consideration is backward compatibility.  If Microsoft were
to implement this solution, it would be necessary for applications compiled
with the new support to still be able benefit from GS on older platforms that
don't support the new kernel interface.  To allow for backward compatibility,
Microsoft could implement a combination of all three solutions, whereby better
entropy sources and seeding of high order bits are used as a fallback in the
event that the kernel-mode interface is not present.

As it turns out, Microsoft does indeed have a mechanism that could allow them
to create a patch that would affect the majority of the binaries compiled with
recent versions of GS.  This functionality is provided by exposing the address
of an image file's security cookie in its the load config data directory.
When the dynamic loader (ntdll) loads an image file, it checks to see if the
security cookie address in the load config data directory is non-NULL.  If
it's not NULL, the loader proceeds to store the process-wide GS cookie in the
module-specific's GS cookie location.  In this way, the __security_init_cookie
routine that's called by the image file's entry point effectively becomes a
no-operation because the cookie will have already been initialized.  This
manner of setting the GS cookie for image files provides Microsoft with much
more flexibility.  Rather than having to update all binaries compiled with GS,
Microsoft can simply update a single binary (ntdll.dll) if improvements need
to be made to the cookie generation algorithm.  The following output shows a
sample of dumpbin /loadconfig on kernel32.dll:

Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file c:\windows\system32\kernel32.dll

File Type: DLL

  Section contains the following load config:

            00000048 size
                   0 time date stamp
...
            7C8836CC Security Cookie

7) Future Work

There is still additional work that can be done to further refine the
techniques described in this document.  This chapter outlines some of the
major items that could be followed up on.

7.1) Improving Performance Counter Estimates

One area in particular that the author feels could benefit from further
research has to do with refining the technique used to calculate the
performance counter.  A more thorough analysis of the apparent association
between time and the lower 17 bits of the performance counter is necessary.
This analysis would directly affect the ability to recover more cookie state
information, since the entropy of the lower 17 bits of the performance counter
is one of the only things standing in the way of obtaining the entire cookie.

7.2) Remote Attacks

The ability to apply the techniques described in this document in a remote
scenario would obviously increase the severity of the problem.  In order to do
this, an attacker would need the ability to either infer or be able to
calculate some of the key elements that are used in the generation of a
cookie.  This would rely on being able to determine things like the process
creation time, the process and thread identifier, and the system uptime.  With
these values, it should be possible to predict the state of the cookie with
similar degrees of accuracy.  Of course, methods of obtaining this information
remotely are not obvious.

One point of consideration that should be made is that even if it's not
possible to directly determine some of this information, it may be possible to
infer it.  For instance, consider a scenario where a vulnerability in a
service is exposed remotely.  There's nothing to stop an attacker from causing
the service to crash.  In most cases, the service will restart at some
predefined point (such as 30 seconds after the crash).  Using this approach,
an attacker could infer the creation time of the process based on the time
that the crash was generated.  This isn't fool proof, but it should be
possible to get fairly close.

Determining process and thread identifier could be tricky, especially if the
system has been up for some time.  The author is not aware of a general
purpose technique that could be used to determine this information remotely.
Fortunately, the process and thread identifier have very little effect on high
order bits.

The system uptime is an interesting one.  In the past, there have been
techniques that could be used to estimate the uptime of the system through the
use of TCP timestamps and other network protocol anomalies.  At the time of
this writing, the author is not aware of how prevalent or useful these
techniques are against modern operating systems.  Should they still be
effective, they would represent a particularly useful way of obtaining a
system's uptime.  If an attacker can obtain both the creation time of the
process and the uptime of the system, it's possible to calculate the tick
count and performance counter values with varying degrees of accuracy.

The performance counter will still pose a great challenge in the remote
scenario.  The reliance on the performance frequency shouldn't be seen as an
unknown quantity.  As far as the author is aware, the performance frequency on
modern processors is generally 3579545, though there may be certain power
situations that would cause it to be different.

It is also important to note that the current attack assumes that the load
time for an image that has a GS cookie is equivalent to the initial thread's
creation time.  For example, if a DLL were loaded much later in process
execution, such as through instantiating a COM object in Internet Explorer, it
would not be possible to assume that initial thread creation time is equal to
the system time that was obtained when the DLL's GS cookie was generated.
This brings about an interesting point for the remote scenario, however.  If
an attacker can control the time at which a DLL is loaded, it may be possible
for them to infer the value of system time that is used without even having to
directly query it.  One example of this would be in the context of internet
explorer, where the client's date and time functionality might be abused to
obtain this information.

8) Conclusion

The ability to reduce the amount of effective entropy in a GS cookie can
improve an attacker's chances of guessing the cookie.  This paper has
described two techniques that may be used to calculate or infer the values of
certain bits in a GS cookie.  The first approach involves a local attacker's
ability to collect information that makes it possible to calculate, with
pretty good accuracy, the values of the entropy sources that were used at the
time that a cookie was generated.  The second approach describes the potential
for abusing the limited entropy associated with boot start services.

While the results shown in this paper do not represent a complete break of GS,
they do hint toward a general weakness in the way that GS cookies are
generated.  This is particularly serious given the fact that GS is a compile
time solution.  If the techniques described in this document are refined, or
new and improved techniques are identified, a complete break of GS would
require the recompilation of all affected binaries.  The implications of this
should be obvious.  The ability to reliably predict the value of a GS cookie
would effectively nullify any benefits that GS adds.  It would mean that all
stack-based buffer overflows would immediately become exploitable.

To help contribute to the improvement of GS, a few different solutions were
described that could either partially or wholly address some of the weakness
that were identified.  The most interesting of these solutions involves
modifying the GS implementation to make use of a external cookie generator,
such as the kernel.  Going this route would ensure that any weaknesses found
in the cookie generation algorithm could be simply addressed through a patch
to the kernel.  This is much more reasonable than expecting all existing GS
enabled binaries to be recompiled.

It's unclear whether the techniques presented in this paper will have any
appreciable effect on future exploits.  Only time will tell.

References

[1] Cowan, Crispin et al.  StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks.
    http://www.usenix.org/publications/library/proceedings/sec98/full_papers/cowan/cowan_html/cowan.html; accessed 3/18/2007.

[2] Etoh, Hiroaki.  GCC extension for protecting applications from stack-smashing attacks.
    http://www.research.ibm.com/trl/projects/security/ssp/; accessed 3/18/2007.

[3] eEye.  Memory Retrieval Vulnerabilities.
    http://research.eeye.com/html/Papers/download/eeyeMRV-Oct2006.pdf; accessed 3/18/2007.

[4] Litchfield, David.  Defeating the Stack Based Buffer Overflow Prevention Mechanism of Microsoft Windows 2003 Server
    http://www.nextgenss.com/papers/defeating-w2k3-stack-protection.pdf; accessed 3/18/2007.

[5] Microsoft Corporation.  /GS (Buffer Security Check).
    http://msdn2.microsoft.com/en-us/library/8dbf701c(VS.80).aspx; accessed 3/18/2007.

[6] Microsoft Corporation.  /SAFESEH (Image has Safe Exception Handlers).
    http://msdn2.microsoft.com/en-us/library/9a89h429(VS.80).aspx; accessed 3/18/2007.

[7] Microsoft Corporation.  QueryPerformanceFrequency Function.
    http://msdn2.microsoft.com/en-us/library/ms644905.aspx; accessed 3/18/2007

[8] Microsoft Corporation.  QueryPerformanceCounter Function.
    http://msdn2.microsoft.com/en-us/library/ms644904.aspx; accessed 3/18/2007

[9] Ren, Chris et al.  Microsoft Compiler Flaw Technical Note
    http://www.cigital.com/news/index.php?pg=art&artid=70; accessed 3/18/2007.

[10] Whitehouse, Ollie.  Analysis of GS protections in Windows Vista
     http://www.symantec.com/avcenter/reference/GS_Protections_in_Vista.pdf; accessed 3/20/2007.