Improving Automated Analysis of Windows x64 Binaries
|
|
April 2006
|
|
skape
|
|
mmiller@hick.org
|
|
|
|
|
|
1) Foreword
|
|
|
|
Abstract: As Windows x64 becomes a more prominent platform, it will
|
|
become necessary to develop techniques that improve the binary analysis
|
|
process. In particular, automated techniques that can be performed
|
|
prior to doing code or data flow analysis can be useful in getting a
|
|
better understanding of how a binary operates. To that end, this
|
|
paper gives a brief explanation of some of the changes that have been
|
|
made to support Windows x64 binaries. From there, a few basic
|
|
techniques are illustrated that can be used to improve the process of
|
|
identifying functions, annotating their stack frames, and describing
|
|
their exception handler relationships. Source code to an example IDA
|
|
plugin is also included that shows how these techniques can be
|
|
implemented.
|
|
|
|
Thanks: The author would like to thank bugcheck, sh0k, jt, spoonm, and
|
|
Skywing.
|
|
|
|
Update: The article in MSDN magazine by Matt Pietrek was
|
|
published after this article was written. However, it contains a
|
|
lot of useful information and touches on many of the same topics
|
|
that this article covers in the background chapter. The article can
|
|
be found here:
|
|
http://msdn.microsoft.com/msdnmag/issues/06/05/x64/default.aspx.
|
|
|
|
With that, on with the show.
|
|
|
|
|
|
2) Introduction
|
|
|
|
The demand for techniques that can be used to improve the analysis
|
|
process of Windows x64 binaries will only increase as the Windows x64
|
|
platform becomes more accepted and used in the marketplace. There is a
|
|
deluge of useful information surrounding techniques that can be used to
|
|
perform code and data flow analysis that is also applicable to the x64
|
|
architecture. However, techniques that can be used to better annotate
|
|
and streamline the initial analysis phases, such as identifying
|
|
functions and describing their stack frames, are still a ripe area for
|
|
improvement at the time of this writing. For that reason, this paper
|
|
will start by describing some of the changes that have been made to
|
|
support Windows x64 binaries. This background information is useful
|
|
because it serves as a basis for understanding a few basic techniques
|
|
that may be used to improve some of the initial analysis phases. During
|
|
the course of this paper, the term Windows x64 binary will simply be
|
|
reduced to x64 binary in the interest of brevity.
|
|
|
|
|
|
3) Background
|
|
|
|
Prior to diving into some of the analysis techniques that can be
|
|
performed on x64 binaries, it's first necessary to learn a bit about
|
|
some of the changes that were made to support the x64 architecture.
|
|
This chapter will give a very brief explanation of some of the things
|
|
that have been introduced, but will by no means attempt to act as an
|
|
authoritative reference.
|
|
|
|
|
|
3.1) PE32+ Image File Format
|
|
|
|
The image file format for the x64 platform is known as PE32+. As one
|
|
would expect, the file format is derived from the PE file format with
|
|
only very slight modifications. For instance, 64-bit binaries contain
|
|
an IMAGE_OPTIONAL_HEADER64 rather than an IMAGE_OPTIONAL_HEADER. The
|
|
differences between these two structures are described in the table
|
|
below:
|
|
|
|
Field              | PE    | PE32+
-------------------+-------+------------------------------
BaseOfData         | ULONG | Removed from structure
ImageBase          | ULONG | ULONGLONG
SizeOfStackReserve | ULONG | ULONGLONG
SizeOfStackCommit  | ULONG | ULONGLONG
SizeOfHeapReserve  | ULONG | ULONGLONG
SizeOfHeapCommit   | ULONG | ULONGLONG
-------------------+-------+------------------------------
|
|
|
|
In general, any structure attribute in the PE image that made reference
|
|
to a 32-bit virtual address directly rather than through an RVA (Relative
|
|
Virtual Address) has been expanded to a 64-bit attribute in PE32+. Other
|
|
examples of this include the IMAGE_TLS_DIRECTORY structure and the
|
|
IMAGE_LOAD_CONFIG_DIRECTORY structure.
|
|
|
|
With the exception of certain field offsets in specific structures,
|
|
the PE32+ image file format is largely backward compatible with PE
|
|
both in use and in form.
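
The difference in the optional header can be sketched in a few lines of C. The sketch below is not taken from the plugin that accompanies this paper. The helper name pe_image_base is hypothetical, and the code assumes the whole file is already in memory and that only ImageBase is of interest. It tells the two formats apart by the optional header's Magic value (0x10b for PE, 0x20b for PE32+) and reads ImageBase at the offset and width appropriate to each:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE32_MAGIC     0x10b  /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define PE32PLUS_MAGIC 0x20b  /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* Little-endian readers; callers guarantee the bytes are in range. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) { return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16); }
static uint64_t rd64(const uint8_t *p) { return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32); }

/* Hypothetical helper: returns the optional header magic and stores the
 * ImageBase in *image_base, or returns 0 if the buffer is not a PE image. */
static uint16_t pe_image_base(const uint8_t *img, size_t len, uint64_t *image_base)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;
    uint32_t nt = rd32(img + 0x3c);              /* e_lfanew */
    if (nt > len || len - nt < 4 + 20 + 32)      /* signature, file header, 32 bytes */
        return 0;
    if (memcmp(img + nt, "PE\0\0", 4) != 0)
        return 0;
    const uint8_t *opt = img + nt + 4 + 20;      /* start of the optional header */
    uint16_t magic = rd16(opt);
    if (magic == PE32_MAGIC)
        *image_base = rd32(opt + 28);            /* 32 bits, follows BaseOfData */
    else if (magic == PE32PLUS_MAGIC)
        *image_base = rd64(opt + 24);            /* 64 bits, BaseOfData removed */
    else
        return 0;
    return magic;
}
```

Because BaseOfData is absent in PE32+, the 64-bit ImageBase begins four bytes earlier than its 32-bit counterpart and occupies the space of both fields, which is why the remaining fields up to SizeOfStackReserve keep their offsets.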
|
|
|
|
|
|
3.2) Calling Convention
|
|
|
|
The calling convention used on x64 is much simpler than those used for
|
|
x86. Unlike x86, where calling conventions like stdcall, cdecl, and
|
|
fastcall are found, the x64 platform has only one calling convention.
|
|
The calling convention that it uses is a derivative of fastcall where
|
|
the first four parameters of a function are passed by register and any
|
|
remaining parameters are passed through the stack. Each parameter is 64
|
|
bits wide (8 bytes). The first four parameters are passed through the
|
|
RCX, RDX, R8, and R9 registers, respectively. For scenarios where
|
|
parameters are passed by value or are otherwise too large to fit into
|
|
one of the 64-bit registers, appropriate steps are taken as documented
|
|
in [4].
|
|
|
|
|
|
3.2.1) Stack Frame Layout
|
|
|
|
The stack frame layout for functions on x64 is very similar to x86, but
|
|
with a few key differences. Just like x86, the stack frame on x64 is
|
|
divided into three parts: parameters, return address, and locals. These
|
|
three parts are explained individually below. One of the important
|
|
principles to understand when it comes to x64 stack frames is that the
|
|
stack does not fluctuate throughout the course of a given function. In
|
|
fact, the stack pointer is only permitted to change in the context of a
|
|
function prologue. Note that things like alloca are handled in a special
|
|
manner[7]. Parameters are not pushed and popped from the stack. Instead,
|
|
stack space is pre-allocated for all of the arguments that would be
|
|
passed to child functions. This is done, in part, to make it easier
|
|
to unwind call stacks in the event of an exception. The table below
|
|
describes a typical stack frame:
|
|
|
|
|
|
+-------------------------+
| Stack parameter area    |
+-------------------------+
| Register parameter area |
+-------------------------+
| Return address          |
+-------------------------+
| Locals                  |
+-------------------------+
|
|
|
|
|
|
== Parameters
|
|
|
|
|
|
The calling convention for functions on x64 dictates that the first four
|
|
parameters are passed via register with any remaining parameters,
|
|
starting with parameter five, spilling to the stack. Given that the
|
|
fifth parameter is the first parameter passed by the stack, one would
|
|
think that the fifth parameter would be the value immediately adjacent
|
|
to the return address on the stack, but this is not the case. Instead,
|
|
if a given function calls other functions, that function is required to
|
|
allocate stack space for the parameters that are passed by register.
|
|
This has the effect of making it such that the area of the stack
|
|
immediately adjacent to the return address is 0x20 bytes of
|
|
uninitialized storage for the parameters passed by register followed
|
|
immediately by any parameters that spill to the stack (starting with
|
|
parameter five). The area of storage allocated on the stack for the
|
|
register parameters is known as the register parameter area whereas the
|
|
area of the stack for parameters that spill onto the stack is known as
|
|
the stack parameter area. The table below illustrates what the
|
|
parameter portion of a stack frame would look like after making a call
|
|
to a function:
|
|
|
|
+-------------------------+
| Parameter 6             |
+-------------------------+
| Parameter 5             |
+-------------------------+
| Parameter 4 (R9 Home)   |
+-------------------------+
| Parameter 3 (R8 Home)   |
+-------------------------+
| Parameter 2 (RDX Home)  |
+-------------------------+
| Parameter 1 (RCX Home)  |
+-------------------------+
| Return address          |
+-------------------------+
|
|
|
|
|
|
To emphasize further, the register parameter area is always allocated,
|
|
even if the function being called has fewer than four arguments. This
|
|
area of the stack is effectively owned by the called function, and as
|
|
such can be used for volatile storage during the course of the function
|
|
call. In particular, this area is commonly used to persist the values
|
|
of register parameters. This area is also referred to as the "home"
|
|
address for register parameters. However, it can also be used to save
|
|
non-volatile registers. To someone familiar with x86 it may seem
|
|
slightly odd to see functions modifying areas of the stack beyond the
|
|
return address. The key is to remember that the 0x20 bytes immediately
|
|
adjacent to the return address are owned by the called function. One
|
|
important side effect of this requirement is that if a function calls
|
|
other functions, the calling function's minimum stack allocation will be
|
|
0x20 bytes. This accounts for the register parameter area that will be
|
|
used by called functions.
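
The layout above reduces to simple arithmetic, sketched here under the assumption that offsets are measured from RSP as it stands on entry to the called function, where [rsp+0] holds the return address. The function names are hypothetical and are used for illustration only:

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE 8u   /* every parameter slot is 64 bits wide */

/* Offset from RSP, as seen on entry to the callee, of the stack slot that
 * belongs to the 1-based parameter `index`. */
static uint32_t param_slot_offset(unsigned index)
{
    assert(index >= 1);
    return SLOT_SIZE * index;
}

/* Nonzero if the parameter arrives in RCX, RDX, R8, or R9. In that case
 * its slot is uninitialized home space that the callee is free to use. */
static int param_in_register(unsigned index)
{
    return index >= 1 && index <= 4;
}
```

With this numbering the four home slots occupy offsets 0x8 through 0x20, and parameter five, the first one actually written by the caller, is found at offset 0x28.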
|
|
|
|
The obvious question to ask at this point is why it's the caller's
|
|
responsibility to allocate stack space for use by the called function.
|
|
There are a few different reasons for this. Perhaps most importantly,
|
|
it makes it possible for the called function to take the address of a
|
|
parameter that's passed via a register. Furthermore, the address that
|
|
is returned for the parameter must be at a location that is contiguous
|
|
in relation to the other parameters. This is particularly necessary for
|
|
variadic functions, which require a contiguous list of parameters, but
|
|
may also be necessary for applications that make assumptions about being
|
|
able to reference parameters in relation to one another by address.
|
|
Invalidating this assumption would introduce source compatibility
|
|
problems.
|
|
|
|
For more information on parameter passing, refer to the MSDN
|
|
documentation[4,7].
|
|
|
|
== Return Address
|
|
|
|
Due to the fact that pointers are 64 bits wide on x64, the return
|
|
address location on the stack is eight bytes instead of four.
|
|
|
|
== Locals
|
|
|
|
The locals portion of a function's stack frame encompasses both local
|
|
variables and saved non-volatile registers. For x64, the general
|
|
purpose registers described as non-volatile are RBP, RBX, RDI, RSI, and
|
|
R12 through R15[5].
|
|
|
|
|
|
3.3) Exception Handling on x64
|
|
|
|
On x86, exception handling is accomplished through the adding and
|
|
removing of exception registration records on a per-thread basis. When
|
|
a function is entered that makes use of an exception handler, it
|
|
constructs an exception registration record on the stack that is
|
|
composed of an exception handler (a function pointer), and a pointer to
|
|
the next element in the exception handler list. This list of exception
|
|
registration records is stored relative to fs:[0]. When an exception
|
|
occurs, the exception dispatcher walks the list of exception handlers
|
|
and calls each one, checking to see if they are capable of handling the
|
|
exception that occurred. While this approach works perfectly fine,
|
|
Microsoft realized that there were better ways to go about it. First of
|
|
all, the adding and removing of exception registration records that are
|
|
static in the context of an execution path adds needless execution
|
|
overhead. Secondly, the security implications of storing a function
|
|
pointer on the stack have been made very obvious, especially in the case
|
|
where that function pointer can be called after an exception is
|
|
generated (such as an access violation). Finally, the process of
|
|
unwinding call frames is muddled with limitations, thus making it a more
|
|
complicated process than it might otherwise need to be[6].
|
|
|
|
With these things in mind, Microsoft completely revamped the way
|
|
exception handling is accomplished on x64. The major changes center
|
|
around the approaches Microsoft has taken to solve the three major
|
|
deficiencies found on x86. First, Microsoft solved the execution time
|
|
overhead issue of adding and removing exception handlers by moving all
|
|
of the static exception handling information into a static location in
|
|
the binary. This location, known as the .pdata section, is described by
|
|
the PE32+'s Exception Directory. The structure of this section will be
|
|
described in the exception directory subsection. By eliminating the
|
|
need to add and remove exception handlers on the fly, Microsoft has also
|
|
eliminated the security issue found on x86 with regard to overwriting
|
|
the function pointer of an exception handler. Perhaps most importantly,
|
|
the process involved in unwinding call frames has been drastically
|
|
improved through the formalization of the frame unwinding process. This
|
|
will be discussed in the subsection on unwind information.
|
|
|
|
|
|
3.3.1) Exception Directory
|
|
|
|
The Exception Directory of a PE32+ binary is used to convey the complete
|
|
list of functions that could be found in a stack frame during an unwind
|
|
operation. These functions are known as non-leaf functions, and they
|
|
are qualified as such if they either allocate space on the stack or call
|
|
other functions. The IMAGE_RUNTIME_FUNCTION_ENTRY data structure is used
|
|
to describe the non-leaf functions, as shown below[1]:
|
|
|
|
typedef struct _IMAGE_RUNTIME_FUNCTION_ENTRY {
    ULONG BeginAddress;
    ULONG EndAddress;
    ULONG UnwindInfoAddress;
} _IMAGE_RUNTIME_FUNCTION_ENTRY, *_PIMAGE_RUNTIME_FUNCTION_ENTRY;
|
|
|
|
The BeginAddress and EndAddress attributes are RVAs that represent the
|
|
range of the non-leaf function. The UnwindInfoAddress will be discussed
|
|
in more detail in the following subsection on unwind information. The
|
|
Exception directory itself is merely an array of
|
|
IMAGE_RUNTIME_FUNCTION_ENTRY structures. When an exception occurs, the
|
|
exception dispatcher will enumerate the array of runtime function
|
|
entries until it finds the non-leaf function associated with the address
|
|
it's searching for (typically a return address).
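
A lookup of this kind can be sketched as follows. The RuntimeFunction type and find_runtime_function are stand-ins that mirror the three ULONG fields shown above; they are not the dispatcher's actual code. The sketch assumes that EndAddress is exclusive and uses a linear scan to match the description in the text:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_RUNTIME_FUNCTION_ENTRY; all three fields are RVAs. */
typedef struct {
    uint32_t begin_address;        /* first byte covered by the entry */
    uint32_t end_address;          /* assumed to be one past the last byte */
    uint32_t unwind_info_address;  /* location of the UNWIND_INFO */
} RuntimeFunction;

/* Returns the entry whose range contains `rva`, or NULL if none does. */
static const RuntimeFunction *
find_runtime_function(const RuntimeFunction *table, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        if (rva >= table[i].begin_address && rva < table[i].end_address)
            return &table[i];
    }
    return NULL;   /* a leaf function, or not code in this image */
}
```

The number of entries is the size of the Exception Directory divided by the 12-byte entry size. If the table is sorted by BeginAddress, a binary search could replace the loop.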
|
|
|
|
|
|
3.3.2) Unwind Information
|
|
|
|
For the purpose of unwinding call frames and dispatching exceptions,
|
|
each non-leaf function has some non-zero amount of unwind information
|
|
associated with it. This association is made through the
|
|
UnwindInfoAddress attribute of the IMAGE_RUNTIME_FUNCTION_ENTRY
|
|
structure. The UnwindInfoAddress itself is an RVA that points to an
|
|
UNWIND_INFO structure which is defined as[8]:
|
|
|
|
typedef struct _UNWIND_INFO {
    UBYTE Version       : 3;
    UBYTE Flags         : 5;
    UBYTE SizeOfProlog;
    UBYTE CountOfCodes;
    UBYTE FrameRegister : 4;
    UBYTE FrameOffset   : 4;
    UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
 *  union {
 *      OPTIONAL ULONG ExceptionHandler;
 *      OPTIONAL ULONG FunctionEntry;
 *  };
 *  OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO, *PUNWIND_INFO;
|
|
|
|
This structure, at a very high level, describes a non-leaf function in
|
|
terms of its prologue size and frame register usage. Furthermore, it
|
|
describes the way in which the stack is set up when the prologue for
|
|
this non-leaf function is executed. This is provided through an array
|
|
of codes as accessed through the UnwindCode array. This array is
|
|
composed of UNWIND_CODE structures which are defined as[8]:
|
|
|
|
typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;
        UBYTE UnwindOp : 4;
        UBYTE OpInfo   : 4;
    };
    USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;
|
|
|
|
In order to properly unwind a frame, the exception dispatcher needs to
|
|
be aware of the amount of stack space allocated in that frame, the
|
|
locations of saved non-volatile registers, and anything else that has to
|
|
do with the stack. This information is necessary in order to be able to
|
|
restore the caller's stack frame when an unwind operation occurs. By
|
|
having the toolchain record this information at build time, it's
|
|
possible to emulate the unwind process by inverting the operations
|
|
described in the unwind code array for a given non-leaf function.
|
|
|
|
Aside from conveying stack frame set up, the UNWIND_INFO structure may
|
|
also describe exception handling information, such as the exception
|
|
handler that is to be called if an exception occurs. This information
|
|
is conveyed through the ExceptionHandler and ExceptionData attributes of
|
|
the structure, which exist only if the UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER flag is set in the
|
|
Flags field.
|
|
|
|
For more details on the format and use of these structures for unwinding
|
|
as well as a complete description of the unwind process, please refer to
|
|
the MSDN documentation[2].
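
As a rough illustration of how the fixed part of UNWIND_INFO can be read without depending on compiler-specific bitfield layout, the sketch below decodes the first four bytes by hand. The UnwindHeader type and both function names are hypothetical. The sketch assumes that the low-order bits hold the first-declared bitfield and that the stored frame offset is scaled by 16, as the MSDN documentation describes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNW_FLAG_EHANDLER  0x1
#define UNW_FLAG_UHANDLER  0x2
#define UNW_FLAG_CHAININFO 0x4

typedef struct {
    uint8_t  version;
    uint8_t  flags;
    uint8_t  size_of_prolog;
    uint8_t  count_of_codes;   /* number of 2-byte UNWIND_CODE slots */
    uint8_t  frame_register;   /* 0 means no frame register is used */
    uint16_t frame_offset;     /* in bytes; the stored nibble times 16 */
} UnwindHeader;

/* Decodes the four header bytes that precede the unwind code array. */
static UnwindHeader decode_unwind_header(const uint8_t raw[4])
{
    UnwindHeader h;
    h.version        = raw[0] & 0x07;
    h.flags          = raw[0] >> 3;
    h.size_of_prolog = raw[1];
    h.count_of_codes = raw[2];
    h.frame_register = raw[3] & 0x0f;
    h.frame_offset   = (uint16_t)((raw[3] >> 4) * 16);
    return h;
}

/* Byte offset, from the start of UNWIND_INFO, of whatever follows the code
 * array: the handler RVA or the chained function entry. The array is
 * padded to an even number of slots, hence the rounding. */
static size_t unwind_trailer_offset(const UnwindHeader *h)
{
    return 4 + 2 * (((size_t)h->count_of_codes + 1) & ~(size_t)1);
}
```

The trailer offset is only meaningful when one of the three flags is set; otherwise nothing follows the code array.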
|
|
|
|
|
|
4) Analysis Techniques
|
|
|
|
In order to improve the analysis of x64 binaries, it is important to try
|
|
to identify techniques that can aid in the identification or extraction
|
|
of useful information from the binary in an automated fashion. This
|
|
chapter will focus on a handful of simple techniques that can be used to
|
|
better annotate or describe the behavior of an x64 binary. These
|
|
techniques intentionally do not cover the analysis of code or data flow
|
|
operations. Such techniques are outside of the scope of this paper.
|
|
|
|
|
|
4.1) Exception Directory Enumeration
|
|
|
|
Given the explanation of the Exception Directory found within PE32+
|
|
images and its application to the exception dispatching process, it can
|
|
be seen that x64 binaries have a lot of useful meta-information stored
|
|
within them. Given that this information is just sitting there waiting
|
|
to be used, it makes sense to try to take advantage of it in ways that
|
|
make it possible to better annotate or understand an x64 binary. The
|
|
following subsections will describe different things that can be
|
|
discovered by digging deeper into the contents of the exception
|
|
directory.
|
|
|
|
|
|
4.1.1) Functions
|
|
|
|
One of the most obvious uses for the information stored in the exception
|
|
directory is that it can be used to discover all of the non-leaf
|
|
functions in a binary. This is cool because it works regardless of
|
|
whether or not you actually have symbols for the binary, thus providing
|
|
an easy technique for identifying the majority of the functions in a
|
|
binary. The process taken to do this is to simply enumerate the array
|
|
of IMAGE_RUNTIME_FUNCTION_ENTRY structures stored within the exception
|
|
directory. The BeginAddress attribute of each entry marks the starting
|
|
point of a non-leaf function. There's a catch, though. Not all of the
|
|
runtime function entries are actually associated with the entry point of
|
|
a function. The fact of the matter is that entries can also be
|
|
associated with various portions of an actual function where stack
|
|
modifications are deferred until necessary. In these cases, the unwind
|
|
information associated with the runtime function entry is chained with
|
|
another runtime function entry.
|
|
|
|
The chaining of runtime function entries is documented as being
|
|
indicated through the UNW_FLAG_CHAININFO flag in the Flags attribute of
|
|
the UNWIND_INFO structure. If this flag is set, the area of memory
|
|
immediately following the last UNWIND_CODE in the UNWIND_INFO structure
|
|
is an IMAGE_RUNTIME_FUNCTION_ENTRY structure. The UnwindInfoAddress of
|
|
this structure indicates the chained unwind information. Aside from
|
|
this, chaining can also be indicated through an undocumented flag that
|
|
is stored in the least-significant bit of the UnwindInfoAddress. If the
|
|
least-significant bit is set, then it is implied that the runtime
|
|
function entry is directly chained to the IMAGE_RUNTIME_FUNCTION_ENTRY
|
|
structure that is found at the RVA conveyed by the UnwindInfoAddress
|
|
attribute with the least significant bit masked off. The reason
|
|
chaining can be indicated in this fashion is because it is a requirement
|
|
that unwind information be four-byte aligned.
|
|
|
|
With chaining in mind, it is safe to assume that a runtime function
|
|
entry is associated with the entry point of a function if its unwind
|
|
information is not chained. This makes it possible to deterministically
|
|
identify the entry point of all of the non-leaf functions. From there,
|
|
it should be possible to identify all of the leaf functions through
|
|
calls that are made to them by non-leaf functions. This requires code
|
|
flow analysis, though.
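
The entry point test described above might look something like the following. This is a sketch rather than the plugin's implementation. It assumes the image is available as a flat buffer that can be indexed directly by RVA, and the names RuntimeFunction, is_chained, and collect_entry_points are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNW_FLAG_CHAININFO 0x4

/* Stand-in for IMAGE_RUNTIME_FUNCTION_ENTRY; all three fields are RVAs. */
typedef struct {
    uint32_t begin_address;
    uint32_t end_address;
    uint32_t unwind_info_address;
} RuntimeFunction;

/* Nonzero if the entry's unwind information is chained to another entry. */
static int is_chained(const uint8_t *image, size_t image_size,
                      const RuntimeFunction *rf)
{
    if (rf->unwind_info_address & 1)      /* undocumented low-bit form */
        return 1;
    assert(rf->unwind_info_address < image_size);
    /* Flags live in the upper five bits of the first UNWIND_INFO byte. */
    uint8_t flags = image[rf->unwind_info_address] >> 3;
    return (flags & UNW_FLAG_CHAININFO) != 0;
}

/* Writes the BeginAddress of every unchained entry to `out` (up to
 * `out_cap` values) and returns how many such entries exist. */
static size_t collect_entry_points(const uint8_t *image, size_t image_size,
                                   const RuntimeFunction *table, size_t count,
                                   uint32_t *out, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (is_chained(image, image_size, &table[i]))
            continue;
        if (found < out_cap)
            out[found] = table[i].begin_address;
        found++;
    }
    return found;
}
```

A real tool would validate the RVA instead of asserting, since a malformed image should not be able to stop the analysis.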
|
|
|
|
|
|
4.1.2) Stack Frame Annotation
|
|
|
|
The unwind information associated with each non-leaf function
|
|
contains lots of useful meta-information about the structure of the
|
|
stack. It provides information about the amount of stack space
|
|
allocated, the location of saved non-volatile registers, and whether or
|
|
not a frame register is used and what relation it has to the rest of the
|
|
stack. This information is also described in terms of the location of
|
|
the instruction that actually performs the operation associated with the
|
|
task. Take the following unwind information obtained through dumpbin
|
|
/unwindinfo as an example:
|
|
|
|
|
|
0000060C 00006E50 00006FF0 000081FC  _resetstkoflw
  Unwind version: 1
  Unwind flags: None
  Size of prologue: 0x47
  Count of codes: 18
  Frame register: rbp
  Frame offset: 0x20
  Unwind codes:
    3C: SAVE_NONVOL, register=r15 offset=0x98
    38: SAVE_NONVOL, register=r14 offset=0xA0
    31: SAVE_NONVOL, register=r13 offset=0xA8
    2A: SAVE_NONVOL, register=r12 offset=0xD8
    23: SAVE_NONVOL, register=rdi offset=0xD0
    1C: SAVE_NONVOL, register=rsi offset=0xC8
    15: SAVE_NONVOL, register=rbx offset=0xC0
    0E: SET_FPREG, register=rbp, offset=0x20
    09: ALLOC_LARGE, size=0xB0
    02: PUSH_NONVOL, register=rbp
|
|
|
|
|
|
First and foremost, one can immediately see that the size of the
|
|
prologue used in the _resetstkoflw function is 0x47 bytes. This prologue
|
|
accounts for all of the operations described in the unwind codes array.
|
|
Furthermore, one can also tell that the function uses a frame pointer,
|
|
as conveyed through rbp, and that the frame pointer offset is 0x20 bytes
|
|
relative to the current stack pointer at the time the frame pointer
|
|
register is established.
|
|
|
|
As one would expect with an unwind operation, the unwind codes
|
|
themselves are stored in the reverse of the order in which they are executed.
|
|
This is necessary because of the effect on the stack each unwind code
|
|
can have. If they are processed in the wrong order, then the unwind
|
|
operation will get invalid data. For example, the value obtained
|
|
through a pop rbp instruction will differ depending on whether or not it
|
|
is done before or after an add rsp, 0xb0.
|
|
|
|
For the purposes of annotation, however, the important thing to keep in
|
|
mind is how all of the useful information can be extracted. In this
|
|
case, it is possible to take all of the information the unwind codes
|
|
provide and break it down into a definition of the stack frame layout
|
|
for a function. This can be accomplished by processing the unwind codes
|
|
in the order that they would be executed rather than the order that they
|
|
appear in the array. There's one important thing to keep in mind when
|
|
doing this. Since unwind information can be chained, it is a
|
|
requirement that the full chain of unwind codes be processed in
|
|
execution order. This can be accomplished by walking the chain of
|
|
unwind information and building an execution order list of all of the
|
|
unwind codes.
|
|
|
|
Once the execution order list of unwind codes is collected, the next
|
|
step is to simply enumerate each code, checking to see what operation it
|
|
performs and building out the stack frame across each iteration. Prior
|
|
to enumerating each code, the state of the stack pointer should be
|
|
initialized to 0 to indicate an empty stack frame. As data is allocated
|
|
on the stack, the stack pointer should be adjusted by the appropriate
|
|
amount. The actions that need to be taken for each unwind operation
|
|
that directly affect the stack pointer are described below.
|
|
|
|
1. UWOP_PUSH_NONVOL
|
|
|
|
When a non-volatile register is pushed onto the stack, such as
|
|
through a push rbp, the current stack pointer needs to be
|
|
decremented by 8 bytes.
|
|
|
|
2. UWOP_ALLOC_LARGE and UWOP_ALLOC_SMALL
|
|
|
|
When stack space is allocated, the current stack pointer needs to
|
|
be adjusted by the amount indicated.
|
|
|
|
3. UWOP_SET_FPREG
|
|
|
|
When a frame pointer is defined, its offset relative to the base of
|
|
the stack should be saved using the current value of the stack
|
|
pointer.
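
The three rules above can be sketched as a small simulator. The code is illustrative and makes several assumptions: the names FrameLayout, code_slots, and simulate_prologue are hypothetical, each slot is read as a little-endian 16-bit value, the operation numbers and slot counts follow the MSDN unwind data definitions[8], and UWOP_PUSH_MACHFRAME is ignored. The stack depth is tracked as a positive byte count below the return address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9
};

#define UNWIND_OP(slot) (((slot) >> 8) & 0x0f)   /* low nibble of byte 1 */
#define OP_INFO(slot)   ((slot) >> 12)           /* high nibble of byte 1 */

/* Number of 16-bit slots occupied by the unwind code starting at `slot`. */
static size_t code_slots(uint16_t slot)
{
    switch (UNWIND_OP(slot)) {
    case UWOP_ALLOC_LARGE:     return OP_INFO(slot) ? 3 : 2;
    case UWOP_SAVE_NONVOL:
    case UWOP_SAVE_XMM128:     return 2;
    case UWOP_SAVE_NONVOL_FAR:
    case UWOP_SAVE_XMM128_FAR: return 3;
    default:                   return 1;
    }
}

typedef struct {
    uint32_t frame_size;   /* bytes below the return address after the prologue */
    uint32_t fpreg_depth;  /* stack depth when the frame register was set */
    int      has_fpreg;
} FrameLayout;

static FrameLayout simulate_prologue(const uint16_t *codes, size_t count)
{
    size_t starts[256];    /* CountOfCodes is a UBYTE, so 255 at most */
    size_t n = 0;
    FrameLayout f = { 0, 0, 0 };

    assert(count <= 255);
    /* Pass 1: find where each code starts, in stored (reverse) order. */
    for (size_t i = 0; i < count; ) {
        size_t len = code_slots(codes[i]);
        if (len > count - i)
            break;                       /* truncated code: stop here */
        starts[n++] = i;
        i += len;
    }
    /* Pass 2: replay the codes in execution order. */
    while (n-- > 0) {
        const uint16_t *c = &codes[starts[n]];
        switch (UNWIND_OP(c[0])) {
        case UWOP_PUSH_NONVOL:
            f.frame_size += 8;
            break;
        case UWOP_ALLOC_SMALL:
            f.frame_size += OP_INFO(c[0]) * 8 + 8;
            break;
        case UWOP_ALLOC_LARGE:
            f.frame_size += OP_INFO(c[0]) ? (c[1] | (uint32_t)c[2] << 16)
                                          : c[1] * 8u;
            break;
        case UWOP_SET_FPREG:
            f.fpreg_depth = f.frame_size;
            f.has_fpreg = 1;
            break;
        default:
            break;                       /* register saves leave RSP alone */
        }
    }
    return f;
}
```

Two passes are used because some codes span several slots, which makes it unsafe to walk the array backward directly. For chained unwind information, the arrays of each link would need to be concatenated before the replay.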
|
|
|
|
|
|
As the enumeration of unwind codes proceeds, it is also possible to annotate
|
|
the different locations on the stack where non-volatile registers are
|
|
preserved. For instance, given the example unwind information above, it
|
|
is known that the R15 register is preserved at [rsp + 0x98]. Therefore,
|
|
we can annotate this location as [rsp + SavedR15].
|
|
|
|
Beyond annotating preserved register locations on the stack, we can also
|
|
annotate the instructions that perform operations that affect the stack.
|
|
For instance, when a non-volatile register is pushed, such as through
|
|
push rbp, we can annotate the instruction that performs that operation
|
|
as preserving rbp on the stack. The location of the instruction that's
|
|
associated with the operation can be determined by taking the
|
|
BeginAddress associated with the unwind information and adding it to the
|
|
CodeOffset attribute of the UNWIND_CODE that is being processed. It is
|
|
important to note, however, that the CodeOffset attribute actually
|
|
points to the first byte of the instruction immediately following the
|
|
one that performs the actual operation, so it is necessary to backtrack
|
|
in order to determine the start of the instruction that actually
|
|
performs the operation.
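
One way to perform the backtracking, assuming a disassembler has already produced the start offsets of the prologue's instructions, is to look for the instruction whose successor begins at the CodeOffset. The function name and the NO_INSTRUCTION sentinel below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_INSTRUCTION UINT32_MAX

/* insn_starts holds, in ascending order, the offsets relative to
 * BeginAddress at which instructions start. It must include the first
 * instruction after the prologue so that the last prologue instruction
 * has a successor. Returns the RVA of the instruction that ends exactly
 * at code_offset, or NO_INSTRUCTION if no instruction does. */
static uint32_t unwind_code_instruction(uint32_t begin_address,
                                        const uint8_t *insn_starts,
                                        size_t count, uint8_t code_offset)
{
    for (size_t i = 0; i + 1 < count; i++) {
        /* Instruction i ends where instruction i + 1 starts. */
        if (insn_starts[i + 1] == code_offset)
            return begin_address + insn_starts[i];
    }
    return NO_INSTRUCTION;
}
```

Applied to the unwind information shown earlier, a CodeOffset of 0x02 resolves to the push rbp at RVA 0x6E50, and 0x09 resolves to the sub rsp, 0B0h at RVA 0x6E52.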
|
|
|
|
As a result of this analysis, one can take the prologue of the
|
|
_resetstkoflw function and automatically convert it from:
|
|
|
|
.text:100006E50    push    rbp
.text:100006E52    sub     rsp, 0B0h
.text:100006E59    lea     rbp, [rsp+0B0h+var_90]
.text:100006E5E    mov     [rbp+0A0h], rbx
.text:100006E65    mov     [rbp+0A8h], rsi
.text:100006E6C    mov     [rbp+0B0h], rdi
.text:100006E73    mov     [rbp+0B8h], r12
.text:100006E7A    mov     [rbp+88h], r13
.text:100006E81    mov     [rbp+80h], r14
.text:100006E88    mov     [rbp+78h], r15
|
|
|
|
|
|
to a version with better annotation:
|
|
|
|
|
|
.text:100006E50    push    rbp                      ; SavedRBP
.text:100006E52    sub     rsp, 0B0h
.text:100006E59    lea     rbp, [rsp+20h]
.text:100006E5E    mov     [rbp+0A0h], rbx          ; SavedRBX
.text:100006E65    mov     [rbp+98h+SavedRSI], rsi  ; SavedRSI
.text:100006E6C    mov     [rbp+98h+SavedRDI], rdi  ; SavedRDI
.text:100006E73    mov     [rbp+98h+SavedR12], r12  ; SavedR12
.text:100006E7A    mov     [rbp+98h+SavedR13], r13  ; SavedR13
.text:100006E81    mov     [rbp+98h+SavedR14], r14  ; SavedR14
.text:100006E88    mov     [rbp+98h+SavedR15], r15  ; SavedR15
|
|
|
|
|
|
While such annotation may not be entirely useful to understanding
|
|
the behavior of the binary, it at least simplifies the process of
|
|
understanding the layout of the stack.
|
|
|
|
|
|
4.1.3) Exception Handlers
|
|
|
|
The unwind information structure for a non-leaf function also contains
|
|
useful information about the way in which exceptions within that
|
|
function should be dispatched. If the unwind information associated
|
|
with a function has the UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER flag set,
|
|
then the function has an exception handler associated with it. The
|
|
exception handler is conveyed through the ExceptionHandler attribute
|
|
which comes immediately after the array of unwind codes. This handler is
|
|
defined as being a language-specific handler for processing the
|
|
exception. More specifically, the exception handler is specific to the
|
|
semantics associated with a given programming language, such as C or
|
|
C++[3]. For C, the language-specific exception handler is named
|
|
__C_specific_handler.
|
|
|
|
Given that all C functions that handle exceptions will have the same
|
|
exception handler, how does the function-specific code for handling an
|
|
exception actually get called? For the case of C functions, the
|
|
function-specific exception handler is stored in a scope table in the
|
|
ExceptionData portion of the UNWIND_INFO structure. Other languages may
|
|
have a different ExceptionData definition. This C scope table is defined
|
|
by the structures shown below:
|
|
|
|
typedef struct _C_SCOPE_TABLE_ENTRY {
    ULONG Begin;
    ULONG End;
    ULONG Handler;
    ULONG Target;
} C_SCOPE_TABLE_ENTRY, *PC_SCOPE_TABLE_ENTRY;

typedef struct _C_SCOPE_TABLE {
    ULONG NumEntries;
    C_SCOPE_TABLE_ENTRY Table[1];
} C_SCOPE_TABLE, *PC_SCOPE_TABLE;
|
|
|
|
The scope table entries describe the function-specific exception
|
|
handlers in relation to the specific areas of the function that they
|
|
apply to. Each of the attributes of the C_SCOPE_TABLE_ENTRY is expressed
|
|
as an RVA. The Target attribute defines the location to transfer
|
|
control to after the exception is handled.
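
For annotation purposes, the scope table can be queried to find which handlers guard a particular instruction. The sketch below uses a hypothetical ScopeEntry type that mirrors C_SCOPE_TABLE_ENTRY, and it assumes that End is exclusive. It reports matches in table order and makes no claim about which handler the runtime would consult first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for C_SCOPE_TABLE_ENTRY; every field is an RVA. */
typedef struct {
    uint32_t begin;    /* start of the guarded region */
    uint32_t end;      /* assumed to be one past the guarded region */
    uint32_t handler;  /* function-specific handler */
    uint32_t target;   /* where control resumes after handling */
} ScopeEntry;

/* Stores pointers to the entries whose guarded region contains `rva` in
 * `out` (up to `out_cap` of them) and returns how many entries matched. */
static size_t scopes_covering(const ScopeEntry *table, uint32_t num_entries,
                              uint32_t rva,
                              const ScopeEntry **out, size_t out_cap)
{
    size_t found = 0;
    for (uint32_t i = 0; i < num_entries; i++) {
        if (rva < table[i].begin || rva >= table[i].end)
            continue;
        if (found < out_cap)
            out[found] = &table[i];
        found++;
    }
    return found;
}
```

Every Handler value collected this way is also a candidate function start that code flow analysis alone might miss, since the handlers are only reached indirectly.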
|
|
|
|
The reason why all of the exception handler information is useful is
|
|
because it makes it possible to annotate a function in terms of what
|
|
exception handlers may be called during its execution. It also makes it
|
|
possible to identify the exception handler functions that may otherwise
|
|
not be found due to the fact that they are executed indirectly. For
|
|
example, the function CcAcquireByteRangeForWrite in ntoskrnl.exe can be
|
|
annotated in the following fashion:
|
|
|
|
|
|
.text:0000000000434520 ; Exception handler: __C_specific_handler
.text:0000000000434520 ; Language specific handler: sub_4C7F30
.text:0000000000434520
.text:0000000000434520 CcAcquireByteRangeForWrite proc near
|
|
|
|
|
|
4.2) Register Parameter Area Annotation
|
|
|
|
Given the requirement that the register parameter area be allocated on
|
|
the stack in the context of a function that calls other functions, it is
|
|
possible to statically annotate specific portions of the stack frame for
|
|
a function as being the location of the caller's register parameter
|
|
area. Furthermore, the location of a given function's register
|
|
parameter area that is to be used by called functions can also be
|
|
annotated.
|
|
|
|
The location of the register parameter area is always at a fixed
|
|
location in a stack frame. Specifically, it immediately follows the
|
|
return address on the stack. If annotations are added for CallerRCX at
|
|
offset 0x8, CallerRDX at offset 0x10, CallerR8 at offset 0x18, and
|
|
CallerR9 at offset 0x20, it is possible to get a better view of the
|
|
stack frame for a given function. It also makes it easier to understand
|
|
when and how this region of the stack is used by a function. For
|
|
instance, the CcAcquireByteRangeForWrite function in ntoskrnl.exe makes
|
|
use of this area to store the values of the first four parameters:
|
|
|
|
|
|
.text:0000000000434520    mov     [rsp+CallerR9], r9
.text:0000000000434525    mov     dword ptr [rsp+CallerR8], r8d
.text:000000000043452A    mov     [rsp+CallerRDX], rdx
.text:000000000043452F    mov     [rsp+CallerRCX], rcx
|
|
|
|
|
|
5) Conclusion
|
|
|
|
This paper has presented a few basic approaches that can be used to
|
|
extract useful information from an x64 binary for the purpose of
|
|
analysis. By analyzing the unwind information associated with
|
|
functions, it is possible to get a better understanding of how a
|
|
function's stack frame is laid out. Furthermore, the unwind information
|
|
makes it possible to describe the relationship between a function and
|
|
its exception handler(s). Looking toward the future, x64 is likely to
|
|
become the standard architecture given Microsoft's adoption of it as
|
|
their primary architecture. With this in mind, coming up with
|
|
techniques to better automate the binary analysis process will become
|
|
more necessary.
|
|
|
|
|
|
Bibliography
|
|
|
|
[1] Microsoft Corporation. ntimage.h.
|
|
3790 DDK header files.
|
|
|
|
[2] Microsoft Corporation. Exception Handling (x64).
|
|
http://msdn2.microsoft.com/en-us/library/1eyas8tf(VS.80).aspx;
|
|
accessed Apr 25, 2006.
|
|
|
|
[3] Microsoft Corporation. The Language Specific Handler.
|
|
http://msdn2.microsoft.com/en-us/library/b6sf5kbd(VS.80).aspx;
|
|
accessed Apr 25, 2006.
|
|
|
|
[4] Microsoft Corporation. Parameter Passing.
|
|
http://msdn2.microsoft.com/en-us/library/zthk2dkh.aspx;
|
|
accessed Apr 25, 2006.
|
|
|
|
[5] Microsoft Corporation. Register Usage.
|
|
http://msdn2.microsoft.com/en-us/library/9z1stfyw(VS.80).aspx;
|
|
accessed Apr 25, 2006.
|
|
|
|
[6] Microsoft Corporation. SEH in x86 Environments.
|
|
http://msdn2.microsoft.com/en-US/library/ms253960.aspx;
|
|
accessed Apr 25, 2006.
|
|
|
|
[7] Microsoft Corporation. Stack Usage.
|
|
http://msdn2.microsoft.com/en-us/library/ew5tede7.aspx;
|
|
accessed Apr 25, 2006.
|
|
|
|
[8] Microsoft Corporation. Unwind Data Definitions in C.
|
|
http://msdn2.microsoft.com/en-us/library/ssa62fwe(VS.80).aspx;
|
|
accessed Apr 25, 2006.
|